cst414 - Deep Learning

The document provides an introduction to neural networks, covering their structure, activation functions, and training processes such as backpropagation. It discusses various types of activation functions like Sigmoid, Tanh, ReLU, and Softmax, as well as practical issues in training such as overfitting and gradient problems. Additionally, it outlines loss functions and the concept of risk minimization in the context of neural network training.

MODULE 1

Introduction to Neural Networks



A method of computing, based on the interaction of multiple connected
processing elements.

A powerful technique to solve many real world problems.

The ability to learn from experience in order to improve their performance.

Ability to deal with incomplete information.

Inputs to the network are represented by the mathematical symbols X1, X2, ..., Xn.

Each of these inputs is multiplied by a connection weight W1, W2, ..., Wn.
SUM = W1X1+W2X2+.......+WNXN

These products are simply summed, fed through the transfer function f() to
generate a result and then output.
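As a minimal sketch of this computation (NumPy and a sigmoid transfer function are illustrative assumptions; the notes do not fix either):

import numpy as np

def neuron_output(x, w, f):
    # Weighted sum of the inputs: SUM = W1X1 + W2X2 + ... + WNXN
    weighted_sum = np.dot(w, x)
    # Feed the sum through the transfer function f() to produce the output
    return f(weighted_sum)

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))   # one possible transfer function
x = np.array([0.5, -1.0, 2.0])                 # inputs X1, X2, X3
w = np.array([0.4, 0.3, -0.2])                 # connection weights W1, W2, W3
print(neuron_output(x, w, sigmoid))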
Activation Functions

It is also known as Transfer Function. It can also be attached in between two
neural networks.

These are important for an ANN to learn and understand the complex
patterns.

It calculates the ‘weighted sum’, adds direction, and decides whether to
‘fire’ a particular neuron or not.

The main purpose is to convert the input signal of a node in an ANN into an
output signal. That output signal is then used as an input to the next layer in
the stack.
Types of Activation Functions
1) Sigmoid Function
• It transforms its input into an output in the range between 0 and 1.

Sigmoid Function Formula:
σ(x) = 1 / (1 + e^(-x))

Where:
• σ(x) is the output of the sigmoid function.
• x is the weighted sum of inputs in a neuron.
• e is the base of the natural logarithm (~2.718).
2) Tanh (Hyperbolic Tangent)
• It is similar to the Sigmoid function, but with a key difference: it maps input values to
a range between -1 and 1, rather than between 0 and 1.
Tanh Function Formula:
tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))

Where,
• tanh(x) is the output of the Tanh function.
• x is the weighted sum of inputs to a neuron.
• e is the base of the natural logarithm (~2.718).
3) ReLU (Rectified Linear Unit)

One of the most commonly used activation functions in modern deep learning
models.

It is simple, computationally efficient, and helps address some of the
limitations of older activation functions like Sigmoid and Tanh.


ReLU sets all negative input values to zero and leaves positive input values
unchanged.

This means that if the input x is greater than or equal to 0, the output is x. If
the input is less than 0, the output is 0. Equivalently, ReLU(x) = max(0, x).
4) Softmax activation function

It is widely used in the output layer of neural networks, particularly for
multi-class classification problems.

It converts raw network outputs (called logits) into a probability
distribution across multiple classes.

The main goal of the softmax function is to ensure that the output
values are interpretable as probabilities, meaning that they are all
between 0 and 1 and sum to 1.

Given an input vector z = [z1, z2, z3, ..., zn], where n represents the number
of classes and zi represents the raw score (logit) for class i, the Softmax
function for each class is defined as:
softmax(zi) = e^(zi) / Σj e^(zj)

where,

e^(zi) is the exponential of the i-th logit.

The denominator is the sum of the exponentials of all logits.
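A minimal NumPy sketch of these four activation functions (the library choice is an illustrative assumption; the notes do not prescribe one):

import numpy as np

def sigmoid(x):
    # Squashes any real-valued input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps inputs to the range (-1, 1); zero-centered, unlike sigmoid
    return np.tanh(x)

def relu(x):
    # Sets negative inputs to zero and leaves positive inputs unchanged
    return np.maximum(0.0, x)

def softmax(z):
    # Converts a vector of logits into probabilities that sum to 1
    e = np.exp(z - np.max(z))   # subtracting the max improves numerical stability
    return e / np.sum(e)

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z), tanh(z), relu(z), softmax(z))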
Types of Activation Functions

Activation Function | Range   | Common Use Case            | Key Advantage                                | Key Disadvantage
Sigmoid             | (0, 1)  | Binary classification      | Smooth output, interpretable as probability  | Vanishing gradients, slow training
Tanh                | (-1, 1) | Hidden layers              | Zero-centered, better gradients than sigmoid | Vanishing gradients
ReLU                | (0, ∞)  | Hidden layers              | Fast training, non-linear                    | Dying ReLU problem
Softmax             | (0, 1)  | Multi-class classification | Outputs probabilities, interpretable         | Not useful for binary classification
Single layer perceptron neural networks

A perceptron is an artificial neuron in which the activation function is the
threshold function.

Basic building block of a neural network.
 Consider an artificial neuron having x1, x2, ..., xn as the input signals and
w1, w2, ..., wn as the associated weights. Let w0 be some constant.
 The neuron is called a perceptron if the output of the neuron is given by
the following function:
o(x1, x2, ..., xn) = 1 if w0 + w1x1 + w2x2 + ... + wnxn > 0, and −1 otherwise.
Working of a perceptron:
a) All input xi are multiplied by weights wi.
b) Add all multiplied values and call it as weighted sum.
c) Apply the weighted sum to the activation function.
d) Output the result of the activated function.
Representational power of perceptron

A single perceptron can be used to represent many Boolean functions.

Eg: If we assume Boolean values of 1 (true) and -1 (false), then one way for a
2-input perceptron to implement the AND function is to set the weights
w0 = -0.8, w1 = w2 = 0.5.

Representations of OR, NAND and NOR
 The functions x1 OR x2, x1 NAND x2 and x1 NOR x2 can also be
represented by perceptrons.
 The table shows the values to be assigned to the weights w0, w1, w2 for
obtaining these Boolean functions (a small code sketch follows).
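A minimal sketch of such Boolean-function perceptrons (the AND weights come from the example above; the OR weights are one possible choice, not taken from the notes):

import numpy as np

def perceptron(x, w, w0):
    # Output 1 if w0 + w1*x1 + ... + wn*xn > 0, otherwise -1
    return 1 if w0 + np.dot(w, x) > 0 else -1

inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # Boolean values: 1 = true, -1 = false

# AND: weights from the example above (w0 = -0.8, w1 = w2 = 0.5)
for x in inputs:
    print("AND", x, "->", perceptron(np.array(x), np.array([0.5, 0.5]), -0.8))

# OR: one possible weight setting (an assumption, not given in the notes)
for x in inputs:
    print("OR", x, "->", perceptron(np.array(x), np.array([0.5, 0.5]), 0.3))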
Multi-layer perceptron neural networks

A multi-layer perceptron (MLP) consists of fully connected dense layers that
transform input data from one dimension to another.

It is called “multi-layer” because it contains an input layer, one or
more hidden layers, and an output layer.

The purpose of an MLP is to model complex relationships between
inputs and outputs.
Schematic representation of multi-layer neural network
Backpropagation in MLPs

MLP uses backpropagation for training the network.

Backpropagation is a technique used to optimize the weights of an
MLP by propagating the error at the output back through the network.

In a conventional MLP, random weights are assigned to all the
connections. These random weights propagate values through the
network to produce the actual output. Naturally, this output would
differ from the expected output. The difference between the two values
is called the Error.

Backpropagation refers to the process of sending this error back through
the network, readjusting the weights automatically so that eventually,
the error between the actual and expected output is minimized.

This continuous adjustment of the weights is a supervised learning
process called Backpropagation.

This is repeated until the correct output is produced.
The backpropagation algorithm consists of two parts:
1) Forward Pass
2) Backward Pass
Forward Pass

It computes the output of the neural network for a given input by passing the
input through each layer of the network.
Let’s consider a simple MLP with one hidden layer:

Input Layer: x

Weights and Biases:
• W1: weights for the input layer to hidden layer
• b1: biases for the hidden layer
• W2: weights for the hidden layer to output layer
• b2: biases for the output layer
The operations during the forward pass are as follows:
1) Input to Hidden Layer: The input x is passed to the hidden layer through
weights W1 and biases b1. The pre-activation of the hidden layer is:
z1 = W1x + b1
The activation function f (such as ReLU, sigmoid, or tanh) is applied
element-wise to z1 to produce the hidden layer output:
h = f(z1)
2) Hidden to Output Layer: The hidden layer output h is passed to the output
layer through weights W2 and biases b2. The output y_pred is:
z2 = W2h + b2
The final prediction (after applying an output activation function f_out) is:
y_pred = f_out(z2)
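A minimal NumPy sketch of this forward pass (the layer sizes and the use of sigmoid for both f and f_out are assumptions for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden -> output

x = np.array([0.5, -1.0, 2.0])

z1 = W1 @ x + b1        # input to hidden layer: z1 = W1x + b1
h = sigmoid(z1)         # hidden layer output: h = f(z1)
z2 = W2 @ h + b2        # hidden to output layer: z2 = W2h + b2
y_pred = sigmoid(z2)    # final prediction: y_pred = f_out(z2)
print(y_pred)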
Backward Pass

The partial derivatives of the cost function with respect to the
different parameters are propagated back through the network.

The process continues until the error is at its lowest value.
The technique used to determine how much a weight should be changed
is known as the Gradient Descent method.
The Backpropagation Algorithm

The algorithm starts when an input vector x is entered into the network. This
input moves from the input layer through the hidden layers to the output
layer and produces an output y.

The loss function is used to compare this output with the expected output to
give the error value. The error value is calculated for each of the neurons in
the output layer.

These error values are then propagated backwards from the output layer,
through the network.

Backpropagation uses these error values to calculate the gradient of the loss
function as they move back through the network.

This gradient is then used to update the weights of the nodes. The process is
repeated again to get another output based on the updated weights.

The process of backpropagation is repeated until the error function produces
a minimum value.
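A minimal NumPy sketch of one backpropagation step for the one-hidden-layer network above, assuming sigmoid activations, a squared-error loss, and a plain gradient descent update (none of which the notes fix explicitly):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x, y_true = np.array([0.5, -1.0, 2.0]), np.array([1.0, 0.0])
lr = 0.1  # learning rate (assumed)

# Forward pass, as in the earlier sketch
z1 = W1 @ x + b1; h = sigmoid(z1)
z2 = W2 @ h + b2; y_pred = sigmoid(z2)

# Backward pass: error at the output layer (gradient of 1/2 squared error
# times the sigmoid derivative), then propagated back to the hidden layer
delta2 = (y_pred - y_true) * y_pred * (1 - y_pred)
delta1 = (W2.T @ delta2) * h * (1 - h)

# Gradients of the loss with respect to the weights and biases
dW2, db2 = np.outer(delta2, h), delta2
dW1, db1 = np.outer(delta1, x), delta1

# Gradient descent update: adjust each weight against its gradient
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1

Repeating this forward and backward pass over the training data drives the error toward a minimum, as described above.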
Practical issues in neural network training
1) Overfitting
• It occurs when a model learns the details and noise in the training data to an
extent that it negatively impacts the model’s performance.
Symptoms of Overfitting:
i. Good performance on training data but poor performance on validation or
test data.
ii. Large difference between training error and validation error.
Causes of Overfitting in Neural Networks:
i. Too Complex Model (High Capacity)
ii. Insufficient Training Data
iii. Too Many Training Epochs
iv. Noisy Data
Methods to Prevent Overfitting:
1) Early Stopping: It involves monitoring the model’s performance on a
validation dataset during training and stopping the training process when
performance on the validation data begins to degrade (a short sketch follows this list).
2) Regularization: Regularization methods add a penalty to the loss function used
during training, discouraging the model from fitting the noise in the data.
3) Dropout: It is a regularization technique where, during each training step, a
random subset of neurons is "dropped" or ignored. This forces the model to learn
more robust features and prevents it from relying too heavily on any single neuron.
4) Reducing Model Complexity: Overfitting often occurs when a model is too
complex for the available data. Reducing the number of layers or neurons
(parameters) in the model can help mitigate overfitting.
5) Cross-validation: It is a technique where the training dataset is divided into
several subsets (folds). The model is trained on some of the folds and tested on the
remaining folds, with each fold used as a test set once. The performance is averaged
over all the folds.
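A minimal sketch of early stopping (method 1 above); the model, train_one_epoch, and validation_loss names are hypothetical placeholders, and the model is assumed to expose get_weights/set_weights:

def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    # Stop when validation loss has not improved for `patience` consecutive epochs
    best_loss, best_weights, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)             # hypothetical training step
        val_loss = validation_loss(model)  # hypothetical validation metric
        if val_loss < best_loss:
            best_loss, best_weights = val_loss, model.get_weights()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                      # validation performance has degraded
    if best_weights is not None:
        model.set_weights(best_weights)    # restore the best model seen so far
    return model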
2) Vanishing and Exploding gradient
Vanishing Gradient Problem:

It occurs when the gradients of the loss function become extremely small as
they are propagated back through the layers during training.

This can cause the weights in the earlier layers of the network to update very
slowly.
Causes:

This problem often arises when using activation functions like the sigmoid or
tanh functions. These functions squash their inputs into a small range. When the
input values to these functions are very large or very small, the derivatives of
these functions become very small, leading to vanishing gradients.
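A small illustration of why this happens (assuming sigmoid layers; the numbers are only indicative): the sigmoid derivative is at most 0.25, so the product of many such derivatives during backpropagation shrinks toward zero:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # The derivative of the sigmoid, expressed in terms of its output
    s = sigmoid(z)
    return s * (1 - s)

# Gradient factor contributed by each of 10 stacked sigmoid layers at z = 0,
# where the derivative is largest (0.25); the product still shrinks rapidly
factors = [sigmoid_derivative(0.0) for _ in range(10)]
print(np.prod(factors))   # about 9.5e-07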
Exploding Gradient Problem:

It occurs when the gradients become excessively large during backpropagation,
causing the network weights to become very large.

This makes the optimization process unstable, and the model may fail to
converge.
Causes:

Deep Network Architecture

Large Initial Weights

Activation Functions with Large Derivatives
Loss function

A loss function measures how well the neural network's predictions match
the true outputs. It quantifies the error or discrepancy between the predicted
output and the actual target.

The goal of training a neural network is to minimize this error by adjusting
the model’s parameters (weights and biases).
Types of Loss Functions:
1) Mean Squared Error (MSE) Loss: One of the most commonly used loss
functions in machine learning, particularly for regression tasks where the
goal is to predict continuous values.
Formula:
MSE = (1/N) Σi (yi − y^i)²
Where,

yi: Actual value

y^i: Predicted value

N: Number of data points
2) Cross-Entropy Loss: Used for classification problems, especially binary
or multi-class classification.
Formula:
Cross-Entropy = −[ y log(y^) + (1 − y) log(1 − y^) ]
Where,

y is the true label (0 or 1) and y^ is the predicted probability that the label is 1.
3) Hinge Loss: Used for Support vector machines (SVM) and binary
classification.
Formula:
Hinge Loss = max(0, 1 − y · y^)
Where,

y is the true label (either +1 or -1), and y^ is the predicted value.
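A minimal NumPy sketch of these three loss functions (the library choice and example values are illustrative assumptions):

import numpy as np

def mse_loss(y_true, y_pred):
    # Mean squared error: average of (yi - y^i)^2 over all data points
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred):
    # y_true in {0, 1}; y_pred is the predicted probability that the label is 1
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def hinge_loss(y_true, y_pred):
    # y_true in {+1, -1}; y_pred is the raw predicted value
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

print(mse_loss(np.array([1.0, 2.0]), np.array([1.5, 1.0])))
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
print(hinge_loss(np.array([1.0, -1.0]), np.array([0.8, 0.3])))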
Risk Minimization

Risk minimization refers to the process of minimizing the expected loss or risk
over all possible data points and distributions. The goal is to find the best model
parameters (weights and biases) such that the model’s predictions are as accurate
as possible on unseen data.
1) True Risk: The expected loss over the entire distribution of data, expressed as:
R(f) = E_(x,y)~P(x,y) [ L(f(x), y) ]
where:

P(x,y) is the joint probability distribution of inputs x and outputs y,

f(x) is the predicted output, and

L(f(x),y) is the loss function.
2) Empirical Risk: The average loss over the finite training dataset D =
{(xi, yi)}, expressed as:
R_emp(f) = (1/N) Σi L(f(xi), yi)

where N is the number of data points in the training set.
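A minimal sketch of computing empirical risk as the average loss over a training set (the linear model f and the squared-error loss are illustrative assumptions):

import numpy as np

def empirical_risk(f, loss, dataset):
    # Average of L(f(xi), yi) over the finite training set D = {(xi, yi)}
    return np.mean([loss(f(x), y) for x, y in dataset])

# Illustrative assumptions: a simple linear model and a squared-error loss
f = lambda x: 2.0 * x + 1.0
squared_error = lambda y_pred, y: (y_pred - y) ** 2

D = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2)]
print(empirical_risk(f, squared_error, D))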
