cst414 - Deep Learning
σ(x) = 1 / (1 + e^(-x))
Where:
• σ(x) is the output of the sigmoid function.
• x is the weighted sum of inputs to a neuron.
• e is the base of the natural logarithm (~2.718).
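As a quick illustration, here is a minimal NumPy sketch of the sigmoid formula above (the function name is mine, not from the course notes):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1.
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```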
2) Tanh (Hyperbolic Tangent)
• It is similar to the Sigmoid function, but with a key difference: it maps input values to
a range between -1 and 1, rather than between 0 and 1.
Tanh Function Formula:
tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))
Where,
• tanh(x) is the output of the Tanh function.
• x is the weighted sum of inputs to a neuron.
• e is the base of the natural logarithm (~2.718).
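A similar sketch for Tanh, written directly from the formula above (NumPy also provides np.tanh, which is the more numerically robust choice for large inputs):

```python
import numpy as np

def tanh(x):
    """Tanh activation: maps any real value into the range (-1, 1)."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# np.tanh(x) gives the same values.
print(tanh(np.array([-2.0, 0.0, 2.0])))  # ~[-0.964, 0.0, 0.964]
```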
3) ReLU (Rectified Linear Unit)
● One of the most commonly used activation functions in modern deep learning models.
● It is simple, computationally efficient, and helps address some of the limitations of older activation functions like Sigmoid and Tanh.
● ReLU sets all negative input values to zero and leaves positive input values unchanged: f(x) = max(0, x).
● This means that if the input x is greater than or equal to 0, the output is x. If the input is less than 0, the output is 0 (see the sketch below).
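A minimal sketch of ReLU, following the max(0, x) rule described above:

```python
import numpy as np

def relu(x):
    """ReLU activation: zero for negative inputs, identity for non-negative inputs."""
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.0, 0.0, 2.5]
```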
4) Softmax activation function
● It is widely used in the output layer of neural networks, particularly for multi-class classification problems.
● It converts raw network outputs (called logits) into a probability distribution across multiple classes.
● The main goal of the softmax function is to ensure that the output values are interpretable as probabilities, meaning that they are all between 0 and 1 and sum to 1.
● Given an input vector z = [z1, z2, z3, ..., zn], where n represents the number of classes and zi represents the raw score (logit) for class i, the Softmax function for each class is defined as:
softmax(zi) = e^(zi) / Σ e^(zj),  summed over j = 1, ..., n
where,
● e^(zi) is the exponential of the i-th logit.
● The denominator is the sum of the exponentials of all logits.
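A minimal NumPy sketch of this definition (subtracting the maximum logit before exponentiating is a standard numerical-stability step, not part of the formula itself):

```python
import numpy as np

def softmax(z):
    """Softmax: converts a vector of logits into probabilities that sum to 1."""
    z = z - np.max(z)           # subtract the max logit for numerical stability
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())       # ~[0.659, 0.242, 0.099], sums to 1.0
```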
Types of Activation Functions
(Comparison table: Activation Function | Range | Common Use Case | Key Advantage | Key Disadvantage)
Loss Functions
1) Mean Squared Error (MSE): Used for regression problems.
Formula:
MSE = (1/N) Σ (yi − ŷi)²
Where,
● yi: Actual value
● ŷi: Predicted value
● N: Number of data points
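A minimal sketch of the MSE formula, assuming y_true and y_pred hold the actual and predicted values:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean Squared Error: average squared difference between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

print(mse_loss(np.array([3.0, 5.0, 2.0]), np.array([2.5, 5.0, 4.0])))  # ~1.417
```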
2) Cross-Entropy Loss: Used for classification problems, especially binary
or multi-class classification.
Formula:
L(y, ŷ) = −[ y·log(ŷ) + (1 − y)·log(1 − ŷ) ]
Where,
● y is the true label (0 or 1) and ŷ is the predicted probability that the label is 1.
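A minimal sketch of binary cross-entropy as defined above (the eps clipping is an added guard against log(0), not part of the formula):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy loss, averaged over examples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep predictions away from exactly 0 or 1
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.1, 0.8])))  # ~0.145
```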
3) Hinge Loss: Used for Support Vector Machines (SVMs) and binary classification.
Formula:
L(y, ŷ) = max(0, 1 − y·ŷ)
Where,
● y is the true label (either +1 or -1), and ŷ is the predicted value.
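A minimal sketch of the hinge loss, assuming y holds +1/-1 labels and the predictions are raw model scores:

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Hinge loss: zero when y_true * y_pred >= 1, grows linearly otherwise."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

print(hinge_loss(np.array([1, -1, 1]), np.array([0.8, -2.0, -0.5])))  # ~0.567
```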
Risk Minimization
● Risk minimization refers to the process of minimizing the expected loss or risk over all possible data points and distributions. The goal is to find the best model parameters (weights and biases) such that the model’s predictions are as accurate as possible on unseen data.
1) True Risk: The expected loss over the entire distribution of data, expressed as:
R(f) = E[L(f(x), y)] = ∫ L(f(x), y) dP(x, y)
where:
● P(x, y) is the joint probability distribution of inputs x and outputs y,
● f(x) is the predicted output, and
● L(f(x), y) is the loss function.
2) Empirical Risk: The average loss over the finite training dataset D = {(xi, yi)}, i = 1, ..., N, expressed as:
R_emp(f) = (1/N) Σ L(f(xi), yi)
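A minimal sketch of computing empirical risk over a toy dataset (the linear model, squared-error loss, and data below are illustrative assumptions, not from the notes):

```python
import numpy as np

def empirical_risk(model, loss_fn, X, y):
    """Empirical risk: average of the loss over the finite training set."""
    predictions = np.array([model(x) for x in X])
    return np.mean(loss_fn(y, predictions))

# Toy example: a linear model f(x) = 2x with squared-error loss.
model = lambda x: 2.0 * x
loss_fn = lambda y_true, y_pred: (y_true - y_pred) ** 2
X = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
print(empirical_risk(model, loss_fn, X, y))  # ~0.02
```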