Neural Networks


NEURAL NETWORKS AND DEEP LEARNING IT1701

UNIT 1: INTRODUCTION
Neural Networks-Application Scope of Neural Networks-Artificial Neural Network: An
Introduction – Perceptron Learning Algorithm - Activation Functions – Need for non-
linear activation functions – Chain Rule and Backpropagation – Deep Neural Networks
– Shallow vs Deep Networks

Neural Networks:
In the fast-evolving era of artificial intelligence, Deep Learning stands as a cornerstone
technology, revolutionizing how machines understand, learn, and interact with complex
data. At its essence, Deep Learning AI mimics the intricate neural networks of the human
brain, enabling computers to autonomously discover patterns and make decisions from
vast amounts of unstructured data. This transformative field has propelled breakthroughs
across various domains, from computer vision and natural language processing to
healthcare diagnostics and autonomous driving.
What is Deep Learning?
Deep learning is the branch of machine learning that is based on artificial neural network architectures. An artificial neural network (ANN) uses layers of interconnected nodes, called neurons, that work together to process and learn from the input data.
In a fully connected Deep neural network, there is an input layer and one or more hidden
layers connected one after the other. Each neuron receives input from the previous layer
neurons or the input layer. The output of one neuron becomes the input to other neurons
in the next layer of the network, and this process continues until the final layer produces
the output of the network. The layers of the neural network transform the input data
through a series of nonlinear transformations, allowing the network to learn complex
representations of the input data.

Fig 1: Deep Learning network


Deep learning can be used for supervised, unsupervised, as well as reinforcement machine learning, and it uses different approaches for each.
Supervised Machine Learning: Supervised machine learning is the machine learning technique in which the neural network learns to make predictions or classify data based on labeled datasets. Here we provide both the input features and the target variables. The neural network learns from the cost, or error, that comes from the difference between the predicted and the actual target; the process of propagating this error back through the network to update its weights is known as backpropagation.
Unsupervised Machine Learning: Unsupervised machine learning is the machine learning technique in which the neural network learns to discover patterns or to cluster the dataset from unlabeled data. Here there are no target variables; the machine has to self-determine the hidden patterns or relationships within the dataset.
Reinforcement Machine Learning: Reinforcement Machine Learning is the machine
learning technique in which an agent learns to make decisions in an environment to
maximize a reward signal. The agent interacts with the environment by taking action and
observing the resulting rewards.
Neural Networks are computational models that mimic the complex functions of the human
brain. The neural networks consist of interconnected nodes or neurons that process and
learn from data, enabling tasks such as pattern recognition and decision making in machine
learning. This section explores neural networks in more detail: how they work, their architecture, and more.
Evolution of Neural Networks
Since the 1940s, there have been a number of noteworthy advancements in the field of
neural networks:
1940s-1950s: Early Concepts
Neural networks began with the introduction of the first mathematical model of artificial
neurons by McCulloch and Pitts. But computational constraints made progress difficult.
1960s-1970s: Perceptron
This era is defined by Rosenblatt's work on the perceptron. Perceptrons are single-layer networks whose applicability was limited to problems that are linearly separable.
1980s: Backpropagation and Connectionism
Multi-layer network training was made possible by Rumelhart, Hinton, and Williams’
invention of the backpropagation method. With its emphasis on learning through
interconnected nodes, connectionism gained appeal.
1990s: Boom and Winter
With applications in image identification, finance, and other fields, neural networks saw a
boom. Neural network research did, however, experience a “winter” due to exorbitant
computational costs and inflated expectations.
2000s: Resurgence and Deep Learning
Larger datasets, innovative structures, and enhanced processing capability spurred a comeback. Deep learning has shown amazing effectiveness in a number of disciplines by utilizing numerous layers.
2010s-Present: Deep Learning Dominance
Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), two deep
learning architectures, dominated machine learning. Their power was demonstrated by
innovations in gaming, picture recognition, and natural language processing.
What are Neural Networks?
Neural networks extract identifying features from data without any pre-programmed understanding. Network components include neurons, connections, weights, biases, propagation functions, and a learning rule. Neurons receive inputs governed by thresholds and activation functions. Connections involve weights and biases that regulate information transfer. Learning, i.e., adjusting weights and biases, occurs in three stages: input computation, output generation, and iterative refinement that enhances the network's proficiency in diverse tasks.
These stages are:
1. The neural network is stimulated by an environment.
2. The free parameters of the neural network are changed as a result of this stimulation.
3. The neural network then responds in a new way to the environment because of the changes in its free parameters.

Fig 2: Neural Networks with Activation Functions


Working of a Neural Network

Neural networks are complex systems that mimic some features of the functioning of the human brain. A network is composed of an input layer, one or more hidden layers, and an output layer, all made up of coupled artificial neurons. The two stages of the basic process are forward propagation and backpropagation.

Fig 3: Neural Networks with Layers


Forward Propagation
 Input Layer: Each feature in the input layer is represented by a node on the
network, which receives input data.
 Weights and Connections: The weight of each neuronal connection indicates
how strong the connection is. Throughout training, these weights are changed.
 Hidden Layers: Each hidden layer neuron processes inputs by multiplying them
by weights, adding them up, and then passing them through an activation
function. By doing this, non-linearity is introduced, enabling the network to
recognize intricate patterns.
 Output: The result is produced by repeating this process layer by layer until the output layer is reached (a minimal sketch of this pass follows the list).
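
The following is a minimal NumPy sketch of this forward pass; the layer sizes, sigmoid activation, and random weights are illustrative assumptions rather than values from the text:

import numpy as np

def sigmoid(z):
    # squashes any real-valued input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # One forward pass through a fully connected network.
    # Each W has shape (n_out, n_in); each b has shape (n_out,).
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b      # weighted sum of inputs plus bias
        a = sigmoid(z)     # non-linear activation
    return a

# Toy network: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
biases = [np.zeros(4), np.zeros(1)]
print(forward(np.array([0.5, -1.0, 2.0]), weights, biases))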
Backpropagation
 Loss Calculation: The network's output is evaluated against the true target values, and a loss function is used to compute the difference. For a regression problem, the Mean Squared Error (MSE) is commonly used as the cost function.

Loss Function: $MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
 Gradient Descent: Gradient descent is then used by the network to reduce the loss. To lower the error, the weights are changed based on the derivative of the loss with respect to each weight.
 Adjusting Weights: Starting from the output layer, these updates are applied backward across the network at each connection; this iterative procedure is backpropagation.
 Training: During training with different data samples, the entire process of forward propagation, loss calculation, and backpropagation is repeated, enabling the network to adapt and learn patterns from the data (a minimal sketch follows this list).
 Activation Functions: Non-linearity is introduced by activation functions such as the rectified linear unit (ReLU) or sigmoid; whether a neuron “fires” depends on its total weighted input.
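
To make these steps concrete, here is a minimal sketch of the full loop (forward pass, MSE loss, gradient computation, weight update) for a single neuron with an identity activation; the data and learning rate are made up for illustration:

import numpy as np

# Toy data generated by the rule y = 2x + 1 (illustrative)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b = 0.0, 0.0    # initial parameters
lr = 0.05          # learning rate

for epoch in range(500):
    y_pred = w * x + b                   # forward propagation
    loss = np.mean((y - y_pred) ** 2)    # loss calculation (MSE)
    dw = np.mean(-2 * (y - y_pred) * x)  # dLoss/dw
    db = np.mean(-2 * (y - y_pred))      # dLoss/db
    w -= lr * dw                         # gradient descent update
    b -= lr * db

print(w, b)   # converges toward w = 2, b = 1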

Types of Neural Networks

There are five commonly used types (minimal code sketches of a few follow the list):
 Feedforward Networks: A feedforward neural network is a simple artificial
neural network architecture in which data moves from input to output in a single
direction. It has input, hidden, and output layers; feedback loops are absent. Its
straightforward architecture makes it appropriate for a number of applications,
such as regression and pattern recognition.
 Multilayer Perceptron (MLP): MLP is a type of feedforward neural network
with three or more layers, including an input layer, one or more hidden layers,
and an output layer. It uses nonlinear activation functions.
 Convolutional Neural Network (CNN): A Convolutional Neural
Network (CNN) is a specialized artificial neural network designed for image
processing. It employs convolutional layers to automatically learn hierarchical
features from input images, enabling effective image recognition and
classification. CNNs have revolutionized computer vision and are pivotal in tasks
like object detection and image analysis.
 Recurrent Neural Network (RNN): An artificial neural network type intended for sequential data processing is called a Recurrent Neural Network (RNN). It is appropriate for applications where contextual dependencies are critical, such as time series prediction and natural language processing, since it makes use of feedback loops, which enable information to persist within the network.
 Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to
overcome the vanishing gradient problem in training RNNs. It uses memory cells
and gates to selectively read, write, and erase information.
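
For concreteness, here are minimal PyTorch sketches of three of these architectures (assuming PyTorch is available; the layer sizes are arbitrary illustrations):

import torch
from torch import nn

# Feedforward MLP: 10 inputs -> one hidden layer -> 2 outputs
mlp = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# CNN block for 1-channel images (e.g., 28x28)
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2), nn.Flatten(), nn.Linear(8 * 14 * 14, 10),
)

# LSTM over sequences of 16-dimensional vectors
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

print(mlp(torch.randn(1, 10)).shape)          # torch.Size([1, 2])
print(cnn(torch.randn(1, 1, 28, 28)).shape)   # torch.Size([1, 10])
out, (h, c) = lstm(torch.randn(1, 5, 16))
print(out.shape)                              # torch.Size([1, 5, 32])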
Advantages of Neural Networks
Neural networks are widely used in many different applications because of their many
benefits:
 Adaptability: Neural networks can adapt to new situations and learn from data, making them useful for tasks where the relationship between inputs and outputs is complex or not well defined.
 Pattern Recognition: Their proficiency in pattern recognition makes them effective in tasks such as audio and image identification, natural language processing, and other intricate data patterns.
 Parallel Processing: Because neural networks are capable of parallel
processing by nature, they can process numerous jobs at once, which speeds up
and improves the efficiency of computations.
 Non-Linearity: Neural networks are able to model and comprehend complicated relationships in data by virtue of the non-linear activation functions found in neurons, which overcome the drawbacks of linear models.
Disadvantages of Neural Networks

Neural networks, while powerful, are not without drawbacks and difficulties:
 Computational Intensity: Training large neural networks can be a slow, computationally intensive process that requires substantial computing power.
 Black box Nature: As “black box” models, neural networks pose a problem in
important applications since it is difficult to understand how they make
decisions.
 Overfitting: Overfitting is a phenomenon in which neural networks commit
training material to memory rather than identifying patterns in the data.
Although regularization approaches help to alleviate this, the problem still
exists.
 Need for Large datasets: For efficient training, neural networks frequently
need sizable, labeled datasets; otherwise, their performance may suffer from
incomplete or skewed data.

Application Scope of Neural Networks:


 Air traffic control could be automated: the location, elevation, direction, and speed of each radar blip could be taken as input to the network, and the output would be the air traffic controller’s instruction in response to each blip.
 Animal behavior, their relationships, and population cycles may be apt for
analysis using neural networks.
 Evaluation and valuation of property, buildings, automobiles, machinery, etc. could be streamlined with the help of a neural network.
 Wagering on horse races, stock markets, sporting events, etc. could be analyzed
with the help of neural network predictions.
 Criminal sentencing could be anticipated using a large sample of crime details as
input and the resulting sentences as output.
 Complex physical processes that revolve around various mathematical formulas could be modeled heuristically using a neural network.
 Data mining, cleaning, and validation could be achieved efficiently if we find out
which records suspiciously diverge from the pattern of their peers. This could
be done with the help of a neural network.
 Direct mail advertisers could use neural network analysis of their databases to
decide which customers ought to be focused on, and avoid wasting money on
improbable targets.
 Weather prediction may be possible using a neural network. Inputs would
include weather reports from surrounding areas. Output(s) could be the future
weather in specific areas based on the input information.
 Examination of medical issues is an ideal application for neural networks.
 Research in medical fields relies heavily on classical insights to dissect research
data. Therefore, a neural network should be included in the researcher’s tool kit.

1. Computer vision:
The first deep learning application area is computer vision, where deep learning models enable machines to identify and understand visual data. Some of the main applications of deep learning in computer vision include:
 Object detection and recognition: Deep learning models can be used to identify and locate objects within images and videos, enabling applications such as self-driving cars, surveillance, and robotics.
 Image classification: Deep learning models can be used to classify images into
categories such as animals, plants, and buildings. This is used in applications
such as medical imaging, quality control, and image retrieval.
 Image segmentation: Deep learning models can be used for image
segmentation into different regions, making it possible to identify specific
features within images.
2. Natural language processing (NLP):
The second deep learning application area is NLP, where deep learning models enable machines to understand and generate human language. Some of the main applications of deep learning in NLP include:
 Automatic Text Generation – Deep learning models can learn from a corpus of text, and new text such as summaries and essays can be automatically generated using these trained models.
 Language translation: Deep learning models can translate text from one
language to another, making it possible to communicate with people from
different linguistic backgrounds.
 Sentiment analysis: Deep learning models can analyze the sentiment of a piece
of text, making it possible to determine whether the text is positive, negative, or
neutral. This is used in applications such as customer service, social media
monitoring, and political analysis.
 Speech recognition: Deep learning models can recognize and transcribe
spoken words, making it possible to perform tasks such as speech-to-text
conversion, voice search, and voice-controlled devices.
3. Reinforcement learning:
In reinforcement learning, deep learning is used to train agents that take actions in an environment to maximize a reward. Some of the main applications of deep learning in reinforcement learning include:
 Game playing: Deep reinforcement learning models have been able to beat
human experts at games such as Go, Chess, and Atari.
 Robotics: Deep reinforcement learning models can be used to train robots to
perform complex tasks such as grasping objects, navigation, and manipulation.
 Control systems: Deep reinforcement learning models can be used to control
complex systems such as power grids, traffic management, and supply chain
optimization.

Perceptron Learning Algorithm:

The Perceptron Learning Algorithm is a fundamental algorithm in machine learning, particularly in the context of supervised learning for binary classification tasks. It is a type of linear classifier, i.e., it makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

Concept

 Binary Classification: The perceptron aims to classify input data into one of two
possible classes.
 Linear Separability: The algorithm works best when the two classes are linearly
separable, meaning a straight line (or hyperplane in higher dimensions) can separate
the data points of the two classes.

Components

 Weights (w): The algorithm maintains a set of weights, one for each feature in the
input data.
 Bias (b): An additional parameter that allows the decision boundary to shift away from the origin.
 Activation Function: A step function that outputs 1 if the weighted sum of the inputs
is greater than or equal to a threshold (usually 0), and -1 otherwise.

Algorithm Steps

1. Initialization: Initialize the weights and bias to small random values (often zero).
2. For each training example (x, y):
o Compute the output using the current weights and bias: $y_{pred} = \mathrm{sign}(w \cdot x + b)$
o If there is a misclassification, i.e., if $y_{true} \neq y_{pred}$, update the weights and bias:
 $w := w + \Delta w$, where $\Delta w = \eta \cdot (y_{true} \cdot x)$
 $b := b + \Delta b$, where $\Delta b = \eta \cdot y_{true}$
o Here, $\eta$ is the learning rate, a positive constant that determines the step size.

Pseudocode

A runnable Python version of the update loop (NumPy is used for the dot product):

import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    # initialize weights w and bias b (zeros work for the perceptron)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):           # stopping condition: epoch limit
        errors = 0
        for x_i, y_i in zip(X, y):        # for each (x, y) in the training set
            y_pred = 1 if np.dot(w, x_i) + b >= 0 else -1
            if y_i != y_pred:             # misclassification: apply the update
                w = w + lr * y_i * x_i    # w := w + eta * y * x
                b = b + lr * y_i          # b := b + eta * y
                errors += 1
        if errors == 0:                   # converged on the training set
            break
    return w, b

Key Points

 Convergence: If the data is linearly separable, the Perceptron Learning Algorithm will converge to a solution that correctly classifies the training examples.
 Limitations: If the data is not linearly separable, the algorithm will not converge and will keep updating the weights indefinitely.

Example

Let's consider a simple example with two features and binary class labels:

Training data:

 (1, 1) with label 1
 (2, 2) with label 1
 (-1, -1) with label -1
 (-2, -2) with label -1

Starting with initial weights (0, 0) and bias 0, and a learning rate of 1, the algorithm updates
the weights and bias based on misclassifications until it finds a suitable separating
hyperplane.
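
As a sketch of how this plays out, the train_perceptron function from the Pseudocode section above can be run on this data (the printed solution is one of many valid separating hyperplanes):

import numpy as np

# Training data from the example
X = np.array([[1, 1], [2, 2], [-1, -1], [-2, -2]], dtype=float)
y = np.array([1, 1, -1, -1])

w, b = train_perceptron(X, y, lr=1.0)  # defined in the Pseudocode section
print(w, b)   # e.g. w = [1. 1.], b = -1.0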

The Perceptron Learning Algorithm is fundamental to understanding more advanced algorithms in machine learning and neural networks. It laid the groundwork for the development of multilayer perceptrons and deep learning models.

Activation Functions:

Fig 4: Activation Functions

Formula for the net input to a neuron: $y_{in} = \sum_{i} x_i w_i + b$
The value of the net input can be anything from $-\infty$ to $+\infty$. The neuron does not know how to bound this value and thus cannot decide on a firing pattern. The activation function is therefore an important part of an artificial neural network: it decides whether a neuron should be activated, bounding the value of the net input. The activation function is a non-linear transformation applied to the input before sending it to the next layer of neurons or finalizing it as output.
A threshold activation function (or simply the activation function, also known as a squashing function) produces an output signal only when the input signal exceeds a specific threshold value. It is similar in behavior to the biological neuron, which transmits a signal only when the total input signal meets the firing threshold.

Types of Activation Function:


There are different types of activation functions. The most commonly used activation functions are listed below:
A. Identity Function: The identity function is used as an activation function for the input layer. It is a linear function having the form $f(x) = x$.
B. Threshold/Step Function: It is a commonly used activation function. As depicted in the diagram, it gives 1 as output if the input is either 0 or positive, and 0 if the input is negative. Expressing it mathematically:

$f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$

Fig 5: Threshold/Step Function


C. ReLU (Rectified Linear Unit) Function: It is the most popularly used activation function in the areas of convolutional neural networks and deep learning. It is of the form:

$f(x) = \max(0, x)$

Fig 6: ReLU Function

This means that f(x) is zero when x is less than zero and f(x) is equal to x when x is greater than or equal to zero. The function is differentiable everywhere except at the single point x = 0; in that sense, the derivative of ReLU at that point is actually a subderivative.
D. Sigmoid Function: It is one of the most widely used activation functions in neural networks. The need for the sigmoid function stems from the fact that many learning algorithms require the activation function to be differentiable and hence continuous.
Fig 7: Binary Sigmoid Function

A binary sigmoid function is of the form

$f(x) = \frac{1}{1 + e^{-kx}}$

where k is the steepness or slope parameter. By varying the value of k, sigmoid functions with different slopes can be obtained. It has a range of (0, 1). The slope at the origin is k/4. As the value of k becomes very large, the sigmoid function becomes a threshold function.
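
A minimal NumPy sketch of these four activation functions (the test inputs and the value of k are illustrative):

import numpy as np

def identity(x):
    return x                              # A. linear, used at the input layer

def step(x):
    return np.where(x >= 0, 1.0, 0.0)     # B. threshold at 0

def relu(x):
    return np.maximum(0.0, x)             # C. zero for x < 0, x otherwise

def binary_sigmoid(x, k=1.0):
    return 1.0 / (1.0 + np.exp(-k * x))   # D. range (0, 1); slope k/4 at origin

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (identity, step, relu, binary_sigmoid):
    print(f.__name__, f(x))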

Need for Non-Linear Activation Functions:

Introducing Non-linearity: Real-world data and relationships are often non-linear. Without
non-linear activation functions, a neural network with multiple layers would behave like a
single-layer network. Non-linear activation functions allow the network to model complex
relationships.
Enabling Deep Learning: Deep learning relies on the ability of the network to learn multiple
levels of abstraction. Non-linear activation functions help the network to build these
abstractions by combining inputs in a non-linear manner.
Gradient Propagation: Non-linear activation functions help in the backpropagation
process. They allow gradients to flow through the network, which is crucial for updating
weights and learning.
Modeling Complex Functions: Non-linear activation functions enable neural networks to
approximate any continuous function. This universality is fundamental to the power and
flexibility of neural networks.
Avoiding Simple Linear Transformations: Without non-linear activation functions, any number of layers in a neural network would essentially collapse into a single linear transformation, limiting the network's ability to model complex functions.
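
A small sketch demonstrating this last point: two stacked linear layers with no activation between them compute exactly the same function as one linear layer (the matrices are random, for illustration only):

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(3)

# Two linear layers with no activation in between
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
two_layer = W2 @ (W1 @ x)

# The equivalent single linear layer
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layer, one_layer))   # True: depth collapsed away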

Chain Rule and Backpropagation:

The chain rule is a fundamental concept in calculus used to find the derivative of a composite function. If you have two functions, $f$ and $g$, and you want to differentiate their composition $h(x) = f(g(x))$, the chain rule states that:

$h'(x) = f'(g(x)) \cdot g'(x)$

Backpropagation is an algorithm used to train artificial neural networks. It works by adjusting the weights of the network to minimize the error between the predicted output and the actual output. The process involves the following steps:

1. Forward Pass: Compute the predicted output by passing the input through the
network.
2. Compute Loss: Calculate the loss (or error) using a loss function, such as mean
squared error (MSE) or cross-entropy loss.
3. Backward Pass:
o Compute Gradients: Use the chain rule to compute the gradient of the loss
function with respect to each weight in the network. This is where the chain
rule plays a crucial role.
o Update Weights: Adjust the weights by subtracting a fraction of the gradient
(learning rate) from the current weights.

Example of Backpropagation Using Chain Rule

Consider a simple neural network with one input layer, one hidden layer, and one output layer.
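The original worked example did not survive formatting, so here is a minimal sketch under assumed definitions: a single input x, one hidden neuron with a sigmoid activation, one linear output neuron, and a squared-error loss; all numeric values are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y_true = 0.5, 1.0    # one training example (assumed values)
w1, b1 = 0.4, 0.1       # input -> hidden parameters
w2, b2 = 0.3, -0.2      # hidden -> output parameters

# Forward pass
z1 = w1 * x + b1              # hidden pre-activation
a1 = sigmoid(z1)              # hidden activation
y_pred = w2 * a1 + b2         # output (identity activation)
loss = (y_pred - y_true) ** 2

# Backward pass: chain rule, factor by factor
dL_dy = 2 * (y_pred - y_true)    # dL/dy_pred
da1_dz1 = a1 * (1 - a1)          # sigmoid derivative

grad_w2 = dL_dy * a1                   # dL/dw2
grad_b2 = dL_dy                        # dL/db2
grad_w1 = dL_dy * w2 * da1_dz1 * x     # dL/dw1 = dL/dy * dy/da1 * da1/dz1 * dz1/dw1
grad_b1 = dL_dy * w2 * da1_dz1         # dL/db1

# One gradient descent step
lr = 0.1
w1 -= lr * grad_w1; b1 -= lr * grad_b1
w2 -= lr * grad_w2; b2 -= lr * grad_b2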

Deep Neural Networks (DNNs)

Deep Neural Networks (DNNs) are a type of artificial neural network (ANN) with multiple
hidden layers between the input and output layers. These networks are designed to model
complex patterns and relationships in data. They are called "deep" because of the depth
(number of layers) in the network.

Fig: Structure of Deep Neural Networks and Artificial Neural Networks

Components of a Deep Neural Network

1. Input Layer: This layer receives the input data. Each neuron in this layer represents
a feature from the input data.
2. Hidden Layers: These are intermediate layers between the input and output layers.
DNNs have multiple hidden layers, allowing them to learn complex representations.
Each neuron in a hidden layer receives input from neurons in the previous layer and
sends output to neurons in the next layer.
3. Output Layer: This layer produces the final output of the network. The number of
neurons in this layer depends on the task (e.g., one neuron for binary classification,
multiple neurons for multi-class classification).
4. Weights and Biases: Each connection between neurons has an associated weight,
and each neuron has a bias. These parameters are adjusted during training to
minimize the error in the network's predictions.
5. Activation Functions: These functions introduce non-linearity into the network,
allowing it to learn more complex patterns. Common activation functions include
ReLU (Rectified Linear Unit), Sigmoid, and Tanh.

Training Deep Neural Networks

Training a DNN involves adjusting the weights and biases to minimize the error between the
predicted output and the actual output. This is typically done using the backpropagation
algorithm combined with an optimization technique such as gradient descent.

1. Forward Pass: Input data is passed through the network, and the output is
computed.
2. Loss Calculation: The loss function measures the difference between the predicted
output and the actual output. Common loss functions include Mean Squared Error
(MSE) for regression and Cross-Entropy Loss for classification.

3. Backward Pass: The gradients of the loss function with respect to the weights and
biases are computed using the chain rule. This step involves backpropagation.
4. Weight Update: The weights and biases are updated using an optimization
algorithm, such as stochastic gradient descent (SGD) or Adam. The learning rate
controls the size of the updates.
5. Iterative Process: The forward and backward passes are repeated for many
iterations (epochs) until the network's performance converges.
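
The loop above can be written compactly with a framework; here is a minimal sketch using PyTorch (assuming it is installed), where the architecture, synthetic data, and hyperparameters are illustrative choices, not values prescribed by the text:

import torch
from torch import nn

# Synthetic regression data (illustrative): target is the sum of the features
X = torch.randn(256, 4)
y = X.sum(dim=1, keepdim=True)

# A small DNN: 4 inputs -> two hidden layers -> 1 output
model = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(200):          # 5. iterative process (epochs)
    y_pred = model(X)             # 1. forward pass
    loss = loss_fn(y_pred, y)     # 2. loss calculation
    optimizer.zero_grad()
    loss.backward()               # 3. backward pass (chain rule)
    optimizer.step()              # 4. weight update

print(loss.item())                # approaches 0 as training converges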
Shallow Network and Deep Network:

A shallow network typically has only one hidden layer between the input and output layers, whereas a deep network has two or more hidden layers. A shallow network with enough hidden units can, in principle, approximate a wide range of functions, but deep networks often represent complex functions more compactly, since each successive layer builds higher-level features from the outputs of the layer below.
