ML Unit-5
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites in a biological neural network correspond to inputs in an artificial neural network, the cell
nucleus corresponds to nodes, synapses correspond to weights, and the axon corresponds to the output.
The human brain contains roughly 86 billion neurons (often rounded to 100 billion). Each neuron has
somewhere between 1,000 and 100,000 connection points to other neurons. In the human brain, data is
stored in a distributed manner, and we can retrieve more than one piece of this data from memory in
parallel when necessary. In this sense, the human brain behaves like a collection of massively parallel processors.
We can understand the artificial neural network with an example. Consider a digital logic gate that takes
inputs and gives an output, such as an "OR" gate, which takes two inputs. If one or both inputs are "On,"
the output is "On." If both inputs are "Off," the output is "Off." Here the output depends only on the
input, and the relationship is fixed. Our brain does not work this way: the relationship between outputs
and inputs keeps changing because the neurons in our brain are "learning."
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations needed to find
hidden features and patterns.
Output Layer:
The input goes through a series of transformations in the hidden layer, which finally results in the output
that is conveyed by this layer.
The artificial neural network takes the inputs, computes their weighted sum, and adds a bias.
This computation is represented in the form of a transfer function.
The resulting weighted total is passed as an input to an activation function to produce the output.
Activation functions decide whether a node should fire or not. Only the nodes that fire pass their signal
on to the output layer. Different activation functions are available, and the choice depends on the sort of
task we are performing.
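The computation described above (weighted sum, bias, then activation) can be sketched for a single node as follows. This is a minimal illustration, not a reference implementation: the function names, the step activation, and the hand-picked weights and bias that make the node behave like the OR gate discussed earlier are all assumptions made for the example.

```python
def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


def step(z):
    """Fire (1) when the weighted total is positive, otherwise stay off (0)."""
    return 1 if z > 0 else 0


# Hypothetical values chosen by hand (not learned) so the node acts like an OR gate.
OR_WEIGHTS = [1.0, 1.0]
OR_BIAS = -0.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron_output([a, b], OR_WEIGHTS, OR_BIAS, step))
```

In a trained network the weights and bias are not set by hand; they are adjusted from examples.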
Artificial neural networks work on numerical values and can perform more than one task simultaneously.
Unlike in traditional programming, where data is stored in a database, information in an ANN is stored
across the whole network. The loss of a few pieces of information in one place doesn't prevent the network
from working. After training, an ANN may produce output even from incomplete data. The loss of
performance here depends on the significance of the missing data.
For an ANN to be able to adapt, it is important to choose the examples and to train the network
toward the desired output by presenting these examples to it. The success of the network depends
directly on the chosen instances, and if the problem cannot be shown to the network in all
its aspects, the network can produce false output.
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature
makes the network fault-tolerant.
There is no particular guideline for determining the structure of artificial neural networks. The appropriate
network structure is accomplished through experience, trial, and error.
Unrecognized behavior of the network:
It is the most significant issue of ANNs. When an ANN produces a solution, it does not provide insight
into why and how it arrived at it. This decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, as dictated by their structure.
The realization of the network therefore depends on the available hardware.
ANNs work with numerical data. Problems must be converted into numerical values before being
introduced to an ANN. The representation chosen here directly affects the performance
of the network, and it relies on the user's abilities.
Afterward, each input is multiplied by its corresponding weight (these weights are the details
the artificial neural network uses to solve a specific problem). In general terms, the weights
represent the strength of the interconnections between neurons inside the artificial neural
network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is equal to zero, a bias is added to make the output non-zero, or more generally to
shift the system's response. The bias can be treated as an extra input that is fixed at 1 and has its own
weight. The total of the weighted inputs can range from negative to positive infinity. To keep the
response within the limits of the desired value, a certain maximum value is set as a benchmark, and the
total of weighted inputs is passed through the activation function.
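The computation described in the last two paragraphs can be written compactly as below. The symbols are assumptions introduced for illustration and do not appear in the original notes: $x_i$ are the inputs, $w_i$ the weights, $b$ the bias, $f$ the activation function, and $y$ the output.

```latex
\[
z = \sum_{i=1}^{n} w_i x_i + b, \qquad y = f(z)
\]
% Equivalent form: treat the bias as the weight of an extra input fixed at 1.
\[
z = \sum_{i=0}^{n} w_i x_i \quad \text{with } x_0 = 1,\; w_0 = b
\]
% Worked example with made-up numbers: both inputs are zero, so the weighted sum
% vanishes and the bias alone determines z.
\[
x = (0, 0),\; w = (0.4, 0.6),\; b = 0.2
\;\Longrightarrow\;
z = 0.4 \cdot 0 + 0.6 \cdot 0 + 0.2 = 0.2
\]
```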
The activation function refers to the set of transfer functions used to achieve the desired output. There
are different kinds of activation functions, but they are primarily either linear or non-linear. Some of
the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal
functions. Let us take a look at them in detail:
Binary:
In the binary activation function, the output is either 1 or 0. To accomplish this, a threshold value is
set. If the net weighted input of the neuron is more than the threshold, the final output of the activation
function is 1; otherwise the output is 0.
Sigmoidal Hyperbolic:
The sigmoidal hyperbolic function is generally seen as an "S"-shaped curve. Here the tan hyperbolic
function is used to approximate the output from the actual net input. The function is defined as:
$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
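The three activation functions named above can be sketched in a few lines. This is an illustrative sketch only: the function names, the `threshold` default of 0, and the `slope` parameter of the linear function are assumptions, not values taken from these notes.

```python
import math


def binary_step(z, threshold=0.0):
    """Return 1 when the net input exceeds the threshold, otherwise 0."""
    return 1 if z > threshold else 0


def linear(z, slope=1.0):
    """Pass the net input through, scaled by a constant slope."""
    return slope * z


def tanh_sigmoid(z):
    """S-shaped curve that squashes any real input into the range (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 0.7, 2.0):
    print(z, binary_step(z, threshold=0.5), linear(z), round(tanh_sigmoid(z), 4))
```

The binary function discards how far the input is from the threshold, while the tan hyperbolic function keeps that information and stays differentiable, which matters for gradient-based training.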
Working of ANN:
Forward Propagation:
Error Calculation:
· The difference between the predicted output and the actual output is calculated using a loss
function (e.g., Mean Squared Error).
Backward Propagation:
Learning:
· The network iteratively updates weights to minimize the error, eventually learning the underlying
data patterns.
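The error-calculation step can be illustrated with the Mean Squared Error mentioned above. The sample values and the function name are made up for the example.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical actual outputs and network predictions.
actual = [1.0, 0.0, 1.0, 1.0]
predicted = [0.9, 0.2, 0.8, 0.4]

loss = mean_squared_error(actual, predicted)
print(loss)
```

A loss of zero would mean every prediction matches its target exactly; training tries to push the loss toward that value.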
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at
least one layer of neurons in between. By assessing its output against its input, the strength of the
network can be observed from the group behavior of the connected neurons, and the output is decided. The
primary advantage of this network is that it learns to evaluate and recognize input patterns.
1. Supervised Learning
· The network is trained using labeled data (input-output pairs).
· The objective is to minimize the error between predicted outputs and actual outputs.
· Steps:
1. Forward propagation: Pass input data through the network to get predictions.
2. Error calculation: Compute the loss (e.g., Mean Squared Error or Cross-Entropy).
3. Backpropagation: Adjust weights and biases to reduce the error.
· Examples:
· Image classification
· Spam email detection
2. Unsupervised Learning
· The network is trained on data without labels.
· The goal is to discover hidden patterns or structures in the data.
· Common approaches:
· Clustering (e.g., K-Means, SOMs)
· Dimensionality reduction (e.g., PCA, Autoencoders)
· Examples:
· Customer segmentation
· Anomaly detection
3. Reinforcement Learning
· The network learns by interacting with an environment and receiving feedback in the form of
rewards or penalties.
· Objective: Maximize cumulative rewards by learning optimal actions.
· Examples:
· Game playing (e.g., AlphaGo)
· Robot navigation
Learning and Adaption
·
· Similar to the Perceptron rule, but used with continuous outputs and differentiable activation
functions.
· Application: Linear regression, simple neural networks.
·
· E: Error
· Application: Multi-layer Perceptrons (MLPs), deep learning models.
·
· r: Reward signal
· Application: Dynamic decision-making, robotics.
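The first rule described above (continuous outputs, differentiable activation, applied to linear regression) matches what is commonly called the delta or Widrow-Hoff rule. That name is an inference, since the rule names and formulas are missing from these notes. Under that assumption, a minimal sketch for a single linear neuron looks like this; the data, learning rate, and function name are illustrative.

```python
def train_delta_rule(samples, learning_rate=0.05, epochs=1000):
    """Fit y = w * x + b with one linear neuron using error-driven updates."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = w * x + b                # linear activation: output is the weighted sum
            error = target - output           # continuous error, not just right/wrong
            w += learning_rate * error * x    # change is proportional to error and input
            b += learning_rate * error
    return w, b


# Hypothetical noise-free samples drawn from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_delta_rule(data)
print(round(w, 4), round(b, 4))
```

Because the samples lie exactly on a line, the learned weight and bias approach the slope and intercept of that line.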
2. Based on Architecture
a. Feedforward Networks
· Data flows in one direction, from input to output.
· No loops or cycles in the network.
· Examples:
· Single-Layer Perceptron
· Multi-Layer Perceptron (MLP)
b. Recurrent Networks
· Allow cycles or loops, enabling the network to retain memory of previous states.
· Suitable for sequential or temporal data.
· Examples:
· Recurrent Neural Network (RNN)
· Long Short-Term Memory (LSTM)
· Gated Recurrent Units (GRU)
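How a loop lets a network retain memory of previous states can be seen in a one-unit sketch. This is a simplified scalar recurrence with hand-picked weights (`w_x`, `w_h`, `b` are assumptions), not an implementation of LSTM or GRU.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent update: mix the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=1.0, w_h=0.5, b=0.0):
    """Feed a sequence through the unit one element at a time, starting from state 0."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# A single pulse followed by zeros: the state stays non-zero after the pulse has gone.
states = run_rnn([1.0, 0.0, 0.0])
print([round(s, 4) for s in states])
```

The later states are driven only by the feedback term `w_h * h_prev`, which is the "memory" a feedforward network lacks.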
c. Convolutional Networks
· Specialized for processing grid-like data, such as images.
· Employ convolutional layers for feature extraction.
· Examples:
· Convolutional Neural Network (CNN)
· Fully Convolutional Networks (FCN)
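Feature extraction by a convolutional layer can be sketched as a small kernel sliding over a grid. The sketch below assumes no padding and a stride of 1, and, as is usual in deep learning libraries, it does not flip the kernel. The image and the edge-detecting kernel are made-up values.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only in the column where the dark-to-bright edge sits. In a CNN the kernel values are learned rather than chosen by hand.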
d. Modular Networks
· Composed of multiple smaller networks working together.
· Each module addresses a specific part of the problem.
· Examples:
· Mixture of Experts
· Ensembles of Networks
3. Based on Processing Mode
a. Static Networks
· Input-output mapping is fixed after training.
· No time dependency or temporal context.
· Examples:
· Feedforward Neural Networks
b. Dynamic Networks
· Incorporate temporal context or sequences in their processing.
· Examples:
· Recurrent Neural Networks (RNN)
· Echo State Networks
b. Discrete-Valued Networks
· Process categorical or binary data.
· Examples:
· Classification Networks
b. Regression Networks
· Predict continuous values based on input data.
· Examples:
· Linear Regression Networks
· Bayesian Neural Networks
c. Clustering Networks
· Group similar data points together.
· Examples:
· Self-Organizing Maps (SOM)
· k-Means with NN
d. Generative Networks
· Create new data samples from learned distributions.
· Examples:
· Generative Adversarial Networks (GANs)
· Variational Autoencoders (VAEs)
6. Based on Application
a. Vision Systems
· Examples: CNNs for object detection, image recognition.
d. Control Systems
· Examples: Reinforcement learning networks in robotics.
Output Layer:
· Computes the final output by summing the weighted inputs and applying an activation function.
Activation Function:
Functioning of a Single-Layer NN
Forward Pass:
·
Prediction:
·
· E: Error (e.g., Mean Squared Error).
Limitations of Single-Layer NN
Linear Separability:
· Can solve only linearly separable problems (e.g., AND, OR logic gates).
· Fails for non-linear problems (e.g., XOR).
Applications
· Simple binary classification tasks.
· Pattern recognition with linearly separable data.
· Logical operations like AND, OR.
Example: Perceptron
A perceptron is a classic single-layer neural network:
XOR Problem:
· Challenge: Single-layer networks cannot solve XOR because it is not linearly separable.
· Solution: Introduce multi-layer networks to address non-linear problems.
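The contrast between a linearly separable problem and XOR can be checked with a small perceptron sketch. The learning rate, epoch limit, and function names are assumptions made for illustration.

```python
def predict(weights, bias, inputs):
    """Step activation on the weighted sum of the inputs plus the bias."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Perceptron rule: adjust weights only when a sample is misclassified."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:          # a full pass with no errors: training has converged
            return weights, bias, True
    return weights, bias, False    # gave up after max_epochs


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_bias, and_converged = train_perceptron(AND_DATA)
_, _, xor_converged = train_perceptron(XOR_DATA)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```

Training on AND reaches an error-free pass, while training on XOR never does, because no single straight line separates the XOR classes.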
Applications:
· Image and Speech Recognition
· Natural Language Processing
· Autonomous Systems
· Medical Diagnostics
· Financial Forecasting
· Single-layer networks like the Perceptron can only solve linearly separable problems.
· Real-world problems often involve non-linear relationships, requiring more complex models.
· Multi-layer networks include one or more hidden layers between the input and output layers.
· These networks can approximate non-linear functions, enabling the solution of complex tasks such
as image recognition and language processing.
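That one hidden layer is enough for XOR can be shown with a tiny network whose weights are set by hand. The weights and biases below are illustrative choices, not learned values, and the function names are hypothetical.

```python
def step(z):
    """Fire (1) when the net input is positive, otherwise 0."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    """Two hidden neurons (OR and AND) feeding one output neuron."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)    # fires when at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only when both inputs are 1
    # Output fires when OR is on and AND is off, which is exactly XOR.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each hidden neuron draws one straight line; the output neuron combines the two regions, which no single-layer network can do.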
Need for Efficient Training:
Activation Functions:
Loss Function:
Forward Propagation:
· Compute outputs from input through hidden layers to the output layer.
Error Calculation:
Backward Propagation:
· Propagate the error back through the network using the chain rule of calculus to compute gradients
of the loss with respect to weights.
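The chain-rule step can be written out for one hidden layer. The derivation below is a standard textbook form under stated assumptions, not necessarily the exact notation of the original notes: a squared-error loss, inputs $x_i$, hidden outputs $y_j = f(z_j)$, network outputs $o_k = f(z_k)$, targets $t_k$, weights $w_{ij}$ and $w_{jk}$, and learning rate $\eta$.

```latex
\[
E = \tfrac{1}{2} \sum_k (t_k - o_k)^2, \qquad
z_k = \sum_j w_{jk}\, y_j + b_k, \qquad o_k = f(z_k)
\]
% Output layer: apply the chain rule through o_k and z_k.
\[
\frac{\partial E}{\partial w_{jk}}
= \frac{\partial E}{\partial o_k}\,
  \frac{\partial o_k}{\partial z_k}\,
  \frac{\partial z_k}{\partial w_{jk}}
= -(t_k - o_k)\, f'(z_k)\, y_j
= -\delta_k\, y_j,
\qquad \delta_k = (t_k - o_k)\, f'(z_k)
\]
% Hidden layer: the error signal is the weighted sum of the output-layer signals.
\[
\delta_j = f'(z_j) \sum_k w_{jk}\, \delta_k,
\qquad
\frac{\partial E}{\partial w_{ij}} = -\delta_j\, x_i
\]
% Gradient-descent updates.
\[
w_{jk} \leftarrow w_{jk} + \eta\, \delta_k\, y_j,
\qquad
w_{ij} \leftarrow w_{ij} + \eta\, \delta_j\, x_i
\]
```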
Weight Updates:
Historical Context
Development:
Impact:
Advantages
· Handles multi-layer architectures.
· Effective for non-linear problems.
· Widely applicable in classification, regression, and function approximation tasks.
Limitations
· Computationally expensive for deep networks.
· Sensitive to hyperparameters like learning rate.
· Prone to issues like vanishing gradients in deep networks.
Back-Propagation Learning
Back-propagation learning is a supervised learning algorithm used in training multi-layer neural
networks, such as Multi-Layer Perceptrons (MLPs). The method systematically adjusts the weights of the
network to minimize the error between the predicted and actual outputs.
· Minimize the error or loss function by updating the weights of the network through gradient
descent.
Error Signal:
· The error is propagated backward from the output layer to the input layer to compute gradients
efficiently.
Learning Process:
· The process involves two main phases: forward pass and backward pass.
· Randomly initialize the weights and biases of the network with small values.
Forward Pass:
· Pass the input data through the network to compute the output:
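· $z_j = \sum_i w_{ij} x_i + b_j, \quad y_j = f(z_j)$ (standard form; $x_i$, $w_{ij}$, and $b_j$ denote
the inputs, weights, and bias of neuron $j$)
· $z_j$: Weighted sum of inputs.
· $y_j$: Output of the neuron after applying the activation function $f$.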
· Compare the predicted output with the target output using a loss function:
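· $E = \frac{1}{2} \sum_k (t_k - o_k)^2$ (squared-error form, assumed here; other loss functions can be used)
· $E$: Error or loss.
· $t_k$: Target output.
· $o_k$: Predicted output.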
Backward Pass:
· Compute the gradients of the loss function with respect to weights using the chain rule.
· Propagate the error backward from the output layer to the input layer.
·
Update Weights and Biases:
· Repeat the process over all training examples (one full pass is an epoch) until the error converges or
a stopping criterion is met.
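The steps above (initialize, forward pass, error, backward pass, update, repeat) can be put together in a small pure-Python sketch. It assumes a sigmoid activation, a squared-error loss, one hidden layer, and per-example updates; the layer size, learning rate, epoch count, and seed are arbitrary illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: hidden activations, then a single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def total_loss(data, params):
    """Sum of 0.5 * (target - output)^2 over the data set."""
    return sum(0.5 * (t - forward(x, *params)[1]) ** 2 for x, t in data)


def train(data, n_hidden=3, learning_rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # Randomly initialize weights and biases with small values.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = rng.uniform(-0.5, 0.5)
    initial = total_loss(data, (w_hidden, b_hidden, w_out, b_out))
    for _ in range(epochs):
        for x, t in data:
            hidden, o = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Backward pass: error signals via the chain rule (sigmoid' = s * (1 - s)).
            delta_out = (t - o) * o * (1.0 - o)
            delta_hidden = [h * (1.0 - h) * w_out[j] * delta_out
                            for j, h in enumerate(hidden)]
            # Update output-layer weights and bias.
            for j, h in enumerate(hidden):
                w_out[j] += learning_rate * delta_out * h
            b_out += learning_rate * delta_out
            # Update hidden-layer weights and biases.
            for j in range(n_hidden):
                for i, xi in enumerate(x):
                    w_hidden[j][i] += learning_rate * delta_hidden[j] * xi
                b_hidden[j] += learning_rate * delta_hidden[j]
    params = (w_hidden, b_hidden, w_out, b_out)
    return params, initial, total_loss(data, params)


XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
params, initial_loss, final_loss = train(XOR_DATA)
print("loss before:", round(initial_loss, 4), "loss after:", round(final_loss, 4))
```

How closely the trained outputs match XOR depends on the initial weights and the hyperparameters, so the sketch reports the loss before and after training rather than assuming a perfect fit.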
Iterative:
Gradient-Based Optimization:
· Uses gradient descent or its variants (e.g., Stochastic Gradient Descent, Adam).
Activation Function:
Advantages
· Efficient for training multi-layer neural networks.
· Applicable to non-linear and complex problems.
· Provides a systematic way to update weights.
Limitations
· Vanishing Gradient Problem:
· Gradients can become very small in deep networks, slowing convergence.
· Overfitting:
· The network may memorize training data without generalizing well to unseen data.
· Computational Cost:
· Training can be slow for large datasets or deep networks.
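The vanishing gradient problem can be made concrete with a rough calculation. The sketch assumes a sigmoid activation and, as a deliberate simplification, the same net input and the same weight at every layer, so that the backpropagated error is scaled by the same factor at each step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Slope of the sigmoid; its largest value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_scale(depth, z=0.0, weight=1.0):
    """Factor picked up by an error signal passed back through `depth` identical layers."""
    return (weight * sigmoid_derivative(z)) ** depth


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in this best case for the sigmoid slope, each extra layer multiplies the signal by 0.25, so the earliest layers of a deep network receive very small updates.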
Applications
· Image recognition (e.g., handwriting recognition).
· Natural language processing (e.g., sentiment analysis).
· Time-series prediction (e.g., stock price forecasting).
Difference Between Artificial Neural Network (ANN) and Biological
Neural Network (BNN)
Artificial Neural Networks (ANNs) and Biological Neural Networks (BNNs) are both inspired by the
functioning of the human brain but differ significantly in their structure, function, and operation. Below is
a comparison of the two:
Learning:
Adaptability:
Fault Tolerance: