Unit I
Definition: Neural networks in machine learning are inspired by the structure and function of the
human brain. Both systems consist of interconnected units (neurons in the brain, nodes in ANNs)
that process information. (Haykin).
● Dendrites: These are branching extensions of the neuron that receive signals from other
neurons. They act as input channels, collecting electrical impulses and transmitting them
to the cell body.
● Cell Body (Soma): This is the central part of the neuron that contains the nucleus and organelles. It processes incoming signals and integrates them to determine if an action potential should be generated.
● Axon: A long, slender projection that transmits electrical impulses away from the cell
body to other neurons or muscles.
● Axon Terminals: The endpoints of the axon where neurotransmitters are released to
communicate with other neurons across synapses.
● Synapses: Junctions between neurons where chemical signals (neurotransmitters) are exchanged. (Freeman & Skapura)
Function
Example: When a sensory neuron detects a stimulus, it sends an electrical signal through its
axon to the brain, where the signal is processed and interpreted.
● Inputs: These are numerical values (features) fed into the neuron from the previous layer
in the network.
● Weights: Each input is associated with a weight that adjusts the input's influence on the
neuron's output. Weights are updated during training to minimize error.
● Summation Function: The weighted inputs are summed together, often with a bias term
added, to calculate the neuron's net input.
● Activation Function: This function applies a non-linear transformation to the net input
to produce the neuron's output. Common activation functions include sigmoid, tanh, and
ReLU. (Haykin)
Function
● Artificial neurons perform a mathematical operation on their inputs. They use weights to
scale the inputs, sum them, and then apply an activation function to produce an output.
This output is passed to the next layer in the network or as the final result.
Example: In a neural network for image classification, an artificial neuron might process pixel
values of an image, apply weights to these values, and use an activation function to determine
whether a specific feature (e.g., an edge) is present.
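A minimal sketch of this computation in Python (the input values, weights, and choice of sigmoid are illustrative assumptions, not values from the text):

import math

def sigmoid(z):
    # Squashes the net input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    # Summation function: weighted sum of inputs plus a bias term.
    net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: non-linear transform of the net input.
    return sigmoid(net_input)

# Three pixel values scaled by (hypothetical) learned weights.
print(neuron_output([0.2, 0.7, 0.1], [0.5, -0.3, 0.8], bias=0.1))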
● Basic Operation: Both biological and artificial neurons process information and send sig-
nals to other neurons or layers.
● Information Integration: In both systems, inputs are integrated to produce an output. Biological neurons integrate electrical signals, while artificial neurons integrate numerical values.
Differences
● Complexity: Biological neurons are vastly more complex, involving biochemical interactions, varying neurotransmitters, and intricate signaling pathways. Artificial neurons are simplified models focusing on mathematical operations.
● Learning Mechanism: Biological neurons use synaptic plasticity and chemical signals for learning, while artificial neurons use algorithms like backpropagation and gradient descent to adjust weights and improve performance.
● Communication: Biological neurons communicate through chemical signals at synapses,
whereas artificial neurons communicate through numerical values passed between layers.
While a biological neuron adapts based on experience and changes in synaptic strength, an artificial neuron adjusts its weights during training to minimize prediction error.
Neural network architectures refer to the various structures and configurations of neural networks designed to solve specific types of problems. Each architecture is optimized for particular tasks such as image recognition, natural language processing, or time-series prediction. (Haykin)
A single-layer perceptron can solve linearly separable problems like basic binary classification
tasks. For instance, classifying points on a 2D plane into two categories using a linear boundary.
Limitations
● Explanation: Single-layered networks are limited to solving only linearly separable problems.
They cannot capture complex patterns or relationships in the data. (Haykin)
A single-layer perceptron struggles with the XOR problem, where the decision boundary is not
linear.
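A short Python sketch makes the limitation concrete: a single perceptron with hand-picked weights (illustrative, not from the text) realizes AND, which is linearly separable, but no single linear boundary can realize XOR.

def perceptron(x1, x2, w1, w2, bias):
    # Fires (outputs 1) only when the weighted sum crosses zero.
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

# AND is linearly separable: one line isolates (1, 1).
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, perceptron(a, b, w1=1.0, w2=1.0, bias=-1.5))

# XOR is not: no choice of w1, w2, bias outputs 1 for (0, 1)
# and (1, 0) while outputting 0 for (0, 0) and (1, 1).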
Multi-Layered Neural Networks
An MLP with multiple hidden layers can handle more complex tasks like image recognition or
function approximation. For example, predicting house prices based on various features such as
size, location, and number of rooms.
Training
● Explanation: MLPs are trained using algorithms like backpropagation, which adjusts the weights
based on the error between predicted and actual outputs. This process involves propagating the
error backwards through the network to update the weights. (Haykin)
Training an MLP to recognize handwritten digits using the MNIST dataset involves adjusting
weights to minimize the classification error.
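A compact NumPy sketch of this training loop on the XOR task (the network size, learning rate, and iteration count are illustrative assumptions; convergence can vary with initialization):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Forward pass: compute predictions with the current weights.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error back to get gradients.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent step on every weight and bias.
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # should approach the XOR targets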
Fully Connected Networks
● Explanation: Fully connected networks, also known as dense networks, are characterized by every neuron in one layer being connected to every neuron in the next layer. This structure allows the network to learn a rich representation of the data. (Haykin)
Fully connected layers are often used in the final stages of convolutional networks to combine
features learned by convolutional and pooling layers into a final classification output.
Applications
● Explanation: Fully connected networks are used in various applications, including feature extraction, classification tasks, and regression. They are a fundamental component of many deep learning models. (Haykin)
In a deep learning model for speech recognition, fully connected layers may process features extracted by convolutional layers to make predictions about spoken words.
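In matrix form, a fully connected layer is one matrix multiplication plus a bias; a minimal sketch (the shapes and ReLU choice are illustrative assumptions):

import numpy as np

def dense(x, W, b):
    # Every input unit connects to every output unit through W.
    return np.maximum(0, x @ W + b)  # ReLU activation

rng = np.random.default_rng(0)
features = rng.normal(size=128)           # e.g. flattened CNN features
W = rng.normal(size=(128, 10)); b = np.zeros(10)
scores = dense(features, W, b)            # one score per class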
● Explanation: Recurrent Neural Networks (RNNs) are designed to handle sequential data by incorporating cycles in the network architecture. This allows them to maintain a form of memory of previous inputs, making them suitable for tasks involving time-series or sequential data. (Haykin)
RNNs are used in natural language processing for tasks such as language modeling and text generation, where the context of previous words affects the prediction of the next word.
Variants of RNNs
● Long Short-Term Memory (LSTM): An RNN variant designed to overcome the vanishing gradient problem by using gates to control the flow of information and maintain long-term dependencies. (Haykin)
● Gated Recurrent Unit (GRU): A simplified variant of LSTM with fewer gates but similar capabilities in managing long-term dependencies. (Haykin)
LSTMs are used for machine translation, where understanding the context of an entire sentence is essential for translating to another language.
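The recurrence at the heart of an RNN is compact; a sketch of one vanilla RNN step in its standard textbook form (the dimensions here are illustrative):

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The new hidden state mixes the current input with the previous
    # state; carrying h forward is what gives the network its memory.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 5)); W_hh = rng.normal(size=(5, 5))
h = np.zeros(5)                       # initial hidden state
for x_t in rng.normal(size=(10, 3)):  # a length-10 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, np.zeros(5))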
● Explanation: Neural networks are structured in layers. Each layer consists of multiple nodes, and connections between nodes form directed edges. In a feedforward network, edges go from input nodes to hidden nodes and then to output nodes. (Haykin)
● Example: In a Convolutional Neural Network (CNN), the directed graph shows how input images pass through convolutional layers, pooling layers, and fully connected layers.
● Explanation: In RNNs, the directed graph can contain cycles because neurons can
have connections to themselves or previous layers. This cyclical structure allows
the network to maintain a memory of previous inputs. (Haykin)
● Example: In an RNN used for time-series prediction, the directed graph includes
cycles that represent the feedback of past information to the network.
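The graph view can be made concrete with an adjacency list; a toy sketch (the layer names are hypothetical):

# Feedforward edges form an acyclic graph; the RNN's self-connection
# on the hidden layer introduces a cycle.
feedforward = {"input": ["hidden"], "hidden": ["output"], "output": []}
recurrent = {"input": ["hidden"],
             "hidden": ["hidden", "output"],  # hidden -> hidden cycle
             "output": []}

def has_cycle(graph):
    # A node that can reach itself lies on a cycle.
    def reachable(start, node, seen):
        return any(nxt == start or
                   (nxt not in seen and reachable(start, nxt, seen | {nxt}))
                   for nxt in graph[node])
    return any(reachable(n, n, set()) for n in graph)

print(has_cycle(feedforward), has_cycle(recurrent))  # False True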
Clarity in Architecture
Complexity Management
Efficient Computation
● Explanation: Directed graphs facilitate the use of algorithms for efficient computation, such as forward propagation, backpropagation, and optimization techniques. (Haykin)
● Example: Backpropagation algorithms use the directed graph structure to compute
gradients and update weights efficiently.
Fig: Neural network as directed graph
Key Concepts
Learning Rule
● Explanation: The learning rule defines how weights are updated based on the error. A common rule used is the Delta Rule, which adjusts weights proportionally to the error term and the input. (Haykin)
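In symbols, the Delta Rule is commonly written as Δwi = η (t − y) xi, where η is the learning rate, t the target output, y the actual output, and xi the i-th input; the new weight is wi + Δwi.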
Backpropagation
1. Forward Pass: Compute the output of the network using the current
weights.
2. Error Calculation: Calculate the difference between the predicted
output and the target output.
3. Backward Pass: Compute the gradient of the error with respect to
each weight and update the weights accordingly.
Forward Pass
● Explanation: During the forward pass, the input data is fed into the network, and the network computes the output based on the current weights and activation functions. (Haykin)
● Example: For an image classification network, the forward pass involves
passing pixel values through layers of the network to produce class scores.
Error Computation
Backward Pass
NUMERICAL PROBLEM:
In a single-layer neural network, you are working with a perceptron that receives an input of 0.5.
After performing a forward pass, the network produces an output of 0.54. You aim to train the
network to achieve a target output of 0.8. With a learning rate set to 0.1, and an initial weight of
0.4, calculate the following:
1. Determine the change in weight based on the given learning rate and the difference between the desired and actual outputs. Also, find the new weight after applying this update.
2. If the actual output were adjusted to 0.6 instead of 0.54, how would this affect the weight
update? Calculate the new weight in this scenario as well.
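A short Python check of both parts using the Delta Rule Δw = η (t − y) x described above:

def delta_rule(w, x, target, output, lr):
    # The weight change is proportional to the error and the input.
    dw = lr * (target - output) * x
    return dw, w + dw

# Part 1: dw = 0.1 * (0.8 - 0.54) * 0.5 = 0.013, new weight 0.413
print(delta_rule(w=0.4, x=0.5, target=0.8, output=0.54, lr=0.1))

# Part 2: dw = 0.1 * (0.8 - 0.6) * 0.5 = 0.010, new weight 0.410
print(delta_rule(w=0.4, x=0.5, target=0.8, output=0.6, lr=0.1))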
Examples
Perceptron Learning
Memory-Based Learning
Introduction
Definition
Key Concepts
Instance-Based Learning
Detailed Process
Storage of Instances
Decision Making
Hebbian Learning
Introduction
Definition
Key Concepts
Hebb’s Rule
● Explanation: Hebb’s Rule states that the change in the synaptic weight Δwij between two neurons i and j is proportional to the product of their activations.
Detailed Process
Updating Weights
● Explanation: Weights are updated during each learning iteration based on current activations, continuing until the network’s connections reflect input patterns.
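A minimal sketch of the Hebbian update Δwij = η xi yj (the learning rate and activation values are illustrative assumptions):

import numpy as np

def hebbian_update(W, x, y, lr=0.1):
    # Strengthen wij in proportion to the product of the presynaptic
    # activation xi and the postsynaptic activation yj.
    return W + lr * np.outer(x, y)

W = np.zeros((3, 2))
x = np.array([1.0, 0.5, 0.0])  # presynaptic activations
y = np.array([0.8, 0.2])       # postsynaptic activations
W = hebbian_update(W, x, y)    # co-active pairs grow stronger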
Examples
Applications
Neural Networks
Introduction
● Explanation: In the Competitive Learning process, the neuron with the highest
similarity to the input pattern (or lowest distance) is selected as the winner.
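A winner-take-all sketch of this selection-and-update step (the Euclidean distance metric and update rule are the standard competitive learning form, assumed here):

import numpy as np

def competitive_step(W, x, lr=0.1):
    # The neuron whose weight vector is closest to the input wins...
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    # ...and only the winner moves its weights toward the input.
    W[winner] += lr * (x - W[winner])
    return winner

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))          # 4 competing neurons, 2-D inputs
for x in rng.normal(size=(100, 2)):  # a stream of input patterns
    competitive_step(W, x)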
Applications
Learning Rule
Applications
Introduction
Definition
Key Concepts
Probability Distributions
Bias-Variance Tradeoff
Here, complexity can be measured by the number of parameters in the model. More complex models may fit the training data better but risk overfitting, while simpler models might underfit.
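For squared-error loss, this tradeoff is often summarized by the standard decomposition (not stated explicitly above): Expected test error = Bias² + Variance + Irreducible error. Increasing complexity typically lowers bias but raises variance, so the best model balances the two.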