SCT UNIT-2


UNIT-2(ANN)

Concept of Artificial Neural Networks:


The term "artificial neural network" refers to a biologically inspired sub-field of
artificial intelligence modeled after the brain. An artificial neural network is a
computational network based on the biological neural networks that make up the structure of
the human brain. Just as the human brain has neurons interconnected with each other,
artificial neural networks also have neurons that are linked to each other in the various
layers of the network. These neurons are known as nodes. The given figure illustrates the
typical diagram of a biological neural network.

Dendrites in a biological neural network correspond to the inputs of an artificial neural
network, the cell nucleus corresponds to the nodes, synapses correspond to the weights, and the
axon corresponds to the output. The human brain contains on the order of 100 billion neurons,
and each neuron has somewhere between about 1,000 and 100,000 connection points. In the human
brain, data is stored in a distributed manner, and we can retrieve more than one piece of this
data from memory in parallel when necessary. We can say that the human brain is made up of
incredibly powerful parallel processors. We can understand the artificial neural network with
an example. Consider a digital logic gate that takes inputs and gives an output, such as the
"OR" gate, which takes two inputs. If one or both of the inputs are "On," the output is "On."
If both inputs are "Off," the output is "Off." Here the output depends only on the input,
through a fixed rule. Our brain does not work this way: the relationship between inputs and
outputs keeps changing because the neurons in our brain are "learning."

An artificial neural network is a technique in the field of artificial intelligence that attempts to
mimic the network of neurons that makes up the human brain, so that computers can understand
things and make decisions in a human-like manner. An artificial neural network is built by
programming computers to behave like interconnected brain cells.

To understand the architecture of an artificial neural network, we first have to understand
what a neural network consists of. A neural network consists of a large number of artificial
neurons, termed units, arranged in a sequence of layers. Let us look at the various types of
layers available in an artificial neural network. An artificial neural network primarily
consists of three layers:
Input Layer: As the name suggests, it accepts inputs in several different formats provided by
the programmer.

Hidden Layer: The hidden layer lies between the input and output layers. It performs all
the calculations needed to find hidden features and patterns.

Output Layer: The input goes through a series of transformations using the hidden layer,
which finally results in an output that is conveyed by this layer.

Basic mathematical model:

The artificial neural network takes the inputs, computes their weighted sum, and adds a
bias. This computation is represented in the form of a transfer function:

$y_{in} = \sum_{i=1}^{n} w_i x_i + b$

and the output is calculated by applying the activation function $f$ over this net input:

$y = f(y_{in})$

In other words, the weighted total is passed as input to an activation function, which
produces the output. The activation function decides whether a node should fire or not, and
only nodes that fire pass their signal on toward the output layer. Different activation
functions are available, and the one to apply depends on the task being performed.
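The computation above can be sketched in a few lines of Python. This is a minimal illustration that assumes a binary step activation with threshold 0. The helper names and the numeric weights, bias and inputs are hypothetical and are not taken from the notes.

```python
def net_input(x, w, b):
    """Weighted sum of the inputs plus the bias: sum_i w_i * x_i + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def step(v, threshold=0.0):
    """Binary step activation: the node 'fires' (outputs 1) once v reaches the threshold."""
    return 1 if v >= threshold else 0

def neuron(x, w, b):
    """Compute the net input, then pass it through the activation function."""
    return step(net_input(x, w, b))

# Hypothetical values: two inputs, two weights, one bias.
x = [1.0, 0.5]
w = [0.4, -0.2]
b = -0.1
print(net_input(x, w, b))  # about 0.2 (up to floating-point rounding)
print(neuron(x, w, b))     # 1, because the net input is above the threshold
```

Lowering the bias to -1.0 pushes the net input below zero, and the same node no longer fires.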

Working of ANN: An artificial neural network can best be represented as a weighted directed
graph, where the artificial neurons form the nodes. The connections between neuron outputs
and neuron inputs can be viewed as directed edges with weights. The artificial neural network
receives the input signal from an external source in the form of a pattern or an image,
represented as a vector. These inputs are denoted mathematically as x(n) for each of the n
inputs. Each input is then multiplied by its corresponding weight (these weights are the
information the artificial neural network uses to solve a specific problem). In general terms,
the weights represent the strength of the interconnections between neurons inside the
artificial neural network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is zero, a bias is added to make the output non-zero, or more generally to
shift and scale the system's response. The bias can be treated as an extra weight whose input
is always fixed at 1. The total of the weighted inputs can take any value from negative
infinity to positive infinity. To keep the response within the limits of the desired value, a
threshold value is set and the weighted total is passed through the activation function. The
activation function refers to the set of transfer functions used to achieve the desired output.
There are many kinds of activation function, but they are primarily either linear or non-linear.
Some of the commonly used activation functions are the binary (step), linear, and hyperbolic
tangent (tanh) sigmoidal activation functions.
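The following sketch shows two ideas from the paragraph above: the bias handled as an extra weight on a constant input of 1, and the three named activation families. The function names are hypothetical. The binary function here assumes a threshold of 0.

```python
import math

def binary(v):
    """Binary (step) activation: 1 if v >= 0, else 0."""
    return 1.0 if v >= 0 else 0.0

def linear(v):
    """Linear (identity) activation: the output equals the net input."""
    return v

def tanh_sigmoid(v):
    """Hyperbolic tangent sigmoid: squashes any real v into the range (-1, 1)."""
    return math.tanh(v)

def net_with_bias_weight(x, w, bias):
    """Treat the bias as an extra weight attached to a constant input of 1."""
    xs = [1.0] + list(x)
    ws = [bias] + list(w)
    return sum(wi * xi for wi, xi in zip(ws, xs))

v = net_with_bias_weight([2, 3], [0.5, 1], bias=-1)   # -1*1 + 0.5*2 + 1*3 = 3.0
for f in (binary, linear, tanh_sigmoid):
    print(f.__name__, f(v), f(-v))
```

Only the linear function lets the response grow without bound. The binary and tanh functions keep it within fixed limits.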

McCulloch-Pitts Model of Neuron:


The McCulloch-Pitts neural model, the earliest ANN model, has only two types of
inputs: excitatory and inhibitory. Excitatory inputs have weights of positive
magnitude and inhibitory inputs have weights of negative magnitude. The inputs of the
McCulloch-Pitts neuron can be either 0 or 1. It uses a threshold function as its activation
function, so the output signal yout is 1 if the input ysum is greater than or equal to a given
threshold value, and 0 otherwise. The diagrammatic representation of the model is as follows:
Simple McCulloch-Pitts neurons can be used to design logical operations. For that purpose,
the connection weights and the threshold value of the activation function need to be chosen
correctly. To understand this better, consider an example. John carries an umbrella if it is
sunny or if it is raining. There are four possible situations, and we need to decide in which
of them John will carry the umbrella. The situations are as follows:
• First scenario: It is neither raining nor sunny
• Second scenario: It is not raining, but it is sunny
• Third scenario: It is raining, and it is not sunny
• Fourth scenario: It is raining and it is also sunny
To analyse the situations using the McCulloch-Pitts neural model, we can consider the input
signals as follows:

• X1: Is it raining?
• X2: Is it sunny?

So, each input can take the value 0 or 1. We can set both weights (for X1 and X2) to 1 and
the threshold to 1. So, the neural network model will look like:

Truth Table for this case will be:

Scenario   X1   X2   ysum   yout
1          0    0    0      0
2          0    1    1      1
3          1    0    1      1
4          1    1    2      1

where

$y_{sum} = \sum_{i=1}^{2} w_i x_i$

$y_{out} = f(y_{sum})$

The truth table built for this problem is shown above. From the truth table, we can conclude
that John needs to carry an umbrella in the situations where yout is 1. Hence, he will need
to carry an umbrella in scenarios 2, 3 and 4.
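The umbrella example can be checked with a small McCulloch-Pitts neuron. This sketch uses the weights (1, 1) and threshold 1 from the text. The function name `mcculloch_pitts` is only illustrative.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts neuron: binary inputs, fixed weights, threshold activation."""
    y_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_sum >= threshold else 0

# X1 = "Is it raining?", X2 = "Is it sunny?"; weights 1 and 1, threshold 1 (as in the text).
scenarios = [(0, 0), (0, 1), (1, 0), (1, 1)]
for n, (x1, x2) in enumerate(scenarios, start=1):
    y_out = mcculloch_pitts((x1, x2), (1, 1), threshold=1)
    print(f"Scenario {n}: X1={x1}, X2={x2} -> yout={y_out}")
```

The output is 1 in scenarios 2, 3 and 4, which reproduces the OR behaviour in the truth table. Keeping the same weights and raising the threshold to 2 would turn the neuron into an AND gate, so John would carry the umbrella only in scenario 4.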

Rosenblatt’s Perceptron:
Rosenblatt’s perceptron is built around the McCulloch-Pitts neural model. The diagrammatic
representation is as follows:

The perceptron receives a set of inputs x1, x2, ..., xn. The linear combiner, or adder node,
computes the linear combination of the inputs applied to the synapses, with synaptic weights
w1, w2, ..., wn. Then the hard limiter checks whether the resulting sum is positive or
negative. If the input of the hard limiter node is positive, the output is +1, and if the input
is negative, the output is -1. Mathematically, the hard limiter input is:

$V = \sum_{i=1}^{n} w_i x_i$

However, the perceptron also includes an adjustable value, or bias, as an additional weight w0.
This additional weight is attached to a dummy input x0, which is assigned a value of 1. This
modifies the above equation to:

$V = \sum_{i=0}^{n} w_i x_i$

The output is decided by the expression:

$y_{out} = f(V) = \begin{cases} +1, & V \ge 0 \\ -1, & V < 0 \end{cases}$

The objective of the perceptron is to classify a set of inputs into two classes, c1 and c2. This
can be done using a very simple decision rule: assign the inputs to c1 if the output of the
perceptron, yout, is +1, and to c2 if yout is -1. So, for an n-dimensional signal space (a space
for n input signals), the simplest form of perceptron has two decision regions, corresponding
to the two classes, separated by a hyperplane defined by:
$\sum_{i=0}^{n} w_i x_i = 0$

Therefore, for two input signals denoted by the variables x1 and x2, the decision boundary is a
straight line of the form:

$w_0 x_0 + w_1 x_1 + w_2 x_2 = 0$

$w_0 + w_1 x_1 + w_2 x_2 = 0 \quad (\text{if } x_0 = 1)$

So, consider a perceptron whose synaptic weights w0, w1 and w2 are -2, 1/2 and 1/4,
respectively. Its linear decision boundary is:

$-2 + \frac{1}{2} x_1 + \frac{1}{4} x_2 = 0$

So, any point (x1, x2) that lies above the decision boundary, as depicted by the graph, will be
assigned to class c1, and the points that lie below the boundary are assigned to class c2.
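The worked example can be checked with the sketch below. It uses the weights w0 = -2, w1 = 1/2 and w2 = 1/4 from the text and the dummy input x0 = 1. It assumes that a point exactly on the boundary (V = 0) gets output +1. The sample points are made up for illustration.

```python
def perceptron_output(x1, x2, w0=-2.0, w1=0.5, w2=0.25):
    """Hard limiter applied to V = w0*x0 + w1*x1 + w2*x2, with dummy input x0 = 1."""
    v = w0 * 1 + w1 * x1 + w2 * x2
    return 1 if v >= 0 else -1

def classify(x1, x2):
    """Decision rule: +1 -> class c1, -1 -> class c2."""
    return "c1" if perceptron_output(x1, x2) == 1 else "c2"

# Boundary: x2 = 8 - 2*x1. Points above it go to c1, points below it to c2.
for point in [(0, 0), (1, 1), (0, 10), (6, 2)]:
    print(point, classify(*point))
```

For example, (0, 10) gives V = -2 + 0 + 2.5 = 0.5, so it lies above the line and is assigned to c1. For (1, 1), V = -1.25, so that point goes to c2.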

Thus, we see that for a data set with linearly separable classes, perceptrons can always be
employed to solve classification problems using decision lines (for 2-dimensional space),
decision planes (for 3-dimensional space) or decision hyperplanes (for n-dimensional
space).
Adaline (Adaptive Linear Neuron):

A network with a single linear unit is called Adaline (Adaptive Linear Neuron). A unit with
a linear activation function is called a linear unit. In Adaline, there is only one output unit
and the output values are bipolar (+1, -1). The weights between the input units and the output
unit are adjustable. Adaline uses the delta rule, wi(new) = wi(old) + α(t - yin)xi, where wi,
yin, t and α are the weight, the net input (predicted output), the true (target) value and the
learning rate, respectively. This learning rule minimizes the mean squared error between the
net input and the target values. Adaline consists of trainable weights. It compares the target
output with the calculated output and, based on the error, applies the training algorithm.

First, calculate the net input to the Adaline network and apply the activation function to
obtain its output. Then compare this output with the target output. If they are equal, keep
the output. Otherwise, send an error back to the network and update the weights according to
the error, as given by the delta learning rule, wi(new) = wi(old) + α(t - yin)xi.

In Adaline, every input neuron is directly connected to the output neuron through a weighted
connection path. A bias b, whose input activation is always 1, is also present.

Algorithm:
Step 1: Initialize the weights to small random values (not zero). Set the learning rate α.

Step 2: While the stopping condition is False do steps 3 to 7.


Step 3: For each training pair, perform steps 4 to 6.

Step 4: Set activation of input unit xi = si for (i=1 to n).

Step 5: Compute the net input to the output unit:

$y_{in} = \sum_{i=1}^{n} w_i x_i + b$

Here, b is the bias and n is the total number of input neurons.

Step 6: Update the weights and bias for i = 1 to n:

wi(new) = wi(old) + α(t - yin)xi

b(new) = b(old) + α(t - yin)

and calculate the error = (t - yin)^2

When the predicted output equals the true value, the weights do not change.
Step 7: Test the stopping condition. The stopping condition may be that the weights change
only very slightly, or not at all.
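Steps 1 to 7 can be sketched in Python as below. Three choices here are assumptions. The training set is a bipolar AND gate. The learning rate is α = 0.1. The initial weights are fixed at 0.1 rather than drawn at random, so that the run is reproducible. The stopping test compares the weights at the start and end of each epoch.

```python
def train_adaline(samples, targets, alpha=0.1, tol=1e-6, max_epochs=1000):
    """Adaline trained with the delta (LMS) rule, following Steps 1-7."""
    n = len(samples[0])
    # Step 1: small non-zero initial weights (fixed here instead of random, for reproducibility).
    w = [0.1] * n
    b = 0.1
    sq_error = 0.0
    for _ in range(max_epochs):                                      # Step 2
        w_start, b_start = list(w), b
        sq_error = 0.0
        for x, t in zip(samples, targets):                           # Steps 3-4
            y_in = sum(wi * xi for wi, xi in zip(w, x)) + b          # Step 5: net input
            err = t - y_in
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]      # Step 6: delta rule
            b = b + alpha * err
            sq_error += err ** 2
        # Step 7: stop once a full epoch leaves the weights (almost) unchanged.
        change = max(abs(new - old) for new, old in zip(w + [b], w_start + [b_start]))
        if change < tol:
            break
    return w, b, sq_error

def adaline_predict(x, w, b):
    """Bipolar output: +1 if the net input is non-negative, else -1."""
    y_in = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if y_in >= 0 else -1

# Bipolar AND gate used as an illustrative training set.
X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
T = [1, -1, -1, -1]
w, b, sq_error = train_adaline(X, T)
print([round(wi, 3) for wi in w], round(b, 3), round(sq_error, 3))
print([adaline_predict(x, w, b) for x in X])
```

The delta rule drives the net input toward the target values, so the squared error levels off but does not reach zero. The bipolar outputs still match all four AND targets.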

Madaline (Multiple Adaptive Linear Neuron):


The Madaline (supervised learning) model consists of many Adalines in parallel with
a single output unit. The Adaline layer lies between the input layer and the Madaline
(output) layer, so the Adaline layer is a hidden layer. The weights between the input layer and
the hidden layer are adjustable, while the weights between the hidden layer and the output
layer are fixed. The output unit may use the majority vote rule, so the output is either true
or false. The Adaline and Madaline layer neurons each have a bias with input '1' connected to
them. Using multiple Adalines helps counter the problem of non-linear separability.

There are three types of layer in Madaline. The first, the input layer, contains all the input
neurons. The second, the hidden layer, consists of the Adaline units, and the weights between
the input and hidden layers are adjustable. The third is the output layer, and the weights
between the hidden and output layers are fixed and not adjustable.
Algorithm:
Step 1: Initialize the weights and set the learning rate α.
v1 = v2 = 0.5, b3 = 0.5
The other weights may be small random values.
Step 2: While the stopping condition is False, do steps 3 to 9.
Step 3: For each training pair, perform steps 4 to 8.
Step 4: Set the activation of each input unit xi = si (i = 1 to n).
Step 5: Compute the net input of each Adaline unit:

zin1 = b1 + x1w11 + x2w21


zin2 = b2 + x1w12 + x2w22

Step 6: Compute the output of each hidden Adaline unit using the activation function below:
Activation function f(x) = 1 if x ≥ 0, and f(x) = -1 if x < 0

z1 = f(zin1)
z2 = f(zin2)
Step 7: Calculate the net input to the output unit:
yin = b3 + z1v1 + z2v2
Apply the activation function to get the output of the net:
y = f(yin)
Step 8: Find the error and update the weights:
If t = y, no update is made.
If t ≠ y and t = 1, update the weights on the unit zj whose net input is closest to 0:
wij(new) = wij(old) + α(t - zinj)xi
bj(new) = bj(old) + α(t - zinj)
If t ≠ y and t = -1, update the weights on all units zk that have a positive net input, using
the same rule with zink in place of zinj.
Step 9: Test the stopping condition: stop when the weights no longer change or a set number
of epochs has been completed.
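The Madaline steps can be sketched for two inputs and two hidden Adalines, trained on bipolar XOR, which a single Adaline cannot learn. Several details are assumptions made for this illustration: α = 0.5, the specific small starting values of the hidden weights (fixed rather than random, for reproducibility), and the helper names. The output weights v1 = v2 = 0.5 and b3 = 0.5 follow Step 1.

```python
def f(x):
    """Bipolar step activation: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def hidden_net_inputs(x, W, b):
    """zin_j = b_j + sum_i x_i * w_ij for each hidden Adaline unit j."""
    return [b[j] + sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(b))]

def madaline_output(x, W, b, v, b3):
    """Steps 5-7: hidden Adaline outputs, then the fixed output unit."""
    z = [f(s) for s in hidden_net_inputs(x, W, b)]
    return f(b3 + sum(vj * zj for vj, zj in zip(v, z)))

def train_madaline(samples, targets, alpha=0.5, max_epochs=100):
    # Step 1: fixed output weights; assumed small starting values for the hidden weights.
    W = [[0.05, 0.1],   # W[i][j]: weight from input i to hidden unit j
         [0.2, 0.2]]
    b = [0.3, 0.15]
    v, b3 = [0.5, 0.5], 0.5
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(samples, targets):
            z_in = hidden_net_inputs(x, W, b)
            if madaline_output(x, W, b, v, b3) == t:
                continue                                    # t = y: no update
            if t == 1:
                # Update only the unit whose net input is closest to 0.
                units = [min(range(len(b)), key=lambda j: abs(z_in[j]))]
            else:
                # t = -1: update every unit with a positive net input.
                units = [j for j in range(len(b)) if z_in[j] > 0]
            for j in units:
                step = alpha * (t - z_in[j])
                b[j] += step
                for i in range(len(x)):
                    W[i][j] += step * x[i]
                changed = True
        if not changed:                                     # Step 9: weights stopped changing
            break
    return W, b, v, b3

X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
T = [-1, 1, 1, -1]   # bipolar XOR
W, b, v, b3 = train_madaline(X, T)
print([madaline_output(x, W, b, v, b3) for x in X])   # [-1, 1, 1, -1]
```

With these starting values, training settles after three epochs. Each hidden Adaline carves out one half of the XOR pattern, and the fixed output unit combines them (it behaves like an OR of the hidden outputs).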

Feedforward Multilayer Perceptron (MLP)

MLPs are foundational models in deep learning, serving as a basis for more complex
architectures like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs).

A Feedforward Multilayer Perceptron (MLP) is a type of artificial neural network where the
connections between the nodes do not form a cycle, making it a straightforward feedforward
structure. Here’s a breakdown of its components and how it works:

1. Architecture:

Layers: An MLP consists of at least three layers of nodes: an input layer, one or more hidden
layers, and an output layer.
• Input Layer: This layer receives the input data. Each node in this layer
represents a feature in the input data.

• Hidden Layers: These layers are where the learning happens. The nodes in
each hidden layer apply a weighted sum and a non-linear activation function to
their inputs. An MLP can have multiple hidden layers, making it a deep neural
network.

• Output Layer: This layer produces the final output of the network. The number
of nodes in this layer corresponds to the number of classes in a classification
task or the number of output variables in a regression task.

2. Feedforward Process:

• In an MLP, information moves in one direction—from the input layer, through the
hidden layers, to the output layer—hence the term "feedforward."

• Each node in a layer is connected to every node in the next layer through a set of
weights. The output of each node is a function of the weighted sum of its inputs.
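The feedforward process can be sketched with NumPy as a loop over (weight matrix, bias vector) pairs, where each layer is fully connected to the next. This is an illustration under stated assumptions: the layer sizes are hypothetical, the weights are random, and the sigmoid activation is applied at every layer, including the output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Propagate x through (W, b) layers: input -> hidden layer(s) -> output.

    Each layer computes sigmoid(a @ W + b): a weighted sum over every node of the
    previous layer plus a bias, followed by a non-linear activation.
    """
    a = x
    for W, b in layers:
        a = sigmoid(a @ W + b)
    return a

rng = np.random.default_rng(0)
# Hypothetical sizes: 3 input features, one hidden layer of 4 nodes, 2 output nodes.
sizes = [3, 4, 2]
layers = [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
x = np.array([[0.2, -1.0, 0.5]])        # one sample, shape (1, 3)
print(mlp_forward(x, layers).shape)     # (1, 2)
```

Because information only moves forward, each layer's output depends only on the layer before it, and there are no cycles.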

3. Activation Functions:

Activation functions are used in the hidden layers to introduce non-linearity into the model,
enabling the network to learn complex patterns. Common activation functions include the
sigmoid, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU).

4. Training Process:

The MLP is trained using backpropagation, a process that involves two main steps:

• Forward Pass: The input is passed through the network, and the output is
computed.

• Backward Pass: The error between the predicted output and the actual target
is calculated, and the network's weights are adjusted in the opposite direction
of the gradient of the error with respect to the weights (using gradient descent or a
variant).
5. Learning:

• During training, the MLP learns by updating the weights on each connection to
minimize the difference between the predicted output and the actual target values.

• This process is iterative and continues until the model reaches an acceptable level of
accuracy or another stopping criterion is met.
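The forward pass, backward pass and iterative weight updates can be sketched with NumPy as below. This is a minimal illustration, not a reference implementation. The one-hidden-layer network, the sigmoid activations, the mean squared error, full-batch gradient descent, the XOR data, the learning rate and the epoch count are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, T, hidden=4, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer MLP trained by backpropagation with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(size=(hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden -> output
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        losses.append(float(np.mean((Y - T) ** 2)))
        # Backward pass: chain rule for the mean squared error
        dZ2 = 2.0 * (Y - T) / Y.size * Y * (1.0 - Y)   # error at the output pre-activations
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)             # error propagated back to the hidden layer
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        # Gradient descent: move every weight opposite to its gradient
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])   # XOR targets (illustrative)
params, losses = train_mlp(X, T)
print(f"MSE: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

Recording the loss every epoch makes it easy to apply a stopping criterion, such as halting once the loss falls below a chosen value.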

6. Applications:

MLPs are used for various tasks, including:

• Classification: Assigning input data to one of several categories.

• Regression: Predicting a continuous value based on input data.

• Pattern Recognition: Identifying patterns and regularities in data.

• Function Approximation: Modeling complex functions that map inputs to outputs.

Limitations:

• Overfitting: MLPs can overfit the training data, especially if they have too many
parameters relative to the amount of training data.

• Computational Cost: Training deep MLPs can be computationally expensive and time-consuming.

Advantages:

• Flexibility: MLPs can model a wide range of functions due to their non-linear
activation functions and multiple layers.

• Generalization: When trained properly, MLPs can generalize well to unseen data.


Learning and Training the neural network:


Learning, in an artificial neural network, is the process of modifying the weights of the
connections between the neurons of a specified network. Learning in ANNs can be classified
into three categories, namely supervised learning, unsupervised learning, and reinforcement
learning.

Supervised Learning

As the name suggests, this type of learning is done under the supervision of a teacher, so the
learning process depends on the teacher (the desired outputs).

During the training of an ANN under supervised learning, the input vector is presented to
the network, which produces an output vector. This output vector is compared with the
desired output vector, and an error signal is generated if there is a difference between the
actual output and the desired output vector. On the basis of this error signal, the weights are
adjusted until the actual output matches the desired output.
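One concrete instance of this error-driven loop is the perceptron learning rule. The notes do not spell this rule out, so it is named here as an illustrative choice. The sketch below assumes a bipolar AND training set, zero initial weights and a learning rate of 1.

```python
def train_perceptron(samples, targets, alpha=1.0, max_epochs=100):
    """Supervised error-correction learning with the perceptron rule."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(samples, targets):
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if net >= 0 else -1                 # actual output
            if y != t:                                # error signal: output differs from target
                w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
                b += alpha * (t - y)
                errors += 1
        if errors == 0:                               # every output matches its desired output
            break
    return w, b

def perceptron_predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
T = [1, -1, -1, -1]          # desired outputs supplied by the "teacher" (bipolar AND)
w, b = train_perceptron(X, T)
print(w, b)                  # [2.0, 2.0] -2.0
print([perceptron_predict(x, w, b) for x in X])
```

Training stops in the third epoch, which is the first pass in which no error signal is generated.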

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a teacher, so
the learning process is independent. During the training of an ANN under unsupervised
learning, input vectors of a similar type are combined to form clusters. When a new input
pattern is applied, the neural network gives an output response indicating the class to which
the input pattern belongs. There is no feedback from the environment about what the desired
output should be or whether the output is correct. Hence, in this type of learning, the network
itself must discover the patterns and features in the input data, and the relationship between
the input data and the output.
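A simple way to see clustering without a teacher is winner-take-all competitive learning. This technique is named here as an illustration and is not described in the notes. Each output unit keeps a weight vector, and only the unit closest to the current input moves toward it. The data points, the two starting weight vectors and the step size are hypothetical.

```python
def winner(prototypes, x):
    """Index of the output unit whose weight vector is closest to x."""
    return min(range(len(prototypes)),
               key=lambda j: sum((p - xi) ** 2 for p, xi in zip(prototypes[j], x)))

def competitive_learning(data, prototypes, eta=0.5, epochs=20):
    """Winner-take-all learning: uses only the input vectors, no target outputs."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in data:
            j = winner(protos, x)
            # Move only the winning unit's weights a step toward the input.
            protos[j] = [p + eta * (xi - p) for p, xi in zip(protos[j], x)]
    return protos

# Hypothetical unlabeled inputs that form two groups.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
protos = competitive_learning(data, prototypes=[(1.0, 1.0), (4.0, 4.0)])
print(winner(protos, (0.15, 0.2)), winner(protos, (5.1, 5.0)))   # 0 1
```

After training, a new input pattern is labelled by the unit that wins it, which matches the "class to which the input pattern belongs" described above.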

Reinforcement Learning

As the name suggests, this type of learning is used to reinforce or strengthen the
network based on critic information. This learning process is similar to supervised learning,
but much less information may be available. During the training of a network under
reinforcement learning, the network receives some feedback from the environment, which
makes it somewhat similar to supervised learning. However, the feedback obtained here is
evaluative, not instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network adjusts its weights to obtain better critic information
in the future.

Data Processing:

Scaling:

Machine learning algorithms often rely on the quality and distribution of input features to
make accurate predictions. However, not all features are on the same scale. Feature scaling, a
crucial preprocessing step, transforms features into a consistent range, allowing machine
learning models to perform well. In this section, we explore the significance of feature
scaling in machine learning, its impact on different algorithms, and popular scaling
techniques.

What is feature scaling?

Feature scaling is a preprocessing technique used in machine learning to standardize or
normalize the range of input features. It involves transforming the values of features to a
specific range or distribution. The goal is to ensure that all features have a similar scale,
which can help machine learning algorithms perform more effectively.

Why is feature scaling important?

Feature scaling is important because many machine learning algorithms are sensitive to the
scale of input features. When features have different scales or ranges, it can negatively impact
the performance of these algorithms. For example, distance-based algorithms such as K-
Nearest Neighbors (KNN) or Support Vector Machines (SVM) calculate distances between
data points, and if the features have different scales, the distances may be dominated by the
features with larger scales. This can lead to biased results and inaccurate predictions.
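The toy example below illustrates how one feature can dominate a distance. It uses two hypothetical features, age in years and income in currency units. The feature ranges used for min-max scaling are assumed values chosen for this illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max(point, mins, maxs):
    """Rescale each feature to [0, 1] using the given per-feature ranges."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(point, mins, maxs)]

query = (30, 50000)                       # (age, income), hypothetical
candidates = [(31, 52000), (60, 50500)]
mins, maxs = (20, 20000), (70, 120000)    # assumed feature ranges

raw = [euclidean(query, c) for c in candidates]
scaled = [euclidean(min_max(query, mins, maxs), min_max(c, mins, maxs)) for c in candidates]
print("unscaled nearest:", candidates[raw.index(min(raw))])        # (60, 50500)
print("scaled nearest:  ", candidates[scaled.index(min(scaled))])  # (31, 52000)
```

Without scaling, the 500-unit income gap outweighs a 30-year age gap, so a KNN-style nearest neighbour is decided almost entirely by income. After scaling, both features contribute on comparable terms.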

Impact of unscaled features on machine learning models:

When features are not scaled, it can have several negative impacts on machine learning
models. Distance-based algorithms, such as K-Nearest Neighbors (KNN) and Support Vector
Machines (SVM), may be dominated by features with larger scales, leading to biased
decisions. Gradient-based algorithms, like Linear Regression, Logistic Regression, and
Neural Networks, can experience slower convergence if features have different scales.
Tree-based algorithms, such as Decision Trees and Random Forests, are generally robust to
feature scaling, although scaling can still matter if a distance-based metric is used alongside
them for a specific purpose.

The Impact of Feature Scaling on Different Algorithms:

1. Distance-based algorithms (K-Nearest Neighbors, Support Vector Machines, etc.):
Distance-based algorithms rely on distance calculations between data points. Inconsistent
feature scales can result in features with larger scales dominating the distances, leading to
biased decisions and inaccurate predictions.

2. Gradient-based algorithms (Linear Regression, Logistic Regression, Neural Networks,
etc.): Gradient-based algorithms optimize model parameters based on gradients. Inconsistent
feature scales can slow down convergence and affect the overall performance of these
algorithms.

3. Tree-based algorithms (Decision Trees, Random Forests, etc.): Tree-based algorithms
are generally robust to feature scaling since they partition the feature space based on
thresholds.

Popular Feature Scaling Techniques:

Standardization

Standardization, also known as Z-score normalization, is a feature scaling technique that
transforms the data to have zero mean and unit variance:

X_scaled = (X - mean(X)) / std(X)

In this formula, X represents the original feature values, mean(X) is the mean of the feature
values, and std(X) is the standard deviation of the feature values. In order to standardize your
data, you can make use of the StandardScaler class from the sklearn library. By applying the
StandardScaler to your dataset, you will be able to achieve the desired standardization. Here’s
a demonstration of how you can implement the StandardScaler.
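A minimal NumPy sketch of the standardization formula, applied column by column to a small hypothetical two-feature dataset (the values are invented for illustration). scikit-learn's `StandardScaler().fit_transform(X)` should give the same result, because it also divides by the population standard deviation (ddof = 0).

```python
import numpy as np

# Hypothetical data: each column is a feature on a very different scale.
X = np.array([[20.0, 1000.0],
              [30.0, 3000.0],
              [40.0, 2000.0],
              [50.0, 8000.0],
              [60.0, 6000.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)              # population standard deviation (ddof=0)
X_scaled = (X - mean) / std      # X_scaled = (X - mean(X)) / std(X), per column
print(X_scaled.round(3))
print("column means:", X_scaled.mean(axis=0).round(6))
print("column stds: ", X_scaled.std(axis=0).round(6))
```

In a real pipeline, the mean and standard deviation are computed on the training data only and then reused to transform the test data.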

By comparing the original data graph with the standardized data graph, the following changes
can be observed:

• The scale of the features has changed in the standardized data. The values on both the
x-axis and the y-axis are now expressed in units of standard deviations.

• The data is now centered around zero.

• The shape of the distribution is unchanged. Standardization only shifts and rescales each
feature, so it does not make a skewed distribution more symmetric.

• The relationship between the features is preserved, but the data is now on a
standardized scale.

When outliers are present in the data, they can have extreme values that are far from the
mean. Standardization does not remove or reduce outliers. Every value, including an outlier,
is shifted by the same mean and divided by the same standard deviation, so an outlier stays
just as far from the other points in relative terms. Because the mean and standard deviation
are computed over the entire dataset, including the outliers, the outliers also pull the mean
and inflate the standard deviation, which can squeeze the remaining points into a narrow range.

Normalization

Normalization, also known as min-max scaling, is a feature scaling technique that rescales
the data to a specific range, typically between 0 and 1. It is particularly useful when the
feature values have different ranges and it’s necessary to bring them to a common scale. The
formula for normalization is as follows:

X_normalized = (X - min(X)) / (max(X) - min(X))

In this formula, X represents the original feature values, min(X) is the minimum value of the
feature, and max(X) is the maximum value of the feature.

To perform data normalization, you can import the MinMaxScaler class from the sklearn
library and utilize it to transform your dataset. Let’s proceed with applying the
MinMaxScaler to achieve normalization on our data.
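The same min-max formula can be written directly in NumPy, as in this sketch on a small hypothetical dataset. scikit-learn's `MinMaxScaler()` with its default `feature_range=(0, 1)` should produce the same values.

```python
import numpy as np

# Hypothetical data: columns are two features with very different ranges.
X = np.array([[20.0, 1000.0],
              [30.0, 3000.0],
              [40.0, 2000.0],
              [50.0, 8000.0],
              [60.0, 6000.0]])

col_min = X.min(axis=0)
col_max = X.max(axis=0)
# X_normalized = (X - min(X)) / (max(X) - min(X)), applied column by column
X_normalized = (X - col_min) / (col_max - col_min)
print(X_normalized.min(axis=0))   # [0. 0.]
print(X_normalized.max(axis=0))   # [1. 1.]
```

A single extreme value sets max(X) or min(X) for its column, so min-max scaling is more sensitive to outliers than standardization.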

Let’s examine the impact of normalization on our dataset. After applying normalization, we
can observe that all the features now have a minimum value of 0 and a maximum value of 1.
This normalization process has successfully rescaled the data within the desired range.

The visualization allows us to observe the changes in the data distribution after Min-Max
scaling. In the original data plot, we can see the density estimates of the original ‘Age’ and
‘Fare’ features. In the Min-Max scaled data plot, we can observe the density estimates of the
corresponding scaled features.

By comparing the original and scaled data plots, we can observe that Min-Max scaling changes
only the scale of the axes. The scaled data is compressed into the range 0 to 1, while the
overall shape of each distribution stays the same, because the transformation is linear.
The Fourier transform is a neural network:
We can consider the discrete Fourier transform (DFT) to be an artificial neural network: it is
a single layer network, with no bias, no activation function, and particular values for the
weights. The number of output nodes is equal to the number of frequencies we evaluate.

Here is the DFT:

$y_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N} \qquad (1)$

Where k is the number of cycles per N samples, xn is the signal’s value at sample n, and N is
the length of the signal.

(A signal can be written as the sum of sinusoids. yk is a complex value that gives us
information about the sinusoid of frequency k in signal x; from yk we can compute the
amplitude and phase (i.e. location) of the sinusoid.)

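We can rewrite (1) using matrix multiplication:

$y_k = \begin{bmatrix} e^{-i 2\pi k \cdot 0 / N} & e^{-i 2\pi k \cdot 1 / N} & \cdots & e^{-i 2\pi k (N-1) / N} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{bmatrix}$

This gives us the Fourier value for a particular k. However, we most commonly want to
compute the full frequency spectrum, i.e. the values for k = 0, 1, ..., N-1. We can use a
matrix for this (k is incremented column-wise, and n row-wise):

$\begin{bmatrix} y_0 & y_1 & \cdots & y_{N-1} \end{bmatrix} = \begin{bmatrix} x_0 & x_1 & \cdots & x_{N-1} \end{bmatrix} \begin{bmatrix} e^{-i 2\pi \cdot 0 \cdot 0 / N} & \cdots & e^{-i 2\pi (N-1) \cdot 0 / N} \\ \vdots & \ddots & \vdots \\ e^{-i 2\pi \cdot 0 \cdot (N-1) / N} & \cdots & e^{-i 2\pi (N-1)(N-1) / N} \end{bmatrix}$

More concisely:

$y = xW, \quad W_{nk} = e^{-i 2\pi k n / N}$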

This should look familiar, because it is a neural network layer with no activation function and
no bias. The matrix of exponentials contains our weights, which we’ll call “complex Fourier
weights”. Usually we don’t know the weights of our neural networks in advance, but in this
case we do.
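The DFT can be coded exactly as described: a single dense layer with a fixed complex weight matrix, no bias and no activation function. In this NumPy sketch, the helper names and the sample signal are hypothetical. NumPy's `np.fft.fft` uses the same sign convention, so it can serve as a check.

```python
import numpy as np

def fourier_weights(N):
    """Complex Fourier weights W[n, k] = exp(-2j*pi*k*n/N): n indexes rows, k indexes columns."""
    n = np.arange(N).reshape(-1, 1)
    k = np.arange(N).reshape(1, -1)
    return np.exp(-2j * np.pi * k * n / N)

def dft_layer(x):
    """A single linear layer with known weights, no bias, no activation: y = x W."""
    return x @ fourier_weights(len(x))

x = np.array([0.0, 1.0, 0.0, -1.0, 0.5, 2.0, -0.5, 1.5])  # hypothetical 8-sample signal
y = dft_layer(x)
print(np.allclose(y, np.fft.fft(x)))        # True
amplitude, phase = np.abs(y), np.angle(y)   # amplitude and phase of each sinusoid
```

The layer has one output node per evaluated frequency, here N = 8. It costs O(N^2) operations, whereas the FFT algorithm computes the same values in O(N log N).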
Principal-Component Analysis:
Principal components analysis (PCA) is a statistical technique for identifying underlying
linear patterns in a data set so that it can be expressed in terms of another data set of
significantly lower dimension without much loss of information. The final data set should
explain most of the variance of the original data set while reducing the number of variables.
The new variables are called principal components. The following diagram summarizes the
steps performed in principal components analysis.

1. Subtract mean

The first step in the principal component analysis is to subtract the mean for each variable of
the data set.

2. Calculate the covariance matrix

The covariance of two random variables measures how the two variables vary together around
their means. The sign of the covariance tells us about the relationship between them:

• If the covariance is positive, then the two variables increase and decrease together.

• If the covariance is negative, then when one variable increases, the other decreases,
and vice versa.

These values determine the linear dependencies between the variables, which will be used to
reduce the data set's dimension. The variance is a measure of how the data is spread from the
mean.

In the covariance matrix, the diagonal values show the covariance of each variable with
itself, which equals its variance.
The off-diagonal values show the covariance between pairs of variables. In this case, these
values are positive, which means that both variables increase and decrease together.

3. Calculate eigenvectors and eigenvalues

The eigenvectors of a matrix are the non-zero vectors whose direction remains unchanged when
the linear transformation defined by that matrix (here, the covariance matrix) is applied.
Their length, however, may change: the result of the transformation is the vector multiplied
by a scalar. This scalar is called the eigenvalue, and each eigenvector has one associated with it.

4. Select principal components

Among the eigenvectors calculated previously, we must select those onto which we project
the data. The selected eigenvectors are called principal components.

To establish a criterion for selecting the eigenvectors, we must first define the relative
variance of each eigenvector and the total variance of the data set. The relative variance of
an eigenvector measures how much information can be attributed to it. The total variance of a
data set is the sum of the variances of all the variables, which also equals the sum of the
eigenvalues of the covariance matrix. The relative variance of an eigenvector is therefore its
eigenvalue divided by the total variance, and the eigenvectors with the largest eigenvalues are
selected first.

5. Reduce data dimension

Once we have selected the principal components, the data must be projected onto them. The
following image shows the result of this projection for our example.

Although this projection explains most of the variance of the original data, we have lost
the information about the variance along the second component. In general, this process
is irreversible, which means that we cannot exactly recover the original data from the projection.
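The five steps can be sketched with NumPy as below. The data is a hypothetical pair of strongly correlated variables, and the choice of keeping a single component is an assumption for illustration. `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca(X, n_components):
    # 1. Subtract the mean of each variable (column).
    Xc = X - X.mean(axis=0)
    # 2. Covariance matrix, with variables in columns.
    C = np.cov(Xc, rowvar=False)
    # 3. Eigenvalues and eigenvectors; eigh returns them in ascending order, so reverse.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Relative variance of each eigenvector = eigenvalue / total variance.
    relative_variance = eigvals / eigvals.sum()
    components = eigvecs[:, :n_components]
    # 5. Project the centered data onto the selected principal components.
    return Xc @ components, relative_variance

rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 0.8 * t + 0.1 * rng.normal(size=200)])  # two correlated variables
projected, rel_var = pca(X, n_components=1)
print(projected.shape)     # (200, 1)
print(rel_var.round(3))
```

Here almost all of the variance lies along the first component, so the 2-D data can be reduced to 1-D with very little loss. The small remaining share is the variance along the second component, which the projection discards.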
Wavelet transformations:
Wavelet transformations have been explored in the context of artificial neural networks
(ANNs) to enhance their performance in various applications, especially in signal and image
processing tasks. Wavelet transformations are mathematical operations that decompose
signals or images into components at different scales (frequency bands). A distinctive
property of these transformations is that they preserve both time (spatial) and frequency
(scale) information simultaneously, as represented in the equation.
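For reference, the standard textbook definition of the continuous wavelet transform shows where the two kinds of information come from (the notation is the conventional one and may differ from the notes' own equation). A mother wavelet ψ is shifted by a translation b, which sets the position in time, and stretched by a scale a, which is inversely related to frequency:

$$W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \qquad a \neq 0$$

Small scales give narrow wavelets that resolve fine, high-frequency detail in time; large scales give wide wavelets that capture coarse, low-frequency structure.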

Integrating wavelet transformations with neural networks offers several advantages:

1. Multiresolution representation: Wavelets provide a multiresolution representation of
signals, allowing neural networks to work with different levels of detail. This can be
useful for capturing both high-frequency and low-frequency features, which might be
important in certain applications.

2. Feature extraction: Wavelets can be used as a feature extraction technique, providing a
compact representation of the input data. Neural networks can then operate on these
extracted features, potentially leading to more efficient and effective learning.

3. Denoising and compression: Wavelets are widely used for denoising and data
compression tasks. By incorporating wavelets into neural networks, these models can
learn to denoise or compress data more effectively.

4. Image processing: In computer vision tasks, wavelet transformations can be used for
image enhancement, edge detection, and feature extraction, which can benefit
subsequent neural network-based processing.

5. Time-frequency analysis: For time-series data, wavelet transformations can analyze
signals in both the time and frequency domains, enabling neural networks to capture
temporal patterns at different scales.
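To make the multiresolution and feature-extraction ideas concrete, here is a sketch of a multi-level Haar wavelet decomposition written directly in NumPy. The Haar wavelet is chosen only because it is the simplest; libraries such as PyWavelets provide many others. The helper names `haar_dwt`, `haar_idwt`, and `haar_wavedec` are hypothetical. Each level splits the signal into a coarse approximation and a detail part, and the concatenated coefficients could serve as the input features of a network.

```python
import numpy as np

# Illustrative sketch; helper names and the test signal are hypothetical.

def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    x = np.asarray(signal, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency (coarse) part
    detail = (even - odd) / np.sqrt(2)   # high-frequency (fine) part
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: reconstructs the input exactly."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

def haar_wavedec(signal, levels):
    """Multiresolution decomposition: [cA_L, cD_L, ..., cD_1]."""
    coeffs = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)         # coarser details go first
    coeffs.insert(0, approx)
    return coeffs

# Hypothetical signal: a slow sine wave plus a sharp spike at sample 40.
t = np.linspace(0, 1, 64, endpoint=False)
signal = np.sin(2 * np.pi * 2 * t)
signal[40] += 3.0

coeffs = haar_wavedec(signal, levels=3)
features = np.concatenate(coeffs)        # compact feature vector for a network
print([len(c) for c in coeffs])          # [8, 8, 16, 32]
```

The spike shows up as the largest first-level detail coefficient (index 20), illustrating how wavelets localize an event in time while the approximation coefficients keep the slow sine trend. Because this Haar transform is orthonormal, the coefficients carry the same total energy as the signal.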

There are different approaches to combining wavelets with neural networks:

1. Wavelet as input: The neural network can take the wavelet-transformed input data
directly and learn to operate on the transformed coefficients.

2. Hybrid architectures: Combining traditional convolutional neural networks (CNNs)
with wavelet transformations for preprocessing or feature extraction.

3. Wavelet neural networks: Designing specific neural network architectures that
incorporate wavelet-based operations as part of the network layers.

4. Wavelet as a regularizer: Using wavelets as regularization techniques to control the
network's complexity and prevent overfitting.

However, it is worth noting that while the idea of using wavelet transformations with neural
networks is promising, the success and effectiveness of these techniques depend on the
specific application and dataset.
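As an illustration of the wavelet-neural-network approach, the following sketch builds a small wavelet network for 1-D regression. It assumes one common formulation in which each hidden unit (a "wavelon") outputs psi((x - b) / a) for a Mexican hat wavelet psi with translation b and dilation a. To keep it short, the translations and dilations are fixed on a hypothetical grid at two scales, and only the linear output weights are fitted by least squares. Full wavelet networks usually also train a and b, for example by gradient descent.

```python
import numpy as np

# Illustrative sketch; the target function and the wavelon grid are hypothetical.

def mexican_hat(u):
    """Mexican hat (Ricker) mother wavelet, unnormalized."""
    return (1.0 - u ** 2) * np.exp(-0.5 * u ** 2)

class WaveletLayer:
    """Hidden layer of wavelons: unit j outputs psi((x - b_j) / a_j)."""

    def __init__(self, translations, dilations):
        self.b = np.asarray(translations, dtype=float)
        self.a = np.asarray(dilations, dtype=float)

    def __call__(self, x):
        x = np.asarray(x, dtype=float).reshape(-1, 1)   # (n, 1)
        return mexican_hat((x - self.b) / self.a)        # (n, units)

# Hypothetical 1-D regression target.
x = np.linspace(-3, 3, 200)
y = np.sin(2 * x) * np.exp(-0.2 * x ** 2)

# Wavelons placed on a grid at a coarse and a fine scale (assumed design).
b = np.concatenate([np.linspace(-3, 3, 13), np.linspace(-3, 3, 25)])
a = np.concatenate([np.full(13, 0.6), np.full(25, 0.3)])
layer = WaveletLayer(b, a)

# Output layer: linear weights plus a bias, fitted by least squares.
H = np.column_stack([layer(x), np.ones_like(x)])
w, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ w
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(rmse)
```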

Hopfield network: The Hopfield network is a type of recurrent artificial neural network
introduced by John Hopfield in 1982. It is a form of associative memory network that is
primarily used for content-addressable memory and pattern recognition tasks. Unlike
feedforward neural networks, which propagate information in one direction only, Hopfield
networks have feedback connections, allowing them to store and retrieve patterns in a
distributed manner.
Discrete Hopfield Network
A discrete Hopfield network operates in a discrete fashion: its input and output patterns
are discrete vectors, which can be either binary {0, 1} or bipolar {+1, −1} in nature.
The network has symmetric weights with no self-connections, i.e., wij = wji and wii = 0.

Architecture: Following are some important points to keep in mind about the discrete Hopfield
network:

• This model consists of neurons with one inverting and one non-inverting output.

• The output of each neuron should be the input of the other neurons but not the input of
itself.

• Weight/connection strength is represented by wij.

• Connections can be excitatory as well as inhibitory. A connection is excitatory if the
output of the neuron is the same as the input; otherwise, it is inhibitory.

• Weights should be symmetrical, i.e., wij = wji.

The outputs from Y1 going to Y2, Yi and Yn have the weights w12, w1i and w1n, respectively.
Similarly, the other arcs carry their own weights.
Training Algorithm
During training of a discrete Hopfield network, the weights are updated using the Hebbian
principle. Since the input vectors can be either binary or bipolar, the weight update takes
one of the following two forms.
Case 1 − Binary input patterns
For a set of binary patterns s(p), p = 1 to P,
where s(p) = (s1(p), s2(p), ..., si(p), ..., sn(p)), the weight matrix is given by

$$w_{ij} = \sum_{p=1}^{P} \left[2s_i(p) - 1\right]\left[2s_j(p) - 1\right] \quad \text{for } i \neq j, \qquad w_{ii} = 0$$

Case 2 − Bipolar input patterns
For a set of bipolar patterns s(p), p = 1 to P,
where s(p) = (s1(p), s2(p), ..., si(p), ..., sn(p)), the weight matrix is given by

$$w_{ij} = \sum_{p=1}^{P} s_i(p)\, s_j(p) \quad \text{for } i \neq j, \qquad w_{ii} = 0$$
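The two weight formulas can be implemented together, because the binary case is the bipolar rule applied after mapping each bit s to 2s - 1. A minimal NumPy sketch; the function name `hopfield_weights` and the example patterns are hypothetical.

```python
import numpy as np

def hopfield_weights(patterns, binary=False):
    """Hebbian weight matrix for a discrete Hopfield net (illustrative sketch).

    patterns: array of shape (P, n), entries in {0, 1} if binary=True,
              otherwise in {-1, +1}.
    """
    S = np.asarray(patterns, dtype=float)
    if binary:
        S = 2 * S - 1              # Case 1: map {0, 1} -> {-1, +1}
    W = S.T @ S                    # w_ij = sum_p s_i(p) s_j(p)
    np.fill_diagonal(W, 0)         # no self-connections: w_ii = 0
    return W

# Hypothetical bipolar patterns to store.
bipolar = np.array([[1, -1, 1, -1],
                    [1, 1, -1, -1]])
W = hopfield_weights(bipolar)
print(W)
```

The result is symmetric with a zero diagonal, as the architecture requires, and passing the equivalent binary patterns with `binary=True` yields the same matrix.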

Testing Algorithm
Step 1 − Initialize the weights, which are obtained from the training algorithm using the
Hebbian principle.
Step 2 − Perform steps 3-9 while the activations of the network have not converged.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Set the initial activation of the network equal to the external input vector X:
yi = xi for i = 1 to n.
Step 5 − For each unit Yi, perform steps 6-8.
Step 6 − Calculate the net input of the unit as follows:

$$y_{in,i} = x_i + \sum_{j} y_j\, w_{ji}$$

Step 7 − Apply the activation over the net input to calculate the output:

$$y_i = \begin{cases} 1 & \text{if } y_{in,i} > \theta_i \\ y_i & \text{if } y_{in,i} = \theta_i \\ 0 & \text{if } y_{in,i} < \theta_i \end{cases}$$

Here θi is the threshold.
Step 8 − Broadcast this output yi to all other units.
Step 9 − Test the network for convergence.
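The testing algorithm can be sketched as an asynchronous update loop for binary units. This is an illustrative sketch, not a reference implementation: the function name `hopfield_recall`, the single shared threshold `theta` (the notes allow a separate θi per unit), the random update order, and the sweep limit are all assumptions. The usage example stores one binary pattern through its bipolar form and recovers it from an input with two missing bits.

```python
import numpy as np

def hopfield_recall(W, x, theta=0.0, max_sweeps=100, seed=0):
    """Asynchronous recall for a binary {0, 1} discrete Hopfield network.

    Hypothetical helper: one shared threshold and a random unit order.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = x.copy()                                  # Step 4: y_i = x_i
    n = len(y)
    for _ in range(max_sweeps):                   # Step 2: repeat until stable
        changed = False
        for i in rng.permutation(n):              # Step 5: each unit in turn
            y_in = x[i] + y @ W[:, i]             # Step 6: x_i + sum_j y_j w_ji
            if y_in > theta:                      # Step 7: threshold activation
                new = 1.0
            elif y_in < theta:
                new = 0.0
            else:
                new = y[i]
            if new != y[i]:
                y[i] = new                        # Step 8: broadcast new y_i
                changed = True
        if not changed:                           # Step 9: converged
            break
    return y

stored = np.array([1, 1, 1, 0])                   # binary pattern to memorize
s = 2 * stored - 1                                # bipolar form (Case 1 rule)
W = np.outer(s, s).astype(float)
np.fill_diagonal(W, 0)
recalled = hopfield_recall(W, [0, 0, 1, 0])       # two bits missing
print(recalled)                                   # [1. 1. 1. 0.]
```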
Self-Organizing Neural Network: A Self-Organizing Neural Network (SONN) is an
unsupervised learning model in artificial neural networks, also termed a Self-Organizing
Feature Map or Kohonen Map. These feature maps are two-dimensional, discretized
representations of an input space, generated during model training (based on competitive
learning). This phenomenon is very similar to biological systems: in the human cortex,
multi-dimensional sensory input spaces (e.g., auditory, motor, tactile, visual,
somatosensory) are represented by two-dimensional maps. Such a projection of
higher-dimensional inputs onto lower-dimensional maps is termed topology preserving, and this
topology-preserving mapping can be achieved by self-organizing networks. Why is a SONN
required? Self-Organizing Maps are used for the classification and visualization of
higher-dimensional data in lower dimensions.
SONN Architecture:

• Layers: A SONN has two layers: a fully connected input layer and an output (map)
layer. The output layer is termed the Kohonen layer.
• Intralayer Connections: All the neurons in the output layer are connected within a specific
neighborhood with some topology. These lateral connections are unweighted, but they are
responsible for the competitive learning.
• Lateral Feedback Connections: These connections generate excitatory and
inhibitory effects based on the distance from the winning neuron. This is accomplished
with a Mexican hat function, which describes the synaptic weights between
the neurons in the Kohonen layer.

Phases of SONN:
1. Learning phase: Construction of maps; the network is designed with a competitive
process using the training samples.
2. Prediction phase: Classification of new data; for the new data samples, a specific
location is provided on the converged map.
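A minimal NumPy sketch of both phases follows. The learning phase repeats competition (find the best-matching unit), cooperation (grid neighbors of the winner also learn), and adaptation (move weights toward the input). The sketch swaps the Mexican hat interaction described above for a Gaussian neighborhood, a common simplification. The decay schedules, grid size, function names, and data are illustrative assumptions. The prediction phase returns the grid location of the best-matching unit for a new sample.

```python
import numpy as np

def train_som(data, grid_shape=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Learning phase of a minimal Kohonen map (hypothetical helper).

    Assumptions: Gaussian neighborhood instead of a Mexican hat, and
    exponentially decaying learning rate and neighborhood radius.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    weights = rng.random((rows * cols, data.shape[1]))
    # Fixed 2-D grid position of every output (Kohonen-layer) neuron.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):           # shuffled training samples
            decay = np.exp(-3.0 * step / total_steps)
            lr, sigma = lr0 * decay, sigma0 * decay
            # Competition: the best-matching unit (BMU) is the closest weight vector.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Cooperation: grid neighbors of the BMU also learn, less with distance.
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Adaptation: pull the weights toward the input.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, coords

def predict_som(weights, coords, x):
    """Prediction phase: grid location of the BMU for a new sample."""
    bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
    r, c = coords[bmu].astype(int)
    return int(r), int(c)

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.2, 0.05, size=(50, 3)),   # hypothetical cluster A
                  rng.normal(0.8, 0.05, size=(50, 3))])  # hypothetical cluster B
weights, coords = train_som(data)
loc_a = predict_som(weights, coords, np.full(3, 0.2))
loc_b = predict_som(weights, coords, np.full(3, 0.8))
print(loc_a, loc_b)   # the two clusters land on different map cells
```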

Recurrent Neural Network (RNN)


An RNN works on the principle of saving the output of a particular layer and feeding
it back to the input in order to predict the output of the layer. Below is how a
feed-forward neural network can be converted into a recurrent neural network:

The nodes in the different layers of the feed-forward network are compressed to form a
single recurrent layer. Here, “x” is the input layer, “h” is the hidden layer, and “y” is the
output layer. A, B, and C are the network parameters, which are learned to improve the output
of the model. At any given time t, the hidden state combines the current input x(t) with the
previous hidden state h(t−1), which summarizes the earlier inputs. The state computed at each
time step is fed back into the network at the next step.
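The recurrence can be written out as a short forward pass. This sketch assumes the common Elman form, with a tanh hidden layer and a linear output. The weight names `W_xh`, `W_hh`, and `W_hy` are conventional labels chosen here; they play roles analogous to the parameters A, B, and C in the figure, whose exact assignment the notes do not spell out. The key points are that the same weights are reused at every time step and that h carries information from earlier inputs forward.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Forward pass of a simple (Elman) RNN over a sequence (illustrative sketch).

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    y_t = W_hy h_t + b_y
    """
    h = np.zeros(W_hh.shape[0])          # initial hidden state h_0
    hs, ys = [], []
    for x in xs:                          # the same weights are reused at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    return np.array(hs), np.array(ys)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_size, hidden_size, output_size, T = 3, 4, 2, 5
params = dict(
    W_xh=rng.normal(scale=0.5, size=(hidden_size, input_size)),
    W_hh=rng.normal(scale=0.5, size=(hidden_size, hidden_size)),
    W_hy=rng.normal(scale=0.5, size=(output_size, hidden_size)),
    b_h=np.zeros(hidden_size),
    b_y=np.zeros(output_size),
)
xs = rng.normal(size=(T, input_size))
hs, ys = rnn_forward(xs, **params)
print(hs.shape, ys.shape)   # (5, 4) (5, 2)
```

Changing an input at step t affects the outputs from step t onward but not the earlier ones, which is exactly the sequential dependence described above.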
Vanishing Gradient Problem
Recurrent Neural Networks enable you to model time-dependent and sequential
data problems, such as stock market prediction, machine translation, and text generation.
However, RNNs are hard to train because of gradient problems. In particular, RNNs suffer
from the problem of vanishing gradients. The gradients carry the error signal used to update
the RNN's parameters, and when the gradient becomes too small, the parameter updates become
insignificant. This makes learning long data sequences difficult.
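A scalar example makes the vanishing-gradient effect visible. For a hypothetical one-unit RNN h_t = tanh(w·h_{t-1} + u·x), backpropagation through time multiplies the gradient by w(1 - h_t²) at every step. With |w| < 1, each factor is smaller than 1 in magnitude, so the gradient reaching early time steps shrinks exponentially with the sequence length.

```python
import numpy as np

def gradient_through_time(w, T, x=0.1, u=1.0):
    """d h_T / d h_0 for the hypothetical scalar RNN h_t = tanh(w*h_{t-1} + u*x).

    The same constant input x is fed at every step (an illustrative assumption).
    """
    h, grad = 0.0, 1.0
    for _ in range(T):
        h = np.tanh(w * h + u * x)
        grad *= w * (1.0 - h ** 2)   # chain rule: one factor dh_t/dh_{t-1} per step
    return grad

for T in (1, 10, 50):
    print(T, gradient_through_time(0.5, T))
```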

Exploding Gradient Problem


While training a neural network, if the gradient grows exponentially instead of
decaying, this is called an exploding gradient. This problem arises when large error
gradients accumulate, resulting in very large updates to the neural network model weights
during the training process. Gradient problems typically lead to long training times, poor
performance, and low accuracy.
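The opposite effect can be sketched with a linear recurrence h_t = W·h_{t-1}, for which each step of backpropagation multiplies the error signal by the transpose of W. Using a hypothetical W that is a rotation scaled by 1.5 (so it stretches every vector by exactly 1.5), the gradient norm grows as 1.5^T; scaling the same rotation by 0.5 instead makes it vanish.

```python
import numpy as np

def backprop_norms(W, T):
    """Norms of an error signal propagated back T steps through h_t = W h_{t-1}."""
    g = np.ones(W.shape[0])          # error signal at the last time step
    norms = []
    for _ in range(T):
        g = W.T @ g                  # one step back in time: multiply by W^T
        norms.append(np.linalg.norm(g))
    return np.array(norms)

# Hypothetical recurrent matrices: a rotation scaled up or down.
angle = 0.3
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
exploding = backprop_norms(1.5 * rotation, 20)
vanishing = backprop_norms(0.5 * rotation, 20)
print(exploding[-1], vanishing[-1])
```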

Neural Network based controller: A neural network-based controller is a type of control


system that employs neural networks to make decisions and control a process or system.
Neural networks are computational models inspired by the human brain's structure and
functioning. They are capable of learning and generalizing patterns from input data, which
makes them suitable for a wide range of applications, including control systems. In a neural
network-based controller, the neural network takes input signals from the environment or
the system being controlled and processes them through a series of interconnected layers of
neurons. Each neuron applies a weighted transformation to its inputs and passes the result
through an activation function to produce an output.

Here's a general outline of how a neural network-based controller works:

1. Input Signals: The controller receives input signals from the environment or sensors
that provide information about the system's current state or the task being performed.

2. Neural Network Architecture: The neural network's architecture consists of input
nodes (neurons), hidden layers (if applicable), and output nodes (neurons). The
structure and size of the neural network depend on the complexity of the control task
and the characteristics of the input data.

3. Weights and Biases: Each connection between neurons in the neural network has
associated weights and biases that determine how the input data is processed and
transformed during forward propagation.

4. Forward Propagation: The input signals propagate through the neural network from
the input layer to the output layer. Each layer's neurons perform calculations based on
the input data and their respective weights and biases.

5. Activation Function: After the weighted sums of inputs are computed in each neuron,
an activation function is applied to introduce non-linearity and, for some functions, a
threshold for the neuron's activation.

6. Output Generation: The output layer produces the final control signals or decisions
based on the processed input signals and learned patterns within the neural network.

7. Training: Before the neural network-based controller can be used effectively, it needs
to be trained using a labeled dataset or reinforcement learning. The training process
adjusts the weights and biases in the neural network to minimize the error or
maximize the performance of the controller on the task.

8. Feedback and Iteration: In some control scenarios, the controller may receive
feedback from the system it controls or from external sources. This feedback can be
used to fine-tune the neural network and improve its performance through iterative
learning.
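Steps 1 to 6 can be illustrated with a tiny controller in a closed loop. In this sketch, the network receives the tracking error (setpoint minus measured state) as its input signal, passes it through a hidden layer of two tanh neurons, and outputs a control signal. The plant (a simple first-order system x' = u, integrated with Euler steps), the class and function names, and the weights are all hypothetical. The weights are hand-picked rather than learned, so step 7 is not shown.

```python
import numpy as np

class NNController:
    """Tiny feedforward controller: error in, control signal out.

    Hand-picked (hypothetical) weights; a real controller would learn them.
    """

    def __init__(self):
        self.W1 = np.array([[1.5], [0.5]])   # input -> hidden (2 neurons)
        self.b1 = np.zeros(2)
        self.W2 = np.array([[1.0, 1.0]])     # hidden -> output (1 neuron)
        self.b2 = np.zeros(1)

    def __call__(self, error):
        x = np.array([error])                          # input signal
        hidden = np.tanh(self.W1 @ x + self.b1)        # weighted sum + activation
        return float((self.W2 @ hidden + self.b2)[0])  # control signal

def simulate(controller, setpoint=1.0, x0=0.0, dt=0.1, steps=200):
    """Closed loop with an assumed first-order plant x' = u (Euler steps)."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = controller(setpoint - x)   # sensor reading -> control decision
        x = x + dt * u                 # plant responds to the control input
        trajectory.append(x)
    return np.array(trajectory)

traj = simulate(NNController())
print(traj[-1])
```

With these weights the network behaves like a saturating proportional controller, and the simulated state settles at the setpoint of 1.0. A trained controller would instead obtain its weights from labeled data or reinforcement learning, as described in step 7.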

Neural network-based controllers have been successfully applied to various control
tasks, including robotics, autonomous vehicles, industrial process control, and game playing.
They offer advantages such as adaptability, generalization, and the ability to handle complex
and non-linear control problems. However, their effectiveness depends on the quality of
training data, architecture design, and optimization techniques used during training.
