SCT UNIT-2
Dendrites in a biological neural network correspond to the inputs of an artificial neural
network, the cell nucleus to its nodes, the synapses to its weights, and the axon to its output.
The human brain contains roughly 86 billion neurons (a figure often rounded to 100 billion).
Each neuron has somewhere between 1,000 and 100,000 connection points. In the human brain,
data is stored in a distributed manner, and we can retrieve more than one piece of it from
memory in parallel when necessary. In this sense, the human brain is made up of remarkably
capable parallel processors. To understand the artificial neural network, consider a digital
logic gate that takes inputs and gives an output. An "OR" gate takes two inputs. If one or both
of the inputs are "On," the output is "On." If both inputs are "Off," the output is "Off." Here
the output depends only on the input. Our brain does not work this way: the relationship
between inputs and outputs keeps changing because the neurons in our brain are "learning."
Hidden Layer: The hidden layer lies between the input and output layers. It performs all
the calculations needed to find hidden features and patterns.
Output Layer: The input goes through a series of transformations in the hidden layer,
and the final result is conveyed through this layer.
The artificial neural network takes the inputs, computes their weighted sum,
and adds a bias. This computation is represented in the form of a transfer function.
Working of ANN: An artificial neural network is best represented as a weighted directed
graph, where the artificial neurons form the nodes. The associations between neuron
outputs and neuron inputs can be viewed as directed edges with weights. The artificial
neural network receives the input signal from an external source in the form of a pattern or
image, represented as a vector. These inputs are denoted mathematically by
x(n) for each of the n inputs. Each input is then multiplied by its
corresponding weight (the weights are the information the artificial neural network uses
to solve a specific problem). In general terms, the weights represent the strength
of the interconnections between neurons inside the artificial neural network. All the weighted
inputs are summed inside the computing unit.
A bias is added to the weighted sum so that the output need not be zero when the weighted sum
is zero, and so that the system's response can be shifted. The bias behaves like an additional
input whose value is always 1. In general, the total of the weighted inputs is unbounded and
can range from negative to positive infinity. To keep the response within the limits of the
desired value, a certain maximum value is benchmarked, and the total of weighted inputs is
passed through the activation function. The activation function refers to the set of transfer
functions used to achieve the desired output. There are different kinds of activation function,
but they are primarily either linear or non-linear. Some commonly used activation functions
are the binary, linear, and tan hyperbolic sigmoidal activation functions.
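The computation described above (weighted sum, plus bias, passed through an activation function) can be sketched in a few lines of Python. This is only an illustration: the input values, weights and bias are hypothetical, and the binary function is assumed to switch at zero.

```python
import math

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

def binary_step(s):
    # Binary activation: 1 at or above zero, 0 below (assumed threshold).
    return 1.0 if s >= 0.0 else 0.0

def linear(s):
    # Linear activation: passes the weighted sum through unchanged.
    return s

def tanh_sigmoid(s):
    # Tan hyperbolic activation: squashes the sum into (-1, 1).
    return math.tanh(s)

x = [0.5, -1.0, 2.0]   # hypothetical input vector
w = [0.4, 0.3, 0.1]    # hypothetical weights
b = 0.2                # hypothetical bias

for f in (binary_step, linear, tanh_sigmoid):
    print(f.__name__, neuron_output(x, w, b, f))
```

The same weighted sum gives three different responses, which is why the choice of activation function determines the range of the neuron's output.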
X1: Is it raining?
X2 : Is it sunny?
So, the value of each input can be either 0 or 1. We can set both weights, w1 and w2, to 1
and use a threshold of 1. So, the neural network model will look like:
$$y_{in} = \sum_{i=1}^{2} w_i x_i$$
$$y_{out} = f(y_{in}) = \begin{cases} 1, & y_{in} \ge 1 \\ 0, & y_{in} < 1 \end{cases}$$
The truth table built for this problem is depicted above. From the truth table, we
can conclude that whenever the value of yout is 1, John needs to carry an
umbrella. Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.
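The umbrella example can be checked with a short Python sketch. It assumes the setup stated in the text (both weights equal to 1, threshold equal to 1) and assumes the four scenarios are numbered in the order (0,0), (0,1), (1,0), (1,1); the function name is illustrative.

```python
def mcculloch_pitts(x1, x2, w1=1, w2=1, threshold=1):
    """Threshold unit: fires (1) when the weighted sum reaches the threshold."""
    y_in = w1 * x1 + w2 * x2
    return 1 if y_in >= threshold else 0

# Enumerate the four scenarios: X1 = raining?, X2 = sunny?
truth_table = [(x1, x2, mcculloch_pitts(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]

print("scenario  x1  x2  y_out")
for scenario, (x1, x2, y_out) in enumerate(truth_table, start=1):
    print(f"{scenario:>8}  {x1:>2}  {x2:>2}  {y_out:>5}")
```

Under these assumptions the output is 1 in scenarios 2, 3 and 4, which matches the conclusion drawn from the truth table.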
Rosenblatt’s Perceptron:
Rosenblatt’s perceptron is built around the McCulloch-Pitts neural model. The diagrammatic
representation is as follows:
The perceptron receives a set of inputs x1, x2, ..., xn. The linear combiner, or adder node,
computes the linear combination of the inputs applied to the synapses, with synaptic weights
w1, w2, ..., wn. Then the hard limiter checks whether the resulting sum is positive or
negative. If the input of the hard limiter node is positive, the output is +1, and if the input is
negative, the output is -1. Mathematically, the hard limiter input is:
$$v = \sum_{i=1}^{n} w_i x_i$$
and the output is:
$$y_{out} = f(v) = \begin{cases} +1, & v \ge 0 \\ -1, & v < 0 \end{cases}$$
The objective of the perceptron is to classify a set of inputs into two classes, c1 and c2. This
can be done using a very simple decision rule: assign the inputs to c1 if the output of the
perceptron, yout, is +1, and to c2 if yout is -1. So for an n-dimensional signal space, i.e. a
space for ‘n’ input signals, the simplest form of perceptron will have two decision regions,
corresponding to the two classes, separated by a hyperplane defined by:
$$\sum_{i=0}^{n} w_i x_i = 0$$
Therefore, for two input signals denoted by the variables x1 and x2, the decision boundary is a
straight line of the form:
$$w_0 x_0 + w_1 x_1 + w_2 x_2 = 0$$
$$w_0 + w_1 x_1 + w_2 x_2 = 0 \quad (\text{if } x_0 = 1)$$
So, for a perceptron whose synaptic weights w0, w1 and w2 are -2, 1/2 and 1/4,
respectively, the linear decision boundary will be of the form:
$$-2 + \tfrac{1}{2} x_1 + \tfrac{1}{4} x_2 = 0, \quad \text{i.e.} \quad 2 x_1 + x_2 = 8$$
So, any point (x1, x2) which lies above the decision boundary, as depicted by the graph, will be
assigned to class c1, and the points which lie below the boundary are assigned to class
c2.
Thus, we see that for a data set with linearly separable classes, perceptrons can always be
employed to solve classification problems using decision lines (for 2-dimensional space),
decision planes (for 3-dimensional space) or decision hyperplanes (for n-dimensional
space).
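As a rough illustration of the decision rule, the Python sketch below classifies points using the example weights w0 = -2, w1 = 1/2 and w2 = 1/4 with x0 = 1. It assumes the hard limiter switches at zero, in line with the boundary equation; the function name and the test points are hypothetical.

```python
def perceptron_class(x1, x2, w0=-2.0, w1=0.5, w2=0.25):
    """Assign (x1, x2) to c1 or c2 using the sign of the hard limiter input."""
    v = w0 * 1.0 + w1 * x1 + w2 * x2   # x0 is fixed at 1
    return "c1" if v >= 0 else "c2"

# (4, 4) lies above the line 2*x1 + x2 = 8; (1, 1) lies below it.
print(perceptron_class(4, 4))   # v = -2 + 2 + 1 = 1
print(perceptron_class(1, 1))   # v = -2 + 0.5 + 0.25 = -1.25
```

Points for which v is positive fall on the c1 side of the line, and points for which v is negative fall on the c2 side.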
Adaline (Adaptive Linear Neuron):
A network with a single linear unit is called Adaline (Adaptive Linear Neuron). A unit with
a linear activation function is called a linear unit. In Adaline, there is only one output unit,
and output values are bipolar (+1, -1). Weights between the input units and the output unit are
adjustable. It uses the delta rule, i.e. wi(new) = wi(old) + α(t - yin)xi, where wi, yin, t and α
are the weight, the net input to the output unit, the target value and the learning rate,
respectively. The learning rule minimizes the mean square error between the activation and
the target values. Adaline has trainable weights: it compares the computed output with the
target output, and the training algorithm is applied based on the error.
First, calculate the net input to the Adaline network, then apply the activation function to
it, and compare the result with the target output. If the two are equal, give the output;
otherwise send the error back to the network and update the weights according to the error,
using the delta learning rule, i.e. wi(new) = wi(old) + α(t - yin)xi.
In Adaline, every input neuron is directly connected to the output neuron through a weighted
path. A bias b, whose input is fixed at 1, is also present.
Algorithm:
Step 1: Initialize the weights to small random values, not zero. Set the learning rate α.
$$y_{in} = \sum_{i=1}^{n} w_i x_i + b$$
When the predicted output and the true value are the same, the weights do not change.
Step 7: Test the stopping condition. The stopping condition may be that the weights
change very little or not at all.
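A minimal Python sketch of delta-rule training is given below. It is an illustration, not the exact algorithm of these notes: the bipolar AND data set, the initial weights of 0.1, the learning rate of 0.1 and the fixed count of 50 epochs (used in place of a weight-change test) are all assumptions.

```python
def train_adaline(samples, alpha=0.1, epochs=50):
    """Train one linear unit with the delta rule w += alpha * (t - y_in) * x."""
    n = len(samples[0][0])
    w = [0.1] * n   # small non-zero initial weights (assumed values)
    b = 0.1         # bias weight; its input is fixed at 1
    for _ in range(epochs):
        for x, t in samples:
            y_in = sum(wi * xi for wi, xi in zip(w, x)) + b   # net input
            error = t - y_in
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
            b = b + alpha * error
    return w, b

def predict(w, b, x):
    """Bipolar output: +1 if the net input is non-negative, otherwise -1."""
    y_in = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if y_in >= 0 else -1

# Bipolar AND function as a toy training set: (inputs, target).
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_adaline(and_samples)
predictions = [predict(w, b, x) for x, _ in and_samples]
print(w, b, predictions)
```

Because the error is computed from the net input rather than from the thresholded output, the weights keep moving toward the least-squares solution even after every sample is classified correctly.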
Madaline has three types of layer. The first, the input layer, contains all the input
neurons. The second, the hidden layer, consists of Adaline units; the weights between the
input and hidden layers are adjustable. The third is the output layer; the weights between
the hidden and output layers are fixed and not adjustable.
Algorithm:
Step 1: Initialize the weights and set the learning rate α.
v1 = v2 = 0.5, b3 = 0.5
The other weights may be small random values.
Step 2: While the stopping condition is false, do steps 3 to 9.
Step 3: For each training pair, perform steps 4 to 8.
Step 4: Set the activation of each input unit: xi = si for i = 1 to n.
Step 5: Compute the net input of each hidden Adaline unit.
Step 6: Find the output of each hidden Adaline unit using the activation function given below:
Activation function f(z)
z1=f(zin1)
z2=f(zin2)
Step 7: Calculate the net input to output.
yin = b3 + z1v1 + z2v2
Apply activation to get the output of the net
y=f(yin)
Step 8: Find the error and update the weights.
If t = y, no update is made.
If t ≠ y and t = 1, update the weights on the unit zj whose net input is closest to 0:
wij(new) = wij(old) + α(t - zinj)xi
bj(new) = bj(old) + α(t - zinj)
If t ≠ y and t = -1, update the weights on all units zk that have a positive net input.
Step 9: Test the stopping condition, for example no weight change or a set number of epochs
completed.
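Steps 4 to 7 (the forward pass) can be sketched in Python as follows. Only v1 = v2 = 0.5 and the output bias of 0.5 come from the algorithm above; the hidden-layer weights and biases are hypothetical, and the activation f is assumed to be a bipolar step that returns +1 for a non-negative net input. The weight update of Step 8 is not shown.

```python
def bipolar_step(z):
    # Assumed activation: +1 for a non-negative net input, -1 otherwise.
    return 1 if z >= 0 else -1

# Hypothetical hidden-layer parameters: w[i][j] connects input i to Adaline unit j.
w = [[0.05, 0.1], [0.2, 0.2]]
b = [0.3, 0.15]
# Fixed output-layer parameters from Step 1.
v = [0.5, 0.5]
b3 = 0.5

def madaline_forward(x):
    """Return hidden net inputs, hidden outputs and the network output."""
    z_in = [b[j] + sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(b))]
    z = [bipolar_step(value) for value in z_in]
    y_in = b3 + sum(z[j] * v[j] for j in range(len(z)))
    return z_in, z, bipolar_step(y_in)

z_in, z, y = madaline_forward([1, -1])
print(z_in, z, y)
```

With v1 = v2 = 0.5 and an output bias of 0.5, the output unit returns -1 only when both hidden units return -1, so under these assumptions it acts as a fixed OR of the hidden Adaline outputs.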
MLPs are foundational models in deep learning, serving as a basis for more complex
architectures like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs).
A Feedforward Multilayer Perceptron (MLP) is a type of artificial neural network where the
connections between the nodes do not form a cycle, making it a straightforward feedforward
structure. Here’s a breakdown of its components and how it works:
1. Architecture:
Layers: An MLP consists of at least three layers of nodes: an input layer, one or more hidden
layers, and an output layer.
• Input Layer: This layer receives the input data. Each node in this layer
represents a feature in the input data.
• Hidden Layers: These layers are where the learning happens. The nodes in
each hidden layer apply a weighted sum and a non-linear activation function to
their inputs. An MLP can have multiple hidden layers, making it a deep neural
network.
• Output Layer: This layer produces the final output of the network. The number
of nodes in this layer corresponds to the number of classes in a classification
task or the number of output variables in a regression task.
2. Feedforward Process:
• In an MLP, information moves in one direction—from the input layer, through the
hidden layers, to the output layer—hence the term "feedforward."
• Each node in a layer is connected to every node in the next layer through a set of
weights. The output of each node is a function of the weighted sum of its inputs.
3. Activation Functions:
Activation functions are used in the hidden layers to introduce non-linearity into the model,
enabling the network to learn complex patterns. Common activation functions include:
4. Training Process:
The MLP is trained using backpropagation, a process that involves two main steps:
• Forward Pass: The input is passed through the network, and the output is
computed.
• Backward Pass: The error between the predicted output and the actual target
is calculated, and the network's weights are adjusted in the opposite direction
of the gradient of the error with respect to the weights (using gradient descent or a
variant).
5. Learning:
• During training, the MLP learns by updating the weights on each connection to
minimize the difference between the predicted output and the actual target values.
• This process is iterative and continues until the model reaches an acceptable level of
accuracy or another stopping criterion is met.
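The forward pass, backward pass and weight update described in sections 4 and 5 can be sketched with NumPy as below. Everything specific here is an assumption made for illustration: the XOR toy data, one hidden layer of four tanh units, a sigmoid output, mean squared error, a learning rate of 0.5 and 200 iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # toy inputs
T = np.array([[0.0], [1.0], [1.0], [0.0]])                      # toy targets (XOR)

# One hidden layer with 4 tanh units and one sigmoid output unit.
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)
learning_rate = 0.5
losses = []

for epoch in range(200):
    # Forward pass: weighted sums followed by non-linear activations.
    H = np.tanh(X @ W1 + b1)
    Y = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
    losses.append(float(np.mean((Y - T) ** 2)))

    # Backward pass: gradients of the mean squared error via the chain rule.
    dZ2 = 2.0 * (Y - T) / len(X) * Y * (1.0 - Y)   # through the sigmoid
    dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)            # through the tanh

    # Gradient descent: step against the gradient.
    W2 -= learning_rate * (H.T @ dZ2)
    b2 -= learning_rate * dZ2.sum(axis=0)
    W1 -= learning_rate * (X.T @ dZ1)
    b1 -= learning_rate * dZ1.sum(axis=0)

print(losses[0], losses[-1])
```

The recorded losses show the iterative nature of training: each pass through the loop reuses the current weights, measures the error, and nudges the weights to reduce it.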
6. Applications:
Limitations:
• Overfitting: MLPs can overfit the training data, especially if they have too many
parameters relative to the amount of training data.
Advantages:
• Flexibility: MLPs can model a wide range of functions due to their non-linear
activation functions and multiple layers.
• Generalization: When trained properly, MLPs can generalize well to unseen data.
• Supervised Learning : As the name suggests, this type of learning is done under the
supervision of a teacher. This learning process is dependent.
• During the training of ANN under supervised learning, the input vector is presented to
the network, which will give an output vector.
• This output vector is compared with the desired output vector. An error signal is
generated, if there is a difference between the actual output and the desired output
vector.
• On the basis of this error signal, the weights are adjusted until the actual output is
matched with the desired output.
Unsupervised Learning
• As the name suggests, this type of learning is done without the supervision of a
teacher.
• During the training of ANN under unsupervised learning, the input vectors of similar
type are combined to form clusters.
• When a new input pattern is applied, then the neural network gives an output response
indicating the class to which the input pattern belongs.
• There is no feedback from the environment as to what the desired output should be
or whether it is correct or incorrect. Hence, in this type of learning, the network
itself must discover the patterns and features in the input data, and the relation of
the input data to the output.
Reinforcement Learning
• As the name suggests, this type of learning is used to reinforce or strengthen the
network over some critic information.
• This learning process is similar to supervised learning; however, we might have much
less information.
• During the training of network under reinforcement learning, the network receives
some feedback from the environment.
• However, the feedback obtained here is evaluative not instructive, which means there
is no teacher as in supervised learning.
• After receiving the feedback, the network performs adjustments of the weights to get
better critic information in future.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher.
This learning process is dependent.
During the training of ANN under supervised learning, the input vector is presented to
the network, which will give an output vector. This output vector is compared with the
desired output vector. An error signal is generated, if there is a difference between the actual
output and the desired output vector. On the basis of this error signal, the weights are adjusted
until the actual output is matched with the desired output.
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a
teacher. This learning process is independent. During the training of ANN under
unsupervised learning, the input vectors of similar type are combined to form clusters. When
a new input pattern is applied, then the neural network gives an output response indicating the
class to which the input pattern belongs. There is no feedback from the environment as to
what should be the desired output and if it is correct or incorrect. Hence, in this type of
learning, the network itself must discover the patterns and features from the input data, and
the relation for the input data over the output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the
network over some critic information. This learning process is similar to supervised learning;
however, we might have much less information. During the training of the network under
reinforcement learning, the network receives some feedback from the environment. This
makes it somewhat similar to supervised learning. However, the feedback obtained here is
evaluative not instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network performs adjustments of the weights to get better critic
information in future.
Data Processing:
Scaling:
Machine learning algorithms often rely on the quality and distribution of input features to
make accurate predictions. However, not all features are created equal. Feature scaling, a
crucial preprocessing step, ensures that features are transformed into a consistent range,
allowing machine learning models to perform optimally. This section covers the
significance of feature scaling in machine learning, its impact on different algorithms,
and popular scaling techniques.
Feature scaling is important because many machine learning algorithms are sensitive to the
scale of input features. When features have different scales or ranges, it can negatively impact
the performance of these algorithms. For example, distance-based algorithms such as K-
Nearest Neighbors (KNN) or Support Vector Machines (SVM) calculate distances between
data points, and if the features have different scales, the distances may be dominated by the
features with larger scales. This can lead to biased results and inaccurate predictions.
When features are not scaled, it can have several negative impacts on machine learning
models. Distance-based algorithms, such as K-Nearest Neighbors (KNN) and Support Vector
Machines (SVM), may be dominated by features with larger scales, leading to biased
decisions. Gradient-based algorithms, like Linear Regression, Logistic Regression, and
Neural Networks, can experience slower convergence if features have different scales.
Tree-based algorithms, such as Decision Trees and Random Forests, are generally robust to
feature scaling, because each split compares a single feature against a threshold.
Standardization
Standardization rescales each feature so that it has zero mean and unit standard deviation:
$$X_{scaled} = \frac{X - \text{mean}(X)}{\text{std}(X)}$$
In this formula, X represents the original feature values, mean(X) is the mean of the feature
values, and std(X) is the standard deviation of the feature values. To standardize your
data, you can use the StandardScaler class from the sklearn library.
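A minimal sketch of standardization with StandardScaler follows. The original dataset is not reproduced in these notes, so the small two-column array (labeled as age and fare values) is hypothetical. The manual computation is included only to show that the scaler applies the formula above, using the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one row per sample, columns are age and fare.
X = np.array([[22.0, 7.25],
              [38.0, 71.28],
              [26.0, 7.93],
              [35.0, 53.10]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)   # learns mean and std per column, then scales

# The same result computed directly from the formula.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

In practice the scaler is usually fitted on the training data only and then reused, via `scaler.transform`, on validation and test data so that all splits share the same mean and standard deviation.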
By comparing the original data graph with the standardized data graph, the following changes
can be observed:
• The scale of the features has been altered in the standardized data. The values on both the x-
axis and y-axis are now in terms of standard deviations.
• The shape of the distribution is unchanged, because standardization only shifts and
rescales the values. The relationship between the features is preserved, but the data is
now on a standardized scale.
When outliers are present in the data, they can have extreme values that are far from the
mean. Standardization uses the mean and standard deviation of the entire
dataset, including the outliers, so the outliers influence the scaling and are not removed or
capped by it. They are rescaled like every other point and remain far from the mean when
measured in standard deviations.
Normalization
Normalization, also known as min-max scaling, is a feature scaling technique that rescales
the data to a specific range, typically between 0 and 1. It is particularly useful when the
feature values have different ranges and it’s necessary to bring them to a common scale. The
formula for normalization is as follows:
$$X_{scaled} = \frac{X - \min(X)}{\max(X) - \min(X)}$$
In this formula, X represents the original feature values, min(X) is the minimum value of the
feature, and max(X) is the maximum value of the feature.
To perform data normalization, you can import the MinMaxScaler class from the sklearn
library and utilize it to transform your dataset. Let’s proceed with applying the
MinMaxScaler to achieve normalization on our data.
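A minimal sketch of min-max scaling with MinMaxScaler is shown below. As the original dataset is not included in these notes, the two-column array of age and fare values is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: one row per sample, columns are age and fare.
X = np.array([[22.0, 7.25],
              [38.0, 71.28],
              [26.0, 7.93],
              [35.0, 53.10]])

scaler = MinMaxScaler()            # default feature_range is (0, 1)
X_norm = scaler.fit_transform(X)   # learns min and max per column, then rescales

# The same result computed directly from the formula.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_norm)
print(X_norm.min(axis=0), X_norm.max(axis=0))
```

Each column is rescaled independently, so the smallest value in every column maps to 0 and the largest to 1.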
Let’s examine the impact of normalization on our dataset. After applying normalization, we
can observe that all the features now possess a minimum value of 0 and a maximum value of
1. This normalization process has successfully rescaled the data within the desired range.
The visualization allows us to observe the changes in the data distribution after Min-Max
scaling. In the original data plot, we can see the density estimates of the original ‘Age’ and
‘Fare’ features. In the Min-Max scaled data plot, we can observe the density estimates of the
corresponding scaled features.
By comparing the original and scaled data plots, we can observe that the Min-Max scaling
process transforms the data distribution. The scaled data is compressed within the range of 0
to 1, and the density estimates are adjusted accordingly.
The Fourier transform is a neural network:
We can consider the discrete Fourier transform (DFT) to be an artificial neural network: it is
a single layer network, with no bias, no activation function, and particular values for the
weights. The number of output nodes is equal to the number of frequencies we evaluate.
For a signal x of length N, the DFT is defined as:
$$y_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2 \pi k n / N}$$
(A signal can be written as the sum of sinusoids. yk is a complex value that gives us
information about the sinusoid of frequency k in signal x; from yk we can compute the
amplitude and phase (i.e. location) of the sinusoid.)
This gives us the Fourier value for a particular k. However, we most commonly want to
compute the full frequency spectrum, i.e. values of k from [0, 1, ..., N-1]. We can use a
matrix W for this (k is incremented column-wise, and n row-wise), whose entry in row n and
column k is $e^{-i 2 \pi k n / N}$.
More concisely:
$$y = x W$$
This should look familiar, because it is a neural network layer with no activation function and
no bias. The matrix of exponentials contains our weights, which we’ll call “complex Fourier
weights”. Usually we don’t know the weights of our neural networks in advance, but in this
case we do.
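The claim can be checked numerically. The NumPy sketch below builds the matrix of complex exponentials, applies it as a single layer with no bias and no activation, and compares the result with NumPy's FFT. The function name and the random test signal are illustrative.

```python
import numpy as np

def fourier_weights(N):
    """Matrix of complex Fourier weights: row index n, column index k."""
    n = np.arange(N).reshape(-1, 1)
    k = np.arange(N).reshape(1, -1)
    return np.exp(-2j * np.pi * k * n / N)

x = np.random.default_rng(0).normal(size=8)   # arbitrary test signal
W = fourier_weights(len(x))

# "Forward pass" of the single-layer network: one matrix multiplication.
y = x @ W

print(np.allclose(y, np.fft.fft(x)))
```

The FFT computes the same values with far fewer operations; the matrix form is useful mainly because it makes the correspondence with a dense layer explicit.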
Principal-Component Analysis:
Principal components analysis (PCA) is a statistical technique that identifies the
underlying linear patterns in a data set so that it can be expressed in terms of another data
set of significantly lower dimension without much loss of information. The final data set
should explain most of the variance of the original data set while using fewer variables. The
final variables will be named as principal components. The following diagram summarizes
the activities that need to be performed in principal components analysis.
1. Subtract mean
The first step in the principal component analysis is to subtract the mean for each variable of
the data set.
The covariance of two random variables measures how the two variables vary together about
their means. The sign of the covariance provides us with information about the relation
between them:
If the covariance is positive, then the two variables increase and decrease together.
If the covariance is negative, then when one variable increases, the other decreases,
and vice versa.
These values determine the linear dependencies between the variables, which will be used to
reduce the data set's dimension. The variance is a measure of how the data is spread from the
mean.
The diagonal values show the covariance of each variable and itself, and they equal their
variance.
The off-diagonal values show the covariance between the two variables. In this case, these
values are positive, which means that both variables increase and decrease together.
Eigenvectors of a linear transformation are those vectors whose direction remains unchanged
when the transformation is applied. Their length, however, may change:
the result of the transformation is the vector multiplied by a scalar. This
scalar is called the eigenvalue, and each eigenvector has one associated with it.
Among the available eigenvectors that were previously calculated, we must select those onto
which we project the data. The selected eigenvectors will be called principal components.
To establish a criterion to select the eigenvectors, we must first define the relative variance of
each and the total variance of a data set. The relative variance of an eigenvector measures
how much information can be attributed to it. The total variance of a data set is the sum of the
variance of all the variables.
Once we have selected the principal components, the data must be projected onto them. The
following image shows the result of this projection for our example.
Although this projection can explain most of the variance of the original data, we have lost
the information about the variance along the second component. In general, this process
is irreversible, which means that we cannot recover the original data from the projection.
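The steps described in this section (subtract the mean, compute the covariance matrix, find the eigenvectors, select components, project) can be sketched with NumPy as follows. The ten two-dimensional points are hypothetical toy data, not the example from the figures, and keeping a single component is an assumption made for illustration.

```python
import numpy as np

# Hypothetical toy data: 10 samples, 2 correlated variables.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# 1. Subtract the mean of each variable.
centered = X - X.mean(axis=0)

# 2. Covariance matrix (variables are columns).
cov = np.cov(centered, rowvar=False)

# 3. Eigenvalues and eigenvectors, sorted by decreasing eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Relative variance explained by each eigenvector.
relative_variance = eigenvalues / eigenvalues.sum()

# 5. Project the data onto the first principal component only.
projected = centered @ eigenvectors[:, :1]

print(relative_variance)
print(projected.shape)
```

The variance of the projected data equals the largest eigenvalue, and the variance along the discarded component is the information lost in the projection.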
Wavelet transformations:
Wavelet transformations have been explored in the context of artificial neural networks
(ANNs) to enhance their performance in various applications, especially in signal and image
processing tasks. Wavelet transformations are mathematical operations that decompose
signals or images into different frequency components. These transformations have a unique
property of preserving both time (spatial) and frequency (scale) information simultaneously
as represented in equation.
3. Denoising and compression: Wavelets are widely used for denoising and data
compression tasks. By incorporating wavelets into neural networks, these models can
learn to denoise or compress data more effectively.
4. Image processing: In computer vision tasks, wavelet transformations can be used for
image enhancement, edge detection, and feature extraction, which can benefit
subsequent neural network-based processing.
1. Wavelet as input: The neural network can take the wavelet-transformed input data
directly and learn to operate on the transformed coefficients.
However, it is worth noting that while the idea of using wavelet transformations with neural
networks is promising, the success and effectiveness of these techniques depend on the
specific application and dataset.
Hopfield network: The Hopfield network is a type of recurrent artificial neural network
introduced by John Hopfield in 1982. It is a form of associative memory network that is
primarily used for content-addressable memory and pattern recognition tasks. Unlike
feedforward neural networks, which propagate information in a one-way direction, Hopfield
networks have feedback connections, allowing them to store and retrieve patterns in a
distributed manner.
Discrete Hopfield Network
A discrete Hopfield network operates in a discrete fashion; in other words, the
input and output patterns are discrete vectors, which can be either binary (0, 1) or
bipolar (+1, −1) in nature. The network has symmetrical weights with no self-connections,
i.e., wij = wji and wii = 0.
Architecture: Following are some important points to keep in mind about discrete Hopfield
network −
This model consists of neurons with one inverting and one non-inverting output.
The output of each neuron should be the input of other neurons but not the input of
self.
Testing Algorithm
Step 1 − Initialize the weights, which are obtained from training algorithm by using Hebbian
principle.
Step 2 − Perform steps 3-9 if the activations of the network have not converged.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make initial activation of the network equal to the external input vector X as follows
− yi=xi for i=1 to n.
Step 5 − For each unit Yi, perform steps 6-9.
Step 6 − Calculate the net input of the network as follows −
Step 7 − Apply the activation as follows over the net input to calculate the output –
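A minimal NumPy sketch of storage and recall in a discrete Hopfield network is given below. The net-input and activation formulas are not reproduced in these notes, so the forms used here are assumptions: bipolar patterns, Hebbian weights with a zero diagonal, net input equal to the external input plus the weighted activations of the other units, and a threshold of zero that leaves the activation unchanged on a tie.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian (outer-product) weights: symmetric, with no self-connections."""
    P = np.array(patterns, dtype=float)
    W = P.T @ P
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, max_sweeps=10):
    """Update one unit at a time until the activations stop changing."""
    y = np.array(x, dtype=float)       # initial activations equal the input
    for _ in range(max_sweeps):
        previous = y.copy()
        for i in range(len(y)):
            y_in = x[i] + W[:, i] @ y  # assumed net input of unit i
            if y_in > 0:
                y[i] = 1.0
            elif y_in < 0:
                y[i] = -1.0
            # y_in == 0: keep the previous activation
        if np.array_equal(y, previous):
            break
    return y

stored = [1, 1, -1, -1]    # hypothetical stored pattern
W = hebbian_weights([stored])
noisy = [1, -1, -1, -1]    # stored pattern with the second component flipped
print(recall(W, noisy))
```

Starting from the corrupted vector, the network settles back on the stored pattern, which is the content-addressable behaviour described above.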
Layers: A SONN has two layers: a fully connected input layer and an output (map)
layer. The output layer is termed the Kohonen layer.
Intralayer Connections: All the neurons in the output layer are connected within a specific
neighborhood with some topology. These lateral connections are unweighted but are
responsible for the competitive learning.
Lateral Feedback Connections: These connections generate excitatory and
inhibitory effects, based on the distance from the winning neuron. This is accomplished by
using a Mexican hat function, which describes the synaptic weights between
the neurons in the Kohonen layer.
Phases of SONN:
1. Learning phase: Construction of maps; the network is designed with a competitive
process using the training samples.
2. Prediction phase: Classification of new data; for the new data samples, a specific
location is provided on the converged map.
The nodes in the different layers of the neural network are compressed to form a single
recurrent layer. Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output
layer. A, B, and C are the network parameters used to improve the output of the model. At any
given time t, the current input is a combination of the input at x(t) and x(t-1). The output
at any given time is fed back into the network to improve the output.
Vanishing Gradient Problem
Recurrent Neural Networks enable you to model time-dependent and sequential
data problems, such as stock market prediction, machine translation, and text generation.
You will find, however, that RNNs are hard to train because they suffer from the problem of
vanishing gradients. The gradients carry the information used to update the RNN parameters,
and when the gradient becomes too small, the parameter updates become insignificant. This
makes learning long data sequences difficult.
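The effect can be illustrated with a deliberately simplified Python sketch. It assumes a scalar recurrent unit h(t) = tanh(w·h(t-1) + x) with a hypothetical recurrent weight w = 0.5 and constant input, and tracks how strongly the state at step t still depends on the initial state.

```python
import math

def gradient_through_time(w, steps, h0=0.5, x=0.1):
    """Return |d h_t / d h_0| for t = 1..steps in a scalar tanh RNN."""
    h = h0
    grad = 1.0
    history = []
    for _ in range(steps):
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)   # chain rule: d h_t / d h_(t-1)
        history.append(abs(grad))
    return history

grads = gradient_through_time(w=0.5, steps=50)
print(grads[0], grads[9], grads[49])
```

Every step multiplies the gradient by a factor smaller than 1, so after a few dozen steps the contribution of early inputs to the weight update is effectively zero.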
1. Input Signals: The controller receives input signals from the environment or sensors
that provide information about the system's current state or the task being performed.
3. Weights and Biases: Each connection between neurons in the neural network has
associated weights and biases that determine how the input data is processed and
transformed during forward propagation.
4. Forward Propagation: The input signals propagate through the neural network from
the input layer to the output layer. Each layer's neurons perform calculations based on
the input data and their respective weights and biases.
5. Activation Function: After the weighted sums of inputs are computed in each neuron,
an activation function is applied to introduce non-linearity and to set a threshold
for the neuron's activation.
6. Output Generation: The output layer produces the final control signals or decisions
based on the processed input signals and learned patterns within the neural network.
7. Training: Before the neural network-based controller can be used effectively, it needs
to be trained using a labeled dataset or reinforcement learning. The training process
adjusts the weights and biases in the neural network to minimize the error or
maximize the performance of the controller on the task.
8. Feedback and Iteration: In some control scenarios, the controller may receive
feedback from the system it controls or from external sources. This feedback can be
used to fine-tune the neural network and improve its performance through iterative
learning.