Report 2022
1960:
Widrow and Hoff developed the ADALINE (Adaptive Linear Neuron) model using the
least mean squares (LMS) algorithm for quick and accurate learning, with applications in
pattern recognition, weather forecasting, and adaptive controls.
1969:
Minsky and Papert highlighted the limitations of single-layer neural networks in their
book on perceptrons, causing a setback in ANN research.
Post-1969:
Despite the setback, researchers like Kohonen, Grossberg, Anderson, and Hopfield
continued their work, and multi-layer perceptron networks were found to solve nonlinear
problems.
1970s-1980s:
Research on threshold elements and neural network theory continued, with Kunihiko
Fukushima developing neocognitrons in 1980.
Late 1980s:
Demonstrations of ANN capabilities emerged, including text-to-speech conversion,
handwritten character recognition, and image compression, primarily using the
backpropagation algorithm. Backpropagation, developed independently by Werbos,
Parker, and Rumelhart, Hinton, and Williams, enabled the training of multi-layer
networks, overcoming limitations identified by Minsky and Papert.
Parallel operation
The NNs can process information in parallel, at high speed, and in a distributed
manner.
Mapping
The NNs exhibit mapping capabilities, that is, they can map input patterns to
their associated output patterns.
Generalization
The NNs possess the capability to generalize. Thus, they can predict new
outcomes from past trends. Once trained, a network’s response can be, to a
degree, insensitive to minor variations in its input. This ability to see through
noise and distortion to the pattern that lies within is vital to pattern recognition in
a real-world environment.
Robust
The NNs are robust systems and are fault tolerant. They can, therefore, recall full
patterns from incomplete, partial or noisy patterns.
Abstraction
Some ANNs are capable of abstracting the essence of a set of inputs, i.e. they
can extract features from the given set of data. For example, convolutional neural
networks are used to extract different features from images, such as edges, dark
spots and shapes. Such networks are trained on feature patterns, based on which
they can classify or cluster the given input set.
Applicability
ANNs are not a panacea. They are clearly unsuited to such tasks as calculating
the payroll. They are preferred for a large class of pattern-recognition tasks that
conventional computers do poorly, if at all.
1. Aerospace
High performance aircraft autopilots, flight path simulations, aircraft control systems,
autopilot enhancements, aircraft component simulations, aircraft component fault detectors.
2. Automotive
Automobile automatic guidance systems, fuel injector control, automatic braking systems,
misfire detection, virtual emission sensors, warranty activity analyzers.
3. Banking
Check and other document readers, credit application evaluators, cash forecasting, firm
classification, exchange rate forecasting, predicting loan recovery rates, measuring credit risk.
4. Defense
Weapon steering, target tracking, object discrimination, facial recognition, new kinds of
sensors, sonar, radar and image signal processing including data compression, feature
extraction and noise suppression, signal/image identification.
5. Electronics
Code sequence prediction, integrated circuit chip layout, process control, chip failure
analysis, machine vision, voice synthesis, nonlinear modelling.
6. Entertainment
7. Medical
Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of
transplant times, hospital expense reduction, hospital quality improvement, emergency room
test advisement.
8. Robotics
9. Speech
10. Telecommunications
Image and data compression, automated information services, real-time translation of spoken
language, customer payment processing systems.
Conclusion
The number of neural network applications, the money that has been invested in neural
network software and hardware, and the depth and breadth of interest in these devices are
enormous.
2. Human and Artificial Neurons - Investigating the similarities
Much is still unknown about how the brain trains itself to process information, so theories
abound.
The brain consists of a large number (approximately 10^11) of highly connected elements
(approximately 10^4 connections per element) called neurons. For our purposes these
neurons have three principal components: the dendrites, the cell body and the axon.
In the human brain, a typical neuron collects signals from others through a host of fine
structures called dendrites. The neuron sends out spikes of electrical activity through a long,
thin strand known as an axon, which splits into thousands of branches. At the end of each
branch, a structure called a synapse converts the activity from the axon into electrical effects
that inhibit or excite activity
in the connected neurons. When a neuron receives excitatory input that is sufficiently large
compared with its inhibitory input, it sends a spike of electrical activity down its axon.
Learning occurs by changing the effectiveness of the synapses so that the influence of one
neuron on another changes.
The artificial neuron is designed to mimic the first-order characteristics of the biological
neuron. Like the biological neuron, the artificial neuron receives many inputs
representing the outputs of other neurons. Each input is multiplied by a corresponding weight,
analogous to the synaptic strength. All of these weighted inputs are then summed and passed
through an activation function to determine the neuron output.
Fig: An Artificial Neuron
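The weighted-sum-and-activation behaviour described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the choice of a logistic activation, and the input and weight values are assumptions made for the example and do not come from the report.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation; one common choice, assumed here for illustration."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Multiply each input by its weight, sum, add the bias, apply the activation."""
    net = np.dot(weights, inputs) + bias
    return activation(net)

# Hypothetical values: three inputs arriving from other neurons
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])   # analogous to synaptic strengths
out = neuron_output(x, w)        # net = 0.2 - 0.3 - 0.4 = -0.5
print(out)
```

A positive weight pushes the output up when its input is active (excitatory), and a negative weight pushes it down (inhibitory).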
We construct these neural networks by first trying to deduce the essential features of neurons
and their interconnections. We then typically program a computer to simulate these features.
However, because our knowledge of neurons is incomplete and our computing power is
limited, our models are necessarily gross idealisations of real networks of neurons.
a. Biological Neuron
b. Artificial Neuron
3. Architecture of Artificial Neural Network
Input Layer
This is where the network receives its input data. Each input neuron in the layer corresponds
to a feature in the input data.
Hidden Layers
These layers perform most of the computational heavy lifting. A neural network can have one
or multiple hidden layers. Each layer consists of units (neurons) that transform the inputs into
something that the output layer can use.
Output Layer
The final layer produces the output of the model. The format of these outputs varies
depending on the specific task (e.g., classification, regression).
Neurons
Neurons, also known as artificial neurons or nodes, are fundamental units in artificial neural
networks. Each neuron connects to other neurons within the network, allowing for the
transmission and processing of information. Each neuron has an associated weight and
threshold. Weights determine the strength of the connection between neurons, while the
threshold dictates the minimum input required for a neuron to activate.
Weights
These are numerical values assigned to the connections between neurons in different layers of
the network. They determine the strength or importance of each connection. During training,
the weights are adjusted to minimize the difference between the network's predictions and the
actual target values.
Biases
These are additional numerical values associated with each neuron. They act as an offset,
allowing the neuron to activate even when the input is zero. Like weights, biases are also
adjusted during training to improve the network's accuracy.
Activation Function
The behaviour of the artificial neuron depends both on the synaptic weights and the
activation function. Sigmoid functions are the most commonly used activation functions in
multilayered feed-forward neural networks. Neurons with sigmoid functions bear a greater
resemblance to biological neurons than those with other activation functions. Another feature
of the sigmoid function is that it is differentiable and gives a continuous-valued output. Some of
the popular activation functions are described below along with their other characteristics.
The simplest type of network, in which data flows in one direction from input to output, is
the feed-forward neural network. It is often used for tasks such as classification and regression.
Consider m neurons arranged in a layer structure, each neuron
receiving n inputs, as shown in Fig. The output and input vectors are, respectively,
O = [ o1 o2 . . . om ]^T
X = [ x1 x2 . . . xn ]^T
Weight wji connects the jth neuron with the ith input. The activation value for the jth
neuron is then
net j = ∑i wji xi, for i = 1, 2, . . . , n
The following nonlinear transformation involving the activation function f(net j), for
j = 1, 2, . . . , m, completes the processing of X. The transformation is performed by each of
the m neurons in the network:
oj = f(Wj X), for j = 1, 2, . . . , m
where the weight vector Wj contains the weights leading toward the jth output node and is defined
as follows: Wj = [ wj1 wj2 . . . wjn ]
Introducing the nonlinear matrix operator F, the mapping of input space X to output
space O implemented by the network can be written as
O = F (W X)
where W is the weight matrix, also known as the connection matrix, and is represented as
W = [ w11 w12 . . . w1n
      w21 w22 . . . w2n
      . . .
      wm1 wm2 . . . wmn ]
The weight matrix is first initialized and then finalized through an appropriate
training method.
The nonlinear activation function f(.) on the diagonal of the matrix operator F(.) operates
component-wise on the activation values net of each neuron. Each activation value is, in
turn, a scalar product of the input with the respective weight vector. X is called the input
vector and O is called the output vector. The mapping of an input to an output is of the
feed-forward and instantaneous type, since it involves no delay between the input and the
output. Therefore, the relation may be written in terms of time t as
O (t) = F (W X(t))
Example: To illustrate the computation of the output O(t) of the single-layer feed-forward
network, consider an input vector X(t) and a network weight matrix W (say, the initialized
weights), given below. Assume the neurons use the hard limiter as their activation
function.
O(t) = F(W X(t)) = [ -1 1 1 1]
The output vector of the above single-layer feed-forward network is [ -1 1 1 1].
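The mapping O = F(W X) can be sketched with NumPy as follows. The weight matrix and input vector below are hypothetical values chosen for illustration (they are not the values of the report's example), and the convention that a net value of exactly zero maps to +1 is an assumption.

```python
import numpy as np

def hard_limiter(net):
    """Bipolar hard limiter: +1 where net >= 0, -1 otherwise (convention assumed)."""
    return np.where(net >= 0, 1, -1)

def single_layer_forward(W, X, f=hard_limiter):
    """O = F(W X): row j of W holds the weights leading to neuron j."""
    return f(W @ X)

# Hypothetical 4-neuron, 3-input layer
W = np.array([[ 1, -1,  1],
              [ 0,  1, -1],
              [-1,  1,  0],
              [ 0,  0, -1]])
X = np.array([-1, 1, -1])

O = single_layer_forward(W, X)   # net = [-3, 2, 2, 1]
print(O)
```

Each output component depends only on its own row of W, which is why the m neurons of a layer can be evaluated in parallel.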
Convolutional neural network (CNN), a class of artificial neural networks that has become
dominant in various computer vision tasks, is attracting interest across a variety of domains,
including radiology.
CNN is a type of deep learning model for processing data that has a grid pattern, such as
images, which is inspired by the organization of animal visual cortex [13, 14] and designed to
automatically and adaptively learn spatial hierarchies of features, from low- to high-level
patterns.
“A simple CNN is a sequence of layers, and every layer of a CNN transforms one volume of
activations to another through a differentiable function.” In other words, each
layer converts the values available from the previous
layer into more complex information and passes it on to the next layer for further
generalization.
The operation involves multiplying the values of a cell corresponding to a particular row and
column, of the image matrix, with the value of the corresponding cell in the filter matrix. We
do this for the values of all the cells within the span of the filter matrix and add them together
to form an output. For example, here part of the image matrix and part of the filter matrix are
convolved.
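The multiply-and-add operation described above can be sketched as a plain NumPy loop. This is an illustrative sketch, not an optimized implementation: the function name conv2d, the stride argument, and the image and filter values are assumptions, and no padding is applied.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the filter over the image; at each position multiply the
    overlapping cells element-wise and add the products together."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r * stride:r * stride + kh,
                          c * stride:c * stride + kw]
            out[r, c] = np.sum(patch * kernel)
    return out

# Hypothetical 4 x 4 image and 2 x 2 filter
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
feature_map = conv2d(image, kernel)
print(feature_map.shape)
```

With a 4 × 4 input and a 2 × 2 filter, a stride of 1 gives a 3 × 3 feature map and a stride of 2 gives a 2 × 2 feature map.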
1. The number of filters affects the depth of the output. For example, three distinct filters
would yield three different feature maps, creating a depth of three.
2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix.
While stride values of two or greater are rare, a larger stride yields a smaller output.
3. Zero-padding is usually used when the filters do not fit the input image. This sets all
elements that fall outside of the input matrix to zero, producing a larger or equally sized
output. There are three types of padding
Pooling layer
A pooling layer provides a typical downsampling operation which reduces the in-plane
dimensionality of the feature maps in order to introduce a translation invariance to small
shifts and distortions, and decrease the number of subsequent learnable parameters. It is of
note that there is no learnable parameter in any of the pooling layers, whereas filter size,
stride, and padding are hyperparameters in pooling operations, similar to convolution
operations.
Max pooling
The most popular form of pooling operation is max pooling, which extracts patches from the
input feature maps, outputs the maximum value in each patch, and discards all the other
values. A max pooling with a filter of size 2 × 2 with a stride of 2 is commonly used in
practice. This downsamples the in-plane dimension of feature maps by a factor of 2. Unlike
height and width, the depth dimension of feature maps remains unchanged.
Another pooling operation worth noting is a global average pooling. A global average pooling
performs an extreme type of downsampling, where a feature map with size of height × width
is downsampled into a 1 × 1 array by simply taking the average of all the elements in each
feature map, whereas the depth of feature maps is retained. This operation is typically applied
only once before the fully connected layers. The advantages of applying global average
pooling are that it reduces the number of learnable parameters and enables the CNN to
accept inputs of variable size.
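Both pooling operations can be sketched in NumPy. The sketch assumes feature maps stored as a height × width × depth array; the function names and the sample values are assumptions made for illustration.

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over height and width; the depth dimension is unchanged."""
    h, w, d = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w, d))
    for r in range(out_h):
        for c in range(out_w):
            patch = fmap[r * stride:r * stride + size,
                         c * stride:c * stride + size, :]
            out[r, c, :] = patch.max(axis=(0, 1))
    return out

def global_average_pool(fmap):
    """Average every height x width map down to a single value per channel."""
    return fmap.mean(axis=(0, 1))

# Hypothetical 4 x 4 feature maps with depth 2
fmap = np.arange(32, dtype=float).reshape(4, 4, 2)
pooled = max_pool2d(fmap)          # 2 x 2 x 2
gap = global_average_pool(fmap)    # one value per channel
print(pooled.shape, gap)
```

Neither function has any learnable parameter; only the patch size and stride are chosen in advance.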
The output feature maps of the final convolution or pooling layer are typically flattened, i.e.,
transformed into a one-dimensional (1D) array of numbers (or vector), and connected to one
or more fully connected layers, also known as dense layers, in which every input is connected
to every output by a learnable weight. Once the features extracted by the convolution layers
and downsampled by the pooling layers are created, they are mapped by a subset of fully
connected layers to the final outputs of the network, such as the probabilities for each class in
classification tasks. The final fully connected layer typically has the same number of output
nodes as the number of classes. Each fully connected layer is followed by a nonlinear
function, such as ReLU.
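The flatten-then-dense path can be sketched as below. The layer sizes, the random weights, and the use of a softmax on the final layer to obtain class probabilities are assumptions made for this illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical pooled feature maps: 2 x 2 spatial size, depth 3
fmap = rng.standard_normal((2, 2, 3))
x = fmap.reshape(-1)                      # flatten into a 1D vector of 12 values

# Two fully connected layers with assumed sizes 12 -> 8 -> 4 classes
W1, b1 = 0.1 * rng.standard_normal((8, 12)), np.zeros(8)
W2, b2 = 0.1 * rng.standard_normal((4, 8)), np.zeros(4)

hidden = relu(W1 @ x + b1)                # every input connects to every output
probs = softmax(W2 @ hidden + b2)         # one probability per class
print(probs)
```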
Introduction
The dynamics of a neuron consist of two parts: the dynamics of the activation state and
the dynamics of the synaptic weights.
The Short Term Memory (STM) in neural networks is modelled by the activation state of the
network, and the Long Term Memory (LTM) is encoded in the synaptic weights through
learning. The main property of an artificial neural network is its ability to learn
from its environment and history.
The network learns about its environment and history through an interactive process of
adjustments applied to its synaptic weights and bias levels.
Generally, the network becomes more knowledgeable about its environment and history
after each iteration of the learning process. It is important to distinguish between
representation and learning. Representation refers to the ability of a perceptron (or other
network) to simulate a specified function. Learning requires the existence of a systematic
procedure for adjusting the network weights to produce that function.
Many activities are associated with the notion of learning. In the context of neural
networks, we define learning as follows:
“Learning is a process by which the free parameters of a neural network are adapted
through a process of stimulation by the environment in which the network is embedded.
The type of learning is determined by the manner in which the parameter changes take
place.”
Based on the above definition the learning process of ANN can be divided into the
following sequence of steps:
A set of well-defined rules for the solution of a learning problem is called a learning
algorithm. There are different approaches to training an ANN. Most of the methods fall into
one of two classes, namely supervised learning and unsupervised learning.
Supervised learning:
Supervised training requires the pairing of each input vector with a target vector
representing the desired output; together these are called a training pair. Usually a
network is trained over a number of such training pairs. An input vector is applied, the
output of the network is calculated and compared to the corresponding target vector and
the difference (error) is fed back through the network and weights are changed according
to an algorithm that tends to minimize the error. The vectors of the training set are
applied sequentially, and errors are calculated and weights adjusted for each vector, until
the error for the entire training set is at an acceptably low value.
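The supervised procedure described above (apply an input, compare the output with the target, adjust the weights to reduce the error) can be sketched with a single linear unit trained by the LMS (delta) rule. The choice of a linear unit, the learning rate, the number of epochs, and the training pairs are all assumptions made for this illustration.

```python
import numpy as np

def train_lms(inputs, targets, lr=0.1, epochs=500):
    """Present the training pairs one at a time and nudge the weights
    in the direction that reduces the error for that pair."""
    w = np.zeros(inputs.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, d in zip(inputs, targets):
            o = w @ x + b          # actual output
            err = d - o            # difference from the target
            w += lr * err * x      # weight change proportional to the error
            b += lr * err
    return w, b

# Hypothetical training pairs generated by d = 2*x1 - x2 + 0.5
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = 2.0 * X[:, 0] - X[:, 1] + 0.5

w, b = train_lms(X, d)
total_error = np.sum((X @ w + b - d) ** 2)
print(w, b, total_error)
```

Because the targets here are an exact linear function of the inputs, the error over the whole training set can be driven close to zero; with noisy targets it would settle at a small non-zero value instead.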
Unsupervised learning
No external signal (teacher) is used in the learning process. The neural network relies upon
both internal and local information.
Unsupervised training is a far more plausible model of training in the biological system.
Developed by Kohonen (1984) and many others, it requires no target vector for the outputs,
and hence, no comparisons to predetermined ideal responses. The training set consists solely
of input vectors. The training algorithm modifies network weights to produce output vectors
that are consistent; i.e., both application of one of the training vectors and application of a vector
that is sufficiently similar to it will produce the same pattern of outputs.
• The training process, therefore, extracts the statistical properties of the training set and
groups similar vectors into classes.
• Applying a vector from a given class as an input will produce a specific output vector, but
there is no way to determine prior to training which specific output pattern will be produced
by a given input vector class. Hence, the outputs of such a network must generally be
transformed into a comprehensible form subsequent to the training process.
Loss function
The cross-entropy loss that is used in neural networks is the same one as for logistic
regression. If the neural network is being used as a binary classifier, with the sigmoid at the
final layer, the loss function is the same as the logistic regression loss.
If we are using the network to classify into 3 or more classes, the loss function is exactly the
same as the loss for multinomial regression.
First, when we have more than 2 classes we’ll need to represent both y and ŷ as vectors.
Let’s assume we’re doing hard classification, where only one class is the correct one. The true
label y is then a vector with K elements, each corresponding to a class, with yc = 1 if the
correct class is c, with all other elements of y being 0. Recall that a vector like this, with one
value equal to 1 and the rest 0, is called a one-hot vector. Our classifier will produce an
estimate vector ŷ with K elements, each element ŷk of which represents the estimated
probability p(yk = 1|x). The loss function for a single example x is the negative sum of the
logs of the K output classes, each weighted by their probabilities:
L(ŷ, y) = - ∑k yk log ŷk, for k = 1, 2, . . . , K
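A minimal numerical sketch of this loss for one example is given below. The probability values are hypothetical, and the small constant eps added inside the logarithm is an implementation assumption to avoid log(0).

```python
import numpy as np

def cross_entropy(y_true, y_hat, eps=1e-12):
    """Negative sum over classes of y_k * log(y_hat_k)."""
    return -np.sum(y_true * np.log(y_hat + eps))

# Hypothetical 3-class example: the correct class is the second one
y = np.array([0.0, 1.0, 0.0])        # one-hot true label
y_hat = np.array([0.2, 0.7, 0.1])    # estimated class probabilities

loss = cross_entropy(y, y_hat)
print(loss)
```

Because y is one-hot, only the term for the correct class survives, so the loss reduces to the negative log of the probability assigned to the correct class.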
Computing the gradient requires the partial derivative of the loss function with respect to
each parameter. For a network with one weight layer and sigmoid output (which is what
logistic regression is), we could simply use the derivative of the loss that we used for logistic
regression:
∂L/∂wj = (ŷ - y) xj = (σ(w · x + b) - y) xj
Or, for a network with one weight layer and softmax output (= multinomial logistic
regression), we could use the derivative of the softmax loss, shown here for a particular weight wk
and input xi:
∂L/∂wk,i = -(yk - ŷk) xi
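For the single-weight-layer softmax case, the derivative of the cross-entropy loss with respect to the weight from input xi to class k is (ŷk - yk) xi. The sketch below computes that gradient and compares it with a finite-difference estimate; the shapes, random values, and function names are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(W, b, x, y):
    """Cross-entropy loss of a one-layer softmax classifier for one example."""
    return -np.sum(y * np.log(softmax(W @ x + b)))

def grad_W(W, b, x, y):
    """Analytic gradient: entry (k, i) is (y_hat_k - y_k) * x_i."""
    y_hat = softmax(W @ x + b)
    return np.outer(y_hat - y, x)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))      # hypothetical: 3 classes, 4 inputs
b = np.zeros(3)
x = rng.standard_normal(4)
y = np.array([0.0, 0.0, 1.0])        # one-hot label

analytic = grad_W(W, b, x, y)

# Central finite differences as an independent check
numeric = np.zeros_like(W)
h = 1e-6
for k in range(W.shape[0]):
    for i in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[k, i] += h
        Wm[k, i] -= h
        numeric[k, i] = (loss(Wp, b, x, y) - loss(Wm, b, x, y)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))
```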
But these derivatives only give correct updates for one weight layer: the last one. For deep
networks, computing the gradients for each weight is much more complex, since we are
computing the derivative with respect to weight parameters that appear all the way back in
the very early layers of the network, even though the loss is computed only at the very end of
the network. The solution to computing this gradient is an algorithm called error
backpropagation or backprop (Rumelhart et al., 1986). While backprop was invented
specifically for neural networks, it turns out to be the same as a more general procedure called
backward differentiation, which depends on the notion of computation graphs. Let’s see how
that works in the next subsection.
6. Backpropagation
In order to train a neural network to perform some task, we must adjust the weights of
each unit in such a way that the error between the desired output and the actual output is
reduced. This process requires that the neural network compute the error derivative of the
weights (EW). In other words, it must calculate how the error changes as each weight is
increased or decreased slightly. The back propagation algorithm is the most widely used
method for determining the EW. The backpropagation algorithm is easiest to understand
if all the units in the network are linear. The algorithm computes each EW by first
computing the EA, the rate at which the error changes as the activity level of a unit is
changed. For output units, the EA is simply the difference between the actual and the
desired output. To compute the EA for a hidden unit in the layer just before the output
layer, we first identify all the weights between that hidden unit and the output units to
which it is connected. We then multiply those weights by the EAs of those output units
and add the products. This sum equals the EA for the chosen hidden unit. After
calculating all the EAs in the hidden layer just before the output layer, we can compute in
like fashion the EAs for other layers, moving from layer to layer in a direction opposite
to the way activities propagate through the network. This is what gives backpropagation
its name. Once the EA has been computed for a unit, it is straightforward to compute the
EW for each incoming connection of the unit. The EW is the product of the EA and the
activity through the incoming connection.
Rojas [2005] describes the BP algorithm in terms of four main steps. After
choosing the weights of the network randomly, the backpropagation algorithm is used to
compute the necessary corrections. The algorithm can be decomposed into the following
four steps:
i. Feed-forward computation
ii. Backpropagation to the output layer
iii. Backpropagation to the hidden layer
iv. Weight updates
The algorithm is stopped when the value of the error function has become
sufficiently small. This is a very rough and basic outline of the BP algorithm. There
are some variations proposed by other researchers, but Rojas's formulation appears to be
quite accurate and easy to follow. The last step, the weight update, is applied
repeatedly throughout the algorithm.
Units are connected to one another. Connections correspond to the edges of the
underlying directed graph. There is a real number associated with each
connection, which is called the weight of the connection. We denote by Wij the
weight of the connection from unit ui to unit uj. It is then convenient to represent
the pattern of connectivity in the network by a weight matrix W whose elements
are the weights Wij. Two types of connection are usually distinguished: excitatory
and inhibitory. A positive weight represents an excitatory connection whereas a
negative weight represents an inhibitory connection. The pattern of connectivity
characterises the architecture of the network.
A unit in the output layer determines its activity by following a two-step procedure.
First, it computes the total weighted input xj, using the formula:
xj = ∑i yi Wij
where yi is the activity level of the ith unit in the previous layer and Wij is the weight of the
connection between the ith and the jth unit.
Once the activities of all output units have been determined, the network computes the error
E, which is defined by the expression:
E = 1/2 ∑j (yj - dj)^2
where yj is the activity level of the jth unit in the top layer and dj is the desired output of the jth
unit.
1. Compute how fast the error changes as the activity of an output unit is changed. This error
derivative (EA) is the difference between the actual and the desired activity.
2. Compute how fast the error changes as the total input received by an output unit is
changed. This quantity (EI) is the answer from step 1 multiplied by the rate at which the
output of a unit changes as its total input is changed.
3. Compute how fast the error changes as a weight on the connection into an output unit is
changed. This quantity (EW) is the answer from step 2 multiplied by the activity level of the
unit from which the connection emanates.
4. Compute how fast the error changes as the activity of a unit in the previous layer
is changed. This crucial step allows backpropagation to be applied to multilayer
networks. When the activity of a unit in the previous layer changes, it affects the
activities of all the output units to which it is connected. So to compute the
overall effect on the error, we add together all these separate effects on output
units. But each effect is simple to calculate. It is the answer in step 2 multiplied by
the weight on the connection to that output unit.
By using steps 2 and 4, we can convert the EAs of one layer of units into EAs for the
previous layer. This procedure can be repeated to get the EAs for as many previous layers as
desired. Once we know the EA of a unit, we can use steps 2 and 3 to compute the EWs on its
incoming connections.
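The EA, EI and EW quantities can be sketched for a network with one hidden layer as follows. The sketch assumes sigmoid units, the error E = 1/2 ∑j (yj - dj)^2, and weight matrices stored with one row per receiving unit (so W[j, i] is the weight from unit i to unit j); the sizes and values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(W1, W2, x):
    h = sigmoid(W1 @ x)     # activities of the hidden units
    y = sigmoid(W2 @ h)     # activities of the output units
    return h, y

def error(W1, W2, x, d):
    _, y = forward(W1, W2, x)
    return 0.5 * np.sum((y - d) ** 2)

def backprop(W1, W2, x, d):
    h, y = forward(W1, W2, x)
    EA_out = y - d                     # step 1: actual minus desired activity
    EI_out = EA_out * y * (1 - y)      # step 2: times d(output)/d(total input)
    EW2 = np.outer(EI_out, h)          # step 3: times activity of the sending unit
    EA_hid = W2.T @ EI_out             # step 4: sum of EI times connecting weight
    EI_hid = EA_hid * h * (1 - h)      # step 2 again, one layer back
    EW1 = np.outer(EI_hid, x)          # step 3 again
    return EW1, EW2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))       # hypothetical: 2 inputs, 3 hidden units
W2 = rng.standard_normal((2, 3))       # 2 output units
x = np.array([0.5, -1.0])
d = np.array([1.0, 0.0])

EW1, EW2 = backprop(W1, W2, x, d)
print(EW1)
print(EW2)
```

For sigmoid units, the rate at which the output changes with the total input is y (1 - y), which is the factor used in step 2.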
7. Perceptron
The most influential work on neural nets in the 1960s went under the heading of
'perceptrons', a term coined by Frank Rosenblatt. The perceptron turns out to be an
MCP model (a neuron with weighted inputs) with some additional, fixed
pre-processing.
Units labelled A1, A2, Aj, Ap are called association units, and their task is to extract
specific, localised features from the input images. Perceptrons mimic the basic idea
behind the mammalian visual system. They were mainly used in pattern recognition,
even though their capabilities extended much further.
In 1969 Minsky and Papert wrote a book in which they described the limitations of
single-layer perceptrons. The impact of the book was tremendous and caused
many neural network researchers to lose interest. The book was very well
written and showed mathematically that single-layer perceptrons could not do some
basic pattern recognition operations, such as determining the parity of a shape or
determining whether a shape is connected. What was not realised until the
1980s is that, given appropriate training, multilevel perceptrons can do these
operations.
a. Perceptron Model
In the 1960s, perceptrons created a great deal of interest and optimism. Rosenblatt
(1962) proved a remarkable theorem about perceptron learning. Widrow (Widrow
1961, 1963, Widrow and Angell 1962, Widrow and Hoff 1960) made a number of
convincing demonstrations of perceptron-like systems. Perceptron learning is of the
supervised type. A perceptron is trained by presenting a set of patterns to its input,
one at a time, and adjusting the weights until the desired output occurs for each of
them. Its synaptic weights are denoted by w1, w2, . . . , wn. The inputs applied to the
perceptron are denoted by x1, x2, . . . , xn. The externally applied bias is denoted by
b.
For a discrete perceptron, the activation function should be the hard limiter, or sgn(), function. A
popular application of the discrete perceptron is pattern classification. To develop insight into
the behaviour of a pattern classifier, it is necessary to plot a map of the decision regions in the
n-dimensional space spanned by the n input variables. The two decision regions are separated by a
hyperplane defined by
∑i wi xi + b = 0, for i = 1, 2, . . . , n
This is illustrated in Figure for two input variables x1 and x2, for which the decision
boundary takes the form of a straight line.
Fig: Illustration of the hyperplane (in this example, a straight line) as the decision boundary for
a two-dimensional, two-class pattern classification problem.
For the perceptron to function properly, the two classes C1 and C2 must be linearly separable.
This, in turn, means that the patterns to be classified must be sufficiently separated from each
other to ensure that the decision surface consists of a hyperplane. This is illustrated in Figure.
Fig: (a) A pair of linearly separable patterns
In Figure (a), the two classes C1 and C2 are sufficiently separated from each other to draw a
hyperplane (in this case a straight line) as the decision boundary. If, however, the two classes
C1 and C2 are allowed to move too close to each other, as in Figure (b), they become nonlinearly
separable, a situation that is beyond the computing capability of the perceptron.
Suppose then that the input variables of the perceptron originate from two linearly separable
classes. Let æ1 be the subset of training vectors X1(1), X1(2), . . . , that belong to class
C1 and æ2 be the subset of training vectors X2(1), X2(2), . . . , that belong to class C2. The
union of æ1 and æ2 is the complete training set æ.
Given the sets of vectors æ1 and æ2 to train the classifier, the training process involves the
adjustment of the weight vector W in such a way that the two classes C1 and C2 are linearly
separable. That is, there exists a weight vector W such that we may write
W^T X > 0 for every input vector X belonging to class C1
W^T X <= 0 for every input vector X belonging to class C2
In the second condition, it is arbitrarily chosen to say that the input vector X belongs to class
C2 if W^T X = 0.
2. Otherwise, the weight vector of the perceptron is updated in accordance with the rule
W(k+1) = W(k) - η X(k) if W(k)^T X(k) > 0 and X(k) belongs to class C2
W(k+1) = W(k) + η X(k) if W(k)^T X(k) <= 0 and X(k) belongs to class C1
where the learning-rate parameter η controls the adjustment applied to the weight vector.
These equations may be written more generally as
W(k+1) = W(k) + (η/2) (d(k) - o(k)) X(k)
where d(k) is the desired output and o(k) is the actual output of the perceptron, each taking
the value +1 (class C1) or -1 (class C2).
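The general form of the update rule can be sketched as below, assuming bipolar desired and actual outputs (+1 for class C1, -1 for class C2) and a bias folded into the weight vector through a constant input of 1. The training patterns (a bipolar AND problem), the learning rate, and the function names are assumptions made for illustration.

```python
import numpy as np

def sgn(net):
    """Hard limiter; a net value of exactly 0 is assigned to class C2 (-1)."""
    return 1 if net > 0 else -1

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    W = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        changed = False
        for x_k, d_k in zip(X, d):
            o_k = sgn(W @ x_k)
            if o_k != d_k:
                # W(k+1) = W(k) + (eta / 2) * (d_k - o_k) * X_k
                W = W + (eta / 2) * (d_k - o_k) * x_k
                changed = True
        if not changed:        # every pattern classified correctly
            break
    return W

# Hypothetical linearly separable data: bipolar AND, last column is the bias input
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]], dtype=float)
d = np.array([-1, -1, -1, 1])

W = train_perceptron(X, d)
preds = [sgn(W @ x) for x in X]
print(W, preds)
```

When a pattern is already classified correctly, d_k - o_k is zero and the weights are left unchanged, so the single formula covers both cases of the rule.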
Limitations of perceptron
There are, however, limitations to the capabilities of perceptrons. They will learn the
solution only if there is a solution to be found.
First, the output values of a perceptron can take on only one of two values (True or False).
Second, perceptron can only classify linearly separable sets of vectors. If a straight line or
plane can be drawn to separate the input vectors into their correct categories, the input vectors
are linearly separable and the perceptron will find the solution. If the vectors are not linearly
separable learning will never reach a point where all vectors are classified properly.
The most famous example of the perceptron's inability to solve problems with linearly
non-separable vectors is the Boolean exclusive-OR problem.
Consider the case of the exclusive-OR (XOR) problem. The XOR logic function has two
inputs and one output, as shown below.
x | y | output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
It produces an output only if either one or the other of the inputs is on, but not if both are off
or both are on, as shown in the table above.
We can consider this as a problem that we want the perceptron to learn to solve: output a 1
if x is on and y is off, or y is on and x is off; otherwise output a 0. It appears to be a
simple enough problem. We can draw it in pattern space as
The x-axis represents the value of x, and the y-axis represents the value of y. The points inside
the circles represent the inputs that produce an output of 1, whilst the points outside the circles
show the inputs that produce an output of 0. Considering the points inside and outside the circles as
separate classes, we find that we cannot draw a straight line to separate the two classes. Such
patterns are known as linearly inseparable, since no straight line can divide them up
successfully. Since we cannot divide them with a single straight line, the perceptron will not
be able to find any such line either, and so cannot solve such a problem. In fact, a single-layer
perceptron cannot solve any problem that is linearly inseparable.
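By contrast, a network with one hidden layer can represent XOR. The sketch below wires two hidden threshold units by hand (one acting as OR, one as NAND) and combines them with an AND output unit. The weights and thresholds are hypothetical values picked for illustration, not values learned by training.

```python
import numpy as np

def step(z):
    """Threshold unit: 1 where z >= 0, otherwise 0."""
    return (z >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: unit 0 computes OR, unit 1 computes NAND (hand-chosen weights)
W_h = np.array([[ 1.0,  1.0],
                [-1.0, -1.0]])
b_h = np.array([-0.5, 1.5])

# Output unit: AND of the two hidden units
w_o = np.array([1.0, 1.0])
b_o = -1.5

hidden = step(X @ W_h.T + b_h)
out = step(hidden @ w_o + b_o)
print(out)
```

Each hidden unit draws one straight line in the pattern space; the output unit fires only for points that lie between the two lines, which is exactly the XOR region.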
8. Application of Perceptron
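import numpy as np
import matplotlib.pyplot as plt
def step_function(x):
    return np.where(x >= 0, 1, 0)

def perceptron_train(X, y, lr=0.1, epochs=20):  # lr and epochs values are assumed
    Xb = np.c_[np.ones(len(X)), X]  # constant 1 column so weights[0] acts as the bias
    weights = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for i in range(len(X)):
            weights += lr * (y[i] - step_function(Xb[i] @ weights)) * Xb[i]
    return weights

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
logic_gates = {"AND": np.array([0, 0, 0, 1]), "OR": np.array([0, 1, 1, 1])}  # assumed gate set
for name, y in logic_gates.items():
    weights = perceptron_train(X, y)
    equation = f"{name}: {weights[1]:.1f}*x1 + {weights[2]:.1f}*x2 + {weights[0]:.1f} = 0"
    print(equation)
    plt.figure(figsize=(5, 5))
    plt.scatter(X[:, 0], X[:, 1], c=y, label=name)
    plt.ylabel("Input 2")
    plt.legend()
    plt.show()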
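import numpy as np
import matplotlib.pyplot as plt

# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
weights = np.random.randn(2)
bias = np.random.randn()
learning_rate = 0.1
epochs = 10

def step_function(z):
    return 1 if z >= 0 else 0

# Perceptron training
errors = []
for epoch in range(epochs):
    total_error = 0
    for i in range(len(X)):
        z = np.dot(X[i], weights) + bias
        y_pred = step_function(z)
        error = y[i] - y_pred
        # Update rule
        weights += learning_rate * error * X[i]
        bias += learning_rate * error
        total_error += abs(error)
    errors.append(total_error)

# Decision boundary plot (XOR is not linearly separable, so no line fits all four points)
plt.figure()
x_vals = np.linspace(-0.5, 1.5, 100)
if weights[1] != 0:
    plt.plot(x_vals, -(weights[0] * x_vals + bias) / weights[1], label='Decision boundary')
else:
    plt.axvline(-bias / weights[0], label='Decision boundary')
for j in range(len(X)):
    plt.scatter(X[j, 0], X[j, 1], c='red' if y[j] == 1 else 'blue')
plt.xlim(-0.5, 1.5)
plt.ylim(-0.5, 1.5)
plt.xlabel('X1')
plt.ylabel('X2')
plt.legend()
plt.grid()
plt.show()

# Error plot
plt.figure()
plt.plot(range(1, epochs + 1), errors, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Total Errors')
plt.grid()
plt.show()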
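import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits  # assumption: the original data source is not shown
from sklearn.model_selection import train_test_split

class Perceptron:
    def __init__(self, input_size, num_classes, lr=0.01, epochs=10):
        self.lr = lr
        self.epochs = epochs
        self.weights = np.zeros((num_classes, input_size))
        self.bias = np.zeros(num_classes)

    def train(self, x_train, y_train):
        for _ in range(self.epochs):
            for i in range(len(x_train)):
                xi = x_train[i]
                yi = y_train[i]
                scores = self.weights @ xi + self.bias
                y_pred = np.argmax(scores)
                if y_pred != yi:
                    # Reward the correct class, penalise the wrongly predicted one
                    self.weights[yi] += self.lr * xi
                    self.bias[yi] += self.lr
                    self.weights[y_pred] -= self.lr * xi
                    self.bias[y_pred] -= self.lr

    def predict(self, x):
        return np.argmax(x @ self.weights.T + self.bias, axis=1)

# Assumption: scikit-learn's 8x8 digits stand in for the original dataset
data, labels = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    data / 16.0, labels, test_size=0.2, random_state=0)
num_classes = 10
input_size = x_train.shape[1]
perceptron = Perceptron(input_size, num_classes)
perceptron.train(x_train, y_train)
# Make predictions
y_pred = perceptron.predict(x_test)
# Calculate accuracy
print("Accuracy:", np.mean(y_pred == y_test))
fig, axes = plt.subplots(2, 5, figsize=(8, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_test[i].reshape(8, 8), cmap='gray')
    ax.set_title(f"Pred: {y_pred[i]}")
    ax.axis('off')
plt.show()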
Binary classification of linearly separable data using Perceptron
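import numpy as np
import matplotlib.pyplot as plt

def generate_data(n):
    np.random.seed(0)
    X = np.random.randn(n, 2)
    # Assumption: labels in {-1, +1} taken from the side of the line x1 + x2 = 0
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    return X, y

# Perceptron Algorithm
def perceptron(X, y, lr=0.1, epochs=10):
    w = np.zeros(X.shape[1])  # Initialize weights
    b = 0  # Initialize bias
    losses = []
    for epoch in range(epochs):
        loss = 0
        for i in range(len(y)):
            if y[i] * (np.dot(w, X[i]) + b) <= 0:  # misclassified sample
                w += lr * y[i] * X[i]
                b += lr * y[i]
                loss += 1
        losses.append(loss)
    plot_decision_boundary(X, y, w, b, epoch)
    return w, b, losses

def plot_decision_boundary(X, y, w, b, epoch):
    plt.figure()
    plt.scatter(X[y == 1, 0], X[y == 1, 1], label='Class +1')
    plt.scatter(X[y == -1, 0], X[y == -1, 1], label='Class -1')
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
    if w[1] != 0:
        plt.plot(xs, -(w[0] * xs + b) / w[1], 'k--', label=f'Boundary (epoch {epoch + 1})')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend()
    plt.show()

# Main function
def main():
    X, y = generate_data(100)
    w, b, losses = perceptron(X, y)
    plt.figure()
    plt.plot(range(1, len(losses) + 1), losses, marker='o')
    plt.xlabel('Epoch')
    plt.ylabel('Misclassification Count')
    plt.show()

if __name__ == "__main__":
    main()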