
ANN Unit 3

A single layer perceptron (SLP) is the simplest type of artificial neural network and can only classify linearly separable data with a binary target. An SLP has a single layer of nodes fully connected to the input layer, with weights initially assigned randomly. It sums the weighted inputs and, if the sum exceeds a threshold, the output is activated. If the output does not match the desired output, the weights are adjusted using a learning rule in order to minimize errors. This process continues until the data is classified correctly; if the data is not linearly separable, the process will not converge.


Single Layered Perceptron

Linearly separable data

A single layer perceptron (SLP) is a feed-forward network based on a threshold transfer function. The SLP is the simplest type of artificial neural network and can only classify linearly separable cases with a binary target (1, 0).

 In a single layer perceptron, the weights to each input node are assigned randomly, since there is no a priori knowledge associated with the nodes.
 A threshold value is also assigned randomly.
 The SLP sums all the weighted inputs and, if the sum is above the threshold, the network is activated.
 If the calculated output matches the desired output, the model is successful.
 If it does not, then, since no back-propagation technique is involved, the error must be calculated using the formula below and the weights adjusted again.

Perceptron Weight Adjustment

Below is the equation for the perceptron weight adjustment:

Δw = η * d * x

Where,

d: Desired Output - Predicted Output (the error)

η: Learning Rate, usually less than 1.

x: Input Data.
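
As a minimal sketch of this update rule in Python (the hard-limit activation at threshold 0, the learning rate of 0.1 and the variable names are illustrative assumptions, not part of the original formulation):

    import numpy as np

    def predict(w, x, threshold=0.0):
        # Hard-limit activation: output 1 if the weighted sum exceeds the threshold
        return 1 if np.dot(w, x) > threshold else 0

    def update_weights(w, x, target, eta=0.1):
        # Perceptron weight adjustment: delta_w = eta * (desired - predicted) * x
        error = target - predict(w, x)
        return w + eta * error * x

    # One update step on a single training sample
    w = np.random.rand(3)            # weights assigned randomly initially
    x = np.array([1.0, 0.5, -0.2])   # an example input vector
    w = update_weights(w, x, target=1)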

Since this network model performs only linear classification, it will not produce proper results if the data is not linearly separable.

Perceptron Learning Rule (learnp)

Perceptrons are trained on examples of desired behavior. The desired behavior can be summarized by a set of input-target pairs

{p1, t1}, {p2, t2}, {p3, t3}, …, {pn, tn}


where p is an input to the network and t is the corresponding correct (target) output. The objective is to reduce the error e = t - a, the difference between the target vector t and the neuron response a. The perceptron learning rule learnp calculates the desired changes to the perceptron's weights and biases given an input vector p and the associated error e. The target vector t must contain values of either 0 or 1, as perceptrons (with hardlim transfer functions) can only output such values.

Each time learnp is executed, the perceptron has a better chance of producing the correct outputs. The
perceptron rule is proven to converge on a solution in a finite number of iterations if a solution exists.

There are three conditions that can occur for a single neuron.

 p -> input vector
 a -> network response (actual output)
 t -> desired (target) output

CASE 1: If an input vector is presented and the output of the neuron is correct (a = t, and e = t - a = 0),
then the weight vector w is not altered.

CASE 2: If the neuron output is 0 and should have been 1 (a = 0 and t = 1, and e = t - a = 1), the input
vector p is added to the weight vector w. This makes the weight vector point closer to the input vector,
increasing the chance that the input vector will be classified as a 1 in the future.

CASE 3: If the neuron output is 1 and should have been 0 (a = 1 and t = 0, and e = t - a = -1), the input
vector p is subtracted from the weight vector w. This makes the weight vector point farther away from
the input vector, increasing the chance that the input vector is classified as a 0 in the future.
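
All three cases can be summarized in a single expression using the error e = t - a:

Δw = (t - a) * p = e * p

Δb = (t - a) = e

When e = 0 (Case 1) nothing changes; when e = 1 (Case 2) p is added to the weight vector; and when e = -1 (Case 3) p is subtracted from it. (The bias update Δb follows the same rule, treating the bias as a weight on a constant input of 1.)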
Pattern Classifier

Pattern recognition/classification is the process of recognizing/classifying patterns of data based on knowledge already gained or on statistical information extracted from the patterns and/or their representation. One of the important aspects of pattern recognition is its application potential.

Pattern recognition possesses the following features:

1. A pattern recognition system should recognize familiar patterns quickly and accurately.
2. Recognize and classify unfamiliar objects.
3. Accurately recognize shapes and objects from different angles.
4. Identify patterns and objects even when partly hidden.
5. Recognize patterns quickly, with ease and automaticity.

Applications:

Image processing, segmentation and analysis

Pattern recognition is used to give human recognition intelligence to machines, which is required in image processing.

Computer vision

Pattern recognition is used to extract meaningful features from given image/video samples and is used
in computer vision for various applications like biological and biomedical imaging.

Seismic analysis

A pattern recognition approach is used for the discovery, imaging and interpretation of temporal patterns in seismic array recordings. Statistical pattern recognition is implemented and used in different types of seismic analysis models.

Radar signal classification/analysis

Pattern recognition and signal processing methods are used in various applications of radar signal classification, such as AP mine detection and identification.

Speech recognition

The greatest success in speech recognition has been obtained using pattern recognition paradigms. It is used in various speech recognition algorithms which try to avoid the problems of a phoneme-level description and treat larger units, such as words, as patterns.

Fingerprint identification

The fingerprint recognition technique is a dominant technology in the biometric market. A number of recognition methods have been used to perform fingerprint matching, among which pattern recognition approaches are widely used.

In a typical pattern recognition application, the raw data is processed and converted into a form that is amenable for a machine to use. Pattern recognition involves classification and clustering of patterns.

 In classification, an appropriate class label is assigned to a pattern based on an abstraction that is generated using a set of training patterns or domain knowledge. Classification is used in supervised learning.
 Clustering generates a partition of the data, which helps in the decision-making activity of interest. Clustering is used in unsupervised learning.
 Regression algorithms try to find a relationship between variables and predict unknown dependent variables based on known data. Regression is based on supervised learning.

Features may be represented as continuous, discrete or discrete binary variables. A feature is a function of one or more measurements, computed so that it quantifies some significant characteristic of the object.

Example: considering a face, the eyes, ears, nose, etc. are features of the face.

A set of features taken together forms the feature vector.

Example: in the face example above, if all the features (eyes, ears, nose, etc.) are taken together, the sequence is the feature vector ([eyes, ears, nose]). A feature vector is a sequence of features represented as an n-dimensional column vector. In the case of speech, MFCCs (Mel-frequency cepstral coefficients) are the spectral features of the speech.
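
As a small illustration of this representation (the measurement values below are made up purely for demonstration), a feature vector can be written as a column vector in code:

    import numpy as np

    # Hypothetical face features: [eyes, ears, nose], with made-up measurements
    feature_vector = np.array([[2.1],    # eye spacing
                               [5.4],    # ear length
                               [3.3]])   # nose width
    print(feature_vector.shape)          # (3, 1): a 3-dimensional column vector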

Training and Learning in Pattern Recognition

Learning is a phenomenon through which a system gets trained and becomes adaptable enough to give accurate results. Learning is the most important phase, as how well the system performs on the data provided depends on which algorithms are used on the data. The entire dataset is divided into two categories: one used to train the model, i.e. the training set, and the other used to test the model after training, i.e. the testing set.

Training set:

The training set is used to build the model. The training rules and algorithms used give relevant information on how to associate input data with output decisions. The system is trained by applying these algorithms to the dataset; all the relevant information is extracted from the data and results are obtained. Generally, 80% of the dataset is taken as training data.
Testing set:

Testing data is used to test the system, i.e. to verify whether the system produces the correct output after being trained. Generally, 20% of the dataset is used for testing. Example: if a system that identifies which category a particular flower belongs to classifies seven out of ten flowers correctly and the rest wrongly, then the accuracy is 70%.
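
A minimal sketch of the 80/20 split and the accuracy computation described above (plain Python, assuming the dataset is already shuffled):

    # 80/20 split of a dataset into training and testing sets
    data = list(range(100))             # stand-in for 100 labelled samples
    split = int(0.8 * len(data))        # 80% of the data for training
    train_set, test_set = data[:split], data[split:]

    # Accuracy: correct predictions / total predictions
    correct, total = 7, 10              # e.g. 7 of 10 flowers identified correctly
    accuracy = correct / total          # 0.7, i.e. 70%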

Approaches for Pattern Classification

Statistical approach: Historical data is collected, and new patterns are recognized based on observations and analyses of those data.

Syntactical approach: It is also known as the structural approach, as it mainly relies upon sub-patterns, called primitives, such as words.

Neural approach: Here, a neural network is used to analyse the patterns. The advantages of neural networks are their adaptive-learning, self-organization, and fault-tolerance capabilities. Because of these capabilities, neural networks are widely used for pattern recognition applications. An ANN initially goes through a training phase in which it learns to recognize patterns in data, whether visually, aurally, or textually. Some of the best-known neural models are back-propagation, higher-order nets, time-delay neural networks, and recurrent nets.

Bayesian Classification

Marginal Probability: The probability of an event irrespective of the outcomes of other random
variables, e.g. P(A).

If the random variable is independent of other variables, then the marginal probability is the probability of the event directly; otherwise, if the variable depends on other variables, the marginal probability is the probability of the event summed over all outcomes of the dependent variables. This is called the sum rule.

Joint Probability: The probability of two (or more) simultaneous events, often described in terms of events A and B from two dependent random variables, e.g. X and Y, and written P(A and B) or P(A, B).

Conditional Probability: The probability of one event given the occurrence of another event, often described in terms of events A and B from two dependent random variables, e.g. X and Y, and written P(A given B) or P(A | B).
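
In symbols, the three quantities are related as follows (via the sum rule and the product rule):

P(A) = Σb P(A, B = b)   (marginal probability, sum rule)

P(A, B) = P(A | B) * P(B)   (joint probability, product rule)

P(A | B) = P(A, B) / P(B)   (conditional probability)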
Principle of Naive Bayes Classifier:

A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is based on the Bayes theorem. As an example of dependent events:

Event A -> pick a black card from a full pack: P(A) = 26/52

Event B -> pick another black card from the same pack, given that A has occurred: P(B | A) = 25/51

Bayes Theorem:

Using Bayes theorem, we can find the probability of A happening given that B has occurred. Here, B is the evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent, i.e. the presence of one particular feature does not affect another. Hence it is called naive.
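
In standard notation, Bayes theorem states:

P(A | B) = [P(B | A) * P(A)] / P(B)

where P(A | B) is the posterior, P(B | A) the likelihood, P(A) the prior and P(B) the evidence.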

Example:

The dataset is divided into two parts, namely the feature matrix and the response vector.
 The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the independent features. In the weather dataset used here, the features are ‘Outlook’, ‘Temperature’, ‘Humidity’ and ‘Windy’.
 The response vector contains the value of the class variable (prediction or output) for each row of the feature matrix. Here, the class variable name is ‘Play golf’.

1. It is to be classified whether the day is suitable for playing golf, given the features of the day.
2. The columns represent these features and the rows represent individual entries.
3. Taking the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature is hot, the humidity is high and it is not windy.
4. We make two assumptions here:
a. The predictors are independent. That is, if the temperature is hot, it does not necessarily mean that the humidity is high.
b. All the predictors have an equal effect on the outcome. That is, the day being windy does not have more importance in deciding whether to play golf or not.

According to this example, Bayes theorem can be rewritten as:
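
P(y | X) = [P(X | y) * P(y)] / P(X)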

The variable y is the class variable (play golf), which represents whether it is suitable to play golf or not given the conditions. The variable X represents the parameters/features.

X is given as,
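
X = (x1, x2, x3, …, xn)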

Here x1, x2, …, xn represent the features, i.e. they can be mapped to outlook, temperature, humidity and windy. By substituting for X and expanding using the chain rule, we get:
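
P(y | x1, …, xn) = [P(x1 | y) * P(x2 | y) * … * P(xn | y) * P(y)] / [P(x1) * P(x2) * … * P(xn)]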

Now, the values for each term can be obtained by looking at the dataset and substituting them into the equation. For all entries in the dataset, the denominator does not change; it remains constant. Therefore, the denominator can be removed and proportionality introduced:
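
P(y | x1, …, xn) ∝ P(y) * P(x1 | y) * P(x2 | y) * … * P(xn | y)

The class y with the highest value of this product is then chosen as the prediction.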

Before applying the above formula manually on the weather dataset, some precomputations are to be
done on the dataset.
We find P(xi | yj) for each xi in X and yj in y; these likelihoods are obtained by counting occurrences in the dataset. For example, the probability of playing golf given that the temperature is cool is P(temp = cool | play golf = Yes) = 3/9.

We also need the class probabilities P(y). For example, P(play golf = Yes) = 9/14.

So now, we are done with our pre-computations and the classifier is ready!

Let us test it on a new set of features (let us call it today):

today = (Sunny, Hot, Normal, False)

So, probability of playing golf is given by:

P(Yes|today) = P(Sunny Outlook|Yes)P(Hot Temperature|Yes)P(Normal Humidity|Yes)P(No Wind|Yes)P(Yes) / P(today)

and probability to not play golf is given by:

P(No | today) = P(Sunny Outlook|No)P(Hot Temperature|No)P(Normal Humidity|No)P(No Wind|No)P(No) / P(today)

Since P(today) is common in both probabilities, we can ignore P(today) and find the proportional probabilities as:

P(Yes | today) ∝ (2/9) × (2/9) × (6/9) × (6/9) × (9/14) ≈ 0.0141

and

P(No | today) ∝ (3/5) × (2/5) × (1/5) × (2/5) × (5/14) ≈ 0.0068

Now, since the true posteriors must sum to 1, i.e.

P(Yes | today) + P(No | today) = 1

these proportional values can be converted into probabilities by normalization:

P(Yes | today) = 0.0141 / (0.0141 + 0.0068) = 0.67

and

P(No | today) = 0.0068 / (0.0141 + 0.0068) = 0.33

Since

P(Yes | today) > P(No | today)

the prediction is that golf would be played: ‘Yes’.
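
The whole calculation is short enough to check in code; below is a minimal sketch using the likelihoods and priors quoted above (hard-coded here rather than counted from the dataset):

    # Likelihoods and priors taken from the worked example above
    p_yes = (2/9) * (2/9) * (6/9) * (6/9) * (9/14)   # ≈ 0.0141
    p_no  = (3/5) * (2/5) * (1/5) * (2/5) * (5/14)   # ≈ 0.0068

    # Normalize so the two posteriors sum to 1
    total = p_yes + p_no
    print(round(p_yes / total, 2))   # 0.67 -> P(Yes | today)
    print(round(p_no / total, 2))    # 0.33 -> P(No | today)
    # Since P(Yes | today) > P(No | today), the prediction is 'Yes'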


Perceptron as Pattern Classifier

Like Logistic Regression, the Perceptron is a linear classifier used for binary predictions. This means that
in order for it to work, the data must be linearly separable.

The perceptron works by “learning” a series of weights, corresponding to the input features. These input
features are vectors of the available data.

For example, if we were trying to classify whether an animal is a cat or dog,

x1 might be weight, x2 might be height, and x3 might be length.

Each pair of weights and input features is multiplied together, and then the results are summed. If the
summation is above a certain threshold, we predict one class, otherwise the prediction belongs to a
different class.

For example, we could set the threshold at 0. If the summation is greater than 0, the prediction is a 1
(dog), otherwise it’s a 0 (cat).

The final step is to check if our predictions were classified correctly. If they were not, then the weights
are updated using a learning rate. This process continues for a certain number of iterations, known as
“epochs.” The goal is to determine the weights that produce a linear decision boundary that correctly
classifies the predictions.

Step-by-step Example

A good way to understand exactly how the Perceptron works is to walk through a simple example. I’m
going to use a NAND gate model for my example, which has a very small linearly separable dataset.
Given the two features x1 and x2, here’s what the outputs y are for the NAND gate:

x1 x2 y
0 0 1
0 1 1
1 0 1
1 1 0
Breaking the features and output into column vectors gives x1 = [0, 0, 1, 1]^T, x2 = [0, 1, 0, 1]^T and y = [1, 1, 1, 0]^T.
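
A minimal sketch of the full training loop on this NAND data (the learning rate, epoch count and zero-initialized weights are arbitrary choices here; the bias is folded in as a constant input of 1):

    import numpy as np

    # NAND truth table: columns x1, x2; targets y
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([1, 1, 1, 0])

    # Fold the bias in as a constant first input of 1
    Xb = np.hstack([np.ones((4, 1)), X])
    w = np.zeros(3)                            # weights (bias, w1, w2)
    eta = 0.1                                  # learning rate

    for epoch in range(20):                    # fixed number of epochs
        for xi, target in zip(Xb, y):
            a = 1 if np.dot(w, xi) > 0 else 0  # hard-limit activation
            w += eta * (target - a) * xi       # perceptron learning rule

    # All four NAND patterns are now classified correctly
    print([1 if np.dot(w, xi) > 0 else 0 for xi in Xb])   # [1, 1, 1, 0]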

Limitations of Perceptron

 The output values of a perceptron can take on only one of two values (0 or 1) due to the hard-limit transfer function.
 Perceptrons can only classify linearly separable sets of vectors. If a straight line or a plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. Note, however, that it has been proven that if the vectors are linearly separable, perceptrons trained adaptively will always find a solution in finite time.
