
CSE 7th (NNDL)

UNIT- 4

Principal Component Analysis


Principal Component Analysis is an unsupervised learning algorithm that is used
for dimensionality reduction in machine learning. It is a statistical process that
converts observations of correlated features into a set of linearly uncorrelated
features with the help of an orthogonal transformation. These new transformed
features are called the Principal Components. It is one of the popular tools used
for exploratory data analysis and predictive modeling. It is a technique for drawing
out the strong patterns in a dataset by reducing its dimensionality while retaining
most of the variance.
PCA generally tries to find a lower-dimensional surface onto which to project the
high-dimensional data.

Principal Component Analysis


1. Principal Component Analysis (PCA) is a technique for dimensionality
reduction that identifies a set of orthogonal axes, called principal components,
that capture the maximum variance in the data. The principal components are
linear combinations of the original variables in the dataset and are ordered in
decreasing order of importance. The total variance captured by all the
principal components is equal to the total variance in the original dataset.
2. The first principal component captures the most variation in the data; the
second principal component captures the maximum variance that
is orthogonal to the first principal component, and so on.
3. Principal Component Analysis can be used for a variety of purposes, including
data visualization, feature selection, and data compression. In data
visualization, PCA can be used to plot high-dimensional data in two or three
dimensions, making it easier to interpret. In feature selection, PCA can be
used to identify the most important variables in a dataset. In data compression,
PCA can be used to reduce the size of a dataset without losing important
information.
4. In Principal Component Analysis, it is assumed that the information is carried
in the variance of the features, that is, the higher the variation in a feature, the
more information that feature carries.

PCA works by considering the variance of each attribute, because an attribute with
high variance tends to carry more information and give a better separation between
classes; this is what allows the dimensionality to be reduced. Some real-world
applications of PCA are image processing, movie recommendation systems, and
optimizing the power allocation in various communication channels. PCA is a
feature extraction technique, so it keeps the important variables and drops the least
important ones.
The PCA algorithm is based on some mathematical concepts such as:
o Variance and Covariance
o Eigenvalues and Eigenvectors
Some common terms used in PCA algorithm:
o Dimensionality: It is the number of features or variables present in the
given dataset. More easily, it is the number of columns present in the
dataset.
o Correlation: It signifies how strongly two variables are related to each
other, i.e., if one changes, the other variable also changes. The correlation
value ranges from -1 to +1: -1 occurs if the variables are inversely
proportional to each other, and +1 indicates that the variables are directly
proportional to each other.
o Orthogonal: It defines that variables are not correlated to each other, and
hence the correlation between the pair of variables is zero.
o Eigenvectors: If M is a square matrix and v is a non-zero vector, then v is
an eigenvector of M if Mv is a scalar multiple of v (a small numerical check
follows this list).
o Covariance Matrix: A matrix containing the covariance between the pair of
variables is called the Covariance Matrix.
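As a quick numerical check of the eigenvector definition above (the matrix used here is arbitrary and chosen only for illustration):

import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(M)   # eigenvalues and eigenvectors of M
v = eigvecs[:, 0]                     # first eigenvector (a column of eigvecs)
# M @ v is a scalar multiple of v; the scalar is the corresponding eigenvalue
print(np.allclose(M @ v, eigvals[0] * v))   # True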
Principal Components in PCA
As described above, the transformed new features, i.e., the output of PCA, are the
Principal Components. The number of these PCs is either equal to or less than the
number of original features present in the dataset. Some properties of these
principal components are given below:
o The principal component must be the linear combination of the original
features.
o These components are orthogonal, i.e., the correlation between a pair of
variables is zero.
o The importance of each component decreases when going from 1 to n: the
1st PC has the most importance and the nth PC has the least importance.
Steps for PCA algorithm
1. Getting the dataset
Firstly, we need to take the input dataset and divide it into two subparts X
and Y, where X is the training set, and Y is the validation set.
2. Representing data into a structure
Now we represent our dataset as a structure: a two-dimensional matrix of the
independent variables X. Here each row corresponds to a data item and each
column corresponds to a feature. The number of columns is the dimensionality
of the dataset.
3. Standardizing the data
In this step, we standardize our dataset. Within a particular column, features
with high variance would otherwise dominate features with lower variance.
So that the importance of a feature does not depend on its scale, we divide
each data item in a column by the standard deviation of that column. We
name the resulting matrix Z.
4. Calculating the Covariance of Z
To calculate the covariance of Z, we take the matrix Z and transpose it, then
multiply the transpose by Z. The output matrix is the covariance matrix of Z.
5. Calculating the Eigen Values and Eigen Vectors
Now we need to calculate the eigenvalues and eigenvectors of the resultant
covariance matrix of Z. Eigenvectors of the covariance matrix are the directions
of the axes carrying the most information (variance), and the corresponding
eigenvalues measure how much variance lies along each of those directions.
6. Sorting the Eigen Vectors
In this step, we take all the eigenvalues and sort them in decreasing order, i.e.,
from largest to smallest, and simultaneously sort the eigenvectors accordingly
into the matrix of eigenvectors P. The resultant sorted matrix is named P*.
7. Calculating the new features Or Principal Components
Here we calculate the new features. To do this, we multiply the Z matrix by
P*. In the resultant matrix Z*, each observation is a linear combination of the
original features. The columns of the Z* matrix are independent of each other.
8. Remove less important features from the new dataset
The new feature set is now available, so we decide what to keep and what to
remove: only the relevant or important components are kept in the new
dataset, and the unimportant ones are dropped. A minimal code sketch of
these steps follows.
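The sketch below uses NumPy; the random input data and the choice of keeping two components are illustrative assumptions, not part of the notes.

import numpy as np

def pca(X, n_components=2):
    # Step 3: standardize each column (zero mean, unit standard deviation)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 4: covariance matrix of the standardized data
    cov = np.cov(Z, rowvar=False)
    # Step 5: eigenvalues and eigenvectors of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 6: sort eigenvectors by decreasing eigenvalue to obtain P*
    order = np.argsort(eigvals)[::-1]
    P_star = eigvecs[:, order]
    # Steps 7-8: project Z onto the top principal components
    return Z @ P_star[:, :n_components]

X = np.random.rand(100, 5)       # 100 samples, 5 features (illustrative data)
Z_star = pca(X, n_components=2)  # reduced to 2 principal components
print(Z_star.shape)              # (100, 2)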
Applications of Principal Component Analysis
o PCA is mainly used as the dimensionality reduction technique in various AI
applications such as computer vision, image compression, etc.
o It can also be used for finding hidden patterns if data has high dimensions.
Some fields where PCA is used are Finance, data mining, Psychology, etc.

Hebbian Learning Rule with Implementation of AND Gate


Hebbian Learning Rule, also known as the Hebb Learning Rule, was proposed by
Donald O. Hebb. It is one of the first and also the simplest learning rules for neural
networks. It is used for pattern classification with a single-layer neural network, i.e.,
one that has one input layer and one output layer. The input layer can have many
units, say n, while the output layer has only one unit. The Hebbian rule works by
updating the weights between neurons in the network for each training sample.
Hebbian Learning Rule Algorithm :
1. Set all weights to zero, wi = 0 for i=1 to n, and bias to zero.
2. For each training pair s : t (input vector and target output), repeat steps 3 to 5.
3. Set activations for input units with the input vector Xi = Si for i = 1 to n.
4. Set the corresponding output value to the output neuron, i.e. y = t.
5. Update the weight and bias by applying the Hebb rule for all i = 1 to n:
wi(new) = wi(old) + xi * y,   b(new) = b(old) + y

Implementing AND Gate :


Truth table of the AND gate with bipolar inputs and targets:
x1   x2   b    t
-1   -1   1   -1
-1    1   1   -1
 1   -1   1   -1
 1    1   1    1
There are 4 training samples, so there will be 4 iterations. Also, the activation
function used here is the bipolar sigmoidal function, so the range is [-1, 1].
Step 1 :
Set weight and bias to zero, w = [ 0 0 0 ]T and b = 0.
Step 2 :
Set input vector Xi = Si for i = 1 to 4.
X1 = [ -1 -1 1 ]T
X2 = [ -1 1 1 ]T
X3 = [ 1 -1 1 ]T
X4 = [ 1 1 1 ]T
Step 3 :
Output value is set to y = t.
Step 4 :
Modifying weights using Hebbian Rule:
First iteration –
w(new) = w(old) + x1y1 = [ 0 0 0 ]T + [ -1 -1 1 ]T . [ -1 ] = [ 1 1 -1 ]T
For the second iteration, the final weight of the first one will be used and so on.
Second iteration –
w(new) = [ 1 1 -1 ]T + [ -1 1 1 ]T . [ -1 ] = [ 2 0 -2 ]T
Third iteration –
w(new) = [ 2 0 -2]T + [ 1 -1 1 ]T . [ -1 ] = [ 1 1 -3 ]T
Fourth iteration –
w(new) = [ 1 1 -3]T + [ 1 1 1 ]T . [ 1 ] = [ 2 2 -2 ]T
So, the final weight matrix is [ 2 2 -2 ]T
Testing the network :

The network with the final weights

For x1 = -1, x2 = -1, b = 1, Y = (-1)(2) + (-1)(2) + (1)(-2) = -6


For x1 = -1, x2 = 1, b = 1, Y = (-1)(2) + (1)(2) + (1)(-2) = -2
For x1 = 1, x2 = -1, b = 1, Y = (1)(2) + (-1)(2) + (1)(-2) = -2
For x1 = 1, x2 = 1, b = 1, Y = (1)(2) + (1)(2) + (1)(-2) = 2
The results are all compatible with the original table.
Decision Boundary :
2x1 + 2x2 – 2b = y
Replacing y with 0, 2x1 + 2x2 – 2b = 0
Since bias, b = 1, so 2x1 + 2x2 – 2(1) = 0
2( x1 + x2 ) = 2
The final equation, x2 = -x1 + 1

Decision Boundary of AND Function
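The worked example above can be reproduced with a short NumPy sketch; it uses the same bipolar inputs and targets as the truth table.

import numpy as np

# Training samples: [x1, x2, bias], with bipolar AND targets
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]])
t = np.array([-1, -1, -1, 1])

w = np.zeros(3)            # Step 1: weights (including bias) start at zero
for x, y in zip(X, t):     # Steps 2-5: one Hebb update per training sample
    w = w + x * y          # w(new) = w(old) + x * y

print(w)                   # [ 2.  2. -2.], matching the worked example

# Testing: the sign of the net input reproduces the AND truth table
print(np.sign(X @ w))      # [-1. -1. -1.  1.]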

Self Organizing Maps – Kohonen Maps


A Self Organizing Map (or Kohonen Map or SOM) is a type of Artificial Neural
Network that is inspired by biological models of neural systems from the 1970s. It
follows an unsupervised learning approach and trains its network through a
competitive learning algorithm. SOM is used for clustering and mapping (or
dimensionality reduction) to map multidimensional data onto a lower-dimensional
space, which makes complex problems easier to interpret. SOM has two layers: the
Input layer and the Output layer.
The architecture of the Self Organizing Map considered here has an input layer with
n units (one per feature of a sample) and an output layer with two cluster units.
How does SOM work?
Let's say we have input data of size (m, n), where m is the number of training
examples and n is the number of features in each example. First, the algorithm
initializes the weights of size (n, C), where C is the number of clusters. Then,
iterating over the input data, for each training example it updates the winning vector
(the weight vector with the shortest distance, e.g., Euclidean distance, from the
training example). The weight updation rule is given by:
wij = wij(old) + alpha(t) * (xik - wij(old))
where alpha is the learning rate at time t, j denotes the winning vector, i denotes
the ith feature of the training example, and k denotes the kth training example from
the input data. After training the SOM network, the trained weights are used for
clustering new examples. A new example falls into the cluster of its winning vector.
Algorithm
Training:
Step 1: Initialize the weights wij (random values may be assumed). Initialize the
learning rate α.
Step 2: For each training vector x, calculate the squared Euclidean distance to every
output unit j:
D(j) = Σ (wij – xi)^2, summed over i = 1 to n, for each j = 1 to C
Step 3: Find the index J for which D(j) is minimum; this is the winning unit.
Step 4: For each unit j within a specified neighborhood of J, and for all i, calculate
the new weight:
wij(new) = wij(old) + α[xi – wij(old)]
Step 5: Update the learning rate using:
α(t+1) = 0.5 * α(t)
Step 6: Test the stopping condition.
A minimal code sketch of this training procedure is given below.
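The sketch assumes, for simplicity, a neighborhood of size zero (only the winning unit is updated); the example data, number of clusters, and epoch count are illustrative assumptions.

import numpy as np

def train_som(X, n_clusters=2, alpha=0.5, epochs=10):
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, n_clusters))            # Step 1: random weights of size (n, C)
    for _ in range(epochs):
        for x in X:
            # Step 2: squared Euclidean distance to each unit's weight vector
            D = ((W - x[:, None]) ** 2).sum(axis=0)
            J = np.argmin(D)                   # Step 3: winning index
            # Step 4: move the winning weight vector toward the sample
            W[:, J] += alpha * (x - W[:, J])
        alpha *= 0.5                           # Step 5: decay the learning rate
    return W

X = np.array([[1, 1, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 1]], dtype=float)
W = train_som(X, n_clusters=2)
print(W.round(2))                              # trained weight vectors, one column per cluster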

Learning Vector Quantization


Learning Vector Quantization (LVQ) is a type of Artificial Neural Network that is
also inspired by biological models of neural systems. It is a prototype-based
algorithm for supervised learning and classification. It trains its network using a
competitive learning algorithm similar to the Self Organizing Map, and it can also
deal with multiclass classification problems. LVQ is composed of two layers: the
Input layer and the Output layer. The structure of Learning Vector Quantization has
one output unit per class present in the input data and one input unit per feature of
every sample.
How does Learning Vector Quantization work?

Suppose we have input data of size (m, n), where m is the number of training
samples and n is the number of features in each sample, together with a label vector
of size (1, m). The weights of size (n, c) are initialized from the first c training
samples that have distinct labels, and those samples are then removed from the
training set; here c denotes the number of classes. Then, iterating over the remaining
input data, for each training example the algorithm updates the winning vector (the
weight vector with the closest distance, e.g., Euclidean distance, to the training
example).

The weight updation rules are provided by:

if correctly_classified:
    wij(new) = wij(old) + alpha(t) * (xik - wij(old))
else:
    wij(new) = wij(old) - alpha(t) * (xik - wij(old))

where alpha is the learning rate at time t and J denotes the winning vector.


In addition, i denotes the ith feature of the training example, and k denotes the kth
training sample from the input dataset. After training the LVQ network, the trained
weights are used to classify new examples. A new instance is assigned the class
label of its winning vector.

Algorithm
The steps involved are:

o Initialize the weights.
o For each epoch from 1 to N:
o Select a training example.
o Find the winning vector.
o Update the winning vector.
o Repeat steps 3, 4, and 5 for every training example.
o Classify the test samples.
A minimal code sketch of these steps is given below.
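In the sketch, the prototype initialization from the first sample of each class, the learning-rate schedule, and the example data are assumptions made for illustration.

import numpy as np

def train_lvq(X, labels, n_classes, alpha=0.3, epochs=10):
    # Weight initialization: one prototype per class, taken from the data
    proto_idx = [np.where(labels == c)[0][0] for c in range(n_classes)]
    W = X[proto_idx].astype(float)                # prototype matrix of shape (c, n)
    proto_labels = labels[proto_idx]
    mask = np.ones(len(X), dtype=bool)
    mask[proto_idx] = False
    X, labels = X[mask], labels[mask]             # train on the remaining samples
    for _ in range(epochs):
        for x, y in zip(X, labels):
            # Find the winning prototype (closest Euclidean distance)
            J = np.argmin(((W - x) ** 2).sum(axis=1))
            if proto_labels[J] == y:              # correctly classified: move closer
                W[J] += alpha * (x - W[J])
            else:                                 # misclassified: move away
                W[J] -= alpha * (x - W[J])
        alpha *= 0.5                              # decay the learning rate
    return W, proto_labels

def classify(x, W, proto_labels):
    # A new instance gets the class label of its winning prototype
    return proto_labels[np.argmin(((W - x) ** 2).sum(axis=1))]

X = np.array([[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 0]])
labels = np.array([0, 1, 0, 1])
W, proto_labels = train_lvq(X, labels, n_classes=2)
print(classify(np.array([0, 1, 1, 1]), W, proto_labels))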

Question bank
1. What is PCA (Principal Component Analysis)?
2. Describe the relationship between the Self-Organizing Map algorithm, and the Learning
Vector Quantization algorithm.
ANS- In order to use Learning Vector Quantization (LVQ), a set of approximate
reconstruction vectors is first found using the unsupervised SOM algorithm. The
supervised LVQ algorithm is then used to fine-tune the vectors found using SOM.
3. Explain the Hebbian learning rule using the AND gate.
4. What is vector quantization?
5. Explain self-organizing maps.
6. What is dimensionality reduction?
7. Explain the dimensionality reduction using PCA.
8. What is SOM?
