Module 06 : Dimensionality Reduction

6.4.3   Principal in Principal Component Analysis (P.C.A.)
6.4.4   PC1, PC2 Axes in P.C.A.
6.4.5   Calculation of Principal Components
6.4.6   Principal Component Analysis (P.C.A.)
6.4.7   Working Rule of PCA
6.4.8   Use of P.C.A. in Machine Learning
6.4.9   Output of PCA
6.4.10  Comparison between PCA and Cluster Analysis
6.4.11  Limitations of PCA
6.5     Feature Extraction and Selection
6.5.1   Feature Selection Ranking
6.5.2   Advantages of PCA
6.5.3   Disadvantages of PCA
6.6     Dimensionality Reduction Techniques
        UQ. Why is Dimensionality Reduction a very important step in Machine Learning ? (Ref. MU (Comp.) - May 19, Dec. 19, 5 Marks)
6.6.1   Dimension Reduction Techniques in ML
6.6.1(A) Feature Selection
6.6.1(B) Feature Extraction
6.7     Principal Component Analysis
        UQ. Explain in detail Principal Component Analysis for Dimension Reduction. (Ref. MU (Comp.) - May 15, May 16, 10 Marks)
        UQ. Use Principal Component Analysis (PCA) to arrive at the transformed matrix for the given data A = [2 1 0 ; 4 3 1]. (Ref. MU (Comp.) - May 17, 10 Marks)
        UQ. Describe the steps to reduce dimensionality using the Principal Component Analysis method by clearly stating the mathematical formulas used. (Ref. MU (Comp.) - May 19, Dec. 19, 5 Marks)
        UQ. Write a short note on ISA and compare it with PCA. (Ref. MU (Comp.) - May 19, 5 Marks)
        Chapter Ends
The curse of dimensionality refers to various phenomena that arise when we analyse and organise data in high-dimensional spaces. These phenomena do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.

Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining, and databases.

The actual and common problem is that when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality.

Also, organising and searching data often relies on detecting areas where objects form groups with similar properties. But in high-dimensional data, all objects appear to be sparse and dissimilar in many ways, and that prevents common data organisation strategies from being efficient.
6.1.1 Combinatorics

In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is known as combinatorial explosion.

Even in the simplest case of d binary variables, the number of possible combinations is 2^d. Thus, each additional dimension doubles the effort needed to try all combinations.

There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 10^2 = 100 evenly spaced sample points suffice to sample a unit interval with a distance of 10^-2 = 0.01 between points. An equivalent sampling of a 10-dimensional unit hypercube, with a lattice that has a spacing of 10^-2 = 0.01 between adjacent points, would require 10^20 = (10^2)^10 sample points. This effect is a combination of the combinatorics problems discussed above.
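To see how quickly the number of required lattice points grows, the short sketch below (an added illustration, not part of the original text) repeats the arithmetic above for a few dimensions, assuming the same spacing of 0.01.

```python
# Number of evenly spaced lattice points needed to sample the unit
# hypercube with a spacing of 0.01 between adjacent points, for
# several dimensions -- the exponential growth described above.
spacing = 0.01
points_per_axis = int(round(1 / spacing))   # 100 points per dimension

for d in (1, 2, 3, 5, 10):
    total_points = points_per_axis ** d
    print(f"d = {d:2d}: {total_points:.3e} sample points")

# d = 1 needs 1.000e+02 points, while d = 10 already needs 1.000e+20,
# i.e. (10^2)^10 points.
```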
6.1.3 Optimisation

When solving dynamic optimisation problems by numerical backward induction, the objective function must be computed for each combination of values. This is a significant obstacle when the dimension of the 'state variable' is large.

6.1.4 Anomaly Detection

We come across the following problems when searching for anomalies in high-dimensional data :

1. Concentration of scores and distances : derived values such as distances become numerically similar.

2. Irrelevant attributes : in high-dimensional data, a significant number of attributes may be irrelevant.

3. Definition of reference sets : for local methods, reference sets are often nearest-neighbour based.

4. Incomparable scores for different dimensionalities : different subspaces produce incomparable scores.

5. Interpretability of scores : the scores often no longer convey a semantic meaning.

6. Exponential search space : the search space can no longer be systematically scanned.
7. Data-snooping bias : given the large search space, for every desired significance a hypothesis can be found.

8. Hubness : certain objects occur more frequently in neighbour lists than others.
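The first of these problems, the concentration of distances, is easy to observe numerically. The sketch below (an added illustration using uniformly distributed random data) shows that the ratio between the farthest and the nearest neighbour distance approaches 1 as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Concentration of distances: in high dimensions the nearest and the
# farthest neighbour of a query point become almost equally far away.
for d in (2, 10, 100, 1000):
    points = rng.random((500, d))      # 500 uniform random points
    query = rng.random(d)              # one query point
    dists = np.linalg.norm(points - query, axis=1)
    print(f"d = {d:4d}: max/min distance ratio = {dists.max() / dists.min():.2f}")

# The ratio shrinks towards 1 as d grows, so distance-based anomaly
# scores become numerically similar and lose contrast.
```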
6.1.5 Machine Learning

In machine learning, a marginal increase in dimensionality also requires a large increase in the volume of the data in order to maintain the same level of performance. The curse of dimensionality is the by-product of a phenomenon which appears with high-dimensional data.

6.1.6 How to Combat the COD ?

Combating the COD is not such a big deal as long as we have dimensionality reduction. Dimensionality reduction is the process of reducing the number of input variables in a dataset. This is known as converting high-dimensional variables into lower-dimensional variables without changing their attributes.

It does not contain any extra variables, which makes it very simple for analysts to analyse the data, leading to faster results for algorithms.

6.1.7 Mitigating Curse of Dimensionality

To mitigate the problems associated with high-dimensional data, techniques generally referred to as 'dimensionality reduction' techniques are used. Dimensionality reduction techniques fall into one of two categories : 'feature selection' or 'feature extraction'. We shall deal with these techniques in the next section.

6.2 STATISTICAL FEATURES

6.2.1 Introduction

Statistical features are the most important features used in statistical data science. They are the first statistical technique applied while exploring a dataset, and include quantities such as bias, mean, variance, standard deviation, median, moments, percentiles, etc.

The field of utility of statistics has been increasing steadily, and thus different people have expressed the features of statistics differently according to the developments of the subject. In the old days statistics was regarded as the "science of statecraft", but today it covers almost every sphere of natural and human activity. It is more exhaustive and elaborate in approach.
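The basic statistical features named above are easy to compute directly; the following lines (an added illustration with made-up numbers) obtain them with NumPy.

```python
import numpy as np

data = np.array([12.0, 15.5, 9.8, 22.1, 17.4, 13.3, 19.0, 11.2])

print("mean              :", np.mean(data))
print("variance          :", np.var(data, ddof=1))    # sample variance
print("standard deviation:", np.std(data, ddof=1))
print("median            :", np.median(data))
print("25th / 75th perc. :", np.percentile(data, [25, 75]))
```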
6.2.2 Selected Features

(1) "Statistics are the classified facts representing the conditions of the people in a state, specially those facts which can be stated in number or in tables of numbers or in any tabular or classified arrangement."
    - Webster

(2) "Statistics are numerical statements of facts in any department of enquiry placed in relation to each other."
    - Bowley

(3) "By statistics we mean quantitative data affected to a marked extent by multiplicity of causes."
    - Yule and Kendall

(4) "Statistics is the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other."
    - Secrist
Remarks

(1) According to Webster, the features of statistics are only numerical facts. This restricts the domain of statistics to the affairs of a state, i.e. to the social sciences (economics, etc.). But this is very inadequate, for modern statistics covers all sciences - social, physical and natural.

(2) Bowley features out statistics in a more general way : it relates numerical data to any department of enquiry. It also refers to the comparative study of the figures, as against mere classification and tabulation.

(3) In the case of social, economic and business phenomena, Yule and Kendall's feature refers to data affected by a multiplicity of causes. For example, the price of a particular commodity is affected by supply, demand, imports, exports, money in circulation, competitive products in the market and so on. Similarly, the yield of a particular crop depends upon a multiplicity of factors like quality of seed, fertility of soil, method of cultivation, irrigation facilities, weather conditions, fertilizers used and so on.

(4) Secrist's feature of statistics is more exhaustive. We study its elements :

    (a) Aggregate of facts : Simple or isolated items cannot be termed statistics unless they are a part of an aggregate of facts relating to any particular field of enquiry. The aggregate of figures of births, deaths, sales, purchases, production, profits, etc. over different times, places, etc. constitute statistics.

    (b) Affected by multiplicity of causes : In physical sciences, it is possible to isolate the effect of various factors on a single item, or the isolated effect of a single factor on the given item, provided the effect of each of the factors can be measured quantitatively.

Definition of a Feature

A feature is defined as something that gives or brings special attention to someone or something, for example a part of the face, a quality, a special attraction, etc. An example : the nose.

Feature in Data

Each feature represents a measurable piece of data that can be used for analysis, for example name, age, sex, fare and so on. Features are also sometimes referred to as 'variables' or 'attributes'. Depending upon the type of analysis, the features can vary widely.

6.2.4 Features of Big Data

Important issues regarding data in a traditional file system are :

1. Volume
2. Velocity
3. Variety
4. Variability
5. Complexity

Fig. 6.2.1 : Important issues regarding data in a traditional file system

1. Volume

Nowadays the volume of data regarding different fields is high and is potentially increasing day by day. Organizations collect data from a variety of sources, including business transactions, social media and other information.
The configuration of a system with a single processor, limited RAM and limited storage capacity cannot store and manage such a high volume of data.

3. Variety

The form of data from different sources is different.

4. Variability

The flow of data coming from sources like social media is inconsistent because of daily emerging new trends. It can show a sudden increase in the size of data which is difficult to manage.

5. Complexity

As the data is coming from various sources, it is difficult to link, match and transform such data across systems. It is necessary to connect and correlate relationships, hierarchies and multiple data linkages of the data.

All these issues are solved by the new advanced Big Data technology.

6.2.5 Features in a Model

A feature model defines features and their dependencies, typically in the form of a feature diagram (with constraints). It could also be given as a table of possible combinations.

Features are parts or patterns of an object in an image that help to identify it. For example, a triangle has three corners and three edges; these are features of the triangle, and they help us to identify it as a 'triangle'. Features include properties like corners, edges, regions of interest points, ridges, etc.

6.2.6 Four Basic Elements of Statistics

They are : population, sample, parameter, statistic (singular), and variable. These form the basic vocabulary of statistics. The variables used in statistical studies are of the following types :

(1) Independent and dependent variables
(2) Active and attribute variables
(3) Continuous variable
(4) Discrete variable
(5) Categorical variable
(6) Extraneous variable, and
(7) Demographic variable

6.2.8 Statistical Methods

In statistical analysis of raw research data, the methods used are mathematical formulae, models and (mathematical) techniques. The application of statistical methods extracts information from research data and provides different ways to assess the strength of research outputs. We use numerical (discrete, continuous), categorical and ordinal methods for the research output.

6.2.9 Statistical Function

Statistics helps in providing a better understanding and an accurate description of nature's phenomena. Statistics helps in the proper and efficient planning of a statistical inquiry in any field of study. Statistics helps in collecting appropriate quantitative data. The three basic elements in statistics are measurement, comparison and variation.

A statistical function, such as mean, median, or variance, summarizes a sample of values by a single value. Such functions expect their parameters to be a probabilistic value represented by a random sample of values over the run index.

The following functions are statistical functions and they are very similar to functions such as SUM, MAX and AVERAGE :
1. COUNT function

We use the COUNT function when we need to count the number of cells containing a number.

2. COUNTA function

While the COUNT function only counts numeric values, the COUNTA function counts all the cells in a range that are not empty. The function is useful for counting cells containing any type of information, including error values and empty text.

3. COUNTBLANK function

The COUNTBLANK function counts the number of empty cells in a range of cells. Cells with formulae that return empty text are also counted, but cells with zero values are not counted. This is an important function for summarizing empty cells while analyzing any data.
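These are spreadsheet functions; the snippet below (an added illustration using pandas on a small made-up range of cells) reproduces the same three counts in Python.

```python
import pandas as pd

# A small made-up range of cells: numbers, text, an empty cell (None),
# empty text ("") and a zero value.
cells = pd.Series([10, 25.5, "apple", "", None, 0, 7])

is_number = pd.to_numeric(cells, errors="coerce").notna()

count = is_number.sum()                            # like COUNT: numeric cells only
counta = cells.notna().sum()                       # like COUNTA: all non-empty cells
countblank = (cells.isna() | (cells == "")).sum()  # like COUNTBLANK: blank or empty text

print(count, counta, countblank)                   # 4 6 2
```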
6.3 FEATURES AND CLASSIFICATION

Classification is the act of forming a class or classes according to some common relations or attributes, while a feature is a distinguishable characteristic of a person or thing.

6.3.1 Feature Extraction (In Machine Learning)

In real life, all the data we collect comes in large amounts. To understand this data, we need a process; manually, it is not possible to process it. Here is where the concept of feature extraction comes in.

Feature extraction is a general term for methods of constructing combinations of the variables to get around these problems while still describing the data with sufficient accuracy. Properly optimized feature extraction is the key to effective model construction.

6.3.2 Why Feature Extraction is Useful ?

The technique of extracting features is useful when you have a large data set and need to reduce the number of resources without losing any important or relevant information. Feature extraction helps to reduce the amount of redundant data in the data set.

In the end, the reduction of the data helps to build the model with less machine effort and also increases the speed of the learning and generalization steps in the machine learning process.

(1) The goal of statistical pattern feature extraction (SPFE) is 'low dimensional reduction'.

(2) Five measures of statistical features are :
    (i)   the most extreme values in the data set (the maximum and minimum values),
    (ii)  the lower and upper quartiles, and
    (iii) the median.

(3) These values are presented together and ordered from lowest to highest : minimum value, lower quartile (Q1), median value (Q2), upper quartile (Q3), maximum value (a short NumPy illustration follows this list).

(4) Statistical features of an image are the binary object features. They include area, perimeter, centre of area, axis of least second moment, Euler number, projection, thinness ratio, and aspect ratio. Area, centre of area, axis of least second moment and perimeter tell us something about where the object is; the other four tell us about the shape of the object.
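These five values make up the classic five-number summary; a minimal sketch (added here, with made-up numbers) computes them with NumPy.

```python
import numpy as np

values = np.array([7, 15, 36, 39, 40, 41, 42, 43, 47, 49])

five_number_summary = {
    "minimum":        np.min(values),
    "lower quartile": np.percentile(values, 25),   # Q1
    "median":         np.median(values),           # Q2
    "upper quartile": np.percentile(values, 75),   # Q3
    "maximum":        np.max(values),
}

for name, value in five_number_summary.items():
    print(f"{name:15s}: {value}")
```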
6.3.3 Time Domain Features

Time domain features refer to the analysis of mathematical functions, physical signals, or time series of economic or environmental data, with respect to time.

In the time domain, the value of the signal or function is known either for all real numbers, in the case of continuous time, or at various separate instants, in the case of discrete time.
For time series data, feature extraction can be performed using various time series analysis and decomposition techniques. Features can also be obtained by sequence comparison techniques such as dynamic time warping, and by subsequence discovery techniques such as motif analysis.

A time-domain feature shows how a signal changes with time, while a frequency-domain graph shows how much of the signal lies within each given frequency band over a range of frequencies.
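As a small illustration (added here; the signal and the chosen features are only an example), a few common time-domain features of a discrete signal can be computed directly with NumPy.

```python
import numpy as np

# Example discrete-time signal: a noisy 5 Hz sine wave sampled at 100 Hz.
fs = 100
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

# A few typical time-domain features of the signal.
features = {
    "mean":           np.mean(signal),
    "std":            np.std(signal),
    "rms":            np.sqrt(np.mean(signal ** 2)),
    "peak-to-peak":   np.ptp(signal),
    "zero crossings": int(np.sum(np.diff(np.sign(signal)) != 0)),
}
print(features)
```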
Feature Values

A feature value is a determination made by the customer about which feature is most important to the customers.

Shape Features

A shape feature provides an alternative way of describing an object, using its most important characteristics, and reduces the amount of information stored. Algorithms can detect such features as shapes, edges, or motion in a digital image or video and process them.

(3) Auto-encoders : The main purpose of auto-encoders is efficient data coding, which is unsupervised in nature. This process comes under unsupervised learning, so the feature extraction procedure is applicable here : the key features are identified from the data by learning a coding of the original data set and are used to derive new ones.
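A minimal sketch of this idea (an added example; it uses PyTorch on random made-up data, and the layer sizes and training settings are arbitrary) trains an auto-encoder and then uses its bottleneck code as the extracted features.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.rand(256, 20)                    # made-up data: 256 samples, 20 features

# Encoder compresses 20 inputs to a 3-dimensional code; decoder reconstructs them.
encoder = nn.Sequential(nn.Linear(20, 8), nn.ReLU(), nn.Linear(8, 3))
decoder = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 20))
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(200):                   # unsupervised training: reconstruct X
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)
    loss.backward()
    optimizer.step()

codes = encoder(X).detach()                # learned 3-dimensional features
print(codes.shape)                         # torch.Size([256, 3])
```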
6.4 PRINCIPAL COMPONENT ANALYSIS (P.C.A.)

6.4.1 Introduction

Principal component analysis represents a multivariate data table as a smaller set of variables in order to observe trends, jumps, clusters and outliers. This overview may give us the relationships between observations and variables, and also among variables.

The PCA directions with the largest variances are the most important (i.e. the most principal). The PC1 axis is the first principal direction, along which the samples show the largest variation. PC2 is the second most important direction, and it is orthogonal to the PC1 axis (refer to article 6.4.2 above).

(a) Covariance : Let x̄ and ȳ be the means of two data variables x and y. Covariance measures the relationship between the two variables :

    Cov(x, y) = (1 / (n - 1)) Σ (from i = 1 to n) (x_i - x̄)(y_i - ȳ)    ...(1)

(b) Variance operates on a single variable and measures the spread of the points.

Principal components are often computed by eigen decomposition of the data covariance matrix or by singular value decomposition of the data matrix. PCA is closely related to factor analysis. Factor analysis incorporates more domain-specific assumptions about the underlying structure and solves the eigenvectors of a slightly different matrix.

PCA is also related to canonical correlation analysis (CCA). CCA defines coordinate systems that optimally describe the cross-covariance between two datasets, while PCA defines a new orthogonal coordinate system that optimally describes the variance in a single dataset.

Thus PCA is defined as an orthogonal linear transformation that transforms the data to a new coordinate system. The transformation is given by a set of size l of p-dimensional vectors of weights or coefficients w_(k) = (w_1, ..., w_p)_(k) that map each row vector x_(i) of X to a new vector of principal component scores t_(i) = (t_1, ..., t_l)_(i), given by

    t_k(i) = x_(i) · w_(k),    for i = 1, 2, ..., n and k = 1, ..., l

(l is usually selected as strictly less than p to reduce dimensionality.)

First component

In order to maximize variance, the first weight vector w_(1) satisfies

    w_(1) = arg max (over ||w|| = 1) Σ_i (x_(i) · w)^2

Further components

The k-th component can be found by subtracting the first k - 1 principal components from X.

Table 6.5.1 : Symbols and abbreviations

Symbol        Meaning                                         Dimensions   Indices
X = (X_ij)    Data matrix consisting of the set of all        n x p        i = 1, ..., n
              data vectors, one vector per row                             j = 1, ..., p
n             The number of row vectors in the data set       1 x 1        scalar
p             The number of elements in each row vector       1 x 1        scalar
s = (s_j)     One standard deviation for each column j        p x 1        j = 1, ..., p
              of the data matrix
h = (h_i)     Vector of all 1's                               1 x n        i = 1, ..., n
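This definition translates almost line by line into NumPy; the sketch below (an added illustration; the data matrix and the helper name pca are made up for the example) centres the data, eigendecomposes the covariance matrix, and projects each row onto the leading l weight vectors.

```python
import numpy as np

def pca(X, l):
    """Project the rows of X onto the first l principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)          # centre each column
    cov = np.cov(X_centered, rowvar=False)   # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigen decomposition
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    W = eigvecs[:, order[:l]]                # p x l matrix of weight vectors w_(k)
    T = X_centered @ W                       # n x l principal component scores t_(i)
    return T, W, eigvals[order]

# Small made-up data set: 5 samples with p = 3 features, reduced to l = 2 scores.
X = [[2.5, 2.4, 1.1],
     [0.5, 0.7, 0.4],
     [2.2, 2.9, 1.0],
     [1.9, 2.2, 0.9],
     [3.1, 3.0, 1.3]]
scores, weights, variances = pca(X, l=2)
print(scores.shape)   # (5, 2)
```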
6.5.1 Feature Selection Ranking

Many feature selection methods used in classification are directly applied to ranking. Specifically, for each …

Fig. : Feature selection keeps a subset of the original variables x1, ..., xN, while feature extraction maps them to new variables y1, ..., yM = f(x1, ..., xN)

A projection matrix W is computed that maps the N-dimensional vectors to M-dimensional vectors with low error :

    z = W^T x

Principal component analysis and independent component analysis are feature extraction methods of this kind.

Feature Selection Steps

Feature selection is an optimization problem.

Step 1 : Search the space of possible feature subsets.

Step 2 : Pick the subset that is optimal or near-optimal.
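As a concrete sketch of ranking-based selection (added here as an example; it relies on scikit-learn's SelectKBest and the Iris data purely for illustration), each feature is scored individually and only the top-ranked subset is kept.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score every feature individually (univariate ANOVA F-test), then keep
# the k best -- a simple instance of feature selection by ranking.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("feature scores :", selector.scores_)
print("kept features  :", selector.get_support(indices=True))
print("reduced shape  :", X_selected.shape)     # (150, 2)
```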
First we will find the first eigen vector, corresponding to the first eigen value γ1 = 0.0489, from C V = γ V :

    [ 0.6165    0.61544 ] [ V11 ]            [ V11 ]
    [ 0.61544   0.7165  ] [ V12 ]  =  0.0489 [ V12 ]

From the above we get two equations :

    0.6165 V11 + 0.61544 V12 = 0.0489 V11      ...(1)
    0.61544 V11 + 0.7165 V12 = 0.0489 V12      ...(2)

To find the eigen vector we can take either Equation (1) or (2); both give the same answer. Let us take the first equation :

    0.6165 V11 + 0.61544 V12 = 0.0489 V11
    0.5676 V11 + 0.61544 V12 = 0
    0.5676 V11 = -0.61544 V12
    V11 = -1.0842 V12

Now we assume V12 = 1, so that V11 = -1.0842 and

    V1 = [ -1.0842 ]
         [    1    ]

As we have assumed the value of V12, we have to normalize V1 as follows :

    V1N = V1 / sqrt(1.0842^2 + 1^2) = [ -0.735 ]
                                      [  0.678 ]

Similarly we find the second eigen vector, corresponding to the second eigen value γ2 = 1.284. Taking the first equation again :

    0.6165 V21 + 0.61544 V22 = 1.284 V21
    1.284 V21 - 0.6165 V21 = 0.61544 V22
    0.6675 V21 = 0.61544 V22
    V21 = 0.922 V22

Now we assume V22 = 1, so that V21 = 0.922 and

    V2 = [ 0.922 ]
         [   1   ]

As we have assumed the value of V22, we have to normalize V2 as follows :

    V2N = V2 / sqrt(0.922^2 + 1^2) = [ 0.678 ]
                                     [ 0.735 ]

Now we have to find the principal component. It is the eigen vector corresponding to the maximum eigen value; here γ2 is the maximum, hence the principal component is the eigen vector V2N :

    Principal component = [ 0.678 ]
                          [ 0.735 ]
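The arithmetic above can be checked in a few lines of NumPy (added here only as a verification aid; the matrix is the covariance matrix used in the example).

```python
import numpy as np

# Covariance matrix from the worked example above.
C = np.array([[0.6165,  0.61544],
              [0.61544, 0.7165 ]])

eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
print("eigenvalues :", eigvals)         # approx. [0.0489, 1.2841]
print("eigenvectors (columns):")
print(eigvecs)

# The principal component is the eigenvector of the largest eigenvalue.
principal = eigvecs[:, np.argmax(eigvals)]
print("principal component :", principal)   # approx. [0.678, 0.735] (up to sign)
```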
Chapter Ends..