FEATURE EXTRACTION
□New features are created from a combination of the original features.
□Some of the commonly used operators for combining the original features include:
1. For Boolean features: Conjunction, Disjunction, Negation, etc.
2. For nominal features: Cartesian product, etc.
3. For numerical features: Min, Max, Addition, Subtraction, Multiplication, Division, Average, Equivalence, Inequality, etc.
□We have a data set with a feature set F = (F1, F2, …, Fn). After feature extraction using a mapping function f, say, we will have a new set of features F′ = (F1′, F2′, …, Fm′) such that F′ = f(F1, F2, …, Fn) and m < n.
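As an illustration, here is a minimal sketch (with hypothetical feature names and values) of how a smaller set of new features F′ can be constructed by combining original features with the operators listed above:

import numpy as np

# Hypothetical data set: each row is an instance with three numerical
# features (e.g. length, width, height) and two Boolean features.
X_num = np.array([[2.0, 3.0, 1.0],
                  [4.0, 1.0, 2.0],
                  [6.0, 2.0, 3.0]])
X_bool = np.array([[True, False],
                   [True, True],
                   [False, True]])

# Numerical operators: combine original features into new ones.
area    = X_num[:, 0] * X_num[:, 1]   # Multiplication
average = X_num.mean(axis=1)          # Average

# Boolean operator: conjunction of the two Boolean features.
conj = X_bool[:, 0] & X_bool[:, 1]

# The extracted feature set F' = f(F) has m = 3 features, m < n = 5.
X_new = np.column_stack([area, average, conj])
print(X_new)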
PRINCIPAL COMPONENT ANALYSIS
□In PCA, a new set of features is extracted from the original features. These new features are quite dissimilar in nature to the original features.
□The new features are distinct, i.e. the covariance between the new features (the principal components) is 0.
□The principal components are generated in order of the variability in the data that they capture.
□The sum of the variance of the new features, i.e. the principal components, should be equal to the sum of the variance of the original features.
□First, calculate the covariance matrix of the data set.
□Then, calculate the eigenvalues and eigenvectors of the covariance matrix.
□The eigenvector having the highest eigenvalue represents the direction in which there is the highest variance.
□In singular value decomposition (SVD), the data matrix A is factorized as A = U × Σ × V^T. Patterns among the instances are captured by the left-singular vectors, i.e. the columns of U.
□The larger a singular value, the larger the part of the matrix A accounted for by it and its associated singular vectors.
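A minimal sketch of SVD with NumPy, using a small hypothetical matrix A; np.linalg.svd returns the singular values in decreasing order:

import numpy as np

A = np.array([[2.0, 4.0], [1.0, 3.0], [0.0, 0.0], [0.0, 0.0]])

# Compact SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U capture patterns among the instances (rows of A);
# rows of Vt capture patterns among the attributes (columns of A).
print("singular values:", s)

# Rank-1 reconstruction from the leading singular triplet: the larger
# the singular value, the larger the part of A this term accounts for.
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print("rank-1 approximation:\n", A1)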
□The new data matrix with k attributes is obtained using the equation
□D′ = D × [v1, v2, …, vk]
where v1, v2, …, vk are the k leading eigenvectors.
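Putting the steps together, here is a minimal PCA sketch in NumPy on hypothetical random data, assuming the data is mean-centered before the covariance matrix is computed:

import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(100, 3))          # hypothetical data: 100 x 3

# Step 1: covariance matrix of the mean-centered data set.
Dc = D - D.mean(axis=0)
C = np.cov(Dc, rowvar=False)

# Step 2: eigenvalues and eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)

# Step 3: sort by decreasing eigenvalue; the eigenvector with the
# highest eigenvalue points in the direction of highest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project onto the first k eigenvectors: D' = D x [v1 ... vk].
k = 2
D_new = Dc @ eigvecs[:, :k]
print(D_new.shape)                     # (100, 2)

# Total variance is preserved when all components are kept.
print(np.isclose(eigvals.sum(), np.trace(C)))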
LINEAR DISCRIMINANT ANALYSIS
□Calculate the mean vectors for the individual classes.
□Calculate intra-class and inter-class scatter matrices.
□Calculate the eigenvalues and eigenvectors of Sw⁻¹SB, where Sw is the intra-class scatter matrix and SB is the inter-class scatter matrix.
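A minimal sketch of these LDA steps in NumPy, on a hypothetical two-class data set; the most discriminative direction is the eigenvector of Sw⁻¹SB with the largest eigenvalue:

import numpy as np

# Hypothetical two-class data set: X is (n, d), y holds class labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

d = X.shape[1]
mean_all = X.mean(axis=0)
S_W = np.zeros((d, d))                 # intra-class scatter matrix
S_B = np.zeros((d, d))                 # inter-class scatter matrix

for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)           # step 1: class mean vector
    S_W += (Xc - mean_c).T @ (Xc - mean_c)
    diff = (mean_c - mean_all).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)

# Step 3: eigen-decomposition of Sw^{-1} SB.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = eigvecs[:, np.argmax(eigvals.real)].real   # most discriminative axis
print((X @ w)[:5])                     # 1-D projection of the data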
□Feature redundancy
Redundant features are those that are correlated with other features and are not relevant, in the sense that they do not improve the discriminatory ability of the feature set.
MEASURES OF FEATURE RELEVANCE
□Mutual Information
For supervised learning, mutual information is considered a good measure of the information a feature contributes towards deciding the value of the class label.
The higher the value of the mutual information of a feature, the more relevant that feature is.
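As an illustration, scikit-learn's mutual_info_classif estimates the mutual information between each feature and the class label; a minimal sketch on the Iris data set:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Estimated mutual information between each feature and the class
# label; higher values indicate more relevant features.
mi = mutual_info_classif(X, y, random_state=0)
for name, score in zip(load_iris().feature_names, mi):
    print(f"{name}: {score:.3f}")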
MEASURES OF FEATURE REDUNDANCY
□Correlation-based measures
□Distance-based measures, and
□Other coefficient-based measures
CORRELATION BASED MEASURES
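The most common correlation-based measure is the Pearson correlation coefficient between a pair of features; values close to +1 or −1 flag a largely redundant feature. A minimal sketch with NumPy on hypothetical data:

import numpy as np

rng = np.random.default_rng(2)
f1 = rng.normal(size=200)
f2 = 2.0 * f1 + rng.normal(scale=0.1, size=200)  # nearly redundant with f1
f3 = rng.normal(size=200)                        # unrelated feature

X = np.column_stack([f1, f2, f3])

# Pairwise Pearson correlation matrix; entries near +/-1 flag
# redundant feature pairs that are candidates for removal.
R = np.corrcoef(X, rowvar=False)
print(np.round(R, 2))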
DISTANCE BASED SIMILARITY MEASURES
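Commonly used distance measures include the Euclidean and Manhattan distances, both special cases of the Minkowski distance. A minimal sketch on two hypothetical feature vectors:

import numpy as np

x = np.array([1.0, 3.0, 5.0])
y = np.array([2.0, 1.0, 4.0])

# Minkowski distance of order r; r=1 gives Manhattan, r=2 Euclidean.
def minkowski(a, b, r):
    return np.sum(np.abs(a - b) ** r) ** (1.0 / r)

print("Manhattan:", minkowski(x, y, 1))   # 4.0
print("Euclidean:", minkowski(x, y, 2))   # sqrt(6) ~ 2.449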
JACCARD & SIMPLE MATCHING COEFFICIENT
□Let’s consider two features F1 and F2 having values
□(0, 1, 1, 0, 1, 0, 1, 0) and (1, 1, 0, 0, 1, 0, 0, 0).
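Counting the value pairs gives q11 = 2, q10 = 2, q01 = 1 and q00 = 3, so the Jaccard coefficient J = q11 / (q01 + q10 + q11) = 2/5 = 0.4 and the simple matching coefficient SMC = (q11 + q00) / (q01 + q10 + q11 + q00) = 5/8 = 0.625. A minimal sketch in NumPy:

import numpy as np

F1 = np.array([0, 1, 1, 0, 1, 0, 1, 0])
F2 = np.array([1, 1, 0, 0, 1, 0, 0, 0])

q11 = np.sum((F1 == 1) & (F2 == 1))   # both 1        -> 2
q10 = np.sum((F1 == 1) & (F2 == 0))   # F1=1, F2=0    -> 2
q01 = np.sum((F1 == 0) & (F2 == 1))   # F1=0, F2=1    -> 1
q00 = np.sum((F1 == 0) & (F2 == 0))   # both 0        -> 3

jaccard = q11 / (q11 + q10 + q01)            # 2/5 = 0.4
smc = (q11 + q00) / (q11 + q10 + q01 + q00)  # 5/8 = 0.625
print(jaccard, smc)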
COSINE SIMILARITY MEASURES
□Let’s consider two vectors x = (2, 4, 0, 0, 2, 1, 3, 0, 0) and y = (2, 1, 0, 0, 3, 2, 1, 0, 1).
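Here cos(x, y) = (x · y) / (‖x‖ × ‖y‖) = 19 / (√34 × √20) ≈ 0.73. A minimal sketch in NumPy:

import numpy as np

x = np.array([2, 4, 0, 0, 2, 1, 3, 0, 0], dtype=float)
y = np.array([2, 1, 0, 0, 3, 2, 1, 0, 1], dtype=float)

# cos(x, y) = (x . y) / (||x|| * ||y||)
cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_sim)   # 19 / (sqrt(34) * sqrt(20)) ~ 0.73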
OTHER SIMILARITY MEASURES
□Hamming distance calculates the distance between two binary vectors:
1. Take the XOR of the two vectors.
2. Count the number of 1's in the result.
Ex: 01101011 XOR 11001001 = 10100010, which has three 1's, so the Hamming distance is 3.
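A minimal Python sketch of the two steps:

# Hamming distance between two binary vectors: XOR, then count the 1's.
a = 0b01101011
b = 0b11001001

xor = a ^ b                     # 0b10100010
distance = bin(xor).count("1")  # three 1's -> Hamming distance 3
print(distance)                 # 3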