Feature Extraction

Feature extraction involves creating new features from combinations of original features to reduce dimensionality while retaining important information. Common techniques include taking conjunctions and disjunctions of Boolean features and products and averages of numerical features. Principal component analysis transforms the features into orthogonal principal components ordered by the amount of variance each explains. Singular value decomposition captures patterns in the attributes with the right-singular vectors and patterns among the instances with the left-singular vectors; the larger a singular value, the more of the original matrix it accounts for.
FEATURE EXTRACTION
□ New features are created from a combination of the original features.
□ Some of the commonly used operators for combining the original features include:
1. For Boolean features: conjunctions, disjunctions, negation, etc.
2. For nominal features: Cartesian product, etc.
3. For numerical features: min, max, addition, subtraction, multiplication, division, average, equivalence, inequality, etc.

□ Suppose we have a data set with a feature set F = (F1, F2, …, Fn). After feature extraction using a mapping function f, we will have a new set of features F' = (F'1, F'2, …, F'm) = f(F1, F2, …, Fn) such that m < n.
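As a minimal sketch of such a mapping (the data and the chosen combinations are hypothetical), new numerical features can be built as products and averages of the original ones:

    import numpy as np

    # Hypothetical data set: 4 instances, n = 4 numerical features (F1..F4)
    X = np.array([[1.0, 2.0, 3.0, 4.0],
                  [4.0, 5.0, 6.0, 7.0],
                  [7.0, 8.0, 9.0, 1.0],
                  [2.0, 1.0, 0.5, 3.0]])

    # New features created by combining the original ones
    f1_new = X[:, 0] * X[:, 1]          # product of F1 and F2
    f2_new = (X[:, 2] + X[:, 3]) / 2.0  # average of F3 and F4

    # Extracted feature set F' with m = 2 < n = 4
    X_new = np.column_stack([f1_new, f2_new])
    print(X_new.shape)                  # (4, 2)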
PRINCIPAL COMPONENT ANALYSIS
□ In PCA, a new set of features is extracted from the original features; the extracted features are quite different in nature from the original ones.

□ An n-dimensional feature space gets transformed into an m-dimensional feature space in which the dimensions are orthogonal to each other, i.e. completely uncorrelated with each other.

□ The new features are distinct, i.e. the covariance between the new features (the principal components) is 0.

□ The principal components are generated in order of the variability in the data that they capture.

□ The sum of the variance of the new features (the principal components) should be equal to the sum of the variance of the original features.

□ First, calculate the covariance matrix of the data set.
□ Then, calculate the eigenvalues and eigenvectors of the covariance matrix.
□ The eigenvector having the highest eigenvalue represents the direction in which there is the highest variance.

□This will help in identifying the first principal component.


□ The eigenvector having the next highest eigenvalue represents the direction in which the data has the highest remaining variance, and it is also orthogonal to the first direction.

□ So this helps in identifying the second principal component.


□ In this way, identify the top 'k' eigenvectors (those having the top 'k' eigenvalues) to get the 'k' principal components; a sketch of these steps follows below.
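A minimal sketch of the steps above (assuming a NumPy array X of shape (instances, features); the data is hypothetical):

    import numpy as np

    # Hypothetical data: 6 instances, 3 features
    X = np.array([[2.5, 2.4, 0.5],
                  [0.5, 0.7, 1.9],
                  [2.2, 2.9, 0.8],
                  [1.9, 2.2, 1.1],
                  [3.1, 3.0, 0.4],
                  [2.3, 2.7, 0.9]])

    k = 2                                   # number of principal components to keep

    X_centered = X - X.mean(axis=0)         # centre the data
    cov = np.cov(X_centered, rowvar=False)  # covariance matrix of the data set

    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues/eigenvectors (ascending)
    order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue
    top_k = eigvecs[:, order[:k]]           # top-k eigenvectors = principal directions

    X_pca = X_centered @ top_k              # data projected onto the k principal components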
SINGULAR VALUE DECOMPOSITION
□Patterns in the attributes are captured by the right-singular vectors, i.e. the
columns of V.

□ Patterns among the instances are captured by the left-singular vectors, i.e. the columns of U.

□ The larger a singular value, the larger the part of the matrix A that it and its associated singular vectors account for.

□ A new data matrix with 'k' attributes is obtained using the equation
□ D' = D × [v1, v2, …, vk]
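A minimal sketch using NumPy (the data matrix D is hypothetical; note that numpy.linalg.svd returns V transposed, so the right-singular vectors are the rows of Vt):

    import numpy as np

    # Hypothetical data matrix D: 5 instances x 4 attributes
    D = np.array([[1.0, 0.0, 2.0, 3.0],
                  [0.0, 1.0, 4.0, 1.0],
                  [2.0, 1.0, 0.0, 0.0],
                  [3.0, 2.0, 1.0, 1.0],
                  [1.0, 3.0, 0.0, 2.0]])

    U, s, Vt = np.linalg.svd(D, full_matrices=False)  # D = U * diag(s) * Vt

    k = 2                          # number of attributes to keep
    V_k = Vt[:k].T                 # top-k right-singular vectors [v1, v2, ..., vk]
    D_new = D @ V_k                # D' = D x [v1, v2, ..., vk]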
LINEAR DISCRIMINANT ANALYSIS
□ Calculate the mean vectors for the individual classes.
□ Calculate the intra-class and inter-class scatter matrices.
□ Calculate the eigenvalues and eigenvectors of Sw⁻¹·SB, where Sw is the intra-class scatter matrix and SB is the inter-class scatter matrix.

□ The intra-class scatter matrix is Sw = Σi Σ(x in class i) (x − mi)(x − mi)ᵀ, where mi is the mean vector of the i-th class.

□ The inter-class scatter matrix is SB = Σi Ni (mi − m)(mi − m)ᵀ, where mi is the sample mean for each class, m is the overall mean of the data set, and Ni is the sample size of each class.
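A minimal sketch of these steps (assuming a NumPy array X of shape (instances, features) and integer class labels y; the data is hypothetical):

    import numpy as np

    # Hypothetical two-class data set
    X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0],
                  [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])
    y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

    m = X.mean(axis=0)                          # overall mean of the data set
    n_features = X.shape[1]

    S_W = np.zeros((n_features, n_features))    # intra-class scatter matrix
    S_B = np.zeros((n_features, n_features))    # inter-class scatter matrix
    for c in np.unique(y):
        X_c = X[y == c]
        m_c = X_c.mean(axis=0)                  # mean vector of class c
        S_W += (X_c - m_c).T @ (X_c - m_c)
        diff = (m_c - m).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)

    # Eigen-decomposition of Sw^-1 * SB; keep the top eigenvector as the discriminant direction
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:1]].real
    X_lda = X @ W                               # data projected onto the discriminant direction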
FEATURE SUBSET SELECTION
□ It intends to select a subset of the system attributes or features which make the most meaningful contribution to a machine learning activity.

□ It helps address the issues that arise with high-dimensional data.

□ The objective of feature selection is three-fold:

□ Having a faster and more cost-effective (i.e. requiring fewer computational resources) learning model
□ Improving the efficiency of the learning model
□ Having a better understanding of the underlying model that generated the data
FEATURE RELEVANCE AND REDUNDANCY
□Feature relevance
Feature (variable) importance indicates how much each feature contributes to the
model prediction.
Basically, it determines the degree of usefulness of a specific variable for the current model and prediction.

□Feature redundancy
Redundant features are those that are strongly correlated with other features; they are not relevant in the sense that they do not improve the discriminatory ability of the feature set.
MEASURES OF FEATURE RELEVANCE
□Mutual Information
For supervised learning, mutual information is considered a good measure of the information a feature contributes towards deciding the value of the class label.
The higher the value of the mutual information of a feature, the more relevant that feature is.
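A minimal sketch of scoring feature relevance by mutual information (assuming scikit-learn is available; mutual_info_classif is one estimator of mutual information between each feature and a discrete class label, and the data set here is hypothetical):

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    # Hypothetical data set: 8 instances, 3 features, binary class label
    X = np.array([[1.0, 5.0, 0.1],
                  [2.0, 4.0, 0.2],
                  [3.0, 6.0, 0.1],
                  [2.5, 5.5, 0.2],
                  [8.0, 5.0, 0.9],
                  [9.0, 4.0, 0.8],
                  [7.0, 6.0, 0.9],
                  [8.5, 4.5, 0.8]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    # Mutual information between each feature and the class label;
    # higher values indicate more relevant features
    mi = mutual_info_classif(X, y, random_state=0)
    print(mi)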
MEASURES OF FEATURE REDUNDANCY
□Correlation-based measures
□Distance-based measures, and
□ Other coefficient-based measures
CORRELATION BASED MEASURES
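The correlation formula from this slide is not reproduced in the text; as a minimal sketch, the Pearson correlation between two features can be computed with NumPy (the feature values are hypothetical):

    import numpy as np

    # Hypothetical values of two numerical features
    F1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    F2 = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Pearson correlation coefficient; values close to +1 or -1
    # indicate that one feature is largely redundant given the other
    r = np.corrcoef(F1, F2)[0, 1]
    print(r)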
DISTANCE BASED SIMILARITY MEASURES
JACCARD & SIMPLE MATCHING COEFFICIENT
□ Let's consider two features F1 and F2 having values (0, 1, 1, 0, 1, 0, 1, 0) and (1, 1, 0, 0, 1, 0, 0, 0); the two coefficients for this pair are computed in the sketch below.
COSINE SIMILARITY MEASURES
OTHER SIMILARITY MEASURES
□ Consider x = (2, 4, 0, 0, 2, 1, 3, 0, 0) and y = (2, 1, 0, 0, 3, 2, 1, 0, 1).
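Assuming these vectors are meant for the cosine similarity example, a minimal sketch of the computation (cos(x, y) = x·y / (|x|·|y|) ≈ 0.73 for these values):

    import numpy as np

    x = np.array([2, 4, 0, 0, 2, 1, 3, 0, 0])
    y = np.array([2, 1, 0, 0, 3, 2, 1, 0, 1])

    # Cosine similarity = dot product divided by the product of the vector norms
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    print(cos_sim)   # approximately 0.73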
□ Hamming distance: calculates the distance between two binary vectors.
1. Take the XOR of the two vectors.
2. Count the number of 1's in the result.
□ Example: 01101011 XOR 11001001 = 10100010, which contains three 1's, so the Hamming distance is 3 (see the sketch below).
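A minimal sketch of the two steps for the example above:

    a = 0b01101011
    b = 0b11001001

    xor = a ^ b                      # step 1: XOR of the two vectors -> 0b10100010
    hamming = bin(xor).count("1")    # step 2: count the 1's in the result
    print(hamming)                   # 3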
