Feature Extraction

Feature extraction involves creating new features from combinations of original features to reduce dimensionality while retaining important information. Common techniques include taking conjunctions and disjunctions of Boolean features and products and averages of numerical features. Principal component analysis transforms the features into orthogonal principal components ordered by the amount of variance each explains. Singular value decomposition captures patterns in the attributes with the right-singular vectors and patterns among the instances with the left-singular vectors; the larger a singular value, the more of the original matrix it accounts for.
FEATURE EXTRACTION
□ New features are created from a combination of the original features.
□ Some of the commonly used operators for combining the original features include:
1. For Boolean features: conjunctions, disjunctions, negation, etc.
2. For nominal features: Cartesian product, etc.
3. For numerical features: min, max, addition, subtraction, multiplication, division, average, equivalence, inequality, etc.

□ Suppose we have a data set with a feature set F = (F1, F2, …, Fn). After feature extraction using a mapping function f, we will have a new set of features F' = (F'1, F'2, …, F'm) = f(F1, F2, …, Fn) such that m < n.
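As a minimal sketch of such a mapping (the data and the chosen combinations are hypothetical), new numerical features can be built as products and averages of the original ones:

    import numpy as np

    # Hypothetical data set: 4 instances, n = 4 numerical features (F1..F4)
    X = np.array([[1.0, 2.0, 3.0, 4.0],
                  [4.0, 5.0, 6.0, 7.0],
                  [7.0, 8.0, 9.0, 1.0],
                  [2.0, 1.0, 0.5, 3.0]])

    # New features created by combining the original ones
    f1_new = X[:, 0] * X[:, 1]          # product of F1 and F2
    f2_new = (X[:, 2] + X[:, 3]) / 2.0  # average of F3 and F4

    # Extracted feature set F' with m = 2 < n = 4
    X_new = np.column_stack([f1_new, f2_new])
    print(X_new.shape)                  # (4, 2)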
PRINCIPAL COMPONENT ANALYSIS
□ In PCA, a new set of features is extracted from the original features; the extracted features are quite different in nature from the original ones.

□ An n-dimensional feature space gets transformed into an m-dimensional feature space in which the dimensions are orthogonal to each other, i.e. completely uncorrelated with each other.

□ The new features are distinct, i.e. the covariance between the new features (the principal components) is 0.

□ The principal components are generated in order of the variability in the data that they capture.

□ The sum of the variance of the new features (the principal components) should be equal to the sum of the variance of the original features.

□ First, calculate the covariance matrix of the data set.
□ Then, calculate the eigenvalues and eigenvectors of the covariance matrix.
□ The eigenvector having the highest eigenvalue represents the direction in which there is the highest variance.

□This will help in identifying the first principal component.


□ The eigenvector having the next highest eigenvalue represents the direction in which the data has the highest remaining variance, and it is also orthogonal to the first direction.

□ So this helps in identifying the second principal component.


□ In this way, identify the top 'k' eigenvectors (those having the top 'k' eigenvalues) to get the 'k' principal components; a sketch of these steps follows below.
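A minimal sketch of the steps above (assuming a NumPy array X of shape (instances, features); the data is hypothetical):

    import numpy as np

    # Hypothetical data: 6 instances, 3 features
    X = np.array([[2.5, 2.4, 0.5],
                  [0.5, 0.7, 1.9],
                  [2.2, 2.9, 0.8],
                  [1.9, 2.2, 1.1],
                  [3.1, 3.0, 0.4],
                  [2.3, 2.7, 0.9]])

    k = 2                                   # number of principal components to keep

    X_centered = X - X.mean(axis=0)         # centre the data
    cov = np.cov(X_centered, rowvar=False)  # covariance matrix of the data set

    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues/eigenvectors (ascending)
    order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue
    top_k = eigvecs[:, order[:k]]           # top-k eigenvectors = principal directions

    X_pca = X_centered @ top_k              # data projected onto the k principal components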
SINGULAR VALUE DECOMPOSITION
□Patterns in the attributes are captured by the right-singular vectors, i.e. the
columns of V.

□ Patterns among the instances are captured by the left-singular vectors, i.e. the columns of U.

□ The larger a singular value, the larger the part of the matrix A that it and its associated singular vectors account for.

□ A new data matrix with 'k' attributes is obtained using the equation
□ D' = D × [v1, v2, …, vk]
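A minimal sketch using NumPy (the data matrix D is hypothetical; note that numpy.linalg.svd returns V transposed, so the right-singular vectors are the rows of Vt):

    import numpy as np

    # Hypothetical data matrix D: 5 instances x 4 attributes
    D = np.array([[1.0, 0.0, 2.0, 3.0],
                  [0.0, 1.0, 4.0, 1.0],
                  [2.0, 1.0, 0.0, 0.0],
                  [3.0, 2.0, 1.0, 1.0],
                  [1.0, 3.0, 0.0, 2.0]])

    U, s, Vt = np.linalg.svd(D, full_matrices=False)  # D = U * diag(s) * Vt

    k = 2                          # number of attributes to keep
    V_k = Vt[:k].T                 # top-k right-singular vectors [v1, v2, ..., vk]
    D_new = D @ V_k                # D' = D x [v1, v2, ..., vk]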
LINEAR DISCRIMINANT ANALYSIS
□ Calculate the mean vectors for the individual classes.
□ Calculate the intra-class and inter-class scatter matrices.
□ Calculate the eigenvalues and eigenvectors of Sw⁻¹·SB, where Sw is the intra-class scatter matrix and SB is the inter-class scatter matrix.

□ The intra-class scatter matrix is Sw = Σi Σ(x in class i) (x − mi)(x − mi)ᵀ, where mi is the mean vector of the i-th class.

□ The inter-class scatter matrix is SB = Σi Ni (mi − m)(mi − m)ᵀ, where mi is the sample mean for each class, m is the overall mean of the data set, and Ni is the sample size of each class.
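A minimal sketch of these steps (assuming a NumPy array X of shape (instances, features) and integer class labels y; the data is hypothetical):

    import numpy as np

    # Hypothetical two-class data set
    X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0],
                  [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])
    y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

    m = X.mean(axis=0)                          # overall mean of the data set
    n_features = X.shape[1]

    S_W = np.zeros((n_features, n_features))    # intra-class scatter matrix
    S_B = np.zeros((n_features, n_features))    # inter-class scatter matrix
    for c in np.unique(y):
        X_c = X[y == c]
        m_c = X_c.mean(axis=0)                  # mean vector of class c
        S_W += (X_c - m_c).T @ (X_c - m_c)
        diff = (m_c - m).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)

    # Eigen-decomposition of Sw^-1 * SB; keep the top eigenvector as the discriminant direction
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:1]].real
    X_lda = X @ W                               # data projected onto the discriminant direction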
FEATURE SUBSET SELECTION
□ It intends to select a subset of the system attributes or features which make the most meaningful contribution to a machine learning activity.

□ It helps address the issues that arise with high-dimensional data.

□ The objective of feature selection is three-fold:

□ Having a faster and more cost-effective (i.e. requiring fewer computational resources) learning model
□ Improving the efficiency of the learning model
□ Having a better understanding of the underlying model that generated the data
FEATURE RELEVANCE AND REDUNDANCY
□Feature relevance
Feature (variable) importance indicates how much each feature contributes to the
model prediction.
Basically, it determines the degree of usefulness of a specific variable for the current model and prediction.

□Feature redundancy
Redundant features are those that are strongly correlated with other features; they are not relevant in the sense that they do not improve the discriminatory ability of the feature set.
MEASURES OF FEATURE RELEVANCE
□Mutual Information
For supervised learning, mutual information is considered a good measure of the information a feature contributes towards deciding the value of the class label.
The higher the value of the mutual information of a feature, the more relevant that feature is.
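A minimal sketch of scoring feature relevance by mutual information (assuming scikit-learn is available; mutual_info_classif is one estimator of mutual information between each feature and a discrete class label, and the data set here is hypothetical):

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    # Hypothetical data set: 8 instances, 3 features, binary class label
    X = np.array([[1.0, 5.0, 0.1],
                  [2.0, 4.0, 0.2],
                  [3.0, 6.0, 0.1],
                  [2.5, 5.5, 0.2],
                  [8.0, 5.0, 0.9],
                  [9.0, 4.0, 0.8],
                  [7.0, 6.0, 0.9],
                  [8.5, 4.5, 0.8]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    # Mutual information between each feature and the class label;
    # higher values indicate more relevant features
    mi = mutual_info_classif(X, y, random_state=0)
    print(mi)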
MEASURES OF FEATURE REDUNDANCY
□Correlation-based measures
□Distance-based measures, and
□ Other coefficient-based measures
CORRELATION BASED MEASURES
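The correlation formula from this slide is not reproduced in the text; as a minimal sketch, the Pearson correlation between two features can be computed with NumPy (the feature values are hypothetical):

    import numpy as np

    # Hypothetical values of two numerical features
    F1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    F2 = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Pearson correlation coefficient; values close to +1 or -1
    # indicate that one feature is largely redundant given the other
    r = np.corrcoef(F1, F2)[0, 1]
    print(r)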
DISTANCE BASED SIMILARITY MEASURES
JACCARD & SIMPLE MATCHING COEFFICIENT
□ Let's consider two features F1 and F2 having values (0, 1, 1, 0, 1, 0, 1, 0) and (1, 1, 0, 0, 1, 0, 0, 0); the two coefficients for this pair are computed in the sketch below.
COSINE SIMILARITY MEASURES
OTHER SIMILARITY MEASURES
□ Consider x = (2, 4, 0, 0, 2, 1, 3, 0, 0) and y = (2, 1, 0, 0, 3, 2, 1, 0, 1).
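Assuming these vectors are meant for the cosine similarity example, a minimal sketch of the computation (cos(x, y) = x·y / (|x|·|y|) ≈ 0.73 for these values):

    import numpy as np

    x = np.array([2, 4, 0, 0, 2, 1, 3, 0, 0])
    y = np.array([2, 1, 0, 0, 3, 2, 1, 0, 1])

    # Cosine similarity = dot product divided by the product of the vector norms
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    print(cos_sim)   # approximately 0.73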
□ Hamming distance: calculates the distance between two binary vectors.
1. Take the XOR of the two vectors.
2. Count the number of 1's in the result.
□ Example: 01101011 XOR 11001001 = 10100010, which contains three 1's, so the Hamming distance is 3 (see the sketch below).
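A minimal sketch of the two steps for the example above:

    a = 0b01101011
    b = 0b11001001

    xor = a ^ b                      # step 1: XOR of the two vectors -> 0b10100010
    hamming = bin(xor).count("1")    # step 2: count the 1's in the result
    print(hamming)                   # 3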
