Lecture 15 - 23.09.2024 - Feature Selection


Feature Selection, PCA, SVD
Thivin Anandh, IISc Bangalore
“ What is feature selection?
What is Feature Selection?

• Feature selection is the process of choosing a subset of relevant features from a dataset
to be used in model construction and analysis
• It involves identifying and retaining the most informative and discriminative features
while discarding irrelevant or redundant ones.

Image Credits: https://www.heavy.ai/technical-glossary/feature-selection


Types of Feature Selection

Filter Methods

Wrapper Methods

Embedded Methods
Filter Methods

• Filter methods are feature selection techniques that select features based on
their statistical properties, independent of any specific machine learning
algorithm
• These methods evaluate the relevance of features using statistical measures and
rank them accordingly.

Advantages
• Computationally efficient
• Independence from the ML algorithm
• More interpretable

Examples
• Pearson correlation coefficient
• Chi-square test
• Information gain

Filter Methods – Pearson Correlation Coefficient

• It measures the linear correlation between two continuous variables: r = cov(X, Y) / (σX σY)

• r = 1 -> positive correlation, r = -1 -> negative correlation, r = 0 -> no correlation


• We can remove highly correlated variables from the data

Advantages
• Simple to understand
• Easy to compute on large datasets

Disadvantages
• Sensitive to outliers
• Assumes a linear relationship between the variables
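A minimal sketch of correlation-based filtering, assuming a small synthetic pandas DataFrame (the column names x1, x2, x3 and the 0.9 threshold are illustrative choices, not from the slides):

import numpy as np
import pandas as pd

# Toy data: x2 is almost a copy of x1, x3 is unrelated noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": 0.98 * x1 + rng.normal(scale=0.05, size=200),
    "x3": rng.normal(size=200),
})

corr = X.corr().abs()                                   # pairwise |Pearson r|
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
X_reduced = X.drop(columns=to_drop)                     # keep one feature from each highly correlated pair
print(to_drop)                                          # expected: ['x2']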


Filter Methods – chi2 test

• The chi-square test is a statistical method used to determine whether there is a significant association between two categorical variables

• χ² = Σi (Oi − Ei)² / Ei , where Oi is the observed frequency and Ei is the expected frequency

Terminologies
• Degrees of freedom
• Critical values
• Null hypothesis

Example: a coin tossed N = 50 times
            Heads   Tails
Expected      25      25
Observed      28      22
Filter Methods – chi2 test

• Features with higher chi-square values and lower p-values (b/w target and feature) are
considered more significant and are retained for further analysis.
• Note: for a contingency table, the expected frequency of each cell is computed as E = (row total × column total) / grand total
Filter Methods – chi2 test

Advantages
• Suitable for categorical data
• Easy to understand and implement
• Non-parametric (no assumptions on the distribution of the data)
• Little dependence on sample size (rule of thumb: at least 30 samples)

Disadvantages
• Limited to categorical data
• Sensitive to very small sample sizes
• Compares only two variables at a time
• Challenging to interpret on large datasets
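As a hedged illustration of chi-square-based filtering, here is a minimal sklearn sketch using SelectKBest with the chi2 score function (the Iris dataset and k=2 are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)              # chi2 requires non-negative feature values
selector = SelectKBest(score_func=chi2, k=2)   # keep the 2 highest-scoring features
X_new = selector.fit_transform(X, y)
print(selector.scores_)                        # chi-square statistic per feature
print(selector.pvalues_)                       # lower p-value => stronger association with the target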
Filter Methods – Information Gain

• Information gain is a feature selection method commonly used in decision trees and
related algorithms
• It measures the reduction in entropy of the target variable obtained by splitting the data on a given feature.
• Features with higher information gain are considered more informative and are
retained for further analysis.

Terminologies

• Entropy (similar to the Gini index): H(S) = − Σi pi log2 pi
• Information gain of a feature A: IG(S, A) = H(S) − Σv ( |Sv| / |S| ) H(Sv)


Filter Methods – Information gain

Advantages
• Handles missing values (by ignoring them)
• More interpretable
• Non-parametric
• Works with continuous and categorical values

Disadvantages
• Biased towards features with many categories (too many splits)
• Not suitable for regression (classification only)
• Ignores class distribution (creates problems on imbalanced datasets)
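A small sketch of information-gain-style filtering; sklearn exposes the closely related mutual information estimate via mutual_info_classif (the Iris dataset is an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)   # estimated information gain of each feature w.r.t. the target
print(mi)                                        # higher values => more informative features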
Wrapper Methods

• Generate Subsets: Wrapper methods generate different subsets of features from the original
feature set.
• Train Model: Each subset of features is used to train a machine learning model.
• Evaluate Performance: The model's performance is evaluated using a performance metric (e.g.,
accuracy, F1-score).
• Select Best Subset: The subset of features that produces the best performance is selected as the
final feature set.

Types of Wrapper Methods


• Forward Selection: Starts with an empty set of features and iteratively adds features one by one
based on their individual performance until no improvement is observed.
• Backward Elimination: Begins with the full set of features and iteratively removes features one by
one based on their individual performance until no improvement is observed.
• Recursive Feature Elimination (RFE): Selects features by recursively considering smaller and
smaller sets of features until the desired number of features is reached.
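A minimal sketch of forward selection and RFE with sklearn, assuming a logistic regression base model and an arbitrary target of 5 features (both choices are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Forward selection: add features one by one based on cross-validated performance.
forward = SequentialFeatureSelector(model, n_features_to_select=5, direction="forward").fit(X, y)

# Recursive feature elimination: repeatedly drop the weakest feature until 5 remain.
rfe = RFE(model, n_features_to_select=5).fit(X, y)

print(forward.get_support())   # boolean mask of the selected features
print(rfe.support_)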
Wrapper Methods

Advantages
• Model-Centric: Wrapper methods consider the performance of the model when selecting features,
leading to potentially better model performance.
• Feature Interaction: These methods can capture feature interactions that may not be apparent in
individual features alone.

Disadvantages of Wrapper Methods


• Computational Complexity: Wrapper methods can be computationally expensive, especially when
dealing with a large number of features.
• Overfitting: There is a risk of overfitting when using wrapper methods, as they may select features
that perform well on the training data but poorly on unseen data.
Embedded Methods

• Model Training: Embedded methods use machine learning algorithms that inherently perform
feature selection during training.
• Intrinsic Feature Selection: Feature selection is intrinsic to the model's learning algorithm and is
performed automatically during model training.
• Regularization Techniques: Embedded methods often use regularization techniques to penalize the
model for the inclusion of unnecessary or redundant features.

Types of Embedded Methods

• L1 Regularization (Lasso): L1 regularization adds a penalty term to the model's cost function based
on the absolute value of the coefficients, encouraging sparse solutions and automatic feature
selection.
• Tree-Based Methods: Decision tree-based algorithms, such as Random Forest and Gradient
Boosting Machines (GBM), naturally perform feature selection by selecting the most informative
features at each split.
• Elastic Net: Elastic Net is a regularization technique that combines L1 and L2 penalties to achieve
both feature selection and feature grouping.
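A minimal sketch of embedded selection with L1 regularization (the diabetes dataset and alpha=1.0 are illustrative; alpha controls how many coefficients are driven to zero):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)     # L1 penalties are scale-sensitive

lasso = Lasso(alpha=1.0).fit(X_scaled, y)        # the L1 penalty zeroes out some coefficients
kept = np.flatnonzero(lasso.coef_)               # indices of the features that survive
print(lasso.coef_)
print("selected features:", kept)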
Embedded Methods
Advantages of Embedded Methods
• Efficient Feature Selection: Embedded methods perform feature selection directly during model
training, making them efficient and suitable for large datasets.
• Automatic Selection: Feature selection is intrinsic to the model training process, eliminating the
need for separate feature selection steps.
• Handles Non-Linear Relationships: Embedded methods, especially tree-based methods, can
capture non-linear relationships between features and the target variable.

Disadvantages of Embedded Methods


• Model-Specific: Embedded methods are tightly coupled with specific modeling algorithms, limiting
their flexibility compared to wrapper methods.
• Less Control: Unlike wrapper methods, embedded methods offer less control over the feature
selection process, as it is driven by the model's optimization objective.
Summary – Feature selection

Advantages of Feature Selection

• Improved Model Performance: Feature selection can lead to simpler and more interpretable models
that generalize better to unseen data, resulting in improved performance metrics.
• Reduced Overfitting: By focusing on the most informative features, feature selection can reduce the
risk of overfitting and improve the model's ability to generalize to new data.
• Enhanced Model Interpretability: Selecting a subset of relevant features makes the model more
interpretable and easier to understand for stakeholders and domain experts.
“ Singular Value Decomposition
Singular Value Decomposition

• Singular Value Decomposition is a matrix factorization method that decomposes a matrix A into three separate matrices: A = U Σ Vᵀ

• U: A matrix of orthogonal vectors that represent the left singular vectors of the input
matrix.
• Σ: A diagonal matrix that represents the singular values of the input matrix.
• Vᵀ: The conjugate transpose of a matrix of orthogonal vectors that represent the right singular vectors of the input matrix.
SVD – Geometric Intuition

• Vᵀ represents a rotation matrix that rotates the original coordinate system to a new coordinate system in which the data points are aligned along the axes of greatest variance.
• Σ represents a diagonal matrix that scales the data points along each of the new coordinate axes. The diagonal elements of Σ are known as the singular values, and they represent the amount of variance captured by each coordinate axis.
• U represents another rotation matrix that rotates the scaled result into the output coordinate system.
Types of SVD

• Full SVD
• Reduced SVD

Properties of SVD

• The rank of matrix A is the number of nonzero singular values


• ||A||2 = σ1 and ||A||F = (σ1² + σ2² + … + σr²)^(1/2) , where r is the rank of the matrix
• The nonzero singular values of A are the square roots of the nonzero eigenvalues of AᵀA or AAᵀ.
• If A = Aᵀ, then the singular values of A are the absolute values of the eigenvalues of A.
• It is also useful for performing low rank approximations of the matrix ( as we can
see on PCA )
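A small numpy sketch illustrating the reduced SVD, a rank-k approximation, and the norm properties above (the random 6×4 matrix and k=2 are illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)        # reduced SVD: A = U @ diag(s) @ Vt
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]             # best rank-k approximation of A

print(np.linalg.matrix_rank(A_k))                       # 2
print(np.linalg.norm(A, 2), s[0])                       # ||A||_2 equals the largest singular value
print(np.linalg.norm(A, "fro"), np.sqrt(np.sum(s**2)))  # ||A||_F equals sqrt of the sum of squared singular values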
“ Principal Component Analysis
PCA – Curse of Dimensionality

• A lot of data is high-dimensional (images, for example)

• Cambridge Analytica claimed they had more than 5000* data points on every US voter (data obtained mostly from FB profiles)
• As the dimensionality increases, Euclidean distances between points in the vector space grow, which makes it difficult to find similar data points.
• In higher dimensions the test data also tends to lie far from the training data, which makes the model prone to overfitting.

*-> 'The Great Hack': Cambridge Analytica is just the tip of the iceberg - Amnesty International
What is PCA?

• Principal Component Analysis finds the lower-dimensional space which preserves the maximum variance
• PCA identifies the axis that accounts for the largest amount of variance in the training set.
• The i-th such axis is identified as the i-th principal component of the data.

Image Reference: Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, Géron, 3rd Edition
What are subspaces?

• We can obtain these subspaces, which preserve the most variance, by computing the eigenvectors of the covariance matrix of the given data
• Before that, we need to understand the following terminologies
• Eigenvalues and eigenvectors
• Covariance matrix
Eigenvalues and Eigenvectors

• When a vector x is multiplied by a matrix A, it is linearly transformed into the column space of A
• In most cases, the vector x undergoes both scaling and rotation to reach its final state
• However, for a particular linear transformation A, there are specific vectors v that do not undergo any rotation:
• Av = λv
• Here v is an eigenvector and λ is the eigenvalue corresponding to that eigenvector
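A tiny numpy check of Av = λv (the 2×2 symmetric matrix is an illustrative choice):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs are the eigenvectors
v = eigvecs[:, 0]
lam = eigvals[0]
print(A @ v, lam * v)                    # the two results coincide: A v = λ v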
Co-variance Matrix

• A covariance matrix is a square matrix that summarizes the covariance between


multiple variables in a dataset.
• The diagonal elements of a covariance matrix represent the variance of each
variable
• The off-diagonal elements represent the covariance between each pair of variables
• A positive covariance between two variables indicates that they tend to move together
• A negative covariance indicates that they tend to move in opposite directions.
• A covariance of zero indicates that the two variables are linearly uncorrelated (not necessarily independent)
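A short sketch of a covariance matrix with numpy (the synthetic two-variable data is illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.5, size=500)   # y tends to move together with x

C = np.cov(np.stack([x, y]))     # rows are variables; C is 2x2
print(C)                         # diagonal: variances, off-diagonal: covariance (positive here)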
How to compute principal components

• Compute the covariance matrix of the given (mean-centered) data A:
• Ã = (x − μ)(x − μ)ᵀ
• Now perform the eigenvalue decomposition of that matrix:
• Ã = P Â P⁻¹ , where Â is the diagonal matrix of eigenvalues
• The columns of P are the eigenvectors, i.e. the principal components of the original data A
• The principal component that captures the highest variance is the eigenvector associated with the largest eigenvalue
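A minimal numpy sketch of these steps: center the data, form the covariance matrix, and take its eigendecomposition (the synthetic 100×3 data and the 2 retained components are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])   # three features with very different variances

Xc = X - X.mean(axis=0)                  # center: x - mu
C = Xc.T @ Xc / (len(X) - 1)             # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh: eigendecomposition of a symmetric matrix
order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue (variance captured)
P = eigvecs[:, order]                    # columns of P = principal components
X_reduced = Xc @ P[:, :2]                # project onto the top 2 components
print(eigvals[order])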
PCA using sklearn

• Import the PCA from sklearn.decomposition module

• Provide the n_components as input to the PCA function to extract “n” principal
modes from the given data
• Use “fit_transform()” to generate the reduced dimensional data
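A minimal sketch of the workflow described above (the Iris dataset and n_components=2 are illustrative):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)           # extract the first 2 principal modes
X_reduced = pca.fit_transform(X)    # reduced-dimensional data
print(X_reduced.shape)              # (150, 2)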
Sufficient dimension

• By looking at the ".explained_variance_ratio_" attribute we can obtain the variance captured by each principal component
• So, by looking at the cumulative explained variance, we can decide the number of modes that we need for our formulation
• The number of modes and the percentage of variance required are highly dependent on the nature of the task
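A hedged sketch of choosing the number of modes from the cumulative explained variance (the digits dataset and the 95% threshold are illustrative):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA().fit(X)                                    # keep all components first
cumvar = np.cumsum(pca.explained_variance_ratio_)     # cumulative explained variance
d = int(np.argmax(cumvar >= 0.95)) + 1                # smallest number of modes reaching 95%
print(d)

# Equivalently, sklearn picks the number of components when a fraction is passed:
X_reduced = PCA(n_components=0.95).fit_transform(X)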
PCA in image compression

• Since we only store the reduced representation of the data and use it to reconstruct the original data, we do not need to store the complete data, which results in a reduction in storage
• This is similar to the idea used in image compression formats like JPEG, which store the coefficients of a cosine transform and use them to reconstruct the pixel values
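A small sketch of this idea with sklearn's PCA, storing a handful of coefficients per image and reconstructing approximate pixels with inverse_transform (the digits dataset and 16 components are illustrative):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # 8x8 digit images flattened to 64 pixel features
pca = PCA(n_components=16)                 # store 16 coefficients per image instead of 64 pixels
codes = pca.fit_transform(X)               # compressed representation
X_rec = pca.inverse_transform(codes)       # approximate reconstruction of the pixel values
print(X.shape, codes.shape, X_rec.shape)   # (1797, 64) (1797, 16) (1797, 64)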
Types of PCA

• Randomized PCA (finds a quick approximation of the first d principal components)
  • Criterion for randomized PCA in sklearn: max(m, n) > 500 and n_components < 0.8 · min(m, n)
• Incremental PCA (feeds large data in batches to compute the PCA of a large dataset)
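A brief sketch of both variants in sklearn (the digits dataset, 16 components, and batch_size=200 are illustrative):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, IncrementalPCA

X, _ = load_digits(return_X_y=True)

# Randomized PCA: quick approximation of the first d principal components.
rpca = PCA(n_components=16, svd_solver="randomized", random_state=0)
X_r = rpca.fit_transform(X)

# Incremental PCA: processes the data in batches, useful when it does not fit in memory.
ipca = IncrementalPCA(n_components=16, batch_size=200)
X_i = ipca.fit_transform(X)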
Test your understanding

1. Which of the following is not a common dimensionality reduction technique?
   a) Principal Component Analysis (PCA)  b) t-SNE  c) Linear Discriminant Analysis (LDA)  d) K-means clustering
2. True or False: Dimensionality reduction always results in loss of information.
3. In PCA, eigenvectors with the highest eigenvalues correspond to:
   a) Least important principal components  b) Most important principal components  c) Random noise in the data  d) Outliers in the dataset
4. What is the primary goal of Linear Discriminant Analysis (LDA)?
   a) Maximize variance of the data  b) Minimize within-class scatter  c) Cluster similar data points  d) Identify outliers in the dataset
5. True or False: PCA can be used for feature selection.
Solutions

1. d) K-means clustering
2. False
3. b) Most important principal components
4. b) Minimize within-class scatter
5. False
“ Linear Discriminant Analysis
(LDA)
Linear discriminant analysis

Terminologies

LDA method

Mathematical concept behind LDA

Multi-class LDA
LDA – terminologies

• We will be using two terms frequently


• Mean and scatter
• Each cluster has its own mean and within-class variance, or scatter, shown by s1, s2, s3, s4 in the diagram
• This will be defined for each class.
• The color denotes separate classes
• So here we have labels unlike PCA
• In other words, supervised learning.
LDA

• The end goal is the same as PCA: dimensionality reduction

• But here we want to maximize the distance between the means of the two classes and minimize the "within-class scatter"
• How do we quantify "maximizing the distance between the class means" and "minimizing the within-class scatter" using a single number?
• We use a quantity called the Fisher score. Notice the numerator and the denominator: if the within-class scatter (the denominator) increases, the Fisher score decreases, which signals an undesirable projection.
• Fisher score for two classes, where s is the within-class scatter: J = (m1 − m2)² / (s1² + s2²)
LDA – Fisher score illustration
Example

• Example: choosing the direction onto which to project is important. Both the means and the within-class variances should be accounted for
“ So how do we find out these
directions or discriminants ?
LDA – Finding out the discriminants

• We won't look into the derivation in detail


• Assume a vector v that we want to find. We project the class means onto this vector
• Next we do the same for the within-class variance.
• Then we use the Fisher formula and substitute the above expressions
• We then take the derivative and obtain an expression for the vector v. This vector v is the one that gives maximum class separability with minimum within-class scatter.
LDA – derivation of the discriminants
LDA – derivation of the discriminants

• The discriminants (vectors v) are the eigenvectors of the matrix SW⁻¹ SB, where SW is the within-class scatter matrix and SB is the between-class scatter matrix
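A minimal sketch of LDA as a supervised dimensionality reduction step in sklearn (the Iris dataset is illustrative; with 3 classes at most 2 discriminants exist):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                   # labels are required: LDA is supervised
lda = LinearDiscriminantAnalysis(n_components=2)    # at most (number of classes - 1) discriminants
X_lda = lda.fit_transform(X, y)                     # project onto the discriminant directions
print(X_lda.shape)                                  # (150, 2)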
Summary

• Both LDA and PCA try to reduce dimensions

• PCA looks for the directions along which the data has the most variation.
• LDA maximizes the separation between known categories
• LDA is supervised, PCA is unsupervised.
Test your understanding

1. Which of the following statements about t-SNE is correct?
   a) It's a linear dimensionality reduction technique  b) It's primarily used for visualization in high-dimensional spaces  c) It always preserves global structure of the data  d) It's faster than PCA for large datasets
2. What is the main advantage of using dimensionality reduction techniques?
   a) They always improve model accuracy  b) They can help mitigate the curse of dimensionality  c) They increase the computational complexity  d) They add more features to the dataset
3. In LDA, the number of linear discriminants that can be computed is at most:
   a) Equal to the number of features  b) Equal to the number of classes minus one  c) Equal to the number of samples  d) Unlimited
4. True or False: PCA always requires scaling the data before application.
5. Which technique is more suitable when the goal is to maximize class separability?
   a) PCA  b) LDA  c) Random projection  d) Autoencoder
Solutions

1. b) It's primarily used for visualization in high-dimensional spaces


2. b) They can help mitigate the curse of dimensionality
3. b) Equal to the number of classes minus one
4. True
5. b) LDA
