Unit 3

The document discusses feature extraction and selection in machine learning, highlighting their importance in reducing computational costs, improving algorithm performance, and preventing overfitting. It explains techniques such as Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) for dimensionality reduction, as well as various feature selection methods including wrapper, filter, and embedded techniques. Additionally, it covers model evaluation metrics like accuracy, precision, recall, and the confusion matrix to assess model performance.


Statistical Learning

Feature Extraction

 Feature extraction is a machine learning and data analysis process that transforms raw data into numerical features that can be processed while preserving the information in the original data set.

 It reduces the number of resources needed for processing without losing important or relevant information.
Why is Feature Extraction Important?
 Reduction of Computational Cost: By reducing the dimensionality of the data, machine learning algorithms can run more quickly.

 Improved Performance: Algorithms often perform better with a reduced number of features, because noise and irrelevant details are removed, allowing the algorithm to focus on the most important aspects of the data.

 Prevention of Overfitting: With too many features, models can become overfitted to the training data, meaning they may not generalize well to new, unseen data. Feature extraction helps to prevent this by simplifying the model.

 Better Understanding of Data: Extracting and selecting important features can provide insights into the underlying processes that generated the data.
Principal Component Analysis

 Principal Component Analysis (PCA) is an unsupervised learning algorithm used for dimensionality (feature) reduction in machine learning. It is also used to examine the interrelations among a set of variables and is known as general factor analysis.

 Dimensionality reduction refers to the process of reducing the number of random variables (or features) under consideration. This reduction can be achieved by obtaining a set of principal variables.

 Dimensionality reduction can be used for feature selection, feature extraction, or a combination of the two.
Principal Component Analysis

 Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables.

 Set of Correlated Variables: PCA is typically applied to a dataset containing multiple correlated variables (features). The goal of PCA is to find a new set of variables (principal components) that are linear combinations of the original variables but are uncorrelated with each other.

 Uncorrelated Variables: The principal components derived from PCA are uncorrelated, which means that the covariance between any pair of principal components is zero.

 The PCA algorithm is based on mathematical concepts such as:
• Variance and covariance
• Eigenvalues and eigenvectors
STEP 1: Standardization
STEP 2: Covariance Matrix Computation
STEP 3: Compute Eigenvalues and Eigenvectors of the Covariance Matrix to Identify Principal Components
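
A minimal NumPy sketch of these three steps (the toy data values are illustrative):

import numpy as np

# Toy dataset: 5 samples, 3 correlated features (hypothetical values)
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])

# STEP 1: Standardization (zero mean, unit variance per feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# STEP 2: Covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# STEP 3: Eigen decomposition; the eigenvectors with the largest
# eigenvalues are the principal components
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]   # sort descending by explained variance
components = eigvecs[:, order[:2]]  # keep the top 2 components

# Project the standardized data onto the principal components
X_pca = X_std @ components
print(X_pca.shape)  # (5, 2)

In practice, sklearn.decomposition.PCA performs the same computation (note that scikit-learn centers the data but does not scale it by default).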
Singular Value Decomposition (SVD)

 The singular value decomposition (SVD) of a matrix is a factorization of that matrix into three matrices: A = U Σ Vᵀ, where U and V are orthogonal matrices and Σ is a diagonal matrix containing the singular values of A.

 Singular value decomposition is also one of the popular dimensionality reduction techniques.

 It is a matrix-factorization method from linear algebra, and it is widely used in applications such as feature selection, visualization, noise reduction, and many more.

 Because an image is a contiguous signal, the intensity levels of most pixels depend on the pixels around them, so image matrices are highly redundant and well suited to low-rank approximation with SVD.
Applications of SVD

 Image Recovery

 Image Compression
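
A minimal sketch of SVD-based image compression, using a random matrix as a stand-in for a grayscale image (the size and rank k are illustrative):

import numpy as np

# Hypothetical 64x64 grayscale "image" of pixel intensities
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# Factorize: image = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(image, full_matrices=False)

# Keep only the k largest singular values (rank-k approximation)
k = 10
compressed = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Storage drops from 64*64 values to k*(64 + 64 + 1)
error = np.linalg.norm(image - compressed) / np.linalg.norm(image)
print(f"rank-{k} approximation, relative error: {error:.3f}")

On a real photograph, which is far more redundant than random noise, a small k recovers most of the image; the same truncation also suppresses noise, which is why SVD is used for image recovery.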
Feature Selection

 While developing a machine learning model, only a few variables in the dataset are useful for building the model; the rest of the features are either redundant or irrelevant.

 If we input the dataset with all these redundant and irrelevant features, it may negatively impact and reduce the overall performance and accuracy of the model.

 A feature is an attribute that has an impact on a problem or is useful for the problem, and choosing the important features for the model is known as feature selection.

 The main difference between feature selection and feature extraction is that feature selection selects a subset of the original feature set, whereas feature extraction creates new features.

 Feature selection is a way of reducing the input variables for the model by using only relevant data, which reduces overfitting in the model.

 It is the process of automatically or manually selecting the subset of the most appropriate and relevant features to be used in model building.
Need for Feature Selection

 Before implementing any technique, it is important to understand why it is needed, and the same holds for feature selection. In machine learning, providing a pre-processed, good-quality input dataset is necessary for better outcomes.

 We collect a huge amount of data to train our model and help it learn better. Generally, a dataset consists of noisy data, irrelevant data, and some useful data. A huge amount of data also slows down the training process, and with noise and irrelevant data, the model may not predict and perform well.
Benefits of using feature selection in machine learning:
• It helps in avoiding the curse of dimensionality.
• It helps in the simplification of the model so that it can be easily interpreted by researchers.
• It reduces the training time.
• It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
Wrapper Method

 In wrapper methods, different subsets of features are evaluated by training a model for each subset; the performance of the subsets is then compared and the best combination is chosen.
Some techniques of wrapper methods:

 Forward Selection

 Backward Elimination

 Exhaustive Feature Selection

 Recursive Feature Elimination

Forward Selection

• Starting from Scratch: Begin with an empty set of features and iteratively add one feature at a time.

• Model Evaluation: At each step, train and evaluate the machine learning model using the selected features.

• Stopping Criterion: Continue until a predefined stopping criterion is met, such as a maximum number of features or a significant drop in performance.
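
A minimal sketch of forward selection with scikit-learn's SequentialFeatureSelector; the dataset, estimator, and the choice of five features are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Forward selection: start from an empty set and greedily add the
# feature that most improves cross-validated performance
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the chosen features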
Backward Elimination

• Starting with Everything: Start with all available features.

• Iterative Removal: In each iteration, remove the least important feature and evaluate the model.

• Stopping Criterion: Continue until a stopping condition is met.
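
Backward elimination can be sketched with the same SequentialFeatureSelector shown above by passing direction="backward": the search then starts from the full feature set and greedily removes features instead of adding them.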


Exhaustive Search

• Exploring All Possibilities: Evaluate all possible combinations of features, which guarantees finding the best subset for model performance (a brute-force sketch follows below).

• Computational Cost: This can be computationally expensive, especially with a large number of features.
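
A brute-force sketch on the small iris dataset (4 features, so only 15 non-empty subsets); with n features the loop grows as 2^n and quickly becomes infeasible:

from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = 0.0, None
# Evaluate every non-empty subset of the features
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        score = cross_val_score(
            LogisticRegression(max_iter=1000), X[:, subset], y, cv=5
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(best_subset, round(best_score, 3))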
Recursive Feature Elimination (RFE)

• Ranking Features: Start with all features and rank them based on their importance or contribution to the model.

• Iterative Removal: In each iteration, remove the least important feature(s).

• Stopping Criterion: Continue until a desired number of features is reached.
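
A minimal sketch with scikit-learn's RFE; the dataset, estimator, and target of five features are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# RFE ranks features by the estimator's importances, then repeatedly
# drops the weakest one until 5 features remain
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=5)
rfe.fit(X, y)

print(rfe.get_support(indices=True))  # indices of the surviving features
print(rfe.ranking_)                   # 1 = selected; larger = dropped earlier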
Filter Method

 These methods are generally used during the pre-processing step. They select features from the dataset independently of any machine learning algorithm.

 In terms of computation, they are very fast and inexpensive and are very good for removing duplicated, correlated, and redundant features, but these methods do not remove multicollinearity.
Techniques

 Information Gain

 Chi-square test

 Fisher’s Score

 Correlation Coefficient
Information Gain

 Information gain is defined as the amount of information a feature provides for identifying the target value; it measures the reduction in entropy. Formally, IG(Y, X) = H(Y) − H(Y | X), the entropy of the target minus its conditional entropy given the feature. The information gain of each attribute is calculated with respect to the target values for feature selection.

 In the context of information theory and machine learning, entropy is a measure of the uncertainty or randomness associated with a random variable.
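
Information gain between a feature and a target is the mutual information between them; a minimal scikit-learn sketch, using the iris dataset as a stand-in:

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

data = load_iris()

# Mutual information (information gain) of each feature w.r.t. the target;
# higher scores mean the feature says more about the class label
scores = mutual_info_classif(data.data, data.target, random_state=0)
for name, score in zip(data.feature_names, scores):
    print(f"{name}: {score:.3f}")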
Chi-square test

 The chi-square (χ²) method is generally used to test the relationship between categorical variables. It compares the observed values of the dataset's attributes to their expected values.
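
A minimal sketch with scikit-learn's SelectKBest and the chi2 score function (chi2 requires non-negative feature values; the dataset and k are illustrative):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest chi-square statistic vs. the target
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)                    # chi-square statistic per feature
print(selector.get_support(indices=True))  # indices of the kept features
print(X_new.shape)                         # (150, 2)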
Fisher’s Score

 Fisher’s score evaluates each feature independently according to its score under the Fisher criterion, which can lead to a suboptimal set of features. The larger the Fisher’s score, the better the selected feature.
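
A minimal sketch of one common form of the Fisher criterion (between-class variance over within-class variance, per feature), computed on the iris dataset:

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def fisher_score(X, y):
    """Between-class variance over within-class variance, per feature."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]  # samples of class c
        n_c = len(Xc)
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / within

print(fisher_score(X, y))  # higher score = more discriminative feature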
Correlation Coefficient

 Pearson’s correlation coefficient quantifies the association between two continuous variables and the direction of the relationship, with values ranging from -1 to 1.
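
A minimal pandas sketch on synthetic data (all column names and values are hypothetical):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
})
df["f3"] = 0.9 * df["f1"] + rng.normal(scale=0.1, size=100)  # correlated with f1
df["target"] = 2 * df["f1"] + rng.normal(size=100)

# Pearson correlation of each feature with the target, in [-1, 1];
# features with near-zero correlation are candidates for removal
print(df.corr(method="pearson")["target"].drop("target"))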
Embedded Method

 Embedded methods combine the advantages of both filter and wrapper methods: they consider the interaction of features, like wrapper methods, while keeping the computational cost low.

 They are fast processing methods, similar to filter methods, but more accurate than filter methods.

 These methods are also iterative: each iteration of model training is evaluated, and the features that contribute the most to that iteration of training are identified.
Some techniques of embedded methods:
 Regularization

 Random Forest Importance


Regularization

 Regularization adds a penalty term on the parameters of the machine learning model to avoid overfitting.

 This penalty term is applied to the coefficients; L1 regularization in particular shrinks some coefficients exactly to zero, and features with zero coefficients can be removed from the dataset.

 The main regularization techniques are L1 regularization (Lasso) and L2 regularization (Ridge).
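
A minimal Lasso sketch with scikit-learn (the dataset and penalty strength alpha are illustrative):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)  # L1 penalties are scale-sensitive

# L1 (Lasso) regularization drives some coefficients exactly to zero
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

print(lasso.coef_)                                        # some entries are 0.0
print("selected features:", np.flatnonzero(lasso.coef_))  # non-zero coefficients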
Random Forest Importance

 Tree-based methods provide feature importances, which give us a way of selecting features. Here, feature importance specifies which features matter more in model building or have a greater impact on the target variable.

 Random Forest is a tree-based method: a bagging algorithm that aggregates a number of decision trees.
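
A minimal sketch of Random Forest feature importance with scikit-learn (the dataset and number of trees are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Impurity-based importances, one per feature, summing to 1
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked[:5]:  # the 5 most important features
    print(f"{name}: {score:.3f}")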
Evaluating ML Algorithms and Model Selection

 Model evaluation is the process of using metrics to analyze the performance of a model.
Accuracy

 Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN).

accuracy_score()
Precision

 Precision is the ratio of true positives to the sum of true positives and false positives: Precision = TP / (TP + FP). It analyses the positive predictions.

 The drawback of precision is that it does not consider the true negatives and false negatives.

precision_score()
Recall

 Recall is the ratio of true positives to the sum of true positives and false negatives: Recall = TP / (TP + FN). It analyses the number of correct positive samples.

recall_score()
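
A minimal sketch of the three scikit-learn metric functions named above, on hypothetical binary labels:

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth vs. model predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("accuracy: ", accuracy_score(y_true, y_pred))   # (TP+TN)/total = 0.8
print("precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 0.8
print("recall:   ", recall_score(y_true, y_pred))     # TP/(TP+FN) = 0.8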
Confusion Matrix

 The confusion matrix is a tabular summary of the classification model's predictions against the actual values.

 True Positive (TP): the number of times the model predicted a sample as positive when the actual value was positive.

 False Positive (FP): the number of times the model predicted a sample as positive when the actual value was negative.

 False Negative (FN): the number of times the model predicted a sample as negative when the actual value was positive.

 True Negative (TN): the number of times the model predicted a sample as negative when the actual value was negative.
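
A minimal sketch with scikit-learn, reusing the hypothetical labels from the metrics example above:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))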
