Unit No.02 - Feature Extraction and Selection
Dimensionality Reduction
The number of input features, variables, or columns present in a given dataset is known as dimensionality, and the process of reducing these features is called dimensionality reduction.
In many cases a dataset contains a huge number of input features, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.
Dimensionality reduction can be defined as "a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information." These techniques are widely used in machine learning to obtain a better-fitting predictive model while solving classification and regression problems.
It is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, bioinformatics, etc. It can also be used for data visualization, noise reduction, and cluster analysis.
Handling high-dimensional data is very difficult in practice, a problem commonly known as the curse of dimensionality. If the dimensionality of the input dataset increases, any machine learning algorithm and model becomes more complex. As the number of features increases, the number of samples required also increases proportionally, and the chance of overfitting increases. A machine learning model trained on high-dimensional data tends to become overfitted and perform poorly, which is why it is often necessary to reduce the number of features through dimensionality reduction.
Benefits of applying Dimensionality Reduction
Some benefits of applying dimensionality reduction technique to the given dataset are
given below:
o By reducing the dimensions of the features, the space required to store the dataset
also gets reduced.
o Less computation and training time is required for reduced dimensions of features.
o Reduced dimensions of features of the dataset help in visualizing the data quickly.
o It removes the redundant features (if present) by taking care of multicollinearity.
There are also some disadvantages of applying dimensionality reduction, which are given below:
o Some amount of information may be lost when the dimensions are reduced.
o In PCA, the number of principal components to retain is sometimes not known in advance.
Methods of Feature Reduction
Feature Extraction methods:
- Principal Component Analysis
- Linear Discriminant Analysis
- Kernel PCA
- Quadratic Discriminant Analysis
Feature Selection methods:
- Correlation
- Chi-Square Test
- Forward Selection
- Backward Selection
- Information Gain
- LASSO & Ridge Regression
Feature Extraction: -
Feature extraction is usually used when the original raw data cannot be used directly for machine learning modeling, so we transform the raw data into the desired form. Feature extraction is the method of creating a new, smaller set of features that captures most of the useful information in the raw data.
When we work on real-world machine learning problems, we rarely get data in the shape of a CSV file, so we have to extract useful features from the raw data. Some popular types of raw data from which features (new feature creation) can be extracted are listed below; a small example for date and time data follows the list.
Texts
Images
Geospatial data
Date and time
Web data
Sensor Data
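For example, from date and time data we can derive features such as the hour, day of week, or month. Below is a minimal pandas sketch; the DataFrame and its "timestamp" column are hypothetical examples, not part of the original material.

    # Minimal sketch: extracting new features from date/time data with pandas.
    import pandas as pd

    df = pd.DataFrame({"timestamp": pd.to_datetime(
        ["2023-01-15 08:30", "2023-06-01 22:10", "2023-12-31 13:45"])})

    df["hour"] = df["timestamp"].dt.hour              # time of day
    df["day_of_week"] = df["timestamp"].dt.dayofweek  # 0 = Monday
    df["month"] = df["timestamp"].dt.month
    df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
    print(df)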
Statistical features-
Statistics is a branch of mathematics that deals with collecting, analyzing,
interpreting, and visualizing empirical data. Descriptive statistics and inferential statistics
are the two major areas of statistics. Descriptive statistics are for describing the properties
of sample and population data (what has happened). Inferential statistics use those
properties to test hypotheses, reach conclusions, and make predictions (what can you
expect).
Median
Mode
Mean
Percentile
Bias
Variance
Standard Deviation (S.D)
Mean Absolute Deviation (M.A.D)
Z- Score
Skewness
Kurtosis
Mean- Mean is the most commonly used measure of central tendency. It actually
represents the average of the given collection of data.
It is equal to the sum of all the values in the collection of data divided by the total
number of values.
Suppose we have n values in a set of data, namely x1, x2, x3, …, xn; then the mean of the data is
Mean (x̄) = (x1 + x2 + x3 + … + xn) / n
Median- The median represents the middle value of the given set of data when the values are arranged in a particular order (ascending or descending).
Mode- The most frequent number occurring in the data set is known as the mode.
Percentile- A percentile is a term that describes how a score compares to the other scores in the same data set. For example, the 25th percentile and the 75th percentile mark the lower and upper quartiles of the data.
Bias- Bias is a systematic difference between the true parameter and the estimate obtained from the data (for example, between the true mean and the sample mean).
Variance- Variance is a measure that tells us how far the data points are spread out from the mean.
Skewness- Skewness is a measure of the asymmetry of a distribution, i.e., its lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
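As a quick illustration, here is a minimal Python sketch (assuming NumPy and SciPy 1.9+ are available, and using a small hypothetical sample) that computes the statistical features listed above.

    # Minimal sketch: statistical features of a small, hypothetical sample.
    import numpy as np
    from scipy import stats

    x = np.array([12.0, 15.0, 15.0, 18.0, 21.0, 24.0, 30.0])

    mean = np.mean(x)
    median = np.median(x)
    mode = stats.mode(x, keepdims=False).mode        # keepdims needs SciPy 1.9+
    p25, p75 = np.percentile(x, [25, 75])
    variance = np.var(x, ddof=1)                     # sample variance
    std = np.std(x, ddof=1)                          # sample standard deviation
    mad = np.mean(np.abs(x - mean))                  # mean absolute deviation
    z_scores = (x - mean) / std
    skewness = stats.skew(x)
    kurtosis = stats.kurtosis(x)                     # excess kurtosis

    print(mean, median, mode, p25, p75, variance, std, mad, skewness, kurtosis)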
Principal Component Analysis (PCA)-
Principal Component Analysis is a feature extraction technique that transforms a set of possibly correlated variables into a smaller set of uncorrelated variables called principal components. Its main steps are:
1. Standardize the data: The first step in PCA is to standardize the data so that each variable has zero mean and unit variance. This is done to ensure that all variables are on the same scale.
2. Calculate the covariance matrix: The next step is to calculate the covariance matrix, whose entries measure the linear relationship between each pair of variables. The covariance matrix is calculated by multiplying the transpose of the standardized data matrix by the data matrix itself and dividing by (n − 1), where n is the number of samples.
3. Calculate the eigenvectors and eigenvalues of the covariance matrix: The
eigenvectors and eigenvalues of the covariance matrix are calculated to find the
directions of maximum variance in the dataset. Eigenvectors are the directions and
eigenvalues are the magnitudes of the variance.
4. Select the principal components: The eigenvectors are sorted by their
corresponding eigenvalues in descending order. The eigenvectors with the largest
eigenvalues are called the principal components. These principal components
represent the directions of maximum variance in the dataset.
5. Project the data onto the principal components: Finally, the data is projected onto
the principal components to reduce the dimensionality of the dataset. The number
of principal components to be selected depends on the desired amount of variance
to be retained in the data.
PCA is useful in several applications such as image compression, feature extraction, data
visualization, and data analysis. PCA can also be used in combination with other machine
learning algorithms to improve their performance by reducing the dimensionality of the
dataset.
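As a rough illustration of the five steps above, here is a minimal NumPy sketch of PCA on a hypothetical data matrix; the random data and the choice of two components are arbitrary.

    # Minimal sketch of the five PCA steps, applied to toy data.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                  # 100 samples, 5 features (toy data)

    # 1. Standardize: zero mean, unit variance per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix
    cov = (X_std.T @ X_std) / (X_std.shape[0] - 1)

    # 3. Eigenvectors and eigenvalues (eigh: the covariance matrix is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # 4. Sort by eigenvalue (descending) and keep the top k components
    order = np.argsort(eigvals)[::-1]
    k = 2
    components = eigvecs[:, order[:k]]
    explained = eigvals[order[:k]] / eigvals.sum()

    # 5. Project the data onto the principal components
    X_reduced = X_std @ components
    print(X_reduced.shape, explained)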
Feature Selection
Feature Selection is a way of selecting the subset of the most relevant features from the original feature set by removing the redundant, irrelevant, or noisy features.
While developing the machine learning model, only a few variables in the dataset are useful for building the model, and the remaining features are either redundant or irrelevant.
If we input the dataset with all these redundant and irrelevant features, it may
negatively impact and reduce the overall performance and accuracy of the model.
Hence it is very important to identify and select the most appropriate features from
the data and remove the irrelevant or less important features, which is done with
the help of feature selection in machine learning.
1. Wrapper Methods
In wrapper methodology, selection of features is done by considering it as a search problem,
in which different combinations are made, evaluated, and compared with other
combinations. It trains the algorithm by using the subset of features iteratively.
On the basis of the model's output, features are added or removed, and the model is trained again with the updated feature set.
o Forward selection - Forward selection is an iterative process that begins with an empty set of features. In each iteration, it adds one feature and evaluates whether the model's performance improves. The process continues until adding a new variable/feature no longer improves the performance of the model.
o Backward elimination - Backward elimination is also an iterative approach, but it is the
opposite of forward selection. This technique begins the process by considering all the
features and removes the least significant feature. This elimination process continues until
removing the features does not improve the performance of the model.
o Exhaustive Feature Selection- Exhaustive feature selection is the most thorough feature selection method: it evaluates every feature subset by brute force, i.e., it tries every possible combination of features and returns the best-performing feature set.
o Recursive Feature Elimination- Recursive feature elimination is a recursive greedy optimization approach, where features are selected by repeatedly considering smaller and smaller subsets of features. An estimator is trained on each set of features, and the importance of each feature is determined using the coef_ attribute or the feature_importances_ attribute (a sketch of this method is given after this list).
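As an illustration of wrapper-style selection, here is a minimal scikit-learn sketch of recursive feature elimination; the synthetic dataset and the logistic regression estimator are stand-ins, not part of the original material.

    # Minimal sketch: recursive feature elimination (a wrapper method) with scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=10,
                               n_informative=4, random_state=0)

    estimator = LogisticRegression(max_iter=1000)
    rfe = RFE(estimator, n_features_to_select=4)   # keep the 4 best features
    rfe.fit(X, y)

    print("Selected features:", rfe.support_)      # boolean mask of kept features
    print("Feature ranking:  ", rfe.ranking_)      # 1 = selected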
2. Filter Methods
In Filter Method, features are selected on the basis of statistics measures. This method does
not depend on the learning algorithm and chooses the features as a pre-processing step.
The filter method filters out irrelevant features and redundant columns from the model by ranking them with different metrics.
The advantage of filter methods is that they need little computational time and do not overfit the data.
o Information Gain
o Chi-square Test
o Fisher's Score
o Missing Value Ratio
Chi-square Test: Chi-square test is a technique to determine the relationship between the
categorical variables. The chi-square value is calculated between each feature and the target
variable, and the desired number of features with the best chi-square value is selected.
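As a small illustration, the chi-square filter can be applied with scikit-learn's SelectKBest; the toy count matrix and labels below are hypothetical (chi2 requires non-negative feature values).

    # Minimal sketch: chi-square filter selection with scikit-learn.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2

    X = np.array([[1, 0, 3, 4],
                  [2, 1, 0, 5],
                  [0, 2, 1, 6],
                  [1, 3, 0, 7]])
    y = np.array([0, 1, 0, 1])

    selector = SelectKBest(score_func=chi2, k=2)   # keep the 2 best features
    X_new = selector.fit_transform(X, y)
    print("Chi-square scores:", selector.scores_)
    print("Reduced shape:", X_new.shape)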
Fisher's Score:
Fisher's score is one of the popular supervised techniques for feature selection. It ranks the variables according to Fisher's criterion in descending order; we can then select the variables with the largest Fisher's scores.
Missing Value Ratio:
The missing value ratio can be used to evaluate a feature against a threshold value. It is obtained by dividing the number of missing values in a column by the total number of observations. Variables whose missing value ratio is higher than the threshold can be dropped.
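A minimal pandas sketch of this filter, assuming a hypothetical DataFrame and a threshold of 0.4, is given below.

    # Minimal sketch: dropping columns whose missing value ratio exceeds a threshold.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, np.nan, 4, 5],
                       "b": [np.nan, np.nan, np.nan, 4, 5],
                       "c": [1, 2, 3, 4, 5]})

    missing_ratio = df.isnull().sum() / len(df)    # missing values / total observations
    threshold = 0.4                                # hypothetical threshold
    keep = missing_ratio[missing_ratio <= threshold].index
    df_reduced = df[keep]
    print(missing_ratio, df_reduced.columns.tolist())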
3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by considering the interaction of features along with low computational cost. They are fast, like filter methods, but more accurate.
These methods are also iterative: they evaluate each training iteration and optimally find the features that contribute the most to it. Some techniques of embedded methods are:
o Regularization (LASSO/L1 and Ridge/L2 penalties)
o Tree-based feature importance (decision trees, random forests)
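As an illustration of an embedded method, here is a minimal scikit-learn sketch that selects features via an L1 (LASSO) penalty; the synthetic dataset and the alpha value are assumptions made for the example.

    # Minimal sketch: embedded feature selection with an L1 (LASSO) penalty.
    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import Lasso

    X, y = make_regression(n_samples=200, n_features=10,
                           n_informative=3, noise=0.1, random_state=0)

    selector = SelectFromModel(Lasso(alpha=1.0))   # keeps features with non-zero coefficients
    selector.fit(X, y)
    X_selected = selector.transform(X)

    print("Kept features:", selector.get_support())
    print("Reduced shape:", X_selected.shape)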
Decision Tree-
o A decision tree has a tree-like structure: the root node and internal decision nodes test a feature, branches represent the outcomes of the test, and leaf nodes give the final decision.
o Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.
o The logic behind a decision tree can be easily understood because it shows a tree-like structure.
Parent/Child node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called child nodes.
Example- Decision Tree
Let's say we have a dataset of weather conditions and a corresponding label recording whether people go outside or not.
In this case, the goal is to predict whether someone will go outside or not given the
weather conditions. A decision tree can be used to model this problem.
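A minimal scikit-learn sketch of such a decision tree is given below; the tiny encoded weather dataset is purely hypothetical and only stands in for the dataset discussed above.

    # Minimal sketch: fitting a decision tree to a small, hypothetical weather dataset.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Features: [outlook (0=sunny, 1=overcast, 2=rainy), temperature (deg C)]
    X = [[0, 30], [0, 25], [1, 22], [2, 18], [2, 15], [1, 27], [0, 20], [2, 24]]
    y = [0, 0, 1, 1, 0, 1, 1, 0]   # 1 = goes outside, 0 = stays in

    tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
    tree.fit(X, y)
    print(export_text(tree, feature_names=["outlook", "temperature"]))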
Entropy-
Entropy is a measure of the impurity or randomness of a dataset in decision tree classification. It
is used to determine which feature is the best to split the data on. Entropy is calculated by
considering the proportion of each class in the dataset, and the more evenly the classes are
distributed, the higher the entropy.
The formula for entropy is:
H(S) = -∑(p_i)log2(p_i)
where p_i is the proportion of samples in the dataset that belong to the ith class.
Let's take an example to understand entropy better. Suppose we have a dataset of 100 samples,
where 70 samples belong to class A and 30 samples belong to class B. We want to calculate the
entropy of this dataset.
Step 1: Calculate the probability of each class
p_A = 70 / 100 = 0.7
p_B = 30 / 100 = 0.3
Step 2: Calculate the entropy
H(S) = -[0.7 * log2(0.7) + 0.3 * log2(0.3)] = 0.881
This means that the dataset has an entropy of 0.881. If the entropy is closer to 0, it means that the
dataset is more pure and easier to classify, while if the entropy is closer to 1, it means that the
dataset is more impure and harder to classify. Therefore, when building a decision tree, we want
to choose the feature that gives the lowest entropy, as this will result in the best split of the data
into the different classes.
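A minimal Python sketch of this entropy calculation, reproducing the 70/30 example above:

    # Minimal sketch: entropy of a class distribution.
    import math

    def entropy(proportions):
        # H(S) = -sum(p_i * log2(p_i)) over non-zero class proportions
        return -sum(p * math.log2(p) for p in proportions if p > 0)

    print(entropy([0.7, 0.3]))   # ~0.881, matching the worked example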
Information Gain-
Information gain is a measure of the amount of information gained by splitting a dataset on a
particular feature in decision tree classification. It is calculated by comparing the entropy or Gini
index of the dataset before and after the split. The feature that gives the highest information gain
is chosen as the feature to split on.
For example, suppose a dataset of 100 samples, 40 of class A and 60 of class B, with a "color" feature that is red for 40 samples (20 of class A, 20 of class B) and blue for 60 samples (20 of class A, 40 of class B).
Step 1: Calculate the entropy before the split
H(S) = -[0.4 * log2(0.4) + 0.6 * log2(0.6)] = 0.971
Step 2: Calculate the entropy after splitting on "color"
For red:
H(S_red) = -[0.5 * log2(0.5) + 0.5 * log2(0.5)] = 1.0
For blue:
H(S_blue) = -[0.333 * log2(0.333) + 0.667 * log2(0.667)] = 0.918
Step 3: Calculate the information gain
IG(color) = H(S) - [(40/100) * H(S_red) + (60/100) * H(S_blue)]
= 0.971 - [0.4 * 1.0 + 0.6 * 0.918]
= 0.02
This means that the information gain for splitting on the "color" feature is about 0.02. If we compare this with the information gain for other features, we can determine which feature is the best to split on. The higher the information gain, the better the feature is for splitting the dataset.
Therefore, in summary, information gain measures the reduction in entropy or Gini index achieved
by splitting a dataset on a particular feature. The feature that provides the highest information gain
is chosen as the splitting feature for a decision tree.
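A minimal Python sketch reproducing the information gain computation from the example above:

    # Minimal sketch: information gain for the "color" split in the example above.
    import math

    def entropy(proportions):
        return -sum(p * math.log2(p) for p in proportions if p > 0)

    h_parent = entropy([0.4, 0.6])                 # ~0.971 (40 class A / 60 class B)
    h_red = entropy([0.5, 0.5])                    # 1.0    (20 A / 20 B)
    h_blue = entropy([1/3, 2/3])                   # ~0.918 (20 A / 40 B)
    h_children = (40/100) * h_red + (60/100) * h_blue

    print(h_parent - h_children)                   # information gain ~0.02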
Gini Index-
Gini index is a metric used to measure the impurity of a dataset in decision tree classification. A
dataset is considered pure if all of its samples belong to the same class. If a dataset is impure, it
means that there is more than one class present in the dataset. The Gini index measures the
probability of misclassifying a randomly chosen sample in the dataset.
The Gini index can be calculated using the following formula:
Gini(S) = 1 - ∑(p_i)^2
where p_i is the proportion of samples in the dataset that belong to the ith class. A Gini index of 0 means the node is perfectly pure.
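A minimal Python sketch of the Gini index, applied to the same 40/60 class split used in the entropy example:

    # Minimal sketch: Gini index of a class distribution.
    def gini(proportions):
        # Gini(S) = 1 - sum(p_i^2)
        return 1 - sum(p * p for p in proportions)

    print(gini([0.4, 0.6]))   # 0.48; a value of 0 would mean a perfectly pure node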
The advantages of greedy backward selection include that it can be faster than greedy forward
selection and may find a better feature subset by removing irrelevant or redundant features.
However, like greedy forward selection, it may be prone to local optima and may not find the
globally optimal feature subset.
Overall, both greedy forward and backward selection methods are simple and effective methods
for feature selection, and their choice depends on the specific problem and dataset. It is
recommended to use cross-validation to evaluate the performance of the selected feature subset
and to avoid overfitting.
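As an illustration of greedy forward selection with cross-validated evaluation (assuming scikit-learn 0.24+ for SequentialFeatureSelector), here is a minimal sketch on a synthetic dataset; switching direction to "backward" gives greedy backward selection.

    # Minimal sketch: greedy forward selection with cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=12,
                               n_informative=4, random_state=0)

    sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                    n_features_to_select=4,
                                    direction="forward",   # "backward" for backward elimination
                                    cv=5)                  # cross-validated evaluation
    sfs.fit(X, y)
    print("Selected features:", sfs.get_support())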
Predictive Maintenance:
Predictive maintenance is the process of predicting when machines or equipment are likely to fail,
so that maintenance can be scheduled before a failure occurs. Feature extraction and selection
algorithms can be used to extract relevant features from sensors such as accelerometers,
thermocouples, and vibration sensors. These features can be used to train machine learning models
that can predict when equipment is likely to fail, and schedule maintenance accordingly. This can
help reduce downtime and maintenance costs.
Energy Efficiency:
Feature extraction and selection algorithms are also used in the field of energy efficiency. For
example, in heating, ventilation, and air conditioning (HVAC) systems, features such as
temperature, humidity, and airflow can be extracted from sensors to predict energy consumption
and optimize system performance. These features can be used to train machine learning models
that can adjust system parameters to minimize energy consumption while maintaining comfort
levels.
Robotics:
Feature extraction and selection algorithms are also used in robotics to extract relevant features
from sensor data such as lidar and camera images. These features can be used to enable
autonomous navigation, object recognition, and grasping. For example, machine learning models
can be trained to recognize objects in a cluttered environment, which can help robots pick and
place objects accurately.
In conclusion, feature extraction and selection algorithms have numerous applications in
Mechanical Engineering, ranging from structural health monitoring to robotics. These algorithms
can help extract relevant features from sensor data, and enable the development of machine
learning models that can improve safety, quality, efficiency, and productivity in various
mechanical engineering applications.
Dr. D.Y. Patil Institute of Technology, Pimpri, Pune. Prof. J.S. Narkhede