Feature Selection and Dimensionality Reduction
What is Feature Selection (or Variable Selection)?

• The problem of selecting a subset of a learning algorithm's input variables on which it should focus attention, while ignoring the rest. It is one form of dimensionality reduction.

Why Feature Selection?
• Naïve theoretical view: more features mean more information and more discriminating power. In practice this is often not the case, for several reasons.

• Real datasets can have thousands of features, many of them irrelevant or redundant. Irrelevant and redundant features may confuse learners.

• Reduces overfitting: less redundant data means less opportunity to make decisions based on noise.

• Improves accuracy: less misleading data means modeling accuracy improves.

• Reduces training time: fewer features mean lower algorithmic complexity, so algorithms train faster.

• Especially when dealing with a large number of variables, there is a need for dimensionality reduction.

• Feature selection can significantly improve a learning algorithm's performance.


The Curse of Dimensionality
• A classifier's performance usually degrades when the number of features becomes large.
• The number of samples required to achieve the same accuracy grows exponentially with the number of variables.
• In practice, the number of training examples is fixed, which limits how many features can be used effectively.
Filter Methods

• The selection of features is independent of any machine learning algorithm.
• Instead, features are selected on the basis of their scores in various statistical tests of their correlation with the outcome variable.
• "Correlation" is used loosely here; which coefficient or test is appropriate depends on the data types of the feature and the outcome variable.
• Note that filter methods do not remove multicollinearity, so you must deal with multicollinearity among features before training models on your data.
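A minimal sketch of a filter method, assuming scikit-learn and its built-in breast-cancer dataset (the dataset, the choice of the ANOVA F-test, and k = 10 are illustrative assumptions, not part of the slides): each feature is scored with a univariate statistical test and only the top-scoring features are kept, independently of any learner.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Illustrative dataset; any (X, y) classification data would do.
X, y = load_breast_cancer(return_X_y=True)

# Score every feature with the ANOVA F-test and keep the 10 best.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)       # (569, 10): only the selected columns remain
print(selector.scores_[:5])   # per-feature F-statistics used for ranking
```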
Statistical Tests
• Pearson's Correlation: It is used as a measure for quantifying the linear dependence between two continuous variables X and Y. Its value varies from -1 to +1. Pearson's correlation is given as ρ(X, Y) = cov(X, Y) / (σ_X σ_Y).

• LDA: Linear discriminant analysis is used to find a linear combination of features that characterizes or separates two or more classes (or levels) of a categorical variable.

• ANOVA: ANOVA stands for analysis of variance. It is similar to LDA except that it operates on one or more categorical independent features and one continuous dependent feature. It provides a statistical test of whether the means of several groups are equal or not.

• Chi-Square: It is a statistical test applied to groups of categorical features to evaluate the likelihood of correlation or association between them using their frequency distribution.
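A minimal sketch of two of these tests using SciPy; the small arrays below are made up purely to show the function calls and are not from the slides.

```python
import numpy as np
from scipy import stats

# Two continuous variables (made-up values): Pearson's r lies in [-1, +1].
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])
r, p_value = stats.pearsonr(x, y)
print(r, p_value)

# ANOVA: test whether the means of several groups are equal.
group_a = [3.1, 2.9, 3.4, 3.0]
group_b = [4.0, 4.2, 3.8, 4.1]
f_stat, p_anova = stats.f_oneway(group_a, group_b)
print(f_stat, p_anova)
```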
Chi-Squared test
• Example: We would like to determine the relevance of pitch type (a feature with 3 values: good, medium, bad) to the performance of a baseball team (a target with three classes: Wins, Draws, Losses). The observed frequencies from the dataset are arranged in a 3×3 contingency table of pitch type versus result.

• To find the expected frequencies, we assume independence of the rows and columns.
• To get the expected frequency corresponding to the 11 at top left, we take the row total (21) and the column total (30), multiply them, and then divide by the overall total (75).
• So the expected frequency is: 21 × 30 / 75 = 8.4
Chi-Squared test
• We compute the expected frequencies for every entry in the table in the same way.

• The number of degrees of freedom for an m-by-n table is (m-1)(n-1), so in this case (3-1)(3-1) = 2 × 2 = 4.

• In statistics, the degrees of freedom (DF) indicate the number of independent values that can vary in an analysis without breaking any constraints.
Chi-Squared test
• To calculate χ², we then sum (observed − expected)² / expected over all cells of the table.
Chi-Squared test
• The tabulated 95% value of χ² with 4 degrees of freedom is 9.49, so the value of χ² that we obtained (6.70) is not significant at the 5% level.

• We therefore conclude that there is no evidence that the state of the pitch affects the performance of the team.

• P(χ² > 9.49) = 0.05 under independence, so for χ² > 9.49 the assumption of independence can be rejected with 95% confidence.
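A minimal sketch of this test with SciPy. The 3×3 observed table below is hypothetical (the slide's full table is not reproduced here), but it is built so that the top-left cell is 11, its row total is 21, its column total is 30, and the grand total is 75, matching the worked example; the critical value for 4 degrees of freedom comes out as 9.49.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts: rows = pitch type, columns = Wins/Draws/Losses.
observed = np.array([[11,  6,  4],
                     [ 8, 11,  8],
                     [11,  8,  8]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(expected[0, 0])                 # 21 * 30 / 75 = 8.4, as on the slide
print(dof)                            # (3-1) * (3-1) = 4

critical = stats.chi2.ppf(0.95, df=4)
print(critical)                       # ≈ 9.49
print(chi2 > critical)                # reject independence only if chi2 exceeds this
```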
Wrapper Methods

• In wrapper methods, we try a subset of features and train a model using them. Based on the inferences we draw from that model, we decide to add or remove features from the subset. The problem is essentially reduced to a search problem. These methods are usually computationally very expensive.

• Some common examples of wrapper methods are forward feature selection, backward feature elimination, recursive feature elimination, etc.

• Forward Selection: Forward selection is an iterative method in which we start with no features in the model. In each iteration we add the feature that best improves the model, until adding a new variable no longer improves performance.

• Backward Elimination: In backward elimination, we start with all the features and remove the least significant feature at each iteration, provided this improves the performance of the model. We repeat this until no improvement is observed on removing a feature.

• Recursive Feature Elimination: It is a greedy optimization algorithm that aims to find the best-performing feature subset. It repeatedly creates models and sets aside the best- or worst-performing feature at each iteration. It constructs the next model with the remaining features until all the features are exhausted, and then ranks the features based on the order of their elimination.
Forward Selection
Recursive Feature Elimination
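A minimal sketch of recursive feature elimination with scikit-learn (the breast-cancer dataset, the logistic-regression estimator, and n_features_to_select = 5 are illustrative assumptions): the estimator is refit repeatedly and the weakest feature is dropped each round.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

estimator = LogisticRegression(max_iter=5000)         # model used to weigh the features
rfe = RFE(estimator, n_features_to_select=5, step=1)  # eliminate one feature per iteration
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the 5 features that survived
print(rfe.ranking_)   # 1 = selected; larger numbers were eliminated earlier
```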
Embedded Methods

• Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection methods.
• Example: Lasso regression performs L1 regularization, which adds a penalty equal to the sum of the absolute values of the coefficients.
• Other examples of embedded methods are ridge regression, regularized trees, the memetic algorithm, and random multinomial logit.
Lasso regression
• $J(w) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_w(x^{(i)}) - y^{(i)}\right)^2 + \lambda \lVert w \rVert_1$

Note that if w_1 is zero here, it means that we are not considering feature 1 when determining the prediction.
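A minimal sketch of Lasso as an embedded method with scikit-learn (the diabetes dataset and alpha = 1.0 are illustrative assumptions; alpha plays the role of λ above): the L1 penalty drives some coefficients exactly to zero, dropping those features from the prediction.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # put features on a comparable scale

lasso = Lasso(alpha=1.0)                # alpha is the regularization strength (lambda)
lasso.fit(X, y)

print(lasso.coef_)                      # entries equal to 0 mark features the model ignores
```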
Difference between Filter and Wrapper methods

• The main differences between the filter and wrapper methods for feature selection are:
• Filter methods measure the relevance of features by their correlation with the dependent variable, while wrapper methods measure the usefulness of a subset of features by actually training a model on it.
• Filter methods are much faster than wrapper methods because they do not involve training models; wrapper methods are computationally very expensive.
• Filter methods use statistical tests to evaluate a subset of features, while wrapper methods use cross-validation.
• Filter methods may fail to find the best subset of features on many occasions, whereas wrapper methods search the space of subsets directly and can often find a better one.
• Using the subset of features from a wrapper method makes the model more prone to overfitting than using the subset of features from a filter method.
Principal Component Analysis
(Dimensionality Reduction)
Applications of PCA
• Data Visualization/Presentation
• Data Compression
• Noise Reduction
• Data Classification
• Trend Analysis
• Factor Analysis
Data Presentation
• Example: 53 blood and urine measurements (wet chemistry) from 65 people (33 alcoholics, 32 non-alcoholics).

• Matrix format (first rows and columns shown):

       H-WBC   H-RBC   H-Hgb    H-Hct    H-MCV    H-MCH   H-MCHC
  A1   8.0000  4.8200  14.1000  41.0000   85.0000 29.0000 34.0000
  A2   7.3000  5.0200  14.7000  43.0000   86.0000 29.0000 34.0000
  A3   4.3000  4.4800  14.1000  41.0000   91.0000 32.0000 35.0000
  A4   7.5000  4.4700  14.9000  45.0000  101.0000 33.0000 33.0000
  A5   7.3000  5.5200  15.4000  46.0000   84.0000 28.0000 33.0000
  A6   6.9000  4.8600  16.0000  47.0000   97.0000 33.0000 34.0000
  A7   7.8000  4.6800  14.7000  43.0000   92.0000 31.0000 34.0000
  A8   8.6000  4.8200  15.8000  42.0000   88.0000 33.0000 37.0000
  A9   5.1000  4.7100  14.0000  43.0000   92.0000 30.0000 32.0000

• Spectral format: each person's measurements plotted as a curve against measurement index (figure not reproduced).

Data Presentation

• Univariate view: one measurement (H-Bands) plotted per person.
• Bivariate view: C-LDH plotted against C-Triglycerides.
• Trivariate view: M-EPI plotted against C-LDH and C-Triglycerides.
(Scatter plots not reproduced.)
Data Presentation
• Is there a better presentation than the original coordinate axes?
• Do we need a 53-dimensional space to view the data?
• How do we find the 'best' low-dimensional space that conveys maximum useful information?
• One answer: find "Principal Components"
Principal Component Analysis (PCA)
• PCA converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
• It takes an n × p data matrix with possibly correlated axes and summarizes it by uncorrelated axes.
• The first k components display as much of the variation among objects as possible.
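A minimal sketch with scikit-learn, assuming a NumPy array X of shape (n_samples, n_features) such as the 65 × 53 blood/urine matrix above (the random placeholder data and the choice k = 2 are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(65, 53))           # placeholder for the real 65 x 53 measurement matrix

pca = PCA(n_components=2)               # keep the first k = 2 principal components
scores = pca.fit_transform(X)           # 65 x 2 coordinates on the uncorrelated axes

print(scores.shape)
print(pca.explained_variance_ratio_)    # fraction of the variance captured by each component
```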
Geometric Rationale of PCA

• The objective of PCA is to rigidly rotate the axes of this p-dimensional space to new positions (principal axes) that have the following properties:
– they are ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis p has the lowest variance;
– the covariance between each pair of principal axes is zero (the principal axes are uncorrelated).
PCA

• Project the data onto a space in which variance is maximized and error is minimized.

• Orthogonal projection of the data onto a lower-dimensional linear space that:
– maximizes the variance of the projected data (the purple line in the figure);
– minimizes the mean squared distance between the data points and their projections (the sum of the blue lines).
PCA

• Idea:
• Given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible.
– E.g., find the best planar approximation to 3-D data.
– E.g., find the best 12-D approximation to 10⁴-D data.
• In particular, choose the projection that minimizes the squared error in reconstructing the original data.
PCA: Algorithm
PCA Example –STEP 1

• Subtract the mean from each of the data dimensions: all the x values have x̄ subtracted and all the y values have ȳ subtracted from them. This produces a data set whose mean is zero.
• Subtracting the mean makes the variance and covariance calculations easier by simplifying their equations. The variance and covariance values are not affected by the mean value.
PCA Example –STEP 1
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

DATA:              ZERO MEAN DATA:
  x     y            x      y
 2.5   2.4          .69    .49
 0.5   0.7        -1.31  -1.21
 2.2   2.9          .39    .99
 1.9   2.2          .09    .29
 3.1   3.0         1.29   1.09
 2.3   2.7          .49    .79
 2.0   1.6          .19   -.31
 1.0   1.1         -.81   -.81
 1.5   1.6         -.31   -.31
 1.1   0.9         -.71  -1.01
PCA Example –STEP 1
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
PCA Example –STEP 2
• Calculate the covariance matrix:

  cov = [ .616555556  .615444444
          .615444444  .716555556 ]

• Since the off-diagonal elements of this covariance matrix are positive, we should expect the x and y variables to increase together.
PCA Example –STEP 3

• Calculate the eigenvectors and eigenvalues of


the covariance matrix
eigenvalues = .0490833989
1.28402771
eigenvectors = -.735178656 -.677873399
.677873399 -.735178656
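A minimal NumPy sketch reproducing STEPs 1–3 for this data (the variable names are mine; the printed numbers match the slides, though eigenvector signs and ordering may differ from the listing above):

```python
import numpy as np

# Original 2-D data from the tutorial (STEP 1)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

Z = X - X.mean(axis=0)                    # STEP 1: subtract the mean of each column

cov = np.cov(Z, rowvar=False)             # STEP 2: sample covariance matrix
print(cov)                                # ≈ [[0.6166, 0.6154], [0.6154, 0.7166]]

eigvals, eigvecs = np.linalg.eigh(cov)    # STEP 3: eigen-decomposition (symmetric matrix)
print(eigvals)                            # ≈ [0.0491, 1.2840]
print(eigvecs)                            # columns are the eigenvectors (sign may flip)
```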
PCA Example –STEP 3
• http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
• The eigenvectors are plotted as diagonal dotted lines on the plot.
• Note that they are perpendicular to each other.
• Note that one of the eigenvectors goes through the middle of the points, like a line of best fit.
• The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line but are off to the side of it by some amount.
PCA Example –STEP 4
• Reduce dimensionality and form a feature vector.
• The eigenvector with the highest eigenvalue is the principal component of the data set.
• In our example, the eigenvector with the largest eigenvalue is the one that points down the middle of the data.

• Once the eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest.
• This gives the components in order of significance.
PCA Example –STEP 4
• Now, if you like, you can decide to ignore the components of lesser significance.

• You do lose some information, but if the eigenvalues are small, you don't lose much.

• If you have p dimensions in your data:
• calculate the p eigenvectors and eigenvalues;
• choose only the first k eigenvectors;
• the final data set has only k dimensions.
PCA Example –STEP 4
• Feature Vector
  FeatureVector = (eig_1 eig_2 eig_3 ... eig_n)
  We can either form a feature vector with both of the eigenvectors:
      [ -.677873399  -.735178656
        -.735178656   .677873399 ]
  or we can choose to leave out the smaller, less significant component and keep only a single column:
      [ -.677873399
        -.735178656 ]
PCA Example –STEP 5
• Deriving the new data:
• FinalData = RowFeatureVector × RowZeroMeanData
• RowFeatureVector is the matrix with the eigenvectors in its rows (the eigenvector columns transposed), with the most significant eigenvector at the top.
• RowZeroMeanData is the mean-adjusted data transposed, i.e. the data items are in the columns, with each row holding a separate dimension.
PCA Example –STEP 5
FinalData transpose: dimensions along columns

x y
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
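A minimal NumPy sketch of this step for the tutorial data (done row-wise, so the result is the FinalData transpose shown above; variable names are mine):

```python
import numpy as np

# Zero-mean data from STEP 1, one point per row
Z = np.array([[ .69,  .49], [-1.31, -1.21], [ .39,  .99], [ .09,  .29],
              [1.29, 1.09], [ .49,  .79], [ .19, -.31], [-.81, -.81],
              [-.31, -.31], [-.71, -1.01]])

# Eigenvectors as rows, most significant first (STEP 3/4)
row_feature_vector = np.array([[-.677873399, -.735178656],
                               [-.735178656,  .677873399]])

# FinalData = RowFeatureVector x RowZeroMeanData, computed here per data point
final_data = Z @ row_feature_vector.T
print(final_data[0])    # ≈ [-0.82797, -0.17512], the first row of the table above
```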
PCA Example –STEP 5
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
Reconstruction of original Data
• If we reduced the dimensionality then, obviously, when reconstructing the data we would lose the dimensions we chose to discard. In our example, let us assume that we kept only the first (x) column of the transformed data…
Reconstruction of original Data
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

x
-.827970186
1.77758033
-.992197494
-.274210416
-1.67580142
-.912949103
.0991094375
1.14457216
.438046137
1.22382056
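A minimal NumPy sketch of the reconstruction from the single retained component (the means 1.81 and 1.91 are computed from the original data; variable names are mine): each score is mapped back through the kept eigenvector and the mean is added back, giving points that lie on the principal axis.

```python
import numpy as np

# Single-component scores kept from STEP 5 (the x column above)
scores = np.array([-.827970186, 1.77758033, -.992197494, -.274210416, -1.67580142,
                   -.912949103, .0991094375, 1.14457216, .438046137, 1.22382056])

v1 = np.array([-.677873399, -.735178656])    # the retained (most significant) eigenvector
mean_xy = np.array([1.81, 1.91])             # column means of the original data

# RowOriginalData ≈ RowFeatureVector^T x FinalData, then add the mean back in
reconstructed = np.outer(scores, v1) + mean_xy
print(reconstructed[0])   # ≈ [2.37, 2.52], an approximation of the original point (2.5, 2.4)
```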
References and useful links

• http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/pca_tutorial.pdf
• https://www.cs.cmu.edu/~elaw/papers/pca.pdf
• https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
