Lecture #10
Some topics related to NLP

Feature selection
• Feature selection is one of the core concepts in machine learning and it strongly impacts the performance of your model: the features that you use to train your machine learning models have a huge influence on the performance you can achieve.
• In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons:
• Simplification of models to make them easier to interpret by researchers/users
• Shorter training times
• To avoid the curse of dimensionality
• To improve the data's compatibility with a learning model class
Feature selection
• Feature selection techniques should be distinguished from feature
extraction. Feature extraction creates new features from functions of the
original features, whereas feature selection returns a subset of the
features. Feature selection techniques are often used in domains where
there are many features and comparatively few samples (or data points).
• A feature selection algorithm can be seen as the combination of a search
technique for proposing new feature subsets, along with an evaluation
measure which scores the different feature subsets. The simplest
algorithm is to test each possible subset of features finding the one which
minimizes the error rate.
• Irrelevant or partially relevant features can negatively impact model
performance.
• Feature selection and data cleaning should be the first and most important steps when designing your model.
Why feature selection
• What is / why feature selection?
• A procedure in machine learning for finding a subset of features that produces a ‘better’ model for a given dataset
– Avoid overfitting and achieve better generalization ability
– Reduce the storage requirement and training time
– Interpretability
When feature selection is important

• Noisy data
• Lots of low-frequency features
• Multiple types of features are used
• Too many features compared to the number of samples
• Complex models
• Samples in the real scenario are not homogeneous with the training & test samples
How to Select Features
• How to select features, and what are the benefits of performing feature selection before modeling your data?
• Reduces Overfitting: less redundant data means less opportunity to make decisions based on noise.
• Improves Accuracy: less misleading data means modeling accuracy improves.
• Reduces Training Time: fewer data points reduce algorithm complexity and algorithms train faster.
Goal of feature selection
• The goal of feature selection in machine learning is to find the best set of features that allows one to build useful models of the studied phenomena.
• The techniques for feature selection in machine learning can be broadly classified into the following categories:
• Supervised Techniques: these can be used for labeled data, and are used to identify the relevant features for increasing the efficiency of supervised models such as classification and regression.
• Unsupervised Techniques: these can be used for unlabeled data, for example in clustering.
Types of Feature Selection
From a taxonomic point of view, these techniques are classified as follows:
A. Filter methods
B. Wrapper methods
C. Embedded methods
D. Hybrid methods
• Filter methods perform the feature selection independently of the construction of the classification model.
• Wrapper methods iteratively select or eliminate a set of features using the prediction accuracy of the classification model.
• In embedded methods the feature selection is an integral part of the classification model.
Filter methods
Filter type methods select variables regardless of the model.
They are based only on general features like the correlation with
the variable to predict. Filter methods suppress the least
interesting variables. The other variables will be part of a
classification or a regression model used to classify or to predict
data. These methods are particularly effective in computation
time and robust to overfitting.
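As an illustration (not part of the original slide), here is a minimal filter-style sketch: each feature is scored against the label with mutual information and the top-scoring columns are kept, without ever training the downstream classifier. The iris dataset and the choice of keeping two features are arbitrary assumptions for the example.

```python
# Filter method sketch: score every feature independently of any model,
# then keep the k highest-scoring columns.
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)

top2 = scores.argsort()[::-1][:2]      # indices of the two best features
X_filtered = X[:, top2]
print(top2, scores.round(3), X_filtered.shape)
```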
Types of Feature Selection
Wrapper methods: Wrapper methods evaluate subsets of variables, which, unlike filter approaches, makes it possible to detect interactions amongst variables. The two main disadvantages of these methods are:
•The increasing overfitting risk when the number of observations
is insufficient.
•The significant computation time when the number of variables
is large.
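As a sketch of the wrapper idea (an illustration, not the slide's own example), scikit-learn's RFE repeatedly refits a chosen classifier and discards the weakest feature each round; the breast-cancer dataset and logistic regression are arbitrary choices here.

```python
# Recursive feature elimination: the classifier itself guides which features
# are kept, so the model is refit many times (hence the higher cost).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
selector = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
selector.fit(X, y)
print(selector.support_)               # boolean mask over the 30 input features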
Types of Feature Selection
Embedded methods are a catch-all group of techniques which perform feature selection as part of the model construction process. The exemplar of this approach is the LASSO method.
Embedded methods have recently been proposed that try to combine the advantages of both previous methods: a learning algorithm takes advantage of its own variable selection process and performs feature selection and classification simultaneously, such as the FRMT algorithm.
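A minimal embedded-method sketch using scikit-learn's Lasso together with SelectFromModel; the diabetes dataset and the alpha value are arbitrary assumptions for illustration.

```python
# The L1 penalty drives some coefficients exactly to zero while the model is
# being fit, so feature selection is built into training itself.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
lasso = Lasso(alpha=0.5).fit(X, y)

selector = SelectFromModel(lasso, prefit=True)
print(selector.get_support())          # True where the coefficient is non-zero
X_selected = selector.transform(X)
```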
Classifying Features

Relevance: features that have an influence on the output and whose role cannot be assumed by the rest.
Irrelevance: features that do not have any influence on the output, and whose values are generated at random for each example.
Redundancy: a redundancy exists whenever a feature can take the role of another.
Typical methods for feature selection

Categories
• Single feature evaluation
– Frequency based, mutual information, KL divergence, Gini-indexing, information gain, Chi-square statistic
• Subset selection method
– Sequential forward selection
– Sequential backward selection
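A sketch of both subset-selection directions using scikit-learn's SequentialFeatureSelector (available from scikit-learn 0.24 onwards); the dataset and the k-NN estimator are arbitrary choices for the example.

```python
# Sequential forward selection starts from an empty set and greedily adds the
# feature that helps cross-validated accuracy most; backward selection starts
# from the full set and greedily removes features.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

forward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                    direction="forward").fit(X, y)
backward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                     direction="backward").fit(X, y)
print(forward.get_support(), backward.get_support())
```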
Single feature evaluation
• Measure the quality of features by various kinds of metrics:
– Frequency based
– Dependence of feature and label (co-occurrence): mutual information, Chi-square statistic
– Information theory: KL divergence, information gain
– Gini index
Frequency based

– Remove features according to the frequency of the feature, or the number of instances that contain the feature
Typical scenario
– Text mining
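In a text-mining setting this is typically done with document-frequency cut-offs; a minimal sketch with scikit-learn's CountVectorizer over a made-up corpus (the min_df/max_df values are arbitrary assumptions):

```python
# min_df drops terms that appear in fewer than 2 documents; max_df drops terms
# that appear in more than 90% of documents (very common, uninformative terms).
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the dog sat", "the cat ran", "one rare token here"]
vectorizer = CountVectorizer(min_df=2, max_df=0.9)
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())   # only terms passing both cut-offs
```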
Correlation criteria

• One of the simplest criteria is the Pearson correlation coefficient between a feature X_i and the target Y, defined as:
R(i) = cov(X_i, Y) / sqrt(var(X_i) · var(Y))
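A minimal sketch (on synthetic, assumed data) of computing this criterion per feature with NumPy:

```python
# Pearson correlation of each feature column with the target y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                      # 4 candidate features
y = 2.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=100)

# np.corrcoef(a, b)[0, 1] is the Pearson correlation between a and b.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
print(np.round(r, 3))   # columns 0 and 2 correlate strongly with y
```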
Removing features with low variance
VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples.
As an example, suppose that we have a dataset with boolean features, and we want to remove all features that are either one or zero (on or off) in more than 80% of the samples. Boolean features are Bernoulli random variables, and the variance of such variables is given by Var[X] = p(1 - p), so we can select using the threshold .8 * (1 - .8):
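A minimal sketch of that 80% example with scikit-learn's VarianceThreshold (the small Boolean matrix below is a toy illustration):

```python
from sklearn.feature_selection import VarianceThreshold

# Toy Boolean data: the first column is 0 in 5 of 6 samples, so its variance
# p(1 - p) = (1/6)(5/6) ~= 0.14 is below the 0.8 * (1 - 0.8) = 0.16 threshold.
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]

sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
print(sel.fit_transform(X))            # the first column is removed
```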
Univariate feature selection
• Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection routines as objects that implement the transform method:
• SelectKBest removes all but the k highest scoring features
• SelectPercentile removes all but a user-specified highest scoring percentage of features
• SelectFpr, SelectFdr and SelectFwe use common univariate statistical tests for each feature: false positive rate, false discovery rate, or family-wise error, respectively.
• GenericUnivariateSelect allows performing univariate feature selection with a configurable strategy. This allows selecting the best univariate selection strategy with a hyper-parameter search estimator.
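A minimal sketch of the first two routines on the bundled iris data; the ANOVA F-score f_classif is an arbitrary choice of scoring test.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 best-scoring features, or the top 50% of features, respectively.
X_k = SelectKBest(f_classif, k=2).fit_transform(X, y)
X_p = SelectPercentile(f_classif, percentile=50).fit_transform(X, y)
print(X_k.shape, X_p.shape)            # (150, 2) (150, 2)
```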
Chi-square (χ²)

Chi-square (χ²) is a popular feature selection method in which a predefined, finite number of words is used to represent the documents (Forman, 2003). In this method, a positive or negative value (word) in the documents represents a particular category. In Chi-square, the frequency of the feature is totally ignored.
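A minimal sketch of chi-square term scoring on a tiny, made-up labelled corpus; scikit-learn's chi2 returns a score and a p-value per term.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

docs = ["good great film", "great acting", "bad boring film", "boring plot"]
labels = [1, 1, 0, 0]                  # 1 = positive review, 0 = negative

vec = CountVectorizer()
X = vec.fit_transform(docs)
scores, p_values = chi2(X, labels)     # dependence of each term on the label
print(dict(zip(vec.get_feature_names_out(), scores.round(2))))
```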
TF/IDF

TF/IDF is a popular technique in the field of natural language processing (NLP) that is used to determine the relative frequency of words in a specific document through an inverse proportion of the word over the entire corpus. Among the many weighting techniques available, TF/IDF is a popular one (Dsouza & Ansari, 2015). It is an unsupervised technique and does not use class information.
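A minimal TF/IDF sketch over a made-up corpus with scikit-learn's TfidfVectorizer; note that no labels are involved, matching the unsupervised nature described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)          # rows = documents, columns = terms
print(tfidf.get_feature_names_out())
print(X.toarray().round(2))            # rare-in-corpus terms get higher weight
```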
Distinguishing Feature Selector

Uysal and Gunal (2012) introduced a probabilistic feature ranking method that assigns a high rank to a term which appears frequently in one class, irrespective of the class size. The Distinguishing Feature Selector is an improved version of Mutual Information that normalizes the weight on n-grams and assigns weights in the range [0.5, 1].
Relative Discrimination Criterion
Rehman et al. (2015) proposed a new feature ranking method, namely the "Relative Discrimination Criterion (RDC)", for text data. The RDC enhances the rank of frequently occurring terms present in one class. The RDC estimates the rank of a term by taking a weighted difference of the tpr and fpr for each term count. By incorporating (tpr, fpr) while determining the rank of a term, this method selects more useful terms.
Thanks for Listening
