3.1 Feature Selection

The document discusses feature selection, a process aimed at selecting a subset of relevant features from a larger set to minimize classification error and improve model performance. It differentiates feature selection from dimensionality reduction techniques and outlines various methods including supervised, semi-supervised, and unsupervised approaches, as well as filter, wrapper, and hybrid methods. Additionally, it highlights the importance of feature ranking techniques and the impact of training set size on the effectiveness of feature selection.


Data Preparation

• Given a set of D features, the role of feature selection is to select a subset of d features (d < D) in order to minimize the classification error.


• Fundamentally different from dimensionality-reduction techniques (e.g., PCA or LDA), which are based on feature combinations (i.e., feature extraction).
• Feature selection is defined as a process of selecting the
features that best describe a dataset out of a larger set
of candidate features
1. to improve performance (in terms of speed, predictive power, and simplicity of the model);
2. to visualize the data for model selection;
3. to reduce dimensionality and remove noise;
4. to remove irrelevant data;
5. to increase the predictive accuracy of learned models;
6. to reduce the cost of the data;
7. to improve learning efficiency, e.g., by reducing storage requirements and computational cost;
8. to reduce the complexity of the resulting model description and improve the understanding of the data and the model.
Data Reduction
 Typically, there are two types of features: relevant and irrelevant
features
 For a classification problem, relevant features are those that contain discriminative information about the classes (supervised context) or clusters (unsupervised context)
 In the literature, the term “feature selection” is used interchangeably with several synonyms: “variable selection”, “attribute selection”, “feature ranking” and “feature weighting”.
 A generation step, based on a search method, generates the candidate feature subsets to be evaluated; the subset search strategy explores the space of subsets in order to find the optimal one (a minimal forward-search sketch follows this list) and can be:
 Random
 Sequential
 Complete
 An evaluation function
 A stopping criterion
 A validation step
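A minimal sketch of how these components fit together, assuming a sequential (forward) search strategy, a wrapper-style evaluation function (cross-validated accuracy of a 3-NN classifier on scikit-learn's wine data, both illustrative choices) and a simple stopping criterion (stop when no candidate subset improves the score):

```python
# Sequential forward search: generation step proposes subsets, the evaluation
# function scores them, the stopping criterion ends the loop.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
D = X.shape[1]

selected, best_score = [], 0.0
while len(selected) < D:
    # Generation step: candidate subsets = current subset plus one new feature.
    candidates = [selected + [j] for j in range(D) if j not in selected]
    # Evaluation function: mean 5-fold cross-validated accuracy of a 3-NN classifier.
    scores = [cross_val_score(KNeighborsClassifier(3), X[:, c], y, cv=5).mean()
              for c in candidates]
    best_idx = int(np.argmax(scores))
    # Stopping criterion: stop as soon as no candidate improves the score.
    if scores[best_idx] <= best_score:
        break
    selected, best_score = candidates[best_idx], scores[best_idx]

print("selected features:", selected, "cv accuracy: %.3f" % best_score)
```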
 The context of learning or the evaluation
strategy
 Supervised methods: feature relevance is evaluated using the class labels.
 Semi-supervised methods: a small amount of labeled data is used together with unlabeled data.
 Unsupervised methods: relevance is evaluated from the data alone, without labels.
 Filter Methods
 Evaluation is independent of
the classification algorithm.
 The objective function evaluates
feature subsets by their
information content, typically
interclass distance, statistical
dependence or information-
theoretic measures.
 Wrapper Methods
 Evaluation uses criteria
related to the classification
algorithm.
 The objective function is a
pattern classifier, which
evaluates feature subsets by
their predictive accuracy
(recognition rate on test data)
by statistical resampling or
cross-validation.
 Hybrid methods combine both filter and wrapper
methods into a single framework, in order to
provide a more efficient solution to the feature
selection problem
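A small sketch contrasting the two evaluation styles; the breast-cancer dataset, mutual information as the filter criterion and a logistic-regression wrapper are illustrative assumptions, not the slides' specific setup:

```python
# Filter vs. wrapper evaluation of feature subsets.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: score each feature by its information content, independently of any classifier.
mi = mutual_info_classif(X, y, random_state=0)
filter_top5 = np.argsort(mi)[::-1][:5]

# Wrapper: score a candidate subset by the predictive accuracy of an actual classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
wrapper_score = cross_val_score(clf, X[:, filter_top5], y, cv=5).mean()

print("filter top-5 features:", filter_top5)
print("wrapper score (5-fold CV accuracy) of that subset: %.3f" % wrapper_score)
```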
 Feature Ranking Techniques:
 we expect as the output a ranked list of features which
are ordered according to evaluation measures.
 they return the relevance of the features.
 for performing the actual feature selection, the simplest way is to
keep the first m features of the ranking for the task at hand,
provided an appropriate value of m is known.
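A minimal sketch of this ranking-then-truncation step; the score values and the value of m are placeholders:

```python
# Rank features by a relevance score and keep the first m.
import numpy as np

scores = np.array([0.10, 0.62, 0.05, 0.33, 0.48])  # one relevance value per feature (hypothetical)
m = 3
ranked = np.argsort(scores)[::-1]                   # feature indices, most relevant first
selected = ranked[:m]
print("ranking:", ranked, "-> selected:", selected)  # e.g. keep columns X[:, selected]
```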
 Unsupervised feature selection
 Variance score
 Laplacian score
 Unsupervised sparsity score
 Supervised feature selection
 Fisher score
 Supervised Laplacian score
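The slide formulas for these scores did not survive conversion; below is a minimal sketch of two of them, the variance score and the Fisher score, written from their standard definitions (the small constant in the denominator is only there to avoid division by zero):

```python
# Variance score (unsupervised) and Fisher score (supervised), per feature.
import numpy as np

def variance_score(X):
    # Higher variance = the feature spreads the samples more.
    return X.var(axis=0)

def fisher_score(X, y):
    # Ratio of between-class scatter to within-class scatter, per feature.
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        nc = Xc.shape[0]
        num += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        den += nc * Xc.var(axis=0)
    return num / (den + 1e-12)

X = np.random.RandomState(0).randn(100, 4)
y = (X[:, 2] > 0).astype(int)        # class depends mostly on feature 2
print(variance_score(X))
print(fisher_score(X, y))            # feature 2 should get the largest score
```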
 Semi-Supervised feature selection
 Class labels are usually limited or expensive to obtain
 Combines aspects of unsupervised and supervised learning
 Constraints are generated from the available class labels
 For a set S with N samples and d features
 Initialize: w = (1, 1, …, 1)
For t = 1 : T (number of iterations) do
 Pick a random sample x from S
 Find nearhit(x) and nearmiss(x) with Euclidean distance
 Compute, for i = 1, …, d:
   Δᵢ = ½ [ (xᵢ − nearmiss(x)ᵢ)² / ‖x − nearmiss(x)‖_w − (xᵢ − nearhit(x)ᵢ)² / ‖x − nearhit(x)‖_w ]
 Compute w ← w + Δ
End
 Normalize: w ← w² / ‖w²‖∞ , where w² denotes the elementwise square of w
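A minimal numpy sketch of the loop above (Relief/Simba-style hypothesis-margin weighting). The neighbour search, the weighted norm ‖·‖_w and the final normalization follow the formulas as reconstructed on this slide; the toy data and the number of iterations are illustrative assumptions, not the original implementation:

```python
# Iterative margin-based feature weighting with a single nearhit/nearmiss per sample.
import numpy as np

def margin_feature_weights(X, y, T=100, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.ones(d)
    for _ in range(T):
        i = rng.integers(N)
        x, label = X[i], y[i]
        others = np.delete(np.arange(N), i)
        dists = np.linalg.norm(X[others] - x, axis=1)          # Euclidean neighbour search
        same = y[others] == label
        nearhit = X[others[same]][np.argmin(dists[same])]
        nearmiss = X[others[~same]][np.argmin(dists[~same])]
        # weighted norm ||z||_w = sqrt(sum_i w_i^2 z_i^2)
        nm_w = np.sqrt(np.sum((w * (x - nearmiss)) ** 2)) + 1e-12
        nh_w = np.sqrt(np.sum((w * (x - nearhit)) ** 2)) + 1e-12
        delta = 0.5 * ((x - nearmiss) ** 2 / nm_w - (x - nearhit) ** 2 / nh_w)
        w = w + delta
    w2 = w ** 2
    return w2 / np.abs(w2).max()          # w <- w^2 / ||w^2||_inf

X = np.random.RandomState(1).randn(200, 5)
y = (X[:, 0] + 0.1 * np.random.RandomState(2).randn(200) > 0).astype(int)
print(margin_feature_weights(X, y))       # feature 0 should receive the largest weight
```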
 For a set S with N samples and d features
 Initialize: w = (1, 1, …, 1)
For t = 1 : T (number of iterations) do
 Pick a random sample x from S
 Find K  nearmisses( x)  y1, y 2 ,... y K 
 
K  nearhit( x)  z1, z 2 ,...z K
( xi  yi )2
with  distance based on w,  w ( x, y )   wi
2 2

i ( xi  yi )
1 K 2 1 K 2
 Compute : Dmiss  
K j 1
 ( x, yi ) ; Dhit    ( x, zij )
j

K j 1
 Compute :  
1
Dmiss  Dhit 
2
End
w2
w 2
w

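A short sketch of the per-iteration quantities in this K-neighbour variant (the weighted distance Δ_w and the averaged D_miss / D_hit terms); the helper names and K = 3 are illustrative assumptions:

```python
# Weighted distance and K-averaged hit/miss terms for one sample x.
import numpy as np

def weighted_dist(x, y, w):
    # Delta_w(x, y) = sqrt( sum_i w_i^2 (x_i - y_i)^2 )
    return np.sqrt(np.sum(w ** 2 * (x - y) ** 2))

def margin_term(x, label, X, y, w, K=3):
    same = (y == label) & ~np.all(X == x, axis=1)            # hits: same class, excluding x itself
    d_hit = np.sort([weighted_dist(x, z, w) for z in X[same]])[:K]
    d_miss = np.sort([weighted_dist(x, z, w) for z in X[y != label]])[:K]
    D_hit = np.mean(d_hit ** 2)                               # (1/K) * sum_j Delta_w^2(x, z_j)
    D_miss = np.mean(d_miss ** 2)                             # (1/K) * sum_j Delta_w^2(x, y_j)
    return 0.5 * (D_miss - D_hit)

X = np.random.RandomState(0).randn(50, 4)
y = (X[:, 1] > 0).astype(int)
print(margin_term(X[0], y[0], X, y, w=np.ones(4)))
```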
 Simba in the semi-supervised context: the labeled and unlabeled samples are fed to a label propagation method [1], which produces SOFT labels that are then used by Simba.

[1] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and
global consistency,” in Advances in Neural Information Processing Systems 16, 2004, pp.
321–328.
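A minimal sketch of this soft-label step, assuming scikit-learn's LabelSpreading (which follows the local-and-global-consistency propagation of [1]); the toy data and the convention of marking unknown labels with −1 are illustrative:

```python
# Obtain SOFT labels for the unlabeled samples before the margin-based weighting.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_blobs(n_samples=200, centers=2, random_state=0)
y = np.full(200, -1)                 # -1 marks an unknown label
y[:10] = y_true[:10]                 # only a few samples keep their true label

prop = LabelSpreading(kernel='knn', n_neighbors=7).fit(X, y)
soft_labels = prop.label_distributions_   # one class-probability vector per sample
print(soft_labels[:3])                    # these SOFT labels then feed the weighting loop
```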
 Compute a neighborhood size using the method of [2]
 The samples outside this neighborhood are treated as nearmisses

[2] F. Dornaika, Y. El Traboulsi, and A. Assoum, “Adaptive Two Phase Sparse Representation
Classifier for Face Recognition,” in Advanced Concepts for Intelligent Vision Systems: 15th
International Conference, ACIVS 2013, Poznań, Poland, October 28-31, 2013. Proceedings, J.
Blanc-Talon, A. Kasinski, W. Philips, D. Popescu, and P. Scheunders, Eds. Cham: Springer
International Publishing, 2013, pp. 182–191.
 Must-link (x+y)
 Cannot-link (x-y)
 Compute nearhit and nearmiss
   m_CANNOT = d(x, NM(y)) − d(x, NH(x))
   m_MUSTLINK = d(x, NH(x)) − d(x, NH(y))
   m_NEW = d(x, NM(x)) − d(x, NH(x))

Future Work (2)
   M_fusion = M_CANNOT + M_MUSTLINK + M_UNSUPERVISED
   M_fusion = α₁ M_CANNOT + α₂ M_MUSTLINK + α₃ M_UNSUPERVISED (weighted combination)


 Using an existing score (Fisher, Laplacian, …) as a constant guide for Simba in the supervised context

w  w    Fisher

 The resulting feature subsets of many FS models depend strongly on the size of the training set.
 A high-dimensional input cannot always be reduced to a small subset of features: when the target is genuinely related to many input features, removing any of them can seriously affect the learning performance.
[1] F. Dornaika, Y. El Traboulsi, and A. Assoum, “Adaptive Two Phase Sparse Representation Classifier for
Face Recognition,” in Advanced Concepts for Intelligent Vision Systems: 15th International Conference,
ACIVS 2013, Poznań, Poland, October 28-31, 2013. Proceedings, J. Blanc-Talon, A. Kasinski, W. Philips, D.
Popescu, and P. Scheunders, Eds. Cham: Springer International Publishing, 2013, pp. 182–191.
[2] M. Yang, F. Wang, and P. Yang, “A novel feature selection algorithm based on hypothesis-margin,”
Journal of Computers, vol. 3, no. 12, pp. 27–34, 2008.
[3] K. Q. Weinberger and L. K. Saul, “Distance metric learning for large margin nearest neighbor
classification,” The Journal of Machine Learning Research, vol. 10, pp. 207–244, 2009.
[4] A. Moujahid, A. Abanda, and F. Dornaika, “Feature Extraction Using Block-based Local Binary Pattern
for Face Recognition,” Electronic Imaging, vol. 2016, no. 10, pp. 1–6, 2016.
[5] Y. Li and B.-L. Lu, “Feature selection based on loss-margin of nearest neighbor classification,” Pattern
Recognition, vol. 42, no. 9, pp. 1914–1921, Sep. 2009.
[6] W. Pan, P. Ma, and X. Su, “Feature Weighting Algorithm Based on Margin and Linear Programming,” in
Rough Sets and Current Trends in Computing, 2012, pp. 388–396.
[7] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global
consistency,” in Advances in Neural Information Processing Systems 16, 2004, pp. 321–328.
[8] K. Crammer, R. Gilad-Bachrach, A. Navot, and N. Tishby, “Margin analysis of the LVQ algorithm,” in
Advances in neural information processing systems, 2002, pp. 462–469.
[9] R. Gilad-Bachrach, A. Navot, and N. Tishby, “Margin based feature selection-theory and algorithms,” in
Proceedings of the twenty-first international conference on Machine learning, 2004, p. 43.