DEBRE TABOR UNIVERSITY INSTITUTE OF TECHNOLOGY
Department of Information Technology
Article Review on:
Effective Feature Subset Selection Methods and
Algorithms for High Dimensional Data
REVIEWED BY
Haiylachew Tessema &
Walelign Atnafu
Submitted to: Dr. Bhoom
Outline
Abstract
Introduction
Methodology
Objective
Result
Critiques
Conclusion
Recommendation
References
Abstract
Feature selection is the process of identifying the smallest subset of features that
produces results comparable to the original, complete set of features.
Feature extraction is a more general form of dimensionality reduction, and feature
selection is a subfield of feature extraction. Feature selection algorithms are
judged on two basic criteria: time requirement and quality of the selected subset.
The core goals of the feature selection process are to improve the accuracy of the
classifier, reduce dimensionality, and speed up the clustering task.
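The irrelevance/redundancy distinction in the abstract can be illustrated with a toy filter-style selector (a sketch under assumed thresholds, not the paper's algorithm): near-constant features are treated as irrelevant and dropped, then any feature highly correlated with an already-kept one is treated as redundant.

```python
import numpy as np

def filter_select(X, var_threshold=0.01, corr_threshold=0.95):
    """Toy filter-style feature selection (illustrative thresholds).

    Drops near-constant features (irrelevant), then features highly
    correlated with an already-kept feature (redundant).
    Returns the indices of the kept columns of X.
    """
    variances = X.var(axis=0)
    candidates = [j for j in range(X.shape[1]) if variances[j] > var_threshold]
    kept = []
    for j in candidates:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > corr_threshold
            for k in kept
        )
        if not redundant:
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base,                                            # informative
               base + rng.normal(scale=0.01, size=(100, 1)),    # redundant copy
               np.zeros((100, 1)),                              # irrelevant constant
               rng.normal(size=(100, 1))])                      # independent feature
print(filter_select(X))  # the constant and redundant columns are dropped
```

A real filter method would use a statistical relevance score against the class label rather than variance alone; this sketch only shows the two pruning steps the abstract names.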
Introduction
Data mining is the process of discovering interesting knowledge from large
information repositories or data warehouses.
Data mining Functionalities are:
Characterization and Discrimination
Mining Frequent Patterns, Associations, and Correlations
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis
Methodology
The researcher uses several effective feature subset selection methodologies
and algorithms for high-dimensional data, such as:
o Invention procedure
o Estimation
o Evaluation Methods
o Verification methods
The researcher also uses several classification methodologies, such as:
o Bayesian Classification
o Decision Tree Induction
o Rule-Based Classification
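As a minimal illustration of Bayesian classification, the sketch below implements a Gaussian naive Bayes classifier (an assumption of ours; the review does not specify which Bayesian variant the paper uses): each class is modeled by per-feature means and variances, and prediction picks the class with the highest log-posterior.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes (an illustrative sketch)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.vars = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        self.priors = {c: np.mean(y == c) for c in self.classes}
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            # log P(c) + sum_j log N(x_j | mean_cj, var_cj)
            ll = (np.log(self.priors[c])
                  - 0.5 * np.sum(np.log(2 * np.pi * self.vars[c]))
                  - 0.5 * np.sum((X - self.means[c]) ** 2 / self.vars[c], axis=1))
            scores.append(ll)
        return self.classes[np.argmax(scores, axis=0)]

X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
model = GaussianNB().fit(X, y)
print(model.predict(np.array([[1.0, 1.0], [5.0, 5.0]])))  # class 0, then class 1
```

The naive independence assumption is what keeps the model cheap on high-dimensional data: each feature contributes one term to the log-likelihood.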
Objective
General Objective
To obtain well-suited outcomes, i.e. the best feature subset, from a
high-dimensional data set.
Specific Objective
Select the best feature subset selection methods and algorithms for
high-dimensional data sets.
Select among the feature subset selection method families (Embedded, Filter,
Wrapper, and Hybrid).
Select the best feature extraction approach for subset selection.
Improve accuracy and speed up clustering for high dimensional data.
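The wrapper family named above can be sketched as a greedy forward search (illustrative only; the scoring function `centroid_accuracy` is a hypothetical stand-in for whatever classifier a real wrapper would wrap):

```python
import numpy as np

def forward_select(X, y, score, max_features=2):
    """Greedy wrapper-style forward selection (a sketch).

    `score(X_subset, y)` is any accuracy estimate for a classifier trained
    on the chosen columns; higher is better.
    """
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        gains = [(score(X[:, selected + [j]], y), j) for j in remaining]
        s, j = max(gains)
        if s <= best_score:        # stop when no feature improves the score
            break
        best_score = s
        selected.append(j)
        remaining.remove(j)
    return selected

# Hypothetical score: nearest-centroid training accuracy on the subset.
def centroid_accuracy(Xs, y):
    centroids = {c: Xs[y == c].mean(axis=0) for c in np.unique(y)}
    preds = [min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
             for x in Xs]
    return np.mean(np.array(preds) == y)

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = np.hstack([y.reshape(-1, 1) + rng.normal(scale=0.1, size=(100, 1)),  # informative
               rng.normal(size=(100, 2))])                               # noise
print(forward_select(X, y, centroid_accuracy))  # only the informative column is kept
```

Filter methods would instead score features without training a classifier, and embedded methods fold the selection into the training itself; the wrapper loop above is the most expensive of the families because it retrains at every step.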
Critiques
Strengths
For finding an effective feature subset selection:
The author uses different algorithms.
The author uses different classification methods.
The author uses different feature selection methods.
The author uses different data processing techniques.
The author tries to compare different algorithms and techniques.
Previous works used only different algorithms, but in this article the author also
tries to include different classification techniques.
Weaknesses
The author does not show which data types are easy, and which are not, for effective
feature subset selection.
The author does not show which feature subset selection method is faster and more
effective than the others.
Result
Effective feature subset selection methods and algorithms for high-dimensional
data are an efficient way to improve:
the accuracy of classifiers,
dimensionality reduction, and
the removal of both irrelevant and redundant data.
Invention procedure, estimation, evaluation, and verification techniques or
algorithms are used to eliminate irrelevant and redundant data.
Decision Tree Induction, Bayesian Classification, and Rule-Based Classification
are used to reduce high-dimensional data sets.
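As a minimal illustration of decision tree induction on a reduced data set, a one-level tree (a decision stump) can be induced by exhaustive threshold search (a sketch; real decision tree induction recurses on each split):

```python
import numpy as np

def best_stump(X, y):
    """Induce a one-level decision tree by exhaustive search.

    Tries every feature and every observed threshold, and every pair of
    leaf labels, keeping the split with the fewest training errors.
    Returns (feature, threshold, left_label, right_label).
    """
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            for ll in np.unique(y):
                for rl in np.unique(y):
                    err = np.sum(left != ll) + np.sum(right != rl)
                    if err < best_err:
                        best_err, best = err, (j, t, ll, rl)
    return best

X = np.array([[0.2], [0.4], [1.6], [1.8]])
y = np.array([0, 0, 1, 1])
print(best_stump(X, y))  # splits feature 0 at 0.4: left -> class 0, right -> class 1
```

Full induction would apply the same search recursively to each side of the split until the leaves are pure, which is why pruning the feature set first keeps the trees small.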
cont…
Table: Comparison of various techniques and algorithms
Conclusion
The following related works are used for effective feature subset selection methods and
algorithms for high-dimensional data:
A. Minimum Spanning Tree (MST): a spanning tree of minimum total weight of an undirected,
connected, weighted graph.
B. Graph Clustering: graph-theoretic clustering methods that have been used in many applications.
C. Consistency Measure: aims to locate the optimal subset of relevant features, to improve the
overall accuracy of the classification task and reduce the size of the dataset.
D. Relief Algorithm: Relief is a well-known and good feature set estimator; feature set estimators
evaluate features individually.
E. Mutual Information: consecutive features are grouped into clusters, and each cluster is replaced
by a single feature; the clustering process is based on the nature of the data.
F. Hierarchical Clustering: a procedure for grouping data objects into a tree of clusters.
G. Feature Selection Methods: evaluation functions are used to measure the goodness of a subset.
Feature subset selection methods are categorized into four types: Embedded, Filter, Wrapper, and Hybrid.
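Of the related works above, the Relief feature estimator is concrete enough to sketch (a basic two-class variant after Kira and Rendell; details such as the iteration count and the data are our assumptions, not the paper's):

```python
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    """Basic two-class Relief feature weight estimator (illustrative).

    For each sampled instance, the nearest same-class neighbour (hit) pulls
    each feature weight down, and the nearest other-class neighbour (miss)
    pushes it up, so relevant features accumulate positive weight.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    span = X.max(axis=0) - X.min(axis=0)   # normalise per-feature differences
    span[span == 0] = 1.0
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                   # exclude the instance itself
        same, other = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(other, dist, np.inf))
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span / n_iter
    return w

rng = np.random.default_rng(2)
y = np.repeat([0, 1], 30)
X = np.hstack([y.reshape(-1, 1) + rng.normal(scale=0.1, size=(60, 1)),  # relevant
               rng.normal(size=(60, 1))])                               # irrelevant
w = relief(X, y)
print(w)  # the relevant feature receives the larger weight
```

This also shows the sense in which Relief "evaluates features individually": each feature's weight is updated on its own, even though the neighbours are found using all features.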
Recommendation
For future enhancement, researchers should include the data
type of each high-dimensional data set and categorize which
data types are fast, and which are not, for effective feature
subset selection techniques and algorithms.
References
[1] Q. Song, J. Ni, and G. Wang, "A Fast Clustering-Based Feature Subset Selection
Algorithm for High-Dimensional Data," IEEE Transactions on Knowledge and Data
Engineering, vol. 25, no. 1, Jan. 2013.
[2] M. Dash, H. Liu, and H. Motoda, "Consistency Based Feature Selection," Proc. Fourth
Pacific-Asia Conf. Knowledge Discovery and Data Mining, pp. 98-109, 2000.
[3] M. Dash and H. Liu, "Consistency-Based Search in Feature Selection," Artificial
Intelligence, vol. 151, nos. 1/2, pp. 155-176, 2003.
[4] H. Liu, H. Motoda, and L. Yu, "Selective Sampling Approach to Active Feature
Selection," Artificial Intelligence, vol. 159, nos. 1/2, pp. 49-74, 2004.
[5] R. Battiti, "Using Mutual Information for Selecting Features in Supervised Neural
Net Learning," IEEE Trans. Neural Networks, vol. 5, no. 4, pp. 537-550, July 1994.
Thank You!