Feature Selection Method in Decision Tree Induction
71762351023
NITHISH KUMAR R
WHAT IS FEATURE SELECTION?
Feature selection is the process of choosing a subset of
relevant features (variables, attributes) from the original set
of features in a dataset. The goal of feature selection is to
improve model performance by reducing dimensionality,
decreasing computational cost, enhancing interpretability,
and mitigating the risk of overfitting.
INFORMATION GAIN (IG):
Information gain measures the reduction in entropy or
impurity achieved by splitting the data on a particular
feature.
Features with higher information gain are considered more
relevant for splitting.
Decision trees typically use information gain (or related
metrics like gain ratio) as the criterion for selecting the best
split at each node.
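As a concrete illustration, the sketch below computes information gain for a categorical split in plain Python. The data (outlook, play) and function names are illustrative, not from any particular library; the definition used is the standard one, Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) · Entropy(S_v).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction from splitting `labels` by `feature_values`."""
    total = len(labels)
    split_entropy = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        split_entropy += (len(subset) / total) * entropy(subset)
    return entropy(labels) - split_entropy

# Toy example: an "outlook" attribute vs. play / don't-play labels.
outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no"]
print(information_gain(outlook, play))  # ~0.571 bits
```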
GAIN RATIO:
Gain ratio is a modification of information gain that penalizes attributes
with a large number of distinct values. It helps to avoid biases towards
attributes with many categories.
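A minimal self-contained sketch of the same idea, using the gain ratio criterion as in C4.5: information gain divided by the split information (the entropy of the partition sizes, which grows with the number of distinct values). The toy data is illustrative.

```python
from collections import Counter
from math import log2

def _entropy(counts):
    """Entropy (in bits) of a collection of counts."""
    total = sum(counts)
    return -sum((n / total) * log2(n / total) for n in counts if n)

def gain_ratio(feature_values, labels):
    """Information gain divided by split information."""
    total = len(labels)
    gain = _entropy(Counter(labels).values())
    for value, count in Counter(feature_values).items():
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        gain -= (count / total) * _entropy(Counter(subset).values())
    # Split information: entropy of the partition sizes themselves.
    # It grows with the number of distinct values, penalizing them.
    split_info = _entropy(Counter(feature_values).values())
    return gain / split_info if split_info > 0 else 0.0

outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no"]
print(gain_ratio(outlook, play))  # ~0.375
```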
CHI-SQUARE TEST:
The chi-square test measures the statistical association between a categorical feature and the target variable. In the context of decision trees, it assesses whether the distribution of the target variable differs significantly across the subsets of the data defined by each attribute. Features with higher chi-square statistics (indicating a stronger association with the target variable) are selected.
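As a hedged illustration, the sketch below scores one categorical feature with SciPy's chi2_contingency; the contingency counts are made up for the example, and a larger statistic (smaller p-value) suggests a stronger association with the target.

```python
from scipy.stats import chi2_contingency

# Contingency table: rows = feature values, columns = class counts.
# (Illustrative counts, not from a real dataset.)
table = [[30, 10],   # value A: 30 positives, 10 negatives
         [12, 28]]   # value B: 12 positives, 28 negatives

stat, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```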
WRAPPER METHODS:
Wrapper methods use a specific learning algorithm (e.g., a decision tree) as a black box to evaluate the usefulness of feature subsets: candidate subsets are scored by training and validating the model, and the best-performing subset is retained.
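A minimal sketch of a wrapper approach, assuming scikit-learn is available: SequentialFeatureSelector greedily grows a feature subset, scoring each candidate by the cross-validated accuracy of the decision tree itself. The Iris data is used only as a stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Forward selection: add one feature at a time, keeping the
# addition that most improves the tree's cross-validated score.
selector = SequentialFeatureSelector(tree, n_features_to_select=2, cv=5)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected features
```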
EMBEDDED METHODS:
Embedded methods incorporate feature selection
within the model construction process. They select
features during the training phase of the learning
algorithm.
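A minimal sketch of the embedded idea, again assuming scikit-learn: a decision tree's impurity-based feature_importances_ are a by-product of training, and SelectFromModel turns them into a feature filter. The dataset and the "mean" threshold are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Training the tree already ranks the features via its splits.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.feature_importances_)  # importances learned during training

# Keep only features whose importance exceeds the mean importance.
selector = SelectFromModel(tree, prefit=True, threshold="mean")
X_reduced = selector.transform(X)
print(X_reduced.shape)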