MLT UNIT-3 Notes
Decision Tree:
Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be split
further once a leaf node is reached.
Splitting: Splitting is the process of dividing the decision node/root node into sub-
nodes according to the given conditions.
Pruning: Pruning is the process of removing unwanted branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called the parent node,
and its sub-nodes are called the child nodes.
2. Information Gain:
Information gain is the reduction in entropy obtained by splitting the dataset
on a given attribute. It measures how much information a feature provides
about the target.

Gain(S, A) = Entropy(S) − Σ (|Sv| / |S|) · Entropy(Sv), summed over all values v of attribute A

Where:
S is the set of examples, A is the attribute being tested, and Sv is the subset of S
for which attribute A has value v.
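As a quick illustration, here is a minimal Python sketch of entropy and information gain computed from class counts. The counts in the example call are hypothetical, chosen only so that the results match the entropy values (0.94, 0.97, 0) quoted later in these notes.

```python
# Minimal sketch: entropy and information gain from class counts (pure Python).
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subsets):
    """Gain = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv)) over the subsets Sv."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - weighted

# Hypothetical example: a 9-yes / 5-no set split into three subsets by some attribute.
print(entropy([9, 5]))                                     # ~0.94
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # gain of that split
```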
3. Gain Ratio:
Gain Ratio, or the Uncertainty Coefficient, is used to normalize the
information gain of an attribute against how much entropy that
attribute itself has, because the plain information gain measure is biased
towards tests with many outcomes.
The formula of gain ratio is given by:

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)

where SplitInfo(S, A) = − Σ (|Sv| / |S|) · log2(|Sv| / |S|), summed over all values v of attribute A.
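Continuing the sketch above, split information and gain ratio can be computed from the subset sizes alone. The subset sizes and the gain value passed in below are hypothetical, used only for illustration.

```python
# Minimal sketch: split information and gain ratio (pure Python).
from math import log2

def split_info(subset_sizes):
    """SplitInfo(S, A) = -sum(|Sv|/|S| * log2(|Sv|/|S|)) over the values of A."""
    total = sum(subset_sizes)
    return -sum((s / total) * log2(s / total) for s in subset_sizes if s > 0)

def gain_ratio(gain, subset_sizes):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    return gain / split_info(subset_sizes)

# A hypothetical attribute that splits 14 examples into subsets of sizes 5, 4 and 5.
print(split_info([5, 4, 5]))         # ~1.577
print(gain_ratio(0.247, [5, 4, 5]))  # gain normalized by the split's own entropy
```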
Steps:
Since we cannot just pick one of the features to start our decision
tree, we need to calculate the information gain of every feature and
start splitting from the feature with the highest information gain.
The Outlook attribute has three values: Sunny, Overcast and Rain.
Let's calculate:
Entropy(S) = 0.94
Entropy(S_Sunny) = 0.97
Entropy(S_Overcast) = 0
Entropy(S_Rain) = 0.97
So:
Gain(S, Outlook) = Entropy(S) − (|S_Sunny|/|S|) · Entropy(S_Sunny) − (|S_Overcast|/|S|) · Entropy(S_Overcast) − (|S_Rain|/|S|) · Entropy(S_Rain)
Similarly, calculate the information gain for Humidity and Wind. Comparing all
the information gain values, Outlook gives the highest gain, so it becomes the
root node and the data is split into the Sunny, Overcast and Rain branches.
Note that for the Sunny and Rain branches we cannot easily conclude a
yes or a no, since each contains events where Play Volleyball is yes and
events where it is no. That means their entropy is greater than zero and
hence they are impure, so we need to split them further.
💡 Overcast is a branch with zero entropy, since all of its events are Play
Volleyball = Yes, so it automatically becomes a leaf node.
We will calculate information gain for the rest of the features when
the Outlook is Sunny and when the Outlook is Rain:
Splitting the Sunny branch (on Humidity):
Entropy(S_High) = 0
Entropy(S_Normal) = 0
So Humidity gives the highest information gain (0.97) for the Sunny branch and
becomes its decision node.
Splitting the Rain branch (on Wind):
Entropy(S_Strong) = 0
Entropy(S_Weak) = 0
Wind gives the highest information gain value (0.97) for the Rain branch. Now
we can complete our Decision Tree.
A complete decision tree with Entropy and Information
gain criteria:
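Since the final tree figure is not reproduced here, the following is a minimal scikit-learn sketch of growing a decision tree with the entropy / information-gain criterion. The Iris dataset is used as a stand-in because the Play Volleyball table itself is not included in these notes.

```python
# Minimal sketch: a decision tree trained with the entropy criterion (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="entropy" makes each split maximise information gain, as described above.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

# Print the learned tree as text: each internal node is a test, each leaf a class.
print(export_text(tree, feature_names=load_iris().feature_names))
```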
Mitigation Strategies:
Decision trees are prone to overfitting and related issues; the following strategies
help to mitigate them.
1. Pruning:
Pruning involves removing branches from the tree that do not provide significant
predictive power. This helps to reduce overfitting and make the tree more
generalizable.
2. Minimum Samples per Leaf or Split:
Setting a minimum number of samples required to make a split or form a leaf node
can help control the tree's depth and mitigate overfitting.
3. Feature Selection:
Carefully selecting relevant features and avoiding irrelevant ones can improve the
tree's ability to generalize to new data.
4. Ensemble Methods:
Using ensemble methods like Random Forests or Gradient Boosting can improve
the overall performance and robustness of decision trees by combining multiple
trees.
5. Handling Imbalanced Data:
Techniques like resampling, using different evaluation metrics, or using specialized
algorithms can address issues related to imbalanced class distributions.
6. Feature Engineering:
Preprocessing the data and engineering informative features can enhance the
performance of decision trees.
7. Cross-Validation:
Employing techniques like cross-validation helps to assess the model's
performance on different subsets of the data, reducing the risk of overfitting.
8. Hyperparameter Tuning:
Tuning the hyperparameters of the decision tree, such as the maximum depth,
minimum samples per leaf, and others, can significantly impact the model's
performance.
By carefully addressing these issues and applying appropriate mitigation strategies,
decision trees can be powerful and effective models in machine learning. A short
code sketch of several of these controls is given below.
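The sketch below covers strategies 1, 2, 7 and 8 (pruning, minimum samples per leaf, cross-validation and hyperparameter tuning) with scikit-learn. The breast_cancer dataset and the parameter values are stand-ins chosen only for illustration.

```python
# Minimal sketch: controlling decision-tree overfitting (scikit-learn assumed).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Unconstrained tree: tends to grow deep and overfit the training data.
baseline = DecisionTreeClassifier(random_state=0)
print("baseline CV accuracy:", cross_val_score(baseline, X, y, cv=5).mean())

# Search over depth limits, minimum samples and pruning strength (ccp_alpha).
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "ccp_alpha": [0.0, 0.001, 0.01],   # cost-complexity (post-)pruning
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```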
Inductive Inference:
Suppose there are two categories, Category A and Category B, and we have a new
data point x1. To which of these categories will this data point belong? To solve
this type of problem, we need the K-NN algorithm. With the help of K-NN, we can
easily identify the category or class of a particular data point.
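A minimal scikit-learn sketch of this idea follows. The two clusters below are hypothetical stand-ins for Category A and Category B, and K = 5 is an arbitrary choice.

```python
# Minimal sketch: classifying a new point x1 with K-NN (scikit-learn and NumPy assumed).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
cat_a = rng.normal(loc=[0, 0], scale=0.5, size=(20, 2))   # Category A points
cat_b = rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))   # Category B points
X = np.vstack([cat_a, cat_b])
y = np.array(["A"] * 20 + ["B"] * 20)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5 nearest neighbours
knn.fit(X, y)

x1 = np.array([[1.8, 1.6]])                 # the new data point x1
print("x1 is assigned to category:", knn.predict(x1)[0])
```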
Locally Weighted Regression (LWR):
Basic Idea:
For each prediction, LWR assigns different weights to the data points
based on their proximity to the point where the prediction is
being made. Points closer to the prediction point receive higher
weights, while points farther away receive lower weights.
Weighting Function:
The weights are assigned using a weighting function, which is typically
a Gaussian (bell-shaped) kernel:

w^(i) = exp( −(x^(i) − x)^2 / (2τ^2) )

where:
x^(i) is the feature value of the i-th data point,
x is the feature value of the prediction point, and
τ is a bandwidth parameter that controls the width of the weighting
function.
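A tiny numerical sketch of this weighting function (NumPy assumed; the prediction point, the training values and the two τ values are hypothetical):

```python
# Minimal sketch: the Gaussian weighting function of LWR.
import numpy as np

def lwr_weight(x_i, x, tau):
    """w(i) = exp(-(x_i - x)^2 / (2 * tau^2)): closer points get weights near 1."""
    return np.exp(-((x_i - x) ** 2) / (2 * tau ** 2))

x = 5.0                                   # prediction point
x_i = np.array([4.9, 5.5, 7.0, 9.0])      # training feature values
print(lwr_weight(x_i, x, tau=0.5))        # weights decay quickly with distance
print(lwr_weight(x_i, x, tau=2.0))        # larger tau -> flatter, more global weights
```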
Local Regression:
LWR fits a regression model locally for each prediction point using the
weighted data. The weights are incorporated into the regression
algorithm to give more importance to nearby points.
Prediction:
To make a prediction at a new point, the model computes a weighted
least squares regression using only the data points close to the
prediction point.
Bandwidth Parameter:
The bandwidth parameter (τ) is crucial in controlling the degree of
locality. A smaller bandwidth focuses more on local details, but it may
lead to overfitting, while a larger bandwidth considers more global
patterns.
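Putting the pieces together, here is a minimal from-scratch sketch of locally weighted linear regression: for each query point it solves a weighted least-squares problem using the Gaussian weights above. The sine-shaped data and the τ values are hypothetical.

```python
# Minimal sketch: locally weighted linear regression with a Gaussian kernel (NumPy assumed).
import numpy as np

def lwr_predict(x_query, X, y, tau):
    """Weighted least-squares fit around x_query; returns the local prediction."""
    A = np.column_stack([np.ones_like(X), X])   # design matrix with intercept column
    a_q = np.array([1.0, x_query])
    # Gaussian weights: nearby points dominate the local fit.
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    # Solve (A^T W A) theta = A^T W y for the local parameters theta.
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return a_q @ theta

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100)
y = np.sin(X) + rng.normal(scale=0.1, size=X.shape)

print(lwr_predict(5.0, X, y, tau=0.5))   # small tau: very local fit
print(lwr_predict(5.0, X, y, tau=5.0))   # large tau: close to an ordinary global line
```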
Case-Based Learning (CBL):
The case-based format encourages active learning and demonstrates how to apply
theoretical concepts to practice.
1. Preprocessor: This prepares the input for processing, e.g., normalizing the
ranges of numeric-valued features to ensure that they are treated with equal
importance by the similarity function, formatting the raw input into a set of
cases, etc.
2. Similarity: This function assesses the similarity of a given case with the
previously stored cases in the concept description. Assessments may range from
explicit encoding to dynamic computation; most practical CBL similarity functions
are a compromise along the continuum between these two extremes.
3. Prediction: This function takes the similarity assessments as input and
generates a prediction for the value of the given case's goal feature, i.e., a
classification when the goal feature is symbolic-valued.
4. Memory Updating: This updates the stored case base, e.g., by modifying or
abstracting previously stored cases, forgetting cases presumed to be noisy,
or updating a feature's relevance-weight settings.
Case-based learning cycle with different schemes of CBL (a short code sketch of
the cycle is given after this list):
1. Case retrieval: After the problem situation has been assessed, the best-matching
case is searched for in the case base and an approximate solution is retrieved.
2. Case adaptation: The retrieved solution is adapted to better fit the new
problem.
3. Solution evaluation: The adapted solution can be evaluated either before the
solution is applied to the problem or after the solution has been applied. In any
case, if the accomplished result is not satisfactory, the retrieved solution must be
adapted again or more cases should be retrieved.
4. Case-based updating: If the solution was verified as correct, the new case
may be added to the case base.
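A minimal sketch of this cycle in Python: the case representation, the similarity function and the sample cases are hypothetical simplifications, and the adaptation and evaluation steps are reduced to placeholders.

```python
# Minimal sketch: retrieve -> adapt -> evaluate -> retain (pure Python).
import math

case_base = [
    # Each case: (problem features, solution / goal-feature value)
    ((1.0, 0.2), "A"),
    ((0.9, 0.4), "A"),
    ((-1.0, 0.8), "B"),
]

def similarity(p, q):
    """Similarity: inverse of the Euclidean distance between two feature vectors."""
    return 1.0 / (1.0 + math.dist(p, q))

def retrieve(problem):
    """1. Case retrieval: best-matching stored case for the new problem."""
    return max(case_base, key=lambda case: similarity(case[0], problem))

def adapt(retrieved_solution, problem):
    """2. Case adaptation: here a trivial copy; real systems modify the solution."""
    return retrieved_solution

def solve(problem, verify):
    best_features, best_solution = retrieve(problem)
    solution = adapt(best_solution, problem)
    # 3. Solution evaluation: 'verify' stands in for checking the applied solution.
    if verify(problem, solution):
        # 4. Case-based updating: retain the verified case for future problems.
        case_base.append((problem, solution))
    return solution

# Example: accept every solution (a stand-in evaluation step).
print(solve((0.8, 0.3), verify=lambda p, s: True))   # -> "A", and the case is retained
```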