Chapter 03
Neural Networks:
Overview: Consist of layers of neurons that can learn complex representations of data.
Nonlinearity: Layers with nonlinear activation functions (like ReLU, Sigmoid) allow
modeling very complex relationships.
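As a hedged illustration of this point, the short Python sketch below fits a small network with ReLU hidden layers to a dataset that is not linearly separable; the dataset, layer sizes, and other settings are assumptions chosen for the example, not prescriptions.

```python
# Minimal sketch: a small neural network with nonlinear (ReLU) hidden layers
# fit to a dataset whose classes cannot be separated by a straight line.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers with ReLU activations let the model learn a curved boundary.
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```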
Decision Trees:
Overview: Splits the data based on feature values to make predictions.
Nonlinearity: Creates complex, piecewise constant decision boundaries that adapt
to intricate data patterns.
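As a hedged sketch of this piecewise-constant behaviour, the example below fits a shallow regression tree to a smooth curve; the data and the depth are assumptions for illustration only.

```python
# Minimal sketch: a depth-limited regression tree approximates a smooth curve
# with a piecewise constant function (one constant prediction per leaf).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# Predictions are constant within each leaf's interval of X.
print(np.unique(tree.predict(X)).size, "distinct predicted values (one per leaf)")
```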
Random Forests:
Overview: An ensemble of decision trees, each trained on different subsets of the
data.
Nonlinearity: Combines multiple nonlinear trees to create a more robust model with
improved generalization ability.
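A minimal sketch of the ensemble idea (the synthetic dataset and settings are assumptions): it compares the cross-validated accuracy of a single tree against a forest of 200 trees; the forest usually, though not always, scores higher.

```python
# Minimal sketch: compare one decision tree against a random forest,
# which averages many trees trained on bootstrap samples of the data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree CV accuracy:", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```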
Advantages:
Model Complex Relationships: Capable of capturing complex patterns in data.
High Accuracy: Often achieve higher accuracy compared to linear models, especially
in real-world applications with complex data.
Disadvantages:
Computationally Intensive: Training nonlinear models can be resource-intensive.
Risk of Overfitting: More prone to overfitting, especially if not properly regularized.
Decision Tree
• Decision Tree is a Supervised learning technique
• It can be used for both classification and regression problems, but it is mostly preferred for classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
• In a decision tree, there are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes represent the outcomes of those decisions and do not contain any further branches.
• In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
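The chapter names CART as the tree-building algorithm; scikit-learn's DecisionTreeClassifier implements an optimized version of CART, so a minimal example looks like the sketch below (the Iris data and parameters are assumptions for illustration only).

```python
# Minimal sketch: fit a CART-style classification tree and make predictions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Each prediction follows yes/no answers to threshold questions down to a leaf.
print(clf.predict(iris.data[:2]))
```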
Why use Decision Trees?
• Decision trees usually mimic human thinking while making a decision, so they are easy to interpret.
• The logic of a decision tree can be easily understood because it is shown as a tree-like structure.
Decision Tree Terminologies
Root Node: The node from which the decision tree starts. It represents the entire dataset, which is then divided into two or more homogeneous sets.
Leaf Node: A final output node; the tree cannot be split any further once a leaf node is reached.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub-Tree: A subtree formed by splitting a node of the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes.
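To see these terms on a concrete tree, the hedged sketch below prints a small fitted tree with export_text: the first split is the root node, the indented threshold lines are decision nodes, and the lines ending in "class: ..." are leaf nodes. The dataset and depth are assumptions.

```python
# Minimal sketch: print a fitted tree so root, decision, and leaf nodes are visible.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))
```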
How does the Decision Tree Algorithm Work?
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets, one for each possible value of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified any further; such a final node is called a leaf node.
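A hedged sketch of Steps 1-5 for categorical features is given below. It uses Gini impurity as the ASM (one of the two measures listed in the next section); the helper names and the toy dataset are assumptions for illustration, not the chapter's own code.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Step-2: choose the attribute whose split gives the lowest weighted impurity.
    def weighted_impurity(a):
        total = 0.0
        for v in set(r[a] for r in rows):
            idx = [i for i, r in enumerate(rows) if r[a] == v]
            total += len(idx) / len(rows) * gini([labels[i] for i in idx])
        return total
    return min(attributes, key=weighted_impurity)

def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attributes remain: this is a leaf node.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)            # Step-2
    node = {"attribute": a, "children": {}}                 # Step-4
    for v in set(r[a] for r in rows):                       # Step-3: one subset per value
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        node["children"][v] = build_tree(                   # Step-5: recurse on the subset
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [x for x in attributes if x != a])
    return node

# Step-1: start from the complete (toy, assumed) dataset at the root.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain",  "windy": "no"}, {"outlook": "rain",  "windy": "yes"}]
labels = ["yes", "no", "yes", "no"]
print(build_tree(rows, labels, ["outlook", "windy"]))
```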
Attribute Selection Measures:
• While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes.
• To solve this problem, we use a technique called an Attribute Selection Measure (ASM).
• Popular ASM techniques are Information Gain and the Gini Index.
1. Information Gain:
• Information gain measures the change in entropy after a dataset is segmented on an attribute.
• It calculates how much information a feature provides us about a class.
• According to the value of information gain, we split the node and build the decision
tree.
• A decision tree algorithm always tries to split on the attribute with the maximum information gain.
• Information gain is a measure of this change in entropy and can be calculated using the formula below.
• Suppose S is the set of instances (the whole dataset),
• A is an attribute,
• v is an individual value that the attribute A can take, and Values(A) is the set of all possible values of A,
• Sv is the subset of S for which attribute A has the value v, then
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
where Entropy(S) = − Σ_c p_c · log2(p_c), and p_c is the proportion of instances in S that belong to class c.
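A minimal sketch of this formula in Python (the helper names and toy labels are assumptions): it subtracts the weighted entropy of each subset Sv from the entropy of the whole set. With 6 "yes" / 4 "no" labels split 4/6 by the attribute, the gain works out to roughly 0.42.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum over classes of p_c * log2(p_c).
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    # Gain(S, A) = Entropy(S) - sum over v of (|Sv| / |S|) * Entropy(Sv),
    # where attribute_values[i] is the value of attribute A for instance i.
    n = len(labels)
    gain = entropy(labels)
    for v in set(attribute_values):
        subset = [labels[i] for i in range(n) if attribute_values[i] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy, assumed data: 6 "yes" / 4 "no" labels; the attribute splits them 4 / 6.
labels = ["yes"] * 6 + ["no"] * 4
values = ["sunny"] * 4 + ["rain"] * 6
print(round(information_gain(labels, values), 3))  # about 0.42
```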