Chapter 03
Neural Networks:
Overview: Consist of layers of neurons that can learn complex representations of data.
Nonlinearity: Layers with nonlinear activation functions (like ReLU, Sigmoid) allow
modeling very complex relationships.
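As a hedged illustration of this point, the short Python sketch below fits a small network with ReLU hidden layers to a dataset that is not linearly separable; the dataset, layer sizes, and other settings are assumptions chosen for the example, not prescriptions.

```python
# Minimal sketch: a small neural network with nonlinear (ReLU) hidden layers
# fit to a dataset whose classes cannot be separated by a straight line.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers with ReLU activations let the model learn a curved boundary.
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```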
Decision Trees:
Overview: Splits the data based on feature values to make predictions.
Nonlinearity: Creates complex, piecewise constant decision boundaries that adapt
to intricate data patterns.
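As a hedged sketch of this piecewise-constant behaviour, the example below fits a shallow regression tree to a smooth curve; the data and the depth are assumptions for illustration only.

```python
# Minimal sketch: a depth-limited regression tree approximates a smooth curve
# with a piecewise constant function (one constant prediction per leaf).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# Predictions are constant within each leaf's interval of X.
print(np.unique(tree.predict(X)).size, "distinct predicted values (one per leaf)")
```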
Random Forests:
Overview: An ensemble of decision trees, each trained on different subsets of the
data.
Nonlinearity: Combines multiple nonlinear trees to create a more robust model with
improved generalization ability.
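A minimal sketch of the ensemble idea (the synthetic dataset and settings are assumptions): it compares the cross-validated accuracy of a single tree against a forest of 200 trees; the forest usually, though not always, scores higher.

```python
# Minimal sketch: compare one decision tree against a random forest,
# which averages many trees trained on bootstrap samples of the data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree CV accuracy:", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```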
Advantages:
Model Complex Relationships: Capable of capturing complex patterns in data.
High Accuracy: Often achieve higher accuracy compared to linear models, especially
in real-world applications with complex data.
Disadvantages:
Computationally Intensive: Training nonlinear models can be resource-intensive.
Risk of Overfitting: More prone to overfitting, especially if not properly regularized.
Decision Tree
• Decision Tree is a Supervised learning technique
• It can be used for both classification and regression problems, but it is mostly preferred for classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
• In a decision tree, there are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes represent the outcomes of those decisions and do not contain any further branches.
• In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
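The chapter names CART as the tree-building algorithm; scikit-learn's DecisionTreeClassifier implements an optimized version of CART, so a minimal example looks like the sketch below (the Iris data and parameters are assumptions for illustration only).

```python
# Minimal sketch: fit a CART-style classification tree and make predictions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Each prediction follows yes/no answers to threshold questions down to a leaf.
print(clf.predict(iris.data[:2]))
```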
Why use Decision Trees?
• Decision trees usually mimic human thinking while making a decision, so they are easy to interpret.
• The logic of a decision tree can be easily understood because it is shown as a tree-like structure.
Decision Tree Terminologies
Root Node: The node from which the decision tree starts. It represents the entire dataset, which is then divided into two or more homogeneous sets.
Leaf Node: A final output node; the tree cannot be split any further once a leaf node is reached.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub-Tree: A subtree formed by splitting a node of the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes.
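To see these terms on a concrete tree, the hedged sketch below prints a small fitted tree with export_text: the first split is the root node, the indented threshold lines are decision nodes, and the lines ending in "class: ..." are leaf nodes. The dataset and depth are assumptions.

```python
# Minimal sketch: print a fitted tree so root, decision, and leaf nodes are visible.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))
```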
How does the Decision Tree Algorithm Work?
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets, one for each possible value of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified any further; such a final node is called a leaf node.
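A hedged sketch of Steps 1-5 for categorical features is given below. It uses Gini impurity as the ASM (one of the two measures listed in the next section); the helper names and the toy dataset are assumptions for illustration, not the chapter's own code.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Step-2: choose the attribute whose split gives the lowest weighted impurity.
    def weighted_impurity(a):
        total = 0.0
        for v in set(r[a] for r in rows):
            idx = [i for i, r in enumerate(rows) if r[a] == v]
            total += len(idx) / len(rows) * gini([labels[i] for i in idx])
        return total
    return min(attributes, key=weighted_impurity)

def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attributes remain: this is a leaf node.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)            # Step-2
    node = {"attribute": a, "children": {}}                 # Step-4
    for v in set(r[a] for r in rows):                       # Step-3: one subset per value
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        node["children"][v] = build_tree(                   # Step-5: recurse on the subset
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [x for x in attributes if x != a])
    return node

# Step-1: start from the complete (toy, assumed) dataset at the root.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain",  "windy": "no"}, {"outlook": "rain",  "windy": "yes"}]
labels = ["yes", "no", "yes", "no"]
print(build_tree(rows, labels, ["outlook", "windy"]))
```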
Attribute Selection Measures:
• While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes.
• To solve this problem, we use a technique called an Attribute Selection Measure (ASM).
• Popular ASM techniques are Information Gain and the Gini Index.
1. Information Gain:
• Information gain measures the change in entropy after a dataset is segmented on an attribute.
• It calculates how much information a feature provides us about a class.
• According to the value of information gain, we split the node and build the decision
tree.
• A decision tree algorithm always tries to split on the attribute with the maximum information gain.
• Information gain is a measure of this change in entropy and can be calculated using the formula below.
• Suppose S is the set of instances (the whole dataset),
• A is an attribute,
• v is an individual value that the attribute A can take, and Values(A) is the set of all possible values of A,
• Sv is the subset of S for which attribute A has the value v, then
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
where Entropy(S) = − Σ_c p_c · log2(p_c), and p_c is the proportion of instances in S that belong to class c.
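A minimal sketch of this formula in Python (the helper names and toy labels are assumptions): it subtracts the weighted entropy of each subset Sv from the entropy of the whole set. With 6 "yes" / 4 "no" labels split 4/6 by the attribute, the gain works out to roughly 0.42.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum over classes of p_c * log2(p_c).
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    # Gain(S, A) = Entropy(S) - sum over v of (|Sv| / |S|) * Entropy(Sv),
    # where attribute_values[i] is the value of attribute A for instance i.
    n = len(labels)
    gain = entropy(labels)
    for v in set(attribute_values):
        subset = [labels[i] for i in range(n) if attribute_values[i] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy, assumed data: 6 "yes" / 4 "no" labels; the attribute splits them 4 / 6.
labels = ["yes"] * 6 + ["no"] * 4
values = ["sunny"] * 4 + ["rain"] * 6
print(round(information_gain(labels, values), 3))  # about 0.42
```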