
UNIT - III: Learning with Trees – Decision Trees – Constructing Decision Trees – Classification and Regression Trees – Ensemble Learning – Boosting – Bagging – Different Ways to Combine Classifiers – Basic Statistics – Gaussian Mixture Models – Nearest Neighbor Methods – Unsupervised Learning – K-means Algorithms

Decision Tree
A decision tree is a simple diagram that shows different choices and their possible results, helping you make decisions easily.

Decision Trees are a type of Supervised Machine Learning where the data is continuously
split according to a certain parameter.

✔ The tree can be explained by two entities, namely decision nodes and leaves.

✔ The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.

Understanding Decision Tree

A decision tree is a graphical representation of different options for solving a problem and shows how different factors are related.

 Root Node: The starting point that represents the entire dataset.

 Branches: The lines that connect nodes, showing the flow from one decision to another.

 Internal Nodes: Points where decisions are made based on the input features.

 Leaf Nodes: The terminal nodes at the end of branches that represent final outcomes or predictions.

Now, let’s take an example to understand the decision tree. Imagine you want to decide whether to drink coffee based on the time of day and how tired you feel. First, the tree checks the time of day. If it is morning, it asks whether you are tired; if you are, it suggests drinking coffee, and if not, it says there is no need. Similarly, in the afternoon the tree again asks whether you are tired; if you are, it recommends drinking coffee, and if not, it concludes no coffee is needed.
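To make the structure concrete, here is the coffee example written as plain if/else logic; the function name and inputs are illustrative, not part of the unit text:

```python
# The coffee example as plain if/else logic: the first test is the root node,
# the tiredness tests are internal nodes, and each return is a leaf.
def should_drink_coffee(time_of_day: str, tired: bool) -> str:
    if time_of_day in ("morning", "afternoon"):    # root node: split on time of day
        if tired:                                  # internal node: split on tiredness
            return "drink coffee"                  # leaf
        return "no coffee needed"                  # leaf
    return "no coffee needed"                      # leaf for any other time of day

print(should_drink_coffee("morning", tired=True))     # drink coffee
print(should_drink_coffee("afternoon", tired=False))  # no coffee needed
```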

Classification of Decision Tree

We have mainly two types of decision trees, based on the nature of the target variable: classification trees and regression trees.

 Classification trees: Classification is used when you want to categorize data into
different classes or groups. For example, classifying emails as "spam" or "not spam"
or predicting whether a patient has a certain disease based on their symptoms.

 Regression trees: Regression handles continuous values (e.g., price, temperature). Regression trees predict a continuous value based on the input data and are used when you want to predict numbers such as income or height.
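As a minimal illustration of the two tree types, the sketch below fits one classification tree and one regression tree with scikit-learn; the library choice, datasets, and parameters are assumptions for demonstration, not specified by the unit:

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: predicts a discrete class (iris species).
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))

# Regression tree: predicts a continuous value (disease progression score).
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("regression R^2:", reg.score(X_te, y_te))
```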

Decision Trees: Advantages, Disadvantages, and Applications

 Advantages:

 Easy to Understand: Works like a flowchart, making decision-making clear.

 Versatile: Suitable for both classification and regression tasks.

 No Need for Scaling: No data normalization required.

 Handles Non-Linear Relationships: Captures complex patterns effectively.

 Disadvantages:

 Overfitting: Can become too complex and perform poorly on new data.
 Instability: Small data changes can lead to big variations in predictions.

 Bias Toward Many-Level Features: Might focus too much on features with many
categories, reducing accuracy.

 Applications:

 Bank Loan Approval: Uses customer details (income, credit score, etc.) to decide
loan approval.

 Medical Diagnosis: Helps predict diseases like diabetes based on test results.

 Student Exam Predictions: Identifies at-risk students.

Constructing Decision Trees
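Constructing a decision tree is usually done greedily: at each node, every candidate feature and threshold is tried, and the split that most reduces impurity (for example, the Gini impurity used by CART) is kept; the children are then split the same way until the leaves are pure or a stopping rule is met. The sketch below shows a single split search on a toy dataset; the helper names and the data are illustrative assumptions, not code from this unit.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Try every feature and threshold; keep the split with the lowest
    # weighted impurity of the two child nodes.
    best_feature, best_threshold, best_score = None, None, float("inf")
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_feature, best_threshold, best_score = j, t, score
    return best_feature, best_threshold, best_score

# Toy dataset: two features, binary labels that are separable on feature 0.
X = np.array([[1, 5], [2, 4], [3, 7], [6, 2], [7, 3], [8, 1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(X, y))  # splitting feature 0 at threshold 3 gives pure children
```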


Ensemble Learning
Ensemble learning is a method where we use many small models instead of just one. Each of these models may not be very strong on its own, but when we put their results together, we get a better and more accurate answer. It is like asking a group of people for advice instead of just one person: each one might be a little wrong, but together they usually give a better answer.

 In machine learning, ensemble learning is a technique that combines multiple weak learners to create a strong learner.

 The idea is that a group of weak learners can perform better than any single weak learner, for example by taking a majority vote over their predictions, as the sketch below illustrates.
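A minimal sketch of this idea, assuming scikit-learn and a toy dataset: three different weak models are combined by a hard majority vote, which usually scores at least as well as the individual members.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three different (individually fairly weak) models...
members = [("stump", DecisionTreeClassifier(max_depth=1)),
           ("logreg", LogisticRegression(max_iter=1000)),
           ("nb", GaussianNB())]
for name, model in members:
    print(name, round(model.fit(X_tr, y_tr).score(X_te, y_te), 3))

# ...combined by majority ("hard") voting into one ensemble.
ensemble = VotingClassifier(estimators=members, voting="hard").fit(X_tr, y_tr)
print("ensemble", round(ensemble.score(X_te, y_te), 3))
```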

Types of Ensemble Learning in Machine Learning

There are three main types of ensemble methods:

1. Bagging (Bootstrap Aggregating):
Models are trained independently on different random subsets of the training data. Their results are then combined, usually by averaging (for regression) or voting (for classification). This helps reduce variance and prevents overfitting.

2. Boosting:
Models are trained one after another. Each new model focuses on fixing the errors
made by the previous ones. The final prediction is a weighted combination of all
models, which helps reduce bias and improve accuracy.

3. Stacking (Stacked Generalization):
Multiple different models (often of different types) are trained, and their predictions are used as inputs to a final model, called a meta-model. The meta-model learns how to best combine the predictions of the base models, aiming for better performance than any individual model.
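A small stacking sketch with scikit-learn's StackingClassifier; the base models, meta-model, and toy data are illustrative choices, not prescribed by the unit:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base models of different types; their predictions become the input
# features of the meta-model (the final_estimator).
base_models = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
               ("svc", SVC(random_state=0))]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())  # meta-model
stack.fit(X_tr, y_tr)
print("stacking accuracy:", stack.score(X_te, y_te))
```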

Boosting in Machine Learning


Boosting is an ensemble learning technique that sequentially combines multiple weak classifiers to create a strong classifier. A first model is trained on the training data and evaluated; the next model is then built to correct the errors made by the previous one. This procedure continues, adding models until either the complete training dataset is predicted correctly or a predefined number of iterations is reached.

Think of how a teacher in a class focuses more on weaker students to improve their academic performance; boosting works in a similar way.

 Weak Learners and Strong Learners in Boosting


 It is common in ensemble learning to describe models as weak or strong learners. A weak learner is a model that performs only slightly better than random guessing. The most commonly used weak learner is the decision tree, because limiting the tree’s depth during construction controls how weak it is.

 Strong learners have higher prediction accuracy. Boosting converts a collection of weak learners into a single strong learning system: a strong learner is a model that overcomes the weaknesses and errors of the weak models to give better predictions.

(In everyday English, to boost also means to increase or improve something, as in "boosted him up over the fence.")

AdaBoost(adaptive boosting)

 AdaBoost, short for Adaptive Boosting, is an ensemble machine learning algorithm that can be used in a wide variety of classification and regression tasks.

 It is a supervised learning algorithm that is used to classify data by combining multiple weak or base learners (e.g., decision trees) into a strong learner.

 AdaBoost works by weighting the instances in the training dataset based on the
accuracy of previous classifications.


AdaBoost is a boosting technique that initially assigns equal weights to all training samples and then iteratively adjusts these weights, focusing more on misclassified data points for the next model. It effectively reduces bias and variance, making it useful for classification tasks, but it can be sensitive to noisy data and outliers.
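A minimal AdaBoost sketch with scikit-learn, using depth-1 decision trees ("stumps") as the weak learners; the dataset and hyperparameters are illustrative, and older scikit-learn versions call the `estimator` argument `base_estimator`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each weak learner is a decision stump; AdaBoost reweights the samples
# after every round so the next stump focuses on the misclassified points.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X_tr, y_tr)
print("AdaBoost accuracy:", ada.score(X_te, y_te))
```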

Types Of Boosting Algorithms

There are several types of boosting algorithms; some of the most well-known and useful models are:

1. Gradient Boosting: It constructs models in a sequential manner, where each weak learner minimizes the residual error of the previous one using gradient descent. Instead of adjusting sample weights like AdaBoost, Gradient Boosting reduces error directly by optimizing a loss function.

2. XGBoost: It is an optimized version of Gradient Boosting that uses regularization to prevent overfitting. It is faster, more efficient, and supports handling both numerical and categorical variables.

3. CatBoost: It is particularly effective for datasets with categorical features. It employs symmetric decision trees and a unique encoding method that considers target values, making it superior in handling categorical data without preprocessing.
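The three variants above share the same fit/predict workflow. The sketch below uses scikit-learn's GradientBoostingClassifier; the XGBoost and CatBoost lines are left commented out because they require the separate `xgboost` and `catboost` packages, and all datasets and settings here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Gradient Boosting: each new tree fits the residual errors of the ensemble so far.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
gb.fit(X_tr, y_tr)
print("Gradient Boosting accuracy:", gb.score(X_te, y_te))

# XGBoost and CatBoost follow the same pattern (uncomment if the packages are installed):
# from xgboost import XGBClassifier
# print(XGBClassifier(n_estimators=100).fit(X_tr, y_tr).score(X_te, y_te))
# from catboost import CatBoostClassifier
# print(CatBoostClassifier(iterations=100, verbose=0).fit(X_tr, y_tr).score(X_te, y_te))
```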

ALGORITHM (AdaBoost)

1. Initialise the dataset and assign equal weight to each data point.

2. Provide this as input to the model and identify the wrongly classified data points.

3. Increase the weights of the wrongly classified data points and decrease the weights of the correctly classified data points. Then normalize the weights of all data points.

4. If the required results have been obtained, go to step 5; otherwise, go to step 2.

5. End
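The steps above can be sketched directly in code. This is an illustrative implementation of the weight-update loop using decision stumps as weak learners and the standard AdaBoost update rule, which is an assumption about the intended details; labels are taken to be -1/+1:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    """y must contain labels -1 and +1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # step 1: equal weight per data point
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # step 2: train on the weighted data
        miss = stump.predict(X) != y           # wrongly classified points
        err = w[miss].sum()                    # weighted error (weights sum to 1)
        if err == 0 or err >= 0.5:             # perfect or useless learner: stop
            learners.append(stump); alphas.append(1.0)
            break
        alpha = 0.5 * np.log((1 - err) / err)  # this learner's say in the final vote
        # step 3: raise weights of misses, lower weights of hits, then normalize
        w *= np.exp(np.where(miss, alpha, -alpha))
        w /= w.sum()
        learners.append(stump); alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Weighted vote of all weak learners.
    votes = sum(a * stump.predict(X) for stump, a in zip(learners, alphas))
    return np.sign(votes)
```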

BAGGING
Bagging can be used for both regression and classification tasks.

 Bootstrap Sampling: Randomly selects subsets of the data with replacement to create diverse training samples.

 Base Model Training: Trains multiple weak models independently on different subsets.

 Parallel Learning: Each model learns simultaneously, improving efficiency.

 Prediction Aggregation: Combines predictions using majority voting (classification) or averaging (regression).

 Out-of-Bag (OOB) Evaluation: Uses leftover data to estimate performance without extra validation.

 Final Prediction: Merges all model outputs to make a strong, stable final prediction.

Bagging helps reduce variance, avoid overfitting, and boost accuracy in machine learning.
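A minimal bagging sketch with scikit-learn's BaggingClassifier, including the out-of-bag estimate described above; the dataset and settings are illustrative, and older scikit-learn versions call the `estimator` argument `base_estimator`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(estimator=DecisionTreeClassifier(),  # base weak model
                        n_estimators=50,     # 50 bootstrap samples -> 50 models
                        bootstrap=True,      # sample rows with replacement
                        oob_score=True,      # evaluate on each model's left-out rows
                        random_state=0)
bag.fit(X_tr, y_tr)
print("OOB estimate :", bag.oob_score_)
print("test accuracy:", bag.score(X_te, y_te))
```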

Bootstrap aggregating

Bagging, short for bootstrap aggregating, is a machine learning ensemble method designed to improve the stability and accuracy of algorithms.

It is the simplest way of combining predictions that belong to the same type of model.

Bagging aims to decrease variance, not bias.

Each model receives equal weight.

Each model is built independently.

Different training data subsets are selected using row sampling with replacement and
random sampling methods from the entire training dataset.
Bagging tries to solve the over-fitting problem.
If the classifier is unstable (high variance), then apply bagging.

In bagging, the base classifiers are trained in parallel.

Example: The Random Forest model uses bagging.
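A random forest is essentially bagging over decision trees plus a random subset of features considered at each split; a minimal scikit-learn sketch, with illustrative data and settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagged trees + random feature selection ("sqrt" of the features tried per split).
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print("random forest accuracy:", rf.score(X_te, y_te))
```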
