Random Forest
Contents
• What is Random Forest?
• Building RF in Scikit-learn
What is Random Forest?
• Random Forest is a supervised learning algorithm capable of performing both regression and classification tasks.
Ensemble method
• Use multiple learning algorithms to obtain better predictions.
• Train several different models and aggregate their predictions to improve stability and predictive power.
Bagging
• The idea behind bagging is to combine the results of multiple models (for instance, many decision trees) to get a generalized result.
• Bagging uses a sampling technique called bootstrapping.
• Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement.
• Bagging (Bootstrap Aggregating) uses these subsets (bags) to get a fair idea of the distribution of the complete set.
• The size of each subset created for bagging may be the same as or smaller than the original set, as the sketch below illustrates.
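To make bootstrapping concrete, here is a minimal sketch (the use of NumPy and a toy dataset of 10 observations are illustrative assumptions, not part of the slides):

import numpy as np

rng = np.random.default_rng(seed=42)
data = np.arange(10)  # toy "dataset" of 10 observations

# Each bag is the same size as the original and is drawn with
# replacement, so some observations repeat and others are left out.
for i in range(3):
    bag = rng.choice(data, size=len(data), replace=True)
    print(f"bag {i}: {sorted(bag)}")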
Bagging
• Multiple subsets are created from the original dataset, selecting
observations with replacement.
Bagging
• A base model (weak model) is created on each of these subsets.
• The models run in parallel and are independent of each other.
• The final predictions are determined by combining the predictions from all the models, as in the sketch below.
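A minimal sketch of this idea using scikit-learn's BaggingClassifier (toy data and a recent scikit-learn, where the base-model parameter is named estimator, are assumptions here):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # toy data

# 25 independent trees, each fit on a bootstrap sample of X;
# predictions are combined by majority vote. n_jobs=-1 fits in parallel.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=25,
    bootstrap=True,
    n_jobs=-1,
    random_state=0,
)
bagging.fit(X, y)
print(bagging.predict(X[:5]))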
How does Random Forest work?
• RF consists of multiple decision trees which act as base learners.
• Each decision tree is given a random subset of samples from the dataset (hence the name random).
• Random Forest then trains each base learner (i.e., a decision tree) on a different sample of the data, and the sampling of data points happens with replacement (see the sketch below).
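A minimal sketch with scikit-learn (the iris data is an illustrative assumption):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the
# training data (sampling with replacement, as described above).
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))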
Example
• Consider a training dataset: [X1, X2, X3, … X10, Y].
• Random Forest will create decision trees, each taking its input from a subset drawn using bagging, as described above.
Hyper-Parameters of Random Forest
• Optimization of RF depends on a few built-in parameters.
Parameters of Random Forest
• max_depth is the maximum depth of each tree. The deeper the tree, the more splits it has and the more information it captures about the data, as the sketch below illustrates.
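A small sketch of the effect of max_depth (the iris data and the chosen depths are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# A larger max_depth allows more splits; compare the depth actually
# reached by the first tree in each forest. None grows trees fully.
for depth in (2, 5, None):
    rf = RandomForestClassifier(max_depth=depth, random_state=0).fit(X, y)
    print(depth, rf.estimators_[0].get_depth())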
Cross-Validation (CV)
• Cross-validation is a statistical method used to estimate the performance of
machine learning models.
• Normally, we split the data into train and test sets.
• In K-fold CV, the training data is further split into K subsets, called folds.
Cross-Validation (CV)
• We then iteratively fit the model K times, each time training on K-1 of the folds and evaluating on the remaining fold (called the validation data).
• For example, suppose the training data is split into 5 folds (K = 5).
• 1st iteration: train on the first four folds and evaluate on the fifth.
• 2nd iteration: train on the first, second, third, and fifth folds and evaluate on the fourth, and so on (see the sketch below).
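A minimal sketch using scikit-learn's cross_val_score (the iris data is an illustrative assumption):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# cv=5 fits the model 5 times: each fold serves once as the
# validation data while the other 4 folds are used for training.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(scores, scores.mean())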
Cross-Validation (CV)
• 5-Fold Cross-Validation:
• If we have 10 sets of hyperparameters and are using 5-fold CV, that represents 10 × 5 = 50 training loops.
GridSearchCV
• Grid search is used to find the hyperparameters of a model that result in the most ‘accurate’ predictions.
• The first step is to create a dictionary of all the parameters and the corresponding sets of values you want to test for best performance, as in the sketch below.
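A minimal sketch with scikit-learn's GridSearchCV (the grid values are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Step 1: a dictionary of parameters and the candidate values to test.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

# 6 parameter combinations x 5 folds = 30 training loops.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)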
Pros & Cons
Pros:
• Random Forest is less prone to overfitting than a single decision tree.
• The same Random Forest algorithm can be used for both classification and regression tasks.
• Random Forest can be used to identify the most important features in the training dataset, which helps in feature engineering (see the sketch after this list).
Cons:
• Random Forest is difficult to interpret. Because it averages the results of many trees, it is hard to figure out why a random forest is making the predictions it does.
• Random Forest takes longer to train and is computationally expensive compared to a single decision tree.
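A minimal sketch of the feature-importance point from the pros above (the iris data is an illustrative assumption):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# Impurity-based importances, one per feature, summing to 1.
for name, score in zip(data.feature_names, rf.feature_importances_):
    print(f"{name}: {score:.3f}")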