Random Forest Summary - Rashmi

The document discusses random forest, an algorithm that constructs multiple decision trees and outputs the majority vote of the trees. It works by bootstrapping the data to create training subsets, building decision trees on each subset, then aggregating the trees' votes or averages. Random forest reduces overfitting, has good accuracy, and estimates missing data, though it requires more memory and computation than a single decision tree.

Fall 2023

9/29/2023

Random Forest
Introduction
A random forest is constructed from multiple decision trees, and the final decision is obtained by a majority vote of those trees. Its main advantages are:

- Low variance: it combines the results of multiple decision trees, and each tree is trained on a limited subset of the data (its own bootstrap sample).
- Reduced overfitting: it uses bootstrap aggregation, combining the trees' results by majority vote for classification problems and by the mean for regression problems.
- No normalization needed: it works on a rule-based (threshold) approach, so feature scaling does not affect the splits.
- Good accuracy: it runs efficiently even on large databases.
- Missing data: it can also estimate missing data.
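As a quick illustration of how a random forest is used in practice, here is a minimal sketch with scikit-learn; the toy dataset and the parameter values (n_estimators, random_state) are illustrative assumptions, not something specified in this summary.

```python
# A minimal sketch of training a random forest classifier with scikit-learn.
# The toy dataset and parameter values are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No feature normalization is needed: each tree splits on raw thresholds.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```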

Working of the Random Forest Algorithm


The algorithm works in five steps.

1. Bootstrapping (random sampling with replacement): it creates multiple subsets of the training data by randomly selecting samples from the original dataset with replacement.
2. Building decision trees: for each subset of the data, a decision tree is trained independently. At each node of the tree, only a random subset of features is considered for splitting, which introduces diversity among the individual trees and helps prevent overfitting.
3. Voting (classification) or averaging (regression): after growing a forest of decision trees, the random forest aggregates the predictions of all individual trees when making predictions. For classification tasks, each tree "votes" for a class, and the class with the most votes becomes the final prediction (majority voting). For regression tasks, the predictions of all trees are averaged to produce the final prediction.
4. Reducing overfitting: the random selection of data points and features at each step introduces randomness and reduces the risk of overfitting.
5. Estimating error: since each tree is trained on a different subset of the data, the samples not included in a particular tree's bootstrap sample (its "out-of-bag" samples) can be used to estimate its prediction accuracy.
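To make these steps concrete, here is a rough from-scratch sketch, assuming scikit-learn's DecisionTreeClassifier as the base learner; the function names and parameter values are illustrative choices, not a canonical implementation.

```python
# A rough from-scratch sketch of the five steps above; assumes integer
# class labels and scikit-learn's DecisionTreeClassifier as base learner.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    n = len(X)
    trees, oob_masks = [], []
    for _ in range(n_trees):
        # Step 1: bootstrapping -- draw n row indices with replacement.
        idx = rng.integers(0, n, size=n)
        # Step 2: train one tree per bootstrap sample; max_features="sqrt"
        # makes each split consider only a random subset of features.
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
        # Rows never drawn are "out-of-bag" for this tree (used in step 5).
        oob = np.ones(n, dtype=bool)
        oob[idx] = False
        oob_masks.append(oob)
    return trees, oob_masks

def predict_majority(trees, X):
    # Step 3 (classification): every tree votes; the most common class wins.
    votes = np.stack([t.predict(X) for t in trees]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```

The out-of-bag masks returned by fit_forest can then be used to score each tree on the rows it never saw during training, which is how the internal error estimate in step 5 works without a separate validation set.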
However, random forest still has some disadvantages: it needs more training time because many decision trees must be built, and it requires more memory and is computationally more expensive than a single decision tree.
Bootstrapping (Random Sampling with Replacement):

- Random forest starts by creating multiple subsets of the training data through a process called bootstrapping. It randomly selects samples from the original dataset with replacement, which means that some data points may be repeated in a subset, while others may be omitted.
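A tiny numpy sketch of what sampling with replacement looks like; the data values are made up for illustration:

```python
# Illustration of bootstrapping: some values repeat, others are omitted.
import numpy as np

rng = np.random.default_rng(1)
data = np.array([10, 20, 30, 40, 50])

# Draw a bootstrap sample the same size as the original dataset.
sample = rng.choice(data, size=len(data), replace=True)
print("Bootstrap sample:", sample)                  # some values repeat
print("Omitted (out-of-bag):", np.setdiff1d(data, sample))
```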
