Learn Random Forest Using Excel
https://www.newtechdojo.com/learn-random-forest-using-excel/ 1/9
10/18/21, 6:45 PM [1200+ Share] What is Random Forest & Learn Random Forest using Excel
You can’t control the inside functionality aside from changing the input
values
You have probably guessed the answer already, having learned about decision
trees. Yes, just as a forest is a collection of trees, a random forest is a
collection of decision trees. Decision trees that are grown very deep often overfit
the training data, so their predictions vary widely even with a small change in the
input data. They are sensitive to the specific data on which they are trained, so
they are error-prone on test data sets. A random forest grows many such decision
trees and combines their outputs, taking the mode for classification or the average
for regression, and thus reduces the variance. The different trees are trained on
different parts of the training dataset. To classify a new object from an input
vector, pass the input vector down each of the trees in the forest. Each tree gives
a classification, and the forest chooses the classification with the most votes (or,
for regression, the average over all the trees in the forest).
Fig: Working of Random Forest. The final classification value is the average (or mode) of the many
decision trees.
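The combining step described above can be sketched in a few lines. This is a minimal illustration, not the worksheet's calculation: the function names `forest_classify` and `forest_regress` are hypothetical, and each tree's output is assumed to be already computed.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Return the class predicted by the most trees (the mode)."""
    # most_common(1) gives [(label, count)] for the most frequent label.
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Return the average of the trees' numeric predictions."""
    return mean(tree_outputs)
```

With an even number of trees a classification vote can tie; here the label seen first among the tied ones wins, which is an arbitrary choice made for this sketch.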
has more error as our sample is very small. But we can improve the estimate by
taking a large number of random samples of the data with replacement and taking
the average of the means of the subsamples. Bootstrap sampling also de-correlates
the trees by giving each one a different training set, since training many trees on
a single training set gives strongly correlated trees.
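The resampling idea above can be sketched as follows. The data values, the function name `bootstrap_mean`, and the number of resamples are all assumptions made for illustration; none of them come from the article's worksheet.

```python
import random
import statistics


def bootstrap_mean(data, n_resamples=1000, seed=0):
    """Average the means of many resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(n_resamples):
        # Draw len(data) items with replacement: some repeat, some are left out.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    return statistics.mean(means)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical small sample
estimate = bootstrap_mean(data)
```

In a random forest the same kind of resample is used as the training set for one tree, so each tree sees a slightly different version of the data.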
If all of the previous material seems daunting to you, worry not. Let us look at a
very simple example to see what it all means.
From the scatter plot we can see that we can’t cleanly partition the points into two
halves (as we did in the decision tree). So the idea here is to train multiple trees
and then take the mean (or mode) of all their predictions.
Let’s see what each tree would predict for the case (8, 6).
Model 1 predicts 0
Model 2 predicts 1
Model 3 predicts 1
Using Model 1 alone gives us the wrong answer. But if we take the majority of the
predictions of all three, we get the right answer. Let’s look at another example,
say (9, 17).
Model 1 predicts 0
Model 2 predicts 1
Model 3 predicts 0
In this case, Model 2 predicts the wrong class. But again, taking the majority, we
get the correct answer. We see that an individual tree can fail for some cases, but
in these examples the combination (forest) gives the correct answer. This is the
idea of random forests: combining the predictions of multiple trees.
(Please refer to the section on decision trees and the Excel worksheet for the
detailed calculation of each tree.)
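The two votes above can be tallied in code. The per-model predictions are the ones quoted in the text; the variable names are hypothetical, and the trees themselves are not rebuilt here.

```python
from collections import Counter

# Predictions of Model 1, Model 2 and Model 3 for each query point,
# as listed in the text.
predictions = {
    (8, 6): [0, 1, 1],
    (9, 17): [0, 1, 0],
}

# Majority vote of the three models for each point.
forest_answer = {
    point: Counter(votes).most_common(1)[0][0]
    for point, votes in predictions.items()
}

# Models (numbered from 1) that are outvoted on each point.
outvoted = {
    point: [i + 1 for i, vote in enumerate(votes) if vote != forest_answer[point]]
    for point, votes in predictions.items()
}

print(forest_answer)  # {(8, 6): 1, (9, 17): 0}
print(outvoted)       # {(8, 6): [1], (9, 17): [2]}
```

The outvoted models match the text: Model 1 is wrong for (8, 6) and Model 2 is wrong for (9, 17), yet the majority is right both times.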
3. The decision/regression trees (let’s say a function Fi) are trained on the different
samples. Each tree is grown to the largest extent possible, without pruning.
If it is a classification problem, the majority vote of the predictions of all the trees is taken.
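Putting the pieces together, the procedure can be sketched as a toy random forest for classification. This is an illustrative sketch under stated assumptions, not the worksheet's method: the Gini split criterion, every function name, and the eight data points are choices made here. Class labels are assumed to be plain values such as 0 and 1 (not tuples), because tuples are used to represent internal tree nodes.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(rows, labels, m, rng):
    """Grow an unpruned tree, trying m randomly chosen features per split."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: stop and return the class
    best = None
    for f in rng.sample(range(len(rows[0])), m):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        # No usable split on the chosen features: fall back to the majority.
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left], m, rng),
            grow_tree([rows[i] for i in right], [labels[i] for i in right], m, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees, m, seed=0):
    """Train each tree on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated toy data: two features, two classes.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels, n_trees=15, m=1)
```

Calling `predict_forest(forest, (0, 0))` and `predict_forest(forest, (10, 10))` then asks the forest about one point on each side of the toy data. For a regression forest the leaves would hold numbers and the final step would take the average instead of the vote.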
Number of features?
The number of features to consider at each split point (m) must be supplied to the
algorithm. The following formula is used to calculate the recommended value of m.
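As a point of reference, a commonly cited rule of thumb (stated here as an assumption, which may differ from the formula the article intended) is m = sqrt(p) for classification and m = p/3 for regression, where p is the total number of input features. The function name `default_m` is hypothetical.

```python
import math


def default_m(p, task="classification"):
    """Rule-of-thumb number of features to try at each split.

    Assumption: sqrt(p) for classification, p/3 for regression,
    rounded down and never less than 1.
    """
    if task == "classification":
        return max(1, math.isqrt(p))
    if task == "regression":
        return max(1, p // 3)
    raise ValueError("task must be 'classification' or 'regression'")
```

For example, with p = 9 features this gives m = 3 in both cases. In practice m is usually treated as a tuning parameter and checked against held-out data.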
Advantages of random forests?
As we mentioned earlier, a single decision tree tends to overfit the data. The
process of averaging or combining the results of different decision trees helps
to overcome the problem of overfitting.
Random forests also have less variance than a single decision tree. This means
that they work correctly for a larger range of data items than a single decision tree.
Random forests are very flexible and often achieve high accuracy.
They also require little preparation of the input data. For example, you do not
have to scale the data.
They can also maintain accuracy when a large proportion of the data is
missing.
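The variance claim above can be illustrated with a small simulation. This sketch assumes each "tree" is just the true value plus independent random noise, which is a simplification: real trees in a forest are partly correlated, so the reduction is smaller in practice. All names and numbers here are hypothetical.

```python
import random
import statistics


def noisy_tree(true_value, rng):
    """Stand-in for one tree: the true value plus unit-variance noise."""
    return true_value + rng.gauss(0.0, 1.0)


def simulate(n_trees, n_trials=2000, seed=0):
    """Compare the variance of one tree with the variance of the average."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(n_trials):
        outputs = [noisy_tree(5.0, rng) for _ in range(n_trees)]
        single.append(outputs[0])                  # prediction of one tree
        averaged.append(statistics.mean(outputs))  # prediction of the forest
    return statistics.pvariance(single), statistics.pvariance(averaged)


var_single, var_forest = simulate(n_trees=25)
```

Under the independence assumption, the averaged prediction's variance is roughly the single-tree variance divided by the number of trees.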
Disadvantages of random forests?
The main disadvantage of random forests is their complexity. They are much
harder and more time-consuming to construct than decision trees.
They also require more computational resources and are less intuitive.
When you have a large collection of decision trees, it is hard to get an
intuitive grasp of the relationships in the input data.