Learn Random Forest Using Excel

Random forest is a machine learning algorithm that uses ensemble learning. It creates multiple decision trees during training and outputs the class that is the mode of the classes or mean prediction of individual trees. Each tree is grown using a random subset of features and a random subset of samples. This reduces correlation between trees and improves accuracy. Random forests can handle both classification and regression problems, are robust to outliers and missing data, and do not require data normalization.

Uploaded by

kPrasad8

10/18/21, 6:45 PM [1200+ Share] What is Random Forest & Learn Random Forest using Excel

Learn Random Forest using Excel


December 27, 2017 NewTechDojo Machine Learning

Learn Random Forest using Excel - Machine Learning Algorithm


A beginner's guide to one of the best-known and best-understood algorithms in
statistics and machine learning. In this post, you will learn what the Random Forest
algorithm is, how it works using Excel, where it is applied, and its pros and
cons.

Quick facts about Random Forest

https://www.newtechdojo.com/learn-random-forest-using-excel/

A random forest consists of a collection of decision trees, each built with some randomness

A random subset of the training data is provided to each decision tree

Bagging, or bootstrap aggregating, is used. It’s a general procedure that can be used to reduce the variance of algorithms that have high variance

Generally works better for classification than for regression, since averaged tree predictions cannot go beyond the range of target values seen in training

You have little control over what happens inside the model, aside from changing the input
values

Maintains accuracy, even when data is missing

Can handle large datasets with a large number of attributes


What is a Random Forest?


You have probably guessed the answer already if you have learned about decision
trees. Just as a forest is a collection of trees, a random forest is a
collection of decision trees. Decision trees that are grown very deep often overfit
the training data, so their predictions vary widely with even a small change in the
input data. Because they are sensitive to the specific data on which they are trained,
they are error-prone on test data sets. A random forest grows many such decision trees
and combines their outputs, taking the mode of the predicted classes for classification
or the average of the predictions for regression, and thus reduces the variance. The
different trees are trained on different parts of the training dataset. To classify a
new object from an input vector, pass the input vector down each of the trees in the
forest. Each tree gives a classification, and the forest chooses the classification
with the most votes (or, for regression, the average of all the trees' outputs).

Fig: Working of Random Forest. The final classification value is the average (or mode) of the many
decision trees.

How are trees grown in a Random Forest?


The training algorithm for random forests applies the general technique of
bootstrap aggregating, or bagging, to tree learners. To see what bootstrap means,
suppose we have a sample of 50 data values. A mean calculated directly from such a
small sample is a noisy estimate. The bootstrap takes a large number of random
samples of the data with replacement, computes the mean of each, and averages
those means; the spread of the resampled means also shows how uncertain the
estimate is. For a simple statistic like the mean the average changes little, but for
high-variance learners such as deep decision trees, averaging over bootstrap samples
reduces the variance considerably. Bootstrap sampling also de-correlates the trees by
giving each one a different training set, since training many trees on a single
training set gives strongly correlated trees.
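
The bootstrap idea described above can be tried out in a few lines of Python. This is only a minimal sketch: the 50 data values are randomly generated for illustration (they are not from the Excel worksheet), and the function name `bootstrap_means` and the choice of 1000 resamples are assumptions.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n values *with replacement* from the original sample.
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    return means

# Hypothetical sample of 50 values (made up for illustration).
data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 3.0) for _ in range(50)]

means = bootstrap_means(sample)
print("sample mean:            ", round(statistics.fmean(sample), 3))
print("mean of bootstrap means:", round(statistics.fmean(means), 3))
print("std of bootstrap means: ", round(statistics.stdev(means), 3))
```

The mean of the bootstrap means stays close to the plain sample mean, while the standard deviation of the bootstrap means gives a feel for how much the estimate would move from one sample to another.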

Random Forest is the same as the original bagging algorithm, with one difference.
It extends bagging by also randomizing the way each tree is learned: at every
split, the tree may choose only from a random subset of the features rather than
from all of them. This reduces the correlation among the trees, because a few strong
features can no longer dominate the top splits of every tree.
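
Why lower correlation matters can be seen from a standard result for averages. As an illustration, assume the B trees' predictions each have variance σ² and pairwise correlation ρ (these two symbols are assumptions for this note, not notation from the post):

```latex
\operatorname{Var}\left( \frac{1}{B} \sum_{i=1}^{B} F_i(x) \right)
  = \rho \, \sigma^2 + \frac{1 - \rho}{B} \, \sigma^2
```

Growing more trees shrinks only the second term. The first term depends on the correlation ρ, so making the trees less alike is what lowers the floor on the variance of the forest.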

If the previous material seems daunting, worry not. Let us look at a
very simple example to see what it all means.


Fig: Data and Scatterplot

From the scatter plot we can see that a single split cannot separate the two classes
(as it did in the decision tree example). So, the idea here is to train multiple trees
and then take the mean (or mode) of all the predictions.

Let us take three tree splits as follows:


Model 1: X1<9
Model 2: X1<6
Model 3: X2>9

Let’s see what each tree would predict for the case (8, 6).

Model 1 predicts 0

Model 2 predicts 1

Model 3 predicts 1


Using Model 1 alone gives us the wrong answer. But if we take the majority of the
predictions of all three, we get the right answer. Let’s look at another example,
say (9, 17):

Model 1 predicts 0
Model 2 predicts 1
Model 3 predicts 0

In this case, Model 2 predicts wrong. But again, taking the majority gives the correct
answer. We see that each of the trees fails for some cases, but in these examples the
combination (the forest) gives the correct answer. This is the idea of random forests:
combining the predictions of multiple trees.
(Please refer to the section on decision trees and the Excel worksheet for the
detailed calculation of each tree.)
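
The majority vote used in the two cases above is easy to reproduce. The sketch below takes the per-tree predictions exactly as listed in the example and only does the voting; the function name `majority_vote` is an assumption, and the trees themselves are not re-implemented here.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common class label among the trees' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Per-tree predictions copied from the worked example
# (order: Model 1, Model 2, Model 3).
votes_case_a = [0, 1, 1]  # case (8, 6)
votes_case_b = [0, 1, 0]  # case (9, 17)

print(majority_vote(votes_case_a))  # 1
print(majority_vote(votes_case_b))  # 0
```

With an odd number of trees and two classes, a tie cannot occur, which is one reason small illustrative forests often use three or five trees.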

Let us summarize the steps in classification or regression using Random forests.


Suppose we have a training set X = {x1, x2, …, xn} with class labels (values).

1. For each of the B trees, we sample at random with replacement from the original
data (a bootstrap sample). This sample functions as the training set for growing
that tree.

2. If there are M input variables, a number m<<M is specified such that at each
node, m variables are selected at random out of the M and the best split on
these m is used to split the node. The value of m is held constant while the
forest is grown.

3. Each decision/regression tree (let’s say a function Fi) is trained on its own
bootstrap sample. Each tree is grown to the largest extent possible, without pruning.
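
The three steps above can be sketched in plain Python. This is a minimal illustration rather than the worksheet's method: the grid data set, the function names and the Gini-based split score are assumptions, and each "tree" is only a depth-one stump instead of a fully grown tree, so that the sketch stays short.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(X, y, feature_ids):
    """Best single split (feature, threshold, left label, right label)
    among the allowed features, or None if no valid split exists."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] < t]
            right = [lab for row, lab in zip(X, y) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return None if best is None else best[1:]

def train_forest(X, y, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n, M = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (n draws with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Step 2: only m of the M features may be used for the split.
        features = rng.sample(range(M), m)
        # Step 3 (simplified): fit one stump instead of a full tree.
        stump = best_stump(Xb, yb, features)
        if stump is not None:
            forest.append(stump)
    return forest

def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = [left if row[f] < t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical data: a 9 x 9 grid, class 1 when x1 + x2 > 10.
X = [(a, b) for a in range(1, 10) for b in range(1, 10)]
y = [1 if a + b > 10 else 0 for a, b in X]

forest = train_forest(X, y)
print(predict(forest, (9, 9)), predict(forest, (1, 1)))
```

A real random forest would repeat step 2 at every node of a deep tree; here there is only one node per tree, so the random feature subset is drawn once per tree.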

Prediction in Random Forest


If it is a regression problem, then the prediction for a test sample xt is made by
taking the mean of the predictions of all of the trees.

If it is a classification problem, then the majority vote of the predictions of all the
trees is taken.
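
In symbols, with B trees F1, …, FB and a test sample xt as above, the two rules can be written as follows (the hat notation, the class index c and the indicator function are notational assumptions added here):

```latex
\hat{y}_{\text{reg}}(x_t) = \frac{1}{B} \sum_{i=1}^{B} F_i(x_t),
\qquad
\hat{y}_{\text{cls}}(x_t) = \arg\max_{c} \sum_{i=1}^{B} \mathbf{1}\left[ F_i(x_t) = c \right]
```

The first expression is the mean of the tree outputs; the second counts the votes for each class c and picks the class with the most votes.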

How many trees does a Random Forest need to train?


The number of trees B (one bootstrap sample per tree) typically ranges from a few
hundred to several thousand, depending on the size and nature of the training set. It
can also be found using cross-validation, or by observing the out-of-bag error. The
out-of-bag error is the mean prediction error on each training sample xᵢ, using only
the trees that did not have xᵢ in their bootstrap sample.
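
The out-of-bag bookkeeping can be sketched as follows. Everything specific here is an assumption made for illustration: the one-feature toy data, the function names, and the use of a simple threshold rule (`fit_stump`) in place of a real decision tree.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Toy 1-D learner: pick the threshold t with the fewest training errors,
    predicting 0 below t and 1 at or above it."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def oob_error(xs, ys, n_trees=50, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    oob_votes = [[] for _ in range(n)]  # one list of votes per training sample
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # this tree never saw sample i
                oob_votes[i].append(int(xs[i] >= t))
    # Score each sample using only the trees for which it was out of bag.
    scored = [(Counter(v).most_common(1)[0][0], label)
              for v, label in zip(oob_votes, ys) if v]
    return sum(pred != label for pred, label in scored) / len(scored)

# Hypothetical data: 20 points on a line, class 1 from x = 10 upwards.
xs = list(range(20))
ys = [0] * 10 + [1] * 10

err = oob_error(xs, ys)
print("out-of-bag error:", err)
```

Because every sample is scored only by trees that did not train on it, the out-of-bag error behaves like a built-in validation estimate and needs no separate hold-out set.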

Number of features?
The number of features considered at each split point (m) must be supplied to the
algorithm. The following rules of thumb give the recommended value of m, where p is
the total number of features (called M above).

1. For classification problems with p features, m = √p.

2. For regression problems, m = p/3.
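
As a quick check of the two rules, the sketch below computes m for a given p. The function name `default_m` is hypothetical, and rounding down to a whole number (with a minimum of 1) is an assumption, since the post gives the formulas without saying how to round.

```python
import math

def default_m(p, task):
    """Rule-of-thumb number of features to try at each split."""
    if task == "classification":
        return max(1, math.isqrt(p))  # floor of the square root of p
    if task == "regression":
        return max(1, p // 3)         # floor of p / 3
    raise ValueError("task must be 'classification' or 'regression'")

print(default_m(16, "classification"))  # 4
print(default_m(30, "regression"))      # 10
```

These values are starting points; m can also be tuned with cross-validation or the out-of-bag error, in the same way as the number of trees.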

Why are Random Forests used?


Random forests are very widely used because they have some very desirable
properties. First of all, they reduce the overfitting problem that plagues single
decision trees. They are among the most accurate general-purpose algorithms on many
tabular datasets and can run on very large datasets. They also have an effective
method for estimating missing data and can maintain accuracy when large chunks of
the data are missing.

Pros and Cons of Random Forest:


Pros:
As we mentioned earlier a single decision tree tends to overfit the data. The
process of averaging or combining the results of different decision trees helps
to overcome the problem of overfitting.

Random forest also has less variance than a single decision tree. It means that
it works correctly for a larger range of data items than a single decision tree.

Random forests are very flexible and usually have high accuracy.

They also require little preparation of the input data. You do not have to
scale the data.

It also maintains accuracy even when a large proportion of the data are
missing.

Cons:
The main disadvantage of Random forests is their complexity. They are much
harder and more time-consuming to construct than decision trees.

They also require more computational resources and are less intuitive.
When you have a large collection of decision trees it is hard to have an
intuitive grasp of the relationships existing in the input data.

In addition, the prediction process using random forests is more time-consuming
than with many other algorithms, since every tree has to be evaluated.
