How The Random Forest Algorithm Works in Machine Learning
The random forest algorithm can be used for both classification and
regression problems.
The same algorithm for both classification and regression? You might be thinking I am
kidding. But the truth is, yes, we can use the same random forest algorithm for both
classification and regression.
https://dataaspirant.com/random-forest-algorithm-machine-learing/ 1/11
3/12/25, 8:15 PM How the random forest algorithm works in machine learning
Excited? I had the same feeling when I first heard about this advantage of the random
forest algorithm: the same algorithm can be used for both regression and
classification problems.
In this article, you are going to learn how the random forest algorithm works in
machine learning for the classification task. In the next article, you can learn
how the random forest algorithm can be used for regression.
Get a cup of coffee before you begin, as this is going to be a long article 😛
In general, the more trees in a forest, the more robust the forest looks. In the same
way, in the random forest classifier, a higher number of trees generally gives more
stable and more accurate results, although the gain levels off after a point.
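To get a feel for why more trees can help, here is a rough sketch. It assumes a two-class problem where every tree is right with the same probability and the trees make their mistakes independently. Real trees in a forest are correlated, so treat the numbers as an optimistic illustration and not as a property of the algorithm. The function name is made up for this example.

```python
from math import comb

def majority_vote_accuracy(n_trees: int, p_correct: float) -> float:
    """Probability that more than half of n_trees independent voters are right.

    Assumes two classes, an odd n_trees (so no ties), and that each tree
    is correct with probability p_correct independently of the others.
    """
    needed = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )

# Each tree alone is right 60% of the time in this illustration.
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the accuracy of the vote climbs as trees are added, which matches the intuition of a more robust forest.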
If you know the decision tree algorithm, you might be wondering: are we creating more
decision trees, and how can we create more of them, given that the node selection
calculations will be the same for the same dataset?
Yes, you are right. To model more decision trees to create the forest, you are not
going to apply the information gain or Gini index approach to the same full dataset
every time.
If you are not aware of the concepts of the decision tree classifier, please spend some
time on the articles below, as you need to know how the decision tree classifier works
before learning how the random forest algorithm works. If you would like to learn the
implementation of the decision tree classifier, you can check it out in the articles below.
If you are new to the concept of the decision tree, here is a basic overview of the
decision tree.
The decision tree concept is close to a rule-based system. Given a training dataset
with targets and features, the decision tree algorithm comes up with a set of rules.
The same set of rules can then be used to perform predictions on the test dataset.
Suppose you would like to predict whether your daughter will like a newly
released animation movie or not. To model the decision tree, you will use a training
dataset such as the animated cartoon characters your daughter liked in past movies.
You pass this dataset, with the target being whether your daughter liked the movie or
not, to the decision tree classifier. The decision tree will start building the rules
with the characters your daughter likes as nodes and the targets (like or not) as the
leaf nodes. By following the path from the root node to a leaf node, you get a rule.
A simple rule could be: if some character x is playing the leading role, then your
daughter will like the movie. You can think of a few more rules based on this example.
Then, to predict whether your daughter will like the newly released movie or not, you
just need to check it against the rules created by the decision tree.
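Here is a small sketch of what such a set of rules could look like in code. The tree, the feature names (`lead_is_favorite_character`, `has_songs`) and the movie are all made up for illustration. A real decision tree would learn its rules from the training dataset.

```python
# Hypothetical tree learned from past movies: inner nodes ask a yes/no
# question about a feature, leaves hold the predicted target.
tree = {
    "feature": "lead_is_favorite_character",
    "yes": {"leaf": "like"},
    "no": {
        "feature": "has_songs",
        "yes": {"leaf": "like"},
        "no": {"leaf": "not like"},
    },
}

def predict(node: dict, movie: dict) -> str:
    """Follow the rules from the root node down to a leaf node."""
    while "leaf" not in node:
        branch = "yes" if movie[node["feature"]] else "no"
        node = node[branch]
    return node["leaf"]

new_movie = {"lead_is_favorite_character": False, "has_songs": True}
print(predict(tree, new_movie))
```

Each path from the root to a leaf is one rule, so this toy tree holds three rules.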
In the decision tree algorithm, calculating these nodes and forming the rules happens
using the information gain and Gini index calculations.
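If you have not seen these two calculations before, the sketch below shows the usual textbook definitions. The helper names and the tiny label lists are only for illustration.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = ["like"] * 4 + ["not like"] * 4
left, right = ["like"] * 4, ["not like"] * 4  # a split that separates the classes
print(gini(parent), information_gain(parent, left, right))
```

A split is better when the daughter nodes are purer than the parent, that is, when the Gini impurity goes down or the information gain goes up.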
In a random forest algorithm, instead of searching all the features and all the
observations for the root node, each tree works on a randomly selected subset of them.
The best split is still calculated, but only among the randomly selected features. We
will look at this in detail in the coming section.
Next, you are going to learn why to use the random forest algorithm when we have other
classification algorithms to play with.
To address why, here are the advantages of the random forest algorithm.
The same random forest algorithm, or random forest classifier, can be used for both
classification and regression tasks.
The random forest classifier can handle missing values (depending on the implementation).
When we have more trees in the forest, the random forest classifier is less likely to
overfit the model.
The random forest classifier can also be modeled for categorical values.
We will discuss these advantages in the random forest algorithm advantages section of
this article. Until then, think through the above advantages of the random forest
algorithm compared to other classification algorithms.
Before you dive into the technical details of the random forest algorithm, let’s look
at a real-life example to understand the algorithm in layman’s terms.
Suppose Mady somehow got 2 weeks’ leave from his office. He wants to spend his 2
weeks traveling to a different place, and he wants it to be a place he will like.
So he decided to ask his best friend about the places he may like. His friend started
asking about his past trips, with questions like: you have visited place X, did you
like it?
Based on the answers given by Mady, his best friend started recommending places
Mady may like. Here his best friend formed a decision tree from the answers given by Mady.
As his best friend may recommend his own favorite place to Mady, the model will be
biased by the closeness of their friendship. So Mady decided to ask a few more friends
to recommend the best place he may like.
Now his friends asked some random questions, and each one recommended one place to
Mady. Mady then chose the place with the most votes from his friends as the final
place to visit.
In Mady’s trip planning above, two interesting algorithms were used: the decision tree
algorithm and the random forest algorithm. I hope you have spotted them already. Anyhow,
I would like to highlight them again.
Decision Tree:
To recommend the best place for Mady, his best friend asked some questions. Based on
the answers given by Mady, he recommended a place. This is the decision tree algorithm
approach. Let me explain why.
Mady’s friend used the answers given by Mady to create rules. Later he used those
rules to recommend the best place Mady will like. A rule could be, for example, that
Mady likes a place with lots of trees or waterfalls.
In the above approach, Mady’s best friend is the decision tree. The vote (recommended
place) is the leaf of the decision tree (target class). The target is finalized by a
single person, or in technical terms, by a single decision tree.
In the other case, Mady asked his friends to recommend the best place to visit. Each
friend asked him different questions and came up with their own recommended place.
Later Mady considered all the recommendations and counted the votes. The votes are
simply a way to pick the most popular place among the places recommended by his friends.
Mady considers each recommended place, and if the same place is recommended by
another friend, he increases its count. In the end, the place with the highest count
is where Mady will go.
In this case, the recommended place (target prediction) is decided by many friends.
Each friend is a tree, and all the friends combined form the forest. This forest is a
random forest, as each friend asked random questions to recommend the best place to visit.
Now let’s use the above example to understand how the random forest algorithm works.
Let’s look at the pseudocode for the random forest algorithm, and later we can walk
through each step of the algorithm.
The pseudocode for the random forest algorithm can be split into two stages: creating
the random forest, and performing prediction with the created random forest.
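1. Randomly select “k” features from the total “m” features, where k << m.
2. Among the “k” features, calculate the node “d” using the best split
point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 to 3 until “l” number of nodes has been reached.
5. Build the forest by repeating steps 1 to 4 “n” times to create “n”
trees.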
The random forest algorithm starts by randomly selecting “k” features out of the
total “m” features. Both the features and the observations are taken randomly.
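The random selection can be sketched as follows. Drawing the observations with replacement (a bootstrap sample) is an assumption taken from the usual random forest recipe, since the article does not say how the observations are drawn. The function name and the sizes are made up for this example.

```python
import random

def sample_for_tree(n_rows, n_features, k, rng):
    """Pick the row indices and feature indices one tree gets to see.

    Rows are drawn with replacement (assumed bootstrap sampling);
    features are drawn without replacement, k out of n_features.
    """
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    feature_idx = rng.sample(range(n_features), k)
    return row_idx, feature_idx

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows, feats = sample_for_tree(n_rows=10, n_features=9, k=3, rng=rng)
print(rows, feats)
```

Because every tree sees a different sample, the trees end up with different rules even though they come from the same dataset.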
In the next stage, we use the randomly selected “k” features to find the root node
using the best split approach.
After that, we calculate the daughter nodes using the same best split approach. We
repeat the first 3 stages until we form a tree with a root node and the targets as
the leaf nodes.
Finally, we repeat stages 1 to 4 to create “n” randomly created trees. These randomly
created trees form the random forest.
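Putting the stages together, a minimal sketch of forest creation could look like the code below. It is written in plain Python for readability and rests on several assumptions: the best split is scored with Gini impurity, the rows of each tree are a bootstrap sample, and a `max_depth` limit stands in for the “l” number of nodes. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_idx):
    """Best (score, feature, threshold) among the given features, or None."""
    best = None
    for f in feature_idx:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, k, max_depth, rng):
    majority = Counter(y).most_common(1)[0][0]
    # Stop on a pure node or when the depth budget is used up.
    if len(set(y)) == 1 or max_depth == 0:
        return {"leaf": majority}
    features = rng.sample(range(len(X[0])), k)  # k random features
    split = best_split(X, y, features)          # best split among them
    if split is None:
        return {"leaf": majority}
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {                                    # daughter nodes
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           k, max_depth - 1, rng),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            k, max_depth - 1, rng),
    }

def build_forest(X, y, n_trees, k, max_depth=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):  # one randomly created tree per round
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap rows
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 k, max_depth, rng))
    return forest

X = [[1, 5], [2, 4], [3, 7], [8, 1], [9, 3], [7, 2]]
y = ["a", "a", "a", "b", "b", "b"]
forest = build_forest(X, y, n_trees=5, k=1)
print(len(forest))
```

In practice you would reach for a tested library implementation. The point of the sketch is only to show where the randomness enters and where the best split is still calculated.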
To perform prediction, the trained random forest algorithm uses the below
pseudocode.
1. Take the test features and use the rules of each randomly created
decision tree to predict the outcome, and store the predicted outcome
(target).
2. Calculate the votes for each predicted target.
3. Consider the highest-voted predicted target as the final prediction from
the random forest algorithm.
To perform the prediction using the trained random forest algorithm, we need to pass
the test features through the rules of each randomly created tree. Suppose we formed
100 random decision trees to form the random forest.
Each random tree may predict a different target (outcome) for the same test feature.
Then the votes are calculated for each predicted target. Suppose the 100 random
decision trees predict 3 unique targets x, y, z. Then the votes of x are nothing but
the number of trees, out of the 100, whose prediction is x.
Likewise for the other 2 targets (y, z). Suppose x gets the most votes: let’s say that
out of the 100 random decision trees, 60 trees predict that the target will be x. Then
the random forest returns x as the final predicted target.
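The voting itself is only a few lines of code. In the sketch below the trees are stand-in functions, not trained trees, and the 25 and 15 votes for y and z are made-up numbers chosen only so that the 100 votes add up.

```python
from collections import Counter

def forest_predict(trees, features):
    """Return the most-voted target and its share of the votes.

    `trees` is any list of callables mapping features to a predicted
    target; they stand in for the trained random decision trees.
    """
    votes = Counter(tree(features) for tree in trees)
    target, count = votes.most_common(1)[0]
    return target, count / len(trees)

# Stand-in forest: 60 trees say x, 25 say y, 15 say z (y and z counts assumed).
trees = [lambda f: "x"] * 60 + [lambda f: "y"] * 25 + [lambda f: "z"] * 15
print(forest_predict(trees, features={"any": "test row"}))
```

The share of votes can also be read as a rough confidence of the forest in its prediction.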
The random forest algorithm is used in a wide variety of applications. In this article,
we are going to address a few of them.
Below are some of the applications where the random forest algorithm is widely used.
1. Banking
2. Medicine
3. Stock Market
4. E-commerce
1.Banking:
In the banking sector, the random forest algorithm is widely used in two main
applications: finding loyal customers and finding fraudulent customers.
A loyal customer is not only a customer who pays well, but also one who can take a
large loan and pay the loan interest to the bank properly. As the growth of the bank
depends heavily on loyal customers, the bank’s customer data is analyzed in depth to
find the pattern of a loyal customer based on the customer details.
In the same way, there is a need to identify the customers who are not profitable for
the bank, such as those who take a loan and do not pay the loan interest properly, or
to find outlier customers. If the bank can identify these kinds of customers before
giving them a loan, the bank gets a chance to decline the loan. In this case too, the
random forest algorithm is used to identify the customers who are not profitable for
the bank.
2.Medicine
In the medical field, the random forest algorithm is used to identify the correct
combination of components to validate a medicine. The random forest algorithm is also
helpful for identifying a disease by analyzing the patient’s medical records.
3.Stock Market
In the stock market, the random forest algorithm is used to identify a stock’s behavior
as well as the expected loss or profit from purchasing a particular stock.
4.E-commerce
In e-commerce, the random forest is used only in a small segment of the recommendation
engine, for identifying the likelihood of customers liking the recommended products.
Running a random forest algorithm on a very large dataset can demand high-end hardware
such as GPU systems. If you do not have a GPU system, you can always run the machine
learning models on a cloud-hosted desktop. You can use the clouddesktoponline platform
to run high-end machine learning models from any corner of the world.
Advantages of random forest algorithm
Below are the advantages of random forest algorithm compared with other classification
algorithms.
Overfitting is much less of a problem when we use the random forest algorithm in a
classification problem, because the votes of many trees are combined.
The same random forest algorithm can be used for both classification and regression
tasks.
The random forest algorithm can be used for feature selection.
This means identifying the most important features out of the available
features in the training dataset.
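One simple way to measure how important a feature is to a trained model is permutation importance: shuffle one feature column and see how much the accuracy drops. This is a swapped-in technique for illustration, not necessarily the measure the article or any particular library uses. The model below is a hypothetical stand-in for a trained forest.

```python
import random

def permutation_importance(predict, X, y, feature, rng):
    """Drop in accuracy after shuffling one feature column."""
    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    column = [row[feature] for row in X]
    rng.shuffle(column)  # break the link between this feature and the target
    shuffled = [row[:feature] + [v] + row[feature + 1:]
                for row, v in zip(X, column)]
    return accuracy(X) - accuracy(shuffled)

# Hypothetical trained model that only looks at feature 0.
model = lambda row: "b" if row[0] > 5 else "a"
X = [[1, 5], [2, 4], [3, 7], [8, 1], [9, 3], [7, 2]]
y = ["a", "a", "a", "b", "b", "b"]
rng = random.Random(0)
for f in range(2):
    print(f, permutation_importance(model, X, y, f, rng))
```

A feature the model never uses, like feature 1 here, shows no drop at all, while shuffling a feature the model relies on can only hurt this model’s accuracy.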
I hope you like this post. If you have any questions, then feel free to comment below. If
you want me to write on one particular topic, then do tell it to me in the comments below.