Random Forest
In machine learning, a random forest is a classifier that consists of many decision trees and outputs
the class that is the mode of the classes output by individual trees. The algorithm for inducing a
random forest was developed by Leo Breiman and Adele Cutler, and "Random Forests" is their
trademark. The term came from random decision forests, which were first proposed by Tin Kam Ho of
Bell Labs in 1995. The method combines Breiman's "bagging" idea and Ho's "random subspace
method" to construct a collection of decision trees with controlled variation.
Learning algorithm
1. Let the number of training cases be N, and the number of variables in the classifier be M.
2. We are told the number m of input variables to be used to determine the decision at a node of
the tree; m should be much less than M.
3. Choose a training set for this tree by choosing N times with replacement from all N available
training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the
tree, by predicting their classes.
4. For each node of the tree, randomly choose m variables on which to base the decision at that
node. Calculate the best split based on these m variables in the training set.
5. Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
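The steps above map directly onto a short training loop. The following is a minimal illustrative sketch in Python, not the reference implementation: it draws a bootstrap sample for each tree and delegates the per-node random choice of m variables to scikit-learn's max_features parameter; the dataset and parameter values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def train_random_forest(X, y, n_trees=100, m_features="sqrt", seed=0):
    """Minimal random-forest trainer following steps 1-5 above."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Step 3: bootstrap sample - draw N cases with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # Steps 4-5: each node considers only m randomly chosen variables
        # (max_features) and the tree is grown fully, without pruning.
        tree = DecisionTreeClassifier(max_features=m_features,
                                      random_state=int(rng.integers(1 << 30)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    """Output the class that is the mode of the per-tree predictions."""
    votes = np.stack([t.predict(X) for t in trees])        # (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = train_random_forest(X, y)
print("training accuracy:", (predict_forest(forest, X) == y).mean())
```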
Interview Questions
2. Both being tree-based algorithms, how is random forest different from the gradient boosting
algorithm (GBM)?
Answer: The fundamental difference is that random forest uses the bagging technique to make
predictions, while GBM uses boosting. In bagging, the data set is divided into n samples using
randomized sampling with replacement. Then, using a single learning algorithm, a model is built on
each sample. The resulting predictions are combined using voting or averaging, and the models can
be trained in parallel. In boosting, after the first round of predictions, the algorithm weighs
misclassified predictions higher so that they can be corrected in the succeeding round. This
sequential process of giving higher weights to misclassified predictions continues until a stopping
criterion is reached. Random forest improves model accuracy mainly by reducing variance; the trees
grown are decorrelated to maximize the decrease in variance. GBM, on the other hand, improves
accuracy by reducing both bias and variance.
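As a concrete illustration of that difference, the sketch below fits scikit-learn's bagging-style RandomForestClassifier and boosting-style GradientBoostingClassifier on the same toy data; the dataset and parameter values are arbitrary choices for demonstration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=25, n_informative=10,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagging: deep, decorrelated trees grown independently, predictions combined by voting.
rf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)

# Boosting: shallow trees fit sequentially, each one correcting its predecessors.
gbm = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                 learning_rate=0.1, random_state=42).fit(X_tr, y_tr)

print("Random Forest test accuracy:", rf.score(X_te, y_te))
print("GBM test accuracy          :", gbm.score(X_te, y_te))
```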
3. Since RF can handle non-linearity but can't provide coefficients, would it be wise to use random
forest to gather the most important features and then plug those features into a multiple linear
regression model?
4. How is the feature importance of a random forest computed?
Answer: The usual way to compute the feature importance values of a single tree is as follows:
1. Initialise an array feature_importances of zeros, with one entry per feature.
2. Traverse the tree: for each internal node that splits on feature i, compute the error reduction of
that node multiplied by the number of samples that were routed to the node, and add this quantity
to feature_importances[i]. For a forest, these per-tree importances are then averaged over all trees.
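That traversal can be written compactly against scikit-learn's fitted tree structure. The sketch below is an illustrative re-implementation (the library already exposes the same quantity as feature_importances_); the attribute names such as tree_.weighted_n_node_samples are scikit-learn's, and the dataset is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def tree_feature_importances(tree, n_features):
    """Sum, over every internal node, the sample-weighted impurity
    reduction achieved by the split, credited to the split feature."""
    t = tree.tree_
    importances = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                       # leaf node: no split, nothing to add
            continue
        n = t.weighted_n_node_samples
        reduction = (n[node] * t.impurity[node]
                     - n[left] * t.impurity[left]
                     - n[right] * t.impurity[right])
        importances[t.feature[node]] += reduction
    importances /= importances.sum()         # normalise as scikit-learn does
    return importances

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree_feature_importances(clf, X.shape[1]))
print(clf.feature_importances_)              # should match the values above
```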
5. Do we need to normalize (or scale) data for randomForest (R package)? Explain why.
Answer: No. Random Forest is a tree-based model, and tree splits depend only on the ordering of
feature values, so feature scaling is not required.
6. What is out-of-bag (OOB) error?
Answer: Out-of-bag (OOB) error, also called the out-of-bag estimate, is a method of measuring the
prediction error of random forests: each tree is evaluated on the training cases left out of its
bootstrap sample.
7. Can normalization of the data improve a Random Forest's generalization ability?
Answer: Normalization generally does not change a Random Forest's predictions, since tree splits are
invariant to monotonic transformations of the features.
8. How can I find out the most significant predictor in a Random Forest?
Answer: Using the variable importance plot of a Random Forest we can find the most significant
predictor.
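In Python, the same variable-importance plot can be drawn from a fitted forest's feature_importances_; the sketch below is one way to do it (dataset and figure styling are arbitrary). In R, randomForest::varImpPlot() serves the same purpose.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

# Sort features by impurity-based importance and plot the top 10.
order = np.argsort(rf.feature_importances_)[::-1][:10]
plt.barh(np.array(data.feature_names)[order][::-1],
         rf.feature_importances_[order][::-1])
plt.xlabel("Mean decrease in impurity")
plt.title("Random Forest variable importance")
plt.tight_layout()
plt.show()
```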
9. How can we tune the hyperparameters of a Random Forest?
Answer: The main hyperparameters to tune are n_estimators (the number of trees) and max_depth
(the depth of each tree), for example via cross-validated grid search.
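A brief sketch of that tuning with scikit-learn's GridSearchCV; the grid values below are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [100, 300, 500],   # number of trees
    "max_depth": [None, 5, 10],        # None grows trees fully (unpruned)
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```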
10. What is the best way to implement random forest in MATLAB and plot the ROC curve?
Answer: The TreeBagger function is the standard way to implement random forest in MATLAB; the
predicted class scores it produces can then be used to plot the ROC curve.
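For readers not using MATLAB, a rough equivalent in Python/scikit-learn (the language used for the other sketches in this document) looks like this; the dataset is synthetic and purely illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
RocCurveDisplay.from_estimator(rf, X_te, y_te)   # plots the ROC curve with its AUC
plt.show()
```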
11. How many trees should a random forest have?
Answer: A common recommendation is between 64 and 128 trees.
12. Can anyone here tell me the difference between a random forest and a decision tree?
Answer: In simple words, a decision tree is a single tree, which is easy to interpret because it
partitions the data based on threshold values of the independent variables. A random forest, on the
other hand, ensembles hundreds (the user can decide how many) of decision trees and averages
their outputs. It is less intuitive but far more powerful than a single tree.
13. What would be alternatives to Random Forest for behaviour scoring models?
Answer: Neural networks are a common alternative to Random Forest for behaviour scoring models.
14. How is Random Forest (RF) classification more significant than support vector machines (SVM)
and ANN?
Answer: It depends on the dataset, but Random Forest is generally easier to tune and faster to train
than SVM and ANN.
15. Which method can we use to compare random forests with support vector machines?
Answer: The bootstrap method can be used to compare random forests and support vector machines.
16. What are the advantages and disadvantages of decision trees / Random Forest?
Answer: Advantages: decision trees are easy to interpret, nonparametric (which makes them robust
to outliers), and there are relatively few parameters to tune.
Disadvantages: single decision trees are prone to overfitting. However, this can be addressed by
ensemble methods such as random forests or boosted trees.
18. Give some situations where you would use an SVM over a Random Forest machine learning
algorithm, and vice versa.
Answer: SVM and Random Forest are both used in classification problems.
a) If you are sure that your data is outlier free and clean, go for SVM; conversely, if your data might
contain outliers, Random Forest would be the better choice.
b) Generally, SVM consumes more computational power than Random Forest, so if you are
constrained on memory or compute, go for the Random Forest machine learning algorithm.
c) Random Forest gives you a very good idea of variable importance in your data, so if you want
variable importance, choose the Random Forest machine learning algorithm.
d) Random Forest machine learning algorithms are preferred for multiclass problems.
But as a good data scientist, you should experiment with both of them and test for accuracy, or you
can use an ensemble of many machine learning techniques.
19. Why does boosting use weak learners while Random Forest uses fully grown trees?
Answer:
1. Boosting is based on weak learners (high bias, low variance). In terms of decision trees, weak
learners are shallow trees, sometimes even as small as decision stumps (trees with two leaves).
Boosting reduces error mainly by reducing bias (and also, to some extent, variance, by aggregating
the output from many models).
2. Random Forest, on the other hand, uses fully grown decision trees (low bias, high variance). It
tackles the error-reduction task in the opposite way: by reducing variance. The trees are made
uncorrelated to maximize the decrease in variance, but the algorithm cannot reduce bias (which is
slightly higher than the bias of an individual tree in the forest). Hence the need for large, unpruned
trees, so that the bias is initially as low as possible.
20. What is "feature bagging" and why do Random Forests use it?
Answer: Random Forests employ a procedure called "feature bagging", which considerably decreases
the correlation between the decision trees and thereby increases the mean accuracy of the
predictions. Feature bagging is also useful for discovering complex relationships in the data, and the
decorrelated trees give more accurate predictions on future data points.
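The decorrelation effect of feature bagging can be checked empirically. The sketch below is an illustrative experiment (not from the source): it compares the average pairwise correlation of per-tree predictions when every split sees all features versus a random sqrt-sized subset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mean_pairwise_tree_correlation(forest, X):
    preds = np.array([tree.predict(X) for tree in forest.estimators_])
    corr = np.corrcoef(preds)
    n = corr.shape[0]
    return (corr.sum() - n) / (n * (n - 1))   # average of the off-diagonal entries

X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for mf in [None, "sqrt"]:   # None = plain bagging (all features), "sqrt" = feature bagging
    rf = RandomForestClassifier(n_estimators=50, max_features=mf,
                                random_state=0).fit(X_tr, y_tr)
    print(f"max_features={mf}: tree correlation = "
          f"{mean_pairwise_tree_correlation(rf, X_te):.3f}, "
          f"accuracy = {rf.score(X_te, y_te):.3f}")
```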
21. Do you think 50 small decision trees are better than a large one? Why?
● Yes!
● It gives a more robust model (an ensemble of weak learners combines into a strong learner).
● It is better to improve a model by taking many small steps than fewer large steps.
● If one tree is erroneous, it can be corrected by the others.
● It is less prone to overfitting.
1. In Random Forest you can generate hundreds of trees (say T1, T2 ….. Tn) and then aggregate the
results of these trees. Which of the following is true about an individual (Tk) tree in Random Forest?
1. Individual tree is built on a subset of the features
2. Individual tree is built on all the features
3. Individual tree is built on a subset of observations
4. Individual tree is built on full set of observations
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: A
Random Forest is based on the bagging concept, which considers a fraction of the samples and a
fraction of the features for building each individual tree.
2. Which of the following algorithms is not an example of an ensemble learning algorithm?
A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees
Solution: E
A decision tree doesn't aggregate the results of multiple trees, so it is not an ensemble algorithm.
3. Suppose you are using a bagging based algorithm, say a Random Forest, in model building. Which
of the following can be true?
1. The number of trees should be as large as possible
2. You will have interpretability after using Random Forest
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: A
Since Random Forest aggregates the results of different weak learners, we would want as many
trees as possible in model building, if feasible. Random Forest is a black-box model, so you lose
interpretability after using it.
4. Suppose you are building a random forest model that splits a node on the attribute with the
highest information gain. In the image below (not shown here), select the attribute which has the
highest information gain.
A) Outlook
B) Humidity
C) Windy
D) Temperature
Solution: A
Information gain increases with the average purity of subsets. So option A would be the right
answer.
5. Which of the following algorithms would you take into consideration for your final model building
on the basis of performance?
Suppose you are given the following graph, which shows the ROC curves for two different
classification algorithms, Random Forest (red) and Logistic Regression (blue).
A) Random Forest
B) Logistic Regression
D) None of these
Solution: A
Since Random Forest has the largest AUC in the graph, I would prefer Random Forest.
6. Let the number of predictors used at a single split be A in a bagged decision tree and B in a
Random Forest. Which of the following statements is correct?
A) A >= B
B) A < B
C) A >> B
D) Cannot be said, since different iterations use different numbers of predictors
Solution: A
Random Forest uses only a subset of the predictors at each split, whereas bagged trees consider all
the features at once.
7. Random forests (while solving a regression problem) have higher variance of predicted results in
comparison to Boosted Trees (assumption: both the Random Forest and the Boosted Trees are fully
optimized).
1. True
2. False
3. Cannot be determined
Solution: C
It completely depends on the data, the assumption cannot be made without data.
8. Which of the following tree based algorithms use some parallel (full or partial) implementation?
A) Random Forest
B) Gradient Boosted Trees
C) XGBoost
D) Both A and C
E) A, B and C
Solution: D
Random Forest is very easy to parallelize, whereas XGBoost can have a partially parallel
implementation. In Random Forest, all trees grow in parallel and the output of each tree is finally
ensembled.
XGBoost doesn't run multiple trees in parallel like Random Forest, because you need the predictions
after each tree to update the gradients. Instead, it does the parallelization WITHIN a single tree,
creating branches independently.
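In scikit-learn this tree-level parallelism is exposed through the n_jobs parameter. The sketch below is a simple timing comparison on synthetic data; absolute timings will vary by machine, and the dataset size is an arbitrary choice.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=10000, n_features=40, random_state=0)

for n_jobs in [1, -1]:          # 1 = single core, -1 = all available cores
    start = time.perf_counter()
    RandomForestClassifier(n_estimators=100, n_jobs=n_jobs,
                           random_state=0).fit(X, y)
    print(f"n_jobs={n_jobs}: {time.perf_counter() - start:.1f} s")
```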
9. Generally, in terms of prediction performance, which of the following orderings is correct?
1. Bagging>Boosting>Random Forest>Single Tree
2. Boosting>Random Forest>Single Tree>Bagging
3. Boosting>Random Forest>Bagging>Single Tree
4. Boosting >Bagging>Random Forest>Single Tree
Solution: C
Generally speaking, boosting algorithms will perform better than bagging algorithms. Comparing
bagging with random forest, random forest works better in practice because its trees are less
correlated than those of plain bagging. And ensembles of algorithms generally perform better than
single models.
10. When using Random Forest for feature selection, suppose you permute the values of two
features, A and B. The permutation is such that you change the indices of the individual values so
that they no longer remain associated with the same target as before.
For example, you notice that permuting the values does not affect the score of the model built on A,
whereas the score decreases for the model trained on B. Which of the following features would you
select, solely based on the above finding?
(A)
(B)
Solution: B
This is called mean decrease in accuracy when using random forest for feature selection.
Intuitively, if shuffling the values is not impacting the predictions, the feature is unlikely to add
value.
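scikit-learn ships this procedure as permutation_importance. A brief sketch of the mean-decrease-in-accuracy idea described above, on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=10, n_informative=4,
                           random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

rf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_tr, y_tr)

# Permute each feature in turn and measure the drop in test accuracy.
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=7)
for i, drop in enumerate(result.importances_mean):
    print(f"feature {i}: mean accuracy drop = {drop:.4f}")
```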
11. There are "A" features in a dataset and a Random Forest model is built over it. It is given that
there exists only one feature significant for the outcome, "Feature1". What would be the percentage
of total splits that do not consider "Feature1" as one of the features involved in that split? (It is given
that m is the maximum number of features considered for a split in the random forest.)
Note: The random forest selects a random subset of the feature space for every node split.
1. (A-m)/A
2. (m-A)/m
3. m/A
4. Cannot be determined
Solution: A
Option A is correct. This is the probability of not selecting a particular predictor when m of the A
possible predictors are sampled for a split.
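A quick simulation (illustrative, not from the source) confirms the (A-m)/A figure: sample m of the A features without replacement many times and count how often Feature1 (indexed 0 below) is left out.

```python
import numpy as np

A, m = 20, 4                     # total features and features sampled per split
rng = np.random.default_rng(0)

n_splits = 100_000
misses = sum(0 not in rng.choice(A, size=m, replace=False) for _ in range(n_splits))

print("simulated fraction of splits without Feature1:", misses / n_splits)
print("theoretical (A - m) / A                      :", (A - m) / A)
```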
Option A is False because the number of trees has to be decided when building the model; it is not
chosen at random.
13. Predictions of the individual trees of bagged decision trees have lower correlation than the
individual trees of a random forest.
1. TRUE
2. FALSE
Solution: B
This is False because Random Forest generates less correlated trees than bagged decision trees.
Random Forest considers only a subset of the total features at each split, so the individual trees
generated by a random forest may use different feature subsets. This is not true for bagged trees.
1) Which of the following is/are true about bagging trees?
1. In bagging trees, the individual trees are independent of each other
2. Bagging is the method for improving performance by aggregating the results of weak learners
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both statements are true. In bagging, the individual trees are independent of each other because
they consider different subsets of features and samples.
2) Which of the following is/are true about boosting trees?
1. In boosting trees, the individual trees are independent of each other
2. It is the method for improving performance by aggregating the results of weak learners
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: B
In boosting trees, the individual weak learners are not independent of each other, because each tree
corrects the results of the previous trees. Both bagging and boosting can be considered as ways of
improving the results of the base learners.
3) Which of the following is/are true about the Random Forest and Gradient Boosting ensemble
methods?
1. Both methods can be used for classification tasks
2. Random Forest is used for classification whereas Gradient Boosting is used for regression tasks
3. Random Forest is used for regression whereas Gradient Boosting is used for classification tasks
4. Both methods can be used for regression tasks
A) 1
B) 2
C) 3
D) 4
E) 1 and 4
Solution: E
Both algorithms are designed for classification as well as regression tasks.
4) In Random Forest you can generate many trees (say T1, T2 … Tn) and then aggregate the results
of these trees. Which of the following is true about an individual (Tk) tree in Random Forest?
1. An individual tree is built on a subset of the features
2. An individual tree is built on all the features
3. An individual tree is built on a subset of the observations
4. An individual tree is built on the full set of observations
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: A
Random Forest is based on the bagging concept, which considers a fraction of the samples and a
fraction of the features for building the individual trees.
5) Which of the following is/are true about the maximum depth hyperparameter?
1. Lower is the better parameter value in case of the same validation accuracy
2. Higher is the better parameter value in case of the same validation accuracy
3. Increasing the depth may overfit the data
4. Increasing the depth may underfit the data
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: A
Increasing the depth beyond a particular value may overfit the data, and when two depth values
give the same validation accuracy we always prefer the smaller depth in the final model.
6) Which of the following algorithms doesn't use learning rate as one of its hyperparameters?
1. Gradient Boosting
2. Extra Trees
3. AdaBoost
4. Random Forest
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: D
Random Forest and Extra Trees don't have learning rate as a hyperparameter.
7) Which of the following algorithms would you take into consideration for your final model
building on the basis of performance?
Suppose you are given the following graph, which shows the ROC curves for two different
classification algorithms, Random Forest (red) and Logistic Regression (blue).
A) Random Forest
B) Logistic Regression
D) None of these
Solution: A
Since Random Forest has the largest AUC in the graph, I would prefer Random Forest.
8) Which of the following is true about training and testing error in such a case?
Suppose you want to apply the AdaBoost algorithm on data D which has T observations. You set half
of the data for training and half for testing initially. Now you want to increase the number of data
points for training, T1, T2 … Tn, where T1 < T2 … Tn-1 < Tn.
A) The difference between training error and test error increases as the number of observations
increases
B) The difference between training error and test error decreases as the number of observations
increases
C) The difference between training error and test error will not change
D) None of these
Solution: B
As we get more and more data, the training error increases and the testing error decreases, and
they both converge to the true error.
9) In random forest or gradient boosting algorithms, features can be of any type. For example, a
feature can be continuous or categorical. Which of the following options is true when you consider
these types of features?
A) Only the Random Forest algorithm handles real-valued attributes by discretizing them
B) Only the Gradient Boosting algorithm handles real-valued attributes by discretizing them
C) Both algorithms can handle real-valued attributes by discretizing them
D) None of these
Solution: C
10) Which of the following algorithms is not an example of an ensemble learning algorithm?
A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees
Solution: E
A decision tree doesn't aggregate the results of multiple trees, so it is not an ensemble algorithm.
11) Suppose you are using a bagging based algorithm, say a Random Forest, in model building. Which
of the following can be true?
1. The number of trees should be as large as possible
2. You will have interpretability after using Random Forest
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: A
Since Random Forest aggregates the results of different weak learners, we would want as many
trees as possible in model building, if feasible. Random Forest is a black-box model, so you will lose
interpretability after using it.
Context 12-15
Consider the following figure for answering the next few questions. In the figure, X1 and X2 are the
two features and the data points are represented by dots (-1 is the negative class and +1 is the
positive class). You first split the data based on feature X1 (say the split point is x11), which is shown
in the figure using a vertical line. Every value less than x11 will be predicted as the positive class and
every value greater than x11 will be predicted as the negative class.
12) How many data points are misclassified in the above image?
A) 1
B) 2
C) 3
D) 4
Solution: A
Only one observation is misclassified: one negative-class point appears on the left side of the
vertical line, so it will be predicted as positive.
13) Which of the following split points on feature X1 will classify the data correctly?
A) Greater than x11
B) Less than x11
C) Equal to x11
D) None of the above
Solution: D
If you search over every point on X1 you won't find any point that gives 100% accuracy.
14) If you consider only feature X2 for splitting, could you now perfectly separate the positive class
from the negative class with any single split on X2?
A) Yes
B) No
Solution: B
It is likewise not possible.
15) Now consider only one split on each of the two features (one on X1 and one on X2). You can
split either feature at any point. Would you be able to classify all data points correctly?
A) TRUE
B) FALSE
Solution: B
You won't find such a case, because you will always get at least 1 misclassification.
Context 16-17
Suppose you are working on a binary classification problem with 3 input features, and you applied a
bagging algorithm (X) on this data. You chose max_features = 2 and n_estimators = 3. Now assume
that each estimator has 70% accuracy.
Note: Algorithm X is aggregating the results of the individual estimators based on maximum voting.
16) What will be the maximum accuracy you can get?
A) 70%
B) 80%
C) 90%
D) 100%
Solution: D
Actual | M1 | M2 | M3 | Voting output
1 | 1 | 0 | 1 | 1
1 | 1 | 0 | 1 | 1
1 | 1 | 0 | 1 | 1
1 | 0 | 1 | 1 | 1
1 | 0 | 1 | 1 | 1
1 | 0 | 1 | 1 | 1
1 | 1 | 1 | 1 | 1
1 | 1 | 1 | 0 | 1
1 | 1 | 1 | 0 | 1
1 | 1 | 1 | 0 | 1
Each of the three estimators is 70% accurate, but their errors fall on different rows, so the majority
vote is correct on every row (100% accuracy).
17) What will be the minimum accuracy you can get?
A) Always greater than 70%
B) Always greater than or equal to 70%
C) It can be less than 70%
D) None of these
Solution: C
Actual | M1 | M2 | M3 | Voting output
1 | 1 | 0 | 0 | 0
1 | 1 | 1 | 1 | 1
1 | 1 | 0 | 0 | 0
1 | 0 | 1 | 0 | 0
1 | 0 | 1 | 1 | 1
1 | 0 | 0 | 1 | 0
1 | 1 | 1 | 1 | 1
1 | 1 | 1 | 1 | 1
1 | 1 | 1 | 1 | 1
1 | 1 | 1 | 1 | 1
Here each estimator is still 70% accurate, but their errors overlap, so the majority vote is correct on
only 6 of the 10 rows (60% accuracy), which is below 70%.
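The two tables can be checked with a few lines of code. The small illustrative script below copies the M1-M3 columns from the tables above and shows that majority voting over three 70%-accurate estimators reaches 100% in the best case and drops to 60% in the worst case shown.

```python
import numpy as np

actual = np.ones(10, dtype=int)

def vote_accuracy(m1, m2, m3):
    votes = np.vstack([m1, m2, m3])
    majority = (votes.sum(axis=0) >= 2).astype(int)   # majority of three 0/1 votes
    return (majority == actual).mean()

# Best case: each model is wrong on a different 30% of the points.
best = vote_accuracy(np.array([1, 1, 1, 0, 0, 0, 1, 1, 1, 1]),
                     np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1]),
                     np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0]))

# Worst case: the errors overlap, so the vote can fall below 70%.
worst = vote_accuracy(np.array([1, 1, 1, 0, 0, 0, 1, 1, 1, 1]),
                      np.array([0, 1, 0, 1, 1, 0, 1, 1, 1, 1]),
                      np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1]))

print("maximum voting accuracy:", best)    # 1.0
print("minimum voting accuracy:", worst)   # 0.6
```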
18) Suppose you are building a random forest model, which splits a node on the attribute that has
the highest information gain. In the image below (not shown here), select the attribute which has
the highest information gain.
A) Outlook
B) Humidity
C) Windy
D) Temperature
Solution: A
Information gain increases with the average purity of the subsets, so option A would be the right
answer.
19) Which of the following is true about Gradient Boosting trees?
1. In each stage, a new regression tree is introduced to compensate for the shortcomings of the
existing model
2. We can use the gradient descent method to minimize the loss function
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
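Both statements can be seen in a tiny from-scratch sketch (illustrative only, squared-error loss): each stage fits a new regression tree to the negative gradient of the loss, i.e. the residuals of the current model, which is exactly "compensating for the shortcomings of the existing model". The dataset and hyperparameter values below are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)

learning_rate, n_stages = 0.1, 100
prediction = np.full(len(y), y.mean())        # initial constant model
trees = []

for _ in range(n_stages):
    # For squared-error loss the negative gradient is simply the residual.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
    # Gradient-descent-style update: step a fraction of the way along the new tree.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("final training MSE:", np.mean((y - prediction) ** 2))
```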
20) True-False: Bagging is suitable for high variance, low bias models?
A) TRUE
B) FALSE
Solution: A
Bagging is suitable for high variance, low bias models, or in other words for complex models.
21) Which of the following is true when you choose the fraction of observations used for building
the base learners in a tree based algorithm?
A) Decreasing the fraction of samples used to build the base learners will result in a decrease in
variance
B) Decreasing the fraction of samples used to build the base learners will result in an increase in
variance
C) Increasing the fraction of samples used to build the base learners will result in a decrease in
variance
D) Increasing the fraction of samples used to build the base learners will result in an increase in
variance
Solution: A
Context 22-23
Suppose you are building a Gradient Boosting model on data which has millions of observations and
thousands of features. Before building the model you want to consider the different parameter
settings in terms of training time.
22) Consider the hyperparameter "number of trees" and arrange the options in terms of the time
taken by each setting to build the Gradient Boosting model.
A) 1~2~3
B) 1<2<3
C) 1>2>3
D) None of these
Solution: B
The time taken to build 1000 trees is the greatest and the time taken to build 100 trees is the least,
which corresponds to option B.
23) Now consider the learning rate hyperparameter and arrange the options in terms of the time
taken by each setting to build the Gradient Boosting model.
1. learning rate = 1
2. learning rate = 2
3. learning rate = 3
A) 1~2~3
B) 1<2<3
C) 1>2>3
D) None of these
Solution: A
Since the learning rate doesn't affect the training time, all learning rates would take equal time.
24) In gradient boosting it is important to use the learning rate to get optimal output. Which of the
following is true about choosing the learning rate?
Solution: C
The learning rate should be low, but not too low; otherwise the algorithm will take very long to
finish training, because you would need to increase the number of trees.
25) [True or False] Cross validation can be used to select the number of iterations in boosting; this
procedure may help reduce overfitting.
A) TRUE
B) FALSE
Solution: A
26) When you use a boosting algorithm you always consider weak learners. Which of the following is
the main reason for using weak learners?
1. To prevent overfitting
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: A
To prevent overfitting, since the complexity of the overall learner increases at each step. Starting
with weak learners implies the final classifier will be less likely to overfit.
27) To apply bagging to regression trees, which of the following is/are true in such a case?
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Solution: D
28) How do you select the best hyperparameters in a tree based model?
A) Measure performance over the training data
B) Measure performance over the validation data
C) Both of these
D) None of these
Solution: B
We always use the validation results for comparison with the test results.
29) In which of the following scenarios is gain ratio preferred over Information Gain?
A) When a categorical variable has a very large number of categories
D) None of these
Solution: A
When an attribute takes a large number of distinct values (high cardinality), gain ratio is preferred
over Information Gain.
30) Suppose you are given the following scenarios of training and validation error for Gradient
Boosting. Which of the following hyperparameter settings would you choose in such a case?
Scenario | Depth | Training Error | Validation Error
1 | 2 | 100 | 110
2 | 4 | 90 | 105
3 | 6 | 50 | 100
4 | 8 | 45 | 105
5 | 10 | 30 | 150
A) 1
B) 2
C) 3
D) 4
Solution: B
Scenarios 2 and 4 have the same validation accuracy, but we would select 2 because a lower depth is
the better hyperparameter when accuracies are tied.