Ensemble Modeling

Q1. Which of the following algorithms is not an example of an ensemble method?
B. Random Forest
C. Gradient Boosting
D. Decision Tree
Solution: (D)
Option D is correct. In the case of a decision tree, we build a single tree; no ensembling is
involved.
Q2. Which of the following statement(s) is / are true about an ensemble of classifiers?
1. Classifiers that are more “sure” can vote with more conviction
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
Solution: (D)
In an ensemble, we give higher weight to classifiers with higher accuracy. Each weak learner is
confident about a particular part of the problem, and aggregating these confident parts gives a
final result that is better than any of the individual weak models.
Q3. Which of the following options is / are correct regarding the benefits of an ensemble model?
1. Better performance
2. Generalized models
3. Better interpretability
A. 1 and 3
B. 2 and 3
C. 1 and 2
D. 1, 2 and 3
Solution: (C)
1 and 2 are benefits of ensemble modeling. Option 3 is incorrect because when we ensemble
multiple models, we lose the interpretability of the individual models.
Q4. Which of the following can be true for selecting base learners for an ensemble?
1. Different learners can come from same algorithm with different hyper parameters
A. 1
B. 2
C. 1 and 3
D. 1, 2 and 3
Solution: (D)
We can create an ensemble by following any / all of the options mentioned above. So option D is
correct.
Q5. True or False: Ensemble learning can only be applied to supervised learning methods.
A. True
B. False
Solution: (B)
Generally, we use ensemble techniques with supervised learning algorithms, but an ensemble can
also be used with unsupervised learning methods, for example in clustering.
Q6. True or False: Ensembles will yield bad results when there is significant diversity among
the models.
A. True
B. False
Solution: (B)
An ensemble is a collection of diverse learners that together improve the stability and
predictive power of the model. So, creating an ensemble of diverse models is important for
achieving better results.
Q7. Which of the following is / are true about weak learners used in an ensemble model?
1. They have low variance and they don't usually overfit
2. They have high bias, so they can not solve hard learning problems
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. None of these
Solution: (A)
Weak learners are confident about only a particular part of a problem, so they usually don't
overfit; in other words, weak learners have low variance and high bias.
Q8. True or False: An ensemble of classifiers may or may not be more accurate than any of its
individual models.
A. True
B. False
Solution: (A)
Usually, an ensemble improves on the individual models, but this is not guaranteed. Hence,
option A is correct.
Q9. If you use an ensemble of different base models, is it necessary to tune the hyperparameters
of all the base models to improve the ensemble's performance?
A. Yes
B. No
C. can’t say
Solution: (B)
It is not necessary. Ensemble of weak learners can also yield a good model.
Q10. Generally, an ensemble method works better, if the individual base models have
____________?
Note: Suppose each individual base models have accuracy greater than 50%.
A. Less correlation among predictions
Solution: (A)
A lower correlation among ensemble members increases the error-correcting capability of the
ensemble. So it is preferred to use models with low correlation between their predictions when
creating ensembles.
Context – Question 11
In an election, N candidates are competing against each other and people are voting for
either of the candidates. Voters don’t communicate with each other while casting their
votes.
Q11. Which of the following ensemble methods works similarly to the above-discussed election
procedure?
Hint: Voters are like the base models of an ensemble method.
A. Bagging
B. Boosting
C. A Or B
D. None of these
Solution: (A)
In a bagged ensemble, the predictions of the individual models do not depend on each other, just
as the voters do not communicate. So option A is correct.
Q12. Suppose you are given 'n' predictions on test data by 'n' different models (M1, M2,
…, Mn) respectively. Which of the following method(s) can be used to combine the predictions
of these models?
1. Median
2. Product
3. Average
4. Weighted sum
B. 1,3 and 6
C. 1,3, 4 and 6
D. All of above
Solution: (D)
All of the above options are valid methods for aggregating results of different models (in case of
a regression model).
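For a regression problem, these combination rules are one-liners. A minimal sketch with made-up predictions and hypothetical weights (illustrative only):

```python
import math

# predictions of n different models for one test observation (made-up values)
preds = [2.0, 3.0, 4.0, 5.0]
weights = [0.1, 0.2, 0.3, 0.4]  # hypothetical model weights summing to 1

median = sorted(preds)[len(preds) // 2]                # upper median for an even n
product = math.prod(preds)                             # 2 * 3 * 4 * 5 = 120
average = sum(preds) / len(preds)                      # 14 / 4 = 3.5
weighted = sum(w * p for w, p in zip(weights, preds))  # 0.2 + 0.6 + 1.2 + 2.0 = 4.0
```
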
Suppose you are working on a binary classification problem, and there are 3 models, each with
70% accuracy.
Q13. If you want to ensemble these models using the majority voting method, what will be the
maximum accuracy you can get?
A. 100%
B. 78.38 %
C. 44%
D. 70%
Solution: (A)
In the best case, the three models make their mistakes on disjoint observations, so the majority
vote corrects every individual error and the ensemble reaches 100% accuracy. In the table below
(columns: Actual, M1, M2, M3, ensemble Output), each model alone is 70% accurate:
Actual M1 M2 M3 Output
1 1 0 1 1
1 1 0 1 1
1 1 0 1 1
1 0 1 1 1
1 0 1 1 1
1 0 1 1 1
1 1 1 1 1
1 1 1 0 1
1 1 1 0 1
1 1 1 0 1
Q14. If you want to ensemble these models using majority voting, what will be the minimum
accuracy you can get?
Solution: (C)
When the models' errors overlap, the majority vote can be wrong even though each individual
model is still 70% accurate. In the table below (columns: Actual, M1, M2, M3, ensemble Output),
every model is 70% accurate but the ensemble reaches only 60%, i.e. less than any individual
model:
Actual M1 M2 M3 Output
1 1 0 0 0
1 1 1 1 1
1 1 0 0 0
1 0 1 0 0
1 0 1 1 1
1 0 0 1 0
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
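These two cases can be checked programmatically. A small sketch, assuming each row of the tables above lists the actual label followed by the predictions of M1, M2 and M3:

```python
def majority_accuracy(rows):
    """rows: (actual, m1, m2, m3); the ensemble predicts the majority class."""
    correct = 0
    for actual, m1, m2, m3 in rows:
        vote = 1 if (m1 + m2 + m3) >= 2 else 0
        correct += (vote == actual)
    return correct / len(rows)

# best case: the models' errors fall on disjoint observations
best = [(1, 1, 0, 1)] * 3 + [(1, 0, 1, 1)] * 3 + [(1, 1, 1, 1)] + [(1, 1, 1, 0)] * 3
# worst case: the errors overlap, dragging the ensemble below each model's 70%
worst = [(1, 1, 0, 0), (1, 1, 1, 1), (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 1, 1),
         (1, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1)]
print(majority_accuracy(best))   # 1.0
print(majority_accuracy(worst))  # 0.6
```
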
Q15. How can we assign the weights to output of different models in an ensemble?
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. All of above
Solution: (D)
All of these options are valid ways to decide the weights of individual models in an ensemble.
Q16. Which of the following is true about an averaging ensemble?
D. None of these
Solution: (C)
You can use an averaging ensemble for classification as well as regression. In classification,
you can apply averaging to the predicted probabilities, whereas in regression you can directly
average the predictions of the different models.
Context – Question 17
predictions = [0.2,0.5,0.33,0.8]
Which of the following will be the ranked average output for these predictions?
D. None of above
Solution: (A)
To get the ranked average, replace each prediction with its rank and normalise by the number of
predictions.
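A minimal pure-Python sketch of this computation:

```python
predictions = [0.2, 0.5, 0.33, 0.8]

# rank each prediction (1 = smallest), then normalise by the number of predictions
ranks = [sorted(predictions).index(p) + 1 for p in predictions]
ranked_average = [r / len(predictions) for r in ranks]
print(ranked_average)  # [0.25, 0.75, 0.5, 1.0]
```
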
Q18.
In the above snapshot, lines A and B are the predictions of two models (M1 and M2 respectively).
You want to apply an ensemble that aggregates the results of these two models using
weighted averaging. Which of the following lines is most likely to be the output of
this ensemble if you give weights 0.7 and 0.3 to models M1 and M2 respectively?
A) A
B) B
C) C
D) D
E) E
Solution: (C)
B. 2 and 3
C. 1 and 2
D. 1, 2 and 3
E. None of above
Solution: (D)
Suppose in a classification problem, you have the following probabilities from three models, M1,
M2 and M3, for five observations of a test data set.
M1 M2 M3 Output
Q20: Which of the following will be the predicted category for these observations if you
apply a probability threshold greater than or equal to 0.5 for category "1", or less than 0.5
for category "0"?
Note: You are applying the averaging method to ensemble the predictions given by the three
models.
A.
M1 M2 M3 Output
B.
M1 M2 M3 Output
C.
M1 M2 M3 Output
D. None of these
Solution: (B)
Take the average of the predictions of the three models for each observation, then apply the 0.5
threshold to get the predicted category. For example, for the first observation the outputs of
the models (M1, M2 and M3) are 0.70, 0.80 and 0.75; the average of these three is 0.75, which is
above 0.5, so this observation is assigned category "1".
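The worked example can be reproduced in a couple of lines:

```python
# probabilities of category "1" from M1, M2 and M3 for the first observation
probs = [0.70, 0.80, 0.75]
avg = sum(probs) / len(probs)       # about 0.75
category = 1 if avg >= 0.5 else 0   # 0.75 >= 0.5, so category 1
print(round(avg, 2), category)  # 0.75 1
```
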
Q21: Which of the following will be the predicted category for these observations if you
apply a probability threshold greater than or equal to 0.5 for category "1", or less than 0.5
for category "0"?
Note: You are applying the weighted averaging method with weights 0.4, 0.3 and 0.3 for models
M1, M2 and M3 respectively.
A.
M1 M2 M3 Output
B.
M1 M2 M3 Output
C.
M1 M2 M3 Output
D. None of these
Solution: (B)
Take the weighted average of the predictions of the three models for each observation. For
example, for the first observation the outputs of the models (M1, M2 and M3) are
0.70, 0.80 and 0.75; the weighted average is 0.70 * 0.4 + 0.80 * 0.3 + 0.75 * 0.3 = 0.745,
which is above 0.5, so this observation is assigned category "1".
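The weighted-average calculation from the solution can be reproduced directly:

```python
# probabilities from M1, M2 and M3 for the first observation, with the solution's weights
probs = [0.70, 0.80, 0.75]
weights = [0.4, 0.3, 0.3]
weighted = sum(w * p for w, p in zip(weights, probs))  # 0.70*0.4 + 0.80*0.3 + 0.75*0.3 = 0.745
category = 1 if weighted >= 0.5 else 0
print(round(weighted, 3), category)  # 0.745 1
```
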
Context – Questions 22-23
Suppose in a binary classification problem, you are given the following predictions of three
models (M1, M2, M3) for five observations of a test data set.
M1 M2 M3
1 1 0
0 1 0
0 1 1
1 0 1
1 1 1
Q22: Which of the following will be the output ensemble model if we are using majority
voting method?
A.
M1 M2 M3 Output
1 1 0 0
0 1 0 1
0 1 1 0
1 0 1 0
1 1 1 1
B.
M1 M2 M3 Output
1 1 0 1
0 1 0 0
0 1 1 1
1 0 1 1
1 1 1 1
C.
M1 M2 M3 Output
1 1 0 1
0 1 0 0
0 1 1 1
1 0 1 0
1 1 1 1
D. None of these
Solution: (B)
Take the majority vote of the three models' predictions for each observation. For example, for
the first observation the outputs of the models (M1, M2 and M3) are 1, 1 and 0; class 1 gets 2
of the 3 votes, so the predicted class for this observation is 1.
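Majority voting over the five observations can be written directly:

```python
# predictions of (M1, M2, M3) for the five test observations in the question's table
preds = [(1, 1, 0), (0, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
outputs = [1 if sum(p) >= 2 else 0 for p in preds]  # class 1 needs at least 2 of 3 votes
print(outputs)  # [1, 0, 1, 1, 1]
```
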
Q23. When using the weighted voting method, which of the following will be the output of
an ensemble model?
Hint: Count the vote of M1, M2, and M3 as 2.5 times, 6.5 times and 3.5 times respectively.
A.
M1 M2 M3 Output
1 1 0 0
0 1 0 1
0 1 1 0
1 0 1 0
1 1 1 1
B.
M1 M2 M3 Output
1 1 0 1
0 1 0 0
0 1 1 1
1 0 1 1
1 1 1 1
C.
M1 M2 M3 Output
1 1 0 1
0 1 0 1
0 1 1 1
1 0 1 0
1 1 1 1
D. None of these
Solution: (C)
The total vote weight is 2.5 + 6.5 + 3.5 = 12.5, so class 1 wins an observation when its
weighted votes exceed 6.25. M2's vote alone (6.5) is already enough, so every observation where
M2 predicts 1 gets output 1; in the fourth observation only M1 and M3 vote for class 1
(2.5 + 3.5 = 6), so the output is 0.
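The weighted vote can be checked in code, using the vote multipliers from the hint:

```python
# predictions of (M1, M2, M3) for the five observations, with the hint's vote multipliers
preds = [(1, 1, 0), (0, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
weights = (2.5, 6.5, 3.5)
half = sum(weights) / 2  # 6.25: class 1 wins when its weighted votes exceed half the total

outputs = [1 if sum(w * p for w, p in zip(weights, obs)) > half else 0 for obs in preds]
print(outputs)  # [1, 1, 1, 0, 1]
```
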
training data
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of above
Solution: (C)
results.
better prediction
A. 1 and 2
B. 2 and 3
C. 1 and 3
Solution: (A)
Options 1 and 2 are advantages of stacking, whereas option 3 is not correct because stacking
takes more time.
A.
B.
C. None of these
Solution: (A)
A is correct because it aggregates the results of the base models by applying a function f (you
can think of f as a model itself) to the outputs of the base models.
Q27. Which of the following are the correct steps for creating a stacking model?
1. Divide the training data into k folds
2. Train k models on each k-1 folds and get the out of fold predictions for remaining one
fold
3. Divide the test data set in “k” folds and get individual fold predictions by different
algorithms
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of above
Solution: (A)
The third option is not correct because we don’t create folds for test data in stacking.
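The out-of-fold part of the procedure (steps 1-2) can be sketched as follows; the "model" here is just a mean predictor, purely for illustration:

```python
def out_of_fold_predictions(y, k):
    """For each fold, 'train' on the other k-1 folds and predict the held-out fold.
    The 'model' is just the mean of its training targets, as a stand-in."""
    n = len(y)
    oof = [None] * n
    fold_size = n // k
    for f in range(k):
        lo = f * fold_size
        hi = (f + 1) * fold_size if f < k - 1 else n
        train = y[:lo] + y[hi:]               # the other k-1 folds
        prediction = sum(train) / len(train)  # stand-in for a fitted model's prediction
        for i in range(lo, hi):
            oof[i] = prediction
    return oof

# 6 observations, 3 folds: each value is predicted by a "model" that never saw it
print(out_of_fold_predictions([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=3))
```

These out-of-fold predictions become the features on which the second-stage model is trained.
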
Q28. Which of the following is the difference between stacking and blending?
D. None of these
Solution: (D)
In blending, unlike stacking, you use a holdout set instead of creating folds on the data.
Q29. Which of the following is true about one-level (m base models + 1 stacker) stacking?
Note:
Solution: (B)
If you have m base models in stacking, they will generate m features for the second-stage model.
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of these
Solution: (C)
1. In bagging, the individual learners are not dependent on each other, so they can be trained
in parallel.
2-3. Bagging is suitable for high-variance, low-bias models, or in other words, for complex
models.
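A minimal sketch of the bagging idea, bootstrap resampling plus aggregation (toy data and a mean "model", purely illustrative):

```python
import random

random.seed(0)
data = list(range(1, 11))  # a toy training set

def bootstrap_sample(data):
    # sample len(data) points with replacement: each base learner sees a different sample
    return [random.choice(data) for _ in data]

# each "model" is just the mean of its bootstrap sample; because the samples are
# drawn independently, the base learners could be trained in parallel
models = [sum(s) / len(s) for s in (bootstrap_sample(data) for _ in range(25))]
ensemble_prediction = sum(models) / len(models)  # aggregate by averaging
print(round(ensemble_prediction, 2))
```
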
A. True
B. False
Solution: (B)
In boosting, you always try to add new models that correct the weaknesses of previous models, so
it is sequential.
Q32. Below are the two ensemble models:
Which of the following are you more likely to choose if the following conditions for E1 and E2
are given?
E1: Individual models' accuracies are high, but the models are of the same type (in other
words, less diverse)
E2: Individual models' accuracies are high, but they are of different types (in other words,
more diverse)
A. E1
B. E2
C. Any of E1 and E2
D. None of these
Solution: (B)
Given similar accuracies, the more diverse ensemble (E2) is preferred: diverse models make less
correlated errors, so combining them helps more.
Q33. Suppose you have the predictions of many different models and want to ensemble the
predictions of the best x models. Now, which of the following can be a possible method to select
the best x models?
Solution: (C)
You can apply both algorithms. In stepwise forward selection, you start with no predictions and
add the predictions of the models one at a time if they improve the accuracy of the ensemble. In
stepwise backward elimination, you start with the full set of model predictions and remove them
one at a time if removing a model's predictions improves the accuracy of the ensemble.
Q34. Suppose, you want to apply a stepwise forward selection method for choosing the best
models for an ensemble model. Which of the following is the correct order of the steps?
1. Add the models' predictions (or, in other words, take the average) one by one to the
ensemble, keeping each only if it improves the ensemble's performance
2. Start with the empty ensemble
3. Return the ensemble from the nested set of ensembles that has maximum performance
B. 1-3-4
C. 2-1-3
D. None of above
Solution: (C)
Option C is correct: first start with the empty ensemble (step 2), then add model predictions
one by one while they help (step 1), and finally return the best-performing nested ensemble
(step 3).
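The procedure (start empty, greedily add model predictions while the ensemble's accuracy improves) can be sketched as follows; all data and model names here are made up for illustration:

```python
def accuracy(preds_list, y):
    """Accuracy of the averaged predictions after a 0.5 threshold."""
    avg = [sum(col) / len(col) for col in zip(*preds_list)]
    return sum((a >= 0.5) == bool(t) for a, t in zip(avg, y)) / len(y)

def forward_select(model_preds, y):
    """Greedily add model predictions while the ensemble accuracy improves."""
    chosen, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for preds in model_preds.values():
            if any(preds is c for c in chosen):
                continue  # already in the ensemble
            score = accuracy(chosen + [preds], y)
            if score > best:
                best, best_preds, improved = score, preds, True
        if improved:
            chosen.append(best_preds)
    return len(chosen), best

y = [1, 0, 1, 1, 0]                    # made-up true labels
model_preds = {                        # made-up predicted probabilities
    "m1": [0.9, 0.1, 0.8, 0.3, 0.2],   # 80% accurate alone
    "m2": [0.6, 0.4, 0.7, 0.9, 0.6],   # 80% accurate alone
    "m3": [0.2, 0.8, 0.3, 0.4, 0.9],   # 0% accurate alone
}
print(forward_select(model_preds, y))  # (2, 1.0): m1 then m2, reaching 100%
```

Note how m1 and m2 are each 80% accurate alone but err on different observations, so their average reaches 100%, while adding m3 would not help and is skipped.
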
A. True
B. False
Solution: (B)
In dropout, weights are shared and the ensemble of subnetworks is trained together: multiple
subnetworks are trained jointly by "dropping out" certain connections between neurons.
A. 1
B. 9
C. 12
D. 16
Solution: (B)
There are 16 possible combinations, of which only 9 are viable. Non-viable are (6, 7, 12, 13, 14,
15, 16).
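These counts are consistent with a network of two hidden layers of two units each, which is an assumption here: dropout can keep or drop each of the 4 hidden units (2^4 = 16 masks), and a subnetwork is viable only if each layer keeps at least one unit, giving (2^2 - 1)^2 = 9. A quick enumeration:

```python
from itertools import product

# keep/drop masks for the four hidden units (two per hidden layer)
masks = list(product([0, 1], repeat=4))
# a subnetwork is viable only if each hidden layer keeps at least one unit
viable = [m for m in masks if (m[0] or m[1]) and (m[2] or m[3])]
print(len(masks), len(viable))  # 16 9
```
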
Q37. How is the model capacity affected by the dropout rate (where model capacity means the
ability of a neural network to approximate complex functions)?
D. None of these
Solution: (B)
The subnetworks have more neurons to work with when the dropout rate is low, so they are more
complex, resulting in an increase in overall model capacity. Refer to chapter 11 of the DL book.
Q38. Which of the following parameters can be tuned to find a good ensemble model in
bagging-based algorithms?
1. Max number of samples
2. Max features
3. Bootstrapping of samples
4. Bootstrapping of features
A. 1 and 3
B. 2 and 4
C. 1,2 and 3
D. 1,3 and 4
E. All of above
Solution: (E)
All of the techniques given in the options can be applied to get a good ensemble.
Q39. True or False: Bagging is suitable for unstable classifiers, where a small change in the
training data causes a large change in the learned classifier.
A. True
B. False
Solution: (A)
Q40. Suppose there are 25 base classifiers, each with an error rate of e = 0.35.
Suppose you are using averaging (a majority vote on the 0/1 predictions) as the ensemble
technique. What will be the probability that the ensemble makes a wrong prediction?
A. 0.05
B. 0.06
C. 0.07
D. 0.09
Solution: (B)
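The stated answer can be checked with the binomial formula, assuming the classifiers' errors are independent: the majority vote errs when 13 or more of the 25 classifiers err.

```python
from math import comb

e, n = 0.35, 25
# P(ensemble error) = P(13 or more of the 25 independent classifiers are wrong)
p_error = sum(comb(n, i) * e**i * (1 - e)**(n - i) for i in range(13, n + 1))
print(round(p_error, 2))  # 0.06
```
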