ML Intw - Points To Remember
Random forest and gradient boosting are both tree-based algorithms. Random forest uses bagging: the data is divided into random samples, a model is built on each, and the results are combined by voting or averaging. Gradient boosting uses boosting: misclassified predictions are weighted more heavily so they can be corrected in subsequent rounds, until a stopping criterion is reached. Random forest mainly reduces variance, while gradient boosting reduces both bias and variance. A Type I error rejects a null hypothesis that is true, while a Type II error accepts a null hypothesis that is false. Stratified sampling, unlike random sampling, maintains class proportions across samples in classification problems.
1 - In stratified cross-validation, the split preserves the ratio of the target categories in both the training and validation datasets.
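A minimal sketch of this idea using scikit-learn's StratifiedKFold; the imbalanced toy dataset and fold count are illustrative, not part of the original notes.

```python
# Stratified cross-validation: every fold keeps roughly the same class ratio
# as the full dataset (toy data for illustration).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=100, weights=[0.8, 0.2], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Both the training and validation parts preserve the positive-class ratio.
    print(f"fold {fold}: train positives={y[train_idx].mean():.2f}, "
          f"val positives={y[val_idx].mean():.2f}")
```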
2 - Q21. Both being tree-based algorithms, how is random forest different from the gradient boosting algorithm (GBM)? Answer: The fundamental difference is that random forest uses the bagging technique to make predictions, while GBM uses the boosting technique.
In the bagging technique, a dataset is divided into n samples using randomized sampling. Then, using a single learning algorithm, a model is built on each sample. Finally, the resulting predictions are combined using voting or averaging. Bagging is done in parallel. In boosting, after the first round of predictions, the algorithm weights misclassified predictions higher, so that they can be corrected in the succeeding round. This sequential process of giving higher weights to misclassified predictions continues until a stopping criterion is reached.
Random forest improves model accuracy mainly by reducing variance. The trees grown are uncorrelated, which maximizes the decrease in variance. On the other hand, GBM improves accuracy by reducing both bias and variance in a model. A short sketch contrasting the two is shown below.
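A minimal sketch contrasting bagging (random forest) and boosting (GBM) with scikit-learn; the dataset, hyperparameters, and evaluation setup are illustrative assumptions, not prescriptions.

```python
# Random forest (bagging) vs. gradient boosting (boosting) on toy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: trees are grown independently (in parallel) on bootstrap samples,
# and their predictions are combined by voting/averaging.
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: trees are grown sequentially, each one correcting the errors
# left by the previous rounds.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=0)

for name, model in [("random forest", rf), ("gradient boosting", gbm)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```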
3 - Q30. What do you understand by Type I vs Type II error? Answer: A Type I error is committed when the null hypothesis is true and we reject it; it is also known as a 'False Positive'. A Type II error is committed when the null hypothesis is false and we accept it; it is also known as a 'False Negative'.
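A minimal sketch mapping these two errors onto a confusion matrix, where label 1 plays the role of "reject the null hypothesis"; the labels below are made-up illustrative data.

```python
# Type I error = false positive, Type II error = false negative.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1]   # ground truth (0 = null true, 1 = null false)
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]   # decisions made by the test/classifier

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"Type I errors (false positives): {fp}")   # null was true, we rejected it
print(f"Type II errors (false negatives): {fn}")  # null was false, we accepted it
```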
4 - In the case of a classification problem, we should always use stratified sampling instead of random sampling. Random sampling does not take the proportion of target classes into consideration. In contrast, stratified sampling maintains the distribution of the target variable in the resulting samples as well.
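A minimal sketch of the difference using scikit-learn's train_test_split; the imbalanced toy dataset and split sizes are assumptions chosen to make the contrast visible.

```python
# Random vs. stratified train/test split on an imbalanced toy dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

# Plain random split: the minority-class proportion can drift between splits.
_, _, _, y_rand = train_test_split(X, y, test_size=0.25, random_state=1)

# Stratified split: class proportions are preserved in both splits.
_, _, _, y_strat = train_test_split(X, y, test_size=0.25, random_state=1, stratify=y)

print(f"overall positive rate:  {y.mean():.2f}")
print(f"random test split:      {y_rand.mean():.2f}")
print(f"stratified test split:  {y_strat.mean():.2f}")
```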