Lecture 3
Contents
3.1 Understanding Ensembles, K-fold cross validation, Boosting, Stumping, XGBoost
3.2 Bagging, Subagging, Random Forest, Comparison with Boosting, Different ways to
combine classifiers
In this approach the dataset is divided into k equal-sized samples. Each individual sample is called a fold.
In each round, k-1 folds are used to fit the prediction function and the remaining fold is used for testing.
The steps for k-fold cross-validation are:
1. Divide the dataset into k samples (folds)
2. For each iteration:
- Reserve one fold as the test dataset
- Use the remaining folds for training
- Evaluate the performance of the model on the reserved test fold
For example, consider 5-fold cross-validation: the dataset is divided into 5 folds. In the 1st iteration the first fold is reserved as the test dataset and the remaining folds are used for training. In the 2nd iteration the second fold is used for testing and the remaining folds are used for training. This continues until every fold has been used once as a test dataset, as sketched in the code below.
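The following sketch, assuming scikit-learn with the Iris dataset and a decision tree as an illustrative model, walks through the same 5-fold procedure: each fold is held out once for testing while the other four folds are used for training.

```python
# A minimal sketch of 5-fold cross-validation
# (dataset, model choice and scoring are illustrative assumptions).
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # Reserve one fold for testing, train on the remaining k-1 folds
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X[train_idx], y[train_idx])
    acc = accuracy_score(y[test_idx], model.predict(X[test_idx]))
    scores.append(acc)
    print(f"Fold {fold}: accuracy = {acc:.3f}")

print(f"Mean accuracy over 5 folds = {sum(scores) / len(scores):.3f}")
```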
Stratified k-fold cross-validation is very similar to k-fold cross-validation, with some minor changes.
This method is based on stratification, which means arranging the data so that each fold is a good representative of the complete dataset.
It is a good method for handling bias and variance.
For example, in a mobile-price dataset the prices of some mobile gadgets are high compared to others, so the categories are unevenly represented; stratified k-fold cross-validation keeps each fold representative of all of them.
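A minimal sketch, assuming scikit-learn and a made-up imbalanced label, showing how StratifiedKFold keeps the class proportions roughly the same in every fold:

```python
# Sketch of stratified k-fold splitting; the labels are made up for illustration
# (80% of class 0, 20% of class 1) to mimic an imbalanced dataset.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(100).reshape(-1, 1)   # dummy feature matrix
y = np.array([0] * 80 + [1] * 20)   # imbalanced class labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # Each test fold keeps roughly the same 80/20 class ratio as the full dataset
    ratio = y[test_idx].mean()
    print(f"Fold {fold}: test size = {len(test_idx)}, fraction of class 1 = {ratio:.2f}")
```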
Advantages of cross-validation:
- Overfitting: it reduces the risk of overfitting because the model is evaluated on data that was not seen during training.
- Model selection: it makes it possible to compare the performance of different models and select the best one.
- Data efficiency: all of the data is used for both training and testing across the folds, which makes the method data efficient.
Disadvantages of cross-validation:
- Expensive: it has a high computational cost when the model is complex and takes a long time to train, because the model is trained k times.
- Time consuming: the more complex models there are to train and combine, the more time is required for training and testing.
- Bias-variance trade-off: the choice of the number of folds affects the trade-off, since some splits may result in high variance while others may result in high bias.
3.2 Boosting:
3.3 Stumping:
A decision stump is a one-level decision tree: it consists of a single internal node (the root node) connected directly to the terminal nodes (the leaf nodes).
Decision stumps are used as weak learners in ensemble learning techniques such as boosting and bagging.
In this technique the decision is based on a single input feature, so it is also called the 1-rule technique.
When used for binary classification, a threshold value is chosen first: if the input value is greater than the threshold the example is classified as 1, and if it is less than or equal to the threshold it is classified as 0.
There are several ways to build a stump depending on the type of input: for a nominal feature the stump can be built so that each possible value of the feature gets its own leaf, or the values can be grouped into categories with one leaf per category.
[Figure: a decision stump with a single Yes/No split at the root and two leaf nodes labelled Boys and Girls.]
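A minimal sketch, assuming scikit-learn, of a decision stump built as a depth-1 decision tree and used as the weak learner inside AdaBoost; the dataset is illustrative:

```python
# A decision stump is simply a decision tree limited to depth 1 (one split).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)  # one root node, two leaf nodes
# Older scikit-learn versions use base_estimator= instead of estimator=
boosted = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=0)
boosted.fit(X, y)
print("Training accuracy of boosted stumps:", boosted.score(X, y))
```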
3.4 XGBoost:
XGBoost stands for eXtreme Gradient Boosting. It can handle large datasets easily and achieves good performance in classification and regression.
It combines many weak models to build a strong prediction model. This helps us understand the data and make better decisions.
The advantages of XGBoost are its speed, ease of use and good performance on large datasets.
One important feature of XGBoost is that it can handle missing values in real-world data without any preprocessing, and it can train on large datasets in a small amount of time.
It is used in many applications such as recommendation systems, Kaggle competitions, click-through prediction systems and so on.
It allows its parameters to be tuned, so the model can be highly optimized and highly personalized.
It is also useful for managing overfitting, because it applies regularization to the trees and their leaf weights.
It follows parallel learning, so it is easily scalable on clusters. It supports both classification and regression models, as sketched below.
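A minimal sketch, assuming the xgboost Python package is installed, showing a classifier trained on an illustrative dataset that contains missing values (which XGBoost handles without extra preprocessing); the parameter values are illustrative, not recommendations:

```python
# Sketch of XGBoost classification; NaN entries are left in the data on purpose
# to illustrate XGBoost's built-in handling of missing values.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan  # inject 5% missing values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(
    n_estimators=200,   # number of boosted trees (weak models)
    learning_rate=0.1,  # shrinkage applied to each new tree
    max_depth=3,        # depth of each individual tree
    n_jobs=-1,          # parallel tree construction
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```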
Gradient boosting uses a series of models and combines them to achieve a highly accurate model.
To add a new model to the existing ensemble it uses a gradient descent step: the new model is fitted to the gradient of the loss of the current ensemble, as in the sketch below.
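A minimal sketch, using scikit-learn regression trees, of this core gradient boosting idea for squared-error loss: each new weak model is fitted to the residuals (the negative gradient) of the current ensemble. The dataset and parameter values are illustrative.

```python
# Minimal gradient boosting loop for squared-error loss.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

learning_rate = 0.1
prediction = np.full_like(y, y.mean(), dtype=float)  # start from a constant model
trees = []

for _ in range(100):
    residuals = y - prediction                        # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)     # add the new weak model
    trees.append(tree)

print("Training MSE after boosting:", np.mean((y - prediction) ** 2))
```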
Suppose we have a set of input features. If the target output variable we want to predict is in a continuous format, a regression algorithm is used.
In that case it is our responsibility to guide the model with the data so that a highly accurate model can be achieved.
If instead we have a set of input features and want to predict a target output feature that is in a categorical format, a classification algorithm is used. Here the model is guided by past observations in the dataset.
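A minimal sketch, again assuming the xgboost package, of the regression case with a continuous target; a categorical target would use XGBClassifier instead, as shown earlier. The dataset and parameters are illustrative.

```python
# Continuous target -> regression with XGBRegressor.
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
reg = XGBRegressor(n_estimators=100, max_depth=3, learning_rate=0.1)
reg.fit(X, y)
print("Prediction for the first example:", reg.predict(X[:1]))
```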
3.5 Bagging:
The figure above shows how bagging works. Bagging creates subsets of the original data by sampling with replacement: it generates each subset by bootstrap resampling and trains a separate model on each subset.
The final prediction is obtained by averaging or voting over all of the individual prediction models.
A bagging classifier can use different base classifiers such as decision trees, neural networks, linear classifiers and so on.
Algorithm:
- Subsets are created from the original dataset by bootstrap resampling with replacement.
- A base classifier is created for each subset.
- Each classifier is trained on its subset in parallel, independently of the others.
- The final prediction model is developed by averaging or voting over all the predictions, as sketched below.
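A minimal sketch, assuming scikit-learn, of bagging with decision trees as the base classifier; the dataset and parameter values are illustrative:

```python
# Bagging: bootstrap-resampled subsets, one base classifier per subset,
# final prediction by majority vote over all classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base classifier (older scikit-learn: base_estimator=)
    n_estimators=50,                     # number of bootstrap subsets / models
    bootstrap=True,                      # sample each subset with replacement
    n_jobs=-1,                           # train the models in parallel
    random_state=0,
)
bagging.fit(X_train, y_train)
print("Test accuracy:", bagging.score(X_test, y_test))
```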
Review Questions:
Summary