2023AIB1008 Lab08
Ropar
AI211
Machine Learning
LAB REPORT 8
Random Forest
I attended the lecture
2 Data Preprocessing
1. Loaded the dataset as a dataframe and added column headings.
4. Plotted histograms of the numerical columns.
5. Next, we removed outliers from the education-num column.
6. We then analyzed the count plots of the categorical columns and found a
mismatch in the number of unique values between the training (left) and
testing (right) datasets. We fixed this by grouping the extra categories
into an Others category and adding a matching Others dummy column to the
training dataset.
7. Some issues were faced in keeping the number of columns equal between the training and testing datasets.
8. Finally, dummy variables were added and the numerical columns were scaled (these steps are sketched below).
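A minimal sketch of these preprocessing steps, assuming census-income style column names, CSV files named train.csv/test.csv, an IQR outlier rule, and StandardScaler; the actual file names, headings, and rules used in the lab may differ:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical file names and column headings; the ones used in the lab may differ.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country", "income"]

train = pd.read_csv("train.csv", header=None, names=COLUMNS)
test = pd.read_csv("test.csv", header=None, names=COLUMNS)

# Remove education-num outliers with a simple IQR rule (assumed; the lab's exact rule may differ)
q1, q3 = train["education-num"].quantile([0.25, 0.75])
iqr = q3 - q1
train = train[train["education-num"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Align categorical levels: values seen only in the test set are grouped into "Others",
# and the training set gets the same "Others" level so both produce identical dummy columns.
categorical = train.select_dtypes(include="object").columns.drop("income")
for col in categorical:
    extra = set(test[col].unique()) - set(train[col].unique())
    test[col] = test[col].replace(list(extra), "Others")
    train[col] = train[col].astype("category").cat.add_categories("Others")

# One-hot encode the categoricals and scale the numerical columns
train = pd.get_dummies(train, columns=list(categorical))
test = pd.get_dummies(test, columns=list(categorical))
test = test.reindex(columns=train.columns, fill_value=0)   # keep the column sets equal

numeric = ["age", "fnlwgt", "education-num", "capital-gain",
           "capital-loss", "hours-per-week"]
scaler = StandardScaler()
train[numeric] = scaler.fit_transform(train[numeric])
test[numeric] = scaler.transform(test[numeric])
```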
3 Decision Trees
• Entropy Formula:
  $H(Y) = -\sum_{i=1}^{k} p_i \log_2(p_i)$
• The dataset is split into left and right subsets, and subtrees are built
recursively.
• TreeNode objects store feature indices, thresholds, and child nodes for
decision-making (a sketch implementing these pieces follows this list).
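A condensed sketch of the pieces described in this list: the entropy formula, a TreeNode holding the split feature, threshold, and children, and the recursive left/right split. Helper names such as build_tree and predict_one are illustrative rather than the lab's exact code:

```python
import numpy as np

def entropy(y):
    """H(Y) = -sum_i p_i * log2(p_i) over the class proportions in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

class TreeNode:
    """Stores the split feature index, threshold, and child subtrees.
    Leaf nodes only carry a predicted class label in `value`."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value          # set only on leaves

def build_tree(X, y, depth=0, max_depth=10, min_samples_split=2):
    # Stop and create a leaf when the node is pure or the limits are reached
    if depth >= max_depth or len(y) < min_samples_split or len(np.unique(y)) == 1:
        values, counts = np.unique(y, return_counts=True)
        return TreeNode(value=values[np.argmax(counts)])

    # Pick the (feature, threshold) pair with the largest information gain
    best_gain, best_feat, best_thr = 0.0, None, None
    parent_entropy = entropy(y)
    for feat in range(X.shape[1]):
        for thr in np.unique(X[:, feat]):
            left_y = y[X[:, feat] <= thr]
            right_y = y[X[:, feat] > thr]
            if len(left_y) == 0 or len(right_y) == 0:
                continue
            child = (len(left_y) * entropy(left_y) + len(right_y) * entropy(right_y)) / len(y)
            gain = parent_entropy - child
            if gain > best_gain:
                best_gain, best_feat, best_thr = gain, feat, thr

    if best_feat is None:           # no useful split found: make a leaf
        values, counts = np.unique(y, return_counts=True)
        return TreeNode(value=values[np.argmax(counts)])

    # Split into left/right subsets and build the subtrees recursively
    mask = X[:, best_feat] <= best_thr
    left = build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples_split)
    right = build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples_split)
    return TreeNode(feature=best_feat, threshold=best_thr, left=left, right=right)

def predict_one(node, x):
    # Walk down the tree until a leaf is reached
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value
```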
4 Random Forest
1. The Random Forest classifier is initialized with parameters such as the number of
trees, tree depth, minimum samples to split, pruning ratio, and the number of
features to sample at each split.
2. The fit method builds n_trees decision trees using bootstrap sampling of
the training data (a minimal sketch follows this list).
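A minimal sketch of such a classifier, reusing build_tree and predict_one from the previous sketch. The parameter names (n_trees, max_depth, min_samples_split, n_features) are stand-ins for whatever the lab code uses; for brevity the feature subset is sampled once per tree rather than at every split, and pruning is omitted:

```python
import numpy as np

class RandomForest:
    def __init__(self, n_trees=10, max_depth=10, min_samples_split=2, n_features=None):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.n_features = n_features          # how many features each tree sees
        self.trees = []                       # list of (tree_root, feature_indices)

    def fit(self, X, y):
        n_samples, n_total = X.shape
        k = self.n_features or int(np.sqrt(n_total))
        rng = np.random.default_rng()
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw n_samples rows with replacement,
            # plus a random feature subset for this tree
            rows = rng.integers(0, n_samples, size=n_samples)
            feats = rng.choice(n_total, size=k, replace=False)
            root = build_tree(X[rows][:, feats], y[rows],
                              max_depth=self.max_depth,
                              min_samples_split=self.min_samples_split)
            self.trees.append((root, feats))

    def predict(self, X):
        # Majority vote over the individual tree predictions
        # (assumes integer-encoded class labels)
        preds = np.array([[predict_one(root, x[feats]) for root, feats in self.trees]
                          for x in X])
        return np.array([np.bincount(row.astype(int)).argmax() for row in preds])
```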
5 Testing & Final Result
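A brief sketch of how the trained forest could be evaluated on the held-out test split, reusing the train/test frames and the RandomForest class from the earlier sketches; the label encoding and the metric reported in the lab may differ:

```python
from sklearn.metrics import accuracy_score

# Assumed target encoding (">50K" as the positive class); the lab's label handling may differ.
X_train = train.drop(columns=["income"]).to_numpy(dtype=float)
y_train = (train["income"] == ">50K").astype(int).to_numpy()
X_test = test.drop(columns=["income"]).to_numpy(dtype=float)
y_test = (test["income"] == ">50K").astype(int).to_numpy()

forest = RandomForest(n_trees=20, max_depth=10)
forest.fit(X_train, y_train)

y_pred = forest.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```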