
Indian Institute of Technology, Ropar

AI211
Machine Learning

LAB REPORT 8
Random Forest
I attended the lecture

Student Name: Kamakshi Gupta
Student ID: 2023AIB1008

Submission Date: 4/04/2025


1 Overview
1. Decision Trees - A decision tree is a supervised learning algorithm used for classification and regression tasks that recursively splits data based on feature values. It consists of a root node representing the entire dataset, internal nodes making decisions, branches indicating possible outcomes, and leaf nodes providing the final prediction. The splitting criterion can be Entropy and Information Gain (for classification) or Mean Squared Error (MSE) (for regression). Key hyperparameters include max depth, minimum samples split, and pruning, which help control overfitting and improve generalization. Decision trees are easy to interpret and handle both numerical and categorical data, but can overfit if not properly tuned. Despite their simplicity, they serve as the foundation for more advanced models such as Random Forests and Gradient Boosting.

2. Random Forest - A random forest is an ensemble learning method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting. It works by randomly selecting subsets of data and features for training each tree, ensuring diversity in decision-making. The final prediction is made by majority voting for classification or averaging for regression. Key hyperparameters include the number of trees, maximum depth, and minimum samples per split, which help balance bias and variance. Random forests are robust to noise, handle missing values well, and provide feature importance rankings, making them widely used in various applications. However, they require more computational power compared to a single decision tree (a brief reference example follows this list).
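
For reference, the hyperparameters mentioned above map directly onto scikit-learn's off-the-shelf implementation. The snippet below is only an illustrative example on synthetic data; the report itself implements the trees and the forest from scratch in the later sections, and the parameter values here are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only to illustrate the hyperparameters discussed above
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_estimators = number of trees; max_depth and min_samples_split limit tree growth;
# max_features="sqrt" gives each split a random subset of features
rf = RandomForestClassifier(n_estimators=100, max_depth=10,
                            min_samples_split=5, max_features="sqrt",
                            random_state=0)
rf.fit(X_train, y_train)

print("test accuracy:", rf.score(X_test, y_test))
print("top feature importances:", sorted(rf.feature_importances_)[-3:])
```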

2 Data Preprocessing
1. Loaded the dataset as a dataframe and added headings.

2. Encoded >50K as 1 and ≤50K as 0 for ease of classification.

3. Next we visualised the data.

4. Plotted histograms of the features.

5. Next we removed outliers from the education-num column.

6. We then analyzed countplots of the categorical columns and found a mismatch in the number of unique values between the training (left) and testing (right) datasets. We fixed it by grouping the extra categories into an Others category and adding a corresponding dummy column to the training dataset.

7. Some issues were faced in keeping the number of columns equal between the training and testing sets.

8. Finally, dummy variables were added and the numerical columns were scaled (a code sketch of these steps is given below).
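
The preprocessing code itself is not included in this extract. The sketch below shows one way the steps above could look, assuming the UCI Adult (census income) dataset (suggested by the >50K/≤50K labels and the education-num column) and pandas/scikit-learn; the file names, column list, outlier threshold, and choice of StandardScaler are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed column names for the UCI Adult dataset; the report does not list them explicitly
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country", "income"]

# Hypothetical file names; the Adult test split may need its leading comment row skipped
train = pd.read_csv("adult.data", names=COLUMNS, skipinitialspace=True)
test = pd.read_csv("adult.test", names=COLUMNS, skipinitialspace=True, skiprows=1)

# Step 2: encode the target, >50K -> 1 and <=50K -> 0 (the test split labels carry a trailing '.')
for df in (train, test):
    df["income"] = df["income"].str.rstrip(".").map({">50K": 1, "<=50K": 0})

# Step 5: drop education-num outliers (the cutoff used here is an assumption)
train = train[train["education-num"] > 2]

# Steps 6-8: one-hot encode categoricals, then align train/test so both have the same
# dummy columns; categories present in only one split become all-zero columns,
# which plays the same role as the "Others" dummy column described in step 6
train_enc = pd.get_dummies(train)
test_enc = pd.get_dummies(test)
train_enc, test_enc = train_enc.align(test_enc, join="outer", axis=1, fill_value=0)

# Scale the numerical columns with statistics from the training split only
num_cols = ["age", "fnlwgt", "education-num", "capital-gain",
            "capital-loss", "hours-per-week"]
scaler = StandardScaler()
train_enc[num_cols] = scaler.fit_transform(train_enc[num_cols])
test_enc[num_cols] = scaler.transform(test_enc[num_cols])
```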

3 Decision Trees
• Entropy Formula:

  H(Y) = -\sum_{i=1}^{k} p_i \log_2(p_i)

  where H(Y) is the entropy, p_i is the probability of class i, and k is the number of classes.

• Information Gain Calculation:

  IG(X_f) = H(Y) - \sum_{v \in V} \frac{|Y_v|}{|Y|} H(Y_v)

  where V is the set of unique values of feature X_f and Y_v is the subset of labels for which X_f = v.


• Decision Trees are initialized with optional max_depth, min_samples_split, and pruning_ratio parameters.
• The fit method calls build_tree to construct the decision tree recursively.
• The algorithm stops splitting if the depth limit is reached, too few samples remain, or all labels are the same.
• The best split is found by iterating over all features and selecting candidate thresholds using percentiles.
• Entropy and information gain are calculated for each possible split, and the best feature-threshold pair is chosen.
• Random pruning is applied based on pruning_ratio to reduce overfitting.

• The dataset is split into left and right subsets, and subtrees are built recursively.
• TreeNode objects store feature indices, thresholds, and child nodes for decision-making (see the sketch below).
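
The implementation code is not reproduced in this extract, so the sketch below is one plausible reading of the bullets above, using NumPy and the names mentioned in the report (TreeNode, build_tree, max_depth, min_samples_split, pruning_ratio). The specific percentile grid for thresholds and the interpretation of random pruning (turning a node into a leaf with probability pruning_ratio) are assumptions.

```python
import numpy as np

class TreeNode:
    """Internal nodes store a feature index, threshold and children; leaves store a class label."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.value = left, right, value

def entropy(y):
    # H(Y) = -sum_i p_i log2(p_i) over the class probabilities
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, y_left, y_right):
    # IG = H(parent) - weighted average entropy of the two children
    w = len(y_left) / len(y)
    return entropy(y) - (w * entropy(y_left) + (1 - w) * entropy(y_right))

def build_tree(X, y, depth=0, max_depth=10, min_samples_split=2,
               pruning_ratio=0.0, rng=np.random.default_rng(0)):
    # Stop at the depth limit, when too few samples remain, when the node is pure,
    # or (random pruning) with probability pruning_ratio
    if (depth >= max_depth or len(y) < min_samples_split
            or len(np.unique(y)) == 1 or rng.random() < pruning_ratio):
        return TreeNode(value=np.bincount(y).argmax())  # majority class (integer labels assumed)

    # Search all features; candidate thresholds are percentiles of the feature values
    best_gain, best_feature, best_threshold = 0.0, None, None
    for f in range(X.shape[1]):
        for t in np.percentile(X[:, f], [25, 50, 75]):
            mask = X[:, f] <= t
            if mask.all() or not mask.any():
                continue
            gain = information_gain(y, y[mask], y[~mask])
            if gain > best_gain:
                best_gain, best_feature, best_threshold = gain, f, t

    if best_feature is None:                            # no split improves entropy -> leaf
        return TreeNode(value=np.bincount(y).argmax())

    # Split into left/right subsets and build the subtrees recursively
    mask = X[:, best_feature] <= best_threshold
    left = build_tree(X[mask], y[mask], depth + 1, max_depth,
                      min_samples_split, pruning_ratio, rng)
    right = build_tree(X[~mask], y[~mask], depth + 1, max_depth,
                       min_samples_split, pruning_ratio, rng)
    return TreeNode(best_feature, best_threshold, left, right)

def predict_one(node, x):
    # Follow thresholds down the tree until a leaf is reached
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value
```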

4 Random Forest
1. The Random Forest Classifier is initialized with parameters like the number of trees, tree depth, minimum samples to split, pruning ratio, and number of features to sample at each split.

2. The fit method creates n_trees decision trees using bootstrap sampling of the training data.

3. Each tree is trained using the DecisionTreeClassifier class with the specified hyperparameters.

4. During training, each tree considers a random subset of features (typically sqrt(n_features)) for finding the best split.

5. Random pruning is applied to each tree to prevent overfitting (a minimal sketch of this procedure is given below).
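
Again as a hedged sketch rather than the report's actual code: the class below reuses build_tree and predict_one from the previous section (the report instead wraps each tree in its DecisionTreeClassifier class), draws a bootstrap sample for each tree, and aggregates by majority vote. For brevity it samples the random feature subset once per tree, whereas the report describes sampling sqrt(n_features) features at each split; integer class labels are assumed.

```python
import numpy as np

class RandomForestClassifier:
    """Minimal sketch: n_trees bootstrap-sampled decision trees combined by majority vote."""
    def __init__(self, n_trees=50, max_depth=10, min_samples_split=2,
                 pruning_ratio=0.0, max_features=None, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.pruning_ratio = pruning_ratio
        self.max_features = max_features          # defaults to sqrt(n_features) in fit
        self.rng = np.random.default_rng(seed)
        self.trees = []                           # (feature_subset, root_node) pairs

    def fit(self, X, y):
        n_samples, n_features = X.shape
        k = self.max_features or int(np.sqrt(n_features))
        for _ in range(self.n_trees):
            rows = self.rng.integers(0, n_samples, size=n_samples)      # bootstrap sample
            feats = self.rng.choice(n_features, size=k, replace=False)  # random feature subset
            root = build_tree(X[rows][:, feats], y[rows],
                              max_depth=self.max_depth,
                              min_samples_split=self.min_samples_split,
                              pruning_ratio=self.pruning_ratio,
                              rng=self.rng)
            self.trees.append((feats, root))
        return self

    def predict(self, X):
        # Majority vote over the per-tree predictions for each sample
        votes = np.array([[predict_one(root, x[feats]) for feats, root in self.trees]
                          for x in X])
        return np.array([np.bincount(v).argmax() for v in votes])
```

Usage would be along the lines of rf = RandomForestClassifier(n_trees=100, max_depth=10).fit(X_train, y_train) followed by preds = rf.predict(X_test), with accuracy computed as (preds == y_test).mean().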

5 Testing & Final Result

