Random Forest Reference Code

The document discusses random forest classification models. It shows a random forest model built with 500 trees, trying 3 variables at each split. The out-of-bag (OOB) error estimate provides an assessment of the model's performance without a separate test set. Variable importance is assessed with a variable importance plot that sorts variables by MeanDecreaseGini. The random forest model achieves 99% accuracy on both the training and test sets, indicating stability.


Classification

Random Forest

Random Forest
#Random Forest model
library(randomForest)

modelrf <- randomForest(as.factor(left) ~ . , data = trainSplit, do.trace = TRUE)
modelrf

The random forest output tells us that 500 trees were built and that 3 variables were tried at each split. The out-of-bag (OOB) estimate of the generalization error is the error rate of the out-of-bag classifier on the training set. The OOB estimate is about as accurate as using a test set of the same size as the training set, so relying on it removes the need for a set-aside test set.
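As a minimal sketch (assuming the fitted modelrf object above), the OOB error can also be read directly from the err.rate matrix stored in the model, where the last row holds the estimate after all 500 trees have been grown:

#OOB error rate after the final tree (column "OOB" of the per-tree error matrix)
oob_err <- modelrf$err.rate[nrow(modelrf$err.rate), "OOB"]
oob_err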

Random Forest
#Checking variable importance in Random Forest
importance(modelrf)

varImpPlot(modelrf)

The variable importance plot displays the predictors sorted by MeanDecreaseGini, with the most important variables at the top.
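A minimal sketch (using modelrf from above) for inspecting the same ranking numerically rather than graphically:

#Sort the importance matrix by MeanDecreaseGini, largest first
imp <- importance(modelrf)
imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]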

Random Forest
# Prediction and Model Evaluation using Confusion Matrix
library(caret)   #provides confusionMatrix()

predrf_tr <- predict(modelrf, trainSplit)    #Train Data
predrf_test <- predict(modelrf, testSplit)   #Test Data

confusionMatrix(predrf_tr, trainSplit$left)      #Train Data
confusionMatrix(predrf_test, testSplit$left)     #Test Data

The confusion matrix on the train data gives an accuracy of 99%, and the confusion matrix on the test data also gives an accuracy of 99%.
Since the model shows similar performance on the train and test data, we can be confident that our Random Forest model is stable.
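A minimal sketch (using the predictions above) of pulling the accuracy figure out of the confusionMatrix object instead of reading it off the printed output:

#Overall test-set accuracy as a single number
cm_test <- confusionMatrix(predrf_test, testSplit$left)
cm_test$overall["Accuracy"]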
Comparing ROC curves for Decision Tree and Random Forest

# Comparing ROC curves for Decision Tree and Random Forest
library(pROC)

#Decision Tree ROC (predtest holds the decision tree predictions from the earlier slides)
auc1 <- roc(as.numeric(testSplit$left), as.numeric(predtest))
plot(auc1, col = 'blue', main = paste('AUC:', round(auc1$auc[[1]], 3)))

#Random Forest ROC (using the test-set predictions predrf_test)
aucrf <- roc(as.numeric(testSplit$left), as.numeric(predrf_test), ci = TRUE)
plot(aucrf, ylim = c(0, 1), print.thres = TRUE,
     main = paste('Random Forest AUC:', round(aucrf$auc[[1]], 3)), col = 'blue')

#Comparing both ROC curves on the same plot
plot(aucrf, ylim = c(0, 1), main = 'ROC Comparison: RF (blue), C5.0 (black)', col = 'blue')
par(new = TRUE)
plot(auc1)
The ROC curve for the Random Forest is better than that of the Decision Tree.
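As an optional sketch (using the auc1 and aucrf objects above), pROC can also compare the two curves formally rather than visually:

#DeLong test for the difference between the two AUCs
roc.test(auc1, aucrf)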

Classification Model
Naïve Bayes

Naïve Bayes
#Naive Bayes
library(e1071)

modelnb <- naiveBayes(as.factor(left) ~ . , data = trainSplit)
modelnb

The output shows the a-priori (class prior) probabilities of the target variable, followed by the conditional probability tables for each predictor.
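A minimal sketch (using the modelnb object above) for inspecting those pieces individually rather than printing the whole model:

modelnb$apriori       #class distribution of the target variable 'left'
modelnb$tables[1:2]   #conditional probability tables for the first two predictors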
Naïve Bayes
#Performance of Naïve Bayes using Confusion Matrix
prednb_tr <- predict(modelnb,trainSplit) #Train Data
prednb_test <- predict(modelnb,testSplit) #Test Data

confusionMatrix(prednb_tr,trainSplit$left) #Train Data


confusionMatrix(prednb_test,testSplit$left) #Test Data

The confusion matrix on the train data gives an accuracy of 78.84%, and the confusion matrix on the test data gives an accuracy of 78.58%.

Since the model shows similar performance on the train and test data, we can be confident that our Naïve Bayes model is stable.

Classification Model
kNN Algorithm

kNN Algorithm
#Data Preparation for kNN Algorithm
library(dummies)

#Creating dummy variables for the factor variables
dummy_df = dummy.data.frame(hr_data1[, c('role_code', 'salary.code')])

hr_data2 = hr_data1
hr_data2 = cbind.data.frame(hr_data2, dummy_df)

#Removing role_code and salary.code since we have created dummy variables
hr_data2 = hr_data2[, !(names(hr_data2) %in% c('role_code', 'salary.code'))]

#Converting variables to numeric datatype
hr_data2$Work_accident = as.numeric(hr_data2$Work_accident)
hr_data2$promotion_last_5years = as.numeric(hr_data2$promotion_last_5years)
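The dummies package has since been archived on CRAN; if it is unavailable, a minimal sketch using base R's model.matrix (keeping every factor level, as dummy.data.frame does) would be:

#Base-R alternative to dummy.data.frame(): full indicator columns for both factors
f <- hr_data1[, c('role_code', 'salary.code')]
dummy_df <- as.data.frame(
  model.matrix(~ . - 1, data = f,
               contrasts.arg = lapply(f, contrasts, contrasts = FALSE))
)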

kNN Algorithm
#Data Preparation for kNN Algorithm

#Scale the variables and check their final structure
X = hr_data2[, !(names(hr_data2) %in% c('left'))]
hr_data2_scaled = as.data.frame(scale(X))

str(hr_data2_scaled)

#Splitting the data for the model building
hr_train <- hr_data2_scaled[splitIndex,]
hr_test <- hr_data2_scaled[-splitIndex,]

hr_train_labels <- hr_data2[splitIndex, 'left']
hr_test_labels <- hr_data2[-splitIndex, 'left']
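As a quick sanity check (a minimal sketch, using hr_data2_scaled from above), the scaled columns should now have mean roughly 0 and standard deviation roughly 1:

round(colMeans(hr_data2_scaled), 3)        #all values should be approximately 0
round(apply(hr_data2_scaled, 2, sd), 3)    #all values should be approximately 1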

kNN Algorithm
#Applying kNN Algorithm on the dataset
library(class)
library(gmodels)

test_pred_1 <- knn(train = hr_train, test = hr_test, cl = hr_train_labels, k = 1)

CrossTable(x = hr_test_labels, y = test_pred_1, prop.chisq = FALSE)

From this crosstab we can compute the accuracy of the model for k = 1:

Accuracy = (TP + TN) / Total
         = (3311 + 1030) / 4499
         = 96.48%
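A minimal sketch (using the objects defined above) that computes the same figure directly instead of reading the cell counts off the crosstab:

#Accuracy = proportion of test labels the k = 1 model predicts correctly
mean(test_pred_1 == hr_test_labels)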

kNN Algorithm
#Applying kNN Algorithm on the dataset

As we did for k = 1, we can calculate the accuracy for k = 5, 10, 50, 100 and 122 (a short loop for doing this is sketched below). The accuracies are summarized in the table:

k       Accuracy
5       94.46%
10      94.17%
50      90.19%
100     86.48%
122     85.06%

From the accuracy table above, we observe that the accuracy decreases as k increases.
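A minimal sketch (assuming hr_train, hr_test and hr_train_labels from the earlier slides) of the loop used to fill in the table above:

#Fit kNN for each candidate k and record test-set accuracy
ks <- c(1, 5, 10, 50, 100, 122)
acc <- sapply(ks, function(k) {
  pred <- knn(train = hr_train, test = hr_test, cl = hr_train_labels, k = k)
  mean(pred == hr_test_labels)
})
data.frame(k = ks, accuracy = round(acc * 100, 2))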

kNN Algorithm
# Thumb rule to decide on k for k-NN is sqrt(n)/2
k = sqrt(nrow(hr_train))/2
k
#51.2347 (which can be approximated to 51)

test_pred_rule <- knn(train = hr_train, test = hr_test, cl = hr_train_labels, k = round(k))
CrossTable(x = hr_test_labels, y = test_pred_rule, prop.chisq = FALSE)
# accuracy = 4050/4499 = 90.02%

# Another method to determine k for k-NN: repeated cross-validation with caret
set.seed(400)
ct <- trainControl(method = "repeatedcv", repeats = 3)
fit <- train(as.factor(left) ~ ., data = hr_data2, method = "knn", trControl = ct,
             preProcess = c("center", "scale"), tuneLength = 20)
fit

# Checking accuracy of the model with k = 7
test_pred_7 <- knn(train = hr_train, test = hr_test, cl = hr_train_labels, k = 7)
CrossTable(x = hr_test_labels, y = test_pred_7, prop.chisq = FALSE)
# accuracy = 4357/4499 = 96.84%

#or alternatively we can use the command below (predictions first, then the reference labels)
confusionMatrix(test_pred_7, hr_test_labels)

Output on the next slide.


The output above indicates that k = 7 is the best value of k for this data, and it is preferable to use this value because it has been selected through cross-validation.
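A minimal sketch (using the fit object from the caret tuning above) for reading the selected k and visualizing the tuning curve:

fit$bestTune   #the value of k chosen by repeated cross-validation
plot(fit)      #accuracy as a function of k across the tuneLength grid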

Step 6
Model Summarization

Summary of Model Performance

Model Accuracy
Decision Tree 97.09%
Random Forest 99%
Naïve Bayes 78.84%
kNN Algorithm (Using k = 7) 96.84%

Appendix
Packages used for the Classification Analysis:

•data.table
•reshape2
•randomForest
•party # For decision tree
•rpart # for Rpart
•rpart.plot #for Rpart plot
•lattice # Used for Data Visualization
•caret # for data pre-processing, confusionMatrix() and train()
•pROC # for ROC curve
•corrplot # for correlation plot
•e1071 # for Naïve Bayes
•RColorBrewer
•dummies
•class
•gmodels

Thank You.
