
DM Assignment

Santhosh Sadasivam

12/11/2019

Description
Thera Bank - Loan Purchase Modeling
This case is about a bank (Thera Bank) which has a growing customer base. The majority of
these customers are liability customers (depositors) with deposits of varying sizes. The
number of customers who are also borrowers (asset customers) is quite small, and the
bank is interested in expanding this base rapidly to bring in more loan business and, in the
process, earn more through the interest on loans. In particular, the management wants to
explore ways of converting its liability customers to personal loan customers (while
retaining them as depositors). A campaign that the bank ran last year for liability
customers showed a healthy conversion rate of over 9%. This has encouraged the
retail marketing department to devise campaigns with better target marketing to increase
the success ratio with a minimal budget. The department wants to build a model that will
help them identify the potential customers who have a higher probability of purchasing the
loan. This will increase the success ratio while at the same time reducing the cost of the
campaign. The dataset has data on 5,000 customers. The data include customer
demographic information (age, income, etc.), the customer's relationship with the bank
(mortgage, securities account, etc.), and the customer response to the last personal loan
campaign (Personal Loan). Among these 5,000 customers, only 480 (= 9.6%) accepted the
personal loan that was offered to them in the earlier campaign.
Problem Statement
# Loading the required libraries
library(readxl)
library(readr)
library(DataExplorer)
library(caTools)
library(rpart)
library(rpart.plot)
library(rattle)

## Rattle: A free graphical interface for data science with R.


## Version 5.2.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.

library(data.table)
library(ROCR)
## Loading required package: gplots

##
## Attaching package: 'gplots'

## The following object is masked from 'package:stats':


##
## lowess

library(ineq)
library(InformationValue)
library(ModelMetrics)

##
## Attaching package: 'ModelMetrics'

## The following objects are masked from 'package:InformationValue':


##
## confusionMatrix, npv, precision, sensitivity, specificity

## The following object is masked from 'package:base':


##
## kappa

library(reshape)

##
## Attaching package: 'reshape'

## The following object is masked from 'package:data.table':


##
## melt

library(randomForest)

## randomForest 4.6-14

## Type rfNews() to see new features/changes/bug fixes.

##
## Attaching package: 'randomForest'

## The following object is masked from 'package:rattle':


##
## importance

# Setting the working directory


setwd("C:/Users/santhosh/Desktop/R programming/DM Assignment")
getwd()

## [1] "C:/Users/santhosh/Desktop/R programming/DM Assignment"

# Reading the data file


data = read_xlsx("Thera Bank_Personal_Loan_Modelling-dataset-1.xlsx",2)
# Exploratory data analysis
## Univariate Analysis
print(summary(data))

## ID Age (in years) Experience (in years) Income (in K/month)


## Min. : 1 Min. :23.00 Min. :-3.0 Min. : 8.00
## 1st Qu.:1251 1st Qu.:35.00 1st Qu.:10.0 1st Qu.: 39.00
## Median :2500 Median :45.00 Median :20.0 Median : 64.00
## Mean :2500 Mean :45.34 Mean :20.1 Mean : 73.77
## 3rd Qu.:3750 3rd Qu.:55.00 3rd Qu.:30.0 3rd Qu.: 98.00
## Max. :5000 Max. :67.00 Max. :43.0 Max. :224.00
##
## ZIP Code Family members CCAvg Education
## Min. : 9307 Min. :1.000 Min. : 0.000 Min. :1.000
## 1st Qu.:91911 1st Qu.:1.000 1st Qu.: 0.700 1st Qu.:1.000
## Median :93437 Median :2.000 Median : 1.500 Median :2.000
## Mean :93153 Mean :2.397 Mean : 1.938 Mean :1.881
## 3rd Qu.:94608 3rd Qu.:3.000 3rd Qu.: 2.500 3rd Qu.:3.000
## Max. :96651 Max. :4.000 Max. :10.000 Max. :3.000
## NA's :18
## Mortgage Personal Loan Securities Account CD Account
## Min. : 0.0 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 0.0 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 0.0 Median :0.000 Median :0.0000 Median :0.0000
## Mean : 56.5 Mean :0.096 Mean :0.1044 Mean :0.0604
## 3rd Qu.:101.0 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :635.0 Max. :1.000 Max. :1.0000 Max. :1.0000
##
## Online CreditCard
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.000
## Median :1.0000 Median :0.000
## Mean :0.5968 Mean :0.294
## 3rd Qu.:1.0000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.000
##

attach(data)

# The summary above shows the Mean, Median, Min, Max, 1st Qu. and 3rd Qu. for each column

18 NAs are observed in the variable Family members.


## Data Types
print(colnames(data))

## [1] "ID" "Age (in years)"


## [3] "Experience (in years)" "Income (in K/month)"
## [5] "ZIP Code" "Family members"
## [7] "CCAvg" "Education"
## [9] "Mortgage" "Personal Loan"
## [11] "Securities Account" "CD Account"
## [13] "Online" "CreditCard"

str(data)

## Classes 'tbl_df', 'tbl' and 'data.frame': 5000 obs. of 14 variables:


## $ ID : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Age (in years) : num 25 45 39 35 35 37 53 50 35 34 ...
## $ Experience (in years): num 1 19 15 9 8 13 27 24 10 9 ...
## $ Income (in K/month) : num 49 34 11 100 45 29 72 22 81 180 ...
## $ ZIP Code : num 91107 90089 94720 94112 91330 ...
## $ Family members : num 4 3 1 1 4 4 2 1 3 1 ...
## $ CCAvg : num 1.6 1.5 1 2.7 1 0.4 1.5 0.3 0.6 8.9 ...
## $ Education : num 1 1 1 2 2 2 2 3 2 3 ...
## $ Mortgage : num 0 0 0 0 0 155 0 0 104 0 ...
## $ Personal Loan : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Securities Account : num 1 1 0 0 0 0 0 0 0 0 ...
## $ CD Account : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Online : num 0 0 0 0 0 1 1 0 1 0 ...
## $ CreditCard : num 0 0 0 0 1 0 0 1 0 0 ...

# All variables are read in as numeric, and the data is a mix of tibble and data.frame

data = data.frame(data) # converting dataset into dataframe


# All variables are stored as numbers, but a few of them need to be converted into factors

# Converting required variables into factors


data$Online=as.factor(data$Online)
data$Personal.Loan = as.factor(data$Personal.Loan)
data$Education=as.factor(data$Education)
data$Securities.Account=as.factor(data$Securities.Account)
data$CD.Account=as.factor(data$CD.Account)
data$CreditCard = as.factor(data$CreditCard)
data$Family.members = as.factor(data$Family.members)
print(str(data))

## 'data.frame': 5000 obs. of 14 variables:


## $ ID : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Age..in.years. : num 25 45 39 35 35 37 53 50 35 34 ...
## $ Experience..in.years.: num 1 19 15 9 8 13 27 24 10 9 ...
## $ Income..in.K.month. : num 49 34 11 100 45 29 72 22 81 180 ...
## $ ZIP.Code : num 91107 90089 94720 94112 91330 ...
## $ Family.members : Factor w/ 4 levels "1","2","3","4": 4 3 1 1 4 4 2 1 3 1 ...
## $ CCAvg : num 1.6 1.5 1 2.7 1 0.4 1.5 0.3 0.6 8.9 ...
## $ Education : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 3 2 3 ...
## $ Mortgage : num 0 0 0 0 0 155 0 0 104 0 ...
## $ Personal.Loan : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
## $ Securities.Account : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 1 1 ...
## $ CD.Account : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Online : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 2 1 2 1 ...
## $ CreditCard : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...
## NULL

# Few more univariate analysis


print(dim(data))

## [1] 5000 14

The dataset has 5000 rows and 14 columns


# Column Names
print(colnames(data))

## [1] "ID" "Age..in.years."


## [3] "Experience..in.years." "Income..in.K.month."
## [5] "ZIP.Code" "Family.members"
## [7] "CCAvg" "Education"
## [9] "Mortgage" "Personal.Loan"
## [11] "Securities.Account" "CD.Account"
## [13] "Online" "CreditCard"

# Making column names syntactically valid

colnames(data)=make.names(colnames(data))
print(colnames(data))

## [1] "ID" "Age..in.years."


## [3] "Experience..in.years." "Income..in.K.month."
## [5] "ZIP.Code" "Family.members"
## [7] "CCAvg" "Education"
## [9] "Mortgage" "Personal.Loan"
## [11] "Securities.Account" "CD.Account"
## [13] "Online" "CreditCard"

# Removing the first column ID (a sequential number) and ZIP Code, as neither is
# required for our processing
data = data[,-1] # removing the first column, ID
data = data[,-4] # removing ZIP Code (the 4th column after dropping ID)

# Identifying NA in the dataset


sum(is.na(data))

## [1] 18
There are 18 NAs in the dataset; as observed earlier, all 18 are in the Family members column.
# Proportion of responders and non-responders to the personal loan campaign

prop.table(table(data$Personal.Loan))*100

##
## 0 1
## 90.4 9.6

9.6% of customers responded to the personal loan campaign, while 90.4% did not respond.
# missing values and plotting them
plot_missing(data)

colSums(is.na(data))

## Age..in.years. Experience..in.years. Income..in.K.month.


## 0 0 0
## Family.members CCAvg Education
## 18 0 0
## Mortgage Personal.Loan Securities.Account
## 0 0 0
## CD.Account Online CreditCard
## 0 0 0

Family members has 0.36% missing values. Since the percentage is low, we can
delete those rows from the dataset.
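Deleting these rows is the simplest option given the small percentage. A hedged alternative, sketched below on a hypothetical copy called data_imputed (so the data frame used in the rest of the analysis is untouched), would be to impute the missing values with the most frequent family size instead:

# Hedged alternative (illustration only, not applied below): impute the 18
# missing Family.members values with the most frequent level
mode_level = names(which.max(table(data$Family.members))) # modal family size
data_imputed = data # hypothetical copy, keeps 'data' unchanged
data_imputed$Family.members[is.na(data_imputed$Family.members)] = mode_level
sum(is.na(data_imputed$Family.members)) # should now be 0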
# Missing Value Treatment
print.data.frame(data[!complete.cases(data),]) # showing rows where NA is present

## Age..in.years. Experience..in.years. Income..in.K.month.


## 21 56 31 25
## 59 28 2 93
## 99 49 23 94
## 162 61 35 80
## 236 38 8 71
## 290 42 15 24
## 488 39 13 88
## 722 49 24 39
## 1461 40 16 85
## 1462 54 28 48
## 2400 62 36 41
## 2833 45 21 133
## 3702 58 33 95
## 4136 48 23 168
## 4139 47 22 114
## 4403 55 25 52
## 4404 50 24 112
## 4764 51 25 173
## Family.members CCAvg Education Mortgage Personal.Loan
## 21 <NA> 0.9 2 111 0
## 59 <NA> 0.2 1 0 0
## 99 <NA> 0.3 1 0 0
## 162 <NA> 2.8 1 0 0
## 236 <NA> 1.8 3 0 0
## 290 <NA> 1.0 2 0 0
## 488 <NA> 1.4 2 0 0
## 722 <NA> 1.4 3 0 0
## 1461 <NA> 0.2 3 0 0
## 1462 <NA> 0.2 1 0 0
## 2400 <NA> 1.0 3 154 0
## 2833 <NA> 5.7 3 0 1
## 3702 <NA> 2.6 1 0 0
## 4136 <NA> 2.8 1 308 0
## 4139 <NA> 0.6 1 0 0
## 4403 <NA> 1.4 3 207 0
## 4404 <NA> 0.0 1 0 0
## 4764 <NA> 0.5 2 0 1
## Securities.Account CD.Account Online CreditCard
## 21 0 0 1 0
## 59 0 0 0 0
## 99 0 0 1 0
## 162 0 0 1 0
## 236 0 0 1 0
## 290 0 0 1 1
## 488 0 0 1 1
## 722 0 0 1 0
## 1461 0 0 1 1
## 1462 0 0 1 0
## 2400 1 0 1 0
## 2833 0 1 1 1
## 3702 0 0 1 0
## 4136 0 0 1 0
## 4139 1 1 1 1
## 4403 1 0 0 0
## 4404 0 0 0 0
## 4764 0 0 1 0

data = na.omit(data) # deleting rows containing NAs


colSums(is.na(data)) # verifying that no NAs remain in any column

## Age..in.years. Experience..in.years. Income..in.K.month.


## 0 0 0
## Family.members CCAvg Education
## 0 0 0
## Mortgage Personal.Loan Securities.Account
## 0 0 0
## CD.Account Online CreditCard
## 0 0 0

All rows that had NAs have been removed completely


# Negative values - as observed earlier, years of experience has negative values,
# which is not possible. Also checking the other columns.

print(colSums(data<0))

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Age..in.years. Experience..in.years. Income..in.K.month.


## 0 52 0
## Family.members CCAvg Education
## NA 0 NA
## Mortgage Personal.Loan Securities.Account
## 0 NA NA
## CD.Account Online CreditCard
## NA NA NA

cat("Total Negative Values:",


(length(data$Experience..in.years.[data$Experience..in.years.<0])/nrow(data))
*100 , "%")

## Total Negative Values: 1.043758 %

Professional experience has 52 negative values (about 1.04% of the rows).
Since experience cannot be negative, we need to treat these values.
# Negative Value Treatment

data[data$Experience..in.years. < 0, "Experience..in.years."] =
  mean(data$Experience..in.years.[data$Experience..in.years. >= 0])

print(colSums(data < 0))

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Warning in Ops.factor(left, right): '<' not meaningful for factors

## Age..in.years. Experience..in.years. Income..in.K.month.


## 0 0 0
## Family.members CCAvg Education
## NA 0 NA
## Mortgage Personal.Loan Securities.Account
## 0 NA NA
## CD.Account Online CreditCard
## NA NA NA

Negative values have been replaced with the mean of the non-negative experience values
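A hedged alternative, shown only as a commented sketch, would have been to replace the negative values with the median of the valid experience values instead of the mean, since the median is less affected by skew:

# Hedged alternative (illustration only, not applied): median-based replacement,
# which would have been run in place of the mean-based treatment above
# med_exp = median(data$Experience..in.years.[data$Experience..in.years. >= 0])
# data$Experience..in.years.[data$Experience..in.years. < 0] = med_exp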


## Bivariate Analysis in EDA
# Finding correlation between the variables
# Correlation plot
plot_correlation(data)

Age and Experience are highly correlated. Income and average credit card spend (CCAvg) show a
moderate correlation. No other significant correlation is visible in the plot.
# Histogram

plot_histogram(data, binary_as_factor = FALSE,
               geom_histogram_args = list("fill" = "red"))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


Age and Experience are approximately normally distributed. Income and CCAvg are right-skewed.
Around 70% of customers have no mortgage.
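The no-mortgage share quoted above can be checked directly (a quick sketch; the ~70% figure is otherwise read off the histogram):

# Share of customers with zero mortgage
mean(data$Mortgage == 0) * 100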
# Finding Outliers using Boxplot

plot_boxplot(data, by = "Personal.Loan", binary_as_factor = FALSE,
             geom_boxplot_args = list("fill" = "Blue"))
Outliers are observed in CCAvg, Income and Mortgage
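As a rough cross-check of the boxplots, the sketch below counts the points flagged by the standard 1.5*IQR whisker rule in the three skewed columns (assuming the same default rule the boxplots use):

# Count points beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR for the skewed variables
iqr_outliers = function(x) {
  q = quantile(x, c(0.25, 0.75))
  iqr = q[2] - q[1]
  sum(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
}
sapply(data[, c("Income..in.K.month.", "CCAvg", "Mortgage")], iqr_outliers)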
# Density Plot

plot_density(data, binary_as_factor = FALSE,
             geom_density_args = list("fill" = "Green"))

## Splitting data into Train and Test Data Sets

seed = 1000
set.seed(seed)
x = sample.split(data$Personal.Loan, SplitRatio = 0.7)
TrainDS = subset(data, x==TRUE)
TestDS = subset(data,x==FALSE)
TrainDS_RF = TrainDS
TestDS_RF = TestDS
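Since sample.split stratifies on the target variable, both partitions should retain roughly the 9.6% responder rate; a quick check (sketch):

# Responder proportion should be ~9.6% in both partitions
prop.table(table(TrainDS$Personal.Loan)) * 100
prop.table(table(TestDS$Personal.Loan)) * 100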

Modelling
# CART Modelling
# setting CART Parameters
cartParameters = rpart.control(minsplit = 15, cp =0.009,xval = 10)
cartModel = rpart(formula = TrainDS$Personal.Loan ~ ., data = TrainDS,
                  method = "class", control = cartParameters)
cartModel

## n= 3488
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 3488 335 0 (0.903956422 0.096043578)
## 2) Income..in.K.month.< 119.5 2874 76 0 (0.973556019 0.026443981)
## 4) CCAvg< 2.95 2633 13 0 (0.995062666 0.004937334) *
## 5) CCAvg>=2.95 241 63 0 (0.738589212 0.261410788)
## 10) CD.Account=0 221 47 0 (0.787330317 0.212669683)
## 20) Education=1 113 9 0 (0.920353982 0.079646018) *
## 21) Education=2,3 108 38 0 (0.648148148 0.351851852)
## 42) Income..in.K.month.< 92.5 67 9 0 (0.865671642 0.134328358) *
## 43) Income..in.K.month.>=92.5 41 12 1 (0.292682927 0.707317073) *
## 11) CD.Account=1 20 4 1 (0.200000000 0.800000000) *
## 3) Income..in.K.month.>=119.5 614 259 0 (0.578175896 0.421824104)
## 6) Education=1 406 51 0 (0.874384236 0.125615764)
## 12) Family.members=1,2 355 0 0 (1.000000000 0.000000000) *
## 13) Family.members=3,4 51 0 1 (0.000000000 1.000000000) *
## 7) Education=2,3 208 0 1 (0.000000000 1.000000000) *

# plotting the model


fancyRpartPlot(cartModel)

printcp(cartModel)

##
## Classification tree:
## rpart(formula = TrainDS$Personal.Loan ~ ., data = TrainDS, method = "class",
##     control = cartParameters)
##
## Variables actually used in tree construction:
## [1] CCAvg CD.Account Education
## [4] Family.members Income..in.K.month.
##
## Root node error: 335/3488 = 0.096044
##
## n= 3488
##
## CP nsplit rel error xerror xstd
## 1 0.31045 0 1.00000 1.00000 0.051946
## 2 0.15224 2 0.37910 0.38806 0.033395
## 3 0.01791 3 0.22687 0.23582 0.026230
## 4 0.00900 7 0.14030 0.14925 0.020956

plotcp(cartModel)

The CART tree built above has scope for pruning, as we can see from the plot, by choosing
the CP with the lowest cross-validated error.
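Below, the CP with the minimum cross-validated error is used. A common alternative (a sketch only, not applied here) is the 1-SE rule, which picks the largest CP whose xerror lies within one standard error of the minimum:

# Hedged alternative: 1-SE rule for choosing CP (illustration only)
cpTab = cartModel$cptable
minRow = which.min(cpTab[, "xerror"])
thresh = cpTab[minRow, "xerror"] + cpTab[minRow, "xstd"]
cp_1se = max(cpTab[cpTab[, "xerror"] <= thresh, "CP"])
cp_1se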
# Finding the best CP

bestCP = cartModel$cptable[which.min(cartModel$cptable[,"xerror"]), "CP"]


bestCP

## [1] 0.009

## pruning Tree
pTree = prune(cartModel,cp = bestCP, "CP")
pTree

## n= 3488
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 3488 335 0 (0.903956422 0.096043578)
## 2) Income..in.K.month.< 119.5 2874 76 0 (0.973556019 0.026443981)
## 4) CCAvg< 2.95 2633 13 0 (0.995062666 0.004937334) *
## 5) CCAvg>=2.95 241 63 0 (0.738589212 0.261410788)
## 10) CD.Account=0 221 47 0 (0.787330317 0.212669683)
## 20) Education=1 113 9 0 (0.920353982 0.079646018) *
## 21) Education=2,3 108 38 0 (0.648148148 0.351851852)
## 42) Income..in.K.month.< 92.5 67 9 0 (0.865671642 0.134328358) *
## 43) Income..in.K.month.>=92.5 41 12 1 (0.292682927 0.707317073) *
## 11) CD.Account=1 20 4 1 (0.200000000 0.800000000) *
## 3) Income..in.K.month.>=119.5 614 259 0 (0.578175896 0.421824104)
## 6) Education=1 406 51 0 (0.874384236 0.125615764)
## 12) Family.members=1,2 355 0 0 (1.000000000 0.000000000) *
## 13) Family.members=3,4 51 0 1 (0.000000000 1.000000000) *
## 7) Education=2,3 208 0 1 (0.000000000 1.000000000) *

#Plotting Pruned Tree


fancyRpartPlot(pTree, main = "Pruned Tree")
printcp(pTree)

##
## Classification tree:
## rpart(formula = TrainDS$Personal.Loan ~ ., data = TrainDS, method = "class",
##     control = cartParameters)
##
## Variables actually used in tree construction:
## [1] CCAvg CD.Account Education
## [4] Family.members Income..in.K.month.
##
## Root node error: 335/3488 = 0.096044
##
## n= 3488
##
## CP nsplit rel error xerror xstd
## 1 0.31045 0 1.00000 1.00000 0.051946
## 2 0.15224 2 0.37910 0.38806 0.033395
## 3 0.01791 3 0.22687 0.23582 0.026230
## 4 0.00900 7 0.14030 0.14925 0.020956

The final tree is built with lowest xerror and 7 splits


## CART model Performance
# Prediction
TrainDS$Prediction = predict(pTree, TrainDS, type = "class")
TrainDS$Probability = predict(pTree, TrainDS, type = "prob")[,"1"]
head(TrainDS) #Prediction and probability columns added to Training Data

## Age..in.years. Experience..in.years. Income..in.K.month. Family.members


## 1 25 1 49 4
## 3 39 15 11 1
## 4 35 9 100 1
## 5 35 8 45 4
## 6 37 13 29 4
## 8 50 24 22 1
## CCAvg Education Mortgage Personal.Loan Securities.Account CD.Account
## 1 1.6 1 0 0 1 0
## 3 1.0 1 0 0 0 0
## 4 2.7 2 0 0 0 0
## 5 1.0 2 0 0 0 0
## 6 0.4 2 155 0 0 0
## 8 0.3 3 0 0 0 0
## Online CreditCard Prediction Probability
## 1 0 0 0 0.004937334
## 3 0 0 0 0.004937334
## 4 0 0 0 0.004937334
## 5 0 1 0 0.004937334
## 6 1 0 0 0.004937334
## 8 0 1 0 0.004937334

# Confusion Matrix

tb1_TrDS_CART = table(TrainDS$Prediction, TrainDS$Personal.Loan)


tb1_TrDS_CART

##
## 0 1
## 0 3137 31
## 1 16 304

# Classification Error Rate / Misclassification

CER_TrDS = (tb1_TrDS_CART[1,2]+tb1_TrDS_CART[2,1])/sum(tb1_TrDS_CART)
CER_TrDS

## [1] 0.01347477

The classification error rate is 1.3%


# Accuracy of the Model
# as we know that accuracy is 1-Error

Acc_TrDS = 1 - CER_TrDS
Acc_TrDS
## [1] 0.9865252

Accuracy of the model is 98.7%


# True positive Rate / Sensitivity
TPR_TrDS=tb1_TrDS_CART[2,2]/(tb1_TrDS_CART[1,2]+tb1_TrDS_CART[2,2]) # True positive rate or sensitivity (TP/(TP+FN))
TPR_TrDS

## [1] 0.9074627

TPR / Sensitivity is 0.9074627 / 90.7%


# True negative rate or specificity:
TNR_TrDS=tb1_TrDS_CART[1,1]/(tb1_TrDS_CART[1,1]+tb1_TrDS_CART[2,1]) # True negative rate or specificity (TN/(TN+FP))
TNR_TrDS

## [1] 0.9949255

TNR / Specificity is 0.9949255


# Creating Decile and chopping into buckets
prob_TrDS_CART = seq(0,1,length = 11)
qt_TrDS_CART = quantile(TrainDS$Probability,prob_TrDS_CART)
qt_TrDS_CART

## 0% 10% 20% 30% 40% 50%


## 0.000000000 0.000000000 0.004937334 0.004937334 0.004937334 0.004937334
## 60% 70% 80% 90% 100%
## 0.004937334 0.004937334 0.004937334 0.134328358 1.000000000

TrainDS$deciles = cut(TrainDS$Probability,
unique(qt_TrDS_CART),include.lowest = TRUE, right = TRUE)
table(TrainDS$deciles)

##
## [0,0.00494] (0.00494,0.134] (0.134,1]
## 2988 180 320

Only three distinct buckets were created: 0-0.00494 is one bucket, 0.00494-0.134 is the second,
and 0.134-1 is the last bucket.
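Only three buckets form because a CART tree can output at most one probability per terminal node, so most decile cut points coincide; this can be confirmed directly (sketch):

# Number of distinct scores is at most the number of terminal nodes in the tree
length(unique(TrainDS$Probability))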
# Rank ordering table (Model Performance 1)

TrainDS = data.table(TrainDS)
rankTbl_TrDS_CART = TrainDS[, list(
cnt = length(Personal.Loan),
cnt_tar1 = sum(Personal.Loan == 1),
cnt_tar0 = sum(Personal.Loan == 0)),
by=deciles][order(-deciles)]
rankTbl_TrDS_CART$resp_rate = round(rankTbl_TrDS_CART$cnt_tar1 /
rankTbl_TrDS_CART$cnt,4)*100;
rankTbl_TrDS_CART$cum_resp = cumsum(rankTbl_TrDS_CART$cnt_tar1)
rankTbl_TrDS_CART$cum_non_resp = cumsum(rankTbl_TrDS_CART$cnt_tar0)
rankTbl_TrDS_CART$cum_rel_resp = round(rankTbl_TrDS_CART$cum_resp /
sum(rankTbl_TrDS_CART$cnt_tar1),4)*100
rankTbl_TrDS_CART$cum_rel_non_resp = round(rankTbl_TrDS_CART$cum_non_resp /
sum(rankTbl_TrDS_CART$cnt_tar0),4)*100
rankTbl_TrDS_CART$ks = abs(rankTbl_TrDS_CART$cum_rel_resp -
rankTbl_TrDS_CART$cum_rel_non_resp) #ks
print(rankTbl_TrDS_CART)

## deciles cnt cnt_tar1 cnt_tar0 resp_rate cum_resp cum_non_resp


## 1: (0.134,1] 320 304 16 95.00 304 16
## 2: (0.00494,0.134] 180 18 162 10.00 322 178
## 3: [0,0.00494] 2988 13 2975 0.44 335 3153
## cum_rel_resp cum_rel_non_resp ks
## 1: 90.75 0.51 90.24
## 2: 96.12 5.65 90.47
## 3: 100.00 100.00 0.00

# auc,ks & gini computing methods


predObj_TrDS = prediction(TrainDS$Probability, TrainDS$Personal.Loan)
perf_TrDS = performance(predObj_TrDS, "tpr" , "fpr")
plot(perf_TrDS)

ROC curve has been plotted


ks_TrDS = max(perf_TrDS@y.values[[1]] - perf_TrDS@x.values[[1]])
auc_TrDS = performance(predObj_TrDS, "auc")
auc_TrDS = as.numeric(auc_TrDS@y.values)
gini_TrDS = ineq(TrainDS$Probability,type = "Gini")
cat("Ks=", ks_TrDS,
"auc=" , auc_TrDS,
"gini=" , gini_TrDS)

## Ks= 0.9108586 auc= 0.9799244 gini= 0.8676614

# concordance and Discordance


Concordance_TrDS = Concordance(actuals = TrainDS$Personal.Loan,
predictedScores = TrainDS$Probability)
Concordance_TrDS

## $Concordance
## [1] 0.9629162
##
## $Discordance
## [1] 0.03708385
##
## $Tied
## [1] -2.775558e-17
##
## $Pairs
## [1] 1056255

Concordance is 96%, which indicates very good model discrimination
# Root Mean Square Error (RMSE)
# computed considering the personal loan as a continuous variable or number
RMSE_TrDS = rmse(TrainDS$Personal.Loan,TrainDS$Prediction)
RMSE_TrDS

## [1] 0.1160809

Root Mean Square Error is 0.1160809 / 11.6%
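Since both columns are stored as 0/1 factors, a manual cross-check (a sketch, assuming rmse() coerces the factors in an equivalent way) converts them to numbers before computing the error:

# Manual RMSE cross-check treating the 0/1 factors as numbers
act = as.numeric(as.character(TrainDS$Personal.Loan))
pred = as.numeric(as.character(TrainDS$Prediction))
sqrt(mean((act - pred)^2))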


# Mean Absolute Error considering Personal Loan as a number
MAE_TrDS = mae(TrainDS$Personal.Loan, TrainDS$Prediction)
MAE_TrDS

## [1] 0.01347477

Mean Absolute Error is 0.0135 / 1.3%


## Model performance of Test Data on the built model

# Prediction

TestDS$Prediction=predict(pTree, TestDS, type = "class")


TestDS$Probability=predict(pTree, TestDS, type = "prob")[ ,'1']
# Confusion Matrix

tb1_TeDS=table(TestDS$Prediction, TestDS$Personal.Loan)
print(tb1_TeDS)

##
## 0 1
## 0 1343 14
## 1 8 129

# Classification Error Computation


CeR_TeDS=(tb1_TeDS[1,2]+tb1_TeDS[2,1])/ sum(tb1_TeDS)
CeR_TeDS

## [1] 0.01472557

Classification Error Rate on Test data is 0.01472557 / 1.5%


# Computing Accuracy
Accuracy_TeDS = 1-CeR_TeDS # Since Accuracy is 1-Error
Accuracy_TeDS

## [1] 0.9852744

Accuracy of the model on the test data is 98.5%, which is quite similar to that on the
training data.
# finding True positive rate / Sensitivity
TPR_TeDS=tb1_TeDS[2,2]/(tb1_TeDS[1,2]+tb1_TeDS[2,2])
TPR_TeDS

## [1] 0.9020979

# Finding True Negative Rate / Specificity


TNR_TeDS = tb1_TeDS[1,1]/ (tb1_TeDS[1,1]+tb1_TeDS[2,1])
TNR_TeDS

## [1] 0.9940785

# Creating Decile and chopping them to buckets


prob_TeDS_CART = seq(0,1,length = 11)
qt_TeDS_CART = quantile(TestDS$Probability, prob_TeDS_CART)
qt_TeDS_CART

## 0% 10% 20% 30% 40% 50%


## 0.000000000 0.000000000 0.004937334 0.004937334 0.004937334 0.004937334
## 60% 70% 80% 90% 100%
## 0.004937334 0.004937334 0.004937334 0.134328358 1.000000000

As the quantiles show, most predicted probabilities are very low: roughly 86% of the test
observations fall in the lowest bucket, while the top decile (probabilities above 0.134) captures the bulk of the responders.
TestDS$deciles = cut(TestDS$Probability, unique(qt_TeDS_CART),include.lowest
= TRUE, right = TRUE)
table(TestDS$deciles)

##
## [0,0.00494] (0.00494,0.134] (0.134,1]
## 1282 75 137

## Model performance measures on Test Data

# Rank Order Table

testDT = data.table(TestDS)
rankTbl_TeDS_CART = testDT[, list(
cnt = length(Personal.Loan),
cnt_tar1 = sum(Personal.Loan == 1),
cnt_tar0 = sum(Personal.Loan == 0)),
by=deciles][order(-deciles)]
rankTbl_TeDS_CART$resp_rate = round(rankTbl_TeDS_CART$cnt_tar1 /
rankTbl_TeDS_CART$cnt,4)*100
rankTbl_TeDS_CART$cum_resp = cumsum(rankTbl_TeDS_CART$cnt_tar1)
rankTbl_TeDS_CART$cum_non_resp = cumsum(rankTbl_TeDS_CART$cnt_tar0)
rankTbl_TeDS_CART$cum_rel_resp = round(rankTbl_TeDS_CART$cum_resp /
sum(rankTbl_TeDS_CART$cnt_tar1),4)*100
rankTbl_TeDS_CART$cum_rel_non_resp = round(rankTbl_TeDS_CART$cum_non_resp /
sum(rankTbl_TeDS_CART$cnt_tar0),4)*100
rankTbl_TeDS_CART$ks = abs(rankTbl_TeDS_CART$cum_rel_resp -
rankTbl_TeDS_CART$cum_rel_non_resp) #ks
rankTbl_TeDS_CART

## deciles cnt cnt_tar1 cnt_tar0 resp_rate cum_resp cum_non_resp


## 1: (0.134,1] 137 129 8 94.16 129 8
## 2: (0.00494,0.134] 75 5 70 6.67 134 78
## 3: [0,0.00494] 1282 9 1273 0.70 143 1351
## cum_rel_resp cum_rel_non_resp ks
## 1: 90.21 0.59 89.62
## 2: 93.71 5.77 87.94
## 3: 100.00 100.00 0.00

# Calculating auc,ks & gini computing methods on Test data

predObj_TeDS = prediction(TestDS$Probability, TestDS$Personal.Loan)


perf_TeDS = performance(predObj_TeDS, "tpr" , "fpr")
plot(perf_TeDS)
ks_TeDS = max(perf_TeDS@y.values[[1]] - perf_TeDS@x.values[[1]])
auc_TeDS = performance(predObj_TeDS, "auc")
auc_TeDS = as.numeric(auc_TeDS@y.values)
gini_TeDS = ineq(TestDS$Probability,type = "Gini")
cat("Ks=", ks_TeDS,
"auc=" , auc_TeDS,
"gini=" , gini_TeDS)

## Ks= 0.8961764 auc= 0.9686039 gini= 0.8680705

# Concordance and Discordance Ratio Computation

Concordance_TeDS = Concordance(actuals = TestDS$Personal.Loan,
                               predictedScores = TestDS$Probability)
Concordance_TeDS

## $Concordance
## [1] 0.9419803
##
## $Discordance
## [1] 0.0580197
##
## $Tied
## [1] -4.163336e-17
##
## $Pairs
## [1] 193193
# Root Mean Square Error (RMSE)
# computed considering the personal loan as a continuous variable or number

RMSE_TeDS = rmse(TestDS$Personal.Loan,TestDS$Prediction)
RMSE_TeDS

## [1] 0.121349

# Mean Absolute Error considering Personal Loan as a number

MAE_TeDS = mae(TestDS$Personal.Loan, TestDS$Prediction)


MAE_TeDS

## [1] 0.01472557

# CART model performance Table

Performance_KPI = c("Classification Error Rate",
"Accuracy",
"TPR",
"TNR",
"ks",
"auc",
"gini",
"Concordance",
"RMSE*",
"MAE*")

Training_CART = c(CER_TrDS,
Acc_TrDS,
TPR_TrDS,
TNR_TrDS,
ks_TrDS,
auc_TrDS,
gini_TrDS,
Concordance_TrDS$Concordance,
RMSE_TrDS,
MAE_TrDS)
Test_CART =c(CeR_TeDS,
Accuracy_TeDS,
TPR_TeDS,
TNR_TeDS,
ks_TeDS,
auc_TeDS,
gini_TeDS,
Concordance_TeDS$Concordance,
RMSE_TeDS,
MAE_TeDS)

x=cbind(Performance_KPI,Training_CART,Test_CART)
x=data.table(x)
x$Training_CART=as.numeric(x$Training_CART)
x$Test_CART=as.numeric(x$Test_CART)
print(x)

## Performance_KPI Training_CART Test_CART


## 1: Classification Error Rate 0.01347477 0.01472557
## 2: Accuracy 0.98652523 0.98527443
## 3: TPR 0.90746269 0.90209790
## 4: TNR 0.99492547 0.99407846
## 5: ks 0.91085865 0.89617636
## 6: auc 0.97992436 0.96860393
## 7: gini 0.86766141 0.86807045
## 8: Concordance 0.96291615 0.94198030
## 9: RMSE* 0.11608088 0.12134896
## 10: MAE* 0.01347477 0.01472557

• RMSE* and MAE* are computed considering Personal Loan as a number


RANDOM FOREST
## Random Forest

# Building Random Forest Model

TrainDS =TrainDS_RF
TestDS=TestDS_RF

rndForest=randomForest(Personal.Loan ~ ., data = TrainDS, ntree=501, mtry=5,
                       nodesize=10, importance=TRUE)
print(rndForest)

##
## Call:
## randomForest(formula = Personal.Loan ~ ., data = TrainDS, ntree = 501,
##     mtry = 5, nodesize = 10, importance = TRUE)
## Type of random forest: classification
## Number of trees: 501
## No. of variables tried at each split: 5
##
## OOB estimate of error rate: 1.32%
## Confusion matrix:
## 0 1 class.error
## 0 3147 6 0.00190295
## 1 40 295 0.11940299

# Minimum error rate across the OOB and class-wise error curves

min(rndForest$err.rate)

## [1] 0.001585791
# Plotting Error Rates for Random Forest

plot(rndForest, main = "")


legend("topright", c("OOB", "0", "1"), text.col = 1:6, lty = 1:3, col = 1:3)
title(main = "Error Rates Random Forest TrainDT")

After about 100 trees the error curves flatten out.

We will therefore tune using 101 trees; an odd number of trees is used so that majority voting across trees cannot produce ties.
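As a quick sanity check (a sketch, assuming the error curves really have flattened by around 100 trees), the forest can be refit with 101 trees and its OOB error compared with the 501-tree model above:

# Hedged sketch: refit with 101 trees and compare the out-of-bag error
set.seed(seed)
rndForest101 = randomForest(Personal.Loan ~ ., data = TrainDS, ntree = 101,
                            mtry = 5, nodesize = 10, importance = TRUE)
rndForest101$err.rate[101, "OOB"] # OOB error after the 101st tree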
# finding importance parameter

print(rndForest$importance)

## 0 1 MeanDecreaseAccuracy
## Age..in.years. 3.673304e-03 6.194839e-04 3.378982e-03
## Experience..in.years. 3.189203e-03 2.180676e-03 3.091802e-03
## Income..in.K.month. 1.285083e-01 4.589875e-01 1.600288e-01
## Family.members 5.306916e-02 7.721191e-02 5.536076e-02
## CCAvg 3.162860e-02 7.401166e-02 3.565070e-02
## Education 7.129464e-02 1.317559e-01 7.705097e-02
## Mortgage 1.186234e-03 -2.870626e-03 7.936276e-04
## Securities.Account 4.405486e-05 -5.149235e-05 3.663810e-05
## CD.Account 3.255770e-03 1.031084e-02 3.929585e-03
## Online 5.194141e-05 3.917111e-04 8.482881e-05
## CreditCard 6.970194e-04 5.825216e-04 6.873496e-04
## MeanDecreaseGini
## Age..in.years. 9.7745913
## Experience..in.years. 9.9801487
## Income..in.K.month. 182.4077177
## Family.members 83.4483778
## CCAvg 80.2558097
## Education 155.3222629
## Mortgage 8.3953196
## Securities.Account 0.7925112
## CD.Account 29.3202665
## Online 1.0615870
## CreditCard 2.0055241

# Tuning Random Forest

set.seed(1000)
set.seed(seed)
tRndForest = tuneRF(x = TrainDS[, -which(colnames(TrainDS) == "Personal.Loan")],
                    y = TrainDS$Personal.Loan,
                    mtryStart = 9,
                    ntreeTry = 101,
                    stepFactor = 1.2,
                    improve = 0.001,
                    trace = FALSE,
                    plot = TRUE,
                    doBest = TRUE,
                    nodesize = 10,
                    importance = TRUE)

## 0.1632653 0.001
## -0.1463415 0.001
## -0.1707317 0.001
# Finding important variables

importance(tRndForest)

## 0 1 MeanDecreaseAccuracy
## Age..in.years. 17.785873 -1.1339472 16.0236905
## Experience..in.years. 13.793447 -0.5375885 13.0188811
## Income..in.K.month. 235.480675 123.4019580 245.4670544
## Family.members 174.449089 68.6975275 178.4550860
## CCAvg 34.016640 52.1285949 41.2768729
## Education 226.777339 96.6939567 238.5661690
## Mortgage 4.066117 0.1846508 4.0537627
## Securities.Account -1.568248 1.5192923 -1.0841882
## CD.Account 13.230849 13.5666479 17.7969469
## Online 0.454509 0.6883306 0.7216765
## CreditCard 2.689135 -0.4054599 2.2414486
## MeanDecreaseGini
## Age..in.years. 7.9980541
## Experience..in.years. 6.7010570
## Income..in.K.month. 190.9673454
## Family.members 97.0932383
## CCAvg 58.7611727
## Education 189.1666498
## Mortgage 2.9831095
## Securities.Account 0.5013371
## CD.Account 18.3850897
## Online 0.8344523
## CreditCard 0.9173974

Income is the most important variable.

Education, Family members and CCAvg are also important variables.
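The same ranking can also be inspected visually with the package's built-in importance plot (a small sketch):

# Visualise variable importance for the tuned random forest
varImpPlot(tRndForest, main = "Variable Importance - Tuned Random Forest")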
# Random Forest Model Performance
# Performance on Training Data

# Prediction:

TrainDS$Prediction_RF=predict(tRndForest, TrainDS, type = "class")


TrainDS$Probability1_RF=predict(tRndForest, TrainDS, type = "prob")[,"1"]

# Confusion Matrix:

tbl_TrDS_RF=table(TrainDS$Prediction_RF, TrainDS$Personal.Loan)
tbl_TrDS_RF

##
## 0 1
## 0 3153 17
## 1 0 318

# Classification Error Rate:

CeR_TrDS_RF=(tbl_TrDS_RF[1,2]+tbl_TrDS_RF[2,1])/sum(tbl_TrDS_RF)

#classification Error Rate or error rate (FP+FN/TP+FP+TN+FN)

CeR_TrDS_RF

## [1] 0.004873853

# Accuracy:

Accuracy_TrDS_RF=1-(tbl_TrDS_RF[1,2]+tbl_TrDS_RF[2,1])/sum(tbl_TrDS_RF)

#accuracy (1-error rate)

Accuracy_TrDS_RF

## [1] 0.9951261

# True positive rate or sensitivity:

TPR_TrDS_RF=tbl_TrDS_RF[2,2]/(tbl_TrDS_RF[1,2]+tbl_TrDS_RF[2,2])

#True positive rate or sensitivity (TP/TP+FN)


TPR_TrDS_RF

## [1] 0.9492537

# True negative rate or specificity:

TNR_TrDS_RF=tbl_TrDS_RF[1,1]/(tbl_TrDS_RF[1,1]+tbl_TrDS_RF[2,1])

#True negative rate or specificity (TN/TN+FP)

TNR_TrDS_RF

## [1] 1

# Creating Decile and Chopping into unique buckets:

probs_TrDS_RF=seq(0,1,length=11)
qs_TrDS_RF=quantile(TrainDS$Probability1_RF, probs_TrDS_RF)
qs_TrDS_RF

## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.248 1.000

# Splitting the deciles

TrainDS$deciles_RF=cut(TrainDS$Probability1_RF, unique(qs_TrDS_RF),
include.lowest = TRUE, right=TRUE)
table(TrainDS$deciles_RF)

##
## [0,0.002] (0.002,0.248] (0.248,1]
## 2826 313 349

Three buckets were formed: [0, 0.002], (0.002, 0.248] and (0.248, 1]; the majority of
observations (2,826 of 3,488) fall in the lowest bucket.
# Rank ordering table computing

library(data.table)
trainDT = data.table(TrainDS)
rankTbl_TrDS_RF = trainDT[, list(
cnt = length(Personal.Loan),
cnt_tar1= sum(Personal.Loan == 1),
cnt_tar0 = sum(Personal.Loan == 0)),
by=deciles_RF][order(-deciles_RF)]
rankTbl_TrDS_RF$resp_rate = round(rankTbl_TrDS_RF$cnt_tar1 /
rankTbl_TrDS_RF$cnt,4)*100
rankTbl_TrDS_RF$cum_resp = cumsum(rankTbl_TrDS_RF$cnt_tar1)
rankTbl_TrDS_RF$cum_non_resp = cumsum(rankTbl_TrDS_RF$cnt_tar0)
rankTbl_TrDS_RF$cum_rel_resp = round(rankTbl_TrDS_RF$cum_resp /
sum(rankTbl_TrDS_RF$cnt_tar1),4)*100
rankTbl_TrDS_RF$cum_rel_non_resp = round(rankTbl_TrDS_RF$cum_non_resp /
sum(rankTbl_TrDS_RF$cnt_tar0),4)*100
rankTbl_TrDS_RF$ks = abs(rankTbl_TrDS_RF$cum_rel_resp -
rankTbl_TrDS_RF$cum_rel_non_resp) #ks
rankTbl_TrDS_RF

## deciles_RF cnt cnt_tar1 cnt_tar0 resp_rate cum_resp cum_non_resp


## 1: (0.248,1] 349 333 16 95.42 333 16
## 2: (0.002,0.248] 313 2 311 0.64 335 327
## 3: [0,0.002] 2826 0 2826 0.00 335 3153
## cum_rel_resp cum_rel_non_resp ks
## 1: 99.4 0.51 98.89
## 2: 100.0 10.37 89.63
## 3: 100.0 100.00 0.00

# auc,ks and gini Computing:

predObj_TrDS_RF = prediction(TrainDS$Probability1_RF, TrainDS$Personal.Loan)


perf_TrDS_RF = performance(predObj_TrDS_RF, "tpr", "fpr")
plot(perf_TrDS_RF) #ROC curve

# ks

ks_TrDS_RF = max(perf_TrDS_RF@y.values[[1]] - perf_TrDS_RF@x.values[[1]]) #ks

auc_TrDS_RF = performance(predObj_TrDS_RF,"auc");
auc_TrDS_RF = as.numeric(auc_TrDS_RF@y.values) #auc
gini_TrDS_RF= ineq(TrainDS$Probability1_RF, type="Gini") #gini
cat("ks=", ks_TrDS_RF,
"auc=", auc_TrDS_RF,
"gini=", gini_TrDS_RF)

## ks= 0.9942912 auc= 0.9998755 gini= 0.9027421

# Concordance and Discordance ratios: computing

Concordance_TrDS_RF=Concordance(actuals=TrainDS$Personal.Loan,
predictedScores=TrainDS$Probability1_RF)
Concordance_TrDS_RF

## $Concordance
## [1] 0.9998731
##
## $Discordance
## [1] 0.0001268633
##
## $Tied
## [1] -2.981556e-17
##
## $Pairs
## [1] 1056255

# Root-Mean Square Error (RMSE*):

RMSE_TrDS_RF=rmse(TrainDS$Personal.Loan, TrainDS$Prediction_RF)
RMSE_TrDS_RF

## [1] 0.06981299

# Mean absolute error (MAE*):


MAE_TrDS_RF=mae(TrainDS$Personal.Loan, TrainDS$Prediction_RF)
MAE_TrDS_RF

## [1] 0.004873853

# Test data Performance on built Model

# Prediction:

TestDS$Prediction_RF=predict(tRndForest, TestDS, type = "class")


TestDS$Probability1_RF=predict(tRndForest, TestDS, type = "prob")[,"1"]

# Confusion Matrix:

tbl_TeDS_RF=table(TestDS$Prediction_RF, TestDS$Personal.Loan)
tbl_TeDS_RF
##
## 0 1
## 0 1347 14
## 1 4 129

# Classification Error Rate:

CeR_TeDS_RF=(tbl_TeDS_RF[1,2]+tbl_TeDS_RF[2,1])/sum(tbl_TeDS_RF)
CeR_TeDS_RF

## [1] 0.01204819

# Accuracy:

Accuracy_TeDS_RF=1-CeR_TeDS_RF
Accuracy_TeDS_RF

## [1] 0.9879518

# True positive rate or sensitivity:

TPR_TeDS_RF=tbl_TeDS_RF[2,2]/(tbl_TeDS_RF[1,2]+tbl_TeDS_RF[2,2])
TPR_TeDS_RF

## [1] 0.9020979

# True negative rate or specificity:

TNR_TeDS_RF=tbl_TeDS_RF[1,1]/(tbl_TeDS_RF[1,1]+tbl_TeDS_RF[2,1])
TNR_TeDS_RF

## [1] 0.9970392

# Creating Decile and Chopping into unique buckets:

probs_TeDS_RF=seq(0,1,length=11)
qs_TeDS_RF=quantile(TestDS$Probability1_RF, probs_TeDS_RF)
qs_TeDS_RF

## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.004 0.205 1.000

# Splitting Deciles

TestDS$deciles_RF=cut(TestDS$Probability1_RF, unique(qs_TeDS_RF),
include.lowest = TRUE, right=TRUE)
table(TestDS$deciles_RF)

##
## [0,0.004] (0.004,0.205] (0.205,1]
## 1210 134 150
# Rank ordering table on RF Test Data

testDT = data.table(TestDS)
rankTbl_TeDS_RF = testDT[, list(
cnt = length(Personal.Loan),
cnt_tar1 = sum(Personal.Loan == 1),
cnt_tar0 = sum(Personal.Loan == 0)),
by=deciles_RF][order(-deciles_RF)]
rankTbl_TeDS_RF$resp_rate = round(rankTbl_TeDS_RF$cnt_tar1 /
rankTbl_TeDS_RF$cnt,4)*100
rankTbl_TeDS_RF$cum_resp = cumsum(rankTbl_TeDS_RF$cnt_tar1)
rankTbl_TeDS_RF$cum_non_resp = cumsum(rankTbl_TeDS_RF$cnt_tar0)
rankTbl_TeDS_RF$cum_rel_resp = round(rankTbl_TeDS_RF$cum_resp /
sum(rankTbl_TeDS_RF$cnt_tar1),4)*100
rankTbl_TeDS_RF$cum_rel_non_resp = round(rankTbl_TeDS_RF$cum_non_resp /
sum(rankTbl_TeDS_RF$cnt_tar0),4)*100
rankTbl_TeDS_RF$ks = abs(rankTbl_TeDS_RF$cum_rel_resp -
rankTbl_TeDS_RF$cum_rel_non_resp) #ks
rankTbl_TeDS_RF

## deciles_RF cnt cnt_tar1 cnt_tar0 resp_rate cum_resp cum_non_resp


## 1: (0.205,1] 150 133 17 88.67 133 17
## 2: (0.004,0.205] 134 10 124 7.46 143 141
## 3: [0,0.004] 1210 0 1210 0.00 143 1351
## cum_rel_resp cum_rel_non_resp ks
## 1: 93.01 1.26 91.75
## 2: 100.00 10.44 89.56
## 3: 100.00 100.00 0.00

# auc, ks and gini Computing:

predObj_TeDS_RF = prediction(TestDS$Probability1_RF, TestDS$Personal.Loan)


perf_TeDS_RF = performance(predObj_TeDS_RF, "tpr", "fpr")
plot(perf_TeDS_RF) #ROC curve
ks_TeDS_RF = max(perf_TeDS_RF@y.values[[1]] - perf_TeDS_RF@x.values[[1]]) #ks
auc_TeDS_RF = performance(predObj_TeDS_RF,"auc")
auc_TeDS_RF = as.numeric(auc_TeDS_RF@y.values) #auc
gini_TeDS_RF = ineq(TestDS$Probability1_RF, type="Gini") #gini
cat("ks_TeDS_RF=", ks_TeDS_RF,
"auc_TeDS_RF=", auc_TeDS_RF,
"gini_TeDS_RF=", gini_TeDS_RF)

## ks_TeDS_RF= 0.9328754 auc_TeDS_RF= 0.9968555 gini_TeDS_RF= 0.9035695

# Concordance and Discordance ratios:

Concordance_TeDS_RF=Concordance(actuals=TestDS$Personal.Loan,
predictedScores=TestDS$Probability1_RF)
Concordance_TeDS_RF

## $Concordance
## [1] 0.9968011
##
## $Discordance
## [1] 0.003198874
##
## $Tied
## [1] -1.864828e-17
##
## $Pairs
## [1] 193193

# Root-Mean Square Error(RMSE*):

RMSE_TeDS_RF=rmse(TestDS$Personal.Loan, TestDS$Prediction_RF)
RMSE_TeDS_RF

## [1] 0.1097643

# Mean absolute error (MAE*):

MAE_TeDS_RF=mae(TestDS$Personal.Loan, TestDS$Prediction_RF)
MAE_TeDS_RF

## [1] 0.01204819

# CART & Random Forest Model Summary

Performance_KPI = c("Classification Error Rate",
"Accuracy",
"TPR",
"TNR",
"ks",
"auc",
"gini",
"Concordance",
"RMSE*",
"MAE*")

Training_CART = c(CER_TrDS,
Acc_TrDS,
TPR_TrDS,
TNR_TrDS,
ks_TrDS,
auc_TrDS,
gini_TrDS,
Concordance_TrDS$Concordance,
RMSE_TrDS,
MAE_TrDS)

Test_CART = c(CeR_TeDS,
Accuracy_TeDS,
TPR_TeDS,
TNR_TeDS,
ks_TeDS,
auc_TeDS,
gini_TeDS,
Concordance_TeDS$Concordance,
RMSE_TeDS,
MAE_TeDS)

Training_RF = c(CeR_TrDS_RF,
Accuracy_TrDS_RF,
TPR_TrDS_RF,
TNR_TrDS_RF,
ks_TrDS_RF,
auc_TrDS_RF,
gini_TrDS_RF,
Concordance_TrDS_RF$Concordance,
RMSE_TrDS_RF,
MAE_TrDS_RF)

Test_RF = c(CeR_TeDS_RF,
Accuracy_TeDS_RF,
TPR_TeDS_RF,
TNR_TeDS_RF,
ks_TeDS_RF,
auc_TeDS_RF,
gini_TeDS_RF,
Concordance_TeDS_RF$Concordance,
RMSE_TeDS_RF,
MAE_TeDS_RF)

y=cbind(Performance_KPI, Training_CART, Test_CART, Training_RF, Test_RF)


library(data.table)
y=data.table(y)
y$Training_CART=as.numeric(y$Training_CART)
y$Test_CART=as.numeric(y$Test_CART)
y$Training_RF=as.numeric(y$Training_RF)
y$Test_RF=as.numeric(y$Test_RF)
print(y)

## Performance_KPI Training_CART Test_CART Training_RF


## 1: Classification Error Rate 0.01347477 0.01472557 0.004873853
## 2: Accuracy 0.98652523 0.98527443 0.995126147
## 3: TPR 0.90746269 0.90209790 0.949253731
## 4: TNR 0.99492547 0.99407846 1.000000000
## 5: ks 0.91085865 0.89617636 0.994291151
## 6: auc 0.97992436 0.96860393 0.999875504
## 7: gini 0.86766141 0.86807045 0.902742132
## 8: Concordance 0.96291615 0.94198030 0.999873137
## 9: RMSE* 0.11608088 0.12134896 0.069812987
## 10: MAE* 0.01347477 0.01472557 0.004873853
## Test_RF
## 1: 0.01204819
## 2: 0.98795181
## 3: 0.90209790
## 4: 0.99703923
## 5: 0.93287541
## 6: 0.99685548
## 7: 0.90356946
## 8: 0.99680113
## 9: 0.10976426
## 10: 0.01204819

Conclusion on model performance


CART and Random Forest Model Summary
All key performance indicators show that the CART model is very good, with strong
performance on both the train and test datasets. The Random Forest performs even better
than CART, as the table above suggests. Both models are very good; as a matter of
preference, the Random Forest would be selected for further business use because of its
better performance over the CART model.
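As a final, purely illustrative sketch of how the selected Random Forest could support campaign targeting, a hypothetical scoring file new_customers (assumed to have the same columns and factor levels as the training data, minus Personal.Loan) could be scored and ranked by predicted purchase probability:

# Hypothetical scoring sketch - 'new_customers' is an assumed data frame,
# not part of this assignment's data
# new_customers$Prob_Loan = predict(tRndForest, new_customers, type = "prob")[, "1"]
# target_list = new_customers[order(-new_customers$Prob_Loan), ]
# head(target_list, 100) # e.g. shortlist the top 100 prospects for the next campaign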
