Mini Project-Data Mining
Modeling
By - Stuti Prasad
Table of Contents
1 Project Objective
2 Assumptions
5 Clustering
6 CART
7 Random Forest
8 ROC Curve
9 Conclusion
10 Appendix Source code
1 Project Objective
The objective of this report is to build the model that best identifies customers with a higher probability of purchasing a personal loan. The report covers the data exploration, clustering, CART and random forest steps listed in the table of contents.
2 Assumptions
The data has one dependent (response) variable, Personal Loan; the remaining variables are treated as predictors.
Variables that are highly correlated are expected to converge to a common concept or factor.
Number of Columns: 14
Data Description:
ID: Customer ID
Age: Customer's age in years
Experience: Years of professional experience
Income: Annual income of the customer ($000)
ZIP Code: Home address ZIP code
Family: Family size of the customer
CCAvg: Average spending on credit cards per month ($000)
Education: Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
Mortgage: Value of house mortgage, if any ($000)
Personal Loan: Did this customer accept the personal loan offered in the last campaign?
Securities Account: Does the customer have a securities account with the bank?
CD Account: Does the customer have a certificate of deposit (CD) account with the bank?
Online: Does the customer use internet banking facilities?
CreditCard: Does the customer use a credit card issued by the bank?
The first column is ID, which can be ignored; the structure (str) of the data is shown below.
$ ID : num 1 2 3 4 5 6 7 8 9 10 ...
$ Age (in years) : num 25 45 39 35 35 37 53 50 35 34 ...
$ Experience (in years): num 1 19 15 9 8 13 27 24 10 9 ...
$ Income (in K/month) : num 49 34 11 100 45 29 72 22 81 180 ...
$ ZIP Code : num 91107 90089 94720 94112 91330 ...
$ Family members : num 4 3 1 1 4 4 2 1 3 1 ...
$ CCAvg : num 1.6 1.5 1 2.7 1 0.4 1.5 0.3 0.6 8.9 ...
$ Education : num 1 1 1 2 2 2 2 3 2 3 ...
$ Mortgage : num 0 0 0 0 0 155 0 0 104 0 ...
$ Personal Loan : num 0 0 0 0 0 0 0 0 0 1 ...
$ Securities Account : num 1 1 0 0 0 0 0 0 0 0 ...
$ CD Account : num 0 0 0 0 0 0 0 0 0 0 ...
$ Online : num 0 0 0 0 0 1 1 0 1 0 ...
$ CreditCard : num 0 0 0 0 1 0 0 1 0 0 ...
The str() output indicates that all the columns in the data set are numeric (the ID column can be ignored).
We can create a new data frame that excludes the ID column for further analysis.
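A minimal sketch of this step, assuming the raw data is loaded as a data frame named Thera_Bank and the trimmed copy is named Thera_Bank1 (the names follow the appendix code):
Thera_Bank1 = Thera_Bank[ , -1]   # drop the first (ID) column
str(Thera_Bank1)                  # the remaining 13 columns are all numeric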
Standard deviation of the data
Age (in years) Experience (in years) Income (in K/month) ZIP Code
11.4631656 11.4679537 46.0337293 2121.8521973
Family members CCAvg Education Mortgage
1.1471604 1.7476590 0.8398691 101.7138021
Personal Loan Securities Account CD Account Online
0.2946207 0.3058093 0.2382503 0.4905893
CreditCard
0.455638
Variance of the data
Age (in years) Experience (in years) Income (in K/month) ZIP Code
1.314042e+02 1.315140e+02 2.119104e+03 4.502257e+06
Family members CCAvg Education Mortgage
1.315977e+00 3.054312e+00 7.053801e-01 1.034570e+04
Personal Loan Securities Account CD Account Online
8.680136e-02 9.351934e-02 5.676319e-02 2.406779e-01
CreditCard
0.207606
Summary of data
Age (in years) Experience (in years) Income (in K/month) ZIP Code Family members
Min. :23.00 Min. :-3.0 Min. : 8.00 Min. : 9307 Min. :1.000
1st Qu.:35.00 1st Qu.:10.0 1st Qu.: 39.00 1st Qu.:91911 1st Qu.:1.000
Median :45.00 Median :20.0 Median : 64.00 Median :93437 Median :2.000
Mean :45.34 Mean :20.1 Mean : 73.77 Mean :93153 Mean :2.397
3rd Qu.:55.00 3rd Qu.:30.0 3rd Qu.: 98.00 3rd Qu.:94608 3rd Qu.:3.000
Max. :67.00 Max. :43.0 Max. :224.00 Max. :96651 Max. :4.000
NA's :18
CCAvg Education Mortgage Personal Loan Securities Account
Min. : 0.000 Min. :1.000 Min. : 0.0 Min. :0.000 Min. :0.0000
1st Qu.: 0.700 1st Qu.:1.000 1st Qu.: 0.0 1st Qu.:0.000 1st Qu.:0.0000
Median : 1.500 Median :2.000 Median : 0.0 Median :0.000 Median :0.0000
Mean : 1.938 Mean :1.881 Mean : 56.5 Mean :0.096 Mean :0.1044
3rd Qu.: 2.500 3rd Qu.:3.000 3rd Qu.:101.0 3rd Qu.:0.000 3rd Qu.:0.0000
Max. :10.000 Max. :3.000 Max. :635.0 Max. :1.000 Max. :1.0000
Boxplot of Variables
3.4 Missing Value Identification:
There are 18 missing values in Family members. These are handled with the KNN imputation method; please refer to the R code in the appendix. The imputed data set is named "Thera_Bank_imputed".
No outliers.
The categorical variables coded as 0 and 1 were converted from numeric to factors, so two data sets (numeric and categorical) were created and then merged using cbind. The merged data set is named "Thera_Bank_merged".
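A minimal sketch of this step, assuming Thera_Bank_cat holds the 0/1 coded columns and Thera_Bank_num the remaining numeric columns (the exact column split is not shown in the original):
library(DMwR)                                                       # provides knnImputation()
Thera_Bank_cat     = data.frame(lapply(Thera_Bank_cat, as.factor))  # 0/1 columns to factors
Thera_Bank_merged  = cbind(Thera_Bank_num, Thera_Bank_cat)          # recombine into one data frame
Thera_Bank_imputed = knnImputation(Thera_Bank_merged, k = 5)        # fills the 18 missing Family values
sum(is.na(Thera_Bank_imputed))                                      # 0 after imputation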
A linear regression model was fitted with Personal Loan as the dependent variable; its summary output is shown below.
Residuals:
Min 1Q Median 3Q Max
-0.79891 -0.13417 -0.02883 0.07250 1.04525
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.283e-01 1.604e-01 -2.046 0.0408 *
Education 7.943e-02 4.119e-03 19.284 < 2e-16 ***
CCAvg 1.197e-02 2.467e-03 4.853 1.26e-06 ***
Mortgage 6.668e-05 3.301e-05 2.020 0.0434 *
`CD Account` 3.290e-01 1.587e-02 20.737 < 2e-16 ***
`Family members` 3.341e-02 2.903e-03 11.508 < 2e-16 ***
`Income (in K/month)` 2.999e-03 9.716e-05 30.872 < 2e-16 ***
CreditCard -4.497e-02 7.562e-03 -5.947 2.92e-09 ***
Online -2.653e-02 6.804e-03 -3.900 9.77e-05 ***
`Securities Account` -5.970e-02 1.143e-02 -5.223 1.83e-07 ***
`Experience (in years)` 6.135e-03 2.768e-03 2.216 0.0267 *
`Age (in years)` -5.647e-03 2.771e-03 -2.038 0.0416 *
`ZIP Code` 1.017e-06 1.545e-06 0.658 0.5103
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2311 on 4969 degrees of freedom
(18 observations deleted due to missingness)
Multiple R-squared: 0.3857, Adjusted R-squared: 0.3842
F-statistic: 260 on 12 and 4969 DF, p-value: < 2.2e-16
The percentage of total variation in Personal Loan (the dependent variable) explained by the model is about 38.6% (Multiple R-squared = 0.3857).
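A minimal sketch of the regression behind this summary, assuming the model object is named slm as in the appendix (the lm() call itself is not shown in the original):
# regress Personal Loan on all other variables; the 18 rows with missing
# Family members are dropped by lm(), matching the summary above
slm = lm(`Personal Loan` ~ ., data = Thera_Bank1)
summary(slm)
library(car)
vif(slm)   # variance inflation factors to check multicollinearity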
5 Clustering
Cluster means:
Age (in years) Experience (in years) Income (in K/month) ZIP Code Family members CCAvg
1 45.42912 20.20280 73.11570 93590.01 2.391122 1.922479
2 44.92775 19.68926 72.77749 95083.30 2.440832 1.924041
3 45.60655 20.37037 75.16694 91162.19 2.363539 1.962657
Education Mortgage Securities.Account CD.Account Online CreditCard
1 1.907184 55.84870 0.1055308 0.06293706 0.5785124 0.3121424
2 1.860614 56.93159 0.1054987 0.06393862 0.6246803 0.2985934
3 1.876006 56.68438
The data was divided into three clusters of sizes 1573, 1564 and 1863. The table above gives the cluster means of the variables.
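A minimal sketch of the k-means step, assuming the response column Personal Loan (column 9 of the imputed data) is dropped first and the remaining columns are treated as numeric, as in the appendix; the seed is an assumption and is not set in the original:
C = Thera_Bank_imputed[ , -9]            # exclude Personal Loan before clustering
set.seed(123)                            # assumed seed, for reproducibility only
kmeans.cluster = kmeans(C, centers = 3)  # three clusters, as reported above
kmeans.cluster$size                      # cluster sizes (reported as 1573, 1564 and 1863)
kmeans.cluster$centers                   # cluster means shown in the table above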
6 CART
We split the data so that 70% is the train data and 30% is the test data.
The table() function shows that there are 4520 customers who did not respond and 480 who responded, out of the total 5000 observations.
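A minimal sketch of the split, using sample.split() from the caTools package loaded in the appendix (the exact splitting call is not shown in the original):
set.seed(3000)   # seed taken from the appendix
split = sample.split(Thera_Bank_imputed$Personal.Loan, SplitRatio = 0.70)
train = subset(Thera_Bank_imputed, split == TRUE)    # 3500 observations
test  = subset(Thera_Bank_imputed, split == FALSE)   # 1500 observations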
table(Thera_Bank1$`Personal Loan`)
0 1
4520 480
The train dataset has 3500 observations, of which 3164 are non-respondents and 336 are respondents.
table(train$`Personal.Loan`)
0 1
3164 336
The test dataset has 1500 observations, of which 1356 are non-respondents and 144 are respondents.
table(test$`Personal.Loan`)
0 1
1356 144
The variables CreditCard, Securities Account, CD Account, Education and Online were considered when building the decision tree.
The data and the tree plot show that CD Account is the variable at the root node.
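A minimal sketch of the tree fit, using the variables named above (the exact rpart() call is not shown in the original):
# classification tree on the train data; method = "class" for the 0/1 target
m1 = rpart(Personal.Loan ~ CD.Account + Education + CreditCard + Online + Securities.Account,
           data = train, method = "class")
fancyRpartPlot(m1)   # rattle plot of the tree; CD Account appears at the root node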
Predictions appended to the train data (head(train) output):
Age (in years) Experience (in years) Income (in K/month) ZIP Code Family members
2 45 19 34 90089 3
5 35 8 45 91330 4
6 37 13 29 92121 4
8 50 24 22 93943 1
9 35 10 81 90089 3
10 34 9 180 93023 1
CCAvg Education Mortgage Personal.Loan Securities.Account CD.Account Online
2 1.5 1 0 0 1 0 0
5 1.0 2 0 0 0 0 0
6 0.4 2 155 0 0 0 1
8 0.3 3 0 0 0 0 0
9 0.6 2 104 0 0 0 1
10 8.9 3 0 1 0 0 0
CreditCard predict.score.0 predict.score.1 predict.class
2 0 0.92750986 0.07249014 0
5 1 0.92750986 0.07249014 0
6 0 0.92750986 0.07249014 0
8 1 0.92750986 0.07249014 0
9 0 0.92750986 0.07249014 0
10 0 0.92750986 0.07249014 0
confusion.matrix.train
predict.class
Personal.Loan 0 1
0 3138 26
1 257 79
> Accuracy.train
[1] 0.9191429
The confusion matrix shows that the model has about 91.9% accuracy on the train data.
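A minimal sketch of how these numbers are produced (this mirrors the appendix code):
train$predict.score = predict(m1, train)                    # class probabilities
train$predict.class = predict(m1, train, type = "class")    # predicted 0/1 label
confusion.matrix.train = with(train, table(Personal.Loan, predict.class))
Accuracy.train = sum(diag(confusion.matrix.train)) / sum(confusion.matrix.train)
Accuracy.train   # about 0.919 on the train data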
Predictions appended to the test data (head(test) output):
Age (in years) Experience (in years) Income (in K/month) ZIP Code Family members
2 45 19 34 90089 3
5 35 8 45 91330 4
6 37 13 29 92121 4
8 50 24 22 93943 1
9 35 10 81 90089 3
10 34 9 180 93023 1
CCAvg Education Mortgage Personal.Loan Securities.Account CD.Account Online
2 1.5 1 0 0 1 0 0
5 1.0 2 0 0 0 0 0
6 0.4 2 155 0 0 0 1
8 0.3 3 0 0 0 0 0
9 0.6 2 104 0 0 0 1
10 8.9 3 0 1 0 0 0
CreditCard predict.score.0 predict.score.1 predict.class
2 0 0.92750986 0.07249014 0
5 1 0.92750986 0.07249014 0
6 0 0.92750986 0.07249014 0
8 1 0.92750986 0.07249014 0
9 0 0.92750986 0.07249014 0
10 0 0.92750986 0.07249014 0
confusion.matrix.test
predict.class
Personal Loan 0 1
0 1324 32
1 105 39
> Accuracy.test
[1] 0.9086667
The confusion matrix shows that the model has about 90.9% accuracy on the test data.
7 Random Forest
Call:
randomForest(formula = Personal.Loan ~ CreditCard, data = trainR, keep.forest = TRUE, ntree = 30)
Type of random forest: classification
Number of trees: 30
No. of variables tried at each split: 1
The OOB error rate is 9.68%, which is fairly decent.
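A minimal sketch of reading the OOB error from the fitted forest (Therabank_rf is the object name used in the appendix):
print(Therabank_rf)                        # the printout includes the OOB estimate of error rate
tail(Therabank_rf$err.rate, 1)[, "OOB"]    # OOB error after the last (30th) tree
varImpPlot(Therabank_rf)                   # variable importance plot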
8 ROC Curve
ROC curve for the test data:
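A minimal sketch of the ROC construction with ROCR, using the class-1 probabilities from the CART model; the object names pred.test, perf.test and auc.test are illustrative:
library(ROCR)
pred.test = prediction(test$predict.score[, 2], test$Personal.Loan)
perf.test = performance(pred.test, "tpr", "fpr")
plot(perf.test, main = "ROC curve - test data")
auc.test  = as.numeric(performance(pred.test, "auc")@y.values)   # area under the curve
auc.test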
9 Conclusion
After checking the accuracy of each model, we can conclude that the CART model should be used, as it has the highest prediction accuracy.
10 Appendix Source code
library(readr)
library(readxl)
library(caTools)
library(rpart)
library(rpart.plot)
library(rattle)
library(RColorBrewer)
library(data.table)
library(ROCR)
library(ineq)
library(gplots)
library(InformationValue)
install.packages("Hmisc")
library(Hmisc)
library(lattice)
library(ggplot2)
library(plyr)
library(psych)
library(dplyr)
library(tidyverse)
library(car)
install.packages("carData")
# the data-loading step (a read_excel()/read.csv() call on the Thera Bank file) is omitted here;
# Thera_Bank is the raw data and Thera_Bank1 drops the ID column, as described in the report body
Thera_Bank1 = Thera_Bank[ , -1]
str(Thera_Bank1)
summary(Thera_Bank1)
sum(is.na(Thera_Bank))
describe(Thera_Bank1)
attach(Thera_Bank1)
# titles used for the boxplots; the vector is completed to cover all 13 columns (the original is truncated)
title = c("Age (in years)", "Experience (in years)", "Income (in K/month)", "ZIP Code",
          "Family members", "CCAvg", "Education", "Mortgage", "Personal Loan",
          "Securities Account", "CD Account", "Online", "CreditCard")
# slm: linear model with Personal Loan as the dependent variable; the lm() call is omitted in the
# original script and is reconstructed here from the regression summary shown in the report
slm = lm(`Personal Loan` ~ ., data = Thera_Bank1)
summary(slm)
vif(slm)
corbank=cor(Thera_Bank1[,1:13])
print(corbank)
summary(lmmodel1)   # lmmodel1 is not defined above; it appears to be the same linear model
#Clustering
dist1=dist(x=Thera_Bank_imputed,method="maximum")
dist1
cluster=hclust(dist1,method="complete")
cluster
plot(cluster,labels = as.character(Thera_Bank_imputed[,1]))
rect.hclust(cluster,k=3,border="red")
groups=cutree(cluster,k=3)
data1=cbind(Thera_Bank1,groups)
group1=subset(data1,groups==1)
group1
group2=subset(data1,groups==2)
group2
group3=subset(data1,groups==3)
group3
group4=subset(data1,groups==4)   # empty: cutree() was run with k = 3, so no group 4 exists
group4
group5=subset(data1,groups==5)   # empty: no group 5 exists with k = 3
group5
C=Thera_Bank_imputed[,-c(9)]   # drop column 9 (the Personal Loan response) before k-means
head(C)
kmeans.cluster=kmeans(C,3)     # k-means with 3 clusters
kmeans.cluster
# KNN Imputation
# (Thera_Bank_num holds the numeric columns and Thera_Bank_cat the 0/1 coded columns;
#  the step that splits the data into these two frames is omitted in the original script)
str(Thera_Bank_num)
str(Thera_Bank_cat)
Thera_Bank_cat=data.frame(apply(Thera_Bank_cat,2,function(x){as.factor(x)}))
Thera_Bank_merged=cbind(Thera_Bank_num,Thera_Bank_cat)
sum(is.na(Thera_Bank_merged))
library(DMwR)   # provides knnImputation()
Thera_Bank_imputed=knnImputation(data=Thera_Bank_merged,k=5)
sum(is.na(Thera_Bank_imputed))
##CART
table(Thera_Bank1$`Personal Loan`)
set.seed(3000)
attach(Thera_Bank_imputed)
# we are splitting the data such that 70% is the train data and 30% is the test data
# (the split call is omitted in the original script; reconstructed with caTools::sample.split)
split = sample.split(Thera_Bank_imputed$Personal.Loan, SplitRatio = 0.70)
train = subset(Thera_Bank_imputed, split == TRUE)
test  = subset(Thera_Bank_imputed, split == FALSE)
table(train$`Personal.Loan`)
table(test$`Personal.Loan`)
#####
# model building: classification tree (the rpart() call is omitted in the original script;
# reconstructed using the variables named in the report body)
m1 = rpart(Personal.Loan ~ CD.Account + Education + CreditCard + Online + Securities.Account,
           data = train, method = "class")
m1
fancyRpartPlot(m1)
train$predict.score=predict(m1,train)
train$predict.class=predict(m1,train,type="class")
head(train)
confusion.matrix.train=with(train,table(`Personal.Loan`,predict.class))
confusion.matrix.train
Accuracy.train=sum(diag(confusion.matrix.train))/sum(confusion.matrix.train)
Accuracy.train
# Test
test$predict.score=predict(m1,test)
test$predict.class=predict(m1,test,type="class")
head(test)
confusion.matrix.test=with(test,table(`Personal Loan`,predict.class))
confusion.matrix.test
Accuracy.test=sum(diag(confusion.matrix.test))/sum(confusion.matrix.test)
Accuracy.test
###
rows=seq(from=1,to=nrow(Thera_Bank_imputed),by=1)
set.seed(1)
# trainrows is not defined in the original script; reconstructed as a 70% sample of row indices
trainrows=sample(rows,size=round(0.7*nrow(Thera_Bank_imputed)))
trainR=Thera_Bank_imputed[trainrows,]
testR=Thera_Bank_imputed[-trainrows,]
library(DMwR)
prop.table(table(Thera_Bank_imputed$Personal.Loan))
prop.table(table(trainR$Personal.Loan))
prop.table(table(testR$Personal.Loan))
## Random Forest
library(randomForest)
Therabank_rf=randomForest(Personal.Loan ~ Education+CreditCard+Online+Securities.Account+CD.Account,
                          data=trainR, keep.forest=TRUE, ntree=30)
# refit using only CreditCard; this is the model whose summary is printed in section 7
Therabank_rf=randomForest(Personal.Loan ~ CreditCard, data=trainR, keep.forest=TRUE, ntree=30)
print(Therabank_rf)
print(Therabank_rf$err.rate)
plot(Therabank_rf)
Therabank_rf$predicted
Therabank_rf$importance
varImpPlot(Therabank_rf)
View(trainR)
pred_model_train=predict(Therabank_rf,trainR[,-c(9)],type="class")
trainR$Prediction=pred_model_train
names(trainR)
##
result_train=table("actual value"=trainR$Personal.Loan,trainR$Prediction)
result_train
##
pred_model_test=predict(Therabank_rf,testR[,-c(9)],type = "class")
result_test=table("actual values"=testR$Personal.Loan,pred_model_test);result_test
test_accuracy=sum(diag(result_test))/sum(result_test);test_accuracy
test_recall=(result_test[2,2])/(result_test[2,2]+result_test[2,1]);test_recall
test_precision=(result_test[2,2])/(result_test[2,2]+result_test[1,2]);test_precision
#Building the ROC curve and lift charts
library(ROCR)
pred = prediction(train$predict.score[,2],train$`Personal.Loan`)
perf = performance(pred, "tpr", "fpr")
plot(perf)   # ROC curve for the train data
#install.packages("ineq")
library(ineq)
# gini, auc and KS are not computed in the original script; reconstructed below
gini = ineq(train$predict.score[,2], type = "Gini")
gini
#Confusion Matrix
nrow(train)
#Accuracy and KS
auc = as.numeric(performance(pred, "auc")@y.values)
auc
KS = max(perf@y.values[[1]] - perf@x.values[[1]])
KS
#Concordance/Discordance
library(InformationValue)
Concordance(actuals=train$`Personal.Loan`, predictedScores=train$predict.score[,2])
head(test)
#nrow(p_test)
library(ROCR)
#install.packages("ineq")
library(ineq)
# the test-side gini, auc and KS are not computed in the original script; reconstructed below
pred.test = prediction(test$predict.score[,2],test$`Personal.Loan`)
perf.test = performance(pred.test, "tpr", "fpr")
plot(perf.test)   # ROC curve for the test data
gini.test = ineq(test$predict.score[,2], type = "Gini")
gini.test
#Confusion Matrix
nrow(test)
#Accuracy and KS
auc.test = as.numeric(performance(pred.test, "auc")@y.values)
auc.test
KS.test = max(perf.test@y.values[[1]] - perf.test@x.values[[1]])
KS.test
The End