
Random Forest

The document discusses using random forest modeling on fraud data to classify individuals as either "Risky" or "Good" based on their taxable income. It installs relevant packages, loads the fraud data, builds a random forest model with 600 trees using taxable income as the target variable and other columns as predictors, and evaluates the results by checking variable importance and the risk classification breakdown.

Uploaded by

santhi s

Use Random Forest to prepare a model on fraud data, treating those who have taxable_income <= 30000 as "Risky" and all others as "Good".

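# Install the required packages (only needed once), then load them.
install.packages("caret", dependencies = TRUE)
install.packages("randomForest")
library(randomForest)
library(caret)

# Inspect the data; Fraud_check_ must already be loaded in the session.
View(Fraud_check_)
hist(Fraud_check_$Taxable.Income)

# Fit a random forest with 600 trees on Taxable.Income.
model <- randomForest(Fraud_check_$Taxable.Income ~ ., data = Fraud_check_, ntree = 600)
# View the forest results.
print(model)

# Label each record: taxable income <= 30000 is "Risky", otherwise "Good".
Risk = ifelse(Fraud_check_$Taxable.Income <= 30000, "Risky", "Good")
Fraud = data.frame(Fraud_check_, Risk)
Fraud1 = Fraud[, c(1:6)]  # the original six columns, without Risk
str(Fraud)  # Importance of the variable - Lower Gini
table(Fraud$Risk)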
> print(model)

Call:
 randomForest(formula = Fraud_check_$Taxable.Income ~ ., data =
 Fraud_check_, ntree = 600)
               Type of random forest: regression
                     Number of trees: 600
No. of variables tried at each split: 1

          Mean of squared residuals: 693268098
                    % Var explained: -1.13
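
The negative "% Var explained" in the output above can look surprising. As far as the randomForest documentation describes it, this figure is a pseudo R-squared, 1 - MSE / Var(y), computed from out-of-bag predictions, so it drops below zero whenever the forest predicts worse than simply using the mean of the response. The sketch below reproduces that arithmetic in base R on hypothetical toy values (the vectors `y` and `pred` are invented for illustration and are not taken from the fraud data).

```r
# Hypothetical toy values, chosen so the arithmetic is easy to check by hand.
y    <- c(1, 2, 3, 4, 5)   # observed responses
pred <- c(5, 4, 3, 2, 1)   # deliberately poor predictions

mse     <- mean((y - pred)^2)       # mean of squared residuals: 8
var_y   <- mean((y - mean(y))^2)    # variance of y with divisor n: 2
pct_var <- 100 * (1 - mse / var_y)  # pseudo R-squared as a percentage

pct_var   # -300: the predictions are worse than predicting mean(y)
```

Read this way, a value of -1.13 suggests the regression forest does about as well as predicting the average taxable income for every record.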
> str(Fraud)  # Importance of the variable - Lower Gini
'data.frame': 600 obs. of 7 variables:
$ Undergrad : chr "NO" "YES" "NO" "YES" ...
$ Marital.Status : chr "Single" "Divorced" "Married" "Single" ...
$ Taxable.Income : num 68833 33700 36925 50190 81002 ...
$ City.Population: num 50047 134075 160205 193264 27533 ...
$ Work.Experience: num 10 18 30 15 28 0 8 3 12 4 ...
$ Urban : chr "YES" "YES" "YES" "YES" ...
$ Risk : chr "Good" "Good" "Good" "Good" ...
> table(Fraud$Risk)

Good Risky
476 124
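
The forest fitted earlier is a regression on Taxable.Income, while the task asks for a "Risky"/"Good" classification. A minimal sketch of the classification variant follows, under several assumptions: `toy` is a hypothetical stand-in with the same column names as Fraud_check_ (its values are synthetic), the character columns are stored as factors, and Taxable.Income is dropped from the predictors because Risk is derived directly from it. It is one possible way to set the problem up, not necessarily the approach the original author intended.

```r
library(randomForest)

# Hypothetical stand-in for Fraud_check_ with the same column names;
# the values are synthetic and only illustrate the workflow.
set.seed(42)
n <- 200
toy <- data.frame(
  Undergrad       = sample(c("NO", "YES"), n, replace = TRUE),
  Marital.Status  = sample(c("Single", "Married", "Divorced"), n, replace = TRUE),
  Taxable.Income  = round(runif(n, 10000, 100000)),
  City.Population = round(runif(n, 25000, 200000)),
  Work.Experience = sample(0:30, n, replace = TRUE),
  Urban           = sample(c("NO", "YES"), n, replace = TRUE),
  stringsAsFactors = TRUE  # store text columns as factors rather than character
)

# Store Risk as a factor so that randomForest grows a classification forest.
toy$Risk <- factor(ifelse(toy$Taxable.Income <= 30000, "Risky", "Good"))

# Remove Taxable.Income: Risk is computed from it, so keeping it would leak the label.
predictors <- toy[, names(toy) != "Taxable.Income"]

clf <- randomForest(Risk ~ ., data = predictors, ntree = 600, importance = TRUE)

print(clf)       # reports a classification forest with an OOB confusion matrix
importance(clf)  # includes MeanDecreaseAccuracy and MeanDecreaseGini per variable
```

With the real data, the same steps would apply to `Fraud` after converting its `chr` columns and `Risk` to factors, for example with `as.factor()`.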
