
The International Journal of Social Sciences and Humanities Invention 5(06): 4820-4823, 2018

DOI: 10.18535/ijsshi/v5i6.09 ICV 2015: 45.28


ISSN: 2349-2031
© 2018, THEIJSSHI

Research Article
Research on bank credit default prediction based on data mining algorithm
Li Ying
School of Business Administration, China University of Petroleum-Beijing, Beijing, 102249, China

ABSTRACT: It is of great importance to identify the potential risks among a bank's loan customers. Classifying loan customers with data mining classification algorithms is an effective way to do this. In this paper, we use the Random Forest method, the Logistic Regression method, the SVM method and other suitable classification algorithms in Python to study and analyze a bank credit data set, and we compare these models on five model effect evaluation statistics: Accuracy, Recall, Precision, F1-score and ROC area. The paper uses data mining classification algorithms to identify risk customers from a large number of customers and thereby provide an effective basis for the bank's loan approval.

Key words: Bank credit, Risk prediction, Data mining, Classification algorithm, Python
INTRODUCTION
In today's information and digital age, bank credit default is still frequent, so establishing an effective model that predicts whether bank customers will default on their loans, and thereby recognizes the risk among a mass of loan applicants, is of great significance. At present, many scholars at home and abroad have studied probability prediction models for bank credit default and put forward forecasting methods, almost all of which have their own restrictions and defects. Beaver proposed a single-factor method based on financial ratios to analyze the credit default prediction of enterprises[1]. Pompe et al. constructed a default prediction model using multivariate discriminant analysis (MDA)[2]. Yang et al. used the Logistic Regression method to establish a probability prediction model of listed companies' credit default and identified the most influential corporate financial indicators[3]. Zhang et al. proposed an SVM model that builds a credit default prediction model for small and medium-sized enterprises by constructing an evaluation index system from different input variables[4]. A large number of studies have shown that data mining classification algorithms can find hidden rules in mass data and use these rules to classify and predict new, unknown data[5]. Through data mining classification algorithms, banks can set up classification models using the relevant personal information and consumption data of past loan applicants and find out the characteristics of risk customers. The classification model can then be used to make classification predictions for new loan applicants, from which the risk customers are identified, so as to reduce the risk of default. Therefore, in this paper we use the Random Forest method, the Logistic Regression method, the SVM method and other suitable classification algorithms to study and analyze the bank credit data set, and we compare these models on five model effect evaluation statistics (Accuracy, Recall, Precision, F1-score and ROC area) to identify the risk customers from a large number of customers and provide effective approaches for the bank's loan approval.

Data sample collection and preprocessing
This paper uses the bank credit data set loan_model_sample on Kaggle as the target data set for the study[6]. There are 11017 samples and 199 attribute features. After the data collection is completed, the data is viewed and pre-processed. The overall framework of data preprocessing is shown in Figure 1.
Figure 1 Data pre-processing framework (loan sample data → split into train and test → missing value processing: if the missing value scale of a feature is below 20%, fill it by mean value interpolation, otherwise delete the feature → StandardScaler → class imbalance processing → feature engineering → dimension reduction)
Firstly, in order to prevent data leakage, the data set is divided into two parts, training and testing. The training set is used to train the model, and the test set is used to test the model's classification accuracy. Secondly, the missing values of the data set are inspected and processed. The visualization of the missing values of the original data is shown in Figure 2, in which white lines represent missing data. The threshold is set to 20%: features whose missing value proportion is greater than the threshold are directly deleted, while feature attributes whose missing values are
smaller than the threshold are filled by mean value interpolation using the preprocessing module of scikit-learn. The processed data are visualized in Figure 3, from which it can be seen that the entire data set has been filled completely.
Figure 2 Missing data visualization
Figure 3 Missing data visualization after processing
Then, the StandardScaler in scikit-learn is used to standardize the data, and the data set is checked for a class imbalance issue. Feature engineering is performed on the data set to select the feature attributes that are valid for the classification model. Finally, in order to avoid overfitting and reduce the complexity of the model, the PCA method, the Pearson correlation coefficient or other automatic feature screening methods are used to perform dimension reduction on the training set.
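To make this pipeline concrete, a minimal Python sketch with pandas and scikit-learn is given below. Only the 20% missing-value threshold, the mean imputation, the StandardScaler step and the PCA step come from the text above; the file name, the target column name "Class", the 70/30 split ratio and the retained-variance level are illustrative assumptions.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical file and target column names; the paper does not publish its code.
df = pd.read_csv("loan_model_sample.csv")
X, y = df.drop(columns=["Class"]), df["Class"]

# Split first so that no information from the test set leaks into preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Drop features whose missing-value proportion exceeds the 20% threshold.
keep_cols = X_train.columns[X_train.isna().mean() <= 0.20]
X_train, X_test = X_train[keep_cols], X_test[keep_cols]

# Fill the remaining gaps with the training-set mean (assumes numeric features).
imputer = SimpleImputer(strategy="mean").fit(X_train)
X_train = pd.DataFrame(imputer.transform(X_train), columns=keep_cols)
X_test = pd.DataFrame(imputer.transform(X_test), columns=keep_cols)

# Check how imbalanced the two classes are.
print(y_train.value_counts(normalize=True))

# Standardize, fitting the scaler on the training data only.
scaler = StandardScaler().fit(X_train)
X_train_std, X_test_std = scaler.transform(X_train), scaler.transform(X_test)

# Dimension reduction with PCA, here keeping 95% of the variance.
pca = PCA(n_components=0.95).fit(X_train_std)
X_train_red, X_test_red = pca.transform(X_train_std), pca.transform(X_test_std)
```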
Establishment and Evaluation of classification models
Based on the pre-processed data sets, the Random Forest[7], Logistic Regression[8] and SVM[9] methods are used, through the Python programming language, to establish classification models, and their hyper-parameters are tuned with the GridSearchCV method. In this way, the parameter values for which each model's classification effect is best are obtained. Five model effect evaluation statistics are used to compare the classification prediction effect of these models: Overall Accuracy, Recall, Precision, F1-score and ROC area.
The most straightforward way to evaluate classification model performance is confusion matrix analysis. The confusion matrix is a concept from machine learning that contains information about the actual classifications and the predicted classifications produced by a classification system. A confusion matrix has two dimensions: one is indexed by the actual class of an object, the other by the class that the classifier predicts[10]. In the bank credit data used in this paper, the number 0 represents the credit default customer category and 1 represents the normal customer category. The confusion matrix is shown in Table Ⅲ.

Table Ⅲ CONFUSION MATRIX
                            Predicted Class
                            Class=1     Class=0
Actual Class    Class=1     TP          FN
                Class=0     FP          TN

A series of measures of the performance of learning systems, such as Overall Accuracy, Recall, Precision and F1-score, can be defined based on the confusion matrix. The definitions are as follows.
Accuracy is the proportion of correct predictions among the overall number of predictions; it is defined by the formula[11]:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives[12,13].
Precision measures the proportion of truly positive samples among the samples that the classification model predicts as positive[14]:
Precision = TP / (TP + FP)
Recall measures the ability of the prediction model to select instances of a certain class from a data set[7]; it also represents the TPR (true positive rate)[15]:
Recall = TPR = TP / (TP + FN)
F1-score is the harmonic average of precision and recall; it is defined by the formula[11]:
F1-score = 2 × Precision × Recall / (Precision + Recall)
FPR (false positive rate) describes the proportion of actual negative-class samples that the model predicts as positive[14]:
FPR = FP / (FP + TN)
ROC (receiver operating characteristic) analysis is a technique for visualizing, organizing and selecting classifiers based on their performance[15]. It is a comprehensive index reflecting the continuous variables of sensitivity and specificity. The ROC curve is a two-dimensional curve with FPR on the X axis and TPR on the Y axis, ranging from (0,0) to (1,1). A common method to compare classifiers is to calculate the area under the ROC curve, abbreviated AUC[16,17]. The larger the AUC, the better the model's classification.
Since the purpose of the bank credit default model is to identify credit default risk customers from a large number of loan applicants, the cost of predicting a risk customer (Class=0) as a normal customer (Class=1) is much larger than that of predicting a normal customer (Class=1) as a risk customer (Class=0). Based on the above analysis, this paper pays more attention to whether the model can correctly classify Class=0 when assessing model classification results, so the classification results reported in this paper are all for Class=0.
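The statistics above map directly onto scikit-learn's metric functions. The sketch below continues the preprocessing sketch from the previous section and is only a hedged illustration: the untuned RandomForestClassifier is a stand-in for the tuned models discussed in the next section, and passing pos_label=0 reflects the paper's focus on the default class (Class=0).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Stand-in model; the paper tunes its classifiers with GridSearchCV (next section).
model = RandomForestClassifier(random_state=42).fit(X_train_red, y_train)
y_pred = model.predict(X_test_red)

# Score of the default class (Class = 0) for the ROC/AUC computation.
default_col = list(model.classes_).index(0)
y_score = model.predict_proba(X_test_red)[:, default_col]

print(confusion_matrix(y_test, y_pred, labels=[1, 0]))       # Table III layout
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, pos_label=0))
print("Recall   :", recall_score(y_test, y_pred, pos_label=0))
print("F1-score :", f1_score(y_test, y_pred, pos_label=0))
print("ROC AUC  :", roc_auc_score((y_test == 0).astype(int), y_score))
```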
Findings and discussions
This paper uses the RandomForest, LogisticRegression and SVM methods to establish classification models, performs research and analysis on the pre-processed bank credit default data sets, and uses the GridSearchCV method to search for the best parameters. The three classification model construction methods correspond respectively to the RandomForestClassifier, LogisticRegression and SVC algorithms in scikit-learn.
In the RandomForestClassifier algorithm, the parameter whose best value GridSearchCV needs to find in order to enhance the prediction effect of the classification model is n_estimators, which represents the number of trees in the forest. In addition, as the number of trees in the forest increases, the classifier becomes more and more complicated, which may bring an overfitting problem to the model. To decrease the model complexity, we also adjust the values of some other parameters of the RandomForestClassifier algorithm, such as max_depth, min_samples_split, min_samples_leaf and max_features.
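A sketch of this tuning step, continuing the earlier code, is shown below; only the parameter names come from the text, while the candidate values and the scoring choice are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values and the roc_auc scoring are assumptions for illustration;
# the parameter names are the ones discussed in the paper.
rf_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
    "max_features": ["sqrt", "log2"],
}
rf_search = GridSearchCV(RandomForestClassifier(random_state=42),
                         rf_grid, scoring="roc_auc", cv=5, n_jobs=-1)
rf_search.fit(X_train_red, y_train)
print(rf_search.best_params_, rf_search.best_score_)
```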
In the LogisticRegression algorithm, we use GridSearchCV to search for the best parameter combination: one parameter is the penalty (the penalty item), chosen between l1 and l2, and the other is C, which represents the reciprocal of the regularization coefficient λ. The objective function can be summarized as follows[18]:
J(w) = L(w) + λ·Ω(w), where Ω(w) = Σj |wj| for the l1 penalty and Σj wj² for the l2 penalty
Here the first item L is the training error and the second item is the penalty item. The first item serves to minimize the training error and obtain the best fit to the data; the second serves to simplify the model, prevent overfitting and obtain better generalization ability.
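The corresponding search can be sketched as follows; the value range for C is an assumption, and solver="liblinear" is used because the default solver in current scikit-learn releases does not support the l1 penalty.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# penalty and C are the parameters named in the paper; the candidate values
# are illustrative assumptions (C is the inverse of the regularization strength).
lr_grid = {
    "penalty": ["l1", "l2"],
    "C": [0.01, 0.1, 1, 10, 100],
}
lr_search = GridSearchCV(LogisticRegression(solver="liblinear", max_iter=1000),
                         lr_grid, scoring="roc_auc", cv=5, n_jobs=-1)
lr_search.fit(X_train_red, y_train)
print(lr_search.best_params_, lr_search.best_score_)
```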
The SVC algorithm is the classifier model of SVM. In the SVC algorithm, we use GridSearchCV to find the best parameter combination of the kernel {'linear', 'poly', 'rbf'}, the penalty parameter C of the error term, and gamma, which represents the kernel coefficient for 'rbf', 'poly' and 'sigmoid'. C is the penalty factor: the larger its value, the heavier the penalty for misclassifying training samples and the higher the requirement for correct classification. Gamma is a parameter of the insensitive loss function: the smaller its value, the more support vectors there are, and the larger the margin between the two edges, the easier it is to find the maximum marginal hyperplane[5].
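A matching sketch for the SVC search is given below; the candidate values are again assumptions, and probability=True is enabled only so that class scores are available for the ROC-style evaluation sketched earlier.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# kernel, C and gamma are the parameters discussed in the paper;
# the candidate values below are illustrative assumptions.
svc_grid = {
    "kernel": ["linear", "poly", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.1],
}
svc_search = GridSearchCV(SVC(probability=True, random_state=42),
                          svc_grid, scoring="roc_auc", cv=5, n_jobs=-1)
svc_search.fit(X_train_red, y_train)
print(svc_search.best_params_, svc_search.best_score_)
```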
After parameter optimization, we obtain the comparison results of the RandomForestClassifier, LogisticRegression and SVC algorithms on loan_model_sample, as shown in Table Ⅳ; the algorithms' ROC curves are shown in Figure 4.

Table Ⅳ COMPARISON OF CLASSIFICATION ALGORITHM RESULTS
Algorithm                  Accuracy    Precision    Recall    F1-score
RandomForestClassifier     0.8863      0.18         0.67      0.28
LogisticRegression         0.7130      0.09         0.64      0.16
SVC                        0.8580      0.15         0.55      0.24

Figure 4 Comparison of algorithm ROC curves (AUC: RandomForestClassifier = 0.71, LogisticRegression = 0.68, SVC = 0.71)
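ROC curves like those summarized in Figure 4 could be produced roughly as follows; the plotting details are an assumption, and the estimators are the best models from the three grid searches sketched above (RocCurveDisplay requires scikit-learn 1.0 or later).

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

fig, ax = plt.subplots(figsize=(6, 5))
for name, search in [("RandomForestClassifier", rf_search),
                     ("LogisticRegression", lr_search),
                     ("SVC", svc_search)]:
    # Plot each tuned model's ROC curve (TPR vs. FPR) for Class = 0 on the test set.
    RocCurveDisplay.from_estimator(search.best_estimator_, X_test_red, y_test,
                                   pos_label=0, name=name, ax=ax)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance diagonal
ax.set_title("ROC curves of the three classifiers (Class = 0)")
plt.show()
```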
The above comparison results show that the RandomForestClassifier algorithm has a good classification effect for large-sample and high-dimensional attribute data sets, because its Recall, F1-score, Accuracy, Precision and ROC area are all larger than those of the other classifiers. The LogisticRegression algorithm has a relatively high recall, but the lowest precision. The SVC algorithm has relatively high accuracy, precision and F1-score, but the lowest recall, which to a certain degree shows that the SVC algorithm may not be suitable for bank credit default prediction models, because it may well omit risky loan applicants and cause huge losses to banks. Thus, we consider the RandomForest algorithm the most suitable of the three for the bank credit default prediction problem, especially when the data set is very large or has high dimensions.

Conclusions
This paper establishes bank credit default prediction models using the RandomForest, LogisticRegression and SVM classification algorithms in the Python language environment, and compares the classification effect of the classifiers through five model effect evaluation statistics: Accuracy, Recall, Precision, F1-score and ROC area. The comparative experimental results show that, compared with the LogisticRegression and SVM classification algorithms, the RandomForest algorithm is more suitable for the bank credit default prediction model because of its high classification effect for Class=0, especially when the data set is very large or has high dimensions. This paper provides an effective experimental basis for bank credit approval, identifying risk customers from a large number of loan applicants using data mining classification algorithms.
Since the number of appropriate public data sets for bank credit is small, the number of samples in this paper is only 11,017, and therefore the experiment may not be comprehensive. In the future, it is necessary to collect larger data sets with more samples and features for further improvement.

References
[1] Beaver, W. Financial Ratios as Predictors of Failure. Empirical Research in Accounting: Selected Studies[J]. Journal of Accounting Research, 1966, (4).

[2] Pompe, P.P.M., Bilderbeek, J. The Prediction of Bankruptcy of Small- and Medium-sized Industrial Firms[J]. Journal of Business Venturing, 2005, 20.
[3] Yang Pengbo, Zhang Chenghu, Zhang Xiang. Prediction model of credit default probability of listed companies based on Logistic regression analysis[J]. Economic Latitude and Longitude, 2009(02): 144-148.
[4] Zhang Jie, Zhao Feng. SME credit default prediction based on support vector machine[J]. Statistics and Decision, 2013(20): 66-69.
[5] Mei Mei. Application of data mining classification algorithm in credit card risk management[J]. Modern Computer, 2013(19): 13-16.
[6] https://www.kaggle.com/datasets
[7] Liaw A, Wiener M. Classification and regression by
randomForest[J]. R news, 2002, 2(3): 18-22.
[8] Hosmer Jr D W, Lemeshow S, Sturdivant R X. Applied
logistic regression[M]. John Wiley & Sons, 2013.
[9] Joachims T. Making large-scale SVM learning
practical[R]. Technical report, SFB 475:
Komplexitätsreduktion in Multivariaten Datenstrukturen,
Universität Dortmund, 1998.
[10] Deng X, Liu Q, Deng Y, et al. An improved method to
construct basic probability assignment based on the
confusion matrix for classification problem[J].
Information Sciences, 2016, 340: 250-261.
[11] Ohsaki M, Wang P, Matsuda K, et al. Confusion-matrix-
based Kernel logistic regression for imbalanced data
classification[J]. IEEE Transactions on Knowledge and
Data Engineering, 2017, 29(9): 1806-1819.
[12] Branco P, Torgo L, Ribeiro R P. A survey of predictive
modeling on imbalanced domains[J]. ACM Computing
Surveys (CSUR), 2016, 49(2): 31.
[13] Fanshawe T R, Power M, Graziadio S, et al. Interactive
visualisation for interpreting diagnostic test accuracy
study results[J]. BMJ Evidence-Based Medicine, 2018,
23(1): 13-16.
[14] Davis J, Goadrich M. The relationship between Precision-
Recall and ROC curves[C]//Proceedings of the 23rd
international conference on Machine learning. ACM,
2006: 233-240.
[15] Fawcett T. An introduction to ROC analysis[J]. Pattern
recognition letters, 2006, 27(8): 861-874.
[16] Bradley A P. The use of the area under the ROC curve in
the evaluation of machine learning algorithms[J]. Pattern
recognition, 1997, 30(7): 1145-1159.
[17] Breheny P. Classification and regression trees[J]. 1984.
[18] Hilbe J M. Logistic regression models[M]. CRC Press, 2009.
