
2012 2nd International Conference on Computer Science and Network Technology

Genetic Algorithm-based Feature Selection Method for Credit Risk Analysis

Xiaoyun Liu
School of Business
Fudan University
Shanghai, China
[email protected]

James Huang
Department of Computer Science
University of California
Los Angeles, USA
[email protected]

Abstract—Credit risk assessment of financial intermediaries is an important problem in finance. The key is to find accurate predictors of individual risk in the credit portfolios of institutions. However, assessing credit risk is very challenging, as many factors may contribute to the risk and their relationships are complicated to capture. Recent years have witnessed a growing trend of applying statistical and machine learning modeling methods, such as the SVM classifier, to credit risk analysis, as they are effective in capturing nonlinear relationships in the data. However, high-dimensional training data not only results in time-consuming computation but also hurts the performance of the classifier. In this paper, we propose a wrapper feature selection method based on a genetic algorithm to select a subset of essential features that contribute to good performance in credit risk classification. We test our method on a real-world credit risk prediction task, and our empirical results demonstrate the advantage of our method over other competing ones.

Keywords: feature selection; machine learning; genetic algorithm; credit risk analysis

I. INTRODUCTION

Credit risk management plays a key role in the financial and banking industry. Generally, credit risk management includes credit risk analysis, assessment (measurement) of enterprise credit risk, and the efficient management of that risk, with credit risk assessment being the basic and critical factor. The main purpose of credit risk assessment is to measure the default probability of borrowers and to provide the lender with a decision aid, by conducting qualitative analysis and quantitative computation of the factors that may cause credit risk. At present, classification methods from machine learning are the most popular approach to credit risk assessment: according to the financial status of a borrower, a credit scoring system estimates the corresponding risk rate, so that the status can be classified as normal or default.

The support vector machine (SVM) [1] is a relatively new machine learning technique for training a powerful classifier, and it suits the credit assessment problem well because of its explanatory power. The structure of SVM has many computational advantages, such as being designed for finite samples and having a complexity that is independent of the sample dimension. However, high-dimensional data in the credit assessment problem may make classification training difficult and degrade classification accuracy. Reducing the feature size of the data and selecting a group of effective features for SVM are therefore very important. Feature selection methods are used to solve this problem before the classifier is trained.

Feature selection has become a research focus in many areas in recent years. With the rapid advance of computer science and information technologies, large-scale datasets with thousands of attributes are now ubiquitous in data mining, pattern recognition, and machine learning [2-3]. Processing such huge datasets is challenging because most machine learning techniques work well only on small datasets [4]. Feature subset selection addresses this problem mainly by identifying and eliminating irrelevant and redundant features, so that the dimensionality of the dataset drops. Its goal is to find a small feature subset that describes the data for a learning task as well as or better than the original dataset, in order to reduce the computational cost, provide a better understanding of the data, and achieve high classification accuracy [5].

Algorithms for feature selection or attribute reduction fall into two main categories, depending on whether the approach uses feedback from the subsequent performance of the machine learning algorithm (e.g., SVM for a classification task); the two models are contrasted in Figure 1.

A filter method is a no-feedback, pre-selection method that does not involve the machine learning algorithm to be applied later. The data is first analyzed with statistical techniques to determine which features describing the data records are relevant to the class attribute. Afterwards, the relevant feature subset is used to train a classifier for prediction. In a filter method, no feedback from the subsequent performance of the induction algorithm is used. Typical filter methods are the ReliefF algorithm, chi-squared (χ2) feature selection, information gain (IG) based feature selection, gain ratio (GR) based feature selection, symmetrical uncertainty (SU) based feature selection, etc. [6,7]

In contrast, a wrapper method is a feedback method that incorporates the machine learning algorithm in the feature selection process.



The optimal feature subset is determined by a search in the space of possible selections. This means that candidate feature subsets are generated and then evaluated by running the induction algorithm through the train and test phases with each selection of features, providing feedback on its learning performance. Feature selection methods using random search or greedy hill-climbing search are representative wrapper methods [8].

In general, a wrapper approach should give better results than a filter method, since it adapts itself to the inherent biases of the induction algorithm to be used. However, a wrapper approach has larger computational costs, which may make it prohibitive [8]. Considering the importance of credit risk, we are seeking a powerful classifier with high accuracy, so a wrapper method consisting of a search strategy and an induction learning algorithm is a good choice for selecting an optimal subset of features. Usually an exhaustive search is too expensive, so non-exhaustive search techniques like hill-climbing or random search are often used. In this paper, we use a genetic algorithm (GA) to search for the best selection of features, with the machine learning algorithm providing the GA's fitness function. Compared to traditional search techniques, GA is more powerful at locating a global optimum.

Figure 1. Models of filter and wrapper
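To make the contrast in Figure 1 concrete, the following minimal Python sketch (ours, not the authors' code; the use of scikit-learn and the particular scoring statistic are assumptions) shows a filter ranking features once with a statistic, versus a wrapper scoring each candidate subset by training the actual classifier:

```python
# A minimal sketch contrasting filter and wrapper feature selection.
# Assumes scikit-learn; the statistic and classifier are illustrative.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def filter_select(X, y, k):
    """Filter: score every feature once, keep the k best (no feedback)."""
    scores = mutual_info_classif(X, y)
    return np.argsort(scores)[-k:]

def wrapper_score(mask, X, y):
    """Wrapper: feedback is the induction algorithm's own CV accuracy."""
    return cross_val_score(SVC(), X[:, np.flatnonzero(mask)], y, cv=5).mean()
```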
II. GENETIC ALGORITHMS FOR FEATURE SELECTION USING SUPPORT VECTOR MACHINE

A. Brief introduction to support vector machine

Support Vector Machines (SVM) form a classification system derived from statistical learning theory. They have been applied successfully in fields such as text categorization, hand-written character recognition, and image classification.

The SVM separates the classes with a decision surface that maximizes the margin between them. The surface is often called the optimal hyper-plane, and the data points closest to the hyper-plane are called support vectors; the support vectors are the critical elements of the training set. The SVM can be adapted into a nonlinear classifier through the use of nonlinear kernels. While the SVM is a binary classifier in its simplest form, it can function as a multiclass classifier by combining several binary SVM classifiers (creating one binary classifier for each possible pair of classes); this pairwise classification strategy is often used for multiclass classification.

The output of SVM classification is a decision value for each instance and each class, which is used for probability estimates. The probability values represent "true" probabilities in the sense that each falls in the range 0 to 1 and the sum of these values for each instance equals 1. Classification is then performed by selecting the class with the highest probability.
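This behavior is easy to reproduce with an off-the-shelf SVM implementation. The sketch below (our illustration; the library choice is an assumption, not part of the paper) uses scikit-learn's SVC, which applies the pairwise (one-vs-one) strategy internally on multiclass data and, with probability estimation enabled, emits per-class probabilities that sum to 1:

```python
# Minimal sketch: pairwise multiclass SVM with probability estimates.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # a 3-class toy dataset
clf = SVC(kernel='rbf', probability=True)    # one binary SVM per class pair
clf.fit(X, y)

proba = clf.predict_proba(X[:3])             # per-class probabilities
print(proba.round(3), proba.sum(axis=1))     # each row sums to 1
```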

B. Brief introduction to genetic algorithm

Genetic algorithms (GAs) [9], a form of inductive learning strategy, are adaptive search techniques initially invented by John Holland. They derive their name from the fact that their operations resemble the mechanics of genetic models of natural systems. A GA is implemented as a computer simulation in which a population of abstract representations (called chromosomes, the genotype of the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions via genetic operators such as selection, crossover, and mutation.

Figure 2 illustrates the genetic operators of crossover and mutation. Crossover, the critical genetic operator that allows new regions of the search space to be explored, is a random mechanism for exchanging genes between two chromosomes, using one-point, two-point, or homologous crossover. In mutation, genes may occasionally be altered, for example by changing a gene value from 0 to 1 or vice versa in a binary-coded chromosome.

Figure 2. Illustration of the crossover and mutation operators
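A minimal sketch of these two operators on binary chromosomes (our illustration, not the paper's code; the default mutation rate mirrors the value used later in the experiments):

```python
# One-point crossover and bit-flip mutation for binary chromosomes.
import random

def one_point_crossover(parent_a, parent_b):
    """Swap the gene tails of two chromosomes at a random cut point."""
    point = random.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate=0.03):
    """Flip each gene (0 <-> 1) independently with probability `rate`."""
    return [g ^ 1 if random.random() < rate else g for g in chromosome]

a, b = [1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 0, 0, 1, 1, 0, 1]
child_a, child_b = one_point_crossover(a, b)
print(mutate(child_a), mutate(child_b))
```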

C. Genetic algorithm-based feature selection

The feature selection problem can be considered a combinatorial optimization problem: search the feature space for the subset that trains a classifier with the maximal classification accuracy rate. Since a genetic algorithm performs a randomized search and is not very susceptible to getting stuck in local minima, it can be used to search for relevant features. In the feature selection problem, the GA population (the chromosomes) is coded as simple vectors of binary genes, where 1s represent relevant features. A GA chromosome is shown in Figure 3.

Figure 3. GA chromosome

The fitness of a solution is mainly evaluated by training the classifier on the training data using only the features corresponding to 1s in the chromosome and returning the classification accuracy as the fitness. In addition, the size of the feature subset is also a factor affecting the fitness of a solution. We use the following fitness formula:

fitness = w_a × accuracy + w_f × (Σ_{i=1..n} c_i)⁻¹

where w_a is the weight for classification accuracy, w_f the weight for the number of features, and c_i the mask value of the i-th feature: '1' means the feature is selected and '0' that it is not. It follows that a high fitness value requires high classification accuracy and a small number of features [13]. Figure 4 illustrates the principle of GA-based feature selection.

Figure 4. Genetic algorithm-based feature selection
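A compact sketch of this fitness evaluation (our illustration, not the authors' code; the weight values and the use of scikit-learn are assumptions, while the 10-fold cross-validation mirrors the experimental setup below):

```python
# Fitness of a chromosome: weighted SVM accuracy plus a reward for
# small feature subsets, following the fitness formula above.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

W_ACC, W_FEAT = 0.8, 0.2             # illustrative weights w_a and w_f

def fitness(chromosome, X, y):
    mask = np.asarray(chromosome, dtype=bool)
    if not mask.any():                # an empty subset cannot be evaluated
        return 0.0
    accuracy = cross_val_score(SVC(), X[:, mask], y, cv=10).mean()
    return W_ACC * accuracy + W_FEAT / mask.sum()
```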
III. EXPERIMENTS AND ANALYSIS

A. Dataset

The dataset used in this paper comes from the private-label credit card operation of a major Brazilian retail chain. There are 50,000 instances in the original dataset, each labeled as positive (good) or negative (bad). In this experiment, we use a subset of the data containing 5,000 instances with balanced positive and negative labels. Each instance has 32 features, including client ID, sex, age, education, shopping history, monthly income, etc. In our experiments, we use the 22 features that we believe are most relevant to credit risk.

B. Experimental setup

We randomly split the data into a training set (60% of points), a validation set (20% of points), and a testing set (20% of points). We use the training set to do feature selection and record the optimum feature subset, and finally use this small subset of features to train the classifier for credit prediction. The SVM classification [12] accuracy with 10-fold cross-validation on the whole dataset is used to measure the quality of the selected feature subset. The parameters of the genetic algorithm are set according to our empirical experience: population size 20; number of generations 20; probability of crossover 0.6; probability of mutation 0.03.
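The split and the GA settings above translate directly into code. The sketch below (ours, not the authors'; scikit-learn and the stratified splitting are assumptions) reproduces the 60/20/20 partition and collects the stated GA parameters:

```python
# 60% train / 20% validation / 20% test split plus the GA settings above.
from sklearn.model_selection import train_test_split

GA_PARAMS = dict(pop_size=20, generations=20,
                 p_crossover=0.6, p_mutation=0.03)

def split_data(X, y, seed=0):
    # Hold out 40% first, then halve it into validation and test sets.
    X_tr, X_rest, y_tr, y_rest = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (X_tr, y_tr), (X_val, y_val), (X_te, y_te)
```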
We compare several feature selection methods:

 ReliefF is an instance-based learning method that samples instances randomly from the training set and checks neighboring records of the same and different classes ("near hits" and "near misses"). If a near hit has a different value for a certain attribute, that feature appears to be irrelevant and its weight should be decreased.

 Information gain (IG) is based on the well-known information-theoretic measure entropy, which characterizes the purity of an arbitrary collection of items and is considered a metric of a system's unpredictability [10,11]. Information gain measures the expected reduction of entropy caused by partitioning the examples according to a feature A and is given by:

IG(S, A) = H(S) − Σ_{v ∈ V(A)} (|S_v| / |S|) H(S_v),

H(S) = − Σ_{c ∈ C} (|S_c| / |S|) log2(|S_c| / |S|),

where S is the item collection and |S| its cardinality; V(A) is the set of all possible values for feature A; S_v is the subset of S for which A has value v; C is the class collection; and S_c is the subset of S containing items belonging to class c. In IG-based feature selection, the features are ranked by their information gain, and the non-significant features are filtered out by setting an appropriate threshold on the ranking.

 Gain Ratio (GR) evaluates the worth of an attribute by measuring the gain ratio with respect to the class:

GainR(Class, Attribute) = (H(Class) − H(Class | Attribute)) / H(Attribute).

 Symmetrical uncertainty (SU) is a method for eliminating redundant features as well as irrelevant ones by selecting a subset of features that individually correlate well with the class but have little intercorrelation. The correlation between two nominal features X and Y can be measured with the symmetric uncertainty criterion, which also compensates for the inherent bias of information gain by dividing it by the sum of the entropies of X and Y:

SU(X, Y) = 2 (H(Y) + H(X) − H(X, Y)) / (H(X) + H(Y)) = 2 IG / (H(Y) + H(X)),

where H is the entropy function. The entropies are based on the probability associated with each feature value; H(A, B), the joint entropy of A and B, is calculated from the joint probabilities of all combinations of values of A and B. Owing to the correction factor 2, SU takes values normalized to the range [0, 1]. A value of SU = 0 indicates that X and Y are uncorrelated, and SU = 1 means that knowledge of one feature completely predicts the other. Similarly to GR, SU is biased toward features with fewer values [11] (a small code sketch of these entropy-based scores follows this list).
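As referenced above, here is a small sketch of the entropy-based scores (our illustration, not the authors' code) for one discrete feature against a class vector:

```python
# Entropy, information gain, and symmetrical uncertainty for discrete data.
from collections import Counter
from math import log2

def entropy(values):
    """H(S) = -sum over classes of (|S_c|/|S|) * log2(|S_c|/|S|)."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())

def information_gain(feature, labels):
    """IG(S, A) = H(S) - sum over values v of (|S_v|/|S|) * H(S_v)."""
    n = len(labels)
    rem = sum((k / n) * entropy([l for f, l in zip(feature, labels) if f == v])
              for v, k in Counter(feature).items())
    return entropy(labels) - rem

def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), normalized to [0, 1]."""
    return 2 * information_gain(feature, labels) / (
        entropy(feature) + entropy(labels))

print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
```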
C. Result

The results of the experiments in Table I indicate that feature selection based on a genetic algorithm can be used to deal with the feature selection problem: it reduces the number of selected features significantly and produces an obvious improvement in classification accuracy. It uses only about half of the original features to obtain a higher classification accuracy.

TABLE I. RESULT OF EXPERIMENTS

Method                          # of features    SVM Accuracy (%)
raw                             22               71.140
ReliefF                         10               58.214
Gain Ratio                      15               75.808
Information Gain                11               78.416
Symmetrical Uncertainty         13               79.440
SVM wrapper + Genetic search    12               80.212

IV. CONCLUSIONS AND FUTURE WORK

In this paper, we address the credit risk analysis problem, a crucial task in finance and management. Our work is based on the SVM machine learning method. We have shown how a small subset of features for SVM can be selected by our feature selection method based on genetic search. Our empirical study shows that the selected group of features can significantly improve SVM classification accuracy compared to several competing methods.

REFERENCES

[1] C. Cortes and V. N. Vapnik, "Support-vector networks," Machine Learning, vol. 20, 1995.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, San Diego, California, 1990.
[3] L. Yu and H. Liu, "Efficiently handling feature redundancy in high-dimensional data," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-03), Washington, DC, August 2003, pp. 685-690.
[4] P. M. Lewis, "The characteristic selection problem in recognition systems," IRE Transactions on Information Theory, vol. 8, pp. 171-178, 1962.
[5] J. Kittler, "Feature set search algorithms," in Pattern Recognition and Signal Processing, 1978, pp. 41-60.
[6] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156, 1997.
[7] Z. Zhu, Y.-S. Ong, and M. Dash, "Wrapper-filter feature selection algorithm using a memetic framework," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 37, no. 1, pp. 70-76, 2007.
[8] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.
[9] K. De Jong, "Learning with genetic algorithms: An overview," Machine Learning, vol. 3, Kluwer Academic Publishers, 1988.
[10] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, May 2000, pp. 1-12.
[11] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison Wesley Longman, 2006.
[12] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, A Practical Guide to Support Vector Classification. Department of Computer Science and Information Engineering, National Taiwan University, 2003.
[13] A. E. Eiben et al., "Genetic algorithms with multi-parent recombination," in PPSN III: Proceedings of the Third Conference on Parallel Problem Solving from Nature, 1994, pp. 78-87.

