18 February 2012
Online at https://mpra.ub.uni-muenchen.de/57298/
MPRA Paper No. 57298, posted 14 Jul 2014 23:40 UTC
MODELLING OF EAD AND LGD
Empirical Approaches and Technical Implementation*
(Pre-typeset version)
(Published in “Journal of Credit Risk”, Vol. 8/No. 2, 2012)
Abstract
The Basel Accords have created the need to develop and implement models for PD, LGD
and EAD. Although PD is quite well researched, LGD and EAD still lag both in theoretical
and practical aspects. This paper proposes some empirical approaches for EAD/LGD
modelling and provides technical insights into their implementation. It is expected that
modellers will be able to use the tools proposed in this paper.
Key words: Basel, EAD, LGD, WOE, Naïve Bayes, mixture density, neural network
1. Introduction
This paper proposes some practical approaches to modelling Loss Given Default (LGD)
and Exposure at Default (EAD). These two measures are required by the Basel Accords.
Several references ([1], [3]) give an overview of problems and restrictions encountered
with LGD/EAD modelling. The focus of this paper is on practical techniques that will
lead to feasible implementation and improvements in LGD/EAD modelling. These
techniques can also be applied in other areas of predictive modelling, especially in fast-
paced financial institutions.
This paper briefly surveys current modelling methodologies, then proposes some
empirical approaches, and provides technical insights into their implementation. It is
expected that modellers will be able to use these proposed tools for EAD/LGD modelling
*
Bill Huajian Yang, Ph.D in mathematics, is a Senior Manager, and Mykola Tkachenko, Ph.D in physics, is
Director of the Commercial Risk Methodology & Analytics team, Bank of Montreal, Toronto, Canada.
The views expressed in this article are not necessarily those of Bank of Montreal or any of its affiliates.
Please direct any comments to [email protected], phone 416-643-1922, and/or
[email protected], phone 416-927-5660.
as well as other predictive modelling. Performance comparison for all proposed tools is
provided in Section 8 using our own software implementation. The authors would like to
thank our colleagues Ping Wang, James Fung, and Jason Zhang for many valuable
conversations, and Clovis Sukam for his critical comments and proofreading of this article.
LGD is defined as the loss rate on the exposure at default:

LGD = (EAD - amount recovered) / EAD,

where the amount recovered sums up all discounted cash flows received during the recovery
process after default, less the total cost incurred.
There are major differences between PD and LGD modelling. While LGD is a continuous
variable and usually follows a beta distribution, default events (PD) are binomial. LGD
depends on the recovered amount, which may take several years after default to resolve,
whereas PD describes the likelihood of a default event occurring within a specified
period (usually 1 year). Information about events occurring after default has no effect on
PD.
There is a lack of reliable historical data for LGD (and EAD). Interest in LGD (and EAD)
data collection started in the years 1996-2001, when specific mandatory Basel
requirements were imposed on financial institutions in order to become AIRB (advanced
internal ratings-based) compliant.
The non-normality of the LGD (and EAD Factor) distribution calls for an explicit
transformation so that the target variable follows a standard normal distribution. This
allows one to use a linear regression with a normally distributed target variable to get an
LGD prediction, as proposed in [3]. Although this approach allows one to build LGD
models on a transformed target variable, the inverse transformation (reverting and
predicting the actual target) usually exhibits large errors. This is due to the gradient of the
inverse transformation, which is usually higher in the tails; hence, a small error in the
transformed target may turn into a much bigger conversion error. This challenge suggests
practical tip #1:
When linear regression is used, it is best to train the model by minimizing the error
of the actual target, not the error of the transformed target.
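As an illustration of tip #1, below is a minimal Python sketch on made-up data. An inverse normal (probit-style) transformation stands in for the normal transformation of the target, and the same linear model form is fitted twice: once on the transformed target, and once by choosing the coefficients that minimize the squared error of the actual LGD after the inverse transformation.

import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                                  # made-up risk drivers
lgd = np.clip(rng.beta(0.6, 0.6, size=n), 1e-4, 1 - 1e-4)    # toy LGD values in (0, 1)

z = stats.norm.ppf(lgd)                 # normal transformation of the target
Xd = np.column_stack([np.ones(n), X])   # design matrix with intercept

# (a) Ordinary least squares on the transformed target
beta_a, *_ = np.linalg.lstsq(Xd, z, rcond=None)

# (b) Same model form, but coefficients chosen to minimize the error of the
#     actual LGD after inverting the transformation (tip #1)
def sse_actual(beta):
    pred_lgd = stats.norm.cdf(Xd @ beta)    # invert the transformation
    return np.sum((lgd - pred_lgd) ** 2)

beta_b = optimize.minimize(sse_actual, beta_a, method="Nelder-Mead").x

for name, b in [("transformed-target fit", beta_a), ("actual-target fit", beta_b)]:
    err = np.sum((lgd - stats.norm.cdf(Xd @ b)) ** 2)
    print(f"{name}: squared error on the actual LGD = {err:.2f}")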
Another problem associated with LGD models is the use of information on the collateral
for the loans. One needs to appropriately integrate collateral values for different collateral
types so that the integrated collateral values can become input variables for the LGD
model. The challenge with integrated collateral values is that many lenders and business
people believe that there exists a linear relationship between collateral values and LGD.
Since the data very often suggest that the relationship between LGD and collateral values is
non-linear, tensions can quickly build up between modellers and business people over the
use of collateral in LGD modelling. Other variables, such as company size, can also become
a bone of contention ([4]). This suggests practical tip #2:
Denote by Bal0 the current outstanding balance, Bal1 the outstanding balance at default,
Auth0 the current authorized limit, and Undrawn0 = Auth0 - Bal0 the current undrawn
amount. The EAD Factor is defined as

EAD Factor = Max(Bal1 - Bal0, 0) / Undrawn0,

that is, the proportion of the current undrawn amount (Undrawn0) that is drawn down by the
time of default. The predicted exposure amount at default is then calculated as

EAD = Bal0 + (EAD Factor) × Undrawn0.
The Max function applies in the definition, as Basel requires one to model the risk of
further drawing down at the time of default. Another practical option for the EAD Factor
definition is to remove the Max function and model the EAD Factor between -1 and 1
(floored at -1 and capped at 1), then floor the prediction at 0. Modelling the EAD Factor
this way also captures the two-directional spending behaviour of drawing more and paying
down. It is worthwhile to mention that modelling the EAD Factor within a range
between -1 and +1 and flooring the resulting EAD prediction at 0 could lead to
underestimated EAD values.
Another option in selecting a target variable in Exposure at Default modelling is to model
the facility utilization change, which is defined as:
Util_Ch = (Bal1 - Bal0) / Auth0,
floored at 0 and capped at 1. Both the EAD Factor and Utilization Change model the
change in the outstanding dollar amount, the first as a fraction of the undrawn amount
(Undrawn0) and the latter as a fraction of the current authorized limit (Auth0). The EAD
Factor is usually more difficult to model, as the undrawn dollar amount for some facilities
could be very small, thus inflating the EAD Factor. Both types of models are good when
converting back to predict the EAD dollar amount (or the EAD as a fraction of the current
authorized limit Auth0).
Possible other target variables for an EAD model include utilization ratio or the EAD
dollar amount. Please refer to ([10]) for a review of possible target variables for EAD
models.
We recommend modelling the EAD Factor or Utilization Change for the following
reasons:
(a) Basel requires one to floor the estimated exposure at current outstanding, which
means one needs only to focus on the change of exposure.
(b) Both are ratio variables, dimensionless (unlike dollars and cents, which have a
unit), and thus are not impacted by the magnitude of scale; both lie within a narrow
range between 0 and 1.
In the following discussion, we choose to model the EAD Factor floored at 0 and capped at
1, rather than modelling the EAD dollar amount directly. As the outstanding amount at
default (Bal1) varies significantly from a very low dollar amount to an extremely high
dollar amount, modelling the EAD dollar amount directly would be statistically difficult.
It turns out that even though an EAD Factor model, such as the Logit model shown in
Section 8, may have a low R-squared (RSQ) of only 0.27 in the following example, it can
translate into a much higher RSQ (0.91 below) when converted to predict the EAD dollar
amount using the above formula. This suggests practical tip #3:
Choose to model the EAD Factor or Utilization Change rather than the outstanding
dollar amount at default.
By the definition of the EAD Factor, one needs to divide the sample into two segments
before modelling: one with Undrawn0 > 0, the other with Undrawn0 = 0. We need to
model the EAD Factor only for those with Undrawn0 > 0. This suggests practical tip #4:
Model the EAD Factor only for those facilities with Undrawn0 > 0.
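Below is a small Python sketch of how the two candidate targets could be constructed from facility-level fields. The column names Bal0, Bal1 and Auth0 follow the notation above, the data are made up, and the sketch includes the Undrawn0 > 0 segmentation of tip #4 and the conversion back to a dollar EAD.

import pandas as pd

# Made-up facility-level snapshot and at-default fields
df = pd.DataFrame({
    "Bal0":  [100.0, 450.0, 0.0,  80.0],   # current outstanding balance
    "Bal1":  [160.0, 400.0, 30.0, 80.0],   # outstanding balance at default
    "Auth0": [200.0, 450.0, 50.0, 100.0],  # current authorized limit
})
df["Undrawn0"] = df["Auth0"] - df["Bal0"]

# Utilization change, floored at 0 and capped at 1
df["Util_Ch"] = ((df["Bal1"] - df["Bal0"]) / df["Auth0"]).clip(0, 1)

# EAD Factor: modelled only on the segment with Undrawn0 > 0 (tip #4)
seg = df[df["Undrawn0"] > 0].copy()
seg["EAD_Factor"] = ((seg["Bal1"] - seg["Bal0"]).clip(lower=0) / seg["Undrawn0"]).clip(0, 1)

# Converting a (predicted) EAD Factor back to a dollar EAD
seg["EAD_pred"] = seg["Bal0"] + seg["EAD_Factor"] * seg["Undrawn0"]
print(seg)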
Financial institutions face some common problems with EAD modelling, especially for
wholesale lending portfolios. As an example, since the EAD Factor distribution usually
exhibits high concentration around 0 and 1 after flooring at 0 and capping at 1, a linear
regression model will predict only a narrow range around the target variable average.
Other problems include a small model development sample size and a small pool of
candidate covariates, especially for companies that are not publicly traded, as is typical of
wholesale portfolios.
It turns out that macroeconomic effects impact the EAD Factor in the following way:
when business is booming for more than one year, or when there is a downturn for more
than one year, the EAD Factor starts to climb during the second year. For this reason, we
prefer not to include any macro variable in the EAD Factor model. This suggests
practical tip #5:
Model the EAD Factor or Utilization Change with no macro variables.
3.1. Variable Transformation. Leaving the selection of the target variable to the preference
of the practitioner, the first proposed innovation for LGD/EAD modelling is the
technique of Weight of Evidence (WOE) transformation for independent variables. It
applies to both numeric and character variables. As will be shown later, such a
transformation allows one to tackle the optimal selection of variables, issues with outliers,
and the imputation of missing values.
WOE methodology is quite well known for risk models but, surprisingly, it is not widely
used for Basel-specific models. As previously mentioned in 2.1, the WOE approach can
accommodate business judgement as well.
3.2 Variable Selection. With all independent variables WOE transformed, a Naïve
Bayesian model (see Section 5) can be employed for variable selection. Each time, it
selects the one variable that gives the highest lift to the Naïve Bayesian model. This Naïve
Bayesian selection methodology is better than the SAS stepwise selection procedure for
logistic or linear regression, which may also include variables with negative coefficients.
With WOE transformed variables, a negative coefficient generally indicates a noise effect
and will most likely not be accepted by business partners.
3.3 Model Structure. We consider the following models, which are trained either by
maximizing the likelihood or by minimizing the least square error:
As both the LGD and EAD Factor distributions usually exhibit high concentration around 0
and 1, the mixture model and single layer neural network demonstrate significant
improvement over the logit model that uses only raw variables (with no WOE
transformation). Even a simple model like the Naïve Bayesian model sees a decent
improvement (see Section 8).
For a fixed k > 1, the k-nearest neighbour method estimates E(y | x) by taking the average
of the k values of y, namely y1, y2, ..., yk, corresponding to the k nearest points relative to the
current point x (not including the current point x itself):

y ~ (y1 + y2 + ... + yk) / k
The resulting algorithm is called the k -NN algorithm in machine learning, which is a
type of instance-based learning ([5] pp.14-16, pp.165-168, [13]).
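A minimal Python illustration of this estimate (assuming the query point x0 is not itself part of the training sample):

import numpy as np

def knn_estimate(X, y, x0, k=5):
    """k-NN estimate of E(y | x) at a new point x0: the average of y over
    the k training points nearest to x0."""
    d = np.linalg.norm(X - x0, axis=1)   # distances to the query point
    nearest = np.argsort(d)[:k]          # indices of the k nearest points
    return y[nearest].mean()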
A special case when vector x consists of only a single variable leads to the following
concepts of WOE transformation:
The WOE transformation for a numerical variable z consists of the following steps:
(a) partition the values of z into intervals;
(b) calculate the average of the y values over each interval;
(c) assign to every value of z falling into an interval the value calculated in (b) for that
interval (after regrouping intervals based on business and statistical considerations where
necessary).
We call the derived variable w(z) the WOE transformation for the variable z. The idea
behind the WOE transformation is that it gets the model right at the individual variable level
first, regardless of what we do subsequently.
For a class variable, the WOE transformation can be implemented by first calculating
the average of y for each class level, then grouping based on business and statistical
considerations.
Step (b) in general requires adaption when the chosen model structure is not linear.
Below are two practical examples.
Example A (binary target). Suppose the dependent variable y is binary and the chosen
model structure is the logit model

log(p / (1 - p)) ~ b0 + b1 x1 + b2 x2 + ... + bm xm,

where

p = P(y = 1 | x),
1 - p = P(y = 0 | x).

By Bayes' theorem,

p / (1 - p) = [P(x | y = 1) P(y = 1)] / [P(x | y = 0) P(y = 0)],

so that

log(p / (1 - p)) = log[P(x | y = 1) / P(x | y = 0)] + log[P(y = 1) / P(y = 0)].

The second term, log[P(y = 1) / P(y = 0)], is just the population log odds, which does not
depend on x. The first term (when x reduces to a single variable), log[P(x | y = 1) / P(x | y = 0)],
suggests the following adaption to (b) of Section 4.1:

(b1) Compute the WOE value for an interval as

WOE = log(bdist / gdist),

where bdist (respectively gdist) denotes the proportion of records with y = 1 (respectively
y = 0) falling into that interval.

This adaption applies whether the final model is the logit model itself or the logit model is a
model component, as with the mixture model or single layer neural network discussed in
Section 6.
Example B (continuous target between 0 and 1).

(b2) Calculate the average of the y values over a partitioned interval and denote it by
avg(y). Compute the WOE value for that interval as

WOE = log(avg(y) / (1 - avg(y))).
Such an application should also provide other functionalities, including: printing the SAS
code for the WOE transformation, allowing interval breaking and regrouping, calculating
the predictive power of the transformed variable, and outputting the bucketing for
reporting and documentation.
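Below is a minimal Python sketch of such a WOE transformation for a numeric variable, covering both adaption (b1) for a binary target and adaption (b2) for a continuous target between 0 and 1. Simple quantile bins stand in for the interval breaking and regrouping that would normally be reviewed with business partners; all names are illustrative.

import numpy as np
import pandas as pd

def woe_numeric(z, y, n_bins=10, binary=True, eps=1e-6):
    """Map each value of z to the WOE of the interval it falls into."""
    bins = pd.qcut(z, q=n_bins, duplicates="drop")
    out = pd.Series(index=z.index, dtype=float)
    for _, idx in z.groupby(bins, observed=True).groups.items():
        yb = y.loc[idx]
        if binary:
            # (b1): log of the bad distribution over the good distribution
            bdist = (yb.sum() + eps) / (y.sum() + eps)
            gdist = ((1 - yb).sum() + eps) / ((1 - y).sum() + eps)
            out.loc[idx] = np.log(bdist / gdist)
        else:
            # (b2): log(avg(y) / (1 - avg(y))) over the interval
            avg = np.clip(yb.mean(), eps, 1 - eps)
            out.loc[idx] = np.log(avg / (1 - avg))
    return out

# Toy usage with a continuous target (e.g. an EAD Factor between 0 and 1)
rng = np.random.default_rng(1)
z = pd.Series(rng.normal(size=500))
y = pd.Series(np.clip(0.3 + 0.2 * z + rng.normal(scale=0.2, size=500), 0, 1))
w_z = woe_numeric(z, y, binary=False)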
A Naïve Bayesian model calculates the conditional probability under some assumptions
of independence. We show in this section how the WOE technique contributes
interestingly to the implementation of Naïve Bayesian models.
Under the Naïve Bayes assumptions, the variables x1, x2, ..., xm are independent conditional
on y = 1 and on y = 0, respectively. Then

P(y = 1 | x) / P(y = 0 | x) = [P(x | y = 1) P(y = 1)] / [P(x | y = 0) P(y = 0)]
= [P(x1 | y = 1) P(x2 | y = 1) ... P(xm | y = 1) P(y = 1)] / [P(x1 | y = 0) P(x2 | y = 0) ... P(xm | y = 0) P(y = 0)].

Taking logarithms,

log(p / (1 - p)) = log[P(y = 1) / P(y = 0)] + log[P(x1 | y = 1) / P(x1 | y = 0)] + log[P(x2 | y = 1) / P(x2 | y = 0)] + ... + log[P(xm | y = 1) / P(xm | y = 0)]
= w0 + w(x1) + w(x2) + ... + w(xm),    (1)

where

w0 = log[P(y = 1) / P(y = 0)],
w(xi) = log[P(xi | y = 1) / P(xi | y = 0)].
The constant w0 is just the population log odds, as seen before. But w(xi) is more
interesting, as it can be estimated by the WOE transformation described in adaption (b1)
of Section 4.2.
This means that the Naïve Bayesian model can be interpreted simply as a model whose log
odds score is given by summing the WOE transformed variables and the population
log odds:

log odds score = w0 + w(x1) + w(x2) + ... + w(xm),    (2)

with the corresponding prediction

p = 1 / (1 + exp(-(log odds score))).    (3)
The Naïve Bayesian model is particularly simple in structure, but is very robust
in rank-ordering performance, and no training of the parameters is required.
Although the above discussion is specific to the case when the dependent variable y is
binary, it can be extended to the case when y is continuous with 0 ≤ y ≤ 1. The only
difference is that we need to calculate the WOE by adapting (b2) of Section 4.2 and change
log(p/(1 - p)) in (1) to log(y/(1 - y)).
To implement the Naïve Bayesian model for either a binary dependent variable y or a
continuous dependent variable 0 < y < 1, such as the EAD Factor or LGD, we take the
following steps:
(a) Apply the WOE transformation appropriately (adapting either (b1) or (b2) of
Section 4.2) to the independent variables
(b) Use stepwise selection of variables that add incremental performance lift
(c) Output the model in the forms of (1)-(3)
Although the Naïve Bayesian model rank orders well, one may run into a problem when
the magnitude of the predicted value matters. This is because the Naïve Bayes assumptions
are generally not strictly satisfied ([8], [14]). Under this circumstance, a segmentation
step can be applied:
(d) Create score bands for the built Naïve Bayesian model and map each score band to
the average of the y values over that band
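A short Python sketch of these steps, assuming the independent variables have already been replaced by their WOE values (the columns of woe_df): the log odds score is the population log odds plus the sum of the WOE values, as in forms (1)-(3), and the optional score-band step implements (d). All names are illustrative.

import numpy as np
import pandas as pd

def naive_bayes_score(woe_df, y):
    """Forms (1)-(3): population log odds plus the sum of the WOE variables."""
    w0 = np.log(y.mean() / (1 - y.mean()))       # population log odds
    log_odds = w0 + woe_df.sum(axis=1)
    return 1.0 / (1.0 + np.exp(-log_odds))       # form (3)

def band_calibration(score, y, n_bands=10):
    """Step (d): map each record's score band to the average of y over that band."""
    bands = pd.qcut(score, q=n_bands, duplicates="drop")
    return y.groupby(bands, observed=True).transform("mean")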
We propose in this section a few methodologies in addition to the Naïve Bayesian model
discussed previously. We will compare their performance in Section 8.
H = 1 if y ≥ a, and 0 otherwise,
L = 1 if y ≤ b, and 0 otherwise,

and the composite score is

p = (p1 + p2) / 2.
Typically, the composite score p exhibits stronger ranking power than either p1 or
p2 individually. Keep in mind that, as long as a model rank orders well, bias in
magnitude can be corrected by the segmentation methodology mentioned in
Section 5.2 (d).
For EAD Factor and LGD modelling, the least square minimization (b) is generally
preferred. In this case, we label this least square logit model as LS Logit. We will
discuss the LS Logit models in Section 6.3.
Alternatively, one can use the regular SAS logistic procedure for a binary dependent
variable to train a model predicting y, following the steps below:
(a) Augment the original sample D = {(x, y)} as follows: for each record in D,
duplicate the same record 100 times (or 1,000 times if higher precision is
required). For example, if f0 stands for the record of a facility in the sample D,
then the augmented dataset contains 100 duplicate records for f0.
(b) Define a binary dependent variable H for the augmented sample as follows:
suppose that for record f0 in D the y value rounds up to k/100, where
0 ≤ k ≤ 100 is an integer. Among the 100 duplicate records for f0 in the
augmented sample, assign k of them the value H = 1 and the remaining 100 - k
the value H = 0.
The frequency of the event H = 1 for the given facility f0 is then k (out of 100). We have
thus transformed the target variable y into the probability of the event H = 1 for the given
facility f0 within the augmented sample. One can then apply the WOE transformation for
a binary target (as in Section 4.2) to the independent variables, and train the model
predicting the probability of H = 1 using the regular SAS binary logistic regression.
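Below is a Python sketch of the augmentation steps (a) and (b); the binary logistic regression itself would then be run in SAS or any other logistic routine. Function and column names are illustrative only.

import numpy as np
import pandas as pd

def augment_for_binary_logistic(df, y_col="y", copies=100):
    """Duplicate each record `copies` times and assign a binary H so that the
    frequency of H = 1 per facility equals round(y * copies) / copies."""
    k = np.rint(df[y_col].to_numpy() * copies).astype(int)   # events per record
    out = df.loc[df.index.repeat(copies)].copy()
    # Within each block of `copies` duplicates, the first k rows get H = 1
    within = np.tile(np.arange(copies), len(df))
    out["H"] = (within < np.repeat(k, copies)).astype(int)
    return out

# Toy usage: a facility with y = 0.37 becomes 37 rows with H = 1 and 63 with H = 0
toy = pd.DataFrame({"y": [0.37, 0.80], "x1": [1.2, -0.5]})
aug = augment_for_binary_logistic(toy)
print(aug.groupby("x1")["H"].mean())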
This methodology is essentially the same as using the SAS logistic events/trials syntax if
one scales up y by 100 (that is, replaces y by 100 × y) and changes "y/1" to "y/100" in the
model statement.
Although both methodologies result in essentially the same parameter estimates (except
possibly the intercept, due to the fact that the binary WOE transformation differs from the
continuous version of WOE transformation by a constant), scaling up y by 100 usually
decreases the p value for the significance of a variable to be included in the model.
This is a model in which model (4) is trained by minimizing the total least square error

Σ (y - p)²,

where

p = 1 / (1 + exp(-(log odds score))).
Usually, for generalized linear models, including the logit model, it is assumed that the
error term follows a distribution belonging to the exponential family ([9]), which is not
normal in the case of the logit model. As maximizing the likelihood is not necessarily
equivalent to minimizing the total least square error (unless the error term is normal), the
general logit model differs in general from the least square logit model. We do not assume
that the error term for the LS Logit model follows a normal distribution, as in the case of
the Single Index model described in [2].
Technical Implementation
Training the LS Logit model can be implemented following the Iteratively Re-Weighted
Least Square algorithm as described in ([5] p. 349). This algorithm applies to the training
of models of the form

y ~ h(b0 + b1 w(x1) + b2 w(x2) + ... + bm w(xm)),

where, for the LS Logit model, h is the logistic function h(z) = 1 / (1 + exp(-z)).
We mention here that with the SAS logistic procedure, the maximum likelihood estimation
of the parameters is also implemented using the Iteratively Re-Weighted Least Square
algorithm ([12] or [6]). However, the weight assigned at each iteration differs from the one
assigned here for an LS Logit model, because the objective functions being optimized are
different.
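A sketch of one way to train the LS Logit model with a Gauss-Newton style iteratively re-weighted least squares loop in Python; the per-iteration weights used here (the squared logistic derivatives) may differ from those in the authors' implementation.

import numpy as np

def h(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_ls_logit(X, y, n_iter=50, ridge=1e-8):
    """Minimize sum((y - h(Xb))^2) by Gauss-Newton / re-weighted least squares."""
    X = np.column_stack([np.ones(len(X)), X])    # add an intercept column
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = h(X @ b)
        dp = p * (1 - p)                         # derivative of h at the current score
        J = X * dp[:, None]                      # Jacobian of the predictions
        r = y - p                                # residuals on the actual target
        step = np.linalg.solve(J.T @ J + ridge * np.eye(X.shape[1]), J.T @ r)
        b = b + step
        if np.max(np.abs(step)) < 1e-8:          # stop when the update is negligible
            break
    return b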
When capped at 1 and floored at 0, the distribution of the EAD Factor or LGD exhibits
heavy concentration around 0 and 1, as mentioned previously. Recall the mixture model
for cluster analysis in unsupervised learning ([15]), where we model a probability density
by assembling the component densities of the individual clusters (density fi(x, bi) for the
i-th cluster) as follows:

f(x, b) ~ p1 f1(x, b1) + p2 f2(x, b2) + ... + pm fm(x, bm).

Here, b = {b1, b2, ..., bm} are parameters to be determined, and pi is the prior probability
of falling into the i-th cluster. This suggests an adapted mixture model (for our
supervised learning) of two components: one component that models the low y value (y
is the LGD or EAD Factor in our case) cluster and another that models the high y value
cluster.
Adding another component to address the median y values usually improves the
prediction accuracy. This middle component serves as a correction component between
the high and low y value clusters.
A 3-component mixture model is defined as

y ~ p1 h(z1) + p2 h(z2) + p3 h(z3),

where each zi is a linear combination b0i + b1i w(x1) + b2i w(x2) + ... + bmi w(xm) of the
(WOE transformed) independent variables and

h(z) = 1 / (1 + exp(-z)).

Here the parameters p1, p2, p3 are nonnegative numbers, just like the prior probabilities in
the mixture models for cluster analysis, satisfying

p1 + p2 + p3 = 1.    (6)
When the constraint (6) is removed, the model is called a 3-component single layer
neural network. In either case, the parameters can be trained by minimizing the total
least square error of the prediction.
Compared to the Logit or LS Logit model, a mixture model or single layer neural network
usually demonstrates strong predictive power and high accuracy, with only a minimal
increase in the number of parameters.
The above mixture models and single layer neural networks can be implemented and
trained using the conjugate gradient algorithm with the Fletcher-Reeves method as
described in ([7] pp. 121-133, [16] pp. 424-429, [5] p. 355): first, find a descent direction
for the total least square error, then perform a line search along this direction to ensure that
a sufficient decrease in the total least square error has been achieved.
With the WOE technique, one can knock out a variable during the training process when
the corresponding coefficient becomes negative. As mentioned earlier, negative
coefficients usually indicate noise effect and are rarely accepted by business partners.
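Below is a Python sketch of the 3-component single layer neural network variant (constraint (6) dropped), trained by minimizing the total least square error with scipy's nonlinear conjugate gradient routine; scipy's CG method uses a Polak-Ribiere update rather than Fletcher-Reeves, but the idea is the same. The component structure follows the form given above, and all names are illustrative.

import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_three_component(W, y, seed=0):
    """Single layer neural network y ~ p1*h(z1) + p2*h(z2) + p3*h(z3), where each
    z_i is a linear combination of the WOE transformed variables in W."""
    n, m = W.shape
    Wd = np.column_stack([np.ones(n), W])        # intercept plus WOE variables
    k = Wd.shape[1]

    def unpack(theta):
        betas = theta[: 3 * k].reshape(3, k)     # linear coefficients per component
        weights = theta[3 * k:]                  # p1, p2, p3 (unconstrained here)
        return betas, weights

    def sse(theta):
        betas, weights = unpack(theta)
        comp = sigmoid(Wd @ betas.T)             # n x 3 matrix of component outputs
        pred = comp @ weights
        return np.sum((y - pred) ** 2)           # total least square error

    rng = np.random.default_rng(seed)
    theta0 = np.concatenate([rng.normal(scale=0.1, size=3 * k), np.full(3, 1 / 3)])
    res = minimize(sse, theta0, method="CG")     # nonlinear conjugate gradient
    return unpack(res.x), res.fun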
7. Boost Modelling
A situation can arise where we are not satisfied with the model built, probably because of
its bias in prediction for some score bands. In addition to the segmentation methodology
mentioned in Section 5.2 (d), we can use the following boost strategies to improve the
accuracy:
With methodology (a), we simply choose the model prediction error as the new
dependent variable and train a decision tree using all available variables (no WOE
transformation is required), including the built model score. In general, we get a decent
improvement in accuracy.
Let ptrend denote the value of y predicted by the model built (the base model). A scalar
boost model can be trained by scaling the base model by an exponential factor:

y ~ ptrend × exp(a0 + a1 z1 + a2 z2 + ... + ak zk),

where z1, z2, ..., zk are either indicator variables denoting the score bands that require
adjustment for the prediction error, or other available variables that give lift to this scalar
model. The resulting boost model needs to be capped at 1 (because 0 < y < 1).
Scalar boost models can be implemented and trained similarly, using the conjugate
gradient algorithm with the Fletcher-Reeves method as described in ([7] pp. 121-133, [16]
pp. 424-429), through the minimization of the total least square error.
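A Python sketch of the scalar boost under the exponential form given above; the matrix Z holds the score-band indicators or other boost variables, and all names are illustrative.

import numpy as np
from scipy.optimize import minimize

def fit_scalar_boost(ptrend, Z, y):
    """Scalar boost y ~ min(ptrend * exp(a0 + a1*z1 + ... + ak*zk), 1), with the
    coefficients chosen to minimize the total least square error."""
    Zd = np.column_stack([np.ones(len(y)), Z])   # intercept plus boost variables

    def sse(a):
        boosted = np.minimum(ptrend * np.exp(Zd @ a), 1.0)   # cap the boosted score at 1
        return np.sum((y - boosted) ** 2)

    a0 = np.zeros(Zd.shape[1])                   # start from the base model (exp(0) = 1)
    return minimize(sse, a0, method="CG").x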
In this section, we discuss how linear regression can be used to improve the base model
prediction.
First, we transform all independent variables, including the built base model
score ptrend , into WOE format following (a)-(c) in Section 4.1. At this point, if we
regress y on these WOE transformed variables, we usually end up with just one variable
in the model: the base model score ptrend. This is because the base model has already
absorbed all the available information.
Therefore, more diligence is required, as suggested in (a) and (b):
(a) divide the sample into segments based on the base score and fit a sub-model to each
segment;
(b) construct new boost variables that combine the base score with other variables.
For example, we can divide the sample into high/low segments using indicators H and L,
where

H = 1 if ptrend ≥ c, and 0 otherwise,
L = 1 - H.

We can then fit a linear sub-model to each segment, in each case forcing the WOE
transformed ptrend variable to be included in the new model (as the base score). Let
p1 and p2 be the sub-model scores for the H and L segments respectively, both capped at
1. Then the final model can be given by

p = H × p1 + L × p2.
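A Python sketch of this high/low segment boost: a separate linear sub-model is fitted on each segment (always including the WOE transformed base score), and the two scores are combined as p = H × p1 + L × p2. The variable selection step is omitted, and all names are illustrative.

import numpy as np

def fit_segment_boost(woe_vars, w_ptrend, y, H):
    """Fit a linear sub-model on each of the H and L segments and combine the
    capped sub-model scores as p = H*p1 + L*p2."""
    X = np.column_stack([np.ones(len(y)), w_ptrend, woe_vars])
    p = np.zeros(len(y))
    for seg in (H == 1, H == 0):                 # the L segment is simply 1 - H
        b, *_ = np.linalg.lstsq(X[seg], y[seg], rcond=None)
        p[seg] = np.minimum(X[seg] @ b, 1.0)     # sub-model score, capped at 1
    return p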
For (b), a useful type of boost variable has the form

ptrend × w(z),    (7)

where w(z) is the WOE transformed variable for a variable z, following (a)-(c) in Section
4.1. Usually, these types of variables exhibit higher predictive power.
Applying Bayes' theorem, we can show that a boost variable of the form (7) usually
carries more information than w(z) or ptrend individually, provided that the variable z had
not been included in the base model.
Then we have:
P(x, z) = P(x) P(z),
P(x, z | B) = P(x | B) P(z | B).

Thus,

P(B | x, z) = P(B, x, z) / P(x, z)
= P(B) P(x, z | B) / P(x, z)
= P(B) P(x | B) P(z | B) / P(x, z)
= P(x) P(z) P(B | x) P(B | z) / [P(x, z) P(B)]
= P(B | x) P(B | z) / P(B).
This means that P(B | x) P(B | z) differs from P(B | x, z) only by the constant factor 1/P(B).
Because P(B | x, z) carries more information than P(B | x) or P(B | z) individually, we
conclude that ptrend × w(z), which is an estimate of P(B | x) P(B | z), is likely to be more
predictive than either ptrend or w(z) individually.
In this section we present the model performance results. The following models
were trained using either SAS procedures or our own software implementation:
We focused on EAD Factor modelling and similar results were obtained for the LGD
case.
We first applied the WOE transformation to all independent variables following Section
4.2, Example B (the WOE transformations for the following 8 variables were verified and
confirmed with business partners), and then selected, using the Naïve Bayesian model
selection methodology, only those variables that demonstrated a significant incremental
performance lift. Below is the list of variables selected by this stepwise selection
procedure:
As we are working on facility level EAD (not borrower level), both borrower and facility
level utilization and collateral percentage are relevant. Authorized amount acts as a
potential exposure, and total assets value measures the size of the entity. We know both
entity size and industry segment are important risk drivers. We did not include any
macroeconomic variables.
For the above 7 models, we used the same 8 variables. Except for Logit Raw, which
uses the raw form of these 8 variables, all models were trained on the same 8 WOE
transformed variables. We dropped any variable whose coefficient in the model was not
positive (according to our interpretation of the WOE definition). This was done essentially
to avoid the potential risk of including a noise effect in the model, as mentioned before,
making the model simpler and more robust.
The table below shows the summary performance statistics for each of the listed
7 models (MAD=mean absolute deviation (error), RMSE=root mean square error,
RSQ=R-square, KSD= Kolmogorov–Smirnov statistic):
From this table, we see that the other 6 models are all significantly better than the
Logit Raw model. This shows the value of the WOE transformation: even for a
simple model like the Naïve Bayesian model, WOE provided a decent
improvement. It turns out that the 3-component mixture model or neural network is
a particularly good candidate for EAD Factor or LGD modelling.
Performance usually lifts significantly when the predicted EAD Factor is translated into a
prediction of the EAD dollar amount, or of the EAD as a fraction of the current authorized
limit, using the following formulas:

EAD = Bal0 + (EAD Factor) × Undrawn0,
EAD / Auth0 = [Bal0 + (EAD Factor) × Undrawn0] / Auth0.

For example, for the LS Logit model we have RSQ = 0.27, but when it is converted into a
prediction of the EAD dollar amount, we have RSQ = 0.91.
Finally, we present the performance results for the boost methodologies, boosting
the LS Logit model built previously (the base model).
For the Scalar boost, we cut the base model score ptrend into 8 score bands using
decision tree software, and trained a scalar boost model as described in Section 7.1
using our own software implementation.
For the Linear Reg, we divided the sample into H and L segments using the
base score ptrend , as described in Section 7.2, with the cutting point given by:
The performance of both models improved slightly compared to the base model.
The above results are based on a sample of 500 commercial borrowers, which is
relatively small compared to the retail case, where we usually have a much larger sample,
depending on the products we are working with. For retail EAD and LGD, the industry
code is replaced by product type, while collateral percentage and total assets value are
simply not available in the data for retail revolving products. Utilization, the credit bureau
report, recent delinquency records, and the activeness in looking for more credit are among
the important drivers. With large retail samples, the risk patterns fitted from the sample for
the variable bins, through the WOE transformation, are usually much more stable, which
eventually results in better model performance.
REFERENCES
[2] Gery Geenens, Michel Delecroix, A Survey about Single-Index Models Theory,
International Journal of Statistics and Systems, ISSN 0973-2675 Vol.1 No.2
(2006), pp. 203-230
[3] Greg M. Gupton, Roger M. Stein, LossCalc: Model for Predicting Loss Given
Default (LGD). February 2002, Moody’s KMV
[4] Jens Grunert, Martin Weber, Recovery Rates of Commercial Lending: Empirical
Evidence for German Companies, SSRN, 2007
[5] Jerome Friedman, Trevor Hastie, and Robert Tibshirani, The Elements of Statistical
Learning, 2nd Edition, Springer
[7] Jorge Nocedal, Stephen J. Wright, Numerical Optimization, 2nd Edition, Springer
[8] Kim Larsen, Generalized Naive Bayes Classifiers, SIGKDD Explorations, Vol. 7, Issue 1,
pp. 76-81
[9] McCullagh, P. and J. A. Nelder, Generalized Linear Models, Second Edition,
London: Chapman and Hall, 1989
[11] Radovan Chalupka and Juraj Kopecsni, Modelling Bank Loan LGD of Corporate
and SME Segments: A Case Study, IDEAS/RePEc, 2009
[15] Tommi Jaakkola, Lecture Notes, 6.867 Machine Learning, Fall 2002, MIT,
Open Course Ware