
CONFERENCE SUMMARY

Forum on Validation of Consumer Credit Risk Models
November 19, 2004
Sponsored by the Payment Cards Center of the Federal Reserve Bank of Philadelphia and
the Wharton School’s Financial Institutions Center

Peter Burns
Christopher Ody

Summary
On November 19, 2004, the Payment Cards Center of the Federal Reserve Bank of
Philadelphia, in conjunction with the Wharton School’s Financial Institutions Center, hosted
a one-day event entitled “Forum on Validation of Consumer Credit Risk Models.” This forum
brought together experts from industry, academia, and the policy community to discuss
challenges surrounding model validation strategies and techniques. This paper provides
highlights from the forum and ensuing discussions.

The views expressed here are those of the authors and do not necessarily represent the views of the Federal
Reserve Bank of Philadelphia or the Federal Reserve System. The authors wish to thank William Lang,
Dennis Ash, and Joseph Mason for their special contributions to this document.

www.philadelphiafed.org/pcc Validation of Consumer Credit Risk Models 1


TABLE OF CONTENTS

Introduction
Model Validation: Challenging and Increasingly Important
Linking Credit Scoring and Loss Forecasting
Metrics for Model Validation
Incorporating Economic and Market Variables
Conclusion: Art Versus Science
Appendix A: Institutions Represented at the Conference
Appendix B: Conference Agenda

Introduction

On November 19, 2004, the Payment Cards Center of the Federal Reserve Bank of Philadelphia and the Wharton School’s Financial Institutions Center hosted a “Forum on Validation of Consumer Credit Risk Models.”1 This one-day event brought together experts from industry, academia, and the policy community to discuss challenges surrounding model validation strategies and techniques. The discussions greatly benefited from the diverse perspectives of conference participants and the leadership provided by moderators and program speakers.2

Retail lenders, and particularly credit card lenders, use statistical models extensively to guide a wide range of decision processes associated with loan origination, account management, and portfolio performance analysis. The increased sophistication of modeling techniques and the broader application of models have undoubtedly played key roles in the rapid growth of the credit card industry and consumer lending in general.3 At the same time, the widespread adoption of statistical modeling in these business processes has introduced new risk management challenges. Very simply, how do we know that our credit risk models are working as intended?

The conference discussions focused on two critical types of risk models: credit scoring models commonly used in credit underwriting, and loss forecasting models used to predict losses over time at the portfolio level. These two model types differ in a number of ways, but the two modeling processes have strong theoretical links (although they are not often linked in practice).

Credit scoring models used for acquiring accounts are typically built on a static sample of accounts for which credit bureau information (and often other applicant or demographic information) is available at the time of application. These data must then be combined with information about how these accounts ultimately performed in their first one to two years after acquisition. Credit scoring models are designed to predict the probability that an individual account will default or, more generally, develop a delinquency status bad enough that the bank would not have booked the account initially had it known this would happen.
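The idea of a model that turns application-time information into a probability of “bad” can be illustrated with a minimal sketch. It assumes a logistic scorecard (the model family that, as discussed later in this summary, underlies many current scoring models); the characteristic names, intercept, and coefficients are hypothetical, whereas a real scorecard would estimate them from the static development sample and the accounts’ observed one-to-two-year performance.

```python
import math

# Hypothetical scorecard: the intercept, characteristic names, and coefficients
# are illustrative assumptions, not estimates from any real development sample.
INTERCEPT = -3.0
COEFFICIENTS = {
    "revolving_utilization": 2.0,  # share of available revolving credit in use (0 to 1)
    "recent_delinquencies": 0.8,   # count of recent delinquencies on the bureau file
    "years_on_file": -0.05,        # length of credit history, in years
}


def probability_of_bad(applicant: dict) -> float:
    """Logistic model: P(bad) = 1 / (1 + exp(-(intercept + sum of coef * value)))."""
    log_odds = INTERCEPT + sum(
        coef * applicant[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_order(applicants: list) -> list:
    """Order applicants from least to most risky by modeled probability of bad."""
    return sorted(applicants, key=probability_of_bad)


if __name__ == "__main__":
    riskier = {"revolving_utilization": 0.9, "recent_delinquencies": 1, "years_on_file": 2}
    safer = {"revolving_utilization": 0.2, "recent_delinquencies": 0, "years_on_file": 15}
    for label, applicant in [("riskier", riskier), ("safer", safer)]:
        print(label, round(probability_of_bad(applicant), 3))
```

Because the logistic output is a probability, the same sketch supports both uses debated at the forum: sorting applicants by risk (as `rank_order` does) or treating the number itself as a point estimate of default risk, which places much heavier demands on calibration.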
A number of credit scoring models use only credit bureau data to predict the probability of default, while others use application or demographic data in addition to credit bureau data.

Loss forecasting models predict dollar losses for a portfolio or sub-portfolio, not individual accounts. Some of the most popular loss forecasting models include cumulative loss rate models, which rely on vintage curve analysis, and Markov models, which rely on analysis of delinquency buckets. Loss forecasting models may or may not include segmentation by credit score. Economic data may be explicitly included in the model or implicitly included by using a time series covering an entire business cycle.4

1 In May 2002, the Philadelphia Fed and the Financial Institutions Center co-hosted a multi-day conference on “Credit Risk Modeling and Decisioning.” A summary of that event was published as a Special Conference Issue of the Payment Cards Center’s newsletter, available on the Center’s web site at: http://www.philadelphiafed.org/pcc/update/index.html.

2 Speakers and moderators are listed in the program agenda at the end of this document. Copies of presentations and the program agenda are available at http://www.philadelphiafed.org/pcc/conferences/Agenda.pdf. While all of the individuals in the program made important contributions, William Lang, Dennis Ash, Shannon Kelly, and Robert Stine were especially helpful in structuring an agenda for the day.

3 “Revolving credit” outstandings in the U.S. (largely credit card debt) grew from $100 billion to $790 billion in the 20-year period 1984-2004, as reported in the Federal Reserve Statistical Release G.19 (February 7, 2005), available at http://www.federalreserve.gov/releases/g19/hist/cc_hist_sa.txt.
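The Markov (delinquency-bucket) approach to loss forecasting mentioned in the Introduction can be sketched as a roll-rate projection. This is a minimal sketch under stated assumptions: the bucket names and monthly roll rates are hypothetical, and the pool is closed (no new accounts, payments, or recoveries). In practice, roll rates are estimated from observed bucket-to-bucket movements and may be segmented, for example by credit score.

```python
# Delinquency buckets; "charged_off" is absorbing (charged-off dollars stay there).
BUCKETS = ["current", "30_days", "60_days", "90_plus_days", "charged_off"]

# Hypothetical monthly roll rates: ROLL_RATES[i][j] is the share of dollars in
# bucket i this month that sit in bucket j next month. Each row sums to 1.
ROLL_RATES = [
    [0.97, 0.03, 0.00, 0.00, 0.00],
    [0.50, 0.20, 0.30, 0.00, 0.00],
    [0.20, 0.10, 0.20, 0.50, 0.00],
    [0.05, 0.00, 0.05, 0.30, 0.60],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]


def step(balances, roll_rates):
    """One month forward: next[j] = sum over i of balances[i] * roll_rates[i][j]."""
    n = len(balances)
    return [sum(balances[i] * roll_rates[i][j] for i in range(n)) for j in range(n)]


def forecast_charge_offs(balances, roll_rates, months):
    """Return dollars newly charged off over the horizon and the month-by-month path."""
    path = [list(balances)]
    for _ in range(months):
        path.append(step(path[-1], roll_rates))
    return path[-1][-1] - path[0][-1], path


if __name__ == "__main__":
    # Hypothetical closed pool: no new accounts, payments, or recoveries are modeled.
    start = [1_000_000.0, 20_000.0, 8_000.0, 5_000.0, 0.0]
    losses, path = forecast_charge_offs(start, ROLL_RATES, months=12)
    for bucket, dollars in zip(BUCKETS, path[-1]):
        print(f"{bucket:>13}: ${dollars:,.0f}")
    print(f"12-month charge-off forecast: ${losses:,.0f}")
```

What makes this a Markov model is that next month’s bucket distribution depends only on this month’s distribution and the roll rates. A cumulative loss rate (vintage curve) model would instead track losses by months on book for each origination cohort.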



Given the economic implications associated with a model’s accuracy and effectiveness, issues concerning model validation are of obvious concern to the industry. Erroneous or misspecified models may lead to lost revenues through poor customer selection (credit risk) or collections management. While academics and other statisticians continue to extend and improve modeling technologies, lenders have to realistically assess the costs and benefits associated with increasing model sophistication and investing in more complex validation techniques. Hence, one of the central issues addressed during the forum was the adequacy of the attention and resources being devoted to validation activities, given these tradeoffs.

The forum also addressed the increasing importance of validation from the regulatory perspective. Bank regulators and policymakers recognize the potential for undue risk that can arise from model misapplication or misspecification. Examining and testing model validation processes are becoming central components in supervisory examinations of banks’ consumer lending businesses.

The conference format explicitly recognized these overlapping interests, and each panel was structured to include an industry, an academic, and a regulatory perspective.

The conference began with an introductory session outlining the importance of model validation and describing inherent challenges in the credit risk management process. These themes were extended in the panels that followed, dealing with validating credit scoring models and loss forecasting models. The day’s final panel, entitled “Where Do We Go from Here?,” attempted to draw out common threads and issues from the earlier discussions. As might be expected when such complex issues are examined, the discussions raised as many questions as answers. At the same time, the dialogue provided important insights and a better appreciation for the potential improvements that could result from greater collaboration among industry leaders, academic researchers, and regulators.

Rather than provide a chronological summary of the day’s discussion, this paper highlights several key issues that emerged during the day. The paper begins with a summary of the opening presentation on the importance of model validation, which set the stage for the subsequent panels. The remainder covers three general themes that emerged from the panel discussions. These themes represent areas of particular complexity where the dialogue revealed multiple dimensions, alternative views, and, often, competing tensions. While resolving the various issues was not feasible in a single day, the discussions generated important clarifications and specific suggestions for improving the model validation process.

4 Economic data are generally not used in credit scoring models because this would require a very different sample structure. To be useful, the sample would have to include accounts with similar credit bureau and application information booked over multiple time periods, in order to reflect different economic environments. This would require a longer sample time and run the risk that the account-level data would be seriously outdated before the model was ever used. Loss forecasting models, on the other hand, are often designed specifically to include the effects of economic changes on expected loss and so use a time series of losses under varying economic circumstances, either controlling for changes in the risk profiles of the population of accounts or assuming there are none.



Model Validation: Challenging and Increasingly Important

Dennis Ash, of the Federal Reserve Bank of Philadelphia, opened the day’s discussion by addressing several fundamental issues associated with validation of credit risk models. He began by describing the practical challenges that emanate from the basic modeling framework and how these factors have affected industry practices. Ash emphasized that, despite these challenges, there are a number of compelling reasons for modelers to improve validation practices. He closed with a series of questions that he encouraged participants to consider during the day’s deliberations.

Ash noted that an intrinsic limitation to developing robust validation processes comes from the model construction process itself. He pointed out that scorecards (the output of the model that weighs each borrower’s characteristics to compute a score) are by definition “old” when put into production and then are often used for five to 10 years without revision. By necessity, scorecards are based on historic data requiring at least a year of observation points before model construction can even begin. In essence, the model-building process results in a prediction of a future that looks like the past, which, as Ash aptly noted, is analogous to “driving a car by looking through the rear window.” Furthermore, this approach simply fits patterns of correlation, which may not necessarily be related to causation, creating another level of challenge to any future validation process.

Similarly, Ash pointed out that scorecards are rarely constructed to incorporate changes in underlying economic conditions. He noted that borrower behavior tends to be quite different when interest rates are rising versus falling or in periods of economic downturns versus upturns. Performance validation, by definition, requires some quantifiable expectations about the impact of these economic factors.

The fact that the same, often generic, scorecards are frequently used on a variety of portfolios with widely different characteristics further challenges the validation process. Portfolios that have different terms and conditions or product features will also experience varied patterns of customer acceptance.

With these and other practical challenges facing users of credit risk models, Ash asserted that it is not surprising that banks too often pay little or no attention to model validation. Too often as well, he noted, banks ignore the most current information available in their validation processes. In an effort to recognize portfolio seasoning effects, many banks will create validation samples only from accounts booked one or two years ago. As such, they do not examine new account distributions or consider early delinquency patterns that might provide useful validation information.

Similar issues face the development and validation of loss forecasting models. Forecasts based on recent performance look at performance over the most recent outcome period, generally one year, which can then be weighted by the distributions of accounts today. This is a more accurate approach than relying on scorecard outcomes that are one to two years old, and it is further improved by using current weightings. Despite this, the technique does not take into account economic forecasts. More comprehensive loss predictions, which do use economic forecasts, generally use data over a complete economic cycle, which can be dated. Any forecast assumes that the future is driven by the same factors that operated in the past. Issues of causality and accuracy of data can cause degradation of the forecasts. Still, more complete data (including economic data in addition to data on individual accounts), longer time histories, and the use of time-series analysis should make these forecasts more reliable over time.

Despite these and other real challenges, Ash argued that there are a number of compelling reasons for credit card banks and other consumer lenders to pay greater attention to model validation. Size and scale considerations are driving factors that increase the importance of carefully monitoring a model’s performance. As lender portfolios become larger and more complex, scoring becomes even more embedded in decision processes, making such monitoring all the more important. All of these factors can have significant economic consequences.

In a highly competitive lending environment, a model’s performance can have important effects on market share, perhaps even creating adverse selection problems for those who really get modeling wrong. Ash noted that implementation of Basel II requirements will quickly “raise the bar” on validation of credit risk models. Model risk in consumer lending is a factor in defining overall operational risk. Increasingly, bank examiners will be seeking evidence that scoring models are effectively differentiating pools of exposures by their credit risk characteristics and, by extension, that loss forecasting models reflect current portfolio compositions and take into account macroeconomic and other relevant exogenous factors. Validation processes, and related documentation and reporting, will need to be consistent and clearly tied to a model’s purpose.

Basel guidance documents provide a template for validation that should help financial institutions adopt advanced validation practices.

In closing, Ash raised a series of questions that he encouraged conference participants to consider during the day: How do we integrate model purpose and performance expectations into validation processes? How do we incorporate stress testing under different economic conditions and then establish relevant tolerance metrics in validation? What do we do when we determine that our models are not working as intended? What are appropriate monitoring standards, and how do we incorporate ad hoc analyses into standard report reviews? How can we recognize and document the role of judgment in validation processes?

Many of these questions have technical components that are generally addressed with detailed statistical considerations. The focus of this forum, however, was on the more general management principles that need to be considered in improving validation and risk management practices. These and many other issues were actively debated throughout the day. Of the various points raised, the remainder of this paper highlights three selected themes that seemed to capture a number of the key issues debated: linking credit scoring models and loss forecasting models; appropriate metrics for model validation; and the use of economic and market variables in credit scoring models.

Linking Credit Scoring and Loss Forecasting

The conference discussion focused on validation issues associated with credit scoring and loss forecasting, two common and critical risk models used in credit card banks and other consumer lending environments. However, conference participants also debated a point underlying the discussion of validation: the extent to which these two risk models have theoretical and practical links.
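One simple practical link between the two model types is the “recent performance” loss forecast Ash described: bad rates observed by score band over the most recent outcome period, reweighted by today’s distribution of accounts across those bands. The sketch below is a minimal illustration under stated assumptions: the band names, bad rates, account counts, average exposures, and loss-given-default values are all hypothetical, and the pool-level structure only loosely echoes the Basel II pool parameters discussed later in this summary. Like the technique Ash described, it ignores economic forecasts.

```python
from dataclasses import dataclass


@dataclass
class ScoreBand:
    """One score band (pool). All values used with this class below are hypothetical."""
    name: str
    recent_bad_rate: float     # share of accounts that went "bad" in the latest outcome period
    accounts_today: int        # number of accounts currently in the band
    avg_exposure: float        # assumed average dollar balance at default
    loss_given_default: float  # assumed share of that balance that is lost


def reweighted_bad_rate(bands):
    """Portfolio bad rate using today's account mix rather than an older sample's mix."""
    total = sum(b.accounts_today for b in bands)
    return sum((b.accounts_today / total) * b.recent_bad_rate for b in bands)


def expected_dollar_loss(bands):
    """Sum over bands of accounts x bad rate x exposure x loss given default."""
    return sum(
        b.accounts_today * b.recent_bad_rate * b.avg_exposure * b.loss_given_default
        for b in bands
    )


if __name__ == "__main__":
    portfolio = [
        ScoreBand("low", 0.10, 200, 1_000.0, 0.8),
        ScoreBand("middle", 0.04, 500, 1_000.0, 0.8),
        ScoreBand("high", 0.01, 300, 1_000.0, 0.8),
    ]
    print(f"Reweighted bad rate: {reweighted_bad_rate(portfolio):.3%}")
    print(f"Expected dollar loss: ${expected_dollar_loss(portfolio):,.0f}")
```

Holding band-level bad rates fixed while updating the account mix is what lets this kind of forecast reflect today’s portfolio rather than a sample booked one to two years ago. It still assumes that each band’s bad rate is stable, which is the rank-ordering assumption examined in the discussion that follows.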



Banks use credit scoring models to rank individuals based on how likely they are to default on a loan.5 While a credit scoring model typically produces a default probability, the models are generally built to separate and rank order borrowers by risk. Thus, metrics for validation of credit scoring models typically do not rely on whether the model accurately predicts default frequency; rather, they concentrate on the model’s ability to determine which borrowers are more likely to default relative to others. In contrast, validation of loss forecasting models is based on the accuracy of the models’ predictions relative to those of alternative models.6

Banks use the scoring model’s measure of relative expected performance to make a variety of decisions, such as whether to grant credit, where to set the interest rate, and how to determine the maximum borrowing limit. Bank management must dynamically adjust score cut-off criteria for granting credit as well as the criteria for setting risk-based prices and credit limits. This dynamic adjustment is generally based on an assessment of market conditions as well as on the observed absolute rate of default for a given score band.

Loss forecasting models predict aggregate dollar losses for particular portfolios over a specific period of time. A variety of methodologies can be used to predict future losses, each of which has its own technical complexities, advantages, and limitations. Banks may use more than one kind of loss forecasting model to help predict future cash flows, establish loan loss reserves, and set appropriate levels of capital.

An underlying theme during the day’s discussions centered on the connection between these two risk modeling techniques. Some participants argued that the two processes are logically linked. That is, the default rate is a central component of aggregate dollar losses, and therefore, a scoring model that generates statistical measures of the likelihood of default should be a central input to loss forecasting models. Moreover, failure to exploit the connection between these modeling approaches means that lenders are not using all the relevant information available to develop more effective tools.

Professor Robert Stine, of the Wharton School, observed that in his experience the two modeling functions are often conducted independently. “Banks have the credit score modelers in one office, and the loss forecasters in another office, and the two groups build their models in isolation without ever talking to each other.” Stine suggested that bringing these groups together could create synergies, increase knowledge within banks, and unify different pieces of evidence involved in managerial decision-making. Others noted that this separation sometimes occurs, in part, because of differences in functional skills. Credit scoring modelers are typically statisticians housed in business units responsible for underwriting and account management, whereas in many banks, loss forecasters are finance professionals working in the bank’s treasury department.

In addition to pointing out institutional divisions within a firm, participants also noted technical reasons for building credit scoring and loss forecasting models independently. In particular, the absolute likelihood of default depends on factors that go beyond the characteristics of the individual borrower, and these factors are difficult to incorporate into a statistical model. For example, the likelihood of default also depends on a firm’s pricing, which, in turn, depends on the pricing decisions of its competitors as well as on the overall level of interest rates. Moreover, industry and macroeconomic factors change dynamically, so by definition, incorporating these factors would require building far more complex, dynamic models.

Indeed, some conference participants suggested that attempting to incorporate industry and macroeconomic factors into credit scoring models is inherently too complex and would ultimately lead to substantial error. In light of these complexities, some practitioners argued that by concentrating on producing a relative risk ranking of borrowers, lenders can effectively capture fairly stable relationships between borrower-specific information and the relative risk of default.

Intuitively, it would seem that changes in economic or market conditions would change the absolute likelihood that people will repay their loans. However, it was argued that most “good risks” will remain less likely to default than “bad risks,” regardless of economic or market conditions. Thus, one would expect rank ordering to be more stable in changing conditions than the absolute rate of default. In this view, instead of trying to build statistical scoring models that give absolute risk in varying conditions, it is better to build relatively stable rank-ordering models and then rely on managerial judgment to change cutoffs for credit scores and make other business decisions to account for different conditions.

While acknowledging that there are substantial difficulties in making greater use of scoring models in loss prediction, Nick Souleles, of the Wharton School, contended that some of these difficulties are surmountable and that there might also be substantial gains in tackling them. As noted earlier, different people make different distinctions between credit scoring models and loss forecasting models. One distinction concerns what is being measured: credit scoring models predict default, whereas loss models usually predict expected losses. Another distinction concerns the “cardinality” of the results: credit scoring models typically produce only a rank ordering of risk, whereas loss models predict dollar losses.

Souleles argued that both of these distinctions are somewhat artificial and that, in principle, the two models should share common foundations. For example, it is possible to rank order consumers by expected losses or profitability and, conversely, to produce cardinal probabilities of default. Indeed, while earlier generations of scoring models were based on discriminant analyses that simply tried to separate “bad” and “good” accounts, many current scoring models are based on logistic and related models, which formally provide (and assume) cardinal probabilities of default. Hence, when people say they use scoring models only to rank order risk, they are, in practice, ignoring the additional information available in the underlying model. As argued earlier, this is done for robustness. In Souleles’ view, though, this suggests that the underlying models are not stable enough and that it might be better to deal with robustness and model instability directly.

With respect to “cardinality,” his view is that lenders cannot avoid making cardinal decisions, so they might as well systematize their decisions as well as possible. While in the past credit scoring models were often used simply to decide whether or not to extend a loan, today very few decisions are so binary. For instance, on booking a credit card account, a lender must decide on the credit limit and the interest rate, both of which are continuous variables, and the appropriate interest rate should generally depend on the (cardinal) expected probability of default.

Representatives of the regulatory community also noted that in the Basel II framework, risk ranking and forecasting are linked by requiring a portfolio to be segmented into homogeneous pools of risk, a job for which scoring is a prime tool, and then requiring various risk parameters to be estimated for each pool: the probability of default, the loss given default, and the exposure at default. These risk parameters, in turn, determine the minimum capital requirements for that pool. The capital requirements can then be added across pools to get the total capital requirement. Basel risk parameters and capital requirements are not necessarily the same as a bank’s internal estimates of loss and economic capital, but the link between the Basel process and internal risk models may provide an impetus to banks to more effectively incorporate scoring into their loss forecasts.

In the face of current limitations to credit scoring models, banks have generally chosen to approach loss forecasting from a variety of directions that do not involve exploiting the potential connection with credit scoring models. While participants had varying views as to the efficacy of various approaches that would bring these two modeling techniques closer together, they generally agreed that industry and academic researchers are moving in the direction of greater linkage and that implementation of Basel II will likely spur these developments. Furthermore, as the accuracy of prediction in credit scoring models improves, there will be a greater incentive to exploit the connection with loss forecasting. More broadly, credit scoring models that generate more reliable point estimates of the rate of default could serve explicitly as inputs into a variety of other decision-making models, such as lifetime value models or pricing models. Academics, regulators, and those in the financial services industry all have good reason to actively follow these developments.

5 The definition of default (or “bad”) for scoring purposes is not generally the same as the definition of default a lender may use for charge-off or placing a loan on nonaccrual status.

6 Many lenders use a “champion/challenger” approach for validating a loss forecasting model. This approach compares the current (champion) model’s forecast accuracy to that of an alternative (challenger) model.

Metrics for Model Validation

During the discussion on model validation, the issue of appropriate metrics was another prominent theme. Recognizing that there is no common yardstick by which credit scoring and loss forecasting models can be measured, the conference panelists offered a framework for thinking about how model purpose, model use, and expectations for results play into the evaluation of credit scoring and loss forecasting models. Despite widespread agreement about the importance of clearly articulating models’ purpose, use, and expected results, opinion diverged on the merits of using such standard statistical tests as the Gini coefficient and the K-S (Kolmogorov-Smirnov) statistic. In the end, as with other discussion topics, forum participants broadly acknowledged that developing effective processes and exercising sound judgment were as important as the particular statistical measurement technique used.

Dennis Glennon, of the Office of the Comptroller of the Currency, provided a helpful description of the relationships between the fundamental uses of credit scoring and loss forecasting models and the tools used to evaluate their performance.

In defining credit scoring models as essentially a classification tool, he argued that they should be evaluated simply on how well they separate “good” and “bad” credits over time. One common approach is to consider some measure of divergence between “goods” and “bads.” An effective classification tool should result in accepting a high proportion of “goods” consistent with expectations. The K-S statistic and the Gini coefficient are common measures of a model’s ability to separate risk. A second, related consideration is to evaluate whether the scoring model rank orders well over time. Instability in ordering would suggest that the model is not capturing the underlying and relatively constant information about how risky different credits are.

Glennon noted that, by contrast, loss forecasting models are essentially predictive tools that require metrics that evaluate “goodness-of-fit” and “accuracy.” “Goodness-of-fit,” he explained, measures how much of the variation in losses can be explained by changes in the independent variables. In regression analysis, this is most commonly measured as the R-squared of the regression. A loss forecasting model’s “accuracy,” on the other hand, is best determined by how close predictions of losses are to those actually realized. Commonly used metrics to test predictive accuracy include the mean-squared error and the mean-absolute error.

Glennon’s general conclusion was that validation methodologies should be closely associated with how the model is used. For example, in cases where a bank has a business need to use the estimated probability of default produced by a scoring model, validation criteria should include evaluations of the model’s goodness-of-fit and accuracy. However, if a bank uses only the rank-ordering properties of the score, validation should concentrate on the model’s ability to separate risk over time.

Although participants agreed that models should be evaluated against their purpose and against clearly defined expectations, there was less agreement about whether commonly used statistical tests are appropriate to the needs of model-based consumer lenders, such as credit card companies. Professor David Hand, of London’s Imperial College, argued that the standard metrics for validating credit scoring models are, indeed, inadequate and potentially misleading.

Hand started with the observation that credit scoring models are used to assign applicants to one of a small, discrete set of possible actions by the bank. For example, in deciding whether to accept an applicant for a credit card, a bank accepts applicants above a certain score and rejects those below it. When the bank makes the accept/reject decision, it does not matter how far the person is above or below the cutoff. Therefore, the distribution of applicants’ scores is irrelevant to the model’s performance at assigning applicants to actions. Hand pointed out that the model’s only observable
measure of performance is the number of “bad” applicants accepted. Nevertheless, the commonly used statistical tests of a model’s performance, such as the K-S statistic or Gini coefficient, measure the model’s ability to rank risk throughout the entire sample without giving any special weight to performance near the accept/reject region. More generally, Hand argued that banks should not use metrics that rely on continuous distributions to evaluate models used for assigning applicants to discrete actions.

Hand further suggested that standard statistics for evaluating the risk separation properties of scoring models were often not well aligned with the use of those models. In particular, he presented research on the measures one should use when evaluating a model that establishes a cutoff score for granting or denying credit. Hand’s model shows that alternative measures that concentrate on ranking performance of marginal borrowers (those borrowers near the potential score cutoff) produce better results than standard validation criteria that measure how the model ranks performance for the entire sample.

Keith Krieger, of JPMorgan Chase, noted that Hand’s argument holds only for the K-S statistic when banks choose a cutoff different from the point of maximum divergence. Michael Mout, of Capital One, also noted that banks do not always develop and evaluate models for a use as specific as accepting or rejecting applicants. For example, a scoring model might be used to provide a bank with information for testing new products on borrowers who are below the cutoff for existing products. Mout also argued that the consistent use of an agreed-upon metric is important, noting that a consistent metric is essential for comparing models during development, across portfolios, and over time. Thus, he concluded that it could be difficult to tie a metric too closely to a cutoff criterion that changes dynamically.

While the discussion raised questions about whether Hand’s approach was applicable in all situations, there was agreement on Hand’s more general point that evaluating a model’s performance depends critically on a clear understanding of the model’s intended use.

Nick Souleles also pointed out the importance of establishing a clear yardstick for a model’s purpose. Moreover, he argued that the appropriate yardstick for lending models should be the maximization of a bank’s risk-adjusted lifetime returns from its loans or accounts rather than accurate estimates of the probability of default or expected losses.

He also noted that at the portfolio level, the return on a portfolio of loans depends on more than the risk characteristics of an individual loan or segment. The covariance in returns across loans is an additional, crucial parameter. To illustrate the importance of covariance in returns, suppose that the average probability of default as measured by credit scores is the same in Michigan and in Alaska. However, suppose that the timing is such that default rates in Alaska have a low covariance with
the national default rate, while the default rates in Michigan are highly correlated with the national default rate. In this case, loans to Alaskans will reduce the volatility of the portfolio, holding all else fixed. While this example is simply illustrative, not a policy recommendation, the point is that most lenders would value lower volatility for the same average default rate.

Souleles presented recent research showing that it is possible to formally model which consumers are likely to be more cyclical than others. Further, he pointed out that this sort of cyclicality can potentially break the rank ordering of risk implicitly assumed by many credit scorers, since, in a downturn, the credit quality of cyclical consumers will deteriorate faster than that of noncyclical consumers.

Forum participants also concurred that models must be validated relative to clearly understood expectations. Rather than establishing some arbitrary statistical criteria for a model’s performance, the central question for validation is whether the model is working as intended and producing results that are at least as good as alternative approaches. A clear understanding and documentation of expected performance is a necessary and fundamental basis on which all validation approaches must be built. On a pragmatic level, validation must assist management in determining whether the benefits of potential improvements to the model are worth the added costs of developing and implementing new models.

There was considerable discussion as to whether setting expectations for a model’s performance required only objective statistical criteria or whether judgment was also a necessary component. Some practitioners noted that a model’s performance depends on multiple factors. For example, a model’s performance is likely to be better in stable economic environments than in unstable ones. Some forum participants argued that any evaluation of a model’s performance needs to take these complex factors into account and that model developers could not rely solely on a statistical measure to assess a model’s performance. At least one participant noted that the discussion on tools for a model’s validation highlights just how much “art” remains in what initially appears to be a scientific and strictly numerical decision.

While there was general agreement that the validation process is part science and part art, some participants argued for the need to establish clear quantitative criteria as part of the validation process. Such criteria need not be the sole measure of model performance, but they are necessary for establishing scientific rigor and discipline in the validation process. Although participants did not reach consensus on this topic, they generally recognized that experts must learn to balance evidence from a variety of metrics when building and evaluating models.

Incorporating Economic and Market Variables

Throughout the conference, participants discussed the advantages and disadvantages of including additional market and economic variables in both credit scoring and loss forecasting
models. In her presentation, Dina Anderson, of TransUnion, illustrated that credit scoring models are limited because they do not account for macroeconomic variables or, more generally, any factors influencing loan repayment that are outside of an individual’s control. Anderson described an individual who loses her job during a recession and falls behind on her credit card payments until she finds a new job. If the job loss is simply due to bad luck, she will not be any riskier after getting a new job than she was before. “In reality,” Anderson noted, “the likelihood that the customer is ‘good’ remains the same.” However, because she was delinquent, credit scoring models will move her into a higher risk pool, despite the fact that her underlying risk is unchanged. Therefore, the model does not accurately reflect her risk over time, because it does not include the causal factors behind her delinquency.

During his presentation, Souleles also addressed issues of model stability. He began by noting that model instability is an issue for both scoring and loss models. Models are calibrated using historical data, so if relevant unmodeled conditions change, the model can have trouble forecasting out of sample. Souleles pointed out that one useful response is to try to incorporate more of the relevant conditions into the model, in particular, macroeconomic conditions. Time-series analysis of macro variables, such as the unemployment rate, requires long sample periods, presumably covering at least one business cycle. Until recently, sample periods that were long enough were hard to come by, but he suggested that the 2001 recession provided new data that could be useful in predicting the effects of future increases in unemployment.

Moreover, even with shorter sample periods, he believes that it is still possible to use cross-sectional variation in, say, unemployment rates across counties to model the effects of unemployment. Souleles showed results from his study of this subject, which found that increases in unemployment rates, declines in house prices, and health shocks (e.g., the loss of health insurance) increase default rates.7 Such macro variables help predict default even after controlling for standard credit scores. While the scores still provide most of the predictive “lift,” the macro variables provide enough additional lift to warrant their inclusion. Knowing this, lenders often respond informally, for example, by adjusting their score “cutoffs” (at least for binary decisions). Souleles argued that it would be better to formally include the macro variables in the model, in addition to the usual credit variables.

Souleles pointed out that it is relatively easy to control for macro variables in reduced form, without building a complete structural model of the economy. While some in the audience argued that controlling for macro variables introduces too much subjectivity, Souleles responded that limiting oneself to the variables that happen to be available at the credit bureau is no less subjective. Nonetheless, Souleles warned that, in the absence of a structural model, one must remember that future recessions might be different from past recessions. He showed data from the period 1995-97, during which the bankruptcy rate increased significantly, even when controlling for credit scores and macroeconomic conditions (which were improving at the time). Lenders will always have to back up their models with judgment. Still, he concluded that one should try to quantify that which can be quantified and use the experience of recent recessions to increase a model’s accuracy (as compared to the alternative of ignoring that experience altogether).

7 “An Empirical Analysis of Personal Bankruptcy and Delinquency” (with D. Gross), Review of Financial Studies, 15(1), Spring 2002.

Joseph Breeden, of Strategic Analytics, also emphasized that banks should quantify the expected effects of scenarios on future losses. Whether explicitly or implicitly, all loss forecasts are based on predictions regarding the vintage life-cycle, changing credit quality, seasonality, management action, the macroeconomic environment, and the competitive environment, which together form a scenario. By overtly including these factors, management can determine how much of the difference between actual and expected losses is a result of the model and how much is a result of the scenario. Even if a macroeconomic forecast is inaccurate, by explicitly including it, banks can examine outcomes over a range of possible future conditions. Breeden suggested that banks could even solve the model backwards, determining what would need to happen to the economy for a portfolio’s performance to fulfill management’s expectations. As in other areas of the discussion, this topic elicited a number of important insights for further research.

Conclusion: Art Versus Science

In a speech in early December 2004, Federal Reserve Governor Susan Schmidt Bies noted that “although the importance of quantitative aspects of risk management may be quite apparent – at least to practitioners of the art – the importance of the qualitative aspects may be less so. In practice, though, these qualitative aspects are no less important to the successful operation of a business.”8 Later in her talk she added, “Some qualitative factors – such as experience and judgment – affect what one does with model results. It is important that we not let models make the decisions, that we keep in mind that they are just tools, because in many cases it is management experience – aided by models to be sure – that helps to limit losses.” In a related sense, a good bit of the conference discussion focused on the role of judgment in the validation of credit risk models. By noting this balance of technical and judgmental factors, participants recognized the importance of both “art” and “science” in credit risk modeling.

8 Susan S. Bies, “It’s Not Just about the Models: Recognizing the Importance of Qualitative Factors in an Effective Risk-Management Process,” The International Center for Business Information’s Risk Management Conference, Geneva, Switzerland, December 7, 2004. Speech online at: http://www.federalreserve.gov/boarddocs/speeches/2004/20041207/default.htm

At the most basic level, the construction of any statistical credit scoring or loss forecasting model requires some element of judgment, wherein the statisticians themselves decide whether to formally model the full array of (often endogenous) processes underlying repayment and default. The discussion of incorporating macroeconomic data into model design reflects one such issue; as Souleles noted, even without a formal structural model of
the macroeconomy, measurements of available reduced-form parameters often improve model fit.

The art, of course, lies in choosing the parameters to include and in calibrating a meaningful model. Those choices, in turn, rely on a clearly stated and documented understanding of the model’s intended purpose and use. Models used to rank-order credit risk have different inherent limitations than those used to generate accurate predictions. Furthermore, models used for binary classifications (accept/reject) face different limitations than those used for multiple joint decisions (accept/reject, interest rate, and credit line). Models incorporating changes in economic or industry performance may face limitations not yet known. Nonetheless, we can be sure that as competitive pressures and technical advances continue, implementing new model validation techniques will become increasingly important.

The industry typically refers to judgment applied on top of a model as “overrides”: management decides to take action notwithstanding the model’s results. While most participants agreed that managerial judgment, aided by credit scoring and loss forecasting models, can lead to better account management, that judgment needs to be implemented carefully. Consistency is a critical factor, and judgmental input must be controlled and managed with the same precision used with other model inputs. When judgmental inputs are inconsistent and subject to frequent changes, the model becomes less important to the credit scoring and loss forecasting management process. If the model is routinely overridden, the model becomes superfluous and should be either abandoned or revised. As one individual observed, the perceived need for constant change and recalibration is likely a sign that the model is no longer functioning as intended and needs to be replaced. Judgmental factors may therefore add noise or accuracy (or both) to actual credit and loss outcomes. Hence, when models are augmented by managerial judgment, results from the modeling and subsequent validation processes can become seriously compromised. Therefore, while there was broad agreement that model performance must allow for judgmental factors, a number of participants argued that incorporating judgmental factors increases the need for rigorous testing and validation.

Validation, and more generally risk management, is an entire process that requires an interplay between effective managerial judgment and statistical expertise. It is not simply a matter of establishing a set of statistical benchmarks. Ronald Cathcart, of CIBC, aptly summarized the benefits and drawbacks of incorporating judgmental factors in the construction, use, and validation of credit scoring and loss forecasting models when he emphasized the need for consistency in the use of managerial processes throughout the model’s life. Cathcart defined eight common steps or stages generally found in credit risk modeling, ranging from “problem definition” to “maintenance and monitoring.”9 As he described these eight steps, he noted that judgmental factors are incorporated throughout the model’s life and that all steps require distinct validation approaches to ensure consistency throughout the entire process.

9 The eight steps as defined by Cathcart are included in his PowerPoint presentation, available on the Center’s web site at: http://www.philadelphiafed.org/pcc/conferences/Ronald_Cathcart.pdf.
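The discussion of overrides stresses that judgmental input should be tracked with the same precision as other model inputs, and that a routinely overridden model may be superfluous. The Python below is a minimal sketch of one way to do that tracking, assuming a hypothetical record layout (`Decision`) and a hypothetical 5 percent override-rate threshold (neither comes from the forum). It counts low-side overrides (applicants approved despite scoring below the cutoff) and high-side overrides (applicants declined despite scoring above it), and compares the bad rate of low-side overrides with that of model-approved accounts, a rough check on whether judgment is adding accuracy or noise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """One credit decision (hypothetical record layout, for illustration only)."""
    model_approve: bool              # model recommendation: score at or above the cutoff
    final_approve: bool              # decision actually taken after managerial review
    went_bad: Optional[bool] = None  # observed outcome; None if declined (unobservable)


def bad_rate(group: list[Decision]) -> float:
    """Share of accounts with a known outcome that went bad (nan if none are known)."""
    known = [d for d in group if d.went_bad is not None]
    return sum(d.went_bad for d in known) / len(known) if known else float("nan")


def override_report(decisions: list[Decision], max_override_rate: float = 0.05) -> dict:
    """Summarize how often judgment overrode the model and how those accounts performed.

    max_override_rate is a hypothetical policy threshold, not an industry standard.
    """
    low_side = [d for d in decisions if not d.model_approve and d.final_approve]
    high_side = [d for d in decisions if d.model_approve and not d.final_approve]
    model_booked = [d for d in decisions if d.model_approve and d.final_approve]
    rate = (len(low_side) + len(high_side)) / len(decisions)
    return {
        "override_rate": rate,
        "low_side_overrides": len(low_side),    # approved despite scoring below cutoff
        "high_side_overrides": len(high_side),  # declined despite scoring above cutoff
        "bad_rate_low_side": bad_rate(low_side),
        "bad_rate_model_approved": bad_rate(model_booked),
        "needs_review": rate > max_override_rate,
    }


# Hypothetical sample of 20 decisions.
decisions = (
    [Decision(True, True, went_bad=False) for _ in range(16)]
    + [Decision(True, True, went_bad=True) for _ in range(2)]
    + [Decision(False, True, went_bad=True)]  # low-side override that went bad
    + [Decision(True, False)]                 # high-side override; outcome unobserved
)
print(override_report(decisions))
```

Because outcomes are observed only for booked accounts, high-side overrides can be counted but not scored; any judgment about whether they helped has to come from other evidence.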



Cathcart also emphasized the importance of documentation, a point echoed by others in the discussion. While this may seem obvious, a number of participants from the regulatory community noted that the lack of documentation of judgmental processes is an all-too-common deficiency found in bank exams. Very simply, internal risk managers and bank examiners have a common need to understand how judgment is being employed and how well outcomes matched expectations or previous performance. While lenders should have clearly established expectations of how a model will perform and how it should inform management decisions, they should also have criteria that trigger managerial review to determine whether a model has come to the end of its useful life.

As a result, documentation is expected to become an ever more critical factor in the Basel II world. As model risk becomes a bigger factor in overall risk considerations, model validation becomes paramount. Underpinning the Basel II framework is the regulatory acceptance of individual banks’ approaches to model-based decisioning. Lenders must be able to demonstrate to their regulators how their models are performing against expectations and how risk exposures fit within defined bands of acceptability. In essence, Basel II raises the bar for validation processes. As noted in the Basel Retail Guidance, “A bank must establish policies for all aspects of validation. A bank must comprehensively validate risk segmentation and quantification at least annually, document the results, and report its findings to senior management.”10

Models are quickly becoming a critical area of potential innovation and competitive advantage. While participants generally accepted this premise, several argued that a reliance on demonstrated validation outcomes will lead to the elimination of judgment in the lending process. Several members of the regulatory community stated that this is clearly not the intention or direction regulators will be pursuing. The application of judgmental factors is recognized as a critical element of the risk management process. However, how such factors are applied, and how they affect expectations for performance, now need to be well documented.

In the end, it was generally agreed that while credit scoring and loss forecasting models and their statistical validation appear to be a well-grounded quantitative science that is becoming an important focus of regulatory compliance, they remain inextricably intertwined with the art of management.

10 Internal Ratings-Based Systems for Retail Credit Risk for Regulatory Capital; 69 Federal Register, pp. 62,748 ff., October 27, 2004.
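The Basel II expectation described above, that lenders show how risk exposures fit within defined bands of acceptability and validate risk quantification at least annually, can be illustrated with a simple calibration check. The Python below is a sketch under stated assumptions, not the method prescribed by the guidance: it treats each segment's defaults as independent draws at the predicted probability of default (PD) and flags a segment when the observed default count falls outside a central 95 percent binomial range. The segment data and the 95 percent level are hypothetical.

```python
import math


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of exactly k defaults among n accounts (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def acceptability_band(n: int, pd: float, alpha: float = 0.05) -> tuple[int, int]:
    """Central (1 - alpha) range for the default count if the predicted PD is correct.

    Assumes defaults are independent; alpha = 0.05 is an illustrative choice.
    """
    lo, cdf = None, 0.0
    for k in range(n + 1):
        cdf += math.exp(binom_logpmf(k, n, pd))  # running binomial CDF
        if lo is None and cdf >= alpha / 2:
            lo = k
        if cdf >= 1 - alpha / 2:
            return lo, k
    return lo, n  # only reached if rounding keeps the CDF just below 1 - alpha/2


def within_band(defaults: int, n: int, pd: float, alpha: float = 0.05) -> bool:
    lo, hi = acceptability_band(n, pd, alpha)
    return lo <= defaults <= hi


# Hypothetical segments: (name, accounts, predicted PD, observed defaults)
segments = [("A", 1000, 0.02, 23), ("B", 500, 0.05, 45)]
for name, n, pd, observed in segments:
    lo, hi = acceptability_band(n, pd)
    status = "within band" if within_band(observed, n, pd) else "flag for review"
    print(f"segment {name}: expected {n * pd:.0f} defaults, band [{lo}, {hi}], observed {observed}: {status}")
```

Because defaults tend to rise together in a downturn (the covariance point raised earlier in the forum), a band built on independence is likely too narrow in recessions; it is better read as a trigger for managerial review than as a verdict on the model.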



APPENDICES

APPENDIX A
Institutions Represented at the Conference

American General Corporation
Argus Information and Advisory Services
Bank of America
Bridgeforce
Capital One
CIBC
CIT
Citigroup
Cornell University
Daimler Chrysler
Drexel University
Equifax
Ernst & Young
Experian-Scorex
Fair Isaac & Co., Inc.
Federal Deposit Insurance Corporation
Federal Reserve Bank of Atlanta
Federal Reserve Bank of Philadelphia
Federal Reserve Bank of Richmond
Federal Reserve Board of Governors
GE Consumer Finance
Household Credit Card Services
Imperial College London
Innovalytics, LLC
JPMorgan Chase
KeyBank
KPMG
LoanPerformance, Inc.
MBNA
Merrill Lynch
Office of the Comptroller of the Currency
Penn Mutual Life Insurance Company
PNC Bank
Strategic Analytics
TransUnion
U.S. Department of Justice
US Bank Corp.
Wells Fargo
Wharton School



APPENDIX B
Conference Agenda

8:30 am Registration and Coffee

9:00 am Welcome and Introduction


Carol Leisenring
Co-Director, The Wharton School’s Financial Institutions Center
Peter Burns
Vice President & Director, Payment Cards Center
Federal Reserve Bank of Philadelphia

9:15 am What Is the Challenge and Why Is It Important?


Dennis Ash, Federal Reserve Bank of Philadelphia

• What do we mean by model validation?


• Why focus on credit scoring and loss forecasting models?
• What are the risks of not getting it right? And what are the opportunities for
those that can do better?

9:45 am Break

10:15 am Validating Credit Scoring Models


Moderator: Christopher Henderson, MBNA America Bank
Panelists: David Hand, Imperial College London
Dina Anderson, TransUnion
Michael Mout, Capital One

• How often do we need to validate and what does this timing depend on?
• Will one measure do?
• What do we do when the future is different from the past because of changes in the
economy, changes due to portfolio acquisitions, changes in product terms, etc.?

12:00 pm Informal Lunch

1:00 pm Validating Loss Forecasting Models


Moderator: Joseph Breeden, Strategic Analytics
Panelists: Dennis Glennon, Office of the Comptroller of the Currency
Nick Souleles, The Wharton School
Ron Cathcart, Canadian Imperial Bank of Commerce

• How are loss forecasting models different from credit scoring models?
• What techniques (roll rate, vintage analysis, scoring-based approaches, etc.) are best used
for forecasting dollar losses?
• How do we best validate loss forecasting models and how is this different from or similar
to validation of credit scoring models?

2:45 pm Break

3:00 pm Where Do We Go From Here?


Moderator: William Lang, Federal Reserve Bank of Philadelphia
Panelists: Robert Stine, The Wharton School
Erik Larsen, Office of the Comptroller of the Currency
Sumit Agarwal, Bank of America
Huchen Fei, JPMorgan Chase

• What should we most care about going forward?


• What are the gaps in our understanding?
• What things do we need to work on: to run the business, to provide effective oversight,
and to resolve theoretical questions?



The Wharton Financial Institutions Center
2307 Steinberg Hall-Dietrich Hall
3620 Locust Walk
Philadelphia, PA 19104
http://fic.wharton.upenn.edu/fic/

Payment Cards Center
Federal Reserve Bank of Philadelphia
Ten Independence Mall
Philadelphia, PA 19106
http://www.philadelphiafed.org/pcc/



Ten Independence Mall
Philadelphia, PA 19106-1574
215-574-7110
215-574-7101 (fax)
www.philadelphiafed.org/pcc

Peter Burns
Vice President and Director

Stan Sienkiewicz
Manager

The Payment Cards Center was established to serve as a source of knowledge and expertise on this important segment of
the financial system, which includes credit cards, debit cards, smart cards, stored-value cards, and similar payment vehicles.
Consumers’ and businesses’ evolving use of various types of payment cards to effect transactions in the economy has
potential implications for the structure of the financial system, for the way that monetary policy affects the economy, and
for the efficiency of the payments system.
