Logit Model For PD
2009.5
Table of Contents
Predictive Modeling on Probability of Default – Logit Model Using Microsoft Excel and VBA
Abstract
1. Introduction
2. General Discussion on Key Factors in the Analysis of Obligor’s Risk Assessment
3. Data
4. The Model
5. Visual Basic Application (VBA) for Estimation on Logistic Regression
6. Results
Appendix
Reference
Predictive Modeling on Probability of Default – Logit Model Using Microsoft Excel and VBA
Kevin Lan
Abstract
Logistic regression is often used to estimate the probability of default (PD) of commercial loans
across different types of borrowers (the obligor PD model). Microsoft Excel is probably the most widely
used software in day-to-day banking work, yet it has no built-in logistic regression function. In this
case study, I develop a Visual Basic for Applications (VBA) add-in for the Excel macro environment that
estimates the logit model by maximum likelihood and predicts probabilities of default in commercial
lending. Using a dataset with the five financial ratios of the widely known Z-score model developed by
Altman (1968), the paper discusses the general drivers of credit risk and the model estimation
procedure. The VBA application can be extended to retail banking (e.g., credit cards, auto loans and
home mortgages) and to SME lending in small or medium-sized commercial banks.
1. Introduction
Credit risk is the risk borne by a financial or credit institution that a borrower will default on
payments of interest and principal, owing to the borrower’s unwillingness or inability to service the
debt. The higher the credit risk an institution is exposed to, the greater its potential losses. For
banks and most other credit institutions, credit risk is considered the form of risk that can most
significantly diminish earnings and financial strength.
Essentially, the early-warning mechanism of a risk rating system should focus on the financial strength
of the firm as well as on the business cycle and trends of its specific industry. First, corporate
earnings must be reasonable relative to payment obligations. If this is not the case, liquidity will be
weakened.
Without satisfactory earnings, it will also be difficult for an enterprise to raise other types of capital,
such as loan capital and new equity. A shortage of liquidity is often the factor that triggers bankruptcy.
One or more variables that can explain the level of and changes in the enterprise’s liquidity should
therefore be included in a credit risk model. An enterprise’s ability to withstand losses is often assessed
on the basis of its financial strength measured by its equity ratio. With a high equity ratio, the enterprise
is better equipped to cope with difficult periods, partly because it will be easier to raise capital through
the sale of assets without encumbrances and also obtain new loans because better collateral can be
offered. Generally, a high equity ratio also implies lower current expenses for interest and principal.
3. Data
Following the guideline above, many financial ratios could be incorporated into the linear logistic
function. Among the dozens of financial ratios available, the 30 measures most relevant to the
investing process are organized into six main categories in the following list.¹
Not every financial ratio is suitable for model estimation because of limited data access, and
perfectly accurate prediction is not attainable in practice. This paper aims to develop an Excel VBA
application rather than the best possible model. Therefore, we select only five variables for default
prediction: Working Capital (WC), Retained Earnings (RE), Earnings Before Interest and Taxes (EBIT) and
Sales (S), each divided by Total Assets (TA); and Market Value of Equity (ME) divided by Total
Liabilities (TL). Except for the market value, all of these items are found in the company’s balance
sheet and income statement. The market value is the number of shares outstanding multiplied by the
stock price. The five ratios are those of the widely known Z-score developed by Altman (1968). WC/TA
captures the short-term liquidity of a firm; RE/TA and EBIT/TA measure historic and current
profitability, respectively; S/TA proxies for the competitive situation of the company; and ME/TL is a
market-based measure of leverage. Of course, one could consider other variables as well; to mention
only a few: cash flows over debt service, sales or total assets (as a proxy for size), earnings
volatility, and stock price volatility. Also, there are often several ways of capturing one underlying
factor. Current profits, for instance, can be measured using EBIT, EBITDA (= EBIT plus depreciation and
amortization) or net income.
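The five ratios described above can be computed directly from the raw statement items. The following is a minimal VBA sketch, not part of the original add-in: the function names `AltmanRatios` and `MarketValueEquity` and their argument lists are hypothetical, and the output order simply follows the column order of the results table (WC/TA, RE/TA, EBIT/TA, ME/TL, S/TA).

```vba
Option Explicit

' Hypothetical helper (assumption, not from the original add-in):
' market value of equity = shares outstanding x stock price
Public Function MarketValueEquity(shares As Double, price As Double) As Double
    MarketValueEquity = shares * price
End Function

' Hypothetical helper (assumption, not from the original add-in):
' returns the five ratios as a 1 x 5 array in the order
' WC/TA, RE/TA, EBIT/TA, ME/TL, S/TA
Public Function AltmanRatios(WC As Double, RE As Double, EBIT As Double, _
                             Sales As Double, ME As Double, _
                             TA As Double, TL As Double) As Variant
    Dim r(1 To 5) As Double

    ' The ratios are not meaningful for zero or negative denominators
    If TA <= 0 Or TL <= 0 Then
        AltmanRatios = CVErr(xlErrDiv0)
        Exit Function
    End If

    r(1) = WC / TA      ' short-term liquidity
    r(2) = RE / TA      ' historic profitability
    r(3) = EBIT / TA    ' current profitability
    r(4) = ME / TL      ' market-based leverage
    r(5) = Sales / TA   ' competitive situation
    AltmanRatios = r
End Function
```

Entered as an array formula over five horizontally adjacent cells, `AltmanRatios` would fill one row of the x range used for estimation, assuming the inputs are in consistent units.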
¹ The list is quoted from www.investopedia.com.
Table 1 List of Financial Ratios
1) Liquidity Measurement Ratios 4) Operating Performance Ratios
4. The Model
A score summarizes the information contained in factors that affect default probability. Standard scoring
models take the most straightforward approach by linearly combining those factors. Let x denote the
factors (their number is N) and b the weights (or coefficients) attached to them; we can represent the
score that we get in scoring instance i as:
$$ \text{Score}_i = b_1 x_{i1} + b_2 x_{i2} + \dots + b_N x_{iN} \qquad (1) $$
The scoring model should predict a high default probability for those observations that defaulted and a
low default probability for those that did not. In order to choose the appropriate weights b, we first need
to link scores to default probabilities. This can be done by representing default probabilities as a
function F of scores:
$$ \text{Prob}(\text{Default}_i) = F(\text{Score}_i) \qquad (2) $$
Like default probabilities, the function F should be constrained to the interval from 0 to 1; it should also
yield a default probability for each possible score. The requirements can be fulfilled by a cumulative
probability distribution function. A distribution often considered for this purpose is the logistic
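distribution. The logistic distribution function Λ(z) is defined as:
$$ \Lambda(z) = \frac{\exp(z)}{1 + \exp(z)} \qquad (3) $$
Applied to the score, this gives the default probability:
$$ \text{Prob}(\text{Default}_i) = \Lambda(\text{Score}_i) = \frac{\exp(\text{Score}_i)}{1 + \exp(\text{Score}_i)} = \frac{1}{1 + \exp(-\text{Score}_i)} \qquad (4) $$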
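The mapping from factors to a score and from a score to a default probability can be sketched in a few lines of VBA. This is an illustrative sketch under stated assumptions, not the add-in's own code: the function names `LinearScore` and `LogisticPD` are hypothetical, and `b` and `x` are assumed to be one-dimensional VBA arrays with identical bounds (not worksheet ranges).

```vba
Option Explicit

' Hypothetical helper: linear score as in equation (1),
' score = b(1)*x(1) + ... + b(N)*x(N).
' Assumes b and x are 1-D arrays with the same lower and upper bounds.
Public Function LinearScore(b As Variant, x As Variant) As Double
    Dim j As Long, s As Double
    For j = LBound(b) To UBound(b)
        s = s + b(j) * x(j)
    Next j
    LinearScore = s
End Function

' Hypothetical helper: logistic distribution function applied to a score.
Public Function LogisticPD(score As Double) As Double
    ' Exp overflows for arguments above roughly 709,
    ' so very negative scores are mapped to a probability of 0
    If score < -700 Then
        LogisticPD = 0
    Else
        LogisticPD = 1 / (1 + Exp(-score))
    End If
End Function
```

A score of zero corresponds to a default probability of one half, so `LogisticPD(0)` returns 0.5; larger scores move the probability toward 1 and smaller scores toward 0.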
6. Results
Table 1 Logit Model Results using VBA
Model 1
                        CONST    WC/TA    RE/TA  EBIT/TA    ME/TL     S/TA
b                      -2.543    0.414   -1.454   -7.999   -1.594    0.620
SE(b)                   0.266    0.572    0.229    2.702    0.323    0.349
t                       -9.56     0.72    -6.34    -2.96    -4.93     1.77
p-value                 0.000    0.469    0.000    0.003    0.000    0.076
Pseudo R² / # iter      0.222       12
LR-test / p-value       160.1    0.000
lnL / lnL0             -280.5   -360.6
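The summary statistics in the table can be checked against one another. The relations below are the standard definitions (McFadden's pseudo R², the likelihood-ratio statistic, and the t ratio). That the add-in uses exactly these definitions is an assumption, although the reported figures are consistent with them up to rounding of the displayed inputs.

```latex
\begin{align*}
\text{Pseudo } R^2 &= 1 - \frac{\ln L}{\ln L_0}
                    = 1 - \frac{-280.5}{-360.6} \approx 0.222 \\
LR &= 2\,(\ln L - \ln L_0) = 2\,(-280.5 + 360.6) = 160.2 \\
t_{\text{EBIT/TA}} &= \frac{b}{SE(b)} = \frac{-7.999}{2.702} \approx -2.96
\end{align*}
```

The likelihood-ratio value computed from the rounded log likelihoods (160.2) differs from the reported 160.1 only in the last digit, which is what one would expect if the table was produced from unrounded values.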
Appendix:
Option Explicit
' ...
'Adding a vector of ones to the x matrix if constant=1, name xraw=x from now on
Dim x() As Double
ReDim x(1 To N, 1 To K)
For i = 1 To N
    x(i, 1) = 1
    For j = 1 + constant To K
        x(i, j) = xraw(i, j - constant)
    Next j
Next i
' ...
'Start value for the constant: log odds of the sample default rate ybar
ybar = Application.WorksheetFunction.Average(y)
If constant = 1 Then b(1) = Log(ybar / (1 - ybar))
For i = 1 To N
    bx(i) = b(1)
Next i
' ...
'Compute prediction Lambda, gradient dlnl, Hessian hesse, and log likelihood lnl
For i = 1 To N
    lambda(i) = 1 / (1 + Exp(-bx(i)))
    For j = 1 To K
        dlnL(j) = dlnL(j) + (y(i) - lambda(i)) * x(i, j)
        For jj = 1 To K
            hesse(jj, j) = hesse(jj, j) - lambda(i) * (1 - lambda(i)) * x(i, jj) * x(i, j)
        Next jj
    Next j
    lnL(iter) = lnL(iter) + y(i) * Log(lambda(i)) + (1 - y(i)) * Log(1 - lambda(i))
Next i
'Compute inverse Hessian (=hinv) and multiply hinv with gradient dlnl
hinv = Application.WorksheetFunction.MInverse(hesse)
hinvg = Application.WorksheetFunction.MMult(dlnL, hinv)
' ...
'If convergence achieved, exit now and keep the b corresponding with the estimated hessian
If Abs(change) <= sens Then Exit Do
' ...
Loop
'output
Dim relogit()
ReDim relogit(1 To 1, 1 To K)
If stats = 1 Then ReDim relogit(1 To 7, 1 To K)
'Coefficients
For j = 1 To K
    relogit(1, j) = b(j)
Next j
' ...
        relogit(5, j) = "#N/A"
        relogit(6, j) = "#N/A"
        relogit(7, j) = "#N/A"
    Next j
End If
logit = relogit
GoTo myend
'Error Handler
error:
MsgBox ("Fatal Error. Reasons might be: y not {0,1}, not the same number of N for y and x's...or anything else")
myend:
End Function
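The quantities accumulated in the listing above can be written compactly. The block below restates what the code computes, with N the number of observations, K the number of coefficients, x_i the i-th row of x, and y_i the default indicator, as in the listing. The final update line is an assumption: the excerpt does not show the step that updates b, and the standard Newton step is given here because it matches the computed product of the inverse Hessian and the gradient.

```latex
\begin{align*}
b_1^{(0)} &= \ln\frac{\bar{y}}{1-\bar{y}}
  && \text{(start value for the constant)} \\
\Lambda_i &= \frac{1}{1+\exp(-b'x_i)}
  && \text{(\texttt{lambda})} \\
\ln L &= \sum_{i=1}^{N}\bigl[\,y_i \ln \Lambda_i + (1-y_i)\ln(1-\Lambda_i)\bigr]
  && \text{(\texttt{lnL})} \\
g = \frac{\partial \ln L}{\partial b} &= \sum_{i=1}^{N}(y_i-\Lambda_i)\,x_i
  && \text{(\texttt{dlnL})} \\
H = \frac{\partial^2 \ln L}{\partial b\,\partial b'} &= -\sum_{i=1}^{N}\Lambda_i(1-\Lambda_i)\,x_i x_i'
  && \text{(\texttt{hesse})} \\
b^{\text{new}} &= b^{\text{old}} - H^{-1} g
  && \text{(assumed Newton step)}
\end{align*}
```

Because H is negative definite whenever the columns of x are linearly independent, the log likelihood is concave and the iteration typically converges in a small number of steps, in line with the 12 iterations reported in the results table.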
Reference:
1. Loeffler, G. and Posch, P. N. (2007). Credit Risk Modeling Using Excel and VBA (The Wiley Finance
Series). John Wiley & Sons.