Meis PDF
mortgages
Janneke Meis
372866
Abstract
The current low interest rate environment has triggered refinancing incentives in the residential mortgage sector. An unscheduled return of a part or the full outstanding principal
constitutes a risk from the perspective of financial institutions providing mortgages. On the one
hand, prepayments affect the Asset and Liability management of a bank. On the other hand,
prepayments lead to interest rate risk. Given the magnitude of the residential mortgages on the
balance sheet of a bank, it is of vital importance to obtain insight into the actual maturity of the
mortgages provided by financial institutions. This thesis looks into different models that can be
used to predict current and future prepayment rates. The most widely used prepayment models
are option theoretic models, multinomial logit models and competing risk models. Aside from
these models this thesis investigates the applicability of the Markov model as a prepayment
model. Important determinants of prepayment include borrower specific characteristics, loan
specific characteristics and macro-economic variables. Variable selection procedures are used to
identify the most important risk drivers. Models are compared based on a wide range of in-sample and out-of-sample performance criteria to determine the model that is most appropriate
for predicting prepayment rates. Another important feature that the models should be capable
of incorporating is the recent credit crisis of 2008. Tests on parameter stability are conducted
to determine the possible presence of structural breaks in prepayment models.
Contents
1 Introduction
2 General background
2.1 Mortgage market in the United States
2.2 Embedded options in the mortgage contract
2.3 Determinants of mortgage termination
2.4 Prepayment models
2.5 Prepayment modelling over time
3 Technical background
3.1 Empirical models
3.1.1 Multinomial logit model
3.1.2 Survival analysis
3.1.3 Markov model
3.2 Variable selection procedures
3.3 Tests for structural breaks
3.3.1 CUSUM and CUSUMSQ Tests
3.4 Option theoretic model
3.4.1 Interest rate models
3.4.1.1 Vasicek model
3.4.1.2 Hull-White extension of Vasicek model
3.4.1.3 Swaption price under HW1f
3.4.1.4 Model calibration
3.5 Assessing model performance
3.5.1 Panel data performance
3.5.2 Cross sectional analysis
3.5.3 Time series analysis
4 Data description
4.1 Dependent variables
4.1.1 Three state models
4.1.2 Five state models
4.1.3 Time series analysis
4.2 Independent variables
4.3 Bivariate analyses
4.3.1 Loan Age
4.3.2 Loan specific risk drivers
4.3.3 Macroeconomic variables
5 Results
5.1 Multinomial logit model
5.1.1 Three state MNL model
5.1.2 Five state MNL model
5.1.3 Independence of Irrelevant Alternatives Property
5.1.4 Cross sectional analysis
5.1.5 Time series analysis
5.2
5.3
5.4
5.5
6 Model performance
6.1 In-sample performance
6.2 Out-of-sample performance
6.2.1 Cross sectional out-of-sample
6.2.2 Time series out-of-sample
6.3 Tests for parameter stability
6.3.1 Estimation of time varying coefficients
6.3.2 CUSUMSQ Test
6.3.3 MNL3 model with structural break
7 Conclusion
1 Introduction
The outstanding amount of all residential mortgage loans in the US amounted to USD 13.5 trillion at
the beginning of 2015.¹ A total of USD 7.6 trillion has been securitized and sold to the secondary
market in the form of mortgage backed securities (MBS) and mortgage trusts. The remainder of the
outstanding debt constitutes a direct investment in mortgage loans by financial institutions. The
most important segment in the secondary mortgage market consists of the so-called agency mortgage backed
securities. The three agencies that issue and guarantee mortgages in the US are the Government
National Mortgage Association (Ginnie Mae, GNMA), the Federal National Mortgage Association
(Fannie Mae, FNMA) and the Federal Home Loan Mortgage Corporation (Freddie Mac, FHLMC).
The combined outstanding debt held by these agencies amounted to USD 2.7 trillion at the beginning of
2015. GNMA securities are default free since they are fully backed by the US government. FNMA
and FHLMC are government sponsored agencies but are not default free. GNMA, FNMA and
FHLMC securities generally have highly standardized features and trading and settlement mechanisms. The
most common mortgage in their portfolio is the thirty year fixed rate fully amortizing mortgage.
Standard residential mortgages in the US offer full prepayment flexibility in the sense that the
entire principal outstanding can be paid off at any time. Unlike in the Netherlands,
the majority of mortgages can be prepaid without a prepayment penalty. The option to prepay
can be seen as an American option on the mortgage contract. At the discretion of the borrower,
the outstanding principal can be returned, relieving the mortgagor from any future contractual
obligations on the loan. Although it is possible to form an estimate of the timing of option exercise,
such options are often not exercised in an optimal way. The main causes are the diverging ability of
borrowers to time optimal exercise of such options as well as other non-rational heterogeneities in
borrower behaviour. Aside from the option to prepay, the borrower also has the option to default
on the mortgage. The incentive for voluntary default is high when the market value of the property
is lower than the value of the mortgage. Again, although this option may be exercised optimally,
heterogeneity in borrower behaviour prevents the prediction of option exercise times in a purely
optimal fashion.
The presence of implicit options embedded in mortgage contracts, coupled with heterogeneity in
borrower behaviour, poses risks for the mortgagee. On the one hand, uncertainty related
to predicting the timing of cash flows leads to liquidity risk. On the other hand, a mismatch in interest
rates paid and received leads to interest rate risk. From the perspective of the lender it is therefore
important to obtain reliable estimates of future prepayment and default rates as well as to gain insight
into the factors driving these rates. This can aid financial institutions in their Asset and Liability
management.
The most important determinant of prepayment is refinancing. Refinancing is attractive when
the mortgage rate currently paid on the loan is higher than the mortgage rate available in the
market. Other reasons for prepayment are for example relocation and divorce. Default is a form
of prepayment in which the outstanding principal is returned via a sale of the property. A peculiar
feature of the US mortgage market is that a borrower with mortgage (pre-)arrears can opt for a
voluntary repossession. In some states in the US, borrowers can hand in the keys of their home and
be relieved from any future obligations on the mortgage.
The US economy is slowly recovering from the collapse of the housing and mortgage markets in
2008. House prices have been rising again since bottoming out in 2011. Home sales remain generally
flat but housing starts are again increasing. Since the decrease in mortgage rates as of 2013, not
only has the demand for mortgage loans increased, but the level of prepayments has also dropped after its
spike in 2009. Mortgage defaults have been on a decreasing trend since the crisis (Agency, 2014). All in
all, current market developments lead to rising fear among lenders of increasing prepayment rates.
The housing market is picking up, which fuels incentives to move. Simultaneously, although interest
rates are beginning to rise, they still remain at a historically low level and are expected to rise in
the future. Searching for better mortgage rates in the market is likely to be a fruitful action.
¹ Source: Economic Research Data of the Board of Governors of the Federal Reserve System, available at
http://www.federalreserve.gov/econresdata/releases/mortoutstand/current.htm.
Given the size and the developments of the US mortgage market in the last decade this market
is an interesting market to study. A prepayment model developed now is likely to be considerably
different from the prepayment models constructed a decade ago. Aside from being able to capture
the economic conditions of the last decade, a model for prepayments should contain both an optimal
component as well as a component that incorporates borrower heterogeneity. The aim of the present
research is to develop a model that is able to accurately predict past, current and future prepayment
rates in the US. This involves the selection of appropriate modelling strategies as well as the selection
of relevant variables. Furthermore, attention will be paid to the changing circumstances resulting
from the business cycle as well as borrower heterogeneity.
This research uses the Single Family Loan Level data set provided by Freddie Mac. It contains
information on mortgage characteristics as well as mortgage performance over time for loans issued
by Freddie Mac in the period 1999 to 2014. Additionally, several macro-economic variables are
added to the data set. Prepayments are modelled with an option theoretic model, a multinomial
logit (MNL) model, a competing risk model and a Markov model. The MNL model is the most widely
used in the research on mortgage termination due to its relatively straightforward estimation. It
is demonstrated that the competing risk model constitutes a special case of the MNL model under
certain conditions. The Markov models are estimated as conditional logit models using covariate
information and conditioning on the current status of contractual mortgage payments. Due to the
multitude of variables affecting prepayment and default decisions, a variable selection procedure is
applied to determine which variables are most influential for explaining mortgage termination. Since
it has been established that prepayments exhibit behavioural uncertainty, the results of the option
theoretic model are not of interest on their own but are included as an explanatory variable in the
exogenous models. The optimal refinancing incentive is derived via the risk neutral valuation of a
European lookback put option on the mortgage contract. To this end, the risk free rate is modelled
with a Hull-White One Factor (HW1f) model. The advantage of this model is that it can provide
an exact fit to the current term structure by taking this as an additional input variable, next to
the spot rate. Since refinancing constitutes the most important determinant for prepayments, the
appropriate construction of the refinancing incentive is a valuable addition to the exogenous models.
The aim of this thesis is to develop a prepayment model that can predict in-sample and out-of-sample prepayment rates, where out-of-sample refers to both the cross sectional and the time series
dimension. Performance is assessed by means of a wide array of indicators. Models under investigation
are the prepayment models most widely used in the prepayment literature. The Markov model is
introduced as a novel prepayment model. A multitude of loan specific, borrower specific and macro
economic variables are included in the analysis as risk drivers. Special attention is paid to whether
the models hold up well during the financial crisis of 2008. Prepayments and defaults cause the
actual maturity of a mortgage contract to be stochastic and typically shorter than its contractual
maturity. The main contribution of this thesis is to provide insight into the true maturity of a portfolio
of mortgage contracts. This is relevant for financial institutions since the funding they attract to
cover for the mortgages they provide should be of shorter maturity as well. This research adds to
existing literature on this topic by applying the most common models in the prepayment literature
as well as a novel prepayment model, the Markov model, to one data set. This allows for an accurate
comparison between the models for which a wide array of performance criteria will be used. Moreover
this research pays special attention to the presence of a structural break since the data set includes
2 General background
2.1 Mortgage market in the United States
The US mortgage market is characterized by a relatively high negative equity rate. Although the
rate has come down from 31.4 percent in the second quarter of 2012 to 15.4 percent in the first
quarter of 2015, this percentage is still high in international comparison (Gudell, 2015). In 2015Q1,
11.8 percent of home owners in negative equity owed the bank more than twice what their home was
worth. Even though such high loan-to-value (LTV) ratios² are geographically concentrated, the US
mortgage market can in general be characterized as having high LTV ratios. Mortgage defaults are
on a downward trend as well, although still well above pre-crisis levels. Although mortgage rates
have risen slightly since 2012, they still remain at relatively low levels, increasing the propensity to
refinance (Agency, 2014).
The most common mortgage product in the US is the fixed rate fully amortizing loan with a
maturity of thirty years. The rate paid on the mortgage is generally fixed for the lifetime of the
loan and the loan is fully paid off at maturity. Each period, a constant amount is repaid. The
fraction of this amount that is used to repay the principal increases over time since interest payments
decrease as the outstanding balance decreases over time.
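The amortization mechanics described above can be illustrated with a minimal Python sketch. The function name, the loan amount of USD 200,000 and the 4 percent rate are illustrative assumptions and are not taken from the data set; monthly payments and a strictly positive rate are assumed.

```python
def amortization_schedule(principal, annual_rate, years, periods_per_year=12):
    """Return the constant payment and a list of
    (period, interest, principal_repaid, remaining_balance) tuples.

    Assumes a strictly positive annual_rate (illustrative sketch only).
    """
    r = annual_rate / periods_per_year      # periodic interest rate
    n = years * periods_per_year            # total number of payments
    # Constant payment of a fully amortizing loan (annuity formula).
    payment = principal * r / (1 - (1 + r) ** -n)
    balance = principal
    rows = []
    for period in range(1, n + 1):
        interest = balance * r              # interest on the outstanding balance
        repaid = payment - interest         # remainder of the payment repays principal
        balance -= repaid
        rows.append((period, interest, repaid, balance))
    return payment, rows


# Hypothetical example loan: USD 200,000, 4 percent, thirty years.
payment, rows = amortization_schedule(200_000, 0.04, 30)
first, last = rows[0], rows[-1]
print(f"payment {payment:.2f}")
print(f"month 1: interest {first[1]:.2f}, principal {first[2]:.2f}")
print(f"month {last[0]}: interest {last[1]:.2f}, principal {last[2]:.2f}")
```

In this example the interest component falls every month while the principal component rises, and the balance reaches zero at maturity, in line with the description of the fully amortizing loan.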
An important feature of the US residential mortgage market is that agency backed loans allow
for a penalty free prepayment of (part of) the principal any time before maturity. Commercial
and investment property loans can however carry a prepayment penalty, which often consists of a
percentage of the remaining principal outstanding. To discourage prepayment behaviour, financial
institutions in other countries often set a monthly prepayment limit in excess of which a penalty
will be charged.
The default legislation in the US is often cited as one of the causes of the financial crisis.
The most peculiar feature is the fact that in some states³ mortgage loans are nonrecourse, meaning
that if a borrower fails to make contractual mortgage payments, the lender can seize the collateral
but has no recourse to any of the borrower's other assets (Gerardi and Hudson, 2010). Voluntary
repossession effectively means that a borrower can hand in the keys of the house and be free from
any future obligations even though he has enough cash in the bank. Especially in times when
housing prices are low, the existence of negative equity leads to a large number of so-called strategic
defaults. In most other countries, the mortgage loan is diminished by the amount paid at the auction.
The difference between the market value of the property and the value of the loan, the shortfall, is
added to the debt of the borrower. To be relieved of the entire mortgage debt, borrowers have
to file for bankruptcy.
Voluntary repossessions are not entirely free of costs. They come at the expense of a downgrade
in the FICO credit score. FICO scores are used by many US banks to assess the creditworthiness
of individuals. The score is designed to measure the risk of default by taking into account various
factors in a person's financial history, such as payment history, size of the debt burden and length
of credit history. Lenders are in the position to offer different terms and conditions to customers
depending on their credit rating.
2 Loan-to-value
2.2 Embedded options in the mortgage contract
The mortgages sold by mortgagees contain implicit options. Prepayment is one of these options.
The mortgagor exchanges the unpaid principal balance on the mortgage for a release from further
obligations (Deng, 1997). Default is another option embedded in a mortgage contract. The property is
sold in exchange for elimination of future mortgage obligations. The difficulty with quantifying such
options is that mortgagors do not exercise these options efficiently. Moreover, mortgagors' behaviour
is heterogeneous and cannot be represented by a typical borrower (Deng et al., 2005). The risk that
the behaviour of the mortgagor deviates from what is expected from a purely financial standpoint
is called behavioural risk (Bissiri and Cogo, 2014). The application of this risk to early unscheduled
return of the principal on a mortgage is defined as prepayment risk. A formal definition is given by
Kolbe (2008, p. 21), who defines prepayments as "(contractually permitted) notional cash flows which occur
earlier or later than expected, deviating from the anticipated call or put policy of the counterparty
in a financial contract". Accordingly, Kolbe defines prepayment risk as the risk
resulting from these cash flow deviations.
Risk of early mortgage termination makes the duration of a portfolio of mortgages stochastic and
in turn has implications for the refinancing policy of the lender (Jacobs et al., 2005). The implicit
options embedded in mortgages may or may not be exercised in response to market changes, which
in turn leads to significant liquidity risk and interest rate risk for the credit providing institution
(Consalvi and di Freca, 2010). Interest rate risk in general arises when there is a mismatch in
the fixation of the interest rates paid and received by the bank (Perry et al., 2001). To fund the
mortgage, financial institutions attract resources from elsewhere, on which a certain agreed upon
rate has to be paid over a fixed period. If the mortgagor decides to prepay (part of) the principal
at any time before its original maturity, the mortgagee will have to find an alternative use for these
funds. If market interest rates have fallen since the origination of the mortgage, the bank will incur
a loss, because the returned funds can only be reinvested at the lower market rate while the rate paid
on the funding remains fixed. On the other hand, uncertainty in the maturity profile of loans subject to prepayment
has a considerable impact on the representation of the liquidity profile of the bank (Consalvi and
di Freca, 2010). An incorrect evaluation of this profile exposes financial institutions to the risk of
overestimating future liquidity requirements (overfunding) as well as to the risk of increased long-term liquidity costs (Bissiri and Cogo, 2014).
To incorporate the risk of prepayment, financial institutions generally include a charge in their
mortgage pricing. This risk is however not priced in completely, as a mortgagor can prepay at any
time while the spread to account for this risk is received on a monthly basis across the life of the
mortgage (Vasconcelos, 2010). To discourage prepayment by mortgagors, banks usually
charge a prepayment penalty. This penalty is mostly equal to the present value of the difference in
monthly interest payments between the existing mortgage and a newly originated mortgage with the same
characteristics (Jacobs et al., 2005).
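The penalty rule described above can be sketched in a few lines of Python. This is a simplified illustration and not the calculation used by any particular lender: the function name is hypothetical, the outstanding balance is assumed to stay constant over the remaining term, the interest differences are discounted at the current market rate, and the penalty is assumed to be zero when the market rate is at or above the contract rate.

```python
def prepayment_penalty(balance, contract_rate, market_rate, months_remaining):
    """Present value of the monthly interest differential (illustrative sketch).

    Assumptions (not from the source): constant balance over the remaining
    term, discounting at the market rate, penalty floored at zero.
    """
    monthly_diff = balance * (contract_rate - market_rate) / 12
    if monthly_diff <= 0:
        # No interest is lost by the lender when market rates are not lower.
        return 0.0
    discount = market_rate / 12
    return sum(monthly_diff / (1 + discount) ** k
               for k in range(1, months_remaining + 1))


# Hypothetical example: USD 100,000 outstanding, 5 percent contract rate,
# 3 percent market rate, five years of rate fixation remaining.
penalty = prepayment_penalty(100_000, 0.05, 0.03, 60)
print(f"penalty {penalty:.2f}")
```

Because the differences are discounted, the penalty in this example is somewhat lower than the undiscounted sum of the monthly interest differences.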
2.3 Determinants of mortgage termination
A multitude of factors influence the mortgagor's decision to terminate a mortgage.
These risk drivers can roughly be divided into borrower-specific factors, loan-specific factors and
macro-economic factors, see Table 1. The effect of these variables on prepayment and default rates
is fairly straightforward, see for example Clapp et al. (2000).
Category            Explanatory variables
Borrower specific   Age, income, creditworthiness, loan purpose, employment status.
Loan specific       Loan age, loan amount, mortgage rate, insurance, penalty, location, property type, market value of the property.
Macro-economic      Housing prices, mortgage rates, risk free rates, divorce rate, month of the year.
Table 1: Factors affecting prepayment rates.
Some important features stand out. Firstly, prepayments often exhibit an S-shaped relation
with loan age. This so-called seasoning effect arises since prepayment rates are generally low shortly
after origination of the loan, increase as the mortgage matures (the ramp-up period) and finally
arrive at a steady state level near the maturity of the mortgage (Charlier and Bussel, 2001; Jacobs
et al., 2005). Secondly, trends in house prices are indicative of the level of activity in the housing and
mortgage markets. In periods of house price appreciation, home sales and mortgage originations may
increase as the expected return on investment rises (Agency, 2014). Prepayments due to relocation
are more prevalent in this case. Conversely, during periods of price depreciation or price uncertainty,
home sales and mortgage originations tend to decrease as risk-averse home-buyers are reluctant
to enter the market. In turn, this leads to fewer prepayments due to relocation. Furthermore,
prepayment due to relocation is positively affected by the divorce rate.
Prepayment due to refinancing incentives is by far the most important reason for prepayment.
Refinancing can be attractive when the mortgage market rate is below the contractual rate. The contractual mortgage rate paid and the effective duration of the mortgage are inversely related (Burns,
2010). An important feature of the refinancing incentive is the so-called burnout effect (Jacobs
et al., 2005). This effect arises due to differences in borrower behaviour in a pool of mortgages. If a
refinancing incentive occurs (such as a drop in mortgage market rates), a wave of prepayments will
follow. The borrowers that grasp this opportunity can be deemed the fast, i.e. financially aware,
borrowers, whereas the remainder of the borrowers in the pool are slow borrowers. Therefore, in
the presence of another refinancing opportunity, the pool is expected to be less active or, in other
words, the pool is burned out (Gonchanov, 2002).
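The burnout mechanism can be made concrete with a small Python sketch that tracks the expected composition of a pool over successive refinancing opportunities. The split into exactly two borrower types, the pool size and the prepayment probabilities are hypothetical values chosen for illustration only.

```python
def pool_prepayment_rates(n_fast, n_slow, p_fast, p_slow, n_waves):
    """Expected pool-level prepayment rate at each refinancing opportunity.

    Illustrative assumption: fast and slow borrowers prepay with fixed
    probabilities p_fast and p_slow whenever an incentive occurs.
    """
    fast, slow = float(n_fast), float(n_slow)
    rates = []
    for _ in range(n_waves):
        prepaid = fast * p_fast + slow * p_slow
        rates.append(prepaid / (fast + slow))
        # Borrowers who prepaid leave the pool, so the share of slow
        # borrowers among the survivors increases.
        fast *= 1 - p_fast
        slow *= 1 - p_slow
    return rates


# Hypothetical pool: 500 fast and 500 slow borrowers, three incentives.
rates = pool_prepayment_rates(500, 500, 0.40, 0.05, 3)
print([round(r, 4) for r in rates])
```

Although the incentive and the individual probabilities are identical in every wave, the pool-level prepayment rate declines from wave to wave because the remaining pool consists increasingly of slow borrowers.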
Going into strategic default when the market value of the property is lower than the value
of the loan is the second most important determinant for prepayment. This is especially true in
jurisdictions where mortgage loans are issued on a non-recourse basis. Properly assessing the value
of negative equity at each point in time is however a challenging task since the market value of the
property is unknown.
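One common way to approximate the unknown market value, which is an assumption of this sketch rather than a method prescribed above, is to scale the property value at origination with a house price index. The function name and all numbers below are hypothetical.

```python
def current_ltv(loan_balance, value_at_origination, hpi_at_origination, hpi_now):
    """Approximate current loan-to-value ratio (illustrative sketch).

    Assumption: the property value moves in line with a house price index
    (HPI); the true market value of an individual property may differ.
    """
    estimated_value = value_at_origination * hpi_now / hpi_at_origination
    return loan_balance / estimated_value


# Hypothetical loan: USD 180,000 outstanding on a property valued at
# USD 200,000 at origination, after a 20 percent fall in the index.
ltv = current_ltv(180_000, 200_000, 100.0, 80.0)
in_negative_equity = ltv > 1.0
print(f"estimated LTV {ltv:.3f}, negative equity: {in_negative_equity}")
```

An estimated LTV above one indicates negative equity; the quality of this indicator depends entirely on how well the index tracks the individual property.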
2.4 Prepayment models
Following on from the previous section, it is clear that it is a challenge to model prepayment rates given the
multitude of factors that influence this decision. The models proposed in the literature can
broadly be classified into optimal prepayment models on the one hand and exogenous prepayment
models on the other hand. The former category contains models that consider prepayments to
occur as a result of fully rational behaviour of borrowers. These models make the assumption that
prepayments are exercised in an optimal way and rely on the absence of arbitrage. Only taking
financial considerations into account would lead mortgagors to prepay if the current value of their
property exceeds the remainder of the outstanding mortgage plus transaction costs (Bussel, 1998).
An entire body of literature has developed which models prepayments as options on a mortgage.
This assumption is useful for valuation purposes but is quite stringent in ruling out all irrational
behaviour. Studies have demonstrated that modelling prepayments according to financial variables
such as housing turnover and mortgage rates can lead to underestimation of prepayment
rates under financially optimal circumstances and overestimation of prepayment rates in financially
suboptimal times (Consalvi and di Freca, 2010). Indeed, it has been observed that prepayments
2.5 Prepayment modelling over time
The prepayment projections resulting from the models depend on the time frame in which they
are made. Prior to the credit crisis of 2008 a prepayment model might predict a sharp increase in
the prepayment level following a refinancing incentive. However, in 2008 and 2009 a similar prepayment
model should predict a lower prepayment rate following the same refinancing incentive (Burns,
2010). Economic circumstances have considerable influence on prepayment rates. More specifically,
a weakened housing market, unemployment levels and lending standards create a liquidity crisis for
mortgagors and thereby reduce the incentive to prepay. Therefore, any prepayment model should
contain macro-economic factors to account for economic conditions in a country. Furthermore, it is
interesting to test for structural breaks in the data set. It is likely that parameter estimates before
and after 2008 differ considerably.
3 Technical background
This section provides a technical background on two different methods for modelling mortgage
termination: option theoretic models and empirical (exogenous) models.
3.1 Empirical models
3.1.1 Multinomial logit model
Discrete choice models are models in which the dependent variable is a categorical response variable,
$Y_{it} = j$ for $j = 1, 2, \ldots$. Using a set of explanatory variables, these models estimate the probability
$P(Y_{it} = j)$ that each of the categories of the dependent variable occurs. Two
popular models are the probit model and the logit model. The
difference between the two lies in the distributional assumptions for the error terms. The probit
model assumes a normal distribution, whereas the logit model assumes a logistic distribution. In
the context of mortgage continuation and termination, the multinomial logit model (MNL) is the
most widely adopted. The logit is defined as
\[
\ln\left(\frac{F(\cdot)}{1 - F(\cdot)}\right) = X_{it}'\beta_j, \tag{3.1}
\]
where $F(\cdot)$ is the logistic cumulative distribution function, $X_{it}$ contains the explanatory variables
and $\beta_j$ denotes the coefficients of category $j$. Consequently, let the probability that mortgage $i$ at time $t$ is
classified as category $j$ be defined as
\[
P(Y_{it} = j) = \frac{e^{X_{it}'\beta_j}}{\sum_{k} e^{X_{it}'\beta_k}}. \tag{3.2}
\]
categories must necessarily be associated with a probability decrease for one of the other categories.
The coefficients in the MNL model are interpreted in terms of the log odds ratio. To ensure parameter
identification, one of the categories is set as the benchmark category. To this end, its coefficients are
set to zero. The probability of the $i$th mortgage at time $t$ being classified as being in the
reference state follows directly from Equation (3.2) and reads
\[
P(Y_{it} = 0) = \frac{1}{1 + \sum_{j \geq 1} e^{X_{it}'\beta_j}}. \tag{3.3}
\]
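The log-likelihood function is given by
\[
\ln L = \sum_{i} \sum_{t} \sum_{j} d_{ijt} \ln P(Y_{it} = j), \tag{3.4}
\]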
in which $d_{ijt}$ is a dummy variable that equals one if $Y_{it} = j$ and zero otherwise. The coefficients are estimated by means of
Maximum Likelihood Estimation (MLE) and are obtained from the first order conditions (FOC)
of Equation (3.4). Estimates are obtained by maximizing $\ln L$ with respect to $\beta_j$.
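The probabilities of Equations (3.2) and (3.3) and the log-likelihood $\ln L$ can be evaluated with the following Python sketch. It only evaluates these quantities for given coefficients and does not perform the maximization itself; the function names, the single covariate and the coefficient values are hypothetical, and index 0 denotes the reference category.

```python
import math


def mnl_probabilities(x, betas):
    """Category probabilities [reference, category 1, category 2, ...].

    x     : list of covariate values for one mortgage-period
    betas : one coefficient list per non-reference category
    """
    # The coefficients of the reference category are fixed at zero,
    # so its linear score equals 0.
    scores = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    m = max(scores)                      # subtract the maximum for numerical stability
    expo = [math.exp(s - m) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]


def log_likelihood(observations, betas):
    """Sum of log probabilities of the observed categories.

    observations : list of (x, y) pairs, y being the observed category index
    """
    return sum(math.log(mnl_probabilities(x, betas)[y]) for x, y in observations)


# Hypothetical example with one covariate and two non-reference categories.
betas = [[math.log(2.0)], [math.log(3.0)]]
probs = mnl_probabilities([1.0], betas)
print([round(p, 4) for p in probs])
print(round(log_likelihood([([1.0], 0), ([1.0], 2)], betas), 4))
```

In practice the coefficients would be found by passing the negative of this log-likelihood to a numerical optimizer or by using a statistical package.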
One of the properties of the MNL model is the Independence of Irrelevant Alternatives (IIA).
This means that the odds ratio for any pair of choices is assumed to be independent of any other
alternative. Elimination of one of the choices should not change the ratios of probabilities for the
remaining choices. This means that a requirement for the categories is that they are mutually exclusive. More formally, the IIA property states that characteristics of one particular choice alternative
do not impact the relative probabilities of choosing other alternatives. To validate whether the IIA
property holds, the Hausman-McFadden test can be performed. This test relies on the insight that
under IIA, the parameters of the choice among a subset of alternatives may be estimated with an MNL
model on just this subset or on the full set, though the former is less efficient than the latter (Vijverberg,
2011). The test statistic is given by
\[
HM = (\hat{\beta}_U - \hat{\beta}_R)' [\hat{V}_U - \hat{V}_R]^{-1} (\hat{\beta}_U - \hat{\beta}_R), \tag{3.5}
\]
which is asymptotically $\chi^2(k)$ distributed, with $k$ denoting the number of parameters. Under the null hypothesis IIA holds,
which implies that omitting irrelevant alternatives will lead to consistent and efficient parameter
estimates for the restricted model, $\hat{\beta}_R$, while the parameter estimates of the unrestricted model, $\hat{\beta}_U$,
are consistent but not efficient. Under the alternative, only $\hat{\beta}_U$ is consistent.
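Given the two sets of estimates and their covariance matrices, the statistic of Equation (3.5) is a quadratic form that can be computed as in the Python sketch below. The helper names and the numerical inputs are hypothetical; the sketch uses only the standard library and solves the linear system by Gaussian elimination instead of forming the inverse explicitly.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance difference is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def hausman_mcfadden(beta_u, beta_r, v_u, v_r):
    """Quadratic form d' [V_U - V_R]^{-1} d with d = beta_U - beta_R."""
    d = [u - r for u, r in zip(beta_u, beta_r)]
    v = [[a - b for a, b in zip(row_u, row_r)] for row_u, row_r in zip(v_u, v_r)]
    z = solve(v, d)
    return sum(d_i * z_i for d_i, z_i in zip(d, z))


# Hypothetical estimates with k = 2 parameters.
hm = hausman_mcfadden(
    beta_u=[0.5, -1.0],
    beta_r=[0.4, -0.8],
    v_u=[[0.05, 0.0], [0.0, 0.08]],
    v_r=[[0.04, 0.0], [0.0, 0.04]],
)
print(round(hm, 4))
```

The resulting value is compared with a critical value of the $\chi^2(k)$ distribution; in finite samples the covariance difference need not be positive definite, in which case the statistic can turn out negative and the test is inconclusive.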
In this thesis, two classifications for $Y_{it}$ will be adopted. The three state classification includes
contractual payment, prepayment and default
\[
Y_{it}^{3} =
\begin{cases}
1 & \text{if Contractual payment} \\
2 & \text{if Prepayment} \\
3 & \text{if Default}
\end{cases},
\]
in which the first category will be taken as reference state.
Information on intermediate states can be included via curtailments (partial prepayments)
and delinquency (delayed payments). This leads to a dependent variable with five states
\[
Y_{it}^{5} =
\begin{cases}
1 & \text{if Contractual payment} \\
2 & \text{if Curtailment} \\
3 & \text{if Prepayment} \\
4 & \text{if Delinquent} \\
5 & \text{if Default}
\end{cases}.
\]
The superscript will be omitted when the analysis is applicable to both models. The MNL
constitutes the most common approach to modelling mortgage termination. Due to the relative ease
with which such models can be estimated, they form an appropriate starting point for modelling
mortgage termination. DCM suffer from the drawback that they are unable to take into account
dependence in observations that arises from the fact that the same mortgage contract is observed
over multiple time periods. Mortgage termination cannot be assumed to be independent over time.
Survival models can alleviate this problem.
3.1.2
Survival analysis
Survival models specify the probability distribution for the duration of a mortgage contract. The
dependence of observations over time is taken into account by using the mortgage contract as unit
of measurement as opposed to the contract year.
Survival models focus on modelling the time until the occurrence of a certain event. Within the
survival analysis the link between the survival function and the covariates is usually expressed on
the basis of two models: accelerated life models (ALM) and proportional hazard models (PHM)
(Consalvi and di Freca, 2010). Cox (1972) was among the first to model time to failure as an
underlying random variable. After Green and Shoven (1983) applied survival models to the mortgage
literature, this approach became more popular in this field of research (Charlier and Bussel, 2001;
Jacobs et al., 2005; Deng et al., 2005). In these studies the duration of a mortgage is modelled
until it is terminated. The hazard rate is defined as the probability of mortgage termination for
reason j at time t given the non-occurrence of this event until time t
$$\lambda_j(t, x) = \lim_{\Delta t \to 0} \frac{P(t \le T_j < t + \Delta t \mid T_j \ge t)}{\Delta t} = \lambda_{0j}(t) e^{X_{it}' \beta_j}. \qquad (3.6)$$
As can be seen, to analyze the relationship between the survival function and the covariates, the model can be split into two parts. The first part of the model, $\lambda_{0j}(t)$, captures the distribution of the failure time when the explanatory variables are equal to zero, i.e. the base rate. It is a reflection of the natural prepayment rate and varies with the age of the mortgage. The seasoning effect explained in Section 2.3 is directly visible in this rate. The parametric specifications assume a specific functional form, such as Exponential, Weibull, Log Normal or Log logistic. These distributions are often selected due to their ability to capture the S-shaped relationship between prepayments and age of the loan. Alternatively, the distribution can be left unspecified or be estimated nonparametrically. The second part of the model in (3.6) is a proportionality factor that incorporates the effect of the explanatory variables on the hazard rate, covering both loan- and time-specific effects, $X_{ijt}$.
Following (Rodriguez, 2005), let T be a discrete random variable representing survival time.
Analogous to the classification of the dependent variable in the multinomial logit model, it is assumed that mortgage termination can occur due to prepayment or default. The reference category is
contractual payment. Since the data collection period is cut off at a certain date, some mortgage
duration data will be unobservable if these mortgages are not terminated before the cut-off date.
These mortgages are generally classified as belonging to the benchmark category (contractual payment). The survival function is defined as the probability of surviving from failure type j up to time
t and is given by
$$S_j(t, x) = e^{-\Lambda_j(t, x)}, \qquad (3.7)$$
where $\Lambda_j(t, x)$ is the cumulative hazard for cause $j$
$$\Lambda_j(t, x) = \sum \sum_{t} \lambda_j(t, x). \qquad (3.8)$$
Combining the above, the unconditional probability that a mortgage is terminated at time $t$ due to cause $j$ is given by the cause-specific density
$$f_j(t, x) = \lambda_j(t, x) S(t, x). \qquad (3.9)$$
The log likelihood of the competing risk model is given by
$$\ln L(\beta_j) = \sum \sum \sum_{t} d_{ijt} \ln f_j(t, x), \qquad (3.10)$$
where dijt is a dummy variable indicating the reason for contract termination.
This likelihood is very similar to the likelihood of the MNL model, see Appendix A. In a competing risk model the analysis can thus be broken down into two parts. The MNL model determines the
cause of death and the standard hazard model determines the overall risk (Rodriguez, 2005). When
the effect of the baseline hazard, $\lambda_{0j}$, is small relative to the effect of the covariates, the competing
risk model and the MNL model are comparable.
One drawback of competing risk models is the difficulty of including time varying covariates in
the analysis. This can be seen as follows. The dependent variable in survival studies constitutes the
survival time of mortgage i from cause j, Tij . This variable does not depend on time t over which
mortgages are observed. Using time varying covariate information in competing risk models poses
a problem since for each mortgage only one observation on each covariate can be used. Therefore,
the panel dataset has to be transformed into a cross-sectional dataset by combining information
per time-varying covariate per mortgage i. There are several methods for combining time varying
information into one observation per covariate. One example given in Section 3.5.2 is to average
time varying covariates per mortgage. Another option is to randomly select a time period t for each
mortgage i from which covariate information is included.
Since it has been demonstrated in Appendix A that the MNL model and the competing risk
model are very similar and given the fact that the MNL model is capable of including time varying
covariates, the MNL model is preferred over the competing risk model. Therefore, the coefficients of
the competing risk model are not estimated separately. The baseline hazard of the survival models
will be investigated to determine whether the outcome of the competing risk model and MNL model
is indeed similar.
A more appropriate model to account for dependence between observations is the Markov model.
Dependence is accounted for by conditioning on the current state in mortgage termination. In
three state models the preceding state is always contractual payment. In the five state models
conditioning on partial prepayment and delinquency status can contribute to capturing dependency
in prepayments and defaults.
3.1.3
Markov model
In general, a stochastic process $\{Y_t, t \ge 0\}$ with state space $S$ is a discrete time Markov chain if, for all states $i, j, s_0, ..., s_{t-1} \in S$,
$$P(Y_{t+1} = j \mid Y_t = i, Y_{t-1} = s_{t-1}, ..., Y_0 = s_0) = P(Y_{t+1} = j \mid Y_t = i). \qquad (3.11)$$
Hence, given the present $Y_t$ and the past $Y_0, ..., Y_{t-1}$ of the process, the future $Y_{t+1}$ only depends on the present and not on the past. The (one-step) transition probabilities from state $i$ to state $j$ are conditional probabilities and defined as
$$p_{ij} = P(Y_{t+1} = j \mid Y_t = i). \qquad (3.12)$$
In a time-homogeneous Markov chain, the one-step transition probabilities are time independent. For a Markov chain with $m$ states, the one-step transition matrix is a stochastic matrix given by
$$P = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \vdots & \ddots & \vdots \\ p_{m1} & \cdots & p_{mm} \end{pmatrix}. \qquad (3.13)$$
A Markov chain arises in mortgage terminations since in each consecutive period, the borrower
can decide to continue with the contractual payment scheme, prepay the mortgage or default on the
mortgage. Full prepayment and default constitute end states, whereas contractual payment can be
followed by any of the three states. Figure 2 visualizes this Markov chain consisting of $m = 3$
states in a transition diagram.
Figure 2: Transition diagram for contractual payment ($Y_t = 0$), full prepayment ($Y_t = 1$) and default ($Y_t = 2$).
The corresponding transition matrix is
$$P = \begin{pmatrix} p_{00} & p_{01} & p_{02} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (3.14)$$
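As an illustration of how the absorbing structure in (3.14) propagates over time, the sketch below raises a three state transition matrix to the power $n$. The monthly transition probabilities are hypothetical values chosen for the example and are not estimates from the data.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def n_step_matrix(p, n):
    """Return the n-step transition matrix P^n of a time-homogeneous chain."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# States: 0 = contractual payment, 1 = full prepayment, 2 = default.
# Hypothetical monthly probabilities; rows 1 and 2 are absorbing end states.
P = [
    [0.98, 0.015, 0.005],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

P12 = n_step_matrix(P, 12)
prob_still_paying = P12[0][0]      # no termination within 12 months
prob_prepaid_within_year = P12[0][1]
```

Because prepayment and default are end states, the entry `P12[0][1]` accumulates the probability of a prepayment in any of the first twelve months.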
Figure 3: Transition diagram for contractual payment ($Y_t = 0$), partial prepayment ($Y_t = 1$), full prepayment ($Y_t = 2$), delinquency ($Y_t = 3$) and default ($Y_t = 4$).
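The corresponding transition matrix is
$$P = \begin{pmatrix} p_{00} & p_{01} & p_{02} & p_{03} & p_{04} \\ p_{10} & p_{11} & p_{12} & p_{13} & p_{14} \\ 0 & 0 & 1 & 0 & 0 \\ p_{30} & p_{31} & p_{32} & p_{33} & p_{34} \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad (3.15)$$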
in which full prepayment and default again denote end states. Theoretically, all given transitions in
P are possible. From a practical perspective, the probability of a transition from partial prepayment
to delinquent, p13 , will generally be small.
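The transition probabilities can be estimated from the data by deriving the likelihood and equating the first order conditions to zero. A consistent maximum likelihood estimate for $p_{ij}$ is given by
$$\hat{p}_{ij} = \frac{N_{ij}}{\sum_{j=1}^{m} N_{ij}}, \qquad (3.16)$$
in which $N_{ij}$ denotes the number of observed transitions from state $i$ to state $j$.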
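The estimator in (3.16) amounts to counting transitions and normalising each row. A minimal sketch, assuming each mortgage is available as a list of observed states over consecutive periods (the three example paths are hypothetical):

```python
from collections import Counter


def estimate_transition_matrix(paths, n_states):
    """Row-normalised transition counts, i.e. N_ij divided by the row sum of N_ij."""
    counts = Counter()
    for path in paths:
        for current, following in zip(path, path[1:]):
            counts[(current, following)] += 1

    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # No transitions observed out of state i: treat it as an end state.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# States: 0 = contractual payment, 1 = full prepayment, 2 = default.
paths = [
    [0, 0, 0, 1],   # prepaid in the fourth period
    [0, 0, 2],      # defaulted in the third period
    [0, 0, 0, 0],   # still paying at the cut-off date
]
P_hat = estimate_transition_matrix(paths, n_states=3)
```

Treating states without observed outgoing transitions as absorbing is a convention of this sketch; it matches the end states of the three state model, where mortgages are no longer observed after prepayment or default.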
3.2
To extract the variables that are most important in determining prepayments, variable selection
procedures can be applied. One possibility is a general-to-specific variable selection procedure.
Sequentially, variables that are most insignificant are removed from the model after which the logistic
regression is run again. This procedure can be conducted such that all variables are significant
for one state, all states or for any state. Since the purpose of the present research is to develop
a prepayment model, a variable selection procedure is applied in which only variables that are
significant for prepayments remain in the model.
Individual significance is assessed by applying a standard t-test. The significance of the categorical variables is assessed jointly, by means of a Wald test. Under the null hypothesis the set of coefficients, $\beta$, is jointly equal to zero. The Wald test statistic is given by
$$\frac{\hat{\beta}^2}{\mathrm{var}(\hat{\beta})} \sim \chi^2(m), \qquad (3.17)$$
in which $m$ denotes the number of elements in $\hat{\beta}$ and the statistic is chi-squared distributed. The significance level for selecting variables is set to ten percent to avoid that too many variables are discarded from the model.
A disadvantage of the general-to-specific approach is that this approach can fail to identify useful
predictors in the general model if their significance is affected by irrelevant variables. To prevent
important predictors from not ending up in the final model, variables that are relevant based on
economic theory are added to the final (specific) model to see if the selection is justified.
3.3
Important diagnostic tests when using models that rely on time series data are tests for parameter stability. Aside from the purpose of assessing model adequacy, such tests are also relevant in
providing information on out-of-sample forecasting accuracy. If model parameters are time dependent or exhibit (multiple) structural breaks, forecasting can become challenging. Multiple tests for
parameter stability exist. The Chow breakpoint test can be used if there is a known breakpoint
in the data set. The disadvantage of this test is that it requires an a priori split of the data. If
the location of the break point is uncertain, this test on its own is not very informative. Common
tests that are used to determine the timing of the break point are the CUSUM tests and Quandt's
likelihood ratio (QLR) test. In this thesis the CUSUM test is used to test for parameter stability
since it requires less computation time than the QLR test. Given the sizeable data set this is an
important consideration.
3.3.1
The CUSUM and CUSUMSQ tests are often used to test for the presence of a structural break
(Tanizaki, 2007). The former is useful for detecting the timing of the structural break; however, its
null hypothesis of no structural breaks is too often accepted. Therefore, the CUSUMSQ test is often
used to determine the timing of the structural break.
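The recursive residual is defined as
$$w_t = \frac{Y_t - x_t \hat{\beta}_{t-1}}{\sqrt{1 + x_t (X_{t-1}' X_{t-1})^{-1} x_t'}}. \qquad (3.18)$$
The CUSUM test statistic is defined as
$$W_t = \sum_{i=k+1}^{t} w_i / \hat{\sigma}, \quad t = k+1, ..., T, \qquad (3.19)$$
where
$$\hat{\sigma}^2 = \frac{1}{T-k} \sum_{i=k+1}^{t} \cdots$$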
symmetric it holds that E(Wt ) = 0 and that the distribution of Wt is symmetric around zero. Since
the distribution of Wt cannot be obtained explicitly, the test is conducted differently from standard
statistical tests. The null hypothesis is accepted if $W_t$ lies within the upper and lower bounds that pass through the points $(k, \pm c_w \sqrt{T-k})$ and $(T, \pm 3 c_w \sqrt{T-k})$. The parameter $c_w$ depends on the significance level of the test, $\alpha$. For a significance level of $\alpha = 0.10$, $c_w = 0.850$ (Tanizaki, 2007).
The CUSUMSQ test statistic is defined as
$$S_t = \frac{\sum_{i=k+1}^{t} w_i^2}{\sum_{i=k+1}^{T} w_i^2} \qquad (3.20)$$
and the corresponding confidence interval is given by a pair of straight lines $\pm c_s + \frac{t-k}{T-k}$, where $c_s$ depends on both the sample size $T - k$ and the significance level $\alpha$.
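The CUSUMSQ statistic and its boundary check can be sketched as follows. The sketch assumes the recursive residuals $w_{k+1}, ..., w_T$ have already been computed; the residual values and the critical value `c_s` used in the example are hypothetical, since in practice $c_s$ is taken from tables for the relevant sample size and significance level.

```python
def cusumsq_path(recursive_residuals):
    """Cumulative share of squared recursive residuals, S_t for t = k+1, ..., T."""
    total = sum(w * w for w in recursive_residuals)
    running = 0.0
    path = []
    for w in recursive_residuals:
        running += w * w
        path.append(running / total)
    return path


def first_boundary_crossing(path, c_s):
    """Return the first position (t - k) where S_t leaves the band
    (t - k)/(T - k) +/- c_s, or None if it never does."""
    n = len(path)  # equals T - k
    for position, s_t in enumerate(path, start=1):
        if abs(s_t - position / n) > c_s:
            return position
    return None


# Hypothetical residuals whose variance increases halfway through the sample.
residuals = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
path = cusumsq_path(residuals)
break_position = first_boundary_crossing(path, c_s=0.35)
```

With a constant residual variance the path follows the line $(t-k)/(T-k)$ and no crossing is reported.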
If the CUSUMSQ test statistic crosses the boundaries this is an indication of parameter instability. The point where the exceedance takes place, say $t = \tau$, indicates the presence of the structural break. If this is the case, the model can be estimated with the inclusion of an indicator variable such that model parameters are allowed to differ before and after the structural break. Formally, the test is performed by estimating the logistic regression
$$\ln \frac{P_{itj}}{P_{itk}} = \beta_{itj}^{1\prime} X_{it} I[t \le \tau] + \beta_{itj}^{2\prime} X_{it} I[t > \tau] + \varepsilon_{itj}, \qquad (3.21)$$
in which $\beta_{itj}^{1}$ and $\beta_{itj}^{2}$ are coefficients obtained from
$$\ln \frac{P_{itj}}{P_{itk}} = \begin{cases} \beta_{itj}^{1\prime} X_{it} + \varepsilon_{itj}, & t = 1, ..., \tau, \\ \beta_{itj}^{2\prime} X_{it} + \varepsilon_{itj}, & t = \tau, ..., T. \end{cases} \qquad (3.22)$$
The structural break can be included in the model by estimating different coefficients before and after the break date.
3.4
For a borrower with a contractual mortgage rate of 6.00 percent, looking back from the end of
the sample period, it would have been optimal to exercise the option in November 2011 since the
difference between the contractual mortgage rate and the mortgage rate in the market (30Y
FRM) is the largest at this point.
The option to refinance can be seen as a European lookback put with a fixed strike price and a varying asset price. The discounted payoff at maturity is given by
$$LBP_{i,T} = P(0, T) E^Q[\max(mo_i - \min_{t \le \tau \le T}(ma_\tau), 0)], \qquad (3.23)$$
where the strike price is equal to $mo_i$ and is mortgage specific. The put option is written on $ma_t$.
The payoff of the put option thus depends on two stochastic processes, namely the term structure
of the risk free rate and the 30 year market mortgage rate.
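The payoff in (3.23) can be approximated by averaging over simulated paths of the market mortgage rate. The sketch below is a simplified illustration: the rate paths and the discount factor are hypothetical inputs, and the discount factor is treated as a known constant rather than being simulated jointly with the mortgage rate.

```python
def refinancing_payoff(contract_rate, market_rates):
    """Lookback put payoff: contractual rate minus the lowest market rate, floored at zero."""
    return max(contract_rate - min(market_rates), 0.0)


def lookback_put_value(contract_rate, simulated_paths, discount_factor):
    """Monte Carlo estimate: discounted average payoff over simulated rate paths."""
    payoffs = [refinancing_payoff(contract_rate, path) for path in simulated_paths]
    return discount_factor * sum(payoffs) / len(payoffs)


# Hypothetical market mortgage rate paths, in percent.
paths = [
    [6.5, 5.0],   # the market rate drops below the contractual rate: payoff 1.0
    [7.0, 6.5],   # the market rate stays above the contractual rate: payoff 0.0
]
value = lookback_put_value(contract_rate=6.0, simulated_paths=paths, discount_factor=0.9)
```

In a full implementation the paths would be generated under the risk neutral measure, as described in the remainder of this section.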
Equation (3.23) however fails to incorporate two important features of the optimal refinancing
incentive. For one, refinancing in an earlier phase of the mortgage contract is more attractive than
refinancing at the same rate close to maturity of the mortgage since more interest payments can
be saved. Secondly, mortgage rates differ depending on the remaining maturity of the mortgage
which is being refinanced. The contractual mortgage rate has to be compared to the mortgage rate
corresponding to the remainder of the maturity at the point of refinancing. The term structure of
mortgage rates therefore has to be derived to include the fact that mortgage rates are higher for
longer maturities. The outer maximum is the maximum of the put option. The inner maximum
refers to the point in time in which the difference between the contractual mortgage rate and the
market mortgage rate is the highest.
$$LBP_{i,T} = P(0, T) E^Q[\max(\max[(mo_i - R_{ma}(\tau, T)) (T/\tau)], 0)], \qquad (3.24)$$
in which $R_{ma}(\tau, T)$ denotes the point on the yield curve where $ma_t$ attains its minimum, that is $ma_t = \min_{t \le \tau \le T}(ma_\tau)$. The price of (3.24) can only be obtained numerically.
Both the risk free rate and the market mortgage rate have to be simulated under the risk neutral
measure. The two rates can be simulated by separate stochastic processes, assuming a certain
correlation between these processes. Given the fact that the evolution of the rates is similar, a
straightforward choice would be to simulate the short rate and the mortgage rate with a similar
stochastic process. The two rates show the same pattern since the mortgage securities sold by
Freddie Mac are implicitly backed by the US government. Therefore, mortgage securities receive an
AAA credit rating which in turn puts them in direct competition with risk free rates.
The following section provides a background on dynamic interest rate models.
3.4.1
The relation between interest rates and the bond price process $P(t, T)$ can be derived from the time $t$ value of $P(T, T) = 1$ as
$$P(t, T) = E^Q[e^{-\int_t^T r(s) ds} \mid F_s], \qquad (3.25)$$
in which $r(t)$ denotes the short rate and $F_s$ is the filtration up to $t = s$. Bond prices relate to the term structure via
$$f(t, T) = -\frac{\ln P(t, T)}{T - t}, \qquad (3.26)$$
in which $f(t, T)$ denotes the instantaneous forward rate at time $t$ with maturity $T$.
The dynamics of the short rate under the objective probability measure P are given by
$$dr(t) = \mu(t, r(t)) dt + \sigma(t, r(t)) d\bar{W}_r(t), \qquad (3.27)$$
in which $\mu(t, r(t))$ represents the drift term, $\sigma(t, r(t))$ denotes the diffusion term and $d\bar{W}_r(t)$ is a P Brownian motion with the usual properties (Bjork, 2009). To obtain risk-neutral prices for the derivatives, instead of specifying $\mu$ and $\sigma$ under the objective probability measure P, the dynamics of the short rate will be specified under the martingale measure Q. Using the change of measure proposed by Cameron-Martin-Girsanov (CMG), Equation (3.27) can be written as
$$dr(t) = \mu(t, r(t)) dt + \sigma(t, r(t)) dW_r(t), \qquad (3.28)$$
in which $dW_r(t)$ is a Q Brownian motion. Consequently, the family of bond price processes will be determined by the general term structure equation
$$\begin{cases} F_t + (\mu - \lambda\sigma) F_r + \frac{1}{2}\sigma^2 F_{rr} - rF = 0, \\ F(T, r) = 1, \end{cases} \qquad (3.29)$$
where $\lambda$ is defined as the market price of risk. If the term structure $P(t, T)$ has the form
$$P(t, T) = \exp(A(t, T) - B(t, T) r(t)), \qquad (3.30)$$
where $A(t, T)$ and $B(t, T)$ are deterministic functions, then the model is said to possess an affine term structure (ATSM).
Through different specifications of the short rate different term structures can be estimated,
which follow directly from Equation (3.29). In the literature, a wide variety of short rate specifications
can be found. One of the earliest models is the Vasicek model (1977), followed by the Dothan model
(1978), the Cox-Ingersoll-Ross model (1985) and the Ho-Lee model (1986). Both the Vasicek model
and the CIR model are mean-reverting. The former allows for negative interest rates, while the
latter does not. The Dothan model is not mean reverting and, since the short rate follows a geometric
Brownian motion, does not allow for negative interest rates. Finally, the Ho-Lee model has a time
dependent mean function and allows for negative interest rates.
Desirable properties for the present research are: (1) mean reversion in interest rates and (2)
possibility of negative interest rates. Previous research often considers this possibility as a drawback
but since in the contemporary economic environments interest rates have become negative, the short
rate model has to allow for this possibility. The Vasicek model satisfies these two properties and
will be discussed in more depth.
3.4.1.1
Vasicek model
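The Vasicek model is a mean reverting process under the risk neutral measure and specifies the dynamics of the short rate using constant coefficients
$$dr(t) = (b - ar) dt + \sigma dW_r(t). \qquad (3.31)$$
The Vasicek model possesses an affine term structure of the form (3.30), with
$$B(t, T) = \frac{1}{a}\left(1 - e^{-a(T-t)}\right), \quad A(t, T) = \frac{(B(t, T) - T + t)(ab - \frac{1}{2}\sigma^2)}{a^2} - \frac{\sigma^2 B^2(t, T)}{4a}. \qquad (3.32)$$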
Due to the fact that the Vasicek model assumes an endogenous term structure, it cannot take the
current term structure of interest rates as input (Baldvinsdottir and Palmborg, 2011). Consequently,
it is not possible to obtain an exact fit of this model to the market term structure. Since the aim
of the ESG is to provide a market consistent valuation, it is desirable that the interest rate model
is able to take the current term structure of interest rates as input.
3.4.1.2
The Hull-White extension of the Vasicek model, also called the Hull-White one-factor model (HW1f), allows for an exact fit to the current term structure. The dynamics of the short rate are specified by
$$dr(t) = (\theta(t) - a(t) r(t)) dt + \sigma(t) dW_r(t), \qquad (3.33)$$
in which the rate of mean reversion $a(t)$ and the diffusion term $\sigma(t)$ are time dependent. Through the term $\theta(t)$ this model can be fitted to the current term structure. The time variation in $\sigma(t)$ allows for an exact fit to the spot or forward volatilities (Plomp, 2013). However, Brigo and Mercurio (2010) note that if an exact fit to the current term structure is desired it can be dangerous to perfectly fit the volatility term structure as well. The main reason is that the volatility quotes of less liquid markets may be unreliable. To prevent overfitting, the preferred specification of the short rate is obtained by setting $a(t) = a$ and $\sigma(t) = \sigma$ and is given by
$$dr(t) = (\theta(t) - a r(t)) dt + \sigma dW_r(t). \qquad (3.34)$$
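To make the dynamics in (3.34) concrete, the following sketch simulates a short rate path with an Euler discretisation. The discretisation scheme, the constant choice for $\theta(t)$ and all parameter values are assumptions made for this illustration; a calibrated $\theta(t)$ would instead be derived from the current term structure.

```python
import math
import random


def simulate_short_rate(r0, a, sigma, theta, n_steps, dt, rng):
    """Euler scheme for dr = (theta(t) - a * r) dt + sigma dW."""
    rates = [r0]
    for step in range(n_steps):
        t = step * dt
        r = rates[-1]
        shock = rng.gauss(0.0, 1.0) * math.sqrt(dt)  # Brownian increment over dt
        rates.append(r + (theta(t) - a * r) * dt + sigma * shock)
    return rates


def constant_theta(t):
    """Hypothetical constant drift level; implies a long-run mean of theta / a."""
    return 0.003


rng = random.Random(42)
monthly_path = simulate_short_rate(
    r0=0.05, a=0.1, sigma=0.01, theta=constant_theta, n_steps=360, dt=1.0 / 12.0, rng=rng
)

# Without noise the path reverts deterministically towards theta / a = 0.03.
noiseless_path = simulate_short_rate(
    r0=0.05, a=0.1, sigma=0.0, theta=constant_theta, n_steps=2400, dt=1.0 / 12.0, rng=rng
)
```

Since the shocks are Gaussian and the drift is linear, simulated rates can become negative, which is the property required of the short rate model in this thesis.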
The SDE in (3.34) can be solved by applying Itô's lemma. After observing that the process $r(t)$, conditional on the filtration up to $t = s$, denoted by $F_s$, is normally distributed, the term structure implied by the HW1f model can be derived. The structure of the zero-coupon bond prices is given in (3.25). The term $B(t, T)$ is provided in (3.32) and the term $A(t, T)$ under HW1f is given by
$$A(t, T) = \frac{P(0, T)}{P(0, t)} \exp\left(B(t, T) f(0, t) - \frac{\sigma^2}{4a} B^2(t, T)\right). \qquad (3.35)$$
3.4.1.3
The standard Black-Scholes formula for option pricing assumes a constant short rate. In a similar manner, options can be priced using a stochastic short rate, such as the rate implied by the HW1f model. The purpose of this section is to price an option on a bond. The price of a put option at time $t$ with strike $K$ and maturity $T$ on a zero coupon bond with maturity $S$ is given by
$$ZBP(t, T, S) = K P(t, T) \Phi(-h + \sigma_P) - P(t, S) \Phi(-h), \qquad (3.36)$$
where
$$\sigma_P = \sigma \sqrt{\frac{1 - e^{-2a(T-t)}}{2a}} B(t, S), \quad h = \frac{1}{\sigma_P} \ln \frac{P(t, S)}{P(t, T) K} + \frac{\sigma_P}{2}. \qquad (3.37)$$
A European payer swaption with strike price K gives the holder the right at maturity T to enter
into a payer interest rate swap. A payer interest rate swap allows the owner to exchange a fixed
rate for a floating rate at a number of future dates, $T_1, T_2, ..., T_n$. The time between these dates, $\tau$,
is generally fixed. The maturity date of the swap usually coincides with the first reset date of the
$$\sum_{i=1}^{n} K ZBP(t, T, T_i, K_i), \qquad (3.38)$$
in which $K_i = A(t, T_i) e^{-B(t, T_i) r^*}$ where $r^*$ is the value of the spot rate at time $T$ for which
$$\sum_{i=1}^{n} K A(t, T_i) e^{-B(t, T_i) r^*} = 1. \qquad (3.39)$$
3.4.1.4
Model calibration
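In order to simulate interest rate paths, values for $a$ and $\sigma$ in (3.34) have to be determined. This can be done by optimizing the values $\hat{a}$ and $\hat{\sigma}$ such that the HW1f model is best fitted to market prices. The best fit is defined as the minimization of the sum of the squared relative deviation between market prices for swaptions and swaption prices implied by the HW1f model, given in (3.39). The objective function is given by the minimum of
$$O = \sum_{i} \sum_{j} [PS(t, T_j, T_i, N, K) - BPS(t, T_j, T_i, N)]^2, \qquad (3.40)$$
in which $BPS$ denotes the market price of the swaption, obtained from the Black formula
$$BPS(t, T_j, T_i, N) = N \tau [F(t) \Phi(d_1(t)) - K \Phi(d_2(t))] \sum_{i=1}^{n} P(t, T_i), \qquad (3.41)$$
where
$$d_1(t) = \frac{\ln(F(t)/K) + \frac{1}{2}\sigma(t)^2 (T_0 - t)}{\sigma(t) \sqrt{T_0 - t}}, \quad d_2(t) = d_1(t) - \sigma(t) \sqrt{T_0 - t}. \qquad (3.42)$$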
Here $\Phi(\cdot)$ denotes the standard normal distribution function, $F(t)$ represents the future swap rate, $\sigma(t)$ is the
quoted swaption volatility and $i = 1, ..., n$ denote the future reset dates of the swap. A derivation of
the Black formula can be found in Bjork (2009).
3.5
Since the aim of this thesis is to determine the model that is most appropriate for predicting prepayment rates, measures have to be identified that are capable of assessing model performance. After
the initial models are estimated, a variable selection procedure is conducted and diagnostic tests are
applied, model performance measures can be computed.
3.5.1
Model performance measures can be applied after estimating the logistic regression
$$\ln \frac{P_{itj}}{P_{itk}} = X_{it}' \beta_{itj} + \varepsilon_{itj} \qquad (3.43)$$
in which $P_{itj} = P(Y_{it} = j)$ is the probability of event $j$ occurring and $P_{itk} = P(Y_{it} = k)$ is the probability of the benchmark event $k$ occurring.
Measures of fit for logistic regressions generally fall into two categories, namely measures that look
at predictive power and goodness of fit tests (Allison, 2014). Examples of the former are measures
for R2 . However, no consensus is reached on the best manner that this can be calculated for a
logistic regression. The McFadden R2 or the pseudo R2 , is a likelihood ratio test that determines
the proportional reduction in error variance of an intercept only model versus a model that includes
explanatory variables. The Cox and Snell R2 is a more general version of the former, however it
has an upper bound less than one (Allison, 2014). Another example is the Tjur R2 which has the
advantage of being closely related to linear models and has an upper bound of one. However, it does
not depend on the likelihood. Therefore it could be the case its value decreases after the addition
of explanatory variables. Concluding, the McFadden R2 appears to be the most appropriate R2 for
a logistic regression
$$R^2 = 1 - \ln(L_M)/\ln(L_0), \qquad (3.44)$$
in which $L_M$ denotes the likelihood of the model including regressors and $L_0$ is the likelihood of the model with only an intercept. The likelihood is given by
$$L = \sum_{i=1}^{n} \sum_{t=1}^{T} \sum_{j=1}^{m} \cdots \qquad (3.45)$$
$\sum_{i=1}^{n} \sum_{t=1}^{T} Y_{itj}/N$, in which $N$
Another approach to determine the relative quality of different model specifications is by means of information criteria such as the Akaike Information Criterion (AIC) and the Schwarz Information Criterion (SIC). These criteria are well suited for finding a balance between model fit and model parsimony. They distinguish useful risk drivers from irrelevant ones by penalizing the estimation of additional parameters. The AIC is defined as
$$AIC = 2k - 2 \ln L, \qquad (3.47)$$
in which $k$ denotes the number of parameters in the model and $L$ is the likelihood. The SIC is given by
$$SIC(k) = T \ln \hat{\sigma}^2 + k \ln T \qquad (3.48)$$
and penalizes the inclusion of additional regressors more than the AIC.
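The measures in (3.44) and (3.47) are simple functions of the maximised log likelihood. The sketch below computes them for hypothetical log likelihood values. For the SIC it assumes the likelihood-based form $k \ln T - 2 \ln L$, which is the usual counterpart of (3.48) for models estimated by maximum likelihood; this substitution is an assumption of the example.

```python
import math


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R-squared: 1 - ln(L_M) / ln(L_0)."""
    return 1.0 - loglik_model / loglik_null


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * loglik


def sic(loglik, n_params, n_obs):
    """Likelihood-based Schwarz criterion: k ln T - 2 ln L (assumed form)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical values: intercept-only model versus a model with 10 parameters.
loglik_null = -1000.0
loglik_model = -800.0

r2 = mcfadden_r2(loglik_model, loglik_null)
aic_value = aic(loglik_model, n_params=10)
sic_value = sic(loglik_model, n_params=10, n_obs=5000)
```

For samples with more than $e^2 \approx 7.4$ observations the SIC penalty per parameter, $\ln T$, exceeds the AIC penalty of 2.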
To facilitate a comparison between the three- and five state models, the goodness of fit measures
are based on the contribution only of the three states (contractual payment, full prepayment and
delinquent) to the likelihood. To this end, the number of observations used in the computation of
the likelihood in the five state models has been adjusted.
3.5.2
Aside from looking at model performance in a panel data set, it can also be assessed over time
t or over mortgages i. Cross sectional model performance is assessed by transforming the panel
dataset into a cross-sectional dataset by averaging information per mortgage over time. The logistic
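regression in Equation (3.43) looks as follows
$$\sum_{t=1}^{T_i} \ln \frac{P_{itj}}{P_{itk}} / T_i = \sum_{t=1}^{T_i} X_{it}' \beta_{itj} / T_i + \sum_{t=1}^{T_i} \varepsilon_{itj} / T_i$$
$$\ln \frac{P_{ij}}{P_{ik}} = X_i' \beta_{ij} + \varepsilon_{ij} \qquad (3.49)$$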
in which Ti denotes the number of (time series) observations per mortgage i. This number is
mortgage specific since the panel is unbalanced.
Since in a cross sectional out-of-sample analysis only one estimate can be made per individual $i$, a contingency table is more appropriate to assess model performance. A contingency table compares realized versus predicted states. Predicted states are obtained by comparing the estimated probability for mortgage $i$ being in state $j$ at time $t$, $\hat{P}_{itj}$, to the in-sample probability for state $j$. The in-sample probability is given by
$$P_j = N(Y_j)/N \qquad (3.50)$$
in which $N(Y_j)$ denotes the number of observations classified as state $j$ and $N$ denotes the total number of observations. The predicted value of the dependent variable reads $\hat{Y}_{it} = j$ if the following holds
$$\hat{P}_{itj} > P_j \qquad (3.51)$$
for states $j = 1, ..., m$. It could be the case that in-sample thresholds for multiple states are breached. To ensure only one state assignment for an observation, the state with the highest relative breach severity can be assigned
$$\frac{\hat{P}_{itj} - P_j}{P_j}. \qquad (3.52)$$
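The assignment rule in (3.51) and (3.52) can be sketched as follows. The in-sample probabilities are taken from the observation counts of the random sample reported in Section 4.1, while the predicted probabilities are hypothetical values chosen for the example.

```python
def assign_state(predicted, in_sample):
    """Assign the state whose predicted probability exceeds its in-sample
    probability by the largest relative margin; None if no threshold is breached."""
    severity = {
        state: (predicted[state] - in_sample[state]) / in_sample[state]
        for state in predicted
    }
    breached = {state: s for state, s in severity.items() if s > 0.0}
    if not breached:
        return None
    return max(breached, key=breached.get)


# In-sample probabilities from the three state counts in the random sample.
n_total = 402154 + 6616 + 549
in_sample = {1: 402154 / n_total, 2: 6616 / n_total, 3: 549 / n_total}

# Hypothetical predictions: both end-state thresholds are breached in the first case.
state_a = assign_state({1: 0.950, 2: 0.045, 3: 0.005}, in_sample)
state_b = assign_state({1: 0.970, 2: 0.029, 3: 0.001}, in_sample)
```

In the first case the default threshold is breached by a larger relative margin than the prepayment threshold, so default is assigned even though the predicted prepayment probability is higher in absolute terms.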
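Table 2 shows a contingency table of observed states, $Y_j$, and forecasted states, $\hat{Y}_j$, for state $j = 1, ..., m$.

| $Y_{itj}$ \ $\hat{Y}_{itj}$ | $j = 1$ | $j = 2$ | ... | $j = m$ | |
| $j = 1$ | $n(Y_1 \hat{Y}_1)$ | $n(Y_1 \hat{Y}_2)$ | ... | $n(Y_1 \hat{Y}_m)$ | $N(Y_1)$ |
| $j = 2$ | $n(Y_2 \hat{Y}_1)$ | $n(Y_2 \hat{Y}_2)$ | ... | $n(Y_2 \hat{Y}_m)$ | $N(Y_2)$ |
| ... | ... | ... | ... | ... | ... |
| $j = m$ | $n(Y_m \hat{Y}_1)$ | $n(Y_m \hat{Y}_2)$ | ... | $n(Y_m \hat{Y}_m)$ | $N(Y_m)$ |
| | $N(\hat{Y}_1)$ | $N(\hat{Y}_2)$ | ... | $N(\hat{Y}_m)$ | $N$ |

$$\frac{n(Y_j \hat{Y}_j)}{N(Y_j)}, \qquad (3.53)$$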
the false alarm rate is defined as the number of times the model predicts state $j$ when the actual state is $k$
$$F_{jk} = \frac{n(\hat{Y}_j Y_{k \ne j})}{N(Y_{k \ne j})}. \qquad (3.54)$$
Finally, the missed rate is defined as the number of times the model predicts state $k$ when in fact the state is $j$
$$M_{jk} = \frac{n(\hat{Y}_{k \ne j} Y_j)}{N(Y_j)}. \qquad (3.55)$$
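The rates in (3.53) to (3.55) follow directly from the cells of the contingency table. In the sketch below rows hold observed states and columns hold predicted states; the counts are hypothetical, and the label "hit" for the share of correctly predicted observations in (3.53) is a naming assumption of this example.

```python
def classification_rates(table, j):
    """Hit, false alarm and missed rates for state j.

    table[a][b] is the number of observations with observed state a
    and predicted state b.
    """
    n_states = len(table)
    observed_j = sum(table[j])
    observed_other = sum(sum(table[k]) for k in range(n_states) if k != j)
    hits = table[j][j]
    false_alarms = sum(table[k][j] for k in range(n_states) if k != j)
    return {
        "hit": hits / observed_j,
        "false_alarm": false_alarms / observed_other,
        "missed": (observed_j - hits) / observed_j,
    }


# Hypothetical counts; index 0 = contractual payment, 1 = prepayment, 2 = default.
table = [
    [80, 15, 5],
    [4, 14, 2],
    [1, 1, 8],
]
prepayment_rates = classification_rates(table, j=1)
```

By construction the hit rate and the missed rate for a state sum to one, since every observed occurrence of state $j$ is either predicted correctly or assigned to another state.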
Cross sectional model performance can be assessed both in-sample and out-of-sample. A cross-sectional out-of-sample analysis can be conducted by using covariate information from all mortgages
except the $i$th mortgage and using the coefficient estimates to form a prediction on the state of
mortgage $i$.
3.5.3
For a time series analysis the panel data set can be transformed into a time series data set by averaging different mortgages over calendar months in Equation (3.43) as
$$\sum_{i=1}^{n_t} \ln \frac{P_{itj}}{P_{itk}} / n_t = \sum_{i=1}^{n_t} X_{it}' \beta_{itj} / n_t + \sum_{i=1}^{n_t} \varepsilon_{itj} / n_t$$
$$\ln \frac{P_{tj}}{P_{tk}} = X_t' \beta_{tj} + \varepsilon_{tj} \qquad (3.56)$$
in which nt denotes the number of mortgages at time t. This number varies per month since each
month new mortgages can enter via an origination or can leave if they are matured, prepaid or
defaulted on.
For the present research a contingency table is only used to assess cross sectional model performance and not to evaluate time series performance. Since there are no restrictions on the number of end states a model can predict (e.g. prepayments and defaults), the model can predict multiple end states for a certain mortgage, while in the dataset a mortgage is no longer observed once an end state occurs. Therefore, using contingency tables for assessing time series model performance provides an overly negative view of the actual model performance. The models often signal a prepayment a few months before the actual prepayment occurs. This feature cannot be captured in the contingency tables. Time series model performance can be assessed based on (i) unbiasedness, (ii) accuracy and (iii) efficiency. The unbiasedness of model forecasts is evaluated by means of a simple hypothesis test of a zero mean in the prediction errors. The prediction errors in a logistic regression are given by
$$\varepsilon_{tj} = P_{tj} - \hat{P}_{tj}, \qquad (3.57)$$
in which $P_{tj}$ is the in-sample probability for state $j$ and the residuals follow an Extreme Value distribution, $\varepsilon_{tj} \sim EV$. When the sample size is large, the Central Limit Theorem (CLT) can be used to obtain the asymptotic distribution of the residuals. If the model is well specified, the distribution of the residuals converges to a normal distribution for large sample sizes, i.e. $\varepsilon_{tj} \sim NID(0, \sigma_j)$.
To test whether this assumption is valid, the Kolmogorov-Smirnov (KS) test for normality can be used. The test statistic, $D_j$, is defined as
$$D_j = \sup_{\varepsilon_{tj}} |F_n(\varepsilon_{tj}) - F_0(\varepsilon_{tj})| \qquad (3.58)$$
in which $F_n$ is the empirical CDF of the residuals and $F_0$ is the CDF of a normal distribution
with the same mean and variance. If this test does not reject that the residuals are asymptotically
normally distributed, a simple t-test can be used to test whether the prediction error has a zero
mean.
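The KS statistic in (3.58) can be computed by comparing the empirical CDF with the fitted normal CDF at every jump of the empirical CDF. The following is a minimal sketch; the residuals in the example are hypothetical, and because the mean and variance are estimated from the same residuals, standard KS critical values are only approximate in this setting.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(residuals):
    """Largest distance between the empirical CDF and a normal CDF
    with the same mean and (sample) variance."""
    n = len(residuals)
    mean = sum(residuals) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in residuals) / (n - 1))
    distance = 0.0
    for rank, e in enumerate(sorted(residuals), start=1):
        f0 = normal_cdf(e, mean, std)
        # The empirical CDF jumps from (rank - 1)/n to rank/n at e.
        distance = max(distance, rank / n - f0, f0 - (rank - 1) / n)
    return distance


d_stat = ks_statistic([-1.0, 0.0, 1.0])
```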
Another desirable property of any time series model is that there is no time variation in the residuals. To test whether this is the case, residuals can be regressed on their lagged value(s) as
$$\varepsilon_{tj} = \rho_1 \varepsilon_{t-1,j} + \rho_2 \varepsilon_{t-2,j} + ... + \rho_l \varepsilon_{t-l,j} + u_{tj} \qquad (3.59)$$
in which $l$ denotes the lag in empirical autocorrelation. To determine the number of lags by which possible autocorrelation can be captured, the Partial Autocorrelation Function (PACF) of the residuals can be plotted. The PACF (not the ACF) can be used for this purpose since it corrects the time series for autocorrelation at lower lags. By means of the Ljung-Box (LB) test it can be formally tested whether the residuals exhibit autocorrelation. The LB test is specified as
$$LB = T(T + 2) \sum_{l=1}^{L} \frac{\hat{\rho}_l^2}{T - l} \sim \chi^2(L) \qquad (3.60)$$
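The LB statistic in (3.60) only requires the sample autocorrelations of the residuals. A minimal sketch, using a hypothetical alternating residual series with strong negative first order autocorrelation:

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numerator / denominator


def ljung_box(series, max_lag):
    """Ljung-Box statistic: T (T + 2) * sum over lags of rho_l^2 / (T - l)."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, lag) ** 2 / (n - lag) for lag in range(1, max_lag + 1)
    )


# Hypothetical residuals that alternate in sign.
residuals = [1.0, -1.0] * 10
rho_1 = autocorrelation(residuals, lag=1)
lb_stat = ljung_box(residuals, max_lag=1)
```

The resulting statistic is compared with the critical value of a $\chi^2(L)$ distribution; a large value, as in this example, points to autocorrelation in the residuals.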
Data description
The model is estimated using a panel data set of mortgage loans provided by Freddie Mac, the
Single Family Loan-Level data set. It contains loans that originated between January 1, 1999 and
September 31, 2013. The data set consists of monthly observations on fixed rate fully amortizing
mortgages with a maturity of thirty years. The sample is geographically dispersed in the United
States and covers loans that were originated in 51 different states. The fixed rate interest rate period
coincides with the full maturity of the mortgage contract. Unlike what is common in the Netherlands,
the data set does not contain intermediate periods in which the interest rate is reset. The data
set includes 50,000 loans that originated each year between 1999 and 2013. The total number of
mortgage loans in the data set amounts to 737,111. Each loan is tracked from origination date until
mortgage termination or maturity, with March 2014 as cutoff date. Followed over time this leads to
an unbalanced panel data set consisting of 31,018,317 observations.
Since computation time is too lengthy when all observations are included in the model, a random
sample is drawn. A set of 10,000 mortgages is sampled uniformly at random without replacement
from the 737,111 mortgages. The mortgages are tracked over time which leads to a total of 409,319
observations.
4.1
Dependent variables
The dependent variable used in the models is a categorical variable indicating the state of the
principal payment scheme. There are two classifications for the dependent variable, namely one for
the three state models and one for the five state models.
4.1.1
The three states denote contractual payment of the mortgage loan, full prepayment of the mortgage
and default. The first category is used as the benchmark category. The benchmark category, $Y_{it}^{3} = 1$,
denotes contractual payment of the remaining outstanding principal. The category $Y_{it}^{3} = 2$ denotes
a voluntary prepayment of the full remaining outstanding principal. This classification is derived
from the data set by means of the zero balance code, a variable indicating the reason why the loan balance was reduced to zero. The category
$Y_{it}^{3} = 3$ denotes default. Loans with a delinquency status of more than 90 days are classified as
default. Observations on mortgages after a delinquency status of more than 90 days are removed
from the data set. This leads to a total of 29,932,667 observations in the full data set. The default
category also comprises foreclosure alternatives, for example through a short sale, third
party sale, charge off or note sale. In this case the borrower is unable to make principal or interest
payments and the property can be seized. A repurchase prior to a property disposition and a Real-Estate-Owned (REO) disposition are also classified as default. An REO disposition occurs if the
lender becomes the owner of the property after an unsuccessful foreclosure auction.
Table 3 provides the number of observations for the dependent variable per category for both
the full data set and the random sample of 10,000 mortgages.
Y_it^3   Description                         Random sample
1        Contractual payment UPB             402,154
2        Voluntary prepayment of full UPB    6,616
3        Default                             549
The percentage of observations classified as prepayments in the sample amounts to 1.616 percent,
compared to 1.640 percent in the full data set. For default the figures are 0.134 percent and 0.138
percent, respectively. Since these numbers are comparable, the sample can be used as an approximation of the entire data set.
Table 3 can also be interpreted on loan level. Of the 10,000 mortgages, 6,616 are
prepaid at some point whereas 549 default at some point in the period January 1999 until
March 2014. The high prepayment rate underscores the importance of appropriately modelling the
propensity to prepay.
The Markov models are obtained by determining transition probabilities from state sequences
for each loan. Both full prepayment and default constitute end states, whereas the other categories
are transient states. If an end state of a loan is observed, then the state sequence for that loan is
observed entirely. However, if a loan is neither fully prepaid nor defaulted within the sample period, the
remaining states in the lifetime of the loan are not observed. Since the data set is right-censored,
the number of observations for each transition is less than the total number of observations. Table
4 shows the number of observations for each possible transition for the three-state Markov model.
                           Y_{i,t+1}^3
Y_it^3   1                  2                3
1        392,161 (0.9821)   6,609 (0.0166)   540 (0.0014)
2        0 (0)              0 (1)            0 (0)
3        0 (0)              0 (0)            0 (1)
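The way transition counts and probabilities follow from the state sequences can be sketched in Python as below. This is an illustration under stated assumptions rather than the estimation code of the thesis: the function names, the toy sequences and the convention that absorbing states receive probability one of remaining in place are all hypothetical choices.

```python
from collections import Counter


def transition_counts(sequences):
    """Count observed one-month transitions (state at t, state at t+1)."""
    counts = Counter()
    for seq in sequences:
        # A right-censored loan simply contributes fewer pairs.
        for current, nxt in zip(seq, seq[1:]):
            counts[(current, nxt)] += 1
    return counts


def transition_matrix(counts, n_states, absorbing=()):
    """Row-normalise the counts; absorbing states keep probability one."""
    states = range(1, n_states + 1)
    matrix = []
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        if i in absorbing or row_total == 0:
            row = [1.0 if j == i else 0.0 for j in states]
        else:
            row = [counts[(i, j)] / row_total for j in states]
        matrix.append(row)
    return matrix


# Hypothetical sequences: 1 = contractual, 2 = prepaid, 3 = default.
sequences = [[1, 1, 2], [1, 1, 1], [1, 3]]
counts = transition_counts(sequences)
matrix = transition_matrix(counts, n_states=3, absorbing=(2, 3))
```

The second sequence never reaches an end state, so it is right-censored and contributes only contractual-to-contractual transitions.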
The five state dependent variable consists of additional categories for partial prepayments and delinquency status. The delinquency state is derived from the number of days the borrower is delinquent,
based on the due date of the last payment installment (DDLPI) reported by servicers to Freddie
Mac. The delinquency state varies between zero and 119 days. An observation is classified as delinquent if the borrower is delinquent between 30 and 90 days on his mortgage payments. If a borrower
is delinquent from 1 to 29 days, this observation is classified as contractual payment.
Partial prepayments are defined as prepayments in excess of the payments expected under the
contractual payment scheme, often also referred to as curtailments. The actual cash flows can be
identified from the data set by means of the current actual Unpaid Principal Balance (UPB) which
is given for each observation. This reflects the mortgage ending balance as reported by the servicer
for the corresponding monthly reporting period. The UPB for the first six months after the loan
origination is censored. To obtain observations for this initial period, the UPB for these months
is interpolated from the original loan balance and the first given observation on the actual UPB.
Hence, it is assumed that in this initial period no curtailments take place.
The monthly contractual cash flows for fully amortizing loans are generated as follows
$$CF = \frac{P\,r}{1 - (1 + r)^{-n}}, \tag{4.1}$$
in which P denotes the original loan balance, r the monthly mortgage rate and n the original term of the loan in months. The
mortgage rate is constant over the lifetime of the mortgage. For each month, the difference between
the actual and contractual payment is calculated. Next, for each loan, the average excess payment
and the standard deviation thereof are computed over the lifetime of the loan. Consequently, an
observation is classified as a curtailment if the actual payment exceeds the contractual payment by
more than three times this standard deviation.
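The contractual payment of Equation (4.1) and the curtailment rule can be illustrated with a short Python sketch. The function names and the example loan are hypothetical, and the rule is implemented under one possible reading of the text, namely that a month is flagged when its excess payment is larger than three sample standard deviations of the excess payments of that loan.

```python
from statistics import stdev


def contractual_payment(principal, monthly_rate, n_months):
    """Level monthly payment of a fully amortizing loan, Equation (4.1)."""
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -n_months)


def flag_curtailments(actual_payments, contractual, n_sd=3.0):
    """Flag months whose excess payment exceeds n_sd standard deviations."""
    excess = [paid - contractual for paid in actual_payments]
    threshold = n_sd * stdev(excess)  # assumed reading of the 3-sigma rule
    return [e > threshold for e in excess]


# Hypothetical loan: 100,000 at 6 percent per year over 360 months.
payment = contractual_payment(100_000, 0.06 / 12, 360)
# 35 regular months followed by one month with 5,000 paid in excess.
payments = [payment] * 35 + [payment + 5_000]
flags = flag_curtailments(payments, payment)
```

In this example only the final month is flagged; the regular months have an excess payment of zero and stay below the threshold.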
Table 5 provides the number of observations for the dependent variable per category for both
the full data set and the random sample of 10,000 mortgages. The numbers of full prepayments and
defaults are identical to those reported for the three state models. The percentage of partial prepayments
amounts to 2.59 percent in the random sample and 2.57 percent in the full data set. For delinquent
payments the figures are 1.32 percent and 1.43 percent, respectively. Again, the percentages are
comparable.
Y_it^5   Description                             Data set      Random sample
1        Contractual payment UPB                 28,207,383    386,239
2        Unscheduled partial return of UPB       769,904       10,596
3        Voluntary prepayment of full UPB        484,119       6,514
4        Delayed payment of less than 90 days    429,845       5,421
5        Default                                 41,416        549
                                          Y_{i,t+1}^5
Y_it^5   1                  2                3                4                5
1        366,119 (0.9495)   9,592 (0.0249)   6,228 (0.0162)   3,129 (0.0081)   537 (0.0014)
2        7,390 (0.8779)     686 (0.0815)     239 (0.0284)     103 (0.0122)     0 (0.000)
3        0 (0)              0 (0)            0 (1)            0 (0)            0 (0)
4        2,737 (0.5168)     318 (0.0600)     40 (0.0076)      2,189 (0.4133)   12 (0.0023)
5        0 (0)              0 (0)            0 (0)            0 (0)            0 (1)
extremely small. Intuitively this makes sense: if a borrower is delinquent on the mortgage payments
in the current period, full repayment of the remaining outstanding principal one month later is indeed
unlikely. Furthermore, partial prepayments are mostly an indicator of more partial prepayments in
the next period and sometimes even of full prepayment.
4.1.3
Figure 5 plots the dependent variable over the period January 1999 until March 2014 (in months).
The dashed line (green) indicates the number of observations in each calendar month. This number
increases over time since more and more mortgages are purchased by Freddie Mac. At the end of
the sample period the number of observations decreases again because mortgages have
either expired (after 30 years), have been prepaid or have defaulted. The rates in the dependent
variable are constructed by dividing the number of observations in a certain category in a given
calendar month, $N_t(O_j)$, by the total number of observations in that month, $N_t$.
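The construction of the monthly rates, the number of observations in a category divided by the total number of observations in that month, can be sketched as follows. The input layout `(month, state)` and the function name `monthly_rates` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def monthly_rates(observations):
    """Return {month: {state: share of that month's observations}}."""
    totals = Counter()
    by_state = defaultdict(Counter)
    for month, state in observations:
        totals[month] += 1
        by_state[month][state] += 1
    return {
        month: {state: n / totals[month] for state, n in by_state[month].items()}
        for month in totals
    }


# Hypothetical observations: state 1 = contractual, state 2 = prepayment.
observations = [("1999-01", 1), ("1999-01", 1), ("1999-01", 2), ("1999-02", 1)]
rates = monthly_rates(observations)
```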
4.2
Independent variables
Table 7 provides a description of the explanatory variables for mortgage i and time t that are included in
the Freddie Mac data set. In addition, four macroeconomic risk drivers have been added. State
level information is used if available, otherwise region specific data are used.⁴ The unemployment
rate is available on a monthly basis and is added to the data set on state level. The mortgage rate
in the market is also provided on a monthly basis and is specified per US region. Divorce rates and
US housing prices are included on a monthly basis per state. The categorical variables Loan Age,
Property Type, Loan Purpose and Region are included as dummy variables.
⁴ States are classified into five regions according to the Federal Reserve's classification: North Central, North East,
South East, South West and West.
Variable            Description
loanage_it          The number of months since the note origination month of the mortgage.
mortrate_it         The current interest rate on the mortgage note, taking into account any loan modifications.
FICO_i              A credit score, between 301 and 850, that indicates the creditworthiness of the borrower and is indicative of the likelihood that the borrower will timely repay future obligations.
firsthome_i         Indicates whether the property is the first home bought by the borrower.
insurance_i         The percentage of loss coverage on the loan (between 0 and 55 percent).
DTI_i               Debt to income (DTI) ratio, defined as the sum of the borrower's monthly debt payments divided by the total monthly income used to underwrite the borrower at the origination date of the mortgage.
loansize_i          The original UPB scaled by the mean original UPB of the selected sample.
LTV_i               The original Loan-To-Value (LTV) ratio, defined as the original principal divided by the purchase price of the mortgaged property.
penalty_i           Dummy variable that indicates whether the borrower is obliged to pay a penalty for unscheduled return of the principal.
occupancyowner_i    Variable indicating whether the mortgaged property is owner occupied, an investment property or a second home.
propertytype_i      Denotes whether the property is a condominium (CO), planned unit development (PUD), cooperative share (CS), manufactured home (MH) or single family home (SFH).
purpose_i           Indicates whether the mortgage is a cash-out refinance mortgage, a no cash-out refinance mortgage or a purchase mortgage.
state_i             Indicates the state within which the property securing the mortgage is located.
unempl_t            Seasonally adjusted unemployment level per US state.
mortgagerate_t      Mortgage rate on 30-year fully amortizing fixed rate mortgages available in the market per US region.⁵
divorcerate_t       Divorce rates per US state.
housingprice_t      Seasonally adjusted indexed home price levels per US state.
Table 7: Description of explanatory variables.
Table 8 shows summary statistics for the risk drivers for the random sample of 10,000 mortgages.
Summary statistics for the full sample of 737,111 mortgages can be found in Appendix B.
Variable                                  Mean      St. Dev.   Min       Max
Loan Age                                  32.529    28.728     1         181
FICO score                                730.482   53.930     530       832
Firsthome                                 0.138     0.345      0         1
Mortgage insurance                        5.045     10.445     0         40
DTI                                       34.166    11.740     1         65
Loansize                                  0.915     0.512      0.080     4.187
LTV                                       72.877    15.931     6         100
Penalty                                   0.001     0.029      0         1
Unemployment                              6.143     2.089      2.100     14.900
Divorcerate                               4.051     0.877      1.700     9.900
Houseprice                                155.655   26.342     100.590   206.670
Refinancing                               -0.612    0.985      -6.020    2.870
Property Type Condo                       0.001     0.701      0         1
Property Type Planned Unit Development    0.277     0.447      0         1
Property Type Cooperative Share           0.302     0.459      0         1
Property Type Manufactured Housing        0.243     0.429      0         1
Property Type Single Family Home          0.180     0.384      0         1
Purpose Cash-Out                          0.649     0.477      0         1
Purpose No Cash-out Refinance             0.094     0.292      0         1
Purpose Purchase                          0.257     0.437      0         1
Region 1                                  0.070     0.255      0         1
Region 2                                  0.001     0.033      0         1
Region 3                                  0.142     0.349      0         1
Region 4                                  0.008     0.089      0         1
Region 5                                  0.779     0.415      0         1
Loan Age below 1Y                         0.293     0.455      0         1
Loan Age 2-3Y                             0.212     0.409      0         1
Loan Age 3-4Y                             0.150     0.357      0         1
Loan Age 4-6Y                             0.183     0.386      0         1
Loan Age 6-8Y                             0.094     0.291      0         1
Loan Age 8-10Y                            0.046     0.209      0         1
Loan Age 10-15Y                           0.022     0.148      0         1
Table 8: Summary statistics of explanatory variables for the sample of 10,000 mortgages (409,319 observations).
4.3
Bivariate analyses
In this section, bivariate analyses are conducted to determine (non)linear effects of the different
risk drivers on the dependent variable, most notably prepayment rates. In this way, it can be
assessed whether risk drivers can directly be included in the models or should be included with a
transformation. The figures for the continuous variables are constructed by combining observations
in bins. The bin size determines the degree of smoothing visible in the figures. Decreasing the
number of bins can aid in revealing a trend. The number of bins used, k, is provided below each
figure. The explanatory variables will be discussed in the same order as in Table 7.
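The binning behind these figures can be sketched in Python as follows. The sketch assumes equal-width bins and a 0/1 event indicator per observation; the function name `binned_rates` and the toy data are hypothetical, and the thesis does not state which binning scheme was used.

```python
def binned_rates(values, events, k):
    """Event rate per equal-width bin; None for bins without observations."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    counts = [0] * k
    hits = [0] * k
    for value, event in zip(values, events):
        # The maximum value is assigned to the last bin.
        index = min(int((value - lo) / width), k - 1)
        counts[index] += 1
        hits[index] += event
    return [hits[i] / counts[i] if counts[i] else None for i in range(k)]


# Hypothetical risk driver values and prepayment indicators.
dti = [0, 2, 4, 6, 8, 10]
prepaid = [1, 1, 0, 0, 0, 1]
rates = binned_rates(dti, prepaid, k=2)
```

A smaller k averages over more observations per bin, which is the smoothing effect described above.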
4.3.1
Loan Age
Loan age is tracked on a monthly basis and varies between 1 and 181 months. Figure 6(a) displays
the seasoning of the mortgages for the full sample for prepayments (blue) and defaults (red). Prepayments are at the highest level for loan ages of about six years after which they sharply decrease
to peak again for loan ages of twelve years. Default rates increase steadily during the first few years
after loan origination. After a small decrease at ages of 10 years default rates rise again.
Figure 6: Bivariate analyses of loan age (in years) on prepayment rates and default rates for the full
data set.
Since the effect of loan age on prepayment rates is not linear over the lifetime of the mortgage,
this variable is included as a categorical variable by buckets. The bucket size is determined by
ensuring the presence of a similar number of observations in each bucket. Buckets with fewer observations are
statistically less reliable. Since there are more observations for younger loans, visible by the dashed
yellow line in Figure 6(a), these buckets are narrower. Figure 6(b) shows prepayment and default
rates per bucket.
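Buckets with a similar number of observations can be obtained from the empirical quantiles of loan age. The sketch below is one simple way to do this and is not necessarily the procedure used in the thesis; the function names and the loan ages are hypothetical.

```python
from bisect import bisect_right


def equal_frequency_edges(values, n_buckets):
    """Interior bucket edges that split the sorted values into equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_buckets] for i in range(1, n_buckets)]


def assign_bucket(value, edges):
    """Index of the bucket a value falls into (0 is the youngest bucket)."""
    return bisect_right(edges, value)


# Hypothetical loan ages in months, skewed towards young loans.
ages = [1, 2, 3, 4, 5, 6, 8, 10, 14, 20, 40, 90]
edges = equal_frequency_edges(ages, 3)
buckets = [assign_bucket(age, edges) for age in ages]
```

With these ages the first bucket covers 1 to 4 months and the last one everything from 14 months onwards, so buckets are narrower where observations are dense.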
4.3.2
Figures 7(a) through 7(d) show the bivariate analyses for the variables FICO score, Mortgage Insurance, DTI and Loan Size on prepayment (blue) and default rates (red).
Figure 7: Bivariate analyses of loan specific risk drivers on prepayment rates (blue) and default rates
(red) for the full data set.
A higher FICO score is associated with a higher degree of creditworthiness. Consequently, it is
expected that this score is inversely related to the default rate. This is confirmed in Figure 7(a). The
relation between the FICO score and prepayment rates also seems to be negative for FICO scores
exceeding 550. Prepayment rates are relatively high for borrowers with a low creditworthiness.
Bearing in mind the limited number of observations for FICO scores below 550 (indicated by the
dashed line) the FICO score is included in the model with no transformations.
The theoretical relationship between mortgage insurance and default rates is negative. The higher
the percentage of the mortgage that is covered, the lower the probability that the mortgagee will
default. This is visible in Figure 7(b). Both defaults and prepayments peak at mortgage insurance
rates of about eight percent. Given the limited number of observations for this point, it could be
the case that only a few observations cause this spike. Hence, again this variable will be included in
the model without transformations.
A higher DTI ratio is associated with a higher risk involved in the ability to meet all scheduled
payments. Therefore, its relation with the default rate is expected to be positive. This is confirmed
in Figure 7(c). A higher DTI ratio also generally entails that people have fewer free resources to
make unscheduled excess payments. Hence, its relation with prepayment rates is expected to be
negative. In Figure 7(c) this is clearly visible by the decreasing trend in the blue line. Since this
effect is more or less linear for the majority of the observations, the DTI ratio is included with no
transformations.
Loan size is a scaled variable indicating the dollar size of the loan relative to the average loan
size in the sample. The effect of loan size on prepayment and default rates is less evident (Figure
7(d)), although it appears that prepayment rates are higher for larger loans. For mortgages with a
below average size prepayment rates are slightly higher, indicated by the decreasing line before loan
size is equal to one. This is as expected since the amount of funds required for a full prepayment of
the UPB is relatively lower.
Figures 8(a) and 8(b) show the bivariate analyses of the loan-to-value ratio (LTV) and the first
home indicator.
Figure 8: Bivariate analyses of loan specific risk drivers on prepayment rates (blue) and default rates
(red/yellow) for the full data set.
A higher LTV is indicative of a higher default risk since mortgagees have the possibility to walk
away from the loan if the value of the residential property is significantly lower than the outstanding
loan. This is visible in Figure 8(a) by the slightly increasing default rates for higher LTVs. The
relation between LTVs and prepayment rates is expected to be negative since a lower value of the
residential property is associated with lower wealth of the mortgagee, especially given the fact that a
residential property constitutes the largest fraction of wealth for an individual. This effect is visible
in the figure by the on average decreasing trend for prepayments.
It is expected that mortgagees for which the property is a first home have a higher default
rate due to its relation with job insecurity and age of the borrower. The effect of a first home on
prepayment rates is twofold. On the one hand it is expected that prepayment rates are lower since
young mortgagees often do not have a lot of spare funds to finance a prepayment. On the other
hand, young people often relocate more than older people which could lead to higher prepayment
rates.
Figures 9(a) through 9(d) show the bivariate analyses of prepayment rates (blue) and default rates (yellow) for the occupancy status of the residential
property, the purpose of the mortgage, the region in which the residential property is located and
whether a prepayment penalty applies to the mortgage contract.
Figure 9: Bivariate analyses of loan specific risk drivers on prepayment rates (blue) and default rates
(yellow) for the full data set.
In Figure 9(a) it is visible that prepayment rates are slightly higher when the property is a
condominium (CO) or a manufactured home (MH) whereas default rates are quite a bit higher when
the property is a planned unit development (PUD) or a single family home (SFH).
When the purpose of the loan is to purchase the property, prepayment rates are slightly higher
compared to when the loan purpose is to cash out (Figure 9(b)). Default rates are considerably
higher when the loan is a no cash-out refinance.
Figure 9(c) shows that prepayment and default rates differ per region. In region 4, the South
West, default rates are higher and prepayment rates are lower compared to the rest of the US.
Prepayment rates are the highest in the South East (region 3).
Figure 9(d) shows the difference in prepayment and default rates when a prepayment penalty
applies to the mortgage and when it does not. In the presence of a prepayment penalty it is expected
that prepayment rates are lower, which is confirmed in the figure. The presence of a prepayment
penalty does not influence the default rate.
4.3.3
Macroeconomic variables
Figures 10(a) through 10(d) show the bivariate analyses for the included macroeconomic variables
unemployment rate, divorce rate, house price index and market mortgage rates on prepayment (blue)
and default rates (red).
Figure 10: Bivariate analyses of macroeconomic variables on prepayment rates (red) and default
rates (blue) for the full data set.
The relation between unemployment rates and default rates is expected to be positive, whereas a
negative relation with prepayment rates is expected. In Figure 10(a) especially the negative relation
between unemployment levels and prepayments is visible.
The theoretical relation between the divorce rate and the prepayment rate is positive. A divorce
often leads to relocation which in turn can be a trigger to pay off the current mortgage before
$$\frac{PV_{it}^{ma} - PV_{it}^{mo}}{PV_{it}^{ma}}, \tag{4.2}$$
in which $PV_{it}^{mo}$ denotes the present value of the remaining principal payments plus interest
payments at the mortgage contract rate, given by
$$PV_{it}^{mo} = \sum_{t=1}^{T-k} \frac{P_{it}^{mo}}{(1 + r_t)^{T-t}}. \tag{4.3}$$
The current mortgage rate is assumed to be constant over the contractual life of the mortgage. The
term $PV_{it}^{ma}$ in Equation (4.2) denotes the present value of the remaining principal payments plus
interest payments at the mortgage market rate, given by
$$PV_{it}^{ma} = \sum_{t=1}^{T-k} \frac{P_{it}^{ma}}{(1 + r_t)^{T-t}}, \tag{4.4}$$
5
Results
5.1
Multinomial logit model
5.1.1
Three state MNL model
The MNL model is initially estimated using the entire set of explanatory variables, see Appendix C
Table 35. To extract the variables that are most important in determining prepayments, a general-to-specific variable selection procedure for prepayments is applied. Individual significance as well as
joint significance of the categorical variables is assessed at a significance level of 10 percent. Table
9 shows the coefficient estimates and p-values of the MNL3 model.
                      Prepayment            Default
                      Coef.      P-value    Coef.      P-value
C                     -14,457    0,000      -19,166    0,000
Firsthome             -0,444     0,005      -0,339     0,110
Mortgage insurance    -0,019     0,001      -0,013     0,064
Loansize              -0,165     0,062      -0,043     0,727
LTV                   0,015      0,000      0,052      0,000
Houseprice            0,022      0,000      0,024      0,000
Refinancing           -1,414     0,000      -1,633     0,000
Loan Age                         0,000                 0,000
Property Type                    0,000                 0,000
Loan Purpose                     0,000                 0,000
Region                           0,000                 0,000
The five state MNL model (MNL5) includes the transient states partial prepayment and delinquent
payment aside from the end states (full) prepayment and default. Contractual payment is again
chosen as the reference category. Table 10 shows the results of the MNL5 model after applying the
general-to-specific variable selection procedure. The general model is given in Appendix C Table
36. The variables FICO score, First Home indicator, DTI ratio, Loan Size and Prepayment
Penalty have been removed during this procedure. Appendix E Table 44 shows the individual
coefficient estimates and p-values of the categorical variables as well as of the included missing value
dummies.
                      Part.Prep            Prep.                Delinq.              Default
                      Coef.     P-value    Coef.     P-value    Coef.     P-value    Coef.     P-value
C                     -5,520    0,000      -17,022   0,000      -7,232    0,000      -22,215   0,000
Mortgage insurance    0,004     0,004      -0,015    0,005      0,004     0,015      -0,014    0,055
LTV                   0,001     0,512      0,010     0,007      0,016     0,000      0,039     0,000
Unemployment          0,019     0,001      0,174     0,000      -0,010    0,191      0,347     0,000
Divorcerate           0,066     0,000      -0,238    0,021      -0,033    0,040      -0,329    0,004
Houseprice            -0,001    0,000      0,020     0,000      0,004     0,000      0,027     0,000
Refinancing           -0,107    0,000      -1,170    0,000      -0,558    0,000      -1,382    0,000
Property Type                   0,000                0,000                0,000                0,081
Loan Purpose                    0,000                0,000                0,000                0,000
Region                          0,243                0,000                0,000                0,000
Loan Age                        0,000                0,000                0,000                0,000
To validate the IIA property, the Hausman-McFadden (HM) test explained in Section 3.1.1 is performed.
This is done by removing default as a category from the dependent variable and deleting the observations
classified in this state from the data set. Subsequently, a binary logistic regression is performed
in the case of the three state MNL model and a multinomial logistic regression in
the case of the five state MNL model. The coefficient estimates are provided in Appendix D Tables 41
and 42. The HM test probabilities for both MNL models for all states under a χ²(36) distribution are almost
equal to zero. Therefore, it is concluded that the IIA property is validated in both the MNL3 model
and the MNL5 model.
5.1.4
To investigate the performance of the model over different mortgages, a cross sectional analysis is
conducted. To this end, covariate information of mortgages is averaged over time periods, as explained
in Equation (3.49). The predicted states are defined by comparing the predicted probability for
mortgage i for state j, $P_{ij} = \sum_{t=1}^{T_i} P_{itj} / T_i$, to the in-sample probability for state j, $P_j$. See Equations
(3.50)-(3.52).
Tables 11 and 12 show the contingency tables for the MNL3 and MNL5 models based on a
cross sectional regression of 10,000 mortgages. Rows indicate predicted states and columns indicate
observed states. The diagonal represents the number of correctly predicted states.
                    Observed Y_it^3
Predicted Y_it^3    1       2       3      Total
1                   2750    420     36     3206
2                   80      5662    347    6089
3                   5       534     166    705
Total               2835    6616    549    10000
Table 11: Contingency table of predicted and realized state classifications for MNL3 model.
                    Observed Y_it^5
Predicted Y_it^5    1      2       3       4      5      Total
1                   216    65      97      2      4      384
2                   381    2010    541     25     49     3006
3                   28     84      5175    76     320    5683
4                   3      5       14      2      0      24
5                   6      14      687     20     176    903
Total               634    2178    6514    125    549    10000
Table 12: Contingency table of predicted and realized state classifications for MNL5 model.
The tables show the cross sectional performance of the MNL models. The section on model
performance, Section 6.1, will investigate the performance in more detail by computing the success
rate, false alarm rate and missed prepayments rate based on the contingency tables.
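How such a contingency table is built from predicted and observed states can be sketched in Python as below, with rows for predicted and columns for observed states as in Tables 11 and 12. The function names are hypothetical, and the share of mortgages on the diagonal is shown only as a simple summary; it is not claimed to be the success rate as defined in Section 6.1.

```python
def contingency_table(predicted, observed, n_states):
    """Rows: predicted state, columns: observed state (states are 1-based)."""
    table = [[0] * n_states for _ in range(n_states)]
    for p, o in zip(predicted, observed):
        table[p - 1][o - 1] += 1
    return table


def diagonal_share(table):
    """Fraction of all cases that lie on the diagonal of the table."""
    total = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / total


# Small hypothetical example.
example = contingency_table([1, 2, 2, 3], [1, 2, 3, 3], n_states=3)

# Cell counts of the MNL3 model as reported in Table 11.
table_11 = [[2750, 420, 36], [80, 5662, 347], [5, 534, 166]]
share_11 = diagonal_share(table_11)
```

For the counts in Table 11 the diagonal holds 8,578 of the 10,000 mortgages.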
5.1.5
In Section 4.1.3 it was observed that prepayment rates are time varying. More specifically, periods of
high prepayments and low prepayments were visible in Figure 5. This section investigates to what
extent the MNL models are capable of capturing this time varying effect. To this end, the predictions of the logistic
regression on the panel data set are transformed into a time series according to Equation
(3.56). Figures 11(a) and 11(b) show the predicted prepayment probabilities, $P_{tj} = \sum_{i=1}^{n_t} P_{itj} / n_t$, of
the MNL models versus the realized prepayments on a monthly basis for t = 1999M01, ..., 2014M03.
Residual diagnostics
The residuals for the MNL models are derived according to Equation (3.57). Given the large number
of observations in the present data set, the central limit theorem (CLT) is used to infer whether the distribution of the
prepayment residuals converges to a normal distribution. The number of mortgage observations per
calendar month varies between 2,246 in 1999M02 and 260,449 in 2011M10 and can be found in
Figure 5.
Figures 12(a) and 12(b) plot the empirical CDF of the prepayment residuals of the MNL models
together with a normal CDF with the same mean and variance (given in Table 13).
Figure 12: Empirical and normal CDF of prepayment residuals in MNL models.
The figures show that the ECDF resembles a normal CDF but is less smooth. Table 13 provides
the p-values of the Kolmogorov-Smirnov (KS) test for normality, which rejects the hypothesis that the time series
residuals are normally distributed. However, the time series residuals are based on fewer observations
(182 observations) compared to the panel data residuals (409,319 observations). The panel data
residuals are normally distributed with a mean equal to zero as indicated by the p-value for the
t-test in Table 13.
        Mean (x 10^3)   Variance (x 10^3)   p_KS     p_t-test
MNL3    0,566           12,070              0,006    0,529
MNL5    0,658           11,933              0,008    0,459
Table 13: Mean, variance, KS statistic and p-value zero mean hypothesis for prepayment residuals
MNL models.
Figures 13(a) and 13(b) plot the prepayment residuals, $\varepsilon_{tj}$, of the MNL models over the sample
period.
Figure 13: Time series analysis of MNL3 and MNL5 prepayment estimates.
The residual plots show the underestimation of the prepayment rates in the period November 2000
until October 2003, the period in which prepayment rates are at the highest level. Furthermore,
in Figures 13(a) and 13(b) it can be seen that there is some time variation in the residuals. To
determine the order of residual autocorrelation, the Partial Autocorrelation Functions (PACF) are
plotted in Figures 14(a) and 14(b). To limit computation time forecasts are made for a sample of
200 mortgages.
        Autocorrelation (lag 1)   LB(1)
MNL3    0.884                     0.000
MNL5    0.888                     0.000
Table 14: Residual autocorrelation and Ljung-Box test statistic for MNL3 and MNL5 prepayment
residuals.
In an attempt to control for autocorrelation in the MNL models, dummy variables for month
and/or year have been added. This slightly decreased the AR(1) coefficient, but comes at the
expense of increased computation time due to the addition of a large number of variables. Another
possible remedy that has been examined is the inclusion of the macro-economic variables in first
differences. However, this did not lower the autocorrelation and diminished the accuracy of the
predictions.
In OLS regressions, residual autocorrelation is often remedied by including a lagged dependent
variable as explanatory variable in the model. In the three state MNL model this is not possible
directly since a mortgage drops out of the data set after it has been prepaid or defaulted on.
A possible solution is to include the percentage of mortgages that is in state j in the preceding
month(s) as an endogenous regressor. This has been done for both the MNL3 model and the MNL5
model. The residual autocorrelation decreased slightly (from 0.884 to 0.817 in the MNL3 model)
and the model estimates remained relatively similar. One drawback of including an endogenous
regressor in the model is that this comes at the expense of the significance of other variables in the
model. This was observed by the fact that in the variable selection procedure only a few variables
turned out to be significant for prepayments whereas the endogenous regressor was highly significant.
Moreover, including an endogenous regressor leads to inconveniences when making multiple period
ahead predictions. Since the aim of this thesis is to develop a prepayment model that is capable
of predicting prepayments not only in-sample but also out-of-sample (this is of more interest for
financial institutions), the endogenous regressors are not included in the final models.
5.1.7
Forecasting
The cross sectional out-of-sample model performance is assessed by transforming the panel dataset
into a cross-sectional dataset according to Equation (3.49). The out-of-sample analysis is conducted
by using covariate information from all mortgages except the ith mortgage and using the coefficient
estimates to form a prediction on the state of mortgage i. The contingency tables for the MNL3
forecasts and the MNL5 forecasts are provided in Tables 15 and 16 respectively. The results of the
cross sectional forecasts will be discussed in more depth in Section 6.2.1.
                    Observed Y_it^3
Predicted Y_it^3    1     2      3     Total
1                   52    14     1     66
2                   3     108    6     117
3                   0     10     7     17
Total               54    132    14    200

                    Observed Y_it^5
Predicted Y_it^5    1     2     3      4    5     Total
1                   8     0     3      0    0     11
2                   10    31    16     0    1     58
3                   0     4     102    3    9     118
4                   0     0     0      0    0     0
5                   0     0     9      0    4     13
Total               18    35    130    3    14    200
Figure 15: One- and two-period-ahead prepayment predictions of the MNL models, January 2009 to March
2014.
In Figure 15(a) it can be seen that the one-period-ahead prediction of the MNL3 model is
relatively accurate and follows the prepayment pattern over the forecast period quite well. One
exception is March 2013, in which the forecast shows a sharp drop. The one-step-ahead forecast
of the MNL5 model in Figure 15(b) also appears to be accurate. The forecasts of both models seem
to overestimate the prepayment rate in the beginning of the forecast period (January 2009) whereas
they underestimate prepayments towards the end of the sample.
5.2
In case there are no covariates, the survivor function in the competing risk model can be estimated
with the Kaplan-Meier (K-M) estimator (Rodriguez, 2005). Let $t_{j1} < t_{j2} < \dots < t_{jk_j}$ denote the $k_j$
distinct failure times of type j = 1, 2. Let $n_{ji}$ denote the number of loans at risk of being terminated
for reason j just before $t_{ji}$ and let $d_{ji}$ denote the number of loan terminations due to cause j at time
$t_{ji}$. Censored data are not included in this analysis. The K-M estimator of the survivor function is
given by
$$S_j(t) = \prod_{i:\, t_{ji} < t} \left(1 - \frac{d_{ji}}{n_{ji}}\right) \tag{5.1}$$
and can be interpreted as the probability that a mortgage is terminated at time $t_{ji}$ given that it
was not terminated before this time. The conditional probability of mortgage termination at time t
Ignoring censored cases, the K-M estimate coincides with the empirical survival function. The
empirical survival functions for mortgage termination due to prepayments and defaults are given in
Figure 16(a). The figure shows the survival probability of loans that prepay at some point (blue) or
default at some point (red). There are discontinuities at observed times of mortgage termination.
Since loans are only tracked in the period 1999-2014, the data are censored, and for the majority
of the loans that have not been prepaid or defaulted in this period it is not possible to determine
their actual maturity.
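The product-limit calculation behind the K-M estimator can be sketched in Python as follows. The function name, the input layout and the durations are hypothetical. Unlike the analysis described above, the sketch also accepts censored loans through an event indicator; when every indicator equals one it reduces to the empirical survival function. The value returned for each failure time is the survival probability just after that time.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: time until termination or censoring, per loan
    events:    1 if the termination was observed, 0 if censored
    """
    failure_times = sorted({t for t, e in zip(durations, events) if e})
    survival = []
    s = 1.0
    for t in failure_times:
        at_risk = sum(1 for d in durations if d >= t)
        failures = sum(1 for d, e in zip(durations, events) if d == t and e)
        s *= 1 - failures / at_risk
        survival.append((t, s))
    return survival


# Hypothetical loan lifetimes in months, all terminations observed.
curve = kaplan_meier([2, 3, 3, 5], [1, 1, 1, 1])
# Same idea with one censored loan (second entry).
censored_curve = kaplan_meier([2, 3, 4], [1, 0, 1])
```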
Figure 16: Empirical and theoretical survivor functions for prepayments and defaults.
Figure 16(a) demonstrates that the survival rate for defaults is higher for younger loans compared
to the survival rate for prepayments. Figure 16(a) also illustrates that the relation between loan age
and the survival probability of the mortgage is non-linear. This effect can be captured by including
dummy variables for loan age.
As explained in Section 3.1.2, the baseline hazard rate j0 can be fitted to various theoretical
distributions. Figure 16(b) shows the ECDF for prepayments together with the CDF of the Weibull,
Log Logistic, Exponential and Log Normal distribution. The fitted parameters are given in Table
17.
Parameters      p1        p2
Weibull         41,641    1,382
Log Logistic    3,362     0,478
Exponential     37,866
Log Normal      3,330     0,834
The prepayment survival rate can best be modelled by a Weibull distribution with scale
parameter p1 = 41.641 and shape parameter p2 = 1.382. To incorporate non-linearities in the relation
between loan age and prepayment rates, this variable will be included as a categorical variable.
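To make the fitted Weibull curve concrete, the sketch below evaluates its survival and hazard functions. It reads p1 as the scale and p2 as the shape of a two-parameter Weibull distribution and treats loan age as measured in months; both are assumptions about the parameterisation, and the function names are illustrative.

```python
import math

SCALE = 41.641  # p1 of the Weibull fit in Table 17 (assumed to be the scale)
SHAPE = 1.382   # p2 of the Weibull fit in Table 17 (assumed to be the shape)

def weibull_survival(t, scale=SCALE, shape=SHAPE):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, scale=SCALE, shape=SHAPE):
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Mean time to prepayment implied by the fit.
mean_life = SCALE * math.gamma(1.0 + 1.0 / SHAPE)

for age in (12, 36, 60):
    print(age, round(weibull_survival(age), 3), round(weibull_hazard(age), 4))
```

Because the shape parameter exceeds one, the implied hazard rises with loan age, which is one way to see why a linear loan age term would be too restrictive.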
5.3 Markov models
5.3.1 Three state Markov model
The Markov models are obtained by determining transition probabilities from state sequences for
each loan. Both full prepayment and default constitute end states, whereas the other categories are
transient states. Including covariate information to estimate the transition probabilities can be done
by estimating the MNL conditional on a certain state. Table 18 reports the results after variable
selection. Table 37 in Appendix C shows the result of the general model and Appendix E Table 45
provides individual estimates of the categorical variables.
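Without covariates, the transition probabilities can be obtained by counting transitions in the state sequences. The sketch below shows that counting step only; the thesis additionally conditions on covariates through an MNL per state, which is not reproduced here. The state coding (0 = contractual payment, 1 = full prepayment, 2 = default) and the three sequences are hypothetical.

```python
import numpy as np

def transition_matrix(sequences, n_states, absorbing=()):
    """Estimate P[a, b] as the share of observed transitions from state a to state b."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    for s in absorbing:
        # End states keep all probability mass on themselves.
        counts[s] = 0.0
        counts[s, s] = 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical monthly state sequences for three loans.
sequences = [[0, 0, 0, 1], [0, 0, 2], [0, 0, 0, 0]]
P = transition_matrix(sequences, n_states=3, absorbing=(1, 2))
print(P)
```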
                 Prepayment            Default
                 Coef.     P-value     Coef.     P-value
C                -5,485    0,000       -4,243    0,000
FICO score       0,001     0,000       -0,011    0,000
Firsthome        -0,106    0,015       0,004     0,978
DTI              -0,006    0,000       0,030     0,000
Loansize         0,465     0,000       0,388     0,000
LTV              -0,003    0,000       0,028     0,000
Unemployment     -0,096    0,000       0,080     0,000
Divorcerate      0,099     0,000       -0,020    0,699
Refinancing      -0,689    0,000       -0,487    0,000
Property Type              0,000                 0,002
Loan Purpose               0,018                 0,002
Region                     0,000                 0,000
Loan Age                   0,000                 0,000
The five state Markov model incorporates information on intermediate states. Coefficients are estimated by conditioning on the transient states and conducting a logistic regression. This is done for
contractual payment, partial prepayment and delinquent payment.
Conditional on partial prepayment, the results of the Markov model are given in Table 19. For
the Markov models a general-to-specific variable selection procedure is applied by which the specific
model only contains variables that are significant for predicting prepayments at the 10 percent level.
Table 38 in Appendix C shows the estimation output for the Markov5(1) model before variable
selection.
Part.Prep        Part.Prep             Prep.                 Delinq.
                 Coef.     P-value     Coef.     P-value     Coef.     P-value
C                -2,158    0,000       -16,015   0,818       -4,606    0,000
Loansize         -0,444    0,000       0,323     0,025       -0,468    0,060
Penalty          -13,485   0,985       2,095     0,064       -11,414   0,988
Unemployment     0,013     0,500       -0,099    0,001       0,054     0,228
Refinancing      0,117     0,013       -0,339    0,000       -0,168    0,138
Property Type              0,000                 0,000                 0,000
Loan Purpose               0,014                 0,435                 0,430
Region                     0,325                 0,000                 0,000
Loan Age                   0,000                 0,000                 0,000
Part.Prep
Coef.
-2,439
0,000
-0,006
0,032
-0,030
P-value
0,007
0,997
0,198
0,321
0,660
0,210
0,807
0,007
0,000
Prep.
Coef.
-3,229
0,006
-0,024
-0,424
-0,298
P-value
0,169
0,046
0,015
0,000
0,091
0,865
0,108
0,056
0,000
Delinq.
Coef.
-0,837
-0,002
0,006
0,038
-0,159
P-value
0,059
0,001
0,013
0,018
0,000
0,000
0,000
0,000
0,000
Default
Coef.
-5,596
-0,001
0,021
0,250
0,225
P-value
0,167
0,872
0,480
0,088
0,528
0,000
0,542
0,120
0,000
Contr.Pay        Part.Prep             Prep.                 Delinq.               Default
                 Coef.     P-value     Coef.     P-value     Coef.     P-value     Coef.     P-value
C                -5,396    0,000       -5,412    0,000       3,512     0,000       -3,956    0,000
FICO score       -0,001    0,000       0,001     0,000       -0,015    0,000       -0,012    0,000
Firsthome        0,057     0,120       -0,087    0,052       -0,099    0,136       0,020     0,902
DTI              0,000     0,671       -0,007    0,000       0,014     0,000       0,030     0,000
Loansize         0,096     0,000       0,476     0,000       -0,017    0,690       0,400     0,000
LTV              0,001     0,099       -0,003    0,000       0,011     0,000       0,029     0,000
Unemployment     0,092     0,000       -0,081    0,000       0,028     0,002       0,068     0,001
Divorcerate      0,047     0,001       0,097     0,000       0,013     0,534       -0,019    0,719
Refinancing      0,030     0,020       -0,706    0,000       -0,280    0,000       -0,533    0,000
Property Type              0,000                 0,000                 0,000                 0,751
Loan Purpose               0,000                 0,009                 0,101                 0,557
Region                     0,000                 0,000                 0,000                 0,000
Loan Age                   0,000                 0,000                 0,000                 0,000
The contingency tables for the cross sectional performance analysis of the three and five state
Markov models are given in Tables 22-23. The cross sectional analysis of the Markov models is
slightly different from that of the MNL models due to the way in which the Markov models, or
conditional MNL models, are estimated. For example, the Markov model conditional on partial
prepayment is estimated by maintaining in the data set only those observations that are classified
as partially prepaid. Estimating the cross sectional model performance is done by averaging the
covariate information over the mortgages and predicting the state of a mortgage at the final date on
which the mortgage is observed. Since all observations classified as partial prepayment are contained
in the data set, no observations on partial prepayment remain to be predicted. This is accounted for
in the tables by eliminating the states partial prepayment and delinquent for the Markov5(1) and
Markov5(3) model, respectively.
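M3 (rows: observed state Yit3, columns: predicted state)
           1       2       3       Total
1          2731    84      20      2835
2          443     5555    611     6609
3          26      312     211     549
Total      3200    5951    842     9993

M5(1) (rows: observed state Yit5, columns: predicted state)
           1       3       4       Total
1          3040    1914    0       4954
3          99      140     0       239
4          31      36      0       67
Total      3170    2090    0       5260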
Table 22: In-sample contingency table for Markov3 and Markov5(1) model.
The Markov3 model seems to perform reasonably well cross sectionally. The Markov5 model
conditioned on partial prepayments slightly overestimates the number of full prepayments. This
model fails to predict delinquent mortgages. Given the limited number of transitions from partial
prepayment to delinquent, 102 (see Table 6), it is not surprising that this state is not accurately
predicted by the model. Since the number of transitions from partial prepayment to default is zero
this state is omitted from the table.
The contingency tables for the Markov5 models conditioned on delinquency status and contractual payment are provided in Table 23.
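M5(3) (rows: observed state Yit5, columns: predicted state)
           1       2       3       5       Total
1          870     231     97      59      1257
2          56      48      4       2       110
3          26      5       8       1       40
5          5       0       1       6       12
Total      957     284     110     68      1419

M5(5) (rows: observed state Yit5, columns: predicted state)
           1       2       3       4       5       Total
1          237     369     19      2       7       634
2          222     1864    209     13      92      2400
3          135     897     4415    43      738     6228
4          3       36      94      5       56      194
5          8       50      219     7       253     537
Total      605     3216    4956    70      1146    9993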
Table 23: In-sample contingency table for Markov5(3) and Markov5(5) model.
The Markov5 model conditioned on delinquent payments slightly overestimates the default rate.
The Markov5 model conditioned on contractual payments performs reasonably well and is comparable to the contingency table of the MNL5 model in Table 12.
5.3.4
Figures 17(a)-17(d) show the time series performance of the Markov models over the sample period
January 1999 until March 2014.
5.3.5 Residual diagnostics
Figures 18(a) - 18(d) plot the empirical distribution of the prepayment residuals of the Markov
models together with a normal distribution with the same mean and variance. The mean and
variance of the residuals is given in Table 24.
Figure 18: Empirical and normal CDF of prepayment residuals in Markov models.
The ECDFs of the prepayment residuals for all models, especially the Markov3 model and the
Markov models conditioned on delinquency status and contractual payments, closely resemble a
normal distribution. This is confirmed by the Kolmogorov-Smirnov (KS) test statistics in Table 24
and indicates that the use of the CLT is justified. Since the Markov model conditional on partial
prepayments is based on a lower number of observations, it is no surprise that this residual ECDF is
slightly further off from the normal CDF; however, a normal distribution for these residuals still
cannot be rejected.
              Mean*10^3    Variance*10^3    KS       p (t-test)
Markov3       0,469        6,255            0.156    0.105
Markov5(1)    -4,517       8,442            0.238    0.000
Markov5(3)    -1,427       5,147            0.161    0.000
Markov5(5)    0,836        6,370            0.103    0.057
Table 24: Mean, variance, KS statistic and p-value zero mean hypothesis for prepayment residuals
Markov models.
A zero mean in the residuals cannot be rejected for the Markov3 model, but it is rejected
for the Markov5(1) and Markov5(3) models at the one percent level and for the Markov5(5)
model at the ten percent level, as indicated by the p-values of the t-test in Table 24.
Figures 19(a) - 19(d) plot the prepayment residuals, $\varepsilon_{tj}$, of the Markov models over the sample
period.
To determine the order of residual autocorrelation, the Partial Autocorrelation Functions (PACF) are
plotted in Figures 14(a) and 14(b).
              Autocorr. (lag 1)    LB(1)
Markov3       0.913                0.000
Markov5(1)    0.871                0.000
Markov5(3)    0.746                0.000
Markov5(5)    0.920                0.000
Table 25: Residual autocorrelation and Ljung-Box test statistic for Markov prepayment residuals.
The residuals of the Markov5 model conditioned on delinquency status have a lower first order
autocorrelation than those of the other models; however, the second order autocorrelation coefficient
is also significant in this model.
Although the Markov models are constructed by conditioning on the current state in mortgage
termination, autocorrelation remains in the models and is not lower than the autocorrelation
found in the MNL models. The remedies that were applied to the MNL models have also been tried
for the Markov models; however, these measures were only marginally effective and led to less accurate
forecasts, and therefore they are not applied in the final models.
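The first order autocorrelation and the Ljung-Box statistic at lag one can be computed as in the sketch below. The residual series is simulated from an AR(1) process purely for illustration (183 months, coefficient 0.9); it is not the thesis data, and the function names are hypothetical. The p-value uses the fact that the tail probability of a chi-square distribution with one degree of freedom equals erfc(sqrt(q / 2)).

```python
import math
import numpy as np

def autocorrelation(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

def ljung_box_lag1(x):
    """Return (rho_1, Q, p-value) for the Ljung-Box test with one lag."""
    n = len(x)
    rho1 = autocorrelation(x, 1)
    q = n * (n + 2) * rho1 ** 2 / (n - 1)
    p_value = math.erfc(math.sqrt(q / 2.0))  # chi-square(1) upper tail
    return rho1, q, p_value

# Simulated, strongly persistent residuals (illustrative only).
rng = np.random.default_rng(0)
resid = np.zeros(183)
for t in range(1, 183):
    resid[t] = 0.9 * resid[t - 1] + rng.normal(scale=0.01)

rho1, q, p = ljung_box_lag1(resid)
print(round(rho1, 3), round(q, 1), p)
```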
5.3.6 Forecasting
The contingency tables for the cross sectional out-of-sample performance of the Markov models for 200
mortgages are given in Tables 26 - 27.
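M3 (rows: observed state Yit3, columns: predicted state)
           1       2       3       Total
1          51      3       0       54
2          15      109     8       132
3          1       9       4       14
Total      67      121     12      200

M5(1) (rows: observed state Yit5, columns: predicted state)
           1       3       4       Total
1          114     73      0       187
3          6       2       0       8
4          3       2       0       5
Total      123     77      0       200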
Table 26: Out-of-sample contingency table for Markov3 and Markov5(1) model.
M5(3)
Yit5
1
2
3
5
Yit5
M5(5)
132
14
5
0
11
4
1
0
8
0
0
0
5
0
0
0
171
22
7
0
151
16
200
Yit5
1
2
3
4
5
Yit5
8
10
0
0
0
18
4
30
3
0
0
37
4
22
91
0
10
127
0
0
3
0
1
4
0
0
7
0
7
14
16
62
104
0
18
200
Table 27: Out-of-sample contingency table for Markov5(3) and Markov5(5) model.
The time series out-of-sample performance of the models is again assessed by means of a one
and two period out of time analysis using a sample split at January 2009. Figures 21(a)-21(d)
show the forecasted prepayment probabilities along with the actual prepayments and in-sample
prepayment predictions for the one-step ahead forecasts of the Markov models.
Figure 21: Markov models one- and two-period ahead prepayment predictions January 2009 to
March 2014.
From the figures it can be seen that the one step ahead prediction generally lies above the
in-sample prediction from January 2009 to March 2013, after which the forecast drops below the
in-sample prediction. The Markov3 forecast slightly overestimates the prepayment rate whereas the
Markov5 models seem to be able to capture the general pattern in the prepayment rate.
5.4
The risk neutral refinancing incentive can be modelled as a European lookback put on the market
mortgage rate. The strike price is fixed and equal to the contractual mortgage rate. The asset price
is equal to the minimum value of the market mortgage rate over the lifetime of the mortgage, see
Equation (3.23). The two stochastic processes that need to be modelled are the short rate and the
market mortgage rate.
To model the short rate, the Hull-White model is calibrated on ATM swaption prices traded on
the over-the-counter (OTC) market, quoted in terms of Black volatilities and taken from Bloomberg.
The interest rate term structure to which the model is fitted is the EURIBOR6M curve on 01-01-2015
provided by Bloomberg. This rate is chosen for calibration since US rates provided by Bloomberg are
illiquid for longer maturities. Bloomberg does provide these rates but uses an extrapolation method
that leads to inaccurate results. Since the maturity of the mortgages in this thesis is thirty years,
it is important to obtain accurate rates for long maturities. The market implied term structure is
given in Figure 22(a).
Figure 22: Market implied term structure and HW1f simulated short rate.
Since swaptions are quoted in terms of implied volatilities, Black's formula for swaptions has
to be used to derive market prices, see Equation (3.41). Calibrating the HW1f model in a market
consistent way implies finding values for $a$ and $\sigma_r$ for which the objective function in Equation (3.40)
is minimized. The parameter estimates are given in Table 28.
$a$                        $\sigma_r$
$2.90 \cdot 10^{-2}$       $8.26 \cdot 10^{-3}$
Table 28: Parameters for the Hull White model calibrated to swaption prices.
Next, future paths for the instantaneous short rate are simulated. The Euler scheme is used
to discretize the stochastic differential equation. Figure 22(b) shows simulations for the short rate
under the Hull White model (100 simulations). Although the interest rate level was low on the date
of calibration, 01-01-2015, interest rates do not become negative. What is more, the simulated paths
for the short rate are very similar. This is due to the fact that the variance of the short rate is very
small, see Table 28.
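An Euler discretisation of a one-factor Hull-White short rate, dr = (theta(t) - a r) dt + sigma dW, can be sketched as follows. The sketch makes several assumptions that are not taken from the thesis: the drift function theta is held constant (which reduces the model to a Vasicek process, whereas the calibrated model fits theta to the market term structure), the starting rate and long run level are hypothetical, and the two parameter values are the Table 28 estimates read as 2.90e-2 and 8.26e-3.

```python
import numpy as np

def simulate_short_rate(r0, a, sigma, theta, dt, n_steps, n_paths, seed=0):
    """Euler scheme for dr = (theta(t) - a * r) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    r = np.empty((n_paths, n_steps + 1))
    r[:, 0] = r0
    for k in range(n_steps):
        t = k * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
        r[:, k + 1] = r[:, k] + (theta(t) - a * r[:, k]) * dt + sigma * dw
    return r

A = 2.90e-2      # mean reversion, as read from Table 28
SIGMA = 8.26e-3  # volatility, as read from Table 28
R0 = 0.01        # hypothetical starting short rate
LONG_RUN = 0.03  # hypothetical long run level

def constant_theta(t):
    # Constant drift target; a simplifying assumption for this sketch.
    return A * LONG_RUN

# 100 monthly paths over 30 years.
paths = simulate_short_rate(R0, A, SIGMA, constant_theta, dt=1 / 12, n_steps=360, n_paths=100)
print(paths.shape, paths[:, -1].mean())
```

Setting `sigma=0` gives the deterministic mean reverting path, which is a convenient check on the drift term.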
Next, the market mortgage rate has to be modelled. Looking at Figure 4 it can be seen that the
pattern of the 30FRM rate and the EURIBOR6M rate is similar over the sample period, 1999M01
until 2014M03.
Correlation    Spread
0.814          3.200
Table 29: Correlation and spread between 30Y FRM and EURIBOR6M.
Indeed, it can be seen in Table 29 that the historical correlation between the market mortgage
rate and the risk free rate is high. Under the assumption that the spread between the two rates is
constant, the market mortgage rate can be modelled by the same stochastic process as the risk free
rate. Table 29 also shows the average spread between the two rates over the selected sample period.
This spread will be added to the simulated zero bond price process of the risk free rate to derive a
stochastic process for the market mortgage rate.
The put option is ITM if the market mortgage rate for the remainder of the maturity of the
mortgage is lower than the contractual mortgage rate in at least one month between the origination
of the mortgage contract and its maturity (30 years). Since the strike price for the lookback option,
i.e. the contractual mortgage rate, differs per mortgage, the value of the put option is derived
by taking the risk neutral expectation over 50 simulations for each mortgage. The large number
of mortgages puts bounds on the number of simulations that can be conducted within reasonable
computation time. One relation that should be observable is that mortgages with a lower
contractual mortgage rate have a lower value for the put option.
The price of the put option will be used as an explanatory variable in the exogenous models as
an alternative manner for capturing the refinancing incentive.
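The Monte Carlo valuation of the lookback put can be illustrated as follows. The payoff max(K - min m_t, 0), with K the contractual rate and m_t the simulated market mortgage rate, follows the verbal description in this section; the discounting along the short rate path up to maturity and the function name are assumptions of this sketch, since Equation (3.23) is not reproduced here. The two short paths are hypothetical.

```python
import numpy as np

def lookback_put_value(mortgage_paths, short_paths, strike, dt):
    """Average discounted payoff max(strike - min_t m_t, 0) over simulated paths."""
    running_min = mortgage_paths.min(axis=1)          # lowest market mortgage rate per path
    payoff = np.maximum(strike - running_min, 0.0)    # zero if the option is never ITM
    discount = np.exp(-short_paths[:, :-1].sum(axis=1) * dt)
    return float(np.mean(discount * payoff))

# Two hypothetical market mortgage rate paths and a zero short rate (no discounting).
mortgage_paths = np.array([[0.05, 0.04, 0.06],
                           [0.07, 0.07, 0.07]])
short_paths = np.zeros_like(mortgage_paths)

value_high_strike = lookback_put_value(mortgage_paths, short_paths, strike=0.05, dt=1 / 12)
value_low_strike = lookback_put_value(mortgage_paths, short_paths, strike=0.045, dt=1 / 12)
print(value_high_strike, value_low_strike)
```

The second call reflects the relation noted above: a mortgage with a lower contractual rate receives a lower put value.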
5.5
As an alternative indicator for the refinancing incentive in Equation (4.2), the MNL and Markov
models have been estimated with the risk neutral price of the put option. This variable has been
included by replacing the refinancing incentive in all specific models with the risk neutral put price
(the refinancing incentive was significant in all models). The model estimates are given in Appendix
F. In the MNL3 model the put price is insignificant for both prepayments and for defaults, and does
not alter the significance or coefficients of other variables in the model. In the MNL5 model the put
price is only significant for delinquent payments. In the Markov3 model the put price is significant
for prepayments. The coefficients on the other variables are again similar to the model estimated
with the refinancing incentive. In the Markov5(1) model the put price is significant for all categories
and in the Markov5(3) model this variable is significant for prepayments at the five percent level.
In the Markov5(5) model it is significant for all categories. Overall, the put price does not perform
better in the models compared to the refinancing incentive. In the MNL models it performs worse
and in the Markov models the performance is the same. This may be because, although
the risk neutral put price is the theoretically optimal calculation of the incentive to refinance, the
refinancing incentive defined in Equation (4.2) makes use of actual available information on the risk
free rate per month per state and the discounted value of UPB payments. The put price can, however,
be used as a proxy for the refinancing incentive if such actual information is unavailable. In the
remainder of this thesis the models are estimated with the refinancing incentive.
6 Model performance
6.1 In-sample performance
Using panel data, in-sample model performance is assessed based on (i) R2 ; (ii) AIC and (iii)
deviance. The cross sectional performance of the model is evaluated based on (iv) success rate for
predicting prepayments; (v) the percentage of mortgages wrongly predicted as prepayments and
(vi) the percentage of mortgages for which a prepayment is missed. The success rate is defined in
Equation (3.53). The false alarm rate for prepayments in Equation (3.54) and the missed prepayment
rate in Equation (3.55) are defined in relation to contractual payments. The time series performance
of the models is based on (vii) unbiasedness via a hypothesis test of a zero mean in the forecast
error; (viii) accuracy via the MSPE and (ix)-(x) efficiency by means of the coefficients of the
Mincer-Zarnowitz regression.
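The three cross sectional criteria can be computed directly from a contingency table. Equations (3.53) to (3.55) are not repeated in this section, so the definitions below are an assumption: the success and missed rates are taken relative to the observed prepayments and the false alarm rate relative to the observed contractual payments. With the in-sample Markov3 counts from Table 22, read with rows as observed and columns as predicted states, these definitions give values that round to the figures reported for the specific Markov3 model (0,841, 0,030 and 0,067).

```python
import numpy as np

# In-sample Markov3 counts from Table 22.
# Rows: observed state, columns: predicted state (assumed orientation).
# State order: contractual payment, full prepayment, default.
table = np.array([
    [2731, 84, 20],
    [443, 5555, 611],
    [26, 312, 211],
])
CONTRACTUAL, PREPAY = 0, 1

observed_prepay = table[PREPAY].sum()
observed_contractual = table[CONTRACTUAL].sum()

success_rate = table[PREPAY, PREPAY] / observed_prepay                # prepayments predicted as such
false_alarm_rate = table[CONTRACTUAL, PREPAY] / observed_contractual  # contractual predicted as prepaid
missed_rate = table[PREPAY, CONTRACTUAL] / observed_prepay            # prepayments predicted as contractual

print(round(success_rate, 3), round(false_alarm_rate, 3), round(missed_rate, 3))
```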
The criteria of the general models (before variable selection) and specific models (after variable
selection) are given in Table 30 for the MNL and Markov models.
Criterion            Model       MNL3      MNL5      Markov3   Markov5(1)  Markov5(3)  Markov5(5)
(i) R2               General     0,251     0,106     0,012     0,088       0,011       0,006
                     Specific    0,244     0,105     0,012     0,087       0,010       0,005
(ii) AIC             General     58,207    54,560    51,022    57,083      54,835      50,537
                     Specific    42,055    36,526    39,022    29,059      26,811      36,519
(iii) Dev*10^4       General     7,220     21,161    7,215     0,751       0,938       18,545
                     Specific    7,216     21,178    7,216     0,768       0,952       18,638
(iv) Sj              General     0,848     0,826     0,853     0,636       0,275       0,802
                     Specific    0,856     0,794     0,841     0,586       0,200       0,709
(v) Fj               General     0,030     0,041     0,030     0,343       0,098       0,036
                     Specific    0,028     0,044     0,030     0,386       0,077       0,030
(vi) Mj              General     0,066     0,012     0,060     0,364       0,575       0,018
                     Specific    0,064     0,015     0,067     0,414       0,650       0,022
(vii) E(tj)          General     0,001     0,001     0,000     -0,005      -0,001      0,001
                     Specific    0,001     0,001     0,001     -0,004      -0,002      0,001
(viii) MSPE*10^4     General     0,027     0,030     0,472     7,309       7,914       1,096
                     Specific    0,032     0,032     1,146     7,589       7,935       1,112
(ix) 0,MZ            General     0,016     0,016     0,016     0,023       0,007       0,016
                     Specific    0,016     0,016     0,016     0,023       0,007       0,016
(x) 1,MZ             General     -1,000    -1,000    -1,000    -1,000      -1,000      -1,000
                     Specific    -1,000    -1,000    -1,000    -1,000      -1,000      -1,000
Overall, based on the performance criteria discussed above as well as the contingency tables and
the figures with time series performance, the MNL3 model has the highest in-sample performance.
Since all models have a higher in-sample performance after variable selection is conducted, the
out-of-sample performance is assessed only for the specific models.
6.2 Out-of-sample performance
6.2.1 Cross sectional out-of-sample
The contingency tables for the cross sectional out-of-sample performance have been provided in
Tables 15 and 16 for the MNL models and in Tables 26 and 27 for the Markov models. Table 31 contains the out-of-sample
prepayment success rate, false alarm rate and missed rate for the sample of 200 mortgages.
              (i) Sj    (ii) Fj    (iii) Mj
MNL3          0,818     0,056      0,106
MNL5          0,785     0,000      0,023
Markov3       0,826     0,056      0,114
Markov5(1)    0,250     0,390      0,750
Markov5(3)    0,250     0,093      0,688
Markov5(5)    0,717     0,000      0,031
Table 31: Cross sectional out-of-sample performance criteria for estimated models.
Table 31 shows that the three state models have the highest out-of-sample success rate. Moreover
their false alarm rate and missed rate are low compared to the other models. The MNL3 model
has a slightly lower success rate than the Markov3 model but this is compensated by a lower missed
rate for the former. The MNL5 model has a relatively high success rate as well and a very low false
alarm and missed prepayment rate. The Markov5(5) model scores relatively well. The performance
of the other five state Markov models lags behind. However, the Markov5(1) and Markov5(3) are
based on a smaller set of observations. From the sample of 200 mortgages there are only 8 and 16
prepayments that can be predicted by these models, respectively. Therefore the weight of individual
observations is large in these rates.
6.2.2
The one step ahead prepayment predictions have been given in Figures 15(a)-15(b) and Figures 21(a)-21(d).
As explained in Section 3.5.3, forecasting performance is assessed in terms of (i) unbiasedness;
(ii) accuracy and (iii) efficiency. Table 32 gives (i) the expected value of the forecast error; (ii) the
MSPE and (iii) the Mincer-Zarnowitz (MZ) coefficients for the one- and two-step ahead forecasts for
all models (after variable selection).
Criterion      Horizon       MNL3      MNL5      Markov3   Markov5(1)  Markov5(3)  Markov5(5)
E(tj)          One-period    -0.001    -0.001    -0.007    -0.006      0.002       -0.007
               Two-period    -0.001    -0.001    -0.007    -0.006      0.002       -0.007
MSPE*10^4      One-period    0.048     0.016     0.880     3.539       0.912       0.850
               Two-period    0.130     1.075     1.041     3.696       0.914       0.983
0,MZ           One-period    0.016     0.016     0.016     0.032       0.010       0.016
               Two-period    0.016     0.002     0.017     0.032       0.010       0.016
1,MZ           One-period    -1.012    -1.005    -1.009    -1.081      -1.088      -1.010
               Two-period    -1.024    -0.292    -1.014    -1.067      -1.082      -1.015
6.3
The data set spans the period January 1999 until March 2014 and therefore contains data on the
2008 credit crisis. The sixteen years of data can be split into nine pre-crisis years and six post
crisis years. Since prepayment models before, during and after a crisis are likely to be different, it
is interesting to investigate whether the estimated parameters are stable over time or whether the
crisis constitutes a structural break in the prepayment models. To test for parameter stability the
CUSUMSQ test will be conducted. The advantage of the CUSUMSQ test over Quandt's Likelihood
Ratio (QLR) test is that the former requires far less computation time. The reason for this is that the
QLR test requires the estimation of twice the number of coefficients, which is time consuming given
the sizable data set at hand. See Appendix G for the specification of the QLR test. The CUSUM
test requires the estimation of time varying coefficients. To this end the MNL model is estimated
T times using observations from months t = 1, ..., 182. This procedure was initially applied to the
sample of 10,000 mortgages. However, for some months the number of observations was limited in
this case. Therefore, the final CUSUMSQ test is applied to the full data set. The test on parameter
stability is applied to the model which is classified as best performer according to the previous
sections, namely the MNL3 model.
6.3.1
The CUSUMSQ test is conducted using the full data set and the variables contained in the specific
MNL3 model. These variables can be found in Table 9. The panel data set is restructured by
calendar month and the coefficients are estimated monthly for each month between January 1999 and
March 2014 (183 estimates). In each month it is tested whether the explanatory variables contain
values and exhibit an absolute correlation below 1 with each of the other explanatory variables. If either
of the two is not the case, the variables are removed from the model. Therefore, the number of
variables in the model can differ per calendar month. This mostly holds for a subset of dummy
variables for categorical risk drivers such as property type and loan purpose in the beginning or at
the end of the sample period. Furthermore, loan age dummies often drop out of the model in the
beginning of the sample period due to the structure of the data set.
This requirement on the included variables also constituted a main reason to perform the
CUSUMSQ tests on the full data set. The CUSUMSQ tests have been applied to the sample of
10,000 mortgages but led to issues of missing covariate information per month or too limited variation in covariates in a certain month. Moreover, if in one month no prepayment or default occurred,
the MNL model could not be estimated at all.
Figures 23(a) - 23(f) show the coefficient estimates for prepayments per calendar month for the
full sample for the non-categorical variables of the MNL3 model.
Figure 23: Time varying coefficient estimates non-categorical variables in MNL3 model (full data
set).
The figures show that in general the estimated parameters fluctuate most in the beginning
of the sample period. One reason for this initial instability is the smaller number of observations in the
beginning of the sample. Parameter estimates for the First Home Indicator, Mortgage Insurance and Refinancing
Incentive seem to be relatively stable over the remainder of the sample. For the variables Loan Size,
LTV and House Price Index, a sharp rise around October 2009 is visible. After this point these
variables have a larger effect on prepayment rates. This effect is most pronounced for the House Price
Index. It appears that after the crisis of 2008 an increase in the House Price Index has a large
positive effect on the prepayment rate compared to the pre-crisis period. The refinancing incentive
shows a sharp drop just before October 2009: the coefficient becomes more negative. This indicates
that a favorable market mortgage rate, i.e. one lower than the contractual mortgage rate, after this
period leads to a more pronounced refinancing incentive compared to the pre-crisis period.
Figures 24(a) - 24(d) show the time varying prepayment coefficient estimates for the categorical
variables of the MNL3 model.
(c) Region
Figure 24: Time varying coefficient estimates categorical variables in MNL3 model (full data set).
The coefficients for Property Type fluctuate the most in the beginning of the sample. The
estimates for the property types Cooperative Share, Manufactured Housing and Single Family Home
stabilize over time while the coefficient for Planned Unit Developments remains very volatile. The
coefficients for Loan Purpose are relatively stable while the estimates for Region fluctuate quite
a bit, especially for the second and fifth region. Among the Loan Age variables the spikes for loan
ages between three and six years stand out. Apparently, loans of this age have an increasing
probability of being prepaid in the period near the end of 2009.
All in all, the coefficient estimates for prepayments fluctuate considerably during the sample
period. Hence it is expected that the parameters are not stable over time. This will be tested more
formally with the CUSUMSQ test.
6.3.2 CUSUMSQ Test
The CUSUMSQ test is based on the recursive residuals given in Equation (3.18). Applied to the logit
model, these residuals are adapted slightly. Since the aim of the thesis is to determine a prepayment
model, the test will be applied to prepayment residuals only. The recursive prepayment residuals
are defined as
$$w_{t,j} = \frac{\varepsilon_{tj}}{\sqrt{1 + x_t (X_{t-1}' X_{t-1})^{-1} x_t'}} \qquad (6.1)$$
in which $\varepsilon_{tj}$ denotes the MNL residual for state j and is defined in Equation (3.57).
The CUSUMSQ test statistic, St , is plotted in Figure 6.3.2 for T = 182 and k = 28, the average
number of parameters in the time series models.
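The mechanics of the test can be illustrated with the textbook linear regression version of the recursive residuals and the CUSUMSQ statistic. This is a sketch under stated assumptions: the thesis adapts the residuals to the logit model, which is not reproduced here, and the data below are simulated. The statistic is taken as the cumulative sum of squared recursive residuals divided by their total, so it runs from near zero to one.

```python
import numpy as np

def recursive_residuals(X, y):
    """One step ahead residuals scaled by sqrt(1 + x_t (X'X)^-1 x_t'), linear model."""
    T, k = X.shape
    w = []
    for t in range(k, T):
        X_prev, y_prev = X[:t], y[:t]                       # observations before period t
        beta = np.linalg.lstsq(X_prev, y_prev, rcond=None)[0]
        x_t = X[t]
        scale = np.sqrt(1.0 + x_t @ np.linalg.inv(X_prev.T @ X_prev) @ x_t)
        w.append((y[t] - x_t @ beta) / scale)
    return np.array(w)

def cusumsq(w):
    """Cumulative share of the squared recursive residuals."""
    s = np.cumsum(w ** 2)
    return s / s[-1]

# Simulated stable linear relation (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=60)
X = np.column_stack([np.ones(60), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=60)

w = recursive_residuals(X, y)
S = cusumsq(w)
print(len(w), S[-1])
```

Under stable parameters the statistic rises roughly along a straight line; a structural break shows up as a departure from that line beyond the critical bounds.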
mean compared to post October 2009. This break date can also be linked to the credit crisis of
2008. Prior to this crisis economic growth enabled unscheduled return of the mortgage principal
while after the crisis prepayment rates are vastly lower. Indeed Figure 5 shows that after the crisis
prepayment rates fluctuate around a lower mean.
6.3.3
The MNL3 model will consequently be estimated by allowing for different parameters before and
after October 2009. To this end the MNL model will be estimated with an indicator variable,
according to Equation (3.21). The number of observations before October 2009 equals 237,198 and
is slightly higher than the number of observations after this time, which equals 172,121. The model
estimates are given in Table 33. The individual estimates for the categorical variables are given in
Table 49.
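One common way to let coefficients differ across regimes is to interact the covariates with a post-break indicator. The sketch below builds such a design matrix. It is an assumption about the form of Equation (3.21), which is not shown in this section: the constant is not interacted, in line with the absence of a separate post-crisis constant in Table 33, and observations from October 2009 onwards are treated as post-break. The covariate values and the yyyymm period coding are hypothetical.

```python
import numpy as np

def with_break_interactions(X, period, break_period):
    """Append post-break interactions of all non-constant columns of X."""
    X = np.asarray(X, dtype=float)
    post = (np.asarray(period) >= break_period).astype(float)[:, None]  # 1 from the break onwards
    return np.hstack([X, post * X[:, 1:]])  # column 0 is assumed to be the constant

# Hypothetical design matrix: constant, refinancing incentive, LTV.
X = np.array([
    [1.0, 0.2, 80.0],
    [1.0, 0.5, 95.0],
    [1.0, 0.1, 60.0],
])
period = np.array([200901, 200910, 201403])  # yyyymm

Z = with_break_interactions(X, period, break_period=200910)
print(Z)
```

With this coding the coefficients on the appended columns measure how much each effect changes after the break, so the number of estimated slope parameters doubles.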
                      Prior Crisis                                Post Crisis
                      Prep                  Default               Prep                  Default
                      Coef.     P-value     Coef.     P-value     Coef.     P-value     Coef.     P-value
C                     -13,407   0,000       -18,287   0,000
Firsthome             0,053     0,928       0,240     0,682       -0,137    0,475       -0,139    0,612
Mortgage insurance    -0,031    0,106       -0,014    0,476       -0,016    0,017       -0,018    0,045
Loansize              0,074     0,853       0,399     0,315       0,349     0,001       0,119     0,437
LTV                   0,020     0,131       0,049     0,001       0,003     0,568       0,048     0,000
Houseprice            0,004     0,435       0,011     0,022       0,013     0,002       0,013     0,016
Refinancing           0,039     0,862       -0,258    0,246       -1,597    0,000       -1,713    0,000
Loan Age                        0,000                 0,000                 0,000                 0,000
Property Type                   0,000                 0,000                 0,000                 0,000
Loan Purpose                    0,000                 0,000                 0,000                 0,000
Region                          0,000                 0,000                 0,000                 0,000
Table 33: Coefficient estimates and p-values MNL3 model with structural break.
The estimates show that while the majority of the variables are significant in the post crisis
period, they are not significant pre-crisis. This indicates that pre and post crisis prepayment rates
differ in terms of the risk drivers by which they can be modelled. Furthermore, since the variables
that were selected in the original MNL3 model are the variables that were significant in the post crisis
period it seems that the post crisis period largely influenced the variable selection procedure even
though this period comprises less observations. The model can be extended by allowing different
variables in the model depending on the regime in which the model is.
The performance of the model over time is given in Figure 26(a) and the prepayment residuals
in 26(b).
Figure 26: MNL3 model estimates with structural break at October 2009.
With an R2 of 0.256 the model scores slightly better than the MNL3 model without structural
breaks. The MSPE of the model is also low compared to the other models, namely $2.132 \cdot 10^{-6}$.
However, this comes at the expense of a sharp increase in the AIC, which amounts to 100.317. This is no
surprise given that the number of parameter estimates has doubled. The residual AR(1) coefficient
has decreased to 0.881 but remains relatively high.
Conclusion
In this thesis four types of prepayment models are estimated for a mortgage portfolio of Freddie Mac.
Three of these models are widely used in the prepayment literature, namely the option theoretic
model, the multinomial logit model (MNL) and the competing risk model. A Markov model has been
introduced as a new method for modelling prepayment rates. The aim of this thesis is to determine
which of these models is the best performer for predicting full prepayment rates both in-sample and
out-of-sample, whereby out-of-sample performance is assessed cross sectionally as well as time series
wise.
The option theoretic model is constructed as a lookback put option on the mortgage. This
is interesting from a theoretical standpoint; however, its major drawback lies in the inability
to incorporate the behavioural risk that resides in prepayment decisions. Such behavioural risk
arises from the fact that mortgagees are unaware of optimal prepayment options as well as from
diverging borrower specific attributes such as the Debt-to-Income ratio, FICO score, region in which
the property is located and the divorce rate. To incorporate the fact that the optimal prepayment
decision merely constitutes one indicator for prepayment, the risk neutral put price has been included
in exogenous models as an explanatory variable. Exogenous or empirical models aim to estimate
prepayment rates according to a number of borrower specific, loan specific and macro economic
variables. Including the put price as explanatory variable is a good alternative way to incorporate the
refinancing incentive; however, it leads to less accurate predictions compared to using a straightforward
variable for the refinancing incentive.
The most popular exogenous model in prepayment modelling is the MNL model. Two MNL
models are estimated. The three state MNL model predicts mortgage termination based on the
competing risks of full prepayment and default. The five state MNL model additionally incorporates
the transient states of partial prepayment and delinquent payment. The predictions of the MNL models
with three and five states are fairly similar, so adding partial prepayments is not necessary
for predicting full prepayment rates. An indication of this was already provided by the variable
selection procedure, in which variables that were significant for full prepayments were not significant
for partial prepayments and vice versa. The main risk drivers for full prepayments are the refinancing
incentive, house prices, the Loan-to-Value ratio, the unemployment rate and loan size, whereas these
variables appear to be insignificant for partial prepayments in the majority of the models. Overall,
the MNL predictions performed very well. Out-of-sample forecasts scored well in terms of accuracy
and efficiency. The cross sectional analysis of model performance also revealed high success rates and
low false alarm and missed prepayment rates for the MNL models. The major drawback of MNL
models applied to panel data is the implicit assumption that consecutive observations are independent.
This drawback became visible when assessing time series model performance. However, given that
prepayment rates themselves, as well as the main macro-economic variables affecting them,
show strong dependence over time, autocorrelation is inherent.
The competing risk model is also evaluated, and it is shown that for small values of the baseline
hazard this model is very similar to the MNL model. The major disadvantage of
the competing risk model is its limited ability to include time varying explanatory variables. Given
the availability of a panel data set and the objective of this thesis to perform out-of-time analyses,
the added value of the competing risk model does not extend beyond the estimation of the baseline
survival function. The estimated prepayment survival curve is relatively steep and
indicates that the effect of loan age on prepayment rates is not linear. To incorporate this, loan age
is included in the MNL model as a categorical variable.
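As a rough illustration of treating loan age as a categorical variable, the sketch below maps a loan age in months to the age bands that appear in the coefficient tables of the appendix (2-3Y up to 10-15Y) and builds the corresponding dummy variables. The reference band of under two years, the handling of band boundaries and the function names are assumptions made for this example; they are not taken from the estimation code used in this thesis.

```python
from bisect import bisect_left

# Upper edges of the loan age bands in months.
# The "<2Y" reference band is an assumption of this sketch.
BAND_EDGES = [24, 36, 48, 72, 96, 120, 180]
BAND_LABELS = ["<2Y", "2-3Y", "3-4Y", "4-6Y", "6-8Y", "8-10Y", "10-15Y"]


def loan_age_band(age_months):
    """Return the label of the band that contains the given loan age."""
    # bisect_left places an age equal to an edge in the lower band (24 -> "<2Y").
    idx = bisect_left(BAND_EDGES, age_months)
    # Ages beyond the last edge are kept in the oldest band.
    return BAND_LABELS[min(idx, len(BAND_LABELS) - 1)]


def loan_age_dummies(age_months):
    """One 0/1 dummy per band, leaving out the reference band."""
    band = loan_age_band(age_months)
    return {f"Loan Age {label}": int(label == band) for label in BAND_LABELS[1:]}
```

For a loan that is 30 months old, `loan_age_dummies(30)` sets only the "Loan Age 2-3Y" dummy to one, so each band receives its own coefficient and the effect of loan age is not forced to be linear.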
The transition probabilities of the Markov models are estimated using covariate information and
by conditioning on the transient states of mortgage payment. The three state Markov model is
obtained by conditioning on contractual payment. The five state Markov models are estimated by
means of three conditional MNL models, conditioned on partial prepayment, delinquent payment and
contractual payment. Although the Markov models account for dependence between observations,
this appeared to be insufficient to capture the majority of the autocorrelation present in
prepayment rates.
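The idea of one conditional MNL model per transient state can be sketched as follows: given the current state of a mortgage, the corresponding coefficient set yields one row of the transition matrix. This is a minimal sketch under stated assumptions; the state names, the dictionary layout and the choice of a reference outcome with linear predictor zero are illustrative and do not reproduce the estimated models.

```python
import math


def mnl_probabilities(x, coefs):
    """Multinomial logit probabilities for one observation.

    x     : covariate values, with 1.0 as first entry for the intercept
    coefs : dict mapping each non-reference outcome to its coefficients
    The reference outcome has linear predictor zero (assumption).
    """
    z = {state: sum(b * v for b, v in zip(beta, x)) for state, beta in coefs.items()}
    denom = 1.0 + sum(math.exp(v) for v in z.values())
    probs = {state: math.exp(v) / denom for state, v in z.items()}
    probs["reference"] = 1.0 / denom
    return probs


def transition_row(current_state, x, coefs_by_state):
    """Transition probabilities out of the given current (transient) state."""
    return mnl_probabilities(x, coefs_by_state[current_state])


# Hypothetical coefficients: one coefficient set per conditioning state.
coefs_by_state = {
    "contractual": {"prepayment": [-5.0, 0.4], "default": [-7.0, 0.1]},
    "delinquent": {"prepayment": [-6.0, 0.2], "default": [-3.0, 0.3]},
}
row = transition_row("delinquent", [1.0, 2.0], coefs_by_state)
```

Each row sums to one by construction. A transition that is rarely observed gives the corresponding coefficient set very little data to be estimated from, which is the practical weakness discussed in this conclusion.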
The performance of the three exogenous models has been evaluated using a wide set of performance measures. The panel data set criteria include the R², the Akaike information criterion
and the deviance. Cross sectional performance is evaluated with contingency tables that compare
predicted and actual state realizations across mortgages. Over time, the models are evaluated
based on unbiasedness, accuracy and efficiency. The distribution of the residuals is assessed as well
as their autocorrelation properties. Based on this performance assessment, the three
state multinomial logit model outperforms the other models both cross-sectionally and over
time. Even though this model proved incapable of capturing all time series dependence
present in prepayment rates, its in-sample and out-of-sample predictions are accurate. Although
estimation of the Markov models requires three times as many parameters (one
model per transient state), the performance of these models lags behind that of the
multinomial logit model. This is mainly caused by the limited number of observations for, in particular,
partial prepayment and delinquent payments. In the cross sectional analyses this led to success
rates, false alarm rates and missed prepayment rates that were highly influenced by the prediction
of only a few prepayment observations. This was already an issue when evaluating in-sample model
performance but was amplified in the out-of-sample analysis, in which some transitions could
not be predicted at all. In the time series analysis the limited number of observations led to sharp
fluctuations of the prepayment rate over calendar months, a feature that could not be captured by
the Markov models.
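To make the sensitivity of the contingency table measures to small counts concrete, the sketch below computes a success rate, a false alarm rate and a missed prepayment rate from actual and predicted states. The definitions used here are assumptions for illustration (share of correctly predicted states, share of predicted prepayments that did not occur, share of actual prepayments that were not predicted) and may differ from the exact definitions used in the thesis; the state labels are hypothetical.

```python
def prepayment_rates(actual, predicted, event="prepay"):
    """Contingency-table rates for one event type (assumed definitions)."""
    pairs = list(zip(actual, predicted))
    hits = sum(a == event and p == event for a, p in pairs)
    false_alarms = sum(a != event and p == event for a, p in pairs)
    misses = sum(a == event and p != event for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    n_predicted = hits + false_alarms
    n_actual = hits + misses
    return {
        # Share of observations whose state is predicted correctly.
        "success_rate": correct / len(pairs),
        # Share of predicted prepayments that did not materialise.
        "false_alarm_rate": false_alarms / n_predicted if n_predicted else float("nan"),
        # Share of actual prepayments that were not predicted.
        "missed_rate": misses / n_actual if n_actual else float("nan"),
    }


actual = ["prepay", "current", "current", "prepay"]
predicted = ["prepay", "prepay", "current", "current"]
rates = prepayment_rates(actual, predicted)
```

With only two actual prepayments, a single additional miss moves the missed prepayment rate by fifty percentage points, which illustrates why rates based on a handful of observations are unstable.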
In conclusion, the multinomial logit model is the most appropriate model for determining prepayment rates, both over time and cross-sectionally, as well as in-sample and out-of-sample.
The competing risk model is not able to adequately incorporate time varying covariates. Since
macro-economic variables are an important indicator for prepayment rates, this model is not a well-suited
prepayment model. Although the Markov model theoretically seems a good candidate for a prepayment model, it suffers from some practical drawbacks. The main weakness of this model is
that it produces less reliable estimates when the number of observations for a certain
transition is limited. Given the size of the mortgage market, mortgage level data can in general be
obtained in large quantities. However, requiring a large amount of data for estimation comes at the
expense of a sharp increase in computation time. Taking this into consideration, Markov models
are better suited to systems in which transition probabilities are more evenly spread
across the state space.
Since the data set includes the credit crisis of 2008, which is likely to have had a large impact on
prepayment rates, the parameter estimates of the three state MNL model are tested for stability.
A structural break is found shortly after the crisis. Prior to the crisis, prepayment rates were
significantly higher than in the post crisis period. This is incorporated by allowing for different
coefficients in these periods. Since pre and post crisis prepayment rates appear to differ in terms of
the risk drivers by which they can be modelled, the prepayment model can be extended by allowing
for different variables in the model depending on the prevailing regime.
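Allowing for different coefficients before and after a structural break can be sketched by duplicating the covariates, one copy per regime, so that each regime receives its own coefficient vector within a single model. The function names and the numerical values below are hypothetical; the regime indicator is simply passed in as a flag, since the exact break date is not restated here.

```python
def regime_design_row(x, in_post_crisis):
    """Place the covariates in the block belonging to the active regime."""
    zeros = [0.0] * len(x)
    return (zeros + list(x)) if in_post_crisis else (list(x) + zeros)


def linear_predictor(x, in_post_crisis, beta_pre, beta_post):
    """Linear predictor with regime specific coefficients."""
    row = regime_design_row(x, in_post_crisis)
    beta = list(beta_pre) + list(beta_post)
    return sum(b * v for b, v in zip(beta, row))


# Hypothetical covariates (intercept, refinancing incentive) and coefficients.
x = [1.0, 2.0]
pre = linear_predictor(x, False, beta_pre=[0.5, 1.0], beta_post=[-0.5, 2.0])
post = linear_predictor(x, True, beta_pre=[0.5, 1.0], beta_post=[-0.5, 2.0])
```

Restricting a variable to one regime, as suggested in the last sentence above, amounts to dropping its column from the other regime's block.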
A limitation of this research is the inability of the MNL models to incorporate dependency
between consecutive observations in a panel data set. Particularly in a prepayment model this constitutes a challenge. Future research could focus on methods of incorporating the strong dependence
between observations.
Starting from the cause specific hazard rate defined in Equation (3.6),
$$\lambda_j(t, x) = \lambda_{j0}(t) \exp\left(X_{it}' \beta_j\right) \tag{A.1}$$
$$\lambda_j(t, x) = \exp\left(\log(\lambda_{j0}(t)) + X_{it}' \beta_j\right). \tag{A.2}$$
Defining $z_{itj} = \log(\lambda_{j0}(t)) + X_{it}' \beta_j$, the conditional probability of mortgage termination due to
cause $j$ is given by
$$\frac{\exp(z_{itj})}{\sum_{k} \exp(z_{itk})}. \tag{A.3}$$
Comparing this to the MNL probability given in Equation (3.2), it can be seen that the two
models are comparable when the baseline hazard rate $\lambda_{j0}(t)$ is small.
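The role of a small baseline hazard can be checked numerically. The sketch below compares a proportional hazard, baseline times exp(x'beta), with an MNL type probability of the form exp(z_j) / (1 + sum_k exp(z_k)), where z_j is the log baseline plus x'beta. The MNL form with the "1 +" term is an assumption about Equation (3.2), and all numerical values are hypothetical.

```python
import math


def hazard(baseline, xb):
    """Proportional hazard: baseline hazard times exp(x'beta)."""
    return baseline * math.exp(xb)


def mnl_probability(j, baselines, xbs):
    """MNL type probability with z_k = log(baseline_k) + x'beta_k (assumed form)."""
    z = [math.log(b) + xb for b, xb in zip(baselines, xbs)]
    denom = 1.0 + sum(math.exp(v) for v in z)
    return math.exp(z[j]) / denom


def relative_gap(baselines, xbs, j=0):
    """Relative difference between the hazard and the MNL probability for cause j."""
    h = hazard(baselines[j], xbs[j])
    return abs(h - mnl_probability(j, baselines, xbs)) / h


xbs = [0.2, -0.1]  # hypothetical linear predictors for two causes
gaps = [relative_gap([s * 0.1, s * 0.01], xbs) for s in (1.0, 0.1, 0.01)]
```

The entries of `gaps` shrink as the baseline hazards are scaled down, because the denominator 1 + sum_k exp(z_k) approaches one and the MNL probability approaches the hazard itself.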
Mean      St. Dev.  Min       Max
32.361    28.601    1         181
729.653   54.576    360       850
0.139     0.345     0         1
5.032     10.444    0         52
34.164    11.796    1         75
0.915     0.509     0.048     7.101
72.816    16.025    6         104
0.002     0.041     0         1
6.156     2.099     2.100     14.900
4.046     0.896     1.700     9.900
155.734   26.367    100.590   206.670
-0.605    0.990     -7.660    6.560
0.007     0.708     0         1
0.271     0.444     0         1
0.303     0.459     0         1
0.253     0.435     0         1
0.181     0.385     0         1
0.652     0.476     0         1
0.100     0.300     0         1
0.248     0.432     0         1
0.074     0.263     0         1
0.000     0.022     0         1
0.145     0.352     0         1
0.008     0.090     0         1
0.772     0.420     0         1
0.294     0.456     0         1
0.212     0.409     0         1
0.151     0.358     0         1
0.183     0.386     0         1
0.094     0.291     0         1
0.045     0.207     0         1
0.021     0.144     0         1
Table 34: Summary statistics of dependent variables for full data set (29,932,667 obs).
                                       Prepayment          Default
Variable                               Coef.     P-value   Coef.     P-value
C                                      -13,121   0,000     -9,920    0,000
FICO score                             -0,002    0,060     -0,016    0,000
Firsthome                              -0,477    0,003     -0,346    0,110
Mortgage insurance                     -0,019    0,001     -0,021    0,005
DTI                                    -0,002    0,721     0,038     0,000
Loansize                               -0,163    0,070     -0,194    0,136
LTV                                    0,015     0,000     0,048     0,000
Penalty                                -1,596    0,656     -0,019    0,996
Unemployment                           0,087     0,054     0,320     0,000
Divorcerate                            -0,230    0,076     -0,361    0,006
Houseprice                             0,025     0,000     0,027     0,000
Refinancing                            -1,376    0,000     -1,380    0,000
Dummy Loan ID                          20,400    0,000     18,258    0,000
Dummy FICO                             -1,591    0,348     -15,574   0,000
Dummy Firsthome                        -1,041    0,000     -1,401    0,000
Dummy Mortgage Insurance               2,904     0,000     2,343     0,001
Dummy DTI                              0,210     0,620     2,117     0,000
Dummy Penalty                          1,802     0,004     1,987     0,005
Dummy Unempl                           -4,066    0,000     -2,204    0,000
Dummy divorcerate                      -1,112    0,061     -1,320    0,029
Dummy Houseprice                       -2,237    0,002     -2,675    0,001
Dummy Refinancing                      2,831     0,000     3,668     0,000
Propertytype Planned Unit Development  -0,092    0,510     0,920     0,000
Propertytype Cooperative Share         0,298     0,024     0,897     0,000
Propertytype Manufactured Housing      -0,380    0,002     -0,337    0,055
Propertytype Single Family Home        -4,478    0,000     -4,885    0,000
Purpose No Cash-out Refinance          -2,078    0,000     -2,137    0,000
Purpose Purchase                       -2,115    0,000     -1,508    0,000
Region 2                               -2,155    0,279     -5,537    0,234
Region 3                               -0,202    0,374     -0,519    0,082
Region 4                               -0,500    0,337     0,760     0,207
Region 5                               0,010     0,959     -0,248    0,315
Loan Age 2-3Y                          0,330     0,036     0,554     0,017
Loan Age 3-4Y                          -0,517    0,003     0,278     0,246
Loan Age 4-6Y                          -1,164    0,000     -0,263    0,263
Loan Age 6-8Y                          -2,235    0,000     -1,401    0,000
Loan Age 8-10Y                         -2,614    0,000     -2,115    0,000
Loan Age 10-15Y                        -3,307    0,000     -3,096    0,000
Table 35: Coefficient estimates and p-values MNL3 model before variable selection.
C
FICO score
Firsthome
Mortgage ins
DTI
Loansize
LTV
Penalty
Unemployment
Divorcerate
Houseprice
Refinancing
Dummy Loan ID
Dummy FICO
Dummy Firsthome
Dummy Insurance
Dummy DTI
Dummy Penalty
Dummy Unempl
Dummy divorcerate
Dummy Houseprice
Dummy Refinancing
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 2
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Pa.Prep
Coef.
-4,716
-0,001
0,023
0,003
0,000
0,059
-0,001
-0,590
0,022
0,063
-0,001
-0,110
5,031
-0,449
-0,017
0,246
-0,078
-0,008
0,021
0,303
0,180
-0,114
-0,041
-0,049
0,041
0,159
0,068
0,030
-0,428
0,074
-0,074
0,125
1,172
1,316
1,452
1,637
1,735
2,049
P-value
0,000
0,000
0,539
0,015
0,877
0,013
0,514
0,127
0,000
0,000
0,164
0,000
0,000
0,084
0,563
0,010
0,336
0,963
0,740
0,000
0,016
0,666
0,178
0,107
0,234
0,000
0,124
0,387
0,241
0,160
0,566
0,005
0,000
0,000
0,000
0,000
0,000
0,000
Prep.
Coef.
-14,953
-0,001
-0,249
-0,015
-0,005
-0,061
0,010
-1,702
0,166
-0,227
0,019
-1,153
18,645
-1,124
-0,867
2,408
-0,124
1,650
-2,643
-0,877
-0,215
2,374
-0,064
0,224
-0,231
-2,906
-1,603
-1,634
-1,335
0,074
-0,160
0,229
0,882
0,538
0,094
-0,217
-0,562
-0,898
P-value
0,000
0,234
0,093
0,004
0,255
0,476
0,005
0,549
0,000
0,020
0,000
0,000
0,000
0,358
0,000
0,000
0,752
0,026
0,000
0,048
0,683
0,000
0,628
0,076
0,066
0,000
0,000
0,000
0,370
0,729
0,752
0,212
0,000
0,001
0,558
0,272
0,010
0,000
Delinq.
Coef.
3,355
-0,016
-0,160
0,000
0,020
-0,093
0,014
-1,426
0,059
-0,041
0,004
-0,371
4,008
-10,761
-0,117
0,531
0,999
-0,040
-0,124
-0,269
0,455
1,857
0,276
0,332
-0,070
-0,035
-0,092
0,257
-6,867
0,330
0,289
0,283
0,540
0,633
0,616
0,421
0,181
-0,081
P-value
0,000
0,000
0,003
0,924
0,000
0,006
0,000
0,156
0,000
0,017
0,000
0,000
0,000
0,000
0,002
0,000
0,000
0,829
0,228
0,001
0,000
0,000
0,000
0,000
0,131
0,481
0,111
0,000
0,454
0,000
0,027
0,000
0,000
0,000
0,000
0,000
0,014
0,387
Default
Coef.
-11,811
-0,015
-0,120
-0,018
0,035
-0,103
0,044
-0,104
0,405
-0,356
0,021
-1,178
16,726
-16,240
-1,232
1,864
1,830
1,890
-0,716
-1,082
-0,599
3,332
0,963
0,842
-0,189
-3,329
-1,691
-1,046
-5,800
-0,257
1,079
-0,039
1,153
1,378
1,035
0,615
-0,058
-0,707
Table 36: Coefficient estimates and p-values MNL5 model before variable selection.
P-value
0,000
0,000
0,580
0,018
0,000
0,438
0,000
0,972
0,000
0,001
0,000
0,000
0,000
0,002
0,000
0,018
0,000
0,025
0,088
0,030
0,336
0,000
0,000
0,000
0,307
0,000
0,000
0,000
0,426
0,391
0,081
0,876
0,000
0,000
0,000
0,029
0,859
0,080
                    Prepayment          Default
Variable            Coef.     P-value   Coef.     P-value
C                   -5,418    0,000     -4,431    0,000
FICO score          0,001     0,000     -0,011    0,000
Firsthome           -0,110    0,012     -0,020    0,898
Mortgage ins        0,000     0,886     -0,003    0,600
DTI                 -0,007    0,000     0,029     0,000
Loansize            0,464     0,000     0,393     0,000
LTV                 -0,003    0,002     0,029     0,000
Penalty             -0,648    0,266     0,991     0,326
Unemployment        -0,096    0,000     0,077     0,001
Divorcerate         0,104     0,000     -0,019    0,714
Houseprice          -0,001    0,183     0,001     0,697
Refinancing         -0,685    0,000     -0,499    0,000
Dummy FICO          1,318     0,000     -24,371   0,992
Dummy Firsthome     -0,041    0,218     -0,330    0,012
Dummy Insurance     0,694     0,000     0,314     0,539
Dummy DTI           -0,387    0,000     1,427     0,000
Dummy Penalty       0,093     0,567     0,617     0,185
Dummy Unempl        -0,289    0,000     0,439     0,167
Dummy divorcerate   0,233     0,000     0,183     0,433
Dummy Houseprice    -0,227    0,007     -0,405    0,230
Dummy Refinancing   0,393     0,308     1,239     0,097
Propertytype PUD    -0,307    0,000     0,593     0,000
Propertytype CS     -0,107    0,002     0,391     0,002
Propertytype MH     0,136     0,000     0,044     0,749
Propertytype SFH    -0,039    0,339     -0,493    0,003
Purpose No Cash     -0,148    0,004     -0,295    0,114
Purpose Purchase    -0,220    0,000     0,216     0,111
Region 2            -0,456    0,315     -17,015   0,996
Region 3            0,067     0,247     -0,210    0,307
Region 4            -0,817    0,000     0,172     0,648
Region 5            -0,009    0,863     -0,170    0,308
Loan Age 2-3Y       0,550     0,000     0,884     0,000
Loan Age 3-4Y       0,365     0,000     1,170     0,000
Loan Age 4-6Y       0,201     0,000     1,077     0,000
Loan Age 6-8Y       0,092     0,088     0,940     0,000
Loan Age 8-10Y      0,026     0,686     0,442     0,078
Loan Age 10-15Y     -0,423    0,000     0,010     0,976
Table 37: Coefficient estimates and p-values Markov3 model before variable selection.
Part.Prep
C
FICO score
Firsthome
Mortgage insurance
DTI
Loansize
LTV
Penalty
Unemployment
Divorcerate
Houseprice
Refinancing
Dummy FICO
Dummy Firsthome
Dummy Insurance
Dummy DTI
Dummy Penalty
Dummy Unempl
Dummy divorcerate
Dummy Houseprice
Dummy Refinancing
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 2
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Part.Prep
Coef.
-6,513
0,006
-0,153
-0,013
-0,002
-0,520
-0,002
-13,338
0,119
-0,069
0,001
0,113
3,667
-0,545
0,522
0,138
0,352
1,179
-0,139
0,469
-14,080
-0,506
-0,151
-0,082
-0,294
0,186
0,138
-13,646
0,418
-0,420
0,609
-0,651
-0,338
-0,187
-0,120
0,210
0,582
P-value
0,000
0,000
0,293
0,010
0,529
0,000
0,476
0,985
0,000
0,219
0,751
0,027
0,003
0,000
0,122
0,601
0,491
0,000
0,581
0,101
0,978
0,000
0,164
0,527
0,046
0,257
0,304
0,985
0,075
0,514
0,003
0,000
0,045
0,241
0,499
0,274
0,005
Prep.
Coef.
-16,959
0,003
-0,359
0,003
-0,003
0,337
0,005
2,011
-0,144
-0,046
-0,005
-0,319
2,148
-0,037
0,894
-0,212
0,899
-0,253
-0,070
-1,106
2,014
-0,265
-0,290
0,307
0,200
0,234
-0,058
-12,985
-0,173
-0,673
0,081
12,397
12,417
12,494
12,630
12,107
12,528
P-value
0,797
0,032
0,129
0,715
0,590
0,024
0,381
0,078
0,001
0,597
0,102
0,000
0,133
0,845
0,029
0,682
0,157
0,535
0,852
0,051
0,017
0,153
0,120
0,156
0,376
0,413
0,801
0,986
0,615
0,525
0,772
0,851
0,851
0,850
0,848
0,855
0,850
Delinq.
Coef.
8,490
-0,018
-0,014
0,002
-0,003
-0,310
-0,003
-11,432
0,086
0,052
-0,001
-0,006
-11,236
-0,122
-11,557
0,310
0,627
-0,765
-0,398
-0,380
-10,181
-0,105
0,094
-0,125
-0,511
-0,158
-0,685
-10,803
0,731
0,754
0,283
0,194
-0,164
0,107
-0,094
-0,998
0,402
P-value
0,000
0,000
0,969
0,861
0,753
0,246
0,720
0,987
0,145
0,666
0,801
0,960
0,000
0,672
0,933
0,662
0,580
0,507
0,504
0,666
0,983
0,727
0,745
0,683
0,145
0,659
0,055
0,988
0,212
0,410
0,592
0,646
0,727
0,805
0,841
0,125
0,476
Table 38: Coefficient estimates and p-values Markov5(1) model before variable selection.
Delinq.
C
FICO score
Firsthome
Mortgage ins
DTI
Loansize
LTV
Penalty
Unemployment
Divorcerate
Houseprice
Refinancing
Dummy FICO
Dummy Firsthome
Dummy Insurance
Dummy DTI
Dummy Penalty
Dummy Unempl
Dummy divorcerate
Dummy Houseprice
Dummy Refinancing
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Pa.Prep
Coef.
-2,441
0,000
0,366
-0,007
-0,008
0,136
-0,004
-10,922
0,029
0,019
0,000
-0,039
-0,075
0,028
0,171
-0,428
-0,899
0,126
0,050
0,702
0,266
-0,225
-0,114
-0,068
-0,014
-0,403
-0,508
-0,260
0,622
0,076
0,535
0,677
0,933
1,108
1,185
1,262
P-value
0,047
0,894
0,090
0,374
0,139
0,323
0,464
0,979
0,414
0,826
0,918
0,590
0,956
0,870
0,737
0,301
0,389
0,737
0,896
0,145
0,746
0,212
0,530
0,727
0,946
0,116
0,011
0,466
0,215
0,795
0,036
0,008
0,000
0,000
0,000
0,001
Prep.
Coef.
-1,719
0,005
-0,593
-0,012
-0,007
0,322
-0,021
-9,925
-0,484
-0,006
-0,003
-0,299
-4,189
-0,082
-0,954
-0,346
-7,185
-4,491
0,994
-0,321
-5,377
-0,411
-0,033
-0,247
-1,006
-1,496
-1,574
-0,689
0,276
-0,855
0,620
0,698
-0,198
-0,689
-0,425
-6,381
P-value
0,565
0,102
0,452
0,551
0,639
0,387
0,082
0,981
0,000
0,978
0,587
0,130
0,935
0,849
0,410
0,775
0,798
0,001
0,340
0,794
0,879
0,369
0,942
0,596
0,075
0,160
0,003
0,314
0,822
0,146
0,192
0,160
0,735
0,430
0,706
0,578
Delinq.
Coef.
-1,049
-0,003
-0,206
0,000
0,016
-0,168
0,007
-12,112
0,063
-0,058
0,002
-0,184
-2,022
0,033
0,188
0,600
-0,579
-0,225
-0,201
0,728
1,959
0,277
0,327
0,054
-0,020
-0,004
0,024
0,330
-0,058
0,215
0,454
0,502
0,589
0,353
0,828
0,057
P-value
0,087
0,000
0,085
0,984
0,000
0,020
0,043
0,976
0,000
0,148
0,084
0,000
0,006
0,686
0,492
0,006
0,123
0,307
0,270
0,003
0,000
0,002
0,000
0,594
0,856
0,971
0,806
0,057
0,835
0,158
0,000
0,000
0,000
0,005
0,000
0,784
Default
Coef.
-8,222
-0,003
-3,144
-0,016
0,048
0,400
0,031
-2,677
0,273
0,101
0,006
0,116
-9,135
-4,980
-3,321
-2,824
-3,876
5,123
1,648
-3,757
4,286
-0,614
-1,013
1,044
-5,216
-5,198
-0,822
-1,362
-6,005
-1,548
-0,714
-5,659
-0,433
0,233
-4,448
-3,545
Table 39: Coefficient estimates and p-values Markov5(3) model before variable selection.
P-value
0,234
0,601
0,154
0,701
0,110
0,581
0,485
0,995
0,117
0,787
0,717
0,783
0,849
0,293
0,850
0,822
0,869
0,531
0,384
0,649
0,116
0,474
0,281
0,246
0,318
0,427
0,492
0,301
0,697
0,068
0,446
0,238
0,636
0,819
0,599
0,720
Contr.Paym.
C
FICO score
Firsthome
Mortgage insurance
DTI
Loansize
LTV
Penalty
Unemployment
Divorcerate
Houseprice
Refinancing
Dummy FICO
Dummy Firsthome
Dummy Insurance
Dummy DTI
Dummy Penalty
Dummy Unempl
Dummy divorcerate
Dummy Houseprice
Dummy Refinancing
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 2
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Pa.Prep
Coef.
-6,266
-0,001
0,074
0,007
-0,001
0,102
-0,002
-0,230
0,245
0,019
0,001
0,099
0,055
0,231
0,348
-0,228
-0,104
2,680
0,081
0,757
-1,285
-0,058
-0,149
0,116
0,442
0,279
0,551
-0,441
0,106
-0,205
0,056
0,986
1,119
1,248
1,327
1,567
2,146
P-value
0,000
0,000
0,049
0,000
0,204
0,000
0,008
0,518
0,000
0,200
0,007
0,000
0,835
0,000
0,001
0,007
0,574
0,000
0,236
0,000
0,000
0,057
0,000
0,001
0,000
0,000
0,000
0,199
0,038
0,120
0,197
0,000
0,000
0,000
0,000
0,000
0,000
Prep.
Coef.
-5,327
0,001
-0,089
0,000
-0,007
0,476
-0,003
-1,034
-0,087
0,103
-0,001
-0,706
1,160
-0,041
0,697
-0,392
0,078
-0,148
0,211
-0,203
-0,131
-0,301
-0,094
0,147
-0,029
-0,130
-0,192
-0,429
0,086
-0,816
-0,003
0,539
0,362
0,186
0,074
0,033
-0,445
P-value
0,000
0,000
0,047
0,857
0,000
0,000
0,001
0,146
0,000
0,000
0,229
0,000
0,000
0,229
0,000
0,000
0,648
0,085
0,001
0,019
0,796
0,000
0,008
0,000
0,484
0,014
0,000
0,345
0,146
0,000
0,955
0,000
0,000
0,000
0,183
0,624
0,000
Delinq.
Coef.
3,204
-0,015
-0,111
0,002
0,014
-0,010
0,010
-1,018
0,043
-0,007
0,002
-0,294
-10,068
-0,106
0,480
0,815
0,235
0,400
-0,102
0,185
0,871
0,169
0,187
-0,074
0,036
-0,112
0,284
-18,354
0,207
0,344
0,227
0,229
0,332
0,264
0,161
-0,255
-0,232
P-value
0,000
0,000
0,094
0,260
0,000
0,824
0,000
0,311
0,000
0,740
0,000
0,000
0,000
0,036
0,002
0,000
0,290
0,002
0,311
0,150
0,005
0,001
0,000
0,219
0,570
0,145
0,000
0,996
0,033
0,036
0,006
0,000
0,000
0,000
0,026
0,012
0,067
Default
Coef.
-4,157
-0,012
0,005
-0,002
0,029
0,399
0,030
0,918
0,082
-0,021
0,001
-0,530
-24,828
-0,315
0,354
1,449
0,717
0,522
0,163
-0,369
0,518
0,627
0,433
0,024
-0,453
-0,262
0,255
-17,083
-0,134
0,258
-0,100
0,977
1,295
1,173
1,026
0,542
0,129
Table 40: Coefficient estimates and p-values Markov5(5) model before variable selection.
P-value
0,000
0,000
0,977
0,706
0,000
0,000
0,000
0,364
0,001
0,690
0,761
0,000
0,992
0,017
0,490
0,000
0,125
0,106
0,491
0,279
0,615
0,000
0,001
0,863
0,006
0,162
0,063
0,996
0,525
0,497
0,564
0,000
0,000
0,000
0,000
0,033
0,700
                                       Prepayment
Variable                               Coef.     P-value
C                                      -12,456   0,000
FICO score                             -0,002    0,021
Firsthome                              -0,434    0,008
Mortgage insurance                     -0,020    0,001
DTI                                    -0,002    0,580
Loansize                               -0,168    0,071
LTV                                    0,015     0,000
Penalty                                -1,308    0,741
Unemployment                           0,130     0,007
Divorcerate                            -0,229    0,095
Houseprice                             0,024     0,000
Refinancing                            -1,435    0,000
Dummy Loan ID                          20,253    0,000
Dummy FICO                             -2,087    0,201
Dummy Firsthome                        -1,092    0,000
Dummy Mortgage Insurance               2,740     0,000
Dummy DTI                              0,247     0,569
Dummy Penalty                          1,873     0,004
Dummy Unempl                           -3,981    0,000
Dummy divorcerate                      -0,827    0,187
Dummy Houseprice                       -2,755    0,000
Dummy Refinancing                      2,635     0,000
Propertytype Planned Unit Development  -0,060    0,678
Propertytype Cooperative Share         0,362     0,008
Propertytype Manufactured Housing      -0,407    0,001
Propertytype Single Family Home        -4,739    0,000
Purpose No Cash-out Refinance          -2,112    0,000
Purpose Purchase                       -2,299    0,000
Region 2                               -2,272    0,271
Region 3                               -0,224    0,344
Region 4                               -0,323    0,536
Region 5                               -0,011    0,955
Loan Age 2-3Y                          0,363     0,025
Loan Age 3-4Y                          -0,455    0,010
Loan Age 4-6Y                          -1,202    0,000
Loan Age 6-8Y                          -2,316    0,000
Loan Age 8-10Y                         -2,708    0,000
Loan Age 10-15Y                        -3,432    0,000
Table 41: Coefficient estimates and p-values Hausman test three state MNL model (defaults removed).
C
FICO score
Firsthome
Mortgage insurance
DTI
Loansize
LTV
Penalty
Unemployment
Divorcerate
Houseprice
Refinancing
Dummy Loan ID
Dummy FICO
Dummy Firsthome
Dummy Insurance
Dummy DTI
Dummy Penalty
Dummy Unempl
Dummy divorcerate
Dummy Houseprice
Dummy Refinancing
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 2
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Pa.Prep
Coef.
-4,717
-0,001
0,023
0,003
0,000
0,059
-0,001
-0,590
0,022
0,063
-0,001
-0,110
5,032
-0,449
-0,017
0,246
-0,078
-0,005
0,022
0,303
0,180
-0,114
-0,041
-0,048
0,041
0,159
0,068
0,030
-0,427
0,074
-0,073
0,125
1,172
1,317
1,452
1,637
1,734
2,049
P-value
0,000
0,000
0,537
0,015
0,884
0,012
0,512
0,128
0,000
0,000
0,165
0,000
0,000
0,084
0,557
0,010
0,335
0,976
0,727
0,000
0,016
0,667
0,180
0,108
0,230
0,000
0,124
0,386
0,241
0,160
0,570
0,005
0,000
0,000
0,000
0,000
0,000
0,000
Prep.
Coef.
-18,784
-0,001
-0,205
-0,017
-0,006
-0,065
0,010
-1,196
0,178
-0,230
0,019
-1,157
22,328
-0,919
-0,900
2,464
-0,143
1,658
-2,598
-0,845
-0,276
2,210
-0,036
0,269
-0,265
-2,978
-1,606
-1,714
-1,293
0,099
0,082
0,250
0,885
0,573
0,068
-0,189
-0,544
-0,885
P-value
0,000
0,365
0,187
0,003
0,148
0,471
0,007
0,673
0,000
0,035
0,000
0,000
0,000
0,470
0,000
0,001
0,733
0,078
0,000
0,085
0,623
0,001
0,795
0,044
0,046
0,000
0,000
0,000
0,395
0,663
0,875
0,201
0,000
0,001
0,684
0,364
0,018
0,000
Delinq.
Coef.
3,357
-0,016
-0,160
0,000
0,020
-0,093
0,014
-1,430
0,059
-0,041
0,004
-0,371
4,012
-10,765
-0,116
0,531
0,999
-0,040
-0,125
-0,269
0,456
1,857
0,276
0,332
-0,070
-0,035
-0,092
0,258
-10,704
0,329
0,287
0,282
0,540
0,633
0,616
0,421
0,181
-0,083
P-value
0,000
0,000
0,003
0,927
0,000
0,006
0,000
0,155
0,000
0,017
0,000
0,000
0,000
0,000
0,003
0,000
0,000
0,828
0,226
0,001
0,000
0,000
0,000
0,000
0,133
0,483
0,111
0,000
0,864
0,000
0,028
0,000
0,000
0,000
0,000
0,000
0,014
0,381
Table 42: Coefficient estimates and p-values Hausman test five state MNL model (defaults removed).
                                       Prepayment          Default
Variable                               Coef.     P-value   Coef.     P-value
Dummy Loan ID                          20,229    0,000     17,765    0,000
Dummy Firsthome                        -1,049    0,000     -1,516    0,000
Dummy Mortgage Insurance               2,717     0,000     1,822     0,008
Dummy Penalty                          1,735     0,006     1,924     0,006
Dummy Unempl                           -4,572    0,000     -3,884    0,000
Dummy Houseprice                       -2,835    0,000     -3,375    0,000
Dummy Refinancing                      2,948     0,000     4,284     0,000
Propertytype Planned Unit Development  -0,039    0,772     1,205     0,000
Propertytype Cooperative Share         0,324     0,013     1,012     0,000
Propertytype Manufactured Housing      -0,358    0,004     -0,236    0,166
Propertytype Single Family Home        -4,396    0,000     -4,561    0,000
Purpose No Cash-out Refinance          -1,991    0,000     -1,871    0,000
Purpose Purchase                       -2,108    0,000     -1,564    0,000
Region 2                               -2,108    0,276     -5,363    0,238
Region 3                               -0,198    0,379     -0,544    0,061
Region 4                               -0,534    0,300     0,762     0,193
Region 5                               0,022     0,907     -0,152    0,527
Loan Age 2-3Y                          0,326     0,036     0,694     0,002
Loan Age 3-4Y                          -0,522    0,002     0,401     0,086
Loan Age 4-6Y                          -1,194    0,000     -0,187    0,415
Loan Age 6-8Y                          -2,249    0,000     -1,202    0,000
Loan Age 8-10Y                         -2,606    0,000     -1,922    0,000
Loan Age 10-15Y                        -3,290    0,000     -2,798    0,000
Table 43: Coefficient estimates and p-values categorical variables MNL3 model.
Dummy LoanID
Dummy Firsthome
Dummy Insurance
Dummy Penalty
Dummy Unempl
Dummy divorcerate
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 2
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Pa.Prep
Coef.
5,013
-0,025
0,227
-0,046
0,039
0,343
-0,109
-0,030
0,027
0,159
0,062
0,008
-0,439
0,085
-0,064
0,130
1,177
1,323
1,461
1,645
1,741
2,063
P-value
0,000
0,365
0,016
0,783
0,527
0,000
0,681
0,225
0,419
0,000
0,150
0,824
0,229
0,105
0,620
0,003
0,000
0,000
0,000
0,000
0,000
0,000
Prep.
Coef.
19,403
-0,849
2,407
1,719
-2,549
-1,024
2,302
0,286
-0,171
-2,921
-1,562
-1,610
-1,321
0,104
-0,090
0,249
0,878
0,533
0,104
-0,212
-0,541
-0,841
P-value
0,000
0,000
0,000
0,034
0,000
0,021
0,001
0,006
0,166
0,000
0,000
0,000
0,388
0,633
0,861
0,176
0,000
0,001
0,519
0,289
0,014
0,000
Delinq.
Coef.
3,587
-0,098
0,572
0,138
-0,019
-0,238
2,432
0,227
0,010
0,009
0,053
0,245
-7,951
0,231
0,763
0,398
0,528
0,620
0,626
0,485
0,247
0,019
P-value
0,000
0,009
0,000
0,424
0,855
0,001
0,000
0,000
0,820
0,855
0,338
0,000
0,529
0,002
0,000
0,000
0,000
0,000
0,000
0,000
0,001
0,835
Default
Coef.
17,244
-1,047
1,793
1,910
-0,840
-1,122
4,145
0,376
-0,209
-3,304
-1,709
-1,117
-6,588
-0,206
1,310
0,263
1,243
1,338
0,984
0,625
-0,080
-0,623
Table 44: Coefficient estimates and p-values categorical variables MNL5 model.
P-value
0,000
0,000
0,031
0,035
0,052
0,024
0,000
0,010
0,241
0,000
0,000
0,000
0,543
0,486
0,031
0,285
0,000
0,000
0,000
0,026
0,808
0,118
                                       Prepayment          Default
Variable                               Coef.     P-value   Coef.     P-value
Dummy FICO                             1,319     0,000     -24,687   0,993
Dummy Mortgage Insurance               0,711     0,000     0,243     0,633
Dummy DTI                              -0,380    0,000     1,498     0,000
Dummy Unempl                           -0,267    0,001     0,487     0,110
Dummy divorcerate                      0,213     0,001     0,185     0,425
Dummy Houseprice                       -0,157    0,011     -0,524    0,025
Propertytype Planned Unit Development  -0,316    0,000     0,550     0,000
Propertytype Cooperative Share         -0,119    0,000     0,313     0,011
Propertytype Manufactured Housing      0,137     0,000     0,054     0,693
Propertytype Single Family Home        -0,040    0,323     -0,481    0,003
Purpose No Cash-out Refinance          -0,146    0,005     -0,286    0,124
Purpose Purchase                       -0,220    0,000     0,221     0,101
Region 2                               -0,459    0,312     -17,317   0,996
Region 3                               0,067     0,245     -0,200    0,328
Region 4                               -0,814    0,000     0,154     0,683
Region 5                               -0,008    0,867     -0,147    0,379
Loan Age 2-3Y                          0,543     0,000     0,900     0,000
Loan Age 3-4Y                          0,358     0,000     1,200     0,000
Loan Age 4-6Y                          0,193     0,000     1,117     0,000
Loan Age 6-8Y                          0,083     0,111     0,982     0,000
Loan Age 8-10Y                         0,016     0,800     0,474     0,057
Loan Age 10-15Y                        -0,435    0,000     0,024     0,942
Table 45: Coefficient estimates and p-values categorical variables Markov3 model.
Part.Prep
Dummy Insurance
Dummy Refinancing
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 2
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Part.Prep
Coef.
0,179
-13,642
-0,502
-0,151
-0,095
-0,340
0,076
-0,076
-13,749
0,417
-0,355
0,560
-0,625
-0,244
-0,068
0,003
0,331
0,784
P-value
0,582
0,979
0,000
0,111
0,444
0,015
0,624
0,534
0,986
0,071
0,570
0,005
0,000
0,140
0,663
0,985
0,077
0,000
Prep.
Coef.
1,107
1,694
-0,311
-0,300
0,354
0,260
0,223
0,075
-12,992
-0,140
-0,602
0,113
12,485
12,459
12,499
12,604
12,093
12,484
P-value
0,004
0,036
0,058
0,072
0,093
0,233
0,428
0,722
0,987
0,683
0,567
0,685
0,857
0,858
0,857
0,856
0,862
0,857
Delinq.
Coef.
-11,505
-11,426
0,044
0,152
-0,068
-0,476
0,080
-0,766
-11,847
0,702
1,109
0,416
0,197
-0,239
0,050
-0,022
-1,007
0,152
P-value
0,942
0,983
0,861
0,531
0,814
0,156
0,814
0,020
0,988
0,221
0,213
0,423
0,634
0,601
0,905
0,962
0,114
0,778
Table 46: Coefficient estimates and p-values categorical variables Markov5(1) model.
Delinq.
Dummy Unempl
Dummy divorcerate
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Pa.Prep
Coef.
0,533
0,230
-0,268
-0,148
-0,113
-0,019
-0,465
-0,642
-0,124
0,502
0,106
0,496
0,629
0,903
1,099
1,174
1,362
P-value
0,095
0,161
0,095
0,360
0,541
0,922
0,062
0,001
0,710
0,293
0,703
0,051
0,013
0,000
0,000
0,000
0,000
Prep.
Coef.
-4,051
0,985
-0,249
-0,003
-0,227
-0,896
-1,539
-1,490
-0,518
0,133
-0,799
0,549
0,528
-0,315
-0,780
-0,530
-9,009
P-value
0,001
0,021
0,564
0,994
0,611
0,099
0,143
0,003
0,433
0,910
0,167
0,244
0,275
0,578
0,351
0,633
0,834
Delinq.
Coef.
0,182
0,093
0,299
0,338
0,059
-0,027
0,032
-0,039
0,117
-0,155
0,034
0,502
0,555
0,696
0,486
0,963
0,300
P-value
0,299
0,260
0,000
0,000
0,541
0,798
0,784
0,675
0,470
0,568
0,810
0,000
0,000
0,000
0,000
0,000
0,128
Default
Coef.
1,300
0,447
-0,502
-0,952
0,673
-8,178
-7,929
-0,506
-1,772
-9,436
-1,801
-0,602
-8,205
-0,216
0,253
-7,509
-7,173
Table 47: Coefficient estimates and p-values categorical variables Markov5(3) model.
P-value
0,362
0,615
0,540
0,280
0,392
0,686
0,751
0,591
0,146
0,877
0,020
0,519
0,661
0,799
0,785
0,820
0,863
Contr.Pay
Dummy FICO
Dummy Insurance
Dummy DTI
Dummy divorcerate
Dummy Houseprice
Propertytype PUD
Propertytype CS
Propertytype MH
Propertytype SFH
Purpose No Cash
Purpose Purchase
Region 2
Region 3
Region 4
Region 5
Loan Age 2-3Y
Loan Age 3-4Y
Loan Age 4-6Y
Loan Age 6-8Y
Loan Age 8-10Y
Loan Age 10-15Y
Pa.Prep
Coef.
-0,055
0,077
-0,180
0,301
1,316
-0,012
-0,044
0,054
0,213
0,101
0,154
-0,323
0,081
-0,116
0,067
1,031
1,149
1,287
1,411
1,547
1,889
P-value
0,834
0,441
0,031
0,000
0,000
0,689
0,113
0,103
0,000
0,019
0,000
0,344
0,108
0,368
0,115
0,000
0,000
0,000
0,000
0,000
0,000
Prep.
Coef.
1,163
0,723
-0,388
0,185
-0,189
-0,311
-0,106
0,151
-0,023
-0,123
-0,177
-0,435
0,087
-0,813
-0,003
0,530
0,350
0,174
0,065
0,021
-0,455
P-value
0,000
0,000
0,000
0,004
0,001
0,000
0,002
0,000
0,575
0,020
0,000
0,339
0,142
0,000
0,950
0,000
0,000
0,000
0,228
0,752
0,000
Delinq.
Coef.
-10,141
0,331
0,866
-0,001
-0,018
0,154
0,153
-0,073
0,032
-0,118
0,251
-18,745
0,205
0,373
0,234
0,270
0,389
0,334
0,226
-0,205
-0,202
P-value
0,000
0,030
0,000
0,991
0,823
0,002
0,002
0,217
0,614
0,121
0,000
0,997
0,035
0,023
0,004
0,000
0,000
0,000
0,001
0,041
0,107
Default
Coef.
-25,310
0,270
1,533
0,180
-0,264
0,585
0,358
0,041
-0,453
-0,264
0,235
-17,519
-0,126
0,248
-0,077
1,000
1,334
1,221
1,070
0,570
0,100
Table 48: Coefficient estimates and p-values categorical variables Markov5(5) model.
P-value
0,993
0,595
0,000
0,440
0,166
0,000
0,004
0,768
0,005
0,157
0,083
0,997
0,550
0,514
0,657
0,000
0,000
0,000
0,000
0,024
0,764
| Variable | Prior Crisis Prep Coef. | Prior Crisis Prep P-value | Prior Crisis Default Coef. | Prior Crisis Default P-value | Post Crisis Prep Coef. | Post Crisis Prep P-value | Post Crisis Default Coef. | Post Crisis Default P-value |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Dummy Loan ID | 20,965 | 0,000 | 17,778 | 0,000 | 18,743 | 0,000 | 16,852 | 0,000 |
| Dummy Firsthome | 0,440 | 0,330 | -0,268 | 0,554 | -0,595 | 0,000 | -0,837 | 0,000 |
| Dummy Insurance | 0,499 | 0,651 | 0,134 | 0,907 | -1,909 | 0,874 | 0,722 | 0,956 |
| Dummy Penalty | 0,209 | 0,919 | 0,380 | 0,852 | 1,974 | 0,045 | 2,332 | 0,027 |
| Dummy Unempl | 0,761 | 0,641 | -0,807 | 0,605 | -5,133 | 0,000 | -4,333 | 0,000 |
| Dummy Houseprice | 0,066 | 0,968 | 2,539 | 0,108 | -3,167 | 0,000 | -4,191 | 0,000 |
| Dummy Refinancing | 0,718 | 0,842 | -1,778 | 0,738 | -2,108 | 0,716 | 5,376 | 0,000 |
| Propertytype PUD | -0,348 | 0,447 | 0,860 | 0,059 | -0,379 | 0,023 | 0,887 | 0,000 |
| Propertytype CS | -0,179 | 0,689 | 0,732 | 0,101 | 0,080 | 0,618 | 0,583 | 0,010 |
| Propertytype MH | -0,119 | 0,818 | 0,468 | 0,368 | 0,061 | 0,705 | -0,304 | 0,180 |
| Propertytype SFH | 0,350 | 0,527 | 0,223 | 0,693 | -4,273 | 0,000 | -4,530 | 0,000 |
| Purpose No Cash | -0,041 | 0,952 | 0,428 | 0,531 | -1,427 | 0,000 | -1,690 | 0,000 |
| Purpose Purchase | -0,245 | 0,635 | 0,597 | 0,248 | -1,922 | 0,000 | -1,717 | 0,000 |
| Region 2 | 1,621 | 0,787 | 0,175 | 0,981 | -2,108 | 0,312 | -5,894 | 0,287 |
| Region 3 | 1,207 | 0,179 | 0,913 | 0,313 | -0,101 | 0,695 | -0,582 | 0,096 |
| Region 4 | 0,566 | 0,771 | 1,456 | 0,439 | -0,748 | 0,251 | 0,882 | 0,201 |
| Region 5 | 1,102 | 0,160 | 1,001 | 0,204 | -0,198 | 0,358 | -0,485 | 0,093 |
| Loan Age 2-3Y | -0,080 | 0,860 | 0,225 | 0,620 | 1,000 | 0,000 | 1,871 | 0,000 |
| Loan Age 3-4Y | -0,286 | 0,578 | 0,311 | 0,544 | 0,051 | 0,834 | 1,663 | 0,000 |
| Loan Age 4-6Y | -0,374 | 0,460 | 0,372 | 0,459 | -0,359 | 0,129 | 1,105 | 0,006 |
| Loan Age 6-8Y | -0,425 | 0,573 | 0,431 | 0,562 | -1,190 | 0,000 | 0,175 | 0,688 |
| Loan Age 8-10Y | -0,263 | 0,872 | 0,109 | 0,946 | -1,323 | 0,000 | -0,363 | 0,429 |
| Loan Age 10-15Y | -0,918 | 0,737 | 0,662 | 0,803 | -2,001 | 0,000 | -1,386 | 0,009 |

Table 49: Coefficient estimates and p-values categorical variables MNL3 model with structural breaks.
| Variable | Prepayment Coef. | Prepayment P-value | Default Coef. | Default P-value |
|---|---:|---:|---:|---:|
| C | -13,489 | 0,000 | -17,548 | 0,000 |
| Firsthome | -0,422 | 0,005 | -0,312 | 0,140 |
| Mortgage insurance | -0,013 | 0,017 | -0,006 | 0,425 |
| Loansize | -0,455 | 0,000 | -0,351 | 0,005 |
| LTV | 0,020 | 0,000 | 0,057 | 0,000 |
| Houseprice | 0,017 | 0,000 | 0,016 | 0,000 |
| Refinancing | -0,002 | 0,985 | 0,067 | 0,733 |
| Dummy Loan ID | 20,837 | 0,000 | 18,288 | 0,000 |
| Dummy Firsthome | -1,297 | 0,000 | -1,759 | 0,000 |
| Dummy Mortgage Insurance | 2,452 | 0,000 | 1,423 | 0,035 |
| Dummy Penalty | 2,226 | 0,001 | 2,565 | 0,001 |
| Dummy Unempl | -4,681 | 0,000 | -4,134 | 0,000 |
| Dummy Houseprice | -4,211 | 0,000 | -5,100 | 0,000 |
| Dummy Refinancing | 0,999 | 0,142 | 1,966 | 0,037 |
| Propertytype Planned Unit Development | 0,274 | 0,040 | 1,540 | 0,000 |
| Propertytype Cooperative Share | 0,370 | 0,004 | 1,030 | 0,000 |
| Propertytype Manufactured Housing | -0,549 | 0,000 | -0,439 | 0,010 |
| Propertytype Single Family Home | -4,730 | 0,000 | -4,910 | 0,000 |
| Purpose No Cash-out Refinance | -2,139 | 0,000 | -2,012 | 0,000 |
| Purpose Purchase | -2,344 | 0,000 | -1,814 | 0,000 |
| Region 2 | -1,844 | 0,402 | -5,343 | 0,309 |
| Region 3 | -0,297 | 0,174 | -0,679 | 0,019 |
| Region 4 | -0,417 | 0,432 | 0,873 | 0,154 |
| Region 5 | 0,097 | 0,596 | -0,106 | 0,658 |
| Loan Age 2-3Y | 0,171 | 0,255 | 0,602 | 0,009 |
| Loan Age 3-4Y | 0,044 | 0,788 | 1,084 | 0,000 |
| Loan Age 4-6Y | -0,272 | 0,082 | 0,913 | 0,000 |
| Loan Age 6-8Y | 0,058 | 0,752 | 1,344 | 0,000 |
| Loan Age 8-10Y | -0,340 | 0,094 | 0,593 | 0,048 |
| Loan Age 10-15Y | -1,071 | 0,000 | -0,221 | 0,550 |

Table 50: Coefficient estimates and p-values MNL3 model with put option.
| Variable | Pa.Prep Coef. | Pa.Prep P-value | Prep. Coef. | Prep. P-value | Delinq. Coef. | Delinq. P-value | Default Coef. | Default P-value |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| C | -5,562 | 0,000 | -17,554 | 0,000 | -7,246 | 0,000 | -22,376 | 0,000 |
| Mortgage insurance | 0,004 | 0,000 | -0,011 | 0,036 | 0,009 | 0,000 | -0,008 | 0,275 |
| LTV | 0,001 | 0,453 | 0,011 | 0,002 | 0,017 | 0,000 | 0,040 | 0,000 |
| Unemployment | 0,033 | 0,000 | 0,212 | 0,000 | 0,062 | 0,000 | 0,398 | 0,000 |
| Divorcerate | 0,067 | 0,000 | -0,219 | 0,027 | -0,031 | 0,057 | -0,310 | 0,006 |
| Houseprice | -0,002 | 0,000 | 0,022 | 0,000 | 0,001 | 0,000 | 0,027 | 0,000 |
| Refinancing | 0,043 | 0,464 | 0,043 | 0,680 | 0,122 | 0,000 | 0,114 | 0,479 |
| Dummy LoanID | 4,971 | 0,000 | 19,898 | 0,000 | 3,313 | 0,000 | 17,602 | 0,000 |
| Dummy Firsthome | -0,023 | 0,412 | -0,950 | 0,000 | -0,069 | 0,064 | -1,139 | 0,000 |
| Dummy Insurance | 0,196 | 0,038 | 2,319 | 0,001 | 0,348 | 0,004 | 1,658 | 0,047 |
| Dummy Penalty | 0,001 | 0,996 | 2,130 | 0,012 | 0,428 | 0,012 | 2,428 | 0,010 |
| Dummy Unempl | 0,079 | 0,199 | -2,421 | 0,000 | 0,075 | 0,446 | -0,809 | 0,056 |
| Dummy divorcerate | 0,345 | 0,000 | -1,061 | 0,013 | -0,249 | 0,001 | -1,144 | 0,019 |
| Propertytype PUD | -0,191 | 0,471 | 1,038 | 0,117 | 1,914 | 0,000 | 2,520 | 0,006 |
| Propertytype CS | -0,053 | 0,030 | 0,123 | 0,227 | 0,111 | 0,000 | 0,173 | 0,233 |
| Propertytype MH | 0,020 | 0,550 | -0,229 | 0,052 | -0,035 | 0,436 | -0,283 | 0,106 |
| Propertytype SFH | 0,153 | 0,000 | -3,082 | 0,000 | -0,024 | 0,626 | -3,488 | 0,000 |
| Purpose No Cash | 0,067 | 0,121 | -1,577 | 0,000 | 0,077 | 0,163 | -1,725 | 0,000 |
| Purpose Purchase | 0,022 | 0,520 | -1,672 | 0,000 | 0,313 | 0,000 | -1,183 | 0,000 |
| Region 2 | -0,418 | 0,250 | -0,755 | 0,621 | -8,535 | 0,627 | -6,466 | 0,640 |
| Region 3 | 0,081 | 0,123 | -0,020 | 0,923 | 0,208 | 0,005 | -0,362 | 0,207 |
| Region 4 | -0,040 | 0,757 | 0,091 | 0,861 | 0,886 | 0,000 | 1,507 | 0,014 |
| Region 5 | 0,133 | 0,003 | 0,302 | 0,080 | 0,409 | 0,000 | 0,291 | 0,220 |
| Loan Age 2-3Y | 1,200 | 0,000 | 0,819 | 0,000 | 0,675 | 0,000 | 1,220 | 0,000 |
| Loan Age 3-4Y | 1,369 | 0,000 | 0,959 | 0,000 | 0,890 | 0,000 | 1,846 | 0,000 |
| Loan Age 4-6Y | 1,530 | 0,000 | 0,861 | 0,000 | 1,049 | 0,000 | 1,867 | 0,000 |
| Loan Age 6-8Y | 1,746 | 0,000 | 1,578 | 0,000 | 1,050 | 0,000 | 2,592 | 0,000 |
| Loan Age 8-10Y | 1,879 | 0,000 | 1,308 | 0,000 | 0,957 | 0,000 | 1,966 | 0,000 |
| Loan Age 10-15Y | 2,268 | 0,000 | 1,109 | 0,000 | 1,155 | 0,000 | 1,639 | 0,000 |

Table 51: Coefficient estimates and p-values MNL5 model with put option.
| Variable | Prepayment Coef. | Prepayment P-value | Default Coef. | Default P-value |
|---|---:|---:|---:|---:|
| C | -4,914 | 0,000 | -3,842 | 0,000 |
| FICO score | 0,000 | 0,567 | -0,012 | 0,000 |
| Firsthome | -0,081 | 0,061 | 0,053 | 0,734 |
| DTI | -0,003 | 0,003 | 0,031 | 0,000 |
| Loansize | 0,309 | 0,000 | 0,306 | 0,001 |
| LTV | -0,001 | 0,430 | 0,031 | 0,000 |
| Unemployment | 0,004 | 0,496 | 0,148 | 0,000 |
| Divorcerate | 0,078 | 0,000 | -0,017 | 0,744 |
| Refinancing | 0,097 | 0,007 | 0,144 | 0,288 |
| Dummy FICO | 0,569 | 0,047 | -25,347 | 0,992 |
| Dummy Mortgage Insurance | 0,451 | 0,000 | 0,047 | 0,927 |
| Dummy DTI | -0,223 | 0,016 | 1,607 | 0,000 |
| Dummy Unempl | 0,271 | 0,001 | 0,807 | 0,007 |
| Dummy divorcerate | 0,298 | 0,000 | 0,288 | 0,234 |
| Dummy Houseprice | -0,309 | 0,000 | -0,503 | 0,032 |
| Propertytype Planned Unit Development | -0,306 | 0,000 | 0,558 | 0,000 |
| Propertytype Cooperative Share | -0,227 | 0,000 | 0,232 | 0,059 |
| Propertytype Manufactured Housing | 0,012 | 0,738 | -0,012 | 0,931 |
| Propertytype Single Family Home | -0,122 | 0,003 | -0,520 | 0,001 |
| Purpose No Cash-out Refinance | -0,187 | 0,000 | -0,269 | 0,147 |
| Purpose Purchase | -0,214 | 0,000 | 0,209 | 0,126 |
| Region 2 | -0,395 | 0,383 | -17,180 | 0,996 |
| Region 3 | 0,068 | 0,240 | -0,173 | 0,399 |
| Region 4 | -0,719 | 0,001 | 0,208 | 0,581 |
| Region 5 | 0,003 | 0,952 | -0,123 | 0,461 |
| Loan Age 2-3Y | 0,667 | 0,000 | 1,016 | 0,000 |
| Loan Age 3-4Y | 0,611 | 0,000 | 1,421 | 0,000 |
| Loan Age 4-6Y | 0,618 | 0,000 | 1,457 | 0,000 |
| Loan Age 6-8Y | 0,701 | 0,000 | 1,414 | 0,000 |
| Loan Age 8-10Y | 0,812 | 0,000 | 1,007 | 0,000 |
| Loan Age 10-15Y | 0,876 | 0,000 | 0,950 | 0,003 |

Table 52: Coefficient estimates and p-values Markov3 model with put option.
| Part.Prep | Part.Prep Coef. | Part.Prep P-value | Prep. Coef. | Prep. P-value | Delinq. Coef. | Delinq. P-value |
|---|---:|---:|---:|---:|---:|---:|
| C | 0,542 | 0,212 | -18,061 | 0,777 | -8,937 | 0,000 |
| Loansize | -0,653 | 0,000 | 0,429 | 0,005 | -0,113 | 0,660 |
| Penalty | -13,887 | 0,988 | 1,960 | 0,082 | -12,495 | 0,992 |
| Unemployment | -0,017 | 0,341 | -0,059 | 0,043 | 0,110 | 0,015 |
| Refinancing | -25,440 | 0,000 | 20,099 | 0,000 | 36,802 | 0,000 |
| Dummy Insurance | 0,623 | 0,059 | 0,734 | 0,059 | -12,284 | 0,948 |
| Dummy Refinancing | -14,772 | 0,986 | 1,526 | 0,060 | -12,686 | 0,990 |
| Propertytype PUD | -0,555 | 0,000 | -0,250 | 0,129 | 0,174 | 0,490 |
| Propertytype CS | -0,239 | 0,012 | -0,274 | 0,103 | 0,322 | 0,192 |
| Propertytype MH | -0,131 | 0,297 | 0,368 | 0,080 | 0,042 | 0,886 |
| Propertytype SFH | -0,331 | 0,018 | 0,231 | 0,287 | -0,478 | 0,156 |
| Purpose No Cash | 0,116 | 0,459 | 0,213 | 0,448 | 0,035 | 0,918 |
| Purpose Purchase | -0,106 | 0,385 | 0,080 | 0,703 | -0,672 | 0,042 |
| Region 2 | -13,312 | 0,984 | -12,753 | 0,986 | -11,831 | 0,988 |
| Region 3 | 0,362 | 0,118 | -0,085 | 0,804 | 0,789 | 0,169 |
| Region 4 | -0,348 | 0,578 | -0,535 | 0,610 | 1,172 | 0,188 |
| Region 5 | 0,567 | 0,004 | 0,119 | 0,667 | 0,397 | 0,444 |
| Loan Age 2-3Y | -0,638 | 0,000 | 12,395 | 0,846 | 0,212 | 0,611 |
| Loan Age 3-4Y | -0,268 | 0,105 | 12,459 | 0,845 | -0,184 | 0,688 |
| Loan Age 4-6Y | -0,068 | 0,656 | 12,610 | 0,843 | 0,134 | 0,746 |
| Loan Age 6-8Y | 0,011 | 0,947 | 12,835 | 0,841 | 0,111 | 0,801 |
| Loan Age 8-10Y | 0,240 | 0,177 | 12,443 | 0,845 | -0,753 | 0,230 |
| Loan Age 10-15Y | 0,739 | 0,000 | 12,941 | 0,839 | 0,281 | 0,572 |

Table 53: Coefficient estimates and p-values Markov5(1) model with put option.
| Delinq. | Pa.Prep Coef. | Pa.Prep P-value | Prep. Coef. | Prep. P-value | Delinq. Coef. | Delinq. P-value | Default Coef. | Default P-value |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| C | -1,715 | 0,117 | -6,681 | 0,028 | -1,497 | 0,005 | -5,558 | 0,267 |
| FICO score | 0,000 | 0,721 | 0,007 | 0,025 | -0,002 | 0,001 | 0,000 | 0,948 |
| LTV | -0,004 | 0,357 | -0,024 | 0,013 | 0,008 | 0,003 | 0,017 | 0,568 |
| Unemployment | 0,030 | 0,353 | -0,316 | 0,004 | 0,068 | 0,000 | 0,223 | 0,125 |
| Refinancing | -5,658 | 0,238 | 21,163 | 0,046 | 4,693 | 0,034 | 1,506 | 0,948 |
| Dummy Unempl | 0,536 | 0,090 | -3,525 | 0,005 | 0,370 | 0,033 | 1,260 | 0,377 |
| Dummy divorcerate | 0,193 | 0,246 | 1,208 | 0,008 | 0,124 | 0,135 | 0,401 | 0,650 |
| Propertytype PUD | -0,252 | 0,116 | -0,187 | 0,664 | 0,327 | 0,000 | -0,596 | 0,462 |
| Propertytype CS | -0,143 | 0,376 | -0,010 | 0,982 | 0,341 | 0,000 | -0,955 | 0,278 |
| Propertytype MH | -0,123 | 0,505 | -0,247 | 0,579 | 0,028 | 0,769 | 0,665 | 0,395 |
| Propertytype SFH | -0,018 | 0,924 | -0,803 | 0,142 | -0,030 | 0,775 | -8,250 | 0,689 |
| Purpose No Cash | -0,432 | 0,085 | -1,624 | 0,123 | 0,021 | 0,860 | -8,054 | 0,750 |
| Purpose Purchase | -0,628 | 0,001 | -1,442 | 0,005 | -0,015 | 0,868 | -0,595 | 0,518 |
| Region 3 | -0,136 | 0,683 | -0,411 | 0,536 | 0,145 | 0,372 | -1,736 | 0,154 |
| Region 4 | 0,535 | 0,262 | 0,202 | 0,864 | -0,135 | 0,620 | -9,579 | 0,878 |
| Region 5 | 0,108 | 0,698 | -0,778 | 0,177 | 0,034 | 0,813 | -1,781 | 0,019 |
| Loan Age 2-3Y | 0,499 | 0,049 | 0,583 | 0,216 | 0,529 | 0,000 | -0,636 | 0,496 |
| Loan Age 3-4Y | 0,646 | 0,011 | 0,610 | 0,207 | 0,612 | 0,000 | -8,325 | 0,661 |
| Loan Age 4-6Y | 0,921 | 0,000 | -0,172 | 0,761 | 0,791 | 0,000 | -0,321 | 0,702 |
| Loan Age 6-8Y | 1,135 | 0,000 | -0,611 | 0,458 | 0,622 | 0,000 | 0,117 | 0,897 |
| Loan Age 8-10Y | 1,207 | 0,000 | -0,311 | 0,777 | 1,139 | 0,000 | -7,732 | 0,819 |
| Loan Age 10-15Y | 1,463 | 0,000 | -8,639 | 0,844 | 0,580 | 0,002 | -7,596 | 0,858 |

Table 54: Coefficient estimates and p-values Markov5(3) model with put option.
| Contr.Pay | Pa.Prep Coef. | Pa.Prep P-value | Prep. Coef. | Prep. P-value | Delinq. Coef. | Delinq. P-value | Default Coef. | Default P-value |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| C | -5,397 | 0,000 | -5,464 | 0,000 | 3,501 | 0,000 | -3,948 | 0,000 |
| FICO score | -0,001 | 0,000 | 0,001 | 0,000 | -0,015 | 0,000 | -0,011 | 0,000 |
| Firsthome | 0,051 | 0,167 | -0,096 | 0,031 | -0,098 | 0,141 | -0,009 | 0,956 |
| DTI | 0,000 | 0,945 | -0,004 | 0,001 | 0,014 | 0,000 | 0,032 | 0,000 |
| Loansize | 0,016 | 0,752 | 0,072 | 0,278 | 0,089 | 0,337 | 0,106 | 0,724 |
| LTV | 0,002 | 0,043 | -0,002 | 0,027 | 0,011 | 0,000 | 0,029 | 0,000 |
| Unemployment | 0,096 | 0,000 | -0,058 | 0,000 | 0,027 | 0,002 | 0,082 | 0,000 |
| Divorcerate | 0,049 | 0,000 | 0,099 | 0,000 | 0,012 | 0,543 | -0,011 | 0,837 |
| Refinancing | 0,037 | 0,003 | -0,660 | 0,000 | -0,280 | 0,000 | -0,510 | 0,000 |
| Dummy FICO | -0,031 | 0,906 | 1,338 | 0,000 | -10,146 | 0,000 | -25,254 | 0,994 |
| Dummy Insurance | 0,049 | 0,621 | 0,589 | 0,000 | 0,333 | 0,028 | 0,161 | 0,751 |
| Dummy DTI | -0,161 | 0,053 | -0,271 | 0,005 | 0,860 | 0,000 | 1,658 | 0,000 |
| Dummy divorcerate | 0,314 | 0,000 | 0,232 | 0,000 | -0,003 | 0,971 | 0,236 | 0,309 |
| Dummy Houseprice | 1,324 | 0,000 | -0,138 | 0,014 | -0,018 | 0,820 | -0,230 | 0,224 |
| Propertytype PUD | -0,009 | 0,746 | -0,291 | 0,000 | 0,154 | 0,002 | 0,590 | 0,000 |
| Propertytype CS | -0,038 | 0,170 | -0,067 | 0,047 | 0,154 | 0,002 | 0,362 | 0,004 |
| Propertytype MH | 0,020 | 0,536 | -0,024 | 0,517 | -0,069 | 0,235 | -0,109 | 0,419 |
| Propertytype SFH | 0,208 | 0,000 | -0,066 | 0,114 | 0,032 | 0,616 | -0,471 | 0,004 |
| Purpose No Cash | 0,064 | 0,128 | -0,306 | 0,000 | -0,113 | 0,131 | -0,426 | 0,019 |
| Purpose Purchase | 0,128 | 0,000 | -0,303 | 0,000 | 0,255 | 0,000 | 0,124 | 0,350 |
| Region 2 | -0,328 | 0,337 | -0,508 | 0,264 | -18,818 | 0,997 | -17,635 | 0,997 |
| Region 3 | 0,101 | 0,043 | 0,197 | 0,001 | 0,199 | 0,038 | -0,010 | 0,961 |
| Region 4 | -0,143 | 0,267 | -0,948 | 0,000 | 0,374 | 0,023 | 0,171 | 0,652 |
| Region 5 | 0,073 | 0,087 | 0,035 | 0,495 | 0,232 | 0,005 | -0,055 | 0,749 |
| Loan Age 2-3Y | 1,029 | 0,000 | 0,504 | 0,000 | 0,271 | 0,000 | 0,984 | 0,000 |
| Loan Age 3-4Y | 1,143 | 0,000 | 0,312 | 0,000 | 0,391 | 0,000 | 1,310 | 0,000 |
| Loan Age 4-6Y | 1,278 | 0,000 | 0,126 | 0,004 | 0,335 | 0,000 | 1,193 | 0,000 |
| Loan Age 6-8Y | 1,398 | 0,000 | 0,004 | 0,948 | 0,228 | 0,001 | 1,026 | 0,000 |
| Loan Age 8-10Y | 1,530 | 0,000 | -0,063 | 0,335 | -0,202 | 0,044 | 0,493 | 0,050 |
| Loan Age 10-15Y | 1,864 | 0,000 | -0,585 | 0,000 | -0,198 | 0,112 | -0,008 | 0,982 |

Table 55: Coefficient estimates and p-values Markov5(5) model with put option.
An alternative to the CUSUM tests is Quandt's likelihood ratio (QLR) test, a likelihood ratio test of $H_0: \delta = 0$ versus $H_1: \delta \neq 0$ when the break point $t_b$ is unknown. The QLR test statistic is defined as the maximum Chow test statistic $F_T(\lambda)$, with $\lambda = t_b/T$, defined by

$$F_T(\lambda) = \frac{SSR_{1,T} - \left(SSR_{1,t_b} + SSR_{t_b+1,T}\right)}{\left(SSR_{1,t_b} + SSR_{t_b+1,T}\right)/(T - 2k)} \tag{G.1}$$

in which $SSR_{\tau,t} = \hat{\varepsilon}_{\tau,t}'\hat{\varepsilon}_{\tau,t}$ is the sum of squared residuals (SSR) from the model using observations $\tau, \ldots, t$.
The maximum is taken over a range of break dates $t_0, \ldots, t_1$:

$$\sup F = \max_{t_b \in [t_0, \ldots, t_1]} F_T\!\left(\frac{t_b}{T}\right) = \max_{\lambda \in [\lambda_0, \ldots, \lambda_1]} F_T(\lambda) \tag{G.2}$$

in which $\lambda_i = t_i/T$, $i = 0, 1$, are trimming parameters. If nothing is known about the break date, the parameters can be set to $\lambda_0 = 0.15$ and $\lambda_1 = 0.85$ (Andrews, 1993). The trimming ensures that enough observations remain before and after the break to obtain reliable parameter estimates. Each individual $F_T(\lambda)$ follows a $\chi^2(k)$ distribution. Because $F_T(\lambda_1)$ and $F_T(\lambda_2)$ may be correlated, the distribution of $\sup F$ is complicated to derive analytically. Andrews (1993) derived the distribution of $\sup F$ numerically and showed that it depends on both $k$ and $\lambda$.
The location of $\sup F$ coincides with the location where the residual variance is minimized. This location can be determined by estimating the model for each possible break date, $\lambda_0 T < t_b < \lambda_1 T$, and computing the variance of the residuals at each point in time,

$$\hat{\sigma}^2(\lambda) = \frac{1}{T} \sum_{t=1}^{T} \hat{\varepsilon}_t^2(\lambda). \tag{G.3}$$

An estimate
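The search over candidate break dates can be sketched in a few lines of Python. This is a minimal illustration and not the thesis implementation: it assumes a linear regression estimated by OLS, whereas the prepayment models in this thesis are logit-type models for which a likelihood-based quantity would take the place of the SSR. The function names `ssr` and `qlr_sup_f`, the simulated data and the break position are all hypothetical. The Chow statistic is computed in its standard SSR form without dividing the numerator by $k$, which matches the $\chi^2(k)$ scaling mentioned above.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def qlr_sup_f(y, X, trim=0.15):
    """Return (supF, break index) over the trimmed range of break dates.

    A break index tb means observations 0..tb-1 form the first regime.
    """
    T, k = X.shape
    t0 = int(np.ceil(trim * T))          # first candidate break date
    t1 = int(np.floor((1 - trim) * T))   # last candidate break date
    ssr_full = ssr(y, X)                 # restricted model: no break
    best_f, best_tb = -np.inf, None
    for tb in range(t0, t1 + 1):
        # Unrestricted model: separate fits before and after tb.
        ssr_split = ssr(y[:tb], X[:tb]) + ssr(y[tb:], X[tb:])
        f_stat = (ssr_full - ssr_split) / (ssr_split / (T - 2 * k))
        if f_stat > best_f:
            best_f, best_tb = f_stat, tb
    return best_f, best_tb


# Simulated data (assumption): intercept shifts from 1 to 3 at t = 100.
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
intercept = np.where(np.arange(T) < 100, 1.0, 3.0)
y = intercept + 0.5 * x + rng.normal(scale=0.5, size=T)

sup_f, tb_hat = qlr_sup_f(y, X)
print(f"supF = {sup_f:.1f}, estimated break index = {tb_hat}")
```

Because the full-sample SSR is the same for every candidate date, the date that maximizes the statistic is also the date that minimizes the split-sample SSR, which is the link to the residual variance described above. The resulting supF value has to be compared with the critical values of Andrews (1993), not with $\chi^2(k)$ quantiles.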
References
Federal Housing Finance Agency. The size of the affordable mortgage market: 2015-2017 enterprise
single-family housing goals. Technical report, Federal Housing Finance Agency, 2014.
P.D. Allison. Measures of fit for logistic regression. SAS Global Forum, Paper 1485, 2014.
D.W.K. Andrews. Tests for parameter instability and structural change with unknown change point.
Econometrica, 61(4):821–856, 1993.
E.K. Baldvinsdottir and L. Palmborg. On constructing a market consistent economic scenario
generator. Handelsbanken Liv, 2011.
M. Bissiri and R. Cogo. Modelling behavioral risk. Cassa Depositi e Prestiti S.p.a, 2014.
T. Bjork. Arbitrage theory in continuous time. Oxford University Press, Third Edition, 2009.
D. Brigo. Interest rate models - Theory and practice. With smile, inflation and credit. Springer
Finance, 2010.
W. Burns. Prepayment modelling challenges in the wake of the 2008 credit and mortgage crisis.
Interactive Data Fixed Income Analytics, 0930:18, 2010.
A. Van Bussel. Valuation and interest rate risk of mortgages in the Netherlands. PhD thesis,
Maastricht University, 1998.
E. Charlier and A. Van Bussel. Prepayment behavior of Dutch mortgagors. Center Discussion Paper,
Tilburg: Econometrics, 2001(64):1–33, 2001.
J.M. Clapp, G.M. Goldberg, J.P. Harding, and M. LaCour-Little. Movers and shuckers: Interdependent prepayment decisions. Center for Real Estate, University of Connecticut, 2000.
M. Consalvi and G. Scotto di Freca. Measuring prepayment risk: an application to UniCredit family
financing. UniCredit Universities Working Paper Series, (05):1–35, 2010.
Y. Deng. Mortgage termination: An empirical hazard model with stochastic term structure. Journal
of Real Estate Finance and Economics, 14(3):309–331, 1997.
Y. Deng, J.M. Clapp, and X. An. Unobserved heterogeneity in models of competing mortgage
termination. Social Science Research Network, 2005.
K. Gerardi and C. Hudson. Did nonrecourse mortgages cause the mortgage crisis? Real Estate
Research, Federal Reserve Bank of Atlanta, 2010.
Y. Goncharov. An intensity based approach for valuation of mortgage contracts subject to prepayment
risk. PhD thesis, University of Illinois, Department of Mathematics, Statistics and Computer
Science, 2002.
J. Green and J.B. Shoven. The effects of interest rates on mortgage prepayments. National Bureau
of Economic Research, (1246):1–32, 1983.
S. Gudell. Q1 2015: Negative equity report: After three long years, the hard work begins now.
Zillow Real Estate Research, 2015.
J.P.A.M. Jacobs, R.H. Koning, and E. Sterken. Modelling prepayment risk. University of Groningen,
2005.
A. Kolbe. Valuation of mortgage products with stochastic prepayment-intensity models. PhD thesis,
Technical University of Munich, Centre for Mathematics, 2008.
S. Perry, S. Robinson, and J. Rowland. A study of mortgage prepayment risk. Housing Finance
International, pages 36–51, 2001.
M. Plomp. Economic scenario generator, 2013. KPMG.
J.M. Quigley, Y. Deng, and R. Van Order. Mortgage terminations, heterogeneity and the exercise
of mortgage options. Econometrica, 68(2):275–307, 2000.
G. Rodriguez. Non parametric estimation in survival models. Princeton, 2005.
H. Tanizaki. Asymptotically exact confidence intervals of CUSUM and CUSUMSQ tests: A numerical
derivation using simulation techniques. Communications in Statistics - Simulation and Computation, 24(4):1019–1036, 2007.
P. Vasconcelos. Modelling prepayment risk: Multinomial logit approach for assessing conditional
prepayment rate, 2010. NIBC.
W. Vijverberg. Testing for IIA with the Hausman-McFadden test. IZA Discussion Paper, (5826),
2011.