Agency MBS Prepayment Model

Jiawei “David” Zhang is a managing director in Securitized Product Research at MSCI in New York, NY.

Xiaojian “Jan” Zhao is a principal in Advanced Analytics at Ernst & Young LLP in New York, NY.

Joy Zhang is an executive director and director in Securitized Product Research at MSCI, New York, NY.

Fei Teng is a senior quantitative analyst in Quantitative Advisory Services at Ernst & Young LLP in New York, NY.

Siyu Lin is a senior quantitative analyst in Quantitative Advisory Services at Ernst & Young LLP in New York, NY.

Hongyuan “Henry” Li is an executive director in Quantitative Advisory Services at Ernst & Young LLP in New York, NY.

Mortgages and mortgage-backed securities (MBS) are among the largest financial sectors in the United States. They also serve as key levers for the central bank and federal government in managing monetary policy and stimulating the national economy, as shown during the recent financial crisis. Mortgage prepayment modeling is essential to investment and risk analysis for MBS. It is also among the most complex areas of financial modeling. The complexities include the following:

• A large dataset—for example, the agency MBS data cover about 450,000 pools and 100 million loans over 20 years; the data volume is on the order of terabytes.
• A large set of risk factors—the number of risk factors, including loan/pool and borrower attributes and macroeconomic drivers, ranges from tens to hundreds (Dunsky 2014).
• Difficulties in model specification and estimation because of
  • risk factors often being highly nonlinear and interactive.
  • regime changes in credit availability, borrower behavior, and business practice.

Some examples of nonlinearity, interactive risk factors, and regime changes are as follows:

• Prepayment due to housing turnover and prepayment due to rate refinance have very different risk factor sensitivities (Yu 2018). Because of this, housing turnover and rate refinance are often estimated separately.
• Rate refinance tends to increase with loan size, while housing turnover often decreases with loan size.
• Prepayment sensitivities to loan age, seasonality, geographical location/state, house price appreciation, etc., also differ between housing turnover and rate refinance.
• Housing turnover and rate refinance each have their own subset of models, which include cash-out refinance, term refinance, “trade-up,” etc. In addition, agency MBS prepayment can also be caused by curtailment and delinquency buyout. These submodels have their own patterns of risk factor dependencies. As such, they are often specified and estimated separately. However, prepayment data often do not include the reason for prepayment.
• Housing and mortgage policy changes and borrowers’ behavior changes also lead to prepayment regime changes (Zhang 2018b). Technology changes and loose mortgage credit produced very fast prepayment speeds from 2003 to 2006. The subsequent housing
[Exhibit 1: Example of a Neural Network architecture. Values of hidden layers are generated by activation functions, with previous layers as inputs and weights and biases as parameters; backward propagation optimizes the weights and biases until a stopping condition is met, by passing the error signal through the network using the gradient function.]
network can represent any bounded degree polynomial, provided the neural networks are sufficiently large under mild assumptions on the activation function (Andoni et al. 2014). Thus, neural networks can learn extremely complex patterns that may prove challenging for other algorithms.

Deep learning algorithms are usually not suitable as general-purpose methods. Although deep learning has the capability of high prediction power, it requires much more data to train than other algorithms because the models have dramatically more parameters to estimate. It also requires much more expertise to tune (i.e., architecture setup and hyperparameter selection, which are intensively time consuming). In addition, deep learning is sometimes outperformed by tree ensembles for machine learning classification problems (Schmidhuber 2015). Besides, the black-box nature of neural networks is a barrier to adoption in applications where interpretability is essential.

The original concept of neural networks can be traced back more than half a century. Due to the recent development of computation power and data storage capacity, neural network algorithms can now be efficient for analyzing business problems. Recent analytical tools also provide good transparency on the neural network approach taken, for example, layer-wise relevance propagation (LRP; Bach et al. 2015) and deep learning important features (DeepLIFT; Koh and Liang 2017 and Shrikumar et al. 2017). Therefore, financial institutions have more motivations to pursue deep learning techniques nowadays.

Many of the MBS prepayment modeling difficulties—for example, large datasets, large numbers of risk factors, and the nonlinear and highly interactive nature of the risk factors—may be suited to the neural networks modeling approach. This article describes an example of applying this new approach to agency MBS pool-level fixed-rate prepayment modeling. The AI modeling project is a collaboration between MSCI’s Securitized Products Research group and Ernst & Young LLP’s Quantitative Advisory Services (QAS). We also contrast the AI model results with a production prepayment model used at MSCI Securitized Products Research (the “human” model or “Hmodel” in the exhibits’ captions) that was constructed and maintained over a long period using the traditional modeling approach discussed in the previous section (Yu 2018).

Exhibit 1 shows the schema of the neural networks modeling approach for the agency prepayment model. We formulate the prediction of prepayment speed as a regression problem. We apply feedforward neural networks here along with several functional sublayers and development techniques to predict prepayment speeds.

This MBS machine learning work involves an iterative process that includes exploratory data analysis (EDA), feature selection, machine learning modeling
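As a minimal sketch of this regression formulation, the following NumPy snippet trains a tiny feedforward network with one hidden layer by backward propagation on synthetic data. The features, target shape, network size, and learning rate are all illustrative assumptions, not the article's actual NNM specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for pool-level inputs (e.g., incentive, loan size) -> CPR-like target.
X = rng.normal(size=(256, 2))
y = 30.0 / (1.0 + np.exp(-2.0 * X[:, 0])) + 0.5 * X[:, 1]  # s-curve-shaped target

# One hidden layer with ReLU activation, trained by plain gradient descent
# on squared error (constant factors are absorbed into the learning rate).
W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.01

def forward(X):
    h = np.maximum(0.0, X @ W1 + b1)   # hidden-layer activations
    return h, (h @ W2 + b2).ravel()    # predicted prepayment speed

losses = []
for _ in range(500):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backward propagation: push the error signal through the network.
    g2 = h.T @ err[:, None] / len(y)
    gb2 = err.mean()
    gh = err[:, None] @ W2.T
    gh[h <= 0.0] = 0.0                 # ReLU gradient mask
    g1 = X.T @ gh / len(y)
    gb1 = gh.mean(axis=0)
    W2 -= lr * g2; b2 -= lr * gb2
    W1 -= lr * g1; b1 -= lr * gb1

print(losses[0], losses[-1])           # training error should fall substantially
```

A production-scale model would of course use many more inputs, multiple hidden layers, and a modern training framework, but the forward/backward structure is the same as in Exhibit 1.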
[Exhibit 2 (chart): Actual vs. NNM CPR, overall 30-year sector, August 2003–August 2015.]
design and development, and performance evaluation. It incorporates the following:

• Exploratory data analysis. Examine data quality, perform data cleansing (missing value imputation and outlier detection), and conduct data transformation.
• Model feature selection. Identify risk drivers and construct input variable sets.
• Neural network model design and development. Specify the neural networks model parameters, objective functions, and convergence/training methods.
• Performance evaluation. Evaluate model performance, understand model sensitivities, and identify potential model overfitting.

The training for a single neural networks model using a 10% data sample takes about three hours. This modeling efficiency compares extremely favorably with the traditional modeling approach, which often takes months or years.

We discuss the NNM model results in the following section, leaving details of the data and model specifications to the Appendix.

SAMPLE MODEL RESULTS

In order to test our NNM approach for out-of-sample forecast ability and model overfitting issues, we use a 10% random sample of the data between years 2003 and 2015 to train the model, then compare the model’s performance with actual prepayment behavior for years between 2003 and 2015 (in-time out-of-sample tests²) and for years between 2016 and 2018 (out-of-time tests³). We show three categories of model results:

1. Model versus actual prepayment speeds for the overall fixed-rate 30-year mortgage universe.
2. Model versus actual prepayment speeds for pool cohorts of various loan/pool attributes under different rate and macroeconomic environments (to test the ability to model complex, nonlinear, and interactive risk factors).
3. The model’s prepayment sensitivities to various risk factors, such as loan/pool attributes and rates/macroeconomic variables (to test whether the model’s behaviors are consistent with intuitions embedded in the MSCI production model—the Hmodel—that has been developed over the years using the traditional approach; this also tests for potential overfitting issues that are often suspected of the neural networks modeling approach).

² An in-time out-of-sample test uses data from the same time period as the training data, but the sample data are exclusive from the training data.
³ An out-of-time test uses data from a different time period from the training data.
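The train/test partition described above can be sketched as follows; the record count, features, and random seed are arbitrary stand-ins for the actual pool-month panel:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pool-month observations, keyed by observation year.
years = rng.integers(2003, 2019, size=10_000)
idx = np.arange(len(years))

# Training set: a 10% random sample drawn from the 2003-2015 period.
in_period = idx[(years >= 2003) & (years <= 2015)]
train = rng.choice(in_period, size=len(in_period) // 10, replace=False)

# In-time out-of-sample: same 2003-2015 period, excluding training rows.
in_time_oos = np.setdiff1d(in_period, train)

# Out-of-time: 2016-2018, never seen during training.
out_of_time = idx[(years >= 2016) & (years <= 2018)]

print(len(train), len(in_time_oos), len(out_of_time))
```

The in-time split checks whether the model generalizes across pools it did not see; the out-of-time split checks whether it generalizes to a rate/macro environment it did not see.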
[Exhibit 3 (chart): Actual vs. NNM vs. Hmodel CPR, overall 30-year sector, 2016–2018.]
Exhibit 2 shows very good in-time out-of-sample error tracking for the overall 30-year sector prepayment speeds. NNM is able to accurately replicate the overall speeds with training from only 10% of the pool sample data. These forecast errors are generally marginally smaller than those achieved from the Hmodel.

Exhibit 3 shows NNM has good out-of-sample error tracking for the overall 30-year sector prepayment speeds. The model is able to accurately forecast the overall speeds between 2016 and 2018 with training from only 10% of the pool sample data from years prior to 2016.

NNM exhibits modest under-forecasts for the second half of 2016. However, our MSCI human models also show similar under-forecast patterns for this period. We also tested NNM by varying the training periods and the relative weight of the training data. The patterns of under-forecast persist. We conclude that the modestly higher-than-expected prepayment speeds in the second half of 2016 were likely caused by risk factors outside of those embedded in our data. In this case, NNM can serve as an indicator of true prepayment surprises.

For the out-of-sample forecast period of 2017, however, NNM forecasts are close to actual prepayment speeds while Hmodel forecasts are generally under-forecasting. The short-term (month-on-month) prepayment forecasts from broker-dealers’ MBS research publications during this period often have forecasting errors of similar amplitudes.

Exhibit 4 shows error tracking results against loan/pool variables of FICO and SATO (spread at origination), a credit indicator that complements other borrower credit variables such as FICO, OLTV (original loan-to-value ratio), loan size, and CLTV (current loan-to-value ratio). These are constructed by bucketing the prepayment speeds and model forecasts based on the pool variables.

NNM is able to accurately capture the sensitivities to the loan/pool variables, on par with the MSCI Hmodel. Note the contrast between the pre- and post-crisis prepayment behavior versus borrowers’ FICO scores, as discussed in the previous section: On average, prepayment speeds were higher with low-FICO borrowers pre-crisis, while the relationship was reversed after the crisis, since mortgage credit has been generally tighter.

In addition, as shown, NNM is able to accurately capture the general relationship between prepayment speeds and SATO, loan sizes, and CLTV:

• Prepayment speeds are generally lower for higher SATO pools because borrowers with poorer credit are often slower to respond to refinance incentives.
• Prepayment speeds are generally higher for large-loan-size pools because the fixed portion of refinance costs often reduces the economic benefits of refinance for lower loan sizes.
• Prepayment speeds are generally slower for high CLTV loans due to higher credit risk.
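The bucketing used for this kind of error tracking, averaging actual and model CPR within ranges of one pool variable, can be sketched as follows. The FICO bucket edges and the toy "model" are hypothetical stand-ins for illustration:

```python
import numpy as np

def bucket_error_tracking(variable, actual_cpr, model_cpr, edges):
    """Average actual and model CPR within buckets of one pool variable."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (variable >= lo) & (variable < hi)
        if mask.any():
            rows.append((lo, hi, actual_cpr[mask].mean(), model_cpr[mask].mean()))
    return rows

rng = np.random.default_rng(2)
fico = rng.uniform(620, 820, size=5_000)
model = 5 + 0.05 * (fico - 620)                           # hypothetical model forecast
actual = model + rng.normal(scale=2.0, size=fico.size)    # actual speeds with noise

rows = bucket_error_tracking(fico, actual, model, [620, 660, 700, 740, 780, 820])
for lo, hi, a, m in rows:
    print(f"FICO {lo:.0f}-{hi:.0f}: actual {a:5.2f}  model {m:5.2f}")
```

Plotting bucket-level actual vs. model averages against the variable is what the exhibits referred to above display.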
[Exhibit 4 (charts): Actual vs. NNM CPR bucketed by FICO (pre-2008 and post-2008 panels, FICO 720–780), SATO (–1.5 to 1), current loan size ($50,000–$350,000), and CLTV (55–105).]
Exhibit 5 shows an example of how NNM accurately captures state-level prepayment behaviors (and generally better than the human model). Generally, California loans have higher refinance speeds due to large loan size and more efficient mortgage refinance business practice, while New York loans are generally slower due to mortgage-recording taxes. In this case, NNM performance is generally better than the MSCI Hmodel. Accurately modeling state-level prepayment behavior is often difficult for the traditional cohort-building approach to prepayment modeling. State pool cohorts tend to have different distributions of other pool-level variables, for example, loan size and house price appreciation rates. This makes it difficult for model specification and estimation to isolate the pure state-level effect.

Exhibits 4 and 5 measure model performance against a single pool variable. However, MBS pools are measured by about 30 variables. We have been advocating a new ranking-based comprehensive pool-level error tracking methodology (Zhang 2018b).

Exhibit 6 shows an example of this ranking-based error tracking for coupon 4s. All pools for coupon 4s are ranked and sorted based on their model prepayment forecasts, from slowest to fastest. Then the whole cohort is bucketed into 15 equal-sized groups based on this model-forecasted speed ranking, from the slowest (group 1) to the fastest (group 15). The number 15 is chosen so that each group is reasonably large and statistical errors are smaller than differences in prepayment speeds between groups. We believe that this
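A minimal sketch of this ranking-based error tracking, with synthetic forecasts standing in for a real model's output:

```python
import numpy as np

def ranking_error_tracking(model_cpr, actual_cpr, n_groups=15):
    """Rank pools by model forecast, split into equal-sized groups,
    and compare average actual vs. model CPR within each group."""
    order = np.argsort(model_cpr)             # slowest to fastest forecast
    groups = np.array_split(order, n_groups)  # 15 near-equal-sized buckets
    return [(g + 1, actual_cpr[idx].mean(), model_cpr[idx].mean())
            for g, idx in enumerate(groups)]

rng = np.random.default_rng(3)
model = rng.uniform(2, 40, size=3_000)                   # hypothetical forecasts
actual = model + rng.normal(scale=3.0, size=model.size)  # actual speeds with noise

groups_stats = ranking_error_tracking(model, actual)
for g, a, m in groups_stats:
    print(f"group {g:2d}: actual {a:5.1f}  model {m:5.1f}")
```

Because the ranking reflects the model's joint view of all input variables at once, a close actual/model match across groups is evidence of accuracy across the full variable set, not just one cohort dimension.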
[Exhibit 5 (charts): Actual, NNM, and Hmodel CPR over time for CA and NY pools.]
Exhibit 6
Ranking-Based Sample Error Tracking for Coupon 4s for NNM and Hmodel
ranking-based error tracking methodology provides a comprehensive measure of model accuracy across all pool variables (Zhang 2018b).

Exhibit 6 shows that NNM is accurate across all pool variables for coupon 4s and performed better than the Hmodel.
[Charts: Actual vs. NNM CPR and Actual vs. Hmodel CPR by ranking group (1–15) for coupons 3.5 and 4.]
Exhibit 7 shows sample ranking-based error tracking across various coupons and historical time periods. NNM is able to accurately differentiate prepayment behavior across all pool variables, often better than the MSCI human model. This is likely due to the ability of neural networks to model highly nonlinear and interactive risk factors.

The flip side of neural networks models’ ability to model highly nonlinear and interactive risk factors is a lack of transparency. Given the multiple hidden layers and large numbers of nodes involved (see Exhibit 1 and the later discussion on neural networks development in the Appendix), the relationship between prepayment and input variables is not transparent and can
[Exhibit 8 (chart): CPR vs. refinance incentive (bps), by loan size.]
potentially be overly noisy. In the context of the neural networks modeling methodology, this is often referred to as the overfitting problem. In the neural networks development discussion in the Appendix, we discuss the modeling techniques we employed to avoid the overfitting issue.

In order to enhance the transparency of the neural networks model, we test its sensitivities to risk factors/input variables to see whether these behaviors are consistent with economic intuition, which is often obtained through the traditional modeling approach. We pick representative loan/pool cohorts and compute how the prepayment model forecasts may change with varying pool variables and macroeconomic variables.

Exhibit 8 shows an example for loan size. NNM s-curves (how prepayment responds to mortgage rate incentives) are steeper for loans/pools with larger loan sizes. In addition, the discount speeds (prepayment speeds for loans with coupons below prevailing mortgage rates) are generally higher for low-loan-size pools.

Exhibit 9 shows the reversed loan size sensitivities for rate refinance (positive incentive) and housing turnover (negative incentive). NNM is able to capture these behaviors, and is generally consistent with economic intuitions and with the MSCI human model specification.

We now examine NNM’s ability to model several more complex prepayment behaviors: burnout, the “media effect,” and HARP.

In burnout, loans or pools that saw refinance incentives in the past are generally slower when exposed to subsequent refinance opportunities. By not responding to earlier refinance opportunities, loans revealed hidden attributes that are not conducive to refinance, and these hidden attributes are often more indicative than original explicit loan attributes in terms of forecasting future refinance intensity.

We examine the NNM burnout effect by error tracking against the past refinance incentive. Exhibit 10 shows that pools that have higher past refinance incentives are generally slower in refinance speeds; hence the NNM burnout effect is reasonably accurate, on par with the MSCI Hmodel.

Exhibit 11 shows an example of the so-called media effect. When mortgage rates are at or close to historical lows, as in 2012 and 2016 (Exhibit 12), the whole mortgage universe becomes refinance-able. The large refinance volumes strain the resources of originators/servicers, which often optimize their workforce to focus on a certain segment of refinance applications, for example on newer and often lower coupons.
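The sensitivity tests described above, holding a representative cohort fixed and sweeping one variable at a time, can be sketched as follows. Here `toy_model` is a stylized s-curve whose steepness grows with loan size, a hypothetical stand-in for the trained NNM:

```python
import numpy as np

def sensitivity_sweep(predict, base_pool, variable, values):
    """Forecast CPR for one representative pool while sweeping a single
    input variable and holding all other attributes fixed."""
    out = []
    for v in values:
        pool = dict(base_pool, **{variable: v})
        out.append((v, predict(pool)))
    return out

# Stand-in for a trained model: an s-curve that steepens with loan size.
def toy_model(pool):
    slope = 0.02 * (1 + pool["loan_size"] / 200_000)
    return 30.0 / (1.0 + np.exp(-slope * (pool["incentive_bps"] - 25)))

base = {"loan_size": 300_000, "incentive_bps": 0}
curve = sensitivity_sweep(toy_model, base, "incentive_bps", [-100, -50, 0, 50, 100])
for incentive, cpr in curve:
    print(f"incentive {incentive:4d} bps -> CPR {cpr:5.1f}")
```

Comparing such sweeps against the intuition built into the traditional model (e.g., steeper s-curves for larger loans) is one practical check against overfitting.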
[Table: Cohort comparison of FH 3.5s and 4s across Purchase/Refi and Retail/TPO origination channels for July–December 2012 and November 2011–February 2012, reporting observation range, CPR, WALA, SATO, CLTV, current loan size, incentive, FICO, and average UPB ($billion); the numeric values are not recoverable from the extraction.]
Notes: 3.5s are faster than 4s, across TPO/Retail and Refi/Purchase combinations for July 2012–December 2012 and November 2011–February 2012.
Refinance incentive and pool attributes (purchase/refinance, age, SATO, CLTV, FICO, loan size) are compared across origination channels
(Retail/TPO).
Exhibit 12
History of Agency 30-Year Fixed Mortgage Rate
[Chart: agency 30-year fixed mortgage rate over time.]
Note: Mortgage rates were at or close to historical lows in 2012 and 2016.
help homeowners reduce mortgage payments and avoid default. The effectiveness of the HARP and the subsequent HARP2 programs evolved in a complex pattern as the mortgage industry adjusted its policies and procedures to this unprecedented federal intervention.

Exhibit 15 shows that the neural networks model follows the general trend in the effectiveness of the HARP program but misses the complexity of its evolution. However, we are not aware of any industry models that were able to model these trends in detail during these periods.
[Chart: CPR over time by coupon.]
Note: Lower coupons ramp up much faster in response to rate drops and reach higher peak speeds.
Exhibit 14
NNM and Hmodel Media Effect: 2015 Vintage 3.5/4/4.5s Prepayment Experience in 2016 Refinance Wave
[Chart: CPR over time by coupon.]
Note: Lower coupons ramp up much faster in response to rate drops and reach higher peak speeds.
[Exhibit 15 (chart): Actual CPR vs. Model CPR by CLTV.]
CONCLUSION

In this article, we show the promise of a new neural networks/machine learning approach to agency MBS prepayment modeling. The model results compare favorably against an industry production model that was constructed and maintained over a long period using the traditional modeling approach. The neural networks approach is able to accurately model the highly nonlinear and interactive risk factor drivers and produce generally accurate prepayment forecasts at the pool level. Model transparency and overfitting issues can be overcome and managed by the latest neural networks modeling techniques. The gains in modeling efficiency, on the order of a hundred-fold, are potentially revolutionary.

Appendix

DATA AND MODELING DETAILS

Model Data

We purchased mortgage data from eMBS, a leading provider of MBS disclosure data for Fannie Mae and Freddie Mac. The eMBS agency MBS pool-level prepayment data include all fixed-rate 30-year TBA-eligible pool attributes and prepayment rates in each month from 2000 to 2018. The total eMBS data is about 25 GB in size and has 30 raw data attributes. In addition, the macroeconomic data include FHFA weekly primary mortgage rates and state-level HPIs (house price indices) and unemployment rates.

Exploratory Data Analysis (EDA)

We first examine the data quality and data statistics. From Exhibit A1, we observe that the overall data quality from 2003 to 2018, in terms of missing data, is good. In detail, the ratio of missing data for the last four years is low for most important attributes, while the TPO and Refinance columns have many missing values. The data quality of some attributes is a little worse for the period from June 2008 to November 2012. For example, about 38% of refinance data is missing for June 2008 to November 2012 versus only 1% missing for January 2014–April 2018; 60% of TPO data is missing for June 2008–November 2012 versus 40% missing for January 2014–April 2018. In addition, the number of samples per month increases tenfold from 2003 to 2018. Furthermore, FICO, OLTV, and AOLS are missing before June 2003. These issues were due to Fannie Mae and Freddie Mac data disclosure practices. Thus, we use mortgage data after June 2003 for model training and testing.

Data cleansing is another important step, which includes imputing missing values and detecting outliers. Due to good data quality in recent years and the large volume of eMBS data, we focus on cleansing the univariate data in our work. Most of the attributes are time varying, and the value
in the current month is correlated to the previous month’s. To impute missing values for these time-varying attributes, we first group the records by pool ID and the number of loans and impute missing values via interpolation. If there are missing values at the beginning or the end for a particular pool ID and number of loans, we impute the missing values by using the first value in the following or previous months, which are referred to as backward-fill or forward-fill methods, respectively. Missing values are approximated relatively accurately in this scenario. Then, we fill the rest of the missing values by grouping the records by pool ID and imputing missing values using interpolation, forward-fill, and backward-fill methods. Because TPO has a high level of missing data and its range is from 0 to 100, we impute all its missing values as -99. For those attributes that are constant for a pair of pool and number of loans, we use either the average or the most frequent value to impute the missing values, for example, for “Vintage,” “OLTV,” and “FICO.” After these steps, all missing values for each pool are imputed, as long as there is a record in at least one month for the pool. As to outliers, we eliminate unrealistic records based on our experience and restrict the extreme values of some variables to avoid negative effects.

In addition, we conduct data transformation. Based on our domain knowledge, we artificially create spread-at-origination (SATO), incentive, HARP indicators, one indicator for the mortgage credit environment, and indicators for the HARP program and eligible loans/pools.

Model Feature Selection

Feature selection is a critical step for machine learning projects and is an active research topic in academia. Filter, wrapper, and embedded methods are widely used for feature selection for machine learning modeling in the industry (Kumar 2014). Usually, a neural network does not require complicated feature selection methods, because it can choose the proper nonlinear forms of attributes and interactions based on the intrinsic features of the data, when there are sufficient hidden nodes and data. In practice, however, feature selection is sometimes still needed due to data/hardware limitations and training time tolerances. In our project, we follow four steps for feature selection.

First, we choose variables based on our domain knowledge. From previous experience, we know some variables have a significant impact on the target variable. Second,
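The per-pool imputation procedure described earlier (linear interpolation for interior gaps, then forward-/backward-fill for gaps at the series ends) can be sketched for a single attribute series; `None` marks a missing month, and the values are illustrative:

```python
def impute_series(values):
    """Fill gaps in one pool's monthly attribute series:
    linear interpolation for interior gaps, then forward-/backward-fill
    for gaps at the end or beginning of the series."""
    vals = list(values)
    known = [i for i, v in enumerate(vals) if v is not None]
    if not known:
        return vals  # nothing observed; leave the series as-is
    # Interior gaps: linear interpolation between surrounding known points.
    for a, b in zip(known[:-1], known[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            vals[i] = vals[a] * (1 - w) + vals[b] * w
    # Leading gap: backward-fill from the first known value.
    for i in range(known[0]):
        vals[i] = vals[known[0]]
    # Trailing gap: forward-fill from the last known value.
    for i in range(known[-1] + 1, len(vals)):
        vals[i] = vals[known[-1]]
    return vals

print(impute_series([None, 80.0, None, None, 86.0, None]))
```

In production this would run per (pool ID, number of loans) group across the full panel, for example via a grouped apply in a dataframe library, but the per-series logic is the same.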
⁴ Multicollinearity is a phenomenon in which significant correlations are present among the input variables. It causes instability when estimating the model coefficients, and hence the accuracy of statistical inference and sensitivity analysis might be compromised.
⁵ Information value is a measure based on information theory pioneered by Claude Shannon and is a commonly used metric to measure the information carried by input variables.
⁶ A dummy variable takes either 0 or 1 as its value to indicate whether some categorical effect on the target variable is present or not.
⁷ In an artificial neural network, the activation function transforms the inputs to an output of the node, which is used as input for the next layer. Common activation functions include sigmoid, ReLU, TanH, and so on.
⁸ Grid search is a traditional way of performing hyperparameter search and optimization in an exhaustive, brute-force manner through a manually specified subset of the hyperparameter space.
⁹ A model ensemble is a method to obtain better predictive performance by leveraging multiple learning algorithms/models. Common types of ensembles include bagging, boosting, and stacking.
¹⁰ Bagging is a type of ensemble method that trains each model using a randomly drawn subset of the training data.
Kumar, V. 2014. “Feature Selection: A Literature Review.” Smart Computing Review 4 (3).