Andrija Djurovic
Dr. Alan Forrest
Model shift refers to the quantitative change in a model’s parameters and outputs resulting from shifts in the input data. It is particularly useful in credit risk modeling for understanding how models react to changes in their underlying assumptions, data, or environment. The concept of model shift enables practitioners to:
- efficiently address “What if…?” scenarios, quantifying how data shifts affect model outputs without a complete model redevelopment;
- assess and respond to model sensitivities and weaknesses systematically, enhancing model validation, monitoring, and risk management.
The following slides describe the framework for quantifying the model shift in one of the most commonly used
methods in credit risk: logistic regression with categorical risk factors.
In this framework, both data and models can be represented as observed and expected counts in a high-dimensional contingency table. These counts are in turn converted to points in a Data Space of high-dimensional vectors. A data point is defined by the proportion of the observed population in each cell, viewed as a vector of real numbers indexed by the cells. Likewise, the model point is defined by the proportion of the expected population in each cell. The total population size can be tracked separately for inference purposes, but the data point and the model point do not change when the population size changes with the cell proportions held fixed. The maximum likelihood construction of a logistic regression model from data depends solely on these proportions.
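A minimal sketch of the data-point construction, assuming a hypothetical contingency table with one categorical factor (categories `A` and `B`) crossed with the binary outcome; the counts and helper names are illustrative only. It shows that scaling every count leaves the data point unchanged, and that for a saturated logistic model (one parameter per category) the maximum likelihood fit is a function of the proportions alone.

```python
from fractions import Fraction

# Hypothetical observed counts: (category, outcome) -> count, outcome 1 = default.
counts = {("A", 1): 10, ("A", 0): 40, ("B", 1): 20, ("B", 0): 30}


def data_point(table):
    """Return the data point: the proportion of the population in each cell."""
    total = sum(table.values())
    return {cell: Fraction(n, total) for cell, n in table.items()}


def saturated_default_rates(point):
    """MLE of P(outcome = 1 | category) for a saturated logistic model.

    For a saturated model the fitted probability in each category equals the
    observed share of outcome 1 in that category, computed from proportions only.
    """
    categories = {cat for cat, _ in point}
    return {
        cat: point[(cat, 1)] / (point[(cat, 1)] + point[(cat, 0)])
        for cat in sorted(categories)
    }


x = data_point(counts)
x_scaled = data_point({cell: 3 * n for cell, n in counts.items()})

print(x == x_scaled)               # True: tripling the counts gives the same point
print(saturated_default_rates(x))  # {'A': Fraction(1, 5), 'B': Fraction(2, 5)}
```

Exact fractions are used so that the invariance check is not blurred by floating-point rounding.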
Model shift in response to data shift is more than an academic exercise. It is closely related to data drift and concept drift, both well-known concerns in model management, and it underlies modeling techniques such as imputation, bootstrapping, and rebalancing. We are interested in it as a tool for key analyses in modern model risk management.
The following list outlines some use cases of model shift:
- Dynamic Model Reweighting: Enables real-time updates of model parameters as new data streams in, so that models can be adjusted without waiting for periodic reviews.
- Prioritizing Validation Investigations: Quickly computes model shifts for various data shift scenarios, enabling efficient triage and focus on the most impactful concerns.
- Quantifying Business Impacts: Links data shifts to business-relevant metrics, such as the Probability of Default (PD) used in Risk-Weighted Assets (RWA), so that sensitivity analyses are connected to actionable outcomes.
- Sensitive Data Shift Identification: Identifies the data shifts with the most significant impact on the model, enriching the validation narrative with actionable insights.
- Bespoke Model Monitoring: Defines monitoring metrics for sensitive data shifts, creating early warning systems, particularly for population shifts that do not immediately affect model outputs.
- Automated Validation and Monitoring: Streamlines validation and monitoring and integrates them with dynamic model updates for continuous, real-time risk management.
The first-order model shift ($\Delta p$) can be written explicitly as a matrix multiplication applied to the data shift. The following formulas show how the parameter changes are approximated given the data shifts ($\Delta x^{+}$ and $\Delta x^{-}$):
$$\Delta p = C^{-1} D^{T} \left[ (I + Z)^{-1} \Delta x^{+} - (I + Z^{-1})^{-1} \Delta x^{-} \right]$$
where:
- $D$ is the design matrix;
- $Y^{+}$ and $Y^{-}$ are the diagonal matrices of modeled frequencies restricted to binary outputs 1 and 0, respectively;
- $Z = Y^{+}(Y^{-})^{-1}$ is the diagonal matrix of modeled odds;
- $I$ is the identity matrix of dimension nrow($Z$) × nrow($Z$);
- $Y = (I + Z)^{-1}(I + Z^{-1})^{-1}(Y^{+} + Y^{-})$;
- $C = D^{T} Y D$;
- $\Delta x^{+}$ and $\Delta x^{-}$ are the shifts in the proportions of input factors for binary outputs 1 and 0, respectively.
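The formula can be sketched in plain Python under stated assumptions: a hypothetical two-cell table with an intercept and one dummy, and hypothetical helper names `solve` and `first_order_shift`. Because this toy model is saturated, its modeled frequencies equal the observed proportions, and the exact refit is known in closed form for comparison. Note that each diagonal entry of $Y$ reduces to $(y^{+} + y^{-})\,p(1-p)$, the usual logistic-regression weight, so $C$ plays the role of the Fisher information and $\Delta p$ can be read as one Newton step from the original fit.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def first_order_shift(D, y_plus, y_minus, dx_plus, dx_minus):
    """Delta p = C^{-1} D^T [(I + Z)^{-1} dx_plus - (I + Z^{-1})^{-1} dx_minus].

    D: design matrix, one row per cell; y_plus / y_minus: diagonals of Y+ and Y-;
    dx_plus / dx_minus: shifts in cell proportions for outcomes 1 and 0.
    """
    n_cells, n_par = len(D), len(D[0])
    z = [yp / ym for yp, ym in zip(y_plus, y_minus)]  # diagonal of Z (modeled odds)
    # Diagonal of Y = (I + Z)^{-1} (I + Z^{-1})^{-1} (Y+ + Y-)
    y = [(yp + ym) / ((1 + zc) * (1 + 1 / zc)) for yp, ym, zc in zip(y_plus, y_minus, z)]
    # C = D^T Y D
    C = [[sum(D[c][a] * y[c] * D[c][b] for c in range(n_cells)) for b in range(n_par)]
         for a in range(n_par)]
    # Bracketed term cell by cell, then premultiplied by D^T
    r = [dp / (1 + zc) - dm / (1 + 1 / zc) for dp, dm, zc in zip(dx_plus, dx_minus, z)]
    g = [sum(D[c][a] * r[c] for c in range(n_cells)) for a in range(n_par)]
    return solve(C, g)


# Hypothetical example: intercept + one dummy, two cells (saturated model).
D = [[1, 0], [1, 1]]
y_plus, y_minus = [0.1, 0.2], [0.4, 0.3]
# Move 1% of the population in the first cell from outcome 0 to outcome 1.
dx_plus, dx_minus = [0.01, 0.0], [-0.01, 0.0]

delta_p = first_order_shift(D, y_plus, y_minus, dx_plus, dx_minus)
exact_intercept_shift = math.log(0.11 / 0.39) - math.log(0.1 / 0.4)
print(delta_p)                # approximately [0.125, -0.125]
print(exact_intercept_shift)  # approximately 0.1206
```

The first-order value (0.125) is close to, but not identical with, the exact refit (about 0.1206), which is the same kind of small gap seen between the matrix multiplication and weighted logistic columns in the simulation results.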
Another way to quantify the changes in the logistic regression parameters caused by a data shift is to re-estimate the model as a weighted binomial logistic regression.
The following formula presents the log-likelihood function used to estimate the parameters (β) of the weighted logistic regression:
$$L(\beta) = \sum_{i=1}^{n} w_i \left[ y_i \log\frac{1}{1 + \exp(-x_i \beta)} + (1 - y_i) \log\left(1 - \frac{1}{1 + \exp(-x_i \beta)}\right) \right]$$
where:
- $y_i$ is the binary response variable for the $i$-th observation (either 0 or 1);
- $x_i$ is the vector of predictors for the $i$-th observation;
- $w_i$ is the associated weight of the $i$-th observation.
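A minimal sketch of the re-estimation approach, assuming a hypothetical grouped dataset in which each row is one (category, outcome) cell and the weight $w_i$ is that cell's proportion. The log-likelihood above is maximized by Newton–Raphson; the helper names are hypothetical and the solver is limited to two parameters for brevity (in practice a GLM routine, such as R's `glm` with its `weights` argument, would be used).

```python
import math


def predict(beta, xi):
    """Logistic probability 1 / (1 + exp(-x_i beta))."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))


def log_likelihood(beta, X, y, w):
    """Weighted log-likelihood L(beta) as in the formula above."""
    return sum(
        wi * (yi * math.log(predict(beta, xi)) + (1 - yi) * math.log(1 - predict(beta, xi)))
        for xi, yi, wi in zip(X, y, w)
    )


def fit_weighted_logistic(X, y, w, n_iter=25):
    """Maximize L(beta) by Newton-Raphson (two-parameter model for brevity)."""
    beta = [0.0, 0.0]
    for _ in range(n_iter):
        grad = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]  # negative Hessian of L(beta)
        for xi, yi, wi in zip(X, y, w):
            p = predict(beta, xi)
            for a in range(2):
                grad[a] += wi * (yi - p) * xi[a]
                for b in range(2):
                    info[a][b] += wi * p * (1 - p) * xi[a] * xi[b]
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        # Newton step: beta <- beta + info^{-1} grad, via the explicit 2x2 inverse.
        step0 = (info[1][1] * grad[0] - info[0][1] * grad[1]) / det
        step1 = (info[0][0] * grad[1] - info[1][0] * grad[0]) / det
        beta = [beta[0] + step0, beta[1] + step1]
    return beta


# Hypothetical grouped data: columns = (intercept, dummy for category B);
# weights are cell proportions before and after a small data shift.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [1, 0, 1, 0]
w_before = [0.10, 0.40, 0.20, 0.30]
w_after = [0.11, 0.39, 0.20, 0.30]  # 1% moved from outcome 0 to 1 in category A

beta_before = fit_weighted_logistic(X, y, w_before)
beta_after = fit_weighted_logistic(X, y, w_after)
shift = [a - b for a, b in zip(beta_after, beta_before)]
print(shift)  # approximately [0.1206, -0.1206]
```

For a saturated model like this one the exact shift is known in closed form, $\log(0.44/0.39) \approx 0.1206$ for the intercept, which makes the sketch easy to check.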
For the weighted quasi-binomial regression, the model parameters (β) are estimated from the same log-likelihood function, and the variance-covariance matrix ($\hat{\Sigma}$) is then adjusted by the dispersion parameter:
$$L(\beta) = \sum_{i=1}^{n} w_i \left[ y_i \log\frac{1}{1 + \exp(-x_i \beta)} + (1 - y_i) \log\left(1 - \frac{1}{1 + \exp(-x_i \beta)}\right) \right]$$
$$\hat{\Sigma} = \hat{\Phi} \hat{V}$$
where:
- $y_i$ is the binary response variable for the $i$-th observation (either 0 or 1);
- $x_i$ is the vector of predictors for the $i$-th observation;
- $w_i$ is the associated weight of the $i$-th observation;
- $\hat{\Phi}$ is the estimate of the dispersion parameter;
- $\hat{V}$ is the estimated variance-covariance matrix assuming a binomial distribution (the “naive” variance-covariance matrix).
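The source does not say which estimator is used for $\hat{\Phi}$. A minimal sketch, assuming the common Pearson-based estimate (weighted squared Pearson residuals divided by the residual degrees of freedom $n - k$, which to our understanding is what R's `summary.glm` uses for quasi families), shows how $\hat{\Sigma} = \hat{\Phi}\hat{V}$ would be formed. All inputs below are hypothetical values, not the output of an actual fit.

```python
def pearson_dispersion(y, p, w, n_params):
    """Assumed Pearson estimate of Phi: weighted squared Pearson residuals
    divided by the residual degrees of freedom (n - n_params)."""
    chi2 = sum(wi * (yi - pi) ** 2 / (pi * (1 - pi)) for yi, pi, wi in zip(y, p, w))
    return chi2 / (len(y) - n_params)


def adjusted_vcov(V_naive, phi):
    """Sigma_hat = Phi_hat * V_hat: every entry of the naive matrix is scaled."""
    return [[phi * v for v in row] for row in V_naive]


# Hypothetical inputs for a two-parameter model.
y = [1, 0, 0, 1]
p = [0.5, 0.5, 0.2, 0.8]        # fitted probabilities (illustrative)
w = [1.0, 1.0, 2.0, 2.0]        # observation weights
V_naive = [[0.5, -0.1], [-0.1, 0.4]]

phi = pearson_dispersion(y, p, w, n_params=2)
sigma = adjusted_vcov(V_naive, phi)
print(phi)    # approximately 1.5
print(sigma)  # approximately [[0.75, -0.15], [-0.15, 0.6]]
```

Only the variance-covariance matrix is rescaled (standard errors by $\sqrt{\hat{\Phi}}$); the point estimates β are unchanged, which is consistent with the identical weighted logistic and weighted quasi-binomial columns in the simulation results.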
Andrija Djurovic and Dr. Alan Forrest Model Shift 7 / 13
Simulation Study
The following steps outline the simulation framework used to quantify changes in model
parameters based on a simulated scenario:
1 Assume a simplified PD model consisting of the target variable Creditability and two
categorical risk factors: Account_Balance and Maturity. The simulation dataset is
available here.
2 The risk factor Account_Balance includes four categories with the following distribution of
observations:
## 01 02 03 04
## 274 269 63 394
3 The risk factor Maturity includes five categories with the following distribution of
observations:
## 01 (-Inf,8) 02 [8,16) 03 [16,36) 04 [36,45) 05 [45,Inf)
## 87 344 399 100 70
5 Assume the following scenario: the portfolio structure changes because the bank plans to increase loan approvals for riskier groups, specifically clients in Account_Balance category 01, by 40% (274 × 0.4 = 109.6 clients). Simultaneously, loan approvals for clients in category 04 decrease by the same number, so the portfolio size stays unchanged. Given this scenario, the new allocation of Account_Balance categories is:
## 01 02 03 04
## 383.6 269.0 63.0 284.4
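The scenario arithmetic can be reproduced in a few lines. The reweighting step is an assumption not stated in the source: it supposes the shift is realized by giving every observation in a category the weight new count / old count (for example, as input to a weighted refit), which keeps the default rate within each category unchanged.

```python
# Observed Account_Balance counts from the simulation dataset (1,000 in total).
counts = {"01": 274, "02": 269, "03": 63, "04": 394}

# Scenario: approvals in category 01 rise by 40%; category 04 loses the same number.
moved = 0.40 * counts["01"]
new_counts = dict(counts)
new_counts["01"] += moved
new_counts["04"] -= moved

# Assumption (not from the source): each observation in a category is reweighted
# by new_count / old_count to realize the shifted allocation.
weights = {level: new_counts[level] / counts[level] for level in counts}

print({level: round(n, 1) for level, n in new_counts.items()})
# {'01': 383.6, '02': 269, '03': 63, '04': 284.4}
print({level: round(wt, 4) for level, wt in weights.items()})
# {'01': 1.4, '02': 1.0, '03': 1.0, '04': 0.7218}
```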
C matrix (M × M, where M = 8 is the number of model parameters):
## (Intercept) Account_Balance02 Account_Balance03 Account_Balance04 Maturity02 [8,16) Maturity03 [16,36) Maturity04 [36,45) Maturity05 [45,Inf)
## (Intercept) 0.1741 0.0598 0.0105 0.0395 0.0551 0.0758 0.0208 0.0148
## Account_Balance02 0.0598 0.0598 0.0000 0.0000 0.0189 0.0245 0.0070 0.0074
## Account_Balance03 0.0105 0.0000 0.0105 0.0000 0.0044 0.0042 0.0014 0.0000
## Account_Balance04 0.0395 0.0000 0.0000 0.0395 0.0110 0.0177 0.0057 0.0038
## Maturity02 [8,16) 0.0551 0.0189 0.0044 0.0110 0.0551 0.0000 0.0000 0.0000
## Maturity03 [16,36) 0.0758 0.0245 0.0042 0.0177 0.0000 0.0758 0.0000 0.0000
## Maturity04 [36,45) 0.0208 0.0070 0.0014 0.0057 0.0000 0.0000 0.0208 0.0000
## Maturity05 [45,Inf) 0.0148 0.0074 0.0000 0.0038 0.0000 0.0000 0.0000 0.0148
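Because every column of $D$ other than the intercept is a 0/1 dummy, $C = D^{T} Y D$ has a simple structure that can be checked directly on the printed matrix: it is symmetric; each dummy's diagonal entry equals its entry in the intercept row (a 0/1 dummy squared is itself); and two levels of the same risk factor never occur in the same cell, so their cross entries are zero. The sketch below runs these checks on the rounded values transcribed from the output above; the variable names are illustrative.

```python
# Rounded C matrix transcribed from the output above (same row/column order).
names = [
    "(Intercept)",
    "Account_Balance02", "Account_Balance03", "Account_Balance04",
    "Maturity02 [8,16)", "Maturity03 [16,36)", "Maturity04 [36,45)", "Maturity05 [45,Inf)",
]
factor_of = [None] + ["Account_Balance"] * 3 + ["Maturity"] * 4
C = [
    [0.1741, 0.0598, 0.0105, 0.0395, 0.0551, 0.0758, 0.0208, 0.0148],
    [0.0598, 0.0598, 0.0000, 0.0000, 0.0189, 0.0245, 0.0070, 0.0074],
    [0.0105, 0.0000, 0.0105, 0.0000, 0.0044, 0.0042, 0.0014, 0.0000],
    [0.0395, 0.0000, 0.0000, 0.0395, 0.0110, 0.0177, 0.0057, 0.0038],
    [0.0551, 0.0189, 0.0044, 0.0110, 0.0551, 0.0000, 0.0000, 0.0000],
    [0.0758, 0.0245, 0.0042, 0.0177, 0.0000, 0.0758, 0.0000, 0.0000],
    [0.0208, 0.0070, 0.0014, 0.0057, 0.0000, 0.0000, 0.0208, 0.0000],
    [0.0148, 0.0074, 0.0000, 0.0038, 0.0000, 0.0000, 0.0000, 0.0148],
]
TOL = 1e-9
k = len(names)

symmetric = all(abs(C[i][j] - C[j][i]) < TOL for i in range(k) for j in range(k))
# A 0/1 dummy d satisfies d * d = d, so C[j][j] = sum_c y_c d_cj = C[0][j].
diagonal_equals_intercept_row = all(abs(C[j][j] - C[0][j]) < TOL for j in range(1, k))
# Levels of the same factor never share a cell, so their cross terms vanish.
same_factor_cross_terms_zero = all(
    C[i][j] == 0.0
    for i in range(1, k)
    for j in range(1, k)
    if i != j and factor_of[i] == factor_of[j]
)
# Intercept total minus the factor's dummy entries = Y-weight of reference level 01.
reference_weight_account_balance = C[0][0] - sum(C[0][1:4])

print(symmetric, diagonal_equals_intercept_row, same_factor_cross_terms_zero)  # True True True
print(round(reference_weight_account_balance, 4))  # 0.0643
```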
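The table below summarizes the simulated model shift, comparing the coefficient changes obtained by matrix multiplication (first-order approximation), weighted logistic regression, and weighted quasi-binomial regression: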
## coefficient matrix multiplication weighted logistic weighted quasi-binomial
## (Intercept) 0.0150 0.0158 0.0158
## Account_Balance02 0.0056 0.0056 0.0056
## Account_Balance03 -0.0055 -0.0052 -0.0052
## Account_Balance04 0.0041 0.0042 0.0042
## Maturity02 [8,16) -0.0027 -0.0048 -0.0048
## Maturity03 [16,36) -0.0160 -0.0165 -0.0165
## Maturity04 [36,45) -0.0158 -0.0150 -0.0150
## Maturity05 [45,Inf) -0.0920 -0.0923 -0.0923
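The comparison can be quantified directly from the table. The sketch below transcribes the printed values and computes the absolute gap between the first-order approximation and the weighted logistic refit for each coefficient; it also checks that the weighted logistic and weighted quasi-binomial estimates coincide, as expected when the dispersion adjustment only rescales the variance-covariance matrix.

```python
# Coefficient shifts transcribed from the table above:
# (matrix multiplication, weighted logistic, weighted quasi-binomial)
results = {
    "(Intercept)": (0.0150, 0.0158, 0.0158),
    "Account_Balance02": (0.0056, 0.0056, 0.0056),
    "Account_Balance03": (-0.0055, -0.0052, -0.0052),
    "Account_Balance04": (0.0041, 0.0042, 0.0042),
    "Maturity02 [8,16)": (-0.0027, -0.0048, -0.0048),
    "Maturity03 [16,36)": (-0.0160, -0.0165, -0.0165),
    "Maturity04 [36,45)": (-0.0158, -0.0150, -0.0150),
    "Maturity05 [45,Inf)": (-0.0920, -0.0923, -0.0923),
}

# Absolute gap between the first-order approximation and the weighted refit.
gaps = {name: abs(mm - wl) for name, (mm, wl, _) in results.items()}
worst = max(gaps, key=gaps.get)

# Quasi-binomial only rescales the variance-covariance matrix, so beta should match.
same_point_estimates = all(wl == qb for _, wl, qb in results.values())

print(worst, round(gaps[worst], 4))  # Maturity02 [8,16) 0.0021
print(same_point_estimates)          # True
```

In this scenario the first-order approximation stays within about 0.002 of the refit for every coefficient, including the largest shift (Maturity05 [45,Inf), about −0.092).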