Financial returns
INTRODUCTION TO PORTFOLIO RISK MANAGEMENT IN PYTHON
Dakota Wixom
Quantitative Analyst | QuantCourse.com
Course overview
Learn how to analyze investment return distributions, build
portfolios and reduce risk, and identify key factors which are
driving portfolio returns.
Univariate Investment Risk
Portfolio Investing
Factor Investing
Forecasting and Reducing Risk
Investment risk
What is Risk?
Risk in financial markets is a measure of uncertainty
Dispersion or variance of financial returns
How do you typically measure risk?
Standard deviation or variance of daily returns
Kurtosis of the daily returns distribution
Skewness of the daily returns distribution
Historical drawdown
Financial risk
[Figure: distribution of returns; axes: Returns, Probability]
A tale of two returns
Returns are derived from stock prices
Discrete returns (simple returns) are the most commonly used, and represent periodic (e.g. daily, weekly, monthly, etc.) price movements
Log returns are often used in academic research and financial modeling. They assume continuous compounding.
Calculating stock returns
Discrete returns are calculated as the change in price as a percentage of the previous period's price
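Written as a formula, with P_{t_1} the previous period's price and P_{t_2} the current price (the label R_d is an assumption made here for illustration), the definition above reads:

```latex
R_d = \frac{P_{t_2} - P_{t_1}}{P_{t_1}} = \frac{P_{t_2}}{P_{t_1}} - 1
```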
Calculating log returns
Log returns are calculated as the difference between the log of two prices:
$$R_l = \ln\left(\frac{P_{t_2}}{P_{t_1}}\right)$$
or equivalently
$$R_l = \ln(P_{t_2}) - \ln(P_{t_1})$$
Log returns aggregate across time, while discrete returns aggregate across assets
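A minimal sketch of the two return types and of what "aggregating across time" means for each. The prices are hypothetical and the variable names are chosen only for illustration:

```python
import numpy as np

# Hypothetical closing prices for four consecutive periods (illustrative only)
prices = np.array([100.0, 102.0, 99.0, 105.0])

# Discrete (simple) returns: change in price as a fraction of the previous price
simple_returns = prices[1:] / prices[:-1] - 1

# Log returns: difference between the logs of consecutive prices
log_returns = np.diff(np.log(prices))

# Log returns add up across time to the log return over the whole period
total_log_return = log_returns.sum()

# Simple returns compound (multiply) across time instead
total_simple_return = np.prod(1 + simple_returns) - 1

print(total_log_return, total_simple_return)
```

The summed log return equals ln(105/100), while the compounded simple return equals 105/100 - 1.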
Calculating stock returns in Python
Step 1:
Load in stock prices data and store it as a pandas DataFrame
organized by date:
import pandas as pd
StockPrices = pd.read_csv('StockData.csv', parse_dates=['Date'])
StockPrices = StockPrices.sort_values(by='Date')
StockPrices.set_index('Date', inplace=True)
Calculating stock returns in Python
Step 2:
Calculate daily returns of the adjusted close prices and append
the returns as a new column in the DataFrame.
StockPrices["Returns"] = StockPrices["Adj Close"].pct_change()
StockPrices["Returns"].head()
Visualizing return distributions
import matplotlib.pyplot as plt
plt.hist(StockPrices["Returns"].dropna(), bins=75, density=False)
plt.show()
Mean, variance, and normal distribution
Moments of distributions
Probability distributions have the following moments:
1) Mean (μ)
2) Variance (σ²)
3) Skewness
4) Kurtosis
There are many types of distributions. Some are normal and some are non-normal. A random variable with a Gaussian distribution is said to be normally distributed.
Normal Distributions have the following properties:
Mean = μ
Variance = σ²
Skewness = 0
Kurtosis = 3
The standard normal distribution
The Standard Normal is a special case of the Normal
Distribution when:
σ=1
μ=0
Comparing against a normal distribution
Normal distributions have a skewness near 0 and a kurtosis
near 3.
Financial returns tend not to be normally distributed
Financial returns can have high kurtosis
Calculating mean returns in Python
To calculate the average daily return, use the np.mean()
function:
import numpy as np
np.mean(StockPrices["Returns"])
0.0003
To calculate the average annualized return assuming 252
trading days in a year:
import numpy as np
((1+np.mean(StockPrices["Returns"]))**252)-1
0.0785
Standard deviation and variance
Standard Deviation (Volatility)
Often represented in mathematical notation as σ, or referred to as volatility
An investment with higher σ is viewed as a higher risk investment
Measures the dispersion of returns
Variance = σ²
Standard deviation and variance in Python
Assume you have pre-loaded stock returns data in the StockPrices object. To calculate the periodic standard deviation of returns:
import numpy as np
np.std(StockPrices["Returns"])
0.0256
To calculate variance, simply square the standard deviation:
np.std(StockPrices["Returns"])**2
0.000655
Scaling volatility
Volatility scales with the square root of time
You can normally assume 252 trading days in a given year, and 21 trading days in a given month
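As a worked example of square-root-of-time scaling, the sketch below converts a hypothetical daily volatility of 1% (an assumed figure, not course data) into monthly and annual terms:

```python
import numpy as np

daily_vol = 0.01  # hypothetical daily standard deviation of returns (1%)

# Volatility scales with the square root of the number of periods
monthly_vol = daily_vol * np.sqrt(21)   # 21 trading days per month
annual_vol = daily_vol * np.sqrt(252)   # 252 trading days per year

print(round(monthly_vol, 4), round(annual_vol, 4))  # 0.0458 0.1587
```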
Scaling volatility in Python
Assume you have pre-loaded stock returns data in the StockPrices object. To calculate the annualized volatility of returns:
import numpy as np
np.std(StockPrices["Returns"]) * np.sqrt(252)
0.3071
Skewness and kurtosis
Skewness is the third moment of a distribution.
Negative Skew: The mass of the distribution is concentrated on the right. Usually a right-leaning curve
Positive Skew: The mass of the distribution is concentrated on the left. Usually a left-leaning curve
In finance, you would tend to want positive skewness
Skewness in Python
Assume you have pre-loaded stock returns data in the
StockData object.
To calculate the skewness of returns:
from scipy.stats import skew
skew(StockData["Returns"].dropna())
0.225
Note that the skewness is higher than 0 in this example,
suggesting non-normality.
Kurtosis is a measure of the thickness of the tails of a distribution
Most financial returns are leptokurtic
Leptokurtic: When a distribution has positive excess kurtosis (kurtosis greater than 3)
Excess Kurtosis: Subtract 3 from the sample kurtosis to calculate "Excess Kurtosis"
Excess kurtosis in Python
Assume you have pre-loaded stock returns data in the
StockData object. To calculate the excess kurtosis of returns:
from scipy.stats import kurtosis
kurtosis(StockData["Returns"].dropna())
2.44
Note the excess kurtosis greater than 0 in this example,
suggesting non-normality.
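Note that scipy's kurtosis() returns excess kurtosis by default. A small sketch of the difference between the two conventions, using a simulated fat-tailed sample (the Student-t data and the variable names are assumptions for illustration only):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=10_000)  # simulated fat-tailed "returns"

# Default fisher=True: excess kurtosis, so a normal distribution gives about 0
excess = kurtosis(sample)

# fisher=False: raw (Pearson) kurtosis, so a normal distribution gives about 3
raw = kurtosis(sample, fisher=False)

print(excess, raw)  # the two values differ by exactly 3
```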
Testing for normality in Python
How do you perform a statistical test for normality?
The null hypothesis of the Shapiro-Wilk test is that the data are
normally distributed.
# Run the Shapiro-Wilk normality test in Python
from scipy import stats
p_value = stats.shapiro(StockData["Returns"].dropna())[1]
if p_value <= 0.05:
    print("Null hypothesis of normality is rejected.")
else:
    print("Null hypothesis of normality is not rejected.")
The p-value is the second value returned by stats.shapiro(). If the p-value is at most 0.05, the null hypothesis is rejected: data this far from normal would be unlikely if returns were truly normally distributed. A larger p-value means only that normality cannot be rejected, not that it has been shown.
Portfolio composition and backtesting
Calculating portfolio returns
Portfolio Return Formula:
$$R_p = R_{a_1} w_{a_1} + R_{a_2} w_{a_2} + \dots + R_{a_n} w_{a_n}$$
$R_p$: Portfolio return
$R_{a_n}$: Return for asset n
$w_{a_n}$: Weight for asset n
Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the portfolio return for a set of
portfolio weights as follows:
import numpy as np
portfolio_weights = np.array([0.25, 0.35, 0.10, 0.20, 0.10])
port_ret = StockReturns.mul(portfolio_weights, axis=1).sum(axis=1)
port_ret
Date
2017-01-03 0.008082
2017-01-04 0.000161
2017-01-05 0.003448
...
StockReturns["Portfolio"] = port_ret
Equally weighted portfolios in Python
Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the portfolio return for an equally
weighted portfolio as follows:
import numpy as np
numstocks = 5
portfolio_weights_ew = np.repeat(1/numstocks, numstocks)
StockReturns.iloc[:,0:numstocks].mul(portfolio_weights_ew, axis=1).sum(axis=1)
Date
2017-01-03 0.008082
2017-01-04 0.000161
2017-01-05 0.003448
...
Plotting portfolio returns in Python
To plot the daily returns in Python:
StockPrices["Returns"] = StockPrices["Adj Close"].pct_change()
StockReturns = StockPrices["Returns"]
StockReturns.plot()
Plotting portfolio cumulative returns
In order to plot the cumulative returns of multiple portfolios:
import matplotlib.pyplot as plt
CumulativeReturns = ((1 + StockReturns).cumprod() - 1)
CumulativeReturns[["Portfolio","Portfolio_EW"]].plot()
Market capitalization
Market capitalization: The value of a company's publicly traded
shares.
Also referred to as Market cap.
Market-cap weighted portfolios
In order to calculate the market cap weight of a given stock n:
$$w_{mcap_n} = \frac{mcap_n}{\sum_{i=1}^{n} mcap_i}$$
Market-Cap weights in Python
To calculate market cap weights in Python, assuming you have data on the market caps of each company:
import numpy as np
market_capitalizations = np.array([100, 200, 100, 100])
mcap_weights = market_capitalizations/sum(market_capitalizations)
mcap_weights
array([0.2, 0.4, 0.2, 0.2])
Correlation and co-variance
Pearson correlation
Examples of different correlations between two random variables:
Pearson correlation
A heatmap of a correlation matrix:
Correlation matrix in Python
Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the correlation matrix as follows:
correlation_matrix = StockReturns.corr()
print(correlation_matrix)
Portfolio standard deviation
Portfolio standard deviation for a two asset portfolio:
$$\sigma_p = \sqrt{w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho_{1,2} \sigma_1 \sigma_2}$$
$\sigma_p$: Portfolio standard deviation
$w$: Asset weight
$\sigma$: Asset volatility
$\rho_{1,2}$: Correlation between assets 1 and 2
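A worked instance of the two-asset formula, using hypothetical weights, volatilities, and correlation (all numbers are assumed for illustration). The same value is also recovered from the matrix form w^T Σ w:

```python
import numpy as np

# Hypothetical inputs (illustrative only)
w1, w2 = 0.6, 0.4            # asset weights
sigma1, sigma2 = 0.20, 0.10  # asset volatilities
rho = 0.3                    # correlation between the two assets

# Two-asset portfolio variance, term by term
port_var = (w1**2 * sigma1**2 + w2**2 * sigma2**2
            + 2 * w1 * w2 * rho * sigma1 * sigma2)
port_vol = np.sqrt(port_var)

# Same calculation with a 2x2 covariance matrix
cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])
w = np.array([w1, w2])
port_vol_matrix = np.sqrt(w @ cov @ w)

print(port_vol, port_vol_matrix)
```

With these inputs the portfolio volatility (about 0.137) is below the weighted average of the two volatilities (0.16), which is the diversification effect of a correlation below 1.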
The Co-variance matrix
To calculate the co-variance matrix (Σ) of returns X:
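A minimal sketch, assuming the usual sample covariance definition (demeaned returns, transposed and multiplied, divided by n - 1). The simulated returns and the variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(0.0, 0.01, size=(250, 3))  # simulated returns: 250 days x 3 assets

# Subtract each asset's mean return from its column
X_demeaned = X - X.mean(axis=0)
n_obs = X.shape[0]

# Sample covariance matrix: (X - mean)^T (X - mean) / (n - 1)
cov_manual = X_demeaned.T @ X_demeaned / (n_obs - 1)

# np.cov treats rows as variables by default, so pass rowvar=False
cov_numpy = np.cov(X, rowvar=False)
print(np.allclose(cov_manual, cov_numpy))  # True
```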
The Co-variance matrix in Python
Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the covariance matrix as follows:
cov_mat = StockReturns.cov()
cov_mat
Annualizing the covariance matrix
To annualize the covariance matrix:
cov_mat_annual = cov_mat * 252
Portfolio standard deviation using covariance
The formula for portfolio volatility is:
$$\sigma_{Portfolio} = \sqrt{w^T \cdot \Sigma \cdot w}$$
$\sigma_{Portfolio}$: Portfolio volatility
$\Sigma$: Covariance matrix of returns
$w$: Portfolio weights ($w^T$ is transposed portfolio weights)
$\cdot$: The dot-multiplication operator
Matrix transpose
Examples of matrix transpose operations:
Dot product
The dot product operation of two vectors a and b:
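A small numeric sketch of both operations, with made-up vectors and a made-up matrix:

```python
import numpy as np

# Dot product of two vectors: sum of element-wise products
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot_ab = np.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32

# Transpose: rows become columns
M = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
M_T = M.T                   # shape (3, 2)

print(dot_ab)
print(M_T)
```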
Portfolio standard deviation using Python
To calculate portfolio volatility assume a weights array and a
covariance matrix:
import numpy as np
port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_mat, weights)))
port_vol
0.035
Markowitz portfolios
100,000 randomly generated portfolios
Sharpe ratio
The Sharpe ratio is a measure of risk-adjusted return.
To calculate the 1966 version of the Sharpe ratio:
$$S = \frac{R_a - r_f}{\sigma_a}$$
S: Sharpe Ratio
Ra : Asset return
rf : Risk-free rate of return
σa : Asset volatility
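A worked example of the ratio with hypothetical annual figures (the numbers are assumptions chosen so the arithmetic is easy to follow):

```python
import numpy as np

asset_return = 0.10  # hypothetical annual asset return R_a
risk_free = 0.02     # hypothetical annual risk-free rate r_f
asset_vol = 0.16     # hypothetical annual asset volatility sigma_a

# Excess return per unit of volatility
sharpe = (asset_return - risk_free) / asset_vol
print(round(sharpe, 2))  # 0.5
```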
The efficient frontier
The Markowitz portfolios
Any point on the efficient frontier is an optimum portfolio.
These two common points are called Markowitz Portfolios:
MSR: Max Sharpe Ratio portfolio
GMV: Global Minimum Volatility portfolio
Choosing a portfolio
How do you choose the best portfolio?
Try to pick a portfolio on the bounding edge of the efficient frontier
Higher return is available if you can stomach higher risk
Selecting the MSR in Python
Assuming a DataFrame df of random portfolios with
Volatility and Returns columns:
numstocks = 5
risk_free = 0
df["Sharpe"] = (df["Returns"] - risk_free) / df["Volatility"]
MSR = df.sort_values(by=['Sharpe'], ascending=False)
MSR_weights = MSR.iloc[0, 0:numstocks]
np.array(MSR_weights)
array([0.15, 0.35, 0.10, 0.15, 0.25])
Past performance is not a guarantee of future returns
Even though a Max Sharpe Ratio portfolio might sound nice, in practice, returns are extremely difficult to predict.
Selecting the GMV in Python
Assuming a DataFrame df of random portfolios with
Volatility and Returns columns:
numstocks = 5
GMV = df.sort_values(by=['Volatility'], ascending=True)
GMV_weights = GMV.iloc[0, 0:numstocks]
np.array(GMV_weights)
array([0.25, 0.15, 0.35, 0.15, 0.10])
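Both selections above assume that a DataFrame df of random portfolios already exists. A minimal sketch of one way such a table might be built, assuming simulated asset returns, Dirichlet-distributed long-only weights, and simple x252 annualization (all of these are illustrative choices, not the course's data or method):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
numstocks = 5
n_portfolios = 1000  # kept small here; the slide's chart uses 100,000

# Simulated daily returns for five hypothetical assets
asset_returns = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(500, numstocks)),
    columns=[f"Asset_{i}" for i in range(numstocks)],
)
mean_annual = asset_returns.mean().values * 252
cov_annual = asset_returns.cov().values * 252

# Random long-only weights, each row summing to 1
weights = rng.dirichlet(np.ones(numstocks), size=n_portfolios)

# Weight columns come first so that .iloc[0, 0:numstocks] picks them up
df = pd.DataFrame(weights, columns=[f"{c} weight" for c in asset_returns.columns])
df["Returns"] = weights @ mean_annual
# w^T Sigma w for every row of weights, then square root
df["Volatility"] = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov_annual, weights))

risk_free = 0
df["Sharpe"] = (df["Returns"] - risk_free) / df["Volatility"]
MSR_weights = df.sort_values(by="Sharpe", ascending=False).iloc[0, 0:numstocks]
GMV_weights = df.sort_values(by="Volatility", ascending=True).iloc[0, 0:numstocks]
print(np.array(MSR_weights), np.array(GMV_weights))
```

Sampling random weights only approximates the efficient frontier; more portfolios give a denser cloud at the cost of run time.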
The Capital Asset Pricing Model
The founding father of asset pricing models
CAPM
The Capital Asset Pricing Model is the fundamental building block for many other asset pricing models and factor models in finance.
Excess returns
To calculate excess returns, simply subtract the risk free rate of
return from your total return:
Excess Return = Return − Risk Free Return
Example:
Investing in Brazil:
10% Portfolio Return - 15% Risk Free Rate = -5% Excess Return
Investing in the US:
10% Portfolio Return - 3% Risk Free Rate = 7% Excess Return
The Capital Asset Pricing Model
$$E(R_P) - R_F = \beta_P \left(E(R_M) - R_F\right)$$
$E(R_P) - R_F$: The excess expected return of a stock or portfolio P
$E(R_M) - R_F$: The excess expected return of the broad market portfolio M
$R_F$: The regional risk-free rate
$\beta_P$: Portfolio beta, or exposure, to the broad market portfolio M
Calculating Beta using co-variance
To calculate historical beta using co-variance:
$$\beta_P = \frac{Cov(R_P, R_B)}{Var(R_B)}$$
$\beta_P$: Portfolio beta
$Cov(R_P, R_B)$: The co-variance between the portfolio (P) and the benchmark market index (B)
$Var(R_B)$: The variance of the benchmark market index
Calculating Beta using co-variance in Python
Assuming you already have excess portfolio and market returns
in the object Data :
covariance_matrix = Data[["Port_Excess","Mkt_Excess"]].cov()
covariance_coefficient = covariance_matrix.iloc[0, 1]
benchmark_variance = Data["Mkt_Excess"].var()
portfolio_beta = covariance_coefficient / benchmark_variance
portfolio_beta
0.93
Linear regressions
Example of a linear regression: Regression formula in matrix
notation:
Calculating Beta using linear regression
Assuming you already have excess portfolio and market returns
in the object Data :
import statsmodels.formula.api as smf
model = smf.ols(formula='Port_Excess ~ Mkt_Excess', data=Data)
fit = model.fit()
beta = fit.params["Mkt_Excess"]
beta
0.93
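The co-variance approach and the regression approach give the same beta, because the slope of a one-variable least-squares fit with an intercept is the covariance divided by the variance of the regressor. A sketch on simulated data (the series, the "true" beta of 0.9, and the use of np.polyfit in place of statsmodels are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
mkt_excess = rng.normal(0.0004, 0.01, 500)                  # simulated market excess returns
port_excess = 0.9 * mkt_excess + rng.normal(0, 0.005, 500)  # simulated portfolio, true beta 0.9

# Beta from covariance / variance (ddof=1 to match np.cov's default)
beta_cov = np.cov(port_excess, mkt_excess)[0, 1] / np.var(mkt_excess, ddof=1)

# Beta as the slope of a least-squares line with an intercept
slope, intercept = np.polyfit(mkt_excess, port_excess, 1)

print(beta_cov, slope)  # the two estimates agree
```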
R-Squared vs Adjusted R-Squared
To extract the adjusted r-squared and r-squared values:
import statsmodels.formula.api as smf
model = smf.ols(formula='Port_Excess ~ Mkt_Excess', data=Data)
fit = model.fit()
r_squared = fit.rsquared
r_squared
0.70
adjusted_r_squared = fit.rsquared_adj
adjusted_r_squared
0.65
Alpha and multi-factor models
The Fama-French 3 factor Model
$$R_P = R_F + \beta_M (R_M - R_F) + b_{SMB} \cdot SMB + b_{HML} \cdot HML + \alpha$$
SMB: The small minus big factor
$b_{SMB}$: Exposure to the SMB factor
HML: The high minus low factor
$b_{HML}$: Exposure to the HML factor
$\alpha$: Performance which is unexplained by any other factors
$\beta_M$: Beta to the broad market portfolio M
The Fama-French 3 factor model in Python
Assuming you already have excess portfolio and market returns
in the object Data :
import statsmodels.formula.api as smf
model = smf.ols(formula='Port_Excess ~ Mkt_Excess + SMB + HML',
data=Data)
fit = model.fit()
adjusted_r_squared = fit.rsquared_adj
adjusted_r_squared
0.90
P-values and statistical significance
To extract the HML p-value, assuming you have a fitted regression model object in your workspace as fit :
fit.pvalues["HML"]
0.0063
To test if it is statistically significant, simply examine whether or not it is less than a given threshold, normally 0.05:
fit.pvalues["HML"] < 0.05
True
Extracting coefficients
To extract the HML coefficient, assuming you have a fitted regression model object in your workspace as fit :
fit.params["HML"]
0.502
fit.params["SMB"]
-0.243
Alpha and the efficient market hypothesis
Assuming you already have a fitted regression analysis in the object fit :
portfolio_alpha = fit.params["Intercept"]
portfolio_alpha_annualized = ((1 + portfolio_alpha) ** 252) - 1
portfolio_alpha_annualized
0.045
Expanding the 3-factor model
Fama French 1993
The original paper that started it all:
Cliff Asness on Momentum
A paper published later by Cliff Asness from AQR:
The Fama-French 5 factor model
In 2015, Fama and French extended their previous 3-factor
model, adding two additional factors:
RMW: Profitability
CMA: Investment
The RMW (robust minus weak) factor represents the returns of companies with high operating profitability versus those with low operating profitability.
The CMA (conservative minus aggressive) factor represents the returns of companies that invest conservatively versus those that invest aggressively.
The Fama-French 5 factor model in Python
Assuming you already have excess portfolio and market returns
in the object Data :
import statsmodels.formula.api as smf
model = smf.ols(formula='Port_Excess ~ Mkt_Excess + SMB + HML + RMW + CMA',
data=Data)
fit = model.fit()
adjusted_r_squared = fit.rsquared_adj
adjusted_r_squared
0.92
Estimating tail risk
Tail risk is the risk of extreme investment outcomes, most
notably on the negative side of a distribution.
Historical Drawdown
Value at Risk
Conditional Value at Risk
Monte-Carlo Simulation
Historical drawdown
Drawdown is the percentage loss from the highest cumulative historical point.
$$\text{Drawdown} = \frac{r_t}{RM} - 1$$
$r_t$: Cumulative return at time t
$RM$: Running maximum
[Figure: Historical Drawdown of the USO Oil ETF]
Historical drawdown in Python
Assuming cum_rets is an np.array of cumulative returns over
time
running_max = np.maximum.accumulate(cum_rets)
running_max[running_max < 1] = 1
drawdown = (cum_rets) / running_max - 1
drawdown
Date Return
2007-01-03 -0.042636
2007-01-04 -0.081589
2007-01-05 -0.073062
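The snippet above takes cum_rets as given. A sketch of how it might be built from periodic returns, assuming cum_rets is the growth of one unit invested (that is, the cumulative product of 1 + return); the return values are hypothetical:

```python
import numpy as np

returns = np.array([0.05, -0.10, 0.02, -0.04, 0.08])  # hypothetical periodic returns

# Growth of 1 unit invested: cumulative product of gross returns
cum_rets = np.cumprod(1 + returns)

# Highest value reached so far, floored at the starting value of 1
running_max = np.maximum.accumulate(cum_rets)
running_max[running_max < 1] = 1

# Percentage loss relative to the running maximum
drawdown = cum_rets / running_max - 1
max_drawdown = drawdown.min()

print(drawdown)
print(max_drawdown)  # about -0.1187
```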
Historical Value at Risk
Value at Risk, or VaR, is a threshold with a given confidence level that losses will not (or more accurately, will not historically) exceed a certain level.
VaR is commonly quoted with quantiles such as 95, 99, and 99.9.
Example: VaR(95) = -2.3%
95% certain that losses will not exceed -2.3% in a given day based on historical values.
Historical Value at Risk in Python
var_level = 95
var_95 = np.percentile(StockReturns, 100 - var_level)
var_95
-0.023
Historical expected shortfall
Conditional Value at Risk, or CVaR, is an estimate of expected losses sustained in the worst (100 - x)% of scenarios, where x is the quantile.
CVaR is commonly quoted with quantiles such as 95, 99, and 99.9.
Example: CVaR(95) = -2.5%
In the worst 5% of cases, losses averaged -2.5% historically.
Historical expected shortfall in Python
Assuming you have an object StockReturns which is a time
series of stock returns.
To calculate historical CVaR(95):
var_level = 95
var_95 = np.percentile(StockReturns, 100 - var_level)
cvar_95 = StockReturns[StockReturns <= var_95].mean()
cvar_95
-0.025
VaR extensions
VaR quantiles
Empirical assumptions
Empirical historical values are those that have actually
occurred.
How do you simulate the probability of a value that has never
occurred historically before?
Sample from a probability distribution
Parametric VaR in Python
Assuming you have an object StockReturns which is a time
series of stock returns.
To calculate parametric VaR(95):
from scipy.stats import norm
mu = np.mean(StockReturns)
std = np.std(StockReturns)
confidence_level = 0.05  # left-tail probability, i.e. 1 - 0.95
VaR = norm.ppf(confidence_level, mu, std)
VaR
-0.0235
Scaling risk
Scaling risk in Python
Assuming you have a one-day estimate of VaR(95) var_95 .
To estimate 5-day VaR(95):
forecast_days = 5
forecast_var95_5day = var_95*np.sqrt(forecast_days)
forecast_var95_5day
-0.0525
Random walks
Most often, random walks in finance are rather simple compared to physics:
Random walks in Python
Assuming you have an object StockReturns which is a time
series of stock returns.
To simulate a random walk:
mu = np.mean(StockReturns)
std = np.std(StockReturns)
T = 252
S0 = 10
rand_rets = np.random.normal(mu, std, T) + 1
forecasted_values = S0 * (rand_rets.cumprod())
forecasted_values
array([ 9.71274884, 9.72536923, 10.03605425 ... ])
Monte Carlo simulations
A series of Monte Carlo simulations of a single asset starting at
stock price $10 at T0. Forecasted for 1 year (252 trading days
along the x-axis):
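A sketch of how a set of such paths might be generated in one step, by repeating the single random walk for many simulations. The daily mean, volatility, and number of simulations are hypothetical values, and the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, vol = 0.0005, 0.01        # hypothetical daily mean return and volatility
T, S0, n_sims = 252, 10, 100  # trading days, starting price, number of paths

# Each row holds T simulated daily gross returns (1 + return) for one path
rand_rets = rng.normal(mu, vol, size=(n_sims, T)) + 1

# Compound along each row to turn returns into a price path starting from S0
paths = S0 * rand_rets.cumprod(axis=1)

final_prices = paths[:, -1]
print(paths.shape, final_prices.mean())
```

Each row of paths can then be plotted as one line to reproduce a chart of this kind.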
Monte Carlo VaR in Python
To calculate the VaR(95) of 100 Monte Carlo simulations:
mu = 0.0005
vol = 0.001
T = 252
sim_returns = []
for i in range(100):
    rand_rets = np.random.normal(mu, vol, T)
    sim_returns.append(rand_rets)
var_95 = np.percentile(sim_returns, 5)
var_95
-0.028
Understanding risk
Summary
Moments and Distributions
Portfolio Composition
Correlation and Co-Variance
Markowitz Optimization
Beta & CAPM
Fama-French Factor Modeling
Alpha
Value at Risk
Monte Carlo Simulations
Good luck!
Welcome!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Dr. Jamsheed Shorish
Computational Economist
About Me
Computational Economist
Specializing in:
asset pricing
financial technologies ("FinTech")
computer applications to economics and finance
Co-instructor, "Economic Analysis of the Digital Economy" at the ANU
Shorish Research (Belgium): computational business applications
What is Quantitative Risk Management?
Quantitative Risk Management: Study of quantifiable uncertainty
Uncertainty:
Future outcomes are unknown
Outcomes impact planning decisions
Risk management: mitigate (reduce effects of) adverse outcomes
Quantifiable uncertainty: identify factors to measure risk
Example: Fire insurance. What factors make fire more likely?
This course: focus upon risk associated with a financial portfolio
Risk management and the Global Financial Crisis
Great Recession (2007 - 2010)
Global growth loss more than $2 trillion
United States: nearly $10 trillion lost in household wealth
U.S. stock markets lost c. $8 trillion in value
Global Financial Crisis (2007-2009)
Large-scale changes in fundamental asset values
Massive uncertainty about future returns
High asset returns volatility
Risk management critical to success or failure
Quick recap: financial portfolios
Financial portfolio
Collection of assets with uncertain future returns
Stocks
Bonds
Foreign exchange holdings ('forex')
Stock options
Challenge: quantify risk to manage uncertainty
Make optimal investment decisions
Maximize portfolio return, conditional on risk appetite
Quantifying return
Portfolio return: weighted sum of individual asset returns
Pandas data analysis library
DataFrame prices
.pct_change() method
.dot() method of returns
import pandas
prices = pandas.read_csv("portfolio.csv")
returns = prices.pct_change()
weights = (weight_1, weight_2, ...)  # placeholder: one weight per asset column
portfolio_returns = returns.dot(weights)
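A self-contained version of the same steps on a tiny made-up price table, so the weighted sum can be checked by hand. The asset names, prices, and weights are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two assets over three days
prices = pd.DataFrame({"A": [100.0, 101.0, 103.0],
                       "B": [50.0, 49.5, 50.0]})

# Period-over-period returns; the first row is NaN and is dropped
returns = prices.pct_change().dropna()

# One weight per asset column, summing to 1
weights = [0.6, 0.4]

# Weighted sum of asset returns for each day
portfolio_returns = returns.dot(weights)
print(portfolio_returns)
```

On the first day asset A gains 1% and asset B loses 1%, so the portfolio return is 0.6 * 0.01 + 0.4 * (-0.01) = 0.002.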
Quantifying risk
Portfolio return volatility = risk
Calculate volatility via covariance matrix
Use .cov() DataFrame method of returns and annualize
Diagonal of covariance is individual asset variances
Off-diagonals of covariance are covariances between assets
covariance = returns.cov()*252
print(covariance)
Portfolio risk
Depends upon asset weights in portfolio
Portfolio variance $\sigma_p^2$ is
$$\sigma_p^2 := w^T \cdot Cov_p \cdot w$$
Matrix multiplication can be computed using @ operator in Python
Standard deviation is usually used instead of variance
import numpy as np
weights = [0.25, 0.25, 0.25, 0.25] # Assumes four assets in portfolio
portfolio_variance = np.transpose(weights) @ covariance @ weights
portfolio_volatility = np.sqrt(portfolio_variance)
Volatility time series
Can also calculate portfolio volatility over time
Use a 'window' to compute volatility over a fixed time period (e.g. week, 30-day 'month')
Series.rolling() creates a window
Observe volatility trend and possible extreme events
windowed = portfolio_returns.rolling(30)
volatility = windowed.std()*np.sqrt(252)
volatility.plot().set_ylabel("Standard Deviation...")
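A runnable sketch of the windowed calculation on simulated data (the dates and returns are made up, and plotting is left out so the example needs only numpy and pandas):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
dates = pd.bdate_range("2008-01-01", periods=120)  # hypothetical business days
portfolio_returns = pd.Series(rng.normal(0, 0.01, 120), index=dates)  # simulated returns

# 30-observation rolling window, then annualize the standard deviation
windowed = portfolio_returns.rolling(30)
volatility = windowed.std() * np.sqrt(252)

# The first 29 values are NaN because the window is not yet full
print(volatility.dropna().head())
```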
Risk factors and the financial crisis
Risk factors
Volatility: measure of dispersion of returns around expected value
Time series: expected value = sample average
What drives expectation and dispersion?
Risk factors: variables or events driving portfolio return and volatility
Risk exposure
Risk exposure: measure of possible portfolio loss
Risk factors determine risk exposure
Example: Flood Insurance
Deductible: out-of-pocket payment regardless of loss
100% coverage still leaves deductible to be paid
So deductible is risk exposure
Frequent flooding => more volatile flood outcome
Frequent flooding => higher risk exposure
Systematic risk
Systematic risk: risk factor(s) affecting volatility of all portfolio assets
Market risk: systematic risk from general financial market movements
Airplane engine failure: systematic risk!
Examples of financial systematic risk factors:
Price level changes, i.e. inflation
Interest rate changes
Economic climate changes
Idiosyncratic risk
Idiosyncratic risk: risk specific to a particular asset/asset class.
Turbulence and the unfastened seatbelt: idiosyncratic risk!
Examples of idiosyncratic risk:
Bond portfolio: issuer risk of default
Firm/sector characteristics
Firm size (market capitalization)
Book-to-market ratio
Sector shocks
Factor models
Factor model: assessment of risk factors affecting portfolio return
Statistical regression, e.g. Ordinary Least Squares (OLS):
dependent variable: returns (or volatility)
independent variable(s): systematic and/or idiosyncratic risk factors
Fama-French factor model: combination of
market risk and
idiosyncratic risk (firm size, firm value)
Crisis risk factor: mortgage-backed securities
Investment banks: borrowed heavily just before the crisis
Collateral: mortgage-backed securities (MBS)
MBS: supposed to diversify risk by holding many mortgages of different characteristics
Flaw: mortgage default risk in fact was highly correlated
Avalanche of delinquencies/default destroyed collateral value
90-day mortgage delinquency: risk factor
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Crisis factor model
Factor model regression: portfolio returns vs. mortgage delinquency
Import statsmodels.api library for regression tools
Fit regression using .OLS() object and its .fit() method
Display results using regression's .summary() method
import statsmodels.api as sm
regression = sm.OLS(returns, delinquencies).fit()
print(regression.summary())
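A minimal sketch of the factor model regression above, assuming synthetic data: the delinquency series, the return series and every coefficient are made up for illustration, and NumPy's least-squares solver stands in for statsmodels. The column of ones adds an intercept, the role that sm.add_constant() plays in statsmodels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quarterly data: delinquency rate (%) and portfolio returns
delinquency = np.linspace(1.0, 10.0, 24)
returns = 0.02 - 0.005 * delinquency + rng.normal(scale=0.005, size=24)

# Design matrix with an explicit intercept column
X = np.column_stack([np.ones_like(delinquency), delinquency])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
alpha, beta = coef

# R-squared: share of return variance explained by the risk factor
fitted = X @ coef
ss_res = np.sum((returns - fitted) ** 2)
ss_tot = np.sum((returns - returns.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, R^2 = {r_squared:.3f}")
```

A negative beta here means higher delinquency goes with lower returns, which is how the made-up data were constructed.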
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Regression .summary() results
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Modern portfolio theory
Jamsheed Shorish
Computational Economist
The risk-return trade-off
Risk factors: sources of uncertainty affecting return
Intuitively: greater uncertainty (more risk) compensated by greater return
Cannot guarantee return: need some measure of expected return
average (mean) historical return: proxy for expected future return
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Investor risk appetite
Investor survey: minimum return required for given level of risk?
Survey response creates (risk, return) risk profile "data point"
Vary risk level => set of (risk, return) points
Investor risk appetite: defines one quantified relationship between risk and return
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Choosing portfolio weights
Vary portfolio weights of given portfolio => creates set of (risk, return) pairs
Changing weights = beginning risk management!
Goal: change weights to maximize expected return, given risk level
Equivalently: minimize risk, given expected return level
Changing weights = adjusting investor's risk exposure
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Modern portfolio theory
Efficient portfolio: portfolio with weights generating highest expected return for given level
of risk
Modern Portfolio Theory (MPT), 1952
H. M. Markowitz (Nobel Laureate 1990)
Efficient portfolio weight vector w ⋆ solves:
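One common way to write this problem, as a sketch under assumptions the slide does not state: μ is the vector of expected returns, Σ the covariance matrix of returns, σ̄ the given risk level, and the budget and long-only constraints are illustrative choices.

```latex
w^{\star} = \arg\max_{w} \; w^{\top}\mu
\quad \text{subject to} \quad
w^{\top}\Sigma\, w \le \bar{\sigma}^{2}, \qquad
\mathbf{1}^{\top} w = 1, \qquad
w \ge 0
```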
QUANTITATIVE RISK MANAGEMENT IN PYTHON
The efficient frontier
Compute many efficient portfolios for different levels of risk
Efficient frontier: locus of (risk, return) pairs created by efficient portfolios
PyPortfolioOpt library: optimized tools for MPT
EfficientFrontier class: generates one optimal portfolio at a time
Critical Line Algorithm ( CLA ) class: generates the entire efficient frontier
Requires covariance matrix of returns
Requires proxy for expected future returns: mean historical returns
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Investment bank portfolio 2005 - 2010
Expected returns: historical data
Covariance matrix: Covariance Shrinkage improves efficiency of estimate
Critical Line Algorithm object CLA
Minimum variance portfolio: cla.min_volatility()
Efficient frontier: cla.efficient_frontier()
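from pypfopt.expected_returns import mean_historical_return
from pypfopt.risk_models import CovarianceShrinkage
from pypfopt.cla import CLA
expected_returns = mean_historical_return(prices)
efficient_cov = CovarianceShrinkage(prices).ledoit_wolf()
cla = CLA(expected_returns, efficient_cov)
minimum_variance = cla.min_volatility()
(ret, vol, weights) = cla.efficient_frontier()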
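To make the minimum variance portfolio concrete without PyPortfolioOpt, here is a sketch using the closed-form solution w = Σ⁻¹1 / (1ᵀΣ⁻¹1). It assumes synthetic returns and allows short positions, so it is not the long-only optimization that the CLA object performs; all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily returns for four assets (rows = days, columns = assets)
asset_returns = rng.normal(loc=0.0005, scale=[0.01, 0.015, 0.02, 0.012],
                           size=(500, 4))
cov = np.cov(asset_returns, rowvar=False)

# Global minimum variance weights: solve cov @ x = 1, then normalize to sum to 1
ones = np.ones(cov.shape[0])
raw = np.linalg.solve(cov, ones)
w_min = raw / raw.sum()

def portfolio_vol(w):
    """Portfolio volatility (standard deviation) for weight vector w."""
    return float(np.sqrt(w @ cov @ w))

equal_weights = ones / ones.size
print("min-variance weights:", np.round(w_min, 3))
print("volatility:", portfolio_vol(w_min), "vs equal weights:", portfolio_vol(equal_weights))
```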
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Visualizing the efficient frontier
Scatter plot of (vol, ret) pairs
Minimum variance portfolio: smallest
volatility of all possible efficient portfolios
Increasing risk appetite: move along the
frontier
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Measuring Risk
Jamsheed Shorish
CEO, Shorish Research
The Loss Distribution
Forex Example:
Portfolio value in U.S. dollars is USD 100
Risk factor = EUR / USD exchange rate
Portfolio value in EURO if EUR 1 = USD 1:
USD 100 x EUR 1 / USD 1 = EUR 100.
Portfolio value in EURO if EUR r = USD 1:
USD 100 x EUR r / USD 1 = EUR 100 x r
Loss = EUR 100 - EUR 100 x r = EUR 100 x (1 - r)
Loss distribution: Random realizations of r => distribution of portfolio losses in the future
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Maximum loss
What is the maximum loss of a portfolio?
Losses cannot be bounded with 100% certainty
Confidence Level: replace 100% certainty with likelihood of upper bound
Can express questions like "What is the maximum loss that would take place 95% of the
time?"
Here the confidence level is 95%.
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Value at Risk (VaR)
VaR: statistic measuring maximum
portfolio loss at a particular confidence
level
Typical confidence levels: 95%, 99%, and
99.5% (usually represented as decimals)
Forex Example: If 95% of the time EUR / USD exchange rate is at least 0.40, then:
portfolio value is at least USD 100 x 0.40 EUR / USD = EUR 40,
portfolio loss is at most EUR 100 - EUR 40 = EUR 60,
so the 95% VaR is EUR 60.
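The Forex example can be sketched in code. The first part reproduces the slide's arithmetic; the second part assumes a made-up lognormal distribution for the exchange rate r to show that the 95% VaR of the loss corresponds to the 5% quantile of the rate. The function name loss_eur and all distribution parameters are hypothetical.

```python
import numpy as np

portfolio_usd = 100.0

def loss_eur(r):
    """Loss in EUR relative to parity (r = 1) for a EUR/USD rate r."""
    return portfolio_usd * (1.0 - r)

# Slide example: the rate is at least 0.40 in 95% of cases
var_95_example = loss_eur(0.40)

# Hypothetical exchange-rate distribution (lognormal, parameters made up)
rng = np.random.default_rng(7)
rates = rng.lognormal(mean=-0.1, sigma=0.25, size=10_000)
losses = loss_eur(rates)

var_95 = np.quantile(losses, 0.95)     # 95% quantile of the loss
rate_floor = np.quantile(rates, 0.05)  # rate exceeded 95% of the time
print(var_95_example, var_95, loss_eur(rate_floor))
```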
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Conditional Value at Risk (CVaR)
CVaR: measures expected loss given a minimum loss equal to the VaR
Equals expected value of the tail of the loss distribution:
$\mathrm{CVaR}(\alpha) := \frac{1}{1-\alpha} \int_{\mathrm{VaR}(\alpha)}^{\bar{x}} x f(x)\,dx$,
f (⋅) = loss distribution pdf
x̄ = upper bound of the loss (can be infinity)
VaR(α) = VaR at the α confidence level.
Forex Example:
95% CVaR = expected loss for 5% of cases when portfolio value smaller than EUR 40
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Deriving the VaR
1. Specify confidence level, e.g. 95% (0.95)
2. Create Series of loss observations
3. Compute loss.quantile() at specified confidence level
4. VaR = computed .quantile() at desired confidence level
5. scipy.stats loss distribution: percent point function .ppf() can also be used
loss = pd.Series(observations)
VaR_95 = loss.quantile(0.95)
print("VaR_95 = ", VaR_95)
VaR_95 =  1.6192834157254088
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Deriving the CVaR
1. Specify confidence level, e.g. 95% (0.95)
2. Create or use sample from loss distribution
3. Compute VaR at a specified confidence level, e.g. 0.95.
4. Compute CVaR as expected loss (Normal distribution: scipy.stats.norm.expect() does
this).
losses = pd.Series(scipy.stats.norm.rvs(size=1000))
VaR_95 = scipy.stats.norm.ppf(0.95)
CVaR_95 = (1/(1 - 0.95))*scipy.stats.norm.expect(lambda x: x, lb = VaR_95)
print("CVaR_95 = ", CVaR_95)
CVaR_95 = 2.153595332530393
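As a cross-check on the integral definition, the sketch below estimates CVaR empirically as the average of the losses at or beyond the VaR, and compares it with the closed form φ(z)/(1 − α) that holds for a standard Normal loss. It assumes simulated standard Normal losses; the sample size and seed are arbitrary.

```python
import math
from statistics import NormalDist

import numpy as np

alpha = 0.95
rng = np.random.default_rng(42)
losses = rng.standard_normal(200_000)  # hypothetical standard Normal losses

# Empirical VaR and CVaR: quantile, then mean of the tail beyond it
VaR_95 = np.quantile(losses, alpha)
CVaR_95 = losses[losses >= VaR_95].mean()

# Closed form for the standard Normal: CVaR = pdf(z) / (1 - alpha)
z = NormalDist().inv_cdf(alpha)
CVaR_exact = math.exp(-z**2 / 2) / math.sqrt(2 * math.pi) / (1 - alpha)
print("empirical:", VaR_95, CVaR_95, " closed form:", z, CVaR_exact)
```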
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Visualizing the VaR
Loss distribution histogram for 1000 draws
from N(1,3)
VaR95 = 5.72, i.e. VaR at 95% confidence
VaR99 = 7.81, i.e. VaR at 99% confidence
VaR99.5 = 8.78, i.e. VaR at 99.5%
confidence
The VaR measure increases as the
confidence level rises
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Risk exposure and loss
Jamsheed Shorish
Computational Economist
A vacation analogy
Hotel reservations for vacation
Pay in advance, before stay
Low room rate
Non-refundable: cancellation fee = 100%
of room rate
Pay after arrival
High room rate
Partially refundable: cancellation fee of
20% of room rate
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Deciding between options
What determines your decision?
1. Chance of negative shock: illness, travel
disruption, weather
Probability of loss
2. Loss associated with shock: amount or
conditional amount
e.g. VaR, CVaR
3. Desire to avoid shock: personal feeling
Risk tolerance
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Risk exposure and VaR
Risk exposure: probability of loss x loss measure
Loss measure: e.g. VaR
10% chance of canceling vacation: P(Illness) = 0.10
Non-refundable:
Total non-refundable hotel cost: € 500
VaR at 90% confidence level: € 500
Partially refundable:
Refundable hotel cost: € 550
VaR at 90% confidence level: 20% cancellation fee x € 550 = € 110
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Calculating risk exposure
Non-refundable exposure ("nr"):
P(illness) x $\mathrm{VaR}^{nr}_{0.90}$ = 0.10 x € 500 = € 50.
Partially refundable exposure ("pr"):
P(illness) x $\mathrm{VaR}^{pr}_{0.90}$ = 0.10 x € 110 = € 11.
Difference in risk exposure: € 50 - € 11 = € 39.
Total price difference between offers: € 550 - € 500 = € 50.
Risk tolerance: is paying € 50 more worth avoiding € 39 of additional exposure?
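The exposure arithmetic above, written out as a short sketch. The variable names are illustrative; the numbers are the ones from the vacation example.

```python
p_illness = 0.10          # probability of canceling the vacation

# Non-refundable offer: the full room cost is lost on cancellation
cost_nr = 500.0
var_nr = cost_nr

# Partially refundable offer: 20% cancellation fee on the higher rate
cost_pr = 550.0
var_pr = 0.20 * cost_pr

# Risk exposure = probability of loss x loss measure
exposure_nr = p_illness * var_nr
exposure_pr = p_illness * var_pr

exposure_difference = exposure_nr - exposure_pr
price_difference = cost_pr - cost_nr
print(exposure_nr, exposure_pr, exposure_difference, price_difference)
```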
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Risk tolerance and risk appetite
Risk-neutral: only expected values matter
€ 39 < € 50 ⇒ prefer non-refundable option
Risk-averse: uncertainty itself carries a cost
€ 39 < € 50, but may still prefer partially refundable option
Enterprise/institutional risk management: preferences as risk appetite
Individual investors: preferences as risk tolerance
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Loss distribution - discrete
Risk exposure depends upon loss
distribution (probability of loss)
Vacation example: 2 outcomes from
random risk factor
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Loss distribution - continuous
Risk exposure depends upon loss
distribution (probability of loss)
Vacation example: 2 outcomes from
random risk factor
More generally: continuous loss distribution
Normal distribution: good for large
samples
Student's t-distribution: good for smaller
samples
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Primer: Student's t-distribution
Also referred to as T distribution
Has "fatter" tails than Normal for small
samples
Similar to portfolio returns/losses
As sample size grows, T converges to
Normal distribution
QUANTITATIVE RISK MANAGEMENT IN PYTHON
T distribution in Python
Example: compute 95% VaR from T
distribution
Import t distribution from scipy.stats
Fit portfolio_losses data using t.fit()
Compute percent point function with
.ppf() to find VaR
from scipy.stats import t
params = t.fit(portfolio_losses)
VaR_95 = t.ppf(0.95, *params)
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Degrees of freedom
Degrees of freedom (df): number of
independent observations
Small df: "fat tailed" T distribution
Large df: Normal distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t
x = np.linspace(-3, 3, 100)
plt.plot(x, t.pdf(x, df = 2))
plt.plot(x, t.pdf(x, df = 5))
plt.plot(x, t.pdf(x, df = 30))
plt.show()
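The effect of degrees of freedom on the tail can also be read off the quantiles. This sketch compares the 99% quantile of the T distribution for the same three df values with the Normal quantile; the choice of the 99% level is illustrative.

```python
from scipy.stats import norm, t

# 99% quantile of the T distribution for small, medium and large df
quantiles = {df: t.ppf(0.99, df) for df in (2, 5, 30)}
normal_quantile = norm.ppf(0.99)

for df, q in quantiles.items():
    print(f"df = {df:>2}: 99% quantile = {q:.3f}")
print(f"Normal: 99% quantile = {normal_quantile:.3f}")
```

Smaller df pushes the quantile further out (fatter tail); as df grows the quantile approaches the Normal value from above.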
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Risk management using VaR & CVaR
Jamsheed Shorish
Computational Economist
Risk management via modern portfolio theory
Efficient Portfolio
Portfolio weights maximize return given
risk level
Efficient Frontier: locus of (risk, return)
points generated by different efficient
portfolios
Each point = portfolio weight
optimization
Creation of efficient portfolio/frontier:
Modern Portfolio Theory
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Incorporating Value at Risk into MPT
Modern Portfolio Theory (MPT): "mean-variance" optimization
Highest expected return
Risk level (volatility) is given
Objective function: expected return
VaR/CVaR: measure risk over distribution of loss
Adapt MPT to optimize over loss distribution vs. expected return
QUANTITATIVE RISK MANAGEMENT IN PYTHON
A new objective: minimize CVaR
Change objective of portfolio optimization
mean-variance objective: maximize expected mean return
CVaR objective: minimize expected conditional loss at a given confidence level
Example: Loss distribution
VaR: maximum loss with 95% confidence
Optimization: portfolio weights minimizing CVaR
CVaR: expected loss given at least VaR loss (worst 5% of cases)
Find lowest expected loss in worst 100% - 95% = 5% of possible outcomes
QUANTITATIVE RISK MANAGEMENT IN PYTHON
The risk management problem
Select optimal portfolio weights w ⋆ as solution to
Recall: f (x) = probability density function of portfolio loss
PyPortfolioOpt: select minimization of CVaR as new objective
QUANTITATIVE RISK MANAGEMENT IN PYTHON
CVaR minimization using PyPortfolioOpt
Create an EfficientCVaR object with asset returns returns
Compute optimal portfolio weights using .min_cvar() method
ec = pypfopt.efficient_frontier.EfficientCVaR(None, returns)
optimal_weights = ec.min_cvar()
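To illustrate what minimizing CVaR over weights means, here is a sketch that replaces the optimizer with a plain grid search over a two-asset portfolio. It is not how EfficientCVaR works internally; the returns are simulated and the helper cvar() is hypothetical.

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical CVaR: mean of the losses at or beyond the alpha-quantile."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

rng = np.random.default_rng(3)

# Hypothetical daily returns for two assets (made-up means and covariance)
cov = np.array([[0.0004, 0.0001],
                [0.0001, 0.0009]])
asset_returns = rng.multivariate_normal([0.0005, 0.0008], cov, size=5000)

# Evaluate CVaR of the portfolio loss on a grid of long-only weights
grid = np.linspace(0.0, 1.0, 101)
cvars = np.array([cvar(-(asset_returns @ np.array([w, 1.0 - w]))) for w in grid])

best = int(np.argmin(cvars))
w_star = np.array([grid[best], 1.0 - grid[best]])
print("CVaR-minimizing weights:", w_star, " CVaR:", cvars[best])
```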
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Mean-variance vs. CVaR risk management
Mean-variance minimum volatility portfolio, 2005-2010 investment bank assets
ef = EfficientFrontier(None, e_cov)
min_vol_weights = ef.min_volatility()
print(min_vol_weights)
{'Citibank': 0.0,
'Morgan Stanley': 5.0784330940519306e-18,
'Goldman Sachs': 0.6280157234640608,
'J.P. Morgan': 0.3719842765359393}
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Mean-variance vs. CVaR risk management
CVaR-minimizing portfolio, 2005-2010 investment bank assets
ec = pypfopt.efficient_frontier.EfficientCVaR(None, returns)
min_cvar_weights = ec.min_cvar()
print(min_cvar_weights)
{'Citibank': 0.0,
'Morgan Stanley': 0.0,
'Goldman Sachs': 0.669324359403484,
'J.P. Morgan': 0.3306756405965026}
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Portfolio hedging: offsetting risk
Jamsheed Shorish
Computational Economist
Portfolio stability
VaR/CVaR: potential portfolio loss for given confidence level
Portfolio optimization: 'best' portfolio weights
But volatility is still present!
Institutional investors: stability of portfolio against volatile changes
Pension funds: c. USD 20 trillion
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Rainy days, sunny days
Investment portfolio: sunglasses company
Risk factor: weather (rain)
More rain => lower company value
Lower company value => lower stock
price
Lower stock price => lower portfolio value
Second opportunity: umbrella company
More rain => more value!
Portfolio: sunglasses & umbrellas, more
stable
Volatility of rain is offset
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Hedging
Hedging: offset volatility with another asset
Crucial for institutional investor risk management
Additional return stream moving opposite to portfolio
Used in pension funds, ForEx, futures, derivatives...
2019: hedge fund market c. USD 3.6 trillion
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Hedge instruments: options
Derivative: hedge instrument
European option: very popular derivative
European call option: right (not obligation) to purchase stock at fixed price X on date M
European put option: right (not obligation) to sell stock at fixed price X on date M
Stock = "underlying" of the option
Current market price S = spot price
X = strike price
M = maturity date
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Black-Scholes option pricing
Option value changes when price of underlying changes => can be used to hedge risk
Need to value option: requires assumptions about market, underlying, interest rate, etc.
Black-Scholes option pricing formula: Fischer Black & Nobel Laureate Myron Scholes (1973)
Requires for each time t:
spot price S
strike price X
time to maturity T := M − t
risk-free interest rate r
volatility of underlying returns σ (standard deviation)
1 Black, F. and M. Scholes (1973). "The Pricing of Options and Corporate Liabilities", Journal of Political Economy vol 81 no. 3, pp. 637–654.
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Black-Scholes formula assumptions
Market structure
Efficient markets
No transactions costs
Risk-free interest rate
Underlying stock
No dividends
Normally distributed returns
Online calculator: https://www.math.drexel.edu/~pg/fin/VanillaCalculator.html
Python function black_scholes() : source code link available in the exercises
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Computing the Black-Scholes option value
Black-Scholes option pricing formula black_scholes()
Required parameters: S , X , T (in fractions of a year), r , σ
Use the desired option_type ('call' or 'put')
S = 70; X = 80; T = 0.5; r = 0.02; sigma = 0.2
option_value = black_scholes(S, X, T, r, sigma, option_type = "put")
print(option_value)
10.31222171237868
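The course's black_scholes() source is not shown here, so the following is a stand-in sketch of the textbook Black-Scholes formula for a European option on a non-dividend-paying stock. The function name bs_price and the helper norm_cdf are hypothetical; only the standard library is used.

```python
import math

def norm_cdf(x):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price(S, X, T, r, sigma, option_type="call"):
    """Textbook Black-Scholes value of a European call or put."""
    d1 = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if option_type == "call":
        return S * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d2)
    if option_type == "put":
        return X * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)
    raise ValueError("option_type must be 'call' or 'put'")

# Same inputs as the slide example
put = bs_price(70, 80, 0.5, 0.02, 0.2, option_type="put")
call = bs_price(70, 80, 0.5, 0.02, 0.2, option_type="call")
print(put, call)
```

Put-call parity, call − put = S − X·exp(−rT), is a quick sanity check on any implementation of the two formulas.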
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Hedging a stock position with an option
Hedge stock with European put option: underlying is same as stock in portfolio
Spot price S falls (ΔS < 0) => option value V rises (ΔV > 0)
Delta of an option: $\Delta := \frac{\partial V}{\partial S}$
Hedge one share with $\frac{1}{\Delta}$ options
Delta neutral: $\Delta S + \frac{\Delta V}{\Delta} = 0$; stock is hedged!
Python function bs_delta() : computes the option delta
Link to source available in the exercises
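A sketch of a delta hedge with a European put, assuming the standard sign convention in which the put delta N(d1) − 1 is negative, so one share is hedged by holding 1/|Δ| puts. The functions put_value and put_delta are hypothetical stand-ins, not the course's bs_delta(); parameters follow the earlier pricing example.

```python
import math

def norm_cdf(x):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1(S, X, T, r, sigma):
    return (math.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))

def put_value(S, X=80.0, T=0.5, r=0.02, sigma=0.2):
    """Black-Scholes value of a European put."""
    d_1 = d1(S, X, T, r, sigma)
    d_2 = d_1 - sigma * math.sqrt(T)
    return X * math.exp(-r * T) * norm_cdf(-d_2) - S * norm_cdf(-d_1)

def put_delta(S, X=80.0, T=0.5, r=0.02, sigma=0.2):
    """Sensitivity of the put value to the spot price (negative for a put)."""
    return norm_cdf(d1(S, X, T, r, sigma)) - 1.0

S0, S1 = 70.0, 69.5                  # spot price falls by 0.5
delta = put_delta(S0)
n_puts = -1.0 / delta                # 1/|delta| puts per share held

stock_change = S1 - S0
option_change = put_value(S1) - put_value(S0)
hedged_change = stock_change + n_puts * option_change
print(delta, stock_change, hedged_change)
```

The hedged change is close to, but not exactly, zero: delta is a first-order sensitivity, so the hedge holds only for small price moves and must be rebalanced as the spot price changes.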
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Parametric Estimation
Jamsheed Shorish
Computational Economist
A class of distributions
Loss distribution: not known with certainty
Class of possible distributions?
Suppose class of distributions f (x; θ)
x is loss (random variable)
θ is vector of unknown parameters
Example: Normal distribution
Parameters: θ = (μ, σ), mean μ and standard deviation σ
Parametric estimation: find 'best' θ ⋆ given data
Loss distribution: f (x; θ ⋆ )
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Fitting a distribution
Fit distribution according to error-minimizing criteria
Example: scipy.stats.norm.fit() , fitting Normal distribution to data
Result: optimally fitted mean and standard deviation
Advantages:
Can visualize difference between data and estimate using histogram
Can provide goodness-of-fit tests
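A short sketch of what the fitted parameters are for the Normal case: scipy.stats.norm.fit() returns maximum-likelihood estimates, which coincide with the sample mean and the population-style (ddof=0) standard deviation. The loss data here are simulated for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
losses = rng.normal(loc=0.01, scale=0.02, size=1000)  # hypothetical losses

# Fit a Normal distribution: returns (mean, standard deviation)
mu_hat, sigma_hat = norm.fit(losses)

print(mu_hat, losses.mean())
print(sigma_hat, losses.std(ddof=0))
```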
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Goodness of fit
How well does an estimated distribution fit
the data?
Visualize: plot histogram of portfolio losses
Example:
Normal distribution with norm.fit()
Student's t-distribution with t.fit()
Asymmetrical histogram?
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Anderson-Darling test
Statistical test of goodness of fit
Test null hypothesis: data are Normally distributed
Test statistic rejects Normal distribution if larger than critical_values
Import scipy.stats.anderson
Compute test result using loss data
from scipy.stats import anderson
anderson(loss)
AndersonResult(statistic=11.048641503898523,
critical_values=array([0.57 , 0.649, 0.779, 0.909, 1.081]),
significance_level=array([15. , 10. , 5. , 2.5, 1. ]))
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Skewness
Skewness: degree to which data is non-
symmetrically distributed
Normal distribution: symmetric
Student's t-distribution: symmetric
Skewed Normal distribution: asymmetric
Contains Normal as special case
Useful for portfolio data, where e.g.
losses more frequent than gains
Available in scipy.stats as skewnorm
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Testing for skewness
Test how far data is from symmetric distribution: scipy.stats.skewtest
Null hypothesis: no skewness
Import skewtest from scipy.stats
Compute test result on loss data
Statistically significant => use distribution class with skewness
from scipy.stats import skewtest
skewtest(loss)
SkewtestResult(statistic=-7.786120875514511,
pvalue=6.90978472959861e-15)
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Historical and Monte Carlo Simulation
Jamsheed Shorish
Computational Economist
Historical simulation
No appropriate class of distributions?
Historical simulation: use past to predict future
No distributional assumption required
Data about previous losses become simulated losses for tomorrow
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Historical simulation in Python
VaR: start with returns in asset_returns
Compute portfolio_returns using portfolio weights
Convert portfolio_returns into losses
VaR: compute np.quantile() for losses at e.g. 95% confidence level
Assumes future distribution of losses is exactly the same as past
weights = [0.25, 0.25, 0.25, 0.25]
portfolio_returns = asset_returns.dot(weights)
losses = - portfolio_returns
VaR_95 = np.quantile(losses, 0.95)
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Monte Carlo simulation
Monte Carlo simulation: powerful combination of parametric estimation and simulation
Assumes distribution(s) for portfolio loss and/or risk factors
Relies upon random draws from distribution(s) to create random path, called a run
Repeat random draws ⇒ creates set of simulation runs
Compute simulated portfolio loss over each run up to desired time
Find VaR estimate as quantile of simulated losses
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Monte Carlo simulation in Python
Step One:
Import Normal distribution norm from scipy.stats
Define total_steps (1 day = 1440 minutes)
Define number of runs N
Compute mean mu and standard deviation sigma of portfolio_losses data
from scipy.stats import norm
total_steps = 1440
N = 10000
mu = portfolio_losses.mean()
sigma = portfolio_losses.std()
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Monte Carlo simulation in Python
Step Two:
Initialize daily_loss vector for N runs
Loop over N runs
Compute Monte Carlo simulated loss vector
Uses norm.rvs() to draw repeatedly from standard Normal distribution
Draws match data: mu scaled by 1/ total_steps , sigma scaled by the square root of 1/ total_steps
daily_loss = np.zeros(N)
for n in range(N):
    loss = ( mu * (1/total_steps) +
             norm.rvs(size=total_steps) * sigma * np.sqrt(1/total_steps) )
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Monte Carlo simulation in Python
Step Three:
Generate cumulative daily_loss , for each run n
Use np.quantile() to find the VaR at e.g. 95% confidence level, over daily_loss
daily_loss = np.zeros(N)
for n in range(N):
    loss = ( mu * (1/total_steps) +
             norm.rvs(size=total_steps) * sigma * np.sqrt(1/total_steps) )
    daily_loss[n] = sum(loss)
VaR_95 = np.quantile(daily_loss, 0.95)
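The three steps can also be written without an explicit loop. This sketch assumes made-up values for mu and sigma and uses fewer steps and runs than the slides to keep memory small; because the step losses are independent Normal draws, the simulated VaR can be checked against the analytic value mu + z·sigma.

```python
from statistics import NormalDist

import numpy as np

mu, sigma = 0.001, 0.02        # hypothetical daily loss mean and std
total_steps, N = 100, 5000     # reduced from the slide values for speed

rng = np.random.default_rng(5)

# One row per run, one column per intra-day step
steps = (mu / total_steps
         + rng.standard_normal((N, total_steps)) * sigma * np.sqrt(1 / total_steps))

# Cumulative daily loss for each run, then the 95% quantile
daily_loss = steps.sum(axis=1)
VaR_95 = np.quantile(daily_loss, 0.95)

# Analytic benchmark: the summed loss is Normal(mu, sigma^2)
VaR_95_exact = mu + NormalDist().inv_cdf(0.95) * sigma
print(VaR_95, VaR_95_exact)
```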
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Simulating asset returns
Refinement: generate random sample paths of asset returns in portfolio
Allows more realism: asset returns can be individually simulated
Asset returns can be correlated
Recall: efficient covariance matrix e_cov
Used in Step 2 to compute asset returns
Exercises: Monte Carlo simulation with asset return simulation
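A sketch of the refinement, assuming a two-asset portfolio with a made-up mean vector and covariance matrix standing in for e_cov: correlated asset returns are drawn jointly, combined with portfolio weights, and the VaR is read off the simulated losses.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical inputs: mean returns and covariance (correlation 0.8)
mu = np.array([0.0004, 0.0006])
e_cov = np.array([[0.0004, 0.00048],
                  [0.00048, 0.0009]])

# Draw correlated asset returns, one row per simulated day
simulated_returns = rng.multivariate_normal(mu, e_cov, size=20_000)

# Portfolio losses and 95% VaR
weights = np.array([0.5, 0.5])
losses = -(simulated_returns @ weights)
VaR_95 = np.quantile(losses, 0.95)

sample_corr = np.corrcoef(simulated_returns, rowvar=False)[0, 1]
print(VaR_95, sample_corr)
```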
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Structural breaks
Jamsheed Shorish
Computational Economist
Risk and distribution
Risk management toolkit
Risk mitigation: MPT
Risk measurement: VaR, CVaR
Risk: dispersion, volatility
Variance (standard deviation) as risk definition
Connection between risk and distribution of risk factors as random variables
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Stationarity
Assumption: distribution is same over time
Unchanging distribution = stationary
Global financial crisis period efficient frontier
Not stationary
Estimation techniques require stationarity
Historical: unknown stationary distribution from past data
Parametric: assumed stationary distribution class
Monte Carlo: assumed stationary distribution for random draws
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Structural breaks
Non-stationary => perhaps distribution changes over time
Assume specific points in time for change
Break up data into sub-periods
Within each sub-period, assume stationarity
Structural break(s): point(s) of change
Change in 'trend' of average and/or volatility of data
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Example: China's population growth
Examine period 1950 - 2019
Trend is roughly linear...
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Example: China's population growth
Examine period 1950 - 2019
Trend is roughly linear...
...but seems to slow down from around 1990
Possible structural break near 1990.
Implies distribution of net population (births
- deaths) changed
Possible reasons: government policy,
standard of living, etc.
QUANTITATIVE RISK MANAGEMENT IN PYTHON
The Chow test
Previous example: visual evidence for structural break
Quantification: statistical measure
Chow Test:
Test for existence of structural break given linear model
Null hypothesis: no break
Requires three OLS regressions
Regression for entire period
Two regressions, before and after break
Collect sum-of-squared residuals
Test statistic is distributed according to "F" distribution
QUANTITATIVE RISK MANAGEMENT IN PYTHON
The Chow test in Python
Hypothesis: structural break in 1990 for China population
Assume linear "factor model":
$\log(\mathrm{Population}_t) = \alpha + \beta \cdot \mathrm{Year}_t + u_t$
OLS regression using statsmodels 's OLS object over full period 1950 - 2019
Retrieve sum-of-squared residual res.ssr
import statsmodels.api as sm
res = sm.OLS(log_pop, year).fit()
print('SSR 1950-2019: ', res.ssr)
SSR 1950-2019: 0.29240576138055463
QUANTITATIVE RISK MANAGEMENT IN PYTHON
The Chow test in Python
Break 1950 - 2019 into 1950 - 1989 and 1990 - 2019 sub-periods
Perform OLS regressions on each sub-period
Retrieve res_before.ssr and res_after.ssr
pop_before = log_pop.loc['1950':'1989']; year_before = year.loc['1950':'1989'];
pop_after = log_pop.loc['1990':'2019']; year_after = year.loc['1990':'2019'];
res_before = sm.OLS(pop_before, year_before).fit()
res_after = sm.OLS(pop_after, year_after).fit()
print('SSR 1950-1989: ', res_before.ssr)
print('SSR 1990-2019: ', res_after.ssr)
SSR 1950-1989: 0.011741113017411783
SSR 1990-2019: 0.0013717593339608077
QUANTITATIVE RISK MANAGEMENT IN PYTHON
The Chow test in Python
Compute the F-distributed Chow test statistic
Compute the numerator
k = 2 degrees of freedom = 2 OLS coefficients α, β
Compute the denominator
66 degrees of freedom = total number of data points (70) - 2*k
ssr_total = res.ssr
ssr_before = res_before.ssr
ssr_after = res_after.ssr
numerator = (ssr_total - (ssr_before + ssr_after)) / 2
denominator = (ssr_before + ssr_after) / 66
chow_test = numerator / denominator
print("Chow test statistic: ", chow_test, "; Critical value, 99.9%: ", 7.7)
Chow test statistic: 702.8715822890057; Critical value, 99.9%: 7.7
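The same calculation as a reusable sketch, with the critical value taken from the F distribution instead of being typed in. The function name chow_statistic is hypothetical; the SSR values are the ones printed on the previous slides.

```python
from scipy.stats import f

def chow_statistic(ssr_total, ssr_before, ssr_after, n_obs, k):
    """Chow test statistic for one break, k coefficients per regression."""
    ssr_split = ssr_before + ssr_after
    numerator = (ssr_total - ssr_split) / k
    denominator = ssr_split / (n_obs - 2 * k)
    return numerator / denominator

chow = chow_statistic(
    ssr_total=0.29240576138055463,
    ssr_before=0.011741113017411783,
    ssr_after=0.0013717593339608077,
    n_obs=70,
    k=2,
)

# Critical value and p-value from the F(k, n_obs - 2k) distribution
critical = f.ppf(0.999, 2, 66)
p_value = f.sf(chow, 2, 66)
print(chow, critical, p_value)
```

A statistic far above the critical value rejects the null hypothesis of no structural break.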
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Volatility and extreme values
Jamsheed Shorish
Computational Economist
Chow test assumptions
Chow test: identify statistical significance
of possible structural break
Requires: pre-specified point of structural
break
Requires: linear relation (e.g. factor model)
$\log(\mathrm{Population}_t) = \alpha + \beta \cdot \mathrm{Year}_t + u_t$
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Structural break indications
Visualization of trend may not indicate
break point
Alternative: examine volatility rather than
trend
Structural change often accompanied by
greater uncertainty => volatility
Allows richer models to be considered
(e.g. stochastic volatility models)
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Rolling window volatility
Rolling window: compute volatility over time and detect changes
Recall: 30-day rolling window
Create rolling window from ".rolling()" method
Compute the volatility of the rolling window (drop unavailable dates)
Compute summary statistic of interest, e.g. .mean() , .min() , etc.
rolling = portfolio_returns.rolling(30)
volatility = rolling.std().dropna()
vol_mean = volatility.resample("M").mean()
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Rolling window volatility
Visualize resulting volatility (variance or standard deviation)
import matplotlib.pyplot as plt
vol_mean.plot(
    title="Monthly average volatility"
).set_ylabel("Standard deviation")
plt.show()
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Rolling window volatility
Visualize resulting volatility (variance or standard deviation)
vol_mean.pct_change().plot(
    title=r"$\Delta$ average volatility"
).set_ylabel(r"% $\Delta$ stdev")
plt.show()
Large changes in volatility => possible structural break point(s)
Use proposed break points in linear model of volatility
Variant of Chow Test
Guidance for applying e.g. ARCH, stochastic volatility models
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Extreme values
VaR, CVaR: maximum loss, expected shortfall at particular confidence level
Visualize changes in maximum loss by plotting VaR?
Useful for large datasets
Small datasets: not enough information
Alternative: find losses exceeding some threshold
Example: VaR95 is maximum loss 95% of the time
So 5% of the time, losses can be expected to exceed VaR95
Backtesting: use previous data ex-post to see how risk estimate performs
Used extensively in enterprise risk management
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Backtesting
Suppose VaR95 = 0.03
Losses exceeding 3% are then extreme
values
Backtesting: around 5% (100% - 95%) of
previous losses should exceed 3%
More than 5%: distribution with wider
("fatter") tails
Less than 5%: distribution with narrower
tails
CVaR for backtesting: accounts for tail
better than VaR
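A minimal backtesting sketch, assuming simulated losses split into an estimation window and a later backtest window: the VaR is estimated on the first window, and the share of later losses exceeding it is compared with the expected 5%. The data and the split point are made up.

```python
import numpy as np

rng = np.random.default_rng(21)
losses = rng.normal(loc=0.0, scale=0.02, size=10_000)  # hypothetical daily losses

# Estimate VaR on the earlier window, backtest on the later window
estimation, backtest = losses[:5000], losses[5000:]
VaR_95 = np.quantile(estimation, 0.95)

# Extreme values: backtest losses exceeding the VaR estimate
exceedances = backtest[backtest > VaR_95]
exceedance_rate = exceedances.size / backtest.size
print(f"VaR_95 = {VaR_95:.4f}, exceedance rate = {exceedance_rate:.3%}")
```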
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Extreme value theory
Jamsheed Shorish
Computational Economist
Extreme values
Portfolio losses: extreme values
Extreme values: from tail of distribution
Tail losses: losses exceeding some value
Model tail losses => better risk
management
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Extreme value theory
Extreme value theory: statistical
distribution of extreme values
Block maxima:
Break period into sub-periods
Form blocks from each sub-period
Set of block maxima = dataset
Peak over threshold (POT):
Find all losses over given level
Set of such losses = dataset
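The two ways of building an extreme-value dataset can be sketched as follows. The loss series is synthetic and the 95th-percentile threshold is an assumption chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily losses on business days (synthetic, for illustration only)
rng = np.random.default_rng(0)
dates = pd.date_range("2007-01-01", "2009-12-31", freq="B")
losses = pd.Series(rng.normal(0.0, 0.02, size=len(dates)), index=dates)

# Block maxima: one block per calendar week, keep each block's largest loss
weekly_maxima = losses.resample("W").max()

# Peak over threshold: keep every loss above a chosen level
# (the 95th percentile is an assumed threshold, not a recommendation)
threshold = losses.quantile(0.95)
exceedances = losses[losses > threshold]

print("Number of weekly maxima:", len(weekly_maxima))
print("Number of threshold exceedances:", len(exceedances))
```

Block maxima give one observation per block regardless of how calm the block was, while peak over threshold keeps only the large losses, however they cluster in time.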
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Generalized Extreme Value Distribution
Example: Block maxima for 2007 - 2009
Resample losses with desired period (e.g. weekly)
maxima = losses.resample("W").max()
Generalized Extreme Value Distribution (GEV)
Distribution of maxima of data
Example: parametric estimation using scipy.stats.genextreme
from scipy.stats import genextreme
params = genextreme.fit(maxima)
QUANTITATIVE RISK MANAGEMENT IN PYTHON
VaR and CVaR from GEV distribution
99% VaR from GEV distribution
Use .ppf() percent point function to find 99% VaR
Requires params from fitted GEV distribution
Finds maximum loss over one week period at 99% confidence
99% CVaR from GEV distribution
CVaR is conditional expectation of loss given VaR as minimum loss
Use .expect() method to find expected value
VaR_99 = genextreme.ppf(0.99, *params)
CVaR_99 = (1 / (1 - 0.99)) * genextreme.expect(lambda x: x, args=(params[0],), loc=params[1], scale=params[2], lb=VaR_99)
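A self-contained version of the VaR and CVaR calculation might look like the sketch below. The weekly maxima are simulated from an assumed GEV distribution purely so the example runs on its own; with real data, `maxima` would come from resampled portfolio losses.

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical weekly maxima, simulated for illustration only
rng = np.random.default_rng(1)
maxima = genextreme.rvs(c=-0.1, loc=0.02, scale=0.01, size=150, random_state=rng)

# Fit the GEV distribution; fit() returns (shape, loc, scale)
c, loc, scale = genextreme.fit(maxima)

# 99% VaR: the 99th percentile of the fitted distribution
VaR_99 = genextreme.ppf(0.99, c, loc=loc, scale=scale)

# 99% CVaR: expected loss in the tail beyond VaR, divided by the tail probability
tail_integral = genextreme.expect(lambda x: x, args=(c,), loc=loc, scale=scale, lb=VaR_99)
CVaR_99 = tail_integral / (1 - 0.99)

print("VaR_99:", VaR_99)
print("CVaR_99:", CVaR_99)
```

Passing the shape parameter through `args` and the location and scale by keyword keeps the call aligned with the signature of `.expect()`.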
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Covering losses
Risk management: covering losses
Regulatory requirement (banks, insurance)
Reserves must be available to cover losses
For a specified period (e.g. one week)
At a specified confidence level (e.g. 99%)
VaR from GEV distribution:
estimates maximum loss
given period
given confidence level
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Covering losses
Example: Initial portfolio value = $1,000,000
One week reserve requirement at 99% confidence
VaR99 from GEV distribution: maximum loss over one week at 99% confidence
Reserve requirement: Portfolio value x VaR99
Suppose VaR99 = 0.10, i.e. 10% maximum loss
Reserve requirement = $100,000
Portfolio value changes => reserve requirement changes
Regulation sets frequency of reserve requirement updating
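The reserve calculation above is simple arithmetic. The sketch below repeats the slide's example and then recomputes the requirement for a changed portfolio value (the $1,200,000 figure is a hypothetical value added for illustration).

```python
# Values from the example: $1,000,000 portfolio, 10% one-week VaR at 99% confidence
portfolio_value = 1_000_000
VaR_99 = 0.10

reserve_requirement = portfolio_value * VaR_99
print("Reserve requirement:", round(reserve_requirement, 2))

# Hypothetical update: the portfolio grows, so the reserve requirement grows too
updated_portfolio_value = 1_200_000
updated_reserve = updated_portfolio_value * VaR_99
print("Updated reserve requirement:", round(updated_reserve, 2))
```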
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
Kernel density estimation
Jamsheed Shorish
Computational Economist
The histogram revisited
Risk factor distributions
Assumed (e.g. Normal, T, etc.)
Fitted (parametric estimation, Monte
Carlo simulation)
Ignored (historical simulation)
Actual data: histogram
How to represent histogram by probability
distribution?
Smooth data using filtering
Non-parametric estimation
Data smoothing
Filter: smoothen out 'bumps' of histogram
Observations accumulate over time
Pick particular portfolio loss
Examine nearby losses
Form "weighted average" of losses
Kernel: filter choice; determines "window"
Move window to another loss
Kernel density estimate: probability density
QUANTITATIVE RISK MANAGEMENT IN PYTHON
The Gaussian kernel
Continuous kernel
Weights all observations by distance from
center
Generally: many different kernels are
available
Used in time series analysis
Used in signal processing
QUANTITATIVE RISK MANAGEMENT IN PYTHON
KDE in Python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
kde = gaussian_kde(losses)
loss_range = np.linspace(np.min(losses), np.max(losses), 1000)
plt.plot(loss_range, kde.pdf(loss_range))
Visualization: probability density function
from KDE fit
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Finding VaR using KDE
VaR: use gaussian_kde .resample() method
Find quantile of resulting sample
CVaR: expected value as previously encountered, but
gaussian_kde has no .expect() method => compute integral manually
special .expect() method written for exercise
sample = kde.resample(size = 1000)
VaR_99 = np.quantile(sample, 0.99)
print("VaR_99 from KDE: ", VaR_99)
VaR_99 from KDE: 0.08796423698448601
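Since gaussian_kde has no .expect() method, the CVaR integral can be computed by hand. The following is one possible sketch using scipy.integrate.quad on synthetic losses; the data, the integration upper bound, and the variable names are assumptions, and this is not the helper method written for the course exercise.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import quad

# Hypothetical losses (synthetic, for illustration only)
rng = np.random.default_rng(7)
losses = rng.normal(0.0, 0.02, size=500)

kde = gaussian_kde(losses)

# VaR: 99% quantile of a sample drawn from the fitted KDE
sample = kde.resample(size=10_000, seed=rng)
VaR_99 = np.quantile(sample, 0.99)

# CVaR: integrate x * pdf(x) over the tail, then divide by the tail probability.
# The upper limit is an assumed cutoff far beyond the largest observed loss.
upper = losses.max() + 5 * losses.std()
tail_expectation, _ = quad(lambda x: x * kde.pdf(x)[0], VaR_99, upper)
tail_probability = kde.integrate_box_1d(VaR_99, np.inf)
CVaR_99 = tail_expectation / tail_probability

print("VaR_99 from KDE:", VaR_99)
print("CVaR_99 from KDE:", CVaR_99)
```

Dividing by the KDE's own tail probability rather than a fixed 0.01 keeps the result consistent even when the sampled quantile is slightly off.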
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
Neural network risk management
Jamsheed Shorish
Computational Economist
Real-time portfolio updating
Risk management
Defined risk measures (VaR, CVaR)
Estimated risk measures (parametric, historical, Monte Carlo)
Optimized portfolio (e.g. Modern Portfolio Theory)
New market information => update portfolio weights
Problem: portfolio optimization costly
Solution: weights = f (prices)
Evaluate f in real-time
Update f only occasionally
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Neural networks
Neural Network: output = f (input)
Neuron: interconnected processing node in function
Initially developed 1940s-1950s
Early 2000s: application of neural networks to "big data"
Image recognition, processing
Financial data
Search engine data
Deep Learning: neural networks as part of Machine Learning
2015: Google releases open-source TensorFlow deep learning library for Python
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Neural network structure
Layers: connected processing neurons
Input layer
Hidden layer
Output layer
Training: learn relationship between input
and output
Asset prices => Input layer
Input + hidden layer processing
Hidden + output layer processing
Output => portfolio weights
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Using neural networks for portfolio optimization
Training
Compare output and pre-existing "best" portfolio weights
Goal: minimize "error" between output and weights
Small error => network is trained
Usage
Input: new, unseen asset prices
Output: predicted "best" portfolio weights for new asset prices
Best weights = risk management
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Creating neural networks in Python
Keras: high-level Python library for neural networks/deep learning
Further info: Introduction to Deep Learning with Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(10, input_dim=4, activation='sigmoid'))
model.add(Dense(4))
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Training the network in Python
Historical asset prices: training_input matrix
Historical portfolio weights: training_output vector
Compile model with:
given error minimization ('loss')
given optimization algorithm ('optimizer')
Fit model to training data
epochs: number of training loops to update internal parameters
model.compile(loss='mean_squared_error', optimizer='rmsprop')
model.fit(training_input, training_output, epochs=100)
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Risk management in Python
Usage: provide new (e.g. real-time) asset pricing data
New vector new_asset_prices given to input layer
Evaluate network using model.predict() on new prices
Result: predicted portfolio weights
Accumulate enough data over time => re-train network
Test network on previous data => backtesting
# new asset prices are in the vector new_asset_prices
predicted = model.predict(new_asset_prices)
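To make the prediction step concrete without requiring Keras, the sketch below hand-codes the forward pass of a network with the same shape as the one defined earlier (4 inputs, a 10-neuron sigmoid hidden layer, 4 linear outputs). The weights are random placeholders rather than trained values, and the prices are hypothetical normalized numbers, so the output only illustrates shapes and data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder (untrained) parameters with the shapes of Dense(10) and Dense(4)
W_hidden = rng.normal(size=(4, 10))
b_hidden = np.zeros(10)
W_output = rng.normal(size=(10, 4))
b_output = np.zeros(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(prices):
    # Input layer -> hidden layer (sigmoid activation)
    hidden = sigmoid(prices @ W_hidden + b_hidden)
    # Hidden layer -> output layer (no activation, i.e. linear)
    return hidden @ W_output + b_output

# One observation of four (hypothetical, normalized) asset prices: shape (1, 4)
new_asset_prices = np.array([[1.0, 0.5, 0.25, 0.1]])
predicted = forward(new_asset_prices)
print(predicted.shape)
```

Because the output layer is linear, nothing forces the predicted weights to be non-negative or to sum to one; in practice they may need to be normalized before use.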
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Let's practice!
Wrap-up and Future Steps
Jamsheed Shorish
Computational Economist
Congratulations!
Tools in your toolkit
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Future steps and reference
Upcoming DataCamp courses
Credit Risk Modeling in Python
Financial Forecasting in Python
Machine Learning for Finance in Python
GARCH Models for Finance in Python
Quantitative Risk Management: Concepts, Techniques and Tools, McNeil, Frey & Embrechts,
Princeton UP, 2015.
QUANTITATIVE RISK MANAGEMENT IN PYTHON
Best of luck on your data science journey!
Understanding credit risk
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
What is credit risk?
The possibility that someone who has borrowed money will not repay it all
Measured as the difference in risk between lending someone money and buying a government bond
When someone fails to repay a loan, it is said to be in default
The likelihood that someone will default on a loan is the probability of default (PD)
Payment | Payment Date | Loan Status
$100 | Jun 15 | Non-Default
$100 | Jul 15 | Non-Default
$0 | Aug 15 | Default
CREDIT RISK MODELING IN PYTHON
Expected loss
The dollar amount the firm loses as a result of loan default
Three primary components:
Probability of Default (PD)
Exposure at Default (EAD)
Loss Given Default (LGD)
Formula for expected loss:
expected_loss = PD * EAD * LGD
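A quick worked example of the formula, using made-up numbers (a 5% probability of default, $10,000 of exposure, and a 60% loss given default are all hypothetical):

```python
# Hypothetical inputs for a single loan
PD = 0.05      # probability of default
EAD = 10_000   # exposure at default, in dollars
LGD = 0.60     # loss given default, as a fraction of exposure

expected_loss = PD * EAD * LGD
print("Expected loss:", round(expected_loss, 2))
```

With these inputs the expected loss is about $300, even though the full exposure is $10,000.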
CREDIT RISK MODELING IN PYTHON
Types of data used
Two primary types of data used:
Application data
Behavioral data
Application | Behavioral
Interest Rate | Employment Length
Grade | Historical Default
Amount | Income
CREDIT RISK MODELING IN PYTHON
Data columns
Mix of behavioral and application
Contain columns simulating credit bureau data
Column | Column
Income | Loan grade
Age | Loan amount
Home ownership | Interest rate
Employment length | Loan status
Loan intent | Historical default
Percent Income | Credit history length
CREDIT RISK MODELING IN PYTHON
Exploring with cross tables
pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
CREDIT RISK MODELING IN PYTHON
Exploring with visuals
plt.scatter(cr_loan['person_income'], cr_loan['loan_int_rate'],c='blue', alpha=0.5)
plt.xlabel("Personal Income")
plt.ylabel("Loan Interest Rate")
plt.show()
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Outliers in Credit
Data
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Data processing
Prepared data allows models to train faster
Often positively impacts model performance
CREDIT RISK MODELING IN PYTHON
Outliers and performance
Possible causes of outliers:
Problems with data entry systems (human error)
Issues with data ingestion tools
Feature | Coefficient With Outliers | Coefficient Without Outliers
Interest Rate | 0.2 | 0.01
Employment Length | 0.5 | 0.6
Income | 0.6 | 0.75
CREDIT RISK MODELING IN PYTHON
Detecting outliers with cross tables
Use cross tables with aggregate functions
pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
CREDIT RISK MODELING IN PYTHON
Detecting outliers visually
Histograms
Scatter plots
CREDIT RISK MODELING IN PYTHON
Removing outliers
Use the .drop() method within Pandas
indices = cr_loan[cr_loan['person_emp_length'] >= 60].index
cr_loan.drop(indices, inplace=True)
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Risk with missing
data in loan data
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
What is missing data?
NULLs in a row instead of an actual value
An empty string ''
Not an entirely empty row
Can occur in any column in the data
CREDIT RISK MODELING IN PYTHON
Similarities with outliers
Negatively affect machine learning model performance
May bias models in unanticipated ways
May cause errors for some machine learning models
Missing Data Type | Possible Result
NULL in numeric column | Error
NULL in string column | Error
CREDIT RISK MODELING IN PYTHON
How to handle missing data
Generally three ways to handle missing data
Replace values where the data is missing
Remove the rows containing missing data
Leave the rows with missing data unchanged
Understanding the data determines the course of action
Missing Data | Interpretation | Action
NULL in loan_status | Loan recently approved | Remove from prediction data
NULL in person_age | Age not recorded or disclosed | Replace with median
CREDIT RISK MODELING IN PYTHON
Finding missing data
Null values are easily found by using the .isnull() method
Null records can easily be counted with the .sum() method
.any() method checks all columns
null_columns = cr_loan.columns[cr_loan.isnull().any()]
cr_loan[null_columns].isnull().sum()
# Total number of null values per column
person_home_ownership 25
person_emp_length 895
loan_intent 25
loan_int_rate 3140
cb_person_default_on_file 15
CREDIT RISK MODELING IN PYTHON
Replacing Missing data
Replace the missing data using methods like .fillna() with aggregate functions and
methods
cr_loan['loan_int_rate'] = cr_loan['loan_int_rate'].fillna(cr_loan['loan_int_rate'].mean())
CREDIT RISK MODELING IN PYTHON
Dropping missing data
Uses indices to identify records the same as with outliers
Remove the records entirely using the .drop() method
indices = cr_loan[cr_loan['person_emp_length'].isnull()].index
cr_loan.drop(indices, inplace=True)
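The find, replace, and drop steps can be tried end to end on a toy table. The four-row DataFrame below is invented for illustration and only borrows the column names; it uses .dropna() as an equivalent shortcut to collecting indices and calling .drop().

```python
import numpy as np
import pandas as pd

# Toy stand-in for cr_loan (hypothetical values)
cr_loan = pd.DataFrame({
    "person_emp_length": [2.0, np.nan, 5.0, 10.0],
    "loan_int_rate": [11.0, 13.0, np.nan, 9.0],
    "loan_status": [0, 1, 0, 0],
})

# Count the missing values in each column
null_counts = cr_loan.isnull().sum()
print(null_counts)

# Replace missing interest rates with the column mean
mean_rate = cr_loan["loan_int_rate"].mean()
cr_loan["loan_int_rate"] = cr_loan["loan_int_rate"].fillna(mean_rate)

# Remove rows where employment length is missing
cr_loan = cr_loan.dropna(subset=["person_emp_length"])
print(cr_loan)
```

Assigning the result of .fillna() back to the column avoids relying on in-place modification of a column selection.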
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Logistic regression
for probability of
default
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Probability of default
The likelihood that someone will default on a loan is the probability of default
A probability value between 0 and 1 like 0.86
loan_status of 1 is a default or 0 for non-default
Probability of Default | Interpretation | Predicted loan status
0.4 | Unlikely to default | 0
0.90 | Very likely to default | 1
0.1 | Very unlikely to default | 0
CREDIT RISK MODELING IN PYTHON
Predicting probabilities
Probabilities of default as an outcome from machine learning
Learn from data in columns (features)
Classification models (default, non-default)
Two most common models:
Logistic regression
Decision tree
CREDIT RISK MODELING IN PYTHON
Logistic regression
Similar to the linear regression, but only produces values between 0 and 1
CREDIT RISK MODELING IN PYTHON
Training a logistic regression
Logistic regression available within the scikit-learn package
from sklearn.linear_model import LogisticRegression
Called as a function with or without parameters
clf_logistic = LogisticRegression(solver='lbfgs')
Uses the method .fit() to train
clf_logistic.fit(training_columns, np.ravel(training_labels))
Training Columns: all of the columns in our data except loan_status
Labels: loan_status (0,1)
CREDIT RISK MODELING IN PYTHON
Training and testing
Entire data set is usually split into two parts
Data Subset | Usage | Portion
Train | Learn from the data to generate predictions | 60%
Test | Test learning on new unseen data | 40%
CREDIT RISK MODELING IN PYTHON
Creating the training and test sets
Separate the data into training columns and labels
X = cr_loan.drop('loan_status', axis = 1)
y = cr_loan[['loan_status']]
Use the train_test_split() function from scikit-learn
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)
test_size : fraction of data for test set
random_state : a random seed value for reproducibility
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Predicting the
probability of
default
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Logistic regression coefficients
# Model Intercept
array([-3.30582292e-10])
# Coefficients for ['loan_int_rate','person_emp_length','person_income']
array([[ 1.28517496e-09, -2.27622202e-09, -2.17211991e-05]])
# Calculating probability of default
int_coef_sum = (-3.3e-10 +
    (1.29e-09 * loan_int_rate) + (-2.28e-09 * person_emp_length) + (-2.17e-05 * person_income))
prob_default = 1 / (1 + np.exp(-int_coef_sum))
prob_nondefault = 1 - prob_default
CREDIT RISK MODELING IN PYTHON
Interpreting coefficients
# Intercept
intercept = -1.02
# Coefficient for employment length
person_emp_length_coef = -0.056
For every 1 year increase in person_emp_length , the person is less likely to default
intercept | person_emp_length | value * coef | probability of default
-1.02 | 10 | (10 * -0.06) | .17
-1.02 | 11 | (11 * -0.06) | .16
-1.02 | 12 | (12 * -0.06) | .15
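The probabilities in the table follow from plugging the intercept and the rounded coefficient into the logistic function. A short sketch that reproduces them (the dictionary name is an assumption for illustration):

```python
import numpy as np

intercept = -1.02
person_emp_length_coef = -0.06  # coefficient rounded as in the table

probabilities = {}
for years in (10, 11, 12):
    # Linear combination (log-odds) for this employment length
    log_odds = intercept + person_emp_length_coef * years
    # Logistic function converts log-odds to a probability of default
    prob_default = 1 / (1 + np.exp(-log_odds))
    probabilities[years] = round(float(prob_default), 2)

print(probabilities)  # {10: 0.17, 11: 0.16, 12: 0.15}
```

Each additional year lowers the log-odds by 0.06, which is why the probability falls by roughly one percentage point per year in this range.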
CREDIT RISK MODELING IN PYTHON
Using non-numeric columns
Numeric: loan_int_rate , person_emp_length , person_income
Non-numeric:
cr_loan_clean['loan_intent']
EDUCATION
MEDICAL
VENTURE
PERSONAL
DEBTCONSOLIDATION
HOMEIMPROVEMENT
Will cause errors with machine learning models in Python unless processed
CREDIT RISK MODELING IN PYTHON
One-hot encoding
Represent a string with a number
0 or 1 in a new column column_VALUE
CREDIT RISK MODELING IN PYTHON
Get dummies
Utilize the get_dummies() within pandas
# Separate the numeric columns
cred_num = cr_loan.select_dtypes(exclude=['object'])
# Separate non-numeric columns
cred_cat = cr_loan.select_dtypes(include=['object'])
# One-hot encode the non-numeric columns only
cred_cat_onehot = pd.get_dummies(cred_cat)
# Union the numeric columns with the one-hot encoded columns
cr_loan = pd.concat([cred_num, cred_cat_onehot], axis=1)
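To see what one-hot encoding produces, here is a minimal sketch on a three-row toy table (the values are invented; only the column names mirror the loan data). The categorical column is selected by name rather than by dtype to keep the example short.

```python
import pandas as pd

# Toy stand-in for the loan data (hypothetical values)
cr_loan = pd.DataFrame({
    "person_income": [50_000, 62_000, 38_000],
    "loan_intent": ["EDUCATION", "MEDICAL", "EDUCATION"],
})

# One-hot encode the non-numeric column only
cred_cat_onehot = pd.get_dummies(cr_loan[["loan_intent"]])

# Combine the numeric column with the one-hot encoded columns
cr_loan_prep = pd.concat([cr_loan[["person_income"]], cred_cat_onehot], axis=1)
print(cr_loan_prep)
```

Each distinct value of loan_intent becomes its own column named column_VALUE, holding an indicator for whether that row had the value.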
CREDIT RISK MODELING IN PYTHON
Predicting the future, probably
Use the .predict_proba() method within scikit-learn
# Train the model
clf_logistic.fit(X_train, np.ravel(y_train))
# Predict using the model
clf_logistic.predict_proba(X_test)
Creates array of probabilities of default
# Probabilities: [[non-default, default]]
array([[0.55, 0.45]])
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Credit model
performance
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Model accuracy scoring
Calculate accuracy
Use the .score() method from scikit-learn
# Check the accuracy against the test data
clf_logistic1.score(X_test,y_test)
0.81
81% of values for loan_status predicted correctly
CREDIT RISK MODELING IN PYTHON
ROC curve charts
Receiver Operating Characteristic curve
Plots true positive rate (sensitivity) against false positive rate (fall-out)
fallout, sensitivity, thresholds = roc_curve(y_test, prob_default)
plt.plot(fallout, sensitivity, color = 'darkorange')
CREDIT RISK MODELING IN PYTHON
Analyzing ROC charts
Area Under Curve (AUC): area under the ROC curve; a random prediction scores 0.5, so the area above the random-prediction line shows how much the model adds
CREDIT RISK MODELING IN PYTHON
Default thresholds
Threshold: at what point a probability is a default
CREDIT RISK MODELING IN PYTHON
Setting the threshold
Relabel loans based on our threshold of 0.5
preds = clf_logistic.predict_proba(X_test)
preds_df = pd.DataFrame(preds[:,1], columns = ['prob_default'])
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)
CREDIT RISK MODELING IN PYTHON
Credit classification reports
classification_report() within scikit-learn
from sklearn.metrics import classification_report
classification_report(y_test, preds_df['loan_status'], target_names=target_names)
CREDIT RISK MODELING IN PYTHON
Selecting classification metrics
Select and store specific components from the classification_report()
Use the precision_recall_fscore_support() function from scikit-learn
from sklearn.metrics import precision_recall_fscore_support
precision_recall_fscore_support(y_test,preds_df['loan_status'])[1][1]
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Model
discrimination and
impact
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Confusion matrices
Shows the number of correct and incorrect predictions for each loan_status
CREDIT RISK MODELING IN PYTHON
Default recall for loan status
Default recall (or sensitivity) is the proportion of true defaults predicted
CREDIT RISK MODELING IN PYTHON
Recall portfolio impact
Classification report - Underperforming Logistic Regression model
Number of true defaults: 50,000
Loan Amount | Defaults Predicted / Not Predicted | Estimated Loss on Defaults
$50 | .04 / .96 | (50000 x .96) x 50 = $2,400,000
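The loss estimate in the table can be reproduced directly. The sketch below uses the slide's numbers and then repeats the calculation with a hypothetical higher recall of 0.40 to show how sensitive the loss is to recall.

```python
true_defaults = 50_000
loan_amount = 50          # dollars per loan
default_recall = 0.04     # share of true defaults the model catches

# Defaults the model fails to predict, and the loss they imply
missed_defaults = true_defaults * (1 - default_recall)
estimated_loss = missed_defaults * loan_amount
print("Estimated loss:", round(estimated_loss, 2))

# Hypothetical comparison: a model with recall of 0.40
improved_recall = 0.40
improved_loss = true_defaults * (1 - improved_recall) * loan_amount
print("Estimated loss at recall 0.40:", round(improved_loss, 2))
```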
CREDIT RISK MODELING IN PYTHON
Recall, precision, and accuracy
Difficult to maximize all of them because there is a trade-off
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Gradient boosted
trees with XGBoost
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Decision trees
Creates predictions similar to logistic regression
Not structured like a regression
CREDIT RISK MODELING IN PYTHON
Decision trees for loan status
Simple decision tree for predicting loan_status probability of default
CREDIT RISK MODELING IN PYTHON
Decision tree impact
Loan | True loan status | Pred. Loan Status | Loan payoff value | Selling Value | Gain/Loss
1 | 0 | 1 | $1,500 | $250 | -$1,250
2 | 0 | 1 | $1,200 | $250 | -$950
CREDIT RISK MODELING IN PYTHON
A forest of trees
XGBoost uses many simplistic trees (ensemble)
Each tree will be slightly better than a coin toss
CREDIT RISK MODELING IN PYTHON
Creating and training trees
Part of the xgboost Python package, called xgb here
Trains with .fit() just like the logistic regression model
# Create a logistic regression model
clf_logistic = LogisticRegression()
# Train the logistic regression
clf_logistic.fit(X_train, np.ravel(y_train))
# Create a gradient boosted tree model
clf_gbt = xgb.XGBClassifier()
# Train the gradient boosted tree
clf_gbt.fit(X_train,np.ravel(y_train))
CREDIT RISK MODELING IN PYTHON
Default predictions with XGBoost
Predicts with both .predict() and .predict_proba()
.predict_proba() produces a value between 0 and 1
.predict() produces a 1 or 0 for loan_status
# Predict probabilities of default
gbt_preds_prob = clf_gbt.predict_proba(X_test)
# Predict loan_status as a 1 or 0
gbt_preds = clf_gbt.predict(X_test)
# gbt_preds_prob
array([[0.059, 0.940], [0.121, 0.989]])
# gbt_preds
array([1, 1, 0...])
CREDIT RISK MODELING IN PYTHON
Hyperparameters of gradient boosted trees
Hyperparameters: model parameters (settings) that cannot be learned from data
Some common hyperparameters for gradient boosted trees
learning_rate : smaller values make each step more conservative
max_depth : sets how deep each tree can go, larger means more complex
xgb.XGBClassifier(learning_rate = 0.2,
max_depth = 4)
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Column selection for
credit risk
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Choosing specific columns
We've been using all columns for predictions
# Selects a few specific columns
X_multi = cr_loan_prep[['loan_int_rate','person_emp_length']]
# Selects all data except loan_status
X = cr_loan_prep.drop('loan_status', axis = 1)
How can you tell how important each column is?
Logistic Regression: column coefficients
Gradient Boosted Trees: ?
CREDIT RISK MODELING IN PYTHON
Column importances
Use the .get_booster() and .get_score() methods
Weight: the number of times the column appears in all trees
# Train the model
clf_gbt.fit(X_train,np.ravel(y_train))
# Print the feature importances
clf_gbt.get_booster().get_score(importance_type = 'weight')
{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}
CREDIT RISK MODELING IN PYTHON
Column importance interpretation
# Column importances from importance_type = 'weight'
{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}
CREDIT RISK MODELING IN PYTHON
Plotting column importances
Use the plot_importance() function
xgb.plot_importance(clf_gbt, importance_type = 'weight')
{'person_income': 315, 'loan_int_rate': 195, 'loan_percent_income': 146}
CREDIT RISK MODELING IN PYTHON
Choosing training columns
Column importance is sometimes used to decide which columns to use for training
Different sets affect the performance of the models
Columns | Importances | Model Accuracy | Model Default Recall
loan_int_rate, person_emp_length | (100, 100) | 0.81 | 0.67
loan_int_rate, person_emp_length, loan_percent_income | (98, 70, 5) | 0.84 | 0.52
CREDIT RISK MODELING IN PYTHON
F1 scoring for models
Thinking about accuracy and recall for different column groups is time consuming
F1 score is a single metric that combines precision and recall (their harmonic mean)
Shows up as a part of the classification_report()
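For reference, the F1 score can be computed by hand from predicted and true labels. The ten labels below are invented for illustration; the calculation itself is the standard harmonic mean of precision and recall for the default class.

```python
import numpy as np

# Hypothetical true and predicted loan_status values (1 = default)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

true_pos = np.sum((y_true == 1) & (y_pred == 1))
false_pos = np.sum((y_true == 0) & (y_pred == 1))
false_neg = np.sum((y_true == 1) & (y_pred == 0))

precision = true_pos / (true_pos + false_pos)   # share of predicted defaults that were real
recall = true_pos / (true_pos + false_neg)      # share of real defaults that were predicted
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.mean(y_true == y_pred)

print("precision:", precision, "recall:", recall, "f1:", f1, "accuracy:", accuracy)
```

Here accuracy is 0.7 while recall is only 0.5, which shows why accuracy alone can hide weak default detection.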
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Cross validation for
credit models
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Cross validation basics
Used to train and test the model in a way that simulates using the model on new data
Segments training data into different pieces to estimate future performance
Uses DMatrix , an internal structure optimized for XGBoost
Early stopping tells cross validation to stop after a scoring metric has not improved for a number of iterations
CREDIT RISK MODELING IN PYTHON
How cross validation works
Processes parts of the training data (called folds) and tests against the unused part
Final testing against the actual test set
1 https://scikit-learn.org/stable/modules/cross_validation.html
CREDIT RISK MODELING IN PYTHON
Setting up cross validation within XGBoost
# Set the number of folds
n_folds = 2
# Set early stopping number
early_stop = 5
# Set any specific parameters for cross validation
params = {'objective': 'binary:logistic',
'seed': 99, 'eval_metric':'auc'}
'objective': 'binary:logistic' is used to specify classification for loan_status
'eval_metric':'auc' tells XGBoost to score the model's performance on AUC
CREDIT RISK MODELING IN PYTHON
Using cross validation within XGBoost
# Restructure the train data for xgboost
DTrain = xgb.DMatrix(X_train, label = y_train)
# Perform cross validation
xgb.cv(params, DTrain, num_boost_round = 5, nfold=n_folds,
early_stopping_rounds=early_stop)
DMatrix() creates a special object for xgboost optimized for training
CREDIT RISK MODELING IN PYTHON
The results of cross validation
Creates a data frame of the values from the cross validation
CREDIT RISK MODELING IN PYTHON
Cross validation scoring
Uses cross validation and scoring metrics with cross_val_score() function in scikit-learn
# Import the module
from sklearn.model_selection import cross_val_score
# Create a gbt model
gbt = xgb.XGBClassifier(learning_rate = 0.4, max_depth = 10)
# Use cross validation and accuracy scores 5 consecutive times
cross_val_score(gbt, X_train, np.ravel(y_train), cv = 5)
array([0.92748092, 0.92575308, 0.93975392, 0.93378608, 0.93336163])
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Class imbalance in
loan data
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Not enough defaults in the data
The values of loan_status are the classes
Non-default: 0
Default: 1
y_train['loan_status'].value_counts()
loan_status | Training Data Count | Percentage of Total
0 | 13,798 | 78%
1 | 3,877 | 22%
CREDIT RISK MODELING IN PYTHON
Model loss function
Gradient Boosted Trees in xgboost use a loss function of log-loss
The goal is to minimize this value
True loan status | Predicted probability | Log Loss
1 | 0.1 | 2.3
0 | 0.9 | 2.3
An inaccurately predicted default has more negative financial impact
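The two log-loss values in the table can be checked with the standard binary log-loss formula. The helper function name below is an assumption for illustration.

```python
import numpy as np

def single_log_loss(true_status, predicted_prob):
    # Binary log-loss for one observation; predicted_prob is the probability of default
    return -(true_status * np.log(predicted_prob)
             + (1 - true_status) * np.log(1 - predicted_prob))

# A true default predicted with only 10% probability of default
loss_missed_default = single_log_loss(1, 0.1)

# A true non-default predicted with 90% probability of default
loss_false_alarm = single_log_loss(0, 0.9)

print(round(float(loss_missed_default), 1), round(float(loss_false_alarm), 1))  # 2.3 2.3
```

Both mistakes cost the model the same log-loss, even though their financial consequences differ.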
CREDIT RISK MODELING IN PYTHON
The cost of imbalance
A false negative (default predicted as non-default) is much more costly
Person | Loan Amount | Potential Profit | Predicted Status | Actual Status | Losses
A | $1,000 | $10 | Default | Non-Default | -$10
B | $1,000 | $10 | Non-Default | Default | -$1,000
Log-loss for the model is the same for both, but our actual losses are not
CREDIT RISK MODELING IN PYTHON
Causes of imbalance
Data problems
Credit data was not sampled correctly
Data storage problems
Business processes:
Measures already in place to not accept probable defaults
Probable defaults are quickly sold to other firms
Behavioral factors:
Normally, people do not default on their loans
The less often they default, the higher their credit rating
CREDIT RISK MODELING IN PYTHON
Dealing with class imbalance
Several ways to deal with class imbalance in data
Method | Pros | Cons
Gather more data | Increases number of defaults | Percentage of defaults may not change
Penalize models | Increases recall for defaults | Model requires more tuning and maintenance
Sample data differently | Least technical adjustment | Fewer defaults in data
CREDIT RISK MODELING IN PYTHON
Undersampling strategy
Combine smaller random sample of non-defaults with defaults
CREDIT RISK MODELING IN PYTHON
Combining the split data sets
The training features (X_train) and labels (y_train) must be put back together
Create two new sets based on actual loan_status
# Concat the training sets
X_y_train = pd.concat([X_train.reset_index(drop = True),
y_train.reset_index(drop = True)], axis = 1)
# Get the counts of defaults and non-defaults
count_nondefault, count_default = X_y_train['loan_status'].value_counts()
# Separate nondefaults and defaults
nondefaults = X_y_train[X_y_train['loan_status'] == 0]
defaults = X_y_train[X_y_train['loan_status'] == 1]
CREDIT RISK MODELING IN PYTHON
Undersampling the non-defaults
Randomly sample data set of non-defaults
Concatenate with data set of defaults
# Undersample the non-defaults using sample() in pandas
nondefaults_under = nondefaults.sample(count_default)
# Concat the undersampled non-defaults with the defaults
X_y_train_under = pd.concat([nondefaults_under.reset_index(drop = True),
defaults.reset_index(drop = True)], axis=0)
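The undersampling steps above can be sketched end to end on synthetic data. The column values, the 780/220 class split, and the `random_state` are assumptions chosen only to make the example repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical training set: 780 non-defaults (0) and 220 defaults (1)
rng = np.random.default_rng(0)
X_y_train = pd.DataFrame({
    'loan_amnt': rng.integers(1000, 30000, size=1000),
    'loan_status': [0] * 780 + [1] * 220,
})

nondefaults = X_y_train[X_y_train['loan_status'] == 0]
defaults = X_y_train[X_y_train['loan_status'] == 1]

# Sample as many non-defaults as there are defaults; random_state makes it repeatable
nondefaults_under = nondefaults.sample(len(defaults), random_state=42)

# Stack the two groups and rebuild a clean index
X_y_train_under = pd.concat([nondefaults_under, defaults], axis=0).reset_index(drop=True)

print(X_y_train_under['loan_status'].value_counts())
```

Resetting the index once, after concatenation, avoids the duplicate index labels that appear when each piece is reset separately before stacking.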
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Model evaluation
and implementation
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Comparing classification reports
Create the reports with classification_report() and compare
CREDIT RISK MODELING IN PYTHON
ROC and AUC analysis
Models with better performance will have more lift
More lift means the AUC score is higher
CREDIT RISK MODELING IN PYTHON
Model calibration
We want our probabilities of default to accurately represent the model's confidence level
The probability of default has a degree of uncertainty in its predictions
A sample of loans and their predicted probabilities of default should be close to the
percentage of defaults in that sample
Sample of loans | Average predicted PD | Sample percentage of actual defaults | Calibrated?
10 | 0.12 | 0.12 | Yes
10 | 0.25 | 0.65 | No
http://datascienceassn.org/sites/default/files/Predicting%20good%20probabilities%20with%20supervised%20lea
CREDIT RISK MODELING IN PYTHON
Calculating calibration
Shows percentage of true defaults for each predicted probability
Essentially a line plot of the results of calibration_curve()
from sklearn.calibration import calibration_curve
calibration_curve(y_test, probabilities_of_default, n_bins = 5)
# Fraction of positives
(array([0.09602649, 0.19521012, 0.62035996, 0.67361111]),
# Average probability
array([0.09543535, 0.29196742, 0.46898465, 0.65512207]))
CREDIT RISK MODELING IN PYTHON
Plotting calibration curves
plt.plot(mean_predicted_value, fraction_of_positives, label="%s" % "Example Model")
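To make the idea behind the calibration curve concrete, here is a hand-rolled sketch of the binning it relies on. The function `calibration_table` is a hypothetical stand-in written for illustration, not scikit-learn's implementation, and the data are synthetic.

```python
import numpy as np

def calibration_table(y_true, prob_default, n_bins=5):
    """Return (fraction_of_positives, mean_predicted_value) per non-empty bin."""
    y_true = np.asarray(y_true)
    prob_default = np.asarray(prob_default)
    # Equal-width bin edges on [0, 1]; interior edges assign each probability to a bin
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(prob_default, edges[1:-1])
    frac_pos, mean_pred = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():                      # empty bins are skipped
            frac_pos.append(y_true[mask].mean())
            mean_pred.append(prob_default[mask].mean())
    return np.array(frac_pos), np.array(mean_pred)

# Hypothetical, well-calibrated toy data: outcomes drawn with the stated probabilities
rng = np.random.default_rng(1)
probs = rng.uniform(0, 1, size=20000)
outcomes = (rng.uniform(0, 1, size=20000) < probs).astype(int)

fraction_of_positives, mean_predicted_value = calibration_table(outcomes, probs)
print(np.round(fraction_of_positives, 2))
print(np.round(mean_predicted_value, 2))
```

For a well-calibrated model the two arrays are close in every bin, so the plotted points sit near the diagonal.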
CREDIT RISK MODELING IN PYTHON
Checking calibration curves
As an example, two events selected (above and below perfect line)
CREDIT RISK MODELING IN PYTHON
Calibration curve interpretation
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Credit acceptance
rates
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Thresholds and loan status
Previously we set a threshold for a range of prob_default values
This was used to change the predicted loan_status of the loan
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.4 else 0)
Loan prob_default threshold loan_status
1 0.25 0.4 0
2 0.42 0.4 1
3 0.75 0.4 1
CREDIT RISK MODELING IN PYTHON
Thresholds and acceptance rate
Use model predictions to set better thresholds
Can also be used to approve or deny new loans
For all new loans, we want to deny probable defaults
Use the test data as an example of new loans
Acceptance rate: what percentage of new loans are accepted to keep the number of
defaults in a portfolio low
Accepted loans which are defaults have an impact similar to false negatives
CREDIT RISK MODELING IN PYTHON
Understanding acceptance rate
Example: Accept 85% of loans with the lowest prob_default
CREDIT RISK MODELING IN PYTHON
Calculating the threshold
Calculate the threshold value for an 85% acceptance rate
import numpy as np
# Compute the threshold for 85% acceptance rate
threshold = np.quantile(prob_default, 0.85)
0.804
Loan prob_default Threshold Predicted loan_status Accept or Reject
1 0.65 0.804 0 Accept
2 0.85 0.804 1 Reject
CREDIT RISK MODELING IN PYTHON
Implementing the calculated threshold
Reassign loan_status values using the new threshold
# Apply the calculated threshold to reassign loan_status
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.804 else 0)
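A short sketch of how an acceptance rate turns into a threshold, using synthetic probabilities. The beta-distributed `prob_default` values are an assumption made purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical predicted probabilities of default for 1,000 new loans
rng = np.random.default_rng(7)
preds_df = pd.DataFrame({'prob_default': rng.beta(2, 5, size=1000)})

# The 85th percentile of prob_default is the cut-off for an 85% acceptance rate
threshold = np.quantile(preds_df['prob_default'], 0.85)

# Loans above the threshold are predicted defaults (1) and rejected
preds_df['loan_status'] = (preds_df['prob_default'] > threshold).astype(int)

accepted_share = (preds_df['loan_status'] == 0).mean()
print(round(threshold, 3), accepted_share)
```

Using the quantile instead of a hard-coded number keeps the acceptance rate fixed even when the distribution of predicted probabilities shifts.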
CREDIT RISK MODELING IN PYTHON
Bad Rate
Even with a calculated threshold, some of the accepted loans will be defaults
These are loans with prob_default values around where our model is not well calibrated
CREDIT RISK MODELING IN PYTHON
Bad rate calculation
#Calculate the bad rate
np.sum(accepted_loans['true_loan_status']) / accepted_loans['true_loan_status'].count()
If non-default is 0 , and default is 1 then the sum() is the count of defaults
The .count() of a single column is the same as the row count for the data frame
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Credit strategy and
minimum expected
loss
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Selecting acceptance rates
First acceptance rate was set to 85%, but other rates might be selected as well
Two options to test different rates:
Calculate the threshold, bad rate, and losses manually
Automatically create a table of these values and select an acceptance rate
The table of all the possible values is called a strategy table
CREDIT RISK MODELING IN PYTHON
Setting up the strategy table
Set up arrays or lists to store each value
# Set all the acceptance rates to test
accept_rates = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55,
0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
# Create lists to store thresholds and bad rates
thresholds = []
bad_rates = []
CREDIT RISK MODELING IN PYTHON
Calculating the table values
Calculate the threshold and bad rate for all acceptance rates
for rate in accept_rates:
    # Calculate the threshold for this acceptance rate
    thresh = np.quantile(test_pred_df['prob_default'], rate).round(3)
    # Store threshold value in a list
    thresholds.append(thresh)
    # Apply the threshold to reassign loan_status
    test_pred_df['pred_loan_status'] = \
        test_pred_df['prob_default'].apply(lambda x: 1 if x > thresh else 0)
    # Create accepted loans set of predicted non-defaults
    accepted_loans = test_pred_df[test_pred_df['pred_loan_status'] == 0]
    # Calculate and store bad rate
    bad_rates.append((np.sum(accepted_loans['true_loan_status'])
                      / accepted_loans['true_loan_status'].count()).round(3))
CREDIT RISK MODELING IN PYTHON
Strategy table interpretation
strat_df = pd.DataFrame(zip(accept_rates, thresholds, bad_rates),
columns = ['Acceptance Rate','Threshold','Bad Rate'])
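The following sketch builds a small strategy table from start to finish on synthetic predictions. The data, the shortened list of acceptance rates, and the `Num Accepted Loans` column name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical test-set predictions; defaults are drawn so that a higher
# prob_default really does mean a higher chance of default
rng = np.random.default_rng(3)
prob_default = rng.beta(2, 5, size=5000)
test_pred_df = pd.DataFrame({
    'prob_default': prob_default,
    'true_loan_status': (rng.uniform(size=5000) < prob_default).astype(int),
})

rows = []
for rate in [1.0, 0.85, 0.7, 0.55, 0.4, 0.25]:
    thresh = np.quantile(test_pred_df['prob_default'], rate)
    accepted_loans = test_pred_df[test_pred_df['prob_default'] <= thresh]
    # Bad rate: share of accepted loans that actually defaulted
    bad_rate = accepted_loans['true_loan_status'].mean()
    rows.append((rate, round(thresh, 3), round(bad_rate, 3), len(accepted_loans)))

strat_df = pd.DataFrame(rows, columns=['Acceptance Rate', 'Threshold',
                                       'Bad Rate', 'Num Accepted Loans'])
print(strat_df)
```

Reading down the table, a lower acceptance rate means a lower threshold, fewer accepted loans, and a lower bad rate.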
CREDIT RISK MODELING IN PYTHON
Adding accepted loans
The number of loans accepted for each acceptance rate
Can use len() or .count()
CREDIT RISK MODELING IN PYTHON
Adding average loan amount
Average loan_amnt from the test set data
CREDIT RISK MODELING IN PYTHON
Estimating portfolio value
Average value of accepted loan non-defaults minus average value of accepted defaults
Assumes each default is a loss of the loan_amnt
CREDIT RISK MODELING IN PYTHON
Total expected loss
How much we expect to lose on the defaults in our portfolio
# Probability of default (PD)
test_pred_df['prob_default']
# Exposure at default = loan amount (EAD)
test_pred_df['loan_amnt']
# Loss given default = 1.0 for total loss (LGD)
test_pred_df['loss_given_default']
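Total expected loss combines the three columns above as the sum of PD * EAD * LGD over all loans. A minimal sketch on a hypothetical three-loan portfolio (the `expected_loss` column name is an assumption):

```python
import pandas as pd

# Hypothetical three-loan portfolio
test_pred_df = pd.DataFrame({
    'prob_default': [0.05, 0.20, 0.50],       # PD
    'loan_amnt': [10000, 5000, 2000],         # EAD
    'loss_given_default': [1.0, 1.0, 1.0],    # LGD (total loss assumed)
})

# Expected loss per loan = PD * EAD * LGD, summed over the portfolio
test_pred_df['expected_loss'] = (test_pred_df['prob_default']
                                 * test_pred_df['loan_amnt']
                                 * test_pred_df['loss_given_default'])
total_expected_loss = test_pred_df['expected_loss'].sum()
print(total_expected_loss)
```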
CREDIT RISK MODELING IN PYTHON
Let's practice!
CREDIT RISK MODELING IN PYTHON
Course wrap up
CREDIT RISK MODELING IN PYTHON
Michael Crabtree
Data Scientist, Ford Motor Company
Your journey...so far
Prepare credit data for machine learning models
Important to understand the data
Improving the data allows for high performing simple models
Develop, score, and understand logistic regressions and gradient boosted trees
Analyze the performance of models by changing the data
Understand the financial impact of results
Implement the model with an understanding of strategy
CREDIT RISK MODELING IN PYTHON
Risk modeling techniques
The models and framework in this course:
Discrete-time hazard model (point in time): the probability of default is a point-in-time
event
Structural model framework: the model explains the default event based on other factors
Other techniques
Through-the-cycle model (continuous time): macro-economic conditions and other effects
are used, but the risk is seen as an independent event
Reduced-form model framework: a statistical approach estimating probability of default
as an independent Poisson-based event
CREDIT RISK MODELING IN PYTHON
Choosing models
Many machine learning models available, but logistic regression and tree models were used
These models are simple and explainable
Their performance on probabilities is acceptable
Many financial sectors prefer model interpretability
Complex or "black-box" models are a risk because the business cannot explain their
decisions fully
Deep neural networks are often too complex
CREDIT RISK MODELING IN PYTHON
Tips from me to you
Focus on the data
Gather as much data as possible
Use many different techniques to prepare and enhance the data
Learn about the business
Increase value through data
Model complexity can be a two-edged sword
Really complex models may perform well, but are seen as a "black-box"
In many cases, business users will not accept a model they cannot understand
Complex models can be very large and difficult to put into production
CREDIT RISK MODELING IN PYTHON
Thank you!
CREDIT RISK MODELING IN PYTHON
Why do we need
GARCH models
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Course overview
GARCH: Generalized AutoRegressive Conditional Heteroskedasticity
Chapter 1: GARCH Model Fundamentals
Chapter 2: GARCH Model Configuration
Chapter 3: Model Performance Evaluation
Chapter 4: GARCH in Action
GARCH MODELS IN PYTHON
What is volatility
Describe the dispersion of financial asset returns over time
Often computed as the standard deviation or variance of price returns
The higher the volatility, the riskier a financial asset
GARCH MODELS IN PYTHON
How to compute volatility
Step 1: Calculate returns as percentage of price changes
$return = \frac{P_1 - P_0}{P_0}$
Step 2: Calculate the sample mean return
$mean = \frac{\sum_{i=1}^{n} return_i}{n}$
Step 3: Calculate the sample standard deviation
$volatility = \sqrt{\frac{\sum_{i=1}^{n} (return_i - mean)^2}{n - 1}} = \sqrt{variance}$
GARCH MODELS IN PYTHON
Compute volatility in Python
Use pandas pct_change() method:
return_data = price_data.pct_change()
Use pandas std() method:
volatility = return_data.std()
GARCH MODELS IN PYTHON
Volatility conversion
Convert to monthly volatility from daily:
(assume 21 trading days in a month)
$\sigma_{monthly} = \sqrt{21} \cdot \sigma_d$
Convert to annual volatility from daily:
(assume 252 trading days in a year)
$\sigma_{annual} = \sqrt{252} \cdot \sigma_d$
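The computation and conversion steps can be sketched together on a short, made-up price series. The prices are hypothetical; the 21- and 252-day conventions are the ones assumed above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices
price_data = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0, 101.0, 103.0])

# Step 1: simple returns; the first value is NaN because it has no previous price
return_data = price_data.pct_change().dropna()

# Steps 2-3: pandas std() uses the n - 1 denominator by default (sample std)
daily_vol = return_data.std()

# Square-root-of-time scaling, assuming 21 and 252 trading days
monthly_vol = np.sqrt(21) * daily_vol
annual_vol = np.sqrt(252) * daily_vol
print(daily_vol, monthly_vol, annual_vol)
```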
GARCH MODELS IN PYTHON
The challenge of volatility modeling
Heteroskedasticity:
In ancient Greek: "different" (hetero) + "dispersion" (skedasis)
A time series demonstrates varying volatility systematically over time
GARCH MODELS IN PYTHON
Detect heteroskedasticity
Homoskedasticity vs Heteroskedasticity
GARCH MODELS IN PYTHON
Volatility clustering
VIX historical prices:
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
What are ARCH and
GARCH
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
First came the ARCH
Auto Regressive Conditional Heteroskedasticity
Developed by Robert F. Engle (Nobel prize laureate 2003)
GARCH MODELS IN PYTHON
Then came the GARCH
"Generalized" ARCH
Developed by Tim Bollerslev (Robert F. Engle's student)
GARCH MODELS IN PYTHON
Related statistical terms
White noise (z): Uncorrelated random variables with a zero mean and a finite variance
Residual = observed value - predicted value
GARCH MODELS IN PYTHON
Model notations
Expected return:
$\mu_t = Expected[r_t \mid I(t-1)]$
Expected volatility:
$\sigma_t^2 = Expected[(r_t - \mu_t)^2 \mid I(t-1)]$
Residual (prediction error):
$r_t = \mu_t + \epsilon_t$
Volatility is related to the residuals:
$\epsilon_t = \sigma_t \cdot \zeta$, where $\zeta$ is white noise
GARCH MODELS IN PYTHON
Model equations: ARCH
GARCH MODELS IN PYTHON
Model equations: GARCH
GARCH MODELS IN PYTHON
Model intuition
Autoregressive: predict future behavior based on past behavior
Volatility as a weighted average of past information
GARCH MODELS IN PYTHON
GARCH(1,1) parameter constraints
To make the GARCH(1,1) process realistic, it requires:
All parameters are non-negative, so the variance cannot be negative.
ω, α, β >= 0
Model estimations are "mean-reverting" to the long-run variance.
α+β <1
long-run variance:
ω/(1 − α − β)
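The mean-reversion property can be checked numerically. The sketch below simulates a GARCH(1,1) process with plain NumPy, using the standard recursion (variance = ω + α · previous squared residual + β · previous variance). The parameter values are illustrative assumptions, not estimates from any data set.

```python
import numpy as np

# Illustrative parameters (not fitted to any data); alpha + beta < 1
omega, alpha, beta = 0.05, 0.10, 0.85
long_run_variance = omega / (1 - alpha - beta)

rng = np.random.default_rng(42)
n = 100_000
z = rng.standard_normal(n)                 # white noise
sigma2 = np.empty(n)                       # conditional variance
eps = np.empty(n)                          # residuals
sigma2[0] = long_run_variance              # start at the long-run level
eps[0] = np.sqrt(sigma2[0]) * z[0]

for t in range(1, n):
    # GARCH(1,1): today's variance from yesterday's squared shock and variance
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * z[t]

print(long_run_variance, eps.var())
```

Over a long simulation the sample variance of the residuals settles near ω/(1 − α − β), which is what "mean-reverting to the long-run variance" means in practice.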
GARCH MODELS IN PYTHON
GARCH(1,1) parameter dynamics
The larger the α, the bigger the immediate impact of the shock
The larger the β , the longer the duration of the impact
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
How to implement
GARCH models in
Python
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Python "arch" package
from arch import arch_model
1. Kevin Sheppard. (2019, March 28). bashtage/arch: Release 4.8.1 (Version 4.8.1). Zenodo.
http://doi.org/10.5281/zenodo.2613877
GARCH MODELS IN PYTHON
Workflow
Develop a GARCH model in three steps:
1. Specify the model
2. Fit the model
3. Make a forecast
GARCH MODELS IN PYTHON
Model specification
Model assumptions:
Distribution: "normal" (default), "t" , "skewt"
Mean model: "constant" (default), "zero" , "AR"
Volatility model: "GARCH" (default), "ARCH" , "EGARCH"
basic_gm = arch_model(sp_data['Return'], p = 1, q = 1,
mean = 'constant', vol = 'GARCH', dist = 'normal')
GARCH MODELS IN PYTHON
Model fitting
Display model fitting output after every n iterations:
gm_result = basic_gm.fit(update_freq = 4)
Turn off the display:
gm_result = basic_gm.fit(disp = 'off')
GARCH MODELS IN PYTHON
Fitted results: parameters
Estimated by "maximum likelihood method"
print(gm_result.params)
mu 0.077239
omega 0.039587
alpha[1] 0.167963
beta[1] 0.786467
Name: params, dtype: float64
GARCH MODELS IN PYTHON
Fitted results: summary
print(gm_result.summary())
GARCH MODELS IN PYTHON
Fitted results: plots
gm_result.plot()
GARCH MODELS IN PYTHON
Model forecasting
# Make 5-period ahead forecast
gm_forecast = gm_result.forecast(horizon = 5)
# Print out the last row of variance forecast
print(gm_forecast.variance[-1:])
h.1 h.2 h.3 h.4 h.5
Date
2019-10-10 0.994079 0.988366 0.982913 0.977708 0.972741
h.1 in row "2019-10-10": 1-step ahead forecast made using data up to and including that date
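The shape of the multi-step forecast follows from the GARCH(1,1) recursion. As a sketch, and assuming the forecast row above comes from the same fit as the parameters printed on the "Fitted results: parameters" slide, the later horizons can be rebuilt by hand from h.1:

```python
# Fitted parameters printed on the "Fitted results: parameters" slide
omega, alpha, beta = 0.039587, 0.167963, 0.786467

# 1-step-ahead variance forecast from the table above (h.1)
forecasts = [0.994079]

# Beyond one step the expected squared residual equals the variance forecast,
# so each further step is omega + (alpha + beta) * previous forecast
for _ in range(4):
    forecasts.append(omega + (alpha + beta) * forecasts[-1])

long_run_variance = omega / (1 - alpha - beta)
print([round(f, 6) for f in forecasts], round(long_run_variance, 4))
```

The forecasts decline step by step toward the long-run variance, which is the mean reversion implied by α + β < 1.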
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Distribution
assumptions
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Why make assumptions
Volatility is not directly observable
GARCH models use residuals as volatility shocks
$r_t = \mu_t + \epsilon_t$
Volatility is related to the residuals:
$\epsilon_t = \sigma_t \cdot \zeta$, where $\zeta$ is white noise
GARCH MODELS IN PYTHON
Standardized residuals
Residual = observed return - mean return
$residuals = \epsilon_t = r_t - \mu_t$
Standardized residual = residual / return volatility
$std\ Resid = \frac{\epsilon_t}{\sigma_t}$
GARCH MODELS IN PYTHON
Residuals in GARCH
gm_std_resid = gm_result.resid / gm_result.conditional_volatility
plt.hist(gm_std_resid, facecolor = 'orange',label = 'standardized residuals')
GARCH MODELS IN PYTHON
Fat tails
Higher probability to observe large (positive or negative) returns than under a normal
distribution
GARCH MODELS IN PYTHON
Skewness
Measure of asymmetry of a probability distribution
GARCH MODELS IN PYTHON
Student's t-distribution
ν parameter of a Student's t-distribution indicates its shape
GARCH MODELS IN PYTHON
GARCH with t-distribution
arch_model(my_data, p = 1, q = 1,
mean = 'constant', vol = 'GARCH',
dist = 't')
Distribution
========================================================================
coef std err t P>|t| 95.0% Conf. Int.
.-----------------------------------------------------------------------
nu 4.9249 0.507 9.709 2.768e-22 [ 3.931, 5.919]
========================================================================
GARCH MODELS IN PYTHON
GARCH with skewed t-distribution
arch_model(my_data, p = 1, q = 1,
mean = 'constant', vol = 'GARCH',
dist = 'skewt')
Distribution
===========================================================================
coef std err t P>|t| 95.0% Conf. Int.
.--------------------------------------------------------------------------
nu 5.2437 0.575 9.118 7.681e-20 [ 4.117, 6.371]
lambda -0.0822 2.541e-02 -3.235 1.216e-03 [ -0.132,-3.241e-02]
===========================================================================
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Mean model
specifications
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Constant mean by default
constant mean: generally works well with most financial return data
arch_model(my_data, p = 1, q = 1,
mean = 'constant', vol = 'GARCH')
GARCH MODELS IN PYTHON
Zero mean assumption
zero mean: use when the mean has been modeled separately
arch_model(my_data, p = 1, q = 1,
mean = 'zero', vol = 'GARCH')
GARCH MODELS IN PYTHON
Autoregressive mean
AR mean: model the mean as an autoregressive (AR) process
arch_model(my_data, p = 1, q = 1,
mean = 'AR', lags = 1, vol = 'GARCH')
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Volatility models for
asymmetric shocks
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Asymmetric shocks in financial data
News impact curve:
GARCH MODELS IN PYTHON
Leverage effect
Debt-equity Ratio = Debt / Equity
Stock price goes down, debt-equity ratio goes up
Riskier!
GARCH MODELS IN PYTHON
GJR-GARCH
GARCH MODELS IN PYTHON
GJR-GARCH in Python
arch_model(my_data, p = 1, q = 1, o = 1,
mean = 'constant', vol = 'GARCH')
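The `o = 1` argument adds one asymmetric term to the variance equation. As a rough sketch of what that term does, the one-step GJR-GARCH(1,1) update can be written by hand. The function name and all parameter values are assumptions for illustration.

```python
def gjr_garch_next_variance(prev_resid, prev_variance,
                            omega=0.05, alpha=0.05, gamma=0.10, beta=0.85):
    """One-step GJR-GARCH(1,1) variance update with illustrative parameters."""
    # The indicator adds gamma only when the previous shock was negative
    is_negative = 1.0 if prev_resid < 0 else 0.0
    return (omega
            + (alpha + gamma * is_negative) * prev_resid ** 2
            + beta * prev_variance)

# Same-sized shocks of opposite sign, starting from the same variance
var_after_good_news = gjr_garch_next_variance(prev_resid=2.0, prev_variance=1.0)
var_after_bad_news = gjr_garch_next_variance(prev_resid=-2.0, prev_variance=1.0)
print(var_after_good_news, var_after_bad_news)
```

A negative shock raises next-period variance more than a positive shock of the same size, which is the asymmetry the model is designed to capture.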
GARCH MODELS IN PYTHON
EGARCH
A popular option to model asymmetric shocks
Exponential GARCH
Add a conditional component to model the asymmetry in shocks similar to the GJR-GARCH
No non-negative constraints on alpha, beta so it runs faster
GARCH MODELS IN PYTHON
EGARCH in Python
arch_model(my_data, p = 1, q = 1, o = 1,
mean = 'constant', vol = 'EGARCH')
GARCH MODELS IN PYTHON
Which model to use
GJR-GARCH or EGARCH?
Which model is better depends on the data
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
GARCH rolling
window forecast
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Rolling window for out-of-sample forecast
An exciting part of financial modeling: predict the unknown
Rolling window forecast: repeatedly perform model fitting and forecast as time rolls forward
GARCH MODELS IN PYTHON
Expanding window forecast
Continuously add new data points to the sample
GARCH MODELS IN PYTHON
Motivations of rolling window forecast
Avoid lookback bias
Less subject to overfitting
Adapt forecast to new observations
GARCH MODELS IN PYTHON
Implement expanding window forecast
Expanding window forecast:
for i in range(120):
    gm_result = basic_gm.fit(first_obs = start_loc,
                             last_obs = i + end_loc, disp = 'off')
    temp_result = gm_result.forecast(horizon = 1).variance
GARCH MODELS IN PYTHON
Fixed rolling window forecast
New data points are added while old ones are dropped from the sample
GARCH MODELS IN PYTHON
Implement fixed rolling window forecast
Fixed rolling window forecast:
for i in range(120):
    # Specify rolling window range for model fitting
    gm_result = basic_gm.fit(first_obs = i + start_loc,
                             last_obs = i + end_loc, disp = 'off')
    temp_result = gm_result.forecast(horizon = 1).variance
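The difference between the two loops lies only in how the sample boundaries move. The sketch below shows the indexing on synthetic returns, with the sample variance standing in for a fitted GARCH model (an assumption made so the example needs only NumPy).

```python
import numpy as np

# Hypothetical daily returns; sample variance stands in for a fitted GARCH model
rng = np.random.default_rng(0)
returns = rng.standard_normal(300)

start_loc, end_loc = 0, 200      # initial estimation window: observations 0..199
n_forecasts = 5

expanding, rolling = [], []
for i in range(n_forecasts):
    # Expanding: the start is fixed, so the sample grows by one each step
    expanding_sample = returns[start_loc:end_loc + i]
    # Fixed rolling: start and end both move, so the sample size stays constant
    rolling_sample = returns[start_loc + i:end_loc + i]
    expanding.append((len(expanding_sample), expanding_sample.var(ddof=1)))
    rolling.append((len(rolling_sample), rolling_sample.var(ddof=1)))

print([n for n, _ in expanding])
print([n for n, _ in rolling])
```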
GARCH MODELS IN PYTHON
How to determine window size
Usually determined on a case-by-case basis
Too wide window size: includes obsolete data that may lead to higher bias
Too narrow window size: excludes relevant data that may lead to higher variance
The optimal window size: trade-off to balance bias and variance
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Significance testing
of model parameters
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Do I need this parameter?
Is it relevant
KISS: keep it simple stupid
Always prefer a parsimonious model
GARCH MODELS IN PYTHON
Hypothesis test
Null hypothesis (H0): a claim to be verified
H0: parameter value = 0
If H0 cannot be rejected, leave out the parameter
GARCH MODELS IN PYTHON
Statistical significance
Quantify the probability of obtaining the observed results by chance
Common threshold: 5%
GARCH MODELS IN PYTHON
P-value
The probability of obtaining results at least as extreme as those observed if the null hypothesis were true
The lower the p-value, the more ridiculous the null hypothesis looks
Reject the null hypothesis if p-value < significance level
GARCH MODELS IN PYTHON
P-value example
print(gm_result.summary())
print(gm_result.pvalues)
mu 9.031206e-08
omega 1.619415e-05
alpha[1] 4.283526e-10
beta[1] 1.302531e-183
Name: pvalues, dtype: float64
GARCH MODELS IN PYTHON
T-statistic
T-statistic = estimated parameter / standard error
The absolute value of the t-statistic is a distance measure
If |t-statistic| > 2: keep the parameter in the GARCH model
GARCH MODELS IN PYTHON
T-statistic example
print(gm_result.summary())
print(gm_result.tvalues)
mu 5.345210
omega 4.311785
alpha[1] 6.243330
beta[1] 28.896991
Name: tvalues, dtype: float64
# Manual calculation
t = gm_result.params/gm_result.std_err
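T-statistics and p-values carry the same information. As a sketch, and assuming the p-values are computed against a standard normal reference distribution, the t-statistics above can be converted to two-sided p-values with only the standard library. The helper `two_sided_p_value` is a hypothetical name.

```python
import math

# t-statistics printed above
t_values = {'mu': 5.345210, 'omega': 4.311785,
            'alpha[1]': 6.243330, 'beta[1]': 28.896991}

def two_sided_p_value(t_stat):
    """Two-sided p-value under a standard normal reference distribution."""
    # P(|Z| > |t|) = erfc(|t| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(t_stat) / math.sqrt(2))

for name, t_stat in t_values.items():
    keep = abs(t_stat) > 2          # rule of thumb from the T-statistic slide
    print(name, two_sided_p_value(t_stat), 'keep' if keep else 'drop')
```

Under that assumption the results land very close to the values on the P-value example slide, and every parameter clears both the |t| > 2 rule and the 5% threshold.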
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Validation of GARCH
model assumptions
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Visual check
GARCH MODELS IN PYTHON
Autocorrelation
Describe the correlation of a variable with itself given a time lag
Existence of autocorrelation in the standardized residuals indicates the model may not be
sound
To detect autocorrelation:
ACF plot
Ljung-Box
GARCH MODELS IN PYTHON
ACF plot
ACF: AutoCorrelation Function
ACF Plot: visual representation of the autocorrelation by lags
Red area in the plot indicates the confidence level (alpha = 5%)
GARCH MODELS IN PYTHON
ACF plot in Python
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(my_data, alpha = 0.05)
GARCH MODELS IN PYTHON
Ljung-Box test
Test whether any of a group of autocorrelations of a time series are different from zero
H0: the data is independently distributed
P-value < 5%: the model is not sound
GARCH MODELS IN PYTHON
Ljung-Box test Python
# Import the Python module
from statsmodels.stats.diagnostic import acorr_ljungbox
# Perform the Ljung-Box test
lb_test = acorr_ljungbox(std_resid , lags = 10)
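# Check p-values (recent statsmodels versions return a DataFrame with an
# 'lb_pvalue' column; older versions returned a tuple, where lb_test[1] held them)
print('P-values are: ', lb_test['lb_pvalue'])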
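To show what the test measures, here is a hand-written sketch of the Ljung-Box Q statistic in NumPy. The function `ljung_box_q` is a hypothetical helper, not the statsmodels implementation, and both series are synthetic.

```python
import numpy as np

def ljung_box_q(series, lags=10):
    """Ljung-Box Q statistic: n(n+2) * sum_k rho_k^2 / (n - k), k = 1..lags."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.sum(x ** 2)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(x[k:] * x[:-k]) / denom    # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(5)
noise = rng.standard_normal(1000)        # independent draws
walk = np.cumsum(noise)                  # strongly autocorrelated series

# 18.307 is the 5% critical value of a chi-square with 10 degrees of freedom
CRITICAL_5PCT_10_LAGS = 18.307
print(ljung_box_q(noise), ljung_box_q(walk), CRITICAL_5PCT_10_LAGS)
```

A Q statistic above the critical value corresponds to a p-value below 5%, which is the signal that autocorrelation remains in the series.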
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Goodness of fit
measures
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Goodness of fit
Can the model do a good job explaining the data?
1. Maximum likelihood
2. Information criteria
GARCH MODELS IN PYTHON
Maximum likelihood
Maximize the probability of getting the data observed under the assumed model
Prefer models with larger likelihood values
GARCH MODELS IN PYTHON
Log-likelihood in Python
Typically used in log form: log-likelihood
print(gm_result.loglikelihood)
GARCH MODELS IN PYTHON
Overfitting
Fits in-sample data well, but performs poorly on out-of-sample predictions
Usually because the model is overly complex
GARCH MODELS IN PYTHON
Information criteria
Measure the trade-off between goodness of fit and model complexity
Likelihood + penalty for model complexity
AIC: Akaike's Information Criterion
BIC: Bayesian Information Criterion
Prefer models with the lower information criterion score
GARCH MODELS IN PYTHON
AIC vs. BIC
Generally they agree with each other
BIC penalizes model complexity more severely
GARCH MODELS IN PYTHON
AIC/BIC in Python
print(gm_result.aic)
print(gm_result.bic)
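Both criteria can be computed from the log-likelihood, the number of parameters, and the number of observations. The sketch below uses the textbook definitions with made-up inputs; the helper `information_criteria` and the two hypothetical fits are assumptions.

```python
import math

def information_criteria(loglikelihood, num_params, nobs):
    """AIC = 2k - 2 ln(L); BIC = k ln(n) - 2 ln(L)."""
    aic = 2 * num_params - 2 * loglikelihood
    bic = num_params * math.log(nobs) - 2 * loglikelihood
    return aic, bic

# Hypothetical fits on the same 2,500 observations
aic_small, bic_small = information_criteria(loglikelihood=-3400.0, num_params=4, nobs=2500)
aic_large, bic_large = information_criteria(loglikelihood=-3397.0, num_params=6, nobs=2500)
print(aic_small, aic_large)
print(bic_small, bic_large)
```

In this made-up case AIC prefers the larger model while BIC prefers the smaller one, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7.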
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
GARCH model
backtesting
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Backtesting
An approach to evaluate model forecasting capability
Compare the model predictions with the actual historical data
GARCH MODELS IN PYTHON
In-sample vs. out-of-sample
In-sample: model fitting
Out-of-sample: backtesting
GARCH MODELS IN PYTHON
MAE
Mean Absolute Error
GARCH MODELS IN PYTHON
MSE
Mean Squared Error
GARCH MODELS IN PYTHON
Calculate MAE, MSE in Python
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Call function to calculate MAE
mae = mean_absolute_error(observation, forecast)
# Call function to calculate MSE
mse = mean_squared_error(observation, forecast)
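Both metrics are simple averages of forecast errors. A NumPy-only sketch with hypothetical observations and forecasts:

```python
import numpy as np

# Hypothetical realized variances and model forecasts
observation = np.array([1.2, 0.8, 1.5, 1.1])
forecast = np.array([1.0, 1.0, 1.2, 1.3])

errors = observation - forecast
mae = np.mean(np.abs(errors))     # Mean Absolute Error
mse = np.mean(errors ** 2)        # Mean Squared Error
print(mae, mse)
```

MSE squares each error before averaging, so it penalizes large misses more heavily than MAE does.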
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
VaR in financial risk
management
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
Risk management mindset
Rule No.1: Never lose money
Rule No.2: Never forget Rule No.1
-- Warren Buffett
GARCH MODELS IN PYTHON
What is VaR
VaR stands for Value at Risk
Three ingredients:
1. portfolio
2. time horizon
3. probability
GARCH MODELS IN PYTHON
VaR examples
1-day 5% VaR of $1 million
5% probability the portfolio will fall in value by 1 million dollars or more over a 1-day period
10-day 1% VaR of $9 million
1% probability the portfolio will fall in value by 9 million dollars or more over a 10-day period
GARCH MODELS IN PYTHON
VaR in risk management
Set risk limits
VaR exceedance: portfolio loss exceeds the VaR
GARCH MODELS IN PYTHON
Dynamic VaR with GARCH
More realistic VaR estimation with GARCH
VaR = mean + (GARCH vol) * quantile
VaR = mean_forecast.values + np.sqrt(variance_forecast).values * quantile
GARCH MODELS IN PYTHON
Dynamic VaR calculation
Step 1: Use GARCH model to make variance forecast
# Specify and fit a GARCH model
basic_gm = arch_model(bitcoin_data['Return'], p = 1, q = 1,
mean = 'constant', vol = 'GARCH', dist = 't')
gm_result = basic_gm.fit()
# Make variance forecast
gm_forecast = gm_result.forecast(start = '2019-01-01')
GARCH MODELS IN PYTHON
Dynamic VaR calculation (cont.)
Step 2: Use GARCH model to obtain forward-looking mean and volatility
mean_forecast = gm_forecast.mean['2019-01-01':]
variance_forecast = gm_forecast.variance['2019-01-01':]
Step 3: Obtain the quantile according to a confidence level
1. Parametric VaR
2. Empirical VaR
GARCH MODELS IN PYTHON
Parametric VaR
Estimate quantiles based on GARCH assumed distribution of the standardized residuals
# Assume a Student's t-distribution
# ppf(): Percent point function
# nu: estimated degrees of freedom of the fitted t-distribution
q_parametric = basic_gm.distribution.ppf(0.05, nu)
GARCH MODELS IN PYTHON
Empirical VaR
Estimate quantiles based on the observed distribution of the GARCH standardized residuals
q_empirical = std_resid.quantile(0.05)
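Putting the pieces together, the empirical VaR is the forecast mean plus the forecast volatility times the empirical quantile. The sketch below uses synthetic stand-ins for the GARCH output; all input values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs standing in for GARCH output
rng = np.random.default_rng(11)
# t-distributed draws rescaled to unit variance, mimicking standardized residuals
std_resid = pd.Series(rng.standard_t(df=5, size=2000) * np.sqrt(3 / 5))
mean_forecast = np.array([0.05, 0.05, 0.05])             # forecast mean return
variance_forecast = np.array([1.0, 4.0, 2.25])           # forecast variance

# Empirical 5% quantile of the standardized residuals (a negative number)
q_empirical = std_resid.quantile(0.05)

# VaR = mean + volatility * quantile, one value per forecast date
VaR_empirical = mean_forecast + np.sqrt(variance_forecast) * q_empirical
print(q_empirical, VaR_empirical)
```

Because the quantile is negative, dates with a higher forecast volatility get a more negative VaR, which is what makes the estimate dynamic.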
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Dynamic covariance
in portfolio
optimization
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
What is covariance
Describe the relationship between movement of two variables
Positive covariance: move together
Negative covariance: move in opposite directions
GARCH MODELS IN PYTHON
Dynamic covariance with GARCH
If two asset returns have correlation ρ and time-varying volatility of σ1 and σ2 :
Covariance = ρ ⋅ σ1 ⋅ σ2
covariance = correlation * garch_vol1 * garch_vol2
GARCH MODELS IN PYTHON
Calculate GARCH covariance in Python
Step 1: Fit GARCH models and obtain volatility for each return series
# gm_eur, gm_cad are fitted GARCH models
vol_eur = gm_eur.conditional_volatility
vol_cad = gm_cad.conditional_volatility
Step 2: Compute standardized residuals from the fitted GARCH models
resid_eur = gm_eur.resid/vol_eur
resid_cad = gm_cad.resid/vol_cad
GARCH MODELS IN PYTHON
Calculate GARCH covariance in Python (cont.)
Step 3: Compute ρ as simple correlation of standardized residuals
corr = np.corrcoef(resid_eur, resid_cad)[0,1]
Step 4: Compute GARCH covariance by multiplying the correlation and volatility.
covariance = corr * vol_eur * vol_cad
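Steps 3 and 4 can be tried without fitting any model by substituting synthetic arrays for the fitted output. Every input below is a hypothetical stand-in.

```python
import numpy as np

# Hypothetical stand-ins for two fitted models' output
rng = np.random.default_rng(2)
n = 500
common = rng.standard_normal(n)
resid_eur = 0.8 * common + 0.6 * rng.standard_normal(n)   # standardized residuals
resid_cad = 0.8 * common + 0.6 * rng.standard_normal(n)
vol_eur = 0.5 + 0.1 * np.abs(rng.standard_normal(n))      # conditional volatility
vol_cad = 0.4 + 0.1 * np.abs(rng.standard_normal(n))

# One constant correlation, scaled by the two time-varying volatilities
corr = np.corrcoef(resid_eur, resid_cad)[0, 1]
covariance = corr * vol_eur * vol_cad
print(round(corr, 2), covariance[:3])
```

The correlation is a single number, so all of the time variation in the covariance series comes from the two volatility series.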
GARCH MODELS IN PYTHON
Modern portfolio theory (MPT)
Pioneered by Harry Markowitz in his paper "Portfolio Selection"(1952)
Take advantage of the diversification effect
The optimal portfolio can yield the maximum return with the minimum risk
GARCH MODELS IN PYTHON
MPT intuition
Variance of a simple two-asset portfolio:
$W_1^2 \cdot Variance_1 + W_2^2 \cdot Variance_2 + 2 \cdot W_1 \cdot W_2 \cdot Covariance$
Diversification effect:
Risk can be reduced in a portfolio by pairing assets that have a negative covariance
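A quick numeric sketch of this effect, using the standard two-asset variance formula with squared weights. The weights, variances, and covariances are made-up values.

```python
def portfolio_variance(w1, w2, variance1, variance2, covariance):
    """Variance of a two-asset portfolio with weights w1 and w2."""
    return w1 ** 2 * variance1 + w2 ** 2 * variance2 + 2 * w1 * w2 * covariance

# Hypothetical assets with equal weights and equal variances
positive_cov = portfolio_variance(0.5, 0.5, 0.04, 0.04, covariance=0.02)
negative_cov = portfolio_variance(0.5, 0.5, 0.04, 0.04, covariance=-0.02)
print(positive_cov, negative_cov)
```

With everything else held equal, flipping the sign of the covariance cuts the portfolio variance from about 0.03 to about 0.01.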
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Dynamic Beta in
portfolio
management
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
What is Beta
Stock Beta:
a measure of stock volatility in relation to the general market
Systematic risk:
the portion of the risk that cannot be diversified away
GARCH MODELS IN PYTHON
Beta in portfolio management
Gauge investment risk
Market Beta = 1: used as benchmark
Beta > 1: the stock bears more risks than the general market
Beta < 1: the stock bears less risks than the general market
GARCH MODELS IN PYTHON
Beta in CAPM
Estimate risk premium of a stock
CAPM: Capital Asset Pricing Model
$E(R_s) = R_f + \beta (E(R_m) - R_f)$
$E(R_s)$: stock required rate of return
$R_f$: risk-free rate (e.g. Treasuries)
$E(R_m)$: market expected return (e.g. S&P 500)
$E(R_m) - R_f$: market premium
GARCH MODELS IN PYTHON
Dynamic Beta with GARCH
Beta = ρ * σ_stock / σ_market
GARCH MODELS IN PYTHON
Calculate dynamic Beta in Python
1). Compute correlation between S&P500 and stock
resid_stock = stock_gm.resid / stock_gm.conditional_volatility
resid_sp500 = sp500_gm.resid / sp500_gm.conditional_volatility
correlation = np.corrcoef(resid_stock, resid_sp500)[0, 1]
2). Compute dynamic Beta for the stock
stock_beta = correlation * (stock_gm.conditional_volatility /
sp500_gm.conditional_volatility)
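The two steps can be run on synthetic arrays to see the shape of the result. The residuals and volatilities below are hypothetical stand-ins for fitted GARCH output.

```python
import numpy as np

# Hypothetical standardized residuals and conditional volatilities
rng = np.random.default_rng(9)
n = 750
resid_sp500 = rng.standard_normal(n)
resid_stock = 0.6 * resid_sp500 + 0.8 * rng.standard_normal(n)
vol_sp500 = 1.0 + 0.2 * np.abs(rng.standard_normal(n))
vol_stock = 1.5 + 0.3 * np.abs(rng.standard_normal(n))

correlation = np.corrcoef(resid_stock, resid_sp500)[0, 1]

# Beta varies through time because the volatility ratio does
stock_beta = correlation * (vol_stock / vol_sp500)
print(round(correlation, 2), stock_beta.min(), stock_beta.max())
```

The result is one Beta per date instead of a single number, so changes in the stock's risk relative to the market show up as they happen.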
GARCH MODELS IN PYTHON
Let's practice!
GARCH MODELS IN PYTHON
Congratulations!
GARCH MODELS IN PYTHON
Chelsea Yang
Data Science Instructor
You did it
Fit GARCH models
Make volatility forecast
Evaluate model performance
GARCH in action: VaR, covariance, Beta
GARCH MODELS IN PYTHON
Going forward
Time series analysis
ARIMA (AutoRegressive Integrated Moving Average) models
CAPM (Capital Asset Pricing Model)
Portfolio optimization
GARCH MODELS IN PYTHON
Have fun and keep
improving!
GARCH MODELS IN PYTHON