Copula Regression

The document describes copula regression, a technique for modeling relationships between variables with arbitrary marginal distributions. Copula regression models the marginal distribution of each variable separately and the dependence between variables with a copula, which gives greater flexibility than techniques that assume a particular joint distribution such as the multivariate normal. The document provides examples of copula regression when the variables are continuous or discrete, and discusses how to estimate parameters, compute standard errors, and compare performance with ordinary least squares regression.

Copula Regression

BY
RAHUL A. PARSA
DRAKE UNIVERSITY
&
STUART A. KLUGMAN
SOCIETY OF ACTUARIES

Outline of Talk
OLS Regression
Generalized Linear Models (GLM)
Copula Regression
Continuous Case
Discrete Case
Examples

Notation
Y: dependent variable
$X_1, X_2, \ldots, X_k$: independent variables

Assumption
Y is related to the X's in some functional form:

$E[Y \mid X_1 = x_1, \ldots, X_k = x_k] = f(x_1, x_2, \ldots, x_k)$

OLS Regression
Y is linearly related to the X's

OLS model:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

OLS Regression
Parameters are chosen to minimize $\sum_i (Y_i - \hat{Y}_i)^2$

Estimated model:

$\hat{\beta} = (X'X)^{-1} X'Y$

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_k X_{ki}$
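As a minimal numpy illustration of the closed form (the data here are simulated for the sketch, not taken from the talk):

```python
# Minimal sketch: the OLS closed-form estimate beta-hat = (X'X)^{-1} X'Y on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y without forming the inverse
y_hat = X @ beta_hat                           # fitted values
sse = np.sum((y - y_hat) ** 2)                 # the quantity OLS minimizes
```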

OLS
Multivariate Normal Distribution
Assume $Y, X_1, X_2, \ldots, X_k$ jointly follow a multivariate normal distribution.
Then the conditional distribution of $Y \mid X$ is normal, with mean and variance given by

$E(Y \mid X = x) = \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (x - \mu_X)$

$\text{Variance} = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}$

OLS & MVN


Y-hat = Estimated Conditional mean
It is the MLE
Estimated Conditional Variance is the error variance
OLS and MLE result in same values
Closed form solution exists
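For concreteness, here is a small numpy sketch of the conditional-mean and variance formulas from the previous slide; the mean vector and covariance matrix are made-up values, not from the talk.

```python
# Sketch of E(Y|X=x) = mu_Y + S_YX S_XX^{-1} (x - mu_X) and Var = S_YY - S_YX S_XX^{-1} S_XY.
import numpy as np

mu = np.array([10.0, 2.0, 5.0])            # (mu_Y, mu_X1, mu_X2), illustrative values
Sigma = np.array([[4.0, 1.5, 1.0],
                  [1.5, 2.0, 0.5],
                  [1.0, 0.5, 1.0]])        # joint covariance matrix, Y listed first

S_YY, S_YX = Sigma[0, 0], Sigma[0, 1:]
S_XX = Sigma[1:, 1:]
x = np.array([2.5, 4.8])                   # observed covariate values

w = np.linalg.solve(S_XX, x - mu[1:])      # S_XX^{-1} (x - mu_X)
cond_mean = mu[0] + S_YX @ w               # E(Y | X = x)
cond_var = S_YY - S_YX @ np.linalg.solve(S_XX, S_YX)
```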

GLM
Y belongs to an exponential family of distributions

$E(Y \mid X = x) = g^{-1}(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$

g is called the link function
The x's are not random
Y | x belongs to the exponential family
The conditional variance is no longer constant
Parameters are estimated by MLE using numerical methods
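A minimal sketch of MLE by numerical optimization for a GLM, using a Poisson response with log link as an illustrative choice (simulated data, not the talk's):

```python
# Fit a Poisson regression (log link) by maximizing the log-likelihood numerically.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ beta_true))

def negloglik(beta):
    mu = np.exp(X @ beta)                  # inverse link: g^{-1}(x'beta) = exp(x'beta)
    return -np.sum(poisson.logpmf(y, mu))  # negative log-likelihood

fit = minimize(negloglik, x0=np.zeros(2), method="BFGS")
beta_hat = fit.x
```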

GLM
Generalization of GLM: Y can have any distribution (see Loss Models)

Computing predicted values is difficult
There is no convenient expression for the conditional variance

Copula Regression
Y can have any distribution
Each Xi can have any distribution
The joint distribution is described by a copula
Estimate Y by E(Y | X = x), the conditional mean

Copula
An ideal copula will have the following properties:
ease of simulation (see the sketch after this slide)
closed form for the conditional density
different degrees of association available for different pairs of variables

Good candidates are:
Gaussian (MVN) copula
t-copula
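A brief sketch of the ease-of-simulation point for the Gaussian copula: draw correlated normals, convert them to uniforms with $\Phi$, then apply inverse marginal CDFs. The marginals (the two-parameter Loss Models Pareto, which corresponds to scipy's lomax, and a gamma) and the 0.7 correlations mirror the examples later in the talk; this is an illustrative sketch, not code from the talk.

```python
# Simulate from a Gaussian copula: MVN draws -> uniforms via Phi -> inverse marginal CDFs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
R = np.array([[1.0, 0.7, 0.7],
              [0.7, 1.0, 0.7],
              [0.7, 0.7, 1.0]])

z = rng.multivariate_normal(mean=np.zeros(3), cov=R, size=1000)
u = stats.norm.cdf(z)                              # uniforms with Gaussian dependence

x1 = stats.lomax.ppf(u[:, 0], c=3, scale=100)      # Pareto(3, 100) in the lomax parameterization
x2 = stats.lomax.ppf(u[:, 1], c=4, scale=300)      # Pareto(4, 300)
x3 = stats.gamma.ppf(u[:, 2], a=3, scale=100)      # Gamma(3, 100)
```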

MVN Copula
The CDF of the MVN copula is

$F(x_1, x_2, \ldots, x_n) = G(\Phi^{-1}[F(x_1)], \ldots, \Phi^{-1}[F(x_n)])$

where G is the multivariate normal CDF with zero means, unit variances, and correlation matrix R.

The density of the MVN copula is

$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)\, |R|^{-0.5} \exp\!\left(-\frac{v^T (R^{-1} - I)\, v}{2}\right)$

where v is a vector with ith element $v_i = \Phi^{-1}[F(x_i)]$.
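A small sketch of evaluating this density with scipy; the marginals (Pareto and gamma, as in the later examples) and the evaluation point are illustrative choices.

```python
# Evaluate f(x1..xn) = prod f(xi) * |R|^{-1/2} * exp(-v'(R^{-1} - I)v / 2), v_i = Phi^{-1}(F(x_i)).
import numpy as np
from scipy import stats

def gaussian_copula_density(x, marginals, R):
    """Joint density under a Gaussian copula; `marginals` are frozen scipy distributions."""
    F = np.array([m.cdf(xi) for m, xi in zip(marginals, x)])
    f = np.array([m.pdf(xi) for m, xi in zip(marginals, x)])
    v = stats.norm.ppf(F)
    quad_form = v @ (np.linalg.inv(R) - np.eye(len(x))) @ v
    return np.prod(f) * np.linalg.det(R) ** -0.5 * np.exp(-0.5 * quad_form)

R = np.array([[1.0, 0.7, 0.7], [0.7, 1.0, 0.7], [0.7, 0.7, 1.0]])
marginals = [stats.lomax(c=3, scale=100), stats.lomax(c=4, scale=300), stats.gamma(a=3, scale=100)]
value = gaussian_copula_density(np.array([50.0, 120.0, 300.0]), marginals, R)
```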

Conditional Distribution in the MVN Copula
The conditional distribution of $x_n$ given $x_1, \ldots, x_{n-1}$ is

$f(x_n \mid x_1, \ldots, x_{n-1}) = f(x_n)\,(1 - r^T R_{n-1}^{-1} r)^{-0.5} \exp\!\left(-0.5\left[\frac{\left\{\Phi^{-1}[F(x_n)] - r^T R_{n-1}^{-1} v_{n-1}\right\}^2}{1 - r^T R_{n-1}^{-1} r} - \left\{\Phi^{-1}[F(x_n)]\right\}^2\right]\right)$

where $v_{n-1} = (v_1, \ldots, v_{n-1})^T$ and R is partitioned as

$R = \begin{pmatrix} R_{n-1} & r \\ r^T & 1 \end{pmatrix}$

Copula Regression
Continuous Case

Parameters are estimated by MLE.

If $Y, X_1, \ldots, X_k$ are continuous variables, then
we use the previous equation to find the conditional mean;
one-dimensional numerical integration is needed to compute the mean (a sketch follows below).
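As an illustration, the sketch below builds the conditional density of Y (taken as the last variable) from the partitioned-R formula above and computes $E(Y \mid X = x)$ by one-dimensional numerical integration. The marginals, correlation matrix, and observed x values are illustrative, not the talk's data.

```python
# Conditional mean of Y under a Gaussian copula via 1-D numerical integration.
import numpy as np
from scipy import stats
from scipy.integrate import quad

R = np.array([[1.0, 0.7, 0.7], [0.7, 1.0, 0.7], [0.7, 0.7, 1.0]])
marg_x = [stats.lomax(c=3, scale=100), stats.lomax(c=4, scale=300)]   # X1, X2 marginals
marg_y = stats.gamma(a=3, scale=100)                                  # Y (= X3) marginal

R11, r = R[:2, :2], R[:2, 2]                     # partition of R: covariates block and cross terms
x_obs = np.array([80.0, 150.0])                  # observed covariate values (illustrative)
v_x = stats.norm.ppf([m.cdf(xi) for m, xi in zip(marg_x, x_obs)])
mean_z = r @ np.linalg.solve(R11, v_x)           # conditional mean of Phi^{-1}(F(Y))
var_z = 1.0 - r @ np.linalg.solve(R11, r)        # conditional variance of Phi^{-1}(F(Y))

def cond_density(y):
    vy = stats.norm.ppf(marg_y.cdf(y))
    # f(y | x) = f(y) * phi((vy - m)/s)/s / phi(vy), the formula from the previous slide
    ratio = stats.norm.pdf(vy, loc=mean_z, scale=np.sqrt(var_z)) / stats.norm.pdf(vy)
    return marg_y.pdf(y) * ratio

y_hat, _ = quad(lambda y: y * cond_density(y), 0, np.inf)   # E(Y | X = x)
```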

Copula Regression
Discrete Case

When one of the covariates is discrete:
Problem: determining discrete probabilities from the Gaussian copula requires computing many multivariate normal distribution function values, so computing the likelihood function is difficult.
Solution: replace the discrete distribution by a continuous distribution using a uniform kernel (a sketch follows below).
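A sketch of one way to do this for a Poisson marginal: spread each probability mass $P(X = k)$ uniformly over a width-one interval around its support point. The interval $(k - 0.5, k + 0.5]$ is an assumption of this sketch, not necessarily the kernel used in the talk; note that the resulting CDF at an integer point has the same $c/n + a/(2n)$ form that appears in Example 4.

```python
# Replace a discrete (Poisson) marginal by a continuous one using a uniform kernel.
import numpy as np
from scipy import stats

lam = 5.0
pois = stats.poisson(mu=lam)

def kernel_cdf(x):
    """Continuous CDF: the mass at k is accumulated linearly across (k - 0.5, k + 0.5]."""
    k = np.floor(x + 0.5)
    lower = pois.cdf(k - 1)                       # P(X <= k - 1)
    return lower + pois.pmf(k) * (x - (k - 0.5))  # plus the partially accumulated mass at k

def kernel_pdf(x):
    k = np.floor(x + 0.5)
    return pois.pmf(k)                            # density is constant on each width-one interval

# e.g. kernel_cdf(3.0) = P(X <= 2) + 0.5 * P(X = 3), i.e. halfway through the mass at 3
```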

Copula Regression: Standard Errors

How do we compute standard errors of the estimates?
As $n \to \infty$, the MLE $\hat{q}_n$ converges to a normal distribution with mean $q$ and variance $I(q)^{-1}$, where

$I(q) = -n \, E\!\left[\frac{\partial^2}{\partial q^2} \ln f(X; q)\right]$

$I(q)$ is the information matrix.

How to Compute Standard Errors

Loss Models: to obtain the information matrix, it is necessary to take both derivatives and expected values, which is not always easy.
A way to avoid this problem is simply not to take the expected value; the result is called the observed information (a sketch follows below).
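A sketch of the observed-information idea: fit a sample by numerical MLE, approximate the Hessian of the negative log-likelihood at the MLE by finite differences, and invert it to get standard errors. The gamma data, distribution choice, and step size here are illustrative, not the talk's.

```python
# Standard errors from the observed information (numerical Hessian of the negative log-likelihood).
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.gamma(shape=3.0, scale=100.0, size=500)

def negloglik(params):
    alpha, theta = params
    return -np.sum(stats.gamma.logpdf(data, a=alpha, scale=theta))

fit = minimize(negloglik, x0=np.array([1.0, np.mean(data)]),
               method="L-BFGS-B", bounds=[(1e-6, None), (1e-6, None)])
mle = fit.x

def numerical_hessian(f, p, h=1e-4):
    """Central-difference Hessian with steps relative to each parameter."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.eye(n)[i] * h * p[i]
            e_j = np.eye(n)[j] * h * p[j]
            H[i, j] = (f(p + e_i + e_j) - f(p + e_i - e_j)
                       - f(p - e_i + e_j) + f(p - e_i - e_j)) / (4 * (h * p[i]) * (h * p[j]))
    return H

observed_info = numerical_hessian(negloglik, mle)   # Hessian of -loglik at the MLE
cov = np.linalg.inv(observed_info)                  # asymptotic covariance of the MLE
std_errors = np.sqrt(np.diag(cov))
```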

Examples
All examples have three variables.
R matrix:

$R = \begin{pmatrix} 1 & 0.7 & 0.7 \\ 0.7 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$

Error measured by $\sum_i (Y_i - \hat{Y}_i)^2$

Also compared to OLS.

Example 1
Dependent variable: X3 (Gamma)

Though X2 is simulated from a Pareto, its parameter estimates do not converge, so a gamma model is fit.

Variable       Parameters   MLE
X1 - Pareto    3, 100       3.44, 161.11
X2 - Pareto    4, 300       1.04, 112.003 (gamma fit)
X3 - Gamma     3, 100       3.77, 85.93

Error:
Copula   59,000.5
OLS      637,172.8

Ex 1 - Standard Errors
Diagonal terms are standard deviations; off-diagonal terms are correlations.
(X1: Pareto with Alpha1, Theta1; X2: Gamma with Alpha2, Theta2; X3: Gamma with Alpha3, Theta3)

          Alpha1     Theta1     Alpha2     Theta2     Alpha3     Theta3     R(2,1)     R(3,1)     R(3,2)
Alpha1    0.266606   0.966067   0.359065   -0.33725   0.349482   -0.33268   -0.42141   -0.33863   -0.29216
Theta1    0.966067   15.50974   0.390428   -0.25236   0.346448   -0.26734   -0.37496   -0.29323   -0.25393
Alpha2    0.359065   0.390428   0.025217   -0.78766   0.438662   -0.35533   -0.45221   -0.30294   -0.42493
Theta2    -0.33725   -0.25236   -0.78766   3.558369   -0.38489   0.464513   0.496853   0.35608    0.470009
Alpha3    0.349482   0.346448   0.438662   -0.38489   0.100156   -0.93602   -0.34454   -0.46358   -0.46292
Theta3    -0.33268   -0.26734   -0.35533   0.464513   -0.93602   2.485305   0.365629   0.482187   0.481122
R(2,1)    -0.42141   -0.37496   -0.45221   0.496853   -0.34454   0.365629   0.010085   0.457452   0.465885
R(3,1)    -0.33863   -0.29323   -0.30294   0.35608    -0.46358   0.482187   0.457452   0.01008    0.481447
R(3,2)    -0.29216   -0.25393   -0.42493   0.470009   -0.46292   0.481122   0.465885   0.481447   0.009706

Example 1 - Cont
Maximum likelihood estimate of the correlation matrix:

$\hat{R} = \begin{pmatrix} 1 & 0.711 & 0.699 \\ 0.711 & 1 & 0.713 \\ 0.699 & 0.713 & 1 \end{pmatrix}$

Example 2
Dependent variable: X3 (Gamma)
X1 & X2 estimated empirically

Variable       Parameters   MLE
X1 - Pareto    3, 100       empirical: F(x) = x/n - 1/(2n), f(x) = 1/n
X2 - Pareto    4, 300       empirical: F(x) = x/n - 1/(2n), f(x) = 1/n
X3 - Gamma     3, 100       4.03, 81.04

Error:
Copula   595,947.5
OLS      637,172.8
GLM      814,264.754

Example 3
Dependent variable: X3 (Gamma)
The Pareto for X2 is estimated by an exponential

Variable       Parameters   MLE
X1 - Poisson                5.65
X2 - Pareto    4, 300       119.39 (exponential fit)
X3 - Gamma     3, 100       3.67, 88.98

Error:
Copula   574,968
OLS      582,459.5

Example 4
Dependent variable: X3 (Gamma)
X1 & X2 estimated empirically
(c = # of obs < x and a = # of obs = x)

Variable       Parameters   MLE
X1 - Poisson                empirical: F(x) = c/n + a/(2n), f(x) = a/n
X2 - Pareto    4, 300       empirical: F(x) = x/n - 1/(2n), f(x) = 1/n
X3 - Gamma     3, 100       3.96, 82.48

Error:
Copula   559,888.8
OLS      582,459.5
GLM      652,708.98

Example 5
Dependent variable: X1 (Poisson)
X2 estimated by an exponential

Variable       Parameters   MLE
X1 - Poisson                5.65
X2 - Pareto    4, 300       119.39 (exponential fit)
X3 - Gamma     3, 100       3.66, 88.98

Error:
Copula   108.97
OLS      114.66

Example 6
Dependent variable: X1 (Poisson)
X2 & X3 estimated empirically

Variable       Parameters   MLE
X1 - Poisson                5.67
X2 - Pareto    4, 300       empirical: F(x) = x/n - 1/(2n), f(x) = 1/n
X3 - Gamma     3, 100       empirical: F(x) = x/n - 1/(2n), f(x) = 1/n

Error:
Copula   110.04
OLS      114.66
