Copulas – Course Notes
0 Introduction
0.1 Contents
There is no material in the Core Reading for this chapter that relates specifically to R.
Therefore we do not provide detailed Course Notes for this topic. Instead, we have provided an
exam-style worked example below that covers some areas that we think could be examined in the
practical exam.
Copulas – Exercise
Parts (i) and (ii) of this question involve simulating the values from a bivariate copula. You are not
expected to know how to do this for Subject CS2. However, the method we have used here
provides good practice with using some of the features of R. We thought this was more useful
than just giving you a table of pre-calculated values to read in and use for the graphs and
correlation calculations in parts (iii) to (v).
Ex 1 One way to simulate pairs of values from a bivariate copula is to use an acceptance / rejection
approach. Here we generate a large number of pairs of independent uniform random numbers,
then we use another random number to ‘accept’ (ie retain) or ‘reject’ each pair with the correct
probability. The probability is based on the probability density of the copula. However, this
method is sometimes not very efficient in terms of the number of pairs that are rejected in the
process.
Consider the bivariate Clayton copula:
$C(u,v) = \left(u^{-\alpha} + v^{-\alpha} - 1\right)^{-1/\alpha}$ where $\alpha > 0$.
(i) Show that the probability density for this copula is:
$\frac{\partial^2 C}{\partial u\,\partial v} = (\alpha+1)\,u^{-\alpha-1}v^{-\alpha-1}\left(u^{-\alpha}+v^{-\alpha}-1\right)^{-\frac{1}{\alpha}-2}$ [6]
(ii) (a) Generate 1,000 pairs of pseudo-random values (u,v) where u and v are
independent values from the U(0,1) distribution, using a ‘seed’ value of 17.
(b) Create a vector (also with 1,000 elements) of the values of $\frac{\partial^2 C}{\partial u\,\partial v}$ with $\alpha = 2$ for each of the pairs $(u,v)$.
(c) Rescale your vector in (ii)(b) so that all the values lie in the range [0,1] by dividing
the vector by the maximum value.
(d) Create a subset of the pairs in (ii)(a) where each pair is included or excluded with
the rescaled probabilities that you calculated in (ii)(c).
Hint: Create another vector of 1,000 pseudo-random U(0,1) values and ‘accept’
each pair if the random number is less than the rescaled value in (ii)(c).
(e) Count the number of pairs that have been ‘accepted’ using this algorithm and
comment on the efficiency of this method. [20]
(iii) (a) Create a scatterplot of the pairs of values in the subset you obtained in (ii)(d).
(iv) (a) Calculate the Pearson, Spearman and Kendall coefficients of correlation for the
points on your scatterplot, quoting your answers to two decimal places.
Hint: You are given that, with the Clayton copula, the theoretical value of Kendall's coefficient of correlation is $\tau = \frac{\alpha}{\alpha+2}$ and the theoretical value of Spearman's coefficient of correlation when $\alpha = 2$ is $\rho_S \approx 0.68$. [12]
(v) (a) Use your subset of points from (ii)(d) to generate a set of pseudo-random pairs of
values $(x, y)$ where $x$ and $y$ are values from a N(0,1) distribution connected by
the Clayton copula with $\alpha = 2$.
(b) Plot these pairs of values in a scatterplot.
Copulas – Solution
Ex 1 (i) Probability density

Differentiating the copula function partially with respect to $u$:

$\frac{\partial C}{\partial u} = -\frac{1}{\alpha}\left(u^{-\alpha}+v^{-\alpha}-1\right)^{-\frac{1}{\alpha}-1}\times\left(-\alpha u^{-\alpha-1}\right) = u^{-\alpha-1}\left(u^{-\alpha}+v^{-\alpha}-1\right)^{-\frac{1}{\alpha}-1}$

Differentiating again, this time with respect to $v$:

$\frac{\partial^2 C}{\partial u\,\partial v} = u^{-\alpha-1}\times\left(-\frac{1}{\alpha}-1\right)\left(u^{-\alpha}+v^{-\alpha}-1\right)^{-\frac{1}{\alpha}-2}\times\left(-\alpha v^{-\alpha-1}\right) = (\alpha+1)\,u^{-\alpha-1}v^{-\alpha-1}\left(u^{-\alpha}+v^{-\alpha}-1\right)^{-\frac{1}{\alpha}-2}$

as required.
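As an optional check (not part of the question), we can verify this result numerically in R by approximating the mixed partial derivative of $C$ with central differences at an arbitrary test point, taking $\alpha = 2$. The helper names cop and copdens, the step size h and the test point are our own choices here:

# Clayton copula (alpha = 2) and the closed-form density derived above
cop=function(u,v) (u^(-2)+v^(-2)-1)^(-1/2)
copdens=function(u,v) 3*u^(-3)*v^(-3)*(u^(-2)+v^(-2)-1)^(-2.5)

# Central-difference approximation to d2C/dudv at a test point
h=1e-4; u0=0.4; v0=0.7
numderiv=(cop(u0+h,v0+h)-cop(u0+h,v0-h)-cop(u0-h,v0+h)+cop(u0-h,v0-h))/(4*h^2)
c(numderiv,copdens(u0,v0))   # the two values should agree closely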
(ii)(a) Simulating the pairs

First we set a seed value so that the results are reproducible:

set.seed(17)
We can now generate 1,000 pairs of independent pseudo-random numbers from a U(0,1)
distribution:
n=1000;u=runif(n);v=runif(n)
data=matrix(c(u,v),ncol=2)
head(data)
[,1] [,2]
[1,] 0.1550508 0.97418733
[2,] 0.9683788 0.71681979
[3,] 0.4682631 0.40560335
[4,] 0.7768197 0.10374040
[5,] 0.4078857 0.41535676
[6,] 0.5387971 0.08862091
(ii)(b) Calculating the copula density

With $\alpha = 2$, the density from part (i) becomes:

$\frac{\partial^2 C}{\partial u\,\partial v} = 3u^{-3}v^{-3}\left(u^{-2}+v^{-2}-1\right)^{-2.5}$
We can define a function density that operates on a pair of values $x = (u,v)$ and returns the
value of the copula density:
density=function(x)
3*x[1]^(-3)*x[2]^(-3)*(x[1]^(-2)+x[2]^(-2)-1)^(-2.5)
The x[1] and x[2] in this definition refer to the two components of the vector x.
clayton=apply(data,1,density)
Here we are applying the function to each row of a matrix. This is done using the apply function
and the argument 1, indicating that the function is to be applied across each row. (To apply a
function down each column, we would use the argument 2.)
head(clayton)
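As an aside, arithmetic in R is vectorised, so the same density values could have been computed directly from the u and v vectors without apply. This is purely an alternative to the approach above:

# Vectorised calculation of the copula density (should match clayton)
clayton2=3*u^(-3)*v^(-3)*(u^(-2)+v^(-2)-1)^(-2.5)
all.equal(clayton,clayton2)   # should be TRUE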
maxval=max(clayton)
maxval
[1] 25.9113
(ii)(c) Rescaling

probs=clayton/maxval
head(probs)
(ii)(d) Selecting the subset

Following the hint, we can generate another vector of 1,000 uniform random numbers, which we
will use to decide whether to include each row:
random=runif(n)
Our outputs will look neater if we add the new columns we have calculated to our table:
fulldata=cbind(data,probs,random)
Just typing the cbind( ) command by itself would achieve the same objective of adding the
extra columns. However, R then automatically prints out the whole table (1,000 rows)! So we’ve
included the fulldata= to assign the answer to a variable and suppress this output.
Let’s label the columns in our table and see how it’s looking:
colnames(fulldata)=c("u","v","probs","random")
head(fulldata)
u v probs random
[1,] 0.1550508 0.97418733 0.003000904 0.07467454
[2,] 0.9683788 0.71681979 0.060242742 0.67884647
[3,] 0.4682631 0.40560335 0.058582488 0.32032500
[4,] 0.7768197 0.10374040 0.002611655 0.37374839
[5,] 0.4078857 0.41535676 0.062013157 0.10154273
[6,] 0.5387971 0.08862091 0.005543453 0.88611921
We can now select all the rows where the random number is less than or equal to the rescaled
value of the copula density. This will include (or exclude) each pair with the correct probability.
subset=fulldata[random<=probs,]
head(subset)
u v probs random
[1,] 0.5754613 0.4629139 0.05298045 0.014864425
[2,] 0.1188054 0.2178978 0.08469126 0.068626715
[3,] 0.3267991 0.6934728 0.02822582 0.019385549
[4,] 0.7242487 0.3391443 0.02735654 0.007623463
[5,] 0.3472285 0.3843515 0.06567128 0.008908249
[6,] 0.8366818 0.5159717 0.04017116 0.016946312
(ii)(e) Efficiency
dim(subset)
[1] 41 4
So only 41 of the original 1,000 records have been included in the subset.
This method is therefore not very efficient, as we have rejected almost 96% of the pairs that we
generated initially.
These values can differ quite a lot depending on the particular random values obtained using the
seed value chosen.
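In fact we could have anticipated an acceptance rate of this order in advance. The expected proportion of accepted pairs is the average of the rescaled probabilities, which is roughly 1/maxval, because the copula density integrates to 1 over the unit square. Using the probs vector calculated above:

# Expected proportion of pairs accepted (roughly 1/maxval, ie about 0.04,
# so around 40 accepted pairs out of 1,000 on average)
mean(probs)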
(iii)(a) Scatterplot
We can now plot the pairs of values of (u,v) remaining in our subset as a scatterplot:
plot(subset[,1],subset[,2],xlab="u",ylab="v",las=1,
main="Simulation of Clayton copula with alpha=2")
We’ve included the las=1 option (label axis style) to display the numbers on the vertical axis
‘the right way up’, ie without rotation, but this is purely a matter of personal preference and you
wouldn’t have to do this in the exam.
[Scatterplot of v against u for the 41 accepted pairs, titled "Simulation of Clayton copula with alpha=2"]
The points are definitely not distributed uniformly throughout the square (as they would be
with the independence / product copula).
There are very few points in the top-left and bottom-right corners.
The bulk of the points lie along the upward diagonal, which indicates a positive association
(correlation) between the values of u and v .
The points in the bottom-left corner form quite a sharp angle, indicating that there is high
lower tail dependence.
The points in the top-right corner are scattered widely, forming a blunt angle, indicating that
there is little or no upper tail dependence.
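For reference (this goes beyond what the question asks), the Clayton copula has coefficient of lower tail dependence $2^{-1/\alpha}$ and zero upper tail dependence, which is consistent with the pattern in the plot:

alpha=2
2^(-1/alpha)   # lower tail dependence coefficient, approximately 0.71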
(iv)(a) Correlation coefficients

To calculate the coefficients of correlation, we can use the cor function, specifying which
measure we wish to calculate. The default is Pearson. Note that, although the measures are
named after people, the method options have to be entered in lower case.
cor(subset[,1:2],method="pearson")
u v
u 1.0000000 0.6188123
v 0.6188123 1.0000000
This function actually gives the complete correlation matrix. The figure we want here for the
Pearson correlation is the 0.6188, or 0.62 to 2 decimal places.
Similarly:
cor(subset[,1:2],method="spearman")
u v
u 1.0000000 0.5714286
v 0.5714286 1.0000000
cor(subset[,1:2],method="kendall")
u v
u 1.0000000 0.4268293
v 0.4268293 1.0000000
We can see that the Pearson and Spearman values (0.62 and 0.57) are quite similar, while the
Kendall value (0.43) is noticeably lower.
The Spearman value (0.57) is reasonably close to the theoretical value of 0.68, but there appears
to be quite a lot of random variation in the results. This is not surprising, given that the subset it
is based on contains only 41 pairs of values.
The Kendall value (0.43) is also reasonably close to the theoretical value of 0.5 (using the formula
given), but again there appears to be quite a lot of random variation in the results.
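We can reproduce this theoretical Kendall value directly from the formula given in the hint:

alpha=2
alpha/(alpha+2)   # theoretical Kendall's tau = 0.5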
(v)(a) Generating the normal pairs

The values we have generated in our subset for u and v correspond to the percentile values
(ie the values of the distribution function) for the two variables. We can now map these onto the
correct marginal distributions for x and y, which in this case is N(0,1) for both.
Recall that, to calculate the percentile values (quantiles) of a distribution in R, we prefix the
abbreviation for the distribution’s name with the letter ‘q’. So here we need:
qnorm(0.975)
[1] 1.959964
To apply this function to each of the u values in our subset (giving the x values), we can use the
sapply function:
x=sapply(subset[,1],qnorm)
head(x)
y=sapply(subset[,2],qnorm)
head(y)
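As an aside, qnorm is itself vectorised, so the sapply step is not strictly necessary; these one-liners would give the same vectors:

# Equivalent vectorised calculation of the N(0,1) values
x=qnorm(subset[,1])
y=qnorm(subset[,2])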
(v)(b) Plot
To see what they look like, we can plot the (x , y) pairs in a scatterplot.
plot(x,y,las=1,
main="N(0,1) values connected by Clayton copula with alpha=2")
Note that we don’t need to specify labels for the axes in this case, because R will by default use
the variables referred to in the arguments of the function, ie the “x” and “y” in this case.
[Scatterplot of y against x for the 41 simulated pairs, titled "N(0,1) values connected by Clayton copula with alpha=2"]
These 41 pairs of values $(x, y)$ each have N(0,1) marginal distributions but are connected via the
Clayton copula with $\alpha = 2$.
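Finally, as an informal check (not required by the question), we could confirm that the simulated marginals look plausibly N(0,1) using normal Q-Q plots:

par(mfrow=c(1,2))             # two plots side by side
qqnorm(x,main="Q-Q plot of x"); qqline(x)
qqnorm(y,main="Q-Q plot of y"); qqline(y)
par(mfrow=c(1,1))             # reset the plotting layout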