
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 104 (2017) 572 – 577

ICTE 2016, December 2016, Riga, Latvia

Algorithms of the Copula Fit to the Nonlinear Processes in the Utility Industry
Andrejs Matvejevs a,*, Jegors Fjodorovs a, Anatoliy Malyarenko b
a Riga Technical University, Kalku 1, Riga, LV1658, Latvia
b Mälardalen University, Västerås, Sweden

Abstract

Our research studies the construction and estimation of a copula-based semiparametric Markov model for the processes
involved in water flows at hydro plants. When analyzing the dependence structure of a stationary time series, one usually
considers regressive models defined by an invariant marginal distribution and a copula function that captures the temporal
dependence of the process. This makes it possible to separate the temporal dependence (such as tail dependence) from the
marginal behavior (such as fat tails) of a time series. For the utility company data, we found that the Gumbel copula
describes the data best. The constructed algorithm was then used to simulate low-probability events in the hydropower
industry and to make predictions.

© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of the scientific committee of the international conference; ICTE 2016.
Keywords: Copula; Diffusion processes; Time series; Semiparametric regressions

1. Introduction

Our research studies the construction and estimation of a copula-based semiparametric Markov model for the
processes involved in water flows at hydro plants.
Copulas have become popular in the finance and insurance community in recent years, where modeling and estimating
the dependence structure between several univariate time series is of great interest; see Frees and Valdez [1] and
Embrechts et al. [2] for reviews.

* Corresponding author. Tel.: +371 26015121.



1877-0509 © 2017 The Authors. Published by Elsevier B.V.
doi:10.1016/j.procs.2017.01.174

A copula function is a multivariate distribution function with standard uniform marginals. By Sklar's theorem [3],
one can always model any multivariate distribution by modeling its marginal distributions and its copula function
separately, where the copula captures all the scale-free dependence in the multivariate distribution.
The central result of this theorem states that any continuous n-dimensional cumulative distribution
function F, evaluated at the point $x = (x_1, \ldots, x_n)$, can be represented as

$$ F(x) = C\big(F_1(x_1), \ldots, F_n(x_n)\big), \qquad (1) $$

where C is called a copula function and $F_i(x_i)$, $i = 1, \ldots, n$, are the marginal distributions. The use of copulas
therefore splits a complicated problem (finding a multivariate distribution) into two simpler tasks: the first is to
model the univariate marginal distributions, and the second is to find a copula that summarizes the dependence
structure between them.
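As a minimal illustration of (1), the Python sketch below assembles a bivariate distribution function from two marginals and a copula. It assumes exponential marginals with illustrative rates `lam1` and `lam2` and uses the Gumbel copula that the paper adopts later; the function names are hypothetical and nothing here is fitted to data.

```python
import math


def gumbel_copula(u: float, v: float, theta: float) -> float:
    """Gumbel copula C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)), theta >= 1."""
    if theta < 1.0:
        raise ValueError("the Gumbel copula requires theta >= 1")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary condition C(0, v) = C(u, 0) = 0
    a = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(a ** (1.0 / theta)))


def exp_cdf(x: float, lam: float) -> float:
    """Exponential marginal F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0.0 else 0.0


def joint_cdf(x1: float, x2: float, lam1: float, lam2: float, theta: float) -> float:
    """Sklar's representation (1): F(x1, x2) = C(F1(x1), F2(x2))."""
    return gumbel_copula(exp_cdf(x1, lam1), exp_cdf(x2, lam2), theta)


# Illustrative values only: rates 0.5 and 0.3, copula parameter as reported with (5).
p = joint_cdf(1.0, 2.0, lam1=0.5, lam2=0.3, theta=1.2847)
print(f"P(X1 <= 1, X2 <= 2) = {p:.4f}")
```

Swapping the marginals or the copula changes one factor of the model without touching the other, which is exactly the separation that (1) provides.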
The possibility of identifying nonlinear time series using nonparametric estimates of the conditional mean and
conditional variance has been studied in many papers [4]. When analyzing the dependence structure of a stationary time
series $\{x_t, t \in \mathbb{Z}\}$, one usually considers regressive models defined by an invariant marginal distribution
and a copula function that captures the temporal dependence of the process. As indicated in [4], this makes it possible
to separate the temporal dependence (such as tail dependence) from the marginal behavior (such as fat tails) of a time
series. Another advantage of this regressive approach is that probabilistic limit theorems can be applied to pass from
difference equations to continuous-time stochastic differential equations [5,6]. In this paper, we also study a class of
copula-based semiparametric stationary Markov models in the form of a scalar difference equation

t Z : Xt f ( X t 1 )  g ( X t 1 )[ t , (1a)

where $\{\xi_t, t \in \mathbb{Z}\}$ is i.i.d. $N(0, 1)$. Regressions of the form (1a) are widely used for simulation and
parameter estimation of stochastic volatility models [2]. Unfortunately, the Markov chain defined by (1a) has a
non-compact phase space, which complicates the application of probabilistic limit theorems. The copula approach helps to
simplify the asymptotic analysis of (1a). Recall that to construct a copula $C(u, v)$ for the pair $\{X_{t-1}, X_t\}$ from
(1a), one should find the invariant marginal distribution $F(x)$ of $X_t$ and substitute it into the joint distribution
function $H(x, y) = P(X_{t-1} \le x, X_t \le y)$; that is, $C(u, v) = H(F^{-1}(u), F^{-1}(v))$ and $H(x, y) = C(F(x), F(y))$.
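To make the Markov chain (1a) concrete, the sketch below simulates it in Python. The drift `f_example` (AR(1)-type) and the volatility `g_example` (ARCH(1)-type) are hypothetical choices for illustration, not functions estimated in this paper. The lag-1 pairs are the sample counterpart of the pair $\{X_{t-1}, X_t\}$ whose copula is $C(u, v)$; note that the simulated values are not confined to any bounded interval.

```python
import math
import random


def simulate_1a(f, g, x0, n_steps, seed=None):
    """Simulate X_t = f(X_{t-1}) + g(X_{t-1}) * xi_t with xi_t i.i.d. N(0, 1), as in (1a)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        x_prev = path[-1]
        path.append(f(x_prev) + g(x_prev) * rng.gauss(0.0, 1.0))
    return path


# Hypothetical conditional mean and volatility, chosen only for illustration.
def f_example(x):
    return 0.8 * x  # AR(1)-type conditional mean


def g_example(x):
    return math.sqrt(0.2 + 0.3 * x * x)  # ARCH(1)-type conditional volatility


path = simulate_1a(f_example, g_example, x0=0.0, n_steps=1000, seed=42)
pairs = list(zip(path[:-1], path[1:]))  # lag-1 pairs (X_{t-1}, X_t)
print(min(path), max(path))  # the state space is the whole real line
```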
After the substitution $U_t = F(X_t)$ in equation (1a), for a further diffusion approximation one can write a difference
equation of the same form as (1a):

t  Z :Ut M (U t 1 )  \ (U t 1 )[ t . (2)

However, this equation now defines a Markov chain on the compact set [0, 1]. This makes it easier to construct the
transition probability and, further, the estimators $\hat f(u)$ and $\hat g(u)$ of the unknown functions. After the
diffusion approximation of (2), one can make the inverse substitution and derive a stochastic differential equation as
the diffusion approximation for (1a).
We found that the best copula describing the data is the Gumbel copula. The constructed equation (1a) was then used
to simulate low-probability events (in the hydropower industry) and to make predictions.

The paper is structured as follows. Section 2 describes our approach. In Section 3 we report our results for the
data. Section 4 concludes and discusses several possible avenues of future research.

2. Evaluation of parameters for the semiparametric regression model

Copula-based semiparametric models are characterized by conditional heteroscedasticity and have often been
used to model the variability of statistical data. The basic idea is to apply a local linear regression to the
squared residuals to find the unknown functions f and g [5,7].
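The Python sketch below illustrates this two-stage idea under stated assumptions: a Gaussian kernel, a user-chosen bandwidth `h`, a local linear fit of $X_t$ on $X_{t-1}$ for f, and then a local linear fit of the squared residuals for $g^2$. It is not the exact estimator of [5,7], and the helper names are hypothetical.

```python
import math
import random


def local_linear(xs, ys, x0, h):
    """Local linear kernel estimate of E[Y | X = x0] (Gaussian kernel, bandwidth h)."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / h) ** 2)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    denom = s0 * s2 - s1 * s1
    if denom <= 0.0:
        raise ValueError("degenerate design near x0; increase the bandwidth h")
    return (s2 * t0 - s1 * t1) / denom  # intercept of the weighted line at x0


def estimate_f_g(series, h):
    """Two-stage estimate for X_t = f(X_{t-1}) + g(X_{t-1}) * xi_t:
    1) f_hat: local linear regression of X_t on X_{t-1};
    2) g_hat^2: local linear regression of the squared residuals on X_{t-1}."""
    x_prev, x_next = series[:-1], series[1:]

    def f_hat(x0):
        return local_linear(x_prev, x_next, x0, h)

    sq_resid = [(y - f_hat(x)) ** 2 for x, y in zip(x_prev, x_next)]

    def g_hat(x0):
        # A local linear fit can dip below zero, so clip before the square root.
        return math.sqrt(max(local_linear(x_prev, sq_resid, x0, h), 0.0))

    return f_hat, g_hat


# Usage on a synthetic series with hypothetical f(x) = 0.8 x and g(x) = 0.5:
rng = random.Random(1)
series = [0.0]
for _ in range(400):
    series.append(0.8 * series[-1] + 0.5 * rng.gauss(0.0, 1.0))
f_hat, g_hat = estimate_f_g(series, h=0.5)
print(f_hat(1.0), g_hat(1.0))
```

The printed values can be compared with the true 0.8 and 0.5. Computing the residuals evaluates `f_hat` at every observation, so the cost grows quadratically with the sample size, which is acceptable for a sketch but slow for long daily series.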

Our methodology builds on finding the conditional expectations of the first and second order.
Let $\{Y_t\}$ be a stationary first-order Markov process with a continuous state space. Then its probabilistic
properties are completely determined by the joint distribution function of $Y_{t-1}$ and $Y_t$. To determine the
copula-based model, we use the Markov model in the form of the scalar difference equation (1a) with a small
parameter $\varepsilon$. Our goal then reduces to estimating the conditional moments, which will serve as the
parameters of our base regression model:
$$ g(X_{t-1}, \varepsilon) \quad \text{and} \quad f(X_{t-1}, \varepsilon). \qquad (3) $$
As mentioned above, this is not an easy task; in particular, this representation complicates the application of
probabilistic limit theorems. That is why, when the stationary distribution is available, we suggest finding the
parameters through the Markov chain using the copula approach.
Owing to the presence of the small parameter $\varepsilon$, we can rewrite our expression as

t  Z :U t U t 1  H f (U t 1 , H )  H g (U t 1 , H )[ t
f (Ut1,H) E(Ut | Ut1 u) (3a)
g (U t 1 , H ) E ((U t 1  f (U t 1 , H )) 2 | U t 1 u). (4)
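As a simple nonparametric check on (3a) and (4), the conditional moments can also be estimated directly from transformed pairs by averaging within bins of $U_{t-1}$. This Python sketch is our own illustration (a box-kernel estimator), not the paper's method, which evaluates the moments through the copula density; the bin count is an arbitrary assumption.

```python
def binned_conditional_moments(u, n_bins=10):
    """Estimate f(u) = E(U_t | U_{t-1} = u) and g(u) = E((U_t - f(u))^2 | U_{t-1} = u), cf. (3a)-(4),
    by averaging U_t over the pairs whose U_{t-1} falls into the same bin of [0, 1].

    Returns a list of (bin_center, f_estimate, g_estimate); empty bins give (center, None, None).
    """
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    sums_sq = [0.0] * n_bins
    for prev, cur in zip(u[:-1], u[1:]):
        b = min(int(prev * n_bins), n_bins - 1)  # U_{t-1} = 1 goes into the last bin
        counts[b] += 1
        sums[b] += cur
        sums_sq[b] += cur * cur
    result = []
    for b in range(n_bins):
        center = (b + 0.5) / n_bins
        if counts[b] == 0:
            result.append((center, None, None))
            continue
        mean = sums[b] / counts[b]
        var = max(sums_sq[b] / counts[b] - mean * mean, 0.0)  # clip tiny negative round-off
        result.append((center, mean, var))
    return result


# Usage on a short illustrative sequence of values in [0, 1]:
for center, f_est, g_est in binned_conditional_moments([0.05, 0.15, 0.05, 0.15, 0.95], n_bins=10):
    if f_est is not None:
        print(f"u ~ {center:.2f}: f = {f_est:.3f}, g = {g_est:.3f}")
```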

After evaluating the conditional expectations (3a) and (4), one can make the inverse substitution and derive a stochastic
differential equation as the diffusion approximation for the base semiparametric model (1a). Of course, our algorithm
works only if the inverse function exists; the Gumbel copula, for example, does not have a standard (closed-form) inverse function.
We have now derived a tool for evaluating the parameters of model (1a). To describe our idea briefly, the next section
shows how our algorithm works with real data.

3. Practical approach of the proposed algorithm

We analyze historical observations of an equipment parameter (sample Y, see Fig. 1). We have daily
data from 31.12.2000 to 31.12.2015. For successful operation of the equipment we are interested in a
stable, low-volatility process, but in real life the parameter values may vary significantly depending on weather
conditions. That is why our idea is to obtain predictions of significant future deviations of the observed values.
Our main idea is to set a limit for the allowed deviation and to find an algorithm for computing the distribution of the
time at which the process reaches this level. Clearly, we are dealing with a heteroskedastic process. By using only the
first lag of the observations, i.e., ignoring other factors that may affect the stability of this equipment and using just
the time series observations, we can use copula densities and build a semiparametric model.

Fig. 1. Historical parameter values (Y) of the utility company equipment.

The easiest way to estimate the parameters of the semiparametric regressive model for the time series is to follow this
algorithm:

• Find the marginal distribution of the observations of the equipment parameter.
• Using the marginal distribution, calculate the points $U_t$, which are distributed as R[0,1] (uniform).
• Build a scatter plot of $(U_{t-1}, U_t)$.
• Perform several statistical tests to find a suitable distribution for the data.
• Taking into account the scatter plot and the distribution of the data, choose a copula from an existing class, or build
your own copula if the marginal distributions are known.
• Test the consistency of the copula with the data (for example, AIC and BIC, Kolmogorov distance, etc.).
• Find the regression parameters.
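The first three steps can be sketched in Python as follows. As a simplifying assumption, the empirical CDF (a rank transform scaled by $T + 1$, so that values stay strictly inside (0, 1)) replaces the fitted parametric marginal used in the paper; the function names are hypothetical.

```python
def pseudo_observations(series):
    """Empirical-CDF transform U_t = rank(X_t) / (T + 1), mapping the data into (0, 1).
    Tied values receive the average of their ranks."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and series[order[j + 1]] == series[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


def lag_pairs(u):
    """Points (U_{t-1}, U_t) for the scatter plot of the third step."""
    return list(zip(u[:-1], u[1:]))


y = [2.3, 1.9, 2.8, 2.8, 2.1]  # hypothetical observations
u = pseudo_observations(y)
print(lag_pairs(u))
```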

Using a Matlab program, we built scatter plots of the non-transformed data and of Y transformed into the uniform
distribution R[0,1].


Fig. 2. (a) Scatter plot of the non-transformed sample data; (b) scatter plot of Y data transformed into R[0,1].

As Fig. 2a shows, the time series Y has outliers, which makes it difficult to construct the marginal distribution. Based
on the Kolmogorov–Smirnov test, we tried different assumptions about the marginal distribution, and the best fit was a
combination of the exponential and uniform distributions:

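$$ F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \le T_1, \\ 1 - e^{-\lambda T_1} + H\,(x - T_1), & T_1 < x \le T, \end{cases} \qquad H = \frac{e^{-\lambda T_1}}{T - T_1}, $$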

where T is the size of the sample and $T_1$ is the size of the sample without outliers.
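A minimal Python sketch of this piecewise marginal and its inverse (the inverse is what the substitution back to $X_t = F^{-1}(U_t)$ needs). The sketch assumes an exponential piece for $x \le T_1$ and a linear piece with slope $H = e^{-\lambda T_1}/(T - T_1)$ for $T_1 < x \le T$, which makes F continuous at $T_1$ and equal to 1 at T. The paper does not report values of $\lambda$, $T_1$ or $T$, so those in the usage lines are placeholders, and the function name is hypothetical.

```python
import math


def make_marginal(lam, t1, t):
    """Return (cdf, inv_cdf) for the exponential/uniform marginal with parameters lam, T1, T."""
    if not (0.0 < t1 < t):
        raise ValueError("need 0 < T1 < T")
    h = math.exp(-lam * t1) / (t - t1)  # slope H of the linear (uniform) piece
    f_t1 = 1.0 - math.exp(-lam * t1)    # F(T1), where the two pieces meet

    def cdf(x):
        if x <= 0.0:
            return 0.0
        if x <= t1:
            return 1.0 - math.exp(-lam * x)
        if x <= t:
            return f_t1 + h * (x - t1)
        return 1.0

    def inv_cdf(u):
        """F^{-1}(u) for u in [0, 1]."""
        if not (0.0 <= u <= 1.0):
            raise ValueError("u must lie in [0, 1]")
        if u <= f_t1:
            return -math.log(1.0 - u) / lam  # invert the exponential piece
        return t1 + (u - f_t1) / h           # invert the linear piece

    return cdf, inv_cdf


# Placeholder parameters, not estimates from the paper's data:
cdf, inv_cdf = make_marginal(lam=0.5, t1=4.0, t=5.0)
print(cdf(2.0), inv_cdf(cdf(2.0)))
```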


Taking the margins into account, we transformed our observations into the uniform distribution R[0,1]. An
important issue faced by an applied researcher interested in using the class of semiparametric copula-based time
series models is the choice of an appropriate parametric copula. Chen et al. [8] propose two simple tests for the
correct specification of a parametric copula in the context of modeling the contemporaneous dependence between
several univariate time series and of the innovations of univariate GARCH models used to filter each univariate time
series. Chen and Fan [9] establish pseudo-likelihood ratio tests for the selection of parametric copula models for
multivariate i.i.d. observations under copula misspecification [4]. Our suggestion is simpler: we can choose the
best-fitting copula using the AIC and BIC criteria or the Kolmogorov–Smirnov test for the data distribution.
For the copula comparisons we use the Kolmogorov–Smirnov (KS) test (see Table 1):

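$$ D_{KS} = \max_{i,j} \left| C_n(U_{1,i}, U_{2,j}) - C_\theta(U_{1,i}, U_{2,j}) \right|, $$

where $C_n$ is the empirical copula of the transformed data and $C_\theta$ is the fitted parametric copula.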
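The distance $D_{KS}$ can be computed directly from its definition, as in the Python sketch below. It assumes the pairs are already transformed into (0, 1) (for example, pseudo-observations $(U_{t-1}, U_t)$), takes the maximum over all grid points $(U_{1,i}, U_{2,j})$, and uses the Gumbel copula as the candidate; the function names and the sample pairs are hypothetical.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula C_theta(u, v) for 0 < u, v <= 1 and theta >= 1."""
    a = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(a ** (1.0 / theta)))


def empirical_copula(pairs, u, v):
    """C_n(u, v): share of pairs (a, b) with a <= u and b <= v."""
    return sum(1 for a, b in pairs if a <= u and b <= v) / len(pairs)


def ks_distance(pairs, copula):
    """D_KS = max over i, j of |C_n(U_{1,i}, U_{2,j}) - C(U_{1,i}, U_{2,j})|."""
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    return max(
        abs(empirical_copula(pairs, ui, vj) - copula(ui, vj))
        for ui in first
        for vj in second
    )


# Usage with a few hypothetical pseudo-observations (U_{t-1}, U_t):
pairs = [(0.2, 0.3), (0.3, 0.5), (0.5, 0.4), (0.4, 0.7), (0.7, 0.8), (0.8, 0.6)]
d = ks_distance(pairs, lambda u, v: gumbel_copula(u, v, 1.2847))
print(f"D_KS for the Gumbel copula: {d:.3f}")
```

The double loop over the grid, combined with the count inside `empirical_copula`, makes the cost cubic in the number of pairs, so for thousands of daily observations a subsample or a sorted cumulative count would be needed.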

Table 1. Kolmogorov–Smirnov test (distance) for Y data.

Copula            KS value
Gumbel copula     0.67
Frank copula      0.65
Normal copula     0.18
t copula          0.7

Based on the KS test results, we should choose the Normal copula for further model estimation. However, the Normal
copula captures only linear (correlation-type) dependence between the random variables and has no tail dependence,
whereas in our research we are interested in rare jumps of the equipment parameter values. For this purpose we take a
copula with stronger (upper) tail dependence, the Gumbel copula, and derive the semiparametric regression parameters
from its density:

$$ C(U_{t-1}, U_t) = \exp\left\{ -\left[ (-\ln U_{t-1})^{\theta} + (-\ln U_t)^{\theta} \right]^{1/\theta} \right\}, \qquad \theta = 1.2847. \qquad (5) $$
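The conditional moments require the density $c = \partial^2 C / \partial u\,\partial v$ of (5). The Python sketch below uses our own closed-form differentiation of the Gumbel copula (worth checking against a reference before use): with $x = -\ln u$, $y = -\ln v$ and $A = x^\theta + y^\theta$, it reads $c = C\,(xy)^{\theta-1} A^{1/\theta-2} (A^{1/\theta} + \theta - 1)/(uv)$. The usage lines compare it with a finite-difference approximation of the second mixed derivative.

```python
import math


def gumbel_cdf(u, v, theta):
    """Gumbel copula (5): C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta))."""
    a = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(a ** (1.0 / theta)))


def gumbel_density(u, v, theta):
    """Copula density c(u, v) = d^2 C / (du dv) for 0 < u, v < 1."""
    x, y = -math.log(u), -math.log(v)
    a = x ** theta + y ** theta
    big_c = math.exp(-(a ** (1.0 / theta)))
    return (big_c * (x * y) ** (theta - 1.0) / (u * v)
            * a ** (1.0 / theta - 2.0) * (a ** (1.0 / theta) + theta - 1.0))


THETA = 1.2847  # value reported with (5)

# Central mixed difference of C as an independent check of the closed form:
u, v, step = 0.3, 0.6, 1e-4
finite_diff = (
    gumbel_cdf(u + step, v + step, THETA) - gumbel_cdf(u + step, v - step, THETA)
    - gumbel_cdf(u - step, v + step, THETA) + gumbel_cdf(u - step, v - step, THETA)
) / (4.0 * step * step)
print(finite_diff, gumbel_density(u, v, THETA))
```

For $\theta = 1$ the density reduces to 1 (independence), which is another quick consistency check.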

Inserting expression (5) into the conditional expectations, we obtain our parameters:
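$$ E(U_{t+1} \mid U_t = u) = \int_0^1 u_{t+1}\, dF_{u_{t+1} \mid u_t = u} = \int_0^1 u_{t+1}\, p(u_{t+1} \mid u_t)\, du_{t+1} = \int_0^1 u_{t+1}\, \frac{\partial^2 C(u_{t+1}, u_t)}{\partial u_{t+1}\, \partial u_t}\, du_{t+1} = \int_0^1 u_{t+1}\, c(u_{t+1}, u_t)\, du_{t+1}, \qquad (6) $$

$$ g(U_{t+1} \mid U_t = u) = E\big[(U_{t+1} - f(U_t))^2 \mid U_t = u\big] = \int_0^1 \big(u_{t+1} - f(u)\big)^2\, c(u_{t+1}, u)\, du_{t+1}. \qquad (7) $$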
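Expressions (6) and (7) can be evaluated by numerical quadrature. The Python sketch below uses a midpoint rule (our choice; the paper only mentions Matlab) and, as an additional assumption, divides by the numerical mass of $c(\cdot, u)$, which should equal 1 for a copula density. The Gumbel density is repeated so that the block is self-contained.

```python
import math


def gumbel_density(u, v, theta):
    """Density of the Gumbel copula (5) for 0 < u, v < 1."""
    x, y = -math.log(u), -math.log(v)
    a = x ** theta + y ** theta
    big_c = math.exp(-(a ** (1.0 / theta)))
    return (big_c * (x * y) ** (theta - 1.0) / (u * v)
            * a ** (1.0 / theta - 2.0) * (a ** (1.0 / theta) + theta - 1.0))


def conditional_moments(u, theta, n=2000):
    """Midpoint-rule approximation of (6) and (7) given U_t = u:
    f(u) = int_0^1 w c(w, u) dw,  g(u) = int_0^1 (w - f(u))^2 c(w, u) dw."""
    step = 1.0 / n
    nodes = [(k + 0.5) * step for k in range(n)]  # midpoints avoid w = 0 and w = 1
    dens = [gumbel_density(w, u, theta) for w in nodes]
    mass = step * sum(dens)  # close to 1; dividing by it reduces quadrature error
    f_u = step * sum(w * d for w, d in zip(nodes, dens)) / mass
    g_u = step * sum((w - f_u) ** 2 * d for w, d in zip(nodes, dens)) / mass
    return f_u, g_u


for u in (0.1, 0.5, 0.9):
    f_u, g_u = conditional_moments(u, theta=1.2847)
    print(f"u = {u}: f(u) = {f_u:.4f}, g(u) = {g_u:.4f}")
```

With $\theta = 1$ (independence) the sketch returns $f \approx 0.5$ and $g \approx 1/12$, the mean and variance of the uniform distribution, which is a convenient sanity check.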

Expressions (6) and (7) cannot be evaluated analytically, but they can be computed numerically, for example in
Matlab. For the Gumbel copula we can use the inverse transformation to return to our base equation (1a). Of
course, if we want to use this model in practice, it is crucial to compare different classes of models that could be suitable
for these data. In summary, we have outlined a possible algorithm for constructing semiparametric copula-based regressions
and for modelling processes of a heteroskedastic nature. The proposed algorithm allows us to simulate the process
and find the distribution of the time at which the process reaches a certain border. This is critical in the
utility industry for making special preparations before the equipment may go out of order. The steps for simulating
process Y based on the Gumbel copula are as follows:

• Construct the marginal distributions for the data $Y_t$.
• Find the copula and its parameters.
• Estimate the semiparametric regression model via the copula.
• Construct an iteration procedure at the points $t_n$ with a small parameter $h = 0.01$.
• Find the distribution of the time $\tau(x, \Gamma)$ needed for $X_n$ to reach the level $\Gamma$ via Monte Carlo simulation.
• Repeat the iterations of the 4th step N times until $y(t_n) \ge F(\Gamma)$, and record the number $n^{(k)}$ after every run.
• Construct a histogram of $\{n^{(k)}, k = 1, \ldots, N\}$ and find the distribution.
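The Monte Carlo part of these steps can be sketched in Python as below. The scheme is an Euler–Maruyama-type iteration $U_n = U_{n-1} + h\,f(U_{n-1}) + \sqrt{h}\,g(U_{n-1})\,\xi_n$ on the uniform scale, stopped at the first n with $U_n \ge F(\Gamma)$. The $\sqrt{h}$ scaling of the noise, the clamping to [0, 1], the step cap `max_steps`, and the drift and volatility used in the usage lines are our assumptions, not values from the paper.

```python
import math
import random
from collections import Counter


def first_passage_steps(u0, level, drift, vol, h, rng, max_steps=100_000):
    """Number of steps n until U_n >= level, or None if max_steps is exceeded."""
    u = u0
    if u >= level:
        return 0
    for n in range(1, max_steps + 1):
        u += h * drift(u) + math.sqrt(h) * vol(u) * rng.gauss(0.0, 1.0)
        u = min(max(u, 0.0), 1.0)  # keep the chain on [0, 1] (clamping is an assumption)
        if u >= level:
            return n
    return None


def hitting_time_distribution(n_runs, u0, level, drift, vol, h=0.01, seed=None, max_steps=100_000):
    """Repeat the iteration n_runs times and collect the step counts n^(k) that reached the level."""
    rng = random.Random(seed)
    steps = []
    for _ in range(n_runs):
        n = first_passage_steps(u0, level, drift, vol, h, rng, max_steps)
        if n is not None:
            steps.append(n)
    return steps


def histogram(values, bin_width):
    """Counts per bin, keyed by the left edge of each bin."""
    counts = Counter(v // bin_width for v in values)
    return {b * bin_width: counts[b] for b in sorted(counts)}


# Hypothetical mean-reverting drift and constant volatility; level plays the role of F(Gamma).
steps = hitting_time_distribution(
    n_runs=1000, u0=0.5, level=0.8, drift=lambda u: 0.5 - u, vol=lambda u: 0.3, h=0.01, seed=2016
)
for left_edge, count in histogram(steps, bin_width=200).items():
    print(f"{left_edge:6d}-{left_edge + 199:6d} steps: {count}")
```

In the paper's setting, `drift` and `vol` would come from the copula-based estimates of the conditional moments rather than from the placeholder functions used here.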

4. Conclusions and discussion of the proposed algorithm

Having built the algorithm for constructing copula-based regressions and taking into account the process simulation
procedure (steps 1-4), we modeled process Y via a Gumbel copula-based semiparametric regression (see
Fig. 3). The simulated paths are graphically close to the observed values of Y, and they react to volatility
fluctuations. This makes it possible to use the model to evaluate border-crossing distributions.

Fig. 3. Historical and modeled values (Y) of the utility company equipment.

When dealing with copulas, however, some facts should not be overlooked. For example, it is not easy to say which
parametric copula best fits a given dataset, since some copulas may fit better near the center and others near the tails.
Moreover, many copulas do not have moments that are directly related to the Pearson correlation, which makes it difficult
to compare financial models based on correlation.

References

1. Darsow W, Nguyen B, Olsen E. Copulas and Markov processes. Illinois Journal of Mathematics. 36; 1992. p. 600–642.
2. Joe H. Multivariate Models and Dependence Concepts. Chapman & Hall/CRC; 1997.
3. Frees EW, Valdez EA. Understanding relationships using copulas. North American Actuarial Journal. 2; 1998. p. 1–25.
4. Nelson DB. ARCH models as diffusion approximations. Journal of Econometrics. 45; 1990. p. 7–38.
5. Chen X, Fan Y. Estimation of copula-based semiparametric time series models. Journal of Econometrics; 2006.
6. Ait-Sahalia Y, Kimmel R. Maximum likelihood estimation of stochastic volatility models. Journal of Financial Economics; 2007.
7. Fjodorovs J, Matvejevs A. Copula Based Semiparametric Regressive Models. Journal of Applied Mathematics. V; 2012. p. 241–248.
8. Chen X, Hansen LP, Carrasco M. Nonlinearity and temporal dependence. Working Paper, University of Chicago; 1998.
9. Chen X, Fan Y. Pseudo-likelihood ratio tests for model selection in semiparametric multivariate copula models. Canadian Journal of Statistics; 2004.

Andrejs Matvejevs graduated from Riga Technical University, Faculty of Computer
Science and Information Technology. He received his Doctoral Degree in 1989 and became
an Associate Professor at Riga Technical University in 2000 and a Full Professor in 2005. He
has made significant contributions to the field of actuarial mathematics. For more
than 30 years he has taught at Riga Technical University and Riga International College of
Business Administration, Latvia. His current professional research interests include
applications of Markov chains to actuarial technologies: mathematics of finance and security
portfolio. He is the author of about 80 scientific publications.
