NONLINEAR NONPARAMETRIC STATISTICS: Using Partial Moments
Fred Viole
David Nawrocki
Asymptotic Relationships
Autoregressive Modeling
Causation
References
Foreword
This book introduces a toolbox of statistical tools using partial moments that are both old and new. Partial moment analysis is over a century old, but most applications of partial moments have not progressed beyond a substitution for simple variance analysis. Lower partial moments have been in use in finance, in portfolio investment theory, for over 60 years. However, just as the normal distribution and the variance lead the statistician into linear correlation and regression analysis, partial moments lead us towards nonlinear correlation and nonparametric regression analysis. Using partial moments as a variance measure is only the tip of the iceberg; the purpose of this book is to explore the entire iceberg.
This partial moment toolbox is the "new" presented in this book. However, "new" should always have some advantage over "old". The advantage of using partial moments is that they are nonparametric: they require neither knowledge of the underlying probability distribution nor a "goodness of fit" analysis. Partial moments provide us with cumulative distribution functions, probability density functions, linear correlation and regression analysis, nonlinear correlation and regression analysis, ANOVA, and ARMA/ARCH models. This new toolbox is completely nonparametric and provides a full set of probability hypothesis testing tools without knowledge of the underlying probability distribution.
In this new advanced approach to nonparametric statistics, we merge the ideas of discrete and continuous processes and present them in a unified framework predicated on partial moments. Through the asymptotic property of partial moments, we show that the two schools of mathematical thought do not converge as commonly envisioned: increased observations approximate the continuous area of a function rather than stabilizing on a discrete counting metric. The analysis nevertheless remains strictly binary, discrete or continuous. The known properties generated from this continuous versus discrete analysis afford an assumption-free analysis of variance (ANOVA) on multiple distributions.
In our correlation and regression analysis, linear segments are aggregated to describe a nonlinear system. The computational issue is to avoid overfitting; however, since we can effectively determine the signal-to-noise ratio, this concern is alleviated, ultimately yielding a more robust result. By building off basic relationships between variables, we are able to perform multivariate analysis with ease and transform "complexity" into "tedious." One major advantage of our work is that the partial moment methodology fully replicates linear conditions and known functions. This trust in the methodology is important for the transition to chaotic unknowns and forecasting with autoregressive models.
*** All of the functions in this book are available in the R package 'NNS' on CRAN: https://cran.r-project.org/web/packages/NNS/
ASYMPTOTICS

f(Newton)
Abstract
We define the relationship between integration and partial moments through the integral mean value theorem. The areas of the function derived through both methods share an asymptote, allowing for an empirical definition of the area. This is important in that we are no longer limited to known functions and do not have to resign ourselves to goodness of fit tests to define f(x). Our empirical method avoids the pitfalls associated with a truly heterogeneous population, such as nonstationarity and estimation error of the parameters. Our ensuing definition of the asymptotic properties of partial moments to the area of a given function enables a wide array of comparative analyses equivalent to linear and nonlinear correlation analysis, and the calculation of cumulative distribution functions for both discrete and continuous variables.
“Imagine how much harder physics would be if electrons had feelings.” - Richard Feynman
INTRODUCTION
Modern finance has an entrenched relationship with calculus, namely in the fields of risk and portfolio management. Calculus by definition is the study of limits and infinitesimal series. However, given the seemingly infinite amount of financial data, a variable must be defined. Least squares methods and families of distributions have been identified over the years to assist in this definition prerequisite. Once classified, variables can be analyzed over specific intervals, and these intervals can be compared between variables.

Unfortunately, there are major issues with each of the identified steps of the preceding paragraph. When defining a continuous variable, you are stating that its shape (via parameters) is fixed in stone (stationary). Least squares methods of data fitting make no distinction whether a residual is above or below the fitted value, disregarding any asymmetry when judging whether an approximation of a function's area "is a better fit" to its intended applications. Parsing variances into positive or negative deviations from a specified point is quite useful for nonlinear
correlation coefficients and multiple nonlinear regressions as demonstrated in [2], and for calculating cumulative distribution functions for both discrete and continuous variables [1]. These multiple levels of heterogeneity negate the relevance of true population parameters estimated by the classical parametric method. Estimation error and nonstationarity of the first moment, μ, are testaments to the underlying heterogeneity issue, leaving the nonparametric approach as the only viable solution for truly heterogeneous populations. Our ensuing definition of the asymptotic properties of partial moments to the area of a given function enables a wide array of comparative analyses equivalent to the classical parametric approach.

OUR PROPOSED METHOD

Integration and differentiation have been important tools in defining the area under a function $f(x)$ since their identification in the 17th century by Isaac Newton and Gottfried Leibniz. Approximation of this area is possible empirically with the lower and upper partial moments of the distribution presented in equations 1 and 2.

$$LPM(n,h,x) = \frac{1}{T}\sum_{t=1}^{T} \max\{h - x_t,\, 0\}^n \qquad (1)$$

$$UPM(q,l,x) = \frac{1}{T}\sum_{t=1}^{T} \max\{x_t - l,\, 0\}^q \qquad (2)$$

where $x_t$ is the observation of variable x at time t, h and l are the targets from which to compute the lower and upper deviations respectively, and n and q are the weights to the lower and upper deviations respectively. We set $n, q = 1$ and $h = l$ to calculate the area of the function.

Partial moments resemble the Lebesgue integral, given by

$$f^-(x) = \max(\{-f(x), 0\}) = \begin{cases} -f(x), & \text{if } f(x) < 0, \\ 0, & \text{otherwise}, \end{cases} \qquad (3)$$

$$f^+(x) = \max(\{f(x), 0\}) = \begin{cases} f(x), & \text{if } f(x) \ge 0, \\ 0, & \text{otherwise}. \end{cases} \qquad (4)$$

In order to transform the partial moments from a time series to a cross-sectional dataset where x is a real variable, we need to alter equations 1 and 2 to reflect this distinction and introduce the interval [a,b] for which the area is to be computed.

$$LPM(1,0,f(x)) = \frac{1}{n}\sum_{i=1}^{n} \max\{-f(x_i),\, 0\} \quad \text{if } x_i \in [a,b], \qquad (5)$$

$$UPM(1,0,f(x)) = \frac{1}{n}\sum_{i=1}^{n} \max\{f(x_i),\, 0\} \quad \text{if } x_i \in [a,b]. \qquad (6)$$

We further constrained equations 5 and 6 by setting the target equal to zero for both functions, and consider the total number of observations n rather than the time qualification T. The target for the transformed partial moment equations will be a horizontal line, in this instance zero (the x-axis), whereby all $f(x) \ge 0$ are positive and all $f(x) < 0$ are negative area considerations, per the Lebesgue integral in equations 3 and 4.
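For readers who want to see equations 1 and 2 as executable code, here is a minimal base-R sketch; the LPM() and UPM() functions of the 'NNS' package used throughout this book compute these same quantities, and the lowercase helper names below are illustrative only.

lpm <- function(n, h, x) mean(pmax(h - x, 0)^n)   # equation 1
upm <- function(q, l, x) mean(pmax(x - l, 0)^q)   # equation 2
# Note: 0^0 evaluates to 1 in R, so a degree-0 call here counts observations
# equal to the target; the footnote convention in this book assigns 0 to such ties.

x <- seq(0, 10, 0.01)
upm(1, 0, x^2) - lpm(1, 0, x^2)   # approximates the integral of x^2 over [0,10], divided by (b - a)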
Lebesgue integration also offers flexibility versus its Riemann counterpart, just as partial moments offer flexibility versus the standard moments of a distribution. Equation 7 illustrates the asymptotic nature of the partial moments as the number of observations tends towards infinity over the interval [a,b].¹ This is analogous to the number of irregular rectangle partitions in other numerical integration methods.

$$\lim_{n\to\infty}\left[UPM\big(1,0,f(x)\big) - LPM\big(1,0,f(x)\big)\right] = \frac{\int_a^b f(x)\,dx}{(b-a)} \qquad (7)$$

Yielding,

$$\lim_{n\to\infty}\left[UPM\big(1,0,f(x)\big) - LPM\big(1,0,f(x)\big)\right] = \frac{F(b) - F(a)}{(b-a)} \qquad (8)$$

Invoking the mean value theorem, where

$$F'(c) = \frac{F(b) - F(a)}{(b-a)}, \qquad (9)$$

we have

$$F'(c) = \lim_{n\to\infty}\left[UPM\big(1,0,f(x)\big) - LPM\big(1,0,f(x)\big)\right]. \qquad (10)$$

Expressing $F'(c)$ using the $\Delta x$ of partition $i$ per the integral mean value theorem shows that

$$F'(c) = \lim_{\|\Delta x_i\|\to 0} \frac{1}{(b-a)}\sum_{i=1}^{n} f(c_i)\,\Delta x_i, \qquad (11)$$

thus demonstrating the inverse relationship between the number of observations and the partition width:

$$\lim_{\|\Delta x_i\|\to 0}\sum_{i=1}^{n} f(c_i)\,\Delta x_i = (b-a)\lim_{n\to\infty}\left[UPM\big(1,0,f(x)\big) - LPM\big(1,0,f(x)\big)\right] = \int_a^b f(x)\,dx \qquad (12)$$

Just as integrated area sums converge to the integral of the function with increased rectangle areas partitioned over the interval of $f(x)$,² equation 7 shares this asymptote

¹ Detailed examples are offered in Appendix A.
² Provided F is differentiable everywhere on [a,b] and F′ is integrable on [a,b]. The partial moment term of the equality in equation 12 makes no such suppositions. The total area, not just the definite integral, is simply $\left|\int_a^b f(x)\,dx\right| = \lim_{n\to\infty}\left[UPM\big(1,0,f(x)\big) + LPM\big(1,0,f(x)\big)\right]$.
equal to the integral of the function. This is demonstrated above with equation 12. If one can define the function of the asymptotic areas $F'(c)$ (UPM + LPM), then one can find the asymptote, or integral of the function, directly from observations.

FINDING THE HORIZONTAL ASYMPTOTE

The horizontal asymptote is the horizontal line that the graph of $F'(c)$ approaches as $n \to \infty$. This asymptote is equal to $[F(b) - F(a)]/(b-a)$ for the interval [a,b] where $a < b$. Once $F'(c)$ is defined, we can use the method of leading coefficients to determine the horizontal asymptote. Figure 1 above has a horizontal asymptote of zero. However, once $F'(c)$ is defined, the dominant assumption is that of stationarity of the function parameters at time t. Integral calculus is not immune from this stationarity assumption, as $f(x)$ needs to be defined in order to integrate and differentiate. Goodness of fit tests also assume stationarity of the parameters, detracting from their reliability. Since we are not defining $f(x)$, we have the luxury of recalibrating with each data point to capture the nonstationarity, consequently updating $F'(c)$.

DISCUSSION

To define, or not to define: that is the question. If we define $F'(c)$ we can find the exact asymptote, and thus the area of $f(x)$. If we appreciate the fact that nothing in finance seems to be guided by an exactly defined function, the measured area of $f(x)$ over the interval [a,b] will likely change over time due to the multiple levels of heterogeneity present. So why define $F'(c)$ or $f(x)$? The very next observation may lead to a redefinition.
Our proposed method of closely approximating the area of a function over an interval with partial moments is an important first step in introducing flexibility into finance versus integral calculus. We shed the dependence on stationarity, and alleviate the need for goodness of fit tests for underlying function definitions. We are hopeful that over time this method will be refined and expanded in order to bring a more robust and precise method of analysis than currently enjoyed, while avoiding the pitfalls associated with the parametric approach on a truly heterogeneous population.

APPENDIX A: EXAMPLES OF KNOWN FUNCTIONS USING EQUATION 7

f(x) = x²

To find the area of the function over the interval [0,10] for $f(x) = x^2$, we integrate with respect to x, yielding $F(x) = \frac{x^3}{3}$. $F(10) - F(0) = \frac{1000}{3} - 0 = 333.33$. Using equation 7 in the 'NNS' package in R, we know $F'(c)$ should converge to $\frac{333.33}{10}$, or 33.33.

> x=seq(0,10,1);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 35
> x=seq(0,10,.1);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 33.5
> x=seq(0,10,.02);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 33.36667
> x=seq(0,10,.01);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 33.35

Figure 2. Asymptotic partial moment areas for $\int_0^{10} x^2\,dx$.
f(x) = √x

To find the area of the function over the interval [0,10] for $f(x) = \sqrt{x}$, we integrate with respect to x, yielding $F(x) = \frac{2x^{3/2}}{3}$. $F(10) - F(0) = \frac{63.245}{3} - 0 = 21.08$. Using equation 7 in the 'NNS' package in R, we know $F'(c)$ should converge to $\frac{21.08}{10}$, or 2.108.

> x=seq(0,10,1);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.042571
> x=seq(0,10,.1);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.102329
> x=seq(0,10,.02);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.107075
> x=seq(0,10,.01);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.107638

Figure 3. Asymptotic partial moment areas for $\int_0^{10} \sqrt{x}\,dx$.

APPENDIX B: PERFECT UNIFORM SAMPLE ASSUMPTION $\left(\lim_{\|\Delta x_i\|\to 0} = \lim_{n\to\infty}\right)$

We can see from an analysis of samples over the interval [0,100] that, as the number of observations tends towards ∞, the observations approach a perfect uniform sample in Figure 1b. However, when using a sample representing irregular partitions (more realistic of observations than completely uniform), the length of observations required to achieve perfect uniformity is greater than by assuming it initially. This condition speaks volumes to misinterpretations of real world data when limit conditions are used as an artifact of fitting distributions.

Figure 1b. A randomly generated uniform sample over the interval approaches a perfect uniform sample as the number of observations goes to infinity.
DISCRETE VS. CONTINUOUS DISTRIBUTIONS

Cumulative Distribution Functions and UPM/LPM Analysis

Abstract

We show that the Cumulative Distribution Function (CDF) is represented by the lower partial moment ratio ($LPM_{ratio}$), the LPM taken relative to the entire distribution, for the interval in question. The addition of the upper partial moment ratio ($UPM_{ratio}$) enables us to create probability density functions (PDF) for any function or distribution without prior knowledge of its characteristics. The ability to derive the CDF and PDF without any distributional assumptions yields a more accurate calculation, devoid of any error terms present from a less than perfect goodness of fit, as well as critical information about the tails of the distribution. This foundation is then used to develop conditional probabilities and joint distribution co-partial moments. The resulting toolbox allows us to propose a new formulation for UPM/LPM analysis, and we note that each of the co-partial moment matrices is positive symmetrical semi-definite.

I. Introduction
The empirical cumulative distribution function (EDF) should, most of the time, be a good approximation of the true cumulative distribution function (CDF) as the sample size increases. This generalization is at the heart of statistics. Means and variances are used to assign and fit a distribution, but partial moments stabilize with a smaller sample size.

The empirical CDF is a simple construct. It is simply the number of observations less than or equal to a target, divided by the total number of observations in a given data set. The problem with extrapolating these results to an assumed true CDF is that the discrete empirical CDF is extremely sensitive to sample size,³ and any parameter nonstationarity will deteriorate the fit to the true distribution.
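As a quick base-R illustration of this construct (a sketch with simulated data; values shown here are for demonstration only):

set.seed(1)
x <- rnorm(1000, mean = 10, sd = 20)
mean(x <= 0)    # observations at or below the target, over total observations
ecdf(x)(0)      # base R's empirical CDF function returns the identical value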
The paper is organized as follows. First, we propose a method to derive the CDF and PDF of the EDF, utilizing the upper and lower partial moments (UPM and LPM respectively) of the EDF. The benefits are obvious, such as compensating for any observed skewness and kurtosis that would force a more esoteric distribution family onto the data. These measurements require zero knowledge of the true distribution. Partial moments also happen to exhibit less sample size sensitivity than traditional moment-based estimates. Next, this foundation is used to develop conditional probabilities and joint distribution co-partial moments. Finally, we propose a new formulation for UPM/LPM analysis, and we note that each of the co-partial moment matrices is positive symmetrical semi-definite.
II. Deriving Cumulative Distribution and Probability Density Functions Using Partial Moments

The upper and lower partial moment formulas are below in equations 1 and 2:

$$LPM(n,h,x) = \frac{1}{T}\sum_{t=1}^{T} \max\{h - x_t,\, 0\}^n \qquad (1)$$

$$UPM(q,l,x) = \frac{1}{T}\sum_{t=1}^{T} \max\{x_t - l,\, 0\}^q \qquad (2)$$

where $x_t$ represents the observation x at time t, n is the degree of the LPM, q is the degree of the UPM, h is the target for computing below target returns, and l is the target for computing above target returns.⁴

The area derived from degree one partial moments will approximate the area derived from the integral of the function over an interval [a,b] asymptotically. This asymptotic numerical integration is shown in Viole and Nawrocki (2012c) and represented with equation (3).

$$\lim_{n\to\infty}\left[UPM\big(1,0,f(x)\big) - LPM\big(1,0,f(x)\big)\right] = \frac{\int_a^b f(x)\,dx}{(b-a)} \qquad (3)$$

We use degree zero (n = q = 0) to generate a discrete analysis, replicating results from the conventional CDF and PDF methodology as a relative frequency and probability investigation, while the continuous analysis (degree one, n = q = 1) integrates a variance consideration to capture the rectangles of infinitesimal width in deriving an area under a function.

Figure 1. A distribution dissected into its two partial moment segments, red LPM and blue UPM, from a shared target.

⁴ Equations 1 and 2 will generate a 0 for degree 0 instances of 0 results.
The point probability is often included in the CDF calculation, but it is not uniformly treated as less than or equal to the target.⁵

Standard deviation remains stable even as the sample range increases with sample size; thus it is not an accurate barometer of the area of the function when estimating a continuous variable. Figure 2 illustrates the range increase as the number of observations increases for 5 million random draws from a normal distribution with μ=10 and σ=20.

Theorem 1.

$$P\{X < x\} + P\{X > x\} + P\{X = x\} = 1 \qquad (4)$$

If,

$$P\{X \le x\} = LPM_{ratio}(0,x,X) = \frac{LPM(0,x,X)}{\left[LPM(0,x,X) + UPM(0,x,X)\right]} - \frac{\varepsilon}{2} \qquad (5)$$
$$P\{X \ge x\} = UPM_{ratio}(0,x,X) = \frac{UPM(0,x,X)}{\left[LPM(0,x,X) + UPM(0,x,X)\right]} - \frac{\varepsilon}{2} \qquad (6)$$

Figure 2. Range for a randomly generated normal distribution with μ=10 and σ=20 for 5 million random draws.

Just as the probabilities of two mutually exclusive events sum to one, the sum of the ratios, LPM to the entire distribution and UPM to the entire distribution ($LPM_{ratio}$ and $UPM_{ratio}$ respectively), plus the point probability, equals one as in equations 8 and 8a.⁶

⁵ There is no consensus language for CDF definitions. Some instances are "< x" while others reference "≤ x" depending on the distribution, discrete or continuous. We are uniform in our treatment of distributions with "≤ x" for both discrete and continuous distributions. See http://www.mathworks.com/help/toolbox/stats/unifcdf.html and http://www.mathworks.com/help/toolbox/stats/unidcdf.html for treatment of the target, x.

⁶ It is important to note that LPM(0,x,X) is a probability measure and will yield a result from 0 to 1. Thus, the ratio of LPM(0,x,X) to the entire distribution ($LPM_{ratio}(0,x,X)$) is equal to the probability measure itself, LPM(0,x,X).
$$\left[\frac{LPM(0,x,X)}{LPM(0,x,X)+UPM(0,x,X)} - \frac{\varepsilon}{2}\right] + \left[\frac{UPM(0,x,X)}{LPM(0,x,X)+UPM(0,x,X)} - \frac{\varepsilon}{2}\right] + \varepsilon = 1 \qquad (7)$$

where ε is the point probability P{X = x}. The use of an empty set for ε yields,

$$LPM_{ratio}(0,x,X) + UPM_{ratio}(0,x,X) = 1 \qquad (8)$$

$$LPM_{ratio}(1,x,X) + UPM_{ratio}(1,x,X) = 1 \qquad (8a)$$

Since the entire normalized distribution is represented by equations 8 and 8a,

$$UPM_{ratio}(1,x,X) \ne UPM(1,x,X). \qquad (8b)$$

The integral of a point equals zero; thus for a continuous distribution there is no difference between P{X < x} and P{X ≤ x}, since ε = 0. If one wishes to subscribe to the notion that the sum of an infinite number of points, each equal to zero, must sum to one per the integral definition, then equation 7 is simply reduced to equation 8a for continuous variables. However, equation 7 with degree 1 can also be used for the continuous variable to compensate for ε ≥ 0 and generate a normalized continuous probability.
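The degree-0 versus degree-1 distinction can be made concrete with a small sketch using the NNS LPM() and UPM() calls employed in this chapter (simulated data; the degree-0 call gives the discrete CDF, while the degree-1 ratio gives the continuous, area-based CDF):

library(NNS)
set.seed(123)
X <- rnorm(10000, mean = 10, sd = 20)
LPM(0, 0, X)                                    # discrete CDF: P{X <= 0}
LPM(1, 0, X) / (LPM(1, 0, X) + UPM(1, 0, X))    # continuous CDF: LPM_ratio(1, 0, X)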
For a discrete distribution, an empty set for target observations lowers both LPM(0,x,X) and UPM(0,x,X) simultaneously, so that equation 8 still equals one with LPM(0,x,X) = P{X ≤ x} and UPM(0,x,X) = P{X ≥ x}. The point probability ε for a discrete distribution can easily be computed as the frequency of the specific point divided by the total number of observations. The point probability is more relevant in a discrete distribution of integers, and it has an inverse relationship to the degree of specification of the underlying variable. As the specification approaches infinity, ε approaches zero.

A. Review of the Literature

Guthoff et al. (1997) illustrate how the value at risk of an investment is equivalent to the degree zero LPM. We confirm this derivation, as the degree zero LPM does indeed provide a normalized solution. However, critical errors were made by Guthoff et al. and in subsequent works by Shadwick and Keating (2002) and Kaplan and Knowles (2004). The omega ratio is defined as

$$\Omega(\tau) = \frac{\int_\tau^\infty \left[1 - F(R)\right]dR}{\int_{-\infty}^\tau F(R)\,dR} \qquad (9)$$

where F(.) is the CDF for total returns on an investment and τ is the threshold return. Guthoff's and Shadwick and Keating's error was the use of a degree one LPM (an area) on a degree 0 LPM, the probability CDF of the distribution. Degree one LPM does not need to be performed on the probability CDF as they present.
Kaplan and Knowles' error was the dismissal of the degree zero LPM (the 0th root of something does not exist), which we show equals historical CDF measurements for various distributions. Also, $\sqrt[n]{LPM_n(\tau)}$ forces concavity upon increased n on data which do not presume such a condition.⁷

⁷ Figure 7 offers a visual representation of the difference between continuous and discrete CDFs of the mean.
B. Methodology Notes

We generated random distributions of 5 million observations. We then took 300 iterations with different seeds and averaged them. For stability estimates, we generated mean average deviations (MAD) for each statistic over the 300 iterations, for observations 30 through 5 million.

The statistics used in the following discussion are as follows: CHIDF(target) - cumulative distribution function for the Chi-square distribution and specified target; Kurtosis - relative kurtosis measure of the entire sample; Mean - μ of the entire sample; Norm Prob(target) - cumulative distribution function for the Normal distribution and specified target; POIDF(target) - cumulative distribution function for the Poisson distribution and specified target; Range - maximum observation minus minimum observation for the entire sample; SemiDev - semi-deviation of the sample using the mean as the target; Skew - skewness measure of the entire sample; StdDev - standard deviation of the sample; UNDF(target) - cumulative distribution function for the Uniform distribution.

Figure 4. Probability Density Function for the interval [a,b]; the tail areas at the endpoints are $\frac{LPM(1,a,x)}{UPM(1,a,x)+LPM(1,a,x)} - \frac{\varepsilon}{2}$ below a and $\frac{UPM(1,b,x)}{UPM(1,b,x)+LPM(1,b,x)} - \frac{\varepsilon}{2}$ above b.

Probability Density Function (PDF) using partial moments:

$$P[a \le x \le b] = \int_a^b f(x)\,dx \qquad (16)$$

Discrete,

$$P[a \le x \le b] = LPM(0,b,x) - LPM(0,a,x) \qquad (16a)$$

Continuous,

$$P[a \le x \le b] = LPM_{ratio}(1,b,x) - LPM_{ratio}(1,a,x) \qquad (16b)$$
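A sketch of equations 16a and 16b in R, using the NNS LPM() and UPM() calls as before (simulated data; the helper name lpm.ratio is illustrative):

library(NNS)
set.seed(7)
X <- rnorm(10000, mean = 10, sd = 20)
a <- 0; b <- 15
LPM(0, b, X) - LPM(0, a, X)    # discrete P[a <= x <= b], equation 16a
lpm.ratio <- function(t) LPM(1, t, X) / (LPM(1, t, X) + UPM(1, t, X))
lpm.ratio(b) - lpm.ratio(a)    # continuous counterpart, equation 16b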
All of the above-mentioned distributions and targets can be easily verified by the reader with statistical software such as the IMSL subroutine library. Furthermore, the direct computation of the partial moments can also be easily implemented in such software.
Normal distribution: μ = 10.00018, σ = 19.999
Poisson distribution: θ = 9.999914
Uniform distribution: μ = 10.00045
Chi-square distribution: v = 1, μ = 0.99994

C. Normal Distribution

We compare our metric to the traditional CDF, Φ, of a standard normal random variable:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$$

The probability generated from the normal distribution converges to LPM(0,0,X) in approximately 90 observations, as shown in Figure 5. LPM(0,0,X) stabilizes with fewer observations than the normal probability (exhibiting a lower MAD), as shown in Appendix A, Table 1a. This is proof that LPM(0,0,X) is indeed the discrete CDF of the distribution for the area less than the target. While the normal probability is less than or equal to the target, compared to less than for LPM(0,0,X), the probability of the specific target outcome does not affect the probability to a specification of four decimal places.

Figure 5. CDF of 0% target for the Normal distribution with μ=10 and σ=20 parameter constraints (panel: "CDFs for 0% Target"; probability versus observations; series: Norm Prob(0), LPM(0,0,X), LPM(1,0,X)).

The relationship between LPM_ratio(1,0,X), LPM(0,0,X) and the normal probability Norm Prob(0) is shown in Figure 5. The further from the mean, the greater the discrepancy between the continuous and discrete CDF, as seen in Figure 6. As the area of the distribution increases for the UPM when the target is less than the mean, the continuous CDF will be consistently lower than the discrete CDF. Conversely, as the area of the LPM increases when the target is greater than the mean, the continuous CDF will be consistently higher than the discrete CDF. This holds for all distribution families. The continuous and discrete probabilities are obviously equal at the endpoints of the distribution, 0 and 1 for the minimum and maximum respectively.
Figure 6. Discrete and continuous CDFs away from the mean (probability versus observations; series: LPM(0,0,X), LPM(1,0,X), LPM(0,4.5,X), LPM(1,4.5,X)).

Figure 7. Convergence at the mean target (probability versus observations; series: LPM(0,μ,x), LPM_ratio(1,μ,x)).
In Figure 7, the plot shows the convergence of the discrete LPM degree 0 to the continuous LPM degree 1, using the mean as the target return. The discrete measure isn't stable until around 1,000 observations.

Figure 8. Different locations of the target versus the mean, and the relationships between discrete and continuous CDFs (panel: "Above and Below Mean CDFs"; probability versus observations; series: LPM(1,13.5,X), LPM(0,13.5,X), LPM(1,4.5,X), LPM(0,4.5,X), LPM(1,μ,X)).
In Figure 8, we used the different targets of 4.5%, 9% (the mean), and 13.5%, and we see that the continuous measure is outside of the range of the discrete measures. Note that with the mean as the target, the continuous measure is rock solid on the 50% probability.

Normal Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds

Norm Prob(X ≤ 0.00) = .3085   LPM(0, 0, X) = .3085     LPM(1, 0, X) = .2208
Norm Prob(X ≤ 4.50) = .3917   LPM(0, 4.5, X) = .3917   LPM(1, 4.5, X) = .3339
Norm Prob(X ≤ Mean) = .5      LPM(0, μ, X) = .5        LPM(1, μ, X) = .5
Norm Prob(X ≤ 13.5) = .5694   LPM(0, 13.5, X) = .5694  LPM(1, 13.5, X) = .608

Table 1. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Normal distribution.

In Table 1, we see that the LPM degree 0 provides probabilities equivalent to the Normal probability function from the IMSL library. The continuous probability using the LPM degree 1 is at 0.5 for the mean as a target, and has a lower probability below the mean and a higher probability above the mean, as we have noted previously.

D. Uniform Distribution

We compare our metric to the traditional uniform CDF for values less than or equal to x.

$$P(x\,|\,A,B) = \begin{cases} 0, & \text{if } x < A \\ \dfrac{x-A}{B-A}, & \text{if } A \le x \le B \\ 1, & \text{if } x > B \end{cases}$$

Table 2 below shows the convergence of our metric to the traditional method for the uniform CDF (UNDF) with a mean of 10. The results are the same as we noted for the normal distribution in Table 1.

Uniform Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds

UNDF(X ≤ 0.00) = .4     UNDF LPM(0, 0, X) = .4   LPM(1, 0, X) = .3077
UNDF(X ≤ 4.50) = .445   LPM(0, 4.5, X) = .445    LPM(1, 4.5, X) = .3913
UNDF(X ≤ Mean) = .5     LPM(0, μ, X) = .5        LPM(1, μ, X) = .5
UNDF(X ≤ 13.5) = .535   LPM(0, 13.5, X) = .535   LPM(1, 13.5, X) = .5697

Table 2. Uniform distribution results illustrate the convergence of LPM(0,x,X) to UNDF and the consistent relationship between LPM(0,x,X) and LPM_ratio(1,x,X) above and below the mean target.

E. Poisson Distribution

We compare our metric to the traditional Poisson CDF (POIDF) for values less than or equal to X.

$$f(x) = e^{-\theta}\,\frac{\theta^x}{x!}$$

Poisson Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds

POIDF(X ≤ 0.00) = .00005  LPM(0, 0, X) = 0         LPM(1, 0, X) = 0
POIDF(X ≤ 4.50) = .0293   LPM(0, 4.5, X) = .0293   LPM(1, 4.5, X) = .0051
POIDF(X ≤ Mean) = .5151   LPM(0, μ, X) = .5151     LPM(1, μ, X) = .5
POIDF(X ≤ 13.5) = .8645   LPM(0, 13.5, X) = .8645  LPM(1, 13.5, X) = .9365

Table 3. Poisson distribution results illustrate the convergence of LPM(0,x,X) to POIDF and the consistent relationship between LPM(0,x,X) and LPM_ratio(1,x,X) above and below the mean target.
F. Chi-Square Distribution

We compare our metric to the traditional chi-square CDF (CHIDF) for values less than or equal to X.

$$F(x) = \frac{1}{2^{v/2}\,\Gamma(v/2)} \int_0^x e^{-t/2}\, t^{v/2-1}\,dt$$

We set the degrees of freedom for the chi-square equal to one. The reason for this arbitrary selection is the distinct curve generated by this parameter value, and its likeness to the power law distribution. There is no a priori argument that the degrees of freedom will affect our methodology, given its nonparametric derivation.

Chi-Squared Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds

CHIDF(X ≤ 0) = 0        LPM(0, 0, X) = 0        LPM(1, 0, X) = 0
CHIDF(X ≤ 0.5) = .5205  LPM(0, 0.5, X) = .5205  LPM(1, 0.5, X) = .2087
CHIDF(X ≤ 1) = .6827    LPM(0, 1, X) = .6827    LPM(1, 1, X) = .5
CHIDF(X ≤ 5) = .9747    LPM(0, 5, X) = .9747    LPM(1, 5, X) = .989

Table 4. Chi-square distribution results illustrate the convergence of LPM(0,x,X) to CHIDF and the consistent relationship between LPM(0,x,X) and LPM_ratio(1,x,X) above and below the mean target.

G. Continuous Distributions

In a discrete measurement with a zero target, there is no difference between a 40% observation and a 70% observation, as both will yield a single positive count in the frequency (both were observed in our normal distribution generation with μ=10 and σ=20 parameter constraints). However, there is considerable area between these two observations that merely gets binned in a probability analysis. This undesirable construct also has the ubiquitous quality of scale invariance. Equation (14) measures this neglected area, with its inherent variance consideration simultaneously factored with the discrete frequency analysis.

"All actual sample spaces are discrete, and all observable random variables have discrete distributions. The continuous distribution is a mathematical construction, suitable for mathematical treatment, but not practically observable." E.J.G. Pitman (1979).

An LPM_ratio of degree 1 (n=q=1) permits us to calculate the area "between the bins." For example, in a roll of a die, the area of the function between 3.1 and 3.9 will be static for the discrete method (based on integer bins 1-6). If the distribution were actually continuous, the variance influence in the LPM_ratio degree 1 generates an accurate measurement of the area 3.1 through 3.9, this area between the bins, for uniform and all other distributions. Furthermore, the mean for a die roll is approximately 3.5. LPM_ratio degree 1 generates a 0.5 result for the CDF with the 3.5 mean as the target in a continuous analysis.
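The die-roll example can be verified directly, a sketch using the NNS partial moment calls with the discrete outcomes 1 through 6 standing in for the bins (the helper name lpm.ratio is illustrative):

library(NNS)
x <- 1:6
lpm.ratio <- function(t) LPM(1, t, x) / (LPM(1, t, x) + UPM(1, t, x))
lpm.ratio(3.5)                    # continuous CDF at the 3.5 mean: 0.5
lpm.ratio(3.9) - lpm.ratio(3.1)   # nonzero continuous area "between the bins"
LPM(0, 3.9, x) - LPM(0, 3.1, x)   # zero under discrete frequency counting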
We are not able to generate a continuous distribution to observe and verify this notion for target values other than the mean (which we prove always equals 0.5) or the endpoints (0 or 1 for the sample minimum and maximum). The consistent observed relationship we demonstrated between LPM_ratio(1,x,X) and LPM(0,x,X), for targets above and below the mean, offers considerable support for the continuous estimates.

A better example to distinguish between discrete and continuous analysis is the chi-square distribution with degrees of freedom set to one. The range of the observations extended to X=35.1, and the distribution resembles the power law function. Considering μ=1.0 and σ=1.414, the discrete probability of a mean return was 0.6827, as shown in Table 4. However, if one envisions the decreasing thin slice of area under the function all the way down the x-axis to the observation X=35.1, this extended result only generates a reading of one in its probability calculation of x > μ; no different than an observation of X=11, which is also a positive count in this example. The frequency of X=11 is the distinguishing characteristic. The difference in area between 11 and 35.1 is considerable, and it is completely disregarded under discrete frequency analysis. When the variance of that deviation is considered, to account for the infinite possible outcomes of the continuous variable, the probability of a mean return drops significantly from 0.6827 to 0.5. The reason for this is straightforward: LPM(0,x,X) converges to the frequency/counting data set, while LPM_ratio(1,x,X) retains its area property.

III. Joint Distribution Co-Partial Moments and UPM/LPM Analysis

In this section, we introduce the framework for the joint distribution using partial moments. For more background, Appendix B and Appendix C provide more information on joint probabilities and conditional CDFs. We also replicate the covariance matrix of a two-variable normal distribution, and its cosemivariance matrix, with the variables' aggregated partial moment components. This information provides a toolbox that yields a positive definite symmetrical co-partial moment matrix capable of handling any target and resulting asymmetry, providing a distinct advantage over its cosemivariance counterpart.

The issue in this area traces back to the Markowitz (1959) chapter on semivariance analysis. The cosemivariance matrix in Markowitz is an endogenous matrix that is computed after the portfolio returns have been computed. Because we have to know the portfolio allocations before we can compute the portfolio returns, the cosemivariance matrix is not known until after we have solved the problem. Attempts to solve the mean-semivariance problem with an exogenous matrix, a matrix computed from the security return data, have had problems because the cosemivariance matrix is asymmetric and, therefore, not positive semi-definite. Grootveld and Hallerbach (1999) noted that the endogenous and exogenous matrices are not equivalent. Estrada (2008), however, demonstrates that a symmetric exogenous matrix is a very good approximation for the
endogenous matrix. Our purpose is to demonstrate a method that provides a positive semi-definite matrix system that preserves any asymmetry in the underlying process.

First, the LPM and the CLPM are defined as follows:

$$LPM(n,h,x) = \frac{1}{T}\left[\sum_{t=1}^{T} \max\{0,\, h - x_t\}^n\right] \qquad (18)$$

$$CLPM(n,h,x|y) = \frac{1}{T}\left[\sum_{t=1}^{T} \left(\max\{0,\, h - x_t\}^n \cdot \max\{0,\, h - y_t\}^n\right)\right] \qquad (19)$$

The degree 1 Co-LPM (CLPM) matrix is:

$$\begin{bmatrix} LPM(2,h,x) & CLPM(1,h,x|y) \\ CLPM(1,h,y|x) & LPM(2,h,y) \end{bmatrix}$$

$$LPM(2,h,x) = CLPM(1,h,x|x) \qquad (20)$$

And the covariance between two variables is simply

$$\sigma_{xy} = \left(\frac{1}{T}\right)\cdot\sum_{t=1}^{T}(x_t - \mu_x)(y_t - \mu_y) \qquad (23)$$

Since the semivariance from benchmark B is

$$\Sigma_x = E\left\{\left[\min(x - B,\, 0)\right]^2\right\} = \left(\frac{1}{T}\right)\cdot\sum_{t=1}^{T}\left[\left(\min(x_t - B,\, 0)\right)^2\right], \qquad (24)$$

it is also the cosemivariance with itself,

$$\Sigma_{xx} = \left(\frac{1}{T}\right)\cdot\sum_{t=1}^{T}\left[\min(x_t - B,\, 0)\right]\left[\min(x_t - B,\, 0)\right]. \qquad (25)$$

And the cosemivariance between two variables is

$$\Sigma_{xy} = \left(\frac{1}{T}\right)\cdot\sum_{t=1}^{T}\left[\min(x_t - B,\, 0)\right]\left[\min(y_t - B,\, 0)\right] \qquad (26)$$

And the Co-LPM degree 1 between two variables is

$$CLPM(1,B,x|y) = \left(\frac{1}{T}\right)\cdot\sum_{t=1}^{T}\left[\max(B - x_t,\, 0)\right]\left[\max(B - y_t,\, 0)\right] \qquad (29)$$

The main diagonal of the aggregated matrix will retain the covariance equivalence under any asymmetry, with the following relationship for all targets:

$$\sigma_x^2 = LPM(2,\mu,x) + UPM(2,\mu,x) \qquad (30)$$

Furthermore, with the addition of the Co-LPM matrix, the Co-UPM matrix is equivalent to the covariance matrix on the main diagonal:

$$\begin{bmatrix} LPM(2,h,x) & CLPM(1,h,x|y) \\ CLPM(1,h,y|x) & LPM(2,h,y) \end{bmatrix} + \begin{bmatrix} UPM(2,h,x) & CUPM(1,h,x|y) \\ CUPM(1,h,y|x) & UPM(2,h,y) \end{bmatrix} \qquad (31)$$

Equation (31) will generate a zero instead of a negative covariance result, ensuring a positive matrix. This zero (instead of the negative) result does not affect the preservation of information for the instances whereby one variable is above the target and one is below: the addition of such an observation to the complement set lowers both the CLPM and CUPM. In essence, nothing is something.

We note that each of the co-partial moment matrices is positive symmetrical semi-definite, ensuring a positive symmetrical definite aggregate matrix.
A. Complement Set Matrix

To further analyze the information in the (CLPM + CUPM) complement set, arising from diverging target returns between variables, we introduce two new metrics: the diverging lower and upper partial moments (DLPM and DUPM), with degrees n and q respectively, as explained earlier in equations 1 and 2. The divergent UPM is:

$$DUPM(n|q,\, h,\, x|y) = \frac{1}{T}\left[\sum_{t=1}^{T}\left(\max\{0,\, h - x_t\}^n \cdot \max\{y_t - h,\, 0\}^q\right)\right] \qquad (33)$$

Equation (33) provides the divergent UPM for variable y given a negative target deviation for variable x. For example, given a 20% observation for variable X and a shared target of 0%, a -10% observation for variable Y will generate a larger DLPM than a -5% observation for variable Y.

The matrix of each divergent partial moment will be aggregated to represent the divergent partial moment matrix (DPM). One key feature of this matrix is that the main diagonal consists of all zeros, since the divergent partial moment of the same variable does not exist. The degree 1 DPM is presented below.

$$\begin{bmatrix} 0 & DPM(1|1,\, \mu,\, x|y) \\ DPM(1|1,\, \mu,\, y|x) & 0 \end{bmatrix} \qquad (34)$$

Since there exist only four possible interactions between two variables,

X ≤ target, Y ≤ target: CLPM(n, h, x|y)
X > target, Y > target: CUPM(q, h, x|y)
X ≤ target, Y > target: DUPM(n|q, h, x|y)
X > target, Y ≤ target: DLPM(q|n, h, x|y)

we can clearly see that the sum of the degree 0 probability matrices of all four interactions must equal one, explaining the entire multivariate distribution. The distinct advantage of the partial moments over semivariance as the preferred below-target analysis method is the ability of the partial moments to compensate for any asymmetry.
Under symmetry,

$$\begin{bmatrix} \Sigma_{xx\mu} & \Sigma_{xy\mu} \\ \Sigma_{yx\mu} & \Sigma_{yy\mu} \end{bmatrix} = \frac{1}{2}\begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix}$$

and the aggregated co-partial moment matrices, less the divergent partial moment matrix, recover the covariance matrix:

$$\begin{bmatrix} \Sigma_{xx\mu} & \Sigma_{xy\mu} \\ \Sigma_{yx\mu} & \Sigma_{yy\mu} \end{bmatrix} + \begin{bmatrix} \Sigma_{xx\mu} & \Sigma_{xy\mu} \\ \Sigma_{yx\mu} & \Sigma_{yy\mu} \end{bmatrix} - \begin{bmatrix} 0 & DPM(1|1,\mu,x|y) \\ DPM(1|1,\mu,y|x) & 0 \end{bmatrix} = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix} \qquad (35)$$

Each of the co-partial moment matrices is positive symmetrical semi-definite, addressing the endogenous/exogenous matrix problem described by Grootveld and Hallerbach (1999) and Estrada (2008).

In R, using the 'NNS' package, we can verify the variance/covariance equivalence:

> cov(x,y)
[1] -0.04372107
> (Co.LPM(1,1,x,y)+Co.UPM(1,1,x,y)-D.LPM(1,1,x,y)-D.UPM(1,1,x,y))*(length(x)/(length(x)-1))
[1] -0.04372107
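For a self-contained check of this decomposition, the following base-R sketch implements the four degree-1 co- and divergent partial moments with mean targets; the helper objects are illustrative, while the Co.LPM(), Co.UPM(), D.LPM() and D.UPM() calls above are the packaged NNS equivalents.

set.seed(1)
x <- rnorm(1e5); y <- rnorm(1e5)
hx <- mean(x); hy <- mean(y)
clpm <- mean(pmax(hx - x, 0) * pmax(hy - y, 0))   # both below their targets
cupm <- mean(pmax(x - hx, 0) * pmax(y - hy, 0))   # both above
dlpm <- mean(pmax(x - hx, 0) * pmax(hy - y, 0))   # x above, y below
dupm <- mean(pmax(hx - x, 0) * pmax(y - hy, 0))   # x below, y above
clpm + cupm - dlpm - dupm                         # the co- minus divergent aggregate...
cov(x, y) * (length(x) - 1) / length(x)           # ...matches the population covariance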
IV. Conclusions

We have demonstrated how the LPM degree 0 is equal to the traditionally derived CDF of any assumed distribution. LPM(0,x,X) converges to:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt,$$

$$P(x\,|\,A,B) = \begin{cases} 0, & \text{if } x < A \\ \dfrac{x-A}{B-A}, & \text{if } A \le x \le B \\ 1, & \text{if } x > B \end{cases}$$

The obvious benefit is the distribution-agnostic manner of this direct computation, which consumes far less time and CPU effort than bootstrapping a discrete estimate. Furthermore, the stability of the partial moments versus each of the distribution estimates is yet another benefit of our method. Finally, the ability to derive results for a truly continuous variable emphasizes the flexibility of this method.

Any computer generated sample, and any analysis thereof, is that of a discrete variable. A histogram and bins, as commonly constructed in Excel by practitioners and academics alike, ignores a large area under the function due to this discrete classification. The addition of bins with increased observations does not fill in the area and converge to the continuous area estimate; it merely creates larger quantities of smaller areas, thus keeping the total area constant. Equation (14) makes no such concessions and generates the theoretical continuous area, while maintaining the relationship identified in Equation (15). We note how the continuous CDF is much more pronounced the further the integral is from the mean, compensating for the asymmetry of the additional area "between the bins" that is placed in the preceding bin during discrete analysis.

Benoit Mandelbrot notes that the shorter the measuring instrument, the larger the coastline measurement becomes.
Appendix A:

We offer the results of a separate study comparing the deviations from the large sample sizes.

Figure 1a. Visual representation of the stabilization of statistics as sample size increases (panel: "Stability of Estimates"; estimate value versus observations; series: Mean, StdDev, SemiDev, UPM(1,0,x)).
Conditional Probabilities:

We illustrate how the partial moment ratios can also emulate conditional distribution areas, from which the LPM and UPM can be observed.

Figure 1b. Venn diagram illustrating conditional probabilities of different areas in the sample space, S.

The conditional probability P(B₁|A) = 1:

$$1 = 1 - LPM(0,a,B_1) - UPM(0,b,B_1) \qquad (B.1)$$

$$1 = UPM(0,a,B_1) - UPM(0,b,B_1) \qquad (B.2)$$

The conditional probability P(B₂|A) ≈ 0.85:

$$0.85 = 1 - LPM(0,a,B_2) - UPM(0,b,B_2) \qquad (B.4)$$

$$0.85 = UPM(0,a,B_2) - UPM(0,b,B_2) \qquad (B.6)$$
Bayes' Theorem:

Bayes' theorem will also generate the conditional probability of A given B, P(A|B), with the formula

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$

where the probability of A is represented by its partial moment ratios (per equations B.1 and B.2). Cancelling out P(A) leaves us with Bayes' theorem represented by partial moments, and our conditional probability on the right side of the equality. The following table of the canonical breast cancer test example will help place the concepts in context.
Appendix C: Joint CDFs and UPM/LPM Correlation Analysis

Joint CDF:

$$P\left[X \le h_x,\, Y \le h_y\right] = CLPM\left(0|0,\, h_x|h_y,\, x|y\right) \qquad (C.1)$$

This is the discrete CDF of the joint distribution, just as we prove LPM(0,h,X) is the discrete CDF of the univariate distribution. $CLPM(0|0, h_x|h_y, x|y)$ has the following properties for various correlations between the two variables, ρ_xy, when h_x = h_y:⁹

- If ρ_xy = 1: $CLPM(0|0, h_x|h_y, x|y) = \min\{LPM(0,h_x,x),\, LPM(0,h_y,y)\}$.
- If ρ_xy = 0: $CLPM(0|0, h_x|h_y, x|y) = LPM(0,h_x,x) \cdot LPM(0,h_y,y)$, the product of the marginal CDF levels.
- If ρ_xy = -1: $CLPM(0|0, h_x|h_y, x|y) = 0$.

Figure 1C. Hypothetical 5% shared target on two variables (x, y) and the joint CDF (CLPM plotted against correlations from -1 to 1).
We can deduce the correlation between the assets with knowledge of only the CLPM and h_x|h_y. For example, with both of our variables at their 5% targets, if $CLPM(0|0, h_x|h_y, x|y) = 0.25\%$ we know that ρ_xy = 0.

An example may help illustrate the relationship. Let us assume the same target h_x = h_y, which we arbitrarily select at the 5% CDF level, for two normal distributions with μ = 9 and σ = 20. We then ask: what is the probability that both variables will be in the lower 5% of their distributions simultaneously under different correlations? Equation C.3 will provide the implied correlation for an observed discrete joint CDF, $CLPM(0|0, h_x|h_y, x|y)$. Lucas (1995) provides a framework for estimating the correlation between two events with the following equation, into which we substitute our partial moment metrics:

⁹ We leave further asymmetric target analysis for future research.
$$\rho_{xy} = \frac{CLPM\left(0|0, h_x|h_y, x|y\right) - LPM(0,h_x,x)\cdot LPM(0,h_y,y)}{\sqrt{\left[LPM(0,h_x,x)\cdot UPM(0,h_x,x)\right]\cdot\left[LPM(0,h_y,y)\cdot UPM(0,h_y,y)\right]}} \qquad (C.4)$$

From our h_x = h_y = 5% example,

$$\rho_{xy} = \frac{0.25\% - (5\%)(5\%)}{\sqrt{[5\% \cdot 95\%]\cdot[5\% \cdot 95\%]}} = 0.$$

The entire degree 0 joint distribution is represented by

$$\left[CLPM\left(0, h_x|h_y, x|y\right) + DLPM\left(0, h_x|h_y, x|y\right) + DUPM\left(0, h_x|h_y, x|y\right) + CUPM\left(0, h_x|h_y, x|y\right)\right] \qquad (C.5)$$

If there is a -1 correlation, then the returns between the variables will always be divergent; thus

$$\rho_{xy} = \frac{\left[0 - DLPM\left(0, h_x|h_y, x|y\right) - DUPM\left(0, h_x|h_y, x|y\right) + 0\right]}{\left[0 + DLPM\left(0, h_x|h_y, x|y\right) + DUPM\left(0, h_x|h_y, x|y\right) + 0\right]} = -1 \qquad (C.6)$$

If there is a perfect correlation between the two variables, then there will be no divergent returns; thus

$$\rho_{xy} = \frac{\left[CLPM\left(0, h_x|h_y, x|y\right) - 0 - 0 + CUPM\left(0, h_x|h_y, x|y\right)\right]}{\left[CLPM\left(0, h_x|h_y, x|y\right) + 0 + 0 + CUPM\left(0, h_x|h_y, x|y\right)\right]} = 1 \qquad (C.7)$$

If there is zero correlation between the two variables, then the co- and divergent partial moments will be equal,

$$CLPM\left(0, h_x|h_y, x|y\right) = DLPM\left(0, h_x|h_y, x|y\right) = DUPM\left(0, h_x|h_y, x|y\right) = CUPM\left(0, h_x|h_y, x|y\right)$$

Thus,

$$\rho_{xy} = \frac{\left[CLPM - DLPM - DUPM + CUPM\right]\left(0, h_x|h_y, x|y\right)}{\left[CLPM + DLPM + DUPM + CUPM\right]\left(0, h_x|h_y, x|y\right)} = 0 \qquad (C.8)$$
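A sketch of equations C.1 and C.4 with simulated independent variables, so the implied correlation should come out near zero (base R; object names are illustrative):

set.seed(42)
x <- rnorm(1e5); y <- rnorm(1e5)
hx <- quantile(x, 0.05); hy <- quantile(y, 0.05)   # shared 5% CDF-level targets
clpm0 <- mean(x <= hx & y <= hy)                   # C.1: discrete joint CDF
Fx <- mean(x <= hx); Fy <- mean(y <= hy)           # marginal CDF levels, both ~0.05
(clpm0 - Fx * Fy) / sqrt(Fx * (1 - Fx) * Fy * (1 - Fy))   # C.4: implied correlation, ~0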
Analogously, the ratio of $CLPM(1|1, h_x|h_y, x|y)$ to the entire degree 1 joint distribution will generate the probability percentage. Thus,

$$CLPM_{ratio}\left(1|1, h_x|h_y, x|y\right) = \frac{CLPM\left(1|1, h_x|h_y, x|y\right)}{\left[CLPM + DLPM + DUPM + CUPM\right]\left(1|1, h_x|h_y, x|y\right)} \qquad (C.9)$$

TEDIOUS, NOT COMPLEX

1. INTRODUCTION
Chen et al. (2010) explore the problem of estimating a nonlinear correlation (see Figure 1). They note that a generic-use statistic such as the Pearson correlation coefficient does not exist for nonlinear correlations. We introduce a generic nonlinear correlation coefficient metric derived from partial moments that can be substituted for the Pearson coefficient: (1) partial moments enable ordered partitions of the data whereby only linear segments are required, and (2) partial moments are integrated into economics through expected utility theory (Holthausen, 1981 and Guthoff et al., 1997), and are integrated into statistics, as Viole and Nawrocki (2012a) find that partial moments can be used to derive the CDF and PDF of a distribution.

The paper is organized as follows. The next section covers the development of the measure, followed by a section with empirical results. Next, we extend the analysis to a multidimensional setting.

Throughout, n is the degree of the LPM, q is the degree of the UPM, h_x is the target for computing below-target observations for X, and l_x is the target for computing above-target observations for X. The four interactions between the two variables relative to a shared target are:

X ≤ target, Y ≤ target
X ≤ target, Y > target
X > target, Y ≤ target
X > target, Y > target
We propose a method of partitioning the distribution with partial moments to capture these four interactions.

2.2 Divergent Partial Moments

$$DUPM\left(n|q,\, h_x|h_y,\, X|Y\right) = \frac{1}{T}\left[\sum_{t=1}^{T}\left(\max\{0,\, h_x - X_t\}^n \cdot \max\{Y_t - h_y,\, 0\}^q\right)\right] \qquad (4)$$
2.3 Definition of Variable Relationships

To determine the correlation coefficient, we can use the following nonparametric formula in equation 5:

$$\rho_{xy} = \frac{\left[CLPM\left(0, h_x|h_y, x|y\right) - DLPM\left(0, h_x|h_y, x|y\right) - DUPM\left(0, h_x|h_y, x|y\right) + CUPM\left(0, h_x|h_y, x|y\right)\right]}{\left[CLPM\left(0, h_x|h_y, x|y\right) + DLPM\left(0, h_x|h_y, x|y\right) + DUPM\left(0, h_x|h_y, x|y\right) + CUPM\left(0, h_x|h_y, x|y\right)\right]} \qquad (5)$$

The axiomatic relationship between correlation and co- or divergent returns follows. If there is a perfect correlation between two variables, then there will be no divergent returns. If there is a -1 correlation, then the returns between the variables will always be divergent. If there is zero correlation between two variables, then the co- and divergent partial moments will be equal,

$$CLPM\left(0, h_x|h_y, x|y\right) = DLPM\left(0, h_x|h_y, x|y\right) = DUPM\left(0, h_x|h_y, x|y\right) = CUPM\left(0, h_x|h_y, x|y\right)$$

Thus,

$$\rho_{xy} = \frac{\left[CLPM - DLPM - DUPM + CUPM\right]\left(0, h_x|h_y, x|y\right)}{\left[CLPM + DLPM + DUPM + CUPM\right]\left(0, h_x|h_y, x|y\right)} = 0$$
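Equation 5 can be sketched in a few lines of base R; the quadrant proportions are the degree-0 co- and divergent partial moments, and the helper name is illustrative rather than part of the NNS package:

quadrant.rho <- function(x, y, hx = mean(x), hy = mean(y)) {
  clpm <- mean(x <= hx & y <= hy)   # both below their targets
  cupm <- mean(x >  hx & y >  hy)   # both above
  dlpm <- mean(x >  hx & y <= hy)   # x above, y below
  dupm <- mean(x <= hx & y >  hy)   # x below, y above
  (clpm + cupm - dlpm - dupm) / (clpm + cupm + dlpm + dupm)
}
quadrant.rho(seq(-3, 3, .01), 2 * seq(-3, 3, .01))   # perfectly co-moving pair: returns 1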
Figure 1. 1st order partitioning of the distribution based on variable relationships, with the co- and divergent partial moment quadrants (CLPM, CUPM, DLPM, DUPM) defined around the targets μ_x and μ_y, on an observed nonlinear correlation in a microarray study from Chen et al. (2010).

Figure 2. 2nd order partitioning of the microarray study based on the means of the partial moment subsets as targets; the subset means (x̄₁ ... x̄₄, ȳ₁ ... ȳ₄) define the sixteen sub-quadrants CLPM₁..₄, CUPM₁..₄, DLPM₁..₄ and DUPM₁..₄.
2.6 Definition of Subset Means

$$\bar{x}_1 = \frac{\sum_{i=1}^{n} x_1}{n}, \qquad \bar{y}_1 = \frac{\sum_{i=1}^{n} y_1}{n}$$

$$CLPM_1\left(n,\, \bar{x}_1|\bar{y}_1,\, X_1|Y_1\right) = \frac{1}{T}\left[\sum_{t=1}^{T}\left(\max\{0,\, \bar{x}_1 - x_{1t}\} \cdot \max\{0,\, \bar{y}_1 - y_{1t}\}\right)\right] \qquad (10)$$

For a 3rd order analysis, for example, one then needs to compute the 12 remaining subset partial moments (in addition to the four identified in equations 9-12 above) using the appropriate subset mean targets for each quadrant. The total number of subset means will be less than or equal to $4^{(N-1)}$, where N is the number of orders specified.ii
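A minimal sketch of this partitioning step in base R: split the observations at the grand means, then take the subset means that serve as the next order's targets (helper name illustrative):

partition.means <- function(x, y) {
  hx <- mean(x); hy <- mean(y)
  quadrant <- interaction(x <= hx, y <= hy)            # the four 1st order quadrants
  sapply(split(data.frame(x, y), quadrant), colMeans)  # subset means (x-bar_i, y-bar_i)
}
partition.means(rnorm(100), rnorm(100))                # 2 x 4 matrix of subset mean targets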
The eventual correlation metric is accomplished by adding all CUPMs and CLPMs (positive correlations) and subtracting DUPMs and DLPMs (negative correlations) in the numerator, while summing all 16 co- and divergent partial moments, representing the entire distribution, in the denominator, per equation 13 below.

$$\rho_{xy} = \frac{\left(\sum_{i=1}^{4} CLPM_i\right) - \left(\sum_{i=1}^{4} DLPM_i\right) - \left(\sum_{i=1}^{4} DUPM_i\right) + \left(\sum_{i=1}^{4} CUPM_i\right)}{\sum_{i=1}^{4}\left(CLPM_i + DLPM_i + DUPM_i + CUPM_i\right)} \qquad (13)$$

2.8 Dependence

We can also define the dependence present between two variables as the sum of the absolute values of the per-quadrant correlations. Stated differently, when all of the per-quadrant observations fall in either the co-movement quadrants (CLPM & CUPM) or the divergent quadrants (DLPM & DUPM), the variables are maximally dependent. When η(X,Y) equals one, there is maximum dependence between the two variables.
3. EMPIRICAL EVIDENCE

Third order partitions are shown and calculated in R. The 1st order partition is the thick red line (per Figure 1), the 2nd order partition is the thin red line (per Figure 2), and the 3rd order partition is the dotted black line.

Linear Equalities:

Y = 2X

> x=seq(-3,3,.01);y=2*x
> cor(x,y)
[1] 1
> NNS.dep(x,y,print.map = T)
$Correlation
[1] 1
$Dependence
[1] 1

Y = -2X

> x=seq(-3,3,.01);y=-2*x
> cor(x,y)
[1] -1
> NNS.dep(x,y,print.map = T)
$Correlation
[1] -1
$Dependence
[1] 1

Nonlinear Differences:

Y = X² for positive X

> x=seq(0,3,.01);y=x^2
> cor(x,y)
[1] 0.9680452
> NNS.dep(x,y,print.map = T)
$Correlation
[1] 0.9994402
$Dependence
[1] 0.9994402

Figure 5. Nonlinear positive relationship between two variables (X, Y).

Y = X²

> x=seq(-3,3,.01);y=x^2
> cor(x,y)
[1] 7.665343e-17
> NNS.dep(x,y,print.map = T)
$Correlation
[1] -0.001647721
$Dependence
[1] 0.9993975

Figure 6. Nonlinear relationship between two variables (X, Y).
As the exponential function increases in magnitude, we actually find it to retain its linear relationship:

Y = X¹⁰

> x=seq(0,3,.01);y=x^10
> cor(x,y)
[1] 0.6610183
> NNS.dep(x,y,print.map = T)
$Correlation
[1] 0.9812511
$Dependence
[1] 0.9812511

Figure 7. Nonlinear positive relationship between two variables (X, Y).

4. MULTIDIMENSIONAL NONLINEAR ANALYSIS

To find the 1st order aggregate correlation for more than two dimensions, the method is similar to what was just presented. Instead of co- and divergent partial moments, we are going to substitute co- and divergent partial moment matrices into equation 5. An n x n matrix for each of the interactions (CLPM, DLPM, DUPM and CUPM), per Viole and Nawrocki (2012a), can be constructed and treated analogously to the direct partial moment computation.
Thus,

$$CLPM_{matrix}\left(0,\, h_x \ldots h_n,\, x \ldots n\right) = \begin{pmatrix} CLPM(0, h_x|h_x, x|x) & \cdots & CLPM(0, h_x|h_n, x|n) \\ \vdots & \ddots & \vdots \\ CLPM(0, h_n|h_x, n|x) & \cdots & CLPM(0, h_n|h_n, n|n) \end{pmatrix} \qquad (15)$$

Yielding,

$$\rho_{x \ldots n} = \frac{\left[CLPM_{matrix} - DLPM_{matrix} - DUPM_{matrix} + CUPM_{matrix}\right]\left(0,\, h_x \ldots h_n,\, x \ldots n\right)}{\left[CLPM_{matrix} + DLPM_{matrix} + DUPM_{matrix} + CUPM_{matrix}\right]\left(0,\, h_x \ldots h_n,\, x \ldots n\right)} \qquad (16)$$

whereby the final result will be an equal-sized n x n matrix,

$$\rho_{x \ldots n} = \begin{pmatrix} \rho_{xx} & \cdots & \rho_{xn} \\ \vdots & \ddots & \vdots \\ \rho_{nx} & \cdots & \rho_{nn} \end{pmatrix} = \begin{pmatrix} 1 & \cdots & \rho_{xn} \\ \vdots & \ddots & \vdots \\ \rho_{nx} & \cdots & 1 \end{pmatrix}$$

And for a completely nonlinear clustered dataset, where Y = undetermined f(X):

> cor(cluster.df[,3],cluster.df[,4])
[1] -0.6275592
> NNS.dep(cluster.df[,3], cluster.df[,4],print.map = T)
$Correlation
[1] -0.1020994
$Dependence
[1] 0.2637387

Figure 8. Nonlinear relationship between two variables (X, Y).
To derive the overall correlation, we need to sterilize the main diagonal of 1's (which represents each variable's self-correlation), per equation 17:

$$\rho_{x \ldots n} = \frac{\left[\sum \begin{pmatrix} \rho_{xx} & \cdots & \rho_{xn} \\ \vdots & \ddots & \vdots \\ \rho_{nx} & \cdots & \rho_{nn} \end{pmatrix}\right] - n}{n^2 - n} \qquad (17)$$

Again, if the variables are all below or above their respective targets at time t, the CLPM and CUPM matrices respectively will capture that information. If the variables are i.i.d., the likelihood that one variable would diverge at time t increases as n increases, reducing $\rho_{x \ldots n}$. Higher order partitioning proceeds by computing the four matrices for each of the identified subsets for all of the variables.
The target means from which the four partial moment matrices are calculated also serve as the basis for a nonlinear regression. By plotting all of the mean intersections, the linear segments will fit the underlying function nonparametrically. An increased order of partitioning will generate more intersecting points (a maximum of $4^{(N-1)}$) for a more granular analysis. Below is an example with 3rd order partitioning, generating a fit to linear data.

Figure 9. Nonparametric regression points for a linear relationship between (X, Y). Orders progress restricted to the previous partition boundary.

We can also perform this on nonlinear relationships. Below is an example with 3rd order partitioning, generating a fit to an exponential relationship between the variables.
Figure 10. Nonparametric regression points for a nonlinear relationship between (X, Y). As partition orders increase, the curve is better fit.

The nonlinear multiple regression can be performed in kind to the two-variable nonparametric local means regression; only the number of means has to be a factor of 4.

Generating a multiple variable nonlinear regression analysis requires creating a synthetic variable. This variable, X*, is the weighted average of all of the explanatory variables. The weighting is the nonlinear correlation derived from the n x n matrix, where the explanatory variables are on the same row as the dependent variable, which will have a 1.0 self-correlation. Thus, an explanatory variable with zero correlation to the dependent variable receives zero weight in X*.

$$X^* = \frac{\sum_{i=1}^{n}\left(\rho_{y,x_i}\right)\left(x_i\right)}{n} \qquad (18)$$

Figure 11 below is the nonlinear correlation matrix and the subsequent weightings for the multiple variable nonlinear regression using SPY as the dependent variable, with TLT, GLD, FXE, and GSG as explanatory variables.iv The data involved 100 daily observations from 5/8/12 through 9/27/12 for all variables. As shown in Viole and Nawrocki (2012c), partial moments asymptotically converge to the area of the function.

> NNS.cor(ReturnsDF,order=3)
            GSG         GLD         TLT        FXE         SPY
GSG  1.00000000 -0.10111213 -0.05050505 0.06070809  0.11111111
GLD -0.10111213  1.00000000  0.23232323 0.21212121  0.03030303
TLT -0.05050505  0.23232323  1.00000000 0.15151515 -0.23242629
FXE  0.06070809  0.21212121  0.15151515 1.00000000  0.23232323
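Equation 18 reduces to a correlation-weighted row average. A sketch in base R, where X is assumed to be a T x n matrix of explanatory variables and rho the dependent variable's column of the correlation matrix above (helper names illustrative):

synthetic.X <- function(X, rho) as.vector(X %*% rho) / length(rho)   # equation 18

set.seed(5)
X   <- matrix(rnorm(400), ncol = 4)                       # stand-in explanatory returns
rho <- c(0.11111111, 0.03030303, -0.23242629, 0.23232323) # SPY column of the matrix above
xstar <- synthetic.X(X, rho)                              # the synthetic regressor X*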
Again, there are no multicollinearity issues with the explanatory variables; it simply does not matter whether they are correlated or not. Below, in Figure 13, is the graph of this regression.

Figure 12. Our 9th order fit for a sine wave function of X.
Figure 13. Our 4th order fit for an undetermined function of X*.

These correlation and dependence measures are direct computations, just as other statistics such as means and variances. The obvious benefit is the ability to parse what was referred to as "noise" into valid information. Due to the fact that individual observations are weighted by (1/T), the number of observations in each segment will weigh the segment accordingly, thus affirming outlier observation status.

The purpose of this paper was to put forth a nonparametric, nonlinear correlation metric where Chen et al. (2010) note, "there is no commonly use statistic quantifying nonlinear correlation that can find a similarly generic use as Pearson's correlation coefficient for quantifying linear correlation." Our linear sum of the weighted micro correlations does indeed capture the aggregate correlation. But, unlike Pearson's single correlation coefficient, the underlying nonlinearity remains recoverable from the individual partial moment matrices. As for a direct policy statement resulting from the nonlinear regression analysis, it would have to assume the form of a conditional equation whereby each linear segment is defined for a specific range of the explanatory variable(s).
Autoregressive Modeling
ABSTRACT
Using component series from a given time series, we are able to demonstrate
forecasting ability with none of the requirements of the traditional ARMA method, while
strictly adhering to the definition of an autoregressive model. We also propose a new test
for seasonality using coefficient of variation comparisons for component series, and then
extend this proposed method to non-seasonal data. The resulting effect is that of
conditional heteroskedasticity on the forecast with more accurate forecasts derived from
implementing nonlinear regressions into the component series.
INTRODUCTION

Our goal is to forecast while strictly adhering to the above definition. We accomplish this by using a linear regression of like data points excluded from the total time series. For instance, in monthly data, we will examine the "January" data points autonomously to generate the ex ante "January" observation.

Testing for seasonality of each of the monthly classifications will alert us whether to incorporate other months' data in the linear regression. Through simple examples, we demonstrate forecasting without the traditional Box-Jenkins steps of:10

• Model Identification
• Model Estimation
• Diagnostic Testing
• Forecasting

We will also demonstrate how the ARIMA requirement of stationarity of the time series is rendered unnecessary.

10. http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc444.htm
In his 2008 article, Wang explains how to use Box-Jenkins models for forecasting. He uses an example of the quarterly electric demand in New York City from the first quarter of 1995 through the fourth quarter of 2005. Figure 1 clearly shows that the demand data are quarterly seasonal, trending upward; consequently, the mean of the data will change over time. We can define that a stationary time series has a constant mean and has no trend over time. A plot of the data is usually enough to see if the data are stationary. In practice, few time series can meet this condition, but as long as the data can be transformed into a stationary series, a Box-Jenkins model can be developed. As defined above, this time series is not stationary.

Our first step is to break the time series down into like classifications. In this example, first quarter data will be aggregated to form a first quarter time series. The vectors of observation number and sales are given below:

Observation number = {1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41}
Sales = {22.91, 23.39, 23.51, 23.97, 24.81, 25.37, 24.95, 26.21, 25.76, 25.91, 27.08}

Vectors for Quarters 2 through 4 will be created analogously using every fourth observation starting from the corresponding quarter number and the sales data.
[Chart: full quarterly sales series, observations 1 through 44.]

Figure 1. Recreation of data set from Wang [2008] based on quarterly electric demand in New York City from the first quarter of 1995 through the fourth quarter of 2005.

[Chart: QTR 1 sales series, observations 1, 5, 9, ..., 41.]

Figure 2. First quarter series isolated from original time series.
In order to test for seasonality, outside of the recommended "eyeball test" of the plotted data, we propose another method. If each of the quarterly series' coefficient of variation (σ/μ) is less than the total sample coefficient of variation, seasonality exists. In our working example, the standard deviations and means are presented in table 1 below.

        Full Sample   QTR 1      QTR 2      QTR 3      QTR 4
σ       4.589798      1.261198   1.313679   3.632291   1.306242
μ       26.23295      24.89727   22.47545   33.09091   24.46818
σ/μ     0.174963      0.050656   0.058449   0.109767   0.053385

Table 1. Standard deviations and means for full sample vs. each quarterly series. The coefficient of variation (σ/μ) is less than the sample for all component series, indicating seasonality present in the data.

In the monthly time series from 1/2000 through 5/2013 for the S&P 500,11 we find the total coefficient of variation to equal 0.158665526 with the "January" series coefficient of variation equal to 0.16710549, thus negating the seasonality consideration (and enabling the incorporation of other months' data per the method below).

In order to adhere to the autoregressive definition provided in the introduction, we need to use a linear regression on the prior values of a variable. We have just created a subset of those values with like classifications to perform the regression. Figure 3 below is the linear regression of the QTR 1 series. The regression equation is

y = 0.0961x + 22.878

Thus, our estimate for the next QTR 1 observation (the 45th observation overall)12 is

y = 0.0961(45) + 22.878 = 27.203

This is fairly close to the Box-Jenkins model result provided in Wang [2008] of 27.40. Again, we have lost no observations due to differencing in order to transform the data into a stationary series. Aside from the nonstationarity of the quarterly series, we note the linear approximation of the data as evidenced by the high $R^2$ of 0.9297. This linearity is not necessary, as will be discussed later when we introduce the nonlinear regression method to the discussion.

11. Plots of total and monthly series are in the Appendix.
12. The same series can be regressed on its own index, for this example (1:11).
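Both steps are reproducible in a few lines of base R. The object names are ours; note the table's σ is the population standard deviation.

# QTR 1 component of the Wang [2008] quarterly demand series.
obs  <- seq(1, 41, by = 4)
qtr1 <- c(22.91, 23.39, 23.51, 23.97, 24.81, 25.37,
          24.95, 26.21, 25.76, 25.91, 27.08)

# Seasonality test: component coefficient of variation vs. full sample.
cv <- function(x) sqrt(mean((x - mean(x))^2)) / mean(x)  # population sd / mean
cv(qtr1)   # 0.050656, well under the full-sample 0.174963: seasonality

# Linear regression on prior values, then the ex ante 45th observation.
fit <- lm(qtr1 ~ obs)                          # y = 0.0961x + 22.878
predict(fit, newdata = data.frame(obs = 45))   # 27.203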
We extend the analysis to all four quarter series and generate the forecasts based on the linear regression of each series in figure 4 below. You will note the overall pattern of the original series is preserved in the 50 period forecast of figure 5.

Figure 4. All quarterly plots with associated linear regressions and estimates for each quarterly series.

Figure 5. 50 period forecast using static 4 period lag and linear regression.
IV. CONDITIONAL HETEROSKEDASTICITY

We noted earlier that under seasonality of the data, it is a simple regression of the component series to generate a forecast. However, under the absence of perfect seasonality this is not the case. When a single seasonal period is not identified, we use a weighted average of all identified seasonal components.

Figure 6 illustrates the seasonal components to the Wang [2008] quarterly time series (data provided in Appendix). Note the strong seasonal presence in periods 4 and 8. In this example, we perform 8 component regressions and the forecast output weights are determined by summing the inverses of each period's coefficient of variation.

Period)  Intercept   + β          (t+1)  =  Forecast
2)       24.6275325  + 0.3797007  (23)   =  33.36065
3)       23.1120879  + 0.3990549  (15)   =  29.09791
4)       22.5900000  + 0.3845455  (12)   =  27.20455
6)       23.8742860  + 1.2560710  (8)    =  33.92286
7)       25.8746667  + 0.0391429  (7)    =  26.14867
8)       22.7860000  + 0.7280000  (6)    =  27.15400
10)      20.0750000  + 2.9450000  (5)    =  34.80000
11)      23.1100000  + 0.9990000  (5)    =  28.10500

Forecast   ×  Averaged Output Weight  =  Weighted Forecast
33.36065   ×  0.218622275             =  7.293381202
29.09791   ×  0.126095125             =  3.669104596
27.20455   ×  0.172320331             =  4.687897049
33.92286   ×  0.114071942             =  3.869646502
26.14867   ×  0.077667561             =  2.030903410
27.15400   ×  0.127909485             =  3.473254147
34.80000   ×  0.097805910             =  3.403645660
28.10500   ×  0.065507373             =  1.841084715

Weighted Forecast Sum = 30.269

The weighting reflects each component series and its coefficient of variation. Again, it should be reserved for instances of truly unknown seasonal periods, and be more effective than a single seasonal autoregressive model.

NONLINEAR REGRESSION

Perhaps the stipulation of a linear regression was due to the time in which the models were derived? Regardless, we can use a nonlinear regression method to derive more accurate forecasts than the stipulated linear regression. This option will handle the nonlinearity of the component series. So even if the data for the component series resembles the sine wave function as in figure 7 below (we are highlighting the nonlinearity of the data; stationarity is irrelevant), we will be able to generate a more accurate series forecast. We can see that the linear regression would suggest a positive data point (in green), yet the nonlinear regression based on partial moments from Viole and Nawrocki [2012] would suggest a decidedly negative observation for their forecasts.

Figure 7. Nonlinear regression on a hypothetical component series used to highlight the inadequacy of a linear regression for forecasting even component series, let alone total series.
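The weighting arithmetic takes a few lines of base R. The vectors are transcribed from the tables above; in general the weights are the normalized inverses of each component's coefficient of variation.

# Component forecasts and their normalized inverse-CV weights, as tabulated.
forecasts <- c(33.36065, 29.09791, 27.20455, 33.92286,
               26.14867, 27.15400, 34.80000, 28.10500)
weights   <- c(0.218622275, 0.126095125, 0.172320331, 0.114071942,
               0.077667561, 0.127909485, 0.097805910, 0.065507373)

# In general: weights <- (1 / cv) / sum(1 / cv), so the most stable
# components carry the most weight in the forecast.
sum(weights)              # ~1
sum(forecasts * weights)  # 30.269, the weighted forecast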
We have closely approximated the results from a Box-Jenkins method with an autoregressive model with no stationarity requirement, no model identification, and capable of handling nonlinearity. The absence of requirements and the retention of all of the original data is a promising starting point to adhere to the definition of the process.

We have also introduced a method of detecting seasonality in time series data. This technique can be used in conjunction with existing methods to confirm the results found in tests with normalized data (typically autocorrelation plots of differenced data). In the absence of seasonality, we offer a simple procedure for giving equal representation of other component variance which typically influences the component series via conditional heteroskedasticity.

The Wang [2008] quarterly demand data referenced above:

Obs #   Value     Obs #   Value
1       22.90     23      33.36
2       20.63     24      23.50
3       28.85     25      24.95
4       22.97     26      22.22
5       23.39     27      34.81
6       20.65     28      24.64
7       30.02     29      26.21
8       23.13     30      23.45
9       23.51     31      31.85
10      22.99     32      25.28
11      32.61     33      25.76
12      23.28     34      22.88
13      23.97     35      34.02
14      21.48     36      25.80
15      27.39     37      25.91
16      23.75     38      24.07
17      24.81     39      36.60
18      21.51     40      26.43
19      33.20     41      27.08
20      23.68     42      24.99
21      25.37     43      41.29
22      22.36     44      26.69
[Appendix charts: S&P 500 monthly series and component series plots.]

Figure 1A. S&P 500 monthly returns 1/2000 – 5/2013.

APPLES TO APPLES
Abstract

Normalization is the preferred technique for aligning and then comparing various data sets. However, this technique often loses the variance properties associated with the underlying distributions. The results are catastrophic on continuous variables, such that they are effectively transformed into discrete variables. We propose a new method of normalization that improves upon linear scaling by using the nonlinear association metrics of Viole and Nawrocki [2012a] and Viole and Nawrocki [2012b]. We then compare these normalized data sets using our proposed nonlinear scaling method.

INTRODUCTION

In essence, the typical linear scaling method assumes a linear relationship between the variables being normalized.
METHODS

Linear Scaling

Linear scaling uses each set as a reference once, then averages all of the iterations. This way the original series for all sets is considered in the final normalization. The Genomics and Bioinformatics Group of the NIH describe the linear scaling process:13

In practice, for a series of chips, define normalization constants $C_1, C_2, \ldots,$ by

$$C_1 = \sum_{s} f_1^{s}, \quad C_2 = \sum_{s} f_2^{s}, \quad \ldots,$$

where the numbers $f_i^{s}$ are the fluorescent intensities measured for each probe on chip i. Select a common total intensity K (e.g. the average of the $C_i$'s). Then to normalize all the chips to the common total intensity K, divide all fluorescent intensity readings from chip i by $C_i$, and multiply by K.

Quantile normalization instead maps each chip's value through the reference chip's distribution of probe intensities; they then transform the original value to that quantile's value on the reference chip. In a formula, the transform is

$$x' = F_{ref}^{-1}\left(F_i(x)\right), \qquad (1)$$

where $F_i$ is the distribution function of chip i, and $F_{ref}$ is the distribution function of the reference chip. A quick illustration of such normalizing on a very small dataset:14

Arrays 1 to 3, genes A to D

A  5  4  3
B  2  1  4
C  3  4  6
D  4  2  8

For each column determine a rank from lowest to highest and assign number i-iv

A  iv   iii  i
B  i    i    ii
C  ii   iii  iii
D  iii  ii   iv

Sorting each column, the result is:

A  5  4  3   becomes   A  2  1  3
B  2  1  4   becomes   B  3  2  4
C  3  4  6   becomes   C  4  4  6
D  4  2  8   becomes   D  5  4  8

13. http://discover.nci.nih.gov/microarrayAnalysis/Affymetrix.Preprocessing.jsp
14. http://en.wikipedia.org/wiki/Quantile_normalization
Now find the mean for each row to determine the ranks

A  (2 1 3)/3 = 2.00 = rank i
B  (3 2 4)/3 = 3.00 = rank ii
C  (4 4 6)/3 = 4.67 = rank iii
D  (5 4 8)/3 = 5.67 = rank iv

Now take the ranking order and substitute in the new values:

A  iv   iii  i              A  5.67  4.67  2.00        A  5  4  3
B  i    i    ii   becomes   B  2.00  2.00  3.00   (Original: B  2  1  4)
C  ii   iii  iii            C  3.00  4.67  4.67        C  3  4  6
D  iii  ii   iv             D  4.67  3.00  5.67        D  4  2  8

These are the new normalized values. The new values have the same distribution and can now be easily compared.

OUR PROPOSED METHOD

The nonlinear association between variables is an important metric. It is also quite new to the literature. Chen et al. [2010] propose a method by using a rank transformation on the underlying data, while Viole and Nawrocki [2012b] propose a method based on the partial moments of the underlying data. VN will be the method employed for this analysis.

We define the amount of nonlinear association present between two variables as

$$\eta(X,Y) = |\rho_{CLPM}| + |\rho_{CUPM}| + |\rho_{DLPM}| + |\rho_{DUPM}| \qquad (2)$$

Where,

Co-Partial Moments

$$CLPM\left(n, h_x|h_y, X|Y\right) = \frac{1}{T}\sum_{t=1}^{T}\left(\max\{0, h_x - X_t\}^n \cdot \max\{0, h_y - Y_t\}^n\right) \qquad (3)$$

$$CUPM\left(q, l_x|l_y, X|Y\right) = \frac{1}{T}\sum_{t=1}^{T}\left(\max\{0, X_t - l_x\}^q \cdot \max\{0, Y_t - l_y\}^q\right) \qquad (4)$$

n is the degree of the LPM, q is the degree of the UPM, $h_x$ is the target for computing below target observations for X, and $l_x$ is the target for computing above target observations for X.

Divergent Partial Moments

$$DUPM\left(n|q, h_x|l_y, X|Y\right) = \frac{1}{T}\sum_{t=1}^{T}\left(\max\{0, h_x - X_t\}^n \cdot \max\{Y_t - l_y, 0\}^q\right) \qquad (5)$$

(DLPM is defined analogously, with the deviation roles of X and Y reversed.) Nonlinear relationships need not be associated with maximum nonlinear correlation readings $\rho_{x,y} = 1$ or $-1$. Thus the use of dependence is more aptly defining the nonlinear association between variables. For a complete treatment on nonlinear correlations and associations please see Viole and Nawrocki [2012b].
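Returning to the quantile normalization walk-through above, the procedure takes a few lines of base R. The matrix and helper names are ours.

# Arrays 1 to 3 (columns), genes A to D (rows), as above.
m <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), paste0("Array", 1:3)))

quantile.normalize <- function(m) {
  ranks <- apply(m, 2, rank, ties.method = "min")  # column ranks i-iv
  means <- rowMeans(apply(m, 2, sort))             # mean of each sorted row
  apply(ranks, 2, function(r) means[r])            # substitute rank values
}

quantile.normalize(m)
# A 5.67 4.67 2.00 ; B 2.00 2.00 3.00 ; C 3.00 4.67 4.67 ; D 4.67 3.00 5.67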
Definition of Variable Relationships:

Equation 2 describes the amount of nonlinearity present when the negative correlations (D-PM's) are equal in frequency or magnitude (depending on degree 0 or 1 respectively) to the positive correlations (C-PM's). The nonlinear correlation between two variables is given by

$$\rho = \frac{CLPM\left(0, h_x|h_y, x|y\right) - DLPM\left(0, h_x|h_y, x|y\right) - DUPM\left(0, h_x|h_y, x|y\right) + CUPM\left(0, h_x|h_y, x|y\right)}{CLPM\left(0, h_x|h_y, x|y\right) + DLPM\left(0, h_x|h_y, x|y\right) + DUPM\left(0, h_x|h_y, x|y\right) + CUPM\left(0, h_x|h_y, x|y\right)} \qquad (6)$$

Figure 1 below illustrates the process for a 2 gene and a 4 gene example. Each gene has the desired property of serving as the reference gene (RG) in the process once. This consideration is identical to the standard linear scaling technique. From each RG's total intensity, we derive the RG factor for each gene to the RG. Simple enough. However, we then multiply each gene's observations by the RG factor and the nonlinear association.
We repeat this process with every gene serving as the RG and then average all of
the RG factored observations for each gene. The result is a fully normalized distribution
for each gene with variance retention of the original data set.
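Equation (6) maps directly onto the partial moment functions used in this book's appendix. The following is a minimal degree 0 sketch with mean targets, assuming Co.LPM and D.LPM share the Co.UPM/D.UPM signatures shown there.

library(NNS)

nl.cor <- function(x, y) {
  hx <- mean(x); hy <- mean(y)
  clpm <- Co.LPM(0, 0, x, y, hx, hy)  # both below target
  cupm <- Co.UPM(0, 0, x, y, hx, hy)  # both above target
  dlpm <- D.LPM(0, 0, x, y, hx, hy)   # x above target, y below
  dupm <- D.UPM(0, 0, x, y, hx, hy)   # x below target, y above
  (clpm - dlpm - dupm + cupm) / (clpm + dlpm + dupm + cupm)
}

set.seed(1)
x <- rnorm(100)
nl.cor(x,  2 * x)   #  1: co-movements only
nl.cor(x, -2 * x)   # -1: divergent movements only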
We now present the results of this method on four financial variables: SPY, TLT, GLD, and FXE. The nonlinear association between self and cross financial time-series is well noted. This is an important test since, while gene distributions are roughly similar, these financial distributions differ considerably.

Figure 3 below illustrates the results. Our method visually represents the original data set more clearly and also retains the finite moment relationships that the linear scaling method enjoys. We note the strong influence the nonlinear association has on the normalized series, as SPY is distinct due to its very low correlation to any of the other time series. Thus, the more correlated the series are, the lower the variance of the normalized population.

When distributions do not intersect, the quantile ranks remain static and the normalized value is simply the mean. This is exemplified below with the financial variables. Obviously this is not an issue with gene arrays; however, it speaks to the ad hoc nature of the method. We see in Figure 3 below that quantile normalization does succeed in creating the same distribution for all of the variables. Figure 4 below removes the orders of magnitude differences from the MZM ($ scale), S&P 500 (point scale) and the US 10 Year Yield (% scale).
[Figure 4 charts: "Unnormalized Data" with MZM and S&P 500 on the left axis and 10 Yr Yield (%) on the right axis, 1959-2009; "Nonlinear Scaling" with all three series on a single shared axis, 1959-2011.]
Figure 4. Orders of magnitude differences removed from 3 financial variables.
DISCUSSION

Note the tighter overall distribution from our method versus the linear scaling method. Also note the variance properties of each of the distributions versus the quantile normalization. We are tighter and more representative of the original data set for similar distributions. When the distributions vary considerably, the nonlinear association captures the differences between variables. This characteristic is lost via its use as the normalizing factor in the linear scaling technique. Factoring the nonlinear association between variables is imperative in noting the nonlinear differences. Moreover, if the variable relationship is linear, our method retains the relationship between variables!

Bolstad et al. [2003] note,

"The four baselines shifted slightly lower in the intensity scale give the most precise estimates. Using this logic, one could argue that choosing the array with the smallest spread and centered at the lowest level would be the best, but this does not seem to treat the data on all arrays fairly."

Our method does treat all of the data on all of the arrays fairly. We use each array as a RG and utilize its nonlinear association (which uses all observations equally) with all of the other arrays.

Abstract

Analysis of Variance (ANOVA) is a statistical method used to determine whether a sample originated from a larger population distribution. We provide an alternate method of determination using the continuous cumulative distribution functions derived from degree one lower partial moment ratios. The resulting analysis is performed with no restrictive assumptions on the underlying distribution or the associated error terms.
INTRODUCTION
After computing the sum of squares for the total, treatment, and errors, we then obtain the P-value corresponding to the computed F-ratio of the mean squared values. If the P-value is small (large F-ratio), we can reject the null hypothesis that all means are the same for the different samples. However, the distributions of the residuals are assumed to be normal, and this normality assumption is critical for P-values computed from the F-distribution to be meaningful. Instead of using the ratio of variability between means to the variability within samples, we use known distributional facts from samples to deduce a level of certainty that multiple samples originated from the same population, without any of the restrictive assumptions.
When using one-way analysis of variance, the process of looking up the resulting value of F in an F-distribution table is proven to be reliable under the following assumptions.15 The assumption that the groups follow the normal curve is the usual one made, and its violation is one reason that ANOVA may give incorrect results. It would be wise to consider whether it is reasonable to believe that the groups' distributions follow the normal curve. Of course, different population averages impose no restriction on the use of ANOVA; the null hypothesis, as usual, allows us to do the computations that yield F. The third assumption, that the populations' standard deviations are equal, is important in principle, and it can only be approximately checked by using the sample standard deviations as bootstrap estimates. In practice, statisticians feel safe in using ANOVA if the largest sample SD is not larger than twice the smallest.

Viole and Nawrocki [2012a] offer a detailed examination of CDFs and PDFs of various families of distributions represented by partial moments. They find that the continuous degree 1 LPM ratio,

$$LPM\,ratio(1, \mu, X) = \frac{LPM(1, \mu, X)}{LPM(1, \mu, X) + UPM(1, \mu, X)},$$

is .5 from the mean of the sample, with no deviations across distribution types; a deviation from .5 therefore speaks to whether a sample belongs to that population.

Where,

$$LPM(n, h, x) = \frac{1}{T}\sum_{t=1}^{T}\max\{0, h - x_t\}^n \qquad (2)$$

$$UPM(q, l, x) = \frac{1}{T}\sum_{t=1}^{T}\max\{0, x_t - l\}^q \qquad (3)$$

where $x_t$ represents the observation x at time t, n is the degree of the LPM, q is the degree of the UPM, h is the target for computing below target returns, and l is the target for computing above target returns. $h = l = \mu$ throughout this paper.

Tables 1 through 4 illustrate the consistency of the degree 1 LPM ratio across distribution types.

15. http://math.colgate.edu/math102/dlantz/examples/ANOVA/anovahyp.html
[Chart: discrete LPM(0, μ, X) vs. continuous LPMratio(1, μ, X) estimates as observations increase from 10 to 2,908.]

Figure 1. Differences in discrete LPM(0, μ, X) and continuous LPMratio(1, μ, X) CDFs converge when using the mean target for a Normal distribution. LPM(0, μ, X) ≠ LPMratio(1, μ, X).

Normal Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds
Norm Prob(X ≤ 0.00) = .3085     LPM(0, 0, X) = .3085       LPM(1, 0, X) = .2208
Norm Prob(X ≤ 4.50) = .3917     LPM(0, 4.5, X) = .3917     LPM(1, 4.5, X) = .3339
Norm Prob(X ≤ Mean) = .5        LPM(0, μ, X) = .5          LPM(1, μ, X) = .5
Norm Prob(X ≤ 13.5) = .5694     LPM(0, 13.5, X) = .5694    LPM(1, 13.5, X) = .608

Table 1. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Normal distribution. Bold estimate is the continuous LPMratio(1, μ, X) = .5.

Table 2. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Uniform distribution. Bold estimate is the continuous LPMratio(1, μ, X) = .5.

Poisson Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds
POIDF(X ≤ 0.00) = .00005        LPM(0, 0, X) = 0           LPM(1, 0, X) = 0
POIDF(X ≤ 4.50) = .0293         LPM(0, 4.5, X) = .0293     LPM(1, 4.5, X) = .0051
POIDF(X ≤ Mean) = .5151         LPM(0, μ, X) = .5151       LPM(1, μ, X) = .5
POIDF(X ≤ 13.5) = .8645         LPM(0, 13.5, X) = .8645    LPM(1, 13.5, X) = .9365

Table 3. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Poisson distribution. Bold estimate is the continuous LPMratio(1, μ, X) = .5.

Chi-Squared Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds
CHIDF(X ≤ 0) = 0                LPM(0, 0, X) = 0           LPM(1, 0, X) = 0
CHIDF(X ≤ 0.5) = .5205          LPM(0, 0.5, X) = .5205     LPM(1, 0.5, X) = .2087
CHIDF(X ≤ 1) = .6827            LPM(0, 1, X) = .6827       LPM(1, 1, X) = .5
CHIDF(X ≤ 5) = .9747            LPM(0, 5, X) = .9747       LPM(1, 5, X) = .989

Table 4. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Chi-Squared distribution. Bold estimate is the continuous LPMratio(1, μ, X) = .5.
We propose using the mean absolute deviation (MAD) from 0.5 of the LPM ratios for the samples in question. This result compared to the ideal 0.5 will then answer the ANOVA inquiry whether the samples originated from the same population, yielding our measure of certainty ρ associated with the null hypothesis that the samples came from the same population:

$$\rho = \left(\frac{0.5 - MAD}{0.5}\right)^2$$

The next section will provide some visual confirmation of this methodology with hypothetical sample distributions.

Figure 1 below illustrates 3 hypothetical sample distributions. The dotted lines are the sample means $\mu_i$, which we know have an associated LPMratio(1, $\mu_i$, X) = .5. The solid black line is the mean of means $\bar{\mu}$, and the associated LPM ratio deviations from 0.5 can be visually estimated: …, 0.51, and 0.48 for blue, purple and green respectively. The mean absolute deviation from .5 is equal to .0167. Thus we are certain (ρ = 0.934) these 3 samples are from the same population. The classic ANOVA would reach the same conclusion even at P value < .01.

Figure 1. 3 samples from the same population.

Figure 2 below illustrates 3 hypothetical sample distributions, only more varied than the previous example. The dotted lines are the sample means $\mu_i$, which we know have an associated LPMratio(1, $\mu_i$, X) = .5. The solid black line is the mean of means $\bar{\mu}$, and the associated LPM ratio deviations from 0.5 can be visually estimated: …, 0.63, and 0.2 for blue, purple and green respectively. The mean absolute deviation from .5 is equal to .1933. Thus we are not certain (ρ = 0.376) these 3 samples are from the same population. The null hypothesis of a same population was rejected by classic ANOVA as well.

Figure 3 below illustrates 3 hypothetical sample distributions, only more varied than the previous example. The dotted lines are the sample means, which we know have an associated LPMratio(1, μ, X) = .5. The solid black line is the mean of means $\bar{\mu}$, and the associated LPM ratio deviations from 0.5 can be visually estimated: …, 0.63, and 0.01 for blue, purple and green respectively. The mean absolute deviation from .5 is equal to .2567. Thus we are more certain (ρ = 0.2368) than the previous example that these samples did not originate from the same population.
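A minimal sketch of this certainty measure, combining the LPM ratio above with the mean-of-means target; the function names are ours.

library(NNS)

lpm.ratio <- function(x, target) {
  LPM(1, target, x) / (LPM(1, target, x) + UPM(1, target, x))
}

anova.certainty <- function(samples) {
  grand.mean <- mean(sapply(samples, mean))        # mean of sample means
  ratios <- sapply(samples, lpm.ratio, target = grand.mean)
  mad <- mean(abs(ratios - 0.5))                   # mean absolute deviation from .5
  ((0.5 - mad) / 0.5)^2                            # certainty rho
}

set.seed(1)
same <- list(rnorm(1000, 10), rnorm(1000, 10), rnorm(1000, 10))
diff <- list(rnorm(1000, 8),  rnorm(1000, 10), rnorm(1000, 12))
anova.certainty(same)  # close to 1
anova.certainty(diff)  # well below 1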
In the previous sections, we identified whether a difference exists and demonstrated how to assign a measure of uncertainty to our data. We focus now on how to ascertain the size of the difference present. The use of confidence intervals is often suggested as a method to evaluate effect sizes. Our methodology assigns the interval to the effect directly.

The first step is to derive a sample mean for which we would be 95% certain the sample mean belongs to the population. We calculate the lower 2.5% of the distribution with an LPM test at each point to identify the inverse, akin to a value-at-risk derivation. We perform the same on the upper portion of the distribution with a UPM test. This two sided test results in a negative deviation from the population mean ($\mu^{*-}$) and a corresponding positive deviation from the mean ($\mu^{*+}$). It is critical to note that this is not necessarily a symmetrical deviation, since any underlying skew will alter the CDF. The effect size then is simply the difference between the observed mean ($\mu$) and a certain mean associated within a tolerance either side of the population mean ($\mu^{*-}$ and $\mu^{*+}$):

$$(\mu - \mu^{*-}) \leq effect \leq (\mu - \mu^{*+}).$$

Viole and Nawrocki [2012c] define the asymptotic properties of partial moments to the area of any f(x). Thus, it makes intuitive sense that increased quantities of samples and observations will provide a better approximation of the population. Given this truism, the degrees of freedom do not properly compensate the number of observations. Consider the magnitude effect on the F-values.

2 distributions and 3 distributions with 30 observations each:
F.05(1, 59df) = 4.004        F.05(2, 88df) = 3.100

2 distributions and 3 distributions with 100 observations each:
F.05(1, 199df) = 3.888       F.05(2, 298df) = 3.026

Any number of distribution tests can be performed. For example, if 15 samples are all drawn from the same population, then there are 105 possible comparisons to be made, leading to an increased type-1 error rate. The mean absolute deviation for 2 distributions' LPMratio(1, $\bar{\mu}$, X) would have to be > 0.025 to be less than 95% certain (0.475/.5) the distributions came from the same population. This translates to a substantial percentage difference in means. It is not hard to visualize such an extreme scenario such as Figure 4 below.
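Returning to the two sided LPM/UPM derivation above, the following is a simplified value-at-risk style sketch on a single distribution, using degree 0 partial moments as the empirical CDF. The function name and search strategy are ours, not the book's exact routine.

library(NNS)

# Locate the lower and upper 2.5% points of a distribution via degree 0
# partial moments, yielding asymmetric deviations (mu*-, mu*+) from the mean.
interval.95 <- function(x, tol = 0.025) {
  grid  <- sort(x)
  cdf   <- sapply(grid, function(t) LPM(0, t, x))  # LPM(0,t,x) = P(X <= t)
  lower <- grid[max(which(cdf <= tol))]
  upper <- grid[min(which(cdf >= 1 - tol))]        # equivalently a UPM test
  c(mu.star.minus = lower - mean(x), mu.star.plus = upper - mean(x))
}

set.seed(1)
interval.95(rchisq(2000, df = 3))   # skew makes the deviations asymmetric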
Again, we have no assumptions on the data to generate this analysis, and we compensate for any deviation from normality either in the distribution of returns or the distribution of error terms. We substitute our level of certainty ρ for an F-test and associated P-value based ANOVA; the latter has been the subject of increasing debate recently and should probably be avoided.16

16. http://news.sciencemag.org/sciencenow/2009/10/30-01.html?etoc
    http://www.sciencenews.org/view/feature/id/57091/description/Odds_Are_Its_Wrong

CAUSATION

Abstract

INTRODUCTION

Correlation does not imply causation. We have known this to be the case for quite some time, yet there remains the suspicion that correlation and causation are entwined…but how? Fischer Black [1984] offers multiple normative cases explaining how causality can only be argued through conditional probability.
CORRELATION(X,Y)

Correlation is a reciprocal relationship between two things.

Conditional Probability: The probability that an event will occur, given that one or more other events have occurred.

Conditional probability is not necessarily a reciprocal relationship between two things. This distinction is critical in what follows.

HISTORICAL CAUSALITY TESTS

GRANGER CAUSALITY

Granger causality (GC) measures whether one event (X) happens before another event (Y) and helps predict it. According to Granger causality, past values of X should contain information that helps predict Y better than a prediction based on past values of Y alone. The formulation is based on a linear regression modeling of stochastic processes. This technique immediately raises some well documented concerns, namely, linearity, stationarity and of course the appropriate selection of variables. Any proposed substitute should be able to address these basic data set concerns.

Sugihara et al. [2012] propose a method for detecting causation in ecological time series called convergent cross mapping (CCM). They demonstrate the principles of their approach with simple model examples, showing that the method distinguishes species interactions (X, Y) from the effects of shared driving variables (Z). Attractor reconstruction is used to determine if two time series variables belong to the same dynamic system and are thus causally related. CCM uses the historical record of Y to estimate the states of X and vice versa. With longer time series the reconstructed manifolds are denser, nearest neighbors are closer, and the cross map estimates increase in precision. This convergence is used as a practical criterion for determining causation, further exposed by measuring the extent to which the historical record of Y values can reliably estimate states of X. CCM hypothesizes that this reliable estimate holds only if X is causally influencing Y.

"In dynamical systems theory, time-series variables (say, X and Y) are causally linked if they are from the same dynamic system (Dixon et al. [1999], Takens [1981], Deyle et al. [2011])—that is, they share a common attractor manifold M." Sugihara et al. [2012]

Figure 1 is a reproduction from their paper illustrating the manifold relationship.

Figure 1. Manifold relationship from Sugihara et al. [2012].
Separability Requirement

Sugihara et al. note the key requirement of GC is separability, namely that information about a causative factor is independently unique to that variable. Conditional probabilities are not restricted to these specific characteristics. Separability reflects the view that systems can be understood a piece at a time rather than as a whole. By contrast, our method considers the system as a whole, normalizing the variables with a nonlinear scaling method. It also avoids the Granger problems of reverse causality, since the Venn areas (conditional probabilities) would have to be identical in both directions.

The first step in our method is to normalize the variables in order to determine the conditional probability. Reverse causality through conditional probability is controlled quite easily; in fact, this is the main argument of Black [1984]. To determine the conditional probability, we need a shared histogram for variables X and Y. This is not at all dissimilar to the approach in the convergent cross map technique, with the common attractor manifold for the original system M used to describe $M_x$ and $M_y$.
The normalized variables retain their variance and other finite moment characteristics. This is important to accurately derive the conditional probability of the new normalized variables. This is also critical in addressing the nonlinearity between variables where GC fails.

The CCM manifolds $M_x$ and $M_y$ are constructed from lagged coordinates of the time series variables to retain past information. We accomplish the retention of lagged information via the normalization of each variable against lagged values of itself (τ and 2τ), resulting in normalized variables X′ and Y′. We then normalize X′ and Y′ to each other via the VN process of nonlinear scaling to generate the shared histogram, resulting in X′′ and Y′′.

3) Derive the conditional probabilities. Using the partial moments of each of the resulting distributions will allow us to derive the conditional probabilities of the normalized variables.

$$LPM(n, h, X) = \frac{1}{T}\sum_{t=1}^{T}\max\{(h - X_t)^n, 0\} \qquad (1)$$

$$UPM(q, l, X) = \frac{1}{T}\sum_{t=1}^{T}\max\{(X_t - l)^q, 0\} \qquad (2)$$

Where $X_t$ is the observation of variable X at time t, h and l are the targets from which to compute the lower and upper deviations respectively, and n and q are the degrees of the LPM and UPM respectively.
CONDITIONAL PROBABILITIES

We illustrate how the partial moment ratios can also emulate conditional distribution areas from which the LPM and UPM can be observed. The conditional probability P(Y|X) = 1 is reconstructed as normalized distributions; the following degree 0 partial moment relationships will yield the conditional probability of Y′′ given X′′.

[Figure 3 chart: normalized distributions of X′′ and Y′′ with targets a, c, b, d marked.]

Figure 3. Normalized Data Sets P(Y′′|X′′) = 1.

Per figure 3 above, given the conditional probability P(Y|X) = 1, and if a positive correlation exists such that measured increases (decreases) in X result in measured increases (decreases) in Y (correlation $\rho_{X,Y} = 1$), we can state definitively that X causes Y.

C(X → Y) = P(Y|X) · ρ_{X,Y}
C(X → Y) = 1 · 1
C(X → Y) = 1

A conditional probability alone, however, tells us nothing about the relationship between them; in fact, if the correlation is negative we could state that X cures Y! We assume (know) this to not be the case, but it illustrates the necessity to define the relationship between X and Y further than just their conditional probability.

[Figure chart: Venn diagram of areas X, Y and Z.]

The reciprocal case does not necessarily hold, as we can see from the figure above.

ADDITIVITY OF CAUSATION

$$\sum_{i=1}^{n} C(X_i \to Y) = 1 \qquad (4)$$

The conditional probability P(Y|X) ≈ .85 is reconstructed as normalized distributions.

Figure 5. Normalized Data Sets P(Y′′|X′′) ≈ .85.

If the correlation between variables X and Y is the same as our theoretical assumption of $\rho_{X,Y} = 1$:

C(X → Y) = .85 · 1
C(X → Y) = .85

Then by the additive assumption, there exist other variable(s) to explain the causation of Y for the remaining 0.15, while factoring their specific correlations as well. It should be noted that identifying all of the independent variables to satisfy equation 4 is nearly impossible in the social sciences, and is a prominent argument in Black [1984].17
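A minimal sketch of the causation calculation with degree 0 partial moments and mean targets follows; it emulates, rather than reproduces, the normalized-histogram construction in the text, and the function name is ours. The Co.LPM/D.LPM signatures are assumed to match the Co.UPM/D.UPM calls in this book's appendix.

library(NNS)

# C(X -> Y) = P(Y|X) * rho(X,Y), per the example above.
causation <- function(x, y) {
  hx <- mean(x); hy <- mean(y)
  clpm <- Co.LPM(0, 0, x, y, hx, hy)  # both below target
  cupm <- Co.UPM(0, 0, x, y, hx, hy)  # both above target
  dlpm <- D.LPM(0, 0, x, y, hx, hy)   # x above target, y below
  dupm <- D.UPM(0, 0, x, y, hx, hy)   # x below target, y above
  rho <- (clpm - dlpm - dupm + cupm) / (clpm + dlpm + dupm + cupm)
  p.y.given.x <- cupm / UPM(0, hx, x) # P(Y above target | X above target)
  p.y.given.x * rho
}

set.seed(1)
x <- rnorm(500); y <- x + rnorm(500, sd = 0.25)
causation(x, y)   # near 1: high conditional probability times high rho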
Bayes' theorem will also generate the conditional probability of X given Y, P(X|Y), with the formula

$$P(X|Y) = \frac{P(Y|X)\,P(X)}{P(Y)}.$$

In partial moments,

$$P(X|Y) = \frac{\dfrac{CUPM\left(0|0, c|a, Y|X\right)}{UPM(0, a, X)} \cdot UPM(0, a, X)}{UPM(0, c, Y)}$$

Cancelling out P(X) leaves us with Bayes' theorem represented by partial moments, and our conditional probability on the right side of the equality. Where e is the minimum value target of area (distribution) Z, just as a and c are for areas X and Y respectively.

We first replicate the Sugihara et al. sardine – anchovy – SST example with our method. We then apply our method to the S&P 500 – 10 Year Treasury Yield – Money Supply relationship.
Sugihara et al. Sardine – Anchovy – SST Example Replication

Sugihara et al. examine the relationship among Pacific sardine landings, northern anchovy landings, and sea surface temperature (SST). Figure 7 below, reproduced from Sugihara et al. panel C, shows the California landings of Pacific sardine and northern anchovy, while panels D to F show the CCM (or lack thereof) of sardine versus anchovy, sardine versus SST, and anchovy versus SST respectively. Sugihara et al. contend this shows that sardines and anchovies do not interact with each other and that both are instead forced by a common environmental driver.

This example raises an important correlation consideration, especially when the differences in variables are in orders of magnitude. The sardine landings (left y-axis) and anchovy landings (right y-axis) in figure 7 are represented in different orders of magnitude for their unnormalized observations. Linear correlation coefficients are ill suited for such analysis. Figure 8 from VN [2012] illustrates the VN correlation coefficient differences under such an extreme scale consideration versus the Pearson correlation coefficient.
[Figure 7 charts: unnormalized sardine landings (left y-axis) and anchovy landings (right y-axis) with La Jolla and Newport SST (°C), 1928-2004.]

Figure 9 illustrates the (nonlinear) relationship between Newport and La Jolla SST. The VN correlation coefficient under a less extreme scale consideration versus the Pearson correlation coefficient are .43 and .6541 respectively. The extreme scaling differences, present even after normalization, argue for the more accurate nonlinear VN correlation coefficient. Figure 10 represents the results of the VN normalization process. Sugihara et al. use a first difference normalization technique with unintended consequences, as will be discussed later.

Figure 9. Newport and La Jolla SST relationship visualized. Newport Beach SST data were used for the anchovy data set versus La Jolla SST for the sardine data set per the Sugihara et al. procedure.

[Figure 10 charts: unnormalized and nonlinearly scaled sardine and anchovy landings, 1928-2004.]

Figure 10. Unnormalized and normalized sardine and anchovy landings per the VN process. Successfully eliminating orders of magnitude differences while maintaining distributional properties.
Table 1. Sardine-Anchovy data set with τ = 1 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: The Pearson ρ on the normalized data (for comparison to VN results); D: Causality matrix.

A: P(Y′′|X′′)
                X: Sardines   Anchovies
Y: Sardines         -           .775
   Anchovies       1.0           -

B: ρ(X′′,Y′′)
   Sardines         -          (.5663)
   Anchovies     (.5663)         -

Table 2. Sardine-SST data set with τ = 1 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: The Pearson ρ on the normalized data (for comparison to VN results); D: Causality matrix.

A: P(Y′′|X′′)
                X: Sardines   SST
Y: Sardines         -         .008
   SST             1.0         -

B: ρ(X′′,Y′′)
   Sardines         -        (.157)
   SST            (.157)       -
Table 3. Anchovy-SST data set with τ = 1 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: The Pearson ρ on the normalized data (for comparison to VN results); D: Causality matrix.

C: Pearson ρ(X′′,Y′′)
                X: SST      Anchovies
Y: SST              -         .1459
   Anchovies      .1459        -

D: C(X → Y)
   SST              -        (.0067)
   Anchovies     (.00003)      -

Sugihara et al. Sardine – Anchovy – SST Example Discussion

Sugihara et al. [2012] declare from the implementation of the CCM method on the sardine – anchovy – SST dataset,

"Thus, although sardines and anchovies are not actually interacting, they are weakly forced by a common environmental driver, for which temperature is at least a viable proxy. Note that because of transitivity, temperature may be a proxy for a group of driving variables (i.e., temperature may not be the most proximate environmental driver)." Sugihara et al. [2012].

Our results resemble the asymmetrical bidirectional coupling case from Sugihara et al. The minimal net effect sardine-anchovy of (.1275) also suggests another variable at play. We are not here to prove causation of sardine and anchovy landing data, as the authors' focus of finance and economics precludes them from accurately selecting relevant variables. However, we do offer a contending insight to the Sugihara et al. conclusion using exclusively nonlinear techniques: perhaps sardines and anchovies react to an omitted variable's conditions (salinity?) and also have inverse causal relationships. The sardines leave (diminished presence) due to this omitted variable, and the SST subsequently rises. The sardines did not cause the water temperature increase; they anticipated the rise and left.

Sugihara et al. use the first difference in data points to normalize the data in CCM. This standard normalization technique results in a Pearson correlation of -.073 and an equally paltry .0278 VN correlation coefficient for sardines versus anchovies. However, this is compared with a -.3579 Pearson and a -.67 VN correlation coefficient on the raw data. Table 3 below presents the Pearson correlation coefficients for the raw data set, the Sugihara et al. first differences data set, and the VN normalized data set.

Table 3. Normalization effects on Pearson correlations and resulting correlation matrices.

Raw Data Pearson ρ
            SST       Anchovy    Sardine    SST(NB)
Sardine    (.10)      (.358)     1          .1607
SST(NB)    .6541      (.2431)    .1607      1

1st Differences Normalized Data Pearson ρ
            SST       Anchovy    Sardine    SST(NB)
Sardine    .017       (.073)     1          .0403
SST(NB)    .8694      (.0632)    .0403      1

VN Normalized Data Pearson ρ
            SST       Anchovy    Sardine    SST(NB)
SST        1          (.3043)    (.10)      .6541
Anchovy    (.3043)    1          (.358)     (.2431)
Money Supply – S&P 500 – 10 Year US Treasury Yield Example

We present the findings of our method on the S&P 500 – 10 Year Treasury Yield – Money Supply relationship, using a three variable normalization versus iterated pairwise normalizations of the multiple variables. The resulting normalized variables are analogous to the manifolds offered in CCM and present the system as a whole for consideration by placing them on a shared axis.

[Figure 11 charts: unnormalized MZM ($), S&P 500 (points) and 10 Yr Yield (%, right axis) on dual axes, 1959-2011, and the normalized variables on a single shared axis.]

Figure 11. Visual representation of the unnormalized (top) dual y-axis and final normalized variables (τ=1) single y-axis using the method presented in Viole and Nawrocki [2013]. Also illustrates the ability for true multivariable normalization.

One important feature is that MZM′′ has a conditional probability equal to one given the events of both the 10 Year Yield′′ and the S&P 500′′. All of the normalized data points fit within the normalized range for MZM′′ per figure 11 above. These numbers are in red in section A of table 4 below.
Table 4. Financial variable dataset with τ = 1 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: Causality matrix with cumulative causation in the bottom row and cumulative effect in far right column.

A: P(Y′′|X′′)
                   X: S&P 500   10 Year Yield   MZM
Y: SPY                 -           .6867        1.0
   10 Year Yield      1.0           -           1.0
   MZM                .9074        .6651         -

B: ρ(X′′,Y′′)
   SPY                1.0         (.2841)       .5031
   10 Year Yield     (.2841)       1.0         (.5287)
   MZM                .5031       (.5287)       1.0

C: C(X → Y)
   SPY                 -          (.1940)       .5031
   10 Year Yield     (.2841)        -          (.5287)
   MZM                .4565       (.3517)        -
   Σ C(X → Y)         .1724       (.5457)      (.0256)

We can state that MZM is a cause to S&P 500 prices and an inverse cause to 10 year Treasury yields, net of the bidirectional coupling the variables share. It should be noted that the linear Pearson correlation resulted in extremely high correlations, and consequently causation, for these same variable sets ($\rho_{X'',Y''} > .90$). These results are consistent with (and stronger than) the asymmetrical bidirectional coupling predator – prey example in Sugihara et al., and with Black's causal argument on the intertwined relationship between money stock and economic activity.

Rogalski and Vinso [1977] through GC firmly reject the hypothesis that causality runs unidirectionally from past values of money to equity returns. Their results are consistent with the hypothesis that stock returns are not purely passive but perhaps influence money supply in some complicated fashion. Our results showing asymmetrical bidirectional coupling directly support Rogalski and Vinso's contention.
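Section C of table 4 is the elementwise product of sections A and B, with column sums giving the cumulative causation row. A sketch using the tabulated values (small differences from the printed table are rounding in the source):

vars <- c("SPY", "10YrYield", "MZM")
# Section A: conditional probabilities P(Y''|X''), rows Y, columns X.
P <- matrix(c(NA,     .6867, 1.0,
              1.0,    NA,    1.0,
              .9074,  .6651, NA), 3, byrow = TRUE, dimnames = list(vars, vars))
# Section B: VN nonlinear correlations on the normalized data.
rho <- matrix(c( 1.0,   -.2841,  .5031,
                -.2841,  1.0,   -.5287,
                 .5031, -.5287,  1.0), 3, byrow = TRUE, dimnames = list(vars, vars))

C <- P * rho                 # section C: C(X -> Y) = P(Y|X) * rho(X,Y)
colSums(C, na.rm = TRUE)     # cumulative causation: .1724, -.5467, -.0256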
DISCUSSION

We need correlations and conditional probabilities, with and without leads and lags, to determine causation. Granger causality was predicated on prediction instead of correlation to identify causation between time-series variables. Stochastic variables predicated on nonlinear relationships do not lend themselves to prediction, especially if they are not strongly synchronized.

"Therefore, information about X(t) that is relevant to predicting Y is redundant in this system and cannot be removed simply by eliminating X as an explicit variable. When Granger's definition is violated, GC calculations are no longer valid, leaving the question of detecting causation in such systems unanswered." Sugihara et al. [2012]

While CCM was not designed to compete with GC, it is specifically aimed at whole systems, as the convergent cross mapping method exemplifies without collaboration. We could provide many more axiomatic examples of known (and unknown) conditional probabilities as Black does for support (or rejection) of causation, but experimentation and empirical analysis will ultimately serve as proof to this theoretical work. We look forward to extending the discussion to other fields in search of these experiments, thus satisfying the conditional probability requirement in proving causation.
APPENDIX A

Earlier we illustrated the conditional probability for a given occurrence using partial moments from normalized variables. However, if we wish to further constrain the conditional distribution to positive and negative occurrences, we need to use co-partial moments of the reduced observation count. This differs from a joint probability, where the number of observations is not reduced to the conditional occurrences.

The following example will generate the conditional probability of a specific occurrence with Bayes' theorem, then with our method. Given 100 observations of 10 Year yield returns and S&P 500 returns (normalized by percentage return), what is the probability that given an interest rate increase, stocks rose? Using the following data in Table 1A, we are after the bold red numbers:

Table 1A. Monthly S&P 500 and 10 Year Yield returns.

Date        S&P 500   10 Yr Yield     Date        S&P 500   10 Yr Yield
1/1/2005     2.56%     0.95%          5/1/2007     3.95%     2.81%
2/1/2005    -1.50%    -0.24%          6/1/2007     3.19%     1.27%
3/1/2005     1.53%    -1.19%          7/1/2007     0.22%     7.11%
4/1/2005    -0.40%     7.62%          8/1/2007     0.41%    -1.98%
5/1/2005    -2.58%    -3.62%          9/1/2007    -4.44%    -6.83%
6/1/2005     1.18%    -4.72%          10/1/2007    2.88%    -3.26%
7/1/2005     2.01%    -3.44%          11/1/2007    2.80%     0.22%
8/1/2005     1.65%     4.40%          12/1/2007   -5.08%    -8.76%
9/1/2005     0.17%     1.90%          1/1/2008     1.08%    -1.21%
10/1/2005    0.13%    -1.42%          2/1/2008    -7.03%    -9.19%
11/1/2005   -2.81%     6.01%          3/1/2008    -1.75%     0.00%
12/1/2005    3.74%     1.78%          4/1/2008    -2.84%    -6.35%
1/1/2006     1.98%    -1.55%          5/1/2008     3.98%     4.73%
2/1/2006     1.31%    -1.12%          6/1/2008     2.36%     5.29%
3/1/2006    -0.16%     3.34%          7/1/2008    -4.52%     5.52%
4/1/2006     1.33%     3.23%          8/1/2008    -6.46%    -2.22%
5/1/2006     0.65%     5.56%          9/1/2008     1.90%    -3.04%
6/1/2006    -0.94%     2.38%          10/1/2008   -5.16%    -5.28%
7/1/2006    -2.90%     0.00%          11/1/2008  -22.81%     3.20%
8/1/2006     0.57%    -0.39%          12/1/2008   -9.27%    -7.63%
9/1/2006     2.11%    -4.21%          1/1/2009    -0.62%   -37.75%
10/1/2006    2.35%    -3.33%          2/1/2009    -1.37%     4.05%
11/1/2006    3.40%     0.21%          3/1/2009    -7.23%    13.01%
12/1/2006    1.84%    -2.79%          4/1/2009    -6.16%    -1.76%
1/1/2007     1.98%    -0.87%          5/1/2009    11.35%     3.83%
2/1/2007     0.54%     4.29%          6/1/2009     6.20%    11.59%
3/1/2007     1.44%    -0.84%          7/1/2009     2.59%    12.28%
4/1/2007    -2.65%    -3.45%          8/1/2009     1.04%    -4.40%
Date        S&P 500   10 Yr Yield     Date        S&P 500   10 Yr Yield
9/1/2009     7.60%     0.84%          1/1/2012     1.37%    -1.50%
10/1/2009    3.39%    -5.44%          2/1/2012     4.50%    -0.51%
11/1/2009    2.19%    -0.29%          3/1/2012     3.91%     0.00%
12/1/2009    1.89%     0.29%          4/1/2012     2.68%     9.67%
1/1/2010     2.03%     5.44%          5/1/2012    -0.20%    -5.69%
2/1/2010     1.18%     3.83%          6/1/2012    -3.31%   -13.01%
3/1/2010    -3.11%    -1.08%          7/1/2012    -1.34%   -10.54%
4/1/2010     5.61%     1.08%          8/1/2012     2.71%    -5.72%
5/1/2010     3.85%     3.17%          9/1/2012     3.16%     9.35%
6/1/2010    -6.22%   -11.84%          10/1/2012    2.81%     2.35%
7/1/2010    -3.78%    -6.65%          11/1/2012   -0.39%     1.73%
8/1/2010    -0.33%    -6.12%          12/1/2012   -3.06%    -5.88%
9/1/2010     0.69%   -10.87%          1/1/2013     1.97%     4.15%
10/1/2010    3.15%    -1.87%          2/1/2013     4.00%    10.48%
11/1/2010    4.32%    -4.24%          3/1/2013     2.13%     3.60%
12/1/2010    2.30%     8.31%          4/1/2013     2.52%    -1.02%
1/1/2011     3.49%    17.57%
2/1/2011     3.26%     2.99%
3/1/2011     2.96%     5.45%
4/1/2011    -1.27%    -4.87%
5/1/2011     2.05%     1.46%
6/1/2011     0.51%    -8.75%
7/1/2011    -3.89%    -5.51%
8/1/2011     2.90%     0.00%
9/1/2011   -11.15%   -26.57%
10/1/2011   -0.97%   -14.98%
11/1/2011    2.80%     8.24%
12/1/2011    1.58%    -6.73%

P(SI) = probability of the S&P 500 increasing
P(SD) = probability of the S&P 500 decreasing
P(II) = probability of interest rates increasing
P(ID) = probability of interest rates decreasing

                 Interest Rate   Interest Rate   Interest Rate
                 Increase        Decrease        Unchanged       Total
S&P Increase     35  CUPM        28  DLPM        2               65  UPM
S&P Decrease      9  DUPM        24  CLPM        2               35  LPM
S&P Unchanged     0               0              0                0
Total            44  UPM         52  LPM         4               100

Table 2A. Bayes' Theorem probabilities identified and displayed from the data in table 1A. Corresponding partial moments quadrants also represented.

According to Bayes' theorem,

$$P(SI|II) = \frac{P(II|SI)\,P(SI)}{P(II)} = \frac{\left(\frac{35}{65}\right)\left(\frac{65}{100}\right)}{\left(\frac{44}{100}\right)} = \frac{35}{44} = 79.55\%$$

This example raises an immediate concern - in the instance where there is a zero return, the observation is neither a gain nor a loss. These observations are highlighted in grey in table 1A. When an observation equals a target in the partial moment derivations, that observation is placed into an empty set, analogous to the unchanged column in the table above. Empty sets reduce both the lower and upper partial moments, thus their effect is reflected in both measures.
The S&P 500 degree zero upper partial moment from the minimum positive 10 Year Yield observation is equal to .7955. The S&P 500 degree zero upper partial moment from the maximum 10 Year Yield observation is equal to zero. Thus, the conditional probability of a positive S&P 500 return given an increase in 10 Year Yields is equal to 79.55%, represented by the lighter shaded blue.

[Figure 1A chart: shared histogram of S&P 500 and 10 Year Yield returns; x-axis returns from -22.81% to 15.95%.]

Figure 1A. Graphical representation of conditional probability of positive S&P 500 return given an increase in 10 Year Yields.

Alternatively, we can derive the same conclusion with conditional partial moments. The frequency of positive 10 Year Yield returns is represented by the degree zero upper partial moment from a zero target, where X = S&P 500 and Y = 10 year yield.

> UPM(0,0,ten.yr)
[1] 0.44

The number of occurrences is (0.44 × T), which yields 44 in this example. Using T* as our reduced universe of observations, we compute the conditional upper partial moment for a direct computation of the conditional probability from the underlying time series. In our example,

$$CUPM(0|0, 0|0, X|Y) = .7955$$

And in R:

> Co.UPM(0,0,sp,ten.yr,0,0)/UPM(0,0,ten.yr)
[1] 0.7954545

> UPM(0,0,sp[ten.yr>0])
[1] 0.7954545

But, this result isn't particularly interesting or innovative, since degree zero partial moments are frequency and counting statistics - just as in the Bayes derivation.
Degree one co-partial moments extend this to a measure whereby the average S&P increase given an increase in interest rates can be derived:

$$CUPM(1|0, 0|0, X|Y) = \left(\frac{1}{T^*}\right)\sum_{t^*=1}^{T^*}\left[\max(X_{t^*} - 0, 0)\right]^1\left[\max(Y_{t^*} - 0, 0)\right]^0$$

In our example, the average S&P 500 increase given an increase in interest rates is 1.496%. In R:

> (Co.UPM(1,0,sp,ten.yr,0,0)-D.UPM(1,0,sp,ten.yr,0,0))/UPM(0,0,ten.yr)
[1] 0.01495909

> UPM(1,0,sp[ten.yr>0])-LPM(1,0,sp[ten.yr>0])
[1] 0.01495909

Both methodologies yield the same conditional probability, which is not surprising given the simple frequency requirement of the underlying calculation and the same associated targets for the partial moments. However, since partial moments are already used throughout the framework, this conditional capability is easily overlooked.
REFERENCES

Black, Fischer [1984]. "The Trouble with Econometric Models." Financial Analysts Journal, Vol. 38, No. 2, pp. 29-37.

Guthoff, A., Pfingsten, A. and J. Wolf (1997). "On the Compatibility of Value at Risk, Other Risk Concepts, and Expected Utility Maximization"; in: Hipp, C. et al. (eds.): Geld, Finanzwirtschaft, Banken und Versicherungen 1996; Beiträge zum 7. Symposium Geld, Finanzwirtschaft, Banken und Versicherungen an der Universität Karlsruhe vom 11.-13. Dezember 1996, Karlsruhe 1997, pp. 591-614.

Holthausen, D. M. (1981). "A Risk-Return Model With Risk And Return Measured As Deviations From a Target Return." American Economic Review, v71(1), 182-188.

Kaplan, P. and Knowles, J. (2004). "Kappa: A Generalized Downside Risk-Adjusted Performance Measure." Journal of Performance Measurement, 8(3), 42-54.

Lucas, D. (1995). "Default Correlation and Credit Analysis." Journal of Fixed Income, Vol. 11, pp. 76-87.

Markowitz, Harry (1959). Portfolio Selection. (First Edition). New York: John Wiley and Sons.

Pitman, E.J.G. (1979). "Some Basic Theory for Statistical Inference." London, Chapman and Hall.

Rogalski, R. J., and Vinso, J. D. [1977]. "Stock Returns, Money Supply, and the Direction of Causality." Journal of Finance, September 1977, pp. 1017-1030.

Shadwick, W. and Keating, C. (2002). "A Universal Performance Measure." Journal of Performance Measurement, Spring 2002, pp. 59-84.

Shorack, G.R., and Wellner, J.A. (1986). "Empirical Processes with Applications to Statistics," Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York.

Sugihara, G., May, R., Ye, H., Hsieh, C., Deyle, E., Fogarty, M., Munch, S. [2012]. "Detecting Causality in Complex Ecosystems." Science, Vol. 338, pp. 496-500.

Takens, F. [1981], in Dynamical Systems and Turbulence, D. A. Rand, L. S. Young, Eds. (Springer-Verlag, New York, 1981), pp. 366–381.

van der Vaart, A.W., and Wellner, J.A. (1996). "Weak Convergence and Empirical Processes: With Applications to Statistics." Springer Series in Statistics. Springer-Verlag, New York.

Wang, G.S. [2008]. "A Guide to Box-Jenkins Modeling." Journal of Business Forecasting, Spring 2008, Vol. 27, Issue 1, p. 19.

http://demonstrations.wolfram.com/SingleFactorAnalysisOfVariance/

NOTES

i. Newton proved the integral of a point in a continuous distribution to be equal to zero.
ii. If no data exists in a subset, no mean is calculated.
iii. The horizontal line as in the equation Y = 1 (point probability) yields a 0 correlation for both Pearson's correlation and our metric.
iv. All variables in the regression are exchange traded funds (ETFs) that trade in US markets: SPY is the S&P 500 ETF, TLT is the Barclays 20+ year Treasury Bond ETF, GLD is the Gold Trust ETF, FXE is the Euro Currency ETF, and GSG is the S&P GSCI Commodity Index ETF.
v. The data are monthly series from 01/01/1959 through 04/01/2013. They are available from FRED with links to graphs and data for each of the variables listed.
   http://research.stlouisfed.org/fred2/graph/?id=SP500
   http://research.stlouisfed.org/fred2/graph/?s[1][id]=GS10
   http://research.stlouisfed.org/fred2/series/MZMNS?rid=61