
NONLINEAR NONPARAMETRIC STATISTICS: Using Partial Moments

Fred Viole
David Nawrocki

© 2013 Viole & Nawrocki. All Rights Reserved


Table of Contents

Asymptotic Relationships
Discrete Vs. Continuous Distributions
Correlation & Regression
Autoregressive Modeling
Normalization of Data
Analysis of Variance (ANOVA)
Causation
References
Foreword

This book introduces a toolbox of statistical tools using partial moments that are both old and new. Partial moment analysis is over a century old, but most applications of partial moments have not progressed beyond a substitution for simple variance analysis. Lower partial moments have been used in finance, in portfolio investment theory, for over 60 years. However, just as the normal distribution and the variance lead the statistician into linear correlation and regression analysis, partial moments lead us toward nonlinear correlation and nonparametric regression analysis. Using partial moments as a variance measure is only the tip of the iceberg; the purpose of this book is to explore the entire iceberg.

This partial moment toolbox is the “new” presented in this book. However, “new” should always have some advantage over “old”. The advantage of partial moments is that they are nonparametric: they require neither knowledge of the underlying probability function nor a “goodness of fit” analysis. Partial moments provide us with cumulative distribution functions, probability density functions, linear correlation and regression analysis, nonlinear correlation and regression analysis, ANOVA, and ARMA/ARCH models. This new toolbox is completely nonparametric and provides a full set of probability hypothesis testing tools without knowledge of the underlying probability distribution.

In this new advanced approach to nonparametric statistics, we merge the ideas of discrete and continuous processes and present them in a unified framework predicated on partial moments. Through the asymptotic property of partial moments, we show that the two schools of mathematical thought do not converge as commonly envisioned: increased observations approximate the continuous area of a function, rather than stabilizing on a discrete counting metric. However, it remains a strictly binary analysis: discrete or continuous. The known properties generated from this continuous vs. discrete analysis afford an assumption-free analysis of variance (ANOVA) on multiple distributions.

In our correlation and regression analysis, linear segments are aggregated to describe a nonlinear system. The computational issue is to avoid overfitting. However, since we can effectively determine the signal to noise ratio, this concern is alleviated, ultimately yielding a more robust result. By building off basic relationships between variables, we are able to perform multivariate analysis with ease and transform “complexity” into “tedious.” One major advantage of our work is that the partial moment methodology fully replicates linear conditions and known functions. This trust in the methodology is important for the transition to chaotic unknowns and forecasting with autoregressive models.

Normalization of data has the unintended consequence of transforming continuous variables to discrete variables while eliminating prior relationships. We present a normalization method that enables a truly apples to apples comparison and retains the finite moment properties of the underlying distribution. In the ensuing analysis of the variables in question, we illustrate the distinction between correlation and causation. Using this distinction, we offer a definition of causation that integrates historical correlation with conditional probabilities.

Finally, linearity should be a pleasant surprise to encounter in data, not a prerequisite. By eliminating all preconceptions and assumptions, we offer a powerful framework for statistical analysis. The simple nonparametric architecture based on partial moments yields the information needed to easily conduct multivariate analysis, generating descriptive and inferential statistics for a nonlinear world.

*** All of the functions in this book are available in the R package ‘NNS’, available on CRAN: https://cran.r-project.org/web/packages/NNS/
ASYMPTOTICS

Abstract

We define the relationship between integration and partial moments through the integral mean value theorem. The area of a function derived through both methods shares an asymptote, allowing for an empirical definition of the area. This is important in that we are no longer limited to known functions and do not have to resign ourselves to goodness of fit tests to define f(x). Our empirical method avoids the pitfalls associated with a truly heterogeneous population, such as nonstationarity and estimation error of the parameters. Our ensuing definition of the asymptotic properties of partial moments to the area of a given function enables a wide array of equivalent comparative analyses, from linear and nonlinear correlation analysis to calculating cumulative distribution functions for both discrete and continuous variables.

“Imagine how much harder physics would be if electrons had feelings.” - Richard Feynman

INTRODUCTION

Modern finance has an entrenched relationship with calculus, namely in the fields of risk and portfolio management. Calculus, by definition, is the study of limits and infinitesimal series. However, given the seemingly infinite amount of financial data available, we ask whether calculus is too restrictive.

In order to utilize the powerful tools of calculus, a function of a continuous variable must be defined. Least squares methods and families of distributions have been identified over the years to assist in this definition prerequisite. Once classified, variables can be analyzed over specific intervals. Comparison of these intervals between variables is also possible by normalizing the area of that interval.

Unfortunately, there are major issues with each of the steps identified in the preceding paragraph. When defining a continuous variable, you are stating that its shape (via parameters) is fixed in stone (stationary). Least squares methods of data fitting make no distinction whether a residual is above or below the fitted value, disregarding any implications thereof. And finally, normalization of continuous variables has been shown to generate discrete variable solutions [1].

Given these formidable detractions, we contend that a proper asymptotic approximation of a function’s area “is a better fit” to its intended applications. Parsing variances into positive or negative from a specified point is quite useful for nonlinear correlation coefficients and multiple nonlinear regressions as demonstrated in [2], and for calculating cumulative distribution functions for both discrete and continuous variables [1].

Furthermore, the multiple levels of heterogeneity present in the market structure negate the relevance of true population parameters estimated by the classical parametric method. Estimation error and nonstationarity of the first moment, μ, are testaments to the underlying heterogeneity issue, leaving the nonparametric approach as the only viable solution for truly heterogeneous populations. Our ensuing definition of the asymptotic properties of partial moments to the area of a given function enables a wide array of equivalent comparative analysis to the classical parametric approach.

OUR PROPOSED METHOD

Integration and differentiation have been important tools in defining the area under a function (f(x)) since their identification in the 17th century by Isaac Newton and Gottfried Leibniz. Approximation of this area is possible empirically with the lower and upper partial moments of the distribution presented in equations 1 and 2.

LPM(n, h, x) = \frac{1}{T}\sum_{t=1}^{T} \max\{0,\, h - x_t\}^n \quad (1)

UPM(q, l, x) = \frac{1}{T}\sum_{t=1}^{T} \max\{0,\, x_t - l\}^q \quad (2)

where x_t is the observation of variable x at time t; h and l are the targets from which to compute the lower and upper deviations respectively; and n and q are the weights to the lower and upper deviations respectively. We set n, q = 1 and h = l to calculate the continuous area of the function as demonstrated in [1].

Partial moments resemble the Lebesgue integral, given by

f^{-}(x) = \max\{-f(x),\, 0\} = \begin{cases} -f(x), & \text{if } f(x) < 0, \\ 0, & \text{otherwise,} \end{cases} \quad (3)

f^{+}(x) = \max\{f(x),\, 0\} = \begin{cases} f(x), & \text{if } f(x) > 0, \\ 0, & \text{otherwise.} \end{cases} \quad (4)

In order to transform the partial moments from a time series to a cross-sectional dataset where x is a real variable, we need to alter equations 1 and 2 to reflect this distinction and introduce the interval [a,b] over which the area is to be computed.

LPM(1, 0, f(x)) = \frac{1}{n}\sum_{i=1}^{n} \max\{-f(x_i),\, 0\} \;\; \text{if } x \in [a,b], \quad (5)

UPM(1, 0, f(x)) = \frac{1}{n}\sum_{i=1}^{n} \max\{f(x_i),\, 0\} \;\; \text{if } x \in [a,b]. \quad (6)

We further constrain equations 5 and 6 by setting the target equal to zero for both functions and considering the total number of observations n, rather than the time qualification T. The target for the transformed partial moment equations will be a horizontal line, in this instance zero (the x-axis), whereby all f(x) > 0 are positive and all f(x) < 0 are negative area considerations, per the Lebesgue integral in equations 3 and 4.
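To make the definitions concrete, the following is a minimal base-R sketch of equations 1 and 2 (the ‘NNS’ package provides optimized LPM() and UPM() functions with this same argument order; the lowercase lpm()/upm() helpers are our own, for illustration only). The explicit indicator ensures degree 0 yields a count, rather than returning 0^0 = 1 for above-target points.

# Illustrative helpers for equations 1 and 2, not the NNS implementations.
lpm <- function(n, h, x) mean(ifelse(h - x > 0, (h - x)^n, 0))
upm <- function(q, l, x) mean(ifelse(x - l > 0, (x - l)^q, 0))

# Equations 5 and 6: a zero target applied to f(x) over an interval.
x  <- seq(0, 10, 0.01)
fx <- x^2
upm(1, 0, fx) - lpm(1, 0, fx)  # ~33.35, the area of x^2 on [0,10] / (b - a); compare Appendix A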

Lebesgue integration also offers flexibility versus its Riemann counterpart, just as partial moments offer flexibility versus the standard moments of a distribution. Equation 7 illustrates the asymptotic nature of the partial moments as the number of observations tends towards infinity over the interval [a,b].¹ This is analogous to the number of irregular rectangle partitions in other numerical integration methods.

\lim_{n \to \infty} \left[ UPM(1, 0, f(x)) - LPM(1, 0, f(x)) \right] = \frac{\int_a^b f(x)\,dx}{(b - a)} \quad (7)

Using the proof of the second fundamental theorem of calculus we know

F(b) - F(a) = \int_a^b f(x)\,dx.

Yielding,

\lim_{n \to \infty} \left[ UPM(1, 0, f(x)) - LPM(1, 0, f(x)) \right] = \frac{F(b) - F(a)}{(b - a)} \quad (8)

Invoking the mean value theorem, where

F'(c) = \frac{F(b) - F(a)}{(b - a)} \quad (9)

We have

F'(c) = \lim_{n \to \infty} \left[ UPM(1, 0, f(x)) - LPM(1, 0, f(x)) \right] \quad (10)

F'(c) using \Delta x of partition i per the integral mean value theorem shows that

F'(c) = \lim_{\lVert \Delta x_i \rVert \to 0} \sum_{i} \left[ f(c_i)(\Delta x_i) \right] \quad (11)

Thus demonstrating the inverse relationship involving:

(i) the distance between irregular rectangle partitions (\Delta x_i)

(ii) the number of observations (n)

\lim_{\lVert \Delta x_i \rVert \to 0} \sum_{i} \left[ f(c_i)(\Delta x_i) \right] = \lim_{n \to \infty} \left[ UPM(1, 0, f(x)) - LPM(1, 0, f(x)) \right] \quad (12)

Just as integrated area sums converge to the integral of the function with increased rectangle areas partitioned over the interval of f(x),² equation 7 shares this asymptote equal to the integral of the function, as demonstrated above with equation 12. If one can define the function of the asymptotic areas F'(c) (UPM + LPM), then one can find the asymptote or integral of the function directly from observations.

¹ Detailed examples are offered in Appendix A.
² Provided F is differentiable everywhere on [a,b] and F' is integrable on [a,b]. The partial moment term of the equality in equation 12 makes no such suppositions. The total area, not just the definite integral, is simply \left| \int_a^b f(x)\,dx \right| / (b - a) = \lim_{n \to \infty} \left[ UPM(1, 0, f(x)) + LPM(1, 0, f(x)) \right].
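A quick numeric check of equation 7 and the total-area identity in footnote 2, assuming the lpm()/upm() helpers sketched above. For f(x) = x on [-1, 1], the definite integral is 0 while the total (unsigned) area is 1:

x  <- seq(-1, 1, 0.001)
fx <- x
upm(1, 0, fx) - lpm(1, 0, fx)  # ~0,   the definite integral 0 / (b - a)
upm(1, 0, fx) + lpm(1, 0, fx)  # ~0.5, the total area 1 / (b - a) = 1/2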
FINDING THE HORIZONTAL ASYMPTOTE

The horizontal asymptote is the horizontal line that the graph of F'(c) approaches as n → ∞. This asymptote is equal to [F(b) - F(a)]/(b - a) for the interval [a,b], where a < b.

Figure 1. Asymptote of f(x) = x. As the range of the interval increases, we can fit F'(c) or f(x) to determine the asymptote.

Once F'(c) is defined, we can use the method of leading coefficients to determine the horizontal asymptote. Figure 1 above has a horizontal asymptote of zero. However, once F'(c) is defined, the dominant assumption is that of stationarity of function parameters at time t. Integral calculus is not immune from this stationarity assumption, as f(x) needs to be defined in order to integrate and differentiate. Since we are not defining f(x), we have the luxury of recalibrating with each data point to capture the nonstationarity, consequently updating F'(c).

Goodness of fit tests also assume stationarity of the parameters, detracting from their appeal as a reason to define a function.

DISCUSSION

To define, or not to define: that is the question. If we define F'(c), we can find the exact asymptote, and thus the area of f(x). If we appreciate the fact that nothing in finance seems to be guided by an exactly defined function, the measured area of f(x) over the interval [a,b] will likely change over time due to the multiple levels of heterogeneity present in the market structure.

Furthermore, if we are going to expend the extra effort to define a function (within tolerances mind you, not an exact fit), does it really matter which function is defined, F'(c) or f(x)? The next observation may very well lead to a redefinition.
Our proposed method of closely approximating the area of a function over an interval with partial moments is an important first step in introducing flexibility into finance versus integral calculus. We shed the dependence on stationarity, and alleviate the need for goodness of fit tests for underlying function definitions. Moreover, if the underlying process is stationary, then simply increasing the number of observations will ensure a convergence of methods.

We are hopeful that over time this method will be refined and expanded in order to bring a more robust and precise method of analysis than currently enjoyed, while avoiding the pitfalls associated with the parametric approach on a truly heterogeneous population.

APPENDIX A: EXAMPLES OF KNOWN FUNCTIONS USING EQUATION 7

f(x) = x²

To find the area of the function over the interval [0,10] for f(x) = x², we integrate with respect to x, yielding F(x) = x³/3. F(10) - F(0) = 1000/3 - 0 = 333.33.

Using equation 7 in the ‘NNS’ package in R, we know F'(c) should converge to 333.33/10, or 33.33.

> x=seq(0,10,1);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 35
> x=seq(0,10,.1);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 33.5
> x=seq(0,10,.02);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 33.36667
> x=seq(0,10,.01);y=x^2;UPM(1,0,y)-LPM(1,0,y)
[1] 33.35

Figure 2. Asymptotic partial moment areas for \int_0^{10} x^2\,dx.

f(x) = √x

To find the area of the function over the interval [0,10] for f(x) = √x, we integrate with respect to x, yielding F(x) = 2x^{3/2}/3. F(10) - F(0) = 63.245/3 - 0 = 21.08.

Using equation 7 in the ‘NNS’ package in R, we know F'(c) should converge to 21.08/10, or 2.108.

> x=seq(0,10,1);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.042571
> x=seq(0,10,.1);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.102329
> x=seq(0,10,.02);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.107075
> x=seq(0,10,.01);y=sqrt(x);UPM(1,0,y)-LPM(1,0,y)
[1] 2.107638

Figure 3. Asymptotic partial moment areas for \int_0^{10} \sqrt{x}\,dx.

APPENDIX B: PERFECT UNIFORM SAMPLE ASSUMPTION \left( \lim_{\lVert \Delta x_i \rVert \to 0} = \lim_{n \to \infty} \right)

We can see from an analysis of samples over the interval [0,100] that as the number of observations tends towards ∞, the observations approach a perfect uniform sample in Figure 1b. However, when using a sample representing irregular partitions (more realistic of observations than a completely uniform sample), the number of observations required to achieve perfect uniformity is far greater than simply assuming it at the outset. This condition speaks volumes to misinterpretations of real world data when limit conditions are used as an artifact of fitting distributions.

Figure 1b. A randomly generated uniform sample over the interval approaches a perfect uniform sample as the number of observations goes to infinity.


DISCRETE VS. CONTINUOUS DISTRIBUTIONS

Cumulative Distribution Functions and UPM/LPM Analysis

Abstract

We show that the Cumulative Distribution Function (CDF) is represented by the ratio of the lower partial moment (LPM) to the distribution for the interval in question. The addition of the upper partial moment (UPM) ratio enables us to create probability density functions (PDF) for any function without prior knowledge of its characteristics. We are able to replicate discrete distribution CDFs and PDFs for normal, uniform, Poisson, and chi-square distributions, as well as true continuous distributions. This framework provides a new formulation for UPM/LPM portfolio analysis using co-partial moment matrices which are positive symmetrical semi-definite, aggregated to yield a positive symmetrical definite matrix.

I. Introduction:

The Empirical Cumulative Distribution Function (EDF) should, most of the time, be a good approximation of the true cumulative distribution function (CDF) as the sample set increases. This generalization is at the heart of statistics. Means and variances are used to assign and fit a distribution, but partial moments stabilize with a smaller sample size, ensuring a more accurate analysis of the EDF.

The empirical CDF is a simple construct: the number of observations less than or equal to a target, divided by the total number of observations in a given data set. The problem with extrapolating these results to an assumed true CDF is that the discrete empirical CDF is extremely sensitive to sample size,³ and any parameter nonstationarity will deteriorate the fit to the true distribution. The paper is organized as follows:

First, we propose a method to derive the CDF and PDF of the EDF, utilizing the upper and lower partial moments (UPM and LPM respectively) of the EDF. The benefits are obvious, such as compensating for any observed skewness and kurtosis that would force a more esoteric distribution family onto the data. These measurements require zero knowledge of the underlying function and no goodness-of-fit tests to approximate a likely true distribution. Partial moments also happen to exhibit less sample size sensitivity than means and variances, as we will discuss later.

Next, this foundation is then used to develop conditional probabilities and joint distribution co-partial moments. Finally, this toolbox allows us to propose a new

³ Estimated mean average deviations are provided in Appendix A.
formulation for UPM/LPM analysis, and we note that each of the co-partial moment matrices is positive symmetrical semi-definite, ensuring a positive symmetrical definite aggregate matrix. This represents a major improvement in the use of partial moment matrices in portfolio theory and avoids the problems with co-semivariance matrices noted by Grootveld and Hallerbach (1999) and Estrada (2008).

II. Deriving Cumulative Distribution and Partial Density Functions Using Partial Moments

A distribution may be dissected into two partial moment segments using an arbitrary target as shown in Figure 1.

Figure 1. A distribution dissected into its two partial moment segments, red LPM and blue UPM, from a shared target.

The Upper and Lower partial moment formulas are below in Equations 1 and 2:

LPM(n, h, x) = \frac{1}{T}\left[ \sum_{t=1}^{T} \max\{0,\, h - x_t\}^n \right] \quad (1)

UPM(q, l, x) = \frac{1}{T}\left[ \sum_{t=1}^{T} \max\{0,\, x_t - l\}^q \right] \quad (2)

where x_t represents the observation x at time t, n is the degree of the LPM, q is the degree of the UPM, h is the target for computing below target returns, and l is the target for computing above target returns.⁴

One can visualize how the entire distribution is quantified with the upper and lower partial moment from the same target (h = l = 0) in Figure 1. The area under the function derived from degree one partial moments will approximate the area derived from the integral of the function over an interval [a,b] asymptotically. This asymptotic numerical integration is shown in Viole and Nawrocki (2012c) and represented with equation (3).

\lim_{t \to \infty} \left[ UPM(1, 0, f(x)) - LPM(1, 0, f(x)) \right] = \frac{\int_a^b f(x)\,dx}{(b - a)} \quad (3)

We use degree zero (n = q = 0) to generate a discrete analysis, replicating results from the conventional CDF and PDF methodology. Degree one (n = q = 1) is used to generate the continuous results. This is an important distinction, as the discrete analysis is a

⁴ Equations 1 and 2 will generate a 0 for degree 0 instances of 0 results.
relative frequency and probability investigation, while the continuous analysis integrates a variance consideration to capture the rectangles of infinitesimal width in deriving an area under a function. Standard deviation remains stable as the sample size range increases, thus it is not an accurate barometer of the area of the function with which to estimate a continuous variable. Figure 2 illustrates the range increase as the number of observations increases for 5 million random draws from a normal distribution with μ=10 and σ=20.

Figure 2. Range for a randomly generated normal distribution μ=10 and σ=20 for 5 million random draws.

Just as the probabilities of two mutually exclusive events sum to one, the sum of the ratios - LPM to the entire distribution and UPM to the entire distribution (LPM_ratio and UPM_ratio respectively) - plus the point probability, equals one as in equations 8 and 8a. The point probability is often included in the CDF calculation but it is not uniformly treated as less than or equal to the target.⁵

Theorem 1,

P\{X < x\} + P\{X > x\} + P\{X = x\} = 1 \quad (4)

If,

P\{X \le x\} = LPM_{ratio}(0, x, X) = \frac{LPM(0, x, X)}{\left[ LPM(0, x, X) + UPM(0, x, X) \right]} - \frac{\varepsilon}{2} \quad (5)⁶

LPM_{ratio}(0, x, X) = LPM(0, x, X) \quad (5a)

LPM_{ratio}(1, x, X) \ne LPM(1, x, X) \quad (5b)

And,

P\{X > x\} = UPM_{ratio}(0, x, X) = \frac{UPM(0, x, X)}{\left[ LPM(0, x, X) + UPM(0, x, X) \right]} - \frac{\varepsilon}{2} \quad (6)

⁵ There is no consensus language for CDF definitions. Some instances are “< x” while others reference “≤ x” depending on the distribution, discrete or continuous. We are uniform in our treatment of distributions with “≤ x” for both discrete and continuous distributions. See http://www.mathworks.com/help/toolbox/stats/unifcdf.html and http://www.mathworks.com/help/toolbox/stats/unidcdf.html for treatment of the target, x.
⁶ It is important to note that LPM(0, x, X) is a probability measure and will yield a result from 0 to 1. Thus, the ratio of LPM(0, x, X) to the entire distribution (LPM_ratio(0, x, X)) is equal to the probability measure itself, LPM(0, x, X).
UPM_{ratio}(0, x, X) = UPM(0, x, X) \quad (6a)

UPM_{ratio}(1, x, X) \ne UPM(1, x, X) \quad (6b)

Since the entire normalized distribution is represented by,

\left[ \frac{LPM(0, x, X)}{\left[ LPM(0, x, X) + UPM(0, x, X) \right]} - \frac{\varepsilon}{2} \right] + \left[ \frac{UPM(0, x, X)}{\left[ LPM(0, x, X) + UPM(0, x, X) \right]} - \frac{\varepsilon}{2} \right] + \varepsilon = 1 \quad (7)

where ε is the point probability P{X = x}. The use of an empty set for ε yields,

LPM(0, x, X) + UPM(0, x, X) = 1 \quad (8)

LPM_{ratio}(1, x, X) + UPM_{ratio}(1, x, X) = 1 \quad (8a)

For a discrete distribution, an empty set for target observations lowers both LPM(0, x, X) and UPM(0, x, X) simultaneously so that Equation 8 still equals one, with LPM(0, x, X) = P{X ≤ x} and UPM(0, x, X) = P{X > x}. The point probability ε for a discrete distribution can easily be computed as the frequency of the specific point divided by the total number of observations. The point probability is more relevant in a discrete distribution of integers, and has an inverse relationship to the degree of specification of the underlying variable. As the specification approaches infinity, ε approaches zero.

We know from calculus that \int_a^b f(x)\,dx = F(b) - F(a), and if F(b) = F(a), the integral of a point equals zero. Thus for a continuous distribution there is no difference between P{X < x} and P{X ≤ x}, since ε = 0. If one wishes to subscribe to the notion that the sum of an infinite amount of points each equal to zero must sum to one per the integral definition, then equation 7 is simply reduced to equation 8a for continuous variables. However, equation 7 with degree 1 can also be used for the continuous variable to compensate for ε > 0 and generate a normalized continuous probability.
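The distinction between equations 5a and 5b is easy to verify numerically; a sketch using the LPM()/UPM() signatures from the ‘NNS’ package as used throughout this book:

library(NNS)
set.seed(123)
x <- rnorm(10000, mean = 10, sd = 20)
LPM(0, 0, x)                                  # discrete CDF P{X <= 0}, ~0.31
LPM(1, 0, x) / (LPM(1, 0, x) + UPM(1, 0, x))  # continuous degree 1 ratio, ~0.22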
A. Review of the Literature

Guthoff et al (1997) illustrate how the value at risk of an investment is equivalent to the degree zero LPM. We confirm this derivation, as the degree zero LPM does indeed provide a normalized solution. However, critical errors were made by Guthoff and in subsequent works by Shadwick and Keating (2002) and Kaplan and Knowles (2004).

The omega ratio is defined as,

\Omega(\tau) = \frac{\int_{\tau}^{\infty} \left[ 1 - F(R) \right] dR}{\int_{-\infty}^{\tau} F(R)\,dR} \quad (9)

where F(.) is the CDF for total returns on an investment and τ is the threshold return. Guthoff and Shadwick and Keating’s error was the use of a degree one LPM (area) on a degree 0 LPM, the probability CDF of the distribution. Degree one LPM does not need to be performed on the probability CDF as they present.

The Kappa measure is defined as,

K_n(\tau) = \frac{\mu - \tau}{\sqrt[n]{LPM_n(\tau)}} \quad (10)

Kaplan and Knowles’ error was the dismissal of the degree zero LPM (the 0-th root of something does not exist), which we show equals historical CDF measurements for various distributions. Also, \sqrt[n]{LPM_n(\tau)} forces concavity upon increased n on distributions which do not presume such a condition.

The omega ratio (Shadwick and Keating, 2002) and kappa measure (Kaplan and Knowles, 2004) both demonstrate the need for a full derivation of partial moments and their CDF equivalence, with full degree explanation and relevance.

Cumulative Distribution Function (CDF) using partial moments:

F_X(x) = P(X \le x) \quad (11)

F(x) = \int_{-\infty}^{x} f(x)\,dx \quad (12)

Discrete,

F(x) = LPM(0, a, x) \quad (13)

Continuous,

F(x) = LPM_{ratio}(1, a, x) \quad (14)

For any distribution the continuous estimate yields,⁷

0.5 = LPM_{ratio}(1, \mu, x) \quad (15)

Figure 3. Area of a Probability Density Function represented by the Cumulative Distribution Function of an arbitrary point a for the interval [-∞, a].

⁷ Figure 7 offers a visual representation of the difference between continuous and discrete CDFs of the mean.
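Equation 15 is straightforward to confirm empirically, even for a heavily skewed distribution (a sketch continuing the NNS calls from above):

set.seed(123)
y <- rchisq(10000, df = 1)                                      # heavily skewed
LPM(1, mean(y), y) / (LPM(1, mean(y), y) + UPM(1, mean(y), y))  # continuous CDF of the mean, ~0.5
LPM(0, mean(y), y)                                              # discrete CDF of the mean, ~0.68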
Probability Density Function (PDF) using partial moments:

P[a \le x \le b] = \int_a^b f(x)\,dx \quad (16)

Discrete,

P[a \le x \le b] = LPM(0, b, x) - LPM(0, a, x) \quad (17a)

Continuous,

P[a \le x \le b] = LPM_{ratio}(1, b, x) - LPM_{ratio}(1, a, x) \quad (17b)

Figure 4. Probability Density Function for the interval [a, b].
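Equations 17a and 17b in R, a sketch reusing the normal sample x from the earlier snippet (note b = 10 is the mean, so the continuous upper term is 0.5 per equation 15):

a <- 0; b <- 10
LPM(0, b, x) - LPM(0, a, x)                     # discrete P[0 <= x <= 10], ~0.19
LPM(1, b, x) / (LPM(1, b, x) + UPM(1, b, x)) -
  LPM(1, a, x) / (LPM(1, a, x) + UPM(1, a, x))  # continuous P[0 <= x <= 10], ~0.28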
B. Methodology Notes:

We generated random distributions of 5 million observations. We then took 300 iterations with different seeds and averaged them. For stability estimates, we generated mean average deviations (MAD) for each statistic over the 300 iterations for observations 30 through 5 million.

The statistics used in the following discussion are as follows: CHIDF(target) - Cumulative distribution function for the Chi-square distribution and specified target; Kurtosis - Relative kurtosis measure of the entire sample; Mean - μ of the entire sample; Norm Prob(target) - Cumulative distribution function for the Normal distribution and specified target; POIDF(target) - Cumulative distribution function for the Poisson distribution and specified target; Range - Max observation minus min observation for the entire sample; SemiDev - Semi-deviation of the sample using the mean as the target; Skew - Skewness measure of the entire sample; StdDev - Standard deviation of the sample; UNDF(target) - Cumulative distribution function for the Uniform distribution.

All of the above mentioned distributions and targets can be easily verified by the reader with statistical software such as the IMSL subroutine library. Furthermore, the direct computation of the partial moments can also be easily implemented in such software. The sample parameters generated were as follows:

Normal Distribution: μ = 10.00018, σ = 19.99976
Poisson Distribution: θ = 9.999914
Uniform Distribution: μ = 10.00045
Chi-Square Distribution: v = 1, μ = 0.999947

C. Normal Distribution

We compare our metric to the traditional CDF, Φ, of a standard normal random variable.

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{\frac{-t^2}{2}}\,dt

The probability generated from the normal distribution converges to LPM(0, 0, X) in approximately 90 observations, as shown in Figure 5. LPM(0, 0, X) stabilizes with fewer observations than the normal probability (exhibiting a lower MAD), as shown in Appendix A, Table 1a. This is proof that LPM(0, 0, X) is indeed the discrete CDF of the distribution for the area less than the target. While the normal probability is less than or equal to the target, compared to less than for LPM(0, 0, X), the probability of the specific target outcome does not affect the probability to a specification of four decimal places.

The relationship between LPM_ratio(1, 0, X), LPM(0, 0, X) and the normal probability, or Norm Prob(0), is shown in Figure 5. The further from the mean, the greater the discrepancy between the continuous and discrete CDF, as seen in Figure 6. As the area of the distribution increases for the UPM if the target is less than the mean, the continuous CDF will be consistently lower than the discrete CDF. Conversely, as the area of the LPM increases if the target is greater than the mean, the continuous CDF will be consistently higher than the discrete CDF. This holds for all distribution families. The continuous and discrete probabilities are obviously equal at the endpoints of the distribution, 0 and 1 for the minimum and maximum respectively.

Figure 5. CDF of 0% target for Normal distribution with μ=10 and σ=20 parameter constraints.

Figure 6. Continuous estimate converges towards discrete estimate as the target approaches the sample mean (as h is increased from 0 to 4.5). The LPM n=0, h=0 is denoted as LPM(0,0,X); LPM n=1, h=0 is denoted by LPM(1,0,X); LPM n=1, h=4.5 is denoted as LPM(1,4.5,X); and the LPM n=0, h=4.5 is LPM(0,4.5,X).

Figure 7. Differences in discrete LPM(0,μ,X) and continuous LPM_ratio(1,μ,X) CDFs converge when using the mean target for the Normal distribution. LPM(0,μ,X) ≠ LPM_ratio(1,μ,X).

In Figure 7, the plot shows the convergence of the discrete LPM degree 0 of the mean to the continuous LPM degree 1 using the mean as the target return. The discrete estimate isn’t stable until around 1,000 observations.

Figure 8. Different locations of the target versus the mean and relationships between discrete and continuous CDFs.

In Figure 8, we used different targets of 4.5%, 9% (mean), and 13.5%, and we see that the continuous estimate is outside of the range of the discrete measures. Note that with the mean as the target, the continuous measure is rock solid on the 50% probability.

Normal Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds
Norm Prob(X ≤ 0.00) = .3085 | LPM(0, 0, X) = .3085    | LPM(1, 0, X) = .2208
Norm Prob(X ≤ 4.50) = .3917 | LPM(0, 4.5, X) = .3917  | LPM(1, 4.5, X) = .3339
Norm Prob(X ≤ Mean) = .5    | LPM(0, μ, X) = .5       | LPM(1, μ, X) = .5
Norm Prob(X ≤ 13.5) = .5694 | LPM(0, 13.5, X) = .5694 | LPM(1, 13.5, X) = .608

Table 1. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Normal distribution.

In Table 1, we see that the LPM degree 0 provides probabilities equivalent to the Normal Probability function from the IMSL library. The continuous probability using the LPM degree 1 is at 0.5 for the mean as a target, and has a lower probability below the mean and a higher probability above the mean, as we have noted previously.
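The first row of Table 1 can be reproduced directly in R (a sketch; base R’s pnorm() plays the role of the IMSL normal probability function):

pnorm(0, mean = 10, sd = 20)                  # 0.3085, the parametric Norm Prob(X <= 0)
set.seed(123)
x <- rnorm(5e6, mean = 10, sd = 20)
LPM(0, 0, x)                                  # ~0.3085, the discrete LPM(0, 0, X)
LPM(1, 0, x) / (LPM(1, 0, x) + UPM(1, 0, x))  # ~0.2208, the continuous estimate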
D. Uniform Distribution

We compare our metric to the traditional uniform CDF for values less than or equal to x.

F(x \mid A, B) = \begin{cases} 0, & \text{if } x < A \\ \frac{x - A}{B - A}, & \text{if } A \le x \le B \\ 1, & \text{if } x > B \end{cases}

Table 2 below shows the convergence of our metric to the traditional method for the uniform CDF (UNDF) with a mean of 10. The results are the same as we noted for the normal distribution in Table 1.

Uniform Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds
UNDF(X ≤ 0.00) = .4   | LPM(0, 0, X) = .4      | LPM(1, 0, X) = .3077
UNDF(X ≤ 4.50) = .445 | LPM(0, 4.5, X) = .445  | LPM(1, 4.5, X) = .3913
UNDF(X ≤ Mean) = .5   | LPM(0, μ, X) = .5      | LPM(1, μ, X) = .5
UNDF(X ≤ 13.5) = .535 | LPM(0, 13.5, X) = .535 | LPM(1, 13.5, X) = .5697

Table 2. Uniform distribution results illustrate convergence of LPM(0,x,X) to UNDF and a consistent relationship between LPM(0,x,X) and LPM_ratio(1,x,X) above and below the mean target.

E. Poisson Distribution

We compare our metric to the traditional Poisson CDF (POIDF) for values less than or equal to X.

f(x) = e^{-\theta} \frac{\theta^x}{x!}

Poisson Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds
POIDF(X ≤ 0.00) = .00005 | LPM(0, 0, X) = 0        | LPM(1, 0, X) = 0
POIDF(X ≤ 4.50) = .0293  | LPM(0, 4.5, X) = .0293  | LPM(1, 4.5, X) = .0051
POIDF(X ≤ Mean) = .5151  | LPM(0, μ, X) = .5151    | LPM(1, μ, X) = .5
POIDF(X ≤ 13.5) = .8645  | LPM(0, 13.5, X) = .8645 | LPM(1, 13.5, X) = .9365

Table 3. Poisson distribution results illustrate convergence of LPM(0,x,X) to POIDF and a consistent relationship between LPM(0,x,X) and LPM_ratio(1,x,X) above and below the mean target.

F. Chi-Square Distribution

We compare our metric to the traditional chi-square CDF (CHIDF) for values less than or equal to X.

F(x) = \frac{1}{2^{\frac{v}{2}} \Gamma(\frac{v}{2})} \int_0^x e^{\frac{-t}{2}} t^{\frac{v}{2} - 1}\,dt

We set the degrees of freedom for the chi-square equal to one. The reason for this arbitrary selection is the distinct curve generated by this parameter value, and its likeness to the power law distribution. There is no a priori argument that the degrees of freedom will affect our methodology, given its non-parametric derivation.

Chi-Squared Distribution Probabilities - 5 Million Draws, 300 Iteration Seeds
CHIDF(X ≤ 0) = 0       | LPM(0, 0, X) = 0       | LPM(1, 0, X) = 0
CHIDF(X ≤ 0.5) = .5205 | LPM(0, 0.5, X) = .5205 | LPM(1, 0.5, X) = .2087
CHIDF(X ≤ 1) = .6827   | LPM(0, 1, X) = .6827   | LPM(1, 1, X) = .5
CHIDF(X ≤ 5) = .9747   | LPM(0, 5, X) = .9747   | LPM(1, 5, X) = .989

Table 4. Chi-Square distribution results illustrate convergence of LPM(0,x,X) to CHIDF and a consistent relationship between LPM(0,x,X) and LPM_ratio(1,x,X) above and below the mean target.

G. Continuous Distributions:

In a discrete measurement with a zero target, there is no difference between a 40% observation and a 70% observation, as both will yield a single positive count in the frequency (both were observed in our normal distribution generation with μ=10 and σ=20 parameter constraints). However, there is considerable area between these two observations that merely gets binned in a probability analysis. This undesirable construct also has the ubiquitous quality of scale invariance. Equation (14) measures this neglected area, with its inherent variance consideration simultaneously factored with the discrete frequency analysis.

“All actual sample spaces are discrete, and all observable random variables have discrete distributions. The continuous distribution is a mathematical construction, suitable for mathematical treatment, but not practically observable.” E.J.G. Pitman (1979).

An LPM_ratio degree of 1 (n = q = 1) permits us to calculate the area “between the bins.” For example, in a roll of a die, the area of the function between 3.1 and 3.9 will be static for the discrete method (based on integer bins 1-6). If the distribution were actually continuous, the variance influence in the LPM_ratio degree 1 generates an accurate measurement of the area 3.1 through 3.9 for this area between the bins - for uniform and all other distributions. Furthermore, the mean for a die roll is approximately 3.5. The LPM_ratio degree 1 generates a 0.5 result for the CDF with the 3.5 mean as the target in a uniform distribution ranging from 1 to 6. Unfortunately, per Pitman’s observation, we are not able to generate a continuous distribution to observe and verify this notion for target values other than the mean (which we prove always equals 0.5) or the endpoints (0 or 1 for the sample minimum and maximum). The consistent observed relationship we demonstrated between LPM_ratio(1,x,X) and LPM(0,x,X) for targets above and below the mean offers considerable support for the continuous estimates.
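The die-roll claim can be checked directly (a sketch with the NNS partial moment calls):

die <- 1:6
LPM(1, 3.5, die) / (LPM(1, 3.5, die) + UPM(1, 3.5, die))  # continuous CDF of the 3.5 mean, 0.5
LPM(0, 3.5, die)                                          # discrete CDF, also 0.5 by symmetry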

A better example to distinguish between discrete and continuous analysis is the chi-square distribution with degrees of freedom set to one. The range of the observations extended to X=35.1, and the distribution resembles the power law function. Considering μ=1.0 and σ=1.414, the discrete probability of a mean return was 0.6827, as shown in Table 4. However, if one envisions the decreasing thin slice of area under the function all the way down the x-axis to the observation X=35.1, this extended result only generates a reading of one in its probability calculation of x > μ - no different than an observation of X=11, which is also a positive count in this example. The frequency of X=11 is the distinguishing characteristic. The difference in area between 11 and 35.1 is considerable, and is completely disregarded under discrete frequency analysis. When the variance of that deviation is considered to account for the infinite possible outcomes of the continuous variable, the probability of a mean return drops significantly to 0.5 from 0.6827.

The reason for this is straightforward: LPM(0,x,X) converges to the frequency / counting data set, while LPM_ratio(1,x,X) retains its area property.

III. Joint Distribution Co-Partial Moments and UPM/LPM Analysis

In this section, we introduce the framework for the joint distribution using partial moments. For more background, Appendix B and Appendix C provide more information on joint probabilities and conditional CDFs. We also replicate the covariance matrix of a two variable normal distribution and its cosemivariance matrix with the variables’ aggregated partial moment components. This information provides a toolbox that yields a positive definite symmetrical co-partial moment matrix capable of handling any target and resulting asymmetry, providing a distinct advantage over its cosemivariance counterpart.

The issue in this area traces back to the Markowitz (1959) chapter on semivariance analysis. The cosemivariance matrix in Markowitz is an endogenous matrix that is computed after the portfolio returns have been computed. Because we have to know the portfolio allocations before we can compute the portfolio returns, the cosemivariance matrix is not known until after we have solved the problem. Attempts to solve the mean-semivariance problem with an exogenous matrix, a matrix computed from the security return data, have had problems because the cosemivariance matrix is asymmetric, and therefore not positive semi-definite. Grootveld and Hallerbach (1999) noted that the endogenous and exogenous matrices are not equivalent. Estrada (2008), however, demonstrates that a symmetric exogenous matrix is a very good approximation for the endogenous matrix. Our purpose is to demonstrate a method that provides a positive semi-definite matrix system that preserves any asymmetry in the underlying process.

First, the LPM and the CLPM are defined as follows:

LPM(n, h, x) = \frac{1}{T}\left[ \sum_{t=1}^{T} \max\{0,\, h - x_t\}^n \right] \quad (18)

CLPM(n, h, x|y) = \frac{1}{T}\left[ \sum_{t=1}^{T} \left( \max\{0,\, h - x_t\}^n \cdot \max\{0,\, h - y_t\}^n \right) \right] \quad (19)

The Degree 1 Co-LPM (CLPM) matrix is:

\begin{bmatrix} LPM(2, h, x) & CLPM(1, h, x|y) \\ CLPM(1, h, y|x) & LPM(2, h, y) \end{bmatrix}

LPM(2, h, x) = CLPM(1, h, x|x) \quad (20)

Since variance is the squared deviation

\sigma_x^2 = \frac{1}{T} \cdot \sum_{t=1}^{T} (x_t - \mu_x)^2 \quad (21)

it is also the deviation times itself... the covariance of itself.

\sigma_{xx} = \frac{1}{T} \cdot \sum_{t=1}^{T} (x_t - \mu_x)(x_t - \mu_x) \quad (22)

And the covariance between two variables is simply

\sigma_{xy} = \frac{1}{T} \cdot \sum_{t=1}^{T} (x_t - \mu_x)(y_t - \mu_y) \quad (23)

Since the semivariance from benchmark B is

\Sigma_{xB} = \frac{1}{T} \cdot \sum_{t=1}^{T} \left[ \min(x_t - B,\, 0)^2 \right] \quad (24)

it is also the cosemivariance of itself

\Sigma_{xxB} = \frac{1}{T} \cdot \sum_{t=1}^{T} \left[ \min(x_t - B,\, 0) \right]\left[ \min(x_t - B,\, 0) \right] \quad (25)

And the cosemivariance between two variables is

\Sigma_{xyB} = \frac{1}{T} \cdot \sum_{t=1}^{T} \left[ \min(x_t - B,\, 0) \right]\left[ \min(y_t - B,\, 0) \right] \quad (26)

Since LPM degree 2 is equal to semivariance, LPM(2, B, x) = \Sigma_{xB}:

LPM(2, B, x) = \frac{1}{T} \cdot \sum_{t=1}^{T} \left[ \max(B - x_t,\, 0) \right]^2 \quad (27)

which also equals the Co-LPM degree 1 of the same variable

CLPM(1, B, x|x) = \Sigma_{xB} = \Sigma_{xxB} = \frac{1}{T} \cdot \sum_{t=1}^{T} \left[ \max(B - x_t,\, 0) \right]\left[ \max(B - x_t,\, 0) \right] \quad (28)

And the Co-LPM degree 1 between two variables is

CLPM(1, B, x|y) = \frac{1}{T} \cdot \sum_{t=1}^{T} \left[ \max(B - x_t,\, 0) \right]\left[ \max(B - y_t,\, 0) \right] \quad (29)

For two symmetrical distributions x, y with h = μ,

Co-LPM Matrix = Co-UPM Matrix

\begin{bmatrix} LPM(2, \mu, x) & CLPM(1, \mu, x|y) \\ CLPM(1, \mu, y|x) & LPM(2, \mu, y) \end{bmatrix} = \begin{bmatrix} UPM(2, \mu, x) & CUPM(1, \mu, x|y) \\ CUPM(1, \mu, y|x) & UPM(2, \mu, y) \end{bmatrix}

Furthermore, the addition of the Co-LPM matrix and the Co-UPM matrix is equivalent to the covariance matrix on the main diagonal.

\begin{bmatrix} LPM(2, h, x) & CLPM(1, h, x|y) \\ CLPM(1, h, y|x) & LPM(2, h, y) \end{bmatrix} + \begin{bmatrix} UPM(2, h, x) & CUPM(1, h, x|y) \\ CUPM(1, h, y|x) & UPM(2, h, y) \end{bmatrix}
= \begin{bmatrix} LPM(2, h, x) + UPM(2, h, x) & CLPM(1, h, x|y) + CUPM(1, h, x|y) \\ CLPM(1, h, y|x) + CUPM(1, h, y|x) & LPM(2, h, y) + UPM(2, h, y) \end{bmatrix}

The main diagonal of the aggregated matrix will retain the covariance equivalence under any asymmetry, with the following relationship for all targets:

\sigma_x^2 = LPM(2, \mu, x) + UPM(2, \mu, x) \quad (30)

CLPM(1, \mu, x|y) + CUPM(1, \mu, x|y) = \frac{1}{T}\left[ \sum_{t=1}^{T} \left( \max\{0,\, \mu - x_t\} \cdot \max\{0,\, \mu - y_t\} \right) + \left( \max\{0,\, x_t - \mu\} \cdot \max\{0,\, y_t - \mu\} \right) \right] \quad (31)

Equation (31) will generate a zero instead of a negative covariance result, ensuring a positive matrix. This zero (instead of the negative) result does not affect the preservation of information for the instances whereby one variable is above the target and one below. The addition of this observation to the complement set lowers both the CLPM and CUPM. In essence, nothing is something.

We note that each of the co-partial moment matrices is positive symmetrical semi-definite, ensuring a positive symmetrical definite aggregate matrix.

A. Complement Set Matrix

To further analyze the information in the (CLPM + CUPM)^C complement set from diverging target returns between variables, we introduce two new metrics - the diverging lower partial moment (DLPM) and diverging upper partial moment (DUPM).

DLPM(q|n, h, x|y) = \frac{1}{T}\left[ \sum_{t=1}^{T} \left( \max\{x_t - h,\, 0\}^q \cdot \max\{0,\, h - y_t\}^n \right) \right] \quad (32)

DUPM(n|q, h, x|y) = \frac{1}{T}\left[ \sum_{t=1}^{T} \left( \max\{0,\, h - x_t\}^n \cdot \max\{y_t - h,\, 0\}^q \right) \right] \quad (33)

Equation (32) provides the divergent LPM for variable Y given a positive target deviation for variable X from the shared target h, with the LPM and UPM degrees (n and q respectively) explained earlier in equations 1 and 2. For example, given a 20% observation for variable X and a shared target of 0%, a -10% observation for variable Y will generate a larger DLPM than a -5% observation for variable Y.

Conversely, equation (33) provides the divergent UPM for variable Y given a negative target deviation for variable X.

The matrix of each divergent partial moment will be aggregated to represent the divergent partial moment matrix (DPM). One key feature of this matrix is that the main diagonal consists of all zeros, since the divergent partial moment of the same variable does not exist. The degree 1 DPM is presented below.

\begin{bmatrix} 0 & DPM(1|1, h, x|y) \\ DPM(1|1, h, y|x) & 0 \end{bmatrix} = \begin{bmatrix} 0 & DLPM(1|1, h, x|y) \\ DLPM(1|1, h, y|x) & 0 \end{bmatrix} + \begin{bmatrix} 0 & DUPM(1|1, h, x|y) \\ DUPM(1|1, h, y|x) & 0 \end{bmatrix} \quad (34)

Since there exist only four possible interactions between two variables,

X ≤ target, Y ≤ target: CLPM(n, h, x|y)
X ≤ target, Y > target: DUPM(n|q, h, x|y)
X > target, Y ≤ target: DLPM(q|n, h, x|y)
X > target, Y > target: CUPM(q, h, x|y)

we can clearly see that the sum of the degree 0 probability matrices of all four interactions must equal one, explaining the entire multivariate distribution (see the sketch below).

The distinct advantage of partial moments over semivariance as the preferred below target analysis method is the ability of partial moments to compensate for any asymmetry.
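A sketch of the degree 0 four-quadrant decomposition, computed here with plain logical indicators (equivalent to the degree 0 co- and divergent partial moments above):

set.seed(123)
x <- rnorm(1000); y <- rnorm(1000); h <- 0
clpm <- mean(x <= h & y <= h)  # X <= target, Y <= target
dupm <- mean(x <= h & y >  h)  # X <= target, Y >  target
dlpm <- mean(x >  h & y <= h)  # X >  target, Y <= target
cupm <- mean(x >  h & y >  h)  # X >  target, Y >  target
clpm + dupm + dlpm + cupm      # 1: the four interactions partition the joint space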

Under symmetry,

Cosemivariance Matrix = ½ Covariance Matrix

\begin{bmatrix} \Sigma_{xx\mu} & \Sigma_{xy\mu} \\ \Sigma_{yx\mu} & \Sigma_{yy\mu} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix}

\begin{bmatrix} \Sigma_{xx\mu} & \Sigma_{xy\mu} \\ \Sigma_{yx\mu} & \Sigma_{yy\mu} \end{bmatrix} + \begin{bmatrix} \Sigma_{xx\mu} & \Sigma_{xy\mu} \\ \Sigma_{yx\mu} & \Sigma_{yy\mu} \end{bmatrix} = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix} \quad (35)

Minimizing the cosemivariance matrix creates an imbalance that has no offsetting components to equal the covariance matrix when added to itself. Minimizing the LPM matrix and the DLPM matrix has a simultaneous inverse effect of increasing the UPM matrix and DUPM matrix, ergo compensating for any asymmetry. This balancing effect holds for any target, not just μ.

\begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix} \sim \begin{bmatrix} LPM(2, \mu, x) & CLPM(1, \mu, x|y) \\ CLPM(1, \mu, y|x) & LPM(2, \mu, y) \end{bmatrix} - \begin{bmatrix} 0 & DPM(1|1, \mu, x|y) \\ DPM(1|1, \mu, y|x) & 0 \end{bmatrix} + \begin{bmatrix} UPM(2, \mu, x) & CUPM(1, \mu, x|y) \\ CUPM(1, \mu, y|x) & UPM(2, \mu, y) \end{bmatrix} \quad (36)

Each of the co-partial moment matrices is positive symmetrical semi-definite, ensuring a positive symmetrical definite aggregate matrix, thus avoiding the endogenous/exogenous matrix problem described by Grootveld and Hallerbach (1999) and Estrada (2008).

In R, using the ‘NNS’ package, we can verify the variance/covariance equivalence.

> set.seed(123); x=rnorm(100); y=rnorm(100)
> var(x)
[1] 0.8332328
# Sample statistic:
> UPM(2,mean(x),x)+LPM(2,mean(x),x)
[1] 0.8249005
# Population (unbiased) adjustment:
> (UPM(2,mean(x),x)+LPM(2,mean(x),x))*(length(x)/(length(x)-1))
[1] 0.8332328
# Variance is also the co-variance of itself:
> (Co.LPM(1,1,x,x)+Co.UPM(1,1,x,x)-D.LPM(1,1,x,x)-D.UPM(1,1,x,x))*(length(x)/(length(x)-1))
[1] 0.8332328
> cov(x,y)
[1] -0.04372107
> (Co.LPM(1,1,x,y)+Co.UPM(1,1,x,y)-D.LPM(1,1,x,y)-D.UPM(1,1,x,y))*(length(x)/(length(x)-1))
[1] -0.04372107

IV. Conclusions

We have demonstrated how the LPM degree 0 is equal to the traditionally derived CDF of any assumed distribution. LPM(0, x, X) converges to:

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{\frac{-t^2}{2}}\,dt,

F(x \mid A, B) = \begin{cases} 0, & \text{if } x < A \\ \frac{x - A}{B - A}, & \text{if } A \le x \le B \\ 1, & \text{if } x > B, \end{cases}

f(x) = e^{-\theta} \frac{\theta^x}{x!},

F(x) = \frac{1}{2^{\frac{v}{2}} \Gamma(\frac{v}{2})} \int_0^x e^{\frac{-t}{2}} t^{\frac{v}{2} - 1}\,dt.

The obvious benefit is the distribution agnostic manner of this direct computation, which consumes far less time and CPU effort than bootstrapping a discrete estimate. Furthermore, the stability of the partial moments versus each of the distribution estimates is yet another benefit of our method. Finally, the ability to derive results for a truly continuous variable emphasizes the flexibility of this method.

Any computer generated sample, and analysis thereof, is that of a discrete variable. A histogram and bins, as commonly performed in Excel by practitioners and academics alike, ignores a large area under the function due to this discrete classification. The addition of bins with increased observations does not fill in the area and converge to the continuous area estimate; it merely creates larger quantities of smaller areas, thus keeping the total area constant. Equation (14) makes no such concessions and generates the theoretical continuous area, while maintaining the relationship identified in Equation (15). We note how the continuous CDF is much more pronounced the further from the mean the integral is - compensating for the asymmetry of the additional area “between the bins” that is placed in the proceeding bin during discrete analysis.

Benoit Mandelbrot notes that the shorter the measuring instrument, the larger the coastline of Britain, ultimately yielding a result of infinity. This line of reasoning is commensurate with the continuous CDF versus its discrete counterpart, and the infinitesimal subintervals of a continuous distribution. We hope that further research on this method and its applications eventually finds its way to various fields of study.

We show that the Cumulative Distribution Function (CDF) is represented by the ratio of the lower partial moment (LPM_ratio) to the distribution for the interval in question. The addition of the upper partial moment ratio (UPM_ratio) enables us to create probability density functions (PDF) for any function or distribution without prior knowledge of its characteristics. The ability to derive the CDF and PDF without any distributional assumptions yields a more accurate calculation, devoid of any error terms present from a less than perfect goodness of fit, as well as critical information about the tails of the distribution. This foundation is then used to develop conditional probabilities and joint distribution co-partial moments. The resulting toolbox allows us to propose a new formulation for UPM/LPM analysis, and we note that each of the co-partial moment matrices is positive symmetrical semi-definite, ensuring a positive symmetrical definite aggregate matrix.

Appendix A:

In this section we address any sample size concerns the reader may logically infer. Since these concerns are not specific to our methodology but rather to statistics in general, we offer the results of a separate study comparing the deviations from the large sample sizes reported in the main body of this paper.

Figure 1a. Visual representation of the stabilization of statistics (Mean, StdDev, SemiDev, UPM(1,0,x)) as sample size increases.

Appendix B: Conditional Probabilities

We illustrate how the partial moment ratios can also emulate conditional probability calculations. We re-visualize the Venn diagram areas in Figure 1b as distribution areas from which the LPM and UPM can be observed.

Figure 1b. Venn diagram illustrating conditional probabilities of different areas in the sample space, S: P(B1|A) = 1, P(B2|A) ≈ 0.85, P(B3|A) = 0.

The conditional probability P(B1|A) = 1.

1 = 1 - LPM(0, a, B_1) - UPM(0, b, B_1) \quad (B.1)

1 = UPM(0, a, B_1) - UPM(0, b, B_1) \quad (B.2)

1 = (1) - (0)
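A sketch of equations B.1 and B.2 with simulated distributions, where a and b are the minimum and maximum value targets of A: when every observation of B1 falls inside [a, b], the conditional probability is 1.

set.seed(123)
A  <- runif(1000, 0, 10); a <- min(A); b <- max(A)
B1 <- runif(1000, 2, 8)             # B1 lies entirely inside [a, b]
1 - LPM(0, a, B1) - UPM(0, b, B1)   # equation B.1: returns 1
UPM(0, a, B1) - UPM(0, b, B1)       # equation B.2: returns 1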

The conditional probability P(B2|A) ≈ 0.85. For the first configuration, where part of B2 lies below a,

0.85 = 1 - LPM(0, a, B_2) - UPM(0, b, B_2) \quad (B.4)

0.85 = UPM(0, a, B_2) - UPM(0, b, B_2) \quad (B.5)

0.85 = (.85) - (0)

And for the second configuration, where part of B2 lies above b,

0.85 = 1 - LPM(0, a, B_2) - UPM(0, b, B_2) \quad (B.7)

0.85 = UPM(0, a, B_2) - UPM(0, b, B_2) \quad (B.8)

0.85 = (1) - (.15)

The conditional probability P(B3|A) = 0. For a B3 entirely below a,

0 = 1 - LPM(0, a, B_3) - UPM(0, b, B_3) \quad (B.10)

0 = UPM(0, a, B_3) - UPM(0, b, B_3) \quad (B.11)

0 = (0) - (0)

And for a B3 entirely above b,

0 = 1 - LPM(0, a, B_3) - UPM(0, b, B_3) \quad (B.13)

0 = UPM(0, a, B_3) - UPM(0, b, B_3) \quad (B.14)

0 = (1) - (1)

Bayes’ Theorem:

Bayes’ theorem will also generate the conditional probability of A given B, P(A|B), with the formula

P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}.

Where the probability of A is represented by,

P(A) = \frac{\text{Area of } A}{\text{Area of total Sample } S} = UPM(0, a, A)

And the probability of B is represented by,

P(B) = \frac{\text{Area of } B}{\text{Area of total Sample } S} = UPM(0, c, B)

where e is the minimum value target of area (distribution) S, just as a and c are for areas (distributions) A and B respectively (d and b are the maximum respective value targets). Thus, if the conditional probability of B given A is (per equation B.2),

P(B|A) = \frac{CUPM(0|0, c|a, B|A)}{UPM(0, a, A)}

Then,

P(A|B) = \frac{\frac{CUPM(0|0, c|a, B|A)}{UPM(0, a, A)} \cdot UPM(0, a, A)}{UPM(0, c, B)}

Cancelling out P(A) leaves us with Bayes’ theorem represented by partial moments, and our conditional probability on the right side of the equality.

P(A|B) = \frac{CUPM(0|0, c|a, B|A)}{UPM(0, c, B)}

The following table of the canonical breast cancer test example will help place the partial moments with their respective outcomes (R commands given in footnote 8):

- 1% of women have breast cancer (and therefore 99% do not).
- 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
- 10% of mammograms detect breast cancer when it’s not there (and therefore 90% correctly return a negative result).
- Using 1 for cancer (C) and true positive (TP) instances, and -1 for no cancer (NC) and true negative (TN) instances.⁸

              | Cancer (1%)              | No Cancer (99%)          | Y variable
Test Positive | Co.UPM(0,0,T,C,0,0)=.008 | D.LPM(0,0,T,C,0,0)=.099  | UPM(0,0,T) = .107
Test Negative | D.UPM(0,0,T,C,0,0)=.002  | Co.LPM(0,0,T,C,0,0)=.891 | LPM(0,0,T) = .893
X variable    | UPM(0,0,C) = .01         | LPM(0,0,C) = .99         | UPM+LPM = 1

⁸ In R, representing 1000 individuals:
> C=c(rep(1,8),rep(-1,990),rep(1,2)); T=c(rep(1,107),rep(-1,893))
Appendix C: Joint CDFs and UPM/LPM Correlation Analysis

‫ܯܷܲܥ‬ሺͲȁͲǡ ܿȁܽǡ ‫ܤ‬ȁ‫ܣ‬ሻ Joint CDFs:


ܲሺ‫ܤ‬ȁ‫ܣ‬ሻ ൌ
ܷܲ‫ܯ‬ሺͲǡ ܽǡ ‫ܣ‬ሻ
The discrete probability that both X is less than some target ݄௫ and Y is less than

some target݄௬ simultaneously is simply the degree 0 co-LPM provided earlier in


Then,
‫ܯܷܲܥ‬ሺͲȁͲǡ ܿȁܽǡ ‫ܤ‬ȁ‫ܣ‬ሻ equation (29).
ܷܲ‫ܯ‬ሺͲǡ ܽǡ ‫ܣ‬ሻ
ܷܲ‫ܯ‬ሺͲǡ ܽǡ ‫ܣ‬ሻ
ܲሺ‫ܣ‬ȁ‫ܤ‬ሻ ൌ 8
In R representing 1000 individuals:
ܷܲ‫ܯ‬ሺͲǡ ܿǡ ‫ܤ‬ሻ > C=c(rep(1,8),rep(-1,990),rep(1,2)); T=c(rep(1,107),rep(-1,893))

Appendix C: Joint CDFs and UPM/LPM Correlation Analysis

Joint CDFs:

The discrete probability that both X is less than some target h_x and Y is less than some target h_y simultaneously is simply the degree 0 co-LPM provided earlier in equation (29).

Pr[x ≤ h_x, y ≤ h_y] = CLPM(0|0, h_x|h_y, x|y)    (C.1)

This is the discrete CDF of the joint distribution, just how we prove LPM(0, h, X) is the discrete CDF of the univariate distribution.

Where,

0 ≤ CLPM(0|0, h_x|h_y, x|y) ≤ 1    (C.2)

CLPM(0|0, h_x|h_y, x|y) has the following properties for various correlations between the two variables ρ_xy, when h_x = h_y:⁹

• If ρ_xy = 1; CLPM(0|0, h_x|h_y, x|y) = min{LPM(0, h_x, x), LPM(0, h_y, y)}.
• If ρ_xy = 0; CLPM(0|0, h_x|h_y, x|y) = h_x · h_y
• If ρ_xy = -1; CLPM(0|0, h_x|h_y, x|y) = 0.

An example may help illustrate the relationship. Let's assume the same target h_x = h_y which we arbitrarily select to the 5% CDF level for two normal distributions with μ = 9 and σ = 20. We then ask, what's the probability that both variables will be in the lower 5% of their distribution simultaneously under different correlations?

[Figure 1C: joint CDF (CLPM, y-axis, 0%-6%) versus correlation ρ_xy from -1 to 1 (x-axis) at the shared 5% target.]

Figure 1C. Hypothetical 5% shared target on two variables (x, y) and the joint CDF for various correlations.

We can deduce the correlation between the assets only with knowledge of the CLPM and h_x|h_y. For example, with both our variables and their 5% targets, if the CLPM(0|0, h_x|h_y, x|y) = 0.25% we know that ρ_xy = 0.

Equation C.3 will provide the implied correlation for an observed discrete joint CDF, CLPM(0|0, h_x|h_y, x|y). Lucas (1995) provides a framework for estimating the correlation between two events with the following equation which substitutes a binomial event into the standard Pearson correlation coefficient:

⁹ We leave further asymmetric target analysis for future research.
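A quick simulation sketch of the three properties at the example's 5% targets, assuming MASS::mvrnorm is available for the bivariate normal draws:

library(MASS)
set.seed(123)
joint.cdf <- function(rho, n = 1e5, mu = 9, s = 20) {
  Sigma <- matrix(c(s^2, rho * s^2, rho * s^2, s^2), 2)
  Z <- mvrnorm(n, mu = c(mu, mu), Sigma = Sigma)
  h <- qnorm(.05, mu, s)              # the shared 5% target
  mean(Z[, 1] <= h & Z[, 2] <= h)     # degree 0 co-LPM, equation C.1
}
joint.cdf(0)    # ~0.0025 = 5% x 5%, the independence criterion
joint.cdf(1)    # ~0.05   = min of the marginal CDFs
joint.cdf(-1)   # ~0      : never jointly below the targets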

Corr(A, B) = [P(A and B) − P(A) × P(B)] / (√[P(A)(1 − P(A))] × √[P(B)(1 − P(B))])    (C.3)

From which we can substitute the partial moments for our events (x ≤ h_x, y ≤ h_y), yielding

ρ_xy = [CLPM(0|0, h_x|h_y, x|y) − LPM(0, h_x, x) · LPM(0, h_y, y)] / √{[LPM(0, h_x, x) · UPM(0, h_x, x)] · [LPM(0, h_y, y) · UPM(0, h_y, y)]}    (C.4)

From our h_x = h_y = 5% example,

ρ_xy = [0.25% − (5%)(5%)] / √([5% · 95%] · [5% · 95%])

ρ_xy = 0.

If the first term in the numerator (CLPM(0|0, h_x|h_y, x|y)) equals 0.25%, the implied correlation for that joint CDF is zero. This example also illustrates the independence criterion (h_x · h_y) from a zero correlation.

Partial Moment (Nonlinear) Correlations:

Avoiding the linear dependence of the Pearson coefficient from which Lucas' coefficient is derived, we can use the following relationship in Equation C.5 to determine the nonlinear correlation between two variables (0|0 → 0).

ρ_xy = [CLPM(0, h_x|h_y, x|y) − DLPM(0, h_x|h_y, x|y) − DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)] / [CLPM(0, h_x|h_y, x|y) + DLPM(0, h_x|h_y, x|y) + DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)]    (C.5)

If there is a -1 correlation, then the returns between the variables will always be divergent, thus

ρ_xy = [0 − DLPM(0, h_x|h_y, x|y) − DUPM(0, h_x|h_y, x|y) + 0] / [0 + DLPM(0, h_x|h_y, x|y) + DUPM(0, h_x|h_y, x|y) + 0] = −1    (C.6)

If there is a perfect correlation between two variables, then there will be no divergent returns, thus

ρ_xy = [CLPM(0, h_x|h_y, x|y) − 0 − 0 + CUPM(0, h_x|h_y, x|y)] / [CLPM(0, h_x|h_y, x|y) + 0 + 0 + CUPM(0, h_x|h_y, x|y)] = 1    (C.7)

If there is zero correlation between two variables, then the co- and divergent returns will be of equal frequency or magnitude (degree zero and degree one respectively),

CLPM(0, h_x|h_y, x|y) = DLPM(0, h_x|h_y, x|y) = DUPM(0, h_x|h_y, x|y) = CUPM(0, h_x|h_y, x|y)

Thus,

ρ_xy = [CLPM(0, h_x|h_y, x|y) − DLPM(0, h_x|h_y, x|y) − DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)] / [CLPM(0, h_x|h_y, x|y) + DLPM(0, h_x|h_y, x|y) + DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)] = 0    (C.8)

Degree one can be substituted to generate correlations whereby the magnitude of the target deviations are compared, generating a dependence coefficient.

Continuous Joint CDF:

The continuous joint CDF can be obtained with the following equation; whereby the ratio of CLPM(1|1, h_x|h_y, x|y) to the entire degree 1 joint distribution will generate the probability percentage. Thus,

CLPM_ratio(1|1, h_x|h_y, x|y) = CLPM(1|1, h_x|h_y, x|y) / {[LPM(1, h_x, x) · LPM(1, h_y, y)] + [UPM(1, h_x, x) · UPM(1, h_y, y)]}    (C.9)

Pr[x ≤ h_x, y ≤ h_y] = CLPM_ratio(1|1, h_x|h_y, x|y)    (C.10)

NONLINEARITY IS TEDIOUS, NOT COMPLEX


Deriving Nonlinear Correlation Coefficients from Partial Moments

Abstract

We introduce a nonlinear correlation coefficient metric derived from


partial moments that can be substituted for the Pearson correlation coefficient in
linear instances as well. The flexibility offered by partial moments enables
ordered partitions of the data whereby linear segments are aggregated for an
overall correlation coefficient. Our coefficient works without the need to
perform a linear transformation on the underlying data, and can also provide a
general measure of nonlinearity between two variables. We also extend the
analysis to a multiple nonlinear regression without the adverse effects of
multicollinearity.

1. INTRODUCTION

Chen et al. (2010) explore the problem of estimating a nonlinear correlation (see Figure 1). They note that no statistic of comparably generic use to the Pearson correlation coefficient exists for nonlinear correlations. We introduce a generic nonlinear correlation coefficient metric derived from partial moments that can be substituted for the Pearson correlation coefficient in linear instances as well. The flexibility offered by partial moments enables ordered partitions of the data whereby linear segments are aggregated for an overall correlation coefficient.

Partial moments have three main advantages: (1) no distributional assumption is required; (2) partial moments are integrated into economics through expected utility theory (Holthausen, 1981 and Guthoff et al., 1997); and (3) they are integrated into statistics, as Viole and Nawrocki (2012a) find that partial moments can be used to derive the CDF and PDF of any distribution.

The paper is organized as follows: The next section will cover the development of the

measure followed by a section with empirical results. Next, we extend the analysis to a

multidimensional nonlinear analysis with an application to nonlinear regression analysis.

A final discussion and summary completes the paper.



2. DEVELOPMENT OF NONLINEAR CORRELATION MEASURE

The Pearson correlation coefficient is represented by

ρ_{x,y} = cov(X, Y) / (σ_x σ_y)

and is standardized in the range [-1,1]. The covariance and standard deviation cannot isolate and differentiate the information present in each of the four possible relationships between two variables where the target is some reference point:

X ≤ target, Y ≤ target
X ≤ target, Y > target
X > target, Y ≤ target
X > target, Y > target

We propose a method of partitioning the distribution with partial moments to capture the information from each linear relationship embedded within a bi- or multivariate relationship (linear or nonlinear). Based on the above four relationships between two variables, a co- or divergent partial moment is constructed to quantify it.ⁱ

2.1 Co-Partial Moments

CLPM(n, h_x|h_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{0, h_x − X_t}^n · max{0, h_y − Y_t}^n]    (1)

CUPM(q, l_x|l_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{X_t − l_x, 0}^q · max{0, Y_t − l_y}^q]    (2)

where X_t represents the observation X at time t, Y_t represents the observation Y at time t, n is the degree of the LPM, q is the degree of the UPM, h_x is the target for computing below target observations for X, and l_x is the target for computing above target observations for X. For simplicity we assume that h_x = l_x.

2.2 Divergent Partial Moments

DLPM(q|n, h_x|h_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{X_t − h_x, 0}^q · max{0, h_y − Y_t}^n]    (3)

DUPM(n|q, h_x|h_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{0, h_x − X_t}^n · max{Y_t − h_y, 0}^q]    (4)
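As a concrete reference, here is a minimal base-R sketch of equations 1-4, with degree 0 written as indicators over the four mutually exclusive relationships above and degree 1 as the deviation-weighted versions; mean targets and h = l are assumptions, and this is an illustrative stand-in rather than the authors' implementation:

# Degree 0: count the share of observations in each quadrant.
clpm0 <- function(x, y, hx = mean(x), hy = mean(y)) mean(x <= hx & y <= hy)
cupm0 <- function(x, y, hx = mean(x), hy = mean(y)) mean(x >  hx & y >  hy)
dlpm0 <- function(x, y, hx = mean(x), hy = mean(y)) mean(x >  hx & y <= hy)
dupm0 <- function(x, y, hx = mean(x), hy = mean(y)) mean(x <= hx & y >  hy)
# Degree 1: weight each quadrant by the magnitude of the target deviations.
clpm1 <- function(x, y, hx = mean(x), hy = mean(y)) mean(pmax(hx - x, 0) * pmax(hy - y, 0))
cupm1 <- function(x, y, hx = mean(x), hy = mean(y)) mean(pmax(x - hx, 0) * pmax(y - hy, 0))
dlpm1 <- function(x, y, hx = mean(x), hy = mean(y)) mean(pmax(x - hx, 0) * pmax(hy - y, 0))
dupm1 <- function(x, y, hx = mean(x), hy = mean(y)) mean(pmax(hx - x, 0) * pmax(y - hy, 0))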

2.3 Definition of Variable Relationships:

X ≤ target, Y ≤ target → CLPM(n, h_x|h_y, X|Y)
X ≤ target, Y > target → DUPM(n|q, h_x|h_y, X|Y)
X > target, Y ≤ target → DLPM(q|n, h_x|h_y, X|Y)
X > target, Y > target → CUPM(q, h_x|h_y, X|Y)

To avoid the blunt covariance and standard deviation dependence of the Pearson coefficient, we can use the following nonparametric formula in equation 5 to determine the correlation (linear or nonlinear) between two variables.

ρ_xy = [CLPM(0, h_x|h_y, x|y) − DLPM(0, h_x|h_y, x|y) − DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)] / [CLPM(0, h_x|h_y, x|y) + DLPM(0, h_x|h_y, x|y) + DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)]    (5)

The axiomatic relationship between correlation and co- or divergent returns follows. If there is a -1 correlation, then the returns between the variables will always be divergent, thus,

ρ_xy = [0 − DLPM(0, h_x|h_y, x|y) − DUPM(0, h_x|h_y, x|y) + 0] / [0 + DLPM(0, h_x|h_y, x|y) + DUPM(0, h_x|h_y, x|y) + 0] = −1    (6)

If there is a perfect correlation between two variables, then there will be no divergent returns, thus,

ρ_xy = [CLPM(0, h_x|h_y, x|y) − 0 − 0 + CUPM(0, h_x|h_y, x|y)] / [CLPM(0, h_x|h_y, x|y) + 0 + 0 + CUPM(0, h_x|h_y, x|y)] = 1    (7)

If there is zero correlation between two variables, then the co- and divergent returns will be of equal frequency or magnitude (degree zero and degree one respectively),

CLPM(0, h_x|h_y, x|y) = DLPM(0, h_x|h_y, x|y) = DUPM(0, h_x|h_y, x|y) = CUPM(0, h_x|h_y, x|y)

Thus,

ρ_xy = [CLPM(0, h_x|h_y, x|y) − DLPM(0, h_x|h_y, x|y) − DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)] / [CLPM(0, h_x|h_y, x|y) + DLPM(0, h_x|h_y, x|y) + DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)] = 0    (8)

Degree one can be substituted for parameters n and q, to generate correlations whereby the magnitude of the target deviations are compared; thus generating a dependence coefficient.
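Equation 5 then follows directly from the degree 0 helpers in the previous sketch; note that boundary observations sitting exactly on a target fall into the "≤" quadrants here, so the output can differ marginally from the NNS.dep results shown later:

pm.cor <- function(x, y, hx = mean(x), hy = mean(y)) {
  cl <- clpm0(x, y, hx, hy); cu <- cupm0(x, y, hx, hy)
  dl <- dlpm0(x, y, hx, hy); du <- dupm0(x, y, hx, hy)
  (cl - dl - du + cu) / (cl + dl + du + cu)   # equation 5
}
x <- seq(-3, 3, .01)
pm.cor(x,  2 * x)   #  1: every observation is co-moving
pm.cor(x, -2 * x)   # ~-1: only the single observation at the joint mean is co-counted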

2.4 Visualization of the Partitions Using Means as Targets:

[Figure 1: the joint distribution partitioned at the mean targets (μ_x, μ_y) into the four quadrants CUPM(q, μ_x|μ_y, x|y), DUPM(n|q, μ_x|μ_y, x|y), DLPM(q|n, μ_x|μ_y, x|y) and CLPM(n, μ_x|μ_y, x|y).]

Figure 1. 1st order partitioning of the distribution based on variable relationships with co- and divergent partial moments on an observed nonlinear correlation in a microarray study from Chen et al. (2010).

[Figure 2: each quadrant further partitioned with new mean targets (x̄₁, ȳ₁) through (x̄₄, ȳ₄), yielding the subsets CUPM₁…CUPM₄, DUPM₁…DUPM₄, CLPM₁…CLPM₄ and DLPM₁…DLPM₄.]

Figure 2. 2nd order partitioning of the microarray study based on means of partial moment subsets as targets.

2.5 Definition of Variable Subsets:

{x₁, y₁} ∈ CUPM(q, μ_x|μ_y, x|y)
{x₂, y₂} ∈ DLPM(q|n, μ_x|μ_y, x|y)
{x₃, y₃} ∈ CLPM(n, μ_x|μ_y, x|y)
{x₄, y₄} ∈ DUPM(n|q, μ_x|μ_y, x|y)

2.6 Definition of Subset Means:

x̄₁ = (1/n) Σ x₁ₙ    ȳ₁ = (1/n) Σ y₁ₙ
x̄₂ = (1/n) Σ x₂ₙ    ȳ₂ = (1/n) Σ y₂ₙ
x̄₃ = (1/n) Σ x₃ₙ    ȳ₃ = (1/n) Σ y₃ₙ
x̄₄ = (1/n) Σ x₄ₙ    ȳ₄ = (1/n) Σ y₄ₙ

2.7 Definition of Subset Partial Moments:

CUPM₁(q, x̄₁|ȳ₁, X₁|Y₁) = (1/T) Σ_{t=1}^{T} [max{0, x₁ₜ − x̄₁}^q · max{0, y₁ₜ − ȳ₁}^q]    (9)

DLPM₁(q|n, x̄₁|ȳ₁, X₁|Y₁) = (1/T) Σ_{t=1}^{T} [max{x₁ₜ − x̄₁, 0}^q · max{0, ȳ₁ − y₁ₜ}^n]    (10)

CLPM₁(n, x̄₁|ȳ₁, X₁|Y₁) = (1/T) Σ_{t=1}^{T} [max{0, x̄₁ − x₁ₜ}^n · max{0, ȳ₁ − y₁ₜ}^n]    (11)

DUPM₁(n|q, x̄₁|ȳ₁, X₁|Y₁) = (1/T) Σ_{t=1}^{T} [max{0, x̄₁ − x₁ₜ}^n · max{y₁ₜ − ȳ₁, 0}^q]    (12)

For a 3rd order analysis for example, one needs to then compute the 12 remaining subset partial moments (in addition to the four identified in equations 9-12 above) using the appropriate subset mean targets for each quadrant. The total amount of subset means will be less than or equal to 4^(N−1) where N is the number of orders specified.ⁱⁱ

The eventual correlation metric is accomplished by adding all CUPM's and CLPM's (positive correlations) and subtracting DUPM's and DLPM's (negative correlations) in the numerator, while summing all 16 co- and divergent partial moments representing the entire distribution in the denominator per equation 13 below.

ρ_xy =

Numerator:
(CLPM₁ + CLPM₂ + CLPM₃ + CLPM₄ − DLPM₁ − DLPM₂ − DLPM₃ − DLPM₄ − DUPM₁ − DUPM₂ − DUPM₃ − DUPM₄ + CUPM₁ + CUPM₂ + CUPM₃ + CUPM₄)

Denominator:
(CLPM₁ + CLPM₂ + CLPM₃ + CLPM₄ + DLPM₁ + DLPM₂ + DLPM₃ + DLPM₄ + DUPM₁ + DUPM₂ + DUPM₃ + DUPM₄ + CUPM₁ + CUPM₂ + CUPM₃ + CUPM₄)

(13)

2.8 Dependence:

We can also define the dependence present between two variables as the sum of the absolute value of the per quadrant correlations. Stated differently, when all of the per quadrant observations are either the CLPM & CUPM, or DLPM & DUPM, the variables are dependent upon one another.

η(X, Y) = |ρ_CLPM| + |ρ_CUPM| + |ρ_DLPM| + |ρ_DUPM|    (14)

Where the CLPM quadrant's correlation is given by

|ρ_CLPM| = |(CLPM₄ + CUPM₄ − DLPM₄ − DUPM₄) / (CLPM₄ + CUPM₄ + DLPM₄ + DUPM₄)|

Equation 14 describes the amount of nonlinearity present in each quadrant when the negative correlations are equal in frequency or magnitude (depending on degree 0 or 1 respectively) to the positive correlations.

When η(X, Y) equals one, there is maximum dependence between the two variables. As η(X, Y) approaches 0, the relationship is approaching maximum independence.

3. EMPIRICAL EVIDENCE:

Third order partitions are shown and calculated in R. The 1st order partition is the thick red line (per Figure 1), the 2nd order partition is the thin red line (per Figure 2) and the 3rd order partition is the dotted black line.

Linear Equalities:ⁱⁱⁱ

Y = 2X
> x=seq(-3,3,.01);y=2*x
> cor(x,y)
[1] 1
> NNS.dep(x,y,print.map = T)
$Correlation
[1] 1
$Dependence
[1] 1

Figure 3. Linear positive relationship between two variables (X, Y).

Y = −2X
> x=seq(-3,3,.01);y=-2*x
> cor(x,y)
[1] -1
> NNS.dep(x,y,print.map = T)
$Correlation
[1] -1
$Dependence
[1] 1

Figure 4. Linear inverse relationship between two variables (X, Y).

Nonlinear Differences:

Y = X² for positive X
> x=seq(0,3,.01);y=x^2
> cor(x,y)
[1] 0.9680452
> NNS.dep(x,y,print.map = T)
$Correlation
[1] 0.9994402
$Dependence
[1] 0.9994402

Figure 5. Nonlinear positive relationship between two variables (X, Y).

Y = X²
> x=seq(-3,3,.01);y=x^2
> cor(x,y)
[1] 7.665343e-17
> NNS.dep(x,y,print.map = T)
$Correlation
[1] -0.001647721
$Dependence
[1] 0.9993975

Figure 6. Nonlinear relationship between two variables (X, Y).

As the exponential function increases in magnitude, we actually find it to retain its linear relationship…

Y = X¹⁰
> x=seq(0,3,.01);y=x^10
> cor(x,y)
[1] 0.6610183
> NNS.dep(x,y,print.map = T)
$Correlation
[1] 0.9812511
$Dependence
[1] 0.9812511

Figure 7. Nonlinear positive relationship between two variables (X, Y).

And a completely nonlinear clustered dataset, where coefficient weighting due to partition occupancy is exemplified.

Y = undetermined f(x)
> cor(cluster.df[,3],cluster.df[,4])
[1] -0.6275592
> NNS.dep(cluster.df[,3],cluster.df[,4],print.map = T)
$Correlation
[1] -0.1020994
$Dependence
[1] 0.2637387

Figure 8. Nonlinear relationship between two variables (X, Y).

4. MULTIDIMENSIONAL NONLINEAR ANALYSIS:

To find the 1st order aggregate correlation for more than two dimensions, the method is similar to what was just presented. Instead of co- and divergent partial moments, we are going to substitute co- and divergent partial moment matrices into equation 5. An n x n matrix for each of the interactions (CLPM, DLPM, DUPM and CUPM) per Viole and Nawrocki (2012a), can be constructed and treated analogously to the direct partial moment computation.

Thus,

CLPM_matrix(0, h_x…h_n, x…n) =
[ CLPM(0, h_x|h_x, x|x)  ⋯  CLPM(0, h_x|h_n, x|n) ]
[          ⋮             ⋱           ⋮            ]
[ CLPM(0, h_n|h_x, n|x)  ⋯  CLPM(0, h_n|h_n, n|n) ]    (15)

Yielding,

ρ_x…n = [CLPM_matrix(0, h_x…h_n, x…n) − DLPM_matrix(0, h_x…h_n, x…n) − DUPM_matrix(0, h_x…h_n, x…n) + CUPM_matrix(0, h_x…h_n, x…n)] / [CLPM_matrix(0, h_x…h_n, x…n) + DLPM_matrix(0, h_x…h_n, x…n) + DUPM_matrix(0, h_x…h_n, x…n) + CUPM_matrix(0, h_x…h_n, x…n)]    (16)

(with the arithmetic applied element-wise)

Whereby the final result will be an equal sized n x n matrix,

ρ_x…n =
[ ρ_xx  ⋯  ρ_xn ]   [ 1     ⋯  ρ_xn ]
[  ⋮    ⋱   ⋮   ] = [ ⋮     ⋱   ⋮   ]
[ ρ_nx  ⋯  ρ_nn ]   [ ρ_nx  ⋯  1    ]

To derive the overall correlation, we need to sterilize the main diagonal of 1's (which are self-correlations) with the following formula,

ρ_x…n = [Σ(ρ_x…n matrix) − n] / (n² − n)    (17)

(the sum running over all n² entries of the correlation matrix)

Again, if the variables are all below or above their respective targets at time t, the CLPM and CUPM matrices respectively will capture that information. If the variables are i.i.d., the likelihood that one variable would diverge at time t increases as n increases, reducing ρ_x…n.

Further order partition analysis can be translated to the multidimensional by creating matrices for each of the identified subsets for all of the variables.
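Equation 17 is a one-line computation; a sketch assuming `rho.matrix` holds the n × n result of equation 16:

overall.cor <- function(rho.matrix) {
  n <- nrow(rho.matrix)
  (sum(rho.matrix) - n) / (n^2 - n)   # strip the n diagonal self-correlations of 1
}
# e.g. on the 5-variable matrix of Figure 11, if stored as `m`: overall.cor(m)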

4.1 Nonlinear Regression Analysis:

The target means from which the four partial moment matrices are calculated also serve as the basis for a nonlinear regression. By plotting all of the mean intersections, the linear segments will fit the underlying relationship nonparametrically. The increased order of partitioning will generate more intersecting points (maximum of 4^(N−1)) for a more granular analysis. Below is an example with 3rd order partitioning, generating a fit to the linear data.

[Figure 9: nonparametric regression points fit to a linear relationship.]

Figure 9. Nonparametric regression points for a linear relationship between (X, Y). Orders progressing restricted to the previous partition boundary.

We can also perform this on nonlinear relationships. Below is an example with 3rd order partitioning, generating a fit to an exponential relationship between the variables.

[Figure 10: nonparametric regression points fit to an exponential relationship.]

Figure 10. Nonparametric regression points for a nonlinear relationship between (X, Y). As partition orders increase, the curve is better fit.

Generating a multiple variable nonlinear regression analysis requires creating a synthetic variable. This variable, X*, is the weighted average of all of the explanatory variables. The weighting is the nonlinear correlation derived from the n x n matrix where the explanatory variables are on the same row as the dependent variable which will have a 1.0 self-correlation. Thus, an explanatory variable with zero correlation to the dependent variable will be excluded from consideration. Thus,

X* = [Σ_{i=1}^{n} (ρ_{y,x_i})(x_i)] / n    (18)

And the nonlinear multiple regression can be performed in kind to the two variable example above with means of Y, X* as intersection points. This is similar to a nonparametric local means regression, only the number of means has to be a factor of 4 due to the four partial moment matrices per each analysis.

Figure 11 below is the nonlinear correlation matrix and the subsequent weightings for the multiple variable nonlinear regression using SPY as the dependent variable with TLT, GLD, FXE, and GSG as explanatory variables.ⁱᵛ The data involved 100 daily observations from 5/8/12 through 9/27/12 for all variables. As shown in Viole and Nawrocki (2012c) partial moments asymptotically converge to the area of the function, and stabilize with approximately 100 observations.

> NNS.cor(ReturnsDF,order=3)
             GSG         GLD         TLT         FXE         SPY
GSG   1.00000000 -0.10111213 -0.05050505  0.06070809  0.11111111
GLD  -0.10111213  1.00000000  0.23232323  0.21212121  0.03030303
TLT  -0.05050505  0.23232323  1.00000000  0.15151515 -0.23242629
FXE   0.06070809  0.21212121  0.15151515  1.00000000  0.23232323
SPY   0.11111111  0.03030303 -0.23242629  0.23232323  1.00000000

Figure 11. Nonlinear correlation matrix for 5 variables (SPY, TLT, GLD, FXE, GSG). Highlighted row isolates the coefficients for equation 18.

In this example per equation 18 our aggregated explanatory variable is,

X* = [−0.23(TLT) + 0.03(GLD) + 0.23(FXE) + 0.11(GSG)] / 4

Again, there are no multicollinearity issues with the explanatory variables; it simply does not matter if they are correlated or not. Below in figure 13 is the graph of this analysis with our 3rd order fit.
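A sketch of this computation, assuming a hypothetical data frame `ReturnsDF` whose columns hold the explanatory return series (the weights are the SPY row of Figure 11):

rho <- c(TLT = -0.23, GLD = 0.03, FXE = 0.23, GSG = 0.11)  # hypothetical storage of Fig. 11 weights
X.star <- as.matrix(ReturnsDF[, names(rho)]) %*% rho / length(rho)  # equation 18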

Figure 12. Our 9th order fit for a sine wave function of X.

5. DISCUSSION AND SUMMARY:

There is no argument as to why the partition cannot be further specified N times, ultimately yielding a 4^N number of segments. The partial moments are direct computations, just as other statistics such as means and variances. The obvious benefit is the ability to parse what was referred to as "noise" into valid information. Due to the fact that individual observations are weighted by (1/T), the number of observations in each segment will weigh the segment accordingly; thus affirming outlier observation status for such instances where a segment has minimal occupancy.

[Figure 13: 4th order nonparametric fit to the undetermined function of X*.]

Figure 13. Our 4th order fit for an undetermined function of X*.

The purpose of this paper was to put forth a nonparametric, nonlinear correlation metric where Chen et al. (2010) note, "there is no commonly used statistic quantifying nonlinear correlation that can find a similarly generic use as Pearson's correlation coefficient for quantifying linear correlation." Our linear sum of the weighted micro correlations does indeed capture the aggregate correlation. But, unlike Pearson's single correlation coefficient, we also generate the information necessary to reconstruct the relationship from the individual partial moment matrices. As for a direct policy statement resulting from the nonlinear regression analysis, it would have to assume the form of a conditional equation whereby each linear segment is defined for a specific range of the explanatory variable(s).
Autoregressive Modeling

ABSTRACT

Using component series from a given time series, we are able to demonstrate
forecasting ability with none of the requirements of the traditional ARMA method, while
strictly adhering to the definition of an autoregressive model. We also propose a new test
for seasonality using coefficient of variation comparisons for component series, and then
extend this proposed method to non-seasonal data. The resulting effect is that of
conditional heteroskedasticity on the forecast with more accurate forecasts derived from
implementing nonlinear regressions into the component series.

INTRODUCTION

An autoregressive model is simply a linear regression of the current value of the


series against one or more prior values of the series.10

In this article we aim to present a method of autoregressive modeling strictly

adhering to the above definition. We accomplish this by using a linear regression of like
data points isolated from the total time series. For instance, in monthly data, we will

examine the “January” data points autonomously to generate the ex ante “January”

observation.

Testing for seasonality of each of the monthly classifications will alert us whether to incorporate other months' data in the linear regression. Through simple examples, we will show how the steps of:

• Model Identification
• Model Estimation
• Diagnostic Testing
• Forecasting

will be reduced to that of:

• Separating like classifications
• Testing for seasonality
• Regression / Forecasting

We will also demonstrate how the ARIMA requirement of stationarity of the time series

is no longer necessary to forecast while no data will be lost to differencing techniques.

10 http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc444.htm

METHODOLOGY

In his 2008 article, Wang explains how to use Box-Jenkins models for forecasting. He uses an example of the quarterly electric demand in New York City from the first quarter of 1995 through the fourth quarter of 2005.

Figure 1 clearly shows that the demand data are quarterly, seasonal, and trending upward; consequently, the mean of the data will change over time. We can define a stationary time series as one with a constant mean and no trend over time. A plot of the data is usually enough to see if the data are stationary. In practice, few time series can meet this condition, but as long as the data can be transformed into a stationary series, a Box-Jenkins model can be developed. As defined above, this time series is not stationary.

[Figure 1: quarterly sales, observations 1-44.]

Figure 1. Recreation of data set from Wang [2008] based on quarterly electric demand in New York City from the first quarter of 1995 through the fourth quarter of 2005.

I. COMPONENT SERIES

Our first step is to break the time series down into like classifications. In this example, first quarter data will be aggregated to form a first quarter time series. The vectors of observation number and sales are given below:

Observation number = {1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41}
Sales = {22.91, 23.39, 23.51, 23.97, 24.81, 25.37, 24.95, 26.21, 25.76, 25.91, 27.08}

Vectors for Quarters 2 through 4 will be created analogously using every fourth observation starting from the corresponding quarter number and the sales data.

[Figure 2: first quarter sales vs. observation number.]

Figure 2. First quarter series isolated from original time series.
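In R the component series are simple index subsets; a sketch assuming a vector `sales` holds the 44 quarterly observations in order:

obs  <- seq(1, 41, by = 4)   # first quarter observation numbers
qtr1 <- sales[obs]           # the isolated QTR 1 component series
# Quarters 2-4 follow analogously: sales[seq(2, 44, 4)], sales[seq(3, 44, 4)], ...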

II. SEASONALITY

In order to test for seasonality, outside of the recommended "eyeball test" of the plotted data, we propose another method. If each of the quarterly series' coefficient of variation (σ/μ) is less than the total sample coefficient of variation, seasonality exists. In our working example, the standard deviations and means are presented in Table 1 below.

       Full Sample   QTR 1      QTR 2      QTR 3      QTR 4
σ      4.589798      1.261198   1.313679   3.632291   1.306242
μ      26.23295      24.89727   22.47545   33.09091   24.46818
σ/μ    0.174963      0.050656   0.058449   0.109767   0.053385

Table 1. Standard deviations and means for the full sample vs. each quarterly series. The coefficient of variation (σ/μ) is less than the sample's for all component series, indicating seasonality present in the data.

In monthly time series from 1/2000 through 5/2013 for the S&P 500, we find the total coefficient of variation to equal 0.158665526 with the "January" series coefficient of variation equal to 0.16710549, thus negating the seasonality consideration (and enabling the data for a conditional heteroskedasticity treatment we will illustrate later).¹¹

III. LINEAR REGRESSION

In order to adhere to the autoregressive definition provided in the introduction, we need to use a linear regression on the prior values of a variable. We have just created a subset of those values with like classifications to perform the regression.

Figure 3 below is the linear regression of the QTR 1 series. The regression equation is

y = 0.0961x + 22.878

Thus, our estimate for the next QTR 1 observation (the 45th observation overall)¹² is

y = 0.0961(45) + 22.878
y = 27.203

This is fairly close to the Box-Jenkins model result provided in Wang [2008] of 27.40. Again, we have lost no observations due to differencing in order to transform the data into a stationary series. Aside from the nonstationarity of the quarterly series, we note the linear approximation of the data as evidenced by the high R² of 0.9297. This linearity is not necessary, as will be discussed later when we introduce the nonlinear regression method to the discussion.

¹¹ Plots of total and monthly series are in the Appendix.
¹² The same series can be regressed on its own index, for this example (1:11).
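Both the seasonality test and the QTR 1 forecast are direct in R; a sketch reusing `sales`, `obs` and `qtr1` from the previous sketch (sd() here is the sample standard deviation, so values may differ marginally from Table 1 if the original used the population denominator):

cv <- function(v) sd(v) / mean(v)                  # coefficient of variation
cv(sales)                                          # full sample
sapply(1:4, function(q) cv(sales[seq(q, 44, 4)]))  # per quarter, cf. Table 1
fit <- lm(qtr1 ~ obs)                              # y = 0.0961x + 22.878
predict(fit, newdata = data.frame(obs = 45))       # 27.203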

[Figure 3: QTR 1 series with fitted trendline y = 0.0961x + 22.878, R² = 0.9297.]

Figure 3. QTR 1 plot with linear regression.

We extend the analysis to all four quarter series and generate the forecasts based on the linear regression of each series in figure 4 below. You will note the overall pattern resemblance of the estimates to the seasonal data set.

[Figure 4: all four quarterly series with their linear regressions (y = 0.0961x + 22.878, R² = 0.9297; y = 0.0905x + 20.485, R² = 0.7586; y = 0.2347x + 27.692, R² = 0.6682; y = 0.0986x + 22.102, R² = 0.9115) and the estimate for each quarterly series.]

Figure 4. All quarterly plots with associated linear regressions and estimates for each quarterly series.

[Figure 5: forecast path over 50 periods.]

Figure 5. 50 period forecast using static 4 period lag and linear regression.

IV. CONDITIONAL HETEROSKEDASTICITY

We noted earlier that under seasonality of the data, it is a simple regression of the component series to generate a forecast. However, under the absence of perfect seasonality this is not the case. When a single seasonal period is not identified, we use a weighted average of all identified seasonal components.

Figure 6 illustrates the seasonal components to the Wang [2008] quarterly time series (data provided in Appendix). Note the strong seasonal presence in periods 4 and 8.

[Figure 6: seasonal periods of the quarterly series.]

Figure 6. Periods (i) where σᵢ/μᵢ < σₓ/μₓ for variable (x).

Period (i)   Coefficient of Variation σᵢ/μᵢ   Variable Coefficient of Variation σₓ/μₓ
2            0.07176943                        0.1769858
3            0.16419383                        0.1769858
4            0.05599103                        0.1769858
6            0.08503594                        0.1769858
7            0.15964245                        0.1769858
8            0.06053440                        0.1769858
10           0.08217461                        0.1769858
11           0.15878767                        0.1769858

Table 1. Coefficients of variation for all periods versus the variable coefficient of variation.

In this example, we perform 8 component regressions and the forecast output weights are determined by summing the inverses of each period's coefficient of variation.

Period (i)   Intercept + β (t+1)            = Forecast
2            24.6275325 + 0.3797007 (23)    = 33.36065
3            23.1120879 + 0.3990549 (15)    = 29.09791
4            22.5900000 + 0.3845455 (12)    = 27.20455
6            23.874286 + 1.256071 (8)       = 33.92286
7            25.87466667 + 0.03914286 (7)   = 26.14867
8            22.786 + 0.728 (6)             = 27.154
10           20.075 + 2.945 (5)             = 34.8
11           23.110 + 0.999 (5)             = 28.105

Period (i)   Observations (t+1)   Output Weight
2            23                   0.283950617
3            15                   0.185185185
4            12                   0.148148148
6            8                    0.098765432
7            7                    0.086419753
8            6                    0.074074074
10           5                    0.061728395
11           5                    0.061728395
SUM          81                   1.0

Period (i)   Inverse Coefficient of Variation μᵢ/σᵢ   Output Weight
2            13.93351                                  0.153293933
3            6.090362835                               0.067005065
4            17.86000365                               0.196492513
6            11.75973359                               0.129378451
7            6.263998078                               0.068915368
8            16.5195327                                0.181744895
10           12.16920896                               0.133883424
11           6.297718204                               0.069286351
SUM          90.89406702                               1.0

Table 2. Forecast output weights for all periods demonstrating seasonality.

Forecast * Averaged Output Weight = Weighted Forecast
33.36065 * 0.218622275 = 7.293381202
29.09791 * 0.126095125 = 3.669104596
27.20455 * 0.172320331 = 4.687897049
33.92286 * 0.114071942 = 3.869646502
26.14867 * 0.077667561 = 2.03090341
27.154 * 0.127909485 = 3.473254147
34.8 * 0.09780591 = 3.40364566
28.105 * 0.065507373 = 1.841084715

Weighted Forecast Sum = 30.269

This technique places equal consideration on the number of observations in a component series and its coefficient of variation. Again, it should be reserved for instances of truly unknown seasonal periods and be more effective than a single seasonal factor on a test set from the sample.
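A sketch of the averaged-weight combination, with the period forecasts, observation counts and coefficients of variation taken from the tables above:

forecasts <- c(33.36065, 29.09791, 27.20455, 33.92286, 26.14867, 27.154, 34.8, 28.105)
n.obs <- c(23, 15, 12, 8, 7, 6, 5, 5)
cv.i  <- c(0.07176943, 0.16419383, 0.05599103, 0.08503594,
           0.15964245, 0.06053440, 0.08217461, 0.15878767)
w <- (n.obs / sum(n.obs) + (1 / cv.i) / sum(1 / cv.i)) / 2  # average both weightings
sum(w * forecasts)                                          # 30.269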

NONLINEAR REGRESSION

There is not a strong argument as to why a linear regression is required in the autoregressive model. Perhaps it was due to the time in which the models were derived? Regardless, we can use a nonlinear regression method to derive more accurate forecasts than the stipulated linear regression. This option will handle the nonlinearity of the component series.

So even if the data for the component series resembles the sine wave function as in figure 7 below (we are highlighting the nonlinearity of the data; stationarity is irrelevant), we will be able to generate a more accurate series forecast. We can see that the linear regression would suggest a positive data point (in green), yet the nonlinear regression based on partial moments from Viole and Nawrocki [2012] would suggest a decidedly negative observation for their forecasts.

[Figure 7: sine-wave-like component series with linear and partial moment regression forecasts.]

Figure 7. Nonlinear regression on a hypothetical component series used to highlight the inadequacy of a linear regression for forecasting even component series, let alone total series.

DISCUSSION

We have closely approximated the results from a Box-Jenkins method with an autoregressive model that has no stationarity requirement, requires no model identification, and is capable of handling nonlinearity. The absence of requirements and the retention of all of the original data is a promising starting point to adhere to the definition of the process.

We have also introduced a method of detecting seasonality in time series data. This technique can be used in conjunction with existing methods to confirm the results found in tests with normalized data (typically autocorrelation plots of differenced data). In the absence of seasonality, we offer a simple procedure for giving equal representation to other component variance which typically influences the component series via conditional heteroskedasticity.

APPENDIX: Wang [2008] dataset.

Obs #  Value    Obs #  Value
1      22.9     23     33.36
2      20.63    24     23.5
3      28.85    25     24.95
4      22.97    26     22.22
5      23.39    27     34.81
6      20.65    28     24.64
7      30.02    29     26.21
8      23.13    30     23.45
9      23.51    31     31.85
10     22.99    32     25.28
11     32.61    33     25.76
12     23.28    34     22.88
13     23.97    35     34.02
14     21.48    36     25.8
15     27.39    37     25.91
16     23.75    38     24.07
17     24.81    39     36.6
18     21.51    40     26.43
19     33.2     41     27.08
20     23.68    42     24.99
21     25.37    43     41.29
22     22.36    44     26.69

[Figure 1A: "S&P 500 2000 - 2013" — the full monthly series.]

Figure 1A. S&P 500 monthly returns 1/2000 – 5/2013.

[Figure 2A: "S&P 500 January Series" — the January-only series isolated from the monthly series.]

Figure 2A. S&P 500 January only returns 1/2000 – 5/2013.

APPLES TO APPLES COMPARISONS


NonLinear Scaling Normalization with Variance Retention

ABSTRACT

We present a nonlinear method of scaling to achieve normalization of multiple


variables. We compare this method to the standard linear scaling and the quantile
normalization methods. We find our overall normalized distribution to be more
representative of the original data set with regards to standard moments of individual
variables. We also find our normalized results to have an overall lower standard
deviation versus both the linear scaling and quantile normalization results for variables
with similar distributions.

INTRODUCTION

Normalization is the preferred technique for aligning and then comparing various

data sets. However, this technique often loses the variance properties associated with the

underlying distributions. The results are catastrophic on continuous variables, such that

they are effectively transformed into discrete variables. Viole and Nawrocki [2012a]

demonstrate this undesirable transformation for normalized variables.

We propose a new method of normalization that improves upon the linear scaling

technique by incorporating a nonlinear association metric as proposed in Chen [2010],

and Viole and Nawrocki [2012b]. In essence the typical linear scaling method assumes a

linear relationship between variables.

We then compare these normalized data sets using our proposed nonlinear scaling

technique, the linear scaling method, and quantile normalization.

METHODS

Linear Scaling
Linear scaling uses each set as a reference once, and then averages all of the iterations. This way the original series of every variable is considered in the final normalization. It is an equitable treatment of the data, yet blunt in its approach.



The Genomics and Bioinformatics Group of the NIH describe the linear scaling process as:¹³

In practice, for a series of chips, define normalization constants C₁, C₂, …, by:

C₁ = Σ_genes f₁^gene,  C₂ = Σ_genes f₂^gene,  and so on,

where the numbers f_i^gene are the fluorescent intensities measured for each probe on chip i. Select a common total intensity K (eg. the average of the Cᵢ's). Then to normalize all the chips to the common total intensity K, divide all fluorescent intensity readings from chip i by Cᵢ, and multiply by K.

Quantile Normalization

The goal of the Quantile method is to make the distribution of probe intensities for each array in a set of arrays the same. Quantile normalization assumes that the distribution of gene abundances is nearly the same in all samples. For convenience Bolstad et al. [2003] take the pooled distribution of probes on all chips. Then to normalize each chip they compute for each value, the quantile of that value in the distribution of probe intensities; they then transform the original value to that quantile's value on the reference chip. In a formula, the transform is

x_norm = F_i⁻¹(F_ref(x)),    (1)

where F_i is the distribution function of chip i, and F_ref is the distribution function of the reference chip.

A quick illustration of such normalizing on a very small dataset:¹⁴

Arrays 1 to 3, genes A to D

A  5  4  3
B  2  1  4
C  3  4  6
D  4  2  8

For each column determine a rank from lowest to highest and assign number i-iv

A  iv   iii  i
B  i    i    ii
C  ii   iii  iii
D  iii  ii   iv

These rank values are set aside to use later. Go back to the first set of data. Rearrange that first set of column values so each column is in order going lowest to highest value. (First column consists of 5,2,3,4. This is rearranged to 2,3,4,5. Second column 4,1,4,2 is rearranged to 1,2,4,4, and column 3 consisting of 3,4,6,8 stays the same because it is already in order from lowest to highest value.)

The result is:

A  5  4  3   becomes   A  2  1  3
B  2  1  4   becomes   B  3  2  4
C  3  4  6   becomes   C  4  4  6
D  4  2  8   becomes   D  5  4  8

¹³ http://discover.nci.nih.gov/microarrayAnalysis/Affymetrix.Preprocessing.jsp
¹⁴ http://en.wikipedia.org/wiki/Quantile_normalization

Now find the mean for each row to determine the ranks

A  (2 + 1 + 3)/3 = 2.00 = rank i
B  (3 + 2 + 4)/3 = 3.00 = rank ii
C  (4 + 4 + 6)/3 = 4.67 = rank iii
D  (5 + 4 + 8)/3 = 5.67 = rank iv

Now take the ranking order and substitute in new values:

A  iv   iii  i
B  i    i    ii
C  ii   iii  iii
D  iii  ii   iv

becomes:                     Original
A  5.67  4.67  2.00          5  4  3
B  2.00  2.00  3.00          2  1  4
C  3.00  4.67  4.67          3  4  6
D  4.67  3.00  5.67          4  2  8

These are the new normalized values. The new values have the same distribution and can now be easily compared.
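The worked example is reproducible in a few lines of base R; ties share the lower rank's value here, matching the rank iii assigned to both 4's in the second column:

m <- matrix(c(5, 2, 3, 4,   4, 1, 4, 2,   3, 4, 6, 8), nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), NULL))
rank.mean <- rowMeans(apply(m, 2, sort))              # 2.00, 3.00, 4.67, 5.67
ranks     <- apply(m, 2, rank, ties.method = "min")   # the i-iv assignments
apply(ranks, 2, function(r) rank.mean[r])             # the normalized table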

OUR PROPOSED METHOD

The nonlinear association between variables is an important metric. It is also quite new to the literature. Chen et al. [2010] propose a method by using a rank transformation on the underlying data, while Viole and Nawrocki [2012b] propose a method based on the partial moments of the underlying data. VN will be the method employed for this analysis.

We define the amount of nonlinear association present between two variables as:

η(X, Y) = |ρ_CLPM| + |ρ_CUPM| + |ρ_DLPM| + |ρ_DUPM|    (2)

Where,

Co-Partial Moments

CLPM(n, h_x|h_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{0, h_x − X_t}^n · max{0, h_y − Y_t}^n]    (3)

CUPM(q, l_x|l_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{X_t − l_x, 0}^q · max{0, Y_t − l_y}^q]    (4)

where X_t represents the observation X at time t, Y_t represents the observation Y at time t, n is the degree of the LPM, q is the degree of the UPM, h_x is the target for computing below target observations for X, and l_x is the target for computing above target observations for X. For notational simplicity we assume that h_x = l_x and h_y = l_y.



Divergent Partial Moments

DLPM(q|n, h_x|h_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{X_t − h_x, 0}^q · max{0, h_y − Y_t}^n]    (5)

DUPM(n|q, h_x|h_y, X|Y) = (1/T) Σ_{t=1}^{T} [max{0, h_x − X_t}^n · max{Y_t − h_y, 0}^q]    (6)

Definition of Variable Relationships:

X ≤ target, Y ≤ target → CLPM(n, h_x|h_y, X|Y)
X ≤ target, Y > target → DUPM(n|q, h_x|h_y, X|Y)
X > target, Y ≤ target → DLPM(q|n, h_x|h_y, X|Y)
X > target, Y > target → CUPM(q, h_x|h_y, X|Y)

Equation 2 describes the amount of nonlinearity present when the negative correlations (D-PM's) are equal in frequency or magnitude (depending on degree 0 or 1 respectively) to the positive correlations (C-PM's).

The nonlinear correlation between two variables is given by

ρ_xy = [CLPM(0, h_x|h_y, x|y) − DLPM(0, h_x|h_y, x|y) − DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)] / [CLPM(0, h_x|h_y, x|y) + DLPM(0, h_x|h_y, x|y) + DUPM(0, h_x|h_y, x|y) + CUPM(0, h_x|h_y, x|y)]    (7)

When η(X, Y) equals one, there is maximum dependence between the two variables. As η(X, Y) approaches 0, it is approaching maximum quadrant linearity. Per Viole and Nawrocki [2012b], the instances of maximum linearity η(X, Y) = 0 are associated with maximum nonlinear correlation readings ρ_xy = 1 or −1. Thus the use of dependence is more aptly defining the nonlinear association between variables. For a complete treatment on nonlinear correlations and associations please see Viole and Nawrocki [2012b].

Using this nonlinear association metric as a factor in the normalization iterative process produces very different results than the assumed 1 (linearity) from the standard linear scaling method.

Figure 1 below illustrates the process for a 2 gene and a 4 gene example. Each gene has the desired property of serving as the reference gene (RG) in the process once. This consideration is identical to the standard linear scaling technique. From each RG's total intensity, we derive the RG factor for each gene to the RG. Simple enough. However, we then multiply each gene's observations by the RG factor and the nonlinear association between the genes η(X, Y).

We repeat this process with every gene serving as the RG and then average all of

the RG factored observations for each gene. The result is a fully normalized distribution

for each gene with variance retention of the original data set.
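Since the Figure 1 worked example is not reproduced here, the following is only one possible reading of the iteration as an R sketch; `genes` (columns as series), a helper `eta()` implementing equation 2, and the use of total-intensity ratios as the RG factor are all assumptions:

nonlinear.scale <- function(genes, eta) {
  C <- colSums(genes)                          # total intensities
  out <- genes * 0
  for (r in seq_len(ncol(genes))) {            # every series serves as RG once
    f <- sapply(seq_len(ncol(genes)), function(i)
      (C[r] / C[i]) * eta(genes[, i], genes[, r]))  # RG factor x association
    out <- out + sweep(genes, 2, f, `*`)
  }
  out / ncol(genes)                            # average over all RG iterations
}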

We now present the results of this method on four financial variables: SPY, TLT, GLD, and FXE. The nonlinear association between self and cross financial time-series is well noted. This is an important test: since gene distributions are roughly similar, how does this method work on the most stochastic of variables?

Figure 3 below illustrates the results. Our method visually represents the original data set more clearly and also retains the finite moment relationships that the linear scaling method enjoys. We note the strong influence the nonlinear association has on the normalized series, as SPY is distinct due to its very low correlation to any of the other time series. Thus, the more correlated the series are, the lower the variance of the normalized population.

The problem with quantile normalization is that if the distributions do not intersect, the quantile ranks remain static and the normalized value is simply the mean. This is exemplified below with the financial variables. Obviously this is not an issue with gene arrays; however, it speaks to the ad hoc nature of the method. We see in Figure 3 below that quantile normalization does succeed in creating the same distribution for all, however, they are all uniform distributions.

ORDERS OF MAGNITUDE DIFFERENCES REMOVED

The method also successfully removes orders of magnitude differences between variables. Below in figure 4 is an example illustrating the results on MZM ($ billions scale), S&P 500 (point scale) and the US 10 Year Yield (% scale).

[Figure 4, top panel: "Unnormalized Data" — S&P 500, MZM, and 10 Yr Yield (%), 1959–2011.]

[Figure 4, bottom panel: "Nonlinear Scaling" — the same three series after normalization, 1959–2011.]

Figure 4. Orders of magnitude differences removed from 3 financial variables.

DISCUSSION

Note the tighter overall distribution from our method versus the linear scaling method. Also note the variance properties of each of the distributions versus the quantile normalization. We are tighter and more representative of the original data set for similar distributions. When the distributions vary considerably, the nonlinear association will be reflected in the variance of the normalized series.

We also have retained mean differences between the distributions for nonlinear variables. This characteristic is lost via its use as the normalizing factor in the linear scaling technique. Factoring the nonlinear association between variables is imperative in noting the nonlinear differences. Moreover, if the variable relationship is linear, our method retains the relationship between variables!

Bolstad et al. [2003] note,

"The four baselines shifted slightly lower in the intensity scale give the most precise estimates. Using this logic, one could argue that choosing the array with the smallest spread and centered at the lowest level would be the best, but this does not seem to treat the data on all arrays fairly."

Our method does treat all of the data on all of the arrays fairly. We use each array as a RG and utilize its nonlinear association (which uses all observations equally) with all other arrays equally.

ANOVA Using Continuous Cumulative Distribution Functions

Abstract

Analysis of Variance (ANOVA) is a statistical method used to determine whether a sample originated from a larger population distribution. We provide an alternate method of determination using the continuous cumulative distribution functions derived from degree one lower partial moment ratios. The resulting analysis is performed with no restrictive assumptions on the underlying distribution or the associated error terms.

INTRODUCTION

Analysis of Variance (ANOVA) is a statistical method used to determine whether

a sample originated from a larger population distribution. This is accomplished by using

a statistical test for heterogeneity of means by analysis of group variances. By defining

the sum of squares for the total, treatment, and errors, we then obtain the P-value

corresponding to the computed F-ratio of the mean squared values. If the P-value is

small (large F-ratio), we can reject the null hypothesis that all means are the same for the

different samples. However, the distributions of the residuals are assumed to be normal

and this normality assumption is critical for P-values computed from the F-distribution to

be meaningful. Instead of using the ratio of variability between means to the variability

within each sample, we suggest an alternative approach.

Using known distributional facts from samples, we can deduce a level of certainty

that multiple samples originated from the same population without any of the

assumptions listed below.



ANOVA ASSUMPTIONS

When using one-way analysis of variance, the process of looking up the resulting value of F in an F-distribution table is proven to be reliable under the following assumptions:

• the values in each of the groups (as a whole) follow the normal curve,
• with possibly different population averages (though the null hypothesis is that all of the group averages are equal) and
• equal population standard deviations (SD).

The assumption that the groups follow the normal curve is the usual one made in most significance tests, though here it is somewhat stronger in that it is applied to several groups at once. Of course many distributions do not follow the normal curve, so here is one reason that ANOVA may give incorrect results. It would be wise to consider whether it is reasonable to believe that the groups' distributions follow the normal curve.

Of course the different population averages impose no restriction on the use of ANOVA; the null hypothesis, as usual, allows us to do the computations that yield F.

The third assumption, that the populations' standard deviations are equal, is important in principle, and it can only be approximately checked by using as bootstrap estimates the sample standard deviations. In practice, statisticians feel safe in using ANOVA if the largest sample SD is not larger than twice the smallest.¹⁵

KNOWN DISTRIBUTIONAL FACTS FROM SAMPLES

Viole and Nawrocki [2012a] offer a detailed examination of CDFs and PDFs of various families of distributions represented by partial moments. They find that the continuous degree 1 LPM ratio is .5 from the mean of the sample. No deviations, for every distribution type, regardless of number of observations, period. Thus when a sample mean is compared to the population, the further the population's continuous degree 1 LPM ratio from the sample mean target is from 0.5, the less confident we are that the sample belongs to that population.

LPM_ratio(1, h, X) = LPM(1, h, X) / [LPM(1, h, X) + UPM(1, h, X)]    (1)

Where,

LPM(n, h, x) = (1/T) Σ_{t=1}^{T} max{0, h − x_t}^n    (2)

UPM(q, l, x) = (1/T) Σ_{t=1}^{T} max{0, x_t − l}^q    (3)

where x_t represents the observation x at time t, n is the degree of the LPM, q is the degree of the UPM, h is the target for computing below target returns, and l is the target for computing above target returns. h = l = μ throughout this paper.

Tables 1 through 4 illustrate the consistency of the degree 1 LPM ratio across distribution types.

¹⁵ http://math.colgate.edu/math102/dlantz/examples/ANOVA/anovahyp.html
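A sketch of equations 1-3; the degree 1 ratio is exactly 0.5 at the sample mean because total below-mean deviation always equals total above-mean deviation:

lpm <- function(n, h, x) mean(pmax(h - x, 0)^n)
upm <- function(q, l, x) mean(pmax(x - l, 0)^q)
lpm.ratio <- function(n, h, x) lpm(n, h, x) / (lpm(n, h, x) + upm(n, h, x))

x <- rchisq(1e5, df = 1)    # a heavily skewed sample
lpm.ratio(1, mean(x), x)    # 0.5, regardless of the distribution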

[Figure 1: "CDFs of Mean" — probability (y-axis) versus observations (x-axis) for LPM(0,μ,x) and LPM_ratio(1,μ,x).]

Figure 1. Differences in discrete LPM(0, μ, X) and continuous LPM_ratio(1, μ, X). CDFs converge when using the mean target for a Normal distribution. LPM(0, μ, X) ≠ LPM_ratio(1, μ, X).

Normal Distribution Probabilities - 5 Million Draws 300 Iteration Seeds
Norm Prob(X ≤ 0.00) = .3085   LPM(0, 0, X) = .3085     LPM(1, 0, X) = .2208
Norm Prob(X ≤ 4.50) = .3917   LPM(0, 4.5, X) = .3917   LPM(1, 4.5, X) = .3339
Norm Prob(X ≤ Mean) = .5      LPM(0, μ, X) = .5        LPM(1, μ, X) = .5
Norm Prob(X ≤ 13.5) = .5694   LPM(0, 13.5, X) = .5694  LPM(1, 13.5, X) = .608

Table 1. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Normal distribution. Bold estimate is the continuous LPM_ratio(1, μ, X) = 0.5.

Uniform Distribution Probabilities - 5 Million Draws 300 Iteration Seeds
UNDF(X ≤ 0.00) = .4     LPM(0, 0, X) = .4       LPM(1, 0, X) = .3077
UNDF(X ≤ 4.50) = .445   LPM(0, 4.5, X) = .445   LPM(1, 4.5, X) = .3913
UNDF(X ≤ Mean) = .5     LPM(0, μ, X) = .5       LPM(1, μ, X) = .5
UNDF(X ≤ 13.5) = .535   LPM(0, 13.5, X) = .535  LPM(1, 13.5, X) = .5697

Table 2. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Uniform distribution. Bold estimate is the continuous LPM_ratio(1, μ, X) = 0.5.

Poisson Distribution Probabilities - 5 Million Draws 300 Iteration Seeds
POIDF(X ≤ 0.00) = .00005  LPM(0, 0, X) = 0        LPM(1, 0, X) = 0
POIDF(X ≤ 4.50) = .0293   LPM(0, 4.5, X) = .0293  LPM(1, 4.5, X) = .0051
POIDF(X ≤ Mean) = .5151   LPM(0, μ, X) = .5151    LPM(1, μ, X) = .5
POIDF(X ≤ 13.5) = .8645   LPM(0, 13.5, X) = .8645 LPM(1, 13.5, X) = .9365

Table 3. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Poisson distribution. Bold estimate is the continuous LPM_ratio(1, μ, X) = 0.5.

Chi-Squared Distribution Probabilities - 5 Million Draws 300 Iteration Seeds
CHIDF(X ≤ 0) = 0        LPM(0, 0, X) = 0        LPM(1, 0, X) = 0
CHIDF(X ≤ 0.5) = .5205  LPM(0, 0.5, X) = .5205  LPM(1, 0.5, X) = .2087
CHIDF(X ≤ 1) = .6827    LPM(0, 1, X) = .6827    LPM(1, 1, X) = .5
CHIDF(X ≤ 5) = .9747    LPM(0, 5, X) = .9747    LPM(1, 5, X) = .989

Table 4. Final probability estimates with 5 million observations and 300 iteration seeds averaged for the Chi-Squared distribution. Bold estimate is the continuous LPM_ratio(1, μ, X) = 0.5.
134 Apples to Apples NONLINEAR NONPARAMETRIC STATISTICS NONLINEAR NONPARAMETRIC STATISTICS Apples to Apples 135
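As a minimal sketch (not from the original tables' code), the Table 1 pattern can be reproduced in R with base-R stand-ins for the partial moment functions; the Normal parameters below, mean 10 and standard deviation 20, are an assumption consistent with the tabulated probabilities. The degree 0 case is computed as a count, since R evaluates 0^0 as 1.

lpm <- function(degree, target, x)
  if (degree == 0) mean(x <= target) else mean(pmax(target - x, 0)^degree)
upm <- function(degree, target, x)
  if (degree == 0) mean(x > target) else mean(pmax(x - target, 0)^degree)
lpm.ratio <- function(degree, target, x)
  lpm(degree, target, x) / (lpm(degree, target, x) + upm(degree, target, x))

set.seed(123)
x <- rnorm(5e6, mean = 10, sd = 20)                    # assumed Table 1 parameters
c(pnorm(0, 10, 20), lpm(0, 0, x), lpm.ratio(1, 0, x))  # ~ .3085, .3085, .2208
lpm.ratio(1, mean(x), x)                               # 0.5 at the mean target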

METHODOLOGY

We propose using the mean absolute deviation from 0.5 for the samples in question. This result, compared to the ideal 0.5, will then answer the ANOVA inquiry of whether the samples originated from the same population.

First we need the average of all of the sample means, μ̄. Then we can compute each sample's absolute deviation from the mean of means.

$$D_i = \left|LPM_{ratio}(1, \bar{\mu}, X) - LPM_{ratio}(1, \mu, X)\right| \qquad (4)$$

Which reduces to

$$D_i = \left|LPM_{ratio}(1, \bar{\mu}, X) - 0.5\right|$$

The mean absolute deviation for n samples is then

$$MAD = \frac{1}{n}\sum_{i=1}^{n}\left|LPM_{ratio}(1, \bar{\mu}, X_i) - 0.5\right| \qquad (5)$$

yielding our measure of certainty ρ associated with the null hypothesis that the samples in question belong to the same population:

$$\rho = \left(\frac{0.5 - MAD}{0.5}\right)^2 \qquad (6)$$
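A minimal sketch of equations (4) through (6) in R, using base-R stand-ins for the degree 1 LPM ratio (anova.certainty is a hypothetical helper name, not from the text):

anova.certainty <- function(samples) {            # samples: a list of numeric vectors
  lpm1 <- function(t, x) mean(pmax(t - x, 0))     # degree 1 LPM
  upm1 <- function(t, x) mean(pmax(x - t, 0))     # degree 1 UPM
  lpm.ratio <- function(t, x) lpm1(t, x) / (lpm1(t, x) + upm1(t, x))
  grand.mean <- mean(sapply(samples, mean))       # mean of means, mu-bar
  mad <- mean(sapply(samples, function(x)
    abs(lpm.ratio(grand.mean, x) - 0.5)))         # equations (4) and (5)
  ((0.5 - mad) / 0.5)^2                           # equation (6): certainty rho
}

set.seed(123)
same.pop <- replicate(3, rnorm(30, mean = 20, sd = 4), simplify = FALSE)
anova.certainty(same.pop)   # near 1 when the samples share a population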

The next section will provide some visual confirmation of this methodology while confirming classic ANOVA analysis.

EXAMPLES OF OUR METHODOLOGY

Figure 1 below illustrates 3 hypothetical sample distributions. The dotted lines are the sample means μ, which we know have an associated LPMratio(1, μ, X) = 0.5. The solid black line is the mean of means μ̄ = 19.81, and associated LPM ratio deviations from 0.5 can be visually estimated.

Figure 1. 3 samples from the same population.

We can see visually that the LPMratio(1, μ̄, X) for these 3 samples is approximately 0.52, 0.51, and 0.48 for blue, purple, and green respectively. The mean absolute deviation from .5 is equal to .0167. Thus we are certain (ρ = 0.934) these 3 samples are from the same population.

According to the F-Values and associated degrees of freedom,

F.05(2, 27 df) = 3.3541
F.01(2, 27 df) = 5.4881

the classic ANOVA would reach the same conclusion even at P value < .01.

Figure 2 below illustrates 3 hypothetical sample distributions, only more varied than the previous example. The dotted lines are the sample means μ, which we know have an associated LPMratio(1, μ, X) = 0.5. The solid black line is the mean of means μ̄ = 20.48, and associated LPM ratio deviations from 0.5 can be visually estimated.

Figure 2. 3 samples not from the same population.

We can see visually that the LPMratio(1, μ̄, X) for these 3 samples is approximately 0.65, 0.63, and 0.2 for blue, purple, and green respectively. The mean absolute deviation from .5 is equal to .1933. Thus we are not certain (ρ = 0.376) these 3 samples are from the same population. The null hypothesis of a same population was rejected by classic ANOVA at P value < .01.

Figure 3 below illustrates 3 hypothetical sample distributions, only more varied than the previous example. The dotted lines are the sample means, which we know have an associated LPMratio(1, μ, X) = 0.5. The solid black line is the mean of means μ̄ = 20.48, and associated LPM ratio deviations from 0.5 can be visually estimated.

Figure 3. 3 samples not from the same population.

We can see visually that the LPMratio(1, μ̄, X) for these 3 samples is approximately 0.65, 0.63, and 0.01 for blue, purple, and green respectively. The mean absolute deviation from .5 is equal to .2567. Thus we are more certain (ρ = 0.237) than in the previous example that these 3 samples are NOT from the same population.

SIZE OF EFFECT

In the previous sections, we identified whether a difference exists and demonstrated how to assign a measure of uncertainty to our data. We focus now on how to ascertain the size of the difference present. The use of confidence intervals is often suggested as a method to evaluate effect sizes. Our methodology assigns the interval to the effect without the standardization or parameterization required for traditional confidence intervals.

The first step is to derive a sample mean for which we would be 95% certain the sample mean belongs to the population. We calculate the lower 2.5% of the distribution with an LPM test at each point to identify the inverse, akin to a value-at-risk derivation. We perform the same on the upper portion of the distribution with a UPM test. This two-sided test results in a negative deviation from the population mean (μ*−) and a corresponding positive deviation from the mean (μ*+). It is critical to note that this is not necessarily a symmetrical deviation, since any underlying skew will alter the CDF derivations for these autonomous points.

The effect size then is simply the difference between the observed mean (μ) and a certain mean associated within a tolerance either side of the population mean (μ*− and μ*+):

(μ − μ*−) ≤ effect ≤ (μ − μ*+).
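A minimal sketch of this two-sided derivation, assuming the 2.5% tail points are found by inverting the degree 0 partial moments over the observed support (effect.bounds is a hypothetical helper name, not from the text):

effect.bounds <- function(pop, mu.obs, alpha = 0.05) {
  grid <- sort(unique(pop))
  lpm0 <- sapply(grid, function(t) mean(pop <= t))  # degree 0 LPM at each target
  upm0 <- sapply(grid, function(t) mean(pop > t))   # degree 0 UPM at each target
  mu.star.minus <- max(grid[lpm0 <= alpha / 2])     # lower 2.5% point via LPM test
  mu.star.plus  <- min(grid[upm0 <= alpha / 2])     # upper 2.5% point via UPM test
  c(mu.obs - mu.star.minus, mu.obs - mu.star.plus)  # (mu - mu*-) and (mu - mu*+)
}

set.seed(123)
population <- rnorm(1000, mean = 20, sd = 4)
effect.bounds(population, mu.obs = 23)

Because the two tail points are located independently, any skew in the population distribution is reflected in asymmetric bounds, as the text requires.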

DISCUSSION

Viole and Nawrocki [2012c] define the asymptotic properties of partial moments to the area of any f(x). Thus, it makes intuitive sense that increased quantities of samples and observations will provide a better approximation of the population. Given this truism, the degrees of freedom do not properly compensate the number of observations.

We can see below that increasing the number of distributions from two to three and increasing the number of observations from 30 to 100 does not have an order of magnitude effect on the F-Values.

2 distributions and 3 distributions with 30 observations each:

F.05(1, 59 df) = 4.004    F.05(2, 88 df) = 3.1001

2 distributions and 3 distributions with 100 observations each:

F.05(1, 199 df) = 3.8886    F.05(2, 298 df) = 3.0261

The t-test concerns are simply nonexistent under this methodology, thus multiple 2 distribution tests can be performed. For example, if 15 samples are all drawn from the same population, then there are 105 possible comparisons to be made, leading to an increased type-1 error rate. The mean absolute deviation for 2 distributions' LPMratio(1, μ̄, X) would have to be > 0.025 to be less than 95% certain (0.475/.5) the distributions came from the same population. This translates to a substantial percentage difference in means. It is not hard to visualize such an extreme scenario such as Figure 4 below.

Figure 4. 2 samples not from the same population.

Given this scenario, whereby LPMratio(1, μ̄, A) = 1.0 and LPMratio(1, μ̄, B) = 0, the mean absolute deviation from μ̄ = 0.5, thus ρ = 0. Therefore, we are certain these distributions came from different populations.

Again, we have no assumptions on the data to generate this analysis and compensate for any deviation from normality either in the distribution of returns or the distribution of error terms. We substitute our level of certainty ρ for an F-test and associated P-value based ANOVA; the latter has been the subject of increasing debate recently and should probably be avoided.¹⁶

¹⁶ http://news.sciencemag.org/sciencenow/2009/10/30-01.html?etoc
http://www.sciencenews.org/view/feature/id/57091/description/Odds_Are_Its_Wrong
Causation

Abstract

We identify the necessary conditions to define causation between two variables. We compare this to Granger causality and the convergent cross mapping method to illustrate the theoretical differences. Our proposed method avoids the reciprocal-causality and nonlinearity concerns of Granger causality. We loosely share a procedural step with the convergent cross mapping method insofar as our lagged variable time-series are normalized. The resulting normalized variables permit relevant conditional probability and correlation statistics to be generated and used to determine causation.

INTRODUCTION

Correlation does not imply causation. We have known this to be the case for decades; however, the frequent misapplication of correlation to causation speaks volumes to the suspicion that correlation and causation are entwined…but how? Fischer Black [1984] offers multiple normative cases explaining how causality can only be demonstrated with experimentation. Black's argument indirectly identifies the conditional probability associated with a causal relationship, which is explicit in our proposed measure of causality.

$$C(X \to Y) = P(Y|X) \cdot \rho_{X,Y} \qquad (1)$$

CAUSATION(X → Y) = CONDITIONAL PROBABILITY(Y|X) * CORRELATION(X,Y)

Conditional Probability: The probability that an event will occur, given that one or more other events have occurred.

Correlation: A mutual relationship or connection between two or more things.
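Equation (1) composes directly. A minimal sketch in R with illustrative inputs (causation is a hypothetical helper name; the 0.85 conditional probability reappears in a worked example later in this chapter):

causation <- function(p.y.given.x, rho.xy) p.y.given.x * rho.xy   # equation (1)
causation(1.0, 1.0)    # P(Y|X) = 1 and rho = 1: C(X -> Y) = 1
causation(0.85, 1.0)   # a partial conditional probability caps C(X -> Y) at 0.85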

Correlation is a reciprocal relationship between two things. Conditional probability is not necessarily a reciprocal relationship between two things. This distinction is critical in factoring correlation to define the correlation/causation link.

HISTORICAL CAUSALITY TESTS

GRANGER CAUSALITY

Granger causality (GC) measures whether one event (X) happens before another event (Y) and helps predict it. According to Granger causality, past values of X should contain information that helps predict Y better than a prediction based on past values of Y alone. The formulation is based on a linear regression modeling of stochastic processes. This technique immediately raises some well documented concerns, namely, linearity, stationarity and, of course, the appropriate selection of variables. Any proposed substitute should be able to address these basic data set concerns.

CONVERGENT CROSS MAPPING

Sugihara et al. [2012] examine an approach specifically aimed at identifying causation in ecological time series called convergent cross mapping (CCM). They demonstrate the principles of their approach with simple model examples, showing that the method distinguishes species interactions (X, Y) from the effects of shared driving variables (Z). Attractor reconstruction is used to determine if two time series variables belong to the same dynamic system and are thus causally related.

Points on manifolds X and Y will only be nearest neighbors if X and Y are causally related. CCM uses the historical record of Y to estimate the states of X and vice versa. With longer time series the reconstructed manifolds are denser, nearest neighbors are closer, and the cross map estimates increase in precision. This convergence is used as a practical criterion for determining causation, further exposed by measuring the extent to which the historical record of Y values can reliably estimate states of X. CCM hypothesizes that this reliable estimate holds only if X is causally influencing Y.

"In dynamical systems theory, time-series variables (say, X and Y) are causally linked if they are from the same dynamic system (Dixon et al. [1999], Takens [1981], Deyle et al. [2011])—that is, they share a common attractor manifold M." Sugihara et al. [2012]

Figure 1 is a reproduction from their paper illustrating the manifold relationship.

Figure 1. Manifold relationship from Sugihara et al. [2012].

Separability Requirement

Sugihara et al. note the key requirement of GC is separability, namely that information about a causative factor is independently unique to that variable. Conditional probability is also independently unique to that variable. Separability is characteristic of purely stochastic and linear systems, and GC can be useful for detecting interactions between strongly coupled (synchronized) variables in nonlinear systems. Conditional probabilities are not restricted to these specific characteristics. Separability reflects the view that systems can be understood a piece at a time rather than as a whole. By normalizing the variables, we retain the whole system view perspective.

Our proposed measure avoids the GC problems of nonlinearity by normalizing the variables with a nonlinear scaling method. It also avoids the Granger problems of reverse causality, since the Venn areas (conditional probabilities) would have to be identical in size, shape, and location to permit reverse causality.

OUR PROPOSED METHOD

The first step in our method is to normalize the variables in order to determine the conditional probability between the two variables in question. In an experimental setting, conditional probability is controlled quite easily; in fact, this is the main argument of Black [1984]. To determine the conditional probability, we need a shared histogram for variables X and Y. This is not at all dissimilar to the approach in the convergent cross mapping technique, with the common attractor manifold for the original system M used to describe M_X and M_Y.

1) Normalize the variables. Viole and Nawrocki [2013] (VN) present a method for normalizing variables with a nonlinear scaling method that reflects the inherent nonlinear association between the variables within the scaling factor.
The normalized variables retain their variance and other finite moment characteristics. This is important to accurately derive the conditional probability of the new normalized variables. This is also critical in addressing the nonlinearity between variables where GC fails.

The CCM manifolds M_X and M_Y are constructed from lagged coordinates of the time series variables to retain past information. We accomplish the retention of lagged information via the normalization of each variable against lagged values of itself (τ and 2τ), resulting in normalized variables X' and Y'. We then normalize X' and Y' to each other via the VN process of nonlinear scaling to generate the shared histogram, resulting in X'' and Y''.

2) Derive the correlation between normalized variables. VN [2012] offer a method of deriving nonlinear correlation coefficients from partial moments that fully replicate Pearson's correlation coefficient in linear variable relationships. This is an important advantage at our disposal, and one Granger did not have access to at the time of his work. Given the lack of linear relationships between variables, any linear consideration will prove ineffectual. Furthermore, the normalization procedure in step 1 significantly reduces the nonlinearity between variables, allowing for a visual confirmation of the nonlinear correlation coefficients.

3) Derive the conditional probabilities. Using the partial moments of each of the resulting distributions will allow us to derive the conditional probabilities of the normalized variables.

$$LPM(n, h, X) = \frac{1}{T}\sum_{t=1}^{T}\max(h - X_t, 0)^n \qquad (1)$$

$$UPM(q, l, X) = \frac{1}{T}\sum_{t=1}^{T}\max(X_t - l, 0)^q \qquad (2)$$

where $X_t$ is the observation of variable X at time t, h and l are the targets from which to compute the lower and upper deviations respectively, and n and q are the weights applied to the lower and upper deviations respectively.

The next section will discuss deriving conditional probabilities from partial moments of the normalized distributions of X'' and Y''. Partial moments are asymptotic approximations of the area of an interval (in this instance, as shown later, the entire distribution) for any f(x). This nonparametric flexibility captures the nonstationarity associated with variables, which often spoils attempts at estimating true population parameters. Convergence, the first "C" in CCM, is demonstrated as the number of observations increases. Our method also benefits from increased observations, as partial moments gain stability as the number of observations increases.
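Equations (1) and (2) above translate directly to R. A minimal sketch with base-R stand-ins (the degree 0 case is computed as a count, since R evaluates 0^0 as 1):

lpm <- function(n, h, x)
  if (n == 0) mean(x <= h) else mean(pmax(h - x, 0)^n)   # equation (1)
upm <- function(q, l, x)
  if (q == 0) mean(x > l) else mean(pmax(x - l, 0)^q)    # equation (2)

set.seed(123)
x <- rnorm(1000)
lpm(0, mean(x), x) + upm(0, mean(x), x)                    # degree 0 moments partition the sample
isTRUE(all.equal(lpm(1, mean(x), x), upm(1, mean(x), x)))  # degree 1 moments balance at the mean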

CONDITIONAL PROBABILITIES

We illustrate how the partial moment ratios can also emulate conditional probability calculations. We re-visualize the Venn diagram areas in Figure 2 as distribution areas from which the LPM and UPM can be observed.

Figure 2. Venn diagram illustrating conditional probabilities X, Y in sample space Z. P(Y|X) = 1.

The conditional probability P(Y|X) = 1 reconstructed as normalized distributions. The following degree 0 partial moment relationships will yield the conditional probability of Y'' given X''.

Figure 3. Normalized Data Sets, P(Y''|X'') = 1. [Histogram of Y'' nested within the support of X'', with targets a, c, b, d marked.]

P(Y''|X'') = 1 − LPM(0, a, Y'') − UPM(0, b, Y'')   (3)
P(Y''|X'') = UPM(0, a, Y'') − UPM(0, b, Y'')
P(Y''|X'') = (1) − (0)

If X is chewing tobacco and Y is rare tongue cancer, does X cause Y? Axiomatically, there exists a conditional probability between the two variables. However, we know nothing about the relationship between them; in fact, if the correlation is negative we could state that X cures Y! We assume (know) this to not be the case, but it illustrates the necessity to define the relationship between X and Y further than just their conditional probability.
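A minimal sketch of equation (3) above, assuming the targets a and b are taken as the minimum and maximum of the conditioning distribution X'' (cond.prob is a hypothetical helper name):

lpm0 <- function(target, x) mean(x <= target)   # degree 0 LPM
upm0 <- function(target, x) mean(x > target)    # degree 0 UPM

cond.prob <- function(y, x) {
  a <- min(x); b <- max(x)        # support of the conditioning variable X''
  1 - lpm0(a, y) - upm0(b, y)     # equation (3): mass of Y'' inside [a, b]
}

set.seed(123)
x2 <- rnorm(1000)                 # stands in for X''
y2 <- rnorm(1000, sd = 0.25)      # Y'' nested inside the range of X''
cond.prob(y2, x2)                 # ~1, matching Figure 3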

Per figure 3 above, given the conditional probability P(Y|X) = 1, and if a positive correlation exists such that measured increases (decreases) in X result in measured increases (decreases) in Y (correlation $\rho_{X,Y} = 1$), we can state definitively that X causes Y.

C(X → Y) = P(Y|X) * ρ(X,Y)
C(X → Y) = 1 * 1
C(X → Y) = 1

The reciprocal case does not necessarily hold, as we can see from the figure above. Since X can occur without the occurrence of Y, P(X|Y) < 1, thus reducing C(Y → X) regardless of correlation, since $\rho_{X,Y} = \rho_{Y,X}$. In order for reciprocity of causality to occur, P(X|Y) = P(Y|X).

ADDITIVITY OF CAUSATION

C(X → Y) is also additive, such that

$$\sum_{i=1}^{n} C(X_{1 \ldots n} \to Y) = 1 \qquad (4)$$

Below is a figure whereby P(Y|X) < 1. This is an important realization and primarily the problem with finance and Bayes' application to finance and economics. Identifying the independent variables to satisfy equation 4 is nearly impossible in the social sciences and is a prominent argument in Black [1984].¹⁷

Figure 4. Venn diagram illustrating conditional probabilities X, Y in sample space Z. P(Y|X) ~ 0.85.

The conditional probability P(Y|X) ~ 0.85 reconstructed as normalized distributions:

Figure 5. Normalized Data Sets, P(Y''|X'') ~ 0.85.

P(Y''|X'') = 1 − LPM(0, a, Y'') − UPM(0, b, Y'')
P(Y''|X'') = UPM(0, a, Y'') − UPM(0, b, Y'')
P(Y''|X'') = (.85) − (0)

If the correlation between variables X and Y is the same as our theoretical assumption from the prior example ($\rho_{X,Y} = 1$), then

C(X → Y) = 0.85 * 1
C(X → Y) = 0.85

Then by the additive assumption, there exist other variable(s) to explain the causation of Y for the remaining 0.15, while factoring their specific correlations as well. It should be noted that it is irrelevant which side of the distribution Y overlaps X.

¹⁷ We do not rule out the possibility of multiple causes. However, multiple highly causative independent variables would then by necessity be exceptionally correlated with conditional probability overlays. This observation satisfies the conditions of the omitted variable bias, whereby the omitted variable: 1) must be a determinant of the dependent variable; and 2) must be correlated with one or more of the included independent variables.

BAYES' THEOREM

Bayes' theorem will also generate the conditional probability of X given Y, P(X|Y), with the formula

$$P(X|Y) = \frac{P(Y|X)P(X)}{P(Y)}.$$

Where the probability of X is represented by

$$P(X) = \frac{Area\ of\ X}{Area\ of\ total\ sample\ Z} = UPM(0, a, X)$$

and the probability of Y is represented by

$$P(Y) = \frac{Area\ of\ Y}{Area\ of\ total\ sample\ Z} = UPM(0, c, Y)$$

where e is the minimum value target of area (distribution) Z, just as a and c are for areas (distributions) X and Y respectively (d and b are the maximum respective value targets).

Thus, if the conditional probability of Y given X is (per equation 3)

$$P(Y|X) = \frac{CUPM(0|0, c|a, Y|X)}{UPM(0, a, X)}$$

then

$$P(X|Y) = \frac{\dfrac{CUPM(0|0, c|a, Y|X)}{UPM(0, a, X)} \cdot UPM(0, a, X)}{UPM(0, c, Y)}$$

Cancelling out P(X) leaves us with Bayes' theorem represented by partial moments, and our conditional probability on the right side of the equality:

$$P(X|Y) = \frac{CUPM(0|0, c|a, Y|X)}{UPM(0, c, Y)} \qquad \blacksquare$$
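A minimal numerical check of this partial moment form of Bayes' theorem on simulated data (upm0 and cupm0 are base-R stand-ins for the degree 0 UPM and co-UPM, not the package functions):

upm0  <- function(target, x) mean(x > target)
cupm0 <- function(tx, ty, x, y) mean(x > tx & y > ty)     # degree 0 co-UPM

set.seed(123)
x <- rnorm(1000)
y <- 0.5 * x + rnorm(1000)
a <- 0; cy <- 0                                           # targets for X and Y
p.y.given.x <- cupm0(a, cy, x, y) / upm0(a, x)            # per equation (3)
p.x.given.y <- p.y.given.x * upm0(a, x) / upm0(cy, y)     # Bayes' theorem
all.equal(p.x.given.y, cupm0(a, cy, x, y) / upm0(cy, y))  # TRUE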

MULTIVARIATE CAUSATION MATRIX

We can construct a multivariate causation matrix summarizing all causative influences per variables in question. We first use our method on the Sardine-Anchovy-Sea Surface Temperature example in Sugihara et al. and compare our results to the CCM method. We then apply our method to the S&P 500 – 10 Year Treasury Yield – Money Supply relationship.

Sugihara et al. Sardine – Anchovy – SST Example Replication

Sugihara et al. examine the relationship among Pacific sardine landings, northern anchovy landings, and sea surface temperature (SST). Figure 7 below, reproduced from Sugihara et al., panel C, shows the California landings of Pacific sardine and northern anchovy, while panels D to F show the CCM (or lack thereof) of sardine versus anchovy, sardine versus SST, and anchovy versus SST, respectively. Sugihara et al. contend this shows that sardines and anchovies do not interact with each other and that both are weakly forced by temperature.

Figure 7. Reproduced Figures 5C through 5F from Sugihara et al. [2012].

This example raises an important correlation consideration, especially when the differences in variables are in orders of magnitude. The sardine landings (left y-axis) and anchovy landings (right y-axis) in figure 7 are represented in different orders of magnitude for their unnormalized observations. Linear correlation coefficients are ill suited for such analysis. Figure 8 from VN [2012] illustrates the VN correlation coefficient differences under such an extreme scale consideration (Y = X^10) versus the Pearson correlation coefficient.

> x = seq(0, 3, .01); y = x^10
> cor(x, y)
[1] 0.6610183
> NNS.dep(x, y, print.map = T)
$Correlation
[1] 0.9812511
$Dependence
[1] 0.9812511

Figure 8. Correlation coefficients for the nonlinear relationship Y = X^10 on an extreme scale. Source: Viole and Nawrocki [2012b].

Figure 9. Newport and La Jolla SST relationship visualized. Newport Beach SST data were used for the anchovy data set versus La Jolla SST for the sardine data set, per the Sugihara et al. procedure. [Plot of unnormalized sardine and anchovy landings (left y-axis) against La Jolla and Newport SST in ˚C (right y-axis), 1928-2004.]

Figure 9 illustrates the (nonlinear) relationship between Newport and La Jolla SST. The VN and Pearson correlation coefficients under this less extreme scale consideration are .43 and .6541, respectively. The extreme scaling differences, present even after normalization, argue for the more accurate nonlinear VN correlation coefficient. Figure 10 represents the results of the VN normalization process. Sugihara et al. use a first difference normalization technique with unintended consequences, as will be discussed later.

Figure 10. Unnormalized and normalized sardine and anchovy landings per the VN process, successfully eliminating orders of magnitude differences while maintaining distributional properties. [Two panels, 1928-2004: "Unnormalized Data" and "Nonlinear Scaling".]

Table 1. Sardine-Anchovy data set with τ = 2 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: The Pearson ρ on the normalized data (for comparison to VN results); D: Causality matrix. Parenthesized entries are negative.

A: P(Y''|X'')
                X = Sardines   X = Anchovies
Y = Sardines    -              .775
Y = Anchovies   1.0            -

B: ρ(X'', Y'')
                X = Sardines   X = Anchovies
Y = Sardines    -              (.5663)
Y = Anchovies   (.5663)        -

C: Pearson ρ(X'', Y'')
                X = Sardines   X = Anchovies
Y = Sardines    -              (.358)
Y = Anchovies   (.358)         -

D = A*B: C(X → Y)
                X = Sardines   X = Anchovies
Y = Sardines    -              (.4388)
Y = Anchovies   (.5663)        -

Table 2. Sardine-SST data set with τ = 2 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: The Pearson ρ on the normalized data (for comparison to VN results); D: Causality matrix.

A: P(Y''|X'')
                X = Sardines   X = SST
Y = Sardines    -              .008
Y = SST         1.0            -

B: ρ(X'', Y'')
                X = Sardines   X = SST
Y = Sardines    -              (.157)
Y = SST         (.157)         -

C: Pearson ρ(X'', Y'')
                X = Sardines   X = SST
Y = Sardines    -              (.18)
Y = SST         (.18)          -

D = A*B: C(X → Y)
                X = Sardines   X = SST
Y = Sardines    -              (.0013)
Y = SST         (.157)         -
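The causality matrices assemble element-wise. A minimal sketch in R using the Table 1 values (parenthesized table entries entered as negatives):

A <- matrix(c(NA, 1.0, 0.775, NA), nrow = 2,         # A: conditional probabilities
            dimnames = list(Y = c("Sardines", "Anchovies"),
                            X = c("Sardines", "Anchovies")))
B <- matrix(c(NA, -0.5663, -0.5663, NA), nrow = 2,   # B: VN correlations
            dimnames = dimnames(A))
D <- A * B   # D = A * B, element-wise: C(X -> Y) per equation (1)
D            # C(Sardines -> Anchovies) = -.5663; C(Anchovies -> Sardines) = -.4389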

Table 3. Anchovy-SST data set with τ = 2 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: The Pearson ρ on the normalized data (for comparison to VN results); D: Causality matrix.

A: P(Y''|X'')
                X = SST   X = Anchovies
Y = SST         -         1.0
Y = Anchovies   .005      -

B: ρ(X'', Y'')
                X = SST   X = Anchovies
Y = SST         -         (.0067)
Y = Anchovies   (.0067)   -

C: Pearson ρ(X'', Y'')
                X = SST   X = Anchovies
Y = SST         -         .1459
Y = Anchovies   .1459     -

D = A*B: C(X → Y)
                X = SST   X = Anchovies
Y = SST         -         (.0067)
Y = Anchovies   (.00003)  -

Sugihara et al. Sardine – Anchovy – SST Example Discussion

Sugihara et al. [2012] declare from the implementation of the CCM method on the sardine – anchovy – SST dataset,

"In addition, as expected, there is no detectable signature from either sardine or anchovy in the temperature manifold; obviously, neither sardines nor anchovies affect SST."

We concur that there is no anchovy signature in the SST data. However, there is a very slight sardine signature. Obviously sardines do not affect SST, but we are measuring their presence through landing data. Given this semantic clarification, perhaps the sardines pick up on another diminishing variable which is more sensitive to other water conditions (salinity?) and also have inverse causal relationships. The sardines leave (diminished presence) due to this omitted variable, and the SST subsequently rises. The sardines did not cause the water temperature increase; they anticipated the rise and left.

"Thus, although sardines and anchovies are not actually interacting, they are weakly forced by a common environmental driver, for which temperature is at least a viable proxy. Note that because of transitivity, temperature may be a proxy for a group of driving variables (i.e., temperature may not be the most proximate environmental driver)." Sugihara et al. [2012]

We measure the presence of sardines and presence of anchovies as inversely related (nonlinearly) due to the substantial difference in correlations between the VN and Pearson correlation coefficients, and in a manner consistent with the bidirectional coupling case from Sugihara et al.

The minimal net effect sardine-anchovy of (.1275) also suggests another variable at play. We are not here to prove causation of sardine and anchovy landing data, as the authors' focus of finance and economics precludes them from accurately selecting relevant variables. However, we do offer a contending insight to the Sugihara et al. conclusion using exclusively nonlinear techniques.

This striking linear vs. nonlinear difference occurs in the very first step, the normalization techniques on the raw data. Sugihara et al. use the first difference in data points to normalize the data in CCM. This standard normalization technique results in a Pearson correlation of -.073 and an equally paltry .0278 VN correlation coefficient for sardines versus anchovies. However, this is compared with a -.3579 Pearson and -.67 VN correlation coefficient on the raw data. Table 3 below presents the Pearson correlation coefficient for the raw data set, the Sugihara et al. first differences data set, and the VN normalized data set.

A closer examination of the normalization processes reveals the VN nonlinear scaling method retains the identical results for both Pearson and VN correlation coefficients, while the first differences method eliminates the underlying sardine-anchovy-SST relationships.

Table 3. Normalization effects on Pearson correlations and resulting correlation matrices.

Raw Data Pearson ρ
          SST       Anchovy   Sardine   SST(NB)
SST       1         (.3043)   (.10)     .6541
Anchovy   (.3043)   1         (.358)    (.2431)
Sardine   (.10)     (.358)    1         .1607
SST(NB)   .6541     (.2431)   .1607     1

1st Differences Normalized Data Pearson ρ
          SST       Anchovy   Sardine   SST(NB)
SST       1         (.13)     .017      .8694
Anchovy   (.13)     1         (.073)    (.0632)
Sardine   .017      (.073)    1         .0403
SST(NB)   .8694     (.0632)   .0403     1

VN Normalized Data Pearson ρ
          SST       Anchovy   Sardine   SST(NB)
SST       1         (.3043)   (.10)     .6541
Anchovy   (.3043)   1         (.358)    (.2431)
Sardine   (.10)     (.358)    1         .1607
SST(NB)   .6541     (.2431)   .1607     1

Money Supply – S&P 500 – 10 Year US Treasury Yield Example

We present the findings on the S&P 500 – 10 Year Treasury Yield – Money Supply relationship through our method, using a three variable normalization versus the two variable prior example.ᵛ

Figure 11 illustrates the effects of the nonlinear scaling normalization process on multiple variables. The resulting normalized variables are analogous to the manifolds offered in CCM and present the system as a whole for consideration by placing them on a shared axis.

Figure 11. Visual representation of the unnormalized (top) dual y-axis and final normalized variables (τ = 1) single y-axis using the method presented in Viole and Nawrocki [2013]. Also illustrates the ability for true multivariable normalization. [Two panels, 1959-2011: "Unnormalized Data" (S&P 500 and MZM levels against yield %) and "Nonlinear Scaling τ=1".]

One important feature is that MZM'' has a conditional probability equal to one given the events of both the 10 Year Yield'' and the S&P 500''. All of the normalized data points fit within the normalized range for MZM'' per figure 11 above. These numbers are in red in section A of table 4 below.

The correlation coefficient in section B of table 4 represents the 3rd order nonlinear correlation coefficient as demonstrated in VN [2012]. This offers a distinct insight versus its linear alternative, the Pearson correlation coefficient.

Table 4. Financial variable dataset with τ = 1 for normalization. A: The conditional probability matrix; B: The VN ρ on the normalized data; C: Causality matrix with cumulative causation in the bottom row.

A: P(Y''|X'')
                    X = S&P 500   X = 10 Year Yield   X = MZM
Y = SPY             -             .6867               1.0
Y = 10 Year Yield   1.0           -                   1.0
Y = MZM             .9074         .6651               -

B: ρ(X'', Y'')
                    X = S&P 500   X = 10 Year Yield   X = MZM
Y = SPY             1.0           (.2841)             .5031
Y = 10 Year Yield   (.2841)       1.0                 (.5287)
Y = MZM             .5031         (.5287)             1.0

C = A*B: C(X → Y)
                    X = S&P 500   X = 10 Year Yield   X = MZM
Y = SPY             -             (.1940)             .5031
Y = 10 Year Yield   (.2841)       -                   (.5287)
Y = MZM             .4565         (.3517)             -
Σ C(X → Y)          .1724         (.5457)             (.0256)

We can state that MZM is a cause to S&P 500 prices and an inverse cause to 10 year Treasury yields, net of the bidirectional coupling the variables share. It should be noted that the linear Pearson correlation resulted in extremely high correlations, and consequently causation, for these same variable sets (ρ(X'', Y'') > .90). These results are consistent with (and stronger than) the asymmetrical bidirectional coupling predator – prey example in Sugihara et al. and with Black's causal argument on the intertwined relationship between money stock and economic activity.

Rogalski and Vinso [1977], through GC, firmly reject the hypothesis that causality runs unidirectionally from past values of money to equity returns. Their results are consistent with the hypothesis that stock returns are not purely passive but perhaps influence money supply in some complicated fashion. Our results showing asymmetrical bidirectional coupling directly support Rogalski and Vinso's contention.

DISCUSSION

Fischer Black had a very insightful article on causation in "The Trouble with Econometric Models." Black recommends experiments to isolate the causal variable in question, conditional probability. He illustrates several examples identifying the specification error associated with conditional probability as the cause of the lack of causality. Black sums it up beautifully:

We just can't use correlations, with or without leads and lags, to determine causation.

That's as true today as it was decades ago. However, we can now say:

We need correlations and conditional probabilities, with and without leads and lags, to determine causation.

Granger causality was predicated on prediction instead of correlation to identify causation between time-series variables. Stochastic variables predicated on nonlinear relationships do not lend themselves to prediction, especially if they are not strongly synchronized.

"Therefore, information about X(t) that is relevant to predicting Y is redundant in this system and cannot be removed simply by eliminating X as an explicit variable. When Granger's definition is violated, GC calculations are no longer valid, leaving the question of detecting causation in such systems unanswered." Sugihara et al. [2012]

CCM was not designed to compete with GC; rather, it is specifically aimed at a class of system not covered by GC (nonseparable, weakly coupled systems affected by shared driving variables). Our method, by contrast, is aimed at all systems. We normalize the variables to lagged observations of themselves, nonlinearly. We normalize the normalized variables to the other normalized variables of interest, nonlinearly. We generate nonlinear correlations between the normalized variables. All of the nonlinear methods employed fully replicate linear situations, as demonstrated in Viole and Nawrocki [2012b, 2013].

The authors' main focus is economics and finance. This binding condition inhibits them from extending the analysis, without collaboration, to other areas such as biology or ecological systems, as the convergent cross mapping method exemplifies. We could provide many more axiomatic examples of known (and unknown) conditional probabilities as Black does for support (or rejection) of causation, but experimentation and empirical analysis will ultimately serve as proof to this theoretical work. We look forward to extending the discussion to other fields in search of these experiments, thus satisfying the conditional probability requirement in proving causation.

APPENDIX A

EMPIRICAL CONDITIONAL PROBABILITY EXAMPLE

Earlier we illustrated the conditional probability for a given occurrence using partial moments from normalized variables. However, if we wish to further constrain the conditional distribution to positive and negative occurrences, we need to use co-partial moments of the reduced observation count. This differs from a joint probability, where the number of observations is not reduced to the conditional occurrences.

The following example will generate the conditional probability of a specific occurrence with Bayes' theorem, then with our method. Given 100 observations of 10 Year yield returns and S&P 500 returns (normalized by percentage return), what is the probability that given an interest rate increase, stocks rose?

Using the following data in Table 1A, we are after the bold red numbers:

Date       S&P 500   10 Year Yield   Date       S&P 500   10 Year Yield
1/1/2005   2.56%     0.95%           5/1/2007   3.95%     2.81%
2/1/2005   -1.50%    -0.24%          6/1/2007   3.19%     1.27%
3/1/2005   1.53%     -1.19%          7/1/2007   0.22%     7.11%
4/1/2005   -0.40%    7.62%           8/1/2007   0.41%     -1.98%
5/1/2005   -2.58%    -3.62%          9/1/2007   -4.44%    -6.83%
6/1/2005   1.18%     -4.72%          10/1/2007  2.88%     -3.26%
7/1/2005   2.01%     -3.44%          11/1/2007  2.80%     0.22%
8/1/2005   1.65%     4.40%           12/1/2007  -5.08%    -8.76%
9/1/2005   0.17%     1.90%           1/1/2008   1.08%     -1.21%
10/1/2005  0.13%     -1.42%          2/1/2008   -7.03%    -9.19%
11/1/2005  -2.81%    6.01%           3/1/2008   -1.75%    0.00%
12/1/2005  3.74%     1.78%           4/1/2008   -2.84%    -6.35%
1/1/2006   1.98%     -1.55%          5/1/2008   3.98%     4.73%
2/1/2006   1.31%     -1.12%          6/1/2008   2.36%     5.29%
3/1/2006   -0.16%    3.34%           7/1/2008   -4.52%    5.52%
4/1/2006   1.33%     3.23%           8/1/2008   -6.46%    -2.22%
5/1/2006   0.65%     5.56%           9/1/2008   1.90%     -3.04%
6/1/2006   -0.94%    2.38%           10/1/2008  -5.16%    -5.28%
7/1/2006   -2.90%    0.00%           11/1/2008  -22.81%   3.20%
8/1/2006   0.57%     -0.39%          12/1/2008  -9.27%    -7.63%
9/1/2006   2.11%     -4.21%          1/1/2009   -0.62%    -37.75%
10/1/2006  2.35%     -3.33%          2/1/2009   -1.37%    4.05%
11/1/2006  3.40%     0.21%           3/1/2009   -7.23%    13.01%
12/1/2006  1.84%     -2.79%          4/1/2009   -6.16%    -1.76%
1/1/2007   1.98%     -0.87%          5/1/2009   11.35%    3.83%
2/1/2007   0.54%     4.29%           6/1/2009   6.20%     11.59%
3/1/2007   1.44%     -0.84%          7/1/2009   2.59%     12.28%
4/1/2007   -2.65%    -3.45%          8/1/2009   1.04%     -4.40%

Date       S&P 500   10 Year Yield   Date       S&P 500   10 Year Yield
9/1/2009   7.60%     0.84%           1/1/2012   1.37%     -1.50%
10/1/2009  3.39%     -5.44%          2/1/2012   4.50%     -0.51%
11/1/2009  2.19%     -0.29%          3/1/2012   3.91%     0.00%
12/1/2009  1.89%     0.29%           4/1/2012   2.68%     9.67%
1/1/2010   2.03%     5.44%           5/1/2012   -0.20%    -5.69%
2/1/2010   1.18%     3.83%           6/1/2012   -3.31%    -13.01%
3/1/2010   -3.11%    -1.08%          7/1/2012   -1.34%    -10.54%
4/1/2010   5.61%     1.08%           8/1/2012   2.71%     -5.72%
5/1/2010   3.85%     3.17%           9/1/2012   3.16%     9.35%
6/1/2010   -6.22%    -11.84%         10/1/2012  2.81%     2.35%
7/1/2010   -3.78%    -6.65%          11/1/2012  -0.39%    1.73%
8/1/2010   -0.33%    -6.12%          12/1/2012  -3.06%    -5.88%
9/1/2010   0.69%     -10.87%         1/1/2013   1.97%     4.15%
10/1/2010  3.15%     -1.87%          2/1/2013   4.00%     10.48%
11/1/2010  4.32%     -4.24%          3/1/2013   2.13%     3.60%
12/1/2010  2.30%     8.31%           4/1/2013   2.52%     -1.02%
1/1/2011   3.49%     17.57%
2/1/2011   3.26%     2.99%
3/1/2011   2.96%     5.45%
4/1/2011   -1.27%    -4.87%
5/1/2011   2.05%     1.46%
6/1/2011   0.51%     -8.75%
7/1/2011   -3.89%    -5.51%
8/1/2011   2.90%     0.00%
9/1/2011   -11.15%   -26.57%
10/1/2011  -0.97%    -14.98%
11/1/2011  2.80%     8.24%
12/1/2011  1.58%     -6.73%

Defining the probabilities as:

P(SI) = probability of the S&P 500 increasing
P(SD) = probability of the S&P 500 decreasing
P(II) = probability of interest rates increasing
P(ID) = probability of interest rates decreasing

                Interest Rate Increase   Interest Rate Decrease   Interest Rate Unchanged   Total
S&P Increase    35 (CUPM)                28 (DLPM)                2                         65 (UPM)
S&P Decrease    9 (DUPM)                 24 (CLPM)                2                         35 (LPM)
S&P Unchanged   0                        0                        0                         0
Total           44 (UPM)                 52 (LPM)                 4                         100

Table 2A. Bayes' Theorem probabilities identified and displayed from the data in table 1A. Corresponding partial moment quadrants also represented.

According to Bayes' theorem,

$$P(SI|II) = \frac{P(II|SI)P(SI)}{P(II)}$$

$$P(SI|II) = \frac{\left(\frac{35}{65}\right)\left(\frac{65}{100}\right)}{\left(\frac{44}{100}\right)} = \frac{35}{44} = 79.55\%$$

This example raises an immediate concern: in the instance where there is a zero return, the observation is neither a gain nor a loss. These observations are highlighted in grey in table 1A. When an observation equals a target in the partial moment derivations, that observation is placed into an empty set, analogous to the unchanged column in the table above. Empty sets reduce both the lower and upper partial moments, thus their effect is symmetrical to the resulting statistics.
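As a base-R cross-check of the Bayes computation, a minimal sketch assuming sp and ten.yr hold the 100 monthly S&P 500 and 10 Year Yield returns from Table 1A:

p.si <- mean(sp > 0)                               # P(SI) = 65/100
p.ii <- mean(ten.yr > 0)                           # P(II) = 44/100
p.ii.given.si <- mean(sp > 0 & ten.yr > 0) / p.si  # P(II|SI) = 35/65
p.ii.given.si * p.si / p.ii                        # P(SI|II) = 35/44 = 79.55%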

Using our method:

Figure 1A below illustrates the normalized distributions from the data in table 1A. Using equation 3, we can see that the S&P 500 degree zero upper partial moment from the minimum 10 Year Yield observation is equal to .7955. The S&P 500 degree zero upper partial moment from the maximum 10 Year Yield observation is equal to zero. Thus, the conditional probability of a positive S&P 500 return given an increase in 10 Year Yields is equal to 79.55%, represented by the lighter shaded blue.

Figure 1A. Graphical representation of the conditional probability of a positive S&P 500 return given an increase in 10 Year Yields. [Normalized histogram: y-axis "Frequency" (0-14), x-axis "Return" (-22.81% to 15.95%), series 10 Year Yield and S&P.]

Alternatively, we can derive the same conclusion with conditional partial moments. The frequency of positive 10 Year Yield returns is represented by the degree zero upper partial moment from a zero target, where X = S&P 500 and Y = 10 year yield.

$$UPM(0, 0, Y) = \frac{1}{T}\sum_{t=1}^{T}\{\max(Y_t - 0, 0)\}^0 = 0.44$$

In R, where sp = S&P 500 and ten.yr = 10 year yield:

> UPM(0,0,ten.yr)
[1] 0.44

The number of occurrences is (0.44 * T), which yields 44 in this example. Using T* as our reduced universe of observations, we compute the conditional upper partial moment for a direct computation of the conditional probability from the underlying time series.

$$CUPM(0|0, 0|0, X|Y) = \left(\frac{1}{T^*}\right)\sum_{t^*=1}^{T^*}[\max(X_{t^*} - 0, 0)]^0[\max(Y_{t^*} - 0, 0)]^0$$

In our example,

$$CUPM(0|0, 0|0, X|Y) = .7955$$

And in R:

> Co.UPM(0,0,sp,ten.yr,0,0)/UPM(0,0,ten.yr)
[1] 0.7954545
> UPM(0,0,sp[ten.yr>0])
[1] 0.7954545

But this result isn't particularly interesting or innovative, since degree zero partial moments are frequency and counting statistics, just as in the Bayes derivation.

However, the method permits an easy conversion to a conditional expected shortfall measure, whereby the average S&P increase given an increase in interest rates can be computed by changing the degree of the X term to 1 from 0.

$$CUPM(1|0, 0|0, X|Y) = \left(\frac{1}{T^*}\right)\sum_{t^*=1}^{T^*}[\max(X_{t^*} - 0, 0)][\max(Y_{t^*} - 0, 0)]^0$$

In our example, the average S&P 500 increase given an increase in interest rates is

$$CUPM(1|0, 0|0, X|Y) = 1.5\%$$

And in R:

> (Co.UPM(1,0,sp,ten.yr,0,0)-D.UPM(1,0,sp,ten.yr,0,0))/UPM(0,0,ten.yr)
[1] 0.01495909
> UPM(1,0,sp[ten.yr>0])-LPM(1,0,sp[ten.yr>0])
[1] 0.01495909
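In base R, the same conditional expectation is simply the mean of the S&P 500 returns over the reduced universe of rate-increase months, since the degree 1 UPM minus the degree 1 LPM from a zero target equals the mean:

> mean(sp[ten.yr > 0])
[1] 0.01495909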

Both methodologies yield the same conditional probability, which is not surprising given the simple frequency requirement of the underlying calculation and the same associated targets for the partial moments. However, since partial moments are already used in portfolio analysis, their flexibility in constructing other relevant statistics is often overlooked.

REFERENCES

Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley and Sons, third edition.

Black, Fischer (1984). "The Trouble with Econometric Models." Financial Analysts Journal, Vol. 38, No. 2, pp. 29-37.

Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003). "A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias." Bioinformatics, Vol. 19, No. 2, pp. 185-193.

Chen, Y. A., Almeida, J., Richards, A., Muller, P., Carroll, R., and Rohrer, B. (2010). "A Nonparametric Approach to Detect Nonlinear Correlation in Gene Expression." Journal of Computational and Graphical Statistics, Vol. 19, No. 3, pp. 552-568.

Deyle, E. R., and Sugihara, G. (2011). "Generalized Theorems for Nonlinear State Space Reconstruction." PLoS ONE, 6(3): e18295.

Dixon, P. A., Milicich, M. J., and Sugihara, G. (1999). "Episodic fluctuations in larval supply." Science, Vol. 283, pp. 1528-1530.

Estrada, Javier (2008). "Mean-Semivariance Optimization: A Heuristic Approach." Journal of Applied Finance, v18(1), pp. 57-72.

Granger, C. (1969). "Investigating Causal Relations by Econometric Models and Cross-Spectral Methods." Econometrica, Vol. 37, No. 3, pp. 424-438.

Grootveld, Henk and Winfried Hallerbach (1999). "Variance vs. Downside Risk: Is There Really That Much Difference?" European Journal of Operational Research, v114, pp. 304-319.

Guthoff, A., Pfingsten, A., and Wolf, J. (1997). "On the Compatibility of Value at Risk, Other Risk Concepts, and Expected Utility Maximization." In: Hipp, C. et al. (eds.), Geld, Finanzwirtschaft, Banken und Versicherungen 1996; Beiträge zum 7. Symposium Geld, Finanzwirtschaft, Banken und Versicherungen an der Universität Karlsruhe vom 11.-13. Dezember 1996, Karlsruhe 1997, pp. 591-614.

Holthausen, D. M. (1981). "A Risk-Return Model with Risk and Return Measured as Deviations from a Target Return." American Economic Review, v71(1), pp. 182-188.

Kaplan, P. and Knowles, J. (2004). "Kappa: A Generalized Downside Risk-Adjusted Performance Measure." Journal of Performance Measurement, 8(3), pp. 42-54.

Lucas, D. (1995). "Default Correlation and Credit Analysis." Journal of Fixed Income, Vol. 11, pp. 76-87.

Markowitz, Harry (1959). Portfolio Selection (First Edition). New York: John Wiley and Sons.

Pitman, E. J. G. (1979). Some Basic Theory for Statistical Inference. London: Chapman and Hall.

Rogalski, R. J., and Vinso, J. D. (1977). "Stock Returns, Money Supply, and the Direction of Causality." Journal of Finance, September 1977, pp. 1017-1030.

Shadwick, W. and Keating, C. (2002). "A Universal Performance Measure." Journal of Performance Measurement, Spring 2002, pp. 59-84.

Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons.

Sugihara, G., May, R., Ye, H., Hsieh, C., Deyle, E., Fogarty, M., and Munch, S. (2012). "Detecting Causality in Complex Ecosystems." Science, Vol. 338, pp. 496-500.

Takens, F. (1981). In: Dynamical Systems and Turbulence, D. A. Rand and L. S. Young, Eds. New York: Springer-Verlag, pp. 366-381.

van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. New York: Springer-Verlag.

Viole, F. and Nawrocki, D. (2012a). "Deriving Cumulative Distribution Functions & Probability Density Functions Using Partial Moments." Available at SSRN: http://ssrn.com/abstract=2148482

Viole, F. and Nawrocki, D. (2012b). "Deriving Nonlinear Correlation Coefficients from Partial Moments." Available at SSRN: http://ssrn.com/abstract=2148522

Viole, F. and Nawrocki, D. (2012c). "f(Newton)." Available at SSRN: http://ssrn.com/abstract=2186471

Viole, F. and Nawrocki, D. (2013). "Nonlinear Scaling Normalization with Variance Retention." Available at SSRN: http://ssrn.com/abstract=2262358

Wang, G. S. (2008). "A Guide to Box-Jenkins Modeling." Journal of Business Forecasting, Spring 2008, Vol. 27, Issue 1, p. 19.

Wolfram Demonstrations Project, "Single Factor Analysis of Variance": http://demonstrations.wolfram.com/SingleFactorAnalysisOfVariance/

ENDNOTES

i. Newton proved the integral of a point in a continuous distribution to be equal to zero.

ii. If no data exists in a subset, no mean is calculated.

iii. The horizontal line as in the equation Y = 1 (point probability) yields a 0 correlation for both Pearson's correlation and our metric.

iv. All variables in the regression are exchange traded funds (ETFs) that trade in US markets: SPY is the S&P 500 ETF, TLT is the Barclays 20+ year Treasury Bond ETF, GLD is the Gold Trust ETF, FXE is the Euro Currency ETF, and GSG is the S&P GSCI Commodity Index ETF.

v. The data are monthly series from 01/01/1959 through 04/01/2013. They are available from FRED, with links to graphs and data for each of the variables listed:
http://research.stlouisfed.org/fred2/graph/?id=SP500
http://research.stlouisfed.org/fred2/graph/?s[1][id]=GS10
http://research.stlouisfed.org/fred2/series/MZMNS?rid=61
