Nonlinearity Matters Stock Price
PII: S0264-9993(20)31240-2
DOI: https://doi.org/10.1016/j.econmod.2020.11.004
Reference: ECMODE 5391
Please cite this article as: Behrendt, S., Schmidt, A., Nonlinearity matters: The stock price – trading
volume relation revisited, Economic Modelling, https://doi.org/10.1016/j.econmod.2020.11.004.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
The purpose of this paper is to investigate the information transfer in the relation between
stock prices and trading volume. While several theoretical models establish this relation,
determining its direction remains an empirical question. Conventional linear approaches, such
as Granger causality, provide only limited insights. Importantly, they do not take into account
the nonlinear nature of this relation, which is suggested by theoretical models of
noninformational trading. Moreover, they cannot identify the dominant direction of the
information transfer. Both shortcomings can be addressed by relying upon the concept of
Shannon transfer entropy. In an empirical application to a large sample of stocks, we employ
this model-free measure and find: (i) A substantial amount of nonlinear information transfer
across stocks, and (ii) this information predominantly flows from returns to trading volume
growth. Thus, we present empirical evidence that the relation between these financial
variables is in fact likely to be nonlinear.
JEL: C14, C58, G14
1 Introduction
A thorough understanding of the relationship between stock prices and trading volume has
important implications for our insights into the structure of financial markets, the dissemination
consequence, the analysis of the stock price – trading volume relationship has drawn a
considerable amount of attention in the finance literature (for an early summary of empirical
evidence investigating contemporaneous effects, see Karpoff, 1987). Of special interest to market
participants has been the question of whether there exists a directional information transfer
between stock prices and trading volume, i.e., whether knowledge of past movements in stock
prices leads to improved predictions not only of current but also of future movements in trading
volume, and vice versa. For instance, the former direction may be derived from a sequential
arrival of information argument, in the sense that stock prices incorporate new information
arriving on the market earlier than trading volume does.
In order to illustrate the relation between stock price and trading volume, we make use of
tick-by-tick Trade and Quote (TAQ) data for one constituent of the Dow Jones Industrial Average
(DJIA).¹ These data are obtained from Wharton Research Data Services at millisecond
frequency. To be more precise, we consider, as an example, the stock of Procter & Gamble (PG) on
May 6, 2010. This trading day is special since it is characterized by a significant flash crash,
marked by large and temporary selling pressure (e.g., Kirilenko et al., 2017; Easley et al., 2011).
Figure 1 depicts the tick-by-tick transaction prices and trading volumes for PG on May 6, 2010,
throughout the trading day, providing an impression of the sudden drop in prices and increase in
traded shares affecting many constituents of the DJIA. Although it is immediately clear from the
figure that some relation exists between a stock’s price and its trading volume, the direction of
the information transfer is not readily deduced. That is, the dominant direction of the predictive
relation between stock prices and trading volume is not clear.
¹ Given that the econometric analysis of such high-frequency data raises its own specific issues, we focus on the analysis of daily data in the empirical application below.
Figure 1: Tick-by-tick transaction prices and trading volume of the Procter & Gamble
stock on May 6, 2010
Considering the stock of Procter & Gamble, the plots show (a) tick-by-tick transaction prices and (b) tick-by-tick trading volumes for the trading day May 6, 2010.
[Figure 1 panels: (a) PG intraday transaction prices against time of day; (b) PG intraday trading volumes]
Regarding theoretical considerations behind the stock price – trading volume relation, early
models concerned with sequential information arrival are put forward by Copeland (1976) and
Jennings et al. (1981) and indicate the presence of a bidirectional predictive relationship between
absolute stock returns and trading volume. Moreover, different tax- and non-tax-related trading
motives such as the timing of realized capital gains and losses as well as portfolio rebalancing and
contrarian strategies have also been proposed to induce a stock price – trading volume relation
(e.g., Lakonishok and Smidt, 1989). Other potential explanations involve mixture of distributions
models (e.g., Clark, 1973; Epps and Epps, 1976) or the positive-feedback trading strategies of
noise traders, which induce a positive directional relation from stock returns to trading volume
In order to ensure stationary time series, empirical studies usually consider stock return and
transformed trading volume (e.g., trading volume growth) series instead of the price and trading
volume series in levels. Moreover, most of the empirical evidence on the relation between stock returns
and trading volume is based on linear models. For example, Smirlock and Starks (1988) and Lee
and Rui (2002) rely on bivariate vector autoregressive (VAR) models and Granger causality
(Granger, 1969) tests. While the former find a positive relation between absolute price changes
(a measure of stock return volatility) and trading volume for individual-level stocks, the latter
find that trading volume does not Granger-cause stock returns using daily index data from
the stock exchanges in New York, Tokyo and London.² Chordia and Swaminathan (2000) provide
evidence that trading volume determines the lead-lag cross-correlations in stock returns; Gagnon
and Karolyi (2009) find further support for a large sample of internationally cross-listed stocks,
and Chuang et al. (2009) use quantile regressions to show that the directional relation of trading
volume on stock returns is more heterogeneous across quantiles than the directional relation of
stock returns on trading volume. On a different note, Gallant et al. (1992) stress that large price
movements are followed by high trading volume, and Chen (2012) finds that returns of the S&P
500 predict trading volume in both bear and bull markets, while trading volume predicts returns
By contrast, considerations of nonlinearities have less often been a central part of the discussion,
even though empirical evidence of nonlinear dependencies in stock returns is ample (for an early
reference, see Hinich and Patterson, 1985). More recently, the relevance of nonlinear predictions
of end-of-day traded volume has also been shown in Sancetta (2019). Turning to the directional
relation between stock returns and trading volume, Campbell et al. (1993) develop a model in
which expected future stock returns evolve as a nonlinear function of current and past returns as
well as trading volume and document corresponding empirical evidence. Subsequently, Hiemstra
and Jones (1994) propose nonlinear Granger causality tests to analyze the relation between daily
returns of the DJIA and trading volume growth on the New York Stock Exchange. They indeed find
a significant bidirectional nonlinear relationship between stock returns and trading volume. In
addition, McMillan (2007) shows that lagged trading volume can be used to improve forecast
² Note that the concept of Granger causality is concerned with one time series’ ability to forecast present and future values of another time series and is thus not to be confused with a true causal relationship.
In this paper, we add to the previous literature in two ways: (i) We quantify and test for the
nonlinear directional information transfer between stock returns and trading volume growth by
drawing upon a practical two-step procedure, and (ii) we obtain novel empirical results from daily
calendar-adjusted log-returns and volume growth for more than 400 stocks over an 18-year time
period. The idea of the two-step procedure follows the same reasoning as the nonlinear Granger
causality test of Hiemstra and Jones (1994). First, we remove all linear auto- and cross-correlations
from the bivariate system by applying a linear filter in the form of a VAR model. Any
remaining residual information may then be attributed to nonlinearities in the bivariate system of
stock returns and trading volume growth. The second step differs from the nonlinear
Granger causality test proposed by Hiemstra and Jones (1994) in that we make use of Shannon
transfer entropy, as initially introduced by Schreiber (2000), to quantify the remaining nonlinear
residual information transfer. Derived from information theory, Shannon transfer entropy is a
nonparametric measure that is sensitive to any statistical dependencies between two time series.
Moreover, we are able to infer not only the existence of nonlinear residual information transfer
in the bivariate system of stock returns and trading volume growth, but also the dominant
direction of that information transfer, which constitutes an improvement over the usual (non-)linear
Granger causality tests. Thus, even in cases where the transfer entropy estimates point to
a bidirectional nonlinear residual information transfer, we can determine whether most of
the information flows from trading volume growth to stock returns or vice versa. In the appendix,
induce in a bivariate system with regards to measuring information transfer and to illustrate the
Overall, our results indicate that after accounting for all linear auto- and cross-correlations in the
bivariate system of stock returns and trading volume growth, there still exists a statistically
significant nonlinear residual information transfer in at least one direction for a large number of
stocks. While the nonlinear information transfer is bidirectional in many cases and no overall
pattern for all stocks emerges, information predominantly flows from stock returns to trading
volume growth for most of the stocks that we consider. This finding is not specific to stocks
belonging to a certain industry. As can be expected, the magnitude of the information transfer is
slightly affected when we additionally account for volatility persistence, which could otherwise
be a potential driver of the information transfer (e.g., Clark, 1973; Hiemstra and Jones, 1994), and
when we separately consider the time periods before and after the financial crisis that unfolded in
2008. However, our main finding of significant nonlinear information transfer between the VAR
residuals remains robust across all specifications. Therefore, linear empirical models do not seem
to suffice for adequately assessing the theoretical stock price – trading volume relation.
The remainder of the paper is structured as follows: Section 2 briefly discusses models of
noninformational trading as a source of a nonlinear directional relation between stock prices and
trading volume. Next, Section 3 introduces the concept of Shannon transfer entropy and outlines
the idea of the two-step procedure used to test for nonlinear residual information. In Section 4,
the data set used in the empirical analysis is described, and Section 5 presents the results of the
empirical analysis for the calendar-adjusted bivariate system of stock returns and volume growth
(Section 5.1) as well as when we additionally account for volatility persistence (Section 5.2). Lastly,
Section 6 offers some concluding remarks. Additional results and the simulation experiments are
Although many channels have been put forward to describe the relation between stock prices
and trading volume, as outlined above, we want to briefly consider one strand of the finance
literature in more detail: models of noninformational trading. These models are especially
interesting because, while the underlying reasons for trading may differ, they make similar
predictions. For example, models along the lines of De Long et al. (1990a) shed light on the effect
of investor sentiment (i.e., subjective beliefs about investment risks and future payoffs) on stock
prices and trading volume. Here, noninformational trading is based on shifting misperceptions of
noise traders related to future payoffs from a risky asset. In these behavioral models, noise
traders’ random beliefs influence prices since rational arbitrageurs, who also have a downward-sloping
demand for risky assets, do not aggressively push prices back to fundamentals when noise
traders experience a belief shock. These models predict that, on the one hand, downward price
pressure is generated by low investor sentiment and, on the other hand, high trading volume by
On a different note, the model of Campbell et al. (1993) addresses the empirical phenomenon that
the first-order daily autocorrelation of stock returns tends to decrease when trading volume is
high. Their model also builds on two types of traders, namely liquidity, or noninformational,
traders and other risk-averse traders that can be thought of as “market makers”.³ In contrast to
models in the spirit of De Long et al. (1990a), the model of Campbell et al. (1993) derives
noninformational trading from random shifts in the level of risk aversion of the large
subset of traders that is made up of liquidity traders. In equilibrium, this change in the level of
risk aversion leads to an increase in the expected return of the risky asset (i.e., the stock), which
compensates the risk-averse market makers for bearing the risk of holding that asset. Risk is
reallocated from the noninformational traders to the rest of the market since the former are
selling the stock, whereas the market makers accommodate this selling pressure. As a result,
trading volume increases, and the current stock price falls, inducing a negative current return and
Even though the differences between investor sentiment and risk aversion are of a rather
philosophical nature with regard to noise trader and liquidity trader theories (Tetlock, 2007), the
latter allows Campbell et al. (1993) to establish a direct link between trading volume and
expected future returns. Moreover, this link is nonlinear since, in their model, the expected
future stock return is related to the lagged stock return and an interaction term of lagged stock
return and trading volume. Such nonlinearities may pose a problem for econometricians when
trying to capture the relation between stock returns and trading volume with simple linear
models.
³ Market makers in the sense of Grossman and Miller (1988), even though they may not be specialists on the exchange and potentially hold positions for longer periods of time.
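The interaction term can be illustrated with a small simulation. The sketch below is a stylized process in the spirit of the Campbell et al. (1993) prediction, not their equilibrium model: the lag-one return coefficient is assumed to be 0.3 - 0.6·V_t, with volume V_t drawn uniformly on [0, 1] independently of returns (all hypothetical choices). A single pooled AR(1) slope then averages the low- and high-volume effects away, while slopes estimated separately on low- and high-volume days reveal them.

```python
import numpy as np


def simulate_interaction(n=20_000, a=0.3, b=-0.6, seed=0):
    """Simulate r_{t+1} = (a + b * V_t) * r_t + e_{t+1}.

    Stylized, hypothetical process: the lag-one return coefficient falls
    as volume V_t rises. Coefficients are illustrative, not estimates.
    """
    rng = np.random.default_rng(seed)
    volume = rng.uniform(0.0, 1.0, size=n)  # stand-in for detrended volume
    shocks = rng.standard_normal(n)
    r = np.zeros(n)
    for t in range(n - 1):
        r[t + 1] = (a + b * volume[t]) * r[t] + shocks[t + 1]
    return r, volume


def ols_slope(x, y):
    """Slope of an OLS regression of y on x with an intercept."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return float(x_c @ y_c / (x_c @ x_c))


r, volume = simulate_interaction()
lag_r, next_r, lag_v = r[:-1], r[1:], volume[:-1]
low_days = lag_v < 0.5

pooled = ols_slope(lag_r, next_r)                        # linear AR(1) view
low = ols_slope(lag_r[low_days], next_r[low_days])       # quiet days
high = ols_slope(lag_r[~low_days], next_r[~low_days])    # high-volume days
print(f"pooled: {pooled:.3f}, low volume: {low:.3f}, high volume: {high:.3f}")
```

Under these assumptions, the pooled slope is close to zero even though returns are predictable once lagged volume is taken into account, which is the kind of dependence a purely linear filter would leave in the residuals.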
Coming back to the example illustrated in Figure 1, in a highly electronic market, a large shift in
the demand for risky assets may also originate with “noninformed” algorithms trading on a signal
and ultimately amplifying the selling pressure in that market (e.g., Kirilenko et al., 2017). While
this discussion only serves to provide a brief outline of one channel that may explain the
relationship between stock prices and trading volume, it is by no means exhaustive. For further
details, we refer the interested reader to the references listed in the previous section.
We draw upon the concept of Shannon transfer entropy to quantify empirically the residual
information transfer between stock returns and trading volume growth. Therefore, let us begin
by elaborating on this measure of information transfer in greater detail, before introducing our
Shannon entropy is a model-free measure that characterizes the randomness of draws
from a specific probability distribution (for some general remarks, see Bossomaier et al., 2016;
Dehmer et al., 2017).⁴ Denote by I a discrete random variable that can take on n distinct values
and let i indicate one such outcome of I. Then, taking the probability mass function (pmf) of I,
$p(i)$, the information gained by observing outcome i is given by⁵
$$h(i) = -\log_2 p(i), \qquad (1)$$
with $0 \le p(i) \le 1$ and $h(i) \ge 0$. It follows that $h(i)$ is large for small $p(i)$, i.e., unlikely outcomes
convey more information. By taking the expectation of Equation (1) over all possible outcomes i,
$$H_I = \mathbb{E}\left[h(i)\right] = -\sum_{i=1}^{n} p(i) \log_2 p(i). \qquad (2)$$
⁴ Entropy is also defined as the expected log-likelihood ratio in some contexts (Hansen and Sargent, 2008, p. 55).
⁵ The base of the logarithm only affects the unit of measurement. By taking the base 2 logarithm, the gain in information when observing i is measured in bits.
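Equations (1) and (2) translate directly into code. The following NumPy sketch (function names are hypothetical) computes the information content of a single outcome and the entropy of a pmf in bits, treating 0 · log 0 as 0; comparing a uniform and a concentrated pmf over three outcomes illustrates that entropy is largest under the uniform distribution.

```python
import numpy as np


def information_content(p_i):
    """h(i) = -log2 p(i): the rarer the outcome, the more bits it carries."""
    return float(-np.log2(p_i))


def shannon_entropy(pmf):
    """H_I = -sum_i p(i) log2 p(i) in bits, with 0 * log 0 treated as 0."""
    p = np.asarray(pmf, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("pmf must be non-negative and sum to one")
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


uniform = [1 / 3, 1 / 3, 1 / 3]
concentrated = [0.9, 0.05, 0.05]
print(information_content(0.05), information_content(0.9))
print(shannon_entropy(uniform), shannon_entropy(concentrated))
```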
Since this entropy is a univariate measure for uncertainty, it is maximized when I follows a
uniform distribution and decreases with the degree of dispersion of the probability distribution of
the concept of mutual information, which is in turn based on the Kullback-Leibler divergence
(Kullback and Leibler, 1951). Considering two different pmf’s $p(i)$ and $q(i)$ for the same random
variable I, the Kullback-Leibler divergence is a measure for the difference between these two pmf’s:
$$K_I = \sum_{i=1}^{n} p(i) \log_2 \frac{p(i)}{q(i)}. \qquad (3)$$
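As a minimal sketch of Equation (3), with the hypothetical function name `kl_divergence`, the divergence can be computed as below. The example also shows that the measure is not symmetric in p and q, and that its value against a uniform q over three outcomes equals log2(3) - H_I.

```python
import numpy as np


def kl_divergence(p, q):
    """K_I = sum_i p(i) log2(p(i) / q(i)) in bits, as in Equation (3).

    Terms with p(i) = 0 contribute nothing; if q(i) = 0 while p(i) > 0,
    the divergence is infinite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log2(p[support] / q[support])))


p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(kl_divergence(p, p))                       # identical pmfs
print(kl_divergence(p, q), kl_divergence(q, p))  # not symmetric
```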
Instead of considering two different pmf’s for the same random variable, we can also rewrite the
Kullback-Leibler divergence in Equation (3) in terms of the difference between the joint and the
marginal pmf’s of two discrete random variables I and J, which is then called mutual information:
$$M_{IJ} = \sum_{i} \sum_{j} p(i,j) \log_2 \frac{p(i,j)}{p(i)\, p(j)}, \qquad (4)$$
with $p(i)$ and $p(j)$ denoting the marginal pmf’s of I and J, respectively, and $p(i,j)$ their
joint pmf. Comparing this equation with Equation (3), the intuition behind mutual information
$M_{IJ}$ can be stated analogously: It measures the difference between the distribution of the
random variable I, with its marginal pmf $p(i)$, and the distribution of the random variable J, with
statistical dependence among the two discrete random variables I and J. Note that only in case of
case of statistical independence of I and J, for our intended empirical application to the bivariate
system of stock returns and trading volume growth, we need an asymmetric measure in order to
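Equation (4) can be sketched in the same way (helper names are hypothetical). The example checks two properties relevant to the argument: mutual information is zero when the joint pmf factorizes into its marginals, and it is symmetric in I and J, which is why it cannot indicate a direction of information transfer.

```python
import numpy as np


def mutual_information(joint):
    """M_IJ = sum_{i,j} p(i,j) log2(p(i,j) / (p(i) p(j))), Equation (4).

    `joint` is a 2-D array with the joint pmf p(i, j); the marginal pmfs
    are its row and column sums.
    """
    p_ij = np.asarray(joint, dtype=float)
    p_i = p_ij.sum(axis=1, keepdims=True)   # column vector of p(i)
    p_j = p_ij.sum(axis=0, keepdims=True)   # row vector of p(j)
    mask = p_ij > 0
    ratio = p_ij[mask] / (p_i @ p_j)[mask]
    return float(np.sum(p_ij[mask] * np.log2(ratio)))


def joint_pmf(x, y, n_states=3):
    """Empirical joint pmf of two series coded as integers 1..n_states."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (np.asarray(x) - 1, np.asarray(y) - 1), 1)
    return counts / counts.sum()


independent = np.outer([0.2, 0.5, 0.3], [0.1, 0.6, 0.3])  # p(i,j) = p(i) p(j)
identical = np.diag([1 / 3, 1 / 3, 1 / 3])                # J always equals I
print(mutual_information(independent))  # zero up to rounding error
print(mutual_information(identical), mutual_information(identical.T))
```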
To this end, Schreiber (2000) considers transition probabilities in order to introduce time series
dynamics into the framework of mutual information. Let I and J now denote stationary Markov
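processes of order l and h, respectively. By measuring the deviation from the generalized Markov
property, $p(i_{t+1} \mid i_t^{(l)}) = p(i_{t+1} \mid i_t^{(l)}, j_t^{(h)})$, where $i_t^{(l)} = (i_t, \ldots, i_{t-l+1})$ and
$j_t^{(h)} = (j_t, \ldots, j_{t-h+1})$, Schreiber (2000) proposes to quantify the information transfer from
process J to I in the form of the Shannon transfer entropy, which is given by:
$$T_{J \to I} = \sum_{i} \sum_{j} p\left(i_{t+1}, i_t^{(l)}, j_t^{(h)}\right) \log_2 \left( \frac{p\left(i_{t+1} \mid i_t^{(l)}, j_t^{(h)}\right)}{p\left(i_{t+1} \mid i_t^{(l)}\right)} \right). \qquad (5)$$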
Again, comparing Equation (5) with the previous equation for mutual information, one sees that
the two look similar. Both equations measure deviations from statistical independence. The
important change that enables one to go from mutual information to Shannon transfer entropy is
the introduction of the generalized Markov property. Thereby, this measure now allows us to
quantify the informational gain that we can achieve in predicting the future value of process I by
observing past values of J in addition to past values of I itself. The information flow
of the information transfer between the two processes by taking the difference of $T_{J \to I}$ and $T_{I \to J}$.
This step constitutes an improvement over (non-)linear Granger causality tests, which generally
do not permit one to deduce the dominant direction of the information transfer.
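A plug-in estimator of Equation (5) only needs the empirical frequencies of $(i_{t+1}, i_t, j_t)$. The sketch below assumes Markov orders l = h = 1 and series already coded as {1, 2, 3}; it is an illustration, not the estimator used in the empirical analysis. In the synthetic example, I simply copies J with a one-period lag, so nearly all information flows from J to I, and the difference $T_{J \to I} - T_{I \to J}$ identifies the dominant direction.

```python
import numpy as np


def transfer_entropy(source, target, n_states=3):
    """Plug-in estimate of T_{source -> target} in bits (Equation (5)).

    Assumes Markov orders l = h = 1 and series coded as 1..n_states.
    """
    j = np.asarray(source) - 1
    i = np.asarray(target) - 1
    counts = np.zeros((n_states,) * 3)
    np.add.at(counts, (i[1:], i[:-1], j[:-1]), 1)
    p_xyz = counts / counts.sum()          # p(i_{t+1}, i_t, j_t)
    p_yz = p_xyz.sum(axis=0)               # p(i_t, j_t)
    p_xy = p_xyz.sum(axis=2)               # p(i_{t+1}, i_t)
    p_y = p_xyz.sum(axis=(0, 2))           # p(i_t)

    te = 0.0
    for a, b, c in zip(*np.nonzero(p_xyz)):
        cond_full = p_xyz[a, b, c] / p_yz[b, c]   # p(i_{t+1} | i_t, j_t)
        cond_own = p_xy[a, b] / p_y[b]            # p(i_{t+1} | i_t)
        te += p_xyz[a, b, c] * np.log2(cond_full / cond_own)
    return float(te)


rng = np.random.default_rng(42)
j_series = rng.integers(1, 4, size=5000)   # driving process J
i_series = np.empty_like(j_series)
i_series[0] = 1
i_series[1:] = j_series[:-1]               # I copies J with a one-period lag

t_ji = transfer_entropy(j_series, i_series)
t_ij = transfer_entropy(i_series, j_series)
print(f"T_J->I = {t_ji:.3f}, T_I->J = {t_ij:.3f}, difference = {t_ji - t_ij:.3f}")
```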
Given that financial time series are continuous-valued, we have to discretize them before
estimating Shannon transfer entropy. Following Dimpfl and Peter (2013), we partition our time
series into three bins and use the encoded time series for estimation. Since we
are interested in general tail events, i.e., extreme residual values that potentially contain
nonlinear information, three bins should suffice for the application we have in mind. Specifying
two quantiles of the empirical distribution of the observed time series $\{x_t\}_{t=1}^{T}$, $q_1$ and $q_2$, such an
encoding is given by
$$s_t = \begin{cases} 1 & \text{for } x_t \le q_1, \\ 2 & \text{for } q_1 < x_t < q_2, \\ 3 & \text{for } x_t \ge q_2, \end{cases} \qquad \forall t. \qquad (6)$$
The encoding replaces each value in $\{x_t\}_{t=1}^{T}$ by one of the three integers {1,2,3}, in line with
Equation (6). Lastly, in order to provide statistical inference, we follow Horowitz (2003) and
bootstrap the underlying Markov processes I and J based on the calculated transition
probabilities (see also the outline in Behrendt et al., 2019). While the statistical dependencies
between I and J are removed, the dynamics of each process are preserved. Thus, a bootstrap sample
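The Markov bootstrap can be sketched as follows, assuming first-order chains for simplicity (the helper names are hypothetical). Each encoded series is resampled from its own estimated transition matrix, so its dynamics are retained while any dependence on the other series is removed; transfer entropies computed on many such resampled pairs would provide the reference distribution for a test.

```python
import numpy as np


def transition_matrix(codes, n_states=3):
    """Estimate first-order transition probabilities P[a, b] = p(b | a)."""
    s = np.asarray(codes) - 1
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (s[:-1], s[1:]), 1)
    rows = counts.sum(axis=1, keepdims=True)
    # States that are never left get a uniform row so every row is a valid pmf.
    return np.where(rows > 0, counts / np.maximum(rows, 1), 1.0 / n_states)


def markov_bootstrap(codes, rng, n_states=3):
    """Simulate a new chain of the same length from the estimated transitions.

    Applied separately to I and J, this keeps each process's own
    (first-order) dynamics while destroying dependence between them.
    """
    codes = np.asarray(codes)
    P = transition_matrix(codes, n_states)
    marginal = np.bincount(codes - 1, minlength=n_states) / len(codes)
    out = np.empty(len(codes), dtype=int)
    out[0] = rng.choice(n_states, p=marginal)       # start from the marginal pmf
    for t in range(1, len(codes)):
        out[t] = rng.choice(n_states, p=P[out[t - 1]])
    return out + 1


rng = np.random.default_rng(7)
cycle = np.tile([1, 2, 3], 100)    # deterministic dynamics 1 -> 2 -> 3 -> 1
boot = markov_bootstrap(cycle, rng)
print(boot[:6])
```

Because the example chain is deterministic, the bootstrap sample reproduces the cycle from a random starting state, which makes it easy to see that the own dynamics survive resampling.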
Our approach to test for nonlinear residual information in the bivariate system of stock returns
and trading volume growth is straightforward and follows the general idea of Hiemstra and Jones
(1994): If we use a VAR model to act as a linear filter on a bivariate system in order to purge the
residual series of any linear auto- and cross-correlations, then all remaining predictive power of
one residual series for the other may be attributed to nonlinear information. Instead of the
nonlinear Granger causality test proposed by Hiemstra and Jones (1994), we draw upon the
concept of Shannon transfer entropy, as outlined above, to test for such information transfer
from one residual series to the other. There are two reasons for this choice: Firstly, the nonlinear
Granger causality test proposed by Hiemstra and Jones (1994) is based on the correlation integral,
which was initially introduced by Grassberger and Procaccia (1983) to measure the similarity
of time series obtained from dissipative dynamical systems exhibiting chaotic behavior. However,
their test does not account for temporal correlation in the formulation of the correlation integrals
to be estimated, and thus it is not clear that these estimators indeed suffice as measures of spatial
correlation (for a detailed discussion, see ch. 6 in Kantz and Schreiber, 2004). As a result, the
correlation integrals used to obtain estimates for the joint probabilities in their formulation of the
nonlinear Granger causality test are generally biased if some temporal correlation is still present
in the residual series. Secondly, and more importantly, in addition to detecting nonlinear residual
information, relying on Shannon transfer entropy allows us to infer the dominant direction of
the residual information transfer, which constitutes an improvement over Granger-type causality
We now describe the two-step testing procedure applied in the empirical analysis below
in more detail. The first step involves the estimation of a bivariate
VAR model for the two appropriately standardized series $\{y_{1,t}\}_{t=1}^{T}$ and $\{y_{2,t}\}_{t=1}^{T}$:⁷
⁶ For a recent contribution to the literature, see Camacho et al. (2020), who develop a nonlinear Granger causality test based on Shannon entropy for longitudinal data.
⁷ For more information on the general structure of a VAR model, see, for example, Lütkepohl (2013).
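$$\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \sum_{j=1}^{p} \begin{pmatrix} \phi_{11}^{(j)} & \phi_{12}^{(j)} \\ \phi_{21}^{(j)} & \phi_{22}^{(j)} \end{pmatrix} \begin{pmatrix} y_{1,t-j} \\ y_{2,t-j} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}, \qquad (7)$$
where $(c_1, c_2)'$ and $\begin{pmatrix} \phi_{11}^{(j)} & \phi_{12}^{(j)} \\ \phi_{21}^{(j)} & \phi_{22}^{(j)} \end{pmatrix}$, $j = 1, \ldots, p$, denote the vector of constants and the autoregressive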
coefficient matrices, respectively. The VAR-order p is chosen large enough to account for any
linear auto- and cross-correlations in the system. In a practical application, the lag order can be
determined with the usual information criteria. If there is any auto- or cross-correlation left, an
overparameterized specification should be estimated to purge the series of all remaining linear
correlation. Obtaining residual series free of any linear correlation is a crucial step in the context of
our investigation into the directional information transfer between stock returns and trading
volume growth. Only then are we able to look beyond linear dependencies into potentially
remaining nonlinear relations present in the residual series of the VAR model, $\{\hat{\varepsilon}_{1,t}\}_{t=1}^{T}$ and
$\{\hat{\varepsilon}_{2,t}\}_{t=1}^{T}$. For the innovations in Equation (7), we may relax the Gaussian white noise assumption,
which is usually made in the VAR context, since our main goal is not statistical inference related
to the VAR coefficients. Accordingly, in the second step, $T_{\hat{\varepsilon}_{1} \to \hat{\varepsilon}_{2}}$ and $T_{\hat{\varepsilon}_{2} \to \hat{\varepsilon}_{1}}$ are estimated for
the residual series and used to determine whether or not there is still a statistically significant
information transfer from one residual series to the other and, if this is the case, the dominant
direction of that information transfer. Note that the above two-step procedure can easily be
extended to more system variables by relying on group transfer entropy, as described by Dimpfl
and Peter (2018) in the context of volatility transmission. See Appendix 7.2 for some simulation
experiments underlining the usefulness of the two-step procedure in the case of nonlinear
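The first step, the linear VAR filter, can be sketched with NumPy alone: equation-by-equation OLS as in Equation (7), with the lag order chosen by BIC computed on a common estimation sample. This is an illustrative implementation under those assumptions (helper names are hypothetical); statistical packages such as statsmodels provide full VAR estimation. The residuals it returns would be the input to the encoding and transfer entropy steps.

```python
import numpy as np


def lagged_design(y, p, start):
    """Regressor matrix [1, y_{t-1}, ..., y_{t-p}] for t = start, ..., T-1."""
    T = y.shape[0]
    cols = [np.ones((T - start, 1))]
    for j in range(1, p + 1):
        cols.append(y[start - j:T - j])
    return np.hstack(cols)


def fit_var(y, p, start=None):
    """OLS fit of a VAR(p) with constant, equation by equation (Equation (7)).

    Returns the (1 + k*p) x k coefficient matrix and the residual series.
    """
    start = p if start is None else start
    X, Y = lagged_design(y, p, start), y[start:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef, Y - X @ coef


def select_order_bic(y, max_p=10):
    """Choose p by BIC, using the common sample t >= max_p for every p."""
    n_obs, k = y.shape[0] - max_p, y.shape[1]
    bics = []
    for p in range(max_p + 1):
        _, resid = fit_var(y, p, start=max_p)
        sigma = resid.T @ resid / n_obs
        n_params = k * (1 + k * p)
        bics.append(np.log(np.linalg.det(sigma)) + n_params * np.log(n_obs) / n_obs)
    return int(np.argmin(bics))


# Hypothetical bivariate VAR(1) standing in for (r_t, v_t).
rng = np.random.default_rng(3)
A = np.array([[0.5, 0.2], [-0.3, 0.4]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

p_hat = select_order_bic(y, max_p=5)
coef, resid = fit_var(y, p_hat)
print(p_hat, coef[1:].T.round(2))   # lag-one block, transposed to compare with A
```

An overparameterized variant is simply a call with a large fixed order, for example `fit_var(y, 20)`.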
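In finite samples, Shannon transfer entropy estimates are likely to be biased upwards. This is why
we follow Marschinski and Kantz (2002) and calculate a bias-corrected, or effective, transfer
entropy,
$$ET_{\hat{\varepsilon}_i \to \hat{\varepsilon}_j} = T_{\hat{\varepsilon}_i \to \hat{\varepsilon}_j} - T^{\text{shuffled}}_{\hat{\varepsilon}_i \to \hat{\varepsilon}_j}, \qquad (8)$$
where $T^{\text{shuffled}}_{\hat{\varepsilon}_i \to \hat{\varepsilon}_j}$ denotes the estimated Shannon transfer entropy with a shuffled time series of
$\{\hat{\varepsilon}_{i,t}\}_{t=1}^{T}$. A shuffled time series is obtained from the encoded time series of $\{\hat{\varepsilon}_{i,t}\}_{t=1}^{T}$ by randomly
drawing realizations from it and generating a new time series from these draws. Doing so
destroys any statistical dependence between $\hat{\varepsilon}_i^{\text{shuffled}}$ and $\hat{\varepsilon}_j$, and $T^{\text{shuffled}}_{\hat{\varepsilon}_i \to \hat{\varepsilon}_j}$ converges to zero
for $T \to \infty$, while a non-zero value of $T^{\text{shuffled}}_{\hat{\varepsilon}_i \to \hat{\varepsilon}_j}$ is an indication of finite sample bias. We calculate
$ET_{\hat{\varepsilon}_i \to \hat{\varepsilon}_j}$ for residual information transfer in both directions by shuffling a sufficient number of
times and subtracting the mean over all shuffles from the Shannon transfer entropy that is
calculated as given by Equation (5).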
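The bias correction can be sketched as follows, again assuming Markov orders l = h = 1 and series coded as {1, 2, 3}. Shuffling is implemented here as a random permutation of the source series (drawing without replacement), which is one way of generating the shuffled series described above; 200 shuffles mirror the number used in the empirical application. For two independent series, the raw estimate is positive purely because of finite-sample bias, while the effective transfer entropy is close to zero.

```python
import numpy as np


def te_plugin(source, target, n_states=3):
    """Plug-in T_{source -> target} in bits with Markov orders l = h = 1."""
    i, j = np.asarray(target) - 1, np.asarray(source) - 1
    counts = np.zeros((n_states,) * 3)
    np.add.at(counts, (i[1:], i[:-1], j[:-1]), 1)
    p = counts / counts.sum()                    # p(i_{t+1}, i_t, j_t)
    p_yz = p.sum(axis=0, keepdims=True)          # p(i_t, j_t)
    p_xy = p.sum(axis=2, keepdims=True)          # p(i_{t+1}, i_t)
    p_y = p.sum(axis=(0, 2), keepdims=True)      # p(i_t)
    mask = p > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (p * p_y) / (p_yz * p_xy)        # p(i_{t+1}|i_t,j_t) / p(i_{t+1}|i_t)
    return float(np.sum(p[mask] * np.log2(ratio[mask])))


def effective_te(source, target, n_shuffles=200, seed=0):
    """ET = T(source -> target) minus the mean T over shuffled sources.

    Permuting the source keeps its marginal distribution but destroys its
    temporal link to the target, so the shuffled estimates capture only
    the finite-sample bias.
    """
    rng = np.random.default_rng(seed)
    source = np.asarray(source)
    shuffled = [te_plugin(rng.permutation(source), target) for _ in range(n_shuffles)]
    return te_plugin(source, target) - float(np.mean(shuffled))


rng = np.random.default_rng(5)
x = rng.integers(1, 4, size=2000)
y = rng.integers(1, 4, size=2000)   # independent of x
z = np.roll(x, 1)                   # z copies x with a one-period lag
print(te_plugin(x, y), effective_te(x, y), effective_te(x, z))
```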
4 Data
After the above introduction of the two-step procedure used to test for nonlinear residual
information transfer in the bivariate system of stock returns and trading volume growth, we now
turn to the description of the data set that is used in the subsequent empirical analysis. We
gather daily observations of stock prices and trading volumes for all constituents of the S&P 500
and DJIA between January 3, 2000, and December 29, 2017, from the Thomson Reuters database.
This amounts to 417 individual-level stocks for which we can test for nonlinear residual
information transfer.⁸ Since the time span of our data includes the global financial crisis in 2008
and 2009, we conduct our analysis not only on the full sample from 2000 to 2017 but also on two
subsamples, representing the pre-crisis period (2000 - 2007) and the post-crisis period (2010 -
2017). This leads to a maximum of 2263 observations for each stock in the pre-crisis and 2013
observations in the post-crisis sample, while in the overall sample we have a maximum of 4528
observations for each of the 417 stocks.⁹ For all of these stocks, daily closing prices are used to
calculate log-returns as given by $r_t = \ln P_t - \ln P_{t-1}$, where $P_t$ denotes the respective
closing price on trading day t. We note that augmented Dickey-Fuller (ADF) tests reveal the
presence of unit roots, a sign of nonstationary behavior, in the daily trading volume series,
denoted by $V_t$, for some of the 417 individual-level stocks in the pre- and post-crisis samples. In
order to obtain stationary time series for our analysis, we calculate volume growth for each stock
and, to stay consistent across (sub)samples, proceed with the volume growth series $v_t$ instead of
⁸ The complete list of stocks and detailed descriptive statistics are available from the authors upon request.
⁹ Due to discrepancies in recorded trading days among the stocks in our sample, the actual number of observations per stock and sample may differ. However, we only include stocks that have at least 4000 observations in the full sample, i.e., during the period from 2000 to 2017.
trading volume in levels in all three samples. This is consistent with the application in Hiemstra
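The transformations of the raw data can be sketched as follows. Log-returns follow the definition given above; defining volume growth as the log difference of daily volume is an assumption, since the exact transformation is not spelled out in this section.

```python
import numpy as np


def log_returns(prices):
    """r_t = ln P_t - ln P_{t-1} for a series of daily closing prices."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))


def volume_growth(volume):
    """Assumed definition: v_t = ln V_t - ln V_{t-1}.

    Days with zero volume would need separate handling before taking logs.
    """
    return np.diff(np.log(np.asarray(volume, dtype=float)))


prices = [100.0, 110.0, 99.0]
volume = [1_000_000, 1_500_000, 750_000]
print(log_returns(prices))     # ln(1.1), ln(0.9)
print(volume_growth(volume))   # ln(1.5), ln(0.5)
```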
For our empirical application, we first use a linear VAR model to filter out all linear auto- and
cross-correlations in the bivariate system of stock returns and trading volume growth, as given in
Equation (7). The VAR model with daily log-returns $r_t$ and volume growth $v_t$ reads
as follows:
$$\begin{pmatrix} r_t \\ v_t \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \sum_{j=1}^{p} \begin{pmatrix} \phi_{11}^{(j)} & \phi_{12}^{(j)} \\ \phi_{21}^{(j)} & \phi_{22}^{(j)} \end{pmatrix} \begin{pmatrix} r_{t-j} \\ v_{t-j} \end{pmatrix} + \begin{pmatrix} \varepsilon_{r,t} \\ \varepsilon_{v,t} \end{pmatrix}. \qquad (9)$$
As above, we relax the Gaussian white noise assumption for the innovations $\{\varepsilon_{r,t}\}_{t=1}^{T}$ and
$\{\varepsilon_{v,t}\}_{t=1}^{T}$ since our main objective is not statistical inference within the VAR framework, but the
application of a suitable linear filter. Note that the two system variables $r_t$ and $v_t$ are replaced by
calendar- and volatility-adjusted series, indicated by superscripts (∗) and (∗∗), respectively, in the
following empirical analysis. We use the Bayesian information criterion (BIC) to determine the lag
length p of the VAR for each stock individually. Even though the autocorrelation functions for
most stocks do not suggest any form of strong autocorrelation in the VAR residuals, we also
specify a heavily overparameterized version of each VAR model. These VAR models with lag
length equal to p = 20 serve to eliminate any potentially remaining linear dependencies not
captured by the VAR models with shorter BIC-determined lag lengths.¹⁰ In our discussion below,
we only report results for the VAR models whose lag length is determined via the BIC since
results for the overparameterized VAR models are qualitatively similar to the reported results.
Detailed results for these overparameterized models are relegated to Appendix 7.3. The residual
series from the VAR model in Equation (9) are then used to estimate Shannon transfer entropy as
a measure that quantifies any remaining nonlinear residual information. Considering that all
residual series of the different VAR models appear to exhibit a pronounced excess kurtosis, a
Shannon transfer entropy and the aforementioned discretization into three bins appears to be a
reasonable approach. We use 200 shuffles to compute the effective transfer entropies and
Following the general arguments and references in Hiemstra and Jones (1994), we do not use the
raw stock return and trading volume growth time series as input to the bivariate VAR in Equation
(9), but apply a generic filter to the data beforehand. This filter corrects the mean and variance of
both the log-return and trading volume growth time series for the potential systematic influence
that a certain day of the week or month of the year can have on the observations. To do
so, we use the approach of Gallant et al. (1992) and create dummy variables for each day of the
week and each month of the year, collected in $\mathbf{x}_t$. These serve as independent
variables for two linear regressions, which are summarized as follows:
$$\text{mean equation:} \quad y_t = \boldsymbol{\beta}' \mathbf{x}_t + \varepsilon_t,$$
$$\text{variance equation:} \quad \ln\left(\hat{\varepsilon}_t^{2}\right) = \boldsymbol{\gamma}' \mathbf{x}_t + u_t, \qquad (10)$$
where $y_t$ stands for either one of the system variables $r_t$ and $v_t$. For the mean equation, the log-return
and trading volume growth series, respectively, are regressed on the dummy variables in
$\mathbf{x}_t$. Thus, the residuals of the mean equation $\hat{\varepsilon}_t$ should capture the part of $y_t$ that is not explained
by calendar effects. Using the logarithm of the squared residuals of this first regression, $\hat{\varepsilon}_t^{2}$,
as dependent variable, the same dummy variables in $\mathbf{x}_t$ also serve as regressors for the variance
equation. The idea behind this second regression is essentially the same as for the mean equation.
Lastly, the final calendar-adjusted time series are then obtained by standardizing the residuals of
the mean equation:
$$y_t^{*} = \frac{\hat{\varepsilon}_t}{\exp\left(\hat{\boldsymbol{\gamma}}' \mathbf{x}_t / 2\right)}. \qquad (11)$$
Applied to our two system variables, this results in calendar adjusted log-returns j&∗ and trading
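A minimal NumPy sketch of the calendar adjustment described above follows. It assumes that the dummy matrix contains an intercept plus weekday and month dummies, with Monday and January as omitted baseline categories to avoid perfect collinearity. This design, the function names and the simulated example are illustrative choices of ours. The mean equation is fitted by OLS, and the log squared residuals are regressed on the same dummies. The residuals are then divided by the exponential of half the fitted log-variance.

```python
import numpy as np


def calendar_dummies(weekday, month):
    """Intercept, weekday dummies (Tue-Fri) and month dummies (Feb-Dec).
    Monday (weekday 0) and January are the omitted baseline categories."""
    weekday, month = np.asarray(weekday), np.asarray(month)
    cols = [np.ones(len(weekday))]
    cols += [(weekday == d).astype(float) for d in range(1, 5)]
    cols += [(month == m).astype(float) for m in range(2, 13)]
    return np.column_stack(cols)


def calendar_adjust(y, weekday, month):
    """Mean equation, variance equation on log squared residuals, then standardization."""
    X = calendar_dummies(weekday, month)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                                   # mean-equation residuals
    gamma, *_ = np.linalg.lstsq(X, np.log(resid ** 2), rcond=None)
    return resid / np.exp(X @ gamma / 2)                   # divide by fitted std. dev.


# Hypothetical example: Monday observations are twice as volatile as on other days
rng = np.random.default_rng(0)
T = 20_000
weekday = np.arange(T) % 5
month = (np.arange(T) // 21) % 12 + 1
y = np.where(weekday == 0, 2.0, 1.0) * rng.standard_normal(T)
y_adj = calendar_adjust(y, weekday, month)
ratio_raw = y[weekday == 0].var() / y[weekday != 0].var()
ratio_adj = y_adj[weekday == 0].var() / y_adj[weekday != 0].var()
print(ratio_raw, ratio_adj)
```

The fitted log-variance estimates the mean of $\log\hat{\varepsilon}_t^2$ rather than $\log E[\hat{\varepsilon}_t^2]$, so the adjusted series keeps an arbitrary overall scale. This does not affect scale-invariant dependence measures. The sketch makes no attempt to restore the original mean and variance.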
Table 1 summarizes the results of our two-step procedure using these two calendar-adjusted series as input variables to the bivariate system for three samples:

- the complete time period from 2000 to 2017 (Full),
- the time period from 2000 to 2007 (Pre-Crisis), and
- the time period from 2010 to 2017 (Post-Crisis).

Columns denoted by “# stocks” show the number of stocks for which the respective test yields estimates that are statistically significant at the 10% significance level.11 Moreover, columns denoted by “+” (“−”) in Panels B and C of Table 1 report the number of stocks that show a positive (negative) sign of the difference between the effective transfer entropy estimates, $T_{\hat{\varepsilon}_{r^*}\rightarrow\hat{\varepsilon}_{v^*}} - T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^*}}$. Thus, a positive difference indicates a dominant nonlinear residual information transfer from calendar-adjusted log-returns to calendar-adjusted trading volume growth. A negative difference indicates a dominant information transfer in the opposite direction. In order to speak of a dominant direction of the information transfer, we require at least one of the two effective transfer entropy estimates to be statistically significant.12
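Our reading of how the “# stocks”, “+” and “−” columns are tabulated can be written as a short function. The sketch below uses hypothetical names and toy inputs. Within each row, it counts only the stocks whose estimate for that direction is significant at the 10% level, so at least one of the two estimates is significant by construction. It then classifies these stocks by the sign of the difference between the two effective transfer entropy estimates.

```python
import numpy as np


def panel_counts(te_rv, p_rv, te_vr, p_vr, alpha=0.10):
    """Count significant stocks per direction and the sign of te_rv - te_vr among them.

    te_rv, p_rv: effective TE estimates and p-values for returns -> volume growth
    te_vr, p_vr: the same for volume growth -> returns (one entry per stock)
    """
    te_rv, p_rv, te_vr, p_vr = (np.asarray(a, float) for a in (te_rv, p_rv, te_vr, p_vr))
    diff = te_rv - te_vr
    rows = {}
    for name, p in (("r->v", p_rv), ("v->r", p_vr)):
        sig = p < alpha
        rows[name] = {
            "# stocks": int(sig.sum()),
            "+": int((sig & (diff > 0)).sum()),
            "-": int((sig & (diff < 0)).sum()),
        }
    return rows


# Toy input for four hypothetical stocks
panel = panel_counts(te_rv=[0.020, 0.010, 0.001, 0.003], p_rv=[0.01, 0.50, 0.02, 0.80],
                     te_vr=[0.005, 0.015, 0.004, 0.001], p_vr=[0.30, 0.04, 0.60, 0.90])
print(panel)
# {'r->v': {'# stocks': 2, '+': 1, '-': 1}, 'v->r': {'# stocks': 1, '+': 0, '-': 1}}
```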
11
Our main findings are robust to an alternative choice of the significance level of 5%. Results are available upon request.
12
While it is generally possible to use the bootstrap results to test for statistical significance of the differences as well, we have chosen this pragmatic approach. The reason is that it is possible to obtain a statistically significant difference even though the transfer entropy estimates themselves are statistically insignificant. Obviously, such an “information transfer” does not make sense economically. On the other hand, if one of the estimates is significant and the other is not, the latter is fairly small and statistically indistinguishable from zero. Hence, it seems reasonable to still refer to the difference of these estimates as an indicator of the dominant direction of the nonlinear residual information transfer.

Panel A of Table 1 reports the results of the linear Granger causality tests on the original time series of calendar-adjusted stock returns and trading volume growth. Importantly, note that we are not proposing the linear Granger causality test as a benchmark for Shannon transfer entropy. Rather, we use Granger causality as a first step in our analysis to capture the linear dependencies present between the two time series under investigation. As Panel A shows, for all three samples, we find a considerable number of stocks for which either of the two tested Granger causality hypotheses can be rejected. It also appears that the first tested hypothesis (calendar-adjusted log-returns do not Granger cause calendar-adjusted trading volume growth) is rejected for more stocks than the second Granger hypothesis (calendar-adjusted trading volume growth does not Granger cause calendar-adjusted log-returns). In fact, we find that the former hypothesis is rejected for at least twice as many stocks as the latter hypothesis. This finding is consistent across all three samples and indicates that,
on average, calendar-adjusted stock returns would be a more reliable predictor of trading volume growth than vice versa. However, Panel A only informs about linear dependencies. The Granger causality tests also cannot inform about the dominant direction of the information transfer. If, for example, we can reject both Granger hypotheses for a given stock, we would still not know whether the information transfer from stock returns to trading volume growth dominates or vice versa. Panels B and C can help to shed light on this question. They also illustrate clearly that important information would remain inaccessible if one only considered the linear Granger causality tests. As elaborated above and illustrated by the simulation results in the appendix, any kind of nonlinear information left in the residuals is not captured by this testing procedure. The VAR residuals are, by construction, purged of linear dependencies. Finding a statistically significant information flow among them would therefore imply that a linear model such as a VAR is not suitable to fully assess the directional information transfer between stock returns and trading volume growth.
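For reference, the linear Granger causality test compares two regressions. The restricted model regresses one variable on its own lags, and the unrestricted model additionally includes lags of the other variable. The NumPy sketch below uses a hypothetical function name and simulated data. It returns the corresponding F-statistic with `lags` and N − (2·lags + 1) degrees of freedom, where N is the number of usable observations. P-values would follow from the F distribution and are omitted here.

```python
import numpy as np


def granger_f(x, y, lags=1):
    """F-statistic for H0: `lags` lags of x do not help to predict y beyond y's own lags."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    T = len(y)
    target = y[lags:]
    own = [y[lags - k:T - k] for k in range(1, lags + 1)]
    other = [x[lags - k:T - k] for k in range(1, lags + 1)]
    X_r = np.column_stack([np.ones(T - lags)] + own)          # restricted model
    X_u = np.column_stack([np.ones(T - lags)] + own + other)  # unrestricted model

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.sum((target - X @ coef) ** 2)

    df_num, df_den = lags, len(target) - X_u.shape[1]
    return ((rss(X_r) - rss(X_u)) / df_num) / (rss(X_u) / df_den)


rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.empty(2000)
y[0] = 0.0
y[1:] = 0.5 * x[:-1] + rng.standard_normal(1999)  # x linearly drives y
noise = rng.standard_normal(2000)                   # unrelated to x
f_linear, f_null = granger_f(x, y), granger_f(x, noise)
print(f_linear, f_null)
```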
For estimation of the effective transfer entropies, we use Markov orders $l = h = 1$. Given that the first step removes all linear auto- and cross-correlations in the bivariate system, we consider Markov processes of small order in the transfer entropy calculation.13 Moreover, we choose $q_1 = 0.05$ and $q_2 = 0.95$ as the first pair of quantiles used for discretization and $q_1 = 0.1$ and $q_2 = 0.9$ as the second pair. In our specific application, the quantile choice of $q_1 = 0.01$ and $q_2 = 0.99$ would imply relying on about 20 observations for inference in our pre- and post-crisis samples, which is why we do not report these results here. They are, however, available upon request.

As the first row of the column “# stocks” in Panel B for the full time period shows, we find significant estimates for the residual information transfer from $r_t^*$ to $v_t^*$ for 226 of the 417 stocks. This implies that, for more than half of the stocks in our sample, there appears to be an informational gain in predicting future values of trading volume growth when utilizing past values of stock returns. For 91 out of 417 stocks, there also appears to be an informational gain from using past observations of trading volume growth to predict stock returns, as indicated by the second row of that column. Similar findings can be observed for the pre-crisis and post-crisis time periods, with the number of significant estimates being lowest in the post-crisis period. However, even there, we still find significant bidirectional residual information transfer for a considerable number of stocks (134 stocks from returns to trading volume growth and 50 stocks in the opposite direction in Panel B).

13
Results for Markov order $l = h = 2$ can be found in Table 7.4 in the appendix. Generally, for the other specifications below, additional results for Markov order $l = h = 2$ are reported in Appendix 7.3.
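The figure of about 20 observations follows from simple arithmetic. We assume roughly 252 trading days per year, which is not a number reported in the text. Under that assumption, each eight-year subsample contains about 2016 daily observations. The expected number of observations in one tail bin is then the quantile level times the sample size:

```python
TRADING_DAYS_PER_YEAR = 252  # assumption: typical number of U.S. trading days


def expected_tail_count(q, years):
    """Expected number of observations below the q-quantile (or above the (1-q)-quantile)."""
    return q * years * TRADING_DAYS_PER_YEAR


for q in (0.01, 0.05, 0.10):
    print(q, round(expected_tail_count(q, years=8)))
# 0.01 20
# 0.05 101
# 0.1 202
```

Transfer entropy conditions on joint occurrences of symbols, so the cells that involve a 1% tail bin would be populated even more sparsely than these marginal counts suggest.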
Panel B also shows that the dominant direction of information transfer in all samples appears to point from stock returns to trading volume growth for most of the stocks. We find more stocks for which the difference between the effective transfer entropy estimates, $T_{\hat{\varepsilon}_{r^*}\rightarrow\hat{\varepsilon}_{v^*}} - T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^*}}$, is positive than stocks for which it is negative (columns “+” and “−”, respectively). This dominant direction of information flow from returns to trading volume growth is even more pronounced for the 10% and 90% quantile choices in Panel C. Especially for the post-crisis sample, we now find significant information transfer from returns to trading volume growth for 221 stocks (211 of them with a positive difference). In contrast, significant information transfer in the opposite direction is found for only 63 stocks. Thus, the results of both quantile choices indicate that past observations of $r_t^*$ yield a larger informational gain for predicting future values of $v_t^*$ than past observations of $v_t^*$ do for predicting future values of $r_t^*$. These observations are in line with the Granger causality tests, but their informational content is richer, as they reveal the direction in which the information predominantly flows. They also show that, even after the two series have been purged of any linear dependencies, important information is still available in the residual series. Moreover, no particular industry clustering can be found with respect to these findings. Hence, while we discover no pattern that holds for all stocks in our sample, these mixed findings are well in line with the previous empirical literature. A further disentanglement of the potential reasons why the dominant direction of the information transfer differs across stocks is beyond the scope of this paper.
In summary, our findings indicate that a linear model such as a VAR model does not provide an accurate assessment of the directional relation between stock returns and trading volume growth, since it does not account for all information in the bivariate system. We find evidence for statistically significant nonlinear residual information transfer from calendar-adjusted stock returns to calendar-adjusted trading volume growth and vice versa. This holds in our sample of stocks and across the three time periods considered. In terms of the magnitude of the transfer entropy estimates, the dominant direction of information transfer points from returns towards volume for a majority of stocks. This nonlinear residual information transfer in the bivariate system is not detected by the linear Granger causality test.14
Table 1: Calendar-adjusted time series (Markov order 1, BIC)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. $T_{\hat{\varepsilon}_{r^*}\rightarrow\hat{\varepsilon}_{v^*}}$ in Panels B and C refers to the effective transfer entropy from $\{\hat{\varepsilon}_{r^*,t}\}$ to $\{\hat{\varepsilon}_{v^*,t}\}$, while $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^*}}$ refers to the effective transfer entropy from $\{\hat{\varepsilon}_{v^*,t}\}$ to $\{\hat{\varepsilon}_{r^*,t}\}$. Here, $r^*$ and $v^*$ stand for the calendar-adjusted log-return and trading volume growth series, respectively. The results of the linear Granger causality tests performed on the calendar-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks with a p-value below 0.1 for the respective test. The columns “+” and “−”, respectively, in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative. For the estimation of the effective transfer entropy, the Markov order is set to $l = h = 1$. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of $T_{\hat{\varepsilon}_{r^*}\rightarrow\hat{\varepsilon}_{v^*}}$ and $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^*}}$, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model, from which the residual series are obtained, is determined by the BIC. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
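Panel A

| | Full: # stocks | Pre-Crisis: # stocks | Post-Crisis: # stocks |
|---|---|---|---|
| $\text{Granger}_{r^*\rightarrow v^*}$ | 322 | 221 | 198 |
| $\text{Granger}_{v^*\rightarrow r^*}$ | 92 | 100 | 72 |

Panel B ($q_1 = 0.05$, $q_2 = 0.95$)

| | Full: # stocks | + | − | Pre-Crisis: # stocks | + | − | Post-Crisis: # stocks | + | − |
|---|---|---|---|---|---|---|---|---|---|
| $T_{\hat{\varepsilon}_{r^*}\rightarrow\hat{\varepsilon}_{v^*}}$ | 226 | 205 | 21 | 159 | 154 | 5 | 134 | 123 | 11 |
| $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^*}}$ | 91 | 35 | 56 | 53 | 15 | 38 | 50 | 10 | 40 |

Panel C ($q_1 = 0.10$, $q_2 = 0.90$)

| | Full: # stocks | + | − | Pre-Crisis: # stocks | + | − | Post-Crisis: # stocks | + | − |
|---|---|---|---|---|---|---|---|---|---|
| $T_{\hat{\varepsilon}_{r^*}\rightarrow\hat{\varepsilon}_{v^*}}$ | 278 | 259 | 19 | 168 | 161 | 7 | 221 | 211 | 10 |
| $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^*}}$ | 122 | 70 | 52 | 66 | 15 | 51 | 63 | 30 | 33 |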
14
By construction, the Granger causality test does not reveal any statistically significant information transfer
between the residual series obtained from the VAR model.
While accounting for calendar effects in the stock return and trading volume growth time series seems to be a reasonable approach, persistence in stock return volatility could be responsible for (part of) the detected nonlinear residual information transfer in the bivariate system and thus influence the results. Therefore, we now investigate whether the findings in the previous section prove robust when additionally accounting for persistence in return volatility. Hiemstra and Jones (1994) argue along the lines of the common factor model proposed by Clark (1973). In this model, daily stock returns $r_t$ and trading volume $V_t$ are both modeled as functions of the latent speed of information flow $I_t$:

$$r_t = \sqrt{f(I_t)}\,\varepsilon_t, \qquad V_t = g(I_t), \tag{12}$$

where $f(\cdot)$ and $g(\cdot)$ are some unknown functions and $\varepsilon_t$ denotes i.i.d. noise. It follows from Equation (12) that the return variance, which can be expressed as $\operatorname{var}(r_t \mid I_t) = f(I_t)\cdot\operatorname{var}(\varepsilon_t)$, is influenced by the latent speed of information flow. Since trading volume is a function of the latter, both volume and returns are driven by the same factor. Lagged values of trading volume capture the temporal dependence of the latent speed of information flow. As a result, they might induce a spurious correlation between trading volume and stock returns. This correlation may in fact reflect the persistence of return volatility, which is induced by the dependence of $\operatorname{var}(r_t \mid I_t)$ on $I_t$. Therefore, we can eliminate this spurious correlation by accounting for the persistence in return volatility, as explained by Andersen (1996). To do so, we follow Hiemstra and Jones (1994) and use Nelson’s (1991) exponential GARCH (EGARCH(p,q)) model of return volatility persistence. The EGARCH(p,q) model is especially appealing since it allows for a leverage effect in the volatility equation:
where $t = 1, 2, \ldots, T$, $\Omega_{t-1}$ denotes the information set in period $t-1$, $\alpha_1, \ldots, \alpha_p$ are the parameters for the GARCH effects, and $\beta_1, \ldots, \beta_q$ are those for the ARCH effects. While we fix the lag length of the latter to $q = 1$, we let the lag length $p$ for the GARCH effects vary between 1 and 6 and use the BIC to determine the optimal model fit for each stock individually. As a result, the calendar- and volatility-adjusted stock return time series $r_t^{**}$ are given as:
where $\tilde{\varepsilon}_t$ are obtained via the EGARCH(p,q) model for each stock.
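The volatility adjustment can be sketched as below. This is a simplified illustration under stated assumptions, not the specification behind the reported results. It fits a Gaussian EGARCH(1,1) in one common parameterization, $\log\sigma_t^2 = \omega + \alpha\log\sigma_{t-1}^2 + \beta(|z_{t-1}| - E|z|) + \gamma z_{t-1}$ with $z_t = r_t/\sigma_t$. Estimation is by maximum likelihood with SciPy's Nelder-Mead optimizer. The sketch does not select the GARCH lag length between 1 and 6 by the BIC. Here $\alpha$ plays the role of the GARCH parameter, $\beta$ that of the ARCH parameter, and $\gamma$ captures the leverage effect. The function names are hypothetical. The returned series $r_t/\hat{\sigma}_t$ plays the role of volatility-adjusted returns.

```python
import math

import numpy as np
from scipy.optimize import minimize

E_ABS_Z = math.sqrt(2 / math.pi)  # E|z| for a standard normal z


def egarch_variance(params, r):
    """Conditional variances of an EGARCH(1,1):
    log s2_t = omega + alpha*log s2_{t-1} + beta*(|z_{t-1}| - E|z|) + gamma*z_{t-1}."""
    omega, alpha, beta, gamma = (float(p) for p in params)
    r = np.asarray(r, float)
    log_s2 = [math.log(float(np.var(r)))]        # start at the sample variance
    for x in r[:-1].tolist():
        z = x / math.exp(log_s2[-1] / 2)         # standardized previous return
        value = omega + alpha * log_s2[-1] + beta * (abs(z) - E_ABS_Z) + gamma * z
        log_s2.append(min(max(value, -50.0), 50.0))  # clip to avoid overflow while searching
    return np.exp(np.array(log_s2))


def neg_loglik(params, r):
    """Negative Gaussian log-likelihood with a penalty outside |alpha| < 1."""
    if abs(params[1]) >= 1:
        return 1e10
    s2 = egarch_variance(params, r)
    value = 0.5 * np.sum(np.log(2 * np.pi * s2) + r ** 2 / s2)
    return float(value) if np.isfinite(value) else 1e10


def egarch_standardize(r):
    """Demean r, fit the EGARCH(1,1) by ML and return (r / sigma_hat, estimates)."""
    r = np.asarray(r, float) - np.mean(r)
    start = np.array([0.1 * math.log(float(np.var(r))), 0.9, 0.1, 0.0])
    fit = minimize(neg_loglik, start, args=(r,), method="Nelder-Mead",
                   options={"maxiter": 2000, "xatol": 1e-6, "fatol": 1e-6})
    return r / np.sqrt(egarch_variance(fit.x, r)), fit.x


def lag1_autocorr(x):
    """First-order sample autocorrelation."""
    x = np.asarray(x, float) - np.mean(x)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))


# Usage on a simulated EGARCH(1,1) path with volatility clustering
rng = np.random.default_rng(7)
z_true = rng.standard_normal(2000)
log_s2 = np.zeros(2000)
for t in range(1, 2000):
    log_s2[t] = (0.95 * log_s2[t - 1] + 0.3 * (abs(z_true[t - 1]) - E_ABS_Z)
                 - 0.1 * z_true[t - 1])
r_sim = np.exp(log_s2 / 2) * z_true
r_std, estimates = egarch_standardize(r_sim)
print(estimates, lag1_autocorr(r_sim ** 2), lag1_autocorr(r_std ** 2))
```

After standardization, the first-order autocorrelation of the squared series should drop markedly, which is the volatility persistence this step is meant to remove.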
Table 2 summarizes the results for the linear Granger causality tests and the two-step procedure for the residuals of the VAR in Equation (9). In this specification, calendar- and volatility-adjusted stock returns serve as the first system variable and calendar-adjusted trading volume growth as the second. The Granger causality tests performed on the two adjusted time series, as summarized in Panel A, reveal linear dependencies between stock returns and trading volume growth for between 12% and 72% of the 417 stocks of the sample. As for the calendar-adjusted stock returns, we find for the volatility-adjusted time series that adjusted stock returns on average constitute a more reliable predictor of trading volume growth than vice versa. As Table 2 reveals, we find at least three times as many stocks for which we reject the first Granger hypothesis (calendar- and volatility-adjusted log-returns do not Granger cause calendar-adjusted trading volume growth) as stocks for which we can reject the second (calendar-adjusted trading volume growth does not Granger cause calendar- and volatility-adjusted log-returns).
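The shares and ratios quoted in this paragraph can be recomputed from Panel A of Table 2. The snippet below only restates that arithmetic, with the counts copied from the table and a sample of 417 stocks.

```python
n_stocks = 417
# Stocks rejecting each Granger null at the 10% level (Table 2, Panel A)
rej_r_to_v = {"Full": 299, "Pre-Crisis": 211, "Post-Crisis": 175}
rej_v_to_r = {"Full": 65, "Pre-Crisis": 65, "Post-Crisis": 52}

shares = [c / n_stocks for counts in (rej_r_to_v, rej_v_to_r) for c in counts.values()]
ratios = {k: round(rej_r_to_v[k] / rej_v_to_r[k], 2) for k in rej_r_to_v}

print(f"{min(shares):.1%} to {max(shares):.1%}")  # 12.5% to 71.7%
print(ratios)  # {'Full': 4.6, 'Pre-Crisis': 3.25, 'Post-Crisis': 3.37}
```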
Turning to the effective transfer entropy estimates, we use Markov orders $l = h = 1$, with quantiles $q_1 = 0.05$ and $q_2 = 0.95$ in Panel B and $q_1 = 0.1$ and $q_2 = 0.9$ in Panel C. The results resemble those found in Panels B and C of Table 1. We find a substantial number of stocks for which the effective transfer entropy estimates are significantly different from zero. The total number of stocks with significant effective transfer entropy estimates decreases in comparison with Table 1. Nevertheless, the dominant direction of nonlinear residual information transfer for the majority of stocks remains unchanged, pointing from stock returns to trading volume growth. This pattern becomes increasingly visible when widening the quantiles used for discretization from 5% and 95% to 10% and 90%. In both the pre- and post-crisis samples, the number of stocks with significant information flow from returns to trading volume growth increases by 30% to 70%. For the opposite direction of information flow, the number of significant stocks either increases by only 1% (post-crisis) or even decreases (pre-crisis). Again, these findings are generally in line with the Granger causality test results in Panel A. However, the latter neither reveal a dominant direction of the information transfer nor capture these nonlinear dependencies. Summarizing the findings in Table 2, accounting for volatility persistence in stock returns has two important implications. Firstly, some of the nonlinear residual information transfer detected in Section 5.1 seems to be due to volatility effects, similar to the findings in Hiemstra and Jones (1994). Secondly, the main conclusions drawn in the previous section still appear to be valid: a linear model fails to accurately assess the (nonlinear) relation between stock returns and trading volume growth, and for the majority of stocks the dominant direction of information transfer points from returns to trading volume growth.
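The percentage changes quoted in the paragraph above appear to refer to the “# stocks” columns of Table 2, comparing Panel B with Panel C. Under that reading, they can be reproduced as follows, with the counts copied from the table:

```python
# (# stocks for r** -> v*, # stocks for v* -> r**) in Table 2
panel_b = {"Pre-Crisis": (115, 53), "Post-Crisis": (105, 79)}  # 5%/95% quantiles
panel_c = {"Pre-Crisis": (151, 42), "Post-Crisis": (180, 80)}  # 10%/90% quantiles

changes = {}
for period in panel_b:
    (b_rv, b_vr), (c_rv, c_vr) = panel_b[period], panel_c[period]
    changes[period] = (c_rv / b_rv - 1, c_vr / b_vr - 1)
    print(period, f"{changes[period][0]:+.0%}", f"{changes[period][1]:+.0%}")
# Pre-Crisis +31% -21%
# Post-Crisis +71% +1%
```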
Table 2: Volatility-adjusted time series (Markov order 1, BIC)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. $T_{\hat{\varepsilon}_{r^{**}}\rightarrow\hat{\varepsilon}_{v^*}}$ in Panels B and C refers to the effective transfer entropy from $\{\hat{\varepsilon}_{r^{**},t}\}$ to $\{\hat{\varepsilon}_{v^*,t}\}$, while $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^{**}}}$ refers to the effective transfer entropy from $\{\hat{\varepsilon}_{v^*,t}\}$ to $\{\hat{\varepsilon}_{r^{**},t}\}$. Here, $r^{**}$ and $v^*$ stand for the calendar- and volatility-adjusted log-return and the calendar-adjusted trading volume growth series, respectively. The results of the linear Granger causality tests performed on the volatility-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks with a p-value below 0.1 for the respective test. The columns “+” and “−”, respectively, in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative. For the estimation of the effective transfer entropy, the Markov order is set to $l = h = 1$. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of $T_{\hat{\varepsilon}_{r^{**}}\rightarrow\hat{\varepsilon}_{v^*}}$ and $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^{**}}}$, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model, from which the residual series are obtained, is determined by the BIC. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
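Panel A

| | Full: # stocks | Pre-Crisis: # stocks | Post-Crisis: # stocks |
|---|---|---|---|
| $\text{Granger}_{r^{**}\rightarrow v^*}$ | 299 | 211 | 175 |
| $\text{Granger}_{v^*\rightarrow r^{**}}$ | 65 | 65 | 52 |

Panel B ($q_1 = 0.05$, $q_2 = 0.95$)

| | Full: # stocks | + | − | Pre-Crisis: # stocks | + | − | Post-Crisis: # stocks | + | − |
|---|---|---|---|---|---|---|---|---|---|
| $T_{\hat{\varepsilon}_{r^{**}}\rightarrow\hat{\varepsilon}_{v^*}}$ | 186 | 174 | 12 | 115 | 111 | 4 | 105 | 93 | 12 |
| $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^{**}}}$ | 72 | 21 | 51 | 53 | 13 | 40 | 79 | 8 | 71 |

Panel C ($q_1 = 0.10$, $q_2 = 0.90$)

| | Full: # stocks | + | − | Pre-Crisis: # stocks | + | − | Post-Crisis: # stocks | + | − |
|---|---|---|---|---|---|---|---|---|---|
| $T_{\hat{\varepsilon}_{r^{**}}\rightarrow\hat{\varepsilon}_{v^*}}$ | 249 | 229 | 20 | 151 | 144 | 7 | 180 | 165 | 15 |
| $T_{\hat{\varepsilon}_{v^*}\rightarrow\hat{\varepsilon}_{r^{**}}}$ | 122 | 58 | 64 | 42 | 9 | 33 | 80 | 24 | 56 |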
6 Concluding remarks
In this paper, we have applied a practical two-step procedure to test for nonlinear residual information transfer between stock returns and trading volume growth for a sample of 417 individual-level stocks over an 18-year period. The procedure draws upon the concept of Shannon transfer entropy in the second step. Shannon transfer entropy is a highly versatile nonparametric measure that quantifies any kind of statistical dependence between two time series, and it also allows us to infer the dominant direction of the information
transfer. In our application to the bivariate system of stock returns and trading volume growth, we detect nonlinear residual information transfer that is not captured by the linear Granger causality test.
Our main results are robust when considering three different time periods and when additionally
accounting for volatility persistence in stock returns: We find evidence for statistically significant
nonlinear residual information transfer from stock returns to trading volume growth and vice
versa, where for most stocks this nonlinear residual information predominantly flows from
returns to trading volume growth. Importantly, the question of why the dominant direction of the nonlinear information transfer differs across stocks merits closer investigation and is left for future research. From a technical point of view, we find such nonlinear residual information transfer for a large fraction of stocks. It may therefore be argued that the widespread use of linear models to assess this directional relation is not sufficient. Thus, empirical applications such as forecasting and trading are likely to benefit from incorporating nonlinear dynamics in their modeling approaches.
We can think of several extensions to further validate our findings. For example, it might be worthwhile to look at small to mid cap stocks, which are not part of the S&P 500 or DJIA. Moreover, considering
specific industries that, on the one hand, have a high impact on real economic activity or, on the
other hand, are important from a systemic risk point of view might be a worthwhile endeavor.
The two-step procedure could also be used to study the directional relation of returns and
trading volume for cryptocurrencies, which regularly display highly volatile periods. It should also
be emphasized that the two-step procedure is not restricted to our specific application but may
be used to shed light on other important questions involving financial time series. From a more
technical point of view, it is also possible to extend the second-step Shannon transfer entropy measure to more than two variables, possibly extending the testing procedure to higher-dimensional systems (for a group transfer entropy measure, see, for example, Dimpfl and Peter, 2018).
References
Andersen, T. G. (1996) Return volatility and trading volume: An information flow interpretation of
Behrendt, S., Dimpfl, T., Peter, F. J. and Zimmermann, D. J. (2019) RTransferEntropy – Quantifying
information flow between different time series using effective transfer entropy, SoftwareX, 10,
100265.
Bossomaier, T., Barnett, L., Harré, M. and Lizier, J. T. (2016) An introduction to transfer entropy:
Brock, W. (1991) Causality, chaos, explanation and prediction in economics and finance, in
Beyond belief: Randomness, prediction and explanation in science (Eds.) J. Casti and A. Karlqvist,
Campbell, J. Y., Grossman, S. J. and Wang, J. (1993) Trading volume and serial correlation in stock
Chen, S.-S. (2012) Revisiting the empirical linkages between stock returns and trading volume,
Chuang, C.-C., Kuan, C.-M. and Lin, H.-Y. (2009) Causality in quantiles and dynamic stock return -
Clark, P. K. (1973) A subordinated stochastic process model with finite variance for speculative
Copeland, T. E. (1976) A model of asset trading under the assumption of sequential information
De Long, J. B., Shleifer, A., Summers, L. H. and Waldmann, R. J. (1990a) Noise trader risk in
investment strategies and destabilizing rational speculation, Journal of Finance, 45, 379–395.
Dehmer, M., Emmert-Streib, F., Chen, Z., Li, X. and Shi, Y. (Eds.) (2017) Mathematical foundations
Dimpfl, T. and Peter, F. J. (2013) Using transfer entropy to measure information flows between
Dimpfl, T. and Peter, F. J. (2018) Analyzing volatility transmission using group transfer entropy,
Easley, D., De Prado, M. M. L. and O’Hara, M. (2011) The microstructure of the flash crash: Flow
toxicity, liquidity crashes and the probability of informed trading, Journal of Portfolio
305–321.
Gagnon, L. and Karolyi, G. A. (2009) Information, trading volume, and international stock return
comovements: Evidence from cross-listed stocks, Journal of Financial and Quantitative Analysis,
44, 953–986.
Gallant, A. R., Rossi, P. E. and Tauchen, G. E. (1992) Stock prices and volume, Review of Financial
Studies, 5, 199–242.
Grossman, S. J. and Miller, M. H. (1988) Liquidity and market structure, Journal of Finance, 43,
617–633.
Hiemstra, C. and Jones, J. D. (1994) Testing for linear and nonlinear Granger causality in the stock
Hinich, M. J. and Patterson, D. M. (1985) Evidence of nonlinearity in daily stock returns, Journal of
Horowitz, J. L. (2003) Bootstrap methods for Markov processes, Econometrica, 71, 1049–1082.
Jennings, R. H., Starks, L. T. and Fellingham, J. C. (1981) An equilibrium model of asset trading
Kantz, H. and Schreiber, T. (2004) Nonlinear time series analysis, Cambridge University Press, New
York, second edn.
Karpoff, J. M. (1987) The relation between price changes and trading volume: A survey, Journal of
Financial and Quantitative Analysis, 22, 109–126.
Kirilenko, A., Kyle, A. S., Samadi, M. and Tuzun, T. (2017) The Flash Crash: High-frequency trading
Statistics, 1, 79–86.
Lakonishok, J. and Smidt, S. (1989) Past price changes and current trading volume, Journal of
Lee, B.-S. and Rui, O. M. (2002) The dynamic relationship between stock returns and trading
volume: Domestic and cross-country evidence, Journal of Banking and Finance, 26, 51–78.
Marschinski, R. and Kantz, H. (2002) Analysing the information flow between financial time series:
An improved estimator for transfer entropy, European Physical Journal B: Condensed Matter
McMillan, D. G. (2007) Non-linear forecasting of stock returns: Does volume help?, International
Nelson, D. B. (1991) Conditional heteroskedasticity in asset returns: A new approach,
(Advance Article).
Schreiber, T. (2000) Measuring information transfer, Physical Review Letters, 85, 461–464.
Shannon, C. E. (1948) A mathematical theory of communication, Bell System Technical Journal, 27,
379–423.
Smirlock, M. and Starks, L. (1988) An empirical analysis of the stock price - volume relationship,
Journal of Banking and Finance, 12, 31–41.
Tetlock, P. C. (2007) Giving content to investor sentiment: The role of media in the stock market,
7 Appendix
In the following, we use three instructive simulation experiments for two purposes. First, we illustrate how conventional approaches to measuring information transfer fail to capture nonlinear dependencies. Second, we demonstrate that the two-step procedure outlined in Section 3.2 detects the nonlinear information transfer in such scenarios. Note that for each bivariate system, we report detailed results for one simulation experiment only. Qualitatively, however, the results do not change significantly for larger numbers of replications. Monte Carlo results with 200 replications of each simulation and different values for T can be found below. In most cases, the two-step procedure correctly identifies the residual information transfer. For each of the models, we take $\{\varepsilon_{i,t}\}_{t=1}^{4000}$, $i, j \in \{1, 2\}$ and $i \neq j$, to be Gaussian white noise with zero mean and unit variance. Moreover, for every univariate time series in each of the bivariate systems, we simulate 4000 observations, i.e., $\{y_{1,t}\}_{t=1}^{4000}$ and $\{y_{2,t}\}_{t=1}^{4000}$. The first model is the same as in the simulation experiment of Dimpfl and Peter (2013), who make use of Shannon transfer entropy to subsequently measure the information transfer between the markets for credit default swaps and corporate bonds as well as the information transfer between market risk
Model (1)
In this model, $\{y_{1,t}\}$ follows a stationary autoregressive (AR) process of order one. The model does not allow $\{y_{2,t}\}$ to have any effect on $\{y_{1,t}\}$, while $\{y_{2,t}\}$ itself depends linearly on the lagged value of the first system variable, superimposed with Gaussian white noise. The second model is similar to the first model; however, the effect of the lagged first system variable on the second system variable is now nonlinear. In order to avoid negative values of the lagged first system variable, the absolute value is taken (see Behrendt et al., 2019).
Model (2)
In the third model, $\{y_{2,t}\}$ now follows a stationary AR process of order one, whereas $\{y_{1,t}\}$ depends on an interaction term of both lagged system variables, superimposed with Gaussian white noise. This model is related to an example mentioned by Hiemstra and Jones (1994). It dates back to Brock (1991), who uses a similar setup to illustrate how a linear Granger causality test may fail to reveal an existing nonlinear dynamic relationship.
Model (3)

$$\begin{aligned} y_{1,t} &= 0.4\, y_{1,t-1}\cdot y_{2,t-1} + \varepsilon_{1,t},\\ y_{2,t} &= 0.2\, y_{2,t-1} + \varepsilon_{2,t}. \end{aligned} \tag{7.3}$$
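A short simulation shows why a linear test can miss this dependence. The sketch below is our own illustration with a hypothetical function name, not the paper's procedure. It simulates Model (3) and compares the linear correlation between $y_{1,t}$ and $y_{2,t-1}$ with the correlation between their squares. Flipping the signs of all $\varepsilon_{1,t}$ flips the sign of $y_{1,t}$ without changing $y_{2,t}$, so the linear correlation is zero in population. The squares, by contrast, are clearly correlated.

```python
import numpy as np


def simulate_model3(T, seed=0):
    """Simulate y1_t = 0.4*y1_{t-1}*y2_{t-1} + e1_t and y2_t = 0.2*y2_{t-1} + e2_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((T, 2))
    y1, y2 = np.zeros(T), np.zeros(T)
    for t in range(1, T):
        y1[t] = 0.4 * y1[t - 1] * y2[t - 1] + e[t, 0]
        y2[t] = 0.2 * y2[t - 1] + e[t, 1]
    return y1, y2


y1, y2 = simulate_model3(50_000, seed=42)
corr_levels = np.corrcoef(y1[1:], y2[:-1])[0, 1]             # linear dependence on y2_{t-1}
corr_squares = np.corrcoef(y1[1:] ** 2, y2[:-1] ** 2)[0, 1]  # dependence in squares
print(round(corr_levels, 3), round(corr_squares, 3))
```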
Given the true lag structure, for each of the three bivariate systems, we first estimate a VAR(1) model to obtain residual series purged of any linear auto- and cross-correlations. In the second step, after obtaining $\{\hat{\varepsilon}_{1,t}\}_{t=1}^{3999}$ and $\{\hat{\varepsilon}_{2,t}\}_{t=1}^{3999}$, the effective transfer entropy is estimated for both possible directions of the residual information transfer. To this end, we set the Markov order for both processes to $l = h = 1$ and choose the 5% and 95% quantiles of the respective empirical distribution for discretization. Note that the Markov orders $l$ and $h$ should be set to the same value to facilitate interpretation of the results (Schreiber, 2000). Setting $l = h = 1$ is appropriate given the true lag structure of the three models and the fact that only Gaussian white noise is considered in these simulation experiments. It is also more efficient from a computational point of view. However, setting $l > 1$ and $h > 1$ might be reasonable if dynamic residual dependencies lasting longer than one lag can be expected. Moreover, estimation of the effective transfer entropy is based on 150 shuffles, and statistical inference rests on 400 bootstrap replications of the underlying Markov processes. For comparison, we also compute the F-statistic and corresponding p-value of the usual linear Granger causality test for both directions. Similar to the Markov order, we consider the case where the respective other system variable is included with one lag in the unrestricted model.15 The results are illustrated in Table 7.1.
As can be seen, for the first model, the effective transfer entropy estimates are statistically insignificant in both directions. This correctly indicates that no nonlinear residual information transfer is present after applying a linear filter to the time series. On the other hand, $T_{\hat{\varepsilon}_1\rightarrow\hat{\varepsilon}_2}$ for the second model and $T_{\hat{\varepsilon}_2\rightarrow\hat{\varepsilon}_1}$ for the third model are statistically significant. Thus, for these models, the effective transfer entropy correctly detects the nonlinear residual information transfer, which the VAR(1) model fails to remove in the first step. By contrast, the linear Granger causality tests fail to reject the null hypothesis of no residual information transfer for all three models. This is not an issue for the first model. When the underlying bivariate system entails nonlinearities, however, as in the latter two models, relying on linear models would leave out potentially important nonlinear information.
The table depicts for each of the three models described in Section 7.1 (i) the results of the two-step procedure in the form of effective transfer entropy estimates and, for comparison, (ii) the results of the usual linear Granger causality test. In the first step, each model is filtered by a VAR(1) model. $T_{\hat{\varepsilon}_i\rightarrow\hat{\varepsilon}_j}$ in the first two rows refers to the effective transfer entropy from $\{\hat{\varepsilon}_{i,t}\}$ to $\{\hat{\varepsilon}_{j,t}\}$, where $i, j \in \{1, 2\}$ and $i \neq j$. Analogously, $\text{Granger}_{\hat{\varepsilon}_i\rightarrow\hat{\varepsilon}_j}$ in the last two rows refers to the respective linear Granger causality test. For the effective transfer entropy, both the estimate and the corresponding bootstrapped p-value are given. For the Granger causality test, only the p-value of the F-statistic is reported. For the estimation of the effective transfer entropy, the Markov order of both processes is set to $l = h = 1$. For discretization, the 5% and 95% quantiles of the respective empirical distribution are chosen. 150 shuffles are used, and statistical inference is based on 400 bootstrap replications. The linear Granger causality test assumes one lag of the respective other system variable in the unrestricted model.
| | Model (1) Estimate | Model (1) p-value | Model (2) Estimate | Model (2) p-value | Model (3) Estimate | Model (3) p-value |
|---|---|---|---|---|---|---|
| $T_{\hat{\varepsilon}_1\rightarrow\hat{\varepsilon}_2}$ | 0.0008 | 0.1125 | 0.0054 | 0.0000 | 0.0000 | 0.3925 |
| $\text{Granger}_{\hat{\varepsilon}_2\rightarrow\hat{\varepsilon}_1}$ | - | 0.9665 | - | 0.5571 | - | 0.8348 |

15
Obviously, the results of the linear Granger causality test follow, by construction, from the first step of the procedure. We report these results merely as an illustration of the problem that might be encountered in a practical application, not as a benchmark.
In Tables 7.2 and 7.3, we present Monte Carlo results for the simulation experiments outlined above. Although the true lag structure is known in the simulation experiments, the BIC is used to determine the lag length of the VAR model in the first step empirically, so that the setup is the same as in the empirical applications. For each of the three models, Tables 7.2 and 7.3 report the percent chosen, i.e., the relative frequency of replications in which a statistically significant information transfer is measured. In Table 7.2, statistical significance is assessed at the 10% level, whereas the 5% level is used in Table 7.3. The number of replications is set to R = 200. As can be seen, the two-step procedure detects the true nonlinear residual information transfer in almost all cases. Only in very few cases is an information transfer that should not be present discovered by the Shannon transfer entropy measure or even by the linear Granger causality test.
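The percent chosen is the share of the R replications in which a test rejects at the given level. The Python sketch below computes it for a one-lag linear Granger causality test under two hypothetical linear data-generating processes (no transfer, and a linear transfer from x2 to x1), not for the models of Section 7.1. As a simplifying assumption, the p-value is taken from the asymptotic χ²(1) distribution of the F-statistic rather than the exact F distribution, which is a close approximation for sample sizes such as T = 2000. The functions `simulate_pair`, `granger_pvalue`, and `percent_chosen` are hypothetical.

```python
import math
import numpy as np


def simulate_pair(T, coupling, rng):
    """Two AR(1) series; x1 receives `coupling` times lagged x2 (0 means no transfer)."""
    e = rng.standard_normal((T, 2))
    x = np.zeros((T, 2))
    for t in range(1, T):
        x[t, 1] = 0.5 * x[t - 1, 1] + e[t, 1]
        x[t, 0] = 0.5 * x[t - 1, 0] + coupling * x[t - 1, 1] + e[t, 0]
    return x


def rss(y, X):
    """Residual sum of squares of an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_pvalue(x_target, x_source):
    """One-lag Granger test source -> target; asymptotic chi2(1) p-value of the F-statistic."""
    y = x_target[1:]
    restricted = np.column_stack([np.ones(len(y)), x_target[:-1]])
    unrestricted = np.column_stack([restricted, x_source[:-1]])
    rss_r, rss_u = rss(y, restricted), rss(y, unrestricted)
    f_stat = (rss_r - rss_u) / (rss_u / (len(y) - unrestricted.shape[1]))
    # With one restriction, P(chi2_1 > F) = erfc(sqrt(F / 2)).
    return math.erfc(math.sqrt(max(f_stat, 0.0) / 2.0))


def percent_chosen(p_values, alpha):
    """Share of replications in which the test is significant at level alpha."""
    return float(np.mean(np.asarray(p_values) < alpha))


rng = np.random.default_rng(42)
R, T = 200, 2000
# Each replication passes (x1, x2) as (target, source).
p_null = [granger_pvalue(*simulate_pair(T, 0.0, rng).T) for _ in range(R)]
p_alt = [granger_pvalue(*simulate_pair(T, 0.2, rng).T) for _ in range(R)]
print("no transfer, 10%:", percent_chosen(p_null, 0.10))
print("no transfer,  5%:", percent_chosen(p_null, 0.05))
print("linear transfer, 5%:", percent_chosen(p_alt, 0.05))
```

Without any transfer, the share of rejections should be close to the nominal level (the size of the test); with a linear transfer of this strength, it should be close to one.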
Table 7.2: Extensive results for simulated time series, 10% significance level
The table depicts for each of the three models described in Section 7.1 (i) the results of the two-step estimation procedure and (ii) the results of the linear Granger causality test for the simulated series. Sample sizes vary from T = 2000 to T = 12000 as indicated in each respective column. For each of these sample sizes, we report the percent chosen, which gives the relative number of statistically significant occurrences of the respective test procedure over all R = 200 replications. Statistical significance is determined at the 10% significance level.
ET_{X̂_1→X̂_2}
ET_{X̂_2→X̂_1}
0.09 0.09 0.07 0.10
ET_{X̂_1→X̂_2}
ET_{X̂_2→X̂_1}
0.99 1.00 1.00 1.00
ET_{X̂_1→X̂_2}
ET_{X̂_2→X̂_1}
0.09 0.07 0.10 0.12
Table 7.3: Extensive results for simulated time series, 5% significance level
The table depicts for each of the three models described in Section 7.1 (i) the results of the two-step
estimation procedure and (ii) the results of the linear Granger causality test for the simulated series.
Sample sizes vary from T = 2000 to T = 12000 as indicated in each respective column. For each of these
sample sizes, we report the percent chosen, which gives the relative number of statistically significant
occurrences of the respective test procedure over all R = 200 replications. Statistical significance is
determined at the 5% significance level.
ET_{X̂_1→X̂_2}
ET_{X̂_2→X̂_1}
0.04 0.05 0.04 0.04
GC_{X̂_1→X̂_2}
0.04 0.02 0.04 0.04
ET_{X̂_1→X̂_2}
ET_{X̂_2→X̂_1}
0.97 1.00 1.00 1.00
ET_{X̂_1→X̂_2}
ET_{X̂_2→X̂_1}
0.05 0.02 0.06 0.06
7.3 Supplementary tables
Table 7.4: Calendar-adjusted time series (Markov order 2, BIC)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. ET_{X̂_{r*}→X̂_{v*}} in Panels B and C refers to the effective transfer entropy from {X̂_{r*,t}} to {X̂_{v*,t}}, while ET_{X̂_{v*}→X̂_{r*}} refers to the effective transfer entropy from {X̂_{v*,t}} to {X̂_{r*,t}}, where r* and v* stand for the calendar-adjusted log-return and trading volume growth series, respectively. The results of the linear Granger causality tests performed on the calendar-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks for which the p-value of the respective test is below 0.1. The columns “+” and “−” in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative, respectively. For the estimation of the effective transfer entropy, the Markov order is set to 2. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of ET_{X̂_{r*}→X̂_{v*}} and ET_{X̂_{v*}→X̂_{r*}}, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model from which the residual series are obtained is determined by the BIC. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
Panel A
                        Full         Pre-Crisis   Post-Crisis
                        # stocks     # stocks     # stocks
GC_{r*→v*}              322          221          198
GC_{v*→r*}              92           100          72

Panel B (quantiles 0.05 and 0.95)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r*}→X̂_{v*}}      234       218  16   197       179  18   132       127  5
ET_{X̂_{v*}→X̂_{r*}}

Panel C (quantiles 0.10 and 0.90)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r*}→X̂_{v*}}      258       240  18   164       153  11   115       110  5
ET_{X̂_{v*}→X̂_{r*}}      130       86   44   116       72   44   43        19   24
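The effective transfer entropies in Panels B and C rest on a discretization of the residuals into three symbols, with the 5% and 95% (or 10% and 90%) empirical quantiles as bin edges, and on Shannon transfer entropy with a chosen Markov order, i.e., the sum over p(i_{t+1}, i_t^{(k)}, j_t^{(l)}) log2[p(i_{t+1} | i_t^{(k)}, j_t^{(l)}) / p(i_{t+1} | i_t^{(k)})]. A minimal Python sketch, assuming the plain frequency-count (plug-in) estimator measured in bits, could look as follows; the helper names and the simulated series are hypothetical, and the shuffle correction and bootstrap inference are left out.

```python
from collections import Counter

import numpy as np


def discretize(x, quantiles=(0.05, 0.95)):
    """Symbolize x into len(quantiles) + 1 bins with empirical quantiles as edges."""
    return np.digitize(x, np.quantile(x, quantiles))


def history(s, order, start):
    """Tuples of the last `order` symbols; row t ends at s[start + t]."""
    n = len(s) - start - 1
    return [tuple(s[start + t - order + 1:start + t + 1]) for t in range(n)]


def shannon_te(source, target, k=1, l=1):
    """Shannon transfer entropy source -> target in bits; Markov orders k (target), l (source)."""
    start = max(k, l) - 1
    future = [int(v) for v in target[start + 1:]]
    tgt_hist = history(target, k, start)
    src_hist = history(source, l, start)
    n = len(future)
    c_full = Counter(zip(future, tgt_hist, src_hist))
    c_hists = Counter(zip(tgt_hist, src_hist))
    c_fut_tgt = Counter(zip(future, tgt_hist))
    c_tgt = Counter(tgt_hist)
    te = 0.0
    for (f, h_i, h_j), count in c_full.items():
        # p(f | h_i, h_j) / p(f | h_i), expressed through counts.
        ratio = (count * c_tgt[h_i]) / (c_hists[(h_i, h_j)] * c_fut_tgt[(f, h_i)])
        te += count / n * np.log2(ratio)
    return te


rng = np.random.default_rng(3)
T = 5000
source_raw = rng.standard_normal(T)
# Hypothetical target driven by the lagged source plus noise.
target_raw = np.concatenate([[0.0], source_raw[:-1]]) + 0.5 * rng.standard_normal(T)
src, tgt = discretize(source_raw), discretize(target_raw)
te_fwd = shannon_te(src, tgt, k=2, l=2)
te_back = shannon_te(tgt, src, k=2, l=2)
print(f"source -> target: {te_fwd:.3f} bits, target -> source: {te_back:.3f} bits")
```

The estimate in the reverse direction is not exactly zero because the plug-in estimator is biased upward in finite samples; removing this bias is the purpose of the shuffling step in the effective transfer entropy.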
Table 7.5: Volatility-adjusted time series (Markov order 2, BIC)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. ET_{X̂_{r**}→X̂_{v*}} in Panels B and C refers to the effective transfer entropy from {X̂_{r**,t}} to {X̂_{v*,t}}, while ET_{X̂_{v*}→X̂_{r**}} refers to the effective transfer entropy from {X̂_{v*,t}} to {X̂_{r**,t}}, where r** and v* stand for the calendar- and volatility-adjusted log-return and the calendar-adjusted trading volume growth series, respectively. The results of the linear Granger causality tests performed on the volatility-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks for which the p-value of the respective test is below 0.1. The columns “+” and “−” in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative, respectively. For the estimation of the effective transfer entropy, the Markov order is set to 2. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of ET_{X̂_{r**}→X̂_{v*}} and ET_{X̂_{v*}→X̂_{r**}}, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model from which the residual series are obtained is determined by the BIC. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
Panel A
                        Full         Pre-Crisis   Post-Crisis
                        # stocks     # stocks     # stocks
GC_{r**→v*}             299          211          175
GC_{v*→r**}             65           65           52

Panel B (quantiles 0.05 and 0.95)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r**}→X̂_{v*}}     196       184  12   132       124  8    128       115  13
ET_{X̂_{v*}→X̂_{r**}}     77        42   35   72        23   49   66        23   43

Panel C (quantiles 0.10 and 0.90)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r**}→X̂_{v*}}     211       203  8    104       102  2    99        93   6
ET_{X̂_{v*}→X̂_{r**}}     49        25   24   33        14   19   41        14   27
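Assuming the usual definition of effective transfer entropy, namely the Shannon transfer entropy minus its average over transfer entropies computed with a randomly shuffled source series, the 200 shuffles mentioned in the table notes can be sketched in Python as below. For brevity, the sketch uses Markov order 1 and the 5%/95% quantile discretization, and it does not reproduce the bootstrap inference based on 500 replications; all names and the simulated data are hypothetical.

```python
from collections import Counter

import numpy as np


def discretize(x, quantiles=(0.05, 0.95)):
    """Three symbols with the empirical 5% and 95% quantiles as bin edges."""
    return np.digitize(x, np.quantile(x, quantiles))


def shannon_te(source, target):
    """Order-1 Shannon transfer entropy source -> target (bits) for symbol sequences."""
    f, h_i, h_j = target[1:].tolist(), target[:-1].tolist(), source[:-1].tolist()
    n = len(f)
    c_full = Counter(zip(f, h_i, h_j))
    c_ij = Counter(zip(h_i, h_j))
    c_fi = Counter(zip(f, h_i))
    c_i = Counter(h_i)
    return sum(
        c / n * np.log2(c * c_i[b] / (c_ij[(b, s)] * c_fi[(a, b)]))
        for (a, b, s), c in c_full.items()
    )


def effective_te(source, target, n_shuffles=200, seed=0):
    """Observed TE minus the mean TE obtained with randomly shuffled source series."""
    rng = np.random.default_rng(seed)
    shuffled = [shannon_te(rng.permutation(source), target) for _ in range(n_shuffles)]
    return shannon_te(source, target) - float(np.mean(shuffled))


rng = np.random.default_rng(1)
T = 2000
source_raw = rng.standard_normal(T)
# Hypothetical target driven by the lagged source plus noise.
target_raw = np.concatenate([[0.0], source_raw[:-1]]) + 0.5 * rng.standard_normal(T)
src, tgt = discretize(source_raw), discretize(target_raw)
ete_fwd = effective_te(src, tgt)
ete_back = effective_te(tgt, src)
ete_ind = effective_te(src, discretize(rng.standard_normal(T)))
print(f"{ete_fwd:.4f} {ete_back:.4f} {ete_ind:.4f}")
```

Shuffling destroys the temporal link between source and target while keeping both marginal distributions, so the shuffled average approximates the small-sample bias that the raw estimate carries even without any information transfer.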
Table 7.6: Calendar-adjusted time series (Markov order 1, p = 20)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. ET_{X̂_{r*}→X̂_{v*}} in Panels B and C refers to the effective transfer entropy from {X̂_{r*,t}} to {X̂_{v*,t}}, while ET_{X̂_{v*}→X̂_{r*}} refers to the effective transfer entropy from {X̂_{v*,t}} to {X̂_{r*,t}}, where r* and v* stand for the calendar-adjusted log-return and trading volume growth series, respectively. The results of the linear Granger causality tests performed on the calendar-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks for which the p-value of the respective test is below 0.1. The columns “+” and “−” in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative, respectively. For the estimation of the effective transfer entropy, the Markov order is set to 1. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of ET_{X̂_{r*}→X̂_{v*}} and ET_{X̂_{v*}→X̂_{r*}}, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model from which the residual series are obtained is set to p = 20. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
Panel A
                        Full         Pre-Crisis   Post-Crisis
                        # stocks     # stocks     # stocks
GC_{r*→v*}              322          221          198
GC_{v*→r*}              92           100          72

Panel B (quantiles 0.05 and 0.95)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r*}→X̂_{v*}}      204       184  20   122       110  12   119       112  7
ET_{X̂_{v*}→X̂_{r*}}      104       36   68   57        10   47   54        8    46

Panel C (quantiles 0.10 and 0.90)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r*}→X̂_{v*}}      255       230  25   124       113  11   166       150  16
ET_{X̂_{v*}→X̂_{r*}}      124       59   65   57        12   45   75        17   58
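Some of the supplementary tables choose the VAR lag order by the BIC, while Tables 7.6 to 7.9 fix it at p = 20. A hedged Python sketch of both variants of the first step follows. It assumes an intercept in each equation, OLS estimation equation by equation, a common estimation sample across candidate lag orders, and the textbook criterion ln det(Σ̂) + (ln n / n) × (number of coefficients); the exact criterion and lag range used in the paper are not stated here. Function names and the simulated VAR(2) are hypothetical.

```python
import numpy as np


def lagged_design(x, p, p_max):
    """Intercept plus p lags of every variable on a common sample that drops p_max rows."""
    T = len(x)
    cols = [np.ones(T - p_max)] + [x[p_max - lag:T - lag] for lag in range(1, p + 1)]
    return x[p_max:], np.column_stack(cols)


def var_residuals(x, p, p_max=None):
    """OLS residuals of a VAR(p) with intercept, estimated equation by equation."""
    y, X = lagged_design(x, p, p if p_max is None else p_max)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def bic(x, p, p_max):
    """BIC = ln det(Sigma_hat) + (number of coefficients) * ln(n) / n."""
    resid = var_residuals(x, p, p_max)
    n, k = resid.shape
    sigma = resid.T @ resid / n
    n_coef = k * (1 + k * p)
    return float(np.log(np.linalg.det(sigma)) + n_coef * np.log(n) / n)


def select_lag_bic(x, p_max=10):
    """Lag order in 1..p_max that minimizes the BIC on a common estimation sample."""
    return min(range(1, p_max + 1), key=lambda p: bic(x, p, p_max))


# Hypothetical VAR(2) data with strong second-lag coefficients.
rng = np.random.default_rng(7)
A1 = np.array([[0.2, 0.0], [0.0, 0.2]])
A2 = np.array([[0.5, 0.0], [0.2, 0.5]])
T = 3000
x = np.zeros((T, 2))
e = rng.standard_normal((T, 2))
for t in range(2, T):
    x[t] = A1 @ x[t - 1] + A2 @ x[t - 2] + e[t]

p_bic = select_lag_bic(x, p_max=10)
res_bic = var_residuals(x, p_bic)   # lag order chosen by the BIC
res_fixed = var_residuals(x, 20)    # fixed lag order p = 20
print(p_bic, res_bic.shape, res_fixed.shape)
```

Either residual matrix would then be discretized and passed to the transfer entropy step; a fixed, generous lag order guards against leaving linear dependence in the residuals at the cost of a few more lost observations.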
Table 7.7: Volatility-adjusted time series (Markov order 1, p = 20)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. ET_{X̂_{r**}→X̂_{v*}} in Panels B and C refers to the effective transfer entropy from {X̂_{r**,t}} to {X̂_{v*,t}}, while ET_{X̂_{v*}→X̂_{r**}} refers to the effective transfer entropy from {X̂_{v*,t}} to {X̂_{r**,t}}, where r** and v* stand for the calendar- and volatility-adjusted log-return and the calendar-adjusted trading volume growth series, respectively. The results of the linear Granger causality tests performed on the volatility-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks for which the p-value of the respective test is below 0.1. The columns “+” and “−” in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative, respectively. For the estimation of the effective transfer entropy, the Markov order is set to 1. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of ET_{X̂_{r**}→X̂_{v*}} and ET_{X̂_{v*}→X̂_{r**}}, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model from which the residual series are obtained is set to p = 20. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
Panel A
                        Full         Pre-Crisis   Post-Crisis
                        # stocks     # stocks     # stocks
GC_{r**→v*}             299          211          175
GC_{v*→r**}             65           65           52

Panel B (quantiles 0.05 and 0.95)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r**}→X̂_{v*}}     156       144  12   88        85   3    84        70   14
ET_{X̂_{v*}→X̂_{r**}}     66        16   50   58        11   47   82        9    73

Panel C (quantiles 0.10 and 0.90)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r**}→X̂_{v*}}     236       212  24   119       116  3    140       130  10
ET_{X̂_{v*}→X̂_{r**}}     108       45   63   39        8    31   81        16   65
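The “# stocks”, “+”, and “−” columns are simple counts over per-stock test results. A small Python sketch with hypothetical inputs is given below; `count_stocks` is an illustrative helper, the threshold 0.1 follows the table notes, and the sign convention of the difference in effective transfer entropies is taken as given by the caller rather than fixed here.

```python
import numpy as np


def count_stocks(p_values, ete_differences, alpha=0.1):
    """Return (# stocks, +, -) for one row of Panel B or C.

    p_values: one p-value per stock for the effective transfer entropy in a given direction.
    ete_differences: per-stock difference in effective transfer entropies
    (sign convention supplied by the caller). A difference of exactly zero
    is counted in neither the "+" nor the "-" column.
    """
    p_values = np.asarray(p_values, dtype=float)
    diffs = np.asarray(ete_differences, dtype=float)
    significant = p_values < alpha
    n_pos = int(np.sum(diffs[significant] > 0))
    n_neg = int(np.sum(diffs[significant] < 0))
    return int(significant.sum()), n_pos, n_neg


# Hypothetical per-stock results for five stocks.
p = [0.01, 0.20, 0.05, 0.08, 0.50]
d = [0.002, -0.001, -0.003, 0.004, 0.010]
print(count_stocks(p, d))  # (3, 2, 1)
```

In every reported row of Panels B and C, the “+” and “−” entries add up to “# stocks” (for example, 184 + 12 = 196 in Panel B of Table 7.5).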
Table 7.8: Calendar-adjusted time series (Markov order 2, p = 20)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. ET_{X̂_{r*}→X̂_{v*}} in Panels B and C refers to the effective transfer entropy from {X̂_{r*,t}} to {X̂_{v*,t}}, while ET_{X̂_{v*}→X̂_{r*}} refers to the effective transfer entropy from {X̂_{v*,t}} to {X̂_{r*,t}}, where r* and v* stand for the calendar-adjusted log-return and trading volume growth series, respectively. The results of the linear Granger causality tests performed on the calendar-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks for which the p-value of the respective test is below 0.1. The columns “+” and “−” in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative, respectively. For the estimation of the effective transfer entropy, the Markov order is set to 2. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of ET_{X̂_{r*}→X̂_{v*}} and ET_{X̂_{v*}→X̂_{r*}}, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model from which the residual series are obtained is set to p = 20. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
Panel A
                        Full         Pre-Crisis   Post-Crisis
                        # stocks     # stocks     # stocks
GC_{r*→v*}              322          221          198
GC_{v*→r*}              92           100          72

Panel B (quantiles 0.05 and 0.95)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r*}→X̂_{v*}}      206       178  28   135       116  19   77        73   4
ET_{X̂_{v*}→X̂_{r*}}      219       127  92   190       84   106  80        17   63

Panel C (quantiles 0.10 and 0.90)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r*}→X̂_{v*}}      211       197  14   115       108  7    54        53   1
ET_{X̂_{v*}→X̂_{r*}}      126       78   48   116       59   57   45        14   31
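The three sample periods in the table notes can be expressed as inclusive date ranges. The Python sketch below is illustrative only: `PERIODS` and `split_periods` are hypothetical names, and the demonstration uses a calendar-day index, whereas real stock data would contain trading days only.

```python
import numpy as np

# Sample periods from the table notes; both bounds are inclusive.
PERIODS = {
    "Full": ("2000-01-01", "2017-12-31"),
    "Pre-Crisis": ("2000-01-01", "2007-12-31"),
    "Post-Crisis": ("2010-01-01", "2017-12-31"),
}


def split_periods(dates, values):
    """Map each period name to the observations whose date falls inside it."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    values = np.asarray(values)
    subsets = {}
    for name, (start, end) in PERIODS.items():
        mask = (dates >= np.datetime64(start)) & (dates <= np.datetime64(end))
        subsets[name] = values[mask]
    return subsets


# Hypothetical calendar-day index (real data would use trading days only).
dates = np.arange("2000-01-01", "2018-01-01", dtype="datetime64[D]")
parts = split_periods(dates, np.arange(len(dates)))
print({name: len(v) for name, v in parts.items()})
# {'Full': 6575, 'Pre-Crisis': 2922, 'Post-Crisis': 2922}
```

The crisis years 2008 and 2009 therefore enter only the full-sample results.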
Table 7.9: Volatility-adjusted time series (Markov order 2, p = 20)
This table summarizes the number of stocks for which the two-step procedure detects significant nonlinear residual information. ET_{X̂_{r**}→X̂_{v*}} in Panels B and C refers to the effective transfer entropy from {X̂_{r**,t}} to {X̂_{v*,t}}, while ET_{X̂_{v*}→X̂_{r**}} refers to the effective transfer entropy from {X̂_{v*,t}} to {X̂_{r**,t}}, where r** and v* stand for the calendar- and volatility-adjusted log-return and the calendar-adjusted trading volume growth series, respectively. The results of the linear Granger causality tests performed on the volatility-adjusted series of stock returns and trading volume growth are given in Panel A. For each of the tests, the column “# stocks” reports the number of stocks for which the p-value of the respective test is below 0.1. The columns “+” and “−” in Panels B and C count for how many of these stocks the difference in effective transfer entropies is positive or negative, respectively. For the estimation of the effective transfer entropy, the Markov order is set to 2. For discretization, the 5% and 95% (Panel B) and the 10% and 90% (Panel C) quantiles of the respective empirical distribution of the residuals are chosen. 200 shuffles are used in the computation of ET_{X̂_{r**}→X̂_{v*}} and ET_{X̂_{v*}→X̂_{r**}}, and statistical inference is based on 500 bootstrap replications. The number of lags used in the VAR model from which the residual series are obtained is set to p = 20. The “Full” period ranges from January 2000 to December 2017, while “Pre-Crisis” ranges from January 2000 to December 2007 and “Post-Crisis” ranges from January 2010 to December 2017.
Panel A
                        Full         Pre-Crisis   Post-Crisis
                        # stocks     # stocks     # stocks
GC_{r**→v*}             299          211          175
GC_{v*→r**}             65           65           52

Panel B (quantiles 0.05 and 0.95)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r**}→X̂_{v*}}     153       143  10   92        76   16   64        58   6
ET_{X̂_{v*}→X̂_{r**}}     80        31   49   72        9    63   71        7    64

Panel C (quantiles 0.10 and 0.90)
                        Full                Pre-Crisis          Post-Crisis
                        # stocks  +    −    # stocks  +    −    # stocks  +    −
ET_{X̂_{r**}→X̂_{v*}}     171       162  9    72        67   5    49        47   2
ET_{X̂_{v*}→X̂_{r**}}     46        18   28   41        9    32   38        7    31
Nonlinearity matters:
The stock price – trading volume relation revisited
Highlights:
• We investigate the stock price – trading volume relation found in theoretical models
• We test for nonlinearities in the system of stock returns and trading volume growth
• The dominant nonlinear information flow is from returns to trading volume growth
• Our empirical evidence highlights the nonlinear nature of this relation