Time Series R
June 6, 2019
Abstract
Modelling data over time requires a set of unique and bespoke processes in order to deal with problems
induced by temporal dependency and autocorrelation. Methods are well developed and widely applied
for working with such data when sample sizes are large, but the issue of how to fit models which are
methodologically robust but not under-powered or over-fitted on short time series data remains vexing.
This research proposes an interrupted time series analysis model solution to this problem, and uses a
Type II Sum Squares ANCOVA Lagged Dependent Variable, variance-centric approach as part of a newly
introduced R package - its.analysis. Using this model switches the null hypothesis to a much more reliable test and allows for a much more flexible approach to adding covariates in small sample
conditions. The model performs very well under test conditions, appears more conservative than existing
alternative techniques, and as such is recommended to researchers for future analysis of temporal data.
Acknowledgements
I would like to thank the authors of the following R packages which the its.analysis package imports:
plyr, car, stats, graphics, grDevices, boot, and forecast. I would also like to thank each of Greg Love
and Lorien Jasney for their feedback on the paper and the model, to Chris Hanretty for testing and giving feedback on an early version of the package, and to all three colleagues and to Will Jennings for their support.
1 The Interrupted Time Series Framework
The its.analysis R package1 contains two functions for running interrupted time series analysis (ITSA),
using a Type II Sum Squares, ANCOVA, Lagged-Dependent Variable design. The ‘quasi-experimental’
ITSA approach is used frequently in the biomedical and biological sciences, but not so much in the social
sciences. ITSA considers that a naturally-progressing dependent time series can potentially be ‘interrupted’ by an exogenous, independent factor which causes significant change in its variation over time. In this sense, we can view a dependent time series as being ‘treated’ by different, exogenous conditions (of an independent variable).
Traditionally, ITSA has been used to analyse the impact of critical events or major sudden changes
(to legislation, for example) to time series. However, we can also consider moments of threshold reaching,
sudden trend-alteration, or rapid acceleration in independent variable time series to be ‘interruptions’ which
then have lasting and quantifiable impacts on dependent series. Time in this sense can be split into periods
according to different moments of ‘pre’, ‘during’, and ‘post interruption’. Equally, time could be split into two
factors (‘pre- and post- interruption’), which would be most common perhaps in instances of (non-reversed)
policy changes. Whichever approach is used, the design offers a different way of inspecting the association between two temporally related variables than classical statistical models do, and - crucially - an approach which is much more flexible and yet robust when working with small samples.
ITSA has previously been comprehensively discussed as a potential solution for estimating
temporal relationships in short time series data while still abiding by the requirements to accurately model
time-serial processes (Linden, 2015; Linden and Adams, 2011; Crosbie, 1993; Gottman, 1981; Bloom, 2003). Both Gottman (1981) and Crosbie (1993) presented slope-estimation based alternatives to the more popular ARIMA approach (ITSE and ITSACORR respectively), but these methods have subsequently unfortunately been shown to be statistically flawed (Huitema et al., 2007).
More recently, Linden (2015) provided a new command for the Stata statistical programme to set up and test time series data in the ITSA framework, giving users a simple and easy tool to
carry out interrupted time series analysis. Linden’s model is a ‘regression discontinuity’ approach to ITSA
which analyses the change in slope between interrupted and non-interrupted periods. The model fits a trend
variable (from time 0 to n, where n is the total number of temporal observations minus one), a dummy
variable indicating the presence of an interrupted period, and an interaction term between the time and
interruption variables to test for significant slope deviations. However, it is still the case that the regression approach relies upon the accurate estimation and handling of autocorrelation and trend, which (as above) is simply very difficult to achieve in small sample designs.
1 The package is available for download via the CRAN repository.
2 Modelling short time series

Modelling data over time comes with its own set of unique challenges, as normal assumptions regarding the
independence of observations and residual fits are violated by the very nature of the temporally dependent
data itself (Hyndman and Athanasopoulos, 2018; Gottman, 1981; McCleary et al., 1980; Brockwell and
Davis, 2002; Pickup, 2015). In Thinking Time Serially, Mark Pickup (2015) defines time series data as data which has a separate observation for each time point, with each observation relating to the same unit of analysis.
For example, the annual GDP (gross domestic product) of a country, the percentage of a hospital’s total bed
capacity occupied by patients from day to day, or aggregate government spending preferences. Time series
analysis involves fitting models which can analyse the trend and movements in time series data (possibly
also including the impact or covariance of exogenous variables regarding said data’s movements) in an
accurate and robust fashion, controlling for the often ‘non-stationary’, dependent process of time series data
generation.
Generally speaking, time series models attempt to estimate and ‘account for’ autocorrelation in the data
by distributing various autoregressive components (including most often lagged dependent variable terms)
into the ‘right-hand side’ of the regression equation (i.e., they attempt to control temporal dependency
through estimating it as a covariate). Whichever specific approach is used to forecast and examine associ-
ations between time series, most models are likely to work on linear assumptions, or generalisations of linear processes, and should therefore be conditioned by rules common to the majority2 of such models regarding
a) minimum sample sizes, and b) the ratio between sample size and exogenous parameters (Brockwell and
Davis, 2002; Hurvich and Tsai, 1989). However, these considerations appear to be largely absent in current
time-series research, where the time-series specific assumptions and processes (such as, of course, modelling and compensating for residual autocorrelation) seem to wash over the original assumptions and requirements of the underlying models.
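As a minimal illustration of this ‘right-hand side’ approach (a generic sketch, rather than any particular package's implementation), a first-order lagged dependent variable can be entered directly into a linear model in R:

    # Simulate an autocorrelated dependent series and an exogenous covariate
    set.seed(1)
    y <- as.numeric(arima.sim(n = 40, model = list(ar = 0.5)))
    x <- rnorm(40)

    # Enter the first lag of y as a right-hand side covariate, so that the
    # temporal dependency is estimated alongside the exogenous parameter
    d <- data.frame(y = y[-1], y_lag = y[-40], x = x[-1])
    summary(lm(y ~ y_lag + x, data = d))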
Further to this, we ought to be very cautious about the ability of models to properly estimate (and
thus account for) autocorrelation in small samples. In his experiments, Hyndman (2014) found that the
‘best fitting’ ARIMA models (according to AIC scores) usually applied only one or zero autoregressive
parameters on data series with fewer than 20 observations. This concurs with Solanas et al. (2010), who
reported small probabilities of models effectively detecting autocorrelation in series with fewer than 20 data points. Despite these issues (and those raised above), it is quite common to see time series models (including those carrying several exogenous parameters) fitted to samples of only a few dozen observations. While accurately modelling time series while avoiding over-fitting is not an
easy challenge to solve, particularly given the often short and noisy nature of real-world data (Weigend et al.,
1995), it cannot be ignored that estimating slopes and deriving significance from slope-associated T-tests in
time series conditions with short data is hugely problematic (Crosbie, 1993; Hurvich and Tsai, 1989), and
can lead to a “severe underestimate of the actual mean squared error” of time-serial estimates (Brockwell and Davis, 2002).
2 The exception being, for instance, LASSO models, which can be estimated with fewer observations than parameters (Reid et al., 2016).
3 A more reliable model

ITSA using ANCOVA frameworks can get us around the vexing and common issue of time series data which is too short to effectively model in ARIMA or OLS frameworks, as it can be modelled using far less taxing (in terms of statistical power and error freedom) methods. By making use of a model similar to a repeated-measures ANCOVA (but with additional time-series specific components), this package allows researchers to investigate whether or not substantial changes in the trajectories, levels, or thresholds of an independent time series have had a significant impact on a dependent series without running into such problems.
Theoretically, the main difference between using coefficient and standard error approaches (mostly in
OLS) and an AN(C)OVA in time series is that we are not attempting to predict an over-time outcome
(level) in the dependent variable associated with change in another variable (i.e. we are not concerned with
estimating the impact on Y with a one-unit change in X over time), and as such we do not have to worry
about the accuracy or reliability of parameter prediction. Instead, we are focused on a significant difference in
the variation between time-grouped-means of a dependent variable (while accounting for variation explained
by any covariates included). We therefore move the null hypothesis from ‘there is no linear relationship between X and Y over time’ to ‘there is no significant difference between adjusted mean levels of the dependent variable over periods of time’. This allows for much greater research design flexibility (discussed below), whilst still retaining a rigorous and easily falsifiable alternative hypothesis.
Previous research has raised issue with the inability of ANOVAs to deal with residual autocorrelation, with Crosbie (1993, p. 967) highlighting that, when used in time series analysis, the key ANOVA assumption of independent observations is violated. Through simulation testing, Crosbie demonstrates how Type-I errors are inflated by the ANOVA design, with rejection rates rising well above nominal levels for highly autocorrelated series, and reaching 10% for somewhat autocorrelated series. This, he argued, was an unacceptable level of Type I error control (or lack thereof), and as such ANOVA ITSA models ought to be discarded. The ANCOVA model provided in the its.analysis package does not fall foul of the criticisms and issues highlighted by Crosbie, and in my own Monte-Carlo simulations, the null hypothesis rejection rate on autocorrelated series is far lower (see below).
This is because the model presented in the package takes a new approach to the violation of independence
issue by including lagged values of the dependent variable as an automatically included covariate in the
adjusted means calculation (with the possibility for the user to add further autoregressive covariate terms
as is necessary). The model specification therefore reflects traditional lagged-dependent variable models,
which are a common first step and usually highly effective way of addressing autoregressive processes and
avoiding serial error correlation (Pickup, 2015). Such models have however been shown to be potentially
problematic (Keele and Kelly, 2006) and certainly limited (Pickup, 2015) in wider time-series analysis, but
the ANCOVA (variance focussed) design means that the lagged dependent variable performs a more specific
and generalisable function in the its.analysis model than in coefficient-estimating time series models.
Within this design, the lagged dependent variable term acts as effectively a ‘catch-all’ for the latent trend within the dependent variable. As such, including the lagged
dependent variable in this design not only provides a highly effective control for autocorrelation, but is also
highly effective at preventing spurious mean differences across an interrupted period being detected. Of
course, though one lagged dependent variable term is automatically fitted, further covariates may also be
manually specified to control for other (auto)correlative factors, should the user wish to do so (this should be
conducted in line with suggestions related to power and effect size on given sample sizes outlined by authors
above). Though this may be appropriate in some circumstances, it is worth keeping in mind the problems
outlined above regarding effectively and accurately estimating and modelling time-serial processes in short
data.
The its.analysis ITSA model deploys Type II Sum Squares (T2SS) in order to account for covariance.
T2SS models, sometimes dubbed ‘random effects’ ANOVAs (Ståhle and Wold, 1989), account for the variance of
all other parameters included in the model before estimating a particular parameter’s variance itself. This
means we can fit covariate controls which will effectively isolate exogenous effects from temporal periods.
Type II models are also most recommended for imbalanced sample designs (Langsrud, 2003), which will be
naturally more common in time series modelling where periods of normality will most likely be far longer
than periods of abnormality (or interruption)3. Taken together, we can describe the model as a Type II Sum Squares ANCOVA Lagged Dependent Variable (T2SS ANCOVA LDV) model.
Considering the automatic inclusion of the lagged term and the ability to include more covariate components, using the T2SS design we can express the alternative hypothesis of the ITSA model introduced by its.analysis as:

$$H_1:\ \bar{Y}_{1(adj)} \neq \bar{Y}_{2(adj)} \neq \dots \neq \bar{Y}_{n(adj)}$$

Where Ȳ is the mean of the dependent variable from all i observations in temporal group j, n is the total number of time-groups being compared in the analysis, and adj is expressed as:

$$adj = \sigma(Y_t,\ Y_{t-1},\ c_1, \dots, c_n)$$

Where σ is the covariance between the dependent variable Y at time t = 0, the lagged values of the dependent variable (t − 1), and any n number of additional covariates (c1 ...) specified.
3 See however Shaw and Mitchell-Olds (1993) for discussion on how means-level focused AN(C)OVA models may not be so well suited to heavily unbalanced designs.
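To make the design concrete, the following is a minimal sketch of such a model fitted by hand, using the car package (which its.analysis imports); this is illustrative only, and not the package's exact internals:

    library(car)

    # A short autocorrelated series with a level shift half way through
    set.seed(42)
    y <- as.numeric(arima.sim(n = 30, model = list(ar = 0.4))) +
      rep(c(0, 2), each = 15)
    period <- factor(rep(c("pre", "post"), times = c(15, 15)))

    # The lagged dependent variable enters as the automatic covariate
    y_lag <- c(NA, y[-30])

    # Type II Sum Squares: the period effect is assessed only after the
    # variance captured by the lagged term has been accounted for
    fit <- lm(y ~ y_lag + period)
    Anova(fit, type = 2)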
The T2SS ANCOVA LDV model contained in the itsa.model() function is designed to provide users with an
easy, all-inclusive model for estimating the impact of shocks, changes, and crises on dependent time series. It
delivers a wide range of outputs by default, including point estimates of means, analysis of variance adjusted
for temporal dependency, and bootstrapped F-values. It takes as its input a data frame and specified temporal,
dependent, independent, and covariate vectors (within the data frame). It offers users the ability to change
the alpha value against which the results of the test are measured, turn off the automatically generated
plot, turn off the automatic bootstrapping, or change the number of replicates that the bootstrap model
produces (1,000 by default). The function returns to the console a table of group means, a table of analysis
of variance between the time periods, an R-squared statistic, a summary result of the model and assumption
tests, and sends a graph to the plot window. Assigning the function will create a summary object in the
global environment which contains all of the above as objects within a list, as well as a Tukey’s ‘Honest Significant Difference’ test result, the bootstrapped confidence intervals for all model parameters, the full
length of F-values produced by the bootstrap model, summaries of each individual assumption test, the data
used in the main model, and the residual and fitted values.
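By way of illustration, a basic call might look as follows (the data frame and column names here are hypothetical, and the argument names follow those described in this section; the package documentation remains the authoritative reference):

    library(its.analysis)

    # Toy data: 30 annual observations with a two-period interruption factor
    set.seed(99)
    df <- data.frame(
      year    = 1990:2019,
      outcome = rnorm(30),
      period  = factor(rep(c("pre", "post"), times = c(15, 15)))
    )

    # Assigning the call stores the full list of results described above
    its_model <- itsa.model(data = df, time = "year", depvar = "outcome",
                            interrupt_var = "period", alpha = 0.05,
                            Reps = 1000)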
For the function to work correctly, the dependent variable must be a continuous vector, and the indepen-
dent variable must be a factor variable which identifies substantively different periods of time relative to the
original independent variable series. For example, a sudden step-change in the independent variable series,
or the passing of a threshold, a quick and sustained change in a previous trend, or another quantifiable and
substantial movement in the development of the independent variable time series. Covariates may be fit using
the covariates argument in the model function, which both increases the power of the test (by accounting
for more variance) and also adjusts the variance explained by the factorial independent variable (controlling
for the competing variance of the covariate). Users should be sensible in the number of covariates fitted, and
keep in mind normal assumptions regarding multicollinearity and interaction between covariates.
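For instance, the interruption factor might be derived from posited break points in the independent series along these lines (a sketch continuing the hypothetical data frame above; the break years are illustrative):

    # Split time into periods around two posited interruption points; the
    # result can then be passed to itsa.model() as interrupt_var, with any
    # further controls passed by name via the covariates argument
    df$period <- cut(df$year,
                     breaks = c(-Inf, 1999, 2009, Inf),
                     labels = c("pre", "during", "post"))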
Various post-estimation procedures can be run using the itsa.postest() function. These include: a plot of the bootstrapped F-values, a Shapiro-Wilk test for residual normality (overlaid on a QQ-Norm plot), a Levene’s Test of heterogeneous variances (overlaid on a boxplot), a residuals versus fitted values plot, and an autocorrelation function plot. These are designed to test typical AN(C)OVA and time series model assumptions. The model name must be defined as the object assigned to the global environment for this function to run.
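Assuming a model object has been assigned as in the sketch above, the post-estimation call takes roughly this form:

    # Run the assumption tests and plots on the assigned model object
    itsa.postest(model = its_model)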
By default, the its.analysis model bootstraps 1,000 F-values for each variable included in the estimation
using the bootstrap model from the boot package. Users may define an alternative number of replicates
using the Reps argument in the model function. Once the re-samples are drawn and F-values calculated, the model reports a trimmed mean F-value (top and bottom 10% removed). The bootstrapping allows users to check for the precision and
reliability of the estimate reported in the table of results, which is an especially useful tool for accurately
estimating variation within small samples. A summary statement containing the lower and upper 95%
confidence intervals, the trimmed mean F-value, and the p-value from the bootstrapped replications of the
model is reported below the main results. Users should ensure that the lower bound of the interval does not
move too close to zero, that the mean value of the bootstrap does not deviate too far from the F-statistic
reported in the model, and that the p-value is consistent with that reported in the main model result.
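The general logic of this procedure can be sketched with simple case-resampling (the package's own resampling scheme may differ in detail; this is an illustration only):

    library(boot)
    library(car)

    # Toy series with a level shift, arranged as a model data frame
    set.seed(42)
    y <- as.numeric(arima.sim(n = 30, model = list(ar = 0.4))) +
      rep(c(0, 2), each = 15)
    model_df <- data.frame(y = y[-1], y_lag = y[-30],
                           period = factor(rep(c("pre", "post"),
                                               times = c(14, 15))))

    # Statistic: the Type II F-value for the period term on resampled rows
    f_stat <- function(d, idx) {
      m <- lm(y ~ y_lag + period, data = d[idx, ])
      Anova(m, type = 2)["period", "F value"]
    }

    boot_out <- boot(model_df, f_stat, R = 1000)
    quantile(boot_out$t, c(0.025, 0.975))  # bootstrapped 95% interval for F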
4 An example
The model performed very well with simulation data4 , and a ‘real-world’ examination of the model was
carried out using the case of immigration in Britain. Figure 1 below shows annual estimated rates of
immigration into Britain from 1985 through until 2015. The figures are provided by the Office for National
Statistics5 . Interruption points are also plotted via the dotted red lines in 1997 (where a stark rise departs
from the trend in the previous years) and in 2006 (when the previously rising rates of immigration stabilise
somewhat).
[Figure 1 - Estimated annual rates of non-UK citizenship immigration into Britain, 1985-2015. Source: ONS - ‘Long-term international migration 2.00 Citizenship, UK’.]
4 Please see the package GitHub page for simulation code: http://www.github.com/patrick-eng/its.analysis/GitUploads.
5 The specific ONS database is called: Long-term international migration 2.00 Citizenship, UK, and is available on-line.
For the analysis, each interruption point was shifted forward by one year (to 1998 and 2007 respectively) in order to allow for public attitudes to ‘catch up’ with the change
occurring in the previous year. This ought to be standard practice when considering public attitudes as a
response variable in the ITSA framework. This partitioned series was then fitted to the its.analysis model as the independent time-period variable (the interrupt_var argument in the model function),
with British public opinion according to figures reported in English (2018b,a), over the same period fitted as
the dependent time series. The public opinion series is an aggregate measure of responses to surveys asking
questions about immigration. It ranges from 0 to 1, whereby 1 is the most anti-immigrant position for the
public possible (100% opposition). Values can take any possible position on the scale. The model reported
the following result: Significant variation between time periods (p < 0.05). Mean public opinion levels prior
to the 1997 rise in immigration were at 0.50, rose to 0.58 between 1997 and 2006, and then shifted again to
0.62 after this time. Table 2 below shows the full model output.
[Table 2 - Full ITSA model output (only partially recoverable here): Residuals Sum Sq 6.25, df 26.]
Result: Significant difference from interruption. 29 observations. Model R-sq: 77%. Note that the public opinion variable has been mean-centred for easier Sum Sq interpretation. The same test result is found from the uncentred data.
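For reference, the example could be specified along the following lines (the data object is a hypothetical placeholder for the ONS and public opinion series described above):

    library(its.analysis)

    # Hypothetical stand-in for the opinion series (placeholder values only)
    mig <- data.frame(year = 1985:2015,
                      opinion = runif(31, 0.4, 0.7))

    # Interruption points shifted forward to 1998 and 2007, as in the text
    mig$period <- cut(mig$year, breaks = c(-Inf, 1997, 2006, Inf),
                      labels = c("pre", "rise", "post"))

    immigration_model <- itsa.model(data = mig, time = "year",
                                    depvar = "opinion",
                                    interrupt_var = "period")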
Figure 2 below shows the plot automatically generated by the model function. It is very similar to that
provided in the package by Linden (2015), but no slopes are estimated and a trend line is drawn instead
of plotted points. We can see three distinct trends in British public opinion between the time periods.
Firstly, a period of high volatility but (comparatively) low mean anti-immigrant sentiment, which is then
followed by an almost linear rise in the middle (interruption) period, before the series levels off somewhat.
[Figure 2 - ITSA model plot: British immigration opinions and immigration rates.]
From this example, there is sufficient evidence to reject the null hypothesis that different immigration time-periods are not associated with changing public opinion about immigration. From inspecting the change
in means, we can conclude from the model output that the rise in immigration from 1997 through until 2004
did cause a significant shift in anti-immigrant public opinion, which then largely levelled out over the course
of the rest of the time series as immigration remained steady at between 500,000 and 600,000 per annum.
From the bootstrapped confidence intervals, though the median F-value is sufficiently close to that of the
model estimate, the spread of the 95% range is quite large. However, the lower interval of 1.59 remains
sufficiently high.
Of course, as previously noted, it is also possible to add further covariates to the its.analysis ANCOVA
model, and this illustrative result is by no means a substantive conclusion about the relationship between
immigration and immigration opinions in Great Britain. For example, the above positive result regarding the impact of different periods of immigration on public opinion could perhaps be completely reversed by the addition of a relevant covariate.
5 Conclusion
Time series methodology continues to be a subject of great debate and discussion across many fields. From
economics to psychology and political science to biology, the question of how to effectively model data over
time to produce reliable, consistent, and unbiased results is of utmost importance. Nowhere more so do
these questions remain a thorn in the side of researchers than in cases of short time series data, where we are
straitjacketed by the demands of estimating and modelling autocorrelation on the one hand, and a distinct lack of statistical power on the other.
Instead of attempting to ascertain whether or not an exogenous interruption to a trending series results
in unequal slopes, the T2SS ANCOVA LDV ITSA model proposed here investigates whether interruptions
produce unequal means and variance in the dependent time series. By switching from estimating (partial)
parameter effects to comparisons of temporally-adjusted group means, we not only reduce the demands on
minimum sample sizes for model estimation itself but also relax the impact of adding covariates - including
lagged dependent variable terms or other time-serial components - into the calculation. Testing the model on
real world data and under Monte-Carlo simulation6 conditions (on almost 40,000 datasets) demonstrated its
strengths in accounting for autocorrelation and avoiding spurious results, while still demonstrating the ability
to also contain Type II (false negative) errors where appropriate. Even in the most serially autocorrelated
circumstances, the probability of Type I errors was well below 0.1. Users should be confident in the model’s
conservatism, but could add an extra level of certainty here by reducing alpha to 0.01 (which is a specifiable
argument in the model command, of course). The lagged dependent variable fit, the ability to manually reduce alpha, the facility to inspect and test all relevant assumptions, and the bootstrapped F-values build an accuracy and robustness into the model which is otherwise not present in other techniques currently available for estimating short time series.
Finally, there is the question of the suggested minimum and maximum sample sizes with which the its.analysis package ought to be used. On the first, this is certainly at least in part conditioned by a) the number
of temporal periods under analysis (the number of factors in the grouping variable), and b) a suggested
minimum of 7 observations per group in line with Vanvoorhis and Morgan (2007). In this sense then the
minimum acceptable sample size ought to be 15 with two time periods. This should inflate to a minimum
of 21 for three time periods (assuming an equal distribution of time between them). Further Monte-Carlo simulations were run to establish the null hypothesis rejection probability under different sample size and
covariate combinations. In summary, the notional Type I error probability for sample sizes of 15 with no
covariates registered at around 0.13, with the post-estimation controlled Type I error rate probability at 0.07.
6 Please see the package GitHub page for simulation code: http://www.github.com/patrick-eng/its.analysis/GitUploads.
Users can further reduce the probability of false positives at this level of sample size by reducing alpha in the model command. For example, pulling alpha
down to 0.01 produces a Type I error rate of 3.5% without post-estimation and 1.5% with post-estimation
rejection included.
Of course, AN(C)OVA based approaches such as this ought not to be considered superior to classical
time series analysis techniques when sufficient sample sizes allow, but their here-demonstrated strong ability
to effectively a) handle and eliminate residual autocorrelation, b) account for the influence of dependent
variable trend on change over time, c) provide sufficient levels of conservatism and scepticism on model
(F-value) results, and yet d) provide the correct substantive conclusions in the testing carried out for this
research, highlights how useful they can be for small n studies. I recommend that researchers take seriously
the possibility of switching to this variance-based model over the estimation of coefficients and standard
errors when fitting short time series models, and that future researchers apply further rigorous testing of
model performance in a wider set of real-world scenarios than there was scope to do here.
References
Bloom, H. S. (2003). Using ‘Short’ Interrupted Time-Series Analysis to Measure the Impacts of Whole-School Reforms: With Applications to a Study of Accelerated Schools. Evaluation Review 27(1), 3–49.
Brockwell, P. J. and R. A. Davis (2002). Introduction to Time Series and Forecasting (2nd ed.). London:
Springer.
Crosbie, J. (1993). Interrupted Time-Series Analysis With Brief Single-Subject Data. Journal of Consulting and Clinical Psychology 61(6), 966–974.
English, P. (2018a). Thermostatic public opinion: why UK anti-immigrant sentiments rise and then fall.
English, P. (2018b). Visibly Restricted: Public Opinion and the Representation of Immigrant Origin Communities.
Gottman, J. M. (1981). Time-Series Analysis: A Comprehensive Introduction for Social Scientists. Cambridge: Cambridge University Press.
Huitema, B. E., J. W. McKean, and S. Laraway (2007). Time-Series Intervention Analysis Using ITSACORR: Fatal Flaws. Journal of Modern Applied Statistical Methods 6(2), 367–379.
Hurvich, C. M. and C.-L. Tsai (1989). Regression and time series model selection in small samples. Biometrika 76(2), 297–307.
Hyndman, R. J. (2014). Fitting models to short time series. Hyndsight blog. https://robjhyndman.com/hyndsight/short-time-series/.
Hyndman, R. J. and G. Athanasopoulos (2018). Forecasting: Principles and Practice (2nd ed.). Melbourne: OTexts.
Keele, L. and N. J. Kelly (2006). Dynamic Models for Dynamic Theories: The Ins and Outs of Lagged Dependent Variables. Political Analysis 14(2), 186–205.
Langsrud, Ø. (2003). ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing 13, 163–167.
Linden, A. (2015). Conducting Interrupted Time-series Analysis for Single- and Multiple-group Comparisons. The Stata Journal 15(2), 480–500.
Linden, A. and J. L. Adams (2011). Applying a propensity score-based weighting model to interrupted
time series data: improving causal inference in programme evaluation. Journal of Evaluation in Clinical Practice 17(6), 1231–1238.
McCleary, R., R. A. Hay, E. E. Meidinger, and D. McDowall (1980). Applied Time Series Analysis for the Social Sciences. Beverly Hills: Sage.
Pickup, M. (2015). Introduction to Time Series Analysis. Thousand Oaks: Sage Publications.
Reid, S., R. Tibshirani, and J. Friedman (2016). A Study of Error Variance Estimation in Lasso Regression. Statistica Sinica 26(1), 35–67.
Shaw, R. G. and T. Mitchell-Olds (1993). Anova for Unbalanced Data: An Overview. Ecology 74 (6),
1638–1645.
Solanas, A., R. Manolov, and V. Sierra (2010). Lag-one autocorrelation in short series: Estimation and hypotheses testing. Psicológica 31(2), 357–381.
Ståhle, L. and S. Wold (1989). Analysis of Variance. Chemometrics and Intelligent Laboratory Systems 6,
259–272.
Vanvoorhis, C. R. W. and B. L. Morgan (2007). Understanding Power and Rules of Thumb for Determining Sample Sizes. Tutorials in Quantitative Methods for Psychology 3(2), 43–50.
Weigend, A. S., M. Mangeas, and A. N. Srivastava (1995). Nonlinear gated experts for time series: discovering regimes and avoiding overfitting. International Journal of Neural Systems 6(4), 373–399.