
The its.analysis R package - Modelling short time series data

Patrick English, University of Exeter

June 6, 2019

Abstract

Modelling data over time requires a set of unique and bespoke processes in order to deal with problems induced by temporal dependency and autocorrelation. Methods are well developed and widely applied for working with such data when sample sizes are large, but the question of how to fit models which are methodologically robust yet neither under-powered nor over-fitted on short time series data remains vexing. This research proposes an interrupted time series analysis model as a solution to this problem, using a Type II Sum Squares ANCOVA Lagged Dependent Variable, variance-centric approach as part of a newly introduced R package, its.analysis. Using this model switches the null hypothesis to a much more reliable test and allows for a much more flexible approach to adding covariates in small-sample conditions. The model performs very well under test conditions and appears more conservative than existing alternative techniques, and as such it is recommended to researchers for future analysis of temporal data where observations are limited (between 15 and 45 observations).

Acknowledgements

I would like to thank the authors of the following R packages which the its.analysis package imports: plyr, car, stats, graphics, grDevices, boot, and forecast. I would also like to thank Greg Love and Lorien Jasney for their feedback on the paper and the model, Chris Hanretty for testing and giving feedback on an early version of the package, and all three colleagues together with Will Jennings for their guidance in constructing model features.

Electronic copy available at: https://ssrn.com/abstract=3398189


1 Reviewing Time Series Methods and the Interrupted Time Series Framework

The its.analysis R package[1] contains two functions for running interrupted time series analysis (ITSA), using a Type II Sum Squares, ANCOVA, Lagged-Dependent-Variable design. The 'quasi-experimental' ITSA approach is used frequently in the biomedical and biological sciences, but much less so in the social sciences. ITSA considers that a naturally progressing dependent time series can potentially be 'interrupted' by an exogenous, independent factor which causes significant change in its variation over time. In this sense, we can view a dependent time series as being 'treated' by different, exogenous conditions (of an independent variable), and hence specify the 'quasi-experimental' model approach.

Traditionally, ITSA has been used to analyse the impact of critical events or major sudden changes (to legislation, for example) on time series. However, we can also consider moments of threshold-reaching, sudden trend alteration, or rapid acceleration in an independent variable time series to be 'interruptions' which then have lasting and quantifiable impacts on the dependent series. Time in this sense can be split into periods according to different moments of 'pre', 'during', and 'post' interruption. Equally, time could be split into two factors ('pre-' and 'post-interruption'), which would perhaps be most common in instances of (non-reversed) policy changes. Whichever approach is used, the design offers a different way of inspecting the association between two temporally related variables than classical statistical models, and - crucially - an approach which is much more flexible and yet robust when working with small samples.
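The period-splitting just described can be sketched in a few lines. This is an illustrative sketch in Python (the package itself is written in R), with all names invented:

```python
# Code a yearly index into 'pre', 'during', and 'post' interruption
# periods, as in the ITSA designs described above. Illustrative only.

def code_periods(years, start, end):
    """Label each year relative to an interruption window (inclusive)."""
    labels = []
    for y in years:
        if y < start:
            labels.append("pre")
        elif y <= end:
            labels.append("during")
        else:
            labels.append("post")
    return labels

years = list(range(1995, 2001))            # 1995..2000
print(code_periods(years, 1997, 1998))
# ['pre', 'pre', 'during', 'during', 'post', 'post']
```

The two-factor ('pre-'/'post-interruption') design is the same idea with a single cut-point rather than a window.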

ITSA has already been comprehensively discussed as a potential solution for estimating temporal relationships in short time series data while still abiding by the requirements to accurately model time-serial processes (Linden, 2015; Linden and Adams, 2011; Crosbie, 1993; Gottman, 1981; Bloom, 2003). Both Gottman (1981) and Crosbie (1993) presented slope-estimation-based alternatives to the more popular ARIMA approach (ITSE and ITSACORR respectively), but these methods have subsequently been shown to be fatally problematic (Huitema et al., 2007).

More recently, Linden (2015) provided a new command for the Stata statistical programme to set up and test time series data in the ITSA framework, which gives users a simple and easy tool for carrying out interrupted time series analysis. Linden's model is a 'regression discontinuity' approach to ITSA which analyses the change in slope between interrupted and non-interrupted periods. The model fits a trend variable (from time 0 to n, where n is the total number of temporal observations minus one), a dummy variable indicating the presence of an interrupted period, and an interaction term between the time and interruption variables to test for significant slope deviations. However, it is still the case that the regression
[1] The package is available for download via the CRAN repository.



discontinuity approach to ITSA relies upon effective and robust slope-change estimation, as well as proper handling of autocorrelation and trend, which (as above) is simply very difficult to achieve in small-sample designs.

Modelling data over time comes with its own set of unique challenges, as the normal assumptions regarding independence of observations and residual fits are violated by the very nature of temporally dependent data (Hyndman and Athanasopoulos, 2018; Gottman, 1981; McCleary et al., 1980; Brockwell and Davis, 2002; Pickup, 2015). In Thinking Time Serially, Mark Pickup (2015) defines time series data as that which has a separate observation for each time point, with each observation belonging to the same unit of analysis: for example, the annual GDP (gross domestic product) of a country, the percentage of a hospital's total bed capacity occupied by patients from day to day, or aggregate government spending preferences. Time series analysis involves fitting models which can analyse the trend and movements in time series data (possibly also including the impact or covariance of exogenous variables on said data's movements) in an accurate and robust fashion, controlling for the often 'non-stationary', dependent process of time series data generation.

Generally speaking, time series models attempt to estimate and 'account for' autocorrelation in the data by distributing various autoregressive components (most often including lagged dependent variable terms) into the 'right-hand side' of the regression equation (i.e., they attempt to control temporal dependency by estimating it as a covariate). Whichever specific approach is used to forecast and examine associations between time series, most models are likely to work on linear assumptions, or to generalise linear processes, and should therefore be conditioned by rules common to the majority[2] of such models regarding a) minimum sample sizes, and b) the ratio between sample size and exogenous parameters (Brockwell and Davis, 2002; Hurvich and Tsai, 1989). However, these considerations appear to be largely absent in current time-series research, where the time-series-specific assumptions and processes (such as, of course, modelling and compensating for residual autocorrelation) seem to wash over the original assumptions and requirements of the underlying models being used.

Further to this, we ought to be very cautious about the ability of models to properly estimate (and thus account for) autocorrelation in small samples. In his experiments, Hyndman (2014) found that the 'best fitting' ARIMA models (according to AIC scores) usually applied only one or zero autoregressive parameters on data series with fewer than 20 observations. This concurs with Solanas et al. (2010), who reported small probabilities of models effectively detecting autocorrelation in series with fewer than 20 data points. Despite these issues (and those raised above), it is quite common to see time series models (including ITSA models) with multiple parameters (including distributed autoregressive terms) fitted to sample sizes of only a few dozen observations. While accurately modelling time series while avoiding over-fitting is not an easy challenge to solve, particularly given the often short and noisy nature of real-world data (Weigend et al., 1995), it cannot be ignored that estimating slopes and deriving significance from slope-associated t-tests in short time series conditions is hugely problematic (Crosbie, 1993; Hurvich and Tsai, 1989), and can lead to a "severe underestimate of the actual mean squared error" of time-serial estimates (Brockwell and Davis, 2002, p. 169).

[2] The exception being, for instance, LASSO models, which can be estimated with fewer observations than parameters (Reid et al., 2016).
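To see concretely what has to be estimated from so few points, here is the sample lag-1 autocorrelation written out in pure Python, an illustrative sketch with toy numbers (not taken from the paper or the package):

```python
# The sample lag-1 autocorrelation that short series make hard to
# estimate: a ratio of lagged cross-products to squared deviations.
# Pure-Python sketch with invented toy numbers.

def acf1(y):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

series = [2.0, 2.1, 2.3, 2.2, 2.6, 2.5, 2.9, 3.0]
print(round(acf1(series), 3))  # 0.519
```

With only eight observations the estimate rests on seven cross-product terms, which is exactly why the studies cited above find such estimates unreliable below roughly 20 data points.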


2 The Type II Sum Squares ANCOVA Lagged Dependent Variable model

ITSA using the ANOVA (Analysis of Variance) framework can get us around the vexing and common issue of time series data which is too short to effectively model in ARIMA or OLS frameworks, as ANOVAs can be reliably modelled using far less taxing (in terms of statistical power and error freedom) methods. By making use of a model similar to a repeated-measures ANCOVA (but with additional time-series-specific components), the its.analysis R package presented in this paper allows researchers to investigate whether or not substantial changes in the trajectories, levels, or thresholds of an independent time series have had a significant impact on a dependent series without running into such problems (see referenced paper for full discussion).

Theoretically, the main difference between using coefficient and standard error approaches (mostly in OLS) and an AN(C)OVA in time series is that we are not attempting to predict an over-time outcome (level) in the dependent variable associated with change in another variable (i.e. we are not concerned with estimating the impact on Y of a one-unit change in X over time), and as such we do not have to worry about the accuracy or reliability of parameter prediction. Instead, we are focused on a significant difference in the variation between time-grouped means of a dependent variable (while accounting for variation explained by any covariates included). We therefore move the null hypothesis from 'there is no linear relationship between X and Y over time' to 'there is no significant difference between adjusted mean levels of the dependent variable over periods of time'. This allows for much greater research design flexibility (discussed below) whilst still retaining a rigorous and easily falsifiable alternative hypothesis.

Previous research has raised issues with the inability of ANOVAs to deal with residual autocorrelation, with Crosbie (1993, p. 967) highlighting that, when used in time series analysis, the key ANOVA assumption of 'independent observations' is completely violated by the serial autocorrelation of the data. In simulation testing, Crosbie demonstrated how Type I errors are inflated by the ANOVA design, with the probability of null-hypothesis rejection on a series of 30 observations being 50% in extremely autocorrelated series, and 10% in somewhat autocorrelated series. This, he argued, was an unacceptable level of Type I error control (or lack thereof), and as such ANOVA ITSA models ought to be discarded. The ANCOVA model provided in the its.analysis package does not fall foul of the criticisms and issues highlighted by Crosbie: in my own Monte-Carlo simulations, the null hypothesis rejection rate on autocorrelated series is far lower (see below).

This is because the model presented in the package takes a new approach to the violation-of-independence issue by including lagged values of the dependent variable as an automatically included covariate in the adjusted-means calculation (with the possibility for the user to add further autoregressive covariate terms as necessary). The model specification therefore reflects traditional lagged-dependent-variable models, which are a common first step and usually a highly effective way of addressing autoregressive processes and avoiding serial error correlation (Pickup, 2015). Such models have, however, been shown to be potentially problematic (Keele and Kelly, 2006) and certainly limited (Pickup, 2015) in wider time-series analysis, but the ANCOVA (variance-focused) design means that the lagged dependent variable performs a more specific and generalisable function in the its.analysis model than in coefficient-estimating time series models.
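Constructing the lagged covariate amounts to aligning each observation with its predecessor and dropping the first time point. A minimal sketch follows (in Python for illustration; the package's internal R code is not shown in the text):

```python
# Align each observation of the dependent series with its first lag so
# the lag can enter the model as a covariate. Illustrative only.

def with_lag(y):
    """Return (current, lagged) lists aligned from t = 1 onward."""
    return y[1:], y[:-1]

y = [0.50, 0.52, 0.55, 0.58, 0.60]
y_t, y_tm1 = with_lag(y)
print(y_t)    # [0.52, 0.55, 0.58, 0.6]
print(y_tm1)  # [0.5, 0.52, 0.55, 0.58]
```

One observation is lost to the lag, which is worth remembering when weighing the minimum sample sizes discussed later.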



Namely, as the ANCOVA covariate-fitting process is means-adjusting, the lagged dependent variable is effectively a 'catch-all' for the latent trend within the dependent variable. As such, including the lagged dependent variable in this design not only provides a highly effective control for autocorrelation, but is also highly effective at preventing spurious mean differences across an interrupted period from being detected. Of course, though one lagged-dependent-variable term is automatically fitted, further covariates may also be manually specified to control for other (auto)correlative factors, should the user wish to do so (this should be conducted in line with the suggestions, outlined by the authors above, relating power and effect size to given sample sizes). Though this may be appropriate in some circumstances, it is worth keeping in mind the problems outlined above regarding effectively and accurately estimating and modelling time-serial processes in short data.

The its.analysis ITSA model deploys Type II Sum Squares (T2SS) in order to account for covariance. T2SS ANOVAs, sometimes dubbed 'random effects' ANOVAs (Ståhle and Wold, 1989), account for the variance of all other parameters included in the model before estimating a particular parameter's variance itself. This means we can fit covariate controls which will effectively isolate exogenous effects from temporal periods. Type II models are also the most recommended for imbalanced sample designs (Langsrud, 2003), which will naturally be more common in time series modelling, where periods of normality will most likely be far longer than periods of abnormality (or interruption)[3]. Taken together, we can describe the model as a Type II Sum Squares, ANCOVA, Lagged Dependent Variable interrupted time series model.

Considering the automatic inclusion of the lagged term and the ability to include more covariate components, using the T2SS design we can express the alternative hypothesis of the ITSA model introduced by this new R package as:

adj(Ȳij1) ≠ adj(Ȳij2) ≠ ... ≠ adj(Ȳijn)

where Ȳ is the mean of the dependent variable from all i observations in temporal group j, n is the total number of time groups being compared in the analysis, and adj is expressed as:

adj = σ(Yt, Yt−1, c1...cn)

where σ is the covariance between the dependent variable Y at time t, the lagged values of the dependent variable (t − 1), and any number of additional covariates (c1...cn) specified.

[3] See, however, Shaw and Mitchell-Olds (1993) for discussion of how means-level-focused AN(C)OVA models may not be so readily susceptible to problems induced by imbalanced samples.
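For a single term, the Type II sum of squares equals the increase in residual sum of squares when that term is dropped from a model containing all other terms. The following from-scratch sketch (Python, invented toy data; an assumed analogue of the design above, not the package's implementation) computes the T2SS F-test for a two-level period factor while the lagged dependent variable stays in the model:

```python
# Type II SS sketch: fit OLS with and without the period factor, take
# the RSS difference as the factor's sum of squares, and form an F-ratio.

def ols_rss(X, y):
    """Residual sum of squares from OLS via the normal equations."""
    n, k = len(X), len(X[0])
    # augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         + [sum(X[i][p] * y[i] for i in range(n))] for p in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[p][k] / A[p][p] for p in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# toy data: dependent series, its first lag, and a two-level period dummy
y   = [0.50, 0.51, 0.53, 0.52, 0.58, 0.60, 0.61, 0.63]
lag = [0.49, 0.50, 0.51, 0.53, 0.52, 0.58, 0.60, 0.61]
dum = [0, 0, 0, 0, 1, 1, 1, 1]           # pre vs post interruption

full    = [[1.0, d, l] for d, l in zip(dum, lag)]  # intercept + factor + lag
reduced = [[1.0, l] for l in lag]                  # factor dropped

rss_full = ols_rss(full, y)
rss_red  = ols_rss(reduced, y)
ss_factor = rss_red - rss_full           # Type II SS for the period factor
df_resid = len(y) - 3
F = (ss_factor / 1) / (rss_full / df_resid)
print(round(F, 2))
```

Because the reduced model is nested in the full one, the factor's sum of squares is guaranteed non-negative; the F-ratio then compares it against the full model's residual variance, which is the quantity the package's output tables report.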



3 The its.analysis package

The T2SS ANCOVA LDV model contained in the itsa.model() function is designed to provide users with an easy, all-inclusive model for estimating the impact of shocks, changes, and crises on dependent time series. It delivers a wide range of outputs by default, including point estimates of means, analysis of variance adjusted for temporal dependency, and bootstrapped F-values. It takes as its input a data frame and specified temporal, dependent, independent, and covariate vectors (within the data frame). It offers users the ability to change the alpha value against which the results of the test are measured, turn off the automatically generated plot, turn off the automatic bootstrapping, or change the number of replicates that the bootstrap model produces (1,000 by default). The function returns to the console a table of group means, a table of analysis of variance between the time periods, an R-squared statistic, and a summary result of the model and assumption tests, and sends a graph to the plot window. Assigning the function will create a summary object in the global environment which contains all of the above as objects within a list, as well as a Tukey's Honest Significant Difference test result, the bootstrapped confidence intervals for all model parameters, the full run of F-values produced by the bootstrap model, summaries of each individual assumption test, the data used in the main model, and the residual and fitted values.

For the function to work correctly, the dependent variable must be a continuous vector, and the independent variable must be a factor variable which identifies substantively different periods of time relative to the original independent variable series: for example, a sudden step-change in the independent variable series, the passing of a threshold, a quick and sustained change in a previous trend, or another quantifiable and substantial movement in the development of the independent variable time series. Covariates may be fitted using the covariates argument in the model function, which both increases the power of the test (by accounting for more variance) and adjusts the variance explained by the factorial independent variable (controlling for the competing variance of the covariate). Users should be sensible about the number of covariates fitted, and keep in mind the normal assumptions regarding multicollinearity and interaction between covariates.

Various post-estimation procedures can be run using the itsa.postest() function. These include: a plot of the bootstrapped F-values, a Shapiro-Wilk test of residual normality (overlaid on a QQ-norm plot), a Levene's test of heterogeneous variances (overlaid on a boxplot), a residuals-versus-fitted plot, and an autocorrelation function plot. These are designed to test typical AN(C)OVA and time series model assumptions. The model must be assigned to an object in the global environment for this function to run.
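Of these checks, Levene's test has a compact closed form. As a sketch of the statistic being reported (a Python illustration using the median-based Brown-Forsythe variant, with invented toy residuals; the package itself delegates to R's car machinery, whose code is not shown here):

```python
# Levene-type test of heterogeneous variances across groups, using
# absolute deviations from each group's median. Illustrative sketch.

def levene_W(groups):
    """Levene/Brown-Forsythe statistic over a list of groups."""
    def median(v):
        s = sorted(v)
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2
    # absolute deviations from each group's median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    N = sum(len(gi) for gi in z)
    k = len(z)
    zbar_i = [sum(gi) / len(gi) for gi in z]
    zbar = sum(sum(gi) for gi in z) / N
    between = sum(len(gi) * (zi - zbar) ** 2 for gi, zi in zip(z, zbar_i))
    within = sum(sum((x - zi) ** 2 for x in gi) for gi, zi in zip(z, zbar_i))
    return ((N - k) / (k - 1)) * between / within

low_var  = [0.10, 0.20, 0.15, 0.18, 0.12]   # tightly clustered residuals
high_var = [0.10, 0.90, 0.20, 0.80, 0.40]   # widely spread residuals
print(round(levene_W([low_var, high_var]), 3))
```

Large values of the statistic indicate that the residual spread differs across periods, which is the AN(C)OVA homogeneity assumption the post-estimation step is probing.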

By default, the its.analysis model bootstraps 1,000 F-values for each variable included in the estimation, using the bootstrap model from the boot package. Users may define an alternative number of replicates using the Reps argument in the model function. Once the re-samples are drawn and F-values calculated for each, the bootstrap model computes a p-value for each bootstrapped parameter using the trimmed mean F-value (top and bottom 10% removed). The bootstrapping allows users to check the precision and reliability of the estimate reported in the table of results, which is an especially useful tool for accurately estimating variation within small samples. A summary statement containing the lower and upper 95% confidence intervals, the trimmed mean F-value, and the p-value from the bootstrapped replications of the model is reported below the main results. Users should check that the lower bound of the interval does not move too close to zero, that the mean value of the bootstrap does not deviate too far from the F-statistic reported in the model, and that the p-value is consistent with that reported in the main model result.
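The trimmed-mean summary described above can be sketched as follows (Python, with an invented stand-in generator for the resampled F-values; the package's actual routine uses R's boot package and its code is not shown in the text):

```python
# Hedged sketch: trim the top and bottom 10% of resampled F-values and
# report the trimmed mean alongside a 95% percentile interval.
import random

def bootstrap_summary(f_values, trim=0.10):
    srt = sorted(f_values)
    n = len(srt)
    k = int(n * trim)
    trimmed = srt[k:n - k]                 # drop top and bottom 10%
    mean_f = sum(trimmed) / len(trimmed)   # trimmed mean F-value
    lo = srt[int(0.025 * n)]               # lower 95% percentile bound
    hi = srt[int(0.975 * n) - 1]           # upper 95% percentile bound
    return lo, mean_f, hi

random.seed(1)
# stand-in for 1,000 bootstrapped F-values (invented distribution)
fs = [random.gammavariate(2.0, 3.0) for _ in range(1000)]
lo, mean_f, hi = bootstrap_summary(fs)
print(round(lo, 2), round(mean_f, 2), round(hi, 2))
```

As the text advises, a reader of this summary checks that the lower bound stays well away from zero and that the trimmed mean sits near the model's reported F-statistic.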

4 An example

The model performed very well with simulation data[4], and a 'real-world' examination of the model was carried out using the case of immigration in Britain. Figure 1 below shows annual estimated rates of immigration into Britain from 1985 through until 2015. The figures are provided by the Office for National Statistics[5]. Interruption points are also plotted, via the dotted red lines, in 1997 (where a stark rise departs from the trend of the previous years) and in 2006 (when the previously rising rate of immigration stabilises somewhat).

Figure 1 - Immigration into Britain - an ITSA Framework

Source: ONS - 'Long-term international migration 2.00 Citizenship, UK' (non-UK citizenship immigration).

[4] Please see the package GitHub page for simulation code: http://www.github.com/patrick-eng/its.analysis/GitUploads.
[5] The specific ONS database is called 'Long-term international migration 2.00 Citizenship, UK', and is available on-line.



As the dependent variable is estimated public opinion, the interruption partitioning was moved back by one year (to 1998 and 2007 respectively) in order to allow public attitudes to 'catch up' with the change occurring in the previous year. This ought to be standard practice when considering public attitudes as a response variable in the ITSA framework. This partitioned series was then fitted to the its.analysis model as the independent time-period variable (the 'interrupt_var' argument in the model function), with British public opinion, according to figures reported in English (2018b,a) over the same period, fitted as the dependent time series. The public opinion series is an aggregate measure of responses to surveys asking questions about immigration. It ranges from 0 to 1, whereby 1 is the most anti-immigrant position possible (100% opposition); values can take any position on the scale. The model reported the following result: significant variation between time periods (p < 0.05). Mean public opinion levels prior to the 1997 rise in immigration were at 0.50, rose to 0.58 between 1997 and 2006, and then shifted again to 0.62 after this time. Table 2 below shows the full model output.

Table 2 - its.analysis model results on British immigration data

Parameter             Sum-Sq   df   F-value   P-value
Immigration Periods   2.90     2    6.035     0.007
Lag Public Opinion    2.26     1    9.405     0.005
Residuals             6.25     26

Bootstrap Results     Lower 95% CI: 1.61   Upper 95% CI: 18.44   Mean F-value: 7.10   P-value: 0.003

Result: Significant difference from interruption. 29 observations. Model R-sq: 77%. Note that the public opinion variable has been mean-centred for easier Sum-Sq interpretation. The same test result is found from the its.analysis model if the interruption is dichotomised around 1997.

Figure 2 below shows the plot automatically generated by the model function. It is very similar to that provided in the package by Linden (2015), except that no slopes are estimated and a trend line is drawn instead of plotted points. We can see three distinct trends in British public opinion between the time periods: firstly, a period of high volatility but (comparatively) low mean anti-immigrant sentiment, which is then followed by an almost linear rise in the middle (interruption) period, before the series levels off (and begins to fall somewhat) in the final period.

Figure 2 - ITSA model plot - British immigration opinions and immigration rates

From this example, there is sufficient evidence to reject the null hypothesis that the different immigration time periods are not associated with changing public opinion about immigration. From inspecting the change in means, we can conclude from the model output that the rise in immigration from 1997 through until 2004 did cause a significant shift in anti-immigrant public opinion, which then largely levelled out over the course of the rest of the time series as immigration remained steady at between 500,000 and 600,000 per annum. From the bootstrapped confidence intervals, though the trimmed mean F-value is sufficiently close to that of the model estimate, the spread of the 95% range is quite large. However, the lower interval of 1.61 remains sufficiently high.

Of course, as previously noted, it is also possible to add further covariates to the its.analysis ANCOVA model, and this illustrative result is by no means a substantive conclusion about the relationship between immigration and immigration opinions in Great Britain. For example, the above positive result regarding the impact of different periods of immigration on public opinion could perhaps be completely reversed by the addition of an unemployment or other economic-deprivation covariate.



5 Concluding remarks

Time series methodology continues to be a subject of great debate and discussion across many fields. From economics to psychology, and from political science to biology, the question of how to effectively model data over time to produce reliable, consistent, and unbiased results is of the utmost importance. Nowhere do these questions remain more of a thorn in the side of researchers than in cases of short time series data, where we are straitjacketed by the demands of estimating and modelling autocorrelation on the one hand, and a distinct lack of statistical power to do so on the other.

Instead of attempting to ascertain whether or not an exogenous interruption to a trending series results in unequal slopes, the T2SS ANCOVA LDV ITSA model proposed here investigates whether interruptions produce unequal means and variance in the dependent time series. By switching from estimating (partial) parameter effects to comparing temporally adjusted group means, we not only reduce the demands on minimum sample sizes for model estimation itself but also relax the impact of adding covariates - including lagged dependent variable terms or other time-serial components - to the calculation. Testing the model on real-world data and under Monte-Carlo simulation[6] conditions (on almost 40,000 datasets) demonstrated its strengths in accounting for autocorrelation and avoiding spurious results, while also demonstrating the ability to contain Type II (false negative) errors where appropriate. Even in the most serially autocorrelated circumstances, the probability of Type I errors was well below 0.1. Users should be confident in the model's conservatism, but could add an extra level of certainty here by reducing alpha to 0.01 (which is, of course, a specifiable option in the model command). The lagged dependent variable fit, the ability to manually reduce alpha, to inspect and test all relevant assumptions, and the bootstrapped F-values build an accuracy and robustness into the model which is otherwise not present in other techniques currently available for estimating short time series.

Finally, there is the question of the suggested minimum and maximum sample sizes with which the its.analysis package ought to be used. On the first, this is certainly at least in part conditioned by a) the number of temporal periods under analysis (the number of levels in the grouping variable), and b) a suggested minimum of 7 observations per group, in line with Vanvoorhis and Morgan (2007). In this sense, the minimum acceptable sample size ought to be 15 with two time periods. This should inflate to a minimum of 21 for three time periods (assuming an equal distribution of time between them). Further Monte-Carlo simulations were run to establish the null hypothesis rejection probability under different sample size and covariate combinations. In summary, the notional Type I error probability for sample sizes of 15 with no covariates registered at around 0.13, with the post-estimation-controlled Type I error rate at 0.07.

[6] Please see the package GitHub page for simulation code: http://www.github.com/patrick-eng/its.analysis/GitUploads.
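That per-group rule of thumb is easy to encode; the helper below is invented for illustration and is not part of the package:

```python
# Toy helper (hypothetical, not an its.analysis function): check the
# rule of thumb above - at least 7 observations per time period.

def enough_observations(group_sizes, per_group=7):
    return all(size >= per_group for size in group_sizes)

print(enough_observations([8, 7]))     # True  (n = 15, two periods)
print(enough_observations([7, 7, 6]))  # False (one period too short)
```
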



Adding covariates did not significantly alter the probabilities in either case. Users can increase confidence at this level of sample size by reducing alpha in the model command. For example, pulling alpha down to 0.01 produces a Type I error rate of 3.5% without post-estimation and 1.5% with post-estimation rejection included.
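The logic of these simulations can be approximated in base R. The sketch below is an illustration of the design, not the package's own simulation code (which is available on the GitHub page): it draws AR(1) series with no true interruption effect, fits a linear model with a lagged dependent variable and a two-level period factor, and records how often the period effect registers as significant at alpha = 0.05. Because the period factor enters last, its sequential sum of squares coincides with its Type II sum of squares here.

```r
set.seed(42)

# Type I error sketch under the null: n = 15, AR(1) errors, no real interruption.
sim_pvalue <- function(n = 15, phi = 0.5) {
  y <- as.numeric(arima.sim(list(ar = phi), n = n))
  period <- factor(rep(c(0, 1), times = c(7, 8)))  # null "interruption"
  lag_y <- c(NA, y[-n])                            # lagged dependent variable
  fit <- lm(y ~ lag_y + period)
  # p-value for the period factor (last term, so sequential SS = Type II SS)
  anova(fit)["period", "Pr(>F)"]
}

reps <- 2000
pvals <- replicate(reps, sim_pvalue())
type1_rate <- mean(pvals < 0.05)  # empirical Type I error rate under the null
```

A rate noticeably above the nominal 0.05 at this sample size is what motivates both the post-estimation rejection stage and the option of pulling alpha down to 0.01.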

Of course, AN(C)OVA-based approaches such as this ought not to be considered superior to classical time series analysis techniques when sufficient sample sizes allow. But their ability, demonstrated here, to a) handle and eliminate residual autocorrelation, b) account for the influence of dependent variable trend on change over time, c) provide sufficient levels of conservatism and scepticism regarding model (F-value) results, and yet d) provide the correct substantive conclusions in the testing carried out for this research, highlights how useful they can be for small-n studies. I recommend that researchers take seriously the possibility of switching to this variance-based model over the estimation of coefficients and standard errors when fitting short time series models, and that future researchers apply further rigorous testing of model performance in a wider set of real-world scenarios than there was scope to do here.




References

Bloom, H. S. (2003). Using 'Short' Interrupted Time-Series Analysis to Measure the Impacts of Whole-School Reforms. Evaluation Review 27 (1), 3–49.

Brockwell, P. J. and R. A. Davis (2002). Introduction to Time Series and Forecasting (2nd ed.). London: Springer.

Crosbie, J. (1993). Interrupted Time-Series Analysis With Brief Single-Subject Data. Journal of Consulting and Clinical Psychology 61 (1), 966–974.

English, P. (2018a). Thermostatic public opinion: why UK anti-immigrant sentiments rise and then fall.

English, P. (2018b). Visibly Restricted: Public Opinion and the Representation of Immigrant Origin Communities across Great Britain. Ethnic and Racial Studies, 1–19.

Gottman, J. M. (1981). Time Series Analysis: A Comprehensive Introduction for Social Scientists. Cambridge: Cambridge University Press.

Huitema, B. E., J. W. McKean, and S. Laraway (2007). Time-Series Intervention Analysis Using ITSACORR: Fatal Flaws. Journal of Modern Applied Statistical Methods 6 (2), 367–379.

Hurvich, C. M. and C.-L. Tsai (1989). Regression and time series model selection in small samples. Biometrika 76 (2), 297–307.

Hyndman, R. J. (2014). Fitting models to short time series.

Hyndman, R. J. and G. Athanasopoulos (2018). Forecasting: Principles and Practice. OTexts.

Keele, L. and N. J. Kelly (2006). Dynamic Models for Dynamic Theories: The Ins and Outs of Lagged Dependent Variables. Political Analysis 14 (2), 186–205.

Langsrud, Ø. (2003). ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing 13, 163–167.

Linden, A. (2015). Conducting Interrupted Time-series Analysis for Single- and Multiple-group Comparisons. Stata Journal 15 (2), 480–500.

Linden, A. and J. L. Adams (2011). Applying a propensity score-based weighting model to interrupted time series data: improving causal inference in programme evaluation. Journal of Evaluation in Clinical Practice 17, 1231–1238.




McCleary, R., R. A. Hay, E. E. Meidinger, and D. McDowall (1980). Applied Time Series Analysis for the Social Sciences. Thousand Oaks: Sage Publications.

Pickup, M. (2015). Introduction to Time Series Analysis. Thousand Oaks: Sage Publications.

Reid, S., R. Tibshirani, and J. Friedman (2016). A Study of Error Variance Estimation in Lasso Regression. Statistica Sinica 26, 35–67.

Shaw, R. G. and T. Mitchell-Olds (1993). ANOVA for Unbalanced Data: An Overview. Ecology 74 (6), 1638–1645.

Solanas, A., R. Manolov, and V. Sierra (2010). Lag-one autocorrelation in short series: Estimation and hypotheses testing. Psicológica 31, 357–381.

Ståhle, L. and S. Wold (1989). Analysis of Variance. Chemometrics and Intelligent Laboratory Systems 6, 259–272.

Vanvoorhis, C. R. W. and B. L. Morgan (2007). Understanding Power and Rules of Thumb for Determining Sample Sizes. Tutorials in Quantitative Methods for Psychology 3 (2), 43–50.

Weigend, A. S., M. Mangeas, and A. N. Srivastava (1995). Nonlinear gated experts for time series: discovering regimes and avoiding overfitting. International Journal of Neural Systems 6, 373–399.


