0% found this document useful (0 votes)

28 views35 pages

Mixed Frequency Data Sampling Regression Models: The R Package Midasr

The document introduces the R package midasr for regression modeling with mixed frequency data. The package allows estimating models with variables sampled at different frequencies within a MIDAS regression framework. It defines a general autoregressive MIDAS regression model with multiple variables of different frequencies and discusses model specification, estimation, validation and forecasting.

Uploaded by

Hannes schmidt

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

28 views35 pages

Mixed Frequency Data Sampling Regression Models: The R Package Midasr

Uploaded by

Hannes schmidt

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 35

JSS Journal of Statistical Software

August 2016, Volume 72, Issue 4. doi: 10.18637/jss.v072.i04

Mixed Frequency Data Sampling Regression

Models: The R Package midasr

Eric Ghysels Virmantas Kvedaras Vaidotas Zemlys

University of North Carolina Vilnius University Vilnius University

Abstract
When modeling economic relationships it is increasingly common to encounter data
sampled at different frequencies. We introduce the R package midasr which enables esti-
mating regression models with variables sampled at different frequencies within a MIDAS
regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002).
In this article we define a general autoregressive MIDAS regression model with multiple
variables of different frequencies and show how it can be specified using the familiar R
formula interface and estimated using various optimization methods chosen by the re-
searcher. We discuss how to check the validity of the estimated model both in terms of
numerical convergence and statistical adequacy of a chosen regression specification, how
to perform model selection based on a information criterion, how to assess forecasting
accuracy of the MIDAS regression model and how to obtain a forecast aggregation of
different MIDAS regression models. We illustrate the capabilities of the package with
a simulated MIDAS regression model and give two empirical examples of application of
MIDAS regression.

Keywords: MIDAS, specification test.

1. Introduction
Regression models involving data sampled at different frequencies are of general interest. In
this document we introduce the R (R Core Team 2016) package midasr (Ghysels, Kvedaras,
and Zemlys 2016) for regression modeling with mixed frequency data based on a frame-
work put forward in recent work by Ghysels et al. (2002), Ghysels, Santa-Clara, and Valka-
nov (2006a) and Andreou, Ghysels, and Kourtellos (2010) using so called MIDAS, meaning
Mi(xed) Da(ta) S(ampling), regressions. The package is available from the Comprehensive R
Archive Network (CRAN) at http://CRAN.R-project.org/package=midasr.
In a general framework of regressions with functional constraints on parameters, the midasr
2 midasr: Mixed Frequency Data Sampling Regression Models in R

package not only provides similar functionality within a standard R framework of the model
specification comparable to that available in the usual functions lm or nls, but also deals
with an extended model specification analysis for MIDAS regressions.
Several recent surveys on the topic of MIDAS are worth mentioning at the outset. They are:
Andreou, Ghysels, and Kourtellos (2011) who review more extensively some of the material
summarized in this document, Armesto, Engemann, and Owyang (2010) who provide a very
simple introduction to MIDAS regressions and finally Ghysels and Valkanov (2012) who
discuss volatility models and mixed data sampling.
Econometric analysis of MIDAS regressions appears in Ghysels, Sinko, and Valkanov (2006b),
Andreou et al. (2010), Bai, Ghysels, and Wright (2013), Kvedaras and Račkauskas (2010),
Rodriguez and Puggioni (2010), Wohlrabe (2009), among others.
MIDAS regression can also be viewed as a reduced form representation of the linear projection
which emerges from a state space model approach – by reduced form we mean that the MIDAS
regression does not require the specification of a full state space system of equations. Bai et al.
(2013) show that in some cases the MIDAS regression is an exact representation of the Kalman
filter, in other cases it involves approximation errors which are typically small. The Kalman
filter, while clearly optimal as far as linear projection goes, has several disadvantages: (1) it
is more prone to specification errors as a full system of measurement and state equations is
required and as a consequence (2) requires a lot more parameters, which in turn results in (3)
computational complexities which often limit the scope of applications. In contrast, MIDAS
regressions – combined with forecast combination schemes if large data sets are involved (see
Andreou, Ghysels, and Kourtellos 2013) – are computationally easy to implement and are
less prone to specification errors.
The key feature of the package is its flexibility in terms of the model formulation and estima-
tion, which allows for1 :

• estimation of regression models with their parameters defined (restricted) by certain

functional constraints using the familiar R formula interface allowing any choice of
a constraint, which can be selected from a standard list or can be customized using
user-defined R functions;

• parsimonious aggregation-linked restrictions (as, e.g., in Ghysels 2013) as a special case;

• estimation of MIDAS models with many variables and (numerous) different frequencies;

• constrained, partially constrained, or unconstrained estimation of the model;

• various mixtures of restrictions/weighting schemes and also lag orders as they can be
specific to each series;

• statistical testing for the adequacy of the model specification and the imposed functional
constraint;
1
Ghysels (2013) also developed a package for MATLAB (The MathWorks Inc. 2014) which deals with
the estimation and information criteria-based specification of MIDAS regressions as well as forecasting and
nowcasting of low frequency series. All of the midasr features replicate or extend features provided by this
package. The key extensions are: the specification of any user-defined functional constraint, the inclusion of
multiple variables of different frequency and different functional constraints, the testing of the adequacy of
a chosen model specification, and the option to calculate standard errors of the parameters robust to serial
correlation and heteroscedasticity in regression disturbances.
Journal of Statistical Software 3

• information criteria and testing-based selection of models;

• forecasting and nowcasting functionality, including various forecast combinations.

Suppose {yt , t ∈ Z} is a univariate process observed at low frequency. Lags of the process
are denoted by Byt = yt−1 , where B is the low frequency lag operator. A MIDAS regression
(i)
involves linear projections using stochastic processes {xτ , τ ∈ Z}, i = 0, . . . , k, observed at
(i)
a higher frequency, i.e., for each low frequency period t = t0 we observe the process xτ at
mi ∈ N high frequency periods τ = (t0 − 1)mi + 1, . . . , t0 mi . Throughout the article we
represent ith high frequency period τ in terms of low frequency period t as τ = (t − 1)mi + j,
j = 1, . . . , mi . Note that this notation does not exclude the case mi = 1. In that case the
(i)
high frequency process xτ is observed at the same frequency as the low frequency process yt .
However we require that mi ≥ 1, such that the process yt is observed at the lowest frequency.
(i) (i)
Lags of the processes xτ are denoted by Lxτ = xτ −1 , where L is the high frequency lag
operator, which operates on the lag irrespective of the frequency of the process.
The package deals with any specification of mixed frequency regression model which can be
represented as
li
k X
X (i) (i)
yt − α1 yt−1 − · · · − αp yt−p = βj xtmi −j + εt , (1)
i=0 j=0

where we require
(0) (0) (k) (k)
E(εt |yt−1 , . . . , yt−p , xtm0 , . . . , xtm0 −li , . . . , xtmk , . . . , xtmk −lk ) = 0,

so that the Equation 1 is identified as a projection equation.

The model stated in the Equation 1 can be estimated in the usual time series regression
fashion or using a Bayesian approach. However, the number of parameters in this model d =
p + ki=0 li can be very large in terms of the number n of available observations of yt 2 . Since
P

the estimation of the model can easily become infeasible, whenever either larger differences in
frequencies or more variables and/or higher lag orders prevail, Ghysels et al. (2002) introduced
a sufficiently flexible parametric restriction to be imposed on the original parameters,
(i) (i)
βj = fi (γ i , j), j = 0, . . . , li , γ i = (γ1 , . . . , γq(i)
i
), qi ∈ N. (2)

This approach greatly reduces the number of parameters to be estimated, from d to q =

Phi
i=0 qi , which is assumed to be always considerably less than the number of observations
available at the lowest frequency. This gain is offset by the fact that (1) becomes a non-linear
model; however, if the parameters of an underlying data generating process followed a certain
functional constraint which is perfectly or well approximated by a constraint function chosen
by a researcher, then significant efficiency gains could be achieved from the imposed con-
straints. Figure 1 plots the out-of-sample prediction precision (on the left) and the parameter
estimation precision (on the right) in an unconstrained and constrained simple model with
correct and approximate restrictions (see Appendix A for details).
2
In the MIDAS literature it is common to have li ≥ mi and mi can be large, for example monthly versus
daily data.
4 midasr: Mixed Frequency Data Sampling Regression Models in R

Mean squared error Distance from true values

0.8

1.6
0.6 Constraint
Unrestricted
1.4
Correct
0.4
Incorrect
1.2
0.2

1.0
250 500 750 1000 250 500 750 1000
Sample size

Figure 1: A plot depicting efficiency gains when the correct non-linear constraint is imposed.
The left panel plots the average out-of-sample prediction accuracy against sample size. The
right panel plots the average Euclidean distance of estimated model parameters to their true
values.

As can be seen, even an incorrect constraint might be useful whenever the number of de-
grees of freedom in an unconstrained model is low and, consequently, one cannot rely on the
large sample properties of unconstrained estimators. Furthermore, this approach seems to be
necessary whenever estimation is simply infeasible because of the lack of degrees of freedom.

2. Theory
The model (1) can be rewritten in a more compact form:

α(B)yt = β(L)> xt,0 + εt , (3)

Pp j
where α(z) = 1 − j=1 αj z and

>
(0) (i) (l)
xt,0 : = xtm0 , . . . , xtmi , . . . , xtml ,
l
(l) >

X (0) (i)
β(z) = β j z j , β j = βj , . . . , βj , . . . , βj ,
j=0
>
(0) (i) (l)
Lj xt,0 : = xt,j = Lj xtm0 , . . . , Lj xtmi , . . . , Lj xtml .

In order to simplify notation, without loss of generality, a single order of the lag polynomials
is used with l being the maximum lag order. If the orders of some components of β(z) are
smaller, it is easy to set some coefficients of the polynomial equal to zero.
We require the existence of the continuous second derivative of the functional constraint
2
with respect to its parameters, i.e., ∂ γ∂ ∂fγi > . The functional constraints can vary with each
i i
variable and/or frequency, and therefore we use γ to represent a vector of all the parameters
of a restricted model with q = dim(γ) their total number.
Journal of Statistical Software 5

As will be shown in the next section, all variants of the usual linear (in terms of variables)
MIDAS regression model are covered by regression (3) via the specification of functional con-
straints. When each restriction function is an identity mapping, one obtains an unrestricted
MIDAS regression model.3

2.1. Frequency alignment

It is instructive to rewrite the model (1) in matrix notation. We start with a few examples.
Suppose yt is observed quarterly and we want to explain its variation with the variable xτ ,
which is observed monthly. Since each quarter has three months, the frequency m is 3 in
this example. Suppose we assume that the monthly data in the current and the previous
quarter has explanatory power. This means that for each quarter t we want to model yt as a
linear combination of variables x3t , x3t−1 , x3t−2 observed in the quarter t and variables yt−1
and x3(t−1) , x3(t−1)−1 , x3(t−1)−2 observed in the previous quarter t − 1. In matrix notation the
MIDAS model (1) for this example is:

y2 y1 x6 . . . x1 β0 ε2
        
 ..   ..  α +  .. .. ..   ..  +  ..  .
 . = .  1  . . .  .   . 
yn yn−1 x3n . . . x3n−5 β5 εn

By writing the model in matrix notation we transform the high-frequency variable xτ into a
low-frequency vector (x3t , . . . , x3t−5 )> . We call this transformation the frequency alignment.
Note that we require that the number of observations of xτ is exactly 3n.
Let us examine another example. Suppose we have another variable zt observed weekly which
we want to add to the model. The model (1) does not allow varying frequency ratios, so we
need to assume that each month has exactly 4 weeks. If months do not always have four
weeks, as they do in practice, one can simply think of this model as taking a fixed set of
weekly lags. The frequency m for the variable zτ is then 12. We use again the current and
previous quarter data for explaining variation in yt . This means that for quarter t we model
yt as a linear combination of variables x3t , x3t−1 , x3t−2 and z12t , z12t−1 , . . . , z12t−11 observed in
the quarter t, and variables yt−1 , x3(t−1) , . . . , x3(t−1)−2 and z12(t−1) , z12(t−1)−1 , . . . , z12(t−1)−11
observed in the quarter t − 1. The model in matrix form is then:

y2 y1 x6 . . . x1 β0 z24 . . . z1 γ0 ε2
           
 ..   ..  α +  .. .. ..   ..  +  .. .. ..   ..   .. 
 . = .  1  . . .  .   . . .  .  +  . .
yn yn−1 x3n . . . x3n−5 β5 z12n . . . z12n−23 γ23 εn

In this example we aligned xτ into a vector (x3t , . . . , x3t−5 )> and zτ into a vector (z12t , . . . ,
z12t−23 )> . Again we require that the number of observations of high frequency variables are
multiples of n, with multiplication factor being the corresponding frequencies. This is not a
restrictive assumption in practical applications as will be further explained in Section 3.
Let us return to the general case of the model (1). We align the frequency of high-frequency
(i) (i) (i)
variable xτ by transforming it to the low-frequency vector (xtmi , xtmi −1 , . . . , xtmi −l )> . The

3
See Foroni, Marcellino, and Schumacher (2015).
6 midasr: Mixed Frequency Data Sampling Regression Models in R

model (1) is then expressed in matrix notation as follows:

 (i) 
yl yl−1 ... yl−p α1 β0 εl
      
k
 ..   .. .. ..   ..  + X X (i)  .  .
 .  +  . ,
 . = . . .  .   .  .
i=0 (i)
yn yn−1 . . . yn−p αp β ε n l

where
(i) (i) (i)
 
xumi xumi −1 ... xumi −l
 (i) (i) (i) 
x x(u+1)mi −1 ... x(u+1)mi −l 
 (u+1)mi 
 .. .. .. 
. . ... .
 
 
(i) (i) (i)
X (i) := 
 
 xtmi xtmi −1 ... xtmi −l  , (4)
 .. .. .. 

 . . ... . 

 (i) (i) (i) 
x x(n−1)mi −1 . . . x(n−1)mi −l 
 (n−1)mi 
(i) (i) (i)
xnmi xnmi −1 ... xnmi −l
and u is the smallest integer such that umi − l > 0 and u > p.
The purpose of this subsection was to show how the frequency alignment procedure turns a
MIDAS regression into a classical time series regression where all the variables are observed
at the same frequency.

2.2. Estimation
Equation 3 can be estimated directly via ordinary least squares (OLS), without restrictions
on the parameters. This is a so called U-MIDAS regression model, see Foroni et al. (2015).
Furthermore, a consistent non-parametric approach could be used to estimate the underlying
parameters of a function as, e.g., in Breitung, Roling, and Elengikal (2013). Since, none of
these approaches use a parametric functional constraint, they can be estimated using already
available R packages. The midasr package aims at the estimation of mixed frequency models
with some parametric functional constraints.
While model (3) is a linear model in terms of variables, any non-linear functional constraints
will result in non-linearities with respect to the parameters γ. Therefore, in the general case,
we use in the function midas_r the non-linear least squares (NLS) estimator of parameters γ
of a restricted model (3) as defined by
n 2
>
X
γ
b = argmin α(B)yt − f γ (L) xt,0 , (5)
γ ∈Rq d(l+1)/me

where the lag polynomial of constrained parameters is defined by

l
X
f γ (z) = f γ ,j z j
j=0

with
>
f γ ,j = f0 (γ 0 ; j), . . . , fi (γ i ; j), . . . , fk (γ k ; j)
Journal of Statistical Software 7

for each (i, j) ∈ {0, 1, . . . , k} × {0, 1, . . . , l}. A number of numerical algorithms are readily
available in R. By default, the optim optimization function is used with optional choices of
optimization algorithms in it. However, a user can also choose within the function midas_r
other procedures available in R such as nls, thus customizing the desired algorithm which is
suitable for the problem at hand.
The efficiency of the estimator and consistency of the standard errors depend on whether
the errors of the model are spherical. We leave the aspect of efficiency of estimation to be
considered by a user, however the implementation of heteroscedasticity and autocorrelation
(HAC) robust standard errors is an option available in the package sandwich (see Zeileis 2004,
Zeileis 2006).
If all the functional relations fi (·) were non-constraining identity mappings, then the NLS
estimator would be equivalent to the ordinary least squares (OLS) problem in terms of the
original parameters. For convenience, such a U-MIDAS version can be dealt with directly
using a different function midas_u of the package (see an illustration in the Section 3) or a
standard lm function, provided the alignment of data frequencies is performed as discussed
in the previous section.

2.3. Taxonomy of aggregates-based MIDAS regression models

Based on the parsimony of representation argument, the higher-frequency part of conditional
expectation of MIDAS regressions is often formulated in terms of aggregates as follows
l
k X
(i) (i)
β(L)> xt,0 =
X
βj xtmi −j
i=0 j=0
(6)
q
k X
X (i)
= λ(i)
r x̃t−r ,
i=0 r=0

with some low-frequency number of lags q ∈ N and parameter-driven low-frequency aggregates

mi
(i) (i) X (i)
x̃t−r := xt−r (δ i,r ) = wr(i) (δ i,r ; s)x(t−1−r)mi +s ,
s=1

which depend on a weighting (aggregating within a low-frequency period) function wr (δ i,r ; s)

with its parameter vector δ i,r , which, in the general case, can vary with each variable/frequency
and/or the low-frequency lag order r ∈ N. Here the aggregation weights are usually non-
(i)
negative and, for identification of parameters {λr }h,p i=0,r=0 , satisfy the normalization con-
Pmi −1
straint such as s=0 wr (δ i,r ; s) = 1. To have the weights add to one, it is convenient to
define a weighting function in the following form
(i)
ψr (δ i,r ; s)
∀ i, r wr(i) (δ i,r ; s) = Pm (i)
, s = 1, . . . , mi , (7)
i
j=1 ψr (δ i,r ; j)

(i)
given some underlying function ψr (·). Provided that the latter function is non-negatively-
valued (and the denominator is positive), the resulting weights in Equation 7 are also non-
negative. Table 1 provides a list of some underlying functions producing, within the context of
Equation 7, the usual weighting schemes with non-negative weights (whenever the parameter
8 midasr: Mixed Frequency Data Sampling Regression Models in R

(i)
Resulting (normalized) ψ(δ; s) := ψr (δ i,r ; s) Related midasr
weighting scheme P function
p
Exponential Almon lag ψ(δ; s) = exp j=1 δj s
j
, p ∈ N, nealmon
polynomial >
where δ = (δ1 , . . . , δj , . . . , δp ) ∈ Rp .
Beta (analogue of probability ψ(δ; s) = xsδ1 −1 (1 − xs )δ2 −1 , nbeta
density function) where xs := ξ + (1 − ξ)h(s),
h(s) := (s − 1)/(m − 1), with some
marginally small quantity ξ > 0, and
>
δ = (δ1 , δ2 ) ∈ R2+ .
Gompertz (analogue of proba- ψ(δ; s) = z(s)e−δ1 z(s) , gompertzp
bility density function) where z(s) = exp δ2 s , and
>
δ = (δ1 , δ2 ) ∈ R2+ .
−1
Log-Cauchy (analogue of ψ(δ; s) = s−1 δ22 + (ln s − δ1 )2 , lcauchyp
probability density function) >
where δ = (δ1 , δ2 ) ∈ R × R+ .

Nakagami (analogue of ψ(δ; s) = s2δ1 −1 exp(−δ1 /δ2 s2 ), where nakagamip

>
probability density function) δ = (δ1 , δ2 ) , δ1 ≥ 0.5, δ2 ∈ R+ .

Table 1: A sample of weighting schemes in aggregation-based MIDAS specifications.

space of underlying functions is appropriately bounded, which in some cases is also needed
for identification of parameters). In order to avoid heavy notation, indices i and r – which are
connected respectively with frequency/variable and the lag order – are dropped in the table.
Some other weighting functions which do not have a representation as in Equation 7 are also
available in the package such as (non-normalized) almonp and the polynomial specification
with step functions polystep (see Ghysels et al. 2006b for further discussion of step functions).
However, the choice of a particular weighting function in the MIDAS regression with aggre-
gates represents only one restriction imposed on β(L) out of many other choices to be made.
To see this, let us note that aggregates-based MIDAS regressions can be connected with the
following restrictions on the conditional expectation of model (3):

(i)
E α(B)yt |y t,1 , {xt,0 }lj=0 = β(L)> xt,0
q
k X
X (i)
= λ(i)
r x̃t−r
i=0 r=0
k X q mi
X X (i)
= λ(i)
r wr(i) (δ i,r ; s)x(t−1−r)mi +s
i=0 r=0 s=1
k X q mi
X X (i)
= λ(i)
r wr (δ i,r ; s)x(t−1−r)mi +s (8)
(i)
wr (·)=wr (·) i=0 r=0 s=1
k X q mi
X X (i)
= λ(i)
r w(δ i,r ; s)x(t−1−r)mi +s
wr (·)=w(·) i=0 r=0 s=1
k X q mi
X X (i)
= λ(i)
r w(δ i ; s)x(t−1−r)mi +s
δ i,r =δ i i=0 r=0 s=1
Journal of Statistical Software 9

k q X
mi
X
(i)
X (i)
= λ w(δ i ; s)x(t−1−r)mi +s ,
(i)
λr =λ(i) i=0 r=0 s=1

where y t,1 = (yt−1 , . . . , yt−p )> .

As can be seen – and leaving aside other less intuitive restrictions – depending on the choice of
a particular MIDAS specification with aggregates, one can impose restrictions on the equality
of
(i)
• the applied weighting scheme/function across variables and/or frequencies (∀ i, wr (·) =
wr (·));

• the applied weighting scheme/function across all low-frequency lags r = 0, 1, . . . , q of

aggregates (∀ r, wr (·) = w(·));

• parameters of the weighting functions in each lag (∀ r, δ i,r = δ i );

(i)
• impact of contemporaneous and lagged aggregates for all lags (∀ r, λr = λ(i) ).

Furthermore, let si stand for an enumerator of ith higher-frequency periods within a low-
frequency period. Then, noting that, given a frequency ratio mi , there is a one-to-one mapping
between higher-frequency index j ∈ N and a pair (r, si ) ∈ N × {1, 2, . . . , mi }

j = rmi + si ,

it holds that
fi (γ i ; rmi + si ) = λ(i) (i)
r wr (δ i,r ; s). (9)
Hence, it is easy to see that the aggregates-based MIDAS induces a certain periodicity of the
functional constraint fi in Equation 3 as illustrated bellow using a stylized case where all the
restrictions are imposed in Equation 8:
fi (·, 0), fi (·, 1), ... fi (·, m − 1) fi (·, m), fi (·, m + 1), ... fi (·, 2m − 1) ...
,
λ(i) w(·, 1), λ(i) w(·, 2), ... λ(i) w(·, m) λ(i) w(·, 1), λ(i) w(·, 2), ... λ(i) w(·, m) ...

for any i ∈ {0, 1, . . . , h}. From Equation 9 it is clear that any specification of MIDAS regression
models which relies on aggregates is a special case of representation (3) with just a specific
functional constraint on parameters. On the other hand, not every general constraint β(L) can
be represented using periodic aggregates. For instance, in the above characterized example
the correspondence necessarily breaches whenever there exists at least one frequency i, for
which none of q ∈ N satisfies l = qmi − 1.

2.4. Specification selection and adequacy testing

Besides the usual considerations about the properties of the error term, there are two main
questions about the specification of the MIDAS regression models. First, suitable functional
constraints need to be selected, since their choice will affect the precision of the model. Second,
the appropriate maximum lag orders need to be chosen.
One way to address both issues together is to use some information criterion to select the
best model in terms of the parameter restriction and the lag orders using either in- or out-of-
sample precision measures. Functions midas_r_ic_table and amidas_table of the package
10 midasr: Mixed Frequency Data Sampling Regression Models in R

allow the user to make an in-sample choice using some usual information criteria, such as
AIC and BIC, and a user-specified list of functional constraints.4
Another way is to test the adequacy of the chosen functional constraints. For instance, when-
ever the autoregressive terms in model (3) are present (p > 0), it was pointed out by Ghysels
et al. (2006b) that, in the general case, φ(L) = β(L)/α(B) will have seasonal patterns thus
corresponding to some seasonal impact of explanatory variables on the dependent one in a
pure distributed lag model (i.e., without autoregressive terms). To avoid such an effect when-
ever it is not (or is believed to be not) relevant, Clements and Galvão (2008) proposed to
us a common factor restriction which can be formulated as a common polynomial restriction
with a constraint on the polynomial β(L) to satisfy a factorization β(L) = α(B)φ(L), so that
inverting Equation 3 in terms of the polynomial α(B) leaves φ(L) unaffected, i.e., without
creating/destroying any (possibly absent) seasonal pattern of the impact of explanatory vari-
ables. However, there is little if any knowledge a priori whether the impact in the distributed
lag model should be seasonal or not. Hence, an explicit testing of adequacy of the model and,
in particular, of the imposed functional constraint is obviously useful.
Let β denote a vector of all coefficients of polynomial β(z) defined in Equation 3, while
f γ represents the corresponding vector of coefficients restricted by a (possibly incorrect)
functional constraint in f γ (z). Let βb denote the respective OLS estimates of unconstrained
model, i.e., where functional restrictions of parameters are not taken into account. Let f̂ γ :=
f γ γ =γ
b denote a vector of the corresponding quantities obtained from the restricted model
relying on the NLS estimates γ b as defined in Equation 5. Denote by α, α, b and α b γ the
corresponding vectors of coefficients of polynomial α(z), its OLS estimates in an unrestricted
>
model, and its NLS estimates in a restricted model.5 Let θ := (α> , β > )> , θ b >, β
b := (α b )> ,
>
and θ b>
e := (α >
γ , f̂ γ ) signify the corresponding vectors of all coefficients in Equation 3. Then,
under the null hypothesis of ∃ γ ∈ Rq such that f γ = β, it holds

e > A(θ e ∼ χ2 d − q ,

b − θ)
(θ b − θ)

where A is a suitable normalization matrix (see Kvedaras and Zemlys 2012 for a standard
and Kvedaras and Zemlys 2013 for a HAC-robust version of the test), and q = dim(γ)
and d = dim(θ) denotes the number of parameters in a restricted and unrestricted model,
respectively. Functions hAh_test and hAhr_test of the package implement the described
testing as will be illustrated later.

2.5. Forecasting
Let us write model (3) for period t + 1 as

yt+1 = α> y t,0 + β(L)> xt+1,0 + εt+1 , (10)

where y t,0 = (yt , . . . , yt−p+1 )> and α = (α1 , α2 , . . . , αp )> is a vector of parameters of the
autoregressive terms. This representation is well suited for (one step ahead) conditional
forecasting of yt+1 , provided that the information on the explanatory variables is available.
4
Although aimed at forecasting, the function select_and_forecast can also be used to perform the selection
of models relying on their out-of-sample performance.
5
Recall that unconstrained α elements make a subset of parameter vector γ of a constrained model.
Journal of Statistical Software 11

If it were absent, forecasts of xt+1,0 would be also necessary from a joint process of {yt , xt,0 }
which might be difficult to specify and estimate correctly, especially, bearing in mind the
presence of data with mixed frequencies. Instead, a direct approach to forecasting is often
applied in the MIDAS framework. Namely, given an information set available up to a moment
t defined by It,0 = {y t,j , xt,j }∞
j=0 , where

y t,j = (yt−j , . . . , yt−j−p+1 )>

(0) (i) (k)
xt,j = (xtm0 , . . . , xtmi , . . . , xtmk )> ,

an h-step ahead direct forecast

ỹt+h = E yt+h It,0 = α> >

h y t,0 + β h (L) xt,0 , h ∈ N (11)

can be formed leaning on a model linked to a corresponding conditional expectation

yt+h = α> >

h y t,0 + β h (L) xt,0 + εh,t , E εh,t It,0 ,

where αh and β h (L) are the respective horizon h-specific parameters. Note that, in principle,
these conditional expectations have a form of representation (3) with certain restrictions on
the original lag polynomials of coefficients. Hence, in the general case, the suitable restrictions
for each h will have a different form.
Given periods h = 1, 2, . . . , and a selected model or a list of specifications to be considered,
package midasr provides the point forecasts corresponding to the estimated analogue of Equa-
tion 11 evaluates the precision of different specifications, and performs weighted forecasting
using the framework defined in Ghysels (2013).

2.6. Alternative representations of MIDAS regressions

The model (1) represents a very general MIDAS regression representation. We give below a
sample of other popular MIDAS regression specifications from Andreou et al. (2011). These
specifications assume that only one high frequency variable is available. We denote its fre-
quency by m.

1. DL-MIDAS(pX ):
pX m−1
X X
yt+1 = µ + βrm+j x(t−r)m−j + εt+1
r=0 j=0

2. ADL-MIDAS(pY ,pX ):
pY
X pX m−1
X X
yt+1 = µ + µj yt−j + βrm+j x(t−r)m−j + εt+1
j=0 r=0 j=0

3. FADL-MIDAS(pF ,pX ,pY ):

pF
X pY
X pX m−1
X X
yt+1 = µ + αi Ft−i + µj yt−j + βrm+j x(t−r)m−j + εt+1 ,
i=0 j=0 r=0 j=0

where Ft is a factor derived from additional data.

12 midasr: Mixed Frequency Data Sampling Regression Models in R

4. ADL-MIDAS-M(pX ,pY ), or multiplicative MIDAS:

pY
X pX
X
yt+1 = µ + µj yt−j + αr Xt−r + εt+1 ,
j=0 r=0

Pmi −1
where Xt−r = j=0 βj x(t−r)m−j .

5. MIDAS with leads, where it is assumed that we have only J < m observations of high
frequency variable xτ available for period t + 1
pY
X J
X pX m−1
X X
yt+1 = µ + µj Yt−j + β−j xtm+j + βrm+j x(t−r)m−j + εt+1
j=0 j=1 r=0 j=0

The parametric restriction is usually placed on coefficients βj in these representations in

the usual manner of (2), i.e., it is assumed that βj = f (γ, j), for chosen function f with
parameters γ.
If we compare these representations with (1) the notable differences are the specification of
yt+1 instead of yt as a response variable and a preference to make the maximum high frequency
lag be a multiple of the corresponding frequency. The multiplicative MIDAS is the special
case of aggregates-based MIDAS specification. It is evident then that all these representations
are special cases of (1).
Various examples illustrating MIDAS regressions specifications together with corresponding
R code are also given in the Table 3.

3. Implementation in the midasr package

3.1. Data handling

From a data handling point of view, the key specificity of the MIDAS regression model is
that the length of observations of variables observed at various frequencies differs and needs
to be aligned as described in Section 2. There is no existing R function which performs such
a transformation and the package midasr provides a solution to these challenges. The basic
functionality of data handling is summarized in Table 2.
Function fmls(x, k, m) performs exactly the transformation defined in Equation 4, con-
verting an observation vector x of a given (potentially) higher-frequency series into the cor-
responding stacked matrix of observations of (k + 1) low-frequency series (contemporaneous
with k lags) as defined by the maximum lag order k and the frequency ratio m. For instance,
given a series of twelve observations

R> x <- 1:12

we get the following result

R> fmls(x, k = 2, m = 3)
Journal of Statistical Software 13

Function Description Example Notes

mls(x, k, m) Stacks a HF data vector x mls(x, 2:3, 3) dim x
m must be an integer (NA
into a corresponding ma- are allowed).
trix of observations at LF For m = 1, the function pro-
x
of size dim
m × dim k: from duces lags of x that are de-
the first to the last HF lag fined by vector argument k,
defined by vector k. e.g., mls(x, 2:3, 1) yields a
data set containing the lags
xt−2 and xt−3 of xt .
fmls(x, k, m) Same as mls, except that fmls(x, 2, 3) fmls(x, 2, 3) is equivalent
k is a scalar and the k + 1 to mls(x, 0:2, 3).
lags are produced starting
from 0 up to k.
dmls(x, k, m) Same as fmls, only the dmls(x, 2, 3) mls(x, 1, 1) can be used in
resulting matrix contains dmls to get stacking of lagged
k + 1 first-order HF differ- differences, e.g., dmls(mls(x,
ences of x. 1, 1), 2, 3).

Table 2: A summary of a basic data handling functionality in the package midasr.

X.0/m X.1/m X.2/m

[1,] 3 2 1
[2,] 6 5 4
[3,] 9 8 7
[4,] 12 11 10

i.e., three variables (a contemporaneous and two lags) with four low-frequency observations
(n = 12/m).
Function mls is slightly more flexible as the lags included can start from a given order rather
than from zero, whereas the function fmls uses a full lag structure. dmls performs in addition
a first-order differencing of the data which is convenient when working with integrated series.
A couple of issues should be taken into account when working with series of different frequen-
cies:

• It is assumed that the numbers of observations of different frequencies match exactly

through the frequency ratio (ni = nmi ), and the first and last observations of each
series of different frequency are correspondingly aligned (possibly using NA to account
for some missing observations for series of higher frequency).

• Because of different lengths of series of various frequencies, the data for the model
cannot be kept in one data.frame. It is expected that variables for the model are
either vectors residing in the R global environment, or are passed as elements of a list.
Variables of different frequency can be in the same data.frame, which in turn should
be an element of a list.

3.2. An example of simulated MIDAS regression

Using the above data handling functions, it is straightforward to simulate a response series
from the MIDAS regression as a data generating process (DGP). For instance, suppose one
14 midasr: Mixed Frequency Data Sampling Regression Models in R

0.4
Weights

0.2
0.0

0 5 10 15

High frequency lag

Figure 2: A plot of shapes of functional constraints of xτ (blue) and zτ (red).

is willing to generate a low-frequency response variable y in the MIDAS with two higher-
frequency series x and z where the impact parameters satisfy the exponential Almon lag
polynomials of different orders as follows:
7 16
X (1) X (2)
yt = 2 + 0.1t + βj x4t−j + βj z12t−j + εt ,
j=0 j=0 (12)
xτ1 ∼ n.i.d.(0, 1), zτ2 ∼ n.i.d.(0, 1), εt ∼ n.i.d.(0, 1),

where (xτ1 , zτ2 , εt ) are independent for any (τ1 , τ2 , t) ∈ Z3 , and

P
qi −1 (i) s
(i) (i)
exp s=1 γs j
βj = γ0 Pd −1 P
qi −1 (i) s
, i = 1, 2,
i
j=0 exp s=1 γs j

where d1 = k1 + 1 = 8 is a multiple of the frequency ratio m1 = 4, whereas d2 = k2 + 1 = 17

is not a multiple of m2 = 12. Here q1 = 2, q2 = 3 with parameterizations

γ 1 = (1, −0.5)> ,
γ 2 = (2, 0.5, −0.1)> ,

which yield the shapes of functional constraints as plotted in Figure 2.

The following R code produces a series according to the DGP characterized above:

R> set.seed(1001)
R> n <- 250
R> trend <- 1:n
R> x <- rnorm(4 * n)
R> z <- rnorm(12 * n)
R> fn_x <- nealmon(p = c(1, -0.5), d = 8)
R> fn_z <- nealmon(p = c(2, 0.5, -0.1), d = 17)
R> y <- 2 + 0.1 * trend + mls(x, 0:7, 4) %*% fn_x +
+ mls(z, 0:16, 12) %*% fn_z + rnorm(n)

It is of interest to note that the impact of variable x can be represented using aggregates-based
MIDAS, whereas the impact of z cannot.
Journal of Statistical Software 15

3.3. Some specification examples of MIDAS regressions

Suppose now that we have (only) observations of y, x, and z which are stored as vectors,
matrices, time series, or list objects in R, and our intention is to estimate a MIDAS regression
model as in Equation 12:

(a) without restricting the parameters (as in U-MIDAS) and using OLS;

(b) with the exponential Almon lag polynomial constraint on parameters (as in the function
nealmon) and using NLS.

The OLS estimation as in case (a) is straightforwardly performed using

R> eq_u <- lm(y ~ trend + mls(x, k = 0:7, m = 4) + mls(z, k = 0:16, m = 12))

or, equivalently

R> eq_u <- midas_r(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12),
+ start = NULL)

Note that in this case, midas_r picks up the variables from the global R environment. It is
possible to pass the data explicitly:

R> eq_u <- midas_r(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12),
+ start = NULL, data = list(y = y, trend = trend, x = x, z = z))

The variables of the same frequency can reside in the same data.frame:

R> eq_u <- midas_r(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12),
+ start = NULL, data = list(data.frame(y = y, trend = trend), x = x,
+ z = z))

In this case, there is no need to name the data.frame element in the list.
The following R code estimates the constrained case (b) using the function midas_r and
reports the NLS estimates γ
b of parameters with the related summary statistics.

R> eq_r <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12,
+ nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
R> summary(eq_r)

Formula y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon)

Parameters:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.9881956 0.1152990 17.244 < 2e-16 ***
trend 0.0998828 0.0007769 128.571 < 2e-16 ***
x1 1.3533434 0.1512202 8.949 < 2e-16 ***
x2 -0.5075657 0.0966696 -5.251 3.32e-07 ***
z1 2.2634729 0.1728150 13.098 < 2e-16 ***
16 midasr: Mixed Frequency Data Sampling Regression Models in R

z2 0.4096532 0.1556852 2.631 0.009052 **

z3 -0.0729794 0.0203915 -3.579 0.000417 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9316 on 242 degrees of freedom

As you can see the syntax of the function midas_r is similar to the standard R function nls.
The model is specified via the familiar formula interface. The lags included and functional
restriction used can be individual to each variable and are specified within the respective
mls, fmls, or dmls function used with midas_r. It is necessary to provide a list of starting
values for each variable with restricted coefficients, since it implicitly defines the number of
parameters of the constraint functions to be used for each series.
The main difference to the function nls is that there is a greater choice of numerical opti-
mization algorithms. The function midas_r is written in a way that in theory it can use any
R optimization function. The choice is controlled via the Ofunction argument. Currently
it is possible to use functions optim and nls which are present in a standard R installation
and function optimx from the package optimx (Nash and Varadhan 2011; Nash 2014). The
additional arguments to the aforementioned functions can be specified directly in the call to
midas_r. So for example if we want to use the optimization algorithm of Nelder and Mead,
which is the default option in the function optim we use the following code

R> eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) +

+ mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5),
+ z = c(2, 0.5, -0.1)), Ofunction = "optim", method = "Nelder-Mead")

If we want to use the Golub-Pereyra algorithm for partially linear least-squares models im-
plemented in the function nls we use the following code

R> eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) +

+ mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5),
+ z = c(2, 0.5, -0.1)), Ofunction = "nls", method = "plinear")

It is possible to re-estimate the NLS problem with the different algorithm using as starting
values the final solution of the previous algorithm. For example it is known, that the default
algorithm in nls is sensitive to starting values. So first we can use the standard Nelder-Mead
algorithm to find “more feasible” starting values and then use nls to get the final result:

R> eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) +

+ mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5),
+ z = c(2, 0.5, -0.1)), Ofunction = "optim", method = "Nelder-Mead")
R> eq_r2 <- update(eq_r2, Ofunction = "nls")

The output of the optimization function used can be found by inspecting the element opt of
the midas_r output.

R> eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) +

+ mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5),
Journal of Statistical Software 17

+ z = c(2, 0.5, -0.1)), Ofunction = "optim", method = "Nelder-Mead")

R> eq_r2$opt

$par
(Intercept) trend x1 x2 z1 z2
1.98564830 0.09990172 1.35251875 -0.50816124 2.26317742 0.40886037
z3
-0.07283773

$value
[1] 210.0091

$counts
function gradient
502 NA

$convergence
[1] 1

$message
NULL

Here we observe that the Nelder-Mead algorithm evaluated the cost function 502 times.
The optimization functions in R report the status of the convergence of the optimization
algorithm by a numeric constant with 0 indicating successful convergence. This numeric
constant is reported as the element convergence of the midas_r output.

R> eq_r2$convergence

[1] 1

In this case the convergence was not successful. The help page of the function optim indicates
that convergence code 1 means that the iteration limit was reached.
In order to improve the convergence it is possible to use user defined gradient functions. To
use them it is necessary to define the gradient function of the restriction. For example for the
nealmon restriction the gradient function is defined in the following way:

R> nealmon_gradient <- function(p, d, m) {

+ i <- 1:d
+ pl <- poly(i, degree = length(p) - 1, raw = TRUE)
+ eplc <- exp(pl %*% p[-1])[, , drop = TRUE]
+ ds <- colSums(pl * eplc)
+ s <- sum(eplc)
+ cbind(eplc / s, p[1] * (pl * eplc / s - eplc %*% t(ds) / s^2))
+ }

The gradient functions are passed as named list elements via argument weight_gradients:
18 midasr: Mixed Frequency Data Sampling Regression Models in R

R> eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) +

+ mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5,
+ -0.1)), weight_gradients = list(nealmon = nealmon_gradient))

This way midas_r calculates the exact gradient of the NLS problem (5) using the specified
gradient function of the restriction. For all the types of the restrictions referenced in Table 3
their gradient functions are specified in the package midasr. The naming convention for
gradient functions is restriction_name_gradient. It is not necessary to explicitly pass
gradient functions named according to this convention. If weight_gradients is not NULL
and does not contain the appropriately named element, it is assumed that there exists a
gradient function conforming to the gradient naming convention which is then subsequently
used.
The gradient and the Hessian of the NLS problem are supplied as the output of midas_r.
The gradient is calculated exactly if appropriate gradients for weight functions are supplied as
explained above, otherwise the numerical approximation of the gradient is calculated using the
package numDeriv (Gilbert and Varadhan 2015). For Hessian the numerical approximation
is always used. Having the gradient and Hessian calculated allows to check whether the
necessary and sufficient conditions for convergence are satisfied. This is performed by the
function deriv_test which calculates the Euclidean norm of the gradient and the eigenvalues
of the Hessian. It then tests whether the norm of the gradient is close to zero and whether
the eigenvalues are positive.

R> deriv_tests(eq_r, tol = 1e-06)

$first
[1] FALSE

$second
[1] TRUE

$gradient
[1] 0.0042441084 0.0503345610 -0.0008512998 0.0021251990 0.0004804969
[6] -0.0004505732 -0.3851793847

$eigenval
[1] 1.047988e+07 5.888193e+04 3.664418e+02 1.221224e+02 8.117055e+01
[6] 5.147646e+01 4.595632e+01

To retrieve a vector of constrained estimates θ̃ (and, hence, also f̂ = f γ γ =γ

b ) which corre-
sponds to the vector θ (β, respectively), the function coef can be used as follows

R> coef(eq_r, midas = TRUE)

(Intercept) trend x1 x2 x3
1.988196e+00 9.988277e-02 5.481358e-01 3.299554e-01 1.986196e-01
x4 x5 x6 x7 x8
1.195609e-01 7.197078e-02 4.332347e-02 2.607896e-02 1.569847e-02
Journal of Statistical Software 19

z1 z2 z3 z4 z5
3.346553e-01 4.049713e-01 4.235080e-01 3.827453e-01 2.989297e-01
z6 z7 z8 z9 z10
2.017619e-01 1.176847e-01 5.932147e-02 2.584132e-02 9.728106e-03
z11 z12 z13 z14 z15
3.164848e-03 8.897916e-04 2.161895e-04 4.539331e-05 8.236827e-06
z16 z17
1.291633e-06 1.750366e-07

In the example provided above, a functional constraint was imposed directly on β(L) terms
corresponding to each series without the usage of aggregates. Relying on the relationship (9),
it is always possible to write such an explicit general constraint from an aggregates-based one.
For convenience of use, the function amweights can be used to form several standard periodic
functional constraints with “typical” restrictions explicated in Equation 7. For instance,

R> amweights(p = c(1, -0.5), d = 8, m = 4, weight = nealmon, type = "C")

[1] 0.4550542 0.2760043 0.1674051 0.1015363 0.4550542 0.2760043 0.1674051

[8] 0.1015363

with type = "C" corresponds to a fully restricted version of aggregates-based expression

(7) apart the cross-restriction on the equality of weighting schemes between different vari-
ables/frequencies. Note that the code above repeats the result of

R> nealmon(p = c(1, -0.5), d = 4)

[1] 0.4550542 0.2760043 0.1674051 0.1015363

twice (d/m = 2), as implied by the number of periods at higher-frequency (d = 8) and the
frequency ratio (m = 4). In this way, the function amweights can be used to define explicitly
a new functional constraint relying on the relationship (9). Alternatively, one can indicate
directly within the function midas_r that the aggregates-based restriction must be used as
follows

R> eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "C") +
+ mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5),
+ z = c(2, 0.5, -0.1)))

where the first variable follows an aggregates-based MIDAS restriction scheme. Note that the
selection of alternative types "A" and "B" are connected with specifications having a larger
number of parameters (see Table 3), hence the list of starting values needs to be adjusted to
account for an increase in the number of (potentially unequal) impact parameters.
It should be also noted that, whenever the aggregates-connected restrictions are used, the
number of periods must be a multiple of the frequency ratio. For instance, the current lag
specification for variable z is not consistent with this requirement and cannot be represented
through the (periodic) aggregates, but either mls(z, 0:11, 12, amweights, nealmon, "C")
Description Code example Analytical expression Notes
midas_r(y ˜ mls(x, 0:7, 4, nealmon) + P7 (1) (i)
Different constraint mls(z, 0:16, 12, gompertzp), start = yt = c + j=0 βj x4t−j + Constraints on βj , i = 1, 2
functions list(x = c(1, -0.5), z = c(1, 0.5,
P16 (2) are given by different func-
j=0 βj z12t−j + εt
midasr: Mixed Frequency Data Sampling Regression Models in R

0.1))) tions.
midas_r(y ˜ mls(x, 0:7, 4) + P7 (1)
yt = c + j=0 βj x4t−j + x enters linearly with uncon-
Partial constraint
mls(z, 0:16, 12, nealmon), start P16 (2) (1)
(only on z) strained βj .
= list(z = c(1, -0.5))) j=0 βj z12t−j + εt
With unrestricted midas_r(y ˜ mls(y, 1:2, 1) + P2
yt = c + j=1 αj yt−j + Autoregressive terms enter
autoregressive mls(x, 0:7, 4, nealmon), start = P7 linearly with unconstrained
terms list(x = c(1, -0.5))) j=0 βj x4t−j + εt coefficients.
midas_r(y ˜ mls(y, 1:2, 1, "*") Here coefficients of λ(z) are
With a common
+ mls(x, 0:7, 4, nealmon), start α(B)yt =c+ α(B)λ(L)x4t + εt , assumed to satisfy nealmon
factor restriction
= list(x = c(1, -0.5))) restriction.
With autoregres- midas_r(y ˜ mls(y, 1:6, 1, P6 Autoregressive parameters
sive parameters nealmon) + mls(x, 0:7, 4, yt = c + j=1 αj yt−j + αj , j = 1, . . . , 6 are con-
nealmon), start = list(y = c(1,
P7
restricted by a j=0 βj x4t−j + εt
strained to satisfy nealmon
function -0.5), x = c(1, -0.5))) restriction.
midas_r(y ˜ mls(x, 0:7, 4, y =c+ The same weighting scheme
Aggregates-based t
amweights, nealmon, "A"), start P 1 P4 (not parameters) is used in
(Case A) r=0 λr s=1 w(δ r ; s)x4(t−1−r)+s + εt
= list(x = c(1, 1, 1, -0.5))) aggregation.
midas_r(y ˜ mls(x, 0:7, 4, yt = c + P
Aggregates-based The same weights are used in
amweights, nealmon, "B"), start P 1 4
(Case B) r=0 λr s=1 w(δ; s)x4(t−1−r)+s + εt aggregation.
= list(x = c(1, 1, -0.5)))
midas_r(y ˜ mls(x, 0:7, 4, ytP
= c +P A common impact parameter
Aggregates-based
amweights, nealmon, "C"), start 1 4
λ r=0 s=1 w(δ; s)x4(t−1−r)+s + εt of lags and the same weights
(Case C)
= list(x = c(1, -0.5))) are used in aggregation.
User defined function: fn
midas_r(y ˜ mls(x, 0:101, 4,
P101
With a user-defined yt = c + j=0 βj x4t−j + εt , <- function(p, d) p[1] *
constraint fn), start = list(x = c(0, 0))) βj = γ1 (j + 1)γ2 , j = 0, 1, . . . , 101. c(1:d)ˆp[2].
Table 3: A non-extensive list of possible specifications of the MIDAS regression in the midasr package.
20
Journal of Statistical Software 21

or mls(z, 0:23, 12, amweights, nealmon, "C") would be valid expressions from the code
implementation point of view.
Table 3 summarizes and provides various other examples of correspondence between midas_r
coding and the analytical specifications of MIDAS regressions.

3.4. Adequacy testing of restrictions

Given a MIDAS regression model estimated with midas_r, the empirical adequacy of the
functional restrictions can be tested under quite standard assumptions (see Kvedaras and
Zemlys 2012 and Kvedaras and Zemlys 2013) using functions hAh_test and hAhr_test of the
package. In the case of a stationary series {yt } they can be applied directly, whereas whenever
{yt } is cointegrated with explanatory variables, a special transformation needs to be applied
before the testing (see, e.g., Bilinskas and Zemlys 2013). The hAh_test can be used whenever
errors of the process are independently and identically distributed, whereas the hAhr_test
uses a HAC-robust version of the test. We should just point out that, whenever no significant
HAC in the residuals are observed, we would suggest using hAh_test which would then have
more precise test sizes in small samples. In the case of an integrated series {yt } which is
co-integrated with explanatory variables, some other alternatives are available (see Bilinskas,
Kvedaras, and Zemlys 2013).
For illustration, let us use eq_r to represent an estimated model as in the previous subsections.
Then the functions produce, respectively,

R> hAh_test(eq_r)

hAh restriction test

data:
hAh = 16.552, df = 20, p-value = 0.6818

R> hAhr_test(eq_r)

hAh restriction test (robust version)

data:
hAhr = 14.854, df = 20, p-value = 0.7847

Here the value of a test statistic, the degrees of freedom (the number of binding constraints
on parameters in Equation 3), and the empirical significance of the null hypothesis that a
functional constraint is adequate are reported.
As can be seen, such a specification, which in fact corresponds to the underlying DGP, cannot
be rejected at the usual significance levels, whereas, e.g., reducing the number of parameters
of the functional constraint of variable z to only two instead of three is quite strongly rejected
using either version of the test:

R> eq_rb <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) +

+ mls(z, 0:12, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, -0.1)))
R> hAh_test(eq_rb)
22 midasr: Mixed Frequency Data Sampling Regression Models in R

hAh restriction test

data:
hAh = 36.892, df = 17, p-value = 0.00348

R> hAhr_test(eq_rb)

hAh restriction test (robust version)

data:
hAhr = 32.879, df = 17, p-value = 0.01168

Whenever the empirical adequacy cannot be rejected at some appropriate level of significance
for a couple of models, we could further rely on information criteria to make the selection of
the best candidate(s).

3.5. Model selection

Suppose that we want to investigate which out of several functional constraints – for instance,
the normalized ("nealmon") or non-normalized ("almonp") exponential Almon lag polynomi-
als, or with polynomial of order 2 or 3, and so on – are better suited in a MIDAS regression
model of y on x and z (possibly different for each variable). Since the best maximum number
of lags can differ with a functional constraint and/or variable/frequency, let us first define
using the midasr function expand_weights_lags the sets of potential models corresponding
to each explanatory variable as follows

R> set_x <- expand_weights_lags(weights = c("nealmon", "almonp"), from = 0,

+ to = c(5, 10), m = 1, start = list(nealmon = c(1, 1),
+ almonp = c(1, 0, 0)))
R> set_z <- expand_weights_lags(c("nealmon", "nealmon"), 0, c(10, 20), 1,
+ start = list(nealmon = c(1, -1), nealmon = c(1, -1, 0)))

Here, for each variable, vector (or list) weights defines the potential restrictions to be con-
sidered and a list start gives the appropriate starting values defining implicitly the number
of parameters per function.
The potential lag structures are given by the following ranges of high-frequency lags: from
[from; m ∗ min(to)] to [from; m ∗ max(to)]. When aggregates-based modeling is involved
using amweights in midas_r, m can be set to the frequency ratio which ensures that the
considered models (lag structures) are multiples of it. Otherwise, we would recommend to
operate with high-frequency lag structures without changing the default value m = 1. Then,
the set of potential models is defined as all possible different combinations of functions and
lag structures with a corresponding set of starting values. A simple example bellow illustrates
the result in order to reveal the underlying structure, which, besides the understanding of it,
is otherwise not needed for a user.

R> expand_weights_lags(weights = c("nealmon", "nbeta"), from = 1,

+ to = c(2, 3), m = 1, start = list(nealmon = c(1, -1),
+ nbeta = rep(0.5, 3)))
Journal of Statistical Software 23

weights lags starts

1 nealmon 1:2 c(1, -1)
2 nealmon 1:3 c(1, -1)
3 nbeta 1:2 c(0.5, 0.5, 0.5)
4 nbeta 1:3 c(0.5, 0.5, 0.5)

Given the sets of potential specifications for each variable as defined above, the estimation of
all the models is performed by

R> eqs_ic <- midas_r_ic_table(y ~ trend + mls(x, 0, m = 4) +

+ fmls(z, 0, m = 12), table = list(z = set_z, x = set_x))

The function midas_r_ic_table returns a summary table of all models together with the cor-
responding values of the usual information criteria and the empirical sizes of adequacy testing
of functional restrictions of parameters. The result of derivative tests and the convergence
status of the optimization function is also returned.
The summary table is a data.frame where each row corresponds to a candidate model; so
this table can be manipulated in the usual R way. The table can be accessed as table element
of the list returned by midas_r_ic_table. The list of fitted ‘midas_r’ objects of all candidate
models can be accessed as candlist element. It is possible to inspect each candidate model
and fine-tune its convergence if necessary.

R> eqs_ic$candlist[[5]] <- update(eqs_ic$candlist[[5]], Ofunction = "nls")

The summary table can be recalculated using the update method for ‘midas_r_ic_table’.
This function then recalculates all the necessary statistics.

R> eqs_ic <- update(eqs_ic)

It should be pointed out that there is no need to provide the weighting function nor a specific
lag order in the mls functions in a call to midas_r_ic_table, since they are defined by the
respective potential sets of models under option table. Any provided values with mls (or
other similar functions) are over-written by those defined in table.
Finally, the best model in terms of a selected information criterion in a restricted or unre-
stricted model then is simply obtained by using

R> modsel(eqs_ic, IC = "AIC", type = "restricted")

which also prints the usual summary statistics as well as the testing of adequacy of the applied
functional restriction using, by default, hAh_test. A word of caution is needed here to remind
that, as is typical, the empirical size of a test corresponding to a complex model-selection
procedure might not correspond directly to a nominal one of a single-step estimation.

3.6. Forecasting
Conditional forecasting (with prediction intervals, etc.) using unrestricted U-MIDAS regres-
sion models which are estimated using lm can be performed using standard R functions, e.g.,
24 midasr: Mixed Frequency Data Sampling Regression Models in R

the predict method for ‘lm’ objects. Conditional point prediction given a specific model is
also possible relying on a standard predict function.
The predict method implemented in package midasr works similarly to the predict method
for ‘lm’ objects. It takes the new data, transforms it into an appropriate matrix and multiplies
it with the coefficients. Suppose we want to produce the forecast ŷT +1|T for model (12). To
produce this forecast we need the data x4(T +1) , . . . , x4T −3 and z12(T +1) , . . . , z12T −4 . It would
be tedious to calculate precisely the required data each time we want to perform a forecasting
exercise. To alleviate this problem package midasr provides the function forecast. This
function assumes that the model was estimated with the data up to low frequency index T. It
is then assumed that the new data is the data after the low frequency T and then calculates
the appropriate forecast. For example suppose that we have new data for one low frequency
period for model (12). Here is how the forecast for one period would look like:

R> newx <- rnorm(4)

R> newz <- rnorm(12)
R> forecast(eq_rb, newdata = list(x = newx, z = newz, trend = 251))

Point Forecast
1 28.28557

It is also common to estimate models which do not require new data for forecasting
7 16
X (1) X (2)
yt+` = 2 + 0.1t + βj x4t−j + βj z12t−j + εt+` ,
j=0 j=0

where ` is the desired forecasting horizon. This model can be rewritten as

7+4` 16+12`
X (1) X (2)
yt = 2 + 0.1(t − `) + βj x4t−j + βj z12t−j + εt ,
j=4` j=12`

and can be estimated using midas_r. For such a model we can get forecasts ŷT +`|T , . . . , ŷT +1|T
using the explanatory variable data up to low frequency index T. To obtain these forecasts
using the function forecast we need to supply NA values for explanatory variables. An
example for ` = 1 is as follows:

R> eq_f <- midas_r(y ~ trend + mls(x, 4 + 0:7, 4, nealmon) +

+ mls(z, 12 + 0:16, 12, nealmon), start = list(x = c(1, -0.5),
+ z = c(2, 0.5, -0.1)))
R> forecast(eq_f, newdata = list(x = rep(NA, 4), z = rep(NA, 12),
+ trend = 251))

Point Forecast
1 27.20452

Note that we still need to specify a value for the trend.

In addition, the package midasr provides a general flexible environment for out-of-sample
prediction, forecast combination, and evaluation of restricted MIDAS regression models using
Journal of Statistical Software 25

the function select_and_forecast. If exact models were known for different forecasting
horizons, it can also be used just to report various in- and out-of-sample prediction charac-
teristics of the models. In the general case, it also performs an automatic selection of the
best models for each forecasting horizon from a set of potential specifications defined by all
combinations of functional restrictions and lag orders to be considered, and produces forecast
combinations according to a specified forecast weighting scheme.
In general, the definition of potential models in the function select_and_forecast is similar
to that one used in the model selection analysis described in the previous section. However,
different best performing specifications are most likely related with each low-frequency fore-
casting horizon ` = 0, 1, 2, . . . . Therefore the set of potential models (parameter restriction
functions and lag orders) to be considered for each horizon needs to be defined.
Suppose that, as in the previous examples, we have variables x and z with frequency ratios
m1 = 4 and m2 = 12, respectively. Suppose that we intend to consider forecasting of y up
to three low-frequency periods ` ∈ {1, 2, 3} ahead. It should be noted that, in terms of high-
frequency periods, they correspond to `m1 ∈ {4, 8, 12} for variable x, and `m2 ∈ {12, 24, 36}
for variable z. Thus these variable-specific vectors define the lowest lags6 of high-frequency
period to be considered for each variable in the respective forecasting model (option from
in the function select_and_forecast). Suppose further that in all the models we want
to consider specifications having not less than 10 high-frequency lags and not more than
15 for each variable. This defines the maximum high-frequency lag of all potential models
considered for each low-frequency horizon period ` ∈ {1, 2, 3}. Hence, for each variable, three
corresponding pairs (`m1 + 10, `m1 + 15), ` ∈ {1, 2, 3} will define the upper bounds of ranges
to be considered (option to in the function select_and_forecast). For instance, for variable
x, three pairs (14, 19), (18, 23), and (22, 27) correspond to ` = 1, 2, and 3 and together with
that defined in option from (see x = (4, 8, 12)) imply that the following ranges of potential
models will be under consideration for variable x:

• ` = 1 : from [4 − 14] to [4 − 19],

• ` = 2 : from [8 − 18] to [8 − 23],
• ` = 3 : from [12 − 22] to [12 − 27].

The other options of the function select_and_forecast are options of functions

midas_r_ic_table, modsel and average_forecast.

R> cbfc <- select_and_forecast(y ~ trend + mls(x, 0, 4) + mls(z, 0, 12),

+ from = list(x = c(4, 8, 12), z = c(12, 24, 36)),
+ to = list(x = rbind(c(14, 19), c(18, 23), c(22, 27)),
+ z = rbind(c(22, 27), c(34, 39), c(46, 51))),
+ insample = 1:200, outsample = 201:250,
+ weights = list(x = c("nealmon", "almonp"), z = c("nealmon", "almonp")),
+ wstart = list(nealmon = rep(1, 3), almonp = rep(1, 3)),
+ IC = "AIC", seltype = "restricted", ftype = "fixed",
+ measures = c("MSE", "MAPE", "MASE"),
+ fweights = c("EW", "BICW", "MSFE", "DMSFE"))
6
Including lags smaller than that would imply that more information on explanatory variables is available
and, in fact, the ` − 1 forecasting horizon is actually under consideration.
26 midasr: Mixed Frequency Data Sampling Regression Models in R

The names of weighting schemes are taken from the MIDAS MATLAB toolbox (Ghysels
2013). Similarly forecasting using rolling and recursive model estimation samples defined
therein (Ghysels 2013) is supported by setting option seltype = "rolling" or seltype =
"recursive".
Then, among others,

R> cbfc$accuracy$individual
R> cbfc$accuracy$average

report, respectively:

• the best forecasting equations (in terms of a specified criterion out of the above-defined
potential specifications), and their in- and out-of-sample forecasting precision measures
for each forecasting horizon;

• the out-of-sample precision of forecast combinations for each forecasting horizon.

The above example illustrated a general usage of the function select_and_forecast includ-
ing selection of best models. Now suppose that a user is only interested in evaluating a one
step ahead forecasting performance of a given model. Suppose further that he/she a priori
knows that the best specifications to be used for this forecasting horizon ` = 1 is with

• mls(x, 4:12, 4, nealmon) with parameters x = c(2, 10, 1, -0.1) (the first one
representing an impact parameter and the last three being the parameters of the nor-
malized weighting function), and

• mls(z, 12:20, 12, nealmon) with parameters z = c(-1, 2, -0.1), i.e., with one
parameter less in the weighting function.

Given already preselected and evaluated models, a user can use the function average_forecast
to evaluate the forecasting performance. To use this function at first it is necessary to fit the
model and then pass it to function average_forecast specifying the in-sample and out-
of-sample data, accuracy measures and weighting scheme in a similar manner to function
select_and_forecast

R> mod1 <- midas_r(y ~ trend + mls(x, 4:14, 4, nealmon) + mls(z,

+ 12:22, 12, nealmon), start = list(x = c(10, 1, -0.1), z = c(2, -0.1)))
R> avgf <- average_forecast(list(mod1), data = list(y = y, x = x, z = z,
+ trend = trend), insample = 1:200, outsample = 201:250, type = "fixed",
+ measures = c("MSE", "MAPE", "MASE"),
+ fweights = c("EW", "BICW", "MSFE", "DMSFE"))

It should also be pointed out that the forecast combinations in select_and_forecast are
obtained only from the forecasts linked to different restriction functions on parameters. The
forecasts related to different lag specifications are not combined, but the best lag order is cho-
sen in terms of a given information criterion. If there is a need to get forecast combinations for
a group of models which the user selected using other criteria, the function average_forecast
should be used in a manner outlined in the previous example.
Journal of Statistical Software 27

4. Empirical illustrations

4.1. Forecasting GDP growth

We replicate the example provided in Ghysels (2013). In particular we run MIDAS regression
to forecast quarterly GDP growth with monthly non-farms payroll employment growth. The
forecasting equation is the following
8
X
yt+1 = α + ρyt + θj x3t−j + εt ,
j=0

where yt is the log difference of quarterly seasonally adjusted real US GDP and x3t is the
log difference of monthly total employment non-farms payroll. The data is taken from the
St. Louis FRED website.
First we load the data and perform necessary transformations.

R> data("USqgdp", package = "midasr")

R> data("USpayems", package = "midasr")
R> y <- window(USqgdp, end = c(2011, 2))
R> x <- window(USpayems, end = c(2011, 7))
R> yg <- diff(log(y)) * 100
R> xg <- diff(log(x)) * 100
R> nx <- ts(c(NA, xg, NA, NA), start = start(x), frequency = 12)
R> ny <- ts(c(rep(NA, 33), yg, NA), start = start(x), frequency = 4)

The last two lines are needed to equalize the sample sizes, which are different in the original
data. We simply add additional NA values at the beginning and the end of the data. The
graphical representation of the data is shown in Figure 3.
To specify the model for the midas_r function we rewrite it in the following equivalent form:
11
X
yt = α + ρyt−1 + θj x3t−j + εt .
j=3
6
4
Percentages

2
0
−2
−4

1940 1960 1980 2000

Time

Figure 3: A plot of time series of quarterly gross domestic product growth rates and monthly
non-farm payroll employment growth rates.
28 midasr: Mixed Frequency Data Sampling Regression Models in R

As in Ghysels (2013) we restrict the estimation sample from the first quarter of 1985 to the
first quarter of 2009. We evaluate the models with the Beta polynomial, Beta with non-zero
and U-MIDAS weight specifications.

R> xx <- window(nx, start = c(1985, 1), end = c(2009, 3))

R> yy <- window(ny, start = c(1985, 1), end = c(2009, 1))
R> beta0 <- midas_r(yy ~ mls(yy, 1, 1) + mls(xx, 3:11, 3, nbeta),
+ start = list(xx = c(1.7, 1, 5)))
R> coef(beta0)

(Intercept) yy xx1 xx2 xx3

0.8315274 0.1058910 2.5887103 1.0201202 13.6867809

R> betan <- midas_r(yy ~ mls(yy, 1, 1) + mls(xx, 3:11, 3, nbetaMT),

+ start = list(xx = c(2, 1, 5, 0)))
R> coef(betan)

(Intercept) yy xx1 xx2 xx3 xx4

0.93778705 0.06748141 2.26970646 0.98659174 1.49616336 -0.09184983

R> um <- midas_r(yy ~ mls(yy, 1, 1) + mls(xx, 3:11, 3), start = NULL)

R> coef(um)

(Intercept) yy xx1 xx2 xx3 xx4

0.92989757 0.08358393 2.00047205 0.88134597 0.42964662 -0.17596814
xx5 xx6 xx7 xx8 xx9
0.28351010 1.16285271 -0.53081967 -0.73391876 -1.18732001

We can evaluate the forecasting performance of these three models on the out-of-sample data,
containing 9 quarters, from 2009Q2 to 2011Q2

R> fulldata <- list(xx = window(nx, start = c(1985, 1), end = c(2011, 6)),
+ yy = window(ny, start = c(1985, 1), end = c(2011, 2)))
R> insample <- 1:length(yy)
R> outsample <- (1:length(fulldata$yy))[-insample]
R> avgf <- average_forecast(list(beta0, betan, um), data = fulldata,
+ insample = insample, outsample = outsample)
R> sqrt(avgf$accuracy$individual$MSE.out.of.sample)

[1] 0.5361953 0.4766972 0.4457144

We see that the unrestricted MIDAS regression model gives the best out-of-sample RMSE.

4.2. Forecasting realized volatility

As another demonstration we use the package midasr to forecast the daily realized volatility.
A simple model for forecasting the daily realized volatility was proposed by Corsi (2009). The
heterogeneous autoregressive model of realized volatility (HAR-RV) is defined as
(d) (d) (w) (m)
RVt+1 = c + β (d) RVt + β (w) RVt + β (m) RVt + wt+1 ,
Journal of Statistical Software 29

(w) (m)
where RVt is the daily realized volatility and RVt and RVt are weekly and monthly
averages:

(w) 1 (d) (d) (d)

RVt = RVt + RVt−1 + . . . + RVt−4
5
(m) 1 (d) (d) (d)

RVt = RVt + RVt−1 + . . . + RVt−19 ,
20
where we assume that a week has 5 days, and a month has 4 weeks. This model is a special
case of a MIDAS regression:
19
(d) X (d)
RVt+1 = c + βj RVt−j + wt+1 ,
j=0

where

(d) 1 (w) 1 (m)
β + 5 β

 + 20 β , for j = 0,
βj = 1 (w) 1 (m)
 5β + 20 β , for j = 1, . . . , 4,
 1 β (m) , for j = 5, . . . , 19.

20

The corresponding R code is the following

R> harstep <- function(p, d, m) {

+ if (d != 20)
+ stop("HAR(3)-RV process requires 20 lags")
+ out <- rep(0, 20)
+ out[1] <- p[1] + p[2] / 5 + p[3] / 20
+ out[2:5] <- p[2] / 5 + p[3] / 20
+ out[6:20] <- p[3] / 20
+ out
+ }

For empirical demonstration we use the realized variance data on stock indices provided by
Heber, Lunde, Shephard, and Sheppard (2009).
We estimate the model for the annualized realized volatility of the S&P500 index, which is
based on 5-minute returns data.

R> data("rvsp500", package = "midasr")

R> spx2_rvol <- 100 * sqrt(252 * as.numeric(rvsp500[, "SPX2.rv"]))
R> mh <- midas_r(rv ~ mls(rv, 1:20, 1, harstep), data = list(rv = spx2_rvol),
+ start = list(rv = c(1, 1, 1)))
R> summary(mh)

Formula rv ~ mls(rv, 1:20, 1, harstep)

Parameters:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.83041 0.36437 2.279 0.022726 *
30 midasr: Mixed Frequency Data Sampling Regression Models in R

rv1 0.34066 0.04463 7.633 2.95e-14 ***

rv2 0.41135 0.06932 5.934 3.25e-09 ***
rv3 0.19317 0.05081 3.802 0.000146 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.563 on 3435 degrees of freedom

For comparison we also estimate the model with the normalized exponential Almon weights

R> mr <- midas_r(rv ~ mls(rv, 1:20, 1, nealmon), data = list(rv = spx2_rvol),

+ start = list(rv = c(0, 0, 0)), weight_gradients = list())
R> summary(mr)

Formula rv ~ mls(rv, 1:20, 1, nealmon)

Parameters:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.837660 0.377536 2.219 0.0266 *
rv1 0.944719 0.027748 34.046 < 2e-16 ***
rv2 -0.768296 0.096120 -7.993 1.78e-15 ***
rv3 0.029084 0.005604 5.190 2.23e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.535 on 3435 degrees of freedom

We can test which of these restrictions is compatible with the data using the heteroscedasticity
and autocorrelation robust weight specification test hAhr_test.

R> hAhr_test(mh)

hAh restriction test (robust version)

data:
hAhr = 28.074, df = 17, p-value = 0.04408

R> hAhr_test(mr)

hAh restriction test (robust version)

data:
hAhr = 19.271, df = 17, p-value = 0.3132

We can see that the null hypothesis pertaining to the HAR-RV-implied constraints in the
MIDAS regression model is rejected at the 0.05 significance level, whereas the null hypothesis
that the exponential Almon lag restriction is adequate cannot be rejected.
Journal of Statistical Software 31

MIDAS coefficients

0.3
0.1
−0.1

5 10 15 20

High frequency lag

Figure 4: Comparison of HAR-RV (blue), Nealmon (green) and U-MIDAS (black) models.

Figure 4 illustrates the coefficients of the fitted MIDAS regressions and the coefficients of
U-MIDAS regression with their corresponding 95% confidence bounds.
For the exponential Almon lag specification we can choose the number of lags via AIC or
BIC.

R> tb <- expand_weights_lags("nealmon", from = 1, to = c(5, 15),

+ start = list(nealmon = c(0, 0, 0)))
R> mtb <- midas_r_ic_table(rv ~ mls(rv, 1:20, 1, nealmon),
+ data = list(rv = spx2_rvol), table = list(rv = tb),
+ test = "hAh_test", weight_gradients = list(), show_progress = FALSE)
R> mtb$candlist <- lapply(mtb$candlist, update, Ofunction = "nls")
R> mtb$test <- "hAhr_test"
R> mtb <- update(mtb)

Here we used two optimization methods to improve the convergence. The midas_r_ic_table
function applies the test function for each candidate model. The function hAhr_test takes
a lot of computing time, especially for models with larger number of lags, so we calculate it
only for the second final step, and we restrict the number of lags to choose from. The AIC
selects the model with 9 lags:

R> bm <- modsel(mtb)

Selected model with AIC = 21551.97

Based on restricted MIDAS regression model
The p-value for the null hypothesis of the test hAhr_test is 0.5531733

Formula rv ~ mls(rv, 1:9, 1, nealmon)

Parameters:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.96102 0.36944 2.601 0.00933 **
rv1 0.93707 0.02729 34.337 < 2e-16 ***
rv2 -1.19233 0.19288 -6.182 7.08e-10 ***
32 midasr: Mixed Frequency Data Sampling Regression Models in R

rv3 0.09657 0.02190 4.411 1.06e-05 ***

---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.524 on 3440 degrees of freedom

The HAC robust version of hAh_test again does not reject the null hypothesis of the expo-
nential Almon lag specifications.
We can look into the forecast performance of both models, using a rolling forecast with
1000 observations window. For comparison we also calculate the forecasts of an unrestricted
AR(20) model.

R> ar20 <- midas_r(rv ~ mls(rv, 1:20, 1), data = list(rv = spx2_rvol),
+ start = NULL)
R> forc <- average_forecast(list(ar20, mh, bm), data = list(rv = spx2_rvol),
+ insample = 1:1000, outsample = 1001:1100, type = "rolling",
+ show_progress = FALSE)
R> forc$accuracy$individual

Model MSE.out.of.sample MAPE.out.of.sample

1 rv ~ mls(rv, 1:20, 1) 10.82516 26.60201
2 rv ~ mls(rv, 1:20, 1, harstep) 10.45842 25.93013
3 rv ~ mls(rv, 1:9, 1, nealmon) 10.34797 25.90268
MASE.out.of.sample MSE.in.sample MAPE.in.sample MASE.in.sample
1 0.8199566 28.61602 21.56704 0.8333858
2 0.8019687 29.24989 21.59220 0.8367377
3 0.7945121 29.08284 21.81484 0.8401646

We see that the exponential Almon lag model slightly outperforms the HAR-RV model and
both models outperform the AR(20) model.

5. Final remarks
Only a part of the available functionality of the discussed functions of the package midasr
was presented. As it is usual in R, much more information on the resulting objects and all
the information on the package-specific functions can be retrieved by using the generic func-
tions objects and ?, respectively. Furthermore, in order to save space, the coding examples
provided were almost always presented with minimal accompanying output obtained after
running the code. The package page contains all the codes and complete output together
with some additional illustration of the functionality of the package. Other information with
a list of the functions and a number of demonstration codes is accessible using the usual
??midasr.
Journal of Statistical Software 33

References

Andreou E, Ghysels E, Kourtellos A (2010). “Regression Models With Mixed Sampling

Frequencies.” Journal of Econometrics, 158, 246–261. doi:10.1016/j.jeconom.2010.01.
004.

Andreou E, Ghysels E, Kourtellos A (2011). “Forecasting with Mixed-Frequency Data.” In

MP Clements, DF Hendry (eds.), Oxford Handbook of Economic Forecasting, pp. 225–245.

Andreou E, Ghysels E, Kourtellos A (2013). “Should Macroeconomic Forecasters Look at

Daily Financial Data?” Journal of Business and Economic Statistics, 31, 240–251. doi:
10.1080/07350015.2013.767199.

Armesto MT, Engemann KM, Owyang MT (2010). “Forecasting with Mixed Frequencies.”
Federal Reserve Bank of St. Louis Review, 92, 521–536.

Bai J, Ghysels E, Wright J (2013). “State Space Models and MIDAS Regressions.” Econo-
metric Reviews, 32(7), 779–813. doi:10.1080/07474938.2012.690675.

Bilinskas B, Kvedaras V, Zemlys V (2013). “Testing the Functional Constraints on Parameters

in Cointegrated MIDAS Regressions.” Unpublished manuscript.

Bilinskas B, Zemlys V (2013). “Testing the Functional Constraints on Parameters in Regres-

sion Models with Cointegrated Variables of Different Frequency.” Unpublished manuscript.

Breitung J, Roling C, Elengikal S (2013). “Forecasting Inflation Rates Using Daily Data: A
Nonparametric MIDAS Approach.” Working Paper, URL http://www.ect.uni-bonn.de/
mitarbeiter/joerg-breitung/npmidas.

Clements M, Galvão A (2008). “Macroeconomic Forecasting with Mixed Frequency Data:

Forecasting US Output Growth.” Journal of Business and Economic Statistics, 26, 546–
554. doi:10.1198/073500108000000015.

Corsi F (2009). “A Simple Approximate Long-Memory Model of Realized Volatility.” Journal

of Financial Econometrics, 7, 174–196. doi:10.1093/jjfinec/nbp001.

Foroni C, Marcellino M, Schumacher C (2015). “Unrestricted Mixed Data Sampling (MIDAS):

MIDAS Regressions with Unrestricted Lag Polynomials.” Journal of the Royal Statistical
Society A, 178(1), 57–82. doi:10.1111/rssa.12043.

Ghysels E (2013). “MATLAB Toolbox for Mixed Sampling Frequency Data Analysis Using
MIDAS Regression Models.” Available on MATLAB Central at http://www.mathworks.
com/matlabcentral/fileexchange/45150-midas-regression.

Ghysels E, Kvedaras V, Zemlys V (2016). Mixed Frequency Data Sampling Regression Mod-
els: The R Package midasr. R package version 0.6, URL https://CRAN.R-project.org/
package=midasr.

Ghysels E, Santa-Clara P, Valkanov R (2002). “The MIDAS Touch: Mixed Data Sampling
Regression Models.” Working paper, UNC and UCLA.
34 midasr: Mixed Frequency Data Sampling Regression Models in R

Ghysels E, Santa-Clara P, Valkanov R (2006a). “Predicting Volatility: Getting the Most Out
of Return Data Sampled at Different Frequencies.” Journal of Econometrics, 131, 59–95.
doi:10.1016/j.jeconom.2005.01.004.
Ghysels E, Sinko A, Valkanov R (2006b). “MIDAS Regressions: Further Results and New
Directions.” Econometric Reviews, 26, 53–90. doi:10.1080/07474930600972467.
Ghysels E, Valkanov R (2012). “Forecasting Volatility with MIDAS.” In L Bauwens, C Hafner,
S Laurent (eds.), Handbook of Volatility Models and Their Applications, pp. 383–401. John
Wiley & Sons.
Gilbert P, Varadhan R (2015). numDeriv: Accurate Numerical Derivatives. R package version
2014.2-1, URL https://CRAN.R-project.org/package=numDeriv.
Heber G, Lunde A, Shephard N, Sheppard K (2009). “Oxford-Man Institute’s Realized Li-
brary.” Oxford-Man Institute, University of Oxford. Library Version 0.2.
Kvedaras V, Račkauskas A (2010). “Regression Models with Variables of Different Frequen-
cies: The Case of a Fixed Frequency Ratio.” Oxford Bulletin of Economics and Statistics,
72(5), 600–620. doi:10.1111/j.1468-0084.2010.00585.x.
Kvedaras V, Zemlys V (2012). “Testing the Functional Constraints on Parameters in Re-
gressions With Variables of Different Frequency.” Economics Letters, 116(2), 250–254.
doi:10.1016/j.econlet.2012.03.009.
Kvedaras V, Zemlys V (2013). “The Statistical Content and Empirical Testing of the MIDAS
Restrictions.” Unpublished manuscript.
Nash J (2014). “On Best Practice Optimization Methods in R.” Journal of Statistical Software,
60(2), 1–14. doi:10.18637/jss.v060.i02.
Nash J, Varadhan R (2011). “Unifying Optimization Algorithms to Aid Software System
Users: optimx for R.” Journal of Statistical Software, 43(9), 1–14. doi:10.18637/jss.
v043.i09.
R Core Team (2016). R: A Language and Environment for Statistical Computing. R Founda-
tion for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Rodriguez A, Puggioni G (2010). “Mixed Frequency Models: Bayesian Approaches to
Estimation and Prediction.” International Journal of Forecasting, 26, 293–311. doi:
10.1016/j.ijforecast.2010.01.009.
The MathWorks Inc (2014). MATLAB – The Language of Technical Computing, Version
R2014b. Natick. URL http://www.mathworks.com/products/matlab/.
Wohlrabe K (2009). Forecasting with Mixed-Frequency Time Series Models. Ph.D. thesis,
Ludwig-Maximilians-Universität München.
Zeileis A (2004). “Econometric Computing with HC and HAC Covariance Matrix Estimators.”
Journal of Statistical Software, 11(10), 1–17. doi:10.18637/jss.v011.i10.
Zeileis A (2006). “Object-Oriented Computation of Sandwich Estimators.” Journal of Statis-
tical Software, 16(9), 1–16. doi:10.18637/jss.v016.i09.
Journal of Statistical Software 35

A. Appendix
Figure 1 was created using Monte Carlo simulation. The following DGP was used
16
X
yt = 2 + 0.1t + βj z12t−j + ut , zτ ∼ N (0, σ 2 ), ut ∼ N (0, σ 2 ),
j=0

where zτ and ut are independent. The coefficients βj were chosen to come from the normalized
exponential Almon polynomial restriction:

R> nealmon(p = c(2, 0.5, -0.1), d = 17)

The data for this DGP was generated for low frequency sample sizes 50, 100, 200, 300, 500,
750 and 1000. For each sample size an additional out-of-sample data set was generated using a
quarter of the size of an in-sample data set. Three MIDAS regression models were estimated
using the in-sample data set: an unrestricted MIDAS, a restricted one using the correct
constraint from the DGP, and the one with an incorrect restriction (non-exponential Almon
polynomial). The forecast was calculated using the out-of-sample data set. The Euclidean
distance between the model coefficients and the coefficients of the DGP was recorded together
with the mean squared error of the forecast.
This process was repeated 1000 times. The points in the figure are the averages of the
replications. The code for generating the data can be found in the help page of data set
oos_prec in the package midasr.

Affiliation:
Eric Ghysels
Department of Economics
University of North Carolina – Chapel Hill
Gardner Hall, CB 3305 Chapel Hill, NC 27599-3305
Unites States of America
E-mail: [email protected]
URL: http://www.unc.edu/~eghysels/

Virmantas Kvedaras, Vaidotas Zemlys

Department of Econometric Analysis
Faculty of Mathematics and Informatics
Vilnius University
Naugarduko g. 24, Vilnius, Lithuania
E-mail: [email protected], [email protected]
URL: http://mif.vu.lt/ututi/teacher/vk, http://mif.vu.lt/~zemlys/

Journal of Statistical Software http://www.jstatsoft.org/

published by the Foundation for Open Access Statistics http://www.foastat.org/
August 2016, Volume 72, Issue 4 Submitted: 2014-02-19
doi:10.18637/jss.v072.i04 Accepted: 2015-07-16

Rao (2022) - A Course in Time Series Analysis
No ratings yet
Rao (2022) - A Course in Time Series Analysis
527 pages
MM17 Custom Fields Update MVKE Table
0% (1)
MM17 Custom Fields Update MVKE Table
10 pages
Time Series Theory and Methods Brockwell PDF
No ratings yet
Time Series Theory and Methods Brockwell PDF
530 pages
Calculus For Iit Jee PDF
77% (96)
Calculus For Iit Jee PDF
510 pages
Amit Konar, Diptendu Bhattacharya-Time-Series Prediction and Applications. A Machine Intelligence Approach-Springer (2017)
No ratings yet
Amit Konar, Diptendu Bhattacharya-Time-Series Prediction and Applications. A Machine Intelligence Approach-Springer (2017)
248 pages
Px40 Introduction SN
No ratings yet
Px40 Introduction SN
63 pages
Mixed Data Sampling (MIDAS) Regression Models
No ratings yet
Mixed Data Sampling (MIDAS) Regression Models
37 pages
Matlab Toolbox For MIDAS 2011
No ratings yet
Matlab Toolbox For MIDAS 2011
24 pages
MIDAS Further Results Ghysels
No ratings yet
MIDAS Further Results Ghysels
49 pages
Regression Models With Mixed Sampling Frequencies
No ratings yet
Regression Models With Mixed Sampling Frequencies
16 pages
MIDAS Usersguide V2.3
No ratings yet
MIDAS Usersguide V2.3
57 pages
US Output Growth - Clemens
No ratings yet
US Output Growth - Clemens
33 pages
Mixed Frequency Fcsting
No ratings yet
Mixed Frequency Fcsting
16 pages
Forecasting With Mixed Frequencies - Armesto
No ratings yet
Forecasting With Mixed Frequencies - Armesto
16 pages
Norges Bank Working Paper 2013 06
No ratings yet
Norges Bank Working Paper 2013 06
45 pages
Time Series Kendall
No ratings yet
Time Series Kendall
320 pages
User Guide of GARCH-MIDAS and DCC-MIDAS MATLAB Programs
No ratings yet
User Guide of GARCH-MIDAS and DCC-MIDAS MATLAB Programs
12 pages
STAT 520 Forecasting and Time Series: Lecture Notes
No ratings yet
STAT 520 Forecasting and Time Series: Lecture Notes
311 pages
Winkel Reid
No ratings yet
Winkel Reid
17 pages
Pub - Time Series Theory and Methods PDF
No ratings yet
Pub - Time Series Theory and Methods PDF
530 pages
Resumos Forecasting
No ratings yet
Resumos Forecasting
17 pages
Computers and Mathematics With Applications: Alejandro Balbás, Beatriz Balbás, Inna Galperin, Efim Galperin
No ratings yet
Computers and Mathematics With Applications: Alejandro Balbás, Beatriz Balbás, Inna Galperin, Efim Galperin
15 pages
RigNotes15 PDF
No ratings yet
RigNotes15 PDF
130 pages
Time Series A Data Analysis Approach Using R by Robert Shumway, David Stoffer
No ratings yet
Time Series A Data Analysis Approach Using R by Robert Shumway, David Stoffer
272 pages
Time Series and Its Applications
No ratings yet
Time Series and Its Applications
6 pages
Time Series Previewpdf
No ratings yet
Time Series Previewpdf
30 pages
Time Series Analysis
No ratings yet
Time Series Analysis
709 pages
(Ebook PDF) Time Series: A Data Analysis Approach Using R Instant Download
100% (8)
(Ebook PDF) Time Series: A Data Analysis Approach Using R Instant Download
56 pages
(Ebook PDF) Time Series: A Data Analysis Approach Using R Install Download
No ratings yet
(Ebook PDF) Time Series: A Data Analysis Approach Using R Install Download
47 pages
Methods of Forecasting in A Manufacturing Company
No ratings yet
Methods of Forecasting in A Manufacturing Company
31 pages
Econometric Modeling
No ratings yet
Econometric Modeling
38 pages
Advanced Multivariate Time Series Forecasting Mode
No ratings yet
Advanced Multivariate Time Series Forecasting Mode
8 pages
CH 13
No ratings yet
CH 13
11 pages
MATH545-Time Series
No ratings yet
MATH545-Time Series
79 pages
Bitcoin Chapter 2
No ratings yet
Bitcoin Chapter 2
18 pages
Nên Dùng DL Tài Chính Hàng Ngày Như Thế nào-USA
No ratings yet
Nên Dùng DL Tài Chính Hàng Ngày Như Thế nào-USA
82 pages
MDFA Legacy
No ratings yet
MDFA Legacy
278 pages
Question Bank MBM696
No ratings yet
Question Bank MBM696
14 pages
(Ebook PDF) The Analysis of Time Series: An Introduction With R 7th Edition PDF Download
100% (2)
(Ebook PDF) The Analysis of Time Series: An Introduction With R 7th Edition PDF Download
45 pages
Time Series - SAS
No ratings yet
Time Series - SAS
9 pages
Notes On Time Series Analysis
No ratings yet
Notes On Time Series Analysis
111 pages
(Ebook PDF) The Analysis of Time Series: An Introduction With R 7th Edition Instant Download
100% (4)
(Ebook PDF) The Analysis of Time Series: An Introduction With R 7th Edition Instant Download
45 pages
Ewan
No ratings yet
Ewan
144 pages
Introduction To (Demand) Forecasting
No ratings yet
Introduction To (Demand) Forecasting
39 pages
? ?????? ?? ???? ?????? ????????
No ratings yet
? ?????? ?? ???? ?????? ????????
300 pages
UNIVAR4
No ratings yet
UNIVAR4
56 pages
A Course in Time Series Analysis 1662068197
No ratings yet
A Course in Time Series Analysis 1662068197
300 pages
Modeling and Analysis of Time Series Data: Chapter 1: Introduction
No ratings yet
Modeling and Analysis of Time Series Data: Chapter 1: Introduction
26 pages
System Identi Cation Data-Driven Modelling of Dynamic Systems - Paul M.J. Van Den Hof
No ratings yet
System Identi Cation Data-Driven Modelling of Dynamic Systems - Paul M.J. Van Den Hof
305 pages
Nonparametric Risk Bounds For Time-Series Forecasting - McDonald at Al
No ratings yet
Nonparametric Risk Bounds For Time-Series Forecasting - McDonald at Al
40 pages
UNIT V Time Series 1
No ratings yet
UNIT V Time Series 1
24 pages
Automatic Forecasting: Month Traffic
No ratings yet
Automatic Forecasting: Month Traffic
11 pages
High-Dimensional Statistics: Lecture Notes
No ratings yet
High-Dimensional Statistics: Lecture Notes
168 pages
(Marc Aerts, Geert Molenberghs, Louise M. Ryan, He
No ratings yet
(Marc Aerts, Geert Molenberghs, Louise M. Ryan, He
316 pages
Waveletbased and Fourierbased Multivariate Whittle Estimation Multiwave2019Journal of Statistical Software
No ratings yet
Waveletbased and Fourierbased Multivariate Whittle Estimation Multiwave2019Journal of Statistical Software
31 pages
Notes MSM
No ratings yet
Notes MSM
66 pages
Topic 9 Forecasting
No ratings yet
Topic 9 Forecasting
35 pages
Automatic Forecasting
No ratings yet
Automatic Forecasting
11 pages
Digital Signal Processing (DSP) with Python Programming
From Everand
Digital Signal Processing (DSP) with Python Programming
Maurice Charbit
No ratings yet
Co-Clustering: Models, Algorithms and Applications
From Everand
Co-Clustering: Models, Algorithms and Applications
Gérard Govaert
No ratings yet
Multiple Models Approach in Automation: Takagi-Sugeno Fuzzy Systems
From Everand
Multiple Models Approach in Automation: Takagi-Sugeno Fuzzy Systems
Mohammed Chadli
No ratings yet
Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements
From Everand
Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements
S. Twomey
No ratings yet
Step by Step Procdure by Power Point Presentation 5289M
No ratings yet
Step by Step Procdure by Power Point Presentation 5289M
34 pages
CETECOM Antenna Testing Pocket Guide
No ratings yet
CETECOM Antenna Testing Pocket Guide
2 pages
ECall Letter
No ratings yet
ECall Letter
2 pages
Finite Fields and String Matching Presentation
No ratings yet
Finite Fields and String Matching Presentation
10 pages
Kementerian Keuangan Republik Indonesia: Direktorat Jenderal Bea Dan Cukai Sekretariat Direktorat Jenderal Bea Dan Cukai
No ratings yet
Kementerian Keuangan Republik Indonesia: Direktorat Jenderal Bea Dan Cukai Sekretariat Direktorat Jenderal Bea Dan Cukai
3 pages
Schools Division of Cebu City/Lusaran National High School Workweek Plan
No ratings yet
Schools Division of Cebu City/Lusaran National High School Workweek Plan
2 pages
C# Concepts
No ratings yet
C# Concepts
2 pages
Accountinginthe Cloud
No ratings yet
Accountinginthe Cloud
15 pages
18 Ajit Gupta Android Practical
No ratings yet
18 Ajit Gupta Android Practical
122 pages
Assignment - 4 - Risk Response, Contingency and Control
No ratings yet
Assignment - 4 - Risk Response, Contingency and Control
4 pages
Corning 144F (12x12) Armoured
No ratings yet
Corning 144F (12x12) Armoured
4 pages
Chapter 1 - MIS
No ratings yet
Chapter 1 - MIS
62 pages
3.5.7 Lab - Create A Python Unit Test - ILM
No ratings yet
3.5.7 Lab - Create A Python Unit Test - ILM
9 pages
QBlade An Open Source Tool For Design An
No ratings yet
QBlade An Open Source Tool For Design An
6 pages
SNMP Datasheet For Ita Ups Sic snmp810
No ratings yet
SNMP Datasheet For Ita Ups Sic snmp810
2 pages
Camara Linksys WVC2300 Wireless
No ratings yet
Camara Linksys WVC2300 Wireless
80 pages
Cuet 2nd Day Admit Card
No ratings yet
Cuet 2nd Day Admit Card
2 pages
Non Traditional Machining Processes
No ratings yet
Non Traditional Machining Processes
108 pages
Vail CMMS
No ratings yet
Vail CMMS
24 pages
141 Colors That Start With Z (Names, Hex, RGB, CMYK)
No ratings yet
141 Colors That Start With Z (Names, Hex, RGB, CMYK)
77 pages
Diagnostic Systematic Reviews Road Map V3
No ratings yet
Diagnostic Systematic Reviews Road Map V3
2 pages
Datasheet - A-HV-3U Battery BOS-A V1.1
No ratings yet
Datasheet - A-HV-3U Battery BOS-A V1.1
6 pages
0417 s13 QP 31
No ratings yet
0417 s13 QP 31
8 pages
PROGRAM 24: C++ Program For Multilevel Inheritance
No ratings yet
PROGRAM 24: C++ Program For Multilevel Inheritance
23 pages
Machine Learning May 2024
No ratings yet
Machine Learning May 2024
8 pages
Fish Finder 300 C
No ratings yet
Fish Finder 300 C
24 pages
Digital System Design Q1 Q2
No ratings yet
Digital System Design Q1 Q2
3 pages
Simplification
No ratings yet
Simplification
18 pages