Sas Arma Forecast
$\phi_1$ and $\theta_1$.
To estimate the parameters, SAS uses the ESTIMATE statement of PROC ARIMA. Four
options to this statement are used here: noint sets the intercept to 0, method selects
the estimation method (maximum likelihood in this study), and p and q tell SAS the
order of the ARMA equation.
In this study we use noint to set the intercept to 0, because the series has already been
transformed to a mean-zero, detrended process, so the intercept should be 0.
Maximum-likelihood estimation (MLE) is a common method for calculating parameter
estimates, and the resulting estimators have good large-sample properties (consistency
and asymptotic efficiency under standard conditions). For this study, there's no strong
reason to move away from using MLEs.
The final two options specify the AR order p and the MA order q. To give the researcher
control over exactly which $\phi_i$ and $\theta_i$ are estimated, each is given as a list of
lags in parentheses. For example, to model an AR(5) with no $\phi_2$ or $\phi_4$ term the
option would be:
p=(1,3,5)
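Written out, the subset specification above corresponds to the following model. This is a sketch that assumes the notation in which all coefficients appear on the right-hand side with plus signs:

```latex
Y_t = \phi_1 Y_{t-1} + \phi_3 Y_{t-3} + \phi_5 Y_{t-5} + \epsilon_t
```

Here $\phi_2$ and $\phi_4$ are fixed at zero rather than estimated. By contrast, writing p=5 without parentheses requests all five lags.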
Putting all this information together, the final form of the ESTIMATE statement used for
an ARMA(1,1) model is:
ESTIMATE p=(1) q=(1) noint method=ml;
And when used in conjunction with PROC ARIMA:
PROC ARIMA DATA=GEData;
IDENTIFY var=lnGE(1)
nlag=30
center;
ESTIMATE p=(1) q=(1) noint method=ml;
run;
As an aside, notice the use of the DATA= option. While not required, many SAS PROCs
create new data sets and thus, unbeknownst to the programmer, change which data set
was created most recently, which is the one a PROC reads by default. Thus, as a defensive
measure, it is good software engineering practice to always state explicitly which data
set is being used. This also makes the code easier to maintain in the future.
The result of the above SAS statement is $\phi_1 = -.89396$ with $se_{\phi_1} = .08490$ and
$\theta_1 = .83655$ with $se_{\theta_1} = .10416$. At this point care must be taken to establish
what convention the SAS system is using to define these estimates. Some programs use the
convention
$$Y_t + \sum_i \phi_i Y_{t-i} = \sum_i \theta_i \epsilon_{t-i} + \epsilon_t$$
while others use the alternate form
$$Y_t = \sum_i \phi_i Y_{t-i} + \sum_i \theta_i \epsilon_{t-i} + \epsilon_t.$$
SAS uses the first form, so to convert to the same form as originally stated we have to
change the sign of $\phi_1$:
$$Y_t = .894\,Y_{t-1} + .837\,\epsilon_{t-1} + \epsilon_t$$
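As a worked check of the sign change, the estimates can be substituted into the first convention and the autoregressive term moved to the right-hand side. This takes at face value the statement that the reported estimates follow the first convention, which is worth confirming against the PROC ARIMA documentation for the SAS release in use:

```latex
\begin{aligned}
Y_t + (-.894)\,Y_{t-1} &= .837\,\epsilon_{t-1} + \epsilon_t \\
Y_t &= .894\,Y_{t-1} + .837\,\epsilon_{t-1} + \epsilon_t
\end{aligned}
```

Only the autoregressive coefficient changes sign, because the moving-average terms sit on the same side of the equation in both forms.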
Prediction
Once we have the model, forecasting becomes an exercise in conditional expectation.
For the ARMA(1,1) the resulting predictor is:
$$\hat{Y}_t = \phi_1 Y_{t-1}$$
Notice that the MA term has dropped off as it is multiplied by a term with an expected
value of 0.
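Taking the one-step relation $\hat{Y}_t = \phi_1 Y_{t-1}$ as given, longer-horizon forecasts follow by repeated substitution. The horizon index $h$ is notation introduced here for illustration, and the numeric line uses the rounded coefficient .894 quoted earlier:

```latex
\begin{aligned}
\hat{Y}_{t}     &= \phi_1 Y_{t-1} \\
\hat{Y}_{t+1}   &= \phi_1 \hat{Y}_{t} = \phi_1^{2}\, Y_{t-1} \\
\hat{Y}_{t+h-1} &= \phi_1^{h}\, Y_{t-1}, \qquad h = 1, 2, \dots \\
\hat{Y}_{t+3}   &= (.894)^{4}\, Y_{t-1} \approx .639\, Y_{t-1}
\end{aligned}
```

Because $|\phi_1| < 1$, the forecasts of the transformed, mean-zero series shrink geometrically toward 0 as the horizon grows.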
We can use SAS to generate this prediction with the FORECAST statement of PROC
ARIMA. Two useful options to this statement are LEAD and OUT.
LEAD sets the number of time steps into the future to forecast.
OUT names the data set that receives the results. As this model works with a
transformed time series, this option is necessary so that the forecasts can be
translated back into the original units.
The complete forecast statement is:
proc arima data=GEData;
identify var=lnGE(1) scan esacf nlag=30 center;
estimate p=(1) q=(1) noint method=ml;
forecast lead=4 out=predictOut;
run;
Note that the FORECAST statement must come after the ESTIMATE statement, as the
results of ESTIMATE are used to specify the model.
This command produces both a point forecast and 95% confidence limits for each
forecast step.
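One option worth knowing about at this step: the FORECAST statement also accepts ID= and INTERVAL= options, which ask PROC ARIMA to carry a date variable into the OUT= data set and extend it over the forecast rows. The sketch below is not the approach used in this study. It assumes TradeDate holds one observation per week dated on a Monday (hence the shifted interval WEEK.2), and the output name predictDated is hypothetical. How the procedure treats irregular dates, such as holiday weeks, should be checked before relying on it.

```sas
/* Sketch only: same model as above, with dates carried into the output. */
/* Assumes weekly observations dated on Mondays; predictDated is a       */
/* hypothetical data set name.                                           */
proc arima data=GEData;
   identify var=lnGE(1) nlag=30 center;
   estimate p=(1) q=(1) noint method=ml;
   forecast lead=4 id=TradeDate interval=week.2 out=predictDated;
run;
quit;
```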
However, this still needs to be transformed back to meaningful units. This is
accomplished with the following data step:
data predictOut;
set predictOut;
l95 = exp( l95 );
u95 = exp( u95 );
forecast = exp( forecast );
run;
This results in a point estimate of next week's closing price of $17.75 with a 95%
confidence interval of ($16.23, $19.42). The actual closing price of GE on December 13th
was $17.62.
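A note on the point forecast: exponentiating a log-scale forecast returns the median of the implied lognormal distribution rather than its mean. If the mean is wanted, a common adjustment adds half the forecast variance before exponentiating. The sketch below rests on three assumptions: it is run on the OUT= data set as PROC ARIMA wrote it (that is, before the back-transformation step overwrites predictOut), the STD variable written by FORECAST is present, and lognormality is an acceptable approximation. The names predictMean, medianForecast and meanForecast are hypothetical.

```sas
/* Sketch: mean-scale point forecast from log-scale PROC ARIMA output. */
/* Run this on predictOut while it is still on the log scale.          */
data predictMean;                       /* hypothetical output data set */
   set predictOut;
   medianForecast = exp(forecast);              /* plain back-transform  */
   meanForecast   = exp(forecast + std*std/2);  /* lognormal mean        */
   l95 = exp(l95);              /* interval limits transform directly   */
   u95 = exp(u95);
run;
```

As a rough back-of-envelope check, the interval reported in the text implies a log-scale standard error of about 0.046, so the two point forecasts would differ by only about 0.1%, or a couple of cents.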
Creating Quality Graphical Output
A plot of the prediction is also of value; however, a significant amount of data
manipulation is required to achieve a professional look. Among the challenges: the
output of FORECAST does not provide a date for each predicted time step, the plot
should include both the original GE data and the forecast, and a shaded region is
needed to depict the prediction interval. Addressing these requires some work in a
DATA step. The following DATA step merges the two data sets, fixes the timestamp
problems and creates two new variables, FL95 and FU95, for the forecast limits in the
prediction period. Additionally, it creates an extra row that holds dummy values of
FL95 and FU95 so that a shaded region can be drawn:
data allData;
   MERGE GEData predictOut;
   *by TradeDate;
   IF TradeDate EQ . THEN TradeDate='06DEC2010'D + (_n_-523)*7;
   IF TradeDate EQ '06DEC2010'D THEN
   DO;
      FL95=AdjClose;
      FU95=AdjClose;
   END;
   IF TradeDate GE '08DEC2010'D THEN FL95=L95;
   IF TradeDate GE '08DEC2010'D THEN FU95=U95;
   FORMAT TradeDate Date9.;
   IF TradeDate = '03JAN2011'D THEN DO; *Create extra row for shading;
      Output;
      TradeDate = '03JAN2011'D;
      FL95=17.72;
      FU95=17.72;
   END;
   OUTPUT;
run;
[Figure: GE Share Price (vertical axis, 12 to 22) plotted against Time Axis (01JAN2010 to 28JAN2011).]
Most of this is straightforward. The lone exception is the creation of the extra row,
which exists so that the drawing algorithm can accurately determine the bounds of the
polygon it is drawing. The code that creates the extra row begins on the line with the
associated comment. The first OUTPUT statement writes the current row, and the
OUTPUT statement after the DO block then writes a second copy. Before that second
copy is written, its values are adjusted as needed, specifically setting the second set
of bounds for FL95 and FU95.
Finally, it is important to note that this code is not particularly maintainable. First, the
use of 06DEC2010, 08DEC2010, and 523 ties the step specifically to the current data
set. Second, the time shift is specified in days and is always seven. Production code
should seek to address these issues. A good starting point would be the article by
Croker referenced in the bibliography.
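As one possible starting point (a sketch, not the approach from Croker's article), the hard-coded row number and start date can be derived from the data itself, and the step size can be expressed as a calendar interval with INTNX. The sketch assumes, as the original step does, that predictOut lines up row for row with GEData and simply has extra forecast rows at the end. The names allDataSketch, nObserved, lastDate and the macro variable stepUnit are hypothetical, and the special handling for the shaded region is left out.

```sas
%let stepUnit = week;   /* hypothetical: calendar interval between observations */

data allDataSketch;     /* hypothetical name, so allData is left untouched */
   /* Never executed; at compile time it stores the row count of GEData. */
   if 0 then set GEData nobs=nObserved;
   merge GEData predictOut;        /* one-to-one merge, as in the text    */
   retain lastDate;
   if _n_ <= nObserved then lastDate = TradeDate;  /* track last real date */
   else TradeDate = intnx("&stepUnit", lastDate, _n_ - nObserved, 'same');
   format TradeDate date9.;
   drop lastDate;
run;
```

Computing the forecast dates from the row position rather than from a missing-value test keeps the step independent of the particular end date and sample size.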
Conclusion
Implementing time series analysis in SAS is surprisingly easy. The commands are
relatively straightforward and implement the core algorithms needed for ARMA
modeling, as well as providing the accompanying plots.
Appendix I SAS Code
data GEData;
infile 'F:\SASAssignmentNotes\Project\ge_data.csv' DLM=',' FIRSTOBS=2;
input TradeDate :MMDDYY10. Open High Low Close Volume
AdjClose;
format TradeDate Date9.;
lnGE = log(AdjClose);
lnGELagged = lag(lnGE);
GEReturn = lnGE-lnGELagged;
GELagged = lag(AdjClose);
DiffedGE = AdjClose - GELagged;
output;
run;
proc print;run;
ods graphics on;
proc timeseries print=summary plots=pacf data=GEData;
var lnGE;
proc timeseries print=summary plots=acf data=GEData;
var lnGE;
run;
proc arima data=GEData;
identify var=lnGE(1) scan esacf nlag=30 center;
estimate p=(1) q=(1) noint method=ml;
forecast lead=4 out=predictOut;
run;
data predictOut;
set predictOut;
l95 = exp( l95 );
u95 = exp( u95 );
forecast = exp( forecast );
run;
proc print data=predictOut;
run;
data allData;
MERGE GEData predictOut;
IF TradeDate EQ . THEN TradeDate='06DEC2010'D + (_n_-523)*7;
IF TradeDate EQ '06DEC2010'D THEN
DO;
FL95=AdjClose;
FU95=AdjClose;
END;
IF TradeDate GE '08DEC2010'D THEN FL95=L95;
IF TradeDate GE '08DEC2010'D THEN FU95=U95;
FORMAT TradeDate Date9.;
IF TradeDate = '03JAN2011'D THEN DO;
Output;
TradeDate = '03JAN2011'D;
FL95=17.72;
FU95=17.72;
END;
OUTPUT;
run;
proc print data=allData;
run;
goptions reset=all;
symbol1 value=none i=join line=1 c=black co=libgr;
symbol2 value=none i=join line=3 c=blue co=libgr;
*symbol2 value=none i=join line=3 c=CX803009 co=libgr; *if you prefer
orange;
symbol3 value=none I=ms co=libgr c=gwh;
symbol3 value=none I=ms co=libgr c=CXD9A465;
symbol3 value=none I=ms co=libgr c=CXE5C5C2; * or pink...;
axis1 label=("Time Axis" )
order=('01JAN2010'D to '29JAN2011'D by 56)
value=(h=1 angle=0 rotate=0) ;
* angle MUST come before the text or the text won't be rotated;
axis2 label=(angle=90 rotate=0 "GE Share Price") order=(12 to 22);
Proc Gplot data=allData;
PLOT FL95*TradeDate=3 FU95*TradeDate=3
AdjClose*TradeDate=1 Forecast*TradeDate=2
/overlay haxis=axis1 vaxis=axis2;
run;
quit;
Appendix II Sources
- Presentation Quality Forecast Visualization with SAS/Graph by Samuel T. Croker
http://www.nesug.org/proceedings/nesug07/np/np04.pdf
- Time Series Analysis: With Applications in R by Jonathan D. Cryer and Kung-Sik Chan
- Time Series Analysis and Its Applications: With R Examples by Robert Shumway and David Stoffer