Assignment No 03
Question 1
The Excel spreadsheet "AssignmentQ1data.xls" contains data for the returns of FTSE 100
and its 5-minute realized variance from 04-01-2000 to 29-12-2017. This question draws
heavily from material across different weeks, but the code that you need to use is very similar
to that used in our computer classes. You need to do the following tasks:
a) Import all variables above into the software of your choice and provide a summary
of the FTSE 100 returns using descriptive statistics and graphical analysis. Your
summary should contain information on any visible patterns (if any) in the data and
your next steps.
SOLUTION: Descriptive statistics
RETURN
Mean -6.26E-05
Median 0.000266
Maximum 0.093872
Minimum -0.088661
Std. Dev. 0.011567
Skewness -0.197665
Kurtosis 9.357238
Jarque-Bera 7669.565
Probability 0
Sum -0.284043
Sum Sq. Dev. 0.606911
Observations 4537
Central Tendency:
1. Mean (-6.26E-05):
The average return is close to zero (-0.0000626), suggesting the returns hover around
zero but with slight negativity over the dataset.
2. Median (0.000266):
The median return is slightly positive, indicating that at least half of the observed returns
are greater than or equal to 0.000266.
Dispersion:
3. Maximum (0.093872) & Minimum (-0.088661):
The returns range from -0.088661 to 0.093872, showing a wide variation in returns.
4. Standard Deviation (0.011567):
On average, daily returns deviate from the mean by approximately 0.011567 (about 1.2%),
suggesting moderate volatility in the returns.
Distribution Shape:
5. Skewness (-0.197665):
The distribution is slightly negatively skewed, so large negative returns occur somewhat
more often than equally large positive ones.
6. Kurtosis (9.357238):
The kurtosis is well above 3, the value for a normal distribution, indicating heavy tails
(leptokurtosis) and a higher probability of extreme returns.
Normality Test:
7. Jarque-Bera Statistic (7669.565) & Probability (0):
The Jarque-Bera statistic is very large and its p-value is 0, so the null hypothesis of
normality is rejected: the returns are not normally distributed.
Additional Metrics:
8. Sum (-0.284043):
Over the 4537 observations, the cumulative return is slightly negative (-0.284043),
reflecting the slight negativity in the average return.
9. Sum of Squared Deviations (0.606911):
This reflects the total squared deviations of returns from their mean, emphasizing the
magnitude of fluctuations in the data.
10. Number of Observations (4537):
A robust sample size provides confidence in the reliability of these descriptive statistics.
Overall Interpretation:
The data suggests a slightly negative average return, with moderate volatility.
The distribution is not normal, showing negative skewness and leptokurtosis, which
implies higher probabilities of extreme negative and positive returns.
Risk managers and analysts should consider the non-normal nature and potential for
extreme outcomes when analyzing this dataset.
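A minimal sketch of how these statistics could be reproduced in Python; the file name comes from the question, while the column name RETURN is an assumption based on the EViews output above:

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Column name "RETURN" is an assumption based on the output above
data = pd.read_excel("AssignmentQ1data.xls")
ret = data["RETURN"].dropna()

# Descriptive statistics reported above
print(ret.describe())                   # mean, std. dev., min, max, quartiles
print("Skewness:", ret.skew())
print("Kurtosis:", ret.kurtosis() + 3)  # pandas reports excess kurtosis

# Jarque-Bera test of normality
jb_stat, jb_pval = stats.jarque_bera(ret)
print(f"Jarque-Bera: {jb_stat:.2f}, p-value: {jb_pval:.4f}")

# Time-series plot to inspect visible patterns such as volatility clustering
ret.plot(title="FTSE 100 daily returns")
plt.show()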
GRAPHICAL REPRESENTATION:
[Figure: time-series plot of the FTSE 100 daily returns. The series fluctuates around a
roughly constant mean near zero, with clearly visible episodes of volatility clustering.]
Null Hypothesis: RETURN has a unit root
Exogenous: Constant
Lag Length: 2 (Automatic - based on SIC, maxlag=31)

                                           t-Statistic   Prob.*
Augmented Dickey-Fuller test statistic      -43.58485    0.0000

R-squared            0.524293    Mean dependent var      4.87E-06
Adjusted R-squared   0.523978    S.D. dependent var      0.016672
S.E. of regression   0.011503    Akaike info criterion  -6.091588
Sum squared resid    0.599374    Schwarz criterion      -6.085925
Log likelihood       13813.63    Hannan-Quinn criter.   -6.089593
F-statistic          1664.223    Durbin-Watson stat      1.995949
Prob(F-statistic)    0.000000
Steps to Check for Stationarity and Making Variables Stationary
1. Checking for Stationarity
Stationarity of a time series means its statistical properties (mean, variance, and
autocorrelation) remain constant over time. To check for stationarity, follow these steps:
a) Visual Inspection
Plot the data: Examine the raw time-series plot (as above). Non-stationary data often
shows trends, seasonality, or changing variance.
In the FTSE 100 returns: the mean appears constant, but volatility clustering may indicate
conditional heteroskedasticity.
b) Statistical Tests
Use statistical tests to confirm stationarity, such as:
o Augmented Dickey-Fuller (ADF) Test:
Null Hypothesis: The series has a unit root (non-stationary).
Alternative Hypothesis: The series is stationary.
Result from the ADF test above:
The t-statistic (-43.58485) is far below the critical values at the 1%, 5%, and
10% levels, and the p-value is 0.0000, strongly rejecting the null hypothesis.
Conclusion: The return series is stationary, and no further transformation is
necessary for this dataset.
o KPSS (Kwiatkowski-Phillips-Schmidt-Shin) Test (optional complement to
ADF):
Null Hypothesis: The series is stationary.
High test statistic or low p-value indicates non-stationarity.
c) Autocorrelation Analysis
Plot the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF):
o A slow decay in the ACF suggests non-stationarity.
o The return series likely shows no such slow decay, consistent with the ADF result.
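A minimal sketch of these checks in Python, assuming the return series ret from the earlier snippet (autolag="BIC" approximates EViews' SIC-based lag selection, so results should be close but need not match exactly):

import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# ADF test: H0 = unit root (non-stationary)
adf_stat, adf_pval, usedlag, nobs, crit, _ = adfuller(ret, autolag="BIC")
print(f"ADF statistic: {adf_stat:.5f}, p-value: {adf_pval:.4f}, lags used: {usedlag}")
print("Critical values:", crit)

# KPSS test: H0 = stationary (the null is reversed relative to the ADF test)
kpss_stat, kpss_pval, _, _ = kpss(ret, regression="c", nlags="auto")
print(f"KPSS statistic: {kpss_stat:.4f}, p-value: {kpss_pval:.4f}")

# ACF/PACF plots: a slow ACF decay would point to non-stationarity
fig, axes = plt.subplots(2, 1)
plot_acf(ret, ax=axes[0], lags=20)
plot_pacf(ret, ax=axes[1], lags=20)
plt.show()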
2. Making a Non-Stationary Series Stationary
a) Differencing
Apply first-order differencing: compute the change between consecutive observations,
D_t = X_t − X_{t−1}. This removes trends in the data. Apply further differencing (e.g.,
second-order) if required.
b) Log Transformation
Apply a logarithmic transformation to stabilize variance in data with an exponential
trend: Y_t = ln(X_t).
c) Detrending
Use linear regression to model and remove deterministic trends. Subtract the fitted trend
line from the original series.
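Each of these transformations is a one-liner in Python; a sketch for a generic (hypothetical) pandas Series x:

import numpy as np

# x is a generic non-stationary pandas Series (illustrative only)
d1 = x.diff().dropna()           # first difference: D_t = X_t - X_{t-1}
d2 = x.diff().diff().dropna()    # second-order differencing, if required
logx = np.log(x)                 # log transformation (requires x > 0)

# Detrending: fit a linear time trend by OLS and subtract the fitted line
t = np.arange(len(x))
slope, intercept = np.polyfit(t, x.values, 1)
detrended = x - (intercept + slope * t)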
ARMA(2,2) estimation output for the FTSE 100 returns:

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            -6.09E-05     0.000157     -0.38797      0.6981
AR(1)         0.628215     0.279898      2.24444      0.0249
AR(2)        -0.10658      0.208038     -0.51229      0.6085
MA(1)        -0.67411      0.281269     -2.39666      0.0166
MA(2)         0.070914     0.224333      0.31611      0.7519
SIGMASQ       0.000133     1.43E-06     92.99224      0.0000

R-squared            0.007716    Mean dependent var     -6.26E-05
Adjusted R-squared   0.006621    S.D. dependent var      0.011567
S.E. of regression   0.011529    Akaike info criterion  -6.08662
Sum squared resid    0.602228    Schwarz criterion      -6.07813
Log likelihood       13813.49    Hannan-Quinn criter.   -6.08363
F-statistic          7.046607    Durbin-Watson stat      1.997478
Prob(F-statistic)    0.000001
FINDINGS:
1. AR (Autoregressive) Terms:
o AR(1) (Coefficient: 0.628): This term is statistically significant with a t-statistic
of 2.24 and a p-value of 0.0249 (< 0.05). It indicates that the first lag of the return
series has a positive and significant effect on current returns.
o AR(2) (Coefficient: -0.107): The second lag is not significant, with a p-value of
0.6085. It has a small and negative effect, suggesting no strong dependency on the
second lag.
2. MA (Moving Average) Terms:
o MA(1) (Coefficient: -0.674): This term is significant, with a t-statistic of -2.40
and a p-value of 0.0166 (< 0.05). It suggests that the first lag of the error term
influences the current returns negatively.
o MA(2) (Coefficient: 0.071): This term is not significant, with a p-value of
0.7519. It implies little contribution to explaining variations in returns.
3. Constant (C):
o The constant term is not significant (p = 0.6981), meaning there is no substantial
mean shift in the series.
4. Variance of Errors (SIGMASQ):
o The variance of the residuals is very small (1.33×10^-4) and highly significant. This
supports the model's good fit to the observed volatility.
Model Diagnostics
1. Goodness-of-Fit:
o R-squared = 0.0077: Indicates the model explains a small fraction (0.77%) of the
variation in returns.
o Adjusted R-squared = 0.0066: Corrected for model complexity, confirming
limited explanatory power.
2. Residual Analysis:
o Durbin-Watson Statistic (1.997): Very close to 2, indicating no significant
autocorrelation in residuals.
o Sum of Squared Residuals (0.602): Shows a small magnitude of errors in the
model fit.
3. Information Criteria:
o Akaike Information Criterion (AIC = -6.087), Schwarz Criterion (SC = -6.078),
and Hannan-Quinn Criterion (HQ = -6.084): These metrics are used to
compare model performance. Lower values indicate better model fit.
4. F-statistic:
o F-statistic = 7.05 (p-value < 0.0001): The overall model is statistically
significant, suggesting that the combination of AR and MA terms explains some
variation in returns.
Diagnostics
Akaike Information Criterion (AIC): -6.087
Schwarz Criterion (SC): -6.078
Hannan-Quinn Criterion (HQ): -6.084
Durbin-Watson Statistic: 1.997 (no significant autocorrelation in residuals).
Comparison: ARMA(1,1) vs ARMA(2,2)

Term                ARMA(1,1)          ARMA(2,2)          Comment
AR(2) Coefficient   N/A                -0.107 (p=0.609)   AR(2) not significant in ARMA(2,2)
MA(1) Coefficient   -0.674 (p=0.017)   -0.674 (p=0.017)   Significant in both models
Conclusion
The ARMA(1,1) model is more parsimonious and performs as well as the ARMA(2,2)
model based on information criteria (AIC, SC, HQ).
The additional terms (AR(2), MA(2)) in ARMA(2,2) are not statistically significant and
do not improve the model fit.
Recommendation
Proceed with the ARMA(1,1) model, as it is simpler and achieves the same level of
performance as ARMA(2,2).
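A minimal sketch of the model comparison in Python using statsmodels; the optimizer differs from EViews, so estimates will match only approximately:

from statsmodels.tsa.arima.model import ARIMA

# Fit candidate ARMA(p, q) models for the returns (d = 0: the series is stationary)
fitted = {}
for order in [(1, 0, 1), (2, 0, 2)]:
    res = ARIMA(ret, order=order).fit()
    fitted[order] = res
    print(f"ARMA({order[0]},{order[2]}): AIC = {res.aic:.2f}, "
          f"BIC = {res.bic:.2f}, HQIC = {res.hqic:.2f}")
    print(res.summary().tables[1])  # coefficient table with p-values

# Prefer the more parsimonious model when the criteria are essentially tied
best = fitted[(1, 0, 1)]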
(ii) Which model would you prefer? To answer this question you need to choose
information criteria. Use MAE and RMSFE and also explain the differences between
the two. Further, state which ones should be preferred and when.
SOLUTION:
MAE (Mean Absolute Error):
Definition: The average of the absolute differences between observed and predicted values,
MAE = (1/h) Σ |y_t − ŷ_t| over the h forecast periods.
Interpretation: Measures how far predictions deviate from the actual values without
considering the direction of the error (over- or under-prediction).
Advantages:
o Easy to interpret.
o Less sensitive to large outliers because it does not square errors.
Limitations: Does not penalize large errors as heavily as RMSFE.
RMSFE (Root Mean Square Forecast Error):
Definition: The square root of the average of the squared differences between observed and
predicted values, RMSFE = sqrt( (1/h) Σ (y_t − ŷ_t)² ).
Interpretation: Places a larger penalty on large prediction errors than MAE because it
squares the errors before averaging.
Advantages:
o Useful when large prediction errors are critical and should be avoided.
o Penalizes larger errors more, making it sensitive to outliers.
Limitations: More influenced by outliers than MAE.
Error handling: MAE treats all errors equally, while RMSFE penalizes large errors heavily.
Which to prefer: RMSFE should be preferred when large forecast errors are disproportionately
costly (e.g., in risk management); MAE should be preferred when robustness to outliers and
ease of interpretation matter more.
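A minimal sketch of both measures, assuming actual and forecast are aligned numpy arrays of out-of-sample values:

import numpy as np

def mae(actual, forecast):
    """Mean Absolute Error: average absolute forecast error."""
    return np.mean(np.abs(actual - forecast))

def rmsfe(actual, forecast):
    """Root Mean Square Forecast Error: penalizes large errors more heavily."""
    return np.sqrt(np.mean((actual - forecast) ** 2))

# Illustration: RMSFE exceeds MAE whenever the errors vary in size
actual = np.array([0.010, -0.020, 0.005, 0.030])
forecast = np.array([0.000, -0.010, 0.010, 0.000])
print(mae(actual, forecast), rmsfe(actual, forecast))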
(d) Explain how testing for ARCH effects works. State clearly the test and
results. Test for ARCH effects using the software of your choice.
SOLUTION:
The standard test is Engle's ARCH-LM test. The residuals from the conditional mean model are
squared and regressed on q of their own lags; under the null hypothesis of no ARCH effects,
all lag coefficients are zero and T·R² from this auxiliary regression is asymptotically
χ²(q). Rejection means that large shocks tend to be followed by large shocks, i.e., the
variance is conditionally heteroskedastic.

Heteroskedasticity Test: ARCH

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 01/19/25  Time: 23:26
Sample (adjusted): 2 4537
Included observations: 4536 after adjustments

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C              0.000104      5.81E-06     17.80391      0.0000
RESID^2(-1)    0.217803      0.014473     15.04891      0.0000

The coefficient on RESID^2(-1) is positive and highly significant (t = 15.05, p = 0.0000), so
the null of no ARCH effects is strongly rejected: the FTSE 100 returns exhibit ARCH effects,
and a GARCH-type model for the conditional variance is warranted.
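A minimal sketch of the ARCH-LM test in Python, assuming best is the fitted ARMA(1,1) model from the earlier snippet:

from statsmodels.stats.diagnostic import het_arch

# Engle's ARCH-LM test on the ARMA residuals; nlags=1 mirrors the
# single-lag test equation reported above.
lm_stat, lm_pval, f_stat, f_pval = het_arch(best.resid, nlags=1)
print(f"ARCH-LM statistic: {lm_stat:.2f}, p-value: {lm_pval:.4g}")
print(f"F-statistic:       {f_stat:.2f}, p-value: {f_pval:.4g}")
# A p-value near zero rejects the null of no ARCH effects.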
E) Estimate a GARCH(p, q) model using the best performing ARMA(p, q) model as the
conditional mean. Do an out-of-sample (recursive) forecasting exercise for h = 1, 2, 3,
where you need to produce conditional variance forecasts for the last 100 observations.
Use RMSFE to select the optimal order for the GARCH model. Use both y_t² (squared
returns of FTSE 100) and realized variance as proxies to compute RMSFE. Explain
why the choice of the proxy matters.
SOLUTION:

Variable                        Coefficient   Std. Error   z-Statistic   Prob.
REALISED__VARIANCE__5_MIN_      -3.151126     5.19E-01     -6.066581     0.0000

Variance Equation

R-squared            0.012945    Mean dependent var     -6.26E-05
Adjusted R-squared   0.012945    S.D. dependent var      0.011567
S.E. of regression   0.011492    Akaike info criterion  -6.5036
Sum squared resid    0.599054    Schwarz criterion      -6.49793
Log likelihood       14757.4     Hannan-Quinn criter.   -6.5016
Durbin-Watson stat   2.084302

Why the choice of proxy matters: the true conditional variance is unobservable, so forecast
accuracy must be judged against a proxy. Squared returns are a conditionally unbiased but very
noisy proxy, whereas 5-minute realized variance is a much less noisy measure of the same
quantity. With a noisy proxy, RMSFE rankings can point to the wrong GARCH order, so the
realized-variance-based RMSFE is the more reliable guide.
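A minimal sketch of the recursive forecasting exercise with the arch package. Note that arch supports AR conditional means but not full ARMA means, so an AR(1) mean is used here as an approximation; rv denotes the 5-minute realized-variance column and is an assumed name. Repeating the loop over candidate (p, q) orders and comparing RMSFEs gives the order selection:

import numpy as np
from arch import arch_model

h_max, n_oos = 3, 100
y = ret * 100                # arch is better behaved on percentage returns

rows = []
for t in range(len(y) - n_oos, len(y)):
    # Recursive scheme: re-estimate using all observations before time t
    am = arch_model(y.iloc[:t], mean="AR", lags=1, vol="GARCH", p=1, q=1)
    res = am.fit(disp="off")
    fc = res.forecast(horizon=h_max, reindex=False)
    rows.append(fc.variance.values[0])        # h = 1, 2, 3 variance forecasts

var_fc = np.array(rows) / 100**2              # undo the scaling

# RMSFE against both proxies for h = 1 (h = 2, 3 are analogous)
proxy_sq = (ret.iloc[-n_oos:] ** 2).values    # squared returns
proxy_rv = rv.iloc[-n_oos:].values            # 5-minute realized variance (assumed name)
for name, proxy in [("squared returns", proxy_sq), ("realized variance", proxy_rv)]:
    print(name, np.sqrt(np.mean((proxy - var_fc[:, 0]) ** 2)))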
Question 2
1. The Excel spreadsheet "AssignmentQ2data.xlsx" contains data on four time
series, X1, X2, X3 and X4. This question is similar to that posed in Computer
Class 9, but you are now dealing with four variables instead of three.
a) Examine whether these variables are stationary or non-stationary. Explain carefully
which test you would use to do the above. Note that you should allow for a linear trend in
the data, with the lag selection determined automatically using the standard settings in
EViews or in MATLAB.
Null Hypothesis: X1 has a unit root
Exogenous: Constant
Lag Length: 1 (Automatic - based on SIC, maxlag=21)

                                           t-Statistic   Prob.*
Augmented Dickey-Fuller test statistic      -1.07902     0.7259
Test critical values:   1% level            -3.43668
                        5% level            -2.86423
                        10% level           -2.56825

Variable     Coefficient   Std. Error   t-Statistic   Prob.
X1(-1)       -0.00334      0.003095     -1.07902      0.2808
D(X1(-1))    -0.25575      0.030644     -8.34576      0.0000
C             0.119818     0.11324       1.058089     0.2903

R-squared            0.067457    Mean dependent var      0.003421
Adjusted R-squared   0.065582    S.D. dependent var      1.178948
S.E. of regression   1.139634    Akaike info criterion   3.102293
Sum squared resid    1292.272    Schwarz criterion       3.117039
Log likelihood       -1545.04    Hannan-Quinn criter.    3.107898
F-statistic          35.9872     Durbin-Watson stat      2.021223
Prob(F-statistic)    0.000000
Null Hypothesis: X2 has a unit root

R-squared            0.087217    Mean dependent var      0.008264
Adjusted R-squared   0.085382    S.D. dependent var      1.198805
S.E. of regression   1.146485    Akaike info criterion   3.11428
Sum squared resid    1307.856    Schwarz criterion       3.129027
Log likelihood       -1551.026   Hannan-Quinn criter.    3.119885
F-statistic          47.53622    Durbin-Watson stat      2.038739
Prob(F-statistic)    0.000000
Null Hypothesis: X3 has a unit root
Exogenous: Constant
Lag Length: 3 (Automatic - based on SIC, maxlag=21)

R-squared            0.241366    Mean dependent var      0.009928
Adjusted R-squared   0.238304    S.D. dependent var      1.234678
S.E. of regression   1.077568    Akaike info criterion   2.992297
Sum squared resid    1150.702    Schwarz criterion       3.016914
Log likelihood       -1485.164   Hannan-Quinn criter.    3.001655
F-statistic          78.8239     Durbin-Watson stat      2.02187
Prob(F-statistic)    0.000000
Null Hypothesis: X4 has a unit root

                                           t-Statistic   Prob.*
Augmented Dickey-Fuller test statistic      -0.04877     0.9528
Test critical values:   1% level            -3.43668
                        5% level            -2.86423
                        10% level           -2.56825

Variable     Coefficient   Std. Error   t-Statistic   Prob.
X4(-1)       -0.00011      0.002159     -0.04877      0.9611
D(X4(-1))    -0.10071      0.031608     -3.18622      0.0015
C             0.038362     0.124444      0.308268     0.7579

R-squared            0.010155    Mean dependent var      0.029684
Adjusted R-squared   0.008166    S.D. dependent var      1.061608
S.E. of regression   1.057265    Akaike info criterion   2.95225
Sum squared resid    1112.22     Schwarz criterion       2.966996
Log likelihood       -1470.17    Hannan-Quinn criter.    2.957855
F-statistic          5.104056    Durbin-Watson stat      1.985834
Prob(F-statistic)    0.006232
Conclusion: None of the variables (X1, X2, X3, X4) is stationary in levels; in each case the
ADF test fails to reject the null hypothesis of a unit root.
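A minimal sketch of the tests in Python; regression="ct" includes the linear trend the question asks for, and autolag="BIC" approximates EViews' SIC-based automatic lag selection:

import pandas as pd
from statsmodels.tsa.stattools import adfuller

data = pd.read_excel("AssignmentQ2data.xlsx")

for col in ["X1", "X2", "X3", "X4"]:
    # H0: the series has a unit root (non-stationary)
    stat, pval, lags, *_ = adfuller(data[col].dropna(),
                                    regression="ct", autolag="BIC")
    print(f"{col}: ADF = {stat:.4f}, p-value = {pval:.4f}, lags = {lags}")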
b) Estimate a VAR(1) model for the four variables, again allowing for a linear
trend. As in the previous computer class, you can use EViews' built-in
sequential modified LR test statistic to determine an appropriate order for a
VAR model for these variables.
Vector Autoregression Estimates
Date: 01/20/25 Time: 23:25
Sample (adjusted): 3 1000
Included observations: 998 after adjustments
Standard errors in ( ) & t-statistics in [ ]

                X1            X2            X3            X4
X1(-2)        0.004561     -0.02206       0.033405     -0.08218
             (0.03854)     (0.03606)     (0.03608)     (0.03821)
             [ 0.11836]    [-0.61177]    [ 0.92584]    [-2.15091]
X2(-2)       -0.03616      -0.02712      -0.09818      -0.06538
             (0.03679)     (0.03443)     (0.03445)     (0.03648)
             [-0.98263]    [-0.78753]    [-2.85004]    [-1.79215]
X4(-2)       -0.02044       0.005206     -0.01278       0.104617
             (0.0369)      (0.03453)     (0.03455)     (0.03658)
             [-0.55388]    [ 0.15076]    [-0.36985]    [ 2.85986]
C            -3.23819      -0.1147       20.47747      -5.7501
             (1.28031)     (1.19806)     (1.19869)     (1.26932)
             [-2.52923]    [-0.09574]    [ 17.0833]    [-4.53008]

Determinant resid covariance (dof adj.)    0.490934
Determinant resid covariance               0.473463
Log likelihood                            -5291.31
Akaike information criterion               10.67597
Schwarz criterion                          10.85293
Number of coefficients                     36
Conclusion
Based on the fit, the t-statistics, and the low determinant of the residual covariance, the
estimated VAR appears to describe the joint dynamics reasonably well. To finalize the order,
the sequential modified LR test can confirm whether one lag suffices or whether a higher
order (such as the second lag shown above) is needed. Incorporating a linear trend is
advisable given the evidence of trends in the levels of the data.
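A minimal sketch of the lag-order selection and VAR estimation in Python, assuming the DataFrame from the previous sketch; statsmodels reports AIC/BIC/HQIC/FPE rather than EViews' sequential modified LR statistic, so this approximates the same exercise:

from statsmodels.tsa.api import VAR

endog = data[["X1", "X2", "X3", "X4"]].dropna()
model = VAR(endog)

# Compare information criteria across candidate lag orders
print(model.select_order(maxlags=8).summary())

# Estimate the chosen order with a constant and linear trend ("ct")
res = model.fit(2, trend="ct")
print(res.summary())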
(C) Perform the Johansen test for cointegration, imposing the lag order
identified in Question 2(b). Allow for a linear trend in the data by allowing for an
intercept in the cointegrating equation and the test VAR. Explain carefully how
many cointegrating relationships you find.
Unrestricted Cointegration Rank Test (Trace)

Hypothesized No. of CE(s)   Eigenvalue   Trace Statistic   0.05 Critical Value   Prob.**
None *                      0.171631     374.7514          47.85613              0.0001
At most 1 *                 0.166022     187.3963          29.79707              0.0001
At most 2                   0.005992       6.756266        15.49471              0.6061
At most 3                   0.00078        0.776594         3.841466             0.3782

Interpretation: The trace test rejects "None" and "At most 1" but fails to reject "At most
2", indicating 2 cointegrating relationships at the 5% significance level.

Unrestricted Cointegration Rank Test (Maximum Eigenvalue)

Hypothesized No. of CE(s)   Eigenvalue   Max-Eigen Statistic   0.05 Critical Value   Prob.**
None *                      0.171631     187.3551              27.58434              0.0001
At most 1 *                 0.166022     180.64                21.13162              0.0001
At most 2                   0.005992       5.979672            14.2646               0.6158
At most 3                   0.00078        0.776594             3.841466             0.3782

Unrestricted Cointegrating Coefficients:

X1           X2           X3           X4
 1.308984    -2.04585      1.283201     0.0301
-0.90722      0.478443     2.218595    -0.69324
 0.073212     0.03013      0.029599    -0.08803
 0.027808     0.02071     -0.01008      0.039261

Log likelihood by number of cointegrating equations:
1 Cointegrating Equation(s):   -5341.29
2 Cointegrating Equation(s):   -5250.97
3 Cointegrating Equation(s):   -5247.98
Interpretation: The maximum eigenvalue test also confirms that there are 2 cointegrating
relationships at the 5% significance level.
3. Cointegrating Relationships
Number of Cointegrating Equations: Both the Trace Test and Maximum Eigenvalue
Test consistently indicate 2 cointegrating relationships.
Normalized Cointegrating Coefficients:
For the two identified cointegrating equations:
1. X1 − 1.56293·X2 + 0.980303·X3 + 0.022995·X4 = 0
2. X2 − 4.19009·X3 + 1.141571·X4 = 0
These coefficients indicate long-term equilibrium relationships among the variables
X1,X2,X3, and X4.
Conclusion:
The Johansen test identifies 2 cointegrating relationships among the variables X1,X2,X3,and
X4 when allowing for a linear deterministic trend. These relationships suggest that there exist
long-term equilibrium dynamics binding the variables together, despite short-term deviations.
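A minimal sketch using statsmodels' Johansen implementation, assuming endog from the VAR sketch; det_order=0 includes a constant, corresponding to the EViews "intercept in the cointegrating equation and test VAR" case, and k_ar_diff is the levels VAR order minus one (adjust to the order actually selected):

from statsmodels.tsa.vector_ar.vecm import coint_johansen

jres = coint_johansen(endog, det_order=0, k_ar_diff=1)

# Trace and maximum-eigenvalue statistics against the 5% critical values;
# the cointegrating rank is the first r whose statistic falls below its value
for r in range(endog.shape[1]):
    print(f"r <= {r}: trace = {jres.lr1[r]:8.2f} (5% cv {jres.cvt[r, 1]:6.2f})  "
          f"max-eig = {jres.lr2[r]:8.2f} (5% cv {jres.cvm[r, 1]:6.2f})")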
(D) How would your answer change in the previous question if the Johansen
procedure determined more cointegrating relationships?
3. Adjustment Coefficients
The adjustment coefficients (α) would describe how deviations from the new, larger set of
equilibrium relationships are corrected. With more cointegrating equations, each variable would
respond to a larger number of deviations. For instance:
If there were 3 cointegrating equations, the adjustment coefficients in the equations for
D(X1), D(X2), D(X3), and D(X4) would reflect corrections for deviations from all 3
equations.
4. Interpretation
The system’s dynamics would appear more interconnected, with more robust long-term
linkages.
Short-term shocks to any variable would need to correct for deviations from more
equilibrium conditions. This could result in faster or more intricate adjustment processes.
Conclusion
If the Johansen procedure determined more cointegrating relationships, it would reveal a more
interconnected system with stronger long-term equilibrium dynamics. The analysis would
expand to include these additional relationships, and the adjustment process for short-term
deviations would involve corrections toward multiple equilibrium conditions, leading to a more
stable and predictable system in the long run.
(e) Estimate the Vector Error Correction Model that is implied by the results
from the Johansen test for cointegration in Question 3. You need to write out
the estimated long run relationship(s) and also the equations governing the
short run dynamics of the system. Discuss in depth your results.
1. Long-Run Relationships
The Johansen test indicated 2 cointegrating relationships among the variables
X1,X2,X3,X4. These relationships are derived from the normalized cointegrating
coefficients:
Normalized Cointegrating Equations:
1. X1 − 1.56293·X2 + 0.980303·X3 + 0.022995·X4 = 0
2. X2 − 4.19009·X3 + 1.141571·X4 = 0.
These equations imply:
The first cointegrating relationship describes a long-term equilibrium among
X1,X2,X3,X4, where deviations from this equilibrium would lead to
adjustments in the system.
The second cointegrating relationship highlights another equilibrium,
primarily driven by X2,X3,X4.
2. VECM Representation
The VECM captures both:
1. Short-Run Dynamics (via lagged differences of variables).
2. Long-Run Adjustments (via error correction terms derived from
cointegrating relationships).
For n = 4 variables and r = 2 cointegrating relationships, the VECM is:
ΔX_t = μ + α β′ X_{t−1} + Γ_1 ΔX_{t−1} + … + Γ_{k−1} ΔX_{t−k+1} + ε_t
Where:
ΔX: First differences of the variables (short-run changes).
α: Adjustment coefficients, showing how quickly each variable corrects
deviations from long-run equilibrium.
β: Cointegrating matrix (long-run relationships).
Γ: Coefficients for lagged differences (short-run dynamics).
μ: Deterministic trend or intercept.
ϵt: Error terms.
3. Estimated VECM Equations
Using the results, we can write the estimated short-run dynamics for each variable (ΔX1,
ΔX2, ΔX3, ΔX4).
General Form:
ΔX_{j,t} = μ_j + α_{j1}·ECT1_{t−1} + α_{j2}·ECT2_{t−1} + Σ_i Γ_{j,i} ΔX_{t−i} + ε_{j,t},
where ECT1 and ECT2 are the error-correction terms given by the two normalized
cointegrating equations above.
4. Interpretation of Results
Adjustment Coefficients (α):
The α_ij values indicate how strongly each variable adjusts to deviations from the
cointegrating relationships:
o A large absolute value of αij suggests that the variable responds
significantly to restore equilibrium when deviations occur.
o For example:
α11: Adjustment of X1 to the first equilibrium.
α21: Adjustment of X2 to the first equilibrium.
Short-Run Dynamics (Γi):
The Γi coefficients capture the effects of lagged differences of variables on
current changes. These coefficients explain short-term interactions between
the variables.
Cointegrating Relationships (β):
The β (normalized cointegrating coefficients) defines the long-term
equilibria binding the variables.
5. Discussion of Results
The system has 2 long-run equilibria, indicating strong and stable
relationships among the variables in the long term. These relationships
suggest economic, financial, or physical interdependencies.
Short-term deviations occur, but the adjustment coefficients show that the
system gradually restores equilibrium. Variables with larger α-values
play a stronger role in the adjustment process.
The inclusion of short-run dynamics (Γi) ensures that transient changes are
captured, improving the model's accuracy.
6. Practical Implications
Policymakers or analysts can focus on the cointegrating relationships to
understand the fundamental links among variables.
The adjustment coefficients help predict how quickly the system reacts to
shocks or deviations from equilibrium.
The short-run dynamics highlight immediate interdependencies, useful for
forecasting and intervention.
This comprehensive VECM provides insights into both long-term stability and
short-term fluctuations in the system.
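A minimal sketch of the VECM estimation in Python, assuming endog from the earlier sketches; deterministic="ci" places the intercept inside the cointegrating relation, approximating the specification used in the Johansen test above:

from statsmodels.tsa.vector_ar.vecm import VECM

# Cointegrating rank 2 from the Johansen test; k_ar_diff = levels VAR order - 1
vecm = VECM(endog, k_ar_diff=1, coint_rank=2, deterministic="ci")
vres = vecm.fit()

print(vres.beta)     # normalized cointegrating vectors (long-run relationships)
print(vres.alpha)    # adjustment (loading) coefficients
print(vres.gamma)    # short-run coefficients on lagged differences
print(vres.summary())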