Assignment Final
ASSIGNMENT
ECONOMETRICS 2
Hanoi, 2023
I. Introduction
Predicting the future movement of stock prices and their volatility is a common area of interest
for both financial professionals and academics. For investors, understanding the future behavior
of stock prices allows them to make informed decisions about how to allocate their assets in
order to maximize their returns. Similarly, financial scholars believe that accurately predicting
the prices of capital assets is crucial for developing more precise asset pricing theories. Over the
years, researchers have developed a variety of quantitative modeling techniques to forecast stock
prices and volatility, including combinations of ARIMA and GARCH models.
Within the scope of the Econometrics II course, this study applies ARIMA and GARCH models to predict the closing prices and volatility of Binh Duong Mineral and Construction JSC's shares (KSB) over 10 trading days between August 1st, 2023 and August 14th, 2023.
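The ARIMA(p, d, q) model can be written as

$$Y_t = \mu + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

where $Y_t$ is the stationary series obtained by differencing an I(d) series d times, $\mu$ is the intercept, $\varepsilon_t$ is the shock to $Y_t$ at time t, p is the order of the AR part and $\phi_1, \dots, \phi_p$ are its coefficients, and q is the order of the MA part and $\theta_1, \dots, \theta_q$ are its coefficients. The parameters p, d and q must be non-negative integers. If d = 0, the ARIMA(p, d, q) model reduces to an ARMA(p, q) model.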
In business and finance, ARIMA models are widely used to forecast future quantities and prices because of their simplicity and their ability to handle non-stationary series through differencing.
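To make the ARIMA equation concrete, the sketch below computes a one-step-ahead forecast from an ARMA(p, q) equation of the form Y_t = µ + ϕ_1 Y_{t−1} + … + ϕ_p Y_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}, and an ARIMA(p, 1, q) forecast obtained by adding the predicted change back to the last observed level. It is a minimal illustration under that sign convention, not the estimation procedure used in this study; the function names and all numbers are hypothetical.

```python
def arma_one_step(y, eps, mu, phi, theta):
    """One-step-ahead forecast of Y_{T+1} from an ARMA(p, q) equation.

    y and eps hold past values and past shocks, oldest first.
    The future shock eps_{T+1} has expected value 0, so it drops out.
    """
    ar_part = sum(phi[i] * y[-1 - i] for i in range(len(phi)))        # phi_1*Y_T + ... + phi_p*Y_{T-p+1}
    ma_part = sum(theta[j] * eps[-1 - j] for j in range(len(theta)))  # theta_1*eps_T + ...
    return mu + ar_part + ma_part


def arima_d1_one_step(levels, eps, mu, phi, theta):
    """ARIMA(p, 1, q): apply the ARMA equation to first differences, then undo the differencing."""
    diffs = [b - a for a, b in zip(levels[:-1], levels[1:])]
    return levels[-1] + arma_one_step(diffs, eps, mu, phi, theta)


# Hypothetical ARIMA(1, 1, 1) with mu = 0, phi_1 = 0.5, theta_1 = 0.2
prices = [100.0, 102.0, 101.0, 103.0]   # first differences: 2, -1, 2
shocks = [1.0, -1.5, 1.0]               # past shocks of the differenced series
print(arima_d1_one_step(prices, shocks, mu=0.0, phi=[0.5], theta=[0.2]))
```

Here the predicted change is 0.5 × 2 + 0.2 × 1 = 1.2, so the price forecast is 103 + 1.2.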
II.2. ARCH - GARCH Models
The ARCH (autoregressive conditional heteroskedasticity) and GARCH (generalized
autoregressive conditional heteroskedasticity) models are statistical models used to model
time series data with time-varying volatility. In other words, they are used to model time series
data where the variance of the error term is not constant and instead changes over time.
ARCH models were developed by Robert Engle in 1982 to model the volatility of financial returns. The ARCH model has an autoregressive structure, but in the squared error terms rather than in the series itself: the conditional variance of the current error term is a linear function of past squared errors. The ARCH coefficients are the weights given to these past squared errors, so large recent shocks raise the current variance. This is how the model lets the variance of the error term change over time and reproduces the volatility clustering commonly observed in financial returns.
GARCH models were developed by Tim Bollerslev in 1986 as an extension of the ARCH model. The GARCH model adds lagged conditional variances to the variance equation, so the current variance depends on past variances as well as on past squared errors. This allows GARCH to capture persistent volatility with only a few parameters, whereas a pure ARCH model would need many lags to do the same.
The ARCH(p) model:

$$\sigma_t^2 = w + \gamma_1 \varepsilon_{t-1}^2 + \cdots + \gamma_p \varepsilon_{t-p}^2 + v_t$$

where $w > 0$ is a constant that determines the long-run variance $w / (1 - \sum_{j=1}^{p} \gamma_j)$, $v_t$ is the residual at time t, p is the number of lagged squared errors included in the model, and $\gamma_1, \dots, \gamma_p$ are non-negative regression coefficients such that $\sum_{j=1}^{p} \gamma_j < 1$. In addition, $\varepsilon_t$ conditional on its lagged values $\varepsilon_{t-1}, \dots, \varepsilon_{t-p}$ is assumed to be normally distributed.
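A minimal sketch of the ARCH(p) variance equation, with $v_t$ set to its mean of zero to obtain the expected conditional variance. It also enforces the coefficient restrictions stated above and computes the implied long-run variance. The function names and parameter values are hypothetical.

```python
def arch_variance(eps_past, w, gammas):
    """Expected sigma2_t = w + gamma_1*eps_{t-1}**2 + ... + gamma_p*eps_{t-p}**2 (v_t set to 0).

    eps_past holds past shocks, oldest first; gammas = [gamma_1, ..., gamma_p].
    """
    p = len(gammas)
    if len(eps_past) < p:
        raise ValueError("need at least p past shocks")
    if w <= 0 or any(g < 0 for g in gammas) or sum(gammas) >= 1:
        raise ValueError("need w > 0, gamma_j >= 0 and sum(gamma_j) < 1")
    recent = reversed(eps_past[-p:])   # eps_{t-1}, eps_{t-2}, ..., eps_{t-p}
    return w + sum(g * e ** 2 for g, e in zip(gammas, recent))


def arch_long_run_variance(w, gammas):
    """Unconditional variance implied by the ARCH(p) model: w / (1 - sum(gamma_j))."""
    return w / (1 - sum(gammas))


# Hypothetical ARCH(2): w = 0.0001, gamma_1 = 0.3, gamma_2 = 0.2
print(arch_variance([0.01, -0.02], w=0.0001, gammas=[0.3, 0.2]))
print(arch_long_run_variance(0.0001, [0.3, 0.2]))
```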
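The GARCH(p, q) model:

$$\sigma_t^2 = w + \delta_1 \sigma_{t-1}^2 + \cdots + \delta_p \sigma_{t-p}^2 + \gamma_1 \varepsilon_{t-1}^2 + \cdots + \gamma_q \varepsilon_{t-q}^2 + v_t$$

where p is the number of lagged conditional variances, q is the number of lagged squared errors, and $\delta_1, \dots, \delta_p$ and $\gamma_1, \dots, \gamma_q$ are non-negative regression coefficients such that $\sum_{i=1}^{p} \delta_i + \sum_{j=1}^{q} \gamma_j < 1$.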
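The recursion can be sketched as a filter that runs the GARCH(p, q) variance equation through a series of shocks, again with $v_t$ set to zero. The pre-sample values (all set to a user-chosen `sigma2_init`) are an assumption of this sketch; estimation software typically uses a sample-variance or backcast starting value. The function name and numbers are hypothetical.

```python
def garch_filter(eps, w, deltas, gammas, sigma2_init):
    """Run the GARCH(p, q) variance equation through a shock series (v_t set to 0).

    deltas = [delta_1, ..., delta_p] multiply lagged variances,
    gammas = [gamma_1, ..., gamma_q] multiply lagged squared shocks.
    Pre-sample variances and squared shocks are set to sigma2_init.
    Returns len(eps) + 1 values: sigma2 for each eps[k], then the one-step forecast.
    """
    coeffs = deltas + gammas
    if w <= 0 or not coeffs or min(coeffs) < 0 or sum(coeffs) >= 1:
        raise ValueError("need w > 0, non-negative coefficients and sum < 1")
    m = max(len(deltas), len(gammas))
    sq = [sigma2_init] * m + [e ** 2 for e in eps]   # squared shocks with pre-sample padding
    sigma2 = [sigma2_init] * m                       # conditional variances with padding
    for t in range(m, len(sq) + 1):
        s = w
        s += sum(d * sigma2[t - 1 - i] for i, d in enumerate(deltas))
        s += sum(g * sq[t - 1 - j] for j, g in enumerate(gammas))
        sigma2.append(s)
    return sigma2[m:]


# Hypothetical GARCH(1, 1): w = 0.1, delta_1 = 0.8, gamma_1 = 0.1
print(garch_filter([1.0, 2.0], w=0.1, deltas=[0.8], gammas=[0.1], sigma2_init=1.0))
```

The large second shock (2.0) raises the next-period variance, which is the volatility-clustering effect described above.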
Let {ksb_t} be the series of closing prices of KSB shares, {gksb_t} the series of growth rates of KSB shares, and {lksb_t} the series of log returns on KSB shares. A summary of statistics and the time series plots for all three series in the data set are provided below:
Table 1. Descriptive Statistics
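The growth rate and log return series can be built from the closing prices as sketched below. The definitions g_t = (P_t − P_{t−1}) / P_{t−1} and r_t = ln(P_t / P_{t−1}) are the standard ones and are assumed here; the price values are hypothetical.

```python
import math


def growth_rates(prices):
    """g_t = (P_t - P_{t-1}) / P_{t-1}, one value per day after the first."""
    return [(cur - prev) / prev for prev, cur in zip(prices[:-1], prices[1:])]


def log_returns(prices):
    """r_t = ln(P_t / P_{t-1})."""
    return [math.log(cur / prev) for prev, cur in zip(prices[:-1], prices[1:])]


ksb = [20000.0, 21000.0, 20580.0]   # hypothetical closing prices
gksb = growth_rates(ksb)
lksb = log_returns(ksb)
print(gksb, lksb)
```

Both transformed series are one observation shorter than the price series, since the first day has no previous price.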
III.2. Methodology
Predicting KSB stock prices using ARIMA models
In this study, I apply the Box-Jenkins method to choose the most suitable ARIMA model for each of the three series (closing price, growth rate and log return), and then use these models to forecast future values of KSB stock.
Figure 4. Box-Jenkins Methods
Step 1: Splitting the data
My dataset will be divided into two parts: a training set and a validation set. The validation set contains the 10 observations from August 1st, 2023 to August 14th, 2023, while the training set consists of the rest of the series. This allows the forecast prices to be compared with the actual prices.
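A minimal sketch of this split, assuming the observations are stored oldest first so that the validation window is simply the last 10 entries. The function name and the price list are hypothetical.

```python
def train_validation_split(series, n_valid=10):
    """Keep the last n_valid observations (oldest-first ordering) for validation."""
    if not 0 < n_valid < len(series):
        raise ValueError("n_valid must be between 1 and len(series) - 1")
    return series[:-n_valid], series[-n_valid:]


# Hypothetical 30-day price list; the last 10 entries play the role of Aug 1-14, 2023
prices = [float(p) for p in range(100, 130)]
train, valid = train_validation_split(prices)
print(len(train), len(valid))
```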
Step 2: Testing for stationarity
I will use the Dickey-Fuller (DF) test to check for stationarity. The null hypothesis is that the series has a unit root and is therefore non-stationary. Because the test is left-tailed, the null hypothesis is rejected when the DF statistic is smaller (more negative) than the critical value, i.e. when $|\tau_{stat}| > |\tau_{0.05}|$; the series is then considered stationary. Afterwards, I will plot the ACF and PACF of the stationary series to determine the model orders: the AR order p is suggested by the last lag at which the PACF is significant, and the MA order q by the last lag at which the ACF is significant.
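For the simplest specification (no drift, no trend), the DF statistic is the t-ratio of ρ in the regression ΔY_t = ρ Y_{t−1} + e_t, estimated by OLS without an intercept. The sketch below computes it in plain Python; the function name and series are hypothetical, and the resulting τ must be compared with Dickey-Fuller critical values (such as the −1.95 used for the no-drift case in Table 2), not normal ones. Packaged routines such as statsmodels' `adfuller` also cover the drift and trend cases through their `regression` argument.

```python
import math


def df_tau_no_drift(y):
    """DF statistic for  dY_t = rho * Y_{t-1} + e_t  (no intercept, no trend).

    H0: rho = 0 (unit root). tau = rho_hat / se(rho_hat).
    """
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    rho_hat = sum(x * d for x, d in zip(lagged, diffs)) / sxx   # OLS slope through the origin
    resid = [d - rho_hat * x for x, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in resid) / (len(diffs) - 1)          # one estimated parameter
    return rho_hat / math.sqrt(s2 / sxx)


# Hypothetical short series; compare tau with the 5% no-drift critical value of -1.95
tau = df_tau_no_drift([1.0, 2.0, 1.0, 2.0, 1.0])
print(tau, "reject H0" if tau < -1.95 else "cannot reject H0")
```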
Step 3: Estimation
Having chosen candidate orders, I will estimate the intercept and coefficients of each ARIMA model. Based on the number of significant coefficients and the Akaike information criterion (AIC), I will then select the optimal models.
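The AIC part of this selection rule can be sketched as follows, using AIC = 2k − 2 ln L (smaller is better). This covers only the AIC criterion; the coefficient-significance criterion is a separate judgement. The function names are hypothetical, and the AIC values are the ones reported for the stock price series in Table 5.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(aic_by_model):
    """Return the name of the model with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# AIC values for the stock price series, as reported in Table 5
table5_aic = {"ARIMA(4, 1, 2)": 6318.79, "ARIMA(6, 1, 2)": 6322.98, "ARIMA(4, 1, 4)": 6321.61}
print(select_by_aic(table5_aic))
```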
Step 4: Diagnostic checking
Firstly, I will plot the inverse roots of the AR and MA polynomials against the unit circle. If all inverse roots lie within the unit circle, the AR part of the ARIMA model is stationary and the MA part is invertible.
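For a part with two lags, the inverse roots can be computed in closed form with the quadratic formula. The sketch below (hypothetical function names) checks stationarity of an AR(2) part written as 1 − ϕ_1 z − ϕ_2 z² and invertibility of an MA(2) part written as 1 + θ_1 z + θ_2 z² (a plus-sign convention; some texts use minus). For the higher orders used in this study, `numpy.roots` applied to the characteristic polynomial would be the usual tool.

```python
import cmath


def inverse_roots_deg2(c1, c2):
    """Inverse roots of 1 - c1*z - c2*z**2, i.e. the roots of lam**2 - c1*lam - c2 = 0."""
    disc = cmath.sqrt(c1 * c1 + 4 * c2)
    return (c1 + disc) / 2, (c1 - disc) / 2


def ar2_is_stationary(phi1, phi2):
    """AR part 1 - phi1*z - phi2*z**2: stationary if every inverse root has modulus < 1."""
    return all(abs(r) < 1 for r in inverse_roots_deg2(phi1, phi2))


def ma2_is_invertible(theta1, theta2):
    """MA part 1 + theta1*z + theta2*z**2 = 1 - (-theta1)*z - (-theta2)*z**2."""
    return all(abs(r) < 1 for r in inverse_roots_deg2(-theta1, -theta2))


print(ar2_is_stationary(0.5, 0.3))   # inverse roots about 0.85 and -0.35
print(ar2_is_stationary(0.5, 0.6))   # one inverse root above 1
print(ma2_is_invertible(0.0, 0.5))   # complex pair with modulus about 0.71
```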
Secondly, I will use the ACF and PACF correlograms of the residual series to check for remaining autocorrelation and partial autocorrelation. If no spike is significant at any lag, the residuals show no autocorrelation or partial autocorrelation. If, however, there are significant spikes at lags below 10 in either correlogram, the model is considered misspecified and is discarded.
Finally, I will apply the Box-Ljung test, whose null hypothesis is that the residual series is white noise. If the p-value of the test is higher than 5%, the null hypothesis cannot be rejected and the residual series can be treated as white noise.
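The Box-Ljung statistic is Q = n(n + 2) Σ_{k=1}^{h} ρ̂_k² / (n − k), compared with a chi-square distribution with h degrees of freedom, often reduced by the number of fitted ARMA coefficients. The stdlib-only sketch below computes Q and its p-value; the chi-square tail uses the series expansion of the regularized incomplete gamma function, where `scipy.stats.chi2.sf` would be the usual packaged choice. Function names and the test series are hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def ljung_box(resid, h=10, fitted_params=0):
    """Return (Q, p-value) for H0: residuals are white noise up to lag h.

    fitted_params (e.g. p + q) is subtracted from the degrees of freedom.
    """
    n = len(resid)
    rho = acf(resid, h)
    q_stat = n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
    return q_stat, chi2_sf(q_stat, h - fitted_params)


# Strongly alternating (clearly autocorrelated) hypothetical residuals
q_stat, p_value = ljung_box([1.0, -1.0] * 10, h=1)
print(q_stat, p_value < 0.05)
```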
Step 5: Forecasting
From the models in the comparison table, I will choose three candidate models based on three criteria: the AR part is stationary and the MA part is invertible, the residual series is white noise and, most importantly, the AIC value is the smallest. I will then use these models to forecast the 10 observations in the validation set. For the growth rate and log return series, I will apply the following formulas to convert the forecasts into KSB stock prices for the last 10 days:
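Assuming the standard definitions of the growth rate and the log return, forecast values can be turned back into prices by chaining from the last observed closing price, as sketched below. The recursive chaining (each step starts from the previous forecast price) is an assumption of this sketch; a static variant would start each step from the previous actual price instead. Function names and numbers are hypothetical.

```python
import math


def prices_from_growth(last_price, growth_forecasts):
    """Chain P_{T+h} = P_{T+h-1} * (1 + g_{T+h}) from the last observed price."""
    prices, p = [], last_price
    for g in growth_forecasts:
        p *= 1 + g
        prices.append(p)
    return prices


def prices_from_log_returns(last_price, return_forecasts):
    """Chain P_{T+h} = P_{T+h-1} * exp(r_{T+h}) from the last observed price."""
    prices, p = [], last_price
    for r in return_forecasts:
        p *= math.exp(r)
        prices.append(p)
    return prices


# Hypothetical last training-set price and two forecast values
print(prices_from_growth(20000.0, [0.01, -0.005]))
print(prices_from_log_returns(20000.0, [0.01, -0.005]))
```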
Table 2. Comparison between DF test statistics and critical values for the stock price series
               Without drift          With drift             With trend
               τ_stat     τ_0.05      τ_stat     τ_0.05      τ_stat     τ_0.05
Stock price    -1.1962    -1.95       -1.9427    -2.87       -1.2526    -3.42
As we can see, $|\tau_{stat}| < |\tau_{0.05}|$ in all three cases of the DF test (without drift, with drift and with trend), so the null hypothesis of a unit root cannot be rejected. Therefore, at the 5% significance level, we conclude that the stock price series is non-stationary.
Table 3. Comparison between DF test statistics and critical values for the first difference, growth rate and log return series
                    Without drift          With drift             With trend
                    τ_stat     τ_0.05      τ_stat     τ_0.05      τ_stat     τ_0.05
First difference    -13.4043   -1.95       -13.414    -2.87       -13.5548   -3.42
Growth rate         -13.9902   -1.95       -13.9757   -2.87       -14.0994   -3.42
Log return          -13.9262   -1.95       -13.9246   -2.87       -14.0577   -3.42
Table 4. DF test coefficient estimation results
Series            Term            Without drift    With drift        With trend
1st difference    Intercept       n/a              -34.22125         -180.22590 *
                  Lagged value    -0.87599 ***     -0.87837 ***      -0.89230 ***
                  Time trend      n/a              n/a               0.74219 *
Growth rate       Intercept       n/a              -0.0004194        -0.005089
                  Lagged value    -0.90812 ***     -0.9084152 ***    -0.9206 ***
                  Time trend      n/a              n/a               0.0000238
Log return        Intercept       n/a              -0.0009087        -0.005765 *
                  Lagged value    -0.90225 ***     -0.9036489 ***    -0.9166 ***
                  Time trend      n/a              n/a               0.000025 *
*, **, ***: significant at the 10%, 5% and 1% levels, respectively.
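From Tables 3 and 4, every $\tau_{stat}$ lies far below its 5% critical value, and no intercept or trend term is significant at the 5% level. We can therefore conclude, at the 5% significance level, that the first difference, growth rate and log return series are stationary around 0.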
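The decision rule behind these conclusions can be replayed on the reported statistics. The sketch below copies the τ statistics and 5% critical values from Tables 2 and 3; treating a series as stationary only when the unit root is rejected in all three DF specifications is a simple summary rule assumed here, and the names are hypothetical.

```python
# tau statistics and 5% critical values as reported in Tables 2 and 3
CRITICAL_5PCT = {"without drift": -1.95, "with drift": -2.87, "with trend": -3.42}
TAU_STATS = {
    "stock price":      {"without drift": -1.1962,  "with drift": -1.9427,  "with trend": -1.2526},
    "first difference": {"without drift": -13.4043, "with drift": -13.414,  "with trend": -13.5548},
    "growth rate":      {"without drift": -13.9902, "with drift": -13.9757, "with trend": -14.0994},
    "log return":       {"without drift": -13.9262, "with drift": -13.9246, "with trend": -14.0577},
}


def rejects_unit_root(tau, critical):
    """Left-tailed DF decision: reject H0 (unit root) when tau < critical value."""
    return tau < critical


def is_stationary(series):
    """Assumed summary rule: stationary only if H0 is rejected in all three specifications."""
    return all(rejects_unit_root(TAU_STATS[series][case], crit)
               for case, crit in CRITICAL_5PCT.items())


for name in TAU_STATS:
    print(name, "stationary" if is_stationary(name) else "unit root not rejected")
```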
Autocorrelation and Partial Autocorrelation
Figure 5. ACF and PACF correlograms of 1st difference series
From the ACF and PACF correlograms of the first difference series, the candidate AR orders are p = 1, 2, 4, 6 and the candidate MA orders are q = 1, 2, 4, 6.
Figure 6. ACF and PACF correlograms of growth rate series
From the ACF and PACF correlograms of the growth rate series, the candidate AR orders are p = 1, 2, 3, 6 and the candidate MA orders are q = 1, 2, 3, 6.
Figure 7. ACF and PACF correlograms of log return series
From the ACF and PACF correlograms of the log return series, the candidate AR orders are p = 1, 2, 3, 4, 6 and the candidate MA orders are q = 1, 2, 3, 4, 6.
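Reading the correlograms amounts to flagging the lags whose sample ACF or PACF falls outside the approximate 95% band ±1.96/√n. The dependency-free sketch below computes the sample ACF and obtains the PACF through the Durbin-Levinson recursion, one standard way to compute it; statistical packages may use slightly different estimators. Function names and the test series are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations at lags 1..max_lag (Durbin-Levinson recursion)."""
    rho = [1.0] + sample_acf(x, max_lag)
    pacf, phi_prev = [], []
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def significant_lags(values, n, z=1.96):
    """Lags whose (partial) autocorrelation lies outside the approximate 95% band +/- z/sqrt(n)."""
    band = z / math.sqrt(n)
    return [k for k, v in enumerate(values, start=1) if abs(v) > band]


x = [1.0, -1.0] * 50   # hypothetical, strongly alternating series
print(significant_lags(sample_acf(x, 6), len(x)), significant_lags(sample_pacf(x, 6), len(x)))
```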
Fitting models
Table 5. Three best ARIMA model specifications for the stock price series
                                    ARIMA(4, 1, 2)     ARIMA(6, 1, 2)     ARIMA(4, 1, 4)
Coefficients
  Significant coefficients          5/6                4/8                4/8
Stationarity
  Inverse roots in unit circle      All within         All within         All within
Residual diagnostics
  ACF (significant lags)            None               None               None
  p-value of Box-Ljung test         0.2366             0.34               0.2462
Information criteria
  AIC                               6318.79            6322.98            6321.61
Table 5 shows that ARIMA(4, 1, 2) has the highest share of significant coefficients (5 of 6) and, most importantly, the smallest AIC (6318.79). I therefore choose ARIMA(4, 1, 2) to forecast the stock price.
Figure 8. Unit circle for ARIMA (4, 1, 2)
Figure 8 shows that all inverse roots lie inside the unit circle, so the fitted ARMA part for the first difference series is stationary and invertible.
For ARIMA(4, 1, 2), the p-value of the Box-Ljung test is 23.66%, larger than both the 5% and 10% significance levels, so the null hypothesis cannot be rejected and the residual series of the model is white noise.
Table 6. Three best ARIMA model specifications for the growth rate series
                                    ARIMA(1, 0, 1)     ARIMA(2, 0, 2)     ARIMA(2, 0, 3)
Coefficients
  Significant coefficients          1/2                4/4                5/5
Stationarity
  Inverse roots in unit circle      All within         All within         All within
Residual diagnostics
  ACF (significant lags)            None               None               None
  p-value of Box-Ljung test         0.6167             0.527              0.6649
Information criteria
  AIC                               -1524.95           -1528.56           -1530.19
Table 6 shows that ARIMA(2, 0, 3) has the most significant coefficients (all 5 of 5) and the smallest AIC (-1530.19), so it is the most suitable model for forecasting the growth rate.
Figure 10 shows that all inverse roots lie inside the unit circle, so the fitted model for the growth rate series is stationary and invertible.
Figure 13 shows that all inverse roots lie inside the unit circle, so the fitted model for the log return series is stationary and invertible.
Similarly, I carry out the same forecasting procedure with ARIMA(2, 0, 3) for the growth rate series.
Table 9. Forecasting results for stock price using ARIMA (2, 0, 3)
From Table 10, the RMSE and MAE of ARIMA(6, 0, 2) are smaller than those of the other two models. Therefore, ARIMA(6, 0, 2) fitted to the log return series can be considered the best forecasting model. The equation of ARIMA(6, 0, 2) is:
From Table 11, the forecasting results have errors of less than 5%, which is within acceptable limits.
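The accuracy measures used in this comparison can be sketched as follows: RMSE and MAE in price units, plus the mean absolute percentage error and a per-day check against a 5% limit. Interpreting the "5% error" as an absolute percentage error per forecast day is an assumption of this sketch; the function names and numbers are hypothetical.

```python
import math


def forecast_accuracy(actual, forecast):
    """RMSE, MAE and MAPE (in %) of forecasts against actual values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def within_limit(actual, forecast, limit_pct=5.0):
    """True if every absolute percentage error is below limit_pct (assumed reading of the 5% rule)."""
    return all(abs((a - f) / a) * 100 < limit_pct for a, f in zip(actual, forecast))


# Hypothetical actual and forecast closing prices
print(forecast_accuracy([20000.0, 20500.0], [20300.0, 20100.0]))
print(within_limit([20000.0, 20500.0], [20300.0, 20100.0]))
```

RMSE penalises large misses more heavily than MAE, which is why the two are usually reported together.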
GARCH(1, 1) has the largest number of significant coefficients. Its δ and γ coefficients are non-negative, and their sum (0.904 + 0.07918 = 0.98318) is smaller than 1. Therefore, GARCH(1, 1) is the best model for forecasting the volatility of KSB's log return.
The estimated GARCH(1, 1) model is:

$$\sigma_t^2 = 0.00001563 + 0.07918\,\varepsilon_{t-1}^2 + 0.904\,\sigma_{t-1}^2 + v_t$$
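Using the fitted coefficients, the persistence is γ_1 + δ_1 = 0.98318 and the implied long-run variance is w / (1 − 0.98318) ≈ 0.000929, i.e. a daily volatility of about 3%. The sketch below produces a one-step variance update and the standard multi-step GARCH(1, 1) forecast σ²_{T+h} = s² + (γ_1 + δ_1)^{h−1}(σ²_{T+1} − s²), where s² is the long-run variance. The last-day variance and shock are hypothetical inputs, as are the function names; v_t is set to zero.

```python
import math

W, GAMMA1, DELTA1 = 0.00001563, 0.07918, 0.904   # coefficients of the fitted GARCH(1, 1)


def garch11_next_variance(sigma2_prev, eps_prev, w=W, gamma1=GAMMA1, delta1=DELTA1):
    """sigma2_t = w + gamma1 * eps_{t-1}**2 + delta1 * sigma2_{t-1} (v_t set to 0)."""
    return w + gamma1 * eps_prev ** 2 + delta1 * sigma2_prev


def garch11_forecast(sigma2_next, horizon, w=W, gamma1=GAMMA1, delta1=DELTA1):
    """sigma2_{T+h} = s2 + (gamma1 + delta1)**(h - 1) * (sigma2_{T+1} - s2),
    where s2 = w / (1 - gamma1 - delta1) is the long-run variance."""
    persistence = gamma1 + delta1
    if persistence >= 1:
        raise ValueError("gamma1 + delta1 must be below 1")
    long_run = w / (1 - persistence)
    return [long_run + persistence ** (h - 1) * (sigma2_next - long_run)
            for h in range(1, horizon + 1)]


# Hypothetical last-day conditional variance and log-return shock
sigma2_T, eps_T = 0.0008, -0.03
sigma2_next = garch11_next_variance(sigma2_T, eps_T)
path = garch11_forecast(sigma2_next, horizon=10)    # 10 trading days, as in the validation window
print([round(math.sqrt(v), 4) for v in path])       # daily volatility forecasts
```

Because the persistence is close to 1, the forecasts approach the long-run level only slowly, which is typical of daily stock returns.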