Econometrics Assignment
FACULTY OF MATHEMATICS
ECONOMETRICS 2
ASSIGNMENT TOPIC:
APPLICATION OF ARIMA MODEL TO FORECAST
THE STOCK PRICE OF VINHOMES JSC
Class: Actuary 63
Teacher: Bui Duong Hai
Student: Luu Huong Nhi
ID: 1121942
Hanoi, 11/2023
Table of Contents
INTRODUCTION
METHODOLOGY
Box – Jenkins method
1. Test for Stationarity
2. Estimated models
3. Evaluating Models
4. Forecasting
DATA
1. Data collection
2. Descriptive statistics
Data Visualization
MODELS
1. Test for Stationarity – Original series
2. Test for Stationarity – Differenced series
3. Test for Stationarity – Log-transformed series
4. ACF & PACF
5. Building ARIMA models
6. Model evaluation
Test set and MAE, RMSE
7. Forecast with ARIMA
8. ARCH - GARCH models
CONCLUSION
REFERENCE
APPENDIX
INTRODUCTION
Vinhomes JSC is the leading real estate developer in Vietnam and has been a key player in
the Vietnamese market for several years. Established in 2012, the company has grown rapidly
and diversified its portfolio, which now covers a wide range of residential, commercial, and
industrial properties. With a strong focus on quality and innovation, Vinhomes JSC has built
a solid reputation in the market and has become a preferred choice for investors and
homebuyers alike.
The purpose of this report is to provide a comprehensive analysis and forecast of the stock
price of Vinhomes JSC, a prominent real estate company in Vietnam. In this report, the data
collected is the company's trading history from 1/1/2022 to 31/7/2023 and time series
analysis is used to forecast the stock price of the company. First, the company's historical
price data is examined to identify any trends or patterns. These trends and patterns are then
used to develop a forecasting model. Finally, the forecasting model is used to predict the
company's stock price in August 2023.
METHODOLOGY
1. Test for Stationarity
Before building the ARIMA model, we must assess whether the series is stationary. This is
done using the Augmented Dickey-Fuller (ADF) test.
For each form of the test regression, the test statistic is:

$$\tau = \frac{\hat{b}}{se(\hat{b})}; \quad \text{if } |\tau| > |\tau_\alpha| \text{ then reject } H_0$$

Long run mean:

$$\mu^* = \frac{\mu}{1 - \phi}$$

Stationary around trend, or deterministic trend, or trend-stationary
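The ratio above can be illustrated with a minimal Python sketch. It is a simplification rather than the procedure used in this report: it assumes the plainest Dickey-Fuller regression, dy_t = b * y_{t-1} + e_t, with no constant, no trend and no lagged differences, and it runs on simulated data. The function name `dickey_fuller_tau` and both example series are hypothetical.

```python
import math
import random


def dickey_fuller_tau(y):
    """Tau statistic for the regression dy_t = b * y_{t-1} + e_t.

    Simplifying assumptions: no constant, no trend, no lagged differences.
    """
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(x * x for x in lagged)
    # OLS slope without intercept
    b_hat = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    residuals = [d - b_hat * x for x, d in zip(lagged, diffs)]
    # Residual variance with one estimated parameter
    s2 = sum(e * e for e in residuals) / (len(diffs) - 1)
    se_b = math.sqrt(s2 / sxx)
    return b_hat / se_b


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]

# A random walk is the cumulative sum of the noise and has a unit root.
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

tau_stationary = dickey_fuller_tau(noise)  # white noise: strongly negative tau
tau_walk = dickey_fuller_tau(walk)         # random walk: tau much closer to zero

print("tau (white noise):", round(tau_stationary, 3))
print("tau (random walk):", round(tau_walk, 3))
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual t-table, because under the unit-root null hypothesis it does not follow a Student t distribution.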
2. Estimated models
ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. This
acronym is descriptive, capturing the key aspects of the model itself. Briefly, they are:
AR: Autoregression. A model that uses the dependent relationship between an
observation and some number of lagged observations.
I: Integrated. The use of differencing of raw observations (i.e. subtracting an observation
from an observation at the previous time step) in order to make the time series stationary.
MA: Moving Average. A model that uses the dependency between an observation
and residual errors from a moving average model applied to lagged observations.
The standard notation is ARIMA(p,d,q). The parameters of the ARIMA model are
defined as follows:
p: The number of lag observations included in the model, also called the lag order.
d: The number of times that the raw observations are differenced, also called the degree
of differencing.
q: The size of the moving average window, also called the order of moving average.
Two diagnostic plots can be used to help choose the p and q parameters of the ARMA or
ARIMA. They are:
Autocorrelation Function (ACF). The plot summarizes the correlation of an observation
with lag values. The x-axis shows the lag and the y-axis shows the correlation coefficient
between -1 and 1 for negative and positive correlation.
Partial Autocorrelation Function (PACF). The plot summarizes the correlations for an
observation with lag values that is not accounted for by prior lagged observations.
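As a rough illustration of what the ACF plot displays, the sketch below computes sample autocorrelations and the approximate 95% band of plus or minus 1.96/sqrt(n) that is commonly drawn on such plots. The function name `sample_acf` and the short example series are hypothetical and are not taken from the VHM data.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical series that alternates in sign, so lag 1 is strongly negative.
example = [1, -1, 1, -1, 1, -1, 1, -1]
acf = sample_acf(example, max_lag=2)

# Approximate 95% band used to judge whether a spike is significant.
band = 1.96 / math.sqrt(len(example))

for lag, rho in enumerate(acf):
    flag = "significant" if lag > 0 and abs(rho) > band else ""
    print(lag, round(rho, 3), flag)
```

A spike outside the band at lag k is the kind of "significant value" that is used when choosing candidate orders.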
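MA(q) model
The model equation where y_t depends only on the lagged forecast errors is:

$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$

AR(p) model
The model equation where y_t depends only on its own lags is:

$$y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$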
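To make the AR(p) recursion concrete, the following sketch produces multi-step forecasts by feeding each prediction back in as a lagged value, with future errors set to their expected value of zero. The function name `ar_forecast`, the constant and the coefficient are hypothetical illustration values, not estimates from this report.

```python
def ar_forecast(history, const, phis, steps):
    """Recursive forecast for y_t = const + sum(phis[i] * y_{t-1-i}) + e_t.

    Future errors e_t are replaced by zero. `history` must contain at least
    len(phis) observations, most recent last.
    """
    values = list(history)
    forecasts = []
    for _ in range(steps):
        prediction = const + sum(
            phi * values[-1 - i] for i, phi in enumerate(phis)
        )
        values.append(prediction)
        forecasts.append(prediction)
    return forecasts


# Hypothetical AR(1) with coefficient 0.5 and no constant.
forecasts = ar_forecast(history=[0.5, 1.0], const=0.0, phis=[0.5], steps=3)
print(forecasts)  # [0.5, 0.25, 0.125]
```

The forecasts shrink toward the long-run mean (zero in this example) as the horizon grows, which is the typical behaviour of a stationary AR model.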
3. Evaluating Models
After estimating the parameters of a tentatively identified ARIMA model, tests must be
conducted to verify that the model is appropriate. The checks are as follows:
Checking the residuals: If the residuals e_t are white noise, accept the model. Otherwise,
we need to start over. The tests that can be used are the Box-Pierce (BP) test or the
Ljung-Box test.
Checking the unit circle: Plot the inverse roots (poles) of the ARIMA model against the
unit circle. If all of them lie inside the unit circle, accept the model.
If more than one model passes these checks, the final model is selected using
metrics such as AIC, MAE and RMSE.
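The residual check can be sketched as follows. The Ljung-Box statistic is Q = n(n+2) * sum over k of rho_k^2 / (n-k), and for a single lag its p-value comes from a chi-square distribution with 1 degree of freedom, which the standard library can evaluate through `math.erfc`. The function names are hypothetical, and the p-value helper only covers the one-lag case; more lags would need a general chi-square routine.

```python
import math


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic using autocorrelations at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def chi2_pvalue_df1(q):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# Statistic reported by the Box-Ljung test on squared residuals in the Appendix.
p_value = chi2_pvalue_df1(0.31742)
print(round(p_value, 4))

# Hypothetical, strongly autocorrelated residuals give a large Q.
q_example = ljung_box_q([1, -1, 1, -1, 1, -1, 1, -1], max_lag=1)
print(q_example)
```

A large p-value means the white-noise hypothesis is not rejected, which is the outcome needed to accept a model.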
4. Forecasting
Once a forecasting model has been built and evaluated, it can be used to make forecasts for
future time periods. The model is first used to forecast the test set (the last 10
observations), and these forecasts are compared with the actual values. It is then used to
predict the following time periods.
DATA
1. Data collection
The historical stock price of Vinhomes JSC (VHM) from January 1, 2022 to July 31, 2023
was obtained from Vietstock. Two variables, closing price “Close” and growth rate
“Change”, were selected for forecasting. The last 10 observations were reserved for testing
and comparing models.
Variable    Description
Close       Closing price of the stock
Change      Daily growth rate of the stock price
2. Descriptive statistics
Statistic             Close (prices in thousand VND)
Mean                  58.300
Standard Error        0.535
Median                56.500
Mode                  55.000
Standard Deviation    10.580
Sample Variance       111.930
Kurtosis              -0.681
Skewness              0.470
Range                 42.077
Minimum               40.900
Maximum               82.977
Count                 391
The closing price ranges from a minimum of 40,900 VND to a maximum of 82,977 VND, a
range of 42,077 VND. The mean (58,300 VND) lies above the median (56,500 VND), which is
consistent with the positive skewness of 0.470.
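The entries of the descriptive statistics table are linked by simple identities, which can be checked with a few lines of Python. The sketch assumes that the table values are expressed in thousand VND with a decimal comma (so that 10,580 reads as 10.580); this reading is an interpretation, not something stated in the table itself.

```python
import math

# Values read from the descriptive statistics table (assumed thousand VND).
count = 391
std_dev = 10.580
minimum = 40.900
maximum = 82.977

standard_error = std_dev / math.sqrt(count)  # standard error of the mean
sample_variance = std_dev ** 2               # variance is the squared deviation
value_range = maximum - minimum              # range is max minus min

print(round(standard_error, 3))   # table reports 0.535
print(round(sample_variance, 2))  # table reports 111.930 (rounding of std_dev)
print(round(value_range, 3))      # table reports 42.077
```

The small gap between the recomputed variance and the reported one comes from using the standard deviation rounded to three decimals.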
Data Visualization
MODELS
1. Test for Stationarity – Original series
Specification   |τ|      τ (α=1%)   τ (α=5%)   τ (α=10%)   Conclusion
drift           2.2992   -3.44      -2.87      -2.57       Unit root, nonstationary
trend           1.1985   -3.98      -3.42      -3.13       Unit root, nonstationary
Figure 4. Change series plot
The results above show that the original series is non-stationary at the 5% significance level:
for both the drift and the trend specification, |τ| is smaller than the absolute 5% critical
value, so the unit-root hypothesis cannot be rejected. This means that the mean, variance, or
autocorrelation of the series changes over time. A non-stationary time series cannot be
modeled directly, so it must first be transformed into a stationary series. This can be done
by differencing the series or by differencing its logarithm.
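The two transformations can be sketched in a few lines of Python. The function names are hypothetical. The four example prices are the first actual closing prices of August 2023 listed in the forecast table of this report and are used only to show the arithmetic.

```python
import math


def difference(series):
    """First difference: y_t - y_{t-1}."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def log_returns(prices):
    """Difference of log prices: ln(p_t) - ln(p_{t-1})."""
    return difference([math.log(p) for p in prices])


prices = [62800, 61900, 60100, 63000]

diff_series = difference(prices)
log_series = log_returns(prices)

print(diff_series)                          # [-900, -1800, 2900]
print([round(r, 5) for r in log_series])
```

Both transformed series are one observation shorter than the original, and the log returns add up to the log of the ratio between the last and the first price.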
2. Test for Stationarity – Differenced series
Trend term (tt) estimate: -0.0009
3. Test for Stationarity – Log-transformed series
Trend term (tt) estimate: 1.437e-05
Table 5. Unit root test for log.series
The unit root test results indicate that the log.series is stationary at the 5% significance
level. This means that the mean, variance, and autocorrelation of the series do not change
over time, so the series can be used for forecasting and other statistical analyses. The tt term
is very small and the z.lag1 coefficient is significantly negative, which suggests that the
linear trend in the data is weak and likely to remain stable.
Since the diff.series and the log.series are stationary, the next step is to plot the
autocorrelation function (ACF) and partial autocorrelation function (PACF) to determine the
order of the ARIMA model. Candidate orders are read from the significant spikes: the PACF
suggests the AR order p and the ACF suggests the MA order q.
The absence of significant autocorrelation in the ACF and PACF plots of the differenced
series indicates that simple time series models, such as ARIMA, will likely be unable to
forecast that series accurately. In this case, it is necessary to use another stationary
transformation of the series, such as the log.series.
The figure above shows that for the log.series, the ACF and PACF plots show significant
values at lags 1 and 7. This suggests ARIMA(7,0,7) as a candidate model. Other ARIMA
models will also be tested in order to choose the best-fitting model.
The table above shows that the log.series with ARIMA(7,0,1) has the smallest AIC value,
which means that it is the best-fitting model for the data according to the AIC criterion.
The ARIMA(7,0,1) model predicts the current value of a time series from the previous 7
values and the previous error in the series. In the next step, additional tests will be
carried out to evaluate the model's performance.
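The AIC comparison can be reproduced by hand from the log likelihood. The sketch below uses the figures reported for ARIMA(7,0,7) in the Appendix (log likelihood 952.1, AIC -1872.19, BIC -1809.11). Two things are assumptions: that the parameter count includes the mean and the innovation variance, and that the sample size entering the BIC is the 381 training observations. The function names are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# ARIMA(7,0,7) with non-zero mean: 7 AR + 7 MA + mean + innovation variance.
n_params = 7 + 7 + 1 + 1
log_likelihood = 952.1   # from the Appendix
n_obs = 381              # assumed: size of the training set

aic_value = aic(log_likelihood, n_params)
bic_value = bic(log_likelihood, n_params, n_obs)

print(round(aic_value, 2))  # Appendix reports -1872.19
print(round(bic_value, 2))  # Appendix reports -1809.11
```

Because each extra parameter adds 2 to the AIC, a larger model is preferred only if it raises the log likelihood by more than 1 per added parameter.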
6. Model evaluation
The figure shows that the inverse roots of both the ARIMA(7,0,1) and ARIMA(6,0,7) models
lie inside the unit circle, meaning that both models are stable and their forecasts are
unlikely to fluctuate wildly over time. Therefore, both models are suitable for making
predictions about future values of the time series.
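A quick numerical cross-check of the unit circle plot is possible without computing any roots. If the absolute values of the AR coefficients sum to less than 1, every root of the AR polynomial lies outside the unit circle, so every inverse root lies inside it. The same bound applied to the MA coefficients gives invertibility. This is a sufficient condition only and does not replace the root plot. The coefficients below are the estimates listed in the Appendix, and the function name is hypothetical.

```python
def abs_sum(coefficients):
    """Sum of absolute values; a value below 1 guarantees inverse roots inside the unit circle."""
    return sum(abs(c) for c in coefficients)


# ARIMA(7,0,1) estimates from the Appendix.
ar_701 = [0.0191, 0.0211, -0.0501, 0.0393, -0.0626, 0.0363, 0.1169]
ma_701 = [0.1106]

# AR estimates of ARIMA(6,0,7) from the Appendix.
ar_607 = [0.1824, 0.1452, -0.5541, 0.3756, -0.2405, -0.6010]

stationary_701 = abs_sum(ar_701) < 1   # True: the bound confirms stationarity
invertible_701 = abs_sum(ma_701) < 1   # True: the bound confirms invertibility
conclusive_607 = abs_sum(ar_607) < 1   # False: the bound is inconclusive here

print(round(abs_sum(ar_701), 4), stationary_701, invertible_701)
print(round(abs_sum(ar_607), 4), conclusive_607)
```

For ARIMA(6,0,7) the bound fails, which does not mean the model is unstable; it only means that the actual roots have to be inspected, as the unit circle plot does.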
Figure 8.1. Unit Root circle for ARIMA(7,0,1) model
Homoscedasticity: Homoscedasticity is a desirable property for the residuals of a
model because it allows us to use the least squares method to estimate the model's
coefficients. If the variance of the residuals is not constant over time, then the least
squares method may not provide efficient estimates of the model's coefficients.
No autocorrelation: Autocorrelation in the residuals is a problem because it indicates
that the model has not captured all of the relevant patterns in the data. If there is
autocorrelation in the residuals, then the model's predictions are likely to be
inaccurate.
The figures above show that the residuals of the two models satisfy all three of these criteria.
This means that both models can be used to make predictions.
[Figure: Actual vs. Fitted value. x-axis: observations 1 to 10; y-axis: 55.000 to 65.000; series: Actual, Fitted]
The graph further demonstrates that the ARIMA(7,0,1) model's predicted values are quite
close to the actual data, which suggests that the model can identify underlying patterns in the
data without being overly sensitive to noise and is therefore reliable when making future
forecasts.
Date Actual value Forecast value
01/08/2023 62,800 62,929
02/08/2023 61,900 62,864
03/08/2023 60,100 62,833
04/08/2023 63,000 62,819
07/08/2023 62,900 62,777
08/08/2023 62,800 62,718
09/08/2023 60,600 62,663
10/08/2023 60,600 62,613
11/08/2023 60,900 62,558
14/08/2023 61,200 62,508
15/08/2023 61,700 62,460
16/08/2023 62,900 62,408
17/08/2023 61,000 62,354
18/08/2023 56,800 62,301
21/08/2023 56,000 62,248
22/08/2023 55,500 62,195
23/08/2023 54,500 62,143
24/08/2023 55,400 62,090
25/08/2023 54,100 62,038
28/08/2023 54,700 61,985
29/08/2023 54,600 61,932
30/08/2023 54,600 61,879
The table above shows that the forecast values deviate more from the actual values than the
forecasts for the test set did. The forecasts decline only gradually, from 62,929 on
01/08/2023 to 61,879 on 30/08/2023, whereas the actual price fell to 54,600. This is likely
due in part to the long forecast horizon. The longer the forecast horizon, the more difficult
it is to forecast accurately, as the ARIMA model assumes that the patterns in the data will
continue into the future. If the patterns in the data change, the ARIMA model's forecasts
will be less accurate.
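The size of the deviation can be quantified with MAE and RMSE computed directly from the August 2023 table. The sketch below copies the actual and forecast values from that table. Splitting the month into the first 13 and the last 9 trading days is an illustrative choice made here to show the effect of the horizon, not a step taken in the report.

```python
import math

# Actual and forecast closing prices (VND), 01/08/2023 to 30/08/2023.
actual = [62800, 61900, 60100, 63000, 62900, 62800, 60600, 60600, 60900,
          61200, 61700, 62900, 61000, 56800, 56000, 55500, 54500, 55400,
          54100, 54700, 54600, 54600]
forecast = [62929, 62864, 62833, 62819, 62777, 62718, 62663, 62613, 62558,
            62508, 62460, 62408, 62354, 62301, 62248, 62195, 62143, 62090,
            62038, 61985, 61932, 61879]


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


mae_all = mae(actual, forecast)
rmse_all = rmse(actual, forecast)
mae_early = mae(actual[:13], forecast[:13])  # 01/08 to 17/08
mae_late = mae(actual[13:], forecast[13:])   # 18/08 to 30/08

print(round(mae_all), round(rmse_all), round(mae_early), round(mae_late))
```

Over the whole month the MAE is about 3,476 VND and the RMSE about 4,593 VND, and the error in the second part of the month is several times larger than in the first part.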
The coefficients of the ARCH(1) model are small but statistically significant (see the
Appendix): the lagged squared residual explains the current volatility, but only to a small
degree. This means that the volatility of the series is persistent, but the persistence is
weak. Nevertheless, the ARCH(1) model is sufficient to capture the volatility dynamics of the
series. The Box-Ljung test on the squared residuals (p-value = 0.5732) finds no remaining
autocorrelation, so there is no need to use a more complex model such as GARCH.
Overall, the model can still be used to forecast volatility. However, because past shocks
explain only a small part of current volatility, the forecasts carry limited information
beyond the unconditional variance.
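Forecast
ARCH(1) model:

$$\sigma_t^2 = (3.271 \times 10^{-4}) + (2.894 \times 10^{-1})\,\varepsilon_{t-1}^2$$

Estimated unconditional variance:

$$\sigma^2 = \frac{\omega}{1 - \left(\sum \delta_j + \sum \gamma_j\right)} \;\Rightarrow\; \sigma^2 = \frac{3.271 \times 10^{-4}}{1 - (2.894 \times 10^{-1})}$$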
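The unconditional variance can be evaluated numerically, together with the way multi-step variance forecasts converge to it. The sketch uses the a0 and a1 estimates from the ARCH output in the Appendix. The function names and the starting variance of the forecast path are hypothetical illustration choices.

```python
import math

# Estimates from the ARCH output in the Appendix (a0 and a1).
omega = 3.271e-04
alpha1 = 2.894e-01


def arch1_variance(prev_residual):
    """Conditional variance given the previous residual."""
    return omega + alpha1 * prev_residual ** 2


unconditional_variance = omega / (1.0 - alpha1)
unconditional_sd = math.sqrt(unconditional_variance)


def variance_path(start_variance, steps):
    """Multi-step forecasts: each step replaces the unknown squared
    residual by the previous variance forecast."""
    path = []
    variance = start_variance
    for _ in range(steps):
        variance = omega + alpha1 * variance
        path.append(variance)
    return path


path = variance_path(start_variance=omega, steps=50)

print(unconditional_variance)  # about 4.603e-04
print(unconditional_sd)        # about 0.0215
print(path[:3], path[-1])
```

Since a1 is below 0.3, the gap between the forecast and the unconditional variance shrinks by more than two thirds at every step, so the path is practically flat after a few days.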
Date Forecast value
01/08/2023 -6.021e-04
02/08/2023 8.845e-05
03/08/2023 2.693e-04
04/08/2023 -8.691e-04
07/08/2023 -1.238e-03
08/08/2023 -7.726e-04
09/08/2023 -2.621e-03
10/08/2023 -2.421e-03
11/08/2023 -2.286e-03
14/08/2023 -2.182e-03
15/08/2023 -2.022e-03
16/08/2023 -1.862e-03
17/08/2023 -1.746e-03
18/08/2023 -1.565e-03
21/08/2023 -1.401e-03
22/08/2023 -1.252e-03
23/08/2023 -1.113e-03
24/08/2023 -9.869e-04
25/08/2023 -8.735e-04
28/08/2023 -7.696e-04
29/08/2023 -6.774e-04
30/08/2023 -5.954e-04
Table 9. Forecast volatility using ARCH(1) model
CONCLUSION
This report uses time series analysis to forecast the future stock price of Vinhomes JSC.
The data set covers January 2022 to July 2023 and contains 391 observations, of which 381
are used as the training set and the remaining 10 as the test set.
Using the Box-Jenkins method, the ARIMA(7,0,1) model was identified as the best-fitting
model, suggesting that the current value of the variable depends on its past 7 values and the
previous error. However, even with a 95% confidence interval, the forecast values are not
very close to the actual data.
Additionally, an ARCH(1) model was used to predict the volatility of the series. The small
coefficient of the ARCH(1) model suggests that the volatility of the stock price is weakly
persistent, meaning that past volatility has a small but statistically significant impact
on future volatility. It is therefore important to use caution and to be aware of the
limitations of volatility forecasting.
It is also essential to note that all stock price forecasts are uncertain because the stock
market is a complex and unpredictable system. Many factors can affect stock prices and the
ARIMA model is not perfect, so it is impossible to predict the stock price of Vinhomes JSC
with certainty. The accuracy of the ARIMA model's forecasts will also depend on the future
performance of the stock market: if the market performs unexpectedly well or poorly, the
model's forecasts will be less accurate. As a result, investors should consider multiple
factors and use stock price forecasts with caution.
REFERENCE
[1] Edward W. Frees (2010). Regression Modeling with Actuarial and Financial
Applications, Cambridge University Press, Chapters 7, 8, 9.
[2] Bui Quang Trung (2010). "Application of the ARIMA model to forecast VNIndex",
Collection of Reports of the 7th Student Science Research Conference, University of Da
Nang.
[3] Al-Zeaud, H. A. (2011). “Modelling and forecasting volatility using ARIMA model”,
European Journal of Economics, Finance and Administrative Sciences, 35.
[4] Fahim Faisal (2012). “Forecasting Bangladesh's Inflation Using Time Series ARIMA
Models”, World Review of Business Research Vol. 2, No. 3.
[5] Md. Zahangir Alam, M.N. Siddikee (2013). “Forecasting Volatility of Stock Indices with
ARCH Model”, International Journal of Financial Research Vol. 4, No. 2.
APPENDIX
ARIMA models
Series: log.return.price
ARIMA(7,0,7) with non-zero mean
Coefficients:
         ar1     ar2      ar3     ar4      ar5      ar6      ar7     ma1      ma2     ma3      ma4
      0.0718  0.0697  -0.4444  0.2598  -0.2416  -0.5214  -0.1790  0.0383  -0.0442  0.4070  -0.1781
s.e.  0.2773  0.2377   0.2141  0.2219   0.1948   0.1919   0.1949  0.2658   0.2148  0.1963   0.2097
ma5 ma6 ma7 mean
0.1855 0.6098 0.3986 -0.0008
s.e. 0.1843 0.1750 0.1992 0.0012
sigma^2 = 0.0004065: log likelihood = 952.1
AIC=-1872.19 AICc=-1870.7 BIC=-1809.11
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -3.15844e-06 0.01976068 0.01416311 NaN Inf 0.7064495 -0.000272118
Series: log.return.price
ARIMA(7,0,1) with non-zero mean
Coefficients:
ar1 ar2 ar3 ar4 ar5 ar6 ar7 ma1 mean
0.0191 0.0211 -0.0501 0.0393 -0.0626 0.0363 0.1169 0.1106 -0.0009
s.e. 0.3318 0.0674 0.0510 0.0533 0.0526 0.0555 0.0531 0.3326 0.0013
Series: log.return.price
ARIMA(6,0,7) with non-zero mean
Coefficients:
         ar1     ar2      ar3     ar4      ar5      ar6      ma1      ma2     ma3      ma4     ma5
      0.1824  0.1452  -0.5541  0.3756  -0.2405  -0.6010  -0.0671  -0.1140  0.5130  -0.2919  0.1905
s.e.  0.2867  0.3244   0.2730  0.1810   0.1929   0.2092   0.2852   0.3027  0.2574   0.1745  0.1862
ma6 ma7 mean
0.6850 0.2165 -0.0008
s.e. 0.1964 0.0693 0.0013
ARCH model
Call:
garch(x = return1, order = c(0, 1))
Model:
GARCH(0,1)
Residuals:
Min 1Q Median 3Q Max
-3.15342 -0.52043 0.01768 0.39046 3.62983
Coefficient(s):
Estimate Std. Error t value Pr(>|t|)
a0 3.271e-04 2.342e-05 13.968 < 2e-16 ***
a1 2.894e-01 7.254e-02 3.989 6.63e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Diagnostic Tests:
Jarque Bera Test
data: Residuals
X-squared = 60.11, df = 2, p-value = 8.86e-14
Box-Ljung test
data: Squared.Residuals
X-squared = 0.31742, df = 1, p-value = 0.5732