Gold Price Estimation Using A Multi Variable Model
Abstract: Stock market analysis is a very popular area of research. Achieving good prediction in forecasting the stock markets is a very challenging task. The prediction of the future stock markets is done using cascading statistical models. This paper investigates the MCX commodity (Gold) on which the model is applied.

Objective: Predicting the trend of the gold commodity reduces future uncertainty for investors.

Prediction techniques employed: Multi Variable Linear regression model, Time series models, Skewness and Kurtosis.

Finding: The extensive analysis of the attributes gives a broader view of gold as a product and of its importance as an investment segment, since it keeps money in step with the real-time drift in the price and its fluctuations. The data collected consists of commodity prices and volumes over the last 5 years on a monthly basis. The experimental results give the predicted future values of the commodities.

Inference: Prediction can also be valuable in planning viable developments and provides a clear outlook for the future market.

Keywords: Kurtosis, Skewness, Multi Variable Linear regression model, RMSE, Demonetization.
Keywords-Surveillance, Classification, SVM, CNN, PCA, LDA, HMM, K-NN, Optical Flow.

investment [2]. Hidden Markov Model, Similarity Index, Crisp Methodologies and Machine learning algorithms have been widely used in the field of prediction for the past one and a half decades [1][3][5]. Through similarity measures, it is possible to identify the richness and evenness of the attributes that contribute to the prediction. Dead stocks and fast moving items are predicted through association rules or market basket analysis [4].

Data Mining Methodologies are used extensively to analyse portfolios, risk management and trends. Clustering algorithms are used to find the net trading volume, price volatility and return volatility ratios [5]. Nowadays the predictors' attention turns to other segments like market movements, equity, the commodity market and Euro forex rates. For these, large scale microblog data and search behaviour provide insight into market movements [6].

Intraday stock price forecasting can be done using ANN (Artificial Neural Networks) [7]. Stock price forecasting can also be done using Machine Learning techniques like Support Vector Machines [8].
Authorized licensed use limited to: UNIVERSITAS GADJAH MADA. Downloaded on May 30,2021 at 07:38:35 UTC from IEEE Xplore. Restrictions apply.
Another study examined the impact of Twitter messages (exogenous inputs) on stock prices using a Linear Regression model. The results showed a correlation between the daily tweets and the stock market indicators. Hence Twitter data can be useful for predicting the stock market [13].

Yet another study combined news mining and time series analysis to forecast inter-day stock prices. The results show that NTF (News Mining and Time Series based Analysis forecasting) forecasts stock prices better than regular TSA (Time Series Analysis) and forecasts stock price trends better than the classical random walk algorithm. Stock price forecasting can be obtained from news reports, which is an improvement over conventional forecasting techniques [14].

III. GOLD TREND ANALYSIS FOR THE FUTURE MARKET

A number of statistical and mathematical models have evolved over time, and one of them is the Multi-variable Regression model. This method estimates the value of a dependent variable Y that may be influenced by several independent variables, assuming that Y can be expressed as a linear function of those variables. Simple linear regression can be expressed as Y = a + bX, where X is the independent variable and Y is the dependent variable. The slope of the line is b and a is the intercept (the value of Y when X is zero). When multiple variables are considered, the formula is as follows:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \varepsilon_i, \qquad i = 1, 2, 3, 4, \dots, n$$

Here $Y_i$ represents the $i$-th observation of the dependent variable, $\beta_j$ ($j = 0, 1, \dots, p$) represents the parameters to be estimated, $\varepsilon_i$ represents the $i$-th independent, identically distributed normal error, and $X_{ij}$ represents the $i$-th observation of the $j$-th variable.

A popular way to fit this model is Least Squares Regression. This method calculates the best-fit line by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies exactly on the fitted line, its vertical deviation is 0). Because the deviations are squared before they are summed, positive and negative deviations do not cancel each other. Attention must still be paid to outliers, influential observations, residuals and lurking variables.

The other model that has been evolving recently is the Hidden Markov Model (HMM). This method was originally applied in areas like speech recognition, image processing, etc. HMMs are useful for predicting time-series data, and since stock data qualify as time-series data, HMMs have been attempted by a few studies in the past. Alternatives to the HMM include Artificial Neural Networks (ANNs) [16], Fuzzy Logic (FL) [17], and Support Vector Machines (SVM) [18]. The results show that the SVM is the most suitable method for the stock market forecasting problem. The high accuracy observed is attributed to the small number of features and data points.

This paper has chosen the Regression Model for Gold Price Estimation. The Gold Price is treated as the dependent variable, and a few other variables are considered to influence the Gold Price in India: the NIFTY stock index, Oil Price, Consumer Price Index (CPI), USD to INR exchange rate and the international Gold Price in USD. Additional data such as GDP growth were considered initially, but were excluded due to the poor correlation between Gold price and GDP.

IV. THE DATA SET (GOLD PRICE & NIFTY)

TABLE-1 (GOLD PRICE, CPI INDEX AND NIFTY)

DATE        Gold Price (INR)   CPI Index   Nifty
31-Jan-12   2,769.25           109.42      5199.25
29-Feb-12   2,835.76           109.98      5385.20
31-Mar-12   2,796.10           111.09      5295.55
30-Apr-12   2,862.23           113.30      5248.15
31-May-12   2,882.34           113.85      4924.25
30-Jun-12   2,991.21           114.96      5278.90
31-Jul-12   2,955.50           117.17      5229.00
31-Aug-12   3,036.67           118.27      5258.50
30-Sep-12   3,170.63           118.82      5703.30
31-Oct-12   3,120.88           119.93      5619.70
30-Nov-12   3,159.29           120.48      5879.85
31-Dec-12   3,113.05           121.03      5905.10
31-Jan-13   3,075.48           122.14      6034.75
28-Feb-13   3,020.58           123.24      5693.05
31-Mar-13   2,959.98           123.79      5682.55
30-Apr-13   2,744.92           124.89      5930.20
31-May-13   2,655.15           125.99      5985.95
30-Jun-13   2,706.67           127.65      5842.20
31-Jul-13   2,689.39           129.86      5742.00
31-Aug-13   3,038.63           130.97      5471.80
30-Sep-13   3,079.85           131.52      5735.30
31-Oct-13   2,959.01           133.17      6299.15
TABLE-1 (continued)

DATE        Gold Price (INR)   CPI Index   Nifty
30-Nov-13   3,000.05           134.28      6176.10
31-Dec-13   2,889.15           132.06      6304.00
31-Jan-14   2,909.57           130.95      6089.50
28-Feb-14   2,948.29           131.50      6276.95
31-Mar-14   2,967.04           132.06      6704.20
30-Apr-14   2,851.46           133.72      6696.40
31-May-14   2,781.28           134.83      7229.95
30-Jun-14   2,681.32           135.93      7611.35
31-Jul-14   2,786.71           139.25      7721.30
31-Aug-14   2,827.46           139.81      7954.35
30-Sep-14   2,701.94           139.81      7964.80
...
31-Aug-16   3,133.01           153.66      8786.20
30-Sep-16   3,108.29           153.11      8611.15
31-Oct-16   2,990.57           153.66      8638.00
30-Nov-16   2,966.07           153.11      8224.50
31-Dec-16   2,747.97           152.01      8185.80

A. THE DATA SET (OTHER DATA)

TABLE-2 (BANK RATE, USD TO INR, OIL PRICE & GOLD PRICE IN US)

DATE        Bank Rate   USD to INR   Oil Price in USD   Gold Price in USD
31-Jan-12   6.00        49.5150      110.69             1,652.21
29-Feb-12   6.00        49.1100      119.33             1,742.14
31-Mar-12   9.50        50.8750      125.45             1,673.77
30-Apr-12   9.50        52.6650      119.42             1,649.69
31-May-12   9.00        56.0400      110.34             1,591.19
30-Jun-12   9.00        55.5100      95.16              1,598.76
31-Jul-12   9.00        55.4400      102.62             1,589.90
31-Aug-12   9.00        55.5250      113.36             1,630.31
30-Sep-12   9.00        52.8550      112.86             1,744.81
31-Oct-12   9.00        53.8050      111.71             1,746.58
30-Nov-12   9.00        54.2650      109.06             1,721.64
31-Dec-12   9.00        54.9950      109.49             1,684.76
31-Jan-13   8.75        53.2750      112.96             1,671.85
28-Feb-13   8.75        54.3700      116.05             1,627.57
31-Mar-13   8.50        54.2850      108.47             1,593.09
30-Apr-13   8.50        53.6850      102.25             1,487.86
31-May-13   8.25        56.5800      102.56             1,414.03
30-Jun-13   8.25        59.5330      102.92             1,343.35
31-Jul-13   10.25       60.8550      107.93             1,285.52
31-Aug-13   10.25       65.7050      111.28             1,351.74
30-Sep-13   9.50        62.5900      111.60             1,348.60
31-Oct-13   9.00        61.6240      109.08             1,316.58
30-Nov-13   8.75        62.3990      107.79             1,275.86
31-Dec-13   8.75        61.8100      110.76             1,221.51
31-Jan-14   9.00        62.6850      108.12             1,244.27
28-Feb-14   9.00        61.7950      108.90             1,299.58
31-Mar-14   9.00        60.0150      107.48             1,336.08
30-Apr-14   9.00        60.3450      107.76             1,298.45
31-May-14   9.00        59.1950      109.54             1,288.74
30-Jun-14   9.00        60.0600      111.80             1,279.10
31-Jul-14   9.00        60.5550      106.77             1,310.59
31-Aug-14   9.00        60.5200      101.61             1,295.13
30-Sep-14   9.00        61.9400      97.09              1,236.55
...
31-Aug-16   7.00        66.9730      45.84              1,340.17
30-Sep-16   7.00        66.5560      46.57              1,326.61
31-Oct-16   6.75        66.6860      49.52              1,266.28
30-Nov-16   6.75        68.5980      44.73              1,238.35
31-Dec-16   6.75        67.9550      53.29              1,157.36

The dataset consists of gold rates (per gm) over the past five years on a monthly basis from Jan 2012 to Dec 2016. The dataset also includes other data such as Nifty prices, Oil prices (Brent Crude), CPI, GDP, Interest Rates and Dollar to Rupee rates.

The data was collected from various sources including World Gold Council, IndexMundi, MCXIndia and RBI.

V. APPLIED METHODOLOGY

The Multi Variable Linear Regression Method has been built using R Programming. The actual data for the different variables covered a period of 60 months. The data for the first forty seven (47) months have been used as training data set and the data for the remaining thirteen (13) months as the testing data set. The model involved the following stages: establishing high level of correlation between the variables and building a model with the variables chosen.
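The split-and-fit procedure described above can be sketched as follows. The paper's model was built in R; this is only a minimal Python illustration using the standard library. The data are synthetic placeholders (two made-up predictors with an exact linear relationship), not the gold data set, and all function names are hypothetical. Least squares is solved through the normal equations, which is one of several ways to obtain the fit.

```python
def chronological_split(rows, n_train):
    """Keep time order: the first n_train rows train, the remainder test."""
    return rows[:n_train], rows[n_train:]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(xs, ys):
    """Return [intercept, b1, ..., bp] minimising the sum of squared residuals."""
    design = [[1.0] + list(x) for x in xs]  # leading 1.0 models the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, x):
    """Evaluate intercept + sum(b_j * x_j) for one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Synthetic stand-in for 60 monthly observations (NOT the paper's data).
xs = [(float(t), float((t * 7) % 11)) for t in range(60)]
ys = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in xs]
rows = list(zip(xs, ys))

train, test = chronological_split(rows, 47)  # 47 training, 13 testing months
coefs = fit_least_squares([x for x, _ in train], [y for _, y in train])
test_predictions = [predict(coefs, x) for x, _ in test]
```

The split is chronological rather than random so that the model is always evaluated on months that come after the ones it was fitted on, which mirrors how a forecast would be used.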
A. HYPOTHESIS

The following hypotheses have been formulated to build the model:

1. H0: Gold prices have no relation with Nifty prices.
   H1: Gold prices have a relation with Nifty prices.
2. I0: Gold prices have no relation with International Gold prices.
   I1: Gold prices have a relation with International Gold prices.
3. M0: Gold prices have no relation with Oil prices.
   M1: Gold prices have a relation with Oil prices.
4. N0: Gold prices have no relation with Inflation (CPI).
   N1: Gold prices have a relation with Inflation (CPI).
5. S0: Gold prices have no relation with Interest Rates.
   S1: Gold prices have a relation with Interest Rates.
6. U0: Gold prices have no relation with the USD to INR rate.
   U1: Gold prices have a relation with the USD to INR rate.

In total, six variables have been considered for the calculation, apart from the historical Gold price data. A high level of correlation is observed between Gold Prices and the other variables.

B. CORRELATION BETWEEN THE VARIABLES

TABLE-3: CORRELATION BETWEEN GOLD PRICES AND OTHER VARIABLES

          INT RATE   CPI    OIL PRICE   US2INR   NIFTY   INT
VALUES    0.325      -0.6   0.68        -0.495   -0.69   0.714

The above correlations identify the variables most strongly correlated with the rate of gold, each with a p-value (probability value) of less than 0.05. Hence the null hypotheses (H0, I0, M0, N0, S0, U0) stated in the previous section are rejected at the 5% significance level.

C. MODEL FINALISATION

Using the most correlated variables, the multi-variable regression model is built. This model revealed a Kurtosis value of -1.13 and a skewness value of 0.27, implying that gold is a reliable commodity to invest in. The resultant intercept and coefficients are shown below:

TABLE-4: COEFFICIENTS AND INTERCEPT

Term                Value
Intercept           -3.501e+03
Oil Price            3.829e+00
NIFTY                1.049e-02
Interest            -7.611e+00
USD to INR           5.301e+01
CPI                  4.103e+00
Gold Price in US     1.655e+00

Note: e+0x denotes multiplication by 10^x (for example, -3.501e+03 = -3501).

D. SUMMARY

The summary of the model is shown below:

TABLE-5: SUMMARY

Multiple R-Squared Value    0.8487
Adjusted R-Squared Value    0.826
Residual Standard Error     72.84 on 40 degrees of freedom
p-Value                     6.661e-15

Note: e-x denotes multiplication by 10^-x.

This model shows a Multiple R-Squared value of 0.8487, implying that approximately 85% of the variation in the Gold Price is explained by the variables (Oil Price, Nifty, Interest Rate, USD to INR Rate, CPI and Gold Price in US).

Thus the derived regression equation for the model is:

y = (-3.501e+03) + (3.829 * Oil Price) + (1.049e-02 * Nifty) + (-7.611 * Interest) + (53.01 * USD to INR) + (4.103 * CPI) + (1.655 * Gold Price in US)

E. RESULTS

The model was then tested on the remaining observations (13 of them) of the dataset. The prediction and observation values are presented below:

TABLE-6: PREDICTION AND OBSERVATION

No.   Date        Actual     Fit        Lower      Upper
48    31-Dec-15   2,528.41   2,556.57   2,391.37   2,721.77
49    31-Jan-16   2,604.21   2,662.18   2,489.74   2,834.61
50    29-Feb-16   2,874.36   2,842.92   2,657.71   3,028.12
51    31-Mar-16   2,927.31   2,848.15   2,674.15   3,022.15
52    30-Apr-16   2,916.78   2,873.28   2,697.61   3,048.94
53    31-May-16   2,972.89   2,983.59   2,790.35   3,176.83
54    30-Jun-16   3,044.37   3,036.47   2,835.69   3,237.25
55    31-Jul-16   3,129.77   3,089.02   2,877.67   3,300.36
56    31-Aug-16   3,133.01   3,112.15   2,898.14   3,326.16
57    30-Sep-16   3,108.29   3,066.29   2,860.68   3,271.91
58    31-Oct-16   2,990.57   2,989.07   2,789.67   3,188.47
59    30-Nov-16   2,966.07   3,019.24   2,814.86   3,223.62
60    31-Dec-16   2,747.97   2,878.96   2,692.98   3,064.93
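As a consistency check on the derived regression equation, the following Python sketch (the paper's own model was built in R) plugs the 31-Aug-16 and 31-Dec-16 inputs from Tables 1 and 2 into the coefficients of Table 4. The function and variable names are illustrative and do not come from the original work, and the sketch assumes that the "Interest" term corresponds to the Bank Rate column of Table 2. Because the published coefficients are rounded to four significant figures, agreement with the Fit column of Table 6 should be expected only to within roughly one rupee.

```python
# Coefficients as published in Table 4 (rounded to four significant figures).
COEFFICIENTS = {
    "intercept": -3.501e+03,
    "oil": 3.829e+00,
    "nifty": 1.049e-02,
    "interest": -7.611e+00,
    "usd_inr": 5.301e+01,
    "cpi": 4.103e+00,
    "gold_usd": 1.655e+00,
}


def predict_gold_inr(oil, nifty, interest, usd_inr, cpi, gold_usd):
    """Evaluate the derived regression equation for one month of inputs."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["oil"] * oil
        + c["nifty"] * nifty
        + c["interest"] * interest  # assumed to be the Bank Rate of Table 2
        + c["usd_inr"] * usd_inr
        + c["cpi"] * cpi
        + c["gold_usd"] * gold_usd
    )


# Inputs read from Table 1 (CPI, Nifty) and Table 2 (remaining columns).
fit_aug_2016 = predict_gold_inr(
    oil=45.84, nifty=8786.20, interest=7.00,
    usd_inr=66.9730, cpi=153.66, gold_usd=1340.17,
)
fit_dec_2016 = predict_gold_inr(
    oil=53.29, nifty=8185.80, interest=6.75,
    usd_inr=67.9550, cpi=152.01, gold_usd=1157.36,
)

print(f"31-Aug-16 fit: {fit_aug_2016:.2f}")
print(f"31-Dec-16 fit: {fit_dec_2016:.2f}")
```

The two results can be compared against rows 56 and 60 of Table 6 (Fit values 3,112.15 and 2,878.96).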
linear regression model built using the above-mentioned predictors produces an RMSE of 53.583. This value is driven largely by the December observation, which shows the largest deviation from the actual value. The deviation is mainly attributed to demonetization, which took effect from the second week of November and caused a slump in the gold price from November to December. Without demonetization (i.e. without the last observation), the
RMSE Value is around 11.
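The RMSE can be recomputed directly from the Actual and Fit columns of Table 6. The Python sketch below is illustrative only (the variable and function names are not from the original work) and uses the standard definition: the square root of the mean squared difference between actual and fitted values. On the table values as transcribed here it reproduces the reported 53.583 for all 13 test months; with the last observation dropped, the same formula gives a value near 41 rather than 11, so the figure quoted for that case may rest on a different calculation or measure.

```python
import math

# Actual and Fit columns of Table 6, observations 48 to 60.
ACTUAL = [2528.41, 2604.21, 2874.36, 2927.31, 2916.78, 2972.89, 3044.37,
          3129.77, 3133.01, 3108.29, 2990.57, 2966.07, 2747.97]
FIT = [2556.57, 2662.18, 2842.92, 2848.15, 2873.28, 2983.59, 3036.47,
       3089.02, 3112.15, 3066.29, 2989.07, 3019.24, 2878.96]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rmse_all = rmse(ACTUAL, FIT)                     # all 13 test months
rmse_without_last = rmse(ACTUAL[:-1], FIT[:-1])  # December 2016 excluded

# Index of the observation with the largest absolute error.
largest_error_index = max(
    range(len(ACTUAL)), key=lambda i: abs(ACTUAL[i] - FIT[i])
)

print(f"RMSE, all 13 months:     {rmse_all:.3f}")
print(f"RMSE, without last month: {rmse_without_last:.3f}")
```

The largest absolute error falls on the final row (31-Dec-16), which is consistent with the demonetization explanation given above.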
REFERENCES
8. Upadhyay, V. P., Panwar, S., Merugu, R., & Panchariya, R. (2016, August). Forecasting Stock Market Movements Using Various Kernel Functions in Support Vector Machine. In Proceedings of the International Conference on Advances in Information Communication Technology & Computing (p. 107). ACM.
risk measures". Fuzzy Sets and Systems, pages 769–782, 2007.
18. L. Cao and F.E.H. Tay. "Financial forecasting using support vector machines". Neural Computation and Application, pages 184–192, 2007.