Big Data and Machine Learning
Big Data and Machine Learning for Economic Cycle Prediction: Application of
Thailand’s Economy
All content following this page was uploaded by Satawat Wannapan on 29 May 2020.
1 Introduction
constant. Graphically, this assumption is only one of those used in data analysis, as displayed in Fig. 1. It is not sufficient on its own, which forces many researchers to add further assumptions to guarantee that their estimated models fit the data computationally. Furthermore, this mistake is still made in many econometric predictions that are used to provide policy recommendations to authorities, especially central banks. To escape from it, Machine Learning (ML) can be combined efficiently with big data analysis, and this could be the solution for clarifying and forecasting economic trends, hidden signals, and crises in the modern era of academic research.
Turning to central banks, their outcomes are also shaped by microeconomic decisions and interactions. These responsibilities have come with the collection of, and access to, a wealth of new data sources, which moves central banks into the realm of big data (Chakraborty and Joseph 2017). Historically, data science and machine learning, a term coined by Samuel (1959), have rarely been applied in econometric research. For central banks, however, these data-driven computational analyses are gaining prominence. For example, Bholat (2015) situated these topics within the context of central banks' strategic plans and initiatives and summarized their emerging interest in Big Data approaches. Signorini (2018), writing for the Bank of Italy, noted that the actual and potential value of big data is increasingly crucial for economic research. Moreover, Hinge (2017) observed that machine learning may not yet be at the stage where central bankers are replaced by robots, but the field is now bringing powerful tools to bear on big economic questions. Accordingly, there is no reason why big data and machine learning cannot be a suitable solution for monetary-econometric research in Thailand.
Big Data and Machine Learning for Economic Cycle Prediction 349
Table 1. Details of the variables used in the data science analyses: Thailand's key macroeconomic indicators (as identified by the BOT) and big data from the Google Trends database

Variable detail | Symbol | Time-series range | Source
GDP (growth rate) | RBC_GDP | 2004–2017 | The World Bank Database
Population (growth rate) | POP | 2004–2017 | The World Bank Database
Unemployment (% change) | UN_EM | 2004–2017 | The World Bank Database
Industrial value added (% of GDP) | IND | 2004–2017 | The World Bank Database
Agricultural value added (% of GDP) | AGR | 2004–2017 | The World Bank Database
Gross domestic investment (% of GDP) | GDI | 2004–2017 | The World Bank Database
Service imports (% of GDP) | SER_IM | 2004–2017 | The World Bank Database
Service exports (% of GDP) | SER_EX | 2004–2017 | The World Bank Database
Military expenditures (% of GDP) | MIL | 2004–2017 | The World Bank Database
Public debt (% of GDP) | PU_DEP | 2004–2017 | The World Bank Database
Consumer price index (annual %) | CPI | 2004–2017 | The World Bank Database
Foreign direct investment (US$) | FDI | 2004–2017 | The World Bank Database
Available land (km²) | LAND | 2004–2017 | The World Bank Database
Preserved forests (km²) | FOR | 2004–2017 | The World Bank Database
(continued)
350 C. Chaiboonsri and S. Wannapan
Table 1. (continued)

Variable detail | Symbol | Time-series range | Source
International reserves | INT | 2004–2017 | The World Bank Database
Fixed interest rate | FIX_I | 2004–2017 | The World Bank Database
Exchange rate (Baht per US$) | EX | 2004–2017 | The World Bank Database
Official development assistance (US$) | ODA | 2004–2017 | The World Bank Database
Value of stock market trading | MKT | 2004–2017 | The World Bank Database
Overall Thailand economic situation | Big_data 1 | 2004–2017 | Google Shopping database
Investment situation | Big_data 2 | 2004–2017 | Google Shopping database
Stock market situation | Big_data 3 | 2004–2017 | Google Shopping database
Thailand business movements | Big_data 4 | 2004–2017 | Google Shopping database
Employment | Big_data 5 | 2004–2018 | Google Shopping database
Thailand international trade and investment | Big_data 6 | 2004–2019 | News search
Thailand agricultural situation | Big_data 7 | 2004–2020 | News search
Thailand industrial situation | Big_data 8 | 2004–2017 | Image search
Thailand banking situation | Big_data 9 | 2004–2018 | Google Shopping database
Thailand political atmosphere | Big_data 10 | 2004–2019 | YouTube search
Thailand social atmosphere | Big_data 11 | 2004–2020 | YouTube search
margin, is, however, not straightforward in the general case. Thus, the concept behind SVMs is two-fold: first, to find a representation of the feature space in which the data are linearly separable and, second, to identify the points in the input space that define, or support, the maximal margin, namely the support vectors. As seen in Fig. 2, the separating boundary, decision rule and error function for two-class classification can be expressed as
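$$
\begin{aligned}
x^{T}\beta &= \pm 1 && \text{(support vectors, SV)}\\
h(x_i;\beta) &= \operatorname{sign}\left(x_i^{T}\beta\right) && \text{(hypothesis)}\\
\mathrm{ERR}(X,Y;\beta) &= \frac{1}{2m}\sum_{i=1}^{m}\left|y_i - h(x_i;\beta)\right| && \text{(error function)}
\end{aligned}
$$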
Fig. 2. Schematic presentation of a two-class (green and red dots) support-vector classifier in a
two-dimensional feature space. (Color figure online)
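To make the decision rule and error function concrete, the following Python sketch evaluates $h(x_i;\beta)$ and $\mathrm{ERR}(X,Y;\beta)$ for a fixed, given coefficient vector; it does not train an SVM. It assumes labels in $\{-1, 1\}$ and that any intercept is absorbed into $\beta$ by appending a constant 1 to each $x$. The helper names (`dot`, `hypothesis`, `error`) and the toy data are hypothetical.

```python
from collections.abc import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product x^T beta."""
    return sum(a * b for a, b in zip(u, v))


def hypothesis(x: Vector, beta: Vector) -> int:
    """Decision rule h(x; beta) = sign(x^T beta), returned as a label in {-1, +1}."""
    # Assumption: a point lying exactly on the hyperplane (x^T beta = 0) gets +1.
    return 1 if dot(x, beta) >= 0 else -1


def error(X: Sequence[Vector], Y: Sequence[int], beta: Vector) -> float:
    """Error function ERR(X, Y; beta) = 1/(2m) * sum_i |y_i - h(x_i; beta)|."""
    m = len(X)
    return sum(abs(y - hypothesis(x, beta)) for x, y in zip(X, Y)) / (2 * m)


# Hypothetical toy data; the last coordinate of each x is a constant 1 so that
# beta[-1] acts as the intercept (an assumption, not stated in the text).
X = [(1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (3.0, 1.0, 1.0), (0.0, 2.0, 1.0)]
Y = [-1, 1, -1, -1]
beta = (1.0, 1.0, -3.0)
print(error(X, Y, beta))  # 0.25: the third point is misclassified
```

Because each term $|y_i - h(x_i;\beta)|$ is either 0 or 2, the factor $1/(2m)$ turns ERR into the plain misclassification rate.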
The coefficients $\beta$ define the hyperplane satisfying the decision boundary equation. The support vectors lie on the boundary of the gray area, and the target values $y_i$ take the values $y_i \in \{-1, 1\}$. The coefficients $\beta$ can be obtained by solving the (dual) optimization problem:
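$$
L(\alpha) = \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\,\alpha^{T} H \alpha, \qquad \text{with} \quad \beta = \sum_{i=1}^{m}\alpha_i y_i x_i, \quad H_{ij} = y_i y_j\, x_i^{T} x_j, \tag{1}
$$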
under the constraints $\alpha \cdot Y = 0$ and $\alpha_i \ge 0$. Although the Lagrangian $L(\alpha)$ contains a large number of free parameters $\alpha_i$, namely one for each observation $x_i$, the input data $X$ enter it only via an inner product, which yields a scalar. This allows transformations $T(\cdot)$ and inner products to be defined via a so-called kernel,
A common choice of kernel function is the radial basis function (Gaussian kernel), which relies on the polynomial expansion of the exponential function:

$$
K(x, x') = \exp\left[-\gamma \lVert x - x' \rVert^{2}\right]. \tag{3}
$$
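The kernel trick can be illustrated by evaluating the dual objective of Eq. (1) with the inner product $x_i^{T}x_j$ replaced by the Gaussian kernel of Eq. (3). The sketch below only evaluates $L(\alpha)$ for a given $\alpha$ and checks the dual constraints; actually maximizing it requires a quadratic-programming solver, which is not shown. The function names and the two-point example are hypothetical. With a non-linear kernel, $\beta$ is no longer an explicit vector in input space, so the sketch does not compute it.

```python
import math
from collections.abc import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, x_prime: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel of Eq. (3): K(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


def dual_objective(alpha: Sequence[float], X: Sequence[Vector],
                   Y: Sequence[int], gamma: float) -> float:
    """Kernelized dual of Eq. (1): L(alpha) = sum_i alpha_i - 1/2 alpha^T H alpha,
    with H_ij = y_i y_j K(x_i, x_j) instead of y_i y_j x_i^T x_j."""
    # Dual constraints: alpha . Y = 0 and alpha_i >= 0.
    if abs(sum(a * y for a, y in zip(alpha, Y))) > 1e-9 or min(alpha) < 0:
        raise ValueError("alpha violates alpha . Y = 0 or alpha_i >= 0")
    m = len(X)
    H = [[Y[i] * Y[j] * rbf_kernel(X[i], X[j], gamma) for j in range(m)]
         for i in range(m)]
    quad = sum(alpha[i] * H[i][j] * alpha[j] for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quad


# Hypothetical two-point example: one observation per class.
print(dual_objective([0.5, 0.5], [(0.0,), (1.0,)], [-1, 1], gamma=1.0))
```

Only the kernel matrix changes between the linear and Gaussian cases, which is why the data need to enter the Lagrangian solely through inner products.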
where $p(Y = c \mid X)$ is the relative frequency of class-$c$ observations in $X$, and $|X_v|$ is the size of the set of observations that take on each value $v$. In a regression setting, the entropy can be replaced by the mean squared error (MSE), and splits are performed along the dimensions that most reduce the error (Galton 1907). A schematic representation of a tree model with two features, $x_1$ and $x_2$, is displayed in Fig. 3.
Fig. 3. Schematic representation of a tree model. Left: the target data space is systematically
segregated by the tree model based on the input features. Right: a tree representation of the final
model. The tree is grown from the top to the bottom. (Modified from Chakraborty and Joseph
2017)
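The splitting criteria described above can be sketched in a few lines of Python. Because the entropy equation itself lies outside this excerpt, the sketch assumes the standard information-gain form $H(X) - \sum_v \frac{|X_v|}{|X|} H(X_v)$ with $H(X) = -\sum_c p(Y=c \mid X)\log_2 p(Y=c \mid X)$, which uses the quantities defined above; the regression variant measures the size-weighted drop in MSE for a single threshold split. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(X) = -sum_c p(Y=c|X) log2 p(Y=c|X), with p the relative class frequency."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Assumed ID3-style gain: H(X) - sum_v |X_v| / |X| * H(X_v)."""
    n = len(labels)
    groups: dict = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)  # X_v: observations with feature value v
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def mse(values: Sequence[float]) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mse_reduction(x: Sequence[float], y: Sequence[float], threshold: float) -> float:
    """Regression analogue: size-weighted drop in MSE from the split x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty reduces nothing
    n = len(y)
    return mse(y) - (len(left) / n * mse(left) + len(right) / n * mse(right))


# Hypothetical labels: a feature that separates the classes perfectly gains 1 bit.
print(information_gain(["hi", "hi", "lo", "lo"], ["peak", "peak", "fall", "fall"]))
```

A tree grows by evaluating such a score for every candidate split and keeping the best one, then repeating the procedure within each branch.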
$$
\kappa = \frac{p_0 - p_e}{1 - p_e} = 1 - \frac{1 - p_0}{1 - p_e}, \tag{4}
$$
where $p_0$ is the relative observed agreement among raters (which corresponds to accuracy) and $p_e$ is the hypothetical probability of chance agreement. The observed data are used to calculate the probability of each rater randomly selecting each category. If $\kappa = 1$, the raters are in complete agreement; if $\kappa = 0$, there is no agreement among the raters beyond what would be expected by chance. Cohen's kappa can also be negative, which implies that there is no effective agreement between the two raters or that the agreement is worse than random. For $k$ categories, $N$ items, and $n_{ki}$ the number of times rater $i$ predicted category $k$,
$$
p_e = \frac{1}{N^{2}} \sum_{k} n_{k1}\, n_{k2}. \tag{5}
$$
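Equations (4) and (5) translate directly into code. The sketch below treats the true labels and a model's predictions as the two "raters"; the function name and the example labels are hypothetical. The example also shows one way in which an accuracy of 0.6 can coexist with $\kappa = 0$, the pattern reported for two models in Table 4: a model that always predicts the majority class agrees with the truth only as often as chance would. This illustrates the arithmetic only and is not a claim about how those models actually behaved.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohen_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable]) -> float:
    """Cohen's kappa per Eqs. (4)-(5): kappa = (p0 - pe) / (1 - pe)."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater1)
    # p0: relative observed agreement (equals accuracy when rater1 is the truth).
    p0 = sum(a == b for a, b in zip(rater1, rater2)) / n
    # pe = (1 / N^2) * sum_k n_k1 * n_k2; Counter returns 0 for unseen categories.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    pe = sum(counts1[k] * counts2[k] for k in counts1) / n ** 2
    if pe == 1:
        raise ValueError("kappa is undefined when p_e = 1")
    return (p0 - pe) / (1 - pe)


# Hypothetical labels: 6 "peak" and 4 "expansion" years, predicted as all "peak".
actual = ["peak"] * 6 + ["expansion"] * 4
predicted = ["peak"] * 10
print(cohen_kappa(actual, predicted))  # accuracy is 0.6, but kappa is 0.0
```

Here $p_0 = 0.6$ and $p_e = (6 \cdot 10 + 4 \cdot 0)/10^2 = 0.6$, so the numerator of Eq. (4) vanishes.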
4 Empirical Results
4.1 Descriptive Information
First, considering Thailand's economic trends, yearly GDP growth from 2004 to 2017 is displayed in Fig. 5. The trend shows that the Thai economy fluctuated dramatically over this period. This fluctuation makes prediction more difficult, so traditional econometric tools alone cannot precisely identify the best model; in particular, the simple mean is no longer a reliable central value. Consequently, an optimization algorithm, Newton's method, is employed to extend the explanatory power of the data.
Mathematically, Newton's method is based on linearization: at each step the function is approximated by its tangent line, whose root lies closer to the true root $x^*$ (as shown in Fig. 6). Empirically, the result shows that the optimal central value of the observations differs from the arithmetic mean (see Table 2 for details).
Based on this optimal value, the economic trend can be divided into phases of the real business cycle. The peak phase is defined as growth above the optimal value (3.55%). Expansion covers the interval between 2.1% and 3.5%, recession covers the interval between 0% and 2%, and fall is defined as growth below 0%. The details are given in Table 3.
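The phase rules above can be written as a small labeling function, a step of the kind that would turn growth rates into class labels for classifiers such as those compared in Table 4. The stated intervals leave small gaps (between 2% and 2.1%, and between 3.5% and the 3.55% optimal value), so the sketch assumes contiguous boundaries: [0, 2.1) is recession and [2.1, 3.55] is expansion. The function name and the sample growth rates are hypothetical, not Thai data.

```python
def cycle_phase(gdp_growth: float, optimal: float = 3.55) -> str:
    """Map yearly GDP growth (%) to a business-cycle phase.

    Thresholds follow the text; the gaps between 2% and 2.1% and between
    3.5% and the optimal value are closed by assumption.
    """
    if gdp_growth < 0:
        return "fall"
    if gdp_growth < 2.1:
        return "recession"
    if gdp_growth <= optimal:
        return "expansion"
    return "peak"  # strictly above the optimal value


# Hypothetical growth rates (%), not observed data.
sample = [-0.7, 1.0, 2.05, 3.0, 3.52, 5.0]
print([cycle_phase(g) for g in sample])
```

Making the boundaries explicit in code forces a decision about values that fall between the published intervals.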
Table 2. Comparison between the optimal value and the arithmetic mean of Thailand's yearly GDP growth

Variable (growth rate) | General mean | Optimal value (Newton's method)
GDP | 3.67% | 3.55%
Table 4. Comparative performance statistics of various machine learning models for predicting Thailand macroeconomic variables

Method | Cross-validation metric | Final tuning values | Accuracy | Kappa coefficient
Random forest (rf) | Cohen's kappa | k = 9 | 0.7 | 0.4167
k-NN (kn) | Cohen's kappa | c = 0.25, sigma = 1.286084e-18 | 0.6 | 0
SVM (svm) | Cohen's kappa | mtry = 15 | 0.5417 | 0
5 Conclusion
This paper successfully takes on the considerable challenge of applying advanced computational methods, namely machine learning algorithms, to predict macroeconomic variables using big data. These methods differ fundamentally from traditional parametric estimation and can be more powerful: machine learning systems can capture a large amount of informative detail in databases, including qualitative data, quantitative factors, and even time-series trends. In this paper, 29 time-series variables describing Thailand's economic structure from 2004 to 2017 were used to predict the upcoming trend.
Because they minimize modeling assumptions, machine learning systems can efficiently handle both stationary and non-stationary data. Methodologically, the data were processed in two stages: algorithm calculation and model selection.
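Empirically, the random forest algorithm is the best model according to Cohen's kappa coefficient (κ = 0.4167 with an accuracy of 0.7; Table 4), a level of agreement that the benchmarks of Landis and Koch (1977) class as moderate. Consequently, the prediction indicates that the Thai economy would be very active (peak phase) in the upcoming quarters (2018q3 to 2018q4).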
In interpreting this result for policy recommendations, note that the big data used in this paper include not only economic factors but also social variables such as political activity, environmental issues, and even online social behavior. With machine learning algorithms, the conclusion that the Thai economy is heading into a boom period is considered more reliable than forecasts from traditional econometric time-series models. Because they can handle the complexity of multi-process estimation, machine learning methods can explain outliers in mixed observations better than traditional econometric methods, which inevitably require assumptions. The results indicate that political stability, social network activity, control of financial market fluctuations, online news, and effective environmental management are interconnected, and these points should be taken into account in empirical implementation. Hence, there is no reason why big data and machine learning cannot be a suitable approach for monetary-econometric research in Thailand, especially data mining in support of the responsibilities of the Bank of Thailand (BOT).
References
Bassel, G.W., Glaab, E., Marquez, J., Holdsworth, M.J., Bacardit, J.: Functional network
construction in Arabidopsis using rule-based machine learning on large-scale data sets. Plant
Cell 23(9), 3101–3116 (2011)
Bholat, D.: Big data and central banks. Q. Bull. Q1, pp. 86–93 (2015). https://www.researchgate.net/publication/276101527_Big_Data_and_central_banks
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006). ISBN
0-387-31073-8
Chakraborty, C., Joseph, A.: Machine learning at central banks. Staff Working Paper No. 647.
Bank of England (2017)
Galton, F.: Vox populi. Nature 75, 450–451 (1907)
Hinge, D.: Big Data in Central Banks. Published by Infopro Digital Services Ltd, Central
Banking Publications, London (2017)
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data.
Biometrics 33(1), 159–174 (1977). https://doi.org/10.2307/2529310
Pontius, R., Millones, M.: Death to Kappa: birth of quantity disagreement and allocation
disagreement for accuracy assessment. Int. J. Remote Sens. 32, 4407–4429 (2011)
Samuel, A.L.: Some studies in machine learning using the game of checkers. In: Levy, D.N.L.
(ed.) Computer Games, pp. 335–365. Springer, New York (1988). https://doi.org/10.1007/978-1-4613-8716-9_14
Sim, J., Wright, C.C.: The Kappa statistic in reliability studies: use, interpretation, and sample
size requirements. Phys. Ther. 85, 257–268 (2005)
Signorini, L.F.: Harnessing big data & machine learning technologies for central banks. The
Printing and Publishing Division, Bank of Italy, Rome (2018)
Wannapan, S., Chaiboonsri, C., Sriboonchitta, S.: Macro-econometric forecasting for during
periods of economic cycle using Bayesian extreme value optimization algorithm. In:
Kreinovich, V., Sriboonchitta, S., Chakpitak, N. (eds.) TES 2018. SCI, vol. 753, pp. 706–723.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-70942-0_51
Zhang, J., et al.: Evolutionary computation meets machine learning: a survey. IEEE 6(4), 68–75
(2011)