Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 140 (2018) 383–392
www.elsevier.com/locate/procedia

Complex Adaptive Systems Conference with Theme: Cyber Physical Systems and Deep Learning, CAS 2018, 5 November – 7 November 2018, Chicago, Illinois, USA
Predicting the Future with Artificial Neural Network

Anifat Olawoyin*, Yangjuin Chen

University of Winnipeg, Winnipeg R3B2E9, CANADA
Abstract

Accurate prediction of future values of time series data is crucial for strategic decision making such as inventory management, budget planning, customer relationship management, marketing promotion, and efficient allocation of resources. However, time series prediction can be very challenging, especially when there are elements of uncertainty such as natural disasters, changes in government policies, and weather conditions. In this research, four different multilayer perceptron (MLP) artificial neural networks are discussed and compared with the Autoregressive Integrated Moving Average (ARIMA) model for this task. The models are evaluated using two statistical performance measures, Root Mean Squared Error (RMSE) and coefficient of determination (R2). The experimental results show that a 4-layer MLP architecture using the tanh activation function in each hidden layer and a linear function in the output layer has the lowest prediction error and the highest coefficient of determination among the configured multilayer perceptron neural networks. In addition, a comparative analysis of the performance results reveals that the multilayer perceptron neural network has a lower prediction error than the ARIMA model.
© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the Complex Adaptive Systems Conference with Theme: Engineering Cyber Physical Systems.
Keywords: Artificial Neural Network, ARIMA, Multilayer Perceptron, Time Series, Data Preprocessing
1. Introduction

An Artificial Neural Network (ANN) is a computational model that mimics a biological nervous system. The ANN can detect patterns and trends that are too complex for humans or other statistical models to analyse, such as non-linearity in time series data. Real world applications of the ANN include pattern classification such as handwriting recognition, time series prediction, image compression, credit scoring for loan approval, and machine control, just to name a few.
This research designs a Multilayer Perceptron neural network for time series prediction and compares this
with one of the traditional statistical time series prediction techniques known as the Autoregressive Integrated
Moving Average, ARIMA. The study varies the number of hidden layers and investigates the best activation function
for a set of data. In addition, this study explores the significance of pre-processing in time series prediction through
data transformation by which a dataset having 5 attributes and 1,098,044 instances is converted to another dataset
having 2 attributes and 366 instances by using aggregation, equal frequency binning and feature selection
techniques.
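As an illustration, this transformation pipeline can be sketched in Pandas as follows. The column name issue_date and the CSV file name are illustrative assumptions, since the exact attribute names appear only in Table 1 and Table 2; this is a minimal sketch rather than the authors' exact preprocessing script.

# Sketch of the preprocessing described above, assuming a date column
# named 'issue_date' (the real attribute name may differ).
import pandas as pd

raw = pd.read_csv('Parking_Contravention_Citations.csv',
                  parse_dates=['issue_date'])
raw = raw[(raw['issue_date'] >= '2010-01-01') &
          (raw['issue_date'] <= '2016-12-31')]

# Aggregate individual ticket transactions to daily counts.
daily = raw.groupby(raw['issue_date'].dt.date).size()
daily.index = pd.to_datetime(daily.index)

# Bin the daily counts at an equal weekly frequency and take the mean,
# yielding a 2-attribute series of roughly 366 weekly instances.
weekly_mean = daily.resample('W').mean()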
The rest of this paper is organized as follows: Section 2 gives the background information and related works,
Section 3 presents the main theoretical framework, Section 4 describes the implementation details, Section 5 is
devoted to the experimental result and discussion. Finally, a short conclusion is set forth in Section 6.
2. Related Work
Artificial neural networks (ANNs) have been applied to time series forecasting problems by many researchers. The
study in [1] employed the Elman recurrent neural network (ERNN) with stochastic time effective functions for
predicting price indices of stock markets. The ERNN can keep memory of recent events in predicting the future. The
study in [2] used the Multilayer Feed Forward Neural Network (MLFFNN) and the Nonlinear Autoregressive
models with the Exogenous Input (NARX) Neural Network to forecast exchange rates in a multivariate framework.
Experimental findings indicated that the MLFFNN and NARX were more efficient when compared with the
Generalized Autoregressive Conditional Heteroskedastic (GARCH) and Exponential Generalized Autoregressive
Conditional Heteroskedastic (EGARCH).
Another advanced statistical technique for predicting future time series is the Autoregressive Integrated Moving
Average (ARIMA) model, which assumes that the time series data are stationary, that is, that their statistical
properties do not depend on time. Thus, using the ARIMA for time series prediction requires checking for
stationarity; a common approach is the augmented Dickey-Fuller (ADF) test for the presence of a unit root in a
sample. Specifically, if the p-value is greater than 0.05, the null hypothesis of a unit root cannot be rejected and the
series is treated as non-stationary.
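The ADF test is available in statsmodels; the following is a minimal sketch, assuming a Pandas series weekly_mean holding the weekly ticket means from the preprocessing stage.

# Stationarity check with the augmented Dickey-Fuller test; a sketch
# assuming `weekly_mean` is a pandas Series of the time series values.
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value = adfuller(weekly_mean)[:2]
if p_value > 0.05:
    # A unit root cannot be rejected: difference (or otherwise
    # transform) the series before fitting an ARIMA model.
    weekly_mean = weekly_mean.diff().dropna()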
In addition, hybrid techniques combining ARIMA and ANN have been shown to be successful [3, 4, 5, 6].
However, [3] assumes that the linear and non-linear patterns can be modelled separately, that their relationship is
additive, and that the residual from the linear model contains only the non-linear pattern; these assumptions may
lead to performance degeneration, for instance, if the relationship is multiplicative. The empirical evidence from the
study in [7] showed that such integrated approaches may not necessarily outperform the individual forecasting
techniques. Although the authors in [4] proposed a hybrid model to overcome the limitations of the traditional
hybrid models and guarantee that the model will not be worse than the individual ARIMA and artificial neural
network models, this assurance cannot hold in all cases. Hence, in this study we focus on comparing the individual
models using the parking tickets dataset.
3. Theoretical Framework

A perceptron, the simplest neural network, consists of a single neuron with a Linear Threshold
Unit (LTU) activation function. The activation function commonly used in most artificial network configurations is
the sigmoid function because of its ability to combine linear, curvilinear and constant behaviors, as well as being
smoothly differentiable.
The single perceptron output is defined by
$$y = t + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \qquad (1)$$
where $t$ is the threshold and $w_1, w_2, \ldots, w_n$ are the weights associated with the input attributes $x_1, x_2, \ldots, x_n$.
The major drawbacks of a simple neural network include:
- Single neurons cannot solve complex tasks;
- It is restricted to linear calculations;
- Nonlinear features need to be generated by hand, an expensive operation.
The focus of this paper is the Multilayer Perceptron (MLP). A multilayer perceptron is a feedforward neural
network consisting of a set of inputs, one or more hidden layers and an output layer. The layers in an MLP are fully
connected, such that neurons between adjacent layers are fully pairwise connected while neurons within a layer
share no connection.
The input layer represents the raw data $(x_1, \ldots, x_n)$ fed into the network. The raw data and the weights are fed
into the hidden layer. The input to the hidden layer is thus given as
$$H_{in} = b + \sum_{i=1}^{n} w_i x_i \qquad (2)$$
where $b$ is a bias term.
The hidden layer is the processing unit where the learning occurs. The hidden layer transforms the values
received from the input layer using an activation function. A commonly used activation function is the sigmoid
function given as
$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$
Other activation functions are:
i. tanh(x), a non-linear activation function that is a scaled sigmoid function, given as:
$$\tanh(x) = 2\sigma(2x) - 1 \qquad (4)$$
ii. the Rectified Linear Unit (ReLU), an activation function with a threshold of zero, given as:
$$f(x) = \max(0, x) \qquad (5)$$
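For illustration, equations (3)-(5) translate directly into NumPy; the short sketch below also verifies the scaled-sigmoid identity of equation (4).

# Equations (3)-(5) as NumPy functions (a sketch for illustration).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # equation (3)

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0  # equation (4), a scaled sigmoid

def relu(x):
    return np.maximum(0.0, x)            # equation (5)

x = np.linspace(-3, 3, 7)
assert np.allclose(tanh(x), np.tanh(x))  # identity with the usual tanh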
The output of the hidden layer is given as:
$$H = A(H_{in}) = A\!\left(b + \sum_{i=1}^{n} w_i x_i\right) \qquad (6)$$
where $A$ is the activation function. Assuming the sigmoid, this gives
$$H = \frac{1}{1 + e^{-\left(b + \sum_{i=1}^{n} w_i x_i\right)}} \qquad (7)$$
The output layer receives the outputs and the associated weights of the hidden layer neurons as inputs. The
output $y$ of the output layer, assuming a sigmoid function, is given as
$$y = \sigma\!\left(\sum_{j=1}^{h} H_j w_j\right) \qquad (8)$$
where $H_j$ and $w_j$ are the output and weight of the individual neurons of the hidden layer.
The activation function of the output layer is commonly a linear function, and depending on the task, a tanh or a
sigmoid function may be applicable.
A multilayer perceptron architecture having 2 hidden layers, denoted as a 2-layer multilayer perceptron neural
network, is shown in Figure 2.
The main issue with a multilayer perceptron neural network is the weight adjustment in the hidden layers, which is
necessary to reduce the error at the output layer. The weight adjustment in the hidden layers is achieved using the
backpropagation algorithm. Backpropagation takes the sequence of training samples (time series data for this
study)
$$(x_1, y_1), (x_2, y_2), \ldots, (x_p, y_p)$$
as the input and produces a sequence of weights $(w_0, w_1, w_2, \ldots, w_n)$ starting from some initial weight $w_0$,
usually chosen at random [4]. Generally, the backpropagation rule is given as:
$$w_{i+1} = w_i - \eta \frac{\partial E(w)}{\partial w} \qquad (9)$$
where $w$ represents the weights and $E(w)$ is the cost function that measures how far the current network's output
is from the desired one. $\partial E(w)/\partial w$ is the partial derivative of the cost function $E$ that specifies the direction of
the weight adjustment to reduce the error, and $\eta$ is the learning rate, which controls the step size of each iteration
of the weight update equation.
The weight change for the hidden layer is given as:
$$\Delta w_h = \eta\, \delta_h\, x_i \qquad (10)$$
where $\delta_h = H(1 - H) \sum_o \delta_o w_o$.
The weight change for the output layer is given as:
$$\Delta w_o = \eta\, \delta_o\, H \qquad (11)$$
where $\delta_o = y_o (1 - y_o)(T - y_o)$, $T$ is the target output and $y_o$ is the actual output.
The network is trained by adjusting the network weights as defined in equations (9)-(11) above to minimize the
output errors on a set of training data.
The training of a multilayer perceptron can be summarized as follows:
- A dataset D with inputs $(x_1, \ldots, x_n)$ and P patterns is given for the network to learn.
- The network with n input units is fully connected to h nonlinear hidden layers via connection weights $w_{ih}$
associated with each input unit.
- The hidden layer is fully connected to T output units via connection weights $w_{ho}$ associated with each neuron
in the hidden layer.
- The training is initiated with random initial weights for each neuron in the network.
- An appropriate error function $E(w)$ to be minimized by the network, for instance the Mean Square Error (MSE),
is predetermined.
- The learning rate $\eta$ is also predetermined.
- The weight associated with each neuron in the hidden layer and the output layer is updated using the equation
$\Delta w = -\eta \frac{\partial E(w)}{\partial w}$ until the error function is minimized.
A momentum $\alpha$ is an inertia term used to diminish fluctuations of weight changes over consecutive iterations.
Thus, the weight update equation becomes:
$$\Delta w_t = -\eta \frac{\partial E(w)}{\partial w} + \alpha\, \Delta w_{t-1} \qquad (12)$$
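A minimal NumPy sketch of the update rules in equations (9) and (12) follows; the gradient function passed in is a hypothetical placeholder for the backpropagated $\partial E(w)/\partial w$, and the toy quadratic cost is illustrative only.

# Sketch of the weight update with momentum, equations (9) and (12).
# `grad_E` stands in for the backpropagated gradient dE(w)/dw.
import numpy as np

def sgd_momentum_step(w, grad_E, velocity, eta=0.01, alpha=0.9):
    # delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)
    velocity = -eta * grad_E(w) + alpha * velocity
    return w + velocity, velocity

# Example on a toy quadratic cost E(w) = ||w||^2 with gradient 2w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = sgd_momentum_step(w, lambda w: 2.0 * w, v)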
Figure 1: Single Layer Perceptron Neural Network
Figure 2: Multi-Layer Perceptron Neural Network (MLP)
4. Implementation
4.1 Development Environment and Tools
All our experiments are performed on a 64-bit operating system on a laptop with a 2.4 GHz Intel(R) Core(TM) i5
processor and 8 GB of installed memory. The programming language is Python and the development environment
is Enthought Canopy. The machine learning tools used are Scikit-learn [9] and the Keras libraries [10], together
with Pandas, NumPy, statsmodels and Matplotlib.
4.2 Dataset
The dataset for this study is a set of parking contravention transactions updated monthly by the City of
Winnipeg under an open government data license, available at [11]. The dataset has five attributes and over a
million instances comprising parking tickets issued between January 1st, 2010 and March 31st, 2017. For this
paper, seven years of data (2010-2016) are used. The description and a preview of the dataset are presented in
Table 1 and Table 2, respectively.
Table 1: Dataset Description
Dataset Name                              Number of attributes    Number of Instances
Parking_Contravention_Citations.csv       5                       1.09M
Robert Nau, Lecture notes on forecasting, Fuqua School of Business, Duke University.
http://people.duke.edu/~rnau/Slides_on_ARIMA_models--Robert_Nau.pdf
4.3 Evaluation
The models are evaluated using the root mean square error (RMSE) and the coefficient of determination (R2).
The RMSE is the square root of the mean square error, a risk metric corresponding to the expected value of the
squared error loss function, defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2} \qquad (13)$$
The coefficient of determination is a measure of the goodness of fit of the model. It explains how well future
samples are likely to be predicted by the model [4]. The value of R2 can be negative or positive; a negative R2
indicates a model that performs arbitrarily worse than simply predicting the mean. It is defined as
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}, \quad \text{where } \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad (14)$$
The summary statistics for the dataset presented in Table 4 show that the minimum weekly mean between the
years 2010 and 2016 is 178 tickets while the maximum is 1341 tickets. The graph of the dataset presented in
Figure 4 shows that there is a spike in ticket numbers around January-February each year, when the snow-related
violation tickets are issued.
Figure 3: Implementation Chart
Figure 4: Dataset Trends Graph
Figure 5: ACF and PACF Plot
Table 5: Augmented Dickey-Fuller (ADF) Test
Table 6: ARIMA (p, d, q) Results (RMSE and R2)

5. Experimental Results and Discussion
All the models are separately trained for up to 1000 epochs using the sigmoid activation function, and a
comparison is made using the tanh activation function. The relationship between the sigmoid and tanh activation
functions is stated in equation (4). The optimizer selected for the training is the Stochastic Gradient Descent (SGD)
optimizer with a default learning rate of 0.01. The dataset is standardized using the MinMaxScaler function in the
range (-1, 1). An attempt to use the sigmoid activation function in the output layer resulted in a negative R2
(-9.67); thus, a linear activation function is used for the output layer of all the architectures. The setup is presented
in Table 7. The loss function specified for all the models is the Mean Square Error (MSE); RMSE and R2 are
subsequently calculated for evaluation.
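For illustration, the best-performing configuration can be sketched in Keras as follows. This is a minimal sketch, not the authors' exact script: the hidden-layer sizes (4, 1 and 1 neurons) are inferred from the 4H411 naming, and the one-step-ahead supervised pairs built from weekly_mean are an assumption, since the exact setup appears only in Table 7.

# Sketch of a 4H411-style MLP: three tanh hidden layers (4, 1 and 1
# neurons, inferred from the name) and a linear output, trained with
# SGD at the default learning rate of 0.01 on MSE loss.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

scaler = MinMaxScaler(feature_range=(-1, 1))
series = scaler.fit_transform(weekly_mean.values.reshape(-1, 1))
X, y = series[:-1], series[1:]   # one-step-ahead supervised pairs

model = Sequential()
model.add(Dense(4, input_dim=1, activation='tanh'))
model.add(Dense(1, activation='tanh'))
model.add(Dense(1, activation='tanh'))
model.add(Dense(1, activation='linear'))  # linear output layer
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01))
model.fit(X, y, epochs=1000, verbose=0)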
The results presented in Table 8 for the sigmoid activation function show that a 2-layer network with one neuron
in the hidden layer has the best goodness of fit, with a coefficient of determination R2 of 0.61 and an error of 0.103.
Adding more neurons to a layer does not improve performance, as seen in the results for 2H1 and 2H4. Similarly,
adding layers to the network does not improve the prediction capability of the network. The root mean square
error, RMSE, increases from 0.103 for the 2H1 network to 0.104 for 3H41, while the coefficient of determination,
R2, increases to 0.66 for the 3H41 network from 0.61 for the 2H1 network.
The results from Table 9 for the networks designed using the tanh activation function show performance
improvement when more layers are added to the network, up to 4H411 (depicted in Figure 7) where the best result
is recorded. Further addition of layers beyond 4H411 adds no value to the prediction capability and goodness of fit
of the network.
The comparative analysis of the results presented in Table 10 and Figure 6 shows that the 4H411 neural network
designed with the tanh activation function has the lowest error (RMSE = 0.099), with an average prediction error of
57 tickets per week. A 2-layer MLP with one neuron in the hidden layer also performs better than ARIMA (2,0,2),
with an average prediction error of 60 tickets per week.
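For comparison, the ARIMA(2,0,2) baseline can be fitted with statsmodels; the sketch below uses the arima_model API current at the time of this study and assumes the log-transformed weekly series from the preprocessing stage.

# Sketch of the ARIMA(2,0,2) baseline, assuming `weekly_mean` is the
# weekly mean series; the log transform follows the preprocessing
# described in this paper.
import numpy as np
from statsmodels.tsa.arima_model import ARIMA

log_weekly = np.log(weekly_mean)
fit = ARIMA(log_weekly, order=(2, 0, 2)).fit(disp=0)
forecast = np.exp(fit.forecast(steps=4)[0])   # next four weeks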
Table 7: Multilayer Perceptron Architecture
Table 8: Sigmoid Function Evaluation Results (RMSE and R2)
Figure 6: Comparison of evaluation results (RMSE and R2) for MLP tanh (4H411), MLP sigmoid (2H1) and ARIMA (2,0,2)
Figure 7: 4H411 MLP
6. Conclusion
The performance of the multilayer perceptron neural network and ARIMA models has been investigated in this
research. Observations from the performance evaluation of the models revealed that the four MLP architectures
designed using the tanh activation function outperform the ARIMA model. Specifically, the 4H411 model produces
the best goodness of fit (R2 = 0.77) and the lowest prediction error (RMSE = 0.099). The effect of adding more
layers on the performance of a multilayer perceptron neural network is also investigated. Using the sigmoid
activation function, a 2-layer MLP having one neuron in the hidden layer has the best performance in terms of both
the prediction error (RMSE = 0.103) and the coefficient of determination (R2 = 0.61) measures. The empirical
evidence from this study indicates that adding more layers to a network configured using the sigmoid function may
not necessarily improve the predictive power of the network and may result in performance degeneration.
Like the sigmoid activation function, the tanh activation function also has a saturation effect; however, unlike the
sigmoid, the output of the tanh activation function is zero-centered. Thus, adding layers to a network configured
using the tanh activation function can improve the performance of the network, as demonstrated in this study.
From the results in Table 9, it can be observed that adding more layers reduces the prediction error and improves
the goodness of fit of the network up to the 4-layer network (4H411).
In addition, pre-processing datasets is a necessity for models like the ARIMA and MLP investigated in this
study. The ARIMA model requires stationary time series data. This is achieved by first aggregating the ticket
transactions to daily counts, grouping the mean values at an equal weekly frequency, and then applying the
logarithm function to them. Standardization is a requirement for multilayer perceptron networks to remove the bias
that might be caused by wide variation in the range of values of the raw data during training. From the summary of
the pre-processing stage in Table 4, it can be observed that standardization is required, since the minimum average
number of tickets per week is 178 while the maximum is 1341. This study used the MinMaxScaler function of the
Scikit-learn library to transform the dataset to the range [-1, 1].
Our experiments suggest that choosing a good activation function can significantly improve the performance of a
multilayer perceptron neural network.
ACKNOWLEDGEMENTS
The first author would like to thank two anonymous referees for their helpful comments. Special thanks to
Dr. Sheela Ramanna and Dr. Sergio Camorlinga, University of Winnipeg, for their helpful comments at the initial
stage of this work.
REFERENCES
[1] Wang, J., Wang, J., Fang, W., & Niu, H. (2016). Financial time series prediction using Elman recurrent random neural networks. Computational Intelligence and Neuroscience, vol. 2016, Article ID 4742515, 14 pages.
[2] Chaudhuri, T. D., et al. (2016). Artificial neural network and time series modeling based approach to forecasting the exchange rate in a multivariate framework. Journal of Insurance and Financial Management, 1(5), 92-123.
[3] Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159-175.
[4] Khashei, M., & Bijari, M. (2011). A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Applied Soft Computing, 11(2), 2664-2675.
[5] Babu, C. N., & Reddy, B. E. (2014). A moving-average filter based hybrid ARIMA-ANN model for forecasting time series data. Applied Soft Computing, 23, 27-38.
[6] Khandelwal, I., Adhikari, R., & Verma, G. (2015). Time series forecasting using hybrid ARIMA and ANN models based on DWT decomposition. Procedia Computer Science, 48, 173-179.
[7] Taskaya-Temizel, T., & Casey, M. C. (2005). A comparative study of autoregressive neural network hybrids. Neural Networks, 18(5-6), 781-789.
[8] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
[9] Scikit-learn: Machine Learning in Python. http://scikit-learn.org/stable/index.html
[10] Keras Deep Learning Documentation. https://keras.io/
[11] City of Winnipeg parking contravention dataset. https://data.winnipeg.ca/Parking/Parking-Contravention-Citations-/bhrt-29rb/data