Article
A Comparison of Regularization Techniques in Deep Neural Networks
Ismoilov Nusrat 1 and Sung-Bong Jang 2, *
1 Department of Computer Software Engineering, Kumoh National Institute of Technology, Gyeong-Buk 39177, South Korea
2 Department of Industry-Academy, Kumoh National Institute of Technology, Gyeong-Buk 39177, South Korea
* Correspondence: Tel.: +82-054-478-6708
Received: 29 October 2018; Accepted: 14 November 2018; Published: 18 November 2018
Abstract: Artificial neural networks (ANNs) have attracted significant attention from researchers because many complex problems can be solved by training them. If enough data are provided during the training process, ANNs can achieve good performance. However, if the training data are insufficient, the predefined neural network model suffers from overfitting and underfitting problems. To solve these problems, several regularization techniques have been devised and widely applied in applications and data analysis. However, it is difficult for developers to choose the most suitable scheme for an application under development because there is no information regarding the performance of each scheme. This paper describes comparative research on regularization techniques by evaluating the training and validation errors of a deep neural network model using a weather dataset. For the comparisons, each algorithm was implemented using the TensorFlow neural network library. The experimental results showed that the autoencoder had the worst performance among the schemes. When prediction accuracy was compared, data augmentation and batch normalization showed better performance than the others.
1. Introduction
Accurate weather forecasting is an important issue that plays a significant role in the development
of several industrial sectors, such as agriculture and transportation. Many companies are using weather
prediction techniques to analyze consumer demands. In addition, accurate forecasting is essential for people to organize and plan their days. However, it is very difficult to predict the weather precisely because the atmosphere changes dynamically. For a long time, physical simulations were the most widely used scheme. With this method, the current atmospheric condition is sampled, and future conditions are predicted by comparing thermodynamic characteristics. In recent years, artificial neural networks (ANNs) have been widely used for weather prediction because, as machine learning models, they can learn patterns directly from data and often perform better. The human brain is composed of about 100 billion interconnected neurons. These neurons are the core cells responsible for transmitting information to one another using electrochemical signals.
ANNs are modeled on a mechanism inspired by the human brain's information processing. This scheme was first introduced to researchers in 1943 by Warren and Walter [1] and is currently used in almost every scientific area to solve complex problems. Williams [2] demonstrated the efficiency of machine learning algorithms and showed that they could be applied to many applications. Nicholas [3] proposed an enhanced scheme to train neural network algorithms, in which a statistical computation scheme was used to reduce the training errors. Zhang [4] proposed a new recurrent neural network (RNN) scheme based on synchronization of delays and impulses, which reduced prediction errors. Isomura [5] applied neural networks to knowledge inference: useful information was collected, and an inference rule was extracted using deep neural networks (DNNs). Elusai [6] used ANNs to model bronze corrosion behavior in order to prevent corrosion. In the scheme, corrosion types were classified into different features, and future corrosion behaviors were predicted. Jian [7] enhanced an existing principal component analysis (PCA) neural network with new discrete-time algorithms. The experimental results showed that validation errors were significantly decreased. Yin [8] applied a DNN to classify server states to enhance the quality of service in cloud environments. The experiments showed that the scheme could be used to find servers that were in use or broken.
During training, the type and amount of input data directly influence the performance of the ANN model [9]. If the training data are deficient, overfitting or underfitting occurs [10]. Overfitting refers to the phenomenon where the validation error increases while the training error decreases [11]. This occurs because the model memorizes the expected output for each training input instead of learning the real data distribution [12,13]. In contrast, underfitting occurs when a model cannot learn enough because of insufficient training data [14,15]. Many solutions have been proposed to prevent these problems. The most widely used method is regularization, in which small modifications are applied to the original data or training process so that the model can be trained efficiently [16]. One advantage of this method is that it achieves better performance on unseen data. In weather prediction, it is used when predicting rainfall, temperature, and humidity [17].
In this study, we summarize prior research on weather prediction using ANNs. Several studies have been completed on accurately predicting the weather. These studies included a method based on an ANN model to predict the air temperature at hourly intervals for up to 12 h [18,19]. The prediction error was minimized, and the method achieved good performance for short-term forecasting. Other research [20,21] developed a new model to predict the hourly temperature for up to 24 h; the model treated winter, spring, summer, and fall as separate seasons. Experiments were conducted to compare the performance of well-known ANN models, including the Elman recurrent neural network (ERNN), the radial basis function network (RBFN), the multilayer perceptron network (MLP), and the Hopfield model (HFM). The MLP model with a single hidden layer and the RBFN model with two hidden layers outperformed the other models. In the MLP experiment, the log-sigmoid function was used as the activation function, and the Gaussian activation function was used for the hidden layers in the RBFN. The temperatures predicted by both models were plotted as lines. Although the accuracy of both models was identical, the RBFN had better processing times because the MLP took much longer to learn the data. Some research compared several ANN models by applying various transfer functions, hidden layers, and neurons to predict the maximum temperature for the year. The model included five hidden layers, and each hidden layer contained 10 or 16 neurons. The tan-sigmoid activation function in the hidden layers, combined with the logistic sigmoid function, showed the best results [22].
Xiaodong [23] presented a data augmentation scheme to improve performance on audio data. In the scheme, extra sampled data were added to the original input audio data and used for training and validation. Huaguang [24] analyzed the stability of RNNs, which are the most widely used scheme for analyzing and predicting time series data. Songchuan [25] proposed a new training algorithm, called fireworks, to predict the mean temperature. The algorithm showed fast convergence and reduced training cycles. Takashi [26] proposed a new RNN algorithm based on asynchronous negotiation and reproduction. This algorithm improved prediction efficiency when applied to time series data. Hayati [27] used an ANN model with a single hidden layer and six neurons, which showed good performance. Many of the latest advances can be found in image processing and object detection. Cao [28] invented a fast DNN algorithm that uses additional knowledge during training. It was applied to object detection in streaming video, and the results showed that it achieved good performance. Wang [29] applied a convolutional neural network (CNN) to detect a salient object in input images. A salient object refers to the
Symmetry 2018, 10, 648 3 of 18
most important object, which expresses the most outstanding characteristic of an image. To do this, they used additional metadata together with a CNN. Yue [30] applied neural networks to detect collisions between cars. The video stream captured by the traffic system was very complex and dynamic, which made collision detection difficult. To solve this problem, they presented an enhanced DNN algorithm based on feature enhancement. Huang [31] used a neural network to improve the detection accuracy of traffic monitoring systems. One problem in moving object detection is that accuracy drops when there are too many moving objects in a video stream. To solve this problem, they used a DNN to detect moving objects accurately. Akcay [32] used a deep convolutional neural network (DCNN) to classify and detect objects in X-ray images scanned at an airport, saving the time and expense spent investigating and detecting dangerous objects. Sevo [33] used a CNN to detect objects in images captured from the air. Detecting objects from the air is very difficult because all aerial scenes are expressed as three-dimensional (3D) images. By using a CNN, he could decrease the complexity of aerial object detection. Image processing is one of the best-known areas in which neural networks are applied. Woźniak [34] presented an enhanced object detection method in which convolutional neural networks were combined with the analysis of clustered numbers; fuzzy logic was used to determine the cluster points. Vieira [35] presented methods and applications of deep learning applied to neuroimaging, which produces images of the structure of the human brain and is used in treating mental disease. In their paper, they argued that deep learning could be an efficient method for improving brain image quality by training the neural network. Polap [36] described a practice in which an ANN was applied to detect potential skin diseases. In the method, skin data were collected using motion sensors and a camera, and an ANN model was trained using the data. Then, using the model, they determined whether the skin was diseased. Heaton [37] described an application of deep-learning stochastic models in finance, where deep learning was used to predict and classify financial data. Most hospital doctors use X-rays to classify carcinomas in chest organs. However, misdiagnoses occur because it is difficult for radiologists to interpret X-ray results exactly. To solve this problem, Wozniak [38] applied neural networks to improve the accuracy of carcinoma classification. The experimental results showed that they reached 92% classification accuracy. Litjens [39] described recent advances in deep learning applications for medical image analysis. In the paper, they reviewed more than 300 studies in this field and, to give a concise review, divided them into 10 medical application areas. Wozniak [40] presented a new method based on neural networks to detect defects in fruit peels, which was very different from classical schemes. They invented an enhanced ANN algorithm called an adaptive artificial neural network (AANN), which improved calculation accuracy because it adapted to the input data and their characteristics. Wang [41] presented an overview of machine learning applications in manufacturing. With widespread sensors and the Internet of Things (IoT), huge amounts of data can be collected in manufacturing systems, and deep learning can analyze these big data to improve system performance and product quality.
As described earlier, active research is being conducted on neural networks and overfitting solutions. However, to the best of our knowledge, no research has compared regularization schemes. Therefore, it is difficult for developers to choose the most suitable scheme for an application, because there is no information about the performance of each scheme. To address this problem, this study presents comparative research on regularization techniques by evaluating the training and validation errors of a DNN model using weather datasets. In particular, choosing an appropriate regularization scheme is important for managing the large number of augmented objects in an intelligent mobile augmented reality (IMAR) system.
The remainder of this paper is organized as follows. Section 2 describes the research methodology
and experiment setup. In Section 3, experiment results are described and analyzed. Section 4 presents
a discussion of the results. Finally, Section 5 concludes this work.
[Flowchart: "Start"; "Apply Regularization Methods to the Original Data"; "End".]
Let x(i) be the feature vector for the ith set of five consecutive days, and let y(i) be the one-dimensional vector that contained the target feature for the ith single day. The prediction of y(i) with a given x(i) can be expressed using Equation (1):
In Equation (2), m is the number of training examples. For supervised machine learning, the data are typically divided into two types: training and testing data. However, to obtain a better-tuned model, validation data were used in addition to these. The validation data are referred to as the development dataset, or dev dataset. The goal of this dataset was to fine-tune the hyperparameters (architecture) of the ANN model. The model was evaluated on this data frequently, but it did not learn from it. This set was used to obtain the optimal number of hidden units.
The dataset was divided using the scikit-learn library as follows. First, the data were split into training and temporary data: approximately 80% of the entire dataset was used as training data, and 20% was used as temporary data. The temporary dataset was then split into two equal parts, test and validation. Using the entire dataset for both training and validation would increase prediction inaccuracy and training errors, so it is better to use the divided dataset. Cross-sampling methods, such as cross-validation, could further decrease the errors. However, we did not use these methods because the input algorithms would have had to be changed to apply cross-sampling. We can apply these algorithms in future work.
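The windowing and 80/10/10 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes seven daily features (consistent with the 7 × 5 = 35 input neurons of the model), assumes y(i) is the target value on the day after each five-day window, uses synthetic data in place of the weather dataset, and relies on scikit-learn's `train_test_split` with a fixed `random_state`. The helper `make_windows` and the `target_col` index are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

N_FEATURES = 7   # daily weather features (assumption: j = 7 in Equation (10))
WINDOW_DAYS = 5  # consecutive days per input vector (i = 5)


def make_windows(daily, target_col, window=WINDOW_DAYS):
    """Hypothetical helper: build x(i) from `window` consecutive days and
    y(i) from the day that follows the window.

    daily has shape (num_days, N_FEATURES); X has shape
    (num_samples, window * N_FEATURES) and y has shape (num_samples,).
    """
    xs, ys = [], []
    for start in range(len(daily) - window):
        xs.append(daily[start:start + window].reshape(-1))  # flatten 5 x 7 -> 35
        ys.append(daily[start + window, target_col])         # next day's target value
    return np.asarray(xs), np.asarray(ys)


# Synthetic stand-in for the weather data (the real dataset is not reproduced here).
rng = np.random.default_rng(0)
daily = rng.normal(size=(1005, N_FEATURES))
X, y = make_windows(daily, target_col=0)  # column 0 assumed to be average temperature

# Step 1: about 80% training data and 20% temporary data.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 2: split the temporary data into two equal parts, test and validation.
X_test, X_val, y_test, y_val = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(X.shape, X_train.shape, X_val.shape, X_test.shape)
```

Because `train_test_split` shuffles by default, windows from neighboring days can land in different splits; passing `shuffle=False` would keep the splits in chronological order instead.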
Next, we defined and chose an appropriate architecture for the neural network. This process usually requires significant experience because many factors must be decided carefully. One factor is how many layers the model should have. The basic model included one input layer, two hidden layers, and one output layer, as illustrated in Figure 2. The input layer contained 35 neurons, the hidden layers contained 50 neurons each, and the output layer had 1 neuron. The topology of our model was therefore 35-50-50-1.
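To make the 35-50-50-1 topology concrete, the following NumPy sketch builds the weights and runs a forward pass. It assumes ReLU on both hidden layers (as stated for the experiments), a linear output neuron for regression, and He-style random initialization; the last two are our assumptions, and no training or proximal Adagrad update is shown.

```python
import numpy as np

LAYER_SIZES = [35, 50, 50, 1]  # input, hidden, hidden, output


def init_params(sizes, seed=0):
    """One (weight, bias) pair per layer transition; He-style init is an assumption."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """ReLU on the hidden layers, linear output neuron for regression."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(0.0, h @ w + b)
    w_out, b_out = params[-1]
    return h @ w_out + b_out


params = init_params(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in params)  # (35*50+50) + (50*50+50) + (50+1)
out = forward(params, np.ones((4, 35)))             # a batch of 4 input vectors
print(n_params, out.shape)  # 4401 (4, 1)
```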
[Figure 2 labels: Wind Speed, Humidity, Daylight Hours, Average Temperature, Cloudiness, Input Data, Data Processing Neurons (Nodes), Output.]
Figure 2. The neural network model applied in the experiment.
After defining the model, we configured the parameter settings listed in Table 1. The columns include the basic DNN model and the regularization schemes, and the rows include the parameters for each model. The first parameter was the number of input neurons, which was set to 35 for most models, as previously described. The next parameter was the number of hidden layers, which was 2. In practice, this value can be increased or decreased according to the central processing unit (CPU) capability: if the CPU capability is high, the number can be increased because training takes less time. In our experiments, 2 was the most appropriate value because the processing time increased exponentially when the value was greater than 3. The third parameter was the number of neurons in the hidden layers. A higher number gave better results, but there was a trade-off between the number of neurons and the processing time; in our experiment, the value was set to 50. The number of output neurons was set to 1 because there was only one target feature. The learning rate was set to 0.0001. Although this relatively low rate made processing slow, the results were more reliable. The proximal Adagrad optimizer was used to optimize our model. The batch size was 100 and the maximum number of epochs was 100,000. The rectified linear unit (ReLU) was used as the activation function.
Table 1. Parameter settings applied to a temperature prediction neural network model.
Parameters for Each Model | Typical DNN | Data Augmentation | Batch Normalization | L1 Regularization | Autoencoder
Number of input neurons | 35 | 35 | 35 | 35 | 35
Number of hidden layers | 2 | 2 | 2 | 2 | 2
In the third step, the defined neural network model was trained using the original data, to which no regularization methods were applied. During training, the root mean square errors (RMSEs) were captured and saved into a separate file. The RMSE value could be obtained using the following equation:
$$\mathrm{RMSE}(\mathrm{NNModel}) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\frac{\mathrm{Answer}(X_i) - \mathrm{Predict}(X_i)}{\mathrm{Answer}(X_i)}\right)^{2}} \tag{3}$$
In Equation (3), Answer(Xi) is the real answer data at time i, and Predict(Xi) is the value predicted by the trained neural network model. In the fourth step, the defined neural network model was validated, and the validation errors were captured using Equation (3). Overfitting and underfitting were checked in the third and fourth steps.
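Equation (3) divides each residual by the true value Answer(Xi), so it is a relative RMSE rather than the plain RMSE. A minimal NumPy sketch of that formula follows; the function name `rmse_eq3` is chosen here for illustration.

```python
import numpy as np


def rmse_eq3(answer, predict):
    """Equation (3): square root of the mean squared *relative* error.

    Each residual Answer(X_i) - Predict(X_i) is divided by Answer(X_i),
    so the true values must be non-zero.
    """
    answer = np.asarray(answer, dtype=float)
    predict = np.asarray(predict, dtype=float)
    if np.any(answer == 0):
        raise ValueError("Equation (3) divides by Answer(X_i); zeros are not allowed")
    relative = (answer - predict) / answer
    return float(np.sqrt(np.mean(relative ** 2)))


print(rmse_eq3([10.0, 20.0], [9.0, 22.0]))  # approximately 0.1
```

Because the denominator is the measured value, this relative form becomes unstable for temperatures near 0 °C; the plain RMSE, `np.sqrt(np.mean((answer - predict) ** 2))`, avoids that division.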
In the fifth step, regularization methods were applied to the original weather data; the applied methods are described in more detail in Section 2.2. In steps 5 and 6, the defined model was trained and validated using the regularized datasets. In step 7, the future temperature was predicted using the trained neural network models to which regularization methods had been applied. Finally, the schemes were compared by analyzing the training, validation, and prediction errors.
In Equation (4), z is known as the latent space representation; it is sometimes called a code or latent variable. Here, σ is the activation function, such as the ReLU, sigmoid, or Leaky ReLU function, W is the weight matrix of the nodes, and b is the bias vector. In the reconstruction process, the same operation is repeated, as shown in Equation (5) [43]:
$$x' = \sigma'(W' z + b') \tag{5}$$
In the decoding process, the compressed data z is mapped to x′, where x′ represents the transformed input data with the same dimension as the input x, and σ′ is an activation function used to decompress the data. W′ is the weight of the transformed nodes, and b′ is a bias in the decoder. To obtain satisfactory performance using the autoencoder scheme, the decoding loss should
be minimized. The sum of squared errors (SSE) or an RMSE function was used to measure the loss, as in Equation (6):
$$F(x, x') = \lVert x - x' \rVert^{2} \tag{6}$$
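The encode, decode, and loss steps can be sketched in NumPy for a single input vector. This sketch assumes Equation (4) has the standard form z = σ(Wx + b), mirroring Equation (5); it takes σ as ReLU and σ′ as the identity, and the 10-unit latent size is a hypothetical choice. Training (updating W, b, W′, and b′ to minimize Equation (6)) is not shown.

```python
import numpy as np


def relu(v):
    return np.maximum(0.0, v)


def autoencoder_loss(x, W, b, W_dec, b_dec):
    """Encode x, decode it, and return (z, x_rec, loss)."""
    z = relu(W @ x + b)                      # Eq. (4), assumed form: z = sigma(W x + b)
    x_rec = W_dec @ z + b_dec                # Eq. (5) with sigma' taken as the identity
    loss = float(np.sum((x - x_rec) ** 2))   # Eq. (6): squared L2 norm of the error
    return z, x_rec, loss


rng = np.random.default_rng(1)
x = rng.normal(size=35)                                  # one 35-dimensional input
W, b = rng.normal(size=(10, 35)), np.zeros(10)           # encoder: 35 -> 10 (hypothetical size)
W_dec, b_dec = rng.normal(size=(35, 10)), np.zeros(35)   # decoder: 10 -> 35
z, x_rec, loss = autoencoder_loss(x, W, b, W_dec, b_dec)
print(z.shape, x_rec.shape, loss >= 0.0)  # (10,) (35,) True
```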
Data augmentation is one of the most popular regularization techniques. The main idea of the scheme is to expand the training dataset by applying transformations, thereby decreasing overfitting. This technique is commonly used in image processing, since image operations such as rotating, shifting, scaling, mirroring, or random cropping are easy to implement [44]. For data augmentation, it is important to control noise effectively. Several types of noise can be used with the scheme; among them, Gaussian noise is the most widely used. The scheme can be expressed using Equation (9) [45]:
$$L_{\min} = \mathbb{E}_{p}\left[\left(y - f(x + \mu)\right)^{2}\right] \tag{9}$$
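Equation (9) describes training on inputs perturbed by a noise vector μ. A minimal NumPy sketch of Gaussian noise injection follows; the noise level `sigma`, the number of noisy `copies`, and the function name are hypothetical. This illustrates Equation (9) only: the two augmentations actually used in this study are the averaging and summing schemes of Equations (10) and (11).

```python
import numpy as np


def augment_with_gaussian_noise(X, y, copies=2, sigma=0.05, seed=0):
    """Append `copies` noisy versions of X, i.e. x + mu with mu ~ N(0, sigma^2 I).

    Targets are repeated unchanged, since in Eq. (9) the noise enters only
    the input of f.
    """
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(0.0, sigma, size=X.shape) for _ in range(copies)]
    X_aug = np.concatenate([X] + noisy, axis=0)
    y_aug = np.concatenate([y] * (copies + 1), axis=0)
    return X_aug, y_aug


X = np.zeros((100, 35))            # 100 hypothetical 35-feature samples
y = np.arange(100, dtype=float)
X_aug, y_aug = augment_with_gaussian_noise(X, y)
print(X_aug.shape, y_aug.shape)  # (300, 35) (300,)
```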
In Equation (9), μ is the noise vector. This technique is used effectively in RNNs, whereas it is seldom used in feed-forward neural networks [46]. In this study, two augmentation techniques were implemented. The first type of data augmentation combined the partial datasets by averaging them. Let $L_j^{(i)}$ represent the input feature data for weather prediction, where $j \in \{1, 2, \ldots, n\}$ indexes the features and $i \in \{1, 2, \ldots, m\}$ indexes the identical categorical features (in our case, i indexes the days). The final input using the augmented data for the jth categorical data could be expressed as
$$\mathrm{input\_data}_j = \frac{1}{m}\sum_{i=1}^{m} L_j^{(i)}, \tag{10}$$
where $L_j$ is a feature such as the average humidity or average temperature. The number of input features is seven (n = 7), and the number of identical features is five (m = 5). The second augmentation summed the identical categorical features, as shown in Equation (11):
$$\mathrm{input\_data}_j = \sum_{i=1}^{m} L_j^{(i)}. \tag{11}$$
Similar to the first data augmentation technique, the number of input neurons became 7. Because the number of inputs decreased, the number of operations in the model decreased, thereby preventing overfitting.
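Equations (10) and (11) collapse a window of m = 5 days into one value per feature, which is why the input layer shrinks from 35 to 7 neurons. A NumPy sketch, assuming the window is stored as a days-by-features array (the helper name `collapse_window` is ours):

```python
import numpy as np


def collapse_window(window, mode="mean"):
    """Reduce an (m days x n features) window to one value per feature.

    mode="mean" follows Equation (10); mode="sum" follows Equation (11).
    """
    window = np.asarray(window, dtype=float)
    if mode == "mean":
        return window.mean(axis=0)
    if mode == "sum":
        return window.sum(axis=0)
    raise ValueError("mode must be 'mean' or 'sum'")


# Hypothetical window: m = 5 days (rows) by n = 7 features (columns).
L = np.arange(35, dtype=float).reshape(5, 7)
print(collapse_window(L, "mean")[0], collapse_window(L, "sum")[0])  # 14.0 70.0
```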
The third scheme for regularization was batch normalization, which was proposed by Sergey and Christian in 2015. They found that, after batch normalization was implemented in a DNN, regularization techniques such as dropout [47] or L2 regularization were not required to tune the model. Instead, the method addresses internal covariate shift, the change in the distribution of each layer's inputs during training [48]. In addition, by implementing this method, they reduced the training time of the model.
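The core batch normalization transform normalizes each unit over the mini-batch and then applies a learnable scale γ and shift β. The sketch below shows only the training-time forward pass in NumPy, using the batch size of 100 and the 50 hidden units from our settings; the ε value is an assumption, and the running averages used at inference time are omitted.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization forward pass.

    x: (batch_size, num_units); gamma and beta: learnable (num_units,) vectors.
    Each unit is normalized with the mean and variance of the current batch.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(100, 50))  # batch size 100, 50 hidden units
out = batch_norm(batch, gamma=np.ones(50), beta=np.zeros(50))
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])  # means near 0, standard deviations near 1
```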
The fourth scheme applied in the experiment was L1 regularization, also known as the least absolute shrinkage and selection operator (LASSO), which was introduced by Robert [49]. The main idea behind the scheme is to regularize the loss function so that irrelevant features are removed from the model entirely, because their weights are driven to exactly zero [36]. The scheme can be expressed as
$$f(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}_i, y_i) + \lambda \sum_{j=1}^{n} \lvert w_j \rvert. \tag{12}$$
In Equation (12), $L(\hat{y}_i, y_i)$ is a loss function, m is the number of observations, $\hat{y}_i$ is the predicted value (whereas $y_i$ is the actual value), n is the number of weights, and λ is a non-negative regularization parameter. The main objective was to minimize f(w, b), which penalizes the weights in proportion to the sum of their absolute values. As λ increases, the weights w shrink toward zero; as λ decreases, the variance of the model increases.
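Equation (12) can be computed directly. This NumPy sketch assumes squared error as the per-sample loss L and adds the λ term as a penalty, the sign implied by "penalizing weights"; `weight_arrays` is a hypothetical list of the network's weight matrices. In Keras, a comparable penalty can be attached per layer with `kernel_regularizer=tf.keras.regularizers.l1(...)`.

```python
import numpy as np


def l1_regularized_loss(y_pred, y_true, weight_arrays, lam):
    """Equation (12) with squared error as the per-sample loss (an assumption).

    f(w, b) = (1/m) * sum_i L(y_pred_i, y_true_i) + lam * sum_j |w_j|
    where the weight sum runs over every entry of every array in weight_arrays.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    data_term = float(np.mean((y_pred - y_true) ** 2))
    penalty = lam * sum(float(np.sum(np.abs(w))) for w in weight_arrays)
    return data_term + penalty


# Perfect predictions, so only the penalty remains: 0.5 * (|1| + |-2|) = 1.5
print(l1_regularized_loss([1.0, 2.0], [1.0, 2.0], [np.array([1.0, -2.0])], lam=0.5))  # 1.5
```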
• Activation_fn: The activation function for each layer of the neural network. By default, ReLU was used for the layers.
• Optimizer: In this feature of the class, we defined the optimizer type, which optimized the neural
network model’s weights throughout the training process.
• Hidden_units: This contained the number of hidden units (neurons) per layer. For example, [70, 50] means that the first layer has 70 neurons and the second has 50.
• Feature_columns: This argument contained the feature columns and data type used by the model.
• Model_dir: This was the directory for saving model parameters and graphs. In addition, it could
be used to load checkpoints from the directory into the estimator to continue training a previously
saved model.
• Dropout: We needed this feature for implementing a dropout regularization technique in our model.
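The arguments above can be collected to mirror the settings in Table 1. To keep the sketch runnable without TensorFlow, it only builds a plain dictionary; the column name `"x"`, the `model_dir` path, and the helper name are hypothetical. In TensorFlow 1.x, the placeholders would be replaced by objects such as `tf.feature_column.numeric_column("x", shape=[35])`, `tf.nn.relu`, and `tf.train.ProximalAdagradOptimizer(learning_rate=0.0001)`.

```python
def make_dnn_regressor_kwargs(num_inputs=35, hidden_units=(50, 50), dropout=None,
                              model_dir="./dnn_temperature_model"):
    """Collect DNNRegressor-style settings in a plain dict (no TensorFlow import).

    The string and tuple values are placeholders for the TensorFlow objects
    named in the surrounding text; dropout=None means no dropout layer.
    """
    if dropout is not None and not 0.0 <= dropout < 1.0:
        raise ValueError("dropout must be a probability in [0, 1)")
    return {
        "feature_columns": [("x", num_inputs)],  # placeholder: (column name, width)
        "hidden_units": list(hidden_units),       # two hidden layers of 50 neurons
        "activation_fn": "relu",                  # tf.nn.relu in TensorFlow
        "optimizer": {"name": "ProximalAdagrad", "learning_rate": 0.0001},
        "dropout": dropout,
        "model_dir": model_dir,                   # checkpoints and graphs are saved here
    }


baseline = make_dnn_regressor_kwargs()
with_dropout = make_dnn_regressor_kwargs(dropout=0.2)  # 0.2 is a hypothetical rate
print(baseline["hidden_units"], with_dropout["dropout"])
```

Batch size (100) and the number of training steps are not DNNRegressor arguments; in the Estimator API they are supplied through the input function and the `train` call.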
Not all regularization techniques were available in the DNNRegressor class. For example, for L1 regularization and the autoencoder, we had to use another API: Keras, an open-source neural network library written in Python. Because Keras can run on top of TensorFlow, the two libraries were easy to use together.
For visualization, we used TensorBoard, a powerful graph visualization tool released by Google's TensorFlow team. Besides visualizing graphs, this tool plots quantitative metrics about the execution of a graph and shows additional data (e.g., images) that pass through it. Moreover, it allows a programmer to debug a model easily. We also used the Matplotlib plotting library, which provides a wide range of visualization techniques for the Python programming language.
3. Results
This section discusses the results. First, the results of the DNN model trained without any regularization technique are described. Next, the results of the DNN models with regularization techniques are presented. Since overfitting is much more visible in plots, our final results are visualized as graphs whose axes are the error value and the epoch number. After training, very low RMSE values appeared to indicate very good accuracy; however, in some cases, they were accompanied by issues such as overfitting.
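One simple way to check error curves for the overfitting pattern defined in the Introduction (validation error rising while training error keeps falling) is to locate the epoch with the lowest validation error, as in early stopping. A minimal NumPy sketch; the function name and the end-of-run comparison are our own simplification.

```python
import numpy as np


def best_epoch_and_overfit(train_err, val_err):
    """Return (epoch with the lowest validation error, overfitting flag).

    The flag is True when, after that epoch, the validation error ends higher
    than its minimum while the training error ends lower than it was there.
    """
    train_err = np.asarray(train_err, dtype=float)
    val_err = np.asarray(val_err, dtype=float)
    best = int(np.argmin(val_err))
    overfit = bool(
        best < len(val_err) - 1
        and val_err[-1] > val_err[best]
        and train_err[-1] < train_err[best]
    )
    return best, overfit


print(best_epoch_and_overfit([5, 4, 3, 2, 1], [5, 4, 3, 3.5, 4]))  # (2, True)
print(best_epoch_and_overfit([5, 4, 3, 2, 1], [5, 4.5, 4, 3.5, 3]))  # (4, False)
```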
We first present the experimental results of a DNN without regularization. As previously discussed, our model was established with the settings shown in Table 1. After training for 100,000 epochs, the RMSEs for the training and validation data were plotted, as shown in Figures 4 and 5, respectively. Figure 4 shows that the training error changed rapidly as the number of epochs increased; however, the overall training error decreased.
Figure 4. Training error results in a deep neural network (DNN) without regularization methods.
By comparing the results, we concluded that the validation and training errors decreased within the same range. The closeness between the validation and training errors indicated that good generalization was achieved. In some cases, the results suggested that the neural network model needed more training; however, because additional training was time-consuming, we continued our research by implementing regularization methods based on this model.
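The "closeness" between the training and validation errors can be quantified as a per-epoch gap. The sketch below is a minimal illustration under the assumption that both RMSE curves (as in Figures 4 and 5) were recorded at the same epochs; the function names and sample numbers are hypothetical.

```python
def generalization_gaps(train_errors, val_errors):
    """Per-epoch gap between validation and training error (validation minus training)."""
    if len(train_errors) != len(val_errors) or not train_errors:
        raise ValueError("error curves must be non-empty and of equal length")
    return [val - train for train, val in zip(train_errors, val_errors)]


def mean_absolute_gap(train_errors, val_errors):
    """Average size of the gap over all recorded epochs."""
    gaps = generalization_gaps(train_errors, val_errors)
    return sum(abs(gap) for gap in gaps) / len(gaps)


# Hypothetical RMSE curves recorded at the same epochs.
train_rmse = [0.50, 0.40, 0.35, 0.30]
val_rmse = [0.55, 0.44, 0.38, 0.33]
print(generalization_gaps(train_rmse, val_rmse))
print(mean_absolute_gap(train_rmse, val_rmse))
```

A small, stable mean gap corresponds to the good generalization described above, whereas a gap that keeps widening over the epochs is the usual symptom of overfitting.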
Experiment Results for Each Regularization Method
First, we tested a model with the autoencoder method applied, using the same settings. The training and validation errors are illustrated in Figures 6 and 7, respectively. As can be seen from Figure 6, the training errors did not increase as the epoch number increased. However, the validation errors became higher as the epoch number increased, as illustrated in Figure 7. From these graphs, we concluded that the model suffered from an underfitting problem and failed to learn from the data. When we repeated the experiment several times with the autoencoder, the results remained poor, so we did not continue training it. In our analysis, the basic stacked autoencoder model was not appropriate, and its structure would need to be modified.
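Statements such as "the validation errors became higher as the epoch number increased" can be made quantitative by fitting a straight line to each error curve. The following minimal plain-Python sketch returns the least-squares slope of an error curve over the epoch index; the function name `trend_slope` and the sample values are hypothetical and were not part of the original experiments.

```python
def trend_slope(errors):
    """Least-squares slope of an error curve against the epoch index.

    A positive slope means the error tends to rise over the epochs;
    a negative slope means it tends to fall.
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(errors) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, errors))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical validation errors that rise over the epochs.
val_errors = [0.20, 0.22, 0.21, 0.25, 0.27]
print(trend_slope(val_errors) > 0)
```

Applied to the curves of Figures 6 and 7, a non-positive slope for the training errors together with a positive slope for the validation errors would match the pattern described above.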
Figure 6. Training mean square error (MSE) results in a DNN model for which an autoencoder regularization method was applied.
Second, we investigated the DNN model with a batch normalization method applied. The training and validation errors of this model are given in Figures 8 and 9, respectively. As the figures show, the results were more acceptable than those obtained with the autoencoder. The training errors decreased initially; however, after approximately 3000 epochs, they fluctuated constantly. The validation errors alternately decreased and increased from the beginning, and despite a small decrease around 10,000 epochs, their overall trend increased slightly. Comparing these two graphs makes it clear that the DNN model with batch normalization was slightly overfitted, because the validation errors increased even though the training errors merely fluctuated.
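For reference, the batch normalization transform standardizes each feature using the mean and variance of the current mini-batch and then applies a learnable scale and shift. The following minimal sketch shows the training-mode forward pass for a single feature in plain Python. It is illustrative only and does not reproduce the TensorFlow layer configuration used in the experiments (in TensorFlow's Keras API the corresponding layer is `tf.keras.layers.BatchNormalization`); the function name and sample batch are hypothetical.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization of one feature over a mini-batch.

    Each value is centered on the batch mean, divided by the batch standard
    deviation (with `eps` added to the variance for numerical stability),
    and then scaled by `gamma` and shifted by `beta`, the two learnable
    parameters of the layer.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    n = len(batch)
    mean = sum(batch) / n
    variance = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    scale = 1.0 / math.sqrt(variance + eps)
    return [gamma * (x - mean) * scale + beta for x in batch]


# Hypothetical mini-batch of one input feature (e.g., temperatures in °C).
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in normalized])
```

At inference time, implementations normally replace the batch statistics with running averages collected during training, which this sketch omits.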
Figure 7. Validation mean square error results in a DNN model for which an autoencoder regularization method was applied.
Figure 8. Training mean square error results in a DNN model where a batch normalization method was applied.
Figure 9. Validation mean square error results in a DNN model where a batch normalization method was applied.
Third, we discuss the DNN model where an L1 regularization method was applied. The experiment results are shown in Figures 10 and 11. Figure 10 shows that the training errors decreased smoothly as the epoch number increased. However, the validation errors rose from the beginning of training. This demonstrates that even though L1 regularization is one of the most popular techniques for preventing overfitting in neural networks, the model still suffered from an overfitting problem.
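L1 regularization adds the sum of the absolute values of the weights, scaled by a coefficient λ, to the data loss, which pushes small weights toward zero. The sketch below shows the penalty and one plain gradient-descent step using its subgradient; the coefficient, learning rate, function names, and sample weights are illustrative assumptions, not the settings of the experiments. In TensorFlow's Keras API, the same kind of penalty can be attached to a layer through `tf.keras.regularizers.l1`.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Total loss = data loss (e.g., the MSE) + L1 penalty."""
    return data_loss + l1_penalty(weights, lam)


def l1_subgradient(weights, lam):
    """Subgradient of the L1 penalty: lam * sign(w), taking 0 at w == 0."""
    return [lam * ((w > 0) - (w < 0)) for w in weights]


def sgd_step(weights, data_grads, lam, learning_rate):
    """One plain gradient-descent step on the regularized loss."""
    penalty_grads = l1_subgradient(weights, lam)
    return [
        w - learning_rate * (g + p)
        for w, g, p in zip(weights, data_grads, penalty_grads)
    ]


# Hypothetical weights with zero data gradient: only the penalty acts,
# so each nonzero weight moves toward zero by learning_rate * lam.
weights = [0.5, -1.5, 0.0]
print(regularized_loss(0.2, weights, lam=0.1))
print(sgd_step(weights, [0.0, 0.0, 0.0], lam=0.1, learning_rate=1.0))
```

Because the penalty gradient has the same magnitude for every nonzero weight, L1 regularization tends to drive unimportant weights exactly to zero, unlike an L2 penalty, whose gradient shrinks with the weight.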
Figure 11. Validation error results in a DNN model where an L1 regularization method was applied.
In fact, the experiments with all three of these regularization techniques failed; however, the cause of these failures could not be determined.
Fourth, we investigated the experiment results obtained with data augmentation. As previously discussed, we implemented two types of data augmentation for our investigation. The first scheme, which summed the features, performed better than the other regularization methods discussed so far. As shown in Figures 12 and 13, the training and validation errors declined together as the number of epochs increased. The validation error of the model reached its optimal value at 40K epochs and rose slightly thereafter. These graphs show that the DNN model with data augmentation based on summing overfitted slightly after 40K epochs.
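A validation curve with a minimum at 40K epochs and a slight rise afterwards is the situation that early stopping is designed for. Early stopping was not among the techniques compared in this study; the following minimal sketch, with hypothetical function names and sample values, only illustrates how the optimal epoch can be read off a recorded validation curve and when a simple patience rule would halt training. TensorFlow's Keras API offers a comparable mechanism as the `tf.keras.callbacks.EarlyStopping` callback, whose `patience` argument plays the same role.

```python
def best_epoch(val_errors, epochs):
    """Return the (epoch, error) pair with the smallest validation error."""
    if len(val_errors) != len(epochs) or not val_errors:
        raise ValueError("need one epoch label per validation error")
    return min(zip(epochs, val_errors), key=lambda pair: pair[1])


def early_stopping_index(val_errors, patience):
    """Index at which patience-based early stopping would halt training.

    Training stops once the validation error has failed to improve on its
    best value for `patience` consecutive evaluations; if that never
    happens, the index of the last evaluation is returned.
    """
    best = float("inf")
    evaluations_since_best = 0
    for index, error in enumerate(val_errors):
        if error < best:
            best, evaluations_since_best = error, 0
        else:
            evaluations_since_best += 1
            if evaluations_since_best >= patience:
                return index
    return len(val_errors) - 1


# Hypothetical validation errors evaluated every 10K epochs.
epochs = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000]
val_errors = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53]
print(best_epoch(val_errors, epochs))
print(early_stopping_index(val_errors, patience=2))
```

In practice, the weights saved at the best epoch are restored after stopping, so that the slight overfitting after the minimum does not affect the final model.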
Figure 12. Training error results in a DNN model where a data augmentation technique was applied.
Figure 13. Training error results in a DNN model where a data augmentation method that summed identical categorical input features was applied.
Fifth, the results of the data augmentation technique based on averaging are illustrated in Figures 14 and 15. The figures show that this method completely overcame the overfitting issue. As shown in Figure 14, the training errors decreased considerably throughout all the epochs. As shown in Figure 15, the validation errors decreased rapidly until 20K epochs and then remained stable; from 35K epochs they began to decrease slowly again, reaching their smallest value at 100K epochs. Note that the difference between the training and validation errors was small, which indicates that the model achieved good generalization.
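The exact augmentation procedure belongs to the methods described earlier and is not reproduced here. As an illustrative assumption only, the sketch below reads "summing" and "averaging" identical categorical features as combining the feature vectors of records that share the same category label into one extra synthetic record. The function name `augment_by_category`, the category labels, and the feature values are hypothetical.

```python
from collections import defaultdict


def augment_by_category(records, mode="average"):
    """Append one synthetic record per category that has two or more records.

    ASSUMPTION: a hypothetical reading of "summing" / "averaging" identical
    categorical features, not the exact procedure of the study.

    records: list of (category, features) pairs; all feature lists in the
             same category must have equal length.
    mode:    "sum" adds the feature vectors element-wise; "average" divides
             that sum by the number of records in the category.
    """
    if mode not in ("sum", "average"):
        raise ValueError("mode must be 'sum' or 'average'")
    groups = defaultdict(list)
    for category, features in records:
        groups[category].append(features)
    synthetic = []
    for category, rows in groups.items():
        if len(rows) < 2:
            continue  # a single record would only be duplicated
        combined = [sum(column) for column in zip(*rows)]
        if mode == "average":
            combined = [value / len(rows) for value in combined]
        synthetic.append((category, combined))
    return list(records) + synthetic


# Hypothetical records: (category label, [temperature, humidity]).
data = [("03-01", [4.0, 60.0]), ("03-01", [6.0, 70.0]), ("03-02", [1.0, 50.0])]
print(augment_by_category(data, mode="average"))
print(augment_by_category(data, mode="sum"))
```

Under this reading, averaging keeps the synthetic features on the same scale as the original inputs, whereas summing inflates them with the group size; that difference is one plausible reason the two schemes could behave differently.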
Figure 14. Training error results in a DNN model where a data augmentation method that took the average values of the same categorical features was applied.
Figure 15. Validation error results in a DNN model where a data augmentation method that took the average values of the same categorical features was applied.
Next, we compared the accuracy of each scheme by evaluating the average mean square error (MSE) between the estimated and observed temperatures. The MSE is a widely used statistical metric for evaluating the performance of a predictor, and comparing MSE values allowed us to evaluate the precision and accuracy of the predictors. The formula is given below, where $n$ is the number of predictions, $X_k$ is the $k$-th observed temperature, and $\hat{X}_k$ is the corresponding estimated temperature:
$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\left(X_k - \hat{X}_k\right)^2 \qquad (13)$$
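Equation (13) translates directly into code. The following is a minimal plain-Python sketch; the function name `mean_squared_error` and the sample values are illustrative. Taking the square root of the result gives the RMSE used for the learning curves.

```python
def mean_squared_error(observed, estimated):
    """MSE = (1/n) * sum over k of (X_k - X_hat_k)^2, as in Equation (13).

    observed:  observed values X_k
    estimated: model estimates X_hat_k, in the same order
    """
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(x - x_hat) ** 2 for x, x_hat in zip(observed, estimated)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical observed and estimated temperatures (°C).
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Because the errors are squared, a single badly wrong prediction (here, 5.0 instead of 3.0) dominates the average, which is why schemes with occasional large misses score poorly on this metric.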
The statistical results are shown in Figure 16.
Figure 16. Comparison of average mean squared error for each regularization method.
As the figure shows, the MSE of the autoencoder scheme was considerably high, whereas the values for the other schemes were not. These results show that L1 regularization and the autoencoder still encountered overfitting and underfitting problems, respectively. As mentioned earlier, batch normalization performed better than these methods, and the DNN model with data augmentation showed the best performance.
Finally, we compared the actual and predicted average temperatures for all models. The prediction covered ten days, from 2018.03.01 to 2018.03.10, and the results are shown in Table 2. As the table shows, the autoencoder performed worst, because its predictions differed greatly from the actual temperatures. For data augmentation and batch normalization, the differences were fairly small, and batch normalization outperformed data augmentation on some days. Overall, data augmentation showed the best performance, because its predictions were nearly the same as the real temperature on some days.
Date | DNN Without Regularization Methods (°C) | L1 Regularization (°C) | Autoencoder (°C) | Data Augmentation: Sum (°C) | Data Augmentation: Average (°C) | Batch Normalization (°C) | Real Temperature (°C)
2018.03.01 | 2.3 | 6.3 | 10.1 | 1.5 | 1.0 | 3.9 | 4.6
2018.03.02 | 1.1 | −1.9 | 17.6 | −0.2 | −0.8 | 1.0 | −0.7
2018.03.03 | 2.9 | −0.5 | 15.8 | 0.5 | 1.4 | 2.8 | 7.9
2018.03.04 | 3.5 | 17.1 | 15.1 | 1.9 | 0.6 | 10.3 | 9.8
2018.03.05 | 9.3 | 3.7 | 10.6 | 0.7 | 1.4 | 16.6 | 5.5
2018.03.06 | 4.3 | 2.7 | 17.7 | 4.8 | 4.5 | −0.1 | 4.5
4. Discussion
The study showed that the models using regularization techniques achieved lower training errors than the model without regularization. In a quantitative comparison, the autoencoder scheme showed higher errors than the other schemes. This was because it encountered underfitting due to the lack of data caused by removing some of the training data; this result suggests that, when the training data are insufficient, the portion of data removed for an autoencoder should be reduced. In addition, L1 regularization and the autoencoder scheme still encountered overfitting and underfitting, respectively. Batch normalization and data augmentation performed better than the other schemes, both in terms of errors and in terms of prediction accuracy. Although batch normalization outperformed data augmentation on some days, data augmentation performed better overall, because much more training data was added to the original data instead of being removed. However, using too much data made the models take too long to train, which demonstrates a tradeoff between the amount of training data and the processing time. In our study, only one CPU was used to train the neural network, so large amounts of data led to very long training times. In future work, it will be necessary to analyze how the training time varies and to compare the results using big data. One approach to considerably decrease the training time is to use a GPU programmed with the compute unified device architecture (CUDA), or to store the experimental data in a distributed manner and process them in parallel on multiple CPUs. However, the CUDA-based scheme requires the installation of proprietary applications and software, and it also requires a change in the basic architecture of the experimental software.
5. Conclusions
The main contribution of this work is a comparative study that assesses the training and validation errors of a model with different regularization methods, which can help developers choose the most suitable scheme for their neural network applications. To our knowledge, the existing neural network literature did not include such a comparison of regularization methods. Our study shows that regularization methods can mitigate overfitting and underfitting problems efficiently; however, even when some regularization algorithms were applied, the neural network models still suffered from these problems during training. This indicates that the problems are not easy to solve and that more advanced solutions need to be devised to solve them completely. One remaining aspect to examine is a comparison of the processing times of the regularization schemes. Stacked autoencoders took longer to finish training and validation; the reason for this has not yet been clearly analyzed.
Acknowledgments: This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2018R1D1A1B07045589).
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. McCulloch, W.S.; Pitts, W. A Logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys.
1943, 5, 115–133. [CrossRef]
2. William, A.A. On the Efficiency of Learning Machines. IEEE Trans. on Syst. Sci. Cybern. 1967, 3, 111–116.
3. Nicholas, V.F. Some New Approaches to Machine Learning. IEEE Trans. Syst. Sci. Cybern. 1969, 5, 173–182.
4. Zhang, W.; Li, C.; Huang, T.; He, X. Synchronization of Memristor-Based Coupling Recurrent Neural
Networks With Time-Varying Delays and Impulses. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 3308–3313.
[CrossRef] [PubMed]
5. Isomura, T. A Measure of Information Available for Inference. Entropy 2018, 20, 512. [CrossRef]
6. Elusaí Millán-Ocampo, D.; Parrales-Bahena, A.; González-Rodríguez, J.G.; Silva-Martínez, S.; Porcayo-Calderón, J.;
Hernández-Pérez, J.A. Modelling of Behavior for Inhibition Corrosion of Bronze Using Artificial Neural
Network (ANN). Entropy 2018, 20, 409. [CrossRef]
7. Jian, C.L.; Zhang, Y.; Yunxia, L. Non-Divergence of Stochastic Discrete Time Algorithms for PCA Neural
Networks. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 394–399.
8. Yin, Y.; Wang, L.; Gelenbe, E. Multi-layer neural networks for quality of service oriented server-state
classification in cloud servers. In Proceedings of the 2017 International Joint Conference on Neural Networks
(IJCNN), Anchorage, AK, USA, 14–19 May 2017.
9. Srivastava, N.; Geoffrey, H.; Alex, K.; Ilya, S.; Ruslan, S. Dropout: A simple way to prevent neural networks
from over-fitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
10. Suksri, S.; Warangkhana, K. Neural Network training model for weather forecasting using Fireworks
Algorithm. In Proceedings of the 2016 International Computer Science and Engineering Conference (ICSEC),
Chiang Mai, Thailand, 14–17 December 2016.
11. Abdelhadi, L.; Abdelkader, B. Over-fitting avoidance in probabilistic neural networks. In Proceedings of
the 2015 World Congress on Information Technology and Computer Applications (WCITCA), Hammamet,
Tunisia, 11–13 June 2015.
12. Singh, S.; Pankaj, B.; Jasmeen, G. Time series-based temperature prediction using back propagation with
genetic algorithm technique. Int. J. Comput. Sci. Issues 2011, 8, 293–304.
13. Abhishek, K. Weather forecasting model using artificial neural network. Procedia Tech. 2012, 4, 311–318.
[CrossRef]
14. Prasanta, R.J. Weather forecasting using artificial neural networks and data mining techniques. IJITR 2015, 3,
2534–2539.
15. Smith, B.A.; Ronald, W.M.; Gerrit, H. Improving air temperature prediction with artificial neural networks.
Int. J. Comput. Intell. 2006, 3, 179–186.
16. Zhang, S.; Hou, Y.; Wang, B.; Song, D. Regularizing Neural Networks via Retaining Confident Connections.
Entropy 2017, 19, 313. [CrossRef]
17. Kaur, A.; Sharma, J.K.; Sunil, A. Artificial neural networks in forecasting maximum and minimum relative
humidity. Int. J. Comput. Sci. Netw Secur. 2011, 11, 197–199.
18. Alemu, H.Z.; Wu, W.; Zhao, J. Feedforward Neural Networks with a Hidden Layer Regularization Method.
Symmetry 2018, 10, 525. [CrossRef]
19. Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective. Royal Stat. Soc. 2011, 73,
273–282. [CrossRef]
20. Hung, N.Q. An artificial neural network model for rainfall forecasting in Bangkok, Thailand. Hydrol. Earth
Syst. Sci. 2009, 13, 1413–1425. [CrossRef]
21. Chattopadhyay, S. Feed forward Artificial Neural Network model to predict the average summer-monsoon
rainfall in India. Acta Geophys. 2007, 55, 369–382. [CrossRef]
22. Khajure, S.; Mohod, S.W. Future weather forecasting using soft computing techniques. Procedia Comput. Sci.
2016, 78, 402–407. [CrossRef]
Symmetry 2018, 10, 648 17 of 18
23. Cui, X.; Goel, V.; Kingabury, B. Data augmentation for deep neural network acoustic modeling. IEEE/ACM
Trans. Audio Speech Lang. Process. 2015, 23, 1469–1477.
24. Zhang, H.; Wang, Z.; Liu, D. A Comprehensive Review of Stability Analysis of Continuous-Time Recurrent
Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 1229–1262. [CrossRef]
25. Zhang, S.; Xia, Y.; Wang, J. A Complex-Valued Projection Neural Network for Constrained Optimization of
Real Functions in Complex Variables. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 3227–3238. [CrossRef]
[PubMed]
26. Takashi, M.; Hiroyuki, T. An Asynchronous Recurrent Network of Cellular Automaton-Based Neurons and
Its Reproduction of Spiking Neural Network Activities. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27,
836–852.
27. Hayati, M.; Zahra, M. Application of artificial neural networks for temperature forecasting. Int. J. Electr.
Comput. Eng. 2007, 1, 662–666.
28. Cao, W.; Yuan, J.; He, Z.; Zhang, Z. Fast Deep Neural Networks With Knowledge Guided Training and
Predicted Regions of Interests for Real-Time Video Object Detection. IEEE Acc. 2018, 6, 8990–8999. [CrossRef]
29. Wang, X.; Ma, H.; Chen, X.; You, S. Edge Preserving and Multi-Scale Contextual Neural Network for Salient
Object Detection. IEEE Trans. Image Process. 2018, 27, 121–134. [CrossRef] [PubMed]
30. Yue, S.; Rind, F.C. Collision detection in complex dynamic scenes using an LGMD-based visual neural
network with feature enhancement. IEEE Trans. Neural Netw. 2006, 17, 705–716. [PubMed]
31. Huang, S.-C.; Chen, B.-H. Highly Accurate Moving Object Detection in Variable Bit Rate Video-Based Traffic
Monitoring Systems. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 1920–1931. [CrossRef] [PubMed]
32. Akcay, S.; Kundegorski, M.E.; Willcocks, C.G.; Breckon, T.P. Using Deep Convolutional Neural Network
Architectures for Object Classification and Detection Within X-Ray Baggage Security Imagery. IEEE Trans.
Inf. Forensic. Secur. 2018, 13, 2203–2215. [CrossRef]
33. Sevo, I.; Avramovic, A. Convolutional Neural Network Based Automatic Object Detection on Aerial Images.
IEEE Geo. Remote Sens. Lett. 2016, 13, 740–744. [CrossRef]
34. Woźniak, M.; Połap, D. Object detection and recognition via clustered features. Neurocomputing 2018, 320,
76–84. [CrossRef]
35. Vieira, S.; Pinaya, W.H.L.; Mechelli, A. Using deep learning to investigate the neuroimaging correlates of
psychiatric and neurological disorders: Methods and applications. Neurosci. Biobehav. Rev. 2017, 74, 58–75.
[CrossRef] [PubMed]
36. Połap, D.; Winnicka, A.; Serwata, K.; K˛esik, K.; Woźniak, M. An Intelligent System for Monitoring Skin
Diseases. Sensors 2018, 18, 2552. [CrossRef]
37. Heaton, J.B.; Polson, N.G.; Witte, J.H. Deep learning for finance: Deep portfolios. Appl. Stochastic Models Bus.
Ind. 2016, 33. [CrossRef]
38. Woźniak, M.; Połap, D.; Capizzi, G.; Sciuto, G.L.; Kośmider, L.; Frankiewicz, K. Small lung nodules detection
based on local variance analysis and probabilistic neural network. Compt. Methods Programs Biomed. 2018,
161, 173–180. [CrossRef] [PubMed]
39. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Laak, J.A.W.M.; Ginneken, B.v.;
Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88.
[CrossRef] [PubMed]
40. Woźniak, M.; Połap, D. Adaptive neuro-heuristic hybrid model for fruit peel defects detection. Neural Netw.
2018, 98, 16–33. [CrossRef] [PubMed]
41. Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep learning for smart manufacturing: Methods and
applications. J. Manuf. Syst. 2018, 48, 144–156. [CrossRef]
42. Mariel, A.-P.; Amadeo, A.C.; Isaac, C. Adaptive Identifier for Uncertain Complex Nonlinear Systems Based
on Continuous Neural Networks. IEEE Trans. Neural Netw. Learn. 2014, 25, 483–494.
43. Chang, C.H. Deep and Shallow Architecture of Multilayer Neural Networks. IEEE Trans. Neural Netw. Learn.
2015, 26, 2477–2486. [CrossRef] [PubMed]
44. Tycho, M.S.; Pedro, A.M.M.; Murray, S. The Partial Information Decomposition of Generative Neural
Network Models. Entropy 2017, 19, 474. [CrossRef]
45. Xin, W.; Yuanchao, L.; Ming, L.; Chengjie, S.; Xiaolong, W. Understanding Gating Operations in Recurrent
Neural Networks through Opinion Expression Extraction. Entropy 2016, 18, 294. [CrossRef]
Symmetry 2018, 10, 648 18 of 18
46. Sitian, Q.; Xiaoping, X. A Two-Layer Recurrent Neural Network for Nonsmooth Convex Optimization
Problems. IEEE Trans. Neural Netw. Learn. 2015, 26, 1149–1160. [CrossRef] [PubMed]
47. Saman, R.; Bryan, A. A New Formulation for Feedforward Neural Networks. IEEE Trans. Neural Netw. Learn.
2011, 22, 1588–1598.
48. Nan, Z. Study on the prediction of energy demand based on master slave neural network. In Proceedings
of the 2016 IEEE Information Technology, Networking, Electronic and Automation Control Conference,
Chongqing, China, 20–22 May 2016.
49. Feng, L.; Jacek, M.Z.; Yan, L.; Wei, W. Input Layer Regularization of Multilayer Feedforward Neural
Networks. IEEE Access 2017, 5, 10979–10985.
50. Armen, A. SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks.
In Proceedings of the 2017 3rd IEEE International Conference on Cybernetics (CYBCONF), Exeter, UK, 21–23
June 2017.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).