Symmetry

Article
A Comparison of Regularization Techniques in Deep
Neural Networks
Ismoilov Nusrat 1 and Sung-Bong Jang 2, *
1 Department of Computer Software Engineering, Kumoh National Institute of Technology,
Gyeong-Buk 39177, South Korea; [email protected]
2 Department of Industry-Academy, Kumoh National Institute of Technology, Gyeong-Buk 39177, South Korea
* Correspondence: [email protected]; Tel.: +82-054-478-6708

Received: 29 October 2018; Accepted: 14 November 2018; Published: 18 November 2018

Abstract: Artificial neural networks (ANNs) have attracted significant attention from researchers
because many complex problems can be solved by training them. If enough data are provided
during the training process, ANNs are capable of achieving good performance. However,
if the training data are insufficient, the predefined neural network model suffers from overfitting and
underfitting problems. To solve these problems, several regularization techniques have been devised
and widely applied to applications and data analysis. However, it is difficult for developers to choose
the most suitable scheme for an application under development because there is no information regarding the
performance of each scheme. This paper describes comparative research on regularization techniques
by evaluating the training and validation errors in a deep neural network model, using a weather
dataset. For comparisons, each algorithm was implemented using TensorFlow, a recent neural network
library. The experiment results showed that the autoencoder had the worst performance
among the schemes. When the prediction accuracy was compared, data augmentation and batch
normalization showed better performance than the others.

Keywords: deep neural networks; regularization methods; temperature prediction; TensorFlow library

1. Introduction
Accurate weather forecasting is an important issue that plays a significant role in the development
of several industrial sectors, such as agriculture and transportation. Many companies are using weather
prediction techniques to analyze consumer demands. In addition, exact forecasting is essential for
people to organize and plan their days. However, it is very difficult to predict the weather precisely
because the atmosphere changes dynamically. For a long time, physical simulations were the most
widely used scheme. With this method, the current atmospheric condition is sampled, and future
conditions are predicted by comparing thermodynamic characteristics. In recent years, artificial neural
networks (ANNs) have been widely used for weather prediction because they perform better through
the use of machine learning. The human brain is composed of roughly 100 billion interconnected neurons.
These neurons are the core cells responsible for transmitting information by means of
electrochemical signals.
ANNs were modeled on a mechanism inspired by the human brain’s information processing.
This scheme was first introduced to researchers in 1943 by Warren McCulloch and Walter Pitts [1]. This scheme
is currently being used in almost every scientific area to solve complex problems. Williams [2]
presented the efficiency of machine learning algorithms, and proved that they could be applied to
many applications. Nicholas [3] proposed an enhanced scheme to train neural network algorithms.
In the scheme, a statistical computation scheme was used to reduce the training errors. Zhang [4]

Symmetry 2018, 10, 648; doi:10.3390/sym10110648 www.mdpi.com/journal/symmetry

proposed a new recurrent neural network (RNN) scheme based on synchronization of delays and
impulses, which reduced prediction errors. Isomura [5] applied neural networks to knowledge
inference applications; useful information was collected, and an inference rule was extracted using
deep neural networks (DNNs). Elusai [6] used ANNs to model behavior to prevent bronze corrosion.
In the scheme, corrosion types were classified into different features, and future corrosion behaviors
were predicted. Jian [7] enhanced an existing principal component analysis (PCA) neural network that
was based on new discrete-time algorithms. The experiment results showed that validation errors
were significantly decreased. Yin [8] applied a DNN to classify server states for enhancing the quality
of service in cloud environments. The experiments showed that the scheme could be used to find used
or broken servers.
During the training process, the input data type and amount directly influence the performance of
the ANN model [9]. If the training data are deficient, overfitting or underfitting occurs [10]. Overfitting
refers to the phenomenon where the validation error increases while the training error decreases [11].
This occurs because the model learns the expected output for every input sample instead of learning the
real data distribution [12,13]. In contrast, underfitting problems occur when a model cannot learn
enough because of insufficient training data [14,15]. Many solutions have been proposed to prevent
these problems. The most widely used method is regularization, where a small variation is applied
to the original data to efficiently train a model [16]. One of the advantages of this method is that it
achieves a better performance for unseen data. For weather prediction, it is used to predict rainfall,
temperature, and humidity [17].
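The definition of overfitting given above (the validation error rises while the training error falls) can be turned into a simple check on the two error histories. The following is a minimal sketch under stated assumptions: the function name `diagnose_fit`, the comparison window, and the `high_error` threshold used to flag underfitting are all hypothetical choices made for illustration and are not taken from the experiments.

```python
def diagnose_fit(train_errors, val_errors, window=3, high_error=0.5):
    """Label a run from its per-epoch training and validation errors.

    `window` and `high_error` are illustrative assumptions, not values
    used in the paper.
    """
    if len(train_errors) != len(val_errors) or len(train_errors) <= window:
        raise ValueError("need two equally long histories longer than `window`")
    # Change in each error over the last `window` epochs.
    train_delta = train_errors[-1] - train_errors[-1 - window]
    val_delta = val_errors[-1] - val_errors[-1 - window]
    if train_delta < 0 and val_delta > 0:
        return "overfitting"   # training improves while validation degrades
    if train_errors[-1] > high_error and val_errors[-1] > high_error:
        return "underfitting"  # both errors remain high
    return "ok"


# Training error keeps falling while validation error turns upward.
label = diagnose_fit([0.9, 0.7, 0.5, 0.3, 0.2, 0.1],
                     [0.9, 0.7, 0.6, 0.6, 0.7, 0.8])
```

In practice the threshold would depend on the scale of the error measure, so it should be treated as a tunable setting.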
Here, we summarize prior research on weather prediction using ANNs. Several studies
have addressed accurate weather prediction. These studies included a method based on
an ANN model to predict the air temperature at hourly intervals for up to 12 h [18,19]. The prediction
error was minimized, and the method achieved good performance for short-term forecasting. Other
research [20,21] developed a new model to predict the hourly temperature for up to 24 h. The model
treated the winter, spring, summer, and fall seasons separately. Experiments were conducted to compare the
performances of well-known ANN models, including the Elman recurrent neural network (ERNN),
the radial basis function network (RBFN), the multilayer perceptron network (MLP), and the Hopfield
model (HFM). The MLP model with a single hidden layer and the RBFN model with two hidden layers
outperformed the other models. In the MLP experiment, the log-sigmoid function was used as the
activation function, and the Gaussian activation function was used for the hidden layers in the RBFN.
The temperatures for both models were measured and plotted as plain lines. Although the accuracy of
both models was identical, the RBFN had better processing times, because the MLP took too long
to learn the data. Other research compared several ANN models by applying various transfer functions,
numbers of hidden layers, and numbers of neurons to predict the maximum temperature for the year. The model included
five hidden layers, and each hidden layer contained 10 or 16 neurons. The tan-sigmoid activation
function for hidden layers showed the best results when using the logistic sigmoid function [22].
Xiaodong [23] presented a data augmentation scheme to improve performance when applied to audio
data. In the scheme, extra sampling data were added to the original input audio data and used for
training and validation. Huaguang [24] analyzed the stability of RNNs. An RNN was the most
widely used scheme in analyzing and predicting time series data. Songchuan [25] proposed a new
training algorithm, called fireworks, to predict the mean temperature. The algorithm showed fast
convergence and reduced training cycles. Takashi [26] proposed a new RNN algorithm that was based
on asynchronous negotiation and reproduction. This algorithm improved the prediction efficiency
when it was applied to time series data. Hayati [27] used an ANN model that contained a single
hidden layer and six neurons, which showed good performance results.
Many of the latest advances can be found in image processing and object detection. Cao [28] invented a fast DNN algorithm based
on additional knowledge during training. This was applied to object detection from streaming video,
and the results showed that it could reach good performance. Wang [29] applied a convolutional
neural network (CNN) to detecting a salient object from input images. A salient object refers to the
most important object, which expresses the most outstanding characteristic from an image. To do this,
they used additional metadata together with a CNN. Yue [30] applied neural networks to detecting
a collision between cars. The video streaming captured from the traffic system was very complex
and dynamic. This made collision detection more difficult. To solve this problem, they presented an
enhanced DNN algorithm that was based on feature enhancement. Huang [31] used a neural network
to improve the detection accuracy of traffic monitoring systems. One of the problems in moving object
detection is that the accuracy becomes lower when there are too many moving objects in a video stream.
To solve this problem, they used a DNN to detect a moving object accurately. Akcay [32] used a deep
convolutional neural network (DCNN) to classify and detect an object from X-ray images scanned
in an airport. By using this, they could save time and expense spent in investigating and detecting a
dangerous object. Sevo [33] used a CNN to detect an object from images captured in the air. It is very
difficult to detect an object from the air because all scenes in the air are expressed as three-dimensional
(3D) images. By using a CNN, he could decrease the complexity of air object detection. One of the
most widely known areas where neural networks are applied can be said to be image processing.
Woźniak [34] presented an enhanced object detection method, where convolutional neural networks
were combined with the analysis of clustered numbers. To determine the points of clusters, they used
fuzzy logic. Vieira [35] presented methods and applications of deep learning that were applied to
neuroimaging. Neuroimaging is used to image the structure of the human brain to help treat mental
disease. In their paper, they argued that deep learning could be an efficient method for improving
brain image quality by training the neural network. Polap [36] described a practice in which an ANN
was applied to detect potential diseases from body skin. In the method, skin data were collected by
using motion sensors and a camera, and an ANN model was trained using the data. Then, using the
model, they determined whether the skin had disease or not. Heaton [37] described an application of
deep-learning stochastic models in financial areas. In these areas, they used deep learning to predict
and classify financial data. Most doctors in hospitals use X-rays to classify carcinomas in chest organs.
However, this can lead to wrong diagnoses because it is difficult for radiologists to interpret the X-ray
results exactly. To solve this problem, Wozniak [38] applied neural networks to improve the accuracy
of carcinoma classification. The experimental results showed that they reached a 92% classification
accuracy. Litjens [39] described recent advancements of deep learning applications in analyzing images
in the medical industry. In the paper, they presented more than 300 pieces of research achieved in
this field. To give a concise review, they divided application areas for studies into 10 medical areas.
Wozniak [40] presented a new method based on neural networks to detect defects of fruit peels, which
was very different from a classical scheme. They invented an enhanced ANN algorithm called an
adaptive artificial neural network (AANN). By using this method, they could improve calculation
accuracy because it adapted to input data and their characteristics. Wang [41] presented an overview
of machine learning applications for manufacturing. Through the help of widespread sensors and the
Internet of Things (IoT), huge amounts of data could be collected in manufacturing systems. Deep
learning could be used to improve system performance and product quality by analyzing collected
big data.
As described earlier, active research is being conducted on neural networks and overfitting
solutions. However, there is no research that compares regularization schemes. Therefore, it is difficult
for developers to choose the most suitable scheme for developing an application, because there is
no information about the performance of each scheme. To solve this problem, this study presents
comparative research on regularization techniques by evaluating the training and validation errors in
a DNN model using weather datasets. In particular, choosing an appropriate regularization scheme
is very important for managing huge augmented objects in intelligent mobile augmented reality
(IMAR) systems.
The remainder of this paper is organized as follows. Section 2 describes the research methodology
and experiment setup. In Section 3, experiment results are described and analyzed. Section 4 presents
a discussion of the results. Finally, Section 5 concludes this work.
2. Data and Methods

2.1. Methodology
The methodology of the comparative research is represented in Figure 1. The entire methodology
consisted of nine steps. To compare the performance, several powerful regularization methods, including
an autoencoder, data augmentation, batch normalization, and L1 regularization, were implemented
and studied. First, we built a DNN model without using any regularization methods. We then trained
and validated the model, and then calculated the errors. Next, we measured the errors of the same
model without changing the settings, after applying regularization schemes. Each regularization
scheme was analyzed by comparing the experiment results. Each step is described in more detail in
the following section.

Figure 1. A methodology for the comparative research. The flowchart runs from Start to End through
nine steps: collect the weather datasets; define the architecture of the DNN model; train the DNN model
on the datasets without regularization; validate the DNN model on the datasets without regularization;
apply the regularization methods to the original data; train the DNN model on the regularized datasets;
validate the DNN model on the regularized datasets; predict the temperature using the trained model for
each regularization method; and compare the errors and prediction accuracy of each regularization scheme.
First, the datasets that were used to train and validate the neural network model were collected.
In our experiment, the Korean weather dataset was collected from the government website. It included
35 features, including average temperature, maximum temperature, minimum temperature, average
wind speed, average humidity, cloudiness, and daylight hours, for the previous five days. The average
temperature was a target feature to predict for the next day. We let $x^{(i)}$ be a 35-dimensional feature
vector for the ith set of five consecutive days, and let $y^{(i)}$ be the one-dimensional vector that contained
this feature for the ith single day. The prediction of $y^{(i)}$ with a given $x^{(i)}$ can be expressed using
Equation (1):

$$Z(y^{(i)}) = \sigma(\theta(x^{(i)})). \tag{1}$$
In Equation (1), θ refers to a subset of the 35-dimensional vector, and σ is an activation function. The
cost function seeks to minimize

$$\mathrm{Cost}(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( Z_\theta(x^{(i)}) - y^{(i)} \right)^2. \tag{2}$$

In Equation (2), m is the number of training examples. For supervised machine learning, the data
are typically divided into two types, training and testing data. However, to obtain a better-tuned
model, validation data were used in addition to the original data. These validation data are referred
to as the development dataset, or dev dataset. The goal of this dataset was to fine-tune the
hyperparameters (architecture) of the ANN model. The model frequently used these data. However, it did
not learn from this dataset. This set had to be used to obtain the optimal number of hidden units. The
dataset was divided using the scikit-learn library as follows. First, the data were split into training
and temporary data. Approximately 80% of the entire dataset was used as training data, and 20% was
used as temporary data. The temporary dataset was split into two equal parts, test and validation.
Using the entire dataset for both training and verification would increase the inaccuracy of prediction and
would increase the training errors. It would be better to use the divided dataset. We think that
cross-sampling methods can be good ways to decrease the errors. However, we did not use these methods
because we would have had to change the input algorithms in order to apply them. In future work,
we can apply these algorithms.
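The construction of the 35-dimensional inputs and the 80/10/10 split described above can be sketched as follows. This is only an illustration under stated assumptions: the experiment used the scikit-learn library, whereas this sketch uses the Python standard library; the helper names `make_examples` and `split_80_10_10`, the choice of column 0 as the average temperature, the random shuffling, and the synthetic rows are all hypothetical.

```python
import random

DAYS = 5  # days of history per example, as stated in the text


def make_examples(daily_rows, target_index=0, days=DAYS):
    """Turn per-day feature rows into (x, y) pairs.

    x concatenates `days` consecutive rows (7 features x 5 days = 35 values);
    y is the target feature of the following day. Using column
    `target_index` for the average temperature is an assumption.
    """
    examples = []
    for start in range(len(daily_rows) - days):
        window = daily_rows[start:start + days]
        x = [value for row in window for value in row]
        y = daily_rows[start + days][target_index]
        examples.append((x, y))
    return examples


def split_80_10_10(examples, seed=0):
    """Shuffle, keep 80% for training, and halve the rest into validation and test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(0.8 * len(shuffled)))
    train, temporary = shuffled[:n_train], shuffled[n_train:]
    half = len(temporary) // 2
    return train, temporary[:half], temporary[half:]


# Synthetic stand-in data: 105 days with 7 features each.
rows = [[float(day + k) for k in range(7)] for day in range(105)]
examples = make_examples(rows)
train, validation, test = split_80_10_10(examples)
```

Fixing the seed keeps the split reproducible, so every regularization scheme can be trained and validated on the same partitions.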
Next, we defined and chose an accurate architecture for neural network analysis. This process
usually requires significant experience because many factors must be efficiently decided. One factor is
how many layers should be set in the model. The basic model included one input layer, two hidden
layers, and one output layer, as illustrated in Figure 2. The input layer contained 35 neurons, the
hidden layers contained 50 neurons each, and the output layer had 1 neuron. The topology of our
model was the 35-50-50-1 topology.

Figure 2. The neural network model applied in the experiment. The diagram shows input data (such as
wind speed, humidity, daylight hours, and cloudiness) entering the input layer, passing through hidden
layer 1 (50 neurons) and hidden layer 2 (50 neurons), and reaching a single output node for the average
temperature.

After defining the model, we needed to make the parameter settings, which are listed in Table 1.
The columns included the basic DNN model and regularization schemes, and the rows included the
parameters for each model. The first parameter was the number of input neurons. This number was
set to 35 for most models, as previously described. The next parameter to set was the number of hidden
layers, which was 2. In reality, the value could be increased or decreased according to the central
processing unit (CPU) capability. If the CPU capability was high, the number could be increased
because processing took less time. When tested in our experiment, 2 was the most appropriate value because
the processing time increased exponentially if the value was greater than 3. The third was the number
of neurons in the hidden layers. If the number was higher, then the results were better. However, there
was a trade-off between the number and processing time. In our experiment, the value was set to be 50.
The number of output neurons was set to 1 because there was only one target feature. The learning rate
was set to 0.0001. Although processing took a long time because this value was rather low, the results were
more reliable. The proximal Adagrad optimizer algorithm was used to optimize our model. The batch
size was 100 and the maximum number of epochs was 100,000. The rectified linear unit (ReLU) was
used for the activation function.
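To make the 35-50-50-1 topology with ReLU activations concrete, a forward pass can be sketched in plain Python. This is a minimal sketch, not the TensorFlow implementation used in the experiment: the weight initialization range, the linear output layer, and the helper names (`init_params`, `forward`, `count_parameters`) are assumptions, and training with the proximal Adagrad optimizer is omitted.

```python
import random

LAYER_SIZES = [35, 50, 50, 1]  # topology stated in the text


def init_params(sizes, seed=0):
    """Create small random weights and zero biases for each layer (assumed scheme)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, a) for a in v]


def forward(params, x):
    """Propagate one input vector through the network."""
    activation = x
    for index, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        # ReLU on the hidden layers; the output is left linear (an assumption).
        activation = relu(z) if index < len(params) - 1 else z
    return activation


def count_parameters(params):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


params = init_params(LAYER_SIZES)
prediction = forward(params, [0.5] * 35)
```

With these layer sizes the model has 4401 trainable parameters (1800 + 2550 + 51), which gives a sense of why adding neurons or layers quickly increases the processing time.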
In the third step, a defined neural network model was trained using the original data where
regularization methods were not applied. During training, the root mean square errors (RMSEs) were
captured and the data were saved into a separate file. The RMSE value could be obtained using the
following equation:
$$\mathrm{RMSE}(\mathit{NNModel}) = \sqrt{\frac{1}{M} \sum_{i=1}^{m} \left( \frac{\mathrm{Answer}(X_i) - \mathrm{Predict}(X_i)}{\mathrm{Answer}(X_i)} \right)^2}. \tag{3}$$

In Equation (3), Answer($X_i$) is the real answer data at time i, and Predict($X_i$) is the value predicted by
the trained neural network model. In the fourth step, the defined neural network model was validated
and the validation errors were captured using Equation (3). In the third and fourth steps, overfitting
and underfitting were checked.
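The two error measures, the cost in Equation (2) and the RMSE in Equation (3), can be sketched as plain functions. This is an illustration only: the function names and the sample values are hypothetical, and the number of examples is taken to be the length of the input lists.

```python
import math


def cost(predictions, targets):
    """Equation (2): half the sum of squared errors."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predictions, targets))


def relative_rmse(predictions, answers):
    """Equation (3): root mean square of the error relative to the true value."""
    if any(a == 0 for a in answers):
        raise ValueError("Equation (3) divides by the answer, so it must be nonzero")
    total = sum(((a - p) / a) ** 2 for p, a in zip(predictions, answers))
    return math.sqrt(total / len(answers))


# Hypothetical temperatures, used only to exercise the functions.
answers = [10.0, 20.0, 25.0]
predictions = [11.0, 18.0, 25.0]
half_sse = cost(predictions, answers)
rmse = relative_rmse(predictions, answers)
```

Because Equation (3) divides by the true value, an answer of exactly zero (for example, 0 °C) cannot be evaluated as written; the sketch raises an error in that case rather than guessing how such samples were handled.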

Table 1. Parameter settings applied to a temperature prediction neural network model.

| Parameters for Each Model | Typical DNN | Autoencoder | Data Augmentation | L1 Regularization | Batch Normalization |
| --- | --- | --- | --- | --- | --- |
| Number of input neurons | 35 | 35 | 35 | 35 | 35 |
| Number of hidden layers | 2 | 2 | 2 | 2 | 2 |
| Number of neurons in hidden layers | 50 | 50 | 50 | 50 | 50 |
| Number of output neurons | 1 | 1 | 1 | 1 | 1 |
| Learning rate | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| Activation function | ReLU | ReLU | ReLU | ReLU | ReLU |
| Optimizer | Proximal Adagrad | Proximal Adagrad | Proximal Adagrad | Proximal Adagrad | Proximal Adagrad |

In the fifth step, regularization methods were applied to the original weather data. The applied
methods are described in more detail in Section 2.2. In the sixth and seventh steps, the defined model
was trained and validated using the regularized datasets. In the eighth step, the future temperature was
predicted using the trained neural network model where regularization methods were applied. Finally,
in the ninth step, the schemes were compared by analyzing the training, validation, and prediction errors.

2.2. Applied Regularization Methods in the Experiment

This section describes the regularization methods for the experiment. A widely used scheme
for regularization is the autoencoder scheme, which refers to an enhanced neural network with the
same number of neurons in the input and output layers. This scheme uses unsupervised learning,
because labels are not required when training the model. The scheme compresses the data received
from the input neurons into short code, and then decompresses this code into output neurons that
are very close to the input data. One of the goals of this scheme is to remove the noise from the input
data. The architecture is similar to MLP structure, and it has at least one input, hidden, and output
layer. This type of neural network consists of two parts, an encoder and a decoder. An encoder is a
network component that compresses the input. A decoder is used to reconstruct the encoded input. In
a simple autoencoder with a single layer, the encoder takes the input x ∈ X and compresses it to z ∈ Z.
The equation for calculating the compressed data z is as follows [42]:

$$z = \sigma(W \times x + b). \tag{4}$$

In Equation (4), z is known as the latent space representation. It is sometimes identified as a code or
latent variable. Here, σ is the activation function, such as the ReLU, sigmoid, or Leaky ReLU function.
W is the weight of the nodes, and b is the bias vector. In the reconstruction process, the same operation
is repeated, as shown in Equation (5) [43]:

$$x' = \sigma'(W' \times z + b'). \tag{5}$$

In the decoding process, the compressed data z is mapped to x′, where x′ represents the
transformed input data with the same dimension as the input x value, and σ′ is an activation function
used to decompress the data. W′ is the weight of the transformed nodes, and b′ is a bias in the
decoder. To obtain satisfactory performance using the autoencoder scheme, the decoding loss should
be minimized. Sum squared errors (SSEs) or an RMSE function were used to measure the loss as in
Equation (6):

$$F(x, x') = \lVert x - x' \rVert^2. \tag{6}$$

From Equations (5) and (6), we derived Equation (7):

$$f(x, x') = \lVert x - \sigma'(W' z + b') \rVert^2. \tag{7}$$

By replacing z using Equation (4), the final equation of the loss function was as follows [30]:

$$f(x, x') = \lVert x - \sigma'(W'(\sigma(W x + b)) + b') \rVert^2. \tag{8}$$
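The encoding, decoding, and loss computations above can be illustrated for a single-layer autoencoder. This is a minimal sketch, assuming a sigmoid for both activation functions and using hypothetical toy weights; it does not reproduce the stacked autoencoder used in the experiment, and no training is performed.

```python
import math


def sigmoid(v):
    """Logistic sigmoid applied element-wise (assumed activation)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(weights, v, bias):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def encode(x, W, b):
    """Compress the input to the latent code z."""
    return sigmoid(affine(W, x, b))


def decode(z, W_dec, b_dec):
    """Reconstruct the input from the latent code."""
    return sigmoid(affine(W_dec, z, b_dec))


def reconstruction_loss(x, W, b, W_dec, b_dec):
    """Squared distance between the input and its reconstruction."""
    x_rec = decode(encode(x, W, b), W_dec, b_dec)
    return sum((a - r) ** 2 for a, r in zip(x, x_rec))


# Hypothetical toy weights: 3 inputs compressed to a 2-value code.
W = [[0.2, -0.1, 0.4], [0.3, 0.1, -0.2]]
b = [0.0, 0.0]
W_dec = [[0.5, -0.3], [0.1, 0.2], [-0.4, 0.6]]
b_dec = [0.0, 0.0, 0.0]
x = [0.2, 0.7, 0.5]
code = encode(x, W, b)
loss = reconstruction_loss(x, W, b, W_dec, b_dec)
```

Training would adjust the four parameter sets to reduce `loss` over the whole dataset; the code values then serve as a denoised, lower-dimensional version of the input.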
There are many types of autoencoder schemes. In our research, the stacked autoencoder was
used to diminish the noise from the input data and simplify the tuning of hyperparameters in the
scheme. The architecture of the scheme is illustrated in Figure 3. It contained one input layer, three
hidden layers, and one output layer.

Figure 3. Structure of autoencoder applied in the experiment.

Data augmentation is one of the most popular regularization techniques. The main idea of the
scheme is to expand the training dataset by applying transformations to decrease overfitting. This
technique is commonly used in image processing, since image operations like rotating, shifting,
scaling, mirroring, or randomly cropping can be easily implemented when using the scheme [44]. For
data augmentation, it is important to effectively control noise. Several types of noise are
available for the scheme. Among them, the Gaussian noise control scheme is the most widely used.
The scheme could be expressed using Equation (9) [45]:

$$L_{\min} = E_p\left[ (y - f(x + \mu))^2 \right]. \tag{9}$$
In Equation (9), µ is the noise vector. This technique is effectively used for RNNs, whereas it is
seldom used in feedforward neural networks [46]. In this study, two augmentation techniques were
implemented. The first type of data augmentation averaged partial datasets. Let $L_j^{(i)}$ represent
input feature data for weather prediction, where j ∈ {1, 2, …, n} indexes the features and i ∈ {1, 2,
…, m} indexes the identical categorical features (in our case, i indexes the days). The final
input using the augmented data for the jth categorical data could be expressed as

$$\mathrm{input\_data}_j = \frac{1}{m} \sum_{i=1}^{m} L_j^{(i)}, \tag{10}$$

where $L_j$ is, for example, the average humidity or average temperature. The number of input features is seven (n = 7),
and the number of identical features is five (m = 5). The second augmentation summed the identical
categorical features, as shown in Equation (11):

$$\mathrm{input\_data}_j = \sum_{i=1}^{m} L_j^{(i)}. \tag{11}$$

Similar to the first data augmentation technique, the number of input neurons became 7. The
number of operations in the model decreased as the number of input data decreased, thereby
preventing overfitting.
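
The two aggregations in Equations (10) and (11) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `augment`, the list-of-rows layout, and the numbers are hypothetical and are not taken from the weather dataset used in the experiment (two features are used instead of seven to keep the example short).

```python
def augment(features_by_day, mode="average"):
    """Collapse m daily observations of each feature into one input value.

    features_by_day: list of m rows (one per day), each holding n feature values.
    mode="average" follows Equation (10); mode="sum" follows Equation (11).
    """
    m = len(features_by_day)
    n = len(features_by_day[0])
    # Sum feature j over all m days.
    totals = [sum(day[j] for day in features_by_day) for j in range(n)]
    if mode == "sum":
        return totals
    if mode == "average":
        return [t / m for t in totals]
    raise ValueError("mode must be 'average' or 'sum'")


# Made-up values (NOT from the paper's dataset): m = 5 days, n = 2 features,
# e.g. humidity in % and temperature in degrees Celsius.
days = [
    [60.0, 2.0],
    [62.0, 4.0],
    [58.0, 3.0],
    [64.0, 5.0],
    [56.0, 1.0],
]

print(augment(days, "average"))  # [60.0, 3.0]
print(augment(days, "sum"))      # [300.0, 15.0]
```

Either way, the m values of a feature are reduced to a single input, which is why the number of input neurons stays equal to the number of features.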
The third scheme for regularization was batch normalization, which was proposed by Sergey Ioffe and Christian Szegedy in 2015. After batch normalization is applied to a DNN, regularization techniques like dropout [47] or L2 regularization are often not required to tune the model. Instead, this method focuses on reducing internal covariate shift [48]. In addition, by implementing this method, its authors reduced the training time of the model.
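
The text does not spell out the transformation itself, so the following is a minimal sketch of the commonly described training-time step of batch normalization for a single feature: subtract the mini-batch mean, divide by the mini-batch standard deviation, then scale and shift. The names `batch_norm`, `gamma`, `beta`, and `eps`, as well as the input values, are assumptions made for illustration; in a real network `gamma` and `beta` are learned, and this is not the code used in the experiment.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps keeps the division stable when the batch variance is close to zero.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


# Made-up pre-activation values for one neuron over a mini-batch of four samples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)

new_mean = sum(normalized) / len(normalized)
new_var = sum((x - new_mean) ** 2 for x in normalized) / len(normalized)
print(new_mean, new_var)  # mean close to 0, variance close to 1
```

Keeping each layer's inputs on a similar scale in this way is what is meant by reducing the shift in their distribution during training.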
The fourth scheme applied in the experiment was L1 regularization. L1 regularization is known as the least absolute shrinkage and selection operator (LASSO), and was introduced by Robert Tibshirani [49]. The main idea behind the scheme is to regularize the loss function so that irrelevant features are completely removed from the model [36]. The equation of the scheme could be expressed as

$$f(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}_i, y_i) + \lambda \sum_{j=1}^{m} |w_j|. \tag{12}$$

In Equation (12), $L(\hat{y}_i, y_i)$ is a loss function, m is the number of observations, $\hat{y}_i$ is the predicted value (whereas $y_i$ is the actual value), and λ is a non-negative regularization parameter. The main objective was to minimize the f(w, b) function by penalizing weights in proportion to the sum of their absolute values. As λ increases, w decreases. As λ decreases, the variance increases.
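
As a rough illustration of Equation (12), the sketch below evaluates an L1-penalized objective in plain Python, with the penalty added to the data loss so that minimizing f discourages large weights. The squared error used as the per-sample loss L, the function name `l1_penalized_loss`, and all numbers are assumptions chosen for the example, not values from the experiment.

```python
def l1_penalized_loss(y_pred, y_true, weights, lam):
    """Mean per-sample loss plus lam times the sum of absolute weights."""
    m = len(y_true)
    # Assumption: squared error plays the role of the loss L in Equation (12).
    data_loss = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / m
    penalty = lam * sum(abs(w) for w in weights)
    return data_loss + penalty


# Made-up predictions, targets, and weights.
y_pred = [1.0, 2.0]
y_true = [1.0, 4.0]
weights = [0.5, -1.5, 0.0]

no_penalty = l1_penalized_loss(y_pred, y_true, weights, lam=0.0)
with_penalty = l1_penalized_loss(y_pred, y_true, weights, lam=0.1)
print(no_penalty, with_penalty)  # data loss alone is 2.0; the penalty adds about 0.2
```

A weight that is exactly zero contributes nothing to the penalty, which is why minimizing this objective tends to switch off irrelevant features.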

2.3. Experiment Setup

The hardware specification for the experiment was as follows. The desktop used was a Gigabyte Z97X-UD3H personal computer running Windows 10 with an Intel Core i7-4790K CPU and 8 GB of RAM. The simulations were implemented using the Python programming language and the TensorFlow library. This library is very popular among machine learning application developers. A neural network application can run on several CPUs and graphics processing units (GPUs) in parallel. Supporting parallelism is one of the key features of the library. In addition, the library is available for multiple programming languages, such as Python, C++, and Java. There are many higher-level application programming interfaces (APIs) that work with TensorFlow. For instance, the Keras API, TFLearn, and Sonnet are provided to easily train the model. In our study, we used TensorFlow’s new higher-level construct, the estimator. There were many advantages to the estimator:
• Without changing the model, our model could be run on local or distributed servers. In addition, without the need to recode our model, our estimator-based model could run on CPUs, GPUs, or tensor processing units (TPUs).
• It was much easier to develop a model with an estimator rather than low-level TensorFlow APIs.
• It could build the graph for us.
The estimator API provided a common interface to train, evaluate, and predict functions. For building our model, we used the DNNRegressor class in the tf.estimator package. There were many parameters in this class, but we will focus on the major ones as follows.
• Activation_fn: The activation function for each layer of the neural network. By default, ReLU was used for the layers.
• Optimizer: This parameter defined the optimizer type, which optimized the neural network model’s weights throughout the training process.
• Hidden_units: This contained the number of hidden units (neurons) per layer. For example, [70, 50] means the first layer has 70 neurons and the second one has 50.
• Feature_columns: This argument contained the feature columns and data type used by the model.
• Model_dir: This was the directory for saving model parameters and graphs. In addition, it could be used to load checkpoints from the directory into the estimator to continue training a previously saved model.
• Dropout: We needed this feature for implementing a dropout regularization technique in our model.
When using the DNNRegressor class, not all regularization techniques were available. For example, for L1 and autoencoder, we had to use another type of API. To do that, we used the Keras open source neural network library, which is written in Python. As it can run on top of TensorFlow, it was easy to use these two libraries together.
For visualization, we used the TensorBoard visualization tool, which is a very powerful graph visualization tool released by Google’s TensorFlow team. This tool is not only used for graph visualization, but also to plot quantitative metrics on the execution of a graph and to show additional data (e.g., images) that pass through it. Moreover, using this tool, a programmer can debug a model easily. Another API applied for visualization in this research was the Matplotlib plotting library. This API provides an extremely wide range of visualization techniques for the Python programming language.

3. Results
This chapter discusses the results. First, the results of the DNN model, which was trained without the use of any regularization techniques, are described. Next, the results of the DNN models with regularization techniques are presented. Since overfitting problems are much more visible in pictures, our final results are visualized as graphs. The axes of the graphs consist of error values and epoch numbers. After training the network model, very low RMSE values seemed to indicate very good accuracy. However, in some cases, they reflected issues such as an overfitting problem.
We first present the experiment results of a DNN without regularization. As previously discussed, our model was established with the settings that are shown in Table 1. After training 100,000 epochs, RMSEs for training and validation data were plotted, as shown in Figures 4 and 5, respectively. Figure 4 shows that, by increasing the epochs, the error for training data changed rapidly. However, the overall training errors decreased.

Figure 4. Training error results in a deep neural network (DNN) without regularization methods.

Figure 5. Validation error results in a DNN without regularization methods.

By comparing the results, we concluded that the errors for validation and training data decreased in the same range. The closeness between the validation and training data errors meant that good generalization was achieved. In some cases, it showed that the neural network model needed more training. However, since it was time-consuming, we continued our research by implementing regularization methods based on this model.
Experiment Results for Each Regularization Method
First, a model where the autoencoder method was applied with the same settings was tested. The training and validation errors are illustrated in Figures 6 and 7. As can be seen from Figure 6, the training errors did not increase as the epoch number increased. However, validation errors became higher when the epoch number increased, as illustrated in Figure 7. It is clearly seen from the graphs that the model suffered from an underfitting problem and could not learn anything. When we repeated the experiment several times using an autoencoder, the results were not good. Thus, we could not continue learning. In our analysis, the basic model of the stacked autoencoder was not appropriate. It needed some changes of structure.

Figure 6. Training mean square errors (MSEs) results in a DNN model for which an autoencoder regularization method was applied.

Second, we investigated the experiment results of the DNN model, where a batch normalization
method was applied. The results of this technique are given in Figures 8 and 9. As it is shown in the
figures, the results were more acceptable than those using the autoencoder. The training errors began
to decrease initially. However, the overall trend fluctuated constantly after approximately 3000 epochs.
Validation errors decreased and increased from the beginning. Even though there was a small decrease
around 10,000 epochs, the overall trend increased slightly. By comparing these two graphs, it became
clear that the DNN model using batch normalization was overfitted within a small range, because
validation errors increased in spite of the constant fluctuation in training errors.

Figure 7. Validation mean square errors results in a DNN model for which an autoencoder
regularization method was applied.

Figure 8. Training mean square errors results in a DNN model where a batch normalization method was applied.

Figure 9. Validation mean square errors results in a DNN model where a batch normalization method was applied.

Next, we discuss the DNN model where an L1 regularization method was applied. The experiment results are represented in Figures 10 and 11. Figure 10 shows that the training errors diminished smoothly as the epoch number increased. However, for validation errors, the trend showed a rise from the beginning of the epochs. This demonstrates that even though L1 regularization is among the most popular techniques for preventing overfitting in artificial intelligence, the model still suffered from an overfitting problem.

Figure 10. Training error results in a DNN model where an L1 regularization method was applied.

Figure 11. Validation error results in a DNN model where an L1 regularization method was applied.

In reality, a failure occurred during the experiments for all three types of regularization techniques. However, the cause of the failures was undetermined.
Fourth, the experiment results were investigated using data augmentation. As previously discussed, we implemented two types of data augmentation for our investigation. The first scheme, which summed the features, performed better than other regularization methods. As shown in Figures 12 and 13, the training errors declined as the number of epochs increased. The validation error of the model found its optimal value at 40K epochs. The validation data error rose slightly after 40K epochs. From these graphs, it is shown that the DNN model with data augmentation based on summing had a slight overfitting after 40K epochs.

Figure 12. Training error results in a DNN model where a data augmentation technique was applied.

Figure 13. Training error results in a DNN model where a data augmentation method was applied that summed identical categorical input features.
Fifth, experiment results using the data augmentation technique based on an average are illustrated in Figures 14 and 15. The figures show that the overfitting issue was completely overcome with this method. As is shown in Figure 14, training errors were considerably diminished throughout all the epochs. The validation errors decreased rapidly until 20K epochs and stayed stable after that point, as shown in Figure 15. Then, the validation errors began to fall slowly from 35K epochs, and reached the smallest error at 100K epochs. Notice that the difference between training and validation errors was not large. This indicates that the model achieved good generalization.

Figure 14. Training error results in a DNN model where a data augmentation method was applied. The average values of the same categorical features were taken.

Figure 15. Validation error results in a DNN model where a data augmentation method was applied. The average values of the same categorical features were taken.

Next, we compared the accuracies of each scheme by evaluating the averaged mean square errors
(MSEs) between estimated temperature and observed temperature. Statistically, the MSE is regarded
as an important metric that is used in order to evaluate the performance of a predictor. By comparing
the values, we could evaluate the precision and accuracy of predictors. The formula is given below:
$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n} \left(X_k - \hat{X}_k\right)^2. \tag{13}$$
The statistical results are represented in Figure 16.

Figure 16. Comparison of average mean squared error for each regularization method.
As shown in the figure, the MSE of the autoencoder scheme was fairly high. For the other schemes, the values were not high. From the results, L1 regularization and the autoencoder still encountered overfitting and underfitting problems. The batch normalization showed better performance than these methods, as mentioned earlier. The DNN model with data augmentation showed the best performance.
Finally, we compared actual average temperature and predicted average temperature in all models. The prediction was done during ten days, from 2018.03.01 until 2018.03.10. The results are shown in Table 2. As can be seen from the table, the scheme that showed the worst performance was the autoencoder because there was much difference between actuality and prediction. For data augmentation and batch normalization, the differences were fairly small. Batch normalization outperformed data augmentation on some days. From the table, we see that the data augmentation showed the best performance because the prediction was nearly the same as the real temperature for some days.

Table 2. Comparison of actual and predicted average temperature.

Date        DNN without     L1              Autoencoder  Data            Data            Batch          Real
            Regularization  Regularization  (°C)         Augmentation:   Augmentation:   Normalization  Temperature
            Methods (°C)    (°C)                         Sum (°C)        Average (°C)    (°C)           (°C)
2018.03.01  2.3             6.3             10.1         1.5             1.0             3.9            4.6
2018.03.02  1.1             −1.9            17.6         −0.2            −0.8            1.0            −0.7
2018.03.03  2.9             −0.5            15.8         0.5             1.4             2.8            7.9
2018.03.04  3.5             17.1            15.1         1.9             0.6             10.3           9.8
2018.03.05  9.3             3.7             10.6         0.7             1.4             16.6           5.5
2018.03.06  4.3             2.7             17.7         4.8             4.5             −0.1           4.5
2018.03.07  2.6             4.4             17.6         8.3             9.1             9.4            6.4
2018.03.08  3.1             7.2             15.2         12.4            7.8             5.6            4.6
2018.03.09  6.6             3.1             16.6         2.7             4.7             4.0            4.5
2018.03.10  1.4             6.1             18.8         4.3             4.9             4.9            4.6
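
Equation (13) can be applied directly to the columns of Table 2. The sketch below does this in plain Python for two of the schemes; the function name `mse` and the variable names are illustrative choices, and the resulting ten-day values are not the averaged MSEs plotted in Figure 16, which the text does not tabulate.

```python
def mse(predicted, observed):
    """Mean squared error between predicted and observed values, as in Equation (13)."""
    n = len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n


# Columns copied from Table 2 (2018.03.01 to 2018.03.10), in degrees Celsius.
real = [4.6, -0.7, 7.9, 9.8, 5.5, 4.5, 6.4, 4.6, 4.5, 4.6]
autoencoder = [10.1, 17.6, 15.8, 15.1, 10.6, 17.7, 17.6, 15.2, 16.6, 18.8]
batch_norm = [3.9, 1.0, 2.8, 10.3, 16.6, -0.1, 9.4, 5.6, 4.0, 4.9]

mse_autoencoder = mse(autoencoder, real)
mse_batch_norm = mse(batch_norm, real)
print(mse_autoencoder, mse_batch_norm)
```

Over these ten days the autoencoder column gives a much larger error than the batch normalization column, which agrees with the qualitative reading of the table above.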

4. Discussion
The study showed that the models using regularization techniques demonstrated better
performance than those without regularization methods in terms of training errors. When comparing
each scheme quantitatively, the autoencoder scheme exhibited higher errors than the other schemes. This
was because it encountered underfitting due to the lack of data caused by removing some of the
training data. Given this result, the portion of removed data must be decreased for an autoencoder
when the training data are insufficient. In addition, L1 regularization and the autoencoder scheme
still encountered overfitting and underfitting. Batch normalization and data augmentation showed
better performance than the others when comparing the errors. When comparing the prediction
accuracy, data augmentation and batch normalization showed better performance than others. Of the
two schemes, batch normalization outperformed data augmentation on some days. This was because
much more training data was added to the original data instead of being removed. However, if too
much data was used for training, it required too much time to complete the training of the models,
demonstrating a tradeoff between training data and processing time. In our study, only one CPU
was used to train the neural network. If there was too much data, the training time was too long.
In future work, it is necessary to analyze how the training time varies and compare the results using
big data. One of the approaches to considerably decrease the training time is to use a compute unified
device architecture (CUDA) GPU, where the experimental data are stored in a distributed manner
and processed in parallel on a multiple-CPU computer. However, this scheme requires the installation
of proprietary applications and software. It also requires a change in the basic architecture of the
experimental software.

5. Conclusions
The main contribution of this work is to help developers choose the most suitable scheme for their neural network application through comparative research that assesses the training and validation errors of a model with regularization methods. In the existing neural network literature, there was no research comparing regularization methods. From our study, we see that regularization methods could solve overfitting and underfitting problems efficiently, but, even though some regularization algorithms were applied, neural network models still suffered from the same problems during training. This indicates that it is not easy to solve the problems and that a more enhanced solution needs to be devised to solve them completely. One remaining aspect to consider is a comparison of processing times for each regularization scheme. For stacked autoencoders, it takes a longer time to finish training and validation. The reason for this has not yet been clearly analyzed.

Author Contributions: All authors contributed equally to this work.

Funding: This research received no external funding.

Acknowledgments: This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2018R1D1A1B07045589).
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. McCulloch, W.S.; Pitts, W. A Logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys.
1943, 5, 115–133. [CrossRef]
2. William, A.A. On the Efficiency of Learning Machines. IEEE Trans. on Syst. Sci. Cybern. 1967, 3, 111–116.
3. Nicholas, V.F. Some New Approaches to Machine Learning. IEEE Trans. Syst. Sci. Cybern. 1969, 5, 173–182.
4. Zhang, W.; Li, C.; Huang, T.; He, X. Synchronization of Memristor-Based Coupling Recurrent Neural
Networks With Time-Varying Delays and Impulses. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 3308–3313.
[CrossRef] [PubMed]
5. Isomura, T. A Measure of Information Available for Inference. Entropy 2018, 20, 512. [CrossRef]
6. Elusaí Millán-Ocampo, D.; Parrales-Bahena, A.; González-Rodríguez, J.G.; Silva-Martínez, S.; Porcayo-Calderón, J.;
Hernández-Pérez, J.A. Modelling of Behavior for Inhibition Corrosion of Bronze Using Artificial Neural
Network (ANN). Entropy 2018, 20, 409. [CrossRef]
7. Jian, C.L.; Zhang, Y.; Yunxia, L. Non-Divergence of Stochastic Discrete Time Algorithms for PCA Neural
Networks. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 394–399.
8. Yin, Y.; Wang, L.; Gelenbe, E. Multi-layer neural networks for quality of service oriented server-state
classification in cloud servers. In Proceedings of the 2017 International Joint Conference on Neural Networks
(IJCNN), Anchorage, AK, USA, 14–19 May 2017.
9. Srivastava, N.; Geoffrey, H.; Alex, K.; Ilya, S.; Ruslan, S. Dropout: A simple way to prevent neural networks
from over-fitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
10. Suksri, S.; Warangkhana, K. Neural Network training model for weather forecasting using Fireworks
Algorithm. In Proceedings of the 2016 International Computer Science and Engineering Conference (ICSEC),
Chiang Mai, Thailand, 14–17 December 2016.
11. Abdelhadi, L.; Abdelkader, B. Over-fitting avoidance in probabilistic neural networks. In Proceedings of
the 2015 World Congress on Information Technology and Computer Applications (WCITCA), Hammamet,
Tunisia, 11–13 June 2015.
12. Singh, S.; Pankaj, B.; Jasmeen, G. Time series-based temperature prediction using back propagation with
genetic algorithm technique. Int. J. Comput. Sci. Issues 2011, 8, 293–304.
13. Abhishek, K. Weather forecasting model using artificial neural network. Procedia Tech. 2012, 4, 311–318.
[CrossRef]
14. Prasanta, R.J. Weather forecasting using artificial neural networks and data mining techniques. IJITR 2015, 3,
2534–2539.
15. Smith, B.A.; Ronald, W.M.; Gerrit, H. Improving air temperature prediction with artificial neural networks.
Int. J. Comput. Intell. 2006, 3, 179–186.
16. Zhang, S.; Hou, Y.; Wang, B.; Song, D. Regularizing Neural Networks via Retaining Confident Connections.
Entropy 2017, 19, 313. [CrossRef]
17. Kaur, A.; Sharma, J.K.; Sunil, A. Artificial neural networks in forecasting maximum and minimum relative
humidity. Int. J. Comput. Sci. Netw Secur. 2011, 11, 197–199.
18. Alemu, H.Z.; Wu, W.; Zhao, J. Feedforward Neural Networks with a Hidden Layer Regularization Method.
Symmetry 2018, 10, 525. [CrossRef]
19. Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective. Royal Stat. Soc. 2011, 73,
273–282. [CrossRef]
20. Hung, N.Q. An artificial neural network model for rainfall forecasting in Bangkok, Thailand. Hydrol. Earth
Syst. Sci. 2009, 13, 1413–1425. [CrossRef]
21. Chattopadhyay, S. Feed forward Artificial Neural Network model to predict the average summer-monsoon
rainfall in India. Acta Geophys. 2007, 55, 369–382. [CrossRef]
22. Khajure, S.; Mohod, S.W. Future weather forecasting using soft computing techniques. Procedia Comput. Sci.
2016, 78, 402–407. [CrossRef]
Symmetry 2018, 10, 648 17 of 18

23. Cui, X.; Goel, V.; Kingabury, B. Data augmentation for deep neural network acoustic modeling. IEEE/ACM
Trans. Audio Speech Lang. Process. 2015, 23, 1469–1477.
24. Zhang, H.; Wang, Z.; Liu, D. A Comprehensive Review of Stability Analysis of Continuous-Time Recurrent
Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 1229–1262. [CrossRef]
25. Zhang, S.; Xia, Y.; Wang, J. A Complex-Valued Projection Neural Network for Constrained Optimization of
Real Functions in Complex Variables. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 3227–3238. [CrossRef]
[PubMed]
26. Takashi, M.; Hiroyuki, T. An Asynchronous Recurrent Network of Cellular Automaton-Based Neurons and
Its Reproduction of Spiking Neural Network Activities. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27,
836–852.
27. Hayati, M.; Zahra, M. Application of artificial neural networks for temperature forecasting. Int. J. Electr.
Comput. Eng. 2007, 1, 662–666.
28. Cao, W.; Yuan, J.; He, Z.; Zhang, Z. Fast Deep Neural Networks With Knowledge Guided Training and
Predicted Regions of Interests for Real-Time Video Object Detection. IEEE Acc. 2018, 6, 8990–8999. [CrossRef]
29. Wang, X.; Ma, H.; Chen, X.; You, S. Edge Preserving and Multi-Scale Contextual Neural Network for Salient
Object Detection. IEEE Trans. Image Process. 2018, 27, 121–134. [CrossRef] [PubMed]
30. Yue, S.; Rind, F.C. Collision detection in complex dynamic scenes using an LGMD-based visual neural
network with feature enhancement. IEEE Trans. Neural Netw. 2006, 17, 705–716. [PubMed]
31. Huang, S.-C.; Chen, B.-H. Highly Accurate Moving Object Detection in Variable Bit Rate Video-Based Traffic
Monitoring Systems. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 1920–1931. [CrossRef] [PubMed]
32. Akcay, S.; Kundegorski, M.E.; Willcocks, C.G.; Breckon, T.P. Using Deep Convolutional Neural Network
Architectures for Object Classification and Detection Within X-Ray Baggage Security Imagery. IEEE Trans.
Inf. Forensic. Secur. 2018, 13, 2203–2215. [CrossRef]
33. Sevo, I.; Avramovic, A. Convolutional Neural Network Based Automatic Object Detection on Aerial Images.
IEEE Geo. Remote Sens. Lett. 2016, 13, 740–744. [CrossRef]
34. Woźniak, M.; Połap, D. Object detection and recognition via clustered features. Neurocomputing 2018, 320,
76–84. [CrossRef]
35. Vieira, S.; Pinaya, W.H.L.; Mechelli, A. Using deep learning to investigate the neuroimaging correlates of
psychiatric and neurological disorders: Methods and applications. Neurosci. Biobehav. Rev. 2017, 74, 58–75.
[CrossRef] [PubMed]
36. Połap, D.; Winnicka, A.; Serwata, K.; K˛esik, K.; Woźniak, M. An Intelligent System for Monitoring Skin
Diseases. Sensors 2018, 18, 2552. [CrossRef]
37. Heaton, J.B.; Polson, N.G.; Witte, J.H. Deep learning for finance: Deep portfolios. Appl. Stochastic Models Bus.
Ind. 2016, 33. [CrossRef]
38. Woźniak, M.; Połap, D.; Capizzi, G.; Sciuto, G.L.; Kośmider, L.; Frankiewicz, K. Small lung nodules detection
based on local variance analysis and probabilistic neural network. Compt. Methods Programs Biomed. 2018,
161, 173–180. [CrossRef] [PubMed]
39. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Laak, J.A.W.M.; Ginneken, B.v.;
Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88.
[CrossRef] [PubMed]
40. Woźniak, M.; Połap, D. Adaptive neuro-heuristic hybrid model for fruit peel defects detection. Neural Netw.
2018, 98, 16–33. [CrossRef] [PubMed]
41. Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep learning for smart manufacturing: Methods and
applications. J. Manuf. Syst. 2018, 48, 144–156. [CrossRef]
42. Mariel, A.-P.; Amadeo, A.C.; Isaac, C. Adaptive Identifier for Uncertain Complex Nonlinear Systems Based
on Continuous Neural Networks. IEEE Trans. Neural Netw. Learn. 2014, 25, 483–494.
43. Chang, C.H. Deep and Shallow Architecture of Multilayer Neural Networks. IEEE Trans. Neural Netw. Learn.
2015, 26, 2477–2486. [CrossRef] [PubMed]
44. Tycho, M.S.; Pedro, A.M.M.; Murray, S. The Partial Information Decomposition of Generative Neural
Network Models. Entropy 2017, 19, 474. [CrossRef]
45. Xin, W.; Yuanchao, L.; Ming, L.; Chengjie, S.; Xiaolong, W. Understanding Gating Operations in Recurrent
Neural Networks through Opinion Expression Extraction. Entropy 2016, 18, 294. [CrossRef]
Symmetry 2018, 10, 648 18 of 18

46. Sitian, Q.; Xiaoping, X. A Two-Layer Recurrent Neural Network for Nonsmooth Convex Optimization
Problems. IEEE Trans. Neural Netw. Learn. 2015, 26, 1149–1160. [CrossRef] [PubMed]
47. Saman, R.; Bryan, A. A New Formulation for Feedforward Neural Networks. IEEE Trans. Neural Netw. Learn.
2011, 22, 1588–1598.
48. Nan, Z. Study on the prediction of energy demand based on master slave neural network. In Proceedings
of the 2016 IEEE Information Technology, Networking, Electronic and Automation Control Conference,
Chongqing, China, 20–22 May 2016.
49. Feng, L.; Jacek, M.Z.; Yan, L.; Wei, W. Input Layer Regularization of Multilayer Feedforward Neural
Networks. IEEE Access 2017, 5, 10979–10985.
50. Armen, A. SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks.
In Proceedings of the 2017 3rd IEEE International Conference on Cybernetics (CYBCONF), Exeter, UK, 21–23
June 2017.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
