
energies

Article
Energy Demand Forecasting Using Deep Learning:
Applications for the French Grid
Alejandro J. del Real 1, *, Fernando Dorado 2 and Jaime Durán 2
1 Department of Systems and Automation, University of Seville, 41004 Seville, Spain
2 IDENER, 41300 Seville, Spain; [email protected] (F.D.); [email protected] (J.D.)
* Correspondence: [email protected]

Received: 8 March 2020; Accepted: 21 April 2020; Published: 3 May 2020 

Abstract: This paper investigates the use of deep learning techniques in order to perform energy
demand forecasting. To this end, the authors propose a mixed architecture consisting of a convolutional
neural network (CNN) coupled with an artificial neural network (ANN), with the main objective of
taking advantage of the virtues of both structures: the regression capabilities of the artificial neural
network and the feature extraction capacities of the convolutional neural network. The proposed
structure was trained and then used in a real setting to provide a French energy demand forecast using
Action de Recherche Petite Echelle Grande Echelle (ARPEGE) forecasting weather data. The results
show that this approach outperforms the reference Réseau de Transport d’Electricité (RTE, French
transmission system operator) subscription-based service. Additionally, the proposed solution obtains
the highest performance score when compared with other alternatives, including Autoregressive
Integrated Moving Average (ARIMA) and traditional ANN models. This opens up the possibility
of achieving high-accuracy forecasting using widely accessible deep learning techniques through
open-source machine learning platforms.

Keywords: energy demand forecasting; deep learning; machine learning; convolutional neural
networks; artificial neural networks

1. Introduction
The forecasting of demand plays an essential role in the electric power industry. Thus, there are
a wide variety of methods for electricity demand prediction ranging from those of the short term
(minutes) to long term (weeks), while considering microscopic (individual consumer) to macroscopic
(country-level) aggregation levels. This paper is focused on macroscopic power forecasting in the
medium term (hours).
To date, researchers are in agreement that electrical demand arises from complex interactions
between multiple personal, corporate, and socio-economic factors [1]. All these sources make power
demand forecasting difficult. Indeed, an ideal model able to forecast the power demand with the
highest possible level of accuracy would require access to virtually infinite data sources in order to feed
such a model with all the relevant information. Unfortunately, both the unavailability of the data and
the associated computational burden mean that researchers investigate approximate models supplied
with partial input information.
Within this framework, the prediction of power consumption has been tackled from different
perspectives using different forecasting methodologies. Indeed, there is a rich state of the art of
methods which, according to the the authors of [1], can be divided into the following main categories:

• Statistical models: Purely empirical models where inputs and outputs are correlated using
statistical inference methods, such as:

Energies 2020, 13, 2242; doi:10.3390/en13092242 www.mdpi.com/journal/energies



◦ Cointegration analysis and ARIMA;
◦ Log-linear regression models;
◦ Combined bootstrap aggregation (bagging) ARIMA and exponential smoothing.
Although some authors [2] report substantial improvements in the forecast accuracy of demand for energy end-use services in both developed and developing countries, the related models require the implementation of sophisticated statistical methods which are case-dependent with respect to their application. These facts hinder statistical models from forming an affordable and consistent basis for a general power demand forecasting approach.
• Grey models: These combine a partial theoretical structure with empirical data to complete the structure. When compared to purely statistical models, grey models require only a limited amount of data to infer the behavior of the electrical system. Therefore, grey models can deal with partially known information through generating, excavating, and extracting useful information from what is available. In return, the construction of the partial theoretical structure required by grey models is resource-demanding in terms of modeling. Thus, the cost of an accurate grey model for a particular application is usually high.
• Artificial intelligence models: Traditional machine learning models are data-driven techniques used to model complex relationships between inputs and outputs. Although the basis of machine learning is mostly statistical, the current availability of open platforms to easily design and train models contributes significantly to access to this technology. This fact, along with the high performance achieved by well-designed and trained machine learning models, provides an affordable and robust tool for power demand forecasting.
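The ARIMA family named in the first category above can be illustrated with a minimal autoregressive fit. The sketch below is a generic AR(p) estimate via ordinary least squares on a synthetic hourly series; `fit_ar`, `predict_next`, and the data itself are illustrative assumptions, not the model of reference [2] or any other cited work.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model y_t = c + sum_i a_i * y_{t-i} by least squares."""
    y = np.asarray(series, dtype=float)
    # Lagged design matrix: column i holds lag i+1 for each target y_t, t >= p.
    X = np.column_stack([y[p - i - 1:len(y) - i - 1] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]

def predict_next(series, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    lags = np.asarray(series[-p:], dtype=float)[::-1]  # most recent lag first
    return coef[0] + coef[1:] @ lags

# Synthetic hourly "demand" with a daily cycle plus noise (purely illustrative).
rng = np.random.default_rng(0)
t = np.arange(24 * 60)
demand = 50_000 + 5_000 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 200, t.size)

coef = fit_ar(demand, p=24)  # 24 lags capture the daily cycle
print(round(predict_next(demand, coef)))
```

A full ARIMA additionally differences the series and models the residual moving-average terms; the AR core shown here is the regression building block.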
However, traditional machine learning models require a pre-processing step to perform the extraction of the main features of the input data; this generally involves the manual construction of such a module. To overcome this requirement (as depicted in Figure 1), modern deep learning techniques [3] have the capacity to integrate feature learning and model construction into just one model. According to this approach, the original input is transferred to more abstract representations in order to allow the subsequent model layers to find the inherent structures.

Figure 1. Comparison between traditional machine learning models (a) requiring manual feature extraction, and modern deep learning structures (b) which can automate all the feature and training process in an end-to-end learning structure.

In this paper, the authors focus the analysis on predicting the energy demand based on artificial intelligence models. Nevertheless, although modern deep learning techniques have attracted the attention of many researchers in a myriad of areas, many publications related to power demand forecasting use traditional machine learning approaches such as artificial neural networks.

As commented above, the use of ANNs in the energy sector has been widely researched. Thanks to their good generalization ability, ANNs have received considerable attention in smart grid forecasting and management. A comparison between the different methods of energy prediction using ANN is proposed in [4] by classifying these algorithms into two groups.
On the one hand, the first group consists of traditional feedforward neural networks with only
one output node to predict next-hour or next-day peak load, or with several output nodes to forecast
hourly load [5].
On the other hand, other authors opt for radial basis function networks [6], self-organizing
maps [7], and recurrent neural networks [8].
Lie et al. compared three forecasting techniques, i.e., fuzzy logic (FL), neural networks (NNs), and autoregressive (AR) processes, and concluded that FL and NNs are more accurate than AR models [9]. In 2020, Chen Li presented an ANN-based short-term load forecasting model for the smart urban grids of Victoria and New South Wales in Australia.
Bo et al. proposed a combined energy forecasting mechanism composed of the back propagation
neural network, support vector machine, generalized regression neural network, and ARIMA [10].
Wen et al. explored a deep learning approach to identify active power fluctuations in real time based
on long short-term memory (LSTM) [11].
However, traditional ANN solutions have limited performance when large training datasets are lacking, when there is a significant number of inputs, or when solving computationally demanding problems [12], which is precisely the case discussed in this paper.
Thus, the authors of this paper found a promising topic related to the application of modern
deep learning structures to the problem of power demand forecasting. More specifically, this paper
describes the novel use of a particular deep neural network structure composed of a convolutional
neural network (widely used in image classification) followed by an artificial neural network for the
forecasting of power demand with a limited number of information sources available.
The network structure of a CNN was first proposed by Fukushima in 1988 [13]. The use of CNNs
has several advantages over traditional ANNs, including being highly optimized for processing 2D
and 3D images and being effective in the learning and extraction of 2D image features [14]. Specifically,
this is a quite interesting application for the purposes of this paper, since the authors here aim to extract
relevant features from the temperature grid of France (as further explained in Section 2.1.2).
The technique used to locate important regions and extract relevant features from images is
referred to as visual saliency prediction. This is a challenging research topic, with a vast number of
computer vision and image processing applications.
Wang et al. [15] introduced a novel saliency detection algorithm which sequentially exploited
the local and global contexts. The local context was handled by a CNN model which assigned a local
saliency value to each pixel given the input of local images patches, while the global context was
handled by a feed-forward network.
In the field of energy prediction, some authors have studied the modeling of electricity consumption
in Poland using nighttime light images and deep neural networks [16].
In [17], an architecture known as DeepEnergy was proposed to predict energy demands using CNNs. There are two main processes in DeepEnergy: feature extraction and forecasting. The feature extraction is performed by three convolutional layers and three pooling layers, while the forecasting phase is handled by a fully connected structure.
Based on the conclusions and outcomes achieved in the previous literature, the authors here conceptualize their solution which, as described in the next sections, is an effective approach to dealing with the power demand time series forecasting problem with multiple input variables, complex nonlinear relationships, and missing data.
Furthermore, the proposed deep learning structure has been applied to the particular problem of
French power demand in a real-setting approach. The next section comprehensively describes the
materials and data sources used to this end, so other researchers can replicate and adapt the work of
this paper to other power demand forecasting applications. As shown later, the performance of this
approach is equal to (if not better than) that of the reference Réseau de Transport d’Electricité (RTE)
French power demand forecast subscription-based service. Moreover, the proposed model performs better than existing approaches, as described in the Results section.
2. Materials and Methods

2.1. Data Analysis

2.1.1. Power Demand Data
For this paper, the historical data of French energy consumption were downloaded from the official RTE website [18], which provides data from 2012 to present. A first analysis of these data is shown in Figure 2, which reveals a clear seasonal pattern in the energy demand.

Figure 2. Monthly French energy demand for the period 2018–2019. Qx indicates the x-th data percentile. The colored lines within the Q25 and Q75 quartile boxes represent the median (orange line) and the mean (dashed green line). Points below Q25 and above Q75 are shown as well.

The strong seasonal pattern in the energy demand is further backed by Figure 3, computed with the data of RTE, which depicts the correlation between energy consumption and temperature. As shown, an average variation of 1 °C during winter over the entire territory led to a variation of around 2500 MW in the peak consumption (equivalent to the average winter consumption of about 2 million homes) [18]. In the summer, the temperature gradient related to air conditioning was approximately 400 MW per °C.
2.1.2. Weather Forecast Data
As explained in [19], the meteorological parameters are the most important independent variables
and the main form of input information for the forecast of energy demand. Specifically, temperature
plays a fundamental role in the energy demand prediction, since it has a significant and direct effect on
energy consumption (please refer to Figure 3). Moreover, different weather parameters are correlated,
so the inclusion of more than one may cause multicollinearity [19]. Accordingly, for this paper,
temperature was the fundamental input used.
Figure 3. Correlation between energy consumption and temperature as provided by the Réseau de
Transport d’Electricité (RTE).
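The sensitivities quoted above (roughly 2500 MW per °C in winter and 400 MW per °C in summer) support a quick first-order estimate of the demand swing caused by a nationwide temperature anomaly. The helper below simply applies those published figures and is purely illustrative, not RTE's methodology.

```python
# First-order demand sensitivity to a nationwide temperature anomaly,
# using the RTE figures quoted in the text (illustrative helper only).
SENSITIVITY_MW_PER_DEGC = {"winter": 2500.0, "summer": 400.0}

def demand_swing_mw(delta_temp_degc, season):
    """Estimated magnitude of the peak-demand change (MW) for a given anomaly."""
    return SENSITIVITY_MW_PER_DEGC[season] * delta_temp_degc

# A 2 degC winter cold snap moves roughly 5000 MW of peak demand (heating
# dominates); the same anomaly in summer moves only about 800 MW.
print(demand_swing_mw(2.0, "winter"), demand_swing_mw(2.0, "summer"))  # 5000.0 800.0
```

The asymmetry between the two seasons is what makes temperature such a dominant input for the French grid, where electric heating is widespread.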

The weather forecast historical data were collected from Action de Recherche Petite Echelle Grande Echelle (ARPEGE), which is the main numerical weather prediction provider over the Europe–Atlantic domain.
As indicated in the ARPEGE documentation [20], the initial conditions of this model were based on four-dimensional variational assimilation (4D-Var) that incorporated very large and varied conventional observations (radio sounding, airplane measurements, ground stations, ships, buoys, etc.) as well as those from remote sensing (Advanced TIROS Operational Vertical Sounder (ATOVS), Special Sensor Microwave Imager Sounder (SSMI/S), Atmospheric Infrared Sounder (AIRS), Infrared Atmospheric Sounding Interferometer (IASI), Cross-Track Infrared Sounder (CrIS), Advanced Technology Microwave Sounder (ATMS), Spinning Enhanced Visible and Infrared Imager (SEVIRI), ground-based GPS, satellite GPS, etc.).
Although ARPEGE provided multiple forecast data related to weather (pressure, wind, temperature, humidity) with a full resolution of 0.1°, the authors of this paper found that the temperature data with a 1° resolution (as shown in Figure 4) were sufficient for the purpose of energy demand forecasting while maintaining easy-to-handle data sets. This finding is also aligned with the strong correlation of energy demand and temperature described in the preceding Section 2.1.1. Figure 4 shows the pre-processing results after reducing the granularity of the grid from 0.1° to 1°.

Figure 4. Locations of temperature forecasting with a resolution of 1° over France.
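The grid reduction from 0.1° to 1° described above amounts to aggregating each 10 × 10 block of fine cells into one coarse cell. The paper does not state the aggregation it used, so the sketch below assumes simple block averaging; the field values are synthetic.

```python
import numpy as np

def downsample_grid(temps, factor=10):
    """Reduce a fine lat/lon temperature grid by block-averaging.

    temps: 2D array whose dimensions are divisible by `factor`
    (e.g. a 0.1-degree grid reduced to 1 degree with factor=10).
    """
    h, w = temps.shape
    assert h % factor == 0 and w % factor == 0
    # Reshape so each (factor x factor) block becomes its own pair of axes,
    # then average over those axes.
    blocks = temps.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Toy 0.1-degree field over a 10x10-degree region (100x100 cells).
fine = np.linspace(-5.0, 25.0, 100 * 100).reshape(100, 100)
coarse = downsample_grid(fine)
print(coarse.shape)  # (10, 10)
```

Block averaging preserves the mean of the field, which matters here because the downstream model relates aggregate temperature to aggregate consumption.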

A key issue related to weather forecasting was the availability of the data, since the providers released their predictions only at certain moments. In this case, the solution had to be able to predict the French power demand for the day ahead (D+1) based on the weather forecasts at day D.

As depicted in Figure 5, the power demand forecast model was run at 08.00 every day (D), with the most recent weather forecast information available (released at 00.00), and provided a prediction of the energy demand during day D+1.

Figure 5. Real setting of the energy demand forecasting problem. D: day.
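The operational timing in Figure 5 can be sketched with standard-library datetimes: the model runs at 08.00 on day D, relies on the weather forecast released at 00.00 of day D, and targets the 24 hours of day D+1. `forecast_window` is an illustrative helper, not code from the paper.

```python
from datetime import datetime, timedelta

def forecast_window(run_time):
    """For a daily 08.00 model run on day D, return the weather-forecast
    release time it relies on (00.00 of day D) and the 24 hourly target
    timestamps for day D+1."""
    day_d = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    weather_release = day_d                      # ARPEGE run released at 00.00
    day_d1 = day_d + timedelta(days=1)
    targets = [day_d1 + timedelta(hours=h) for h in range(24)]
    return weather_release, targets

release, targets = forecast_window(datetime(2019, 3, 14, 8, 0))
print(release, targets[0], targets[-1])
```

The eight-hour gap between the 00.00 weather release and the 08.00 run leaves time for data retrieval and pre-processing before the demand forecast is issued.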
2.2. Data Preparation
First, the historical datasets related to the French energy consumption (Section 2.1.1) and forecasted temperature (Section 2.1.2) were pre-processed to eliminate outliers, clean unwanted characters, and filter null data. Then, as is usual practice when training machine learning models, the resulting data were divided into three datasets: training, validation, and testing.

Although the historical French consumption data provided by RTE date back to 2012, the authors of this paper only had access to the ARPEGE historical weather forecast data in the period spanning from 1 October 2018 to 30 September 2019. Although a wider availability of historical weather forecast data would have benefitted the generalization capability of the resulting machine learning model, the data set available still covered a whole year, so the seasonal influence was fully captured. Additionally, the authors randomly extracted eight full days from the original data set in order to further test the generalization performance of the model (as depicted in Figure 6 and further discussed in Section 3). This way, the remaining data sets were randomly divided as follows:
• Training Dataset (80% of the data): The sample of data used to fit the model.
• Validation Dataset (10% of the data): The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters.
• Test Dataset (10% of the data): The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
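The splitting strategy above can be sketched as follows: hold out eight whole days for the complementary generalization check, then randomly divide the remaining hourly samples 80/10/10. The day indexing, one-sample-per-hour granularity, and seed are illustrative assumptions, not details from the paper.

```python
import random

def split_dataset(days, n_holdout_days=8, seed=42):
    """Hold out full days for generalization testing, then split the
    remaining samples into 80% train / 10% validation / 10% test."""
    rng = random.Random(seed)
    holdout_days = rng.sample(sorted(days), n_holdout_days)
    remaining = [d for d in days if d not in holdout_days]
    # One sample per hour of each remaining day (illustrative granularity).
    samples = [(d, h) for d in remaining for h in range(24)]
    rng.shuffle(samples)
    n = len(samples)
    train = samples[: int(0.8 * n)]
    val = samples[int(0.8 * n): int(0.9 * n)]
    test = samples[int(0.9 * n):]
    return holdout_days, train, val, test

days = list(range(365))  # one year of data, as in the paper
holdout, train, val, test = split_dataset(days)
print(len(holdout), len(train), len(val), len(test))
```

Holding out entire days, rather than scattered hours, is what makes the check meaningful: the model never sees any part of those days' daily demand profile during training.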

Figure 6. Division of the original dataset (365 days) into testing and training data. The testing data were used as a complementary means to further analyze the generalization performance of the resulting model. The remaining training data were divided as usual: 80% train, 10% validation, and 10% testing.
2.3. Deep Learning Architecture

The deep learning architecture used in this paper (as shown in Figure 7) resembles those structures widely used in image classification: a convolutional neural network followed by an artificial neural network. The novelty of this paper is not the deep neural network itself but its application to the macroscopic forecast of energy demand. In fact, the aforementioned deep learning architecture was originally conceived to automatically infer features from an input image (made of pixels) in order to subsequently classify such an image in a certain category attending to the inferred specific features.

Figure 7. Deep learning structure composed of a convolutional neural network followed by an artificial neural network adapted to the energy demand forecasting problem.
artificial neural network adapted to the energy demand forecasting problem.
For the applications of this paper, the convolutional network received the temperature forecasts from multiple locations within the area of interest (in this case, France) instead of an image. Still, the convolutional network extracted a “feature” of such input, which may be understood as a representative temperature of France automatically calculated attending to the individual contribution of each location to the aggregated energy consumption. For instance, the temperature locations close to large consumption sites (such as highly populated areas) would be automatically assigned a larger weight when compared to other less populated areas.

As also discussed in Section 1, the advantage of the proposed deep learning structure with respect to traditional (and less sophisticated) machine learning structures is that this feature extraction is implicit to the model, and thus there is no need to design the feature extraction step manually.

As shown in Figure 7, the artificial neural network receiving the “featured” temperature from the convolutional network was also fed with additional information found to highly influence the energy demand as well, namely:
• Week of the year: a number from 1 to 52.
energy demand as well, namely:
• Hour: a number from 0 to 23.

• Week
Day ofof
thethe year:aanumber
week: numberfrom from01toto6.52.
 Hour: a number from 0 to 23.
• Holiday: true (1) or false (0).
 Day of the week: a number from 0 to 6.

2.3.1.Holiday: true (1)Neural
Convolutional or false (0).
Network

2.3.1.As described byNeural


Convolutional the authors in [21], convolutional neural networks, or CNNs, are a specialized
Network
kind of neural network for processing data, and have a known, grid-like topology. The name of
As described
“convolutional by the
neural authorsindicates
network” in [21], convolutional neural
that the network networks,
employs or CNNs, are
a mathematical a specialized
operation called
kind of neural
convolution, network
which for processing
is a specialized kinddata, and operation.
of linear have a known, grid-like
Traditional topology.
neural Thelayers
network nameuseof
“convolutional neural network” indicates that the network employs a mathematical operation called
matrix multiplication to model the interaction between each input unit and each output unit. This means
convolution, which is a specialized kind of linear operation. Traditional neural network layers use
every output unit interacts with every input unit. Convolutional networks, however, typically have
matrix multiplication to model the interaction between each input unit and each output unit. This
sparse interactions (also referred to as sparse connectivity or sparse weights). This characteristic
means every output unit interacts with every input unit. Convolutional networks, however, typically
provides the CNNs with a series of benefits, namely:
have sparse interactions (also referred to as sparse connectivity or sparse weights). This characteristic
provides the CNNs with a series of benefits, namely:

• They use fewer parameters (weights) than fully connected networks.
• They are designed to be invariant in object position and distortion of the scene when used to
process images, which is a property shared when they are fed with other kinds of inputs as well.
• They can automatically learn and generalize features from the input domain.
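To make the first of these benefits concrete, a quick back-of-the-envelope parameter count can be sketched (the 10 × 10 input grid used below is an illustrative assumption, not a size stated in this paper):

```python
# Parameter count of a 2x2 convolution with 8 filters versus a fully
# connected layer mapping a flattened 10x10 grid onto 100 output units.
def conv2d_params(filters, kernel_h, kernel_w, in_channels):
    # Each filter holds kernel_h * kernel_w * in_channels weights plus 1 bias
    return filters * (kernel_h * kernel_w * in_channels + 1)

def dense_params(n_in, n_out):
    # Full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

print(conv2d_params(8, 2, 2, 1))       # 40 parameters
print(dense_params(10 * 10, 10 * 10))  # 10100 parameters
```

Note that the convolutional layer's parameter count is independent of the grid size, which is precisely the sparse-interaction benefit listed above.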

Given these benefits, this paper used a CNN to extract a representative temperature of the area of interest (France) from the historical temperature forecast data, as explained before. For the sake of providing an easy replication of the results by other researchers, the main features of the CNN designed for this paper were as follows:

• A two-dimensional convolutional layer. This layer created a convolution kernel that was convolved with the layer input to produce a tensor of outputs. It was set with the following parameters:

◦ Filters: 8. An integer, the dimensionality of the output space (the number of output filters).


◦ Kernel size: (2,2). A list of 2 integers specifying the height and width of the 2D
convolution window.
◦ Strides: (1,1). A list of 2 integers specifying the stride of the convolution along with the height
and width.
◦ Activation function: Rectified linear unit (ReLU).
◦ Padding: The input was padded (if needed) so that it was fully covered by the filter at the specified stride.
• Average pooling 2D: Average pooling operation for spatial data. This was set with the following parameters:

◦ Pool size: (2,2). Factors by which to downscale.


◦ Strides: (1,1).
• Flattening: To flatten the input.
• A fully connected network providing the featured output temperature:

◦ Layer 1: 64 neurons, activation function: ReLU.


◦ Layer 2: 24 neurons, activation function: ReLU.
◦ Layer 3: 1 neuron, activation function: ReLU.

2.3.2. Artificial Neural Network


As shown in Figure 7, the CNN output was connected to a fully connected multilayer artificial neural network (ANN) with the following structure:

• Layer 1: 256 neurons, activation function: ReLU.


• Layer 2: 128 neurons, activation function: ReLU.
• Layer 3: 64 neurons, activation function: ReLU.
• Layer 4: 32 neurons, activation function: ReLU.
• Layer 5: 16 neurons, activation function: ReLU.
• Layer 6: 1 neuron, activation function: ReLU.

The weights of all layers were initialized following a normal distribution with mean 0.1 and
standard deviation 0.05.
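The structure described in Sections 2.3.1 and 2.3.2 can be sketched in Keras as follows. This is a best-effort reading, not the authors' code: the 10 × 10 × 1 temperature grid shape, the "same" padding choice, and the direct concatenation of the four calendar inputs with the CNN output are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers, initializers

# All weights drawn from N(mean=0.1, std=0.05), as stated above
init = initializers.RandomNormal(mean=0.1, stddev=0.05)

# Convolutional branch: temperature forecast grid -> "featured" temperature
grid = keras.Input(shape=(10, 10, 1), name="temperature_grid")
x = layers.Conv2D(8, (2, 2), strides=(1, 1), padding="same",
                  activation="relu", kernel_initializer=init)(grid)
x = layers.AveragePooling2D(pool_size=(2, 2), strides=(1, 1))(x)
x = layers.Flatten()(x)
for units in (64, 24):
    x = layers.Dense(units, activation="relu", kernel_initializer=init)(x)
featured_temp = layers.Dense(1, activation="relu", kernel_initializer=init)(x)

# ANN branch: featured temperature + week, hour, day of week, holiday flag
calendar = keras.Input(shape=(4,), name="calendar")
h = layers.Concatenate()([featured_temp, calendar])
for units in (256, 128, 64, 32, 16):
    h = layers.Dense(units, activation="relu", kernel_initializer=init)(h)
demand = layers.Dense(1, activation="relu", kernel_initializer=init)(h)

model = keras.Model(inputs=[grid, calendar], outputs=demand)
```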

2.3.3. Design of the Architecture


Different tests were performed in order to find the optimum number of layers and neurons. To this end, various network structures were trained using the same data division for training, validation, and testing. Moreover, the training parameters were also optimized: the different models were trained repeatedly, changing the learning parameters (such as the learning rate) to find the optimal ones.
Once all the results were obtained, the objective was to find the model with the least bias error (error on the training set) as well as low validation and testing errors. Accordingly, model 5 in Table 1 below was selected.
Finally, L2 regularization was added to our model in order to reduce the difference between the bias error and the validation/testing error. In addition, thanks to L2 regularization, the model was able to generalize better to previously unseen data.
A summary of the tests performed can be seen in Table 1 below:

Table 1. Summary of the results of the different structures. ANN: artificial neural network; CNN:
convolutional neural network.

Model 1 2 3 4 5
Layer 1 (CNN) - 64 64 64 64
Layer 2 (CNN) 24 24 24 24 24
Layer 1 (ANN) - - - - 256
Layer 2 (ANN) - - - 128 128
Layer 3 (ANN) - - 64 64 64
Layer 4 (ANN) 32 32 32 32 32
Layer 5 (ANN) 16 16 16 16 16
Layer 6 (ANN) 1 1 1 1 1
Train Error (%) 1.9548 1.2275 0.6532 0.4797 0.4929
Validation Error (%) 2.7721 2.6791 1.2307 0.9378 0.8603
Test Error 1 (%) 2.8185 3.0435 1.2415 0.9125 0.8843
Test Error 2 (%) 4.2818 4.1677 2.0604 1.6873 1.5378
Cross-Validation Error (%) 5.8827 5.3691 2.6341 2.0806 1.6621

2.4. Training
The training process of the proposed deep neural network was aimed at adjusting its internal parameters (resembling mathematical regression) so that the structure was able to correlate its output (the French energy demand forecast) with its inputs.
What separates deep learning from a traditional regression problem is the handling of the
generalization error, also known as the validation error. Here, the generalization error is defined as the
expected value of the error when the deep learning structure is fed with new input data which were not
shown during the training phase. Typically, the usual approach is to estimate the generalization error
by measuring its performance on the validation set of examples that were collected separately from the
training set. The factors determining how well a deep learning algorithm performs are its ability to:

• Reduce the training error to as low as possible.


• Keep the gap between the training and validation errors as low as possible.

The tradeoff of these factors results in a deep neural network structure that is either underfitted or
overfitted. In order to prevent overfitting, the usual approach is to update the learning algorithm to
encourage the network to keep the weights small. This is called weight regularization, and it can be
used as a general technique to reduce overfitting of the training dataset and improve the generalization
of the model.
In the model used in this paper, the authors used the so-called L2 regularization in order to reduce
the validation error. This regularization strategy drives the weights closer to the origin by adding
a regularization term to the objective function. L2 regularization adds the sum of the square of the
weights to the loss function [22].
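Numerically, the penalty is just the sum of squared weights scaled by a strength λ (the value of λ below is an arbitrary illustrative choice; the paper does not report the one it used):

```python
def l2_penalty(weights, lam):
    # lambda * sum of squared weights over all layers
    return lam * sum(w * w for layer in weights for w in layer)

def regularized_loss(base_loss, weights, lam=1e-4):
    # Total objective = data-fit loss + L2 weight penalty
    return base_loss + l2_penalty(weights, lam)

weights = [[0.5, -0.5, 1.0, 0.0], [2.0]]  # toy two-layer weight values
print(l2_penalty(weights, 1.0))                 # 5.5
print(regularized_loss(1.0, weights, lam=0.1))  # ~1.55
```

Because the penalty grows with the squared magnitude of the weights, minimizing the total objective pushes the weights toward zero, which is the "drives the weights closer to the origin" effect described above.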
The rest of the training parameters were selected as follows:

• Batch size: 100. The number of training examples in one forward/backward pass. The higher the
batch size, the more memory space needed.
• Epochs: 30,000. One epoch is one forward pass and one backward pass over all the training examples.
• Learning rate: 0.001. Determines the step size at each iteration while moving toward a minimum
of a loss function.
• β1 parameter: 0.9. The exponential decay rate for the first moment estimates (momentum).
• β2 parameter: 0.99. The exponential decay rate for the second moment estimates (RMSprop).
• Loss function: Mean absolute percentage error.
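For illustration, a single scalar Adam update using these exact hyperparameters (schematic only; the actual training relied on a library implementation of the optimizer):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    # First-moment (momentum) and second-moment (RMSprop) moving averages
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update scaled by the learning rate
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = adam_step(w=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(w)  # ~0.999: the first step moves w by roughly one learning rate
```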

3. Results
Different metrics were computed within this paper in order to evaluate the performance of the
proposed solution. Specifically, mean absolute error (MAE), mean absolute percentage error (MAPE),
mean bias error (MBE), and mean bias percentage error (MBPE) were calculated. Their equations are
listed below:
MAE (MW) = (1/n) Σ |y − ŷ|, (1)

MAPE (%) = (100/n) Σ |y − ŷ| / y, (2)

MBE (MW) = (1/n) Σ (y − ŷ), (3)

MBPE (%) = (100/n) Σ (y − ŷ) / y, (4)
where y is the reference measure (in our case, the real energy demand) and ŷ is the estimated measure
(in our case, the forecasted energy demand).
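Equations (1)–(4) translate directly into code; the two demand values below are toy numbers for checking the formulas, not data from the paper:

```python
def demand_metrics(y, y_hat):
    # MAE, MAPE, MBE, MBPE as defined in Equations (1)-(4)
    n = len(y)
    mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / n
    mape = 100 * sum(abs(a - b) / a for a, b in zip(y, y_hat)) / n
    mbe = sum(a - b for a, b in zip(y, y_hat)) / n
    mbpe = 100 * sum((a - b) / a for a, b in zip(y, y_hat)) / n
    return mae, mape, mbe, mbpe

y = [50000.0, 60000.0]      # real demand (MW), illustrative
y_hat = [49000.0, 61000.0]  # forecasted demand (MW), illustrative
mae, mape, mbe, mbpe = demand_metrics(y, y_hat)
print(mae, mbe)  # 1000.0 0.0: equal errors of opposite sign cancel in MBE
```

Note how MBE/MBPE can be near zero while MAE/MAPE are not; this is exactly the pattern seen for the proposed network in Table 2.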
Once the deep learning structure proposed in this paper was trained with the training data set shown in Figure 6 and its output was tested against the real French energy demand, the authors calculated the different metrics, as shown in Table 2. As an additional performance reference, the authors also calculated the metrics of the RTE energy demand forecast, which is included for comparison.

Table 2. Performance comparison metrics. MAE: mean absolute error; MAPE: mean absolute percentage
error; MBE: mean bias error; MBPE: mean bias percentage error.

Model MAE (MW) MAPE (%) MBE (MW) MBPE (%)


Deep learning network 808.317 1.4934 21.7444 0.0231
RTE forecast service 812.707 1.4941 280.8350 0.4665

The absolute percentage error is also presented in the graphical form provided in Figure 8.

Figure 8. Absolute percentage error distribution provided by the deep learning structure proposed in this paper and the RTE subscription-based service.
Another interesting measure of the performance of the proposed structure is the absolute percentage error monthly distribution along a full year, as shown in Figure 9.

Figure 9. Absolute percentage error-specific monthly metrics over an entire year as provided by the proposed deep neural network.

In Tables 3 and 4, the monthly distributed results of the metrics from Equations (1)–(4) are also gathered.


Table 3. Errors provided by the proposed deep learning structure.

Month MAE (MW) MAPE (%) MBE (MW) MBPE (%)


January 965.1701 1.3542 54.1231 0.0499
February 818.8975 1.2424 203.2982 0.2812
March 823.4836 1.4667 25.0578 −0.0033
April 1041.4191 1.9774 554.1952 1.0199
May 684.9614 1.4718 94.5629 0.1791
June 544.8806 1.2693 29.9470 0.0536
July 588.3867 1.2318 −264.5221 −0.5527
August 572.0692 1.3592 −98.9578 −0.2256
September 618.1227 1.3575 −168.3288 −0.3636
October 676.8499 1.3964 67.5875 0.1724
November 1062.7987 1.7511 211.3716 0.3383
December 1303.7523 2.0131 −385.4661 −0.5891

Table 4. Errors provided by the RTE.

Month MAE (MW) MAPE (%) MBE (MW) MBPE (%)


January 1078.5000 1.5249 275.5658 0.3773
February 1011.5000 1.5078 504.5656 0.7169
March 1082.8261 1.8753 843.4203 1.4599
April 751.6769 1.4599 243.6000 0.4678
May 722.7412 1.5619 138.6000 0.2854
June 604.5833 1.3278 71.5307 0.1574
July 623.2633 1.3025 −156.375 −0.3392
August 607.4872 1.4180 92.2435 0.2202
September 543.9652 1.2333 8.8125 −0.0092
October 760.9367 1.4758 222.9873 0.3394
November 917.9838 1.5582 579.1290 1.0029
December 1039.5945 1.6306 572.2432 0.9326

Figure 10 depicts the results of testing the performance of the forecast provided by the proposed deep neural network on the eight full days extracted from the original data.
Finally, in order to compare the performance achieved by the approach proposed in this paper with respect to existing solutions, a comparative study was performed. To this end, the CNN + ANN structure was fed with the temperature grid information, while the other methods, which were not specially designed for processing images, were fed with the average temperature values of France. The ARIMA algorithm received the past energy demand as input and predicted the future demand. The results of the experiment are provided in Table 5.

Table 5. Comparison between the proposed solution and the existing methods.

Model MAE (MW) MAPE (%) MBE (MW) MBPE (%)


Linear Regression 6217.5683 12.3102 −232.3066 −2.5240
Regression Tree 5436.8139 10.4437 −254.3756 −2.0879
Support Vector Regression (Linear) 6217.4404 12.2809 −0.597 −2.0014
Support Vector Regression (Polynomial) 4813.6934 9.2218 246.248 −1.047
ARIMA 1179.964 2.9480 104.6737 0.2097
ANN 1537.5137 2.8351 132.1096 0.1195
CNN + ANN 808.3166 1.4934 21.7444 0.0231
RTE Model 812.6966 1.4941 280.835 0.4665

As the main outcome of the analysis, it can be concluded that the deep learning structure presented in this paper achieved the best results, showing an improvement even over the RTE baseline values. Furthermore, it can be observed that the single ANN, which was only supplied
with the average temperature, performed worse than the CNN + ANN method. This fact confirms
that the CNN can extract the temperature features for France, providing relevant information to the
machine learning algorithm and allowing improved results. Accordingly, the solution presented in
this paper was able to enhance the performance of existing methods, thanks to the processing and
extraction of features from the French temperature grid performed by the CNN.

Figure 10. Performance of the forecast provided by the proposed deep neural network on the eight full days extracted from the original data. (Left column) Real energy consumption, neural network energy prediction, and energy prediction by the RTE model on a different full day in the Testing Set. (Right column) Absolute Percentage Error in energy prediction throughout the day by the neural network and the model proposed by RTE.

4. Discussion
In this paper, the authors presented the adaptation of a deep neural network structure commonly
used for image classification applied to the forecast of energy demand. In particular, the structure was
trained for the French energy grid.
The results show that the performance of the proposed structure competes with the results
provided by the RTE subscription-based reference service. Specifically, the overall MAPE metric of the
proposed approach delivers an error of 1.4934%, which is slightly better than the value of 1.4941%
obtained with the RTE forecast data.
In addition, a comparison between the proposed solution and existing methods was also performed.
As pointed out in the Results section, the suggested approach performed better than all the existing
methods which were tested. Specifically, the linear regression, regression tree, and support vector regression (linear) approaches had a MAPE above 10%, support vector regression (polynomial) had a MAPE of 9.2218%, and ARIMA and ANN had a MAPE that was slightly lower than 3%. Since the
MAPE achieved by the proposed structure was 1.4934%, it can be confirmed that the CNN + ANN
approach is better than the existing models.
When analyzed on a monthly basis, the errors were uniformly distributed through the year,
despite the noticeable increments during the late autumn and winter seasons. This fact is also in
accordance with the reference RTE forecast data and may be due to the intermittency of the energy
consumption profile observed when French temperatures are low.
The proposed deep neural network was also tested against eight full days randomly selected
from the original dataset in order to provide an additional measure of generalization performance.
On the one hand and as shown in Figure 10, the errors were uniformly distributed along the selected
days. On the other hand, the predictions provided by this paper were quite similar to those predictions
provided by the reference RTE subscription service and were also aligned with the overall MAPE
metrics. These results indicate that the proposed neural network structure is well designed and trained,
and that it generalizes as expected.
The performance achieved in this paper is a promising result for those researchers within the
electrical energy industry requiring accurate energy demand forecasting at multiple levels (both
temporal and geographical). Despite the focus of this paper on the French macroscopic energy demand
problem, the flexibility of the proposed deep neural network and the wide availability of open platforms
for its design and training make the proposed approach an accessible and easy-to-implement project.
To further facilitate the replication of this paper by other researchers in this area, the authors have
included detailed information about the topology and design of the proposed structure.

Author Contributions: Conceptualization, A.J.d.R.; software, F.D.; validation, A.J.d.R. and J.D.; writing—original
draft preparation, A.J.d.R.; writing—review and editing, A.J.d.R., F.D. and J.D.; supervision, A.J.d.R. All authors
have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Hu, H.; Wang, L.; Peng, L.; Zeng, Y.-R. Effective energy consumption forecasting using enhanced bagged
echo state network. Energy 2020, 197, 1167–1178. [CrossRef]
2. Oliveira, E.M.; Luiz, F.; Oliveira, C. Forecasting mid-long term electric energy consumption through bagging
ARIMA and exponential smoothing methods. Energy 2018, 144, 776–778. [CrossRef]
3. Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep learning for smart manufacturing: Methods and
applications. J. Manuf. Syst. 2018, 48, 144–156. [CrossRef]
4. RazaKhan, A.; Mahmood, A.; Safdar, A.A.; Khan, Z. Load forecasting, dynamic pricing and DSM in smart grid: A review. Renew. Sustain. Energy Rev. 2016, 54, 1311–1322.

5. Hippert, H.; Pedreira, C.; Souza, R. Neural Network for short-term load forecasting: A review and evaluation.
IEEE Trans. Power Syst. 2001, 16, 44–55. [CrossRef]
6. Gonzalez-Romera, J.-M.; Carmona-Fernandez, M. Monthly electric demand forecasting based on trend extraction. IEEE Trans. Power Syst. 2006, 21, 1946–1953. [CrossRef]
7. Beccali, M.; Cellura, M.; Brano, L.; Marvuglia, V. Forecasting daily urban electric load profiles using artificial neural networks. Energy Convers. Manag. 2004, 45, 2879–2900. [CrossRef]
8. Srinivasan, D.; Lee, M.A. Survey of hybrid fuzzy neural approaches to electric load forecasting. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics: Intelligent Systems for the 21st Century, Vancouver, BC, Canada, 22–25 October 1995.
9. Liu, K.; Subbarayan, S.; Shoults, R.; Manry, M. Comparison of very short-term load forecasting techniques.
IEEE Trans. Power Syst. 1996, 11, 877–882. [CrossRef]
10. Bo, H.; Nie, Y.; Wang, J. Electric load forecasting using a novel hybrid model on the basis of a data preprocessing technique and a multi-objective optimization algorithm. IEEE Access 2020, 8, 13858–13874. [CrossRef]
11. Wen, S.; Wang, Y.; Tang, Y.; Xu, Y.; Li, P.; Zhao, T. Real-time identification of power fluctuations based on LSTM recurrent neural network: A case study on the Singapore power system. IEEE Trans. Ind. Inform. 2019, 15, 5266–5275. [CrossRef]
12. Gu, J.; Wangb, Z.; Kuenb, J.; Ma, L.; Shahroudy, A.; Shuaib, B.; Wang, X.; Wang, L.; Wang, G.; Cai, J.; et al.
Recent advances in convolutional neural networks. Pattern Recognit. 2017. [CrossRef]
13. Fukushima, K. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Netw. 1988, 1, 119–130. [CrossRef]
14. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P. A State-of-the-art survey on deep learning
theory and architectures. Electronics 2019, 8, 292. [CrossRef]
15. Wang, L.; Lu, H.; Ruan, X.; Yang, M.H. Deep networks for saliency detection via local estimation and global
search. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA,
USA, 7–12 June 2015.
16. Jasiński, T. Modeling electricity consumption using nighttime light images and artificial neural networks.
Energy 2019, 179, 831–842. [CrossRef]
17. Kuo, P.-H.; Huang, C.-J. A high precision artificial neural networks model for short-term energy load
forecasting. Energies 2018, 11, 213. [CrossRef]
18. RTE. November 2014. Available online: http://clients.rte-france.com/lang/fr/visiteurs/vie/courbes_methodologie.jsp (accessed on 20 December 2019).
19. Arenal Gómez, C. Modelo de Temperatura Para la Mejora de la Predicción de la Demanda Eléctrica: Aplicación al
Sistema Peninsular Español; Universidad Politécnica de Madrid: Madrid, Spain, 2016.
20. ARPEGE. Meteo France, Le Modele. 2019. Available online: https://donneespubliques.meteofrance.fr/client/document/doc_arpege_pour-portail_20190827-_249.pdf (accessed on 3 February 2020).
21. Goodfellow, I.; Bengio, Y.; Courville, A. Optimization for training deep models. Deep Learning. 2017, pp. 274–330. Available online: http://faculty.neu.edu.cn/yury/AAI/Textbook/DeepLearningBook.pdf (accessed on 3 February 2020).
22. Brownlee, J. Machine Learning Mastery. Available online: https://machinelearningmastery.com/weightregularization- (accessed on 3 February 2020).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
