
Neurocomputing 310 (2018) 59–68

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Anomaly detection and predictive maintenance for photovoltaic systems
Massimiliano De Benedetti a, Fabio Leonardi a, Fabrizio Messina a,∗, Corrado Santoro a,
Athanasios Vasilakos b
a Department of Mathematics and Computer Science, University of Catania, Via S. Sofia 64, Catania 95125, Italy
b Lab of Networks and Cybersecurity, Innopolis University, 1, Universitetskaya Str., Innopolis 420500, Russia

Article info

Article history:
Received 19 January 2018
Revised 27 April 2018
Accepted 3 May 2018
Available online 16 May 2018
Communicated by Dr. Yuan Yuan

Keywords:
Artificial neural networks
Data analysis
Anomaly detection
Alerting system
Predictive maintenance
Photovoltaic systems

Abstract

We present a learning approach designed to detect possible anomalies in photovoltaic (PV) systems so that an operator can plan predictive maintenance interventions. The anomaly detection algorithm presented is based on the comparison between the measured and the predicted values of the AC power production. The model designed to predict the AC power production is based on an Artificial Neural Network (ANN) that is capable of estimating the AC power production using solar irradiance and PV panel temperature measurements, and that is trained using a dataset previously gathered from the plant to be monitored. Live trend data coming from the PV system are then compared with the output of the model, and the vector of residuals is analyzed to detect anomalies and generate daily predictive maintenance alerts; the residuals are aggregated over one day and processed to detect out-of-threshold samples and system degradation trends; these trends are extracted by computing the Triangular Moving Average (TMA), where the window size is automatically determined. The paper also reports experimental results revealing that the model leads to a good anomaly detection rate, which is measured as a positive predictive detection rate greater than 90%. Moreover, the algorithm is able to recognize trends in the deviation of the system from normal operating behavior and to generate predictive maintenance alerts as a decision support system for operators, with the aim of avoiding possible incoming failures.
© 2018 Elsevier B.V. All rights reserved.

∗ Corresponding author.
E-mail address: [email protected] (F. Messina).
https://doi.org/10.1016/j.neucom.2018.05.017
0925-2312/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

The reduction in the cost of photovoltaic (PV) systems and the trend of market prices [1], along with the performance improvements resulting from improved cell efficiencies and lower electrical conversion losses [2], have led to growing interest in such alternative energy production systems [3–6]. As a consequence, the issues related to PV system maintenance are gaining a lot of attention, as proved by the studies and efforts (conducted by various institutions and companies) that aim at developing “best practices” for PV system operations [7]. Maintenance includes various activities that are planned into a monitoring program that can range from minimal checks (e.g., checking the total electricity generated as reported by the inverter once per year) to high-accuracy monitoring that allows the manufacturer or the owner to identify problems or the need for cleaning operations. Maintenance affects the performance of PV systems, in terms of efficiency and generated power, and, as a consequence, the overall revenue.

1.1. Motivation

In this context, we are particularly interested in developing techniques that detect faults of PV components in a timely manner. This aspect has received a fair amount of attention in the literature; in [8], the authors developed a practical fault detection approach for PV systems, intended for online implementation; a similar technique is proposed in [9] for wind turbines. Both approaches seem to give good results in terms of accuracy, but they predict faults with too short an anticipation time (one to five time units). Indeed, a desirable property of any solution for predicting faults is the ability to provide accurate daily predictive maintenance alerts, in order to allow operators to take decisions in time and, as a consequence, to plan maintenance tasks in the field in advance. An algorithm for anomaly detection and predictive maintenance should provide an accurate model for estimating electricity production under normal operating conditions, which can be employed for early detection of anomalies. Such anomalies can be identified by analyzing power production samples that show significant differences from those expected under certain operating conditions, and the significance of the anomalies can
be computed as the deviation between the actual sample and the forecasted value, compared to the model accuracy index. In the specific context of PV systems, predictive alerts should be presented several days before the incoming failure: in such a case, the operator can analyze the signal in order to take a proper decision and to plan the technical and logistic operations related to the needed maintenance activity. As a consequence, the PV system will not under-perform for prolonged periods of time, and the power losses caused by these anomalies and faults will be minimized.

1.2. Main contribution

In this work, we present a novel approach designed to detect anomalies in PV systems and generate predictive maintenance alerts. Our approach is capable of predicting faults in time (i.e., some weeks before the fault occurs) to plan maintenance activities aimed at avoiding the possible loss of power production. This is achieved by deriving long-term trends that allow us to detect “degradation patterns”, which identify future faults in a timely manner. The approach is based on computing the numeric difference (in terms of a residual vector) between power production data coming from the real plant and the data estimated by a model of the inverters developed ad hoc: the residual vector is then processed by applying first autocorrelation and then a Triangular Moving Average (TMA) in order to detect anomalies and to identify hidden long-term trends. The inverter model is determined using an artificial neural network (ANN) trained with previously sampled historical data on solar irradiance, temperature and produced power. As will be discussed later in this paper, the application of these techniques allowed us to obtain significant results in terms of the accuracy of the generated alerts, which are presented several days before the incoming failures; indeed, based on the experiments we performed on a real dataset, which are also reported in the paper, we have verified that the AC power production model shows a validation error of only 2.3%, and the predictive anomaly detection rate is larger than 90%.

The paper is organized as follows. Section 2 reports a summary of the main literature in the field. Section 3 provides an overview of the proposed approach. Section 4 thoroughly describes the ANN model development. Section 5 details the predictive detection of anomalies. A number of experimental results, along with a numerical example, are illustrated in Section 6. Finally, Section 7 concludes the paper.

2. Related work

The application of Artificial Intelligence to modeling and studying photovoltaic systems has recently attracted a lot of interest. For instance, in [10] the authors discuss the major artificial intelligence (AI) techniques for photovoltaic applications, namely artificial neural networks (ANNs), fuzzy logic (FL), genetic algorithms (GA) and hybrid systems (HSs), and analyze the main advantages of AI-based modeling and simulation techniques as alternatives to conventional physical modeling. They describe the application of the different AI techniques to modeling, prediction and fault detection in some detail, and outline some conclusions. The analysis presented by the authors highlights the fact that AI offers a real alternative for prediction and fault identification systems.

As for anomaly and fault detection on PV systems, as well as long-term prediction and performance measures, the related literature is fairly wide [11–25]. Overall, a few proposals in the literature deal with fault prediction on PV and wind (eolic) systems, and the proposed algorithms differ in the nature of the processes and signals. Wind systems are characterized mostly by the wind velocity signal, which presents high daily variations, while PV systems are mainly dependent on solar radiation, which has slower dynamics. Nevertheless, to the best of our knowledge there is no solid approach dealing with predictive maintenance of PV systems that is capable of providing predictive alerts several days in advance; therefore, in the following paragraphs, we refer to the works that have some similarities with our approach.

In the current literature, some proposals are based on electrical-circuit simulations of a PV panel [26–28], while others rely on the statistical analysis of different PV system measurements, as well as system efficiency values [29–31]. Some of the existing predictive models for PV systems are capable of predicting the production of electricity by means of parametric models that use variables belonging to the PV system and the weather, plus a number of adjustable parameters [32,33]. Others use artificial intelligence techniques, such as neural networks, fuzzy logic, and expert systems [8,34–36].

Among the approaches cited above, we mention an interesting fault detection approach presented in [8], which is intended only for on-line implementation and was developed and validated using data measured from a real PV system. The model starts with a data analysis aimed at identifying values not representative of normal PV system operation; then the original 10-min measurements are averaged over 1 hour. The authors also developed different models for different irradiance ranges, in order to represent the different performance corresponding to different sunlight levels. The results mainly reveal that the models for different irradiance intervals lead to a fault detection rate greater than 90%. The fault detection is simply based on the comparison between the measured and the model-predicted AC power production. The model, in turn, estimates the AC power production using solar irradiance and PV panel temperature measurements. The model presents a fair degree of complexity and a high accuracy in detecting faults even in the presence of some abnormalities in the measured data. Nevertheless, the fault detection approach in [8] predicts faults by looking at the last samples available. As a consequence, it does not allow the operator to predict the faults in time to plan maintenance activities aimed at avoiding the possible loss of power production (e.g., some weeks before the fault occurs). Conversely, with our approach, as we explain later in the paper, we derive long-term trends that allow us to detect “degradation patterns”, which identify future faults in a timely manner.

In [36] the authors study the interesting problem of modeling a power supply system by taking into account the various seasonal, monthly and daily changes in meteorological data. They adopted an adaptive neuro-fuzzy inference system, called ANFIS, and a new expert configuration PVPS system, that is, a user-aided design tool for PV systems. In order to find suitable models for the different components of the PVPS system (e.g., generator, battery and regulator) they used an extended database of measured climate data (global radiation, temperature and humidity) as well as electrical data (photovoltaic, battery and regulator voltage and current) of a PVPS system installed in the south of Algeria. They obtained an excellent level of accuracy and reliability, and the correlation coefficient between measured values and those estimated by the ANFIS showed a good prediction accuracy of 98%. A very interesting part of this related work is that the authors performed some tests with an Artificial Neural Network (ANN), and those tests showed that the modeling technique developed by the authors performs better than the ANN used. Although in our work the ANN represents only a first part of the whole process, as we state in the concluding section, we aim at integrating, in future work, a different approach to model (and predict) the AC power production.

In [11] the authors present a statistical approach for fault detection and diagnosis in a PV system, with the goal of detecting early and identifying faults on the DC side of a PV system, such as short-circuit,
open-circuit and partial shading faults. An aspect similar to
our work is that the authors applied an exponentially-weighted
moving average (EWMA) control chart on the residuals obtained
from the one-diode model. One of the main motivations behind
the application of this technique is its low computational cost,
which makes it easy to implement in real time. The authors validated
their approach by means of a dataset collected at a
photovoltaic plant located in Algeria. In our approach, as we discuss
later in this paper, we adopted the Triangular Moving Average
(TMA) in order to detect anomalies and to identify hidden long
term trends, as one of the final steps of our approach.
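
To make the idea of an EWMA control chart on residuals more concrete, the following minimal sketch smooths a residual series and flags the samples whose smoothed value leaves a fixed band. It is not taken from [11]: the smoothing factor `alpha`, the control `limit`, the seeding with the first sample and the toy residuals are all illustrative assumptions.

```python
def ewma(values, alpha=0.2):
    """Exponentially weighted moving average: z_t = alpha * x_t + (1 - alpha) * z_(t-1)."""
    smoothed = []
    z = values[0]  # assumption: seed with the first sample (a target mean is also common)
    for x in values:
        z = alpha * x + (1 - alpha) * z
        smoothed.append(z)
    return smoothed


def ewma_alarms(residuals, limit, alpha=0.2):
    """Indices at which the smoothed residual leaves the band [-limit, +limit]."""
    return [i for i, z in enumerate(ewma(residuals, alpha)) if abs(z) > limit]


# Toy residuals (hypothetical): small noise followed by a persistent shift.
residuals = [0.0, 0.1, -0.1, 0.0, 1.0, 1.2, 1.1, 1.3]
alarms = ewma_alarms(residuals, limit=0.5)
```

Because the statistic accumulates evidence over several samples, the alarm is raised a few steps after the shift starts rather than on the first shifted sample.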
The study proposed in [37] discusses a fault detection
algorithm based on the analysis of the theoretical curves
describing the behaviour of an existing grid-connected photovoltaic
(GCPV) system. The authors simulated a number of attributes such
as voltage ratio (VR) and power ratio (PR) and, thereafter, they
used a third-order polynomial function to generate a pair of
detection limits for the VR and PR ratios. Samples of a given data
set that lay outside the detection limits were processed by a fuzzy
logic classification system. The analysis of the obtained results
showed an accuracy of about 98.8% in detecting the different faults
occurring in the PV system.
A further interesting approach is presented in [9], which deals with the problem of detecting small and slowly developing faults on wind turbines. One of the main differences lies in the context of the application: indeed, if such faults are not detected (and fixed) on time, they may cause severe damage to wind turbines, with severe consequences in terms of downtime and costs, while a small fault in a PV system will cause a decrease in performance and overall productivity. In this critical context, the authors proposed a method that combines an artificial neural network (ANN) and an Exponentially-Weighted Moving Average (EWMA) to improve the accuracy of detection of small faults: the ANN is used to predict the output of the wind turbine, and the EWMA is applied to monitor the residuals between real and predicted values. Experimental results on real data have shown that the proposed approach is capable of catching about 63.6% of the 77 failures that occurred during wind turbine monitoring. The authors discuss the fact that the performance measures (MSE and average run length) indicate that the proposed method can detect small faults in advance and improve the detection accuracy.

We remark that most of the related works in the literature analyze the model development phase, either through regression or artificial intelligence techniques, with the aim of identifying relevant deviations or anomalies at the time they appear. Nevertheless, none of them deals with the long-term predictive problem of generating alerts early enough that specialized operators are able to analyze them and schedule an intervention on site, with either internal or external human resources.

3. System model and overview of the approach

The proposed approach aims at predicting failures in the system by detecting abnormalities in PV system operations before they cause severe damage. Here we refer to state-of-the-art photovoltaic power production plants featuring a certain number of PV panel strings connected to suitable DC/AC inverters. We assume that the plant is provided with a suitable data collection system, able to sample and store data on solar irradiance (pyranometer), temperature and generated AC power. As for data sampling, the only requirement is a sampling interval short enough to provide adequate data for a 1-hour average (indeed, we consider a sampling time of 5 min).

The approach is based on two different phases. In the first phase, the model of the plant/inverter to be monitored has to be developed, based on stored historical data; therefore this phase is intended to be executed in batch. We model a plant by means of an artificial neural network; since the ANN must be trained, we suppose that a certain amount of historical data is available; on this basis, since PV power production features a seasonal trend, one year of historical data suffices. The details of this first phase are reported in Section 4.

Once the plant/inverter model has been developed, the second phase is the on-line analysis of sampled data in order to detect the anomalies, a data-driven process that is sketched in Fig. 1 and detailed in Section 5. The ANN is used to predict the inverter output under normal conditions; the output is then used to compute daily residuals, and values that exceed the normal operation limits result in warnings; the autocorrelation function is then used to find the intrinsic periodicity of peaks, and the longest periodicity is treated as an input parameter in order to calculate the Triangular Moving Average (TMA) of the daily residuals, thus obtaining the related trend signal. The TMA shows the long-term trend of the daily residuals, which can exhibit areas of either degradation or convergence, detected through its derivative function. Predictive alerts are finally generated for those warnings that appear in correspondence with positive values of the trend derivative.

Fig. 1. Schema of the approach.

4. Model development

As discussed in the previous section, the first phase of the approach is the development of a model to predict the AC power production of the PV system. In principle, the overall produced power could be modeled as a linear function of the solar irradiance, as suggested by the plot in Fig. 2, which shows a trend obtained from data sampled in a real production plant. Nevertheless, the relation between solar irradiance and produced power is also affected by the temperature of the photovoltaic cells, which is tied to the environment temperature,

Fig. 2. Hourly dataset – inverter active power vs radiation.
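
The two pre-processing steps described in Section 4.1 (dropping samples that do not reflect normal operation, then averaging over 1 hour) can be sketched as follows. This is a minimal illustration rather than the authors' code: the cut-off values `MIN_IRRADIANCE` and `MIN_POWER`, the tuple layout of a sample and the toy data are assumptions introduced here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical cut-offs, not taken from the paper.
MIN_IRRADIANCE = 50.0  # W/m^2: below this the sample counts as "very low sunlight"
MIN_POWER = 1.0        # kW: at or below this the inverter is treated as "off"


def is_valid(irradiance, power):
    """Keep only samples that look like normal operation."""
    if irradiance < MIN_IRRADIANCE:
        return False  # very low sunlight, measurement accuracy is reduced
    if power <= MIN_POWER:
        return False  # irradiance present but no output: inverter off
    return True


def hourly_average(samples):
    """samples: iterable of (timestamp, irradiance, temperature, power) tuples."""
    buckets = defaultdict(list)
    for ts, irradiance, temperature, power in samples:
        if is_valid(irradiance, power):
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append((irradiance, temperature, power))
    averaged = []
    for hour in sorted(buckets):
        rows = buckets[hour]
        n = len(rows)
        averaged.append((hour,
                         sum(r[0] for r in rows) / n,
                         sum(r[1] for r in rows) / n,
                         sum(r[2] for r in rows) / n))
    return averaged


samples = [
    (datetime(2018, 1, 1, 10, 0), 400.0, 20.0, 200.0),
    (datetime(2018, 1, 1, 10, 5), 600.0, 22.0, 300.0),
    (datetime(2018, 1, 1, 10, 10), 500.0, 21.0, 0.0),  # inverter off: dropped
    (datetime(2018, 1, 1, 6, 0), 10.0, 5.0, 0.0),      # low sunlight: dropped
]
hourly = hourly_average(samples)
```

Filtering before averaging keeps an "inverter off" sample from dragging down the hourly mean of otherwise normal data.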

Fig. 3. Representation of ANN used to model power production.
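
As a rough illustration of the structure in Fig. 3, the sketch below builds a multi-layer perceptron with two inputs (I, T), one hidden layer and one output (P), and computes the RMSE normalized by the maximum AC power, as used for validation in Section 4.2. The weights are random and untrained, and the tanh activation, the input scaling to [0, 1] and all function names are assumptions made for this example; the paper does not specify them.

```python
import math
import random


def init_mlp(n_hidden=10, seed=0):
    """Random, untrained weights for a 2-input, 1-output MLP with one hidden layer."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def predict(mlp, irradiance, temperature):
    """Forward pass P = f(I, T); tanh hidden units are an assumption."""
    x = (irradiance, temperature)
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(mlp["w_hidden"], mlp["b_hidden"])
    ]
    return sum(w * h for w, h in zip(mlp["w_out"], hidden)) + mlp["b_out"]


def rmse(predicted, measured):
    """Root mean square error between two equally long sequences."""
    errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(errors) / len(errors))


def normalized_rmse(predicted, measured, max_power):
    """RMSE expressed as a fraction of the maximum AC power."""
    return rmse(predicted, measured) / max_power


mlp = init_mlp(n_hidden=10)
p_hat = predict(mlp, irradiance=0.8, temperature=0.5)  # inputs assumed pre-scaled to [0, 1]
```

In a trial-and-error search over the hidden layer size, one would train one such network per candidate size and keep the size with the lowest validation error; the training step itself is outside the scope of this sketch.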

therefore a linear model would represent a very simple and inaccurate approach to approximating the real function which, indeed, features some non-linearities that require a proper modeling approach.

4.1. Plant model and data pre-processing

A photovoltaic system can be modeled by the following generic function:

$$P = f(I, T) \qquad (1)$$

where $P$ is the power produced in kW, $I$ is the solar irradiance in $\mathrm{W/m^2}$ and $T$ is the environment temperature in °C.

To derive a usable form of function (1), a typical approach consists in the adoption of an artificial neural network, as a first step of a more complex approach, which is capable of “learning” the non-linear nature of the system from the available dataset. For this reason, in our system, we employ an ANN made of a Multi-layer Perceptron (MLP) [38] with two inputs, I and T, and one output, P, as in Fig. 3.

In order to train the network, a dataset coming from a real power production plant must be used (as we did in our experiments, as Section 6 reports); the set must, however, be properly filtered in order to remove invalid data that could affect the predictive model. Indeed, the development of a reliable model requires, at the beginning, a consistent training dataset: the data samples used for the training dataset should contain and represent all the seasonality, i.e., at least 1 year of flat data.¹

¹ Here we consider an acquisition system robust enough not to present appreciable “data-missing holes”; in the opposite case, more than 1 year of data could be required to obtain a good model.

A pre-processing operation has to be performed on the available data set in order to ensure that all data are valid. Indeed, in some cases, the data could present situations in which the power output is near zero but the irradiance is slightly greater than zero; such conditions, which are highlighted in Fig. 2 with the red oval, refer to events, like plant maintenance or plant failure, in which the inverters are “off”: since these data patterns do not refer to normal operating conditions, they must be removed. In this filtering operation, data related to very low sunlight levels (low irradiance and zero output power), for which the measurement accuracy is significantly reduced, should also be removed.

We also consider a second pre-processing operation, averaging the data over 1 hour: indeed, in PV systems data is often sampled with a smaller period, in the order of 1–10 min, but, with that sampling time, data could feature a high variance; averaging over 1 hour can thus decrease the variability in the dataset and contribute to improving the accuracy of the model constructed. Computing the average over a certain time window is an operation that makes it possible to keep track of the core variance in the daily solar process, which deals with slowly varying signals such as the solar irradiance. It is possible to consider time windows different from 1 hour for this averaging step, but we should take into account, as a general comment, that the larger the time window, the higher the probability of losing relevant information within the data. Vice versa, a small time window used in the calculation of the mean will complicate the later determination of the related long-term trend.

4.2. ANN training and structure

In order to obtain an MLP-ANN good enough to model a process, we must determine the number of hidden layers as well as the optimal number of neurons. As for the number of hidden layers, it is widely known that using more than one layer in an MLP does not provide significant advantages but, on the contrary, increases processing time and complexity; for this reason we used only one hidden layer. On the other hand, to determine the number of neurons needed to provide a good accuracy, we adopted a “trial and error” process (which is a widely accepted approach) aimed at training and testing several network structures and then analyzing the residual vectors.

On this basis, we divided the data obtained at the end of the filtering phase into a training set and a validation set. Since the model is specifically tuned to fit the training data, we also verified whether the developed model suffered from over-fitting, which may occur when a model performs very well on the training dataset but, on the other hand, is not able to generalize from the data trend and performs poorly on fresh data.

The validation is performed by analyzing the Coefficient of Variation of the Root Mean Square Error (RMSE) computed on the set of observations. On this basis, we tried a range of values from 1 to 50 for the number of neurons in the hidden layer and measured the validation error, then we selected the case corresponding to the smallest variation. By using the dataset in Section 6, we found

Fig. 4. Hourly residual vector.
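
The per-day aggregation described in Section 5.1 turns an hourly residual vector like the one in Fig. 4 into one value per day, namely the sum of the absolute hourly residuals of that day. A minimal sketch, assuming each hourly record is a hypothetical `(timestamp, measured_kw, predicted_kw)` tuple:

```python
from collections import defaultdict
from datetime import datetime


def daily_residuals(hourly):
    """Sum of absolute hourly residuals per calendar day, sorted by date."""
    totals = defaultdict(float)
    for ts, measured, predicted in hourly:
        totals[ts.date()] += abs(measured - predicted)
    return sorted(totals.items())


# Toy hourly records (hypothetical values).
hourly = [
    (datetime(2018, 1, 1, 10), 100.0, 90.0),
    (datetime(2018, 1, 1, 11), 120.0, 125.0),
    (datetime(2018, 1, 2, 10), 80.0, 100.0),
]
daily = daily_residuals(hourly)
```

Taking the absolute value before summing prevents positive and negative hourly deviations within the same day from cancelling out.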

that using a maximum number of 10 neurons is appropriate, as the evolution of the validation error indicated that the model starts to over-fit when the size of the hidden layer exceeds 10 neurons. The calculated RMSE on the validation dataset was 11.4 kW for a 500 kW inverter, which represents a ratio of 2.3% for the RMSE normalized w.r.t. the maximum value of the AC power.

It should be noted that the resulting ANN must be considered valid only for that production plant: even if the forms of function (1) are quite similar when various plants are considered, if a new PV system needs to be monitored, the relevant ANN must be developed and trained, using the approach reported above, in order to let the fault-prediction system work well.

5. Anomaly detection and predictive alerts through residuals control

In this section, we show the details of the algorithm for anomaly detection and generation of predictive alerts. As Fig. 1 depicts, the approach is based on the analysis of the vector of residuals, computed as the difference between the output of the ANN model and the live data coming from the plant. Since the algorithm is designed to detect a trend, an adequate amount of data has to be accumulated; to this aim, a data series in a time window is considered, and the series is continuously updated on the basis of live data: each time a new sample is acquired from the plant, the oldest sample in the series is removed; the series is then used to compute the vector of residuals that is passed to the other data analysis blocks, which are described in the following.

5.1. Computing daily residuals

The first operation performed on the vector of residuals is a per-day aggregation. In general, PV data are sampled with a periodicity in the order of minutes and, for our analysis, as reported in Section 3, a per-hour aggregation is needed. However, while a 1-hour aggregation is required to develop a good model of the plant, the resulting dataset still contains too many values for a trend-based anomaly detection: indeed, since our analysis is based on finding data trends that evolve at a very low frequency (order of days), a 1-hour aggregation contains high frequencies that are not only useless for the analysis but could also affect its validity. For this reason, starting from the 1-hour vector of residuals, we compute the daily vector by considering the cumulative value of the absolute hourly residuals in a day. The new data series thus contains a value for each day that is the sum of the absolute values of all the residuals of that day.

Figs. 4 and 5 show the data before and after this aggregation process; Fig. 4 reports the hourly trend of residuals while Fig. 5 plots the output of the daily-cumulative operation: we can observe that high frequencies are removed but the overall trend of the data is still maintained.

Daily residuals, once computed, follow two different data processing paths (see Fig. 1): on one hand, a threshold is applied to detect “out of normal operation” conditions that are used to create warnings; on the other hand, a long-term analysis is performed to find possible degradation, i.e., areas with increasing deviations of residuals over time. The results of these two paths are then merged and filtered in order to generate predictive alerts. The details of this process are described in the following subsections.

5.2. Determining normal operation limits

A normal operational condition implies that the behavior of the plant does not differ too much from the predicted one and thus features low values in the vector of residuals. On this basis, we can fix some thresholds and generate a warning when a value in the daily residuals exceeds such thresholds.

To determine the thresholds, we start from the RMSE computed during the ANN validation (see Section 4.2); this RMSE represents the standard deviation of the normal operation AC power model and, since the model has been developed on the hourly dataset, we denote it as $\sigma_h$. We define two thresholds, Hourly Lower Limit (HLL) and Hourly Upper Limit (HUL), as:

$$\mathrm{HLL} = 3\sigma_h, \qquad \mathrm{HUL} = 5\sigma_h$$

Samples that exceed HLL but are lower than HUL represent an abnormal behavior of the PV system, while samples that exceed HUL are representative of strong anomalies with a good level of confidence. To determine the numerical values of HLL and HUL we can observe that, according to [39], when data follow a normal (Gaussian) distribution, about 99.7% of the data points lie within $3\sigma_h$, while about 99.9999% of the data points lie within $5\sigma_h$. Furthermore, according to Chebyshev's Theorem [39], even if the data do not follow a normal distribution, at least 88.9% of the observations fall within $3\sigma_h$. For these reasons, $3\sigma_h$ and $5\sigma_h$ represent reasonable values for HLL and HUL, respectively.

The threshold values are determined starting from the RMSE of the ANN validation, which is performed using hourly values, but, as reported in the previous subsection, anomaly detection is performed using daily cumulative residuals. On this basis, starting from HLL and HUL we should determine two equivalent thresholds to be applied to daily data. We empirically define a daily-equivalent standard deviation index $\sigma_d = K\sigma_h$ to be used for the definition of the Daily Lower Limit (DLL) and Daily Upper Limit (DUL), which are expressed as:

$$\sigma_d = K\sigma_h$$
$$\mathrm{DLL} = 3\sigma_d = 3(K\sigma_h)$$
$$\mathrm{DUL} = 5\sigma_d = 5(K\sigma_h)$$

Clearly, the choice of K has an impact on the accuracy of the predictive alerts generated by the system. Indeed, the lower the K factor, the lower the daily normal operation limits, the higher the probability

Fig. 5. Daily residual vector.
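
The trend extraction of Section 5.3 operates on a daily residual vector such as the one in Fig. 5. The sketch below covers only the parts that the text specifies unambiguously: the sample autocorrelation, the detection of its local maxima ("peaks"), a Triangular Moving Average and its discrete derivative. The averaging of peak positions into SPA and LPA is not reproduced here, so the window is passed in explicitly. Computing the TMA as a simple moving average of a simple moving average is one common definition and an assumption of this example, as are all function names and the toy series.

```python
def autocorrelation(values, max_lag):
    """Normalized sample autocorrelation for lags 0..max_lag (series must not be constant)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


def peak_lags(acf):
    """Lags at which the autocorrelation has a strict local maximum."""
    return [k for k in range(1, len(acf) - 1) if acf[k - 1] < acf[k] > acf[k + 1]]


def sma(values, n):
    """Trailing simple moving average; early outputs use the shorter history available."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= n:
            running -= values[i - n]
        out.append(running / min(i + 1, n))
    return out


def tma(values, window):
    """Triangular moving average, computed here as an SMA of an SMA."""
    n = (window + 1) // 2  # each pass spans about half of the requested window
    return sma(sma(values, n), n)


def slope(trend):
    """First difference of the trend, used as a discrete derivative."""
    return [0.0] + [b - a for a, b in zip(trend, trend[1:])]


# Toy series (hypothetical): a period-4 signal and a steadily rising one.
periodic = [1.0, 0.0, -1.0, 0.0] * 4
peaks = peak_lags(autocorrelation(periodic, max_lag=9))

rising = [float(d) for d in range(10)]
trend = tma(rising, window=4)
trend_slope = slope(trend)
```

With the values reported in Section 5.3 (an LPA of 19 days), the call would be `tma(daily_values, window=2 * 19)`; positive entries of `slope(...)` then mark days on which the residual trend is growing.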

Fig. 6. Normal operation limits on daily residuals.
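
The daily limits shown in Fig. 6 and the three warning levels of Section 5.2 can be sketched as below, together with the final combination rule stated in Section 3 (an alert is raised for warnings that coincide with a positive trend derivative). The function names and the toy data are assumptions; how a sample exactly equal to DLL or DUL is classified is not stated in the text, so the comparison operators used here are a choice made for this example.

```python
import math


def daily_limits(sigma_h, k=4.0):
    """DLL = 3 * sigma_d and DUL = 5 * sigma_d, with sigma_d = K * sigma_h."""
    sigma_d = k * sigma_h
    return 3.0 * sigma_d, 5.0 * sigma_d


def warning_level(daily_residual, dll, dul):
    """0 for normal operation, 0.5 for a possible anomaly, 1 for a strong anomaly."""
    if daily_residual > dul:
        return 1.0
    if daily_residual >= dll:  # assumption: a value equal to DLL already counts as a warning
        return 0.5
    return 0.0


def predictive_alerts(levels, trend_slope):
    """Indices of days with a warning and a positive trend derivative."""
    return [
        day for day, (level, s) in enumerate(zip(levels, trend_slope))
        if level > 0 and s > 0
    ]


# sigma_h = 11.4 and K = 4 are the values reported in the paper; the residuals are made up.
dll, dul = daily_limits(sigma_h=11.4, k=4.0)
levels = [warning_level(r, dll, dul) for r in [50.0, 150.0, 300.0, 140.0]]
alerts = predictive_alerts(levels, trend_slope=[0.1, -0.2, 0.3, 0.0])
```

With these inputs the limits evaluate to roughly 136.8 and 228 in the unit of the residuals, which follows directly from 3 · 4 · 11.4 and 5 · 4 · 11.4.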

to obtain daily residuals exceeding those limits, which will result in a larger number of warnings and, as a consequence, the probability that some of them will be false positives will increase. Vice versa, the higher K is, the higher the daily normal operation limits and the lower the probability of having daily residuals exceeding those limits, which will increase the probability of losing information potentially related to a real anomaly. Moreover, in any PV system, the amount of power produced differs across the hours of the day due to several factors such as the variation in the intensity of the Sun’s radiation during the day, the variations in the length of the day, and the angle of incidence of the Sun’s rays with the ground, which increases during the day from a very low value at dawn as the Sun rises to a peak at noon and falls again as the Sun sets. Similarly, the insolation will show low values at high latitudes due to the effect of the different air mass density.

In particular, the values of ESH (Equivalent Sun Hours) vary from 3 to 5 equivalent hours; therefore, taking this aspect into account, we selected K = 4 in order to get a daily equivalent standard deviation starting from the hourly standard deviation σ_h.

On this basis, we apply the DLL and DUL thresholds to daily residuals and generate warnings by using a function that outputs a numeric value as follows:

• samples below DLL represent a normal operation, so the output value is 0;
• samples between DLL and DUL are considered as possible anomalies, so the output value is 0.5;
• samples greater than DUL are considered indications of strong anomalies, so the output value is 1.

Fig. 7 shows the different levels of warnings generated from the daily residual vector in Fig. 5.

5.3. Determining residual trends

In order to improve the accuracy in the recognition of abnormal behaviors, warnings generated in the previous step must be further processed, thus reducing the probability of generating false positives and helping to better compute the predictive maintenance alerts. The quality control criterion, in this case, is based on extracting the trend in the daily residuals and combining such trends with warnings.

The process to extract the trend in daily residuals is based on filtering the data by applying a Triangular Moving Average (TMA) [39]. The main parameter to be set for the TMA is the window size to be considered in the computation of the average, which can be determined by finding periodicity in the residuals. To this aim, we apply a classical data processing algorithm that exploits the autocorrelation function, as explained below:

• the output of the autocorrelation function (see Fig. 8) is analyzed in order to find “peaks” (i.e. local maxima); these points correspond to short-time periods;
• the time values corresponding to short-time periods are averaged and a short-time period average (SPA) is computed;
• the next step is to find long-time periods; to this aim, all peaks featuring an autocorrelation value greater than 0.3 are first extracted and, from this set, only the points whose time difference is greater than SPA are considered;
• the time values of the resulting set are averaged, thus computing the long-time period average (LPA); the window size of the TMA is then set to 2 LPA.

As reported in Section 6, using our experimental data we found an LPA of 19 days and set the window size of the TMA to 38 days. The residual trend computed with the TMA is reported in Fig. 9.

5.4. Predictive alerts creation

The last step of the data analysis process deals with the creation of predictive alerts; this is performed by finding areas of degradation or convergence related to the inverter normal behavior by computing the derivative of the trend signal. Positive values of the derivative samples correspond to degradation areas

Fig. 7. Daily residual vector.
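As a minimal sketch of the warning function whose output levels are shown in Fig. 7, the Python snippet below maps each daily residual to 0, 0.5 or 1 according to the DLL and DUL thresholds. The function names, the numeric thresholds in the example and the treatment of samples exactly equal to a threshold are assumptions made for illustration; the implementation used in the paper is written in Matlab and may differ.

```python
from typing import List


def warning_level(residual: float, dll: float, dul: float) -> float:
    """Return 0 (normal), 0.5 (possible anomaly) or 1 (strong anomaly)."""
    if residual < dll:
        return 0.0          # below DLL: normal operation
    if residual <= dul:
        return 0.5          # between DLL and DUL (bounds included: assumption)
    return 1.0              # greater than DUL: strong anomaly


def warning_vector(daily_residuals: List[float], dll: float, dul: float) -> List[float]:
    """Apply the warning function to every sample of the daily residual vector."""
    return [warning_level(r, dll, dul) for r in daily_residuals]


# Illustrative values only, not taken from the plant dataset.
example_warnings = warning_vector([0.1, 0.4, 0.9], dll=0.3, dul=0.7)
```

Keeping the per-sample rule in its own function makes it easy to change the boundary convention once the exact definition of the limits is fixed.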

Fig. 8. Autocorrelation of daily residuals Vector.
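The window-size procedure based on the autocorrelation in Fig. 8 can be sketched in Python as follows. This is one possible reading of the steps, not the authors' Matlab code: it assumes that "short-time periods" are the lag gaps between consecutive peaks (counted from lag 0), that "long-time periods" are the gaps between consecutive peaks above the 0.3 threshold that exceed SPA, and that the final window is rounded to an integer number of days. All function names and the example peak values are hypothetical.

```python
from typing import List, Tuple


def autocorrelation(x: List[float]) -> List[float]:
    """Normalized autocorrelation for lags 0..n-1 (direct, unoptimized sum)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d)
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / var
            for lag in range(n)]


def local_maxima(acf: List[float]) -> List[int]:
    """Lags (excluding the endpoints) where the autocorrelation has a local maximum."""
    return [k for k in range(1, len(acf) - 1)
            if acf[k - 1] < acf[k] >= acf[k + 1]]


def gaps_from_zero(lags: List[int]) -> List[int]:
    """Distances between consecutive lags, starting from lag 0 (assumption)."""
    return [b - a for a, b in zip([0] + lags[:-1], lags)]


def window_size_from_peaks(peak_lags: List[int], peak_values: List[float],
                           threshold: float = 0.3) -> Tuple[float, float, int]:
    """Return (SPA, LPA, TMA window = 2 * LPA) under the assumptions above."""
    if not peak_lags:
        raise ValueError("no autocorrelation peaks found")
    short_gaps = gaps_from_zero(peak_lags)
    spa = sum(short_gaps) / len(short_gaps)
    strong = [lag for lag, v in zip(peak_lags, peak_values) if v > threshold]
    long_gaps = [g for g in gaps_from_zero(strong) if g > spa]
    if not long_gaps:
        raise ValueError("no long-time period found")
    lpa = sum(long_gaps) / len(long_gaps)
    return spa, lpa, round(2 * lpa)


# Hand-made peaks chosen so that the result matches the 19-day LPA reported
# in the paper; they are not the real peaks of Fig. 8.
peak_lags = [3, 7, 10, 19, 22, 38]
peak_values = [0.1, 0.2, 0.1, 0.6, 0.2, 0.5]
spa, lpa, window = window_size_from_peaks(peak_lags, peak_values)
```

In practice the peak lags would come from `local_maxima(autocorrelation(daily_residuals))`, with the peak values read from the same autocorrelation vector.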

Fig. 9. Normalized Daily residuals vector and long term trend.
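For readers who want to reproduce a trend like the one in Fig. 9, a Triangular Moving Average can be sketched in Python as a weighted average whose weights rise linearly to the center of the window and fall symmetrically. The centered alignment and the renormalization of the weights near the series edges are assumptions of this sketch; the paper does not state how its Matlab implementation treats them.

```python
from typing import List


def triangular_weights(window: int) -> List[float]:
    """Symmetric triangular weights that sum to 1, e.g. 1,2,3,2,1 (scaled) for window=5."""
    center = (window - 1) / 2
    half = (window + 1) / 2
    raw = [half - abs(i - center) for i in range(window)]
    total = sum(raw)
    return [r / total for r in raw]


def tma(series: List[float], window: int) -> List[float]:
    """Centered triangular moving average; edges use only the samples available."""
    weights = triangular_weights(window)
    offset = (window - 1) // 2          # number of samples taken before index t
    out = []
    for t in range(len(series)):
        num = 0.0
        den = 0.0
        for j, w in enumerate(weights):
            k = t - offset + j
            if 0 <= k < len(series):    # skip positions outside the series
                num += w * series[k]
                den += w
        out.append(num / den)
    return out


# Illustrative input: a single spike is spread over the window with triangular shape.
smoothed = tma([0.0, 0.0, 9.0, 0.0, 0.0], window=5)
```

With the 38-day window reported in the paper, the call would be `tma(daily_residuals, window=38)`.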

(residuals trend tends to upper values) while negative values are related to normal behavior or convergence. Residual warnings are thus checked against derivative samples, and predictive maintenance alerts are generated only for those residual warnings appearing in correspondence of positive values of the trend derivative. The trend comes from processing that uses a time window (the Triangular Moving Average), and the size of the window used in this step corresponds to the relative alert anticipation; since, according to our experiments, the window size is in general in the order of one month (38 days in our experiments, see Section 6), we can state that the data processing chain described is able to provide predictive alerts with a high anticipation, thus allowing maintainers to decide whether to plan proper actions.

5.5. Alert utilization and decision making

The data analysis process described above has the objective of identifying a degradation of performance that could lead to an imminent fault; the use of the conditional, in this case, is mandatory because we must bear in mind that the presence of an alert does not automatically imply that a fault is going to occur. The idea behind our proposal is to provide operators with a “warning lamp” in their control room, meaning that a system is featuring a behavior that is slightly different from the normal one, but it is up to the operators themselves to decide whether or not to plan a maintenance intervention on the plant; for this reason, the analysis described in this paper is intended as an additional support for decision making. As an example, an operator, on the basis of her/his experience, can decide to perform a maintenance action when a certain number of consecutive warnings are reported; this specific number can be set on the basis of factors like past warnings and their relationships with occurred faults, age of PV panels or inverters of the plant, etc., but, in any case, its choice is ultimately tied to a human decision.

6. Simulation and experimental results

In order to test the proposed solution and to collect the experimental results, we implemented the algorithm by means of a number of Matlab functions and scripts [40]. We briefly discuss

Fig. 10. Daily Residuals warnings, Daily Residual trend, Daily Predictive Alerts.
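The combination of warnings and trend derivative shown in Fig. 10 can be illustrated with the short Python sketch below. It assumes that the derivative is approximated by a backward first difference of the daily trend and that an alert is a binary flag raised on days with a non-zero warning and a strictly positive derivative; both choices, as well as the function names and the example data, are assumptions for illustration.

```python
from typing import List


def trend_derivative(trend: List[float]) -> List[float]:
    """Backward first difference; the first day has no predecessor and gets 0."""
    return [0.0] + [b - a for a, b in zip(trend, trend[1:])]


def predictive_alerts(warnings: List[float], trend: List[float]) -> List[int]:
    """Flag days where a warning coincides with a rising (degrading) trend."""
    derivative = trend_derivative(trend)
    return [1 if w > 0 and d > 0 else 0 for w, d in zip(warnings, derivative)]


# Illustrative data: the warning on the fourth day is discarded because the
# trend is decreasing there, and the last day has a rising trend but no warning.
example_alerts = predictive_alerts(
    warnings=[0.0, 0.5, 1.0, 0.5, 0.0],
    trend=[1.0, 1.2, 1.5, 1.4, 1.6],
)
```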

Fig. 11. Registered faults, Predictive positive alerts, False positive alerts.
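The persistence rule that Section 6 uses to separate the possible fault warnings in Fig. 11 from isolated alerts can be sketched as a sliding-window count. The text does not fully specify the rule, so this snippet assumes a trailing window and a caller-supplied minimum number of alerts; `window`, `min_alerts` and the function name are hypothetical parameters, and only the figures 13 and 14 in the last line come from the paper.

```python
from typing import List


def possible_fault_warnings(alerts: List[int], window: int, min_alerts: int) -> List[bool]:
    """Flag day t when at least `min_alerts` alerts fall in the last `window` days."""
    flags = []
    for t in range(len(alerts)):
        start = max(0, t - window + 1)      # trailing window, clipped at the start
        flags.append(sum(alerts[start:t + 1]) >= min_alerts)
    return flags


# Illustrative run with a short window so the behavior is easy to follow.
example_flags = possible_fault_warnings([1, 1, 1, 0, 0, 0, 1, 0], window=3, min_alerts=3)

# Detection rate implied by the reported counts: 13 warnings against 14 registered faults.
detection_rate = 13 / 14
```

The reported counts give a ratio of about 0.93, consistent with the "greater than 90%" detection rate stated in the text.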

here the computational complexity of the proposed solution; the interested reader may find more details about the software in the supplementary material, which includes a simplified version of the procedures corresponding to every computational phase illustrated in Fig. 1. The implementation has been used to perform a series of experiments on a real dataset, and the results we obtained are reported in the following.

As for the dataset, we used historical data collected from a ground-mounted PV plant located in Akropotamos, Greece. The system has a nominal capacity of 6.22 MW, generated by 20,384 SHARP modules of 125 Wp and 30,628 SHARP modules of 120 Wp connected to 13 FIMER inverters with a rated power output of 500 kW AC. As a reference, Table 1 reports a sample of the collected dataset.

Table 1
Sample of collected data.

TimeStamp        Inv_A1-Active power avg [kW]   Met1 radiation (pyranometer) avg [W/m2]   Met1 module temperature avg [C]
1/1/2015 7:30    0.0001886598                   0.3815842867                              2.6353769302
1/1/2015 7:35    0.0610818677                   1.3595483303                              2.5940611362
1/1/2015 7:40    0.0382286347                   2.2702567577                              2.5262842178
1/1/2015 7:45    0.0731724724                   3.61191535                                2.4704217911
1/1/2015 7:50    0.0056160204                   5.0770974159                              2.4380300045
1/1/2015 7:55    1.4125183821                   8.0398168564                              2.3655948639
1/1/2015 8:00    3.849619627                    12.8972492218                             2.3292591572
1/1/2015 8:05    5.4008703232                   17.5688610077                             2.3057751656
1/1/2015 8:10    6.9009132385                   21.4069633484                             2.2752606869
1/1/2015 8:15    8.0641679764                   25.2157459259                             2.2615573406
1/1/2015 8:20    9.4087839127                   29.904001236                              2.2365436554
1/1/2015 8:25    8.8424167633                   35.2242164612                             2.250962019
1/1/2015 8:30    9.1357841492                   37.8326911926                             2.2370893955
1/1/2015 8:35    10.3381958008                  40.2881011963                             2.2404639721

Measurements are recorded every 5 min, covering the period from January 01, 2015 to December 31, 2016, that is 731 days. Collected data include the AC inverter active power output (second column), the pyranometer solar irradiance (third column) and the module temperature (fourth column). To develop the ANN-based model, we used 70% of the dataset as training set, while the remaining 30% has been used as validation set. We then ran the prediction alert algorithm on the overall dataset and, as a result, the algorithm generated a total of 109 daily predictive alerts, which are reported in Fig. 11.

As the figure shows, some alerts are isolated spots occurring on only one day, while others feature a persistence that lasts some days. In order to derive a possible “rule-of-thumb” that could help an operator to understand the imminent occurrence of a possible fault, we analyzed the daily predictive alerts against real plant faults that are registered in the dataset. To this aim, we computed possible fault warnings: a possible fault warning occurs when a group of predictive alerts is detected consecutively within a certain time window. As window size, we chose 19 days, meaning that we consider the presence of a number of alerts in at least half of the samples in the time window determined for the TMA. On this basis, as Fig. 11 shows, the number of possible fault warnings detected is 13, while the predictive alerts that did not pass the 19-day persistence filter (and that we named “false positives”) are a total of 24. Since the total (real) registered faults are 14, we can state that the algorithm has been able to correctly predict faults with a detection rate greater than 90%.

The implementation also gave us the possibility to understand the computational complexity of the algorithm, which is useful to

evaluate the cost of the proposed approach. In particular, as shown in the supplementary material, all the functions implemented for each computational block of Fig. 1 have a complexity that is linear in the input size. Overall, we say that the complexity of these functions is O(n), where n represents the input size of the functions. In particular:

• The computation of the 1-hour average and the computation of the daily residuals lead to a number of single loops with a number of iterations proportional to the number of rows of the dataset.
• Similarly, it can be shown that the computation of the equivalent daily RMSE, as well as the setting of a vector of warnings for residuals outside the normal operation limits, has linear complexity in the input size, as it involves a sequence of single, non-nested loops that include a few simple operations.
• A similar analysis holds for the phases corresponding to the computation of the autocorrelation [41], for which we used the Matlab function xcorr(). In this case, the complexity is proportional to the size of the input vector (say n), hence O(n). Similarly, the complexity related to the process of finding the periodicity and the application of the triangular moving average (TMA) to find degradation trends is still linear in the input size.
• Finally, as reported in [42], the complexity related to the computation performed by the ANN is represented by a constant value equal to the number of neurons of the hidden layer.

7. Discussion and conclusions

Hourly averages of the measurements should be used for the implementation of PV inverter models, since the AC power models developed using hourly averages are more accurate than the models developed using 10-min measurements and because of the slow intrinsic dynamics of the solar signals. The main requirement of this study was to develop an anomaly detection system and predictive maintenance model with a relatively low degree of complexity and able to provide daily predictive alerts to operators to support the maintenance decision process. The novelty of our approach is represented by the fact that it is designed to detect anomalies in PV systems and generate predictive maintenance alerts. As a consequence, it is capable of predicting faults in time (e.g., some weeks before the fault occurs) to plan a number of maintenance activities aimed at avoiding the possible loss of power production. As we explained in the paper, this is achieved by deriving long-term trends that allow us to detect “degradation patterns” which identify future faults in a timely manner.

Despite the low complexity of the AC power production model, the predictive accuracy is quite high: the model has a validation error of 2.3% and the predictive anomaly detection rate is better than 90%, considering predictive alerts presented several days before the incoming failures.

We think that the main limitation of the developed approach is that, in the current version, it relies on a single model for all the data. Indeed, there are different windows of irradiance and seasonality for which the process presents different resolutions. As a consequence, we aim at measuring and comparing, in future work, the current model accuracy with that obtained by splitting the model itself into various sub-models that deal with the different windows of irradiance or seasonality. The anomaly detection algorithm and predictive model are valid for all PV systems, since the data cleaning process is automated and able to identify observations not representative of normal PV system operation.

Future work may also include the development of fault classification rules, once the model is implemented on-line and its accuracy and robustness are validated. The specific causes of the faults would be identified, providing operators with valuable information regarding the required corrective actions, leveraging the business knowledge base and moving toward the so-called prescriptive analytics.

Acknowledgment

This study has been conducted thanks to the Enel Foundation Fellowship program.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neucom.2018.05.017.

References

[1] M. Bazilian, I. Onyeji, M. Liebreich, I. MacGill, J. Chase, J. Shah, D. Gielen, D. Arent, D. Landfear, S. Zhengrong, Re-considering the economics of photovoltaic power, Renew. Energy 53 (Supplement C) (2013) 329–338, doi:10.1016/j.renene.2012.11.029.
[2] E. Wesoff, Update: Solar firms setting new records in efficiency and performance [www document], Greentech Media. http://www.greentechmedia.com/articles/read/Update-Solar-Firms-Setting-New-Records-in-Efficiency-and-Performance (2012).
[3] R. Messenger, A. Abtahi, Photovoltaic Systems Engineering, CRC Press, 2017.
[4] P.Y. Gan, Z. Li, Quantitative study on long term global solar photovoltaic market, Renew. Sustain. Energy Rev. 46 (2015) 88–99.
[5] S. Eftekharnejad, V. Vittal, G.T. Heydt, B. Keel, J. Loehr, Impact of increased penetration of photovoltaic generation on power systems, IEEE Trans. Power Syst. 28 (2) (2013) 893–901.
[6] M. Obi, R. Bass, Trends and challenges of grid-connected photovoltaic systems – a review, Renew. Sustain. Energy Rev. 58 (2016) 1082–1094.
[7] N.A.S.P.O.W. Group, Best practices in photovoltaic system operations and maintenance, (2017), https://www.nrel.gov/docs/fy17osti/67553.pdf.
[8] R. Platon, J. Martel, N. Woodruff, T.Y. Chau, Online fault detection in pv systems, IEEE Trans. Sustain. Energy 6 (4) (2015) 1200–1207.
[9] S. Arabgol, H.S. Ko, S. Esmaeili, Artificial neural network and ewma-based fault prediction in wind turbines, in: Proceedings of the IIE Annual Conference. Proceedings, Institute of Industrial and Systems Engineers (IISE), 2015, p. 829.
[10] A. Mellit, S.A. Kalogirou, Chapter ii-1-d - a survey on the application of artificial intelligence techniques for photovoltaic systems, in: S.A. Kalogirou (Ed.), McEvoy’s Handbook of Photovoltaics (Third Edition), third, Academic Press, 2018, pp. 735–761, doi:10.1016/B978-0-12-809921-6.00019-7.
[11] E. Garoudja, F. Harrou, Y. Sun, K. Kara, A. Chouder, S. Silvestre, A statistical-based approach for fault detection and diagnosis in a photovoltaic system, in: Proceedings of the 6th International Conference on Systems and Control (ICSC), 2017, pp. 75–80, doi:10.1109/ICoSC.2017.7958710.
[12] L. Bonsignore, M. Davarifar, A. Rabhi, G.M. Tina, A. Elhajjaji, Neuro-fuzzy fault detection method for photovoltaic systems, Energy Proced. 62 (2014) 431–441.
[13] E. Garoudja, F. Harrou, Y. Sun, K. Kara, A. Chouder, S. Silvestre, Statistical fault detection in photovoltaic systems, Solar Energy 150 (2017) 485–499.
[14] L. Chen, S. Li, X. Wang, Quickest fault detection in photovoltaic systems, IEEE Trans. Smart Grid 9 (3) (2018) 835–1847.
[15] W. Chine, A. Mellit, V. Lughi, A. Malek, G. Sulligoi, A.M. Pavan, A novel fault diagnosis technique for photovoltaic systems based on artificial neural networks, Renew. Energy 90 (2016) 501–512.
[16] S. Silvestre, A. Chouder, E. Karatepe, Automatic fault detection in grid connected pv systems, Solar Energy 94 (2013) 119–127.
[17] S.K. Firth, K.J. Lomas, S.J. Rees, A simple model of pv system performance and its use in fault detection, Solar Energy 84 (4) (2010) 624–635.
[18] Y. Zhao, L. Yang, B. Lehman, J.-F. de Palma, J. Mosesian, R. Lyons, Decision tree-based fault detection and classification in solar photovoltaic arrays, in: Proceedings of the Twenty-Seventh Annual IEEE Applied Power Electronics Conference and Exposition (APEC), IEEE, 2012, pp. 93–99.
[19] A. Chouder, S. Silvestre, Automatic supervision and fault detection of pv systems based on power losses analysis, Energy Convers. Manag. 51 (10) (2010) 1929–1937.
[20] W. Chine, A. Mellit, A.M. Pavan, S. Kalogirou, Fault detection method for grid-connected photovoltaic plants, Renew. Energy 66 (2014) 99–110.
[21] X. Lin, Y. Wang, D. Zhu, N. Chang, M. Pedram, Online fault detection and tolerance for photovoltaic energy harvesting systems, in: Proceedings of the International Conference on Computer-Aided Design, ACM, 2012, pp. 1–6.
[22] V. Sharma, S. Chandel, Performance and degradation analysis for long term reliability of solar photovoltaic systems: a review, Renew. Sustain. Energy Rev. 27 (Supplement C) (2013) 753–767, doi:10.1016/j.rser.2013.07.046.
[23] F. Touati, N.A. Chowdhury, K. Benhmed, A.J.S.P. Gonzales, M.A. Al-Hitmi, M. Benammar, A. Gastli, L. Ben-Brahim, Long-term performance analysis and power prediction of pv technology in the state of qatar, Renew. Energy 113 (Supplement C) (2017) 952–965, doi:10.1016/j.renene.2017.06.078.
[24] T. Hove, A method for predicting long-term average performance of photovoltaic systems, Renew. Energy 21 (2) (2000) 207–229.

[25] Y. Yang, F. Blaabjerg, Z. Zou, Benchmarking of grid fault modes in single-phase grid-connected photovoltaic systems, IEEE Trans. Ind. Appl. 49 (5) (2013) 2167–2176.
[26] K.-H. Chao, S.-H. Ho, M.-H. Wang, Modeling and fault diagnosis of a photovoltaic system, Electric Power Syst. Res. 78 (1) (2008) 97–105, doi:10.1016/j.epsr.2006.12.012.
[27] D. Guasch, S. Silvestre, R. Calatayud, Automatic failure detection in photovoltaic systems, in: Proceedings of the 3rd World Conference on Photovoltaic Energy Conversion, 2003, 3, 2003, pp. 2269–2271 Vol.3.
[28] M. Hamdaoui, A. Rabhi, A. El Hajjaji, M. Rahmoun, M. Azizi, Monitoring and control of the performances for photovoltaic systems, in: International Renewable Energy Congress, 2009.
[29] Y. Yagi, H. Kishi, R. Hagihara, T. Tanaka, S. Kozuma, T. Ishida, M. Waki, M. Tanaka, S. Kiyama, Diagnostic technology and an expert system for photovoltaic systems using the learning method, Solar Energy Materials and Solar Cells 75 (3) (2003) 655–663.
[30] S. Firth, K. Lomas, S. Rees, A simple model of pv system performance and its use in fault detection, Solar Energy 84 (4) (2010) 624–635, doi:10.1016/j.solener.2009.08.004. International Conference CISBAT 2007.
[31] Y. Zhao, B. Lehman, R. Ball, J. Mosesian, J.-F. de Palma, Outlier detection rules for fault detection in solar photovoltaic arrays, in: Proceedings of the Twenty-Eighth Annual IEEE Applied Power Electronics Conference and Exposition (APEC), IEEE, 2013, pp. 2913–2920.
[32] A. Drews, A. De Keizer, H. Beyer, E. Lorenz, J. Betcke, W. Van Sark, W. Heydenreich, E. Wiemken, S. Stettler, P. Toggweiler, et al., Monitoring and remote failure detection of grid-connected pv systems based on satellite observations, Solar Energy 81 (4) (2007) 548–564.
[33] A. Chouder, S. Silvestre, N. Sadaoui, L. Rahmani, Modeling and simulation of a grid connected pv system based on the evaluation of main pv module parameters, Simul. Model. Pract. Theory 20 (1) (2012) 46–58.
[34] A. Mellit, S.A. Kalogirou, Artificial intelligence techniques for photovoltaic applications: A review, Progr. Energy Combust. Sci. 34 (5) (2008) 574–632.
[35] R. Platon, S. Pelland, Y. Poissant, Modelling the power production of a photovoltaic system: comparison of Sugeno-type fuzzy logic and pvsat-2 models, in: Proceedings of the EuroSun ISES-Eur. Solar Conference, Rijeka, Croatia, 2012.
[36] A. Mellit, S.A. Kalogirou, Anfis-based modelling for photovoltaic power supply system: a case study, Renew. Energy 36 (1) (2011) 250–258.
[37] M. Dhimish, V. Holmes, B. Mehrdadi, M. Dales, Multi-layer photovoltaic fault detection algorithm, High Voltage 2 (4) (2017) 244–252, doi:10.1049/hve.2017.0044.
[38] H.B. Demuth, M.H. Beale, O. De Jess, M.T. Hagan, Neural Network Design, Martin Hagan, 2014.
[39] D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, John Wiley & Sons, 2010.
[40] M.U. Guide, The Mathworks, Inc. Natick, MA 5 (1998) 333.
[41] P.F. Dunn, Measurement and Data Analysis for Engineering and Science, CRC Press, 2014.
[42] R. Rojas, Neural Networks: a Systematic Introduction, Springer Science & Business Media, 2013.

Massimiliano De Benedetti received a Master degree in Electronic engineering with specialization in Automation and Control of Complex Systems in 2010. He started his career in E-Distribuzione, the Italian DSO (Distribution System Operator), in 2011, working on the development of web/mobile applications and services for the management and monitoring of medium voltage network critical failure events. In 2013 he was one of the winners of the first ISSNAF International Internship at Vision-Lab of UCLA in Los Angeles, where he worked on VisioInertial Navigation systems for Robotics applications, and in 2014 he contributed to an open-data systems R&D for a European research project (PRISMA). Since 2016 he has been working in the renewable energy sector for Enel Green Power, Innovation and Sustainability energy storage unit, working on control and aggregation strategies for energy storage systems based on machine learning algorithms, software platform architecture for DER (Distributed Energy Resources) management and optimization, Blockchain technology applications, microgrid controllers and smart metering systems. Since December 2016 he has been a Fellow of Enel Foundation, leading research projects in the technology sector focused on Big Data, Robot Cooperation and Artificial Intelligence. In 2014 he started his Ph.D. in Computer Science at the University of Catania. He is mainly involved in research activities about Distributed Software Architectures, Robot cooperation, Computer Vision and Machine Learning for Robotics applications.

Fabio Leonardi holds a Master degree in Electronic engineering with specialization in Automation and Control of Complex Systems. He started his career in the research sector at I.N.F.N. (National Institute for Nuclear Physics), involved in the development of control boards and software systems for Data Acquisition and real time monitoring of signals acquired from hundreds of underwater optical and acoustic sensors. Hired by E-Distribuzione, the Italian DSO (Distribution System Operator), in 2010, he entered the energy distribution sector, managing development projects of the MV/LV electrical grid, analysing and supervising the energy flow utilising Data remotely acquired and ingested into a control and monitoring system, as well as supporting the implementation of distributed innovative sensors for smart grid projects. Since 2014 he has been working in the renewable energy sector, for Enel Green Power, as member of the Global Control and Monitoring systems unit, managing projects on SCADA systems, Control and Monitoring Rooms development at worldwide level, and design and implementation of Big Data analytics infrastructure. In 2016 he started his Ph.D. in Computer Science at the University of Catania, dealing with Machine Learning techniques and Predictive Maintenance algorithms supported by a Big Data infrastructure. Since December 2016 he has been a Research Fellow of Enel Foundation, leading research projects in the technology sector, with focus on Big Data, Robotics and Artificial Intelligence.

F. Messina received his Ph.D. in Computer Science from the Department of Mathematics and Informatics of the University of Catania, Italy, in 2009. He is currently working as assistant professor in the same department. His research interests include Distributed systems, Complex Systems, Simulation systems, and trust and recommender systems.

C. Santoro received the Laurea degree in Computer Engineering from the University of Catania in 1997, and the Ph.D. in Computer Engineering from the University of Palermo in 2001. Presently, he is a researcher at the Department of Mathematics and Informatics of the University of Catania. His research interests include large scale distributed systems, intelligent autonomous systems and robotics.

Athanasios V. Vasilakos received the Ph.D. degree in computer science and engineering from the University of Patras, Patras, Greece. He is a Professor with Innopolis University, Innopolis, Russia. Dr. Vasilakos served or is serving as an Editor for several technical journals, such as the IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, the IEEE TRANSACTIONS ON CLOUD COMPUTING, the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, the IEEE TRANSACTIONS ON CYBERNETICS, the IEEE TRANSACTIONS ON NANOBIOSCIENCE, the IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, the ACM Transactions on Autonomous and Adaptive Systems, and the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS. He is also the General Chair of the European Alliances for Innovation.
