ENERGY SOURCES, PART A: RECOVERY, UTILIZATION, AND ENVIRONMENTAL EFFECTS
https://doi.org/10.1080/15567036.2021.1919792
Faults diagnosis in a photovoltaic system based on multivariate statistical analysis
Patrick Juvet Gnetchejo (a), Salomé Ndjakomo Essiane (a, b), Pierre Ele (a, c), Abdouramani Dadjé (d), and Zhicong Chen (e)
(a) Laboratory of Technologies and Applied Sciences, University of Douala, Douala, Cameroon;
(b) Signal, Image and Systems Laboratory, Higher Technical Teacher Training College of Ebolowa, University of Yaounde 1, Yaoundé, Cameroon;
(c) Laboratory of Electrical Engineering, Mechatronic and Signal Treatment, National Advanced School of Engineering, University of Yaounde 1, Yaoundé, Cameroon;
(d) School of Geology and Mining Engineering, University of Ngaoundéré, Ngaoundéré, Cameroon;
(e) College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
ABSTRACT
Improving the efficiency of photovoltaic (PV) systems has become a research priority because of the large number of PV panels installed. Moreover, efforts to investigate different methods of diagnosing PV failures have multiplied, giving additional impetus to research on the efficiency of PV systems. However, most of these methods are limited in the number of faults they can identify; some are expensive and complex, and others require huge amounts of training data. In this paper, a simple and robust multivariate statistical analysis method is proposed for the diagnosis and identification of faults in a PV system. Drawing on the many variables that can be measured on an operating PV system or that are available from most new inverters on the market, the method is based on kernel principal component analysis, which analyzes the variance among these data. Combined with Hotelling's T² statistic and the squared prediction error (SPE) index, in association with Rolle's theorem, this analysis identifies six operating states of the PV system: normal operation, short-circuited panels, open-circuit panels, partially shaded panels, series resistance degradation, and MPPT error. Two hundred samples of the different faults were generated at irradiances from 50 to 500 W/m². The accuracy was 64% for series resistance degradation, 91.11% for partial shading, and 100% each for open circuit, short circuit, and MPPT error. Results obtained first from a MATLAB/Simulink model and then from a real system of 18 modules demonstrate the efficiency and performance of the proposed algorithms.
ARTICLE HISTORY
Received 10 November 2020
Revised 31 March 2021
Accepted 10 April 2021
KEYWORDS
Photovoltaic system; kernel principal component analysis; fault detection; multivariate statistical analysis; fault identification; I-V characteristics
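As a rough illustration of the monitoring scheme named in the abstract, the following Python sketch fits kernel PCA to normal-operation data and computes Hotelling's T² and SPE for new samples. It is not the authors' implementation. The following are all assumptions: the Gaussian kernel, the width heuristic of 10 times the number of variables, the two retained components, control limits taken as maxima over the training data (rather than from F or χ² distributions), the synthetic (V_mpp, I_mpp) data, and the names `KpcaMonitor` and `jacobi_eigh`. The Rolle's-theorem identification step is not reproduced.

```python
import math
import random


def rbf(x, y, width):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / width)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / width)


def jacobi_eigh(a, sweeps=100, tol=1e-18):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, v) where column i of v is the i-th unit eigenvector."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # column rotation: A <- A P and V <- V P
                    for k in range(n):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # row rotation: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


class KpcaMonitor:
    """Kernel PCA model of normal operation with T^2 and SPE statistics."""

    def __init__(self, train, n_components=2, width=None):
        n, d = len(train), len(train[0])
        self.mean = [sum(r[j] for r in train) / n for j in range(d)]
        # per-variable standard deviation (1.0 if a variable is constant)
        self.std = [math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in train) / n) or 1.0
                    for j in range(d)]
        self.x = [self._scale(r) for r in train]
        self.width = width if width is not None else 10.0 * d  # assumed heuristic
        k = [[rbf(xi, xj, self.width) for xj in self.x] for xi in self.x]
        self.col_mean = [sum(row) / n for row in k]  # K is symmetric
        self.grand_mean = sum(self.col_mean) / n
        # center the kernel matrix in feature space
        kc = [[k[i][j] - self.col_mean[i] - self.col_mean[j] + self.grand_mean
               for j in range(n)] for i in range(n)]
        vals, vecs = jacobi_eigh(kc)
        keep = sorted(range(n), key=lambda i: vals[i], reverse=True)[:n_components]
        self.lam = [vals[i] / n for i in keep]  # variance of each retained score
        # scale eigenvectors so the feature-space components have unit norm
        self.alpha = [[vecs[r][i] / math.sqrt(vals[i]) for r in range(n)] for i in keep]
        # empirical control limits: maxima over the training data (a simplification)
        stats = [self.statistics(r) for r in train]
        self.t2_limit = max(t2 for t2, _ in stats)
        self.spe_limit = max(spe for _, spe in stats)

    def _scale(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.mean, self.std)]

    def statistics(self, sample):
        """Return (T^2, SPE) for one sample."""
        z = self._scale(sample)
        kt = [rbf(z, xj, self.width) for xj in self.x]
        kt_mean = sum(kt) / len(kt)
        ktc = [kt[j] - self.col_mean[j] - kt_mean + self.grand_mean for j in range(len(kt))]
        scores = [sum(a * b for a, b in zip(alpha, ktc)) for alpha in self.alpha]
        t2 = sum(t * t / lam for t, lam in zip(scores, self.lam))
        # squared distance to the feature-space mean; k(x, x) = 1 for this kernel
        norm2 = 1.0 - 2.0 * kt_mean + self.grand_mean
        spe = max(norm2 - sum(t * t for t in scores), 0.0)
        return t2, spe

    def is_fault(self, sample):
        t2, spe = self.statistics(sample)
        return t2 > self.t2_limit or spe > self.spe_limit


# synthetic normal-operation data: (V_mpp, I_mpp) over an irradiance sweep
rng = random.Random(1)
normal = []
for i in range(30):
    g = 200.0 + 800.0 * i / 29
    normal.append([30.0 + 0.002 * g + rng.gauss(0, 0.05), 0.008 * g + rng.gauss(0, 0.02)])

monitor = KpcaMonitor(normal)
print(monitor.is_fault([31.2, 4.8]))    # near the normal operating curve
print(monitor.is_fault([60.0, 100.0]))  # far outside the training data
```

T² measures how far a sample lies inside the retained kernel subspace, while SPE measures how much of it falls outside that subspace. A sample far from all training data has kernel values near zero, so its SPE approaches 1 plus a non-negative term and exceeds the training maxima.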
Introduction
The installation of photovoltaic (PV) systems has grown exponentially worldwide over the past decade. Global installed PV capacity was 89.5 GW in 2012 and is now estimated at approximately 800 GW (Jäger-Waldau 2020). Like any other system, a PV array can suffer various malfunctions during operation that reduce its performance or even disable the system entirely (Das, Hazra, and Basu 2018). In a PV array, a fault occurring on a single PV module is difficult to detect or identify and may remain unnoticed until the module is completely inoperative. Indeed, PV systems are frequently exposed to different kinds of faults and disturbances that adversely affect the power they generate. These faults considerably reduce not only the efficiency but also the lifespan of the PV modules. They are mainly due to external interference, dust accumulation on PV modules, module aging, shading, or errors in the maximum power point tracking (MPPT) controller. These causes lead to faults such as series resistance degradation, partial shading, line faults, short circuits, open circuits, and earth faults (Wang et al. 2019). The annual energy loss of PV systems due to partial shading has been reported to be about 20% (Madeti and Singh 2017). Rapid fault detection and diagnosis are therefore necessary to improve the efficiency of photovoltaic systems and to avoid both high maintenance costs and the risk of fire.
CONTACT Patrick Juvet Gnetchejo, Laboratory of Technologies and Applied Sciences, University of Douala, Douala, Cameroon
© 2021 Taylor & Francis Group, LLC
Several fault diagnosis methods have been reported in the literature. They fall mainly into two types: electrical and nonelectrical methods. Electrical methods include model-based methods, statistical and signal-processing methods, and machine-learning methods.
Regarding machine learning, Syafaruddin and Hiyama (2011) proposed a three-layer feedforward artificial neural network for detecting and localizing short-circuit faults in a PV array. The maximum voltage, maximum current, irradiance, and temperature are used as model inputs to estimate the output voltage of each panel. The merit of this method is its ability to locate the fault accurately; however, the algorithm is limited to a single type of fault. Zhao et al. (2012) used a decision-tree approach to detect and classify different faults. The model classifies faults into specific types and identifies them in real time with very high prediction accuracy. However, the authors acknowledged that the model might not perform well on previously unseen data. Jiang and Maskell (2015) combined an artificial neural network with an analytical method for fault diagnosis: from atmospheric data (temperature and irradiance), a two-layer neural network predicts the system's power. This predicted power is then compared with the measured power to detect and identify the occurrence of a fault. The method has a compact structure, high precision, and fast detection, and it requires neither additional equipment nor knowledge of the system. Mekki, Mellit, and Salhi (2016) used a multilayer perceptron to detect partial shading, with the maximum voltage, maximum current, irradiance, and temperature as network inputs. Although the authors describe the method as simple, it is limited to a single fault; moreover, the multilayer perceptron requires periodic retraining to maintain its accuracy. Garoudja et al. (2017) used a probabilistic neural network to detect open-circuit and short-circuit faults in a PV array. In a comparative analysis, the authors showed that their results were better than those of classical artificial neural networks.
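The predicted-versus-measured comparison described for Jiang and Maskell (2015) can be sketched as a simple residual test. In this minimal sketch, an irradiance- and temperature-scaled power model stands in for their two-layer neural network (a swapped-in assumption, not their model). The array rating `P_STC`, the temperature coefficient `GAMMA`, and the 10% alarm threshold are illustrative values, and the function names are hypothetical.

```python
P_STC = 250.0     # hypothetical array rating at STC (1000 W/m^2, 25 degC), W
GAMMA = -0.004    # assumed power temperature coefficient, 1/degC
THRESHOLD = 0.10  # assumed alarm threshold on the relative power deficit


def predicted_power(irradiance, cell_temp):
    """Stand-in model: STC rating scaled by irradiance and corrected for temperature."""
    return P_STC * (irradiance / 1000.0) * (1.0 + GAMMA * (cell_temp - 25.0))


def power_residual_alarm(irradiance, cell_temp, measured_power):
    """Return (relative deficit, alarm). Only under-production raises an alarm."""
    expected = predicted_power(irradiance, cell_temp)
    if expected <= 0.0:
        raise ValueError("model predicts no power; check the irradiance input")
    deficit = (expected - measured_power) / expected
    return deficit, deficit > THRESHOLD


print(power_residual_alarm(800.0, 45.0, 182.0))  # close to the prediction
print(power_residual_alarm(800.0, 45.0, 150.0))  # about 18% below the prediction
```

A residual test like this detects that something is wrong but, on its own, does not say which fault occurred. The methods reviewed here add identification logic on top of this kind of comparison.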
The advantage of machine-learning methods is that they give a clear view of all system parameters and can detect and identify many types of fault. Their main drawback, however, is the large amount of training data they usually require and the long training time, which makes them inefficient for real-time detection.
Regarding model-based methods, Silvestre, Chouder, and Karatepe (2013) used a comparison-based model to detect and identify faults on the DC side of a PV array. Detection is based on an analysis of power losses, comparing measured and simulated efficiencies. Faults are identified by computing the deviations between the measured and simulated voltage and current, which are then compared with predefined thresholds. Despite the simplicity of the algorithm, the number of faults it detects remains limited, and the cost is still high because of the number of sensors used. Platon et al. (2015) proposed a practical approach for online fault detection in a PV system, based on a comparison between the predicted and measured AC output voltage. A significant discrepancy between the measured value and the value predicted by the model is considered a fault. The method detects 90% of faults; however, despite its simplicity and low cost, it does not specify the type of fault. Garoudja et al. (2016) proposed a simple approach for diagnosing and identifying open-circuit and short-circuit faults in a PV system. The method evaluates three coefficients: the ratios between the measured and simulated power, current, and voltage at the maximum power point. Despite its simplicity, it remains limited to two faults and can generate false alarms. Kuo et al. (2017) used a fractional-order system to detect and identify open-circuit, partial-shading, ground, and line faults. The method compares power at the maximum power point: an MPPT search algorithm estimates the maximum power, which is then compared with the actual data measured by an electric meter.
Das, Hazra, and Basu (2018) proposed a genetic algorithm with binary crossover to identify the number of open-circuited and short-circuited modules in a PV array. The advantage of the method is that it gives the exact number of open-circuited and short-circuited modules and identifies the positions of the faults. Despite these advantages, the method has two drawbacks: metaheuristics based on random processes can lead to false alarms, and the system is costly because a sensor must be placed on each PV module. Karimi et al. (2020) used a current-based approach to detect hotspots in a PV string and compared it with the I–V method to demonstrate its efficiency.
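The three-ratio idea of Garoudja et al. (2016) can be illustrated roughly as follows. The ratio definitions (measured over simulated power, current, and voltage at the maximum power point) come from the text. The rest is assumed for a series–parallel array rather than taken from the authors' published decision rules: the 5% tolerance band, and the rule that a voltage drop indicates short-circuited modules while a current drop indicates an open-circuited string. `MppPoint` and `ratio_diagnosis` are hypothetical names.

```python
from dataclasses import dataclass

TOL = 0.05  # assumed tolerance band around a ratio of 1


@dataclass
class MppPoint:
    power: float    # W
    current: float  # A
    voltage: float  # V


def ratio_diagnosis(measured: MppPoint, simulated: MppPoint) -> str:
    """Classify an operating point from measured/simulated ratios at the MPP."""
    rp = measured.power / simulated.power
    ri = measured.current / simulated.current
    rv = measured.voltage / simulated.voltage
    if rp >= 1.0 - TOL:
        return "normal"
    # bypassed (short-circuited) modules lower the string voltage, not the current
    if rv < 1.0 - TOL and ri >= 1.0 - TOL:
        return "short-circuited module(s)"
    # a disconnected (open-circuited) string removes its current from the array
    if ri < 1.0 - TOL and rv >= 1.0 - TOL:
        return "open-circuited string"
    return "unclassified power loss"


sim = MppPoint(power=2000.0, current=8.0, voltage=250.0)
print(ratio_diagnosis(MppPoint(1800.0, 8.0, 225.0), sim))
print(ratio_diagnosis(MppPoint(1500.0, 6.0, 250.0), sim))
```

The example illustrates the limitation noted in the text: any loss that does not match one of the two signatures, such as partial shading or degradation, falls into the unclassified branch.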
The main advantage of model-based methods is that they require very little hardware to implement, and they are efficient for large-scale PV systems (Fadhel et al. 2019). Despite their ability to detect faults in real time, most model-based methods suffer from the following drawbacks (Appiah et al. 2019): errors in the simulation model considerably affect the performance of the method; sudden changes in parameters such as temperature and irradiance often cause false alarms; and a lack of model updates can significantly degrade the method.
Regarding statistics and signal processing, Yao et al. (2014) used the discrete wavelet transform to detect arc faults in a PV system. The detection algorithm takes as inputs the variation of the current in the time domain and a normalized RMSE value derived from the wavelet coefficients. The results show that the method achieves 100% accuracy for load currents below 15 A; however, errors were observed at 25 A. Moreover, the algorithm is limited to one type of fault.
Georgijevic et al. (2016) proposed a detection algorithm based on a quantum probability model. Its advantage is that it allows real-time fault detection and requires no other system data; however, it is limited to a single fault. Yi and Etemadi (2017) proposed a line-fault detection algorithm based on multiresolution signal decomposition and SVM classifiers. This method uses current and voltage data from the system, with part of the data used to train the model. Its main advantage is its ability to detect faults at very low irradiance; however, it is limited to a single fault, and the amount of training data can become very large for a large-scale system. Kumar et al. (2018) used wavelet packet decomposition to detect line and partial-shading faults, extracting the input features of the detection algorithm from the voltage and current data of the PV system. This method involves several threshold values that can lead to false alarms if they are not set correctly.
Furthermore, Qureshi et al. (2020) used independent component analysis to diagnose faults in PV systems. Fadhel et al. (2019) proposed a robust and straightforward statistical method based on principal component analysis (PCA) to detect shading faults in a photovoltaic array. Its primary advantage is its low cost, as it uses the current–voltage characteristic data available in most new inverters. Prasad and Dash (2020) used multifractal detrended fluctuation analysis, based on a Simulink model, to detect faults on both the AC and DC sides of a photovoltaic installation. Georgijevic, Stojic, and Radakovic (2020) proposed a method for detecting series arc faults based on small-signal impedance and noise power monitoring. Methods based on statistics and signal processing have the advantage of not requiring an explicit model of the system for supervision. Moreover, they are suitable for real-time detection and are particularly effective in detecting partial-shading faults (Fadhel et al. 2019). However, most of them are limited in the number of faults they can identify, and some require external function generators for signal processing, which greatly increases the cost of the diagnostic system.
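The wavelet-based arc-fault detection discussed above (Yao et al. 2014) relies on the fact that high-frequency content in the current appears in the detail coefficients of a discrete wavelet transform. The sketch below uses a single-level orthonormal Haar transform and the fraction of window energy carried by the detail coefficients as a feature. The Haar wavelet, the window, and this energy-ratio feature are assumptions chosen for illustration. They are not the authors' inputs, which were the time-domain current variation and a normalized RMSE from the wavelets, and any alarm threshold would have to be tuned.

```python
import math


def haar_level1(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def detail_energy_ratio(signal):
    """Fraction of the window's energy carried by the level-1 detail coefficients."""
    approx, detail = haar_level1(signal)
    e_a = sum(c * c for c in approx)
    e_d = sum(c * c for c in detail)
    total = e_a + e_d
    return e_d / total if total > 0 else 0.0


steady = [10.0] * 64                                              # clean DC current, A
noisy = [10.0 + (1.0 if i % 2 else -1.0) for i in range(64)]     # added high-frequency ripple
print(detail_energy_ratio(steady), detail_energy_ratio(noisy))
```

Because the Haar transform is orthonormal, the approximation and detail energies add up to the energy of the original window. This makes the ratio a bounded feature between 0 and 1.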
Many other related works have recently been published. Zhao et al. (2021) used deep learning based on convolutional neural networks to detect faults in a production line. Dhibi et al. (2020) used reduced kernel PCA to enhance the random forest technique for diagnosing faults in … Spread-spectrum reflectometry has been used to diagnose and identify faults in a PV string (Saleh et al. 2021).
A mathematical model based on Simulink is used to analyze the effect of partial shading and short-