Solar PV Module Fault Classification Using Artificial Intelligence and Machine Learning Techniques
supplied by solar power. It is reasonable to focus on the design of smart systems that monitor such solar power systems and classify the type of fault that might be present, for reliability. PV systems provide several advantages over other conventional energy systems. The energy provided is modular in that the capacity to be generated depends on the amount required, and it is easy to expand the power system to meet the demand. Despite the massive initial cost of setting up a PV power system, there is no cost for machinery such as transformers, generators, and transmission equipment. Overall maintenance of a PV system is more modular and easily accessible. These attributes have resulted in an expansion of photovoltaics, and India has invested enough to improve the sector, as shown in Figure 1.

Figure 1: Solar market in India by installed capacity

There has been a progressive increase in the installation of solar power plants. With the continued rate of installation, a future with cleaner and more reliable energy can be expected. Monitoring systems that can capture real-time analysis of power plants are being designed to improve the reliability and stability of power systems, improving energy utilization by industries and avoiding the risk of fires or other hazards.

connected in series the system voltage increases and if connected in parallel the system current increases as in Figure 2. Every solar cell design should account for parameters that affect the amount of generated current, such as irradiance, temperature, and the type of semiconductor material.

Figure 2: I-V characteristics of a solar cell

There are many models for solar cells that have been designed to meet different conditions. However, the best model should simply be accurate enough to account for most solar cell parameters. The numerical approach models the PV module using the equations that define its basic working. Several circuit simulation packages can be used to design this model, but this work uses Simulink in MATLAB to build a 1.3 kW PV system. The approach to building the system is shown in the sequence of Figure 3. The 1.3 kW plant is then subjected to different fault configurations to generate data for future fault prediction on the plant.
Solar Cell Models

Solar cells have non-linear I-V characteristics that vary with irradiance, so it is not suitable to model a solar cell as a constant voltage source; instead, solar cells are modelled as a current source. Among the different circuit designs, the single-diode and double-diode models are the most used to describe the characteristics of the solar cell. The series resistance Rs describes the ohmic losses in the solar cell contacts and metal-semiconductor interfaces. For the single-diode model it is assumed, for the sake of simplicity, that there is no recombination around the junction region. Especially for semiconductor materials with larger bandgaps, this assumption leads to deviations between the actual and simulated characteristic curves of the solar cell, whereas the double-diode model attempts to incorporate the recombination in the junction. The equivalent circuits for the single-diode model and the double-diode model are shown in Figure 4(a) and Figure 4(b).

Figure 4: Solar cell circuit models (a) Single-diode model (b) Double-diode model

For the single-diode model in Fig. 4(a), the solar cell
and depends on the semiconductor material used. The parameters in equation (1) are shown in Table 1.

\[ I = I_{ph} - I_o \left( e^{\frac{V + I R_s}{n K N_s}} - 1 \right) - \frac{V + I R_s}{R_{sh}} \]    Equation (1)

Table 1: Solar cell parameters
Symbol   Parameter
I        Solar cell current (A)
V        Solar cell voltage (V)
Iph      Light-generated current (A)
Ish      Shunt resistance current (A)
Io       Saturation current of the diode (A)
Rsh      Solar cell shunt resistance (ohms)
Rs       Series resistance (ohms)
n        Diode ideality factor
k        Boltzmann's constant = 1.38×10^-23 J/K
q        Electron charge = 1.6×10^-19 C
T        Ambient temperature (K)

The double-diode model incorporates a second diode, as in equation (2), which represents the losses due to recombination on the surface and in the junction of the solar cell. Equation (2) provides a more accurate description of a solar cell but requires more computation power.

\[ I = I_{ph} - I_{o1} \left( e^{\frac{V + I R_s}{n K N_s}} - 1 \right) - I_{o2} \left( e^{\frac{V + I R_s}{n K N_s}} - 1 \right) - \frac{V + I R_s}{R_{sh}} \]    Equation (2)
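Because the current I appears on both sides of the single-diode relation in equation (1), it has to be solved numerically. The following is a minimal sketch of one way to do that with bisection. It assumes that the exponent denominator takes the common form n·Ns·k·T/q, with k and q taken from Table 1; the function name and all parameter values are hypothetical and are not taken from the Simulink model.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, value listed in Table 1
Q_ELECTRON = 1.6e-19    # C, value listed in Table 1


def single_diode_current(v, i_ph, i_o, r_s, r_sh, n, n_s, temp_k):
    """Solve I = Iph - Io*(exp((V + I*Rs)/a) - 1) - (V + I*Rs)/Rsh for I.

    Assumption: a = n * Ns * k * T / q. Intended for v >= 0 and i_ph >= 0.
    """
    a = n * n_s * K_BOLTZMANN * temp_k / Q_ELECTRON

    def residual(i):
        v_d = v + i * r_s
        exponent = min(v_d / a, 700.0)  # clamp so math.exp cannot overflow
        return i_ph - i_o * (math.exp(exponent) - 1.0) - v_d / r_sh - i

    # residual(i) decreases as i grows, so a sign change brackets a single root.
    lo, hi = -1.0, i_ph + 1.0
    while residual(lo) < 0.0:  # widen the bracket for voltages beyond open circuit
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical module parameters, chosen only for illustration.
PARAMS = dict(i_ph=8.0, i_o=1e-9, r_s=0.2, r_sh=300.0, n=1.3, n_s=60, temp_k=298.15)
i_short_circuit = single_diode_current(0.0, **PARAMS)
```

Sweeping `v` from zero upward with this function traces an I-V curve of the kind shown in Figure 2. Bisection is used here because it only needs the sign of the residual; a Newton iteration would converge faster but needs a derivative and a good starting point.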
Parameter Value
The PV industry is sustained by the reduced cost of materials and hence the lower production cost of energy. In the last few years, the industry has also seen a qualitative improvement regarding growth in grid connectivity. PV systems suffer losses from a variety of faults. The common faults include ground faults, line to line faults, hotspots, bypass mismatch, and arc faults, which all result in high current inflows with the potential to cause a fire. Fault analysis and protection improve the efficiency and reliability of the PV system; faults, if ignored, can lead to a reduction in the power generated and breakdown of the power system.

Classification of faults in PV system

There are many types of faults, either electrical or non-electrical, that affect a power system. Classifying all possible faults broadly is close to impossible, and over the years many studies have been carried out to isolate different fault types as much as possible. Some of the common faults like the mismatch faults, ground faults, line to line faults, bypass diode faults, and arcing are explained here as they

to the ground resulting in the substantial back-feed current
Arc faults
  Series arc fault      Discontinuity in any current-carrying conductor
  Parallel arc fault    Insulation failure in the current-carrying conductors
Line to line fault      Accidental short-circuiting of two strings of solar cells
Bypass diode fault      Short-circuiting due to incorrect connection
Degradation fault       Delamination and yellowing of modules, insertion of bubbles in the modules, cracking, and defects in the anti-reflection coating of the panels
Open circuit fault      Unplugging of connection wires in the junction box
Inverter faults         Failure in the components
Outage                  Blackout due to weather conditions like lightning, storms, hurricanes

Mismatch faults in the solar panels

Mismatch faults are most common in the PV arrays resulting in power loss and permanent damage to the modules. Mismatch faults in PV modules occur
when electrical parameters of some panels are significantly different from the others or mismatched. The possible reason for the mismatch is the varied irradiance levels (Figure 9) on different panels or different temperature levels. Mismatch faults are further classified into two types:
Temporary mismatch faults: Caused by shading of panels from structures, clouds, foliage, dust on panels, and anything else that blocks radiation.
Permanent mismatch faults: Mainly caused by hotspots and aging of the modules.
The shading effect results in uneven distribution of the irradiance on the PV array, as shown in Figure 9, causing reduced power production.

A PV panel design comprises non-current-carrying (NCC) metals (e.g., the module frame and the metal enclosures) to provide mechanical support during normal operation of the panels. The conduits can accidentally short-circuit the current-carrying wires of the panel for various reasons. To prevent a short circuit, all NCC conductors are connected through an equipment grounding conductor (EGC) to the ground, and the current-carrying conductors are well insulated. The potential reasons for ground faults are:
i. Degradation and liquid entry leading to a short circuit to the EGC.
ii. Animal infestation resulting in damaged cable insulation.
iii. Insulation damage to cables due to aging, corrosion due to water, damaged panels, or incorrect installation.
iv. Short circuits in the PV combiner box.

Detection Interruption (GFDI), a fuse, and a Residual Current Device (RCD) are also used.

Arc faults

Various factors cause arcs within the module, and arcs that burn persistently for a long time cause massive damage to the PV system. Arcs burn at very high temperatures, which depend on the
IJNRD2301123 International Journal of Novel Research and Development (www.ijnrd.org) b176
© 2023 IJNRD | Volume 8, Issue 1 January 2023 | ISSN: 2456-4184 | IJNRD.ORG
available energy and thermal characteristics of the panel. The current carrier generation in a module depends on the irradiance and can therefore aid stable ignition conditions for electrical arcs. If the arc is not extinguished within a short period, the direct radiation on the arc may start a fire.

Arc faults occur in a variety of locations, for example in fuses, terminals, inverters, bypass diodes, and also within the PV modules at joint locations. Classification of arc faults:
i. Parallel arc fault to the ground
ii. Cross-string parallel arc fault
iii. Intra-string parallel arc fault

The line to line faults

A line to line fault is an accidental short circuit between two or more points in an array that are operating at different potentials. Line to line faults are more difficult to isolate with conventional fault clearing devices. Line to line faults show distinct behavior under low irradiance and at night compared to daytime. A line to line fault can be represented as in Figure 11, shorting two points. There are two different techniques for analysis of faults in PV arrays:
i. Steady-state analysis
ii. Transient fault analysis

During shading conditions, the bypass diode bypasses the non-generating group of panels at low voltages, and this reduces the risk of hotspots and minimizes the shading effect. Any damage to the diode results in local hotspots, causing heating and damaging the solar cells and, in effect, the panel. Bypass diodes play an essential role, without which the entire panel breaks down over time. The diodes are connected across each module, or across a group of panels in systems with many panels.

Results

Successful fault classification in PV systems is essential for reliability in power production. Any fault configuration results in a shift from the optimal operating conditions of the power plant, resulting in a reduced capacity and potentially in a total power system breakdown. Fast identification and isolation of the fault will ensure satisfactory customer service, but the key is pinpointing the exact type of fault the system is suffering from at the time. Some example simulated conditions on the plant are:

Healthy condition: The power capacity during normal operation of the "1-Soltech" solar panel is 1.3 kW, as in Figure 12(a); the generation stays around this capacity during good sunshine hours, and it is desired to operate at these conditions.

operating point of the PV system. The capacity reduces and line fault risk heating of NOC if in contact and may cause a fire and damage to insulation. All line to line faults simulated on the 1.3 kW system resulted in a reduced capacity of 914 W, as in Figure 12(c).

Artificial Intelligence and Machine Learning

Artificial intelligence (AI) deals with the ability of a computer system to perform tasks commonly associated with human intelligence. AI involves developing systems endowed with intellectual processes that mimic human characteristics like the ability to reason, understand, generalize, and learn from experience. Despite continuing advances in the technology, there are still challenges in the processing speed and memory capacity of the systems, and there are not yet programs that can match human flexibility in general to meet the human level of intelligence. In
Lazy Learners: Lazy learners just store the training data and wait for the testing data to appear. The classification depends on how closely the stored data is related to the new instance. They take more time to predict than eager learners. An example is the k-nearest neighbour.

Eager Learners: Eager learners construct a general classification approach based on particular training data to design models with the ability to

Feature: A feature is a measurable or observable property of the phenomenon being observed. Features refer to the parts or properties that form the entire

Multi-Class Classification: A classification process with more than two classes in which each sample is assigned to one and only one label.

Training: A process of feeding data to an algorithm (F(x,y)) with the ability to learn from the data.

Prediction: The process of classifying future instances based on the training; the trained model gives a prediction, and the accuracy is cross-validated through analysis. The machine learning classification models in this
distribution, the classifier design underlines the data. Lazy learning algorithms keep all the training data for prediction of the outcome of future instances. The premise of KNN is that similar objects appear close to each other. Objects are classified based on a majority vote of their neighbors and are assigned to the category most common among their nearest neighbors. K is a positive integer parameter passed to KNN and tuned to improve accuracy.

Random forest

Random forest algorithms are ensemble algorithms that create several decision trees, as in Figure 14, from the training dataset to predict outcomes of future instances. Once data is fed to the algorithm, it derives rules from it, which are used for prediction on new data. The classifiers take a top-down approach to the nodes, starting from the root node and applying a binary split first on the most predictive features, creating nodes further down through the process. This continues until leaf nodes are generated with no possibility of further splitting. This process is based on the calculation of entropy. The model predicts a target class for each leaf node upwards to the actual class.

Random forests, as the name suggests, create several trees, called a forest of decisions, based on the training data, depicting patterns and behavior contained in the data. The random forest is called a bagging classifier. Its trees pick a random set of features from the fed data, and random forests differ from decision trees in that regard.

Support Vector Machines

with high accuracy. This line is at an equal distance from the support vectors. Support vectors are the points closest to the hyperplane. SVMs explore several candidate lines and select the one farthest from the support vectors. For data that is not linearly separable, the algorithm uses kernels that map the points onto different planes to isolate the clusters accurately. In this work, the SVM classifier applies the one-vs-rest (OVR) strategy to build a binary classifier for every class. Each classifier focuses on the current class and treats it as positive, leaving the rest negative. A cluster is treated as a single class and is fit against the rest of the clusters.

Decision Tree

Decision trees are very common algorithms, and their learning methods have a wide range of applications. New data is classified by sorting it down from the root node, following the attributes tested at the previous nodes. Each branch below a node indicates a possible value for an attribute. The process is repeated until a leaf node is reached, which gives the classification of the instance.
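The entropy calculation that the text says drives the splitting of tree nodes can be illustrated with a short sketch. The code below computes the Shannon entropy of a set of labels and the information gain of a binary split on one numeric feature, then picks the threshold with the highest gain. The function names, the label names, and the sample power values are hypothetical; they only echo the healthy (about 1.3 kW) and line to line (about 914 W) operating levels mentioned in the Results, and this is not the implementation used in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by the binary split value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Test midpoints between sorted unique values; return (threshold, gain).

    Needs at least two distinct feature values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    scored = [(t, information_gain(values, labels, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical maximum-power readings in watts, not the paper's dataset.
PMAX = [1300, 1290, 1310, 914, 920, 900]
LABELS = ["healthy", "healthy", "healthy",
          "line_to_line", "line_to_line", "line_to_line"]
threshold, gain = best_threshold(PMAX, LABELS)
```

On this toy data a single threshold between the two groups separates the classes completely, so the gain equals the full entropy of the labels. A tree builder would repeat the same search on each resulting subset until the leaves are pure or no further split is possible.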
This process is recursive, and the decision to isolate the most significant subset repeats, either adding or removing a feature each time. The process repeats until the desired feature subsets are attained. The feature subsets are cross-validated for their performance using the chosen learning classifier. These methods search extensively and can readily find the best features for the training model. However, features selected using wrappers are well suited only to the particular data on which the model was trained and may not perform as well on future instances, risking overfitting to the data.

technique gives a rank to the entire dataset on top of selecting the best features. Ranking helps during performance enhancement and analysis which improves performance. Filters are not good at

Data Analysis and Interpretation

To fully understand the nature of the data and its meaning, we use Jupyter Notebook through the Anaconda distribution. Different libraries are employed to display and interpret the data. The different features are examined side by side to understand how they are related to each other and to interpret their significance in the system. To get a general description of the data we use the "describe" command, and Table 4 gives the summary of the data.

Table 4: General description of the dataset
       Irradiance   Temperature   Imp    Vmp    Pmax
Min    300          12            1.3    27.9   41.7
25%    500          16            4.5    30.4   152.6
50%    600          24            6.8    31.5   228.2
75%    800          26            8.9    32.3   326.4
Max    1000         28            14.9   93.6   1302.2
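The rows of Table 4 (minimum, quartiles, maximum) are the kind of summary a "describe" command produces. As a rough illustration of what those rows mean, the sketch below computes the same five statistics for one column using only the Python standard library. It assumes the quartiles are obtained by linear interpolation between sorted samples; the function name and the irradiance values are hypothetical and are not the paper's data.

```python
import statistics


def describe(values):
    """Return min, quartiles, and max, mirroring the row labels of Table 4."""
    # method="inclusive" interpolates linearly between the sorted samples.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"Min": min(values), "25%": q1, "50%": q2, "75%": q3, "Max": max(values)}


# Hypothetical irradiance samples, used only to exercise the function.
irradiance = [300, 500, 600, 800, 1000]
summary = describe(irradiance)
```

Applying `describe` to each feature column in turn gives one column of a table laid out like Table 4.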
Class C 𝑋3 𝑌3 𝑊3 C
Total X Y W T
\[ CM_n(i,j) = \frac{CM(i,j)}{\sum_{m=1}^{N_c} \sum_{n=1}^{N_c} CM(m,n)} \]    Equation (6)

The second type of normalization is done row-wise by dividing each element of the confusion matrix by the sum of the elements of the respective row (the true population of the class mapped to that row). After normalization we can discard the information related to the size of each class. In this form, all classes are considered to be of equal size and the dataset is class-balanced. This normalization, as in equation (7), expresses the elements on a per-unit basis.

\[ CM_n(i,j) = \frac{CM(i,j)}{\sum_{n=1}^{N_c} CM(i,n)} \]    Equation (7)

From the confusion matrix before normalization, we can extract three useful performance measures. The first is the overall accuracy (Acc) of the classifier, which represents the fraction of samples of the dataset that have been correctly classified. The overall accuracy in equation (8) can be computed by dividing the sum of the diagonal elements by the total sum of the elements of the matrix (T).

\[ Acc = \frac{\sum_{m=1}^{N_c} CM(m,m)}{\sum_{m=1}^{N_c} \sum_{n=1}^{N_c} CM(m,n)} \]    Equation (8)

normalized, then \( \sum_{m=1}^{N_c} CM(i,m) = 1 \), giving \( Re(i) = CM(i,i) \), implying that the diagonal elements of the matrix are recall values.

Pr(i) is the fraction of samples correctly classified to class i, taking into account all the samples that are classified to that class. Precision is a measure of accuracy on a class basis and is defined according to equation (10):

\[ Pr(i) = \frac{CM(i,i)}{\sum_{m=1}^{N_c} CM(m,i)} \]    Equation (10)

where \( \sum_{m=1}^{N_c} CM(m,i) \) represents the total number of samples that were classified to class i. Note that if all classes contain the same number of samples, i.e. if the classes are balanced, then all three performance measures can be computed from any version of the confusion matrix, with or without normalization. Otherwise, if the classes are not balanced, the second normalization method will yield different performance results from the first, standard normalization scheme. An important measure that combines the values of precision and recall is the F measure, which is computed as the harmonic mean of the precision and recall values, as in equation (11).

\[ F(i) = \frac{2\,Re(i)\,Pr(i)}{Pr(i) + Re(i)} \]    Equation (11)

Following Table 7, for easier understanding, equations (12) to (14) adopt its nomenclature; we can see that \( X_1 = CM(i,i) \) and \( X = \sum_{m=1}^{N_c} CM(m,i) \), and the performance measures are:
(14)

Prediction Results:

The dataset of 1062 (6*177) data points was split with 80% of the data for training and around 20% for testing. 20% of the data gives about 250 data points, and this agrees with the (16*16) confusion matrix obtained for the prediction on the testing data points, as in Table 8. To calculate the accuracy, equation (14) is applied to all four algorithms, and the algorithm with the highest score is selected. Data from the confusion matrices of the other three algorithms are reported in Table 9, and Table 8 shows the detailed confusion matrix for the decision tree since it gave the highest score.

Table 8: The obtained confusion matrix using Decision tree classifier

\( \sum_{m=1}^{N_c} CM(m,m) = 31 \), indicating the sum of the diagonal elements, and T = 36, which is the total of the elements; the obtained accuracy score is \( Acc = \frac{31}{36} = 0.8611 \). Similarly, Table 9 data is used to calculate the accuracy of the other classifiers.

Table 9: Data obtained from the confusion matrix from the classifiers
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 4 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 4 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 3 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 5 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Using equation (14), we calculate the accuracy score of each classifier from the data in Table 9, as shown in Table 10, and Table 11 shows the summary of the performance of each classifier.

\( \sum_{m=1}^{N_c} CM(m,i) \)   5   25   22   17

Table 10: Tabulation of the performance score of the classifiers from the confusion matrix

Table 11: Accuracy score by different classifiers used
Model                          Testing Accuracy
Decision tree                  86%
Random Forest                  31%
ANN                            38%
Support Vector Machine (SVM)   53%
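The measures defined in equations (7), (8), (10), and (11) can be checked numerically with a few lines of code. The sketch below is one possible formulation, assuming rows hold the true classes and columns the predicted classes, and taking recall as the diagonal element divided by its row sum, as implied by the row-wise normalization. The 3-class matrix is hypothetical: its entries were chosen only so that the diagonal sum and total match the 31 and 36 reported for the decision tree, and they are not the paper's Table 8.

```python
def _safe_div(num, den):
    """Return num / den, or 0.0 when a class has no samples (den == 0)."""
    return num / den if den else 0.0


def accuracy(cm):
    """Equation (8): sum of the diagonal elements over the total sum T."""
    total = sum(sum(row) for row in cm)
    return _safe_div(sum(cm[m][m] for m in range(len(cm))), total)


def row_normalize(cm):
    """Equation (7): divide each element by the sum of its row."""
    return [[_safe_div(x, sum(row)) for x in row] for row in cm]


def recall(cm, i):
    """Correctly classified samples of class i over the true population of class i."""
    return _safe_div(cm[i][i], sum(cm[i]))


def precision(cm, i):
    """Equation (10): correct predictions of class i over all predictions of class i."""
    return _safe_div(cm[i][i], sum(row[i] for row in cm))


def f_measure(cm, i):
    """Equation (11): harmonic mean of precision and recall for class i."""
    pr, re = precision(cm, i), recall(cm, i)
    return _safe_div(2 * re * pr, pr + re)


# Hypothetical confusion matrix: diagonal sum 31, total 36.
CM = [
    [11, 1, 0],
    [1, 10, 1],
    [0, 2, 10],
]
acc = accuracy(CM)
```

With this matrix `acc` rounds to 0.8611, the same value as 31/36 in the text, and the diagonal of `row_normalize(CM)` equals the per-class recall. The zero-denominator guard matters for matrices that contain all-zero rows or columns, where a class never appears in the test set or is never predicted.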
data was visualized in Jupyter Notebook using Python to infer the meaning hidden in the data about the 1.3 kW plant. Four different algorithms were used, and the most accurate, with an accuracy of 86%, was the decision tree, which was used to implement the design. The biggest challenge was exhausting all the possible fault configurations to capture the entire system behavior; for example, it was impossible to simulate the ground fault on the designed DC power system. This suggests that the model may not effectively classify instances that represent fault conditions that were not incorporated during model training. This is the biggest drawback of using generated synthetic data rather than real-time data collected from a plant over the years. Future work can include expanding the dataset to incorporate year-round seasonal conditions, implementing the fault classification techniques on real plant data, and improving the model design through tuning of classifier parameters using new Python packages.