
Expert Systems With Applications 233 (2023) 120955

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

Towards improving prediction accuracy and user-level explainability using deep learning and knowledge graphs: A study on cassava disease

Tek Raj Chhetri a,b,∗, Armin Hohenegger a, Anna Fensel c,d, Mariam Aramide Kasali e, Asiru Afeez Adekunle f

a Semantic Technology Institute (STI), Department of Computer Science, Universität Innsbruck, Innsbruck, 6020, Austria
b Center for Artificial Intelligence (AI) Research Nepal, Sundarharaincha-09, Nepal
c Wageningen Data Competence Center, Wageningen University & Research, Wageningen, The Netherlands
d Consumption and Healthy Lifestyles Chair Group, Wageningen University & Research, Wageningen, The Netherlands
e Department of Plant Physiology and Crop Production, College of Plant Sciences, Federal University of Agriculture, Abeokuta, Nigeria
f Kwara Agricultural Network, Ilorin, Kwara, Nigeria

ARTICLE INFO

Dataset link: Survey Data: Towards Improving Prediction Accuracy and User-Level Explainability Using Deep Learning and Knowledge Graphs: A Study on Cassava Disease (Original data)

Keywords: Explainable AI (XAI), Agricultural sustainability, Knowledge graphs, Deep learning, Cassava disease

ABSTRACT

Food security is currently a major concern due to the growing global population, the exponential increase in food demand, the deterioration of soil quality, the occurrence of numerous diseases, and the effects of climate change on crop yield. Sustainable agriculture is necessary to solve this food security challenge. Disruptive technologies, such as artificial intelligence, and especially deep learning techniques, can contribute to agricultural sustainability. For example, applying deep learning techniques for early disease classification allows us to take timely action, thereby helping to increase the yield without inflicting unnecessary environmental damage, such as excessive use of fertilisers or pesticides. Several studies have been conducted on agricultural sustainability using deep learning techniques and also semantic web technologies such as ontologies and knowledge graphs. However, three major challenges remain: (i) the lack of explainability of deep learning-based systems (e.g. disease information), especially to non-experts like farmers; (ii) a lack of contextual information (e.g. soil or plant information) and domain-expert knowledge in deep learning-based systems; and (iii) the lack of pattern learning ability of systems based on the semantic web, despite their ability to incorporate domain knowledge. Therefore, this paper presents work on disease classification that addresses the aforementioned challenges by combining deep learning and semantic web technologies, namely ontologies and knowledge graphs. The findings are: (i) 0.905 (90.5%) prediction accuracy on a large noisy dataset; (ii) the ability to generate user-level explanations about disease and to incorporate contextual and domain knowledge; (iii) an average prediction latency of 3.8514 s on 5268 samples; (iv) 95% of users finding the explanation of the proposed method useful; and (v) 85% of users being able to understand the generated explanations easily. These results show that the proposed method is superior to the state-of-the-art in terms of performance and explainability and is also suitable for real-world scenarios.

1. Introduction

With one-third of the global population (2.37 billion) already experiencing moderate or severe food insecurity (UN, 2021) and a rapidly expanding global population, which is expected to reach 9–10 billion by 2050 (Sharma et al., 2020), the agriculture sector is under immense pressure to increase food production. This pressure is further exacerbated by climate change, which has led to a decline in soil quality and the occurrence of numerous diseases such as paddy stackburn (Ayoub Shaikh et al., 2022; Chen et al., 2021), which have a significant effect on the economy. According to the United Nations Food and Agriculture Organisation, plant diseases cost the global economy over $220 billion annually (Food and Agriculture Organization of the United Nations, 2021).

Today, modern technologies, specifically artificial intelligence (AI), are used to combat the issue of plant disease. This is due to the predictive capacity of AI technologies, which enables early identification of potential diseases and prompt preventative measures, thereby reducing loss. As a result, a large number of studies (see Section 2)

∗ Corresponding author at: Semantic Technology Institute (STI), Department of Computer Science, Universität Innsbruck, Innsbruck, 6020, Austria.
E-mail addresses: [email protected] (T.R. Chhetri), [email protected] (A. Hohenegger), [email protected] (A. Fensel),
[email protected] (M.A. Kasali), [email protected] (A.A. Adekunle).

https://doi.org/10.1016/j.eswa.2023.120955
Received 18 April 2023; Received in revised form 13 June 2023; Accepted 5 July 2023
Available online 8 July 2023
0957-4174/© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Fig. 1. Explainability using SHAP. (a) Input image with disease label ‘‘Cassava Mosaic Disease’’ (b) The leftmost image with different colours highlights the features that had an
impact on the prediction. The four images on the right, labelled ‘‘Cassava Mosaic Disease’’, ‘‘Healthy’’, ‘‘Cassava Brown Streak Disease’’, and ‘‘Cassava Green Mottle’’, are the top
4 predictions.
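The attributions visualised in Fig. 1(b) are of the kind SHAP computes per input feature. As a toy illustration of the underlying idea, the exact Shapley value of a feature is its marginal contribution averaged over all subsets of the remaining features. The three features and the additive model below are invented for illustration and are not from the paper:

```python
from itertools import combinations
from math import factorial

# Toy additive model over three features A, B, C; weights are invented.
WEIGHTS = {"A": 0.5, "B": 0.3, "C": 0.2}

def model(features):
    """Predict from a subset of known features; unknown ones default to 0."""
    return sum(WEIGHTS[f] * features.get(f, 0.0) for f in WEIGHTS)

def shapley_value(feature, features):
    """Exact Shapley value: the feature's average marginal contribution
    over all subsets of the remaining features, with the standard weights."""
    others = [f for f in features if f != feature]
    n = len(features)
    value = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            subset_features = {k: features[k] for k in subset}
            with_f = model({**subset_features, feature: features[feature]})
            without_f = model(subset_features)
            value += weight * (with_f - without_f)
    return value

x = {"A": 1.0, "B": 1.0, "C": 1.0}
phi = {f: shapley_value(f, x) for f in x}
# For an additive model, each feature's Shapley value is (approximately,
# up to floating point) its own weighted contribution.
```

This is exactly the limitation discussed above: the values in `phi` say how much A, B, and C each pushed the prediction, but nothing about what the predicted label means to a farmer.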

have been conducted on plant disease detection and classification using AI technologies, particularly deep learning (DL), a sub-field of machine learning (ML). There has been a substantial improvement in the prediction performance of DL, with an increased prediction accuracy of 90 percent (or even higher in some cases) for disease detection and classification (Emmanuel et al., 2023). However, DL-based techniques suffer from the explainability issue (Chaddad et al., 2023; Gaur et al., 2021).

To combat the issue of explainability in DL, the AI academic community has been actively investigating mathematical approaches, such as LIME (local interpretable model-agnostic explanations) (Ribeiro et al., 2016) and SHAP (SHapley Additive exPlanations) (Lundberg & Lee, 2017), to improve DL model explainability (Yang et al., 2021). However, explainability techniques based on SHAP and LIME are not suitable for non-experts (or end users). Additionally, Yang et al.'s (Yang et al., 2021) study confirms the same: the explainability techniques based on SHAP and LIME provide an explanation from the computer scientists' perspective, not the user's, thereby widening the gap in user-centred explainability techniques.

Agriculture is one of the sectors dominated by non-technological experts (or non-experts), such as farmers. Therefore, any technological solution aimed at the agricultural sector, e.g. for disease detection and classification, must be centred on farmers, as they are the end users, in order to be effective. The use of SHAP- and LIME-based explanations is therefore ineffective. For example, say there are three input features (or independent variables) A, B, and C and a dependent variable (or target label) Y. Using SHAP, one can obtain information about, for example, the impact of A, B, and C on the prediction of Y, but not an explanation of Y itself. These SHAP-derived explanations are helpful for understanding model choices, but they are not very useful for farmers (or non-experts), for whom knowing about the disease would be most helpful. Fig. 1 shows the explainability using SHAP for cassava (Fig. 1(a)). Fig. 1(b) (leftmost image) displays the explanations generated by SHAP that illustrate the importance of the features upon which the predictions were based. As can be observed from Fig. 1(b), SHAP-based explanations are not useful for end users, i.e. farmers. Existing studies, particularly those based on DL in agriculture (see Section 2), are mostly concentrated on performance improvement and therefore lack user-level explainability. This constitutes the first motivation for this work. However, the importance of the performance improvements made by existing works (Section 2) cannot be overstated. This is because having just explainability would not solve the problem, as the identification of a potential disease must also be correct, which is the other focus of this work. Therefore, both explainability and prediction accuracy are equally important.

The second motivation is the limited contextual information in existing studies and the exclusion of domain knowledge, such as in DL (or DL-based) studies (Chhetri, Kurteva, et al., 2022; Holzinger & Müller, 2021). Domain knowledge is knowledge acquired by domain experts over time and is a valuable source of information unavailable in datasets like images. Moreover, images have limited contextual information (Gaur et al., 2022). For example, images lack information about the relationship between plant types, the area they are grown in, the environmental conditions, such as soil moisture, of that region, and their impact on plant health. Both domain knowledge and contextual information help to improve performance, support explainability, and ensure the safety of AI by preventing, for example, hallucinations (Gaur et al., 2022). Semantic technologies, namely ontologies and knowledge graphs (KGs), on the other hand, can incorporate domain expert knowledge, provide reasoning capability, and also enable context awareness to support explainability (Chhetri, Kurteva, et al., 2022; Sharma et al., 2019). Studies have also been conducted using semantic technology for disease classification, such as by (Jearanaiwongkul et al., 2018) and (Lacasta et al., 2018). However, such approaches based on semantic technology are limited in terms of their capabilities, such as their ability to learn complex patterns like statistical approaches such as DL. Therefore, there is a need to synergise DL-based approaches with semantic technology-based approaches, particularly in the domain of agriculture, so that the benefits of both worlds, which are lacking in current work, can be obtained. In the Dagstuhl Seminars report (Benedikt et al., 2020), Claudia d'Amato made a similar point about combining symbol-based methods, e.g. methods based on KGs, and numerical methods like ML.

Therefore, to address the limitations discussed above, this paper presents research to improve the classification of plant diseases by combining semantic technologies and DL. The main objectives of this research are as follows: (i) to generate user-level (or user-comprehensible natural language) explanations about diseases; and (ii) to improve the prediction accuracy of disease classification by incorporating domain knowledge and contextual information, such as


effects of soil moisture and relative humidity on plant health, using semantic technology and combining it with DL. This study on plant disease classification focuses on cassava plants. This is because of the importance of cassava plants. Cassava, which is mostly grown in Africa, is the fourth most important staple crop and plays a significant role in the diets of over a billion people around the world (Ajayi & Olutumise, 2018). The following contributions are made in response to this study's objectives:

1. A generic approach is proposed to combine semantic technology and DL in order to improve prediction accuracy and generate user-comprehensible natural language explanations, using cassava disease classification as a case study.
2. The reusable cassava disease ontology is developed to enable the incorporation of domain-specific cassava disease knowledge.
3. The proposed approach is designed and implemented as deployable software to facilitate reusability, and the proposed method is evaluated for both performance and explainability.

The remainder of the paper is organised as follows. Section 2 presents the related works. Section 3 details the proposed approach and Section 4 provides details about the experiment, including the dataset, system information, evaluation metrics used, and the implementation. Section 5 discusses the results and, finally, Section 6 provides the conclusion.

2. Related work

This section provides an overview of the related work. Section 2.1 provides a brief overview of explainable AI techniques. Section 2.2 provides an overview of the ML-based studies, whereas Section 2.3 provides an overview of the semantic technology-based studies. In the review, the studies (Chan et al., 2022; Zhang et al., 2020) that focus on the explainability of DL, for example, using SHAP, have been excluded, as this study is focused on user-level explanations for non-experts. Additionally, because the focus of this work is on plant diseases, the review only examines related works in this area, with a particular focus on the cassava plant.

2.1. Explainable AI techniques

The majority of the work on explainability focuses on using techniques like SHAP and LIME (Machlev et al., 2022). (Kuzlu et al., 2020) and (Mitrentsis & Lens, 2022) use the techniques SHAP and LIME in their work for explainability. Unlike the cases of SHAP and LIME, Toubeau et al.'s (Toubeau et al., 2022) explainability is based on the attention mechanism. However, similar to the case of the SHAP- and LIME-based explanations, the explanations of Toubeau et al.'s (Toubeau et al., 2022) work are also focused on understanding the impact of input variables on the prediction outcome. Unique among previous works, Bahani et al.'s (Bahani et al., 2020) work uses a knowledge base, which contains the explanations of the target labels, to provide explainability for the predictions made, a work in line with the proposed approach of this study. The mapping is done via fuzzy logic. In addition, new model-agnostic approaches similar to SHAP and LIME have evolved recently for graph neural networks (GNNs) (Jiménez-Luna et al., 2020) and their variants. Examples of such explainability techniques include GNNExplainer (Ying et al., 2019) and CF-GNNExplainer (Lucic et al., 2022), where CF represents the counterfactual. Recently, (Sammani et al., 2022) introduced a GPT (Generative Pre-Trained Transformer)-based language model that can simultaneously make predictions and generate natural language explanations, similar to the research in this study. Two limitations exist: the first is the requirement for large training data, and the second is that the generated explanations may not be accurate due to hallucinations (Gaur et al., 2022). Further, Sammani et al.'s (Sammani et al., 2022) approach lacks the benefit that comes with the use of KGs, which is the ability to incorporate additional contextual information. Yang et al.'s (Yang et al., 2023) work focuses on generating knowledge-aware explanations for natural language inference using KGs, for which they propose a generative model that makes use of the KGs. In particular, the KGs are used to address the following problems: (i) the lack of conformance to common sense of existing models; and (ii) their lack of informativeness. However, the focus of this work is on language tasks, and their application to a multimodal scenario such as in this study remains unexplored. Moreover, Yang et al.'s (Yang et al., 2023) work demonstrates the additional benefits that can be realised by utilising KGs (or semantic technology). Similarly, other works, such as the one by (Amador-Domínguez et al., 2023), focus on explainability to understand the predictions made by ML models. Their work focuses on generating explanations of KG embedding predictions.

In conclusion, the majority of works on explainability are not geared towards non-experts and are primarily concerned with comprehending predictions and the implications of input variables on final predictions. Some works, such as the one by (Sammani et al., 2022), focus on generating natural language explanations. However, their work lacks the benefits that can be obtained through the use of KGs, and there are limitations such as the need for a large amount of training data. Therefore, there is still a need for further research on explainability approaches that can be both reliable and useful to non-experts, the issue that the proposed work addresses.

2.2. ML-based studies

(Emmanuel et al., 2023) conducted research on the classification of cassava diseases utilising the pretrained models VGG16 and MobileNet V2. Emmanuel et al. demonstrated the utility of their proposed hybrid kernel methods by attaining a 90.1% accuracy rate. The hybrid kernel methods combine the quadratic kernel with the squared exponential kernel. Similarly, (Kumar et al., 2023) conducted a study on cassava disease detection. Their work focuses on improving accuracy by ensembling different computer vision models: EfficientNet, SEResNeXt, ViT, DeiT and MobileNetV3. With their result of 90.75% accuracy, (Kumar et al., 2023) demonstrated the effectiveness of their proposed approach. (Ravi et al., 2022) conducted a study using attention-based models on cassava disease classification. Similar to the case of (Kumar et al., 2023), (Ravi et al., 2022) demonstrated the advantages of ensembling by achieving their best accuracy of 87.08%, where they combined the penultimate-layer features of A_EfficientNetB4, A_EfficientNetB5, and A_EfficientNetB6. The combined final model is called A_L_EfficientNet. In a similar effort to improve the prediction accuracy for cassava plant disease, (Ahishakiye et al., 2023) proposed an ensemble model combining Generalised Learning Vector Quantisation (GLVQ), Generalised Matrix LVQ (GMLVQ), and Local Generalised Matrix LVQ (LGMLVQ). Their work achieves an accuracy of 82% (100% with overfitting) using the ensemble model. This work also builds on these findings about the advantage of ensemble models, which in the case of this study combine semantic technology with DL, and addresses the limitations of explainability that have not been addressed by these studies. (Paiva-Peredo, 2023) similarly conducted a study on the classification of cassava disease using pretrained models such as VGG16, ResNet50, and MobileNetV2. A total of 12 different models were examined for cassava disease classification. Paiva-Peredo achieved the best accuracy of 74.77% with DenseNet169.

Similar to other studies, (Chen et al., 2022) conducted studies on cassava disease classification using pretrained models like EfficientNet. However, unlike other works, (Chen et al., 2022) proposed a cross-entropy loss and demonstrated the robustness of cross-entropy loss on noisy datasets, achieving an accuracy of 89.3%. This work, particularly the DL part, makes use of the cross-entropy loss, takes advantage of the findings of (Chen et al., 2022), and makes further improvements both in terms of accuracy and explainability. Unlike previous studies, (Anitha &


Saranya, 2022) demonstrated the benefits of data augmentation for cassava disease, achieving an accuracy of 90%. Their work makes use of a convolutional neural network (CNN). A similar benefit of data augmentation has been demonstrated by (Riaz et al., 2022) for cassava disease classification. Their work uses the pretrained DL model EfficientNetB3 and achieves an accuracy of 83.03%. As with the case of (Chen et al., 2022), this work also takes advantage of these findings about data augmentation.

(Too et al., 2019) conduct a comparative study of the fine-tuning of DL models: VGG 16, Inception V4, ResNet with 50, 101, and 152 layers, and DenseNets with 121 layers, for identifying plant diseases based on images of leaves. According to their analysis of the plantVillage dataset, the DenseNets model outperforms other models, achieving an accuracy of up to 99.75%. (Atila et al., 2021) also conducted a comparative study and discovered that EfficientNet B5 and B4 were the most effective models on the plantVillage dataset, even outperforming Too et al.'s (Too et al., 2019) discovery of DenseNets.

(Chen et al., 2021) and (Ferentinos, 2018) investigated plant disease. Similar to (Too et al., 2019), their research focuses on using plant leaf images for disease detection (or classification) and employs computer vision based on DL. The (Ferentinos, 2018) study uses an open database containing images of 25 different plants, and (Chen et al., 2021) use the dataset from the Fujian Institute of Subtropical Botany in Xiamen, China. (Chen et al., 2021) introduced location-wise soft attention to pre-trained MobileNet-V2, improving disease identification. On the other hand, (Ferentinos, 2018) evaluated specific convolutional neural network (CNN) models, such as AlexNetOWTBn, GoogLeNet and VGG. (Ferentinos, 2018) made an important discovery that CNN models trained on images of laboratory conditions perform significantly worse in the real world, with the success rate (i.e. accuracy of detection) dropping as low as 33%.

Similarly, (Abbas et al., 2021; Ashwinkumar et al., 2022; Bedi & Gole, 2021; Nagasubramanian et al., 2019; Roy & Bhaduri, 2021; Sahu & Sinha, 2022) and (Shah et al., 2022) conducted studies on plant disease detection and classification using images (of plant leaves) and CNN models such as VGG-FCN-VD16, VGG-FCN-S, DenseNet121, and ResNet50. The study of (Shah et al., 2022), similarly to (Too et al., 2019), uses the plantVillage dataset. However, unlike (Too et al., 2019), which focuses on comparative studies, the work of (Shah et al., 2022) concentrates on interpretability via visualisation. The study of (Ashwinkumar et al., 2022), on the other hand, proposed an automated model for detecting and classifying plant leaf diseases and used MobileNet and the emperor penguin optimiser algorithm. The proposed model was evaluated by conducting a simulated experiment. The study of (Abbas et al., 2021) focuses on tomato disease classification and uses DenseNet121 and the plantVillage dataset. However, unlike other studies using the plantVillage dataset, the study of (Abbas et al., 2021), in addition to DenseNet121, also employs conditional generative adversarial networks to generate synthetic images to complement the lack of data. (Roy & Bhaduri, 2021) conducted a multi-class plant disease classification using an improved version of the YOLOv4 (Bochkovskiy et al., 2020) algorithm, while (Sahu & Sinha, 2022) showed an improvement in disease classification using transfer learning with models such as VGG-16, Inception V3, and ResNet50. (Bedi & Gole, 2021), however, proposed a hybrid approach based on a CNN and a convolutional autoencoder (CAE) network, where the CAE is used to reduce the dimensionality of the input leaf images and the CNN to classify the disease based on the image.

This study addresses the three major limitations of the previously discussed studies. They are: (i) not including domain knowledge; (ii) only using the limited contextual information of images; and (iii) the lack of user-level explainability along with the performance improvement.

2.3. Semantic-based studies

(Jearanaiwongkul et al., 2018) propose a semantics-based system (i.e. system architecture) for identifying rice diseases. In addition, a rice disease ontology (Detras et al., 2016) was developed by reusing the rice ontology, plant protection ontology (Halabi, 2009), and plant disease ontology (American Phytopathological Society, 2016). The authors further show how the developed ontology can be used given a farmer's observation. However, the presented system architecture is yet to be implemented. Similarly, (Lacasta et al., 2018) present their work on an agricultural recommendation system based on SPARQL (SPARQL Protocol and Resource Description Framework Query Language) queries for crop protection to help with the identification of pests and the selection of suitable treatments. To facilitate the recommendation system, the authors developed the ''Pests in Crops and their Treatments'' ontology. (Rodríguez-García et al., 2021) present their research on integrated pest management (IPM), a decision support system for crop pest identification and disease recognition. The purpose of their work is to reduce the virulence of the pests and also to manage the disease. Their work incorporates the CropPestO ontology (Rodríguez-García & García-Sánchez, 2020) and natural language processing (NLP) into a symptom analyser to provide a diagnosis and treatment based on the symptoms provided. Because the ontology is populated with information from the official Spanish guide, it is only helpful for Spanish-speaking users. (Lagos-Ortiz et al., 2017), similar to (Rodríguez-García et al., 2021), present a decision support system to assist farmers in making effective decisions regarding the diagnosis of plant disease. Similar to other semantics-based studies, their work is dependent on an ontology, specifically the phytopathology ontology. The phytopathology ontology is developed by reusing the plant disease ontology (American Phytopathological Society, 2016).

Similar to the case of the ML-based studies discussed in Section 2.2, this work also takes advantage of existing semantic-based studies, such as (Jearanaiwongkul et al., 2018). In particular, this study addresses the following major limitations of semantic-based approaches: (i) their inability to learn complex patterns, unlike DL; and (ii) improving prediction accuracy.

3. Materials and methods

This section describes the proposed approach. However, prior to discussing the proposed approach, the ontology modelling will be discussed in Section 3.1. This is because the ontology is one of the core components of the proposed approach, and the ontology also supports the generation of user-comprehensible natural language explanations about disease. Section 3.2 provides details about the proposed approach that combines semantic technology and DL.

3.1. Ontology modelling

This section describes the ontology utilised in this research. Section 3.1.1 describes the ontology requirements (i.e. the ontology's scope). Section 3.1.2 describes the ontology's development, which follows the ontology development guidelines from (Noy & McGuinness, 2001).

3.1.1. Ontology requirements

The first step in developing any ontology is to define the scope of the ontology, such as what it covers (or should model). The scope of the ontology in this research is as follows:

• The ontology should cover three different categories (or domains), namely, sensor observations, cassava plant disease, and cassava plant environments.
• The ontology should be able to classify the cassava disease based on the sensor observation information in an understandable manner.


Fig. 2. Domain ontology.
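The classification behaviour the ontology in Fig. 2 is meant to support, inferring a disease from sensor observations and returning a human-readable explanation, can be sketched as a toy rule base. The thresholds and the favourable-condition rules below are invented for illustration; the paper encodes such knowledge in the ontology and SWRL rules rather than in Python:

```python
# Toy sketch of rule-based disease classification with a human-readable
# explanation. Thresholds and condition rules are invented assumptions;
# the disease names follow those used in the paper's case study.

RULES = {
    "Cassava Mosaic Disease": {
        "condition": lambda obs: obs["humidity"] > 80 and obs["temperature"] > 25,
        "explanation": "high humidity and warm temperatures favour the "
                       "whitefly vector that spreads the mosaic virus",
    },
    "Cassava Brown Streak Disease": {
        "condition": lambda obs: obs["soil_moisture"] < 20,
        "explanation": "drought-stressed plants are more susceptible to "
                       "brown streak infection",
    },
}

def classify(observations):
    """Return (disease, explanation) pairs for every rule whose
    favourable conditions match the sensor observations."""
    matches = []
    for disease, rule in RULES.items():
        if rule["condition"](observations):
            matches.append((disease, rule["explanation"]))
    return matches

obs = {"humidity": 85, "temperature": 28, "soil_moisture": 45}
result = classify(obs)
```

Each match carries its triggering conditions in plain language, which is the "understandable manner" the second scope item asks for.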

Following the scope, the ontology should be able to answer the com- concept 𝑐𝑎𝑠𝑠𝑎𝑣𝑎−𝑑𝑖𝑠𝑒𝑎𝑠𝑒 ∶ 𝐷𝑖𝑠𝑒𝑎𝑠𝑒𝐶𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛 is introduced to have more
petency questions (CQs), which were derived based on the research fine-grained information about the cassava disease and represents the
question. The CQs are as follows: conditions that are favourable for the occurrence of the disease. The
𝑐𝑎𝑠𝑠𝑎𝑣𝑎 − 𝑑𝑖𝑠𝑒𝑎𝑠𝑒 ∶ 𝐷𝑖𝑠𝑒𝑎𝑠𝑒𝐶𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛 class is linked to a disease with
1. What are all the possible diseases that cassava plants could have?
the property enablesDisease, and its inverse enabledByCondition, which
2. What is the possible disease that cassava plants can have the
allows to infer the disease given certain disease conditions.
following symptoms of A, B, . . . ?
3. How is the cassava plant’s environment and the sensor observa-
tion related to the cassava plant? 3.2. Proposed approach
4. What are the symptoms of a cassava plant with disease X?
5. What diseases are caused by viruses (or bacteria, or fungi)? This section describes the proposed approach in detail. Fig. 3 illus-
trates the proposed method, which combines DL and semantic technol-
3.1.2. Ontology development ogy to improve cassava disease prediction and enable explainability.
Fig. 2 shows the ontology, which was developed based on the CQs As shown in Fig. 3, the proposed method consists of three major ele-
outlined in Section 3.1.1. The ontology makes use of the SOSA (Sensor, ments: the vision model, the semantic model, and the decision engine.
Observation, Sample, and Actuator) ontology (Janowicz et al., 2019), This is based on a microservices strategy where each component of
whose concepts are denoted by the prefix 𝑠𝑜𝑠𝑎 in Fig. 2. Numerous the software is built separately based on its functionality. Moreover,
ontologies exist for plant diseases, such as the plant stress ontology.1 In the proposed approach is also implemented as deployable software
addition, there is a cassava ontology² that models information such as cassava characteristics. The existing ontologies are modelled as exhaustive knowledge sources, which differs from the use case of this study and necessitates the creation of a new ontology for cassava disease. In order to develop the ontology, the disease hierarchy from the study of Jearanaiwongkul et al. (2018), in which rice diseases were modelled, is adopted. The ontology used in this study can be easily expanded to include information from other ontologies, including the cassava ontology and the plant stress ontology. The cassava diseases modelled in the ontology of this study are marked by the prefix cassava-disease, as can be seen in Fig. 2.

The class cassava-disease:PlantDisease models the information about cassava disease, with its subclasses cassava-disease:BacterialPlantDisease, cassava-disease:ViralPlantDisease and cassava-disease:FungalPlantDisease, in a similar fashion to Jearanaiwongkul et al. (2018). The cassava disease symptoms are modelled using the class cassava-disease:DiseaseSymptom. To model the information about the field, the classes cassava-disease:Soil and cassava-disease:Field were used. The classes cassava-disease:Soil and cassava-disease:Field are also connected to the sensor ontology for observing specific properties such as humidity. Moreover, the classes cassava-disease:Soil and cassava-disease:Field are also connected to the class cassava-disease:Plant. This is because the conditions of the field have an impact on the plant's disease; they are connected via the object properties cassava-disease:isPlantedIn and cassava-disease:hasPlant.

The proposed approach is implemented following the microservices strategy (see Section 4.3). The high-level overview of the proposed approach, which is discussed in detail in the subsequent sections, is summarised as follows:

• First, the cassava image is taken and passed through the vision model, which performs the image classification using the vision transformer, from which the prediction result is obtained in terms of the prediction probability.
• Second, the sensor information, such as temperature and soil moisture, is passed through the semantic model, which also performs the classification but using semantic technology, i.e. a domain ontology and reasoning with Semantic Web Rule Language (SWRL) rules. As in the case of the vision model, the output is obtained in terms of a prediction probability. In addition to the prediction probability, the symptom information and explanations are also obtained as outputs.
• Finally, the prediction results from the vision model and semantic model are combined (or ensembled) using weighted majority voting. In the proposed approach, the component that performs this task is referred to as the decision engine. After combining the predictions, the explanations are generated by fetching the information from the domain ontology.
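For illustration only, the class and property structure described in this section can be mirrored as plain Python data. This is not the authors' OWL file; the helper function and the domain/range tuples are assumptions based on the prose above:

```python
# Hypothetical, simplified mirror of the cassava-disease class hierarchy.
SUBCLASSES = {
    "PlantDisease": ["BacterialPlantDisease", "ViralPlantDisease",
                     "FungalPlantDisease"],
}

# Object properties linking Plant, Field and Soil, as described in the text
# (domain/range pairs are our reading of the prose, not the OWL file).
OBJECT_PROPERTIES = {
    "isPlantedIn": ("Plant", "Field"),
    "hasPlant": ("Field", "Plant"),
}

def is_subclass(child, parent):
    """True if `child` is a direct subclass of `parent` in this sketch."""
    return child in SUBCLASSES.get(parent, [])
```

In the actual system this structure lives in the OWL ontology and is queried through Owlready2 rather than through Python dictionaries.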
1 https://wiki.plantontology.org/index.php/Plant_Stress_Ontology
2 https://cropontology.org/ontology/CO_334

T.R. Chhetri et al. Expert Systems With Applications 233 (2023) 120955

Fig. 3. Proposed approach.

3.2.1. Vision model
The vision model performs the image classification task, making disease predictions based on cassava image data. This research utilises the vision transformer within the vision model. However, experiments
with other pre-trained models such as RESNEXT50_32X4D³ and EfficientNetV2S (Tan & Le, 2021b) were also performed. In contrast to CNNs, the vision transformer is the most recent advancement in computer vision and is utilised by the vision model. The vision transformer, via the attention mechanism, enables the modelling of long-range dependencies and provides greater flexibility for modelling visual content (Yuan et al., 2021). The vision model in this study employs the state-of-the-art vision transformer, Vision Outlooker (VOLO) (Yuan et al., 2021), the first model of its kind to achieve an accuracy greater than 87% on the ImageNet⁴-1K benchmark dataset without additional training data.
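For context, the expensive quantity in question is the attention-weight matrix that standard transformers compute from queries Q and keys K. A tiny pure-Python sketch of that computation follows (illustrative only; this is the standard scaled dot-product, not VOLO's outlook attention itself):

```python
import math

def softmax(row):
    """Numerically plain softmax over one list of scores."""
    exps = [math.exp(v) for v in row]
    total = sum(exps)
    return [v / total for v in exps]

def scaled_dot_product_weights(Q, K, d):
    """Attention weights Softmax(Q K^T / sqrt(d)) for small 2-D lists.

    VOLO's outlook attention is designed to avoid paying this cost
    over all token pairs."""
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
              for q in Q]
    return [softmax(row) for row in scores]

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
weights = scaled_dot_product_weights(Q, K, d=2)
# each row of `weights` is a probability distribution over the keys
```

For n tokens this requires an n × n score matrix, which is what motivates lighter-weight alternatives such as VOLO's linear projections.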
Fig. 4. A snippet of the CBB disease SWRL rule based on observations of soil temperature that is utilised by the semantic model for disease classification.

VOLO improves on this by introducing a lightweight attention mechanism that can represent fine-level information based on aggregation and extrapolation, and that reduces the complexity of the expensive dot products (i.e. Softmax(QᵀK/√d), where d is the dimension) through linear projection. The vision model receives the image as input and generates a prediction regarding the plant's health, i.e. whether it is healthy or infected with a particular disease. The output of the prediction is the predicted probability, which indicates the likelihood that a particular disease has occurred.

3.2.2. Semantic model
The semantic model, as its name suggests, is based on semantic technology and is the second component of the proposed methodology. The semantic model performs SWRL (Semantic Web Rule Language) (Horrocks et al., 2004) reasoning over the input sensor observations, such as soil temperature, soil moisture and relative humidity, using the domain ontology (see Section 3.1). The SWRL reasoning can be defined as reasoning that corresponds to finding new assertions Â ∋ A ∈ H based on the set of SWRL rules R of the form R = B → H applied to the domain ontology O of the form O = (T, A), where T refers to the terms (also called the vocabulary) that capture the particular domain and is the union of the classes (C) and properties (P). The A ∋ A_C ∪ A_P in the ontology denotes the assertions made regarding classes and properties. The B in an SWRL rule represents the body axioms, also called the antecedent, and H, also referred to as the consequent, is the head axiom. Fig. 4 shows a snippet of the SWRL rule used in this study, where hasCondition(?plant, CBBSoilTemperature) ∈ H and the remaining atoms belong to B. The SWRL rule in Fig. 4 is used for predicting CBB (cassava bacterial blight) disease based on the soil temperature (or sensor observation). The inference model component of the semantic model performs the reasoning. Like the vision model, the semantic model yields its final output in terms of a probability. In contrast to the vision model, the semantic model also generates disease explanations based on the reasoning result (or prediction) using the domain ontology. Since no sensor dataset is available, simulated sensor data (see Section 4.1) are used for the semantic model. Fig. 5 shows a semantic model prediction.

The overall steps involved in the semantic model are summarised as follows:
• First, the inference model component of the semantic model obtains the sensor observations, such as temperature and soil moisture.
• The inference model then populates (or annotates) the domain ontology with the received sensor observations (i.e. creates a KG instance).
• Lastly, the corresponding SWRL rule based on the sensor observations is retrieved, and reasoning is performed for disease prediction.

3.2.3. Decision engine
The decision engine combines (or ensembles) the predictions from the vision model and the semantic model to produce the final prediction. The steps involved in combining the results of the semantic model and the vision model are depicted in Algorithm 1. As shown in Algorithm 1, the predictions represented by the probabilities from the vision model (predictionVisionmodel) and the semantic model (predictionSemanticmodel) are obtained. The predictions from the vision model and the semantic model are then ensembled (finalPrediction). The weighted majority

3 https://pytorch.org/vision/main/models/generated/torchvision.models.resnext50_32x4d.html
4 https://www.image-net.org


Fig. 5. A snippet of prediction from the semantic model after performing SWRL reasoning over the sensor observation.
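The kind of SWRL rule shown in Fig. 4 amounts to an interval check over a sensor observation. A simplified, illustrative stand-in for the actual Pellet reasoning follows; the threshold values 25 and 30 are the CBB soil-temperature rule boundaries given later in Table 1:

```python
def cbb_soil_temperature_rule(soil_temperature_celsius):
    """Simplified stand-in for the Fig. 4 SWRL rule: the plant gets the
    CBBSoilTemperature condition when the observed soil temperature lies
    inside the CBB rule boundaries (Lower = 25, Upper = 30, per Table 1)."""
    return 25 <= soil_temperature_celsius <= 30

in_range = cbb_soil_temperature_rule(27)      # True: the rule fires
out_of_range = cbb_soil_temperature_rule(35)  # False: the rule does not fire
```

In the deployed system these conditions are expressed as SWRL rule atoms over the ontology and evaluated by the reasoner, not hard-coded in Python.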

voting (Raschka & Mirjalili, 2017), as indicated by Eq. (1), is used for ensembling the prediction results. The reason for choosing weighted majority voting is that it allows fine-grained control over the prediction results, as a greater weight can be assigned to the classifier on which one wishes to rely. This is beneficial in certain situations, such as those with poor visibility. For instance, during periods of poor visibility, such as the winter or rainy season, it may be desirable to rely on sensor observations that are not affected by inclement weather, which in the case of this study means relying on the semantic model. This can be accomplished by assigning a greater weight to the semantic model's predictions.

Prediction_decision_engine = argmax_c Σ_{i=1}^{n} w_i p_i^c,
    c ∈ {CBB, CBSD, CGM, CMD, Healthy},
    i ∈ {Vision model, Semantic model}    (1)

In Eq. (1), i indexes the classifiers and p_i^c is the probability that classifier i predicts for class c. w_i is the weight of classifier i. In this study, there are two classifiers, the semantic model and the vision model; therefore, n in Eq. (1) is 2. CBSD (cassava brown streak disease), CMD (cassava mosaic disease), CBB (cassava bacterial blight), CGM (cassava green mite) and Healthy in Eq. (1) represent the different classes (or target labels).

For example, let p_vision_model = [0.21, 0.05, 0.05, 0.19, 0.5]⁵ be the prediction (i.e. the predicted probabilities) from the vision model and p_semantic_model = [0.49, 0.3, 0.4, 0.5, 0.19] the prediction from the semantic model. With only the vision model, the prediction would be Healthy, and based only on the semantic model it would be CMD. Now, if the weight w_vision_model = 0.3 is assigned to the vision model and w_semantic_model = 0.8 to the semantic model, then, normalising the weighted sum by the total weight (0.3 + 0.8 = 1.1), the resulting prediction would be Prediction_decision_engine = [0.414, 0.232, 0.305, 0.415, 0.275] and the disease (or resulting target label) would be CMD. However, if the weights are changed to w_semantic_model = w_vision_model = 0.5, the final prediction would be Prediction_decision_engine = [0.35, 0.175, 0.225, 0.345, 0.345] and the disease would be CBB. The weighted majority voting approach used by the decision engine therefore allows fine-grained control over the predictions. At the same time, if the weights differ by a large margin, say 0.8 (or 80%), this can have a negative impact on the prediction. Therefore, it is recommended not to use a large weight difference unless one wishes to rely more on one model (or classifier). Moreover, it is recommended to use suitable weights, and finding suitable weights necessitates experimentation, as the weights differ from use case to use case, similar to other ML hyperparameters.

Following the combination of the predictions from the vision model and the semantic model, the final prediction is obtained. The final prediction about the disease is then used to generate the user-comprehensible natural language explanations. The explanations are generated by retrieving disease information from the domain ontology based on the final prediction result. The final prediction results, along with the user-comprehensible explanations and the confidence scores from the individual vision and semantic models, are returned in JSON (JavaScript Object Notation) format, as shown in Fig. 6.

Algorithm 1: Combining the prediction results from the vision model and the semantic model
Input: Disease predictions from the semantic model and vision model
Output: Disease prediction with a user-level explanation of the disease
1 predictionVisionmodel ← vision model predicted probability scores;
2 predictionSemanticmodel ← semantic model predicted probability scores;
3 finalPrediction ← combine_predictions(predictionVisionmodel, predictionSemanticmodel);
4 explanation ← get_explanation_about_disease(finalPrediction);
5 finalResult ← combine_results_for_user(finalPrediction, explanation, predictionVisionmodel, predictionSemanticmodel);
6 return finalResult;

4. Experiment

This section provides details about the experiment. Section 4.1 provides information about the datasets, and Section 4.2 provides information about the libraries and systems used for the implementation and to conduct the experiment. Section 4.3 details the implementation of the proposed approach as deployable software. In a similar fashion, Section 4.4 describes the evaluation metrics, and Section 4.5 describes training and testing.

4.1. Datasets

This research utilises the cassava image dataset made available by the Makerere University AI Lab⁶ for training the convolutional neural networks (or image classification). The Makerere University AI Lab provided two distinct datasets for Kaggle competitions, one in 2019⁷ and one in 2020⁸ (Mwebaze et al., 2019). In this study, the combined dataset from 2019 and 2020, which is available at (Gohil, 2021), is utilised. The combined dataset consists of 27053 image samples in total. The dataset, along with healthy images, contains the four most common cassava diseases: CBSD, CMD, CBB and CGM. Details about these diseases are available in the study by Mwebaze et al. (2019). Fig. 7 shows the distribution of the four prevalent cassava diseases and the healthy images in this study's dataset. From Fig. 7, it can be observed that the dataset contains only a small number of healthy samples, 2893 out of the total dataset. Moreover, it can also be observed that CMD is more prevalent,

5 The probabilities of occurrence are ordered as follows: CBB, CBSD, CGM, CMD and Healthy.
6 https://air.ug/
7 https://www.kaggle.com/c/cassava-disease
8 https://www.kaggle.com/c/cassava-leaf-disease-classification
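The ensembling rule of Eq. (1) and the worked example above can be reproduced in a few lines of plain Python. This is an illustrative sketch (the function name is not from the released code), assuming the weighted sum is normalised by the total weight, as in the worked example:

```python
CLASSES = ["CBB", "CBSD", "CGM", "CMD", "Healthy"]

def weighted_majority_vote(p_vision, p_semantic, w_vision, w_semantic):
    """Eq. (1): weight-normalised combination of two per-class
    probability vectors, followed by an argmax over the classes."""
    total = w_vision + w_semantic
    combined = [(w_vision * pv + w_semantic * ps) / total
                for pv, ps in zip(p_vision, p_semantic)]
    label = CLASSES[max(range(len(CLASSES)), key=combined.__getitem__)]
    return combined, label

p_vision = [0.21, 0.05, 0.05, 0.19, 0.5]
p_semantic = [0.49, 0.3, 0.4, 0.5, 0.19]

scores, label = weighted_majority_vote(p_vision, p_semantic, 0.3, 0.8)
# label is "CMD", matching the worked example
scores_eq, label_eq = weighted_majority_vote(p_vision, p_semantic, 0.5, 0.5)
# label_eq is "CBB"
```

Changing only the weights flips the decision between CMD and CBB, which is exactly the fine-grained control over the ensemble discussed above.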


Fig. 6. The final prediction result after combining the predictions from the vision model and semantic model along with the user-level explanations about the disease and the
prediction confidence of the vision model (visual_certainty) and semantic model (knowledge_certainty).
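Based on the caption of Fig. 6, the JSON returned by the decision engine plausibly has roughly the following shape. Apart from visual_certainty and knowledge_certainty, which appear in the caption, the field names and values here are assumptions:

```python
import json

# Hypothetical shape of the decision engine's JSON response; only the
# visual_certainty / knowledge_certainty names come from Fig. 6's caption.
result = {
    "prediction": "CMD",
    "explanation": "Cassava mosaic disease; placeholder user-level text "
                   "retrieved from the domain ontology.",
    "visual_certainty": 0.415,     # confidence of the vision model
    "knowledge_certainty": 0.5,    # confidence of the semantic model
}
payload = json.dumps(result)
decoded = json.loads(payload)
```

Returning both certainties alongside the final label lets the user see when the two models disagreed, which supports the user-level explainability goal.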

Fig. 7. Distribution of cassava image data based on target class labels.

which is why CMD occupies a large number of samples in the dataset, followed by CBSD, CGM and CBB.

The dataset was collected via crowdsourcing from approximately 200 farmers whose farms were located in various regions of Uganda (Mwebaze et al., 2019); as a result, the image quality varies, making the computer vision task challenging. Fig. 8 depicts images extracted from the dataset for each disease and for the healthy class. The 2020 dataset contains additional complexities in comparison to the 2019 dataset, such as multiple diseases associated with each plant (Mwebaze et al., 2019). Therefore, the complexity of the combined dataset is anticipated to be greater.

The semantic model (i.e. the semantics-based disease classification) requires sensor data, such as soil information for cassava plants, but no such dataset exists. Because of this, simulated data were generated. This approach to the generation and use of simulated data is inspired by other areas of computer science, such as cloud computing, ML and predictive maintenance research, where, in the absence of real-world data, simulated data are generated and used (Anzolin et al., 2021; Fakhfakh et al., 2017; Greff et al., 2022; Kannammal et al., 2023; Rawat et al., 2021). For example, in a study by Chhetri et al., in the absence of target labels in the dataset, the target labels were generated manually following the failure characteristics of the cloud computing environment (Chhetri, Dehury, et al., 2022), which is a situation analogous to the generation of cassava disease sensor data in this study. The dataset was created using information obtained from experts about the cassava plant. The simulated sensor data were generated with a uniform distribution range for each of the five classes, simulating the favourable disease conditions. On the basis of the available expert knowledge about cassava plants, the distribution's limits were chosen manually so that the probability p of a certain condition occurring is 50 < p ≤ 65. Moreover, Table 1 shows the simulated sensor data for each of the cassava diseases. The minimum and maximum values (indicated by Min and Max) in Table 1 are the limits of the uniform distribution, while the Probability indicates the likelihood that the input will fall within the specified disease. The lower and upper boundaries (Lower and Upper in Table 1) indicate the rule boundaries used in the SWRL reasoning for the semantics-based classification. In the case of "Healthy", however, there is no probability value. This is because, for the diseases, the same sensor observation value can cause different diseases; for example, if the soil moisture value is 0.5, it could lead to both CBB and CBSD. This is not the case for "Healthy", which is the reason it has no probability value. The effects of moisture, humidity, pH (potential of hydrogen) and temperature on crops (including cassava) have been extensively


Fig. 8. Images of cassava corresponding to the five classes in the dataset, including healthy cassava leaves and four prevalent diseases.
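The uniform-range simulation described in the text can be sketched as follows, using the Min/Max bounds for CBB soil temperature (23-31 °C) listed in Table 1; the helper function name is hypothetical:

```python
import random

def simulate_sensor_value(min_value, max_value, seed=None):
    """Draw one simulated sensor observation from a uniform range,
    mirroring how the study generates per-class sensor data (Table 1)."""
    rng = random.Random(seed)
    return rng.uniform(min_value, max_value)

# CBB soil temperature is simulated within its Min/Max limits of 23-31 degrees.
value = simulate_sensor_value(23, 31, seed=42)
```

Because the rule boundaries (Lower/Upper) are strictly inside the simulation limits (Min/Max), only part of the simulated draws trigger the corresponding SWRL rule, which is what produces the probability column of Table 1.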

studied; therefore, the reader is referred to the studies (Luampon & Charmongkolpradit, 2019; Seena Radhakrishnan et al., 2022) for additional information on their implications.

Table 1
The rules that are used for generating the sensor data. Moisture and relative humidity are measured as fractions, indicated by the range 0–1, and temperature in degrees Celsius (0–100). The pH is measured in units between 0–14.

Disease   Sensor              Lower   Upper   Min    Max    Probability
CBB       Soil moisture       0.3     1       0.1    1      77.78%
          Soil pH             6.5     7.2     6.3    7.5    58.33%
          Soil temperature    25      30      23     31     62.50%
          Relative humidity   0.7     0.85    0.65   0.88   65.22%
CBSD      Soil moisture       0.1     1       0      1      90.00%
          Soil temperature    10      32      5      33     78.57%
CGM       Soil moisture       0.7     1       0.55   1      66.67%
          Relative humidity   0.7     1       0.6    1      75.00%
          Soil temperature    27      40      24     40     81.25%
CMD       Temperature         30      50      24     50     76.92%
          Soil moisture       0.3     1       0.1    1      77.78%
          Soil temperature    20      32      18     34     75.00%
          Relative humidity   0.8     1       0.7    1      66.67%
Healthy   Soil moisture       –       –       0.2    0.8    –
          Soil temperature    –       –       5      40     –
          Relative humidity   –       –       0.2    0.8    –
          Temperature         –       –       10     50     –
          Soil pH             –       –       3      10     –

Table 2
Overall configuration of the LEO4.

Node type    Nodes   Cores/Node        Memory/Node   GPUs
Standard     44      28 × Broadwell    64 GB         –
Big memory   4       28 × Broadwell    512 GB        –
Fat memory   1       80 × Skylake      3000 GB       –
GPU          1       28 × Skylake      384 GB        4 × Nvidia Tesla V100

Table 3
Specification of the Nvidia Tesla V100 GPU used in LEO4.

Performance    Double precision   7.8 TFlop/s (teraflops per second)
               Single precision   15.7 TFlop/s
               Half precision     31.3 TFlop/s
               Tensor             125 TFlop/s
Interconnect   NVLink             300 GB/s
Memory         Capacity           32 GB
               Bandwidth          900 GB/s

Table 4
A list of the libraries and software used to implement the proposed method.

Libraries/Software   Version
Python               3.10
Docker               20.10
Owlready2            0.37
FastAPI              0.75.1
Requests             2.27.1
BentoML              1.0.0a7
PyTorch              1.12.0
timm                 0.6.5

4.2. System setup
This section describes the software and libraries utilised in the implementation of the proposed system. In addition, this section includes details about the systems used to conduct the experiment.

Table 4 shows the list of the libraries and software that were utilised to implement the proposed work. The implementation is performed in the Python⁹ programming language. Docker¹⁰, FastAPI¹¹ and Requests¹² are used to modularise the implementation. Owlready2¹³ is used to deal with the semantic model, namely ontologies and KGs, and to perform reasoning. PyTorch¹⁴ is used for the implementation of the vision model, and BentoML¹⁵ is used for model serving purposes. The library timm¹⁶ is used for the model implementations and the ImageNet weights.

The supercomputer LEO4¹⁷ is used to perform the experiment. LEO4 is a high-performance compute cluster at the University of Innsbruck. Table 2 shows the overall configuration of LEO4. LEO4 consists of a total of 50 nodes, totalling 1452 cores and a total memory of 8.4 terabytes. LEO4 is powered by either Broadwell or Skylake Intel Xeon processors. Similarly, the LEO4 GPU (graphics processing unit) node utilised in the experiment consists of 4 Nvidia Tesla V100¹⁸ GPUs, with a total memory of 384 GB (gigabytes) per node. Table 3 shows the detailed specification of the LEO4 GPU, the Nvidia Tesla V100¹⁹, providing details on performance, memory and interconnectivity. In particular, the GPU used in LEO4 has NVLink²⁰ connectivity for high-speed data transfer. Moreover, LEO4 uses a high-performance, low-latency 100 Gb/s (gigabits per second) InfiniBand interconnect for MPI (Message Passing Interface) communication between nodes and for the GPFS (General Parallel File System) file system.

4.3. Implementation

This section describes the implementation details of the proposed approach as deployable software, to facilitate the reusability and accessibility of the proposed work in real-world deployment scenarios. The implementation follows the microservices strategy (or architecture). This is because of the scalability that microservices offer. Microservices allow scaling of the individual components independently. With the microservices architecture, for instance, compute-intensive system

9 https://www.python.org/
10 https://www.docker.com/
11 https://fastapi.tiangolo.com/
12 https://requests.readthedocs.io/en/latest/
13 https://owlready2.readthedocs.io/en/v0.37/
14 https://pytorch.org/
15 https://docs.bentoml.org/en/latest/
16 https://timm.fast.ai/
17 https://www.uibk.ac.at/zid/systeme/hpc-systeme/leo4/
18 https://www.nvidia.com/en-us/data-center/v100/
19 https://www.uibk.ac.at/zid/systeme/hpc-systeme/common/software/leo-gpu.html
20 https://www.nvidia.com/en-us/data-center/nvlink/


components can be deployed on high-performance servers or cloud systems. The source code for the implementation and other resources, such as the ontology, are available at (Chhetri, Hohenegger, et al., 2022).

For the implementation of the vision model service, BentoML is used, which is an open-source framework for serving models at production scale and provides easy deployment of ML models (see footnote 15). The implementation of BentoML for model serving includes the following steps: (i) saving the trained model weights with BentoML; (ii) defining the service configuration, such as defining the API (Application Programming Interface) service for prediction using the @svc.api decorator; and (iii) performing input image preprocessing, such as resizing. The runner is then invoked, which translates the API definition into an HTTP (Hypertext Transfer Protocol) endpoint /predict for making a prediction.

The semantic model (or semantic classifier) REST (Representational State Transfer) service implementation makes use of libraries such as FastAPI and Owlready2, together with the domain ontology. The semantic model consists of three endpoints. /soils/{soil_id}/observations is used for sensor observations of soil data, such as soil temperature. The second endpoint, /fields/{field_id}/observations, is used for observations in the field, such as whether it rained. The endpoint /plants/{plant_id}/disease-vector is used to retrieve the specific disease information about a particular plant based on its unique identifier (ID). The ID in this study is initialised as a number between 1–6. Using the domain ontology and SWRL reasoning, the disease information is retrieved. The Pellet reasoner available in Owlready2 is used to perform the reasoning. In addition, in order to prevent any unnecessary inconsistencies in the predictions, the previous observations are deleted. The domain ontology, which is saved as an OWL file, is loaded at service startup and stored in an internal triple store.

The decision engine, which is implemented as another service component, consists of the realisation of the decision engine steps outlined in Section 3.2. The decision engine takes the image and plant information as input and calls the REST endpoints /predict and /plants/{plant_id}/disease-vector from the image classifier service component and the semantic model service component to obtain the predictions. The obtained predictions are then combined to produce the final prediction along with their explanations. The relevant information is retrieved from the domain ontology and the explanations are generated based on the obtained predictions.

4.4. Evaluation metrics

In ML, evaluation metrics help to understand the performance of a model (or how well the model is likely to perform) in an untested scenario. In this study, the proposed approach is evaluated in terms of accuracy and of the latency of making an inference (prediction latency).

4.4.1. Accuracy, precision, recall and F1 score
Accuracy measures the overall proportion of correct predictions made by the model. Precision, also known as the positive predictive value, measures the model's exactness, i.e. the proportion of predicted positives that are correct, whereas recall measures the proportion of actual positives that are correctly identified. Recall is also known as sensitivity or the true positive rate. The F1 score represents the harmonic mean of precision and recall. Accuracy, precision, recall and F1 score can be calculated using Eqs. (2), (3), (4) and (5), respectively. In Eqs. (2), (3) and (4), TP stands for the true positive count and TN for the true negative count. In the same way, FP represents a false positive and FN a false negative. True positives and true negatives are values that the model accurately predicts for both the positive and negative class labels, in this case cassava disease and healthy. A false positive, on the other hand, is a prediction that is identified by the model as positive while in reality it is not. A false negative is similar to a false positive, but for the negative class.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (2)

Precision = TP / (TP + FP)    (3)

Recall = TP / (TP + FN)    (4)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (5)

4.4.2. Prediction latency
The prediction latency quantifies the time required to make a prediction. It is measured by calculating the difference between the invocation and response times using Eq. (6), where t_i represents the prediction invocation time and t_r the response time.

Latency_prediction = t_r − t_i    (6)

4.5. Training and testing

This section describes the training and testing of the different pre-trained models, RESNEXT50 (Xie et al., 2017), EfficientNetV2S (Tan & Le, 2021a) and VOLO, that were experimented with. These models were chosen based on their superior performance, which has been shown in earlier studies such as (Chen et al., 2022) and (Ravi et al., 2022). In addition, this section describes the testing of the proposed method, which combines the predictions from the vision model and the semantic model.

The training and testing of the different pre-trained models, VOLO, EfficientNetV2S and RESNEXT50, was performed using the cassava image dataset (see Section 4.1). 80% of the dataset was used for training, and the remaining 20% was used for testing. Fig. 9 illustrates the distribution of the test data, which demonstrates that, similar to the original dataset, the test data are unbalanced. Transfer learning was utilised for the models' training using PyTorch. Similarly, the library timm was used to load the models and the pretrained ImageNet weights. The hyperparameters used for training the different pre-trained models are specified in Table 5. The values of the hyperparameters in this study were determined based on experimentation and the authors' prior experience with ML research (Chhetri, Dehury, et al., 2022; Chhetri, Kurteva, et al., 2022). The training was performed over 10 epochs with a batch size of 32 for RESNEXT50_32X4D and 16 for EfficientNetV2S and VOLO. The other significant criterion is the learning rate, which determines the success of the learning process (how well the algorithm learns). The Adam optimiser²¹ with a learning rate of 10⁻⁴ and a weight decay of 10⁻⁶ is employed. The Adam optimiser is an adaptive learning rate optimiser capable of handling dynamic situations, such as a loss that is either increasing or decreasing. Moreover, a learning rate scheduler is also used, with a cosine annealing schedule with warm restarts (Loshchilov & Hutter, 2016). As the loss function, the Taylor cross entropy loss (Feng et al., 2021) with label smoothing (Müller et al., 2019) is used, which has been demonstrated to be robust against noisy datasets (Chen et al., 2022). A value of 0.05 is used for the label smoothing. Additionally, following the findings of previous works, different data augmentation techniques are used, details of which are available in Table 5. In the case of the semantic model, however, no training is required, as the semantic model is based on SWRL rules and the rules are defined manually.

For the testing, different levels of testing were performed. The tests that were performed are presented below.

21 https://pytorch.org/docs/stable/generated/torch.optim.Adam.html
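Eqs. (2)-(5), and the latency measurement of Eq. (6), translate directly into code. A minimal sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score as defined in Eqs. (2)-(5)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

def prediction_latency(t_invocation, t_response):
    """Eq. (6): elapsed time between invoking a prediction and its response."""
    return t_response - t_invocation

acc, prec, rec, f1 = classification_metrics(tp=5, tn=2, fp=1, fn=2)
# acc = 0.7; algebraically, f1 equals 2*TP / (2*TP + FP + FN) = 10/13
```

Note that the harmonic-mean form of F1 simplifies to 2·TP / (2·TP + FP + FN), which is a handy cross-check when implementing it.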


Fig. 9. Class-based distribution of test cassava image data.
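The 80/20 split described in Section 4.5 can be sketched as follows, applied to the 27053 samples of the combined dataset; the helper is hypothetical and the study's actual split code may differ:

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle and split a list into train/test parts (80/20 by default),
    as used for the cassava images in this study."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(27053)))
# 21643 training samples and 5410 test samples
```

A plain random split like this preserves the class imbalance of the original dataset, which is why the test distribution in Fig. 9 is also unbalanced.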

Table 5
Hyperparameters of the various CNN-based vision models and the vision transformer, where LR is the learning rate, p is the probability, std is the standard deviation, T_0 represents the number of iterations for the first restart, T_mult is the factor by which T_i increases after a restart and eta_min is the minimum LR.

Model                         Batch size
RESNEXT50_32X4D               32
EfficientNetV2S (256 × 256)   16
EfficientNetV2S (384 × 384)   16
VOLO-D1 (224 × 224)           16
VOLO-D2 (224 × 224)           16
VOLO-D1 (384 × 384)           16
VOLO-D2 (384 × 384)           16

All models share the remaining hyperparameters: LR = 1.00E−04; loss = Taylor cross entropy loss with label smoothing; LR scheduling = cosine annealing with warm restarts (T_0 = 10, T_mult = 1, eta_min = 1e-6); data augmentation = RandomResizedCrop, HorizontalFlip (p = 0.5), Transpose (p = 0.5), VerticalFlip (p = 0.5), ShiftScaleRotate (p = 0.5), Normalise(mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]).
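The LR schedule in Table 5, cosine annealing with warm restarts (Loshchilov & Hutter, 2016), follows a closed-form formula per cycle. A sketch for the values in Table 5 (T_0 = 10, T_mult = 1, eta_min = 1e-6), assuming a base LR of 1e-4:

```python
import math

def cosine_annealing_warm_restarts(epoch, base_lr=1e-4, eta_min=1e-6, t_0=10):
    """LR at a given epoch under cosine annealing with warm restarts.

    Simplified for T_mult = 1, so every cycle has length t_0 and the
    schedule restarts at base_lr every t_0 epochs."""
    t_cur = epoch % t_0  # position within the current cycle
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0))

lr_start = cosine_annealing_warm_restarts(0)     # base LR at the start
lr_mid = cosine_annealing_warm_restarts(5)       # decayed mid-cycle value
lr_restart = cosine_annealing_warm_restarts(10)  # warm restart back to base LR
```

The LR decays from 1e-4 towards eta_min over each 10-epoch cycle and then jumps back up, which matches the 10-epoch training runs used in this study.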

1. The first test is conducted on the various pre-trained computer vision models that are being examined. The 20% test split of the cassava dataset was used for this purpose.
2. The second evaluation concerns the proposed method, which combines a semantic model and a vision model to improve the accuracy of disease predictions and provide user-level disease explanations. As different computer vision models are experimented with, the one with the highest performance (evaluated in Step 1) is selected to combine with the semantic model. The explainability was evaluated qualitatively using three identified dimensions: (i) correctness, i.e. whether the generated explanations are correct; (ii) usefulness, i.e. whether the generated explanations are helpful to the users; and (iii) comprehension, i.e. whether the generated explanations are understandable to the users. The correctness of the explanations was manually examined by inspecting the generated explanations. Regarding the remaining two dimensions, comprehension and usefulness, an online survey²² was conducted. The survey asked participants if they were able to understand the SHAP-based explanations as well as the explanations provided by the proposed approach. The survey also asked about the usefulness of these explanations, both those based on SHAP and those of the proposed method, in terms of understanding the disease. In addition to questions about explainability, the survey also asked questions related to demographics, age groups and education. The details of how the survey was conducted and the evaluations are presented in Section 5.
3. The third test concerns latency. As this study implements the proposed approach as a fully deployable system (see Section 4.3), the prediction latency is also evaluated.

The results of the tests conducted are discussed in Section 5.

5. Results and discussion

This section provides the experimental results from the experiment in this study. Moreover, this study also provides a comparison of the experimental results with state-of-the-art studies.

As discussed in Section 4.5, the first evaluation concerns the pre-trained models. Table 6 shows the accuracy of the vision transformer, VOLO, and also of the other CNN models, EfficientNetV2S and RESNEXT50, that were experimented with. In the results of the vision models, only the accuracy scores of the models experimented with are reported, except for the

22 The survey questions are available in the GitHub repository where the code is stored.


Table 6
A performance comparison of various CNN-based vision models and a vision transformer.

Model              Image size   Accuracy
RESNEXT50_32X4D    256 × 256    0.8721
EfficientNetV2S    256 × 256    0.8785
EfficientNetV2S    384 × 384    0.8840
VOLO-D1            224 × 224    0.8256
VOLO-D2            224 × 224    0.8764
VOLO-D1            384 × 384    0.8914
VOLO-D2            384 × 384    0.8964

Table 7
Individual class-level performance evaluation of the VOLO-D2 model with an image size of 384 × 384, i.e. the best performing vision model.

Class name   Precision   Recall   Accuracy   F1-score
CBB          0.699       0.652    0.964      0.675
CBSD         0.864       0.853    0.963      0.859
CGM          0.864       0.801    0.963      0.831
CMD          0.949       0.975    0.955      0.962
Healthy      0.768       0.754    0.948      0.761
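The per-class values in Table 7 pair the usual precision/recall/F1 definitions with a one-vs-rest accuracy. As a rough illustration only (this is not the paper's evaluation code; the helper name and toy labels below are hypothetical), such a table can be reproduced from raw predictions:

```python
def per_class_metrics(y_true, y_pred, labels):
    """Per-class precision, recall, F1, and one-vs-rest accuracy,
    in the style of Table 7."""
    n = len(y_true)
    table = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        table[c] = {"precision": precision, "recall": recall,
                    "accuracy": (tp + tn) / n, "f1": f1}
    return table

# Toy example with the five classes used in this study.
y_true = ["CMD", "CMD", "CBB", "CBSD", "Healthy", "CMD"]
y_pred = ["CMD", "CMD", "CBSD", "CBSD", "Healthy", "CBB"]
metrics = per_class_metrics(y_true, y_pred,
                            ["CBB", "CBSD", "CGM", "CMD", "Healthy"])
```

Note that the one-vs-rest accuracy can stay high even when precision and recall collapse for a rare class, which is exactly the pattern discussed below for CBB.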

recorded. The reason for recording only accuracy, despite the fact that accuracy can sometimes be deceptive, is the test data distribution (see Section 4.5), which is unbalanced, but not to a degree that leads to deceptive results. This is also consistent with the studies used for comparison with this work (Chen et al., 2022; Paiva-Peredo, 2023), which likewise only record accuracy. From Table 6, it can be observed that VOLO (a vision transformer) clearly outperforms the corresponding CNN models. Another interesting observation is that a VOLO variant is also the model with the lowest performance: VOLO-D1 with an image size of 224 performs worst, while VOLO-D2 with an image size of 384 performs best. Moreover, the precision, recall, and F1 scores of the best performing model, VOLO-D2, are 0.828, 0.807, and 0.818, respectively, indicating the robustness of the model. The accuracy of VOLO in this study is consistent with the original study, in which VOLO-D1 was also the lowest performing model. The second best-performing model is EfficientNetV2S with an image size of 384. This mirrors VOLO, where both VOLO-D1 and VOLO-D2 with an image size of 384 perform better: as the image size gets smaller, features are lost, which affects performance, as observed. EfficientNetV2S and RESNEXT50 with an image size of 256 have comparable performance, with a small difference of 0.0063 in accuracy. The best performing VOLO model improves on the best performing EfficientNetV2S model by 0.012 in accuracy and on RESNEXT50 by 0.024.

Fig. 10. Confusion matrix of the VOLO-D2 model with an image size of 384 × 384, i.e. the best performing vision model.

Fig. 10 likewise shows the confusion matrix of the VOLO-D2 (best performing) model, and Table 7 shows its individual class-level performance evaluation. As can be observed from the confusion matrix, and also from the precision and recall, there is still considerable misclassification. For example, CBB shows high accuracy but a large drop in precision and recall. This is because of the class imbalance. A similar observation was made by (Chen et al., 2022) and (Paiva-Peredo, 2023), who also report misclassification due to a large class imbalance. In the case of CMD, however, no such drop is observed: precision, recall, and accuracy are all greater than or equal to 94%. Upon close observation of the precision and recall, there is no significant difference between them, indicating no overfitting or underfitting; similar conclusions can be drawn from the F1 score. A further reason for the misclassification is that the dataset used in this study contains noisy data: the images were taken by farmers under different lighting conditions, for example, which affects performance. Moreover, studies such as (Ferentinos, 2018) have demonstrated that the accuracy of computer vision models on real-world datasets can drop as low as 33 percent.

As discussed in Section 4.5, the second evaluation focuses on the proposed method, which combines the vision model with the semantic model. For this, the best performing vision model is taken, which in this study is VOLO-D2 with an image size of 384 × 384 (or VOLO-D2@384x384). Table 8 shows the accuracy after combining the vision model with the semantic model, as well as the weights used for weighted majority voting. As can be clearly observed, integrating the semantic model improves the accuracy. The enhancement is 0.0086 in accuracy. While the improvement is modest, the result is still superior to the state-of-the-art.

In addition to the evaluation of the performance, the explainability is also evaluated. A qualitative evaluation was performed using three dimensions, (i) correctness, (ii) usefulness, and (iii) comprehension, to determine whether the generated natural language explanations about the disease (see Section 3.2.3) were helpful and informative enough to convey the disease information to users. For correctness, the generated explanations were checked by the authors, who manually inspected them and compared them against the information present in the ontology, on which the explanations were based. The generated explanations were found to be correct (100%) and aligned with the predictions. The second dimension is usefulness, where the generated explanations were evaluated to see whether they were informative enough. The third dimension, comprehension, examines whether the generated natural language explanations are sufficiently comprehensible and written in a natural language that avoids technical jargon incomprehensible to non-expert users. For the evaluation of the second and third dimensions, i.e. usefulness and comprehension, an online survey was conducted following the study by (Ball, 2019). Fig. 11 depicts the steps taken to assess these two dimensions.

First, the designed survey was shared among the participants to gather their opinions. The participants were invited via private channels and their peer groups. The following criteria were used for inviting the participants: (i) they had to be over 18 years old; and (ii) they had to have a valid email address and access to a computer or mobile phone in order to be able to take the survey. In addition, farming experience, which is a major focus, was also considered, but not as a hard requirement. This is to evaluate the benefits of the explanations of the proposed approach in other domains, as the proposed approach is generalisable. While inviting participants from other domains, the education level was taken into account. The final invited participants had


Fig. 11. The evaluation process for an online survey.

backgrounds in computer science, healthcare, social science, agriculture, finance, education, and engineering. A total of 22 responses were collected. Since the survey was conducted online, random responses are possible. To avoid random responses, the collected survey data was subsequently filtered based on the free-text responses provided in the survey. Following data filtering, two responses were eliminated from the survey. Responses were removed when they were highly inconsistent, no corresponding explanations were provided, or random answers were given in the free-text option. After preprocessing, the remaining 20 responses were used for further analysis and to draw conclusions. As part of the preprocessing, the responses were anonymised, cleaned (the cleaned version of the survey data is available in the GitHub repository where the code is present), i.e. the free-text responses were removed as they were not consistent, and organised. Furthermore, as part of the preprocessing, adjustments were also made to the responses for the question that asked the participants to rate the usefulness of the explanations (measured on a scale between 1 and 5); only the responses related to the breakdown of the SHAP explanations' usefulness were modified, and adjustments were made to four responses. This was done because of an observed anomaly: some participants responded that they did not find the explanations useful and also did not understand them, but nevertheless rated the explanations as useful on the 1-to-5 scale. Such responses were corrected by adjusting the given rating to 1, which means ''not at all useful''. The analysis was performed using Tableau (https://www.tableau.com/), a data visualisation and analysis tool.

The participants were from different age groups and geographical distributions. 50% of the participants were in the age group of 25–34, 25% were in the 18–24 age group, 10% were in the 35–44 age group, 10% were in the 45–54 age group, and the remaining 5% were in the age group of 65 and above. Similarly, 55% of the participants were from Nigeria, 15% were from Ghana, and the remaining 10% each came from Austria, the Netherlands, and the United States of America. In terms of educational qualifications, the majority of the participants (50%) have either completed a bachelor's degree or are pursuing one. Similarly, 30% of respondents are either studying for or have completed a PhD, followed by those with a master's degree (completed or in progress), who constitute 15% of the participants; 5% of the participants are at the diploma level. The occupational backgrounds of the participants, which are shown in Fig. 12, vary widely. However, despite the varying occupational backgrounds, the majority of the respondents are either still involved in farming or were involved in it in the past: sixty-five percent of respondents indicated that they are currently engaged in farming or have been in the past.

In terms of AI familiarity, the majority of survey respondents, 90%, were familiar with AI. However, as can be observed in Fig. 13, the majority of them were not experts; only around 5% had a good understanding of AI. The remainder had either heard of AI or had a moderate understanding of it. Together with the occupational data, it can be concluded that the majority of respondents are non-expert AI users, which makes the survey results more useful for the evaluation, as the explainability is geared towards regular users and not AI specialists.

The original input image, the corresponding prediction (top 1), and the SHAP explanation values were displayed in order to assess the explainability of SHAP. The SHAP explainability image from Section 1 was used. Then, a yes-or-no question was asked to determine whether the survey participants found the explanation helpful for understanding the disease, i.e. the predicted disease. However, prior to posing the question, information regarding the meaning of SHAP values was provided. After analysing the survey data, an insightful observation can be made. Fig. 14 compares the SHAP-based explanations and the explanations based on the proposed approach in terms of usefulness and comprehension. As can be observed from Fig. 14(a), 65% of the respondents found the SHAP explanations (see Fig. 1) useful, while 35% did not find them helpful. However, in terms of comprehension (see Fig. 14(b)), 65% of respondents were not able to understand the disease following the SHAP-based explanations. The remaining 35% of survey participants who were able to understand the SHAP-based explanations could be attributed to those with higher education qualifications and those with a background in computer science or working in a related field. Moreover, Fig. 15 shows a fine-grained analysis of the SHAP-based explanations. As can be observed from Fig. 15, 25% of the participants found the SHAP-based explanations very useful, 20% found them slightly useful, 15% found them moderately useful, and 5% found them extremely useful. Moreover, following the occupational statistics (see Fig. 12), it can be said that the SHAP-based explanations are not very useful, even for highly qualified people such as researchers. Overall, a significant proportion of participants found the SHAP-based explanations to be only slightly to moderately useful.

Similarly, when it comes to the explainability of the proposed method of this study, as shown in Fig. 14(c), 95% of participants found the explanations helpful (or useful), with 5% being the exception. Regarding comprehension, 85% of the participants (see Fig. 14(d)) were able to understand the explanations of the proposed approach, while 15% were not. This indicates that there is still room for improvement in the proposed approach, which can be one of the future research directions. Similar feedback was received in the open-text responses, where participants mentioned the need to add examples of symptoms to the explanations, suggested adding images along with the textual explanations, and also suggested providing explanations on how to prevent the illness. A similar remark about not being able to see the images was also given where the usefulness of the proposed method's explainability was marked with the response ''no'', i.e. where participants did not find it useful. Table 10 in the Appendix provides the free-text survey responses from the participants for both SHAP and the proposed approach. Additionally, in comparison with the comprehension of the SHAP-based explainability, the proposed approach's comprehension is 50% higher, demonstrating the advantages of this study's explainability approach.

Furthermore, Fig. 16 depicts a fine-grained evaluation of the usefulness (or helpfulness) of the explanations generated by the proposed method. As can be observed from Fig. 16, with the exception of 5% of participants, all of them find the explanations generated by the

Fig. 12. Occupation-based distribution of survey respondents.

Fig. 13. The familiarity of the participants with AI, which is measured on a scale of 1–5. The respective options are: (1) never heard of it, (2) heard of it, (3) know a little, (4)
know a fair amount, and (5) know it well.

proposed approach extremely useful, 60% find them very useful, 20% find them moderately useful, and 5% find them slightly useful. Comparing the usefulness of the proposed approach with that of the SHAP-based explanations clearly shows the benefit of the explainability based on the proposed approach and validates the claim about its benefits.

In summary, the following conclusions can be drawn from the evaluation of the survey responses: (i) the visualisation of SHAP-based explanations is still useful even if it is not comprehensible; (ii) the proposed approach's explainability is very effective, both in terms of usefulness and comprehension, compared to explanations based on SHAP (or LIME), which are popular and becoming a de-facto standard in ML explainability (Scapin et al., 2022); (iii) the fact that participants from diverse fields were able to understand and find the explanations useful demonstrates the high value of the proposed explainability approach, even in other domains; and (iv) despite being highly effective, there is still room for improvement in the explainability of the proposed approach, namely to show where in the plant the diseases are and to add more explanations of the cause and remedy of the diseases.

Table 9 shows the comparison of the results of the proposed approach with the state-of-the-art studies in terms of accuracy. A study's relevance (i.e. whether it utilises the same dataset as this study) is the primary criterion for its selection. The second criterion is its recency (2022 and 2023). In addition, the best performance reported in each state-of-the-art study is selected when choosing the performance for


Fig. 14. Comparison of the usefulness and comprehension of the SHAP-based explanations (top) and the proposed approach (bottom). (a) Distribution of the participants' responses according to the usefulness of the explanations generated using SHAP. (b) Participant distribution based on their comprehension of the SHAP-based explanations. (c) Distribution of the participants' responses based on the usefulness of the explanations generated using the proposed approach. (d) Participant distribution based on their comprehension of the proposed approach's explanations.
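The percentage breakdowns visualised in Fig. 14 (and the rating distributions in Figs. 15 and 16) are simple frequency counts over the cleaned responses. A hypothetical pandas sketch follows; the column names and values are illustrative, not taken from the released survey data:

```python
import pandas as pd

# Hypothetical cleaned survey table; column names are illustrative.
responses = pd.DataFrame({
    "shap_useful":     ["yes", "no", "yes", "no", "yes"],
    "shap_understood": ["no", "no", "yes", "no", "yes"],
    "proposed_rating": [4, 3, 5, 4, 1],  # usefulness on a 1-5 scale
})

# Percentage breakdowns, as visualised in Fig. 14(a)-(d).
shap_useful_pct = responses["shap_useful"].value_counts(normalize=True) * 100

# Fine-grained rating distribution, as in Figs. 15 and 16.
rating_pct = (responses["proposed_rating"]
              .value_counts(normalize=True)
              .sort_index() * 100)
```

With the toy data above, `shap_useful_pct` would report 60% ''yes'' and 40% ''no''; the published figures were produced with Tableau rather than code, so this is only a reproducibility aid.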

Fig. 15. The level of usefulness of the SHAP explanations that are measured on a scale of 1–5. The respective options are: (1) not at all useful, (2) slightly useful, (3) moderately
useful, (4) very useful, and (5) extremely useful.

15
T.R. Chhetri et al. Expert Systems With Applications 233 (2023) 120955

Fig. 16. The level of usefulness of the explanations based on the proposed approach of this study, measured on a scale of 1–5. The respective options are: (1) not at all useful, (2) slightly useful, (3) moderately useful, (4) very useful, and (5) extremely useful.

Fig. 17. The average prediction time (or prediction latency) measured over 5268 samples for the vision model, the semantic model, and the combination of predictions from the vision and semantic models (i.e. the decision engine), i.e. the proposed approach.
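Average latencies such as those in Fig. 17 can be obtained with a simple timing loop. A minimal sketch, where `predict` is a stand-in for the actual vision, semantic, or combined model call (not the paper's benchmarking code):

```python
import time

def average_latency(predict, samples):
    """Mean per-sample prediction time in seconds over a batch of samples."""
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return (time.perf_counter() - start) / len(samples)

# Dummy predictor standing in for a real model call.
mean_seconds = average_latency(lambda x: x * 2, list(range(1000)))
```

For GPU-backed models, a warm-up pass and device synchronisation before reading the clock would be needed to avoid under-counting asynchronous work.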

comparison. Table 9 clearly demonstrates that the proposed method outperforms the state-of-the-art methods, with the exception of Kumar et al.'s (Kumar et al., 2023) work, which outperforms the result of this study by 0.0025. However, there are two important differences to take into account: (i) Kumar et al.'s (Kumar et al., 2023) work does not provide explainability, and (ii) this study uses a larger, noisy cassava image dataset, as it uses the combined datasets released by the Makerere University AI Lab in 2019 and 2020 (see Section 4.1), while Kumar et al.'s (Kumar et al., 2023) work only uses the dataset from 2019. A similar situation is present in other studies. Therefore, a conclusion can be made that the proposed method is robust.

Table 8
The accuracy after combining the predictions from the vision model and the semantic model, i.e. the proposed approach.

Vision model weight   Semantic model weight   Combined accuracy
0.5                   0.5                     0.905

The third evaluation is on latency, as discussed in Section 4.5. Fig. 17 illustrates the average inference time of the proposed approach. The semantic model requires an average of 2.91 s, while the vision model requires only 0.91 s. The vision model utilises GPUs, whereas the semantic model does not; this is one reason for the significant difference in inference time between the two models. In addition, the combined average inference time is 3.85 s, which is acceptable given the nature of the task (i.e. no hard real-time requirements). Therefore, a conclusion can be drawn that the proposed approach is also suitable for real-world deployment.
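The decision engine's weighted majority vote over the two models' outputs (Table 8) can be sketched as follows. This is an illustrative reading of weighted voting, assuming each model emits per-class scores; the function and variable names are hypothetical, not from the released code:

```python
def weighted_vote(vision_scores, semantic_scores,
                  w_vision=0.5, w_semantic=0.5):
    """Combine two per-class score dictionaries by a weighted vote
    and return the winning class label."""
    classes = set(vision_scores) | set(semantic_scores)
    combined = {c: w_vision * vision_scores.get(c, 0.0)
                   + w_semantic * semantic_scores.get(c, 0.0)
                for c in classes}
    return max(combined, key=combined.get)

# Contextual evidence from the semantic model can flip a borderline
# vision prediction: 0.5*0.48 + 0.5*0.90 = 0.69 for CMD vs 0.31 for CBSD.
vision = {"CMD": 0.48, "CBSD": 0.52}
semantic = {"CMD": 0.90, "CBSD": 0.10}
winner = weighted_vote(vision, semantic)  # "CMD"
```

With equal weights of 0.5, as in Table 8, the semantic model's domain knowledge only overrides the vision model when its confidence margin is larger, which is consistent with the modest but systematic accuracy gain reported.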


Table 9
A comparison of the proposed method to the state-of-the-art studies in terms of accuracy and user-level explainability.

Study                       Model/Methods                                                      Accuracy   User-level explanation
(Emmanuel et al., 2023)     MobileNet V2                                                       0.901      No
(Ahishakiye et al., 2023)   Ensemble of GLVQ, GMLVQ and LGMLVQ                                 0.82       No
(Kumar et al., 2023)        Ensemble of EfficientNet, SEResNeXt, ViT, DeIT and MobileNetV3     0.9075     No
(Paiva-Peredo, 2023)        DenseNet169                                                        0.7477     No
(Chen et al., 2022)         Smooth-Taylor CE                                                   0.893      No
(Ravi et al., 2022)         A_L_EfficientNet                                                   0.8708     No
(Riaz et al., 2022)         EfficientNetB3                                                     0.8303     No
(Anitha & Saranya, 2022)    CNN with data augmentation                                         0.90       No
The proposed approach       Fusion of DL and semantic technology                               0.905      Yes

6. Conclusion

This study presents novel work on combining semantic technology and DL, with a focus on cassava disease. This research demonstrated how semantic technology and DL can be combined to address the following limitations: (i) the lack of domain knowledge and contextual information in DL (see Section 3), and (ii) the lack of user-level explainability of DL (see Section 3.2.3). Moreover, by combining semantic technology and DL, as demonstrated by the results and the comparison with the state-of-the-art works (see Section 5), this study also addressed a limitation of semantic technology, namely its limited ability to learn complex patterns the way DL does. Furthermore, through the latency evaluation, this study demonstrated the suitability of the proposed approach for real-world scenarios.

From the performance evaluation and the comparison with state-of-the-art studies, this study demonstrated the benefits of the proposed method and its robustness. However, the true benefit lies in the generalisability of the proposed approach, which can be used to solve other problems in the agricultural (and other) domains; in its ability to incorporate domain knowledge and generate user-level explainability (see Section 3.2.3); and in its combination of the symbolic approach based on semantic technology with a non-symbolic, also referred to as numeric, approach (i.e. a neuro-symbolic approach). This is particularly important when building AI (or AI systems) in a societal context, as the majority of the users are non-experts, and it is essential that they do not blindly trust predictions from the AI system. Moreover, this work enables the inclusion of knowledge and rich contextual information that is not available in the data, e.g. images, which helps improve prediction, as demonstrated by the results (see Table 8), and helps build trustworthy AI systems.

However, if the domain knowledge is inaccurate or highly biased, it can act as a bottleneck, decreasing rather than increasing the accuracy of the prediction. Therefore, the information (or domain knowledge) should be incorporated with extreme caution, and multiple domain experts should be involved to eliminate (or reduce) bias. In the case of strict real-time requirements, the semantic model's performance can be a bottleneck, and this is therefore regarded as a limitation of this study that requires future improvement.

As potential future work, three directions are identified. The first is to make additional performance enhancements and incorporate more domain knowledge. The second is to apply the proposed work to domains, such as healthcare, that require AI systems that are explainable, perform well, and can also make use of domain knowledge. The third research direction is to improve the explainability further following the user responses, as discussed in Section 5.

CRediT authorship contribution statement

Tek Raj Chhetri: Conceptualization, Methodology, Investigation, Validation, Data curation, Writing – original draft, Writing – review & editing, Visualisation, Software, Formal analysis. Armin Hohenegger: Investigation, Software, Validation, Formal analysis, Data curation, Writing – review & editing, Visualisation. Anna Fensel: Writing – review & editing, Funding acquisition. Mariam Aramide Kasali: Writing – review & editing, Investigation. Asiru Afeez Adekunle: Writing – review & editing, Investigation.

Declaration of competing interest

The authors declare no conflict of interest.

Data availability

The data and code are made available online; details are provided in the manuscript.

Survey Data: Towards Improving Prediction Accuracy and User-Level Explainability Using Deep Learning and Knowledge Graphs: A Study on Cassava Disease (Original data) (GitHub)

Acknowledgements

This research is partially supported by the Interreg Österreich-Bayern 2014–2020 programme project KI-Net: Bausteine für KI-basierte Optimierungen in der industriellen Fertigung (building blocks for AI-based optimisations in industrial manufacturing; grant agreement: AB 292). We would like to thank the High-Performance Computing Centre (HPC) at the University of Innsbruck for providing the LEO HPC infrastructure for our experiment. We also want to express our gratitude to Michael Fink, a member of the HPC staff, for his assistance with HPC administrative tasks and for accelerating the procedure to reduce delay. In addition, we would like to thank everyone who participated in our survey and gave us permission to analyse and utilise their responses in our research.

Appendix

See Table 10.

Table 10
The responses from the participants giving the reasons why they selected a particular option. Responses that include text such as ''I don't understand'' or incomplete answers have been removed.

Reasons why participants selected the option ''yes'' for usefulness and comprehension (yes-or-no question):

SHAP-based explanations:
1. The colour in the prediction and explanation is same.
2. The spots show the affected area.
3. It affects chlorophyll production and leads to the reduction in crop yield.

Explanations based on the proposed approach:
1. It gave a written explanation of the disease.
2. It is clear what the disease is, the explanation is given and the percentage certainty are given as well.
3. The written text helps.
4. Learnt that the disease occurs mostly in moist soil.
5. Because the text tell the exact thing and it can be understood.
6. The generated explanation gave a view of how the disease thrives and this will aid in how to combat it.
7. Form the generated explanation, I could decipher that cassava mosaic is a high casualty disease that affects cassava plants.
8. There is a proper breakdown on the casual environment for this particular disease.
9. It explains the predisposing factors that could promote the disease growth.
10. I learnt that the disease could be devastating during the rainy season.

Reasons why participants selected the option ''no'' for usefulness and comprehension (yes-or-no question):

SHAP-based explanations:
1. Prediction is not clear to me.
2. The explanation is just a series of dots on a canvas. However, from the text above the prediction image, I am able to tell what the disease is.
3. Not clear how the cassava disease affects plants and where it shows up on the plant.

Explanations based on the proposed approach:
1. The explanation is not visible.

Reasons why participants gave particular scores for the usefulness ratings (measured on a scale between 1 and 5):

SHAP-based explanations:
1. I did not understand how the prediction model is made and how the explanation helps or adds to diagnose and explain additional information about the disease.
2. Not clear if the photo shows a distribution of the disease on the plant as in location (leaves or roots).
3. Because the colour is based on the observations. Some people may perceive the colour in different way.
4. I responded NO because I am not an avid user of AI and the generated images could not be easily understood.
5. It requires a high level of knowledge to understand.
6. It will help farmers to identify the disease faster.
7. It explains how wide the disease has spread.

Explanations based on the proposed approach:
1. Since the prediction and explanation are correct, it is helpful to know when the disease is worse, what soil condition and temperature.
2. It is straight forward.
3. The explanation could be improved in terms of for instance cause, how it can be fixed and perhaps, other diseases that may look the same even to a human observer.
4. Should add examples of the symptoms in the text.
5. Having more explanation with some graphics as well would be helpful.
6. I gave the above rating because seeing the explanation above is an Avenue to know how to move with the precautions necessary to combat the disease.
7. Easily understood.
8. It gave detailed information about the disease.
9. I often see the plants on the field.
10. Not technical to understand.

References

Abbas, A., Jain, S., Gour, M., & Vankudothu, S. (2021). Tomato plant disease detection using transfer learning with C-GAN synthetic images. Computers and Electronics in Agriculture, 187, Article 106279. http://dx.doi.org/10.1016/j.compag.2021.106279.
Ahishakiye, E., Mwangi, W., Murithi, P., Wario, R., Kanobe, F., & Danison, T. (2023). An ensemble model based on learning vector quantization algorithms for early detection of cassava diseases using spectral data. In P. Ndayizigamiye, H. Twinomurinzi, B. Kalema, K. Bwalya, & M. Bembe (Eds.), Digital-for-development: Enabling transformation, inclusion and sustainability through ICTs (pp. 320–328). Cham: Springer Nature Switzerland.
Ajayi, C. O., & Olutumise, A. I. (2018). Determinants of food security and technical efficiency of cassava farmers in Ondo State, Nigeria. International Food and Agribusiness Management Review, 21(7), 915–928. http://dx.doi.org/10.22434/ifamr2016.0151.
Amador-Domínguez, E., Serrano, E., & Manrique, D. (2023). GEnI: A framework for the generation of explanations and insights of knowledge graph embedding predictions. Neurocomputing, 521, 199–212. http://dx.doi.org/10.1016/j.neucom.2022.12.010.
American Phytopathological Society (2016). Plant disease ontology. https://github.com/Planteome/plant-disease-ontology. (Online accessed 18 July 2022).
Anitha, J., & Saranya, N. (2022). Cassava leaf disease identification and detection using deep learning approach. International Journal of Computers Communications & Control, 17(2). http://dx.doi.org/10.15837/ijccc.2022.2.4356.
Anzolin, A., Toppi, J., Petti, M., Cincotti, F., & Astolfi, L. (2021). SEED-G: Simulated EEG data generator for testing connectivity algorithms. Sensors, 21(11), 3632. http://dx.doi.org/10.3390/s21113632.
Ashwinkumar, S., Rajagopal, S., Manimaran, V., & Jegajothi, B. (2022). Automated plant leaf disease detection and classification using optimal MobileNet based convolutional neural networks. Materials Today: Proceedings, 51, 480–487. http://dx.doi.org/10.1016/j.matpr.2021.05.584, CMAE'21.
Atila, Ü., Uçar, M., Akyol, K., & Uçar, E. (2021). Plant leaf disease classification using EfficientNet deep learning model. Ecological Informatics, 61, Article 101182. http://dx.doi.org/10.1016/j.ecoinf.2020.101182.
Ayoub Shaikh, T., Rasool, T., & Rasheed Lone, F. (2022). Towards leveraging the role of machine learning and artificial intelligence in precision agriculture and smart farming. Computers and Electronics in Agriculture, 198, Article 107119. http://dx.doi.org/10.1016/j.compag.2022.107119.
Bahani, K., Ali-Ou-Salah, H., Moujabbir, M., Oukarfi, B., & Ramdani, M. (2020). A novel interpretable model for solar radiation prediction based on adaptive fuzzy clustering and linguistic hedges. In Proceedings of the 13th international conference on intelligent systems: Theories and applications. New York, NY, USA: Association for Computing Machinery. http://dx.doi.org/10.1145/3419604.3419807.
Ball, H. L. (2019). Conducting online surveys. Journal of Human Lactation, 35(3), 413–417. http://dx.doi.org/10.1177/0890334419848734, PMID: 31084575.
Bedi, P., & Gole, P. (2021). Plant disease detection using hybrid model based on convolutional autoencoder and convolutional neural network. Artificial Intelligence in Agriculture, 5, 90–101. http://dx.doi.org/10.1016/j.aiia.2021.05.002.
Benedikt, M., Kersting, K., Kolaitis, P. G., & Neider, D. (2020). Logic and learning (Dagstuhl Seminar 19361). Dagstuhl Reports, 9(9), 1–22. http://dx.doi.org/10.4230/DagRep.9.9.1.
Bochkovskiy, A., Wang, C.-Y., & Liao, H.-Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Chaddad, A., Peng, J., Xu, J., & Bouridane, A. (2023). Survey of explainable AI techniques in healthcare. Sensors, 23(2), 634. http://dx.doi.org/10.3390/s23020634.
Chan, M.-C., Pai, K.-C., Su, S.-A., Wang, M.-S., Wu, C.-L., & Chao, W.-C. (2022). Explainable machine learning to predict long-term mortality in critically ill ventilated patients: a retrospective study in central Taiwan. BMC Medical Informatics and Decision Making, 22(1), 75. http://dx.doi.org/10.1186/s12911-022-01817-6.
Chen, Y., Xu, K., Zhou, P., Ban, X., & He, D. (2022). Improved cross entropy loss
Holzinger, A., & Müller, H. (2021). Toward human–AI interfaces to support explainability and causability in medical AI. Computer, 54(10), 78–86. http://dx.doi.org/10.1109/MC.2021.3092610.
Horrocks, I., Patel-Schneider, P. F., Boley, H., Tabet, S., Grosof, B., & Dean, M. (2004). SWRL: A semantic web rule language combining OWL and RuleML. W3C Member Submission, 21(79), 1–31.
Janowicz, K., Haller, A., Cox, S. J., Le Phuoc, D., & Lefrançois, M. (2019). SOSA: A lightweight ontology for sensors, observations, samples, and actuators. Journal of Web Semantics, 56, 1–10. http://dx.doi.org/10.1016/j.websem.2018.06.003.
Jearanaiwongkul, W., Anutariya, C., & Andres, F. (2018). An ontology-based approach to plant disease identification system. In IAIT 2018, Proceedings of the 10th international conference on advances in information technology. New York, NY, USA: Association for Computing Machinery. http://dx.doi.org/10.1145/3291280.3291786.
Jiménez-Luna, J., Grisoni, F., & Schneider, G. (2020). Drug discovery with explainable artificial intelligence. Nature Machine Intelligence, 2(10), 573–584. http://dx.doi.org/10.1038/s42256-020-00236-4.
Kannammal, A., Guhanesvar, M., & Venketesz, R. R. (2023). Predictive maintenance for remote field IoT devices—A deep learning and cloud-based approach. In S. Shakya, G. Papakostas, & K. A. Kamel (Eds.), Mobile computing and sustainable informatics (pp. 567–585). Singapore: Springer Nature Singapore.
Kumar, H., Velu, S., Lokesh, A., Suman, K., & Chebrolu, S. (2023). Cassava leaf disease detection using ensembling of EfficientNet, SEResNeXt, ViT, DeIT and MobileNetV3 models. In R. P. Yadav, S. J. Nanda, P. S. Rana, & M.-H. Lim (Eds.), Proceedings of the international conference on paradigms of computing, communication and data sciences (pp. 183–193). Singapore: Springer Nature Singapore.
Kuzlu, M., Cali, U., Sharma, V., & Güler, Ö. (2020). Gaining insight into solar photovoltaic power generation forecasting utilizing explainable artificial intelligence tools. IEEE Access, 8, 187814–187823. http://dx.doi.org/10.1109/ACCESS.2020.3031477.
for noisy labels in vision leaf disease classification. IET Image Processing, 16(6), Lacasta, J., Lopez-Pellicer, F. J., Espejo-García, B., Nogueras-Iso, J., & Zarazaga-Soria, F.
1511–1519. http://dx.doi.org/10.1049/ipr2.12402. J. (2018). Agricultural recommendation system for crop protection. Computers and
Chen, J., Zhang, D., Suzauddola, M., & Zeb, A. (2021). Identifying crop diseases Electronics in Agriculture, 152, 82–89. http://dx.doi.org/10.1016/j.compag.2018.06.
using attention embedded MobileNet-V2 model. Applied Soft Computing, 113, Article 049.
107901. http://dx.doi.org/10.1016/j.asoc.2021.107901. Lagos-Ortiz, K., Medina-Moreira, J., Paredes-Valverde, M. A., Espinoza-Morán, W., &
Chhetri, T. R., Dehury, C. K., Lind, A., Srirama, S. N., & Fensel, A. (2022). A combined Valencia-García, R. (2017). An ontology-based decision support system for the
system metrics approach to cloud service reliability using artificial intelligence. Big diagnosis of plant diseases. Journal of Information Technology Research (JITR), 10(4),
Data and Cognitive Computing, 6(1), 26. http://dx.doi.org/10.3390/bdcc6010026. 42–55. http://dx.doi.org/10.4018/JITR.2017100103.
Chhetri, T. R., Hohenegger, A., Fensel, A., Aramide, K. M., & Adekunle, A. A. Loshchilov, I., & Hutter, F. (2016). Sgdr: Stochastic gradient descent with warm restarts.
(2022). Code: Towards an explainable artificial intelligence using deep learning arXiv preprint arXiv:1608.03983.
and knowledge graphs: A study on cassava disease. https://github.com/Research- Luampon, R., & Charmongkolpradit, S. (2019). Temperature and relative humidity effect
Tek/xai-cassava-agriculture. (Last accessed 28 March 2023). on equilibrium moisture content of cassava pulp. Research in Agricultural Engineering,
Chhetri, T. R., Kurteva, A., Adigun, J. G., & Fensel, A. (2022). Knowledge graph 65(1), 13–19.
based hard drive failure prediction. Sensors, 22(3), 985. http://dx.doi.org/10.3390/ Lucic, A., Ter Hoeve, M. A., Tolomei, G., De Rijke, M., & Silvestri, F. (2022).
s22030985. CF-GNNExplainer: Counterfactual explanations for graph neural networks. In
Detras, J., Borja, F. N., McNally, K., Mauleon, R., William, J. M. P., Ruaraidh, E., G. Camps-Valls, F. J. R. Ruiz, & I. Valera (Eds.), Proceedings of machine learning
Hamilton, S., & Grenier, C. (2016). Rice ontology. https://cropontology.org/term/ research: vol. 151, Proceedings of the 25th international conference on artificial
CO_320:ROOT. (Online accessed 19 July 2022). intelligence and statistics (pp. 4499–4511). PMLR, URL: https://proceedings.mlr.
Emmanuel, A., Mwangi, R. W., Murithi, P., Fredrick, K., & Danison, T. (2023). press/v151/lucic22a.html.
Classification of cassava leaf diseases using deep Gaussian transfer learning model. Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model
Engineering Reports, Article e12651. http://dx.doi.org/10.1002/eng2.12651. predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus,
Fakhfakh, F., Kacem, H. H., & Kacem, A. H. (2017). Simulation tools for cloud S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing
computing: A survey and comparative study. In 2017 IEEE/ACIS 16th international systems, vol. 30. Curran Associates, Inc., URL: https://proceedings.neurips.cc/paper_
conference on computer and information science (pp. 221–226). http://dx.doi.org/10. files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf.
1109/ICIS.2017.7959997. Machlev, R., Heistrene, L., Perl, M., Levy, K., Belikov, J., Mannor, S., & Levron, Y.
Feng, L., Shu, S., Lin, Z., Lv, F., Li, L., & An, B. (2021). Can cross entropy loss be (2022). Explainable artificial intelligence (XAI) techniques for energy and power
robust to label noise? In Proceedings of the twenty-ninth international conference on systems: Review, challenges and opportunities. Energy and AI, 9, Article 100169.
international joint conferences on artificial intelligence (pp. 2206–2212). http://dx.doi.org/10.1016/j.egyai.2022.100169.
Ferentinos, K. P. (2018). Deep learning models for plant disease detection and diagnosis. Mitrentsis, G., & Lens, H. (2022). An interpretable probabilistic model for short-term
Computers and Electronics in Agriculture, 145, 311–318. http://dx.doi.org/10.1016/ solar power forecasting using natural gradient boosting. Applied Energy, 309, Article
j.compag.2018.01.009. 118473. http://dx.doi.org/10.1016/j.apenergy.2021.118473.
Food and Agriculture Organization of the United Nations (2021). Climate change fans Müller, R., Kornblith, S., & Hinton, G. E. (2019). When does label smoothing help? In
spread of pests and threatens plants and crops, new FAO study. URL: https://www. H. Wallach, H. Larochelle, A. Beygelzimer, F. d’ Alché-Buc, E. Fox, & R. Garnett
fao.org/news/story/en/item/1402920/icode/. (Last accessed 31 March 2023). (Eds.), Advances in neural information processing systems, vol. 32. Curran Associates,
Gaur, M., Faldu, K., & Sheth, A. (2021). Semantics of the black-box: Can knowledge Inc.
graphs help make deep learning systems more interpretable and explainable? IEEE Mwebaze, E., Gebru, T., Frome, A., Nsumba, S., & Tusubira, J. (2019). iCassava 2019
Internet Computing, 25(1), 51–59. http://dx.doi.org/10.1109/MIC.2020.3031769. fine-grained visual categorization challenge. http://dx.doi.org/10.48550/ARXIV.
Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-infused learning: 1908.02900, arXiv.
A sweet spot in neuro-symbolic AI. IEEE Internet Computing, 26(4), 5–11. http: Nagasubramanian, K., Jones, S., Singh, A. K., Sarkar, S., Singh, A., & Ganapathysubra-
//dx.doi.org/10.1109/MIC.2022.3179759. manian, B. (2019). Plant disease identification using explainable 3D deep learning
Gohil, S. (2021). Dataset: Cassava plant disease Merged 2019–2020. https:// on hyperspectral images. Plant Methods, 15(1), 98. http://dx.doi.org/10.1186/
www.kaggle.com/datasets/srg9000/cassava-plant-disease-merged-20192020. (Last s13007-019-0479-8.
accessed 27 August 2022). Noy, N., & McGuinness, D. L. (2001). Ontology development 101: A guide to creating your
Greff, K., Belletti, F., Beyer, L., Doersch, C., Du, Y., Duckworth, D., Fleet, D. J., first ontology: Technical report KSL-01-05 and SMI-2001-0880, Stanford Knowledge
Gnanapragasam, D., Golemo, F., Herrmann, C., Kipf, T., Kundu, A., Lagun, D., Systems Laboratory and Stanford Medical Informatics.
Laradji, I., Liu, H.-T. D., Meyer, H., Miao, Y., Nowrouzezahrai, D., Oztireli, C., .... Paiva-Peredo, E. (2023). Deep learning for the classification of cassava leaf diseases
Tagliasacchi, A. (2022). Kubric: A scalable dataset generator. In Proceedings of the in unbalanced field data set. In I. Woungang, S. K. Dhurandher, K. K. Pattanaik,
IEEE/CVF conference on computer vision and pattern recognition (pp. 3749–3761). A. Verma, & P. Verma (Eds.), Advanced network technologies and intelligent computing
Halabi, A. (2009). Plant protection ontology. https://sites.google.com/site/ppontology/ (pp. 101–114). Cham: Springer Nature Switzerland.
home. (Online accessed 19 July 2022). Raschka, S., & Mirjalili, V. (2017). Python machine learning. Packt Publishing Ltd.
