0% found this document useful (0 votes)
24 views17 pages

An Efficient and Robust Approach Using Inductive Transfer-Based Ensemble Deep Neural Networks For Kidney Stone Detection

Uploaded by

ahmad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views17 pages

An Efficient and Robust Approach Using Inductive Transfer-Based Ensemble Deep Neural Networks For Kidney Stone Detection

Uploaded by

ahmad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Received 18 January 2024, accepted 17 February 2024, date of publication 26 February 2024, date of current version 6 March 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3370672

An Efficient and Robust Approach Using Inductive


Transfer-Based Ensemble Deep Neural
Networks for Kidney Stone Detection
JYOTISMITA CHAKI 1, (Member, IEEE), AND AYŞEGÜL UÇAR 2, (Senior Member, IEEE)
1 Schoolof Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
2 Engineering Faculty, Mechatronics Engineering Department, Firat University, 23119 Elazig, Turkey

Corresponding author: Jyotismita Chaki ([email protected])


This work was supported by the Vellore Institute of Technology, Vellore, India.

ABSTRACT Chronic kidney disorder is a global health problem involving the repercussions of impaired
kidney function and kidney failure. A kidney stone is a kidney scenario that impairs kidney function. Because
this disease is usually asymptomatic, early and quick detection of kidney problems is essential to avoid signif-
icant consequences. This study presents an automated detection of Computed Tomography (CT) kidney stone
images using an inductive transfer-based ensemble Deep Neural Network (DNN). Three datasets are created
for feature extraction from kidney CT images using pre-trained DNN models. After assembling several
pre-trained DNNs, such as DarkNet19, InceptionV3, and ResNet101, the ensemble deep feature vector is
created using feature concatenation. The Iterative ReliefF feature selection method is used to choose the most
informative ensemble deep feature vectors, which are then fed into the K Nearest Neighbor classifier tuned
using a Bayesian optimizer with a 10-fold cross-validation approach to detect kidney stones. The proposed
strategy achieves 99.8% and 96.7% accuracy using the quality and noisy image datasets, which are superior
to other DNN-based and traditional image detection approaches. This proposed automated approach can help
urologists confirm their physical inspection of kidney stones, reducing the possibility of human mistakes.

INDEX TERMS Cross-validation, deep learning, computed tomography, kidney stone, transfer learning,
ensemble network.

I. INTRODUCTION time for physicians, and this hectic working schedule may
Kidney diseases affect people of all ages and genders. Early result in an incorrect diagnosis.
detection of kidney diseases is critical, as it is for other dis- Furthermore, many patients suffer from kidney disorders,
eases. Chronic kidney disease can be deadly if not addressed. and a shortage of physicians may cause the treatment to be
Kidney stones must be identified and diagnosed as soon severely delayed. Computer-assisted medical solutions have
as possible. Early detection of tiny kidney stones helps to been developed to solve these challenges and diagnose kidney
avoid the development of chronic kidney disorders. There is disease at an early stage. Automated solutions reduce physi-
a constant increase in kidney patients globally, and several cian effort and potential human mistakes [2]. It is also always
nations (particularly third-world countries) have a shortage beneficial to achieve accurate outcomes free of subjectivity.
of nephrologists [1]. As a result, many people with kidney As a consequence, the designed system is always precise and
disease cannot obtain sufficient care. Patients with kidney sturdy.
disorders should be frequently screened, including medical Many automated research studies are now being conducted
imaging technology screening. These normal processes take on breast, lung, and heart disorders. However, there has
been little research on the recognition of kidney diseases.
To identify kidney disorders, ultrasound, Magnetic Reso-
The associate editor coordinating the review of this manuscript and nance Imaging (MRI), and Computed Tomography (CT) are
approving it for publication was Yong Yang . routinely utilized [3]. CT images have generally been utilized

2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
32894 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ VOLUME 12, 2024
J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

to identify, quantify, and segment kidney stone disease. The A. PRE-PROCESSING OF THE DATASET IMAGES
primary goal of automated applications in clinical practice 1) IMAGE BRIGHTNESS ENHANCEMENT BY USING THE
is to create an autonomous model that can identify kidney PROPOSED FUZZY INFERENCE SYSTEM
disorders and assist physicians in accurate treatment. The goal of creating a new Fuzzy Inference System (FIS)
Detection of medical imaging is a prominent research is to improve the brightness/contrast of the dataset images,
topic, and numerous autonomous models have been devel- thereby overcoming the limitations of previous techniques.
oped to diagnose the disorder reliably. Many Deep Learning The discrete pixel intensity is fuzzified using input Member-
(DL) and Machine Learning (ML) models have been utilized ship Functions (MFs) and then mapped to the output using
to achieve good performance in medical image recogni- IF-THEN rules to generate FIS. Finally, the output MFs are
tion [4], [5]. Despite using a single model to recognize med- used to generate a defuzzified value.
ical images, some researchers have used ensemble learning
to recognize medical images [41], [42], [43]. Therefore, this
2) IMAGE AUGMENTATION
study proposes a fusion of inductive transfer-based ensemble
Deep Neural Networks (DNNs) feature and ML classifier to Augmentation is used to increase the number of image sam-
recognize kidney stones. The inductive transfer-based ensem- ples in the dataset and add some variety. FINDWELL model’s
ble DNNs are used to extract the features from the kidney CT performance and results can be improved by data augmen-
images. tation. The data augmentation tools enhance and enrich the
In urology, DL and ML-based methods automatically data, allowing the model to perform better and more correctly.
detect ureteral and kidney stones. In this study, a model Data augmentation approaches lower operating expenses by
is proposed for the automated recognition of kidney stones including transformation into datasets. In this study, aug-
based on a fusion of ensemble DNN along with inductive mentation is done by using horizontal flip, vertical flip,
transfer and ML technique. The proposed technique aims horizontal+vertical flip, and adding salt and pepper noise in
to develop an automated kidney stone recognition model to the dataset images so that the FINDWELL model becomes
alleviate insufficient doctor discrepancies. This automated robust from these variations. Three different datasets are
system serves as a resource for nephrologists and radiologists. generated using the augmentation above approaches to check
This study proposes the Fusion of Inductive transfer-based the method’s efficacy.
eNsemble Deep netWork features, iterative rELiefF features
selector, and machine Learning (FINDWELL) model for kid- B. PROPOSED ENSEMBLE DEEP FEATURES BASED ON
ney stone detection. Before feature extraction, dataset images INDUCTIVE TRANSFER MODELS
are pre-processed in various ways, like image augmenta- This study proposes an Inductive transfer-based new ensem-
tion and brightness enhancement, using the proposed fuzzy ble deep feature extractor, which combines the features
inference system. Training DNNs from the beginning neces- extracted from Darknet19, InceptionV3, and ResNet101.
sitates a huge number of labeled images, which is extremely As a feature generation function, pre-trained DarkNet19
problematic in medical imaging because of the personnel is chosen to extract discriminative features. DarkNets [26]
costs and time required to build skillfully classified datasets. are lightweight CNNs that are exceptionally effective. For
To adjust the weights associated with DNNs for medical these reasons, DarkNet19 is used to extract the features in
image recognition, inductive transfer or transfer learning [6] this study. InceptionV3 [27] is less computationally costly.
has been used by previous researchers. In inductive transfer, As regularizers, it employs auxiliary classifiers. An auxiliary
the knowledge of an already trained deep learning model classifier is intended to improve the convergence of very deep
with large datasets is transferred to a new relatively small neural networks. In very deep networks, the auxiliary classi-
but similar dataset [7]. All extracted features may not help fier is primarily employed to tackle the vanishing gradient
recognize kidney stone images. Thus, an Iterative ReliefF problem. In the early rounds of training, the auxiliary classi-
(IRF) features selector [8] is incorporated in this study to fiers did not result in any improvement. However, in the end,
select only the meaningful features to reduce future com- the network with auxiliary classifiers outperformed the net-
putational complexity. To demonstrate the success of the work without auxiliary classifiers. To minimize the grid size
ensemble feature, a K-Nearest Neighbor (KNN) classifier of feature maps, max pooling, and average pooling were tra-
tuned by Bayesian Optimizer (BO) [9] is used to recognize ditionally utilized. The activation dimension of the network
the kidney stone images. The efficiency of the proposed filters is enhanced in the InceptionV3 model to minimize
FINDWELL model is demonstrated using three datasets, grid size effectively. For these reasons, InceptionV3 is used
which consist of high-quality and noisy kidney CT images. in this study as the second feature extractor. Networks with
Accuracy, precision, recall, F1 score, kappa score, Matthews many layers (even thousands) may be readily trained with
Correlation Coefficient (MCC), Good Detection rate (GDR) ResNet101 [28] without raising the training error percentage.
and Classification Success Index (CSI) are used to evaluate ResNet101 uses identity mapping to assist in solving the
the performance. vanishing gradient problem. Thus, this model is the 3rd choice
The overall contributions of the FINDWELL model are as in this study for extracting the dataset image feature. Lastly,
follows: an ensemble feature vector is developed by concatenating
VOLUME 12, 2024 32895
J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

the features extracted from Darknet19, InceptionV3, and the independent testing group had radiomics characteristics
Resnet101. of 24 kidney stones and 23 phleboliths. For the categoriza-
tion of 221 x-ray kidney stone images, Aksakalli et al. [14]
C. PROPOSED FEATURE SELECTOR used several machine learning approaches such as Decision
Feature selection is critical in data mining and DL, particu- Trees (DT), RF, SVM, Multilayer Perceptron (MLP), KNN,
larly for high-dimensional data. IRF [8] is a non-parametric NaiveBayes (BernoulliNB), and DNNs utilizing Convolu-
feature selection strategy that aims to maximize the accuracy tional Neural Network (CNN). The DT Classifier produced
of the recognition algorithms. The IRF feature selector selects the best classification results in the trials. There are some
meaningful features from the ensemble deep feature set for significant limitations to choosing only ML algorithms for
these reasons. the recognition of kidney stone images properly. ML requires
enormous, inclusive/unbiased, and high-quality data sets to
D. PROPOSED CLASSIFIER train the model. They may also have to wait for fresh data
In this study, the KNN classifier is used for kidney stone to be created at times. ML requires adequate time to let
recognition purposes for the following reasons. This classi- the algorithms learn and mature sufficiently to perform their
fier is simple and easy to understand. It is non-parametric. function with high accuracy and relevance. It also needs vast
No training is required when using the classifier. This clas- resources to function. This may need more excellent com-
sifier can handle large datasets. BO is used to tune the KNN puting resources. Another significant problem is the capacity
classifier’s hyperparameters. to understand the outcomes of the algorithms appropriately.
The rest of the paper is organized as follows. Section II Researchers must also carefully select the algorithms for the
discusses the previous works based on different ML algo- application. The error made in the early phases is massive,
rithms and inductive transfer-based DNNs for kidney stone and if not remedied at that time, it causes disaster. Bias and
detection. Section III discusses the proposed technique’s wrongdoing must be dealt with separately; they are unrelated.
methodology, which covers image augmentation, image ML is dependent on two factors: data and algorithms. The
enhancement using the proposed fuzzy brightness enhance- two variables determine all of the mistakes. Any errors in any
ment technique and generation of three different datasets variables would have a significant impact on the outcome.
from the original dataset, feature extraction using ensemble Because of the limitations mentioned above in ML
inductive transfer enabled DNNs, feature selection by using techniques, nowadays, researchers mainly use DL based
IRF and classification using KNN tuned by BO. Section IV techniques to recognize kidney stone images. For the classifi-
is concerned with the results of the FINDWELL model and cation of 1799 CT scan kidney stone images, Manoj et al. [15]
analysis of the model with the techniques used by previous used VGG16 architecture. VGG16 consists of 16 layers with
researchers in kidney stone recognition. Section V deals with learnable weights: 13 convolutional layers and three fully
the discussion of the article. Finally, Section VI concludes the connected layers. In [16], authors used ResNet50 to recog-
article. nize 2959 CT kidney stone images. ResNet is an abbreviation
for Residual Network. Microsoft Research launched The
II. RELATED WORKS ResNet design in 2015, and is widely regarded as one of the
This section discusses studies on different ML techniques, most popular Convolutional Neural Network architectures.
DNN models, and inductive transfer for kidney stone ResNet-50 is a convolutional neural network of 50 layers
recognition. (48 convolutional layers, one MaxPool layer, and one aver-
Serrat et al. [10] used random forests to recognize kid- age pool layer). For the segmentation of 260 kidney stone
ney stones. The authors present findings from a study of CT images, Li et al. [17] used SegNet. SegNet is a model
454 CT kidney stone images. The authors of [11] provide two for semantic segmentation. This core trainable segmentation
supervised learning approaches for automating and improv- architecture is made up of an encoder network, a decoder
ing the categorization of kidney stones using a collection of network, and a pixel-wise classification layer. The encoder
125 ureteroscope-captured kidney stone images. Image cues network’s design is topologically identical to the VGG16
that urologists use visually to determine the kidney stone network’s 13 convolutional layers. In [18], authors used a
are analyzed and encoded as vectors in the approaches. The Deep Kronecker Neural network (DKN) to recognize kid-
feature vectors are then classified using Random Forest (RF) ney stone CT images. The image input layer, Kronecker
and ensemble KNN classifiers. Verma et al. [12] begin by convolutional layer (Conv2D), Rectifier Linear unit (ReLu)
enhancing the kidney stone images with the median filter, activation layer, max pooling layer, flatten layer, Dropout
the Gaussian filter, and un-sharp masking. After that, the layer, and Dense layers are all combined in the developed
authors employ morphological procedures such as erosion model. Inception-V3 architecture with 48 layers is used by
and dilation to determine the region of interest, and ulti- Sabuncu et al. [19] to recognize 8209 kidney stone CT
mately, the authors apply KNN and Support Vector Machine images. In [20], authors used the Deep Believe Network
(SVM) classification approaches to analyze kidney stone (DBN) to detect 1377 kidney stone CT images. DBNs have
images. In [13], authors used AdaBoost to classify 211 kidney no direction in the first two levels, but the layers above them
stones and 201 phleboliths CT images. The 43 individuals in contain directed linkages to lower layers. DBNs are distinct
32896 VOLUME 12, 2024
J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

from standard neural networks in that they may function as


generating and discriminative models. DBNs are also dis-
tinct from other deep learning algorithms, such as Restricted
Boltzmann Machines (RBMs) or autoencoders, because they
do not deal with raw inputs like RBMs. Instead, they start
with an input layer with one neuron per input vector and go
through several layers until they reach a final layer where
outputs are created using probabilities derived from the acti-
vations of previous levels. To recognize 150 kidney stone CT
images, Sesha et al. [21] used AlexNet, an 8-layered DNN FIGURE 1. The block diagram of the proposed FINDWELL model.

architecture. DenseNet169, MobileNetV2, and GoogLeNet


are used by Bhardwaj and Shreenidhi [22] to recognize the
same. Compared to AlexNet, the GoogLeNet [23] design sig- III. PROPOSED METHODOLOGY
nificantly decreases the number of parameters learned. The The proposed methodology section is divided into five
architecture comprises 22 layers, including convolutional, sub-sections, including the proposed image fuzzy bright-
activation, and maximum pooling layers. MobileNetv2 [24] ness enhancement technique, the creation of three different
was built with 53 layers, and roughly 3.5 million parameters datasets (KD1, KD2, and KD3) by using image augmentation
were learned. To extract features from images, MobileNetV2 technique, feature extraction using inductive transfer-based
employs lightweight depth-wise convolutions. DenseNet169 DNNs, and the creation of ensemble feature vector, fea-
[25] has 169 layers and is relatively low in parameters com- ture selection by using IRF feature selection function, the
pared to other models. It also handles the vanishing gradient classification using KNN classifier and the performance mea-
problem effectively. The authors employed the cross-residual sure metric. The proposed approach (FINDWELL) is based
network (XResNet-50) model in [40] to identify kidney on Pre-Trained DNN (PT-DNN) models. Training the DNN
stones. The XResnet-50 architecture is divided into four models from the beginning necessitates a large number of
phases. In the Stem and MaxPooling layers, the picture reso- labeled training data, which is highly problematic in detecting
lution is lowered by half. Furthermore, each level contains medical images, which has been solved by employing induc-
ResLayer blocks that degrade the resolution by half, and tive transfer and image augmentation approaches.
these blocks are made up of many layers. The XResNet- Figure 1 shows the block diagram of the FINDWELL
50 deep model was trained from scratch on raw kidney CT model.
images. This investigation included 433 patients, 278 stone
positive and 165 normal. Although these DNNs are incredibly A. IMAGE FUZZY BRIGHTNESS ENHANCEMENT
precise, they require many weeks of training, making them This part aims to provide an innovative FIS to improve image
computationally challenging. brightness while avoiding the disadvantages of previous tech-
The majority of available approaches identify kidney niques [29]. The grey-level image intensities are projected
stones using a single model. However, the proposed approach onto a fuzzy space using MFs, the MFs are modified for
(FINDWELL) combines the features extracted from three contrast improvement, and the fuzzy space is mapped back
independent DNN models (ensemble model) and selected to the grey-level image intensities.
using IRF feature selector in fusion with the KNN classifier The input CT image (of size Q × S) color mode is first
tuned by BO, resulting in a more efficient detection model converted to CIELab from the RGB color mode, and the
than the distinct models. Ensemble learning is the process L channel is transformed [39]. The L channel in CIELab
of combining numerous deep learning models. This is done color mode displays brightness and depicts a human’s eye
to improve deep learning model predictions, classification, sensitivity to light during daylight circumstances. As a result,
or other functions. Ensemble learning may also be used to the overall visual contrast may be improved by adjusting the
develop a new model by combining the functionality of many L channel.
deep learning models. Creating a new model offers several After that, the average pixel intensity (P A ) is computed
advantages over training a new model from the start. Here are using equation (1), where I QS is the pixel intensity at position
a few examples. Because most learning is obtained from cou- QS.
pled models, the new ensemble model takes comparatively P P
less data to train. New ensemble models offer more accuracy Q S IQS
PA = (1)
and capabilities than those used to create the new model as Q×S
features are learned by several models. Thus, the advantages
and the feature learning capabilities of different deep models The L channel is altered by utilizing eight MFs (equa-
can be used in ensemble models. The presented approach tions (2) - (9)) that vary with I and P A .
performs effectively in both quality and noisy images. The
− (I − (−25))2
  2
FINDWELL model produced the most significant classifica- 2× PA
tion accuracy compared to state-of-the-art approaches. µExtremelyDark = e 9
(2)

VOLUME 12, 2024 32897


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

− (I )2
  2
PA

µVeryDark = e 9
(3)
 2
PA 2
− I− 2

PA

µDark = e 9
(4)
  2
PA 2
− I − 5× 4

PA FIGURE 2. Transformation of datasets: from KD to KD3.

µMediumDark = e 9
(5)
  2
255−PA PS
Q (RGB (Q, S) − RGBE (Q, S))2
− I − PA +  2
9 255−PA
2× RMSE = (12)
µMediumBright = e 9
Q×S
(6) where H denotes maximum kidney CT image intensity,
2
(Q,S) represents the image size, and RGB and RGBE denote


255−PA 2
− I − PA + 2

255−PA
2× the original and enhanced images, respectively.
µBright = e 9

(7) B. IMAGE AUGMENTATION


− (I − 255) 2 
255−PA
2 Image augmentation is the most often utilized strategy for

µVeryBright = e 9
(8) reducing overfitting during the DNN training phase [30].
Image augmentation does not affect the semantic meaning
− (I − 300) 2 
255−PA
2
2× of the images. Training the network on a bigger augmented
µExtremelyBright = e 9
(9) dataset increases the resilience and generalizability of DNN
In this study, bright pixels are converted to brighter and to new data [31]. To create 6000 normal and 6000 kidney
dark pixels to boost the brightness of the input image. stone CT images, various image augmentation techniques
This technique generates the rulebase for this fuzzy system, such as horizontal flip, vertical flip, horizontal + vertical flip,
as indicated in equation (10). In equation (10), OL and and salt and pepper noise are applied.
IL denote the output and input L channel pixel values, CT images often contain noise. This is the motivation
respectively. for adding salt and pepper noise to the dataset. ‘‘Salt and
pepper noise’’ refers to a condition in which random pixels
IL VeryBright → OL ExtremelyBright are replaced with exceptionally dark or bright values. While
doing a CT scan of the kidney, the same situation can occur
IL Bright → OL VeryBright
where due to sharp and sudden disturbances in the image
IL MediumBright → OL Bright signal, the scan is having the impact of salt and pepper noise.
IL MediumDark → OL Dark Three kidney CT image dataset variants, i.e., KD1, KD2,
IL Dark → OL VeryDark and KD3, are created from the original KD dataset. KD1 is
created by applying horizontal flipping, vertical flipping, and
IL VeryDark → OL ExtremelyDark (10)
horizontal + vertical flipping on the images of KD. On the
Once the L channel membership value is determined, other hand, noisy kidney CT images can be found in the KD2
the updated L channel value (Lmod ) is computed using dataset which are generated by applying salt and pepper noise
equation (11). on the KD1 images. In KD2, the noise is applied only to the
testing images. KD3 is generated using KD2 images where
PQ
Ii ·µ (Ii ) the salt and pepper noise is applied to the training set CT
Lmod = Pi=1Q
(11) images.
i=1 µ (Ii ) The transformation of the dataset images from KD to KD3
The L channel might be enhanced by substituting the is shown in Figure 2. The datasets KD2 and KD3 are cre-
original L values with the Lmod values, resulting in a better ated to test the FINDWELL model’s performance in a noisy
Lmod channel. This modified detail may be coupled with the environment.
conserved chromatic data (a and b channels) to produce a
modified CIELabE image, which can then be converted into C. FEATURE EXTRACTION USING INDUCTIVE
an improved RGBE image. Using the FIS, enriched samples TRANSFER-BASED ENSEMBLE DNNs
are created from each kidney CT image. Inductive transfer is the technique of training a network on a
The Peak Signal to Noise Ratio (PSNR) is determined large dataset of annotated images to acquire the knowledge
using equation (12) to evaluate the performance of the pro- of general image descriptors or features and then transferring
posed FIS-based image enhancement approach. the retrieved features to a relatively small dataset for detec-
tion [6]. Figure 3 depicts the inductive transfer framework for
H2
 
PT-DNN. In this study, PT-DNN models are used that were
PSNR = 10log10
RMSE created by training DNN architectures from the beginning

32898 VOLUME 12, 2024


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

FIGURE 4. Ensemble feature vector creation for the FINDWELL model.

The feature vectors generated from PT-DNN 1, 2, and 3 are


concatenated to generate the ensemble feature vector.

D. FEATURE SELECTION FROM INDUCTIVE


TRANSFER-BASED ENSEMBLE DNNs
Ensemble DNN extracted many features during the feature
generation phase, and the most valuable descriptors or fea-
tures are chosen utilizing an Iterative ReliefF (IRF) features
selector [8]. Relief is a frequently used feature selection
approach that is feature precision and discovers the estima-
tor’s weights when the estimator’s output is dependent on
FIGURE 3. The inductive transfer framework of PT-DNN for kidney stone a multiclass variable. The relief algorithm is used to rank
detection. features for the sample chosen from the data set, considering
the proximity of other classes in its class and its distance from
with images obtained from the ImageNet database [32]. other classes. Once this feature ranking is done using a model
1.2 million images are used to train the model to recognize with negative and positive weights, the feature selection pro-
1000 categories. During the training phase, the SoftMax clas- cedure is finished. To the best of our knowledge, the Relief
sifier receives features derived by the DNN for categorizing algorithm and its variations are the only individual evaluation
1000 class images, as illustrated in Figure 3. Three PT-DNNs, filter methods capable of identifying feature dependencies.
DarkNet19 [26], InceptionV3 [27], and ResNet101 [28] are These algorithms do not search through feature combinations
employed in the proposed method to detect CT kidney stone but rather employ the idea of closest neighbors to obtain
images automatically. feature statistics that account for interactions in an indirect
Each PT-DNN model is trained and tested using several manner. To improve the classification ability, the feature
types of datasets KD1, KD2, and KD3. These datasets are reduction strategy is applied.
created by adding some variations in the dataset images. The ReliefF creates positive and negative weights and cal-
total number of kidney CT images in the dataset is 12000 culates them using the Manhattan distance. ReliefF is an
(Normal: 6000, Stone: 6000). The PT-DNN model trained improved version of Relief and chooses the most notice-
on ImageNet dataset images supports learning descriptors or able aspects. The relief-based feature selection approach
features from kidney CT images. Because the FINDWELL employs Euclidean distance. The ReliefF weights are built
model is built from an ensemble of multiple PT-DNNs, the using the Manhattan distance. As a result, the IRF approach
feature vector from each model is combined to produce the is suggested.
final anticipated feature vector for further processing. In IRF, the previous iteration’s feature weights W are used
To decide which DNNs are to be combined to generate to update pairwise distance computations each time, such
the ensemble DNN, the experiment is conducted by using that a low-scoring feature from the prior iteration has less
the KD1 dataset as it contains quality images and the dataset effect on instance distance in the current iteration. When the
is balanced. To improve the classification model, the three distance weights are iteratively updated, certain samples may
best performances of PT-DNNs are examined and ensembled enter and leave the neighborhoods of other samples. Iterative
together. The goal of the study is to create a lightweight, Relief additionally included a radius to define neighborhoods
computationally less expensive ensemble model. rather than a predetermined number of instances to decrease
These PT-DNNs are trained on kidney CT images to detect discontinuities in feature weight estimations caused by shift-
kidney stone images. The ensemble method’s primary goal is ing neighborhoods. Iterations were repeated until the weights
to integrate the predictions of various models to get superior converged or a maximum number of iterations was achieved.
classification output [33]. The ensemble approach can poten- The main motivation for using the IRF method is the itera-
tially minimize network model prediction variation [34] and tion range reduces the time, and the loss is calculated at every
bias [34]. As a result, three PT-DNN models are assembled feature selection. Also, this method is less sensitive to noise
in the proposed method for reliable detection of CT kidney as it looks for n nearest neighbors iteratively. This method
stone images. Figure 4 depicts the ensemble feature vector can also handle the missing values. The IRF feature selector
generation technique for the proposed strategy. selects the most valuable features by iteratively removing the

VOLUME 12, 2024 32899


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

irrelevant features, which are used for the following process-


ing. For optimal qualities, a loss calculator should be used.
As a result, in this study, the loss value calculator is a KNN
classification algorithm tuned by using BO with a 10-fold
Cross-Validation (CV).

E. KIDNEY STONE CT IMAGES CLASSIFICATION BY USING FIGURE 5. Samples from the KD dataset: normal (first row), kidney stone
KNN OPTIMIZED BY BAYESIAN OPTIMIZER (last row).
The ensemble deep features are classified using a KNN clas-
sifier. The Bayesian Optimization (BO) [9] method is used to
adaptively optimize the hyperparameters of KNN to increase
prediction performance. The critical advantage of BO is that
it is resistant to non-convex issues and is less prone to slip into
a local optimum. The BO procedure requires a prior function, FIGURE 6. Image enhancement using fuzzy logic: (First) Original image,
such as the frequently used Gaussian kernel, to optimize the (Second) Enhancement using HE, (Third) Enhancement using CLAHE,
(Last) Enhancement using the proposed approach.
objective function. The latter is defined by its mean func-
tion and covariance function, both of which are computed
from data points. The kind of distance metric (Chebychev, GDR
cityblock, cosine, correlation, Hamming, Euclidean, Maha- TP − FP
= (19)
lanobis, Jaccard, Seuclidean, Minkowski, Spearman) and the TP + FN
number of neighbors given by K are the parameters of KNN CSI
to be optimized in this study. BO’s fitness function produced TP TP
the lowest misclassification rate. = + −1 (20)
TP + FP TP + FN
F. PERFORMANCE MEASURE These performance matrices are utilized to compute the
FINDWELL model’s quantitative evaluation is based on pre- model’s performance in classifying kidney CT images. The
cision, recall, accuracy, F1 score, kappa score, MCC, GDR performance measurements are computed after training and
and CSI measures. The values TP, FP, TN, and FN reflect the testing with different i.e., KD, KD1, KD2, and KD3 datasets.
expected true positive, false positive, true negative, and false
negative samples, respectively. Equations 13 – 20 provide the IV. RESULTS AND ANALYSIS
performance measures. For the training of the FINDWELL model, Intel Core i7,
2.6 GHz 6-Core processor, NVIDIA GeForce GTX 1600 Ti
Precision
Graphics (6GB), and 16 GB 2667 MHz DDR4 RAM, is used
TP
= (13) which minimizes the time of training. TensorFlow 2.9.2 is
TP + FP used for the implementation of the suggested approach.
Recall
TP A. DATASETS
= (14)
TP + FN The kidney CT images are collected from [35]. The dataset
was gathered through PACS (Picture archiving and communi-
Accuracy
cation system) from several hospitals in Dhaka, Bangladesh,
TP + TN
= (15) where patients had already been diagnosed with normal or
TP + FP + TN + FN stone kidney results. The Coronal and Axial slices were
F1Score chosen from both contrast and non-contrast examinations,
2×Precision × Recall and the entire abdomen and urogram protocol was fol-
= (16) lowed. The Dicom study was then carefully picked, one
Precision + Recall
diagnosis at a time, and a batch of Dicom images of the
Kappa region of interest for each radiological result was made
2 × (TP × TN − FN × FP) from those. Each patient’s information and metadata are
= then removed from the Dicom images and converted to a
(TP + FP) × (FP + TN ) + (TP + FN ) × (FN + TN )
lossless jpg image format. Following the conversion, each
(17) imaging finding was double-checked by a radiologist and a
medical technician to ensure that the data was correct. The
MCC
collected dataset consists of 5077 normal and 1377 kidney
TP × TN − FP × FN stone images. In this study, this dataset is named as KD
=√
(TP + FP) × (TP + FN ) × (TN + FP) × (TN + FN ) dataset. Figure 5 represents some sample images from the KD
(18) dataset.

32900 VOLUME 12, 2024


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

FIGURE 7. Samples from KD1 (Training: Augmented, Testing: Augmented), FIGURE 8. The best point hyperparameter of the utilized KNN classifier.
KD2 (Training: Augmented, Testing: Augmented + Noisy), KD3 (Training:
Augmented + Noisy, Testing: Augmented + Noisy) datasets. The first three TABLE 2. Network analysis of several PT-DNNs.
images are from the training set and the last three images are from the
testing set.

TABLE 1. The proportion of data for training, validation, and testing.

B. IMAGE BRIGHTNESS ENHANCEMENT BY USING THE


PROPOSED APPROACH
After collecting the dataset images, the first step is to
enhance the brightness of the KD dataset images using
the proposed FIS. The outcome of the proposed fuzzy TABLE 3. Performance comparison of several PT-DNNs.

image brightness enhancement approach is compared to that


of two well-known image enrichment methods, histogram
equalization (HE) and Contrast-Limited Adaptive Histogram
Equalization (CLAHE). The results of the HE, CLAHE, and
suggested fuzzy image brightness enhancement approaches
are shown in Figure 6.
PSNR is used to evaluate the efficacy of the proposed fuzzy
image brightness enhancement method. CLAHE, HE, and the
proposed FIS enrichment technique have average PSNRs of
9.5, 8.2, and 10.7, respectively.
Some samples from KD1, KD2, and KD3 datasets are
C. KD1, KD2 AND KD3 DATASETS FROM KD depicted in Figure 7.
As the KD dataset is not balanced, an image augmentation
technique is employed on KD dataset and three different D. TRAINING CRITERIA
datasets are created: KD1, KD2, and KD3 which are bal- We choose 80% of the kidney CT images for training and
anced and having different varieties. After augmentation, 20% for testing for the individual and ensemble DNN models.
each dataset contains 6000 normal images and 6000 kidney On training images, we employed 10-fold cross-validation.
stone CT images. In the training, validation, and testing sets, we have an equal

VOLUME 12, 2024 32901


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

TABLE 4. Performance comparison of several ensemble DNNs.

FIGURE 9. Ensemble feature vector creation for the FINDWELL model.

proportion of normal and kidney stone CT images. Table 1


shows the proportion of data for training, validation, and
testing of the individual and proposed ensemble DNN model.

E. SELECTION OF BEST POINT HYPERPARAMETER VALUE FIGURE 10. Performance comparison of different DNN approaches as
USING KNN CLASSIFIER well as the proposed ensemble technique.
To find out the best point hyperparameter value using the
KNN classifier, 40 iterations are considered. Figure 8 depicts regularization. It is a technique for regularizing the classifier
the best point hyperparameter of the utilized KNN classifier. by assessing the impact of label dropout during training.
To carry out this experiment the KD1 dataset is used. The It stops the classifier from making too confident predictions
optimized K value is 1 with the Euclidean distance metric. about a class. ResNet101 provides heuristics to increase the
parallelism of training and decrease the computational cost
F. SELECTION OF DNNs TO CREATE THE ENSEMBLE DNN through lower precision computing and modifying the learn-
FOR FEATURE EXTRACTION ing rate or biases.
Table 2 shows the network analysis of several PT-DNNs used The following are the network parameters of the DNNs
in this study. used in this study: DarkNet19 [26], InceptionV3 [27], and
The performance of individual PT-DNNs is compared to ResNet101 [28]. DarkNet19 consists of 19 Convolutional
create the ensemble DNN using a 10-fold CV which is listed Layers, 18 Batch Normalization, 18 Leaky ReLU, five max-
in Table 3. Here the images from the KD1dataset are used as pooling, and 1 SoftMax with a weight decay of 0.005 and
it contains quality images and the dataset is balanced. momentum of 0.9. The learning rate is set to 0.0002 and
From Table 3 we can conclude that the top five PT-DNNs the batch size is 512. After the training phase, 88.1 million
which result in good performances and p-value < 0.05 are parameters are learned. InceptionV3 is 48 layers deep with
DarkNet19 [26], ResNet101 [28], InceptionV3 [27], Shuf- a weight decay of 0.9 and momentum of 0.9. The learning
fleNet [37], and MobileNetV2 [22]. The combination of these rate for the network is set to 0.001, and batch sizes range
PT-DNNs is used to create the proposed ensemble DNN. from 4,096 to 16,384. ResNet101 is 101 layers deep with a
The performance comparison of the combination of these weight decay of 10e−4 and momentum of 0.9. The quickest
PT-DNNs is listed in Table 4. training progress is seen at a learning rate of 0.1 and a batch
From Table 4 it is clear that Ensemble 3 which is the combi- size of 768. After the training phase, 23.9 million parameters
nation of DarkNet19 [26], ResNet101 [28], and InceptionV3 are learned. ResNet101 is 101 layers deep with a weight
[27] is producing the best performance (97.1% accuracy) decay of 0.0001 and momentum of 0.9. The learning rate
compared to other ensemble DNNs. As a result, DarkNet19 and batch size are set to 0.001 and 64. After the training
[26], InceptionV3 [27], and ResNet101 [28] are selected for phase, 44.7 million parameters are learned. Adam optimizer
their unique performance and lightweight structure. Dark- is used to train every DNN. Henceforth, the combination
net19 is faster and more precise since it is less complicated. (concatenation) of the features generated from these PT-DNN
The advantage of using InceptionV3 is the use of an auxiliary architectures, i.e., DarkNet19, InceptionV3, and ResNet101,
classifier. In very deep networks, the auxiliary classifier is are employed in the proposed FINDWELL model. Figure 9
primarily employed to tackle the vanishing gradient problem. depicts the ensemble feature vector generation technique for
In the early rounds of training, the auxiliary classifiers did not the proposed strategy.
result in any improvement. However, in the end, the network The feature vectors generated from DarkNet19, Incep-
with auxiliary classifiers outperformed the network with- tionV3, and ResNet101 are concatenated to generate the
out auxiliary classifiers. InceptionV3 uses label smoothing ensemble feature vector.

32902 VOLUME 12, 2024


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

FIGURE 11. Classification error vs the number of features utilized with


the IRF features selector.

TABLE 5. The model’s accuracy comparison using IRF, TuRF, and


VLSReliefF feature selectors on the test samples of the KD1 dataset. FIGURE 12. The classification performance comparison uses the
proposed normal ensemble DNN and FINDWELL model.

TABLE 6. Comparison of accuracy on KD1 dataset using different CV


folds.

Fig. 10 represents the accuracy comparison of the perfor-


mance among the existing DNNs and the proposed ensemble
method.
According to Figure 10, the proposed technique obtains feature vector has a length of 146, as shown in Figure 11.
a higher classification accuracy of 97.1% when compared KNN classifier tuned using BO is used to classify these deep
to the individual PT-DNNs [21], [23], [26], [27], [28], [36], features (see Figure 8).
[37]. The proposed approach ensembles the prediction find- IRF choose 146 optimal features per image from the pro-
ings from DarkNet19 [26], InceptionV3 [27], and ResNet101 duced 5120 features, which are then input into the KNN
[28], resulting in a more powerful classification model. This classifier tuned by BO for the classification of CT kidney
implies that the proposed ensemble DNN can properly detect stone images.
CT kidney stone images with fewer misclassification errors. The performance of the proposed ensemble model on KD1
test samples is tested with IRF feature selector and is com-
G. DETECTION OF CT KIDNEY STONE IMAGES BY USING pared with another two iterative feature selection methods
SELECTED DEEP FEATURES USING IRF FROM THE KD1 i.e., TuRF and VLSReliefF. Table 5 depicts the model’s
DATASET performance comparison using IRF, TuRF, and VLSReliefF
After constructing the DNN ensemble for the CT kidney feature selectors.
stone image detection, first, the performance of the proposed From Table 5, it can be noticed that there is no signifi-
DNN ensemble is tested on the KD1 dataset. Performance cant difference in the accuracy obtained using the proposed
measures such as accuracy, precision, recall, F1-score, kappa ensemble DNN and IRF, TuRF, and VLSReliefF feature
score, MCC, GDR, and CSI are utilized to evaluate the selector but we can see a significant difference in the time
performance. For that, the first task is to select the number of execution. As time is also an important factor in any
of meaningful features by using the IRF feature selector. automated detection system, we used the IRF feature selector
Number of features generated per image from the proposed for further processing.
ensemble DNN (DarkNet19 (features: 1024), InceptionV3 Fig. 12 shows the performance comparison while using
(features: 2048), and ResNet101 (features: 2048)) is 5120. normal ensemble DNN features and ensemble DNN +
To select the optimal number of features for the detection of selected features using IRF for the detection. The proposed
kidney CT images we used the iteration range of {100:300}. FINDWELL model obtained a better classification accuracy
Figure 11 depicts a graph of classification error vs the number of greater than 99% using the IRF-selected features and
of features utilized with the IRF features selector. The best 10-fold CV.

VOLUME 12, 2024 32903


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

FIGURE 15. Some incorrectly classified test samples using the FINDWELL
FIGURE 13. Confusion matrix and performance measure generated using model from the KD1 dataset (A) Normal, (B) Kidney stone.
the FINDWELL model when applied to the test samples of the KD1
dataset.
Figure 15 depicts some incorrectly classified test samples
using the FINDWELL model from the KD1 dataset.
The fundamental issues related to image classification
arise from the fact that images are simply massive matrices
with a great number of pixels present, and they are extremely
complicated. Training a computer to accurately categorize
images is a time-consuming and challenging undertaking.
In some circumstances, the incorrect features may be utilized
to make judgments.

H. PERFORMANCE OF FINDWELL MODEL USING KD2


DATASET AND COMPARISON OF THE PERFORMANCE
WITH EXISTING DNNs
FIGURE 14. Some correctly classified test samples using the FINDWELL The dataset KD2 is used to assess the FINDWELL model’s
model from the KD1 dataset (A) Normal, (B) Kidney stone.
performance in a noisy environment. This dataset KD2 con-
tains augmented training images and augmented + noisy
Before calculating all the performance measures using the testing images. Images are generally distorted by noise in
KD1 dataset, we examined the accuracy values obtained from actual applications; therefore, validating the FINDWELL
FINDWELL using different CV folds, i.e., {5, 10, 15, 20, 25, model’s efficiency in noisy conditions is more practical.
and 30}, which is depicted in Table 6. Figure 16 depicts the FINDWELL model’s performance
Table 6 clearly shows that a 10-fold CV produces better deterioration when tested with noisy images. As seen in
classification results on the KD1 dataset compared to other Figure 16, the performance of the networks degrades if noise
folds. levels increase. Also, as seen in Figures 13 and 17, there is
Figure 13 depicts the confusion matrix produced using the a considerable decrease in DNN classification accuracy as
FINDWELL model with the 10-fold CV on the test samples noise variation increases. This leads to the conclusion that all
from the KD1 dataset. Also, the performance measures are present approaches are noise-sensitive, and their performance
included in Figure 13. The confusion matrix provides insight suffers significantly as the noise intensity rises. Another rea-
into the number of errors the network model produces when son for the poor performance can be the disparity in the
predicting each class. quality of training and testing images. As a result, the per-
It is important to note that the FINDWELL model, as illus- formance of DNN is quite bad, which is rectified by training
trated in Figure 13, achieves nearly 100% recall. This and testing the models with dataset KD3. At noise levels
demonstrates that the proposed method accurately classifies of 0.02, 0.04, 0.06, 0.08, and 0.1, the proposed technique
almost all CT kidney stone images with minor classification achieves 85%, 79.6%, 71.4%, 64.7%, and 57.1% accuracy,
mistakes. respectively using the KD2 dataset images.
Figure 14 depicts some correctly classified test samples Before calculating all the performance measures using the
using the FINDWELL model from the KD1 dataset. KD2 dataset, we examined the accuracy values obtained from

32904 VOLUME 12, 2024


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

FIGURE 18. Some correctly classified test samples using the FINDWELL
model from the KD2 dataset (A) Normal, (B) Kidney stone.
FIGURE 16. Comparison of the performance of the proposed FINDWELL
model with existing DNNs using the KD2 dataset.

TABLE 7. Comparison of accuracy on KD2 dataset using different CV


folds.

FIGURE 19. Some incorrectly classified test samples using the FINDWELL
model from the KD2 dataset (A) Normal, (B) Kidney stone.

From the confusion matrix and the performance measure


shown in Figure 17, it can be concluded that while using
a DNN-based technique for supervised classification, the
training and testing samples should contain similar augmen-
tations. In the KD2 dataset, as the only testing data is noisy,
the existing DNN and the proposed technique cannot handle
FIGURE 17. Confusion matrix and performance measure using the it properly.
FINDWELL model when applied to the test samples of the KD2 dataset. Fig. 18 depicts some correctly classified test samples using
the FINDWELL model from the KD2 dataset.
FINDWELL using different CV folds i.e., {5, 10, 15, 20, 25, Fig. 19 depicts some incorrectly classified test samples
and 30} which is depicted in Table 7. using the FINDWELL model from the KD2 dataset.
Table 7 clearly shows that a 10-fold CV produces better The main reason for the poor performance of the FIND-
classification results on the KD2 dataset compared to other WELL model using KD2 dataset images is the variation in
folds. training and testing image samples. The model is trained with
Figure 17 depicts the confusion matrix produced using the quality images and we’ve tested the model with noisy images.
FINDWELL model applied to the KD2 test dataset and the Because of the presence of the noise in the test samples,
performance measure. This matrix is generated by consider- the extracted feature vectors are significantly different from
ing the noise level of 0.02. the extracted features from the training samples. Thus, the

VOLUME 12, 2024 32905


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

FIGURE 20. Comparison of the performance of the proposed FINDWELL


model with existing DNNs using the KD3 dataset.

TABLE 8. Comparison of accuracy on KD3 dataset using different CV


folds.

FIGURE 21. Confusion matrix and performance measure using the


FINDWELL model when applied to the test samples of the KD3 dataset.

model fails to detect the images correctly even if the main


foreground is large and has enough information for the proper
image detection. This issue is solved by creating the KD3
dataset.

I. PERFORMANCE OF FINDWELL MODEL USING KD3


DATASET AND COMPARISON OF THE PERFORMANCE
WITH EXISTING DNNs
KD3 is a dataset made up of augmented as well as noisy train-
ing and testing CT images. The classification performance of
FIGURE 22. Some correctly classified test samples using the FINDWELL
the DNN is enhanced by training it with noisy images, as seen model from the KD3 dataset (A) Normal, (B) Kidney stone.
in Figure 20. Figures 16 and 20 show that the FINDWELL
model performs better at all noise levels when trained with
KD3 than with the KD2 dataset. This is mainly because the Fig. 21 depicts the confusion matrix produced using the
quality of training and testing is similar to dataset KD3. When FINDWELL model applied to the test samples of the KD3
compared to previous approaches, Figure 15 clearly shows dataset as well as the performance measure. This matrix is
that the FINDWELL model attains the highest classification generated by considering the noise level of 0.02.
accuracy at every noise level. At noise levels of 0.02, 0.04, One can notice from Figure 21 that the FINDWELL model
0.06, 0.08, and 0.1, the proposed technique achieves 96.7%, can efficiently classify the images in a noisy environment.
96.2%, 95.8%, 95.1%, and 94.6% accuracy, respectively. This Fig. 22 depicts some correctly classified test samples using
demonstrates that the proposed technique (FINDWELL) can the FINDWELL model from the KD3 dataset.
properly categorize images even in the existence of high and Fig. 23 depicts some incorrectly classified test samples
low noise levels in the test images. Furthermore, as shown in using the FINDWELL model from the KD3 dataset.
Figures 13 and 21, the ensemble-based system outperforms The main reason behind the misclassification of some of
other methods in accurately detecting CT kidney images in the test samples from the KD3 dataset is the presence of
both high-quality and noisy environments. noise as the presence of noise can significantly change image
Before calculating all the performance measures using the features.
KD3 dataset, we examined the accuracy values obtained from
FINDWELL using different CV folds, i.e., {5, 10, 15, 20, 25, J. PERFORMANCE OF FINDWELL MODEL ON KD DATASET
and 30}, which is depicted in Table 8. IMAGES
Table 8 clearly shows that a 10-fold CV produces better Lastly, we tested the performance of the proposed method on
classification results on the KD1 dataset compared to other the KD dataset, which contains original (without augmenta-
folds. tion) images.

32906 VOLUME 12, 2024


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

FIGURE 24. Confusion matrix and performance measure using the


FINDWELL model when applied to the test samples of the KD dataset.

FIGURE 23. Some incorrectly classified test samples using the FINDWELL
model from the KD3 dataset (A) Normal, (B) Kidney stone.

TABLE 9. Comparison of accuracy on KD dataset using different CV folds.

Before calculating all the performance measures using the


KD dataset, we examined the precision values (as the dataset
is imbalanced so accuracy is not a good choice) obtained from
the proposed ensemble DNN using different CV folds i.e., {5,
10, 15, 20, 25 and 30} which is depicted in Table 9.
Table 9 shows that a 10-fold CV produces better precision
results on the KD dataset compared to other folds. FIGURE 25. Some correctly classified test samples using the FINDWELL
model from the KD dataset (A) Normal, (B) Kidney stone.
Figure 24 depicts the confusion matrix produced using the
FINDWELL model on the test samples of the KD dataset and
the performance measure. The main reason for the misclassification can be the pro-
It can be noticed from the experimental results obtained portion of normal and stone images which is imbalanced. This
from the performance of the proposed method on the KD imbalance can cause problems in deep learning, as algorithms
dataset, that the performance is quite poor compared to the may fail to learn and predict accurately for the minority class,
augmented datasets. resulting in biased outputs and poor performance, which is
Fig. 25 depicts some correctly classified test samples using visible.
the FINDWELL model from the KD dataset.
Despite the poor performance of the FINDWELL model K. PERFORMANCE COMPARISON OF THE PROPOSED
on the KD dataset, some images are correctly classified, FINDWELL MODEL WITH THE EXISTING METHODS
as shown in Figure 25. The reason may be the proper use of Table 10 compares the FINDWELL model’s accuracy using
K-fold cross-validation. a 10-fold CV with the existing method.
Fig. 26 depicts some incorrectly classified test samples According to Table 10, the performance of the FINDWELL
using the FINDWELL model from the KD dataset. model is superior to the current methods for all three datasets.

VOLUME 12, 2024 32907


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

TABLE 11. Performance comparison of the FINDWELL model on the KD,


KD1, KD2, and KD3 dataset.

InceptionV3 models make it simple to construct deep fea-


tures. As a result, in this study, the ensemble feature vector
is generated utilizing the models above. IRF feature selection
is utilized to lower the time load of the utilized classifier while
increasing the performance of CT kidney image detection.
This study employs IRF, KNN, and BO approaches since
they present a unique feature engineering model. The perfor-
FIGURE 26. Some incorrectly classified test samples using the FINDWELL mance comparison of the FINDWELL model using the KD,
model from the KD dataset (A) Normal, (B) Kidney stone.
KD1, KD2, and KD3 datasets is shown in Table 11.
TABLE 10. The accuracy comparison (%) of the FINDWELL model with the From Table 11, we can deduce the following points regard-
existing method. ing the performance efficiency of the FINDWELL model.
We can conclude from the accuracy performance measure
that the FINDWELL model can produce good classification
results using quality and noisy images. From the precision,
recall, and F1 score values obtained from the KD1 and KD3
datasets we can conclude that the model can efficiently detect
the kidney stone CT images in both quality and noisy envi-
ronments. After observing the kappa score, we can say that
the FINDWELL model has a very good agreement between
the real-world observer vs classification model. The high
MCC value obtained denotes that the FINDWELL model has
correctly classified a high percentage of normal CT images
and a high percentage of kidney stone CT images. The high
GDR value represents that the FINDWELL model is an effi-
cient model detecting CT images in both quality and noisy
environments. The high CSI value indicates that the FIND-
WELL model is well suited for detecting rare events in both
quality and noisy environments. The p-value denotes that
the proposed FINDWELL model can efficiently handle both
KD1 and KD3 dataset images. The overall performance of the
model using the KD dataset is very poor due to the imbalance
characteristic of the dataset. The performance measures of the
FINDWELL model using the KD2 dataset images are also
poor compared to the KD1 and KD3 datasets. The reason can
be the mismatch of the training and testing feature vectors
V. DISCUSSION (the training dataset doesn’t contain any noisy samples, but
Disease locations in medical images are often visible in a we tested the performance of the FINDWELL model using
limited portion of the image. Local traits must be taken from noisy samples). So, we can say that the FINDWELL model
such localized locations to identify diseases properly. As a will not perform well if the dataset is imbalanced or the
result, an ensemble deep feature creator is employed in the testing samples do not have the same properties as the training
proposed model (FINDWELL) to make use of the benefits of samples.
more than one DNN. This effort also tries to use the benefits Figures 13 and 21 show that the proposed FINDWELL
of inductive transfer. Pretrained DarkNet19, ResNet101, and model can efficiently classify both quality and noisy images.

32908 VOLUME 12, 2024


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

Table 10 also shows comparative results achieved by the pro- REFERENCES


posed model with previously utilized approaches. Table 10 [1] National Kidney Foundation. Accessed: Jan. 2, 2023. [Online]. Available:
https://www.kidney.org/kidneydisease/global-facts-about-kidney-disease
clearly illustrates that the FINDWELL model outperformed
[2] E. Jussupow, K. Spohrer, A. Heinzl, and J. Gawlitza, ‘‘Augmenting
other DNNs utilized in the literature regarding accuracy. medical diagnosis decisions? An investigation into physicians’ decision-
making process with artificial intelligence,’’ Inf. Syst. Res., vol. 32, no. 3,
pp. 713–735, Sep. 2021, doi: 10.1287/isre.2020.0980.
VI. CONCLUSION [3] V. Singh, V. K. Asari, and R. Rajasekaran, ‘‘A deep neural network for early
We introduced a kidney stone detection approach in this paper detection and prediction of chronic kidney disease,’’ Diagnostics, vol. 12,
that combined deep features from pre-trained deep convolu- no. 1, p. 116, Jan. 2022, doi: 10.3390/diagnostics12010116.
[4] I. Castiglioni, L. Rundo, M. Codari, G. Di Leo, C. Salvatore,
tional neural networks with ML classifiers. To extract deep
M. Interlenghi, F. Gallivanone, A. Cozzi, N. C. D’Amico, and
characteristics from kidney CT images, we suggest using F. Sardanelli, ‘‘AI applications to medical images: From machine
many pre-trained deep convolutional neural networks. The learning to deep learning,’’ Phys. Medica, vol. 83, pp. 9–24, Mar. 2021,
doi: 10.1016/j.ejmp.2021.02.006.
retrieved deep features are subsequently analyzed using a
[5] M. Biswas, V. Kuppili, L. Saba, D. R. Edla, H. S. Suri, E. Cuadrado-Godia,
KNN classifier that has been tuned by BO. The top three deep J. R. Laird, R. T. Marinhoe, J. M. Sanches, A. Nicolaides, and J. S. Suri,
features that perform well on the KNN classifier are chosen ‘‘State-of-the-art review on deep learning in medical imaging,’’ Frontiers
and ensembled as a deep feature ensemble. Then IRF feature Biosci., vol. 24, no. 3, pp. 392–426, 2019, doi: 10.2741/4725.
[6] H. E. Kim, A. Cosa-Linan, N. Santhanam, M. Jannesari, M. E. Maros,
selector is used to use only the meaningful features for the and T. Ganslandt, ‘‘Transfer learning for medical image classification:
classification which is then fed into the KNN classifiers to A literature review,’’ BMC Med. Imag., vol. 22, no. 1, p. 69, Dec. 2022,
predict the final output. In this experiment, we performed a doi: 10.1186/s12880-022-00793-7.
[7] S. T. Krishna and H. K. Kalluri, ‘‘Deep learning and transfer learning
thorough study of kidney stone identification utilizing 10 dif- approaches for image classification,’’ Int. J. Recent Technol. Eng. (IJRTE),
ferent pre-trained deep convolutional neural networks on four vol. 7, pp. 427–432, Feb. 2019.
distinct datasets (KD, KD1, KD2, and KD3). [8] N. Aslan, G. Ozmen Koca, M. A. Kobat, and S. Dogan, ‘‘Multi-
classification deep CNN model for diagnosing COVID-19 using iterative
Our experiment findings show that from our architecture neighborhood component analysis and iterative ReliefF feature selection
(1) the DarkNet-19 deep feature alone is a good option in techniques with X-ray images,’’ Chemometric Intell. Lab. Syst., vol. 224,
kidney stone detection, and (2) the ensemble of DarkNet19, May 2022, Art. no. 104539, doi: 10.1016/j.chemolab.2022.104539.
ResNet101, and InceptionV3 deep features is a good choice in [9] L. D. Sharma, H. Chhabra, U. Chauhan, R. K. Saraswat, and
R. K. Sunkaria, ‘‘Mental arithmetic task load recognition using EEG signal
kidney stone detection, (3)Selection of relevant features using and Bayesian optimized K-nearest neighbor,’’ Int. J. Inf. Technol., vol. 13,
IRF can increase the model performance, (4) FINDWELL no. 6, pp. 2363–2369, Dec. 2021, doi: 10.1007/s41870-021-00807-7.
can detect both qualities and noisy images. In conclusion, [10] J. Serrat, F. Lumbreras, F. Blanco, M. Valiente, and M. López-Mesas,
‘‘MyStone: A system for automatic kidney stone classification,’’ Exp. Syst.
our suggested FINDWELL ensemble technique overcomes Appl., vol. 89, pp. 41–51, Dec. 2017, doi: 10.1016/j.eswa.2017.07.024.
the limits of a single CNN model and offers improved and [11] A. Martínez, D. H. Trinh, J. El Beze, J. Hubert, P. Eschwege, V. Estrade,
robust performance, particularly for big datasets. These find- L. Aguilar, C. Daul, and G. Ochoa, ‘‘Towards an automated classification
method for ureteroscopic kidney stone images using ensemble learning,’’ in
ings revealed that our proposed technique, which employs Proc. 42nd Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jul. 2020,
an ensemble of deep features and an ML classifier, might pp. 1936–1939, doi: 10.1109/EMBC44109.2020.9176121.
aid radiologists and nephrologists in identifying stones in [12] J. Verma, M. Nath, P. Tripathi, and K. K. Saini, ‘‘Analysis and identification
of kidney stone using kth nearest neighbour (KNN) and support vector
CT kidney images. Modifications to the design of DNNs to machine (SVM) classification techniques,’’ Pattern Recognit. Image Anal.,
increase classification accuracy appear to be an intriguing vol. 27, no. 3, pp. 574–580, Jul. 2017, doi: 10.1134/s1054661817030294.
future effort. [13] T. De Perrot, J. Hofmeister, S. Burgermeister, S. P. Martin, G. Feutry,
The following are the limitations of the study. In this study, J. Klein, and X. Montet, ‘‘Differentiating kidney stones from phleboliths in
unenhanced low-dose computed tomography using radiomics and machine
6000 normal and 6000 patient individuals CT scans are used. learning,’’ Eur. Radiol., vol. 29, no. 9, pp. 4776–4782, Sep. 2019, doi:
The model must be evaluated with a more varied dataset to 10.1007/s00330-019-6004-7.
generalize the performance and get a robust model. We are [14] I. Aksakalli, S. Kaçdioglu, and Y. S. Hanay, ‘‘Kidney X-ray images clas-
sification using machine learning and deep learning methods,’’ Balkan
unable to reveal the location of kidney stones in this work. J. Electr. Comput. Eng., vol. 9, no. 2, pp. 144–151, Apr. 2021, doi:
An intelligent segmentation model can be used to solve this 10.17694/bajece.878116.
challenge. In the future, we hope to investigate the feasibility [15] B. Manoj, N. Mohan, and S. Kumar, ‘‘Automated detection of kidney
stone using deep learning models,’’ in Proc. 2nd Int. Conf. Intell. Technol.
of employing the created FINDWELL model to diagnose var- (CONIT), Jun. 2022, pp. 1–5, doi: 10.1109/CONIT55038.2022.9847894.
ious tumors and disorders using CT scans. The FINDWELL [16] A. Caglayan, M. O. Horsanali, K. Kocadurdu, E. Ismailoglu, and
model may also be utilized as a learning model to tackle other S. Guneyli, ‘‘Deep learning model-assisted detection of kidney stones
on computed tomography,’’ Int. Braz. J., vol. 48, no. 5, pp. 830–839,
computer vision challenges. Oct. 2022, doi: 10.1590/s1677-5538.ibju.2022.0132.
[17] D. Li, C. Xiao, Y. Liu, Z. Chen, H. Hassan, L. Su, J. Liu, H. Li, W. Xie,
W. Zhong, and B. Huang, ‘‘Deep segmentation networks for segment-
DECLARATION COMPETING INTERESTS ing kidneys and detecting kidney stones in unenhanced abdominal CT
The authors declare no competing interests. images,’’ Diagnostics, vol. 12, no. 8, p. 1788, Jul. 2022, doi: 10.3390/diag-
nostics12081788.
[18] K. K. Patro, J. P. Allam, B. C. Neelapu, R. Tadeusiewicz, U. R. Acharya,
ACKNOWLEDGMENT M. Hammad, O. Yildirim, and P. Plawiak, ‘‘Application of Kronecker
convolutions in deep learning technique for automated detection of kid-
The authors are thankful to the Vellore Institute of Technol- ney stones with coronal CT images,’’ Inf. Sci., vol. 640, Sep. 2023,
ogy (VIT), Vellore, for providing all the facilities and support. Art. no. 119005, doi: 10.1016/j.ins.2023.119005.

VOLUME 12, 2024 32909


J. Chaki, A. Uçar: Efficient and Robust Approach Using Inductive Transfer-Based Ensemble DNNs

[19] O. Sabuncu, B. Bilgehan, E. Kneebone, and O. Mirzaei, ‘‘Effective deep [38] X. Liu, X. Yang, M. Wang, and R. Hong, ‘‘Deep neighborhood component
learning classification for kidney stone using axial computed tomography analysis for visual similarity modeling,’’ ACM Trans. Intell. Syst. Technol.,
(CT) images,’’ Biomed. Eng./Biomedizinische Technik, vol. 68, no. 5, vol. 11, no. 3, pp. 1–15, Jun. 2020, doi: 10.1145/3375787.
pp. 481–491, 2023, doi: 10.1515/bmt-2022-0142. [39] J. Chaki, ‘‘Two-fold brain tumor segmentation using fuzzy image enhance-
[20] C. Yan and N. Razmjooy, ‘‘Kidney stone detection using an optimized ment and DeepBrainet2.0,’’ Multimedia Tools Appl., vol. 81, no. 21,
deep believe network by fractional coronavirus herd immunity optimizer,’’ pp. 30705–30731, Sep. 2022, doi: 10.1007/s11042-022-13014-8.
Biomed. Signal Process. Control, vol. 86, Sep. 2023, Art. no. 104951, doi: [40] K. Yildirim, P. G. Bozdag, M. Talo, O. Yildirim, M. Karabatak, and
10.1016/j.bspc.2023.104951. U. R. Acharya, ‘‘Deep learning model for automated kidney stone detec-
[21] S. S. Vidhya, D. Vishmitha, K. Yoshika, P. Sivalakshmi, V. Chowdary, tion using coronal CT images,’’ Comput. Biol. Med., vol. 135, Aug. 2021,
K. G. Shanthi, and M. Yamini, ‘‘Kidney stone detection using Art. no. 104569, doi: 10.1016/j.compbiomed.2021.104569.
deep learning and transfer learning,’’ in Proc. 4th Int. Conf. Inven- [41] P. Bosowski, J. Bosowska, and J. Nalepa, ‘‘Evolving deep ensembles
tive Res. Comput. Appl. (ICIRCA), Sep. 2022, pp. 987–992, doi: for detecting COVID-19 in chest X-rays,’’ in Proc. IEEE Int.
10.1109/ICIRCA54612.2022.9985723. Conf. Image Process. (ICIP), Sep. 2021, pp. 3772–3776, doi:
[22] S. Bhardwaj and H. S. Shreenidhi, ‘‘Modeling of an CNN architecture for 10.1109/ICIP42928.2021.9506119.
kidney stone detection using image processing,’’ in Proc. 4th Int. Conf. [42] M. R. Islam and M. Nahiduzzaman, ‘‘Complex features extraction with
Emerg. Res. Electron., Comput. Sci. Technol. (ICERECT), Dec. 2022, deep learning model for the detection of COVID19 from CT scan images
pp. 1–5, doi: 10.1109/ICERECT56837.2022.10059972. using ensemble based machine learning approach,’’ Exp. Syst. Appl.,
[23] P. Ballester and R. Araujo, ‘‘On the performance of GoogLeNet and vol. 195, Jun. 2022, Art. no. 116554, doi: 10.1016/j.eswa.2022.116554.
AlexNet applied to sketches,’’ in Proc. AAAI Conf. Artif. Intell., 2016, [43] J. Nalepa, P. Bosowski, W. Dudzik, and M. Kawulok, ‘‘Fusing deep learn-
vol. 30, no. 1, pp. 1–5, doi: 10.1609/aaai.v30i1.10171. ing with support vector machines to detect COVID-19 in X-ray images,’’
[24] P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, in Proc. Asian Conf. Intell. Inf. Database Syst., 2022, pp. 340–353, doi:
‘‘SSDMNV2: A real time DNN-based face mask detection system using 10.1007/978-981-19-8234-7_27.
single shot multibox detector and MobileNetV2,’’ Sustain. Cities Soc.,
vol. 66, Mar. 2021, Art. no. 102692, doi: 10.1016/j.scs.2020.102692.
[25] M. T. Rahman and A. Dola, ‘‘Automated grading of diabetic
retinopathy using DenseNet-169 architecture,’’ in Proc. 5th Int.
Conf. Electr. Inf. Commun. Technol. (EICT), Dec. 2021, pp. 1–4,
doi: 10.1109/EICT54103.2021.9733431.
[26] M. Baygin, O. Yaman, P. D. Barua, S. Dogan, T. Tuncer, and U. R. Acharya,
‘‘Exemplar Darknet19 feature generation technique for automated kidney
stone detection with coronal CT images,’’ Artif. Intell. Med., vol. 127,
May 2022, Art. no. 102274, doi: 10.1016/j.artmed.2022.102274.
[27] M. N. Islam, M. Hasan, M. K. Hossain, M. G. R. Alam, M. Z. Uddin, and JYOTISMITA CHAKI (Member, IEEE) received
A. Soylu, ‘‘Vision transformer and explainable transfer learning models the Ph.D. (Engg.) degree from Jadavpur Univer-
for auto detection of kidney cyst, stone and tumor from CT-radiography,’’ sity, Kolkata, India. She is currently an Associate
Sci. Rep., vol. 12, no. 1, p. 11440, Jul. 2022, doi: 10.1038/s41598-022- Professor with the School of Computer Science
15634-4. and Engineering, Vellore Institute of Technology
[28] K. M. Black, H. Law, A. Aldoukhi, J. Deng, and K. R. Ghani, ‘‘Deep learn- (VIT University), Vellore, India. She has authored
ing computer vision algorithm for detecting kidney stone composition,’’ and edited many international conference papers,
BJU Int., vol. 125, no. 6, pp. 920–924, Jun. 2020, doi: 10.1111/bju.15035. journal articles, and books. Her research interests
[29] P. Pandey, K. K. Dewangan, and D. K. Dewangan, ‘‘Enhancing the quality
include computer vision and image processing,
of satellite images using fuzzy inference system,’’ in Proc. Int. Conf.
Energy, Commun., Data Analytics Soft Comput. (ICECDS), Aug. 2017,
pattern recognition, medical imaging, soft com-
pp. 3087–3092, doi: 10.1109/ICECDS.2017.8390024. puting, artificial intelligence, and machine learning. She is an Editor of
[30] F. Lopez, A. Varelo, O. Hinojosa, M. Mendez, D. H. Trinh, Engineering Applications of Artificial Intelligence journal (Elsevier); an
Y. ElBeze, J. Hubert, V. Estrade, M. Gonzalez, G. Ochoa, and Academic Editor of PLOS ONE journal; an Associate Editor of Array journal
C. Daul, ‘‘Assessing deep learning methods for the identification of (Elsevier), IET Image Processing, and Machine Learning with Applications
kidney stones in endoscopic images,’’ in Proc. 43rd Annu. Int. Conf. journal (Elsevier); and a Section Editor of PeerJ Computer Science journal.
IEEE Eng. Med. Biol. Soc. (EMBC), Nov. 2021, pp. 2778–2781, doi:
10.1109/EMBC46164.2021.9630211.
[31] Z.-H. Huang, Y.-Y. Liu, W.-J. Wu, and K.-W. Huang, ‘‘Design and valida-
tion of a deep learning model for renal stone detection and segmentation
on Kidney–Ureter–Bladder images,’’ Bioengineering, vol. 10, no. 8, p. 970,
Aug. 2023, doi: 10.3390/bioengineering10080970.
[32] ImageNet Database. Accessed: Mar. 31, 2023. [Online]. Available:
https://www.image-net.org/
[33] X. Zheng, W. Chen, Y. You, Y. Jiang, M. Li, and T. Zhang, ‘‘Ensem-
ble deep learning for automated visual classification using EEG sig-
nals,’’ Pattern Recognit., vol. 102, Jun. 2020, Art. no. 107147, doi: AYŞEGÜL UÇAR (Senior Member, IEEE)
10.1016/j.patcog.2019.107147. received the B.S., M.S., and Ph.D. degrees from
[34] M. A. Ganaie, M. Hu, A. K. Malik, M. Tanveer, and P. N. Suganthan, the Department of Electrical and Electronics Engi-
‘‘Ensemble deep learning: A review,’’ Eng. Appl. Artif. Intell., vol. 115, neering, Firat University, Turkey, in 1998, 2000,
Oct. 2022, Art. no. 105151, doi: 10.1016/j.engappai.2022.105151. and 2006, respectively. In 2013, she was a Visiting
[35] Kaggle Kidney CT Image Database. Accessed: Mar. 31, 2023. [Online]. Professor with the Division of Computer Sci-
Available: https://www.kaggle.com/datasets/nazmul0087/ct-kidney-
ence and Engineering, Louisiana State University,
dataset-normal-cyst-tumor-and-stone
USA. Since 2020, she has been a Professor with
[36] K. Somasundaram, S. M. Animekalai, and P. Sivakumar, ‘‘An efficient
detection of kidney stone based on HDVS deep learning approach,’’ in the Department of Mechatronics Engineering,
Proc. 1st Int. Conf. Combinat. Optim., 2021, pp. 1–9, doi: 10.4108/eai.7- Firat University. She has more than 24 years of
12-2021.2314490. background in autonomous technologies and artificial intelligence, its engi-
[37] S. Sudharson and P. Kokil, ‘‘An ensemble of deep neural net- neering applications, robotics vision, teaching, and research. She is active in
works for kidney ultrasound image classification,’’ Comput. Meth- several professional bodies, particularly as a European Artificial Intelligence
ods Programs Biomed., vol. 197, Dec. 2020, Art. no. 105709, doi: Alliance Committee Member and an Associate Editor of IEEE ACCESS.
10.1016/j.cmpb.2020.105709.

32910 VOLUME 12, 2024

You might also like