
Received 17 August 2023, accepted 26 September 2023, date of publication 5 October 2023, date of current version 21 March 2024.

Digital Object Identifier 10.1109/ACCESS.2023.3322147

Adaptive Feature Selection and Image Classification Using Manifold Learning Techniques
AMNA ASHRAF1,2, NAZRI MOHD NAWI2, AND MUHAMMAD AAMIR3
1 Department of Artificial Intelligence, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
2 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), Parit Raja, Johor 86400, Malaysia
3 School of Electronics, Computing and Mathematics, University of Derby, DE22 3AW Derby, U.K.

Corresponding authors: Amna Ashraf ([email protected]) and Nazri Mohd Nawi ([email protected])

ABSTRACT Manifold learning techniques aim at the non-linear dimension reduction of data. Dimension reduction is of wide interest to data analysts and is used throughout computer vision, image processing, pattern recognition, neural networks, and machine learning. This research has been divided into two phases to establish the importance of manifold learning techniques. In the first phase, the manifold learning approach is used to improve 'feature selection by clustering'. Clustering algorithms such as K-means, spectral clustering, and the Gaussian Mixture Model have been tested with manifold learning approaches for adaptive feature selection, and the results obtained are satisfactory compared to simple clustering. In the second phase, a Triple Layered Convolutional Architecture (TLCA) is proposed for image classification, attaining accuracy levels of 85.34%, 59.14%, 71.43%, 90.06%, and 71.71% for the Pistachio, Animal, HAR, Mango Leaves, and Cards datasets respectively. The performance of the proposed TLCA model is compared to other deep learning models, i.e., CNN, LSTM, and GRU. To further improve the accuracy, reduced-dimensional data from a manifold learning technique is used, and the resulting Hybrid Triple Layered Convolutional Architecture (HTLCA) achieves higher accuracies of 97.73%, 87.18%, 97.97%, 99.19%, and 96.91% for the same sequence of datasets. The effectiveness and precision of the suggested methods are demonstrated by the experimental findings.

INDEX TERMS Clustering, feature extraction, feature selection, triple layered convolutional architecture.

I. INTRODUCTION
Manifold learning is a machine learning and data analysis technique that extracts meaningful features from high-dimensional data [1]. Its primary objective is to identify a lower-dimensional representation of the data that preserves the underlying structure and relationships among the data points. The technique treats the data as if it lies on a manifold, which is a curved, lower-dimensional surface embedded in the high-dimensional space. This manifold can be envisioned as a twisted or folded version of the high-dimensional space. By identifying the underlying manifold, manifold learning algorithms can uncover the intrinsic structure of the data, thereby extracting meaningful features that capture this structure.

Several techniques for manifold learning exist, including Principal Component Analysis (PCA) [2], t-distributed Stochastic Neighbor Embedding (t-SNE) [3], and Isometric Mapping (Isomap) [4], among others. These techniques employ different algorithms to determine the lower-dimensional representation of the data while preserving the relationships among the data points. The extracted features can be utilized for diverse tasks, such as classification, clustering, object recognition, image retrieval, and visualization [5]. Through the reduction of the data's dimensionality and the extraction of meaningful features, manifold learning enhances the performance of machine learning and deep learning algorithms and simplifies the understanding and interpretation of the data.

The associate editor coordinating the review of this manuscript and approving it for publication was Jeon Gwanggil.

© 2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

The use of extracted features from manifold learning techniques helps deep learning algorithms to accurately classify images based on their underlying structure and relationships, thereby improving the performance of computer vision systems [6]. Image classification by deep learning algorithms is widely used in fields such as medical imaging, natural language processing, and robotics. By reducing the dimensionality of the image data and extracting meaningful features, manifold learning can enhance the performance of computer vision systems, thereby advancing research and practical applications in these fields.

Diffusion maps [7], Laplacian eigenmaps, and manifold regularized extreme learning machines [8] are other manifold learning algorithms that have been used for picture categorization. These techniques have shown a promising increase in image classification accuracy, as they aim to capture many features of the underlying data structure. One unique approach to feature selection makes use of both labeled and unlabeled data [9]: to find the most pertinent features for classification, it combines manifold learning with a graph-based semi-supervised learning algorithm, which is also used to propagate labels from the labeled to the unlabeled data.

Feature selection is a data preprocessing technique that prepares data for various data mining and machine learning tasks [10]. It aims at a simpler and more comprehensive model, to improve data mining performance and to produce clean and logical data. In recent decades, numerous feature selection techniques have been introduced, primarily designed for supervised classification problems. However, recent advancements in technology and the abundance of unlabeled data generated in various applications, such as text mining, image retrieval, social media, and intrusion detection, have led to a significant interest in Unlabeled Feature Selection (UFS) methods within the scientific community. One proposed solution is SFAM [11], a unified learning paradigm that combines adaptive global structure learning with manifold learning to address the algorithm-cost concern. The method is designed to retain global and sparse reconstruction structure while investigating local structure and label correlations.

FIGURE 1. Main focus of the research.

The main attention of this study is to realize the importance of manifold learning techniques in the domain of machine learning and deep learning. The focus of our research is shown in Fig. 1, where the techniques used and the applications considered are stated. The major applications are image classification, adaptive feature selection, and data visualization. Image classification is used in the fields of medical imaging, natural language processing, and robotics. The adaptive feature selection technique has its advantages in web cluster engines [12], bioinformatics [13], recommendation systems, search result clustering, and social network analyses, while data visualization is essential for image and video processing [14].

Many feature selection methods already exist, such as filters, wrappers, and some hybrid methods [15]. Clustering itself facilitates feature selection, and different clustering algorithms have different accuracies on different datasets. These accuracies can be improved using manifold learning techniques, and the same is the case with image classification. Experimental results show that introducing feature extraction by manifold learning can play an important role in adaptive feature selection and can yield better image classification than state-of-the-art deep learning models achieve.

II. PRELIMINARIES
Machine learning and data analysis employ manifold learning approaches to comprehend and extract high-dimensional data structures. Deep learning techniques have excelled by outperforming other techniques in a variety of applications, including text mining, speaker identification, handwriting recognition, and object detection and recognition. In real-world applications, data often resides on a lower-dimensional manifold embedded in a higher-dimensional environment. Manifold learning attempts to capture and explain this fundamental structure. The different manifold learning techniques are discussed and elaborated below.

A. ISOMAP
Isomap is a dimensionality reduction technique that preserves the geodesic distances between data points and is typically used to visualize high-dimensional data in smaller dimensions. Isomap creates a neighborhood graph from pairwise data point distances and finds a low-dimensional embedding that retains the geodesic distances. Isomap follows the steps below. The data input X with 'd' dimensions, referring to (1), has 'n' data points:

X = [x_1, x_2, ..., x_n], x_i ∈ R^d  (1)

Pairwise distances between data points are computed to build the neighborhood graph. The distance matrix D = [d_ij] holds the distance between data points x_i and x_j. In the k-nearest neighborhood graph G, the Euclidean distance (2) is used to calculate the edge length:
d_ij = ‖x_i − x_j‖²  (2)

An adjacency matrix A represents the neighborhood graph, where A_ij = 1 if x_i and x_j are connected and 0 otherwise. Next, the geodesic distances between all pairs of data points are obtained. The shortest-route distance along G is the geodesic distance (3), G_ij, between x_i and x_j:

G_ij = shortest_path_distance(x_i, x_j)  (3)

The shortest paths are usually calculated using graph-based methods like Dijkstra's or Floyd-Warshall. The distance matrix D_G = [G_ij] holds the geodesic distance between x_i and x_j. Isomap then computes a low-dimensional data embedding using classical multidimensional scaling (MDS). MDS finds a group of points in a lower-dimensional space that approximates the pairwise distances from the high-dimensional space. The low-dimensional embedding matrix Y = [y_1, y_2, ..., y_n] holds the lower-dimensional coordinates of each data point x_i.

Isomap has proved successful in several applications, but the data and parameters determine its efficacy. Like other dimensionality reduction methods, Isomap does not work well for every dataset; the data structure, noise, and outliers affect its performance.
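To make the three steps concrete, here is a minimal Python sketch of the pipeline just described (our own illustration, not the authors' code; it assumes NumPy, SciPy, and scikit-learn are available):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, n_neighbors=10, n_components=2):
    # Step 1: k-nearest-neighbor graph with Euclidean edge lengths, Eq. (2).
    A = kneighbors_graph(X, n_neighbors, mode="distance")
    # Step 2: geodesic distances G_ij as graph shortest paths (Dijkstra), Eq. (3).
    G = shortest_path(A, method="D", directed=False)
    # Step 3: classical MDS on the geodesic distance matrix D_G.
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ (G ** 2) @ J                     # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]  # keep the largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))
```

scikit-learn's sklearn.manifold.Isomap packages the same three steps; a disconnected neighborhood graph (infinite geodesic distances) is the typical failure mode on the difficult datasets mentioned above.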
B. LLE
Locally Linear Embedding (LLE) is an effective non-linear dimension reduction technique for reducing the features of high-dimensional data while retaining its core geometric structure. The LLE algorithm consists of three key stages: constructing a neighborhood graph, computing the weight matrix, and computing the embedding coordinates. To begin, the algorithm constructs a neighborhood graph G represented by an adjacency matrix. It identifies the k nearest neighbors 'j' of each data point 'i' and connects them with edges; the variable G_ij is set to 1 if there is an edge between i and j and to 0 otherwise. Next, for each data point, the algorithm computes a weight matrix by minimizing the reconstruction error E(W) (4) between the data point and its neighbors using linear weights W_ij:

E(W) = Σ_i ‖X_i − Σ_j W_ij X_j‖²  (4)

Finally, the algorithm computes the embedding coordinates Y_i by minimizing a cost function C(Y) that preserves the local relationships between the data points, referred to in (5):

C(Y) = Σ_i ‖Y_i − Σ_j W_ij Y_j‖²  (5)

The resulting embedding coordinates provide a lower-dimensional representation of the data that maintains its essential geometric structure. Unlike Isomap, LLE recovers global nonlinear structure from locally linear fits.
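In practice the whole procedure is available off the shelf; a minimal scikit-learn sketch with illustrative parameter values (not the paper's settings):

```python
from sklearn.manifold import LocallyLinearEmbedding

# k neighbors per point and the target dimensionality are the main knobs.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Y = lle.fit_transform(X)  # rows of Y minimize the cost C(Y) of Eq. (5)
```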
C. UMAP
UMAP is mostly used on larger datasets to convert high-dimensional data to lower-dimensional data so that visualization is much better and easier; it is also beneficial for identifying outliers and similarities. UMAP works in a way that preserves the high-dimensional grouping of the data and the relationships between different data points. The method starts with all the high-dimensional points placed in the low dimension and then moves those low-dimensional points so that the categorization among the different groups remains the same as the relationships present in the high-dimensional data. Distances between every pair of data points in high dimensions are calculated in the initial step. The UMAP algorithm then determines a similarity score for each cluster, which helps recognize how good the clustering is; it must match the clusters present in the low-dimensional graph. UMAP uses spectral embedding to initialize the low-dimensional graph using the similarity score SS (6):

SS = e^(−(raw distance − distance to nearest neighbor)/σ)  (6)

Cost = log(1/neighbor) + log(1/(1 − not neighbor))  (7)

UMAP focuses on the two scores, 'neighbor' and 'not neighbor', to evaluate whether a point is in the right place or not, combining them in the cost function elaborated in (7). For an optimal low-dimensional graph, very few points are moved at a time by stochastic gradient descent.

D. PHATE
High-dimensional data is complex to visualize in a manner that is intuitive and accurate. A visualization method must preserve the local and global structure of the higher-dimensional data, denoise the data so that the underlying structure is visible, and preserve as much information as possible, i.e., local and global structure, in low dimensions (two to three). In addition, a visualization method should be robust, in the sense that the obtained data structure is insensitive to the user's configuration of the algorithm, and scalable to the massive sizes of contemporary data. Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE) [16] is designed for these objectives.

There are three main steps in the algorithm. The first step is to use local similarities to encode local data information. The second step is to use potential distances to encode global relationships in the data. The third is to obtain low-dimensional data by embedding the potential distance information.
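Both methods ship as Python packages (umap-learn and phate); a minimal usage sketch, assuming the flattened image features are stored in a NumPy array X (parameter values are illustrative):

```python
import umap    # pip install umap-learn
import phate   # pip install phate

# UMAP: the neighborhood size trades local against global structure.
X_umap = umap.UMAP(n_neighbors=15, n_components=2).fit_transform(X)

# PHATE: diffusion-based embedding of the same feature matrix.
X_phate = phate.PHATE(n_components=2).fit_transform(X)
```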
III. METHODOLOGY
The research has been divided into two phases. Feature selection is the first phase, in which different clustering techniques are used; the selected features are analyzed against five datasets. Along with the clustering techniques, some manifold learning techniques are hybridized with the intention of attaining better performance. In the following sections, the manifold learning techniques and the clustering techniques are explained respectively. The second phase of the research concerns image classification, where a new model, TLCA, is proposed, and three state-of-the-art algorithms are tested and evaluated for image classification.

FIGURE 2. Adaptive feature selection includes techniques such as feature extractor and clustering algorithms for feature selection.
A. ADAPTIVE FEATURE SELECTION
The objective of feature selection for clustering is to select a set of the most relevant features that facilitate the discovery of natural clusters in the data, according to the selected criterion [17]. These selected features may lead to the best version of the relevant features if a suitable feature extraction technique is applied to consider the spatial features of the image data X. Fig. 2 represents the complete flow of how the data spectrum is used to capture spectral features and how feature extraction is performed. As normal preprocessing steps, data normalization and data scaling of a spectral signature are used to provide the spectral features of an image. On top of these, we add feature extraction using Isomap, LLE, UMAP, or PHATE, the manifold learning techniques explained in Section II.

Clustering algorithms discussed in the literature are sensitive to data size, dimensionality, or both. An entropy-based solution has been proposed for the ranking of features [18]. The key issue with this resolution is the repeated calculation required for the information-entropy-based significance of an attribute set, which slows down feature selection for large datasets. Consequently, feature extraction followed by feature selection assists in this regard. Adaptive feature selection involves different combinations of clustering techniques and manifold learning techniques. The experimentally tested clustering methodologies are discussed below.

1) K-MEANS
K-means works in an iterative process [19] of assigning all the data points to groups, starting from an initial supposition of a specific centroid for each cluster. This assignment of data points is done by calculating the Euclidean distance (8) between the data points and the supposed centroids:

d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )  (8)

The centroids chosen for a fixed number of clusters in the first step keep changing to minimize the sum of the distances between the data points and their assigned centroids:

C_i = (1/|N_i|) Σ_{x ∈ N_i} x  (9)
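A compact sketch of this iteration (our own illustrative implementation), alternating the distance-based assignment of Eq. (8) with the centroid update of Eq. (9):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]   # initial supposed centroids
    for _ in range(n_iter):
        # Eq. (8): Euclidean distance from every point to every centroid.
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = d.argmin(axis=1)                 # assign points to nearest centroid
        # Eq. (9): move each centroid to the mean of its assigned points.
        C = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else C[i]
                      for i in range(k)])
    return labels, C
```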


2) SPECTRAL CLUSTERING
Numerous fields, such as data analysis, video indexing, character identification, image processing, and speech separation, have effectively implemented spectral clustering, and in these applications and many more, the number of data elements to cluster can be extraordinarily large [20]. The basic concepts of spectral clustering involve algebraic graph theory and graph cut methods. The advanced development of spectral clustering comprises the similarity matrix, the Laplacian matrix, the selection of eigenvectors, and the number of clusters chosen. The main focus of spectral clustering is choosing a distance measurement that adequately describes the intrinsic structure of the data elements. Data within the same category should have a high degree of similarity and adhere to space consistency. The measurement of similarity is vital to the efficacy of spectral clustering [21]; as a rule, the Gaussian kernel function is chosen as the similarity measure.

Following the construction of a similarity matrix, the corresponding Laplacian matrix is created using various graph cut methods. The efficacy of spectral clustering algorithms is significantly influenced by the selection of graph cut methods and the construction of the Laplacian matrices. Through eigen-decomposition, the eigenvalues and eigenvectors of a Laplacian matrix can be determined. An analysis of the properties of the eigenspace demonstrates that: (a) not every eigenvector of a Laplacian matrix is relevant for clustering; (b) eigenvector selection is crucial, because using uninformative eigenvectors can result in poor clustering; and (c) the corresponding eigenvalues cannot be used to select the relevant eigenvectors for a realistic dataset.
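The following sketch strings these pieces together (Gaussian-kernel similarity, symmetric normalized Laplacian, eigenvector embedding, then k-means); it is a minimal illustration, with σ and the eigenvector choice as the sensitive knobs discussed above:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Gaussian-kernel similarity matrix.
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = np.eye(len(X)) - d_inv_sqrt @ W @ d_inv_sqrt
    # Embed each point with the k eigenvectors of smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```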
3) GAUSSIAN MIXTURE MODEL
The Gaussian Mixture Model (GMM) [22] works much as k-means does, but k-means only performs well for data distributed in circular shapes, because it clusters the points within a circular region whose radius is defined by the most distant point. In the case of GMM, the clusters can be oblong, depending upon the data distribution. Besides assigning a cluster to each point, GMM considers the probability that a certain point belongs to each cluster.
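This soft, probabilistic assignment is directly exposed by scikit-learn; a short sketch with illustrative parameters:

```python
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)    # hard cluster assignment per point
probs = gmm.predict_proba(X)   # probability of each point belonging to each cluster
```

The "full" covariance type is what permits the oblong (elliptical) clusters that k-means cannot model.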
B. IMAGE CLASSIFICATION
A popular technique for classifying hyperspectral images is supervised classification. The fundamental procedure is to calculate the discriminant function and then establish the discriminant criterion based on the given sample categories and prior knowledge. The support vector machine, artificial neural network (ANN) [23], convolutional neural network (CNN) [24], long short-term memory (LSTM), decision tree, gated recurrent unit network (GRU) [25], and maximum likelihood classification methods are frequently employed supervised classification techniques. Some of these are described below.
1) CNN
A CNN's structure includes convolutional, pooling, non-linear activation, and fully connected layers. In general, the image is preprocessed [26] before being provided to the network via the input layer, passed through a series of alternately arranged convolutional and pooling layers, and then a fully connected layer is used for classification.

CNN [27], [28] adds very distinctive convolutional and pooling layers compared to the Multilayer Perceptron (MLP). For large data sets, CNN exhibits exceptional cost performance in terms of model size, and its performance is better as well. The convolutional layer has the property of a local receptive field, which retains the input shape. Another point to be noted is that the convolutional layer repeatedly applies the same convolution kernel at various input positions through a sliding window, thereby effectively preventing the number of training parameters from becoming excessively large. The pooling layer reduces the computational load by minimizing the number of connections between the convolutional layers [29] and alleviates the convolutional layer's excessive position sensitivity. CNN ensures the invariance of the input image pixels with respect to displacement, scaling, and distortion.

2) LSTM
Long Short-Term Memory (LSTM) is a sophisticated form of Recurrent Neural Network (RNN) that captures long-term dependencies. LSTM was introduced in 1997 [30] and improved in 2013 [31], garnering a great deal of popularity in the deep learning community. LSTM models have proven more effective than standard RNNs at retaining and utilizing information over extended sequences [32].

In an LSTM network, the current input at a particular time step and the output from the previous time step are supplied to the LSTM unit, which in turn generates an output that is passed on to the subsequent time step. Commonly, the final hidden layer of the last time step, and sometimes all hidden layers, are used for classification purposes [33].

Three gates comprise LSTM: the input gate, the forget gate, and the output gate. Each gate serves a distinct purpose in regulating the passage of information. Based on the current input and the preceding internal state, the input gate determines how to update the internal state. The forget gate determines how much of the preceding internal state should be forgotten. Lastly, the output gate regulates the effect of the internal state on the output [34].

3) GRU
A gated recurrent unit (GRU) is an improvement on the conventional RNN (recurrent neural network). In 2014, Kyunghyun Cho [35] introduced it for statistical machine translation. GRUs are broadly similar to LSTMs: they also employ gates to control the information flow. They are comparatively more recent than LSTM and are superior to LSTM in terms of simplicity of architecture.

Unlike LSTM, a GRU lacks a distinct cell state (C_t) and possesses only a hidden state (H_t). Due to their simplified architecture, GRUs can be trained more quickly. Only two gates comprise the GRU, the reset gate and the update gate, and the equations for their functionalities are as follows:

r_t = σ(x_t × U_r + H_{t−1} × W_r)  (10)

u_t = σ(x_t × U_u + H_{t−1} × W_u)  (11)

The reset gate uses (10), where U_r and W_r are the weight matrices for the reset gate. Similarly, the update gate uses (11), where U_u and W_u are the weight matrices for the update gate.
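A small NumPy sketch of one GRU step is given below. Eqs. (10) and (11) specify only the two gates; the candidate-state and hidden-state updates follow the standard formulation of [35] and are our addition for completeness:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, H_prev, Ur, Wr, Uu, Wu, Uh, Wh):
    r_t = sigmoid(x_t @ Ur + H_prev @ Wr)      # reset gate, Eq. (10)
    u_t = sigmoid(x_t @ Uu + H_prev @ Wu)      # update gate, Eq. (11)
    # Candidate state: the reset gate decides how much history to expose.
    H_cand = np.tanh(x_t @ Uh + (r_t * H_prev) @ Wh)
    # The update gate interpolates between the old state and the candidate ([35]).
    return u_t * H_prev + (1.0 - u_t) * H_cand
```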


FIGURE 3. Proposed model of image classification.

4) TLCA (PROPOSED MODEL)
TLCA has proven to be an effective solution for image classification problems. The performance on large image databases, such as the Pistachio, HAR, Mango Leaves, and Cards datasets, has been significantly enhanced by the TLCA-based network. As an improved form of CNN, it is very adept at understanding the local and global structures of image data.

The overall design of the framework is depicted in Fig. 3. The first part of layer 1 is a convolutional layer with 32 output channels and a kernel dimension of 3 × 3 pixels. The second part of layer 1 is also a convolutional layer, with 64 output channels and the same kernel size. The third part of layer 1 is a max pooling layer with a 2 × 2 kernel. In the triple-layered architecture, the same sequence is repeated three times. Each of the subsequent five layers is fully connected, composed of 73728-1024-512-64-c neurons, where 'c' differs per dataset and is the number of classes that dataset has. Since the input image is not textual, the network must learn large-scale, high-level features. The network with the three-layered architecture performs image classification tasks significantly well. The large number of parameters to be learned may result in overfitting, but as a consequence, accuracy improves. The results obtained using 20 epochs with batch size 32 are satisfactory. The model summary is shown in Table 1.

TABLE 1. Model summary.
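From this description, the topology can be sketched in Keras as follows. This is our reconstruction, not the authors' published code: the input resolution, activations, and padding are not stated in the paper (Table 1 implies an input size whose flattened feature vector has 73,728 elements), so the values below are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_tlca(input_shape=(128, 128, 3), num_classes=4):  # input size assumed
    model = keras.Sequential([keras.Input(shape=input_shape)])
    # One "layer" = Conv(32, 3x3) -> Conv(64, 3x3) -> MaxPool(2x2), repeated 3 times.
    for _ in range(3):
        model.add(layers.Conv2D(32, 3, activation="relu", padding="same"))
        model.add(layers.Conv2D(64, 3, activation="relu", padding="same"))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())  # the paper's Table 1 reports 73,728 units here
    # Fully connected head: 1024 -> 512 -> 64 -> c.
    for units in (1024, 512, 64):
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_tlca(num_classes=c)
# model.fit(train_images, train_labels, epochs=20, batch_size=32)
```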

C. EXPERIMENTAL SETUP
To evaluate the proposed adaptive feature selection approach and the image classification model TLCA, we used the following experimental setup and five datasets, whose description is given below.

The experimental setup involves disk storage, system RAM, and GPU RAM as hardware requirements and Python 3 as the software prerequisite. Depending on dataset size and model complexity, almost 32 GB of system RAM and 32 GB of disk storage are desired for data accumulation, model checkpoints, and other relevant files. We used the Google Colab Pro+ version for our experimentation, with the A100 type of GPU. The latest generation, the A100-80GB, doubles GPU memory and introduces the world's quickest memory bandwidth at 2 terabytes per second (TB/s), which accelerates time to solution for the largest models and datasets.


TABLE 2. Dataset description.

D. DATASETS
Five different image datasets, described in Table 2, have been taken from the Kaggle repository. The image data has been divided into three proportions for training, testing, and validation: of the total images, 75% is used for training, 15% for testing, and 10% for validation. Their distribution can be seen in the table.

The distribution of data over the different classes in each dataset is demonstrated in the histograms shown in Fig. 4. The Pistachio and Animal datasets are binary-class datasets, while the others are multiclass. The 'Human Action Recognition' (HAR) dataset is balanced, which means each class has an equal number of images; the 'Mango Leaves' dataset is the most imbalanced, while the others are nearly balanced.

FIGURE 4. Data distribution over classes.
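The 75/15/10 split can be reproduced, for example, with two calls to scikit-learn's train_test_split; the stratification and the variable names are our illustrative choices:

```python
from sklearn.model_selection import train_test_split

# First carve off the 25% used for testing plus validation.
X_train, X_rest, y_train, y_rest = train_test_split(
    images, labels, test_size=0.25, stratify=labels, random_state=0)
# Then split that remainder into 15% test and 10% validation (0.4 * 25% = 10%).
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.4, stratify=y_rest, random_state=0)
```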

IV. EXPERIMENTAL RESULTS

A. ADAPTIVE FEATURE SELECTION
In the first phase of experimentation, three clustering algorithms, K-means, GMM, and spectral clustering, are used for feature selection. As expected, the results are not satisfactory, so the manifold learning techniques LLE, Isomap, UMAP, and PHATE are introduced as a preprocessing step for better performance. For different datasets, different combinations of clustering and manifold learning techniques provide the best results. Three out of five datasets, i.e., Animal, HAR, and Cards, perform better with PHATE + K-means, while for the Pistachio dataset it is Isomap + K-means that performs well. As far as simple clustering is concerned, spectral clustering behaves far better than K-means and GMM for this dataset, but when these clustering techniques are combined with Isomap, K-means provides the better features. There are two prominent cases where clustering accuracy improved remarkably after introducing manifold learning: for the HAR dataset it rises from 8.07% to 55.07%, while for Cards it climbs from 20.80% to 31.78%.
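A sketch of this phase-1 pipeline: reduce the features with a manifold learning technique, cluster, and score clustering accuracy by mapping each cluster to its majority class. The majority-class scoring is our illustrative choice; the paper does not publish its evaluation code:

```python
import numpy as np
import phate
from sklearn.cluster import KMeans

def clustering_accuracy(y_true, clusters):
    # Map every cluster to its most frequent true label, then score.
    y_pred = np.empty_like(y_true)
    for c in np.unique(clusters):
        mask = clusters == c
        y_pred[mask] = np.bincount(y_true[mask]).argmax()
    return float((y_pred == y_true).mean())

X_low = phate.PHATE(n_components=2).fit_transform(X)  # manifold preprocessing
clusters = KMeans(n_clusters=len(np.unique(y)), n_init=10).fit_predict(X_low)
print("clustering accuracy:", clustering_accuracy(y, clusters))
```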

B. IMAGE CLASSIFICATION
In the second phase of experimentation, the proposed image classification model, TLCA, is evaluated based on accuracy. Its performance is compared with the state-of-the-art image classification models CNN, LSTM, and GRU. Simple classification can be further improved by reducing the data size before processing.

This data size reduction is dimension reduction, which prevents overfitting and eliminates noise and redundancy. Eventually, the computational cost is reduced and generalization performance improves. As shown in Table 4, the accuracy level of TLCA for the Pistachio, Animal, and Cards datasets is far better than that of CNN, LSTM, and GRU. The accuracy is further improved when feature extraction by PHATE is done and the reduced features are used for classification by TLCA: for Pistachio it goes from 85.34% to 97.73%; for HAR it improves from 71.43% to 97.97%; and for the Cards dataset the accuracy rises from 71.71% to 95.65%. The results of TLCA with PHATE, i.e., the Hybrid Triple Layered Convolutional Architecture (HTLCA), are also mentioned in the table.


TABLE 3. Adaptive feature selection results.

TABLE 4. Image classification results.

To show the performance of TLCA, we present the convergence graphs. Training accuracy and validation accuracy curves are demonstrated in the upper half of Fig. 5 for each of the five datasets; similarly, training loss and validation loss curves are shown in the lower half of each figure. It can be observed from the accuracy plots of three datasets that accuracy climbs over the 20 epochs and the model performs well against the state-of-the-art algorithms, while convergence in the loss plots is not as evident, which leaves room for overfitting to be handled. In HTLCA, we first extract features, then reduce dimensions using PHATE, and then train TLCA offline using the labeled image dataset.
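An HTLCA-style sketch of that sequence is given below. It is our reconstruction under stated assumptions: the paper does not publish how the PHATE-reduced features are wired into TLCA, so here they feed the fully connected head directly, and n_components is illustrative:

```python
import phate
from tensorflow import keras
from tensorflow.keras import layers

# Offline: reduce the flattened, labeled image features with PHATE.
X_low = phate.PHATE(n_components=10).fit_transform(X_flat)

clf = keras.Sequential([
    keras.Input(shape=(X_low.shape[1],)),
    layers.Dense(512, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
clf.fit(X_low, y, epochs=20, batch_size=32, validation_split=0.1)
```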

V. DISCUSSION
Adaptive feature learning via clustering is introduced in the first phase of this research, using K-means, GMM, and spectral clustering. It is evident that feature selection can be significantly improved using manifold learning techniques. In the second phase of experimentation, the proposed image classification model HTLCA is tested, and its accuracy is compared with the state-of-the-art classification models CNN, LSTM, and GRU. The association with manifold learning further improves classification performance. The convergence graph for the training/validation accuracy and loss of the proposed model shows how it behaves (Fig. 5): a good fit in some cases and overfitting for other datasets. The results show that there is no overfitting in the case of the small datasets. Among the model accuracies mentioned in Table 4, TLCA achieved the best classification performance for the Pistachio and Cards image datasets, with accuracies of 85.34% and 71.71%.


FIGURE 5. Graph of convergence for HTLCA.

Using PHATE as a preprocessing step (HTLCA) increases the classification accuracy up to 97.73%, 60.18%, 97.97%, and 95.65% for the Pistachio, Animal, HAR, and Cards datasets respectively. Some useful insights observed in the experimental results follow.

For the larger datasets, Pistachio, HAR, Mango Leaves, and Cards, we observe a smooth TLCA accuracy curve (Fig. 5), while for the 'Animal' dataset, jerks are found. This dataset is about 100 times smaller than the others, so adequate training data is required for better model performance. For the Pistachio, Mango Leaves, and Animal datasets, the training accuracy and validation accuracy lines lie much closer to each other, which indicates very little overfitting; it means the model is performing well on unseen data. Contrary to this, the data is overfitted for the HAR and


Cards datasets. Accuracy may be compromised for smaller datasets and for datasets where the data distribution over the classes is not balanced. For instance, the Animal dataset is small (Table 2), and for Mango Leaves the data distribution over 16 classes is unbalanced (see Fig. 4).

VI. CONCLUSION AND FUTURE WORK
Manifold learning is a technique of machine learning and data analysis that extracts significant features from high-dimensional data. Different clustering algorithms perform differently on various datasets for feature selection; their accuracies can be enhanced using manifold learning techniques, i.e., PHATE, UMAP, Isomap, and LLE. The extracted features can also assist in image classification. Therefore, feature extraction by manifold learning followed by adaptive feature selection or image classification performs well, as the experimental results depict. The Animal, HAR, and Cards datasets perform better with PHATE followed by K-means, while for the Pistachio dataset it is Isomap followed by K-means that performs well. In the second phase of experimentation, the proposed image classification model TLCA was evaluated against the modern classification models CNN, LSTM, and GRU, attaining accuracies of 97.73%, 60.18%, 97.97%, and 95.65% for the Pistachio, Animal, HAR, and Cards datasets respectively.

In the future, this research can be extended with dimension reduction by auto-encoders. As the results show how drastically performance accelerates when manifold learning techniques are used, extra feature reduction could lower training times while enhancing, or at least retaining, the accuracy of feature selection and image classification. Moreover, work can be done to resolve the data overfitting issues.

DATA AVAILABILITY STATEMENT
Code will be available on demand.

REFERENCES
[1] A. J. Izenman, "Introduction to manifold learning," Wiley Interdiscip. Rev. Comput. Stat., vol. 4, no. 5, pp. 439–446, 2012, doi: 10.1002/wics.1222.
[2] E. Oja, "The nonlinear PCA learning rule in independent component analysis," Neurocomputing, vol. 17, pp. 25–45, Sep. 1997, doi: 10.1016/S0925-2312(97)00045-3.
[3] G. H. L. van der Maaten, "Visualizing data using t-SNE," Ann. Oper. Res., vol. 219, no. 1, pp. 187–202, 2014, doi: 10.1007/s10479-011-0841-3.
[4] Y. Zhang, Z. Zhang, J. Qin, L. Zhang, B. Li, and F. Li, "Semi-supervised local multi-manifold isomap by linear embedding for feature extraction," Pattern Recognit., vol. 76, pp. 662–678, Apr. 2018, doi: 10.1016/j.patcog.2017.09.043.
[5] D. Lunga, S. Prasad, M. M. Crawford, and O. Ersoy, "Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning," IEEE Signal Process. Mag., vol. 31, no. 1, pp. 55–66, Jan. 2014, doi: 10.1109/MSP.2013.2279894.
[6] J. Zhang, S. Z. Li, and J. Wang, "Manifold learning and applications in recognition," in Intelligent Multimedia Processing With Soft Computing. Berlin, Germany: Springer, 2006, pp. 281–300, doi: 10.1007/3-540-32367-8_13.
[7] Y. Fan and Z. Zhao, "Cryo-electron microscopy image analysis using multi-frequency vector diffusion maps," 2019, arXiv:1904.07772.
[8] B. Liu, S.-X. Xia, F.-R. Meng, and Y. Zhou, "Manifold regularized extreme learning machine," Neural Comput. Appl., vol. 27, no. 2, pp. 255–269, Feb. 2016, doi: 10.1007/s00521-014-1777-8.
[9] X. Chen, R. Chen, Q. Wu, F. Nie, M. Yang, and R. Mao, "Semisupervised feature selection via structured manifold learning," IEEE Trans. Cybern., vol. 52, no. 7, pp. 5756–5766, Jul. 2022, doi: 10.1109/TCYB.2021.3052847.
[10] J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino, J. Tang, and H. Liu, "Feature selection: A data perspective," ACM Comput. Surv., vol. 50, no. 6, pp. 1–45, Nov. 2018, doi: 10.1145/3136625.
[11] S. Lv, S. Shi, H. Wang, and F. Li, "Semi-supervised multi-label feature selection with adaptive structure learning and manifold learning," Knowl.-Based Syst., vol. 214, Feb. 2021, Art. no. 106757, doi: 10.1016/j.knosys.2021.106757.
[12] J. Alzubi, A. Nayyar, and A. Kumar, "Machine learning from theory to algorithms: An overview," J. Phys., Conf., vol. 1142, Nov. 2018, Art. no. 012012, doi: 10.1088/1742-6596/1142/1/012012.
[13] H. Bhaskar, D. C. Hoyle, and S. Singh, "Machine learning in bioinformatics: A brief survey and recommendations for practitioners," Comput. Biol. Med., vol. 36, no. 10, pp. 1104–1125, Oct. 2006, doi: 10.1016/j.compbiomed.2005.09.002.
[14] D. Liao, Y. Qian, and Y. Y. Tang, "Constrained manifold learning for hyperspectral imagery visualization," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 4, pp. 1213–1226, Apr. 2018, doi: 10.1109/JSTARS.2017.2775644.
[15] N. Hoque, D. K. Bhattacharyya, and J. K. Kalita, "MIFS-ND: A mutual information-based feature selection method," Exp. Syst. Appl., vol. 41, no. 14, pp. 6371–6385, Oct. 2014, doi: 10.1016/j.eswa.2014.04.019.
[16] K. R. Moon, D. van Dijk, Z. Wang, S. Gigante, D. B. Burkhardt, W. S. Chen, K. Yim, A. V. D. Elzen, M. J. Hirn, R. R. Coifman, N. B. Ivanova, G. Wolf, and S. Krishnaswamy, "Visualizing structure and transitions in high-dimensional biological data," Nature Biotechnol., vol. 37, no. 12, pp. 1482–1492, Dec. 2019, doi: 10.1038/s41587-019-0336-3.
[17] J. R. Adhikary and M. N. Murty, "Feature selection for unsupervised learning," in Neural Information Processing (Lecture Notes in Computer Science), vol. 7665. Berlin, Germany: Springer, 2012, pp. 382–389, doi: 10.1007/978-3-642-34487-9_47.
[18] M. Dash and H. Liu, "Feature selection for clustering," in Knowledge Discovery and Data Mining, Current Issues and New Applications (Lecture Notes in Computer Science), vol. 1805. Berlin, Germany: Springer, 2000, pp. 110–121, doi: 10.1007/3-540-45571-x_13.
[19] J. Yadav and M. Sharma, "A review of K-mean algorithm," Int. J. Eng. Trends Technol., vol. 4, no. 7, pp. 2972–2976, 2013.
[20] H. Jia, S. Ding, X. Xu, and R. Nie, "The latest research progress on spectral clustering," Neural Comput. Appl., vol. 24, nos. 7–8, pp. 1477–1486, Jun. 2014, doi: 10.1007/s00521-013-1439-2.
[21] L. Wang, L. F. Bo, and L. C. Jiao, "Density-sensitive spectral clustering," Acta Electron. Sin., vol. 35, no. 8, pp. 1577–1581, 2007.
[22] G. J. McLachlan and S. Rathnayake, "On the number of components in a Gaussian mixture model," WIREs Data Mining Knowl. Discovery, vol. 4, no. 5, pp. 341–355, Sep. 2014, doi: 10.1002/widm.1135.
[23] F. Paquin, J. Rivnay, A. Salleo, N. Stingelin, and C. Silva, "Multi-phase semicrystalline microstructures drive exciton dissociation in neat plastic semiconductors," J. Mater. Chem. C, vol. 3, pp. 10715–10722, Jan. 2015, doi: 10.1039/C5TC02043C.
[24] D. Bhatt, C. Patel, H. Talsania, J. Patel, R. Vaghela, S. Pandya, K. Modi, and H. Ghayvat, "CNN variants for computer vision: History, architecture, application, challenges and future scope," Electronics, vol. 10, no. 20, p. 2470, Oct. 2021, doi: 10.3390/electronics10202470.
[25] R. Dey and F. M. Salem, "Gate-variants of gated recurrent unit (GRU) neural networks," in Proc. IEEE 60th Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2017, pp. 1597–1600, doi: 10.1109/MWSCAS.2017.8053243.
[26] D. Ciregan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 3642–3649, doi: 10.1109/CVPR.2012.6248110.
[27] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1717–1724, doi: 10.1109/CVPR.2014.222.
[28] I. H. Md Yusof, M. An, and M. H. Barghi, "Integration of lean construction considerations into design process of construction projects," in Proc. 31st Annu. Assoc. Res. Constr. Manag. Conf., 2015, pp. 885–894.


[29] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, "Recent advances in convolutional neural networks," Pattern Recognit., vol. 77, pp. 354–377, May 2018, doi: 10.1016/j.patcog.2017.10.013.
[30] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[31] A. Graves, "Generating sequences with recurrent neural networks," 2013, arXiv:1308.0850.
[32] F. M. Shiri, T. Perumal, N. Mustapha, R. Mohamed, M. A. B. Ahmadon, and S. Yamaguchi, "A survey on multi-resident activity recognition in smart environments," 2023, arXiv:2304.12304.
[33] S. Minaee, E. Azimi, and A. Abdolrashidi, "Deep-sentiment: Sentiment analysis using ensemble of CNN and bi-LSTM models," 2019, arXiv:1904.04206.
[34] W. Fang, Y. Chen, and Q. Xue, "Survey on research of RNN-based spatio-temporal sequence prediction algorithms," J. Big Data, vol. 3, no. 3, pp. 97–110, 2021, doi: 10.32604/jbd.2021.016993.
[35] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder–decoder for statistical machine translation," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1724–1734, doi: 10.3115/v1/d14-1179.

AMNA ASHRAF was born in Bahawalpur, Pakistan, in 1992. She received the bachelor's and master's degrees in computer engineering from the University of Engineering and Technology, Lahore, Pakistan, in 2014 and 2017, respectively. She is currently pursuing the Ph.D. degree with Universiti Tun Hussein Onn Malaysia (UTHM) under the supervision of Prof. Nazri Mohd Nawi and Muhammad Aamir. She is working on the dimensionality reduction of large image datasets. Her research interest includes artificial intelligence; another emerging interest is dealing with hyperspectral images from satellites.

NAZRI MOHD NAWI received the bachelor's degree from Universiti Sains Malaysia (USM), the master's degree in computer science from Universiti Teknologi Malaysia (UTM), and the Ph.D. degree in data mining from Swansea University, Wales, U.K. He is currently a Professor with the Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), where he has been a Faculty Member since 2001. In recent years, he has focused on better techniques for classification and on analyzing and hybridizing new improvements to ANNs using meta-heuristic techniques. He has successfully supervised several Ph.D. students, is currently supervising eight Ph.D. students, and has published more than 100 papers in journals and conference proceedings. His research interests include soft computing and data mining techniques, particularly artificial neural networks, ranging from theory to design and implementation. He has been involved with many conference and workshop program committees and serves as a reviewer for many outstanding journals and international conferences.

MUHAMMAD AAMIR received the master's degree in computer science from the City University of Science and Information Technology, Pakistan, and the Ph.D. degree in information technology from Universiti Tun Hussein Onn Malaysia, in 2020. Since October 2020, he has been a Research Data Scientist and Machine Learning Development Engineer with the University of Derby, U.K. He was a Data Scientist with Xululabs LLC for the past two years. A prolific contributor to academia, he has authored more than 25 journal articles and conference papers, in addition to being a coauthor of a book focused on data analysis. His research interests include data science, deep learning, and computer programming.
