Electronics 12 00488 v2
Review
Machine Learning and Deep Learning Techniques for
Spectral Spatial Classification of Hyperspectral Images:
A Comprehensive Survey
Reaya Grewal, Singara Singh Kasana * and Geeta Kasana
Computer Science and Engineering Department, Thapar Institute of Engineering and Technology,
Patiala 147004, India
* Correspondence: [email protected]
Abstract: The growth of Hyperspectral Image (HSI) analysis is driven by technological advances that
enable cameras to capture hundreds of contiguous spectral bands for each pixel in an image. HSI
classification is challenging due to the large number of redundant spectral bands, the limited number of training
samples and the non-linear relationship between spatial position and the spectral bands. Our
survey highlights recent research in HSI classification using traditional Machine Learning (ML) techniques
such as kernel-based learning, Support Vector Machines, dimension reduction and transform-based
techniques. Our study also examines Deep Learning (DL) techniques that use
Autoencoders and 1D, 2D and 3D Convolutional Neural Networks to classify HSI. From the comparison,
it is observed that DL-based classification techniques outperform ML-based techniques. It is also
observed that spectral-spatial HSI classification outperforms pixel-by-pixel classification because
it incorporates both spectral signatures and spatial information. The performance of ML and
DL-based classification techniques is reviewed on commonly used land cover datasets such as
Indian Pines, Salinas Valley and Pavia University.
Keywords: hyperspectral images; classification; deep learning; PSO; SVM; KNN; decision tree; PCA;
DWT; ANN; CNN
• Food and Safety—HSI has contributed to food quality assessment and safety. It
has been used to identify defects and levels of contamination. For example,
Leiva et al. [3] employed HSI to estimate the firmness of blueberries and achieved an accuracy
of 87%.
• Medical Diagnosis—Owing to its high spectral resolution, HSI captures materials sharply and
highlights their chemical and physical compositions. HSI has shown excellent performance
in studying and diagnosing tissues. For example, Liu, Wang and
Li [4] used HSI images of tongue tissues to detect tumors. The spectral signatures
of the tissues played a vital role in detection.
• Precision Agriculture—Manual crop monitoring is limited since visible symptoms
often develop late in the disease’s progression, making it difficult to restore plant
health. Advances in HSI methodologies have made crop stress assessment and the study
of soil and vegetation attributes more cost-effective. For example, Liu et al. [5] used spectral
signatures to estimate wheat yield.
• Environment Monitoring—HSI has also been applied to flood and water resources
management. HSI provides efficient and reliable information on water quality parameters,
which include hydrophysical, biochemical and biological properties. For example, Kutser et al. [6]
used HSI to measure chlorophyll content in water bodies.
There are many approaches to classifying an HSI. In this work, ML and DL classification
techniques are reviewed and compared. ML-based image classification
focuses on developing algorithms that predict and detect patterns without human intervention.
Various classifiers such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN),
Electronics 2023, 12, 488 4 of 34
and Decision Trees (DT) are trained. Several steps of data pre-processing and feature engineering
need to be performed to get insights from raw images and improve the performance of
classification techniques. In this study, we have sub-categorised traditional ML techniques
into those commonly employed in recent years: kernel-based learning, SVM
classification, dimension reduction and transform-based techniques. Researchers have mainly
used kernel-based techniques to efficiently learn the non-linearity of HSI datasets. Spectral and
spectral-spatial kernels have been added as another dimension of learning to
capture complex details of HSI. The SVM classifier also belongs to the family of kernel learning
methods and has been used extensively to classify high-dimensional HSI data.
With transform-based techniques, authors have been able to extract useful information
while suppressing noise in HSI. Classification performance improves as
the number of available training samples increases. When training samples are limited, however,
classification performance declines as the spectral dimension rises. This effect is
famously termed the “Hughes phenomenon.” To address this challenge, many authors have
applied dimension reduction techniques prior to classification. We discuss
various dimension reduction-driven HSI classification techniques that work on spectral features.
Unlike traditional ML techniques, DL offers a flexible approach to unsupervised
feature learning from large raw image datasets. DL-based techniques can model complex
relationships in data using numerous neural connections. DL models for HSI classification
generally consist of three stages: (i) input data, (ii) construction of the deep layers and
(iii) classification [7]. A general representation of DL-based HSI classification is
illustrated in Figure 4.
The reviewed papers focus on how different state-of-the-art classification
techniques have been used for HSI in the previous decade. Existing classification techniques
are briefly discussed in Section 2, which also states the methodology adopted to conduct this survey.
Section 3 elaborates on the traditional ML techniques employed
by authors, such as SVM, kernel-based methods, dimension reduction and transform-based
methods. Section 4 emphasises DL techniques for spectral and spectral-spatial HSI
classification. Sections 5 and 6 present the analysis of this survey and compare
the performance of ML and DL techniques for HSI. The paper concludes with challenges
and the future scope of research and improvement in HSI analysis.
2. Preliminaries
This section briefly defines the HSI classification techniques utilised in the surveyed
publications.
that are well-known are Principal Component Analysis (PCA) [12] and Independent
Component Analysis (ICA). Figure 8 illustrates the basic steps of PCA dimension reduction.
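The PCA steps referred to above can be sketched in Python with NumPy. This is a minimal illustration, not the implementation used in any surveyed work; the function name `pca_reduce`, the cube layout (rows, cols, bands) and the random test cube are assumptions made for the example.

```python
import numpy as np


def pca_reduce(cube, k):
    """Project an HSI cube of shape (rows, cols, bands) onto its top-k principal components."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)     # one pixel spectrum per row
    X_centred = X - X.mean(axis=0)                # step 1: mean-centre every band
    cov = np.cov(X_centred, rowvar=False)         # step 2: bands x bands covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # step 3: eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1][:k]         # step 4: keep the k largest eigenvalues
    W = eigvecs[:, order]
    scores = X_centred @ W                        # step 5: project the pixels onto the new axes
    explained = eigvals[order] / eigvals.sum()    # fraction of variance kept by each component
    return scores.reshape(rows, cols, k), explained


# Hypothetical example: a random 10 x 12 cube with 30 bands reduced to 3 components.
rng = np.random.default_rng(0)
cube = rng.normal(size=(10, 12, 30))
reduced, ratio = pca_reduce(cube, k=3)
```

The projected components are mutually uncorrelated, and `ratio` shows how much of the total variance each retained component explains, which is a common guide for choosing k.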
Figure 7. A schematic approach of Wavelet Transform decomposing data into two levels.
• Each technique’s performance was compared on the basis of its classification accuracy.
feature descriptors were combined with optimal weights. The experiment was performed
on the Indian Pines, Pavia University and Salinas Valley datasets and achieved the highest OA
in comparison with various state-of-the-art approaches.
In the same year, Gao et al. [18] used a composite Spectral-Spatial Kernel for Anomaly
Detection (SSCAD). Unlike other detection models, which worked in linear space and exploited
only spectral information, it considered the non-linear characteristics of the data. With a
kernel-based approach, the data are implicitly mapped into a high-dimensional feature space that
handles non-linear problems well. Exploiting local homogeneity, superpixels were extracted
using entropy rate superpixel (ERS) segmentation to provide spatial information, which was fused
with spectral information extracted directly from the images to form a composite kernel. The kernel
weights were determined adaptively by an iterative kernel learning algorithm based on Centred Kernel
Alignment (CKA). CKA measures the cosine similarity between two centred kernels; a high CKA value
indicates that the two kernels are similar. The authors sought the highest possible
CKA value between the composite kernel and a target kernel. The detection map was
built using the kernel-based Reed-Xiaoli (RX) anomaly detection algorithm, which uses the Mahalanobis
distance to form decision rules that distinguish test pixels from the background. The proposed
work was evaluated on real datasets acquired by the HYDICE sensor, the ROSIS sensor over
Pavia Centre and the AVIRIS sensor over the San Diego area. It performed better in terms
of the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC curve (AUC)
than state-of-the-art anomaly detection methods.
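The alignment criterion described above can be illustrated with the sketch below, which computes CKA as the cosine similarity between centred kernel matrices and picks the weight of a two-kernel composite that best aligns with a target kernel. The RBF kernels, the synthetic features, the label-based target kernel y yᵀ and the grid search over the weight are all assumptions for illustration; Gao et al. [18] used an iterative kernel learning algorithm rather than a grid search.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def centre(K):
    """Centre a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def cka(K1, K2):
    """Centred kernel alignment: cosine similarity of the centred kernels (Frobenius inner product)."""
    K1c, K2c = centre(K1), centre(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))


# Hypothetical data: informative "spectral" features and uninformative "spatial" features.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 20)
spectral = rng.normal(size=(40, 50)) + 3.0 * labels[:, None]
spatial = rng.normal(size=(40, 5))
y = np.where(labels == 1, 1.0, -1.0)
K_target = np.outer(y, y)                          # assumed ideal target kernel
K_spec, K_spa = rbf_kernel(spectral, 1 / 50), rbf_kernel(spatial, 1 / 5)

# Composite kernel mu*K_spec + (1 - mu)*K_spa; weight chosen by grid search (our simplification).
grid = np.linspace(0.0, 1.0, 11)
best_mu = max(grid, key=lambda mu: cka(mu * K_spec + (1 - mu) * K_spa, K_target))
```

Because CKA is invariant to rescaling either kernel, it compares the shape of the similarity structure rather than its magnitude, which is why it suits weighting kernels built from very different features.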
Following this, Wang et al. [19] used an MKL-based approach with SVM that combined spectral,
spatial and semantic information to improve HSI classification. The first
three principal components (PC1-PC3) were obtained by applying PCA. These were used to obtain Gabor
features, an entropy rate superpixel (ERS) segmentation map and EMPs. Structural and textural
features were extracted and stacked into a feature vector for each pixel by combining
Gabor and EMP features. For uniform spatial characteristics, mean filtering was
performed within each superpixel. For semantic information, a k-means clustering map and
the ERS segmentation map were used to produce a semantic feature vector for each
superpixel, with each superpixel treated as a separate document/image. Spectral features, the ERS
map and a manually chosen number ‘k’ of cluster centroids were the inputs used to create semantic
features with a Bag of Visual Words (BOVW) model. K-means clustering was performed on the
spectral features to group them into ‘k’ cluster centres, which served as the visual dictionary.
The number of pixels belonging to each cluster inside each superpixel was counted, giving
a k × 1 histogram feature vector for each superpixel. Three individual kernels
were used to extract spectral, spatial and semantic information. For the final results, a composite
kernel formed as the weighted sum of these three kernels was used with SVM. The work was
evaluated on Indian Pines and Pavia University and obtained the highest OAs of 98.39% and
99.77%, respectively.
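The semantic BOVW feature described above reduces to counting visual-word labels inside each superpixel. In the hypothetical sketch below, `cluster_map` is assumed to hold the k-means label of each pixel and `superpixel_map` the ERS superpixel index of each pixel; both maps are toy inputs rather than outputs of the authors’ pipeline.

```python
import numpy as np


def bovw_histograms(cluster_map, superpixel_map, k):
    """Return an (n_superpixels, k) array: row s is the k x 1 visual-word histogram of superpixel s."""
    n_superpixels = int(superpixel_map.max()) + 1
    hist = np.zeros((n_superpixels, k), dtype=int)
    # Unbuffered add: one count per pixel at (its superpixel, its visual word).
    np.add.at(hist, (superpixel_map.ravel(), cluster_map.ravel()), 1)
    return hist


# Toy 2 x 3 image: k-means word labels and superpixel labels for each pixel.
cluster_map = np.array([[0, 1, 1],
                        [0, 2, 2]])
superpixel_map = np.array([[0, 0, 1],
                           [0, 1, 1]])
hist = bovw_histograms(cluster_map, superpixel_map, k=3)
```

Here superpixel 0 contains the words 0, 1 and 0 and superpixel 1 contains 1, 2 and 2, so the rows are [2, 1, 0] and [0, 1, 2]. Whether the histograms were normalised before building the semantic kernel is not stated in the text.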
HSI datasets contain mixed pixels, and purely pixel-driven classifiers such as SVM
cannot deal with overlapping data. In 2021, Ma et al. [20] overcame this using Kernel
Constrained Energy Minimization (KCEM) and Kernel Linearly Constrained Minimum
Variance (KLCMV) classification. KCEM was used for binary classification, whereas KLCMV was
used for multi-class classification. KCEM and KLCMV achieved OAs of 99.48% and 99.50% on the
Indian Pines dataset, respectively, and both achieved an OA of 99.6% on Salinas
Valley, surpassing other spectral-spatial methods. The aforementioned
kernel-based classification techniques are compared in Table 1.
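KCEM builds on the linear Constrained Energy Minimization (CEM) filter, which minimises the average output energy over all pixels subject to a unit response to a target signature d, giving w = R⁻¹d / (dᵀR⁻¹d) with R the sample autocorrelation matrix. The sketch below shows only this linear CEM filter on synthetic data; the kernelised KCEM and KLCMV of Ma et al. [20] are not reproduced, and the function name and data are assumptions.

```python
import numpy as np


def cem_filter(X, d):
    """Linear CEM. X: (n_pixels, n_bands) data matrix, d: (n_bands,) target signature."""
    R = X.T @ X / X.shape[0]              # sample autocorrelation matrix
    R_inv_d = np.linalg.solve(R, d)       # R^{-1} d without forming the inverse explicitly
    w = R_inv_d / (d @ R_inv_d)           # minimises w^T R w subject to w^T d = 1
    return X @ w, w


# Synthetic scene: 500 background pixels plus 10 noisy copies of the target signature.
rng = np.random.default_rng(2)
d = np.ones(20)
background = rng.normal(size=(500, 20))
targets = d + 0.05 * rng.normal(size=(10, 20))
X = np.vstack([background, targets])
scores, w = cem_filter(X, d)
```

Target pixels respond with values close to 1 by construction, while the energy minimisation pushes background responses towards 0.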
Table 2. Cont.
3.4.1. Unsupervised
In 2011, Villa et al. [43] focused on removing redundant bands and used Independent
Component Discriminant Analysis (ICDA) for this purpose. The authors obtained classification
results using a Bayesian classifier. Their approach achieved better accuracy than
SVM classification.
In 2016, Santos and Pedrini [44] performed HSI band selection using a combination of entropy
filtering and K-means clustering. To increase intra-cluster similarity and
inter-cluster variance, the bands were grouped according to their correlations. The images
were downsized using bicubic interpolation to reduce
computation time. K-means was applied with each band treated as a sample and
the Pearson correlation coefficient as the similarity measure. K representative bands were
selected from the grouped bands, and a 2D entropy filter was applied to each band. The central pixel of each
kernel was replaced with the computed entropy, giving a new vector that was fed to a
radial basis function (RBF) kernel SVM. The methodology obtained OAs of 97.1%, 98.3% and 97.1% on the Indian
Pines, Salinas Valley and Pavia Centre datasets, respectively.
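The band-clustering step can be sketched as k-means over bands. If every band is mean-centred and scaled to unit norm, the squared Euclidean distance between two bands equals 2(1 − r), where r is their Pearson correlation, so ordinary k-means on these vectors groups correlated bands. The farthest-point initialisation, the choice of the band nearest each centroid as its representative and the synthetic cube are our assumptions; the entropy filtering and SVM stages of Santos and Pedrini [44] are omitted.

```python
import numpy as np


def select_representative_bands(cube, k, n_iter=50):
    """Cluster bands by correlation with k-means and return one representative band per cluster."""
    bands = cube.reshape(-1, cube.shape[-1]).T.astype(float)         # one row per band
    bands = bands - bands.mean(axis=1, keepdims=True)
    bands = bands / np.linalg.norm(bands, axis=1, keepdims=True)     # now ||a - b||^2 = 2(1 - r)
    # Farthest-point initialisation (assumed, not taken from [44]).
    idx = [0]
    for _ in range(1, k):
        d2 = ((bands[:, None, :] - bands[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d2.argmax()))
    centroids = bands[idx].copy()
    for _ in range(n_iter):
        d2 = ((bands[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = bands[labels == j].mean(axis=0)
    d2 = ((bands[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)
    # Representative band: the member closest to its cluster centroid.
    reps = [int(np.flatnonzero(labels == j)[d2[labels == j, j].argmin()])
            for j in range(k) if np.any(labels == j)]
    return sorted(reps), labels


# Toy cube: bands 0-3, 4-7 and 8-11 are noisy copies of three independent sources.
rng = np.random.default_rng(3)
sources = rng.normal(size=(3, 200))
groups = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
cube = np.stack([sources[g] + 0.1 * rng.normal(size=200) for g in groups], axis=-1).reshape(10, 20, 12)
reps, labels = select_representative_bands(cube, k=3)
```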
In 2017, Schclar and Averbuch [45] focused on improving HSI classification
using a Diffusion Bases (DB)-based methodology. The non-linear correlations amongst
wavelengths were captured to produce a low-dimensional representation of the data, reducing
the amount of noise. A modified version of the DB method was also proposed that used the
eigendecomposition of symmetric matrices. These were conjugate to the non-symmetric
Markov matrix and used weight functions based on pairwise similarity between pixels.
To cluster the low-dimensional data, a two-phase histogram-based segmentation method
named Wavelength-Wise Global segmentation (WWG) was used. In the wavelength-wise
view of an n-band HSI, the cube was considered as a collection of n images of
size m × m. The clustering was performed on the basis of colour similarity. The colour-based
segmentation included normalisation of the input image followed by its quantization.
A colour frequency histogram was built, in which a certain number of the highest peaks were
detected and assumed to belong to different objects in the image, with the highest peak
being the largest homogeneous area, i.e., the background. Quantized colour
vectors belonging to the same peak were assumed to be part of the same coloured object. After
identification of the peaks, each quantized colour vector was associated with a single peak using the Euclidean
distance, and the final images were constructed. Microscopy images and remotely sensed images
of Washington DC’s National Mall were used, on which various iterations of the proposed
methodology were performed. The classification results depended on the dimension
of the diffusion space, whose optimal value was yet to be studied by the authors.
In 2018, Jain et al. [46] proposed an HSI classification method in which an SVM was optimized
using Self-Organizing Maps (SOM) to learn the important features. They classified interior and
exterior pixels using posterior probabilities. SOM is a data compression technique in
which an incoming signal/pattern of any dimension is reduced to a 1D or 2D lattice through
competitive learning of neurons. In their approach, the input images were converted to
grayscale and regions of interest (ROI) were selected, over which the SOM algorithm was applied
to group pixels in terms of features and intensity levels. The SOM training algorithm
assigned inputs and weights to each edge of the image. The Best
Matching Unit (BMU) was found using the Euclidean distance, and the weights of its neighbouring
nodes were updated iteratively, bringing them closer to the input pattern. To classify interior
and exterior pixels, posterior probabilities and an optimal threshold were computed. If the
posterior probability of a pixel was greater than the threshold, the pixel belonged to the interior
of the particular class; otherwise, it belonged to the boundary of that class. The experiment
was performed on the Indian Pines and Pavia University datasets, where it outperformed other
baseline methods, achieving the highest accuracies of 85.29% and 95.46%, respectively.
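The SOM training loop described above can be sketched as follows: for each input, the BMU is found by Euclidean distance and the BMU and its lattice neighbours are pulled towards the input. The 1D lattice, Gaussian neighbourhood, linearly decaying learning rate and radius, function names and two-cluster toy data are assumptions; the posterior-probability thresholding of Jain et al. [46] is not shown.

```python
import numpy as np


def train_som(data, n_nodes, n_epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a 1D self-organizing map; returns the (n_nodes, n_features) weight vectors."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    positions = np.arange(n_nodes)               # node coordinates on the 1D lattice
    sigma0 = sigma0 if sigma0 is not None else n_nodes / 2
    total, step = n_epochs * len(data), 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1 - frac)                                 # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)                 # shrinking neighbourhood radius
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best matching unit
            h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)            # pull BMU and neighbours towards x
            step += 1
    return weights


def map_to_nodes(data, weights):
    """Index of the best matching unit for every sample."""
    dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return dist.argmin(axis=1)


rng = np.random.default_rng(8)
group_a = rng.normal(0.0, 0.5, size=(50, 3))
group_b = rng.normal(10.0, 0.5, size=(50, 3))
weights = train_som(np.vstack([group_a, group_b]), n_nodes=2)
nodes_a, nodes_b = map_to_nodes(group_a, weights), map_to_nodes(group_b, weights)
```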
Band reduction techniques can reveal non-linear properties, but at the expense of
losing the original data representation. To address this, Ahmad et al. [47] in 2019 used
non-linear unsupervised Denoising Autoencoder (UDAE)-based methods, in non-segmented and
segmented variants, to improve HSI classification. For segmented UDAE, the HSI cubes
were segmented spatially based on pixel locations, and the segmented
HSI images were then processed spectrally by the autoencoder. The experiment was performed on Pavia
Centre, Pavia University and Salinas Valley datasets, where the proposed methodology
achieved the highest accuracy using SVM.
3.4.2. Semi-Supervised
In 2016, Romaszewski et al. [48] proposed a co-training approach based on the P-N
learning scheme, inspired by the Tracking-Learning-Detection (TLD) framework used to
track objects in videos. In the P-N scheme, two independent learners, P and N, scored
the unlabeled samples in different feature spaces and extended the training set.
The P-expert assumed the same class for spatially close pixels, based on region growing. Its score
function was estimated using Gaussian Kernel Density Estimation based on the distance from
known samples (seeds). The N-expert assumed the same class for pixels with similar spectra
and was defined as a Nearest Neighbor (NN) classifier with a rejection score for pixel
i. It identified the n closest spectral neighbours among the seeds, and the spectral Euclidean
distance was computed between pixel i and pixel j. The score formula was based on
probability estimation with the distance-weighted KNN rule. The scores from both
experts were combined. Spectral classification was performed for unlabeled pixels that could
not be labeled by region growing because of disjoint regions. They applied the approach
to six datasets: Indian Pines, Salinas Valley, University of Pavia, La Selva Biological
Station and Madonna, Villelongue, France. The method achieved the highest classification
accuracy in comparison with various state-of-the-art approaches.
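The two experts can be sketched as follows: the P-expert scores each class with a Gaussian kernel density over the pixel coordinates of that class’s seeds, and the N-expert uses a distance-weighted KNN vote in the spectral space. The bandwidth, the number of neighbours, the inverse-distance weights and the equal-weight averaging of the two experts are illustrative assumptions, not the exact formulas of Romaszewski et al. [48].

```python
import numpy as np


def p_expert_scores(coords, seed_coords, seed_labels, n_classes, bandwidth=2.0):
    """Spatial (P) expert: Gaussian KDE over the pixel coordinates of each class's seeds."""
    d2 = ((coords[:, None, :] - seed_coords[None, :, :]) ** 2).sum(axis=-1)
    density = np.exp(-d2 / (2 * bandwidth ** 2))
    scores = np.stack([density[:, seed_labels == c].sum(axis=1) for c in range(n_classes)], axis=1)
    return scores / np.maximum(scores.sum(axis=1, keepdims=True), 1e-12)


def n_expert_scores(spectra, seed_spectra, seed_labels, n_classes, n_neighbors=3):
    """Spectral (N) expert: distance-weighted KNN class probabilities."""
    dist = np.linalg.norm(spectra[:, None, :] - seed_spectra[None, :, :], axis=2)
    scores = np.zeros((len(spectra), n_classes))
    for i, neighbours in enumerate(np.argsort(dist, axis=1)[:, :n_neighbors]):
        for j in neighbours:
            scores[i, seed_labels[j]] += 1.0 / (dist[i, j] + 1e-12)   # closer seeds vote more
    return scores / scores.sum(axis=1, keepdims=True)


# Toy seeds: class 0 near the origin with spectra close to [1, 0]; class 1 far away with [0, 1].
seed_coords = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
seed_spectra = np.array([[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]])
seed_labels = np.array([0, 0, 1, 1])
query_coords = np.array([[1.0, 0.0]])
query_spectra = np.array([[0.9, 0.1]])
combined = 0.5 * (p_expert_scores(query_coords, seed_coords, seed_labels, 2)
                  + n_expert_scores(query_spectra, seed_spectra, seed_labels, 2))
```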
3.4.3. Supervised
In 2016, Li et al. [49] used a dual-layer supervised Mahalanobis distance kernel for
HSI classification. The traditional unsupervised approach was modified using a supervised
Mahalanobis matrix to obtain a new kernel that used relativity information of the various
materials present in the images. The proposed approach was executed in two steps:
first, the traditional Mahalanobis matrix was used to map the raw data. Then, using the
mapped data, difficult-to-identify classes were selected, and a second
Mahalanobis matrix was learned using only these data. A new Mahalanobis kernel
was formed by combining these two matrices. Finally, SVM was applied to this dimensionally
reduced data, achieving high performance on the Indian Pines, Salinas
Valley and Pavia University datasets. This resolved a drawback of traditional Mahalanobis
distance metric learning, which learned a matrix without taking into account the weights
of each class.
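A Mahalanobis-distance kernel replaces the Euclidean distance in the Gaussian kernel with (x − y)ᵀM(x − y) for a positive semi-definite matrix M. The sketch below builds such a kernel for a given M; using the inverse sample covariance as M stands in for an unsupervised Mahalanobis matrix, while the supervised learning and combination of the two matrices in Li et al. [49] are not reproduced. The function name and data are assumptions.

```python
import numpy as np


def mahalanobis_kernel(X, Y, M, gamma=1.0):
    """K[i, j] = exp(-gamma * (x_i - y_j)^T M (x_i - y_j)) for a positive semi-definite M."""
    diff = X[:, None, :] - Y[None, :, :]
    d2 = np.einsum('ijk,kl,ijl->ij', diff, M, diff)
    return np.exp(-gamma * d2)


rng = np.random.default_rng(9)
X = rng.normal(size=(30, 4))
M = np.linalg.inv(np.cov(X, rowvar=False))   # unsupervised choice of M: inverse covariance
K = mahalanobis_kernel(X, X, M, gamma=0.5)
```

With M equal to the identity matrix the kernel reduces to the ordinary RBF kernel, and the resulting matrix can be passed to any SVM implementation that accepts precomputed kernels.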
Nhaila et al. [50] performed supervised classification of HSI in 2019 using SVM, KNN,
RF and Linear Discriminant Analysis (LDA) with different kernels, along with MI for
dimension reduction. The features/bands were selected by computing the MI between the
ground truth and each band. The band subset was initialised with the band having the
highest MI with the ground truth. The average of the last selected band and a new candidate band
formed a reference map called the estimated ground truth. Finally, the candidate band was
added to the subset if it increased the previous MI value between the ground truth and the
reference map. The experiment was performed on the Indian Pines, Salinas Valley and Pavia
University datasets. SVM with an RBF kernel and RF outperformed the other learners.
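The MI-based selection described above can be sketched with histogram (plug-in) estimates of MI. The bin count, the toy data and the function names are assumptions; the loop follows the text: start from the band with the highest MI with the ground truth, average the last selected band with each candidate to form the estimated ground truth, and keep the candidate only if that MI increases.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Histogram (plug-in) estimate of I(A; B) in nats."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def select_bands_mi(cube, gt, n_bands, bins=16):
    """Greedy MI band selection with an 'estimated ground truth' built by averaging bands."""
    mi = np.array([mutual_information(cube[..., b], gt, bins) for b in range(cube.shape[-1])])
    selected = [int(np.argmax(mi))]          # start from the band most informative about gt
    best = mi[selected[0]]
    for b in np.argsort(mi)[::-1]:           # try candidates in order of decreasing MI
        if len(selected) == n_bands:
            break
        if int(b) in selected:
            continue
        estimate = (cube[..., selected[-1]] + cube[..., b]) / 2.0   # estimated ground truth
        score = mutual_information(estimate, gt, bins)
        if score > best:                     # keep the band only if the MI increases
            selected.append(int(b))
            best = score
    return selected


rng = np.random.default_rng(3)
gt = rng.integers(0, 4, size=(20, 20))
cube = rng.normal(size=(20, 20, 6))
cube[..., 2] = gt + 0.01 * rng.normal(size=(20, 20))   # band 2 closely follows the ground truth
selected = select_bands_mi(cube, gt, n_bands=3)
```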
The aforementioned supervised, semi-supervised and unsupervised dimension
reduction-based classification techniques have been compared in Table 3.
(OIF) was employed to select informative features. The OIF selected bands
with the most variance and the least correlation. The work had stable performance and gave a
higher accuracy of 85.89% in comparison with SVM using a single fixed kernel and with Simple
MKL on the Indian Pines dataset. In future work, band clustering and selection could be used, and sparse
MKL could be built for a compact representation. The drawback was choosing an appropriate
number of kernels, which involved a trade-off between efficiency and accuracy; the number was
chosen between 9 and 12.
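The Optimum Index Factor is commonly defined for a triplet of bands as the sum of their standard deviations divided by the sum of the absolute pairwise correlation coefficients, so high-variance, weakly correlated triplets score highest. The exhaustive search over triplets and the toy cube below are illustrative assumptions; how OIF was combined with multiple kernels in the surveyed work is not shown.

```python
import itertools

import numpy as np


def oif(cube, triplet):
    """Optimum Index Factor of three bands: sum of std devs / sum of |pairwise correlations|."""
    X = cube.reshape(-1, cube.shape[-1])[:, list(triplet)].astype(float)
    stds = X.std(axis=0)
    r = np.corrcoef(X, rowvar=False)
    return stds.sum() / (abs(r[0, 1]) + abs(r[0, 2]) + abs(r[1, 2]))


def best_triplet(cube):
    """Exhaustively score every band triplet and return the one with the highest OIF."""
    return max(itertools.combinations(range(cube.shape[-1]), 3), key=lambda t: oif(cube, t))


# Toy cube: band 1 nearly duplicates band 0, bands 2 and 3 are independent.
rng = np.random.default_rng(5)
b0 = 3.0 * rng.normal(size=500)
bands = [b0, b0 + 0.01 * rng.normal(size=500), 3.0 * rng.normal(size=500), 3.0 * rng.normal(size=500)]
cube = np.stack(bands, axis=-1).reshape(20, 25, 4)
best = best_triplet(cube)
```

The redundant pair (bands 0 and 1) is never chosen together, since their near-unit correlation inflates the denominator.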
In 2017, Yang et al. [55] also worked on representative band selection in HSI. The distances
between spectral bands were computed using disjoint information. Bands were clustered
using k-means and ‘K’ representative bands were selected from these clusters. The criterion
for optimal selection was based on minimizing the distances between bands inside the
clusters and maximizing the gap between different representative bands. The disjoint information
was calculated using the joint entropy and MI of two spectral images. The proposed
technique used KNN and SVM classifiers on the Indian Pines dataset and outperformed
various state-of-the-art techniques.
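Disjoint information between two band images X and Y can be computed from a joint histogram as D(X, Y) = H(X, Y) − I(X; Y), using the joint entropy and MI mentioned above. It is zero for identical bands and grows as the bands share less information, so it can act as a distance for clustering bands. The histogram estimator, bin count and function names in the sketch are assumptions.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (nats) of a probability array, ignoring empty bins."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def disjoint_information(a, b, bins=16):
    """D(A, B) = H(A, B) - I(A; B), estimated from a joint histogram of two band images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    h_xy = entropy(pxy.ravel())
    mi = entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - h_xy
    return h_xy - mi


rng = np.random.default_rng(10)
band = rng.normal(size=(30, 30))
similar = band + 0.01 * rng.normal(size=band.shape)
unrelated = rng.normal(size=band.shape)
```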
In 2018, Medjahed et al. [56] formulated feature selection in HSI as an optimization problem
solved with a stochastic approach, namely simulated annealing, which optimized an
objective function combining the classification accuracy rate and the relevance among features
in terms of MI. The approach was compared with existing feature selection methods
such as MI Feature Selection, MI Maximization (MIM), Joint MI (JMI),
Minimum Redundancy Maximum Relevance (MRMR) and Conditional MI Maximization
(CMIM). The proposed work achieved the highest accuracy rate of 88.75% with 10 features
among these techniques on the Pavia University dataset. Their study achieved the
highest OA of 91.47% compared with the other classifiers in their literature on the same
dataset. For the Indian Pines dataset, the highest OA of 76.48% and AA of 71.72% were obtained
in comparison with SVM and a genetic algorithm, using 10 features and 20% of the pixels for training.
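The simulated-annealing search can be sketched over fixed-size band subsets, where a move swaps one selected band for an unselected one and worse moves are accepted with the Metropolis probability exp(Δ/T). The toy additive fitness below is a hypothetical stand-in for the accuracy-and-MI objective of Medjahed et al. [56], and the geometric cooling schedule is our assumption.

```python
import math
import random


def simulated_annealing(n_bands, k, fitness, t0=1.0, cooling=0.95, n_iter=2000, seed=0):
    """Search k-band subsets; a move swaps one selected band for one unselected band."""
    rng = random.Random(seed)
    current = set(rng.sample(range(n_bands), k))
    cur_fit = fitness(current)
    best, best_fit = set(current), cur_fit
    t = t0
    for _ in range(n_iter):
        out_band = rng.choice(sorted(current))
        in_band = rng.choice([b for b in range(n_bands) if b not in current])
        candidate = (current - {out_band}) | {in_band}
        cand_fit = fitness(candidate)
        # Metropolis rule: accept improvements, accept worse moves with probability exp(delta / t).
        if cand_fit >= cur_fit or rng.random() < math.exp((cand_fit - cur_fit) / t):
            current, cur_fit = candidate, cand_fit
            if cur_fit > best_fit:
                best, best_fit = set(current), cur_fit
        t = max(t * cooling, 1e-6)                 # geometric cooling (assumed schedule)
    return sorted(best), best_fit


# Hypothetical per-band relevance scores; the toy fitness simply sums them.
relevance = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.05, 0.6]
bands, score = simulated_annealing(8, 3, lambda subset: sum(relevance[b] for b in subset))
```

In a wrapper setting, `fitness` would instead train and evaluate a classifier on the candidate bands, which is where most of the run time goes.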
Xie et al. [57] addressed the problem of dimensionality reduction in 2019 by selecting
features/bands that were information-rich and less redundant. Improved Subspace
Decomposition (ISD) and the Artificial Bee Colony (ABC) algorithm were used. The correlation
coefficients between adjacent bands were calculated. Local minima and spectral curve
visualization helped achieve the subspace decomposition for choosing m bands from
the original n bands. Band subset selection was then performed by randomly choosing k bands
from each band subspace, and the selection was optimized by the ABC algorithm with the help of ISD
and maximum entropy. Finally, SVM was applied for classification using the obtained
optimized band subsets. The proposed work was evaluated on the Pavia University, Indian
Pines and Salinas Valley datasets and achieved better performance than various state-of-the-art
feature selection approaches.
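The subspace decomposition step can be sketched by computing the correlation between each pair of adjacent bands and cutting the spectrum at local minima of that curve. Because the authors also relied on visual inspection, the sketch adds a hypothetical `threshold` so that only pronounced minima become cuts; the per-subspace random draw mirrors the initial band subsets, and the ABC optimisation itself is not shown.

```python
import numpy as np


def adjacent_band_correlation(cube):
    """Pearson correlation between band i and band i + 1, for i = 0 .. n_bands - 2."""
    X = cube.reshape(-1, cube.shape[-1]).astype(float)
    return np.array([np.corrcoef(X[:, i], X[:, i + 1])[0, 1] for i in range(X.shape[1] - 1)])


def split_at_local_minima(corr, threshold=0.5):
    """Cut the spectrum after band i when corr[i] is a local minimum below `threshold` (assumed)."""
    cuts = [i + 1 for i in range(1, len(corr) - 1)
            if corr[i] < corr[i - 1] and corr[i] < corr[i + 1] and corr[i] < threshold]
    edges = [0] + cuts + [len(corr) + 1]
    return [(edges[j], edges[j + 1]) for j in range(len(edges) - 1)]   # half-open band ranges


def random_band_subset(subspaces, k, seed=0):
    """Draw up to k distinct bands at random from each subspace (initial subset before ABC)."""
    rng = np.random.default_rng(seed)
    chosen = []
    for start, stop in subspaces:
        picks = rng.choice(np.arange(start, stop), size=min(k, stop - start), replace=False)
        chosen.extend(int(b) for b in picks)
    return sorted(chosen)


# Toy cube with three blocks of correlated bands: 0-3, 4-6 and 7-9.
rng = np.random.default_rng(6)
sources = rng.normal(size=(3, 400))
groups = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
cube = np.stack([sources[g] + 0.05 * rng.normal(size=400) for g in groups], axis=-1).reshape(20, 20, 10)
subspaces = split_at_local_minima(adjacent_band_correlation(cube))
subset = random_band_subset(subspaces, k=2)
```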
In 2019, Sellami et al. [58] focused on tackling the curse of dimensionality and the limited
number of training samples by selecting appropriate features/bands. Adaptive dimension
reduction was used to seek relevant bands with high discrimination, high information and low
redundancy. To extract spatial-spectral information, a spatial window included features
from neighbouring pixels. These were fed into a semi-supervised 3-D CNN with convolutional
encoder-decoder layers for 3-D convolution and max-pooling. The classification
map was created using a linear regression classifier. The investigation was carried out
using data from Indian Pines, Pavia University and Salinas Valley. In comparison with other
recent techniques, the proposed method attained the highest OA on all datasets.
Elzaimi et al. [59] used a filter-based approach with an information gain function to
reduce dimensionality in 2019. The bands were chosen based on their interaction
and complementarity, and classification was performed using SVM. The algorithm selected
discriminative bands by evaluating an interaction gain that maximised the compromise
of the MI between the ground truth and the selected band. The average of the
interaction information helped control redundancy. The selected band subset
was initialized with the band that had the highest MI with the class labels; this band served as the estimated
ground truth. Iteratively, candidate bands were added by computing their MI with the ground
truth. Their information gain was calculated based on the mean interaction information
between the candidate bands, the ground truth and the estimated ground truth. The band that
maximized the information gain criterion was chosen at each step. The experiment was
performed on two benchmark hyperspectral datasets, Indian Pines and Pavia University,
and compared with other band selection algorithms such as MI Feature Selection, the Minimum
Redundancy Maximum Relevance (MRMR) method and the MI-based Filter approach (MIBF).
The proposed work achieved the highest OAs of 95.25% and 96.83% on the Indian Pines and Pavia
University datasets, respectively.
In 2020, Sawant et al. [60] proposed a meta-heuristic band selection method
using a Modified Cuckoo Search (MCS) algorithm. Initially, a Chebyshev
chaotic map was used to initialize the nest locations (solutions), which
avoided repeatedly generating similar bands. The fitness value and the current iteration
number were used to iteratively update the step size and a scaling factor of the Lévy flight,
which generated new solutions (bands) in every iteration. These two modifications
of the standard Cuckoo Search algorithm gave MCS and helped it escape from local
optima. A wrapper-based selection method was used, so accuracy was checked
by involving the classifier in every iteration, and the global best solution was obtained at the end.
The proposed technique outperformed the standard CS algorithm and achieved maximum
OAs of 95.10% on the Pavia University dataset and 86.92% on the Indian Pines dataset.
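Two ingredients of MCS can be sketched in isolation: chaotic initialisation of the nests with a Chebyshev map, x_{t+1} = cos(a · arccos x_t), and a Lévy-flight step drawn with Mantegna’s algorithm. The map order a = 4, the starting value x0, the fixed step scale `alpha` and the mapping from [−1, 1] to band indices are assumptions; the fitness- and iteration-dependent step-size and scaling-factor updates of Sawant et al. [60], and the wrapper classifier, are omitted.

```python
import math

import numpy as np


def chebyshev_init(n_nests, n_select, n_bands, x0=0.7, order=4, max_steps=100000):
    """Chaotic nest initialisation with x_{t+1} = cos(order * arccos(x_t)), x_t in [-1, 1]."""
    x, nests, steps = x0, [], 0
    for _ in range(n_nests):
        nest = set()
        while len(nest) < n_select:
            steps += 1
            if steps > max_steps:
                raise RuntimeError("chaotic sequence did not yield enough distinct bands")
            x = math.cos(order * math.acos(x))
            nest.add(int((x + 1) / 2 * (n_bands - 1) + 0.5))   # map [-1, 1] onto band indices
        nests.append(sorted(nest))
    return nests


def levy_step(size, beta=1.5, rng=None):
    """Levy-distributed steps via Mantegna's algorithm."""
    rng = rng or np.random.default_rng()
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)


def propose(nest, best, n_bands, alpha=0.01, rng=None):
    """New candidate nest: a Levy flight scaled by the distance to the current best nest."""
    nest, best = np.asarray(nest, dtype=float), np.asarray(best, dtype=float)
    new = nest + alpha * levy_step(nest.size, rng=rng) * (nest - best)
    return np.clip(np.rint(new), 0, n_bands - 1).astype(int)   # may contain duplicates to repair


nests = chebyshev_init(n_nests=5, n_select=4, n_bands=100)
candidate = propose(nests[0], nests[1], n_bands=100, rng=np.random.default_rng(7))
```

The heavy-tailed Lévy steps mostly make small moves but occasionally jump far, which is what helps the search leave local optima.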
To reduce the complexity of numerous spectral bands, Zhu et al. [61] used the Affinity Propagation
(AP) clustering algorithm. An improved AP was used in which subsets were created
inside the clusters and information entropy was combined to modify the availability matrix,
creating clusters with arbitrary shapes. It achieved an OA of 91.5% on Salinas Valley.
The aforementioned feature selection-based classification techniques are compared in Table 4.
Table 4. Cont.
University dataset, the highest OA of 99.76% was achieved. The work could be improved using
saliency-based algorithms, weakly supervised learning and histograms of sparse codes.
Paul et al. [68] used an MI-based S-SAE method in 2018. MI is a measure of dependency
between bands; in normalized form, 1 indicates high dependency while 0 indicates independent bands.
Non-parametric MI-based spectral segmentation was performed, and local features of each segment
were extracted using S-SAE. MPs of the segmented spectral features provided spatial information.
The experiment used 10%, 5% and 10% of the samples of each class for training on
the Indian Pines, Pavia University and Botswana datasets, respectively. SVM with a Gaussian kernel gave
better performance in classifying the Pavia University and Botswana datasets, whereas Random
Forest classified the Indian Pines dataset better. The method overcame the time-consuming
and complex nature of SAE-based feature extraction and performed well
even with a limited number of samples. In future, other non-linear feature extraction
methods such as kernel PCA could be used with the proposed method, and DL models could be
incorporated for spectral-spatial classification.
The comparative study of the aforementioned feature extraction-based classification
techniques is presented in Table 5.
Inspired by the application of CNNs to 2D images, Hu et al. [76] in 2015 applied
CNNs in the spectral domain of HSIs. They used a 1-D CNN with five layers consisting of
input, convolution, max pooling, fully connected and output layers. This helped
discriminate each spectral signature from the others. Their five-layer CNN achieved better
accuracy than traditional SVM, a two-layer neural network and the LeNet-5 architecture.
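A forward pass through a five-layer 1-D CNN of the kind described above (input, convolution, max pooling, fully connected, output) can be sketched in NumPy. The kernel count, kernel and pooling sizes, hidden width, tanh activations and random initialisation are illustrative assumptions rather than the exact configuration of Hu et al. [76], and training by backpropagation is omitted.

```python
import numpy as np


def conv1d_valid(x, kernels, bias):
    """'Valid' 1-D convolution (cross-correlation, as in most CNN libraries)."""
    k1 = kernels.shape[1]
    windows = np.stack([x[i:i + k1] for i in range(len(x) - k1 + 1)])   # (L, k1)
    return kernels @ windows.T + bias[:, None]                           # (n_kernels, L)


def max_pool1d(fm, k2):
    """Non-overlapping max pooling of size k2 along the spectral axis."""
    n_out = fm.shape[1] // k2
    return fm[:, :n_out * k2].reshape(fm.shape[0], n_out, k2).max(axis=2)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(n_bands, n_classes, n_kernels=20, k1=11, k2=3, n_hidden=100, seed=0):
    """Hypothetical layer sizes; weights drawn uniformly from [-0.05, 0.05]."""
    rng = np.random.default_rng(seed)
    n_flat = n_kernels * ((n_bands - k1 + 1) // k2)
    return {'W1': rng.uniform(-0.05, 0.05, (n_kernels, k1)), 'b1': np.zeros(n_kernels), 'k2': k2,
            'W3': rng.uniform(-0.05, 0.05, (n_hidden, n_flat)), 'b3': np.zeros(n_hidden),
            'W4': rng.uniform(-0.05, 0.05, (n_classes, n_hidden)), 'b4': np.zeros(n_classes)}


def forward(x, p):
    """Pixel spectrum -> convolution -> max pooling -> fully connected -> class probabilities."""
    c1 = np.tanh(conv1d_valid(x, p['W1'], p['b1']))
    m2 = max_pool1d(c1, p['k2'])
    f3 = np.tanh(p['W3'] @ m2.ravel() + p['b3'])
    return softmax(p['W4'] @ f3 + p['b4'])


params = init_params(n_bands=200, n_classes=16)
probs = forward(np.random.default_rng(4).random(200), params)
```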
Chan et al. [77] proposed a DL-based network in 2015 built from basic processing
components: cascaded PCA to learn multistage filter banks, binary hashing, and blockwise
histograms for indexing and pooling. This network was called PCANet. It was applied to
benchmark visual datasets for digit and face recognition. PCANet serves as a simple
baseline against which more advanced processing components or more sophisticated architectures
can be justified.
DL has been extensively used for HSI analysis and classification, but high-quality
labeled samples are needed for DL to be used efficiently. In 2016, Liu et al. [78] tackled
this challenge with an active learning-based algorithm built on weighted incremental dictionary
learning. They selected only those training samples that scored best on the
selection criteria, namely uncertainty and representativeness. This trained the deep network on
how and which samples to select at each iteration for training. Their approach achieved
accuracies of 92.4% and 91.6% on the Pavia University and Botswana datasets, respectively.
In 2016, Chen et al. [79] dealt with the challenges of limited training samples and
high dimensionality using a regularized deep feature extraction method. To obtain better
spectral-spatial features, the authors employed a 3D CNN. They also applied L2 regularization
and dropout to overcome overfitting. The authors further improved CNN
performance by using virtual samples, which were generated by multiplying training samples
by a random factor and adding noise. Their work achieved OAs of 97.56%,
99.54% and 96.31% on the Indian Pines, Pavia University and Kennedy Space Centre datasets,
respectively.
mirroring strategy was applied to process the border areas of the image. The images were
divided into patches of d × d × n, where d was the width and height of the neighbourhood
window centred at a pixel and n was the number of spectral bands of the original image.
d/2 border pixels were mirrored outwards so that border pixels could be processed like any other
pixel in the image. The 3D patches were grouped into batches and sent to the convolution
layers. Four fully connected layers were used, and cross entropy was the loss function of the
CNN. The experiment was performed on the Indian Pines and Pavia University datasets using
various values of the parameter d. Compared with 1D, 2D and 3D CNNs and a Multi-Layer Perceptron, it
achieved the highest accuracy for different values of d. The classification accuracy
depended on the manual selection of parameters.
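The mirroring strategy can be sketched with NumPy’s reflect padding: the cube is padded by d // 2 pixels on each spatial side and a d × d × n patch is cut around every pixel, so border pixels get full neighbourhoods. An odd d and the 'reflect' mode, which does not repeat the edge pixel, are assumptions; the 'symmetric' mode would repeat it. The function name is hypothetical.

```python
import numpy as np


def extract_patches(cube, d):
    """Return a (rows * cols, d, d, n) array of mirrored patches centred on every pixel (odd d)."""
    r = d // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode='reflect')   # mirror r pixels outwards
    rows, cols, n = cube.shape
    patches = np.empty((rows * cols, d, d, n), dtype=cube.dtype)
    for i in range(rows):
        for j in range(cols):
            patches[i * cols + j] = padded[i:i + d, j:j + d, :]      # centre lands at (r, r)
    return patches


cube = np.arange(5 * 6 * 4).reshape(5, 6, 4)
patches = extract_patches(cube, d=3)
```

The patches can then be grouped into batches and fed to a 3D CNN; for large images a strided view avoids materialising every patch at once.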
In 2018, Chen et al. [92] proposed a joint spatial and spectral feature-driven HSI
classification method. Image blocks containing local neighbourhood features provided spatial
information, and spatial and spectral features were merged using convolutional layers. The results were obtained
from the fully connected layer, and the method outperformed other state-of-the-art approaches.
The proposed network was also combined with an SVM (RBF kernel) in some of the fully
connected layers. An adaptive mechanism to select the spatial window size was proposed.
To obtain the features, the first convolution layer was a multi-scale feature extraction
layer that extracted features invariant to deformation and scaling. The second convolution
layer, a feature fusion layer, merged the spatial and spectral features and was followed by a feature
reduction convolution layer. The proposed network obtained an OA of 98.02% on the Indian
Pines dataset, which was higher than other approaches. In combination with SVM, the highest
accuracies of 98.39% and 98.44% were obtained on the Indian Pines and Pavia University
datasets, respectively. The best size for the adaptive window was selected on the basis
of a confidence criterion, where Conf(k) represented the probability of the input pattern being
classified into the kth class. The algorithm worked as follows: two window sizes,
A × A and B × B with A > B, were chosen. For the A × A window, ‘m’ was the most probable class
and ‘n’ the second most probable class. If Conf(n) < Conf(m) × θ,
the output was the mth class. Otherwise, the B × B window
was taken to give a more confident result, and the input block was classified into class m′ (its
most probable class). Adaptive window size selection helped overcome the problem of a large window
that might contain many intersecting categories and hence confuse the network. The
proposed method improved the classification accuracy for HSI significantly.
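The confidence rule for choosing between the two window sizes can be written directly. In this sketch the two confidence vectors are assumed to be the network’s class probabilities for the A × A and B × B windows, θ is a hypothetical threshold in (0, 1), and the function name is illustrative.

```python
import numpy as np


def adaptive_window_decision(conf_large, conf_small, theta):
    """Return (class, window) using the A x A result when it is confident enough, else B x B."""
    order = np.argsort(conf_large)[::-1]
    m, n = order[0], order[1]                  # most and second most probable classes for A x A
    if conf_large[n] < conf_large[m] * theta:  # runner-up clearly weaker: trust the large window
        return int(m), 'A'
    return int(np.argmax(conf_small)), 'B'     # otherwise fall back to the smaller window


# A x A is confident (0.2 < 0.7 * 0.5), so its top class is returned.
first = adaptive_window_decision(np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1]), theta=0.5)
# A x A is ambiguous (0.4 >= 0.45 * 0.5), so the B x B prediction is used.
second = adaptive_window_decision(np.array([0.45, 0.4, 0.15]), np.array([0.1, 0.8, 0.1]), theta=0.5)
```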
Earlier classification techniques did not extract HSI features effectively. To address
this concern, Singh and Kasana [93] used deep features to classify HSI. The authors
first reduced the dimension using Locality Preserving
Projection (LPP) to suppress data redundancy. The processed data were forwarded to a Stacked
Autoencoder (SAE) for deep feature extraction. Logistic regression was used for classification, and the work
achieved OAs of 84.4% and 87.2% on Indian Pines and Salinas Valley, respectively.
In 2019, Zhou et al. [94] used the spectral-spatial LSTM networks shown in Figure 11 for the
classification of HSI. The spectral values of each pixel in all the channels were fed into the
Spectral LSTM (SeLSTM). First, the pixel vector with K bands was transformed into a K-length
sequence. This sequence was fed one element at a time into SeLSTM, and the last output was fed
to the SVM. For the Spatial LSTM (SaLSTM), local patches centred at each pixel were taken from
the first principal component (PC) image. The row vectors of each patch were fed one by one into
SaLSTM, so the rows of the neighbourhood formed an S-length sequence. Figures 12 and 13 display
the structures of SeLSTM and SaLSTM, respectively. For classification, spectral and spatial
features were obtained separately for each pixel, and a decision fusion strategy was adopted.
For joint spectral-spatial classification, the results of the individual LSTMs were fused by
weighted summation. The performance of SeLSTM, SaLSTM and SSLSTMs was compared with several
methods, including PCA, LDA, non-parametric weighted feature extraction (NWFE), regularized local
discriminant embedding (RLDE), matrix-based discriminant analysis (MDA) and CNN. The proposed
method improved the classification accuracy by at least 2.69%, 1.53% and 1.08% on the Indian
Pines, Pavia University and Kennedy Space Centre datasets, respectively.
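The two LSTMs consume each pixel differently. The sketch below shows how their input sequences can be built and how the two outputs are combined by weighted-sum decision fusion. It is a minimal NumPy sketch of the data preparation only; the LSTMs and the SVM are omitted. The helper names, the reflect padding at image borders and the fusion weight `w = 0.5` are assumptions, not values taken from [94].

```python
import numpy as np

def spectral_sequence(cube, r, c):
    """K-band spectrum of pixel (r, c) as a K-step sequence with one value per step."""
    return cube[r, c, :].reshape(-1, 1)              # shape (K, 1)

def spatial_sequence(pc1, r, c, half=2):
    """S x S patch of the first-PC image around (r, c), read as S steps of S-length rows."""
    padded = np.pad(pc1, half, mode="reflect")       # so border pixels also get full patches
    size = 2 * half + 1                              # S
    return padded[r:r + size, c:c + size]            # shape (S, S); row i is time step i

def fuse_decisions(p_spectral, p_spatial, w=0.5):
    """Weighted-sum decision fusion of SeLSTM and SaLSTM class probabilities."""
    p = w * np.asarray(p_spectral, float) + (1.0 - w) * np.asarray(p_spatial, float)
    return int(np.argmax(p)), p
```

With `w` between 0 and 1, the fused vector remains a valid probability distribution whenever both inputs are. The weight can then be tuned on validation data.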
Electronics 2023, 12, 488 24 of 34
Figure 11. Joint spectral spatial-based LSTM [94]. Adapted with permission from ref. [94]. 2019
Neurocomputing.
Figure 12. Spectral LSTM architecture [94]. Adapted with permission from ref. [94]. 2019 Neurocomputing.
Figure 13. Spatial LSTM architecture [94]. Adapted with permission from ref. [94]. 2019 Neurocomputing.
In 2019, Fang et al. [95] also extracted deep spectral-spatial features at different patch
scales using 3D dilated convolutions, with all feature maps densely connected to each other. To
obtain more discriminative and less redundant spectral features, the authors also built a
spectral-wise attention mechanism (SA) that assigned soft weights to the features. The method
achieved an OA of 86.62% on Indian Pines and 92.99% on Pavia University.
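A spectral-wise attention module gives each spectral channel a soft weight in (0, 1) and rescales the feature map with it. The sketch below follows a squeeze-and-excitation-style design: global average pooling, a two-layer bottleneck and a sigmoid. This general form is an assumption, not the exact SA module of [95]. The matrices `W1` and `W2` stand in for learned parameters.

```python
import numpy as np

def spectral_attention(feat, W1, W2):
    """Rescale the channels of an (H, W, C) feature map with soft attention weights.

    W1: (C, C_r) and W2: (C_r, C) play the role of learned bottleneck weights.
    """
    z = feat.mean(axis=(0, 1))                    # squeeze: one value per channel, (C,)
    h = np.maximum(z @ W1, 0.0)                   # ReLU bottleneck, (C_r,)
    a = 1.0 / (1.0 + np.exp(-(h @ W2)))           # sigmoid soft weights in (0, 1), (C,)
    return feat * a, a                            # broadcast the weights over H and W
```

With all-zero weights, every channel receives 0.5. Training would push informative bands towards 1 and redundant bands towards 0.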
Earlier research using the extreme learning machine (ELM) did not handle insufficient samples
efficiently. To address this, Liu et al. [96] in 2020 implemented ELM-based ensemble transfer
learning. Learners trained in the target domain helped determine whether the source dataset was
useful. The authors retained the hidden-layer weights and biases of the ELM obtained in the target
domain and used instances of the source domain to iteratively update the ELM output weights.
The output weights from each iteration gave a trained model, and these models were then combined
into an ensemble. In this manner, source data was used to improve the ability of the learner in
the target domain. Pavia University and Pavia Centre were used interchangeably as source and
target domains to check the efficiency of the approach.
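An ELM keeps its hidden layer fixed and only solves for the output weights. This makes it cheap to refit those weights as source-domain instances are added. The following NumPy sketch shows a basic ELM that solves a weighted ridge problem for the output weights. It then refits once on target data plus down-weighted source data. The class name, tanh activation, ridge term, toy data and source weight of 0.3 are illustrative assumptions. The iterative reweighting and ensembling of [96] are not reproduced.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random fixed hidden layer, least-squares output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_hidden))   # input weights, never trained
        self.b = rng.normal(size=n_hidden)           # hidden biases, never trained
        self.beta = None                             # output weights, solved in fit()

    def hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y, sample_weight=None, reg=1e-3):
        """Solve beta = (H^T S H + reg*I)^-1 H^T S Y, with S = diag(sample_weight)."""
        H = self.hidden(X)
        s = np.ones(len(X)) if sample_weight is None else np.asarray(sample_weight, float)
        Hs = H * s[:, None]
        self.beta = np.linalg.solve(H.T @ Hs + reg * np.eye(H.shape[1]), Hs.T @ Y)
        return self

    def predict(self, X):
        return np.argmax(self.hidden(X) @ self.beta, axis=1)

def make_blobs(n, shift, seed):
    """Toy two-class data in 2-D; `shift` moves the whole domain."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-2, 1, (n, 2)), rng.normal(2, 1, (n, 2))]) + shift
    y = np.repeat([0, 1], n)
    return X, np.eye(2)[y], y

# Few labelled target samples, many labelled source samples from a shifted domain.
Xt, Yt, yt = make_blobs(5, 0.0, seed=1)
Xs, Ys, ys = make_blobs(100, 0.5, seed=2)
elm = ELM(n_in=2, n_hidden=20).fit(Xt, Yt)
# Keep W and b, refit only beta on target + source, trusting source samples less.
w = np.concatenate([np.ones(len(Xt)), 0.3 * np.ones(len(Xs))])
elm.fit(np.vstack([Xt, Xs]), np.vstack([Yt, Ys]), sample_weight=w)
```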
Ramamurthy et al. [97] tried to reduce computational complexity by denoising HSI and reducing
its dimensions. They first denoised the images and then detected edges using David Marr's edge
detection together with the Canny edge detector. Next, they segmented the HSIs into pixels,
reconstructed them and optimised the reconstruction loss. The HSIs were then denoised again
using Autoencoders, and the dimension was reduced using PCA. Finally, classification was performed
with a CNN, which obtained a high OA of 92.5% on the Pavia University dataset.
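PCA is the dimension reduction step used here, and it also serves as a pre-processing step in many of the surveyed methods. The following minimal NumPy sketch assumes an (H, W, B) cube stored as an array. The function name is hypothetical. The function also returns the fraction of variance explained by each kept component, which is a common way to decide how many components to keep.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Reduce an (H, W, B) hyperspectral cube to (H, W, n_components) with PCA."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)      # one row per pixel, one column per band
    X -= X.mean(axis=0)                        # centre each band
    # Right singular vectors of the centred data are the principal axes.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T           # project pixels onto the leading axes
    explained = (s ** 2)[:n_components] / np.sum(s ** 2)
    return scores.reshape(H, W, n_components), explained
```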
Sharifi et al. [98] also focused on extracting spectral-spatial features of HSI. In earlier work,
Gabor filters were used to extract shallow texture features that were fed into a DL model. To
improve performance, the authors extracted two-stage textural features. They applied PCA, then
extracted Gabor features and averaged them over all orientations at each scale. They then
computed the LBP of these Gabor features, which proved more discriminative than either Gabor
features or LBP alone. These features were stacked and classified with a 3D CNN, which recorded
an OA of 97.72% on the Indian Pines dataset.
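The two-stage texture descriptor can be sketched with NumPy alone. The first stage averages Gabor responses over orientations at each scale, and the second applies LBP to the result. This is an illustrative version, not the authors' implementation. The kernel size, σ, wavelength, number of orientations, use of the response magnitude and the basic 8-neighbour LBP variant are all assumptions. In the described pipeline, the input would be a PCA component image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(sigma, theta, lam, size=7, gamma=0.5):
    """Real part of a 2-D Gabor kernel with orientation theta and wavelength lam."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lam)

def filter2d(img, k):
    """'Same'-size 2-D correlation with reflect padding (odd square kernel)."""
    p = k.shape[0] // 2
    win = sliding_window_view(np.pad(img, p, mode="reflect"), k.shape)
    return np.einsum("ijkl,kl->ij", win, k)

def mean_gabor(img, sigma, lam, n_orient=4):
    """Gabor response magnitude averaged over n_orient orientations at one scale."""
    thetas = np.arange(n_orient) * np.pi / n_orient
    return np.mean([np.abs(filter2d(img, gabor_kernel(sigma, t, lam))) for t in thetas], axis=0)

def lbp8(img):
    """Basic 8-neighbour local binary pattern code (0..255) for every pixel."""
    p = np.pad(img, 1, mode="edge")
    c = p[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(img.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit   # set this bit where neighbour >= centre
    return code
```

Per scale, `lbp8(mean_gabor(pc_image, sigma, lam))` gives one texture map. Maps from several scales can be stacked along a new last axis and passed to the 3D CNN.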
Cao et al. [99] proposed a new CNN architecture termed 3D-2D SSHDR. It is an end-to-end hybrid
dilated residual network that takes 3D hyperspectral cubes as input. 3D-2D SSHDR contained five
parts: a spectral feature learning process, a 3D-to-2D deformable part, a spatial feature
learning process, an average pooling layer and a fully connected layer. The 3D spectral residual
blocks learned discriminative spectral features. For spatial feature learning, the extracted
spectral features of the 3D images were converted into 2D feature maps. To continue learning
discriminative spatial features, hybrid dilated convolution (HDC) residual blocks were used.
These enlarged the receptive field of the convolution kernel without adding any parameters. The
proposed network was trained using supervised learning. Experiments on the Indian Pines, Kennedy
Space Center and Pavia University datasets achieved high OAs of 99.46%, 99.89% and 99.81%,
respectively, outperforming other CNN models. However, the spatial features were not extracted
in 3D. In future, transfer learning could help extend the samples and improve accuracy.
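Simple arithmetic confirms that dilation enlarges the receptive field without adding parameters. For stride-1 convolutions with kernel size k, each layer with dilation rate r widens the receptive field by (k - 1) · r pixels. Its weight count depends only on k and the channel counts. The helper names below are illustrative. The rates (1, 2, 5) are an example HDC pattern and are not necessarily the rates used in [99].

```python
def stacked_receptive_field(kernel_size, rates):
    """Receptive field (pixels along one axis) of stacked stride-1 dilated convolutions."""
    rf = 1
    for r in rates:
        rf += (kernel_size - 1) * r            # a dilated k-tap kernel spans (k - 1) * r + 1 pixels
    return rf

def conv2d_params(kernel_size, c_in, c_out, bias=True):
    """Learnable parameters in one 2-D convolution layer; dilation does not change this."""
    return kernel_size * kernel_size * c_in * c_out + (c_out if bias else 0)

for rates in [(1, 1, 1), (1, 2, 5)]:
    rf = stacked_receptive_field(3, rates)
    params = 3 * conv2d_params(3, 64, 64)      # three 3x3 layers, 64 -> 64 channels
    print(rates, "receptive field:", rf, "parameters:", params)
```

This prints a receptive field of 7 for rates (1, 1, 1) and 17 for rates (1, 2, 5). Both stacks have 110,784 parameters.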
Nalepa et al. [100] proposed a resource-frugal quantized spectral CNN. Its weights and
activations were represented in a compact format, such as integer or binary numbers, without
affecting the classification process. The authors used multi-stage quantization-aware training.
The deep model was first trained in full precision, then fake-quantized and trained again, and
finally quantized to a low-bit version. Fake quantization served as an intermediate step that
simulates the quantization of weights and activations. Experiments on Pavia University and
Salinas Valley showed that the quantized model, four times smaller than its full-precision
counterpart, segmented the data equally well. This reduced the memory footprint of a
large-capacity HSI classification model. Varying the quantization levels could help to better
understand the capabilities of DL models.
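Fake quantization rounds values to a small set of integer levels and immediately maps them back to floating point. Training therefore sees the rounding error while the stored weights stay in floating point. The sketch below applies uniform affine 8-bit fake quantization to a weight array in NumPy. It also shows why storing 8-bit codes instead of 32-bit floats gives a 4× smaller model. The per-tensor, asymmetric, 8-bit scheme is an assumption and not the exact configuration of [100].

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize-dequantize x with a per-tensor uniform affine scheme (num_bits <= 8).

    Returns (dequantized float32 array, uint8 integer codes).
    """
    x = np.asarray(x, dtype=np.float32)
    qmax = 2 ** num_bits - 1
    lo = min(float(x.min()), 0.0)                 # representable range must include 0
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))          # integer code that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    deq = ((q - zero_point) * scale).astype(np.float32)
    return deq, q.astype(np.uint8)

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)      # stand-in for one layer's weights
w_fq, codes = fake_quantize(w)
print(w.nbytes, codes.nbytes)                     # 4000 bytes vs 1000 bytes
```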
Vaddi et al. [101] worked on data normalization and CNN-based classification of HSI. The
normalization scaled pixel values down by dividing them by the maximum pixel intensity value.
Probabilistic PCA was used to extract spectral features, and a Gabor filter was used to acquire
spatial features. The spatial and spectral information were integrated into fused features that
were passed to a CNN. Experiments on the Indian Pines, Salinas Valley and Pavia University
datasets showed that the proposed approach gave the highest accuracy compared with other
state-of-the-art approaches. However, its running time needs to be improved.
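The normalization and fusion steps are simple array operations. The following minimal NumPy sketch assumes a single global maximum over the whole cube, although a per-band maximum is a common alternative. It also assumes fusion by concatenating per-pixel feature vectors. The function names are hypothetical, and the probabilistic PCA, Gabor and CNN stages are not shown.

```python
import numpy as np

def max_normalize(cube):
    """Divide every value by the cube's maximum intensity, mapping non-negative data into [0, 1]."""
    cube = np.asarray(cube, dtype=np.float64)
    peak = cube.max()
    return cube / peak if peak > 0 else cube

def fuse_features(spectral, spatial):
    """Concatenate per-pixel spectral and spatial feature vectors along the last axis."""
    return np.concatenate([spectral, spatial], axis=-1)
```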
Jiao et al. [102] used various deep neural network models for HSI classification. In the first
approach, multi-scale spatial features were extracted using a convolutional network based on
VGG-verydeep-16. This network contains 13 convolutional layers, five pooling layers, three fully
connected layers, and activation and dropout layers. The deep multi-scale spatial features were
fused with spectral features using a weighted fusion method and z-score normalization. The
result was used to segment the scenes and obtain pixel-based classification results on the
Indian Pines dataset. In the second approach, Recursive Autoencoders (RAE) were employed to form
high-level spatial-spectral features from the original data. The RAE learned the local
homogeneous area of the image around the pixel under investigation. The spatial features of the
pixel were learned using a weighting scheme based on the neighbouring pixels. The weights were
determined by the spectral similarity between the investigated pixel and its neighbours. The
unsupervised RAE achieved an accuracy of 99.91% on the Pavia University dataset. The third
approach involved a Superpixels-based Multi-Local CNN (SML-CNN). Superpixels were formed using
a linear iterative clustering algorithm. Multiple local regions of each superpixel, namely the
original, central and corner regions, were jointly represented. This provided a different
semantic context for each superpixel even when superpixels were spectrally similar, and the
features from these regions were fused. Classification was further improved using a
multi-information modification strategy, which eliminated errors by combining semantic
(superpixel-level) and detailed (pixel-level) information. The proposed algorithm achieved good
accuracy.
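The weighting idea in the second approach can be illustrated as follows: neighbours that are spectrally similar to the centre pixel contribute more. This is a sketch under assumptions. The Gaussian of Euclidean spectral distance, the bandwidth `sigma` and the function name are illustrative choices. In [102], the weighting feeds a recursive autoencoder rather than producing a final feature directly.

```python
import numpy as np

def similarity_weighted_feature(patch, sigma=1.0):
    """Similarity-weighted mean spectrum of an (S, S, B) patch with odd S.

    Each neighbour's weight is a Gaussian of its Euclidean spectral distance to the
    centre pixel, so spectrally homogeneous neighbours count more.
    """
    S = patch.shape[0]
    centre = patch[S // 2, S // 2]
    d2 = np.sum((patch - centre) ** 2, axis=-1)         # (S, S) squared spectral distances
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum()                                        # weights sum to 1
    feature = np.tensordot(w, patch, axes=([0, 1], [0, 1]))  # (B,) weighted spectrum
    return feature, w
```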
Sharifi et al. [98] also extracted complex spatial features using a multi-scale CNN with
patches of different sizes. Because spatial features have been shown to improve classification
performance, the authors included spatial features obtained from Gabor filters, morphological
operations and LBP. All these features were fused with PCA's spectral features at the decision
level for classification. The method achieved OAs of 97.98% and 99.44% using 1% and 5% of the
samples from each class for training, respectively.
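The exact decision-level rule used in [98] is not described here. Majority voting is one common choice and is sketched below under that assumption. Each feature set is classified separately, and the per-pixel labels are combined by vote. The function name is hypothetical.

```python
import numpy as np

def majority_vote(label_maps):
    """Decision-level fusion: per-pixel majority vote over several classifiers' label maps."""
    stacked = np.stack(label_maps)                         # (n_classifiers, H, W)
    n_classes = int(stacked.max()) + 1
    counts = np.stack([(stacked == k).sum(axis=0) for k in range(n_classes)])
    return counts.argmax(axis=0)                           # ties go to the lowest class index
```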
Due to radiometric and atmospheric corrections, many informative bands may be lost. In 2021,
Singh and Kasana [103] performed a different spectral-spatial classification by approximating
the lost noisy bands with linear interpolation. They then reduced the spectral dimension and
obtained spatial features through a combination of LPP and PCA. The features were classified
using a deep network along with an SAE. The work achieved OAs of 88.9%, 93.3%, 91% and 91.5% on
the Indian Pines, Salinas, Kennedy Space Center and Pavia University datasets, respectively.
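Approximating removed bands by linear interpolation draws a straight line between the nearest retained bands on either side. The following minimal per-pixel NumPy sketch assumes the band index stands in for wavelength. If band spacing is uneven, pass the actual centre wavelengths instead. The function name and example band indices are hypothetical. `np.interp` holds the end values constant for missing bands at either end of the spectrum.

```python
import numpy as np

def interpolate_missing_bands(spectrum, missing, wavelengths=None):
    """Replace the bands listed in `missing` by linear interpolation between kept bands."""
    spectrum = np.asarray(spectrum, dtype=float).copy()
    if wavelengths is None:
        x = np.arange(spectrum.size, dtype=float)
    else:
        x = np.asarray(wavelengths, dtype=float)
    bad = np.zeros(spectrum.size, dtype=bool)
    bad[list(missing)] = True
    # np.interp needs increasing x for the kept bands; values at `bad` positions are ignored.
    spectrum[bad] = np.interp(x[bad], x[~bad], spectrum[~bad])
    return spectrum
```

For a whole cube, the same missing-band list can be applied to every pixel spectrum. The interpolated cube is then passed to the LPP and PCA stages.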
The recent DL classification techniques discussed above have been compared in Table 6.
5. Discussion
After an extensive survey of spectral, spatial and spectral-spatial feature-based classification
of HSI, the following insights emerge.
• This work mainly covers land cover HSI datasets. Indian Pines and Pavia University are the
most commonly used datasets for classification, as depicted in Figure 14. Figure 15 displays
the highest and lowest OA achieved by different classification techniques in the survey.
• In traditional ML, kernel-based techniques have been employed for land cover images. Table 1
shows the greatest OA of 99.5%, obtained with shape-adaptive kernels. These kernels incorporated
spectral and spatial features, which helped increase performance. The main disadvantage of
mathematical kernels is their computational overhead.
• The SVM classifier, a kernel-based classifier, has been widely used for land cover images.
Its highest reported accuracy was 98.68%. SVM classification results improve when it is
combined with a spatial Gaussian filter.
• Transform-based techniques aid in denoising and compressing HSI. Table 2 shows the highest
OAs: 99.0% with SVM on benchmark land cover images, and 99.82% using AdaBoost modelling to
detect bruising in fruits.
• PCA has commonly been used as a data pre-processing step in traditional ML approaches. It
helps eliminate redundant spectral data.
• Many classification methods include dimension reduction techniques as pre-processing steps.
However, we have explicitly included a few different strategies to emphasise their
performance: supervised, unsupervised, feature selection and feature extraction. Table 5
shows that, on land cover images, sparse representation classification using joint bilateral
filtering and spectral similarity achieved the greatest OA of 99.76%.
• DL techniques have become dominant in HSI research. They perform better owing to built-in
feature processing and convolution kernels that can handle complex HSI data. The
resource-frugal networks for land cover images achieved the highest OA of 99.89%, as evident
in Table 7. However, data partitioning remains a challenge for HSI. With limited samples,
training and testing data often overlap, which leads to exaggerated results.
Figure 15. The highest and lowest OA achieved by different classification techniques in the survey.
Techniques | OA | Remarks
SVM [28] | 95.75% | SVM implemented with different kernels.
SVM + DL [92] | 98.4% | CNN offers more computational capacity to handle complex data and generate useful features. It does not need an expert for manual feature engineering, which supervised classifiers like SVM require. In this work, convolution kernels were used for spatial features, which helped the spectral SVM perform better classification.
Wavelet Transform [37] | 93.85% | DWT was used for denoising and enhancing HSI, which was then fed to SVM.
Wavelet Transform + DL [36] | 98.64% | CNN offers high computational capacity and handles complex data but has many parameters. Here, DWT combined with CNN reduced the learnable parameters and created a light, robust CNN architecture.
Simple Band Reduction [65] | 95.8% | Features were extracted through heavy computations of KL divergence, PSO and MKB.
Band Reduction + DL [47] | 99.06% | DL offers automatic, multi-layer feature extraction, which is more powerful than manual trial and error with different feature engineering techniques. The authors used Autoencoders to extract informative spectral-spatial features.
SVM + Gaussian Filter [27] | 98.68% | SVM is a spectral classifier. The spatial features from the filter were combined with the SVM classification map.
DL + Gabor Filter [109] | 99.38% | The high computational power of a multi-scale CNN extracted better spectral-spatial features than SVM + Gaussian filter.
The purpose of this paper is to explore how well various classification techniques perform
for HSI analysis. Some authors employed either spectral or spatial data. In recent papers,
however, the emphasis has shifted to using both spectral and spatial data. Table 7 compares
classic ML and DL approaches in terms of OA. Although the OAs of the two families are
comparable, DL performs better owing to its automatic feature learning and its robustness in
dealing with complex HSI.
• With few training samples and a huge number of spectral bands, the Hughes phenomenon occurs
in HSI. For a fixed number of training samples, classification performance initially increases
as the number of bands grows but then gradually decreases.
• Target detection also remains one of HSI's significant challenges. The inherent variability in
target and background spectra is a severe obstacle to developing effective target detection
algorithms for HSI. This may be due to unknown backgrounds or a shortage of target data, which
makes the problem harder and calls for more sophisticated techniques.
Author Contributions: All the authors made significant contributions to this work. Conceptualiza-
tion, S.S.K. and G.K.; Writing—original draft preparation, R.G.; Writing—revision and editing, R.G.,
S.S.K. and G.K. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Publicly available datasets were analyzed in this study. These data
can be found here: https://rslab.ut.ac.ir/data.
Conflicts of Interest: The authors declare that there is no conflict of interest.
References
1. Huete, A.R. Vegetation indices, remote sensing and forest monitoring. Geogr. Compass 2012, 6, 513–532. [CrossRef]
2. Khan, M.J.; Khan, H.S.; Yousaf, A.; Khurshid, K.; Abbas, A. Modern trends in hyperspectral image analysis: A review. IEEE
Access 2018, 6, 14118–14129. [CrossRef]
3. Leiva-Valenzuela, G.A.; Lu, R.; Aguilera, J.M. Prediction of firmness and soluble solids content of blueberries using hyperspectral
reflectance imaging. J. Food Eng. 2013, 115, 91–98. [CrossRef]
4. Liu, Z.; Wang, H.; Li, Q. Tongue tumor detection in medical hyperspectral images. Sensors 2011, 12, 162–174. [CrossRef] [PubMed]
5. Liu, L.; Wang, J.; Huang, W.; Zhao, C.; Zhang, B.; Tong, Q. Improving winter wheat yield prediction by novel spectral index.
Trans. CSAE 2004, 20, 172–175.
6. Kutser, T.; Paavel, B.; Verpoorter, C.; Kauer, T.; Vahtmäe, E. Remote sensing of water quality in optically complex lakes. ISPRS
Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 39, B8. [CrossRef]
7. Zhang, L.; Zhang, L.; Du, B. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geosci.
Remote Sens. Mag. 2016, 4, 22–40. [CrossRef]
8. Gogineni, R.; Chaturvedi, A. Hyperspectral image classification. In Processing and Analysis of Hyperspectral Data; IntechOpen:
London, UK, 2019.
9. Gu, Y.; Chanussot, J.; Jia, X.; Benediktsson, J.A. Multiple kernel learning for hyperspectral image classification: A review. IEEE
Trans. Geosci. Remote Sens. 2017, 55, 6547–6565. [CrossRef]
10. Rani, A.; Kumar, N.; Kumar, J.; Sinha, N.K. Machine learning for soil moisture assessment. In Deep Learning for Sustainable
Agriculture; Elsevier: Amsterdam, The Netherlands, 2022; pp. 143–168.
11. Lakshmi, T.V.H.; Madhu, T. Satellite Image Resolution Enhancement Using Discrete Wavelet Transform and Gaussian Mixture
Model. Int. Res. J. Eng. Technol. IRJET 2015, 2, 95–100.
12. Maduranga, U. Dimensionality Reduction in Data Mining. 2020. Available online: https://towardsdatascience.com/dimensionality-reduction-in-data-mining-f08c734b3001 (accessed on 25 December 2022).
13. Gu, Y.; Liu, H. Sample-screening MKL method via boosting strategy for hyperspectral image classification. Neurocomputing 2016,
173, 1630–1639. [CrossRef]
14. Fang, L.; He, N.; Li, S.; Ghamisi, P.; Benediktsson, J.A. Extinction profiles fusion for hyperspectral images classification. IEEE
Trans. Geosci. Remote Sens. 2017, 56, 1803–1815. [CrossRef]
15. Li, L.; Wang, C.; Li, W.; Chen, J. Hyperspectral image classification by AdaBoost weighted composite kernel extreme learning
machines. Neurocomputing 2018, 275, 1725–1733. [CrossRef]
16. Li, F.; Lu, H.; Zhang, P. An innovative multi-kernel learning algorithm for hyperspectral classification. Comput. Electr. Eng. 2019,
79, 106456. [CrossRef]
17. Li, D.; Wang, Q.; Kong, F. Adaptive Kernel Sparse Representation Based on Multiple Feature Learning for Hyperspectral Image
Classification. Neurocomputing 2020, 400, 97–112. [CrossRef]
18. Gao, Y.; Cheng, T.; Wang, B. Nonlinear Anomaly Detection Based on Spectral-Spatial Composite Kernel for Hyperspectral Images.
IEEE Geosci. Remote Sens. Lett. 2020, 18, 1269–1273. [CrossRef]
19. Wang, Y.; Yu, W.; Fang, Z. Multiple kernel-based SVM classification of hyperspectral images by combining spectral, spatial, and
semantic information. Remote Sens. 2020, 12, 120. [CrossRef]
20. Ma, K.Y.; Chang, C.I. Kernel-based constrained energy minimization for hyperspectral mixed pixel classification. IEEE Trans.
Geosci. Remote Sens. 2021, 60, 1–23. [CrossRef]
21. Ansari, M.; Homayouni, S.; Safari, A.; Niazmardi, S. A New Convolutional Kernel Classifier for Hyperspectral Image Classifica-
tion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11240–11256. [CrossRef]
22. Krishna, S.L.; Jeya, I.; Deepa, S. Fuzzy-twin proximal SVM kernel-based deep learning neural network model for hyperspectral
image classification. Neural Comput. Appl. 2022, 34, 19343–19376. [CrossRef]
23. Wang, A.; Xing, S.; Zhao, Y.; Wu, H.; Iwahori, Y. A hyperspectral image classification method based on adaptive spectral spatial
kernel combined with improved vision transformer. Remote Sens. 2022, 14, 3705. [CrossRef]
24. Dalla Mura, M.; Villa, A.; Benediktsson, J.A.; Chanussot, J.; Bruzzone, L. Classification of hyperspectral images by using extended
morphological attribute profiles and independent component analysis. IEEE Geosci. Remote Sens. Lett. 2010, 8, 542–546. [CrossRef]
25. Licciardi, G.; Marpu, P.R.; Chanussot, J.; Benediktsson, J.A. Linear versus nonlinear PCA for the classification of hyperspectral
data based on the extended morphological profiles. IEEE Geosci. Remote Sens. Lett. 2011, 9, 447–451. [CrossRef]
26. Dópido, I.; Li, J.; Marpu, P.R.; Plaza, A.; Dias, J.M.B.; Benediktsson, J.A. Semisupervised self-learning for hyperspectral image
classification. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4032–4044. [CrossRef]
27. Zhong, S.; Chang, C.I.; Zhang, Y. Iterative support vector machine for hyperspectral image classification. In Proceedings of the
2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 3309–3312.
28. Pathak, D.K.; Kalita, S.K.; Bhattacharya, D.K. Hyperspectral image classification using support vector machine: A spectral spatial
feature based approach. Evol. Intell. 2022, 15, 1809–1823. [CrossRef]
29. Li, R.; Cui, K.; Chan, R.H.; Plemmons, R.J. Classification of hyperspectral images using SVM with shape-adaptive reconstruction
and smoothed total variation. arXiv 2022, arXiv:2203.15619.
30. Akbari, H.; Kosugi, Y.; Kojima, K.; Tanaka, N. Wavelet-based compression and segmentation of hyperspectral images in surgery.
In Medical Imaging and Augmented Reality, Proceedings of the International Workshop on Medical Imaging and Virtual Reality, Tokyo,
Japan, 1–2 August 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 142–149.
31. Chen, C.; Guo, B.; Wu, X.; Shen, H. An edge detection method for hyperspectral image classification based on mean shift.
In Proceedings of the 2014 7th International Congress on Image and Signal Processing, Dalian, China, 14–16 October 2014;
pp. 553–557.
32. Quesada-Barriuso, P.; Argüello, F.; Heras, D.B. Spectral–spatial classification of hyperspectral images using wavelets and extended
morphological profiles. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1177–1185. [CrossRef]
33. Prabhakar, T.N.; Geetha, P. Two-dimensional empirical wavelet transform based supervised hyperspectral image classification.
ISPRS J. Photogramm. Remote Sens. 2017, 133, 37–45. [CrossRef]
34. Ji, Y.; Sun, L.; Li, Y.; Ye, D. Detection of bruised potatoes using hyperspectral imaging technique based on discrete wavelet
transform. Infrared Phys. Technol. 2019, 103, 103054. [CrossRef]
35. Anand, R.; Veni, S.; Aravinth, J. Robust classification technique for hyperspectral images based on 3D-discrete wavelet transform.
Remote Sens. 2021, 13, 1255. [CrossRef]
36. Xu, J.; Zhao, J.; Liu, C. An Effective Hyperspectral Image Classification Approach Based on Discrete Wavelet Transform and
Dense CNN. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [CrossRef]
37. Miclea, A.V.; Terebes, R.M.; Meza, S.; Cislariu, M. On Spectral-Spatial Classification of Hyperspectral Images Using Image
Denoising and Enhancement Techniques, Wavelet Transforms and Controlled Data Set Partitioning. Remote Sens. 2022, 14, 1475.
[CrossRef]
38. Ji, Y.; Sun, L.; Li, Y.; Li, J.; Liu, S.; Xie, X.; Xu, Y. Non-destructive classification of defective potatoes based on hyperspectral
imaging and support vector machine. Infrared Phys. Technol. 2019, 99, 71–79. [CrossRef]
39. Cao, X.; Yao, J.; Fu, X.; Bi, H.; Hong, D. An enhanced 3-D discrete wavelet transform for hyperspectral image classification. IEEE
Geosci. Remote Sens. Lett. 2020, 18, 1104–1108. [CrossRef]
40. Zikiou, N.; Lahdir, M.; Helbert, D. Hyperspectral image classification using graph-based wavelet transform. Int. J. Remote Sens.
2020, 41, 2624–2643. [CrossRef]
41. Manoharan, P.; Boggavarapu, P.K.L. Improved whale optimization based band selection for hyperspectral remote sensing image
classification. Infrared Phys. Technol. 2021, 119, 103948. [CrossRef]
42. Tulapurkar, H.; Banerjee, B.; Buddhiraju, K.M. Multi-head attention with CNN and wavelet for classification of hyperspectral
image. Neural Comput. Appl. 2022, 1–15. [CrossRef]
43. Villa, A.; Benediktsson, J.A.; Chanussot, J.; Jutten, C. Hyperspectral image classification with independent component discriminant
analysis. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4865–4876. [CrossRef]
44. Santos, A.; Pedrini, H. A combination of k-means clustering and entropy filtering for band selection and classification in
hyperspectral images. Int. J. Remote Sens. 2016, 37, 3005–3020. [CrossRef]
45. Schclar, A.; Averbuch, A. A diffusion approach to unsupervised segmentation of hyper-spectral images. In Computational
Intelligence, Proceedings of the International Joint Conference on Computational Intelligence, Funchal-Madeira, Portugal, 1–3 November
2017; Springer: Cham, Switzerland, 2017; pp. 163–178.
46. Jain, D.K.; Dubey, S.B.; Choubey, R.K.; Sinhal, A.; Arjaria, S.K.; Jain, A.; Wang, H. An approach for hyperspectral image
classification by optimizing SVM using self organizing map. J. Comput. Sci. 2018, 25, 252–259. [CrossRef]
47. Ahmad, M.; Alqarni, M.A.; Khan, A.M.; Hussain, R.; Mazzara, M.; Distefano, S. Segmented and non-segmented stacked denoising
autoencoder for hyperspectral band reduction. Optik 2019, 180, 370–378. [CrossRef]
48. Romaszewski, M.; Głomb, P.; Cholewa, M. Semi-supervised hyperspectral classification from a small number of training samples
using a co-training approach. ISPRS J. Photogramm. Remote Sens. 2016, 121, 60–76. [CrossRef]
49. Li, L.; Sun, C.; Lin, L.; Li, J.; Jiang, S. A dual-layer supervised Mahalanobis kernel for the classification of hyperspectral images.
Neurocomputing 2016, 214, 430–444. [CrossRef]
50. Nhaila, H.; Elmaizi, A.; Sarhrouni, E.; Hammouch, A. Supervised classification methods applied to airborne hyperspectral
images: Comparative study using mutual information. Procedia Comput. Sci. 2019, 148, 97–106. [CrossRef]
51. Ren, J.; Wang, R.; Liu, G.; Feng, R.; Wang, Y.; Wu, W. Partitioned relief-F method for dimensionality reduction of hyperspectral
images. Remote Sens. 2020, 12, 1104. [CrossRef]
52. Liu, H.; Li, W.; Xia, X.G.; Zhang, M.; Tao, R. Superpixelwise Collaborative-Representation Graph Embedding for Unsupervised
Dimension Reduction in Hyperspectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4684–4698. [CrossRef]
53. Ding, S.; Keal, C.A.; Zhao, L.; Yu, D. Dimensionality reduction and classification for hyperspectral image based on robust
supervised ISOMAP. J. Ind. Prod. Eng. 2022, 39, 19–29. [CrossRef]
54. Qi, C.; Wang, Y.; Tian, W.; Wang, Q. Multiple kernel boosting framework based on information measure for classification. Chaos
Solitons Fractals 2016, 89, 175–186. [CrossRef]
55. Yang, R.; Su, L.; Zhao, X.; Wan, H.; Sun, J. Representative band selection for hyperspectral image classification. J. Vis. Commun.
Image Represent. 2017, 48, 396–403. [CrossRef]
56. Medjahed, S.A.; Ouali, M. Band selection based on optimization approach for hyperspectral image classification. Egypt. J. Remote
Sens. Space Sci. 2018, 21, 413–418. [CrossRef]
57. Xie, F.; Li, F.; Lei, C.; Yang, J.; Zhang, Y. Unsupervised band selection based on artificial bee colony algorithm for hyperspectral
image classification. Appl. Soft Comput. 2019, 75, 428–440. [CrossRef]
58. Sellami, A.; Farah, M.; Farah, I.R.; Solaiman, B. Hyperspectral imagery classification based on semi-supervised 3-D deep neural
network and adaptive band selection. Expert Syst. Appl. 2019, 129, 246–259. [CrossRef]
59. Elmaizi, A.; Nhaila, H.; Sarhrouni, E.; Hammouch, A.; Nacir, C. A novel information gain based approach for classification and
dimensionality reduction of hyperspectral images. Procedia Comput. Sci. 2019, 148, 126–134. [CrossRef]
60. Sawant, S.; Manoharan, P. Hyperspectral band selection based on metaheuristic optimization approach. Infrared Phys. Technol.
2020, 107, 103295. [CrossRef]
61. Zhu, Q.; Wang, Y.; Wang, F.; Song, M.; Chang, C.I. Hyperspectral band selection based on improved affinity propagation.
In Proceedings of the 2021 11th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing
(WHISPERS), Amsterdam, The Netherlands, 24–26 March 2021; pp. 1–4.
62. Uddin, M.P.; Mamun, M.A.; Afjal, M.I.; Hossain, M.A. Information-theoretic feature selection with segmentation-based folded
principal component analysis (PCA) for hyperspectral image classification. Int. J. Remote Sens. 2021, 42, 286–321. [CrossRef]
63. Zhang, J. A hybrid clustering method with a filter feature selection for hyperspectral image classification. J. Imaging 2022, 8, 180.
[CrossRef] [PubMed]
64. Imani, M.; Ghassemian, H. Binary coding based feature extraction in remote sensing high dimensional data. Inf. Sci. 2016,
342, 191–208. [CrossRef]
65. Qi, C.; Zhou, Z.; Sun, Y.; Song, H.; Hu, L.; Wang, Q. Feature selection and multiple kernel boosting framework based on PSO with
mutation mechanism for hyperspectral classification. Neurocomputing 2017, 220, 181–190. [CrossRef]
66. Ksieniewicz, P.; Krawczyk, B.; Woźniak, M. Ensemble of Extreme Learning Machines with trained classifier combination and
statistical features for hyperspectral data. Neurocomputing 2018, 271, 28–37. [CrossRef]
67. Qiao, T.; Yang, Z.; Ren, J.; Yuen, P.; Zhao, H.; Sun, G.; Marshall, S.; Benediktsson, J.A. Joint bilateral filtering and spectral similarity-
based sparse representation: A generic framework for effective feature extraction and data classification in hyperspectral imaging.
Pattern Recognit. 2018, 77, 316–328. [CrossRef]
68. Paul, S.; Kumar, D.N. Spectral-spatial classification of hyperspectral data with mutual information based segmented stacked
autoencoder approach. ISPRS J. Photogramm. Remote Sens. 2018, 138, 265–280. [CrossRef]
69. Chen, Z.; Jiang, J.; Zhou, C.; Fu, S.; Cai, Z. SuperBF: Superpixel-based bilateral filtering algorithm and its application in feature
extraction of hyperspectral images. IEEE Access 2019, 7, 147796–147807. [CrossRef]
70. Li, Q.; Zheng, B.; Tu, B.; Wang, J.; Zhou, C. Ensemble EMD-based spectral-spatial feature extraction for hyperspectral image
classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5134–5148. [CrossRef]
71. Wang, D.; Du, B.; Zhang, L.; Xu, Y. Adaptive spectral–spatial multiscale contextual feature extraction for hyperspectral image
classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2461–2477. [CrossRef]
72. Liang, N.; Duan, P.; Xu, H.; Cui, L. Multi-View Structural Feature Extraction for Hyperspectral Image Classification. Remote Sens.
2022, 14, 1971. [CrossRef]
73. Ratle, F.; Camps-Valls, G.; Weston, J. Semisupervised neural networks for efficient hyperspectral image classification. IEEE Trans.
Geosci. Remote Sens. 2010, 48, 2271–2282. [CrossRef]
74. Lin, Z.; Chen, Y.; Zhao, X.; Wang, G. Spectral-spatial classification of hyperspectral image using autoencoders. In Proceedings of
the 2013 9th International Conference on Information, Communications & Signal Processing, Taiwan, China, 10–13 December
2013; pp. 1–5.
75. Yue, J.; Zhao, W.; Mao, S.; Liu, H. Spectral–spatial classification of hyperspectral images using deep convolutional neural
networks. Remote Sens. Lett. 2015, 6, 468–477. [CrossRef]
76. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep convolutional neural networks for hyperspectral image classification. J. Sens.
2015, 2015, 258619. [CrossRef]
77. Chan, T.H.; Jia, K.; Gao, S.; Lu, J.; Zeng, Z.; Ma, Y. PCANet: A simple deep learning baseline for image classification? IEEE Trans.
Image Process. 2015, 24, 5017–5032. [CrossRef] [PubMed]
78. Liu, P.; Zhang, H.; Eom, K.B. Active deep learning for classification of hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs.
Remote Sens. 2016, 10, 712–724. [CrossRef]
79. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on
convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [CrossRef]
80. Zabalza, J.; Ren, J.; Zheng, J.; Zhao, H.; Qing, C.; Yang, Z.; Du, P.; Marshall, S. Novel segmented stacked autoencoder for effective
dimensionality reduction and feature extraction in hyperspectral imaging. Neurocomputing 2016, 185, 1–10. [CrossRef]
81. Yu, S.; Jia, S.; Xu, C. Convolutional neural networks for hyperspectral image classification. Neurocomputing 2017, 219, 88–98.
[CrossRef]
82. Chen, Y.; Zhu, L.; Ghamisi, P.; Jia, X.; Li, G.; Tang, L. Hyperspectral images classification with Gabor filtering and convolutional
neural network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2355–2359. [CrossRef]
83. Li, Y.; Xie, W.; Li, H. Hyperspectral image reconstruction by deep convolutional neural network for classification. Pattern Recognit.
2017, 63, 371–383. [CrossRef]
84. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote
Sens. 2017, 55, 3639–3655. [CrossRef]
85. Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018,
27, 2623–2634. [CrossRef]
86. Deng, C.; Xue, Y.; Liu, X.; Li, C.; Tao, D. Active transfer learning network: A unified deep joint spectral–spatial feature learning
model for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1741–1754. [CrossRef]
87. Liang, M.; Jiao, L.; Yang, S.; Liu, F.; Hou, B.; Chen, H. Deep multiscale spectral-spatial feature fusion for hyperspectral images
classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2911–2924. [CrossRef]
88. Wang, W.; Dou, S.; Jiang, Z.; Sun, L. A fast dense spectral–spatial convolution network framework for hyperspectral images
classification. Remote Sens. 2018, 10, 1068. [CrossRef]
89. Yang, X.; Ye, Y.; Li, X.; Lau, R.Y.; Zhang, X.; Huang, X. Hyperspectral image classification with deep learning models. IEEE Trans.
Geosci. Remote Sens. 2018, 56, 5408–5423. [CrossRef]
90. Pan, B.; Shi, Z.; Xu, X. MugNet: Deep learning for hyperspectral image classification using limited samples. ISPRS J. Photogramm.
Remote Sens. 2018, 145, 108–119. [CrossRef]
91. Paoletti, M.; Haut, J.; Plaza, J.; Plaza, A. A new deep convolutional neural network for fast hyperspectral image classification.
ISPRS J. Photogramm. Remote Sens. 2018, 145, 120–147. [CrossRef]
92. Chen, C.; Jiang, F.; Yang, C.; Rho, S.; Shen, W.; Liu, S.; Liu, Z. Hyperspectral classification based on spectral–spatial convolutional
neural networks. Eng. Appl. Artif. Intell. 2018, 68, 165–171. [CrossRef]
93. Singh, S.; Kasana, S.S. Efficient classification of the hyperspectral images using deep learning. Multimed. Tools Appl. 2018,
77, 27061–27074. [CrossRef]
94. Zhou, F.; Hang, R.; Liu, Q.; Yuan, X. Hyperspectral image classification using spectral-spatial LSTMs. Neurocomputing 2019,
328, 39–47. [CrossRef]
95. Fang, B.; Li, Y.; Zhang, H.; Chan, J.C.W. Hyperspectral images classification based on dense convolutional networks with
spectral-wise attention mechanism. Remote Sens. 2019, 11, 159. [CrossRef]
96. Liu, X.; Hu, Q.; Cai, Y.; Cai, Z. Extreme learning machine-based ensemble transfer learning for hyperspectral image classification.
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3892–3902. [CrossRef]
97. Ramamurthy, M.; Robinson, Y.H.; Vimal, S.; Suresh, A. Auto encoder based dimensionality reduction and classification using
convolutional neural networks for hyperspectral images. Microprocess. Microsyst. 2020, 79, 103280. [CrossRef]
98. Sharifi, O.; Mokhtarzade, M.; Beirami, B.A. A Deep Convolutional Neural Network based on Local Binary Patterns of Gabor
Features for Classification of Hyperspectral Images. In Proceedings of the 2020 International Conference on Machine Vision and
Image Processing (MVIP), Qom, Iran, 18–20 February 2020; pp. 1–5.
99. Cao, F.; Guo, W. Deep hybrid dilated residual networks for hyperspectral image classification. Neurocomputing 2020, 384, 170–181.
[CrossRef]
100. Nalepa, J.; Antoniak, M.; Myller, M.; Lorenzo, P.R.; Marcinkiewicz, M. Towards resource-frugal deep convolutional neural
networks for hyperspectral image segmentation. Microprocess. Microsyst. 2020, 73, 102994. [CrossRef]
101. Vaddi, R.; Manoharan, P. Hyperspectral image classification using CNN with spectral and spatial features integration. Infrared
Phys. Technol. 2020, 107, 103296. [CrossRef]
102. Jiao, L.; Shang, R.; Liu, F.; Zhang, W. Brain and Nature-Inspired Learning, Computation and Recognition; Elsevier: Amsterdam,
The Netherlands, 2020.
103. Singh, S.; Kasana, S.S. A Pre-processing framework for spectral classification of hyperspectral images. Multimed. Tools Appl. 2021,
80, 243–261. [CrossRef]
104. Li, L.; Ge, H.; Gao, J. A spectral-spatial kernel-based method for hyperspectral imagery classification. Adv. Space Res. 2017,
59, 954–967. [CrossRef]
105. Manifold, B.; Men, S.; Hu, R.; Fu, D. A versatile deep learning architecture for classification and label-free prediction of
hyperspectral images. Nat. Mach. Intell. 2021, 3, 306–315. [CrossRef]
106. Xue, Z.; Yu, X.; Liu, B.; Tan, X.; Wei, X. HResNetAM: Hierarchical residual network with attention mechanism for hyperspectral
image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3566–3580. [CrossRef]
107. Sellami, A.; Tabbone, S. Deep neural networks-based relevant latent representation learning for hyperspectral image classification.
Pattern Recognit. 2022, 121, 108224. [CrossRef]
108. Zhan, Y.; Wu, K.; Dong, Y. Enhanced Spectral–Spatial Residual Attention Network for Hyperspectral Image Classification. IEEE J.
Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7171–7186. [CrossRef]
109. Sharifi, O.; Mokhtarzadeh, M.; Asghari Beirami, B. A new deep learning approach for classification of hyperspectral images:
Feature and decision level fusion of spectral and spatial features in multiscale CNN. Geocarto Int. 2021, 37, 1–26. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.