Spectral–Spatial Morphological Attention Transformer For Hyperspectral Image Classification

Abstract— In recent years, convolutional neural networks (CNNs) have drawn significant attention for the classification of hyperspectral images (HSIs). Due to their self-attention mechanism, the vision transformer (ViT) provides promising classification performance compared to CNNs. Many researchers have incorporated ViT for HSI classification purposes. However, its performance can be further improved because the current version does not use spatial–spectral features. In this article, we present a new morphological transformer (morphFormer) that implements a learnable spectral and spatial morphological network, where spectral and spatial morphological convolution operations are used (in conjunction with the attention mechanism) to improve the interaction between the structural and shape information of the HSI token and the CLS token. Experiments conducted on widely used HSIs demonstrate the superiority of the proposed morphFormer over the classical CNN models and state-of-the-art transformer models. The source will be made available publicly at https://github.com/mhaut/morphFormer.

Index Terms— Classification, hyperspectral images (HSIs), morphological transformer (morphFormer), spatial–spectral features.

I. INTRODUCTION

HYPERSPECTRAL images (HSIs) contain information in contiguous wavelengths [1], [2], [3]. HSIs have been adopted in many application areas of remote sensing (RS) and Earth observation (EO), such as urban planning, vegetation monitoring, and crop management [4], [5], [6]. HSIs have particularly been used in EO tasks, such as desertification or climate change studies. In addition to land cover classification tasks [2], [7], [8], [9], other areas in which HSIs have been widely exploited include forestry [10], target/object detection, mineral exploration, and mapping [11], [12], environmental monitoring [13], disaster risk management, and biodiversity conservation. The popularity of HSIs is due to their rich spectral and spatial information [14].

From the point of view of RS imaging technology, the affinity of spectral and spatial resolution is quite critical [15]. Spatial resolution is often limited by the very high spectral resolution of HSIs, and this may negatively affect land cover classification for complex scenes. For example, hyperspectral (HS) data do not provide proper information about the elevation and size of different structures of interest in particular application domains [14], [16]. Most conventional classifiers often process HSIs depending on spectral information and disregard spatial information among adjacent pixels. To solve this issue, different techniques can be implemented to incorporate both spatial and spectral information. With spatial processing, the size and shape of different objects can be determined, resulting in better classification accuracy. In the following, we summarize some of the most relevant methods for exploiting HSI data, outlining their pros and cons.

In HSI classification, conventional classifiers have been widely utilized, even in the presence of limited training samples [3], [17], [18]. In general, these techniques include two stages. First, they reduce the dimensionality of the HSI data and extract some informative features. Then, spectral classifiers are fed with such features for classification purposes [2], [7], [19], [20], [21], [22]. In scenarios with limited training samples, support vector machines (SVMs) with nonlinear kernels have been widely used [23]. Moreover, the extreme learning machine (ELM) has been broadly used to extract features from unbalanced training sets. Li et al. [24] implemented an ELM to classify HSIs by extracting local binary patterns (LBPs) for classification. They demonstrated that ELMs can obtain better classification results than SVMs. The random forest (RF) was also utilized for the classification of HSIs due to its discriminative power [2]. However, the aforementioned classifiers face challenges when the training data are not representative, suffering from data fitting problems. This is because these classifiers consider HSIs as an assembly of measurements in the spectral domain, without considering their arrangement in the spatial domain. Classifiers based on spatial–spectral information significantly enhance the results of spectral-based classifiers with the inclusion of spatial data, such as the size and shape of various objects. In addition,

Manuscript received 12 December 2022; revised 13 January 2023; accepted 30 January 2023. Date of publication 3 February 2023; date of current version 23 February 2023. This work was supported in part by the Consejeria de Economia, Ciencia y Agencia Digital de la Junta de Extremadura, and Fondo Europeo de Desarrollo Regional de la Union Europea under Reference Grant GR21040; in part by the Spanish Ministerio de Ciencia e Innovacion under Project PID2019-110315RB-I00 (APRISA); in part by the European Union's Horizon 2020 Research and Innovation Program under Grant 734541 (EOXPOSURE); and in part by the Science and Engineering Research Board (SERB), Government of India, under Project Grant SRG/2022/001390. (Corresponding author: Antonio Plaza.)
Swalpa Kumar Roy is with the Department of Computer Science and Engineering, Jalpaiguri Government Engineering College, Jalpaiguri 735102, India (e-mail: [email protected]).
Ankur Deria is with the Department of Informatics, Technical University of Munich, 85748 Garching bei München, Germany (e-mail: [email protected]).
Chiranjibi Shah and Qian Du are with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39762 USA (e-mail: [email protected]; [email protected]).
Juan M. Haut and Antonio Plaza are with the Hyperspectral Computing Laboratory, Department of Technology of Computers and Communications, Escuela Politécnica, University of Extremadura, 10003 Cáceres, Spain (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TGRS.2023.3242346
1558-0644 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Indian Institute of Technology Indore. Downloaded on November 09,2024 at 08:01:59 UTC from IEEE Xplore. Restrictions apply.
5503615 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 61, 2023
spectral-based classifiers are more sensitive to noise compared to their spatial–spectral counterparts [2], [25].

Deep learning (DL) methods have attracted significant attention for multimodal data integration [26] in RS data classification [27]. A wide variety of fragmented datasets can be intelligently analyzed with DL methods. More recently, a unified and general DL framework was developed by Hong et al. [28] for the classification of RS imagery. 1-D convolutional neural networks (CNNs) (CNN1Ds) [29], 2-D CNNs (CNN2Ds) [30], and 3-D CNNs (CNN3Ds) [31] have demonstrated success in the classification of HSI data.

Residual networks (ResNets) were introduced by He et al. [32]. These models have a minimum loss of information after each operation of the convolutional layers to reduce the gradient vanishing problem [32]. Zhong et al. [33] introduced a spatial–spectral ResNet (SSRN) that utilizes both spatial and spectral information to obtain enhanced classification performance. Roy et al. adopted a lightweight paradigm with the extraction of spatial and spectral features via the squeeze-and-excitation ResNet that can be added with a bag-of-features learning mechanism to accurately obtain the final classification results [34], [35]. Zhu et al. [36] incorporated other channel and spatial attention layers inside the SSRN architecture for extracting discriminative features. To take full advantage of ResNets, they can be extended to form even more complex models, such as the inclusion of adaptive kernels [17], lightweight spatial–spectral attention based on squeeze-and-excitation [35], and pyramidal ResNets [37]. Rotation-equivariant CNNs [38], gradient centralized convolutions [1], [39], and lightweight heterogeneous kernel convolutions [40] also enable efficient classification and feature extraction. Generative adversarial networks (GANs), on the other hand, may help with mitigating the class-imbalance problem in HSI classification [41], [42].

Despite their apparent ability to extract contextual information in the spatial domain, CNNs cannot easily sequentially incorporate attributes, in particular, long- and middle-term dependencies. As a consequence, their performance in HSI classification may be affected by the presence of classes with similar spectral signatures, making it difficult to extract diagnostic spectral attributes. The spectral signatures in HSIs can also be modeled using recurrent neural networks (RNNs), which accumulate them in a band-by-band fashion. This is important to learn long-term dependencies, as the gradient vanishing problem may further complicate the interpretation of spectrally salient changes [43]. However, RNNs are not suitable for the simultaneous training of models because HSIs generally contain many samples, which limits classifier performance. Our work addresses the aforementioned limitations by rethinking HSI data classification using transformers.

As cutting-edge backbone networks, transformers utilize self-attention techniques to process and analyze sequential data more efficiently [44]. In recent years, several new transformer models have been developed, including SpectralFormer [45], which is capable of learning spectral information by creating a transformer encoder module and utilizing adjacent bands. Transformers excel at characterizing spectral signatures, yet they are not able to model local semantic elements or utilize spatial information effectively. He et al. [46] proposed a bidirectional encoder representation for a transformer that incorporates flexible and dynamic input regions for pixel-based classification of HSIs. Zhong et al. [47] proposed a factorized architecture search (FAS) framework, which enables a stable and fast spectral–spatial transformer architecture search to find the optimal architecture settings for the HSI classification task. To further improve the classification performance of HSIs, Sun et al. [48] introduced spatial and spectral tokenization of feature representations in the encoder, which helps to extract local spatial information and establish long-range relations between neighboring sequences. Yang et al. [49] utilized an adaptive 3-D convolution projection module to incorporate spatial–spectral information in an HSI transformer classification network. The above transformer models are designed based on HSI data and utilize spectral–spatial feature representation mechanisms. Roy et al. [50] recently developed a multimodal fusion transformer (MFT) to extract features from HSIs and fuse them with a CLS token derived from light detection and ranging (LiDAR) data to enhance the joint classification performance.

Mathematical morphology (MM) is a theory to analyze geometrical structures, based on topology, lattice theory, set theory, and random functions. Researchers have utilized MM-based techniques such as attribute profiles (APs) and extended morphological profiles (EPs) to extract spatial features and classify HSI data more accurately [16], [51], [52]. Rasti et al. [53] applied total variation component analysis for feature fusion to improve the joint extraction of EPs. Merentis et al. [54] used an RF classifier to classify HSI data with an automated fusion approach. By exploiting APs and EPs, MM has been successfully applied to extract features from RS data [55], [56], [57], [58]. In EPs and APs, several handcrafted characteristics are collected by sequentially performing dilation and erosion operations using an extensive set of structuring elements (SEs). There are a few limitations common to both EPs and APs, however. Specifically, the shape of the SE is fixed. In addition, the SEs can only obtain information about the size of existing objects but are unable to collect information about the shape of arbitrary item boundaries in complicated environments. To circumvent these restrictions, Roy et al. [3] introduced a spectral–spatial CNN based on morphological erosion and dilation operations for HSI classification. In this work, a spatial and spectral morphological block was created for extracting discriminative and robust spatial and spectral information from HSIs using its own trainable SEs in the erosion and dilation layers.

Although MM has been successfully applied in RS for extracting spatial information based on techniques such as EPs or APs, the SEs are nontrainable [55], [56], [57], [58] and unable to capture dynamic features. If the EPs or APs are replaced with learnable MM operations, the resulting networks can be more capable of learning subtle features. Conventional transformer models use self-attention to highlight the most important features. If MM operations are combined with the transformer, the model may be able to learn intrinsic shape information and use this information in
ROY et al.: SPECTRAL–SPATIAL MORPHOLOGICAL ATTENTION TRANSFORMER FOR HSI CLASSIFICATION 5503615
the self-attention block for better feature extraction, leading to higher classification accuracies.

With the aforementioned rationale in mind, a new morphological fusion transformer encoder is introduced in this work, where the input patch is passed through two different morphological blocks simultaneously. The results provided by these blocks are concatenated, and the CLS token is added to the concatenated patch. The objective of our morphological transformer (morphFormer) model is to learn the spectral–spatial information from the patch embeddings of the HSI inputs, as well as to enrich the abstract description provided by the CLS token, without adding significant computational complexity.

The main contributions of this work can be summarized as follows.
1) We provide a new learnable classification network based on a spectral–spatial morphFormer that conducts spatial and spectral morphological convolutions via dilation and erosion operators.
2) We introduce a new attention mechanism for efficiently fusing the existing CLS tokens and information obtained from HSI patch tokens into a new token that carries out morphological feature fusion.
3) We conduct experiments on four public HSI datasets by comparing the proposed network with other state-of-the-art approaches. The obtained results reveal the effectiveness of the proposed approach.

The remainder of this article is organized as follows. Section II describes the proposed method in detail. Section III discusses our experimental results. Section IV concludes this article.

II. PROPOSED METHOD

A. Convolutional Networks for Feature Learning

CNNs exhibit promising performance in HSI classification due to their ability to automatically extract contextual features. Since HSIs have numerous spectral bands, it is possible to take advantage of CNNs for controlling the depth of the output feature maps. CNNs have already been proved to be effective in capturing high-level features independently of the data source modality. Our proposed model uses CNNs for extracting high-level abstract features to be used by the transformer. The spectral dimensions of the HSI are reduced by the CNN.

Our proposed model utilizes sequential layers of Conv3D and HetConv2D for extracting robust and discriminative features from HSIs. The original data are arranged in subcubes X_HSI (with dimensionality 11 × 11 × B) that are reshaped into (1 × 11 × 11 × B) and used as input to a Conv3D layer with kernel size (3 × 3 × 9) and padding (1 × 1 × 0). Padding is used so that the spatial size of the output image is the same as that of the input image. The HetConv2D block follows the Conv3D layer and consists of two Conv2D layers working in parallel. One of the Conv2D layers performs groupwise convolution, and the other one performs pointwise convolution. HetConv2D utilizes two kernels of different sizes to extract multiscale information. The outputs obtained from these two convolutions are combined in an elementwise fashion (⊕) and returned as output

X_in = Reshape(Conv3D(Reshape(X_HSI)))
X_out = Conv2D(X_in, k1, g1, p1) ⊕ Conv2D(X_in, k2, g2, p2)    (1)

where k1 = 3, g1 = 4, p1 = 1, k2 = 1, g2 = 1, and p2 = 0. The output shape of the Conv3D layer is (8 × 11 × 11 × (B − 8)), and that of the HetConv2D block is (11 × 11 × 64). Batch normalization (BN) [59] and ReLU activation layers are used after the Conv3D layer and the HetConv2D block. If only a few limited training samples are available, the overfitting phenomenon may arise. To address this issue and accelerate the training performance, we use BN. ReLU also helps in smoothing the back-propagation of the loss by introducing nonlinearity.

B. Image Tokenization and Position Embedding

HSIs contain spatial and spectral features which can provide highly discriminative information that can lead to higher classification accuracies. Patch tokens of shape (1 × 64) each are obtained by flattening HSI subcubes of shape [(11 × 11) × 64] as follows:

X_flat = T(Flatten(X_out))    (2)

where T(·) is a transpose function and X_flat ∈ R^(121×64). The tokenization [48] operation is used to select n from 121 patches as follows:

X_Wa = softmax(T(X_flat · W_a^H))
X_Wb = X_flat · W_b^H    (3)

where W_a^H ∈ R^(64×n), W_b^H ∈ R^(64×64), X_Wa ∈ R^(n×121), and X_Wb ∈ R^(121×64). The tokenization operation uses two learnable weights to extract the key features

X_patch = X_Wa · X_Wb    (4)

where X_patch ∈ R^(n×64). A total of (n + 1) patches are obtained as described in (5) by concatenating (⊙) the CLS token to the HSI patch tokens. The CLS token (X_cls) is a learnable tensor, which is randomly initialized. To simplify the calculation of head dimensions, a size of 64 is used

X̂ = X_cls ⊙ X_patch    (5)

The semantic textural information in the image patch tokens can be preserved by adding trainable position embeddings to the patch embeddings. Hence, a trainable position embedding is added to the created HSI patch tokens. Fig. 1 graphically illustrates the addition of position embeddings (in elementwise fashion) to the patches (1 to n + 1). A dropout layer is used after this operation to reduce the effect of the vanishing gradient. The above procedure can be expressed as

X = DP(X̂ ⊕ PE)    (6)

where DP denotes a dropout layer with rate 0.1 and PE represents a learnable position embedding.
Fig. 1. In the upper row, we show (left) the proposed HSI classification network, whose classification map contains less noise than those of existing methods, and (right) the transformer encoder with a multihead patch attention mechanism. In the bottom row, we show the backbone of the proposed method.
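The tokenization and embedding pipeline of Section II-B, i.e., (2)–(6), can be sketched as follows. This is an illustrative reading of the equations (softmax taken over the 121-pixel axis, n = 4 tokens by default, dropout rate 0.1); the module and parameter names are ours, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class Tokenizer(nn.Module):
    """Selects n patch tokens from the 121 flattened pixels, prepends a
    learnable CLS token, and adds a learnable position embedding
    (Eqs. (2)-(6))."""
    def __init__(self, n_tokens=4, dim=64):
        super().__init__()
        self.wa = nn.Parameter(torch.randn(dim, n_tokens))   # W_a ∈ R^(64×n)
        self.wb = nn.Parameter(torch.randn(dim, dim))        # W_b ∈ R^(64×64)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))      # randomly initializable CLS token
        self.pos = nn.Parameter(torch.zeros(1, n_tokens + 1, dim))  # position embedding PE
        self.drop = nn.Dropout(0.1)                          # DP in Eq. (6)

    def forward(self, x_out):                 # (N, 64, 11, 11) from the CNN backbone
        n_batch = x_out.shape[0]
        x_flat = x_out.flatten(2).transpose(1, 2)            # (N, 121, 64), Eq. (2)
        # Eq. (3): attention over the 121 pixels, plus a value projection.
        x_wa = torch.softmax((x_flat @ self.wa).transpose(1, 2), dim=-1)  # (N, n, 121)
        x_wb = x_flat @ self.wb                              # (N, 121, 64)
        x_patch = x_wa @ x_wb                                # (N, n, 64), Eq. (4)
        cls = self.cls.expand(n_batch, -1, -1)
        x = torch.cat([cls, x_patch], dim=1)                 # (N, n+1, 64), Eq. (5)
        return self.drop(x + self.pos)                       # Eq. (6)
```

With n = 4 the module turns each (64 × 11 × 11) feature cube into n + 1 = 5 tokens of dimension 64, matching the head size used in the text.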
C. Spectral and Spatial Morphological Convolutions

MM is a powerful technique for characterizing the intrinsic shape, structure, and size of objects in an image. The spectral and spatial morphological network presented here is designed based on dilation and erosion operations with SEs of size (s × s).

A dilated image is produced by combining the input HSI patch tokens with SEs, selecting the pixel with the maximum value in the local neighborhood. As a result of the dilation procedure, the boundaries of the foreground objects of the HSI input patch token are broadened. In other words, the size of the kernel affects the size of the texture for various regions of an HSI patch token. The dilation process is represented by ⊞ and can be denoted by the following equation:

(X_patch ⊞ W_d)(x, y) = max_{(i,j)∈ψ} [X_patch(x + i, y + j) + W_d(i, j)]    (7)

where ψ = {(i, j) | i ∈ {1, 2, 3, . . . , s}; j ∈ {1, 2, 3, . . . , s}} represents the elements of the kernel and W_d denotes the SEs used for the dilation operation.

Regarding the erosion operation, the output of the convolution with the SE selects the pixel with the minimum value in the local neighborhood. This operation reduces the shape of the background object in the HSI patch token (as opposed to the dilation). Erosions can eliminate minor details and enlarge holes, making them distinguishable from each other in different texture regions. Let X_patch ∈ R^(k×k) be an input HSI patch token of spatial size k × k, and let ⊟ represent the morphological erosion operation. The erosion operation can be defined as

(X_patch ⊟ W_e)(x, y) = min_{(i,j)∈ψ} [X_patch(x + i, y + j) − W_e(i, j)]    (8)

where ψ = {(i, j) | i ∈ {1, 2, 3, . . . , s}; j ∈ {1, 2, 3, . . . , s}} represents the elements of the kernel and W_e denotes the SEs used for the erosion operation.
Fig. 2. Graphical visualization of (a) dilation and (b) erosion operations for an input image patch of size (7 × 7), dilated and eroded with an SE of size (3 × 3). The resulting outputs keep the same size using a padding technique.
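Learnable dilation and erosion in the style of (7) and (8) can be sketched with an unfold-based neighborhood extraction. This is an illustrative implementation, not the authors' released code; it assumes zero padding (so the output keeps the input size, as in Fig. 2) and a single SE shared across channels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableMorph2D(nn.Module):
    """Trainable grayscale morphology on (N, C, k, k) patch tokens:
    dilation = max(x + W_d) over the s×s neighborhood (Eq. (7)),
    erosion  = min(x - W_e) over the s×s neighborhood (Eq. (8))."""
    def __init__(self, se_size=3, mode="dilation"):
        super().__init__()
        self.se = nn.Parameter(torch.zeros(se_size, se_size))  # learnable SE
        self.s = se_size
        self.pad = se_size // 2            # keeps the k×k spatial size
        self.mode = mode

    def forward(self, x):                  # (N, C, k, k)
        n, c, h, w = x.shape
        # Gather every s×s neighborhood (zero-padded at the borders,
        # a simplification of the ±inf padding of exact morphology).
        patches = F.unfold(x, self.s, padding=self.pad)        # (N, C*s*s, h*w)
        patches = patches.view(n, c, self.s * self.s, h * w)   # (N, C, s*s, h*w)
        se = self.se.view(1, 1, -1, 1)
        if self.mode == "dilation":
            out = (patches + se).max(dim=2).values             # Eq. (7)
        else:
            out = (patches - se).min(dim=2).values             # Eq. (8)
        return out.view(n, c, h, w)
```

Because the SE enters the max/min as an additive weight, gradients flow to it through the selected neighborhood position, which is what makes the SE trainable end to end, unlike the fixed SEs of EPs and APs.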
Fig. 5. MUUFL data. (a) Pseudocolor image using bands 40, 20, and 10. (b) Disjoint train samples. (c) Disjoint testing samples. The table shows land-cover types for each class along with the number of disjoint train and test samples, where the train samples represent 5% of the available ground truth, and the test samples represent the remaining 95% of the ground truth.

Fig. 6. Trento data. (a) Pseudocolor image using bands 40, 20, and 10. (b) Disjoint train samples. (c) Disjoint testing samples. The table shows land-cover types for each class along with the number of disjoint train and test samples.
TABLE I
CLASSIFICATION PERFORMANCE (IN %) ON THE UH HSI DATASET
Fig. 8. Classification maps for the Houston (UH) HSI dataset. (a) Ground truth. (b) KNN (69.48%). (c) RF (74.87%). (d) SVM (68.13%). (e) CNN1D
(63.04%). (f) CNN2D (65.85%). (g) CNN3D (70.26%). (h) RNN (65.20%). (i) ViT (83.23%). (j) SpectralFormer (76.35%). (k) morphFormer (87.85%).
from KNN, RF, SVM, and RNN, the Adam optimizer [69], [70] has been used to train the models, with a weight decay of 5e−3 and a learning rate of 5e−4. In addition, these methods (considering also the RNN) used a step scheduler with a gamma of 0.9, steps of size 50, and were trained during 500 epochs. The average and standard deviation of each experiment have been calculated based on three repetitions. Python 3.7.7 and PyTorch 1.5.0 were used to implement the proposed morphFormer.

Different widely utilized quantitative measures, such as the overall accuracy (OA), average accuracy (AA), and kappa coefficient (κ), are utilized for assessing the performance. The experiments have been performed on spectrally and spatially disjoint sets of train and testing samples [71] such that there is no interaction between the respective samples. In addition, varying percentages of train samples have been considered for validating the performance of the considered techniques.

C. Performance Analysis With Disjoint Train/Test Samples

A quantitative assessment of classification performance is presented in Tables I–IV. The best classification values are displayed in bold. The results show that the proposed approach is superior to all other techniques in terms of OA, AA, and κ, and exhibits better performance in most cases in terms of classwise accuracy.

It is worth noting that conventional classifiers, such as KNN, RF, or SVM, exhibit similar performance. An exception is the KNN with the MUUFL and Trento datasets, which provides inferior accuracies to those provided by RF and SVM. In addition, the performance of DL-based classifiers, such as CNN1D, CNN2D, CNN3D, and RNN, is generally superior to that of conventional classifiers, except for RF in the UH and MUUFL datasets (which is better than CNN2D and CNN3D). Transformer methods, such as ViT and SpectralFormer, provide better performance due to the incorporation of the sequential mechanism. However, the incorporation of spatial–spectral information in the proposed morphFormer leads to better classification performance in terms of OA, AA, and κ in all considered datasets.

Table I shows that the RF provides better performance in the UH dataset in comparison to other conventional classifiers, but it cannot provide better performance than transformer methods. The proposed technique exhibits a performance that
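The three measures reported throughout this section (OA, AA, and the kappa coefficient κ) can all be computed from a single confusion matrix. The sketch below is a generic illustration of these standard definitions (the function name and layout are ours, not tied to the authors' code):

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA), and Cohen's kappa (κ)
    computed from integer class labels."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                        # rows: reference, cols: prediction
    total = cm.sum()
    oa = np.trace(cm) / total                # fraction of correctly labeled samples
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean of the classwise accuracies
    # Chance agreement from the row/column marginals, then kappa.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

Unlike OA, the AA weights every class equally, which is why the two can diverge noticeably on the class-imbalanced scenes discussed below, and κ discounts the agreement expected by chance.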
Fig. 9. Classification maps for the MUUFL HSI dataset. (a) Ground truth. (b) KNN (75.80%). (c) RF (89.85%). (d) SVM (84.30%). (e) CNN1D (81.17%).
(f) CNN2D (82.95%). (g) CNN3D (77.59%). (h) RNN (88.60%). (i) ViT (91.99%). (j) SpectralFormer (86.68%). (k) morphFormer (93.84%).
TABLE II
CLASSIFICATION PERFORMANCE (IN %) ON THE MUUFL HSI DATASET
TABLE III
CLASSIFICATION PERFORMANCE (IN %) ON THE TRENTO HSI DATASET
is superior to that of all compared methods due to its capacity to learn spatial and spectral information. The morphFormer shows mean OA, AA, and κ of 87.85%, 89.66%, and 86.81%, with standard deviations of 0.20%, 0.39%, and 0.22%, respectively.

Table II shows the generalization ability on the MUUFL dataset for disjoint train and test samples. Both RNN and RF exhibit comparable accuracies and outperform the remaining conventional classifiers. The morphFormer shows better accuracy than all other techniques, including transformer-based approaches, with OA, AA, and κ of 93.84 ± 0.10%, 80.55 ± 0.27%, and 91.84 ± 0.13%, respectively.

Table III lists the classification results on the Trento dataset. RF outperforms other conventional classifiers, and
Fig. 10. Classification maps for the Trento HSI dataset. (a) Ground truth. (b) KNN (86.42%). (c) RF (94.73%). (d) SVM (88.55%). (e) CNN1D (93.02%).
(f) CNN2D (92.31%). (g) CNN3D (96.14%). (h) RNN (86.83%). (i) ViT (94.62%). (j) SpectralFormer (88.42%). (k) morphFormer (96.73%).
Fig. 11. Classification maps for the Augsburg HSI dataset. (a) Ground truth. (b) KNN (67.27%). (c) RF (79.96%). (d) SVM (71.60%). (e) CNN1D (72.00%).
(f) CNN2D (73.59%). (g) CNN3D (82.89%). (h) RNN (40.26%). (i) ViT (85.90%). (j) SpectralFormer (70.81%). (k) morphFormer (88.68%).
CNN3D shows better accuracy than other DL-based methods. The morphFormer shows better classification accuracy than all other methods, with OA, AA, and κ of 96.73 ± 0.58%, 93.68 ± 1.28%, and 95.62 ± 0.77%, respectively.

Table IV shows the classification results on the Augsburg dataset. RNN exhibits lower accuracies than other conventional classifiers, while RF is the best conventional classifier, and CNN3D outperforms other DL-based approaches. The transformer ViT method outperforms our approach in terms
Fig. 12. Classification accuracies in terms of AA, OA, and kappa (κ) obtained by various techniques with different percentages of training samples randomly selected from the (a), (d), (g) UH, (b), (e), (h) MUUFL, and (c), (f), (i) Trento datasets.
D. Visual Comparison

Figs. 8–11 show the obtained classification maps. Our goal is to perform a qualitative evaluation of the compared methods. Conventional classifiers, such as KNN, RF, and SVM, provide classification maps with salt-and-pepper noise around the boundary areas because they only exploit spectral information. In addition, the DL methods produce classification maps with less noise in comparison to conventional classifiers. Specifically, the maps produced by CNN1D, CNN2D, and CNN3D are smoother because the boundaries between land-use and land-cover classes can be separated in a better way. ViT can extract more abstract information in sequential representation, so it provides better classification maps. Compared to ViT and SpectralFormer, the proposed morphFormer exhibits better classification maps. In other words, our newly proposed morphFormer can enhance classification performance by considering spatial-contextual information and positional information across different layers. As a result, it characterizes texture and edge details better than other transformer-based techniques.

Fig. 13. Comparing the performance of transformer methods in terms of OA, network parameters, and FLOPs (shown by the radii of the circles) on the (a) UH, (b) MUUFL, (c) Trento, and (d) Augsburg datasets.

E. Performance Over Different Train Sample Sizes

Fig. 12(a)–(i) shows the classification performance of transformer models with different percentages of training samples
TABLE IV
CLASSIFICATION PERFORMANCE (IN %) ON THE AUGSBURG HSI DATASET
Fig. 14. 2-D graphical visualization of the features extracted by the proposed morphFormer through t-SNE. (a) Houston. (b) MUUFL. (c) Trento.
on three HSI datasets of Houston, MUUFL, and Trento. The training samples on these three datasets are randomly selected as 3%, 5%, 7%, and 9%.

In the Houston dataset, the proposed morphFormer outperforms the second-best-performing transformer model (ViT) by a margin of approximately 4% in terms of OA, AA, and κ for all considered percentages of randomly selected samples. Although the margin is smaller for larger training sizes, the proposed morphFormer exhibits superior classification performance for all sample sizes in the other two datasets (MUUFL and Trento). It can be concluded that the proposed morphFormer exhibits significantly better classification performance than the other transformer networks, even with a limited number of training samples.

F. Hyperparameter Sensitivity Analysis

In terms of computing complexity, the proposed model is not only effective but also rather efficient. In Fig. 13(a)–(d), the parameters and calculations of the proposed method are compared to those of various transformer networks. Specifically, we show the OA, the number of parameters, and the number of calculations (FLOPs) for the UH, Trento, MUUFL, and Augsburg datasets. The calculations are shown by the radii of circles. The efficiency of morphFormer is clear in the Houston and Augsburg datasets, where it needs the fewest parameters and FLOPs. Although the parameters and FLOPs needed by morphFormer are higher than those required by SpectralFormer in certain cases, the gain in performance compensates for that. As can be seen with the UH data, morphFormer offers an outstanding gain in OA (4.62%) over the next best model (ViT). In this case, the parameter tradeoff is justified by the significant increase in classification accuracy.

Furthermore, 2-D graphical plots depicting the features extracted by the proposed morphFormer are presented in Fig. 14(a)–(c) for the Houston, MUUFL, and Trento datasets, respectively. Using the t-SNE approach [72], the features extracted by morphFormer can be analyzed. It can be observed that samples of similar categories gather together, and intra-class variance is minimized in all three datasets.

IV. CONCLUSION

We present a novel morphFormer network for HSI data classification, which is based on spectral and spatial morphological convolutions. Although fusing attention and morphological characteristics is not straightforward, our approach successfully merges attention mechanisms with morphological operations and provides superior classification performance compared to standard convolutional models and the recently developed transformer models. Our morphFormer has the potential to excel in many different classification tasks in EO and RS because of its ability to apply learnable morphological operations in addition to multihead self-attention mechanisms. A generative adversarial network (GAN)-based method will be investigated with the morphFormer in our future work. Moreover, the LiDAR processing problem will also be addressed using a morphFormer-based approach.

REFERENCES

[1] S. K. Roy, P. Kar, D. Hong, X. Wu, A. Plaza, and J. Chanussot, "Revisiting deep hyperspectral feature extraction networks via gradient centralized convolution," IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–19, 2021.
ROY et al.: SPECTRAL–SPATIAL MORPHOLOGICAL ATTENTION TRANSFORMER FOR HSI CLASSIFICATION 5503615
[2] M. Ahmad et al., "Hyperspectral image classification-traditional to deep models: A survey for future prospects," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 968–999, 2022.
[3] S. K. Roy, R. Mondal, M. E. Paoletti, J. M. Haut, and A. Plaza, "Morphological convolutional neural networks for hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 8689–8702, 2021.
[4] B. Lu, Y. He, and P. D. Dao, "Comparing the performance of multispectral and hyperspectral images for estimating vegetation properties," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 6, pp. 1784–1797, Jun. 2019.
[5] C. Chen, J. Yan, L. Wang, D. Liang, and W. Zhang, "Classification of urban functional areas from remote sensing images and time-series user behavior data," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1207–1221, 2020.
[6] J. Yuan, S. Wang, C. Wu, and Y. Xu, "Fine-grained classification of urban functional zones and landscape pattern analysis using hyperspectral satellite imagery: A case study of Wuhan," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 3972–3991, 2022.
[7] C. Shah and Q. Du, "Spatial-aware collaboration–competition preserving graph embedding for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 19, May 2022, Art. no. 5506005.
[8] E. Bartholomé and A. S. Belward, "GLC2000: A new approach to global land cover mapping from Earth observation data," Int. J. Remote Sens., vol. 26, no. 9, pp. 1959–1977, Feb. 2005.
[9] J. Senthilnath, S. N. Omkar, V. Mani, N. Karnwal, and S. P. B., "Crop stage classification of hyperspectral data using unsupervised techniques," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 861–866, Apr. 2013.
[10] B. Koetz, F. Morsdorf, S. van der Linden, T. Curt, and B. Allgöwer, "Multi-source land cover classification for forest fire management based on imaging spectrometry and LiDAR data," Forest Ecology Manage., vol. 256, no. 3, pp. 263–271, Jul. 2008.
[11] X. Wu, D. Hong, J. Chanussot, Y. Xu, R. Tao, and Y. Wang, "Fourier-based rotation-invariant feature boosting: An efficient framework for geospatial object detection," IEEE Geosci. Remote Sens. Lett., vol. 17, no. 2, pp. 302–306, Feb. 2020.
[12] X. Wu, D. Hong, J. Tian, J. Chanussot, W. Li, and R. Tao, "ORSIm detector: A novel object detection framework in optical remote sensing imagery using spatial-frequency channel features," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 5146–5158, Jul. 2019.
[13] S. L. Ustin, Manual of Remote Sensing, Remote Sensing for Natural Resource Management and Environmental Monitoring, vol. 4. Hoboken, NJ, USA: Wiley, 2004.
[14] P. O. Gislason, J. A. Benediktsson, and J. R. Sveinsson, "Random forests for land cover classification," Pattern Recognit. Lett., vol. 27, no. 4, pp. 294–300, 2006.
[15] L. Gao, D. Hong, J. Yao, B. Zhang, P. Gamba, and J. Chanussot, "Spectral superresolution of multispectral imagery with joint sparse and low-rank learning," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 3, pp. 2269–2280, Mar. 2021.
[16] P. Ghamisi, J. A. Benediktsson, and S. Phinn, "Land-cover classification using both hyperspectral and LiDAR data," Int. J. Image Data Fusion, vol. 6, no. 3, pp. 189–215, 2015.
[17] S. K. Roy, S. Manna, T. Song, and L. Bruzzone, "Attention-based adaptive spectral–spatial kernel ResNet for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 9, pp. 7831–7843, Sep. 2021.
[18] M. E. Paoletti, S. Moreno-Álvarez, and J. M. Haut, "Multiple attention-guided capsule networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–20, 2022.
[19] M. Paoletti, X. Tao, J. Haut, S. Moreno-Álvarez, and A. Plaza, "Deep mixed precision for hyperspectral image classification," J. Supercomput., vol. 77, pp. 9190–9201, Feb. 2021.
[20] S. K. Roy, G. Krishna, S. R. Dubey, and B. B. Chaudhuri, "HybridSN: Exploring 3-D-2-D CNN feature hierarchy for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 17, no. 2, pp. 277–281, Jun. 2020.
[21] C. Shah and Q. Du, "Collaborative and low-rank graph for discriminant analysis of hyperspectral imagery," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 5248–5259, 2021.
[22] D. Hong, J. Yao, D. Meng, Z. Xu, and J. Chanussot, "Multimodal GANs: Toward crossmodal hyperspectral–multispectral image segmentation," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 6, pp. 5103–5113, Jun. 2021.
[23] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[24] W. Li, C. Chen, H. Su, and Q. Du, "Local binary patterns and extreme learning machine for hyperspectral imagery classification," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 7, pp. 3681–3693, Jul. 2015.
[25] B. Rasti, P. Scheunders, P. Ghamisi, G. Licciardi, and J. Chanussot, "Noise reduction in hyperspectral imagery: Overview and application," Remote Sens., vol. 10, no. 3, p. 482, Mar. 2018. [Online]. Available: https://www.mdpi.com/2072-4292/10/3/482
[26] S. K. Roy, P. Kar, M. E. Paoletti, J. M. Haut, R. Pastor-Vargas, and A. Robles-Gomez, "SiCoDeF2 Net: Siamese convolution deconvolution feature fusion network for one-shot classification," IEEE Access, vol. 9, pp. 118419–118434, 2021.
[27] X. Wang, Y. Feng, R. Song, Z. Mu, and C. Song, "Multi-attentive hierarchical dense fusion net for fusion classification of hyperspectral and LiDAR data," Inf. Fusion, vol. 82, pp. 1–18, Jun. 2022.
[28] D. Hong et al., "More diverse means better: Multimodal deep learning meets remote-sensing imagery classification," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4340–4354, May 2021.
[29] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, "Graph convolutional networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 7, pp. 5966–5978, Jul. 2020.
[30] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, "Deep supervised learning for hyperspectral data classification through convolutional neural networks," in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2015, pp. 4959–4962.
[31] A. B. Hamida, A. Benoit, P. Lambert, and C. B. Amar, "3-D deep learning approach for remote sensing image classification," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 8, pp. 4420–4434, Aug. 2018.
[32] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.
[33] Z. Zhong, J. Li, Z. Luo, and M. Chapman, "Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 2, pp. 847–858, Aug. 2018.
[34] S. K. Roy, S. R. Dubey, S. Chatterjee, and B. B. Chaudhuri, "FuSENet: Fused squeeze-and-excitation network for spectral-spatial hyperspectral image classification," IET Image Process., vol. 14, no. 8, pp. 1653–1661, 2020.
[35] S. K. Roy, S. Chatterjee, S. Bhattacharyya, B. B. Chaudhuri, and J. Platoš, "Lightweight spectral–spatial squeeze-and-excitation residual bag-of-features learning for hyperspectral classification," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 8, pp. 5277–5290, Aug. 2020.
[36] M. Zhu, L. Jiao, F. Liu, S. Yang, and J. Wang, "Residual spectral-spatial attention network for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 449–462, May 2020.
[37] M. E. Paoletti, J. M. Haut, R. Fernandez-Beltran, J. Plaza, A. J. Plaza, and F. Pla, "Deep pyramidal residual networks for spectral–spatial hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 740–754, Feb. 2018.
[38] M. E. Paoletti, J. M. Haut, S. K. Roy, and E. M. T. Hendrix, "Rotation equivariant convolutional neural networks for hyperspectral image classification," IEEE Access, vol. 8, pp. 179575–179591, 2020.
[39] S. K. Roy, M. E. Paoletti, J. M. Haut, E. M. T. Hendrix, and A. Plaza, "A new max-min convolutional network for hyperspectral image classification," in Proc. 11th Workshop Hyperspectral Imag. Signal Process., Evol. Remote Sens. (WHISPERS), 2021, pp. 1–5.
[40] S. K. Roy, D. Hong, P. Kar, X. Wu, X. Liu, and D. Zhao, "Lightweight heterogeneous kernel convolution for hyperspectral image classification with noisy labels," IEEE Geosci. Remote Sens. Lett., vol. 19, Sep. 2022, Art. no. 5509705.
[41] L. Zhu, Y. Chen, P. Ghamisi, and J. A. Benediktsson, "Generative adversarial networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9, pp. 5046–5063, Sep. 2018.
[42] S. K. Roy, J. M. Haut, M. E. Paoletti, S. R. Dubey, and A. Plaza, "Generative adversarial minority oversampling for spectral–spatial hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2021.
[43] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157–166, Mar. 1994.
[44] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, "Transformers in vision: A survey," ACM Comput. Surv., vol. 54, pp. 1–41, Jan. 2022, doi: 10.1145/3505244.
[45] D. Hong et al., "SpectralFormer: Rethinking hyperspectral image classification with transformers," IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2021.
[46] J. He, L. Zhao, H. Yang, M. Zhang, and W. Li, "HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 165–178, Sep. 2020.
[47] Z. Zhong, Y. Li, L. Ma, J. Li, and W.-S. Zheng, "Spectral–spatial transformer network for hyperspectral image classification: A factorized architecture search framework," IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2021.
[48] L. Sun, G. Zhao, Y. Zheng, and Z. Wu, "Spectral–spatial feature tokenization transformer for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 60, Jan. 2022, Art. no. 5522214.
[49] X. Yang, W. Cao, Y. Lu, and Y. Zhou, "Hyperspectral image transformer classification networks," IEEE Trans. Geosci. Remote Sens., vol. 60, May 2022, Art. no. 5528715.
[50] S. K. Roy, A. Deria, D. Hong, B. Rasti, A. Plaza, and J. Chanussot, "Multimodal fusion transformer for remote sensing image classification," 2022, arXiv:2203.16952.
[51] W. Liao, R. Bellens, A. Pizurica, S. Gautama, and W. Philips, "Graph-based feature fusion of hyperspectral and lidar remote sensing data using morphological features," in Proc. IGARSS, 2013, pp. 4942–4945.
[52] M. D. Mura, J. A. Benediktsson, B. Waske, and L. Bruzzone, "Morphological attribute profiles for the analysis of very high resolution images," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3747–3762, Oct. 2010.
[53] B. Rasti, P. Ghamisi, and R. Gloaguen, "Hyperspectral and LiDAR fusion using extinction profiles and total variation component analysis," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3997–4007, Jul. 2017.
[54] A. Merentitis, C. Debes, R. Heremans, and N. Frangiadakis, "Automatic fusion and classification of hyperspectral and LiDAR data using random forests," in Proc. IEEE Geosci. Remote Sens. Symp., Jul. 2014, pp. 1245–1248.
[55] M. Pedergnana, P. R. Marpu, M. D. Mura, J. A. Benediktsson, and L. Bruzzone, "Classification of remote sensing optical and LiDAR data using extended attribute profiles," IEEE J. Sel. Topics Signal Process., vol. 6, no. 7, pp. 856–865, Nov. 2012.
[56] M. Pesaresi and J. A. Benediktsson, "A new approach for the morphological segmentation of high-resolution satellite imagery," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 2, pp. 309–320, Feb. 2001.
[57] S. K. Roy, B. Chanda, B. B. Chaudhuri, D. K. Ghosh, and S. R. Dubey, "Local morphological pattern: A scale space shape descriptor for texture classification," Digit. Signal Process., vol. 82, pp. 152–165, Nov. 2018.
[58] D. Hong, X. Wu, P. Ghamisi, J. Chanussot, N. Yokoya, and X. X. Zhu, "Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 6, pp. 3791–3808, Jun. 2020.
[59] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. ICML, 2015, pp. 448–456.
[60] X. Du and A. Zare, "Scene label ground truth map for MUUFL Gulfport data set," Dept. Elect. Comput. Eng., Univ. Florida, Gainesville, FL, USA, Tech. Rep., 2017.
[61] P. Gader, A. Zare, R. Close, J. Aitken, and G. Tuell, "MUUFL Gulfport hyperspectral and LiDAR airborne data set," Univ. Florida, Gainesville, FL, USA, Tech. Rep. REP-2013-570, 2013.
[62] D. Hong, J. Hu, J. Yao, J. Chanussot, and X. X. Zhu, "Multimodal remote sensing benchmark datasets for land cover classification with a shared and specific feature learning model," ISPRS J. Photogramm. Remote Sens., vol. 178, pp. 68–80, Aug. 2021.
[63] A. Baumgartner, P. Gege, C. Köhler, K. Lenhard, and T. Schwarzmaier, "Characterisation methods for the hyperspectral sensor HySpex at DLR's calibration home base," Proc. SPIE, vol. 8533, Nov. 2012, Art. no. 85331H.
[64] F. Kurz, D. Rosenbaum, J. Leitloff, O. Meynberg, and P. Reinartz, "Real time camera system for disaster and traffic monitoring," in Proc. Int. Conf. SMPR, 2011, pp. 1–6.
[65] B. Rasti et al., "Feature extraction for hyperspectral imagery: The evolution from shallow to deep: Overview and toolbox," IEEE Geosci. Remote Sens. Mag., vol. 8, no. 4, pp. 60–88, Apr. 2020.
[66] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, "Graph convolutional networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 7, pp. 5966–5978, Jul. 2021.
[67] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder–decoder approaches," 2014, arXiv:1409.1259.
[68] A. Dosovitskiy et al., "An image is worth 16×16 words: Transformers for image recognition at scale," 2020, arXiv:2010.11929.
[69] S. R. Dubey, S. Chakraborty, S. K. Roy, S. Mukherjee, S. K. Singh, and B. B. Chaudhuri, "DiffGrad: An optimization method for convolutional neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 11, pp. 4500–4511, Nov. 2019.
[70] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[71] E. M. T. Hendrix, M. Paoletti, and J. M. Haut, On Training Set Selection in Spatial Deep Learning. Cham, Switzerland: Springer, 2022, pp. 327–339, doi: 10.1007/978-3-031-00832-0_9.
[72] L. van der Maaten, "Accelerating t-SNE using tree-based algorithms," J. Mach. Learn. Res., vol. 15, no. 1, pp. 3221–3245, Oct. 2014. [Online]. Available: http://jmlr.org/papers/v15/vandermaaten14a.html

Swalpa Kumar Roy (Student Member, IEEE) received the bachelor's degree in computer science and engineering from the West Bengal University of Technology, Kolkata, India, in 2012, the master's degree in computer science and engineering from the Indian Institute of Engineering Science and Technology, Shibpur (IIEST Shibpur), Howrah, India, in 2015, and the Ph.D. degree in computer science and engineering from the University of Calcutta, Kolkata, in 2021.

From July 2015 to March 2016, he was a Project Linked Person with the Optical Character Recognition (OCR) Laboratory, Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata. He is currently an Assistant Professor with the Department of Computer Science and Engineering, Jalpaiguri Government Engineering College, Jalpaiguri, India. His research interests include computer vision, deep learning, and remote sensing.

Dr. Roy was nominated for the Indian National Academy of Engineering (INAE) Engineering Teachers Mentoring Fellowship Program by INAE Fellows in 2021. He was a recipient of the Outstanding Paper Award in the second Hyperspectral Sensing Meets Machine Learning and Pattern Analysis (HyperMLPA) at the Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS) in 2021. He serves as an Associate Editor for the journal Computer Science (Springer Nature) (SNCS) and an Editor for the Frontiers Journal of Advanced Machine Learning Techniques for Remote Sensing Intelligent Interpretation. He has served as a Reviewer for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING and IEEE GEOSCIENCE AND REMOTE SENSING LETTERS.

Ankur Deria received the bachelor's degree in computer science and engineering from the Jalpaiguri Government Engineering College, Jalpaiguri, India, in 2022. He is currently pursuing the M.Sc. degree with the Department of Informatics, Technical University of Munich, Garching bei München, Germany.

His research interests include computer vision and deep learning.

Mr. Deria was nominated for the Indian National Academy of Engineering (INAE) Engineering Students Mentoring Fellowship by INAE fellows in the academic tenure 2022–2023.
Chiranjibi Shah (Member, IEEE) received the B.E. degree in electronics and communication from Pokhara University, Pokhara, Nepal, in 2012, and the Ph.D. degree in electrical and computer engineering from Mississippi State University, Starkville, MS, USA, in May 2022.

His research interests include applying different machine learning and deep learning techniques for the classification of hyperspectral imagery, image recognition, dimensionality reduction, and object detection.

Qian Du (Fellow, IEEE) received the Ph.D. degree in electrical engineering from the University of Maryland, Baltimore, MD, USA, in 2000.

She is currently the Bobby Shackouls Professor with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA. Her research interests include hyperspectral remote sensing image analysis and applications, pattern classification, data compression, and neural networks.

Dr. Du is a fellow of the SPIE-International Society for Optics and Photonics. She is a member of the IEEE TAB Periodicals Review and Advisory Committee (PRAC) and the SPIE Publications Committee. She was a recipient of the 2010 Best Reviewer Award from the IEEE Geoscience and Remote Sensing Society. She was the Co-Chair of the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society from 2009 to 2013 and the Chair of the Remote Sensing and Mapping Technical Committee of the International Association for Pattern Recognition from 2010 to 2014. She was an Associate Editor for the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, Journal of Applied Remote Sensing, and IEEE SIGNAL PROCESSING LETTERS. From 2016 to 2020, she was the Editor-in-Chief of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING.