0% found this document useful (0 votes)

14 views16 pages

Dual-Branch Domain Adaptation Few-Shot Learning For Hyperspectral Image Classification

This document presents a novel dual-branch domain adaptation few-shot learning (FSL) method for hyperspectral image classification, addressing challenges in cross-domain learning between low and high spatial resolution images. The proposed approach includes a domain fusion branch for aligning global distributions and a domain separation branch for maintaining independent category distributions, effectively mitigating the negative impacts of forced alignment. Experimental results demonstrate the method's superiority over existing techniques in classification accuracy and model efficiency.

Uploaded by

researchdeb2025

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views16 pages

Dual-Branch Domain Adaptation Few-Shot Learning For Hyperspectral Image Classification

Uploaded by

researchdeb2025

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL.

62, 2024 5506116

Dual-Branch Domain Adaptation Few-Shot

Learning for Hyperspectral Image Classification
Zhuowei Wang , Shihui Zhao , Genping Zhao , and Xiaoyu Song

Abstract— Cross-domain few-shot learning (FSL) often I. I NTRODUCTION

employs adversarial domain adaptation techniques to address
the issue of data distribution discrepancies between the source
and target domains. However, forcing the alignment of two
distinct domains may lead to distortions in class distribution
alignment and result in a decrease in classification perfor-
H YPERSPECTRAL images (HSIs) consist of hun-
dreds to thousands of spectral bands, containing rich
spatial–spectral information [1]. Its distinctive attributes pro-
mance in hyperspectral image (HSI) analysis. Moreover, existing pel hyperspectral imaging into extensive application across
cross-domain methods are often applied to satellite/airborne HSIs domains including environmental studies [2], [3] and military
as both the source and target domains. It is rarely explored operations, encompassing tasks such as detection, prediction,
whether the same cross-domain methods can be applied for
cross-applications where the source and target domain data could
and reconnaissance [4], [5]. Therefore, HSI classification has
be both satellite/airborne HSIs with lower spatial resolution become a highly focused research area [6], [7], [8]. Known
and unmanned aerial vehicle (UAV) HSIs with higher spatial for the powerful feature learning capabilities of deep learn-
resolution. To address these issues, this article proposes a novel ing in various domains, the classification of hyperspectral
domain-adaptive FSL network with dual branches, respectively, remote-sensing images through the utilization of deep-learning
aiming at domain fusion and domain separation. The domain
fusion branch uses a conditional adversarial network to align
methods has emerged as more preferred approach in recent
the global distributions of the two domains, while the domain years [9], [10], [11], [12], [13]. However, HSIs usually
separation branch introduces a gate mechanism for discrimi- have unlabeled or sparsely labeled samples, while deep net-
native feature learning in each domain to achieve independent works reliant on data generally demand a substantial amount
category distributions. During the experiment, the proposed of training information. Therefore, the prominent issue of
method is evaluated by performing cross-transfer learning under
the condition that low-spatial-resolution hyperspectral data and
small sample size in hyperspectral remote-sensing images
high-spatial-resolution hyperspectral data are used as source and becomes a crucial factor limiting the further enhancement of
target data alternately. The experimental results suggest that deep learning for HSI classification performance. Simultane-
the proposed method not only mitigates the negative effects of ously, labeling HSIs is a time-consuming, labor-intensive, and
forced alignment in domain fusion, but also holds potential for expensive process. Consequently, addressing the restriction of
cross-domain transfer learning between low- and high-spatial-
resolution HSIs.
labeled samples poses a pivotal obstacle in HSI classification.
Previous research has explored effective feature learn-
Index Terms— Domain adaptation, few-shot learning (FSL), ing methods for unlabeled HSI learning using unsupervised
gate mechanism, hyperspectral image (HSI) classification.
encoding networks. For instance, Mei et al. [14] presented
a technique for spatial–spectral feature learning strategy
employing an unsupervised 3-D convolutional autoencoder
Manuscript received 30 October 2023; revised 15 December 2023;
accepted 31 December 2023. Date of publication 19 January 2024; date of (3D-CAE). Xu et al. [15] introduced the spectral–spatial
current version 5 February 2024. This work was supported in part by the semantic feature learning network (S3FN) to achieve unsu-
Guangzhou Science and Technology Plan Project under Grant 202201011835; pervised deep extraction of semantic features from HSIs. For
in part by the Innovation Fund for Industry-University-Research in Chinese
Universities under Grant 2021FNA02010; in part by the Guangzhou Funda- HSIs with limited labeled samples, semisupervised learning
mental and Applied Research under Grant 202201010273; and in part by the approaches can be utilized. As an illustration, Fan et al. [16]
Collaborative Education Project of Ministry of Education with Beijing Piesat proposed a dataset generation approach for unlabeled samples
Information Technology Company Ltd., under Grant 220802313160749.
(Corresponding author: Genping Zhao.) by employing a semisupervised multihead convolutional neural
Zhuowei Wang and Shihui Zhao are with the School of Computer Science network (CNN) technique with ensemble learning meth-
and Technology, Guangdong University of Technology, Guangzhou 510006, ods. Jia et al. [17] introduced a semisupervised autoencoder
China (e-mail: [email protected]; [email protected]).
Genping Zhao is with the School of Computer Science and Technology, Siamese network (3DAES) that leverages unsupervised feature
Guangdong University of Technology, Guangzhou 510006, China, and also learning from unlabeled data using autoencoders and utilizes
with Beijing Piesat Information Technology Company Ltd., Beijing 100095, a limited labeled sample set for calibration. Additionally,
China (e-mail: [email protected]).
Xiaoyu Song is with the Department of Electrical and Computer Engi- an approach combined with active learning is also employed.
neering, Portland State University, Portland, OR 97207 USA (e-mail: Li et al. [18] developed an active learning prototype network
[email protected]). (ALPN) that extracts distinctive features from a limited num-
This article has supplementary downloadable material available at
https://doi.org/10.1109/TGRS.2024.3356199, provided by the authors. ber of samples. This network combines semisupervised and
Digital Object Identifier 10.1109/TGRS.2024.3356199 active learning techniques to increase the number of annotated
1558-0644 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: C. V. Raman Global University - Bhubaneswar. Downloaded on February 07,2024 at 08:58:17 UTC from IEEE Xplore. Restrictions apply.
5506116 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 62, 2024

samples. However, the aforementioned neural network-based aggregation cross-domain FSL (Gia-CFSL) that integrates
learning methods that incorporate unsupervised, semisuper- FSL with graph-based domain alignment. The transformer is
vised, and active learning approaches mainly focus on the also introduced into domain adaptation FSL to fully learn
analysis of the data for small-sample HSIs in the target the global features of HSIs of both the source and target
domain. They did not make use of existing HSI data with domains [30]. Peng et al. [31] proposed an FSL approach
sufficient labels as source data to guide target data learning. based on a convolutional transformer (CT) network (CTFSL),
With respect to the fact of limited labeled samples, cross- in which the CT extracts both local and global information,
domain HSI classification tasks through transfer learning make and an adversarial network is employed for domain align-
it possible to utilize publicly available HSI data with suffi- ment. Despite the improvements in few-shot HSI classification
cient labels. Domain adaptation methods primarily modify the performance achieved by these methods, they do not take
feature space of the source and target domains to minimize into account the potential negative impact of aligned data
the distribution discrepancy between them [19], [20], [21]. distributions. In fact, domain alignment forcibly aligns the
For example, Huang et al. [22] proposed a spatial–spectral category distributions of two domains leading to distortions in
weighted adversarial domain adaptation (SSWADA) net- category distributions. This would affect the distinguishability
work for cross-scene wetland mapping using HSIs, where between categories, especially when dealing with datasets
a weighted adversarial discrimination strategy is employed that have extremely different data sources, and subsequently
to align the feature distributions between source and target reducing the performance of HSI classification.
scenes. Zhang et al. [23] developed a topology structure and As with the development of unmanned aerial vehicle (UAV)
semantic information transmission network (TSTnet), in which imaging, there are increasing applications with UAV-based
the distribution alignment between the source domain and the hyperspectral imaging which is of high spatial resolution. It is
target domain is achieved through maximum mean discrep- more flexible to conduct image labeling for UAV HSIs than
ancy (MMD). Zhang et al. [24] developed a single-source satellite/airborne images of low spatial resolution. Knowledge
domain expansion network (SDEnet), in which a generator, transfer conducted between HSIs with low and high spatial
comprising a semantic encoder and morph encoder, and a resolutions is faced with extremely distinct data distribution.
discriminator, incorporating supervised contrastive learning, Therefore, it is essential to have the cross-domain learning
are employed to learn domain-invariant information through network maintain category independence when aligning data
adversarial learning. The domain distribution alignment in distributions of two domains in domain adaptation. However,
these domain adaptation methods is essentially aimed at rarely do studies spend effort on learning the discrimina-
achieving knowledge transfer for sharing information. How- tive knowledge of both domains when conducting domain
ever, domain adaptation relies on the presumption that the alignment. Moreover, available studies undertake more investi-
two domains exhibit similarities in terms of categories. This gations on cross-domain FSL between public satellite/airborne
prevents them from efficiently categorizing classes in the test HSIs which have low spatial resolution. In fact, it is of
target data that have not been trained before. To overcome great significance to figure out the different performance
this limitation, it is expected that a model can learn to of cross-domain FSL when transfer learning is undertaken
learn so that to address these issues. Meta-learning [25] is between low- and high-spatial-resolution images alternatively.
a type of model inspired by human cognitive systems that To address the aforementioned problem, a dual-branch
possess the prowess of “learn to learn.” Meta-learning models domain adaptation FSL network is proposed in this study
can generalize available knowledge to new data. Few-shot to achieve HSI classification. The main contributions of this
learning (FSL) is a specific application of meta-learning [26]. article are threefold.
This involves transferring meta-knowledge from a few labeled 1) A dual-branch domain adaptation FSL method based
instances to unknown data. To obtain FSL-compatible meta- on domain separation and domain fusion is proposed
knowledge, FSL simulates the few-shot task by partitioning to achieve domain data distribution alignment while
the data into episodes and performing training on multiple maintaining domain class distribution independence.
identical episodes. With the meta-knowledge-equipped model, Compared to the available DCFSL network using
few-shot HSI classification can be achieved. For example, domain fusion learning to align the distribution of the
Gao et al. [27] utilized relationship networks imbued with the two domains, the domain separation learning branch is
concept of meta-learning to devise a deep classification model added by developing a domain gate (DG) mechanism
suitable for few-shot tasks. network structure to learn discriminative knowledge.
The premise for effective classification in most FSL 2) Investigation of cross-domain transfer learning using
approaches is that the source and target domains have the same HSIs with low spatial resolution and HSIs with high
data distribution. However, in practical remote-sensing appli- spatial resolution alternatively as the source dataset and
cations, HSIs acquired from different sensors exhibit variations the target dataset is conducted to evaluate the different
in spectral range, spatial resolution, and categories. Domain performance when the spatial resolutions of the source
alignment via cross-domain FSL network is the common and target domain data are changed.
strategy to address these problems. For example, Li et al. [28] 3) The experimental results demonstrate that the pro-
introduced a deep cross-domain FSL (DCFSL) network posed dual-branch domain adaptation approach effec-
that combines domain adaptation and FSL. Zhang et al. tively addresses the negative impact caused by domain
[29] developed a framework called graph information adaptation. It outperforms available state-of-the-art

Authorized licensed use limited to: C. V. Raman Global University - Bhubaneswar. Downloaded on February 07,2024 at 08:58:17 UTC from IEEE Xplore. Restrictions apply.
WANG et al.: DUAL-BRANCH DOMAIN ADAPTATION FSL FOR HSI CLASSIFICATION 5506116

counterparts of CNN-associated networks, related net- separation branches, to learn different information features.
works of DCFSL, and even the latest advanced CTFSL Finally, the two losses generated from the domain separation
with respect to classification accuracy. Of more impor- branch and domain fusion branch are combined to form the
tance, its superiority over the latest CTFSL is with overall network loss function. This loss function is applied
respect to both model efficiency and precise classifica- for model training and obtaining a transferable classification
tion. In addition, the proposed model presents potential model, which can be used to classify unlabeled target data.
to be used for cross-domain transfer tasks between low-
and high-spatial-resolution HSIs. It shows especially
dominant merit compared to CTFSL when learning from B. Shallow Learning Structure Module
high-spatial-resolution HSIs to transfer on low-spatial- Considering the spectral dimension mismatch between the
resolution HSIs. two domains, an initial preprocessing is performed on datasets
in both domains using 2D-CNN mapping units to reduce the
II. M ETHODOLOGY dimensionality. This preprocessing step reduces the original
Assume that the source-domain hyperspectral dataset D S feature dimensions of the image cubes of two domains to a
has C S classes and the target domain dataset D T has C T feature dimension of 100, resulting in HSI cube x ∈ R 100×9×9
classes. In FSL, the training set consists of D S and a small [33]. After unifying the feature dimension, the 3-D residual
labeled target dataset Dtar . Subsequently, the trained model CNN feature extraction module E 1 is employed to conduct
is evaluated using the unlabeled target dataset Dtes . It is shallow learning on data from both domains. This model is
important to note that Dtar ∪ Dtes = D T , Dtar ∩ Dtes = φ, and composed of one residual block and one max-pooling layer,
C S ∩C T = φ. To obtain meta-knowledge for FSL, in this study, which renders a summary of preliminary semantic information.
the few-shot task is conducted by making use of meta-learning Among them, the 3-D residual block [28] consists of three
which needs to divide D S and Dtar into episodes. Each episode 3-D convolutional layers, each with a kernel size of 3 × 3
consists of an N -way-K -shot setup, where N refers to the × 3 and a total of eight convolutional kernels, as shown in
number of classes randomly selected from the training data Fig. 2. Through this stage of shallow learning, the abstract
and K represents the number of samples per class within features such as texture, edges, and corners of input images
the support set, denoted as S = {(x i , yi )}(N ×K )
(i=1) . Additionally,
can be initially captured. They are common representations for
(N ×W ) image data of both the source and target domains.
the query set, indicated as Q = {(x j , y j )}( j=1) , contains W
samples per class. The episodes, F s = {Ss , Q s }, and F t =
{St , Q t }, are sampled from D S and Dtar , respectively. Here, C. Dual-Branch Domain Adaptation Feature Learning Deep
x represents a pixel block and y represents the corresponding Network
label for the pixel. Considering the involvement of a handful
To address feature learning in cross-domain data, it is
of labeled instances from the target data in training, in this
necessary to simultaneously consider the shared invariant
study, to address the sample bias issue and match the data sizes
features and distinguishable differential features of the source
between the target and source domains, data augmentation
and target domains. To achieve this goal, this article introduces
using the Gaussian noise method [32] is performed at the
a dual-branch domain adaptation network framework that com-
data preparation stage. Specifically, new HSI x ′ is generated
bines domain separation and domain fusion for FSL. Namely,
through multiplying certain training sample x from the target
after universal feature representation through shallow learning,
domain by a random factor α and adding Gaussian noise noi
further higher-level semantic features are learned using a
as follows:
domain-fused and domain-separated dual-branch network.
x ′ = αx + βnoi (1) 1) Domain Fusion Branching Network: In the available
FSL network of DCFSL [28], both domain datasets are con-
where α is randomly sampled from a uniform distribution with
currently utilized for model training, signifying that the model
a range of (0.9, 1.1), β = 0.04. Gaussian noise noi is drawn
learns the common characteristics inherent in both domains.
randomly from a normal distribution with a mean of 0 and a
Therefore, as illustrated in Fig. 1, the domain fusion branch
standard deviation of 1. As only additive and multiplicative
network in this study adopts the same structure as DCFSL
transformations are operated, x ′ has the same size as x.
[28]. It includes a 3-D convolution residual network E 2 to
accomplish the shared spatial–spectral feature learning of
A. Overall Framework mixed data. The difference between the 3-D residual block in
The schematic of the proposed dual-branch domain adapta- E 2 and the shallow learning module is that the number of con-
tion training network framework is shown in Fig. 1. As seen volution kernels is 16. Subsequently, prototype learning [34] is
from the figure, random sampling is first conducted to divide applied to learn prototypes of the generalized spatial–spectral
the available training data into episodes. Following this, those feature data. To elucidate with an example from the source
training samples go through 2-D convolutions to unify the data, let f ω represent the complete learning function of the
multimodality data into the same feature space. Subsequently, shallow network E 1 and the deep network E 2 . After learning
shallow spatial–spectral feature learning is performed through through the shallow network and the deep network, the output
a 3-D residual network structure. The features are then fed into feature representation of the support set is denoted as f ω (x i ).
two dual-branch networks, namely domain fusion and domain The prototype ps for each class is the mean vector of f ω (x i ),

Fig. 1. Training framework of the proposed dual-branch domain adaptation FSL network. This framework includes preprocessing units for the input datasets
to be resized into small patches of unified dimensions, shallow network structure to conduct early period feature learning, and the main body of dual-branch
network structure which includes a domain fusion branch to learn aligned common features and a domain separation branch where a DG network is contained
to use a gating mechanism to achieve domain separation.

equation:
 
X
L tfsl = E St , Q t − log Pω y = k | x .
′

(5)
(x ′ ,y)∈ Q t
Fig. 2. Detailed description of the 3-D residual block.
After prototype learning, a basic linear classifier C is
employed to categorize the prototype representation features
which is given by the following equation: of each domain to get discriminative classification results in c.
The predicted classification result c from classifier C and
ps =
1 X
f ω (x i ). (2) the learned feature f = f ω (x) are then linearly mapped
|Ss | ,y (realized by operation ⊗) and fed into a discriminator D,
(x i i )∈Ss
which consists of five fully connected layers. In this way, the
Therefore, in each episode of the query set, the prediction usage of features f is aligned, forming the conditional domain
probability Pω for the query sample x j is calculated as follows adversarial network (CDAN) structure [35].
based on prototype learning: The linear mapping ⊗ is capable of thoroughly capturing
exp −d f ω x j , ps
the multimodal structure that emerges after complex data dis-
. (3)

Pω y j = k|x j ∈ Q s = P N tributions are adequately apprehended. However, its drawback
k=1 exp −d f ω x j , ps

lies in its susceptibility to dimension explosion. Therefore, the
The source-domain FSL loss L sfsl for each episode is following random sampling strategy is adopted to tackle this
expressed as the negative logarithm of the probability based issue:
(
on x j and its true class label k, as shown in the following T⊗ ( f , c), d f × dc ≤ 1024
equation: T ( f , c) = (6)
  T⊙ ( f , c), otherwise

L sfsl = E Ss , Q s −
X
log Pω (y = k | x). (4) where d f and dc represent the dimensions of f and c,
(x,y)∈ Q s
respectively, the dimension of the linear mapping f ⊗ c is
given by d f × dc . A multilinear mapping is performed by
Similar to the source domain, the FSL loss L tfsl for each randomly selecting certain dimensions of f and c: T⊙ ( f , c) =
episode in the target domain is calculated by the following (1/|(ds )1/2 |)(R f f ) ⊙ (Rc c), where ⊙ denotes element-wise

operation and R f and Rc represent matrices that are randomly

sampled once and remain constant during the training process.
ds represents the sampled dimensions. When the dimension
d f ×dc exceeds 1024, a random multilinear mapping T⊙ ( f , c)
is employed; otherwise, T⊗ ( f , c) is used.
Assume that the data distribution of the source and target
domains is represented as Ps (x) and Pt (x ′ ), respectively, the
loss function L domain for the discriminator D is optimized
during training according to the following objective equation:

min max L domain = −E x is ∼Ps (x) log D T f is , cis

D T
− E x tj ∼Pt (x ′ ) log 1 − D T f tj , ctj . (7)

Here, D(, ) represents the discriminator’s predicted probability

Fig. 3. Illustration of domain separation controlled by the DG matrix.
that x is from the source domain, while 1 − D(, ) represents The gating matrix functions in the gating network to achieve a one-to-one
the discriminator’s predicted probability that x originates from correspondence between domains and filters through selective activation of
the target domain. For optimization of discriminator D, the corresponding filters. This helps achieve domain separation and enables the
network to independently learn discriminative knowledge from the source and
above discriminative error function is minimized, while on the target domains.
feature extractor f ω and the classifier C, L domain is maximized.
This action is taken to render the discriminator incapable of
differentiating whether the input samples originate from the article, there are only two domains, so d is 2. M denotes
source or the target domain. As a result, the data distributions the gate matrix for the source domain, and 1 − M represents
of these two domains become harmonized. the gate matrix for the target domain. M ch d ∈ [0, 1] is the
2) Domain Separation Branch Network: Unlike simultane- correlation between the dth domain and the chth filter, a higher
ously learning features from multiple domains, using different value of which indicates a stronger correlation rendering the
filters to learn discriminative features from different domains. filter with a higher likelihood to be selected. However, this
It can prevent the model from biasing toward the informa- soft gate cannot effectively separate the two domains. For
tion of the source domain for more labeled samples. In the example, when M ch d = 0.5, it is uncertain whether the filter
above domain fusion branch, data distribution alignment is is activated or not. Moreover, gate matrix data is discrete, and
achieved. In this study, the domain separation learning branch backpropagation does not have gradients for discrete variables.
is introduced to be implemented simultaneously with domain Therefore, in this article, Gumbel softmax [37] is introduced
fusion. In this branch, feature learning with deep network to generate hard gates, achieving a one-to-one correspondence
embedding in the DG mechanism is proposed. By activating between the domain and the filter. As shown in Fig. 3, the
different filters for each domain, this structured development value of M ch d is either 0 or 1.0 signifies filter inactivation and
enables separate learning of discriminative knowledge for each 1 indicates filter activation.
domain. The gate mechanism as mentioned by Liang et al. [36] Using the source-domain dataset as an instance, let f ψ
allows for the use of different filters to control information represent the overall learning function of shallow network
learning, and the relationship between filters and categories is E 1 and deep network E 1 and the deep network E 3 , the support
one-to-one or one-to-many. The objective of this study is to set feature representation after E 3 is represented as f ψ (x i ),
establish a one-to-one correspondence between domains and and the query set feature is represented as f ψ (x j ). Combining
filters to learn discriminative knowledge from both domains f ψ (x i ) and f ψ (x j ) together as depicted in the following
simultaneously. Therefore, DG that activates and utilizes the equation:
corresponding filters only when the input sample comes from
X o = con f ψ (x i ), f ψ x j

a certain domain is proposed to be embedded in the network (8)
design. Shallow networks typically achieve common feature
learning for two domain data. As illustrated in Fig. 1, parallel where o = 1, . . . , N (K + W ). Correspondingly, the joint
to the domain fusion branch, the DG is integrated into the sample label set is represented as Yo . Finally, the feature X o
branch network of E 3 to learn domain discriminative knowl- is fed into the global classifier G for classification. The loss
edge. This domain separation branch comprises a residual function used for its optimization is the cross-entropy loss
block, a max-pooling layer, and a 3D-CNN. The difference function. In each fragment F s of the source-domain dataset,
between the 3-D residual block in E 3 and the shallow learning the class distribution of X o is calculated as follows:
module is that the number of convolution kernels is 16. The exp(−d(X o , nk ))
DG is inserted before the residual block and the 3D-CNN, Pψ (Yo = k|X o ∈ F s ) = P N (9)
k=1 exp(−d(X o , nk ))
serving as a filter selector before the convolutional operation.
In practical usage, the correlation between domains and where d represents the Euclidean distance and nk denotes
filters is controlled by a gate matrix M ∈ [0, 1]d×ch , where the deep network output features of the kth class with gate
each value is initialized randomly. Here, d signifies the count integration in the fragment. Accordingly, the source-domain
of domains and ch denotes the number of filters. In this global loss for each fragment is calculated using the

TABLE I
D ETAILS OF A LL S IX H YPERSPECTRAL I MAGERY DATASETS U SED

following formula: ones and the opposite process, six public hyperspectral datasets
  are utilized in our study. The first four are satellite/airborne
datasets of Chikusei,1 Indian Pines,2 Salinas3 and Pavia
X
L sglobal = E F s − log Pψ (Yo = k | X o ). (10)
(X,Y )∈F s
University,4 which have relatively low spatial resolution.
Additionally, two hyperspectral datasets of high spatial res-
Similar to the source domain, the global loss for each olution are also used, which were captured by UAVs from
fragment in the target domain can be expressed as shown in Wuhan University, denoted as WHU-Hi-LongKou and WHU-
the following equation: Hi-HanChuan.5 The detailed imaging parameters of those six
 
X datasets are provided in Table I. It needs to pay attention to
t
log Pψ Yo = k | X o .
′

L global = E F t − (11) the different spatial resolutions of these datasets. It can be
( X ,Y )∈F t
′ seen that the first four datasets are all with spatial resolution
over 1 m, and the lowest resolution even reaches 20 m. The
The proposed dual-branch domain adaptation FSL network
Chikusei dataset acquired using the hyperspectral visible/near-
simultaneously implements the domain separation and domain
infrared cameras (Hyperspec-VNIR-CIRIS) spectrometer has
fusion branches to address the issue of diminishing differences
a spatial resolution of 2.5 m. The Indian Pines dataset was
in class distributions between the two domains due to domain
imaged using the airborne visible/infrared imaging spectrom-
distribution alignment. Finally, the computation of the total
eter (AVIRIS) infrared imaging spectrometer with the worst
loss functions for FSL in the two domains, respectively,
spatial resolution of approximately 20 m. The Salinas dataset
is given by the following equations:
was also captured using AVIRIS but with a spatial resolution
L stotal = L sglobal + L sf sl + L domain (12) of 3.7 m. The Pavia University dataset was collected using the
L ttotal = L tglobal + L tf sl + L domain . (13) airborne hyperspectral sensor Reflective Optics System Imag-
ing Spectrometer (ROSIS) with a spatial resolution of 1.3 m.
Therefore, the total loss function of the dual-branch domain Different from the above four well-known satellite/airborne
adaptation FSL network is the sum of the aforementioned two datasets, the WHU-Hi-Longkou dataset, and the WU-Hi-
parts, expressed as follows: HanChuan dataset were obtained using the Nano-Hyperspec
L total = L stotal + L ttotal . (14) imaging sensor mounted on the drone platform. Both these
datasets have relatively high spatial resolution below 1 m.
Through optimization of (14), the proposed dual-branch The former represents a simple agricultural area in Longkou,
FSL method effectively learns both transferable common Hubei, China, on July 17, 2018, with an approximate resolu-
knowledge and discriminative knowledge from two domains. tion of 0.463 m. The latter one depicts a rural–urban fringe
During the testing phase, a restriction of labeled data from area of Hanchuan City in Hubei Province, China, on June
the target domain is applied. The test samples go through 17, 2016, with the highest spatial resolution of 0.109 m.
the shallow learning module which has been trained with the In addition, the six datasets have different numbers of spectral
source domain data to achieve dimensionality reduction and bands and all the numbers of bands are larger than 100.
spatial–spectral feature learning. Subsequently, deep network Therefore, it is reasonable to perform dimensionality reduction
E 2 follows to realize higher-level semantic learning. Finally,
a K-nearest neighbor (KNN) classifier is trained to achieve
1 https://www.sal.t.u-tokyo.ac.jp/hyperdata/
predictive classification of unlabeled target-domain data. 2 https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_
Scenes#Indian_Pines
III. E XPERIMENTS AND R ESULT 3 https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_

A. Datasets Scenes#Salinas
4 https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_
To investigate the different performances in transfer learning Scenes#Pavia_University_scene
from low-spatial-resolution images to high-spatial-resolution 5 http://rsidea.whu.edu.cn/resource_WHUHi_sharing.htm

TABLE II
C LASS T YPES OF S IX DATASETS

to unify both source and target data into 100 dimensions in DC F S L : Building upon the foundation of DFSL, DCFSL
the proposed network. Besides spectral and spatial resolution, addresses both domain adaptation and FSL within a unified
another important difference between those datasets is their framework. In this approach, after a metric space is learned
numbers of landscape class types which are summarized in through deep residual 3-D convolution. FSL is then performed
Table II. In general, datasets that have more classes are in this metric space. Finally, a conditional adversarial network
preferred to be used as source data in transfer learning. is utilized to harmonize domain data distributions and achieve
domain adaptation.
C T F S L : It employs a CT network as a feature extractor to
B. Comparison Algorithms extract local–global features, and subsequently, a fully convo-
lutional network (FCN) follows as the domain discriminator
To evaluate the efficacy of the proposed approach, the tra-
to achieve domain alignment utilizing adversarial loss.
ditional HSI classification method of support vector machines
(SVMs) [38] and deep-learning methods including the lat-
est cross-domain FSL methods mentioned in Section I are
C. Experimental Setup
used for comparison. These deep-learning counterpart methods
include 3D-CNN [39], DFSL + nearest neighbor (NN) [33], The SVM and 3D-CNN methods involve training with
DFSL + SVM [33], DCFSL [28], and CTFSL [31]. merely a limited quantity of labeled target data. For
SV M : Support vector machines utilize a kernel function cross-domain learning experiments with DFSL + NN,
to transform nonlinear input data from a low-dimensional to a DFSL + SVM, DCFSL, and CTFSL, the training data con-
high-dimensional linearly separable space. It is widely used for sists of both source data and a limited set of labeled target
effective classification and regression analysis. However, SVM data. To ensure fairness in the experiments, within the cross-
primarily focuses on spectral information while overlooking domain setting, a total of 200 labeled samples per class are
spatial information. randomly extracted from the source data and utilized for
3D − C N N : 3D-CNN enables spectral–spatial feature training purposes. In all experiments, five labeled samples
extraction from HSIs. It provides a more comprehensive are randomly chosen from each category of the target data
feature representation which is potential for classification as the training set, while the rest data serves as the test set.
accuracy improvement. To prevent the model from favoring the source domain with
D F S L + N N , D F S L + SV M : These methods leverage more labeled samples during training, random Gaussian noise
the source-domain training data to learn a deep residual is added to the five selected labeled target samples, resulting
3-D convolutional metric space capable of distinguishing dif- in an increased sample count of 200. During the training
ferent categories. Subsequently, a limited subset of labeled stage, every episode signifies an N -way-K -shot task. Here,
target-domain data is used to train NN or SVM classifiers K is set to 1 for the support set samples and Q is set to
within this metric space. Finally, the classifier is used for the 19 for the query set samples. Additionally, a cube size of 9 ×
classification of unlabeled data. 9 is chosen for the input image, and optimization employs

TABLE III
C LASSIFICATION R ESULTS (%) ON D IFFERENT DATASETS W ITH F IVE T RAINING S AMPLES PER C LASS (C HIKUSEI AS S OURCE D OMAIN )

the Adam optimizer. Model training spans 10 000 iterations, to the best counterpart of CTFSL, our proposed method shows
executed with a learning rate of 0.001. different levels of improvement of transfer learning accuracy
The evaluation standards consist of overall accuracy (OA), on the other five target datasets. This includes the Indian Pines
average accuracy (AA), and kappa coefficient (Kappa). and WHU-Hi-HanChuan datasets, which have a larger dispar-
To minimize experimental variability, the experiments were ity between spatial resolutions of the source and target domain
replicated ten times, and the outcomes are presented as the datasets. 0.74% improvement of OA and 0.8% gain of Kappa
respective average and standard deviation of three evaluation are observed on the Indian Pines dataset. The improvement
criteria across the ten iterations. amount on the WHU-Hi-HanChuan dataset is approaching.
These two target datasets both have 16 classes and include
many class types of crops similar to the Chikusei dataset.
D. Experimental Results When using transfer learning from the Chikusei dataset to clas-
To assess the performance in cross-transfer learning exper- sification of the WHU-Hi-LongKou UAV hyperspectral data,
iments, the low-spatial-resolution dataset of Chikusei and the proposed approach is found the most improvement over
the high-spatial-resolution dataset of WHU-Hi-HanChuan are CTFSL, 1.9 % improvement of OA and 2.36% of Kappa are
alternatively utilized as source-domain data to learn trans- observed. Datasets of Chikusei and WHU-Hi-LongKou also
ferable knowledge. The reason for choosing Chikusei as the have similar class types, but the transferred knowledge is only
source domain is that among the four low-spatial-resolution used to classify nine classes. The results here demonstrate,
hyperspectral datasets, the Chikusei dataset contains the most compared to the counterparts, that our network enables more
abundant landscape categories among all the datasets. Rich effective transfer learning for classes in common. Even though
and diverse class types in the source-domain data enable the Salinas dataset also has 16 classes, its landscape class
the provision of extensive knowledge support for the transfer types are mainly related to different weeds, and rarely are
task. For the same concern, WHU-Hi-HanChuan is picked as they similar to those in Chikusei data. After all, these results
the source domain from the two UAV high-spatial-resolution collectively underscore the efficacy of our approach in aligning
hyperspectral datasets. domain distributions while preserving category independence
Upon option of the Chikusei dataset as the source-domain and the feasibility of transferring knowledge from low- to
data, the remaining five datasets are employed as target high-spatial-resolution HSIs.
domains. Table III reports the classification results, respec- Figs. 4–8 visually showcase the classification maps of
tively, on each dataset. The optimal classification accuracy various approaches on different hyperspectral data. The visual
achieved on each dataset is highlighted in bold. Clearly, all observation results are in accordance with the comparative
the domain-adaptive FSL methods (DFSL + NN, DFSL + results in Table III. To visually highlight the classification
SVM, DCFSL, CTFSL, and proposed method) outperform results of these categories, red boxes are used to indicate
nondomain-adaptive methods (SVM, 3D-CNN). Among all areas with distinct classification performance in Figs. 4–8.
the FSL methods, CTFSL is the most potential competitor Compared to other counterparts, our approach achieves more
of the proposed network. By designing an FSL method that accurate classification results on categories misclassified by
aligns domains while preserving the independence of category other methods, such as class 11 (Soybean-mintill) in the
distributions, our proposed approach ranks the first with the Indian Pines dataset, class 8 (Grapes_untrained) in the Salinas
highest OA and Kappa metrics nearly on all the datasets except dataset, class 2 (Meadows) in the Pavia University dataset,
for Salinas. CTFSL only behaves slightly better than our class 4 (Broad-leaf soybean) in the WHU-Hi-LongKou
approach in transfer learning on the Salinas dataset. Compared dataset, and class 1 (Strawberry) in the WHU-Hi-HanChuan

Fig. 4. Classification maps of Indian Pines HSI (Chikusei as source-domain data) achieved by various approaches with five training samples per class.
(a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

Fig. 5. Classification maps of Salinas HSI (Chikusei as source domain data) achieved by various approaches with five training samples per class. (a) Ground
truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

Fig. 6. Classification maps of Pavia University HSI (Chikusei as source-domain data) achieved by various approaches with five training samples per class.
(a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

dataset. The above-mentioned categories mainly involve crops, in the Salinas dataset, class 1 (Asphalt) and class 6 (Bare
which have significant differences from the categories in the soil) in the Pavia University dataset, class 8 (Roads and
source-domain Chikusei data. This demonstrates the impor- houses) in the WHU-Hi-LongKou dataset, and class 10 (Red
tance of our method in learning discriminative knowledge roof) in the WHU-Hi-HanChuan dataset, CTFSL achieves
from both the source and target domains. On the contrary, for better classification results than ours. Those class types show
class 8 (Hay-windrowed) in the Indian Pines dataset, class 16 relatively small differences compared to the categories in the
(Vinyard_vertical_trellis) and class 9 (Soil_vinyard_develop) source-domain Chikusei dataset. In this case, CTFSL utilizing

Fig. 7. Classification maps of WHU-Hi-LongKou HSI (Chikusei as source-domain data) achieved by various approaches with five training samples per class.
(a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

Fig. 8. Classification maps of WHU-Hi-HanChuan HSI (Chikusei as source-domain data) achieved by various approaches with five training samples per
class. (a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed Method.

a Transformer to learn both global and local information may gain of AA and Kappa is reported on the Indian Pines
play well in emphasizing the acquisition of domain-invariant dataset. On the contrary, it is seen less than 0.3% accuracy
knowledge. improvement on the Chikusei dataset. This is because the
To assess the efficacy of the proposed approach under con- source dataset of WHU-Hi-HanChuan has 16 classes, while
trary experimental conditions, in the following experiments, the Chikusei dataset has 19 classes. There are new class types
WHU-Hi-HanChuan is chosen as the source data for learning, that cannot benefit from source data learning. For the results
while the other datasets are used as target data for prediction. obtained on the rest two datasets, the accuracy improvement in
A summary of all the outcomes can be found in Table IV. The terms of the two accuracy metrics is all over 1%. These results
comparison results of SVM and 3D-CNN are consistent with indicate the feasibility of transferring learning knowledge from
the results in Table III. Clearly, domain-adaptive FSL methods high-spatial-resolution data to low-spatial-resolution HSIs.
exhibit notably superior performance compared to nondomain- Figs. 9–13 furthermore show the classification maps of
adaptive approaches. Compared to the second-best method various approaches on different hyperspectral data when using
of CTFSL, the hugest accuracy increase is also achieved on source-domain data of WHU-Hi-HanChuan. Despite it having
the dataset of Indian Pines which has the largest difference higher spatial resolution and providing detailed spatial infor-
in spatial resolution with WHU-Hi-HanChuan. Around 1.8% mation, it has fewer categories compared to target-domain

TABLE IV
C LASSIFICATION R ESULTS (%) ON D IFFERENT DATASETS W ITH F IVE T RAINING S AMPLES PER C LASS (WHU-HI-H AN C HUAN AS THE S OURCE D OMAIN )

Fig. 9. Classification maps of Chikusei HSI (WHU-Hi-HanChuan as source-domain data) achieved by various approaches with five training samples per
class. (a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

datasets of Chikusei, Indian Pines, and Salinas. Even under (Broad-leaf soybean) in the WHU-Hi-LongKou dataset, the
such conditions, the classification results of areas in the red proposed approach generates results that are approaching the
boxes in those figures still show our method maintains superior ground truth more closely. This implies the proposed method,
classification performance on misclassified categories com- which is capable of learning both domain-invariant knowl-
pared to other methods. For instance, in category 7 (Forest) edge and discriminative knowledge from both the source and
and category 8 (Grass) in the Chikusei dataset, class 3 (Corn- target domains, is more robust in transfer learning when the
mintill) and class 11 (Soybean-mintill) in the Indian Pines source-domain data provides less transferable information.
dataset, category 13 (Lettuce_romaine_6wk) in the Salinas Besides, by comparing figures in Tables III and IV, it can
dataset, as well as category 3 (Sesame) and category 4 also be observed that, when Salinas and WHU-Hi-LongKou

Fig. 10. Classification maps of Indian Pines HSI (WHU-Hi-HanChuan as source-domain data) achieved by various approaches with five training samples
per class. (a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

Fig. 11. Classification maps of Salinas HSI (WHU-Hi-HanChuan as source-domain data) achieved by various approaches with five training samples per
class. (a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL+NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

are used as the target-domain dataset, the cross-domain target datasets is worse than that learned from low-spatial-
classification performance achieved by learning from high- resolution data of Chikusei. Different from the Chikusei
spatial-resolution data of WHU-Hi-HanChuan is superior to dataset as the source dataset which has the most categories,
that learned from the low-spatial-resolution data of Chiku- the WHU-Hi-HanChuan dataset has fewer classes. It has
sei. Specifically, the corresponding metrics obtained by the 16 categories which are primarily related to crops such
proposed method on the Salinas dataset in Table IV have as Watermelon, Water spinach, Sorghum, and so on. Even
furthermore improved based on the results in Table III though the source data of Indian Pines data also consists of
(OA—0.37%, AA—0.89%, and Kappa—0.43%, respectively). 16 categories, those types are mainly related to vegetation
On the WHU-Hi-LongKou, Table IV showed that the pro- and crops. There exists obvious diversity between crop types.
posed method get over 1% improvement for the three As for the Pavia University dataset, it consists of nine classes
accuracy metrics (OA—1.01%, AA—1.33%, and Kappa— like Asphalt, Bricks, Meadows, and more, which are signif-
1.29%, respectively) compared to that in Table III. These icantly different from the categories in WHU-Hi-HanChuan.
results seem to indicate that learning from hyperspectral data Those facts may result in a decrease in cross-domain classi-
of high spatial resolution provides more abundant or potential fication performance. This discrepancy makes it challenging
information that can be transferred for enhancing the classifi- to transfer enough informative common knowledge from the
cation ability on the target-domain datasets. source domain to the target domain.
However, opposite results were also observed when com- To explore the influence of the number of labeled samples
paring the accuracy of the proposed method and CTFSL on from the target data on classification performance, we con-
Indian Pines and University of Pavia data. Comparisons of ducted experiments using Chikusei as source-domain data. The
these figures in Tables III and IV show that the cross-domain quantity of labeled samples in the target domain for each class
classification performance achieved by learning from high- is systematically varied from 1 to 10 at increments of 1. These
spatial-resolution data of WHU-Hi-HanChuan to these two samples are selected randomly to form training sets for the

Fig. 12. Classification maps of Pavia University HSI (WHU-Hi-HanChuan as source-domain data) achieved by various approaches with five training samples
per class. (a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

Fig. 13. Classification maps of WHU-Hi-LongKou HSI (WHU-Hi-HanChuan as source-domain data) achieved by various approaches with five training
samples per class (a) Ground truth. (b) SVM. (c) 3D-CNN. (d) DFSL + NN. (e) DFSL + SVM. (f) DCFSL. (g) CTFSL. (h) Proposed method.

TABLE V
C OMPUTATION T IME AND PARAMETERS OF D IFFERENT M ETHODS ON D IFFERENT TARGET-D OMAIN DATASETS (C HIKUSEI AS S OURCE -D OMAIN DATA )

target domain. The experiments are also repeated ten times careful observation can be found that before the number
across various labeled sample numbers, and the corresponding of training samples reaches 5, rapid growth is observed in
average and standard deviation of the representative metric of most cases. When more training samples are supplied, the
OA (three evaluation criteria) are computed as the accuracy classification accuracy of each method increases gradually
results. Fig. 14 illustrates the OA curves of various techniques toward a stabilizing tendency. In comparison to counterpart
across different quantities of labeled data on five target-domain approaches, the proposed method stands out in six cases with
datasets. The plots in this figure clearly indicate the consistent the most competitive accuracy and stable increasing trend.
ascending trend of classification accuracy of all methods as For other methods, the accuracy curves are witnessed with
the count of labeled samples of each dataset increases. More more times of variation. Therefore, it needs to be pointed out

Fig. 14. Classification accuracy of various approaches using different numbers of labeled samples per class on six target datasets (Chikusei as source-domain
data). (a) Indian Pines. (b) Salinas. (c) Pavia University. (d) WHU-Hi-LongKou. (e) WHU-Hi-HanChuan.

that the experimental results in Section III-D are achieved by the cross-domain FSL problem in real-world scenarios. The
implementing each method using five labeled samples. potential of the proposed approach throws light on broad
Besides the classification accuracy, the computational com- applications by extending this method to adapt to more
plexity of the method proposed is also provided in Table V. practical application scenarios with different platforms and
Taking the experiments using the Chikusei dataset as a source spectral sensors. It also needs to be noted that the potential for
domain as an example, training time, test time, and the number promising classification with the proposed method also relates
of parameters needed to learn for each network are reported to to the spatial resolution difference between the two domain
intuitively demonstrate the computational efficiency of various data, and the number and diversity of classes of source-
methods. Training time of nondomain adaptation methods domain data. Comparisons of the accuracies of the proposed
(SVM, 3D-CNN) indicates the time spent on training models method and CTFSL on Indian Pines and the University of
on target-domain samples, while testing time refers to the Pavia data in the experiments present the limitations of our
time on predicting unlabeled samples in the target domain. approach. When the number of categories in the source domain
On the other hand, domain adaptation methods (DFSL + NN, is fewer than that in the target domain or when the diversity
DFSL + SVM, DCFSL, CTFSL, and the proposed method) of the source-domain dataset is relatively low compared to the
encompass the time training models in both the source and target domain, the classification performance of the method
target domains, along with the transfer learning process. Test- does not demonstrate superiority over other methods. This
ing time is the time predicting each unlabeled sample in the is mainly because certain categories in the target domain
target domain. The figures in Table V indicate that compared to cannot benefit from the learning in the source domain data.
nondomain adaptation methods, domain adaptation networks In other words, the learned knowledge from the source domain
take longer training time and have much more network param- provides limited assistance in classifying the target domain.
eters needed to be learned. In fact, they encounter higher Therefore, based on this initial investigation conducted in this
computational costs associated with transfer learning. Among study, future work could explore more of those factors.
domain adaptation methods, our approach has more model
parameters compared to DFSL + NN and DFSL + SVM. This
is primarily due to the inclusion of a discriminator with five R EFERENCES
fully connected layers in the conditional adversarial network [1] P. Ghamisi et al., “Advances in hyperspectral image and signal process-
used during domain fusion. Additionally, the parameter count ing: A comprehensive overview of the state of the art,” IEEE Geosci.
of our proposed network is similar to that of DCFSL. This Remote Sens. Mag., vol. 5, no. 4, pp. 37–78, Dec. 2017.
shows that our proposed domain separation module learns [2] X. Yang and Y. Yu, “Estimating soil salinity under various moisture
conditions: An experimental study,” IEEE Trans. Geosci. Remote Sens.,
discriminative features from the source and target domains vol. 55, no. 5, pp. 2525–2533, May 2017.
very effectively without sacrificing model complexity. It is [3] M. Zhang, W. Li, X. Zhao, H. Liu, R. Tao, and Q. Du, “Morphological
worth noting that the training time and parameter count of transformation and spatial-logical aggregation for tree species classifi-
cation using hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens.,
CTFSL are more than twice as much as ours, as CTFSL intro- vol. 61, 2023, Art. no. 5501212.
duces a transformer for global feature extraction. While this [4] D. Zhu, B. Du, and L. Zhang, “Two-stream convolutional networks
enhances classification efficiency, it also significantly increases for hyperspectral target detection,” IEEE Trans. Geosci. Remote Sens.,
vol. 59, no. 8, pp. 6907–6921, Aug. 2021.
model complexity. Together with the previous classification
[5] M. Shimoni, R. Haelterman, and C. Perneel, “Hypersectral imaging for
accuracy results in Tables III and IV, the comprehensive results military and security applications: Combining myriad processing and
present that our approach behaves the best among the advanced sensing techniques,” IEEE Geosci. Remote Sens. Mag., vol. 7, no. 2,
domain adaptation methods after tradeoff between acceptable pp. 101–117, Jun. 2019.
[6] B. Tu, Q. Ren, C. Zhou, S. Chen, and W. He, “Feature extraction using
efficiency and precise classification. multidimensional spectral regression whitening for hyperspectral image
classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens.,
IV. C ONCLUSION vol. 14, pp. 8326–8340, 2021.
[7] S. Jia, X. Deng, J. Zhu, M. Xu, J. Zhou, and X. Jia, “Collabora-
This article introduces an innovative neural network aimed tive representation-based multiscale superpixel fusion for hyperspectral
at tackling the challenge of cross-domain FSL. A dual- image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10,
branch domain adaptation network that combines domain pp. 7770–7784, Oct. 2019.
fusion and domain separation is developed. The domain fusion [8] S. Jia, Z. Zhan, and M. Xu, “Shearlet-based structure-aware filtering for
hyperspectral and LiDAR data classification,” J. Remote Sens., vol. 2021,
branch aligns the data distributions of two domains using pp. 1–12, Jan. 2021.
conditional adversarial networks, while the domain separation [9] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based
branch achieves class distribution independence through a gate classification of hyperspectral data,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.
mechanism. This approach effectively transmits knowledge [10] A. Vali, S. Comai, and M. Matteucci, “Deep learning for land use and
garnered from the source domain to an entirely disparate target land cover classification based on hyperspectral and multispectral Earth
domain. The scope of knowledge transfer encompasses various observation data: A review,” Remote Sens., vol. 12, no. 15, p. 2495,
Aug. 2020.
scenarios, including the transfer from low-spatial-resolution
[11] M. E. Paoletti, J. M. Haut, J. Plaza, and A. Plaza, “Deep learning
HSIs to the classification task of high-spatial-resolution HSIs, classifiers for hyperspectral imaging: A review,” ISPRS J. Photogramm.
and vice versa. Experimental results validate the superiority of Remote Sens., vol. 158, pp. 279–317, Dec. 2019.
this proposed technique, surpassing all existing cross-domain [12] S. Mei, X. Chen, Y. Zhang, J. Li, and A. Plaza, “Accelerating
convolutional neural network-based hyperspectral image classification
FSL methods in terms of classification capability. There- by step activation quantization,” IEEE Trans. Geosci. Remote Sens.,
fore, this article provides an effective solution for addressing vol. 60, 2022, Art. no. 5502012.

[13] S. Mei, X. Li, X. Liu, H. Cai, and Q. Du, “Hyperspectral image [35] M. Long, Z. Cao, J. Wang, and M. I. Jordan, “Conditional adversarial
classification using attention-based bidirectional long short-term mem- domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst., vol. 31,
ory network,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, 2018, pp. 1640–1650.
Art. no. 5509612. [36] H. Liang et al., “Training interpretable convolutional neural networks by
[14] S. Mei, J. Ji, Y. Geng, Z. Zhang, X. Li, and Q. Du, “Unsupervised differentiating class-specific filters,” in Computer Vision—ECCV 2020.
spatial–spectral feature learning by 3D convolutional autoencoder for Cham, Switzerland: Springer, 2020, pp. 622–638.
hyperspectral classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, [37] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with
no. 9, pp. 6808–6820, Sep. 2019. gumbel-softmax,” 2016, arXiv:1611.01144.
[15] H. Xu, W. He, L. Zhang, and H. Zhang, “Unsupervised spectral–spatial [38] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote
semantic feature learning for hyperspectral image classification,” IEEE sensing images with support vector machines,” IEEE Trans. Geosci.
Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5526714. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[16] R. Fan, R. Feng, L. Wang, J. Yan, and X. Zhang, “Semi-MCNN: [39] Y. Li, H. Zhang, and Q. Shen, “Spectral–spatial classification of hyper-
A semisupervised multi-CNN ensemble learning method for urban land spectral imagery with 3D convolutional neural network,” Remote Sens.,
cover classification using submeter HRRS images,” IEEE J. Sel. Topics vol. 9, no. 1, p. 67, Jan. 2017.
Appl. Earth Observ. Remote Sens., vol. 13, pp. 4973–4987, 2020.
[17] S. Jia et al., “A semisupervised Siamese network for hyperspectral
image classification,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022,
Art. no. 5516417. Zhuowei Wang received the B.S. degree in com-
[18] X. Li, Z. Cao, L. Zhao, and J. Jiang, “ALPN: Active-Learning-Based puter science and technology from the China
prototypical network for few-shot hyperspectral imagery classification,” University of Geosciences, Wuhan, China, in 2007,
IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022. and the M.S. and Ph.D. degrees in computer sys-
[19] H. L. Yang and M. M. Crawford, “Domain adaptation with preservation tems architecture from Wuhan University, Wuhan,
of manifold geometry for hyperspectral image classification,” IEEE in 2009 and 2012, respectively.
J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 2, From 2019 to 2020, she worked as a Visiting
pp. 543–555, Feb. 2016. Scholar with the Norwegian University of Science
[20] X. Ma, X. Mou, J. Wang, X. Liu, J. Geng, and H. Wang, “Cross-dataset and Technology, Gjøvik, Norway. She is currently a
hyperspectral image classification based on adversarial domain adapta- Professor with the School of Computer Science and
tion,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4179–4190, Technology, Guangdong University of Technology,
May 2021. Guangzhou, China. Her research interests focus on high-performance com-
[21] Y. Huang et al., “Two-branch attention adversarial domain adaptation puting, low-power optimization, and distributed systems.
network for hyperspectral image classification,” IEEE Trans. Geosci.
Remote Sens., vol. 60, 2022, Art. no. 5540813.
[22] Y. Huang et al., “Cross-scene wetland mapping on hyperspectral remote
sensing images using adversarial domain adaptation network,” ISPRS Shihui Zhao received the B.S. degree from Chuzhou
J. Photogramm. Remote Sens., vol. 203, pp. 37–54, Sep. 2023. University, Anhui, China, in 2015. She is currently
[23] Y. Zhang, W. Li, M. Zhang, Y. Qu, R. Tao, and H. Qi, “Topological pursuing the master’s degree with the School of
structure and semantic information transfer network for cross-scene Computer Science and Technology, Guangdong Uni-
hyperspectral image classification,” IEEE Trans. Neural Netw. Learn. versity of Technology, Guangzhou, China.
Syst., vol. 34, pp. 2817–2830, Feb. 2021. Her research interests include hyperspectral image
[24] Y. Zhang, W. Li, W. Sun, R. Tao, and Q. Du, “Single-source domain processing, computer vision, and deep learning.
expansion network for cross-scene hyperspectral image classification,”
IEEE Trans. Image Process., vol. 32, pp. 1498–1512, 2023.
[25] C. Zhang, J. Yue, and Q. Qin, “Global prototypical network for few-
shot hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 13, pp. 4748–4759, 2020.
[26] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few
examples: A survey on few-shot learning,” ACM Comput. Surv., vol. 53, Genping Zhao received the Ph.D. degree in
no. 3, pp. 1–34, May 2021. information and telecommunications engineering
[27] K. Gao, B. Liu, X. Yu, J. Qin, P. Zhang, and X. Tan, “Deep relation from Harbin Engineering University, Harbin, China,
network for hyperspectral image few-shot classification,” Remote Sens., in 2017.
vol. 12, no. 6, p. 923, Mar. 2020. From December 2013 to January 2015, she worked
as a Visiting Ph.D. Student with the University
[28] Z. Li, M. Liu, Y. Chen, Y. Xu, W. Li, and Q. Du, “Deep cross-domain
of Western Australia, Perth, WA, Australia. From
few-shot learning for hyperspectral image classification,” IEEE Trans.
November 2018 to November 2019, she worked
Geosci. Remote Sens., vol. 60, 2022, Art. no. 5501618.
as a Post-Doctoral Fellow with the University of
[29] Y. Zhang, W. Li, M. Zhang, S. Wang, R. Tao, and Q. Du, “Graph Alberta, Edmonton, AB, Canada. She is currently
information aggregation cross-domain few-shot learning for hyperspec- a Lecturer with the School of Computer Science
tral image classification,” IEEE Trans. Neural Netw. Learn. Syst., early and Technology, Guangdong University of Technology, Guangzhou, China.
access, Jun. 30, 2022, doi: 10.1109/TNNLS.2022.3185795. Her research interests focus on multisource remote-sensing data analysis and
[30] F. Xu, G. Zhang, C. Song, H. Wang, and S. Mei, “Multiscale and cross- machine learning.
level attention learning for hyperspectral image classification,” IEEE
Trans. Geosci. Remote Sens., vol. 61, 2023, Art. no. 5501615.
[31] Y. Peng, Y. Liu, B. Tu, and Y. Zhang, “Convolutional transformer-based
few-shot learning for cross-domain hyperspectral image classification,” Xiaoyu Song received the Ph.D. degree from the
IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, University of Pisa, Pisa, Italy, in 1991.
pp. 1335–1349, 2023. From 1992 to 1998, he was a Faculty Member
[32] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extrac- with the University of Montreal, Montreal, QC,
tion and classification of hyperspectral images based on convolutional Canada. He joined the Department of Electrical and
neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, Computer Engineering, Portland State University,
pp. 6232–6251, Oct. 2016. Portland, OR, USA, in 1998, where he is currently
[33] B. Liu, X. Yu, A. Yu, P. Zhang, G. Wan, and R. Wang, “Deep few-shot a Professor. His research interests include formal
learning for hyperspectral image classification,” IEEE Trans. Geosci. methods, design automation, embedded computing
Remote Sens., vol. 57, no. 4, pp. 2290–2304, Apr. 2019. systems, and emerging technologies.
[34] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few- Dr. Song was awarded an Intel Faculty Fellowship
shot learning,” in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, from 2000 to 2005. He was an Editor of IEEE T RANSACTIONS ON VLSI
pp. 4080–4090. S YSTEMS and IEEE T RANSACTIONS ON C IRCUITS AND S YSTEMS.

Authorized licensed use limited to: C. V. Raman Global University - Bhubaneswar. Downloaded on February 07,2024 at 08:58:17 UTC from IEEE Xplore. Restrictions apply.