Spatial Gated Multi-Layer Perceptron For Land Use and Land Cover Mapping
This article has been accepted for publication in IEEE Geoscience and Remote Sensing Letters. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/LGRS.2024.3354175
Abstract—Due to its capacity to recognize detailed spectral differences, hyperspectral data have been used extensively for precise Land Use Land Cover (LULC) mapping. However, recent multi-modal methods have shown superior classification performance over algorithms that use a single data set. On the other hand, Convolutional Neural Networks (CNNs) are models extensively utilized for the hierarchical extraction of features. Vision transformers (ViTs), through a self-attention mechanism, have recently achieved superior modeling of global contextual information compared to CNNs. However, to harness their image classification strength, ViTs require substantial training datasets. In cases where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed the SGU-MLP, a deep learning algorithm that effectively combines MLPs and spatial gating units (SGUs) for precise LULC mapping using multi-modal multi-spectral, LiDAR, and hyperspectral data. Results illustrate the superiority of the developed SGU-MLP classification algorithm over several CNN and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer, and CoAtNet; the SGU-MLP model consistently outperformed these benchmark algorithms. The code will be made publicly available at https://github.com/aj1365/SGUMLP

Index Terms—Attention mechanism, image classification, spatial gating unit (SGU), vision transformers.

I. INTRODUCTION

urban sprawl, is essential for understanding its environmental consequences, as well as promoting the adoption of more sustainable forms of urban expansion. Hyperspectral (HS) data have been utilized widely for accurate LULC mapping due to their ability to distinguish subtle spectral differences [2]. However, recent research on multi-modal models, such as the multi-modal fusion transformer (MFT) network, has proven their superior classification performance compared to models that utilize only hyperspectral data [3].

It has been shown that, due to the complex characteristics of HS data, conventional machine learning models, such as random forests, struggle to accurately classify HS imagery (HSI) [2]. Furthermore, traditional models do not take spatial information into account. Additionally, hyperspectral imaging often involves a naturally nonlinear interaction between the corresponding ground classes and the acquired spectral information [2]. On the other hand, deep learning models have been used increasingly for HS classification in recent years. In particular, Convolutional Neural Networks (CNNs) are widely used because of their ability for automatic hierarchical feature extraction. To address the limitation of CNNs in capturing global contextual information, vision transformers (ViTs) have been successfully employed for HSI classification [4]. ViTs use self-attention mechanisms to obtain global contextual information more effectively than CNNs, significantly increasing the accuracy of HS classification [3].
consequently, minimizing the necessity for extensive training data.

This letter introduces the SGU-MLP in Section II, illustrates the experiments and analyses the results in Section III, and highlights the concluding remarks in Section IV.

Fig. 1: Graphical representation of the spatial gated multi-layer perceptron framework for land use and land cover classification. The MLP-Mixer layer includes two MLPs to extract spatial information. ⊙ represents channel-wise concatenation.

II. PROPOSED CLASSIFICATION FRAMEWORK

As illustrated in Fig. 1, the SGU-MLP is developed for image classification using a small number of training samples. For efficient application of multi-scale representation in the classification task, we incorporated a computationally light and straightforward depth-wise CNN-based architecture. As presented in Fig. 2, the MLP-Mixer layer of the developed model includes two different types of layers: (i) MLPs applied across image patches to extract spatial information, and (ii) MLPs applied individually to extract per-location features from the image inputs. In addition, in each MLP block, the SGU is utilized to enable the developed algorithm to effectively learn intricate spatial relationships among the tokens of the input data.
A. Depth-wise Convolution Block (DWC):

The DWC architecture is light and straightforward and is based on CNNs. With so many trainable parameters and the limited available training data, a higher probability of overfitting exists during the training process. Hence, to address the overfitting issue and capture multi-scale feature information, we incorporated three depth-wise convolutions in parallel. These convolutions have 20 output channels each, with kernel (k) sizes of 1 × 1, 3 × 3, and 5 × 5, respectively. Feature maps X with a size of 9 × 9 × d are the input to the DWC block, which produces the output D_Z, where d is the number of bands:

D_Z = DWConv2D_{k×k}(X),  k = 1, 3, 5    (1)

The output maps of the three depth-wise CNNs are added and fed to the MLP-Mixer blocks.
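To make this block concrete, a minimal TensorFlow/Keras sketch is given below. The text only specifies three parallel depth-wise convolutions with 20 output channels and 1 × 1, 3 × 3, and 5 × 5 kernels whose outputs are summed; realizing the 20-channel output with separable (depth-wise plus point-wise) convolutions, and all layer and variable names, are our assumptions rather than the authors' released implementation.

```python
# Illustrative sketch of the DWC block (Eq. (1)), assuming TensorFlow/Keras.
# Three parallel depth-wise convolutions (kernels 1x1, 3x3, 5x5) with 20 output
# channels each; their output maps are summed and passed on to the MLP-Mixer.
import tensorflow as tf

def dwc_block(x, channels=20, kernel_sizes=(1, 3, 5)):
    """x: patch tensor of shape (batch, 9, 9, d); returns (batch, 9, 9, channels)."""
    branches = [
        tf.keras.layers.SeparableConv2D(channels, k, padding="same")(x)
        for k in kernel_sizes
    ]
    return tf.keras.layers.Add()(branches)  # sum of the three multi-scale branches

# Example: d = 30 stacked bands (hypothetical, e.g., PCA-reduced HSI plus auxiliary channels).
patch = tf.keras.Input(shape=(9, 9, 30))
features = dwc_block(patch)  # -> shape (None, 9, 9, 20)
```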
B. Spatial gating unit (SGU):

The SGU is designed to extract complex spatial interactions across tokens. Unlike current ViT models, the SGU does not necessitate the use of positional embeddings; the positional information is instead obtained through spatial depth-wise convolutions [6], similar to the inverted bottlenecks employed in MobileNetV2 [7]. Considering the dense layer D (i.e., the input feature) in the MLP block, as illustrated in Fig. 1, the SGU uses a linear projection layer that benefits from a contraction operation across the spatial dimension of the cross-token interactions, defined by:

f_{W,b}(D) = W D + b    (2)

where W ∈ R^{n×n} is a matrix whose size equals the input sequence length, and n and b denote the sequence length and the token biases, respectively. It should be highlighted that the spatial projection matrix W does not depend on the input data, in contrast to self-attention models, where W(D) is created dynamically from D. The SGU can be formulated as:

S(D) = D · f_{W,b}(D)    (3)

where element-wise multiplication is represented by (·). The SGU equation can be improved by dividing D into D_1 and D_2 along the channel dimension. Thus, the SGU can be reformulated as:

S(D) = D_1 · f_{W,b}(D_2)    (4)

The output map of the DWC block is flattened and fed to the MLP-Mixer layer. Considering a dense layer of size 256 × 256, D_1 and D_2 both have a size of 256 × 128; f_{W,b}(D_2) has a size of 256 × 128, and so does S(D).
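For clarity, a minimal sketch of the SGU of Eqs. (2) to (4) is given below, assuming a TensorFlow/Keras implementation. The tensor layout (batch, tokens, channels) and the use of a Dense layer over the token axis to realize the learned spatial projection W are our assumptions.

```python
# Illustrative sketch of the spatial gating unit (Eqs. (2)-(4)), assuming Keras.
# The input D of shape (batch, n, c) is split into D1 and D2 along the channel
# axis; D2 is projected across the token (spatial) dimension with an
# input-independent matrix W and bias b, and the result gates D1 element-wise.
import tensorflow as tf

class SpatialGatingUnit(tf.keras.layers.Layer):
    def __init__(self, seq_len, **kwargs):
        super().__init__(**kwargs)
        # Dense over the token axis implements f_{W,b}(D2) = W D2 + b, W in R^{n x n}.
        self.spatial_proj = tf.keras.layers.Dense(seq_len)

    def call(self, d):
        d1, d2 = tf.split(d, num_or_size_splits=2, axis=-1)  # channel split
        d2 = tf.transpose(d2, perm=[0, 2, 1])                # (batch, c/2, n)
        d2 = self.spatial_proj(d2)                           # Eq. (2), token projection
        d2 = tf.transpose(d2, perm=[0, 2, 1])                # (batch, n, c/2)
        return d1 * d2                                       # Eq. (4), element-wise gating

# Example: 81 tokens (a 9 x 9 patch) with 256 channels -> gated output of 128 channels.
sgu = SpatialGatingUnit(seq_len=81)
gated = sgu(tf.random.normal((2, 81, 256)))  # -> shape (2, 81, 128)
```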
Fig. 2: The MLP-Mixer layer: layer normalization, token-mixing and channel-mixing MLP blocks, and skip connections.
C. Multi-layer Perceptron Mixer Block (MLP-Mixer):

In current advanced deep vision architectures, layers combine features in one or more of the following ways: first, at a given spatial location; second, among various spatial locations; or third, both operations simultaneously, with k × k convolutions (for k > 1) and pooling operations (i.e., the second operation) incorporated in CNNs. Convolutions with kernel size 1 × 1 perform only the first operation, whereas convolutions with larger kernels accomplish both the first and second operations. Self-attention layers in ViTs and other attention-based structures include the first and second operations, while models based on MLPs perform only the first operation. The objective of the MLP-Mixer architecture is to separate cross-location (height and width mixing) operations from per-location (channel-mixing) operations, as presented in Fig. 2 [5]. A series of E non-overlapping image patches X from the output feature D_Z of the DWC block is the input to the MLP-Mixer and is projected to a given hidden dimension C, resulting in a two-dimensional table M ∈ R^{E×C}. The output features of the DWC block are first flattened and then fed to the MLP-Mixer layers. Given an input image of size H × W and patches of size F × F, the number of patches is E = H×W/F², and all resulting patches are projected with the same projection matrix. For instance, considering an input image of size 9 × 9, the reshaped feature has a size of 9 × 9 = 81. As we set the dimension of the token-mixing MLP to 256, the output feature map has a dimension of 81 × 256. The MLP-Mixer consists of several layers of identical size (i.e., 4 layers), where each layer has two MLP blocks. The first, token-mixing, MLP block is applied to the columns of the table M (i.e., it is applied to the transposed input M^T), while the second, channel-mixing, MLP block is applied to the rows of M. Each MLP block contains two fully connected layers, and a non-linearity is applied independently to each row of the input tensor. As such, each MLP-Mixer layer can be formulated as:

U_{ι,i} = M_{ι,i} + W_2 ξ(W_1 LN(M)_{ι,i}),   i = 1, ..., C    (5)

Y_{j,ι} = U_{j,ι} + W_4 ξ(W_3 LN(U)_{j,ι}),   j = 1, ..., E    (6)

where ξ denotes the element-wise non-linearity function and LN denotes layer normalization. Notably, the MLP-Mixer has linear computational complexity, which distinguishes it from vision transformers with quadratic computational complexity and, consequently, exhibits a high level of computational efficiency.
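The following minimal sketch shows one such mixer layer implementing Eqs. (5) and (6) in TensorFlow/Keras. The hidden widths of the two MLP blocks are assumptions (the letter only fixes the token-mixing dimension at 256 and stacks four such layers), and the SGU that the SGU-MLP additionally inserts into the MLP blocks is omitted here for brevity.

```python
# Illustrative sketch of one MLP-Mixer layer (Eqs. (5)-(6)), assuming Keras.
# M has shape (batch, E, C): E patches (tokens) and C channels. Token mixing is
# applied to the columns of M (i.e., to the transposed table), channel mixing
# to its rows; both use two dense layers with a GELU non-linearity (xi), layer
# normalization (LN), and skip connections.
import tensorflow as tf

def mlp_block(x, hidden_dim, out_dim):
    x = tf.keras.layers.Dense(hidden_dim, activation="gelu")(x)  # W1/W3 and xi
    return tf.keras.layers.Dense(out_dim)(x)                     # W2/W4

def mixer_layer(m, num_patches, channels, token_dim=256, channel_dim=256):
    # Eq. (5): cross-location (token) mixing on LN(M)^T with a skip connection.
    y = tf.keras.layers.LayerNormalization()(m)
    y = tf.keras.layers.Permute((2, 1))(y)       # (batch, C, E)
    y = mlp_block(y, token_dim, num_patches)
    y = tf.keras.layers.Permute((2, 1))(y)       # (batch, E, C)
    u = m + y
    # Eq. (6): per-location (channel) mixing on LN(U) with a skip connection.
    y = tf.keras.layers.LayerNormalization()(u)
    y = mlp_block(y, channel_dim, channels)
    return u + y

# Example: a 9 x 9 patch gives E = 81 tokens; with C = 256 channels, each of the
# four stacked mixer layers preserves the 81 x 256 token table.
tokens = tf.keras.Input(shape=(81, 256))
mixed = mixer_layer(tokens, num_patches=81, channels=256)
```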
III. EXPERIMENTAL RESULTS

A. Experimental Data

Houston dataset: This dataset was captured over the University of Houston campus and the neighboring urban area. It consists of co-registered hyperspectral and multi-spectral data containing 144 and 8 bands, respectively, with 349 × 1905 pixels. More information can be found in [8].

Berlin dataset: This dataset has a size of 797 × 220 pixels and contains 244 spectral bands over Berlin. The Sentinel-1 dual-Pol (VV-VH) single-look complex (SLC) product represents the SAR data. The processed SAR data have a size of 1723 × 476 pixels. The HS data are interpolated with the nearest neighbor algorithm, as for the Houston dataset, to provide the same image size as the SAR data [9].

Augsburg dataset: This scene over the city of Augsburg, Germany, includes three distinct datasets: a spaceborne HS dataset, a dual-Pol PolSAR image, and a digital surface model (DSM). All image spatial resolutions were down-scaled to a single 30 m ground sampling distance (GSD). The scene comprises four features derived from the dual-Pol (VV-VH) SAR image and 180 spectral bands for the HS dataset, with 332 × 485 pixels [10].
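As a rough illustration of the data preparation implied above (nearest-neighbour interpolation of the HS cube to the SAR grid and PCA reduction of the spectral bands before patch extraction, cf. Fig. 1), the following sketch is provided. The choice of SciPy and scikit-learn, the function names, and the number of retained components are assumptions, not details given in the letter.

```python
# Illustrative sketch of the preprocessing described above: nearest-neighbour
# resampling of the HS cube to the co-registered SAR grid (e.g., Berlin:
# 797 x 220 -> 1723 x 476) and PCA reduction of the spectral bands.
import numpy as np
from scipy.ndimage import zoom
from sklearn.decomposition import PCA

def resample_and_reduce(hsi, target_hw, n_components=30):
    """hsi: (H, W, B) cube; target_hw: (H', W') size of the co-registered image."""
    factors = (target_hw[0] / hsi.shape[0], target_hw[1] / hsi.shape[1], 1.0)
    hsi_nn = zoom(hsi, factors, order=0)               # order=0 -> nearest neighbour
    flat = hsi_nn.reshape(-1, hsi_nn.shape[-1])
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(hsi_nn.shape[0], hsi_nn.shape[1], n_components)

# Small synthetic example with the Berlin band count (244); the real image sizes
# would be 797 x 220 resampled to the 1723 x 476 SAR grid.
cube = np.random.rand(80, 22, 244).astype(np.float32)
features = resample_and_reduce(cube, (172, 48))        # -> (172, 48, 30)
```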
B. Classification Results

The developed SGU-MLP classification model outperformed the other CNN and CNN-ViT-based algorithms, HybridSN, CoAtNet, EfficientFormer, iFormer, and ResNet, by about 11, 14, 15, 15, and 19 percentage points, respectively, in terms of average accuracy, as demonstrated in Table III.
TABLE I: Classification results for the Augsburg dataset in terms of F-1 score, where κ = Kappa index, OA = Overall Accuracy, and AA = Average Accuracy.

Class                HybridSN   ResNet   iFormer   EfficientFormer   CoAtNet   SGU-MLP
Forest               0.88       0.83     0.91      0.88              0.85      0.93
Residential          0.89       0.83     0.89      0.90              0.87      0.96
Industrial           0.43       0.15     0.35      0.40              0.22      0.59
Low Plants           0.87       0.88     0.88      0.88              0.98      0.96
Allotment            0.13       0.10     0.13      0.11              0.09      0.27
Commercial           0.04       0.05     0.10      0.11              0.16      0.29
Water                0.35       0.19     0.21      0.25              0.19      0.55
OA×100               82.28      79.07    82.82     82.72             81.32     91.13
AA×100               55.76      43.57    52.96     52.81             49.90     65.75
κ×100                74.85      69.34    75.37     75.24             73.12     87.24
Training time (min)  6          3        34        7                 13        4
TABLE II: Classification results for the Berlin dataset in terms of F-1 score, where κ = Kappa index, OA = Overall Accuracy, and AA = Average Accuracy.

Class                HybridSN   ResNet   iFormer   EfficientFormer   CoAtNet   SGU-MLP
Forest               0.71       0.64     0.69      0.73              0.65      0.72
Residential          0.80       0.81     0.82      0.81              0.76      0.81
Industrial           0.49       0.39     0.35      0.32              0.32      0.39
Low Plants           0.59       0.35     0.72      0.70              0.59      0.70
Soil                 0.65       0.72     0.70      0.67              0.75      0.72
Allotment            0.44       0.28     0.34      0.29              0.30      0.44
Commercial           0.45       0.25     0.29      0.24              0.29      0.27
Water                0.65       0.53     0.49      0.38              0.28      0.50
OA×100               66.81      63.70    68.60     68.17             63.14     70.56
AA×100               62.67      58.23    62.84     60.05             60.53     65.89
κ×100                55.84      47.61    55.28     54.32             49.21     57.85
Training time (min)  6          3        29        7                 13        4
TABLE III: Classification results for the Houston dataset in terms of F-1 score, where κ = Kappa index, OA = Overall Accuracy, and AA = Average Accuracy.

Class                HybridSN   ResNet   iFormer   EfficientFormer   CoAtNet   SGU-MLP
Healthy Grass        0.85       0.88     0.86      0.89              0.90      0.90
Stressed Grass       0.84       0.90     0.87      0.87              0.88      0.90
Synthetic Grass      0.84       0.78     0.50      0.58              0.72      0.97
Tree                 0.87       0.89     0.92      0.91              0.93      0.92
Soil                 0.96       0.94     0.93      0.95              0.85      1.00
Water                0.73       0.71     0.29      0.39              0.25      0.33
Residential          0.69       0.72     0.68      0.60              0.79      0.79
Commercial           0.69       0.39     0.68      0.56              0.60      0.81
Road                 0.70       0.57     0.75      0.77              0.82      0.85
Highway              0.58       0.52     0.45      0.54              0.54      0.83
Railway              0.70       0.54     0.67      0.57              0.67      0.82
Parking Lot 1        0.74       0.42     0.48      0.71              0.55      0.97
Parking Lot 2        0.94       0.61     0.72      0.78              0.58      0.86
Tennis Court         0.84       0.77     0.74      0.73              0.56      1.00
Running Track        0.64       0.82     0.83      0.61              0.92      0.95
OA×100               75.62      68.16    71.03     71.66             72.67     85.34
AA×100               76.44      71.42    72.86     70.69             75.62     87.25
κ×100                73.59      65.49    68.71     69.25             70.56     84.17
Training time (min)  4          2        20        5                 10        3

Fig. 5: Classification maps over the Houston dataset using a) the study image, b) CoAtNet, c) EfficientFormer, d) HybridSN, e) iFormer, f) ResNet, and g) the SGU-MLP.
C. Ablation study

An ablation study was performed to better understand the contribution and significance of the different parts of the developed SGU-MLP classification algorithm. As seen in Table IV, the inclusion of the DWC block and the SGU block increased the classification accuracy of the MLP-Mixer model by approximately 2 and 3 percentage points, respectively, in terms of average accuracy for the Augsburg dataset. The highest classification accuracy was achieved by including both the DWC and SGU blocks, with an average accuracy of 65.75%, increasing the classification accuracy of the MLP-Mixer algorithm by about 5 percentage points.

In the Berlin dataset, as illustrated in Table V, the inclusion of the SGU block and the DWC block increased the classification accuracy of the MLP-Mixer algorithm by about 1 and 2 percentage points, respectively, in terms of the Kappa index. By incorporating both the DWC and SGU blocks, the highest classification accuracy was attained, with a Kappa index of 57.85%; this increased the accuracy of the MLP-Mixer classifier by approximately 3 percentage points.

As demonstrated in Table VI, the inclusion of the DWC block and the SGU block increased the accuracy of the MLP-Mixer algorithm by approximately 2 and 1 percentage points, respectively, in terms of average accuracy for the Houston dataset. By including both the DWC and SGU blocks, the MLP-Mixer's classification accuracy was increased by approximately 7 percentage points, to 87.25%.

TABLE IV: Classification results for the Augsburg dataset in terms of F-1 score, where κ = Kappa index, OA = Overall Accuracy, and AA = Average Accuracy.

Class        MLP            SGU + MLP      DWC + MLP      SGU-MLP
Forest       0.87           0.92           0.91           0.93
Residential  0.91           0.93           0.92           0.96
Industrial   0.36           0.52           0.55           0.59
Low Plants   0.95           0.96           0.95           0.98
Allotment    0.20           0.20           0.21           0.27
Commercial   0.15           0.20           0.18           0.29
Water        0.55           0.57           0.54           0.55
OA×100       87.64 ± 0.61   88.90 ± 0.45   89.48 ± 0.52   91.13 ± 0.30
AA×100       60.96 ± 1.16   62.36 ± 1.59   63.59 ± 1.25   65.75 ± 0.42
κ×100        82.12 ± 0.90   84.01 ± 0.66   84.83 ± 0.78   87.24 ± 0.41

TABLE V: Classification results for the Berlin dataset in terms of F-1 score, where κ = Kappa index, OA = Overall Accuracy, and AA = Average Accuracy.

Class        MLP            SGU + MLP      DWC + MLP      SGU-MLP
Forest       0.74           0.72           0.72           0.72
Residential  0.81           0.80           0.82           0.81
Industrial   0.40           0.39           0.40           0.39
Low Plants   0.68           0.66           0.68           0.70
Soil         0.71           0.67           0.67           0.72
Allotment    0.44           0.43           0.44           0.44
Commercial   0.25           0.26           0.24           0.27
Water        0.50           0.44           0.45           0.50
OA×100       68.43 ± 0.83   69.12 ± 0.65   70.03 ± 0.17   70.56 ± 0.58
AA×100       65.16 ± 0.52   65.20 ± 0.50   64.70 ± 1.04   65.89 ± 0.26
κ×100        55.25 ± 0.93   56.06 ± 0.75   56.95 ± 0.25   57.85 ± 0.58

TABLE VI: Classification results for the Houston dataset in terms of F-1 score, where κ = Kappa index, OA = Overall Accuracy, and AA = Average Accuracy.

Class            MLP            SGU + MLP      DWC + MLP      SGU-MLP
Healthy Grass    0.89           0.90           0.90           0.90
Stressed Grass   0.90           0.91           0.90           0.90
Synthetic Grass  0.43           0.97           0.98           0.97
Tree             0.89           0.94           0.94           0.92
Soil             0.96           1.00           1.00           1.00
Water            0.46           0.17           0.22           0.33
Residential      0.79           0.78           0.80           0.79
Commercial       0.66           0.69           0.81           0.81
Road             0.82           0.84           0.81           0.85
Highway          0.62           0.59           0.62           0.83
Railway          0.73           0.83           0.80           0.82
Parking Lot 1    0.75           0.93           0.94           0.97
Parking Lot 2    0.79           0.69           0.88           0.86
Tennis Court     0.88           1.00           1.00           1.00
Running Track    0.82           0.96           0.95           0.95
OA×100           78.27 ± 1.53   82.45 ± 0.92   84.22 ± 0.81   85.34 ± 0.91
AA×100           80.53 ± 1.46   85.03 ± 0.68   86.38 ± 0.73   87.25 ± 0.68
κ×100            76.53 ± 1.65   81.08 ± 0.98   82.99 ± 0.88   84.17 ± 0.96

D. Computation cost

As illustrated in Table I, the proposed model required the least computation cost in terms of training time (4 min) on the Augsburg benchmark compared to the other ViT-based models: iFormer (34 min), CoAtNet (13 min), and EfficientFormer (7 min). Moreover, on the Berlin dataset, the SGU-MLP algorithm, with a required training time of 4 min, demonstrated better computational efficiency than the other ViTs: iFormer (29 min), CoAtNet (13 min), and EfficientFormer (7 min) (see Table II). In addition, as seen in Table III, on the Houston benchmark, the computational complexity of the SGU-MLP model was much lower in terms of training time (3 min) compared to the other implemented ViTs, including iFormer (20 min), CoAtNet (10 min), and EfficientFormer (5 min). It is worth mentioning that an RTX 2070 Max-Q GPU and an Intel Core i7 CPU were utilized. The optimizer, loss function, batch size, and learning rate were set to Adam, sparse categorical cross-entropy, 100, and 0.001, respectively, in all of the implemented models.
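For reproducibility, a minimal sketch of this shared training configuration in TensorFlow/Keras is given below; the model object, the data arrays, and the number of epochs are hypothetical placeholders, since only the optimizer, loss, batch size, and learning rate are reported in the letter.

```python
# Illustrative sketch of the shared training set-up reported above (Adam,
# sparse categorical cross-entropy, batch size 100, learning rate 0.001),
# assuming TensorFlow/Keras. `model`, the data arrays, and the epoch count are
# hypothetical placeholders.
import tensorflow as tf

def compile_and_train(model, x_train, y_train, x_val, y_val, epochs=100):
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    return model.fit(x_train, y_train, batch_size=100, epochs=epochs,
                     validation_data=(x_val, y_val))
```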
IV. CONCLUSION

In this study, we developed the SGU-MLP algorithm, based on advanced MLP models and a spatial gating unit, for land use and land cover mapping; it demonstrated superior classification accuracy compared to several CNN and CNN-ViT-based models. The obtained results illustrated that the utilized MLP-Mixer architecture could obtain greater cross-location (height and width) and per-location (channel) information compared to the current advanced ViTs.
Additionally, the SGU increased the classification accuracy by efficiently capturing complex spatial interactions across image tokens. Moreover, the SGU-MLP algorithm was demonstrated to be much more computationally efficient in terms of training time than the other implemented ViT-based models: iFormer, EfficientFormer, and the state-of-the-art CoAtNet.
REFERENCES

[1] J. Yang, A. Guo, Y. Li, Y. Zhang, and X. Li, "Simulation of landscape spatial layout evolution in rural-urban fringe areas: a case study of Ganjingzi district," GIScience & Remote Sensing, vol. 56, no. 3, pp. 388–405, 2019. [Online]. Available: https://doi.org/10.1080/15481603.2018.1533680
[2] S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson, "Deep learning for hyperspectral image classification: An overview," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 9, pp. 6690–6709, 2019.
[3] S. K. Roy, A. Deria, D. Hong, B. Rasti, A. Plaza, and J. Chanussot, "Multimodal fusion transformer for remote sensing image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–20, 2023.
[4] H. Yan, E. Zhang, J. Wang, C. Leng, A. Basu, and J. Peng, "Hybrid Conv-ViT network for hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 20, pp. 1–5, 2023.
[5] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, M. Lucic, and A. Dosovitskiy, "MLP-Mixer: An all-MLP architecture for vision," in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 24261–24272.
[6] H. Liu, Z. Dai, D. So, and Q. V. Le, "Pay attention to MLPs," in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 9204–9215. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2021/file/4cc05b35c2f937c5bd9e7d41d3686fff-Paper.pdf
[7] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[8] C. Debes, A. Merentitis, R. Heremans, J. Hahn, N. Frangiadakis, T. van Kasteren, W. Liao, R. Bellens, A. Pižurica, S. Gautama, W. Philips, S. Prasad, Q. Du, and F. Pacifici, "Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2405–2418, 2014.
[9] A. Okujeni, S. van der Linden, and P. Hostert, "Berlin-Urban-Gradient dataset 2009: An EnMAP preparatory flight campaign," 2016.
[10] D. Hong, J. Hu, J. Yao, J. Chanussot, and X. X. Zhu, "Multimodal remote sensing benchmark datasets for land cover classification with a shared and specific feature learning model," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 178, pp. 68–80, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0924271621001362
[11] S. K. Roy, G. Krishna, S. R. Dubey, and B. B. Chaudhuri, "HybridSN: Exploring 3-D-2-D CNN feature hierarchy for hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 17, no. 2, pp. 277–281, 2019.
[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[13] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, pp. 11106–11115, May 2021. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/17325
[14] Y. Li, G. Yuan, Y. Wen, J. Hu, G. Evangelidis, S. Tulyakov, Y. Wang, and J. Ren, "EfficientFormer: Vision transformers at MobileNet speed," 2022.
[15] Z. Dai, H. Liu, Q. Le, and M. Tan, "CoAtNet: Marrying convolution and attention for all data sizes," in Advances in Neural Information Processing Systems, vol. 34, 2021.