Article
Estimation of Tropical Cyclone Intensity via Deep Learning
Techniques from Satellite Cloud Images
Biao Tong 1 , Jiyang Fu 1 , Yaxue Deng 1 , Yongjun Huang 2 , Pakwai Chan 3 and Yuncheng He 1, *
1 Research Center for Wind Engineering and Engineering Vibration, Guangzhou University,
Guangzhou 510006, China; [email protected] (B.T.); [email protected] (J.F.);
[email protected] (Y.D.)
2 School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China;
[email protected]
3 Hong Kong Observatory, Hong Kong 999077, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-133-1286-1586
Abstract: Estimating the intensity of tropical cyclones (TCs) is a critical step in studies on TC disaster warning and prediction. Satellite cloud images (SCIs) are one of the most effective and preferred data sources for TC research. Despite the great achievements of various SCI-based studies, accurate and efficient estimation of TC intensity remains a challenge. In recent years, machine learning (ML) techniques have developed rapidly and shown significant potential in dealing with big data, particularly images. This study focuses on the objective estimation of TC intensity from SCIs through the combined use of advanced deep learning (DL) techniques and smoothing methods. Two estimation strategies are proposed and examined, involving one and two functional stages, respectively. The one-stage strategy uses a Vision Transformer (ViT) or a Deep Convolutional Neural Network (DCNN) as the regression model to identify TC intensity directly. The second strategy involves a classification stage that stratifies SCI samples into a few intensity groups, followed by a regression stage that specifies the TC intensity. Further efforts are made to improve the estimation accuracy by applying smoothing manipulations (via four specific smoothing techniques) to the two strategies and their fusion. Results show that DCNN performs better than ViT in the one-stage strategy, while using ViT as the classification model and DCNN as the regression model yields the best performance in the two-stage strategy. Interestingly, although the strategy of using DCNN alone outperforms every two-stage strategy considered, the fusion of the two strategies outperforms either the one-stage or the two-stage strategy. Results also suggest that smoothing techniques improve estimation accuracy. Overall, the best performance is achieved by a hybrid strategy that combines the one-stage strategy, the two-stage strategy and smoothing manipulation. The associated RMSE and MAE values are 9.81 kt and 7.51 kt, respectively, which are better than those of most existing studies.
Keywords: tropical cyclone; satellite cloud image; intensity estimation; ViT; DCNN
Citation: Tong, B.; Fu, J.; Deng, Y.; Huang, Y.; Chan, P.; He, Y. Estimation of Tropical Cyclone Intensity via Deep Learning Techniques from Satellite Cloud Images. Remote Sens. 2023, 15, 4188. https://doi.org/10.3390/rs15174188
Academic Editor: Yuriy Kuleshov
Received: 2 August 2023
Revised: 23 August 2023
Accepted: 24 August 2023
Published: 25 August 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
TCs are highly destructive natural disasters, and accurate assessment of their activity is essential for the prevention and mitigation of TC disasters. Among all TC parameters, intensity is perhaps the most complex. It depends physically on many factors, such as the background environment, the TC's inner structure and their interactions, and it also sometimes varies capriciously with location and time. Consequently, research on TC intensity has been a major priority in meteorology and oceanography. However, TC structures are vast and evolve in complicated ways in space and time over the life-cycle. It therefore remains a challenge to characterize TC intensity with ground-based instruments, especially before landfall.
With the help of ever-developing equipment and technology, it is now possible to obtain more credible information on TCs, even over the sea, from reconnaissance aircraft and airborne devices. However, such instruments are too expensive for routine use on a global scale, which restricts the spatial and temporal coverage of the associated observations. By contrast, satellite remote sensing data, in particular SCIs, provide abundant information on TCs and their background environment over vast oceans in an uninterrupted way. As a result, they have been widely used in both academic studies and practical applications.
Continuous efforts have been made to estimate TC intensity from SCIs. The Dvorak technique, initially documented by Dvorak, is a set of TC analytical methods. These methods identify TC fingerprints, produce preliminary judgments of TC intensity, and then use different cloud textures and change patterns to refine the final estimates [1]. This technique has since been further developed, e.g., by Velden et al. [2–4]. Many other estimation methods have also been proposed since the beginning of this century. These include the fixed-intensity Advanced Microwave Sounding Unit (AMSU) method [5], manual techniques for intensity estimation using SSM/I images [6], a near-real-time technique for characterizing the shape and dynamics of TCs and correlating them with TC intensity [7], and a TC intensity estimation method using spatial characteristic analogues in satellite data [8]. Moreover, studies have been conducted on the following:
- the deviation angle variation (DAV) method for estimating TC intensity using geostationary infrared (IR) brightness and temperature data [9];
- the estimation of wind speed at flight altitude using conventional TC information and IR satellite images [10];
- multiple linear regression models for estimating TC intensity from IR satellite images [11];
- the empirical estimation of TC intensity by the SATCON weighted consensus algorithm [12].
Despite their widespread use in meteorology, the aforementioned techniques face challenges in practice. In particular, many of them involve experience-based manipulations, which make the estimates prone to low efficiency and subjective errors. Therefore, there is a need to develop high-precision, objective methods for estimating TC intensity.
In recent years, machine learning (ML) techniques have developed rapidly [13,14] and shown significant potential in dealing with many meteorological issues [15–18]. Among various ML techniques, the convolutional neural network (CNN) has attracted increasing attention for SCI-based estimation of TC intensity. As an abstract feature extraction technology, it can retrieve highly generalized information on TCs, as well as identify and classify complex TC images [19,20]. For example, Combinido et al. [21] examined the performance of the VGG19 model driven by grayscale infrared (IR) images. Wimmers et al. [22] used a 2D-CNN approach based on satellite passive microwave imagery. Chen et al. [23] built a CNN-TC regression model that takes into account the domain knowledge of meteorologists. Wang et al. [24] developed a CNN-based model using H-8 geostationary satellite IR imagery. Zhang et al. [25] proposed a two-branch CNN model based on IR and water vapor (WV) images. Lee et al. [26] further employed a 3D-CNN to investigate the correlation between multi-spectral geostationary satellite images and TC intensity.
Another ML technique that differs significantly from CNN and its derivatives is the Transformer, which currently dominates the field of natural language processing (NLP). CNN is operationally based on convolution, which is good at capturing local features. The Transformer, in contrast, is built on the self-attention mechanism. This fundamentally different mechanism enables the Transformer to extract global features from longer sequences. It also improves computational efficiency through parallel computation during training and inference. In light of the striking success of the Transformer in NLP, Dosovitskiy et al. [27] extended it to the vision field and proposed the Vision Transformer (ViT). Since its debut, ViT has achieved remarkable success in the vision field, outperforming most existing CNN models [28]. ViT therefore appears to hold significant potential for meteorological remote sensing. In fact, Bi et al. [29] demonstrated that training ViT models with large amounts of reanalysis data can generate better results than numerical weather prediction (NWP). However, to the best of the authors' knowledge, no studies have been reported on ViT-aided estimation of TC intensity.
It should be noted that the performance of ML models depends markedly on the quality and amount of input data. It is not uncommon for a model to perform well in some cases but poorly in others. Therefore, several ML techniques may be adopted concurrently to exploit the advantages of hybridization.
This study focuses on ML-aided estimation of TC intensity based on SCIs. ViT is adopted for the first time to estimate TC intensity, and its performance is examined through comparison with CNN. More importantly, special efforts are made to improve the estimation accuracy through the combined use of multiple hybrid strategies. The remainder of the article is organized as follows. Section 2 introduces the datasets, data pre-processing and evaluation methods. Section 3 presents and discusses the detailed performance of each ML model and ML-aided strategy. The main findings and conclusions are summarized in Section 4.
2. Methodology Statement
The adopted methodology is depicted in Figure 1 and consists mainly of four steps:
1. obtaining SCIs from open-source databases;
2. conducting pre-processing manipulations (i.e., data augmentation and segmentation);
3. training and validating different models;
4. analyzing and comparing the estimation results.
Two basic DL-aided strategies are used to estimate TC intensity (in terms of maximum sustained wind, or MSW):
- the one-stage strategy, which uses ViT or DCNN as the regression model to identify MSW directly;
- the two-stage strategy, which involves a classification stage that stratifies SCI samples into a few intensity groups, followed by a regression stage that specifies MSW.
Further efforts are made to improve the estimation accuracy by applying smoothing manipulations (via four techniques) to the two basic strategies and their fusion (i.e., a hybrid strategy).
The primary idea behind the two basic strategies is that input SCI samples are often unevenly distributed among intensity groups. Groups containing more credible samples tend to yield well-parameterized models and better estimation results. Groups with fewer samples, however, are likely to suffer from insufficient training and inferior model performance. The two-stage strategy is therefore expected to reduce the negative effects that the "resourceful" groups have on the groups with fewer samples.
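The routing logic of the two-stage strategy can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes hypothetical callables `classifier`, `regressor_stys` and `regressor_ty` that stand in for the trained ViT/DCNN networks. The 0.5 threshold follows the rule given for the classification model in Section 2.2.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: in the study these roles are played by trained
# ViT/DCNN networks; here any callable taking one preprocessed SCI works.
Classifier = Callable[[object], float]  # returns a score in [0, 1] for "STYS"
Regressor = Callable[[object], float]   # returns MSW in kt


def two_stage_estimate(sci: object, classifier: Classifier,
                       regressor_stys: Regressor, regressor_ty: Regressor,
                       threshold: float = 0.5) -> float:
    """Stage 1 assigns the SCI to an intensity group; stage 2 applies the
    regression model trained only on samples from that group."""
    if classifier(sci) > threshold:      # above the threshold -> STYS group
        return regressor_stys(sci)
    return regressor_ty(sci)             # at or below the threshold -> TY group


def batch_estimate(scis: Sequence[object], classifier: Classifier,
                   regressor_stys: Regressor, regressor_ty: Regressor) -> List[float]:
    """Apply the two-stage routing to every SCI in a sequence."""
    return [two_stage_estimate(s, classifier, regressor_stys, regressor_ty)
            for s in scis]


# Toy usage: each "SCI" is reduced to a single hypothetical feature in [0, 1].
clf = lambda s: s
reg_stys = lambda s: 64 + 60 * s
reg_ty = lambda s: 35 + 28 * s
print(batch_estimate([0.2, 0.9, 0.5], clf, reg_stys, reg_ty))
```

Each group-specific regressor only ever sees samples from its own intensity range. This is how the strategy is meant to shield the sparser STYS group from the dominant TY samples.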
The above idea tries to improve the estimation results during the identification process (process-oriented). Smoothing manipulation, in contrast, aims to refine the final results by fusing the outputs of different DL models, or those of the same model at different time steps (result-oriented).
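The two kinds of result-oriented smoothing can be illustrated with two generic operations. The first is a weighted average across models for the same SCIs. The second is a centred moving average over consecutive time steps of one TC. These are assumptions chosen for illustration, not necessarily any of the four smoothing techniques used in this study. The function names are hypothetical.

```python
from typing import List, Sequence


def fuse_models(preds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of several models' MSW estimates for the same SCIs.
    preds[m][t] is model m's estimate (kt) for sample t."""
    if len(preds) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n = len(preds[0])
    return [sum(w * p[t] for w, p in zip(weights, preds)) / total for t in range(n)]


def moving_average(series: Sequence[float], window: int = 3) -> List[float]:
    """Centred moving average over consecutive time steps of one TC.
    The window shrinks near both ends so the output keeps the input length."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        seg = series[lo:hi]
        out.append(sum(seg) / len(seg))
    return out


# Hypothetical hourly estimates (kt) from two models for one TC.
dcnn = [50.0, 60.0, 55.0, 70.0]
vit = [54.0, 58.0, 61.0, 66.0]
print(moving_average(fuse_models([dcnn, vit], [0.6, 0.4]), window=3))
```

With this sketch, fitting the smoothing models on the 2020–2021 part of the testing set would amount to choosing `weights` and `window` on that subset. Evaluation would then use 2018–2019. A centred window uses future frames, so a trailing window would be needed for real-time use.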
2.1. Datasets
2.1.1. Data Sources
The SCI data are obtained from the Archives of Weather Home, Kochi University, Japan (http://weather.is.kochi-u.ac.jp/archive-e.html, accessed on 30 July 2022). They were captured by the geostationary satellites "Himawari-8" and "MTSAT-1R" over the Northwest Pacific Ocean. Each grayscale infrared image (IR, 10.2–12.5 µm) contains 1800 × 1800 pixels covering a geographic area of 70° N–20° S, 70° E–160° E. In total, 222,212 images are used in this study, taken at 1 h intervals during the life-cycles of 546 TCs from 2000 to 2021.
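The image footprint implies a grid spacing of 90°/1800 = 0.05° per pixel in both latitude and longitude. The sketch below converts a TC centre position (e.g., from the JMA track data) into a pixel index and a crop window. It assumes, hypothetically, that the archive images use a linear latitude-longitude grid with the first row at 70° N and the first column at 70° E. The projection is not stated in the text. `latlon_to_pixel` and `crop_box` are illustrative helpers, not part of the authors' pipeline.

```python
LAT_N, LAT_S = 70.0, -20.0   # northern / southern image edges (70° N, 20° S)
LON_W, LON_E = 70.0, 160.0   # western / eastern image edges (70° E, 160° E)
SIZE = 1800                  # pixels per side

DEG_PER_PX = (LAT_N - LAT_S) / SIZE   # 90° / 1800 px = 0.05° per pixel


def latlon_to_pixel(lat: float, lon: float) -> tuple:
    """(row, col) of the pixel containing (lat, lon), ASSUMING a linear
    latitude-longitude grid with row 0 at 70° N and column 0 at 70° E."""
    if not (LAT_S <= lat <= LAT_N and LON_W <= lon <= LON_E):
        raise ValueError("point lies outside the image footprint")
    row = int((LAT_N - lat) * SIZE / (LAT_N - LAT_S))
    col = int((lon - LON_W) * SIZE / (LON_E - LON_W))
    # Points on the southern/eastern edge belong to the last pixel.
    return min(row, SIZE - 1), min(col, SIZE - 1)


def crop_box(lat: float, lon: float, half_width: int) -> tuple:
    """Hypothetical crop window (top, bottom, left, right) centred on a TC,
    clipped to the image; bottom/right are exclusive slice bounds."""
    r, c = latlon_to_pixel(lat, lon)
    return (max(0, r - half_width), min(SIZE, r + half_width),
            max(0, c - half_width), min(SIZE, c + half_width))


print(latlon_to_pixel(20.0, 130.0), crop_box(20.0, 130.0, 64))
```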
The corresponding label information, i.e., TC trajectory and intensity, is available from the Japan Meteorological Agency (JMA, Tokyo, Japan, https://www.data.jma.go.jp/, accessed on 15 June 2022). Intensity is defined as the 10 min mean MSW (unit: knot or kt; 1 kt = 1.852 km/h ≈ 0.514 m/s). These labels are updated every 3 or 6 h. Note that the MSW values are provided as integer multiples of 5 kt and are recorded as zero for MSW < 35 kt. JMA also stratifies TCs into four intensity categories according to MSW:
- typhoon (TY, 35–63 kt);
- strong typhoon (STY, 64–84 kt);
- very strong typhoon (VSTY, 85–104 kt);
- violent typhoon (VTY, ≥105 kt).
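The label conventions above can be encoded as simple lookup rules. The following sketch (function names are hypothetical) maps an MSW label to the JMA category as listed in the text. It also maps the label to the two-group split (STYS for MSW > 64 kt, TY for MSW < 64 kt) used by the two-stage strategy. Because labels come in 5 kt steps, the boundary value of 64 kt never occurs, so the two groups cover every label of 35 kt or more.

```python
KT_TO_KMH = 1.852           # 1 kt = 1.852 km/h exactly (1 nautical mile per hour)
KT_TO_MS = 1852.0 / 3600.0  # 1 kt ≈ 0.514 m/s


def jma_category(msw_kt: float):
    """Intensity category as listed in the text; None for 0-kt labels,
    which JMA uses when MSW < 35 kt."""
    if msw_kt < 35:
        return None
    if msw_kt <= 63:
        return "TY"
    if msw_kt <= 84:
        return "STY"
    if msw_kt <= 104:
        return "VSTY"
    return "VTY"


def two_stage_group(msw_kt: float):
    """Binary grouping for the two-stage strategy. Labels are multiples of
    5 kt, so 64 kt never occurs and '> 64' / '< 64' cover all labels >= 35."""
    if msw_kt < 35:
        return None
    return "STYS" if msw_kt > 64 else "TY"


for msw in (0, 40, 65, 90, 110):
    print(msw, jma_category(msw), two_stage_group(msw), round(msw * KT_TO_MS, 1))
```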
Besides the SCI datasets, this study also considers, for comparison purposes, the intensity data estimated via the ADT method (https://tropic.ssec.wisc.edu/real-time/adt/adt.html, accessed on 20 June 2022) and the SATCON method (https://tropic.ssec.wisc.edu/real-time/satcon/, accessed on 20 June 2022). These data are recorded at 30 min intervals. TC intensity in these data is expressed as the 1 min mean MSW, which differs from the 10 min mean MSW issued by JMA. The method presented by Harper et al. [30] is adopted to convert between 1 min mean and 10 min mean MSWs.
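A minimal sketch of the wind-averaging conversion follows, assuming a single multiplicative factor. The value 0.93 for converting an at-sea 1 min mean wind to a 10 min mean is the commonly quoted recommendation of the WMO guidelines by Harper et al. The exact factor used in this study is not given here, so it is exposed as a parameter.

```python
# Assumption: at-sea conversion factor between 1 min and 10 min mean winds.
# 0.93 is the commonly quoted value from the WMO guidelines of Harper et al.;
# the factor actually used in this study is not stated here.
ONE_TO_TEN_MIN = 0.93


def one_min_to_ten_min(msw_1min_kt: float, factor: float = ONE_TO_TEN_MIN) -> float:
    """Convert an ADT/SATCON-style 1 min mean MSW to a JMA-style 10 min mean."""
    return msw_1min_kt * factor


def ten_min_to_one_min(msw_10min_kt: float, factor: float = ONE_TO_TEN_MIN) -> float:
    """Inverse conversion, e.g. to compare JMA labels with 1 min estimates."""
    return msw_10min_kt / factor


print(one_min_to_ten_min(100.0))  # a 100 kt 1 min wind corresponds to about 93 kt
```

A constant factor is the simplest option. Guidelines of this kind distinguish exposures (at sea, off sea, off land), so a different factor may be appropriate near the coast.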
On the other hand, data augmentation aims to deal with the unbalanced distribution of SCI samples among different levels of TC intensity. Typically, the life-cycle of a TC consists of relatively long periods of low-to-moderate intensity and shorter episodes of high intensity. This unbalanced distribution of intensity, and therefore of SCI samples, can degrade the training quality of the models. Such models usually require the input samples to be evenly distributed along the key targeted parameter (i.e., TC intensity). As demonstrated in Figure 2c–j, eight specific data augmentation manipulations are employed in this study, including image flipping, multi-angle rotation and noise addition. They collectively (i.e., regardless of intensity category) increase the number of TC samples. Images of TCs in the TY intensity category are then randomly down-sampled to improve the balance of samples among different intensity categories.

There are two points to be stressed. First, to ensure the objectivity and credibility of the testing results, the dataset for testing (to be discussed in the following section) has only undergone the cropping manipulation. None of the aforementioned data augmentation operations have been applied to it, as artificial transformations tend to destroy TCs' morphological structures and make the processed SCIs physically meaningless. Second, although data augmentation does moderate the issues related to unbalanced distribution, samples of processed SCIs in categories with higher TC intensity levels are still lacking. By trial and error, better results are achieved when the SCI samples are stratified into two categories: one with MSW > 64 kt (referred to as STYS) and the other with MSW < 64 kt (referred to as TY). Thus, this stratification is exploited for the two-stage strategy.

2.1.3. Segmentation and Standardization
After pre-processing, the samples are segmented into three sets, i.e., a training set, a validation set and a testing set. These are used for training, validating and testing the DL networks, respectively. In total:
- 158,260 SCIs for 330 TCs from 2000 to 2013 are selected as the training set (referred to as TG hereafter);
- 52,032 SCIs for 113 TCs from 2014 to 2017 are selected for validation (VG);
- 11,921 SCIs for 103 TCs from 2018 to 2021 are selected for testing.
Basic information on the three sets is tabulated in Table 1. Note that for the smoothing strategy, the testing set is further divided into two parts. One part (SCIs in 2020–2021) is used to fit the smoothing models, while the other part (SCIs in 2018–2019) is used to test the performance of the smoothing models. It should be acknowledged that partitioning data by year introduces a certain degree of bias, given that more recent data tend to be of higher quality. More crucially, however, this strategy guarantees complete independence between the datasets.

Table 1. The number of samples.
Set          Years        TCs    SCI Samples
Train        2000–2013    330    158,260
Validation   2014–2017    113    52,032
Test         2018–2021    103    11,920

Meanwhile, both the pixel sizes and pixel values of SCIs in all three sets are standardized to meet the input requirements of the DL models. Each SCI is resized to 128 × 128 pixels for the DCNN model and 224 × 224 pixels for the ViT model, while the pixel values are normalized to the range [−1, 1]. The normalization also helps to improve convergence during model training.
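The augmentation, down-sampling and standardization steps described above can be sketched with NumPy. This is an illustrative sketch under several assumptions:
- the cropped SCI is an 8-bit grayscale array;
- flips, 90° rotations and Gaussian noise stand in for the eight manipulations of Figure 2c–j (arbitrary rotation angles would need interpolation, e.g. `scipy.ndimage.rotate`);
- nearest-neighbour resizing replaces whatever interpolation the authors used.

All function names and parameter values (e.g., `noise_sd`) are hypothetical.

```python
import numpy as np


def augment(img: np.ndarray, rng: np.random.Generator, noise_sd: float = 2.0) -> list:
    """Return illustrative augmented copies of one cropped training SCI (2-D array).
    Flips, 90-degree rotations and additive Gaussian noise are stand-ins;
    the study's exact manipulations and parameters are not reproduced here."""
    noisy = np.clip(img + rng.normal(0.0, noise_sd, size=img.shape), 0.0, 255.0)
    return [np.fliplr(img), np.flipud(img),
            np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3), noisy]


def downsample_group(labels, group: str, keep: int, rng: np.random.Generator) -> np.ndarray:
    """Indices after randomly keeping at most `keep` samples of `group`
    (e.g. "TY") and all samples of the other groups."""
    labels = np.asarray(labels)
    in_group = np.flatnonzero(labels == group)
    kept = rng.choice(in_group, size=min(keep, in_group.size), replace=False)
    return np.sort(np.concatenate([kept, np.flatnonzero(labels != group)]))


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize to size x size (e.g. 128 for DCNN, 224 for ViT);
    an image library with bilinear interpolation would normally be used."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def to_unit_range(img: np.ndarray, vmin: float = 0.0, vmax: float = 255.0) -> np.ndarray:
    """Linearly map pixel values from [vmin, vmax] to [-1, 1]
    (8-bit grey levels are assumed)."""
    return 2.0 * (img.astype(np.float64) - vmin) / (vmax - vmin) - 1.0


rng = np.random.default_rng(0)
sci = rng.integers(0, 256, size=(300, 300)).astype(np.float64)  # fake cropped SCI
dcnn_input = to_unit_range(resize_nearest(sci, 128))
vit_input = to_unit_range(resize_nearest(sci, 224))
print(dcnn_input.shape, vit_input.shape, len(augment(sci, rng)))
```

Consistent with the text, `augment` would be applied only to training data, never to the testing set.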
2.2. DCNN Model
CNN is a kind of ML network based on supervised learning. It has strong adaptability and is good at mining local features of data, extracting global features and performing classification. However, simple CNN becomes unable to meet the universality and accuracy of
Figure 3. Structures of the DCNN network.
Functionally, the input and hidden layers cooperate to extract potential features from the SCIs for identifying TC intensity. The output layer then makes judgments and decisions based on the extracted features. Characterizing TC intensity is essentially a regression problem, whereas stratifying SCIs into intensity groups (as in the two-stage strategy) is a classification problem. Therefore, two loss functions are adopted herein to quantify the consistency of predictions with the actual results: the mean squared error (MSE) loss function (Equation (1)) for the regression models and the cross-entropy loss function (Equation (2)) for the classification models:
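$$\mathrm{Loss}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 \tag{1}$$

$$\mathrm{Loss}_{\mathrm{cross\text{-}entropy}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic} \cdot \ln\left[p(y_{ic})\right] \tag{2}$$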
where:
- $\hat{y}_i$ is the prediction of the true MSW value $y_i$;
- $N$ is the number of SCI samples;
- $y_{ic}$ is the label of the $c$-th class for the $i$-th SCI (1 for a positive judgment and 0 for a negative judgment);
- $M$ is the number of categories;
- $p(y_{ic})$ denotes the predicted probability associated with $y_{ic}$, which can be expressed via the softmax function:
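$$p(y_{ic}) = \frac{\exp\left(f_{y_{ic}}\right)}{\sum_{k=1}^{M}\exp\left(f_{y_{ik}}\right)} \tag{3}$$

where $f_{y_{ic}}$ is the raw score of the model for prediction $y_{ic}$. It is calculated by the output layer from the 1000 × 1 output vector $\mathbf{x}$ (i.e., the characteristic vector) of the previous layers:

$$\left[f_{y_{i1}}, \ldots, f_{y_{iM}}\right]^{\mathsf{T}} = \mathbf{W}\mathbf{x} + \mathbf{b} \tag{4}$$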
in which W (with dimensions 2 × 1000) represents the coefficient matrix which quantifies
the weight for each element in x during the judging/prediction process, and b (with
dimensions 2 × 1) is the bias vector.
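Equations (1)–(4) can be checked numerically with a short NumPy sketch. The weights and features below are random placeholders, not trained values. The shapes follow the text: a 2 × 1000 matrix $\mathbf{W}$, a 2-element bias $\mathbf{b}$ and 1000-dimensional characteristic vectors. The max-subtraction in `softmax` is a standard numerical safeguard, not something stated in the original.

```python
import numpy as np


def mse_loss(y_hat, y) -> float:
    """Equation (1): mean squared difference between predictions and labels."""
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((y_hat - y) ** 2))


def softmax(scores: np.ndarray) -> np.ndarray:
    """Equation (3), applied row-wise. Subtracting each row's maximum avoids
    overflow in exp() without changing the result."""
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy_loss(scores: np.ndarray, onehot: np.ndarray) -> float:
    """Equation (2): scores are N x M raw scores, onehot holds the labels y_ic."""
    p = softmax(scores)
    return float(-np.mean(np.sum(onehot * np.log(p), axis=1)))


# Equation (4) for a batch: one 1000-dim characteristic vector per row of X.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, size=(2, 1000))  # 2 x 1000 coefficient matrix
b = np.zeros(2)                            # bias vector (2 entries)
X = rng.normal(size=(4, 1000))             # 4 hypothetical SCIs
scores = X @ W.T + b                       # N x 2 raw scores, one row per SCI
labels = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])  # hypothetical TY/STYS labels
print(cross_entropy_loss(scores, labels))
```

Frameworks such as PyTorch fold Equation (3) into the loss and expect raw scores. Applying softmax twice is a common bug that this formulation avoids.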
Both W and b are determined through training. In this study, the stochastic gradient descent (SGD) method is used to estimate the model parameters efficiently. SGD iteratively updates W and b by computing the gradients of the loss function and adjusting the parameters in the direction that reduces the loss. Moreover, the model involves a few hyperparameters, including the number of neural network nodes, the learning rate and the number of epochs. These hyperparameters are usually preset and adjusted empirically based on training results. Based on previous tests, the models in this study use a learning rate of 0.001 and a batch size of 64–128.
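The SGD update of $\mathbf{W}$ and $\mathbf{b}$ can be sketched for the output layer alone under the cross-entropy loss. The gradient with respect to the raw scores is the softmax output minus the one-hot label, averaged over the mini-batch. The defaults mirror the learning rate (0.001) and batch size (64) stated above. The function name, zero initialization, epoch count and seed are assumptions. In the full DCNN, the same gradient would also be back-propagated into the earlier layers.

```python
import numpy as np


def softmax(s: np.ndarray) -> np.ndarray:
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def sgd_output_layer(X, onehot, lr=0.001, batch_size=64, epochs=10, seed=0):
    """Mini-batch SGD for W (M x d) and b (M,) of Equation (4) under the
    cross-entropy loss of Equation (2). Only the output layer is trained
    here; a full DCNN back-propagates the same gradient into earlier layers."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = onehot.shape[1]
    W, b = np.zeros((m, d)), np.zeros(m)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], onehot[idx]
            grad_s = (softmax(xb @ W.T + b) - yb) / len(idx)  # dLoss/dscores
            W -= lr * grad_s.T @ xb                # step against the gradient
            b -= lr * grad_s.sum(axis=0)
    return W, b


# Toy usage with hypothetical features: class 1 whenever feature 0 is positive.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 16))
y = (X[:, 0] > 0).astype(int)
W, b = sgd_output_layer(X, np.eye(2)[y], lr=0.1, epochs=30)
print(((X @ W.T + b).argmax(axis=1) == y).mean())
```

With the small learning rate used in the study, many more epochs are needed on toy data like this. The larger `lr` in the usage line is only to make the example converge quickly.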
The model generates a predicted value between 0 and 1 for each SCI. Because the labels of the regression models are normalized using Min-Max scaling, the predictions must be converted back to the standard MSW scale (in kt) by inverse normalization. For the classification model, a threshold of 0.5 is used to determine whether the SCI belongs to the STYS or TY category. Samples with predictions greater than 0.5 are classified as STYS, while those with predictions less than or equal to 0.5 are classified as TY.
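The post-processing of model outputs can be sketched as follows. The Min-Max range is recorded on the training labels and then used to map the [0, 1] outputs back to kt. The 0.5 threshold rule assigns the classification output to STYS or TY. The label range in the usage lines is hypothetical.

```python
import numpy as np


def minmax_fit(labels_kt):
    """Record the label range (in kt) on the training set."""
    labels_kt = np.asarray(labels_kt, dtype=float)
    return float(labels_kt.min()), float(labels_kt.max())


def minmax_scale(msw_kt, lo, hi):
    """Forward Min-Max scaling of labels to [0, 1] before training."""
    return (np.asarray(msw_kt, dtype=float) - lo) / (hi - lo)


def minmax_inverse(pred01, lo, hi):
    """Map regression outputs in [0, 1] back to MSW in kt."""
    return lo + np.asarray(pred01, dtype=float) * (hi - lo)


def classify(pred01, threshold=0.5):
    """Classification rule: > threshold -> STYS, otherwise (including 0.5) -> TY."""
    return np.where(np.asarray(pred01, dtype=float) > threshold, "STYS", "TY")


lo, hi = minmax_fit([35, 60, 125])   # hypothetical training label range
print(minmax_inverse([0.0, 0.5, 1.0], lo, hi), classify([0.2, 0.5, 0.51]))
```

The range must come from the training set only. Refitting it on test data would silently shift every prediction.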
2.3. ViT Model
The Transformer is a novel neural network architecture that mainly uses the self-attention mechanism to extract internal features; its architecture is built primarily around this mechanism. Based on the input information, the self-attention mechanism first generates three vectors, namely Query (Q), Key (K) and Value (V), through matrix transformations. These vectors then undergo multiple matrix operations and weightings. Through these operations, the most significant information is enhanced while less relevant information tends to be weakened. This is similar to the dot product of two vectors: the result tends to be large for similar vectors and is zero for orthogonal vectors. By repeating such attention operations, the model outputs a set of feature vectors that selectively emphasize the salient information in the input.

ViT is essentially an extension of the standard Transformer to the vision field. Figure 4 shows the inner structure of a ViT model [27]. There are two main functional modules: the feature extraction module (i.e., Transformer Encoder) and the classification calculation module (right part of the figure). A typical ViT workflow involves the following procedures:
1. dividing the input images into blocks of a certain size;
2. reassembling the divided image blocks into a sequence;
3. passing the sequence to the multi-head self-attention for feature extraction;
4. performing classification.
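The Q/K/V mechanism described above corresponds to scaled dot-product self-attention. The single-head sketch below is illustrative. The random projection matrices stand in for learned weights. The division by the square root of the key dimension comes from the standard Transformer formulation and is not mentioned in the text. Multi-head self-attention runs several such heads in parallel and concatenates their outputs.

```python
import numpy as np


def softmax(s: np.ndarray, axis: int = -1) -> np.ndarray:
    z = s - s.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray):
    """Single-head scaled dot-product self-attention.
    X: (T, D) token vectors; Wq, Wk, Wv: (D, d) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # the three matrix transformations
    scores = Q @ K.T / np.sqrt(K.shape[1])      # dot products: large for similar vectors
    A = softmax(scores, axis=-1)                # attention weights; each row sums to 1
    return A @ V, A                             # weighted sum of values for every token


# Toy usage: 5 tokens of width 8 projected to width 4 with random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(out.shape, A.sum(axis=1))
```

With identity projections, two identical tokens attend to each other more strongly than to an orthogonal token. This is the dot-product intuition given in the text.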
Figure 4. Analysis of the overall structure of ViT.
Taking Figure 4 as an example, the left part of the figure (i.e., Patch + Position Embedding and Transformer Encoder) corresponds to the feature extraction process. The main function of "Patch + Position Embedding" is to divide the input image $\mathbf{x}$ into a number of sub-images $\mathbf{x}_p$. Here $\mathbf{x} \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ is the image size and $C$ is the number of channels, and $\mathbf{x}_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where $N$ (= 9 herein) is the number of sub-images and $P$ is the size of each sub-image. This step can also be implemented as a convolution, i.e., a sliding window with a specific step size. The sub-images are then transformed into long vectors by a linear transformation. Each vector is combined with a position-encoding vector, depicted by the numbers 1 to 9 in Figure 4. This position-encoding vector is learnable and is adjusted automatically during training. Each sub-image vector carrying position information is called a token. Notably, a special class (Cls) token is inserted at position 0 in Figure 4. It aggregates information from the entire input sequence into a single vector for the classification task. After patching and position embedding, the input tensor is processed by the Transformer Encoder. In the self-attention computation of the Transformer, each token is updated as an attention-weighted sum over all tokens in the sequence, generating the corresponding contextual representation. Finally, after several Transformer cycles, the classification information vector is passed to the classification computation module for scoring and generating the final output.
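The patch and position embedding step (Equation (5)) can be sketched in NumPy as follows. For illustration, the sketch uses a hypothetical 12 × 12 single-channel image with P = 4, which yields N = 9 patches as in Figure 4. The projection $\mathbf{E}$, the class token and $\mathbf{E}_{pos}$ are random or zero placeholders for parameters that ViT learns during training.

```python
import numpy as np


def patchify(img: np.ndarray, P: int) -> np.ndarray:
    """Split an (H, W, C) image into N = (H/P)*(W/P) flattened patches of
    length P*P*C, ordered row by row (positions 1..N in Figure 4)."""
    H, W, C = img.shape
    if H % P or W % P:
        raise ValueError("image height and width must be divisible by P")
    x = img.reshape(H // P, P, W // P, P, C)
    x = x.transpose(0, 2, 1, 3, 4)             # (patch rows, patch cols, P, P, C)
    return x.reshape(-1, P * P * C)            # (N, P^2 * C)


def embed(img: np.ndarray, P: int, E: np.ndarray,
          x_class: np.ndarray, E_pos: np.ndarray) -> np.ndarray:
    """Equation (5): z0 = [x_class; x_p^1 E; ...; x_p^N E] + E_pos."""
    tokens = patchify(img, P) @ E               # (N, D) projected patches
    z0 = np.vstack([x_class[None, :], tokens])  # prepend the class token -> (N+1, D)
    return z0 + E_pos


# Toy usage: 12 x 12 x 1 image, P = 4 -> N = 9 tokens, embedding width D = 8.
rng = np.random.default_rng(0)
img = rng.normal(size=(12, 12, 1))
D = 8
z0 = embed(img, 4, rng.normal(size=(16, D)), np.zeros(D), rng.normal(size=(10, D)))
print(z0.shape)  # class token + 9 patch tokens
```

Multiplying flattened non-overlapping patches by $\mathbf{E}$ is equivalent to a convolution with D filters whose kernel size and stride both equal P. This is the "convolution" view of the same step mentioned above.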
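$$\mathbf{z}_0 = \left[\mathbf{x}_{class};\, \mathbf{x}_p^1\mathbf{E};\, \mathbf{x}_p^2\mathbf{E};\, \cdots;\, \mathbf{x}_p^N\mathbf{E}\right] + \mathbf{E}_{pos}, \quad \mathbf{E} \in \mathbb{R}^{(P^2 \cdot C) \times D},\ \mathbf{E}_{pos} \in \mathbb{R}^{(N+1) \times D} \tag{5}$$

$$\mathbf{z}'_l = \mathrm{MSA}\left(\mathrm{LN}(\mathbf{z}_{l-1})\right) + \mathbf{z}_{l-1}, \quad l = 1, \ldots, L \tag{6}$$

$$\mathbf{z}_l = \mathrm{MLP}\left(\mathrm{LN}(\mathbf{z}'_l)\right) + \mathbf{z}'_l, \quad l = 1, \ldots, L \tag{7}$$

$$\mathbf{y} = \mathrm{LN}\left(\mathbf{z}_L^0\right) \tag{8}$$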
Mathematically, the above process can be summarized as Equation (5), where:
- $\mathbf{x}_{class}$ is the class token vector (marked by the asterisk in Figure 4);
- $\mathbf{x}_p^n$ ($n = 1, \ldots, N$) is the $n$-th sub-image;
- $\mathbf{E}$ is a linear projection (i.e., fully connected) layer, so that $\mathbf{x}_p^n\mathbf{E}$ is the transformed sub-image vector;
- $\mathbf{E}_{pos}$ is the position-encoding information;
- $\mathbf{z}_0$ is the processed input of the Transformer Encoder.

Next, the operations in Equations (6) and (7) are repeated $L$ times. MSA denotes the multi-head self-attention operation [32] of the Transformer, and MLP denotes the multi-layer perceptron. Layer normalization, denoted by LN, is applied before each of these operations. In Equations (6) and (7), $\mathbf{z}'_l$ is the output of the multi-head self-attention sub-block and $\mathbf{z}_l$ is the output of a complete Transformer block. The added terms $\mathbf{z}_{l-1}$ and $\mathbf{z}'_l$ are the corresponding residual connections. After the $L$ blocks, the class-token output $\mathbf{z}_L^0$ (the final classification information vector) is layer-normalized by Equation (8). The result $\mathbf{y}$ is regarded as a feature of the entire image and is used to carry out the classification or regression task.
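Equations (6)–(8) describe a pre-normalization Transformer encoder. The sketch below is a minimal NumPy rendering with random weights. To keep it short, it omits the MLP biases and the learnable LN scale and shift that a real ViT includes. It uses the tanh approximation of GELU, the non-linearity used in ViT's MLP. The parameter dictionary layout is a hypothetical choice.

```python
import numpy as np


def layer_norm(z, eps=1e-6):
    """LN: normalize each token vector to zero mean and unit variance
    (the learnable scale and shift are omitted for brevity)."""
    mu = z.mean(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(z.var(axis=-1, keepdims=True) + eps)


def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def msa(z, p, heads):
    """Multi-head self-attention: D is split into `heads` sub-spaces,
    attention is computed per head, and the heads are concatenated."""
    T, D = z.shape
    dh = D // heads

    def split(m):  # (T, D) -> (heads, T, dh)
        return m.reshape(T, heads, dh).transpose(1, 0, 2)

    q, k, v = split(z @ p["Wq"]), split(z @ p["Wk"]), split(z @ p["Wv"])
    a = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, T, T)
    out = (a @ v).transpose(1, 0, 2).reshape(T, D)       # concatenate heads
    return out @ p["Wo"]


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def encoder(z0, blocks, heads):
    """Apply Equations (6) and (7) once per block, then Equation (8)."""
    z = z0
    for p in blocks:                                                  # l = 1, ..., L
        z_prime = msa(layer_norm(z), p, heads) + z                    # Equation (6)
        z = gelu(layer_norm(z_prime) @ p["W1"]) @ p["W2"] + z_prime   # Equation (7)
    return layer_norm(z[0])                          # Equation (8): y = LN(z_L^0)


# Toy usage: 10 tokens (class token + 9 patches), width 8, 2 heads, L = 2 blocks.
rng = np.random.default_rng(0)
D = 8
shapes = {"Wq": (D, D), "Wk": (D, D), "Wv": (D, D), "Wo": (D, D),
          "W1": (D, 2 * D), "W2": (2 * D, D)}
blocks = [{k: rng.normal(0.0, 0.1, s) for k, s in shapes.items()} for _ in range(2)]
y = encoder(rng.normal(size=(10, D)), blocks, heads=2)
print(y.shape)
```

Only the class-token row of the final block output is normalized and returned. This mirrors how $\mathbf{z}_L^0$ serves as the image-level feature for the classification or regression head.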
2.5.2. SATCON
Advanced satellite consensus (SATCON) [12,34] combines ADT estimates with other satellite-based methods for estimating TC intensity, including those based on AMSU, SSMIS, and ATMS observations, and has developed into a global TC intensity ensemble estimation system. Specifically, SATCON uses statistical weighting to maximize the advantages (and minimize the disadvantages) of each technique and generates a consensus intensity estimate for various TC structures. Statistical validation indicates that the method is technically comparable to the Dvorak technique (DT) used by most meteorological organizations; in some cases, the algorithm can outperform the DT, and the root mean square error of its intensity estimates is also lower than that of most current techniques. In addition, the method can alert forecasters to rapid changes in TC intensity that traditional methods (such as the DT) may be unable to capture. Although SATCON performs better than other methods for estimating TC intensity, it still has some limitations, especially for real-time applications, as the estimation always depends on the availability of certain satellite data. As a result, it cannot operate continuously or provide timely feedback at all times.
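The idea of statistically weighting several member estimates can be sketched as follows. The inverse-error-variance weights, the member values and their RMSEs are hypothetical illustrations; SATCON's actual weighting scheme is described in [12,34] and is not reproduced here. The NaN handling mirrors the real-time limitation noted above: when a member's satellite data are missing, the consensus can only be formed from the remaining members.

```python
import numpy as np


def consensus(estimates, rmse):
    """Weighted consensus of member intensity estimates (kt).

    Weights are proportional to 1 / rmse**2 (inverse error variance), an
    illustrative assumption rather than the published SATCON weighting.
    Members whose data are unavailable (NaN) are dropped and the remaining
    weights are renormalised.
    """
    estimates = np.asarray(estimates, dtype=float)
    rmse = np.asarray(rmse, dtype=float)
    available = ~np.isnan(estimates)
    if not available.any():
        return float("nan"), np.zeros_like(estimates)  # no member available
    w = np.where(available, 1.0 / rmse**2, 0.0)
    w /= w.sum()
    return float(w[available] @ estimates[available]), w


# Hypothetical member estimates (kt) and their historical RMSEs (kt)
full, w_full = consensus([100.0, 90.0, 95.0], [10.0, 10.0, 20.0])
partial, w_partial = consensus([100.0, np.nan, 95.0], [10.0, 10.0, 20.0])
```

With all three members available the consensus is 95 kt; when the second member is missing, its weight is redistributed and the consensus shifts to 99 kt.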
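$$ \mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} \left| p_j - o_j \right| \qquad (10) $$
$$ \mathrm{MAPE} = \frac{100\%}{n}\sum_{j=1}^{n} \left| \frac{p_j - o_j}{o_j} \right| \qquad (11) $$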
Remote Sens. 2023, 15, 4188 10 of 26
where p_j represents the prediction (estimated value, analogous to the variable ŷ_i in Equation (1)) of the observation o_j (true value, analogous to the variable y_i in Equation (1)), and n represents the number of samples.
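A minimal NumPy sketch of the error metrics follows. It assumes RMSE takes its usual form (that equation is not reproduced in this section) and that R denotes the Pearson correlation coefficient between estimates and best-track values. The usage line shows why MAPE emphasizes weak TCs: the same 10 kt error contributes about 33% for a 30 kt sample but only about 8% for a 120 kt sample.

```python
import numpy as np


def regression_metrics(p, o):
    """RMSE, MAE, MAPE (%) and Pearson R of predictions p against observations o."""
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    err = p - o
    return {
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "MAE": float(np.mean(np.abs(err))),               # Eq. (10)
        "MAPE": float(100.0 * np.mean(np.abs(err / o))),  # Eq. (11), in percent
        "R": float(np.corrcoef(p, o)[0, 1]),
    }


# The same 10 kt error on a weak (30 kt) and a strong (120 kt) TC
m = regression_metrics([40.0, 130.0], [30.0, 120.0])
```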
For the classification models in this study, performance is quantified via precision (P), recall (R), and F1 score (F). Table 2 presents the confusion matrix that compiles the classifier results for calculating the PRF values. Here, N_TP, N_TN, N_FP and N_FN denote the numbers of true positive, true negative, false positive and false negative predictions, respectively. As the definitions in Equations (12)–(15) show, P indicates the accuracy of positive predictions, R represents the percentage of correctly identified positive samples among all positive samples, and the F1 score, as the harmonic mean of P and R, evaluates the overall performance of the model. Equation (12) additionally gives the overall accuracy.
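$$ \mathrm{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}} \qquad (12) $$
$$ \mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} \qquad (13) $$
$$ \mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}} \qquad (14) $$
$$ F1 = 2\,\frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \qquad (15) $$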
Table 2. Confusion matrix.
                     Predicted Positive    Predicted Negative
Actual Positive      N_TP                  N_FN
Actual Negative      N_FP                  N_TN
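Equations (12)–(15) translate directly into code. The sketch below uses hypothetical counts (100 actual positives and 100 actual negatives); note that it raises ZeroDivisionError if a class is never predicted or never present.

```python
def classification_metrics(n_tp, n_fn, n_fp, n_tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn)   # Eq. (12)
    precision = n_tp / (n_tp + n_fp)                         # Eq. (13)
    recall = n_tp / (n_tp + n_fn)                            # Eq. (14)
    f1 = 2 * recall * precision / (recall + precision)       # Eq. (15)
    return accuracy, precision, recall, f1


# Hypothetical counts
acc, p, r, f1 = classification_metrics(n_tp=80, n_fn=20, n_fp=10, n_tn=90)
```

Here accuracy is 0.85, precision 8/9, recall 0.8 and F1 = 2N_TP/(2N_TP + N_FP + N_FN) = 16/19, an equivalent form of Equation (15).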
Figure 5. TC intensity regression model learning curves: (a) DCNN; (b) ViT.
Figure 6. Estimations from validating (Val) and testing (test) processes of DCNN and ViT for the one-stage strategy, compared with best-track data. Red line denotes the linear fit of estimation as a function of best-track data: (a) DCNN validation, (b) ViT validation, (c) DCNN testing, and (d) ViT testing.
Results in Figure 6 also indicate that both the DCNN and ViT models tend to underestimate the TC intensity for samples with high intensity levels. This trend is consistent with the fact that the number of SCI samples decreases with increasing TC intensity.
To further explore the above finding, Figure 7 examines the distribution of estimation errors (in terms of both absolute and relative errors, i.e., RMSE and MAPE) with geographic coordinates. To better understand the results exhibited in the figure, Figure 8 depicts the appearance probability of TC geneses and of TCs with different intensity levels. From Figure 7, large RMSE values are mainly located in: (i) the Luzon peninsula and surrounding areas to its west/northwest, where TCs (usually with high intensity levels) are markedly influenced by landfall-related effects; (ii) the southeast of the Northwest Pacific, which is dominated by TC geneses; and (iii) the central south of the Northwest Pacific, where both TC geneses and stronger TCs usually occur. By contrast, the distribution of the relative error differs significantly from that of RMSE: almost all large MAPE values are located around the periphery of TC-influenced areas, where TCs tend to dissipate, whilst the central areas are characterized by small values. There are also some patches where large values of both RMSE and MAPE exist, e.g., (155°E, 9°N). From Figure 8, the appearance probability of TCs at these locations is quite low.
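Maps such as Figure 7 can be produced by binning samples by TC-centre position and computing the metrics per grid cell. The sketch below assumes a hypothetical 5° grid and sample values; the grid actually used in this study is not stated in this section. A min_count above 1 is one way to suppress the noisy, sparsely sampled patches described above, where the TC appearance probability is low.

```python
import numpy as np


def gridded_errors(lon, lat, p, o, lon_edges, lat_edges, min_count=1):
    """Per-cell RMSE and MAPE (%) on a lon/lat grid; NaN where data are sparse."""
    lon, lat, p, o = (np.asarray(a, dtype=float) for a in (lon, lat, p, o))
    ix = np.digitize(lon, lon_edges) - 1
    iy = np.digitize(lat, lat_edges) - 1
    ny, nx = len(lat_edges) - 1, len(lon_edges) - 1
    rmse = np.full((ny, nx), np.nan)
    mape = np.full((ny, nx), np.nan)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for j in range(ny):
        for i in range(nx):
            sel = inside & (iy == j) & (ix == i)
            if sel.sum() >= min_count:  # skip cells with too few samples
                e = p[sel] - o[sel]
                rmse[j, i] = np.sqrt(np.mean(e**2))
                mape[j, i] = 100.0 * np.mean(np.abs(e / o[sel]))
    return rmse, mape


# Hypothetical samples on a 5-degree grid over the Northwest Pacific
lon_edges = np.arange(100, 165, 5)  # 100E ... 160E
lat_edges = np.arange(0, 45, 5)     # 0N ... 40N
samples = dict(lon=[120.5, 121.0, 150.0], lat=[15.0, 16.0, 10.0],
               p=[100.0, 110.0, 40.0], o=[90.0, 100.0, 50.0])
rmse, mape = gridded_errors(**samples, lon_edges=lon_edges, lat_edges=lat_edges)
```

The single sample near (150°E, 10°N) yields a 20% MAPE cell on its own; with min_count=2 that cell is left empty instead.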
Figure 7. Geographic distribution of estimation errors for DCNN and ViT from the one-stage strategy: (a) RMSE for DCNN, (b) RMSE for ViT, (c) MAPE for DCNN, and (d) MAPE for ViT.
The above findings can be reasonably explained by: (a) the morphological structures of TC geneses and of TCs during or after landfall are much more complicated, and this complexity makes the input samples insufficient for adequately training versatile DL models; (b) the SCI samples of TCs with higher intensity levels are fewer than those of low-intensity TCs, which degrades the model performance for stronger TCs. It is worth noting that the use of image transformation during data pre-processing can reduce the negative influence of the imbalanced sample distribution to some extent. However, to improve the model substantially, more data covering each typical condition are still required.
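One way to use image transformation against the shortage of high-intensity samples is to oversample sparse intensity bins with rotated copies of existing SCIs. The sketch below is an illustration under assumed bin edges, target counts and toy images, not necessarily the pre-processing used in this study. Only 90° rotations are used because they preserve the cyclonic sense of rotation, whereas mirroring would reverse it.

```python
import numpy as np


def oversample_by_rotation(images, labels, bin_edges, target_per_bin, rng):
    """Top up sparse intensity bins with rotated copies of existing SCIs.

    images: (n, H, H) square cloud images; labels: MSW (kt).
    Rotation by 90/180/270 degrees keeps the cyclonic sense of rotation,
    whereas mirroring would reverse it, so flips are not used here.
    """
    labels = np.asarray(labels, dtype=float)
    bins = np.digitize(labels, bin_edges)
    new_x, new_y = [images], [labels]
    for b in np.unique(bins):
        members = np.flatnonzero(bins == b)
        for _ in range(max(target_per_bin - members.size, 0)):
            i = rng.choice(members)
            k = int(rng.integers(1, 4))  # number of 90-degree turns
            new_x.append(np.rot90(images[i], k=k)[None])
            new_y.append(labels[i:i + 1])
    return np.concatenate(new_x), np.concatenate(new_y)


rng = np.random.default_rng(0)
images = rng.random((5, 4, 4))                      # hypothetical tiny SCIs
labels = np.array([30.0, 40.0, 50.0, 70.0, 120.0])  # MSW (kt)
edges = [0, 64, 100, 200]                           # hypothetical intensity bins
x_aug, y_aug = oversample_by_rotation(images, labels, edges, 3, rng)
```

Rotated copies add no new cloud morphologies, which is consistent with the caveat above that more data covering each typical condition are still required.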
Figure 8. Geographic distribution of appearance probability of: (a) TCs, (b) TC genesis, (c) TCs with
MSW > 65 kt, and (d) TCs with MSW > 80 kt.
accuracy of DCNN is similar to that reported by Wang [24] via CNN, while the ViT model performs better than DCNN, especially for samples with higher intensity levels (by 2.5%). Both models perform noticeably better for the TY category than for STYS, which is attributed to the fact that the former category contains a larger amount of data and allows the models to be trained more efficiently.
Figure 9. Learning curves of classification models: (a) DCNN; (b) ViT.
Table 4. Overall performance of the DCNN and ViT classification models.
features of the TC cloud, which makes the classification more challenging. Except for the STY category, the recall of both models tends to increase with increasing intensity level, which is consistent with the fact that the morphological characteristics of TCs from two more widely separated intensity categories differ from each other more clearly.
DCNN                      True label
Predicted label    TY      STY     VSTY    VTY     Sum
TY                 6600    768     150     22      7540
STYS               1041    1486    1326    527     4380
Sum                7641    2254    1476    549     11920

ViT                       True label
Predicted label    TY      STY     VSTY    VTY     Sum
TY                 6775    729     130     17      7651
STYS               866     1525    1346    532     4269
Sum                7641    2254    1476    549     11920
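The per-category recall discussed above can be recomputed from these two confusion matrices. The sketch assumes, as the table layout suggests, that the predicted group STYS collects the STY, VSTY and VTY categories, so a sample of any of those categories counts as correct when it is predicted as STYS.

```python
import numpy as np

# Rows: predicted group (TY, STYS); columns: true category (TY, STY, VSTY, VTY)
confusion = {
    "DCNN": np.array([[6600, 768, 150, 22],
                      [1041, 1486, 1326, 527]]),
    "ViT": np.array([[6775, 729, 130, 17],
                     [866, 1525, 1346, 532]]),
}


def category_recall(m):
    # A TY sample is correct when predicted TY (row 0); STY, VSTY and VTY
    # samples are correct when predicted STYS (row 1)
    correct = np.concatenate(([m[0, 0]], m[1, 1:]))
    return correct / m.sum(axis=0)


def accuracy(m):
    return (m[0, 0] + m[1, 1:].sum()) / m.sum()


for name, m in confusion.items():
    print(name, np.round(category_recall(m), 3), round(float(accuracy(m)), 3))
```

The overall accuracy is about 0.834 for DCNN and 0.854 for ViT, a difference of about 2%, and for both models STY has the lowest recall while TY, VSTY and VTY follow the increasing trend described above.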
majority of input samples, whilst samples outside the majority categories tend to be classified as belonging to the majority.
Figure 10. Boxplots of estimation bias for different models via two-stage strategies: (a) DCNN_DCNN, (b) DCNN_ViT, (c) ViT_ViT, and (d) ViT_DCNN.
Figure 11 examines the geographic distribution of estimation errors from two two-stage strategies, i.e., ViT_DCNN and ViT_ViT. Compared with the results in Figure 7, no evident differences are found, although the overall errors in Figure 11 are slightly larger than those in Figure 7.
Figure 11. Geographic distribution of estimation errors from two two-stage strategies: (a) RMSE for ViT_DCNN, (b) RMSE for ViT_ViT, (c) MAPE for ViT_DCNN, and (d) MAPE for ViT_ViT.
3.3. Smoothing Manipulation
3.3.1. Based on One-Stage Strategy
As shown in Figure 12, all the applied smoothing methods resulted in improved estimates, with ViT showing a more significant improvement. Among these methods, linear weighting and MLP fitting produced the most notable improvements, reducing the RMSE of DCNN by approximately 9% and that of ViT by almost 14%. Moreover, a comparison with Figure 6 indicates that smoothing was effective in reducing underestimation errors for high-intensity samples.
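The smoothing methods themselves are not defined in this section. Purely as one possible reading of linear weighting, and assuming the raw estimates of each TC are ordered in time, the sketch below learns linear weights over a short causal window of consecutive estimates by least squares. The window length, start-of-track padding, intercept and synthetic track are assumptions. Replacing the least-squares fit with a nonlinear regressor on the same window features (for example scikit-learn's GradientBoostingRegressor, RandomForestRegressor or MLPRegressor) would give GB-, RF- and MLP-style variants.

```python
import numpy as np


def windows(est, w):
    # Row t holds the raw estimates at times t-w+1, ..., t of one TC track;
    # the first value is repeated to pad the start of the track
    padded = np.concatenate([np.full(w - 1, est[0]), est])
    return np.stack([padded[t:t + w] for t in range(est.size)])


def fit_linear_weights(est_tracks, obs_tracks, w=3):
    # Least-squares weights (plus an intercept) that map a window of raw
    # estimates to the best-track intensity at the window's last time step
    X = np.vstack([windows(np.asarray(e, dtype=float), w) for e in est_tracks])
    y = np.concatenate([np.asarray(o, dtype=float) for o in obs_tracks])
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def smooth(est, coef):
    X = windows(np.asarray(est, dtype=float), coef.size - 1)
    return X @ coef[:-1] + coef[-1]


# Hypothetical track: true MSW rises smoothly, raw estimates are noisy
rng = np.random.default_rng(1)
obs = np.linspace(30.0, 120.0, 60)
est = obs + rng.normal(0.0, 8.0, obs.size)
coef = fit_linear_weights([est], [obs], w=3)
rmse_raw = np.sqrt(np.mean((est - obs) ** 2))
rmse_smooth = np.sqrt(np.mean((smooth(est, coef) - obs) ** 2))
```

Because the identity weighting (0, 0, 1, intercept 0) is one of the candidate solutions, the in-sample smoothed RMSE can never exceed the raw RMSE; a fair evaluation fits the weights on training tracks and scores held-out tracks.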
Figure 12. Smoothed estimations from the testing process of DCNN and ViT for the one-stage strategy via different smoothing methods: (a,e) using linear weighting; (b,f) using GB; (c,g) using RF; (d,h) using MLP, compared with best-track data.
Similarly, Figure 13 examines the estimation errors after smoothing for the DCNN and ViT models. A comparison with Figure 7 reveals that the RMSE is significantly lower in the central as well as the southern region of the Northwest Pacific, indicating that the smoothing method is effective in reducing errors for high-intensity samples. However, there is little change in the MAPE before and after smoothing, as shown in Figure 13. Generally, MAPE is more sensitive to observations exhibiting abrupt intensity changes, which are prone to occur during coastal landfalls or during the initial phases of rapid intensification in the open ocean. Consequently, the smoothing methods may not substantially improve the estimates in these specific scenarios.
Figure 13. Geographic distribution of smoothed estimation errors for DCNN_Linear and ViT_MLP from the one-stage strategy: (a) RMSE for DCNN_Linear, (b) RMSE for ViT_MLP, (c) MAPE for DCNN_Linear, and (d) MAPE for ViT_MLP.
3.3.2. Based on Two-Stage Strategy
In this section, we examine the smoothed estimates of the two-stage strategy models. Table 8 presents the evaluation indices for the various model combinations. For brevity, we denote the combinations by abbreviations of the form X_Y_Z, where X is the classification model, Y is the regression model, and Z is the smoothing method. Since DCNN performs better than ViT in the one-stage strategy, DCNN is used as the regression model.
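The routing logic of the two-stage strategy, and the correct-versus-misclassified MAE split discussed in this section, can be sketched as follows. The callables, the 64 kt group boundary and the toy data are hypothetical stand-ins for the trained DCNN/ViT models; the sketch only illustrates how a misclassified sample is sent to a regressor tuned for the wrong intensity range.

```python
import numpy as np


def two_stage_estimate(x, classify, regressors):
    """Stage 1 assigns an intensity group; stage 2 applies that group's regressor.

    classify(x) -> integer group ids; regressors: {group id: callable -> MSW (kt)}.
    Both interfaces are hypothetical placeholders for trained DL models.
    """
    groups = classify(x)
    est = np.full(len(x), np.nan)
    for g, reg in regressors.items():
        sel = groups == g
        if sel.any():
            est[sel] = reg(x[sel])
    return est, groups


def mae_split(est, obs, groups, true_groups):
    # MAE of correctly classified vs misclassified samples
    err = np.abs(est - obs)
    ok = groups == true_groups
    return err[ok].mean(), err[~ok].mean()


def classify(f):
    # Toy classifier: group 1 if the rough guess exceeds 64 kt
    return (f[:, 0] > 64).astype(int)


def weak_regressor(f):
    return f[:, 0] + 2.0


def strong_regressor(f):
    return f[:, 0] - 5.0


# Toy stand-ins: the "feature" is a rough intensity guess (kt)
x = np.array([[30.0], [50.0], [90.0], [110.0]])
obs = np.array([30.0, 70.0, 90.0, 110.0])
est, groups = two_stage_estimate(x, classify, {0: weak_regressor, 1: strong_regressor})
mae_ok, mae_wrong = mae_split(est, obs, groups, (obs > 64).astype(int))
```

In the toy data, the 70 kt sample is misclassified into the weak group, so its error (18 kt) is far larger than the 4 kt MAE of the correctly classified samples, qualitatively matching the gap between correctly classified and misclassified samples reported for ViT.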
Table Table 8 shows
8. Overall that the
performance optimal
of the model
two-state combination
strategy is using ViT for classification,
smoothed models.
DCNN for regression, and MLP for smoothing. However, even with the best combination
Model RMSE (kt) MAE (kt) MAPE R
model, the performance is still not better than the one-state strategy best model after
D_D_Linear 12.07 9.16 0.16
smoothing linear weighting. Our analysis indicated that the classification performance 0.84
has aD_D_GB
significant impact12.05
on the results. For 9.17
example, while0.16 0.84
ViT is only 2% more accurate
D_D_RF 12.00 9.15 0.16 0.84
thanD_D_MLP
DCNN in the classification
12.13 model, it can reduce the 0.16
9.15 RMSE by almost0.84 1 kt. Further
analysis revealed that the
V_D_Linear MAE of TC samples
10.64 8.10 correctly identified
0.15 by ViT was 0.88only 7.4 kt,
but for misclassified samples,
V_D_GB 10.90 it was as high 8.29 as 17.3 kt. This
0.15suggests that enhancing
0.87 the
V_D_RF 10.85 8.30 0.15
classification’s performance can significantly improve the TC estimation. 0.87
V_D_MLP 10.65 8.09 0.15 0.88
Table 8. Overall performance of the two-state strategy smoothed models.
Table 8 shows that the optimal model combination is using ViT for classification,
DCNN for Model
regression, and MLP RMSE (kt)
for smoothing. MAE (kt)
However, even with MAPE R
the best combination
model, D_D_Linear
the performance is still not 12.07 9.16
better than the one-state 0.16
strategy best model 0.84
after
smoothing linear weighting. Our 12.05
D_D_GB analysis indicated 9.17
that the classification
0.16 performance
0.84
has a significant
D_D_RF impact on the results.
12.00For example, 9.15
while ViT is only0.162% more accurate
0.84
than DCNN in
D_D_MLP the classification model,
12.13 it can reduce the
9.15 RMSE by almost
0.16 1 kt. Further
0.84
analysisV_D_Linear
revealed that the MAE of TC samples correctly
10.64 8.10identified by ViT
0.15was only 7.4 kt,
0.88
but for misclassified samples, it was as high as 17.3 kt. This suggests that enhancing the
V_D_GB 10.90 8.29 0.15 0.87
classification’s performance can significantly improve the TC estimation.
V_D_RF 10.85 8.30 0.15 0.87
Figure 14 depicts the distribution of estimation errors of the two best models in this
V_D_MLP
section. In contrast to Figure 11, the 10.65
smoothing method 8.09applied to the0.15 0.88
two-state strategy
Figure 14 depicts the distribution of estimation errors of the two best models in this section. In contrast to Figure 11, the smoothing method applied to the two-stage strategy results in reduced RMSE and MAPE in the central and southern regions of the Northwest Pacific. However, the estimates do not show improvement across regions compared with the smoothed estimation errors for the one-stage strategy shown in Figure 13.
Figure 14. Geographic distribution of smoothed estimation errors for V_D_Linear and V_D_MLP from the two-stage strategy: (a) RMSE for V_D_Linear, (b) RMSE for V_D_MLP, (c) MAPE for V_D_Linear, and (d) MAPE for V_D_MLP.
3.3.3. Smoothed Estimation Based on Hybrid Strategies
Further, we conduct an in-depth exploration of hybrids of the one-stage and two-stage strategies. Figure 15 illustrates the scatter plots of these approaches and the corresponding error results. Similar to Table 8, Figure 15 also employs abbreviations to denote the combinations of models. In the abbreviation V_D_D, 'V' indicates the use of ViT as the classification model, the first 'D' denotes the use of DCNN as the regression model in the two-stage strategy, and the second 'D' represents estimation conducted directly using DCNN in the one-stage strategy. The complete abbreviation thus indicates a hybrid of the one-stage and two-stage strategies.
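How the one-stage and two-stage estimates are merged in V_D_D is not detailed in this section. Purely as a hypothetical illustration, the sketch below blends them with a single weight alpha chosen on validation data; the merge rule, the grid of candidate weights and the data are assumptions, and any smoothing would be applied to the blended series afterwards.

```python
import numpy as np


def hybrid_estimate(two_stage, one_stage, alpha):
    # Convex combination of the two strategies' estimates (assumed merge rule)
    return alpha * np.asarray(two_stage, float) + (1 - alpha) * np.asarray(one_stage, float)


def choose_alpha(two_stage, one_stage, obs, grid=np.linspace(0.0, 1.0, 11)):
    # Pick the weight with the lowest validation RMSE
    obs = np.asarray(obs, dtype=float)
    rmses = [np.sqrt(np.mean((hybrid_estimate(two_stage, one_stage, a) - obs) ** 2))
             for a in grid]
    return float(grid[int(np.argmin(rmses))])


# Hypothetical validation data whose two error series partly cancel
obs = np.array([50.0, 60.0, 70.0, 80.0])
two_stage = obs + np.array([4.0, -4.0, 4.0, -4.0])
one_stage = obs + np.array([-4.0, 4.0, -4.0, 4.0])
alpha = choose_alpha(two_stage, one_stage, obs)
```

Blending helps most when the errors of the two strategies are not perfectly correlated; in the toy data they cancel exactly at alpha = 0.5.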
Figure 15a displays the hybrid strategy without any smoothing method, which already outperforms the other two strategies. Next, four smoothing treatments were applied to the hybrid strategy and, as anticipated, all results in Figure 15b–e outperformed those of the previous strategies. It is worth noting that linear fitting and MLP smoothing are still the best-performing methods in the hybrid strategies.
We further compared the error distributions across latitudes and longitudes. The results presented in Figure 16 demonstrate that the error is reduced over almost the entire central region of the Northwest Pacific, with MAPE showing a particularly noticeable improvement. Although the hybrid strategy performs better than the strategies in Figures 7, 11, 13 and 14, the TC estimates for near-coastal areas do not show any significant improvement. Apart from the influence of abrupt intensity changes in coastal regions, the quantity or quality of SCIs in these areas might also introduce disturbances to the results.
Figure 15. Smoothed estimations from the testing process of hybrid strategy models, compared with best-track data: (a) V_D_D, (b) V_D_D_Linear, (c) V_D_D_GB, (d) V_D_D_RF, and (e) V_D_D_MLP.
Figure 16. Geographic distribution of smoothed estimation errors for V_D_D_Linear and V_D_D_MLP from the hybrid strategy: (a) RMSE for V_D_D_Linear, (b) RMSE for V_D_D_MLP, (c) MAPE for V_D_D_Linear, and (d) MAPE for V_D_D_MLP.
3.4. Comparison with Other Techniques
The best estimation results obtained in this study (i.e., via V_D_D_Linear and V_D_D_MLP) are compared with their counterparts from various techniques in other studies. The reference sources are selected to account for conditions over the Northwest Pacific Ocean. As reflected in Table 9, SATCON achieves the best performance (lowest RMSE) among all the reference sources, while our hybrid strategies outperform all the other methods and yield a slightly lower MAE than SATCON (7.52 and 7.51 kt versus 7.70 kt). The best model proposed
in this study surpasses methods such as VGG19 and TCIENet owing to two primary factors. First, our approach uses the advanced ViT classifier, which integrates an attention mechanism capable of capturing global information and thereby enhances the sample classification capability. The incorporation of ViT thus contributes to reducing the final error of our hybrid strategy. Second, this study introduces a smoothing technique. Given that TCs evolve gradually, applying the smoothing technique to the model output yields more stable estimates and consequently leads to a notable decrease in the overall TC estimation error.
It should be stressed that estimation results from DL models are strongly influenced by factors such as data sources and sample selection, which makes it difficult to compare these results objectively and fairly. For instance, there are variations in the selection of labels, the best-track data, and the wind-averaging duration involved in the definition of MSW. Additionally, the types of adopted images may vary: while some studies use only IR images, others also incorporate WV satellite cloud images, among others. Finally, the test samples may also differ: while some studies use recent TCs as the test set, others select TC samples from the past. These factors inevitably introduce biases in both image quality and label precision.
Comparison of the results in Figures 17 and 18 with those in Table 9 also reveals some discrepancies. Typically, the proposed DL-aided methods demonstrate superior performance over SATCON in Figures 17 and 18, whereas SATCON shows the lowest RMSE in Table 9. This discrepancy is likely attributable to the use of different baseline data for evaluating the various techniques. In principle, the best way is to compare the estimation results with in situ data. However, such records are usually unavailable in the Northwest Pacific basin. Moreover, the selection of SCIs in this study unavoidably introduces deviations in relation to labels from varying sources. In such cases, the baseline data can be selected from a TC best-track dataset. As the best-track data issued by JMA and CMA are suggested to be more reliable for TCs over the Northwest Pacific basin [35,36], they are used as the baseline data in this study to train, validate and test the DL-aided models.
the initial stage of the TCs, most of the methods (in particular the DL methods) overestimate TC intensity slightly. Second, ADT and SATCON show significant overestimation when TC intensity exceeds ~100 kt. Third, the DL models are more likely to underestimate TCs at high-intensity status.
Figure 17. Histogram of errors obtained via different techniques: (a) RMSE, (b) MAE, (c) MAPE, and (d) R coefficient.
Figure 18. Boxplots of estimation bias for different techniques: (a) for TY intensity samples, (b) for STY intensity samples, (c) for VSTY intensity samples, and (d) for VTY intensity samples.
Table 9. Comparison of the best estimation performance in this study with those in references.
Model           RMSE (kt)    MAE (kt)    TC Year      Reference
ADT9.0          11.24        8.67        2018         Olander et al. [4]
DAV-T           14.3         -           2007–2011    Ritchie et al. [37]
SATCON          8.9          7.70        2008–2010    Velden and Herndon [12]
TCIENet         10.12        7.94        2017         Zhang and Liu [25]
CNN-TC          12.25        -           2015–2016    Chen et al. [23]
VGG19           13.23        -           2015–2016    Combinido et al. [21]
CNN ¹           10.19        -           2015–2018    Wang et al. [24]
V_D_D_Linear    9.81         7.52        2018–2019    This study
V_D_D_MLP       9.85         7.51        2018–2019    This study
¹ Dataset is split randomly in the given proportion.
Figure 19. Comparison of estimations via varied methods for four TCs: (a) Mangkhut in 2018, (b) Yutu in 2018, (c) Wutip in 2019, and (d) Hagibis in 2019.
To further explore the phenomena observed in Figure 19, we scrutinize the variations of estimation results, together with the associated SCIs, among different developing stages, i.e., the formation, mature and dissipation stages, as shown in Figure 20. For TCs at both the formation and dissipation stages, the morphological structures of the TC cloud are manifold, and the samples become relatively insufficient to train versatile DL models. During the mature stage, the MSW values for all four cases exceed 100 kt. Evidently, there are too few such samples to train the DL models adequately, so they are more likely to underestimate TC intensity.
Figure 20. Estimation errors for 4 TCs at varied developing stages (A, B, C represent the formation, mature, and dissipation stages): (a) Mangkhut in 2018; (b) Yutu in 2018; (c) Wutip in 2019; (d) Hagibis in 2019.
4. Concluding Remarks
In this study, we exploited two mainstream DL models, i.e., DCNN and ViT, together with several smoothing techniques to estimate TC intensity from SCIs. Several strategies were proposed to improve the estimation performance, including the one-stage strategy, the two-stage strategy and a hybrid strategy combining the above strategies with smoothing manipulations. The main results and conclusions are summarized below.
(1) For the one-stage strategy, both DCNN and ViT were used as regression models. Results suggested that DCNN slightly outperformed ViT, with the RMSE for ViT being approximately 1 kt larger than that for DCNN.
(2) For the two-stage strategy, a classification model and a regression model were combined
to first classify input samples into several intensity groups and then specify the TC
intensity. Despite the reasonable idea behind this strategy, it did not further improve
the model performance: its minimum RMSE was slightly larger (by 0.6 kt) than that of
DCNN for the one-stage strategy.
(3) We further exploited different smoothing methods to refine the outputs of either the
regression/classification models or their combinations. The results demonstrated that
the DCNN regression model with the linear weighting and MLP methods outperformed
the optimal model for the one-stage strategy, reducing the RMSE by 1.08 kt and 1.00 kt,
respectively.
(4) We also combined the one-stage strategy, the two-stage strategy and smoothing
manipulations to form the V_D_D_Linear and V_D_D_MLP hybrid strategies. These hybrid
strategies yielded the best performance in this study, with an RMSE of 9.81 kt.
(5) Finally, the model performance presented in this study was compared with that reported
in previous studies. The results showed that the DL model performed better than most
existing methods.
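Item (2) describes a two-stage pipeline in which a classifier first assigns each SCI to an intensity group and a group-specific regressor then estimates the intensity. The sketch below only illustrates this routing logic under loudly stated assumptions: a "sample" is reduced to a single hypothetical scalar feature, and the classifier, the 0.5 group boundary and the linear regressors are invented stand-ins for trained DCNN or ViT models.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type aliases: a "sample" stands in for a preprocessed SCI.
Sample = Sequence[float]
Classifier = Callable[[Sample], int]
Regressor = Callable[[Sample], float]


def two_stage_estimate(sample: Sample, classify: Classifier,
                       regressors: Dict[int, Regressor]) -> float:
    """Stage 1 picks an intensity group; stage 2 applies that group's regressor."""
    group = classify(sample)
    if group not in regressors:
        raise KeyError(f"no regressor for intensity group {group}")
    return regressors[group](sample)


# Toy stand-ins for trained models (invented for illustration only).
def toy_classifier(sample: Sample) -> int:
    return 0 if sample[0] < 0.5 else 1  # 0 = weaker group, 1 = stronger group


toy_regressors: Dict[int, Regressor] = {
    0: lambda s: 30.0 + 40.0 * s[0],
    1: lambda s: 60.0 + 80.0 * s[0],
}

print(two_stage_estimate([0.25], toy_classifier, toy_regressors))
print(two_stage_estimate([0.75], toy_classifier, toy_regressors))
```

Because the second stage trusts the group label, a misclassified sample is passed to a regressor trained on a different intensity range; such error propagation is one possible, unverified reason why a two-stage pipeline need not beat a single well-trained regressor.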
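Items (3) and (4) rely on smoothing the raw model outputs. The sketch below assumes one simple form of linear weighting, a centered weighted moving average over the estimates obtained from consecutive SCIs of the same TC; it is not necessarily the formulation used in this study, and the function name `linear_weighted_smooth` and its default weights are hypothetical.

```python
from typing import List, Sequence


def linear_weighted_smooth(estimates: Sequence[float],
                           weights: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    """Centered weighted moving average over a time-ordered sequence of estimates.

    Where the window is truncated at the start or end of the sequence, the
    remaining weights are renormalized so that they still sum to one.
    """
    if len(weights) % 2 != 1:
        raise ValueError("weights must have odd length")
    half = len(weights) // 2
    n = len(estimates)
    smoothed = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half  # index of the neighboring time step
            if 0 <= j < n:
                num += w * estimates[j]
                den += w
        smoothed.append(num / den)
    return smoothed


# Toy raw estimates (kt) for four consecutive images; invented for illustration.
raw = [60.0, 80.0, 70.0, 90.0]
print(linear_weighted_smooth(raw))
```

Fixed weights keep the method transparent; in a learned variant, the weights could instead be produced by a trained model.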
Although better estimation performance has been achieved through the combined usage of
multiple DL techniques and strategies, it should be noted that fundamental improvements
in ML-aided estimation of TC intensity must ultimately come from advances in either the
quantity/quality of the training data or the ML models themselves. Thus, the DL models and
their hybrids discussed in this study could be further optimized by using: (i) more credible
data (e.g., aircraft observations) instead of traditional best-track records as the label
information for SCIs; (ii) a larger amount of data and more types of SCIs (e.g., enhanced SCIs
and WV images); and (iii) additional physical knowledge and/or other kinds of input
information that affect TC intensity (e.g., sea surface temperature, vorticity, and vertical
wind shear). Meanwhile, more advanced DL models could be adopted, such as the Swin
Transformer [38] and DeiT [39], which have been shown to possess clear advantages over
DCNN or ViT in certain respects.
Author Contributions: B.T., investigation, visualization, data curation, and writing—original draft.
J.F., funding acquisition, project administration, and supervision. Y.D., methodology development
and investigation. Y.H. (Yongjun Huang), visualization and data curation. P.C., data curation. Y.H.
(Yuncheng He), formal analysis, writing—editing, conceptualization, funding acquisition, and project
administration. All authors have read and agreed to the published version of the manuscript.
Funding: The authors wish to acknowledge the financial support provided by the National Science
Fund for Distinguished Young Scholars (Grant No: 51925802), the National Natural Science
Foundation of China (Grant No: 52178465), the Natural Science Foundation of Guangdong Province for
Distinguished Young Scholars (Grant No: 2023B1515020117), the Guangzhou Municipal Science and
Technology Project (Grant No: 202201021330190101) and the Ministry of Education, China-111 Project
(Grant No: D21021).
Data Availability Statement: The data utilized in this research can be accessed openly from
multiple sources. The primary sources include the Archives of Weather Home at Kochi University,
Japan (http://weather.is.kochi-u.ac.jp/archive-e.html, accessed on 30 July 2022), the Japan
Meteorological Agency (JMA, https://www.data.jma.go.jp/, accessed on 15 June 2022), as well as
the ADT (https://tropic.ssec.wisc.edu/real-time/adt/adt.html, accessed on 20 June 2022) and
SATCON methods (https://tropic.ssec.wisc.edu/real-time/satcon/, accessed on 20 June 2022).
Acknowledgments: The authors would like to thank their colleagues who offered suggestions on
this paper and the developers who generously made their source code available to researchers.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Dvorak, V.F. Tropical cyclone intensity analysis using satellite data. In NOAA Technical Report NESDIS, 11; US Department of
Commerce, National Oceanic and Atmospheric Administration, National Environmental Satellite, Data, and Information Service:
Washington, DC, USA, 1984; pp. 1–47.
2. Velden, C.S.; Olander, T.L.; Zehr, R.M. Development of an objective scheme to estimate tropical cyclone intensity from digital
geostationary satellite infrared imagery. Weather Forecast. 1998, 13, 172–186. [CrossRef]
3. Velden, C.; Harper, B.; Wells, F.; Beven, J.L.; Zehr, R.; Olander, T.; Mayfield, M.; Guard, C.C.; Lander, M.; Edson, R. The Dvorak
tropical cyclone intensity estimation technique: A satellite-based method that has endured for over 30 years. Bull. Am. Meteorol.
Soc. 2006, 87, 1195–1210. [CrossRef]
4. Olander, T.L.; Velden, C.S. The advanced Dvorak technique (ADT) for estimating tropical cyclone intensity: Update and new
capabilities. Weather Forecast. 2019, 34, 905–922. [CrossRef]
5. Kidder, S.Q.; Goldberg, M.D.; Zehr, R.M.; DeMaria, M.; Purdom, J.F.; Velden, C.S.; Grody, N.C.; Kusselson, S.J. Satellite analysis of
tropical cyclones using the Advanced Microwave Sounding Unit (AMSU). Bull. Am. Meteorol. Soc. 2000, 81, 1241–1260. [CrossRef]
6. Bankert, R.L.; Tag, P.M. An automated method to estimate tropical cyclone intensity using SSM/I imagery. J. Appl. Meteorol. 2002,
41, 461–472. [CrossRef]
7. Piñeros, M.F.; Ritchie, E.A.; Tyo, J.S. Estimating tropical cyclone intensity from infrared image data. Weather Forecast. 2011, 26,
690–698. [CrossRef]
8. Fetanat, G.; Homaifar, A.; Knapp, K.R. Objective tropical cyclone intensity estimation using analogs of spatial features in satellite
data. Weather Forecast. 2013, 28, 1446–1459. [CrossRef]
9. Rodríguez-Herrera, O.G.; Wood, K.M.; Dolling, K.P.; Black, W.T.; Ritchie, E.A.; Tyo, J.S. Automatic tracking of pregenesis tropical
disturbances within the deviation angle variance system. IEEE Geosci. Remote Sens. Lett. 2014, 12, 254–258. [CrossRef]
10. Knaff, J.A.; Longmore, S.P.; DeMaria, R.T.; Molenar, D.A. Improved tropical-cyclone flight-level wind estimates using routine
infrared satellite reconnaissance. J. Appl. Meteorol. Climatol. 2015, 54, 463–478. [CrossRef]
11. Zhao, Y.; Zhao, C.; Sun, R.; Wang, Z. A multiple linear regression model for tropical cyclone intensity estimation from satellite
infrared images. Atmosphere 2016, 7, 40. [CrossRef]
12. Velden, C.S.; Herndon, D. A consensus approach for estimating tropical cyclone intensity from meteorological satellites: SATCON.
Weather Forecast. 2020, 35, 1645–1662. [CrossRef]
13. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
14. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef]
15. Han, X.; Li, X.; Yang, J.; Wang, J.; Zheng, G.; Ren, L.; Chen, P.; Fang, H.; Xiao, Q. Dual-Level Contextual Attention Generative
Adversarial Network for Reconstructing SAR Wind Speeds in Tropical Cyclones. Remote Sens. 2023, 15, 2454. [CrossRef]
16. Tong, B.; Wang, X.; Fu, J.; Chan, P.; He, Y. Short-term prediction of the intensity and track of tropical cyclone via ConvLSTM
model. J. Wind Eng. Ind. Aerodyn. 2022, 226, 105026. [CrossRef]
17. Pang, S.; Xie, P.; Xu, D.; Meng, F.; Tao, X.; Li, B.; Li, Y.; Song, T. NDFTC: A new detection framework of tropical cyclones from
meteorological satellite images with deep transfer learning. Remote Sens. 2021, 13, 1860. [CrossRef]
18. Sun, Z.; Zhang, B.; Tang, J. Estimating the Key Parameter of a Tropical Cyclone Wind Field Model over the Northwest Pacific
Ocean: A Comparison between Neural Networks and Statistical Models. Remote Sens. 2021, 13, 2653. [CrossRef]
19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
20. Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform.
2019, 16, 4681–4690. [CrossRef]
21. Combinido, J.S.; Mendoza, J.R.; Aborot, J. A convolutional neural network approach for estimating tropical cyclone intensity
using satellite-based infrared images. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR),
Beijing, China, 20–24 August 2018; pp. 1474–1480.
22. Wimmers, A.; Velden, C.; Cossuth, J.H. Using deep learning to estimate tropical cyclone intensity from satellite passive microwave
imagery. Mon. Weather Rev. 2019, 147, 2261–2282. [CrossRef]
23. Chen, B.; Chen, B.-F.; Lin, H.-T. Rotation-blended CNNs on a new open dataset for tropical cyclone image-to-intensity regression.
In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK,
19–23 August 2018; pp. 90–99.
24. Wang, C.; Zheng, G.; Li, X.; Xu, Q.; Liu, B.; Zhang, J. Tropical cyclone intensity estimation from geostationary satellite imagery
using deep convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4101416. [CrossRef]
25. Zhang, R.; Liu, Q.; Hang, R. Tropical cyclone intensity estimation using two-branch convolutional neural network from infrared
and water vapor images. IEEE Trans. Geosci. Remote Sens. 2019, 58, 586–597. [CrossRef]
26. Lee, J.; Im, J.; Cha, D.-H.; Park, H.; Sim, S. Tropical cyclone intensity estimation using multi-dimensional convolutional neural
networks from geostationary satellite data. Remote Sens. 2019, 12, 108. [CrossRef]
27. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.;
Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
28. Wang, D.; Zhang, Q.; Xu, Y.; Zhang, J.; Du, B.; Tao, D.; Zhang, L. Advancing plain vision transformer toward remote sensing
foundation model. IEEE Trans. Geosci. Remote Sens. 2022, 61, 5607315. [CrossRef]
29. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks.
Nature 2023, 619, 533–538. [CrossRef] [PubMed]
30. Harper, B.; Kepert, J.; Ginger, J. Guidelines for Converting between Various Wind Averaging Periods in Tropical Cyclone
Conditions; World Meteorological Organization WMO/TD 1555. 2010, p. 64. Available online:
https://library.wmo.int/doc_num.php?explnum_id=290 (accessed on 25 August 2023).
31. Tong, B.; Sun, X.; Fu, J.; He, Y.; Chan, P. Identification of tropical cyclones via deep convolutional neural network based on satellite
cloud images. Atmos. Meas. Tech. 2022, 15, 1829–1848. [CrossRef]
32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you
need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017;
pp. 5998–6008.
33. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [CrossRef]
34. Knaff, J.A.; DeMaria, R.T. Forecasting tropical cyclone eye formation and dissipation in infrared imagery. Weather Forecast. 2017,
32, 2103–2116. [CrossRef]
35. Ren, F.; Liang, J.; Wu, G.; Dong, W.; Yang, X. Reliability analysis of climate change of tropical cyclone activity over the western
North Pacific. J. Clim. 2011, 24, 5887–5898. [CrossRef]
36. Bai, L.; Tang, J.; Guo, R.; Zhang, S.; Liu, K. Quantifying interagency differences in intensity estimations of Super Typhoon Lekima
(2019). Front. Earth Sci. 2022, 16, 5–16. [CrossRef]
37. Ritchie, E.A.; Wood, K.M.; Rodríguez-Herrera, O.G.; Piñeros, M.F.; Tyo, J.S. Satellite-derived tropical cyclone intensity in the
North Pacific Ocean using the deviation-angle variance technique. Weather Forecast. 2014, 29, 505–516. [CrossRef]
38. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using
shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada,
11–17 October 2021; pp. 10012–10022.
39. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation
through attention. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, 18–24 July 2021;
Volume 139, pp. 10347–10357.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.