remote sensing

Article
Estimation of Tropical Cyclone Intensity via Deep Learning
Techniques from Satellite Cloud Images
Biao Tong 1 , Jiyang Fu 1 , Yaxue Deng 1 , Yongjun Huang 2 , Pakwai Chan 3 and Yuncheng He 1, *

1 Research Center for Wind Engineering and Engineering Vibration, Guangzhou University,
Guangzhou 510006, China; [email protected] (B.T.); [email protected] (J.F.);
[email protected] (Y.D.)
2 School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China;
[email protected]
3 Hong Kong Observatory, Hong Kong 999077, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-133-1286-1586

Abstract: Estimating the intensity of tropical cyclones (TCs) is usually involved as a critical step in
studies on TC disaster warnings and prediction. Satellite cloud images (SCIs) are one of the most
effective and preferable data sources for TC research. Despite the great achievements in various
SCI-based studies, accurate and efficient estimation of TC intensity still remains a challenge. In
recent years, machine learning (ML) techniques have gained fast development and shown significant
potential in dealing with big data, particularly with images. This study focuses on the objective
estimation of TC intensity based on SCIs via a comprehensive usage of some advanced deep learning
(DL) techniques and smoothing methods. Two estimation strategies are proposed and examined
which, respectively, involve one and two functional stages. The one-stage strategy uses Vision
Transformer (ViT) or Deep Convolutional Neural Network (DCNN) as the regression model for
directly identifying TC intensity, while the second strategy involves a classification stage that aims
to stratify SCI samples into a few intensity groups and a subsequent regression stage that specifies
the TC intensity. Further efforts are made to improve the estimation accuracy by using smoothing
manipulations (via four specific smoothing techniques) in the scenarios of the aforementioned two
strategies and their fusion. Results show that DCNN performs better than ViT in the one-stage
strategy, while using ViT as the classification model and DCNN as the regression model can result
in the best performance in the two-stage strategy. It is interesting that although the strategy of
singly using DCNN wins out over any concerned two-stage strategy, the fusion of the two strategies
outperforms either the one-stage strategy or the two-stage strategy. Results also suggest that using
smoothing techniques is beneficial for the improvement of estimation accuracy. Overall, the best
performance is achieved by using a hybrid strategy that consists of the one-stage strategy, the
two-stage strategy and smoothing manipulation. The associated RMSE and MAE values are 9.81 kt and
7.51 kt, which prevail over those from most existing studies.

Citation: Tong, B.; Fu, J.; Deng, Y.; Huang, Y.; Chan, P.; He, Y. Estimation of Tropical Cyclone
Intensity via Deep Learning Techniques from Satellite Cloud Images. Remote Sens. 2023, 15, 4188.
https://doi.org/10.3390/rs15174188
Keywords: tropical cyclone; satellite cloud image; intensity estimation; ViT; DCNN

Academic Editor: Yuriy Kuleshov
Received: 2 August 2023
Revised: 23 August 2023
Accepted: 24 August 2023
Published: 25 August 2023

1. Introduction
TCs are highly destructive natural disasters, and accurate assessment of their activities
is essential for the prevention and reduction of TC disasters. Among all TC parameters,
intensity is perhaps the most complex one, as it not only depends physically upon many
factors, such as the background environment, TCs’ inner structures and their interactions,
but also sometimes varies capriciously with location and time. Consequently, research
on TC intensity has been a major priority in the fields of meteorology and oceanography.
However, due to the vast scale of TC structures and their complicated evolution in the
spatial-temporal domain during the life-cycle, it remains a challenge to characterize TC intensity
via earth-based instruments, especially before TCs’ landfall.

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Remote Sens. 2023, 15, 4188. https://doi.org/10.3390/rs15174188 https://www.mdpi.com/journal/remotesensing

With the help of ever-developing equipment and technology, humans can now obtain
more credible information on TCs, even over seas, from reconnaissance aircraft and airborne
devices. However, such instruments are too expensive for routine usage at a global scale,
which restricts the coverage of associated observations in terms of space and time. By
contrast, satellite remote sensing data, in particular SCIs, provide abundant information on
TCs and the accompanying background environment over vast oceans in an uninterrupted way.
As a result, they have been widely used for both academic studies and application practices.
Continuous efforts have been made to estimate TC intensity from SCIs. The Dvorak
technique, initially documented by Dvorak, is a set of TC analytical methods that identify
TC fingerprints and produce preliminary judgments of TC intensity, and then utilize
different cloud textures and change patterns to refine final estimations [1]. This technique
has been further developed, for example by Velden et al. [2–4]. Many other estimation methods
have also been proposed since the beginning of this century, including the fixed-intensity
Advanced Microwave Sounding Unit (AMSU) method [5], manual techniques for intensity
estimation using SSM/I images [6], a near-real-time technique for characterizing the shape
and dynamics of TCs and correlating them with TC intensity [7], and a TC intensity
estimation method using spatial characteristic analogue in satellite data [8]. Moreover, studies on
the deviation angle variation (DAV) method for estimating TC intensity using geostationary
infrared (IR) brightness and temperature data [9], estimation of wind speed at flight altitude
using conventional TC information and IR satellite images [10], multiple linear regression
models for estimating TC intensity by IR satellite images [11], and empirical estimation
of TC intensity by the SATCON weighted consensus algorithm [12] have been conducted
as well.
Despite their widespread use in meteorology, the aforementioned techniques face
some challenges when applied in practice. Typically, many techniques involve
experience-based manipulations, which make the estimations prone to low efficiency and
subjective errors. Therefore, there is a need to develop high-precision and objective methods
for estimating TC intensity.
In recent years, machine learning (ML) techniques have developed rapidly [13,14]
and shown significant potential in dealing with many meteorological issues [15–18]. Among
various ML techniques, the convolutional neural network (CNN) has attracted more and
more attention for SCI-based estimation of TC intensity since, as an abstract feature
extraction technology, it is capable of retrieving highly generalized information of TCs as
well as identifying and classifying complex TC images [19,20]. For example, Combinido
et al. [21] examined the performance of the VGG19 model driven by grayscale infrared (IR)
images. Wimmers et al. [22] used a 2D-CNN approach based on satellite passive microwave
imagery. Chen et al. [23] built a CNN-TC regression model by taking into account the
domain knowledge of meteorologists. Wang et al. [24] developed a CNN-based model
with the help of H-8 geostationary satellite IR imagery. Zhang et al. [25] proposed a
two-branch CNN model on the basis of IR and water vapor (WV) images. Lee et al. [26] further
employed 3D-CNN to investigate the correlation between multi-spectral geostationary
satellite images and TC intensity.
Another ML technique that varies significantly from CNN and its derivatives is the
Transformer, which currently dominates in the field of natural language processing (NLP).
While CNN is operationally based on convolution calculations (which are good at capturing
local features), Transformer is established on the basis of the self-attention mechanism. This
completely different mechanism enables Transformer to extract global features from longer
sequences and improve computational efficiency through performing parallel computation
during training and inference. In light of the striking success of Transformer in NLP,
Dosovitskiy et al. [27] expanded it to the vision field and proposed Vision Transformer (ViT).
Since its debut, ViT has achieved remarkable success in the vision field, outperforming most
existing CNN models [28]. Undoubtedly, ViT has significant potential in meteorological
remote sensing. In fact, Bi et al. [29] demonstrated that training ViT models with large
amounts of reanalysis data can generate better results than those from numerical weather
prediction (NWP). However, to the authors’ best knowledge, no studies have been reported
on ViT-aided estimation of TC intensity.
It should be noted that the performance of ML models markedly depends on the
quality and amount of input data, and it is not uncommon that a model performs well for
some cases whilst it becomes degraded for others. Therefore, a cluster of ML techniques
may be adopted concurrently to exert hybrid-related advantages.
This study focuses on ML-aided estimation of TC intensity based on SCIs. ViT is
adopted for the first time to estimate TC intensity and its performance is examined through
comparison with CNN. More importantly, special efforts are made to improve the
estimation accuracy by comprehensive usage of multiple hybrid strategies. The remainder of the
article is organized as follows. After an introduction of the datasets, data pre-processing
and evaluation methods in Section 2, detailed performance of each ML model and
ML-aided strategy is presented and discussed in Section 3. Main findings and conclusions are
summarized in Section 4.

2. Methodology Statement
The adopted methodology is depicted in Figure 1, which mainly consists of the
following four links: obtaining SCIs from open-source databases, conducting pre-processing
manipulations (i.e., data augmentation and segmentation), training and validating different
models, as well as analyzing and comparing estimation results.

Figure 1. Technical flowchart.

Two basic DL-aided strategies are utilized to estimate TC intensity (in terms of
maximum sustained wind, or MSW): the one-stage strategy that uses ViT or DCNN as the
regression model for directly identifying MSW, and the two-stage strategy which involves
a classification stage that aims to stratify SCI samples into a few intensity groups and a
subsequent regression stage that specifies MSW. Further efforts are made to improve the
estimation accuracy by using smoothing manipulations (via four techniques) in the scenarios
of the two basic strategies and their fusion (i.e., a hybrid strategy).
The primary idea behind the two basic strategies lies in that input SCI samples are
often unevenly distributed in varied intensity groups. While the groups containing more
credible samples tend to generate ideally parameterized models and better estimation
results, those with fewer samples are likely to suffer from insufficient training and
inferior model performance. Thus, it is expected that the two-stage strategy is helpful
for minimizing the negative effects on those fewer-sample-featured groups caused by the
“resourceful” groups.
Different from the above idea which tries to improve the estimation results during the
identifying process (or simply, process-oriented), smoothing manipulation aims to refine
the final results through the fusion of outputs from varied DL models or those from the
same model but at different time steps (or result-oriented).
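The routing logic of the two-stage strategy and a simple fusion with the one-stage output can be sketched as follows. This is only an illustrative sketch: the group names "low" and "high", the stand-in classifier and regressors, and the weighted-average fusion rule are hypothetical placeholders rather than the models or fusion scheme actually used in this study.

```python
import numpy as np

def two_stage_estimate(image, classifier, regressors, threshold=0.5):
    """Stage 1 picks an intensity group; stage 2 applies that group's regressor."""
    score = float(classifier(image))
    group = "high" if score > threshold else "low"   # hypothetical group names
    return group, float(regressors[group](image))

def fuse_estimates(one_stage_msw, two_stage_msw, weight=0.5):
    """Weighted average of two MSW estimates (assumed fusion rule, for illustration)."""
    return weight * one_stage_msw + (1.0 - weight) * two_stage_msw

# Stand-in models (hypothetical): any callables mapping an image to a number would do.
image = np.full((4, 4), 0.8)
classifier = lambda img: img.mean()
regressors = {"low": lambda img: 45.0, "high": lambda img: 90.0}

group, msw = two_stage_estimate(image, classifier, regressors)
fused = fuse_estimates(80.0, msw)
```

Keeping one regressor per group means that a group with few samples is fitted separately from the sample-rich group, which is the motivation stated above.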

2.1. Datasets
2.1.1. Data Sources
The SCI data are derived from the Archives of Weather Home, Kochi University, Japan
(http://weather.is.kochi-u.ac.jp/archive-e.html, accessed on 30 July 2022), which were
captured by geostationary satellites “Himawari-8” and “MTSAT-1R” over the Northwest
Pacific Ocean. Each grayscale infrared image (IR, 10.2–12.5 µm) contains 1800 × 1800 pixels
that correspond to a geographic area of 70°N–20°S, 70°E–160°E. In total, 222,212 images are
exploited in this study, which were taken at 1 h intervals during the life-cycles of 546 TCs
from 2000 to 2021.
Corresponding label information, i.e., TC trajectory and intensity (defined as 10 min
mean MSW; unit: knot or kt, 1 kt = 1.85 km/h = 0.514 m/s), is available from the Japan
Meteorological Agency (JMA, Tokyo, Japan, https://www.data.jma.go.jp/, accessed on
15 June 2022). These labels are updated every 3 or 6 h. Note that the MSW values are
provided in a form of integral multiples of 5 kt, and they would be marked as zero for
MSW < 35 kt. JMA also stratifies TCs into 4 intensity categories according to MSW: typhoon
(TY, 35–63 kt), strong typhoon (STY, 64–84 kt), very strong typhoon (VSTY, 85–104 kt), violent
typhoon (VTY, ≥105 kt).
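As a small worked illustration of the label conventions above, the following sketch converts knots with the factors quoted in the text and maps an MSW value to the four categories listed. The function names are hypothetical, and returning `None` below 35 kt is an assumption made here to mirror the zero-marked labels.

```python
KT_TO_KMH = 1.85    # conversion factors as quoted in the text
KT_TO_MS = 0.514

def kt_to_kmh(msw_kt):
    return msw_kt * KT_TO_KMH

def kt_to_ms(msw_kt):
    return msw_kt * KT_TO_MS

def intensity_category(msw_kt):
    """Map a 10 min mean MSW (kt) to the categories listed above."""
    if msw_kt < 35:
        return None      # assumption: below the labelled range (marked as zero)
    if msw_kt <= 63:
        return "TY"
    if msw_kt <= 84:
        return "STY"
    if msw_kt <= 104:
        return "VSTY"
    return "VTY"
```

Because the labels are integral multiples of 5 kt, the category boundaries (63/64, 84/85, 104/105) never coincide with a label value.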
Besides the SCI datasets, this study also considers the reanalysis data estimated,
respectively, via ADT (https://tropic.ssec.wisc.edu/real-time/adt/adt.html, accessed on
20 June 2022) and SATCON methods (https://tropic.ssec.wisc.edu/real-time/satcon/,
accessed on 20 June 2022) for comparison purposes. These data are documented at 30 min
intervals, and the TC intensity is expressed as the 1 min mean MSW which differs from
the one issued by JMA. The method presented by Harper et al. [30] is adopted to convert
between 1 min mean and 10 min mean MSWs.

2.1.2. Data Pre-Processing

There are two main tasks during this process: cropping TC structure from each original
SCI, and data augmentation through image transformation.
The original SCIs cover too large an area with respect to a TC, which makes it difficult
to identify TC intensity effectively via DL methods. Thus, it is required to extract the TC
portion from the original image [31]. This can be fulfilled through automatic cropping
in accordance with the best track data of targeted TCs, as shown in Figure 2a,b.
After cropping, each image contains 400–500 pixels or spans 20°–25° along both longitude
and latitude directions. The cropped SCIs are then standardized to a uniform size of
400 × 400 pixels. The standardized SCIs are further examined to ensure that TC structure
is effectively covered in the processed images, and those with TC centers located beyond
the images are discarded.
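The cropping step can be sketched as follows. The sketch assumes that the 1800 × 1800 image maps linearly onto the stated 90° × 90° area (20 pixels per degree, row 0 at 70°N, column 0 at 70°E); the actual map projection of the archive images is not stated here, and the function names are hypothetical. Windows that would extend beyond the image are simply rejected, which is a simplification of the screening described above.

```python
import numpy as np

PIXELS_PER_DEG = 1800 / 90.0   # assumed linear mapping: 20 px per degree

def latlon_to_pixel(lat, lon):
    """Convert a best-track position (deg N, deg E) to (row, col) indices."""
    row = int(round((70.0 - lat) * PIXELS_PER_DEG))
    col = int(round((lon - 70.0) * PIXELS_PER_DEG))
    return row, col

def crop_tc(image, lat, lon, size=400):
    """Cut a size x size window centred on the TC; None if it leaves the image."""
    row, col = latlon_to_pixel(lat, lon)
    top, left = row - size // 2, col - size // 2
    if top < 0 or left < 0:
        return None
    if top + size > image.shape[0] or left + size > image.shape[1]:
        return None
    return image[top:top + size, left:left + size]

full_image = np.zeros((1800, 1800), dtype=np.uint8)
window = crop_tc(full_image, lat=25.0, lon=115.0)
```

Under this assumed mapping, a 400-pixel window spans 20°, consistent with the 20°–25° extent quoted above.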
On the other hand, data augmentation aims to deal with the issue related to unbalanced
distribution of SCI samples among different statuses of TC intensity. Typically, the life-cycle
of a TC consists of relatively longer periods of low-to-moderate intensity status and shorter
episodes of high-intensity status. This unbalanced distribution of intensity status and
therefore SCI samples can degrade the training quality for the models which usually require
that the input samples should be evenly distributed along with the key targeted parameter
(i.e., TC intensity). As demonstrated in Figure 2c–j, eight specific data augmentation
manipulations, including image flipping, multi-angle rotation and noise addition, are
employed in this study to collectively (i.e., regardless of intensity category) increase the
number of TC samples. Images of TCs in the TY intensity category are then randomly
down-sampled to improve the balance of samples among different intensity categories.
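The eight transformations named in Figure 2c–j can be sketched with NumPy as below. The noise amount and standard deviation are hypothetical values chosen only for illustration, and 8-bit grayscale input is assumed.

```python
import numpy as np

def salt_and_pepper(img, amount, rng):
    """Set a random fraction of pixels to black (pepper) or white (salt)."""
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < amount / 2] = 0
    out[mask > 1 - amount / 2] = 255
    return out

def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the 8-bit range."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def augment(img, rng, amount=0.02, sigma=10.0):
    """Return the eight augmented variants, in the order of Figure 2c-j."""
    sp = salt_and_pepper(img, amount, rng)
    return [
        sp,                               # (c) salt and pepper noise
        gaussian_noise(img, sigma, rng),  # (d) Gaussian noise
        gaussian_noise(sp, sigma, rng),   # (e) both kinds of noise
        np.rot90(img, 1),                 # (f) 90 deg anticlockwise
        np.rot90(img, 2),                 # (g) 180 deg
        np.rot90(img, 3),                 # (h) 270 deg anticlockwise
        np.fliplr(img),                   # (i) horizontal flip
        np.flipud(img),                   # (j) vertical flip
    ]
```

A seeded generator such as `np.random.default_rng(0)` keeps the noisy variants reproducible between runs.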
There are two points to be stressed. First, to ensure the objectivity and credibility
of testing results, the dataset for testing (to be discussed in the following section) has
only experienced cropping manipulation, whilst no operations for the aforementioned
data augmentation have been conducted, as artificial transformations tend to destroy TCs’
morphological structures and make the processed SCIs physically meaningless. Second,
although the process of data augmentation does moderate unbalanced-distribution-related
issues, the samples of processed SCIs in categories with higher TC intensity levels are
still lacking. By trial and error, better results can be achieved when the SCI samples are
stratified into two categories: the one with MSW > 64 kt (referred to as STYS) and the
one with MSW < 64 kt (referred to as TY). Thus, this stratification is exploited for the
two-stage strategy.

Figure 2. Image transformation: (a) original image; (b) cut-out images; (c) adding salt and pepper
noise; (d) adding Gaussian noise; (e) adding salt and pepper and Gaussian noise; (f–h) rotating 90°,
180°, 270° anticlockwise; (i) horizontal flip; (j) vertical flip.

2.1.3. Segmentation and Standardization
After pre-processing, the samples are segmented into three sets, i.e., training set,
validation set and testing set, which are, respectively, used for training, validating and
testing the DL networks. In total, 158,260 SCIs for 330 TCs from 2000 to 2013 are selected as
the training set (referred to as TG, hereafter), 52,032 SCIs for 113 TCs from 2014 to 2017 are
selected for validation (VG), and 11,920 SCIs for 103 TCs from 2018 to 2021 are selected
for testing. Basic information of the three sets is tabulated in Table 1. Note that for the
smoothing strategy, the testing set is further divided into two parts: one part (SCIs in
2020–2021) is used to fit the smoothing models, while the other part (SCIs in 2018–2019) is used
to test the performance of smoothing models. It is essential to acknowledge that the practice
of partitioning data by year introduces a certain degree of bias, given that more recent data
tends to possess higher quality. However, even more crucially, this strategy guarantees
complete independence between distinct datasets.

Table 1. The number of samples.

             Years        TCs    SCI Samples
Train        2000–2013    330    158,260
Validation   2014–2017    113    52,032
Test         2018–2021    103    11,920

Meanwhile, both the pixel sizes and pixel values of SCIs for all the three sets are
standardized to meet the input requirements of DL models. Each SCI is resized to contain
128 × 128 pixels for the DCNN model and 224 × 224 pixels for the ViT model, while the
pixel values are normalized to be in the range of [–1, 1]. The normalization process is also
helpful for enhancing the convergence during model training.
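The year-based split and the pixel normalization can be sketched as follows. The sketch assumes 8-bit grayscale input (values 0–255), which is not stated explicitly in the text, and the function names are hypothetical.

```python
import numpy as np

def split_by_year(year):
    """Assign a sample to a set according to the year ranges in Table 1."""
    if 2000 <= year <= 2013:
        return "train"
    if 2014 <= year <= 2017:
        return "validation"
    if 2018 <= year <= 2021:
        return "test"
    raise ValueError(f"year {year} is outside the 2000-2021 study period")

def normalize_pixels(img_uint8):
    """Map 8-bit pixel values linearly from [0, 255] to [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0
```

Splitting on the year of each TC, rather than at random over images, keeps all hourly images of one cyclone in the same set, which is what gives the independence between sets noted above.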

2.2. DCNN Model
CNN is a kind of ML network based on supervised learning. It has strong adaptability
and is good at mining local features of data, extracting global training features and
classification. However, simple CNN becomes unable to meet the universality and accuracy of
various practical problems. Under such conditions, DCNN was proposed and has been
utilized extensively.
As shown in Figure 3, a DCNN model usually consists of several functional
modules/layers that can be combined in order and on request. Typical modules include the
convolutional layer, pooling layer, dropout layer, and dense layer. The convolution layer
contains multiple operational scanners, namely the convolution kernel, whose size is
uniformly fixed within the layer. This layer is used to read the input information of the
model and obtain various abstract features of the target through convolution calculation.
The pooling layer filters matrix information through a series of pooling operations such as
maximum pooling and average pooling. The dropout layer is used to maximize the
efficiency of neural nodes by eliminating unimportant features. The dense layer is usually
arranged at the end of the model, and is used to flatten the information of the previous
layer as well as estimate the classification similarity by calculating a nonlinear function.

Figure 3. Structures of the DCNN network.
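To make the role of each layer type concrete, a minimal NumPy sketch of the four building blocks is given below. It is not the network of Figure 3: it handles a single channel and a single kernel, and the dropout function follows the usual random-masking definition used during training, which is an assumption about how the layer is realized here.

```python
import numpy as np

def conv2d(x, kernel):
    """Slide one kernel over a 2D input (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

def dropout(x, rate, rng):
    """Randomly zero a fraction of activations and rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def dense(x, weights, bias):
    """Flatten the input and apply a fully connected layer."""
    return weights @ x.ravel() + bias
```

A forward pass then chains these calls, for example `dense(max_pool(conv2d(img, k)), W, b)`, with dropout applied only during training.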

Functionally, the input and hidden layers cooperate to extract any potential features
from the SCIs for identifying TC intensity, while the output layer conducts judgments and
decisions according to the extracted results. It is clear that characterizing TC intensity
essentially belongs to a regression problem. Therefore, the mean squared error (MSE) loss
function (Equation (1)) and cross-entropy loss function (Equation (2)) are adopted herein to
quantify the consistency of predictions against the actual results for the regression models
and category models, respectively:

$$\mathrm{Loss}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 \tag{1}$$

$$\mathrm{Loss}_{\text{cross-entropy}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_i^c \cdot \ln\left[p(y_i^c)\right] \tag{2}$$

where $\hat{y}_i$ represents the prediction of the true MSW values $y_i$, $N$ is the number of SCI
samples, $y_i^c$ is the label of the $c$-th classification (1 for positive judgments and 0 for negative
judgments) for the $i$-th SCI, $M$ is the number of categories, and $p(y_i^c)$ denotes the probability
of the prediction associated with $y_i^c$, which can be expressed via the softmax function:

$$p(y_i^c) = \frac{\exp\left(f_{y_i^c}\right)}{\sum_{c=1}^{M}\exp\left(f_{y_i^c}\right)} \tag{3}$$

where $f_{y_i^c}$ is the original score of the model for prediction $y_i^c$, which is calculated by the
output layer based on the 1000 × 1 dimensional output vector $x$ (or the characteristic vector)
from previous layers:

$$f_{y_i^c} = Wx + b \tag{4}$$

in which W (with dimensions 2 × 1000) represents the coefficient matrix which quantifies
the weight for each element in x during the judging/prediction process, and b (with
dimensions 2 × 1) is the bias vector.
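Equations (1)–(4) can be checked numerically with a few lines of NumPy. This is a sketch under the definitions above, with scores arranged as one row per sample and one column per category; the function names are hypothetical.

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Equation (1): mean squared error over N samples."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

def softmax(scores):
    """Equation (3), applied row-wise; the shift only improves numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_loss(scores, onehot):
    """Equation (2): one-hot labels y_i^c against softmax probabilities."""
    p = softmax(scores)
    return float(-np.mean(np.sum(onehot * np.log(p), axis=1)))

def output_scores(W, x, b):
    """Equation (4): raw scores from the 1000-dimensional feature vector."""
    return W @ x + b
```

For two equally scored categories the softmax gives a probability of 0.5 each, so the cross-entropy of a single sample equals ln 2.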
Both W and b should be determined through training. In this study, the stochastic
gradient descent (SGD) method is utilized to provide efficient estimation of the model
parameters. SGD iteratively updates the W and b by computing gradients of the loss
function and adjusting them in the direction that minimizes the loss. Moreover, the model
involves a few hyperparameters, including the number of neural network nodes, the
learning rate and epoch. These parameters are usually pre-set and adjusted empirically
based on training results. Based on previous tests, the models in this study use a learning
rate of 0.001, with a batch size of 64–128.
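A single SGD update of W and b can be sketched as follows for a linear output layer under a squared-error loss. For brevity the sketch uses one sample per step instead of a mini-batch of 64–128 and ignores the preceding layers; it illustrates the update rule only, not the training code of this study.

```python
import numpy as np

def sgd_step(W, b, x, y, lr=0.001):
    """One SGD update of f = Wx + b for the loss ||f - y||^2."""
    err = W @ x + b - y               # residual of the current prediction
    grad_W = 2.0 * np.outer(err, x)   # d(loss)/dW
    grad_b = 2.0 * err                # d(loss)/db
    return W - lr * grad_W, b - lr * grad_b

def squared_error(W, b, x, y):
    return float(np.sum((W @ x + b - y) ** 2))

W0, b0 = np.zeros((1, 3)), np.zeros(1)
x, y = np.array([1.0, 2.0, 3.0]), np.array([1.0])
W1, b1 = sgd_step(W0, b0, x, y)
```

Repeating the step moves the parameters along the negative gradient, so the loss on this sample decreases as long as the learning rate is small enough.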
The model generates a predicted value for each SCI, which ranges from 0 to 1. As
the labels in the regression models are normalized using Min-Max scaling, the predictions
from the model should be dimensionalized to the standard MSW scale through reverse
normalization. For the classification model, a threshold of 0.5 is used to determine whether
the SCI belongs to the STYS or TY category. Samples with predictions greater than 0.5 are
classified as STYS, while those with a value less than or equal to 0.5 are classified as TY.
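The two post-processing rules above amount to the following sketch. The minimum and maximum MSW used for the reverse normalization are properties of the training labels and are not given here, so `msw_min` and `msw_max` are placeholders to be supplied by the user.

```python
def denormalize_msw(pred, msw_min, msw_max):
    """Invert Min-Max scaling: map a prediction in [0, 1] back to knots."""
    return pred * (msw_max - msw_min) + msw_min

def classify(pred, threshold=0.5):
    """Threshold rule from the text: > 0.5 is STYS, otherwise TY."""
    return "STYS" if pred > threshold else "TY"
```

For example, with placeholder bounds of 35 kt and 115 kt, a normalized prediction of 0.5 maps back to 75 kt.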
the multi-head self-attention for feature extraction, and performing classification.
2.3. The ViT Model
Transformer is a novel neural network architecture that mainly utilizes self-attention
mechanism to extract internal features. Its network architecture is primarily constructed
around the attention mechanism. Based on the input information, the self-attention mechanism
first generates three vectors, namely Query (Q), Key (K) and Value (V), through matrix
transformation. Then, these vectors undergo multiple matrix operations and weightings,
through which the most significant information can be enhanced while the less relevant
information tends to be weakened. This is similar to the dot product operation of two
vectors: the calculation result tends to be maximized for similar vectors, whilst it would be
minimized for two orthogonal vectors. By repeating such attention operations, the model
can output a set of feature vectors that selectively emphasize the salient information in
the input.
ViT is actually an expanded version of the standard Transformer in the vision field.
Figure 4 shows the inner structure of a ViT model [27]. There are two main functional modules:
feature extraction module (i.e., Transformer Encoder) and classification calculation
module (right part of the figure). A typical workflow for ViT involves the following several
procedures: dividing the input images into blocks with a certain size, reassembling the
divided image blocks into a sequence, transferring the combined results to the multi-head
self-attention for feature extraction, and performing classification.
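The Query/Key/Value weighting described above can be sketched as a single-head attention function. This is an illustrative sketch rather than the implementation used in this study: the scaling by the square root of the key dimension and the softmax follow the standard Transformer formulation and are assumptions here, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for lists of row vectors Q, K and V."""
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key (dot product scaled by sqrt(d)).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors: similar keys contribute more.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

A query that is nearly parallel to one key and orthogonal to the others returns almost exactly the value attached to that key, which is the dot-product behaviour described in the text.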

Figure 4. Analysis of the overall structure of ViT.


Taking Figure 4 as an example, the left part of the figure (i.e., Patch + Position Embedding
and Transformer Encoder) corresponds to the realization process for feature
extraction. The main function of “Patch + Position Embedding” is to divide the input
image $x \in \mathbb{R}^{H \times W \times C}$ (where $H$ and $W$ represent the height and width of the image and $C$ the number
of channels) into a number of sub-images $x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where $N$ (= 9 herein)
represents the number of sub-images and $P$ represents the size of each sub-image. This processing
can be realized through a convolution that uses a sliding window with a specific step size. These
sub-images are then transformed into long vectors using a linear transformation. Each vector
is combined with a position-encoding vector, depicted as the numbers 1 to 9 in Figure 4.
This position-encoding vector is learnable and can be adjusted automatically through
training. Each sub-image vector with the position information is called a token. Notably, a
special class (Cls) token is inserted at position 0 in Figure 4, which aggregates information
from the entire input sequence into a vector for the classification task. After patching and
position embedding, the input tensor is processed through the Transformer Encoder for
computation. In the self-attention computation of the Transformer, each token in the input
tensor is attention-weighted and summed over the other tokens to generate the corresponding
contextual representation. Finally, after several Transformer cycles, the classification
information vector is passed to the classification computation module for scoring and
generating the final output.
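The patching step can be illustrated with a short sketch that cuts an image into non-overlapping P × P blocks and flattens each block into a vector of length P²·C. The nested-list image layout, the function name `split_into_patches`, and the 6 × 6 toy image are assumptions made for illustration; the linear projection E and the position embedding are not included.

```python
def split_into_patches(image, patch):
    """Split an H x W x C image (nested lists) into flattened P x P x C patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for row in image[top:top + patch]:
                for pixel in row[left:left + patch]:
                    flat.extend(pixel)  # append the C channel values of this pixel
            patches.append(flat)
    return patches


# Hypothetical 6 x 6 single-channel image with 2 x 2 patches, giving N = 9 patches.
image = [[[r * 6 + c] for c in range(6)] for r in range(6)]
patches = split_into_patches(image, 2)
```

Stepping the window by exactly the patch size is what makes this equivalent to a strided convolution with non-overlapping windows.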
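$$z_0 = \left[x_{class};\; x_p^1 E;\; x_p^2 E;\; \cdots;\; x_p^N E\right] + E_{pos}, \quad E \in \mathbb{R}^{(P^2 \cdot C) \times D},\; E_{pos} \in \mathbb{R}^{(N+1) \times D} \tag{5}$$

$$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}, \quad l = 1 \ldots L \tag{6}$$

$$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l, \quad l = 1 \ldots L \tag{7}$$

$$y = \mathrm{LN}(z_L^0) \tag{8}$$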

Mathematically, the above process can be summarized as Equation (5), where $x_{class}$ represents
the class token vector, i.e., the asterisk in the yellow boxes in Figure 4; $x_p^i$ ($i = 1, \ldots, N$) represents
each sub-image, and $E$ represents a linear projection layer (or the fully connected layer);
$x_p^i E$ represents the sub-image vector after transformation; $E_{pos}$ represents the position coding
information vector; while $z_0$ is the processed input of the Transformer Encoder. Next, the operations
in Equations (6) and (7) are repeated $L$ times, where MSA indicates the multi-head
self-attention operation [32] of the Transformer, and MLP represents multi-layer perceptron
operations. Furthermore, layer normalization, denoted by LN, is applied before each of these operations.
In Equations (6) and (7), $z'_l$ and $z_l$ represent the result of the multi-head self-attention
calculation and the calculation result of a complete Transformer block, respectively, while the added terms
$z_{l-1}$ and $z'_l$ are the corresponding residual connections. After several looped calculations, the output $z_L^0$ (the final
classification information vector) is normalized by Equation (8), and the result is regarded
as a feature of the entire image to carry out the classification or regression task.

2.4. Smoothing Methods

Four smoothing techniques are exploited to refine the estimation of TC intensity based
on the output predictions from different DL models or the same model but at varied time
steps: (i) linear fitting, which is a statistical method used to establish linear relationships
between variables; (ii) Gradient Boosting (GB), which is a machine learning method that
combines multiple weak predictive models [33] by iteratively adding new models
that predict the residuals of the previous models and then combining them to obtain a final
prediction; (iii) Random Forest (RF), which is another ML algorithm that builds multiple
decision trees, each trained on a random subset of the data, and aggregates the predictions
of all trees to obtain the final result; (iv) multi-layer perceptron (MLP), which is an artificial
neural network consisting of multiple layers of interconnected nodes or neurons and
is a powerful ML method for learning complex patterns. This study utilizes estimates
of frames at time t − 2, t − 1, and t to fit or smooth the final output. The specific steps
include: (i) organizing the outputs of the DL model in
temporal order; (ii) using the estimates from the current frame and the previous
two frames as inputs for the four smoothing methods, which generates smoothed estimates for
the current TC intensity; (iii) replacing the DL model estimate with the smoothed estimate
and repeating the process for the next time step. This is performed iteratively until all test
data estimates are replaced by smoothed estimates.
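The three steps above can be sketched for the linear case as follows. Several choices here are assumptions rather than details given in the text: the weights `(0.2, 0.3, 0.5)` are hypothetical stand-ins for coefficients that would be fitted against best-track data, each window is built from the raw DL estimates rather than from previously smoothed values, and the first two frames, which have no complete window, are left unchanged.

```python
def smooth_series(estimates, weights=(0.2, 0.3, 0.5), bias=0.0):
    """Smooth a time-ordered list of intensity estimates with a 3-frame linear model."""
    smoothed = list(estimates)  # frames 0 and 1 have no full window and are kept
    for t in range(2, len(estimates)):
        window = estimates[t - 2:t + 1]  # raw estimates at t-2, t-1 and t
        smoothed[t] = bias + sum(w * e for w, e in zip(weights, window))
    return smoothed


# Hypothetical sequence of DL estimates (kt) with one spurious jump at the third frame.
raw = [50.0, 60.0, 100.0, 70.0]
smoothed = smooth_series(raw)
```

GB, RF or MLP smoothers would replace the weighted sum with a fitted model that takes the same three-frame window as input.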

2.5. Other Techniques

2.5.1. ADT
As an enhancement of the standard Dvorak technique (DT), ADT was developed on
the basis of the combination of DT and other algorithms or models [3,4]. DT was originally
proposed as a subjective method for TC intensity estimation based on satellite images. It
first locates the center of the TC and then provides the intensity of the TC activity through cloud
pattern analysis. With the wide application of DT, a growing number of researchers have improved
the standard DT by introducing other techniques. In the latest version of ADT (ADT
9.0), statistical models and dynamic models were integrated to eliminate subjective factors.
After objective positioning, ADT uses the statistical analysis results of TC samples across the
whole intensity range to obtain a regression-based intensity estimate for a certain
TC. One of the biggest advantages of ADT is that it can be applied to every stage of the TC
life cycle, which is difficult to achieve for other techniques.

2.5.2. SATCON
Advanced satellite consensus (SATCON) [12,34] combines ADT estimation with other
methods for estimating TC intensity based on satellite remote sensing, including AMSU,
SSMIS, and ATMS, and has developed into a global TC intensity ensemble estimation
system. Specifically, SATCON utilizes statistical weighting methods to maximize the
advantages (or minimize the disadvantages) of each type of technique and generates a
consensus intensity estimate for various TC structures. The statistical validation of this
method indicates that it is technically equivalent to the DT used by most meteorological
organizations; however, in some cases, the algorithm can outperform the DT, and the root
mean square error of its intensity estimates is also lower than that of most current techniques. In
addition, this method has its own advantages, such as alerting forecasters to rapid changes in
TC intensity that traditional methods (such as DT) may be unable to capture. Although
SATCON performs better than other methods for estimating TC intensity, it still has some
limitations, especially for real-time applications, as the estimation always depends on
certain satellite data. As a result, it cannot work continuously or provide constant
feedback in time.

2.6. Model Performance

The performance of regression models is usually evaluated using the following statistical
indices: root mean square error (RMSE), mean absolute error (MAE), and mean absolute
percentage error (MAPE).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(p_j - o_j\right)^2} \tag{9}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|p_j - o_j\right| \tag{10}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{j=1}^{n}\left|\frac{p_j - o_j}{o_j}\right| \tag{11}$$

where $p_j$ represents the prediction (estimated value, similar to the variable $\hat{y}_i$
in Equation (1)) of the observation (true value, similar to the variable $y_i$ in
Equation (1)) $o_j$, and $n$ represents the number of samples.
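Equations (9)–(11) translate directly into code. The sketch below is a plain-Python illustration; the function names and the two-sample example are hypothetical, and the observations are assumed to be strictly positive so that the MAPE is defined.

```python
import math


def rmse(pred, obs):
    """Root mean square error, Equation (9)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error, Equation (10)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def mape(pred, obs):
    """Mean absolute percentage error in percent, Equation (11); requires obs > 0."""
    return 100.0 * sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(obs)


# Hypothetical estimates and best-track values in kt.
pred = [50.0, 100.0]
obs = [40.0, 110.0]
scores = (rmse(pred, obs), mae(pred, obs), mape(pred, obs))
```

Both samples here have the same absolute error of 10 kt, yet the weaker TC contributes 25% to the MAPE and the stronger one about 9%, which shows why relative errors weigh low-intensity samples more heavily.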
For the classification models in this study, the performance is quantified via precision
(P), recall rate (R), and F1 score (F). Table 2 presents the confusion matrix which compiles
the classifier results for calculating the PRF values. Here, $N_{TP}$ represents true positive
predictions, $N_{TN}$ denotes true negative predictions, $N_{FP}$ refers to false positive predictions, and
$N_{FN}$ stands for false negative predictions. From the definitions (Equations (12)–(15)),
P indicates the accuracy of positive predictions, R represents the percentage of
correctly identified positive samples among all positive samples, while the F1 score is used to
evaluate the overall performance of the model as it provides the harmonic mean of P and R.

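$$\mathrm{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}} \tag{12}$$

$$\mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} \tag{13}$$

$$\mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}} \tag{14}$$

$$F1 = 2\,\frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{15}$$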

Table 2. Confusion matrix of parameters for calculating PRF values.

Confusion Matrix      Predicted Positive   Predicted Negative
Actual Positive       NTP                  NFN
Actual Negative       NFP                  NTN
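As a worked illustration of Equations (12)–(15), the sketch below computes the four indices from the entries of a confusion matrix. The function name `prf` and the counts used in the example are hypothetical.

```python
def prf(n_tp, n_tn, n_fp, n_fn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn)
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / (n_tp + n_fn)
    f1 = 2 * recall * precision / (recall + precision)  # harmonic mean of P and R
    return accuracy, precision, recall, f1


# Hypothetical counts: 80 true positives, 90 true negatives,
# 20 false positives and 10 false negatives.
accuracy, precision, recall, f1 = prf(80, 90, 20, 10)
```

Because the F1 score is a harmonic mean, it always lies between the precision and the recall and sits closer to the smaller of the two.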

2.7. Computational Platform

The DCNN model and supervised learning algorithms are coded using Python 3.7
with the Keras 2.2.4 and TensorFlow 2.1.0 packages. The model training was accomplished on
four NVIDIA GeForce RTX 2080 Ti GPUs, with the parallel computing platform
CUDA (v10.1) and the acceleration library cuDNN (v7.6.0.64).

3. Results and Discussions

3.1. The One-Stage Strategy
Figure 5 illustrates the learning curves of the two regression models, i.e., DCNN
and ViT, during the training (TG) and validating (VG) processes. As demonstrated, the
DCNN model can be optimized rapidly, whereas ViT requires more epochs for optimization.
Meanwhile, the minimum loss value of DCNN during validating is much less than the one
of ViT. Based on these results, it can be tentatively concluded that DCNN outperforms ViT
for the regression of TC intensity.
Since each epoch in Figure 5 corresponds to a parameterized model during the training
and validating processes, we can select the best model according to the minimum value of
the loss function from validating. Table 3 and Figure 6 summarize the overall performance
of the best DCNN and ViT models. Again, it is seen that the DCNN model prevails over
the ViT model. There are two possible reasons for this somewhat unexpected observation.
First, ViT requires much more data to achieve an optimized status during training [27],
while the size of the training set in this study is insufficient to further optimize the model
parameters. Second, ViT contains many tricks and optimizations for classification problems,
which however do not work for regression tasks.

Figure 5. TC intensity regression model learning curves: (a) DCNN; (b) ViT.


Table 3. Performance of regression model during testing process.

Model   Dataset      RMSE (kt)   MAE (kt)   MAPE   R
DCNN    Validation   12.02       9.47       0.16   0.83
DCNN    Testing      11.18       8.57       0.16   0.86
ViT     Validation   12.95       10.01      0.18   0.80
ViT     Testing      12.60       9.42       0.17   0.82

Figure 6. Estimations from validating (Val) and testing (test) processes of DCNN and ViT for the
one-stage strategy, compared with best-track data. Red line denotes linear fit of estimation as a function
of best-track data: (a) DCNN validation, (b) ViT validation, (c) DCNN testing, and (d) ViT testing.
ing.
Results in Figure 6 also indicate that both the DCNN and ViT models tend to underestimate
the TC intensity for samples with high intensity levels. This trend is consistent with
the fact that the number of SCI samples decreases with increasing TC intensity.
To further explore the above finding, Figure 7 examines the distribution of estimation
errors (in terms of both absolute and relative errors, i.e., RMSE and MAPE) with geographic
coordinates. To better understand the results exhibited in the figure, Figure 8 depicts the
appearance probability of TC geneses and TCs with different intensity levels. From Figure 7,
large RMSE values are basically located at: (i) Luzon Island and the surrounding areas to
its west/northwest, where TCs (usually with high-intensity levels) are markedly influenced
by landfall-related effects; (ii) the southeast of the Northwest Pacific, which is dominated by TC
geneses; and (iii) the central south of the Northwest Pacific, where both TC geneses and stronger
TCs usually exist. By contrast, the conditions for the relative error differ significantly from those for RMSE:
almost all large MAPE values lie around the periphery of TC-influenced
areas where TCs tend to dissipate, whilst the central areas are characterized by small values.
There are also some patches where large values of both RMSE and MAPE exist, e.g., (155° E,
9° N). From Figure 8, the appearance probability of TCs at these locations is quite low.

Figure 7. Geographic distribution of estimation errors for DCNN and ViT from one-stage strategy:
(a) RMSE for DCNN, (b) RMSE for ViT, (c) MAPE for DCNN, and (d) MAPE for ViT.

The above findings can be reasonably explained as follows: (a) the morphological structures
of TC geneses and TCs during or after landfall are much more complicated, and this
complexity makes the input samples insufficient for training versatile DL models
adequately; (b) the SCI samples of TCs with higher intensity levels are fewer than those
of low-intensity TCs, which degrades the model performance for stronger TCs.
It is worth noting that the utilization of image transportation during data pre-processing
can reduce the negative influence caused by the imbalanced distribution of samples to some
extent. However, to improve the model substantially, more data that cover each typical
condition are still required.

Figure 8. Geographic distribution of appearance probability of: (a) TCs, (b) TC genesis, (c) TCs with
MSW > 65 kt, and (d) TCs with MSW > 80 kt.

3.2. The Two-Stage Strategy

3.2.1. Performance of Classification Models
The first stage involved in the two-stage strategy aims to stratify input samples into
a few intensity groups (two groups in this study, i.e., TY and STYS) via classification models.
Thus, it is helpful to clarify the performance of these classification models for better
understanding that of the two-stage strategy.
The learning curves for the classification models are presented in Figure 9. Table 4
compares the classification results via the DCNN and ViT models. The average values of
prediction accuracy, recall rate, and F1 score for both models exceed 80%. Overall, the
accuracy of DCNN is similar to the one reported by Wang [24] via CNN, while the ViT
model performs better than DCNN especially for samples with higher intensity levels (by
2.5%). Both models perform noticeably better for TY category than for STYS, which is
attributed to the fact that the former category contains larger amount of data and allows
the models to be trained more efficiently.


Figure 9. Learning curves of classification models: (a) DCNN; (b) ViT.

Table 4. Overall performance of the DCNN and ViT classification models.

Model   Validation Accuracy   Testing Accuracy   Category   Precision   Recall Ratio   F1-Score
DCNN    0.805                 0.834              TY         0.875       0.864          0.870
                                                 STYS       0.762       0.780          0.771
ViT     0.831                 0.854              TY         0.885       0.887          0.886
                                                 STYS       0.797       0.795          0.796
To detail the performance of the two classification models, Tables 5 and 6 exhibit the
confusion matrix of predictions against the true labels. Results indicate that STY has the
lowest recognition recall (66%). This phenomenon can be explained by the fact that STY
(64–84 kt) and TY (34–63 kt) are two neighboring intensity categories, and there should be
more samples belonging to varied categories but possessing much similar morphological
features of the TC cloud, which makes the classification more challenging. Except for the
STY category, the recognition recall for both models in other categories tends to increase
with the increase of intensity level, which is consistent with the fact that the morphological
characteristics of TCs from two farther-spaced intensity categories differ from each other
more clearly.

Table 5. Confusion matrix of predictions from the DCNN classification model.

                                True Label
DCNN                            TY      STYS                    Sum
                                        STY     VSTY    VTY
Predicted label   TY            6600    768     150     22      7540
                  STYS          1041    1486    1326    527     4380
                  Sum           7641    2254    1476    549     11920

Table 6. Confusion matrix of predictions from the ViT classification model.

                                True Label
ViT                             TY      STYS                    Sum
                                        STY     VSTY    VTY
Predicted label   TY            6775    729     130     17      7651
                  STYS          866     1525    1346    532     4269
                  Sum           7641    2254    1476    549     11920
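The per-category recall values discussed above can be recomputed from the columns of Table 5. The following sketch uses the DCNN counts exactly as tabulated; the dictionary layout and the function name `category_recall` are illustrative assumptions.

```python
# Columns of Table 5 (DCNN): true category -> (predicted TY, predicted STYS).
table5 = {
    "TY": (6600, 1041),
    "STY": (768, 1486),
    "VSTY": (150, 1326),
    "VTY": (22, 527),
}


def category_recall(table):
    """Fraction of each true category assigned to its correct intensity group."""
    recalls = {}
    for category, (pred_ty, pred_stys) in table.items():
        # TY samples are correct when predicted TY; STY, VSTY and VTY all
        # belong to the STYS group and are correct when predicted STYS.
        correct = pred_ty if category == "TY" else pred_stys
        recalls[category] = correct / (pred_ty + pred_stys)
    return recalls


recalls = category_recall(table5)
```

With these counts the STY recall is 1486/2254, which rounds to the 66% quoted in the text, and the TY recall is 6600/7641, which rounds to the 0.864 listed for DCNN in Table 4.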

3.2.2. Performance of the Two-Stage Strategy

Table 7 compares the performance of four specific scenarios for the two-stage strategy.
Hereafter, X_Y denotes a combination of model X for classification and model Y for regression.
The classification models are trained and optimized using all samples, while
the regression models are treated similarly but using only the samples associated with one of the
two TC-intensity groups. The performance of the two-stage strategy is examined via the
combined usage of the optimized classification model and regression model based on the
testing dataset for the classification model.
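The combined usage of the two models can be sketched as a simple routing step. The function `two_stage_estimate` and the stand-in callables below are hypothetical; in the study the classifier and the two group-specific regressors would be trained DCNN or ViT models.

```python
def two_stage_estimate(image, classifier, regressors, threshold=0.5):
    """Route a sample to the regression model of its predicted intensity group."""
    # Stage 1: the classifier output decides between the TY and STYS groups.
    group = "STYS" if classifier(image) > threshold else "TY"
    # Stage 2: the regressor trained on that group estimates the intensity.
    return group, regressors[group](image)


# Hypothetical stand-ins for trained models, for illustration only.
classifier = lambda image: 0.8
regressors = {"TY": lambda image: 50.0, "STYS": lambda image: 100.0}
group, intensity = two_stage_estimate(None, classifier, regressors)
```

A sample misclassified at the first stage is passed to a regressor trained on the other intensity group, so its error cannot be corrected at the second stage.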

Table 7. Performance of four scenarios for the two-stage strategy.

Scenario      RMSE (kt)   MAE (kt)   MAPE   R
DCNN_DCNN     12.82       9.37       0.17   0.81
DCNN_ViT      13.45       10.00      0.18   0.79
ViT_ViT       12.70       9.63       0.17   0.82
ViT_DCNN      11.78       8.88       0.16   0.84

As expected, ViT_DCNN achieves the best performance. However, comparison of
the results in Tables 3 and 7 reveals that the two-stage strategy does not surpass the one-stage
strategy via DCNN. One major reason is that, for the two-stage strategy, the
samples misclassified at the first stage can introduce enlarged errors into the regression results
at the second stage, and such errors cannot be diminished through better training of the
regression models.
Figure 10 shows the box plots of the estimation bias for each of the four JMA intensity
categories via the four specific two-stage strategies. The red line represents the median, while
the grey points represent outliers. It is evident that the TC intensity of samples in STY, VSTY
and VTY is statistically underestimated. This systematic bias is primarily attributed
to the insufficiency of SCI samples in categories with high-intensity levels. It seems that
the DL models are trained to perform in a similar way to what they have learned from the
majority of input samples, whilst samples beyond the categories of such majority tend to
be treated as the same kind as the majority.
Figure 10. Boxplots of estimation bias for different models via two-stage strategies: (a) DCNN_DCNN,
(b) DCNN_ViT, (c) ViT_ViT, and (d) ViT_DCNN.

Figure 11 examines the geographic distribution of estimation errors from two two-stage
strategies, i.e., ViT_DCNN and ViT_ViT. Compared with the results in Figure 7,
no evident differences are found, although the overall errors in Figure 11 are slightly larger
than those in Figure 7.

Figure 11. Geographic distribution of estimation errors from two two-stage strategies: (a) RMSE for
ViT_DCNN, (b) RMSE for ViT_ViT, (c) MAPE for ViT_DCNN, and (d) MAPE for ViT_ViT.
3.3. Smoothing Manipulation

3.3.1. Based on One-Stage Strategy
As shown in Figure 12, all the smoothing methods applied resulted in improved
estimates, with ViT showing a more significant improvement. Among these methods,
the linear weighting method and MLP fit produced the most notable improvements,
reducing the DCNN’s RMSE by approximately 9% and ViT’s by almost 14%. Moreover, a
comparison with Figure 6 indicates that the smoothing method was effective in reducing
underestimation errors for high-intensity samples.

Figure 12. Smoothed estimations from testing process of DCNN and ViT for the one-stage strategy via
different smoothing methods: (a,e) using linear weighting; (b,f) using GB; (c,g) using RF; (d,h) using
MLP, compared with best-track data.

Similarly, Figure 13 examines the estimation errors after smoothing using DCNN
and ViT models. A comparison with Figure 7 reveals that the RMSE is significantly
lower in the central as well as in the southern region of Northwest Pacific, indicating
the effectiveness of the smoothing method in reducing errors in high-intensity samples.
However, there is not much change in the MAPE before and after smoothing as shown
in Figure 13. Generally, MAPE demonstrates higher sensitivity to observations exhibiting
abrupt intensity changes, which are prone to occur during coastal landfalls or during initial
phases of rapid intensification in the open ocean. Consequently, it is plausible that the
smoothing methods may not substantially enhance these specific scenarios.
Figure 13. Geographic distribution of smoothed estimation errors for DCNN_Linear and ViT_MLP
from one-stage strategy: (a) RMSE for DCNN_Linear, (b) RMSE for ViT_MLP, (c) MAPE for
DCNN_Linear, and (d) MAPE for ViT_MLP.
ViT_MLP.

3.3.2. Based on Two-Stage Strategy
In this section, we present an examination of the smoothing estimates of the two-stage
strategy model. Table 8 presents the evaluation indices for the various model combinations.
For the sake of brevity, we use abbreviations of the models to denote the combinations,
such as X_Y_Z, where X is the classification model, Y is the regression model, and Z is the
smoothing method. Since DCNN performs better than ViT in the one-stage strategy, DCNN
is then utilized as the regression model.
Table 8. Overall performance of the two-stage strategy smoothed models.

Model        RMSE (kt)   MAE (kt)   MAPE   R
D_D_Linear   12.07       9.16       0.16   0.84
D_D_GB       12.05       9.17       0.16   0.84
D_D_RF       12.00       9.15       0.16   0.84
D_D_MLP      12.13       9.15       0.16   0.84
V_D_Linear   10.64       8.10       0.15   0.88
V_D_GB       10.90       8.29       0.15   0.87
V_D_RF       10.85       8.30       0.15   0.87
V_D_MLP      10.65       8.09       0.15   0.88

Table 8 shows that the optimal model combination uses ViT for classification,
DCNN for regression, and MLP for smoothing. However, even the best combination
does not outperform the best one-stage model after linear-weighting smoothing.
Our analysis indicated that the classification performance has a significant impact
on the results. For example, while ViT is only 2% more accurate than DCNN as the
classification model, it can reduce the RMSE by almost 1 kt. Further analysis revealed
that the MAE of TC samples correctly classified by ViT was only 7.4 kt, but for
misclassified samples it was as high as 17.3 kt. This suggests that enhancing the
classification performance can significantly improve the TC intensity estimation.
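
The two conditional MAEs quoted above (7.4 kt for correctly classified samples and 17.3 kt for misclassified ones) allow a rough back-of-the-envelope check of how much the classifier matters. The sketch below treats the overall MAE as the accuracy-weighted mean of the two values. It assumes that the conditional MAEs stay fixed as accuracy changes, which may not hold in practice, and the accuracy values in the loop are hypothetical because the classifier's accuracy is not stated in this section.

```python
def overall_mae(accuracy, mae_correct=7.4, mae_wrong=17.3):
    """Overall MAE (kt) as a mixture of the two conditional MAEs.

    accuracy    : fraction of samples assigned to the right intensity group
    mae_correct : MAE for correctly classified samples (7.4 kt in the text)
    mae_wrong   : MAE for misclassified samples (17.3 kt in the text)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return accuracy * mae_correct + (1.0 - accuracy) * mae_wrong

# Hypothetical accuracies, for illustration only.
for acc in (0.80, 0.85, 0.90):
    print(f"accuracy={acc:.2f} -> overall MAE ~ {overall_mae(acc):.2f} kt")
```

Under this simplification, each percentage point of classification accuracy is worth about 0.1 kt of overall MAE, since (17.3 - 7.4) / 100 = 0.099.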

Figure 14 depicts the distribution of estimation errors of the two best models in this
section. In contrast to Figure 11, the smoothing method applied to the two-stage strategy
results in reduced RMSE and MAPE in the central and southern regions of the Northwest
Pacific. However, the estimates do not show improvement across regions compared to the
smoothed estimation errors for the one-stage strategy shown in Figure 13.

Figure 14. Geographic distribution of smoothed estimation errors for V_D_Linear and V_D_MLP
from two-stage strategy: (a) RMSE for V_D_Linear, (b) RMSE for V_D_MLP, (c) MAPE for V_D_Linear,
and (d) MAPE for V_D_MLP.

3.3.3. Smoothed Estimation Based on Hybrid Strategies
Further, we conduct an in-depth exploration of the hybrid of the one-stage and
two-stage strategies. Figure 15 illustrates the scatter plot of the
approach and the corresponding error results. Similar to Table 8, Figure 15 also employs
abbreviations to denote the combinations of models. In the abbreviation V_D_D, 'V'
indicates the use of ViT as the classification model and the first 'D' denotes the use
of DCNN as the regression model in the two-stage strategy, while the second 'D'
represents estimation conducted directly using DCNN in the one-stage strategy. The complete
abbreviation indicates the hybrid of the one-stage and two-stage strategies.
Figure 15a displays the hybrid strategies without any smoothing method, which have
outperformed the other two strategies. Next, four smoothing treatments were applied
to the hybrid strategies, and as anticipated, all results in Figure 15b–e outperformed the
previous strategies. It is worth noting that linear fitting smoothing and MLP smoothing are
still the best performing in hybrid strategies.
We have conducted a further comparison of the error distributions for different
latitudes and longitudes. The results presented in Figure 16 demonstrate that the error is
reduced for almost the entire central region of the Northwest Pacific, with the MAPE
showing a particularly noticeable improvement. While the hybrid strategy performs better in
comparison to Figures 7, 11, 13 and 14, the TC estimates for the near coastal areas do not
show any significant improvement. Apart from the influence of abrupt intensity changes
on coastal regions, the quantity or quality of SCIs in these areas might also potentially
introduce disturbances to the results.

Figure 15. Smoothed estimations from testing process of hybrid strategy models, compared with
best-track data: (a) V_D_D, (b) V_D_D_Linear, (c) V_D_D_GB, (d) V_D_D_RF, and (e) V_D_D_MLP.

Figure 16. Geographic distribution of smoothed estimation errors for V_D_D_Linear and
V_D_D_MLP from the hybrid strategy: (a) RMSE for V_D_D_Linear, (b) RMSE for V_D_D_MLP,
(c) MAPE for V_D_D_Linear, and (d) MAPE for V_D_D_MLP.

3.4. Comparison with Other Techniques
The best estimation results obtained from this study (i.e., via V_D_D_Linear and
V_D_D_MLP) are compared with their counterparts via varied techniques from other
studies. The reference sources are selected to account for the condition over the Northwest
Pacific Ocean. As reflected in Table 9, SATCON has achieved the best performance among all
the sources, but our hybrid strategies outperform all other methods. The best model proposed
in this study surpasses methods like VGG19 and TCIENet due to two primary factors.
Firstly, our approach utilizes the advanced ViT classifier. This classifier integrates an
attention mechanism capable of capturing global information, thereby enhancing the
sample classification capability. The incorporation of ViT contributes to the reduction in the
final error of our hybrid strategy. Secondly, this study introduces a smoothing technique.
Given the gradual evolution characteristic of TC, the application of the smoothing technique
to the model’s output yields more stable estimations, consequently leading to a notable
decrease in the overall TC estimation error.
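
To make the idea of smoothing concrete, the sketch below applies a causal weighted moving average to a sequence of raw intensity estimates from consecutive images of one TC. This is an illustrative assumption rather than the exact linear weighting scheme of this study: the window length, the linearly increasing weights and the function name `smooth_linear` are all hypothetical choices.

```python
import numpy as np

def smooth_linear(raw_estimates, window=4):
    """Causal smoothing of per-image intensity estimates (kt).

    Each output value is a weighted mean of the current estimate and up to
    `window - 1` earlier ones. Weights grow linearly with recency, so the
    newest estimate counts most (an assumed, not a fitted, weighting).
    """
    raw = np.asarray(raw_estimates, dtype=float)
    smoothed = np.empty_like(raw)
    for t in range(len(raw)):
        start = max(0, t - window + 1)
        segment = raw[start:t + 1]
        weights = np.arange(1, len(segment) + 1, dtype=float)
        smoothed[t] = np.dot(weights, segment) / weights.sum()
    return smoothed

# Hypothetical raw estimates with one spurious jump at the fourth image.
raw = [60.0, 62.0, 65.0, 90.0, 68.0, 70.0]
print(smooth_linear(raw, window=4))
```

Only past and current estimates enter each smoothed value, so a scheme of this kind can be applied as new images arrive. The trade-off is that genuine rapid intensity changes are damped together with the noise.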
It should be stressed that estimation results from DL models are readily influenced
by factors such as data sources and sample selection, which makes it difficult to compare
these results objectively and fairly. For instance, there are variations in the selection of labels,
best-track data, and gust duration involved in the definition of MSW. Additionally, there
may be variations in the types of adopted images. While some studies only use IR images,
others may also incorporate WV satellite cloud images, among others. Finally, the test
samples may also differ from one another. While some studies use recent TCs as the test
set, others select TC samples from the past. These factors inevitably contribute to biases in
both image quality and label precision.
Comparison of the results in Figures 17 and 18 with those in Table 9 also reveals
some discrepancies. Typically, the proposed DL-aided methods demonstrate superior
performance over SATCON. This discrepancy is expected to be attributed to the utilization
of different baseline data for evaluating varied techniques. In principle, it is the best way
to compare the estimation results with in situ data. However, such records are usually
unavailable in the Northwest Pacific basin. Moreover, the selection of SCIs in this study
unavoidably introduces deviations in relation to labels from varying sources. In such cases,
the baseline data can be selected from the TC best-track dataset. As the best-track data
issued from JMA and CMA are suggested to be more reliable for TCs over the Northwest
Pacific basin [35,36], they are used as the baseline data in this study to train, validate and
test the DL-aided models.

Figure 17. Histogram of errors obtained via different techniques: (a) RMSE, (b) MAE, (c) MAPE,
and (d) R coefficient.

Figure 18. Boxplots of estimation bias for different techniques: (a) for TY intensity samples, (b) for
STY intensity samples, (c) for VSTY intensity samples, and (d) for VTY intensity samples.
Table 9. Comparison of the best estimation performance in this study with those in references.

Model          RMSE (kt)   MAE (kt)   TC Year      Reference
ADT9.0         11.24       8.67       2018         Olander et al. [4]
DAV-T          14.3        -          2007–2011    Ritchie et al. [37]
SATCON         8.9         7.70       2008–2010    Velden and Herndon [12]
TCIENet        10.12       7.94       2017         Zhang and Liu [25]
CNN-TC         12.25       -          2015–2016    Chen et al. [23]
VGG19          13.23       -          2015–2016    Combinido et al. [21]
CNN ¹          10.19       -          2015–2018    Wang et al. [24]
V_D_D_Linear   9.81        7.52       2018–2019    This study
V_D_D_MLP      9.85        7.51       2018–2019    This study
¹ Dataset is split randomly in the given proportion.

To further demonstrate the validity of the proposed methods and strategies, we
conduct a detailed comparison of our results with those from two of the most authoritative
and representative technologies, i.e., ADT and SATCON, as shown in Figures 17 and 18.
From Figure 17, the DL-aided methods perform better than ADT and SATCON, with
the RMSE and MAE decreased by approximately 30%. On the other hand, results from
Figure 18 indicate that the DL-aided methods can provide more stable (i.e., performance
varies less significantly) and meanwhile less biased estimations.
Figure 19 further details the comparison and depicts the evolution of estimations
obtained via different methods against the best-track records. Again, the proposed DL
models perform better than ADT and SATCON, particularly during the periods when
TCs experience sudden variation in intensity. There are also some points to be stressed.
First, at the initial stage of the TCs, most of the methods (in particular the DL methods)
tend to overestimate TC intensity slightly. Second, ADT and SATCON show significant
overestimation when TC intensity exceeds ~100 kt. Third, the DL models are more likely to
underestimate TCs at high-intensity status.

Figure 19. Comparison of estimations via varied methods for four TCs: (a) Mangkhut in 2018, (b) Yutu
in 2018, (c) Wutip in 2019, and (d) Hagibis in 2019.

To further explore the phenomena observed in Figure 19, we scrutinize the variations
of estimation results together with associated SCIs among different developing stages, i.e.,
formation stage, mature stage and dissipation stage, as shown in Figure 20. For TCs at
both the formation and dissipation stages, the morphological structures of TC clouds are
manifold, and the samples become relatively insufficient to generate versatile DL models.
During the mature period, the MSW values for the four cases all exceed 100 kt. Evidently,
there are limited samples to train the DL models adequately for this case, and they are
therefore more likely to underestimate TC intensity.

Figure 20. Estimation errors for 4 TCs at varied developing stages (A, B, C represent the formation,
mature, and dissipation stage): (a) Mangkhut in 2018; (b) Yutu in 2018; (c) Wutip in 2019; (d) Hagibis
in 2019.

4. Concluding Remarks
In this study, we exploited two mainstream DL models, i.e., DCNN and ViT, and
some smoothing techniques to estimate TC intensity from SCIs. Several strategies were
proposed to improve the estimation performance, including the one-stage strategy, the
two-stage strategy and a hybrid strategy consisting of the above strategies and smoothing
manipulations. Main results and conclusions are summarized as below.
(1) For the one-stage strategy, both DCNN and ViT were used as the regression models.
Results suggested that DCNN outperformed ViT slightly, with the RMSE for ViT
being approximately 1 kt larger than that for DCNN.
(2) For the two-stage strategy, a classification model and a regression model were
combined to firstly classify input samples into several intensity groups and then to specify
the TC intensity. Despite the reasonable idea behind this strategy, it did not lead to
further improvement of the model performance. The minimum RMSE was a bit larger
(0.6 kt) than that of DCNN for the one-stage strategy.
(3) We further exploited different smoothing methods to refine the output results from
either the regression/classification models or their combinations. The results demon-
strated that the DCNN regression model with linear weighting and MLP methods
outperformed the optimal model for the one-stage strategy, with RMSE values de-
creased by 1.08 kt and 1.00 kt, respectively.
(4) We also combined the one-stage strategy, the two-stage strategy and smoothing manipulation
together to form the V_D_D_Linear and V_D_D_MLP hybrid strategies. Such hybrid
strategies generated the best performance in this study, with the lowest RMSE value equal to
9.81 kt.
(5) Finally, the model performance presented in this study was compared to those re-
ported by others. Results showed that the DL model performed better than most
existing methods.
Although better estimation performance has been achieved through combined us-
age of multiple DL techniques and strategies, it should be clarified that fundamental
improvements of ML-aided estimation of TC intensity should essentially come from the
advancement of either the quantity/quality of data for model training or the ML models
themselves. Thus, we can further optimize the DL models and their hybrid as discussed in
this study by using: (i) more credible data (e.g., aircraft observations) instead of traditional
best-track records as SCIs’ label information; (ii) larger amount of data and more types
of SCIs (e.g., enhanced SCIs and WV images); (iii) additional physical knowledge and/or
other kinds of input information that affects TC intensity (e.g., sea surface temperature,
vorticity, and vertical wind shear). Meanwhile, we can use more advanced DL models,
such as the Swin Transformer [38] and DeiT [39], which have been demonstrated to possess
some overwhelming advantages against DCNN or ViT in certain respects.

Author Contributions: B.T., investigation, visualization, data curation, and writing—original draft.
J.F., funding acquisition, project administration, and supervision. Y.D., methodology development
and investigation. Y.H. (Yongjun Huang), visualization and data curation. P.C., data curation. Y.H.
(Yuncheng He), formal analysis, writing—editing, conceptualization, funding acquisition, and project
administration. All authors have read and agreed to the published version of the manuscript.
Funding: The authors wish to acknowledge the financial support provided by the National Science
Fund for Distinguished Young Scholars (Grant No: 51925802), the National Natural Science Foun-
dation of China (Grant No: 52178465), the Natural Science Foundation of Guangdong Province for
Distinguished Young Scholars (Grant No: 2023B1515020117), the Guangzhou Municipal Science and
Technology Project (Grant No: 202201021330190101) and the Ministry of Education, China-111 Project
(Grant No: D21021).
Data Availability Statement: The data utilized in this research can be accessed openly from multiple
sources. The primary sources include the Archives of Weather Home at Kochi University, Japan
(http://weather.is.kochi-u.ac.jp/archive-e.html, accessed on 30 July 2022), the Japan Meteorological Agency
(JMA, https://www.data.jma.go.jp/, accessed on 15 June 2022), as well as the ADT
(https://tropic.ssec.wisc.edu/real-time/adt/adt.html, accessed on 20 June 2022) and SATCON methods
(https://tropic.ssec.wisc.edu/real-time/satcon/, accessed on 20 June 2022).
Acknowledgments: The authors would like to thank our colleagues who made suggestions for our
paper and the developers who selflessly provided the source code to the researchers.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Dvorak, V.F. Tropical cyclone intensity analysis using satellite data. In NOAA Technical Report NESDIS, 11; US Department of
Commerce, National Oceanic and Atmospheric Administration, National Environmental Satellite, Data, and Information Service:
Washington, DC, USA, 1984; pp. 1–47.
2. Velden, C.S.; Olander, T.L.; Zehr, R.M. Development of an objective scheme to estimate tropical cyclone intensity from digital
geostationary satellite infrared imagery. Weather Forecast. 1998, 13, 172–186. [CrossRef]
3. Velden, C.; Harper, B.; Wells, F.; Beven, J.L.; Zehr, R.; Olander, T.; Mayfield, M.; Guard, C.C.; Lander, M.; Edson, R. The Dvorak
tropical cyclone intensity estimation technique: A satellite-based method that has endured for over 30 years. Bull. Am. Meteorol.
Soc. 2006, 87, 1195–1210. [CrossRef]
4. Olander, T.L.; Velden, C.S. The advanced Dvorak technique (ADT) for estimating tropical cyclone intensity: Update and new
capabilities. Weather Forecast. 2019, 34, 905–922. [CrossRef]
5. Kidder, S.Q.; Goldberg, M.D.; Zehr, R.M.; DeMaria, M.; Purdom, J.F.; Velden, C.S.; Grody, N.C.; Kusselson, S.J. Satellite analysis of
tropical cyclones using the Advanced Microwave Sounding Unit (AMSU). Bull. Am. Meteorol. Soc. 2000, 81, 1241–1260. [CrossRef]
6. Bankert, R.L.; Tag, P.M. An automated method to estimate tropical cyclone intensity using SSM/I imagery. J. Appl. Meteorol. 2002,
41, 461–472. [CrossRef]
7. Piñeros, M.F.; Ritchie, E.A.; Tyo, J.S. Estimating tropical cyclone intensity from infrared image data. Weather Forecast. 2011, 26,
690–698. [CrossRef]
8. Fetanat, G.; Homaifar, A.; Knapp, K.R. Objective tropical cyclone intensity estimation using analogs of spatial features in satellite
data. Weather Forecast. 2013, 28, 1446–1459. [CrossRef]
9. Rodríguez-Herrera, O.G.; Wood, K.M.; Dolling, K.P.; Black, W.T.; Ritchie, E.A.; Tyo, J.S. Automatic tracking of pregenesis tropical
disturbances within the deviation angle variance system. IEEE Geosci. Remote Sens. Lett. 2014, 12, 254–258. [CrossRef]
10. Knaff, J.A.; Longmore, S.P.; DeMaria, R.T.; Molenar, D.A. Improved tropical-cyclone flight-level wind estimates using routine
infrared satellite reconnaissance. J. Appl. Meteorol. Climatol. 2015, 54, 463–478. [CrossRef]
11. Zhao, Y.; Zhao, C.; Sun, R.; Wang, Z. A multiple linear regression model for tropical cyclone intensity estimation from satellite
infrared images. Atmosphere 2016, 7, 40. [CrossRef]
12. Velden, C.S.; Herndon, D. A consensus approach for estimating tropical cyclone intensity from meteorological satellites: SATCON.
Weather Forecast. 2020, 35, 1645–1662. [CrossRef]
13. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
14. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef]
15. Han, X.; Li, X.; Yang, J.; Wang, J.; Zheng, G.; Ren, L.; Chen, P.; Fang, H.; Xiao, Q. Dual-Level Contextual Attention Generative
Adversarial Network for Reconstructing SAR Wind Speeds in Tropical Cyclones. Remote Sens. 2023, 15, 2454. [CrossRef]
16. Tong, B.; Wang, X.; Fu, J.; Chan, P.; He, Y. Short-term prediction of the intensity and track of tropical cyclone via ConvLSTM
model. J. Wind Eng. Ind. Aerodyn. 2022, 226, 105026. [CrossRef]
17. Pang, S.; Xie, P.; Xu, D.; Meng, F.; Tao, X.; Li, B.; Li, Y.; Song, T. NDFTC: A new detection framework of tropical cyclones from
meteorological satellite images with deep transfer learning. Remote Sens. 2021, 13, 1860. [CrossRef]
18. Sun, Z.; Zhang, B.; Tang, J. Estimating the Key Parameter of a Tropical Cyclone Wind Field Model over the Northwest Pacific
Ocean: A Comparison between Neural Networks and Statistical Models. Remote Sens. 2021, 13, 2653. [CrossRef]
19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
20. Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform.
2019, 16, 4681–4690. [CrossRef]
21. Combinido, J.S.; Mendoza, J.R.; Aborot, J. A convolutional neural network approach for estimating tropical cyclone intensity
using satellite-based infrared images. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR),
Beijing, China, 20–24 August 2018; pp. 1474–1480.
22. Wimmers, A.; Velden, C.; Cossuth, J.H. Using deep learning to estimate tropical cyclone intensity from satellite passive microwave
imagery. Mon. Weather Rev. 2019, 147, 2261–2282. [CrossRef]
23. Chen, B.; Chen, B.-F.; Lin, H.-T. Rotation-blended CNNs on a new open dataset for tropical cyclone image-to-intensity regression.
In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK,
19–23 August 2018; pp. 90–99.
24. Wang, C.; Zheng, G.; Li, X.; Xu, Q.; Liu, B.; Zhang, J. Tropical cyclone intensity estimation from geostationary satellite imagery
using deep convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4101416. [CrossRef]
25. Zhang, R.; Liu, Q.; Hang, R. Tropical cyclone intensity estimation using two-branch convolutional neural network from infrared
and water vapor images. IEEE Trans. Geosci. Remote Sens. 2019, 58, 586–597. [CrossRef]
26. Lee, J.; Im, J.; Cha, D.-H.; Park, H.; Sim, S. Tropical cyclone intensity estimation using multi-dimensional convolutional neural
networks from geostationary satellite data. Remote Sens. 2019, 12, 108. [CrossRef]
27. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.;
Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
28. Wang, D.; Zhang, Q.; Xu, Y.; Zhang, J.; Du, B.; Tao, D.; Zhang, L. Advancing plain vision transformer toward remote sensing
foundation model. IEEE Trans. Geosci. Remote Sens. 2022, 61, 5607315. [CrossRef]
29. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks.
Nature 2023, 619, 533–538. [CrossRef] [PubMed]
30. Harper, B.; Kepert, J.; Ginger, J. Guidelines for Converting between Various Wind Averaging Periods in Tropical Cyclone
Conditions; World Meteorological Organization WMO/TD 1555. 2010, p. 64. Available online:
https://library.wmo.int/doc_num.php?explnum_id=290 (accessed on 25 August 2023).
31. Tong, B.; Sun, X.; Fu, J.; He, Y.; Chan, P. Identification of tropical cyclones via deep convolutional neural network based on satellite
cloud images. Atmos. Meas. Tech. 2022, 15, 1829–1848. [CrossRef]
32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you
need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017;
pp. 5998–6008.
33. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [CrossRef]
34. Knaff, J.A.; DeMaria, R.T. Forecasting tropical cyclone eye formation and dissipation in infrared imagery. Weather Forecast. 2017,
32, 2103–2116. [CrossRef]
35. Ren, F.; Liang, J.; Wu, G.; Dong, W.; Yang, X. Reliability analysis of climate change of tropical cyclone activity over the western
North Pacific. J. Clim. 2011, 24, 5887–5898. [CrossRef]
36. Bai, L.; Tang, J.; Guo, R.; Zhang, S.; Liu, K. Quantifying interagency differences in intensity estimations of Super Typhoon Lekima
(2019). Front. Earth Sci. 2022, 16, 5–16. [CrossRef]
37. Ritchie, E.A.; Wood, K.M.; Rodríguez-Herrera, O.G.; Piñeros, M.F.; Tyo, J.S. Satellite-derived tropical cyclone intensity in the
North Pacific Ocean using the deviation-angle variance technique. Weather Forecast. 2014, 29, 505–516. [CrossRef]
38. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using
shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada,
11–17 October 2021; pp. 10012–10022.
39. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation
through attention. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, 18–24 July 2021;
Volume 139, pp. 10347–10357.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.


Remote Sens. 2023, 15, 4188. https://doi.org/10.3390/rs15174188 https://www.mdpi.com/journal/remotesensing
