# Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey

arXiv:2312.03014v1 [cs.LG] 5 Dec 2023

Abstract—
As artificial intelligence (AI) continues to rapidly evolve, the realm of Earth and atmospheric sciences is increasingly adopting data-driven models, powered by progressive developments in deep learning (DL). Specifically, DL techniques are extensively utilized to decode the chaotic and nonlinear aspects of Earth systems, and to address climate challenges via understanding weather and climate data. Cutting-edge performance on specific tasks within narrower spatio-temporal scales has been achieved recently through DL. The
rise of large models, specifically large language models (LLMs), has enabled fine-tuning processes that yield remarkable outcomes
across various downstream tasks, thereby propelling the advancement of general AI. However, we are still navigating the initial
stages of crafting general AI for weather and climate. In this survey, we offer an exhaustive, timely overview of state-of-the-art AI
methodologies specifically engineered for weather and climate data, with a special focus on time series and text data. Our primary
coverage encompasses four critical aspects: types of weather and climate data, principal model architectures, model scopes and
applications, and datasets for weather and climate. Furthermore, in relation to the creation and application of foundation models for
weather and climate data understanding, we delve into the field’s prevailing challenges, offer crucial insights, and propose detailed
avenues for future research. This comprehensive approach equips practitioners with the requisite knowledge to make substantial
progress in this domain. Our survey encapsulates the most recent breakthroughs in research on large, data-driven models for weather
and climate data understanding, emphasizing robust foundations, current advancements, practical applications, crucial resources, and
prospective research opportunities.
Index Terms—Foundation Models, Weather & Climate Analysis, Deep Learning, Time Series, Spatio-Temporal data, Earth System.
[Figure: overview graphic — a weather & climate foundation model is fine-tuned for downstream tasks such as downscaling.]
oversimplified representation of local geographical features [23], as they often fail to capture the intricate nuances of local topography, which exerts a critical influence on regional weather and climate patterns. Another obstacle is the effective integration of observational data from disparate sources, such as weather stations, radars, and satellites [8]. Traditional models often struggle to incorporate these data, with their varying spatial and temporal resolutions, into their modeling frameworks. Moreover, they require substantial computational resources to manage the myriad of physical constraints [24]. The complexity and scale of the Earth system demand extensive calculations, presenting challenges to computational capacity and efficiency.

The rapid advancement of AI technology has introduced cost-effective, direct, and simplified solution strategies for weather and climate modeling. In particular, Machine Learning (ML) and Deep Learning (DL) technologies can discern potential trend representations in weather and climate data, bypassing the need for intricate physical relationships. Initially, ML techniques were sparingly used for short-term, localized forecasts of weather and climate conditions, given their limited capabilities compared with large-scale, time-extensive physical models. However, the past decade has witnessed an exponential surge in the application of data-driven deep learning methods in weather and climate research, propelled by the explosive expansion of global weather and climate data [25], [26]. Capitalizing on abundant data resources and advancements in computational technology [27], [28], these models are revolutionizing climate science [29]. Employing voluminous data, deep learning models unravel the intricate nonlinear relationships concealed within climate variables, thereby capturing the dynamism and complexity of the climate system with enhanced precision [30], [31]. However, these models are often designed for specific tasks and trained with data in particular formats, such as regional weather forecasting or downscaling on a microscale. Differences in the representations of training data sources have resulted in an overly compartmentalized functionality of data-driven deep learning models for understanding weather and climate data. Consequently, it poses a significant challenge to develop a versatile climate model that can be fine-tuned for simulating the global weather and climate system.

The recent emergence and swift advancement of large models have yielded significant gains across various fields, including natural language processing (NLP), computer vision (CV) [32], robotics [33], and a range of interdisciplinary areas encompassing life sciences [34], [35], [36], [37], [38]. Particularly in the NLP field, large models, or large language models (LLMs), are evolving rapidly, trained on large-scale corpora and fine-tuned for various downstream tasks [39], [40], [41]. In computer vision, large vision models trained
on substantial natural images [42], [43], [44] demonstrate exceptional zero-shot capabilities [45], [46]. The impressive performance of these models across tasks arises from their substantial parameter counts and large-scale pre-training data. For instance, GPT-3 [47], [48] possesses nearly 120 times the parameters of GPT-2 [49], enabling it to learn more powerfully from fewer samples, while GPT-4 [50] has less than ten times the parameters of GPT-3, yet excels in text generation and image understanding. The rapid ascension of LLMs has redefined the path forward for deep learning, despite long-standing developments in areas such as unsupervised/semi-supervised and transfer learning. A notable example is the vision-language large model [46], [51], [52], [53], such as CLIP [46], which is trained on numerous natural image-text pairs and fine-tuned to achieve promising results in tasks like image segmentation [54], [55], [56] and video subtitle generation [57], [58]. Recently, the extension of large models into domains such as speech [59], [60], physics [61], and mathematical analysis [62] has catalyzed advancements in fundamental science and specialized areas.

The groundbreaking success of pre-trained foundation models has propelled the domains of NLP and CV significantly closer to the realization of versatile, general-purpose AI. This advancement prompts an intriguing question: Is it possible to develop a universal foundation model for weather and climate data understanding that effectively addresses a myriad of related tasks?

Building upon the theory of pre-trained models, ClimaX [25] introduces an innovative approach towards the development of a weather and climate base model. It leverages the Transformer to pre-train on large-scale weather and climate data, yielding a flexible foundation model proficient in short- to medium-term forecasting, climate projection, and downscaling. Both Pangu-Weather [63] and W-MAE [64] exhibit robust climate prediction capabilities by modeling the global climate system using copious data. However, the quest for large-scale, universal climate models faces significant obstacles. A primary challenge is the scarcity of large, diverse, and high-quality training datasets. Existing datasets (refer to Table 4 for more details) struggle with inconsistent measurements, spatial-temporal biases, and limited functionality, hampering the progression of all-encompassing, multipurpose large-scale foundation models. Additionally, the computational demands of these models add another dimension of complexity, with the required infrastructure potentially unachievable in resource-limited settings. Ideally, a weather/climate foundation model should seamlessly handle multi-source observations and incorporate detailed representations of geographic features to generate more precise simulations of weather and climate trends. Unfortunately, this remains a largely uncharted territory for current weather and climate base models. Moreover, the interpretability of these models, often perceived as "black boxes," is a significant concern. In tasks related to weather and climate, where erroneous predictions can wreak havoc on ecosystems and societies, the need for interpretability is especially accentuated [36], [65], [66]. Despite the remarkable strides and potential in understanding weather and climate data, the distinct challenges associated with the development of large-scale foundation models, as outlined above, necessitate concentrated research (refer to Sec. 9 for more details). This emphasizes the need for a thorough review of advancements in this nascent field.

In this paper, we conduct a comprehensive review of data-driven models explicitly designed for weather and climate data. Our survey encompasses a wide array of large foundation models and task-specific models spanning various data types, model architectures, application domains, and representative tasks. This review amplifies the scope of insights derived from weather and climate data, encouraging novel strategies and fostering the cross-application of large models in the weather and climate domain. By leveraging the power of DL in large-scale models, we aim to reveal complex climate patterns, augment predictions, and deepen our comprehension of the climate system, thereby empowering society to more effectively adapt to the challenges posed by climate change. Our contributions are summarized as follows:

• First Comprehensive and Contemporary Survey. To the best of our knowledge, this paper constitutes the inaugural comprehensive survey that thoroughly encapsulates the state-of-the-art developments of large and task-specific models for weather and climate data understanding, spanning time series, video streams, and text sequences. We furnish an in-depth and current panorama that covers the broad spectrum of the domain, simultaneously delving into the subtleties of distinct methodologies, thereby providing the reader with a comprehensive and current apprehension of this field.

• Systematic and In-depth Categorization. We introduce and discuss an organized and detailed categorization, dividing existing related research into two main categories: large climate foundation models and task-specific climate models. Furthermore, we classify them based on the underlying model architectures, including RNNs, Transformers, GANs, Diffusion models, and Graph Neural Networks. Subsequent divisions are made based on the models' application domains and specific tasks, with detailed explanations of these task definitions. This multidimensional categorization provides readers with a coherent roadmap.

• Abundant Resource Compilation. We have assembled a substantial collection of datasets and open-source implementations pertinent to the field of weather and climate science. Each dataset is supplemented with an exhaustive description of its structure, pertinent tasks, and direct hyperlinks for expedient access. This compilation serves as an invaluable resource for prospective research and developmental endeavors in the domain.

• Future Outlook and Research Opportunities. We have delineated several promising trajectories for future exploration. These viewpoints span various domains, including data post-processing, model architectures, interpretability, privacy, and training paradigms, among others. This discourse equips the readers with an intricate understanding of the current status of the field and potential avenues for future exploration.
• Insights for Designing. We discuss and pinpoint crucial design elements for promising weather and climate foundation models. These design components incorporate the selection of temporal and spatial scales, dataset choice, data representation and model design, learning strategies, and evaluation schemes. Adherence to this systematic design pipeline enables practitioners to rapidly comprehend the design principles and construct robust weather and climate foundation models, thereby fostering the expeditious advancement of the weather and climate domain.

Paper Organization. The remainder of this survey is structured as follows: Section 2 delineates the distinctions between our survey and other corresponding studies. Section 3 instills the reader with fundamental knowledge on foundation models, primary depictions of weather and climate data, and related tasks. Section 4 expounds upon the core architecture of paramount models for weather and climate tasks. In Section 6, we present a synopsis of the principal model classifications currently in use for weather and climate tasks, encompassing climate foundation models and task-specific models. This section furnishes a holistic view of the field prior to probing into the complexities of individual methodologies. Section 5 imparts a concise introduction to climate foundation models and task-specific models, further stratifying task-specific models based on dissimilar model architectures. Subsequently, Section 7 undertakes an extensive exploration of data-driven deep learning models for specific weather and climate tasks. Considering the lack of a unified and comprehensive index for weather and climate datasets, Section 8 presents an exhaustive collection of dataset resources and introductions, aiming to impart convenience and efficiency to readers. Section 9 delineates the challenges currently impeding the evolution of weather and climate foundation models, as well as prospective future directions in this field. Section 10 proposes a potential blueprint for the construction of weather and meteorological foundation models, aiding contemplation and execution by practitioners, and fostering the development of climate foundation models. Finally, Sec. 11 provides a summary and concluding remarks on the content of the survey.

2 RELATED WORK AND DIFFERENCES

While numerous expansive surveys have been executed to model weather and climate-related data from various vantage points, none of them emphasize the broad-spectrum scope of weather data. For example, Ren et al. [31] undertook a survey on deep learning-based weather forecasting, focusing on neural network architecture design and spatial and temporal scales, yet it omitted models pertinent to the era of the weather data explosion. Both Fang et al. [67] and Jones et al. [71] reviewed deep learning-based weather forecasting within the confines of specific scenarios, namely extreme weather conditions and climate impacts on flood risk. Conversely, Bochenek et al. [68] and Jaseena et al. [74] exclusively addressed and summarized machine learning/deep learning-based works concerning ordinary time series. Chen et al. [70] provided a survey of machine learning methodologies in weather and climate, but the focus remained restricted to forecasting tasks. Furthermore, Molina et al. [72] primarily emphasized the application of machine learning in climate modeling, such as sources of predictability in climate variability models, feature detection, extreme weather and climate prediction, observational model integration, downscaling, and bias correction. Materia [8] primarily centered on reviewing literature that employed machine learning techniques for extreme weather detection and understanding. These aforementioned surveys lack a thorough investigation into the applications of foundation models in weather data understanding. Mukkavilli et al. [73] discussed the application of large models to weather and climate tasks and their architectural design, which bears similarity to our endeavor, but does not include more detailed task-specific models or a wider range of data modalities. Globally, these surveys also lack a structured delineation and an exhaustive discussion of deep learning-based models for weather data understanding, as well as adequate resources (datasets, open-source models and tools, etc.), which are either not provided or are limited in their availability.

Given the recent multiplication of large-scale models in domains such as vision [45], [75], audio [50], and text [56], our intention with this survey is to provide an exhaustive and up-to-date overview of large-scale models for weather data understanding, as well as a structured delineation, synthesis, and discussion of pertinent task-specific models, with the objective of establishing a robust foundation for the design of weather and climate base models. Our aim surpasses merely documenting recent advances; we also focus on available resources, practical applications, and potential research directions. Table 2 encapsulates the discrepancies between our survey and other analogous reviews.

3 BACKGROUND AND PRELIMINARY

This study aims to review the recent progress in implementing data-driven models, with a primary emphasis on DL techniques, to address weather and climate tasks. The objective is to illuminate potential pathways for developing foundation models dedicated to weather and climate data understanding. We direct our attention towards two principal categories of models in the weather and climate domains: large-scale foundational models and task-specific models. In this section, we commence by discussing these two model types and elucidating their distinctions and connections. Subsequently, we delineate weather and climate-related data types and representative tasks across diverse domains. We conclude with an introduction to four prevalent base model architectures employed in weather and climate tasks.

3.1 Foundation Models

Foundation Models (FMs) originated as pre-trained LLMs with a broad capability to undertake a myriad of downstream tasks through fine-tuning strategies. These models constitute a versatile class, separate from task-specific models, due to their capacity to accommodate a range of downstream tasks and integrate heterogeneous representations. The prowess of FMs can be classified into two categories: (1) Cross-Modal Representation and (2) Reasoning and Interaction.
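The contrastive image-text alignment behind CLIP-style vision-language foundation models can be sketched in a few lines. The snippet below is a toy illustration with random feature vectors, not CLIP's actual implementation; the temperature value is merely a commonly cited initialization, and all variable names are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Matched image/text pairs (the diagonal of the similarity matrix)
    are pulled together; all other pairings act as negatives.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature           # (B, B) scaled cosine similarities
    labels = np.arange(len(logits))              # pair i matches pair i

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # cross-entropy in both directions (image -> text and text -> image)
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy batch: 4 paired "image" and "text" embeddings.
B, D = 4, 32
img = rng.normal(size=(B, D))
txt = img + 0.1 * rng.normal(size=(B, D))        # nearly aligned pairs
print(clip_style_loss(img, txt))                 # small loss: pairs agree
print(clip_style_loss(img, txt[::-1].copy()))    # larger loss: pairs shuffled
```

Training on real image-text pairs replaces the random vectors with encoder outputs, but the objective itself is exactly this symmetric cross-entropy over a batch-wise similarity matrix.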
TABLE 2: Comparison between this survey and other related surveys, focusing on domains (i.e., specific vs. general), relevant data modalities (e.g., time series, graphs, video streams, and text), primary areas of focus (i.e., Weather and Climate Foundation Models (WFM) and Task-Specific Models (TSM)), and available resources (i.e., datasets, and tools & models).
Cross-Modal Representation. This category involves multi-modal models, including vision-language models (VLMs) [46], [51], [76], [77]. These models merge and align linguistic and visual modalities, demonstrating a significant potential for modal unification. A prime example is CLIP (Contrastive Language-Image Pre-training) [46], which concurrently trains on text and image data using the contrastive learning method. It displays substantial Zero-Shot Learning (ZSL) and Few-Shot Learning (FSL) abilities on downstream tasks. Another innovative model, SAM (Segment Anything Model) [45], integrates the concept of prompting into visual tasks, yielding remarkable zero-shot segmentation performance. Models like InstructBLIP [78], CoCa [79], BEIT-3 [80], InstructGPT [81], and LLaMA [82], [83] further expand the reach of cross-modal foundation models, accommodating a broader spectrum of tasks and modal representations. In weather prediction and climate change applications, data typically exhibit large-scale and multimodal characteristics, such as radar observations [84], [85], satellite images [86], ground-based observatories [24], [87], and organized gridded data [88], [89], [90]. These characteristics provide impetus for the development of data-driven FMs for weather and climate tasks.

Reasoning and Interaction. FMs demonstrate exceptional reasoning and planning abilities, exemplified by models like CoT [91], ToT [92], and GoT [93], in addition to task planning agents. This category also involves interaction abilities, encompassing operations and communication. This study emphasizes the application of data-driven FMs for weather and climate tasks. Nonetheless, this area remains uncharted, offering abundant opportunities for innovation.

3.2 Task-Specific Models

Contrary to the previously mentioned FMs, the majority of DL models for weather and climate are mainly domain-specific (e.g., global/regional precipitation forecasting, extreme weather comprehension, climate model downscaling). This survey classifies these task-specific models into two categories based on the nature of the task for time series: (1) Time Series-based Weather and Climate Analysis; (2) Spatio-Temporal Series-based Weather and Climate Analysis. We also delineate an area for climate text data: Climate Text Analysis Tasks.

Time Series-based Weather and Climate Analysis. This category primarily comprises DL models for weather and climate analysis that leverage time series data. These models typically utilize weather time series data obtained from a single weather station to determine sequential relationships between one or multiple variables from past observations, thereby facilitating future trend predictions for specific weather variables.

A classic example of a data-driven model for weather forecasting is the Auto-Regressive Integrated Moving Average (ARIMA) [94], which enables non-stationary data to become stationary through a differencing operation, and subsequently employs a combination of auto-regression and moving averages to model the time series. Given the significant seasonality often present in weather data, such as fluctuations in temperature and rainfall, Seasonal ARIMA (SARIMA) [95] and Seasonal ARIMA with eXogenous variables (SARIMAX) [96] have been developed to model weather series, building upon seasonal auto-regression/moving-average principles. Vector Autoregression (VAR) serves as an alternate method capable of modelling and predicting multiple correlated variables concurrently. Deep learning-based models, such as families of Recurrent Neural Networks (RNNs) [97], [98], [99], convolutional neural network (CNN)-based architectures, and models based on the Transformer (e.g., Informer [100], Autoformer [101], Crossformer [102], ETSFormer [103], Reformer [104], FEDformer [105]), have exhibited superior performance when dealing with non-stationary time series. These models are particularly useful due to their lack of reliance on additional statistical knowledge and their efficiency in long-term forecasting.

Spatio-Temporal Series-based Weather and Climate Analysis. Another focal area is DL models for weather and climate analysis that employ spatio-temporal series. Unlike time-series data, spatio-temporal data covers weather variable observations across multiple locations over time, allowing for the extraction of intricate spatio-temporal patterns. In this context, continuous radar echoes or satellite images that represent independent weather times are also considered as spatio-temporal sequences.

Data-driven models designed for analysing spatio-temporal series for weather and climate analysis are often required to capture both temporal and spatial correlations.
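The requirement of jointly correlated space and time can be made concrete on a synthetic gridded sequence. Everything in this sketch (the drifting sinusoidal field, the grid size, the chosen cells) is invented for demonstration and is not taken from any dataset discussed in this survey.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy gridded "weather" sequence: T hourly frames on an H x W grid.
# A smooth signal drifts eastward, so neighbouring cells and
# consecutive frames are correlated, loosely mimicking radar or
# reanalysis fields.
T, H, W = 48, 16, 16
t = np.arange(T)[:, None, None]
x = np.arange(W)[None, None, :]
y = np.arange(H)[None, :, None]
field = (np.sin(0.3 * (x - 0.5 * t))        # eastward-drifting wave
         + 0.2 * np.cos(0.4 * y)            # static north-south gradient
         + 0.1 * rng.normal(size=(T, H, W)))  # observation noise

def corr(a, b):
    """Pearson correlation between two flattened series."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Temporal correlation: one cell vs. the same cell one step later.
temporal = corr(field[:-1, 8, 8], field[1:, 8, 8])
# Spatial correlation: one cell vs. its eastern neighbour at the same times.
spatial = corr(field[:, 8, 8], field[:, 8, 9])
print(f"lag-1 temporal corr: {temporal:.2f}, neighbour spatial corr: {spatial:.2f}")
```

Both correlations come out strongly positive on this toy field; the architectures discussed next differ mainly in how they exploit exactly these two dependency axes.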
For instance, the convolutional LSTM [97], a variant of the LSTM, incorporates convolutional operations into the LSTM to capture additional spatial correlations. 3D Convolutional Neural Networks (3D-CNNs) are frequently employed to consider the spatio-temporal correlations of sequences simultaneously. Spatio-Temporal Graph Neural Networks [106], and other graph-based structures, effectively encode different spatial information into graphs that capture spatial correlations as well as temporal trends of weather variables. Transformer models utilize self-attention mechanisms to assess the importance of different locations and time points when making predictions [107]. Recent advancements in the field have also seen the exploration of generative AI, such as generative adversarial networks [108] and diffusion models [109], for weather prediction and climate change based on spatio-temporal sequences, owing to their excellent generative quality.

3.3 Types of Weather and Climate Data

Investigations into weather and climate typically necessitate the exploration of both temporal and textual data. The primary objectives of these tasks involve discerning the relationships between historical weather patterns — often characterized by numerous meteorological variables — and future changes. This process also includes the extraction of specific features from textual sequences to aid detailed subsequent analysis. In these scenarios, our discussion mainly revolves around three primary data types: time series, spatio-temporal, and textual data. In the context of weather and climate analysis, time series can be broadly divided into two types: univariate and multivariate. A univariate time series might be represented by the daily mean temperature at a single observation point, while a multivariate time series may include daily precipitation and humidity data collected from the same observation point. Here, we first discuss the definition of univariate/multivariate time series. Formally, we follow the definitions of time series data in Ref. [110], which we summarize below.

Definition 3.1 (Time Series Data). For a single-point observation, a univariate time series of a sole weather variable (such as temperature) x = {x1, x2, x3, ..., xT} ∈ R^T is a sequence of T time steps indexed in time order, where xt ∈ R is the variable value of the time series at time t. A multivariate time series including different climate variables (i.e., temperature, humidity, precipitation, etc.) X = {x1, x2, x3, ..., xT} ∈ R^{T×D} is a sequence of T time steps indexed in time order but with D dimensions (variables), in which xt ∈ R^D denotes the values of the time series at time t along D channels.

Global climate data are often represented as spatio-temporal series, i.e., chaotic correlations with both temporal (change trend) and spatial (geographic location) dimensions. We define two distinct spatio-temporal series: univariate spatio-temporal series and multivariate spatio-temporal series. They are both sequences of data points organized by both temporal and spatial dimensions.

Definition 3.2 (Spatio-Temporal Series). For a univariate spatio-temporal series, following Definition 3.1, there exist N points on the Earth system; at each point there exists a time series x = {x1, x2, x3, ..., xT} ∈ R^T, where xt ∈ R, and the spatio-temporal series is formulated as Xu = {x1, x2, x3, ..., xN} ∈ R^{T×N}. Similarly, for multivariate spatio-temporal sequences, the series can be formulated as Xmu = {X1, X2, X3, ..., XN} ∈ R^{T×D×N}, where XN denotes the multivariate time series at the N-th spatial point.

Notably, graph-based structures are usually utilized to construct spatio-temporal series, such as spatio-temporal graphs (STGs), temporal knowledge graphs (TKGs), video streams, and others. In this survey, we mainly focus on the above-mentioned classes, which are highly representative and align closely with current spatio-temporal series-based weather forecasting and climate analysis tasks. We follow prior definitions to define STGs and TKGs first, as follows.

Definition 3.3 (Spatio-Temporal Graphs). A spatio-temporal graph G = {G1, G2, G3, ..., GT} denotes a sequence of T static graph snapshots (also named time steps) indexed in time order, in which Gt = (Vt, ϵt) presents a snapshot at the t-th time step; Vt and ϵt are the sets of nodes and edges at time t. The adjacency matrix represents the correlation between nodes in the graph, and the adjacency and node feature matrices are defined as At ∈ R^{N×N} and Xt ∈ R^{N×D}, where At = {a^t_{i,j}} and a^t_{i,j} ≠ 0 if there is an edge between nodes i and j. In addition, N = |Vt| is the number of nodes and D is the dimension of node features.

Definition 3.4 (Temporal Knowledge Graphs). Following the definition of STGs, a temporal knowledge graph G = {G1, G2, G3, ..., GT} is a sequence of T knowledge graph snapshots indexed in time order, where Gt = (ϵt, Rt) is a snapshot consisting of the entity and relation sets at time t. Specifically, ϵt encapsulates both subject and object entities, and Rt presents the set of relations between them. In a temporal knowledge graph, entities and relations may possess different features, denoted by X ∈ R^{|ϵt|×De} and Xrt ∈ R^{|Rt|×Dr}, where De and Dr are the feature dimensions.

Spatio-temporal video streams belong to a species of spatio-temporal series, which are represented as regular spatial shapes and sequences organized in time order. In weather forecasting and climate analysis tasks, regional contiguous weather radar echoes and satellite images that symbolize specific climate events belong to this type, and we define spatio-temporal video streams based on the definition of spatio-temporal sequences as follows.

Definition 3.5 (Spatio-Temporal Video Streams). Assume a spatio-temporal video stream V = {F1, F2, F3, ..., FT} is a set of continuous frames that cover T time steps indexed in time order, where Ft denotes the t-th frame (or time step). Each frame is viewed as a matrix of pixels¹ and can be formulated as Ft ∈ R^{C×H×W}, where C, H, and W denote the channels, height, and width of the frame, respectively.

1. We only consider the image modality here, without any additional knowledge.

Definition 3.6 (Text Sequence). Let S be a text sequence, where each element in the sequence represents a word or character. The text sequence can be represented as S = {x1, x2, ..., xn}, where xi represents the i-th element in the sequence. The length of the text sequence, denoted as
(N) can be defined as N = |S|, where |·| represents the cardinality or number of elements in the sequence. Furthermore, each element in the text sequence can be represented as a one-hot encoded vector, denoted as X. The one-hot encoded vector Xi for the i-th element in the sequence is a binary vector of length M, where M represents the total number of unique words or characters in the text corpus. The one-hot encoded vector Xi has a value of 1 at the position corresponding to the index of the word or character in the vocabulary, and 0 elsewhere.

3.4 Mainstream Tasks for Weather and Climate

Based on the above definitions, we will present representative weather and climate analysis tasks associated with the above data types and structures.

• Weather/Climate Time Series Tasks. Time series analysis forms the bedrock of weather and climate studies. Researchers frequently harness this methodology to extract meteorological trends from sequential data, projecting these tendencies onto multiple variable values across a specified temporal span for granular analysis. This overarching task encompasses three subtasks: Forecasting, Classification, and Imputation. In the forecasting task, the primary goal is to precisely predict a specific variable for a designated future temporal window grounded on historical observations. This task can be bifurcated, based on the magnitude of the prediction window, into short-term forecasting (typically spanning several hours to a few days) and long-term forecasting (generally a week or beyond). Short-term weather forecasting is often employed in immediate weather prediction and urban planning, whereas long-term forecasting predominantly serves climate studies, agriculture, and energy sectors. Subsequently, the classification task is aimed at mapping distinct meteorological phenomena, such as drought intensities, based on a historical chronology of atmospheric observations. Finally, the imputation task is structured to fill missing values in the series. This task exploits potential information embedded in the series, accounting for data gaps that might emanate from sensor malfunctions or severe climate events, among other factors.

• Graph Structure-based Tasks. The mainstream graph structure-based task for climate change is forecasting. We explore graph structure-based tasks in terms of both STGs and TKGs, as previously mentioned. STGs and TKGs are extensions for representing and reasoning about spatio-temporal information, fusing the relationships between time, space, and entities into a unified graph structure. Forecasting tasks aim to infer weather conditions at future spatio-temporal points based on historical observations and model predictions. Such tasks involve multiple variables, such as temperature, humidity, and barometric pressure, as well as temporal and spatial dimensions. The key challenges of spatio-temporal graph prediction tasks are how to effectively capture and model spatio-temporal dependencies and how to cope with data uncertainty and missingness.

• Spatio-Temporal Video Stream Tasks. Video data stands as a crucial asset in the examination of climate change and weather forecasting. In meteorological contexts, spatio-temporal video streams typically manifest as sequences of frames that depict weather fluctuations over a fixed period. These sequences may include regularly shaped radar images, satellite images, and other types of weather-related visual data. Therefore, the primary interest in spatio-temporal video stream data lies in prediction tasks, namely the forecasting of future images based on a series of past consecutive frames. The quintessential task in this context involves the prediction of imminent rainfall based on radar echoes or the extrapolation of satellite imagery.

• Climate Text Tasks. The analysis of climate textual data, or climate text analysis, aspires to distill significant patterns and insights. This process encapsulates several subtasks, including Sentiment Analysis, Topic Modeling, Information Extraction, and Trend Analysis. Sentiment analysis endeavors to perceive the sentiment or perspectives encapsulated in climate text data (e.g., public perceptions of climate change). Topic modeling, conversely, strives to identify and classify the cardinal themes or subjects broached within climate texts, thereby fostering a comprehensive understanding of pivotal focus areas. Information extraction constitutes the extraction of specific details from climate texts, such as instances of extreme weather events or particulars of climate policy. Finally, trend analysis concentrates on pinpointing and examining trends within climate texts, aiding in the monitoring of shifts in public dialogue, scientific research, or policy discussions over time. Collectively, these tasks converge to a deeper discernment of climate issues. The insights harvested can enlighten decision-making mechanisms, policy development, and initiatives to amplify public cognizance.

Considering the aforementioned types of weather and climate data, we will now expound on a variety of tasks pertinent to weather and climate analysis. Note that we have omitted an explicit outline and definition of the Climate Text Analysis task due to its closely related subtasks, and instead adopt the aforementioned Climate Text Tasks as a proxy for the Climate Text Analysis definition. A succinct description of each task is as follows:

• Forecasting Tasks. These tasks span from a few hours (nowcasting) to days and weeks (short- and medium-range forecasting). They may include regional forecasting for continental states, counties, or cities. Subseasonal-to-seasonal prediction involves forecasting weather between 2 weeks and 2 months in advance, bridging the gap between weather forecasts and seasonal climate predictions, which is imperative for disaster mitigation.

• Precipitation Nowcasting Tasks. Precipitation nowcasting is a weather forecasting technique designed to predict precipitation over the next few hours. Unlike traditional weather forecasting, it focuses on short-term changes in precipitation, usually predicted on time scales of minutes to hours. This task employs data from radar systems, satellites, weather observation facilities, and numerical models, combined with image processing techniques, to predict the distribution, intensity, and movement of precipitation over a brief future period via real-time monitoring and analysis of atmospheric clouds and precipitation systems. Therefore, we have isolated it from the general forecasting task.
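The time-series subtasks above all consume windowed views of a historical series. As a rough illustration (the array names, sizes, and window lengths below are our own toy choices, not tied to any model in this survey), short- and long-term forecasting differ only in the length of the prediction window:

```python
import numpy as np

def make_forecast_pairs(series, history, horizon):
    """Slice a (T, V) multivariate series into (history -> horizon) training pairs."""
    X, Y = [], []
    for start in range(len(series) - history - horizon + 1):
        X.append(series[start : start + history])                       # past observations
        Y.append(series[start + history : start + history + horizon])   # future targets
    return np.stack(X), np.stack(Y)

# A toy hourly series with 2 variables (e.g., temperature and humidity).
T, V = 240, 2
series = np.random.default_rng(0).standard_normal((T, V))

# Short-term setting: predict the next 24 hours from the past 48.
X_short, Y_short = make_forecast_pairs(series, history=48, horizon=24)
# Long-term setting: same data, only the horizon grows (next 7 days).
X_long, Y_long = make_forecast_pairs(series, history=48, horizon=168)

print(X_short.shape, Y_short.shape)  # (169, 48, 2) (169, 24, 2)
print(X_long.shape, Y_long.shape)    # (25, 48, 2) (25, 168, 2)
```

The classification and imputation subtasks reuse the same windowing, swapping the future block Y for a categorical label or for masked-out entries of X.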
• Downscaling Tasks. Given the coarse spatial resolution of global climate models, they can only offer general estimates of climate conditions at local or regional scales. Simulations often exhibit systematic biases that diverge from trends in observed data. Downscaling climate models aims to generate locally precise climate information from global climate projections by correlating this climate information to observed local climate conditions. This process enhances the data's spatial and temporal resolution, rendering it more suitable for local and regional analysis.

• Bias Correction Tasks. Bias correction is vital in weather and climate applications. It aims to minimize or eliminate systematic biases in model outputs and observational data, which emerge due to uncertainties in weather models and measurement errors. In weather forecasting, bias correction enhances the accuracy of model predictions by adjusting variables such as temperature and precipitation to match actual observations. In climate research, bias correction is crucial for aligning climate model outputs with observational data, facilitating accurate analysis of climate change trends, evaluation of model performance, and reliable predictions of future climate changes. Various methods, including statistical, machine learning, and deep learning techniques, can be employed for bias correction, tailoring the approach based on the specific application and data characteristics. By minimizing or eliminating systematic biases, bias correction improves the quality and reliability of weather and climate data.

• Weather Pattern Understanding Tasks. This task strives to analyze weather data to comprehend the variations and trends in weather patterns and the climate system. It involves modeling and analyzing various elements of the weather system, such as pressure, temperature, humidity, wind speed, and wind direction, to disclose their relationships and interactions. The objective is to identify and interpret different weather patterns, such as cyclones, fronts, and high-pressure systems, and deduce their impacts on weather changes and extreme weather events. By gaining a deeper understanding of weather patterns, we can enhance our knowledge of weather forecasting and climate change, providing decision-makers and researchers with more accurate and comprehensive information about the weather system.

4 BASIC STRUCTURE FOR WEATHER & CLIMATE

Considering the different types of data present in weather and climate tasks, we mainly consider the use of Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), Transformers, Generative Adversarial Networks (GANs), and Diffusion Models to mine complex correlations from these data. In this survey, we mainly focus on Recurrent Neural Networks, Transformers, Generative Adversarial Networks, Graph Neural Networks, and Diffusion Models. Considering the particular representations of weather and climate data, we focus on spatio-temporal graph neural networks in our discussion of GNNs.

4.1 Recurrent Neural Networks

Recurrent Neural Networks (RNNs) [111] are a neural network architecture specialized in processing sequential data. In RNNs, information is passed from one time step to the next, enabling the RNN to utilize previous information to influence subsequent outputs. RNNs are fundamental modules in deep learning and are widely used in language modeling [112], [113], [114], time series analysis [98], [115], [116], and many other sequence-related tasks. RNNs have also pioneered the use of deep learning techniques to deal with weather and climate modeling [97]. The update rule for a general RNN can be expressed as:

ht = σ(Wh xt + Uh ht−1 + bh), (1)

where ht is the hidden state at the t-th time step, xt is the input at the t-th time step, Wh and Uh are the weight matrices, bh is the bias, and σ is a nonlinear activation function such as tanh or ReLU.

However, ordinary RNNs often encounter the problems of gradient vanishing and gradient explosion in practice, making it difficult to handle long sequences. To solve this problem, some improved RNN structures have been proposed, such as Long Short-Term Memory (LSTM) [99] and the Gated Recurrent Unit (GRU) [117]. ConvLSTM [97] and ConvGRU [97] are variants that introduce convolutional operations into LSTM and GRU, allowing them to process spatially structured data such as images or videos; they are usually utilized to process weather spatio-temporal series data such as radar echo or satellite image sequences. In these models, fully connected operations are replaced by convolutional operations. For example, the update rule of ConvLSTM can be expressed as:

ft = σ(Wxf ∗ xt + Whf ∗ ht−1 + bf)
it = σ(Wxi ∗ xt + Whi ∗ ht−1 + bi)
ot = σ(Wxo ∗ xt + Who ∗ ht−1 + bo)        (2)
C̃t = tanh(Wxc ∗ xt + Whc ∗ ht−1 + bc)
Ct = ft ◦ Ct−1 + it ◦ C̃t
ht = ot ◦ tanh(Ct)

where ft, it, ot, and C̃t are the forget gate, input gate, output gate, and candidate memory cell, respectively, ∗ denotes the convolution operation, and ◦ denotes the Hadamard product. The ConvGRU update rules can be represented as follows:

rt = σ(Wxr ∗ xt + Whr ∗ ht−1 + br)
zt = σ(Wxz ∗ xt + Whz ∗ ht−1 + bz)        (3)
h̃t = tanh(Wxh ∗ xt + rt ◦ (Whh ∗ ht−1) + bh)
ht = (1 − zt) ◦ ht−1 + zt ◦ h̃t

where rt and zt are the reset and update gates, respectively. These gating mechanisms allow ConvGRU to handle long time dependencies more efficiently. These formulas show that ConvGRU first computes the reset and update gates at each time step, then computes the candidate hidden state h̃t, and finally computes the new hidden state ht. The update gate zt determines how much of the new candidate hidden state to use when computing the new hidden state.
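To make the gating concrete, the ConvGRU update of Eq. (3) can be sketched in a few lines of NumPy. This is a deliberately minimal single-channel toy (hand-rolled convolution, random 3×3 kernels, zero biases, arbitrary frame size), not an efficient or trained implementation:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D convolution with 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def convgru_step(x_t, h_prev, kernels, biases):
    """One ConvGRU update following Eq. (3): gates, candidate state, blend."""
    r = sigmoid(conv2d_same(x_t, kernels["xr"]) + conv2d_same(h_prev, kernels["hr"]) + biases["r"])
    z = sigmoid(conv2d_same(x_t, kernels["xz"]) + conv2d_same(h_prev, kernels["hz"]) + biases["z"])
    h_cand = np.tanh(conv2d_same(x_t, kernels["xh"]) + r * conv2d_same(h_prev, kernels["hh"]) + biases["h"])
    return (1.0 - z) * h_prev + z * h_cand  # update gate blends old state and candidate

rng = np.random.default_rng(0)
kernels = {k: 0.1 * rng.standard_normal((3, 3)) for k in ["xr", "hr", "xz", "hz", "xh", "hh"]}
biases = {k: np.zeros((8, 8)) for k in ["r", "z", "h"]}

h = np.zeros((8, 8))                     # initial hidden state
frames = rng.standard_normal((4, 8, 8))  # four toy 8x8 "radar echo" frames
for x_t in frames:
    h = convgru_step(x_t, h, kernels, biases)
print(h.shape)  # (8, 8)
```

Replacing the convolutions with plain matrix products recovers the ordinary GRU; the ConvLSTM of Eq. (2) differs only in carrying the extra cell state Ct.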
4.2 Diffusion Models

Diffusion Models (DMs) [118], [119] have achieved promising results in extensive applications across a range of fields, including computer vision [109], [120], [121], [122] and natural language processing [123], [124], [125], due to their efficacy in emulating intricate, high-dimensional data distributions. DMs comprise a category of probabilistic generative models. At the core of these models lie the principles of diffusion processes, which are stochastic procedures delineating the continuous stochastic motion of particles over time. These processes model spatial or temporal diffusion wherein particles incline towards transitioning from zones of high concentration to those with lower densities, facilitating a gradual assimilation or blending of quantities. The principal concept involves conducting a sequence of diffusion steps, with each step updating the data's probability distribution. This is accomplished by incorporating Gaussian noise into the current data samples and iteratively refining them. The noise addition in each diffusion step perturbs the data points, and the iterative refinement guides these perturbed points to gradually converge to the target distribution. This iterative process is akin to a random walk in the data space, where the random perturbations, guided by the model, eventually lead to the generation of new data points following the target distribution.

Mathematically, a diffusion model describes a Markov chain that begins with the data and ends with noise. Let us denote the data as x and the noise as z. The Markov chain has the following form:

xt = √(1 − dt) · xt−1 + √dt · zt, (4)

where zt is sampled from a standard Gaussian distribution, dt is a small time step, and t is the current step. The goal of the diffusion model is to learn the reverse transition of this Markov chain, i.e., to generate data from noise. This is done by estimating the conditional distribution p(xt−1 | xt) and sampling from it. With enough steps, the chain will transform the noise z into the data x.

4.3 Transformers

The Transformer is a DL model that has become key infrastructure for existing state-of-the-art (SOTA) large models applied to NLP and other sequence-to-sequence tasks (e.g., weather forecasting) [126]. The key is its ability to handle dependencies between any part of the input sequence and any part of the output sequence without having to rely on the order of the sequences as in RNNs [127].

The vanilla Transformer utilizes an encoder-decoder architecture, where both the encoder and decoder are comprised of a series of stacked blocks. Each Transformer layer is composed of a self-attention layer and a fully-connected feed-forward network (FFN). Additionally, the decoder block incorporates an additional cross-attention layer on top of the self-attention layer to capture information from the encoder. To facilitate information flow and alleviate the vanishing gradient problem, residual connections [128] and layer normalization modules are implemented between each layer.

Multi-Head Self-Attention. At the heart of the Transformer architecture lies the self-attention mechanism. This mechanism plays a pivotal role in capturing relationships within an input sequence. It accomplishes this by calculating attention scores for each element in the sequence in relation to the other elements. These scores are then utilized to assign weights to the input sequence, resulting in the generation of a new weighted sequence. The formula for the self-attention mechanism is as follows:

H = Attention(Q, K, V) = softmax(QKᵀ / √dk) V, (5)

where dk denotes the dimension of the key, and Q ∈ Rn×dk, K ∈ Rm×dk, V ∈ Rm×dv are the query matrix, key matrix, and value matrix, respectively. They are linear transformations of the same input sequence X ∈ Rn×d (or the feature matrix from the previous layer) based on three weight matrices Wq ∈ Rd×dk, Wk ∈ Rd×dk, Wv ∈ Rd×dv, as

Q = XWq, K = XWk, V = XWv. (6)

The attention score is obtained by computing the dot product of the query matrix and key matrix, then dividing by √dk for scaling, and finally normalizing by softmax.

The Transformer uses multi-head self-attention with multiple sets of Q(i), K(i), V(i), each set corresponding to a distinct set of linear transformation matrices Wq(i) ∈ Rd×dk, Wk(i) ∈ Rd×dk, Wv(i) ∈ Rd×dh, where dh is set to dv/h and h is the number of heads. The final output of the multi-head self-attention is obtained by projecting the concatenation of the series of Hi into a new feature space with a new weight matrix Wproj ∈ Rdv×dproj, as follows:

H = Multi-Head Self-Attention(Q, K, V)
  = Concat(H1, H2, ..., Hh) Wproj, (7)
Hi = Attention(Q(i), K(i), V(i)).

For the decoder, there is an additional mask mechanism that prevents query vectors from attending to future positions yet to be decoded. In addition, an extra cross-attention layer follows the self-attention layer, where the Q is derived from the output of the previous layer in the decoder, and the K and V are transformed from the output of the last layer of the encoder. It is designed to avoid foreseeing the true label while considering information from the encoder during decoding.

Fully-connected Feed-Forward Layer. The fully-connected feed-forward layer following the attention layer consists of two linear transformations and a non-linear activation function. Denoting the input matrix as X ∈ Rn×di, the output of the feed-forward layer is

F = FFN(X) = σ(XW1 + b1)W2 + b2, (8)

where σ(·) denotes the activation function, and W1 ∈ Rdi×dm, b1 ∈ Rdm, W2 ∈ Rdm×do, b2 ∈ Rdo are all learnable parameters.
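Eqs. (5)-(8) can be traced end to end in NumPy. The sketch below (dimensions and random initialization are arbitrary toy choices of ours) wires per-head attention into the multi-head projection and passes the result, with a residual connection, through the FFN:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Eq. (5): softmax(Q K^T / sqrt(d_k)) V
    dk = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(dk)) @ V

def multi_head_self_attention(X, Wq, Wk, Wv, Wproj):
    # Eqs. (6)-(7): per-head projections, attention, concatenation, output projection.
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wproj

def ffn(X, W1, b1, W2, b2):
    # Eq. (8) with a ReLU non-linearity.
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2

rng = np.random.default_rng(0)
n, d, h = 6, 16, 4           # sequence length, model width, number of heads
dh = d // h                  # per-head dimension, d_h = d_v / h
X = rng.standard_normal((n, d))

Wq, Wk, Wv = (0.1 * rng.standard_normal((h, d, dh)) for _ in range(3))
Wproj = 0.1 * rng.standard_normal((d, d))
W1, b1 = 0.1 * rng.standard_normal((d, 4 * d)), np.zeros(4 * d)
W2, b2 = 0.1 * rng.standard_normal((4 * d, d)), np.zeros(d)

H = multi_head_self_attention(X, Wq, Wk, Wv, Wproj)
out = ffn(X + H, W1, b1, W2, b2)   # residual connection; layer norm omitted
print(H.shape, out.shape)  # (6, 16) (6, 16)
```

A full Transformer block would additionally apply layer normalization around each sub-layer and stack several such blocks.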
Residual Connection and Normalization. Following each attention layer and each feed-forward layer, a residual connection and layer normalization are applied. They help retain information when the model is considerably deep and thus safeguard model performance. Formally, given a neural layer f(·), the residual connection and normalization layer is defined as

Add & Norm(X, f) = LayerNorm(X + f(X)). (9)

Transformer Layer. The design of the Transformer model enables parallel processing of the entire sequence, eliminating the need for sequential processing of elements as in RNNs. This parallel processing enhances its efficiency in handling long sequences. By utilizing a multi-layer self-attention mechanism, the Transformer model effectively captures long-distance dependencies in sequences, which is crucial for tasks involving translation, summarization, and other sequence-to-sequence operations.

4.4 Generative Adversarial Networks

Generative Adversarial Networks (GANs) [108] aim to train a generative model via an adversarial process and have been widely used in image generation [29], [129], [130], super-resolution [131], [132], style transfer [133], [134], and image-based weather forecasting [135]. The fundamental concept of GANs involves training two NNs adversarially: a Generator G and a Discriminator D. The objective of the Generator G is to learn the underlying data distribution and generate novel samples accordingly. The Discriminator D's objective is to differentiate between the samples generated by the generator and the real samples. During training, the generator aims to produce samples that can effectively fool the discriminator, while the discriminator strives to enhance its ability to differentiate between real and generated samples. This process can be regarded as a two-player zero-sum game, ultimately leading to an equilibrium where the discriminator cannot distinguish between the generator-generated samples and the real samples.

The objective function of GANs can be expressed as the following optimization problem:

min_G max_D V(D, G) = Ex∼pdata(x)[log D(x)] + Ez∼pz(z)[log(1 − D(G(z)))], (10)

where x is a sample from the true data distribution pdata, z is a sample from a prior noise distribution pz, G(z) is the sample generated by the generator from the noise sample z, and D(x) is the discriminator's estimate of whether the sample x (either a true sample or a generated sample) is real. Training GANs typically involves alternately optimizing the two sides of this objective function: first, the generator is fixed and the discriminator is optimized; then, the discriminator is fixed and the generator is optimized. This process is repeated until an equilibrium is reached, at which point the samples generated by the generator should be indistinguishable from the true samples by the discriminator.

4.5 Spatio-Temporal Graph Neural Networks

Spatio-Temporal Graph Neural Networks (STGNNs) [106] combine spatial and temporal information using graph structures. They are particularly useful for analyzing data with both spatial and temporal dependencies. In an STGNN, the basic concept involves representing the data as a graph, where each node represents a spatial location and the edges capture the spatial connectivity. Additionally, each node also contains temporal information, representing the state of the variable at different time steps.

Spatial Graph Structure. Let G = (V, E) be the graph representing the spatial connections, where V is the set of nodes representing spatial locations, and E is the set of edges representing the spatial relationships. Each node vi is associated with the feature vector xi of the corresponding location i.

Temporal Information. Let Xt = {xti} be the set of feature vectors for all locations at time t. Each feature vector xti represents the state of the variable at location i and time t.

Spatio-temporal Graph Convolution. STGNNs incorporate both spatial and temporal information through graph convolution operations, which capture the relationships between variables at different locations and time steps. The spatio-temporal graph convolution can be represented as:

h_i^{t+1} = f( Σ_{j∈N(i)} w_ij · h_j^t + b_i^t ). (11)

Here, h_i^{t+1} represents the updated feature vector of node i at time t + 1, N(i) denotes the set of neighbors of node i, capturing the spatial connections between locations, w_ij represents the weight between node i and its neighbor j, indicating the strength of their relationship, and h_j^t represents the feature vector of the neighboring node j at time t. b_i^t is a bias term for node i at time t. f(·) represents an activation function, such as ReLU or Sigmoid, applied element-wise to the sum of weighted inputs. The spatio-temporal graph convolution operation combines the spatial connectivity and temporal dependencies to effectively capture the evolving patterns and relationships in the data.

5 OVERVIEW AND CATEGORIZATION

In this section, we provide an overview and categorization of DL models for weather and climate. Our survey is structured along three main dimensions: data types, model architectures, and application domains. A detailed synopsis of the related works can be found in Table 3. Based on the scope of application, we primarily divide the existing literature into two main categories: Large Foundation Models and Task-Specific Weather and Climate Models. Considering the task generality of weather/climate foundation models, we discuss them at a high level without further subdivisions.
TABLE 3: List of representative models under mainstream applications for weather and climate data. The columns give, in turn, the data type, model category, method, scope, specific task/domain, base model, institution, and year of publication; note that the base model column is dominated by the primary starter module. More details are available in Section 7.
For task-specific weather/climate models, we categorize them based on their specific underlying architectures to facilitate readers in indexing and referencing specific works according to model architectures, including Recurrent Neural Networks, Generative Adversarial Networks, Transformers, Diffusion Models, and Graph Neural Networks. Subsequently, at the application level, we divide the existing literature into two main categories based on specific data categories: Time Series for Weather and Climate² and Text for Weather and Climate.

2. The scope of time series data includes spatio-temporal series data and spatio-temporal video stream data.

In the first category, we further dissect the existing literature into six primary classes predicated on the domains of application: Forecasting, Precipitation Nowcasting, Downscaling, Data Assimilation, Bias Correction, and Weather Pattern Understanding. For the second category, we explore it as a general subject (Climate Text Analysis), refraining from subdividing it into different subtasks. This is because these models often originate from pre-trained LLMs, and the specific task characteristics are typically delineated based on downstream datasets rather than the model itself.

6 MODELS FOR WEATHER & CLIMATE

In this section, we will delve into the advancements of Foundation Models and Task-Specific Models for weather and climate data understanding. A categorization of representative methods and detailed information can be found in Table 3.

6.1 Foundation Models for Weather & Climate

The burgeoning development of foundation models in NLP [47], [82], [200] and CV [45], [46] has piqued research interest in foundation models for weather and climate data understanding. Large Foundation Models, created through pre-training strategies, can substantially enhance the generalization capability of AI-based climate models and can be fine-tuned for specific downstream tasks. Pre-training of such models necessitates large-scale sequence data, not typically sourced from ordinary time-series data.

Mindful of computational efficiency and the demand for timely climate predictions, Pathak et al. proposed FOURCASTNET [136], a climate pre-trained foundation model based on the Vision Transformer and the Adaptive Fourier Neural Network Operator (AFNO) [201], for high-resolution predictions and rapid inference. Its training process consists of self-supervised pre-training and autoregressive fine-tuning based on the pre-trained model. PANGU-WEATHER [63], a data-driven model leveraging the 3D Earth-specific Transformer, is notable for its swift, precise global predictions and superior performance. It predicts atmospheric states over time based on the current state, described by five upper-air variables and four surface variables on a 0.25° horizontal grid with 13 vertical layers for the upper-air variables. On the other hand, CLIMAX [25] introduces the concept of fundamental modeling to weather prediction with its fully supervised pre-training based on the Transformer. It proposes variable disambiguation and variable aggregation strategies for merging and revealing potential relationships between different weather variations at various altitudes, offering promising flexibility for adapting to diverse downstream tasks, including global/regional/seasonal forecasting, climate mapping, and downscaling tasks. FENGWU [138] tackles the medium-term forecasting problem from a multimodal, multitask perspective with a uniquely designed deep learning architecture. It features a model-specific decoder and a cross-modal fusion Transformer that balances the optimization of different predictors in a regionally adaptive manner under the supervision of an uncertainty loss. Given that the aforementioned large-scale models are trained via a fully supervised approach, W-MAE [64] implements unsupervised training of weather prediction models using a Masked Auto-Encoder (MAE)-based [202], [203] approach, which can be fine-tuned for downstream tasks through various data sources. MetePFL [24] and FedWing [154] also propose prompt-based federated learning [204] for training large foundation models, considerably reducing the cost of collaborative model training across regions while safeguarding data privacy. The rapid advancement of LLMs has led to the processing of weather and climate tasks that are no longer restricted to visual or time-series models. OCEANGPT [197], based on LLMs, proposes a methodology for processing a wide range of ocean-related tasks. Beyond the foundation models used for forecasting and simulation, CLIMATEBERT [195] is an NLP-based foundation model for processing climate-related texts. It is trained on over 2 million climate-related paragraphs from diverse sources such as news articles, research papers, and company climate reports [205].

6.2 Task-specific Models for Weather & Climate

In the realm of weather and climate analysis, task-specific models have been utilized for a myriad of specific tasks. This section will delve into the progress made in task-specific models for weather and climate, focusing on these principal architectures: RNNs, Transformers, GANs, Diffusion Models, and Graph Neural Networks (GNNs).

• Recurrent Neural Networks (RNNs). RNNs serve as the backbone of numerous weather forecasting models [85], [97], [145], [175], [206], [207], [208], [209], [210], [211], [212], [213]. In addition to weather and climate prediction models built on RNN architectures, hybrid models fusing RNNs with other mechanisms have also gained traction [146], [147], [214], [215], [216], [217]. For instance, the amalgamation of the Swin Transformer [218] with RNNs has given birth to models like SwinVRNN [147], which capitalize on the advantages of both architectures. Moreover, the fusion of SwinRNN with generative models has led to the diffusion-based SwinRDM [146] and to GAN-based variants [216], [217]. Added to this, physics-informed approaches have been introduced [219]. Concurrently, with the evolution of Transformer-based spatio-temporal extraction, the integration of RNN architectures with Transformer models to address this problem has been on the rise [214], [215].

• Diffusion Models. Standard diffusion models, comprising forward noising processes and backward denoising processes, are widely employed for learning data distributions and generating data representations in meteorological and climatic contexts [146], [147], [150], [152], [177], [220], [221], [222], [223], [164], [224]. For instance, SwinRDM [146] amalgamates SwinRNN [147] and diffusion models to attain high-resolution weather forecasting. However, it is important to note that the application of diffusion models in weather and climate studies is still in its nascent stage.
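The forward noising process such models rely on (cf. Eq. (4) in Section 4.2) is straightforward to simulate. The sketch below, with a toy field and a step size of our own choosing, shows how repeated steps drive a sample toward pure Gaussian noise; the learned backward denoising process must invert exactly this chain:

```python
import numpy as np

rng = np.random.default_rng(0)

x0 = rng.standard_normal((16, 16))   # a toy 16x16 "weather field" as the data sample
x = x0.copy()

dt, steps = 0.02, 400                # small step size and number of forward steps
for _ in range(steps):
    z = rng.standard_normal(x.shape)               # fresh Gaussian noise z_t
    x = np.sqrt(1.0 - dt) * x + np.sqrt(dt) * z    # one forward step, as in Eq. (4)

# The signal decays geometrically: corr(x_T, x_0) ~ (1 - dt)^(steps / 2),
# so after 400 steps the field is essentially indistinguishable from noise.
expected = (1.0 - dt) ** (steps / 2)
corr = np.corrcoef(x.ravel(), x0.ravel())[0, 1]
print(round(expected, 4))  # 0.0176
```

The variance-preserving form of the step (the √(1 − dt) shrink paired with √dt noise) keeps x approximately unit-variance throughout, which is what makes the chain end in a standard Gaussian rather than blowing up.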
Fig. 2: An overview of large foundation models for weather and climate. Left: foundation models specialized in weather and climate time series (including time series, spatio-temporal series, video streams, etc.); Right: foundation models specialized in climate-related text data.
• Generative Adversarial Networks (GANs). GANs casts, such as extrapolations based on radar-echo im-
have widely used in image generation tasks, ranging agery [249], satellite cloud images [250] and multi-
from generating handwritten digits [225] to generat- layer atmosphere status, thus contributing to the un-
ing large-scale image datasets [226], [227]. They are derstanding of weather patterns in the region. For the
commonly employed in weather and climate tasks for first category, the Transformer is used to perform short-
spatiotemporal video stream prediction [228], [229], and long-term forecasts, modeling dependencies on
aiming to generate realistic and temporally coherent sequences and match high-dimensional data distributions between them. GAN-based architectures are therefore common in weather and climate prediction tasks, aiming to generate predicted future frames that match the ground truth as closely as possible [84], [167], [170], [216], [217], [230], [231], [232], [233], [234], [235], [236], [237], [238], [239]. Additional physical constraints are often introduced to improve the accuracy of weather and climate modeling in these hybrid models [229], [240], [241], [242], [243], [244], [245], [246], [247], [248].

• Transformers. Transformer-based models are widely used for tasks related to time series analysis, including weather and climate modeling, due to their powerful long-sequence modeling capabilities [149]. They focus on short-term/long-term forecasting tasks in weather and climate applications and can be categorized into two types: the former focusing on one-/two-dimensional forecasts of weather and climate, such as predicting trends in relevant weather variables globally or regionally on a single atmospheric level, and the latter focusing on multidimensional forecasts. For the first category, Transformers model the relationships of variables at different points in time through positional coding as well as self-attention mechanisms [178], [251], [252], [253], [254], [255]. As for the second category, Transformers are expected to establish complex multi-layered spatio-temporal relationships of meteorological variables at different atmospheric pressures, and the results of this type of Transformer are usually challenged based on the characteristics of the data itself (atmospheric pressures, spatio-temporal correlations, variable correlations), and so on [25], [63], [64], [138], [148]. Inspired by the fields of NLP and CV, the Transformer structure has also been redesigned for the development of large-scale weather and climate foundation models [25], [63], [138]. In addition, in the field of NLP-based climate text analysis, Transformers are a general architecture [196], [198], [199], [256], [257], [258], [259].
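The positional-coding-plus-self-attention mechanism described in the Transformers bullet can be reduced to a few lines. The sketch below implements scaled dot-product self-attention over toy "variable tokens" (no learned projections, no positional encoding, invented values), purely to show how each token is re-expressed as an attention-weighted mixture of the others:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Each token might represent one meteorological variable at one time
    step (after embedding and positional encoding). For simplicity, the
    queries, keys and values are the tokens themselves (no learned
    projection matrices)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)  # attention distribution for this query
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

# Three toy tokens, e.g. temperature/pressure/wind embeddings at one grid cell.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)
```

Because each output is a convex combination of the inputs, every component stays within the range of the original token values; learned query/key/value projections are what give the real models their expressiveness.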
• Graph Neural Networks. In the field of weather and climate, numerous studies have explored the application of graph neural networks, particularly spatial-temporal graph neural networks, due to their ability to establish potential spatial-temporal relationships of the Earth system [181]. Two common applications include spatial-temporal sequence prediction [137], [142], [143], [144], [183], [184], [221], [260], [261], [262], [263], [264], [265] and spatial-temporal video stream prediction in weather forecasting [168]. In spatial-temporal sequence prediction, graph neural networks are used to model the spatio-temporal dependencies and correlations in weather data. This involves predicting future weather conditions based on historical observations at different locations [24], [154]. The graph structure is used to capture the spatial relationships between nodes, and the temporal dependencies are modeled using recurrent [264], [265] or convolutional layers [144], [183]. In spatial-temporal video stream prediction, graph neural networks are employed to predict future weather conditions in the form of video-like sequences [168]. This involves predicting the evolution of weather patterns over time, taking into account both spatial and temporal dependencies.

7 APPLICATIONS

This section presents an overview of prevalent DL models, categorized by their applications in weather and climate analysis. These applications include forecasting, precipitation nowcasting, downscaling, bias correction, data assimilation, climate text analysis, and weather pattern understanding.

7.1 Forecasting

Accurate weather and climate forecasting is critical for environmental and societal planning. Significant strides have been made in developing robust DL methods that model the nonlinear associations between historical and future weather patterns. This section mainly focuses on discussing advancements in weather and climate forecasting based on time series and spatio-temporal series. Most common in such tasks are RNN-based architectures, which are widely used due to their autoregressive (AR) structure [145], [146], [147], [212], [213]. For instance, DWFH introduces conductive long and short-term memory models to enhance data-driven deep weather prediction models [145]. Ref. [212] merges the LSTM and an adaptive neuro-fuzzy inference system (ANFIS) for atmospheric pressure forecasting. SwinRDM introduces the SwinRNN as a fundamental component for high-resolution weather forecasting [146], together with diffusion models, achieving high-resolution forecasting at 0.25° via a two-step training strategy: first, cyclic prediction of future atmospheric fields is performed at low resolution, followed by high-resolution, fine-grained atmospheric detail reconstruction based on a diffusion-based super-resolution model. Moreover, SwinVRNN employs a recurrent neural network-based architecture with a variational loss to improve long-lead weather forecasts [147]. In addition, the Transformer, especially the Vision Transformer, is also widely used in weather and climate prediction based on spatio-temporal series due to its strong performance in modeling potential representational associations between image regions using patch and self-attention mechanisms. FourCastNet [136] delivers impressive performance in various weather forecasting tasks at 0.25° resolution. This achievement is based on the Vision Transformer (ViT) [266] and Adaptive Fourier Neural Network Operators (AFNO). PoET [148] introduces hierarchical ensemble transformers to enhance medium-range ensemble weather forecasts on a global scale. TeleViT [153] integrates fine-grained local-scale and global-scale inputs, treating the Earth as one interconnected system for seasonal wildfire forecasting. Large models came to the fore with the ultra-large-scale, high-resolution global medium-range forecasting task. Pangu-Weather [63], a data-driven model based on 3D Earth-specific transformers, is lauded for its rapid and accurate global forecasts. This model predicts the atmospheric state at a given time based on the current state, described by five upper-air variables on a 0.25° horizontal grid with 13 vertical levels, and four surface variables. FengWu [138] addresses the medium-range forecasting problem from a multi-modal, multi-task perspective, with an elaborate deep learning architecture whose model-specific decoders and cross-modal fusion transformers learn under the supervision of an uncertainty loss to balance the optimization of different predictors in a regionally adaptive manner. FuXi [139] cascades cube embeddings and U-Transformers and is trained using 39 years of high-resolution reanalysis data. It delivers forecast performance comparable to that of the ECMWF ensemble mean with a temporal resolution of 6 hr and a spatial resolution of 0.25° in a 15-day forecast. The FuXi-Extreme model [155] employs a denoising diffusion probabilistic model (DDPM) [118] to refine the surface forecast data generated by the FuXi model [139] in 5-day forecasts, thereby enhancing extreme rainfall/wind forecasting. As an all-purpose foundation model, ClimaX [25] introduces the concept of foundation modeling to the field of weather prediction, with fully supervised pre-training based on the Transformer, and proposes variable tokenization and variable aggregation strategies for fusing and mining the potential relationships of different weather variables at different heights, which gives it very promising flexibility to adapt to different downstream tasks, including global/regional/seasonal prediction as well as climate mapping and downscaling. While the aforementioned models are trained with fully supervised pre-training, W-MAE [64] leverages a Masked Auto-Encoder (MAE)-based approach [202], [203] for self-supervised training of weather forecasting models, potentially allowing fine-tuning on different data sources to adapt to downstream tasks.

Generative AI is carving a niche in the field of climate and weather forecasting, with several promising approaches recently reported. SEEDS [150], for instance, employs an array of finely-tuned ensemble simulators to generate probabilistic weather forecasts. These forecasts are akin to the "seeds" of weather states provided during the inference process, with two different ensemble simulators generating two distinct event predictions. However, the self-regression mechanism underpinning this approach, similar to the RNN architecture used in diffusion model training, is susceptible to instability and feature dissipation over time, particularly in long-range forecasting tasks.
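The variable tokenization idea behind ClimaX, described above, treats every (variable, patch) pair as a separate token before cross-variable aggregation. A minimal stdlib-only sketch of the patchify step follows; the field sizes, variable names, and the omission of the learned embedding layer are illustrative assumptions, not ClimaX's actual implementation:

```python
def variable_tokenize(fields, patch):
    """Split each variable's H x W grid into non-overlapping patch tokens.

    `fields` maps variable name -> 2-D list (H x W). Returns a list of
    (variable, patch_row, patch_col, flattened_patch) tokens, mirroring
    the idea that every variable contributes its own token per patch,
    which a later aggregation step can fuse across variables."""
    tokens = []
    for name, grid in fields.items():
        h, w = len(grid), len(grid[0])
        assert h % patch == 0 and w % patch == 0
        for pr in range(0, h, patch):
            for pc in range(0, w, patch):
                flat = [grid[r][c]
                        for r in range(pr, pr + patch)
                        for c in range(pc, pc + patch)]
                tokens.append((name, pr // patch, pc // patch, flat))
    return tokens

# Two toy variables on a 4 x 4 grid with 2 x 2 patches -> 2 * 4 = 8 tokens.
t2m = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
u10 = [[0.0] * 4 for _ in range(4)]
toks = variable_tokenize({"t2m": t2m, "u10": u10}, patch=2)
```

In the real model each flattened patch would be linearly embedded and tagged with variable and position embeddings before entering the Transformer.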
Contrastingly, DYffusion [151] uses pristine initial conditions, while the PDE-Refiner [222] enhances diffusion-process-based predictions by iteratively observing them to capture low-amplitude information that may not be immediately evident in the data. DITTO [152] adopts a unique approach, generating a continuous interpolation between the initial and final time steps, and using time fireworks instead of incremental noise in the forward process. TemperatureGAN [242], a conditional GAN, considers factors such as the month, location, and time period to generate atmospheric temperature predictions at an hourly resolution above ground level. Furthermore, GANs that integrate physical information constraints are being deployed to emulate ocean systems, thereby enhancing climate prediction capabilities [244], [245], [246], [247], [248]. For instance, Refs. [244], [245] describe GAN-based models that learn underlying physical relationships between surface and subsurface temperatures in numerical models. Subsequent calibration of model parameters using observational data leads to enhanced predictions. PGnet [248] is a generative neural network model that uses a mask matrix to identify regions of low-quality prediction generated during the initial physical stage. The generative neural network then uses this mask as a prior for the second stage of fine prediction. WGC-LSTM [260] harnesses graph convolutions to capture spatial relationships and amalgamates these with an LSTM to concurrently consider both spatial and temporal relationships.

Reflecting upon the intricate interconnections between atmospheric elements, surface variables, and precise terrestrial coordinates within the Earth system, a substantial amount of research has utilized graph-based methodologies for weather and climate prediction tasks. For instance, the Keisler graph neural network [142] leverages a GNN architecture [267] to achieve weather forecasting. It uses an encoder that maps the original 1° latitude/longitude mesh to an icosahedral mesh, performs message passing computations on this mesh, and then decodes back into latitude/longitude space. GraphCast [137], on the other hand, also utilizes a GNN-based framework for weather prediction, albeit with much higher resolution and flexibility. It stands as the inaugural large-scale foundation model for weather and climate predictions based on graph methodology. Graphino [140], a globally spatial GNN, is specifically designed for seasonal forecasting tasks, including prediction of the El Niño-Southern Oscillation (ENSO) phenomenon [268]. The model begins by constructing an initial graph with grid cells as nodes and learns the edges based on the connectivity between geographical locations. In addition, GE-STDGN [143] employs a graph structure learning and optimization method underpinned by the evolutionary multi-objective optimization (EMO) algorithm known as graph evolution [269]. This augments the model's ability to analyze intricate node correlations for spatio-temporal weather sequence prediction. HiSTGNN [144] features an adaptive graph learning module that builds a self-learning hierarchical graph [270]. This graph comprises a global graph that represents region-specific information and a local graph that encapsulates meteorological variables within each region. The model effectively identifies hidden spatial dependencies and diverse long-term weather patterns using graph convolution and gated temporal convolution with dilated inception as its core structure. Lastly, WeKG-MF [261] presents an innovative approach by constructing a knowledge graph from open weather observations published by Météo-France. This model is built upon a semantic schema that encapsulates the knowledge of meteorological observations for an array of downstream scenarios.

7.2 Precipitation Nowcasting

The domain of precipitation nowcasting has garnered substantial advancements through the application of DL techniques, including CNNs [84], [271], [272], [273], [274], [275], RNNs [85], [97], [238], [274], [276], [277], and Transformers [107], [166], [171], [173], [176], [278]. These methodologies have demonstrated remarkable proficiency in managing spatio-temporal data, a prevalent format in Earth system observation.

ConvLSTM [97] was pioneering in its integration of deep learning for precipitation nowcasting, effectively amalgamating CNN and LSTM to manage spatio-temporal radar data. Successive models, such as PredRNN [209] and E3D-LSTM [210], similarly incorporate spatio-temporal data within LSTM and CNN architectures to extract long-term higher-order correlations. PhyDNet [279] introduces partial differential equation (PDE) [280] constraints into its latent space. MetNet [165] and its subsequent iterations, MetNet-2 and MetNet-3 [281], proposed architectures based on ConvLSTM and advanced CNNs, thereby enabling proficient precipitation forecasting up to 12 hours ahead.

The ascension of Transformers in the visual realm has benefited the spatio-temporal video-stream approach to rainfall prediction. For instance, PTCT [173] divides original frames into multiple patches to eliminate inductive bias constraints. It also applies 3D temporal convolutions to effectively capture short-term dependencies. The Preformer [176] model proposes an encoder-translator-decoder architecture where the encoder integrates spatial features from multiple elements, the translator models spatio-temporal dynamics, and the decoder combines temporal and spatial information for future precipitation prediction. Rainformer [171] introduces global feature extraction units and gate fusion units (GFUs) to balance the fusion of local and global features, thereby enabling efficient rainfall prediction. TempEE [107] proposes a parallel use of spatio-temporal encoders and decoders based on the Transformer architecture, achieving promising results with an autoregression-free strategy for handling non-stationary spatio-temporal sequences and significantly improving the accuracy of precipitation nowcasting. The Earthformer model [172], based on cuboid attention, is utilized for Earth system forecasting, including precipitation nowcasting and ENSO [282].

Taking into account the instructive role of knowledge from other modalities, multimodal spatial-temporal tasks have been introduced [174], [175], [276]. The MM-RNN [174] introduces elemental knowledge to guide precipitation nowcasting, enforcing a constraint that requires the movement of precipitation to follow basic atmospheric laws of motion for accurate forecasting. STIN [175] utilizes spatio-temporally specific filters to generate precipitation forecasts from multimodal meteorological data.
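ConvLSTM, which several of the nowcasting models above build on, replaces the matrix multiplications in the LSTM gate equations with convolutions so that the hidden and cell states remain spatial maps. A stdlib-only, single-channel, single-step sketch follows; the kernels, frame sizes, and input values are toy assumptions, not a faithful reimplementation:

```python
import math

def conv2d_same(img, kernel):
    """3x3 'same' convolution with zero padding on a 2-D list."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            s = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        s += img[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = s
    return out

def convlstm_step(x, h, c, kx, kh):
    """One ConvLSTM cell update (single channel, one kernel per gate).

    `kx`/`kh` map gate name ('i', 'f', 'o', 'g') -> 3x3 kernel for the
    input frame and hidden state. The gate maps follow the standard LSTM
    equations, with convolutions replacing the dense products so that
    spatial structure is preserved."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    H, W = len(x), len(x[0])
    pre = {g: conv2d_same(x, kx[g]) for g in "ifog"}
    preh = {g: conv2d_same(h, kh[g]) for g in "ifog"}
    new_c = [[0.0] * W for _ in range(H)]
    new_h = [[0.0] * W for _ in range(H)]
    for r in range(H):
        for col in range(W):
            i = sig(pre["i"][r][col] + preh["i"][r][col])
            f = sig(pre["f"][r][col] + preh["f"][r][col])
            o = sig(pre["o"][r][col] + preh["o"][r][col])
            g = math.tanh(pre["g"][r][col] + preh["g"][r][col])
            new_c[r][col] = f * c[r][col] + i * g
            new_h[r][col] = o * math.tanh(new_c[r][col])
    return new_h, new_c

# Toy 4x4 "radar frame", zero initial state, center-only kernels.
k = [[0.0] * 3 for _ in range(3)]; k[1][1] = 0.5
kx = {g: k for g in "ifog"}; kh = {g: k for g in "ifog"}
x = [[1.0] * 4 for _ in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
h1, c1 = convlstm_step(x, zeros, zeros, kx, kh)
```

Stacking such cells over time, with learned multi-channel kernels, is what lets the radar sequence models above carry spatial context forward through the forecast.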
Recently, precipitation nowcasting, viewed as an uncertainty assessment problem, has also benefited from the successful application of generative and diffusion modeling. DGMR employs an adversarial training methodology to generate sharp and accurate nowcasts, addressing the problem of blurry predictions. DMSF-GAN [84], on the other hand, completely eschews autoregressive strategies and is based on adversarial training and pure CNN architectures to address the problem of feature dispersion over time. PCT-CycleGAN [170] generates temporal causality using two generator networks with forward and backward temporal dynamics. Each generator network learns a multitude of one-to-one mappings on precipitation data based on time-dependent radar to approximate a mapping function representing the temporal dynamics in each direction. The MPL-GAN [167] utilizes a multi-path learning strategy to improve the diversity of generated sequences while providing accurate predictions. In addition, to simultaneously handle uncertainty and enhance domain-specific standards, PreDiff [177] adopts a two-stage probabilistic spatiotemporal prediction pipeline, incorporating explicit knowledge control mechanisms to enforce predictions conforming to a specific domain's physical constraints. This is achieved by estimating the deviation from the imposed constraints in each denoising step and correspondingly adjusting the transition distribution. GED [169], known as Generative Ensemble Diffusion, utilizes a diffusion model to generate a set of possible weather scenarios which are then amalgamated into a probable prediction via a post-processing network. Ref. [231] utilizes radar-based deep learning models for skillful short-term precipitation forecasting, achieving display-consistent predictions over a 1536×1280 km region.

The introduction of physical constraints and graph relations can improve the efficiency and accuracy of the model. Ref. [240] introduces a generative adversarial network with physical information constraints to improve both the local distribution and the spatial structure of daily precipitation fields. CNGAT [168] fuses spatial and temporal information for improved radar quantitative precipitation estimation (RQPE) [283]. The precipitation estimation area is partitioned into subareas that are treated as nodes to form an input graph. All nodes are then categorized according to the temporal mean radar reflectivity for precipitation estimation with an attention mechanism.

7.3 Downscaling

Achieving precise, fine-grained weather predictions necessitates high spatial resolution data. However, most global weather forecasting models are restricted by the availability and scale of data, resulting in an over-reliance on data with approximately a 5.625° spatial resolution, equivalent to a grid point spacing of about 625 kilometers. Despite these limitations, the data volume is significant. For instance, the data scale of the ERA5 system at a 0.25° spatial resolution is several tens of times larger (around 15 terabytes) compared to the 5.625° spatial resolution data. High spatial resolution data offer a more granular representation of complex atmospheric processes and the interplay between different weather systems. One strategy to tackle this issue is the enhancement of weather data's spatial resolution, a process referred to as super-resolution (SR) [284], [285]. SR can bolster the resolution of gridded data, surpassing conventional interpolation methods in effectiveness.

A popular DL-based SR model, U-Net, leverages a synergistic encoder-decoder structure to produce high-resolution outputs from low-resolution inputs [286], [287], [288]. Within the realm of semi-supervised learning, generative adversarial networks (GANs) have demonstrated potential in enhancing the representation of more intricate structures and details [164], [217], [236], [237], [289], [290], [291], [292]. The typical procedure involves training the generator to learn the potential mapping between low- and high-resolution grid data or images. For example, Stengel et al. presented an adversarial DL approach that super-resolves the predictions of wind speed and solar irradiance in global climate models to a scale sufficient for renewable energy resource assessment, thereby improving the resolution of wind and solar energy data nearly fiftyfold [292].

Recent research has deployed diverse strategies such as normalizing flows and neural operators. Self-supervised learning-based methods have also been investigated for downscaling low-resolution grid weather data. For instance, the pre-trained foundation model ClimaX [25] allows fine-tuning for resolution downscaling. González et al. introduced a downscaling strategy based on multi-variable physical hard constraints, ensuring the physical relationships between variable sets [161].

Physics-constrained DL-based methods have also been proposed to improve model performance via external adjustment [156], [157], [158], [162], [163], [293], [294]. For example, MeshfreeFlowNet [156] employs a physics-informed model which incorporates partial differential equations (PDEs) as regularization terms in the loss function, achieving spatio-temporal downscaling. Harder et al. [158] were the first to apply hard constraints to achieve fine-grained downscaling outputs on climate change datasets. Furthermore, strategies such as contrastive learning [285] and Bayesian DL models [159] have been adopted.

In response to the lack of interpretability of DL-based downscaling methods, Gong et al. explored the interpretability of fundamental CNNs in climate model downscaling strategies, thus paving the way for trustworthy artificial intelligence in downscaling models [273]. Bano et al. analyzed the downscaling issue from a multimodel perspective, developing a CNN-based downscaling prediction ensemble (DeepESD) for temperature and precipitation in the European EUR-44i (0.5°) domain based on eight global circulation models [160]. This represents the first application of CNNs in generating a downscaled multimodel ensemble based on perfect-prognosis methods, allowing for the quantification of model uncertainty in climate change signals.

The introduction of uncertainty modeling also yields downscaling gains in DL-based models, significantly improving efficiency as well as reconstruction resolution. ResDiff [164] employs a two-step diffusion model-based approach. In the first step, U-Net regression predicts the mean values, while in the second step, the diffusion model predicts the residuals, thereby achieving kilometer-scale atmospheric downscaling.
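The two-step decomposition behind ResDiff can be illustrated without any neural network: a cheap deterministic stand-in (nearest-neighbour upsampling here, in place of the U-Net regression) predicts the mean field, and the residual between the truth and that mean is the target the second-stage generative model would learn. All arrays below are invented toy values, and both "steps" are deliberately trivial stand-ins:

```python
def upsample_nearest(lowres, factor):
    """Step-1 stand-in: a deterministic upsampler playing the role of the
    regression network that predicts the mean high-resolution field."""
    out = []
    for row in lowres:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def residual_target(highres, mean_pred):
    """Step-2 target: the residual field the generative model must learn."""
    return [[hv - mv for hv, mv in zip(hr, mr)]
            for hr, mr in zip(highres, mean_pred)]

low = [[1.0, 3.0], [5.0, 7.0]]          # coarse-grid field
high = [[1.0, 1.5, 3.0, 3.5],           # "true" fine-grid field
        [1.2, 1.6, 3.1, 3.4],
        [5.0, 5.5, 7.0, 7.5],
        [5.1, 5.4, 7.2, 7.6]]
mean_pred = upsample_nearest(low, 2)
res = residual_target(high, mean_pred)
# Reconstruction check: mean prediction + residual recovers the fine field.
recon = [[m + r for m, r in zip(mr, rr)] for mr, rr in zip(mean_pred, res)]
```

The point of the split is that the deterministic step captures the easy, large-scale signal, leaving the generative model a smaller, better-conditioned residual distribution to sample from.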
However, it should be noted that the use of diffusion models in the field of weather and climate is still in the exploratory stage. Ref. [223] also employs similar operations, utilizing diffusion models for cloud cover and super-resolution diffusion models for high-resolution solar energy forecasting.

7.4 Bias Correction

Bias correction in weather predictions has traditionally relied on statistical methods [186]. Over time, these techniques have evolved, embracing machine learning strategies such as Deep Belief Networks and Support Vector Machines. The advent and proliferation of data availability have further catalyzed the shift towards deep learning methodologies, including Long Short-Term Memory (LSTM) [187], [188], [189] and Convolutional Neural Networks (CNN) [185], [190], [191]. These methodologies have been instrumental in mitigating common weather-related biases.

A notable approach is the DL-Corrector-Remapper technique [192], which stands apart in its ability to correct, remap, and fine-tune gridded uniform forecasts from the FourCastNet system. This process enables a direct comparison with non-uniform, sparse observational ground-truth data via the AFNO method. The Super Resolution Deep Residual Network (SRDRN) [191] has been employed for climate downscaling and bias correction. This network utilizes stacked general circulation models and extracts spatial features, effectively diminishing biases and correcting spatial dependencies relative to observational data.

In an intriguing application, the Unsupervised Image-to-Image Translation (UNIT) network [193] capitalizes on unpaired image translation for bias correction. This method offers a novel perspective on bias mitigation. Hess et al. [240] have proposed a post-processing technique that employs a physics-constrained generative adversarial network (cGAN) to concurrently correct biases in the local frequency distributions and spatial patterns of state-of-the-art CMIP6-level Earth System Models.

Recently, the WeatherGNN model [194] has been developed, which leverages a graph neural network within a comprehensive framework. This model learns the intricate relationships between weather and geography, capturing meteorological interactions and spatial dependencies between grids. This approach provides a robust and sophisticated tool for bias correction. These advancements illustrate the potential of deep learning methodologies in refining weather prediction systems.

7.5 Data Assimilation

Data assimilation (DA) is a key component of high-level NWP systems. These systems not only forecast future states, but also integrate observational data to establish the initial state, guiding the model's trajectory to future states. This complex process is computationally demanding, making it an active area of research. Existing approaches often rely on simplifying assumptions, such as linearity, which adds to the challenges in the field. However, the integration of deep learning into DA is gaining recognition, with encouraging research outcomes. For instance, OceanFourcast [149] employs neural operators alongside a Transformer-based architecture, inspired by FourCastNet [136], to support adjoint-based data assimilation in ocean modeling. Furthermore, Bocquet et al. [295] innovatively combine DA, machine learning, and expectation maximization to perform Bayesian inference of chaotic dynamics, enabling the assimilative reconstruction of observational data for geophysical flows. For an in-depth review of DA, we refer readers to Geer's work [296].

7.6 Climate Text Analysis

The rapid development of LLMs has provided new insights for climate text analysis. Hershcovich et al. introduced a climate performance model card, designed with the intent of practical application requiring minimal information on the experimental setup and associated computer hardware [297]. A language model known as ClimateBERT [195] was developed with a foundation in DistilRoBERTa, specifically designed for analyzing climate-oriented text. This versatile model can be employed in a variety of tasks, such as detecting climate-related content, discerning sentiment in climate-related paragraphs, identifying commitment- and action-related content, distinguishing specific from non-specific climate-related text, and assigning climate-related content to one of four categories as per the recommendations of the Task Force on Climate-related Financial Disclosures (TCFD). Further refinement of ClimateBERT is seen in the work of Garrido-Merchán et al. [199], who utilized ClimaText [205] to fine-tune the model for the specific task of analyzing disclosures relating to financial risks connected with climate change. An extension, ClimateBERT-NetZero [198], was designed to classify whether a given text contains a net-zero or reduction target. Krishnan et al. employed ClimateBERT in their ClimateNLP project, analyzing public sentiment towards climate change using data gathered from Twitter and Facebook [257]. Auzepy et al. proposed the use of pretrained LLMs' zero-shot capabilities to evaluate TCFD reporting [258]. However, this approach is not without its challenges. Pre-trained LLMs often lack up-to-date information and tend to use imprecise language, a significant disadvantage in the field of climate change where accuracy is paramount. To mitigate this, Kraus et al. incorporated emission data from ClimateWatch and utilized general Google search to enhance the language model [259]. Vaghefi et al. integrated information from the Intergovernmental Panel on Climate Change's Sixth Assessment Report (IPCC AR6) into GPT-4 [298], laying the groundwork for the implementation of conversational AI in the realm of climate science [256]. At the intersection of climate and health, CliMedBERT [196] was developed for diverse applications, including understanding climate- and health-related concepts, fact-checking, relationship extraction, and generating evidence on the impact of health on policy text generation. Additionally, Bi et al. have proposed OceanGPT [197], based on LLMs (e.g., Llama [82] and GPT-3.5 [48]), to handle specific tasks related to the ocean, such as ocean text analysis and intelligent underwater agent instructions.
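Detecting climate-related content, one of the ClimateBERT tasks listed above, is at its core binary text classification. In practice this is done by fine-tuning the pretrained Transformer; purely as a dependency-free illustration of the task (not of ClimateBERT itself), here is a word-count Naive Bayes toy with an invented four-sentence corpus:

```python
import math
from collections import Counter

def train_nb(texts, labels):
    """Word-count Naive Bayes with add-one smoothing: a toy stand-in for
    fine-tuning a pretrained Transformer on labeled climate sentences."""
    counts = {0: Counter(), 1: Counter()}
    prior = Counter(labels)
    for text, y in zip(texts, labels):
        counts[y].update(text.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, prior, vocab

def predict_nb(model, text):
    """Return the class with the highest smoothed log-likelihood."""
    counts, prior, vocab = model
    total = sum(prior.values())
    best, best_lp = None, -math.inf
    for y in (0, 1):
        lp = math.log(prior[y] / total)
        denom = sum(counts[y].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((counts[y][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best

texts = ["rising emissions drive global warming",
         "net zero targets cut carbon emissions",
         "the restaurant serves excellent pasta",
         "football season starts next week"]
labels = [1, 1, 0, 0]  # 1 = climate-related, 0 = not
model = train_nb(texts, labels)
```

A pretrained Transformer replaces the bag-of-words likelihoods with contextual representations, which is what lets models like ClimateBERT separate specific from non-specific climate language rather than just spotting keywords.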
7.7 Weather Patterns Understanding

Weather pattern understanding, as opposed to forecasting, tends to lean towards a qualitative analysis of climate change. By integrating predictions derived from reanalysis datasets, we can more effectively quantify the potential impact of future weather events. Traditional numerical methods, though costly, rely on manually crafted features such as fronts, tropical cyclones [299], extratropical cyclones, and atmospheric rivers, using heuristic detection algorithms based on empirical knowledge. However, weather patterns with more distinct features, like tornadoes and typhoons, may be more amenable to pattern detection and prediction due to their characteristic features. For instance, a typhoon's eye and surrounding rainbands present distinct patterns. Such pattern detection and prediction could potentially prove more advantageous than predicting the general atmospheric state in standard training. One approach might be to employ spatio-temporal video stream data, such as radar reflectivity data [107] and weather satellite cloud imagery [300]. This transition from spatio-temporal weather video stream data to predictions offers a more dynamic and visually intuitive method for weather pattern understanding.

Weather pattern understanding based on DL techniques often requires large-scale, well-annotated samples. In one study, Kashinath et al. [301] created a dataset suitable for tropical cyclone (TC) detection in the 25 km CAM5.1 model. They achieved fine-grained and rapid segmentation of TCs and atmospheric rivers (ARs) using DL-based segmentation algorithms. Racah et al. [302] extended this dataset to detect and precisely locate TCs, extratropical cyclones (ETCs), ARs, and tropical low-pressure systems using a 3D-CNN. Furthermore, Sobash et al. [303] combined CNNs and logistic regression (LR) to detect tornadoes in six-hourly dynamical forecasts and turbulence conditions in regional or high-resolution weather forecasts. In addition to detecting different weather patterns from large-scale reanalysis datasets, advanced AI models are frequently used to study the evolutionary processes of meteorological phenomena. These include the genesis and dissipation of typhoons in a regional context, as well as the movement trajectories of TCs. Next, we conduct a literature review and discussion on weather pattern understanding, focusing on climate phenomena and extreme weather events.

7.7.1 Climate Phenomena Understanding and Prediction

We mainly focus on discussing three primary climate phenomena/representations on the global scale: the El Niño-Southern Oscillation, climate tipping points, and the Madden-Julian Oscillation.

• El Niño-Southern Oscillation. The El Niño phenomenon, arising from intense ocean-atmosphere interactions, is marked by heightened sea surface temperatures (SST), a levelled equatorial Pacific thermocline, and a diminished tropical Pacific Walker circulation [304]. Together with its inverse phase, La Niña, it constitutes the El Niño-Southern Oscillation (ENSO) cycle. This cycle, with a duration of 2 to 7 years, is the principal driver of global climate interannual variability, frequently correlating with significant global climatic and socio-economic repercussions [305]. Consequently, accurate ENSO forecasting is of paramount scientific and practical significance. Several methods have been proposed to enhance ENSO forecasting. Ref. [178] incorporates a Transformer-based architecture, considering long-term correlations among meteorological variables. A spatial-temporal Transformer for multi-year ENSO prediction is suggested by Ref. [306]. ENSO-GTC [181] applies the Global Teleconnections Coupler (GTC) to model potential teleconnections between global SSTs. Refs. [65], [66] develop an interpretable deep learning model for ENSO forecasting. Ref. [179] introduces a holistic deep learning model for ENSO that integrates seasonality in climate data to enhance the forecasting of fluctuations. Comprehensive reviews and surveys on deep learning-based ENSO forecasting can be found in Refs. [268], [307].

• Climate Tipping Points. Climate tipping points denote crucial thresholds within the climate system where the system undergoes significant and irreversible alterations in response to certain changes or external forcings [281], [308], [309]. These transitions can instigate major climate system shifts, including modifications in oceanic circulation patterns, accelerated glacier melting, and climate zone migration. The transgression of these tipping points can destabilize the long-term equilibrium of the climate system, inciting more severe climate transformations. TIP-GAN [243] is a generative adversarial network (GAN)-based model designed to identify potential climate tipping points in Earth system models, with a particular emphasis on the collapse of the Atlantic Meridional Overturning Circulation (AMOC). Additionally, a neural-symbolic question answering program translator, NS-QAPT, is presented as a neural-symbolic approach to enhance the interpretability and explainability of deep learning climate simulations applied to climate tipping point detection [281]. Further relevant works can be explored in Refs. [310], [311].

• Madden-Julian Oscillation. The Madden-Julian Oscillation (MJO) [312], [313] is a substantial atmospheric circulation phenomenon predominantly observed near the equator. It is characterized by regular oscillations in convection activity and precipitation in equatorial regions, with a typical duration spanning 20 to 90 days. The MJO exerts substantial influence on global weather and climate systems, impacting precipitation patterns, wind fields, and the origination and evolution of tropical cyclones. Consequently, comprehension and prediction of the MJO are vital for accurate precipitation forecasting and disaster prevention, thereby effectively managing and mitigating potential risks. DK-STN [184], leveraging spatio-temporal knowledge embedding, has notably enhanced the prediction accuracy of the ANN method, while preserving high levels of efficiency and stability. For further related works, refer to Ref. [8].

7.7.2 Extreme Weather Prediction and Understanding

This discussion primarily centers around the application of DL models for the prediction and understanding of four pivotal extreme weather events: extreme temperatures, drought, cyclones, and extreme precipitation.
20
activities and the ecological environment. Extreme tem- cyclones, or directly to precipitation fields [320], [321].
perature events are typically defined as a series of days De Burgh-Day & Leibnberg [322] proposed a systematic
with temperature variables surpassing a specific thresh- model ablation study as a potential approach to address
old or evaluated using accumulation indices composed the interpretability issue of DL models while main-
of amplitude, duration, and frequency. Data-driven cli- taining their good skill. Additionally, some DL-based
mate models rooted in machine learning/deep learning strategies aim to handle cyclones and extreme precipita-
have demonstrated effectiveness in extreme tempera- tion forecasting via meteorological image extrapolation
ture prediction tasks. Techniques such as random forest (refer to Section. Precipitation Nowcasting), and others
and XGBoost have offered promising results. Further- focus on improving model outputs by achieving high-
more, convolutional neural networks, recurrent neural resolution observations to enhance the representation
networks, and Transformers have seen extensive use in of precipitation or wind patterns associated with cy-
extreme temperature prediction due to their capacity to clones rather than directly performing the prediction
capture spatiotemporal representations. task [323], [324] (refer to Sec. 7).
• Drought. Droughts occur at various spatiotemporal
scales and involve multiple triggering mechanisms,
8 R ESOURCES
which complicates a clear and comprehensive defini-
tion [314]. They represent an extremely complex natural In this section, we catalog the prevalent datasets and tools
disaster. Recent research has gravitated towards using pertinent to weather and climate change analysis, aspiring
AI algorithms [315] based on geospatial weather data to streamline their accessibility for practitioners.
for long-term drought prediction, such as in [180],
[182], [316]. For example, Ref. [180] proposed a one- 8.1 Dataset
dimensional CNN combined with a GRU for evapo- This segment classifies datasets employed in data-driven
transpiration prediction, enabling the model to better weather and climate studies. These datasets facilitate
capture dependencies in time series data. Meanwhile, weather time-series analysis, weather spatio-temporal series
Ref. [316] combined CNN and LSTM for drought pre- analysis, weather spatio-temporal video stream analysis,
diction one month in advance. A more comprehensive and climate text analysis. We bifurcate them into two cat-
review of AI applications in drought prediction can egories: weather and climate series data and climate text
be found in Refs. [180], [317]. However, most existing data. It’s noteworthy that the datasets are unordered.
studies are geographically focused, causing the model
performances to heavily depend on specific research 8.1.1 Weather and Climate Series Data
conditions such as the study area, drought index, or This subsection concentrates on datasets related to weather
considered input variables. This dependency makes it and climate sequences, encompassing time series, spatio-
difficult to generalize major findings from one study to temporal sequences, spatio-temporal video streams, and
another. multimodal sequence data.
• Cyclones & Extreme Precipitation. In tropical and mid- CMIP6 [88], [89], [90] is a compendium of simulated data
latitude regions, weather-scale cyclones represent some from Phase 6 of the Coupled Model Comparison Project
of the most extreme events causing significant economic (CMCP). It encompasses a wide array of different climate
damage due to heavy rainfall, strong winds, and storm variables within the Earth system, such as precipitation,
surges [318]. Evidence suggests that climate change temperature, evapotranspiration, and others. The data, de-
may amplify the severity of these extreme events, even rived from over 150 climate models, spans more than 150
if not their frequency [299]. However, predicting their years (1850-2015). It can be utilized to predict the ENSO
variability on sub-seasonal to decadal timescales re- phenomenon and common climate variables.
mains a challenge [319]. Heavy precipitation events ERA5 [88], [89], [90] is widely used for training and
are not always linked with large-scale weather systems benchmarking data-driven weather and climate forecast-
such as cyclones or fronts; many impactful events are ing, down-scaling, and projection models. Managed by the
tied to brief, small-scale severe convective events. These European Center for Medium-Range Weather Forecasting
extremes pose a greater challenge for operational cli- (ECMWF) [343], it is regularly updated. ERA5 contains
mate prediction systems as their spatial resolution is hourly data on a 0.25° grid from 1979 till present, at 37
too coarse to capture the explicit representation of con- different pressure levels, as well as various surface climate
vection. In most regions where extreme precipitation is variables, resulting in nearly 400,000 data points at a resolu-
analyzed, the skill of numerical climate prediction sys- tion of 721 × 1440.
tems for extreme precipitation decreases significantly HCOSD3 is provided by the Institute for Climate and
after a few days. AI techniques have been applied to Applied Frontier Research (ICAR), is a refined subset of the
improve the prediction of cyclones and heavy precipi- CMIP dataset. Standing for Historical Climate Observation
tation events from various perspectives. The objective and Stimulation Dataset, it includes historical simulated
is to enhance the skill of numerical prediction sys- data from the CMIP5/6 model and assimilated data from
tems (e.g., seasonal forecasting) in representing extreme nearly a century of historical observations, reconstructed
weather events by identifying the relationship between from the US SODA model [344]. Each sample encapsulates
large-scale driving factors and the occurrence of ex- meteorological and spatial variables, such as sea surface
treme events. This approach has been applied to large-
scale extreme events, such as tropical or extratropical 3. https://tianchi.aliyun.com/dataset/98942
21
TABLE 4: Summary of weather and climate-related dataset resources in different applications. (FO: Forecasting; PR: Projection; DO: Downscaling; BC: Bias Correction; DA: Data Assimilation; WPU: Weather Pattern Understanding; PN: Precipitation Nowcasting; CTA: Climate Text Analysis.) All datasets have accessible hyperlinks on their names.

| Data Type | Dataset | Statistics | Timeframe | FO | PR | DO | BC | DA | WPU | PN | CTA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Reanalysis/Simulation | CMIP6 [88], [89], [90] | Reanalysis Grid Data | 1850-2100 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Reanalysis/Simulation | ERA5 [88], [89], [90] | Reanalysis Grid Data | 1979 to the present | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Reanalysis/Simulation | HCOSD | Reanalysis Grid Data | 1850-2100 | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Reanalysis/Simulation | Extreme-ERA5 [90] | Reanalysis Grid Data | 1979-2018 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Reanalysis/Simulation | ExtremeWeather [302] | Reanalysis Grid Data | 1979-2005 | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
| Reanalysis/Simulation | ClimateNet [301] | Reanalysis Grid Data | 1996-2010 | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Reanalysis/Simulation | ENS-10 [325] | Reanalysis Grid Data | 1998-2017 | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Reanalysis/Simulation | ClimART [221] | Reanalysis Grid Data | 1979-2014 | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Time Series Observation | China-Precipitation/Temperature [326] | Observation Data | 1901-2017 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Time Series Observation | Digital Typhoon [327] | Observation Data | 1978-2022 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Time Series Observation | DroughtED [314] | Observation Data | June 2017 - December 2017 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Time Series Observation | IowaRain [328] | Observation Data | 2016-2019 | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Time Series Observation | SRAD2018 | Observation Data | 2018 | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Time Series Observation | KnowAir [329] | Observation Data | 2015-2018 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Time Series Observation | NASA [24] | Observation Data | 2012-2022 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Time Series Observation | PRISM [90] | Observation Data | 1895 to the present | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Time Series Observation | RainNet [330] | Observation Data | None | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Time Series Observation | Continental United States Wind Speeds [331] | Observation Data | 2007-2013 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Time Series Observation | Continental United States Solar Irradiance [331] | Observation Data | 2007-2013 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Multimodal | EarthNet2021 [332] | Multimodal Observation Data | 2018 | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Multimodal | KoMet [333] | Multimodal Observation Data | 2011-2018 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Multimodal | Germany [334] | Multimodal Observation Data | 2011-2018 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Multimodal | China [335] | Multimodal Observation Data | 2020-2021 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Multimodal | MeteoNet [336] | Multimodal Observation Data | 2016-2018 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Multimodal | RAIN-F [337] | Multimodal Observation Data | 2017-2019 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Multimodal | RAIN-F+ [337] | Multimodal Observation Data | 2017-2019 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Multimodal | SEVIR [86] | Multimodal Observation Data | None | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Multimodal | RainBench [338] | Multimodal Observation & Reanalysis Data | 2000-2017 | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| Multimodal | Weather2K [87] | Multimodal Observation Data | 2017-2021 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Multimodal | LSDSSIMR [300] | Multimodal Observation & Reanalysis Data | 2020-2022 | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Text | CLIMATE-FEVER [339] | Climate-related Text | None | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Text | ClimateBERT-NetZero [198] | Climate-related Text | None | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Text | ClimaText [205] | Climate-related Text | None | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Text | CLIMA-INS [340] | Climate-related Text | 2012-2021 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Text | CLIMA-CDP [340] | Climate-related Text | 2012-2021 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Text | CLIMATESTANCE & CLIMATEENG [341] | Climate-related Text | None | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Text | SCIDCC [342] | Climate-related Text | None | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
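Several of the grid shapes and sample counts quoted in this section follow directly from the stated resolutions. A quick back-of-the-envelope check in Python (assuming inclusive latitude endpoints for ERA5's 0.25° grid and 365-day years for ExtremeWeather's total, as its round figure suggests):

```python
# Sanity checks on figures quoted for the gridded datasets.

# ERA5: global 0.25-degree grid; latitude spans -90..90 inclusive,
# longitude wraps around, giving the stated 721 x 1440 resolution.
lat_points = int(180 / 0.25) + 1
lon_points = int(360 / 0.25)

# ExtremeWeather: 1979-2005 at 3-hour steps; the round total of
# 78,840 samples implies 365-day years (no leap days counted).
years = 2005 - 1979 + 1
samples = years * 365 * (24 // 3)

print(lat_points, lon_points, samples)  # 721 1440 78840
```

The same arithmetic is useful when sizing model inputs for the other gridded datasets listed above.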
temperature anomalies, heat content anomalies (T300), latitudinal wind anomalies, and longitudinal wind anomalies, with data dimensions (year, month, lat, lon). The training data offers Nino3.4 index labels for the corresponding month. The testing data comprises 12 randomly selected time series from multiple international oceanographic data assimilation results.

Extreme-ERA5 [90] is a subset constructed by ClimateLearn from ERA5 to evaluate the prediction capability of data-driven models under extreme weather conditions. It comprises various extreme weather events, defined by climate variables exceeding localized thresholds (e.g., heatwaves and cold snaps due to sea-level temperature anomalies). The dataset covers the period 1979-2018, with 1979-2015 serving as the training set.

PRISM [90] is a dataset containing myriad observed atmospheric variables, including but not limited to temperature and precipitation, for the conterminous U.S. Maintained by the PRISM Climate Group at Oregon State University, it spans from 1895 to the present. At its highest resolution, it provides daily data on 4 km × 4 km grid cells, forming a matrix of shape 621 × 1405.

DroughtED [314] is a drought forecasting dataset that combines 180 daily weather observations for the continental United States with geospatial location metadata for all 3,108 counties. It includes real-time and historical meteorological data from NASA's Prediction of Worldwide Energy Resources (POWER) program; variables include measurements of precipitation, surface pressure, relative humidity, dew/frost point, wind speed, and temperature at daily resolution. Past drought observations are also included, given a parallel categorization into USDM drought levels: no drought (none), abnormally dry (D0), moderate (D1), severe (D2), extreme (D3), and exceptional (D4). Considering that drought is a seasonal phenomenon, seasonal characteristics are also included within the dataset. In addition, location metrics are provided, including topographic slope, gradient, and elevation for each site, as well as land use (e.g., rain-fed cropland or forested land) and soil quality, such as toxicity or nutrient utilization.

Digital Typhoon [327] is an image-based dataset for long-term spatio-temporal modeling of tropical cyclones. It is created from the comprehensive satellite image archive of the Japanese geostationary satellite series Himawari, from Himawari-1 to Himawari-9, and consists of 1,099 typhoons and 189,364 images. Geographically, it covers the complete record of typhoons occurring in the Northwestern Pacific region, with a time span from 1978 to 2022, a temporal resolution of one hour, and a spatial resolution of 5 km.

EarthNet2021 [332] is a large dataset for Earth surface prediction, extreme summer prediction, and seasonal cycle prediction. It contains more than 32,000 samples of Sentinel-2 (a high-temporal- and high-spatial-resolution Earth observation satellite) Level-2A imagery, as well as daily climate data derived from the E-OBS observational dataset, which contains interpolated ground-truth weather observations from multiple stations across Europe for the full year 2018.

ClimateNet [301] is an open, expert-labeled dataset designed for high-precision analyses of extreme weather events. It focuses on capturing tropical cyclones and atmospheric rivers in high-resolution climate model outputs, simulating the recent historical period from 1996 to 2010. This dataset is valuable for various applications in machine learning and climate research, such as transfer learning, curriculum learning, active learning, spatiotemporal segmentation, probabilistic segmentation, and hypothesis testing.

IowaRain [328] is primarily derived from the Iowa Flood Center's Quantitative Precipitation Estimation System (QPES), based on the National Weather Service's weather detection radar network. It covers the region of Iowa and spans the period from 2016 through the end of 2019. Each event in the dataset includes a collection of 2D rainfall-rate maps, along with the size of the event (i.e., the number of rainfall-rate maps in the set) and its start date. This dataset is specifically designed for predicting regional rainfall events.

ExtremeWeather [302] is a comprehensive dataset that aims to facilitate the detection, localization, and understanding of extreme weather events. It is based on post-processed simulations of CAM5, a widely used 3D atmospheric model for global climate simulations, and provides a spatial resolution of 25 km. Each snapshot of the global atmospheric state is represented as a 768 × 1152 grid, with 16 simulated climate variables including surface temperature, surface pressure, precipitation, latitudinal and meridional winds, humidity, cloud fraction, and water vapor. The dataset covers a time span from 1979 to 2005, with a temporal resolution of 3 hours, and consists of a total of 78,840 samples capturing four types of extreme weather events: Tropical Depression (TD), Tropical Cyclone (TC), Extratropical Cyclone (ETC), and Atmospheric River (AR). The center of each storm is taken as the reference point for marking the bounding box coordinates. Notably, the dataset includes 39,420 labeled images, providing valuable annotations for training and analysis purposes.

KoMet [333] is a collection of data gathered specifically in Korea. It utilizes input data from GDAPS-KIM, a global numerical weather prediction model that offers hourly forecasts for various atmospheric variables. The dataset focuses on precipitation prediction and has a spatial resolution of 12 × 12 kilometers, resulting in a spatial size of 65 × 50. It includes two types of variables, pressure-level variables and surface variables, which provide valuable information for predicting and understanding precipitation patterns in Korea. In terms of sample distribution, approximately 87.24% of the samples are classified as "no rain," indicating instances where precipitation is not observed; around 11.57% correspond to rainy conditions, while 1.19% represent extreme rainfall events.

Germany [334] is a precipitation forecasting dataset collected in West Germany, spanning the period from 2011 to 2018. The input data are derived from the COSMO-DE-EPS forecast, which provides 143 variables representing different atmospheric states, on a 36 × 36 grid representing the atmospheric conditions. The output data, representing the precipitation forecasts, have a higher resolution of 72 × 72. In terms of sample distribution, approximately 85.10% of the samples are classified as "no rain," indicating instances where precipitation is not observed; around 13.80% correspond to rainy conditions, while 1.10% represent extreme rainfall events.

China [335] is a precipitation forecasting dataset collected in China, providing 1 km × 1 km grid-point precipitation data at 3-hour intervals for the rainy season. It spans April through October of the 2020 and 2021 seasons. In addition, it includes 3-hour lead-time projections from a regional NWP model, comprising 28 surface and pressure-level variables such as 2-meter temperature, 2-meter dew point temperature, 10-meter u and v wind components, and CAPE (Convective Available Potential Energy) values. Each time frame covers a sizable spatial region with a grid size of 430 × 815.

China-Precipitation/Temperature [326] is a high-spatial-resolution monthly precipitation and temperature dataset for China, covering the period from 1901 to 2017. The dataset includes monthly minimum, maximum, and mean temperatures, as well as precipitation data, at a spatial resolution of 0.5 arcminutes (approximately 1 kilometer) for the mainland area of China. It was downscaled using the Delta spatial downscaling method from the 30-arcminute Climatic Research Unit (CRU) time series dataset and the WorldClim climatology dataset, and was evaluated using observations collected from 496 weather stations across China during the period from 1951 to 2016.

ClimART [221] is a dataset for emulating atmospheric radiative transfer in weather and climate models, with more than 10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model. The global snapshots of the atmospheric state from CanESM5 were simulated every 205 hours from 1979 to 2014. CanESM5 has a horizontal grid discretizing longitude into 128 equally sized columns and latitude into 64 columns using a Gaussian grid (8,192 = 128 × 64 columns). This results in 43 global snapshots per year for the period 1979-2014, totaling over 12 million columns and a raw dataset size of 1.5 TB.

MeteoNet [336] is a multimodal dataset for regional precipitation nowcasting covering a geographical area of 550 × 550 km in the northwestern quarter of France, spanning the years 2016 to 2018. Its modalities include radar echo observations, Earth-observing satellite imagery, ground station observations, weather forecast model data, and topographic maps. The ground observation data have a temporal resolution of six minutes and include meteorological variables such as temperature, humidity, atmospheric pressure, and wind speed measured by 500 ground stations. The radar echoes are precipitation radar records with a five-minute time resolution, i.e., 12 frames per hour, including radar reflectivity and rainfall estimates. The satellite data are recorded every 15 minutes for Cloud Type (CT) and every hour for the visible and infrared channels. Forecasts from two weather models with 2D parameters, generated once a day, are also included.

RAIN-F [345] is a pre-processed, spatio-temporally aligned multimodal dataset for short-term rainfall forecasting, which includes radar, ground-based observations, and a variety of satellite data for the time period from 2017 to 2019, covering the Korean Peninsula.
Specifically, nine different atmospheric state variables (one radar, seven ground observations, and one satellite) associated with precipitation are included, with a temporal resolution of one hour. The ground-based observations include wind direction and speed, humidity, surface pressure, temperature, sea-level pressure, and precipitation.

RAIN-F+ [337] is a new version of RAIN-F with additional atmospheric variables and TB products. It can also be used to retrieve atmospheric variables from satellite observations or to predict the atmospheric state and precipitation, with geographic and temporal coverage identical to that of RAIN-F.

ENS-10 [325] is a post-processing dataset for ensemble weather forecasting, consisting of 10 ensemble members spanning 20 years (1998-2017). These ensemble members are generated by perturbing numerical weather simulations to capture the chaotic behavior of the Earth system. To represent the three-dimensional state of the atmosphere, ENS-10 provides 11 atmospheric variables at 11 different pressure levels, plus the most relevant surface variables, at a resolution of 0.5 degrees. The dataset includes forecast lead times of T = 0, 24, and 48 hours (two data points per week).

SEVIR [86] is a collection of temporally and spatially aligned image sequences depicting weather events captured over the contiguous US (CONUS) by the GOES-16 satellite and the mosaic of NEXRAD radars. Five image data types are included: the GOES-16 0.6 µm visible satellite channel (vis), the 6.9 µm and 10.7 µm infrared channels (ir069, ir107), a radar mosaic of vertically integrated liquid (vil), and total lightning flashes collected by the GOES-16 geostationary lightning mapper (GLM) (lght). The spatial resolutions are 0.5 km, 2 km, 2 km, 1 km, and 8 km; the temporal resolution is 5 minutes (except for lightning events); and the image coverages are 768×768, 192×192, 192×192, and 384×384, corresponding to 1,403, 13,552, 13,541, 20,393, and 15,115 meteorological events, respectively. The dataset can be applied to weather prediction, image-to-image translation, extreme weather detection, weather annotation, super-resolution, and other applications.

SRAD2018 is a precipitation nowcasting dataset composed of a series of radar echo images, from the Tianchi IEEE International Conference on Data Mining (ICDM) 2018 Global Artificial Intelligence Challenge on Meteorology, collected by the Shenzhen Meteorological Bureau and the Hong Kong Observatory. Each sequence in the dataset covers a 501 × 501 km region at 1 km × 1 km spatial resolution; the temporal resolution is 6 minutes, a complete sequence spans 6 hours, and observations are taken from an altitude of 3 km.

RainBench [338] is a precipitation forecasting dataset consisting of European Centre for Medium-Range Weather Forecasts simulated satellite data (SimSat), the ERA5 reanalysis product, and Integrated Multi-satellitE Retrievals (IMERG) global precipitation estimates. All data are converted from their original resolution to a 5.625° resolution using bilinear interpolation. The time span is 2000 to 2017 and the temporal resolution is 1 hour.

KnowAir [329] is a weather forecasting dataset based on station observations from 184 meteorological stations in northern China. The dataset covers the time span from 2015 to 2018, with a temporal resolution of three hours, and primarily includes 18 weather features.

Weather2K [87] is a large-scale dataset for weather prediction based on station observation data, extracted from 1,866 ground-based meteorological stations throughout China and covering an area of 6 million square kilometers. Each station provides 23 features: three static variables representing geographic information and 20 interacting meteorological variables. The temporal coverage runs from January 1, 2017 to August 31, 2021, with a temporal resolution of one hour.

NASA [24] is a collection of regional weather forecasting datasets consisting of three subsets, AvePRE, SurTEMP, and SurUPS, spanning Apr 1, 2012 to Feb 28, 2016, Jan 3, 2019 to May 2, 2022, and Jan 2, 2019 to Jul 29, 2022, respectively, all with one-hour temporal resolution, collected from 88, 525, and 238 stations, respectively.

LSDSSIMR [300] is a large-scale dust storm database used for extreme weather and sandstorm prediction. The data are sourced from multi-channel and dust-label data of the Fengyun-4A (FY-4A) geostationary orbit satellite, as well as Earth system reanalysis data. The dataset covers March to May of each year from 2020 to 2022, with a temporal resolution of 15 minutes and a spatial resolution of 4 kilometers. Meteorological reanalysis data are incorporated into LSDSSIMR for spatio-temporal prediction methods. Each data file is stored in HDF5 format, and the final LSDSSIMR consists of nearly 5,400 HDF5 files.

RainNet [330] is a large-scale dataset specifically designed for spatial downscaling of precipitation. It contains data from 85 months, or 62,424 hours, resulting in a total of 62,424 pairs of high-resolution and low-resolution precipitation maps. The high-resolution precipitation maps have a size of 624×999, while the low-resolution maps have a size of 208×333. The data encompass various meteorological phenomena and precipitation conditions such as hurricanes and squall lines. The precipitation map pairs in RainNet are stored in HDF5 files, occupying a total of 360 GB of disk space. The data are collected from satellites, radars, and rain gauge stations, covering the inherent working characteristics of different meteorological measurement systems.

Continental United States Wind Speeds [331] is a climate downscaling (super-resolution) dataset obtained from the National Renewable Energy Laboratory's (NREL's) Wind Integration National Database (WIND) Toolkit, with a focus on the continental United States. Wind velocity data comprise westward (ua) and southward (va) wind components, calculated from wind speeds and directions 100 m above Earth's surface. The WIND Toolkit has a 2 km × 1 hr spatiotemporal resolution. The dataset contains data sampled at a 4-hourly temporal resolution for the years 2007 to 2013, and the sample test dataset contains data sampled at a 4-hourly temporal resolution for 2014. The 2D data arrays of wind speed and direction are transformed into the corresponding ua and va wind components, which are chipped into 100×100 patches. Low-resolution imagery is obtained by sampling the high-resolution data at every fifth data point, as instructed by NREL's guidelines.

Continental United States Solar Irradiance [331] is a climate downscaling (super-resolution) dataset obtained from the National Renewable Energy Laboratory's (NREL's) National Solar Radiation Database (NSRDB), with a focus on the continental United States. It considers solar irradiance data from the NSRDB in terms of direct normal irradiance (DNI) and diffuse horizontal irradiance (DHI) at an approximately 4-km × 1/2-hr spatiotemporal resolution. The solar dataset samples data at an hourly temporal resolution from 6 am to 6 pm for the years 2007 to 2013; the test dataset contains data points sampled from 2014. A 1D array of data points is provided along with latitude and longitude metadata for each point, and this 1D array is rearranged into a 2D image based on the lat/long metadata. The resulting 2D arrays of DNI and DHI are chipped into 100 × 100 patches. Low-resolution imagery is obtained by sampling the high-resolution data at every fifth data point.

8.1.2 Weather and Climate Text Data
This subsection focuses on weather text datasets, which are more thematically oriented towards climate-change-related policy statements as well as document texts.

CLIMATE-FEVER [339] is a dataset adopting the FEVER methodology that consists of 1,535 real-world claims regarding climate change. Each claim is accompanied by five manually annotated evidence sentences retrieved from Wikipedia that support, refute, or do not give enough information to validate the claim. The total dataset thus contains 7,675 claim-evidence pairs. Furthermore, the dataset features challenging claims relating to multiple facets, as well as disputed cases where both supporting and refuting evidence are present.

ClimateBERT-NetZero [198] is an expert-annotated dataset from the Net Zero Tracker project that assesses targets for reduction and net-zero emissions or similar aims (e.g., zero carbon, climate neutral, or net negative). The dataset contains 273 claims by cities, 1,396 claims by companies, 205 claims by countries, and 159 claims by regions.

ClimaText [205] is a dataset for climate change topic detection consisting of labeled sentences. The label, generated heuristically or via a manual process, indicates whether a sentence talks about climate change or not. All sentences are collected from Wikipedia and U.S. Securities and Exchange Commission (SEC) 10-K filings. From Wikipedia, 6,885 documents were collected, 715 relevant to climate change and 6,170 not relevant.

CLIMA-INS [340] contains surveys from annual NAIC Climate Risk Disclosure Survey responses for the years 2012-2021. The purpose of the survey is to enhance transparency about how insurers manage climate-related risks and opportunities, enabling better-informed collaboration on climate-related issues; each survey consists of eight questions.

CLIMA-CDP [340] is composed of three subsets, where each subset is a set of questionnaires filled out by a city, company, or state, respectively. The dataset supports topic classification and question classification. The numbers of samples in the train, development, and test splits for topic classification are 46.8K, 8.7K, and 8.9K, respectively. In addition, the numbers of samples in the train, development, and test splits for the question answering task are 48.2K (8.7K for states, 34.5K for corporations), 8.5K (0.9K for states, 34.5K for corporations), and 9.3K (1.1K for states, 4.9K for corporations), respectively. The number of classes for the topic classification task is 12; for question answering it is 294, 132,

CLIMATESTANCE & CLIMATEENG [341] is a ternary classification dataset of climate-related text, extracted from Twitter and consisting of 3,777 tweets posted during the 2019 United Nations Framework Convention on Climate Change. Each tweet was labelled for two tasks: stance detection and categorical classification. For stance detection, the authors labelled each tweet as In Favour, Against, or Ambiguous towards climate change prevention. For categorical classification, the five classes are Disaster, Ocean/Water, Agriculture/Forestry, Politics, and General.

SCIDCC [342] is curated by scraping news articles from the Science Daily website [342]. It contains around 11k news articles with 20 labelled categories relevant to climate change, such as Earthquakes, Pollution, and Hurricanes. Each article comprises a title, a summary, and a body, which on average is much longer (500-600 words) than in other climate text datasets.

8.2 Tools and Models
In this subsection, we collect and compile a rich and usable set of tools and foundation models for modeling weather and climate data.
• OpenCastKit: A global AI weather forecasting project based on FourCastNet and GraphCast. https://github.com/HFAiLab/OpenCastKit
• GraphCast: A foundation model for medium-range global weather forecasting. https://github.com/google-deepmind/graphcast
• FourCastNet: A foundation model for weather and climate data based on AFNO. https://github.com/NVlabs/FourCastNet
• PanGu-Weather: A foundation model for medium-range global weather forecasting. https://github.com/198808xc/Pangu-Weather
• FuXi: A forecasting system for 15-day global weather forecasts. https://github.com/tpys/FuXi
• W-MAE: An unsupervised global weather forecasting model based on masked autoencoders. https://github.com/Gufrannn/W-MAE
• ClimaX: A versatile climate foundation model covering forecasting, projection, and downscaling. https://github.com/microsoft/ClimaX
• OceanGPT: A large language model for ocean science tasks trained with KnowLM. https://huggingface.co/zjunlp/OceanGPT-7b
• ClimateBert: A model for analyzing climate-risk disclosures along the four main TCFD categories. https://huggingface.co/climatebert
• Climate X Quantus: An XAI toolbox for ML/DL-based climate models. https://github.com/philine-bommer/Climate_X_Quantus

9 CHALLENGES, OUTLOOK, AND OPPORTUNITIES
The potential pitfalls of AI foundation models for weather and climate (WFMs) data understanding manifest in a large number of pending challenges to which data-driven models are more susceptible than traditional NWP models. In this section, we identify five main challenge areas and
and 43 respectively. suggest some best practices that should be recognised and
25
implemented in future research, as well as pointing out data [86], [314], [337], these models often exhibit limitations.
research opportunities and routes that hold great promise They are typically confined to specific geographic regions
for the future. and struggle to accommodate the extensive spectrum of
meteorological modes. A salient challenge in constructing
multimodal climate foundation models lies in enabling
9.1 Post-Processing of Data
these models to learn joint representations that encapsulate
For DL models, the quality of the data is paramount. How- the sequential nature of temporal data and the unique traits
ever, numerous challenges associated with data pose threats of other meteorological modes.
to the development of expansive foundation models for This challenge encompasses understanding and accom-
weather and climate data understanding, including issues modating the disparate temporal and spatial resolutions
related to data quality and quantity, post-processing costs, across modes. For instance, meteorological observations
scarcity of historical data, non-stationarity, and the under- may have an hourly temporal resolution, radar echo data
utilization of existing datasets. might possess a six-minute temporal resolution and 1-4 km
• Data Quality and Quantity. Large-scale foundation spatial resolution, and satellite images could exhibit half-
models require comprehensive and high-quality data hourly temporal resolution and a 5-12 km spatial resolution.
for robust results. Despite the exponential increase in The task of leveraging information with different temporal
global climate data [73], like ERA5 and CMIP [88], and spatial resolutions to construct a robust and powerful
general-purpose datasets that are both large-scale and climate foundation model is complex. Furthermore, it is a
high-quality are seldom available. challenge to balance and align multimodal information col-
• Post-processing Costs. Large models, such as PANGU - lected at different time points to achieve more precise fixed-
W EATHER [63] and C LIMA X [25], often necessitate point prediction and analysis. As such, the development of
costly post-processing for scenario-specific analyses. models that can effectively integrate and learn from these
The analysis of extreme events, for example, presents diverse data sources remains a challenging but important
a unique challenge. These rare events, which are in- frontier in the field of weather and climate analysis.
creasingly likely in a non-stationary climate, are often
characterized by outliers in climate variables. Their de- 9.3 Interpretability and Causability
velopment involves physical processes that span time A significant challenge associated with the use of AI models
cycles from weeks to years, complicating the creation for weather and climate analysis is the often inscrutable
of fine-grained annotations [302]. nature of the model’s decision-making process. Many DL
• Underutilization of Existing Datasets. Large-scale algorithms are inherently complex and opaque, rendering
datasets, despite their size, remain underdeveloped their decision-making processes unintelligible to users [65],
due to the enormous post-processing costs. Benchmark [66]. For applications such as machine translation and text
datasets like WeatherBench [88], WeatherBench2 [89], generation, the interpretability may not be a key concern.
OceanBench [346], and ClimateLearn [90], which con- In these contexts, it is typically sufficient for the model to
tain post-processed data, are still in early stages of display competent performance to meet most requirements.
development due to limited data scenarios. However, in weather and cliamte applications, the inter-
The creation of general WFMs hinges upon the avail- pretability of the model is of paramount importance.
ability of rich, large-scale, post-processed datasets. There is Non-transparent, black-box models can precipitate catas-
substantial scope for deeper analyses and post-processing of trophic errors in predictions, which could have devastating
these datasets, including understanding anomalous weather impacts on society and the environment. To mitigate this
events, integrating physical models, and efficient, rational interpretability challenge, tools rooted in the concept of Ex-
annotations. Overcoming these challenges is key to realizing plainable AI (XAI) have been proposed, such as XAITools4 ,
the full potential of climate foundation models. InterpretML5 , SHAP6 , LIME7 , and AI Explainability 3608 ,
etc. These tools aim to bring increased transparency and
9.2 Development of Multi-Modal Models trustworthiness to black-box models, including those used
in various fields such as Earth sciences [347] (Climate X
Time series data are often enriched with supplementary Quantus9 ), and offer new insights for refining models that
information, including textual descriptions. This is particu- underperform. However, these interpretability tools are not
larly beneficial in economics and finance, where forecasting without their shortcomings and can exhibit significant bi-
can harness information from textual data sources such as ases. In some cases, the truthful representation of the model
news articles or tweets, in conjunction with digital economic may depend more on the specifics of the application and its
time series data [110]. Analogously, weather and climate settings, which can render the results difficult to interpret.
analysis can profit from the diverse modalities encompassed This suggests that the interpretability insights of climate AI
in climate data, which include reanalysis data, multimodal are influenced more by the network’s architecture than by
observation data (e.g, radar echoes [84], [107], satellite im- the causal inference of weather and climate data.
agery [300], and geographic terrain features [314], etc.). The
development of models capable of integrating and learning 4. https://github.com/IntelAI/intel-xai-tools
5. https://github.com/interpretml/interpret
from this rich array of data modalities has the potential to
6. https://github.com/shap/shap
enhance predictive accuracy. However, while efforts have 7. https://github.com/marcotcr/lime
been made to develop weather prediction and meteorolog- 8. https://github.com/Trusted-AI/AIX360
ical analysis models based on multimodal meteorological 9. https://github.com/philine-bommer/Climate X Quantus
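Although each of these toolkits exposes its own API, the model-agnostic attribution idea that several of them build on can be illustrated with a minimal permutation-importance sketch. Everything below is a hypothetical toy in plain NumPy: the "forecast model" and its predictors are invented for illustration, and the code mirrors the concept rather than the API of any listed tool.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box forecaster: next-hour temperature anomaly from
# three predictors [pressure, humidity, wind]; wind is deliberately ignored.
def forecast(X):
    return 0.7 * X[:, 0] + 0.3 * X[:, 1]

X = rng.normal(size=(500, 3))
y = forecast(X)  # toy setup: ground truth generated by the model itself

def permutation_importance(model, X, y):
    """Increase in MSE when one feature is shuffled: a crude attribution."""
    base = np.mean((model(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # destroy feature j only
        scores.append(np.mean((model(Xp) - y) ** 2) - base)
    return np.array(scores)

imp = permutation_importance(forecast, X, y)
# The ignored wind predictor receives (near-)zero attribution.
```

Such an explanation is only as trustworthy as the setup around it, echoing the biases discussed above: shuffling breaks correlations between predictors, so strongly coupled meteorological variables can receive misleading scores.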
Unless appropriately designed, weather/climate AI may base predictions on non-physical relationships or false correlations. This limitation in drawing causal conclusions from climate models using XAI tools reflects the limited causality these tools can provide. Physics-guided AI, also known as knowledge-guided or physics-informed AI, is one avenue researchers are exploring to impose physical realism and mitigate the effects of false correlations on predictive algorithms [244], [246], [248], [294]. However, research in this area is still nascent. Thus, while strides have been made in enhancing the interpretability of AI models, substantial challenges remain, highlighting the need for continued research and development in this critical area.

9.4 Generalizability of Models

The generalization capability of a model refers to its competence in making effective predictions beyond the spatiotemporal confines of its training dataset. Many DL techniques operate on the assumption of independent and identically distributed (IID) training and test data [66]. This implies that the weights calculated during model training remain efficacious even on unseen datasets. However, when applied to weather and climate analysis, foundation models may exhibit suboptimal performance when predicting non-IID data beyond the training dataset. A notable example is the use of foundation models for the prediction of extreme events outside their trained distribution. Such biased and anomalous data often induce significant performance degradation in the model. This is especially the case as the warming climate alters the Earth's spatiotemporal distribution: the relationships that currently describe the predictive variables and extreme climate events may no longer apply in the future.

Moreover, climate foundation models are typically pre-trained on general data before being fine-tuned on specific task datasets [25]. If the fine-tuning data includes adversarial or noisy examples, the process may introduce vulnerabilities. If the temporal data employed for fine-tuning is not meticulously managed, the model may adopt biases or flaws from this data, leading to compromised robustness in practical applications and unreliable outputs. This underscores the imperative for robust generalization. The advent of physics-informed deep learning represents a promising step towards enhancing the robust generalization of climate models. However, the extension of these models beyond their trained distribution remains an area that is not yet fully explored. This highlights the need for continued research into the generalization capabilities of climate models, particularly in light of the rapidly changing climate and the ever-evolving challenges it presents.

9.5 Privacy, Adversarial Attacks, and Communication

Weather and climate data are often highly sensitive, encapsulating a wealth of climate variables, geographical information, and topographical details dispersed across various regions/countries [24], [154]. In particular, radar and satellite data are highly sensitive. The training of WFMs using such data poses significant challenges in terms of centralized training, privacy leaks, and adversarial attacks.
• Centralized Training Issues. Models typically undergo pre-training with substantial data before being deployed to various downstream tasks [25], [63], [136], [137], [138]. However, the centralized training strategy can be fraught with problems. Aggregating sensitive data from different regions or countries onto a central server is neither reliable nor practical due to the inherent risks of data leakage and contamination [24].
• Privacy Leaks and Adversarial Attacks. During the fine-tuning process, WFMs often memorize specific details from the datasets, which can potentially compromise private data. Contaminated data also pose a risk of deteriorating model performance. Therefore, the adoption of privacy-preserving techniques to prevent privacy leaks and mitigate adversarial attacks is crucial in the training/fine-tuning of WFMs.

Recent studies have introduced the use of differential privacy (DP) techniques or federated learning (FL) to train WFMs [24], [154], effectively lessening the risk of sensitive climate data leakage [348]. However, these methods are still confronted with communication challenges.
• Communication Overheads in Federated Learning. Federated learning allows different clients to collaboratively train a global model, with each client maintaining a locally replicated model with a consistent structure. During the global aggregation stage, each participant uploads their local model parameters to a cloud server for aggregation. This process results in a significant increase in communication overhead between clients and the server due to the large-scale nature of climate models, posing a serious challenge to computational and hardware costs.

9.6 Continuous Learning and On-device Adaptation

The performance of WFMs, despite showing promising results, can be substantially improved through the application of continuous learning and on-device adaptation. Continual learning [349], also referred to as lifelong or incremental learning, is the process of updating a model over time as new data emerges. Given the ever-evolving nature of climate and weather patterns due to natural variability and anthropogenic climate change, this approach proves particularly beneficial. It enables models to adapt to these changes, enhancing their predictive accuracy and robustness. On-device adaptation [75] involves the customization of a model based on local data at the point of deployment. It has the potential to boost model performance by enabling adjustments to local climate and weather patterns, which may not be comprehensively captured in global training data. Furthermore, on-device adaptation can minimize the requirement for data transmission, thereby enhancing model efficiency and preserving privacy. However, the implementation of continuous learning and on-device adaptation in models poses several challenges. These include ensuring model stability during continual learning and managing the computational and storage constraints of on-device learning:
• Maintaining Model Stability. Models undergoing continual learning can experience a phenomenon known as "catastrophic forgetting," where a model may forget previously learned patterns after being updated with new data. Balancing the maintenance of model stability while still allowing it to learn from new data poses a significant challenge.
• Managing Computational and Storage Constraints. The computational power and storage capacity of a device inherently limit on-device machine learning. Deploying and updating large climate models on devices with limited resources may prove difficult. Techniques for model compression, efficient computation, and selective model updating are essential to make on-device adaptation of models feasible.

Despite these obstacles, continual learning and on-device adaptation present a promising avenue for enhancing the performance of climate models.

9.7 Reproducibility

Reproducibility stands as a cornerstone principle in the realm of scientific research. The capacity to reproduce results using identical data and methodologies not only reinforces the validity of the findings but also propels further research and innovation. Nevertheless, the pursuit of reproducibility in climate foundation models poses several formidable challenges:
• Data Availability and Consistency. Climate foundation models frequently utilise extensive datasets gathered from a plethora of sources over extended periods. The challenge lies in ensuring the availability and consistency of this data for model reproduction. Data may undergo updates or corrections, and access permissions can fluctuate, thereby adding complexity to reproducibility endeavours.
• Model Complexity. Climate foundation models often incorporate sophisticated machine learning architectures, intricate pre-processing steps, and advanced training procedures. Reproducing these models necessitates a comprehensive understanding of all these facets. If any segment of the process is inadequately documented or if specific implementation details are proprietary, model reproduction can become an insurmountable task.
• Computational Resources. Climate foundation models typically demand substantial computational resources for training and inference. Reproduction of these models may be prohibitively costly or technically challenging for researchers lacking comparable resources. This disparity can erect barriers to reproducibility and impede the progress of the broader research community.
• Non-Determinism in Training. Several training processes involve elements of randomness, such as random initialization of weights, shuffling of training data, and stochastic optimization methods. These factors can yield slightly divergent models and results, even when employing the same data and model architecture. Ensuring reproducibility amidst such non-determinism can prove challenging.
• Model Versioning. As climate foundation models evolve, new model versions are developed. It is crucial to maintain a record of model versions and align them with the specific results they generated for reproducibility. However, this can become complex and arduous to manage, particularly in large collaborative projects.

Addressing these challenges necessitates a united effort from the entire research community. This includes the establishment of standards for data management and model documentation, investment in open-source software and infrastructure, and the cultivation of a research culture underscored by transparency and openness. While these issues are complex, resolving them is paramount to the advancement of climate foundation model research and ensuring its benefits are widely disseminated.

10 INSIGHT FOR FOUNDATION MODEL DESIGNING

This section presents an intricate examination of the design principles that underlie current state-of-the-art (SOTA) WFMs. Its intent is to offer an exhaustive guide and insights for the development of resilient, multipurpose climate foundation models. Five perspectives are covered in this discourse: functional design, fusion of multi-source data, data representation, design of network architecture, and strategy for pre-training/fine-tuning.

10.1 One Fits All

Establishing foundation models necessitates a judicious selection of tasks, which influences the data employed, the training strategies deployed, the fine-tuning methodologies adopted, and other associated factors. Foundation models are often viewed as a panacea: pre-trained models that are subsequently fine-tuned for various application-specific tasks. Premier climate foundation models, such as FengWu [138] and Pangu-Weather [63], prioritize systematic modeling of the Earth system, encompassing the prediction of terrestrial and atmospheric climate variables at distinct spatio-temporal scales. Conventionally, these models are trained using data at the spatial resolution of the widely accepted ERA5 dataset. In contrast, ClimaX [25] adopts an alternate approach, pre-training the foundation model at a coarser resolution and later achieving finer spatial resolution predictions, mappings, or down-sampling via fine-tuning. Thus, the primary technical strategy for the development of WFMs involves pre-training the models with extensive high-resolution data and then fine-tuning them with minimal effort to demonstrate exceptional performance across a range of downstream tasks.

10.2 Multi-source Data Fusion

Weather and climate data primarily fall into spatio-temporal series. Our discussion primarily revolves around spatio-temporal sequence tasks, as delineated in Sec. 8. Due to the variety of data sources, including but not limited to ground stations, remote sensing devices, and simulation-based climate products, the fusion of information from multiple data sources can explicitly benefit the training process of the foundation model and thus lead to improved performance. However, significant modal differences and heterogeneity among data complicate the realization of multi-source fusion operations. We present here insights into this from two main aspects: spatio-temporal scales and data modality.
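As a toy illustration of the temporal alignment that any such multi-source fusion pipeline needs, the sketch below resamples two sources with different temporal resolutions onto one common grid before stacking them into a single multimodal input. The resolutions mirror those mentioned in Sec. 9.2; the data and the simple mean/hold resampling are purely hypothetical, not any particular system's pipeline.

```python
import numpy as np

# One day of toy data at two of the resolutions discussed earlier:
radar = np.arange(240, dtype=float)    # radar echoes, 6-min steps (240/day)
station = np.arange(24, dtype=float)   # station observations, hourly (24/day)

# Target: a common half-hourly grid with 48 slots per day.
radar_30min = radar.reshape(48, 5).mean(axis=1)  # aggregate 5 x 6-min steps
station_30min = station[np.arange(48) // 2]      # hold each hourly value twice

# Channels aligned in time can now be fused into one model input.
fused = np.stack([radar_30min, station_30min], axis=1)
print(fused.shape)  # (48, 2)
```

A real pipeline would also interpolate spatially and handle missing scans and sensor noise; the point here is only that every modality ends up indexed on one shared temporal grid before fusion.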
• Spatio-Temporal Scales. Practitioners can implement weather and climate models on a global scale by considering data at multiple spatio-temporal scales simultaneously, most commonly with reanalysis data (see Sec. 8), fusing high- and low-resolution data to model both fine- and coarse-grained global features.
• Data Modality. The modalities of weather and climate data mainly comprise time series10 and text. Fusion of multi-source data for foundation model training for weather and climate can be encouraged to capture interrelated knowledge from different scales and data modalities. Examples include precipitation nowcasting and the fusion of multiple variables at different pressure levels for robust global forecasting models. Practitioners can explore simultaneous or staged fusion of multimodal weather data to benefit the foundation model.

10.3 Data Representation and Model Design

The robust development of WFMs is contingent on effective interpretation and representation of weather and climate statistics. This process typically involves two stages: initial construction of data representations through pre-training, and application of this representational knowledge to downstream tasks via fine-tuning. Unique representation methods are required given that each data point encodes complex contextual information, unlike the features of natural images. The subsequent discourse seeks to address two pivotal questions in this domain: (1) Which network architectures can effectively represent weather and climate data? and (2) What strategies can improve models and facilitate efficient and accurate representations?

10.3.1 Which network architectures can effectively represent weather and climate data?

Reanalysis weather and climate datasets bear significant resemblances to natural images, most notably using grid cells to delineate local semantic information. Consequently, almost all network architectures employed in computer vision can be utilised for processing weather and climate grid data, including but not limited to ResNet, U-Net, Vision Transformer, generative adversarial networks (GANs), and

10.3.2 What strategies can enhance models and facilitate efficient and accurate representations?

Accurate representation of the latent semantic information in weather and climate data hinges on jointly modeling the temporal, spatial, and variable dimensions of the data. Potential strategies to enhance these models include tokenization strategies, positional encoding, attention mechanisms, and time feature extraction. Here we discuss these four strategies in detail:
• Tokenization Strategy. The term "token" originated in the context of Transformers, where a critical operation is dividing the original input image into small blocks of local semantic information based on a patch size - a process referred to as tokenization. For irregular reanalysis gridded weather and climate data, the absence of specific rules or definitions for segmentation implies that the choice of tokenization significantly impacts model performance. For instance, ClimaX introduces a coherent tokenization operation [25], while Pangu-Weather [63], FuXi [139], and FengWu [138] use different methods for encoding variables. A good tokenization strategy should consider spatio-temporal correlations of different variables while accounting for different physical scales, without introducing excessive complexity.
• Positional Encoding. Positional encoding in a Transformer provides spatial information about data points in a sequence. For weather and climate data, different positional encoding strategies can be employed. Compared to fixed encodings, learnable encodings offer more flexibility, as their positional parameters can be updated to increase model robustness.
• Attention Mechanisms. Attention mechanisms are critical in Transformers for modeling dependencies between different elements in a sequence. For weather and climate data, attention mechanisms can help capture relationships between different time steps, geographical locations, and meteorological variables. The computational complexity of attention mechanisms also needs to be considered, as many models encounter high cost and reduced speeds during training and inference.
• Time Feature Extraction. Weather and climate data
• Self-Supervised Learning. Self-supervised learning (SSL) is an unsupervised paradigm wherein models are assigned the task of predicting certain components of their own input data. This approach generates labels intrinsically from the data, obviating the need for external annotations. As a result, SSL can exploit copious amounts of unlabeled data for training. Within the sphere of weather and climate modeling, SSL could be utilized to identify climate patterns and trends. For instance, future meteorological conditions could be forecasted using historical weather variables such as temperature, humidity, and wind velocity. This could be accomplished by projecting the subsequent data point within a pre-established temporal window. In so doing, the model can apprehend inherent weather data trends and patterns. The principal advantage of SSL lies in its capacity to harness vast quantities of unlabeled data for training. Furthermore, it can reveal inherent data patterns and structures, which is especially advantageous for weather and climate tasks [64].
• Semi-Supervised Learning. Semi-supervised learning (SML) represents an intermediate approach between fully supervised learning (FLSL) and self-supervised learning (SSL), leveraging both labeled and unlabeled data for model training. This method is particularly advantageous for weather and climate prediction tasks due to the potential scarcity of labeled weather data and the abundance of unlabeled data. One prevalent methodology in SML is self-training. Initially, a supervised model is trained using the available labeled data. Subsequently, this model is applied to predict labels for the unlabeled data, which are then employed as pseudo-labels for retraining the model. This iterative process continues until the model's performance plateaus. The salient advantage of SML is its capacity to concurrently utilize labeled and unlabeled data for training. This facilitates an enhancement in model performance, especially when labeled data is limited, by capitalizing on the extensive quantity of unlabeled data.
• Federated Learning. Federated learning [204] (FL) is a distributed ML paradigm with the central goal of enabling multiple participants to collaboratively train a model, all while safeguarding data privacy and security [350]. In FL, every participant trains their model locally and shares only model updates, rather than the raw data. This endows FL with a distinct advantage when dealing with sensitive data, while also permitting cross-learning from diverse data sources that might be geographically dispersed or unable to be centralized due to privacy or other reasons. In the context of training WFMs, the application of federated learning carries significant benefits. Firstly, meteorological bureaus and research institutions across the globe possess extensive climate data, but owing to data ownership, privacy, and security concerns, this data cannot easily be centralized for processing. Federated learning enables these institutions to collaboratively train a robust weather forecasting model without the direct sharing of data. Secondly, given the typically large scale of weather and climate data, data transfer could potentially become a bottleneck. With FL, data can be processed and models trained locally, requiring only the transfer of model updates, thus significantly reducing data transmission demands. Lastly, FL allows the model to benefit from climate data from different geographical locations and types, enhancing the model's generalization ability and accuracy. Currently, numerous studies have incorporated FL into the process of training WFMs [24], [154].

11 CONCLUSION

In conclusion, we present a comprehensive and up-to-date survey of data-driven models tailored to analyze weather and climate data. The intention is to offer a fresh viewpoint on this evolving discipline through a systematically organized appraisal of pertinent models. We distill the most salient methodologies within each category, investigate their respective advantages and drawbacks, and propose viable trajectories for forthcoming exploration. This survey is intended to act as an impetus to kindle sustained interest and nurture a persistent enthusiasm for research within the realm of data-driven models for weather and climate data understanding.

REFERENCES

[1] P. S. Fabian, H.-H. Kwon, M. Vithanage, and J.-H. Lee, "Modeling, challenges, and strategies for understanding impacts of climate extremes (droughts and floods) on water quality in asia: A review," Environmental Research, p. 115617, 2023.
[2] Y. Deng, X. Wang, T. Lu, H. Du, P. Ciais, and X. Lin, "Divergent seasonal responses of carbon fluxes to extreme droughts over china," Agricultural and Forest Meteorology, vol. 328, p. 109253, 2023.
[3] Z. Zhou, Y. Chen, M. C. Yam, K. Ke, and X. He, "Experimental investigation of a high strength steel frame with curved knee braces subjected to extreme earthquakes," Thin-Walled Structures, vol. 185, p. 110596, 2023.
[4] D. Barriopedro, R. García-Herrera, C. Ordóñez, D. Miralles, and S. Salcedo-Sanz, "Heat waves: Physical understanding and scientific challenges," Reviews of Geophysics, p. e2022RG000780, 2023.
[5] J. Zeng, G. Han, S. Zhang, X. Xiao, Y. Li, X. Gao, D. Wang, and R. Qu, "Response of dissolved organic carbon in rainwater during extreme rainfall period in megacity: Status, potential source, and deposition flux," Sustainable Cities and Society, vol. 88, p. 104299, 2023.
[6] M. P. Couldrey, J. M. Gregory, X. Dong, O. Garuba, H. Haak, A. Hu, W. J. Hurlin, J. Jin, J. Jungclaus, A. Köhl et al., "Greenhouse-gas forced changes in the atlantic meridional overturning circulation and related worldwide sea-level change," Climate Dynamics, vol. 60, no. 7-8, pp. 2003–2039, 2023.
[7] A. Raihan, "A review of the global climate change impacts, adaptation strategies, and mitigation options in the socio-economic and environmental sectors," Journal of Environmental Science and Economics, vol. 2, no. 3, pp. 36–58, 2023.
[8] S. Materia, L. P. García, C. van Straaten, A. Mamalakis, L. Cavicchia, D. Coumou, P. De Luca, M. Kretschmer, M. G. Donat et al., "Artificial intelligence for prediction of climate extremes: State of the art, challenges and future perspectives," arXiv preprint arXiv:2310.01944, 2023.
[9] J. R. Beddington, M. Asaduzzaman, A. Fernandez, M. E. Clark, M. Guillou, M. M. Jahn, L. Erda, T. Mamo, B. N. Van, C. A. Nobre et al., "Achieving food security in the face of climate change: Summary for policy makers from the commission on sustainable agriculture and climate change," 2011.
[10] R. Connor, The United Nations world water development report 2015: water for a sustainable world. UNESCO publishing, 2015, vol. 1.
[11] J. F. Kok, T. Storelvmo, V. A. Karydis, A. A. Adebiyi, N. M. Mahowald, A. T. Evan, C. He, and D. M. Leung, "Mineral dust aerosol impacts on global climate and climate change," Nature Reviews Earth & Environment, vol. 4, no. 2, pp. 71–86, 2023.
[12] P. Loh, Y. Twumasi, Z. Ning, M. Anokye, J. Oppong, R. Armah, C. Apraku, and J. Namwamba, “Analyzing the impact of sea level rise on coastal flooding and shoreline changes along the coast of louisiana using remote sensory imagery,” The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 48, pp. 139–145, 2023.
[13] L. Yu, W. Sun, H. Zhang, N. Cong, Y. Chen, J. Hu, and X. Jing, “Grazing exclusion jeopardizes plant biodiversity effect but enhances dryness effect on multifunctionality in arid grasslands,” Available at SSRN 4575743.
[14] I. M. Voskamp and F. H. Van de Ven, “Planning support system for climate adaptation: Composing effective sets of blue-green measures to reduce urban vulnerability to extreme weather events,” Building and Environment, vol. 83, pp. 159–167, 2015.
[15] H. R. Maier and G. C. Dandy, “Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications,” Environmental Modelling & Software, vol. 15, no. 1, pp. 101–124, 2000.
[16] S. A. Markolf, C. Hoehne, A. Fraser, M. V. Chester, and B. S. Underwood, “Transportation resilience to climate change and extreme weather events–beyond risk and robustness,” Transport Policy, vol. 74, pp. 174–186, 2019.
[17] M. J. Koetse and P. Rietveld, “The impact of climate change and weather on transport: An overview of empirical findings,” Transportation Research Part D: Transport and Environment, vol. 14, no. 3, pp. 205–221, 2009.
[18] K. Ravindra, P. Rattan, S. Mor, and A. N. Aggarwal, “Generalized additive models: Building evidence of air pollution, climate change and human health,” Environment International, vol. 132, p. 104987, 2019.
[19] P. Bauer, A. Thorpe, and G. Brunet, “The quiet revolution of numerical weather prediction,” Nature, vol. 525, no. 7567, pp. 47–55, 2015.
[20] J. Coiffier, Fundamentals of Numerical Weather Prediction. Cambridge University Press, 2011.
[21] R. Kimura, “Numerical weather prediction,” Journal of Wind Engineering and Industrial Aerodynamics, vol. 90, no. 12-15, pp. 1403–1414, 2002.
[22] T. N. Krishnamurti and L. Bounoua, An Introduction to Numerical Weather Prediction Techniques. CRC Press, 2018.
[23] D. Maraun, F. Wetterhall, A. Ireson, R. Chandler, E. Kendon, M. Widmann, S. Brienen, H. Rust, T. Sauter, M. Themeßl et al., “Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user,” Reviews of Geophysics, vol. 48, no. 3, 2010.
[24] S. Chen, G. Long, T. Shen, and J. Jiang, “Prompt federated learning for weather forecasting: Toward foundation models on meteorological data,” arXiv preprint arXiv:2301.09152, 2023.
[25] T. Nguyen, J. Brandstetter, A. Kapoor, J. K. Gupta, and A. Grover, “Climax: A foundation model for weather and climate,” arXiv preprint arXiv:2301.10343, 2023.
[26] M. G. Schultz, C. Betancourt, B. Gong, F. Kleinert, M. Langguth, L. H. Leufen, A. Mozaffari, and S. Stadtler, “Can deep learning beat numerical weather prediction?” Philosophical Transactions of the Royal Society A, vol. 379, no. 2194, p. 20200097, 2021.
[27] J. Wei, J. Jiang, H. Liu, F. Zhang, P. Lin, P. Wang, Y. Yu, X. Chi, L. Zhao, M. Ding et al., “Licom3-cuda: A gpu version of lasg/iap climate system ocean model version 3 based on cuda,” The Journal of Supercomputing, pp. 1–31, 2023.
[28] A. F. Prein, N. Ban, T. Ou, J. Tang, K. Sakaguchi, E. Collier, S. Jayanarayanan, L. Li, S. Sobolowski, X. Chen et al., “Towards ensemble-based kilometer-scale climate simulations over the third pole region,” Climate Dynamics, vol. 60, no. 11-12, pp. 4055–4081, 2023.
[29] V. L. T. de Souza, B. A. D. Marques, H. C. Batagelo, and J. P. Gois, “A review on generative adversarial networks for image generation,” Computers & Graphics, 2023.
[30] J. Willard, X. Jia, S. Xu, M. Steinbach, and V. Kumar, “Integrating physics-based modeling with machine learning: A survey,” arXiv preprint arXiv:2003.04919, vol. 1, no. 1, pp. 1–34, 2020.
[31] X. Ren, X. Li, K. Ren, J. Song, Z. Xu, K. Deng, and X. Wang, “Deep learning-based weather prediction: a survey,” Big Data Research, vol. 23, p. 100178, 2021.
[32] L. Yuan, D. Chen, Y.-L. Chen, N. Codella, X. Dai, J. Gao, H. Hu, X. Huang, B. Li, C. Li et al., “Florence: A new foundation model for computer vision,” arXiv preprint arXiv:2111.11432, 2021.
[33] I. Singh, V. Blukis, A. Mousavian, A. Goyal, D. Xu, J. Tremblay, D. Fox, J. Thomason, and A. Garg, “Progprompt: Generating situated robot task plans using large language models,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 11523–11530.
[34] S. Gilbert, H. Harvey, T. Melvin, E. Vollebregt, and P. Wicks, “Large language model ai chatbots require approval as medical devices,” Nature Medicine, pp. 1–3, 2023.
[35] A. J. Thirunavukarasu, D. S. J. Ting, K. Elangovan, L. Gutierrez, T. F. Tan, and D. S. W. Ting, “Large language models in medicine,” Nature Medicine, vol. 29, no. 8, pp. 1930–1940, 2023.
[36] S. Chen, S. Ren, G. Wang, M. Huang, and C. Xue, “Interpretable cnn-multilevel attention transformer for rapid recognition of pneumonia from chest x-ray images,” IEEE Journal of Biomedical and Health Informatics, 2023.
[37] K. Zhang and D. Liu, “Customized segment anything model for medical image segmentation,” arXiv preprint arXiv:2304.13785, 2023.
[38] J. Ma and B. Wang, “Segment anything in medical images,” arXiv preprint arXiv:2304.12306, 2023.
[39] H. Abburi, M. Suesserman, N. Pudota, B. Veeramani, E. Bowen, and S. Bhattacharya, “Generative ai text classification using ensemble llm approaches,” arXiv preprint arXiv:2309.07755, 2023.
[40] Y. Shi, H. Ma, W. Zhong, G. Mai, X. Li, T. Liu, and J. Huang, “Chatgraph: Interpretable text classification by converting chatgpt knowledge to graphs,” arXiv preprint arXiv:2305.03513, 2023.
[41] X. Sun, X. Li, J. Li, F. Wu, S. Guo, T. Zhang, and G. Wang, “Text classification via large language models,” arXiv preprint arXiv:2305.08377, 2023.
[42] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
[43] A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie, “Coco-text: Dataset and benchmark for text detection and recognition in natural images,” arXiv preprint arXiv:1601.07140, 2016.
[44] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[45] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023.
[46] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.
[47] L. Floridi and M. Chiriatti, “Gpt-3: Its nature, scope, limits, and consequences,” Minds and Machines, vol. 30, pp. 681–694, 2020.
[48] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[49] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al., “Language models are unsupervised multitask learners,” OpenAI Blog, vol. 1, no. 8, p. 9, 2019.
[50] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg et al., “Sparks of artificial general intelligence: Early experiments with gpt-4,” arXiv preprint arXiv:2303.12712, 2023.
[51] D. Zhu, J. Chen, X. Shen, X. Li, and M. Elhoseiny, “Minigpt-4: Enhancing vision-language understanding with advanced large language models,” arXiv preprint arXiv:2304.10592, 2023.
[52] Y. Gao, J. Liu, Z. Xu, J. Zhang, K. Li, R. Ji, and C. Shen, “Pyramidclip: Hierarchical feature alignment for vision-language model pretraining,” Advances in Neural Information Processing Systems, vol. 35, pp. 35959–35970, 2022.
[53] P. Zhang, X. Li, X. Hu, J. Yang, L. Zhang, L. Wang, Y. Choi, and J. Gao, “Vinvl: Revisiting visual representations in vision-language models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5579–5588.
[54] Z. Wang, Y. Lu, Q. Li, X. Tao, Y. Guo, M. Gong, and T. Liu, “Cris: Clip-driven referring image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11686–11695.
[55] K. Park, S. Woo, S. W. Oh, I. S. Kweon, and J.-Y. Lee, “Per-clip video object segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1352–1361.
[56] F. Liang, B. Wu, X. Dai, K. Li, Y. Zhao, H. Zhang, P. Zhang, P. Vajda, and D. Marculescu, “Open-vocabulary semantic segmentation with mask-adapted clip,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7061–7070.
[57] M. Tang, Z. Wang, Z. Liu, F. Rao, D. Li, and X. Li, “Clip4caption: Clip for video caption,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 4858–4862.
[58] Z. Zhang, Y. Chen, Z. Ma, Z. Qi, C. Yuan, B. Li, Y. Shan, and W. Hu, “Create: A benchmark for chinese short video retrieval and title generation,” arXiv preprint arXiv:2203.16763, 2022.
[59] S. Ling, Y. Hu, S. Qian, G. Ye, Y. Qian, Y. Gong, E. Lin, and M. Zeng, “Adapting large language model with speech for fully formatted end-to-end speech recognition,” arXiv preprint arXiv:2307.08234, 2023.
[60] Y. Zhang, W. Han, J. Qin, Y. Wang, A. Bapna, Z. Chen, N. Chen, B. Li, V. Axelrod, G. Wang et al., “Google usm: Scaling automatic speech recognition beyond 100 languages,” arXiv preprint arXiv:2303.01037, 2023.
[61] J. Holmes, Z. Liu, L. Zhang, Y. Ding, T. T. Sio, L. A. McGee, J. B. Ashman, X. Li, T. Liu, J. Shen et al., “Evaluating large language models on a highly-specialized topic, radiation oncology physics,” arXiv preprint arXiv:2304.01938, 2023.
[62] N. Matzakos, S. Doukakis, and M. Moundridou, “Learning mathematics with large language models: A comparative study with computer algebra systems and other tools,” International Journal of Emerging Technologies in Learning, vol. 18, no. 20, 2023.
[63] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, “Accurate medium-range global weather forecasting with 3d neural networks,” Nature, vol. 619, no. 7970, pp. 533–538, 2023.
[64] X. Man, C. Zhang, C. Li, and J. Shao, “W-mae: Pre-trained weather model with masked autoencoder for multi-variable weather forecasting,” arXiv preprint arXiv:2304.08754, 2023.
[65] Y. Liu, K. Duffy, J. G. Dy, and A. R. Ganguly, “Explainable deep learning for insights in el niño and river flows,” Nature Communications, vol. 14, no. 1, p. 339, 2023.
[66] H. Wang, S. Hu, and X. Li, “An interpretable deep learning enso forecasting model,” Ocean-Land-Atmosphere Research, vol. 2, p. 0012, 2023.
[67] W. Fang, Q. Xue, L. Shen, and V. S. Sheng, “Survey on the application of deep learning in extreme weather prediction,” Atmosphere, vol. 12, no. 6, p. 661, 2021.
[68] B. Bochenek and Z. Ustrnul, “Machine learning in weather prediction and climate analyses—applications and perspectives,” Atmosphere, vol. 13, no. 2, p. 180, 2022.
[69] K. Jaseena and B. C. Kovoor, “Deterministic weather forecasting models based on intelligent predictors: A survey,” Journal of King Saud University-Computer and Information Sciences, vol. 34, no. 6, pp. 3393–3412, 2022.
[70] L. Chen, B. Han, X. Wang, J. Zhao, W. Yang, and Z. Yang, “Machine learning methods in weather and climate applications: A survey,” Applied Sciences, vol. 13, no. 21, p. 12019, 2023.
[71] A. Jones, J. Kuehnert, P. Fraccaro, O. Meuriot, T. Ishikawa, B. Edwards, N. Stoyanov, S. L. Remy, K. Weldemariam, and S. Assefa, “Ai for climate impacts: applications in flood risk,” npj Climate and Atmospheric Science, vol. 6, no. 1, p. 63, 2023.
[72] M. J. Molina, T. A. O’Brien, G. Anderson, M. Ashfaq, K. E. Bennett, W. D. Collins, K. Dagon, J. M. Restrepo, and P. A. Ullrich, “A review of recent and emerging machine learning applications for climate variability and weather phenomena,” Artificial Intelligence for the Earth Systems, pp. 1–46, 2023.
[73] S. K. Mukkavilli, D. S. Civitarese, J. Schmude, J. Jakubik, A. Jones, N. Nguyen, C. Phillips, S. Roy, S. Singh, C. Watson et al., “Ai foundation models for weather and climate: Applications, design, and implementation,” arXiv preprint arXiv:2309.10808, 2023.
[74] V. Jacques-Dumas, F. Ragone, P. Borgnat, P. Abry, and F. Bouchet, “Deep learning-based extreme heatwave forecast,” Frontiers in Climate, vol. 4, 2022.
[75] S. Lee and S. Nirjon, “Learning in the wild: When, how, and what to learn for on-device dataset adaptation,” in Proceedings of the 2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, 2020, pp. 34–40.
[76] K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Conditional prompt learning for vision-language models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16816–16825.
[77] M. Maaz, H. Rasheed, S. Khan, and F. S. Khan, “Video-chatgpt: Towards detailed video understanding via large vision and language models,” arXiv preprint arXiv:2306.05424, 2023.
[78] W. Dai, J. Li, D. Li, A. M. H. Tiong, J. Zhao, W. Wang, B. Li, P. Fung, and S. Hoi, “Instructblip: Towards general-purpose vision-language models with instruction tuning,” 2023.
[79] J. Yu, Z. Wang, V. Vasudevan, L. Yeung, M. Seyedhosseini, and Y. Wu, “Coca: Contrastive captioners are image-text foundation models,” arXiv preprint arXiv:2205.01917, 2022.
[80] W. Wang, H. Bao, L. Dong, J. Bjorck, Z. Peng, Q. Liu, K. Aggarwal, O. K. Mohammed, S. Singhal, S. Som, and F. Wei, “Image as a foreign language: Beit pretraining for all vision and vision-language tasks,” 2022.
[81] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” 2022.
[82] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.
[83] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom, “Llama 2: Open foundation and fine-tuned chat models,” 2023.
[84] S. Chen, T. Shu, H. Zhao, Q. Wan, J. Huang, and C. Li, “Dynamic multiscale fusion generative adversarial network for radar image extrapolation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–11, 2022.
[85] H. Wu, Z. Yao, J. Wang, and M. Long, “Motionrnn: A flexible model for video prediction with spacetime-varying motions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15435–15444.
[86] M. Veillette, S. Samsi, and C. Mattioli, “Sevir: A storm event imagery dataset for deep learning applications in radar and satellite meteorology,” Advances in Neural Information Processing Systems, vol. 33, pp. 22009–22019, 2020.
[87] X. Zhu, Y. Xiong, M. Wu, G. Nie, B. Zhang, and Z. Yang, “Weather2k: A multivariate spatio-temporal benchmark dataset for meteorological forecasting based on real-time observation data from ground weather stations,” arXiv preprint arXiv:2302.10493, 2023.
[88] S. Rasp, P. D. Dueben, S. Scher, J. A. Weyn, S. Mouatadid, and N. Thuerey, “Weatherbench: a benchmark data set for data-driven weather forecasting,” Journal of Advances in Modeling Earth Systems, vol. 12, no. 11, p. e2020MS002203, 2020.
[89] S. Rasp, S. Hoyer, A. Merose, I. Langmore, P. Battaglia, T. Russel, A. Sanchez-Gonzalez, V. Yang, R. Carver, S. Agrawal et al., “Weatherbench 2: A benchmark for the next generation of data-driven global weather models,” arXiv preprint arXiv:2308.15560, 2023.
[90] T. Nguyen, J. Jewik, H. Bansal, P. Sharma, and A. Grover, “Climatelearn: Benchmarking machine learning for weather and climate modeling,” arXiv preprint arXiv:2307.01909, 2023.
[91] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” 2023.
[92] S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” 2023.
[93] M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, L. Gianinazzi, J. Gajda, T. Lehmann, M. Podstawski, H. Niewiadomski, P. Nyczyk, and T. Hoefler, “Graph of thoughts: Solving elaborate problems with large language models,” 2023.
[94] G. P. Zhang, “Time series forecasting using a hybrid arima and neural network model,” Neurocomputing, vol. 50, pp. 159–175, 2003.
[95] P. Chen, A. Niu, D. Liu, W. Jiang, and B. Ma, “Time series forecasting of temperatures using sarima: An example from nanjing,” in IOP Conference Series: Materials Science and Engineering, vol. 394. IOP Publishing, 2018, p. 052024.
[96] Y. Chen and S. Tjandra, “Daily collision prediction with sarimax and generalized linear models on the basis of temporal and weather variables,” Transportation Research Record, vol. 2432, no. 1, pp. 26–36, 2014.
[97] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” Advances in Neural Information Processing Systems, vol. 28, 2015.
[98] M. Hüsken and P. Stagge, “Recurrent neural networks for time series classification,” Neurocomputing, vol. 50, pp. 223–235, 2003.
[99] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[100] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, 2021, pp. 11106–11115.
[101] M. Chen, H. Peng, J. Fu, and H. Ling, “Autoformer: Searching transformers for visual recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 12270–12280.
[102] Y. Zhang and J. Yan, “Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting,” in The Eleventh International Conference on Learning Representations, 2022.
[103] G. Woo, C. Liu, D. Sahoo, A. Kumar, and S. Hoi, “Etsformer: Exponential smoothing transformers for time-series forecasting,” arXiv preprint arXiv:2202.01381, 2022.
[104] N. Kitaev, Ł. Kaiser, and A. Levskaya, “Reformer: The efficient transformer,” arXiv preprint arXiv:2001.04451, 2020.
[105] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International Conference on Machine Learning. PMLR, 2022, pp. 27268–27286.
[106] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” arXiv preprint arXiv:1709.04875, 2017.
[107] S. Chen, T. Shu, H. Zhao, G. Zhong, and X. Chen, “Tempee: Temporal–spatial parallel transformer for radar echo extrapolation beyond autoregression,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–14, 2023. [Online]. Available: https://doi.org/10.1109/TGRS.2023.3311510
[108] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
[109] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in Neural Information Processing Systems, vol. 34, pp. 8780–8794, 2021.
[110] M. Jin, Q. Wen, Y. Liang, C. Zhang, S. Xue, X. Wang, J. Zhang, Y. Wang, H. Chen, X. Li et al., “Large models for time series and spatio-temporal data: A survey and outlook,” arXiv preprint arXiv:2310.10196, 2023.
[111] L. R. Medsker and L. Jain, “Recurrent neural networks,” Design and Applications, vol. 5, no. 64-67, p. 2, 2001.
[112] W. De Mulder, S. Bethard, and M.-F. Moens, “A survey on the application of recurrent neural networks to statistical language modeling,” Computer Speech & Language, vol. 30, no. 1, pp. 61–98, 2015.
[113] T. Mikolov, S. Kombrink, L. Burget, J. Černocký, and S. Khudanpur, “Extensions of recurrent neural network language model,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2011, pp. 5528–5531.
[114] T. Mikolov and G. Zweig, “Context dependent recurrent neural network language model,” in 2012 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2012, pp. 234–239.
[115] H. Hewamalage, C. Bergmeir, and K. Bandara, “Recurrent neural networks for time series forecasting: Current status and future directions,” International Journal of Forecasting, vol. 37, no. 1, pp. 388–427, 2021.
[116] A. Lazcano, P. J. Herrera, and M. Monge, “A combined model based on recurrent neural networks and graph convolutional networks for financial time series forecasting,” Mathematics, vol. 11, no. 1, p. 224, 2023.
[117] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[118] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
[119] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.
[120] C. Saharia, W. Chan, H. Chang, C. Lee, J. Ho, T. Salimans, D. Fleet, and M. Norouzi, “Palette: Image-to-image diffusion models,” in ACM SIGGRAPH 2022 Conference Proceedings, 2022, pp. 1–10.
[121] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
[122] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[123] A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y. Pritch, and D. Cohen-Or, “Prompt-to-prompt image editing with cross attention control,” arXiv preprint arXiv:2208.01626, 2022.
[124] A. Blattmann, R. Rombach, K. Oktay, J. Müller, and B. Ommer, “Retrieval-augmented diffusion models,” Advances in Neural Information Processing Systems, vol. 35, pp. 15309–15324, 2022.
[125] Y. Li, K. Zhou, W. X. Zhao, and J.-R. Wen, “Diffusion models for non-autoregressive text generation: A survey,” arXiv preprint arXiv:2303.06574, 2023.
[126] Q. Wen, T. Zhou, C. Zhang, W. Chen, Z. Ma, J. Yan, and L. Sun, “Transformers in time series: A survey,” arXiv preprint arXiv:2202.07125, 2022.
[127] K. S. Kalyan, A. Rajasekharan, and S. Sangeetha, “Ammus: A survey of transformer-based pretrained models in natural language processing,” arXiv preprint arXiv:2108.05542, 2021.
[128] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[129] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, “Attngan: Fine-grained text to image generation with attentional generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1316–1324.
[130] Y. Zhang, Y. Wang, Z. Jiang, F. Liao, L. Zheng, D. Tan, J. Chen, and J. Lu, “Diversifying tire-defect image generation based on generative adversarial network,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–12, 2022.
[131] J. He, W. Shi, K. Chen, L. Fu, and C. Dong, “Gcfsr: a generative and controllable face super resolution method without facial and gan priors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1889–1898.
[132] J. Park, S. Son, and K. M. Lee, “Content-aware local gan for photo-realistic super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 10585–10594.
[133] Z. Zheng, J. Liu, and N. Zheng, “P2-gan: Efficient stroke style transfer using single style image,” IEEE Transactions on Multimedia, 2022.
[134] S. M. Bafti, C. S. Ang, G. Marcelli, M. M. Hossain, S. Maxamhud, and A. D. Tsaousis, “Biogan: An unpaired gan-based image to image translation model for microbiological images,” arXiv preprint arXiv:2306.06217, 2023.
[135] X. Cheng, J. Zhou, J. Song, and X. Zhao, “A highway traffic image enhancement algorithm based on improved gan in complex weather conditions,” IEEE Transactions on Intelligent Transportation Systems, 2023.
[136] J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli et al., “Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators,” arXiv preprint arXiv:2202.11214, 2022.
[137] R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, A. Pritzel, S. Ravuri, T. Ewalds, F. Alet, Z. Eaton-Rosen
et al., “Graphcast: Learning skillful medium-range global weather forecasting,” arXiv preprint arXiv:2212.12794, 2022.
[138] K. Chen, T. Han, J. Gong, L. Bai, F. Ling, J.-J. Luo, X. Chen, L. Ma, T. Zhang, R. Su, Y. Ci, B. Li, X. Yang, and W. Ouyang, “Fengwu: Pushing the skillful global medium-range weather forecast beyond 10 days lead,” 2023.
[139] L. Chen, X. Zhong, F. Zhang, Y. Cheng, Y. Xu, Y. Qi, and H. Li, “Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast,” arXiv preprint arXiv:2306.12873, 2023.
[140] S. R. Cachay, E. Erickson, A. F. C. Bucker, E. Pokropek, W. Potosnak, S. Bire, S. Osei, and B. Lütjens, “The world as a graph: Improving el niño forecasts with graph neural networks,” 2021.
[141] Q. You, Z. Cai, F. Wu, Z. Jiang, N. Pepin, and S. S. Shen, “Temperature dataset of cmip6 models over china: evaluation, trend and uncertainty,” Climate Dynamics, vol. 57, pp. 17–35, 2021.
[142] R. Keisler, “Forecasting global weather with graph neural networks,” 2022.
[143] Q. Ni, Y. Wang, and Y. Fang, “Ge-stdgn: a novel spatio-temporal weather prediction model based on graph evolution,” Applied Intelligence, pp. 1–15, 2022.
[144] M. Ma, P. Xie, F. Teng, B. Wang, S. Ji, J. Zhang, and T. Li, “Histgnn: Hierarchical spatio-temporal graph neural network for weather forecasting,” Information Sciences, vol. 648, p. 119580, 2023.
[145] K. Venkatachalam, P. Trojovský, D. Pamucar, N. Bacanin, and V. Simic, “Dwfh: An improved data-driven deep weather forecasting hybrid model using transductive long short term memory (t-lstm),” Expert Systems with Applications, vol. 213, p. 119270, 2023.
[146] L. Chen, F. Du, Y. Hu, Z. Wang, and F. Wang, “Swinrdm: integrate swinrnn with diffusion model towards high-resolution and high-quality weather forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 322–330.
[147] Y. Hu, L. Chen, Z. Wang, and H. Li, “Swinvrnn: A data-driven ensemble forecasting model via learned distribution perturbation,” Journal of Advances in Modeling Earth Systems, vol. 15, no. 2, p. e2022MS003211, 2023.
[148] Z. Ben-Bouallegue, J. A. Weyn, M. C. Clare, J. Dramsch, P. Dueben, and M. Chantry, “Improving medium-range ensemble weather forecasts with hierarchical ensemble transformers,” arXiv preprint arXiv:2303.17195, 2023.
[149] S. Bire, B. Lütjens, D. Newman, and C. Hill, “Oceanfourcast: Emulating ocean models with transformers for adjoint-based data assimilation,” Copernicus Meetings, Tech. Rep., 2023.
[150] L. Li, R. Carver, I. Lopez-Gomez, F. Sha, and J. Anderson, “Seeds: Emulation of weather forecast ensembles with diffusion models,” arXiv preprint arXiv:2306.14066, 2023.
[151] S. R. Cachay, B. Zhao, H. James, and R. Yu, “Dyffusion: A dynamics-informed diffusion model for spatiotemporal forecasting,” arXiv preprint arXiv:2306.01984, 2023.
[152] O. Ovadia, E. Turkel, A. Kahana, and G. E. Karniadakis, “Ditto: Diffusion-inspired temporal transformer operator,” arXiv preprint arXiv:2307.09072, 2023.
[153] I. Prapas, N.-I. Bountos, S. Kondylatos, D. Michail, G. Camps-Valls, and I. Papoutsis, “Televit: Teleconnection-driven transformers improve subseasonal to seasonal wildfire forecasting,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3754–3759.
[154] S. Chen, G. Long, T. Shen, T. Zhou, and J. Jiang, “Spatial-temporal prompt learning for federated weather forecasting,” arXiv preprint arXiv:2305.14244, 2023.
[158] …constrained neural networks,” arXiv preprint arXiv:2208.05424, 2022.
[159] F. Gerges, M. C. Boufadel, E. Bou-Zeid, H. Nassif, and J. T. Wang, “A novel bayesian deep learning approach to the downscaling of wind speed with uncertainty quantification,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2022, pp. 55–66.
[160] J. Baño-Medina, R. Manzanas, E. Cimadevilla, J. Fernández, J. González-Abad, A. S. Cofiño, and J. M. Gutiérrez, “Downscaling multi-model climate projection ensembles with deep learning (deepesd): contribution to cordex eur-44,” Geoscientific Model Development, vol. 15, no. 17, pp. 6747–6758, 2022.
[161] J. González-Abad, Á. Hernández-García, P. Harder, D. Rolnick, and J. M. Gutiérrez, “Multi-variable hard physical constraints for climate model downscaling,” 2023.
[162] P. Harder, V. Ramesh, A. Hernandez-Garcia, Q. Yang, P. Sattigeri, D. Szwarcman, C. Watson, and D. Rolnick, “Physics-constrained deep learning for downscaling,” Copernicus Meetings, Tech. Rep., 2023.
[163] D. Fuchs, S. C. Sherwood, A. Prasad, K. Trapeznikov, and J. Gimlett, “Torchclim v1.0: A deep-learning framework for climate model physics,” EGUsphere, vol. 2023, pp. 1–25, 2023.
[164] M. Mardani, N. Brenowitz, Y. Cohen, J. Pathak, C.-Y. Chen, C.-C. Liu, A. Vahdat, K. Kashinath, J. Kautz, and M. Pritchard, “Generative residual diffusion modeling for km-scale atmospheric downscaling,” 2023.
[165] C. K. Sønderby, L. Espeholt, J. Heek, M. Dehghani, A. Oliver, T. Salimans, S. Agrawal, J. Hickey, and N. Kalchbrenner, “Metnet: A neural weather model for precipitation forecasting,” arXiv preprint arXiv:2003.12140, 2020.
[166] J. Park, I. Lee, M. Son, S. Cho, and C. Kim, “Nowformer: A locally enhanced temporal learner for precipitation nowcasting.”
[167] H.-B. Liu and I. Lee, “Mpl-gan: Toward realistic meteorological predictive learning using conditional gan,” IEEE Access, vol. 8, pp. 93179–93186, 2020.
[168] X. Peng, Q. Li, and J. Jing, “Cngat: A graph neural network model for radar quantitative precipitation estimation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2021.
[169] A. Asperti, F. Merizzi, A. Paparella, G. Pedrazzi, M. Angelinelli, and S. Colamonaco, “Precipitation nowcasting with generative diffusion models,” 2023.
[170] J. Choi, Y. Kim, K.-H. Kim, S.-H. Jung, and I. Cho, “Pct-cyclegan: Paired complementary temporal cycle-consistent adversarial networks for radar-based precipitation nowcasting,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 348–358.
[171] C. Bai, F. Sun, J. Zhang, Y. Song, and S. Chen, “Rainformer: Features extraction balanced network for radar-based precipitation nowcasting,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022.
[172] Z. Gao, X. Shi, H. Wang, Y. Zhu, Y. B. Wang, M. Li, and D.-Y. Yeung, “Earthformer: Exploring space-time transformers for earth system forecasting,” Advances in Neural Information Processing Systems, vol. 35, pp. 25390–25403, 2022.
[173] Z. Yang, X. Yang, and Q. Lin, “Ptct: Patches with 3d-temporal convolutional transformer network for precipitation nowcasting,” arXiv preprint arXiv:2112.01085, 2021.
[174] Z. Ma, H. Zhang, and J. Liu, “Mm-rnn: A multimodal rnn for precipitation nowcasting,” IEEE Transactions on Geoscience and Remote Sensing, 2023.
[155] X. Zhong, L. Chen, J. Liu, C. Lin, Y. Qi, and H. Li, “Fuxi-extreme: [175] Q. Jin, X. Zhang, X. Xiao, G. Meng, S. Xiang, C. Pan et al.,
Improving extreme rainfall and wind forecasts with diffusion “Spatiotemporal inference network for precipitation nowcasting
model,” 2023. with multi-modal fusion,” IEEE Journal of Selected Topics in Applied
[156] S. Esmaeilzadeh, K. Azizzadenesheli, K. Kashinath, M. Mustafa, Earth Observations and Remote Sensing, 2023.
H. A. Tchelepi, P. Marcus, M. Prabhat, A. Anandkumar [176] Q. Jin, X. Zhang, X. Xiao, Y. Wang, S. Xiang, and C. Pan, “Pre-
et al., “Meshfreeflownet: A physics-constrained deep continuous former: Simple and efficient design for precipitation nowcasting
space-time super-resolution framework,” in SC20: International with transformers,” IEEE Geoscience and Remote Sensing Letters,
Conference for High Performance Computing, Networking, Storage and 2023.
Analysis. IEEE, 2020, pp. 1–15. [177] Z. Gao, X. Shi, B. Han, H. Wang, X. Jin, D. Maddix, Y. Zhu,
[157] M. A. E. R. Hammoud, E. S. Titi, I. Hoteit, and O. Knio, “Cdanet: M. Li, and Y. Wang, “Prediff: Precipitation nowcasting with latent
A physics-informed deep neural network for downscaling fluid diffusion models,” arXiv preprint arXiv:2307.10422, 2023.
flows,” Journal of Advances in Modeling Earth Systems, vol. 14, [178] F. Ye, J. Hu, T.-Q. Huang, L.-J. You, B. Weng, and J.-Y. Gao,
no. 12, p. e2022MS003051, 2022. “Transformer for ei niño-southern oscillation prediction,” IEEE
[158] P. Harder, Q. Yang, V. Ramesh, P. Sattigeri, A. Hernandez- Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2021.
Garcia, C. Watson, D. Szwarcman, and D. Rolnick, “Generating [179] Y.-G. Ham, J.-H. Kim, E.-S. Kim, and K.-W. On, “Unified deep
physically-consistent high-resolution climate data with hard- learning model for el niño/southern oscillation forecasts by in-
34
corporating seasonality in climate data,” Science Bulletin, vol. 66, [201] J. Guibas, M. Mardani, Z. Li, A. Tao, A. Anandkumar, and
no. 13, pp. 1358–1366, 2021. B. Catanzaro, “Adaptive fourier neural operators: Efficient token
[180] A. M. Ahmed, R. C. Deo, Q. Feng, A. Ghahramani, N. Raj, Z. Yin, mixers for transformers,” arXiv preprint arXiv:2111.13587, 2021.
and L. Yang, “Hybrid deep learning method for a week-ahead [202] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked
evapotranspiration forecasting,” Stochastic Environmental Research autoencoders are scalable vision learners,” in Proceedings of the
and Risk Assessment, pp. 1–19, 2021. IEEE/CVF conference on computer vision and pattern recognition,
[181] B. Mu, B. Qin, and S. Yuan, “Enso-gtc: Enso deep learning forecast 2022, pp. 16 000–16 009.
model with a global spatial-temporal teleconnection coupler,” [203] C. Feichtenhofer, Y. Li, K. He et al., “Masked autoencoders as
Journal of Advances in Modeling Earth Systems, vol. 14, no. 12, p. spatiotemporal learners,” Advances in neural information processing
e2022MS003132, 2022. systems, vol. 35, pp. 35 946–35 958, 2022.
[182] D. Xu, Q. Zhang, Y. Ding, and D. Zhang, “Application of a hybrid [204] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A.
arima-lstm model based on the spei for drought forecasting,” y Arcas, “Communication-efficient learning of deep networks
Environmental Science and Pollution Research, vol. 29, no. 3, pp. from decentralized data,” in Artificial intelligence and statistics.
4128–4144, 2022. PMLR, 2017, pp. 1273–1282.
[183] L. Wang, S. Ammons, V. M. Hur, R. L. Sriver, and Z. Zhao, [205] F. S. Varini, J. Boyd-Graber, M. Ciaramita, and M. Leippold,
“Convolutional gru network for seasonal prediction of the el ni\˜ “Climatext: A dataset for climate change topic detection,” 2021.
no-southern oscillation,” arXiv preprint arXiv:2306.10443, 2023. [206] C.-A. Diaconu, S. Saha, S. Günnemann, and X. X. Zhu, “Under-
[184] H. Li, N. Zhang, Z. Xu, X. Li, C. Liu, C. Zhao, and J. Wu, “Dk-stn: standing the role of weather data for earth surface forecasting
A domain knowledge embedded spatio-temporal network model using a convlstm-based model,” in Proceedings of the IEEE/CVF
for mjo forecast,” Expert Systems With Applications, Forthcoming, Conference on Computer Vision and Pattern Recognition, 2022, pp.
2023. 1362–1371.
[185] L. Han, M. Chen, K. Chen, H. Chen, Y. Zhang, B. Lu, L. Song, and [207] S. F. Tekin, O. Karaahmetoglu, F. Ilhan, I. Balaban, and S. S. Kozat,
R. Qin, “A deep learning method for bias correction of ecmwf 24– “Spatio-temporal weather forecasting and attention mechanism
240 h forecasts,” Advances in Atmospheric Sciences, vol. 38, no. 9, on convolutional lstms,” arXiv preprint arXiv:2102.00696, 2021.
pp. 1444–1459, 2021. [208] J. Su, W. Byeon, J. Kossaifi, F. Huang, J. Kautz, and A. Anandku-
[186] T. Yoshikane and K. Yoshimura, “A bias correction method for mar, “Convolutional tensor-train lstm for spatio-temporal learn-
precipitation through recognizing mesoscale precipitation sys- ing,” Advances in Neural Information Processing Systems, vol. 33,
tems corresponding to weather conditions,” PLoS Water, vol. 1, pp. 13 714–13 726, 2020.
no. 5, p. e0000016, 2022. [209] Y. Wang, H. Wu, J. Zhang, Z. Gao, J. Wang, S. Y. Philip, and
[187] Y. Li, F. Tang, X. Gao, T. Zhang, J. Qi, J. Xie, X. Li, and Y. Guo, M. Long, “Predrnn: A recurrent neural network for spatiotempo-
“Numerical weather prediction correction strategy for short-term ral predictive learning,” IEEE Transactions on Pattern Analysis and
wind power forecasting based on bidirectional gated recurrent Machine Intelligence, vol. 45, no. 2, pp. 2208–2225, 2022.
unit and xgboost,” Frontiers in Energy Research, vol. 9, p. 836144, [210] Y. Wang, L. Jiang, M.-H. Yang, L.-J. Li, M. Long, and L. Fei-Fei,
2022. “Eidetic 3d lstm: A model for video prediction and beyond,” in
[188] X. Yang, S. Yang, M. L. Tan, H. Pan, H. Zhang, G. Wang, R. He, International conference on learning representations, 2018.
and Z. Wang, “Correcting the bias of daily satellite precipitation [211] C. Luo, X. Zhao, Y. Sun, X. Li, and Y. Ye, “Predrann: the spa-
estimates in tropical regions using deep neural network,” Journal tiotemporal attention convolution recurrent neural network for
of Hydrology, vol. 608, p. 127656, 2022. precipitation nowcasting,” Knowledge-Based Systems, vol. 239, p.
[189] A. Blanchard, N. Parashar, B. Dodov, C. Lessig, and T. Sapsis, 107900, 2022.
“A multi-scale deep learning framework for projecting weather [212] M. Bilgili, A. Ilhan, and Ş. Ünal, “Time-series prediction of hourly
extremes,” 2022. atmospheric pressure using anfis and lstm approaches,” Neural
[190] Y. Han, L. Mi, L. Shen, C. Cai, Y. Liu, K. Li, and G. Xu, “A short- Computing and Applications, vol. 34, no. 18, pp. 15 633–15 648, 2022.
term wind speed prediction method utilizing novel hybrid deep [213] B. Usharani, “Ilf-lstm: Enhanced loss function in lstm to predict
learning algorithms to correct numerical weather forecasting,” the sea surface temperature,” Soft Computing, vol. 27, no. 18, pp.
Applied Energy, vol. 312, p. 118777, 2022. 13 129–13 141, 2023.
[191] F. Wang and D. Tian, “On deep learning-based bias correction [214] S. Tang, C. Li, P. Zhang, and R. Tang, “Swinlstm: Improving
and downscaling of multiple climate models simulations,” Cli- spatiotemporal prediction accuracy using swin transformer and
mate dynamics, vol. 59, no. 11-12, pp. 3451–3468, 2022. lstm,” in Proceedings of the IEEE/CVF International Conference on
[192] T. Ge, J. Pathak, A. Subramaniam, and K. Kashinath, “Dl- Computer Vision (ICCV), October 2023, pp. 13 470–13 479.
corrector-remapper: A grid-free bias-correction deep learning [215] L. Zhifeng, D. Feng, L. Jianyong, Z. Yue, and C. Hetao, “Com-
methodology for data-driven high-resolution global weather parison of blstm-attention and blstm-transformer models for
forecasting,” arXiv preprint arXiv:2210.12293, 2022. wind speed prediction,” in Proceedings of the Bulgarian Academy
[193] D. J. Fulton, B. J. Clarke, and G. C. Hegerl, “Bias correcting of Sciences, vol. 75, no. 1, 2022, pp. 80–89.
climate model simulations using unpaired image-to-image trans- [216] L. Tian, X. Li, Y. Ye, P. Xie, and Y. Li, “A generative adversarial
lation networks,” Artificial Intelligence for the Earth Systems, vol. 2, gated recurrent unit model for precipitation nowcasting,” IEEE
no. 2, p. e220031, 2023. Geoscience and Remote Sensing Letters, vol. 17, no. 4, pp. 601–605,
[194] B. Wu, W. Chen, W. Wang, B. Peng, L. Sun, and L. Chen, 2019.
“Weathergnn: Exploiting complicated relationships in numerical [217] J. Leinonen, D. Nerini, and A. Berne, “Stochastic super-resolution
weather prediction bias correction,” 2023. for downscaling time-evolving atmospheric fields with a gener-
[195] N. Webersinke, M. Kraus, J. A. Bingler, and M. Leippold, “Cli- ative adversarial network,” IEEE Transactions on Geoscience and
matebert: A pretrained language model for climate-related text,” Remote Sensing, vol. 59, no. 9, pp. 7211–7223, 2020.
2022. [218] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo,
[196] B. J. Fard, S. A. Hasan, and J. E. Bell, “Climedbert: A pre-trained “Swin transformer: Hierarchical vision transformer using shifted
language model for climate and health-related text,” 2022. windows,” in Proceedings of the IEEE/CVF international conference
[197] Z. Bi, N. Zhang, Y. Xue, Y. Ou, D. Ji, G. Zheng, and H. Chen, on computer vision, 2021, pp. 10 012–10 022.
“Oceangpt: A large language model for ocean science tasks,” [219] V. Zantedeschi, D. De Martini, C. Tong, C. S. de Witt,
2023. A. Kalaitzis, M. Chantry, and D. Watson-Parris, “Towards data-
[198] T. Schimanski, J. Bingler, C. Hyslop, M. Kraus, and M. Leippold, driven physics-informed global precipitation forecasting from
“Climatebert-netzero: Detecting and assessing net zero and re- satellite imagery,” in Proceedings of the AI for Earth Sciences Work-
duction targets,” 2023. shop at NeurIPS, 2020.
[199] E. C. Garrido-Merchán, C. González-Barthe, and M. C. Vaca, [220] J. Leinonen, U. Hamann, D. Nerini, U. Germann, and G. Franch,
“Fine-tuning climatebert transformer with climatext for the dis- “Latent diffusion models for generative precipitation nowcast-
closure analysis of climate-related financial risks,” 2023. ing with accurate uncertainty quantification,” arXiv preprint
[200] K. Chen, Y. Meng, X. Sun, S. Guo, T. Zhang, J. Li, and C. Fan, arXiv:2304.12891, 2023.
“Badpre: Task-agnostic backdoor attacks to pre-trained nlp foun- [221] S. R. Cachay, V. Ramesh, J. N. Cole, H. Barker, and D. Rolnick,
dation models,” arXiv preprint arXiv:2110.02467, 2021. “Climart: A benchmark dataset for emulating atmospheric ra-
35
diative transfer in weather and climate models,” arXiv preprint [243] J. Sleeman, D. Chung, A. Gnanadesikan, J. Brett, Y. Kevrekidis,
arXiv:2111.14671, 2021. M. Hughes, T. Haine, M.-A. Pradal, R. Gelderloos, C. Ashcraft,
[222] P. Lippe, B. S. Veeling, P. Perdikaris, R. E. Turner, and J. Brand- C. Tang, A. Saksena, and L. White, “A generative adversarial
stetter, “Pde-refiner: Achieving accurate long rollouts with neural network for climate tipping point discovery (tip-gan),” 2023.
pde solvers,” arXiv preprint arXiv:2308.05732, 2023. [244] Y. Meng, E. Rigall, X. Chen, F. Gao, J. Dong, and S. Chen,
[223] Y. Hatanaka, Y. Glaser, G. Galgon, G. Torri, and P. Sadowski, “Dif- “Physics-guided generative adversarial networks for sea subsur-
fusion models for high-resolution solar forecasts,” arXiv preprint face temperature prediction,” IEEE transactions on neural networks
arXiv:2302.00170, 2023. and learning systems, 2021.
[224] G. P. Høivang, “Diffmet: Diffusion models and deep learning for [245] Y. Meng, F. Gao, E. Rigall, R. Dong, J. Dong, and Q. Du, “Physical
precipitation nowcasting,” Master’s thesis, 2023. knowledge-enhanced deep neural network for sea surface tem-
[225] A. Radford, L. Metz, and S. Chintala, “Unsupervised represen- perature prediction,” IEEE Transactions on Geoscience and Remote
tation learning with deep convolutional generative adversarial Sensing, vol. 61, pp. 1–13, 2023.
networks,” arXiv preprint arXiv:1511.06434, 2015. [246] B. Lütjens, B. Leshchinskiy, C. Requena-Mesa, F. Chishtie,
[226] A. Brock, J. Donahue, and K. Simonyan, “Large scale gan training N. Dı́az-Rodrı́guez, O. Boulais, A. Sankaranarayanan, A. Pina,
for high fidelity natural image synthesis,” in International Confer- Y. Gal, C. Raı̈ssi et al., “Physically-consistent generative adver-
ence on Learning Representations, 2018. sarial networks for coastal flood visualization,” arXiv preprint
[227] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing arXiv:2104.04785, 2021.
of gans for improved quality, stability, and variation,” in Interna- [247] T. Yuan, J. Zhu, W. Wang, J. Lu, X. Wang, X. Li, and K. Ren,
tional Conference on Learning Representations, 2018. “A space-time partial differential equation based physics-guided
neural network for sea surface temperature prediction,” Remote
[228] A. Bihlo, “A generative adversarial network approach to (en-
Sensing, vol. 15, no. 14, p. 3498, 2023.
semble) weather prediction,” Neural Networks, vol. 139, pp. 1–16,
2021. [248] Z. Chen, J. Gao, W. Wang, and Z. Yan, “Physics-informed gen-
erative neural network: an application to troposphere tempera-
[229] R. Gupta, M. Mustafa, and K. Kashinath, “Climate-style gan:
ture prediction,” Environmental Research Letters, vol. 16, no. 6, p.
Modeling turbulent climate dynamics using style-gan,” in AI for
065003, 2021.
Earth Science Workshop, 2020.
[249] F. Lin, X. Yuan, Y. Zhang, P. Sigdel, L. Chen, L. Peng, and N.-
[230] K. Klemmer, S. Saha, M. Kahl, T. Xu, and X. X. Zhu, “Genera-
F. Tzeng, “Comprehensive transformer-based model architecture
tive modeling of spatio-temporal weather patterns with extreme
for real-world storm prediction,” in Joint European Conference on
event conditioning,” arXiv preprint arXiv:2104.12469, 2021.
Machine Learning and Knowledge Discovery in Databases. Springer,
[231] S. Ravuri, K. Lenc, M. Willson, D. Kangin, R. Lam, P. Mirowski, 2023, pp. 54–71.
M. Fitzsimons, M. Athanassiadou, S. Kashem, S. Madge et al., [250] Ç. Küçük, A. Giannakos, S. Schneider, and A. Jann, “Transformer-
“Skilful precipitation nowcasting using deep generative models based nowcasting of radar composites from satellite images for
of radar,” Nature, vol. 597, no. 7878, pp. 672–677, 2021. severe weather,” arXiv preprint arXiv:2310.19515, 2023.
[232] K. Klemmer, T. Xu, B. Acciaio, and D. B. Neill, “Spate-gan: [251] A. Bojesomo, H. Al-Marzouqi, P. Liatsis, G. Cong, and M. Ra-
Improved generative modeling of dynamic spatio-temporal pat- manath, “Spatiotemporal swin-transformer network for short
terns with an autoregressive embedding loss,” in Proceedings of time weather forecasting.” in CIKM Workshops, 2021.
the AAAI Conference on Artificial Intelligence, vol. 36, no. 4, 2022, [252] A. Chattopadhyay, M. Mustafa, P. Hassanzadeh, E. Bach,
pp. 4523–4531. and K. Kashinath, “Towards physically consistent data-
[233] Y. Ji, B. Gong, M. Langguth, A. Mozaffari, and X. Zhi, “Clgan: driven weather forecasting: Integrating data assimilation with
a generative adversarial network (gan)-based video prediction equivariance-preserving deep spatial transformers,” 2021.
model for precipitation nowcasting,” Geoscientific Model Develop- [253] O. Bilgin, P. Maka, T. Vergutz, and S. Mehrkanoon, “Tent: Ten-
ment, vol. 16, no. 10, pp. 2737–2752, 2023. sorized encoder transformer for temperature forecasting,” arXiv
[234] C. Luo, X. Li, Y. Ye, S. Feng, and M. K. Ng, “Experimental study preprint arXiv:2106.14742, 2021.
on generative adversarial network for precipitation nowcasting,” [254] A. Bojesomo, H. AlMarzouqi, and P. Liatsis, “A novel transformer
IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. network with shifted window cross-attention for spatiotemporal
1–20, 2022. weather forecasting,” IEEE Journal of Selected Topics in Applied
[235] R. Wang, L. Su, W. K. Wong, A. K. Lau, and J. C. Fung, “Skill- Earth Observations and Remote Sensing, 2023.
ful radar-based heavy rainfall nowcasting using task-segmented [255] Y. Gao, S. Miyata, Y. Matsunami, and Y. Akashi, “Spatio-temporal
generative adversarial network,” IEEE Transactions on Geoscience interpretable neural network for solar irradiation prediction us-
and Remote Sensing, 2023. ing transformer,” Energy and Buildings, vol. 297, p. 113461, 2023.
[236] L. Harris, A. T. McRae, M. Chantry, P. D. Dueben, and T. N. [256] S. A. Vaghefi, Q. Wang, V. Muccione, J. Ni, M. Kraus, J. Bingler,
Palmer, “A generative deep learning approach to stochastic T. Schimanski, C. Colesanti-Senni, N. Webersinke, C. Huggel,
downscaling of precipitation forecasts,” Journal of Advances in and M. Leippold, “chatclimate: Grounding conversational ai in
Modeling Earth Systems, vol. 14, no. 10, p. e2022MS003120, 2022. climate science,” 2023.
[237] N. J. Annau, A. J. Cannon, and A. H. Monahan, “Algorith- [257] A. Krishnan and V. S. Anoop, “Climatenlp: Analyzing public
mic hallucinations of near-surface winds: Statistical downscaling sentiment towards climate change using natural language pro-
with generative adversarial networks to convection-permitting cessing,” 2023.
scales,” Artificial Intelligence for the Earth Systems, 2023. [258] A. Auzepy, E. Tönjes, D. Lenz, and C. Funk, “Evaluating tcfd
[238] K. Dai, X. Li, Y. Ye, S. Feng, D. Qin, and R. Ye, “Mstcgan: reporting: A new application of zero-shot analysis to climate-
Multiscale time conditional generative adversarial network for related financial disclosures,” 2023.
long-term satellite image sequence prediction,” IEEE Transactions [259] M. Kraus, J. A. Bingler, M. Leippold, T. Schimanski, C. C. Senni,
on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022. D. Stammbach, S. A. Vaghefi, and N. Webersinke, “Enhancing
[239] Y. Kim and S. Hong, “Very short-term rainfall prediction using large language models with climate resources,” 2023.
ground radar observations and conditional generative adversar- [260] T. Wilson, P.-N. Tan, and L. Luo, “A low rank weighted graph
ial networks,” IEEE Transactions on Geoscience and Remote Sensing, convolutional approach to weather prediction,” in 2018 IEEE
vol. 60, pp. 1–8, 2021. International Conference on Data Mining (ICDM). IEEE, 2018, pp.
[240] P. Hess, M. Drüke, S. Petri, F. M. Strnad, and N. Boers, “Physi- 627–636.
cally constrained generative adversarial networks for improving [261] N. Y. Ayadi, C. Faron, F. Michel, F. Gandon, and O. Corby,
precipitation fields from earth system models,” Nature Machine “Wekg-mf: A knowledge graph of observational weather data,”
Intelligence, vol. 4, no. 10, pp. 828–839, 2022. in European Semantic Web Conference. Springer, 2022, pp. 101–106.
[241] C. Besombes, O. Pannekoucke, C. Lapeyre, B. Sanderson, and [262] P. Li, Y. Yu, D. Huang, Z.-H. Wang, and A. Sharma, “Regional
O. Thual, “Producing realistic climate data with generative ad- heatwave prediction using graph neural network and weather
versarial networks,” Nonlinear Processes in Geophysics, vol. 28, station data,” Geophysical Research Letters, vol. 50, no. 7, p.
no. 3, pp. 347–370, 2021. e2023GL103405, 2023.
[242] E. Balogun, R. Buechler, R. Rajagopal, and A. Majumdar, “Tem- [263] J. Oskarsson, T. Landelius, and F. Lindsten, “Graph-based neural
peraturegan: Generative modeling of regional atmospheric tem- weather prediction for limited area modeling,” arXiv preprint
peratures,” 2023. arXiv:2309.17370, 2023.
36
[264] J. Han, H. Liu, H. Zhu, H. Xiong, and D. Dou, “Joint air quality [285] T. Ballard and G. Erinjippurath, “Contrastive learning for cli-
and weather prediction based on multi-adversarial spatiotempo- mate model bias correction and super-resolution,” arXiv preprint
ral networks,” in Proceedings of the AAAI Conference on Artificial arXiv:2211.07555, 2022.
Intelligence, vol. 35, no. 5, 2021, pp. 4081–4089. [286] X. Hu, M. A. Naiel, A. Wong, M. Lamm, and P. Fieguth, “Runet:
[265] J. Han, H. Liu, H. Xiong, and J. Yang, “Semi-supervised air A robust unet architecture for image super-resolution,” in Pro-
quality forecasting via self-supervised hierarchical graph neural ceedings of the IEEE/CVF Conference on Computer Vision and Pattern
network,” IEEE Transactions on Knowledge and Data Engineering, Recognition Workshops, 2019, pp. 0–0.
vol. 35, no. 5, pp. 5230–5243, 2022. [287] F. Min, L. Wang, S. Pan, and G. Song, “D 2 unet: Dual decoder
[266] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, u-net for seismic image super-resolution reconstruction,” IEEE
T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–13,
et al., “An image is worth 16x16 words: Transformers for image 2023.
recognition at scale,” arXiv preprint arXiv:2010.11929, 2020. [288] Q. Yu, M. Zhu, Q. Zeng, H. Wang, Q. Chen, X. Fu, and Z. Qing,
[267] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Mon- “Weather radar super-resolution reconstruction based on residual
fardini, “The graph neural network model,” IEEE transactions on attention back-projection network,” Remote Sensing, vol. 15, no. 8,
neural networks, vol. 20, no. 1, pp. 61–80, 2008. p. 1999, 2023.
[268] G.-G. Wang, H. Cheng, Y. Zhang, and H. Yu, “Enso analysis and [289] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and
prediction using deep learning: A review,” Neurocomputing, 2022. C. Change Loy, “Esrgan: Enhanced super-resolution generative
[269] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graph evolution: adversarial networks,” in Proceedings of the European conference on
Densification and shrinking diameters,” ACM transactions on computer vision (ECCV) workshops, 2018, pp. 0–0.
Knowledge Discovery from Data (TKDD), vol. 1, no. 1, pp. 2–es, [290] C. D. Watson, C. Wang, T. Lynar, and K. Weldemariam, “Investi-
2007. gating two super-resolution methods for downscaling precipita-
[270] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, tion: Esrgan and car,” arXiv preprint arXiv:2012.01233, 2020.
“Hierarchical graph representation learning with differentiable [291] J. Wang, Z. Liu, I. Foster, W. Chang, R. Kettimuthu, and V. R. Ko-
pooling,” Advances in neural information processing systems, vol. 31, tamarthi, “Fast and accurate learned multiresolution dynamical
2018. downscaling for precipitation,” Geoscientific Model Development,
[271] J.-H. Lee, S. S. Lee, H. G. Kim, S.-K. Song, S. Kim, and Y. M. Ro, vol. 14, no. 10, pp. 6355–6372, 2021.
“Mcsip net: Multichannel satellite image prediction via deep neu- [292] K. Stengel, A. Glaws, D. Hettinger, and R. N. King, “Adversarial
ral network,” IEEE Transactions on Geoscience and Remote Sensing, super-resolution of climatological wind and solar data,” Proceed-
vol. 58, no. 3, pp. 2212–2224, 2019. ings of the National Academy of Sciences, vol. 117, no. 29, pp. 16 805–
[272] J. Cuomo and V. Chandrasekar, “Developing deep learning mod- 16 815, 2020.
els for storm nowcasting,” IEEE Transactions on Geoscience and [293] N. P. Juan, J. O. Rodrı́guez, V. N. Valdecantos, and G. Iglesias,
Remote Sensing, vol. 60, pp. 1–13, 2021. “Data-driven and physics-based approach for wave downscal-
[273] A. Gong, R. Li, B. Pan, H. Chen, G. Ni, and M. Chen, “Enhancing ing: A comparative study,” Ocean Engineering, vol. 285, p. 115380,
spatial variability representation of radar nowcasting with gen- 2023.
erative adversarial networks,” Remote Sensing, vol. 15, no. 13, p.
[294] D. Feng, Z. Tan, and Q. He, “Physics-informed neural net-
3306, 2023.
works of the saint-venant equations for downscaling a large-
[274] M. R. Ehsani, A. Zarei, H. V. Gupta, K. Barnard, E. Lyons,
scale river model,” Water Resources Research, vol. 59, no. 2, p.
and A. Behrangi, “Nowcasting-nets: Representation learning to
e2022WR033168, 2023.
mitigate latency gap of satellite precipitation products using
[295] M. Bocquet, , J. Brajard, A. Carrassi, L. Bertino, , and and,
convolutional and recurrent neural networks,” IEEE Transactions
“Bayesian inference of chaotic dynamics by merging data
on Geoscience and Remote Sensing, vol. 60, pp. 1–21, 2022.
assimilation, machine learning and expectation-maximization,”
[275] J. G. Fernández and S. Mehrkanoon, “Broad-unet: Multi-scale
Foundations of Data Science, vol. 2, no. 1, pp. 55–80, 2020. [Online].
feature learning for nowcasting tasks,” Neural Networks, vol. 144,
Available: https://doi.org/10.3934%2Ffods.2020004
pp. 419–427, 2021.
[276] C. Huang, C. Bai, S. Chan, and J. Zhang, “Mmstn: A multi- [296] A. J. Geer, “Learning earth system models from observations:
modal spatial-temporal network for tropical cyclone short- machine learning or data assimilation?” Philosophical Transactions
term prediction,” Geophysical Research Letters, vol. 49, no. 4, p. of the Royal Society A, vol. 379, no. 2194, p. 20200089, 2021.
e2021GL096898, 2022. [297] D. Hershcovich, N. Webersinke, M. Kraus, J. A. Bingler, and
[277] C. Luo, X. Li, and Y. Ye, “Pfst-lstm: A spatiotemporal lstm model M. Leippold, “Towards climate awareness in nlp research,” arXiv
with pseudoflow prediction for precipitation nowcasting,” IEEE preprint arXiv:2205.05071, 2022.
Journal of Selected Topics in Applied Earth Observations and Remote [298] OpenAI, “Gpt-4 technical report,” 2023.
Sensing, vol. 14, pp. 843–857, 2020. [299] T. Knutson, S. J. Camargo, J. C. Chan, K. Emanuel, C.-H. Ho,
[278] X. Dong, Z. Zhao, Y. Wang, J. Wang, and C. Hu, “Motion-guided J. Kossin, M. Mohapatra, M. Satoh, M. Sugi, K. Walsh et al., “Trop-
global–local aggregation transformer network for precipitation ical cyclones and climate change assessment: Part ii: Projected
nowcasting,” IEEE Transactions on Geoscience and Remote Sensing, response to anthropogenic warming,” Bulletin of the American
vol. 60, pp. 1–16, 2022. Meteorological Society, vol. 101, no. 3, pp. E303–E322, 2020.
[279] V. L. Guen and N. Thome, “Disentangling physical dynamics [300] C. Bai, Z. Cai, X. Yin, and J. Zhang, “Lsdssimr: Large-scale dust
from unknown factors for unsupervised video prediction,” in storm database based on satellite images and meteorological
Proceedings of the IEEE/CVF Conference on Computer Vision and reanalysis data,” IEEE Journal of Selected Topics in Applied Earth
Pattern Recognition, 2020, pp. 11 474–11 484. Observations and Remote Sensing, 2023.
[280] L. C. Evans, Partial differential equations. American Mathematical [301] K. Kashinath, M. Mudigonda, S. Kim, L. Kapp-Schwoerer,
Society, 2022, vol. 19. A. Graubner, E. Karaismailoglu, L. Von Kleist, T. Kurth,
[281] M. Andrychowicz, L. Espeholt, D. Li, S. Merchant, A. Merose, A. Greiner, A. Mahesh et al., “Climatenet: An expert-labeled
F. Zyda, S. Agrawal, and N. Kalchbrenner, “Deep learning for open dataset and deep learning architecture for enabling high-
day forecasts from sparse observations,” 2023. precision analyses of extreme weather,” Geoscientific Model Devel-
[282] W. Cai, A. Santoso, G. Wang, S.-W. Yeh, S.-I. An, K. M. Cobb, opment, vol. 14, no. 1, pp. 107–124, 2021.
M. Collins, E. Guilyardi, F.-F. Jin, J.-S. Kug et al., “Enso and [302] E. Racah, C. Beckham, T. Maharaj, S. Ebrahimi Kahou, M. Prab-
greenhouse warming,” Nature Climate Change, vol. 5, no. 9, pp. hat, and C. Pal, “Extremeweather: A large-scale climate dataset
849–859, 2015. for semi-supervised detection, localization, and understanding of
[283] J. Zhang, K. Howard, C. Langston, B. Kaney, Y. Qi, L. Tang, extreme weather events,” Advances in neural information processing
H. Grams, Y. Wang, S. Cocks, S. Martinaitis et al., “Multi-radar systems, vol. 30, 2017.
multi-sensor (mrms) quantitative precipitation estimation: Initial [303] R. A. Sobash, D. J. Gagne, C. L. Becker, D. Ahijevych, G. N.
operating capabilities,” Bulletin of the American Meteorological Gantos, and C. S. Schwartz, “Diagnosing storm mode with
Society, vol. 97, no. 4, pp. 621–638, 2016. deep learning in convection-allowing models,” Monthly Weather
Review, 2023.
[284] S. C. M. Sharma and A. Mitra, “Resdeepd: A residual super-resolution network for deep downscaling of daily precipitation over India,” Environmental Data Science, vol. 1, p. e19, 2022.
[304] E. M. Rasmusson and T. H. Carpenter, “Variations in tropical sea surface temperature and surface wind fields associated with the Southern Oscillation/El Niño,” Monthly Weather Review, vol. 110, no. 5, pp. 354–384, 1982.
[305] M. Latif, D. Anderson, T. Barnett, M. Cane, R. Kleeman, A. Leetmaa, J. O’Brien, A. Rosati, and E. Schneider, “A review of the predictability and prediction of ENSO,” Journal of Geophysical Research: Oceans, vol. 103, no. C7, pp. 14 375–14 393, 1998.
[306] D. Song, X. Su, W. Li, Z. Sun, T. Ren, W. Liu, and A.-A. Liu, “Spatial-temporal transformer network for multi-year ENSO prediction,” Frontiers in Marine Science, vol. 10, p. 1143499, 2023.
[307] W. Fang, Y. Sha, and V. S. Sheng, “Survey on the application of artificial intelligence in ENSO forecasting,” Mathematics, vol. 10, no. 20, p. 3793, 2022.
[308] M. Liu-Schiaffini, C. E. Singer, N. Kovachki, T. Schneider, K. Azizzadenesheli, and A. Anandkumar, “Tipping point forecasting in non-stationary dynamics on function spaces,” 2023.
[309] A. Gnanadesikan, J. Brett, J. Sleeman, and D. Chung, “Using AI to detect climate tipping points – or why it’s hard to understand rapid changes in the Earth system,” 2023.
[310] M. Rietkerk, R. Bastiaansen, S. Banerjee, J. van de Koppel, M. Baudena, and A. Doelman, “Evasion of tipping in complex systems through spatial pattern formation,” Science, vol. 374, no. 6564, p. eabj0359, 2021.
[311] T. M. Bury, R. Sujith, I. Pavithran, M. Scheffer, T. M. Lenton, M. Anand, and C. T. Bauch, “Deep learning for early warning signals of tipping points,” Proceedings of the National Academy of Sciences, vol. 118, no. 39, p. e2106140118, 2021.
[312] C. Zhang, “Madden–Julian Oscillation,” Reviews of Geophysics, vol. 43, no. 2, 2005.
[313] ——, “Madden–Julian Oscillation: Bridging weather and climate,” Bulletin of the American Meteorological Society, vol. 94, no. 12, pp. 1849–1870, 2013.
[314] C. Minixhofer, M. Swan, C. McMeekin, and P. Andreadis, “DroughtED: A dataset and methodology for drought forecasting spanning multiple climate zones,” in ICML 2021 Workshop on Tackling Climate Change with Machine Learning, 2021.
[315] V. Grabar, A. Marusov, A. Zaytsev, Y. Maximov, N. Sotiriadi, and A. Bulkin, “Long-term drought prediction using deep neural networks based on geospatial weather data,” arXiv preprint arXiv:2309.06212, 2023.
[316] A. Danandeh Mehr, A. Rikhtehgar Ghiasi, Z. M. Yaseen, A. U. Sorman, and L. Abualigah, “A novel intelligent deep learning predictive model for meteorological drought forecasting,” Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 8, pp. 10 441–10 455, 2023.
[317] F. A. Prodhan, J. Zhang, S. S. Hasan, T. P. P. Sharma, and H. P. Mohana, “A review of machine learning methods for drought hazard monitoring and forecasting: Current research trends, challenges, and future research directions,” Environmental Modelling & Software, vol. 149, p. 105327, 2022.
[318] R. Mendelsohn, K. Emanuel, S. Chonabayashi, and L. Bakkensen, “The impact of climate change on global tropical cyclone damage,” Nature Climate Change, vol. 2, no. 3, pp. 205–209, 2012.
[319] D. J. Befort, K. I. Hodges, and A. Weisheimer, “Seasonal prediction of tropical cyclones over the North Atlantic and western North Pacific,” Journal of Climate, vol. 35, no. 5, pp. 1385–1397, 2022.
[320] M. Scheuerer, M. B. Switanek, R. P. Worsnop, and T. M. Hamill, “Using artificial neural networks for generating probabilistic subseasonal precipitation forecasts over California,” Monthly Weather Review, vol. 148, no. 8, pp. 3489–3506, 2020.
[321] D. Specq and L. Batté, “Improving subseasonal precipitation forecasts through a statistical–dynamical approach: application to the southwest tropical Pacific,” Climate Dynamics, vol. 55, no. 7–8, pp. 1913–1927, 2020.
[322] C. O. de Burgh-Day and T. Leeuwenburg, “Machine learning for numerical weather and climate modelling: a review,” EGUsphere, vol. 2023, pp. 1–48, 2023.
[323] Q. Yang, C.-Y. Lee, M. K. Tippett, D. R. Chavas, and T. R. Knutson, “Machine learning–based hurricane wind reconstruction,” Weather and Forecasting, vol. 37, no. 4, pp. 477–493, 2022.
[324] E. Vosper, P. Watson, L. Harris, A. McRae, R. Santos-Rodriguez, L. Aitchison, and D. Mitchell, “Deep learning for downscaling tropical cyclone rainfall to hazard-relevant spatial scales,” Journal of Geophysical Research: Atmospheres, p. e2022JD038163, 2023.
[325] S. Ashkboos, L. Huang, N. Dryden, T. Ben-Nun, P. Dueben, L. Gianinazzi, L. Kummer, and T. Hoefler, “ENS-10: A dataset for post-processing ensemble weather forecasts,” Advances in Neural Information Processing Systems, vol. 35, pp. 21 974–21 987, 2022.
[326] S. Peng, Y. Ding, W. Liu, and Z. Li, “1 km monthly temperature and precipitation dataset for China from 1901 to 2017,” Earth System Science Data, vol. 11, no. 4, pp. 1931–1946, 2019.
[327] A. Kitamoto, J. Hwang, B. Vuillod, L. Gautier, Y. Tian, and T. Clanuwat, “Digital Typhoon: Long-term satellite image dataset for the spatio-temporal modeling of tropical cyclones,” arXiv preprint arXiv:2311.02665, 2023.
[328] M. Sit, B.-C. Seo, and I. Demir, “IowaRain: A statewide rain event dataset based on weather radars and quantitative precipitation estimation,” arXiv preprint arXiv:2107.03432, 2021.
[329] S. Wang, Y. Li, J. Zhang, Q. Meng, L. Meng, and F. Gao, “PM2.5-GNN: A domain knowledge enhanced graph neural network for PM2.5 forecasting,” in Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020, pp. 163–166.
[330] X. Chen, K. Feng, N. Liu, Y. Lu, Z. Tong, B. Ni, Z. Liu, and N. Lin, “RainNet: A large-scale dataset for spatial precipitation downscaling,” arXiv preprint arXiv:2012.09700, 2020.
[331] R. Kurinchi-Vendhan, “Continental United States solar irradiance,” Sep. 2021.
[332] C. Requena-Mesa, V. Benson, M. Reichstein, J. Runge, and J. Denzler, “EarthNet2021: A large-scale dataset and challenge for Earth surface forecasting as a guided video prediction task,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1132–1142.
[333] T. Kim, N. Ho, D. Kim, and S.-Y. Yun, “Benchmark dataset for precipitation forecasting by post-processing the numerical weather prediction,” arXiv preprint arXiv:2206.15241, 2022.
[334] M. Paulat, C. Frei, M. Hagen, and H. Wernli, “A gridded dataset of hourly precipitation in Germany: Its construction, climatology and application,” Meteorologische Zeitschrift, vol. 17, pp. 719–732, 2008.
[335] Y. Tang, J. Zhou, X. Pan, Z. Gong, and J. Liang, “PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting,” 2023.
[336] G. Larvor, L. Berthomier, V. Chabot, B. Le Pape, B. Pradel, and L. Perez, “MeteoNet, an open reference weather dataset by Meteo-France,” 2020.
[337] Y. Choi, K. Cha, M. Back, H. Choi, and T. Jeon, “Rain-F+: The data-driven precipitation prediction model for integrated weather observations,” Remote Sensing, vol. 13, no. 18, p. 3627, 2021.
[338] C. S. de Witt, C. Tong, V. Zantedeschi, D. De Martini, A. Kalaitzis, M. Chantry, D. Watson-Parris, and P. Bilinski, “RainBench: Towards data-driven global precipitation forecasting from satellite imagery,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 17, 2021, pp. 14 902–14 910.
[339] T. Diggelmann, J. Boyd-Graber, J. Bulian, M. Ciaramita, and M. Leippold, “Climate-FEVER: A dataset for verification of real-world climate claims,” 2021.
[340] T. Laud, D. Spokoyny, T. Corringham, and T. Berg-Kirkpatrick, “ClimaBench: A benchmark dataset for climate change text understanding in English,” arXiv preprint arXiv:2301.04253, 2023.
[341] R. Vaid, K. Pant, and M. Shrivastava, “Towards fine-grained classification of climate change related social media text,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2022, pp. 434–443.
[342] P. Mishra and R. Mittal, “NeuralNERE: Neural named entity relationship extraction for end-to-end climate change knowledge graph construction,” in ICML 2021 Workshop on Tackling Climate Change with Machine Learning, 2021. [Online]. Available: https://www.climatechange.ai/papers/icml2021/76
[343] K. E. Trenberth and J. G. Olson, “An evaluation and intercomparison of global analyses from the National Meteorological Center and the European Centre for Medium Range Weather Forecasts,” Bulletin of the American Meteorological Society, vol. 69, no. 9, pp. 1047–1057, 1988.
[344] J. A. Carton and B. S. Giese, “SODA: A reanalysis of ocean climate,” J. Geophys. Res., submitted, 2005.
[345] Y. Choi, K. Cha, M. Back, H. Choi, and T. Jeon, “Rain-F: A fusion dataset for rainfall prediction using convolutional neural network,” in 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE, 2021, pp. 7145–7148.
[346] J. E. Johnson, Q. Febvre, A. Gorbunova, S. Metref, M. Ballarotta, J. L. Sommer, and R. Fablet, “OceanBench: The sea surface height edition,” 2023.
[347] P. Bommer, M. Kretschmer, A. Hedström, D. Bareeva, and M. M. C. Höhne, “Finding the right XAI method – a guide for