
Review
Google Earth Engine and Artificial Intelligence (AI):
A Comprehensive Review
Liping Yang 1,2,3, * , Joshua Driscol 1,2 , Sarigai Sarigai 1,2 , Qiusheng Wu 4 , Haifei Chen 5
and Christopher D. Lippitt 1,2

1 Department of Geography and Environmental Studies, University of New Mexico, Albuquerque, NM 87131, USA; [email protected] (J.D.); [email protected] (S.S.); [email protected] (C.D.L.)
2 Center for the Advancement of Spatial Informatics Research and Education (ASPIRE), University of New Mexico, Albuquerque, NM 87131, USA
3 Department of Computer Science, University of New Mexico, Albuquerque, NM 87106, USA
4 Department of Geography, University of Tennessee, Knoxville, TN 37996, USA; [email protected]
5 Interdisciplinary Science Co-Operative, University of New Mexico, Albuquerque, NM 87131, USA; [email protected]
* Correspondence: [email protected]

Abstract: Remote sensing (RS) plays an important role in gathering data in many critical domains (e.g., global climate change, risk assessment and vulnerability reduction of natural hazards, resilience of ecosystems, and urban planning). Retrieving, managing, and analyzing large amounts of RS imagery poses substantial challenges. Google Earth Engine (GEE) provides a scalable, cloud-based geospatial retrieval and processing platform. GEE also provides access to the vast majority of freely available, public, multi-temporal RS data and offers free cloud-based computational power for geospatial data analysis. Artificial intelligence (AI) methods are a critical enabling technology for automating the interpretation of RS imagery, particularly in object-based domains, so the integration of AI methods into GEE represents a promising path towards operationalizing automated RS-based monitoring programs. In this article, we provide a systematic review of relevant literature to identify recent research that incorporates AI methods in GEE. We then discuss some of the major challenges of integrating GEE and AI and identify several priorities for future research. We developed an interactive web application designed to allow readers to intuitively and dynamically review the publications included in this literature review.

Keywords: Google Earth Engine (GEE); artificial intelligence (AI); machine learning; deep learning; computer vision; remote sensing; cloud computing; geospatial big data; review

Citation: Yang, L.; Driscol, J.; Sarigai, S.; Wu, Q.; Chen, H.; Lippitt, C.D. Google Earth Engine and Artificial Intelligence (AI): A Comprehensive Review. Remote Sens. 2022, 14, 3253. https://doi.org/10.3390/rs14143253

Academic Editor: Jaime Zabalza

Received: 2 May 2022; Accepted: 2 July 2022; Published: 6 July 2022

1. Introduction and Motivation

Big data approaches have been making substantial changes in science and in society at large [1,2]. Geospatial big data, which are collected with ubiquitous location-aware sensors and are inherently geospatial [3], constitute a significant portion of big data. The size of such data is growing rapidly, by at least 20% per year [4]. The United Nations Initiative on Global Geospatial Information Management (UN-GGIM) estimated that 2.5 quintillion bytes of data (one quintillion bytes = 1000 petabytes (PB); 1 PB = 1000 terabytes (TB)) are being generated every single day, a large portion of which is location-aware. About 25 PB of data are being generated per day at Google, a significant portion of which is spatio-temporal data [4]. This trend will only accelerate as the world becomes more mobile and as unoccupied aircraft systems (UAS) and satellite imagery are acquired more often and at higher resolutions [5]. Along with this exponential increase in geospatial big data, the need for cloud computing and high-performance computing for modeling, analyzing, and simulating geospatial content is also rapidly increasing [4]. Geospatial big data have recently gained attention from researchers and practitioners in geographic information
science (GIScience) and remote sensing (RS) [6]. Efficient collection, management, storage,
analysis, and visualization of big data have become critical for the development of intelli-
gent decision systems and provide unprecedented opportunities for business, science, and
engineering [7]. Handling the 5 “Vs” (volume, variety, velocity, veracity, and value [8]) of
big data is still a very challenging task. This is even more challenging for RS imagery due
to its large volume (i.e., high resolution and multiple bands) and long timespan; geospatial
big data pose significant challenges to conventional geographic information systems (GIS)
as well as RS approaches and platforms [9–13].
Geospatial big data, especially RS big data, have posed substantial challenges due
to their large volume, high spatial-temporal resolution, and complexity. One of the very
promising and practical solutions for analyzing RS big data is Google Earth Engine (GEE).
GEE is a scalable, cloud-based geospatial retrieval and processing platform. It also provides
access to the vast majority of freely available, public, multi-temporal RS data and offers free
cloud-based computational power for geospatial data analysis [14–16]. More specifically,
GEE provides free access to a multi-PB archive of geospatial datasets spanning over 40 years
of historical and current Earth observation (EO) imagery, including satellite imagery (e.g.,
Sentinel from the European Space Agency (ESA), Landsat from the United States Geologi-
cal Survey (USGS), Moderate Resolution Imaging Spectroradiometer (MODIS) from the
National Aeronautics and Space Administration (NASA), the Cropland Data Layer (CDL)
from the United States Department of Agriculture's (USDA) National Agricultural Statistics Service (NASS), and the National Agriculture Imagery Program (NAIP), also from
the USDA), airborne imagery, weather and climate datasets, as well as digital elevation
models (DEMs) [14,16]. Those RS data can be efficiently imported and processed on the
cloud platform, avoiding the need to download data to local computers for processing [17].
Along with computing and storage resources, GEE also supports many RS algorithms
(e.g., image enhancement, image classification, and cloud masking), which are readily
accessible and customizable and allow data processing and visualization at different scales
through JavaScript or Python Application Program Interfaces (APIs) [14,16,18,19]. These ca-
pabilities reduce most of the time-consuming preprocessing steps needed in traditional RS
approaches. The computational power of GEE along with its comprehensive data catalog
and data processing methods make GEE an ideal platform for solving geospatial big data
problems. GEE allows researchers and practitioners to focus on developing and solving
their domain problems by making it easier to retrieve data and algorithms and to compute
all in one place. For example, the Landsat archive on GEE is already preprocessed for
atmospheric and topographic effects—this saves researchers and practitioners a substantial
amount of time and effort in terms of downloading and preprocessing data [16]. GEE,
with free planetary-scale geospatial big data (solving the data availability, data storage, and data preprocessing challenges) and free computing resources, facilitates computationally
cumbersome geospatial big data analysis for researchers and practitioners with minimal
local computing and storage resources. GEE, in the parlance of the RS Communication
Model, reduces the number of channels required to construct an RS system, and therefore
the time required to go from query to result [20]. Researchers from a wide range of fields are
able to generate multiscale (local, national, regional, continental, and global scale) insights
that would have been nearly impossible without the geospatial big data and computing
capacity available in GEE [21].
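To make this workflow concrete, the minimal sketch below uses the GEE Python API to build a cloud-masked, multi-temporal Landsat 8 median composite entirely server-side, with no imagery downloaded locally. The dataset ID is real, but the date range and region of interest are illustrative assumptions rather than settings from any reviewed study.

```python
import ee

ee.Initialize()  # assumes prior authentication via `earthengine authenticate`

# Placeholder region of interest (longitude/latitude rectangle).
roi = ee.Geometry.Rectangle([-106.8, 34.9, -106.4, 35.3])

def mask_l8_sr(image):
    """Mask clouds and cloud shadows using the Collection 2 QA_PIXEL bitmask."""
    qa = image.select('QA_PIXEL')
    cloud = qa.bitwiseAnd(1 << 3).eq(0)    # cloud bit
    shadow = qa.bitwiseAnd(1 << 4).eq(0)   # cloud-shadow bit
    return image.updateMask(cloud).updateMask(shadow)

# Landsat 8 surface reflectance, filtered in space and time, cloud-masked,
# and reduced to a single median composite on the server side.
composite = (
    ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
    .filterBounds(roi)
    .filterDate('2021-05-01', '2021-09-30')
    .map(mask_l8_sr)
    .median()
    .clip(roi)
)
print(composite.bandNames().getInfo())
```

The same composite can then be handed directly to GEE's built-in classifiers or exported, which is the pattern most of the reviewed studies follow.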
GEE provides the free cloud-computing platform to tackle geospatial big data chal-
lenges, and recent substantial advances in artificial intelligence (AI) can and will further
elevate the power of GEE. We cover three of AI’s main subdisciplines in this paper: com-
puter vision (CV), machine learning (ML) and its subdomain, deep learning (DL). These
technologies are central to leveraging big data for applications in many domains and have
achieved significant advances in a wide range of applications that have a high social impact,
such as damage assessment and prediction of natural disasters (e.g., automatic flooding
damage assessment [1] and wildfire prediction [22]) and healthcare [23–25]. Geospatial
artificial intelligence (GeoAI) combines methods in spatial science (e.g., GIScience and RS),
AI, data mining, and high-performance computing to extract meaningful knowledge from
geospatial big data [26]. GeoAI stems from GIScience methods applied to RS data but has
advanced the field of AI to solve geospatial-specific big data challenges and problems.
There are substantial separate bodies of research covering AI (especially CV, ML and DL)
and GEE. However, much less research directly combines AI and GEE. Allowing researchers
and practitioners to harness the power of both GEE and AI for their research and real-world
problems is the core motivation for us to investigate a range of recent developments that
combine GEE and AI. Thus, our paper can serve as an academic bridge for researchers and
practitioners in GEE and AI, highlighting how scientists are using GEE and AI and in which
domain areas. Researchers and practitioners in GEE and AI can draw on each other's strengths and thus move the science forward more effectively and efficiently, making it possible to tackle global challenges such as those related to climate change.

1.1. Selection Criterion for Reviewed Papers and Brief Graphic Summary
There is a substantial body of work on GEE (e.g., see recent reviews [14,18,27–29]) and
AI for RS (especially DL, ML, and CV used in an RS setting; see recent reviews in [30,31]),
respectively. However, much less research has gone into detailing the integration of GEE
with AI. In the literature review process, we initially identified 500+ papers relevant to GEE.
We then performed a systematic search based on the following strategies: (1) keyword
search on Google Scholar: the keywords used for our literature search are “Google Earth
Engine” AND “machine learning” OR “deep learning” OR “computer vision”; (2) reference
tracking: we went through the papers cited in recent GEE reviews ([14,18,27–29]) (i.e., the
“References” list of the papers) and also tracked the last two years’ worth of new papers
citing the existing GEE review papers on their Google Scholar page. Note that our search
was restricted to research articles published in English and in peer-reviewed journals or
conference proceedings. A total of 200 highly relevant articles were identified
by excluding the papers that purely use GEE for RS data download or those that do not
use AI (including its branches CV, ML, DL). Figure 1 shows the spatial distribution and
statistics summary of the papers covered in this review. The number of published papers by
year (2015 to 2022) has dramatically increased since 2019. “Remote Sensing” and “Remote
Sensing of Environment” are the leading journals where most GEE and AI papers are
published. In addition, most first authors’ institutions are based in China and the United
States. (Note that a freely accessible interactive version of the map and all charts throughout
the paper can be accessed via our web app tool; the web app tool URL and its brief demo
video are provided in Appendix A).

1.2. Roadmap
Here, we provide a roadmap to the rest of the paper. Section 2 outlines the scope
of this review and our intended audience. Section 3 is the core of the paper, focused
on identifying important and recent developments and their implications in terms of
applications (Section 3.2) and novel methods (Section 3.3) that leverage GEE and AI.
Section 3 covers a wide array of recent research combining GEE and AI from multiple
domains with many cross-connections. The paper concludes in Section 4 with a discussion
of key challenges and opportunities, from both application (Section 4.2) and technical
(Section 4.3) perspectives. Specifically, we focus on the main challenges preventing GEE
and AI integration, as well as some possible future research directions. To make the
substantial number of papers we reviewed (200 total) more transparent and easier to
retrieve and understand, we developed an interactive web tool (see Appendix A for details).
As evaluation metrics are essential for measuring the performance of AI/ML/DL/CV
models, we provide a set of commonly used evaluation metrics in Appendix B. To make
the main text of the paper concise, each application area detailed in Section 3.2 contains a
table and brief textual summary of the papers in that field. However, a more detailed and
comprehensive summary for each section can be found in Appendix C for those that are
interested. Lastly, as there are plenty of acronyms in this paper, we provide a full list of
abbreviations right before the Appendices A–C.

Figure 1. Geospatial distribution and overview statistics of the reviewed 200 papers. (a) Spatial distribution of reviewed papers based on the first author's institution location, (b) number of published papers by year from 2015 to early 2022, (c) journals the review papers are published in, and (d) country distribution. Note that a freely accessible, interactive version of the map and all charts in this paper can be accessed via our web app tool (the web app tool URL and its brief demo video are provided in Appendix A).
2. Scope and Intended Audience

It is very challenging to repeatedly produce up-to-date and accurate maps and obtain up-to-date and accurate information, especially at large scales, for many important applications and monitoring systems due to time, effort, and cost. As larger volumes of
geospatial data become available, an ever-increasing number of techniques for analyzing them has expanded the number and scope of monitoring applications (e.g., global water
mapping [32], forest and deforestation monitoring [33], and global climate change research [34]).
Downloading, analyzing, and managing a multi-decadal time series of satellite im-
agery over large areas is not practical using desktop computing resources [35]. Complement-
ing huge volumes of open-access satellite data are new technologies and services (e.g., cloud
computing, AI) that are shifting the manner in which RS data are used for environmental
monitoring. These technologies and platforms present opportunities for new advances in data
collection for monitoring climate change mitigation, particularly where traditional means
of data exploration and analysis, such as government-led statistical census efforts, are costly
and time consuming [36]. GEE [37], launched on 2 December 2010, receives significant
attention in the earth science community because it provides free and dedicated geospatial
data resources and services. These services include RS imagery storage, preprocessing
routines, and hosting AI algorithms all in one place. Similar cloud computing platforms
and services that support geospatial data hosting and/or computation include the NASA
Earth Exchange (NEX) [38] (2013) and Geostationary-NEX (GeoNEX) [39] (2020), Earth on
Amazon Web Services (AWS) [40] (launched in September 2016), Microsoft’s Azure services
(Geospatial Analytics in Azure Stream Analytics) [41] (launched in 2017) and Microsoft
Planetary Computer [42] (launched in December 2020). GEE has a bigger community, more
data, and more algorithms all in one place whereas these other systems came later and
are not as robust. Building on the emerging need to process and analyze big RS data with GEE and AI via cloud computing for many domain problems, one of our major goals, as with any review paper, is to survey recent work on GEE with AI and to provide suggestions for new directions built upon these evolving methods. Another important goal of this paper is
to provide a bridge between GEE and AI researchers and practitioners, especially those
who have interdisciplinary backgrounds and expertise. It is our hope that this paper helps
move towards a smoother and deeper integration of GEE with AI. As our comprehensive
investigation shows, there are still several open challenges preventing researchers from
using GEE and AI for their research (for example, the lack of options for those interested in
using DL models on the platform, detailed further in Section 4.1.2).
This comprehensive review is relevant to any research, practice, and education do-
mains that could take advantage of RS imagery coupled with AI and cloud computing,
including, but not limited to RS, GIScience, earth science, geosciences, computer science,
data science, information science, hydroinformatics, and image analysis. This paper does
not attempt to review publications that use GEE without utilizing AI methods (for recent
reviews of GEE, see [14,18,28,29]), nor to review DL with RS (recent reviews on DL with
RS can be found in [30,31,43,44]). This review focuses on investigating recent GEE work
that has integrated with AI, including its branches ML, DL, and CV, for a wide range of applications (e.g., crop, wetland, and water mapping, detailed in Section 3.2). From
our review, only a small subset of papers contribute significantly towards implementing
novel AI architectures or methods within GEE (detailed in Section 3.3).
To our knowledge, no review solely focuses on the combination of GEE and AI. Our
review has a narrowed scope of GEE integrated with AI, but a wider and deeper scope in
terms of AI methods and metrics applied to many domains on the GEE platform. Other GEE
review papers have sections for ML models (“X papers use random forest (RF) models and
Y use Support Vector Machines (SVMs) . . . ”) as part of a general GEE review, but do not
explicitly focus on the implementation of AI models in GEE. In addition, what significantly
distinguishes our review from other GEE survey papers is an interactive web app, named
iLit4GEE-AI (see Appendix A). We developed this app to allow our readers to intuitively
retrieve relevant GEE with AI literature that fits their needs. For example, a user can very
quickly filter the published articles to find those that used RF models and the F1-score for wetland
mapping. Most importantly, iLit4GEE-AI will serve as a live and interactive literature
repository for integrating GEE with AI, as we will continue to update the data in the web
app. In the future, we hope our web app will serve as an important and up-to-date resource
for the GEE and AI research and practice community. Through our deep, thorough, and
interactive investigation (see Appendix A for a visual, interactive investigation using our
web app iLit4GEE-AI), we hope to develop a basis for a smoother and deeper integration of GEE
and AI, which will help move many domains forward. Further, many of the domains presented
in this paper (Section 3.2) are highly related, as different aspects of our environment are
inherently linked. By aggregating research across domains and making it searchable and
filterable, we hope to spur innovation, collaboration, and code sharing between researchers
in the pursuit of tackling cross-disciplinary, complex issues such as those related to global
warming. For example, water body identification, deforestation monitoring, and wildfire
detection are all separate domains, but researchers and practitioners in different domains
may use common data sources, processing methods, and algorithms in their final results.
As we continue to compile papers written at the intersection of GEE and AI via our web
app tool iLit4GEE-AI, it will become easier for researchers to find relevant literature and
code resources even if they are from different areas of study.

3. The State of the Art: GEE with AI


In this section, we first provide an overview of the 200 reviewed studies using GEE and
AI (Section 3.1). Then we investigate studies leveraging GEE with AI from the perspective of
applications (Section 3.2) and highlight those with novel methods in Section 3.3.

3.1. Overview of the Reviewed Studies


In this paper, we have reviewed 200 papers. A word-cloud visualization of the titles and keywords of the 200 reviewed papers is provided in Figure 2. The word cloud gives an informative picture of both the general and specific focus of those reviewed papers.
The most frequently used words are “Google Earth Engine”, “classification”, “imagery”,
“machine learning”, “mapping”, “remote sensing”, and “detection”. We can also see that
there are specific keywords, such as “cloud”, “water”, “forest”, “crop”, “soil”, “fire” and
“urban” reflecting many of the categories we identified for this review paper. Additionally,
“Landsat”, “Sentinel-1”, and “Sentinel-2”, “SAR” (or synthetic aperture radar) as well as
“China”, “Brazil”, and “Asia” detail some of the many study areas from these 200 papers.
Note that we only included and reviewed the papers that integrate GEE and AI (including
its branches ML/DL/CV).

Figure 2. Word-cloud visualization of all the reviewed 200 papers that leverage GEE and AI.

Figure 3 shows that most published work leveraging the power of GEE integrated with AI is still at the application stage and that there is room to develop novel methods to advance earth observation in relevant fields. To break this down further, in Figure 3b we can see that ML is the dominant method, and in Figure 3c the most-used tasks are classification tasks. In Figure 4, the primary applications that have applied GEE integrated with AI are crop, LULC, vegetation, wetland, water, and forest mapping, and the primary study areas are China, Brazil, and the United States. The most-used RS data types are Landsat 8 OLI and Sentinel-2. From Figure 5, we see that the most-used ML models are RF, SVM, and CART, while the top evaluation metrics used are overall accuracy (OA), producer’s accuracy (PA), user’s accuracy (UA), and Kappa. (Note that a freely accessible interactive version of the map and charts can be accessed via our web app tool; the web app tool URL and its brief demo video are provided in Appendix A; also, in Appendix B, we provide our resources for an introduction to a list of commonly used evaluation metrics.)
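As a hedged illustration of how these commonly reported metrics are typically obtained on GEE, the sketch below trains a random forest on labeled sample points and derives OA, Kappa, PA, and UA from a confusion matrix. The feature collection, its `landcover` property, the asset IDs, and the band list are hypothetical placeholders, not values from any reviewed study.

```python
import ee

ee.Initialize()

# Hypothetical inputs: a predictor composite and labeled points with a 'landcover' property.
composite = ee.Image('users/example/some_composite')               # placeholder asset ID
samples = ee.FeatureCollection('users/example/training_points')    # placeholder asset ID
bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7']

# Sample the image at the labeled points and split into training/validation sets.
data = composite.select(bands).sampleRegions(
    collection=samples, properties=['landcover'], scale=30
).randomColumn('rand', 42)
training = data.filter(ee.Filter.lt('rand', 0.7))
validation = data.filter(ee.Filter.gte('rand', 0.7))

# Train a random forest and classify the held-out validation samples.
rf = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=training, classProperty='landcover', inputProperties=bands
)
validated = validation.classify(rf)

# Confusion matrix and the metrics most often reported in the reviewed papers.
cm = validated.errorMatrix('landcover', 'classification')
print('OA:',    cm.accuracy().getInfo())
print('Kappa:', cm.kappa().getInfo())
print('PA:',    cm.producersAccuracy().getInfo())   # per-class producer's accuracy
print('UA:',    cm.consumersAccuracy().getInfo())   # per-class user's (consumer's) accuracy
```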

Figure 3. Overview statistics of methods used in the reviewed 200 papers. (a) Method or application oriented, (b) methods at macro level (CV, ML, or DL), (c) methods at detailed level (classification, regression, or segmentation).

Figure 4. Statistics of the reviewed papers in terms of application focus (a), study area (b), and RS data type used (c).
Figure 5. Statistics of models compared, and evaluation metrics used in the reviewed 200 papers. (a) Models used and compared in the reviewed studies and (b) evaluation metrics used.

3.2. Advances in Applications

We organized the following subsections according to total citation count. Thus, readers will start in the thematic research area with the highest number of citations that use ML/DL/CV on GEE. As readers move through Section 3.2, they will then be covering topics with a less developed presence on GEE (that the authors are aware of) that also utilize ML/DL/CV. For each subsection, a table with information such as study area, RS data type, and what sort of ML/DL model or CV algorithm the authors used will accompany each reference. Note that each table in this section is ordered chronologically to show trends in data type and model usage. Each table will be accompanied by a word cloud showing terms from paper titles and keywords given by the authors. For Sections 3.2.8–3.2.18, there are not enough publications to make the word clouds informative, so in addition to titles and keywords we also include the abstract text. Below each table are accompanying summaries for each reference in the table. References with an “*” next to them denote that a specific paper used novel methods or went beyond a straight-forward application of data and methods on GEE. In this paper, we define a very narrow view of what a novel method is: using ML/DL/CV models and algorithms in new ways on GEE (see Section 3.3 for more details on novel GEE methods). This means that even if a paper combined data in a new way or formed a new data preprocessing method, their paper was deemed an application since our focus is on ML/DL/CV methods. Each paper we reviewed in Section 3.2 is grouped into a specific subsection, ranging from 2 total citations (Bathymetric mapping) to 37 citations (Crop mapping).

Note that papers could be divided up into several different sections. Thus, there is some subjectivity in this assignment of categories to different papers, so readers should be aware that there is overlap. Generally, most papers could be classified as covering LULC or “land use and land cover”. Agriculture, vegetation, water, and forests are all classes commonly classified in LULC analyses. Our intention was to represent the focus of a given paper. For example, if a paper predicted for several classes in their analysis but their focus was on producing deforestation maps, even for papers that had “LULC” in their title, then their paper will be under “Forest and deforestation monitoring”. Similarly, if a paper was creating vegetation maps of tidal flats, then this paper is under “Vegetation mapping” and not “Wetland mapping”. As another example, if authors were monitoring vegetation or water indices in RS imagery but their goal was to monitor reclamation progress or pollution levels at mining sites, their papers would be found under “Heavy industry and pollution monitoring”. Only in the case where the goal was to expressly create a general LULC map would that paper go under “Land cover classification”.
3.2.1. Crop Mapping


Crop mapping is the most-well-developed application using GEE and AI (37 studies).
Table 1 below summarizes those studies and a word cloud generated from the titles and
keywords of the 37 studies is provided in Figure 6. The most frequently used words are
“Google Earth Engine”, “mapping”, and “classification”. However, the term “yield” also
reflects that regression tasks such as yield prediction are almost as common as classification
tasks such as producing maps. Additionally, specific words such as “30-m” and “Asia”
indicate the spatial resolution and coverage of the RS imagery used in the reviewed papers.
From our interactive web app (see Appendix A) and Table 1, Landsat 8 Operational Land
Imager (OLI), Shuttle Radar Topography Mission (SRTM) DEM, and Sentinel-2 are the most-
used RS data. The most popular AI models are RF, SVM, Classification and Regression Trees
(CART), and k-means models; and the most-used evaluation metrics are user’s accuracy
(UA), producer’s accuracy (PA), Kappa, and R2 . A brief summary of those studies is
provided right below Table 1. More detailed textual summaries for most of the reviewed
crop mapping studies are provided in Appendix C.1.

Table 1. Studies targeting crop mapping from RS imagery using AI (Note that references marked *
denote novel methods and will be detailed in Section 3.3).

References Method Model Comparison RS Data Type Study Area


Cropland Data Layer,
Lobell et al. (2015) [45] regression multiple linear regression United States
Landsat 5, Landsat 7
CART, ensemble NN,
Landsat 8 OLI,
Shelestov et al. (2017) [46] classification IKPamir, MLP, NB, Ukraine
Landsat 8 TOA
RF, SVM
Landsat 8 OLI TOA,
classification,
Xiong et al. (2017) [47] RF, RHSeg, SVM Sentinel-2 MSI TOA, Africa (continent)
segmentation
SRTM DEM
Africover, CUI, FROMGC,
GCEV1, GLC 2000,
Global30, Globcover,
Xiong et al. (2017) [48] classification DT, k-means Africa (continent)
GRIPC, IKONOS, LULC
2000, MCD12Q1, MYD13,
QuickBird, WorldView 2
AIM-RRB, Cropland Data
Layer, Landsat 5 TM,
classification,
Deines et al. (2017) [49] CART, RF Landsat 7 ETM+, Landsat United States
regression
8 OLI, NLCD, USGS
MIrAD-US
Google Earth Pro, Landsat
7 TM, Landsat 8 OLI,
Teluguntla et al. (2018) [16] classification RF Australia, China
National Geospatial
Agency
Landsat 8 TOA, SRTM
Kelley et al. (2018) [50] classification RF Nicaragua
DEM
Landsat 7, Landsat 8,
k-means, region growing MOD09A1, MOD09Q1,
Ragettli et al. (2018) [51] classification Kazakhstan, Kyrgyzstan
clustering algorithm, RF MYD09A1, MYD09Q1,
Sentinel 2, SRTM DEM
Ghazaryan et al. (2018) [52] classification decision fusion, RF, SVM Landsat 8 OLI, Sentinel-1 Ukraine
Mandal et al. (2018) [53] classification k-means Sentinel-1 India
Brunei, Cambodia,
GeoEye, Landsat 7 ETM+,
Indonesia, Japan, Laos,
Landsat 8 OLI, National
Malaysia, Myanmar,
Oliphant et al. (2019) [54] classification RF Geospatial Agency,
North Korean, Philippines,
Quickbird, SRTM DEM,
South Korean, Thailand,
WorldView
Vietnam
CNN, CNN-LSTM hybrid, Cropland Data Layer,
Sun et al. (2019) [55] regression United States
LSTM MOD11A2, MOD09A1
Wang et al. (2019) [56] classification ANN, CART, RF, SVM Sentinel 2 China
Sentinel-1, Sentinel-2 MSI,
Tian et al. (2019) [57] classification RF China
SRTM DEM
Cropland Data Layer,
GIAM, GMIA, LANID,
Landsat 5, Landsat 7,
Xie et al. (2019) [58] classification RF United States
Landsat 8, MIrAD-US,
MOD11A2, MOD13Q1,
NAIP, NLCD

classification, Sentinel-1, Sentinel-2 MSI,
Jin et al. (2019) [59] linear regression, RF Kenya, Tanzania
regression MYD11A2
Rudiyanto et al. (2019) [60] classification ANN, k-means, RF, SVM Google Earth, Sentinel-1 Indonesia, Malaysia
Cropland Data Layer,
Wang et al. (2019) [61] classification GMM, k-means, RF Landsat 5 TM, Landsat 7 United States
ETM+, Landsat 8 OLI
Cropland Data Layer,
Liang et al. (2019) [62] classification CART Landsat 5 TM, Landsat 8 United States
OLI
Landsat 7, Landsat 8,
Tian et al. (2019) [63] classification DT MOD13Q1, Sentinel-2, China
SRTM DEM
Neetu et al. (2019) [64] classification CART, RF, SVM Sentinel-2 MSI India
GeoEye, GDEM, IKONOS,
Bangladesh, Bhutan, India,
Gumma et al. (2020) [65] classification RF Landsat 7, Landsat 8,
Nepal, Pakistan, Sri Lanka
Quickbird, WorldView
BGT, BST, DT, GPR, KNN,
Han et al. (2020) [66] regression MODIS Terra China
NN, RF, SVM
Google Earth, Landsat 7
ETM+, Landsat 8 OLI, Asia, Europe, Middle East
Phalke et al. (2020) [67] classification CART, RF, SVM
National Geospatial (multiple countries)
Agency, SRTM DEM
Landsat 8 OLI, Rapid Burkina Faso, Mali,
Samasse et al. (2020) [19] classification RF
Land Cover Mapper Mauritania, Niger, Senegal
JRC Global Surface Water,
Chen et al. (2020) [68] classification RF China
Sentinel-1, Sentinel-2 MSI
Canada’s Annual Crop
classification,
Amani et al. (2020) [69] * ANN, SNIC Inventory, MCD12Q1, Canada
segmentation
Sentinel-1, Sentinel-2
Google Earth, Sentinel-1,
You and Dong (2020) [70] classification RF China
Sentinel-2 MSI
Poortinga et al. (2021) [71] segmentation U-Net NICFI Planet Thailand
classification, DnCNN, RF, SegNet, Sentinel-1, Sentinel-2,
Adrian et al. (2021) [72] * United States
segmentation U-Net, 3D U-Net WorldView 3
Cao et al. (2021) [73] regression CNN, DNN, LSTM, RF MOD13A2, SRTM DEM China
Luo et al. (2021) [74] classification RF, SNIC Sentinel-1 China
Ni et al. (2021) [75] classification SVM Sentinel-2 China
FROM-GLC10, MOD15A3,
Sun et al. (2022) [76] regression RF China
Sentinel-2
classification,
Li et al. (2022) [77] RF, SNIC Google Earth, Sentinel-2 China
segmentation
Landsat 5 TM, Landsat 8
Han et al. (2022) [78] classification RF China
OLI, Sentinel-2
ALOS PALSAR, Google
fuzzy rules, Maximum
Hedayati et al. (2022) [79] classification Earth, Landsat 8, Iran
Likelihood
MYD11A2

Creating country-level, crop-specific maps using RS data can be difficult because of


the large amount of data involved. GEE provides data storage and online processing capa-
bilities, greatly ameliorating the issues with downloading data and managing computing
resources. Several algorithms (CART, IKPamir, logistic regression, an MLP, NB, RF, and an SVM) were compared on GEE in [46] for crop-type classification in Ukraine. The authors also used an ensemble NN but had to move off the GEE platform because NNs were not supported on GEE at the time. It is often difficult to map croplands on a large scale using RS imagery because of a lack of ground-truth validation data. However, there are also problems relating to differing cultivation techniques and definitions of what makes up cropland. To address these issues, the authors in [67] collected large amounts of training points from Google Earth imagery and analyzed Landsat and DEM data to create a cropland data layer across Europe, the Middle East, and Russia. Accurate classification and mapping of crops is essential for supporting sustainable land management. A two-step approach for crop identification in the central region of Ukraine was developed in [52] by exploiting intra-annual variation of temporal signatures of remotely sensed observations (Sentinel-1 and Landsat images) and prior knowledge of crop calendars. Crop maps are often created using vegetation indices and field observation data. The authors in [73] argued that this may lead to datasets and ML models that can only predict in specific areas and not generalize up to larger areas (i.e., regions or countries) or to other time periods in the same area. They further argued that what is needed is a more generalized method that can take in information like weather and climate data or DEM data and scale up to field-level predictions or larger.

Figure 6. Word-cloud visualization of all the reviewed papers targeting crop mapping (i.e., those 37 papers summarized in Table 1).

Agricultural expansion can cause harmful effects to ecosystems and their levels of
biodiversity. Producing crop-type maps using RS imagery and ML is one way to help mon-
itor agricultural expansion over large areas, and these maps in turn can help policymakers
and land-use managers make more informed decisions about current and future land use.
However, creating the maps themselves normally requires a lot of data and it is not a
straightforward task to pick an ML model that will perform well with that data. There is
also the concern that the predictions from that ML model will be uninterpretable, given that
many ML and DL models are so-called “black boxes”. To get around this issue, the authors
in [79] trained a maximum likelihood model and a fuzzy-rules classifier to determine paddy
rice distribution in Iran. Plants look very different in RS imagery depending on the type of
imagery that you use, but also over the course of a plant’s lifetime. This is especially true
of crops like rice, so it is important to incorporate phenological information in order to be
able to monitor it over time. Over a three-year time period, the authors in [75] were able
to map paddy rice using Sentinel imagery by utilizing several different spectral indices
and creating composites of different paddy rice growth periods. Continued agricultural
expansion threatens many ecosystems around the globe with high levels of biodiversity.
Being able to monitor agricultural expansion is one part in being able to make timely
decisions related to water and soil health in addition to pollution levels caused by fertilizer
use. Mapping croplands over a large scale with NNs and high-resolution RS imagery has
resulted in highly accurate maps, but NNs are computationally expensive to train. A U-Net
was used in [71] to map sugarcane in Thailand, with a lightweight NN as the encoder for the DL model to reduce computing costs. Sugarcane grows in rainy conditions in complex
landscapes, making mapping it difficult. However, using phenology information can help
identify sugarcane in high-resolution RS imagery, as shown in [56]. The performance of an ANN was compared against CART, RF, and SVM models on GEE for sugarcane mapping in
China using Sentinel-2 imagery. Shade-grown coffee landscapes are critical to biodiversity
in the forested tropics, but mapping it is difficult because of mountainous terrain, cloud
cover, and spectral similarities to more traditional forested landscapes. Landsat, precip-
itation, and DEM data were used in [50] to map shade-grown coffee in Nicaragua using
an RF model. Accuracy scores across different land class types (including shade-grown
coffee) were high; a relative variable-importance analysis was also performed to determine which data contributed most to the RF model's performance. It is difficult to know beforehand the effect different datasets will have on producing LULC maps. It is therefore useful to compare the performance of an ML classifier on different datasets like Landsat and Sentinel imagery, so
that future researchers know which datasets fit their application. The differences between
Landsat and Sentinel imagery were explored in [78] for identifying cotton in China over
the course of the plant’s life cycle.
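The general idea behind such phenology-aware inputs is sketched, in hedged form, below: per-period NDVI composites from Sentinel-2 are stacked into one multi-band image that a classifier can then consume. The date windows, cloud threshold, and region are illustrative assumptions, not the settings of any particular reviewed study, and real studies derive the windows from local crop calendars.

```python
import ee

ee.Initialize()

roi = ee.Geometry.Rectangle([119.0, 30.0, 120.0, 31.0])  # placeholder region

def ndvi_composite(start, end):
    """Median NDVI over one assumed growth window (e.g., transplanting or heading stage)."""
    s2 = (
        ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
        .filterBounds(roi)
        .filterDate(start, end)
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
    )
    add_ndvi = lambda img: img.normalizedDifference(['B8', 'B4']).rename('NDVI')
    return s2.map(add_ndvi).median()

# Illustrative growth-period windows for a single season.
periods = [('2021-05-01', '2021-06-15'),
           ('2021-06-16', '2021-08-01'),
           ('2021-08-02', '2021-09-30')]

# Stack one NDVI band per growth period into a single phenology feature image.
stack = ee.Image.cat([
    ndvi_composite(s, e).rename(f'NDVI_p{i}') for i, (s, e) in enumerate(periods)
]).clip(roi)
print(stack.bandNames().getInfo())  # ['NDVI_p0', 'NDVI_p1', 'NDVI_p2']
```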
Crop maps are increasingly being produced at the national and global levels, but this
process requires a lot of compute resources. Cloud computing offers free access to data and
computing, yet many studies producing crop maps and crop yield estimates do not take
advantage of these resources. In the United States, crop yield estimates for soybeans start very
late in the season, but early estimates are needed to inform management decisions like when
to harvest. The authors in [55] used a CNN–LSTM hybrid model to predict soybean yield in
the contiguous United States using RS imagery alongside weather data and showed that the
hybrid approach works better than either CNN or LSTM alone, although the results were
better in some states than others. Additionally, the authors created combinations of input
data to determine which variables were most important in training their NN. Still, the authors
had to move their DL training off the GEE platform because, at the time, it did not support NN architectures. Many variables, including climate/weather, fertilizer, soil, economic,
and hydrological data, can be incorporated into crop yield prediction simulation models.
However, the amount of data needed to make these crop models accurate is often not available in specific countries or is too time-consuming and cost-intensive to collect and maintain. RS
imagery can help fill this need by providing open data over long temporal scales and global
coverage, regardless of country. The authors in [66] demonstrated that by using climate and
soil data with RS imagery on the GEE platform, it was possible to predict winter wheat yields
1–2 months ahead of harvesting in China. Producing crop type maps is often a useful first step
in predicting crop yield. However, crop type maps that are derived from lower-resolution RS
data suffer from uncertainties in areas where soil, crops, and plants are heavily mixed. Current
cropland products only focus on a subset of staple crops. Optical and SAR Sentinel data were
combined in [72] to create higher-resolution maps capable of displaying information on less
commonly mapped non-staple crops in the US.
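The sketch below illustrates, in hedged form, the kind of optical and SAR stacking such studies describe: Sentinel-1 VV/VH backscatter composites are appended to a Sentinel-2 composite so that one image can feed a single classifier. The orbit, polarization, band, and date choices are assumptions for illustration only.

```python
import ee

ee.Initialize()

roi = ee.Geometry.Rectangle([-93.5, 41.5, -93.0, 42.0])  # placeholder region
start, end = '2021-05-01', '2021-09-30'

# Sentinel-2 optical median composite (surface reflectance, low-cloud scenes only).
s2 = (
    ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
    .filterBounds(roi).filterDate(start, end)
    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
    .median()
    .select(['B2', 'B3', 'B4', 'B8', 'B11', 'B12'])
)

# Sentinel-1 C-band SAR: mean VV and VH backscatter over the same window.
s1 = (
    ee.ImageCollection('COPERNICUS/S1_GRD')
    .filterBounds(roi).filterDate(start, end)
    .filter(ee.Filter.eq('instrumentMode', 'IW'))
    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
    .select(['VV', 'VH'])
    .mean()
)

# Fused optical + SAR feature stack; a classifier can be trained on this image directly.
fused = s2.addBands(s1).clip(roi)
print(fused.bandNames().getInfo())
```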
It is challenging to map cropland extent over large countries or regions in a rapid,
repeatable, and accurate manner. This is in part due to the large amount of RS imagery
that is usually required to make these maps, in addition to needing to access validation
datasets in comparable formats across geo-political boundaries. Even when this is possible,
crop maps are created using coarse RS imagery, limiting the utility of the output crop maps.
In [16], the authors fed RS imagery with elevation and government data in Australia
and China into an RF model to produce crop extent maps at 30 m, 250 m, and 1 km
resolutions. It is difficult to achieve continuous, cloud-free imagery in Australia and China
over time, so this analysis depends on creating bi-monthly composites. The authors noted
that this analysis could have benefitted from a larger dataset in addition to comparing more
classification algorithms to help reduce uncertainties from the RF model. Leaf area index (LAI) and the fraction of photosynthetically active radiation (FPAR) are two important features when trying to
produce crop extent maps and crop yield estimates. However, most current products for
producing crop extent maps and crop yield estimates are derived from low-resolution RS
imagery. In order to produce these maps and estimates at a higher resolution, the authors
in [76] utilized GEE, Sentinel-2 and field data to train an RF to first estimate LAI and FPAR
at a much finer spatial scale.
Global crop maps often fail to capture small farms because the resolution of the RS
imagery used to create the maps is too coarse. Additionally, agricultural areas change over
time and so the underlying validation (which is hard to acquire in the first place) often changes.
Thus, producing high-resolution maps of agricultural areas that can accurately track crop production over time has proved difficult. Landsat-8 and Sentinel-2
imagery were combined in [47] with elevation data to produce a crop map across continental
Africa on the GEE platform. Crop maps that are produced to cover a large area are often
created from coarse RS imagery. This poses problems with identifying small or fragmented
farms, as well as farms that are mixed-use or have several crop types over the same small area.
Several attempts have been made to map land-use classes over large areas, but these maps do
not focus specifically on crops and so their utility to food production studies is limited. To
address these issues, [54] used RS imagery from several different platforms (GeoEye, Landsat,
NGA, Quickbird, WorldView) to produce a 30-m resolution crop map for Southeast and
Northeast Asia. Using an RF model, the authors achieved high accuracy rates across several
crop type classes and made the resulting data layer public. However, to create cloud-free
scenes from optical imagery across countries, the authors had to rely on multi-year composites.
The authors noted that in the future, a harmonized Landsat–Sentinel dataset would be useful
to expand spatial and temporal data coverage.
Sustainable management of agricultural water resources requires improved under-
standing of irrigation patterns in space and time. Annual irrigation maps (1999–2016) in the
US Northern High Plains were produced in [49] by combining all available Landsat satellite
imagery with climate and soil covariables in an RF classification workflow. In [51], the
authors implemented an automatic irrigation mapping procedure in GEE that uses surface
reflectance satellite imagery from different sensors (Landsat 7/8, Sentinel-2, MODIS Terra
and Aqua imagery, SRTM DEM). A rapid method was developed to map Landsat-scale
(30 m) irrigated croplands in [58] across the conterminous United States (CONUS). The
method was based upon an automatic generation of training samples for most areas based
on the assumptions that irrigated crops appear greener than non-irrigated crops and have limited water stress.
Cropland classification is highly dependent on RS imagery resolution, the scale of a
given analysis, the processing steps, and the input training data. Coarse resolution cropland
data products have been found to contain large errors, but even higher resolution maps
tend to have low accuracy rates and overestimate overall crop area. An open-source map
was created in [19] for several West African countries using an RF model trained on Landsat
data. The amount of RS data collected is increasing every day. This poses a problem for
how best to analyze RS imagery and extract useful information from it, regardless of the
EO domain. The authors in [77] implemented a dynamic feature importance tool that
automatically finds the most important subset of input features for identifying crop types
in China. They then fed these features to the SNIC algorithm and then to an RF on GEE and
combined the output predictions with growth period information to produce crop-type
maps that incorporate plant phenology. By incorporating growth stage information as
an input feature to the ML model, the authors achieved a 6–7% boost in OA, precision,
and recall across different crops like rice, maize, and soybeans. In their paper, the authors
showed that red edge, NDVI, red, SWIR2, and aerosol information contributed the most to
their analysis. However, the authors themselves stated that their method was unstable due
to the nature of their feature importance algorithm. Depending on what data was chosen
with their feature importance algorithm, the accuracy of the method fluctuated. Thus, their
method was good for reducing data size and should be used when compute is limited,
though using all of the data in a given time series was shown to work better.
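As a hedged sketch of this object-based pattern (SNIC segmentation followed by an RF classifier), the snippet below shows the GEE calls typically involved. The input image, training points, seed spacing, and other parameters are illustrative assumptions and not the settings used in [77].

```python
import ee

ee.Initialize()

# Hypothetical inputs: a multi-band feature image and labeled crop-type points.
image = ee.Image('users/example/feature_stack')              # placeholder asset ID
samples = ee.FeatureCollection('users/example/crop_points')  # placeholder asset ID

# 1) SNIC superpixel segmentation: groups pixels into spectrally homogeneous objects
#    and returns per-cluster mean values for each input band.
snic = ee.Algorithms.Image.Segmentation.SNIC(
    image=image,
    size=20,          # assumed superpixel seed spacing (pixels)
    compactness=1,
    connectivity=8,
)
object_means = snic.select(snic.bandNames().removeAll(['clusters']))

# 2) Sample the object-level features at labeled points and train a random forest.
training = object_means.sampleRegions(
    collection=samples, properties=['crop_type'], scale=10
)
rf = ee.Classifier.smileRandomForest(numberOfTrees=200).train(
    features=training, classProperty='crop_type',
    inputProperties=object_means.bandNames()
)

# 3) Classify the object-mean image to get a crop-type map at the object level.
crop_map = object_means.classify(rf)
```

Classifying object means rather than raw pixels tends to reduce salt-and-pepper noise in the output map, which is one reason this segmentation-then-classification pattern recurs in the reviewed crop-mapping studies.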

3.2.2. Land Cover Classification


Land cover classification is the second-most-developed domain area using GEE and AI
(27 studies total). Table 2 below summarizes those studies and a word cloud generated
from the titles and keywords of those papers is provided in Figure 7. The most frequently
used words are “Google Earth Engine”, “land”, “cover”, and “classification”. “Landsat”
RS imagery features heavily in LULC research, though the trend is moving towards higher-
resolution data and creating maps over much larger areas. The words “Sentinel” and
“Africa” illustrate this point well.

Table 2. Studies targeting LULC from RS imagery using AI (Note that references marked * denote novel methods and will be detailed in Section 3.3).

References Method Model Comparison RS Data Type Study Area


Landsat 8 OLI TOA,
Azzari and Lobell (2017) [80] classification RF Sentinel-2 MSI TOA, Zambia
SRTM DEM
DMSP NTL, Globeland30,
Midekisa et al. (2017) [81] classification RF Hansen Global Forest Africa (continent)
Change, Landsat 7 ETM+
GlobCover, Landsat 5 TM,
Hu et al. (2018) [82] classification CART China
Landsat 8 OLI
Bayesian hierarchical ALOS DSM, Landsat 8
Ge et al. (2019) [83] classification China
model, RF OLI, VIIRS NTL
Lee et al. (2018) [84] * classification BULC-U GlobCover, Landsat 5 Brazil
Landsat 5 SR, Landsat 5
TOA, Landsat 8 SR,
Landsat 8 TOA, NAIP,
Zurqani et al. (2018) [85] classification RF United States
NLCD, SRTM DEM, USGS
Watershed
Boundary Dataset
Landsat 7 ETM+, Landsat
Murray et al. (2018) [86] * classification RF 8 OLI, Landsat 8 SR, Global
SRTM DEM
Mardani et al. (2019) [87] classification BT, SVM FAO land cover, Sentinel-2 Lesotho
FROM-GLC, Landsat 8,
Gong et al. (2019) [88] classification RF Global
Sentinel-2, SRTM DEM
GLDAS, GlobeLand30,
Hao et al. (2019) [89] classification CART Landsat 8 OLI, MOD11A2, China
MOD13A2
ALOS PALSAR-2, Landsat Brunei, Indonesia,
DT, maximum
Miettinen et al. (2019) [90] classification 7 ETM+, Landsat 8 OLI, Malaysia, Singapore,
likelihood, RF
Sentinel-1, SRTM DEM Timor-Leste
Global Field Photo Library,
Google Earth, Landsat 5
Xie et al. (2019) [91] classification RF China
TM, Landsat 7 ETM+,
MCD12Q1, MCD43A4
CART, gmoMaxEnt, RF, Google Earth, Landsat 8
Adepoju and Adelabu (2020) [92] classification South Africa
SVM OLI, SRTM DEM
Google Earth, Sentinel-1,
Ghorbanian et al. (2020) [93] classification RF Iran
Sentinel-2
Google Earth, Landsat 5
TM, Landsat 7 ETM+,
Liang et al. (2020) [94] * classification CART, MD, RF China
Landsat 8 OLI, SRTM
DEM
GFSAD, GHSL, Hansen
Global Forest Change,
Zeng et al. (2020) [95] classification RF South Africa
Landsat 8 OLI, Sentinel-1
GRD, SRTM DEM
Landsat 8 OLI,
Naboureh et al. (2020) [96] classification RF Iran
SRTM DEM
Naboureh et al. (2020) [97] classification SVM, RUESVM Google Earth, Sentinel-2 China, Iran
FROM-GLC10, GHSL,
Landsat 8 OLI, MYD11A2,
Li et al. (2020) [98] classification RF Africa (continent)
Sentinel-2 MSI, SRTM
DEM, Suomi-NPP NTL
CCI-LC, FROM-GLC,
Huang et al. (2020) [99] classification RF Google Earth, Global
Landsat 5 TM
classification, Landsat 8, PlanetScope,
Tassi and Vizzari (2020) [100] RF, SNIC, SVM Italy
segmentation Sentinel-2
BCLL, GlobCover, Google
Shetty et al. (2021) [101] classification CART, RF, RVM, SVM India
Earth, Landsat 8 OLI
aerial photography,
Feizizadeh et al. (2021) [102] classification CART, RF, SVM Landsat 5 TM, Landsat 7 Iran
ETM+, Landsat 8 OLI
Shafizadeh-Moghadam et al. classification, Iran, Iraq, Kuwait, Saudi
RF, SNIC Google Earth, Landsat 8
(2021) [103] segmentation Arabia, Syria, Turkey
Landsat 5 TM, MCD12Q1,
Pan et al. (2021) [104] classification CART, RF Australia, United States
SRTM DEM
Becker et al. (2021) [105] classification RF Landsat 8 Brazil
ALOS PALSAR, CCI-LC,
CGLS-LC, FROM-GLC,
Remote Sens. classification, GFSAD30, GHSL, JRC
Jin et2022, 14, x FOR
al. (2022) [106]PEER REVIEW RF, SNIC 16 of 121
Asia (multiple countries)
segmentation Global Surface Water,
Landsat 7 ETM+, Landsat
8 OLI, MCD12Q1

Figure 7. Word-cloud visualization of all the reviewed papers targeting LULC application (i.e., those
Figure 7. Word-cloud visualization of all the reviewed papers targeting LULC application (i.e., those
27 papers summarized in Table 2).
27 papers summarized in Table 2).
From our interactive web app (see Appendix A) and Table 2, Landsat 8 OLI, SRTM
DEM, and Google Earth are the most-used RS datasets. The most popular AI models are RF, CART,
and SVM, and the most-used evaluation metrics are overall accuracy (OA), PA, UA, and
Kappa. A brief summary of those studies is provided right below Table 2. More detailed
textual summaries for most of the reviewed land cover classification studies are provided
in Appendix C.2.
LULC maps can help decision-makers and land managers make more informed
decisions about the environment. Still, producing LULC maps with ML and RS data
requires a lot of compute and labeled input training data. GEE currently offers free compute,
so researchers can use the data that they are interested in without having to worry about
hardware setup or compute time. The authors in [102] took advantage of this to create an
LULC map in Northern Iran, predicting for water, rangelands, built-up areas, orchards,
and other LULC classes. They used Landsat RS imagery, field observations, and historical
datasets to train CART, RF, and SVM models. The SVM performed better than the CART
and RF models, but perhaps more importantly the authors also ran a spatial uncertainty
analysis to show each model’s confidence level on the output maps. More research should
include uncertainty incorporated into reporting metrics or on maps produced with ML to
better convey a model’s certainty to both citizens and decision-makers.
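The GEE workflow behind most of these pixel-based LULC comparisons is compact enough to sketch. The snippet below is a minimal, illustrative example written with the GEE Python API, not the exact workflow of [102]: the image collection ID, band names, region, training-asset path, class property, and classifier parameters are all assumptions chosen for illustration.

import ee

ee.Initialize()

# Hypothetical study region and labeled points ('class' holds an integer LULC code).
region = ee.Geometry.Rectangle([47.0, 36.0, 48.0, 37.0])
training_points = ee.FeatureCollection('users/your_account/lulc_training_points')

# Median Landsat 8 surface-reflectance composite for one year.
composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterBounds(region)
             .filterDate('2020-01-01', '2020-12-31')
             .median())

bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']

# Sample the composite at the labeled points to build a training table.
training = composite.select(bands).sampleRegions(
    collection=training_points, properties=['class'], scale=30)

# Train three classifiers on the same samples, mirroring the CART/RF/SVM comparisons above.
rf = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=training, classProperty='class', inputProperties=bands)
cart = ee.Classifier.smileCart().train(
    features=training, classProperty='class', inputProperties=bands)
svm = ee.Classifier.libsvm(kernelType='RBF', gamma=0.5, cost=10).train(
    features=training, classProperty='class', inputProperties=bands)

# Apply each classifier to the composite to produce candidate LULC maps.
lulc_rf = composite.select(bands).classify(rf)
lulc_cart = composite.select(bands).classify(cart)
lulc_svm = composite.select(bands).classify(svm)

The resulting maps can then be compared against withheld validation samples to compute OA, PA, UA, and Kappa, as most of the reviewed studies do.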
There are currently high data and computational costs of having to store RS data
across different machines using different ML algorithms. There is an additional challenge
in that most RS analyses depend on optical data, which is often obscured by clouds and
shadows. In addition, most land cover maps have coarse resolution and do not often
describe the same things as other maps (making them not directly comparable). These
static maps need to be more accurate and updated frequently to be of real use, and cloud
computing alongside data and algorithms being in one place has allowed both of these
to become a reality. An RF model was used in [80] to determine land-use classes such as
vegetation, croplands, and urban areas from Landsat imagery in Zambia. An approach was
presented in [81] to quantify continental land cover and impervious surface changes over
continental Africa for 2000–2015 using Landsat images and an RF classifier on GEE. Simple
change detection based on Landsat images from two different years with two different
phenophases yields unsatisfactory results and may induce many misclassifications and
pseudo-change identifications because of the phenological differences between RS images.
A land-use/land-cover type discrimination method based on a CART was proposed in [82],
which applied change-vector analysis in posterior probability space (CVAPS) and the best
histogram maximum entropy method for change detection, and further improved the
accuracy of the land-updating results in combination with NDVI timing analysis. The last
land-cover map of Iran was produced with MODIS imagery in 2016. Now, there are much
higher resolution satellite data products, but it is difficult to collect more ground-truth
validation data. Cloud computing and ML can help produce newer land cover classification
maps that are easy to reuse. Such a workflow was designed in [93] on GEE for Iran using
Sentinel-1 and -2 data and an RF model and SNIC. With the ground-truth training samples
available, the authors used SNIC to segment land-use classes into objects while the RF
model classifies them on the pixel level.
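A sketch of how SNIC segmentation can be paired with a pixel-level RF in GEE is shown below. This is only an illustration of the general pattern described above, not the processing chain of [93]; the collection ID, bands, SNIC parameters, and training asset are assumptions.

import ee

ee.Initialize()

# Hypothetical Sentinel-2 composite over an area of interest.
aoi = ee.Geometry.Rectangle([52.0, 35.0, 53.0, 36.0])
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(aoi)
      .filterDate('2021-04-01', '2021-10-01')
      .median()
      .select(['B2', 'B3', 'B4', 'B8', 'B11', 'B12']))

# SNIC groups spectrally similar neighbouring pixels into objects ("superpixels").
snic = ee.Algorithms.Image.Segmentation.SNIC(
    image=s2, size=20, compactness=1, connectivity=8)

# SNIC returns a 'clusters' band plus per-cluster means of the input bands; appending
# those means to the pixel bands gives object-aware features for a pixel-level classifier.
object_means = snic.select(['B2_mean', 'B3_mean', 'B4_mean',
                            'B8_mean', 'B11_mean', 'B12_mean'])
features = s2.addBands(object_means)

labels = ee.FeatureCollection('users/your_account/training_samples')  # hypothetical asset
training = features.sampleRegions(collection=labels, properties=['class'], scale=10)
rf = ee.Classifier.smileRandomForest(numberOfTrees=150).train(
    features=training, classProperty='class', inputProperties=features.bandNames())
classified = features.classify(rf)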
Numerous efforts have been made to end poverty around the globe. Mapping land-
use changes in poverty areas can provide insights into the poverty reduction progress.
Landsat images available on GEE were utilized in [83] to map annual land-use changes in
China’s poverty-stricken areas. An open-source land cover mapping processing pipeline
was created in [87] using GEE. The authors argued that land cover maps specifically can
help countries properly plan for sustainable levels of food production, but that many
developing countries did not have the financial or compute resources to monitor land
classes in real time. Using SVM and bagged trees (BT) models, the authors predicted urban,
agriculture, tree, vegetation, water, and barren land-use types in Lesotho.
In RS imagery, many different land-use types have similar spectral signatures or
are very complex, making them difficult to be properly identified. Several different ML
models available on GEE were trained in [92] with different combinations of input data to
determine which were the most important in determining land-use types in Golden Gate
Highlands National Park in South Africa. Although RS and ML have allowed LULC analysis to become
ever more accurate for general LULC classes, it is still challenging to correctly identify
land subtypes. For example, while classifying vegetation to a high degree of accuracy has
become more commonplace, identifying vegetation subtypes like shrubs or grassland is
not as straightforward, especially in mixed-use areas. In addition, as is the case for many
RS applications, it is challenging to know which types of input data will contribute to a
given ML model’s ability to learn these subtypes. Therefore, the authors in [95] set out
to compare the contribution of SAR data and different indices (NDVI, EVI, SAVI, NDWI)
derived from optical data on overall classifier performance. A land cover map of the whole
African continent at 10 m resolution was generated in [98], using multiple data sources
including Sentinel-2, Landsat-8, Global Human Settlement Layer (GHSL), Night Time Light
(NTL) Data, SRTM, and MODIS Land Surface Temperature (LST). Different combinations
of data sources were tried to determine the best data input configurations. Pixel-based
classification methods often suffer from “salt-and-pepper” noise in their end predictions.
Object-based classifiers can help alleviate this problem but are not commonly used because
of their high compute overhead. While GEE does not have many object-based classifiers, it
does provide free compute. To take advantage of this while comparing the performance of
pixel-based and object-based classification methods, [100] produced LULC maps in Italy
using Landsat, Planet, and Sentinel RS imagery. The authors compared the performance
of RF and SVM models alone with that of the same models used in conjunction with the
SNIC and gray-level co-occurrence matrix (GLCM) texture data. Their results showed
that pixel-based methods worked better at lower resolutions (i.e., using Landsat data),
whereas object-based methods worked better for higher-resolution RS imagery. The best
classifier was the RF model trained with SNIC and incorporating GLCM data. Still, the
authors noted that ML models were heavily influenced by input data, feature engineering,
the classes that you were trying to predict for, and the place being studied. Many studies
evaluate ML methods and the effect that input data sources have on their performance.
Not as much research is done into determining how data sampling strategies affect ML
classifiers. The authors in [101] compared different data sampling strategies and their
effects on how different ML classifiers performed on LULC tasks. A multi-seasonal sample
set was collected in [88] for global land cover mapping in 2015 from Landsat 8 images.
The concept of “stable classification” was used to approximately determine how much
reduction in training sample and how much land cover change or image interpretation
errors can be acceptable.
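Index-based input features of the kind compared in [95] are easy to derive in GEE. The sketch below shows one common way to append NDVI, EVI, SAVI, and NDWI bands to a Landsat 8 composite; the collection ID, band names, and rescaling step are illustrative assumptions, and the formulas assume bands already rescaled to 0-1 reflectance.

import ee

ee.Initialize()

def add_indices(img):
    # Landsat 8 Collection 2 SR band names: SR_B2 blue, SR_B3 green, SR_B4 red, SR_B5 NIR.
    ndvi = img.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
    ndwi = img.normalizedDifference(['SR_B3', 'SR_B5']).rename('NDWI')  # McFeeters NDWI
    evi = img.expression(
        '2.5 * (NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1)',
        {'NIR': img.select('SR_B5'), 'RED': img.select('SR_B4'),
         'BLUE': img.select('SR_B2')}).rename('EVI')
    savi = img.expression(
        '((NIR - RED) / (NIR + RED + 0.5)) * 1.5',
        {'NIR': img.select('SR_B5'), 'RED': img.select('SR_B4')}).rename('SAVI')
    return img.addBands(ndvi).addBands(ndwi).addBands(evi).addBands(savi)

composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterDate('2019-01-01', '2019-12-31')
             .median()
             .select('SR_B.')                 # keep only the optical SR bands
             .multiply(0.0000275).add(-0.2))  # standard Collection 2 rescaling to reflectance
composite_with_indices = add_indices(composite)

Training the same classifier with and without the index bands (or with and without SAR bands) is then a matter of changing the inputProperties list, which is how such feature-contribution comparisons are typically run.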
Mountain Land Cover (MLC) classification can be relatively challenging due to high
spatial heterogeneity and the cloud contamination in optical satellite imagery over the
mountainous areas. Distribution of Land Cover (LC) classes in these areas is mostly
imbalanced. To date, three approaches have been proposed to address the class imbalance
problem: (1) applying specific classification methods by focusing on the learning of minority
classes, (2) assigning higher weights on minority classes by adjusting classifiers, and
(3) rebalancing training datasets (e.g., oversampling and under-sampling techniques). A
hybrid data-balancing method, called the Partial Random Over-Sampling and Random
Under-Sampling (PROSRUS), was proposed in [96] to resolve the class imbalance issue. The
class imbalance problem reduces classification accuracy for infrequent and rare LC classes.
A new method was proposed in [97] by integrating random under-sampling of majority
classes and an ensemble of Support Vector Machines, namely Random Under-sampling
Ensemble of Support Vector Machines (RUESVMs). Rapid urban expansion puts pressure
on local ecosystems and human well-being, so urban sustainability studies are increasingly
turning to applications that process large amounts of geospatial data and model ecosystem
services. Currently, it is not straightforward for urban or ecology scientists to use cloud-
based platforms like GEE as their processing routines are more complicated than the many
common mapping applications (i.e., classification) available on GEE. While determining
ecosystem service values is complicated (many disciplines, many opinions, etc.), GEE
was used in [94] to illustrate a processing workflow for how LULC classes can be used to
compute more complex ecosystem service values.
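Returning to the class-imbalance strategies listed above, the third approach (rebalancing the training set) can be illustrated with a short, generic sketch. This is not the PROSRUS or RUESVM procedure of [96,97], only a plain random under-sampler in Python with hypothetical data shapes.

import numpy as np

def random_undersample(X, y, rng=None):
    # Randomly under-sample every class down to the size of the rarest class.
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.concatenate(keep)
    rng.shuffle(keep)
    return X[keep], y[keep]

# Example: 1000 samples, 6 spectral features, class 2 deliberately rare.
X = np.random.rand(1000, 6)
y = np.random.choice([0, 0, 0, 1, 1, 2], size=1000)
X_bal, y_bal = random_undersample(X, y, rng=42)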
Watersheds around the world are under stress, both due to climate change and human
disturbance. LULC maps can help with planning and conservation decisions, but they are
often compute-intensive to produce. GEE has helped many
researchers by providing freely available data, methods, and compute, but researchers
often find that they run into compute limits on the platform before they can complete
their analyses. To overcome these compute limits in GEE, the authors in [103] used feature
reduction techniques and designed their own parallel processing algorithms to produce
an LULC map across several Middle Eastern countries. To get a better idea of how water
resources were being affected by LULC classes, the authors combined topographic data,
spectral data, RS image composites, and texture information to train a combined SNIC-RF
model. They achieved high accuracy across several LULC classes and showed feature
importances for each class in their analysis. However, the authors noted that other than
SNIC, advanced object-based classification and segmentation algorithms were not available
on GEE.

3.2.3. Forest and Deforestation Monitoring


Forest and deforestation monitoring is the 3rd-most-developed application using
GEE and AI (20 studies total). Table 3 below summarizes those studies and a word cloud
generated from the titles and keywords of those studies is provided in Figure 8. The
most frequently used words are “Google Earth Engine” and “forest change”. A significant
amount of research is done in monitoring deforestation over time and differentiating forest
cover from plantations of oil palm, for example. Thus, the words “deforestation”, “planta-
tion”, “palm oil”, and “time series” feature prominently in the word cloud. Additionally,
Landsat, Sentinel-1, and Sentinel-2 data are frequently used in forest and deforestation
monitoring research in tropical forests in places like the Amazon and Myanmar. From our
interactive web app (see Appendix A) and Table 3, the most-used RS datasets are Landsat 8
OLI, SRTM DEM, and Google Earth. The most popular AI models are RF, SVM, and CART,
and the most-used evaluation metrics are OA, PA, UA, Kappa. A brief summary of those
studies is provided right below Table 3. More detailed textual summaries for most of the
reviewed forest and deforestation monitoring studies are provided in Appendix C.3.

Table 3. Studies targeting forest change and deforestation from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Lee et al. (2016) [107] | classification | CART, MD, RF | Landsat 8 | Indonesia
Wang et al. (2019) [15] | classification | RF | ALOS PALSAR, GlobeLand30-2010, Hansen Global Forest Change dataset, JRC Yearly Water Classification History, Landsat 5 TM, Landsat 7 ETM+, RapidEye, TerraClass-2010, USGS Global Tree Cover 2010 | Brazil
Voight et al. (2019) [108] | classification | CART, Markov Chain model, MLP | Google Earth, Landsat MSS, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Belize
Koskinen et al. (2019) [109] | classification | CART, RF, SVM | ALOS PALSAR, Google Earth, Landsat 8 OLI, NAFORMA, Sentinel-1, Sentinel-2 MSI, SRTM DEM | Tanzania
Duan et al. (2019) [110] | classification | RF | Google Earth, Sentinel-2 | China
Poortinga et al. (2019) [111] | classification | DT, Monte Carlo, RF | ALOS GDSM, Landsat 8, PlanetScope, RapidEye, Sentinel-1, Sentinel-2 | Myanmar
Shimizu et al. (2019) [112] | classification | RF | Google Earth, Landsat 8, MCD12Q1, PlanetScope, RapidEye, Sentinel-1 | Myanmar
Ramdani (2019) [113] | classification | GMM, KNN, RF, SVM | Sentinel-1, SRTM DEM | Indonesia
Çolak et al. (2019) [114] | classification | SVM | CORINE LULC, Sentinel-1, Sentinel-2 | Turkey
Shaharum et al. (2020) [115] | classification | CART, RF, SVM | Google Earth, Landsat 8, SRTM DEM | Malaysia
de Sousa et al. (2020) [116] | classification | RF | ALOS PALSAR, Landsat 8 OLI, SRTM DEM | Gabon, Liberia
Brovelli et al. (2020) [117] | classification | ANN, RF | CBERS 2B, CBERS 4, Landsat 5, Landsat 7, Landsat 8, Sentinel-2 | Brazil
Kamal et al. (2020) [118] | classification | SVM | Landsat 8 OLI | Indonesia
Wei et al. (2020) [119] | classification | binomial logistic regression | AW3D30, CHELSA V1.2, GeoEye-1, GMTED2010, Google Earth, Hansen Global Forest Change, Landsat 5, NAIP | United States
Praticò et al. (2021) [120] | classification | CART, k-means, RF, SVM | Sentinel-2 | Italy
Xie et al. (2021) [121] | classification | RF | Sentinel-1, Sentinel-2, SRTM DEM | China
Floreano and de Moraes (2021) [122] | classification | Markov-CA, MLP, RF | Google Earth Pro, Landsat 5 TM, Landsat 7 ETM, Landsat 8 OLI | Brazil
Kumar et al. (2022) [123] | classification | RF | Forest Survey of India, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, MCD12Q1 | India
Zhao et al. (2022) [124] | classification, segmentation | LandTrendr, RF, U-Net | Google Earth Pro, Hansen Global Forest Change, MTBS, MCD64A1, Planet, Sentinel-1, SRTM DEM | Brazil, United States
Wimberly et al. (2022) [125] | classification, segmentation | LandTrendr, RF | Google Earth, Landsat 7 ETM+, Landsat 8 OLI, WorldView | Ghana

Figure 8. Word-cloud visualization of all the reviewed papers targeting forest and deforestation
monitoring (i.e., those 20 papers summarized in Table 3).

Forests provide many ecosystem services, from preventing soil erosion, regulating the
hydrological cycle, and providing shelter for many plant and animal species. However,
deforestation is occurring at a rate that is making it impossible for individual species to
recover. As deforestation accelerates, there are cascading effects for entire ecosystems. In
Brazil, agriculture, ranching, and land occupation are causing the vast forest of the Amazon
to become fragmented. Still, it is difficult to monitor the changes through time due to cloud
cover and the rate at which new satellite imagery comes in every day. The authors in [117]
showed how GEE can be used to overcome data storage and compute needs and analyze
about 20 years' worth of Landsat data to determine changes. Land use maps
can help inform policymakers and land-use managers but are often static and of coarse
resolution. It would be more useful to create these maps in a repeatable manner, one in
which code and data could be reused for making decisions based on up-to-date information.
Sentinel-2 data were analyzed in [120] and several different ML classifiers were trained to
distinguish between four different forest types in Italy during both summer and winter
seasons. Tree species distribution is an important metric for monitoring overall
forest health and for determining current carbon storage efforts. However, doing so is
difficult without the use of high-resolution RS data, much of which is either private and
inaccessible or too expensive to collect (in the case of LiDAR or UAS data). Recent research
in environmental mapping applications uses DL and NN to identify tree species across


large areas with minimal feature engineering, but NNs currently need large amounts of
compute and labeled input data to train on. To classify tree species across a large area
in China while fitting within compute constraints, an RF was trained on the GEE platform
in [121] using optical and SAR imagery, DEM data, and field observations.
A participatory forest mapping methodology was developed and tested in [109] to
map the extent and species composition of forest plantations in the Southern Highlands
area of Tanzania. Collecting field observations of plant phenology can be time- and labor-
intensive to repeatedly obtain. RS imagery can help continuously monitor phenology
information because of its high spatial and temporal resolution. To create a forest type
map in India using RS imagery and ML, the authors in [123] predicted for evergreen and
deciduous forest types, as well as “non-forest” classes. Collecting, storing, and processing
large amounts of RS imagery presents a barrier to doing research in the environmental and
earth sciences. GEE provides data storage, compute, data processing, and ML algorithms
on its platform. The researchers in [118] used GEE to map mangrove extent in Indonesia.
Global forests face many anthropogenic threats, one of the most prominent being the
conversion to agriculture. This is often done through deforestation by fire or clearcutting,
but a less studied mechanism related to forest health is the slow degradation caused by
continual disturbances. This is more difficult to track using EO methods like vegetative
indices derived from RS data, because forest degradation does not always result in canopy
loss. The tropical forests of Ghana, many of which are in protected areas, still suffer
from logging, mining activities, fires, and expanding agriculture production. All of these
disturbances contribute to declines in forest health; a method was developed in [125] for
monitoring tropical forest loss and recovery based on Landsat data. Forest logging and
forest fires are both dominant drivers of forest loss worldwide. However, deforestation
monitoring efforts are often limited by low-resolution RS data and the inability to create
forest maps on a continual basis. Using SAR data as input and high-resolution optical data
as validation data, a U-Net was trained on Google Cloud in [124] to create monthly forest
loss maps.
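Because several of the reviewed forest-change studies rely on U-Net-style networks, a minimal sketch of such an encoder-decoder may be helpful. The model below is a deliberately tiny illustration in Keras, not the architecture or configuration used in [124]; the patch size, two SAR input channels (e.g., VV and VH), and filter counts are assumptions.

from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(filters, 3, padding='same', activation='relu')(x)

def tiny_unet(input_shape=(256, 256, 2), n_classes=1):
    inputs = layers.Input(shape=input_shape)
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 128)                                  # bottleneck
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(c3)
    c4 = conv_block(layers.concatenate([u2, c2]), 64)         # skip connection
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c4)
    c5 = conv_block(layers.concatenate([u1, c1]), 32)         # skip connection
    outputs = layers.Conv2D(n_classes, 1, activation='sigmoid')(c5)
    return Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(sar_patches, loss_masks, ...) would then be called on labelled image chips.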
Forest degradation and deforestation have been occurring around the globe during the
past decades, which threaten the biodiversity of ecosystems. As a highly forested landscape,
southern Belize has been experiencing deforestation due to agricultural expansion. Landsat
8 imagery on the GEE was utilized in [108] to perform a supervised classification. An MLP
model was then built to predict future deforestation patterns and magnitude based on the
drivers of past deforestation patterns in the region. Deforestation is accelerating in the
tropics in part due to industrial oil palm plantation expansion. Being able to monitor illegal
deforestation can aid in conserving the remaining forested landscape and is thus critical to
maintaining biodiversity and ecosystem services. A low-cost method was demonstrated
in [107] for monitoring industrial oil palm plantations in Indonesia using Landsat 8 imagery
that allowed them to distinguish between oil palm (immature oil palm, mature oil palm),
forest, clouds, and water classes using the CART, RF, and MD algorithms. Oil palm can
play a key role in both ecosystems and local economies but is also a common cause of
deforestation. Thus, monitoring and managing the oil palm industry is often necessary
to ensure that deforestation does not occur, but traditional ground surveys are time-,
effort-, and cost-intensive. RS imagery can help monitor large areas over time. In [115],
several ML models were compared to map oil palm using 30 m Landsat 8 imagery in
Malaysia. Tropical forests in different Sub-Saharan African countries face high rates of
deforestation to illegal logging and cropland expansion. Being able to monitor tropical
deforestation is important to monitor ecosystem balances and health. However, separate
efforts to produce deforestation maps or products use different data sources, describe
slightly different things, or lead to diverging land cover estimates, making them difficult
or impossible to directly compare. To explore how GEE could be used to create an open-
source processing pipeline for deforestation mapping in Liberia and Gabon, two different
RF models were used in [116] to create data masks and then predictions for various land
types there. The Amazon Rainforest is home to much of the world’s biodiversity and plays
an important role in natural carbon sequestration. However, this region is experiencing
high rates of deforestation due to the expansion of agriculture and cattle farming. It remains
challenging, though, to monitor such a large area given its size and biological complexity
and use that information to produce forest change projections into the future. An RF
was used in [122] for initial LULC classification, then used an MLP to simulate possible
deforestation scenarios into the future.
Mapping how much carbon forests sequester remains difficult because current tech-
niques rely on mapping forested versus deforested landscapes. However, a major source of
uncertainty stems from the fact that degraded forests, ones open to selective logging, are
not a separate class but can emit carbon heavily even though they are counted as forested
regions. This issue was addressed in [15] by mapping disturbed forest areas in Brazil using
27 years of Landsat surface reflectance imagery.

3.2.4. Vegetation Mapping


Vegetation mapping is a well-developed application domain using GEE and AI
(18 studies total). Table 4 below summarizes those studies and a word cloud generated
from the titles and keywords of the 18 papers is provided in Figure 9. The most frequently
used words are “Google Earth Engine”, “Landsat”, and “vegetation mapping”. Plant
dynamics (i.e., phenology and land-use changes over time) play an important role in differ-
entiating vegetation from forest cover. From our interactive web app (see Appendix A) and
Table 4, the most-used RS datasets are Landsat 8 OLI, Landsat 5 Thematic Mapper (TM),
and Landsat 7 Enhanced Thematic Mapper Plus (ETM+). The most popular AI models are
RF, CART, and SVM, and the most-used evaluation metrics are PA, UA, and OA. A brief
summary of those studies is provided right below Table 4. More detailed textual summaries
for most of the reviewed vegetation mapping studies are provided in Appendix C.4.

Table 4. Studies targeting vegetation mapping from RS imagery using AI (Note that references marked
with * denote novel methods and will be detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Johansen et al. (2015) [126] | classification | CART, RF, NDVI, Foliage Projective Cover | Landsat 5 TM, Landsat 7 ETM+ | Australia
Traganos et al. (2018) [127] | classification | CART, RF, SVM | Sentinel-2 LIC TOA | Greece
Tsai et al. (2018) [128] | classification | DT, RF | Landsat 7 TM, Landsat 8 OLI | China
Jansen et al. (2018) [129] | regression | multiple linear regression, polynomial linear regression | Landsat 7 ETM+, Landsat 8 OLI, USGS National Elevation Dataset | United States
Jones et al. (2018) [130] | classification | RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, USGS National Elevation Dataset | United States
Campos-Taberner et al. (2018) [131] | regression | RF | BELMANIP2, MCD15A3H, MCD43A4 | Global
Xin and Adler (2019) [132] | classification | FCNN, CNN-LSTM hybrid | Sentinel-2 MSI | United States
Parente et al. (2019) [43] | classification, segmentation | LSTM, RF, U-Net | PlanetScope | Brazil
Parente et al. (2019) [133] | classification | RF | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, MOD13Q1 | Brazil
Zhang et al. (2019) [134] | classification | RF | Google Earth, Landsat 8 OLI | China
Alencar et al. (2020) [135] * | classification | DT, RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Brazil
Zhou et al. (2020) [21] | regression | CART, CNB, MLP, RF, SVM | Landsat 8 OLI, MCD43A1, MCD43A4 | United States
Tian et al. (2020) [136] | classification | SAE, SVM | Google Earth, Landsat 5 TM, Landsat 8 OLI, Pleiades 2, QuickBird, SPOT 4, SPOT 6, UAS, WorldView 1, WorldView 3 | China
Srinet et al. (2020) [137] | classification | RF | MOD09A1, SRTM DEM, WorldClim V2 Bioclim | India
Long et al. (2021) [138] * | classification, segmentation | CART, LandTrendr, MD, NB, RF, SVM | CGLS-LC100, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Sentinel-1, Sentinel-2, SRTM DEM | China
Yan et al. (2021) [139] | classification | RF | Gaofen-2, Landsat 4 TM, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Pléiades A, QuickBird, UAS, WorldView 2 | China
Wu et al. (2021) [140] | classification | RF | Gaofen-2, Landsat 8 OLI | China
Pipia et al. (2021) [141] * | regression | GPR | HyMap, Sentinel-2 | Europe (multiple countries)

Figure 9. Word-cloud visualization of all the reviewed papers targeting vegetation mapping (i.e., those
18 papers summarized in Table 4).

Accurate near real-time estimates of vegetation cover and biomass are critical to
adaptive rangeland management. An approach was developed and tested in [129] to
automate the mapping and quantification of vegetation cover and biomass using Landsat
7 and Landsat 8 imagery across the grazing season (i.e., changing phenological conditions).
Annual percent land cover maps of plant functional types across rangeland ecosystems were
produced to effectively and efficiently respond to pressing challenges facing conservation
of biodiversity and ecosystem services. The authors in [130] utilized the historical Landsat
satellite record, gridded meteorology, abiotic land surface data, and over 30,000 field plots
within an RF model to predict per-pixel percent cover of annual forbs and grasses, perennial
forbs and grasses, shrubs, and bare ground over the western United States from 1984 to
2017, at approximately 30 m resolution. Rangelands in the western United States are
home to many different animal and plant species. They are ecologically diverse and have
been traditionally monitored by taking and analyzing in situ measurements in different
areas. However, continually collecting field observations can be time- and labor-intensive
and land managers are often asked to make decisions about large areas with sparse field
information. RS data can help monitor rangelands with a large spatial scope and a short
return time, making them key to informing land management decisions in a timely manner.
Using climate and field data alongside Landsat imagery and MODIS land-use maps, ML
models used in [21] were able to predict for several important rangeland indicators like
plant height, total vegetation and rock cover, as well as bare soil.
Invasive species can degrade ecosystems and harm biodiversity as well as soil and water
quality. It is often difficult to monitor invasive species in coastal environments from optical RS
imagery, though, because of frequent cloud cover. A specific invasive species in China was
used in [136] as a case study for developing an ML pipeline that takes into account both cloud
cover and phenological information. Invasive species can have harmful environmental effects
as they disrupt ecosystem balances. Long-term datasets, like those of the grass S. alterniflora,
are not always available, making them difficult to detect using RS methods. In order to
produce a map of this invasive species, field data were collected and processed in [139] in
addition to UAS imagery and optical RS data from several different platforms.
It is often difficult to detect changes in savanna landscapes due to their high hetero-
geneity in vegetation types, which makes it even harder to attribute change to natural or
anthropogenic causes. This is especially problematic in areas like the Brazilian Cerrado
where agricultural expansion is happening on a large scale. In order to clarify what changes
have been happening there, over three decades worth of Landsat imagery was used in [135]
to determine which areas have experienced vegetation change. Wetlands provide many
ecosystem services and provide important habitats for several different plant and animal
species. In order to make informed conservation and policy decisions, it is important not
only to be able to map the current state of wetlands vegetation, but how that vegetation is
changing over time. However, different sets of input data and ML methods used for change
detection of wetland vegetation need to be evaluated more fully as choices made during
preprocessing and hyperparameter tuning can affect the end result of an analysis. The
authors in [138] used an adaptive stacking algorithm to train an ML classifier on optical,
SAR, and DEM data to identify wetland vegetation.
Seagrasses provide many ecosystem services, from carbon storage, providing habitat
for many marine species, and preventing coastal erosion. However, they are in decline
due to anthropogenic impacts. Mapping their extent is key to being able to conserve
them. Bathymetry and RS data were combined in [127] to create a processing and analysis
pipeline for large-scale seagrass habitat monitoring in Greece using GEE. Grasslands
are often integrated into land-use type or cropland-specific maps, even high-resolution
products. However, different grassland species are not identified and thus are classified as
a single homogenous land or crop type. This is a problem not just because previous maps
have not separated out different grassland types, but it is difficult to recognize them in RS
imagery because they look very similar. Some experts are able to recognize such classes,
but it is time-consuming to analyze grassland types at scale. Thus, DL techniques that do
not rely on expert knowledge are needed so that these identification systems can work over
large areas over time. A CNN–LSTM hybrid model was used in [132] to identify grassland
types in Sentinel-2 imagery in the United States.
Feature engineering is important in ML, but it is labor-intensive and often requires
domain expertise [1]. As one ML branch, DL does not need feature engineering, as deep
NN will figure it out from large, annotated data examples, but DL requires much more
large training data than ML [1]. The authors in [43] addressed this issue by comparing the
performance of an RF model with feature engineering to an LSTM and U-Net NN models
without feature engineering for identifying pasturelands in Brazil. Monitoring vegetation
on a large spatial scale can be difficult because field data collection takes only snapshots
in time and is labor-intensive and expensive. Instead, methods for measuring vegetation
need to be done over time so that change detection is possible. Still, novel methods, such
as those utilizing RS imagery, need to meet current governmental quality standards. An
example of how this can be done is illustrated in [126] in Australia using the GEE platform
by comparing how well several ML classifiers compare to index-based methods like NDVI.
Although coastal wetland systems are critical habitats for different animal and plant species,
it is difficult to monitor them due to cloud cover and difficulty in obtaining RS imagery
at high and low tides. Previous studies have used single images or spectral time series
to try and identify wetland vegetation, but coastal wetland environments are complex
ecosystems. The same species of plant can look different at different stages of its life while
also being submerged under water in some RS scenes. The authors in [140] argued that
phenology information in RS time series can better capture tidal flat wetland vegetation
and so compared phenology information to statistical (min, max, median) and temporal
features (quartile ranges). Mapping plant functional types is important because it can give
ecosystem modelers and environmental planners a better idea of the spatial distribution of
vegetation. This in turn has implications on how resilient areas and ecosystems are and will
be to changing climatic factors like heat stress. However, plant function type classification
relies on and is often derived directly from current LULC map products that themselves
can contain inaccuracies. To explore how plant functional types can be derived directly
from RS information, [137] trained an RF model on field, DEM, MODIS, and climate data.
Many methods have been developed to estimate different vegetative properties from
RS imagery in response to environmental changes. One such method, Gaussian Process
Regression (GPR), is increasingly used to do so because it is a transparent ML model that also
outputs model uncertainties. However, as environmental and earth scientists move to GEE for
finding and processing data, they may find a lack of GPR models ready to use or train. This is
most likely because GPR models become slow and memory-intensive when trained on large
RS time series imagery. Such a model was implemented in [141] that has been optimized for
green LAI in RS imagery but does so in a way that is also optimized for GEE.
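For readers unfamiliar with GPR, a short, generic sketch of trait retrieval with uncertainty output is given below. It uses scikit-learn on synthetic data and is only an illustration of the technique; it is not the GEE-optimized GPR of [141], and the kernel settings, band count, and synthetic LAI relationship are assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy example: predict green LAI from four Sentinel-2-like band reflectances.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 0.6, size=(200, 4))                       # 4 band reflectances
lai_train = 6.0 * X_train[:, 3] - 2.0 * X_train[:, 2] + rng.normal(0, 0.1, 200)

kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X_train, lai_train)

# The appeal of GPR noted above: predictions come with per-sample uncertainties.
X_new = rng.uniform(0.0, 0.6, size=(5, 4))
lai_mean, lai_std = gpr.predict(X_new, return_std=True)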

3.2.5. Water Mapping and Water Quality Monitoring


Water mapping and water quality monitoring is another well-developed application
domain using GEE and AI (18 studies total). Table 5 below summarizes those studies
and a word cloud generated from the titles and keywords of the 18 papers is provided in
Figure 10. The most frequently used keywords are “Google Earth Engine”, “surface water”,
and “change”. Change detection is very important in water mapping research, though
it requires the ability to first map the water. Sentinel-1 data are used almost on par with
Landsat data in the water mapping literature, thus the size of these two terms is almost
the same. From our interactive web app (see Appendix A) and Table 5, the most-used RS
datasets are Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI. The most popular models
are RF, multiple linear regression, Modified Normalized Difference Water Index (MNDWI),
and SVM, and the most-used evaluation metrics are R2, Kappa, and OA. A brief summary

of those studies is provided right below Table 5. More detailed textual summaries for most
of the reviewed water mapping studies are provided in Appendix C.5.

Figure 10. Word-cloud visualization of all the reviewed papers targeting water mapping and water
quality monitoring (i.e., those 18 papers summarized in Table 5).

Table 5. Studies targeting water body detection from RS imagery using AI (Note that references
marked with * denote novel methods and will be detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Pekel et al. (2016) [32] * | classification | expert system | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | global
Zou et al. (2017) [142] | regression | multiple linear regression | Global Inland Water, Landsat 5, Landsat 7, NLCD | United States
Chen et al. (2017) [143] | segmentation | non-local active contour algorithm | Gaofen-1, Google Earth, Landsat 8 OLI, SRTM DEM | Tibet
Wang et al. (2018) [144] | classification | RF | JRC Global Surface Water, Landsat 4 TM, Landsat 5 TM, Landsat 8 OLI | China
Lin et al. (2018) [145] | regression | BRT, multiple linear regression, nonlinear general additive models, RF | Landsat 5 TM, Landsat 7 ETM+ | United States
Griffin et al. (2018) [146] | regression | multiple linear regression | Landsat 5 TM, Landsat 7 ETM+, NASA GSFC Ozone Monitoring Instrument | Canada, Russia, United States
Isikdogan et al. (2019) [147] * | segmentation | DeepWaterMapv2, DeepWaterMap, MNDWI, MLP | Landsat 8 | Global
Fang et al. (2019) [148] | regression | linear regression, polynomial regression | China Lake Dataset, China's Ecosystem Assessment and Ecological Security Pattern Database, Global Lakes and Wetlands, Global Reservoir and Dam Database, HydroLakes, Hydroweb, JRC Global Surface Water, SRTM DEM | China
Fuentes et al. (2019) [149] | regression | CART | JRC Global Surface Water, Landsat 5, LiDAR DTM, USGS National Elevation Dataset | Australia, United States
Markert et al. (2020) [150] | segmentation | Bmax Otsu thresholding, Edge Otsu thresholding | MERIT DEM, PlanetScope, Sentinel-1 GRD | Myanmar, Cambodia
Wang et al. (2020) [151] * | classification | MNDWI, MSCNN, RF | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China
Peterson et al. (2020) [152] | regression | DNN, ELR, MLR, SVR | GREON, Landsat 8, Sentinel-2 | United States
Wang et al. (2020) [153] | regression | CART | JRC Global Surface Water, Landsat 5, LiDAR DTM, USGS National Elevation Dataset | Australia, United States
Boothroyd et al. (2021) [154] | classification | RivaMap | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Philippines
Weber et al. (2020) [155] | regression | maximum likelihood, multiple linear regression, RF, SVM | NAIP, National Hydrography Dataset, NLCD, National Wetland Inventory, Sentinel-2 | United States
Mayer et al. (2021) [156] * | segmentation | U-Net | JRC Global Surface Water datasets, PlanetScope, Sentinel-1 | Cambodia
Li et al. (2021) [157] | classification | NDWI, MNDWI, MuWI-R, Otsu thresholding, SVM | Sentinel-2 | Sri Lanka
Li and Niu (2022) [158] | classification | RF | ALOS DSM, China Lake Dataset, China Wetlands Map, Google Earth, Global Reservoir and Dam Database, Global Surface Water, Sentinel-1, Sentinel-2, SRTM DEM | China

Static surface water maps are often produced at the regional or national level, but do
not show long-term trends resulting from seasonality or global warming’s effects. In [32],
the authors created a web portal using GEE as a backend alongside an expert system to
identify bodies of water in Landsat imagery. RS has been widely used to map and monitor
surface water. In [142], the authors used all available Landsat images to study surface
water dynamics in Oklahoma from 1984 to 2015. They found significant
inter-annual variations in the number of surface water bodies and surface water areas.
They also found that both the number of surface water bodies and surface water areas had
a positive relationship with precipitation and a negative relationship with temperature.
Floods and heavy precipitation events often occur at times of heavy cloud cover,
making optical imagery not well-suited to water mapping or flood monitoring during those
times. Traditionally, ground-based gauges are used to monitor water level and stream flow,
but only work at specific points, limiting their utility during large-scale flood events. SAR
imagery, however, is often used in water mapping or flood monitoring analyses because
of its ability to see through clouds and work over large spatial scales. This is especially
important for monsoonal regions like Southeast Asia where intense rains can lead to flood
conditions. However, SAR imagery is also susceptible to classification errors when flooding
occurs under tree cover or looks like concrete/pavement in urban areas, so preprocessing
steps should be carefully considered. The authors in [150] analyzed to what degree different
preprocessing steps affect the output water maps using both SAR and DEM data and two
variations of Otsu’s thresholding algorithm. Glacial lake outburst floods (GLOF) are one
of the serious natural hazards in the Himalayan region. To reduce the potential risks of
GLOF, the information about the location and spatial distribution of glacial lakes is critical.
In [143], the authors used Landsat 8 images available on GEE to map glacial lakes in the
Tibet Plateau region. Their results revealed that climate warming played a major role in
glacial lake changes.
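The core idea behind histogram thresholding of SAR backscatter for water mapping can be sketched in a few lines. The snippet below is a minimal illustration with a synthetic scene and scikit-image's global Otsu threshold; the Bmax and Edge Otsu variants evaluated in [150] differ in how the histogram fed to the threshold is constructed, and the backscatter values used here are assumptions.

import numpy as np
from skimage.filters import threshold_otsu

def water_mask_from_sar(vv_db):
    # Split a SAR VV backscatter array (in dB) into water / non-water with one global threshold.
    finite = vv_db[np.isfinite(vv_db)]
    t = threshold_otsu(finite)
    # Open water acts as a specular reflector, so it appears dark (low backscatter).
    return vv_db < t

# Synthetic example: a dark "water" patch inside brighter land.
scene = np.full((400, 400), -8.0) + np.random.normal(0, 1.0, (400, 400))
scene[150:250, 100:300] = -18.0 + np.random.normal(0, 1.0, (100, 200))
mask = water_mask_from_sar(scene)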
Categorizing urban water resources faces two main challenges. First, it is often difficult
to distinguish between water and things like asphalt or shadows in urban settings using RS
imagery. Second, the distribution of water resources has changed alongside the accelerating
impacts of climate change, making up-to-date, temporally aware water monitoring difficult.
GEE provides free data storage, datasets, and compute, but as of yet high-accuracy DL
models like NNs are not available on the platform. In [151], the authors compared the
performance of MNDWI and an RF to that of a multi-scale CNN (MSCNN) and showed
that the DL method was the most accurate (with less false classifications) for identifying
urban water resources in several Chinese cities. While DL receives a lot of attention in
water mapping research, these models still require a lot of input data and large amounts
of compute to train them. However, as compute becomes publicly available in cloud-
based platforms like GEE, obtaining large amounts of labeled training data remains a key
bottleneck to using DL models. One way to make the data labeling process less time- and
resource-intensive was illustrated in [156], where the authors used current water maps and
a segmentation algorithm to automatically collect data labels from Sentinel-1 imagery.
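The MNDWI baseline that such studies compare against is a one-line computation in GEE. The sketch below shows the standard formulation, MNDWI = (Green - SWIR1) / (Green + SWIR1); the collection ID, date range, and the 0.0 cut-off are common defaults chosen for illustration, not the settings used in [151].

import ee

ee.Initialize()

# Landsat 8 TOA: B3 is green and B6 is SWIR1.
image = (ee.ImageCollection('LANDSAT/LC08/C02/T1_TOA')
         .filterDate('2020-06-01', '2020-09-01')
         .median())
mndwi = image.normalizedDifference(['B3', 'B6']).rename('MNDWI')
water = mndwi.gt(0.0)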
Optical imagery used in surface water mapping analyses is often occluded by clouds,
and many common methods used to map surface water confuse snow, ice, rock, and
shadows as water. DeepWaterMapv2 was released in [147] and aimed to address these
false positive misclassifications.
ML models have achieved high levels of accuracy in identifying water bodies in RS
imagery. However, the models often misclassify soil, rock, clouds, ice, and shadow as water
and often rely on cloud-free, optical RS imagery, which is not always available. The authors
in [157] used masking, filtering, and segmentation algorithms to identify bodies of water in
Sri Lanka in complex, mountainous environments. It is challenging to repeatedly produce
up-to-date, accurate surface water maps over large areas. Water bodies change their shape
and overall distribution through time, and humans use water in ways that look dissimilar
to natural water bodies in RS imagery. Most studies to date focus on one type of water
body (lakes, rivers, etc.) or create a binary classification mask giving little to no detail on
various water body classification types. To explore the potential to distinguish between
surface water body subtypes, [158] used slope, shape, phenology, and flooding information
as input to an RF model to predict for lakes, reservoirs, rivers, wetlands, rice fields, and
agricultural ponds.
The authors in [144] proposed a new method for quickly mapping yearly minimal
and maximal surface water extents. In [148], the authors integrated global surface water
(GSW) dataset and SRTM-DEM to determine the spatiotemporal patterns of water storage
changes in China’s lakes and reservoirs. Multitemporal, multispectral satellite observations
from the Landsat program and Sentinel constellation are particularly useful in fluvial
geomorphology, in which river channel mapping and the analysis of planimetric change
have long been a focus. The authors in [154] demonstrated a workflow showing how GEE
can be used to extract active river channel masks from a section of the Cagayan River
(Luzon, Philippines).
Satellite RS can be used to estimate chromophoric dissolved organic matter (CDOM)
as a riverine constituent that influences optical properties in surface waters. CDOM ab-
sorption is a common proxy for dissolved organic carbon (DOC) concentrations in inland
waters, including Arctic rivers. The authors in [146] stated that this was the first study
using GEE for RS of water quality parameters in inland waters. Collecting field data for
monitoring water quality can be costly in terms of money, time, and effort. Additionally,
traditional monitoring techniques do not extend over a large area and are often difficult to
repeat over time. Satellite RS imagery can help monitor water quality at frequent intervals
over large areas. To estimate water quality parameters like chlorophyll-a (Chl-a) concentra-
tions, turbidity, and dissolved organic matter, [152] used ML and DL models to analyze
RS imagery. Harmful algal blooms (HABs) have become a serious issue in freshwater
ecosystems. RS has proven to be a cost-effective means for monitoring HABs. The authors
in [153] developed a methodological framework for mapping Chl-a concentrations with
multi-sensor satellite observations and in-situ water quality samples.

3.2.6. Wetland Mapping


Wetland mapping is one of the most well-developed applications using GEE and AI
(16 studies total). Table 6 below summarizes those studies and a word cloud generated
from the titles and keywords of the 16 papers is provided in Figure 11. The most frequently
used term is “Google Earth Engine”. Wetlands have many different subtypes and
occur in both inland and coastal environments, so we can see “wetlands” alongside terms
like “tidal flats” and “coastal”. Most of the wetland mapping studies we reviewed take
place in Canada and combine high-resolution RS imagery like Sentinel-1 and Sentinel-2
imagery to better distinguish between water and aquatic vegetation. From our interactive
web app (see Appendix A) and Table 6, the most-used RS datasets in those studies are
Sentinel-1, Sentinel-2, and Google Earth. The most popular models used are RF, boosted
regression trees (BRT), CART, and Simple Non-Iterative Clustering (SNIC), and the most-used
evaluation metrics are OA, Kappa, PA, and UA. A brief summary of those studies
is provided right below Table 6. More detailed textual summaries for almost all reviewed
wetland mapping studies are provided in Appendix C.6.

Table 6. Studies targeting wetland mapping from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Hird et al. (2017) [35] | classification | BRT | LiDAR DTM, Sentinel-1, Sentinel-2 | Canada
Farda (2017) [159] | classification | CART, Fast NB, GMO Max Entropy, IKPamir, MLP, Margin SVM, Pegasos, RF, Voting SVM, Winnow | Landsat 3 MMS, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, ASTER GDEM | Indonesia
Amani et al. (2019) [160] | classification, segmentation | RF, SNIC | Landsat 8 | Canada
Mahdianpari et al. (2018) [161] | classification | RF | Sentinel-1, Sentinel-2 | Canada
DeLancey et al. (2019) [162] | classification | BRT | LiDAR DEM, Sentinel-1, Sentinel-2, SRTM DEM | Canada
Wu et al. (2019) [163] | classification | k-means | NAIP, JRC Global Surface Water datasets, LiDAR DEMs, National Wetlands Inventory (NWI) | Canada, United States
Amani et al. (2019) [17] | classification | RF | Canadian DEM, Landsat 8, Sentinel-1 | Canada
Zhang et al. (2019) [164] | classification | RF | Google Earth Pro, Landsat 8 OLI | China
Mahdianpari et al. (2020) [165] | classification, segmentation | RF, SNIC | aerial photography, Google Earth, Sentinel-1, Sentinel-2 | Canada
Hakdaoui et al. (2020) [166] | classification | RF | ASTER DEM, Landsat 5 TM, Sentinel-1 GRD, Sentinel-2 MSI | Morocco
DeLancey et al. (2019) [167] | classification | U-Net, XGBoost | ALOS DEM, Sentinel-1, Sentinel-2 | Canada
Mahdianpari et al. (2020) [168] | classification | RF | Canada's Annual Crop Inventory, Google Earth, Pleiades, Sentinel-1, Sentinel-2, WorldView 2 | Canada
Wang et al. (2020) [169] | classification | DT | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China
Mahdianpari et al. (2020) [170] | classification | CART, MD, RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Canada
Sahour et al. (2021) [171] | classification | RF, SVM | aerial photography, Google Earth, JRC Global Surface Water, Sentinel-1, Sentinel-2 | United States
Jia et al. (2021) [172] | segmentation | Otsu's thresholding algorithm | DJI Phantom 4 pro, Gaofen-2, Google Earth, Sentinel 2 | China

Figure 11. Word-cloud visualization of all the reviewed papers targeting wetland mapping (i.e., those
16 papers summarized in Table 6).
visualization of all the reviewed papers targeting wetland mapping (i.e.,
those 16 papers summarized in Table 6).
Wetland serves as the globally biggest carbon pool, and thus has important ecolog-
ical service functions
Table 6. Studies targeting(e.g., water
wetland conservation,
mapping regulation,
from RS imagery usingand
AI. maintenance of species
diversity) [173–175]. Global climate change and human activities have posed dramatic chal-
References lenges
Methodin the past few decades
Model to wetland ecosystems,
Comparison and Type
RS Data wetland mapping is essential
Study Area to
conserve and manage terrestrial ecosystems [176]. RS makes investigating large wetland
systems and monitoring their change over time possible
LiDAR DTM,[177].
Sentinel-1,
Hird et al. (2017) [35] classification
Wetlands are highly BRT dynamic landscapes, often making past efforts Canada
to map them
Sentinel-2
out-of-date. This is especially true at the regional or national level, where it is often difficult
to monitor wetlands at scale
CART, due GMO
Fast NB, to their remote location and large spatial scale. While
Landsat 3 MMS, Landsat
Max Entropy, IKPamir,
5 TM, Landsat 7 ETM+,
Farda (2017) [159] classification MLP, Margin SVM, Indonesia
Landsat 8 OLI, ASTER
Pegasos, RF, Voting
GDEM
SVM, Winnow
there are efforts to monitor wetlands in Canada at the sub-regional and -province level,
this is mostly through governmental efforts to produce static maps. Cloud computing on
GEE was utilized in [35] to create an open-source, reproducible map of wetland occurrence
probability using LiDAR and RS data for the entire area of Alberta. Mapping subtypes
of wetlands is difficult because while they look similar in RS imagery, they are diverse
environments that cover a wide area. The same is true for classifying peatlands, a subtype
of wetlands, which cover large geographic areas in complex patterns. This is problematic
because peatlands, like wetlands, provide critical habitats that promote biodiversity while
also being a global carbon sink. Past studies have shown that while optical data are useful
for peatland mapping, it is often occluded by clouds or other atmospheric conditions. SAR
data, on the other hand, can detect bodies of water and vegetation at any time of day or
night, but are prone to being noisy due to surface moisture content and roughness. The
authors in [162] demonstrated that by combining SAR, optical, and LiDAR data on the GEE
platform, a BRT model was able to predict peatland occurrence across Alberta province
with relatively high accuracy at high resolution.
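Multi-sensor feature stacking of this kind follows a recognizable GEE pattern, sketched below with the gradient tree boosting classifier standing in for a BRT. This is an illustrative sketch, not the recipe of [162]: the AOI, date ranges, bands, training asset, binary peatland/non-peatland labels, and tree count are assumptions.

import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([-114.0, 53.0, -113.0, 54.0])  # hypothetical AOI

# Optical, SAR, and terrain layers stacked into one feature image.
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(aoi).filterDate('2019-05-01', '2019-09-30')
      .median().select(['B3', 'B4', 'B8', 'B11']))
s1 = (ee.ImageCollection('COPERNICUS/S1_GRD')
      .filterBounds(aoi).filterDate('2019-05-01', '2019-09-30')
      .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
      .select(['VV', 'VH']).median())
dem = ee.Image('USGS/SRTMGL1_003')
stack = s2.addBands(s1).addBands(dem.rename('elevation')).addBands(ee.Terrain.slope(dem))

labels = ee.FeatureCollection('users/your_account/peatland_points')  # hypothetical asset
samples = stack.sampleRegions(collection=labels, properties=['class'], scale=20)

# GEE's gradient tree boosting classifier plays the role of the BRT model here.
brt = ee.Classifier.smileGradientTreeBoost(numberOfTrees=200).train(
    features=samples, classProperty='class', inputProperties=stack.bandNames())
peatland_prob = stack.classify(brt.setOutputMode('PROBABILITY'))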
Due to the difficulties in producing wetland inventory maps, either from lack of
field data or the challenge of recognizing wetlands because of their heterogeneous and
fragmented nature, these maps are often only produced at a local level. Furthermore,
because of the many local efforts to produce these maps, wetland inventories are often
produced with different datasets and different methods, limiting the ability of interested
parties/stakeholders to compare or combine maps. Anthropogenic activities are meanwhile
converting these wetlands into agricultural or urban landscapes, in addition to natural
rain and flooding events changing their spatial makeup. Thus, it is more important than
ever to be able to produce wetland inventory class maps in order to monitor and protect
existing wetlands. The authors in [161] used optical and SAR RS imagery to produce a 10 m
resolution wetland map for the entire province of Newfoundland, Canada, using both an
RF model and SNIC. Mapping environmental features like wetlands is the first step in being
able to make informed decisions about conservation and restoration projects. However,
more relevant to policymakers is how environments change over time. This information
would allow them to isolate how human activity has changed wetlands during different
periods. The authors in [170] classified wetlands in Newfoundland during three different
periods to show the spatial dynamics of these ecosystems. There have been several attempts
to produce wetland inventory maps in Canada on a large scale, although they often lack
high spatial resolution and the ability to distinguish between wetland sub-types. There is
also the issue of a lack of ground-truth field data, a common problem in ML applications
in EO (there is overwhelmingly more unlabeled data than labeled data). The authors in [17]
proposed using field data collected from one Canadian province to create wetland inventory
maps for several other provinces with a mix of optical, SAR, and digital elevation data.
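The wetland-mapping studies above share a common GEE workflow: build seasonal optical and SAR composites, stack them with terrain data, sample labeled field points, and train a random forest. The following minimal sketch in the GEE Python API illustrates that pattern only; the area of interest, date ranges, band selection, and the 'users/example/wetland_training_points' asset with its 'class' property are placeholders, not details taken from the reviewed papers.

import ee

ee.Initialize()

# Hypothetical area of interest and labeled wetland samples (placeholder asset
# path and 'class' property; real studies substitute their own field data).
aoi = ee.Geometry.Rectangle([-57.5, 47.5, -52.5, 51.5])
samples = ee.FeatureCollection('users/example/wetland_training_points')

# Median Sentinel-2 surface-reflectance composite with a simple cloud filter.
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(aoi)
      .filterDate('2020-05-01', '2020-09-30')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
      .median()
      .select(['B2', 'B3', 'B4', 'B8', 'B11', 'B12']))

# Mean Sentinel-1 VV/VH backscatter composite for the same period.
s1 = (ee.ImageCollection('COPERNICUS/S1_GRD')
      .filterBounds(aoi)
      .filterDate('2020-05-01', '2020-09-30')
      .filter(ee.Filter.eq('instrumentMode', 'IW'))
      .select(['VV', 'VH'])
      .mean())

# Terrain information from the SRTM DEM.
dem = ee.Image('USGS/SRTMGL1_003').select('elevation')

# Stack optical, SAR, and elevation predictors and train a random forest.
stack = s2.addBands(s1).addBands(dem)
training = stack.sampleRegions(collection=samples, properties=['class'], scale=10)
rf = ee.Classifier.smileRandomForest(numberOfTrees=200).train(
    features=training, classProperty='class', inputProperties=stack.bandNames())

# Classify the stack to obtain a wetland map over the area of interest.
wetland_map = stack.classify(rf).clip(aoi)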
Across Canada, wetland mapping is a well-studied phenomenon. However, different
local and regional agency wetland inventories use different techniques for monitoring
wetlands or have altogether different definitions of what constitutes a wetland. Thus,
even though several large-scale wetland maps have been produced, they are often not
directly comparable. Additionally, these maps are often static and do not continually
monitor wetlands through time. However, these are not the only barriers to mapping
wetlands using RS imagery [165]. Others include obtaining sufficient and recent field data
to verify wetland monitoring products, but also the difficulty of monitoring such dynamic
landscapes. Wetlands do not have clear-cut boundaries, are extremely diverse landscapes
and ecosystems, and are often in flux throughout seasons and years due to flooding and
drying. The authors use optical and SAR Sentinel data in addition to field samples over
the entirety of Canada and show that almost one-fifth of Canada is covered in wetlands.
The study in [165] produced a high-resolution (10-m) wetland inventory map of Canada
(an approximate area of one billion hectares), using multi-year, multi-source (Sentinel-1
and Sentinel-2) RS data on the GEE platform. Wetlands provide a variety of ecological
services and are a key habitat for many species. Human activity has significantly disturbed
wetlands as they are drained for urban or agricultural development. However, monitoring
their health is challenging because it would require taking repeated field measurements
over wide areas. Researchers have used ML and RS data to do so, but the large amount of
compute needed to map wetlands is often prohibitive. The authors in [160] analyzed a large
number of field samples alongside Landsat imagery with an RF model to produce a wetland
map for all of Canada. Wetland mapping and monitoring have been a challenging issue
for the RS community during the past decades. Compared with the United States, which has
the National Wetlands Inventory, Canada lacked a national wetland inventory until
recently. The authors in [168] proposed an object-based classification method to classify
Sentinel-1 and Sentinel-2 data on the GEE cloud-computing platform, which resulted in the
10-m Canadian Wetland Inventory.
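GEE exposes SNIC superpixel segmentation as a built-in algorithm, which is the usual entry point for object-based classification of the kind applied in [161,168]. The sketch below, again using the GEE Python API, is only illustrative: the area of interest, date range, and the size/compactness settings are assumptions, not the parameters used in those studies.

import ee

ee.Initialize()

# Hypothetical area and a Sentinel-2 composite to segment into image objects.
aoi = ee.Geometry.Rectangle([-56.5, 48.0, -55.5, 48.8])
composite = (ee.ImageCollection('COPERNICUS/S2_SR')
             .filterBounds(aoi)
             .filterDate('2020-06-01', '2020-09-30')
             .median()
             .select(['B2', 'B3', 'B4', 'B8']))

# SNIC superpixel segmentation: 'size' controls seed spacing in pixels and
# 'compactness' trades spectral homogeneity against spatial regularity.
snic = ee.Algorithms.Image.Segmentation.SNIC(
    image=composite, size=20, compactness=1,
    connectivity=8, neighborhoodSize=64)

# Per-object mean band values and cluster ids; these object-level features can
# be passed to a classifier (e.g., an RF) instead of raw pixel values.
clusters = snic.select('clusters')
object_means = snic.select(['B2_mean', 'B3_mean', 'B4_mean', 'B8_mean'])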
Large, inundated wetlands can be effectively mapped using RS imagery. Small wet-
lands or wetlands that are inundated only part of the time are much more difficult to
identify. Yet, it is more important to do so now than ever given that wetlands are rapidly
being converted for agricultural use or are drying up due to climate-induced drying. Moni-
toring wetlands at large scales is possible, however, with the help of automated techniques
like ML. For example, NAIP imagery and LiDAR derived DEM data were used in [163]
to detect wetlands across the northern United States using unsupervised classification on
the GEE platform. Being able to identify wetlands in RS imagery is the first step towards
monitoring their health or decline in a new climate regime, and to make policy choices
based on this information. To this end, spatially high-resolution sensors like LiDAR or
data products like NAIP can help researchers identify wetlands in RS imagery but are not
collected often enough to map wetlands at a fine temporal resolution. This is problematic
because wetlands are dynamic ecosystems; they can be both wet and dry over the course of
the same season. To get around this limitation, Sentinel-1 and 2 imagery were combined
in [171] with aerial photographs and field data to map the spatial variation of wetlands
in portions of the United States over time. Environmental problems are often associated
with land-use changes, but these changes are not solely linked to urban expansion. Land
use change also negatively affects areas like coastal wetlands, which are not monitored as
regularly. The possibility of using GEE to map coastal wetlands in Indonesia was explored
in [159] by comparing all of the different classifiers on the platform and how they perform
with Landsat, digital elevation, and Haralick texture data. The authors showed that in all
cases, ML models did much better at binary than multi-class classification.
Tidal flats, often referred to as coastal non-vegetated areas, are dynamic ecosystems,
both due to their natural rhythms of water advance and retreat, but also due to anthro-
pogenic change and rising sea levels. It is difficult to monitor tidal flats without the use
of multi-temporal, high-resolution RS imagery because of how they change through time.
With Landsat 8 and high-resolution Google Earth imagery, an RF model was used in [164]
on GEE to classify tidal flat types and their distribution in China. The authors reported very
high classification rates across tidal flat classes. However, the authors detailed that satellites
like Landsat did not fully capture tidal ranges. Coastal wetlands are usually composed
of coastal vegetation areas and tidal flats. Coastal tidal flats are natural transitions from
terrestrial ecosystems to ocean ecosystems and are vulnerable to anthropogenic activities
and natural disturbances such as sea-level rise, land reclamation, and aquaculture. Many
existing global land cover data products have a wetland layer, but do not explicitly dif-
ferentiate coastal vegetation area and coastal tidal flats (no specific layer for coastal tidal
flats). The authors in [169] developed a pixel- and frequency-based approach to generate
annual maps of tidal flats at 30-m spatial resolution in China’s coastal zone using the
Landsat TM/ETM+/OLI images and the GEE cloud computing platform. Tidal flats are
unique ecosystems but are threatened due to human disturbances and climate change.
Additionally, they are difficult to identify in RS imagery because satellite platforms cannot
capture intertidal variability due to their infrequent return times. The authors in [172]
addressed this limitation by first processing high-resolution RS and UAS imagery to map
minimum and maximum water and vegetation extent. They used Otsu’s thresholding
algorithm to automatically detect the best ratio for each index. These two indices were then
combined in a composite that showed the total intertidal area in the RS imagery, to which
the authors again applied the Otsu thresholding algorithm. The end result was a highly
accurate map of tidal flats that did not require any post-processing.
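Otsu's method is not a built-in GEE reducer, so studies typically pull an index histogram from GEE and locate the threshold that maximizes between-class variance. The sketch below shows one way to do this with the GEE Python API and NumPy; the coastal rectangle, the NDWI band pair, and the 30 m scale are assumptions for illustration, not the imagery or indices used in [172].

import ee
import numpy as np

ee.Initialize()

# Hypothetical coastal area and a Sentinel-2 median composite.
aoi = ee.Geometry.Rectangle([120.0, 33.0, 121.0, 34.0])
image = (ee.ImageCollection('COPERNICUS/S2_SR')
         .filterBounds(aoi)
         .filterDate('2021-01-01', '2021-12-31')
         .median())

# NDWI (green vs. NIR) as a simple water index.
ndwi = image.normalizedDifference(['B3', 'B8']).rename('ndwi')

# Retrieve the NDWI histogram over the area of interest.
hist = ndwi.reduceRegion(reducer=ee.Reducer.histogram(255),
                         geometry=aoi, scale=30, maxPixels=1e9).get('ndwi').getInfo()
counts = np.array(hist['histogram'], dtype=float)
means = np.array(hist['bucketMeans'], dtype=float)

def otsu(counts, means):
    # Return the bucket mean that maximizes between-class variance.
    total = counts.sum()
    best_t, best_var = means[0], -1.0
    for i in range(1, len(counts)):
        w0, w1 = counts[:i].sum(), counts[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (counts[:i] * means[:i]).sum() / w0
        m1 = (counts[i:] * means[i:]).sum() / w1
        var_between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, means[i]
    return best_t

threshold = otsu(counts, means)
water_mask = ndwi.gt(threshold)  # pixels above the Otsu threshold treated as water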
Sebkhas are a type of salty, unvegetated wetland created when desert bodies of water
become more salinated over time due to mechanisms of water loss such as evaporation.
They are home to specific species of vegetation and fish that can survive in salinated
environments, but their drainage networks are often underground, making them hard to
identify in RS imagery. An RF model was used in [166] to identify water cavities where
sebkhas form in Morocco. Wetland inventory maps are increasingly being used to inform
carbon pricing, ecosystem service values, and conservation/restoration decisions. Thus, it
is important to make a repeatable processing pipeline that can ingest, process, and visualize
data on a day-to-day basis so that monitoring programs and reporting programs (like in a
government setting) have up-to-date, accurate information. To this end, there have been
many studies identifying wetlands using RS imagery and ML, yet most of them suffer from
not being able to distinguish between wetland subtypes. This is a challenging issue because
fens, peatlands, bogs, marshes, and swamps can have very different vegetation types and
structure. It is important to be able to distinguish between them because they each respond
differently to human disturbance and changes in climate. The authors in [167] compared
the performance of an XGBoost model to a CNN for wetland type classification.

3.2.7. Infrastructure and Building Detection, Urbanization Monitoring

Infrastructure, building detection, and urbanization monitoring is the 7th-most-well-
developed application using GEE and AI (11 studies total). Table 7 below summarizes
those studies and a word cloud generated from the titles and keywords of those papers
is provided in Figure 12. The most frequently used terms are “Google Earth Engine”,
“urban”, “land”, “building”, “impervious”, etc. The vast majority of the studies in this
domain take place in China and are both static mapping and change-detection applications.
Infrastructure and urban area identification is often done by comparing these classes to
other LULC classes, so we notice that “vegetation” and “forest” also appear in the word
cloud. From our interactive web app (see Appendix A) and Table 7, the most frequently
used RS datasets are Landsat 8 OLI, Landsat 7 ETM+, and Google Earth. The most popular
AI models are RF, CART, and SVM, and the most frequently used evaluation metrics are
OA, Kappa, PA, and UA. A brief summary of those studies is provided below Table 7.
More detailed textual summaries for some selected studies are provided in Appendix C.7.

Figure 12. Word-cloud visualization of all the reviewed papers targeting infrastructure and building
detection, urbanization monitoring (i.e., those 11 papers summarized in Table 7).
Table 7. Studies targeting infrastructure and building detection from RS imagery using AI (Note that
references marked * denote novel methods and are detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Goldblatt et al. (2016) [178] | classification | CART, RF, SVM | Google Earth, Landsat 7 ETM+, Landsat 8, WorldPop | India
Huang et al. (2018) [179] | classification | BRT | Google Earth, Landsat 7 ETM+, Landsat 8 OLI | China
Xu et al. (2019) [180] | classification, segmentation | LandTrendr, RF | FROM-GLC, GHSL, Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China
Zhong et al. (2019) [181] | regression | cubic regression | GPP, GOME-2, Google Earth Pro, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, MOD09A1 | China
Lin et al. (2020) [182] | classification | RF | DMSP NTL, GHSL, GlobeLand30, Google Earth, Landsat 8, Sentinel-1, SRTM DEM, VIIRS NTL | China
Liu et al. (2020) [183] | classification, segmentation | CART, Otsu's thresholding algorithm, RF | Geo-Wiki, GHSL, GlobeLand30, Google Earth, Hansen Global Forest Change, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, OpenStreetMap, SRTM DEM | China
Mugiraneza et al. (2020) [184] | classification, segmentation | LandTrendr, SVM | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Rwanda
Lin et al. (2021) [185] * | classification | CART, gmoMaxEnt, NB, RF, SVM | Landsat 8 OLI | China
Carneiro et al. (2021) [186] | classification | RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Sentinel-2, SRTM DEM | Brazil
Zhang et al. (2021) [187] | classification | RF | Landsat 8 OLI | China
Samat et al. (2022) [188] | classification | SVM | GCL-FCS30-2020, GHSL, Google Earth, Sentinel-2 | China

Materials like parking lots, roads, and buildings (i.e., concrete, asphalt) can be classified
as “impervious surfaces” in RS analyses and are often indicative of human development
and urban extent. Impervious surfaces change the hydrological cycle and produce heat
effects, affecting overall ecosystem health and well-being. To monitor these materials,
researchers have tried using night-time lights to estimate their extent, but this process leads
to overestimates as light scatters. To investigate how best to identify impervious materials
in RS imagery regardless of cloud cover, the authors in [182] combined nighttime light,
DEM, and SAR data and an RF model on GEE. Their resulting maps were more accurate
than commonly used maps like GlobeLand30. The authors in [180] put forward a new
scheme to conduct long-term monitoring of impervious-relevant land disturbances using
Landsat archives.
While greenhouses are used to grow food and help ensure food security, their prolif-
eration can have environmental consequences. Previous attempts to classify greenhouses
from RS imagery as part of LULC research have focused on small-scale proof-of-concept
applications and have not emphasized identifying the structures in complex terrain types.
To explore the possibility of identifying greenhouses in RS imagery over a large area in
China, an ensemble ML model was designed in [185] to distinguish them from water, forest,
farmland, and construction sites. Urban green spaces have a multitude of benefits, such as
regulating urban climate, improving air quality, and reducing stormwater. RS has proven
useful for studying the landscape structure of urban green spaces. The authors in [179]
assessed the impact of urban form on the landscape structure of urban green spaces in
262 cities in China. The results revealed that cities with a high road density tended to
have a smaller area of urban green spaces and be more fragmented. In contrast, cities with
complex terrains tended to have more fragmented urban green spaces.
Rapid urban expansion around the world has led to worsening human and ecosystem
health, affecting forests, air and water pollution levels, and overall levels of biodiversity.
However, the currently available maps for mapping urban settlements and their expansion
are mostly static, whereas it would be more useful to have up-to-date information to be
able to make better urban planning and land-use decisions. The authors in [186] designed a
workflow for mapping urban sprawl over time in Brazil using an RF on the GEE platform.
Increasing rates of urbanization put pressure on conservation targets and biodiversity levels
as land previously occupied by ecosystems is converted into built-up areas. RS imagery
makes it much easier for urban planners and researchers to monitor rates of urbanization
and urban sprawl over wide areas. However, few labeled datasets are available using ML to
identify buildings and built-up areas. To address this problem, a large, vectorized, ground-
truth verified dataset was created in [178] in India in order to train different ML models
on GEE. A semi-automatic large-scale and long-time-series (LSLTS) urban land mapping
framework was demonstrated in [183] by integrating the crowdsourced OpenStreetMap
(OSM) data with free Landsat images to generate annual urban land maps in the middle
Yangtze River basin (MYRB) from 1987 to 2017.
Research on urbanization and urban sprawl will often focus on how urban spaces are
replacing agricultural land and forested spaces. Vegetation maps, on the other hand, are
often produced using “urban”, “built-up areas”, or “impervious surfaces” as classes to
predict for, distinctly separating vegetation and zones of human inhabitation. Much less
work has gone into monitoring vegetation prevalence and distribution within urban spaces
themselves. This is an important and timely research topic given the environmental and
psychological benefits people get from having access to green spaces within cities, such
as stress reduction, better air quality, and lower temperatures. Using different vegetative
indices (EVI, Gross Primary Production, etc.) derived from Landsat and MODIS data, the
authors in [181] showed that urban sprawl in Shanghai had increased significantly in the
last decade and a half.
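The studies in this and the surrounding sections report overall accuracy (OA), Kappa, producer's accuracy (PA), and user's accuracy (UA), which GEE computes from a confusion matrix on held-out samples. The following is a minimal sketch of that evaluation step in the GEE Python API; the training asset, its 'class' property, the 70/30 split, and the Landsat composite are placeholders rather than any reviewed study's setup.

import ee

ee.Initialize()

# Hypothetical labeled points with an integer 'class' property.
samples = ee.FeatureCollection('users/example/lulc_training_points')
aoi = samples.geometry().bounds()

# Simple Landsat 8 surface-reflectance composite as the predictor stack.
stack = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
         .filterBounds(aoi)
         .filterDate('2020-01-01', '2020-12-31')
         .median()
         .select(['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']))

# Sample predictors at the labeled points and split 70/30 for validation.
data = stack.sampleRegions(collection=samples, properties=['class'], scale=30)
data = data.randomColumn('random', 42)
train = data.filter(ee.Filter.lt('random', 0.7))
test = data.filter(ee.Filter.gte('random', 0.7))

classifier = ee.Classifier.smileRandomForest(200).train(
    features=train, classProperty='class', inputProperties=stack.bandNames())

# Confusion matrix on the held-out samples yields OA, Kappa, PA, and UA.
matrix = test.classify(classifier).errorMatrix('class', 'classification')
print('OA:', matrix.accuracy().getInfo())
print('Kappa:', matrix.kappa().getInfo())
print('PA (per class):', matrix.producersAccuracy().getInfo())
print('UA (per class):', matrix.consumersAccuracy().getInfo())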

3.2.8. Wildfires and Burned Area


Wildfires and burned areas are addressed by fewer than 10 studies using GEE and AI (eight studies
total). Table 8 below summarizes those studies and a word cloud generated from the titles,
keywords, and abstracts of the eight papers is provided in Figure 13. The most frequently used
words are “Google Earth Engine”, “burn/ed”, “fire(s)”, “forest”, and “change”, which show
that the main focus of these studies is on monitoring forested areas pre- and post-fire. The
data products “Sentinel-2”, “Landsat”, and MODIS maps like “MCD64A1” (a burned-area
product) indicate that wildfire and burned area mapping analyses rely mainly on optical
imagery. From our interactive web app (see Appendix A) and Table 8, the most-used RS
datasets are Landsat 8 OLI and MODIS. The most popular AI models are RF, SVM, and CART,
and the most-used evaluation metrics are OA, commission error (CE), Kappa, omission error
(OE), and R2 . A brief summary of those eight studies is provided below Table 8. More detailed
textual summaries for each of the eight studies are detailed in Appendix C.8.

Table 8. Studies targeting wildfires from RS imagery using AI. (Note that references marked * denotes
novel methods and will be detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Parks et al. (2019) [189] | regression | RF | Landsat 4 TM, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Canada, United States
Quintero et al. (2019) [190] | segmentation | FormaTrend, LandTrendr | Landsat 5 TM, Landsat ETM+, Landsat OLI, MCD64A1, SRTM DEM | Spain
Long et al. (2019) [191] * | classification | RF, SVM | CBERS-4 MUX, FireCCI51, Gaofen-1 WFV, GFED4, Google Earth, MCD12C1, MOD44B, MTBS, Landsat-8 | Global
Bar et al. (2020) [192] | classification | CART, RF, SVM, Weka clustering | FireCCI51, IRS 1C, Landsat 5, Landsat 8 OLI, MODIS, ResourceSat 2, Sentinel-2, VIIRS | India
Sulova and Jokar Arsanjani (2020) [193] | classification | CART, NB, RF | CGLS-LC100, FIRMS, MOD13Q1, Sentinel-2, SRTM DEM | Australia
Zhang et al. (2020) [194] | classification | RF | Landsat 5 | Global
Seydi et al. (2021) [195] | classification | KNN, RF, SVM | Landsat 8, MODIS, Sentinel 2 | Australia
Arruda et al. (2021) [196] | classification | DNN | INPE, Landsat 8 OLI, MODIS | Brazil

Figure 13. Word-cloud visualization of all the reviewed papers targeting wildfires and burned areas
(i.e., those eight papers summarized in Table 8).

Wildfires cause damage to ecosystems and human health, in addition to releasing
greenhouse gasses when they burn. Climate change increases the number of wildfires
across the globe. The recent massive wildfires, which hit Australia during the 2019–2020
summer season, raised questions to what extent the risk of wildfires can be linked to
various climate, environmental, topographical, and social factors and how to predict fire
occurrences to take preventive measures. An automated and cloud-based workflow was
developed in [193] for generating a training dataset of fire events at a continental level
using freely available RS data on GEE. Landscape fires have been a major natural hazard
affecting West-Central Spain, and therefore, it is critical to be able to map and characterize
landscape fires. Using the LandTrendr (Landsat-based Detection of Trends in Disturbance
and Recovery) and FormaTrend (Forest Monitoring for Action-Trend) algorithms on
the GEE cloud-computing platform, a method was proposed in [190] for identifying fire-
induced disturbances. Wildfires are a common occurrence in the Brazilian Cerrado, often
determining and changing the natural plant species in burn cycles. However, the Cerrado
has been undergoing increasing anthropogenic conversion into cropland and pastures,
which has changed hydrological and biogeochemical cycles within this ecosystem. This in
turn has led to changes in fire size, pattern, frequency, and severity, so it is more important
than ever that methods to quickly and reproducibly monitor the fire landscape within
savannah are created. A completely cloud-based DL workflow combining Google Cloud
and GEE was designed in [196] to classify burn scar areas in Brazil.

Traditional wildfire mapping field surveys and digitization efforts are time-consuming
and hard to reproduce over time. Burned area indices can be created to monitor post-fire
landscapes and their subsequent recovery, but their thresholds are not dynamic and so
perform differently in different locations. Sentinel-2 data was used in [195], along with two
different burn areas and LULC maps, to train different ML classifiers (k-nearest neighbor
(KNN), RF, SVM) to map wildfire damage in Australia. As the planet warms, forest fires
are increasing in occurrence and severity. This has negative consequences for ecosystems,
biodiversity, and human health. To estimate the damage caused by forest fires and their
subsequent recovery rates, RS imagery is needed to monitor forests and burn scars over
large areas. However, to date, most fire products are created with coarse RS imagery,
making regional and local fire monitoring difficult. To determine the impact of using
higher-resolution RS data products, how Landsat and Sentinel optical imagery affected an
ML model’s performance in burn area classification was compared in [192].
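Burn-severity indices such as dNBR and the relativized burn ratio (RBR), which the following paragraphs discuss, are straightforward band math on pre- and post-fire composites in GEE. The sketch below computes them with the GEE Python API; the fire perimeter, date windows, cloud-cover filter, and the 0.1 dNBR cut-off are illustrative assumptions only, not values from the reviewed studies.

import ee

ee.Initialize()

# Hypothetical fire perimeter and pre/post-fire windows.
fire_area = ee.Geometry.Rectangle([-120.5, 38.5, -120.0, 39.0])

def sr_composite(start, end):
    # Cloud-filtered Landsat 8 Collection 2 surface-reflectance median composite,
    # with the Collection 2 scale factors applied.
    img = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
           .filterBounds(fire_area)
           .filterDate(start, end)
           .filter(ee.Filter.lt('CLOUD_COVER', 30))
           .median())
    return img.select('SR_B.').multiply(0.0000275).add(-0.2)

pre = sr_composite('2019-06-01', '2019-08-31')
post = sr_composite('2020-06-01', '2020-08-31')

# NBR = (NIR - SWIR2) / (NIR + SWIR2); SR_B5 and SR_B7 on Landsat 8.
nbr_pre = pre.normalizedDifference(['SR_B5', 'SR_B7'])
nbr_post = post.normalizedDifference(['SR_B5', 'SR_B7'])

# Differenced NBR (dNBR) and the relativized burn ratio (RBR).
dnbr = nbr_pre.subtract(nbr_post).rename('dNBR')
rbr = dnbr.divide(nbr_pre.add(1.001)).rename('RBR')

# Simple burned/unburned mask from a fixed, illustrative dNBR threshold.
burned = dnbr.gt(0.1).selfMask().clip(fire_area)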
Burned area maps showing where wildfires have occurred are important in being
able to analyze global wildfire trends. However, many burned area maps derived from
RS imagery are from the MODIS platform. The 250 m spatial resolution of products like
FireCCI51 leave out a lot of detail, so the authors in [191] used CBERS, Gaofen, and Landsat
imagery to create a 30 m burned-area dataset for 2015. However, the authors noted that their
method had difficulty recognizing burned areas from recently plowed fields in agricultural
areas, so crop-type masks should be used to remove potential false positives. Additionally,
Landsat data was used for both the data collection and validation stage. Thus, the authors
were not able to assess the suitability of using Landsat imagery for data collection purposes
despite their high accuracy rates. Later on, [194] adapted the exact same processing steps
on GEE to produce a burned area map for the year 2005, illustrating how sharing and
storing code on GEE makes it easy to re-run analyses or adapt them for new use cases.
Satellite-derived spectral indices such as the relativized burn ratio (RBR) allow fire
severity maps to be produced across multiple fires and broad spatial extents. In order to
better interpret the fire severity in terms of on-the-ground fire effects compared to non-
standardized spectral indices, [189] produced a map of composite burn index (CBI), a
frequently used field-based measure of fire severity.

3.2.9. Heavy Industry and Pollution Monitoring

There are seven studies about heavy industry and pollution monitoring using GEE
and AI. Table 9 below summarizes those studies and a word cloud generated from the
titles, keywords, and abstracts of the seven papers is provided in Figure 14. The most
frequently used words form the phrase “Google Earth Engine”. Most applications in this
area are focused on monitoring reclamation or pollution at active or previous mining sites,
so “mine” and “mining” feature prominently in this word cloud. The algorithm LandTrendr
was used by several papers after identifying mine sites to monitor pollution and water
levels or vegetation changes through time. From our interactive web app (see Appendix A)
and Table 9, the most-used RS datasets are Sentinel-2, Landsat 8 OLI, Landsat 5 TM,
Landsat 5, and Google Earth. The most popular models are RF, CART, and LandTrendr,
and the most-used evaluation metrics are Kappa, OA, PA, UA. A brief summary of those
seven studies is provided below Table 9. More detailed textual summaries for each of the
seven studies are detailed in Appendix C.9.

Figure 14. Word-cloud visualization of all the reviewed papers targeting heavy industry and
pollution monitoring (i.e., those seven papers summarized in Table 9).

Table 9. Studies targeting heavy industry and pollution from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Waller et al. (2018) [197] | regression | RF | DART, Google Earth Pro, Landsat 5 TM, NLCD | United States
Lobo et al. (2018) [198] | classification | CART | RapidEye, Sentinel 2 | Brazil
Xiao et al. (2020) [199] | segmentation | LandTrendr | Google Earth, Landsat 5, Landsat 7, Landsat 8 | Mongolia
Balaniuk et al. (2020) [200] | classification | CNN | Sentinel-2 | Brazil
Fuentes et al. (2020) [201] | classification | RF | Landsat 5, Landsat 8, Natural Resource Canada DEM, Sentinel-1, Sentinel-2 | Canada
He et al. (2020) [202] | segmentation | LandTrendr | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China
Zhou et al. (2021) [203] | classification | CART, RF, SVM | Sentinel-2 | China

Mining can lead to significant environmental degradation during the actual mining process
itself, but often continues to do so if mines are not properly reclaimed after the mine is no
longer active. Field techniques for monitoring environmental damage operate on a limited
spatial and temporal scale, failing to fully capture what is happening. RS can help monitor
ecological changes during mining and ensure that mining companies clean up after mining
has stopped during the reclamation process. A mapping study was performed in [198] for
mining areas in the Brazilian Amazon using Sentinel-2A images and the CART classifier in
GEE. To monitor mining disturbances at a coalfield in Mongolia, the LandTrendr algorithm
was used in [199] to analyze Landsat data. The authors designed a fast, efficient method
on the GEE platform to monitor surface mining operations and show that only 26% of
promised reclamation was undertaken at the Shengli Coalfield. Heavy industry projects
like mining normally require reclamation after the fact to ensure that local ecosystems can
heal and regenerate. Monitoring sites that have undergone mining is made much easier
with RS imagery because they are often large, spatially distributed ecological disturbances.
This is especially the case for underground mining projects where subsidence occurs but is
difficult to track without an aerial view. Landsat imagery and the LandTrendr algorithm
were utilized in [202] to monitor water accumulation in subsidence areas of past mining in
China. Mining is economically important because of the many jobs and resultant materials
it provides but is associated with various environmental and health risks. One such danger
comes from the failure of tailings dams, which store water with toxic levels of waste solids.
Even though these failures can cause significant damage to the environment, human health,
and infrastructure, there is not a global database containing active tailings dams. This in
turn can make it easier for illegal mines to operate as legal mining operations with tailings
dams are not heavily monitored. In order to keep track of mines and dams in Brazil, two
different CNNs were used in [200] to first classify potential mining sites and then to classify
perceived/potential environmental risk.
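The LandTrendr temporal segmentation used in [199,202] is available in GEE as ee.Algorithms.TemporalSegmentation.LandTrendr, which expects a single-band annual time series. The sketch below builds an annual NBR series and runs the algorithm via the Python API; the study rectangle, years, and fitting parameters are commonly used illustrative values rather than those of the cited papers, and depending on the index the series is often inverted so that disturbances move in the expected direction.

import ee

ee.Initialize()

# Hypothetical mining region and annual growing-season NBR composites.
aoi = ee.Geometry.Rectangle([115.8, 43.8, 116.4, 44.2])

def annual_nbr(year):
    start = ee.Date.fromYMD(year, 6, 1)
    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterBounds(aoi)
                 .filterDate(start, start.advance(3, 'month'))
                 .median())
    nbr = composite.normalizedDifference(['SR_B5', 'SR_B7']).rename('NBR')
    return nbr.set('system:time_start', start.millis())

years = ee.List.sequence(2014, 2021)
annual_series = ee.ImageCollection(years.map(lambda y: annual_nbr(y)))

# Fit the per-pixel segmentation; parameter values here are common defaults.
lt = ee.Algorithms.TemporalSegmentation.LandTrendr(
    timeSeries=annual_series,
    maxSegments=6,
    spikeThreshold=0.9,
    vertexCountOvershoot=3,
    preventOneYearRecovery=True,
    recoveryThreshold=0.25,
    pvalThreshold=0.05,
    bestModelProportion=0.75,
    minObservationsNeeded=6)

# The 'LandTrendr' band is an array image holding, per pixel, the year, source
# value, fitted value, and vertex flag of the fitted trajectory.
segmentation = lt.select('LandTrendr')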
As cities expand and develop, construction and demolition waste is often stored until
it can be further processed, reused, or gotten rid of. Sometimes these waste piles are orderly
and are trackable, but many are not, making it hard to manage them and their potential
negative environmental or social effects. Current methods to take stock of waste piles
and dump sites rely on field investigations, which take a lot of time, effort, and money to
produce. More work needs to be done to identify them using RS imagery and ML methods,
but tuning different ML methods and their respective parameters can lead to different
results. To test the efficacy of different ML algorithms for identifying waste and dump sites
in optical imagery, the parameters for the CART, RF, and SVM algorithms available on GEE
were optimized in [203].
Oil and gas pads are developed for production and then capped, reclaimed, and left
to recover when no longer productive. Understanding the rates, controls, and degree of
recovery of these reclaimed well sites to a state similar to pre-development conditions is
critical for energy development and land management decision processes. The authors
in [197] used time series data of the Soil Adjusted Total Vegetation Index (SATVI), calculated
from Landsat 5 imagery, to track changes and assess vegetation regrowth on 365 abandoned
well pads located across the Colorado Plateau. Previous estimates of particulate matter for
the Canadian Air Pollutant Emissions Inventory (APEI) were based on the exposed mine
disturbance areas that had been calculated using outdated mine area extents. With GEE
JavaScript API, RF classifiers were used in [201] to produce maps of mine waste extents
with Landsat-8 and Sentinel-1 and Sentinel-2 archives.
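The SATVI trajectory analysis in [197] boils down to mapping an index over a Landsat time series and reducing it over each well pad. A minimal Python API sketch of that pattern follows; the single buffered point, the date range, and the use of Landsat 8 Collection 2 (rather than the Landsat 5 archive of the original study) are assumptions for illustration.

import ee

ee.Initialize()

# One hypothetical well-pad footprint (a 60 m buffer around a point).
pad = ee.Geometry.Point([-110.2, 38.6]).buffer(60)

def add_satvi(image):
    # SATVI = ((SWIR1 - Red) / (SWIR1 + Red + L)) * (1 + L) - SWIR2 / 2, with L = 0.5.
    sr = image.select(['SR_B4', 'SR_B6', 'SR_B7']).multiply(0.0000275).add(-0.2)
    satvi = sr.expression(
        '((swir1 - red) / (swir1 + red + 0.5)) * 1.5 - swir2 / 2.0',
        {'red': sr.select('SR_B4'),
         'swir1': sr.select('SR_B6'),
         'swir2': sr.select('SR_B7')}).rename('SATVI')
    return image.addBands(satvi)

collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
              .filterBounds(pad)
              .filterDate('2015-01-01', '2021-12-31')
              .map(add_satvi))

def sample_mean(image):
    # Mean SATVI over the pad for one acquisition date.
    mean = image.select('SATVI').reduceRegion(
        reducer=ee.Reducer.mean(), geometry=pad, scale=30)
    return ee.Feature(None, {'date': image.date().format('YYYY-MM-dd'),
                             'satvi': mean.get('SATVI')})

time_series = ee.FeatureCollection(collection.map(sample_mean))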

3.2.10. Climate and Meteorology


There are seven climate and meteorology studies using GEE and AI. Table 10 below
summarizes those studies and a word cloud generated from the titles, keywords, and
abstracts of those papers is provided in Figure 15. The most frequently used words are
“Google Earth Engine”, “changes” and “climate”. There are also specific keywords for
ocean-related studies like “sea” and “salinity”, and studies focused more on land and
atmosphere characteristics like “surface”, “land”, “temperature”, “LST” and “albedo”.
From our interactive web app (see Appendix A) and Table 10, the most-used RS datasets
are Landsat 8, Landsat 5, and Sentinel-2. The most popular AI models are RF, and the most
frequently used evaluation metrics are mean absolute error (MAE), OA, root mean square
error (RMSE), R2 . A brief summary of those studies is provided below Table 10. More
detailed textual summaries for each of the seven studies are detailed in Appendix C.10.

Table 10. Studies targeting climate and meteorology studies.

References | Method | Model Comparison | RS Data Type | Study Area
Chrysoulakis et al. (2019) [204] | regression | polynomial regression | MCD43A1, MCD43A2, MOD09CMA | Global
Chastain et al. (2019) [205] | regression | major axis regression | Landsat 7 ETM+, Landsat 8 OLI, Sentinel-2 MSI | France, Portugal, Spain, United States
Demuzere et al. (2019) [206] | classification | RF | DMSP-OLS NTL, Global Forest Canopy Height, Landsat 8, Sentinel-1, Sentinel-2 | Australia, Brazil, Canada, China, France, Japan, Mexico, Poland, Singapore, Spain, Sudan, United Kingdom, United States
Ranagalage et al. (2019) [207] | classification | SVM | Landsat 5, Landsat 8 | Sri Lanka
Medina-Lopez and Ureña-Fuentes (2019) [208] | regression | DNN | Sentinel-2 | Global
Besnard et al. (2019) [209] | regression | LSTM, RF | Landsat 4, Landsat 5, Landsat 7, Landsat 8, MCD43A4 | Global
Elnashar et al. (2020) [210] | regression | ANN, GBR, SVR | MCD12Q1, MOD13A2, SRTM DEM | China, India, Myanmar, Thailand, Vietnam

Forests store much of the world’s terrestrial carbon, but globally they are under threat
due to the effects of global warming and human disturbance. While forests release carbon
immediately when they are cut down or otherwise disturbed, they also release carbon
through secondary effects. This type of climate “memory” or lag in carbon flux is much
less studied and so not well-known. To study this mechanism further, the authors in [209]
used an LSTM and compared the performance to an RF for carbon fluxes in global forests.
Figure 15. Word-cloud visualization of all the reviewed papers targeting climate and meteorology
(i.e., those seven papers summarized in Table 10).

Accurate satellite-derived albedo estimations are needed to parameterize and in turn
to validate climate simulation models. MODIS satellite observations from 2000 to 2015 were
analyzed in [204] using GEE to derive global snow-free land surface albedo estimations and
trends at a 500 m resolution. A method was presented in [208] to obtain high-resolution
sea surface salinity (SSS) and temperature (SST) by using Sentinel-2 Level 1-C Top of
Atmosphere reflectance data. The consistency between Tropical Rainfall Measuring Mission
(TRMM) multi-satellite precipitation and monthly gauged precipitation has been confirmed
worldwide. A downscaling framework (from 25 km to 1 km) was proposed in [210] for
TRMM precipitation products by integrating GEE and Google Colaboratory (Colab).
Furthermore, 30-m Landsat imagery has a long history of coverage between the
7 ETM+ and 8 OLI sensors. Sentinel-2 Multispectral Instrument (MSI) imagery
has a higher resolution of 10-m and faster revisit frequency (10 days instead of 16 days
for Landsat). Being able to use all of these sensors together for a given EO analysis
would greatly increase the available spatial and temporal resolution, but the sensors have
differences that need to be calibrated before they can be integrated. Still, this is one of the
most-requested datasets we found in our review. Major-axis regression was performed
in [205] on these datasets in pairs (7 ETM+/8 OLI, 7 ETM+/2 MSI, and 8 OLI/2
MSI) across the entire coterminous United States and they were able to determine cross-
platform correction coefficients for the Blue, Green, Red, NIR, and SWIR bands present in
all three satellites.
Urbanization has changed the urban landscape and resulted in increasing land surface
temperature (LST). In [207], the authors investigated the impacts of landscape changes on LST
intensity (LSTI) in a tropical mountain city in Sri Lanka. There are several ongoing attempts
to classify cities around the world based on various characteristics like urban canopy cover,
total built-up area, neighborhood sizes, and urban heat island effects (for example, see Urban
Atlas, World Urban Database Access and Portal Tools (WUDAPT)). These datasets can help
planners and policymakers make more informed decisions as they consider implementing
sustainability measures in their respective cities. However, these types of spatial datasets often
rely on surveying methods that need to be continually updated. A cloud-based workflow
was implemented in [206] and compared to the traditional method of using SAGA GIS for
producing local climate zone city maps based on data like WUDAPT.
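Most entries in Table 10 are regressions rather than classifications, and in GEE the same classifier objects serve both purposes once switched to regression output. The sketch below shows the pattern with the Python API; the gauge-point asset, its 'precip_mm' property, and the MODIS NDVI plus SRTM elevation predictors are placeholders, not the setup of any study in the table.

import ee

ee.Initialize()

# Hypothetical points carrying a continuous target ('precip_mm').
points = ee.FeatureCollection('users/example/precip_gauge_points')
aoi = points.geometry().bounds()

# Predictors: mean annual MODIS NDVI and SRTM elevation.
ndvi = (ee.ImageCollection('MODIS/061/MOD13A2')
        .filterDate('2020-01-01', '2021-01-01')
        .select('NDVI')
        .mean())
elevation = ee.Image('USGS/SRTMGL1_003').select('elevation')
predictors = ndvi.addBands(elevation)

training = predictors.sampleRegions(collection=points,
                                    properties=['precip_mm'], scale=1000)

# Random forest switched to regression mode to estimate the continuous target.
regressor = (ee.Classifier.smileRandomForest(100)
             .setOutputMode('REGRESSION')
             .train(features=training, classProperty='precip_mm',
                    inputProperties=predictors.bandNames()))

precip_estimate = predictors.classify(regressor).rename('precip_mm_estimated')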

3.2.11. Disaster Management


Disaster management has six studies using GEE and AI. Table 11 below summarizes
those studies and a word cloud generated from the titles, keywords, and abstracts of the six
papers is provided in Figure 16. The most frequently used words are “Google Earth Engine”,
“recovery”, “landslide”, “post-disaster”, “hurricane”, and “damage”. Many studies were
focused on mapping buildings after flood or landslide events. One of the main challenges
to doing disaster management research on GEE is the delay between the time RS data are
recorded and the time they are uploaded to the platform. This limits the utility of doing
time-sensitive research or deploying time-sensitive applications on GEE (i.e., the keywords
“rapid”, “assess”). From our interactive web app (see Appendix A) and Table 11, the
most-used RS dataset is Landsat 8. The most popular ML models are RF and CART, while
the most frequent evaluation metrics used are OA, PA, and UA. A brief summary of those
studies is provided below Table 11. More detailed textual summaries for each of the six
studies are detailed in Appendix C.11.

Table 11. Studies targeting disaster management from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Yu et al. (2018) [211] | classification | RF | Landsat 5 TM, Landsat 7, Landsat 8, SRTM DEM | Nepal
Cho et al. (2019) [212] | classification | RF | Landsat 7 ETM+, Landsat 8 OLI, MODIS Terra, Sentinel-1, SMOS | United States
Uddin et al. (2019) [213] | segmentation | CART, GEOBIA | Google Earth, Landsat 8, Sentinel-1, SRTM DEM | Bangladesh
Vanama et al. (2020) [214] | segmentation | Otsu's thresholding algorithm | Global Precipitation Measurement satellite data, JRC Global Surface Water dataset, Landsat 8, Sentinel-1 GRD, Sentinel-2, WorldView 3 | India
Ghaffarian et al. (2020) [215] | classification | RF | GeoEye 1, Google Earth Pro, Landsat 7 ETM+, Landsat 8 OLI, WorldView 1, WorldView 3 | Philippines
Kakooei and Baleghi (2020) [216] | classification | CART, RF | NAIP, NOAA NGS Emergency Response | United States

Figure 16. Word-cloud visualization of all the reviewed papers targeting disaster management
(i.e., those six papers summarized in Table 11).

RS imagery has long been used to monitor community recovery after natural disasters.
Decision makers can use RS imagery and analyses to redirect resources during the
recovery process. Even so, many studies focused on disaster recovery use VHR imagery
that increases data storage and compute needs. To explore the suitability of GEE for
disaster recovery, the authors in [215] used an RF model trained on Landsat imagery to do
change detection on pre- and post-disaster areas in the Philippines. Building detections
in post-disaster scenes are a valuable resource for timely assessing damages in disaster
management. Using RGB images as input, an automatic building detection method was
proposed in [216] to find buildings and their irregularities in pre- and post-disaster (sub-)
meter resolution images.
Landslides are a major natural hazard in mountainous regions. Traditionally, landslide


mapping heavily relies on field surveys and visual interpretation of satellite imagery. A
new method was proposed in [211] for mapping landslides in Nepal using RF on GEE.
Many agricultural landscapes have incorporated surface drainage systems to stop fields from
flooding during heavy precipitation and runoff. These underground drainage networks have
caused flood forecasting to become harder to do since it is more difficult to track water in space
and time, as drainage networks are not always well mapped. The authors in [212] created
surface drainage maps through running an RF model on the GEE platform, by analyzing
vegetation, thermal, moisture, and climate datasets, along with surface drainage records.
Producing flood maps is critical to giving advanced warning to those who may be
in affected areas. However, producing these maps in real time is hindered by the fact
that many mapping applications focus on too small an area due to lack of computational
resources. The authors in [214] presented a case study for the 2018 Kerala flood in India.
They demonstrated how GEE can be used to process large optical and SAR RS datasets,
in conjunction with field and precipitation data, using image processing techniques to
produce high-resolution flood maps over a large area. Flood forecasting in Bangladesh
currently involves running hydrological inundation simulations based on DEMs in order
to produce early warning notifications. However, these simulations are compute-intensive
and require access to high-resolution, up-to-date DEMs. Cloud-based mapping using RS
imagery has the potential to provide quicker inundation forecasts over a large spatial area.
This is especially true for analyses that utilize SAR imagery, since floods are often caused by
heavy rains leaving optical imagery obstructed by clouds. The authors in [213] produced
flood maps in Bangladesh, by taking advantage of the easy-to-find data and freely available
compute on the GEE platform.
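Because the flood-mapping studies above lean on Sentinel-1, a common GEE pattern is simple pre/post-event backscatter change detection. The sketch below illustrates it with the Python API; the rectangle, event dates, and the -3 dB change and -20 dB backscatter thresholds are illustrative assumptions, not values from [213] or [214].

import ee

ee.Initialize()

# Hypothetical flood event window and area of interest.
aoi = ee.Geometry.Rectangle([89.0, 23.5, 90.0, 24.5])

def s1_vh(start, end):
    # Median Sentinel-1 VH backscatter composite (IW mode, dB) for a period.
    return (ee.ImageCollection('COPERNICUS/S1_GRD')
            .filterBounds(aoi)
            .filterDate(start, end)
            .filter(ee.Filter.eq('instrumentMode', 'IW'))
            .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
            .select('VH')
            .median())

before = s1_vh('2019-06-01', '2019-06-30')
after = s1_vh('2019-07-10', '2019-07-25')

# Open water sharply lowers VH backscatter, so a strong drop between the two
# composites combined with a low post-event value flags likely inundation.
change = after.subtract(before)
flood = change.lt(-3).And(after.lt(-20)).selfMask().clip(aoi)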
3.2.12. Soil
There are six soil studies using GEE and AI. Table 12 below summarizes those papers
and Figure 17 below is the word cloud generated from the title, keywords, and abstract
of the six soil studies. The most frequently used words are characteristics that soil researchers
try to monitor: “SOM”, “organic”, and “stocks” for soil organic matter, as well as “litter”,
“moisture”, “thermal”, and “salinity”. MODIS and Sentinel-2 data feature heavily in the
word cloud because they are the most-used data products (alongside Landsat). From
our interactive web app (see Appendix A) and Table 12, we found that the most-used RS
datasets are Landsat 8 OLI and SRTM DEM. The most popular ML models used are RF and
CART, while the top evaluation metrics used are R2 and RMSE. A brief summary of
those studies is provided below Table 12. Much more detailed textual summaries for each
of the six studies are detailed in Appendix C.12.

Figure 17. Word-cloud visualization of all the reviewed papers targeting soil (i.e., those six papers
summarized in Table 12).

Table 12. Studies targeting soil from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Padarian et al. (2015) [217] | classification, regression | CART, Serial Rifle Classifier | SRTM DEM | United States
Ivushkin et al. (2019) [218] | classification | CART, RF, SVM | Landsat 5 SR, Landsat 8 SR, SoilGrids | Global
Poppiel et al. (2019) [219] | regression | RF | ALOS DEM, Landsat 4 TM, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Brazil
Cao et al. (2019) [220] | regression | KNN, QRF, RF | LANDFIRE, Landsat 7 | United States
Greifeneder et al. (2021) [221] | regression | GBRT | ASTER DEM, CGLS-LC100, GLDAS, Landsat 8 OLI, Landsat 8 TIRS, MOD13Q1, Sentinel-1, SRTM DEM | Global
Zhang et al. (2021) [222] | regression | ANN, RF, SVM | HydroSHEDS DEM, MODIS, Sentinel-2A | China

Many authors come to GEE curious to test out the new cloud computing platform
for their domain-specific application. GEE provides freely available compute and data to
interested researchers, which they then use to explore the strengths and limitations of GEE.
An early soil mapping study was performed in [217] on GEE in 2015. Collecting field samples
for soil mapping can be time- and labor-intensive and can be bound to small areas given their
costs. These data collections also need to be repeated, representing a barrier to presenting
up-to-date information that covers large spatial areas to decision-makers. To address these
issues, the authors in [219] used field observations, DEM data, and Landsat imagery on GEE
to map different soil types and soil attributes across a large region in Brazil.
Soil plays a critical role in the carbon and water cycles, along with providing areas
for habitat or agricultural use. The spatial distribution of litter and soil carbon (C) stocks
is important in greenhouse gas estimation and reporting and inform land management
decisions, policy, and climate change mitigation strategies. The effects of spatial aggregation
of climatic, biotic, topographic and soil variables on national estimates of litter and soil C
stocks were explored in [220]. The authors also characterized the spatial distribution of
litter and soil C stocks in the conterminous United States (CONUS). Litter and soil variables
were measured on permanent sample plots from the National Forest Inventory (NFI) from
2000 to 2011. Beyond mapping litter and soil carbon (C) stocks, it is also important to map
soil organic matter at a large scale, but traditional field collection techniques are cost- and
effort-intensive. Many researchers have thus turned to RS imagery and/or ML to map
soil organic matter, but there is still some difficulty in selecting the right input data or ML
model for prediction. To determine how different datasets and ML models perform on GEE
in predicting soil organic matter, an ANN, RF, and SVR model was compared in [222] with
MODIS, Sentinel-2A, and DEM data as input.
Accurate soil moisture content information is crucial to being able to correctly model
water, energy, and carbon cycles, as well as being key to understanding and predicting
natural hazards like drought, floods, and landslides. However, most soil moisture datasets
are created with medium or coarse spatial resolution. Using optical, thermal, and SAR
imagery in addition to DEM data, a global, high-resolution soil moisture map was produced
in [221]. The authors concluded that optical RS imagery and land-cover information play
the most important roles in determining soil moisture content, but SAR imagery and
soil data also contribute significantly to the model’s overall performance. This finding
highlights other studies’ results ([95,161,182]) that the combination of optical and SAR
data improves predictive outcomes. Soil salinity can impact agricultural yields and is
a global issue, but current datasets like the Harmonized World Soil Database have low
spatial resolution and need to be updated. As one of the main soil salinity datasets in
Remote Sens. 2022, 14, 3253 42 of 110

use, this makes it difficult to estimate up-to-date soil salinity levels even as they change
due to increasing drought severity from global warming. The authors in [218] explored
GEE’s potential to make a global soil salinity map based on field data and Landsat thermal
infrared imagery.

3.2.13. Cloud Detection and Masking


There are five studies (four are novel methods) that used GEE and AI for cloud
detection and masking. Table 13 below summarizes those studies and a word cloud
generated from the titles, keywords, and abstracts of those papers is provided in Figure 18.
The most frequently used words are “Google Earth Engine”, “cloud(s)”, and “masking”.
The main task in this literature domain is masking clouds in optical imagery (which is
not a problem for SAR data), so the words “optical”, “Landsat-8”, and “Sentinel-2” are
also prominent in this word cloud. From our interactive web app (see Appendix A) and
Table 13, the most-used RS datasets are Landsat 8 OLI, SRTM DEM, and Google Earth. The
most popular models are Fmask and the most-used evaluation metrics are CE, OA, OE,
RMSE. A brief summary of those studies is provided below Table 13. More detailed textual
summaries for each of the five studies are detailed in Appendix C.13.

Table 13. Studies targeting cloud detection from RS imagery using AI (Note that references marked *
denotes novel methods and will be detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Gómez-Chova et al. (2017) [223] * | regression | kernel ridge regression, linear regression | Landsat 8, RapidEye, SPOT 4 | Argentina, China, Jordan, Spain
Mateo-García et al. (2018) [224] * | classification | ACCA, Fmask, k-means | Landsat 8 Biome Cloud Masks, Landsat 8 TOA | Global
Yin et al. (2020) [225] * | classification | DeepGEE-CD, FMask, RS_Net | Landsat 8 OLI | NS 1
Li et al. (2022) [226] | classification | Cloud-Score, Fmask, SVM, QA60 | Sentinel-2 | Amazon tropical forest, China, Sri Lanka
Zhang et al. (2022) [227] * | regression | DeepGEE-S2CR, DSen2-CR | SEN12MS-CR 2 | Global

1 Not Specified, 2 this is a co-registered set of Sentinel-1 and Sentinel-2 images.

Figure 18. Word-cloud visualization of all the reviewed papers targeting cloud detection and
masking (i.e., those five papers summarized in Table 13).
TableMany
13. Studies targeting
mapping andcloud detection from
identification tasksRSthat
imagery
use RSusing AI (Note
imagery andthat
ML references
rely on marked
optical
* denotes novel methods and will be detailed in Section 3.3.).
cloud-free imagery. Detecting and removing clouds in optical RS imagery is a difficult
but important task, as many other classification and detection methods rely on masking
References Method Model Comparison RS Data Type Study Area

kernel ridge
Gómez-Chova et al. Landsat 8, RapidEye, Argentina, China,
regression regression, linear
(2017) [223] * SPOT 4 Jordan, Spain
regression
Remote Sens. 2022, 14, 3253 43 of 110

clouds and on obtaining cloud-free imagery. Many algorithms, including Fmask, which is
a commonly used algorithm to create a cloud mask in RS imagery, rely on using thresholds
for single RS images, which makes them prone to errors when applied to entire RS time
series. The authors in [223] treated cloud detection as a change detection problem across
time using a kernel ridge regression model.
Optical RS imagery has many applications across several environmental and earth
science domains. However, optical RS imagery is often occluded by clouds, limiting its utility. While processing techniques such as monthly compositing can remove clouds to some extent, they rely on having enough cloud-free imagery to make the
composites, which is not always available. Recently, DL models have shown the ability to
reconstruct scenes in optical RS imagery that is blocked by clouds. However, researchers
looking to use DL models in cloud environments often have to coordinate across different
storage, analysis, and ML platforms (e.g., Google Cloud Storage, Google Colab, Google AI),
which can be cumbersome and expensive. The authors in [227] thus decided to implement
their cloud-removal DL model directly in GEE. Their model, DeepGEE-S2CR, is a cloud-optimized version of the DSen2-CR model presented in [228] and fuses co-registered Sentinel-1 and Sentinel-2 images from the SEN12MS-CR dataset.
Cloud detection is a well-studied task and GEE has several cloud detection/masking
algorithms available on its platform. However, some of them have been shown to be unstable, leading to considerable under- or overestimation. To explore how CV algorithms and ML models can be used together on GEE, [226] combined the existing Cloud-Score algorithm with an SVM to detect clouds in imagery covering the Amazon tropical forest, Hainan Island, and Sri Lanka. Fmask is the most commonly used method but has limited use
in mountainous regions where terrain and shadows can be confused for clouds or when
sudden changes in the Earth’s surface occur in time-series imagery. A convolutional neural
network (CNN) called DeepGEE-CD was built in [225] to detect clouds in RS imagery
directly on the GEE platform. Cloud screening may be cast as an unsupervised change
detection problem in the temporal domain. A cloud screening method based on detecting
abrupt changes along the time dimension was introduced in [224], assuming that image
time series follow smooth variations over land (background) and abrupt changes are mainly
due to the presence of clouds.
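As background for readers new to this task, the snippet below is a minimal sketch (ours, not drawn from any of the reviewed papers) of the simple bitmask-style cloud masking that the learned approaches above aim to improve upon; it uses the GEE Python API and the public Sentinel-2 catalog, and the area of interest, date range, and cloud-percentage threshold are hypothetical placeholders.

```python
import ee

ee.Initialize()  # assumes the GEE Python API is installed and authenticated

def mask_s2_clouds(image):
    """Mask opaque clouds (bit 10) and cirrus (bit 11) using the QA60 band."""
    qa = image.select('QA60')
    mask = (qa.bitwiseAnd(1 << 10).eq(0)
            .And(qa.bitwiseAnd(1 << 11).eq(0)))
    return image.updateMask(mask)

# Hypothetical area of interest; build a cloud-masked annual median composite.
aoi = ee.Geometry.Point([80.77, 7.87]).buffer(20000)
composite = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
             .filterBounds(aoi)
             .filterDate('2021-01-01', '2022-01-01')
             .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 40))
             .map(mask_s2_clouds)
             .median())
```

Because such per-image bitmask thresholds ignore temporal context, they share the single-image weaknesses discussed above, which is exactly the gap the time-series and DL-based methods in Table 13 target.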

3.2.14. Wildlife and Animal Studies


Wildlife and animal studies is one of the less developed application domains using
GEE and AI (four studies total). Table 14 below summarizes those studies and a word
cloud generated from the titles, keywords, and abstracts of the four studies is provided in
Figure 19. The most frequently used words are “insect”, “bird”, “roadkill”, and “malaria”
which show what scientists are trying to monitor in this domain. Meanwhile, words like “forest”, “water”, “Peru”, and “Amazonian” reflect where the studies are being conducted. From our interactive web app (see Appendix A) and Table 14, the most popular
AI model is an RF and the most-used evaluation metric is OA. A brief summary of those
studies is provided below Table 14. More detailed textual summaries for each of the four
studies are detailed in Appendix C.14.

Table 14. Studies targeting wildlife and animal studies.

References | Method | Model Comparison | RS Data Type | Study Area
Carrasco-Escobar et al. (2019) [229] | classification | RF | DJI Phantom 4 Pro, 3DR Solo | Peru
Ascensão et al. (2019) [230] | classification | binomial logistic regression | Hansen Global Forest Change, MCD12Q1, MOD13Q1 | Brazil
Lyons et al. (2019) [231] | classification | RF | DJI Phantom 3 Professional | Australia
Pérez-Romero et al. (2019) [232] | classification, regression | KNN, RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Spain

Figure 19. Word-cloud visualization of all the reviewed papers targeting wildlife and animal
studies (i.e., those four papers summarized in Table 14).

UAS (i.e., drones) can collect high-quality data over large aggregations of wildlife and offer an attractive opportunity for improving methods and increasing the cost effectiveness of monitoring wildlife populations. The authors in [229] explored the
use of UAS for identifying Ny. darlingi breeding sites with high-resolution imagery
(~0.02 m/pixel) and their multispectral profile in Amazonian Peru. Land use changes, such as deforestation, irrigation, wetland modification, and road construction, may drive infectious disease outbreaks and interfere with their transmission dynamics. Accurate classification of Ny. darlingi-positive and -negative water bodies would increase the
impact of targeted mosquito control on aquatic life stages. Researchers in [231] developed
a semi-automated framework for monitoring large complex wildlife aggregations using
drone-acquired imagery over four large and complex waterbird colonies.
The success of conservation and mitigation management strategies may greatly depend on the knowledge of the temporal and spatial patterns of roadkill risk, and its
relationship with key environmental drivers. The authors in [230] used a set of freely
available environmental variables, namely habitat information from RS observations and
climatic information from weather stations, to assess and predict the roadkill risk.
Pest outbreaks are causing more damage to forests around the world as winters get
warmer and summers are drier and start earlier. These conditions allow pests to proliferate,
though pests do not always kill trees outright. They often defoliate trees, which weakens
them before future pest outbreaks or drought conditions. However, forest defoliation
is understudied and much of the research done in this area relies on coarse resolution
data. Using Landsat RS imagery, climate variables, and government environmental data,
Ref. [232] analyzed Pine Processionary Moth outbreaks in pine forests in southern Spain.

3.2.15. Archaeology
Archaeology is also one of the less researched applications using GEE and AI (three
studies total). Table 15 below summarizes those studies and a word cloud generated from
the titles, keywords, and abstracts of the three papers is provided in Figure 20. The most
frequently used words are “Google Earth Engine”, “detection”, “satellite”, “drone”, and
“survey” while terms like “automated” and “mounds” are also common. This reflects the
papers we reviewed and their focus on using the GEE platform to scale up and automate
exploratory surveys using RS data, both from satellite platforms and self-collected drone
imagery. From our interactive web app (see Appendix A) and Table 15, the most frequently
used RS dataset is WorldView 2. The most popular ML model is an RF and the most-used
evaluation metric is visual analysis.

Table 15. Studies targeting archaeology from RS imagery using AI (note that references marked * denote novel methods and will be detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Liss et al. (2017) [233] * | classification | Canny edge detection, RF | WorldView 2 | Jordan
Orengo and Garcia-Molsosa (2019) [234] * | classification | CART, RF, SVM | DJI Phantom 4 Pro | Greece
Orengo et al. (2020) [235] * | classification | RF | Google Earth, Sentinel-1, Sentinel-2 MSI, WorldView 2, WorldView 3 | Pakistan

Utilizing RS imagery for anthropological studies can be difficult because of a lack of financial resources, technical training, or compute needed to analyze large RS datasets. More specifically, when searching RS imagery for mounded sites and scattered materials that would indicate past human habitation, it is difficult to pair legacy field data with RS imagery. When archaeologists look for potsherds, either in the field or at development sites, the standard practice is to form walking surveys to detect evidence of prior human settlement. This usually involves a large group of people walking in parallel lines over a given area, documenting what they find along the way. This process involves a lot of upfront personnel costs. The authors in [233] demonstrated the potential role of GEE in the future of archaeological research through two case studies. The authors in [234] used drone imagery and GEE to detect potsherds in the field in the hopes of speeding up this process. In [235], the authors utilized optical and SAR data on GEE to create a classifier capable of outputting a likelihood that there is a mounded site in a given region of the Cholistan Desert in Pakistan. More detailed textual summaries for each of those three studies are provided in Appendix C.15, as they all proposed novel methods.

Figure 20. Word-cloud visualization of all the reviewed papers targeting archaeology (i.e., those three papers summarized in Table 15).
WorldView 3
3.2.16. Coastline Monitoring
Coastline monitoring is one of the less researched applications using GEE and AI (three studies total). Table 16 below summarizes those studies and a word cloud generated from the titles, keywords, and abstracts of the three papers is provided in Figure 21. The word clouds provide an informative (general and specific) focus of each set of the papers. For example, we can see that the most frequently used general words are “shoreline”, “coastline”, “tidal”, and “beach”. This type of research is interested in first detecting coastlines, but also in monitoring geospatial changes over time (i.e., keywords “detection”, “position”, “changes”, “temporal”, “time”, and “multi-annual”). From our interactive web app (see Appendix A) and Table 16, the most-used RS datasets are Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI.

Table 16. Studies targeting coastline monitoring studies.

References | Method | Model Comparison | RS Data Type | Study Area
Hagenaars et al. (2018) [236] | regression | linear regression, marching squares interpolation algorithm, region growing clustering algorithm | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Sentinel 2 | Netherlands
Vos et al. (2019) [237] | regression | MLP | Landsat 4 TM, Landsat 5 TM, Landsat 7, Landsat 8, Sentinel-2, UAS | Australia, France, New Zealand, United States
Cao et al. (2020) [238] | classification | hierarchical clustering | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China

Figure 21. Word-cloud visualization of all the reviewed papers targeting coastline monitoring (i.e., those three papers summarized in Table 16).

Observing and quantifying the changing position of shorelines is critical to present-day coastal management and future coastal planning. The authors in [236] presented an automated method to extract shorelines from Landsat and Sentinel satellite imagery. The authors in [237] evaluated the capability of satellite RS to resolve, at differing temporal scales, the variability and trends in sandy shoreline positions. In [238], the authors proposed a method to map continuous changes in coastlines and tidal flats in the Zhoushan Archipelago during 1985–2017, using Landsat images on the GEE platform. More detailed textual summaries for each of those three studies are provided in Appendix C.16.
algorithm
3.2.17. Bathymetric Mapping
There are only two bathymetric mappingLandsat studies 4leveraging
TM, GEE and AI. Table 17
Australia, France,
Landsat 5 TM,
below summarizes those studies and a word cloud generated from the titles, keywords,
Vos et al. (2019) [237] regression MLP New Zealand,
and abstracts of the two papers is provided Landsat
in Figure7,22. The most
Landsat 8, frequently used words
United States
are “bathymetry”, “satellite” and “satellite-derived”, as well
Sentinel-2, UAS as “validation”. Currently,
bathymetric mapping applications are derived from radar, sonar, and light detection and
ranging (LiDAR) measurements from boats and small aircraft in conjunction with model
simulations. The authors using GEE for bathymetric mapping research are trying to use
satellite imagery and ML on the cloud platform to generate bathymetric maps over much
larger scales than would be possible otherwise.
Remote Sens. 2022, 14, 3253 47 of 110

Table 17. Studies targeting bathymetry from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Traganos et al. (2018) [239] | regression | multiple linear regression | Garmin Fishfinder 160C sonar, Lowrance HDS-5 sonar, Sentinel-2 | Greece
Sagawa et al. (2019) [240] | regression | RF | CZMIL airborne LiDAR, HDS-5 sonar, HDS-7 sonar, Landsat 8, Riegl VO-880G airborne LiDAR | Japan, Puerto Rico, USA, Vanuatu

Figure 22. Word-cloud visualization of all the reviewed papers targeting bathymetric mapping (i.e., those papers summarized in Table 17).

Mapping bathymetry across large areas is a difficult problem. This is in part because high-resolution aerial radar data, which produce some of the best bathymetry maps, are expensive to collect and only cover small areas. Researchers in [239] paired field observations of coastal depths with RS imagery to train multiple linear regression models that can then predict in areas where no depth information is available. Without accurate bathymetry information, ships risk getting stranded in shallow water areas around the globe. Typically, ships equipped with sonar and planes that have airborne LiDAR are used to get water depth measurements. However, sonar is not suitable for shallow water measurements and airborne LiDAR is expensive to acquire. Moreover, there are very few bathymetry datasets that have a global reach. The authors in [240] used airborne LiDAR, sonar, and Landsat data to estimate bathymetry in Japan, Puerto Rico, the USA, and Vanuatu using an RF model. More detailed textual summaries for each of those two studies are provided in Appendix C.17.
are provided in Appendix C.17.

3.2.18. Ice and Snow


There are only two studies in ice and snow that have leveraged GEE and AI. Table 18 below summarizes the two studies and Figure 23 below shows the word cloud generated from the titles, keywords, and abstracts of the two studies. The authors are interested in measuring “changes” and “trends” in “ablation”, “break-up”, “freeze-up”, “freezing”, “phenology”, “subsistence”, and “reflectance” levels in “ice” and “snowfields”. From our interactive web app (see Appendix A) and Table 18, we found that the most-used RS datasets are Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI.

Table 18. Studies targeting ice and snow studies.

References | Method | Model Comparison | RS Data Type | Study Area
Tedesche et al. (2019) [241] | classification | CART | ArcticDEM, Landsat 4 TM, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | United States
Qi et al. (2020) [242] | regression | linear regression | Landsat 5 MSS, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, MOD09GQ, NOAA AVHRR | China

Figure 23. Word-cloud visualization of all the reviewed papers targeting ice and snow (i.e., those two papers summarized in Table 18).

Global warming is putting pressure on Arctic ice and snow cover, as the Arctic is heating up much more rapidly than the rest of the planet. In Alaska, changes in perennial snow cover have wide-ranging implications, from changing hydrology and vegetation patterns, to altering the local topology through more frequent freeze-thaw cycles, and to disrupting the ability of subsistence hunters in the region to find food. The authors in [241] used a CART model to track the changes in the cryosphere in Alaska. The duration and seasonality of lake ice is sensitive to local environmental changes such as wind, air temperature, and snow accumulation. Lake ice phenology (LIP; ice breakup and freeze-up dates and ice duration) is a particularly robust proxy for climate variability. The authors in [242] studied LIP in Qinghai Lake, China. A more detailed textual summary of those two studies is provided in Appendix C.18.

3.3. Advances in Methods


In this section, we provide a summary of all 21 novel method papers (i.e., those marked as * in the tables in Section 3.2 above). Specifically, see Table 19 below for novel methods for classification tasks, Table 20 for segmentation tasks, and Table 21 for regression tasks. A word cloud for all 21 novel methods papers (i.e., those papers in Tables 19–21) is provided in Figure 24. The most frequently used words are “Google Earth Engine”, “classification”, and “machine learning”. Despite being smaller research domains (in terms of total paper count for this GEE + AI review), archaeology and cloud detection and masking research presented many novel ways to use CV, ML, and DL methods on the GEE platform. In this word cloud, “cloud”, “masking”, “archaeology”, “archaeological”, and “survey” reflect this influence. “Urban”, “water”, and “fire” also appear because those domains included papers with novel methods. The “cover” and “surface” keywords could be referring to the LULC, vegetation, water, or infrastructure domains. Those novel methods are detailed in the subsections of Section 3.2, and recommendations inspired by them are provided in Sections 4.2 and 4.3. It is interesting to see that there are only three studies about archaeology (Section 3.2.15), but all three papers proposed novel methods.

Table 19. Method papers for classification tasks.

References | Evaluation Metrics | Application Area | Model Comparison | RS Data Type
Pekel et al. (2016) [32] | CE, OE | water | expert system | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI
Liss et al. (2017) [233] | accuracy | archaeology | Canny edge detection, RF | WorldView 2
Lee et al. (2018) [84] | OA, PA, UA | LULC | BULC-U | GlobCover, Landsat 5
Mateo-García et al. (2018) [224] | CE, false positive rate, OA, OE, RMSE, ROC, true positive rate | cloud | ACCA, Fmask, k-means | Landsat 8 Biome Cloud Masks, Landsat 8 TOA
Murray et al. (2018) [86] | CA, OA, PA | LULC | RF | Landsat 7 ETM+, Landsat 8 OLI, Landsat 8 SR, SRTM DEM
Orengo and Garcia-Molsosa (2019) [234] | visual analysis | archaeology | CART, RF, SVM | DJI Phantom 4 Pro
Long et al. (2019) [191] | OA, CE, OE, R2 | fire | RF, SVM | CBERS-4 MUX, FireCCI51, Gaofen-1 WFV, GFED4, Google Earth, MCD12C1, MOD44B, MTBS, Landsat-8
Alencar et al. (2020) [135] | CE, OA, OE, visual assessment | vegetation | DT, RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI
Orengo et al. (2020) [235] | visual analysis | archaeology | RF | Google Earth, Sentinel-1, Sentinel-2 MSI, WorldView 2, WorldView 3
Liang et al. (2020) [94] | Kappa, OA | LULC | CART, MD, RF | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, SRTM DEM
Wang et al. (2020) [151] | accuracy, CE, F1-score, IoU, Kappa, OE | water | MNDWI, MSCNN, RF | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI
Yin et al. (2020) [225] | CE, MIoU, OA, OE | cloud | DeepGEE-CD, FMask, RS_Net | Landsat 8 OLI
Amani et al. (2020) [69] | CE, Kappa, OA, OE, PA, UA, visual assessment | crop | ANN, SNIC | Canada’s Annual Crop Inventory, MCD12Q1, Sentinel-1, Sentinel-2
Long et al. (2021) [138] | Kappa, OA, PA, UA | vegetation | CART, LandTrendr, MD, NB, RF, SVM | CGLS-LC100, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Sentinel-1, Sentinel-2, SRTM DEM
Adrian et al. (2021) [72] | IoU, Kappa, OA, PSNR, SSIM | crop | DnCNN, RF, SegNet, U-Net, 3D U-Net | Sentinel-1, Sentinel-2, WorldView 3
Lin et al. (2021) [185] | Kappa, OA, PA, UA | infrastructure | CART, gmoMaxEnt, NB, RF, SVM | Landsat 8 OLI

Table 20. Method papers for segmentation tasks.

References | Evaluation Metrics | Application Area | Model Comparison | RS Data Type
Isikdogan et al. (2019) [147] | F1-score, precision, recall | water | DeepWaterMapv2, DeepWaterMap, MNDWI, MLP | Landsat 8
Mayer et al. (2021) [156] | F1-score, Kappa, recall, precision | water | U-Net | JRC Global Surface Water datasets, PlanetScope, Sentinel-1

Table 21. Method papers for regression tasks.

References | Evaluation Metrics | Application Area | Model Comparison | RS Data Type
Gómez-Chova et al. (2017) [223] | RMSE | cloud | kernel ridge regression, linear regression | Landsat 8, RapidEye, SPOT 4
Pipia et al. (2021) [141] | MAE, RMSE, RRMSE, R2 | vegetation | GPR | HyMap, Sentinel-2
Zhang et al. (2022) [227] | MAE, RMSE, PSNR, SSIM | cloud | DeepGEE-S2CR, DSen2-CR | SEN12MS-CR
PSNR, SSIM DSen2-CR

Figure 24. Word-cloud visualization of reviewed 21 novel methods papers (all those 21 papers from Tables 19–21).

4. Challenges and Research Opportunities
This section provides a summary of the patterns observed (Section 4.1) from reviewing
the research discussed above. Sections 4.2 and 4.3 describe the challenges and research
opportunities from application (Section 4.2) and technical (Section 4.3) perspectives.

4.1. Summary and Discussion


4.1.1. Brief Summary of Reviewed Studies
Our comprehensive and interactive review indicates that the integration of GEE and ML (such as RF) is relatively straightforward, as model training can be run directly on GEE, whereas the integration of GEE and DL is not as intuitive and convenient (e.g., DL is not supported directly in GEE, as detailed in Section 4.1.2 below; researchers need to train DL models outside GEE, either offline on their local computers or in Google Cloud AI).
However, the literature confirms that the integration of GEE and AI is becoming more
widespread for geospatial analysis across a range of domains (Section 3.2). The expanding
range of applications and increasing integration of AI methods into GEE observed in the
reviewed literature affirm their potential to enable effective and accurate RS systems at a
variety of scales.
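To make the "ML directly on GEE" workflow above concrete, the snippet below is a minimal sketch (ours, not taken from any specific reviewed study) of training one of GEE's built-in classifiers with the Python API; the labeled-points asset ID, band list, and label property are hypothetical placeholders.

```python
import ee

ee.Initialize()

# Hypothetical labeled points with an integer 'landcover' property.
points = ee.FeatureCollection('users/example/training_points')  # placeholder asset ID
bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']

# Median Landsat 8 surface-reflectance composite over the points' footprint.
composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterBounds(points)
             .filterDate('2021-01-01', '2022-01-01')
             .median()
             .select(bands))

# Sample the composite at the labeled points, then train GEE's built-in RF.
training = composite.sampleRegions(collection=points,
                                   properties=['landcover'], scale=30)
classifier = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=training, classProperty='landcover', inputProperties=bands)

# Apply the model; the result can be inspected interactively or exported.
classified = composite.classify(classifier)
```

The same pattern applies to the other built-in classifiers (e.g., CART and SVM) by swapping the constructor, which is one reason these models dominate the reviewed literature.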
Among the 200 reviewed studies, the most frequently used RS data are (see Figure 4c for details): Landsat 8 OLI (74 studies), Landsat 5 TM (49 studies), and Landsat 7 ETM+ (48 studies). RF (125), SVM (40), and CART (38) are the most popular ML models (see Figure 5a for details). It is not surprising that RF is the dominantly used model, as RF is a widely accepted and efficient ensemble learning model that has demonstrated the ability to cope well with a number of common ML problems (e.g., imbalanced data, missing values, the presence of outliers, and overfitting) [243]. Among the reviewed studies, the majority used ML (181), while only a small portion used DL (22) and CV (16); this is not surprising, given GEE’s limitations (Section 4.1.2). Note that these numbers do not add up to 200, because some studies used combinations of ML, DL, and CV and were therefore counted multiple times. Among the 22 DL studies, most had to run the DL models either offline on their local computers or on the Google Cloud AI platform. Only a very small portion of studies (Section 4.3.1) actually integrated GEE with DL, and only in an indirect way: DL models were trained offline or on Google Cloud AI, and the trained weights were then uploaded to GEE, where online prediction was performed. The most-employed evaluation metrics are OA (137 studies), PA (101), UA (98), and Kappa (76) (see Figure 5b for details). Of the 200 papers that we reviewed, all utilized GEE for data processing, and 104 papers also ran computation offline.
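For readers who want to compute the evaluation metrics listed above on GEE itself, the sketch below illustrates a standard hold-out assessment with the platform's confusion-matrix utilities; it is a generic example, and the pre-sampled table asset, band names, and label property are hypothetical placeholders rather than any reviewed study's setup.

```python
import ee

ee.Initialize()

# Hypothetical pre-sampled training table (e.g., produced with sampleRegions)
# containing band columns and an integer 'landcover' label.
samples = ee.FeatureCollection('users/example/sampled_training_table')  # placeholder
bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']

# 70/30 split for training and validation.
samples = samples.randomColumn('random', 42)
train_set = samples.filter(ee.Filter.lt('random', 0.7))
test_set = samples.filter(ee.Filter.gte('random', 0.7))

classifier = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=train_set, classProperty='landcover', inputProperties=bands)

# Confusion matrix on the held-out samples.
matrix = test_set.classify(classifier).errorMatrix('landcover', 'classification')
print('OA:', matrix.accuracy().getInfo())
print('Kappa:', matrix.kappa().getInfo())
print('PA (per class):', matrix.producersAccuracy().getInfo())
print('UA (per class):', matrix.consumersAccuracy().getInfo())
```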
While the research investigated in Section 3 has demonstrated the power of using GEE
and AI for many different problem domains, most of the studies use GEE’s built-in ML
methods (e.g., RF, SVM, and CART). There is still a long way to go before researchers can
more easily develop, implement, test, and use novel AI methods (especially DL) on the
platform (see Section 4.1.2) due to bottlenecks in integrating GEE with Google AI cloud.
Some thematic areas are saturated with application-oriented papers, as is evident from the list and number of citations in each subsection of Section 3.2. Our recommendation is that, for these areas (e.g., crop mapping and LULC), journals accept fewer application-based papers unless they contribute new datasets or processing pipelines for working with multiple datasets, and instead start calling for novel method-based papers. However, other
areas (e.g., archaeology and bathymetry) could benefit from more use-cases or proof-of-
concept papers that open-source their code and data, speeding up the pace of research in
those respective fields.
From our interactive web app tool (see Figure 20 below), we noticed that most work
does not include hardware and software specifications (e.g., what CPU/GPU the authors
used to run their models, what Python libraries they used to implement the DL models,
etc.) and/or processing times [244]. Of the 200 total papers we reviewed, 101 ran strictly in
cloud computing environments (i.e., they had no offline component). Of these 101 papers,
only 10 papers provided their offline computation specifications (see Figure 25b for details).
From Figure 25a, most GEE integrated with AI work ran on the GEE cloud platform. Of
these papers, 98 (i.e., those marked as NA, which refers to “not applicable”) ran solely on
cloud platform(s) and 92 (those marked as NS, which means “not specified”) ran locally
without giving the hardware specification of the machines or runtimes for their analyses.
Of those studies that used cloud computation, the majority of them are on GEE while
a few combined GEE and the Google AI platform. A visual summary of software used
in the reported literature is provided in Figure 26. If a publication only used GEE or its
APIs, this is given a value of “NA” for “not applicable” since no additional software was
used. We can see from Figure 26 that 96 papers fall into this category. Of the remaining
papers that specified software that was used to complete part of an analysis outside of
GEE, 27 studies used R, 23 used Python, 19 used ArcGIS, and 10 used the scikit-learn
Python package. To make models comparable, reproducible, and to inform the design of
RS systems, it is important to report this type of information [245]. This is even true for
index-based methods and more traditional ML models so that researchers can fully evaluate
the trade-offs between runtime, accuracy, and ease of implementation. The interactive web
app tool that accompanies this review is intended, in part, to make future research more
reproducible. Most papers have an open-access PDF/HTML version of their manuscripts,
though a sizable portion of manuscripts (42 out of 200 of reviewed articles) do not. To
increase the rate of progress integrating GEE and AI, we suggest authors seek to provide
an open-access version of manuscripts whenever possible.

Figure 25. Statistics related to studies being computed in the cloud or computed offline on local computers in the reviewed 200 papers. (a) Computed online on cloud platforms, (b) computed offline on local machines. NA refers to “not applicable”, indicating a publication’s code ran solely on cloud platform(s) and NS means “not specified”.

Figure 26. Statistics related to what software and/or programming languages were used in the studies in the reviewed 200 papers. NA refers to “not applicable”, meaning that those papers used only GEE to complete their analysis.

4.1.2. GEE Limitations


GEE serves as a great free-of-charge cloud platform for EO big data processing and
analysis. With the very large amounts of data and combinations of temporal domains
utilized in [21], GEE was critical to enabling these investigations. The use of GEE also
facilitated the testing of several ML algorithms in a much faster way than would have
been possible without it. The oil palm classification demonstrated [107] in GEE is useful to
provide a quick understanding of oil palm plantations present in the landscape. This in itself
is advantageous for independent monitoring bodies to conduct a survey of the landscape
in question and conduct more detailed assessments if necessary. In the near future, it
is foreseeable that a growing number of large-scale mapping and monitoring programs,
enabled through the integration of AI with GEE, will emerge as critical tools to help
scientists, managers, and policymakers understand and respond to our environment [136].
However, GEE also has multiple noticeable limitations. Many authors reported compute limits, lack of processing methods, the inflexibility of different models, and a lack of
data as their main limitations (each detailed below). Some recommendations for future
research derived from these limitations are provided in Sections 4.2–4.4.
• Compute limits [17,55,78,83,85,87,93,141,155,171,200,217,234,239,240]: Authors often
ran into memory errors when analyzing too many field samples/observations. This
also happened, more generally, when the size of authors’ input data was too large, and it was difficult to know beforehand whether intermediate processing steps would trigger this error. Thus, many authors had to export data as part of their analysis to access
functionality not on GEE or because using GEE would make them run out of the
amount of free compute provided. For example, every image uploaded to GEE (at
the time of this paper’s release) is limited to 10 GB [234]. As the authors used sub-
centimeter drone imagery, they had to downsize each image before uploading it,
resulting in a loss of resolution. See below for a few quoted limitations:
# “ . . . The users are limited to approximately 1 million training points . . . ,
a limitation in using a high number of trees within GEE when the amount of
field samples is high” [17].
# “One of the disadvantages of using the GEE cloud computing platform is that
it limits the number of field samples and input features. This is especially
challenging when the analysis is applied to a large domain, which may reduce
the efficiency of the implemented method” [171].
# “ . . . The current GEE pipeline for processing the available data on GEE
through the Python or JavaScript APIs requires exporting large volumes of
data to cloud or local storage . . . These processes are time consuming and
require extra funds for cloud processing and cloud storage” [87].
• A lack of processing methods/models/algorithms [17,21,35,46,81,85,93,101,107,110,
120,141,152,158–160,201,215,217,221,225], reasons listed were:
# There is a lack of domain-specific models and methods (GEE algorithms are
general) because GEE is more developed in some areas (LULC, forest, vegeta-
tion, crop) than others [17,46,107,152,158,225,239];
# No neural networks (NNs) are currently supported on GEE directly, but many
authors use DL models for their research [46,55,71,107,136,161,166], and they
either have to train their DL models offline or on Google Cloud AI, which is
not free of charge. Authors can also use TensorFlow on Google Colab and
Google Cloud AI but not directly on GEE. For example, in [225], “ . . . limited
by the computation resource of GEE, some specific convolution layers of DNN
cannot be implemented in GEE. For example, a dilated convolution layer could
not be achieved due to the fact that dilation is not supported in the convolu-
tion API provided by GEE. Conversion of other types of convolutions to the
convolution used in this study may help to solve this problem and it needs
further investigation . . . ”. The authors in [156] mention, “ . . . integration of
Remote Sens. 2022, 14, 3253 54 of 110

the Google AI platform with GEE creates a versatile technology to deploy deep
learning technologies at scale. Data migration and computational demands
are among the main present constraints in deploying these technologies in an
operational setting;”
# SNIC is the only object-based classifier on GEE; authors also want more “advanced methods” or just more options;
# Hyperparameter tuning is not possible on the platform [21], so many authors
use local software (e.g., scikit-learn) for this purpose and then upload the
models to GEE afterwards;
# One of the benefits of using an RF model is that you can run a feature importance analysis afterwards to determine which set of input features contributed most to the model’s learning. However, this extremely common and important operation is not possible on GEE, so, like hyperparameter tuning, it is typically performed offline (see the sketch after this list).
• Inflexibility of models [19,35,46,152,159]: This limitation is similar to lack of models
but is different in that it describes issues using models already on GEE. For example,
authors in [35] emphasized, “A third limitation to the modeling approach described
here is its current incomplete use of cloud-computing services, and reliance on desk-
top computer power to run the BRT models. Ideally, the modeling would be run
within the same environment where the satellite data are preprocessed—Google Earth
Engine—or a similar cloud-computing service offering similar levels of access to
Sentinel datasets. GEE does currently provide machine-learning algorithms such
as random forests, but these do not provide the flexibility that is currently offered
within the BRT R functions”. This is both lack of methods and model inflexibility. The
authors in [46] found that in general the algorithms on GEE were not very flexible
and some preprocessing steps such as dealing with missing data were difficult to
implement. Thus, the authors performed all preprocessing steps outside of the GEE
platform.
• Lack of data [32,46,54,67,75,94,120,126,127,160–162,183,184,193,215,221]: This related
to both a lack of field observations and curated RS datasets.
# Not every data product is on GEE;
# Authors specifically called for a Landsat-Sentinel combined dataset. This dataset
could serve as the foundation for research in many different application areas by
expanding both the spatial and temporal resolution available to researchers;
# Very-high-resolution imagery is not on GEE, meaning that to validate GEE
prediction results authors often need to download this data locally.
• Importing and exporting data from GEE [83,126,193,198,234]: This process is time-
consuming and results in lower resolution classification maps. However, many
authors need to import or export data based on storage constraints on GEE.
• Other limitations:
# There is a delay from the time RS data are available and the time that they
are uploaded to the platform, limiting their utility for time-sensitive applica-
tions [213,214];
# Authors might have a hard time converting programs to GEE from their own
environment [81,136,217]. Cited issues were that authors were not familiar with
JavaScript, Python, or the GEE programming interface. Authors were concerned
that not everyone would have the skillset to implement models in GEE;
# A concern that data and code will not be kept private for sensitive use-
cases [217].
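The sketch below illustrates the offline workaround referenced in the list above (hyperparameter tuning and feature importance with scikit-learn on a training table exported from GEE). It is a generic example under our own assumptions: the CSV path, band columns, label name, and parameter grid are hypothetical placeholders and do not reproduce any reviewed paper's pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical training table exported from GEE (e.g., via Export.table.toDrive)
# with spectral band columns and an integer 'landcover' label.
df = pd.read_csv('training_samples.csv')  # placeholder path
feature_cols = ['B2', 'B3', 'B4', 'B8', 'B11', 'B12']  # placeholder band names
X, y = df[feature_cols], df['landcover']

# Hyperparameter tuning, which GEE does not provide natively.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid={'n_estimators': [100, 300],
                                  'max_features': ['sqrt', 0.5]},
                      cv=5)
search.fit(X, y)

# Feature importance analysis, also unavailable on GEE.
best_rf = search.best_estimator_
ranked = sorted(zip(feature_cols, best_rf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f'{name}: {score:.3f}')
```

The tuned hyperparameter values (not the fitted model itself) can then be re-used when configuring the corresponding built-in classifier back on GEE, which matches the "tune locally, then upload" practice several of the cited studies describe.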

4.2. Challenges and Opportunities from an Application Perspective


Most of the current integrations of GEE and AI utilize data and models already
available on GEE (detailed in Section 3.2). Only a few papers proposed novel methods
Remote Sens. 2022, 14, 3253 55 of 110

(summarized in Section 3.3). Below, we provide some challenges and opportunities related
to application-oriented research.

4.2.1. Proof-of-Concept for Less Researched Applications and Novel Methods for Saturated
Application Domains
The authors in [107] point out, “ . . . classification method demonstrated in GEE is
useful to provide a quick understanding of oil palm plantations . . . This in itself is advanta-
geous for independent monitoring bodies to conduct a survey of the landscape in question
and conduct more detailed assessments if necessary.” For applications that are not yet
well-studied using GEE, it will be useful to run some proof-of-concept experiments on GEE.
These types of analyses will shed light on what limitations exist for doing domain-specific
research on the platform (e.g., are the main barriers a lack of data, lack of preprocessing
models or AI methods, etc.).
Even for very saturated application domains (e.g., wetland mapping, see Section 3.2.6),
there are few novel methods. We would like to clarify that it is not that there are no novel
methods. Researchers still use interesting preprocessing pipelines, create new datasets, and often use DL. Again, we take a very narrow view of “novel” in this paper and this definition is
confined to how researchers are using AI methods on the GEE platform. Researchers focused
on wetland studies seem to be much more focused on using free compute, compiling and
scaling up datasets over larger areas than would be possible on local machines, and creating
open-source processing and visualization pipelines. However, there is still a lot of room for
novel methods for those saturated application domains. For example, it would be useful
for a saturated application domain to experiment with novel methods developed for other
domains. The web app we developed for this review paper will serve as an important tool
to easily find novel methods (check the demo video of the web app for how to find a novel
method paper; the link to the video is provided in the Appendix A).

4.2.2. Using ML for Exploration/as an Aid to Human Expertise


At a certain point, it is difficult or impossible for humans to determine meaningful
relationships in complex, highly dimensional data. One of the ways AI is most helpful is in
data exploration. Still, the goal of EO-AI research should not be to automate away human
expertise since AI models cannot understand human values and are often heavily biased.
In [233], the authors use an RF set to output probabilities instead of class predictions to
identify possible archaeological features in Jordan. The authors in [235] do the same thing
in Pakistan over a large desert, saving time and effort that otherwise would have required
surveyors to spend time in potentially unsafe conditions. While [233,235] use ML models
to prepare for fieldwork identifying archaeological mounds over large areas, [234] used it
to identify potsherds in the field. They use drone imagery and GEE to identify potsherds,
thus allowing surveyors to focus their attention on finding and cataloging them even over
large areas. This exploration method has also been successfully demonstrated for burned
area mapping in [191]. The authors use a similar process of first using an RF model in
probability mode to find a good “starting point” for classification, and then they tweak
the probability to remove false positives before using a pixel-aggregation algorithm to
determine the final classification output. The authors are able to show that their method
shows good agreement with other commonly used burned area products, but with finer
classification boundaries. In order to explore the potential to distinguish between subtypes
of surface water body, the authors in [158] use slope, shape, and phenology, and flooding
information as input to an RF model to predict for lakes, reservoirs, rivers, wetlands, rice
fields, and agricultural ponds. They found that their method does not work very well for
wetlands and the OA is not very high (85%) across classes. However, the RF model they
use is interpretable and they show which other subclasses are easy or more difficult to
predict for. Unfortunately, the entire preprocessing (method) cannot be run directly on
GEE, because the shape features cannot be calculated on the GEE platform and are crucial

to the overall analysis; the authors first have to do this in a local environment and then
upload them.
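For readers who want to reproduce this exploratory pattern, the snippet below is a minimal sketch of switching a GEE classifier into probability output for a binary problem; it is not the exact setup of the studies cited above, and the predictor image, label asset, property name, and threshold are hypothetical placeholders.

```python
import ee

ee.Initialize()

# Hypothetical predictor stack and binary labels ('is_target' in {0, 1}).
image = ee.Image('users/example/predictor_stack')             # placeholder asset
labels = ee.FeatureCollection('users/example/binary_labels')  # placeholder asset
bands = image.bandNames()

training = image.sampleRegions(collection=labels,
                               properties=['is_target'], scale=10)

# PROBABILITY output mode returns the probability of the positive class
# (supported for two-class problems) instead of a hard label.
classifier = (ee.Classifier.smileRandomForest(numberOfTrees=200)
              .setOutputMode('PROBABILITY')
              .train(features=training, classProperty='is_target',
                     inputProperties=bands))

probability = image.classify(classifier)

# Analysts can sweep this threshold to trade recall for precision during exploration.
candidates = probability.gt(0.7).selfMask()
```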
It is important to note that the authors in these papers are actively changing the results
of classification. In some cases, they are doing so many times (over several iterations). Thus,
they are introducing bias into their models, but the trade-off is acceptable if the emphasis is
on exploration rather than on statistical validity. This methodology is similar to using an
expert system where domain experts use ML systems in a “collaborative” way, blending
human expertise with the automation capabilities of AI. Still, these models would need to
be continuously tested on new data to make sure that their probability threshold values are
accurate, and their predictions should not be taken at face value.

4.2.3. More (High Quality) Data


Many authors suggested and called for more data for performance improvement
(e.g., [16,127,162]). However, as the studies in [105] showed, more data are not always
better. The authors in [105] investigated the difference in ML model performance when
using single image mosaics, time series RS imagery, statistical features (median, standard
deviation), band ratios, or all of the features listed. They test this by training an RF model
on each subset of data to create LULC maps in Brazil. The authors find that inputting a
time series of the data is the most accurate, more accurate even than when using all of the
computed indices and statistical features. The authors in [239] train four different multiple
linear regression models on sonar from field data collection and optical RS imagery to map
bathymetric depths in three different locations near Greece. They acquired good results
with a very simple, intuitive model. While current trends point to the use of models with
increasing complexity, it is important to note that many times a simpler model will perform
well given high-quality input data.
The analysis in [246] suggests an increased focus on dataset scaling is needed; the
authors further emphasized that scaling to larger and larger datasets is only beneficial
when the data are high quality. Meanwhile, as [247] emphasizes, more data and a simpler
NN is better than a bigger NN with more data. This echoes the data-centric views of
AI [248] proposed by one of the AI pioneers Dr. Andrew Ng. Ng says it is time for “data-
centric” solutions to big issues. Ng observes that 80% of the AI developer’s time is spent on
data preparation [248]. Thus, domain experts should be involved in creating high-quality
datasets, since they know the data sources and relevant input variables much better than
AI engineers. Thus, together with the authors in [246,247], we call for responsibly collecting
larger datasets with a high focus on dataset quality. We also call for researchers to share
their datasets on GEE, which would be useful to a wide variety of domains and researchers.
Potential recommendations in this direction: (1) improving the quality of existing datasets,
and (2) working on generating more data but with a focus on good quality.

4.2.4. Feature Engineering and Feature Importance


Feature engineering (see Appendix A.1 in [1]) using RS data is difficult because it is
time-consuming to do and often relies heavily on human experience, domain knowledge,
and technical expertise (in terms of location, what kind of data are being processed, what
variable to look for, etc.). Meanwhile, feature engineering is often necessary for a given ML
analysis because of the large amount of RS imagery coming in every day. In this way, DL
methods can help because they are able to recognize complex patterns in data without the
necessity for feature engineering (feature engineering can still help in DL analysis, but is
not necessary, i.e., NNs can learn complex patterns from raw data). Still, tradeoffs between
traditional ML models and DL methods have not been properly mapped out for the RS
space. The authors [43] set a good example by addressing this issue through comparing
the performance of an RF model (with feature engineering) to a long short-term memory
neural network (LSTM) and U-Net NN models (without feature engineering) to identify
pasturelands. The RF model was trained on GEE, while the DL models had to be trained
offline as GEE does not currently support DL models (Section 4.1.2). U-Net had the highest

generalization across both the validation and testing sets, maintaining high accuracy rates,
while the LSTM and RF model underfit the test set. To illustrate the tradeoffs between
ML and DL models, the authors include run and inference times. The RF model was able
to complete training and prediction in 3 h. As reported in [43], U-Net takes a long time
to train, while the LSTM takes a long time at inference time. Specifically, the LSTM took
30 min to train but 23 h to predict on the test set, while the U-Net took 24 h to train but
1.2 h at inference time. Much more work like this should be done to explore the strengths
and weaknesses for ML and DL models, as this will be helpful for many research areas that
would like to take advantage of GEE and AI.
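As a concrete, hedged illustration of what lightweight feature engineering looks like on GEE, the snippet below derives a few common spectral indices as extra bands before sampling; the band names follow the public Sentinel-2 catalog, while the area of interest, date range, and choice of indices are placeholder assumptions rather than any reviewed study's setup.

```python
import ee

ee.Initialize()

# Placeholder AOI and a cloud-filtered Sentinel-2 median composite.
aoi = ee.Geometry.Rectangle([108.0, 22.0, 109.0, 23.0])
composite = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
             .filterBounds(aoi)
             .filterDate('2021-01-01', '2022-01-01')
             .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
             .median())

# Engineered features: NDVI, NDWI, and a simple red/SWIR ratio, added as extra
# bands so built-in classifiers (e.g., RF) can use them alongside raw reflectance.
ndvi = composite.normalizedDifference(['B8', 'B4']).rename('NDVI')
ndwi = composite.normalizedDifference(['B3', 'B8']).rename('NDWI')
ratio = composite.select('B4').divide(composite.select('B11')).rename('RED_SWIR_RATIO')

stack = (composite.select(['B2', 'B3', 'B4', 'B8', 'B11'])
         .addBands(ndvi).addBands(ndwi).addBands(ratio))
```

Feeding such an augmented band stack to an RF is the kind of engineered-feature baseline that the ML-versus-DL comparisons discussed in this subsection contrast with end-to-end DL models.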
With proper features from feature engineering, ML algorithms, which require less (good-quality) training data than DL, often perform better than DL. For example, the authors
in [136] reported that their results indicated that the classification accuracy of DL was not as
good as traditional ML methods (e.g., SVM). We recommend the following three directions
for future studies in terms of feature engineering.
(1) Compare multiple ML algorithms or ML vs. DL algorithms: As pointed out
in [136], it is worth investigating which methods (ML vs. DL) are better for a specific
domain application. Several ML models are compared in [115]
to map oil palm using Landsat 8 imagery in Malaysia. The authors find that tree-based
ML models (e.g., RF, CART) work better than an SVM for the task and are able to classify
large areas with high accuracy. Even so, classification errors are traced to the relatively
coarse resolution of Landsat data. The authors suggested that higher resolution imagery
(e.g., Sentinel) and the future ability to use DL methods on GEE would most likely further improve performance. The authors in [136] developed and implemented a new
pixel-based method (Ppf-CM) in GEE using 525 full Landsat scenes (19.96 billion pixels) to
monitor S. alterniflora dynamics. They found that Ppf-CM not only enhances the spectral
separability between S. alterniflora and others, but also improves the problems caused by
the scarcity of entire cloud-free Landsat scenes. These findings echo prior GEE-supported pixel-based studies (e.g., [80]) and further confirm that pixel-based methods outperform scene-based methods for monitoring S. alterniflora. The classification results in [161]
were evaluated using both pixel-based and object-based RF classifications available on the
GEE platform. The results revealed the superiority of the object-based approach relative to
the pixel-based classification for wetland mapping.
The authors in [46] compare several algorithms on the GEE platform, including CART,
IKPamir, LR, a multi-layer perceptron (MLP), NB, RF, and an SVM, for crop-type classi-
fication. The authors also use an ensemble NN but have to move off the GEE platform
since NNs are not currently supported. The ensemble NN performed the best out of all the
models. The authors found that atmospherically corrected Landsat data boosted model
performance more than when models were fed Landsat composites data. The authors
in [56] compare the performance of an artificial neural network (ANN) to CART, RF, and
SVM models on GEE for sugarcane mapping in China using Sentinel-2 imagery. The au-
thors identify that the SVM performs the best, but then go on to show which types of errors each model makes. For example, the ANN tended to overfit the data and give too much
preference to the sugarcane class, while tree-based models confuse the forest and water
classes. The authors then incorporate Normalized Difference Vegetation Index (NDVI)
information into the SVM to show how the model does with this extra information. It
is not clear why the authors did not allow each model to see NDVI information, as this
extra information may have helped various models learn better. If the authors wanted to
show how models learned from phenology information versus phenology combined with
NDVI information, they could have trained each model on separate subsets of the data.
While GEE allowed [159] to train several ML models, some models failed to run due to
computational constraints or inflexibility. The authors show that in all cases, ML models do
much better at binary than multi-class classification. The authors in [66] utilize many ML
algorithms available on GEE and compare specific time windows for phenological analysis
and find that the closer the data comes to planting and harvesting time, the better the ML models perform.
(2) SAR + optical RS images for better model performance: In addition, many stud-
ies reported [17,57,68,72,74,165,166,182,212,214] or suggested in future work [46,56,107,215]
that SAR combined with optical RS images would improve model performance. Three
classification methods (SVM, RF, and decision fusion) were used in [52] for the pixel-wise
classification for crop mapping. The SVM classifier resulted in the lowest accuracy. The
integration of multispectral and SAR data improved the classification accuracy. To improve
the results in this study, the authors in [56] identify that using SAR data would be helpful
in removing the impact shadows have on classification errors for sugarcane mapping. The
authors in [95] compare the contribution of SAR data and different indices (e.g., NDVI,
EVI, Soil Adjusted Vegetation Index (SAVI), Normalized Difference Water Index (NDWI))
derived from optical data on overall classifier performance. They find that including SAR
data moderately improves performance, while only NDWI gives the ML model a signifi-
cant performance enhancement. Using optical, thermal, and SAR imagery in addition to
DEM data, [221] produces a global, high-resolution soil moisture map. The authors use a
gradient boosted regression tree (GBRT) model to train on in-situ observations paired with
RS imagery to then predict soil moisture in other locations. After running a relative variable
importance analysis, the authors can conclude that optical RS imagery and land-cover
information play the most important roles in determining soil moisture content, but that
SAR imagery and soil data also contribute significantly to the model’s overall performance.
This finding highlights other studies’ results ([95,161,182]) that the combination of optical
and SAR data improves predictive outcomes.
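As a concrete illustration of this kind of comparison, the hedged sketch below (GEE Python API) trains the same RF once on optical features alone and once on optical plus SAR features and compares validation accuracy. Dataset IDs, dates, the ROI, and the labeled asset are illustrative assumptions, not the setup of any study discussed above.

```python
import ee

ee.Initialize()
roi = ee.Geometry.Rectangle([112.0, 22.0, 113.0, 23.0])   # illustrative ROI

s1 = (ee.ImageCollection('COPERNICUS/S1_GRD')
        .filterBounds(roi).filterDate('2021-01-01', '2021-12-31')
        .filter(ee.Filter.eq('instrumentMode', 'IW'))
        .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
        .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
        .select(['VV', 'VH']).median())

s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
        .filterBounds(roi).filterDate('2021-01-01', '2021-12-31')
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
        .median().select(['B2', 'B3', 'B4', 'B8', 'B11']))

labels = ee.FeatureCollection('users/your_account/crop_labels')   # hypothetical asset

def overall_accuracy(stack):
    """Train/validate an RF on one feature stack and return its overall accuracy."""
    pts = stack.sampleRegions(collection=labels, properties=['class'], scale=10)
    pts = pts.randomColumn('rnd', 42)
    train = pts.filter(ee.Filter.lt('rnd', 0.7))
    test = pts.filter(ee.Filter.gte('rnd', 0.7))
    rf = ee.Classifier.smileRandomForest(300).train(train, 'class', stack.bandNames())
    return test.classify(rf).errorMatrix('class', 'classification').accuracy()

print('optical only :', overall_accuracy(s2).getInfo())
print('optical + SAR:', overall_accuracy(s2.addBands(s1)).getInfo())
```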
(3) What input for what algorithms (feature importance): This section is separate
from feature engineering in that it is less concerned with computing new features from
existing data than with determining which input variables contribute to model learning.
In [58], the random samples extracted from the training pool along with RS-derived
features and climate variables were then used to train ecoregion-stratified RF classifiers for
pixel-level classification. Evaluation of feature importance indicated that Landsat-derived
features played the primary role in classification in relatively arid regions while climate
variables were important in the more humid eastern states.
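For readers who want to reproduce this kind of analysis, GEE's random forest exposes a relative variable importance through the trained classifier's explain() output. The sketch below assumes the `samples` and `stack` objects from the feature-engineering sketch in Section 4.2.4 above, and the presence of an 'importance' entry reflects our reading of the current API rather than any cited study's documented workflow.

```python
import ee

# Assumes `samples` (labeled ee.FeatureCollection) and `stack` (ee.Image of predictor
# bands) as in the earlier feature-engineering sketch.
rf = ee.Classifier.smileRandomForest(300).train(samples, 'class', stack.bandNames())

# explain() returns an ee.Dictionary describing the trained model; for smileRandomForest
# we have seen it include an 'importance' entry, but inspect the full dictionary yourself.
explained = rf.explain()
importance = ee.Dictionary(explained.get('importance'))
print(importance.getInfo())   # e.g., {'B4': ..., 'NDVI': ..., ...}
```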
To investigate how best to identify impervious materials in RS imagery regardless
of cloud cover, [182] combine nighttime light, DEM, and SAR data and an RF model on
GEE. Their resulting maps are more accurate than commonly used maps like GlobeLand30.
More importantly, though, the authors quantitatively show that using multiple sources of
data is better than using single sources for this task; optical data are the most important, but
SAR data improve accuracy rates across all metrics. In future studies, more work like this
needs to be done so that researchers can save time and effort by knowing which data will
be useful for a task beforehand. The authors in [178] compare different combinations of
input data and their impact on model performance. For their application, Landsat 8 data
serve as better input than Landsat 7 alone or Landsat 7 data with computed indices like
NDVI. Having access to datasets like the one produced by [178] will make it much easier
for future researchers to create more accurate building detection models, either by allowing
researchers to add to this dataset and training ML models or by using it as one of several
other datasets incorporated into the same analysis.
It is important not only to be able to map the current state of wetland vegetation, but also to track how that vegetation is changing over time. However, different sets of input data and ML
methods used for change detection of wetland vegetation need to be evaluated more fully
as choices made during preprocessing and hyperparameter tuning can affect the end result
of an analysis. The authors in [138] use an adaptive stacking algorithm to train an ML
classifier on optical, SAR, and DEM data to identify wetland vegetation. Adaptive stacking
uses one ML classifier to identify the optimal combination of ensemble classifiers and hyperparameters for a given task. In this case, the authors use an RF model to
determine the best combination of the CART, Minimum Distance (MD), NaiveBayes (NB),
RF, and SVM classifiers on GEE. The authors find that the adaptive stacking method is
much more accurate than the RF and SVM models alone. The resulting classification map
is then combined with a trend analysis performed by the LandTrendr algorithm, which
allows them to identify wetland vegetation distribution as it is now and also how it has
changed over time. The authors in [138] also test their workflow on different subsets of
input data and show that adding more data helped the adaptive stacking algorithm learn
better (the best combination of input data was all of the data). The authors note that forest
and reed classes were not identified well with their adaptive stacking algorithm, and that
the LandTrendr algorithm will most likely need to be re-tuned in different environments.
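To illustrate the general idea (not the authors' GEE implementation), the offline scikit-learn sketch below searches over combinations of base classifiers, stacks each combination under a random-forest meta-learner, and keeps the combination with the best validation accuracy; all data here are synthetic, and the classifier set only loosely mirrors the GEE classifiers named above.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for RS-derived features and class labels.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

base = {
    'cart': DecisionTreeClassifier(random_state=0),
    'nb': GaussianNB(),
    'svm': SVC(probability=True, random_state=0),
    'rf': RandomForestClassifier(n_estimators=200, random_state=0),
}

best = (None, -1.0)
for k in range(2, len(base) + 1):
    for names in combinations(base, k):
        # Stack this combination of base classifiers under an RF meta-learner.
        stack = StackingClassifier(
            estimators=[(n, base[n]) for n in names],
            final_estimator=RandomForestClassifier(n_estimators=200, random_state=0))
        score = stack.fit(X_tr, y_tr).score(X_val, y_val)
        if score > best[1]:
            best = (names, score)

print('best base-classifier combination:', best[0], 'validation accuracy:', round(best[1], 3))
```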
The authors in [15] integrated single-date features with temporal characteristics from
six time-series trajectories (i.e., two Landsat shortwave infrared bands and four vegetation
indices), to produce an intact-disturbed forest map to track degraded forests. The whole
processing pipeline is done on GEE using an RF. The authors also ran a relative variable
importance analysis for each ecoregion. The authors are able to show that past maps
are a bit outdated due to their inability to separate forest classes by intact and degraded,
although their results vary from ecoregion to ecoregion. The purpose of the study in [21]
was to determine how the inclusion or exclusion of data for training RF models with RS
and temporally variable climate variables influences model outcomes. Cloud computing on
GEE was utilized in [35] to create an open-source, reproducible map of wetland occurrence
probability using LiDAR and RS data for the entire province of Alberta. Using a BRT, the authors are able to match a current governmental effort in Alberta while also producing a relative variable importance analysis showing which RS variables might be most useful for future wetland
mapping efforts in the area.
The authors in [55] used a CNN–LSTM hybrid model to predict soybean yield in the
contiguous United States using RS imagery alongside weather data and show that the
hybrid approach works better than either CNN or LSTM alone, although the results were
better in some states than others. Additionally, the authors create combinations of input
data to determine which variables are most important in training their NN.
A low-cost method was demonstrated in [107] for monitoring industrial oil palm plan-
tations in Indonesia using Landsat 8 imagery that allowed them to distinguish between
oil palm, forest, clouds, and water classes using the CART, RF, and MD algorithms. Their
results demonstrated that CART and RF had higher OA and Kappa coefficients than the
MD algorithm. In addition, the authors in [107] compared model accuracy based on different combinations of spectral bands (particularly red-green-blue (RGB) and infrared bands including
shortwave infrared (SWIR), thermal infrared (TIR), and near infrared (NIR)), including all
bands, to determine which would help specifically with oil palm plantation monitoring.
The authors in [136] used a specific invasive species in China as a case study for
developing an ML pipeline that takes into account both cloud cover and phenological
information. They compared the ability of a stacked autoencoder and an SVM to classify
vegetation types. While the SVM was trained on GEE, the DL model had to be trained
offline as the platform does not currently support DL models. The authors find that the DL
model performs better than the SVM and that both models perform better with phenological
information. The same species of plant can look different at different stages of its life while
also being submerged under water in some RS scenes. The authors in [140] argue that
phenology information in RS time series can better capture tidal flat wetland vegetation
and so compare phenology information to statistical (min, max, median) and temporal features (quartile ranges). They then feed these data into an RF while analyzing their effect on
model performance during different periods of time (all data, green and senescence seasons)
for wetland vegetation classification. The authors showed that the phenological information
was the most important input feature to the RF, while combining all three sets of features
led to the highest accuracy. In addition, the model performed best when predicting over
both the green and senescence periods, most likely providing the model with a better
estimate of the total variance needed to identify wetland vegetation. More research like
this should be done to isolate the importance of individual input features and time periods
in ML model performance. To explore the potential to distinguish between surface water
body subtypes, [158] use slope, shape, phenology, and flooding information as input
to an RF model to predict for lakes, reservoirs, rivers, wetlands, rice fields, and agricultural
ponds. Their method does not work very well for wetlands and the OA is not very high
(85%) across classes. However, the RF model they use is interpretable and they show which
other subclasses are easy or more difficult to predict for. The authors in [192] found that
Landsat 8 data led to higher fire burn estimates while still improving fire burn detection
accuracy, though both Landsat and Sentinel-2 catch more small fire patches than MODIS.
To determine the impact of using higher-resolution RS data products, the study in [192]
compared how Landsat and Sentinel optical imagery affected an ML model’s performance
in burn area classification. The authors used Weka clustering output and different spectral
and index information as input into the CART, RF, and SVM models available on GEE.
They find that both Landsat and Sentinel imagery produce much better maps that capture
small burn areas that current maps and fire monitoring products like MODIS are not able
to capture, though Sentinel imagery leads to an underestimation in burn area. The authors
also find that the tree-based algorithms perform comparably to each other but much better
than the SVM model. This study highlights the importance of analyzing different data
sources and ML models to show their respective contribution to predictive performance.

4.2.5. Creative Integration of Existing Algorithms Available on GEE


Through in-depth exploration, it is promising for domain experts to propose creative
integration of existing CV/ML algorithms available on GEE. The GEE has provided an
extensive cloud platform to train and classify using ML algorithms. The authors in [192]
have studied and evaluated the potential utilities of medium resolution satellite imageries
of Landsat-8 OLI and Sentinel-2 to estimate precise forest burnt area over Uttarakhand,
Himalaya. Specifically, they used the pre- and post-fire differential reflectance to capture
fire patches using “differenced” burn sensitive spectral indices (dNBR, dNDVI, dNDWI
and dSWIR, see the abbreviations list right before the References). The unsupervised Weka cluster layer was used as input to the ML algorithms along with the differenced indices, which played an important role in recognizing the pattern and expansion of fire patches. Among the three
ML algorithms, CART and RF achieved better performance in terms of accuracy (Kappa)
than SVM. To explore how CV algorithms and ML models can be used together on GEE,
the authors in [226] combine the existing Cloud-Score algorithm with an SVM to detect
clouds in imagery ranging from Amazon tropical forests, Hainan Island, and Sri Lanka. The
Cloud-Score algorithm first masks the input RS imagery, which is then used as input to train the SVM. This process led to much higher accuracy rates than any of the other CV algorithms for cloud detection, with considerably lower error rates. The authors in [150] analyze to
what degree different preprocessing steps affect the output water maps using both SAR and
DEM data and two variations of Otsu’s thresholding algorithm. They showed that SAR data
included radiometric terrain correction (RTC) as a preprocessing step yield more accurate
results and that Bmax Otsu thresholding is more stable to different inputs than Edge Otsu.
However, their analysis was limited in time and space, so more work needs to be done to test
their results in different locations and varying terrain types at different times.
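Since both Bmax and Edge Otsu ultimately rest on the same between-class-variance criterion, a minimal NumPy version of Otsu's threshold is sketched below on synthetic SAR-like backscatter values; the GEE variants discussed above differ mainly in how and where the histogram is sampled (e.g., near detected edges), not in this core step.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the threshold maximizing between-class variance of `values`."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist.astype(float) / hist.sum()

    w0 = np.cumsum(p)                    # probability of class 0 (e.g., water)
    w1 = 1.0 - w0                        # probability of class 1 (land)
    mu = np.cumsum(p * centers)          # cumulative mean
    mu_total = mu[-1]

    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(between)]

# Toy bimodal sample standing in for water vs. land backscatter (dB).
rng = np.random.default_rng(0)
vv = np.concatenate([rng.normal(-20, 1.5, 5000), rng.normal(-10, 2.0, 15000)])
t = otsu_threshold(vv)
water_mask = vv < t
print(f'Otsu threshold ~ {t:.2f} dB; water fraction ~ {water_mask.mean():.2f}')
```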
Model stacking, ensemble learning, and label estimation: Many authors use or test
multiple CV, ML, and DL models in their research. Still, it is difficult to tune hyperparame-
ters or choose threshold values that affect the end result of a given analysis. To alleviate
these problems, several authors we identified in our review use different models to au-
tomate the hyperparameter tuning process. For example, in [135], the authors used two
different RF models to produce maps of vegetation change, as detailed in Section 3.2.4. The
authors in [138] used an RF model to train a separate ensemble classifier made up of CART,
MD, NB, SVM, and another RF model. The first RF was able to choose the best combination
of models and each model’s respective hyperparameters. This ensemble model performs
better at wetland detection than any of the models individually. In [185], the authors use
almost the exact same method (this time for building detection), though the final ensemble
is chosen via a manual weighting process.

4.2.6. Beyond ML: Modeling in GEE


A majority of papers we reviewed used data and algorithms on GEE to complete their
analyses. These “proof-of-concept” papers are often explorations by authors into how to
use the platform or demonstrations that research typically done offline can be done in the cloud.
However, many of the most straightforward classification and regression applications have
now been sufficiently demonstrated. In the future, one area of research that should be
given much more focus is on implementing more complex production applications built on
top of GEE that make use of modeling. For example, after mapping out urban areas using
ML classification techniques, the authors in [94] then went further and implemented an
ecosystem service value model on the platform. This is a creative way to use the available
parallel processing capabilities of GEE and is an under-researched application area.

4.3. Challenges and Opportunities from a Technical Perspective


Making the integration between GEE and AI more seamless would allow researchers
and practitioners in various domains to better take advantage of GEE and AI. From our
systematic review, we provide some identified future challenges, opportunities, and recom-
mendations below for researchers, practitioners, and engineers (including GEE engineers
at Google) to consider. Note that some of the recommendations are general directions
(Sections 4.3.1–4.3.5) and others are more specific (Section 4.4).

4.3.1. Model Implementation and Online Learning in GEE


GEE has a large data catalog and houses many preprocessing methods, as well as vari-
ous CV and ML algorithms. However, an often-cited limitation (Section 4.1.2) to research on
GEE is that there are not enough methods or models implemented on the platform. Thus, a
promising applied research direction for using GEE and AI is to implement and test CV
or ML models that would be useful to other researchers. For instance, the authors in [141]
developed a GPR model on GEE that works for both vector and tensor input. This model
was also cloud-optimized specifically for the size limits of GEE, making it a lightweight
but accurate option. Similarly, the authors in [84] implemented an unsupervised Bayesian
model in GEE for LULC classifications.
Many GEE and AI analyses rely heavily on optical imagery. Obtaining enough cloud-
free RS imagery can be difficult, though there are methods like Fmask that help remove
clouds when they are present in RS scenes. Still, options for cloud-removal algorithms on
GEE are limited. The authors in [223,224] both propose new methods for cloud removal
over large areas that can be run directly on GEE, without the need to download data.
In [224], the authors show that their proposed method outperforms popular algorithms
like Fmask and Automated Cloud Cover Assessment (ACCA) by 4–5% accuracy. The
authors in [223] tested their algorithm on both Landsat and Satellite pour l’ Observation
de la Terre (SPOT) optical imagery. Along with the authors in [107,239], we call for more
domain-specific novel AI methods to be implemented on the platform, which would be
useful to a wide-variety of researchers. To move towards the seamless integration of GEE
and AI, we suggest the following three directions.
(1) Simple but robust ML/CV methods: From current GEE limitations (see Section 4.1.2),
one promising way to make integration of GEE and AI smoother, deeper, and more robust,
is to develop simple, novel, and robust CV/ML methods, for example, the Canny edge detector (developed by John F. Canny in 1986 but still widely used in today’s edge detection applications; the paper had been cited 39,549 times as of 17 April 2022, including by 1953 papers in 2021, accounting for about 60% of Canny’s 66,796 total Google Scholar citations). We explicitly list this example here to show that it
is worth devoting the time to develop simple but robust CV/ML algorithms. The Canny edge
detector is robust and is used in many image processing applications. However, this specific
algorithm may not be appropriate for RS images or for RS images in a specific domain. We
call for AI and RS researchers and engineers to develop robust CV/ML methods for novel
and, ideally, computation-optimized, RS-image processing algorithms towards the smooth
and robust integration of GEE and AI.
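As a point of reference, GEE already ships a Canny implementation that such domain-tailored RS edge detectors would sit alongside; the sketch below applies it to an NDVI composite (the ROI, threshold, and sigma values are illustrative, not tuned for any particular application).

```python
import ee

ee.Initialize()
roi = ee.Geometry.Rectangle([-106.8, 34.9, -106.4, 35.3])   # illustrative ROI
ndvi = (ee.ImageCollection('COPERNICUS/S2_SR')
          .filterBounds(roi).filterDate('2021-06-01', '2021-08-31')
          .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
          .median()
          .normalizedDifference(['B8', 'B4']))

# GEE's built-in Canny edge detector; `edges` is an image of edge strength that can be
# thresholded or vectorized (e.g., for field or water-body boundaries).
edges = ee.Algorithms.CannyEdgeDetector(image=ndvi, threshold=0.3, sigma=1.0)
```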
(2) Reimplementing and/or optimizing (both classic and state-of-the-art) CV/ML
methods on GEE: The authors in [107] pointed out a need for more and better algorithms
on the GEE platform. The authors in [141] implemented GPR, which is increasingly used
because it is a transparent ML model that also outputs model uncertainties. The method
in [141] estimates green Leaf Area Index (LAI) from RS imagery in a way that is tailored to GEE. First, they created the model so that it can run on vector or tensor time
series imagery. Then, the authors used active learning (AL) for feature reduction so that
the model only learns on important data while creating a model that can run within GEE’s
memory confines. This GPR model is then used to gap-fill RS imagery focused on LAI,
meaning the model is able to “see” through clouded optical imagery. More work like this
should be done, either in creating new models to upload to the cloud that other researchers
can use or optimizing these models so that they are memory efficient and thus can leverage
GEE in the cloud, instead of needing to preprocess data and train models on local computers
or on Google Cloud AI. The authors mentioned that better GEE code documentation and
error messages could help future researchers interested in developing custom ML models
for the platform (detailed in Section 3.2.4).
(3) DL with GEE: DL models are not currently available on the GEE platform (Section 4.1.2).
However, some authors [69,151,225,227,228] have found an interesting workaround that allows
them to use NN models directly in the cloud. All of these authors first train an NN model
outside GEE, and then upload the weight matrices as data files that can be read by the JavaScript
or Python development environments. Then, it is necessary to implement each layer in the
network (convolutional layers, activation layers, etc.), so that imagery can be run through the
NN at inference time to produce predictions. This method has worked across domains like water
extraction, cloud detection, and crop mapping. Still, there are several caveats to this approach.
First, researchers need to have access to the compute needed to train the NN model in the first
place. Often researchers are drawn to GEE because of the freely available compute, so this
method is mainly geared towards those looking specifically to use NNs. Researchers also need
to know how to implement and test different layers in an NN, a task that many EO researchers
may not have the experience for. Lastly, none of the authors listed above implemented the full
training process on GEE (e.g., forward and backpropagation).
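To give a sense of what this workaround looks like in practice, the toy sketch below expresses the forward pass of a single offline-trained dense unit (a ReLU hidden unit plus a sigmoid output) with per-pixel ee.Image arithmetic. The weights are placeholders, and the studies above exported full weight matrices, including convolutional layers, which is considerably more involved.

```python
import ee

ee.Initialize()
roi = ee.Geometry.Rectangle([-106.8, 34.9, -106.4, 35.3])
x = (ee.ImageCollection('COPERNICUS/S2_SR')
       .filterBounds(roi).filterDate('2021-06-01', '2021-08-31')
       .median().select(['B2', 'B3', 'B4', 'B8']).divide(10000))

w, b = [0.12, -0.34, 0.56, 0.78], -0.05           # weights/bias trained offline (placeholders)

hidden = (x.select('B2').multiply(w[0])
            .add(x.select('B3').multiply(w[1]))
            .add(x.select('B4').multiply(w[2]))
            .add(x.select('B8').multiply(w[3]))
            .add(b)
            .max(0))                               # ReLU
score = hidden.multiply(1.7).add(0.1)              # output weight + bias (placeholders)
prob = score.multiply(-1).exp().add(1).pow(-1)     # sigmoid: 1 / (1 + exp(-score))
```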
Novel model architectures: Both [72,147] used the GEE platform to download and
process data that they could then use to train novel NN models. The authors in [147]
trained a CNN called DeepWaterMapv2 that can handle flexible input sizes of optical RS
imagery and evaluate images with a constant runtime. Additionally, their CNN can filter out
clouds to fill in obstructed scenes and predict where water is with high accuracy. The authors
in [72], on the other hand, used both optical and SAR data from GEE to train a 3D U-Net
model for crop-type classification. The 3D CNN architecture shows an improvement over the
more traditionally used 2D convolution operations. Neither author used GEE itself for the
DL part of their analysis, because NN models are not currently supported on GEE. However,
their research shows that GEE makes it easy to locate data for a variety of applications.
Transfer learning (TL): TL is one powerful technique that makes models trained on
large sets of data and compute available for applications without these resources. TL was
initially proposed in [249] and recently received significant attention due to recent advances
in DL [250–255]. Inspired by humans’ capabilities to transfer knowledge across domains
(e.g., the knowledge gained while learning the violin can help one learn the piano faster), the
main idea behind TL is that it is more efficient to take a DL model trained on an (unrelated)
massive image dataset (e.g., ImageNet [256]) in one domain, and transfer its knowledge to
a smaller dataset in another domain instead of training a DL classifier from scratch [257]. A
major assumption in many ML and DL algorithms is that the models will generalize to new,
unseen data given that it is from the same feature space and distribution [258], and that
there are universal, low-level features shared between datasets for different applications.
However, this assumption does not hold for many real-world problems. For example,
it is not uncommon that a classification task in one domain lacks sufficient data, but a
very large set of training data is available in another domain, where the data may be in a
different feature space or follow a different data distribution. In such situations, knowledge
transfer, if done successfully, would greatly boost the learning performance by avoiding
expensive and labor-intensive data-labeling efforts [250]. The authors in [71] showed that
TL works the best when they use a U-Net to map sugarcane in Thailand, meaning that the
pre-trained weights resulted in the highest accuracy, F1-score, precision, and recall. More
work should be done towards evaluating the effectiveness of TL within the EO studies as
it could potentially save large amounts of compute from not having to constantly train
DL models from scratch. The authors note that their model does not take into account
phenological information, which would have required changing the NN architecture, but
that this is an area for future research using their method.
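For orientation, the minimal Keras sketch below shows the standard TL recipe (freeze an ImageNet-pretrained backbone and train a small classification head on a modest set of labeled RS chips); it is a generic illustration rather than the architecture or data pipeline of [71] or any other study cited here, and the class count and input size are placeholders.

```python
import tensorflow as tf

# Pretrained backbone with its ImageNet weights frozen.
backbone = tf.keras.applications.ResNet50(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False

# Small trainable head on top of the frozen features.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = backbone(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5, activation='softmax')(x)   # e.g., 5 LULC classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds/val_ds: your labeled chips
```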

4.3.2. Web Interface Tools to Support ML Exploration


The authors in [107] noted that there is a need to make intuitive, easy-to-use tools
for specific tasks that incorporate input from the public and other stakeholders like non-
governmental organizations (NGOs) and government agencies. We recommend the following
two directions for future research while researchers and practitioners develop such tools.
(1) Humans-in-the-loop: As the authors in [1,116] emphasized, one big research direction we recommend is human-in-the-loop ML. Human-in-the-loop computing aims to achieve
what neither a human being nor a machine can achieve on their own. The authors in [259]
emphasized that a human-centered understanding of ML can lead not only to more usable
ML tools, but to new ways of learning computationally.
The authors in [234] used drone imagery and GEE to detect potsherds in the field in
the hopes of speeding up this process. They train a CART, RF, and SVM on this drone
imagery, but only the RF model produces adequate results. The authors test their workflow
in two separate locations in Greece. This research is interesting because the overall goal
of the paper is not to optimize accuracy per se, or to even replace human experts in the
field. As the authors note, “It is important to note here that this method does not aim to
substitute archaeological fieldwalking but complement much of the non-specialist work
conducted by groups of people for long periods of time in conductive environments so
there is more time and resources available to dedicate to specialized work”.
To explore how GEE could be used to create an open-source processing pipeline
for deforestation mapping in Liberia and Gabon, [116] used two different RF models to
create data masks and then predictions for various land types there. However, the output
classification maps were then shown to local experts to correct, boosting the accuracy of
the final accuracy rates. The authors showed that their method is more accurate than
other efforts to classify deforestation rates in these two countries, though there were still
some model misclassifications between classes due to not enough ground-truth data. This
presents a future area of research, where ML/DL/CV models are used to generate first-
order maps that are then verified by experts in that field (i.e., expert systems). Building
land classification maps in this way saves experts’ time but also keeps humans in-the-loop
where human values and knowledge can still be represented and included.
(2) Smart GEE + AI data annotator: ML, and especially DL, methods are only as good as the amount of labeled training data that they have access to. To accelerate the integration of GEE with AI to generate informative insights, the development of smart data annotators for GEE and AI is one of the most important directions to pursue. Humans and machines each have their own strengths; in a smart GEE-AI data annotator system, the classifiers should be able to select which samples are most confusing given the current learning status and ask the human annotator (e.g., a domain expert) to label them; this is what AL is good at. The main idea behind AL is to take advantage of a large set of unannotated images by selecting which images would help to improve the performance of ML/DL
and thus need annotation through an uncertainty selection strategy (see [1] for a detailed
introduction about the selection strategy).
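The heart of such an annotator is the uncertainty-selection step. The self-contained sketch below (scikit-learn, synthetic data) ranks unlabeled samples by the margin between their top-two predicted class probabilities and returns the most ambiguous ones for human labeling; the feature dimensions and model choice are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_for_annotation(model, X_unlabeled, n_queries=20):
    """Return indices of the n_queries most uncertain (smallest-margin) samples."""
    proba = model.predict_proba(X_unlabeled)
    top2 = np.sort(proba, axis=1)[:, -2:]          # two highest class probabilities
    margin = top2[:, 1] - top2[:, 0]               # small margin = high confusion
    return np.argsort(margin)[:n_queries]

# Toy stand-in for features extracted from RS pixels or objects.
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(200, 6)), rng.integers(0, 3, 200)
X_unl = rng.normal(size=(5000, 6))

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_lab, y_lab)
query_idx = select_for_annotation(rf, X_unl)
# `query_idx` would be shown to the domain expert in the annotation UI, their labels added
# to (X_lab, y_lab), and the model retrained -- repeating until performance plateaus.
```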
While DL receives lots of attention, these models still require a lot of input data and
large amounts of compute to train them. However, as compute becomes publicly available
in cloud-based platforms like GEE, obtaining large amounts of labeled training data remains
a key bottleneck to using DL models. One novel way to make the data labeling process less
time- and resource-intensive was illustrated in [156], where the authors used current water
maps and a segmentation algorithm to automatically collect data labels from Sentinel-1
imagery. These data are then used to train variations of U-Net in an offline environment.
Due to computational constraints, the authors were not able to compare their model to more
traditional ML models like an RF. Even with their automated data labeling pipeline, the
authors note that their study lacked sufficient data to adapt their method to more than one
country and manual validation was still necessary to validate the model post-prediction.

4.3.3. Open-Source GEE-AI Library Development


One promising way to accelerate the integration of GEE and AI is the availability
of open-source libraries in multiple languages (e.g., Python and R). All the studies we
have investigated using GEE with DL have trained their DL models offline (detailed in
Section 4.3.1), not directly on the GEE cloud computing environment.
A strong need for Python-based GEE applications/packages/frameworks: As [225] pointed out, “some specific convolution layers of DNN cannot be implemented in GEE. For example,
dilated convolution layer could not be achieved due to the fact that dilation is not supported
in the convolution API provided by GEE. Conversion other types of convolutions to the con-
volution used in this study may help to solve this problem and it needs further investigation”. To make the integration of GEE and AI seamless, we need open-source GEE and AI libraries
that help make sure existing AI (especially ML and DL) algorithms can be used in the GEE
environment. Good examples of this are the Geemap [260] and Rgee [261] libraries, which
make it easy to access GEE JavaScript functions for researchers who use Python and R. The
authors in [35] noted that some uncertainties in their underlying training dataset, a lack of
subsurface soil information, and having to move between GEE and offline analysis may have
contributed to errors in this analysis. It will be easier to avoid errors like these if there are more
open-source Python/R libraries that make it easier to connect GEE and local computers. GEE
does provide JavaScript and Python APIs, but more work needs to be done to incorporate the
wealth of well-tested AI algorithms already available in Python into these APIs [94].
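A few lines illustrate the kind of friction such libraries remove; the geemap calls below reflect our reading of its current documentation and should be checked against the library itself.

```python
import ee
import geemap

# Interactive map in a Jupyter notebook; geemap handles the ipyleaflet plumbing
# (and, in our experience, GEE initialization) so a layer can be inspected quickly.
Map = geemap.Map(center=[35.1, -106.6], zoom=9)
dem = ee.Image('USGS/SRTMGL1_003')
Map.addLayer(dem, {'min': 1000, 'max': 3000}, 'SRTM elevation')
Map
```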

4.3.4. Model Deployment Using GEE as Backend


Several publications have used GEE as a backend to their applications (e.g., see [262,263]),
taking advantage of the parallel processing capabilities, freely available compute, and large
number of datasets. However, one of the main benefits of using GEE is the wide variety of CV
and ML algorithms available. In [32], the authors built a custom expert system to map global
surface water changes using the platform. Using GEE as a backend allowed them to both
run their analysis and then host their resulting maps on an interactive web browser. Another
example is Remap [86], an application that allows users to crowd-source LULC observations
while using GEE to browse data and make predictions using an RF model.
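The core of this pattern is that GEE computes and serves tiles while a lightweight web client only displays them. The sketch below requests a tile URL template for a GEE layer; the structure of the getMapId() return value reflects our reading of the current earthengine-api and should be verified against its documentation.

```python
import ee

ee.Initialize()

# A GEE-computed layer (JRC Global Surface Water occurrence) to be served as map tiles.
water_freq = ee.Image('JRC/GSW1_4/GlobalSurfaceWater').select('occurrence')
vis = {'min': 0, 'max': 100, 'palette': ['white', 'lightblue', 'blue']}

map_id = water_freq.getMapId(vis)
tile_url = map_id['tile_fetcher'].url_format   # {z}/{x}/{y} template for Leaflet/OpenLayers
print(tile_url)
```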

4.3.5. Vectorizing Data Boundaries


Both [72,147] implemented novel DL architectures using semantic segmentation. In the
future, it would be very useful to instead design models capable of instance segmentation.
That way, results can be vectorized and used to create global datasets for future mapping
research. Having digitized boundaries for individual ecological features is the first step
to monitoring them and measuring how they have changed over time. While the authors
in [156] demonstrated a novel way of creating data labels via segmentation algorithms,
this is still semantic segmentation and the data labels required additional verification. The
authors in [233], however, showed this is possible. First, the authors used an RF model for
detecting archaeological mounds. They then used an edge detection algorithm after the
supervised classification to automatically digitize/vectorize boundary features. Obtaining
an accuracy score before digitizing boundaries can give a higher level of confidence in
using the resulting dataset in future studies.
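Until instance-segmentation models are available on the platform, GEE's built-in vectorization can already turn per-pixel classifications into polygon boundaries. The sketch below uses a binary water layer as a stand-in for any classification output; the ROI, scale, and export settings are chosen only for illustration.

```python
import ee

ee.Initialize()
roi = ee.Geometry.Rectangle([-106.8, 34.9, -106.4, 35.3])
water = ee.Image('JRC/GSW1_4/GlobalSurfaceWater').select('max_extent')  # 1 = water, 0 = land

# Vectorize connected water regions into polygons with a class label.
polygons = water.selfMask().reduceToVectors(
    geometry=roi,
    scale=30,
    geometryType='polygon',
    labelProperty='water',
    maxPixels=1e10)

task = ee.batch.Export.table.toDrive(
    collection=polygons, description='water_boundaries', fileFormat='SHP')
task.start()
```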

4.4. Overarching Challenges and Opportunities


We have provided some separate recommendations for future research for both
application-oriented (Section 4.2) and novel/technical (Section 4.3) research above. How-
ever, some higher-level combination of the recommendations for application-oriented and
more technical perspectives will strengthen the integration of GEE and AI and thus further
advance many domain areas of research. There are several opportunities for researchers
and practitioners who have interdisciplinary backgrounds and expertise or for research
groups with the required complementary expertise to team up and work on problems
at the intersection of RS and AI. For example, domain experts working on using ML for
exploration and as an aid to human expertise (detailed in Section 4.2.2) will be significantly
more productive if there are intuitive, interactive, and visual open-source web app tools
to support them in their work. Another example is deep and careful investigation of RS
sensors, imagery, and AI towards novel and effective models and algorithms tailored for
RS imagery. Particularly, simple but robust ML/CV methods (Section 4.3.1) will be more
effective if the researchers and practitioners who have an interdisciplinary background and ex-
pertise can work towards developing RS image-specific tools, since most CV algorithms are
initially designed for camera images and videos, not for remotely sensed satellite images.
One more promising overarching direction is implementing open-source web tools
(Section 4.3.2) so that users who do not have a programming background can explore and
use existing CV/ML algorithms available on GEE. This could include reimplementing both
classic and state-of-the-art CV and/or ML algorithms and deploying them on GEE. A lack
of models and model flexibility are two of the most-cited limitations researchers give when
using GEE (Section 4.1.2), so by building out the number and type of algorithms on GEE,
scientists and practitioners will be better able to do their research in a more seamless way
on the platform. Additionally, the GEE Python and JavaScript API documentation pages
are not for domain expert users; they are made for web app developers, ML engineers,
or for those researchers and practitioners who have an interdisciplinary background. We
will stop here, as we do not want to confine researchers’ and practitioners’ imagination in proposing and developing creative and effective overarching opportunities that leverage the power of GEE and AI to significantly advance various domains.

5. Conclusions
To leverage RS big data for large-scale important challenges such as global climate
change, intelligent methods and computation-intensive and -supportive cloud platforms
(including cloud storage of huge RS datasets) are critical. GEE is a pilot platform that has
great potential to meet both needs (i.e., AI methods and a cloud computing platform).
Yet to date, many application domains (Section 3) still remain at the proof-of-concept
stage regarding leveraging GEE and AI. This trend may relate to a steep learning curve
for researchers. Overall, based on our systematic and interactive (Appendix A) review,
we contend that GEE integrated with AI has great potential to provide a collaborative
and scalable platform for researchers, practitioners, and policymakers to solve critically
important problems in various areas. However, many challenges, and thus opportunities,
still remain for a deeper and more seamless integration of GEE and AI. This is especially
true of the integration between DL and the GEE platform, which is detailed in Sections 4.2
and 4.3. Up to now, to take advantage of DL with GEE, the time-consuming training process
still has to take place outside GEE. Researchers and practitioners either have to train DL
models offline on local computers or on a separate cloud computing platform (e.g., Google
cloud AI), which is often not freely available to the public. In summary, the deeper and
smoother integration of GEE and AI has considerable potential to address major scientific
and societal challenges such as climate change and natural hazards risk management.

Author Contributions: All authors have contributed to this review paper. L.Y. initiated the review,
contributed to writing and overall organization, identified selected research to include in the review,
supervised the web app design and development, and coordinated input from other authors. J.D.
took the lead on identifying relevant literature, contributed to writing and editing the text, and
provided the data for the accompanying interactive web app. S.S. contributed to the web app design
and development, word clouds visualization, and editing. Q.W. contributed to identifying selected
research to include in the review and in writing part of Section 3. H.C. contributed to writing part
of Section 3 and editing the whole manuscript. C.D.L. has contributed to editing. All authors have
revised the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This material is partly based upon work supported by the US National Aeronautics and
Space Administration under Grant number 80NSSC22K0384, and by funding from the College of Arts and Sciences at the University of New Mexico.
Acknowledgments: The authors are grateful to Gordon Woodhull for his useful UI/UX design
discussion. The authors are also grateful to the three reviewers for their useful suggestions.
Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations (ordered alphabetically) are used in this article:
ACCA Automated Cloud Cover Assessment
ADL Active Deep Learning
AEZ Agro-Ecological Zone
AI Artificial Intelligence
AIM-RRB Annual Irrigation Maps—Republican River Basin
AL Active Learning
ALOS Advanced Land Observing Satellite
ANN Artificial Neural Network
APEI Air Pollutant Emissions Inventory
API Application Program Interfaces
ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer
AVHRR Advanced Very High Resolution Radiometer
AWS Amazon Web Services
AW3D30 ALOS World 3D—30 m
BCLL Biodiversity Characterization at Landscape Level
BELMANIP2 Benchmark Land Multisite Analysis Intercomparison Products 2
BFAST Breaks for Additive Season and Trend
BGT Bagging Trees
BRT Boosted Regression Tree
BST Boosted Trees
BT Bagged Trees
CART Classification And Regression Tree
CCI-LC Climate Change Initiative Land Cover
CBERS China–Brazil Earth Resources Satellite
CBI Composite Burn Index
CDL Cropland Data Layer
CDOM Chromorphic Dissolved Organic Matter
CGD Crowdsourced Geographic Data
CGLS-LC100 Copernicus Global Land Cover Layer
CHELSA Climatologies at High Resolution for the Earth’s Land Surface Areas
Chl-a Chlorophyll-a
Colab Google Colaboratory
CONUS Coterminous United States
CORINE Coordination of Information on the Environment
CNB Continuous NaiveBayes
CNN Convolutional Neural Network
CV Computer Vision
CVAPS Change-Vector Analysis in Posterior Probability Space
CE Commission Error
CZMIL Coastal Zone Mapping and Imaging LiDAR
DEM Digital Elevation Model
DL Deep Learning
DMSP NTL Defense Meteorological Satellite Program Nighttime Lights
dNBR Differenced Normalized Burn Index
DnCNN Denoising Convolutional Neural Network
dNDVI Differenced Normalized Difference Vegetation Index
dNDWI Differenced Normalized Difference Water Index
DNN Deep Neural Network
DOC Dissolved Organic Carbon
DSM Digital Surface Model
dSWIR Differenced Shortwave Infrared
DT Decision Tree
DTM Digital Terrain Model
ELR Extreme Learning Machine Regression
EO Earth Observation
ESA European Space Agency
ETM+ Enhanced Thematic Mapper Plus
EVI Enhanced Vegetation Index
FAO Food and Agriculture Organization
FCN Fully Convolutional Network
FireCCI51 MODIS Fire Version 5.1
FormaTrend Forest Monitoring for Action—Trend
FPAR Fraction Photosynthetically Active Radiation
FROM-GLC Finer Resolution Observation and Monitoring of Global Land Cover
GBRT Gradient Boosted Regression Trees
GCEV1 Global Cropland Extent Version 1
GDEM Global Digital Elevation Map
GEE Google Earth Engine
GeoAI Geospatial Artificial Intelligence
GEOBIA Geographic Object-Based Image Analysis
GeoNEX Geostationary-NASA Earth Exchange
GFED4 Global Fire Emissions Database 4
GFSAD Global Food Security-Support Analysis Data
GHSL Global Human Settlement Layers
GIS Geographic Information System(s)
GIScience Geographic Information Science
GLCM Gray-Level Co-occurrence Matrix
GLC 2000 Global Land Cover 2000
GLDAS Global Land Data Assimilation System
GLOF Glacial Lake Outburst Floods
GMM Gaussian Mixture Model
GMTED2010 Global Multi-Resolution Terrain Elevation Data 2020
gmoMaxEnt Maximum Entropy Classifier
GPR Gaussian Process Regression
GREON Great Rivers Ecological Observation Network
GSW Global Surface Water
HAB Harmful Algal Blooms
IKPamir Intersection Kernel Passive Aggressive Method for Information Retrieval
INPE National Institute for Space Research (Brazil)
IoU Intersection over Union
IRS Indian Remote Sensing
JRC Joint Research Centre
KNN K-Nearest Neighbor
LAI Leaf Area Index
Landsat 8 OLI Operational Land Imager
LandTrendr Landsat-based Detection of Trends in Disturbance and Recovery
LiDAR Light Detection and Ranging
LIP Lake Ice Phenology
LSLTS Large-Scale and Long Time Series
LSTM Long Short-Term Memory
LSWI Land Surface Water Index
LULC Land Use and Land Cover
MAE Mean Absolute Error
Markov-CA Markov-based Cellular Automata
MERIT Multi-Error Removed Improved-Terrain
MCD12C1 MODIS Land Cover Type (5.5 km)
MCD12Q1 MODIS Land Cover Type (500 m)
MCD15A3H MODIS Terra Aqua Leaf Area Index/FPAR
MCD43A1 MODIS Bidirectional Reflectance Distribution Function (BRDF) Model Parameters
MCD43A4 MODIS Nadir BRDF-Adjusted Reflectance (NBAR)
MCD64A1 MODIS Burned Area Product
MD Minimum Distance
MDA Mean Decrease in Accuracy
MIoU Mean Intersection over Union
MIrAD-US MODIS Irrigated Agriculture
ML Machine Learning
MLP Multi-Layer Perceptron
MLR Multiple Linear Regression
MNDWI Modified Normalized Difference Water Index
MODIS Moderate Resolution Imaging Spectroradiometer
MOD09A1 MODIS Terra Surface Reflectance (500 m)
MOD09GQ MODIS Terra Surface Reflectance (250 m)
MOD11A2 MODIS Terra Land Surface Temperature and Emissivity
MOD13A2 MODIS Terra Vegetation Indices (1 km)
MOD13Q1 MODIS Terra Vegetation Indices (250 m)
MOD15A3 MODIS Terra Leaf Area Index/FPAR
MOD44B MODIS Terra Vegetation Continuous Fields
MSCNN Multiscale Convolutional Neural Network
MSI Multispectral Instrument
MTBS Monitoring Trends in Burn Severity dataset
MuWI-R Multi-Spectral Water Index
MYD11A2 MODIS Aqua Land Surface Temperature and Emissivity
NAIP National Agriculture Imagery Program
MYD13 MODIS Aqua Vegetation Indices
NASA National Aeronautics and Space Administration
NASS National Agricultural Statistics Service
NA Not Applicable
NB NaiveBayes
NDBI Normalized Difference Built-up Index
NDVI Normalized Difference Vegetation Index
NDWI Normalized Difference Water Index
NEX NASA Earth Exchange
NGA National Geospatial-Intelligence Agency
NGTI Normalized Difference Tillage Index
NFI National Forest Inventory
NICFI Norway’s International Climate and Forest Initiative
NIR Near Infrared
NLCD National Land Cover Dataset
NN Neural Network
NOAA National Oceanic and Atmospheric Administration
NS Not Specified
NWI National Wetland Inventory
OA Overall Accuracy
OE Omission Error
OLI Operation Land Imager
OSM OpenStreetMap
PA Producer’s Accuracy
PB Petabyte
Pegasos Primal Estimated sub-GrAdient SOlver for SVM
PRODES Amazon Deforestation Monitoring Project
PSNR Peak Signal-to-Noise Ratio
QA60 Sentinel 2 Quality Assurance Bitmask Cloud Band
QRF Quantile Regression Forest
RBR Relativized Burn Ratio
RF Random Forest
RFVC Relative Fractional Vegetation Cover
RGB Red-Green-Blue
RHSeg Recursive Hierarchical Segmentation
RMSE Root Mean Square Error
ROC Receiver Operator Curve
RRMSE Relative Root Mean Square Error
RS Remote Sensing
RTC Radiometric Terrain Correction
RUESVM Random Under-sampling Ensemble of Support Vector Machines
RVM Relevance Vector Machine
SAE Stacked AutoEncoder
SAR Synthetic Aperture Radar
SATVI Soil Adjusted Total Vegetation Index
SAVI Soil Adjusted Vegetation Index
SDS Satellite Derived Shoreline
SEN12MS-CR Sentinel 1 and 2 Multi-Spectral Cloud Removal dataset
SNIC Simple Non-Iterative Clustering
SPOT Satellite pour l’Observation de la Terre
SRTM Shuttle Radar Topography Mission
SSIM Structural Similarity Index
SSS Sea Surface Salinity
SST Sea Surface Temperature
Suomi-NPP NTL Suomi National Polar-orbiting Partnership Nighttime Lights
SVM Support Vector Machine
SWIR Shortwave Infrared
TB Terabyte
TIR Thermal Infrared
TL Transfer Learning
TM Thematic Mapper
TRMM Tropical Rainfall Measuring Mission
UA User’s Accuracy
UAS Unoccupied Aircraft Systems
UN-GGIM United Nations Initiative on Global Geospatial Information Management
USDA United States Department of Agriculture
USGS United States Geological Survey
VHR Very High Resolution
VIIRS NTL Visible Infrared Imaging Radiometer Suite Nighttime Lights
WUDAPT World Urban Database Access and Portal Tools
Appendix A. The Accompanying Interactive Web App Tool for the Literature of GEE
and AI
In Sections 1.1 and 3.1, we provided a brief map and graphic summary of the 200 papers
covered in this review. To allow readers to search for literature that is relevant to their
research interests, get more useful and dynamic information and insights from the papers re-
viewed, we have developed an interactive web app called iLit4GEE-AI
(https://geoair-lab.github.io/iLit4GEE-AI-WebApp/index.html (accessed on 1 May 2022)).
On our site, you will find:
• A brief web app demo video: the video link is accessible at the web app
page (top-right corner);
• Acronyms that are used in the data table of the web app, as well as explanations for each data field and chart (also in the top-right corner);
• A plan to continuously update and maintain the web app: To better serve the RS/GEE researcher and practitioner
community, as well as AI engineers who would like to contribute to RS and GEE, we will
continue to update the data to include new GEE + AI literature as it is published. Even
after this paper is published, we hope this web app will serve as one place to keep track
of a comprehensive and up-to-date list of GEE + AI literature. In the future, the data on
the web app will be maintained and continually updated by the members of the GeoAIR
Lab (Geospatial Artificial Intelligence Research and Visualization Laboratory). Our web app is
data-driven and scalable (i.e., once data gets updated, the web app will automatically
sync and update the visualization and filtering functions on the site).

Appendix B. Evaluation Metrics


For the most commonly used evaluation metrics in the context of combined GEE, AI,
and RS literatures, see Appendix C of our recent paper [264] at https://doi.org/10.3390/s2
2062416 (accessed on 1 May 2022).

Appendix C. Textual Summaries for Advances in Applications


To make the main body of the paper concise, but to also provide a comprehensive summary
of application domains that leverage GEE and AI, this appendix provides textual summaries for
selected studies in each of the application areas provided in Sections 3.2.1–3.2.18.

Appendix C.1. Textual Summaries for Crop Mapping


Landsat-8 and Sentinel-2 imagery were combined in [47] with elevation data to pro-
duce a crop map across continental Africa on the GEE platform. The crop extent map
was produced by combining the output of a Recursive Hierarchical Segmentation (RHSeg)
object-oriented segmentation with either an RF or SVM pixel-based classification to reduce “salt and pepper” noise from using pixel-based models alone. The final, open-source data product was compared to other commonly used crop maps. However, their method relies on optical imagery, and obtaining cloud-free, continuous scenes for the entire African
continent proved difficult. A two-step approach for crop identification in the central re-
gion of Ukraine was developed by the authors in [52] through exploiting intra-annual
variation of temporal signatures of remotely sensed observations (Sentinel-1 and Landsat
images) and prior knowledge of crop calendars. Landsat-based time-series metrics cap-
turing within-season phenological variation were first preprocessed. The developmental
stage of each crop was modeled by fitting a harmonic function, which was then used for
the automatic generation of training samples. Three classification methods (SVM, RF, and
decision fusion) were used for the pixel-wise classification. The SVM classifier resulted in
the lowest accuracy. The integration of multispectral and SAR data improved the classifica-
tion accuracy. The authors in [67] collected large amounts of training points from Google Earth imagery and analyzed Landsat and DEM data to create a cropland data layer across Europe,
the Middle East, and Russia. Their results compared favorably to existing data products
like the United Nations Food and Agriculture Organization (FAO) estimates while relying
only on open-source data and releasing their code for the GEE platform. The authors
were also able to distinguish between crop subtypes like agriculture and agroforestry, a
common problem for many cropland data products. In addition, [67] showed that across
regions, NDVI, NDWI, and slope were good predictors for various crop labels while blue
and SWIR1 were not. While the authors achieved good results across a wide area, their
processing pipeline and thus results relied on relatively cloud-free Landsat data. In the
future, a harmonized Landsat-Sentinel data product would increase data availability and
improve results further. Lastly, the authors noted that while the data gathering process
was time- and resource-intensive, future projects that crowdsource or pool data products
together would save time and effort.
Over a three-year time period, the authors in [75] were able to map paddy rice using
Sentinel imagery by utilizing several different spectral indices and creating composites of
different paddy rice growth periods. Their results were highly accurate in three separate
areas. The authors shared their code on GEE, while also showing that their open-source
analysis showed good agreement with maps previously produced by government agencies.
However, the authors noted that their method was still subject to finding cloud-free optical
RS imagery and/or finding adequate cloud masking algorithms. In [68], the authors pro-
posed a paddy rice area extraction approach by using the combination of optical vegetation
indices and SAR data. The Sentinel-1A SAR and the Sentinel-2 MSI Level-2A imagery were
used to identify paddy rice. Three vegetation indices, namely NDVI, EVI, and land surface
water index (LSWI), were estimated from optical bands. Two polarization bands from
Sentinel SAR imagery were used as a supplement to overcome the cloud contamination
problem. This approach was applied with RF algorithm for the Jianghan Plain in China as
an experimental area. The authors in [71] thus used a U-Net to map sugarcane in Thailand
but used a lightweight NN as an encoder for the DL model to reduce compute costs. They
tested the network architecture using the RGB channels and pre-trained weights, RGB
channels and randomly initialized weights, and then randomly initialized weights while
using the RGB and NIR channels. Because DL models are not currently supported by GEE,
the authors used Google Cloud, GEE, and the Google AI Platform together to preprocess
their data and train their models. They showed that transfer learning works the best (i.e.,
the pre-trained weights resulted in the highest accuracy, F1-score, precision, and recall).
The authors noted that their model did not take into account phenological information,
which would have required changing the NN architecture, but that this was an area for
future research using their method.
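For reference, the three optical indices used in [68] (NDVI, EVI, and LSWI) can be appended to each Sentinel-2 image in a few lines on GEE. The band mapping (B2 blue, B4 red, B8 NIR, B11 SWIR1) and the 1/10,000 reflectance scaling in the sketch below are our assumptions about the Sentinel-2 SR product rather than details taken from that study.

```python
import ee

ee.Initialize()

def add_indices(img):
    """Append NDVI, LSWI, and EVI bands to a Sentinel-2 SR image."""
    s = img.divide(10000)   # assumed reflectance scaling
    ndvi = s.normalizedDifference(['B8', 'B4']).rename('NDVI')
    lswi = s.normalizedDifference(['B8', 'B11']).rename('LSWI')
    evi = s.expression(
        '2.5 * (NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1)',
        {'NIR': s.select('B8'), 'RED': s.select('B4'), 'BLUE': s.select('B2')},
    ).rename('EVI')
    return img.addBands(ndvi).addBands(lswi).addBands(evi)

s2_with_indices = (ee.ImageCollection('COPERNICUS/S2_SR')
                     .filterDate('2021-05-01', '2021-10-31')
                     .map(add_indices))
```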
Shade-grown coffee landscapes are critical to biodiversity in the forested tropics, but
mapping it is difficult because of mountainous terrain, cloud cover, and spectral similarities
to more traditional forested landscapes. The authors in [50] used Landsat, precipitation,
and DEM data to map shade-grown coffee in Nicaragua using an RF model. The authors
reported high accuracy scores across different land class types (including shade-grown
coffee), but also did a relative variable importance on what data contributed most to the
RF model’s performance. More specifically, [50] performed an ablation study where they
compared model performance based on increasing the number of features the model sees.
They found that elevation was the most important factor, followed by the correlation
between precipitation and NDVI, temperature, and slope, and seasonal information helped,
as well. The authors noted that high-resolution data would help boost accuracy metrics
in this classification task, but that increasing accuracy did not directly relate to increased
socio-cultural or economic relationships in the region of study. The authors in [57] mapped
corn at a 10-m resolution using multitemporal SAR and optical images. Certain metric
composites were calculated, including monthly composites and percentile composites for
Sentinel-1 images and percentile and interval mean composites for Sentinel-2 images, which
were used as input to the RF algorithm on the GEE platform. To avoid speckle noise in
the classification results, the pixel-based classification result was integrated with the object
segmentation boundary completed in eCognition software to generate an object-based
corn map according to crop intensity. In [78] the authors explored the differences between
Landsat and Sentinel imagery for identifying cotton in China over the course of the plant’s
life cycle. They found that Landsat data performed slightly better than Sentinel optical
imagery, perhaps due to compute constraints on GEE: not all of Sentinel's input bands could be
used and vegetation indices could not be calculated, perhaps not taking
advantage of Sentinel’s full potential. However, for the three years of RS data analyzed in
the analysis, the authors only used Sentinel imagery for one year, making the results for
the two datasets not directly comparable. Importantly, though, the authors examined the
types of error that different input datasets made, finding for example that small dirt roads
were more distinguishable from cotton fields in Sentinel imagery than in Landsat imagery.
The authors in [66], showed that by using climate and soil data with RS imagery on
the GEE platform, it was possible to predict winter wheat yields 1–2 months ahead of
harvesting in China. The authors utilized many ML algorithms available on GEE and
compared specific time windows for phenological analysis and found that the closer the
data came to planting and harvesting time, the better the ML models performed. Still,
uncertainties from data resolution and human activity were present and affected the ability
of models to predict with high accuracy across agricultural zones.
Crop maps are often created using vegetation indices and field observation data. The
authors in [73] argued that this may lead to datasets and ML models that can only predict
in specific areas and not generalize up to larger areas (i.e., regions or countries) or to other
time periods in the same area. They further argued that what is needed is a more generalized
method that can take in information like weather and climate data or DEM data and scale up
to field-level predictions or larger. The authors compared a RF to three different DL models, a
DNN, 1D CNN, and LSTM for predicting wheat yield in China. The DNN and RF performed
the best over large areas, and the RF model often had the best performance. This is important
to note because RFs often have comparable or better performance than DL models but use
much less compute to train. However, this result could be due to the small size of the authors'
dataset, meaning that the DL models were not able to train on enough data to merit their use.
The authors ran a variable feature importance with the RF model across different years and
months within their data and showed that elevation, latitude, soil, and vegetation indices
were the most important input data while weather and climate data were the least important.
The authors in [76] utilized GEE, Sentinel-2, and field data to train a RF to first estimate
LAI and FPAR at a much finer spatial scale. Their LAI and FPAR maps matched well with
field observations and, when spatially aggregated to match the resolution of the MODIS
LAI/FPAR product, were in good agreement there, too. However, their method was based
on the assumption that land cover classes remained static over a three-year time span, meaning that
future work could potentially boost the accuracy of the method by checking whether land cover
was in fact dynamic and changing over this period.
The authors in [49] produced annual irrigation maps (1999–2016) in the US Northern
High Plains by combining all available Landsat satellite imagery with climate and soil
covariables in a RF classification workflow. In total, 9 Landsat variables and 11 covariables
were generated for use in the machine learning classification. To understand the relative
contribution of input variables to classification accuracy, permutation tests and GINI Index
metrics were run in R with an identically parameterized classifier since GEE did not output
variable importance measures at the time of this study. Two novel indices that integrate
plant greenness and moisture information ranked highest for both importance metrics used,
warranting further study for use in irrigation classification in other agricultural regions.
Statistical modeling suggested that precipitation and commodity price influenced irrigated
extent through time. This method relied on manually produced training and test datasets
well suited to identify areas where irrigation clearly enhances greenness. The authors
in [51] implemented an automatic irrigation mapping procedure in GEE that uses surface
reflectance satellite imagery from different sensors (Landsat 7/8, Sentinel-2, MODIS Terra
and Aqua imagery, SRTM DEM). The approach integrated in a novel way unsupervised
object-based image segmentation, unsupervised pixel-by-pixel classification, and multi-
temporal image analysis to distinguish productive irrigated fields from non-productive and
non-irrigated areas. The combination of these techniques enabled the detection of irrigated
areas without requiring any reference cropland data for training of the mapping algorithm.
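One building block of such a training-free workflow, unsupervised pixel clustering, can be run directly in GEE. The sketch below (GEE Python API; the dataset, bands, region, and cluster count are illustrative assumptions, not the configuration of [51]) samples a Sentinel-2 composite and clusters it with k-means.

```python
import ee
ee.Initialize()

region = ee.Geometry.Rectangle([67.0, 37.0, 67.5, 37.5])  # hypothetical study area
composite = (ee.ImageCollection("COPERNICUS/S2_SR")
             .filterBounds(region)
             .filterDate("2020-06-01", "2020-09-01")
             .median()
             .select(["B2", "B3", "B4", "B8", "B11"]))

# Sample pixels to train the clusterer, then assign every pixel to a cluster.
training = composite.sample(region=region, scale=20, numPixels=5000)
clusterer = ee.Clusterer.wekaKMeans(6).train(training)
clusters = composite.cluster(clusterer)  # unsupervised labels, later merged/interpreted by the analyst
```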
The authors in [58] developed a rapid method to map Landsat-scale (30 m) irrigated
croplands across the conterminous United States (CONUS). The method was based upon
an automatic generation of training samples for most areas based on the assumptions
that irrigated crops appear greener than non-irrigated crops and had limited water stress.
Two intermediate irrigation maps were generated by segmenting Landsat-derived annual
maximum greenness and Enhanced Vegetation Index (EVI) using county-level thresholds
calibrated from an existing coarse resolution irrigation map. The random samples extracted
from the training pool along with RS-derived features and climate variables were then
used to train ecoregion-stratified RF classifiers for pixel-level classification. Evaluation
of feature importance indicated that Landsat-derived features played the primary role in
classification in relatively arid regions while climate variables were important in the more
humid eastern states.
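The general pattern of thresholding an annual maximum-greenness composite to build a candidate training pool can be sketched in the GEE Python API as follows; the dataset, region, and the single fixed threshold are illustrative assumptions (the study above calibrated thresholds per county from an existing irrigation map).

```python
import ee
ee.Initialize()

region = ee.Geometry.Rectangle([-102.0, 40.0, -101.5, 40.5])  # hypothetical area

def add_evi(img):
    # Landsat Collection 2 Level-2 surface reflectance scaling, then the standard EVI formula.
    scaled = img.select("SR_B.*").multiply(0.0000275).add(-0.2)
    evi = scaled.expression(
        "2.5 * (NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1)",
        {"NIR": scaled.select("SR_B5"), "RED": scaled.select("SR_B4"), "BLUE": scaled.select("SR_B2")},
    ).rename("EVI")
    return img.addBands(evi)

annual_max_evi = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
                  .filterBounds(region)
                  .filterDate("2020-01-01", "2021-01-01")
                  .map(add_evi)
                  .select("EVI")
                  .max())

candidate_irrigated = annual_max_evi.gt(0.45).selfMask()  # hypothetical threshold
```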
The authors in [46] compared several algorithms on the GEE platform, CART, IKPamir,
logistic regression, a MLP, NB, RF, and an SVM, for crop-type classification in Ukraine.
The authors also used an ensemble NN but had to move off the GEE platform since NNs
were not supported at the time. The ensemble NN performed the best out of all the models,
although the authors noted that the SVM algorithms were not working on the GEE platform.
To that end, the authors found that in general the algorithms on GEE were not very flexible,
and some preprocessing steps like dealing with missing data were difficult to implement,
so all preprocessing steps took place outside of the GEE platform. The authors found
that atmospherically corrected Landsat data boosted model performance more than when
models were fed Landsat composites data. In the future, [46] said that optical imagery
in conjunction with SAR data or combining data from multiple RS platforms would help
boost performance. The authors in [72] combined optical and SAR Sentinel data to create
higher-resolution maps capable of displaying information on less commonly mapped non-
staple crops in the US. First, the authors denoised their SAR data with a CNN, and then
fused this with optical RS imagery. These data were then used to train a RF, as well as three
separate DL models: SegNet, U-Net, and a 3D U-Net. The authors showed that fusing
optical and SAR data worked better than using optical data alone, that using denoised SAR
data in the fusion process led to higher accuracy scores, and that the best model was the
3D U-Net model trained on the optical-denoised SAR fused data. However, an interesting
finding was that the RF performed best when using only optical information alone. The
authors trained their DL models offline as NNs were not currently supported on GEE.
The authors mentioned that the extremely high accuracy rates of the 3D U-Net model
might indicate overfitting, and that when taking into account required training times, the
RF model performed well while using the least amount of compute across all datasets.
Lastly, this paper used semantic segmentation, but future research in the field should
investigate instance segmentation. Optical imagery is used in many EO analyses because it
is comparable to how humans see; we can easily understand it. However, it is often blocked
by clouds, limiting its utility. SAR imagery works day or night regardless of cloud cover,
so [74] used it for crop classification while testing input composite image length and ML
classification performance. The authors compared an object-oriented classification method
combining the SNIC algorithm with a RF with that of a pixel-based method of just the RF
by itself. The authors found that adding SNIC to their processing routines smoothed the
data before it was fed into the RF model, ultimately boosting accuracy rates more than 10%
in their study. They also showed that shorter time periods were more useful for making
composites for classification, most likely because plants look very different over the course
of a growing season. However, the authors noted that their method worked better for
larger cropland areas and might not generalize to other areas with smaller field sizes. The
authors in [56] compared the performance of an ANN to CART, RF, and SVM models on
GEE for sugarcane mapping in China using Sentinel-2 imagery. The authors identified that
the SVM performed the best, but then went on to show which type of errors each model
made. For example, the ANN tended to overfit the data and gave too much preference
to the sugarcane class, while tree-based models confused the forest and water classes. The
authors then incorporated NDVI information into the SVM to show how the model did
with this extra information. To improve the results in this study, the authors identified
using SAR data would be helpful in removing the impact shadows have on classification
errors. The authors in [19] created an open-source map for several West African countries
using a RF model trained on Landsat data. Their map was moderately more accurate than
other maps produced for the region, and they went further by demonstrating how feature
importances differed between wet and dry seasons for their countries of analysis.
The authors used GEE for processing data but needed to train their model offline because
the GEE RF model implementation was not flexible enough for their analysis. Papers like
this one show a trend that GEE is facilitating in that researchers now have freely available
compute and are moving away from local, small-scale classifications and towards regional,
national, and even global classification tasks.
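This export-then-train-offline pattern can be sketched with the GEE Python API and scikit-learn as follows; the region, labeled points, band names, label property, and file name are hypothetical, and the snippet is an illustration rather than the workflow of [19].

```python
import ee
ee.Initialize()

region = ee.Geometry.Rectangle([-1.0, 12.0, -0.5, 12.5])        # hypothetical area
labels = ee.FeatureCollection([                                  # hypothetical labeled points
    ee.Feature(ee.Geometry.Point([-0.8, 12.2]), {"class": 1}),
    ee.Feature(ee.Geometry.Point([-0.7, 12.3]), {"class": 0}),
])
composite = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
             .filterBounds(region).filterDate("2019-01-01", "2020-01-01")
             .median().select(["SR_B2", "SR_B3", "SR_B4", "SR_B5", "SR_B6", "SR_B7"]))

# Sample predictors at the labeled points and export the table for offline training.
samples = composite.sampleRegions(collection=labels, properties=["class"], scale=30)
ee.batch.Export.table.toDrive(collection=samples, description="training_samples",
                              fileFormat="CSV").start()

# --- offline, after downloading the exported CSV ---
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("training_samples.csv")
bands = ["SR_B2", "SR_B3", "SR_B4", "SR_B5", "SR_B6", "SR_B7"]
rf = RandomForestClassifier(n_estimators=500, oob_score=True, class_weight="balanced")
rf.fit(df[bands], df["class"])
```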
The authors in [48] developed and implemented an automated cropland mapping
algorithm (ACMA) using MODIS 250-m 16-day NDVI time-series data. A web-based in
situ reference dataset repository was first developed to collect ground data through field
visits, very high spatial resolution data (sub-meter to 5-m), and community crowdsourcing.
A comprehensive knowledge base was then established for Africa using
the web repository. Second, clustered classes from each of the eight agro-ecological zones
generated using k-means algorithm were grouped together through quantitative spectral
matching techniques (QSMTs) and the group of similar cluster classes was matched with
the ideal spectra to identify and label classes. This process produced a reference cropland
layer for the year 2014 (RCL2014) for the entire African continent consisting of five crop
products (cropland extent and areas; irrigated versus rainfed croplands; cropping intensi-
ties; crop type and/or dominance; croplands versus cropland fallows). Third, decision tree
(DT) algorithms were established for the eight agro-ecological zones (AEZs) based on the
RCL2014 knowledge base which was subsequently composed into an ACMA applicable
for the entire African continent. Finally, the ACMA algorithm was deployed on GEE and
applied on MODIS data from 2003 through 2014 to produce annual ACMA generated
cropland layers. Agriculture and Agri-Food Canada (AAFC) has been responsible
for producing Annual Space-Based Crop Inventory (ACI) maps for Canada. The 30-m
ACI maps were created by applying a decision tree method to optical (e.g., Landsat) and
SAR data (e.g., Radarsat-2). With the goal of producing ACI maps more effectively and
efficiently, the authors in [69] developed an object-based method (i.e., simple Non-Iterative
Clustering (SNIC)) for producing ACI maps based on Sentinel-1 SAR data and Sentinel-2
optical data. The GEE platform and an ANN were used to produce an ACI map for 2018.
The OA was reported at 77%. Even though the OA was slightly lower than that of the
AAFC's ACI maps, the authors argued that their proposed GEE method is promising due to
its superior computational efficiency.

Appendix C.2. Textual Summaries for Land Cover Classification


The authors in [80] used a RF model to determine land-use classes such as vegetation,
croplands, and urban areas from Landsat imagery in Zambia. The authors noted that
the GEE platform allowed their workflow to be more flexible, leading to this type- and
place-specific land cover application. However, the authors had to leave the GEE platform
to create verification points for the ML training process. The authors compared their
maps to other commonly used land cover maps like Globecover, GLC 2000, and GFSAD
and noted the similarities and differences. The authors in [81] presented an approach to
quantify continental land cover and impervious surface changes over continental Africa
for 2000–2015 using Landsat images and a RF classifier on GEE. Landsat spectral bands,
NDVI, NDWI and night-time light served as predictor variables. This study relied on
visual inspection of high-resolution imagery to produce training data. The authors in [82]
proposed a land-use/land-cover type discrimination method based on a CART, applied
change-vector analysis in posterior probability space (CVAPS) and the best histogram
maximum entropy method for change detection, and further improved the accuracy of the
land-updating results in combination with NDVI timing analysis. Selecting western China
as the research area and using GEE’s JavaScript API interface, they obtained a 2014 land
map based on the ESA GlobCover 2009 dataset. A total of 1000 verification points were
selected for visual interpretation in Google Earth. A program with Node.js and JavaScript
was also developed to randomly generate validation points and an auxiliary rectangle.
The results of the transfer error matrix analysis showed that the overall accuracy of the
land map from the proposed CART-CVAPS-NDVI method was 78.6–88.2%. The authors
in [93] designed such a workflow on GEE for Iran using Sentinel-1 and -2 data and a RF
model and SNIC. With the ground-truth training samples available, the authors used SNIC
to segment land-use classes into objects while the RF model classifies them on the pixel
level. Afterwards, visual assessment was used to verify majority voting between the two
classifiers for 13 different land-use classes. While there was some confusion between similar
classes (e.g., water and marshland), this analysis resulted in a much higher resolution,
much more accurate land-use map of Iran than the 2016 map. However, the authors noted
that in some ways GEE limited their study: for example, SNIC was the only segmentation
algorithm on GEE. Additionally, because of computational limits on the platform, only so
many training samples could be included, and input features had to be chosen carefully
before feeding them to a ML model.
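A minimal GEE Python API sketch of combining a pixel-level classification with SNIC segments by taking the modal class within each object is shown below; the inputs, SNIC parameters, and the placeholder "classification" are assumptions rather than the exact procedure of [93].

```python
import ee
ee.Initialize()

region = ee.Geometry.Rectangle([52.0, 35.0, 52.5, 35.5])  # hypothetical area
image = (ee.ImageCollection("COPERNICUS/S2_SR")
         .filterBounds(region).filterDate("2021-05-01", "2021-09-01")
         .median().select(["B2", "B3", "B4", "B8", "B11", "B12"]))

# Segment the composite into objects; the 'clusters' band labels each segment.
snic = ee.Algorithms.Image.Segmentation.SNIC(image=image, size=20, compactness=1, connectivity=8)
clusters = snic.select("clusters")

# Placeholder per-pixel classification; in practice this would be the output of a trained RF.
pixel_classes = image.select("B8").gt(2000).rename("classification")

# Assign each SNIC object the modal pixel-level class (object/pixel fusion by majority vote).
object_classes = (pixel_classes.addBands(clusters)
                  .reduceConnectedComponents(reducer=ee.Reducer.mode(), labelBand="clusters"))
```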
The authors in [83] utilized Landsat images available through GEE to map annual
land-use changes in China’s poverty-stricken areas. Landsat 8 images from 2013–2018 were
preprocessed and then used to compute spectral indices (e.g., NDVI, Normalized Difference
Built-up Index (NDBI), MNDWI). Night-time data were also included to improve the
extraction of built-up areas. A RF classifier was then trained and used to perform land-use
classification in poverty areas. The results revealed significant variations in land-use change
among the poverty areas in China. Some poverty areas had more intense construction
activities than others. The authors mentioned some limitations of GEE, for example, the
low computational efficiency of vector data. Uploading data to GEE or exporting data
from GEE can be time-consuming. The authors in [87] set out to create an open-source
land cover mapping processing pipeline using GEE. They argued that land cover maps
specifically can help countries properly plan for sustainable levels of food production, but
that many developing countries did not have the financial or compute resources to monitor
land classes in real time. Using SVM and bagged trees (BT) models, the authors predicted
urban, agriculture, tree, vegetation, water, and barren land-use types in Lesotho. However,
the authors had low accuracy rates across most classes. During the ML training process, the
authors ultimately had to leave the GEE platform because of “out-of-computation” time
errors in the code editor.
The authors in [88] collected a multi-seasonal sample set for global land cover map-
ping in 2015 from Landsat 8 images. The concept of “stable classification” was used to
approximately determine how much reduction in training samples, and how much land
cover change or image interpretation error, could be tolerated. Using a RF algorithm with
200 trees, a numerical experiment showed that less than 1% overall accuracy was lost when
less than 40% of the total global training sample set were used, when 20% of the global
training sample points were in error, or even the land cover changed by 20%. With this
knowledge in mind, the authors transferred their 2015 global training sample set at 30-m
resolution to 10-m resolution Sentinel-2 images acquired in 2017 and produced a 10-m
resolution global land cover map.
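The kind of "stable classification" experiment described above, retraining a 200-tree RF on progressively smaller random fractions of the sample set, follows a simple GEE pattern; the sample asset, band names, and label property below are hypothetical.

```python
import ee
ee.Initialize()

samples = ee.FeatureCollection("users/example/global_training_samples")  # hypothetical asset
bands = ["B2", "B3", "B4", "B5", "B6", "B7"]

samples = samples.randomColumn("random", 42)  # add a uniform [0, 1) column for subsampling

def train_rf(fraction):
    subset = samples.filter(ee.Filter.lt("random", fraction))
    return ee.Classifier.smileRandomForest(200).train(
        features=subset, classProperty="landcover", inputProperties=bands)

rf_full = train_rf(1.0)
rf_40pct = train_rf(0.4)  # e.g., keep only ~40% of the original training set
```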
Feature engineering can lead to higher accuracies in EO analyses when using ML.
However, it is difficult to create features that you know will be useful to a model beforehand,
even with expert domain knowledge in a given area. Thus, the authors in [105] tested the
difference in model performance when using single image mosaics, time series RS imagery,
statistical features (median, standard deviation), band ratios, or all of the features listed.
They tested this by training a RF model on each subset of data to create LULC maps in Brazil.
The authors found that inputting a time series of the data was the most accurate, more
accurate even than when using all of the data. This research showed that more data was not
always better and that feature engineering did not always lead to better model performance
despite the increased compute cost. The authors in [92] trained several different ML models
available on GEE with different combinations of input data to determine which were the
most important in determining land-use types in Golden Gate Highland Park in China.
The authors compared combinations of different band ratios, elevation, aspect, and slope
data and found that including SWIR data in their analysis reduced classification errors
in areas with sparse vegetation. Different models were able to capture different land-use
types. For example, SVMs better distinguished between urban and agricultural lands,
while the RF model used was better at identifying forested landscapes, suggesting that
different types of models may be suitable for different tasks. Even though OA rates were
high for the best models, most models still had issues telling bare or rocky landscapes
apart from drier vegetation. The authors in [95] set out to compare the contribution of
SAR data and different indices (NDVI, EVI, SAVI, NDWI) derived from optical data on
overall classifier performance. They found that including SAR data moderately improved
performance, while only NDWI gave the ML model a significant performance enhancement.
The authors still struggled to classify vegetation subtypes like shrubs, grasslands, and
aquatic vegetation, but their accuracy rates matched those of common LULC maps like
Finer Resolution Observation and Monitoring of Global Land Cover 30 m (FROM-GLC30)
and GlobeLand30. This work contributed to a growing body of literature attempting to
empirically show which input data types can help identify which LULC classes using RS
and ML. The researchers in [98] generated a land cover map of the whole African continent
at 10 m resolution, using multiple data sources including Sentinel-2, Landsat-8, Global
Human Settlement Layer (GHSL), Night Time Light (NTL) Data, SRTM, and MODIS Land
Surface Temperature (LST). Different combinations of data sources were tried to determine
the best data input configurations. It was found there was always an increase of accuracy
when new data were introduced. They also conducted an investigation of the importance
of individual features derived from a RF classifier. A transferability analysis experiment
was designed to study the influence of sampling strategies on the land cover mapping
performance. It was suggested that training samples of natural land cover classes should
be collected from areas covering each main Köppen climate zone for African land cover
mapping and other similar tasks. Different data sampling strategies and their effects on
how different ML classifiers performed on LULC tasks were compared in [101]. The authors
trained a Relevance Vector Machine (RVM) offline in addition to the CART, RF, and SVM
models on GEE. For their particular LULC application, stratified proportional random
sampling led to higher overall accuracy scores than stratified equal random sampling or
stratified systematic sampling and the RF model performed better than the CART, RVM,
and SVM. However, their study lacked ground truth data, so the authors needed to use
existing land cover maps for data collection purposes. As a result, even the best model (RF)
had trouble recognizing classes without many samples leading to low class accuracies.
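The two stratified designs can be expressed with Image.stratifiedSample in the GEE Python API; the reference map, class values, and point counts below are illustrative stand-ins, not the configuration used in [101].

```python
import ee
ee.Initialize()

region = ee.Geometry.Rectangle([30.0, -2.0, 31.0, -1.0])                   # hypothetical area
reference = ee.Image(ee.ImageCollection("MODIS/006/MCD12Q1")               # stand-in reference map
                     .filterDate("2019-01-01", "2019-12-31").first()).select("LC_Type1")
class_values = [1, 4, 10, 12, 13, 17]

# Stratified equal random sampling: the same number of points per class.
equal = reference.stratifiedSample(
    numPoints=500, classBand="LC_Type1", region=region, scale=500,
    classValues=class_values, classPoints=[500] * len(class_values), geometries=True)

# Stratified proportional random sampling: per-class counts allocated roughly by class area
# (the counts below are assumed for illustration).
proportional = reference.stratifiedSample(
    numPoints=500, classBand="LC_Type1", region=region, scale=500,
    classValues=class_values, classPoints=[900, 700, 600, 400, 250, 150], geometries=True)
```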
The authors in [96] proposed a hybrid data balancing method, called the Partial
Random Over-Sampling and Random Under-Sampling (PROSRUS), to resolve the class
imbalance issue. PROSRUS used a partial balancing approach with hundreds of fractions
for majority and minority classes to balance datasets. The reference samples were generated
using visual interpretation of very high spatial resolution images of Google Earth. It was
observed that PROSRUS had better performance than several other balancing methods
and increased the accuracy of minority classes without a reduction in overall classification
accuracy. It was noted though that every dataset requires a specific balancing ratio to
obtain the optimal result because the imbalance ratios and complexity levels are different
for different datasets. It also showed that topographic data including elevation, slope, and
aspect had higher impacts than spectral indices in improving the accuracy of MLC maps.
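For readers who handle class imbalance offline, the following pandas/scikit-learn sketch shows partial re-balancing in the same spirit (an illustration only, not the PROSRUS implementation from [96]); the label column and target fraction are assumptions.

```python
import pandas as pd
from sklearn.utils import resample

def partial_balance(df, label_col="class", fraction=0.5, seed=0):
    # Move every class part-way toward a common target size: over-sample minority classes
    # (with replacement) and under-sample majority classes (without replacement).
    target = int(df[label_col].value_counts().max() * fraction)
    parts = []
    for _, group in df.groupby(label_col):
        parts.append(resample(group, replace=len(group) < target,
                              n_samples=target, random_state=seed))
    return pd.concat(parts).sample(frac=1.0, random_state=seed)  # shuffle rows

# balanced_df = partial_balance(training_df, fraction=0.5)  # training_df: your labeled samples
```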
The authors in [97] proposed a new method by integrating random under-sampling of
majority classes and an ensemble of Support Vector Machines, namely Random Under-
sampling Ensemble of Support Vector Machines (RUESVMs). Specifically, the RUESVMs
method created an ensemble of SVM classifiers that each was trained by a randomly
under sampled subset of the original imbalanced data based on the defined fractions, and
finally combined the output of the SVM classifiers using majority voting. The performance
of RUESVMs for LC classification was evaluated in GEE over two case studies using
Sentinel-2 time-series data and five well-known spectral indices. The results showed
that the RUESVMs method considerably outperforms the other benchmarks methods. It
not only increased the accuracy of minority classes, but also increased the accuracy of
majority classes. Aiming to resolve the problem of lack of training samples for dynamic
global land cover mapping efforts, [99] developed an automatic training sample migration
method based on the first all-season sample set (FAST) in 2015 (Li et al., 2017) and all
available Landsat 5 TM archives in GEE. Spectral similarity and spectral distance measure
were calculated between the reference spectra and target spectra. Threshold values were
determined to indicate a land cover change in a pixel. EO analyses making use of ML are
often limited by the number of labeled training samples available in a given domain. The
authors in [104] created a training set by pairing Landsat imagery with a MODIS LULC
map as labels. This allowed them to train CART and RF classifiers in both Australia and
the United States, though their results indicated that, because of their small dataset, both
models were overfitting on the training set compared to the test set. While determining
ecosystem service values is complicated (involving many disciplines and many opinions), the authors
in [94] used GEE to illustrate a processing workflow for how LULC classes can be used to
compute more complex ecosystem service values. Their open-source code and ecosystem
model analyzed both optical RS imagery and DEM data. However, GEE did not support
historical imagery, meaning that not all the data the authors wanted to use were available
on the platform.

Appendix C.3. Textual Summaries for Forest and Deforestation Monitoring


The authors in [120] analyzed Sentinel-2 data and trained several different ML classi-
fiers to distinguish between four different forest types in Italy during both summer and
winter seasons. The authors compared combinations of the visible and infrared bands,
vegetation indices, DEM data, and unsupervised classification output as input to CART, RF,
and SVM models to see what effect different data sources had on model performance. They
found that the best performing model was a RF trained on all of the input data, though
accuracy rates varied across different classes. The authors completed the entire analysis
completely within the GEE platform, allowing people regardless of programming skill and
available compute to rerun their analysis. However, the authors noted that this effort also
meant not being able to use third-party libraries for data processing and analysis like the
Python API for GEE. To create a forest-type map in India using RS imagery and ML, [123]
predicted for evergreen and deciduous forest types, as well as “non-forest” classes. The
authors created NDVI signatures based on Landsat imagery and fed this information to a
RF. For several classes, the authors achieved low accuracy rates. However, they achieved
higher accuracy than the current MODIS maps used for forest cover, yet also showed where
their predictions matched those of the MODIS maps. Analyses like this one contribute to
a growing body of literature that show where current land maps need improvement and
serve as a call to update land-use maps to a higher resolution. The authors made their code
freely available both on GEE and GitHub so that their analysis can be rerun and improved
upon. To classify tree species across a large area in China while fitting within compute
restraints, [121] trained a RF on optical and SAR imagery, DEM data, and field observations
on the GEE platform. Across seven different tree species, the authors achieved an OA
rate of 77.5%, but noted that including climate and soil data in addition to incorporating
ecological models would help boost accuracy rates. The authors in [118] used GEE to
map mangrove extent in Indonesia. The authors used a SVM trained on Landsat data
while also predicting for water and cloud LULC classes. However, the authors had a low
accuracy rate for identifying mangroves, the class that they were actually trying to predict
for. Most classification errors were related to cloud and hill shadows and to identifying
mangroves farther away from the coastline. Further, the authors used visual assessment as
their only accuracy metric. While representing classification accuracy visually is certainly
important, more quantitative measures are needed in order to properly compare results
from different studies.
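Quantitative accuracy assessment is straightforward to add on GEE; the sketch below (GEE Python API, with hypothetical region, points, bands, and label property) holds out part of the samples, classifies them, and reports overall accuracy and kappa from an error matrix.

```python
import ee
ee.Initialize()

region = ee.Geometry.Rectangle([104.0, -3.0, 104.5, -2.5])          # hypothetical coastal area
labels = ee.FeatureCollection([                                      # hypothetical labeled points
    ee.Feature(ee.Geometry.Point([104.20, -2.80]), {"landcover": 1}),
    ee.Feature(ee.Geometry.Point([104.30, -2.70]), {"landcover": 0}),
    ee.Feature(ee.Geometry.Point([104.25, -2.75]), {"landcover": 1}),
    ee.Feature(ee.Geometry.Point([104.35, -2.65]), {"landcover": 0}),
])
bands = ["SR_B2", "SR_B3", "SR_B4", "SR_B5", "SR_B6"]
image = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
         .filterBounds(region).filterDate("2020-01-01", "2021-01-01")
         .median().select(bands))

samples = image.sampleRegions(collection=labels, properties=["landcover"], scale=30)
samples = samples.randomColumn("random", 0)
training = samples.filter(ee.Filter.lt("random", 0.7))
validation = samples.filter(ee.Filter.gte("random", 0.7))

classifier = ee.Classifier.libsvm().train(training, "landcover", bands)
validated = validation.classify(classifier)
matrix = validated.errorMatrix("landcover", "classification")
print("OA:", matrix.accuracy().getInfo(), "kappa:", matrix.kappa().getInfo())
```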
The authors in [109] developed and tested a participatory mapping methodology to
map the extent and species composition of forest plantations in the Southern Highlands
area of Tanzania. A large set of reference data was collected in a two-week participatory
GIS campaign in which local experts interpreted very high-resolution satellite images in
Google Earth through the Collect Earth tool in the open-source Open Foris suite. Three
different classifiers (CART, SVM, and RF) were tested to classify a multi-sensor image
stack of Landsat 8 (2013–2015), Sentinel-2 (2015–2016), Sentinel-1 (2015), and SRTM derived
elevation and slope data layers. A RF with 150 trees was selected for creation of the forest
plantation area and planted species distribution maps. One of the main challenges in
participatory reference data collection was the quality and consistency of the collected
samples. The study found that sufficient training prior to the data collection was crucial for
the interpretation success. The interpretation agreement generally declined when details
were increased from forest plantation coverage to specific plantation quality attributes. The
authors stated that at least in complex environments, it may not be realistic to expect good
accuracy on detailed level information such as tree species or age derived from visual
interpretation of optical data. To explore how GEE could be used to create an open-source
processing pipeline for deforestation mapping in Liberia and Gabon, the authors in [116]
used two different RF models to create data masks and then predictions for various land
types there. However, the output classification maps were then shown to local experts to
correct, boosting the final accuracy of the maps. The authors showed that their
method was more accurate than other efforts to classify deforestation rates in these two
countries, though there were still some model misclassifications between classes due to
not enough ground-truth data. This presents a future area of research, where ML/DL/CV
models are used to generate first-order maps that are then verified by experts in that
field (i.e., expert systems). Building land classification maps in this way saves experts’
time but also keeps humans in-the-loop where human values and knowledge can still be
represented and included.
The authors in [125] developed a method for monitoring tropical forest loss and
recovery based on Landsat data. First, the authors used a RF model to map canopy cover
through time as a proxy for forest degradation and then applied the LandTrendr algorithm
to detect changes over a 19-year period. They found that the most valuable variables for
predicting tree canopy decline and regrowth were shortwave surface reflectance data and
an index related to plant moisture. While Landsat data were useful for tracking changes in
forest distribution through time, the authors noted that more very high-resolution products
for ground-truthing would benefit their analysis, as would the use of SAR data since
tropical forests were covered by clouds a large portion of the time. Using SAR data as
input and high-resolution optical data as validation data, the authors in [124] trained a
U-Net on Google Cloud to create monthly forest loss maps. They compared this model
with a RF trained on GEE while testing both models in Brazil and the United States where
both logging activity and wildfires were prevalent. They showed that the U-Net model
outperformed the RF in most cases, though the RF model still achieved high accuracy
rates. However, when the U-Net model was trained on data from one region and then
applied to the other, it did not perform well. Thus, the CNN was not generalizable and
would need to be re-trained before being used in additional locations. In [117], the authors
showed how GEE can be used to overcome data storage and compute needs and analyze
about 20 years’ worth of Landsat data to determine forest cover changes. The authors
used a RF model to show where deforestation has continued versus where forests have
partly recovered. Then, they fed the predictions of their RF model to an ANN-based forest
projection model to simulate forest loss up through 2028. The authors noted that because of
a lack of availability in reference high-resolution RS imagery, certain years in their analysis
could not be validated. In [122], the authors used a RF for initial LULC classification, then
used a MLP to simulate possible deforestation scenarios into the future. Finally, the authors
used a Markov-based Cellular Automata (Markov-CA) model to analyze the probability
of transition scenarios. Their results verified previous research findings and their maps
showed good agreement to current efforts to map forest change like those of the Amazon
Deforestation Monitoring Project (PRODES) program. What’s more, the authors identified
several key factors indicating high rates of deforestation, such as proximity to roads and
urban centers.
The authors in [107] demonstrated a low-cost method for monitoring industrial oil
palm plantations in Indonesia using Landsat 8 imagery that allowed them to distinguish
between oil palm (immature oil palm, mature oil palm), forest, clouds, and water classes
using the CART, RF, and MD algorithms. Their results demonstrated that CART and RF
had higher OA and Kappa coefficients than the MD algorithm. Critically, the authors
compared model accuracy based on different combinations of spectral bands (particularly,
RGB, SWIR, TIR, and NIR), including all bands, to determine which would help specifically
with oil palm plantation monitoring. The authors did not use SAR for this analysis but
noted that in future work the combination of optical and SAR imagery might improve
results. They also pointed out a need for more and better algorithms on the GEE platform.
Lastly, the authors noted that there was a need to make intuitive, easy-to-use tools for
specific tasks that incorporated input from the public and other stakeholders like NGOs
and government agencies. The authors in [115] compared several ML models to map oil
palm using 30 m Landsat 8 imagery in Malaysia. The authors found that tree-based models
(e.g., RF, CART) worked better than a SVM for the task and were able to classify large
areas with high accuracy. Even so, classification errors were traced to the relatively coarse
resolution of Landsat data. The authors noted that higher resolution platforms like Sentinel
and the ability in the future to use DL methods on GEE will lead to higher performance.
As a highly forested landscape, southern Belize has been experiencing deforestation due
to agricultural expansion. In [108], the authors utilized Landsat 8 imagery on the GEE
to perform a supervised classification. Subsequently, they built a MLP model to predict
future deforestation patterns and magnitude based on the drivers of past deforestation
patterns in the region. The projections indicated that the forest cover in southern Belize
will decrease from 75.0% in 2016 to 71.9% by the end of the projection period. The deforestation prediction maps can
provide useful information for stakeholders on how to better allocate resources to protect
forested landscape and improve the biodiversity of ecosystems.
The authors in [15] addressed the challenge of tracking forest degradation by mapping disturbed forest areas in Brazil
using 27 years of Landsat surface reflectance imagery. By separating out old-growth forests
from degraded forests and deforested regions, the authors were able to produce an intact-
disturbed forest map to track degraded forests. The whole processing pipeline was done
on GEE using a RF. They integrated single date features with temporal characteristics
from six time-series trajectories, in particular, two Landsat shortwave infrared bands and
four vegetation indices. The authors ran a relative variable importance analysis for each
ecoregion. They were able to show that past maps were somewhat outdated due to their
inability to separate forest classes into intact and degraded, although their results varied from
ecoregion to ecoregion.
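Relative variable importance of the kind computed in [15] can now be retrieved directly from GEE's random forest classifier via Classifier.explain(); below is a tiny self-contained sketch (synthetic features and labels) of that call, and whether its importance metric matches the one used in any given study is an assumption.

```python
import ee
ee.Initialize()

# Tiny synthetic training table: two predictor properties and a class label.
training = ee.FeatureCollection([
    ee.Feature(None, {"swir1": 0.10, "ndvi": 0.80, "class": 0}),
    ee.Feature(None, {"swir1": 0.30, "ndvi": 0.30, "class": 1}),
    ee.Feature(None, {"swir1": 0.12, "ndvi": 0.75, "class": 0}),
    ee.Feature(None, {"swir1": 0.28, "ndvi": 0.35, "class": 1}),
])

rf = ee.Classifier.smileRandomForest(50).train(
    features=training, classProperty="class", inputProperties=["swir1", "ndvi"])

# explain() returns a dictionary that, for random forests, includes per-variable importance scores.
print(rf.explain().getInfo().get("importance"))
```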

Appendix C.4. Textual Summaries for Vegetation Mapping


The authors in [129] developed and tested an approach to automate the mapping and
quantification of vegetation cover and biomass using Landsat 7 and Landsat 8 imagery
across the grazing season (i.e., changing phenological conditions). Using a best-subset
regression modeling approach, they found that the best predictor variables vary by sea-
son, corresponding to vegetation phenology. It was found that NDVI, a rough proxy of
vegetation production which is widely used for rangeland monitoring tools, is less accu-
rate when vegetation contains high proportions of standing dead or senescent vegetation.
Different NDVI thresholds were determined to guide season-specific model application.
They showed that using NDVI to select from seasonal models for application increased
accuracy when modeling vegetation amounts at varying growth stages compared to the
single variable all-year normalized difference tillage index (NDTI) models. In [130], the
authors utilized the historical Landsat satellite record, gridded meteorology, abiotic land
surface data, and over 30,000 field plots within a RF model to predict per-pixel percent
cover of annual forbs and grasses, perennial forbs and grasses, shrubs, and bare ground
over the western United States from 1984 to 2017, at approximately 30 m resolution. The
R ranger package, which provides diagnostic tools and variable importance ranking, was
first used to define RF model parameters and select the optimal input variables. The RF
model was then implemented in GEE to predict percent cover using the top 40 most impor-
tant variables per class. With continuous rather than categorical estimates of vegetation
cover, it is possible to assess changes in functional group composition, transitions to new
vegetation states, efficacy of vegetation treatments, and vegetation dynamics pre- and
post-disturbance across space and time. Using climate and field data alongside Landsat
imagery and MODIS land-use maps, ML models used in [21] were able to predict for
several important rangeland indicators like plant height, total vegetation and rock cover,
as well as bare soil. After running a relative variable importance analysis, the authors
found that topographic variables were less important to the best performing model (RF),
while the MODIS land map input data were the driving factor in model performance.
However, the authors noted that because GEE did not have hyperparameter tuning for ML
models, they trained some offline. Additionally, while this analysis used RS imagery and
current land-use maps to make predictions, it was still reliant on field data. Because of a
lack of observations during the winter for western US rangelands, the authors cautioned
that before their model was used for making predictions during that season, more field
observations would need to be collected first to tune their model.
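The offline hyperparameter tuning mentioned above can be done with scikit-learn once samples are exported from GEE; the file, feature, and target names below are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

df = pd.read_csv("rangeland_samples.csv")            # hypothetical table exported from GEE
features = ["ndvi", "elevation", "slope", "precip"]   # hypothetical predictors
target = "plant_height"                               # hypothetical rangeland indicator

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [200, 500],
                "max_features": ["sqrt", 0.5],
                "min_samples_leaf": [1, 5, 10]},
    cv=5, scoring="neg_mean_absolute_error", n_jobs=-1)
search.fit(df[features], df[target])
print(search.best_params_, -search.best_score_)
```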
The authors in [136] used a specific invasive species in China as a case study for
developing a ML pipeline that takes into account both cloud cover and phenological
information. They compared the ability of a stacked autoencoder and a SVM to classify
vegetation types. While the SVM was trained on GEE, the DL model had to be trained
offline, as the platform did not support DL models at the time. The authors found that the
DL model performed better than the SVM and that both models performed better with
phenological information. Even so, the authors noted that the 16-day return time of Landsat
imagery was a limiting factor in their analysis and that further work could be done to apply
their method on Sentinel imagery. Importantly, the authors in [136] called on other authors
to upload their ground observation data and final maps to GEE so that others can replicate
studies and compare results. Increasingly, researchers are not only producing maps but
comparing them to current data products and seeing how they differ. In order to produce a
map of this invasive species, [139] collected and processed field data in addition to UAS
imagery and optical RS data from several different platforms. The authors trained a RF
model for classification purposes and while the data processing was done in GEE, all the
ML portion was done outside of the platform. By using a RF, the authors were able to show
exactly how the model was making decisions, distinguishing between mud flats and water
and different coastal grass species. While the authors were able to achieve high accuracy
rates, issues related to cloud masking, not incorporating phenological information, and
challenges in identifying submerged grasses in tidal areas led to some model uncertainty.
In order to clarify what changes have been happening there, over three decades
worth of Landsat imagery was used in [135] to determine which areas have experienced
vegetation change. Two RF models were used on the GEE platform. The first classified
land-use types and assessed the stability of predictions for those classes over half the
total time period in question. The second was used to perform the overall classification,
and this two-part process improved the accuracy by 4% (87%, up from 83% without
assessing the stability of pixel classifications first). The main limitation was confusing
class types like grassland, planted pasture, and savanna, though in the future radar and
LiDAR data could help distinguish similar classes and boost OA. The resulting maps are
freely available through the MapBiomass platform. The authors in [138] used an adaptive
stacking algorithm to train a ML classifier on optical, SAR, and DEM data to identify
wetland vegetation. Adaptive stacking uses one ML classifier to identify the optimal
combination of ensemble classifiers and hyperparameters to be used for a given task. In this
case, the authors used a RF model to determine the best combination of the CART, MD, NB,
RF, and SVM classifiers on GEE. The authors found that the adaptive stacking method was
much more accurate than the RF and SVM models alone. The resulting classification map
was then combined with a trend analysis performed by the LandTrendr algorithm, which
allowed them to identify wetland vegetation distribution as it is now and also how it has
changed over time. Additionally, [138] tested their workflow on different subsets of input
data and showed that adding more data helped the adaptive stacking algorithm learn better
(in fact, the best combination of input data was all of the data). The authors noted that forest
and reed classes were not identified well with their adaptive stacking algorithm, and that
the LandTrendr algorithm will most likely need to be re-tuned in different environments.
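For readers unfamiliar with stacking, the scikit-learn sketch below shows the basic idea offline (a generic stacked ensemble, not the adaptive stacking search of [138], which also selects the member classifiers and their hyperparameters).

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Base learners roughly mirroring the classifier families named above (CART, NB, SVM),
# combined by a random forest meta-learner trained on their cross-validated outputs.
stack = StackingClassifier(
    estimators=[("cart", DecisionTreeClassifier()),
                ("nb", GaussianNB()),
                ("svm", SVC(probability=True))],
    final_estimator=RandomForestClassifier(n_estimators=300),
    cv=5)
# stack.fit(X_train, y_train)      # X_train: per-pixel features, y_train: wetland classes
# predictions = stack.predict(X_new)
```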
Bathymetry and RS data were combined in [127] to create a processing and analysis
pipeline for large scale seagrass habitat monitoring in Greece using GEE. While the authors
compared CART, RF, and SVM models on the GEE platform and how they performed
on open-source datasets, they validated the models on unpublished data, which made
it difficult to replicate their results. A key limitation to this processing workflow is the
lack of in-situ validation data. Thus, their preprocessing pipeline depends on creating a
data mask for labels using a ML model, which is then fed to ML models as input data. If
there is uncertainty or errors in the first output data layer, these errors would persist in
the secondary classification step. Their reported OA is 72% and the authors suggested
more seagrass datasets for performance improvement. A CNN–LSTM hybrid model was
used in [132] to identify grassland types in Sentinel-2 imagery in the United States. The
authors collected ground-truth field data for their experiment, and with the help of GEE
for preprocessing and Google Colab for NN training, they achieved an almost 7% accuracy
boost for identifying a type of grass (98.8%, up from 92%). However, the authors’ dataset
was very small (13 Sentinel-2 images in total, 6 images in 2016 and 7 in 2017, as the time
range corresponds to their field survey years), so it was uncertain how this model would
generalize to other regions in the same state or in different states altogether.
The authors in [43] compared the performance of a RF model with feature engineering
to a LSTM and U-Net NN models without feature engineering for identifying pasturelands
in Brazil. The RF model was trained on GEE while the NNs had to be trained offline as GEE
did not support DL models at the time. The authors crowdsourced the creation of a LULC
dataset for Brazil using PlanetScope imagery to domain experts, ensuring that the labels
for the input data were accurate. These LULC classes contained important pastureland
subtypes in addition to savannah, forest, built-up areas, and water. U-Net had the highest
generalization across both the validation and testing sets, maintaining high accuracy rates
while the LSTM and RF model underfit the test set. To illustrate the tradeoffs between
ML and DL models, the authors included run and inference times. The RF model was
able to complete training and prediction in 3 h. The LSTM took 30 min to train but 23 h
to predict on the test set, while the U-Net took 24 h to train but 1.2 h at inference time.
The authors in [126] used GEE to compare how well several ML classifiers compare to
index-based methods like NDVI. Using over 40 years of optical Landsat imagery, the
authors were able to map vegetation loss in Australia with accuracy matching that of a current
government vegetation monitoring program (though their process relied only on cloud
computing and freely available data). However, different amounts of rainfall
affected their results because models were not able to fully recognize vegetation in varying
greening and drying patterns. Future analyses should attempt to collect more and higher
resolution data to improve model performance. The authors in [140] argued that phenology
information in RS time series can better capture tidal flat wetland vegetation and so
compared phenology information to statistical (min, max, median) and temporal features
(quartile ranges). They then fed this data into a RF while analyzing its effect on model
performance during different periods of time (all data, green and senescence seasons) for
wetland vegetation classification. The authors showed that the phenological information
was the most important input feature to the RF, while combining all three sets of features
led to the highest accuracy. Additionally, the model performed best when predicting over
both the green and senescence periods. To explore how plant functional types can be
derived directly from RS information, [137] trained a RF model on field, DEM, MODIS, and
climate data. Their method was able to distinguish between moist and dry deciduous tree
types with a high degree of accuracy, which could lead to better estimates of carbon, water,
and energy fluxes. Still, the authors struggled to identify shrubs, grasses, crops, and
built-up areas.
The authors in [141] implemented a Gaussian process regression (GPR) model for retrieving green
LAI from RS imagery, built in a way that is optimized for GEE. First, they created the model
so that it could run on vector or tensor time series imagery. Then, the authors used AL for feature
reduction so that the model only learned on important data while creating a model that can
run within GEE’s memory confines. This GPR model was then used to gap-fill RS imagery
focused on LAI, meaning the model was able to “see” through clouded optical imagery. More
work like this should be done, either in creating new models to upload to the cloud that
other researchers can use or optimizing these models so that they are memory efficient. The
authors mentioned that better GEE code documentation and error messages could help future
researchers interested in developing custom ML models for the platform.
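An offline analogue of GPR-based gap-filling is easy to sketch with scikit-learn (synthetic LAI values on hypothetical cloud-free dates; this is not the GEE-optimized implementation of [141]).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

doy = np.array([10, 42, 74, 138, 170, 234, 266, 330]).reshape(-1, 1)  # cloud-free dates (day of year)
lai = np.array([0.5, 0.8, 1.6, 3.9, 4.3, 3.1, 1.8, 0.6])              # synthetic LAI observations

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=40.0) + WhiteKernel(), normalize_y=True)
gpr.fit(doy, lai)

all_days = np.arange(1, 366).reshape(-1, 1)
lai_filled, lai_std = gpr.predict(all_days, return_std=True)  # gap-filled series with uncertainty
```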

Appendix C.5. Textual Summaries for Water Mapping and Water Quality Monitoring
In [32], the authors created a web portal using GEE as a backend alongside an expert
system to identify bodies of water in Landsat imagery. Being able to visualize global trends
in surface water allowed the authors to identify trends such as all continents gaining surface
water, although this varies from region to region. While small bodies of water (30 m × 30 m or
smaller) could not be mapped using the expert system, the process of mapping global
surface water was sped up by the use of GEE compute resources. The authors noted that some
regions had more accurate water maps because of the length of the observation record. In [142],
the authors used all available Landsat images to study surface water dynamics in Oklahoma
from 1984 to 2015. About 16,000 Landsat scenes were preprocessed using GEE. Subsequently,
they computed spectral indices (e.g., MNDWI, NDVI, and EVI) and performed conditional
operations to extract surface water areas. Four surface water products were created, including
the maximum, year-long, seasonal, and average surface water extents. The results showed
that both the number of surface water bodies and surface water areas had been decreasing
from 1984 through 2015. Significant inter-annual variations in the number of surface water
bodies and surface water areas were found. They also found that both the number of surface
water bodies and surface water areas had a positive relationship with precipitation and a
negative relationship with temperature.
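A minimal GEE Python API sketch of this index-plus-conditional-operations pattern is shown below; the region, dates, and the specific water rule (MNDWI exceeding NDVI or EVI, with low EVI) are illustrative assumptions drawn from commonly used criteria, not necessarily the exact rules of [142].

```python
import ee
ee.Initialize()

region = ee.Geometry.Rectangle([-98.0, 34.5, -97.5, 35.0])  # hypothetical area in Oklahoma

composite = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
             .filterBounds(region).filterDate("2015-06-01", "2015-09-01")
             .map(lambda img: img.select("SR_B.*").multiply(0.0000275).add(-0.2))
             .median())

mndwi = composite.normalizedDifference(["SR_B3", "SR_B6"])  # green vs. SWIR1
ndvi = composite.normalizedDifference(["SR_B5", "SR_B4"])
evi = composite.expression(
    "2.5 * (N - R) / (N + 6 * R - 7.5 * B + 1)",
    {"N": composite.select("SR_B5"), "R": composite.select("SR_B4"), "B": composite.select("SR_B2")})

# Conditional operations: flag pixels where the water signal dominates the vegetation signal.
water = mndwi.gt(ndvi).Or(mndwi.gt(evi)).And(evi.lt(0.1)).selfMask()
```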
The authors in [150] analyzed to what degree different preprocessing steps affect the
output water maps using both SAR and DEM data and two variations of Otsu’s threshold-
ing algorithm. They showed that SAR data that included radiometric terrain correction (RTC)
as a preprocessing step yielded more accurate results and that Bmax Otsu thresholding was
more stable to different inputs than Edge Otsu. However, their analysis was limited in time
and space, so more work needed to be done to test their results in different locations and
varying terrain types at different times. In [143], the authors used Landsat 8 images avail-
able on GEE to map glacial lakes in the Tibet Plateau region. About 3580 Landsat scenes
acquired in 2015 were preprocessed. After that, the MNDWI algorithm was applied to each
image to extract glacial lakes with thresholding techniques. The initial results were then
exported from GEE for further processing. They also analyzed the various characteristics
of glacial lakes, including size classes, elevation, and climate forcing. The results revealed
that climate warming played a major role in glacial lake changes. The authors in [151]
compared the performance of MNDWI and a RF to that of a multi-scale CNN (MSCNN)
and showed that the DL method was the most accurate (with less false classifications) for
identifying urban water resources in several Chinese cities. However, the authors took a
novel approach in avoiding the lack of DL methods available on GEE: they trained the CNN
locally, and then uploaded the weight matrix to GEE. They then implemented the rest of the
CNN's features (convolutions, etc.) directly in GEE, effectively allowing the authors to run
DL inference on the platform. Still, the MSCNN model had issues classifying small/thin
water bodies and water scenes with mixed pixel classes. One way to make the data labeling
process less time- and resource-intensive for DL was illustrated in [156], where the authors
used current water maps and a segmentation algorithm to automatically collect data labels
from Sentinel-1 imagery. These data were then used to train variations of U-Net in an offline
environment. Due to computational constraints, the authors were not able to compare
their model to more traditional ML models like a RF. Even with their automated data
labeling pipeline, the authors noted that their study lacked sufficient data to adapt their
method to more than one country and manual validation was still necessary to validate the
model post-prediction.
Optical imagery used in surface water mapping analyses is often occluded by clouds,
and many common methods used to map surface water misclassify snow, ice, rock, and
shadows as water. DeepWaterMapv2 was released in [147] to address these false positive
misclassifications. The authors used Landsat imagery from GEE to train their NN archi-
tecture to identify bodies of water across different terrain types and in different weather
conditions. However, due to the compute constraints and lack of NN models on GEE,
the authors moved the data offline during the training process. The authors designed the
network to work with many different satellite platforms as long as they have a set group of
input bands. The authors in [157] used masking, filtering, and segmentation algorithms to
identify bodies of water in Sri Lanka in complex, mountainous environments. They showed
that their model performed well even in the presence of shadow or soil and did so much
better than other common index-based methods like NDWI, MNDWI, or multi-spectral
water index (MuWI-R). To explore the potential to distinguish between surface water body
subtypes, [158] used slope, shape, phenology, and flooding information as input to
a RF model to predict for lakes, reservoirs, rivers, wetlands, rice fields, and agricultural
ponds. Their method did not work very well for wetlands and the OA was not very high
(85%) across classes. However, the RF model they used was interpretable and they showed
which subclasses were easier or more difficult to predict for. Unfortunately, the entire
method could not be run directly on GEE: the shape features, which were crucial to the overall
analysis, could not be calculated on the platform, so the authors first had
to compute them in a local environment and then upload them.
The authors in [144] proposed a new method for quickly mapping yearly minimum
and maximum surface water extents. Using GEE and Landsat images, temporal changes
in the extent of surface water in the Middle Yangtze River Basin were identified. Firstly,
based on the estimated value of cloud cover for each pixel, the high cloud covered pixels
were removed to eliminate the cloud interference and improve the calculation efficiency.
Secondly, the annual greenest and wettest images were mosaiced based on vegetation
index and surface water index. Thirdly, the minimum and maximum surface water extents
were obtained by the RF classification. Finally, manual noise removal as implemented
in ESRI ArcMap was applied to reduce noise in the classification result. In [148], the
authors integrated global surface water (GSW) dataset and SRTM-DEM to determine the
spatiotemporal patterns of water storage changes in China’s lakes and reservoirs. The
dynamic water storage change of 760 lakes and reservoirs, each with an area greater than
10 km², were evaluated over a time span of 30 years (1984–2015), with the total area accounting
for about 80% of the total water surface area in China. The HydroLAKES data and China’s
lake dataset and river shapefile were also used to help select lakes and reservoirs. Water
level data for a total of 30 lakes across China from Hydroweb dataset were used for
validation. The DEM-based geo-statistic approach was used to construct hypsometric
relationships between water area and elevation for each lake and reservoir. Their data
preprocessing was implemented using ArcGIS, GEE was used for extraction and correction
of water coverage and also extraction of surface area-elevation pairs, and R software
was used for statistical analysis on pixel contamination ratios, hypsometric analysis, and
identification of spatio-temporal patterns.
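A minimal sketch of the greenest/wettest compositing step described for [144], assuming Landsat 8 Collection 2 surface reflectance and NDVI/MNDWI as the quality bands; the region, dates, and index choices are placeholders, not the authors' exact configuration.

```python
import ee

ee.Initialize()

# Annual "greenest" and "wettest" pixel composites via qualityMosaic, which keeps,
# per pixel, the observation with the highest value of the named band.
aoi = ee.Geometry.Rectangle([112.0, 29.0, 116.0, 31.5])  # hypothetical basin extent

def add_indices(img):
    ndvi = img.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
    mndwi = img.normalizedDifference(['SR_B3', 'SR_B6']).rename('MNDWI')
    return img.addBands(ndvi).addBands(mndwi)

col = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
       .filterBounds(aoi)
       .filterDate('2019-01-01', '2019-12-31')
       .map(add_indices))

greenest = col.qualityMosaic('NDVI')   # max-NDVI composite
wettest = col.qualityMosaic('MNDWI')   # max-MNDWI composite
```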
The authors in [154] reviewed recent fluvial geomorphology GEE applications and
synthesized three common themes relevant to future planimetric river channel change
studies: (1) GEE has been used as a tool for mining the satellite imagery data archive,
cloud-masking images and then generating multitemporal image composites; (2) many
applications have provided accessible source code and/or data repositories, promoting
transparent and open science; (3) cartographic, graphical, and statistical analyses are almost
always completed outside of the GEE environment. This study [154] shared a demon-
stration workflow showing how GEE can be used to extract active river channel masks
from a section of the Cagayan River (Luzon, Philippines). The spatiotemporal planform
change was then quantified outside of the GEE environment, i.e., extracting centerline
position and channel width and calculating centerline migration rates. For RS applications
in fluvial geomorphology, challenges remain around issues of scaling, transferability, and
data uncertainties; particularly for small- to mid-sized rivers where medium-resolution,
multispectral satellite imagery is rarely suitable for geomorphic analyses. Caution is always
required to interpret geomorphic changes based on two-dimensional planforms alone, as
rivers also adjust in the vertical dimension. By enabling fluvial geomorphologists to take
their algorithms to petabytes worth of data, GEE is transformative in enabling determin-
istic science at scales defined by the user and determined by the phenomena of interest.
GEE offers a mechanism for promoting a cultural shift toward open science, through the
democratization of access and sharing of reproducible code.
The authors in [146] stated that theirs was the first study using GEE for RS of water
quality parameters in inland waters. Using Landsat imagery in conjunction with ground-
based measurements of CDOM absorption and DOC concentrations, a regression-based
model was built to estimate CDOM in the six largest Arctic rivers using 424 separate
observations from 2000 to 2013.
To estimate water quality parameters like Chl-a concentrations, turbidity, and dis-
solved organic matter, [152] used ML and DL models to analyze RS imagery. The authors
showed that several ML and DL models were able to achieve very low error rates for this
regression task. Some of the relationships detected by the models could be used to predict
for non-optical variables, as well. However, the authors had to move the ML portion of
their analysis off the GEE platform due to “algorithmic limitations” (inflexible models).
While a DL model performed well for predicting various water quality indicators, [152]
cited a lack of model transparency. They cautioned that feature extraction and expert
knowledge may still be necessary to make sense of the DL model outputs, which were otherwise
difficult to interpret, negating the level of accuracy achieved with the model. The
authors in [153] developed a methodological framework for mapping Chl-a concentrations
with multi-sensor satellite observations and in-situ water quality samples. A SVM model
was trained on the GEE cloud-computer platform and used to predict Chl-a concentrations
of 12 inland lakes in the tri-state region of the U.S., including Kentucky, Indiana, and Ohio.
The results demonstrated that GEE and multi-sensor satellite observations can enable fast
and accurate mapping of Chl-a at a regional scale.
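The exact SVM configuration used in [153] is not detailed here, so the following sketch only illustrates the general pattern of training a GEE classifier in regression mode on in-situ Chl-a samples; the sample asset path, bands, and SVR parameters are hypothetical.

```python
import ee

ee.Initialize()

# Hypothetical FeatureCollection of in-situ samples, each with a numeric
# 'chl_a' property and a point geometry (asset path is a placeholder).
samples = ee.FeatureCollection('users/example/insitu_chl_a')

composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterBounds(samples)
             .filterDate('2019-06-01', '2019-09-30')
             .median())
bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5']

# Pair each in-situ sample with the reflectance values at its location.
training = composite.select(bands).sampleRegions(
    collection=samples, properties=['chl_a'], scale=30)

# Epsilon-SVR-style regression; kernel and cost are illustrative placeholders.
svr = (ee.Classifier.libsvm(svmType='EPSILON_SVR', kernelType='RBF', cost=10)
       .setOutputMode('REGRESSION')
       .train(features=training, classProperty='chl_a', inputProperties=bands))

chl_a_map = composite.select(bands).classify(svr).rename('chl_a')
```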

Appendix C.6. Textual Summaries for Wetland Mapping


Cloud computing on GEE was utilized in [35] to create an open-source, reproducible
map of wetland occurrence probability using LiDAR and RS data for the entire province of Alberta.
Using a BRT, the authors were able to match a current governmental effort in Alberta while
also producing a relative variable importance ranking showing which RS variables might be the most
useful for future wetland mapping efforts in the area. However, the authors noted that
some uncertainties in the underlying training dataset, a lack of subsurface soil information,
and having to move between GEE and offline analysis may have contributed to errors in
this analysis. The authors in [162] showed that by combining SAR, optical, and LiDAR data
on the GEE platform, a BRT model was able to predict peatland occurrence across Alberta
province with relatively high accuracy at high resolution. Using different input variable
selection methods and optimization techniques, the authors were able to trim down their
dataset to six variables, saving time and compute in the final analysis while pointing future
studies towards which data would be useful to collect more of for peatland mapping. [162]
pointed out that additional training data from field work or photo interpretation will aid in
future peatland monitoring and detection studies and that more research needs to go into
distinguishing between different wetland classes.
The authors in [161] used optical and SAR RS imagery to produce a 10 m resolution
wetland map for the entire province of Newfoundland, Canada, using both a RF model
and SNIC. Optical data contributed more to the accuracy of the models, although including
SAR boosted accuracy rates. While OA rates were high for distinguishing between wetland
and non-wetland classes, distinguishing between wetland sub-types (bog, fen, marsh, etc.)
remained difficult. Limitations for the study include not having access to a harmonized
Landsat-Sentinel data produced on GEE, not being able to use TensorFlow or DL models on
GEE, and a continued lack of ground-truth data for wetland detection studies. In [170], the
authors classified wetlands in Newfoundland during three different periods to show the
spatial dynamics of these ecosystems. The authors obtained high accuracy rates using both
a RF and CART model and were even able to distinguish between wetland subtypes like
bogs, fens, and peatlands. The authors used Landsat imagery because its data catalog goes
back to the 1980s. This was necessary because of the length of the wetland change detection
they were interested in. Still, the authors noted that future mapping applications should
focus their analyses on using higher-resolution products like Sentinel imagery to increase
accuracy rates even further over wide areas. The authors in [17] proposed using field data
collected from one Canadian province to create wetland inventory maps for several others
using a mix of optical, SAR, and digital elevation data. However, the authors received
mixed accuracy results from their RF model, most likely because the study rests on the
assumption that there was a static underlying distribution of data between wetlands across
Canadian provinces. The authors noted that their results could be improved if the GEE
platform allowed for more samples to be analyzed at once, and if there were more flexibility
or choice in choosing ML model hyperparameters or if there were more segmentation
algorithms included on the platform.
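The following sketch illustrates the generic GEE pattern underlying several of the wetland studies above: sampling a composite at labeled reference locations, training an RF, and assessing accuracy with a confusion matrix. The asset path, band list, split ratio, and tree count are placeholders.

```python
import ee

ee.Initialize()

# Hypothetical labeled reference polygons with an integer 'class' property
# (e.g., bog, fen, marsh, non-wetland); the asset path is a placeholder.
reference = ee.FeatureCollection('users/example/wetland_reference')

composite = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
             .filterBounds(reference)
             .filterDate('2020-05-01', '2020-09-30')
             .median())
bands = ['B2', 'B3', 'B4', 'B8', 'B11', 'B12']

samples = composite.select(bands).sampleRegions(
    collection=reference, properties=['class'], scale=10)

# Simple random train/validation split.
samples = samples.randomColumn('rand', 42)
train = samples.filter(ee.Filter.lt('rand', 0.7))
valid = samples.filter(ee.Filter.gte('rand', 0.7))

rf = ee.Classifier.smileRandomForest(200).train(train, 'class', bands)
classified = composite.select(bands).classify(rf)

# Confusion matrix and overall accuracy on the held-out samples.
matrix = valid.classify(rf).errorMatrix('class', 'classification')
print(matrix.accuracy().getInfo())
```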
Across Canada, wetland mapping is a well-studied phenomenon. However, different
local and regional agency wetland inventories use different techniques for monitoring
wetlands or have altogether different definitions of what constitutes a wetland. Thus,
even though several large-scale wetland maps have been produced, they are often not
directly comparable. Additionally, these maps are often static and do not continually
monitor wetlands through time. However, as [165] detailed, these are not the only barriers
to mapping wetlands using RS imagery. Others include obtaining sufficient and recent
field data to verify wetland monitoring products, but also the difficulty of monitoring such
dynamic landscapes. Wetlands do not have clear-cut boundaries, are extremely diverse
landscapes and ecosystems, and are often in flux throughout seasons and years due to
flooding and drying. The authors used optical and SAR Sentinel data in addition to field
samples over the entirety of Canada and showed that almost one-fifth of Canada is covered
in wetlands. The study in [165] produced a high-resolution (10-m) wetland inventory map
of Canada (an approximate area of one billion hectares), using multi-year, multi-source
(Sentinel-1 and Sentinel-2) RS data on the GEE platform. The whole country was mapped
using a large volume of reference samples using an object-based RF classification scheme
with an OA approaching 80%. They [165] used both pixel- and object-based classification
with an RF model and SNIC to reduce noise in the output map. However, the authors came
into the study with an accuracy threshold in mind and adjusted the training dataset to
meet it after already seeing accuracy results. The authors reported uneven performance
across Canadian provinces, mainly due to a lack of RS or field data in some locations. The
authors in [160] analyzed a large number of field samples alongside Landsat imagery with
a RF model to produce a wetland map for all of Canada. While this analysis showed how
GEE made it easier to scale up the spatial scope of a given analysis (i.e., move from local to
regional, country-level, or global scope), [160] obtained low accuracy scores across Canada.
The authors noted that more field samples and the use of SAR data could improve future
results, given that large parts of Canada are often covered by clouds and snow throughout
the year. The authors in [168] proposed an object-based classification method to classify
Sentinel-1 and Sentinel-2 data on the GEE platform, which resulted in the 10-m Canadian
Wetland Inventory. The method consisted of a simple non-iterative clustering algorithm
and the RF algorithm, which was applied to identify wetlands in each of the 15 ecozones in
Canada. The overall accuracies for each ecozone ranged from 76% to 91%. It represents a
7% improvement compared to the first generation of the Canadian Wetland Inventory.
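A hedged sketch of the object-based scheme used by studies such as [161,165,168], pairing SNIC segmentation with RF classification of the segment-mean bands; the segmentation parameters, band list, and reference asset are assumptions for illustration.

```python
import ee

ee.Initialize()

# Object-based workflow: SNIC segmentation on a Sentinel-2 composite followed by
# RF classification of the per-segment mean bands.
reference = ee.FeatureCollection('users/example/wetland_reference')  # placeholder
composite = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
             .filterBounds(reference)
             .filterDate('2020-06-01', '2020-09-30')
             .median()
             .select(['B2', 'B3', 'B4', 'B8', 'B11', 'B12']))

snic = ee.Algorithms.Image.Segmentation.SNIC(
    image=composite, size=30, compactness=1, connectivity=8)

# SNIC adds per-segment mean bands (named '<band>_mean') plus a 'clusters' band.
object_features = snic.select(['B2_mean', 'B3_mean', 'B4_mean',
                               'B8_mean', 'B11_mean', 'B12_mean'])

training = object_features.sampleRegions(collection=reference,
                                         properties=['class'], scale=10)
rf = ee.Classifier.smileRandomForest(200).train(
    training, 'class', object_features.bandNames())
object_map = object_features.classify(rf)
```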
The authors in [163] used NAIP imagery and LiDAR derived DEM data to detect
wetlands across the northern United States using unsupervised classification on the GEE
platform. They then compared their output with Joint Research Centre (JRC) Monthly
Water History and National Wetland Inventory (NWI) data. Additionally, all code and
implementation details were made open source, making it easy for others to verify or build
on their results. A benefit of their technique is that unsupervised learning does not rely
on underlying ground-truth data, often a bottleneck in ML and wetland mapping studies.
However, this was also a limitation in the study as it was difficult to verify their resulting
maps other than by comparison with other water and wetland map products (which
themselves could have inaccuracies). To get around the limitation that wetlands can be both
wet and dry over the course of the same season, the authors in [171] combined Sentinel-1
and -2 imagery with aerial photographs and field data to map the spatial variation of
wetlands in portions of the United States over time. First, the authors trained RF and
SVM models to predict the occurrence of wetlands and then masked out permanent water
using the JRC Global Surface Water dataset. This allowed the authors to show not only
permanently inundated wetlands, but also how wetlands change over time. The RF model
was the most accurate when compared to the SVM and NDWI, while also reducing false
positives and negatives. The authors made their workflow open source in the hopes that
conservation managers or people without coding experience can rerun their analysis for
updated wetland extent information. More analyses should take into account spatial
variation while producing environmental mapping applications, especially as governments
and nonprofits make conservation decisions based on them. The authors in [159] explored
the possibility of using GEE to map coastal wetlands in Indonesia by comparing all of the
different classifiers on the platform and how they perform with Landsat, digital elevation,
and Haralick texture data. While the results showed that the CART algorithm performed
the best on this task across every year of training data, it was unclear from the results
whether feature engineering and PCA bands helped the model learn better than from just
the spectral input data. While GEE allowed [159] to train several models, some models
failed to run due to computational constraints or inflexibility. The authors showed that in
all cases, ML models did much better at binary than multi-class classification.
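As a rough illustration of the unsupervised approach in [163], the sketch below trains GEE's wekaKMeans clusterer on sampled pixels and applies it to a composite; the collection, region, band set, and cluster count are placeholders.

```python
import ee

ee.Initialize()

# Unsupervised clustering: no ground-truth labels are needed, but the resulting
# clusters still require manual interpretation or comparison with other products.
aoi = ee.Geometry.Rectangle([-96.0, 46.0, -94.0, 47.5])  # hypothetical area
composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterBounds(aoi)
             .filterDate('2020-05-01', '2020-09-30')
             .median()
             .select(['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6']))

# Sample pixels to train the clusterer, then apply it to the whole image.
training = composite.sample(region=aoi, scale=30, numPixels=5000, seed=1)
clusterer = ee.Clusterer.wekaKMeans(8).train(training)
clusters = composite.cluster(clusterer)
```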
With Landsat 8 and high-resolution Google Earth imagery, [164] used a RF model on
GEE to classify tidal flat types and their distribution in China. The authors reported very
high classification rates across tidal flat classes and showed that the classifications they produced
on GEE compared favorably to, or performed much better than, classifications of tidal flats based on
visual interpretation. However, the authors detailed that satellites like Landsat did not
fully capture tidal ranges, meaning that accuracy could be improved further with future
data products that observe full tidal duration distributions. In [169], the authors developed
a pixel and frequency-based approach to generate annual maps of tidal flats at 30-m spatial
resolution in China’s coastal zone using the Landsat TM/ETM+/OLI images and the GEE
cloud computing platform. The resulting map of coastal tidal flats in 2016 was evaluated
using very high-resolution images available in Google Earth. The annual frequency maps
of open surface water bodies and vegetation were first produced using Landsat-based
time series vegetation indices and water-related spectral index. Pixels with a water body
frequency spanning from 0.05 to 0.95 were classified as intertidal zones. A threshold value
of 0.05 was used to classify coastal vegetation area (vegetation frequency ≥ 0.05) and non-
vegetated tidal flats (vegetation frequency < 0.05). Mixed pixels, such as remnant tidal flats
water, could not be detected. In [172], the authors first processed high-resolution RS, and
UAS imagery to map minimum and maximum water and vegetation extent. They then used
Otsu’s thresholding algorithm to automatically detect the best ratio for each index. These
two indices were then combined in a composite that showed the total intertidal area in the
RS imagery, to which the authors again applied the Otsu thresholding algorithm. The end
result was a highly accurate map of tidal flats that did not require any post-processing. The
authors compared their results with other tidal flat datasets in China and noted that their
method produced (at least visually) better estimates because it incorporated
high-resolution imagery, did a better job at cloud-masking, and achieved better estimates of
tidal minima and maxima. Still, the authors noted that more imagery of high and low tides
in RS imagery needed to be collected and would increase the accuracy of their method.
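A minimal sketch of the frequency-based intertidal mapping idea in [169]: flag water per observation, average the flags into an annual water frequency, and keep pixels inundated between 5% and 95% of the time. The MNDWI water test, band names, and region are illustrative assumptions.

```python
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([119.5, 33.0, 122.0, 35.0])  # hypothetical coastal zone

def water_flag(img):
    # Per-observation binary water flag from an MNDWI test (illustrative only).
    mndwi = img.normalizedDifference(['SR_B3', 'SR_B6'])
    return mndwi.gt(0).rename('water')

col = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
       .filterBounds(aoi)
       .filterDate('2016-01-01', '2016-12-31')
       .map(water_flag))

# Per-pixel water frequency = mean of the 0/1 water flags across the year.
frequency = col.mean().rename('water_frequency')

# Pixels inundated in 5-95% of observations are treated as intertidal.
intertidal = frequency.gt(0.05).And(frequency.lt(0.95)).selfMask()
```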
A RF model was used on GEE in [166] to identify water cavities where sebkhas form
in Morocco. The authors used digital elevation data, SAR, and optical imagery, as well as
digital photos on GEE to identify saltwater cavities and their aquifers with high accuracy.
However, future challenges remain in incorporating multi-sensor, multi-temporal, multi-
resolution RS big data and in improving open-source, cloud-based ML workflows for EO
data. The authors in [167] compared the performance of a XGBoost model to a CNN for
wetland type classification. The authors obtained a reasonable overall accuracy, but the F1-score was
poor, so it was unclear what the models were actually learning. The authors were also not
able to train the two models on the same two subsets of data, making their performance
not directly comparable. However, in addition to making their resulting maps and trained
CNN model open source, the authors ran an informative comparison of how long it
takes to run and train the two models used in this study. The CNN and XGBoost model
took the same time to train, but the CNN took far less time to predict on the test set. More
studies should adopt this reporting metric so that researchers can more clearly evaluate the
tradeoffs between using specific models for their use-cases.

Appendix C.7. Textual Summaries for Infrastructure and Building Detection, Urbanization Monitoring
The authors in [178] created a large, vectorized, ground-truth verified dataset in India
specifically for the purpose of being able to train different ML models. They verified the
utility of the dataset by training CART, RF, and SVM models on GEE and compared their
predictions to those of the WorldPop dataset. While manually creating a large dataset takes
time, the authors showed that they can achieve accuracy rates of 87% with the RF model.
The authors also compared different combinations of input data and their impact on model
performance. For their application, Landsat 8 data served as better input than Landsat
7 alone or Landsat 7 data with computed indices like NDVI.
To investigate how best to identify impervious materials in RS imagery regardless of
cloud cover, [182] combined nighttime light, DEM, and SAR data and a RF model on GEE.
Their resulting maps were more accurate than commonly used maps like GlobeLand30.
More importantly, though, the authors quantitatively showed that using multiple sources
of data were better than single sources for this task: optical data were the most important,
but SAR data improved accuracy rates across all metrics. The mounting expansion of
impervious surfaces (major components of human settlements) could lead to a series of
human-dominated environmental and ecological issues. In [180], the authors put forward
a new scheme to conduct long-term monitoring of impervious-relevant land disturbances
using Landsat archives. The developed region was identified using a RF classifier. The
GEE-version LandTrendr was then used to detect land disturbances, characterizing the
conversion from vegetation to impervious surfaces. Finally, the actual disturbance areas
within the developed regions were derived and quantitatively evaluated.
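For readers unfamiliar with LandTrendr, the sketch below shows how GEE's built-in temporal segmentation can be called on an annual index time series; the annual compositing, NBR index, and all segmentation parameters are illustrative rather than the settings used in the studies above.

```python
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([116.0, 39.5, 117.5, 40.5])  # hypothetical study area

def annual_nbr(year):
    # One summer composite per year; the first band of the series is segmented.
    start = ee.Date.fromYMD(year, 6, 1)
    img = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
           .filterBounds(aoi)
           .filterDate(start, start.advance(3, 'month'))
           .median())
    return (img.normalizedDifference(['SR_B5', 'SR_B7'])
            .rename('NBR')
            .set('system:time_start', start.millis()))

annual = ee.ImageCollection(ee.List.sequence(2014, 2021).map(annual_nbr))

lt = ee.Algorithms.TemporalSegmentation.LandTrendr(
    timeSeries=annual,
    maxSegments=6,
    spikeThreshold=0.9,
    vertexCountOvershoot=3,
    preventOneYearRecovery=True,
    recoveryThreshold=0.25,
    pvalThreshold=0.05,
    bestModelProportion=0.75,
    minObservationsNeeded=6)
# The 'LandTrendr' output band holds the fitted segment vertices per pixel, from
# which disturbance year, magnitude, and duration metrics can be derived.
```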
The authors in [179] assessed the impact of urban form on the landscape structure of
urban green spaces in 262 cities in China. They preprocessed and classified 6673 Landsat
scenes for these cities using the RF classifier on GEE. Subsequently, they calculated several
landscape structure metrics and urban form metrics. To evaluate the relationship between
landscape metrics and urban form metrics, a BRT model was constructed to analyze their
relationships. The results revealed that cities with a high road density tended to have
a smaller area of urban green spaces and be more fragmented. In contrast, cities with
complex terrains tended to have more fragmented urban green spaces.
A semi-automatic large-scale and long time series (LSLTS) urban land mapping frame-
work was demonstrated in [183] by integrating the crowdsourced OpenStreetMap (OSM)
data with free Landsat images to generate annual urban land maps in the middle Yangtze
River basin (MYRB) from 1987 to 2017. First, the annual Landsat images and the related
spectral indices were collected and calculated in GEE. The OSM related data were collected
and processed manually in ArcGIS to generate the training samples. Then, the generated
samples were uploaded to GEE. Two classification algorithms were used: CART and RF.
Pixels that were both classified as urban land by the two methods were labeled as urban
land. The classified maps were downloaded from GEE and a spatial-temporal consistency
checking was further performed. Except for the generation of reference data for training
and validation as well as post classification analysis, most of the data processing was
performed automatically in GEE. Use of crowdsourced geographic data (CGD) such as
OSM came with many challenges: OSM polygons may overlap and contain multiple LULC
types; there is a large diversity of tags in OSM, some of which cannot be converted directly
to LULC classes; most of human activities are in urban areas, resulting in an imbalance of
(non-urban) class data. The authors noted a lack of GEE infrastructure, such as (1) a GEE API
related to CGD that could facilitate training sample generation, and (2) direct import of
the annual Google Earth very high resolution (VHR) images into GEE so that users could set them as the
background image and collect validation samples in the cloud. In this study, urban areas
on RS images were defined as sites that were dominated by a built environment, including
all non-vegetative, human-constructed elements and were defined as features with tags of
all non-vegetative, human-constructed elements including road networks and buildings in
OSM data.
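A hedged sketch of the agreement rule described for [183], labeling a pixel urban only when both CART and RF agree; the OSM-derived sample asset, band list, and urban class value are placeholders.

```python
import ee

ee.Initialize()

osm_samples = ee.FeatureCollection('users/example/osm_samples')  # placeholder asset
composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterBounds(osm_samples)
             .filterDate('2017-01-01', '2017-12-31')
             .median())
bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
training = composite.select(bands).sampleRegions(
    collection=osm_samples, properties=['class'], scale=30)

cart = ee.Classifier.smileCart().train(training, 'class', bands)
rf = ee.Classifier.smileRandomForest(100).train(training, 'class', bands)

cart_map = composite.select(bands).classify(cart)
rf_map = composite.select(bands).classify(rf)

# Keep only pixels both classifiers agree are urban (class value 1 assumed).
urban = cart_map.eq(1).And(rf_map.eq(1)).selfMask().rename('urban')
```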
To explore the possibility of identifying greenhouses in RS imagery over a large
area in China, [185] designed an ensemble ML model to distinguish them from water,
forest, farmland, and construction sites. The authors found that of various ML models
available on GEE, the CART, gmoMaxEnt, and RF models performed the best. These models
were then combined through a weighting system to make predictions, and this resultant
ensemble model performed better at this classification task than any of the individual
models. Additionally, [185] looked at which features play the most important role in the
ML model’s predictions. The authors found that spectral information was most useful,
but that texture and terrain features helped boost the accuracy even more. However, this
method relies on optical imagery, so it depends on relatively cloud-free imagery. More
work would need to be done to help the model generalize to situations where cloud-free
imagery is not available and to distinguish between greenhouse subtypes.
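The weighting scheme in [185] is not reproduced here; the sketch below only illustrates one simple way to combine several classified maps with fixed weights on GEE, with placeholder asset paths, weights, and class values.

```python
import ee

ee.Initialize()

# Classified outputs of the individual models (placeholder asset paths).
cart_map = ee.Image('users/example/cart_classified')
maxent_map = ee.Image('users/example/maxent_classified')
rf_map = ee.Image('users/example/rf_classified')

def greenhouse_mask(classified_img):
    # Binary mask of the assumed greenhouse class value (1).
    return classified_img.eq(1)

# Weighted vote across the three maps; the weights are illustrative.
score = (greenhouse_mask(cart_map).multiply(0.3)
         .add(greenhouse_mask(maxent_map).multiply(0.3))
         .add(greenhouse_mask(rf_map).multiply(0.4)))

# Accept pixels whose weighted vote exceeds half of the total weight.
ensemble_greenhouse = score.gt(0.5).selfMask().rename('greenhouse')
```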
The authors in [186] designed a workflow for mapping urban sprawl over time in
Brazil using a RF on the GEE platform. They used optical RS imagery from the Landsat
and Sentinel platforms, alongside DEM data and found that the cities used for their case
study had built out horizontally instead of densifying vertically. Still, the drivers behind
the urban sprawl need to be investigated further, in addition to how best to incorporate
their maps into the governmental policy decision-making process.
Using different vegetative indices (EVI, Gross Primary Production, etc.) derived from
Landsat and MODIS data, [181] showed that urban sprawl in Shanghai had increased
significantly in the last decade and a half. The spread of suburbs in Shanghai had led to
much less green space over a 15-year period. This is a very impactful area of research that
can be done completely on the GEE platform and replicated across cities around the world.
Produced together with heatmaps of a given city, urban vegetation maps can be used to
pursue environmental justice strategies that can improve equitable access to green spaces
and attempt to reduce extreme temperature disparities (“heat islands”) in cities.
Producing up-to-date land cover maps can be time-consuming and expensive to make.
This is especially true in areas without dense data coverage for common LULC classes.
In [184], the authors combined Landsat 5 and 8 RS imagery, slope from a digital terrain
model (DTM), and GLCM information, and then trained a SVM to output two classification
maps for portions of Rwanda: one for 1987 and the other for 2019. The authors then used
the LandTrendr algorithm to compute LULC changes through time, which allowed them to
produce maps without having dense field observations for validation. They showed that
while water, wetland, and forested areas had remained fairly constant in terms of total area,
urban development has been replacing open land and agricultural areas.

Appendix C.8. Textual Summaries for Wildfires and Burned Area


The authors in [190] proposed a method for identifying fire-induced disturbances
using the LandTrendr and FormaTrend algorithms on the GEE cloud-computing platform.
Various metrics were used to quantify fire disturbances, such as type, magnitude, direction,
and duration. The results showed that the FormaTrend algorithm outperformed the
LandTrendr algorithm in identifying low-severity fire-induced disturbances. Nevertheless,
the LandTrendr algorithm can be useful for generating change metrics that are helpful for
studying post-disturbance dynamics.
To determine the impact of using higher-resolution RS data products, [192] compared
how Landsat and Sentinel optical imagery affected a ML model’s performance in burn area
classification. The authors used Weka clustering output and different spectral and index
information as input into the CART, RF, and SVM models available on GEE. They found that
both Landsat and Sentinel imagery produced much better maps that captured small burn
areas that current maps and fire monitoring products like MODIS were not able to capture,
though Sentinel imagery led to an underestimation in burn area. Additionally, the authors
found that the tree-based algorithms performed comparably to each other but much better
than the SVM model. This study highlighted the importance of analyzing different data
sources and ML models to show their respective contribution to predictive performance.
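The exact spectral inputs used in [192] are not listed above, so the sketch below uses the commonly applied Normalized Burn Ratio (NBR) and its pre/post-fire difference (dNBR) as a generic example of burn-sensitive index inputs; the dates, region, and threshold are illustrative.

```python
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([147.0, -36.5, 148.5, -35.5])  # hypothetical fire extent

def nbr(start, end):
    img = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
           .filterBounds(aoi)
           .filterDate(start, end)
           .median())
    # NBR = (NIR - SWIR2) / (NIR + SWIR2)
    return img.normalizedDifference(['SR_B5', 'SR_B7']).rename('NBR')

pre_fire = nbr('2019-05-01', '2019-06-30')
post_fire = nbr('2019-08-01', '2019-09-30')
dnbr = pre_fire.subtract(post_fire).rename('dNBR')

# A simple (illustrative) threshold on dNBR flags likely burned pixels.
burned = dnbr.gt(0.27).selfMask()
```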
The authors in [193] developed an automated and cloud-based workflow for gener-
ating a training dataset of fire events at a continental level using freely available RS data.
The training dataset was applied to different machine learning algorithms (i.e., RF, NB,
and CART). It was found that the RF outperformed the other algorithms, which was hence
used further to explore the driving factors using variable importance analysis. The results
showed that the most important variables were soil moisture, temperature, and drought.
In [195], the authors used Sentinel-2 data along with two different burn areas and
LULC maps to train different ML classifiers (k-nearest neighbor (KNN), RF, SVM) to map
wildfire damage in Australia. They first used an optimization algorithm to select features
and showed that this improves model performance for each model used. The RF model
with feature selection performed the best and the authors were able to predict for burned
areas in different LULC types, whereas previous studies had focused on producing binary
burned/non-burned maps. However, [195] noted that low resolution LULC maps were
a limiting factor in their analysis, and that future studies could be repeated with higher
resolution ones to improve model performance even further.
The authors in [196] designed a completely cloud-based DL workflow combining
Google Cloud and GEE to classify burn scar areas in Brazil. Using a DNN, the authors
produced a fire burn map that was more accurate than maps produced by MODIS and
the National Institute for Space Research in Brazil. However, perhaps more importantly,
the authors identified the areas that their map disagreed with other maps and why. They
found that the southern areas of the Cerrado were misclassified more often in all three
maps, and that clouds, shadows, and plant regrowth were the main features leading to
misclassification. This type of analysis is important because it can highlight where current
maps fall short while making them interoperable with higher-resolution, more accurate
maps being produced today. Still, [196] said that the number of ground-truth observations
was the limiting factor in their analysis and that model’s performance could be improved
further with more validation data.
The 250 m spatial resolution of products like FireCCI51 leaves out a lot of detail, so
the authors in [191] used CBERS, Gaofen, and Landsat imagery to create a 30 m burned-
area dataset for 2015. The authors first trained a RF on this imagery and set it to output
probabilities instead of class predictions. These probabilities were then used as a starting
point for a pixel-aggregation algorithm that classifies neighboring pixels according to whether they
belong to the burned-area class or not. The authors called this “burned-area shaping” and
the resulting maps for this process were used as training data for an SVM. The resulting
map had good spatial agreement with FireCCI51 but had much higher spatial resolution
with more detailed and accurate boundaries. However, the authors noted that their method
had difficulty recognizing burned areas from recently plowed fields in agricultural areas, so
crop-type masks should be used to remove potential false positives. Additionally, Landsat
data were used for both the data collection and validation stage. Thus, the authors were not
able to assess the suitability of using Landsat imagery for data collection purposes despite
their high accuracy rates. Later on, [194] adapted the exact same processing steps on GEE
to produce a burned area map for the year 2005, illustrating how sharing and storing code
on GEE makes it easy to re-run analyses or adapt them for new use cases.
In order to better interpret the fire severity in terms of on-the-ground fire effects
compared to non-standardized spectral indices, [189] produced a map of composite burn
index (CBI), a frequently used, field-based measure of fire severity. A RF model was built
on the GEE, describing CBI across forested landscapes in North America as a function of
multiple spectral indices and climatic and geographic coordinates. The robust relationships
and the fairly high model skill in most regions suggest the resulting CBI maps may be
beneficial in remote regions where it is expensive and difficult to acquire field measures of
severity (e.g., Alaska and the majority of Canada).

Appendix C.9. Textual Summaries for Heavy Industry/Pollution Monitoring


The authors in [197] used time series data of the Soil Adjusted Total Vegetation In-
dex (SATVI), calculated from Landsat 5 imagery, to track changes and assess vegetation
regrowth on 365 abandoned well pads located across the Colorado Plateau. BFAST (Breaks
for Additive Season and Trend) time-series models were used to fit temporal trends, identi-
fying when vegetation was cleared from the site and the magnitudes and rates of vegetation
change after abandonment. The time series metrics were used to calculate the Relative
Fractional Vegetation Cover (RFVC) of each pad, a measure of post-abandonment vegeta-
tion cover relative to pre-drilling condition. Cover change values were standardized by
measuring with respect to vegetation cover values at nearby reference pixels, undisturbed
by energy development, determined using an automated reference site selection algorithm.
Statistical modeling using linear regression and a RF was performed to identify the environ-
mental and/or management variables most related to RFVC response. Results suggested
that reclamation efforts on abandoned oil and gas pads of the Colorado Plateau had mixed
outcomes. A substantial amount of year-to-year variability in relative fractional vegetation
cover corresponded to moisture conditions assessed using an index of evaporation and
drought (SPEI). Both the time series analysis and statistical modeling were carried out in R.
The authors in [198] presented a mapping study for mining areas in the Brazilian
Amazon using Sentinel-2A images and the CART classifier in GEE. The map was then
exported to ArcGIS, in which the data provided by Brazilian National Department for
Mineral Production—DNPM (license status, mineral type among other information) was
integrated with the mining map. The mapping results were compared to the high-resolution
RapidEye imagery. The area occupied by each mining category was computed, providing
key information for the environmental management of mining activities.
In [202], the authors made use of Landsat imagery and the LandTrendr algorithm
to monitor water accumulation in subsidence areas of past mining in China. First, they
identified permanent versus seasonal water bodies, then used a water index in areas of
known mining to track water changes. The authors incorporated a popular subsidence
simulator that predicted for water accumulation at underground mining sites and showed
that their dataset had good agreement with it. Thus, their processing workflow can be
integrated with the simulator to verify the output. While the authors achieved high
accuracy rates overall, accuracy varied dramatically between different years and between different
stages of water accumulation. The authors noted that more work needed to be done to
increase the robustness of their processing pipeline to more accurately distinguish between
water accumulation at mining sites and flooding and heavy rainfall events.
To monitor mining disturbances at a coalfield in Mongolia, [199] used the LandTrendr
algorithm to analyze Landsat data. The authors designed a fast, efficient method on the
GEE platform to monitor surface mining operations and show that only 26% of promised
reclamation was undertaken at the Shengli Coalfield. However, the authors noted that their
pixel-based classification approach would benefit from a comparison with an object-based
approach (although many object-based classifiers are not available on GEE).
In order to keep track of mines and dams in Brazil, [200] used two different CNNs to
first classify potential mining sites and then to classify their perceived/potential environmental
risk. In this two-phase approach, the authors were able to identify 263 unregistered mines and
designed the CNN to work on variable-sized RS images. This analysis relied on government
data, which may not be available in other locations where mining was taking place. Addition-
ally, since the authors used a DL approach, they had to move their training process from GEE
to Google Colab. Even so, their data were too big for the GPU memory limits.
With GEE JavaScript API, [201] used RF classifiers to produce maps of mine waste
extents with Landsat-8 and Sentinel-1 and Sentinel-2 archives. The simplest method of
mapping mines is through thresholding, where a division between spectral response that
represents mines and non-mine areas can be clearly defined. Thresholding only produces
high accuracy when the spectral response of mines is significantly different from that of the
surrounding non-mine areas. Although the interpreter attempted to collect training data
points that were representative of all of the mine types as well as the variability in the
other classes, more training data may be required to better distinguish classes as similar as
outcrops/rock, mines, and urban areas. The RF classification algorithm computes Mean
Decrease in Accuracy (MDA), which is commonly used to assess variable importance. No
functions existed within GEE (yet) to analyze the importance of variables in a RF classifier;
therefore, this was completed using extracted training data values in R.
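As a Python analogue of the variable-importance analysis that [201] carried out in R on exported training samples, the sketch below computes permutation importance with scikit-learn; the CSV file, column names, and model settings are hypothetical, and permutation importance is used here as a stand-in for MDA.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Training samples previously exported from GEE (e.g., via Export.table.toDrive);
# the file name and column names are placeholders.
df = pd.read_csv('mine_training_samples.csv')
feature_cols = [c for c in df.columns if c != 'class']
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df['class'], test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# Permutation importance is a model-agnostic alternative to MDA: it measures
# the drop in test accuracy when each predictor is randomly shuffled.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(feature_cols, result.importances_mean),
                          key=lambda x: -x[1]):
    print(f'{name}: {score:.4f}')
```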
To test the efficacy of different ML algorithms for identifying waste and dump sites in
optical imagery, [203] optimized the parameters for the CART, RF, and SVM algorithms
available on GEE. The authors found that the RF algorithm was by far the most accurate
even when using several optimization schemes for each model. However, the authors
noted that a lack of elevation data in their processing pipeline led to classification errors,
and that more work could be done using DL methods to identify waste and dump piles in
the future.

Appendix C.10. Textual Summaries for Climate and Meteorology


In [204], MODIS satellite observations from 2000 to 2015 were analyzed using GEE to
derive global snow-free land surface albedo estimations and trends at a 500 m resolution.
The bulk of albedo trends can be attributed to rainfall, changes in agricultural practices
and snow cover duration. This study confirmed that at local scale, albedo changes were
consistent with land cover/use changes that were driven by anthropogenic activities such
as deforestation, irrigation, and urbanization.
The authors in [210] proposed a downscaling framework (from 25 km to 1 km) for
TRMM precipitation products by integrating GEE and Google Colab. Three ML methods,
including a Gradient Boosting Regressor (GBR), a Support Vector Regressor (SVR), and an
ANN were used to establish the relationship between precipitation and four environmental
variables, including elevation, longitude, latitude, and one of the three vegetation indices
(NDVI, EVI, LAI). The StandardScaler algorithm of scikit-learn was used to standardize
variables using their means and standard deviations to eliminate the effects of different scaling.
The GridSearchCV algorithm with a 10-fold cross-validation (GSCV) splitting strategy was
used to identify the best hyper-parameter values for each combination of machine learning method and vegetation index.
The monthly precipitation maps were derived from the annual downscaled precipitation by
disaggregation. According to validation in the Great Mekong upstream region, the ANN
method yielded the best performance when simulating the annual TRMM precipitation. The
most sensitive vegetation index for downscaling TRMM was LAI, followed by EVI.
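A minimal scikit-learn sketch of the model-selection setup described for [210], combining StandardScaler, GridSearchCV with 10-fold cross-validation, and the three regressor families mentioned above; the file, columns, and hyper-parameter grids are placeholders, not the published search space.

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

# Table of coarse-resolution precipitation with predictors (elevation, lon, lat,
# and one vegetation index), e.g., exported from GEE; names are placeholders.
df = pd.read_csv('trmm_predictors.csv')
X = df[['elevation', 'longitude', 'latitude', 'ndvi']]
y = df['precipitation']

models = {
    'GBR': (GradientBoostingRegressor(), {'model__n_estimators': [100, 300]}),
    'SVR': (SVR(kernel='rbf'), {'model__C': [1, 10, 100]}),
    'ANN': (MLPRegressor(max_iter=2000), {'model__hidden_layer_sizes': [(50,), (100,)]}),
}

for name, (estimator, grid) in models.items():
    # Scaling and regression are chained so that CV folds are scaled consistently.
    pipe = Pipeline([('scale', StandardScaler()), ('model', estimator)])
    search = GridSearchCV(pipe, grid, cv=10, scoring='r2')
    search.fit(X, y)
    print(name, search.best_params_, round(search.best_score_, 3))
```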
The authors in [205] performed major-axis regression on Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI data in pairs
(7 ETM+/8 OLI, 7 ETM+/2 MSI, and 8 OLI/2 MSI) across the entire coterminous United
States and were able to determine cross-platform correction coefficients for the Blue, Green,
Red, NIR, and SWIR bands present in all three satellites. The authors then validated their
methodology and correction coefficients by analyzing these same satellite platforms across
Europe. While [205] did not create an actual integrated dataset for use on the GEE platform,
their research was the first step to building such a dataset and making sure that it is of
high quality.
The authors in [206] implemented a cloud-based workflow and compared that to the
traditional method of using SAGA GIS for producing local climate zone city maps based on
data like WUDAPT. The authors showed that the traditional method was more accurate on
average than the GEE method when using only the datasets available to WUDAPT and when
trying to transfer an urban morphology classifier between individual cities. However, using
GEE allowed the authors to aggregate information from multiple cities in the same climate
zone and for the RF model they used to be trained on more RS data and derived indices that
were not available in the WUDAPT dataset. These improvements boosted OA scores in urban
topology classification. Thus, while the GEE and more traditional classification methods are
not directly comparable, the cloud-based method outlined by [206] can be used to complement
research being done in urban topology studies.
The authors in [207] investigated the impacts of landscape changes on LST intensity
(LSTI) in a tropical mountain city in Sri Lanka. Annual median temperatures from three
years were extracted from Landsat data through the GEE interface. The SVM algorithm was
used to conduct LULC mapping, which was then used to calculate the fractions of built-up,
forested, and agricultural land based on urban–rural zone analysis. The study showed that
rapid development was spreading towards rural zones, and the fraction of built-up land
influenced the increase in annual mean LST. It was recommended that having a mixture of
land-use types would considerably control the increasing LST in the study area.
The authors in [208] presented a method to obtain high-resolution sea surface salinity
(SSS) and temperature (SST) by using raw satellite data, i.e., Sentinel-2 Level 1-C Top of
Atmosphere reflectance data. A deep NN was built to link band information with in
situ data, which was obtained from the Copernicus Marine In Situ platform. The deep NN
providing the best results was found to be composed of 20 hidden layers with 43 nodes in
each layer. Shortcuts were used in the network architecture to avoid the so-called vanishing
gradient problem, providing an improved performance compared with the equivalent feed-
forward architecture. Accurate salinity values were estimated without using temperature as
input in the network. However, a clear dependency on temperature ranges was observed, with
less accurate estimations for locations where ocean temperature falls below 10 °C. The NN
presented in this paper outperformed classical architectures tested for regression problems.
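A hedged Keras sketch of a densely connected network with shortcut connections, roughly matching the reported architecture (20 hidden layers of 43 nodes); the input dimensionality, loss, and optimizer are assumptions, and the code is not the authors' implementation.

```python
import tensorflow as tf

def residual_mlp(n_inputs=13, n_layers=20, width=43):
    # Residual multilayer perceptron: each block adds its input back to its
    # output to mitigate vanishing gradients in a deep stack of dense layers.
    inputs = tf.keras.Input(shape=(n_inputs,))
    x = tf.keras.layers.Dense(width, activation='relu')(inputs)
    for _ in range(n_layers - 1):
        h = tf.keras.layers.Dense(width, activation='relu')(x)
        x = tf.keras.layers.Add()([x, h])  # shortcut connection
    outputs = tf.keras.layers.Dense(1)(x)  # e.g., sea surface salinity
    return tf.keras.Model(inputs, outputs)

model = residual_mlp()
model.compile(optimizer='adam', loss='mse')
# model.fit(X_train, y_train, epochs=100, validation_split=0.2)  # placeholder data
```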
To study this mechanism further, the authors in [209] used a LSTM and compared the
performance to a RF for carbon fluxes in global forests. They combined bioclimatic and
forest age data with Landsat imagery and MODIS atmospheric reflectance maps as input
data to their models. The authors showed that previous seasons’ water and temperature
records (specifically from the spring) affected the ways forests release carbon in the current
season. Still, the LSTM model used in [209] struggled when it was trained in one site or one
forest type and applied to another. For instance, their ML and DL models did not perform
well in the Tropics and had varying performance predicting carbon flux for evergreen
and deciduous forests. This lack of generalizability was indicative of the way that carbon
fluxes vary from forest to forest around the world, but also of the fact that their dataset was biased
towards older, undisturbed forests, which led the LSTM to underperform for those classes.

Appendix C.11. Textual Summaries for Disaster Management


The authors in [211] proposed a new method for mapping landslides in Nepal using
RF. Landsat images acquired between 2012 and 2016 were processed using GEE, which were
then used to compute spectral indices and derive texture information. In addition, DEM
data were used to characterize landscape patterns. An RF model was constructed based on
spectral indices, texture information, and landscape patterns. The RF model was applied to
Central Nepal to identify landslides with reasonable accuracy. There are several limitations to
this study. First, GEE was used as a preprocessing platform. Some analyses were conducted
outside GEE. Second, the study area was rather limited with only one Landsat scene. Last but
not least, the accuracy varied substantially depending on the distribution and availability of
training samples.
The authors in [212] analyzed vegetation, thermal, moisture, and climate datasets,
along with surface drainage records, using a RF model on the GEE platform to create surface
drainage maps. In addition, the authors used optical and SAR imagery and completed a
relative variable importance analysis with the RF model. They [212] found that surface
drainage maps were sensitive to RS data scale while identifying soil properties and land
surface temperature as important features in their predictions. However, their method was
not able to predict for all land class types equally well, and the authors noted that their
processing method may not work in other areas due to the lack of data such as government
surface drainage permit records.
The authors in [213] took advantage of the easy-to-find data and freely available compute
on GEE in order to produce flood maps in Bangladesh. First, they used Landsat data from pre-
flood imagery and trained a CART model to make a land-use map for the country. Then, the
authors analyzed Sentinel imagery with a geographic object-based image analysis (GEOBIA)
model to produce water vs. non-water classification maps. These maps were then combined
to show which land-use types in different parts of Bangladesh are flooded and for how
long. While the authors achieved high accuracy rates, their method struggled to differentiate
flooded crop fields from inundated areas. This could be solved by overlaying a crop-use map
to remove mislabeled areas. However, it is important to note that using GEE for real-time
hazard response is not currently advisable, given the lag between when RS
imagery is collected and when it is subsequently uploaded to the platform.
The authors in [214] presented a case study for the 2018 Kerala flood in India. They
demonstrated how GEE can be used to process large optical and SAR RS datasets, in
conjunction with field and precipitation data, using image processing techniques to produce
high-resolution flood maps over a large area. This application/processing flow was called
GEE4Flood and processes large datasets quickly to produce flood maps. However, the
authors noted that several challenges remain in making their algorithm operational: the
input data need to have a reference image of the area pre-flood that is cloud free, and Otsu’s
thresholding algorithm needs to have classes (flood versus non-flooded) that are relatively
equal in frequency. This may very well not be the case for imagery that is inundated.
Additionally, it is difficult to get in-situ data from flooded regions as the flood is happening,
making it difficult to validate their results. Lastly, the GEE platform has a significant delay
in uploading the most recent RS imagery, up to several days later, making this unsuitable
for real-time flood forecasting.
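To make the Otsu step concrete, the sketch below pulls a band histogram from GEE and computes an Otsu threshold client-side with NumPy; the SAR collection, band, region, and the assumption that low VV backscatter indicates water are illustrative choices, not the GEE4Flood implementation.

```python
import ee
import numpy as np

ee.Initialize()

aoi = ee.Geometry.Rectangle([76.0, 9.0, 77.5, 10.5])  # hypothetical flood region
sar = (ee.ImageCollection('COPERNICUS/S1_GRD')
       .filterBounds(aoi)
       .filterDate('2018-08-08', '2018-08-20')
       .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
       .select('VV')
       .mean())

hist = sar.reduceRegion(reducer=ee.Reducer.histogram(255),
                        geometry=aoi, scale=30, maxPixels=1e9).get('VV').getInfo()

def otsu(counts, means):
    # Maximize between-class variance over candidate thresholds (bucket means).
    counts, means = np.asarray(counts, float), np.asarray(means, float)
    total, sum_all = counts.sum(), (counts * means).sum()
    best_t, best_var, w_b, sum_b = means[0], -1.0, 0.0, 0.0
    for i in range(len(counts) - 1):
        w_b += counts[i]
        sum_b += counts[i] * means[i]
        w_f = total - w_b
        if w_b == 0 or w_f == 0:
            continue
        var_between = w_b * w_f * ((sum_b / w_b) - ((sum_all - sum_b) / w_f)) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, means[i]
    return best_t

threshold = otsu(hist['histogram'], hist['bucketMeans'])
flood = sar.lt(threshold).selfMask()  # low backscatter assumed to indicate water
```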
To assess the suitability of GEE for disaster recovery, the authors in [215] used a RF
model trained on Landsat imagery to do change detection on pre- and post-disaster areas
in the Philippines. However, the authors found that a lack of cloud-free VHR imagery led
to lesser model performance, especially in complex urban environments in the aftermath
of a hurricane. In the future, SAR imagery and DL methods could be used to increase
model accuracy.
Using RGB images as input, the authors in [216] proposed an automatic building
detection method to find buildings and their irregularities in pre- and post-disaster (sub-)
meter resolution images. Firstly, a knowledge-based method, which utilized shadow
information, was combined with an edge-based method that uses texture information, to
find building maps in temporal pre-disaster images. Then, a two-level fusion that used
spectral and georeferenced features was applied to find building irregularities in post-
disaster images. Building facades and rooftops were also considered in the oblique imagery.
This method was implemented on the GEE platform and evaluated using Hurricane Nate
(2017) and Hurricane Harvey (2017) oblique images. Temporal pre-disaster data were
provided by NAIP, which acquired aerial imagery in 1-m resolution during the agricultural
growing seasons. NOAA provided post-disaster data in nadir and different oblique angles
varying about 30 degrees. Some post-disaster images were manually uploaded to GEE
servers for evaluation.

Appendix C.12. Textual Summaries for Soil


To determine how different datasets and ML models perform in predicting soil organic
matter, the authors in [222] compared an ANN, RF, and SVR model with MODIS, Sentinel-
2A, and DEM data as input. They found that for all models, Sentinel-2A data were better
for model performance due to its higher spectral and spatial resolution. Among the models,
the RF performed the best, making the best combination the RF trained on Sentinel-2A
data. The authors also looked at which input bands were correlated with better predictive
performance and found that indices (e.g., NDVI, NDWI) were not correlated in either
dataset, while the elevation, SWIR, RGB bands were. This type of analysis is important
because it addresses not only which dataset is important, but which data are important to
include based on data availability. However, the authors caution that more work needs
to be done to make their model more generalizable and robust. This could take the form
of incorporating different types of data or data from outside their study region so that the
model has more data and more variation to learn from.
The authors in [217] produced an early soil mapping study on the GEE platform in
2015. They used a Rifle Serial Classifier to test out soil-type classification and a CART for
soil organic matter percentage regression over the entire contiguous United States. Their
methods at the time, while poor, matched other comparable studies in digital soil mapping,
meaning that GEE was not a limitation on performance. The cloud computing platform
sped up their processing time from 1.5–3.5 h down to 2 min. However, despite the freely
available compute, the authors noted several limitations with the platform. First, a major
limitation of the platform (then still in its early stages) was a lack of processing methods
like kriging and uncertainty analysis. Second, the authors came up against processing
limits when using a large number of field samples. According to many other authors
included in this review, these two issues are still some of the top cited limitations on the
platform. Lastly, the authors noted that while a researcher can make data and code scripts
private, ultimately the analysis is stored on a remote server so GEE may not be suitable for
analyzing, storing, or transmitting sensitive data.
The authors in [218] explored GEE’s potential to make a global soil salinity map
based on field data and Landsat thermal infrared imagery. GEE allowed the authors to
run their processing steps quickly, though creating thermal mosaics on the platform still
took hours. However, because the field samples dataset the authors used was sparse,
they achieved accuracy rates between 67 and 70%. Visual analysis of their results showed
that some regions with low field samples were correctly classified on the regional scale,
while others were considerably overestimated. In their conclusion, the authors noted that
many researchers may be hesitant to use the platform since model and processing function
implementations may not be known and could change without the researcher knowing.
Using field observations, DEM data, and Landsat imagery, the authors in [219] sought
to address these issues by mapping different soil types and soil attributes across a large
region in Brazil using the GEE platform. The authors were able to show that elevation,
climate data, as well as the SWIR2, NIR, and Blue bands from Landsat imagery are the
most important factors in determining soil types, even at different soil depths. However,
the authors noted that more soil observations were needed to increase the accuracy of their
method and would aid further digital soil mapping studies.
The authors in [221] were able to produce a global, high-resolution soil moisture map
on GEE, by using optical, thermal, and SAR imagery in addition to DEM data. The authors
used a GBRT model to train on in-situ observations paired with RS imagery to then predict
soil moisture in other locations. After running a relative variable importance analysis,
the authors found that optical RS imagery and land-cover information played the most
important roles in determining soil moisture content, but that SAR imagery and soil data
also contributed significantly to the model's overall performance. Their finding echoes the
results of other studies ([95,161,182]) that the combination of optical and SAR data improves
predictive outcomes. The entire processing pipeline is now an open-source Python package
(PYSMM). However, the authors had issues with the GEE platform. The model needed to
be trained offline due to issues with flexibility and design, and the validation soil moisture
observation dataset was not available on the platform. The authors noted that sparse or
clustered observations led to model inaccuracies, which was a call to both collect more soil
moisture observation data but also to upload more of it (and other diverse types of data) to
the GEE platform.
The authors in [220] explored the effects of spatial aggregation of climatic, biotic,
topographic and soil variables on national estimates of litter and soil C stocks and charac-
terized the spatial distribution of litter and soil C stocks in the conterminous United States
(CONUS). Litter and soil variables were measured on permanent sample plots from the
National Forest Inventory (NFI) from 2000 to 2011. These data were used with vegetation
phenology data estimated from Landsat 7 imagery and raster data describing environmen-
tal variables for the entire CONUS to predict litter and soil carbon stocks. Specifically, the
maximum of NDVI values from the growing season and forty categorical and continuous
environmental variables compiled from various data sources and resolutions with ArcGIS
were selected as predictor variables. Three supervised ML methods (i.e., RF, quantile regres-
sion forest (QRF) and KNN) were chosen to model the distribution of litter and soil carbon
stocks. All analyses were conducted with R. The results suggested that the RF and QRF
prediction models performed better than KNN models although results across the three
methods were similar. All modeling approaches performed better for soil compared to litter
layers and the spatial pattern of association between litter, soil carbon, and environmental
covariates observed from the RF and QRF models may reflect spatial patterns in litter
decomposition, soil chemistry, and plant and microbial communities.

Appendix C.13. Textual Summaries for Cloud Detection and Masking


Researchers in [223] treated cloud detection as a change detection problem across time
using a kernel ridge regression model. This allowed them to detect nonlinear features
that are easier to identify in RS time series imagery. The authors tested their algorithm
on Landsat and SPOT imagery and showed that it performed better than Fmask while
obtaining less false positives during classification. Additionally, the authors in [223]
implemented their model directly on GEE so that it can be run alongside other preprocessing
tasks without the need to switch to an outside cloud or offline coding environment.
Cloud detection methods for optical satellite images can be divided into monotem-
poral single scene and multitemporal approaches. Single scene approaches use only the
information from a given image to build the cloud mask, while multitemporal approaches
also exploit the information of previously acquired images, collocated over the same area,
to improve the cloud detection accuracy. Multitemporal methods are computationally
demanding, and most of the multitemporal cloud detection schemes cast the problem as a
change detection problem. The authors in [224] implemented a multitemporal cloud detec-
tion method using the GEE Python API, which was applied to the Landsat-8 imagery and
validated over a large collection of manually labeled cloud masks from the Biome dataset.
The approach was based on a simple multitemporal background modeling algorithm, in
which k-means clustering was applied to the difference image between the cloudy image
(target) and the cloud-free estimated background (reference). The obtained clusters were
then labeled as cloudy or cloud-free areas by applying a set of thresholds on the difference
intensity and on the reflectance of the representative clusters. This approach was found
to outperform single-scene threshold-based cloud detection approaches such as FMask
(Zhu et al. 2015). More specifically, linear and nonlinear least squares regression algorithms
were proposed to minimize both the prediction and the estimation error simultaneously.
Significant differences in the image of interest with respect to the estimated background
were identified as clouds. The use of kernel methods allowed the generalization of the al-
gorithm to account for higher-order (nonlinear) feature relations. The method was tested in
a dataset with 5-day revisit time series from SPOT-4 at high resolution and with Landsat-8
time series.
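A minimal sketch of the multitemporal idea described for [224]: difference a possibly cloudy target scene against a cloud-free background estimate and cluster the difference image; the scenes, dates, and cluster count are placeholders, and the subsequent labeling of clusters via thresholds is omitted.

```python
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([-105.5, 39.5, -104.5, 40.5])  # hypothetical tile
col = (ee.ImageCollection('LANDSAT/LC08/C02/T1_TOA')
       .filterBounds(aoi)
       .select(['B2', 'B3', 'B4', 'B5', 'B6']))

background = col.filterDate('2021-05-01', '2021-08-31').median()  # reference
target = col.filterDate('2021-09-01', '2021-09-16').median()      # possibly cloudy

# Cluster the target-minus-background difference; clusters with large positive
# reflectance differences would then be labeled as cloud (labeling not shown).
difference = target.subtract(background)
training = difference.sample(region=aoi, scale=60, numPixels=5000, seed=7)
clusterer = ee.Clusterer.wekaKMeans(4).train(training)
clusters = difference.cluster(clusterer)
```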
A CNN model, called DeepGEE-CD, was built in [225] to detect clouds in RS imagery
directly on the GEE platform. First, the authors developed and trained the CNN locally
and then uploaded the weights to GEE. They then implemented most of the layers in
the network, with the exception of a few of the more complex convolutional layers,
which were too complicated to be coded directly on the GEE platform. This CNN can run
inference directly in the cloud. In addition, the authors made the model flexible, able to
handle RS imagery of varying input sizes. The CNN gets comparable performance to the
Fmask algorithm, but without the additional information in the form of physical rules that
Fmask needs to work well.
To explore how CV algorithms and ML models can be used together on GEE, the
authors in [226] combined the existing Cloud-Score algorithm with an SVM to detect clouds
in imagery covering Amazon tropical forests, Hainan Island, and Sri Lanka. The
Cloud-Score algorithm was first used to mask the input RS imagery, and its output then served
as training input for the SVM. This combined process achieved much higher accuracy than the
other CV algorithms tested for cloud detection, with considerably lower error rates.
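A hedged sketch of this general pattern is given below: GEE's built-in simple cloud score produces provisional labels that are then used to train an SVM with the GEE Python API. The region, date range, score threshold, and sample size are illustrative assumptions, not the settings used in [226].

```python
# Sketch: built-in Landsat cloud score used to generate provisional labels,
# which then train an SVM classifier. Thresholds and AOI are assumptions.
import ee

ee.Initialize()

region = ee.Geometry.Rectangle([110.0, 18.5, 110.5, 19.0])   # hypothetical AOI
toa = ee.Image(ee.ImageCollection('LANDSAT/LC08/C02/T1_TOA')
               .filterBounds(region)
               .filterDate('2021-01-01', '2021-12-31')
               .first())

# GEE's built-in cloud-likelihood score for Landsat TOA imagery (adds a 'cloud' band, 0-100).
scored = ee.Algorithms.Landsat.simpleCloudScore(toa)
cloud_label = scored.select('cloud').gt(30).rename('label')   # provisional labels

bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7']
training = (toa.select(bands).addBands(cloud_label)
            .sample(region=region, scale=30, numPixels=3000))

svm = ee.Classifier.libsvm().train(training, 'label', bands)
cloud_map = toa.select(bands).classify(svm)
```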
The authors in [227] implemented their cloud removal DL model directly in GEE. Their
model, DeepGEE-S2CR, is a cloud-optimized version of the DSen2-CR model presented
in [228] and fuses co-registered Sentinel-1 and Sentinel-2 images from the SEN12MS-CR
dataset. The authors first trained their CNN locally and then uploaded the weights to GEE.
They then designed the network using the GEE API, implementing layers and custom cost
functions so that the CNN fits within GEE's memory constraints. The authors showed that their
model differed only slightly in RMSE and produced results very similar to those of the larger and
more compute-intensive DSen2-CR. The CNN can be run directly on GEE without the need
to download, store, and process data locally.

Appendix C.14. Textual Summaries for Wildlife and Animal Studies


UASs were explored in [229] for identifying Ny. darlingi breeding sites in Amazonian Peru
using high-resolution imagery (~0.02 m/pixel) and its multispectral profile.
Both RGB and multispectral imagery were collected simultaneously, and the addition
of multispectral bands was found to add critical information for differentiating the water
bodies. All multispectral orthomosaics were uploaded to GEE assets and a RF classification
was performed. The findings support the use of low-cost UASs and the GEE platform to
accurately distinguish the spectral signatures of water bodies that harbor Ny. darlingi larvae
from those that do not, offering new ways to control
and survey malaria in affected settings. The portability of UASs allows investigators to
navigate moderately hostile and complex environments and to generate maps with a higher
resolution than those available from satellites. However, transferring imagery
from any physical storage unit to GEE requires a stable internet connection, and methods
are still needed to speed up image transfer and processing.
A set of freely available environmental variables (i.e., habitat information from RS
observations and climatic information from weather stations) was used in [230] to assess
and predict roadkill risk. For each of seven medium-large mammals, the authors performed
binomial logistic regressions relating roadkill presence-absence in road sections
across the survey dates to the collection of environmental variables (land cover classes,
forest cover, distance to rivers, temperature, precipitation, and NDVI) and to the temporal
and spatial trends of overall roadkill. The intrinsic spatial and temporal roadkill risk were
the most important variables, followed by land cover, climate, and NDVI. The modeling
framework, which couples RS information, climate data, traffic volume, and biodiversity
metrics, may provide more accurate roadkill risk predictions in near real time and
potentially at the global scale.
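As a simple illustration of the statistical core of such an analysis, the following sketch fits a binomial logistic regression relating roadkill presence-absence to a few environmental covariates; the input table, column names, and evaluation metric are hypothetical and do not reproduce the models in [230].

```python
# Illustrative binomial logistic regression for roadkill presence-absence.
# The CSV file and its column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv('road_sections.csv')   # one row per road section and survey date
covariates = ['forest_cover', 'dist_to_river', 'temperature', 'precipitation', 'ndvi']
X = df[covariates]
y = df['roadkill_present']              # 0/1 presence-absence

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predicted roadkill probability per road section, plus a simple skill score.
probs = model.predict_proba(X_test)[:, 1]
print('AUC:', roc_auc_score(y_test, probs))
```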
A semi-automated framework was developed in [231] for monitoring large, complex
wildlife aggregations using drone-acquired imagery over four large and complex waterbird
colonies. The approach applied a RF classifier to high-resolution
drone imagery to identify nests, followed by predictive modeling (k-fold estimation) to
estimate nest counts from the mapped nest area. Arithmetic and textural metrics from
the red, green, and blue channels in the drone data were calculated and used as predictor
variables in the RF classification, which helped capture more of the spatial and spectral
variation in target features. The predictor variable calculation and nest mapping routines
using RF classification were implemented in GEE, while all statistical analyses, including nest
counting and accuracy assessment, were performed in the R programming environment.
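The GEE portion of such a workflow might resemble the following hedged sketch, in which arithmetic and GLCM texture metrics derived from RGB drone bands feed a random forest classifier; the asset IDs, band names, scales, and training points are assumptions rather than the actual configuration of [231].

```python
# Sketch: RGB drone orthomosaic -> arithmetic + texture predictors -> RF nest map.
# Asset IDs, band names, and the 5 cm sampling scale are placeholder assumptions.
import ee

ee.Initialize()

drone = ee.Image('users/example/colony_orthomosaic')          # hypothetical asset
rgb = drone.select(['R', 'G', 'B'])

# Arithmetic metric (mean brightness) and GLCM texture on an 8-bit band.
brightness = rgb.reduce(ee.Reducer.mean()).rename('brightness')
texture = rgb.select('G').toUint8().glcmTexture(size=3)        # adds G_contrast, etc.
predictors = rgb.addBands(brightness).addBands(texture.select('G_contrast'))

# Hypothetical labeled points with a 'nest' property (1 = nest, 0 = background).
points = ee.FeatureCollection('users/example/nest_training_points')
training = predictors.sampleRegions(collection=points, properties=['nest'], scale=0.05)

rf = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    training, 'nest', predictors.bandNames())
nest_map = predictors.classify(rf)
```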
Using Landsat RS imagery, climate variables, and government environmental data,
the authors in [232] analyzed Pine Processionary Moth outbreaks in pine forests in southern
Spain. The authors first used a KNN to determine which features, drawn from various
vegetation indices and environmental variables, were most relevant. Then, after choosing a representative
subset of their data based on the KNN's output, they used a RF to predict pest outbreaks
from ground-truth defoliation data. The authors found that minimum temperatures in
February and the precipitation patterns for each season were the best predictors of pest
outbreaks, followed by vegetation indices. While access to medium-resolution
imagery helped the authors map pest outbreaks in pine forests over a large area of Spain,
they noted that more work should be done to collect additional ground-truth data and to explore
higher-resolution data products such as those from the Sentinel satellites.

Appendix C.15. Textual Summaries for Archaeology


The potential role of GEE in the future of archaeological research was demonstrated
in [233] through two case studies. WorldView-2 satellite imagery with eight spectral bands
and a spatial resolution of 1.84 m provided the basis for analysis in
both cases. The first case used a RF classifier in GEE to automatically identify specific
archaeological features across the landscape of the archaeologically rich Faynan region of
southern Jordan. The second case used the Canny edge-detection algorithm in GEE for
automatic vectorization of archaeological sites. The authors noted that the vectorization
was not appropriate for detailed mapping at a subsite scale unless the results were modified
by a smoothing function; however, at a site-wide or regional scale, the results can
successfully identify the main features in the landscape.
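For readers unfamiliar with this capability, the following minimal sketch shows how GEE's built-in Canny edge detector can be applied to a single band and the resulting edges converted to vectors; the asset ID, band name, threshold, and sigma are placeholder assumptions rather than the parameters used in [233].

```python
# Sketch: Canny edge detection on one band, followed by vectorization.
# Asset ID, band name, threshold, sigma, and scale are placeholder assumptions.
import ee

ee.Initialize()

wv2 = ee.Image('users/example/worldview2_scene')   # hypothetical WorldView-2 asset
band = wv2.select('N')                              # e.g., a near-infrared band

edges = ee.Algorithms.CannyEdgeDetector(image=band, threshold=0.7, sigma=1.0)

# Vectorize the detected edges for use in downstream GIS workflows.
vectors = edges.selfMask().reduceToVectors(
    geometry=band.geometry(), scale=2, geometryType='polygon', maxPixels=1e9)
```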
Drone imagery and GEE were used in [234] to detect potsherds in the field, in the hope
of speeding up this process. The authors trained a CART, a RF, and an SVM on the drone imagery,
but only the RF model produced adequate results. They tested their workflow
in two separate locations in Greece. In their processing pipeline, the authors set the RF
model to output probabilities of potsherd presence within part of a drone image. They then
iterated over the data three separate times, subjectively determining a threshold
at every iteration to suppress false positives. This is generally bad practice in ML
research because it means that humans are actively changing the results of the analysis
before releasing them. Perhaps most importantly, the authors vectorized their results at the
end of the analysis so that other researchers can use them for visualization or classification
tasks. This points to an urgent need in EO and ML research: more studies should attempt to
vectorize their data instead of producing only binary or multi-class classification maps. However,
the analysis depends on having an internet connection to upload, process, and classify
data with GEE in the field, which is not always possible and may limit the future utility
of this work. The authors also mention GEE's data and compute limits as a main
limitation of their analysis; for example, every image uploaded to GEE (at the time of that
paper's release) was limited to 10 GB. Because the authors used sub-centimeter drone imagery, they had
to downsize each image before uploading it, resulting in a loss of resolution.
Optical and SAR data on GEE were used in [235] to create a classifier capable of
outputting the likelihood that a mounded site is present in a given region of the Cholistan
Desert in Pakistan. Fieldwork there is difficult because the heat and remoteness can make
the area unsafe, so it is valuable that the authors were able to use a
RF model to indicate where likely mound sites are for further analysis. However, the authors
introduced some subjectivity by tweaking the probability threshold for the mound/no-mound
boundary. This was necessary because of a lack of high-quality validation data, which also
makes it difficult to measure the accuracy of their process.
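Both archaeology studies rely on thresholding probabilistic classifier output rather than hard classifications, a pattern that can be sketched in the GEE Python API as follows; the inputs, property names, and the 0.6 threshold are illustrative assumptions only.

```python
# Sketch: random forest in probabilistic output mode plus a user-chosen threshold.
# The predictor stack, labeled points, and threshold value are assumptions.
import ee

ee.Initialize()

predictors = ee.Image('users/example/stacked_predictors')        # hypothetical stack
samples = ee.FeatureCollection('users/example/labeled_points')   # property 'target' in {0, 1}

training = predictors.sampleRegions(collection=samples, properties=['target'], scale=10)

rf = (ee.Classifier.smileRandomForest(numberOfTrees=200)
      .setOutputMode('PROBABILITY')       # probabilistic output instead of hard labels
      .train(training, 'target', predictors.bandNames()))

probability = predictors.classify(rf)     # per-pixel probabilistic map
candidate_sites = probability.gt(0.6)     # user-chosen threshold (0.6 is arbitrary here)
```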

Appendix C.16. Textual Summaries for Coastline Monitoring


An automated method was proposed in [236] to extract
shorelines from Landsat and Sentinel satellite imagery. The accuracy of this method was
assessed for the Sand Motor mega-scale nourishment by comparing the Satellite Derived
Shorelines (SDS) to topographic surveys. The NDWI grayscale image was classified into
a binary water-land image using an unsupervised grayscale classification method, and a
region growing algorithm was then applied to cluster all pixels identified as water into a
coherent water mask. The SDS coordinates were smoothed using a 1D Gaussian smoothing
operation to obtain a gradual shoreline. The results showed that the average accuracy of
the SDS for the ideal case of cloud- and wave-free images of the Sand Motor was 1 m, well
within the pixel resolution. The accuracy decreased in the presence of clouds, waves, sensor
corrections, and georeferencing errors. The most important driver of inaccuracy is cloud
cover, which hampers the detection of an SDS and causes large seaward deviations on the
order of 200 m, followed by the presence of waves, which causes deviations of about 40 m.
A seaward bias of the SDS is always present because all drivers of inaccuracy introduce a
seaward shift. Surprisingly, the pansharpening method, which is intended to increase the
image pixel resolution, reduces the accuracy by about a pixel at a sandy shoreline. These
inaccuracies can largely be overcome by creating composite images with a moving-average
time window, which results in a continuous dataset with subpixel precision (relative to pixel
sizes of 10–30 m, depending on the satellite mission).
The capability of satellite RS to resolve the variability and trends in sandy shoreline
positions at differing temporal scales was evaluated in [237]. The authors combined Landsat
5/7/8 and Sentinel-2 image datasets to extract time series of shoreline change at five
long-term monitoring sites across three continents. The images were first preprocessed by
applying panchromatic image sharpening and down-sampling. The sub-pixel shoreline
extraction algorithm consisted of three steps: (1) image classification by a NN classifier
into the four classes of 'sand', 'water', 'white-water', and 'other land features'; (2) sub-pixel
resolution border segmentation with the aid of histogram thresholding of the MNDWI; and
(3) tidal correction. The observed typical horizontal errors varied between RMSEs of
7.3 m and 12.7 m, indicating that pixel size is not the main source of error when extracting
instantaneous shorelines from satellite imagery. Semi-variogram analysis
revealed that the presently available satellite imagery can be used to resolve typical shoreline
variability at time scales of around 6 months and longer. Event-scale shoreline changes (e.g., rapid
storm-induced shoreline retreat and a major sand nourishment) may also be captured.
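The water-index step shared by these coastline studies can be sketched as follows: the MNDWI is computed from the green and shortwave-infrared bands and thresholded into a binary water-land image, from which edge pixels approximate the shoreline. The region, dates, and the zero threshold are illustrative defaults rather than the tuned values used in [236–238].

```python
# Sketch: MNDWI-based water-land separation and a crude shoreline-edge mask.
# AOI, dates, and the 0.0 threshold are illustrative assumptions.
import ee

ee.Initialize()

region = ee.Geometry.Rectangle([4.1, 51.9, 4.3, 52.1])   # hypothetical coastal AOI
s2 = (ee.ImageCollection('COPERNICUS/S2_HARMONIZED')
      .filterBounds(region)
      .filterDate('2021-06-01', '2021-09-01')
      .median())

# MNDWI = (Green - SWIR1) / (Green + SWIR1); B3 and B11 for Sentinel-2.
mndwi = s2.normalizedDifference(['B3', 'B11']).rename('MNDWI')

water = mndwi.gt(0.0)                                              # binary water-land image
shoreline = water.subtract(water.focalMin(1, 'square', 'pixels'))  # water-edge pixels
```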
Using Landsat images on the GEE platform, a method was proposed in [238] to
map continuous changes in coastlines and tidal flats in the Zhoushan Archipelago during
1985–2017. The workflow consists of (1) building the full time series of MNDWI at the
pixel level, (2) performing a temporal segmentation using a binary segmentation algorithm
and deriving the corresponding temporal segments, (3) classifying the coastal cover types
(i.e., water, tidal flats, and land) in each temporal segment based on the MNDWI features
and regional tidal heights, and (4) detecting the change information, including conversion types
and turning years and months. Spatial and temporal validation was implemented based on
visual interpretation of Landsat images. Three major coastal change types were found,
namely land reclamation, aquaculture expansion, and accretion of tidal flats; land
reclamation was the dominant coastal change.

Appendix C.17. Textual Summaries for Bathymetric Mapping


To extend bathymetry maps, researchers in [239] paired field observations of coastal
depths with RS imagery to train models that can then predict depth in areas where no depth
information is available. The authors trained four different multiple linear regression
models on sonar measurements from field data collection and optical RS imagery to map bathymetric
depths in three different locations near Greece. They obtained good results with a very simple,
intuitive model. Still, the best performing regression model suffered from slight
under- and over-estimation depending on the region, meaning that more field observations
should be included in more locations to capture more of the natural variance in this domain.
Although crowdsourced bathymetry datasets were being collected, they were not publicly
available and so could not be used in this analysis. Even if they were available, the
authors note that they would likely run into GEE's compute limits, as studies
working with large numbers of field samples often do. Looking ahead, the authors called
for more domain-specific methods to be implemented on the platform and for a fused
Sentinel-1 SAR, Sentinel-2 optical, and DEM dataset to be uploaded to GEE, which would
be useful to a wide variety of researchers, not just those studying bathymetry mapping.
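The regression step of such an approach can be illustrated with a short sketch that fits a multiple linear regression mapping band reflectances to sonar depths and reports an RMSE; the input table and column names are hypothetical and do not reproduce the models of [239].

```python
# Illustrative multiple linear regression: band reflectances -> sonar depth.
# The CSV file and its column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv('sonar_with_reflectance.csv')   # depth samples joined to imagery
bands = ['blue', 'green', 'red', 'nir']
X = df[bands]
y = df['depth_m']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
pred = reg.predict(X_test)
rmse = mean_squared_error(y_test, pred) ** 0.5
print(f'RMSE: {rmse:.2f} m')
```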
The authors in [240] used airborne LiDAR, sonar, and Landsat data with a RF model
to estimate bathymetry in Japan, Puerto Rico, the USA, and Vanuatu. Because
GEE limits how much data can be uploaded and analyzed at any one time, the RF
model was prone to overfitting. In the end, the results did not meet the standards
that would allow the data to be used in practice; for that, the authors note that airborne
LiDAR and sonar data would need to be combined with higher-resolution RS data such as
Sentinel or WorldView imagery.

Appendix C.18. Textual Summaries for Ice and Snow


To track changes in the Alaskan cryosphere, the authors in [241] used a CART
model to map stable snow areas versus snow-loss areas for snowfields over a wide
region. Over a 19-year period, the authors found that the total area of snowfields in their
region of analysis decreased by 13 km² and that an additional 48 km² transitioned from
stable snowfields to ablation zones. However, [241] noted that their automated approach
classified both new snow loss and seasonal snow as the same class, so the classification
results were an overestimation; future work on mapping perennial snow loss could therefore
focus on separating these similar classes. The authors shared their code on GEE so
that other researchers interested in replicating the study or in reusing parts of the code for
their own analyses can easily do so.
The authors in [242] used National Oceanic and Atmospheric Administration (NOAA)
Advanced Very High Resolution Radiometer (AVHRR) data, MOD09GQ surface reflectance
products, and Landsat surface reflectance Tier 1 products to study lake ice phenology (LIP) in Qinghai Lake. A
threshold method was used to extract the lake ice area, with the threshold variables
set by the red-band reflectance value and the difference between the red-band
and near-infrared reflectance. The freeze-up start date was defined as the time point
when the lake ice area was continuously greater than or equal to 10% of the lake area. When
the lake ice area was greater than or equal to 90% of the lake area, that date was
determined as the freeze-up end. When the lake ice area subsequently fell to, and remained at, less than
or equal to 90% of the lake area, that date was determined as the break-up start, while the break-up end
was defined as the time point when the ice covered less than or equal to 10% of the lake area.
The presence of clouds and crushed ice may cause some errors in the results obtained from
the different data sources.
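The phenology rules described above can be illustrated with a small, self-contained sketch that scans a fabricated time series of lake-ice area fractions for the 10% and 90% thresholds; it is a toy example of the logic, not the processing chain of [242].

```python
# Toy example: derive freeze-up/break-up dates from a series of ice area fractions.
# The example time series is fabricated for illustration only.
from datetime import date

series = [                      # (observation date, ice area as a fraction of lake area)
    (date(2020, 11, 20), 0.05), (date(2020, 12, 1), 0.15),
    (date(2020, 12, 20), 0.95), (date(2021, 3, 10), 0.80),
    (date(2021, 3, 25), 0.08),
]

def first_date(obs, predicate):
    """Return the first observation date whose ice fraction satisfies predicate."""
    return next((d for d, frac in obs if predicate(frac)), None)

freeze_up_start = first_date(series, lambda f: f >= 0.10)
freeze_up_end = first_date(series, lambda f: f >= 0.90)

# Break-up is searched only after the freeze-up end.
after_freeze = [(d, f) for d, f in series if freeze_up_end and d > freeze_up_end]
break_up_start = first_date(after_freeze, lambda f: f <= 0.90)
break_up_end = first_date(after_freeze, lambda f: f <= 0.10)

print(freeze_up_start, freeze_up_end, break_up_start, break_up_end)
```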

References
1. Yang, L.; MacEachren, A.M.; Mitra, P.; Onorati, T. Visually-Enabled Active Deep Learning for (Geo) Text and Image Classification:
A Review. ISPRS Int. J. Geo-Inf. 2018, 7, 65. [CrossRef]
2. Sebestyén, V.; Czvetkó, T.; Abonyi, J. The Applicability of Big Data in Climate Change Research: The Importance of System of
Systems Thinking. Front. Environ. Sci. 2021, 9, 619092. [CrossRef]
3. Li, Z. Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions. In High
Performance Computing for Geospatial Applications; Tang, W., Wang, S., Eds.; Springer International Publishing: Cham, Switzerland,
2020; pp. 53–76, ISBN 9783030479985.
4. Lee, J.-G.; Kang, M. Geospatial Big Data: Challenges and Opportunities. Big Data Res. 2015, 2, 74–81. [CrossRef]
5. Lippitt, C.D.; Zhang, S. The impact of small unmanned airborne platforms on passive optical remote sensing: A conceptual
perspective. Int. J. Remote Sens. 2018, 39, 4852–4868. [CrossRef]
6. Zhen, L.I.U.; Huadong, G.U.O.; Wang, C. Considerations on Geospatial Big Data. IOP Conf. Ser. Earth Environ. Sci. 2016, 46, 012058.
7. Karimi, H.A. Big Data: Techniques and Technologies in Geoinformatics; CRC Press: Boca Raton, FL, USA, 2014; ISBN 9781466586512.
8. Marr, B. Big Data: Using SMART Big Data, Analytics and Metrics to Make Better Decisions and Improve Performance; John Wiley &
Sons: Hoboken, NJ, USA, 2015; ISBN 9781118965825.
9. Deng, X.; Liu, P.; Liu, X.; Wang, R.; Zhang, Y.; He, J.; Yao, Y. Geospatial Big Data: New Paradigm of Remote Sensing Applications.
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3841–3851. [CrossRef]
10. Kashyap, R. Geospatial Big Data, Analytics and IoT: Challenges, Applications and Potential. In Cloud Computing for Geospatial Big
Data Analytics: Intelligent Edge, Fog and Mist Computing; Das, H., Barik, R.K., Dubey, H., Roy, D.S., Eds.; Springer International
Publishing: Cham, Switzerland, 2019; pp. 191–213, ISBN 9783030033590.
11. Yang, C.; Yu, M.; Hu, F.; Jiang, Y.; Li, Y. Utilizing Cloud Computing to address big geospatial data challenges. Comput. Environ.
Urban Syst. 2017, 61, 120–128. [CrossRef]
12. Liu, Y.; Dang, L.; Li, S.; Cai, K.; Zuo, X. Research Progress on Models, Algorithms, and Systems for Remote Sensing Spatial-
Temporal Big Data Processing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5918–5931. [CrossRef]
13. Liu, P.; Di, L.; Du, Q.; Wang, L. Remote Sensing Big Data: Theory, Methods and Applications. Remote Sens. 2018, 10, 711.
[CrossRef]
14. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial
analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [CrossRef]
15. Wang, Y.; Ziv, G.; Adami, M.; Mitchard, E.; Batterman, S.A.; Buermann, W.; Marimon, B.S.; Junior, B.H.M.; Reis, S.M.;
Rodrigues, D.; et al. Mapping tropical disturbed forests using multi-decadal 30 m optical satellite imagery. Remote Sens. Environ.
2018, 221, 474–488. [CrossRef]
16. Teluguntla, P.; Thenkabail, P.S.; Oliphant, A.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K.; Huete, A. A 30-m landsat-
derived cropland extent product of Australia and China using random forest machine learning algorithm on Google Earth Engine
cloud computing platform. ISPRS J. Photogramm. Remote Sens. 2018, 144, 325–340. [CrossRef]
17. Amani, M.; Brisco, B.; Afshar, M.; Mirmazloumi, S.M.; Mahdavi, S.; Mirzadeh, S.M.J.; Huang, W.; Granger, J. A generalized
supervised classification scheme to produce provincial wetland inventory maps: An application of Google Earth Engine for big
geo data processing. Big Earth Data 2019, 3, 378–394. [CrossRef]
18. Kumar, L.; Mutanga, O. Google Earth Engine Applications Since Inception: Usage, Trends, and Potential. Remote Sens. 2018, 10, 1509.
[CrossRef]
19. Samasse, K.; Hanan, N.P.; Anchang, J.Y.; Diallo, Y. A High-Resolution Cropland Map for the West African Sahel Based on
High-Density Training Data, Google Earth Engine, and Locally Optimized Machine Learning. Remote Sens. 2020, 12, 1436.
[CrossRef]
20. Lippitt, C.D.; Stow, D.A.; Clarke, K.C. On the nature of models for time-sensitive remote sensing. Int. J. Remote Sens. 2014, 35,
6815–6841. [CrossRef]
21. Zhou, B.; Okin, G.S.; Zhang, J. Leveraging Google Earth Engine (GEE) and machine learning algorithms to incorporate in situ
measurement from different times for rangelands monitoring. Remote Sens. Environ. 2020, 236, 111521. [CrossRef]
22. Sayad, Y.O.; Mousannif, H.; Al Moatassime, H. Predictive modeling of wildfires: A new dataset and machine learning approach.
Fire Saf. J. 2019, 104, 130–146. [CrossRef]
23. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; Depristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to
deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [CrossRef]
24. Davenport, T.; Kalakota, R. The potential for artificial intelligence in healthcare. Future Health J. 2019, 6, 94–98. [CrossRef]
25. Mittal, S.; Hasija, Y. Applications of Deep Learning in Healthcare and Biomedicine. In Deep Learning Techniques for Biomedical and
Health Informatics; Dash, S., Acharya, B.R., Mittal, M., Abraham, A., Kelemen, A., Eds.; Springer International Publishing: Cham,
Switzerland, 2020; pp. 57–77, ISBN 9783030339661.
26. Boulos, M.N.K.; Peng, G.; VoPham, T. An overview of GeoAI applications in health and healthcare. Int. J. Health Geogr. 2019, 18, 7.
[CrossRef] [PubMed]
27. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.;
Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications:
A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350. [CrossRef]
28. Wang, L.; Diao, C.; Xian, G.; Yin, D.; Lu, Y.; Zou, S.; Erickson, T.A. A summary of the special issue on remote sensing of land
change science with Google earth engine. Remote Sens. Environ. 2020, 248, 112002. [CrossRef]
29. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for geo-big data
applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170. [CrossRef]
30. Hoeser, T.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part
I: Evolution and Recent Trends. Remote Sens. 2020, 12, 1667. [CrossRef]
31. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.-S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive
Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [CrossRef]
32. Pekel, J.-F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes.
Nature 2016, 540, 418–422. [CrossRef]
33. Decuyper, M.; Chávez, R.O.; Lohbeck, M.; Lastra, J.A.; Tsendbazar, N.; Hackländer, J.; Herold, M.; Vågen, T.-G. Continuous
monitoring of forest change dynamics with satellite time series. Remote Sens. Environ. 2021, 269, 112829. [CrossRef]
34. Guo, H.-D.; Zhang, L.; Zhu, L.-W. Earth observation big data for climate change research. Adv. Clim. Chang. Res. 2015, 6, 108–117.
[CrossRef]
35. Hird, J.N.; DeLancey, E.R.; McDermid, G.J.; Kariyeva, J. Google Earth Engine, Open-Access Satellite Data, and Machine Learning
in Support of Large-Area Probabilistic Wetland Mapping. Remote Sens. 2017, 9, 1315. [CrossRef]
36. Hsu, A.; Khoo, W.; Goyal, N.; Wainstein, M. Next-Generation Digital Ecosystem for Climate Data Mining and Knowledge
Discovery: A Review of Digital Data Collection Technologies. Front. Big Data 2020, 3, 29. [CrossRef] [PubMed]
37. Google Earth Engine. A Planetary-Scale Platform for Earth Science & Data Analysis. Available online: https://earthengine.
google.com/ (accessed on 19 November 2019).
38. National Aeronautics and Space Administration (NASA). Welcome to the NASA Earth Exchange (NEX). Available online:
https://www.nasa.gov/nex (accessed on 23 April 2022).
39. National Aeronautics and Space Administration (NASA). Geostationary-NASA Earth Exchange (GeoNEX). Available online:
https://www.nasa.gov/geonex (accessed on 23 April 2022).
40. Earth on AWS. Available online: https://aws.amazon.com/earth/ (accessed on 10 July 2019).
41. Chandrashekar, S. Announcing Real-Time Geospatial Analytics in Azure Stream Analytics. Available online: https://azure.
microsoft.com/en-us/blog/announcing-real-time-geospatial-analytics-in-azure-stream-analytics/ (accessed on 23 April 2022).
42. Microsoft. Microsoft Planetary Computer. Available online: https://planetarycomputer.microsoft.com/ (accessed on 23 April 2022).
43. Parente, L.; Taquary, E.; Silva, A.P.; Souza, C.; Ferreira, L. Next Generation Mapping: Combining Deep Learning, Cloud
Computing, and Big Remote Sensing Data. Remote Sens. 2019, 11, 2881. [CrossRef]
44. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review.
ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [CrossRef]
45. Lobell, D.B.; Thau, D.; Seifert, C.; Engle, E.; Little, B. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 2015, 164,
324–333. [CrossRef]
46. Shelestov, A.; Lavreniuk, M.; Kussul, N.; Novikov, A.; Skakun, S. Exploring Google Earth Engine Platform for Big Data Processing:
Classification of Multi-Temporal Satellite Imagery for Crop Mapping. Front. Earth Sci. 2017, 5, 17. [CrossRef]
47. Xiong, J.; Thenkabail, P.S.; Tilton, J.C.; Gumma, M.K.; Teluguntla, P.; Oliphant, A.; Congalton, R.G.; Yadav, K.; Gorelick, N.
Nominal 30-m cropland extent map of continental Africa by integrating pixel-based and object-based algorithms using Sentinel-2
and Landsat-8 data on Google Earth Engine. Remote Sens. 2017, 9, 1065. [CrossRef]
48. Xiong, J.; Thenkabail, P.S.; Gumma, M.K.; Teluguntla, P.; Poehnelt, J.; Congalton, R.G.; Yadav, K.; Thau, D. Automated cropland
mapping of continental Africa using Google Earth Engine cloud computing. ISPRS J. Photogramm. Remote Sens. 2017, 126, 225–244.
[CrossRef]
49. Deines, J.M.; Kendall, A.D.; Hyndman, D.W. Annual Irrigation Dynamics in the U.S. Northern High Plains Derived from Landsat
Satellite Data. Geophys. Res. Lett. 2017, 44, 9350–9360. [CrossRef]
50. Kelley, L.C.; Pitcher, L.; Bacon, C. Using Google Earth Engine to Map Complex Shade-Grown Coffee Landscapes in Northern
Nicaragua. Remote Sens. 2018, 10, 952. [CrossRef]
51. Ragettli, S.; Herberz, T.; Siegfried, T. An Unsupervised Classification Algorithm for Multi-Temporal Irrigated Area Mapping in
Central Asia. Remote Sens. 2018, 10, 1823. [CrossRef]
52. Ghazaryan, G.; Dubovyk, O.; Löw, F.; Lavreniuk, M.; Kolotii, A.; Schellberg, J.; Kussul, N. A rule-based approach for crop
identification using multi-temporal and multi-sensor phenological metrics. Eur. J. Remote Sens. 2018, 51, 511–524. [CrossRef]
53. Mandal, D.; Kumar, V.; Bhattacharya, A.; Rao, Y.S.; Siqueira, P.; Bera, S. Sen4Rice: A Processing Chain for Differentiating Early
and Late Transplanted Rice Using Time-Series Sentinel-1 SAR Data with Google Earth Engine. IEEE Geosci. Remote Sens. Lett.
2018, 15, 1947–1951. [CrossRef]
54. Oliphant, A.J.; Thenkabail, P.S.; Teluguntla, P.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K. Mapping cropland extent of
Southeast and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google
Earth Engine cloud. Int. J. App. Earth Observ. Geoinf. 2019, 81, 110–124. [CrossRef]
55. Sun, J.; Di, L.; Sun, Z.; Shen, Y.; Lai, Z. County-Level Soybean Yield Prediction Using Deep CNN-LSTM Model. Sensors 2019, 19, 4363.
[CrossRef] [PubMed]
56. Wang, M.; Liu, Z.; Baig, M.H.A.; Wang, Y.; Li, Y.; Chen, Y. Mapping sugarcane in complex landscapes by integrating multi-temporal
Sentinel-2 images and machine learning algorithms. Land Use Policy 2019, 88, 104190. [CrossRef]
57. Tian, F.; Wu, B.; Zeng, H.; Zhang, X.; Xu, J. Efficient Identification of Corn Cultivation Area with Multitemporal Synthetic
Aperture Radar and Optical Images in the Google Earth Engine Cloud Platform. Remote Sens. 2019, 11, 629. [CrossRef]
58. Xie, Y.; Lark, T.J.; Brown, J.F.; Gibbs, H.K. Mapping irrigated cropland extent across the conterminous United States at 30 m
resolution using a semi-automatic training approach on Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2019, 155,
136–149. [CrossRef]
59. Jin, Z.; Azzari, G.; You, C.; Di Tommaso, S.; Aston, S.; Burke, M.; Lobell, D.B. Smallholder maize area and yield mapping at
national scales with Google Earth Engine. Remote Sens. Environ. 2019, 228, 115–128. [CrossRef]
60. Rudiyanto; Minasny, B.; Shah, R.M.; Che Soh, N.; Arif, C.; Indra Setiawan, B. Automated Near-Real-Time
Mapping and Monitoring of Rice Extent, Cropping Patterns, and Growth Stages in Southeast Asia Using Sentinel-1 Time Series
on a Google Earth Engine Platform. Remote Sens. 2019, 11, 1666. [CrossRef]
61. Wang, S.; Azzari, G.; Lobell, D.B. Crop type mapping without field-level labels: Random forest transfer and unsupervised
clustering techniques. Remote Sens. Environ. 2019, 222, 303–317. [CrossRef]
62. Liang, L.; Runkle, B.R.K.; Sapkota, B.B.; Reba, M.L. Automated mapping of rice fields using multi-year training sample
normalization. Int. J. Remote Sens. 2019, 40, 7252–7271. [CrossRef]
63. Tian, H.F.; Huang, N.; Niu, Z.; Qin, Y.C.; Pei, J.; Wang, J. Mapping Winter Crops in China with Multi-Source Satellite Imagery and
Phenology-Based Algorithm. Remote Sens. 2019, 11, 820. [CrossRef]
64. Neetu; Ray, S.S. Exploring machine learning classification algorithms for crop classification using sentinel 2 data. Int. Arch.
Photogramm. Remote Sens. Spatial Inf. Sci. 2019, XLII-3/W6, 573–578. [CrossRef]
65. Gumma, M.K.; Thenkabail, P.S.; Teluguntla, P.G.; Oliphant, A.; Xiong, J.; Giri, C.; Pyla, V.; Dixit, S.; Whitbread, A.M. Agricultural
cropland extent and areas of South Asia derived using Landsat satellite 30-m time-series big-data using random forest machine
learning algorithms on the Google Earth Engine cloud. GISci. Remote Sens. 2019, 57, 302–322. [CrossRef]
66. Han, J.; Zhang, Z.; Cao, J.; Luo, Y.; Zhang, L.; Li, Z.; Zhang, J. Prediction of Winter Wheat Yield Based on Multi-Source Data and
Machine Learning in China. Remote Sens. 2020, 12, 236. [CrossRef]
67. Phalke, A.R.; Özdoğan, M.; Thenkabail, P.S.; Erickson, T.; Gorelick, N.; Yadav, K.; Congalton, R.G. Mapping Croplands of Europe,
Middle East, Russia, and Central Asia Using Landsat, Random Forest, and Google Earth Engine. ISPRS J. Photogramm. Remote
Sens. 2020, 167, 104–122. [CrossRef]
68. Chen, N.; Yu, L.; Zhang, X.; Shen, Y.; Zeng, L.; Hu, Q.; Niyogi, D. Mapping Paddy Rice Fields by Combining Multi-Temporal
Vegetation Index and Synthetic Aperture Radar Remote Sensing Data Using Google Earth Engine Machine Learning Platform.
Remote Sens. 2020, 12, 2992. [CrossRef]
69. Amani, M.; Kakooei, M.; Moghimi, A.; Ghorbanian, A.; Ranjgar, B.; Mahdavi, S.; Davidson, A.; Fisette, T.; Rollin, P.; Brisco, B.; et al.
Application of Google Earth Engine Cloud Computing Platform, Sentinel Imagery, and Neural Networks for Crop Mapping in
Canada. Remote Sens. 2020, 12, 3561. [CrossRef]
70. You, N.; Dong, J. Examining Earliest Identifiable Timing of Crops Using All Available Sentinel 1/2 Imagery and Google Earth
Engine. ISPRS J. Photogramm. Remote Sens. 2020, 161, 109–123.
71. Poortinga, A.; Thwal, N.S.; Khanal, N.; Mayer, T.; Bhandari, B.; Markert, K.; Nicolau, A.P.; Dilger, J.; Tenneson, K.; Clinton, N.; et al.
Mapping sugarcane in Thailand using transfer learning, a lightweight convolutional neural network, NICFI high resolution
satellite imagery and Google Earth Engine. ISPRS Open J. Photogramm. Remote Sens. 2021, 1, 100003. [CrossRef]
72. Adrian, J.; Sagan, V.; Maimaitijiang, M. Sentinel SAR-optical fusion for crop type mapping using deep learning and Google Earth
Engine. ISPRS J. Photogramm. Remote Sens. 2021, 175, 215–235. [CrossRef]
73. Cao, J.; Zhang, Z.; Luo, Y.; Zhang, L.; Zhang, J.; Li, Z.; Tao, F. Wheat yield predictions at a county and field scale with deep
learning, machine learning, and google earth engine. Eur. J. Agron. 2020, 123, 126204. [CrossRef]
74. Luo, C.; Qi, B.; Liu, H.; Guo, D.; Lu, L.; Fu, Q.; Shao, Y. Using Time Series Sentinel-1 Images for Object-Oriented Crop Classification
in Google Earth Engine. Remote Sens. 2021, 13, 561. [CrossRef]
75. Ni, R.; Tian, J.; Li, X.; Yin, D.; Li, J.; Gong, H.; Zhang, J.; Zhu, L.; Wu, D. An enhanced pixel-based phenological feature for accurate
paddy rice mapping with Sentinel-2 imagery in Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2021, 178, 282–296.
[CrossRef]
76. Sun, Y.; Qin, Q.; Ren, H.; Zhang, Y. Decameter Cropland LAI/FPAR Estimation from Sentinel-2 Imagery Using Google Earth
Engine. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [CrossRef]
77. Li, M.; Zhang, R.; Luo, H.; Gu, S.; Qin, Z. Crop Mapping in the Sanjiang Plain Using an Improved Object-Oriented Method Based
on Google Earth Engine and Combined Growth Period Attributes. Remote Sens. 2022, 14, 273. [CrossRef]
78. Han, L.; Ding, J.; Wang, J.; Zhang, J.; Xie, B.; Hao, J. Monitoring Oasis Cotton Fields Expansion in Arid Zones Using the Google
Earth Engine: A Case Study in the Ogan-Kucha River Oasis, Xinjiang, China. Remote Sens. 2022, 14, 225. [CrossRef]
79. Hedayati, A.; Vahidnia, M.H.; Behzadi, S. Paddy lands detection using Landsat-8 satellite images and object-based classification
in Rasht city, Iran. Egypt. J. Remote Sens. Space Sci. 2022, 25, 73–84. [CrossRef]
80. Azzari, G.; Lobell, D. Landsat-based classification in the cloud: An opportunity for a paradigm shift in land cover monitoring.
Remote Sens. Environ. 2017, 202, 64–74. [CrossRef]
81. Midekisa, A.; Holl, F.; Savory, D.J.; Andrade-Pacheco, R.; Gething, P.; Bennett, A.; Sturrock, H. Mapping land cover change over
continental Africa using Landsat and Google Earth Engine cloud computing. PLoS ONE 2017, 12, e0184926. [CrossRef]
82. Hu, Y.; Dong, Y.; Batunacun. An Automatic Approach for Land-Change Detection and Land Updates Based on Integrated NDVI
Timing Analysis and the CVAPS Method with GEE Support. ISPRS J. Photogramm. Remote Sens. 2018, 146, 347–359. [CrossRef]
83. Ge, Y.; Hu, S.; Ren, Z.; Jia, Y.; Wang, J.; Liu, M.; Zhang, D.; Zhao, W.; Luo, Y.; Fu, Y.; et al. Mapping annual land use changes in
China’s poverty-stricken areas from 2013 to 2018. Remote Sens. Environ. 2019, 232, 111285. [CrossRef]
84. Lee, J.; Cardille, J.A.; Coe, M.T. BULC-U: Sharpening Resolution and Improving Accuracy of Land-Use/Land-Cover Classifications
in Google Earth Engine. Remote Sens. 2018, 10, 1455. [CrossRef]
85. Zurqani, H.A.; Post, C.J.; Mikhailova, E.A.; Schlautman, M.A.; Sharp, J.L. Geospatial analysis of land use change in the Savannah
River Basin using Google Earth Engine. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2018, 69, 175–185. [CrossRef]
86. Murray, N.J.; Keith, D.A.; Simpson, D.; Wilshire, J.H.; Lucas, R.M. Remap: An online remote sensing application for land cover
classification and monitoring. Methods Ecol. Evol. 2018, 9, 2019–2027. [CrossRef]
87. Mardani, M.; Mardani, H.; De Simone, L.; Varas, S.; Kita, N.; Saito, T. Integration of Machine Learning and Open Access Geospatial
Data for Land Cover Mapping. Remote Sens. 2019, 11, 1907. [CrossRef]
88. Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable classification with limited
sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci.
Bull. 2019, 64, 370–373. [CrossRef]
89. Hao, B.; Ma, M.; Li, S.; Li, Q.; Hao, D.; Huang, J.; Ge, Z.; Yang, H.; Han, X. Land Use Change and Climate Variation in the Three
Gorges Reservoir Catchment from 2000 to 2015 Based on the Google Earth Engine. Sensors 2019, 19, 2118. [CrossRef]
90. Miettinen, J.; Shi, C.; Liew, S.C. Towards automated 10–30 m resolution land cover mapping in insular South-East Asia. Geocarto
Int. 2017, 34, 443–457. [CrossRef]
91. Xie, S.; Liu, L.; Zhang, X.; Yang, J.; Chen, X.; Gao, Y. Automatic Land-Cover Mapping using Landsat Time-Series Data based on
Google Earth Engine. Remote Sens. 2019, 11, 3023. [CrossRef]
92. Adepoju, K.A.; Adelabu, S.A. Improving accuracy of Landsat-8 OLI classification using image composite and multisource data
with Google Earth Engine. Remote Sens. Lett. 2019, 11, 107–116. [CrossRef]
93. Ghorbanian, A.; Kakooei, M.; Amani, M.; Mahdavi, S.; Mohammadzadeh, A.; Hasanlou, M. Improved land cover map of Iran
using Sentinel imagery within Google Earth Engine and a novel automatic workflow for land cover classification using migrated
training samples. ISPRS J. Photogramm. Remote. Sens. 2020, 167, 276–288. [CrossRef]
94. Liang, J.; Xie, Y.; Sha, Z.; Zhou, A. Modeling urban growth sustainability in the cloud by augmenting Google Earth Engine (GEE).
Comput. Environ. Urban Syst. 2020, 84, 101542. [CrossRef]
95. Zeng, H.; Wu, B.; Wang, S.; Musakwa, W.; Tian, F.; Mashimbye, Z.E.; Poona, N.; Syndey, M. A Synthesizing Land-cover
Classification Method Based on Google Earth Engine: A Case Study in Nzhelele and Levhuvu Catchments, South Africa. Chin.
Geogr. Sci. 2020, 30, 397–409. [CrossRef]
96. Naboureh, A.; Li, A.; Bian, J.; Lei, G.; Amani, M. A Hybrid Data Balancing Method for Classification of Imbalanced Training Data
within Google Earth Engine: Case Studies from Mountainous Regions. Remote Sens. 2020, 12, 3301. [CrossRef]
97. Naboureh, A.; Ebrahimy, H.; Azadbakht, M.; Bian, J.; Amani, M. RUESVMs: An Ensemble Method to Handle the Class Imbalance
Problem in Land Cover Mapping Using Google Earth Engine. Remote Sens. 2020, 12, 3484. [CrossRef]
98. Li, Q.; Qiu, C.; Ma, L.; Schmitt, M.; Zhu, X.X. Mapping the Land Cover of Africa at 10 m Resolution from Multi-Source Remote
Sensing Data with Google Earth Engine. Remote Sens. 2020, 12, 602. [CrossRef]
99. Huang, H.; Wang, J.; Liu, C.; Liang, L.; Li, C.; Gong, P. The migration of training samples towards dynamic global land cover
mapping. ISPRS J. Photogramm. Remote Sens. 2020, 161, 27–36. [CrossRef]
100. Tassi, A.; Vizzari, M. Object-Oriented LULC Classification in Google Earth Engine Combining SNIC, GLCM, and Machine
Learning Algorithms. Remote Sens. 2020, 12, 3776. [CrossRef]
101. Shetty, S.; Gupta, P.; Belgiu, M.; Srivastav, S. Assessing the Effect of Training Sampling Design on the Performance of Machine
Learning Classifiers for Land Cover Mapping Using Multi-Temporal Remote Sensing Data and Google Earth Engine. Remote Sens.
2021, 13, 1433. [CrossRef]
102. Feizizadeh, B.; Omarzadeh, D.; Garajeh, M.K.; Lakes, T.; Blaschke, T. Machine learning data-driven approaches for land use/cover
mapping and trend analysis using Google Earth Engine. J. Environ. Plan. Manag. 2021, 1–33. [CrossRef]
103. Shafizadeh-Moghadam, H.; Khazaei, M.; Alavipanah, S.K.; Weng, Q. Google Earth Engine for large-scale land use and land cover
mapping: An object-based classification approach using spectral, textural and topographical factors. GISci. Remote Sens. 2021, 58,
914–928. [CrossRef]
104. Pan, X.; Wang, Z.; Gao, Y.; Dang, X.; Han, Y. Detailed and automated classification of land use/land cover using machine learning
algorithms in Google Earth Engine. Geocarto Int. 2021, 1–18. [CrossRef]
105. Becker, W.R.; Ló, T.B.; Johann, J.A.; Mercante, E. Statistical features for land use and land cover classification in Google Earth
Engine. Remote Sens. Appl. Soc. Environ. 2020, 21, 100459. [CrossRef]
106. Jin, Q.; Xu, E.; Zhang, X. A Fusion Method for Multisource Land Cover Products Based on Superpixels and Statistical Extraction
for Enhancing Resolution and Improving Accuracy. Remote Sens. 2022, 14, 1676. [CrossRef]
107. Lee, J.S.H.; Wich, S.; Widayati, A.; Koh, L.P. Detecting industrial oil palm plantations on Landsat images with Google Earth
Engine. Remote Sens. Appl. Soc. Environ. 2016, 4, 219–224. [CrossRef]
108. Voight, C.; Hernandez-Aguilar, K.; Garcia, C.; Gutierrez, S. Predictive Modeling of Future Forest Cover Change Patterns in
Southern Belize. Remote Sens. 2019, 11, 823. [CrossRef]
109. Koskinen, J.; Leinonen, U.; Vollrath, A.; Ortmann, A.; Lindquist, E.; D’Annunzio, R.; Pekkarinen, A.; Käyhkö, N. Participatory
mapping of forest plantations with Open Foris and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2018, 148, 63–74.
[CrossRef]
110. Duan, Q.; Tan, M.; Guo, Y.; Wang, X.; Xin, L. Understanding the Spatial Distribution of Urban Forests in China Using Sentinel-2
Images with Google Earth Engine. Forests 2019, 10, 729. [CrossRef]
111. Poortinga, A.; Tenneson, K.; Shapiro, A.; Nquyen, Q.; Aung, K.S.; Chishtie, F.; Saah, D. Mapping Plantations in Myanmar
by Fusing Landsat-8, Sentinel-2 and Sentinel-1 Data along with Systematic Error Quantification. Remote Sens. 2019, 11, 831.
[CrossRef]
112. Shimizu, K.; Ota, T.; Mizoue, N. Detecting Forest Changes Using Dense Landsat 8 and Sentinel-1 Time Series Data in Tropical
Seasonal Forests. Remote Sens. 2019, 11, 1899. [CrossRef]
113. Ramdani, F. Recent expansion of oil palm plantation in the most eastern part of Indonesia: Feature extraction with polarimetric
SAR. Int. J. Remote Sens. 2018, 40, 7371–7388. [CrossRef]
114. Çolak, E.; Chandra, M.; Sunar, F. The use of multi-temporal sentinel satellites in the analysis of land cover/land use changes
caused by the nuclear power plant construction. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-3/W8,
491–495. [CrossRef]
115. Shaharum, N.S.N.; Shafri, H.Z.M.; Ghani, W.A.W.A.K.; Samsatli, S.; Al-Habshi, M.M.A.; Yusuf, B. Oil palm mapping over Peninsular
Malaysia using Google Earth Engine and machine learning algorithms. Remote Sens. Appl. Soc. Environ. 2020, 17, 100287. [CrossRef]
116. De Sousa, C.; Fatoyinbo, L.; Neigh, C.; Boucka, F.; Angoue, V.; Larsen, T. Cloud-computing and machine learning in support
of country-level land cover and ecosystem extent mapping in Liberia and Gabon. PLoS ONE 2020, 15, e0227438. [CrossRef]
[PubMed]
117. Brovelli, M.A.; Sun, Y.; Yordanov, V. Monitoring Forest Change in the Amazon Using Multi-Temporal Remote Sensing Data and
Machine Learning Classification on Google Earth Engine. ISPRS Int. J. Geo-Inf. 2020, 9, 580. [CrossRef]
118. Kamal, M.; Farda, N.M.; Jamaluddin, I.; Parela, A.; Wikantika, K.; Prasetyo, L.B.; Irawan, B. A preliminary study on machine
learning and google earth engine for mangrove mapping. IOP Conf. Series Earth Environ. Sci. 2020, 500, 012038. [CrossRef]
119. Wei, C.; Karger, D.N.; Wilson, A.M. Spatial detection of alpine treeline ecotones in the Western United States. Remote Sens. Environ.
2020, 240, 111672. [CrossRef]
120. Praticò, S.; Solano, F.; Di Fazio, S.; Modica, G. Machine Learning Classification of Mediterranean Forest Habitats in Google
Earth Engine Based on Seasonal Sentinel-2 Time-Series and Input Image Composition Optimisation. Remote Sens. 2021, 13, 586.
[CrossRef]
121. Xie, B.; Cao, C.; Xu, M.; Duerler, R.; Yang, X.; Bashir, B.; Chen, Y.; Wang, K. Analysis of Regional Distribution of Tree Species
Using Multi-Seasonal Sentinel-1&2 Imagery within Google Earth Engine. Forests 2021, 12, 565. [CrossRef]
122. Floreano, I.X.; de Moraes, L.A.F. Land Use/land Cover (LULC) Analysis (2009–2019) with Google Earth Engine and 2030
Prediction Using Markov-CA in the Rondônia State, Brazil. Environ. Monit. Assess. 2021, 193, 239. [CrossRef]
123. Kumar, M.; Phukon, S.N.; Paygude, A.C.; Tyagi, K.; Singh, H. Mapping Phenological Functional Types (PhFT) in the Indian
Eastern Himalayas using machine learning algorithm in Google Earth Engine. Comput. Geosci. 2021, 158, 104982. [CrossRef]
124. Zhao, F.; Sun, R.; Zhong, L.; Meng, R.; Huang, C.; Zeng, X.; Wang, M.; Li, Y.; Wang, Z. Monthly mapping of forest harvesting
using dense time series Sentinel-1 SAR imagery and deep learning. Remote Sens. Environ. 2021, 269, 112822. [CrossRef]
125. Wimberly, M.C.; Dwomoh, F.K.; Numata, I.; Mensah, F.; Amoako, J.; Nekorchuk, D.M.; McMahon, A. Historical trends of
degradation, loss, and recovery in the tropical forest reserves of Ghana. Int. J. Digit. Earth 2022, 15, 30–51. [CrossRef]
126. Johansen, K.; Phinn, S.; Taylor, M. Mapping woody vegetation clearing in Queensland, Australia from Landsat imagery using the
Google Earth Engine. Remote Sens. Appl. Soc. Environ. 2015, 1, 36–49. [CrossRef]
127. Traganos, D.; Aggarwal, B.; Poursanidis, D.; Topouzelis, K.; Chrysoulakis, N.; Reinartz, P. Towards Global-Scale Seagrass Mapping
and Monitoring Using Sentinel-2 on Google Earth Engine: The Case Study of the Aegean and Ionian Seas. Remote Sens. 2018, 10, 1227.
[CrossRef]
128. Tsai, Y.H.; Stow, D.; Chen, H.L.; Lewison, R.; An, L.; Shi, L. Mapping Vegetation and Land Use Types in Fanjingshan National
Nature Reserve Using Google Earth Engine. Remote Sens. 2018, 10, 927. [CrossRef]
129. Jansen, V.S.; Kolden, C.A.; Schmalz, H.J. The Development of Near Real-Time Biomass and Cover Estimates for Adaptive
Rangeland Management Using Landsat 7 and Landsat 8 Surface Reflectance Products. Remote Sens. 2018, 10, 1057. [CrossRef]
130. Jones, M.O.; Allred, B.W.; Naugle, D.E.; Maestas, J.; Donnelly, P.; Metz, L.J.; Karl, J.; Smith, R.; Bestelmeyer, B.; Boyd, C.; et al.
Innovation in rangeland monitoring: Annual, 30 m, plant functional type percent cover maps for U.S. rangelands, 1984–2017.
Ecosphere 2018, 9, e02430. [CrossRef]
131. Campos-Taberner, M.; Moreno-Martínez, Á.; García-Haro, F.J.; Camps-Valls, G.; Robinson, N.P.; Kattge, J.; Running, S.W. Global
Estimation of Biophysical Variables from Google Earth Engine Platform. Remote Sens. 2018, 10, 1167. [CrossRef]
132. Xin, Y.; Adler, P.R. Mapping Miscanthus Using Multi-Temporal Convolutional Neural Network and Google Earth Engine. In
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, Chicago, IL,
USA, 5 November 2019; pp. 81–84. [CrossRef]
133. Parente, L.; Mesquita, V.; Miziara, F.; Baumann, L.; Ferreira, L. Assessing the pasturelands and livestock dynamics in Brazil, from
1985 to 2017: A novel approach based on high spatial resolution imagery and Google Earth Engine cloud computing. Remote Sens.
Environ. 2019, 232, 111301. [CrossRef]
134. Zhang, M.; Gong, P.; Qi, S.; Liu, C.; Xiong, T. Mapping bamboo with regional phenological characteristics derived from dense
Landsat time series using Google Earth Engine. Int. J. Remote Sens. 2019, 40, 9541–9555. [CrossRef]
135. Alencar, A.; Shimbo, J.Z.; Lenti, F.; Balzani Marques, C.; Zimbres, B.; Rosa, M.; Arruda, V.; Castro, I.; Fernandes Márcico Ribeiro,
J.P.; Varela, V.; et al. Mapping Three Decades of Changes in the Brazilian Savanna Native Vegetation Using Landsat Data Processed
in the Google Earth Engine Platform. Remote Sens. 2020, 12, 924. [CrossRef]
136. Tian, J.; Wang, L.; Yin, D.; Li, X.; Diao, C.; Gong, H.; Shi, C.; Menenti, M.; Ge, Y.; Nie, S.; et al. Development of spectral-phenological
features for deep learning to understand Spartina alterniflora invasion. Remote Sens. Environ. 2020, 242, 111745. [CrossRef]
137. Srinet, R.; Nandy, S.; Padalia, H.; Ghosh, S.; Watham, T.; Patel, N.R.; Chauhan, P. Mapping plant functional types in Northwest
Himalayan foothills of India using random forest algorithm in Google Earth Engine. Int. J. Remote Sens. 2020, 41, 7296–7309.
[CrossRef]
138. Long, X.; Li, X.; Lin, H.; Zhang, M. Mapping the vegetation distribution and dynamics of a wetland using adaptive-stacking
and Google Earth Engine based on multi-source remote sensing data. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2021, 102, 102453.
[CrossRef]
139. Yan, D.; Li, J.; Yao, X.; Luan, Z. Quantifying the Long-Term Expansion and Dieback of Spartina Alterniflora Using Google Earth
Engine and Object-Based Hierarchical Random Forest Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14,
9781–9793. [CrossRef]
140. Wu, N.; Shi, R.; Zhuo, W.; Zhang, C.; Zhou, B.; Xia, Z.; Tao, Z.; Gao, W.; Tian, B. A Classification of Tidal Flat Wetland Vegetation
Combining Phenological Features with Google Earth Engine. Remote Sens. 2021, 13, 443. [CrossRef]
141. Pipia, L.; Amin, E.; Belda, S.; Salinero-Delgado, M.; Verrelst, J. Green LAI Mapping and Cloud Gap-Filling Using Gaussian
Process Regression in Google Earth Engine. Remote Sens. 2021, 13, 403. [CrossRef]
142. Zou, Z.; Dong, J.; Menarguez, M.A.; Xiao, X.; Qin, Y.; Doughty, R.B.; Hooker, K.V.; Hambright, K.D. Continued decrease of open
surface water body area in Oklahoma during 1984–2015. Sci. Total Environ. 2017, 595, 451–460. [CrossRef]
143. Chen, F.; Zhang, M.; Tian, B.; Li, Z. Extraction of Glacial Lake Outlines in Tibet Plateau Using Landsat 8 Imagery and Google
Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4002–4009. [CrossRef]
144. Wang, C.; Jia, M.; Chen, N.; Wang, W. Long-Term Surface Water Dynamics Analysis Based on Landsat Imagery and the Google
Earth Engine Platform: A Case Study in the Middle Yangtze River Basin. Remote Sens. 2018, 10, 1635. [CrossRef]
145. Lin, S.; Novitski, L.N.; Qi, J.; Stevenson, R.J. Landsat TM/ETM+ and machine-learning algorithms for limnological studies and
algal bloom management of inland lakes. J. Appl. Remote Sens. 2018, 12, 026003. [CrossRef]
146. Griffin, C.G.; McClelland, J.W.; Frey, K.E.; Fiske, G.; Holmes, R.M. Quantifying CDOM and DOC in major Arctic rivers during
ice-free conditions using Landsat TM and ETM+ data. Remote Sens. Environ. 2018, 209, 395–409. [CrossRef]
147. Isikdogan, L.F.; Bovik, A.; Passalacqua, P. Seeing Through the Clouds with DeepWaterMap. IEEE Geosci. Remote Sens. Lett. 2019,
17, 1662–1666. [CrossRef]
148. Fang, Y.; Li, H.; Wan, W.; Zhu, S.; Wang, Z.; Hong, Y.; Wang, H. Assessment of Water Storage Change in China’s Lakes and
Reservoirs over the Last Three Decades. Remote Sens. 2019, 11, 1467. [CrossRef]
149. Fuentes, I.; Padarian, J.; van Ogtrop, F.; Vervoort, R.W. Comparison of Surface Water Volume Estimation Methodologies
That Couple Surface Reflectance Data and Digital Terrain Models. Water 2019, 11, 780. [CrossRef]
150. Markert, K.N.; Markert, A.M.; Mayer, T.; Nauman, C.; Haag, A.; Poortinga, A.; Bhandari, B.; Thwal, N.S.; Kunlamai, T.;
Chishtie, F.; et al. Comparing Sentinel-1 Surface Water Mapping Algorithms and Radiometric Terrain Correction Processing in
Southeast Asia Utilizing Google Earth Engine. Remote Sens. 2020, 12, 2469. [CrossRef]
151. Wang, Y.; Li, Z.; Zeng, C.; Xia, G.; Shen, H. An Urban Water Extraction Method Combining Deep Learning and Google Earth
Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 768–781. [CrossRef]
152. Peterson, K.T.; Sagan, V.; Sloan, J.J. Deep Learning-Based Water Quality Estimation and Anomaly Detection Using Landsat-
8/Sentinel-2 Virtual Constellation and Cloud Computing. GISci. Remote Sens. 2020, 57, 510–525. [CrossRef]
153. Wang, L.; Xu, M.; Liu, Y.; Liu, H.; Beck, R.; Reif, M.; Emery, E.; Young, J.; Wu, Q. Mapping Freshwater Chlorophyll-a Concentrations
at a Regional Scale Integrating Multi-Sensor Satellite Observations with Google Earth Engine. Remote Sens. 2020, 12, 3278.
[CrossRef]
154. Boothroyd, R.J.; Williams, R.D.; Hoey, T.B.; Barrett, B.; Prasojo, O.A. Applications of Google Earth Engine in fluvial geomorphology
for detecting river channel change. WIREs Water 2020, 8, e21496. [CrossRef]
155. Weber, S.J.; Mishra, D.R.; Wilde, S.B.; Kramer, E. Risks for cyanobacterial harmful algal blooms due to land management and
climate interactions. Sci. Total Environ. 2019, 703, 134608. [CrossRef] [PubMed]
156. Mayer, T.; Poortinga, A.; Bhandari, B.; Nicolau, A.P.; Markert, K.; Thwal, N.S.; Markert, A.; Haag, A.; Kilbride, J.; Chishtie, F.; et al.
Deep learning approach for Sentinel-1 surface water mapping leveraging Google Earth Engine. ISPRS Open J. Photogramm. Remote
Sens. 2021, 2, 100005. [CrossRef]
157. Li, J.; Peng, B.; Wei, Y.; Ye, H. Accurate extraction of surface water in complex environment based on Google Earth Engine and
Sentinel-2. PLoS ONE 2021, 16, e0253209. [CrossRef]
158. Li, Y.; Niu, Z. Systematic method for mapping fine-resolution water cover types in China based on time series Sentinel-1 and 2
images. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2021, 106, 102656. [CrossRef]
159. Farda, N.M. Multi-temporal Land Use Mapping of Coastal Wetlands Area using Machine Learning in Google Earth Engine. IOP
Conf. Series Earth Environ. Sci. 2017, 98, 012042. [CrossRef]
160. Amani, M.; Mahdavi, S.; Afshar, M.; Brisco, B.; Huang, W.; Mohammad Javad Mirzadeh, S.; White, L.; Banks, S.; Montgomery, J.;
Hopkinson, C. Canadian Wetland Inventory using Google Earth Engine: The First Map and Preliminary Results. Remote Sens.
2019, 11, 842. [CrossRef]
161. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Homayouni, S.; Gill, E. The First Wetland Inventory Map of Newfoundland
at a Spatial Resolution of 10 m Using Sentinel-1 and Sentinel-2 Data on the Google Earth Engine Cloud Computing Platform.
Remote Sens. 2019, 11, 43. [CrossRef]
162. DeLancey, E.R.; Kariyeva, J.; Bried, J.T.; Hird, J. Large-scale probabilistic identification of boreal peatlands using Google Earth
Engine, open-access satellite data, and machine learning. PLoS ONE 2019, 14, e0218165. [CrossRef]
163. Wu, Q.; Lane, C.R.; Li, X.; Zhao, K.; Zhou, Y.; Clinton, N.; DeVries, B.; Golden, H.E.; Lang, M.W. Integrating LiDAR data and
multi-temporal aerial imagery to map wetland inundation dynamics using Google Earth Engine. Remote Sens. Environ. 2019, 228,
1–13. [CrossRef] [PubMed]
164. Zhang; Zhang; Dong; Liu; Gao; Hu; Wu. Mapping Tidal Flats with Landsat 8 Images and Google Earth Engine: A Case Study of
the China’s Eastern Coastal Zone circa 2015. Remote Sens. 2019, 11, 924. [CrossRef]
165. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Brisco, B.; Homayouni, S.; Gill, E.; DeLancey, E.R.; Bourgeau-Chavez, L. Big
Data for a Big Country: The First Generation of Canadian Wetland Inventory Map at a Spatial Resolution of 10-m Using Sentinel-1
and Sentinel-2 Data on the Google Earth Engine Cloud Computing Platform. Can. J. Remote Sens. 2020, 46, 15–33. [CrossRef]
166. Hakdaoui, S.; Emran, A.; Pradhan, B.; Qninba, A.; El Balla, T.; Mfondoum, A.H.N.; Lee, C.-W.; Alamri, A.M. Assessing the
Changes in the Moisture/Dryness of Water Cavity Surfaces in Imlili Sebkha in Southwestern Morocco by Using Machine Learning
Classification in Google Earth Engine. Remote Sens. 2020, 12, 131. [CrossRef]
167. DeLancey, E.R.; Simms, J.F.; Mahdianpari, M.; Brisco, B.; Mahoney, C.; Kariyeva, J. Comparing Deep Learning and Shallow
Learning for Large-Scale Wetland Classification in Alberta, Canada. Remote Sens. 2019, 12, 2. [CrossRef]
168. Mahdianpari, M.; Brisco, B.; Granger, J.E.; Mohammadimanesh, F.; Salehi, B.; Banks, S.; Homayouni, S.; Bourgeau-Chavez, L.;
Weng, Q. The Second Generation Canadian Wetland Inventory Map at 10 Meters Resolution Using Google Earth Engine. Can. J.
Remote Sens. 2020, 46, 360–375. [CrossRef]
169. Wang, X.; Xiao, X.; Zou, Z.; Chen, B.; Ma, J.; Dong, J.; Doughty, R.B.; Zhong, Q.; Qin, Y.; Dai, S.; et al. Tracking annual changes of
coastal tidal flats in China during 1986–2016 through analyses of Landsat images with Google Earth Engine. Remote Sens. Environ.
2018, 238, 110987. [CrossRef]
170. Mahdianpari, M.; Jafarzadeh, H.; Granger, J.E.; Mohammadimanesh, F.; Brisco, B.; Salehi, B.; Homayouni, S.; Weng, Q. A large-
scale change monitoring of wetlands using time series Landsat imagery on Google Earth Engine: A case study in Newfoundland.
GISci. Remote Sens. 2020, 57, 1102–1124. [CrossRef]
171. Sahour, H.; Kemink, K.M.; O’Connell, J. Integrating SAR and Optical Remote Sensing for Conservation-Targeted Wetlands
Mapping. Remote Sens. 2021, 14, 159. [CrossRef]
172. Jia, M.; Wang, Z.; Mao, D.; Ren, C.; Wang, C.; Wang, Y. Rapid, robust, and automated mapping of tidal flats in China using time
series Sentinel-2 images and Google Earth Engine. Remote Sens. Environ. 2021, 255, 112285. [CrossRef]
173. van Deventer, H.; Cho, M.A.; Mutanga, O. Multi-season RapidEye imagery improves the classification of wetland and dryland
communities in a subtropical coastal region. ISPRS J. Photogramm. Remote Sens. 2019, 157, 171–187. [CrossRef]
174. Ye, X.-C.; Meng, Y.-K.; Xu, L.-G.; Xu, C.-Y. Net primary productivity dynamics and associated hydrological driving factors in the
floodplain wetland of China’s largest freshwater lake. Sci. Total Environ. 2019, 659, 302–313. [CrossRef] [PubMed]
175. Dalezios, N.R.; Dercas, N.; Eslamian, S.S. Water scarcity management: Part 2: Satellite-based composite drought analysis. Int. J.
Glob. Environ. Issues 2018, 17, 262. [CrossRef]
176. Zhang, M.; Lin, H. Wetland classification using parcel-level ensemble algorithm based on Gaofen-6 multispectral imagery and
Sentinel-1 dataset. J. Hydrol. 2022, 606, 127462. [CrossRef]
177. Guo, Y.; Jia, X.; Paull, D.; Benediktsson, J.A. Nomination-favoured opinion pool for optical-SAR-synergistic rice mapping in face
of weakened flooding signals. ISPRS J. Photogramm. Remote Sens. 2019, 155, 187–205. [CrossRef]
178. Goldblatt, R.; You, W.; Hanson, G.; Khandelwal, A.K. Detecting the Boundaries of Urban Areas in India: A Dataset for Pixel-Based
Image Classification in Google Earth Engine. Remote Sens. 2016, 8, 634. [CrossRef]
179. Huang, C.; Yang, J.; Jiang, P. Assessing Impacts of Urban Form on Landscape Structure of Urban Green Spaces in China Using
Landsat Images Based on Google Earth Engine. Remote Sens. 2018, 10, 1569. [CrossRef]
180. Xu, H.; Wei, Y.; Liu, C.; Li, X.; Fang, H. A Scheme for the Long-Term Monitoring of Impervious-Relevant Land Disturbances
Using High Frequency Landsat Archives and the Google Earth Engine. Remote Sens. 2019, 11, 1891. [CrossRef]
181. Zhong, Q.; Ma, J.; Zhao, B.; Wang, X.; Zong, J.; Xiao, X. Assessing spatial-temporal dynamics of urban expansion, vegetation
greenness and photosynthesis in megacity Shanghai, China during 2000–2016. Remote Sens. Environ. 2019, 233, 111374. [CrossRef]
182. Lin, Y.; Zhang, H.; Lin, H.; Gamba, P.E.; Liu, X. Incorporating synthetic aperture radar and optical images to investigate the
annual dynamics of anthropogenic impervious surface at large scale. Remote Sens. Environ. 2020, 242, 111757. [CrossRef]
183. Liu, D.; Chen, N.; Zhang, X.; Wang, C.; Du, W. Annual large-scale urban land mapping based on Landsat time series in Google
Earth Engine and OpenStreetMap data: A case study in the middle Yangtze River basin. ISPRS J. Photogramm. Remote Sens. 2019,
159, 337–351. [CrossRef]
184. Mugiraneza, T.; Nascetti, A.; Ban, Y. Continuous Monitoring of Urban Land Cover Change Trajectories with Landsat Time Series
and LandTrendr-Google Earth Engine Cloud Computing. Remote Sens. 2020, 12, 2883.
185. Lin, J.; Jin, X.; Ren, J.; Liu, J.; Liang, X.; Zhou, Y. Rapid Mapping of Large-Scale Greenhouse Based on Integrated Learning
Algorithm and Google Earth Engine. Remote Sens. 2021, 13, 1245. [CrossRef]
186. Carneiro, E.; Lopes, W.; Espindola, G. Urban Land Mapping Based on Remote Sensing Time Series in the Google Earth Engine
Platform: A Case Study of the Teresina-Timon Conurbation Area in Brazil. Remote Sens. 2021, 13, 1338. [CrossRef]
187. Zhang, Z.; Wei, M.; Pu, D.; He, G.; Wang, G.; Long, T. Assessment of Annual Composite Images Obtained by Google Earth Engine
for Urban Areas Mapping Using Random Forest. Remote Sens. 2021, 13, 748. [CrossRef]
188. Samat, A.; Gamba, P.; Wang, W.; Luo, J.; Li, E.; Liu, S.; Du, P.; Abuduwaili, J. Mapping Blue and Red Color-Coated Steel Sheet
Roof Buildings over China Using Sentinel-2A/B MSIL2A Images. Remote Sens. 2022, 14, 230. [CrossRef]
189. Parks, S.A.; Holsinger, L.M.; Koontz, M.J.; Collins, L.; Whitman, E.; Parisien, M.-A.; Loehman, R.A.; Barnes, J.L.; Bourdon, J.-F.;
Boucher, J.; et al. Giving Ecological Meaning to Satellite-Derived Fire Severity Metrics across North American Forests. Remote
Sens. 2019, 11, 1735. [CrossRef]
190. Quintero, N.; Viedma, O.; Urbieta, I.R.; Moreno, J.M. Assessing Landscape Fire Hazard by Multitemporal Automatic Classification
of Landsat Time Series Using the Google Earth Engine in West-Central Spain. Forests 2019, 10, 518. [CrossRef]
191. Long, T.; Zhang, Z.; He, G.; Jiao, W.; Tang, C.; Wu, B.; Zhang, X.; Wang, G.; Yin, R. 30 m Resolution Global Annual Burned Area
Mapping Based on Landsat Images and Google Earth Engine. Remote Sens. 2019, 11, 489. [CrossRef]
192. Bar, S.; Parida, B.R.; Pandey, A.C. Landsat-8 and Sentinel-2 based Forest fire burn area mapping using machine learning algorithms
on GEE cloud platform over Uttarakhand, Western Himalaya. Remote Sens. Appl. Soc. Environ. 2020, 18, 100324. [CrossRef]
193. Sulova, A.; Arsanjani, J.J. Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning
within Google Earth Engine. Remote Sens. 2021, 13, 10. [CrossRef]
194. Zhang, Z.; He, G.; Long, T.; Tang, C.; Wei, M.; Wang, W.; Wang, G. Spatial Pattern Analysis of Global Burned Area in 2005 Based
on Landsat Satellite Images. IOP Conf. Ser. Earth Environ. Sci. 2020, 428, 012078. [CrossRef]
195. Seydi, S.; Akhoondzadeh, M.; Amani, M.; Mahdavi, S. Wildfire Damage Assessment over Australia Using Sentinel-2 Imagery and
MODIS Land Cover Product within the Google Earth Engine Cloud Platform. Remote Sens. 2021, 13, 220. [CrossRef]
196. Arruda, V.L.; Piontekowski, V.J.; Alencar, A.; Pereira, R.S.; Matricardi, E.A. An alternative approach for mapping burn scars using
Landsat imagery, Google Earth Engine, and Deep Learning in the Brazilian Savanna. Remote Sens. Appl. Soc. Environ. 2021, 22, 100472.
[CrossRef]
197. Waller, E.K.; Villarreal, M.L.; Poitras, T.B.; Nauman, T.W.; Duniway, M.C. Landsat time series analysis of fractional plant cover
changes on abandoned energy development sites. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2018, 73, 407–419. [CrossRef]
198. Lobo, F.D.L.; Souza-Filho, P.W.M.; Novo, E.M.L.D.M.; Carlos, F.M.; Barbosa, C.C.F. Mapping Mining Areas in the Brazilian
Amazon Using MSI/Sentinel-2 Imagery (2017). Remote Sens. 2018, 10, 1178. [CrossRef]
199. Xiao, W.; Deng, X.; He, T.; Chen, W. Mapping Annual Land Disturbance and Reclamation in a Surface Coal Mining Region Using
Google Earth Engine and the LandTrendr Algorithm: A Case Study of the Shengli Coalfield in Inner Mongolia, China. Remote
Sens. 2020, 12, 1612. [CrossRef]
200. Balaniuk, R.; Isupova, O.; Reece, S. Mining and Tailings Dam Detection in Satellite Imagery Using Deep Learning. Sensors 2020,
20, 6936. [CrossRef]
201. Fuentes, M.; Millard, K.; Laurin, E. Big geospatial data analysis for Canada’s Air Pollutant Emissions Inventory (APEI): Using
google earth engine to estimate particulate matter from exposed mine disturbance areas. GISci. Remote Sens. 2019, 57, 245–257.
[CrossRef]
202. He, T.; Xiao, W.; Zhao, Y.; Deng, X.; Hu, Z. Identification of waterlogging in Eastern China induced by mining subsidence: A case
study of Google Earth Engine time-series analysis applied to the Huainan coal field. Remote Sens. Environ. 2020, 242, 111742.
[CrossRef]
203. Zhou, L.; Luo, T.; Du, M.; Chen, Q.; Liu, Y.; Zhu, Y.; He, C.; Wang, S.; Yang, K. Machine Learning Comparison and Parameter
Setting Methods for the Detection of Dump Sites for Construction and Demolition Waste Using the Google Earth Engine. Remote
Sens. 2021, 13, 787. [CrossRef]
204. Chrysoulakis, N.; Mitraka, Z.; Gorelick, N. Exploiting satellite observations for global surface albedo trends monitoring. Arch.
Meteorol. Geophys. Bioclimatol. Ser. B 2018, 137, 1171–1179. [CrossRef]
205. Chastain, R.; Housman, I.; Goldstein, J.; Finco, M.; Tenneson, K. Empirical Cross Sensor Comparison of Sentinel-2A and 2B MSI,
Landsat-8 OLI, and Landsat-7 ETM+ Top of Atmosphere Spectral Characteristics over the Conterminous United States. Remote
Sens. Environ. 2019, 221, 274–285. [CrossRef]
206. Demuzere, M.; Bechtel, B.; Mills, G. Global transferability of local climate zone models. Urban Clim. 2018, 27, 46–63. [CrossRef]
207. Ranagalage, M.; Murayama, Y.; Dissanayake, D.; Simwanda, M. The Impacts of Landscape Changes on Annual Mean Land
Surface Temperature in the Tropical Mountain City of Sri Lanka: A Case Study of Nuwara Eliya (1996–2017). Sustainability 2019,
11, 5517. [CrossRef]
208. Medina-Lopez, E.; Ureña-Fuentes, L. High-Resolution Sea Surface Temperature and Salinity in the Global Ocean from Raw
Satellite Data. Remote Sens. 2019, 11, 2191. [CrossRef]
209. Besnard, S.; Carvalhais, N.; Arain, M.A.; Black, A.; Brede, B.; Buchmann, N.; Chen, J.; Clevers, J.; Dutrieux, L.P.; Gans, F.; et al.
Memory effects of climate and vegetation affecting net ecosystem CO2 fluxes in global forests. PLoS ONE 2019, 14, e0211510.
[CrossRef]
210. Elnashar, A.; Zeng, H.; Wu, B.; Zhang, N.; Tian, F.; Zhang, M.; Zhu, W.; Yan, N.; Chen, Z.; Sun, Z.; et al. Downscaling TRMM
Monthly Precipitation Using Google Earth Engine and Google Cloud Computing. Remote Sens. 2020, 12, 3860. [CrossRef]
211. Yu, B.; Chen, F.; Muhammad, S. Analysis of satellite-derived landslide at Central Nepal from 2011 to 2016. Environ. Earth Sci.
2018, 77, 331. [CrossRef]
212. Cho, E.; Jacobs, J.M.; Jia, X.; Kraatz, S. Identifying Subsurface Drainage using Satellite Big Data and Machine Learning via Google
Earth Engine. Water Resour. Res. 2019, 55, 8028–8045. [CrossRef]
213. Uddin, K.; Matin, M.A.; Meyer, F.J. Operational Flood Mapping Using Multi-Temporal Sentinel-1 SAR Images: A Case Study
from Bangladesh. Remote Sens. 2019, 11, 1581. [CrossRef]
214. Vanama, V.S.K.; Mandal, D.; Rao, Y.S. GEE4FLOOD: Rapid mapping of flood areas using temporal Sentinel-1 SAR images with
Google Earth Engine cloud platform. J. Appl. Remote Sens. 2020, 14, 034505. [CrossRef]
215. Ghaffarian, S.; Rezaie Farhadabad, A.; Kerle, N. Post-Disaster Recovery Monitoring with Google Earth Engine. Appl. Sci. 2020, 10, 4574.
[CrossRef]
216. Kakooei, M.; Baleghi, Y. A two-level fusion for building irregularity detection in post-disaster VHR oblique images. Earth Sci.
Inform. 2020, 13, 459–477. [CrossRef]
217. Padarian, J.; Minasny, B.; McBratney, A. Using Google’s cloud-based platform for digital soil mapping. Comput. Geosci. 2015, 83,
80–88. [CrossRef]
218. Ivushkin, K.; Bartholomeus, H.; Bregt, A.K.; Pulatov, A.; Kempen, B.; de Sousa, L. Global mapping of soil salinity change. Remote
Sens. Environ. 2019, 231, 111260. [CrossRef]
219. Poppiel, R.R.; Lacerda, M.P.C.; Safanelli, J.L.; Rizzo, R.; Oliveira, M.P., Jr.; Novais, J.J.; Demattê, J.A.M. Mapping at 30 m Resolution
of Soil Attributes at Multiple Depths in Midwest Brazil. Remote Sens. 2019, 11, 2905. [CrossRef]
220. Cao, B.; Domke, G.M.; Russell, M.B.; Walters, B.F. Spatial modeling of litter and soil carbon stocks on forest land in the
conterminous United States. Sci. Total Environ. 2018, 654, 94–106. [CrossRef]
221. Greifeneder, F.; Notarnicola, C.; Wagner, W. A Machine Learning-Based Approach for Surface Soil Moisture Estimations with
Google Earth Engine. Remote Sens. 2021, 13, 2099. [CrossRef]
222. Zhang, M.; Zhang, M.; Yang, H.; Jin, Y.; Zhang, X.; Liu, H. Mapping Regional Soil Organic Matter Based on Sentinel-2A and
MODIS Imagery Using Machine Learning Algorithms and Google Earth Engine. Remote Sens. 2021, 13, 2934. [CrossRef]
223. Gómez-Chova, L.; Amorós-López, J.; Mateo-García, G.; Muñoz-Marí, J.; Camps-Valls, G. Cloud masking and removal in remote
sensing image time series. J. Appl. Remote Sens. 2017, 11, 015005. [CrossRef]
224. Mateo-García, G.; Gómez-Chova, L.; Amorós-López, J.; Muñoz-Marí, J.; Camps-Valls, G. Multitemporal Cloud Masking in the
Google Earth Engine. Remote Sens. 2018, 10, 1079. [CrossRef]
225. Yin, Z.; Ling, F.; Foody, G.M.; Li, X.; Du, Y. Cloud detection in Landsat-8 imagery in Google Earth Engine based on a deep
convolutional neural network. Remote Sens. Lett. 2020, 11, 1181–1190. [CrossRef]
226. Li, J.; Wang, L.; Liu, S.; Peng, B.; Ye, H. An automatic cloud detection model for Sentinel-2 imagery based on Google Earth Engine.
Remote Sens. Lett. 2021, 13, 196–206. [CrossRef]
227. Zhang, X.; Qiu, Z.; Peng, C.; Ye, P. Removing cloud cover interference from Sentinel-2 imagery in Google Earth Engine by fusing
Sentinel-1 SAR data with a CNN model. Int. J. Remote Sens. 2021, 43, 132–147. [CrossRef]
228. Meraner, A.; Ebel, P.; Zhu, X.X.; Schmitt, M. Cloud removal in Sentinel-2 imagery using a deep residual neural network and
SAR-optical data fusion. ISPRS J. Photogramm. Remote Sens. 2020, 166, 333–346. [CrossRef]
229. Carrasco-Escobar, G.; Manrique, E.; Ruiz-Cabrejos, J.; Saavedra, M.; Alava, F.; Bickersmith, S.; Prussing, C.; Vinetz, J.M.; Conn, J.;
Moreno, M.; et al. High-accuracy detection of malaria vector larval habitats using drone-based multispectral imagery. PLoS Negl.
Trop. Dis. 2019, 13, e0007105. [CrossRef]
230. Ascensão, F.; Yogui, D.R.; Alves, M.; Medici, E.P.; Desbiez, A. Predicting spatiotemporal patterns of road mortality for medium-
large mammals. J. Environ. Manag. 2019, 248, 109320. [CrossRef]
231. Lyons, M.B.; Brandis, K.J.; Murray, N.J.; Wilshire, J.H.; McCann, J.A.; Kingsford, R.T.; Callaghan, C.T. Monitoring large and
complex wildlife aggregations with drones. Methods Ecol. Evol. 2019, 10, 1024–1035. [CrossRef]
232. Pérez-Romero, J.; Navarro-Cerrillo, R.M.; Palacios-Rodriguez, G.; Acosta, C.; Mesas-Carrascosa, F.J. Improvement of Remote
Sensing-Based Assessment of Defoliation of Pinus spp. Caused by Thaumetopoea pityocampa Denis and Schiffermüller and
Related Environmental Drivers in Southeastern Spain. Remote Sens. 2019, 11, 1736.
233. Liss, B.; Howland, M.D.; Levy, T.E. Testing Google Earth Engine for the automatic identification and vectorization of archaeological
features: A case study from Faynan, Jordan. J. Archaeol. Sci. Rep. 2017, 15, 299–304. [CrossRef]
234. Orengo, H.; Garcia-Molsosa, A. A brave new world for archaeological survey: Automated machine learning-based potsherd
detection using high-resolution drone imagery. J. Archaeol. Sci. 2019, 112, 105013. [CrossRef]
235. Orengo, H.A.; Conesa, F.C.; Garcia-Molsosa, A.; Lobo, A.; Green, A.S.; Madella, M.; Petrie, C.A. Automated detection of
archaeological mounds using machine-learning classification of multisensor and multitemporal satellite data. Proc. Natl. Acad.
Sci. USA 2020, 117, 18240–18250. [CrossRef] [PubMed]
236. Hagenaars, G.; de Vries, S.; Luijendijk, A.P.; de Boer, W.P.; Reniers, A.J. On the accuracy of automated shoreline detection derived
from satellite imagery: A case study of the sand motor mega-scale nourishment. Coast. Eng. 2018, 133, 113–125. [CrossRef]
237. Vos, K.; Harley, M.D.; Splinter, K.D.; Simmons, J.A.; Turner, I.L. Sub-annual to multi-decadal shoreline variability from publicly
available satellite imagery. Coast. Eng. 2019, 150, 160–174. [CrossRef]
238. Cao, W.; Zhou, Y.; Li, R.; Li, X. Mapping changes in coastlines and tidal flats in developing islands using the full time series of
Landsat images. Remote Sens. Environ. 2020, 239, 111665. [CrossRef]
239. Traganos, D.; Poursanidis, D.; Aggarwal, B.; Chrysoulakis, N.; Reinartz, P. Estimating Satellite-Derived Bathymetry (SDB) with
the Google Earth Engine and Sentinel-2. Remote Sens. 2018, 10, 859. [CrossRef]
240. Sagawa, T.; Yamashita, Y.; Okumura, T.; Yamanokuchi, T. Satellite Derived Bathymetry Using Machine Learning and Multi-
Temporal Satellite Images. Remote Sens. 2019, 11, 1155. [CrossRef]
241. Tedesche, M.E.; Trochim, E.D.; Fassnacht, S.R.; Wolken, G.J. Extent Changes in the Perennial Snowfields of Gates of the Arctic
National Park and Preserve, Alaska. Hydrology 2019, 6, 53. [CrossRef]
242. Qi, M.; Liu, S.; Yao, X.; Xie, F.; Gao, Y. Monitoring the Ice Phenology of Qinghai Lake from 1980 to 2018 Using Multisource Remote
Sensing Data and Google Earth Engine. Remote Sens. 2020, 12, 2217. [CrossRef]
243. Yang, L.; Cervone, G. Analysis of remote sensing imagery for disaster assessment using deep learning: A case study of flooding
event. Soft Comput. 2019, 23, 13393–13408. [CrossRef]
244. Davies, D.K.; Murphy, K.J.; Michael, K.; Becker-Reshef, I.; Justice, C.O.; Boller, R.; Braun, S.A.; Schmaltz, J.E.; Wong, M.M.; Pasch,
A.N.; et al. The Use of NASA LANCE Imagery and Data for Near Real-Time Applications. In Time-Sensitive Remote Sensing;
Lippitt, C.D., Stow, D.A., Coulter, L.L., Eds.; Springer: New York, NY, USA, 2015; pp. 165–182, ISBN 9781493926022.
245. Lippitt, C.D.; Stow, D.A.; Riggan, P.J. Application of the remote-sensing communication model to a time-sensitive wildfire
remote-sensing system. Int. J. Remote Sens. 2016, 37, 3272–3292. [CrossRef]
246. Hoffmann, J.; Borgeaud, S.; Mensch, A.; Buchatskaya, E.; Cai, T.; Rutherford, E.; de Las Casas, D.; Hendricks, L.A.; Welbl, J.;
Clark, A.; et al. Training Compute-Optimal Large Language Models. arXiv 2022, arXiv:2203.15556.
247. Banko, M.; Brill, E. Scaling to Very Very Large Corpora for Natural Language Disambiguation. In Proceedings of the 39th Annual
Meeting of the Association for Computational Linguistics, Toulouse, France, 6–11 July 2001; pp. 26–33.
248. Press, G. Andrew Ng Launches A Campaign for Data-Centric AI. Available online: https://www.forbes.com/sites/gilpress/2021/06/16/andrew-ng-launches-a-campaign-for-data-centric-ai/ (accessed on 25 April 2022).
249. Pratt, L.Y. Discriminability-Based Transfer between Neural Networks. In Advances in Neural Information Processing Systems 5;
Hanson, S.J., Cowan, J.D., Giles, C.L., Eds.; Morgan-Kaufmann: Burlington, MA, USA, 1993; pp. 204–211.
250. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [CrossRef]
251. Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 1345–1459. [CrossRef]
252. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. In Proceedings of the International
Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2018; pp. 270–279.
253. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE
2021, 109, 43–76. [CrossRef]
254. Li, C.; Zhang, S.; Qin, Y.; Estupinan, E. A systematic review of deep transfer learning for machinery fault diagnosis. Neurocomputing
2020, 407, 121–135. [CrossRef]
255. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
256. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
257. Bar, Y.; Diamant, I.; Wolf, L.; Lieberman, S.; Konen, E.; Greenspan, H. Chest Pathology Detection Using Deep Learning with
Non-Medical Training. In Proceedings of the 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), New York,
NY, USA, 16–19 April 2015; pp. 294–297.
258. Maaten, L.; Chen, M.; Tyree, S.; Weinberger, K. Learning with Marginalized Corrupted Features. In Proceedings of the International
Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 410–418.
259. Gillies, M.; Fiebrink, R.; Tanaka, A.; Garcia, J.; Bevilacqua, F.; Heloir, A.; Nunnari, F.; Mackay, W.; Amershi, S.; Lee, B.; et al.
Human-Centred Machine Learning. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in
Computing Systems, San Jose, CA, USA, 7–12 May 2016.
260. Wu, Q. geemap: A Python package for interactive mapping with Google Earth Engine. J. Open Source Softw. 2020, 5, 2305.
[CrossRef]
261. Aybar, C.; Wu, Q.; Bautista, L.; Yali, R.; Barja, A. rgee: An R package for interacting with Google Earth Engine. J. Open Source
Softw. 2020, 5, 2272. [CrossRef]
262. Huntington, J.L.; Hegewisch, K.C.; Daudert, B.; Morton, C.G.; Abatzoglou, J.T.; McEvoy, D.J.; Erickson, T. Climate Engine: Cloud
Computing and Visualization of Climate and Remote Sensing Data for Advanced Natural Resource Monitoring and Process
Understanding. Bull. Am. Meteorol. Soc. 2017, 98, 2397–2410. [CrossRef]
263. Li, H.; Wan, W.; Fang, Y.; Zhu, S.; Chen, X.; Liu, B.; Hong, Y. A Google Earth Engine-enabled software for efficiently generating
high-quality user-ready Landsat mosaic images. Environ. Model. Softw. 2018, 112, 16–22. [CrossRef]
264. Yang, L.; Driscol, J.; Sarigai, S.; Wu, Q.; Lippitt, C.D.; Morgan, M. Towards Synoptic Water Monitoring Systems: A Review of AI
Methods for Automating Water Body Detection and Water Quality Monitoring Using Remote Sensing. Sensors 2022, 22, 2416.
[CrossRef] [PubMed]