Remote Sensing
Review
Google Earth Engine and Artificial Intelligence (AI):
A Comprehensive Review
Liping Yang 1,2,3, * , Joshua Driscol 1,2 , Sarigai Sarigai 1,2 , Qiusheng Wu 4 , Haifei Chen 5
and Christopher D. Lippitt 1,2
Abstract: Remote sensing (RS) plays an important role in gathering data in many critical domains (e.g., global climate change, risk assessment and vulnerability reduction of natural hazards, resilience of ecosystems, and urban planning). Retrieving, managing, and analyzing large amounts of RS imagery poses substantial challenges. Google Earth Engine (GEE) provides a scalable, cloud-based geospatial retrieval and processing platform. GEE also provides access to the vast majority of freely available, public, multi-temporal RS data and offers free cloud-based computational power for geospatial data analysis. Artificial intelligence (AI) methods are a critical enabling technology for automating the interpretation of RS imagery, particularly on object-based domains, so the integration of AI methods into GEE represents a promising path towards operationalizing automated RS-based monitoring programs. In this article, we provide a systematic review of relevant literature to identify recent research that incorporates AI methods in GEE. We then discuss some of the major challenges of integrating GEE and AI and identify several priorities for future research. We developed an interactive web application designed to allow readers to intuitively and dynamically review the publications included in this literature review.

Keywords: Google Earth Engine (GEE); artificial intelligence (AI); machine learning; deep learning; computer vision; remote sensing; cloud computing; geospatial big data; review

Citation: Yang, L.; Driscol, J.; Sarigai, S.; Wu, Q.; Chen, H.; Lippitt, C.D. Google Earth Engine and Artificial Intelligence (AI): A Comprehensive Review. Remote Sens. 2022, 14, 3253. https://doi.org/10.3390/rs14143253

Academic Editor: Jaime Zabalza
science (GIScience) and remote sensing (RS) [6]. Efficient collection, management, storage,
analysis, and visualization of big data have become critical for the development of intelli-
gent decision systems and provide unprecedented opportunities for business, science, and
engineering [7]. Handling the 5 “Vs” (volume, variety, velocity, veracity, and value [8]) of
big data is still a very challenging task. This is even more challenging for RS imagery due
to its large volume (i.e., high resolution and multiple bands) and long timespan; geospatial
big data pose significant challenges to conventional geographic information systems (GIS)
as well as RS approaches and platforms [9–13].
Geospatial big data, especially RS big data, have posed substantial challenges due
to their large volume, high spatial-temporal resolution, and complexity. One of the very
promising and practical solutions for analyzing RS big data is Google Earth Engine (GEE).
GEE is a scalable, cloud-based geospatial retrieval and processing platform. It also provides
access to the vast majority of freely available, public, multi-temporal RS data and offers free
cloud-based computational power for geospatial data analysis [14–16]. More specifically,
GEE provides free access to a multi-PB archive of geospatial datasets spanning over 40 years
of historical and current Earth observation (EO) imagery, including satellite imagery (e.g.,
Sentinel from the European Space Agency (ESA), Landsat from the United States Geologi-
cal Survey (USGS), Moderate Resolution Imaging Spectroradiometer (MODIS) from the
National Aeronautics and Space Administration (NASA), the Cropland Data Layer (CDL)
from the United States Department of Agriculture's (USDA) National Agricultural
Statistics Service (NASS), and the National Agriculture Imagery Program (NAIP), also from
the USDA), airborne imagery, weather and climate datasets, as well as digital elevation
models (DEMs) [14,16]. Those RS data can be efficiently imported and processed on the
cloud platform, avoiding the need to download data to local computers for processing [17].
Along with computing and storage resources, GEE also supports many RS algorithms
(e.g., image enhancement, image classification, and cloud masking), which are readily
accessible and customizable and allow data processing and visualization at different scales
through JavaScript or Python Application Programming Interfaces (APIs) [14,16,18,19]. These capabilities reduce most of the time-consuming preprocessing steps needed in traditional RS
approaches. The computational power of GEE along with its comprehensive data catalog
and data processing methods make GEE an ideal platform for solving geospatial big data
problems. GEE allows researchers and practitioners to focus on developing and solving
their domain problems by making it easier to retrieve data and algorithms and to compute
all in one place. For example, the Landsat archive on GEE is already preprocessed for
atmospheric and topographic effects—this saves researchers and practitioners a substantial
amount of time and effort in downloading and preprocessing data [16]. GEE, with its free planetary-scale geospatial big data (addressing the data availability, data storage, and data preprocessing challenges) and free computing resources, makes computationally demanding geospatial big data analysis feasible for researchers and practitioners with minimal local computing and storage resources. GEE, in the parlance of the RS Communication
Model, reduces the number of channels required to construct an RS system, and therefore
the time required to go from query to result [20]. Researchers from a wide range of fields are
able to generate multiscale (local, national, regional, continental, and global scale) insights
that would have been nearly impossible without the geospatial big data and computing
capacity available in GEE [21].
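To make the preprocessing that GEE automates concrete, the sketch below reproduces in plain Python, on toy data, two of the server-side steps mentioned above: masking cloud-flagged observations and reducing an image time series to a per-pixel median composite. The function and variable names are ours and purely illustrative; on GEE these would be ImageCollection operations invoked through the JavaScript or Python API.

```python
from statistics import median

def median_composite(time_series, cloud_flags):
    """Per-pixel median over time, ignoring cloud-flagged observations.

    time_series: list of 'images'; each image is a list of pixel values.
    cloud_flags: parallel list of per-pixel booleans (True = cloudy).
    """
    n_pixels = len(time_series[0])
    composite = []
    for p in range(n_pixels):
        clear = [img[p] for img, flags in zip(time_series, cloud_flags)
                 if not flags[p]]
        # Fall back to None when every observation of a pixel is cloudy.
        composite.append(median(clear) if clear else None)
    return composite

# Three toy single-band images of four pixels each.
images = [[0.10, 0.30, 0.50, 0.20],
          [0.12, 0.90, 0.55, 0.22],
          [0.11, 0.32, 0.95, 0.24]]
flags = [[False, False, False, False],
         [False, True,  False, False],   # pixel 1 cloudy in image 2
         [False, False, True,  False]]   # pixel 2 cloudy in image 3

print(median_composite(images, flags))
```

The median reducer discards the anomalously bright cloudy values (0.90 and 0.95) automatically, which is why median compositing is such a common first step in the cloud-based workflows reviewed here.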
GEE provides the free cloud-computing platform to tackle geospatial big data chal-
lenges, and recent substantial advances in artificial intelligence (AI) can and will further
elevate the power of GEE. We cover three of AI's main subdisciplines in this paper: computer vision (CV), machine learning (ML), and ML's subdomain, deep learning (DL). These
technologies are central to leveraging big data for applications in many domains and have
achieved significant advances in a wide range of applications that have a high social impact,
such as damage assessment and prediction of natural disasters (e.g., automatic flooding
damage assessment [1] and wildfire prediction [22]) and healthcare [23–25]. Geospatial
artificial intelligence (GeoAI) combines methods in spatial science (e.g., GIScience and RS),
AI, data mining, and high-performance computing to extract meaningful knowledge from
geospatial big data [26]. GeoAI stems from GIScience methods applied to RS data but has
advanced the field of AI to solve geospatial-specific big data challenges and problems.
There are substantial separate bodies of research covering AI (especially CV, ML and DL)
and GEE. However, much less research directly combines AI and GEE. Allowing researchers
and practitioners to harness the power of both GEE and AI for their research and real-world
problems is the core motivation for us to investigate a range of recent developments that
combine GEE and AI. Thus, our paper can serve as an academic bridge for researchers and
practitioners in GEE and AI, highlighting how scientists are using GEE and AI and in which
domain areas. Researchers and practitioners in GEE and AI will gain strength from each other
and thus move the science forward more effectively and efficiently, making it possible
to tackle global challenges such as those relevant to climate change.
1.1. Selection Criteria for Reviewed Papers and Brief Graphic Summary
There is a substantial body of work on GEE (e.g., see recent reviews [14,18,27–29]) and
AI for RS (especially DL, ML and CV used in a RS setting, see recent reviews in [30,31]),
respectively. However, much less research has gone into detailing the integration of GEE
with AI. In the literature review process, we initially identified 500+ papers relevant to GEE.
We then performed a systematic search based on the following strategies: (1) keyword
search on Google Scholar: the keywords used for our literature search are “Google Earth
Engine” AND “machine learning” OR “deep learning” OR “computer vision”; (2) references
tracking: we went through the papers cited in recent GEE reviews ([14,18,27–29]) (i.e., the
“References” list of the papers) and also tracked the last two years’ worth of new papers
citing the existing GEE review papers on their Google Scholar page. Note that our search
was restricted to research articles published in English and in peer-reviewed journals or
conference proceedings. A total of 200 highly relevant articles were identified
by excluding the papers that purely use GEE for RS data download or those that do not
use AI (including its branches CV, ML, DL). Figure 1 shows the spatial distribution and
statistics summary of the papers covered in this review. The number of published papers by
year (2015 to 2022) has dramatically increased since 2019. “Remote Sensing” and “Remote
Sensing of Environment” are the leading journals where most GEE and AI papers are
published. In addition, most first authors' institutions are based in China and the United
States. (Note that a freely accessible interactive version of the map and all charts throughout
the paper can be accessed via our web app tool; the web app tool URL and its brief demo
video are provided in Appendix A).
1.2. Roadmap
Here, we provide a roadmap to the rest of the paper. Section 2 outlines the scope
of this review and our intended audience. Section 3 is the core of the paper, focused
on identifying important and recent developments and their implications in terms of
applications (Section 3.2) and novel methods (Section 3.3) that leverage GEE and AI.
Section 3 covers a wide array of recent research combining GEE and AI from multiple
domains with many cross-connections. The paper concludes in Section 4 with a discussion
of key challenges and opportunities, from both application (Section 4.2) and technical
(Section 4.3) perspectives. Specifically, we focus on the main challenges preventing GEE
and AI integration, as well as some possible future research directions. To make the
substantial number of papers we reviewed (200 total) more transparent and easier to
retrieve and understand, we developed an interactive web tool (see Appendix A for details).
As evaluation metrics are essential for measuring the performance of AI/ML/DL/CV
models, we provide a set of commonly used evaluation metrics in Appendix B. To make
the main text of the paper concise, each application area detailed in Section 3.2 contains a
table and brief textual summary of the papers in that field. However, a more detailed and
comprehensive summary for each section can be found in Appendix C for those that are
for the GEE and AI research and practice community. Through our deep, thorough, and
interactive investigation (see Appendix A for a visual, interactive investigation using our
web app iLit4GEE-AI), we hope to develop a basis for a smoother and deeper integration of GEE
and AI, which will help move many domains forward. Further, many of the domains presented
in this paper (Section 3.2) are highly related, as different aspects of our environment are
inherently linked. By aggregating research across domains and making it searchable and
filterable, we hope to spur innovation, collaboration, and code sharing between researchers
in the pursuit of tackling cross-disciplinary, complex issues such as those related to global
warming. For example, water body identification, deforestation monitoring, and wildfire
detection are all separate domains, but researchers and practitioners in different domains
may use common data sources, processing methods, and algorithms in their final results.
As we continue to compile papers written at the intersection of GEE and AI via our web
app tool iLit4GEE-AI, it will become easier for researchers to find relevant literature and
code resources even if they are from different areas of study.
Figure 2. Word-cloud visualization of all the reviewed 200 papers that leverage GEE and AI.
Figure 3 shows that most published work leveraging the power of GEE integrated with AI is still at the application stage and that there is room to develop novel methods to advance earth observation in relevant fields. To break this down further, in (b) we can see that ML is the dominant method, and in (c) the most-used tasks are classification. In Figure 4, the primary applications that have applied GEE integrated with AI are crop, LULC, vegetation, wetland, water, and forest, and the primary study areas are China, Brazil, and the United States. The most-used RS data types are Landsat 8 OLI and Sentinel-2. From Figure 5, we see that the most-used ML models are RF, SVM, and CART, while the top evaluation metrics used are overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and Kappa. (Note that a freely accessible interactive version of the map and charts can be accessed via our web app tool; the web app tool URL and its brief demo video are provided in Appendix A; also, in Appendix B, we provide our resources for an introduction to a list of commonly used evaluation metrics).
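All four of these map-accuracy metrics derive from the error (confusion) matrix. As a self-contained illustration (the numbers are toy values, not from any reviewed study), the following Python computes OA, per-class PA and UA, and Cohen's Kappa, using the common convention of rows as map (classified) labels and columns as reference labels:

```python
def accuracy_metrics(cm):
    """OA, per-class PA/UA, and Cohen's Kappa from a confusion matrix.

    cm[i][j] = number of samples mapped as class i whose reference label is j.
    """
    n = sum(sum(row) for row in cm)
    k = len(cm)
    diag = [cm[i][i] for i in range(k)]
    row_tot = [sum(cm[i]) for i in range(k)]                        # map totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # reference totals
    oa = sum(diag) / n
    pa = [diag[j] / col_tot[j] for j in range(k)]   # producer's accuracy
    ua = [diag[i] / row_tot[i] for i in range(k)]   # user's accuracy
    # Chance agreement expected from the marginal totals.
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return oa, pa, ua, kappa

# Toy 2-class error matrix: 50 + 35 correct out of 100 samples.
cm = [[50, 10],
      [5, 35]]
oa, pa, ua, kappa = accuracy_metrics(cm)
print(f"OA={oa:.2f}  PA={pa}  UA={ua}  Kappa={kappa:.3f}")
```

Note that PA and UA answer different questions: PA is accuracy from the map producer's perspective (how much of the reference class was captured), UA from the map user's perspective (how trustworthy a mapped label is).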
a specific paper used novel methods or went beyond a straightforward application of data and methods on GEE. In this paper, we define a very narrow view of what a novel method is: using ML/DL/CV models and algorithms in new ways on GEE (see Section 3.3 for more details on novel GEE methods). This means that even if a paper combined data in a new way or formed a new data preprocessing method, their paper was deemed an application since our focus is on ML/DL/CV methods. Each paper we reviewed in Section 3.2 is grouped into a specific subsection, ranging from 2 total citations (Bathymetric mapping) to 37 citations (Crop mapping).

Figure 4. Statistics of the reviewed papers in terms of application focus (a), study area (b), and RS data type used (c).
Figure 5. Statistics of models compared and evaluation metrics used in the reviewed 200 papers. (a) Models used and compared in the reviewed studies and (b) evaluation metrics used.
3.2. Advances in Applications
We organized the following subsections according to total citation count. Thus, readers will start in the thematic research area with the highest number of citations that use ML/DL/CV on GEE. As readers move through Section 3.2, they will then encounter topics with a less developed presence on GEE (that the authors are aware of) that also utilize ML/DL/CV. For each subsection, a table with information such as study area, RS data type, and what sort of ML/DL model or CV algorithm the authors used will accompany each reference. Note that each table in this section is ordered chronologically to show trends in data type and model usage. Each table will be accompanied by a word cloud showing terms from paper titles and keywords given by the authors. For Sections 3.2.8–3.2.18, there are not enough publications to make the word clouds informative, so in addition to titles and keywords we also include the abstract text. Below each table are accompanying summaries for each reference in the table. References with an "*" next to them denote novel methods, which are detailed in Section 3.3.

Note that papers could be divided up into several different sections. Thus, there is some subjectivity in this assignment of categories to different papers, so readers should be aware that there is overlap. Generally, most papers could be classified as covering LULC or "land use and land cover". Agriculture, vegetation, water, and forests are all classes commonly classified in LULC analyses. Our intention was to represent the focus of a given paper. For example, if a paper predicted for several classes in their analysis but their focus was on producing deforestation maps, even for papers that had "LULC" in their title, then their paper will be under "Forest and deforestation monitoring". Similarly, if a paper was creating vegetation maps of tidal flats, then this paper is under "Vegetation mapping" and not "Wetland mapping". As another example, if authors were monitoring vegetation or water indices in RS imagery but their goal was to monitor reclamation progress or pollution levels at mining sites, their papers would be found under "Heavy industry and pollution monitoring". Only in the case where the goal was to expressly create a general LULC map would that paper go under "Land cover classification".
(CART), and k-means models; and the most-used evaluation metrics are user’s accuracy
(UA), producer's accuracy (PA), Kappa, and R². A brief summary of those studies is
provided right below Table 1. More detailed textual summaries for most of the reviewed
crop mapping studies are provided in Appendix C.1.
Table 1. Studies targeting crop mapping from RS imagery using AI (Note that references marked *
denote novel methods and will be detailed in Section 3.3).
Figure 6. Word-cloud visualization of all the reviewed papers targeting crop mapping (i.e., those 37 papers summarized in Table 1).
Agricultural expansion can cause harmful effects to ecosystems and their levels of
biodiversity. Producing crop-type maps using RS imagery and ML is one way to help mon-
itor agricultural expansion over large areas, and these maps in turn can help policymakers
and land-use managers make more informed decisions about current and future land use.
However, creating the maps themselves normally requires a lot of data and it is not a
straightforward task to pick an ML model that will perform well with that data. There is
also the concern that the predictions from that ML model will be uninterpretable, given that
many ML and DL models are so-called “black boxes”. To get around this issue, the authors
in [79] trained a maximum likelihood model and a fuzzy-rules classifier to determine paddy
rice distribution in Iran. Plants look very different in RS imagery depending on the type of
imagery that you use, but also over the course of a plant’s lifetime. This is especially true
of crops like rice, so it is important to incorporate phenological information in order to be
able to monitor it over time. Over a three-year time period, the authors in [75] were able
to map paddy rice using Sentinel imagery by utilizing several different spectral indices
and creating composites of different paddy rice growth periods. Continued agricultural
expansion threatens many ecosystems around the globe with high levels of biodiversity.
Being able to monitor agricultural expansion is one part in being able to make timely
decisions related to water and soil health in addition to pollution levels caused by fertilizer
use. Mapping croplands over a large scale with NNs and high-resolution RS imagery has
resulted in highly accurate maps, but NNs are computationally expensive to train. A U-Net
was used in [71] to map sugarcane in Thailand but used a lightweight NN as an encoder for
the DL model to reduce computing costs. Sugarcane grows in rainy conditions in complex
landscapes, making mapping it difficult. However, using phenology information can help
identify sugarcane in high-resolution RS imagery, as shown in [56]. The performance of an ANN was compared to that of CART, RF, and SVM models on GEE for sugarcane mapping in China using Sentinel-2 imagery. Shade-grown coffee landscapes are critical to biodiversity
in the forested tropics, but mapping it is difficult because of mountainous terrain, cloud
cover, and spectral similarities to more traditional forested landscapes. Landsat, precip-
itation, and DEM data were used in [50] to map shade-grown coffee in Nicaragua using
an RF model. Accuracy scores across different land class types (including shade-grown coffee) were high; a relative variable importance analysis was also performed to determine which data contributed most to the RF model's performance. It is difficult to know beforehand the effect
different datasets will have on producing LULC maps. It is therefore useful to compare the
performance of an ML classifier on different datasets, like Landsat and Sentinel imagery, so
that future researchers know which datasets fit their application. The differences between
Landsat and Sentinel imagery were explored in [78] for identifying cotton in China over
the course of the plant’s life cycle.
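The phenology-based compositing idea behind several of these studies (e.g., the paddy rice mapping in [75]) can be illustrated with a small sketch: compute a spectral index such as NDVI per acquisition date, then summarize it per growth-period window to obtain a per-pixel feature vector. The window names, dates, and reflectance values below are invented for illustration only.

```python
from statistics import median

def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one observation."""
    return (nir - red) / (nir + red)

def growth_period_features(observations, windows):
    """Median NDVI per growth-period window for a single pixel.

    observations: list of (day_of_year, nir, red) tuples.
    windows: dict mapping window name -> (start_doy, end_doy), inclusive.
    """
    series = [(doy, ndvi(nir, red)) for doy, nir, red in observations]
    feats = {}
    for name, (start, end) in windows.items():
        vals = [v for doy, v in series if start <= doy <= end]
        feats[name] = median(vals) if vals else None
    return feats

# Invented observations for one rice pixel across a season.
obs = [(130, 0.30, 0.20), (150, 0.35, 0.15),
       (190, 0.60, 0.10), (210, 0.65, 0.08)]
windows = {"transplanting": (120, 160), "heading": (180, 220)}
print(growth_period_features(obs, windows))
```

The resulting per-window medians are exactly the kind of phenology-aware features that can then be fed to an RF or SVM classifier instead of raw single-date reflectance.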
Crop maps are increasingly being produced at the national and global levels, but this
process requires a lot of compute resources. Cloud computing offers free access to data and
computing, yet many studies producing crop maps and crop yield estimates do not take
advantage of these resources. In the United States, crop yield estimates for soybeans start very
late in the season, but early estimates are needed to inform management decisions like when
to harvest. The authors in [55] used a CNN–LSTM hybrid model to predict soybean yield in
the contiguous United States using RS imagery alongside weather data and showed that the
hybrid approach works better than either CNN or LSTM alone, although the results were
better in some states than others. Additionally, the authors created combinations of input
data to determine which variables were most important in training their NN. Still, the authors
had to move their DL training off the GEE platform because it did not, at the time, support
NN architectures. Many variables, including climate/weather, fertilizer, soil, economic,
and hydrological data, can be incorporated into crop yield prediction simulation models.
However, this amount of data, needed to make the crop models accurate, is often not available
in specific countries or is too time-consuming and cost-intensive to collect and maintain. RS
imagery can help fill this need by providing open data over long temporal scales and global
coverage, regardless of country. The authors in [66] demonstrated that by using climate and
soil data with RS imagery on the GEE platform, it was possible to predict winter wheat yields
1–2 months ahead of harvesting in China. Producing crop type maps is often a useful first step
in predicting crop yield. However, crop type maps that are derived from lower-resolution RS
data suffer from uncertainties in areas where soil, crops, and plants are heavily mixed. Current
cropland products only focus on a subset of staple crops. Optical and SAR Sentinel data were
combined in [72] to create higher-resolution maps capable of displaying information on less
commonly mapped non-staple crops in the US.
It is challenging to map cropland extent over large countries or regions in a rapid,
repeatable, and accurate manner. This is in part due to the large amount of RS imagery
that is usually required to make these maps, in addition to needing to access validation
datasets in comparable formats across geo-political boundaries. Even when this is possible,
crop maps are created using coarse RS imagery, limiting the utility of the output crop maps.
In [16], the authors fed RS imagery along with elevation and government data in Australia
and China into an RF model to produce crop extent maps at 30 m, 250 m, and 1 km
resolutions. It is difficult to achieve continuous, cloud-free imagery in Australia and China
over time, so this analysis depends on creating bi-monthly composites. The authors noted
that this analysis could have benefitted from a larger dataset in addition to comparing more
classification algorithms to help reduce uncertainties from the RF model. Leaf area index (LAI) and fraction of photosynthetically active radiation (FPAR) are two important features when trying to
produce crop extent maps and crop yield estimates. However, most current products for
producing crop extent maps and crop yield estimates are derived from low-resolution RS
imagery. In order to produce these maps and estimates at a higher resolution, the authors
in [76] utilized GEE, Sentinel-2 and field data to train an RF to first estimate LAI and FPAR
at a much finer spatial scale.
Global crop maps often fail to capture small farms because the resolution of the RS
imagery used to create the maps is too coarse. Additionally, agricultural areas change over
time, and so the underlying validation data (which are hard to acquire in the first place) often change. Thus, producing high-resolution maps that accurately track crop production across agricultural areas over time has proved difficult. Landsat-8 and Sentinel-2
imagery were combined in [47] with elevation data to produce a crop map across continental
Africa on the GEE platform. Crop maps that are produced to cover a large area are often
created from coarse RS imagery. This poses problems with identifying small or fragmented
farms, as well as farms that are mixed-use or have several crop types over the same small area.
Several attempts have been made to map land-use classes over large areas, but these maps do
not focus specifically on crops and so their utility to food production studies is limited. To
address these issues, [54] used RS imagery from several different platforms (GeoEye, Landsat,
NGA, Quickbird, WorldView) to produce a 30-m resolution crop map for Southeast and
Northeast Asia. Using an RF model, the authors achieved high accuracy rates across several
crop type classes and made the resulting data layer public. However, to create cloud-free
scenes from optical imagery across countries, the authors had to rely on multi-year composites.
The authors noted that in the future, a harmonized Landsat–Sentinel dataset would be useful
to expand spatial and temporal data coverage.
Sustainable management of agricultural water resources requires improved under-
standing of irrigation patterns in space and time. Annual irrigation maps (1999–2016) in the
US Northern High Plains were produced in [49] by combining all available Landsat satellite
imagery with climate and soil covariables in an RF classification workflow. In [51], the
authors implemented an automatic irrigation mapping procedure in GEE that uses surface
reflectance satellite imagery from different sensors (Landsat 7/8, Sentinel-2, MODIS Terra
and Aqua imagery, SRTM DEM). A rapid method was developed to map Landsat-scale
(30 m) irrigated croplands in [58] across the conterminous United States (CONUS). The
method was based upon an automatic generation of training samples for most areas based
on the assumptions that irrigated crops appear greener than non-irrigated crops and had
limited water stress.
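The sample-generation assumption in [58] (irrigated fields look greener and less water-stressed than non-irrigated ones) can be sketched as a simple percentile rule. The thresholds, values, and function name here are invented for illustration; the actual study's procedure is more involved.

```python
def label_irrigation_samples(peak_greenness, pct_irrigated=0.8, pct_rainfed=0.3):
    """Auto-label candidate training pixels by peak-season greenness rank.

    Pixels at or above the pct_irrigated quantile of greenness are labeled
    'irrigated', those at or below the pct_rainfed quantile 'non-irrigated',
    and the ambiguous middle is left unlabeled (None).
    """
    ranked = sorted(peak_greenness)
    hi = ranked[int(pct_irrigated * (len(ranked) - 1))]
    lo = ranked[int(pct_rainfed * (len(ranked) - 1))]
    labels = []
    for g in peak_greenness:
        if g >= hi:
            labels.append("irrigated")
        elif g <= lo:
            labels.append("non-irrigated")
        else:
            labels.append(None)  # too ambiguous to use as training data
    return labels

# Invented peak-season NDVI values for ten candidate pixels.
greenness = [0.21, 0.35, 0.82, 0.44, 0.78, 0.25, 0.69, 0.88, 0.31, 0.52]
print(label_irrigation_samples(greenness))
```

The appeal of such automatic labeling is that it removes the manual training-sample bottleneck, which is what makes continental-scale (CONUS-wide) mapping rapid and repeatable.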
Cropland classification is highly dependent on RS imagery resolution, the scale of a
given analysis, the processing steps, and the input training data. Coarse resolution cropland
data products have been found to contain large errors, but even higher resolution maps
tend to have low accuracy rates and overestimate overall crop area. An open-source map
was created in [19] for several West African countries using an RF model trained on Landsat
data. The amount of RS data collected is increasing every day. This poses a problem for
how best to analyze RS imagery and extract useful information from it, regardless of the
EO domain. The authors in [77] implemented a dynamic feature importance tool that
automatically finds the most important subset of input features for identifying crop types
in China. They then fed these features to the SNIC algorithm and then to an RF on GEE and
combined the output predictions with growth period information to produce crop-type
maps that incorporate plant phenology. By incorporating growth stage information as
an input feature to the ML model, the authors achieved a 6–7% boost in OA, precision,
and recall across different crops like rice, maize, and soybeans. In their paper, the authors
showed that red edge, NDVI, red, SWIR2, and aerosol information contributed the most to
their analysis. However, the authors themselves stated that their method was unstable due
to the nature of their feature importance algorithm. Depending on what data was chosen
with their feature importance algorithm, the accuracy of the method fluctuated. Thus, their
method was good for reducing data size and should be used when compute is limited,
though using all of the data in a given time series was shown to work better.
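The dynamic-feature-importance idea in [77] (keep only the most informative subset of input bands and indices when compute is limited) can be caricatured in a few lines. The importance scores below are invented, merely echoing the bands the study reported as most informative; the study's actual algorithm, tied to RF importances and growth periods, is considerably richer.

```python
def select_top_features(importances, k):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Invented importance scores for candidate input features.
scores = {"red_edge": 0.23, "NDVI": 0.21, "red": 0.17,
          "SWIR2": 0.14, "aerosol": 0.11, "blue": 0.08, "green": 0.06}
print(select_top_features(scores, 5))
# -> ['red_edge', 'NDVI', 'red', 'SWIR2', 'aerosol']
```

As the authors of [77] observed, such subset selection trades accuracy stability for smaller data volumes, so it is best reserved for compute-limited settings.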
resolution data and creating maps over much larger areas. The words “Sentinel” and
“Africa” illustrate this point well.
Table 2. Studies targeting LULC from RS imagery using AI (Note that references marked * denote
novel methods and will be detailed in Section 3.3).
Figure 7. Word-cloud visualization of all the reviewed papers targeting LULC application (i.e., those 27 papers summarized in Table 2).
From our interactive web app (see Appendix A) and Table 2, Landsat 8 OLI, SRTM
Table 2. Studies targeting LULC from RS imagery using AI (Note that references marked * denotes
DEM, and Google Earth are mostly used. The most popular AI models are RF, CART,
novel methods and will be detailed in Section 3.3).
and SVM, and the mostly used evaluation metrics are overall accuracy (OA), PA, UA, and
References Kappa.
Method A brief summary of those studies is provided
Model Comparison right below TableStudy
RS Data Type 2. More detailed
Area
textual summaries for most of the reviewed land cover classification studies are provided
in Appendix C.2. Landsat 8 OLI TOA,
Azzari and Lobell
classification RF Sentinel-2 MSI TOA, Zambia
(2017) [80]
SRTM DEM
DMSP NTL,
Midekisa et al. (2017) Globeland30, Hansen
classification RF Africa (continent)
Remote Sens. 2022, 14, 3253 16 of 110
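The metrics recurring in Table 2 all derive from the confusion matrix: OA is the diagonal fraction, PA and UA are per-class recall and precision read off the same matrix, and Kappa corrects OA for chance agreement. A minimal computation of OA and Kappa from paired labels:

```python
def confusion_counts(y_true, y_pred, classes):
    """Counts of (true class, predicted class) pairs."""
    return {(t, p): sum(1 for a, b in zip(y_true, y_pred) if a == t and b == p)
            for t in classes for p in classes}

def accuracy_metrics(y_true, y_pred):
    """Overall accuracy (OA) and Cohen's Kappa; PA/UA for a class are
    its row- and column-normalized diagonal entries."""
    n = len(y_true)
    classes = sorted(set(y_true) | set(y_pred))
    m = confusion_counts(y_true, y_pred, classes)
    oa = sum(m[(c, c)] for c in classes) / n
    # expected chance agreement from row/column marginals
    pe = sum((sum(m[(c, p)] for p in classes) / n) *
             (sum(m[(t, c)] for t in classes) / n) for c in classes)
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 1.0
    return oa, kappa
```

Kappa near zero indicates agreement no better than chance even when OA looks high, which is why the reviewed studies report it alongside OA.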
LULC maps can help decision-makers and land managers make more informed
decisions about the environment. Still, producing LULC maps with ML and RS data
requires a lot of compute and labeled input training data. GEE currently offers free compute,
so researchers can use the data that they are interested in without having to worry about
hardware setup or compute time. The authors in [102] took advantage of this to create an
LULC map in Northern Iran, predicting for water, rangelands, built-up areas, orchards,
and other LULC classes. They used Landsat RS imagery, field observations, and historical
datasets to train CART, RF, and SVM models. The SVM performed better than the CART
and RF models but, perhaps more importantly, the authors also ran a spatial uncertainty
analysis to show each model's confidence level on the output maps. More research should
incorporate uncertainty into reported metrics or into the maps produced with ML, to
better convey a model's certainty to both citizens and decision-makers.
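How such per-pixel confidence is surfaced depends on the model. For tree ensembles like RF, one common proxy (an illustration only, not the exact uncertainty analysis of [102]) is the fraction of trees agreeing with the majority vote:

```python
from collections import Counter

def ensemble_confidence(votes):
    """Given one pixel's class votes from an ensemble (e.g., RF trees),
    return (majority_class, confidence), where confidence is the
    fraction of members agreeing with the majority."""
    counts = Counter(votes)
    cls, n = counts.most_common(1)[0]
    return cls, n / len(votes)
```

Mapping this confidence value alongside the predicted class produces exactly the kind of spatial uncertainty layer advocated above.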
Storing RS data across different machines and running different ML algorithms on them
currently carries high data and computational costs. A further challenge is that most RS
analyses depend on optical data, which is often obscured by clouds and
shadows. In addition, most land cover maps are coarse in resolution and often use class
definitions that differ between products, making them not directly comparable. To be of
real use, these static maps need to be more accurate and frequently updated; cloud
computing, with data and algorithms co-located in one place, has made both feasible. An
RF model was used in [80] to determine land-use classes such as
vegetation, croplands, and urban areas from Landsat imagery in Zambia. An approach was
presented in [81] to quantify continental land cover and impervious surface changes over
continental Africa for 2000–2015 using Landsat images and an RF classifier on GEE. Simple
change detection based on Landsat images from two different years with two different
phenophases yields unsatisfactory results and may induce many misclassifications and
pseudo-change identifications because of the phenological differences between RS images.
A land-use/land-cover type discrimination method based on a CART was proposed in [82],
which applied change-vector analysis in posterior probability space (CVAPS) and the best
histogram maximum entropy method for change detection, and further improved the
accuracy of the land-updating results in combination with NDVI timing analysis. The last
land-cover map of Iran was produced with MODIS imagery in 2016. Now, there are much
higher resolution satellite data products, but it is difficult to collect more ground-truth
validation data. Cloud computing and ML can help produce newer land cover classification
maps that are easy to reuse. Such a workflow was designed in [93] on GEE for Iran using
Sentinel-1 and -2 data with an RF model and SNIC. With ground-truth training samples
available, the authors used SNIC to segment land-use classes into objects, while the RF
model classified them at the pixel level.
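How pixel-level RF predictions and SNIC objects are then combined is not spelled out above; a common pattern is to assign each segment the majority class of its pixels, which suppresses isolated pixel noise. A sketch under that assumption:

```python
from collections import Counter, defaultdict

def object_vote(pixel_labels, segment_ids):
    """Assign each segment (e.g., a SNIC superpixel) the majority class
    of its per-pixel classifications, then relabel every pixel with its
    segment's class."""
    by_seg = defaultdict(list)
    for lab, seg in zip(pixel_labels, segment_ids):
        by_seg[seg].append(lab)
    seg_class = {s: Counter(v).most_common(1)[0][0]
                 for s, v in by_seg.items()}
    return [seg_class[s] for s in segment_ids]
```

The result is a per-pixel map that is spatially coherent within each object.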
Numerous efforts have been made to end poverty around the globe. Mapping land-
use changes in poverty areas can provide insights into the poverty reduction progress.
Landsat images available on GEE were utilized in [83] to map annual land-use changes in
China’s poverty-stricken areas. An open-source land cover mapping processing pipeline
was created in [87] using GEE. The authors argued that land cover maps specifically can
help countries properly plan for sustainable levels of food production, but that many
developing countries did not have the financial or compute resources to monitor land
classes in real time. Using SVM and bagged trees (BT) models, the authors predicted urban,
agriculture, tree, vegetation, water, and barren land-use types in Lesotho.
In RS imagery, many different land-use types have similar spectral signatures or
are very complex, making them difficult to identify properly. Several different ML
models available on GEE were trained in [92] with different combinations of input data to
determine which were the most important for identifying land-use types in Golden Gate
Highlands National Park, South Africa. Although RS and ML have allowed LULC analysis to become
ever more accurate for general LULC classes, it is still challenging to correctly identify
land subtypes. For example, while classifying vegetation to a high degree of accuracy has
become more commonplace, identifying vegetation subtypes like shrubs or grassland is
not as straightforward, especially in mixed-use areas. In addition, as is the case for many
RS applications, it is challenging to know which types of input data will contribute to a
given ML model’s ability to learn these subtypes. Therefore, the authors in [95] set out
to compare the contribution of SAR data and different indices (NDVI, EVI, SAVI, NDWI)
derived from optical data on overall classifier performance. A land cover map of the whole
African continent at 10 m resolution was generated in [98], using multiple data sources
including Sentinel-2, Landsat-8, Global Human Settlement Layer (GHSL), Night Time Light
(NTL) Data, SRTM, and MODIS Land Surface Temperature (LST). Different combinations
of data sources were tried to determine the best data input configurations. Pixel-based
classification methods often suffer from “salt-and-pepper” noise in their end predictions.
Object-based classifiers can help alleviate this problem but are not commonly used because
of their high compute overhead. While GEE does not have many object-based classifiers, it
does provide free compute. To take advantage of this while comparing the performance of
pixel-based and object-based classification methods, [100] produced LULC maps in Italy
using Landsat, Planet, and Sentinel RS imagery. The authors compared the performance
of RF and SVM models alone with that of the same models used in conjunction with the
SNIC and gray-level co-occurrence matrix (GLCM) texture data. Their results showed
that pixel-based methods worked better at lower resolutions (i.e., using Landsat data),
whereas object-based methods worked better for higher-resolution RS imagery. The best
classifier was the RF model trained with SNIC and incorporating GLCM data. Still, the
authors noted that ML model performance was heavily influenced by the input data, feature
engineering, the target classes, and the study area. Many studies
evaluate ML methods and the effect that input data sources have on their performance.
Less research has been done on how data sampling strategies affect ML
classifiers. The authors in [101] compared different data sampling strategies and their
effects on how different ML classifiers performed on LULC tasks. A multi-seasonal sample
set was collected in [88] for global land cover mapping in 2015 from Landsat 8 images.
The concept of “stable classification” was used to estimate how much the training sample
can be reduced, and how much land cover change or image interpretation error can be
tolerated.
Mountain Land Cover (MLC) classification can be relatively challenging due to high
spatial heterogeneity and the cloud contamination in optical satellite imagery over the
mountainous areas. Distribution of Land Cover (LC) classes in these areas is mostly
imbalanced. To date, three approaches have been proposed to address the class imbalance
problem: (1) applying specific classification methods by focusing on the learning of minority
classes, (2) assigning higher weights on minority classes by adjusting classifiers, and
(3) rebalancing training datasets (e.g., oversampling and under-sampling techniques). A
hybrid data-balancing method, called Partial Random Over-Sampling and Random
Under-Sampling (PROSRUS), was proposed in [96] to resolve the class imbalance issue,
which otherwise reduces classification accuracy for infrequent and rare LC classes.
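Rebalancing strategy (3) above is straightforward to sketch. The following balances every class down to the count of the rarest class, i.e., the under-sampling half of hybrid methods like PROSRUS (the over-sampling half is omitted here):

```python
import random
from collections import Counter

def undersample(X, y, seed=0):
    """Randomly under-sample majority classes so every class retains as
    many samples as the rarest one; a simple fix for imbalanced LC
    training sets, at the cost of discarding majority-class samples."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    Xb, yb = [], []
    for cls in set(y):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        for i in rng.sample(idx, n_min):
            Xb.append(X[i])
            yb.append(cls)
    return Xb, yb
```

Repeating this draw with different seeds and training one classifier per draw yields the ensemble flavor of such methods.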
A new method was proposed in [97] by integrating random under-sampling of majority
classes and an ensemble of Support Vector Machines, namely Random Under-sampling
Ensemble of Support Vector Machines (RUESVMs).

Rapid urban expansion puts pressure
on local ecosystems and human well-being, so urban sustainability studies are increasingly
turning to applications that process large amounts of geospatial data and model ecosystem
services. Currently, it is not straightforward for urban or ecology scientists to use cloud-
based platforms like GEE, as their processing routines are more complicated than the many
common mapping applications (e.g., classification) available on GEE. While determining
ecosystem service values is complicated (many disciplines, many opinions, etc.), GEE
was used in [94] to illustrate a processing workflow for how LULC classes can be used to
compute more complex ecosystem service values.
Watersheds around the world are under stress, both due to climate change and human
disturbance. LULC maps can help with planning and conservation decisions, but they are
often difficult to produce because doing so is compute-intensive. GEE has helped many
researchers by providing freely available data, methods, and compute, but researchers
often find that they run into compute limits on the platform before they can complete
their analyses. To overcome these compute limits in GEE, the authors in [103] used feature
reduction techniques and designed their own parallel processing algorithms to produce
an LULC map across several Middle Eastern countries. To get a better idea of how water
resources were being affected by LULC classes, the authors combined topographic data,
spectral data, RS image composites, and texture information to train a combined SNIC-RF
model. They achieved high accuracy across several LULC classes and showed feature
importances for each class in their analysis. However, the authors noted that other than
SNIC, advanced object-based classification and segmentation algorithms were not available
on GEE.
Table 3. Studies targeting forest change and deforestation from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Lee et al. (2016) [107] | classification | CART, MD, RF | Landsat 8 | Indonesia
Wang et al. (2019) [15] | classification | RF | ALOS PALSAR, GlobeLand30-2010, Hansen Global Forest Change dataset, JRC Yearly Water Classification History, Landsat 5 TM, Landsat 7 ETM+, RapidEye, TerraClass-2010, USGS Global Tree Cover 2010 | Brazil
Voight et al. (2019) [108] | classification | CART, Markov Chain model, MLP | Google Earth, Landsat MSS, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Belize

Figure 8. Word-cloud visualization of all the reviewed papers targeting forest and deforestation monitoring (i.e., those 20 papers summarized in Table 3).

Forests provide many ecosystem services, from preventing soil erosion and regulating the
hydrological cycle to providing shelter for many plant and animal species. However,
deforestation is occurring at a rate that makes it impossible for individual species to
recover, and as it accelerates there are cascading effects for entire ecosystems. In
Brazil, agriculture, ranching, and land occupation are causing the vast forest of the
Amazon to become fragmented. Still, it is difficult to monitor the changes through time
due to cloud cover and the rate at which new satellite imagery arrives every day. The
authors in [117] showed how GEE can be used to overcome data storage and compute needs,
analyzing about 20 years' worth of Landsat data to determine forest cover changes. Land
use maps can help inform policymakers and land-use managers but are often static and of
coarse resolution. It would be more useful to create these maps in a repeatable manner,
one in which code and data could be reused for making decisions based on up-to-date
information. Sentinel-2 data were analyzed in [120], and several different ML classifiers
were trained to distinguish between four different forest types in Italy during both
summer and winter seasons. Monitoring tree species distribution is an important metric
for assessing overall forest health and quantifying current carbon storage. However,
doing so is difficult without high-resolution RS data, much of which is either private
and inaccessible or too expensive to collect (in the case of LiDAR or UAS data). Recent
research
types there. The Amazon Rainforest is home to much of the world’s biodiversity and plays
an important role in natural carbon sequestration. However, this region is experiencing
high rates of deforestation due to the expansion of agriculture and cattle farming. It remains
challenging, though, to monitor such a large area given its size and biological complexity
and use that information to produce forest change projections into the future. An RF
was used in [122] for initial LULC classification, and an MLP was then used to simulate
possible deforestation scenarios into the future.
Mapping how much carbon forests sequester remains difficult because current tech-
niques rely on mapping forested versus deforested landscapes. However, a major source of
uncertainty stems from the fact that degraded forests, ones open to selective logging, are
not a separate class but can emit carbon heavily even though they are counted as forested
regions. This issue was addressed in [15] by mapping disturbed forest areas in Brazil using
27 years of Landsat surface reflectance imagery.
Table 4. Studies targeting vegetation mapping from RS imagery using AI (note that references marked * denote novel methods and are detailed in Section 3.3).

Figure 9. Word-cloud visualization of all the reviewed papers targeting vegetation mapping (i.e., those 18 papers summarized in Table 4).
information. RS data can help monitor rangelands with a large spatial scope and a short
return time, making them key to informing land management decisions in a timely manner.
Using climate and field data alongside Landsat imagery and MODIS land-use maps, ML
models used in [21] were able to predict for several important rangeland indicators like
plant height, total vegetation and rock cover, as well as bare soil.
Invasive species can degrade ecosystems and harm biodiversity as well as soil and water
quality. It is often difficult to monitor invasive species in coastal environments from optical RS
imagery, though, because of frequent cloud cover. A specific invasive species in China was
used in [136] as a case study for developing an ML pipeline that takes into account both cloud
cover and phenological information. Invasive species can have harmful environmental effects
as they disrupt ecosystem balances. Long-term datasets, like those for the grass S. alterniflora,
are not always available, making such species difficult to detect using RS methods. In order to
produce a map of this invasive species, field data were collected and processed in [139] in
addition to UAS imagery and optical RS data from several different platforms.
It is often difficult to detect changes in savanna landscapes due to their high hetero-
geneity in vegetation types, which makes it even harder to attribute change to natural or
anthropogenic causes. This is especially problematic in areas like the Brazilian Cerrado
where agricultural expansion is happening on a large scale. In order to clarify what changes
have been happening there, over three decades worth of Landsat imagery was used in [135]
to determine which areas have experienced vegetation change. Wetlands provide many
ecosystem services and provide important habitats for several different plant and animal
species. In order to make informed conservation and policy decisions, it is important not
only to be able to map the current state of wetlands vegetation, but how that vegetation is
changing over time. However, different sets of input data and ML methods used for change
detection of wetland vegetation need to be evaluated more fully as choices made during
preprocessing and hyperparameter tuning can affect the end result of an analysis. The
authors in [138] used an adaptive stacking algorithm to train an ML classifier on optical,
SAR, and DEM data to identify wetland vegetation.
Seagrasses provide many ecosystem services, from carbon storage and habitat provision
for many marine species to the prevention of coastal erosion. However, they are in decline
due to anthropogenic impacts. Mapping their extent is key to being able to conserve
them. Bathymetry and RS data were combined in [127] to create a processing and analysis
pipeline for large-scale seagrass habitat monitoring in Greece using GEE. Grasslands
are often integrated into land-use type or cropland-specific maps, even high-resolution
products. However, different grassland species are not identified and thus are classified as
a single homogeneous land or crop type. This is a problem not just because previous maps
have not separated out different grassland types, but also because such types are difficult
to recognize in RS imagery, where they look very similar. Some experts are able to recognize such classes,
but it is time-consuming to analyze grassland types at scale. Thus, DL techniques that do
not rely on expert knowledge are needed so that these identification systems can work over
large areas over time. A CNN–LSTM hybrid model was used in [132] to identify grassland
types in Sentinel-2 imagery in the United States.
Feature engineering is important in ML, but it is labor-intensive and often requires
domain expertise [1]. As one branch of ML, DL does not need manual feature engineering,
because deep NNs learn features from large annotated datasets, but DL requires much more
training data than classical ML [1]. The authors in [43] addressed this trade-off by
comparing the performance of an RF model with feature engineering to LSTM and U-Net NN
models without feature engineering for identifying pasturelands in Brazil. Monitoring vegetation
on a large spatial scale can be difficult because field data collection takes only snapshots
in time and is labor-intensive and expensive. Instead, methods for measuring vegetation
need to operate repeatedly over time so that change detection is possible. Still, novel methods, such
as those utilizing RS imagery, need to meet current governmental quality standards. An
example of how this can be done is illustrated in [126] in Australia using the GEE platform
by comparing how well several ML classifiers compare to index-based methods like NDVI.
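An index-based baseline of the sort compared against in [126] can be as simple as thresholding NDVI; the 0.3 cutoff below is a common but scene-dependent assumption, not a value taken from that study:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel:
    (NIR − Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def ndvi_vegetation_mask(nir_band, red_band, threshold=0.3):
    """Index-based baseline: flag a pixel as vegetated when its NDVI
    exceeds a fixed threshold."""
    return [ndvi(n, r) > threshold for n, r in zip(nir_band, red_band)]
```

Trained ML classifiers are typically benchmarked against exactly this kind of fixed-threshold rule.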
Although coastal wetland systems are critical habitats for different animal and plant species,
it is difficult to monitor them due to cloud cover and difficulty in obtaining RS imagery
at high and low tides. Previous studies have used single images or spectral time series
to try and identify wetland vegetation, but coastal wetland environments are complex
ecosystems. The same species of plant can look different at different stages of its life while
also being submerged under water in some RS scenes. The authors in [140] argued that
phenology information in RS time series can better capture tidal flat wetland vegetation
and so compared phenology information to statistical (min, max, median) and temporal
features (quartile ranges). Mapping plant functional types is important because it can give
ecosystem modelers and environmental planners a better idea of the spatial distribution of
vegetation. This in turn has implications for how resilient areas and ecosystems are, and
will be, to changing climatic factors like heat stress. However, plant functional type classification
relies on and is often derived directly from current LULC map products that themselves
can contain inaccuracies. To explore how plant functional types can be derived directly
from RS information, [137] trained an RF model on field, DEM, MODIS, and climate data.
Many methods have been developed to estimate different vegetative properties from
RS imagery in response to environmental changes. One such method, Gaussian Process
Regression (GPR), is increasingly used to do so because it is a transparent ML model that also
outputs model uncertainties. However, as environmental and earth scientists move to GEE for
finding and processing data, they may find a lack of GPR models ready to use or train. This is
most likely because GPR models become slow and memory-intensive when trained on large
RS time series imagery. Such a model was implemented in [141] that has been optimized for
green LAI in RS imagery but does so in a way that is also optimized for GEE.
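The GEE-optimized implementation in [141] is specialized; the underlying GPR predictive equations, however, are standard, and a tiny dense-solve version shows both why the method reports uncertainty and why it scales poorly (the kernel solve is cubic in the number of training samples). This is a generic textbook sketch, not the cited implementation:

```python
import math

def rbf(x1, x2, length=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return math.exp(-((x1 - x2) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gpr_predict(xs, ys, xq, noise=1e-6, length=1.0):
    """Zero-mean GPR posterior: mean = k*ᵀ(K + σ²I)⁻¹ y and
    var = k(xq, xq) − k*ᵀ(K + σ²I)⁻¹ k*."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    kstar = [rbf(a, xq, length) for a in xs]
    alpha = solve(K, ys)
    w = solve(K, kstar)
    mean = sum(k * a for k, a in zip(kstar, alpha))
    var = rbf(xq, xq, length) - sum(k * wi for k, wi in zip(kstar, w))
    return mean, var
```

At a training location the posterior mean reproduces the observation and the variance collapses toward the noise level; far from the data, the variance grows back toward the prior, which is the per-pixel uncertainty map GPR offers.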
A brief summary of those studies is provided right below Table 5. More detailed textual
summaries for most of the reviewed water mapping studies are provided in Appendix C.5.
Table 5. Studies targeting water body detection from RS imagery using AI (note that references marked * denote novel methods and are detailed in Section 3.3).
Static surface water maps are often produced at the regional or national level, but do
not show long-term trends resulting from seasonality or global warming’s effects. In [32],
the authors created a web portal using GEE as a backend alongside an expert system to
identify bodies of water in Landsat imagery. RS has been widely used to map and monitor
surface water. In [142], the authors used all available Landsat images to study surface
water dynamics in Oklahoma from 1984 to 2015, finding significant inter-annual
variations in the number of surface water bodies and in surface water areas.
They also found that both the number of surface water bodies and surface water areas had
a positive relationship with precipitation and a negative relationship with temperature.
Floods and heavy precipitation events often occur at times of heavy cloud cover,
making optical imagery not well-suited to water mapping or flood monitoring during those
times. Traditionally, ground-based gauges are used to monitor water level and stream flow,
but only work at specific points, limiting their utility during large-scale flood events. SAR
imagery, however, is often used in water mapping or flood monitoring analyses because
of its ability to see through clouds and work over large spatial scales. This is especially
important for monsoonal regions like Southeast Asia where intense rains can lead to flood
conditions. However, SAR imagery is also susceptible to classification errors when flooding
occurs under tree cover or looks like concrete/pavement in urban areas, so preprocessing
steps should be carefully considered. The authors in [150] analyzed to what degree different
preprocessing steps affect the output water maps using both SAR and DEM data and two
variations of Otsu’s thresholding algorithm. Glacial lake outburst floods (GLOF) are one
of the serious natural hazards in the Himalayan region. To reduce the potential risks of
GLOF, the information about the location and spatial distribution of glacial lakes is critical.
In [143], the authors used Landsat 8 images available on GEE to map glacial lakes in the
Tibet Plateau region. Their results revealed that climate warming played a major role in
glacial lake changes.
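Otsu's algorithm, which [150] applies in two variations, picks the histogram threshold that maximizes between-class variance; a minimal single-band version (the bin count and value range here are illustrative assumptions):

```python
def otsu_threshold(values, bins=256, lo=0.0, hi=1.0):
    """Otsu's method: choose the threshold maximizing between-class
    variance of the histogram, e.g., to split water from non-water
    backscatter in a SAR scene."""
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    total_mean = sum(i * h for i, h in enumerate(hist)) / total
    best_t, best_var = 0, -1.0
    w0 = cum = 0.0
    for t in range(bins):
        w0 += hist[t]
        cum += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = cum / w0, (total_mean * total - cum) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return lo + (best_t + 1) * width
```

For a bimodal distribution the returned threshold falls between the two modes; preprocessing choices (speckle filtering, terrain masking) change the histogram and hence the threshold, which is the sensitivity [150] examines.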
Categorizing urban water resources faces two main challenges. First, it is often difficult
to distinguish between water and things like asphalt or shadows in urban settings using RS
imagery. Second, the distribution of water resources has changed alongside the accelerating
impacts of climate change, making up-to-date, temporally aware water monitoring difficult.
GEE provides free data storage, datasets, and compute, but as of yet high-accuracy DL
models like NNs are not available on the platform. In [151], the authors compared the
performance of MNDWI and an RF to that of a multi-scale CNN (MSCNN) and showed
that the DL method was the most accurate (with fewer false classifications) for identifying
urban water resources in several Chinese cities. While DL receives a lot of attention in
water mapping research, these models still require a lot of input data and large amounts
of compute to train them. However, as compute becomes publicly available in cloud-
based platforms like GEE, obtaining large amounts of labeled training data remains a key
bottleneck to using DL models. One way to make the data labeling process less time- and
resource-intensive was illustrated in [156], where the authors used current water maps and
a segmentation algorithm to automatically collect data labels from Sentinel-1 imagery.
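The MNDWI baseline compared against in [151] is a simple band ratio over the green and SWIR bands; a per-pixel sketch, with the zero cutoff as the usual but scene-dependent choice:

```python
def mndwi(green, swir):
    """Modified NDWI: (Green − SWIR) / (Green + SWIR). Water absorbs
    SWIR strongly, so water pixels push the index above ~0, while
    built-up surfaces and soil tend to fall below it."""
    return (green - swir) / (green + swir)

def water_mask(green_band, swir_band, threshold=0.0):
    # threshold = 0.0 is the common default; tune per scene.
    return [mndwi(g, s) > threshold for g, s in zip(green_band, swir_band)]
```

The weakness noted above follows directly: asphalt and shadow can also satisfy the ratio test, which is where learned classifiers gain their advantage.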
Optical imagery used in surface water mapping analyses is often occluded by clouds,
and many common methods used to map surface water confuse snow, ice, rock, and
shadows as water. DeepWaterMapv2 was released in [147] and aimed to address these
false positive misclassifications.
ML models have achieved high levels of accuracy in identifying water bodies in RS
imagery. However, the models often misclassify soil, rock, clouds, ice, and shadow as water
and often rely on cloud-free, optical RS imagery, which is not always available. The authors
in [157] used masking, filtering, and segmentation algorithms to identify bodies of water in
Sri Lanka in complex, mountainous environments. It is challenging to repeatedly produce
up-to-date, accurate surface water maps over large areas. Water bodies change their shape
and overall distribution through time, and humans use water in ways that look dissimilar
to natural water bodies in RS imagery. Most studies to date focus on one type of water
body (lakes, rivers, etc.) or create a binary classification mask giving little to no detail on
various water body classification types. To explore the potential to distinguish between
surface water body subtypes, [158] used slope, shape, phenology, and flooding information
as input to an RF model to predict for lakes, reservoirs, rivers, wetlands, rice fields, and
agricultural ponds.
The authors in [144] proposed a new method for quickly mapping yearly minimal
and maximal surface water extents. In [148], the authors integrated the global surface water
(GSW) dataset and the SRTM DEM to determine the spatiotemporal patterns of water storage
changes in China’s lakes and reservoirs. Multitemporal, multispectral satellite observations
from the Landsat program and Sentinel constellation are particularly useful in fluvial
geomorphology, in which river channel mapping and the analysis of planimetric change
have long been a focus. The authors in [154] demonstrated a workflow showing how GEE
can be used to extract active river channel masks from a section of the Cagayan River
(Luzon, Philippines).
Satellite RS can be used to estimate chromophoric dissolved organic matter (CDOM)
as a riverine constituent that influences optical properties in surface waters. CDOM ab-
sorption is a common proxy for dissolved organic carbon (DOC) concentrations in inland
waters, including Arctic rivers. The authors in [146] stated that this was the first study
using GEE for RS of water quality parameters in inland waters. Collecting field data for
monitoring water quality can be costly in terms of money, time, and effort. Additionally,
traditional monitoring techniques do not extend over a large area and are often difficult to
repeat over time. Satellite RS imagery can help monitor water quality at frequent intervals
over large areas. To estimate water quality parameters like chlorophyll-a (Chl-a) concentra-
tions, turbidity, and dissolved organic matter, [152] used ML and DL models to analyze
RS imagery. Harmful algal blooms (HABs) have become a serious issue in freshwater
ecosystems. RS has proven to be a cost-effective means for monitoring HABs. The authors
in [153] developed a methodological framework for mapping Chl-a concentrations with
multi-sensor satellite observations and in-situ water quality samples.
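Before ML and DL frameworks like those in [152,153], water quality parameters were (and often still are) estimated with simple empirical regressions of in-situ measurements on band ratios; the least-squares fit at the core of that approach is:

```python
def fit_line(x, y):
    """Ordinary least squares for y ≈ a·x + b, e.g., regressing in-situ
    Chl-a concentrations on a blue/green reflectance ratio (the band
    choice here is illustrative, not taken from the cited studies)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return a, my - a * mx
```

The ML and DL models in the studies above generalize this idea to nonlinear, multi-band relationships calibrated against the same kind of in-situ samples.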
Figure 11. Word-cloud visualization of all the reviewed papers targeting wetland mapping (i.e., those 16 papers summarized in Table 6).

Table 6. Studies targeting wetland mapping from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Hird et al. (2017) [35] | classification | BRT | LiDAR DTM, Sentinel-1, Sentinel-2 | Canada
Farda (2017) [159] | classification | CART, Fast NB, GMO Max Entropy, IKPamir, MLP, Margin SVM, Pegasos, RF, Voting SVM, Winnow | Landsat 3 MMS, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, ASTER GDEM | Indonesia

Wetlands serve as the globe's biggest carbon pool and thus have important ecological
service functions (e.g., water conservation, regulation, and maintenance of species
diversity) [173–175]. Global climate change and human activities have posed dramatic
challenges to wetland ecosystems in the past few decades, and wetland mapping is
essential to conserve and manage terrestrial ecosystems [176]. RS makes investigating
large wetland systems and monitoring their change over time possible [177].

Wetlands are highly dynamic landscapes, often making past efforts to map them
out-of-date. This is especially true at the regional or national level, where it is often
difficult to monitor wetlands at scale due to their remote location and large spatial
scale. While
there are efforts to monitor wetlands in Canada at the sub-regional and -province level,
this is mostly through governmental efforts to produce static maps. Cloud computing on
GEE was utilized in [35] to create an open-source, reproducible map of wetland occurrence
probability using LiDAR and RS data for the entire area of Alberta. Mapping subtypes
of wetlands is difficult because while they look similar in RS imagery, they are diverse
environments that cover a wide area. The same is true for classifying peatlands, a subtype
of wetlands, which cover large geographic areas in complex patterns. This is problematic
because peatlands, like wetlands, provide critical habitats that promote biodiversity while
also being a global carbon sink. Past studies have shown that while optical data are useful
for peatland mapping, it is often occluded by clouds or other atmospheric conditions. SAR
data, on the other hand, can detect bodies of water and vegetation at any time of day or
night, but are prone to being noisy due to surface moisture content and roughness. The
authors in [162] demonstrated that by combining SAR, optical, and LiDAR data on the GEE
platform, a BRT model was able to predict peatland occurrence across Alberta province
with relatively high accuracy at high resolution.
Due to the difficulties in producing wetland inventory maps, either from lack of
field data or the challenge of recognizing wetlands because of their heterogeneous and
fragmented nature, these maps are often only produced at a local level. Furthermore,
because of the many local efforts to produce these maps, wetland inventories are often
produced with different datasets and different methods, limiting the ability of interested
parties/stakeholders to compare or combine maps. Anthropogenic activities are meanwhile
converting these wetlands into agricultural or urban landscapes, in addition to natural
rain and flooding events changing their spatial makeup. Thus, it is more important than
ever to be able to produce wetland inventory class maps in order to monitor and protect
existing wetlands. The authors in [161] used optical and SAR RS imagery to produce a 10 m
resolution wetland map for the entire province of Newfoundland, Canada, using both an
RF model and SNIC. Mapping environmental features like wetlands is the first step in being
able to make informed decisions about conservation and restoration projects. However,
more relevant to policymakers is how environments change over time. This information
would allow them to isolate how human activity has changed wetlands during different
periods. The authors in [170] classified wetlands in Newfoundland during three different
periods to show the spatial dynamics of these ecosystems. There have been several attempts
to produce wetland inventory maps in Canada on a large scale, although they often lack
high spatial resolution and the ability to distinguish between wetland sub-types. There is
also the issue of a lack of ground-truth field data, a common problem in ML applications
in EO (there is overwhelmingly more unlabeled data than labeled data). It was proposed
in [17] to use field data collected from one Canadian province to create wetland inventory
maps for several others using a mix of optical, SAR, and digital elevation data.
Across Canada, wetland mapping is a well-studied problem. However, different
local and regional agency wetland inventories use different techniques for monitoring
wetlands or have altogether different definitions of what constitutes a wetland. Thus,
even though several large-scale wetland maps have been produced, they are often not
directly comparable. Additionally, these maps are often static and do not continually
monitor wetlands through time. However, these are not the only barriers to mapping
wetlands using RS imagery [165]. Others include obtaining sufficient and recent field data
to verify wetland monitoring products, but also the difficulty of monitoring such dynamic
landscapes. Wetlands do not have clear-cut boundaries, are extremely diverse landscapes
and ecosystems, and are often in flux throughout seasons and years due to flooding and
drying. The authors in [165] used optical and SAR Sentinel data in addition to field samples
over the entirety of Canada and showed that almost one-fifth of Canada is covered in wetlands.
The study in [165] produced a high-resolution (10-m) wetland inventory map of Canada
(an approximate area of one billion hectares), using multi-year, multi-source (Sentinel-1
and Sentinel-2) RS data on the GEE platform. Wetlands provide a variety of ecological
services and are a key habitat for many species. Human activity has significantly disturbed
wetlands as they are drained for urban or agricultural development. However, monitoring
their health is challenging because it would require taking repeated field measurements
over wide areas. Researchers have used ML and RS data to do so, but the large amount of
compute needed to map wetlands is often prohibitive. The authors in [160] analyzed a large
number of field samples alongside Landsat imagery with an RF model to produce a wetland
map for all of Canada. Wetland mapping and monitoring have been a challenging issue
for the RS community during the past decades. Unlike the United States, which has the
National Wetlands Inventory, Canada lacked a national wetland inventory until
recently. The authors in [168] proposed an object-based classification method to classify
Sentinel-1 and Sentinel-2 data on the GEE cloud-computing platform, which resulted in the
10-m Canadian Wetland Inventory.
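Object-based methods first segment the image and then label whole segments rather than individual pixels. One common aggregation step is a per-segment majority vote over pixel-level predictions; a schematic sketch (hypothetical class labels, not the pipeline of [168]):

```python
from collections import Counter

def object_based_labels(segment_ids, pixel_classes):
    """Give every segment the majority class among its member pixels."""
    by_segment = {}
    for seg, cls in zip(segment_ids, pixel_classes):
        by_segment.setdefault(seg, []).append(cls)
    return {seg: Counter(classes).most_common(1)[0][0]
            for seg, classes in by_segment.items()}

# Two segments with noisy per-pixel predictions.
segments = [1, 1, 1, 1, 2, 2, 2]
classes = ["bog", "bog", "fen", "bog", "marsh", "marsh", "bog"]
labels = object_based_labels(segments, classes)
```

The vote suppresses isolated pixel-level errors, which is one reason object-based outputs tend to look less "salt-and-pepper" than per-pixel classifications of heterogeneous wetlands.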
Large, inundated wetlands can be effectively mapped using RS imagery. Small wet-
lands or wetlands that are inundated only part of the time are much more difficult to
identify. Yet, it is more important to do so now than ever given that wetlands are rapidly
being converted for agricultural use or are drying up due to climate-induced drying. Moni-
toring wetlands at large scales is possible, however, with the help of automated techniques
like ML. For example, NAIP imagery and LiDAR derived DEM data were used in [163]
to detect wetlands across the northern United States using unsupervised classification on
the GEE platform. Being able to identify wetlands in RS imagery is the first step towards
monitoring their health or decline in a new climate regime, and to make policy choices
based on this information. To this end, spatially high-resolution sensors like LiDAR or
data products like NAIP can help researchers identify wetlands in RS imagery but are not
collected often enough to map wetlands at a fine temporal resolution. This is problematic
because wetlands are dynamic ecosystems; they can be both wet and dry over the course of
the same season. To get around this limitation, Sentinel-1 and 2 imagery were combined
in [171] with aerial photographs and field data to map the spatial variation of wetlands
in portions of the United States over time. Environmental problems are often associated
with land-use changes, but these changes are not solely linked to urban expansion. Land
use change also negatively affects areas like coastal wetlands, which are not monitored as
regularly. The possibility of using GEE to map coastal wetlands in Indonesia was explored
in [159] by comparing all of the different classifiers on the platform and how they perform
with Landsat, digital elevation, and Haralick texture data. The authors showed that in all
cases, ML models did much better at binary than multi-class classification.
Tidal flats, often referred to as coastal non-vegetated areas, are dynamic ecosystems,
both due to their natural rhythms of water advance and retreat, but also due to anthro-
pogenic change and rising sea levels. It is difficult to monitor tidal flats without the use
of multi-temporal, high-resolution RS imagery because of how they change through time.
With Landsat 8 and high-resolution Google Earth imagery, an RF model was used in [164]
on GEE to classify tidal flat types and their distribution in China. The authors reported very
high classification rates across tidal flat classes. However, the authors detailed that satellites
like Landsat did not fully capture tidal ranges. Coastal wetlands are usually composed
of coastal vegetation areas and tidal flats. Coastal tidal flats are natural transitions from
terrestrial ecosystems to ocean ecosystems and are vulnerable to anthropogenic activities
and natural disturbances such as sea-level rise, land reclamation, and aquaculture. Many
existing global land cover data products have a wetland layer, but do not explicitly dif-
ferentiate coastal vegetation area and coastal tidal flats (no specific layer for coastal tidal
flats). The authors in [169] developed a pixel- and frequency-based approach to generate
annual maps of tidal flats at 30-m spatial resolution in China’s coastal zone using the
Landsat TM/ETM+/OLI images and the GEE cloud computing platform. Tidal flats are
unique ecosystems but are threatened due to human disturbances and climate change.
Additionally, they are difficult to identify in RS imagery because satellite platforms cannot
capture intertidal variability due to their infrequent return times. The authors in [172]
addressed this limitation by first processing high-resolution RS and UAS imagery to map
minimum and maximum water and vegetation extent. They used Otsu’s thresholding
algorithm to automatically detect the best ratio for each index. These two indices were then
combined in a composite that showed the total intertidal area in the RS imagery, to which
the authors again applied the Otsu thresholding algorithm. The end result was a highly
accurate map of tidal flats that did not require any post-processing.
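Otsu's algorithm picks the threshold that maximizes the between-class variance of a histogram, which is why it needs no manually tuned cutoff. A compact stdlib-only implementation (illustrative; [172] applies the idea to image-index histograms on GEE):

```python
def otsu_threshold(values, bins=256):
    """Histogram threshold maximizing between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / step), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg = sum_bg = 0
    best_var, best_bin = -1.0, 0
    for i, h in enumerate(hist):
        w_bg += h
        sum_bg += i * h
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue
        mu_bg = sum_bg / w_bg
        mu_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    return lo + (best_bin + 1) * step  # upper edge of the last "background" bin

# Bimodal index values: a "dry" cluster near 10 and a "wet" cluster near 200.
pixels = [8, 9, 10, 11, 12] * 10 + [198, 199, 200, 201, 202] * 10
t = otsu_threshold(pixels)
```

On a clearly bimodal water index, the returned threshold falls between the two clusters, so the split adapts automatically to each scene's histogram.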
Sebkhas are a type of salty, unvegetated wetland created when desert bodies of water
become more salinated over time due to mechanisms of water loss such as evaporation.
They are home to specific species of vegetation and fish that can survive in salinated
environments, but their drainage networks are often underground, making them hard to
identify in RS imagery. An RF model was used in [166] to identify water cavities where
sebkhas form in Morocco.
Wetland inventory maps are increasingly being used to inform carbon pricing, ecosystem
service values, and conservation/restoration decisions. Thus, it is important to make a
repeatable processing pipeline that can ingest, process, and visualize data on a day-to-day
basis so that monitoring programs and reporting programs (like in a government setting)
have up-to-date, accurate information. To this end, there have been many studies identifying
wetlands using RS imagery and ML, yet most of them suffer from not being able to
distinguish between wetland subtypes. This is a challenging issue because fens, peatlands,
bogs, marshes, and swamps can have very different vegetation types and structure. It is
important to be able to distinguish between them because they each respond differently
to human disturbance and changes in climate. The authors in [167] compared the
performance of an XGBoost model to a CNN for wetland type classification.
3.2.7. Infrastructure and Building Detection, Urbanization Monitoring
Infrastructure, building detection, and urbanization monitoring is the 7th-most-well-
developed application using GEE and AI (11 studies total). Table 7 below summarizes
those studies and a word cloud generated from the titles and keywords of those papers
is provided in Figure 12. The most frequently used terms are “Google Earth Engine”,
“urban”, “land”, “building”, “impervious”, etc. The vast majority of the studies in this
domain take place in China and are both static mapping and change-detection applications.
Infrastructure and urban area identification is often done by comparing these classes to
other LULC classes, so we notice that “vegetation” and “forest” also appear in the word
cloud. From our interactive web app (see Appendix A) and Table 7, the most frequently used
RS datasets are Landsat 8 OLI, Landsat 7 ETM+, and Google Earth. The most popular AI
models are RF, CART, and SVM, and the most frequently used evaluation metrics are OA,
Kappa, PA, and UA. A brief summary of those studies is provided below Table 7. More
detailed textual summaries for some selected studies are provided in Appendix C.7.
Figure 12. Word-cloud visualization of all the reviewed papers targeting infrastructure and building
detection, urbanization monitoring (i.e., those 11 papers summarized in Table 7).
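The evaluation metrics that recur throughout this review (OA, Kappa, PA, and UA) are all simple functions of the confusion matrix. A small reference implementation with made-up labels may make the definitions concrete:

```python
def accuracy_metrics(y_true, y_pred):
    """OA, Cohen's kappa, producer's accuracy (PA), and user's accuracy (UA)."""
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    cm = {(a, b): 0 for a in classes for b in classes}
    for t, p in zip(y_true, y_pred):
        cm[(t, p)] += 1
    oa = sum(cm[(c, c)] for c in classes) / n
    # Chance agreement for kappa comes from the row/column marginals.
    pe = sum((sum(cm[(c, p)] for p in classes) / n)
             * (sum(cm[(t, c)] for t in classes) / n) for c in classes)
    kappa = (oa - pe) / (1 - pe)
    pa = {c: cm[(c, c)] / max(sum(cm[(c, p)] for p in classes), 1) for c in classes}
    ua = {c: cm[(c, c)] / max(sum(cm[(t, c)] for t in classes), 1) for c in classes}
    return oa, kappa, pa, ua

y_true = ["urban", "urban", "veg", "veg", "water", "water"]
y_pred = ["urban", "veg", "veg", "veg", "water", "water"]
oa, kappa, pa, ua = accuracy_metrics(y_true, y_pred)
```

PA (omission error) is computed along the reference rows and UA (commission error) along the prediction columns, which is why a class can have perfect PA but poor UA at the same time.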
Table 7. Studies targeting infrastructure and building detection from RS imagery using AI (Note that
references marked * denote novel methods and are detailed in Section 3.3).
Materials like parking lots, roads, and buildings (i.e., concrete, asphalt) can be classified
as “impervious surfaces” in RS analyses and are often indicative of human development
and urban extent. Impervious surfaces change the hydrological cycle and produce heat
effects, affecting overall ecosystem health and well-being. To monitor these materials,
researchers have tried using night-time lights to estimate their extent, but this process leads
to overestimates as light scatters. To investigate how best to identify impervious materials
in RS imagery regardless of cloud cover, the authors in [182] combined nighttime light,
DEM, and SAR data with an RF model on GEE. Their resulting maps were more accurate
than commonly used maps like GlobeLand30. The authors in [180] put forward a new
scheme to conduct long-term monitoring of impervious-relevant land disturbances using
Landsat archives.
While greenhouses are used to grow food and help ensure food security, their prolif-
eration can have environmental consequences. Previous attempts to classify greenhouses
from RS imagery as part of LULC research have focused on small-scale proof-of-concept
applications and have not emphasized identifying the structures in complex terrain types.
To explore the possibility of identifying greenhouses in RS imagery over a large area in
China, an ensemble ML model was designed in [185] to distinguish them from water, forest,
farmland, and construction sites. Urban green spaces have a multitude of benefits, such as
regulating urban climate, improving air quality, and reducing stormwater. RS has proven
useful for studying the landscape structure of urban green spaces. The authors in [179]
assessed the impact of urban form on the landscape structure of urban green spaces in
262 cities in China. The results revealed that cities with a high road density tended to
have a smaller area of urban green spaces and be more fragmented. In contrast, cities with
complex terrains tended to have more fragmented urban green spaces.
Rapid urban expansion around the world has led to worsening human and ecosystem
health, affecting forests, air and water pollution levels, and overall levels of biodiversity.
However, the currently available maps for mapping urban settlements and their expansion
are mostly static, whereas it would be more useful to have up-to-date information to be
able to make better urban planning and land-use decisions. The authors in [186] designed a
workflow for mapping urban sprawl over time in Brazil using an RF on the GEE platform.
Increasing rates of urbanization put pressure on conservation targets and biodiversity levels
as land previously occupied by ecosystems is converted into built-up areas. RS imagery
makes it much easier for urban planners and researchers to monitor rates of urbanization
and urban sprawl over wide areas. However, few labeled datasets are available for applying ML to
identify buildings and built-up areas. To address this problem, a large, vectorized, ground-
truth verified dataset was created in [178] in India in order to train different ML models
on GEE. A semi-automatic large-scale and long-time-series (LSLTS) urban land mapping
framework was demonstrated in [183] by integrating the crowdsourced OpenStreetMap
(OSM) data with free Landsat images to generate annual urban land maps in the middle
Yangtze River basin (MYRB) from 1987 to 2017.
Research on urbanization and urban sprawl will often focus on how urban spaces are
replacing agricultural land and forested spaces. Vegetation maps, on the other hand, are
often produced using “urban”, “built-up areas”, or “impervious surfaces” as classes to
predict for, distinctly separating vegetation and zones of human inhabitation. Much less
work has gone into monitoring vegetation prevalence and distribution within urban spaces
themselves. This is an important and timely research topic given the environmental and
psychological benefits people get from having access to green spaces within cities, such
as stress reduction, better air quality, and lower temperatures. Using different vegetative
indices (EVI, Gross Primary Production, etc.) derived from Landsat and MODIS data, the
authors in [181] showed that urban sprawl in Shanghai had increased significantly in the
last decade and a half.
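Vegetative indices such as EVI are simple band arithmetic over reflectance values. A sketch with the standard MODIS EVI coefficients and fabricated reflectances (the studies above compute these over full Landsat/MODIS scenes rather than single pixels):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)

# Hypothetical surface reflectances for a well-vegetated pixel.
nir, red, blue = 0.5, 0.1, 0.05
v_ndvi = ndvi(nir, red)
v_evi = evi(nir, red, blue)
```

EVI's blue-band term and soil-adjustment factor L make it less prone to saturate over dense canopy than NDVI, which is one reason studies of urban vegetation often report both.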
Table 8. Studies targeting wildfires from RS imagery using AI. (Note that references marked * denote
novel methods and will be detailed in Section 3.3.)
Wildfires cause damage to ecosystems and human health, in addition to releasing
greenhouse gasses when they burn. Climate change increases the number of wildfires
across the globe.
References | Method | Model Comparison | RS Data Type | Study Area
Parks et al. (2019) [189] | regression | RF | Landsat 4 TM, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Canada, United States
Quintero et al. (2019) [190] | segmentation | FormaTrend, LandTrendr | Landsat 5 TM, Landsat ETM+, Landsat OLI, MCD64A1, SRTM DEM | Spain
Long et al. (2019) [191] * | classification | RF, SVM | CBERS-4 MUX, FireCCI51, Gaofen-1 WFV, GFED4, Google Earth, MCD12C1, MOD44B, MTBS, Landsat-8 | Global
Bar et al. (2020) [192] | classification | CART, RF, SVM, Weka clustering | FireCCI51, IRS 1C, Landsat 5, Landsat 8 OLI, MODIS, ResourceSat 2, Sentinel-2, VIIRS | India
Sulova and Jokar Arsanjani (2020) [193] | classification | CART, NB, RF | CGLS-LC100, FIRMS, MOD13Q1, Sentinel-2, SRTM DEM | Australia
Zhang et al. (2020) [194] | classification | RF | Landsat 5 | Global
The recent massive wildfires that hit Australia during the 2019–2020 summer season
raised questions as to what extent the risk of wildfires can be linked to various climate,
environmental, topographical, and social factors, and how to predict fire occurrences so that
preventive measures can be taken. An automated and cloud-based workflow was
developed in [193] for generating a training dataset of fire events at a continental level
using freely available RS data on GEE. Landscape fires have been a major natural hazard
affecting West-Central Spain, and it is therefore critical to be able to map and characterize
them. Using the LandTrendr (Landsat-based Detection of Trends in Disturbance
and Recovery) and FormaTrend (Forest Monitoring for Action-Trend) algorithms on
the GEE cloud-computing platform, a method was proposed in [190] for identifying fire-
induced disturbances. Wildfires are a common occurrence in the Brazilian Cerrado, often
determining and changing the natural plant species in burn cycles. However, the Cerrado
has been undergoing increasing anthropogenic conversion into cropland and pastures,
which has changed hydrological and biogeochemical cycles within this ecosystem. This in
turn has led to changes in fire size, pattern, frequency, and severity, so it is more important
than ever that methods to quickly and reproducibly monitor the fire landscape within this
savannah are created. A completely cloud-based DL workflow combining Google Cloud
and GEE was designed in [196] to classify burn scar areas in Brazil.
Traditional wildfire mapping field surveys and digitization efforts are time-consuming
and hard to reproduce over time. Burned area indices can be created to monitor post-fire
landscapes and their subsequent recovery, but their thresholds are not dynamic and so
perform differently in different locations. Sentinel-2 data were used in [195], along with two
different burned area and LULC maps, to train different ML classifiers (k-nearest neighbor
(KNN), RF, SVM) to map wildfire damage in Australia. As the planet warms, forest fires
are increasing in occurrence and severity. This has negative consequences for ecosystems,
biodiversity, and human health. To estimate the damage caused by forest fires and their
subsequent recovery rates, RS imagery is needed to monitor forests and burn scars over
large areas. However, to date, most fire products are created with coarse RS imagery,
making regional and local fire monitoring difficult. To determine the impact of using
higher-resolution RS data products, how Landsat and Sentinel optical imagery affected an
ML model’s performance in burn area classification was compared in [192].
Burned area maps showing where wildfires have occurred are important in being
able to analyze global wildfire trends. However, many burned area maps derived from
RS imagery are from the MODIS platform. The 250 m spatial resolution of products like
FireCCI51 leave out a lot of detail, so the authors in [191] used CBERS, Gaofen, and Landsat
imagery to create a 30 m burned-area dataset for 2015. However, the authors noted that their
method had difficulty recognizing burned areas from recently plowed fields in agricultural
areas, so crop-type masks should be used to remove potential false positives. Additionally,
Landsat data was used for both the data collection and validation stage. Thus, the authors
were not able to assess the suitability of using Landsat imagery for data collection purposes
despite their high accuracy rates. Later on, [194] adapted the exact same processing steps
on GEE to produce a burned area map for the year 2005, illustrating how sharing and
storing code on GEE makes it easy to re-run analyses or adapt them for new use cases.
Satellite-derived spectral indices such as the relativized burn ratio (RBR) allow fire
severity maps to be produced across multiple fires and broad spatial extents. In order to
better interpret the fire severity in terms of on-the-ground fire effects compared to non-
standardized spectral indices, [189] produced a map of composite burn index (CBI), a
frequently used field-based measure of fire severity.
3.2.9. Heavy Industry and Pollution Monitoring
There are seven studies about heavy industry and pollution monitoring using GEE
and AI. Table 9 below summarizes those studies and a word cloud generated from the titles,
keywords, and abstracts of the seven papers is provided in Figure 14. The most frequently
used words form the phrase “Google Earth Engine”. Most applications in this area are
focused on monitoring reclamation or pollution at active or previous mining sites, so “mine”
and “mining” feature prominently in this word cloud. The algorithm LandTrendr was
used by several papers after identifying mine sites to monitor pollution and water levels
or vegetation changes through time. From our interactive web app (see Appendix A) and
Table 9, the most-used RS datasets are Sentinel-2, Landsat 8 OLI, Landsat 5 TM, Landsat 5,
and Google Earth. The most popular models are RF, CART, and LandTrendr, and the
most-used evaluation metrics are Kappa, OA, PA, UA. A brief summary of those seven
studies is provided below Table 9. More detailed textual summaries for each of the seven
studies are detailed in Appendix C.9.
Table 9. Studies targeting heavy industry and pollution from RS imagery using AI.
Mining can lead to lots of environmental degradation during the actual mining process
itself, but often continues to do so if mines are not properly reclaimed after the mine is no
longer active. Field techniques for monitoring environmental damage operate on a limited
spatial and temporal scale, failing to fully capture what is happening. RS can help monitor
ecological changes during mining and ensure that mining companies clean up after mining
has stopped during the reclamation process. A mapping study was performed in [198] for
mining areas in the Brazilian Amazon using Sentinel-2A images and the CART classifier in
GEE. To monitor mining disturbances at a coalfield in Mongolia, the LandTrendr algorithm
was used in [199] to analyze Landsat data. The authors designed a fast, efficient method
on the GEE platform to monitor surface mining operations and show that only 26% of
promised reclamation was undertaken at the Shengli Coalfield. Heavy industry projects
like mining normally require reclamation after the fact to ensure that local ecosystems can
heal and regenerate. Monitoring sites that have undergone mining is made much easier
with RS imagery because they are often large, spatially distributed ecological disturbances.
This is especially the case for underground mining projects where subsidence occurs but is
difficult to track without an aerial view. Landsat imagery and the LandTrendr algorithm
were utilized in [202] to monitor water accumulation in subsidence areas of past mining in
China. Mining is economically important because of the many jobs and resultant materials
it provides but is associated with various environmental and health risks. One such danger
comes from the failure of tailings dams, which store water with toxic levels of waste solids.
Even though these failures can cause significant damage to the environment, human health,
and infrastructure, there is not a global database containing active tailings dams. This in
turn can make it easier for illegal mines to operate as legal mining operations with tailings
dams are not heavily monitored. In order to keep track of mines and dams in Brazil, two
different CNNs were used in [200] to first classify potential mining sites and then to classify
perceived/potential environmental risk.
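LandTrendr, used by several of the mining studies above, segments each pixel's spectral time series into linear trend segments so that abrupt disturbances and gradual recovery become separable. A toy single-breakpoint version of that idea, fitting two line segments by least squares (an illustration of the concept, not the published algorithm):

```python
def linfit_sse(t, y):
    """Least-squares line through (t, y); returns the sum of squared errors."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    denom = sum((ti - mt) ** 2 for ti in t)
    slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / denom
    return sum((yi - (my + slope * (ti - mt))) ** 2 for ti, yi in zip(t, y))

def best_breakpoint(t, y):
    """Index splitting the series into the two best-fitting linear segments."""
    best = None
    for k in range(2, len(t) - 1):  # keep at least two points per segment
        sse = linfit_sse(t[:k], y[:k]) + linfit_sse(t[k:], y[k:])
        if best is None or sse < best[0]:
            best = (sse, k)
    return best[1]

# Stable NDVI, then an abrupt mining disturbance followed by slow recovery.
years = list(range(2000, 2010))
series = [0.8, 0.8, 0.8, 0.8, 0.8, 0.2, 0.3, 0.4, 0.5, 0.6]
k = best_breakpoint(years, series)
# years[k] is the first year of the disturbance/recovery segment.
```

The full algorithm generalizes this to multiple breakpoints with penalties against overfitting, but the core operation is the same: compare residual error with and without a candidate vertex.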
As cities expand and develop, construction and demolition waste is often stored until
it can be further processed, reused, or disposed of. Sometimes these waste piles are orderly
and trackable, but many are not, making it hard to manage them and their potential
negative environmental or social effects. Current methods to take stock of waste piles
and dump sites rely on field investigations, which take a lot of time, effort, and money to
produce. More work needs to be done to identify them using RS imagery and ML methods,
but tuning different ML methods and their respective parameters can lead to different
results. To test the efficacy of different ML algorithms for identifying waste and dump sites
in optical imagery, the parameters for the CART, RF, and SVM algorithms available on GEE
were optimized in [203].
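Tuning classifier parameters as in [203] amounts to a grid search: score every parameter combination against validation data and keep the best. A generic sketch with a made-up scoring function (the parameter names here are illustrative, not a documented GEE interface):

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate every combination in `grid`; return (best_score, best_params)."""
    names = sorted(grid)
    best = None
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        s = score_fn(params)
        if best is None or s > best[0]:
            best = (s, params)
    return best

# Made-up validation score that peaks at 100 trees and 4 variables per split.
def score(params):
    return (1.0 - abs(params["numberOfTrees"] - 100) / 200
                - abs(params["variablesPerSplit"] - 4) / 10)

grid = {"numberOfTrees": [10, 50, 100, 200],
        "variablesPerSplit": [2, 4, 8]}
best_score, best_params = grid_search(score, grid)
```

In practice the score function would train and validate a classifier per combination, which is exactly why such tuning benefits from GEE's parallel cloud computation.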
Oil and gas pads are developed for production and then capped, reclaimed, and left
to recover when no longer productive. Understanding the rates, controls, and degree of
recovery of these reclaimed well sites to a state similar to pre-development conditions is
critical for energy development and land management decision processes. The authors
in [197] used time series data of the Soil Adjusted Total Vegetation Index (SATVI), calculated
from Landsat 5 imagery, to track changes and assess vegetation regrowth on 365 abandoned
well pads located across the Colorado Plateau. Previous estimates of particulate matter for
the Canadian Air Pollutant Emissions Inventory (APEI) were based on the exposed mine
disturbance areas that had been calculated using outdated mine area extents. With GEE
JavaScript API, RF classifiers were used in [201] to produce maps of mine waste extents
with Landsat-8 and Sentinel-1 and Sentinel-2 archives.
Forests store much of the world’s terrestrial carbon, but globally they are under threat
due to the effects of global warming and human disturbance. While forests release carbon
immediately when they are cut down or otherwise disturbed, they also release carbon
through secondary effects. This type of climate “memory” or lag in carbon flux is much
less studied and so not well-known. To study this mechanism further, the authors in [209]
used an LSTM and compared the performance to an RF for carbon fluxes in global forests.
atmosphere characteristics like “surface”, “land”, “temperature”, “LST” and “albedo”.
From our interactive web app (see Appendix A) and Table 10, the most-used RS datasets
are Landsat 8, Landsat 5, and Sentinel-2. The most popular AI models are RF, and the
most frequently used evaluation metrics are mean absolute error (MAE), OA, root mean
square error (RMSE), R2. A brief summary of those studies is provided below Table 10.
More detailed textual summaries for each of the seven studies are detailed in Appendix
C.10.
Figure 15. Word-cloud visualization of all the reviewed papers targeting climate and meteorology
(i.e., those seven papers summarized in Table 10).
Table 10. Studies targeting climate and meteorology studies.
References | Method | Model Comparison | RS Data Type | Study Area
Chrysoulakis et al. (2019) [204] | regression | polynomial regression | MCD43A1, MCD43A2, MOD09CMA | Global
Chastain et al. (2019) [205] | regression | major axis regression | Landsat 7 ETM+, Landsat 8 OLI, Sentinel-2 MSI | France, Portugal, Spain, United States
Demuzere et al. (2019) [206] | classification | RF | DMSP-OLS NTL, Global Forest Canopy Height, Landsat 8, Sentinel-1, Sentinel-2 | Australia, Brazil, Canada, China, France, Japan, Mexico, Poland,
Accurate satellite-derived albedo estimations are needed to parameterize and in turn
to validate climate simulation models. MODIS satellite observations from 2000 to 2015 were
analyzed in [204] using GEE to derive global snow-free land surface albedo estimations and
trends at a 500 m resolution. A method was presented in [208] to obtain high-resolution
sea surface salinity (SSS) and temperature (SST) by using Sentinel-2 Level 1-C Top of
Atmosphere reflectance data. The consistency between Tropical Rainfall Measuring Mission
(TRMM) multi-satellite precipitation and monthly gauged precipitation has been confirmed
worldwide. A downscaling framework (from 25 km to 1 km) was proposed in [210] for
TRMM precipitation products by integrating GEE and Google Colaboratory (Colab).
Furthermore, 30-m Landsat imagery has a long history of coverage between the
Landsat 7 ETM+ and Landsat 8 OLI sensors. Sentinel-2 Multispectral Instrument (MSI) imagery
has a higher resolution of 10-m and faster revisit frequency (10 days instead of 16 days
for Landsat). Being able to use all of these sensors together for a given EO analysis
would greatly increase the available spatial and temporal resolution, but the sensors have
differences that need to be calibrated before they can be integrated. Still, this is one of the
most-requested datasets we found in our review. Major-axis regression was performed
in [205] on these datasets in pairs (7 ETM+/8 OLI, 7 ETM+/2 MSI, and 8 OLI/2
MSI) across the entire coterminous United States and they were able to determine cross-
platform correction coefficients for the Blue, Green, Red, NIR, and SWIR bands present in
all three satellites.
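The major-axis fit used in [205] treats both sensors' reflectances as noisy, taking the slope from the principal direction of their joint covariance rather than minimizing vertical residuals only. A minimal NumPy sketch on simulated data (the gain, bias, and noise values below are illustrative assumptions, not results from [205]):

```python
import numpy as np

def major_axis_fit(x, y):
    """Fit y = slope * x + intercept by major-axis regression.

    The slope comes from the principal eigenvector of the joint covariance
    matrix, so both variables are treated as noisy; ordinary least squares
    instead assumes x is error-free.
    """
    cov = np.cov(x, y)                                # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]            # direction of largest variance
    slope = major[1] / major[0]
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Illustrative simulation of paired reflectances from two sensors
rng = np.random.default_rng(0)
true_refl = rng.uniform(0.0, 0.4, 500)                     # latent surface reflectance
etm = true_refl + rng.normal(0, 0.01, 500)                 # sensor A observation
oli = 0.97 * true_refl + 0.005 + rng.normal(0, 0.01, 500)  # sensor B with gain/bias

slope, intercept = major_axis_fit(etm, oli)
oli_harmonized = (oli - intercept) / slope                 # map sensor B onto sensor A scale
```

Because the fitted line is symmetric in x and y, inverting it (as in the last line) is well-defined, which is one reason errors-in-variables regression is preferred for inter-sensor calibration.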
Urbanization has changed the urban landscape and resulted in increasing land surface
temperature (LST). In [207], the authors investigated the impacts of landscape changes on LST
intensity (LSTI) in a tropical mountain city in Sri Lanka. There are several ongoing attempts
to classify cities around the world based on various characteristics like urban canopy cover,
total built-up area, neighborhood sizes, and urban heat island effects (for example, see Urban
Atlas, World Urban Database Access and Portal Tools (WUDAPT)). These datasets can help
planners and policymakers make more informed decisions as they consider implementing
sustainability measures in their respective cities. However, these types of spatial datasets often
rely on surveying methods that need to be continually updated. A cloud-based workflow
was implemented in [206] and compared to the traditional method of using SAGA GIS for
producing local climate zone city maps based on data like WUDAPT.
Figure 16. Word-cloud visualization of all the reviewed papers targeting disaster management (i.e., those six papers summarized in Table 11).
RS imagery has long been used to monitor community recovery after natural disasters.
Decision makers can use RS imagery and analyses to redirect resources during the
recovery process. Even so, many studies focused on disaster recovery use VHR imagery
that increases data storage and compute needs. To explore the suitability of GEE for
disaster recovery, the authors in [215] used an RF model trained on Landsat imagery to do
change detection on pre- and post-disaster areas in the Philippines. Building detections
in post-disaster scenes are a valuable resource for timely assessment of damages in disaster
management. Using RGB images as input, an automatic building detection method was
proposed in [216] to find buildings and their irregularities in pre- and post-disaster (sub-)
meter resolution images.

Table 11. Studies targeting disaster management from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Yu et al. (2018) [211] | classification | RF | Landsat 5 TM, Landsat 7, Landsat 8, SRTM DEM | Nepal
Cho et al. (2019) [212] | classification | RF | Landsat 7 ETM+, Landsat 8 OLI, MODIS Terra, Sentinel-1, SMOS | United States
Figure 17. Word-cloud visualization of all the reviewed papers targeting soil (i.e., those six papers summarized in Table 12).
Many authors come to GEE curious to test out the new cloud computing platform
for their domain-specific application. GEE provides freely available compute and data to
interested researchers, which they then use to explore the strengths and limitations of GEE.
An early soil mapping study was performed in [217] on GEE in 2015. Collecting field samples
for soil mapping can be time- and labor-intensive and can be bound to small areas given their
costs. These data collections also need to be repeated, representing a barrier to presenting
up-to-date information that covers large spatial areas to decision-makers. To address these
issues, the authors in [219] used field observations, DEM data, and Landsat imagery on GEE
to map different soil types and soil attributes across a large region in Brazil.
Soil plays a critical role in the carbon and water cycles, along with providing areas
for habitat or agricultural use. The spatial distribution of litter and soil carbon (C) stocks
is important in greenhouse gas estimation and reporting and informs land management
decisions, policy, and climate change mitigation strategies. The effects of spatial aggregation
of climatic, biotic, topographic and soil variables on national estimates of litter and soil C
stocks were explored in [220]. The authors also characterized the spatial distribution of
litter and soil C stocks in the conterminous United States (CONUS). Litter and soil variables
were measured on permanent sample plots from the National Forest Inventory (NFI) from
2000 to 2011. Beyond mapping litter and soil carbon (C) stocks, it is also important to map
soil organic matter at a large scale, but traditional field collection techniques are cost- and
effort-intensive. Many researchers have thus turned to RS imagery and/or ML to map
soil organic matter, but there is still some difficulty in selecting the right input data or ML
model for prediction. To determine how different datasets and ML models perform on GEE
in predicting soil organic matter, ANN, RF, and SVR models were compared in [222] with
MODIS, Sentinel-2A, and DEM data as input.
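The comparison in [222] follows a generic pattern: train several regressors on an identical predictor stack and compare cross-validated scores. A scikit-learn sketch of that pattern on synthetic data (the features merely stand in for MODIS/Sentinel-2 bands and DEM derivatives; none of the settings below are from [222]):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for per-sample predictors (spectral bands, terrain
# derivatives) and a soil-property target measured at field plots.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(0, 0.3, 300)

models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    ),
}
# Identical folds for every model, so the R2 scores are directly comparable
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
```

Reporting cross-validated scores from the same folds, rather than a single train/test split, is what makes such three-way model comparisons defensible.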
Accurate soil moisture content information is crucial to being able to correctly model
water, energy, and carbon cycles, as well as being key to understanding and predicting
natural hazards like drought, floods, and landslides. However, most soil moisture datasets
are created with medium or coarse spatial resolution. Using optical, thermal, and SAR
imagery in addition to DEM data, a global, high-resolution soil moisture map was produced
in [221]. The authors concluded that optical RS imagery and land-cover information play
the most important roles in determining soil moisture content, but SAR imagery and
soil data also contribute significantly to the model’s overall performance. This finding
echoes other studies’ results ([95,161,182]) showing that the combination of optical and SAR
data improves predictive outcomes. Soil salinity can impact agricultural yields and is
a global issue, but current datasets like the Harmonized World Soil Database have low
spatial resolution and need to be updated. As one of the main soil salinity datasets in
use, its coarseness makes it difficult to estimate up-to-date soil salinity levels even as they change
due to increasing drought severity from global warming. The authors in [218] explored
GEE’s potential to make a global soil salinity map based on field data and Landsat thermal
infrared imagery.
Figure 18. Word-cloud visualization of all the reviewed papers targeting cloud detection and masking (i.e., those five papers summarized in Table 13).
Table 13. Studies targeting cloud detection from RS imagery using AI (Note that references marked * denote novel methods and will be detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Gómez-Chova et al. (2017) [223] * | regression | kernel ridge regression, linear regression | Landsat 8, RapidEye, SPOT 4 | Argentina, China, Jordan, Spain

Many mapping and identification tasks that use RS imagery and ML rely on optical
cloud-free imagery. Detecting and removing clouds in optical RS imagery is a difficult
but important task, as many other classification and detection methods rely on masking
clouds and on obtaining cloud-free imagery. Many algorithms, including Fmask, which is
a commonly used algorithm to create a cloud mask in RS imagery, rely on using thresholds
for single RS images, which makes them prone to errors when applied to entire RS time
series. The authors in [223] treated cloud detection as a change detection problem across
time using a kernel ridge regression model.
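That framing can be sketched with scikit-learn: learn a kernel ridge mapping from a previous, largely cloud-free acquisition to the current one, and flag pixels whose residuals the mapping cannot explain. The data, kernel settings, and threshold below are illustrative assumptions, not the published configuration of [223]:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Cloud detection cast as temporal change detection: regress the current
# image on a previous reference image; pixels with large positive residuals
# do not fit the learned surface-to-surface mapping and are flagged as clouds.
rng = np.random.default_rng(1)
n_pixels = 2000
prev = rng.uniform(0.05, 0.35, size=(n_pixels, 1))          # reference reflectance
current = 0.9 * prev[:, 0] + 0.02 + rng.normal(0, 0.005, n_pixels)
is_cloud = rng.random(n_pixels) < 0.10                      # 10% cloud contamination
current[is_cloud] += rng.uniform(0.3, 0.5, is_cloud.sum())  # clouds are bright

krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=10.0)
krr.fit(prev, current)               # cloudy pixels behave as unexplained outliers
residual = current - krr.predict(prev)
cloud_mask = residual > 0.15         # illustrative residual threshold
```

The regularized kernel fit tracks the smooth surface-to-surface relationship, so genuine land change or cloud cover shows up as residual mass rather than being absorbed into the model.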
Optical RS imagery has many applications across several environmental and earth
science domains. However, Optical RS imagery is often occluded by clouds, limiting its
utility. While processing techniques like taking monthly composite images to remove
clouds work to some extent, they rely on having enough cloud-free imagery to make the
composites, which is not always available. Recently, DL models have shown the ability to
reconstruct scenes in optical RS imagery that are blocked by clouds. However, researchers
looking to use DL models in cloud environments often have to coordinate across different
storage, analysis, and ML platforms (e.g., Google Cloud Storage, Google Colab, Google AI),
which can be cumbersome and expensive. The authors in [227] thus decided to implement
their cloud-removal DL model directly in GEE. Their model, DeepGEE-S2CR, is a cloud-
optimized version of the DSen2-CR model presented in [228] and fuses co-registered
Sentinel-1 and Sentinel-2 images from the SEN12MS-CR dataset.
Cloud detection is a well-studied task and GEE has several cloud detection/masking
algorithms available on its platform. However, some of them have been shown to be unstable,
leading to considerable under- or overestimation. To explore how CV algorithms and ML
models can be used together on GEE, [226] combined the existing Cloud-Score algorithm
with an SVM to detect clouds in imagery covering Amazon tropical forests, Hainan
Island, and Sri Lanka. Fmask is the most commonly used method but has limited use
in mountainous regions where terrain and shadows can be confused for clouds or when
sudden changes in the Earth’s surface occur in time-series imagery. A convolutional neural
network (CNN) called DeepGEE-CD was built in [225] to detect clouds in RS imagery
directly on the GEE platform. Cloud screening may be cast as an unsupervised change
detection problem in the temporal domain. A cloud screening method based on detecting
abrupt changes along the time dimension was introduced in [224], assuming that image
time series follow smooth variations over land (background) and abrupt changes are mainly
due to the presence of clouds.
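In its simplest form, that assumption (smooth background over land, abrupt positive deviations for clouds) reduces to thresholding each observation's deviation from a robust per-pixel temporal statistic. A toy NumPy sketch under that assumption (the synthetic scene and threshold are illustrative; the actual detector in [224] is more elaborate):

```python
import numpy as np

# Per-pixel temporal screening: land reflectance varies smoothly over time,
# so an observation far above the pixel's temporal median is likely a cloud.
rng = np.random.default_rng(7)
t, h, w = 12, 50, 50                                  # 12 acquisitions, 50x50 tile
base = rng.uniform(0.05, 0.30, size=(h, w))           # stable land background
series = base + rng.normal(0, 0.01, size=(t, h, w))   # smooth temporal variation
cloud_truth = rng.random((t, h, w)) < 0.08            # 8% of observations cloudy
series[cloud_truth] += 0.35                           # clouds brighten the signal

median = np.median(series, axis=0)                    # per-pixel temporal median
mad = np.median(np.abs(series - median), axis=0)      # robust spread estimate (MAD)
threshold = 6.0 * 1.4826 * mad + 0.02                 # scaled MAD plus a noise floor
cloud_mask = (series - median) > threshold            # abrupt positive deviation
```

Using the median and MAD rather than the mean and standard deviation keeps the background estimate stable even when a minority of observations in the stack are cloudy.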
Figure 19. Word-cloud visualization of all the reviewed papers targeting wildlife and animal
studies (i.e., those four papers summarized in Table 14).
UAS (i.e., drones) can collect high-quality data over large aggregations of wildlife,
offering an attractive opportunity for improving methods and increasing the
cost effectiveness of monitoring wildlife populations. The authors in [229] explored the
use of UAS for identifying Ny. darlingi breeding sites with high-resolution imagery
(~0.02 m/pixel) and their multispectral profile in Amazonian Peru. Land use changes
such as deforestation, irrigation, wetland modification and road construction, may drive
infectious disease outbreaks and interfere with their transmission dynamics. Accurate
classification of Ny. darlingi-positive and -negative water bodies would increase the
impact of targeted mosquito control on aquatic life stages. Researchers in [231] developed
a semi-automated framework for monitoring large complex wildlife aggregations using
drone-acquired imagery over four large and complex waterbird colonies.
The success of conservation and mitigation management strategies may greatly de-
pend on the knowledge of the temporal and spatial patterns of roadkill risk, and its
relationship with key environmental drivers. The authors in [230] used a set of freely
available environmental variables, namely habitat information from RS observations and
climatic information from weather stations, to assess and predict the roadkill risk.
Pest outbreaks are causing more damage to forests around the world as winters get
warmer and summers are drier and start earlier. These conditions allow pests to proliferate,
though pests do not always kill trees outright. They often defoliate trees, which weakens
them before future pest outbreaks or drought conditions. However, forest defoliation
is understudied and much of the research done in this area relies on coarse resolution
data. Using Landsat RS imagery, climate variables, and government environmental data,
Ref. [232] analyzed Pine Processionary Moth outbreaks in pine forests in southern Spain.
3.2.15. Archaeology
Archaeology is also one of the less researched applications using GEE and AI (three
studies total). Table 15 below summarizes those studies and a word cloud generated from
the titles, keywords, and abstracts of the three papers is provided in Figure 20. The most
frequently used words are “Google Earth Engine”, “detection”, “satellite”, “drone”, and
“survey” while terms like “automated” and “mounds” are also common. This reflects the
papers we reviewed and their focus on using the GEE platform to scale up and automate
exploratory surveys using RS data, both from satellite platforms and self-collected drone
imagery. From our interactive web app (see Appendix A) and Table 15, the most frequently
used RS dataset is WorldView 2. The most popular ML model is an RF and the most-used
evaluation metric is visual analysis.
Utilizing RS imagery for anthropological studies can be difficult because of a lack of
financial resources, technical training, or compute needed to analyze large RS datasets.
More specific to searching for mounded sites and scattered materials that would indicate
past human habitation in RS imagery, it is difficult to pair legacy field data with RS
imagery. When archaeologists look for potsherds, either in the field or at development
sites, the standard practice is to form walking surveys to detect evidence of prior human
settlement. This usually involves a large group of people walking in parallel lines over a
given area, documenting what they find along the way. This process involves a lot of
upfront personnel costs. The authors in [233] demonstrated the potential role of GEE in
the future of archaeological research through two case studies. The authors in [234] used
drone imagery and GEE to detect potsherds in the field in the hopes of speeding up this
process. In [235], the authors utilized optical and SAR data on GEE to create a classifier
capable of outputting a likelihood that there is a mounded site in a given region of the
Cholistan Desert in Pakistan. More detailed textual summaries for each of those three
studies are provided in Appendix C.15, as they all proposed novel methods.

Table 15. Studies targeting archeology from RS imagery using AI. (Note that references marked * denote novel methods and will be detailed in Section 3.3).

References | Method | Model Comparison | RS Data Type | Study Area
Liss et al. (2017) [233] * | classification | Canny edge detection, RF | WorldView 2 | Jordan
Orengo and Garcia-Molsosa (2019) [234] * | classification | CART, RF, SVM | DJI Phantom 4 Pro | Greece
Orengo et al. (2020) [235] * | classification | RF | Google Earth, Sentinel-1, Sentinel-2 MSI, WorldView 2, WorldView 3 | Pakistan
3.2.16. Coastline Monitoring
Coastline monitoring is one of the less researched applications using GEE and AI
(three studies total). Table 16 below summarizes those studies and a word cloud generated
from the titles, keywords, and abstracts of the three papers is provided in Figure 21. The
word clouds provide an informative (general and specific) focus of each set of the papers.
For example, we can see that the most frequently used general words are “shoreline”,
“coastline”, “tidal”, and “beach”. This type of research is interested in first detecting
coastlines, but also in monitoring geospatial changes over time (i.e., keywords “detection”,
“position”, “changes”, “temporal”, “time”, and “multi-annual”). From our interactive web
app (see Appendix A) and Table 16, the most-used RS datasets are Landsat 5 TM, Landsat 7
ETM+, and Landsat 8 OLI.
Table 16. Studies targeting coastline monitoring studies.

References | Method | Model Comparison | RS Data Type | Study Area
Hagenaars et al. (2018) [236] | regression | linear regression, marching squares interpolation algorithm, region growing clustering algorithm | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Sentinel 2 | Netherlands
Vos et al. (2019) [237] | regression | MLP | Landsat 4 TM, Landsat 5 TM, Landsat 7, Landsat 8, Sentinel-2, UAS | Australia, France, New Zealand, United States
Cao et al. (2020) [238] | classification | hierarchical clustering | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China

Observing and quantifying the changing position of the shorelines is critical to
present-day coastal management and future coastal planning. The authors in [236]
presented an automated method to extract shorelines from Landsat and Sentinel satellite
imagery. The authors in [237] evaluated the capability of satellite RS to resolve at differing
temporal scales the variability and trends in sandy shoreline positions. In [238], the
authors proposed a method to map continuous changes in coastlines and tidal flats in the
Zhoushan Archipelago during 1985–2017, using Landsat images on the GEE platform.
More detailed textual summaries for each of those three studies are provided in Appendix C.16.
Figure 21. Word-cloud visualization of all the reviewed papers targeting coastline monitoring (i.e., those three papers summarized in Table 16).
3.2.17. Bathymetric Mapping
There are only two bathymetric mapping studies leveraging GEE and AI. Table 17
below summarizes those studies and a word cloud generated from the titles, keywords,
and abstracts of the two papers is provided in Figure 22. The most frequently used words
are “bathymetry”, “satellite” and “satellite-derived”, as well as “validation”. Currently,
bathymetric mapping applications are derived from radar, sonar, and light detection and
ranging (LiDAR) measurements from boats and small aircraft in conjunction with model
simulations. The authors using GEE for bathymetric mapping research are trying to use
satellite imagery and ML on the cloud platform to generate bathymetric maps over much
larger scales than would be possible otherwise.
Table 17. Studies targeting bathymetry from RS imagery using AI.

References | Method | Model Comparison | RS Data Type | Study Area
Traganos et al. (2018) [239] | regression | multiple linear regression | Garmin Fishfinder 160C sonar, Lowrance HDS-5 sonar, Sentinel-2 | Greece
Sagawa et al. (2019) [240] | regression | RF | CZMIL airborne LiDAR, HDS-5 sonar, HDS-7 sonar, Landsat 8, Riegl VO-880G airborne LiDAR | Japan, Puerto Rico, USA, Vanuatu

Mapping bathymetry across large areas is a difficult problem. This is in part because
high-resolution aerial radar data, which produce some of the best bathymetry maps, are
expensive to collect and only cover small areas. Researchers in [239] paired field
observations of coastal depths with RS imagery to train multiple linear regression models
that can then predict in areas where no depth information is available. Without accurate
bathymetry information, ships risk getting stranded in shallow water areas around the
globe. Typically, ships equipped with sonar and planes that have airborne LiDAR are
used to get water depth measurements. However, sonar is not suitable for shallow water
measurements and airborne LiDAR is expensive to acquire. Moreover, there are very few
bathymetry datasets that have a global reach. The authors in [240] used airborne LiDAR,
sonar, and Landsat data to estimate bathymetry in Japan, Puerto Rico, the USA, and
Vanuatu using an RF model. More detailed textual summaries for each of those two
studies are provided in Appendix C.17.
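Empirical satellite-derived bathymetry of this kind commonly regresses calibration depths on log-transformed band reflectances (a Lyzenga-style model); whether [239,240] used exactly this form is not stated above, so the NumPy sketch below, with assumed attenuation coefficients and synthetic data, is illustrative only:

```python
import numpy as np

# Lyzenga-style empirical bathymetry: regress known depths (e.g., sonar
# soundings) on log-transformed water-leaving reflectance in two
# water-penetrating bands, then predict depth wherever imagery exists.
rng = np.random.default_rng(3)
depth = rng.uniform(1.0, 15.0, 400)                    # calibration depths (m)

# Synthetic reflectances: exponential attenuation with depth plus noise;
# the offsets and attenuation coefficients are illustrative assumptions.
blue = 0.02 + 0.25 * np.exp(-0.08 * depth) + rng.normal(0, 0.002, 400)
green = 0.02 + 0.30 * np.exp(-0.15 * depth) + rng.normal(0, 0.002, 400)

deep_water = 0.015                                     # assumed deep-water signal
X = np.column_stack([np.log(blue - deep_water), np.log(green - deep_water)])
A = np.column_stack([np.ones(len(depth)), X])          # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, depth, rcond=None)       # multiple linear regression
pred = A @ coef
rmse = float(np.sqrt(np.mean((pred - depth) ** 2)))
```

Subtracting an estimated deep-water signal before the log transform is what linearizes the depth-reflectance relationship; once calibrated against sonar points, the same coefficients can be applied to every pixel in the image.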
Figure 22. Word-cloud visualization of all the reviewed papers targeting bathymetric mapping (i.e., those papers summarized in Table 17).
Figure 23. Word-cloud visualization of all the reviewed papers targeting ice and snow (i.e., those two summarized in Table 18).
Table 18. Studies targeting ice and snow studies.

References | Method | Model Comparison | RS Data Type | Study Area

Global warming is putting pressure on Arctic ice and snow cover as the Arctic is
heating up much more rapidly than the rest of the planet. In Alaska, changes in perennial
snow cover have wide-ranging implications: changing hydrology and vegetation
patterns, altering the local topography through more frequent freeze-thaw cycles, and
disrupting the ability of subsistence hunters in the region to find food. The authors in [241]
used a CART model to track the changes in the cryosphere in Alaska. The duration
and seasonality of lake ice are sensitive to local environmental changes such as wind, air
temperature, and snow accumulation. Lake ice phenology (LIP; ice breakup and freeze-up
dates and ice duration) is a particularly robust proxy for climate variability. The authors
in [242] studied LIP in Qinghai Lake, China. A more detailed textual summary of those
two studies is provided in Appendix C.18.
Further discussion of these novel methods is provided in Sections 4.2 and 4.3. It is interesting to see that there are
only three studies about archaeology (Section 3.2.15), but all three papers have proposed
novel methods.
Figure 24. Word-cloud visualization of reviewed 21 novel methods papers (all those 21 papers from Tables 19–21).
Table 19. Method papers for classification tasks.
4. Challenges and Research Opportunities
This section provides a summary of the patterns observed (Section 4.1) from reviewing
the research discussed above. Sections 4.2 and 4.3 describe the challenges and research
opportunities from application (Section 4.2) and technical (Section 4.3) perspectives.
Of the 200 reviewed studies, the majority used ML (181), and only a very small portion used DL (22) and CV (16); this is not
surprising, due to GEE’s limitations (Section 4.1.2). Note that the numbers do not add up
to 200, because some studies used combinations of ML, DL, and CV, so they were counted
multiple times. Among the 22 DL studies, most of them had to run the DL models either
offline on their local computers or on the Google Cloud AI platform. Only a very small
portion of studies (Section 4.3.1) actually integrated GEE with DL, in an indirect way: DL
models were trained offline or on Google Cloud AI, and then the weights were uploaded to
GEE, where online prediction was performed. The most-employed evaluation metrics are
OA (137 studies), PA (101), UA (98), and Kappa (76) (see Figure 5b for details). Of the
200 papers that we reviewed, all utilized GEE for data processing, and 104 papers
also ran part of their computation offline.
While the research investigated in Section 3 has demonstrated the power of using GEE
and AI for many different problem domains, most of the studies use GEE’s built-in ML
methods (e.g., RF, SVM, and CART). There is still a long way to go before researchers can
more easily develop, implement, test, and use novel AI methods (especially DL) on the
platform (see Section 4.1.2) due to bottlenecks in integrating GEE with Google AI cloud.
Some thematic areas are saturated with application-oriented papers, as is evident by the
list and number of citations in each subsection in Section 3.2. Our recommendation is
that for these areas (e.g., crop mapping and LULC), journals take less application-based
papers unless they are contributing new datasets or processing pipelines for working
with multiple datasets and start calling for novel method-based papers. However, other
areas (e.g., archaeology and bathymetry) could benefit from more use-cases or proof-of-
concept papers that open-source their code and data, speeding up the pace of research in
those respective fields.
From our interactive web app tool (see Figure 20 below), we noticed that most work
does not include hardware and software specifications (e.g., what CPU/GPU the authors
used to run their models, what Python libraries they used to implement the DL models,
etc.) and/or processing times [244]. Of the 200 total papers we reviewed, 101 ran strictly in
cloud computing environments (i.e., they had no offline component). Of the remaining papers,
only 10 provided their offline computation specifications (see Figure 25b for details).
From Figure 25a, most work integrating GEE with AI ran on the GEE cloud platform. Of
these papers, 98 (i.e., those marked as NA, which refers to “not applicable”) ran solely on
cloud platform(s) and 92 (those marked as NS, which means “not specified”) ran locally
without giving the hardware specification of the machines or runtimes for their analyses.
Of the studies that used cloud computation, the majority ran on GEE, while a few combined
GEE with the Google AI platform. A visual summary of the software used in the reviewed
literature is provided in Figure 26. If a publication used only GEE or its APIs, it is assigned
a value of “NA” (“not applicable”), since no additional software was used; we can see from
Figure 26 that 96 papers fall into this category. Of the remaining papers that specified the
software used to complete part of an analysis outside of GEE, 27 studies used R, 23 used
Python, 19 used ArcGIS, and 10 used the scikit-learn Python package. To make models
comparable and reproducible, and to inform the design of RS systems, it is important to
report this type of information [245]. This is true even for index-based methods and more
traditional ML models, so that researchers can fully evaluate
the trade-offs between runtime, accuracy, and ease of implementation. The interactive web
app tool that accompanies this review is intended, in part, to make future research more
reproducible. Most papers have an open-access PDF/HTML version of their manuscripts,
though a sizable portion (42 of the 200 reviewed articles) do not. To increase the rate of
progress in integrating GEE and AI, we suggest authors provide an open-access version of
their manuscripts whenever possible.
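As a concrete illustration of the reporting practice recommended above, a minimal Python sketch (the helper names are our own, purely illustrative) that captures software/hardware details and runtimes alongside an analysis:

```python
import platform
import sys
import time

def report_environment():
    """Collect basic software/hardware details worth publishing with results."""
    return {
        "python": sys.version.split()[0],
        "machine": platform.machine(),
        "processor": platform.processor() or "unknown",
        "os": f"{platform.system()} {platform.release()}",
    }

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) so runtimes can be reported."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

specs = report_environment()
result, seconds = timed(sum, range(1_000_000))
print(specs, f"{seconds:.4f}s")
```

Reporting even this much, together with library versions (e.g., from `pip freeze`), would let readers weigh runtime against accuracy and ease of implementation across studies.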
Figure 25. Statistics related to studies being computed in the cloud or computed offline on local computers in the reviewed 200 papers. (a) Computed online on cloud platforms; (b) computed offline on local machines. NA refers to “not applicable”, indicating a publication’s code ran solely on cloud platform(s), and NS means “not specified”.
Figure 26. Statistics related to what software and/or programming languages were used in the studies in the reviewed 200 papers. NA refers to “not applicable”, meaning that those papers used only GEE to complete their analysis.
4.1.2. GEE Limitations
GEE is a powerful, free-of-charge cloud platform for EO big data processing and
analysis. Given the very large amounts of data and the combinations of temporal domains
utilized in [21], GEE was critical to enabling those investigations. The use of GEE also
facilitated testing several ML algorithms much faster than would have
the Google AI platform with GEE creates a versatile technology to deploy deep
learning technologies at scale. Data migration and computational demands
are among the main present constraints in deploying these technologies in an
operational setting;”
# SNIC is the only object-based classifier on GEE; authors also want more “ad-
vanced methods” or just more options;
# Hyperparameter tuning is not possible on the platform [21], so many authors
use local software (e.g., scikit-learn) for this purpose and then upload the
models to GEE afterwards;
# One of the benefits of using an RF model is that you can run a feature impor-
tance analysis afterwards to determine which set of input features contributed
most to the model’s learning. However, this extremely common and important
operation is not possible on GEE.
• Inflexibility of models [19,35,46,152,159]: This limitation is similar to lack of models
but is different in that it describes issues using models already on GEE. For example,
authors in [35] emphasized, “A third limitation to the modeling approach described
here is its current incomplete use of cloud-computing services, and reliance on desk-
top computer power to run the BRT models. Ideally, the modeling would be run
within the same environment where the satellite data are preprocessed—Google Earth
Engine—or a similar cloud-computing service offering similar levels of access to
Sentinel datasets. GEE does currently provide machine-learning algorithms such
as random forests, but these do not provide the flexibility that is currently offered
within the BRT R functions”. This reflects both a lack of methods and model inflexibility. The
authors in [46] found that in general the algorithms on GEE were not very flexible
and some preprocessing steps such as dealing with missing data were difficult to
implement. Thus, the authors performed all preprocessing steps outside of the GEE
platform.
• Lack of data [32,46,54,67,75,94,120,126,127,160–162,183,184,193,215,221]: This relates
to both a lack of field observations and a lack of curated RS datasets.
# Not every data product is on GEE;
# Authors specifically called for a Landsat-Sentinel combined dataset. This dataset
could serve as the foundation for research in many different application areas by
expanding both the spatial and temporal resolution available to researchers;
# Very-high-resolution imagery is not on GEE, meaning that to validate GEE
prediction results authors often need to download this data locally.
• Importing and exporting data from GEE [83,126,193,198,234]: This process is time-
consuming and results in lower-resolution classification maps. Nevertheless, many
authors need to import or export data because of storage constraints on GEE.
• Other limitations:
# There is a delay between the time RS data become available and the time they
are uploaded to the platform, limiting their utility for time-sensitive applica-
tions [213,214];
# Authors might have a hard time converting programs to GEE from their own
environment [81,136,217]. Cited issues were that authors were not familiar with
JavaScript, Python, or the GEE programming interface. Authors were concerned
that not everyone would have the skillset to implement models in GEE;
# A concern that data and code will not be kept private for sensitive use-
cases [217].
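The local tuning-then-upload workaround mentioned above (e.g., with scikit-learn) might look like the following sketch. The dataset is a synthetic stand-in, and the mapping onto GEE's `ee.Classifier.smileRandomForest` parameters shown in the closing comment is indicative only, not tested against a specific GEE version:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for training samples exported from GEE.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Grid search locally, since hyperparameter tuning is not available on GEE.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_features": [2, 4]},
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_
print(best)

# The tuned values would then be applied on GEE, e.g.:
# ee.Classifier.smileRandomForest(numberOfTrees=best["n_estimators"],
#                                 variablesPerSplit=best["max_features"])
```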
(summarized in Section 3.3). Below, we provide some challenges and opportunities related
to application-oriented research.
4.2.1. Proof-of-Concept for Less Researched Applications and Novel Methods for Saturated
Application Domains
The authors in [107] point out, “ . . . classification method demonstrated in GEE is
useful to provide a quick understanding of oil palm plantations . . . This in itself is advanta-
geous for independent monitoring bodies to conduct a survey of the landscape in question
and conduct more detailed assessments if necessary.” For applications that are not yet
well-studied using GEE, it will be useful to run some proof-of-concept experiments on GEE.
These types of analyses will shed light on what limitations exist for doing domain-specific
research on the platform (e.g., are the main barriers a lack of data, lack of preprocessing
models or AI methods, etc.).
Even for very saturated application domains (e.g., wetland mapping, see Section 3.2.6),
there are few novel methods. To be clear, this does not mean there are no interesting
contributions: researchers still build inventive preprocessing pipelines, create new datasets,
and often use DL. Rather, we take a very narrow view of “novel” in this paper, confined to
how researchers use AI methods on the GEE platform. Researchers working on wetland studies
seem much more focused on using free compute, compiling and scaling up datasets over larger
areas than would be possible on local machines, and creating open-source processing and
visualization pipelines. However, there is still considerable room for
novel methods for those saturated application domains. For example, it would be useful
for a saturated application domain to experiment with novel methods developed for other
domains. The web app we developed for this review paper will serve as an important tool
to easily find novel methods (check the demo video of the web app for how to find a novel
method paper; the link to the video is provided in the Appendix A).
to the overall analysis; the authors first have to do this in a local environment and then
upload them.
It is important to note that the authors in these papers are actively changing the results
of classification. In some cases, they are doing so many times (over several iterations). Thus,
they are introducing bias into their models, but the trade-off is acceptable if the emphasis is
on exploration rather than on statistical validity. This methodology is similar to using an
expert system where domain experts use ML systems in a “collaborative” way, blending
human expertise with the automation capabilities of AI. Still, these models would need to
be continuously tested on new data to make sure that their probability threshold values are
accurate, and their predictions should not be taken at face value.
generalization across both the validation and testing sets, maintaining high accuracy rates,
while the LSTM and RF models underfit the test set. To illustrate the trade-offs between
ML and DL models, the authors include run and inference times. The RF model was able
to complete training and prediction in 3 h. As reported in [43], U-Net takes a long time
to train, while the LSTM takes a long time at inference time. Specifically, the LSTM took
30 min to train but 23 h to predict on the test set, while the U-Net took 24 h to train but
1.2 h at inference time. Much more work like this should be done to explore the strengths
and weaknesses for ML and DL models, as this will be helpful for many research areas that
would like to take advantage of GEE and AI.
With proper features from feature engineering, ML algorithms, which require less
(good-quality) training data than DL, often perform better than DL. For example, the authors
in [136] reported that their results indicated that the classification accuracy of DL was not as
good as traditional ML methods (e.g., SVM). We recommend the following three directions
for future studies in terms of feature engineering.
(1) Compare multiple ML algorithms or ML vs. DL algorithms: As pointed out
in [136], it is worth investigating which methods (ML vs. DL) are better for a specific
domain application. Their results indicated that the classification accuracy of DL was not
as good as traditional ML methods (e.g., SVM). Several ML models are compared in [115]
to map oil palm using Landsat 8 imagery in Malaysia. The authors find that tree-based
ML models (e.g., RF, CART) work better than an SVM for the task and are able to classify
large areas with high accuracy. Even so, classification errors are traced to the relatively
coarse resolution of Landsat data. The authors suggested that higher-resolution imagery
(e.g., Sentinel) and the future ability to use DL methods on GEE would most probably
improve performance. The authors in [136] developed and implemented a new
pixel-based method (Ppf-CM) in GEE using 525 full Landsat scenes (19.96 billion pixels) to
monitor S. alterniflora dynamics. They found that Ppf-CM not only enhances the spectral
separability between S. alterniflora and other classes, but also mitigates the problems caused
by the scarcity of entirely cloud-free Landsat scenes. These findings echo prior GEE-
supported pixel-based studies (e.g., [80]) and further confirm that pixel-based methods
outperform scene-based methods to monitor S. alterniflora. The classification results in [161]
were evaluated using both pixel-based and object-based RF classifications available on the
GEE platform. The results revealed the superiority of the object-based approach relative to
the pixel-based classification for wetland mapping.
The authors in [46] compare several algorithms on the GEE platform, including CART,
IKPamir, LR, a multi-layer perceptron (MLP), NB, RF, and an SVM, for crop-type classi-
fication. The authors also use an ensemble NN but have to move off the GEE platform
since NNs are not currently supported. The ensemble NN performed the best out of all the
models. The authors found that atmospherically corrected Landsat data boosted model
performance more than Landsat composite data did. The authors
in [56] compare the performance of an artificial neural network (ANN) to CART, RF, and
SVM models on GEE for sugarcane mapping in China using Sentinel-2 imagery. The au-
thors identify that the SVM performs the best, but then go on to show which type of errors
each model makes. For example: the ANN tended to overfit the data and give too much
preference to the sugarcane class, while tree-based models confuse the forest and water
classes. The authors then incorporate Normalized Difference Vegetation Index (NDVI)
information into the SVM to show how the model does with this extra information. It
is not clear why the authors did not allow each model to see NDVI information, as this
extra information may have helped various models learn better. If the authors wanted to
show how models learned from phenology information versus phenology combined with
NDVI information, they could have trained each model on separate subsets of the data.
While GEE allowed [159] to train several ML models, some models failed to run due to
computational constraints or inflexibility. The authors show that in all cases, ML models do
much better at binary than multi-class classification. The authors in [66] utilize many ML
algorithms available on GEE and compare specific time windows for phenological analysis
and find that the closer the data comes to planting and harvesting time, the better the ML
models performed.
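For reference, NDVI, the index featured in several of the comparisons above, is computed per pixel from the near-infrared and red reflectances; a minimal sketch:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom

print(ndvi(0.6, 0.2))  # -> 0.5; dense green vegetation yields high NDVI
```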
(2) SAR + optical RS images for better model performance: In addition, many stud-
ies reported [17,57,68,72,74,165,166,182,212,214] or suggested in future work [46,56,107,215]
that SAR combined with optical RS images would improve model performance. Three
classification methods (SVM, RF, and decision fusion) were used in [52] for pixel-wise
crop mapping. The SVM classifier resulted in the lowest accuracy. The
integration of multispectral and SAR data improved the classification accuracy. To improve
the results in this study, the authors in [56] identify that using SAR data would be helpful
in removing the impact shadows have on classification errors for sugarcane mapping. The
authors in [95] compare the contribution of SAR data and different indices (e.g., NDVI,
EVI, Soil Adjusted Vegetation Index (SAVI), Normalized Difference Water Index (NDWI))
derived from optical data on overall classifier performance. They find that including SAR
data moderately improves performance, while only NDWI gives the ML model a signifi-
cant performance enhancement. Using optical, thermal, and SAR imagery in addition to
DEM data, [221] produces a global, high-resolution soil moisture map. The authors use a
gradient boosted regression tree (GBRT) model to train on in-situ observations paired with
RS imagery to then predict soil moisture in other locations. After running a relative variable
importance analysis, the authors conclude that optical RS imagery and land-cover
information play the most important roles in determining soil moisture content, but that
SAR imagery and soil data also contribute significantly to the model’s overall performance.
This finding echoes other studies’ results ([95,161,182]) showing that the combination of
optical and SAR data improves predictive outcomes.
(3) What input for what algorithms (feature importance): This section is separate
from feature engineering in that it is less concerned with computing new features from
existing data than with determining which input variables contribute to model learning.
In [58], the random samples extracted from the training pool along with RS-derived
features and climate variables were then used to train ecoregion-stratified RF classifiers for
pixel-level classification. Evaluation of feature importance indicated that Landsat-derived
features played the primary role in classification in relatively arid regions while climate
variables were important in the more humid eastern states.
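Importance scores like those in [58] are usually model-specific (e.g., RF impurity importance), but the general idea can be sketched model-agnostically with permutation importance; the toy model and data below are purely illustrative, not the stratified workflow of [58]:

```python
import random

def permutation_importance(predict, X, y, n_features, metric, seed=0):
    """Importance of feature j = drop in score after shuffling column j."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(baseline - metric(y, [predict(row) for row in X_perm]))
    return importances

# Toy model that uses only feature 0, so feature 0 should dominate.
predict = lambda row: 1 if row[0] > 0.5 else 0
accuracy = lambda truth, pred: sum(t == p for t, p in zip(truth, pred)) / len(truth)

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [predict(row) for row in X]
imps = permutation_importance(predict, X, y, n_features=2, metric=accuracy)
print(imps)
```

Because the toy labels depend only on feature 0, shuffling that column sharply degrades accuracy, while shuffling the unused feature changes nothing.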
To investigate how best to identify impervious materials in RS imagery regardless
of cloud cover, [182] combine nighttime light, DEM, and SAR data with an RF model on
GEE. Their resulting maps are more accurate than commonly used maps such as GlobeLand30.
More importantly, though, the authors quantitatively show that using multiple sources of
data is better than using single sources for this task; optical data are the most important, but
SAR data improve accuracy rates across all metrics. In future studies, more work like this
needs to be done so that researchers can save time and effort by knowing which data will
be useful for a task beforehand. The authors in [178] compare different combinations of
input data and their impact on model performance. For their application, Landsat 8 data
serve as better input than Landsat 7 alone or Landsat 7 data with computed indices like
NDVI. Having access to datasets like the one produced by [178] will make it much easier
for future researchers to create more accurate building detection models, either by allowing
researchers to add to this dataset and training ML models or by using it as one of several
other datasets incorporated into the same analysis.
It is important not only to be able to map the current state of wetlands vegetation, but
how that vegetation is changing over time. However, different sets of input data and ML
methods used for change detection of wetland vegetation need to be evaluated more fully
as choices made during preprocessing and hyperparameter tuning can affect the end result
of an analysis. The authors in [138] use an adaptive stacking algorithm to train an ML
classifier on optical, SAR, and DEM data to identify wetland vegetation. Adaptive stacking
uses one ML classifier to identify the optimal combination of ensemble classifiers and
hyperparameters to be used for a given task. In this case, the authors use an RF model to
determine the best combination of the CART, Minimum Distance (MD), NaiveBayes (NB),
RF, and SVM classifiers on GEE. The authors find that the adaptive stacking method is
much more accurate than the RF and SVM models alone. The resulting classification map
is then combined with a trend analysis performed by the LandTrendr algorithm, which
allows them to identify wetland vegetation distribution as it is now and also how it has
changed over time. The authors in [138] also test their workflow on different subsets of
input data and show that adding more data helped the adaptive stacking algorithm learn
better (the best combination of input data was all of the data). The authors note that forest
and reed classes were not identified well with their adaptive stacking algorithm, and that
the LandTrendr algorithm will most likely need to be re-tuned in different environments.
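The adaptive stacking in [138] is considerably more elaborate, but the core idea of combining several classifiers' outputs can be illustrated with the simplest ensemble rule, majority voting; this is a conceptual sketch with hypothetical labels, not the authors' algorithm:

```python
from collections import Counter

def majority_vote(labels):
    """Combine one sample's labels from several classifiers by simple majority."""
    return Counter(labels).most_common(1)[0][0]

# Labels from three hypothetical classifiers (e.g., CART, RF, SVM) per sample.
per_sample = [["water", "water", "reed"], ["forest", "reed", "reed"]]
print([majority_vote(labels) for labels in per_sample])  # -> ['water', 'reed']
```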
The authors in [15] integrated single-date features with temporal characteristics from
six time-series trajectories (i.e., two Landsat shortwave infrared bands and four vegetation
indices), to produce an intact-disturbed forest map to track degraded forests. The whole
processing pipeline is done on GEE using an RF. The authors also ran a relative variable
importance analysis for each ecoregion. The authors show that past maps are somewhat
outdated due to their inability to separate forest classes into intact and degraded,
although their results vary from ecoregion to ecoregion. The purpose of the study in [21]
was to determine how the inclusion or exclusion of data for training RF models with RS
and temporally variable climate variables influences model outcomes. Cloud computing on
GEE was utilized in [35] to create an open-source, reproducible map of wetland occurrence
probability using LiDAR and RS data for the entire province of Alberta. Using a BRT, the authors are
able to match a current governmental effort in Alberta while also producing a relative
variable importance showing which RS variable might be the most useful for future wetland
mapping efforts in the area.
The authors in [55] used a CNN–LSTM hybrid model to predict soybean yield in the
contiguous United States using RS imagery alongside weather data and show that the
hybrid approach works better than either CNN or LSTM alone, although the results were
better in some states than others. Additionally, the authors create combinations of input
data to determine which variables are most important in training their NN.
A low-cost method was demonstrated in [107] for monitoring industrial oil palm plan-
tations in Indonesia using Landsat 8 imagery that allowed them to distinguish between
oil palm, forest, clouds, and water classes using the CART, RF, and MD algorithms. Their
results demonstrated that CART and RF had higher OA and Kappa coefficients than the
MD algorithm. In addition, the authors of [107] compared model accuracy based on different
combinations of spectral bands (particularly red-green-blue (RGB) and infrared bands,
including shortwave infrared (SWIR), thermal infrared (TIR), and near-infrared (NIR)), as
well as all bands combined, to determine which would help specifically with oil palm
plantation monitoring.
The authors in [136] used a specific invasive species in China as a case study for
developing an ML pipeline that takes into account both cloud cover and phenological
information. They compared the ability of a stacked autoencoder and an SVM to classify
vegetation types. While the SVM was trained on GEE, the DL model had to be trained
offline as the platform does not currently support DL models. The authors find that the DL
model performs better than the SVM and that both models perform better with phenological
information. The same species of plant can look different at different stages of its life while
also being submerged under water in some RS scenes. The authors in [140] argue that
phenology information in RS time series can better capture tidal flat wetland vegetation,
and so they compare phenological features to statistical (min, max, median) and temporal
features (quartile ranges). They then feed these data into an RF while analyzing their effect on
model performance during different periods of time (all data, green and senescence seasons)
for wetland vegetation classification. The authors showed that the phenological information
was the most important input feature to the RF, while combining all three sets of features
led to the highest accuracy. In addition, the model performed best when predicting over
both the green and senescence periods, most likely providing the model with a better
estimate of the total variance needed to identify wetland vegetation. More research like
this should be done to isolate the importance of individual input features and time periods
almost the exact same method (this time for building detection), though the final ensemble
is chosen via a manual weighting process.
algorithm may not be appropriate for RS images or for RS images in a specific domain. We
call for AI and RS researchers and engineers to develop novel, robust, and, ideally,
computation-optimized CV/ML methods for RS-image processing, towards the smooth
and robust integration of GEE and AI.
(2) Reimplementing and/or optimizing (both classic and state-of-the-art) CV/ML
methods on GEE: The authors in [107] pointed out a need for more and better algorithms
on the GEE platform. The authors in [141] implemented GPR, which is increasingly used
because it is a transparent ML model that also outputs model uncertainties. The method
in [141] targets green Leaf Area Index (LAI) retrieval from RS imagery and has been
tailored to GEE. First, they created the model so that it can run on vector or tensor time-
series imagery. Then, the authors used active learning (AL) for feature reduction, so that
the model learns only from important data and can run within GEE’s memory confines.
This GPR model is then used to gap-fill RS imagery focused on LAI,
meaning the model is able to “see” through clouded optical imagery. More work like this
should be done, either in creating new models to upload to the cloud that other researchers
can use, or in optimizing existing models so that they are memory-efficient and can thus
leverage GEE in the cloud, instead of requiring preprocessing and model training on local
computers or on Google Cloud AI. The authors mentioned that better GEE code documentation
and error messages could help future researchers interested in developing custom ML models
for the platform (detailed in Section 3.2.4).
(3) DL with GEE: DL models are not currently available on the GEE platform (Section 4.1.2).
However, some authors [69,151,225,227,228] have found an interesting workaround that allows
them to use NN models directly in the cloud. All of these authors first train an NN model
outside GEE, and then upload the weight matrices as data files that can be read by the JavaScript
or Python development environments. Then, it is necessary to implement each layer in the
network (convolutional layers, activation layers, etc.), so that imagery can be run through the
NN at inference time to produce predictions. This method has worked across domains like water
extraction, cloud detection, and crop mapping. Still, there are several caveats to this approach.
First, researchers need to have access to the compute needed to train the NN model in the first
place. Often researchers are drawn to GEE because of the freely available compute, so this
method is mainly geared towards those looking specifically to use NNs. Researchers also need
to know how to implement and test different layers in an NN, a task that many EO researchers
may not have the experience for. Lastly, none of the authors listed above implemented the full
training process on GEE (e.g., forward and backpropagation).
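Conceptually, the workaround amounts to exporting trained weight matrices and re-implementing the forward pass layer by layer. A NumPy sketch of the idea follows; the random values stand in for trained weights, and real GEE implementations express these operations with ee.Image arithmetic rather than NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # stand-ins for trained weights
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # exported as data files in practice

def forward(x):
    """Re-implement each layer explicitly: dense -> ReLU -> dense -> softmax."""
    h = np.maximum(x @ W1 + b1, 0.0)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

pixels = rng.normal(size=(5, 4))  # five pixels with four spectral features
probs = forward(pixels)
print(probs.shape)  # -> (5, 2): class probabilities per pixel
```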
Novel model architectures: Both [72,147] used the GEE platform to download and
process data with which they could then use to train novel NN models. The authors in [147]
trained a CNN called DeepWaterMapv2 that can handle flexible input sizes of optical RS
imagery and evaluate images with a constant runtime. Additionally, their CNN can filter out
clouds to fill in obstructed scenes and predict where water is with high accuracy. The authors
in [72], on the other hand, used both optical and SAR data from GEE to train a 3D U-Net
model for crop-type classification. The 3D CNN architecture shows an improvement over the
more traditionally used 2D convolution operations. Neither author used GEE itself for the
DL part of their analysis, because NN models are not currently supported on GEE. However,
their research shows that GEE makes it easy to locate data for a variety of applications.
Transfer learning (TL): TL is one powerful technique that makes models trained on
large sets of data and compute available for applications without these resources. TL was
initially proposed in [249] and has received significant attention due to recent advances
in DL [250–255]. Inspired by humans’ capabilities to transfer knowledge across domains
(e.g., the knowledge gained while learning violin can be helpful to learn piano faster), the
main idea behind TL is that it is more efficient to take a DL model trained on an (unrelated)
massive image dataset (e.g., ImageNet [256]) in one domain, and transfer its knowledge to
a smaller dataset in another domain instead of training a DL classifier from scratch [257]. A
major assumption in many ML and DL algorithms is that the models will generalize to new,
unseen data given that it is from the same feature space and distribution [258], and that
there are universal, low-level features shared between datasets for different applications.
However, this assumption does not hold for many real-world problems. For example,
it is not uncommon that a classification task in one domain lacks sufficient data, but a
very large set of training data is available in another domain, where the data may be in a
different feature space or follow a different data distribution. In such situations, knowledge
transfer, if done successfully, would greatly boost the learning performance by avoiding
expensive and labor-intensive data-labeling efforts [250]. The authors in [71] showed that
TL works best when they use a U-Net to map sugarcane in Thailand, meaning that the
pre-trained weights resulted in the highest accuracy, F1-score, precision, and recall. More
work should be done towards evaluating the effectiveness of TL within the EO studies as
it could potentially save large amounts of compute from not having to constantly train
DL models from scratch. The authors note that their model does not take into account
phenological information, which would have required changing the NN architecture, but
that this is an area for future research using their method.
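The freeze-and-fine-tune recipe at the heart of TL can be sketched with a toy NumPy example: a frozen random matrix stands in for a pretrained feature extractor, and only a small logistic-regression head is trained on the target-domain data. This is purely illustrative, not a real pretrained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" extractor (in real TL, weights from e.g. an ImageNet model).
W_pre = rng.normal(size=(10, 6))
extract = lambda X: np.maximum(X @ W_pre, 0.0)

# Small target-domain dataset whose labels depend on the frozen features.
X = rng.normal(size=(200, 10))
y = (extract(X) @ rng.normal(size=6) > 0).astype(float)

# Fine-tune only a lightweight head on top of the frozen features.
feats = extract(X)
w = np.zeros(6)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-feats @ w))    # sigmoid
    w -= 0.1 * feats.T @ (p - y) / len(y)   # gradient step on log-loss

acc = ((feats @ w > 0).astype(float) == y).mean()
print(round(acc, 2))
```

Because only the six head weights are trained, the data and compute requirements are a small fraction of training the full network from scratch, which is exactly the appeal of TL.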
and thus need annotation through an uncertainty selection strategy (see [1] for a detailed
introduction about the selection strategy).
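Such an uncertainty selection strategy can be as simple as ranking unlabeled samples by how close their predicted probability is to the decision boundary; a minimal sketch for a binary classifier:

```python
def uncertainty_sample(probs, k):
    """Indices of the k samples whose class probability is closest to 0.5."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# Model probabilities for five unlabeled samples; the least confident are chosen.
probs = [0.95, 0.48, 0.10, 0.52, 0.80]
print(uncertainty_sample(probs, 2))  # -> [1, 3]
```

The selected samples would then be sent for annotation, concentrating labeling effort where the model is least certain.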
While DL receives a great deal of attention, these models still require large amounts of
input data and compute to train. However, even as compute becomes publicly available
through cloud-based platforms such as GEE, obtaining large amounts of labeled training data
remains a key bottleneck to using DL models. One novel way to make the data-labeling process less
time- and resource-intensive was illustrated in [156], where the authors used current water
maps and a segmentation algorithm to automatically collect data labels from Sentinel-1
imagery. These data are then used to train variations of U-Net in an offline environment.
Due to computational constraints, the authors were not able to compare their model to more
traditional ML models like an RF. Even with their automated data labeling pipeline, the
authors note that their study lacked sufficient data to adapt their method to more than one
country and manual validation was still necessary to validate the model post-prediction.
detecting archaeological mounds. They then used an edge detection algorithm after the
supervised classification to automatically digitize/vectorize boundary features. Obtaining
an accuracy score before digitizing boundaries can give a higher level of confidence in
using the resulting dataset in future studies.
5. Conclusions
To leverage RS big data for large-scale challenges such as global climate change, both
intelligent methods and computation-supportive cloud platforms (including cloud storage
of huge RS datasets) are critical. GEE is a pioneering platform with great potential to
support both needs (i.e., AI methods and a cloud computing platform).
Yet to date, many application domains (Section 3) still remain at the proof-of-concept
stage regarding leveraging GEE and AI. This trend may relate to a steep learning curve
for researchers. Overall, based on our systematic and interactive (Appendix A) review,
we contend that GEE integrated with AI has great potential to provide a collaborative
and scalable platform for researchers, practitioners, and policymakers to solve critically
important problems in various areas. However, many challenges, and thus opportunities,
still remain for a deeper and more seamless integration of GEE and AI. This is especially
true of the integration between DL and the GEE platform, which is detailed in Sections 4.2
and 4.3. Up to now, to take advantage of DL with GEE, the time-consuming training process
still has to take place outside GEE. Researchers and practitioners either have to train DL
models offline on local computers or on a separate cloud computing platform (e.g., Google
Cloud AI), which is often not freely available to the public. In summary, the deeper and
smoother integration of GEE and AI has considerable potential to address major scientific
and societal challenges such as climate change and natural hazards risk management.
Author Contributions: All authors have contributed to this review paper. L.Y. initiated the review,
contributed to writing and overall organization, identified selected research to include in the review,
supervised the web app design and development, and coordinated input from other authors. J.D.
took the lead on identifying relevant literature, contributed to writing and editing the text, and
provided the data for the accompanying interactive web app. S.S. contributed to the web app design
and development, word clouds visualization, and editing. Q.W. contributed to identifying selected
research to include in the review and in writing part of Section 3. H.C. contributed to writing part
of Section 3 and editing the whole manuscript. C.D.L. has contributed to editing. All authors have
revised the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This material is partly based upon work supported by the US National Aeronautics and
Space Administration under Grant number 80NSSC22K0384, and by funding from the College of
Arts and Sciences at the University of New Mexico.
Acknowledgments: The authors are grateful to Gordon Woodhull for his useful UI/UX design
discussion. The authors are also grateful to the three reviewers for their useful suggestions.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations (ordered alphabetically) are used in this article:
ACCA Automated Cloud Cover Assessment
ADL Active Deep Learning
AEZ Agro-Ecological Zone
AI Artificial Intelligence
AIM-RRB Annual Irrigation Maps—Republican River Basin
AL Active Learning
ALOS Advanced Land Observing Satellite
ANN Artificial Neural Network
APEI Air Pollutant Emissions Inventory
API Application Programming Interface
ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer
AVHRR Advanced Very High Resolution Radiometer
AWS Amazon Web Services
AW3D30 ALOS World 3D—30 m
BCLL Biodiversity Characterization at Landscape Level
BELMANIP2 Benchmark Land Multisite Analysis Intercomparison Products 2
BFAST Breaks for Additive Season and Trend
BGT Bagging Trees
BRT Boosted Regression Tree
BST Boosted Trees
BT Bagged Trees
CART Classification And Regression Tree
CCI-LC Climate Change Initiative Land Cover
CBERS China–Brazil Earth Resources Satellite
CBI Composite Burn Index
CDL Cropland Data Layer
CDOM Chromophoric Dissolved Organic Matter
CGD Crowdsourced Geographic Data
CGLS-LC100 Copernicus Global Land Service Land Cover Layer (100 m)
CHELSA Climatologies at High Resolution for the Earth’s Land Surface Areas
Chl-a Chlorophyll-a
Colab Google Colaboratory
Appendix A. The Accompanying Interactive Web App Tool for the Literature of GEE
and AI
In Sections 1.1 and 3.1, we provided a brief map and graphic summary of the 200 papers
covered in this review. To allow readers to search for literature relevant to their
research interests and to gain more useful and dynamic insights from the papers reviewed,
we have developed an interactive web app called iLit4GEE-AI
(https://geoair-lab.github.io/iLit4GEE-AI-WebApp/index.html (accessed on 1 May 2022)).
On our site, you will find:
• A brief web app demo video: the video link is accessible at the web app
page (top-right corner);
• Acronyms that are used in the data table of the web app, as well as explanations for
each data field and chart (also in the top-right corner);
• A plan to continuously update and maintain the web app: To better serve the RS/GEE
researcher and practitioner
community, as well as AI engineers who would like to contribute to RS and GEE, we will
continue to update the data to include new GEE + AI literature as it is published. Even
after this paper is published, we hope this web app will serve as one place to keep track
of a comprehensive and up-to-date list of GEE + AI literature. In the future, the data on
the web app will be maintained and continually updated by the members of the GeoAIR
Lab (Geospatial Artificial Intelligence Research and Visualization Laboratory). Our web app is
data-driven and scalable (i.e., once data gets updated, the web app will automatically
sync and update the visualization and filtering functions on the site).
only on open-source data and releasing their code for the GEE platform. The authors
were also able to distinguish between crop subtypes like agriculture and agroforestry, a
common problem for many cropland data products. In addition, [67] showed that across
regions, NDVI, NDWI, and slope were good predictors for various crop labels while blue
and SWIR1 were not. While the authors achieved good results across a wide area, their
processing pipeline and thus results relied on relatively cloud-free Landsat data. In the
future, a harmonized Landsat-Sentinel data product would increase data availability and
improve results further. Lastly, the authors noted that while the data gathering process
was time- and resource-intensive, future projects that crowdsource or pool data products
together would save time and effort.
Over a three-year time period, the authors in [75] were able to map paddy rice using
Sentinel imagery by utilizing several different spectral indices and creating composites of
different paddy rice growth periods. Their results were highly accurate in three separate
areas. The authors shared their code on GEE, while also showing that their open-source
analysis showed good agreement with maps previously produced by government agencies.
However, the authors noted that their method still depended on finding cloud-free optical
RS imagery and/or adequate cloud masking algorithms. In [68], the authors proposed
a paddy rice area extraction approach using a combination of optical vegetation
indices and SAR data. The Sentinel-1A SAR and the Sentinel-2 MSI Level-2A imagery were
used to identify paddy rice. Three vegetation indices, namely NDVI, EVI, and land surface
water index (LSWI), were estimated from optical bands. Two polarization bands from
Sentinel SAR imagery were used as a supplement to overcome the cloud contamination
problem. This approach was applied with the RF algorithm to the Jianghan Plain in China as
an experimental area. The authors in [71] used a U-Net to map sugarcane in Thailand
but used a lightweight NN as an encoder for the DL model to reduce compute costs. They
tested the network architecture using the RGB channels and pre-trained weights, RGB
channels and randomly initialized weights, and then randomly initialized weights while
using the RGB and NIR channels. Because DL models were not supported by GEE at the time,
the authors used Google Cloud, GEE, and the Google AI Platform together to preprocess
their data and train their models. They showed that transfer learning works the best (i.e.,
the pre-trained weights resulted in the highest accuracy, F1-score, precision, and recall).
The authors noted that their model did not take into account phenological information,
which would have required changing the NN architecture, but that this was an area for
future research using their method.
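The optical indices used in the paddy rice workflow above (NDVI, EVI, LSWI) have standard closed forms; a minimal sketch over surface-reflectance band values, where the toy reflectance values are assumptions:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    """Enhanced Vegetation Index (standard coefficients)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def lswi(nir, swir1):
    """Land Surface Water Index, from NIR and SWIR1 reflectance."""
    return (nir - swir1) / (nir + swir1)

# Toy surface-reflectance values on a 0-1 scale (illustrative only).
nir, red, blue, swir1 = 0.4, 0.1, 0.05, 0.2
vals = (ndvi(nir, red), evi(nir, red, blue), lswi(nir, swir1))
```

In a GEE workflow these same expressions would be applied per-pixel with band math on Sentinel-2 image collections; the functions above work equally on scalars or numpy arrays.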
Shade-grown coffee landscapes are critical to biodiversity in the forested tropics, but
mapping it is difficult because of mountainous terrain, cloud cover, and spectral similarities
to more traditional forested landscapes. The authors in [50] used Landsat, precipitation,
and DEM data to map shade-grown coffee in Nicaragua using a RF model. The authors
reported high accuracy scores across different land class types (including shade-grown
coffee), and also analyzed the relative variable importance of the inputs to determine which
data contributed most to the RF model's performance. More specifically, [50] performed an
ablation study in which they compared model performance while increasing the number of
features the model saw.
They found that elevation was the most important factor, followed by the correlation
between precipitation and NDVI, temperature, and slope, and seasonal information helped,
as well. The authors noted that high-resolution data would help boost accuracy metrics
in this classification task, but that increasing accuracy did not directly relate to increased
socio-cultural or economic relationships in the region of study. The authors in [57] mapped
corn at a 10-m resolution using multitemporal SAR and optical images. Certain metric
composites were calculated, including monthly composites and percentile composites for
Sentinel-1 images and percentile and interval mean composites for Sentinel-2 images, which
were used as input to the RF algorithm on the GEE platform. To avoid speckle noise in
the classification results, the pixel-based classification result was integrated with the object
segmentation boundary completed in eCognition software to generate an object-based
corn map according to crop intensity. In [78] the authors explored the differences between
Landsat and Sentinel imagery for identifying cotton in China over the course of the plant’s
life cycle. They found that Landsat data performed slightly better than Sentinel optical
imagery, perhaps due to compute constraints on GEE: not all of Sentinel's input bands could
be used and vegetation indices could not be calculated, so Sentinel's full potential may not
have been exploited. However, for the three years of RS data analyzed in
the analysis, the authors only used Sentinel imagery for one year, making the results for
the two datasets not directly comparable. Importantly, though, the authors examined the
types of error that different input datasets made, finding for example that small dirt roads
were more distinguishable from cotton fields in Sentinel imagery than in Landsat imagery.
The authors in [66], showed that by using climate and soil data with RS imagery on
the GEE platform, it was possible to predict winter wheat yields 1–2 months ahead of
harvesting in China. The authors utilized many ML algorithms available on GEE and
compared specific time windows for phenological analysis and found that the closer the
data came to planting and harvesting time, the better the ML models performed. Still,
uncertainties from data resolution and human activity were present and affected the ability
of models to predict with high accuracy across agricultural zones.
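The time-window comparison described above amounts to summarising the same phenological series over windows that end at different points in the season; a toy sketch, where the window statistics and example NDVI values are assumptions:

```python
import numpy as np

def window_features(monthly_ndvi: np.ndarray, end_month: int, length: int) -> np.ndarray:
    """Summarise a growing-season NDVI series over a window that ends at
    `end_month` (1-based) and spans `length` months: mean, max, std.
    These summaries would feed a yield-prediction ML model."""
    w = monthly_ndvi[end_month - length:end_month]
    return np.array([w.mean(), w.max(), w.std()])

season = np.array([0.2, 0.3, 0.5, 0.7, 0.8, 0.6])   # six toy monthly values
early = window_features(season, end_month=3, length=3)  # early-season window
late  = window_features(season, end_month=6, length=3)  # window near harvest
```

Features from the later window capture peak greenness, which is consistent with the finding that windows closer to harvest yielded better model performance.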
Crop maps are often created using vegetation indices and field observation data. The
authors in [73] argued that this may lead to datasets and ML models that can only predict
in specific areas and not generalize up to larger areas (i.e., regions or countries) or to other
time periods in the same area. They further argued that what is needed is a more generalized
method that can take in information like weather and climate data or DEM data and scale up
to field-level predictions or larger. The authors compared a RF to three different DL models
(a DNN, a 1D CNN, and an LSTM) for predicting wheat yield in China. The DNN and RF performed
the best over large areas, and the RF model often had the best performance. This is important
to note because RFs often have comparable or better performance than DL models but use
much less compute to train. However, this result could be due to the small size of the authors'
dataset, meaning that the DL models were not able to train on enough data to merit their use.
The authors ran a variable feature importance with the RF model across different years and
months within their data and showed that elevation, latitude, soil, and vegetation indices
were the most important input data while weather and climate data were the least important.
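Variable importance analyses of the kind reported above can be reproduced with scikit-learn's RF implementation; a sketch on synthetic data, where the feature names and the synthetic input-output relationship are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
# Synthetic predictors: elevation and NDVI drive yield; "climate_noise" does not.
elevation = rng.uniform(0, 1, n)
ndvi = rng.uniform(0, 1, n)
climate_noise = rng.uniform(0, 1, n)            # irrelevant by construction
yield_t = 3.0 * elevation + 1.0 * ndvi + rng.normal(0, 0.05, n)

X = np.column_stack([elevation, ndvi, climate_noise])
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, yield_t)
# Gini-style importances, normalized to sum to 1.
ranking = dict(zip(["elevation", "ndvi", "climate_noise"], rf.feature_importances_))
```

The importances recover the constructed ordering (elevation > NDVI > noise), mirroring how [73] ranked elevation and vegetation indices above weather and climate inputs.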
The authors in [76] utilized GEE, Sentinel-2, and field data to train a RF to first estimate
LAI and FPAR at a much finer spatial scale. Their LAI and FPAR maps matched well with
field observations and, when spatially aggregated to match the resolution of the MODIS
LAI/FPAR product, were in good agreement there, too. However, their method was based
on the assumption that land cover classes were static over the three-year time span, meaning
that future work could potentially boost the accuracy of the method by verifying that land
cover did not in fact change over this period.
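Spatially aggregating a fine-resolution map to a coarser product grid, as done above to compare against the MODIS LAI/FPAR product, is block averaging; a minimal numpy sketch (the aggregation factor and toy grid are illustrative):

```python
import numpy as np

def aggregate(fine: np.ndarray, factor: int) -> np.ndarray:
    """Spatially aggregate a fine-resolution grid by averaging
    `factor` x `factor` blocks (e.g., a fine LAI map up to a coarser grid)."""
    h, w = fine.shape
    assert h % factor == 0 and w % factor == 0
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

fine = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 fine-resolution map
coarse = aggregate(fine, 2)                      # 2x2 coarse map of block means
```

Each coarse cell is the mean of its underlying fine-resolution block, which is the simplest aggregation scheme; real products may weight by pixel area or mask invalid pixels.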
The authors in [49] produced annual irrigation maps (1999–2016) in the US Northern
High Plains by combining all available Landsat satellite imagery with climate and soil
covariables in a RF classification workflow. In total, 9 Landsat variables and 11 covariables
were generated for use in the machine learning classification. To understand the relative
contribution of input variables to classification accuracy, permutation tests and GINI Index
metrics were run in R with an identically parameterized classifier since GEE did not output
variable importance measures at the time of this study. Two novel indices that integrate
plant greenness and moisture information ranked highest for both importance metrics used,
warranting further study for use in irrigation classification in other agricultural regions.
Statistical modeling suggested that precipitation and commodity price influenced irrigated
extent through time. This method relied on manually produced training and test datasets
well suited to identify areas where irrigation clearly enhances greenness. The authors
in [51] implemented an automatic irrigation mapping procedure in GEE that uses surface
reflectance satellite imagery from different sensors (Landsat 7/8, Sentinel-2, MODIS Terra
and Aqua imagery, SRTM DEM). The approach combined, in a novel way, unsupervised
object-based image segmentation, unsupervised pixel-by-pixel classification, and multi-
temporal image analysis to distinguish productive irrigated fields from non-productive and
non-irrigated areas. The combination of these techniques enabled the detection of irrigated
areas without requiring any reference cropland data for training of the mapping algorithm.
The authors in [58] developed a rapid method to map Landsat-scale (30 m) irrigated
croplands across the conterminous United States (CONUS). The method was based upon
an automatic generation of training samples for most areas based on the assumptions
that irrigated crops appear greener than non-irrigated crops and have limited water stress.
Two intermediate irrigation maps were generated by segmenting Landsat-derived annual
maximum greenness and Enhanced Vegetation Index (EVI) using county-level thresholds
calibrated from an existing coarse resolution irrigation map. The random samples extracted
from the training pool along with RS-derived features and climate variables were then
used to train ecoregion-stratified RF classifiers for pixel-level classification. Evaluation
of feature importance indicated that Landsat-derived features played the primary role in
classification in relatively arid regions while climate variables were important in the more
humid eastern states.
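The automatic training-sample generation step above can be sketched as thresholding annual maximum greenness; the 0.7 margin, the -1 "unlabeled" convention, and the toy values are assumptions, not the calibrated county-level thresholds used in [58]:

```python
import numpy as np

def irrigation_pseudolabels(max_evi: np.ndarray, county_threshold: float) -> np.ndarray:
    """Label pixels whose annual maximum greenness exceeds a county-level
    threshold as candidate irrigated training samples (1), clearly drier
    pixels as non-irrigated (0), and leave ambiguous pixels unlabeled (-1)."""
    labels = np.full(max_evi.shape, -1, dtype=int)
    labels[max_evi >= county_threshold] = 1
    labels[max_evi < 0.7 * county_threshold] = 0  # the margin is an assumption
    return labels

max_evi = np.array([0.80, 0.55, 0.30, 0.62])      # toy annual max EVI per pixel
labels = irrigation_pseudolabels(max_evi, county_threshold=0.6)
```

Samples drawn from such a pseudo-labeled pool (skipping the -1 pixels) could then train per-ecoregion RF classifiers, as in the workflow described above.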
The authors in [46] compared several algorithms on the GEE platform, CART, IKPamir,
logistic regression, a MLP, NB, RF, and an SVM, for crop-type classification in Ukraine.
The authors also used an ensemble NN but had to move off the GEE platform since NNs
were not supported at the time. The ensemble NN performed the best out of all the models,
although the authors noted that the SVM algorithms were not working on the GEE platform.
To that end, the authors found that in general the algorithms on GEE were not very flexible,
and some preprocessing steps like dealing with missing data were difficult to implement,
so all preprocessing steps took place outside of the GEE platform. The authors found
that atmospherically corrected Landsat data boosted model performance more than when
models were fed Landsat composite data. The authors in [46] suggested that, in the future,
optical imagery in conjunction with SAR data, or combining data from multiple RS platforms,
would help boost performance. The authors in [72] combined optical and SAR Sentinel data to create
higher-resolution maps capable of displaying information on less commonly mapped non-
staple crops in the US. First, the authors denoised their SAR data with a CNN, and then
fused this with optical RS imagery. These data were then used to train a RF, as well as three
separate DL models: SegNet, U-Net, and a 3D U-Net. The authors showed that fusing
optical and SAR data worked better than using optical data alone, that using denoised SAR
data in the fusion process led to higher accuracy scores, and that the best model was the
3D U-Net model trained on the optical-denoised SAR fused data. However, an interesting
finding was that the RF performed best when using optical information alone. The
authors trained their DL models offline, as NNs were not supported on GEE at the time.
The authors mentioned that the extremely high accuracy rates of the 3D U-Net model
might indicate overfitting, and that when taking into account required training times, the
RF model performed well while using the least amount of compute across all datasets.
Lastly, this paper used semantic segmentation, but future research in the field should
investigate instance segmentation. Optical imagery is used in many EO analyses because it
is comparable to how humans see; we can easily understand it. However, it is often blocked
by clouds, limiting its utility. SAR imagery works day or night regardless of cloud cover,
so [74] used it for crop classification while testing input composite image length and ML
classification performance. The authors compared an object-oriented classification method
combining the SNIC algorithm with a RF with that of a pixel-based method of just the RF
by itself. The authors found that adding SNIC to their processing routines smooths the
data before it was fed into the RF model, ultimately boosting accuracy rates more than 10%
in their study. They also showed that shorter time periods were more useful for making
composites for classification, most likely because plants look very different over the course
of a growing season. However, the authors noted that their method worked better for
larger cropland areas and might not generalize to other areas with smaller field sizes. The
authors in [56] compared the performance of an ANN to CART, RF, and SVM models on
GEE for sugarcane mapping in China using Sentinel-2 imagery. The authors identified that
the SVM performed the best, but then went on to show which type of errors each model
made. For example, the ANN tended to overfit the data and give too much preference
to the sugarcane class, while tree-based models confused the forest and water classes. The
authors then incorporated NDVI information into the SVM to show how the model did
with this extra information. To improve the results in this study, the authors identified that
using SAR data would be helpful in removing the impact that shadows have on classification
errors. The authors in [19] created an open-source map for several West African countries
using a RF model trained on Landsat data. Their map was moderately more accurate than
other maps produced for the region, and it went further by demonstrating how feature
importances differed between wet and dry seasons across the countries analyzed.
The authors used GEE for processing data but needed to train their model offline because
the GEE RF model implementation was not flexible enough for their analysis. Papers like
this one show a trend that GEE is facilitating in that researchers now have freely available
compute and are moving away from local, small-scale classifications and towards regional,
national, and even global classification tasks.
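Per-class error analyses like the one in [56] start from a confusion matrix; a from-scratch sketch, where the toy class names and labels are assumptions:

```python
import numpy as np

def confusion(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Confusion matrix with rows = reference labels, columns = predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels: 0 = sugarcane, 1 = forest, 2 = water.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 1])
cm = confusion(y_true, y_pred, 3)
producer_acc = np.diag(cm) / cm.sum(axis=1)  # per-class omission behaviour
user_acc     = np.diag(cm) / cm.sum(axis=0)  # per-class commission behaviour
```

Inspecting which off-diagonal cells dominate reveals exactly the kind of class-specific confusions (e.g., forest vs. water) that the overall accuracy score hides.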
The authors in [48] developed and implemented an automated cropland mapping
algorithm (ACMA) using MODIS 250-m 16-day NDVI time-series data. A web-based in
situ reference dataset repository was first developed to collect ground data through field
visits, very high spatial resolution data (sub-meter to 5-m), and community crowdsourcing.
A comprehensive knowledge base was then established for Africa using
the web repository. Second, clustered classes from each of the eight agro-ecological zones,
generated using the k-means algorithm, were grouped together through quantitative spectral
matching techniques (QSMTs) and the group of similar cluster classes was matched with
the ideal spectra to identify and label classes. This process produced a reference cropland
layer for the year 2014 (RCL2014) for the entire African continent consisting of five crop
products (cropland extent and areas; irrigated versus rainfed croplands; cropping intensi-
ties; crop type and/or dominance; croplands versus cropland fallows). Third, decision tree
(DT) algorithms were established for the eight agro-ecological zones (AEZs) based on the
RCL2014 knowledge base which was subsequently composed into an ACMA applicable
for the entire African continent. Finally, the ACMA algorithm was deployed on GEE and
applied on MODIS data from 2003 through 2014 to produce annual ACMA generated
cropland layers. Agriculture and Agri-Food Canada (AAFC) has been responsible
for producing Annual Space-Based Crop Inventory (ACI) maps for Canada. The 30-m
ACI maps were created by applying a decision tree method to optical (e.g., Landsat) and
SAR data (e.g., Radarsat-2). With the goal of producing ACI maps more effectively and
efficiently, the authors in [69] developed an object-based method (i.e., Simple Non-Iterative
Clustering (SNIC)) for producing ACI maps based on Sentinel-1 SAR data and Sentinel-2
optical data. The GEE platform and an ANN were used to produce an ACI map for 2018.
The OA was reported at 77%. Even though the OA was slightly lower than that of the
AAFC’ ACI maps, the authors argued that their proposed GEE method is promising due to
its superior computational efficiency.
change-vector analysis in posterior probability space (CVAPS) and the best histogram
maximum entropy method for change detection, and further improved the accuracy of the
land-updating results in combination with NDVI timing analysis. Selecting western China
as the research area and using GEE’s JavaScript API interface, they obtained a 2014 land
map based on the ESA GlobCover 2009 dataset. A total of 1000 verification points were
selected for visual interpretation in Google Earth. A program with Node.js and JavaScript
was also developed to randomly generate validation points and an auxiliary rectangle.
The results of the transfer error matrix analysis showed that the overall accuracy of the
land map from the proposed CART-CVAPS-NDVI method was 78.6–88.2%. The authors
in [93] designed such a workflow on GEE for Iran using Sentinel-1 and -2 data and a RF
model and SNIC. With the ground-truth training samples available, the authors used SNIC
to segment land-use classes into objects while the RF model classifies them on the pixel
level. Afterwards, visual assessment was used to verify majority voting between the two
classifiers for 13 different land-use classes. While there was some confusion between similar
classes (e.g., water and marshland), this analysis resulted in a much higher resolution,
much more accurate land-use map of Iran than the 2016 map. However, the authors noted
that in some ways GEE limited their study: for example, SNIC was the only segmentation
algorithm on GEE. Additionally, because of computational limits on the platform, only so
many training samples can be included, and input features have to be chosen carefully
before feeding them to a ML model.
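The object-level reconciliation in [93], where pixel-based RF labels are smoothed within SNIC segments by majority vote, can be sketched as follows; the toy labels and segment ids are assumptions:

```python
import numpy as np

def object_majority(pixel_labels: np.ndarray, segment_ids: np.ndarray) -> np.ndarray:
    """Replace each pixel's label with the majority label of its segment,
    mimicking object-level smoothing of a pixel-based classification."""
    out = np.empty_like(pixel_labels)
    for seg in np.unique(segment_ids):
        sel = segment_ids == seg
        values, counts = np.unique(pixel_labels[sel], return_counts=True)
        out[sel] = values[np.argmax(counts)]   # modal class of the segment
    return out

labels   = np.array([1, 1, 2, 2, 2, 3])   # noisy pixel-level RF labels
segments = np.array([0, 0, 0, 1, 1, 1])   # SNIC-style segment membership
smoothed = object_majority(labels, segments)
```

The vote removes isolated pixel disagreements inside each object, which is also why object-based post-processing suppresses speckle-like classification noise.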
The authors in [83] utilized Landsat images available through GEE to map annual
land-use changes in China’s poverty-stricken areas. Landsat 8 images from 2013–2018 were
preprocessed and then used to compute spectral indices (e.g., NDVI, Normalized Difference
Built-up Index (NDBI), MNDWI). Night-time data were also included to improve the
extraction of built-up areas. A RF classifier was then trained and used to perform land-use
classification in poverty areas. The results revealed significant variations in land-use change
among the poverty areas in China. Some poverty areas had more intense construction
activities than others. The authors mentioned some limitations of GEE, for example, the
low computational efficiency of vector data. Uploading data to GEE or exporting data
from GEE can be time-consuming. The authors in [87] set out to create an open-source
land cover mapping processing pipeline using GEE. They argued that land cover maps
specifically can help countries properly plan for sustainable levels of food production, but
that many developing countries did not have the financial or compute resources to monitor
land classes in real time. Using SVM and bagged trees (BT) models, the authors predicted
urban, agriculture, tree, vegetation, water, and barren land-use types in Lesotho. However,
the authors had low accuracy rates across most classes. During the ML training process, the
authors ultimately had to leave the GEE platform because of "out-of-computation-time"
errors in the code editor.
The authors in [88] collected a multi-seasonal sample set for global land cover map-
ping in 2015 from Landsat 8 images. The concept of “stable classification” was used to
determine approximately how much reduction in training samples and how much land
cover change or image interpretation error would be acceptable. Using a RF algorithm with
200 trees, a numerical experiment showed that less than 1% overall accuracy was lost when
less than 40% of the total global training sample set were used, when 20% of the global
training sample points were in error, or even when the land cover changed by 20%. With this
knowledge in mind, the authors transferred their 2015 global training sample set at 30-m
resolution to 10-m resolution Sentinel-2 images acquired in 2017 and produced a 10-m
resolution global land cover map.
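The "stable classification" experiment above, checking how far the training set can shrink before accuracy degrades, can be mimicked on synthetic data; the cluster parameters are assumptions, and real land cover spectra overlap far more than this toy example:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Two well-separated spectral clusters stand in for two land cover classes.
X = np.vstack([rng.normal(0.2, 0.05, (500, 4)), rng.normal(0.6, 0.05, (500, 4))])
y = np.array([0] * 500 + [1] * 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

def accuracy_with_fraction(frac: float) -> float:
    """Train on a random fraction of the (shuffled) training pool."""
    n = int(len(X_tr) * frac)
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    rf.fit(X_tr[:n], y_tr[:n])
    return rf.score(X_te, y_te)

full = accuracy_with_fraction(1.0)
reduced = accuracy_with_fraction(0.4)   # 40% of the training pool
```

On separable data the 40% model loses essentially no accuracy, which is the qualitative behaviour the authors quantified at global scale.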
Feature engineering can lead to higher accuracies in EO analyses when using ML.
However, it is difficult to know beforehand which features will be useful to a model,
even with expert domain knowledge in a given area. Thus, the authors in [105] tested the
difference in model performance when using single image mosaics, time series RS imagery,
statistical features (median, standard deviation), band ratios, or all of the features listed.
They tested this by training a RF model on each subset of data to create LULC maps in Brazil.
The authors found that inputting a time series of the data was the most accurate, more
accurate even than when using all of the data. This research showed that more data was not
always better and that feature engineering did not always lead to better model performance
despite the increased compute cost. The authors in [92] trained several different ML models
available on GEE with different combinations of input data to determine which were the
most important in determining land-use types in Golden Gate Highland Park in China.
The authors compared combinations of different band ratios, elevation, aspect, and slope
data and found that including SWIR data in their analysis reduced classification errors
in areas with sparse vegetation. Different models were able to capture different land-use
types. For example, SVMs better distinguished between urban and agricultural lands,
while the RF model used was better at identifying forested landscapes, suggesting that
different types of models may be suitable for different tasks. Even though OA rates were
high for the best models, most models still had issues telling bare or rocky landscapes
apart from drier vegetation. The authors in [95] set out to compare the contribution of
SAR data and different indices (NDVI, EVI, SAVI, NDWI) derived from optical data on
overall classifier performance. They found that including SAR data moderately improved
performance, while only NDWI gave the ML model a significant performance enhancement.
The authors still struggled to classify vegetation subtypes like shrubs, grasslands, and
aquatic vegetation, but their accuracy rates matched those of common LULC maps like
Finer Resolution Observation and Monitoring of Global Land Cover 30 m (FROM-GLC30)
and GlobeLand30. This work contributed to a growing body of literature attempting to
empirically show which input data types can help identify which LULC classes using RS
and ML. The researchers in [98] generated a land cover map of the whole African continent
at 10 m resolution, using multiple data sources including Sentinel-2, Landsat-8, Global
Human Settlement Layer (GHSL), Night Time Light (NTL) Data, SRTM, and MODIS Land
Surface Temperature (LST). Different combinations of data sources were tried to determine
the best data input configurations. It was found that accuracy always increased
when new data were introduced. They also conducted an investigation of the importance
of individual features derived from a RF classifier. A transferability analysis experiment
was designed to study the influence of sampling strategies on the land cover mapping
performance. It was suggested that training samples of natural land cover classes should
be collected from areas covering each main Köppen climate zone for African land cover
mapping and other similar tasks. Different data sampling strategies and their effects on
how different ML classifiers performed on LULC tasks were compared in [101]. The authors
trained a Relevance Vector Machine (RVM) offline in addition to the CART, RF, and SVM
models on GEE. For their particular LULC application, stratified proportional random
sampling led to higher overall accuracy scores than stratified equal random sampling or
stratified systematic sampling and the RF model performed better than the CART, RVM,
and SVM. However, their study lacked ground truth data, so the authors needed to use
existing land cover maps for data collection purposes. As a result, even the best model (RF)
had trouble recognizing classes without many samples, leading to low class accuracies.
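The stratified proportional random sampling that performed best in [101] can be sketched as follows; the class sizes and the rounding rule are assumptions:

```python
import numpy as np

def stratified_proportional_sample(labels: np.ndarray, n_total: int, seed: int = 0):
    """Draw a sample whose class proportions mirror the label map, in
    contrast to stratified *equal* sampling (same count per class)."""
    rng = np.random.default_rng(seed)
    idx = []
    classes, counts = np.unique(labels, return_counts=True)
    for c, cnt in zip(classes, counts):
        n_c = round(n_total * cnt / labels.size)   # proportional allocation
        pool = np.flatnonzero(labels == c)
        idx.append(rng.choice(pool, size=n_c, replace=False))
    return np.concatenate(idx)

labels = np.array([0] * 800 + [1] * 150 + [2] * 50)   # imbalanced label map
sample = stratified_proportional_sample(labels, n_total=100)
```

The drawn sample preserves the 80/15/5 class balance of the map, but note that, as the study found, rare classes then receive very few samples, which depresses their class accuracies.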
The authors in [96] proposed a hybrid data balancing method, called the Partial
Random Over-Sampling and Random Under-Sampling (PROSRUS), to resolve the class
imbalance issue. PROSRUS used a partial balancing approach with hundreds of fractions
for majority and minority classes to balance datasets. The reference samples were generated
using visual interpretation of very high spatial resolution images of Google Earth. It was
observed that PROSRUS had better performance than several other balancing methods
and increased the accuracy of minority classes without a reduction in overall classification
accuracy. It was noted though that every dataset requires a specific balancing ratio to
obtain the optimal result because the imbalance ratios and complexity levels are different
for different datasets. It also showed that topographic data including elevation, slope, and
aspect had higher impacts than spectral indices in improving the accuracy of MLC maps.
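A simplified partial-balancing sketch in the spirit of PROSRUS follows; it is not the authors' exact procedure, since PROSRUS searches over many per-class fractions, whereas this stand-in uses a single target count:

```python
import numpy as np

def partial_balance(X: np.ndarray, y: np.ndarray, target: int, seed: int = 0):
    """Undersample classes above `target` and oversample (with replacement)
    classes below it. A simplified stand-in for the PROSRUS idea."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(y):
        pool = np.flatnonzero(y == c)
        if len(pool) > target:
            keep.append(rng.choice(pool, size=target, replace=False))  # undersample
        else:
            keep.append(rng.choice(pool, size=target, replace=True))   # oversample
    keep = np.concatenate(keep)
    return X[keep], y[keep]

X = np.arange(120).reshape(-1, 1)             # toy feature column
y = np.array([0] * 100 + [1] * 20)            # 100 majority vs 20 minority samples
X_b, y_b = partial_balance(X, y, target=60)
```

As the study notes, the best balancing ratio (here, the `target`) is dataset-specific; tuning it per dataset is what distinguishes partial balancing from naive full balancing.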
The authors in [97] proposed a new method by integrating random under-sampling of
majority classes and an ensemble of Support Vector Machines, namely Random Under-
for. Most classification errors were related to cloud and hill shadows and identifying
mangroves farther away from the coastline. Further, the authors used visual assessment as
their only accuracy metric. While representing classification accuracy visually is certainly
important, more quantitative measures are needed in order to properly compare results
from different studies.
The authors in [109] developed and tested a participatory mapping methodology to
map the extent and species composition of forest plantations in the Southern Highlands
area of Tanzania. A large set of reference data was collected in a two-week participatory
GIS campaign in which local experts interpreted very high-resolution satellite images in
Google Earth through the Collect Earth tool in the open-source Open Foris suite. Three
different classifiers (CART, SVM, and RF) were tested to classify a multi-sensor image
stack of Landsat 8 (2013–2015), Sentinel-2 (2015–2016), Sentinel-1 (2015), and SRTM derived
elevation and slope data layers. A RF with 150 trees was selected for creation of the forest
plantation area and planted species distribution maps. One of the main challenges in
participatory reference data collection was the quality and consistency of the collected
samples. The study found that sufficient training prior to the data collection was crucial for
the interpretation success. Interpretation agreement generally declined as the level of detail
increased, from forest plantation coverage to specific plantation quality attributes. The
authors stated that at least in complex environments, it may not be realistic to expect good
accuracy on detailed level information such as tree species or age derived from visual
interpretation of optical data. To explore how GEE could be used to create an open-source
processing pipeline for deforestation mapping in Liberia and Gabon, the authors in [116]
used two different RF models to create data masks and then predictions for various land
types there. The output classification maps were then shown to local experts for correction,
boosting the final accuracy rates. The authors showed that their method was more accurate
than other efforts to classify deforestation rates in these two countries, though some
misclassifications between classes remained due to insufficient ground-truth data. This
points to a future area of research, where ML/DL/CV
models are used to generate first-order maps that are then verified by experts in that
field (i.e., expert systems). Building land classification maps in this way saves experts’
time but also keeps humans in-the-loop where human values and knowledge can still be
represented and included.
The authors in [125] developed a method for monitoring tropical forest loss and
recovery based on Landsat data. First, the authors used a RF model to map canopy cover
through time as a proxy for forest degradation and then applied the LandTrendr algorithm
to detect changes over a 19-year period. They found that the most valuable variables for
predicting tree canopy decline and regrowth were shortwave surface reflectance data and
an index related to plant moisture. While Landsat data were useful for tracking changes in
forest distribution through time, the authors noted that more very high-resolution products
for ground-truthing would benefit their analysis, as would the use of SAR data since
tropical forests were covered by clouds a large portion of the time. Using SAR data as
input and high-resolution optical data as validation data, the authors in [124] trained a
U-Net on Google Cloud to create monthly forest loss maps. They compared this model
with a RF trained on GEE while testing both models in Brazil and the United States where
both logging activity and wildfires were prevalent. They showed that the U-Net model
outperformed the RF in most cases, though the RF model still achieved high accuracy
rates. However, when the U-Net model was trained on data from one region and then
applied to the other, it did not perform well. Thus, the CNN was not generalizable and
would need to be re-trained before being used in additional locations. In [117], the authors
showed how GEE can be used to overcome data storage and compute needs and analyze
about 20 years’ worth of Landsat data to determine forest cover changes. The authors
used a RF model to show where deforestation has continued versus where forests have
partly recovered. Then, they fed the predictions of their RF model to an ANN-based forest
projection model to simulate forest loss up through 2028. The authors noted that because of
LiDAR data could help distinguish similar classes and boost OA. The resulting maps are
freely available through the MapBiomass platform. The authors in [138] used an adaptive
stacking algorithm to train a ML classifier on optical, SAR, and DEM data to identify
wetland vegetation. Adaptive stacking uses one ML classifier to identify the optimal
combination of ensemble classifiers and hyperparameters for a given task. In this
case, the authors used a RF model to determine the best combination of the CART, MD, NB,
RF, and SVM classifiers on GEE. The authors found that the adaptive stacking method was
much more accurate than the RF and SVM models alone. The resulting classification map
was then combined with a trend analysis performed by the LandTrendr algorithm, which
allowed them to identify wetland vegetation distribution as it is now and also how it has
changed over time. Additionally, [138] tested their workflow on different subsets of input
data and showed that adding more data helped the adaptive stacking algorithm learn better
(in fact, the best combination of input data was all of the data). The authors noted that forest
and reed classes were not identified well with their adaptive stacking algorithm, and that
the LandTrendr algorithm will most likely need to be re-tuned in different environments.
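The adaptive selection step in [138] — one classifier choosing which ensemble members to keep — can be illustrated with a brute-force variant: enumerate classifier subsets, score each subset's majority vote on validation labels, and keep the best. This is a simplification (the paper uses a RF as the selector, and tunes hyperparameters too); the classifier names and validation predictions below are hypothetical:

```python
from itertools import combinations
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label list by vote."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def select_ensemble(classifier_preds, truth):
    """Return the classifier subset whose majority vote scores highest
    on the validation labels (the 'adaptive' selection step)."""
    best, best_acc = None, -1.0
    names = list(classifier_preds)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            acc = accuracy(majority_vote([classifier_preds[n] for n in combo]), truth)
            if acc > best_acc:
                best, best_acc = combo, acc
    return best, best_acc

# Hypothetical validation predictions from three classifiers.
truth = ["w", "w", "f", "f", "u", "u"]
preds = {
    "CART": ["w", "f", "f", "f", "u", "w"],
    "RF":   ["w", "w", "f", "f", "u", "u"],
    "SVM":  ["w", "w", "f", "u", "u", "u"],
}
best, acc = select_ensemble(preds, truth)
```

Exhaustive enumeration is only feasible for a handful of base classifiers; with more members, a learned selector (as in [138]) becomes necessary.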
Bathymetry and RS data were combined in [127] to create a processing and analysis
pipeline for large scale seagrass habitat monitoring in Greece using GEE. While the authors
compared CART, RF, and SVM models on the GEE platform and how they performed
on open-source datasets, they validated the models on unpublished data, which made
it difficult to replicate their results. A key limitation to this processing workflow is the
lack of in-situ validation data. Thus, their preprocessing pipeline depends on creating a
data mask for labels using a ML model, which is then fed to other ML models as input data.
If there are uncertainties or errors in the first output layer, they would persist into
the secondary classification step. Their reported OA was 72%, and the authors suggested
more seagrass datasets for performance improvement. A CNN–LSTM hybrid model was
used in [132] to identify grassland types in Sentinel-2 imagery in the United States. The
authors collected ground-truth field data for their experiment, and with the help of GEE
for preprocessing and Google Colab for NN training, they achieved an almost 7% accuracy
boost for identifying a type of grass (98.8%, up from 92%). However, the authors’ dataset
was very small (13 Sentinel-2 images in total, 6 images in 2016 and 7 in 2017, as the time
range corresponds to their field surveys years), so it was uncertain how this model would
generalize to other regions in the same state or in different states altogether.
The authors in [43] compared the performance of a RF model with feature engineering
to LSTM and U-Net NN models without feature engineering for identifying pasturelands
in Brazil. The RF model was trained on GEE, while the NNs had to be trained offline, as GEE
did not support DL models at the time. The authors crowdsourced the creation of a LULC
dataset for Brazil using PlanetScope imagery to domain experts, ensuring that the labels
for the input data were accurate. These LULC classes contained important pastureland
subtypes in addition to savannah, forest, built-up areas, and water. U-Net had the highest
generalization across both the validation and testing sets, maintaining high accuracy rates
while the LSTM and RF model underfit the test set. To illustrate the tradeoffs between
ML and DL models, the authors included run and inference times. The RF model was
able to complete training and prediction in 3 h. The LSTM took 30 min to train but 23 h
to predict on the test set, while the U-Net took 24 h to train but 1.2 h at inference time.
The authors in [126] used GEE to compare several ML classifiers to index-based methods
like NDVI. Using over 40 years of optical Landsat imagery, the authors were able to map
vegetation loss in Australia with high accuracy, matching that of a current government
vegetation monitoring program (though their process relies only on cloud computing and
freely available data). However, different amounts of rainfall
affected their results because models were not able to fully recognize vegetation in varying
greening and drying patterns. Future analyses should attempt to collect more and higher
resolution data to improve model performance. The authors in [140] argued that phenology
information in RS time series can better capture tidal flat wetland vegetation and so
compared phenology information to statistical (min, max, median) and temporal features
(quartile ranges). They then fed this data into a RF while analyzing its effect on model
performance during different periods of time (all data, green and senescence seasons) for
wetland vegetation classification. The authors showed that the phenological information
was the most important input feature to the RF, while combining all three sets of features
led to the highest accuracy. Additionally, the model performed best when predicting over
both the green and senescence periods. To explore how plant functional types can be
derived directly from RS information, [137] trained a RF model on field, DEM, MODIS, and
climate data. Their method was able to distinguish between moist and dry deciduous tree
types with a high degree of accuracy, which could lead to better estimates of carbon, water,
and energy fluxes. Still, the authors struggled to identify shrubs, grasses, and crops, and
built-up areas.
The authors in [141] implemented such a model, optimized for green LAI retrieval from RS
imagery, in a way that is also tailored to GEE. First, they created the model
so that it can run on vector or tensor time series imagery. Then, the authors used AL for feature
reduction so that the model only learned on important data while creating a model that can
run within GEE’s memory confines. This GPR model was then used to gap-fill RS imagery
focused on LAI, meaning the model was able to “see” through clouded optical imagery. More
work like this should be done, either in creating new models to upload to the cloud that
other researchers can use or optimizing these models so that they are memory efficient. The
authors mention that better GEE code documentation and error messages could help future
researchers interested in developing custom ML models for the platform.
Appendix C.5. Textual Summaries for Water Mapping and Water Quality Monitoring
In [32], the authors created a web portal using GEE as a backend alongside an expert
system to identify bodies of water in Landsat imagery. Being able to visualize global trends
in surface water allowed the authors to identify trends such as all continents gaining surface
water, although this varies from region to region. While small bodies of water (30 m × 30 m or
smaller) could not be mapped using the expert system, the process of mapping global
surface water was sped up by the use of GEE compute resources. The authors noted that some
regions had more accurate water maps because of the length of the observation record. In [142],
the authors used all available Landsat images to study surface water dynamics in Oklahoma
from 1984 to 2015. About 16,000 Landsat scenes were preprocessed using GEE. Subsequently,
they computed spectral indices (e.g., MNDWI, NDVI, and EVI) and performed conditional
operations to extract surface water areas. Four surface water products were created, including
the maximum, year-long, seasonal, and average surface water extents. The results showed
that both the number of surface water bodies and surface water areas had been decreasing
from 1984 through 2015. Significant inter-annual variations in the number of surface water
bodies and surface water areas were found. They also found that both the number of surface
water bodies and surface water areas had a positive relationship with precipitation and a
negative relationship with temperature.
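The spectral indices used in [142] are simple band-ratio computations; the two-band ones (MNDWI, NDVI) are normalized differences, which GEE exposes directly via `ee.Image.normalizedDifference`. A minimal sketch of the arithmetic — the reflectance values below are hypothetical single-pixel examples, and the band pairing assumes Landsat 8-style green/SWIR1 and NIR/red bands:

```python
def normalized_difference(a, b):
    """Generic two-band normalized difference, e.g.
    MNDWI = (green - SWIR1) / (green + SWIR1)
    NDVI  = (NIR - red) / (NIR + red).
    Returns 0.0 for a zero denominator."""
    return 0.0 if a + b == 0 else (a - b) / (a + b)

# Hypothetical surface reflectance values for one pixel.
green, swir1 = 0.12, 0.04   # open water: brighter in green, dark in SWIR1
nir, red = 0.05, 0.10

mndwi = normalized_difference(green, swir1)   # positive over water
ndvi = normalized_difference(nir, red)        # negative: not vegetated
```

Conditional operations on such indices (e.g., MNDWI > NDVI and MNDWI > EVI) are what [142] used to extract surface water areas.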
The authors in [150] analyzed to what degree different preprocessing steps affect the
output water maps using both SAR and DEM data and two variations of Otsu’s thresholding
algorithm. They showed that SAR data that included radiometric terrain correction (RTC)
as a preprocessing step yielded more accurate results and that Bmax Otsu thresholding was
more stable across different inputs than Edge Otsu. However, their analysis was limited in time
and space, so more work needed to be done to test their results in different locations and
varying terrain types at different times. In [143], the authors used Landsat 8 images avail-
able on GEE to map glacial lakes in the Tibet Plateau region. About 3580 Landsat scenes
acquired in 2015 were preprocessed. After that, the MNDWI algorithm was applied to each
image to extract glacial lakes with thresholding techniques. The initial results were then
exported from GEE for further processing. They also analyzed the various characteristics
of glacial lakes, including size classes, elevation, and climate forcing. The results revealed
that climate warming played a major role in glacial lake changes. The authors in [151]
compared the performance of MNDWI and a RF to that of a multi-scale CNN (MSCNN)
and showed that the DL method was the most accurate (with fewer misclassifications) for
identifying urban water resources in several Chinese cities. However, the authors took a
novel approach in avoiding the lack of DL methods available on GEE: they trained the CNN
locally, and then uploaded the weight matrix to GEE. They then implemented the rest of the
CNN’s operations (convolutions, etc.) directly in GEE, effectively allowing the authors to run
DL inference on the platform. Still, the MSCNN model had issues classifying small/thin
water bodies and water scenes with mixed pixel classes. One way to make the data labeling
process less time- and resource-intensive for DL was illustrated in [156], where the authors
used current water maps and a segmentation algorithm to automatically collect data labels
from Sentinel-1 imagery. These data were then used to train variations of U-Net in an offline
environment. Due to computational constraints, the authors were not able to compare
their model to more traditional ML models like a RF. Even with their automated data
labeling pipeline, the authors noted that their study lacked sufficient data to adapt their
method to more than one country and manual validation was still necessary to validate the
model post-prediction.
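Several of the studies above ([150], and later [172]) rely on variants of Otsu's thresholding to split water from non-water in a backscatter or index histogram. A minimal plain-Python Otsu — the toy "backscatter" values below are hypothetical, and real pipelines (Bmax/Edge Otsu) sample the histogram more carefully:

```python
def otsu_threshold(values, bins=64):
    """Classic Otsu: choose the cut that maximizes between-class
    variance of the two resulting groups of a 1-D histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    total_sum = sum((lo + (i + 0.5) * width) * h for i, h in enumerate(hist))
    w0 = sum0 = 0.0
    best_t, best_var = lo, -1.0
    for i, h in enumerate(hist):
        w0 += h
        if w0 == 0 or w0 == total:
            continue
        sum0 += (lo + (i + 0.5) * width) * h
        mu0 = sum0 / w0                          # mean below the cut
        mu1 = (total_sum - sum0) / (total - w0)  # mean above the cut
        var = w0 * (total - w0) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, lo + (i + 1) * width
    return best_t

# Bimodal toy values: a water cluster near -20 dB, land near -8 dB.
samples = [-21, -20, -19.5, -20.5, -19, -8, -7.5, -8.5, -7, -9]
t = otsu_threshold(samples)   # lands in the gap between the two modes
```

The Bmax and Edge variants compared in [150] differ mainly in *where* the histogram is sampled (checkerboard cells with high bimodality vs. buffered edges), not in this core maximization.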
Optical imagery used in surface water mapping analyses is often occluded by clouds,
and many common methods used to map surface water confuse snow, ice, rock, and
shadows as water. DeepWaterMapv2 was released in [147] to address these false positive
misclassifications. The authors used Landsat imagery from GEE to train their NN archi-
tecture to identify bodies of water across different terrain types and in different weather
conditions. However, due to the compute constraints and lack of NN models on GEE,
the authors moved the data offline during the training process. The authors designed the
network to work with many different satellite platforms as long as they have a set group of
input bands. The authors in [157] used masking, filtering, and segmentation algorithms to
identify bodies of water in Sri Lanka in complex, mountainous environments. They showed
that their model performed well even in the presence of shadow or soil, and did so much
better than other common index-based methods like NDWI, MNDWI, or the multi-spectral
water index (MuWI-R). To explore the potential to distinguish between surface water body
subtypes, [158] used slope, shape, phenology, and flooding information as input to
a RF model to predict for lakes, reservoirs, rivers, wetlands, rice fields, and agricultural
ponds. Their method did not work very well for wetlands and the OA was not very high
(85%) across classes. However, the RF model they used was interpretable, and they showed
which subclasses were easier or harder to predict. Unfortunately, the entire preprocessing
method could not be run directly on GEE: because the shape features could not be calculated
on the platform and were crucial to the overall analysis, the authors first had to compute
them in a local environment and then upload them.
The authors in [144] proposed a new method for quickly mapping yearly minimal
and maximal surface water extents. Using GEE and Landsat images, temporal changes
in the extent of surface water in the Middle Yangtze River Basin were identified. Firstly,
based on the estimated value of cloud cover for each pixel, the high cloud covered pixels
were removed to eliminate the cloud interference and improve the calculation efficiency.
Secondly, the annual greenest and wettest images were mosaiced based on vegetation
index and surface water index. Thirdly, the minimum and maximum surface water extents
were obtained by the RF classification. Finally, manual noise removal as implemented
in ESRI ArcMap was applied to reduce noise in the classification result. In [148], the
authors integrated the global surface water (GSW) dataset and the SRTM DEM to determine
the spatiotemporal patterns of water storage changes in China’s lakes and reservoirs. The
dynamic water storage change of 760 lakes and reservoirs, each with an area greater than
10 km², was evaluated over a span of 30 years (1984–2015); their total area accounts
for about 80% of the total water surface area in China. The HydroLAKES data and China’s
lake dataset and river shapefile were also used to help select lakes and reservoirs. Water
level data for a total of 30 lakes across China from Hydroweb dataset were used for
validation. The DEM-based geo-statistic approach was used to construct hypsometric
relationships between water area and elevation for each lake and reservoir. Their data
preprocessing was implemented using ArcGIS, GEE was used for extraction and correction
of water coverage and also extraction of surface area-elevation pairs, and R software
was used for statistical analysis on pixel contamination ratios, hypsometric analysis, and
identification of spatio-temporal patterns.
The authors in [154] reviewed recent fluvial geomorphology GEE applications and
synthesized three common themes relevant to future planimetric river channel change
studies: (1) GEE has been used as a tool for mining the satellite imagery data archive,
cloud-masking images and then generating multitemporal image composites; (2) many
applications have provided accessible source code and/or data repositories, promoting
transparent and open science; (3) cartographic, graphical, and statistical analyses are almost
always completed outside of the GEE environment. This study [154] shared a demon-
stration workflow showing how GEE can be used to extract active river channel masks
from a section of the Cagayan River (Luzon, Philippines). The spatiotemporal planform
change was then quantified outside of the GEE environment, i.e., extracting centerline
position and channel width and calculating centerline migration rates. For RS applications
in fluvial geomorphology, challenges remain around issues of scaling, transferability, and
data uncertainties; particularly for small- to mid-sized rivers where medium-resolution,
multispectral satellite imagery is rarely suitable for geomorphic analyses. Caution is always
required to interpret geomorphic changes based on two-dimensional planforms alone, as
rivers also adjust in the vertical dimension. By enabling fluvial geomorphologists to take
their algorithms to petabytes worth of data, GEE is transformative in enabling determin-
istic science at scales defined by the user and determined by the phenomena of interest.
GEE offers a mechanism for promoting a cultural shift toward open science, through the
democratization of access and sharing of reproducible code.
The authors in [146] stated that this was the first study using GEE for RS of water
quality parameters in inland waters. Using Landsat imagery in conjunction with ground-
based measurements of CDOM absorption and DOC concentrations, a regression-based
model was built to estimate CDOM in the six largest Arctic rivers using 424 separate
observations from 2000 to 2013.
To estimate water quality parameters like Chl-a concentrations, turbidity, and dis-
solved organic matter, [152] used ML and DL models to analyze RS imagery. The authors
showed that several ML and DL models were able to achieve very low error rates for this
regression task. Some of the relationships detected by the models could be used to predict
for non-optical variables, as well. However, the authors had to move the ML portion of
their analysis off the GEE platform due to “algorithmic limitations” (inflexible models).
While a DL model performed well for predicting various water quality indicators, [152]
cited a lack of model transparency. They cautioned that feature extraction and expert
knowledge may still be necessary to make sense of the DL model outputs; otherwise, they
are difficult to interpret, which can negate the accuracy achieved with the model. The
authors in [153] developed a methodological framework for mapping Chl-a concentrations
with multi-sensor satellite observations and in-situ water quality samples. A SVM model
was trained on the GEE cloud-computer platform and used to predict Chl-a concentrations
of 12 inland lakes in the tri-state region of the U.S., including Kentucky, Indiana, and Ohio.
The results demonstrated that GEE and multi-sensor satellite observations can enable fast
and accurate mapping of Chl-a at a regional scale.
The authors in [162] showed that by combining SAR, optical, and LiDAR data
on the GEE platform, a BRT model was able to predict peatland occurrence across Alberta
province with relatively high accuracy at high resolution. Using different input variable
selection methods and optimization techniques, the authors were able to trim down their
dataset to six variables, saving time and compute in the final analysis while pointing future
studies toward the data most worth collecting for peatland mapping. The authors [162]
pointed out that additional training data from field work or photo interpretation will aid in
future peatland monitoring and detection studies and that more research needs to go into
distinguishing between different wetland classes.
The authors in [161] used optical and SAR RS imagery to produce a 10 m resolution
wetland map for the entire province of Newfoundland, Canada, using both a RF model
and SNIC. Optical data contributed more to the accuracy of the models, although including
SAR boosted accuracy rates. While OA rates were high for distinguishing between wetland
and non-wetland classes, distinguishing between wetland sub-types (bog, fen, marsh, etc.)
remained difficult. Limitations for the study include not having access to a harmonized
Landsat-Sentinel data produced on GEE, not being able to use TensorFlow or DL models on
GEE, and a continued lack of ground-truth data for wetland detection studies. In [170], the
authors classified wetlands in Newfoundland during three different periods to show the
spatial dynamics of these ecosystems. The authors obtained high accuracy rates using both
a RF and CART model and were even able to distinguish between wetland subtypes like
bogs, fens, and peatlands. The authors used Landsat imagery because its data catalog goes
back to the 1980s. This was necessary because of the length of the wetland change detection
they were interested in. Still, the authors noted that future mapping applications should
focus their analyses on using higher-resolution products like Sentinel imagery to increase
accuracy rates even further over wide areas. The authors in [17] proposed using field data
collected from one Canadian province to create wetland inventory maps for several others
using a mix of optical, SAR, and digital elevation data. However, the authors received
mixed accuracy results from their RF model, most likely because the study rests on the
assumption that there was a static underlying distribution of data between wetlands across
Canadian provinces. The authors noted that their results could be improved if the GEE
platform allowed for more samples to be analyzed at once, if there were more flexibility in
choosing ML model hyperparameters, or if more segmentation algorithms were included
on the platform.
Across Canada, wetland mapping is a well-studied problem. However, different
local and regional agency wetland inventories use different techniques for monitoring
wetlands or have altogether different definitions of what constitutes a wetland. Thus,
even though several large-scale wetland maps have been produced, they are often not
directly comparable. Additionally, these maps are often static and do not continually
monitor wetlands through time. However, as [165] detailed, these are not the only barriers
to mapping wetlands using RS imagery. Others include obtaining sufficient and recent
field data to verify wetland monitoring products, but also the difficulty of monitoring such
dynamic landscapes. Wetlands do not have clear-cut boundaries, are extremely diverse
landscapes and ecosystems, and are often in flux throughout seasons and years due to
flooding and drying. The authors used optical and SAR Sentinel data, in addition to field
samples over the entirety of Canada, and showed that almost one-fifth of Canada is covered
in wetlands. The study in [165] produced a high-resolution (10-m) wetland inventory map
of Canada (an approximate area of one billion hectares), using multi-year, multi-source
(Sentinel-1 and Sentinel-2) RS data on the GEE platform. The whole country was mapped
using a large volume of reference samples using an object-based RF classification scheme
with an OA approaching 80%. They [165] used both pixel- and object-based classification
with an RF model and SNIC to reduce noise in the output map. However, the authors came
into the study with an accuracy threshold in mind and adjusted the training dataset to
meet it after seeing initial accuracy results. The authors reported uneven performance
across Canadian provinces, mainly due to a lack of RS or field data in some locations. The
authors in [160] analyzed a large number of field samples alongside Landsat imagery with
a RF model to produce a wetland map for all of Canada. While this analysis showed how
GEE made it easier to scale up the spatial scope of a given analysis (i.e., move from local to
regional, country-level, or global scope), [160] obtained low accuracy scores across Canada.
The authors noted that more field samples and the use of SAR data could improve future
results, given that large parts of Canada are often covered by clouds and snow throughout
the year. The authors in [168] proposed an object-based classification method to classify
Sentinel-1 and Sentinel-2 data on the GEE platform, which resulted in the 10-m Canadian
Wetland Inventory. The method consisted of a simple non-iterative clustering algorithm
and the RF algorithm, which was applied to identify wetlands in each of the 15 ecozones in
Canada. The overall accuracies for each ecozone ranged from 76% to 91%, representing a
7% improvement over the first generation of the Canadian Wetland Inventory.
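The object-based step used in [165,168] (and available on GEE via `ee.Algorithms.Image.Segmentation.SNIC`) boils down to replacing each pixel's classifier label with the consensus label of its segment, which removes salt-and-pepper noise. A minimal sketch of that per-segment majority vote — the labels and segment ids below are hypothetical:

```python
from collections import Counter, defaultdict

def smooth_by_segment(pixel_labels, segment_ids):
    """Replace each pixel's classifier label with the majority label of
    its segment -- the object-based step that suppresses salt-and-pepper
    noise in a pixel-based classification."""
    by_segment = defaultdict(list)
    for label, seg in zip(pixel_labels, segment_ids):
        by_segment[seg].append(label)
    majority = {seg: Counter(lbls).most_common(1)[0][0]
                for seg, lbls in by_segment.items()}
    return [majority[seg] for seg in segment_ids]

# Hypothetical per-pixel RF labels, with one noisy pixel inside segment 0.
labels   = ["bog", "bog", "fen", "bog", "fen", "fen"]
segments = [0,      0,     0,     0,     1,     1]
smoothed = smooth_by_segment(labels, segments)
# -> ['bog', 'bog', 'bog', 'bog', 'fen', 'fen']
```

In the GEE workflow, the segments come from SNIC run on the imagery itself, so segment boundaries follow spectrally homogeneous objects rather than an arbitrary grid.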
The authors in [163] used NAIP imagery and LiDAR derived DEM data to detect
wetlands across the northern United States using unsupervised classification on the GEE
platform. They then compared their output with Joint Research Centre (JRC) Monthly
Water History and National Wetland Inventory (NWI) data. Additionally, all code and
implementation details were made open source, making it easy for others to verify or build
on their results. A benefit of their technique is that unsupervised learning does not rely
on underlying ground-truth data, often a bottleneck in ML and wetland mapping studies.
However, this was also a limitation in the study as it was difficult to verify their resulting
maps other than by comparison with other water and wetland map products (which
themselves could have inaccuracies). To get around the limitation that wetlands can be both
wet and dry over the course of the same season, the authors in [171] combined Sentinel-1
and -2 imagery with aerial photographs and field data to map the spatial variation of
wetlands in portions of the United States over time. First, the authors trained RF and
SVM models to predict the occurrence of wetlands and then masked out permanent water
using the JRC Global Surface Water dataset. This allowed the authors to show not only
permanently inundated wetlands, but how wetlands change over time. The RF model
was the most accurate when compared to the SVM and NDWI, while also reducing false
positives and negatives. The authors made their workflow open source in the hopes that
conservation managers or people without coding experience can rerun their analysis for
updated wetland extent information. More analyses should take into account spatial
variation while producing environmental mapping applications, especially as governments
and nonprofits make conservation decisions based on them. The authors in [159] explored
the possibility of using GEE to map coastal wetlands in Indonesia by comparing all of the
different classifiers on the platform and how they perform with Landsat, digital elevation,
and Haralick texture data. While the results showed that the CART algorithm performed
the best on this task across every year of training data, it was unclear from the results
whether feature engineering and PCA bands helped the model learn better than from just
the spectral input data. While GEE allowed [159] to train several models, some models
failed to run due to computational constraints or inflexibility. The authors showed that in
all cases, ML models did much better at binary than multi-class classification.
With Landsat 8 and high-resolution Google Earth imagery, [164] used a RF model on
GEE to classify tidal flat types and their distribution in China. The authors reported very
high classification rates across tidal flat classes and showed that the maps produced on
GEE compared favorably to, and often improved upon, classification based on visual
interpretation. However, the authors detailed that satellites like Landsat did not
fully capture tidal ranges, meaning that accuracy could be improved further with future
data products that observe full tidal duration distributions. In [169], the authors developed
a pixel and frequency-based approach to generate annual maps of tidal flats at 30-m spatial
resolution in China’s coastal zone using the Landsat TM/ETM+/OLI images and the GEE
cloud computing platform. The resulting map of coastal tidal flats in 2016 was evaluated
using very high-resolution images available in Google Earth. The annual frequency maps
of open surface water bodies and vegetation were first produced using Landsat-based
time series vegetation indices and water-related spectral index. Pixels with a water body
frequency spanning from 0.05 to 0.95 were classified as intertidal zones. A threshold value
of 0.05 was used to classify coastal vegetation area (vegetation frequency ≥ 0.05) and non-
vegetated tidal flats (vegetation frequency < 0.05). Mixed pixels, such as remnant tidal flats
water, could not be detected. In [172], the authors first processed high-resolution RS and
UAS imagery to map minimum and maximum water and vegetation extent. They then used
Otsu’s thresholding algorithm to automatically detect the best threshold for each index. These
two indices were then combined in a composite that showed the total intertidal area in the
RS imagery, to which the authors again applied the Otsu thresholding algorithm. The end
result was a highly accurate map of tidal flats that did not require any post-processing. The
authors compared their results with other tidal flat datasets in China and noted that their
method produced (at least visually) better estimates because it incorporated
high-resolution imagery, did a better job at cloud-masking, and achieved better estimates of
tidal minima and maxima. Still, the authors noted that more imagery of high and low tides
in RS imagery needed to be collected and would increase the accuracy of their method.
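The frequency-based classification rule described for [169] can be sketched in a few lines. This is a minimal illustration with hypothetical frequency arrays and an invented class coding, not the authors' implementation:

```python
import numpy as np

def classify_coastal_pixels(water_freq, veg_freq):
    """Apply the frequency thresholds described for [169].

    water_freq, veg_freq: arrays of annual open-water / vegetation
    frequencies in [0, 1]. Returns an integer map:
    0 = other, 1 = non-vegetated tidal flat, 2 = coastal vegetation.
    """
    out = np.zeros(water_freq.shape, dtype=np.int8)
    # Pixels inundated only part of the year form the intertidal zone.
    intertidal = (water_freq >= 0.05) & (water_freq <= 0.95)
    # Within it, a 0.05 vegetation-frequency threshold separates
    # coastal vegetation from bare tidal flats.
    out[intertidal & (veg_freq < 0.05)] = 1
    out[intertidal & (veg_freq >= 0.05)] = 2
    return out

water = np.array([0.0, 0.5, 0.5, 1.0])
veg = np.array([0.0, 0.01, 0.3, 0.0])
print(classify_coastal_pixels(water, veg))  # [0 1 2 0]
```

Permanent water (frequency 1.0) and permanently dry pixels both fall outside the intertidal band, which is why they map to class 0 here.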
A RF model was used on GEE in [166] to identify water cavities where sebkhas form
in Morocco. The authors used digital elevation data, SAR, and optical imagery, as well as
digital photos on GEE to identify saltwater cavities and their aquifers with high accuracy.
However, future challenges remain in incorporating multi-sensor, multi-temporal, multi-
resolution RS big data and in improving open-source, cloud-based ML workflows for EO
data. The authors in [167] compared the performance of a XGBoost model to a CNN for
wetland type classification. The authors achieved reasonable overall accuracy, but the
low F1-scores made it unclear what the models were actually learning. The authors were
also not able to train the two models on the same subsets of data, making their
performance not directly comparable. However, in addition to making their resulting maps
and trained CNN model open source, the authors ran an informative comparison of the
training and prediction times of the two models used in this study. The CNN and XGBoost model
took the same time to train, but the CNN took far less time to predict on the test set. More
studies should adopt this reporting metric so that researchers can more clearly evaluate the
tradeoffs between using specific models for their use-cases.
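Reporting such timing tradeoffs is inexpensive to implement. A minimal sketch, using scikit-learn's GradientBoostingClassifier and synthetic data as stand-ins for the XGBoost and CNN models of [167]:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for exported training pixels.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0)

# Time training and prediction separately, since the two can differ
# by orders of magnitude (as the CNN vs. XGBoost comparison showed).
t0 = time.perf_counter()
model.fit(X, y)
train_s = time.perf_counter() - t0

t0 = time.perf_counter()
model.predict(X)
predict_s = time.perf_counter() - t0

print(f"train: {train_s:.3f}s  predict: {predict_s:.3f}s")
```

Publishing both numbers lets readers weigh one-off training cost against recurring inference cost for their own use-cases.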
Appendix C.7. Textual Summaries for Infrastructure and Building Detection, Urbanization Monitoring
The authors in [178] created a large, vectorized, ground-truth verified dataset in India
specifically for the purpose of being able to train different ML models. They verified the
utility of the dataset by training CART, RF, and SVM models on GEE and compared their
predictions to those of the WorldPop dataset. While manually creating a large dataset takes
time, the authors showed that they can achieve accuracy rates of 87% with the RF model.
The authors also compared different combinations of input data and their impact on model
performance. For their application, Landsat 8 data served as better input than Landsat
7 alone or Landsat 7 data with computed indices like NDVI.
To investigate how best to identify impervious materials in RS imagery regardless of
cloud cover, [182] combined nighttime light, DEM, and SAR data and a RF model on GEE.
Their resulting maps were more accurate than commonly used maps like GlobeLand30.
More importantly, though, the authors quantitatively showed that using multiple sources
of data were better than single sources for this task: optical data were the most important,
but SAR data improved accuracy rates across all metrics. The mounting expansion of
impervious surfaces (major components of human settlements) could lead to a series of
human-dominated environmental and ecological issues. In [180], the authors put forward
a new scheme to conduct long-term monitoring of impervious-relevant land disturbances
using Landsat archives. The developed region was identified using a RF classifier. The
GEE-version LandTrendr was then used to detect land disturbances, characterizing the
conversion from vegetation to impervious surfaces. Finally, the actual disturbance areas
within the developed regions were derived and quantitatively evaluated.
The authors in [179] assessed the impact of urban form on the landscape structure of
urban green spaces in 262 cities in China. They preprocessed and classified 6673 Landsat
scenes for these cities using the RF classifier on GEE. Subsequently, they calculated several
landscape structure metrics and urban form metrics. To evaluate the relationship between
landscape metrics and urban form metrics, a BRT model was constructed to analyze their
relationships. The results revealed that cities with a high road density tended to have
a smaller area of urban green spaces and be more fragmented. In contrast, cities with
complex terrains tended to have more fragmented urban green spaces.
A semi-automatic large-scale and long time series (LSLTS) urban land mapping frame-
work was demonstrated in [183] by integrating the crowdsourced OpenStreetMap (OSM)
data with free Landsat images to generate annual urban land maps in the middle Yangtze
River basin (MYRB) from 1987 to 2017. First, the annual Landsat images and the related
spectral indices were collected and calculated in GEE. The OSM related data were collected
and processed manually in ArcGIS to generate the training samples. Then, the generated
samples were uploaded to GEE. Two classification algorithms were used: CART and RF.
Pixels classified as urban land by both methods were labeled as urban
land. The classified maps were downloaded from GEE and a spatial-temporal consistency
checking was further performed. Except for the generation of reference data for training
and validation as well as post-classification analysis, most of the data processing was
performed automatically in GEE. Use of crowdsourced geographic data (CGD) such as
OSM came with many challenges: OSM polygons may overlap and contain multiple LULC
types; there is a large diversity of tags in OSM, some of which cannot be converted directly
to LULC classes; most of human activities are in urban areas, resulting in an imbalance of
(non-urban) class data. The authors noted a lack of GEE infrastructure, such as (1) a GEE
API related to CGD that could facilitate training sample generation, and (2) direct import
of the Google Earth annual very high resolution (VHR) images into GEE that users could set
as the background image for collecting validation samples in the cloud. In this study,
urban areas on RS images were defined as sites dominated by a built environment, including
all non-vegetative, human-constructed elements; in OSM data, they were defined as features
tagged with such elements, including road networks and buildings.
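The two-classifier agreement rule used in [183] amounts to a per-pixel logical AND. A minimal sketch with hypothetical prediction arrays (1 = urban, 0 = non-urban):

```python
import numpy as np

# Hypothetical per-pixel predictions from the two GEE classifiers
# used in [183] (CART and RF).
cart_pred = np.array([1, 1, 0, 1, 0])
rf_pred   = np.array([1, 0, 0, 1, 1])

# A pixel is labeled urban only when CART and RF agree, trading
# some recall for higher-confidence urban labels.
urban = np.logical_and(cart_pred == 1, rf_pred == 1).astype(np.uint8)
print(urban)  # [1 0 0 1 0]
```

Requiring agreement suppresses each classifier's individual false positives, at the cost of missing pixels only one model identifies correctly.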
To explore the possibility of identifying greenhouses in RS imagery over a large
area in China, [185] designed an ensemble ML model to distinguish them from water,
forest, farmland, and construction sites. The authors found that of various ML models
available on GEE, the CART, gmoMaxEnt, and RF models performed the best. These models
were then combined through a weighting system to make predictions, and this resultant
ensemble model performed better at this classification task than any of the individual
models. Additionally, [185] looked at which features play the most important role in the
ML model’s predictions. The authors found that spectral information was most useful,
but that texture and terrain features helped boost the accuracy even more. However, this
method relies on optical imagery, so it depends on relatively cloud-free imagery. More
work would need to be done to help the model generalize to situations where cloud-free
imagery is not available and to distinguish between greenhouse subtypes.
The authors in [186] designed a workflow for mapping urban sprawl over time in
Brazil using a RF on the GEE platform. They used optical RS imagery from the Landsat
and Sentinel platforms, alongside DEM data and found that the cities used for their case
study had built out horizontally instead of densifying vertically. Still, the drivers behind
the urban sprawl need to be investigated further, in addition to how best to incorporate
their maps into the governmental policy decision-making process.
Using different vegetative indices (EVI, Gross Primary Production, etc.) derived from
Landsat and MODIS data, [181] showed that urban sprawl in Shanghai had increased
significantly in the last decade and a half. The spread of suburbs in Shanghai had led to
much less green space over a 15-year period. This is a very impactful area of research that
can be done completely on the GEE platform and replicated across cities around the world.
Produced together with heatmaps of a given city, urban vegetation maps can be used to
pursue environmental justice strategies that can improve equitable access to green spaces
and attempt to reduce extreme temperature disparities (“heat islands”) in cities.
Producing up-to-date land cover maps can be time-consuming and expensive to make.
This is especially true in areas without dense data coverage for common LULC classes.
In [184], the authors combined Landsat 5 and 8 RS imagery, slope from a digital terrain
model (DTM), and GLCM information, and then trained a SVM to output two classification
maps for portions of Rwanda: one for 1987 and the other for 2019. The authors then used
the LandTrendr algorithm to compute LULC changes through time, which allowed them to
produce maps without having dense field observations for validation. They showed that
while water, wetland, and forested areas had remained fairly constant in terms of total area,
urban development has been replacing open land and agricultural areas.
maps fall short while making them interoperable with higher-resolution, more accurate
maps being produced today. Still, [196] said that the number of ground-truth observations
was the limiting factor in their analysis and that the model's performance could be improved
further with more validation data.
The 250 m spatial resolution of products like FireCCI51 leave out a lot of detail, so
the authors in [191] used CBERS, Gaofen, and Landsat imagery to create a 30 m burned-
area dataset for 2015. The authors first trained a RF on this imagery and set it to output
probabilities instead of class predictions. These probabilities were then used as a starting
point for a pixel-aggregation algorithm that classifies neighboring pixels as whether they
belong to the burned-area class or not. The authors called this “burned-area shaping” and
the resulting maps for this process were used as training data for an SVM. The resulting
map had good spatial agreement with FireCCI51 but had much higher spatial resolution
with more detailed and accurate boundaries. However, the authors noted that their method
had difficulty recognizing burned areas from recently plowed fields in agricultural areas, so
crop-type masks should be used to remove potential false positives. Additionally, Landsat
data were used for both the data collection and validation stage. Thus, the authors were not
able to assess the suitability of using Landsat imagery for data collection purposes despite
their high accuracy rates. Later on, [194] adapted the exact same processing steps on GEE
to produce a burned area map for the year 2005, illustrating how sharing and storing code
on GEE makes it easy to re-run analyses or adapt them for new use cases.
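The first stage of [191]'s workflow, an RF emitting class probabilities whose high-probability pixels seed the subsequent shaping step, can be sketched as follows. The data, the 0.8 seed threshold, and the feature layout are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical spectral features for 200 pixels; label 1 = burned.
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# As in [191], output class probabilities rather than hard labels;
# high-probability pixels then seed the region-growing ("burned-area
# shaping") step, whose output trains the final SVM.
proba = rf.predict_proba(X)[:, 1]
seeds = proba > 0.8
print(seeds.sum(), "seed pixels")
```

Using probabilities instead of hard labels is what makes the pixel-aggregation stage possible: neighbors can be admitted or rejected by comparing their scores against the seed pixels.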
In order to better interpret the fire severity in terms of on-the-ground fire effects
compared to non-standardized spectral indices, [189] produced a map of composite burn
index (CBI), a frequently used, field-based measure of fire severity. A RF model was built
on GEE, describing CBI across forested landscapes in North America as a function of
multiple spectral indices, climatic variables, and geographic coordinates. The robust relationships
and the fairly high model skill in most regions suggest the resulting CBI maps may be
beneficial in remote regions where it is expensive and difficult to acquire field measures of
severity (e.g., Alaska and the majority of Canada).
In [202], the authors made use of Landsat imagery and the LandTrendr algorithm
to monitor water accumulation in subsidence areas of past mining in China. First, they
identified permanent versus seasonal water bodies, then used a water index in areas of
known mining to track water changes. The authors incorporated a popular subsidence
simulator that predicted water accumulation at underground mining sites and showed
that their dataset had good agreement with it. Thus, their processing workflow can be
integrated with the simulator to verify the output. While the authors achieved high
accuracy rates, this varied dramatically between different years and between different
stages of water accumulation. The authors noted that more work needed to be done to
increase the robustness of their processing pipeline to more accurately distinguish between
water accumulation at mining sites and flooding and heavy rainfall events.
To monitor mining disturbances at a coalfield in Mongolia, [199] used the LandTrendr
algorithm to analyze Landsat data. The authors designed a fast, efficient method on the
GEE platform to monitor surface mining operations and showed that only 26% of promised
reclamation was undertaken at the Shengli Coalfield. However, the authors noted that their
pixel-based classification approach would benefit from a comparison with an object-based
approach (although many object-based classifiers are not available on GEE).
In order to keep track of mines and dams in Brazil, [200] used two different CNNs to
first classify potential mining sites and then to classify their perceived/potential environmental
risk. In this two-phase approach, the authors were able to identify 263 unregistered mines and
designed the CNN to work on variable-sized RS images. This analysis relied on government
data, which may not be available in other locations where mining was taking place. Addition-
ally, since the authors used a DL approach, they had to move their training process from GEE
to Google Colab. Even so, their data were too big for the GPU memory limits.
With GEE JavaScript API, [201] used RF classifiers to produce maps of mine waste
extents with Landsat-8 and Sentinel-1 and Sentinel-2 archives. The simplest method of
mapping mines is through thresholding, where a division between spectral response that
represents mines and non-mine areas can be clearly defined. Thresholding only produces
high accuracy when the spectral response of mines is significantly different than the
surrounding non-mine areas. Although the interpreter attempted to collect training data
points that were representative of all of the mine types as well as the variability in the
other classes, more training data may be required to better distinguish spectrally similar classes such as
outcrops/rock, mines, and urban areas. The RF classification algorithm computes Mean
Decrease in Accuracy (MDA) which is commonly used to assess variable importance. No
functions exist within GEE (yet) to analyze the importance of variables in a RF classifier;
therefore, this analysis was completed using extracted training data values in R.
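Outside of R, the same mean-decrease-in-accuracy idea can be reproduced on training values exported from GEE with scikit-learn's permutation importance. This is a hedged stand-in, not the workflow of [201]; the synthetic features imitate exported band values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical band values extracted at GEE training points.
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance: the drop in accuracy when one feature is
# shuffled -- the same idea as R's Mean Decrease in Accuracy (MDA).
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"band_{i}: {result.importances_mean[i]:.3f}")
```

Because the importances are computed on held-out data, they reflect predictive value rather than how often a band was used for splits.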
To test the efficacy of different ML algorithms for identifying waste and dump sites in
optical imagery, [203] optimized the parameters for the CART, RF, and SVM algorithms
available on GEE. The authors found that the RF algorithm was by far the most accurate
even when using several optimization schemes for each model. However, the authors
noted that a lack of elevation data in their processing pipeline led to classification errors,
and that more work could be done using DL methods to identify waste and dump piles in
the future.
ANNs were used to establish the relationship between precipitation and four environmental
variables: elevation, longitude, latitude, and one of three vegetation indices
(NDVI, EVI, LAI). The StandardScaler algorithm of scikit-learn was used to standardize
variables using their means and standard deviations to eliminate the effects of different scales.
The GridSearchCV algorithm with a 10-fold cross-validation (GSCV) splitting strategy was
used to identify the best hyper-parameter values for each machine learning–vegetation index combination.
The monthly precipitation maps were derived from the annual downscaled precipitation by
disaggregation. According to validation in the Great Mekong upstream region, the ANN
method yielded the best performance when simulating the annual TRMM precipitation. The
most sensitive vegetation index for downscaling TRMM was LAI, followed by EVI.
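The StandardScaler-plus-GridSearchCV workflow above maps directly onto a scikit-learn pipeline. The data, the candidate layer sizes, and the iteration budget below are hypothetical placeholders for the study's predictors (elevation, longitude, latitude, and a vegetation index):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical predictors: elevation, longitude, latitude, NDVI.
X = rng.normal(size=(120, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=120)

# Standardize inside the pipeline so each CV fold is scaled using only
# its own training split, then tune the ANN with 10-fold GridSearchCV.
pipe = make_pipeline(StandardScaler(),
                     MLPRegressor(max_iter=500, random_state=0))
grid = GridSearchCV(pipe,
                    {"mlpregressor__hidden_layer_sizes": [(16,), (32,)]},
                    cv=10)
grid.fit(X, y)
print(grid.best_params_)
```

Putting the scaler inside the pipeline (rather than scaling once up front) avoids leaking test-fold statistics into training during cross-validation.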
The authors in [205] performed major-axis regression on these datasets in pairs
(7 ETM+/8 OLI, 7 ETM+/2 MSI, and 8 OLI/2 MSI) across the entire coterminous United
States and were able to determine cross-platform correction coefficients for the Blue, Green,
Red, NIR, and SWIR bands present in all three satellites. The authors then validated their
methodology and correction coefficients by analyzing these same satellite platforms across
Europe. While [205] did not create an actual integrated dataset for use on the GEE platform,
their research was the first step to building such a dataset and making sure that it is of
high quality.
The authors in [206] implemented a cloud-based workflow and compared that to the
traditional method of using SAGA GIS for producing local climate zone city maps based on
data like WUDAPT. The authors showed that the traditional method was more accurate on
average than the GEE method when using only the datasets available to WUDAPT and when
trying to transfer an urban morphology classifier between individual cities. However, using
GEE allowed the authors to aggregate information from multiple cities in the same climate
zone and for the RF model they used to be trained on more RS data and derived indices that
were not available in the WUDAPT dataset. These improvements boost OA scores in urban
topology classification. Thus, while the GEE and more traditional classification methods are
not directly comparable, the cloud-based method outlined by [206] can be used to complement
research being done in urban topology studies.
The authors in [207] investigated the impacts of landscape changes on LST intensity
(LSTI) in a tropical mountain city in Sri Lanka. Annual median temperatures from three
years were extracted from Landsat data through the GEE interface. The SVM algorithm was
used to conduct LULC mapping, which was then used to calculate the fractions of built-up,
forested, and agricultural land based on urban–rural zone analysis. The study showed that
rapid development was spreading towards rural zones, and the fraction of built-up land
influenced the increase in annual mean LST. It was recommended that having a mixture of
land-use types would considerably control the increasing LST in the study area.
The authors in [208] presented a method to obtain high-resolution sea surface salinity
(SSS) and temperature (SST) by using raw satellite data, i.e., Sentinel-2 Level 1-C Top of
Atmosphere reflectance data. A deep NN had been built to link band information with in
situ data, which was obtained from the Copernicus Marine In Situ platform. The deep NN
providing the best results was found to be composed of 20 hidden layers with 43 nodes in
each layer. Shortcuts were used in the network architecture to avoid the so-called vanishing
gradient problem, providing an improved performance compared with the equivalent feed-
forward architecture. Accurate salinity values were estimated without using temperature as
input in the network. However, a clear dependency on temperature ranges was observed, with
less accurate estimations for locations where ocean temperature falls below 10 °C. The NN
presented in this paper outperformed classical architectures tested for regression problems.
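The shortcut (residual) connections described for [208] can be illustrated with a single dense block: the layer's output is added back to its input, so gradients have a path that bypasses the nonlinearity. This NumPy forward-pass sketch borrows only the 20-layer, 43-node dimensions from the text; the weights are random stand-ins:

```python
import numpy as np

def residual_block(x, W, b):
    """One hidden layer with an identity shortcut: out = relu(x @ W + b) + x.

    The shortcut gives gradients a direct path around the nonlinearity,
    which is how deep stacks of such blocks avoid the vanishing-gradient
    problem of an equivalent plain feed-forward network.
    """
    return np.maximum(x @ W + b, 0.0) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 43))       # 43 nodes per layer, as in [208]
for _ in range(20):                # 20 hidden layers
    W = rng.normal(scale=0.05, size=(43, 43))
    b = np.zeros(43)
    x = residual_block(x, W, b)
print(x.shape)  # (1, 43)
```

The identity term requires input and output widths to match, which is consistent with the fixed 43-node layers reported in the paper.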
To study this mechanism further, the authors in [209] used a LSTM and compared the
performance to a RF for carbon fluxes in global forests. They combined bioclimatic and
forest age data with Landsat imagery and MODIS atmospheric reflectance maps as input
data to their models. The authors showed that previous seasons’ water and temperature
records (specifically from the spring) affected the ways forests release carbon in the current
season. Still, the LSTM model used in [209] struggled when it was trained on one site or one
forest type and applied to another. For instance, their ML and DL models did not perform
well in the Tropics and had varying performance predicting carbon flux for evergreen
and deciduous forests. This lack of generalizability was indicative of the way that carbon
fluxes vary from forest to forest around the world, but also of the fact that their dataset was
biased towards older, undisturbed forests, which led the LSTM to underperform on underrepresented classes.
Using field observations, DEM data, and Landsat imagery, the authors in [219] sought
to address these issues by mapping different soil types and soil attributes across a large
region in Brazil using the GEE platform. The authors were able to show that elevation,
climate data, as well as the SWIR2, NIR, and Blue bands from Landsat imagery are the
most important factors in determining soil types, even at different soil depths. However,
the authors noted that more soil observations were needed to increase the accuracy of their
method and would aid further digital soil mapping studies.
The authors in [221] were able to produce a global, high-resolution soil moisture map
on GEE, by using optical, thermal, and SAR imagery in addition to DEM data. The authors
used a GBRT model to train on in-situ observations paired with RS imagery to then predict
soil moisture in other locations. After running a relative variable importance analysis,
the authors found that optical RS imagery and land-cover information played the most
important roles in determining soil moisture content, but that SAR imagery and soil data
also contributed significantly to the model’s overall performance. Their finding echoes
the results of other studies ([95,161,182]) showing that the combination of optical and SAR data improves
predictive outcomes. The entire processing pipeline is now an open-source Python package
(PYSMM). However, the authors had issues with the GEE platform. The model needed to
be trained offline due to issues with flexibility and design, and the validation soil moisture
observation dataset was not available on the platform. The authors noted that sparse or
clustered observations led to model inaccuracies, which was a call both to collect more soil
moisture observation data and to upload more of it (and other diverse types of data) to
the GEE platform.
The authors in [220] explored the effects of spatial aggregation of climatic, biotic,
topographic and soil variables on national estimates of litter and soil C stocks and charac-
terized the spatial distribution of litter and soil C stocks in the conterminous United States
(CONUS). Litter and soil variables were measured on permanent sample plots from the
National Forest Inventory (NFI) from 2000 to 2011. These data were used with vegetation
phenology data estimated from Landsat 7 imagery and raster data describing environmen-
tal variables for the entire CONUS to predict litter and soil carbon stocks. Specifically, the
maximum of NDVI values from the growing season and forty categorical and continuous
environmental variables compiled from various data sources and resolutions with ArcGIS
were selected as predictor variables. Three supervised ML methods (i.e., RF, quantile regres-
sion forest (QRF) and KNN) were chosen to model the distribution of litter and soil carbon
stocks. All analyses were conducted with R. The results suggested that the RF and QRF
prediction models performed better than KNN models although results across the three
methods were similar. All modeling approaches performed better for soil compared to litter
layers and the spatial pattern of association between litter, soil carbon, and environmental
covariates observed from the RF and QRF models may reflect spatial patterns in litter
decomposition, soil chemistry, and plant and microbial communities.
change detection problem. The authors in [224] implemented a multitemporal cloud detec-
tion method using the GEE Python API, which was applied to the Landsat-8 imagery and
validated over a large collection of manually labeled cloud masks from the Biome dataset.
The approach was based on a simple multitemporal background modeling algorithm, in
which k-means clustering was applied to the difference image between the cloudy image
(target) and the cloud-free estimated background (reference). The obtained clusters were
then labeled as cloudy or cloud-free areas by applying a set of thresholds on the difference
intensity and on the reflectance of the representative clusters. This approach was found
to outperform single-scene threshold-based cloud detection approaches such as FMask
(Zhu et al. 2015). More specifically, linear and nonlinear least squares regression algorithms
were proposed to minimize both the prediction and the estimation error simultaneously.
Significant differences in the image of interest with respect to the estimated background
were identified as clouds. The use of kernel methods allowed the generalization of the al-
gorithm to account for higher-order (nonlinear) feature relations. The method was tested in
a dataset with 5-day revisit time series from SPOT-4 at high resolution and with Landsat-8
time series.
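The core of [224]'s multitemporal scheme, clustering the target-minus-background difference image with k-means and then labeling clusters by their difference intensity, can be sketched on a synthetic scene. The scene, the cluster count, and the cloud-labeling rule here are simplified assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical cloud-free background and cloudy target reflectance.
background = rng.normal(0.1, 0.02, size=(64, 64))
target = background.copy()
target[:16, :16] += 0.5            # a bright "cloud" patch

# Cluster the per-pixel difference image, as in the background-modeling
# approach of [224].
diff = (target - background).reshape(-1, 1)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(diff)

# Label the cluster with the larger mean difference as cloud
# (a stand-in for the paper's threshold-based cluster labeling).
means = [diff[labels == k].mean() for k in (0, 1)]
cloud_mask = (labels == int(np.argmax(means))).reshape(64, 64)
print(cloud_mask.sum(), "cloud pixels")
```

The real method also applies reflectance thresholds to the representative clusters; this sketch keeps only the difference-intensity rule.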
A CNN model, called DeepGEE-CD, was built in [225] to detect clouds in RS imagery
directly on the GEE platform. First, the authors developed and trained the CNN locally
and then uploaded the weights to GEE. They then implemented most of the layers in
the network, with the exception of a few complex convolutional layers that could not
be coded directly on the GEE platform. This CNN can run
inference directly in the cloud. In addition, the authors made the model flexible, able to
handle RS imagery of varying input sizes. The CNN achieves comparable performance to the
Fmask algorithm, but without the additional information in the form of physical rules that
Fmask needs to work well.
To explore how CV algorithms and ML models can be used together on GEE, the
authors in [226] combined the existing Cloud-Score algorithm with a SVM to detect clouds
in imagery from regions including Amazon tropical forests, Hainan Island, and Sri Lanka. The
Cloud-Score algorithm first masked the input RS imagery, whose output was then used to train the
SVM. This process led to much higher accuracy rates than any of the other CV algorithms
for cloud detection and did so with considerably lower error rates.
The authors in [227] implemented their cloud removal DL model directly in GEE. Their
model, DeepGEE-S2CR, is a cloud-optimized version of the DSen2-CR model presented
in [228] and fused co-registered Sentinel-1 and Sentinel-2 images from the SEN12MS-CR
dataset. First, the authors trained their CNN locally and then uploaded the weights to GEE.
They then designed the network using the GEE API, implementing layers and custom cost
functions so that the CNN fits into memory constraints. The authors showed that their
model had a slight reduction in RMSE, but produced very similar results to the bigger and
more compute-intensive DSen2-CR. The CNN can be run directly on GEE without the need
to download, store, and process data locally.
in any physical storage unit to GEE needs a stable internet connection. Ways need to be
developed to speed up image transfer and processing.
A set of freely available environmental variables (i.e., habitat information from RS
observations and climatic information from weather stations), was used in [230] to assess
and predict the roadkill risk. For each of the seven medium-large mammals, they performed
binomial logistic regressions relating the roadkill presence-absence in the road sections
across the survey dates, with the collection of environmental variables (land cover classes,
forest cover, distance to rivers, temperature, precipitation, and NDVI) and the temporal
and spatial trends of overall roadkill. The intrinsic spatial and temporal roadkill risk were
the most important variables, followed by land cover, climate and NDVI. The modeling
framework of coupling RS information, climate data, traffic volume, and biodiversity
metrics may make it possible to provide more accurate roadkill risk predictions in near real time and
potentially at the global scale.
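The binomial logistic regressions of [230] relate presence-absence to environmental covariates. A minimal sketch with simulated data; the predictor names and the simulated effect (roadkill probability driven by NDVI) are assumptions for illustration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical per-road-section predictors: forest cover, distance to
# river, mean temperature, NDVI -- stand-ins for the variables in [230].
X = rng.normal(size=(300, 4))
# Simulated presence-absence: roadkill more likely at high NDVI.
p = 1 / (1 + np.exp(-(2.0 * X[:, 3] - 0.5)))
y = rng.random(300) < p

# Binomial logistic regression of presence-absence on the predictors.
model = LogisticRegression().fit(X, y)
print(model.coef_.round(2))
```

The fitted coefficients are log-odds effects, so their signs and magnitudes give the same variable-importance reading the authors report (e.g., which covariates raise roadkill risk).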
A semi-automated framework was developed in [231] for monitoring large, complex
wildlife aggregations using drone-acquired imagery over four large and complex
waterbird colonies. The semi-automated approach applied a RF classifier to high-resolution
drone imagery to identify nests, followed by predictive modeling (k-fold estimation) to
estimate nest counts from the mapped nest area. Arithmetic and textural metrics from
the red, green and blue channels in the drone data were calculated and used as predictor
variables in the RF classification, which helped capture more of the spatial and spectral
variation in target features. The predictor variable calculation and nest mapping routines
using RF classification were implemented in GEE. All statistical analyses, including nest
counting and accuracy assessment, were performed in the R programming environment.
Using Landsat RS imagery, climate variables, and government environmental data,
the authors in [232] analyzed Pine Processionary Moth outbreaks in pine forests in southern
Spain. The authors first used a KNN to determine which features related to various
vegetative indices and environmental variables. Then, after choosing a representative
subset of their data based on the KNN’s output, [232] used a RF to predict for pest outbreaks
based on ground-truth defoliation data. The authors found that minimum temperatures in
February and the precipitation patterns for each season were the best at predicting pest
outbreaks, followed by vegetative indices. While having access to medium-resolution
imagery helped the authors map pest outbreaks in pine forests over a large area in Spain,
they noted that more work should be done to collect more ground-truth data and to explore
the use of higher-resolution data products like those from the Sentinel satellites.
before releasing them. Perhaps most importantly, the authors vectorized their results at the
end of their analysis so that other researchers can use them for visualization or classification
tasks. This points to an urgent need in EO and ML research: more studies should attempt to
vectorize their data instead of producing binary or multi-class classification maps. However,
their analysis depends on having an internet connection to upload, process, and classify
data with GEE in the field. This is not always possible, perhaps limiting the future utility
of their work. The authors also mention data and compute limits on GEE as being a main
limitation to their analysis. For example, every image uploaded to GEE (at the time of this
paper’s release) was limited to 10 GB. Because the authors used sub-centimeter drone imagery, they had
to downsize each image before uploading it, resulting in a loss of resolution.
Optical and SAR data on GEE were used in [235] to create a classifier capable of
outputting a likelihood that there is a mounded site in a given region of the Cholistan
Desert in Pakistan. Conducting field surveys there is difficult because the heat and
remoteness can make it unsafe. Thus, it is important that the authors were able to use a
RF model to show where likely mound sites are to analyze further. However, the authors
introduced some subjectivity by tweaking the probability threshold for mount/no-mound
boundaries. This was necessary because of a lack of high-quality validation data, so it is
difficult to measure the accuracy of their process.
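The threshold-sensitivity issue can be made concrete with a small sketch: the RF outputs a mound probability per cell, and the resulting mound/no-mound map depends on where the analyst places the cutoff. The probabilities and thresholds below are hypothetical.

```python
# Illustrative only: show how the set of detected "mound" cells changes
# as the probability cutoff is moved, absent validation data to fix it.

def detect_mounds(probabilities, threshold):
    """Indices of cells whose mound probability meets the threshold."""
    return [i for i, p in enumerate(probabilities) if p >= threshold]

probs = [0.92, 0.55, 0.71, 0.30, 0.64]   # hypothetical RF outputs

strict = detect_mounds(probs, 0.7)    # conservative cutoff: fewer detections
lenient = detect_mounds(probs, 0.5)   # permissive cutoff: more detections
```

With reliable ground truth, the cutoff could instead be chosen to maximize a validation metric rather than by inspection.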
Using Landsat images on the GEE platform, a method was proposed in [238] to map continuous changes in coastlines and tidal flats in the Zhoushan Archipelago during 1985–2017. The workflow consists of (1) building the full time series of MNDWI at the pixel level, (2) performing temporal segmentation using a binary segmentation algorithm and deriving the corresponding temporal segments, (3) classifying the coastal cover types (i.e., water, tidal flats, and land) in each temporal segment based on the MNDWI features and regional tidal heights, and (4) detecting the change information, including conversion types and turning years and months. The spatial and temporal validation was implemented based on visual interpretation of Landsat images. Three major coastal change types were found, namely land reclamation, aquaculture expansion, and accretion of tidal flats; land reclamation was the dominant coastal change.
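The index at the heart of this workflow, MNDWI, is the standard (Green − SWIR) / (Green + SWIR) ratio. The sketch below computes it and mimics step (3) with a toy per-segment rule; the numeric thresholds and the tidal-height condition are hypothetical stand-ins, not the values used in [238].

```python
# MNDWI is the standard water index; the classification thresholds and
# the tidal-height rule below are illustrative assumptions only.

def mndwi(green, swir):
    """Modified Normalized Difference Water Index."""
    return (green - swir) / (green + swir)

def coastal_cover(mndwi_value, tidal_height_m, high_tide_m=2.0):
    """Toy per-segment cover label: water, tidal flat, or land."""
    if mndwi_value > 0.2:                                  # strongly wet
        return "water"
    if mndwi_value > -0.1 and tidal_height_m < high_tide_m:  # intermittently wet
        return "tidal flat"
    return "land"

label = coastal_cover(mndwi(0.3, 0.1), tidal_height_m=1.0)  # clearly water
```

In the real workflow these labels are assigned per temporal segment of the pixel-level MNDWI series, so that a pixel's sequence of labels yields conversion types and turning dates.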
the lake ice area was greater than or equal to 90% of the lake area, the date of that day was determined as the freeze-up end. If the lake ice area fell to and remained at or below 90% of the lake area, the date of that day was determined as the break-up start, while the break-up end was defined as the time point when the ice covered less than or equal to 10% of the lake. The presence of clouds and crushed ice may cause some errors in the results obtained from different data sources.
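The threshold rules above translate directly into a small date-picking routine. This is a simplified sketch using the 90%/10% cutoffs from the text on a synthetic ice-cover series; it deliberately ignores the cloud-contamination and crushed-ice complications just noted, which the original study had to handle.

```python
# Pick freeze-up end, break-up start, and break-up end dates from a
# lake-ice-fraction time series. Thresholds follow the text: >= 90%
# cover marks freeze-up end, a fall below 90% marks break-up start,
# and <= 10% cover marks break-up end. Input data here are synthetic.

def ice_phenology(dates, ice_fraction, freeze_thr=0.9, melt_thr=0.1):
    """Return (freeze_up_end, break_up_start, break_up_end) dates."""
    freeze_up_end = break_up_start = break_up_end = None
    for d, f in zip(dates, ice_fraction):
        if freeze_up_end is None and f >= freeze_thr:
            freeze_up_end = d                      # first day at >= 90% ice
        elif freeze_up_end is not None and break_up_start is None and f < freeze_thr:
            break_up_start = d                     # cover falls below 90%
        if break_up_start is not None and break_up_end is None and f <= melt_thr:
            break_up_end = d                       # cover melts to <= 10%
    return freeze_up_end, break_up_start, break_up_end

dates = ["Nov-01", "Nov-09", "Dec-01", "Apr-01", "Apr-09", "Apr-17"]
cover = [0.50, 0.95, 1.00, 0.80, 0.40, 0.05]
events = ice_phenology(dates, cover)
```

A production version would first smooth or gap-fill the cover series, since a single cloudy observation could otherwise trigger a spurious break-up date.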
References
1. Yang, L.; MacEachren, A.M.; Mitra, P.; Onorati, T. Visually-Enabled Active Deep Learning for (Geo) Text and Image Classification:
A Review. ISPRS Int. J. Geo-Inf. 2018, 7, 65. [CrossRef]
2. Sebestyén, V.; Czvetkó, T.; Abonyi, J. The Applicability of Big Data in Climate Change Research: The Importance of System of
Systems Thinking. Front. Environ. Sci. 2021, 9, 619092. [CrossRef]
3. Li, Z. Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions. In High
Performance Computing for Geospatial Applications; Tang, W., Wang, S., Eds.; Springer International Publishing: Cham, Switzerland,
2020; pp. 53–76, ISBN 9783030479985.
4. Lee, J.-G.; Kang, M. Geospatial Big Data: Challenges and Opportunities. Big Data Res. 2015, 2, 74–81. [CrossRef]
5. Lippitt, C.D.; Zhang, S. The impact of small unmanned airborne platforms on passive optical remote sensing: A conceptual
perspective. Int. J. Remote Sens. 2018, 39, 4852–4868. [CrossRef]
6. Liu, Z.; Guo, H.; Wang, C. Considerations on Geospatial Big Data. IOP Conf. Ser. Earth Environ. Sci. 2016, 46, 012058.
7. Karimi, H.A. Big Data: Techniques and Technologies in Geoinformatics; CRC Press: Boca Raton, FL, USA, 2014; ISBN 9781466586512.
8. Marr, B. Big Data: Using SMART Big Data, Analytics and Metrics to Make Better Decisions and Improve Performance; John Wiley &
Sons: Hoboken, NJ, USA, 2015; ISBN 9781118965825.
9. Deng, X.; Liu, P.; Liu, X.; Wang, R.; Zhang, Y.; He, J.; Yao, Y. Geospatial Big Data: New Paradigm of Remote Sensing Applications.
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3841–3851. [CrossRef]
10. Kashyap, R. Geospatial Big Data, Analytics and IoT: Challenges, Applications and Potential. In Cloud Computing for Geospatial Big
Data Analytics: Intelligent Edge, Fog and Mist Computing; Das, H., Barik, R.K., Dubey, H., Roy, D.S., Eds.; Springer International
Publishing: Cham, Switzerland, 2019; pp. 191–213, ISBN 9783030033590.
11. Yang, C.; Yu, M.; Hu, F.; Jiang, Y.; Li, Y. Utilizing Cloud Computing to address big geospatial data challenges. Comput. Environ.
Urban Syst. 2017, 61, 120–128. [CrossRef]
12. Liu, Y.; Dang, L.; Li, S.; Cai, K.; Zuo, X. Research Progress on Models, Algorithms, and Systems for Remote Sensing Spatial-
Temporal Big Data Processing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5918–5931. [CrossRef]
13. Liu, P.; Di, L.; Du, Q.; Wang, L. Remote Sensing Big Data: Theory, Methods and Applications. Remote Sens. 2018, 10, 711.
[CrossRef]
14. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial
analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [CrossRef]
15. Wang, Y.; Ziv, G.; Adami, M.; Mitchard, E.; Batterman, S.A.; Buermann, W.; Marimon, B.S.; Junior, B.H.M.; Reis, S.M.;
Rodrigues, D.; et al. Mapping tropical disturbed forests using multi-decadal 30 m optical satellite imagery. Remote Sens. Environ.
2018, 221, 474–488. [CrossRef]
16. Teluguntla, P.; Thenkabail, P.S.; Oliphant, A.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K.; Huete, A. A 30-m landsat-
derived cropland extent product of Australia and China using random forest machine learning algorithm on Google Earth Engine
cloud computing platform. ISPRS J. Photogramm. Remote Sens. 2018, 144, 325–340. [CrossRef]
17. Amani, M.; Brisco, B.; Afshar, M.; Mirmazloumi, S.M.; Mahdavi, S.; Mirzadeh, S.M.J.; Huang, W.; Granger, J. A generalized
supervised classification scheme to produce provincial wetland inventory maps: An application of Google Earth Engine for big
geo data processing. Big Earth Data 2019, 3, 378–394. [CrossRef]
18. Kumar, L.; Mutanga, O. Google Earth Engine Applications Since Inception: Usage, Trends, and Potential. Remote Sens. 2018, 10, 1509.
[CrossRef]
19. Samasse, K.; Hanan, N.P.; Anchang, J.Y.; Diallo, Y. A High-Resolution Cropland Map for the West African Sahel Based on
High-Density Training Data, Google Earth Engine, and Locally Optimized Machine Learning. Remote Sens. 2020, 12, 1436.
[CrossRef]
20. Lippitt, C.D.; Stow, D.A.; Clarke, K.C. On the nature of models for time-sensitive remote sensing. Int. J. Remote Sens. 2014, 35,
6815–6841. [CrossRef]
21. Zhou, B.; Okin, G.S.; Zhang, J. Leveraging Google Earth Engine (GEE) and machine learning algorithms to incorporate in situ
measurement from different times for rangelands monitoring. Remote Sens. Environ. 2020, 236, 111521. [CrossRef]
22. Sayad, Y.O.; Mousannif, H.; Al Moatassime, H. Predictive modeling of wildfires: A new dataset and machine learning approach.
Fire Saf. J. 2019, 104, 130–146. [CrossRef]
23. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; Depristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to
deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [CrossRef]
24. Davenport, T.; Kalakota, R. The potential for artificial intelligence in healthcare. Future Healthc. J. 2019, 6, 94–98. [CrossRef]
25. Mittal, S.; Hasija, Y. Applications of Deep Learning in Healthcare and Biomedicine. In Deep Learning Techniques for Biomedical and
Health Informatics; Dash, S., Acharya, B.R., Mittal, M., Abraham, A., Kelemen, A., Eds.; Springer International Publishing: Cham,
Switzerland, 2020; pp. 57–77, ISBN 9783030339661.
26. Boulos, M.N.K.; Peng, G.; VoPham, T. An overview of GeoAI applications in health and healthcare. Int. J. Health Geogr. 2019, 18, 7.
[CrossRef] [PubMed]
27. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.;
Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications:
A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350. [CrossRef]
28. Wang, L.; Diao, C.; Xian, G.; Yin, D.; Lu, Y.; Zou, S.; Erickson, T.A. A summary of the special issue on remote sensing of land
change science with Google earth engine. Remote Sens. Environ. 2020, 248, 112002. [CrossRef]
29. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for geo-big data
applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170. [CrossRef]
30. Hoeser, T.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part
I: Evolution and Recent Trends. Remote Sens. 2020, 12, 1667. [CrossRef]
31. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.-S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive
Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [CrossRef]
32. Pekel, J.-F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes.
Nature 2016, 540, 418–422. [CrossRef]
33. Decuyper, M.; Chávez, R.O.; Lohbeck, M.; Lastra, J.A.; Tsendbazar, N.; Hackländer, J.; Herold, M.; Vågen, T.-G. Continuous
monitoring of forest change dynamics with satellite time series. Remote Sens. Environ. 2021, 269, 112829. [CrossRef]
34. Guo, H.-D.; Zhang, L.; Zhu, L.-W. Earth observation big data for climate change research. Adv. Clim. Chang. Res. 2015, 6, 108–117.
[CrossRef]
35. Hird, J.N.; DeLancey, E.R.; McDermid, G.J.; Kariyeva, J. Google Earth Engine, Open-Access Satellite Data, and Machine Learning
in Support of Large-Area Probabilistic Wetland Mapping. Remote Sens. 2017, 9, 1315. [CrossRef]
36. Hsu, A.; Khoo, W.; Goyal, N.; Wainstein, M. Next-Generation Digital Ecosystem for Climate Data Mining and Knowledge
Discovery: A Review of Digital Data Collection Technologies. Front. Big Data 2020, 3, 29. [CrossRef] [PubMed]
37. Google Earth Engine. A Planetary-Scale Platform for Earth Science & Data Analysis. Available online: https://earthengine.
google.com/ (accessed on 19 November 2019).
38. National Aeronautics and Space Administration (NASA). Welcome to the NASA Earth Exchange (NEX). Available online:
https://www.nasa.gov/nex (accessed on 23 April 2022).
39. National Aeronautics and Space Administration (NASA). Geostationary-NASA Earth Exchange (GeoNEX). Available online:
https://www.nasa.gov/geonex (accessed on 23 April 2022).
40. Earth on AWS. Available online: https://aws.amazon.com/earth/ (accessed on 10 July 2019).
41. Chandrashekar, S. Announcing Real-Time Geospatial Analytics in Azure Stream Analytics. Available online: https://azure.
microsoft.com/en-us/blog/announcing-real-time-geospatial-analytics-in-azure-stream-analytics/ (accessed on 23 April 2022).
42. Microsoft. Microsoft Planetary Computer. Available online: https://planetarycomputer.microsoft.com/ (accessed on 23 April 2022).
43. Parente, L.; Taquary, E.; Silva, A.P.; Souza, C.; Ferreira, L. Next Generation Mapping: Combining Deep Learning, Cloud
Computing, and Big Remote Sensing Data. Remote Sens. 2019, 11, 2881. [CrossRef]
44. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review.
ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [CrossRef]
45. Lobell, D.B.; Thau, D.; Seifert, C.; Engle, E.; Little, B. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 2015, 164,
324–333. [CrossRef]
46. Shelestov, A.; Lavreniuk, M.; Kussul, N.; Novikov, A.; Skakun, S. Exploring Google Earth Engine Platform for Big Data Processing:
Classification of Multi-Temporal Satellite Imagery for Crop Mapping. Front. Earth Sci. 2017, 5, 17. [CrossRef]
47. Xiong, J.; Thenkabail, P.S.; Tilton, J.C.; Gumma, M.K.; Teluguntla, P.; Oliphant, A.; Congalton, R.G.; Yadav, K.; Gorelick, N.
Nominal 30-m cropland extent map of continental Africa by integrating pixel-based and object-based algorithms using Sentinel-2
and Landsat-8 data on Google Earth Engine. Remote Sens. 2017, 9, 1065. [CrossRef]
48. Xiong, J.; Thenkabail, P.S.; Gumma, M.K.; Teluguntla, P.; Poehnelt, J.; Congalton, R.G.; Yadav, K.; Thau, D. Automated cropland
mapping of continental Africa using Google Earth Engine cloud computing. ISPRS J. Photogramm. Remote Sens. 2017, 126, 225–244.
[CrossRef]
49. Deines, J.M.; Kendall, A.D.; Hyndman, D.W. Annual Irrigation Dynamics in the U.S. Northern High Plains Derived from Landsat
Satellite Data. Geophys. Res. Lett. 2017, 44, 9350–9360. [CrossRef]
50. Kelley, L.C.; Pitcher, L.; Bacon, C. Using Google Earth Engine to Map Complex Shade-Grown Coffee Landscapes in Northern
Nicaragua. Remote Sens. 2018, 10, 952. [CrossRef]
51. Ragettli, S.; Herberz, T.; Siegfried, T. An Unsupervised Classification Algorithm for Multi-Temporal Irrigated Area Mapping in
Central Asia. Remote Sens. 2018, 10, 1823. [CrossRef]
52. Ghazaryan, G.; Dubovyk, O.; Löw, F.; Lavreniuk, M.; Kolotii, A.; Schellberg, J.; Kussul, N. A rule-based approach for crop
identification using multi-temporal and multi-sensor phenological metrics. Eur. J. Remote Sens. 2018, 51, 511–524. [CrossRef]
53. Mandal, D.; Kumar, V.; Bhattacharya, A.; Rao, Y.S.; Siqueira, P.; Bera, S. Sen4Rice: A Processing Chain for Differentiating Early
and Late Transplanted Rice Using Time-Series Sentinel-1 SAR Data with Google Earth Engine. IEEE Geosci. Remote Sens. Lett.
2018, 15, 1947–1951. [CrossRef]
54. Oliphant, A.J.; Thenkabail, P.S.; Teluguntla, P.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K. Mapping cropland extent of
Southeast and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google
Earth Engine cloud. Int. J. Appl. Earth Obs. Geoinf. 2019, 81, 110–124. [CrossRef]
55. Sun, J.; Di, L.; Sun, Z.; Shen, Y.; Lai, Z. County-Level Soybean Yield Prediction Using Deep CNN-LSTM Model. Sensors 2019, 19, 4363.
[CrossRef] [PubMed]
56. Wang, M.; Liu, Z.; Baig, M.H.A.; Wang, Y.; Li, Y.; Chen, Y. Mapping sugarcane in complex landscapes by integrating multi-temporal
Sentinel-2 images and machine learning algorithms. Land Use Policy 2019, 88, 104190. [CrossRef]
57. Tian, F.; Wu, B.; Zeng, H.; Zhang, X.; Xu, J. Efficient Identification of Corn Cultivation Area with Multitemporal Synthetic
Aperture Radar and Optical Images in the Google Earth Engine Cloud Platform. Remote Sens. 2019, 11, 629. [CrossRef]
58. Xie, Y.; Lark, T.J.; Brown, J.F.; Gibbs, H.K. Mapping irrigated cropland extent across the conterminous United States at 30 m
resolution using a semi-automatic training approach on Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2019, 155,
136–149. [CrossRef]
59. Jin, Z.; Azzari, G.; You, C.; Di Tommaso, S.; Aston, S.; Burke, M.; Lobell, D.B. Smallholder maize area and yield mapping at
national scales with Google Earth Engine. Remote Sens. Environ. 2019, 228, 115–128. [CrossRef]
60. Rudiyanto; Minasny, B.; Shah, R.M.; Che Soh, N.; Arif, C.; Setiawan, B.I. Automated Near-Real-Time
Mapping and Monitoring of Rice Extent, Cropping Patterns, and Growth Stages in Southeast Asia Using Sentinel-1 Time Series
on a Google Earth Engine Platform. Remote Sens. 2019, 11, 1666. [CrossRef]
61. Wang, S.; Azzari, G.; Lobell, D.B. Crop type mapping without field-level labels: Random forest transfer and unsupervised
clustering techniques. Remote Sens. Environ. 2019, 222, 303–317. [CrossRef]
62. Liang, L.; Runkle, B.R.K.; Sapkota, B.B.; Reba, M.L. Automated mapping of rice fields using multi-year training sample
normalization. Int. J. Remote Sens. 2019, 40, 7252–7271. [CrossRef]
63. Tian, H.F.; Huang, N.; Niu, Z.; Qin, Y.C.; Pei, J.; Wang, J. Mapping Winter Crops in China with Multi-Source Satellite Imagery and
Phenology-Based Algorithm. Remote Sens. 2019, 11, 820. [CrossRef]
64. Neetu; Ray, S.S. Exploring machine learning classification algorithms for crop classification using sentinel 2 data. Int. Arch.
Photogramm. Remote Sens. Spatial Inf. Sci. 2019, XLII-3/W6, 573–578. [CrossRef]
65. Gumma, M.K.; Thenkabail, P.S.; Teluguntla, P.G.; Oliphant, A.; Xiong, J.; Giri, C.; Pyla, V.; Dixit, S.; Whitbread, A.M. Agricultural
cropland extent and areas of South Asia derived using Landsat satellite 30-m time-series big-data using random forest machine
learning algorithms on the Google Earth Engine cloud. GISci. Remote Sens. 2019, 57, 302–322. [CrossRef]
66. Han, J.; Zhang, Z.; Cao, J.; Luo, Y.; Zhang, L.; Li, Z.; Zhang, J. Prediction of Winter Wheat Yield Based on Multi-Source Data and
Machine Learning in China. Remote Sens. 2020, 12, 236. [CrossRef]
67. Phalke, A.R.; Özdoğan, M.; Thenkabail, P.S.; Erickson, T.; Gorelick, N.; Yadav, K.; Congalton, R.G. Mapping Croplands of Europe,
Middle East, Russia, and Central Asia Using Landsat, Random Forest, and Google Earth Engine. ISPRS J. Photogramm. Remote
Sens. 2020, 167, 104–122. [CrossRef]
68. Chen, N.; Yu, L.; Zhang, X.; Shen, Y.; Zeng, L.; Hu, Q.; Niyogi, D. Mapping Paddy Rice Fields by Combining Multi-Temporal
Vegetation Index and Synthetic Aperture Radar Remote Sensing Data Using Google Earth Engine Machine Learning Platform.
Remote Sens. 2020, 12, 2992. [CrossRef]
69. Amani, M.; Kakooei, M.; Moghimi, A.; Ghorbanian, A.; Ranjgar, B.; Mahdavi, S.; Davidson, A.; Fisette, T.; Rollin, P.; Brisco, B.; et al.
Application of Google Earth Engine Cloud Computing Platform, Sentinel Imagery, and Neural Networks for Crop Mapping in
Canada. Remote Sens. 2020, 12, 3561. [CrossRef]
70. You, N.; Dong, J. Examining Earliest Identifiable Timing of Crops Using All Available Sentinel 1/2 Imagery and Google Earth
Engine. ISPRS J. Photogramm. Remote Sens. 2020, 161, 109–123.
71. Poortinga, A.; Thwal, N.S.; Khanal, N.; Mayer, T.; Bhandari, B.; Markert, K.; Nicolau, A.P.; Dilger, J.; Tenneson, K.; Clinton, N.; et al.
Mapping sugarcane in Thailand using transfer learning, a lightweight convolutional neural network, NICFI high resolution
satellite imagery and Google Earth Engine. ISPRS Open J. Photogramm. Remote Sens. 2021, 1, 100003. [CrossRef]
72. Adrian, J.; Sagan, V.; Maimaitijiang, M. Sentinel SAR-optical fusion for crop type mapping using deep learning and Google Earth
Engine. ISPRS J. Photogramm. Remote Sens. 2021, 175, 215–235. [CrossRef]
73. Cao, J.; Zhang, Z.; Luo, Y.; Zhang, L.; Zhang, J.; Li, Z.; Tao, F. Wheat yield predictions at a county and field scale with deep
learning, machine learning, and google earth engine. Eur. J. Agron. 2020, 123, 126204. [CrossRef]
74. Luo, C.; Qi, B.; Liu, H.; Guo, D.; Lu, L.; Fu, Q.; Shao, Y. Using Time Series Sentinel-1 Images for Object-Oriented Crop Classification
in Google Earth Engine. Remote Sens. 2021, 13, 561. [CrossRef]
75. Ni, R.; Tian, J.; Li, X.; Yin, D.; Li, J.; Gong, H.; Zhang, J.; Zhu, L.; Wu, D. An enhanced pixel-based phenological feature for accurate
paddy rice mapping with Sentinel-2 imagery in Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2021, 178, 282–296.
[CrossRef]
76. Sun, Y.; Qin, Q.; Ren, H.; Zhang, Y. Decameter Cropland LAI/FPAR Estimation from Sentinel-2 Imagery Using Google Earth
Engine. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [CrossRef]
77. Li, M.; Zhang, R.; Luo, H.; Gu, S.; Qin, Z. Crop Mapping in the Sanjiang Plain Using an Improved Object-Oriented Method Based
on Google Earth Engine and Combined Growth Period Attributes. Remote Sens. 2022, 14, 273. [CrossRef]
78. Han, L.; Ding, J.; Wang, J.; Zhang, J.; Xie, B.; Hao, J. Monitoring Oasis Cotton Fields Expansion in Arid Zones Using the Google
Earth Engine: A Case Study in the Ogan-Kucha River Oasis, Xinjiang, China. Remote Sens. 2022, 14, 225. [CrossRef]
79. Hedayati, A.; Vahidnia, M.H.; Behzadi, S. Paddy lands detection using Landsat-8 satellite images and object-based classification
in Rasht city, Iran. Egypt. J. Remote Sens. Space Sci. 2022, 25, 73–84. [CrossRef]
80. Azzari, G.; Lobell, D. Landsat-based classification in the cloud: An opportunity for a paradigm shift in land cover monitoring.
Remote Sens. Environ. 2017, 202, 64–74. [CrossRef]
81. Midekisa, A.; Holl, F.; Savory, D.J.; Andrade-Pacheco, R.; Gething, P.; Bennett, A.; Sturrock, H. Mapping land cover change over
continental Africa using Landsat and Google Earth Engine cloud computing. PLoS ONE 2017, 12, e0184926. [CrossRef]
82. Hu, Y.; Dong, Y.; Batunacun. An Automatic Approach for Land-Change Detection and Land Updates Based on Integrated NDVI
Timing Analysis and the CVAPS Method with GEE Support. ISPRS J. Photogramm. Remote Sens. 2018, 146, 347–359. [CrossRef]
83. Ge, Y.; Hu, S.; Ren, Z.; Jia, Y.; Wang, J.; Liu, M.; Zhang, D.; Zhao, W.; Luo, Y.; Fu, Y.; et al. Mapping annual land use changes in
China’s poverty-stricken areas from 2013 to 2018. Remote Sens. Environ. 2019, 232, 111285. [CrossRef]
84. Lee, J.; Cardille, J.A.; Coe, M.T. BULC-U: Sharpening Resolution and Improving Accuracy of Land-Use/Land-Cover Classifications
in Google Earth Engine. Remote Sens. 2018, 10, 1455. [CrossRef]
85. Zurqani, H.A.; Post, C.J.; Mikhailova, E.A.; Schlautman, M.A.; Sharp, J.L. Geospatial analysis of land use change in the Savannah
River Basin using Google Earth Engine. Int. J. Appl. Earth Obs. Geoinf. 2018, 69, 175–185. [CrossRef]
86. Murray, N.J.; Keith, D.A.; Simpson, D.; Wilshire, J.H.; Lucas, R.M. Remap: An online remote sensing application for land cover
classification and monitoring. Methods Ecol. Evol. 2018, 9, 2019–2027. [CrossRef]
87. Mardani, M.; Mardani, H.; De Simone, L.; Varas, S.; Kita, N.; Saito, T. Integration of Machine Learning and Open Access Geospatial
Data for Land Cover Mapping. Remote Sens. 2019, 11, 1907. [CrossRef]
88. Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable classification with limited
sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci.
Bull. 2019, 64, 370–373. [CrossRef]
89. Hao, B.; Ma, M.; Li, S.; Li, Q.; Hao, D.; Huang, J.; Ge, Z.; Yang, H.; Han, X. Land Use Change and Climate Variation in the Three
Gorges Reservoir Catchment from 2000 to 2015 Based on the Google Earth Engine. Sensors 2019, 19, 2118. [CrossRef]
90. Miettinen, J.; Shi, C.; Liew, S.C. Towards automated 10–30 m resolution land cover mapping in insular South-East Asia. Geocarto
Int. 2017, 34, 443–457. [CrossRef]
91. Xie, S.; Liu, L.; Zhang, X.; Yang, J.; Chen, X.; Gao, Y. Automatic Land-Cover Mapping using Landsat Time-Series Data based on
Google Earth Engine. Remote Sens. 2019, 11, 3023. [CrossRef]
92. Adepoju, K.A.; Adelabu, S.A. Improving accuracy of Landsat-8 OLI classification using image composite and multisource data
with Google Earth Engine. Remote Sens. Lett. 2019, 11, 107–116. [CrossRef]
93. Ghorbanian, A.; Kakooei, M.; Amani, M.; Mahdavi, S.; Mohammadzadeh, A.; Hasanlou, M. Improved land cover map of Iran
using Sentinel imagery within Google Earth Engine and a novel automatic workflow for land cover classification using migrated
training samples. ISPRS J. Photogramm. Remote Sens. 2020, 167, 276–288. [CrossRef]
94. Liang, J.; Xie, Y.; Sha, Z.; Zhou, A. Modeling urban growth sustainability in the cloud by augmenting Google Earth Engine (GEE).
Comput. Environ. Urban Syst. 2020, 84, 101542. [CrossRef]
95. Zeng, H.; Wu, B.; Wang, S.; Musakwa, W.; Tian, F.; Mashimbye, Z.E.; Poona, N.; Syndey, M. A Synthesizing Land-cover
Classification Method Based on Google Earth Engine: A Case Study in Nzhelele and Levhuvu Catchments, South Africa. Chin.
Geogr. Sci. 2020, 30, 397–409. [CrossRef]
96. Naboureh, A.; Li, A.; Bian, J.; Lei, G.; Amani, M. A Hybrid Data Balancing Method for Classification of Imbalanced Training Data
within Google Earth Engine: Case Studies from Mountainous Regions. Remote Sens. 2020, 12, 3301. [CrossRef]
97. Naboureh, A.; Ebrahimy, H.; Azadbakht, M.; Bian, J.; Amani, M. RUESVMs: An Ensemble Method to Handle the Class Imbalance
Problem in Land Cover Mapping Using Google Earth Engine. Remote Sens. 2020, 12, 3484. [CrossRef]
98. Li, Q.; Qiu, C.; Ma, L.; Schmitt, M.; Zhu, X.X. Mapping the Land Cover of Africa at 10 m Resolution from Multi-Source Remote
Sensing Data with Google Earth Engine. Remote Sens. 2020, 12, 602. [CrossRef]
99. Huang, H.; Wang, J.; Liu, C.; Liang, L.; Li, C.; Gong, P. The migration of training samples towards dynamic global land cover
mapping. ISPRS J. Photogramm. Remote Sens. 2020, 161, 27–36. [CrossRef]
100. Tassi, A.; Vizzari, M. Object-Oriented LULC Classification in Google Earth Engine Combining SNIC, GLCM, and Machine
Learning Algorithms. Remote Sens. 2020, 12, 3776. [CrossRef]
101. Shetty, S.; Gupta, P.; Belgiu, M.; Srivastav, S. Assessing the Effect of Training Sampling Design on the Performance of Machine
Learning Classifiers for Land Cover Mapping Using Multi-Temporal Remote Sensing Data and Google Earth Engine. Remote Sens.
2021, 13, 1433. [CrossRef]
102. Feizizadeh, B.; Omarzadeh, D.; Garajeh, M.K.; Lakes, T.; Blaschke, T. Machine learning data-driven approaches for land use/cover
mapping and trend analysis using Google Earth Engine. J. Environ. Plan. Manag. 2021, 1–33. [CrossRef]
103. Shafizadeh-Moghadam, H.; Khazaei, M.; Alavipanah, S.K.; Weng, Q. Google Earth Engine for large-scale land use and land cover
mapping: An object-based classification approach using spectral, textural and topographical factors. GISci. Remote Sens. 2021, 58,
914–928. [CrossRef]
104. Pan, X.; Wang, Z.; Gao, Y.; Dang, X.; Han, Y. Detailed and automated classification of land use/land cover using machine learning
algorithms in Google Earth Engine. Geocarto Int. 2021, 1–18. [CrossRef]
105. Becker, W.R.; Ló, T.B.; Johann, J.A.; Mercante, E. Statistical features for land use and land cover classification in Google Earth
Engine. Remote Sens. Appl. Soc. Environ. 2020, 21, 100459. [CrossRef]
106. Jin, Q.; Xu, E.; Zhang, X. A Fusion Method for Multisource Land Cover Products Based on Superpixels and Statistical Extraction
for Enhancing Resolution and Improving Accuracy. Remote Sens. 2022, 14, 1676. [CrossRef]
107. Lee, J.S.H.; Wich, S.; Widayati, A.; Koh, L.P. Detecting industrial oil palm plantations on Landsat images with Google Earth
Engine. Remote Sens. Appl. Soc. Environ. 2016, 4, 219–224. [CrossRef]
108. Voight, C.; Hernandez-Aguilar, K.; Garcia, C.; Gutierrez, S. Predictive Modeling of Future Forest Cover Change Patterns in
Southern Belize. Remote Sens. 2019, 11, 823. [CrossRef]
109. Koskinen, J.; Leinonen, U.; Vollrath, A.; Ortmann, A.; Lindquist, E.; D’Annunzio, R.; Pekkarinen, A.; Käyhkö, N. Participatory
mapping of forest plantations with Open Foris and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2018, 148, 63–74.
[CrossRef]
110. Duan, Q.; Tan, M.; Guo, Y.; Wang, X.; Xin, L. Understanding the Spatial Distribution of Urban Forests in China Using Sentinel-2
Images with Google Earth Engine. Forests 2019, 10, 729. [CrossRef]
111. Poortinga, A.; Tenneson, K.; Shapiro, A.; Nquyen, Q.; Aung, K.S.; Chishtie, F.; Saah, D. Mapping Plantations in Myanmar
by Fusing Landsat-8, Sentinel-2 and Sentinel-1 Data along with Systematic Error Quantification. Remote Sens. 2019, 11, 831.
[CrossRef]
112. Shimizu, K.; Ota, T.; Mizoue, N. Detecting Forest Changes Using Dense Landsat 8 and Sentinel-1 Time Series Data in Tropical
Seasonal Forests. Remote Sens. 2019, 11, 1899. [CrossRef]
113. Ramdani, F. Recent expansion of oil palm plantation in the most eastern part of Indonesia: Feature extraction with polarimetric
SAR. Int. J. Remote Sens. 2018, 40, 7371–7388. [CrossRef]
114. Çolak, E.; Chandra, M.; Sunar, F. The use of multi-temporal sentinel satellites in the analysis of land cover/land use changes
caused by the nuclear power plant construction. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-3/W8,
491–495. [CrossRef]
115. Shaharum, N.S.N.; Shafri, H.Z.M.; Ghani, W.A.W.A.K.; Samsatli, S.; Al-Habshi, M.M.A.; Yusuf, B. Oil palm mapping over Peninsular
Malaysia using Google Earth Engine and machine learning algorithms. Remote Sens. Appl. Soc. Environ. 2020, 17, 100287. [CrossRef]
116. De Sousa, C.; Fatoyinbo, L.; Neigh, C.; Boucka, F.; Angoue, V.; Larsen, T. Cloud-computing and machine learning in support
of country-level land cover and ecosystem extent mapping in Liberia and Gabon. PLoS ONE 2020, 15, e0227438. [CrossRef]
[PubMed]
117. Brovelli, M.A.; Sun, Y.; Yordanov, V. Monitoring Forest Change in the Amazon Using Multi-Temporal Remote Sensing Data and
Machine Learning Classification on Google Earth Engine. ISPRS Int. J. Geo-Inf. 2020, 9, 580. [CrossRef]
118. Kamal, M.; Farda, N.M.; Jamaluddin, I.; Parela, A.; Wikantika, K.; Prasetyo, L.B.; Irawan, B. A preliminary study on machine
learning and google earth engine for mangrove mapping. IOP Conf. Ser. Earth Environ. Sci. 2020, 500, 012038. [CrossRef]
119. Wei, C.; Karger, D.N.; Wilson, A.M. Spatial detection of alpine treeline ecotones in the Western United States. Remote Sens. Environ.
2020, 240, 111672. [CrossRef]
120. Praticò, S.; Solano, F.; Di Fazio, S.; Modica, G. Machine Learning Classification of Mediterranean Forest Habitats in Google
Earth Engine Based on Seasonal Sentinel-2 Time-Series and Input Image Composition Optimisation. Remote Sens. 2021, 13, 586.
[CrossRef]
121. Xie, B.; Cao, C.; Xu, M.; Duerler, R.; Yang, X.; Bashir, B.; Chen, Y.; Wang, K. Analysis of Regional Distribution of Tree Species
Using Multi-Seasonal Sentinel-1&2 Imagery within Google Earth Engine. Forests 2021, 12, 565. [CrossRef]
122. Floreano, I.X.; de Moraes, L.A.F. Land Use/land Cover (LULC) Analysis (2009–2019) with Google Earth Engine and 2030
Prediction Using Markov-CA in the Rondônia State, Brazil. Environ. Monit. Assess. 2021, 193, 239. [CrossRef]
123. Kumar, M.; Phukon, S.N.; Paygude, A.C.; Tyagi, K.; Singh, H. Mapping Phenological Functional Types (PhFT) in the Indian
Eastern Himalayas using machine learning algorithm in Google Earth Engine. Comput. Geosci. 2021, 158, 104982. [CrossRef]
124. Zhao, F.; Sun, R.; Zhong, L.; Meng, R.; Huang, C.; Zeng, X.; Wang, M.; Li, Y.; Wang, Z. Monthly mapping of forest harvesting
using dense time series Sentinel-1 SAR imagery and deep learning. Remote Sens. Environ. 2021, 269, 112822. [CrossRef]
125. Wimberly, M.C.; Dwomoh, F.K.; Numata, I.; Mensah, F.; Amoako, J.; Nekorchuk, D.M.; McMahon, A. Historical trends of
degradation, loss, and recovery in the tropical forest reserves of Ghana. Int. J. Digit. Earth 2022, 15, 30–51. [CrossRef]
126. Johansen, K.; Phinn, S.; Taylor, M. Mapping woody vegetation clearing in Queensland, Australia from Landsat imagery using the
Google Earth Engine. Remote Sens. Appl. Soc. Environ. 2015, 1, 36–49. [CrossRef]
127. Traganos, D.; Aggarwal, B.; Poursanidis, D.; Topouzelis, K.; Chrysoulakis, N.; Reinartz, P. Towards Global-Scale Seagrass Mapping
and Monitoring Using Sentinel-2 on Google Earth Engine: The Case Study of the Aegean and Ionian Seas. Remote Sens. 2018, 10, 1227.
[CrossRef]
128. Tsai, Y.H.; Stow, D.; Chen, H.L.; Lewison, R.; An, L.; Shi, L. Mapping Vegetation and Land Use Types in Fanjingshan National
Nature Reserve Using Google Earth Engine. Remote Sens. 2018, 10, 927. [CrossRef]
129. Jansen, V.S.; Kolden, C.A.; Schmalz, H.J. The Development of Near Real-Time Biomass and Cover Estimates for Adaptive
Rangeland Management Using Landsat 7 and Landsat 8 Surface Reflectance Products. Remote Sens. 2018, 10, 1057. [CrossRef]
130. Jones, M.O.; Allred, B.W.; Naugle, D.E.; Maestas, J.; Donnelly, P.; Metz, L.J.; Karl, J.; Smith, R.; Bestelmeyer, B.; Boyd, C.; et al.
Innovation in rangeland monitoring: Annual, 30 m, plant functional type percent cover maps for U.S. rangelands, 1984–2017.
Ecosphere 2018, 9, e02430. [CrossRef]
131. Campos-Taberner, M.; Moreno-Martínez, Á.; García-Haro, F.J.; Camps-Valls, G.; Robinson, N.P.; Kattge, J.; Running, S.W. Global
Estimation of Biophysical Variables from Google Earth Engine Platform. Remote Sens. 2018, 10, 1167. [CrossRef]
132. Xin, Y.; Adler, P.R. Mapping Miscanthus Using Multi-Temporal Convolutional Neural Network and Google Earth Engine. In
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, Chicago, IL,
USA, 5 November 2019; pp. 81–84. [CrossRef]
133. Parente, L.; Mesquita, V.; Miziara, F.; Baumann, L.; Ferreira, L. Assessing the pasturelands and livestock dynamics in Brazil, from
1985 to 2017: A novel approach based on high spatial resolution imagery and Google Earth Engine cloud computing. Remote Sens.
Environ. 2019, 232, 111301. [CrossRef]
134. Zhang, M.; Gong, P.; Qi, S.; Liu, C.; Xiong, T. Mapping bamboo with regional phenological characteristics derived from dense
Landsat time series using Google Earth Engine. Int. J. Remote Sens. 2019, 40, 9541–9555. [CrossRef]
135. Alencar, A.; Shimbo, J.Z.; Lenti, F.; Balzani Marques, C.; Zimbres, B.; Rosa, M.; Arruda, V.; Castro, I.; Fernandes Márcico Ribeiro,
J.P.; Varela, V.; et al. Mapping Three Decades of Changes in the Brazilian Savanna Native Vegetation Using Landsat Data Processed
in the Google Earth Engine Platform. Remote Sens. 2020, 12, 924. [CrossRef]
136. Tian, J.; Wang, L.; Yin, D.; Li, X.; Diao, C.; Gong, H.; Shi, C.; Menenti, M.; Ge, Y.; Nie, S.; et al. Development of spectral-phenological
features for deep learning to understand Spartina alterniflora invasion. Remote Sens. Environ. 2020, 242, 111745. [CrossRef]
137. Srinet, R.; Nandy, S.; Padalia, H.; Ghosh, S.; Watham, T.; Patel, N.R.; Chauhan, P. Mapping plant functional types in Northwest
Himalayan foothills of India using random forest algorithm in Google Earth Engine. Int. J. Remote Sens. 2020, 41, 7296–7309.
[CrossRef]
138. Long, X.; Li, X.; Lin, H.; Zhang, M. Mapping the vegetation distribution and dynamics of a wetland using adaptive-stacking
and Google Earth Engine based on multi-source remote sensing data. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2021, 102, 102453.
[CrossRef]
139. Yan, D.; Li, J.; Yao, X.; Luan, Z. Quantifying the Long-Term Expansion and Dieback of Spartina Alterniflora Using Google Earth
Engine and Object-Based Hierarchical Random Forest Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14,
9781–9793. [CrossRef]
140. Wu, N.; Shi, R.; Zhuo, W.; Zhang, C.; Zhou, B.; Xia, Z.; Tao, Z.; Gao, W.; Tian, B. A Classification of Tidal Flat Wetland Vegetation
Combining Phenological Features with Google Earth Engine. Remote Sens. 2021, 13, 443. [CrossRef]
141. Pipia, L.; Amin, E.; Belda, S.; Salinero-Delgado, M.; Verrelst, J. Green LAI Mapping and Cloud Gap-Filling Using Gaussian
Process Regression in Google Earth Engine. Remote Sens. 2021, 13, 403. [CrossRef]
142. Zou, Z.; Dong, J.; Menarguez, M.A.; Xiao, X.; Qin, Y.; Doughty, R.B.; Hooker, K.V.; Hambright, K.D. Continued decrease of open
surface water body area in Oklahoma during 1984–2015. Sci. Total Environ. 2017, 595, 451–460. [CrossRef]
143. Chen, F.; Zhang, M.; Tian, B.; Li, Z. Extraction of Glacial Lake Outlines in Tibet Plateau Using Landsat 8 Imagery and Google
Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4002–4009. [CrossRef]
144. Wang, C.; Jia, M.; Chen, N.; Wang, W. Long-Term Surface Water Dynamics Analysis Based on Landsat Imagery and the Google
Earth Engine Platform: A Case Study in the Middle Yangtze River Basin. Remote Sens. 2018, 10, 1635. [CrossRef]
145. Lin, S.; Novitski, L.N.; Qi, J.; Stevenson, R.J. Landsat TM/ETM+ and machine-learning algorithms for limnological studies and
algal bloom management of inland lakes. J. Appl. Remote Sens. 2018, 12, 026003. [CrossRef]
146. Griffin, C.G.; McClelland, J.W.; Frey, K.E.; Fiske, G.; Holmes, R.M. Quantifying CDOM and DOC in major Arctic rivers during
ice-free conditions using Landsat TM and ETM+ data. Remote Sens. Environ. 2018, 209, 395–409. [CrossRef]
147. Isikdogan, L.F.; Bovik, A.; Passalacqua, P. Seeing Through the Clouds with DeepWaterMap. IEEE Geosci. Remote Sens. Lett. 2019,
17, 1662–1666. [CrossRef]
148. Fang, Y.; Li, H.; Wan, W.; Zhu, S.; Wang, Z.; Hong, Y.; Wang, H. Assessment of Water Storage Change in China’s Lakes and
Reservoirs over the Last Three Decades. Remote Sens. 2019, 11, 1467. [CrossRef]
149. Fuentes, I.; Padarian, J.; van Ogtrop, F.; Vervoort, R.W. Comparison of Surface Water Volume Estimation Methodologies
That Couple Surface Reflectance Data and Digital Terrain Models. Water 2019, 11, 780. [CrossRef]
150. Markert, K.N.; Markert, A.M.; Mayer, T.; Nauman, C.; Haag, A.; Poortinga, A.; Bhandari, B.; Thwal, N.S.; Kunlamai, T.;
Chishtie, F.; et al. Comparing Sentinel-1 Surface Water Mapping Algorithms and Radiometric Terrain Correction Processing in
Southeast Asia Utilizing Google Earth Engine. Remote Sens. 2020, 12, 2469. [CrossRef]
151. Wang, Y.; Li, Z.; Zeng, C.; Xia, G.; Shen, H. An Urban Water Extraction Method Combining Deep Learning and Google Earth
Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 768–781. [CrossRef]
152. Peterson, K.T.; Sagan, V.; Sloan, J.J. Deep Learning-Based Water Quality Estimation and Anomaly Detection Using Landsat-
8/Sentinel-2 Virtual Constellation and Cloud Computing. GISci. Remote Sens. 2020, 57, 510–525. [CrossRef]
153. Wang, L.; Xu, M.; Liu, Y.; Liu, H.; Beck, R.; Reif, M.; Emery, E.; Young, J.; Wu, Q. Mapping Freshwater Chlorophyll-a Concentrations
at a Regional Scale Integrating Multi-Sensor Satellite Observations with Google Earth Engine. Remote Sens. 2020, 12, 3278.
[CrossRef]
154. Boothroyd, R.J.; Williams, R.D.; Hoey, T.B.; Barrett, B.; Prasojo, O.A. Applications of Google Earth Engine in fluvial geomorphology
for detecting river channel change. WIREs Water 2020, 8, e21496. [CrossRef]
155. Weber, S.J.; Mishra, D.R.; Wilde, S.B.; Kramer, E. Risks for cyanobacterial harmful algal blooms due to land management and
climate interactions. Sci. Total Environ. 2019, 703, 134608. [CrossRef] [PubMed]
156. Mayer, T.; Poortinga, A.; Bhandari, B.; Nicolau, A.P.; Markert, K.; Thwal, N.S.; Markert, A.; Haag, A.; Kilbride, J.; Chishtie, F.; et al.
Deep learning approach for Sentinel-1 surface water mapping leveraging Google Earth Engine. ISPRS Open J. Photogramm. Remote
Sens. 2021, 2, 100005. [CrossRef]
157. Li, J.; Peng, B.; Wei, Y.; Ye, H. Accurate extraction of surface water in complex environment based on Google Earth Engine and
Sentinel-2. PLoS ONE 2021, 16, e0253209. [CrossRef]
158. Li, Y.; Niu, Z. Systematic method for mapping fine-resolution water cover types in China based on time series Sentinel-1 and 2
images. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2021, 106, 102656. [CrossRef]
159. Farda, N.M. Multi-temporal Land Use Mapping of Coastal Wetlands Area using Machine Learning in Google Earth Engine. IOP
Conf. Series Earth Environ. Sci. 2017, 98, 012042. [CrossRef]
160. Amani, M.; Mahdavi, S.; Afshar, M.; Brisco, B.; Huang, W.; Mohammad Javad Mirzadeh, S.; White, L.; Banks, S.; Montgomery, J.;
Hopkinson, C. Canadian Wetland Inventory using Google Earth Engine: The First Map and Preliminary Results. Remote Sens.
2019, 11, 842. [CrossRef]
161. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Homayouni, S.; Gill, E. The First Wetland Inventory Map of Newfoundland
at a Spatial Resolution of 10 m Using Sentinel-1 and Sentinel-2 Data on the Google Earth Engine Cloud Computing Platform.
Remote Sens. 2019, 11, 43. [CrossRef]
162. DeLancey, E.R.; Kariyeva, J.; Bried, J.T.; Hird, J. Large-scale probabilistic identification of boreal peatlands using Google Earth
Engine, open-access satellite data, and machine learning. PLoS ONE 2019, 14, e0218165. [CrossRef]
163. Wu, Q.; Lane, C.R.; Li, X.; Zhao, K.; Zhou, Y.; Clinton, N.; DeVries, B.; Golden, H.E.; Lang, M.W. Integrating LiDAR data and
multi-temporal aerial imagery to map wetland inundation dynamics using Google Earth Engine. Remote Sens. Environ. 2019, 228,
1–13. [CrossRef] [PubMed]
164. Zhang; Zhang; Dong; Liu; Gao; Hu; Wu. Mapping Tidal Flats with Landsat 8 Images and Google Earth Engine: A Case Study of
the China’s Eastern Coastal Zone circa 2015. Remote Sens. 2019, 11, 924. [CrossRef]
165. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Brisco, B.; Homayouni, S.; Gill, E.; DeLancey, E.R.; Bourgeau-Chavez, L. Big
Data for a Big Country: The First Generation of Canadian Wetland Inventory Map at a Spatial Resolution of 10-m Using Sentinel-1
and Sentinel-2 Data on the Google Earth Engine Cloud Computing Platform. Can. J. Remote Sens. 2020, 46, 15–33. [CrossRef]
166. Hakdaoui, S.; Emran, A.; Pradhan, B.; Qninba, A.; El Balla, T.; Mfondoum, A.H.N.; Lee, C.-W.; Alamri, A.M. Assessing the
Changes in the Moisture/Dryness of Water Cavity Surfaces in Imlili Sebkha in Southwestern Morocco by Using Machine Learning
Classification in Google Earth Engine. Remote Sens. 2020, 12, 131. [CrossRef]
167. DeLancey, E.R.; Simms, J.F.; Mahdianpari, M.; Brisco, B.; Mahoney, C.; Kariyeva, J. Comparing Deep Learning and Shallow
Learning for Large-Scale Wetland Classification in Alberta, Canada. Remote Sens. 2019, 12, 2. [CrossRef]
168. Mahdianpari, M.; Brisco, B.; Granger, J.E.; Mohammadimanesh, F.; Salehi, B.; Banks, S.; Homayouni, S.; Bourgeau-Chavez, L.;
Weng, Q. The Second Generation Canadian Wetland Inventory Map at 10 Meters Resolution Using Google Earth Engine. Can. J.
Remote Sens. 2020, 46, 360–375. [CrossRef]
169. Wang, X.; Xiao, X.; Zou, Z.; Chen, B.; Ma, J.; Dong, J.; Doughty, R.B.; Zhong, Q.; Qin, Y.; Dai, S.; et al. Tracking annual changes of
coastal tidal flats in China during 1986–2016 through analyses of Landsat images with Google Earth Engine. Remote Sens. Environ.
2018, 238, 110987. [CrossRef]
170. Mahdianpari, M.; Jafarzadeh, H.; Granger, J.E.; Mohammadimanesh, F.; Brisco, B.; Salehi, B.; Homayouni, S.; Weng, Q. A large-
scale change monitoring of wetlands using time series Landsat imagery on Google Earth Engine: A case study in Newfoundland.
GISci. Remote Sens. 2020, 57, 1102–1124. [CrossRef]
171. Sahour, H.; Kemink, K.M.; O’Connell, J. Integrating SAR and Optical Remote Sensing for Conservation-Targeted Wetlands
Mapping. Remote Sens. 2021, 14, 159. [CrossRef]
172. Jia, M.; Wang, Z.; Mao, D.; Ren, C.; Wang, C.; Wang, Y. Rapid, robust, and automated mapping of tidal flats in China using time
series Sentinel-2 images and Google Earth Engine. Remote Sens. Environ. 2021, 255, 112285. [CrossRef]
173. van Deventer, H.; Cho, M.A.; Mutanga, O. Multi-season RapidEye imagery improves the classification of wetland and dryland
communities in a subtropical coastal region. ISPRS J. Photogramm. Remote Sens. 2019, 157, 171–187. [CrossRef]
174. Ye, X.-C.; Meng, Y.-K.; Xu, L.-G.; Xu, C.-Y. Net primary productivity dynamics and associated hydrological driving factors in the
floodplain wetland of China’s largest freshwater lake. Sci. Total Environ. 2019, 659, 302–313. [CrossRef] [PubMed]
175. Dalezios, N.R.; Dercas, N.; Eslamian, S.S. Water scarcity management: Part 2: Satellite-based composite drought analysis. Int. J.
Glob. Environ. Issues 2018, 17, 262. [CrossRef]
176. Zhang, M.; Lin, H. Wetland classification using parcel-level ensemble algorithm based on Gaofen-6 multispectral imagery and
Sentinel-1 dataset. J. Hydrol. 2022, 606, 127462. [CrossRef]
177. Guo, Y.; Jia, X.; Paull, D.; Benediktsson, J.A. Nomination-favoured opinion pool for optical-SAR-synergistic rice mapping in face
of weakened flooding signals. ISPRS J. Photogramm. Remote Sens. 2019, 155, 187–205. [CrossRef]
178. Goldblatt, R.; You, W.; Hanson, G.; Khandelwal, A.K. Detecting the Boundaries of Urban Areas in India: A Dataset for Pixel-Based
Image Classification in Google Earth Engine. Remote Sens. 2016, 8, 634. [CrossRef]
179. Huang, C.; Yang, J.; Jiang, P. Assessing Impacts of Urban Form on Landscape Structure of Urban Green Spaces in China Using
Landsat Images Based on Google Earth Engine. Remote Sens. 2018, 10, 1569. [CrossRef]
180. Xu, H.; Wei, Y.; Liu, C.; Li, X.; Fang, H. A Scheme for the Long-Term Monitoring of Impervious-Relevant Land Disturbances
Using High Frequency Landsat Archives and the Google Earth Engine. Remote Sens. 2019, 11, 1891. [CrossRef]
181. Zhong, Q.; Ma, J.; Zhao, B.; Wang, X.; Zong, J.; Xiao, X. Assessing spatial-temporal dynamics of urban expansion, vegetation
greenness and photosynthesis in megacity Shanghai, China during 2000–2016. Remote Sens. Environ. 2019, 233, 111374. [CrossRef]
182. Lin, Y.; Zhang, H.; Lin, H.; Gamba, P.E.; Liu, X. Incorporating synthetic aperture radar and optical images to investigate the
annual dynamics of anthropogenic impervious surface at large scale. Remote Sens. Environ. 2020, 242, 111757. [CrossRef]
183. Liu, D.; Chen, N.; Zhang, X.; Wang, C.; Du, W. Annual large-scale urban land mapping based on Landsat time series in Google
Earth Engine and OpenStreetMap data: A case study in the middle Yangtze River basin. ISPRS J. Photogramm. Remote Sens. 2019,
159, 337–351. [CrossRef]
184. Mugiraneza, T.; Nascetti, A.; Ban, Y. Continuous Monitoring of Urban Land Cover Change Trajectories with Landsat Time Series
and LandTrendr-Google Earth Engine Cloud Computing. Remote Sens. 2020, 12, 2883.
185. Lin, J.; Jin, X.; Ren, J.; Liu, J.; Liang, X.; Zhou, Y. Rapid Mapping of Large-Scale Greenhouse Based on Integrated Learning
Algorithm and Google Earth Engine. Remote Sens. 2021, 13, 1245. [CrossRef]
186. Carneiro, E.; Lopes, W.; Espindola, G. Urban Land Mapping Based on Remote Sensing Time Series in the Google Earth Engine
Platform: A Case Study of the Teresina-Timon Conurbation Area in Brazil. Remote Sens. 2021, 13, 1338. [CrossRef]
187. Zhang, Z.; Wei, M.; Pu, D.; He, G.; Wang, G.; Long, T. Assessment of Annual Composite Images Obtained by Google Earth Engine
for Urban Areas Mapping Using Random Forest. Remote Sens. 2021, 13, 748. [CrossRef]
188. Samat, A.; Gamba, P.; Wang, W.; Luo, J.; Li, E.; Liu, S.; Du, P.; Abuduwaili, J. Mapping Blue and Red Color-Coated Steel Sheet
Roof Buildings over China Using Sentinel-2A/B MSIL2A Images. Remote Sens. 2022, 14, 230. [CrossRef]
189. Parks, S.A.; Holsinger, L.M.; Koontz, M.J.; Collins, L.; Whitman, E.; Parisien, M.-A.; Loehman, R.A.; Barnes, J.L.; Bourdon, J.-F.;
Boucher, J.; et al. Giving Ecological Meaning to Satellite-Derived Fire Severity Metrics across North American Forests. Remote
Sens. 2019, 11, 1735. [CrossRef]
190. Quintero, N.; Viedma, O.; Urbieta, I.R.; Moreno, J.M. Assessing Landscape Fire Hazard by Multitemporal Automatic Classification
of Landsat Time Series Using the Google Earth Engine in West-Central Spain. Forests 2019, 10, 518. [CrossRef]
191. Long, T.; Zhang, Z.; He, G.; Jiao, W.; Tang, C.; Wu, B.; Zhang, X.; Wang, G.; Yin, R. 30 m Resolution Global Annual Burned Area
Mapping Based on Landsat Images and Google Earth Engine. Remote Sens. 2019, 11, 489. [CrossRef]
192. Bar, S.; Parida, B.R.; Pandey, A.C. Landsat-8 and Sentinel-2 based Forest fire burn area mapping using machine learning algorithms
on GEE cloud platform over Uttarakhand, Western Himalaya. Remote Sens. Appl. Soc. Environ. 2020, 18, 100324. [CrossRef]
193. Sulova, A.; Arsanjani, J.J. Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning
within Google Earth Engine. Remote Sens. 2021, 13, 10. [CrossRef]
194. Zhang, Z.; He, G.; Long, T.; Tang, C.; Wei, M.; Wang, W.; Wang, G. Spatial Pattern Analysis of Global Burned Area in 2005 Based
on Landsat Satellite Images. IOP Conf. Ser. Earth Environ. Sci. 2020, 428, 012078. [CrossRef]
195. Seydi, S.; Akhoondzadeh, M.; Amani, M.; Mahdavi, S. Wildfire Damage Assessment over Australia Using Sentinel-2 Imagery and
MODIS Land Cover Product within the Google Earth Engine Cloud Platform. Remote Sens. 2021, 13, 220. [CrossRef]
196. Arruda, V.L.; Piontekowski, V.J.; Alencar, A.; Pereira, R.S.; Matricardi, E.A. An alternative approach for mapping burn scars using
Landsat imagery, Google Earth Engine, and Deep Learning in the Brazilian Savanna. Remote Sens. Appl. Soc. Environ. 2021, 22, 100472.
[CrossRef]
197. Waller, E.K.; Villarreal, M.L.; Poitras, T.B.; Nauman, T.W.; Duniway, M.C. Landsat time series analysis of fractional plant cover
changes on abandoned energy development sites. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2018, 73, 407–419. [CrossRef]
198. Lobo, F.D.L.; Souza-Filho, P.W.M.; Novo, E.M.L.D.M.; Carlos, F.M.; Barbosa, C.C.F. Mapping Mining Areas in the Brazilian
Amazon Using MSI/Sentinel-2 Imagery (2017). Remote Sens. 2018, 10, 1178. [CrossRef]
199. Xiao, W.; Deng, X.; He, T.; Chen, W. Mapping Annual Land Disturbance and Reclamation in a Surface Coal Mining Region Using
Google Earth Engine and the LandTrendr Algorithm: A Case Study of the Shengli Coalfield in Inner Mongolia, China. Remote
Sens. 2020, 12, 1612. [CrossRef]
200. Balaniuk, R.; Isupova, O.; Reece, S. Mining and Tailings Dam Detection in Satellite Imagery Using Deep Learning. Sensors 2020,
20, 6936. [CrossRef]
201. Fuentes, M.; Millard, K.; Laurin, E. Big geospatial data analysis for Canada’s Air Pollutant Emissions Inventory (APEI): Using
Google Earth Engine to estimate particulate matter from exposed mine disturbance areas. GISci. Remote Sens. 2019, 57, 245–257.
[CrossRef]
202. He, T.; Xiao, W.; Zhao, Y.; Deng, X.; Hu, Z. Identification of waterlogging in Eastern China induced by mining subsidence: A case
study of Google Earth Engine time-series analysis applied to the Huainan coal field. Remote Sens. Environ. 2020, 242, 111742.
[CrossRef]
203. Zhou, L.; Luo, T.; Du, M.; Chen, Q.; Liu, Y.; Zhu, Y.; He, C.; Wang, S.; Yang, K. Machine Learning Comparison and Parameter
Setting Methods for the Detection of Dump Sites for Construction and Demolition Waste Using the Google Earth Engine. Remote
Sens. 2021, 13, 787. [CrossRef]
204. Chrysoulakis, N.; Mitraka, Z.; Gorelick, N. Exploiting satellite observations for global surface albedo trends monitoring. Arch.
Meteorol. Geophys. Bioclimatol. Ser. B 2018, 137, 1171–1179. [CrossRef]
205. Chastain, R.; Housman, I.; Goldstein, J.; Finco, M.; Tenneson, K. Empirical Cross Sensor Comparison of Sentinel-2A and 2B MSI,
Landsat-8 OLI, and Landsat-7 ETM+ Top of Atmosphere Spectral Characteristics over the Conterminous United States. Remote
Sens. Environ. 2019, 221, 274–285. [CrossRef]
206. Demuzere, M.; Bechtel, B.; Mills, G. Global transferability of local climate zone models. Urban Clim. 2018, 27, 46–63. [CrossRef]
207. Ranagalage, M.; Murayama, Y.; Dissanayake, D.; Simwanda, M. The Impacts of Landscape Changes on Annual Mean Land
Surface Temperature in the Tropical Mountain City of Sri Lanka: A Case Study of Nuwara Eliya (1996–2017). Sustainability 2019,
11, 5517. [CrossRef]
208. Medina-Lopez, E.; Ureña-Fuentes, L. High-Resolution Sea Surface Temperature and Salinity in the Global Ocean from Raw
Satellite Data. Remote Sens. 2019, 11, 2191. [CrossRef]
209. Besnard, S.; Carvalhais, N.; Arain, M.A.; Black, A.; Brede, B.; Buchmann, N.; Chen, J.; Clevers, J.; Dutrieux, L.P.; Gans, F.; et al.
Memory effects of climate and vegetation affecting net ecosystem CO2 fluxes in global forests. PLoS ONE 2019, 14, e0211510.
[CrossRef]
210. Elnashar, A.; Zeng, H.; Wu, B.; Zhang, N.; Tian, F.; Zhang, M.; Zhu, W.; Yan, N.; Chen, Z.; Sun, Z.; et al. Downscaling TRMM
Monthly Precipitation Using Google Earth Engine and Google Cloud Computing. Remote Sens. 2020, 12, 3860. [CrossRef]
211. Yu, B.; Chen, F.; Muhammad, S. Analysis of satellite-derived landslide at Central Nepal from 2011 to 2016. Environ. Earth Sci.
2018, 77, 331. [CrossRef]
212. Cho, E.; Jacobs, J.M.; Jia, X.; Kraatz, S. Identifying Subsurface Drainage using Satellite Big Data and Machine Learning via Google
Earth Engine. Water Resour. Res. 2019, 55, 8028–8045. [CrossRef]
213. Uddin, K.; Matin, M.A.; Meyer, F.J. Operational Flood Mapping Using Multi-Temporal Sentinel-1 SAR Images: A Case Study from
Bangladesh. Remote Sens. 2019, 11, 1581. [CrossRef]
214. Vanama, V.S.K.; Mandal, D.; Rao, Y.S. GEE4FLOOD: Rapid mapping of flood areas using temporal Sentinel-1 SAR images with
Google Earth Engine cloud platform. J. Appl. Remote Sens. 2020, 14, 034505. [CrossRef]
215. Ghaffarian, S.; Rezaie Farhadabad, A.; Kerle, N. Post-Disaster Recovery Monitoring with Google Earth Engine. Appl. Sci. 2020, 10, 4574.
[CrossRef]
216. Kakooei, M.; Baleghi, Y. A two-level fusion for building irregularity detection in post-disaster VHR oblique images. Earth Sci.
Inform. 2020, 13, 459–477. [CrossRef]
217. Padarian, J.; Minasny, B.; McBratney, A. Using Google’s cloud-based platform for digital soil mapping. Comput. Geosci. 2015, 83,
80–88. [CrossRef]
218. Ivushkin, K.; Bartholomeus, H.; Bregt, A.K.; Pulatov, A.; Kempen, B.; de Sousa, L. Global mapping of soil salinity change. Remote
Sens. Environ. 2019, 231, 111260. [CrossRef]
219. Poppiel, R.R.; Lacerda, M.P.C.; Safanelli, J.L.; Rizzo, R.; Oliveira, M.P., Jr.; Novais, J.J.; Demattê, J.A.M. Mapping at 30 m Resolution
of Soil Attributes at Multiple Depths in Midwest Brazil. Remote Sens. 2019, 11, 2905. [CrossRef]
220. Cao, B.; Domke, G.M.; Russell, M.B.; Walters, B.F. Spatial modeling of litter and soil carbon stocks on forest land in the
conterminous United States. Sci. Total Environ. 2018, 654, 94–106. [CrossRef]
221. Greifeneder, F.; Notarnicola, C.; Wagner, W. A Machine Learning-Based Approach for Surface Soil Moisture Estimations with
Google Earth Engine. Remote Sens. 2021, 13, 2099. [CrossRef]
222. Zhang, M.; Zhang, M.; Yang, H.; Jin, Y.; Zhang, X.; Liu, H. Mapping Regional Soil Organic Matter Based on Sentinel-2A and
MODIS Imagery Using Machine Learning Algorithms and Google Earth Engine. Remote Sens. 2021, 13, 2934. [CrossRef]
223. Gómez-Chova, L.; Amorós-López, J.; Mateo-García, G.; Muñoz-Marí, J.; Camps-Valls, G. Cloud masking and removal in remote
sensing image time series. J. Appl. Remote Sens. 2017, 11, 015005. [CrossRef]
224. Mateo-García, G.; Gómez-Chova, L.; Amorós-López, J.; Muñoz-Marí, J.; Camps-Valls, G. Multitemporal Cloud Masking in the
Google Earth Engine. Remote Sens. 2018, 10, 1079. [CrossRef]
225. Yin, Z.; Ling, F.; Foody, G.M.; Li, X.; Du, Y. Cloud detection in Landsat-8 imagery in Google Earth Engine based on a deep
convolutional neural network. Remote Sens. Lett. 2020, 11, 1181–1190. [CrossRef]
226. Li, J.; Wang, L.; Liu, S.; Peng, B.; Ye, H. An automatic cloud detection model for Sentinel-2 imagery based on Google Earth Engine.
Remote Sens. Lett. 2021, 13, 196–206. [CrossRef]
227. Zhang, X.; Qiu, Z.; Peng, C.; Ye, P. Removing cloud cover interference from Sentinel-2 imagery in Google Earth Engine by fusing
Sentinel-1 SAR data with a CNN model. Int. J. Remote Sens. 2021, 43, 132–147. [CrossRef]
228. Meraner, A.; Ebel, P.; Zhu, X.X.; Schmitt, M. Cloud removal in Sentinel-2 imagery using a deep residual neural network and
SAR-optical data fusion. ISPRS J. Photogramm. Remote Sens. 2020, 166, 333–346. [CrossRef]
229. Carrasco-Escobar, G.; Manrique, E.; Ruiz-Cabrejos, J.; Saavedra, M.; Alava, F.; Bickersmith, S.; Prussing, C.; Vinetz, J.M.; Conn, J.;
Moreno, M.; et al. High-accuracy detection of malaria vector larval habitats using drone-based multispectral imagery. PLoS Negl.
Trop. Dis. 2019, 13, e0007105. [CrossRef]
230. Ascensão, F.; Yogui, D.R.; Alves, M.; Medici, E.P.; Desbiez, A. Predicting spatiotemporal patterns of road mortality for medium-
large mammals. J. Environ. Manag. 2019, 248, 109320. [CrossRef]
231. Lyons, M.B.; Brandis, K.J.; Murray, N.J.; Wilshire, J.H.; McCann, J.A.; Kingsford, R.T.; Callaghan, C.T. Monitoring large and
complex wildlife aggregations with drones. Methods Ecol. Evol. 2019, 10, 1024–1035. [CrossRef]
232. Pérez-Romero, J.; Navarro-Cerrillo, R.M.; Palacios-Rodriguez, G.; Acosta, C.; Mesas-Carrascosa, F.J. Improvement of Remote
Sensing-Based Assessment of Defoliation of Pinus spp. Caused by Thaumetopoea pityocampa Denis and Schiffermüller and
Related Environmental Drivers in Southeastern Spain. Remote Sens. 2019, 11, 1736.
233. Liss, B.; Howland, M.D.; Levy, T.E. Testing Google Earth Engine for the automatic identification and vectorization of archaeological
features: A case study from Faynan, Jordan. J. Archaeol. Sci. Rep. 2017, 15, 299–304. [CrossRef]
234. Orengo, H.; Garcia-Molsosa, A. A brave new world for archaeological survey: Automated machine learning-based potsherd
detection using high-resolution drone imagery. J. Archaeol. Sci. 2019, 112, 105013. [CrossRef]
235. Orengo, H.A.; Conesa, F.C.; Garcia-Molsosa, A.; Lobo, A.; Green, A.S.; Madella, M.; Petrie, C.A. Automated detection of
archaeological mounds using machine-learning classification of multisensor and multitemporal satellite data. Proc. Natl. Acad.
Sci. USA 2020, 117, 18240–18250. [CrossRef] [PubMed]
236. Hagenaars, G.; de Vries, S.; Luijendijk, A.P.; de Boer, W.P.; Reniers, A.J. On the accuracy of automated shoreline detection derived
from satellite imagery: A case study of the sand motor mega-scale nourishment. Coast. Eng. 2018, 133, 113–125. [CrossRef]
237. Vos, K.; Harley, M.D.; Splinter, K.D.; Simmons, J.A.; Turner, I.L. Sub-annual to multi-decadal shoreline variability from publicly
available satellite imagery. Coast. Eng. 2019, 150, 160–174. [CrossRef]
238. Cao, W.; Zhou, Y.; Li, R.; Li, X. Mapping changes in coastlines and tidal flats in developing islands using the full time series of
Landsat images. Remote Sens. Environ. 2020, 239, 111665. [CrossRef]
239. Traganos, D.; Poursanidis, D.; Aggarwal, B.; Chrysoulakis, N.; Reinartz, P. Estimating Satellite-Derived Bathymetry (SDB) with
the Google Earth Engine and Sentinel-2. Remote Sens. 2018, 10, 859. [CrossRef]
240. Sagawa, T.; Yamashita, Y.; Okumura, T.; Yamanokuchi, T. Satellite Derived Bathymetry Using Machine Learning and Multi-
Temporal Satellite Images. Remote Sens. 2019, 11, 1155. [CrossRef]
241. Tedesche, M.E.; Trochim, E.D.; Fassnacht, S.R.; Wolken, G.J. Extent Changes in the Perennial Snowfields of Gates of the Arctic
National Park and Preserve, Alaska. Hydrology 2019, 6, 53. [CrossRef]
242. Qi, M.; Liu, S.; Yao, X.; Xie, F.; Gao, Y. Monitoring the Ice Phenology of Qinghai Lake from 1980 to 2018 Using Multisource Remote
Sensing Data and Google Earth Engine. Remote Sens. 2020, 12, 2217. [CrossRef]
243. Yang, L.; Cervone, G. Analysis of remote sensing imagery for disaster assessment using deep learning: A case study of flooding
event. Soft Comput. 2019, 23, 13393–13408. [CrossRef]
244. Davies, D.K.; Murphy, K.J.; Michael, K.; Becker-Reshef, I.; Justice, C.O.; Boller, R.; Braun, S.A.; Schmaltz, J.E.; Wong, M.M.; Pasch,
A.N.; et al. The Use of NASA LANCE Imagery and Data for Near Real-Time Applications. In Time-Sensitive Remote Sensing;
Lippitt, C.D., Stow, D.A., Coulter, L.L., Eds.; Springer: New York, NY, USA, 2015; pp. 165–182, ISBN 9781493926022.
245. Lippitt, C.D.; Stow, D.A.; Riggan, P.J. Application of the remote-sensing communication model to a time-sensitive wildfire
remote-sensing system. Int. J. Remote Sens. 2016, 37, 3272–3292. [CrossRef]
246. Hoffmann, J.; Borgeaud, S.; Mensch, A.; Buchatskaya, E.; Cai, T.; Rutherford, E.; de Las Casas, D.; Hendricks, L.A.; Welbl, J.;
Clark, A.; et al. Training Compute-Optimal Large Language Models. arXiv 2022, arXiv:2203.15556.
247. Banko, M.; Brill, E. Scaling to Very Very Large Corpora for Natural Language Disambiguation. In Proceedings of the 39th Annual
Meeting of the Association for Computational Linguistics, Toulouse, France, 6–11 July 2001; pp. 26–33.
248. Press, G. Andrew Ng Launches a Campaign for Data-Centric AI. Available online: https://www.forbes.com/sites/gilpress/2021/06/16/andrew-ng-launches-a-campaign-for-data-centric-ai/ (accessed on 25 April 2022).
249. Pratt, L.Y. Discriminability-Based Transfer between Neural Networks. In Advances in Neural Information Processing Systems 5;
Hanson, S.J., Cowan, J.D., Giles, C.L., Eds.; Morgan-Kaufmann: Burlington, MA, USA, 1993; pp. 204–211.
250. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [CrossRef]
251. Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 9. [CrossRef]
252. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. In Proceedings of the International
Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2018; pp. 270–279.
253. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE
2021, 109, 43–76. [CrossRef]
254. Li, C.; Zhang, S.; Qin, Y.; Estupinan, E. A systematic review of deep transfer learning for machinery fault diagnosis. Neurocomputing
2020, 407, 121–135. [CrossRef]
255. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
256. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
257. Bar, Y.; Diamant, I.; Wolf, L.; Lieberman, S.; Konen, E.; Greenspan, H. Chest Pathology Detection Using Deep Learning with
Non-Medical Training. In Proceedings of the 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), New York,
NY, USA, 16–19 April 2015; pp. 294–297.
258. Maaten, L.; Chen, M.; Tyree, S.; Weinberger, K. Learning with Marginalized Corrupted Features. In Proceedings of the International
Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 410–418.
259. Gillies, M.; Fiebrink, R.; Tanaka, A.; Garcia, J.; Bevilacqua, F.; Heloir, A.; Nunnari, F.; Mackay, W.; Amershi, S.; Lee, B.; et al.
Human-Centred Machine Learning. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in
Computing Systems, San Jose, CA, USA, 7–12 May 2016.
260. Wu, Q. geemap: A Python package for interactive mapping with Google Earth Engine. J. Open Source Softw. 2020, 5, 2305.
[CrossRef]
261. Aybar, C.; Wu, Q.; Bautista, L.; Yali, R.; Barja, A. rgee: An R package for interacting with Google Earth Engine. J. Open Source
Softw. 2020, 5, 2272. [CrossRef]
262. Huntington, J.L.; Hegewisch, K.C.; Daudert, B.; Morton, C.G.; Abatzoglou, J.T.; McEvoy, D.J.; Erickson, T. Climate Engine: Cloud
Computing and Visualization of Climate and Remote Sensing Data for Advanced Natural Resource Monitoring and Process
Understanding. Bull. Am. Meteorol. Soc. 2017, 98, 2397–2410. [CrossRef]
263. Li, H.; Wan, W.; Fang, Y.; Zhu, S.; Chen, X.; Liu, B.; Hong, Y. A Google Earth Engine-enabled software for efficiently generating
high-quality user-ready Landsat mosaic images. Environ. Model. Softw. 2018, 112, 16–22. [CrossRef]
264. Yang, L.; Driscol, J.; Sarigai, S.; Wu, Q.; Lippitt, C.D.; Morgan, M. Towards Synoptic Water Monitoring Systems: A Review of AI
Methods for Automating Water Body Detection and Water Quality Monitoring Using Remote Sensing. Sensors 2022, 22, 2416.
[CrossRef] [PubMed]