Assessment of GEDI’s LiDAR Data for the Estimation of Canopy Heights and Wood Volume of Eucalyptus Plantations in Brazil


Assessment of GEDI’s LiDAR Data for the Estimation of Canopy Heights and Wood Volume of Eucalyptus Plantations in Brazil
Ibrahim Fayad, Nicolas Baghdadi, Clayton Alcarde Alvares, Jose Luiz Stape, Jean-Stéphane Bailly, Henrique Ferraco Scolforo, Mehrez Zribi, Guerric Le Maire

To cite this version:


Ibrahim Fayad, Nicolas Baghdadi, Clayton Alcarde Alvares, Jose Luiz Stape, Jean-Stéphane Bailly, et al. Assessment of GEDI’s LiDAR Data for the Estimation of Canopy Heights and Wood Volume of Eucalyptus Plantations in Brazil. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, IEEE, 2021, 14, pp. 7095-7110. ⟨10.1109/JSTARS.2021.3092836⟩. ⟨hal-03318147⟩

HAL Id: hal-03318147


https://hal.inrae.fr/hal-03318147
Submitted on 16 Aug 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Distributed under a Creative Commons Attribution 4.0 International License


IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 14, 2021 7095

Assessment of GEDI’s LiDAR Data for the Estimation of Canopy Heights and Wood Volume of Eucalyptus Plantations in Brazil

Ibrahim Fayad, Nicolas N. Baghdadi, Clayton Alcarde Alvares, Jose Luiz Stape, Jean Stéphane Bailly, Henrique Ferraço Scolforo, Mehrez Zribi, and Guerric Le Maire

Abstract—Over the past two decades, spaceborne LiDAR systems have gained momentum in the remote sensing community with their ability to accurately estimate canopy heights and aboveground biomass. This article aims at using the most recent global ecosystem dynamics investigation (GEDI) LiDAR system data to estimate the stand-scale dominant heights (Hdom) and stand volume (V) of Eucalyptus plantations in Brazil. These plantations provide a valuable case study due to the homogeneous canopy cover and the availability of precise field measurements. Several linear and nonlinear regression models were used for the estimation of Hdom and V based on several GEDI metrics. Hdom and V estimation results showed that over low-sloped terrain the most accurate estimates of Hdom and V were obtained using the stepwise regression, with a root-mean-square error (RMSE) of 1.33 m (R2 of 0.93) and 24.39 m3.ha−1 (R2 of 0.90), respectively. The principal metric, explaining more than 87% and 84% of the variability (R2) of Hdom and V, respectively, was the metric representing the height above the ground at which 90% of the waveform energy occurs. Testing the postprocessed GEDI metric values issued from the six available processing algorithms showed that the accuracy of the Hdom and V estimates is algorithm dependent, with a 16% observed increase in RMSE on both variables using algorithm a5 versus a1. Finally, the choice to select the ground return from the last detected mode or the stronger of the last two modes could also affect the Hdom estimation accuracy, with a 12 cm RMSE decrease using the latter.

Index Terms—Brazil, dominant heights, eucalyptus, global ecosystem dynamics investigation (GEDI), LiDAR, wood volume.

Manuscript received November 13, 2020; revised January 11, 2021 and May 5, 2021; accepted June 15, 2021. Date of publication June 28, 2021; date of current version July 26, 2021. This work was supported in part by the French Space Study Center (CNES, TOSCA 2020 project), and in part by the National Research Institute for Agriculture, Food, and the Environment (INRAE). (Corresponding author: Ibrahim Fayad.)
Ibrahim Fayad and Nicolas N. Baghdadi are with the French National Research Institute for Agriculture, Food and the Environment (INRAE), CIRAD, CNRS, TETIS, AgroParisTech, Université de Montpellier, 34093 Montpellier, France (e-mail: [email protected]; [email protected]).
Clayton Alcarde Alvares is with the UNESP, Faculdade de Ciências Agronômicas, Botucatu 18610-034, Brazil, and also with the Suzano SA, Limeira 13465-970, Brazil (e-mail: [email protected]).
Jose Luiz Stape is with the UNESP, Faculdade de Ciências Agronômicas, Botucatu 18610-034, Brazil (e-mail: [email protected]).
Jean Stéphane Bailly is with the INRAE, IRD, Institut Agro, LISAH, Université de Montpellier, 34060 Montpellier, France, and also with the AgroParisTech, 75005 Paris, France (e-mail: [email protected]).
Henrique Ferraço Scolforo is with the Suzano SA, Limeira 13465-970, Brazil (e-mail: [email protected]).
Mehrez Zribi is with the Center for the Study of the Biosphere from Space (CNRS/UPS/IRD/CNES/INRAE), 31401 Toulouse, France (e-mail: [email protected]).
Guerric Le Maire is with the CIRAD, UMR Eco&Sols, 34398 Montpellier, France, and also with the Eco&Sols, CIRAD, INRA, IRD, Montpellier SupAgro, Université de Montpellier, 34060 Montpellier, France (e-mail: [email protected]).
Digital Object Identifier 10.1109/JSTARS.2021.3092836
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

IN THE last couple of decades, global concerns about the increased atmospheric concentration of greenhouse gases, such as CO2, have raised interest in quantifying the state and change of forest resources, due to the key role of forests in the global carbon cycle [1], [2]. Forests sequester a large quantity of carbon in their woody biomass, where they store around 70% to 90% of the global terrestrial biomass, ranging from 385 × 10^9 to 650 × 10^9 Mg [3]. Hence, the accurate estimation of forest biomass is needed to better determine its precise role in the global carbon cycle [4], [5]. Forest plantations represent a small fraction (6.9%) of the total forested land ([6]) but are becoming increasingly important around the world, economically, socially and environmentally ([7], [8]).

In recent years, the primary source of aboveground biomass (AGB) estimation in tropical forests at large scales has been observations and measurements from different satellite remote sensing platforms. Methods based on remotely sensed data are less accurate than field measurements; however, their major advantages are their global and frequent coverage and the low or free acquisition costs for the end user. Currently, optical, radar, and LiDAR are the three main sources of remotely sensed data used in AGB estimation techniques. Nonetheless, current data sources are either limited to low AGB levels (<150 Mg/ha) (sensor saturation at certain biomass levels with radar and optical data) or have a limited spatial coverage (e.g., airborne LiDAR data). LiDAR systems, either airborne or spaceborne, have the capability to capture the horizontal and vertical structure of vegetation comprehensively [9], and can thus estimate biomass with better precision in comparison to the techniques using radar or optical data [10], [11]. To date, there have been only three satellite LiDAR missions. The first mission was the Ice, cloud, and land elevation satellite (ICESat-1), which carried the geoscience laser altimeter system (GLAS) from 2003 until 2009 [12]. Although GLAS’s ∼60 m diameter footprint was larger than the ideal resolution for forest observations [13], its capability to estimate forest parameters (e.g., canopy heights and biomass) has been exploited in numerous studies during its operational and post-operational periods [5], [14]–[20].

ICESat-1 was followed in 2018 by ICESat-2, which carried the advanced topographic laser altimeter system (ATLAS) with the goal of measuring ice-sheet topography, cloud and atmospheric properties, and global vegetation. However, the wavelength of the equipped laser (532 nm) lies in a spectral region of high radiation absorption by vegetation. This results in a low number of reflected photons measured by ATLAS over vegetation and limits its ability to estimate forest canopy heights [21]. The most recent spaceborne LiDAR system is GEDI on board the ISS, which was launched in December 2018 with on-orbit checkout in April 2019. GEDI’s mission objective is to provide information about canopy structure, biomass and topography, and it is estimated to acquire 10 billion cloud-free shots in its two-year mission [22]. GEDI measures vertical structures similarly to ICESat-1 (i.e., waveforms). However, given GEDI’s higher sampling rate (242 versus 40 Hz for ICESat-1) and much smaller footprint size (∼25 versus ∼60 m for ICESat-1), GEDI provides a highly improved coverage and waveform precision.

GEDI’s ability to estimate forest height and wood volume on different types of forest ecosystems, topography and latitudes is of paramount importance. GEDI datasets are organized in different levels of products, from raw acquisition data to more elaborated data obtained by performing signal analysis and metrics extraction from the waveforms. This results in a large number of metrics for each acquired footprint, from which many different models could be used to retrieve canopy heights and wood volume. While direct metrics could be used as good proxies, it is acknowledged that combining different metrics yields higher accuracies. For instance, such algorithms make use of linear or nonlinear regression models applied on sets of metrics extracted from GEDI waveforms, and possibly combined with digital elevation models (DEMs). The full waveform LiDAR data can potentially give access to more information on canopy structure than the basic “top” and “bottom” return signals, being itself potentially informative for canopy height and volume prediction. Therefore, it is critical to explore which metrics, or combination of metrics, and with which type of models (e.g., linear versus nonlinear) provide the best forest parameter estimates. It is also important to evaluate the effect of the uncertainty of the metrics estimation themselves, which results from differences in preprocessing algorithms, as well as other acquisition characteristics that may influence the final models, such as beam acquisition angles.

Precise evaluation of forest height and volume is not an easy task. One of the main issues is that uncertainties in field measurements can propagate through the models and create larger uncertainties in the estimates [23]. For example, Saarela et al. [24] and Holm et al. [25] found that not accounting for errors in field measurements could underestimate the uncertainty in final satellite-based AGB maps by a factor of three or more. Feldpausch et al. [26] and Kearsley et al. [27] found that uncertainties in tree height measurements led to increased bias in the biomass and carbon stock estimates. Other obstacles include: the influence of tree growth during the timespan between the field measurements and satellite acquisitions, which cannot be neglected [28]; the comprehensive model validation limited by the sparsity of in situ data [29]; and the method used to measure tree heights [30].

Consequently, in such evaluation and comparison of processing algorithms and models for forest height and volume estimation, the field dataset plays a critical role. In this article, we want to focus on the uncertainty coming from the GEDI metrics and models, minimizing the influence of the uncertainty on in situ measurements. To reach this objective, we analyzed a large dataset of forest plantations in Brazil, which has many advantages to serve as a test case: large number of sites, different climate and topographical environments, numerous and frequent measurements, precise measurements of tree heights, good allometric relationships for wood volume and homogeneous canopies, etc. (see description in Section II-C). Even though they are not representative of all forests, the results obtained on forest plantations can give a notion of the reachable precision on height and wood volume estimation using GEDI data on a more structurally simple forest than natural forests, while also removing part of the errors due to in situ measurements.

The main objectives of this article are therefore summarized in the following questions.
1) What are the most important GEDI metrics linked to canopy height and volume?
2) Are linear and nonlinear models using subsets of metrics more efficient in predicting height and volume?
3) What is the importance of the different pre-processing algorithms on the final uncertainties?
4) Is there an influence of other acquisition characteristics, such as viewing angle, on the estimated forest characteristics?
5) Is other stand information, such as data from a DEM or the age of the stand, relevant for the estimation of height and volume on forest plantations?

The manuscript first presents the GEDI dataset, followed by the processing of GEDI data and the main metrics that will be used for the estimation of canopy heights and wood volume. Next, a description of the methods used for the estimation of the forest characteristics is presented in Section III. Finally, the results, discussions, and main conclusions are presented in Sections IV, V, and VI, respectively.

II. STUDY SITE AND DATASETS

A. Study Area

The study area is located in four regions in Brazil (Bahia & Espírito Santo, Mato Grosso do Sul, São Paulo, and Maranhão) across a large latitudinal gradient (see Fig. 1) and covering different climate and soil types. The studied plantations are managed in order to produce high-yield pulpwood growing at short rotations. Clonal seedlings of mainly E. grandis (W. Hill) and E. urophylla (S.T. Blake) and different types of hybrids are planted in rows at a density of 1000–1667 trees/ha, rationally fertilized with nitrogen, phosphorus, potassium and micronutrients to alleviate any nutritional limitations. Harvest occurs every six to seven years, and very little tree mortality (under 7% from original plantation) is noticed. The annual productivity of the plantations was on average 40 m3/ha/year, with 80% of the stands being between 30 and 50 m3/ha/year, and some stands could reach values as high as 60 m3/ha/year. At harvest time, the stand volume is between 180 and 300 m3/ha, with a dominant height

Fig. 1. (a) Location of the four study sites. (b) Example of GEDI tracks over some stands. (c) Eucalyptus stand during harvest (approx. 30 m high) illustrating
the clearly separated crown and trunk strata.

in the 20 to 35 m range (for 80% of the stands). These plantations were managed locally by stand units, generally around 50 ha, where the same management is applied: planting, harvesting, weed control, genetic material, soil preparation and fertilization. There are generally sparse understory and herbaceous strata in these plantations, as a result of chemical weeding in the first year, the closing of the canopy, and the high competitive strength of Eucalyptus. Tree height is very homogeneous within a stand, with 95% of the trees having heights within +/- 10.5% of the average tree height in plot inventories. The plantations exhibit a simple structure, with a tree crown stratum of 3 to 10 m in width above a “trunk stratum” with few Eucalyptus leaves and few understories [see Fig. 1(c)]. The “soil stratum” is mainly constituted of litter accumulation of branches and leaves, with some patches of herbaceous species.

B. GEDI Data

1) Processing of GEDI Waveforms: GEDI uses three onboard lasers that produce eight parallel tracks of observations. GEDI lasers illuminate a surface or footprint on the ground with a 25 m diameter, at a frequency of 242 Hz, over which 3D structures are measured. The footprints are separated by ∼60 m (center to center) along the track, and the tracks are separated by ∼600 m across. Moreover, GEDI has the ability to rotate up to six degrees, allowing the lasers to be pointed as much as 40 km on either side of the ISS’s ground track [22]. GEDI measures vertical structures using a 1064-nm laser pulse, and the echoed waveforms are digitized to a maximum of 1246 bins with a vertical resolution of 1 ns (15 cm), corresponding to a maximum of 186.9 m of height ranges, with a vertical accuracy over relatively flat, non-vegetated surfaces of ∼3 cm [31].

As described in the algorithm theoretical basis document (ATBD) [32], [33], the received waveforms are first smoothed to reduce the noise in the signal, thus permitting the determination of the useful part of the waveform within the corresponding footprint. Waveform smoothing is performed by means of a Gaussian filter with various widths. As mentioned in the ATBD, a width of 6.5 ns is currently used for the Gaussian filter (Smooth width). After smoothing, two locations in the waveform denoted as search start and search end are determined [see Fig. 2(a)]. Search start and search end are, respectively, the first and last positions in the signal where the signal intensity is above the following threshold:

$$\text{threshold} = \text{mean} + \sigma \cdot v \qquad (1)$$

where “mean” is the mean noise level, “σ” is the standard deviation of noise of the smoothed waveform, and “v” is a constant currently set at 4. After determining the locations of search start and search end, the region between them, denoted as the waveform extent, is extended by a predetermined number of sample bins, currently set to 100 bins at both sides. Within the waveform extent, the highest (toploc) and lowest (botloc) detectable returns are determined [see Fig. 2(a)]. The metrics toploc and botloc respectively represent the highest and lowest locations within the waveform extent where two adjacent

Fig. 2. (a) Example of an acquired GEDI waveform (Rw) over a Eucalyptus stand (Hdom = 25.9 m; V = 230.7 m3.ha−1), its smoothing (Sw) and corresponding waveform metrics. (b) Cumulative energy of the waveform (CE) between botloc and toploc and the corresponding relative heights (RHn) at different percentages “n” for the same waveform. One (1) ns corresponds to 15 cm sampling distance in the waveform. The waveform amplitudes are counts from the analog to digital converter on the instrument.

intensities are above a threshold. The threshold equation used to determine toploc and botloc is the same as (1), with “v” an integer fixed at 2, 3, 4, or 6 (depending on the processing configuration). In the ATBD, the value of “v” used to determine toploc is named “front_threshold”, and the value used for botloc is named “back_threshold”. Waveform metric values are extracted using thresholds on Smoothwidth_zcross, front_threshold, and back_threshold. Currently, there are six configurations (henceforth referred to as algorithms) of different thresholds on these variables, which are used to determine waveform metrics with high precision for a variety of acquisition scenarios (see Table I). Finally, the location of the distinctive peaks or modes in the waveform, such as the ground peak or top-of-canopy peaks, is determined using a second Gaussian filtering of the waveform section between toploc and botloc, and then finding all the zero crossings of the first derivative of the filtered waveform [see Fig. 2(a)]. The width of the second Gaussian filter (“Smoothwidth_zcross”) is fixed to either 3.5 or 6.5 ns (based on the algorithm used). Finally, the position of the ground return within the waveform is determined using the position of the last detected mode. The six different algorithms, noted a1 to a6, correspond to different values of the above-mentioned parameters (see Table I) and lead to different estimates of the waveform metrics, and could in turn lead to six different canopy height estimates. Over forest stands, the recorded waveforms are multimodal in shape, with each mode representing a reflection from a distinct surface height. Fig. 2(a) shows a typical waveform over a Eucalyptus forest stand on relatively flat terrain. Over flat terrain, the first Gaussian corresponds to a reflection from the top of the canopy while the last Gaussian mostly refers to the lowest point in the footprint, i.e., the ground surface.

GEDI data used in this article have already been processed and published by the land processes distributed active archive center (LP DAAC). Currently, three products (L1B, L2A, and L2B) are available for download. The L1B data product [32] contains detailed information about the transmitted and received

TABLE I
DIFFERENT THRESHOLDS USED IN EACH OF THE SIX ALGORITHMS FOR THE ANALYSIS OF THE RECEIVED WAVEFORMS

waveforms, the location and elevation of each waveform footprint and other ancillary information, such as mean and standard deviation of the noise and acquisition time. The L2A product [33] contains data of elevation and height metrics of the vertical structures within the waveform. These height metrics are issued from the processing of the received waveforms from the L1B product. Finally, the L2B data product [34] provides footprint-level vegetation metrics, such as canopy cover, vertical profile metrics, plant area index and foliage height diversity.

Fig. 3. Allometric relation between in situ V and Hdom.

In this article, the received waveforms, their geolocation (longitude and latitude), as well as their acquisition times were extracted from the L1B data product. In the L2A data product, the derived metrics are also grouped by algorithm. Therefore, for each beam, the metrics derived from each of the six algorithms, as well as the parameters used for each algorithm, are available. We thus extracted from L2A, for each beam and for each of the six algorithms (a1 through a6), the following variables.
1) The position within the waveform of toploc and botloc.
2) The amplitude of the smoothed waveform’s lowest detected mode (zcross_amp).
3) The quality flag for each waveform (quality_flag).
4) The number of detected modes (num_detectedmodes).
5) The position and amplitude of each detected mode.
6) The relative height metrics at 10% intervals from botloc (0%) to toploc (100%) (RHn, 10% ≤ n ≤ 100%, step 10%). RHn represents the height between botloc and the location at n% of cumulative energy [see Fig. 2(b)].
No metrics were extracted from the L2B product, as they were not relevant to this article.

2) Calculation of Relevant GEDI and Terrain Variables: Several linear and nonlinear regression models will be tested in order to estimate the stand dominant height Hdom (m) and stand merchantable wood volume V (m3.ha−1) from GEDI data. The models were tested with a priori variables that were extracted from GEDI waveforms. These variables represent canopy features, such as canopy top, canopy trunks, ground, or a mix of these elements. In addition to the available GEDI waveform metrics described in the previous section, several additional metrics were also extracted. The first is the waveform extent (Wext), which is the height difference between botloc and toploc. Next, to remove the effects of canopy height variability and terrain slope, two indices relying on waveform structure were determined. The leading edge extent (Leadext) is, as defined by Hilbert and Schmullius [35], the difference between the position of the first mode (Vloc) and toploc, while the trailing edge extent (Trailext) is the difference between botloc and the ground return (Gloc) [35]. Two methods will be used to determine the ground position: the position of the mode selected as the lowest non-noise mode from the L2A data product, and the position of the stronger (highest-amplitude) of the last two detected modes. The viewing angle of GEDI, which represents the angle between the looking direction of the instrument and nadir at acquisition time, has also been calculated for each shot using the geolocation of the GEDI instrument available from the L1B data product. The viewing angle has been demonstrated in Urban et al. [36] to increase elevation errors for ICESat-1 GLAS when the viewing angle deviates from nadir due to precision attitude determination.

Finally, as the wood volume (V) increases with canopy height in a nonlinear shape (see Fig. 3), we calculated RHn for several power values (RHn^p, 1 < p ≤ 3, step 0.2).

All the variables used for the estimation of the stand dominant height Hdom (m) and stand merchantable wood volume V (m3.ha−1) are given in Table II.

The values of the extracted waveform metrics vary with the algorithm used for the processing of the waveforms. Therefore, in this article, all GEDI metrics were determined for the six available algorithms. The variability of the metric values based on the processing algorithm is given in Table III.

3) Filtering of GEDI Waveforms: Not all GEDI acquisitions are viable, as atmospheric conditions (e.g., clouds) can affect them. Therefore, a waveform was not investigated further if it met any of the following criteria.
1) Waveforms with reported elevations that are significantly higher or lower than the corresponding elevations from the SRTM DEM [16]. In essence, we removed all waveforms where the absolute difference is higher than 100 m.

TABLE II
LIST OF ALL THE VARIABLES CALCULATED FROM GEDI WAVEFORMS

Variables to be used as predictor variables in the canopy height and wood volume estimation models are highlighted in gray.

TABLE III
MEAN AND STANDARD DEVIATION OF SOME GEDI METRIC VALUES FROM EACH OF THE SIX PROCESSING ALGORITHMS
USING ALL GEDI SHOTS OVER THE 566 SELECTED EUCALYPTUS PLANTATIONS

2) Waveforms with a difference between waveform extent (Wext) and (Gloc–Vloc) higher than 400 bins (corresponding to 60 m).

A total of 6166 footprints were acquired over our reference stands between April 2019 and September 2019, with the majority of these footprints (92.15%) providing exploitable waveforms. Table IV gives the distribution of GEDI shots across the four regions.

TABLE IV
DISTRIBUTION OF GEDI SHOTS ACROSS THE FOUR STUDY REGIONS

in situ Hdom and in situ V represent the 95th percentile of in situ values for each site.

GEDI data accessible through NASA’s LP DAAC contain a quality flag (quality_flag) for each acquired waveform. A waveform with a quality flag set to “1” indicates that the waveform meets certain criteria based on energy, sensitivity, amplitude, and real-time surface tracking quality, and thus can be processed further. However, in this article, waveforms with either value of the quality_flag were analyzed.

C. Inventory Measurements

A total of 566 Eucalyptus stands were selected, corresponding to stands where GEDI footprints acquired between April 20, 2019 and September 4, 2019 were totally included. An additional 50 m internal buffer strip from the stand borders was used to account for any footprint geolocation errors and to avoid footprints that match the boundary between the stand of interest and the surrounding medium. These 566 Eucalyptus stands were also selected because they had field inventories performed by the company close to GEDI’s acquisition date (time difference of less than two months). Field inventories are performed on several permanent inventory plots within each stand. These inventory plots are systematically distributed throughout the stand with a density of one plot per 10 ha (i.e., a 20 ha stand will have two inventory plots while an 80 ha stand will contain eight inventory plots). These permanent inventory plots each had an area of approximately 400 m2 including 30 to 100 trees (average of 58

trees) depending on the inventory plot size and planting density. During a field inventory, the diameter at breast height (DBH, 1.3 m above the ground) of each tree in the inventory plot, the height of a central subsample of 10 trees and the height of the 10% largest trees in terms of DBH (dominant trees) were measured. The mean height of the 10% of the largest trees defined the dominant height of the plot (Hdom), while the mean height of all trees in the plot (measured + estimated) defined the average height of the plot (Hmean). Hdom, basal area and age on the inventory date were then used in local volume equations to estimate the plot total and merchantable volume (merchantable volume is the tree volume up to the diameter outside bark of 6 cm). Stem biomass was then estimated from the stem volume using age-dependent estimates of wood biomass density.

As the dates of the inventory measurements were different from GEDI acquisition dates, only data with a difference of less than two months between the GEDI acquisition and inventory dates were used. In fact, on these fast-growing plantations, a two-month difference could result in a growth of up to 50 cm in Hdom [see Fig. 4(a)] and 10 m3.ha−1 in V [see Fig. 4(b)]. However, this reasonable compromise allows keeping a large number of stands including a large variability of age and growing conditions. Fig. 5 shows the distribution of field-measured Hdom and volume.

Fig. 4. Relationship between stand age and (a) Hdom (m) and (b) Volume (m3.ha−1).

Fig. 5. Distribution of measurements of (a) dominant canopy heights and (b) wood volume from field inventories of the 566 Eucalyptus stands. The Y-axis represents the percentage of samples within each height range (a) and wood volume (b).

D. Digital Elevation Model

The DEM with a spatial resolution of 30 m, derived from the Shuttle Radar Topography Mission (SRTM), was used in this article. Three variables were derived from the DEM: slope (S), Terrain Index (TI), and surface Roughness (Roug). The TI map was obtained by calculating the difference between the highest and lowest altitude in a 3 × 3 pixel-moving window. The surface roughness map was obtained by computing the standard deviation of the elevation in a 3 × 3 pixel-moving window.

III. METHODOLOGY

A. Forest Height Estimation

The simplest method to estimate Hdom from a GEDI waveform over forest stands with a gently sloping terrain is the height difference between signal start (toploc) and the ground position (Gloc) [37]

$$H_{dom} = toploc - Gloc. \qquad (2)$$

For the previous ICESat-1 GLAS waveforms, the ground return was assumed to be the stronger of the last two detected modes [15]. Therefore, in this article, the stronger of the last two detected modes, as well as the position of the mode in the field “selected_mode (SM)” from the L2A data product, will each be considered separately as the ground return.

Estimating canopy heights using (2) has several caveats. For example, over sloping terrain, the ground peak becomes wider, and the returns from ground and vegetation can be mixed in the case of large footprints, making the identification of the ground peak return difficult and the estimation of forest height inaccurate ([5], [37]). To remove or minimize the terrain slope effect on the waveforms, as well as the vegetation variability,

statistical approaches have been developed and used in several studies to predict canopy heights from GLAS data (e.g., [5], [15], [35], [38], [39]). These approaches proposed regression models based either on waveform metrics only or on both waveform metrics and terrain information derived from DEMs.
The first statistical model was developed by Lefsky et al. [5] to estimate the maximum canopy height (Hdom) from GLAS waveforms

$$H_{dom} = aW_{ext} - bTI \qquad (3)$$

The coefficients a and b are fitted using least squares regression (Hdom is given by inventory measurements, Wext is derived from the GEDI waveform, and TI is calculated from the SRTM DEM, see Section II-D). For our dataset, TI values calculated from the SRTM DEM ranged from 1 to 46 m. The incorporation by Payn et al. [6] of the waveform leading edge extent in (4) showed a slight improvement on canopy height estimation

$$H_{dom} = aW_{ext} - bTI + cLead_{ext} \qquad (4)$$

Over sloping terrain, Lefsky et al. [38] observed that the waveform extent is insufficient for estimating canopy heights. Hence, a new model based on the waveform extent, leading edge extent, and trailing edge extent was proposed. However, Pang et al. [39] observed inaccurate estimates of canopy heights with the improved model by Lefsky et al. [38], especially for small waveform extents, and thus proposed a simpler model to estimate canopy heights using the following equation:

$$H_{dom} = aW_{ext} - \{b\,(Lead_{ext} + Trail_{ext})\}^{c} \qquad (5)$$

The nonlinear model by Pang et al. [39] was further simplified by Chen [15]

$$H_{dom} = aW_{ext} - b\,(Lead_{ext} + Trail_{ext}) \qquad (6)$$

Baghdadi et al. [16] tested additional models for the estimation of canopy heights using ICESat-1 GLAS waveforms, two of which will be tested in this article. The first model uses Trailext and TI

$$H_{dom} = aW_{ext} - bTI + cTrail_{ext} \qquad (7)$$

The second model uses exclusively GEDI metrics

$$H_{dom} = aW_{ext} - bLead_{ext} - cTrail_{ext} + d \qquad (8)$$

In addition to the previously described models, a stepwise multilinear regression (SRH) was used to estimate canopy dominant heights while automatically choosing a set of predictive explanatory variables among all the variables presented in Table II. The choice of adding or removing a variable from the SRH model is based on the increase or decrease of the mean squared error (MSE).
We also estimated canopy dominant heights through nonlinear nonparametric regressions by means of a random forest regressor (RFH). Random forests are ensemble machine learning algorithms used for classification or regression that fit a number of decision trees on various sub-samples of the dataset and use averaging to improve the predictive accuracy and control overfitting [40]. Compared to linear models, RF has the advantage of also being able to model nonlinear relationships (threshold effect) between the explanatory variables. For this article, the number of trees in the RF was set to 100 (higher tree count slightly increased model accuracy), with a tree depth equal to the square root of the number of available factors.
Finally, since random forests are nonlinear and nonparametric, we only used the original relative heights without modification (i.e., RH_n^1, 10% ≤ n ≤ 100%, step 10%).

B. Wood Volume Estimation
The estimation of aboveground biomass has been proven to be successful using ICESat-1 GLAS waveforms, as demonstrated by several studies ([15]–[17]). In this article, four models were tested to estimate wood volume from GEDI waveforms based on Hdom estimates. The first model was adapted from Lefsky et al. [5] for the estimation of wood volume (instead of AGB in its original formulation), using the squared dominant canopy height (Hdom)

$$V = a + bH_{dom}^{2} \qquad (9)$$

The second tested model was adapted from Saatchi et al. [41], and uses a power law relationship between the volume and Lorey’s height

$$V = aH_{L}^{b} \qquad (10)$$

where HL is Lorey’s height, which weighs the contribution of trees (all trees >10 cm in diameter) to the stand height by their basal area. In this article, the relationship defined in (10) was used by replacing Lorey’s height with the dominant height, as both height values were similar (HL was lower than Hdom by a maximum of 0.9 m at the end of the rotation of the Eucalyptus plantation) [16]. For both models [(9) and (10)], the coefficients a and b were first fitted using in situ measurements of dominant height and wood volume (see Fig. 6), and then the calibrated equations were used to estimate wood volume using the dominant height predicted from GEDI footprints (best model from Section III-A).
Similarly to Section III-A, a stepwise linear regression model (SRV) and a random forest regressor (RFV) were used to estimate the wood volume.

C. Model Assessment
To assess how the tested models generalize to an independent data set, a five-fold cross validation was used. Large k-fold values mean less bias towards overestimating the true expected error (as training folds will be closer to the total dataset). Moreover, since there are several GEDI footprints inside each stand, and the stands are very homogeneous, the five-fold splitting was also done along the stands in order to reduce fitting bias. In essence, all GEDI footprints inside the same stand were used either for training or for validation, never for both. Finally, the models’ performance was assessed using the coefficient of determination (R²), the bias (measured − estimated), the root-mean-square error (RMSE), the root mean squared percentage error (RMSPE), and the Akaike information criterion (AIC). R², RMSE, and RMSPE

Fig. 6. Comparison of measured vs. estimated Hdom from the models presented in Section III-A using GEDI metrics extracted with algorithm a1 (see Table I).
RMSE is expressed in meters (m).

are defined as follows:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}} \qquad (11)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}} \qquad (12)$$

$$RMSPE = 100 \cdot \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{y_i} \right)^{2}} \qquad (13)$$

where y_i is the observed value, ŷ_i is the estimated value, ȳ is the mean of all the observed values, and n is the sample size.
The AIC proposed by Akaike [42] is a measurement of the relative goodness of fit of a statistical model to the true values. By calculating AIC values for each model, the best-performing model, i.e., the one with the lowest AIC value, can be identified.

IV. RESULTS

A. Canopy Height Estimation
We start our model performance analysis using GEDI metrics extracted from algorithm a1 (see Table I), and the ground location as determined from the SM field from the L2A dataset (last detected nonnoise mode). The estimation of the canopy dominant heights (Hdom) using the linear regression models [(3) through (8)] with five-fold cross validation shows an accuracy (RMSE) between 1.70 and 2.31 m with a coefficient of determination (R²) between 0.80 and 0.89 (see Fig. 6). Moreover, the contribution of the trailing edge extent appeared to be higher than that of the leading edge extent [see (7) versus (4), Table V]. However, the best model among (3) through (8) was (8) (RMSE = 1.70 m and R² = 0.89), which uses both leading and trailing edge extents, with an independent coefficient fitted for each variable. The introduction of terrain information in the linear regression models did not show any significant improvement in the accuracy of the estimations.
The stepwise linear regression model (see Fig. 6, SRH) showed slightly better accuracy for the estimation of canopy heights (RMSE = 1.44 m, R² = 0.93) in comparison to (8). However, unlike (8), which relied on Wext, Leadext, and Trailext, the most contributing variables for the estimation of the canopy heights using the SRH model were RH90, followed by RH10, RH80, and RH100. Meanwhile, the other metrics (e.g., Leadext, Trailext, TI) were not necessary.
Furthermore, the estimation of canopy heights using only RH90 (by linear fitting) showed an RMSE of 1.63 m with an R² of 0.90, and this accuracy could be improved to an RMSE of 1.5 m (R² of 0.91) by only adding RH_10^1.8. The estimation of

TABLE V
MODELS’ PERFORMANCE AND THE FITTED LINEAR EQUATIONS FOR ESTIMATING EUCALYPTUS STAND DOMINANT HEIGHTS

The variables are described in Section II-B-2, with the models described in Section III-A

Fig. 7. Classification of the variable importance by decreasing order of importance in the RFH model for stand dominant height estimation. The importance is measured via the average percentage increase of MSE (%IncMSE) over 50 repetitions. The red bars indicate the standard deviation of %IncMSE.

TABLE VI
ACCURACY (RMSE IN M) OF THE MODELS PRESENTED IN SECTION III-A FOR THE ESTIMATION OF Hdom USING GEDI METRIC VALUES
EXTRACTED USING THE SIX DIFFERENT ALGORITHMS (A1 THROUGH A6)

Hdom using the random forest regressor (RFH, Fig. 6) with the GEDI metrics in Table II (p in RH_n^p was set to 1 for RFH) as the independent (predictor) variables showed an accuracy on the canopy height estimates similar to that of the SRH model.
The variable importance test of the metrics (see Fig. 7) used in RFH showed that the most contributing factors for the estimation of GEDI canopy heights are a combination of RH90, RH100, and, to a lesser extent, RH80. These results show that in a low relief area the use of other metrics in addition to RH90 only slightly improved the precision of the estimation of canopy heights.
The estimation of Hdom using the models described previously with GEDI metrics extracted from the five remaining algorithms (a2 to a6, see Table I) has also been tested. The results presented in Table VI show that for the linear regression models [(3) through (8)], canopy height estimation was worse with the metrics from algorithms a2 through a6 than with the metrics from algorithm a1, with an RMSE on the canopy height estimates ranging from 2.40 m (R² of 0.78, a3) to 5.06 m (R² of 0.02, a5). This is to be expected given the low terrain relief in our study area (mean slope of 4.7 ± 3%).
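The %IncMSE ranking reported for RFH (Fig. 7) rests on a permutation idea: shuffle one predictor, predict again, and record by how much the MSE grows. The sketch below illustrates that idea only and is not the authors' implementation. It assumes a hypothetical fixed predictor `toy_predict` instead of a random forest, uses in-sample data instead of out-of-bag samples, and runs on made-up metric values; all function and variable names are illustrative.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pct_inc_mse(predict, rows, y, column, repetitions=50, seed=0):
    """Average percentage increase of MSE after shuffling one predictor."""
    rng = random.Random(seed)
    base = mse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(repetitions):
        values = [r[column] for r in rows]
        rng.shuffle(values)
        # Copy each row with the shuffled value substituted for `column`.
        permuted = [{**r, column: v} for r, v in zip(rows, values)]
        perm = mse(y, [predict(r) for r in permuted])
        increases.append(100.0 * (perm - base) / base)
    return sum(increases) / repetitions

# Made-up footprints: heights depend on rh90 only, plus a +/-0.5 m residual.
rows = [{"rh90": 10.0 + 2.0 * i, "rh10": float(i % 5)} for i in range(20)]
y = [1.1 * r["rh90"] + (0.5 if i % 2 else -0.5) for i, r in enumerate(rows)]

def toy_predict(row):
    """Hypothetical stand-in for a fitted model (not a random forest)."""
    return 1.1 * row["rh90"]

importance = {c: pct_inc_mse(toy_predict, rows, y, c) for c in ("rh90", "rh10")}
```

A predictor the model never uses (here `rh10`) gets zero importance, while shuffling the predictor that drives the estimate (`rh90`) inflates the MSE, which is the pattern one would read off a plot such as Fig. 7.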

Fig. 8. Comparison between Measured Hdom and estimated Hdom using only Wext values (Hdom = α.Wext + β) from the six algorithms (a1 through a6).
RMSE is expressed in meters (m).
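The single-predictor comparison in Fig. 8 amounts to fitting Hdom = α.Wext + β by least squares and scoring the fit with (11) and (12). A minimal sketch follows, assuming made-up waveform extents and heights (not values from the study) and scoring in-sample rather than with the five-fold cross validation used in the article; the function names are illustrative.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = alpha * x + beta (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    return alpha, my - alpha * mx

def rmse(obs, est):
    """Root-mean-square error, as in (12)."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def r_squared(obs, est):
    """Coefficient of determination, as in (11)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Made-up waveform extents (m) and in situ dominant heights (m).
w_ext = [12.0, 18.0, 24.0, 30.0, 36.0]
h_dom = [9.5, 14.0, 19.5, 24.0, 29.5]

alpha, beta = fit_line(w_ext, h_dom)
h_est = [alpha * w + beta for w in w_ext]
fit_rmse = rmse(h_dom, h_est)
fit_r2 = r_squared(h_dom, h_est)
```

Repeating the same fit with Wext taken from each algorithm's metrics would give one RMSE and R² pair per algorithm, which is the kind of comparison the figure reports.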

The low accuracy obtained with algorithm a5 is due to its low front and back thresholds (3σ and 2σ, Table I), which result in larger waveform extents. This is evident when trying to estimate canopy heights based solely on the waveform extent (Hdom/insitu = α.Wext + β), with the results in Fig. 8 showing that the metrics extracted using algorithm a5, especially the waveform extent (Wext), were the least correlated to Hdom, with an RMSE of 4.38 m (R² of 0.26).
In contrast to the linear regression models, canopy height estimation using SRH or RFH with metrics from algorithms a2, a3, a4, and a6 showed accuracies similar to those obtained with algorithm a1 (see Table VI). Algorithm a5, however, was slightly less accurate, with an RMSE of 1.6 m (R² of 0.90) and 1.80 m (R² of 0.88) when using SRH and RFH, respectively.
Finally, the effect of the method used to select the ground return has been studied. The results presented thus far have been based on detecting the ground mode from the SM provided in the L2A data product. SM detects the ground return as being the lowest nonnoisy mode, which usually refers to the last detected mode. Previous studies that used GLAS waveforms suggested that the mode with the higher amplitude between the last two modes (HL2M) is a better indicator of the ground return [15], [16]. In this article, Hdom estimation was also tested using the same models described in Section III-A, with the metrics calculated relative to the mode with the higher amplitude between the last two modes as the ground return. The results in Table VII show that the models relying mostly on the relative canopy heights (RHn), such as SRH, or on the trailing edge extent fitted separately [(7) and (8)] had an increase in accuracy on the canopy height estimates of between 12 and 20 cm.

TABLE VII
DIFFERENCE IN ACCURACY (RMSE IN M) ON Hdom BASED ON THE CHOICE OF THE SELECTED GROUND MODE FOR THE DIFFERENT MODELS DESCRIBED IN SECTION III-A, AND METRICS EXTRACTED USING ALGORITHM A1
SM = ground mode from SM (last detected mode) provided in the L2A data product. HL2M = ground mode corresponding to the higher amplitude between the last two modes.

B. Wood Volume Estimation
Four models were used to estimate the stand volume V: two power functions as presented in Section III-B [(9) and (10)], a stepwise multilinear regression model (SRV), and a random forest based model (RFV). The coefficients of the models as presented in (9) and (10) were fitted using in situ V as the estimated variable and in situ Hdom as the predictor on the 566 studied stands. Next, to estimate V, the fitted models in (9) and (10) used the estimated Hdom values from SRH with GEDI metrics extracted using algorithm a1, while SRV and RFV used the GEDI waveform metrics from Table II extracted with algorithm a1 (p in RH_n^p was set to 1 for RFV). The estimation results of stand volume V (see Fig. 9, Table VIII) show that the four tested models produced similar accuracies on the estimation of V, with an RMSE between 24.39 and 27.45 m3.ha−1 and a coefficient of determination (R²) between 0.87 and 0.90. Moreover, the results in Fig. 9 also show that the estimations of V were close to the 1:1 line for all values of V between 0 and 250 m3.ha−1, while they underestimated the volume for V values higher than 250 m3.ha−1. For

TABLE VIII
MODELS’ PERFORMANCE AND THE FITTED LINEAR EQUATIONS FOR ESTIMATING EUCALYPTUS STAND WOOD VOLUME (V)

The variables are described in section II.B.2, with the models described in section III.B

TABLE IX
ACCURACY (RMSE IN M3 .HA−1 ) OF THE MODELS PRESENTED IN SECTION III-B FOR THE ESTIMATION OF V USING GEDI
METRIC VALUES EXTRACTED USING THE SIX DIFFERENT ALGORITHMS (A1 THROUGH A6)

the four models, the relative RMSE increased from ∼18% for V less than 250 m3.ha−1 to ∼40% for V higher than 250 m3.ha−1. The bias (mean difference of in situ V and estimated V) was also more apparent for V higher than 250 m3.ha−1, and increased from 2.2 m3.ha−1 (average bias from all models) for V less than 250 m3.ha−1 to 26.5 m3.ha−1 for V higher than 250 m3.ha−1.
The variable importance test of the GEDI metrics (see Fig. 10) showed that the three most contributing factors on the estimation of V using the random forest regressor (RFV) were the same as those for the estimation of canopy heights, with the highest contributor being RH90, followed by RH80 and RH100.

Fig. 9. Comparison of measured vs. estimated wood volume from the models presented in Section III-B using GEDI metrics extracted with algorithm a1. RMSE is expressed in m3.ha−1.

The estimation of V was also tested using GEDI metric values extracted using the remaining five algorithms (a2 through a6). The results presented in Table IX show that the estimates of V were mostly similar with (9) and (10) across all algorithms. On the other hand, SRV and RFV show less accurate estimates of V using GEDI metrics from algorithms a2 through a6 compared to a1.
Finally, the choice of the method for detecting the ground return was also studied. The results, unlike those obtained when estimating Hdom, did not show any significant difference between considering the last detected mode and the stronger of the last two modes.

V. DISCUSSION
The different models tested in this article showed that GEDI waveform metrics could be used to obtain good accuracies on canopy heights and wood volumes, with an RMSPE of 7.1% on canopy height estimation and 20.4% on wood volume estimation. Moreover, GEDI waveforms appear to be of high quality given the very little variability on the estimation of Hdom and V from the individual footprints within a given stand. Indeed, the accuracy (RMSE) on the estimation of Hdom using the mean estimates from SRH of the individual footprints was 1.33 m (R² of 0.93) versus 1.32 m (R² of 0.93) when averaging Hdom estimates from GEDI for each stand. Similarly, the accuracy on the estimates of V using the mean estimates from SRV was 24.39 m3.ha−1 (R² of 0.90) and 23.93 m3.ha−1 (R² of 0.91) for the average of V from GEDI over each stand.
The most important GEDI variable for the estimation of Hdom and V is RH90, which explained more than 87% and 84% of the variability of Hdom and V, respectively. Some of the remaining variability is explained by different GEDI metrics based on the

Fig. 10. Classification of the variable importance by decreasing order of importance in the RFV model for stand wood volume estimation. The importance is measured via the average percentage increase of MSE (%IncMSE) over 50 repetitions. The red bars indicate the standard deviation of %IncMSE.

model used. In the case of the stepwise linear regression models, RH10, RH80, and RH100 are used for the estimation of Hdom and RH10, RH30, RH80, and RH20 for the estimation of V.
The success of the models in estimating Hdom and V with good accuracy from GEDI metrics relies heavily on the accuracy of extracting such metrics from the raw waveform data, and on the precision of the field measurements. GEDI datasets provide metrics issued from six algorithms, and the same metric can differ in value from one algorithm to another. This was evident from our results, where the accuracy (RMSE) in estimating both Hdom and V was slightly lower with GEDI metrics generated from algorithms a2 through a6 than with a1. Therefore, the choice of the algorithm from which the metrics are calculated is important. Nonetheless, GEDI L2A datasets already provide a field for each footprint called “selected_algorithm” that recognizes the algorithm selected as identifying the lowest mode (last detected mode) with less noise. This field could be used as an indicator of the algorithm that provides the most accurate metrics. Indeed, for our dataset, the best accuracy on both Hdom and V was observed with metrics extracted using algorithm a1, which was also the algorithm suggested by the “selected_algorithm” field for more than 99% of the studied footprints.
Another variable that also affects the accuracy of the extracted metrics, such as RHn, Trailext, or Leadext, is the location of the ground return. In this article, the ground return was identified either by the SM field in the L2A dataset, or by identifying the mode with the higher amplitude between the last two detected modes. Our results, which were obtained over study sites with mostly homogeneous canopy cover and flat terrain, indicated that the second method improved the estimation accuracy (RMSE) on Hdom by up to 20 cm. On the other hand, for the estimation of V, the choice of ground return did not have any effect on the accuracy. However, for densely vegetated areas such as tropical forests, where the ground return is not easily identifiable, choosing the stronger of the last two modes as the ground return should provide better accuracies on Hdom and V [15], [16].
Some of the uncertainties on the estimation of Hdom and V can be attributed to biophysical properties of the canopy that could not be quantified using GEDI data alone, or to instrumental factors. To understand the effect of these biophysical properties, additional parameters could therefore be required.

Fig. 11. The effects of adding stand age (in situ information) as a predictor variable on the estimation of (a) stand dominant heights Hdom (SRHa, m), and (b) stand volume V (SRVa, m3.ha−1) in a stepwise regression model.

Moreover, since GEDI data were acquired up to two months before or after in situ measurements, stand growth that happened during this time gap added to the uncertainty of our estimations (up to 50 cm difference). Canopy age plays an important role in understanding the variability of both Hdom and V. Fig. 4(a) and (b) show that both Hdom and V are a log function of stand age. In fact, adding the log transform of stand age to the stepwise linear regression models enhances the accuracy (RMSE) on Hdom by ∼13% (RMSE of 1.33 m without age and 1.18 m with age) and the volume estimation by ∼7% (24.39 versus 22.77 m3.ha−1 with age) (see Fig. 11). Moreover, the addition of the age of canopies to the SRH and SRV models also helps reduce the difference between estimates and measurements for some outlier points (see Fig. 11). The interest of using the stand age has also been demonstrated by Le Maire et al. [43] in their study over Eucalyptus plantations with MODIS optical data. Other properties of the canopy, such as crown dimension, leaf area index, distribution of leaf angles, and tree gaps due to mortality, could also add variability to GEDI waveforms. Therefore, by adequately filtering the stands, one could obtain more accurate estimates of Hdom and V. Conversely, the results presented in this article are specific to Eucalyptus plantations, since most sources of canopy variations are included.
The instrumental factors affecting the estimation accuracy include the viewing angle of GEDI at acquisition time. In fact, GEDI acquires data along eight beams with varying viewing angles (VA), which could affect the estimation accuracy. In this article, the analysis of estimation accuracy on both Hdom and V according to the acquiring beam (see Table X) shows that

the most accurate data were from beams 8, 3 and 4, while the least accurate Hdom and V estimates were obtained using data from beam 5. Indeed, the difference between the most accurate estimates (beam 3) and the least accurate (beam 5) is 46 cm for Hdom and 9 m3.ha−1 for V. A preliminary analysis of VA values shows that beam 5 acquisitions had on average higher VA values.

TABLE X
COMPARISON OF THE ESTIMATION ACCURACY ON Hdom AND V BASED ON THE ACQUIRING BEAM
Estimation results are produced using stepwise linear regression models with GEDI metric values extracted using algorithm a1.

A part of the uncertainties on the estimation of Hdom and V can be attributed to the heterogeneity of the Eucalyptus stands. In fact, the present study compared GEDI acquisitions with stand-scale averaged Hdom and V obtained from 1 to 10 permanent inventory plots. These in situ measurements are therefore the result of only a few observations within the stand. The hypothesis that the stands are homogeneous enough for using stand-scale averages is sometimes challenged. Indeed, some stands have high variability between their inventory plots, which reaches more than 6 m in some cases. A more precise analysis of the intra-stand variability of GEDI waveforms could help determine which Eucalyptus stands could be compared to stand-scale values of Hdom and V.
Another part of the uncertainties stems from the generalized models used in this article to estimate Hdom and V. Indeed, we used a single model across the entire dataset, disregarding the variability of growing conditions over each study site, which leads to different canopy structures and stand-scale allometric relationships. A preliminary analysis of locally trained random forest models shows that for the estimation of Hdom, a locally trained model could slightly improve the estimation results (a maximum RMSE decrease of 16 cm was observed, Table S1). However, for the estimation of V, a locally trained model could reduce volume estimation errors by as much as 12.8 m3.ha−1 (see Table S2). The difference in accuracies on the estimation of V is most probably due to the allometric relations between Hdom and V, which could vary greatly between one region and another even for the same tree types, but further analysis is required to confirm these results.
Finally, after the filtering scheme applied to our dataset (see Section II-B-3), the remaining footprints with either value of the quality_flag (either 0 or 1) showed the same accuracy on the estimation of both Hdom and V for all the tested models.

VI. CONCLUSION
In this article, we analyzed GEDI data in order to determine its accuracy in estimating stand-scale dominant heights (Hdom) and stand volume (V) of intensively managed Eucalyptus plantations in Brazil. Hdom and V values have been estimated using the most accurate models used for estimating forest height and aboveground biomass from ICESat-1 waveforms. The GEDI waveform metrics used in the Hdom and V estimation models were extracted using algorithms provided by the Land Processes Distributed Active Archive Center (LP DAAC), in addition to already established metrics for ICESat-1 waveforms. Overall, 5517 GEDI shots over 566 Eucalyptus stands were analyzed over our study area.
For our study site, defined by flat and gently sloping terrains (average slope < 5°), six regression models, a stepwise linear regression model (SRH), and a random forest regressor (RFH) using GEDI waveform metrics were assessed on their accuracy to estimate Hdom. Results showed that the most accurate model was SRH, with an RMSE on the Hdom estimates of 1.33 m (R² of 0.93) using metrics extracted with the configuration of algorithm a1 and considering the higher mode between the last two modes as the ground return. For this model, the most relevant metric for the estimation of Hdom was the 90th percentile relative height (RH90), followed by RH10, RH80, and RH100.
Stand wood volume (V) was modeled following a power law with the canopy height, a stepwise regression model (SRV), and a random forest regressor (RFV). The four tested models showed similar accuracies, with SRV being the most accurate one with an RMSE of 24.39 m3.ha−1 (relative error ∼20% of the wood volume average). Similar to SRH, the most relevant metric for the estimation of V using SRV was RH90, followed by RH10, RH30, RH80, and RH20.
The choice of the algorithm used to extract the waveform metrics sometimes affected the accuracy, as metrics extracted using algorithm a5 showed ∼16% higher RMSE on the estimation of both Hdom and V. Nonetheless, the field “selected_algorithm” from the L2A dataset provides a robust indicator of the algorithm that provides the most accurate metrics. Beyond the effect of the accuracy of the metrics on the accuracy of Hdom, selecting the higher of the last two modes as the ground return could potentially increase the accuracy on Hdom by 12 to 20 cm.
Not all the variability on Hdom and V could be explained by GEDI alone. For example, including the stand age in the stepwise regression models helps decrease the RMSE on the estimation of Hdom and V by, respectively, ∼13% and ∼7%. Nonetheless, and despite our efforts, underestimation of V using GEDI data was still observed for V greater than 250 m3.ha−1.
Finally, given the high accuracy of GEDI data on the estimation of tree heights and volume, GEDI can provide an excellent source of information to calibrate and validate upcoming and future radar missions, such as the upcoming P-band BIOMASS mission. GEDI can also supplement radar data by means of data fusion models in order to obtain high resolution and very accurate wall-to-wall maps of forest properties.
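The accuracies summarized above come from a five-fold cross validation in which the splitting is done along stands, so that all footprints of a stand fall on the same side of the split (Section III-C). A minimal sketch of such a stand-wise split follows. It is an illustration under assumptions, not the authors' code: the stand identifiers are made up, and the round-robin assignment of shuffled stands to folds is one possible choice among several.

```python
import random
from collections import defaultdict

def stand_kfold(stand_ids, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs with no stand split across both."""
    stands = sorted(set(stand_ids))
    random.Random(seed).shuffle(stands)
    # Assign whole stands to folds in round-robin order.
    fold_of = {s: i % k for i, s in enumerate(stands)}
    by_fold = defaultdict(list)
    for idx, s in enumerate(stand_ids):
        by_fold[fold_of[s]].append(idx)
    for f in range(k):
        valid = by_fold[f]
        train = [i for g in range(k) if g != f for i in by_fold[g]]
        yield train, valid

# Made-up example: 12 stands with 3 footprints each.
stand_ids = [f"stand_{j:02d}" for j in range(12) for _ in range(3)]
folds = list(stand_kfold(stand_ids, k=5))
```

Each footprint index appears in exactly one validation fold, and a model fitted on `train` is never scored on footprints from a stand it has already seen.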

AUTHOR CONTRIBUTIONS
Ibrahim Fayad: conceptualization, methodology, software, validation, formal analysis, data curation, visualization, writing - original draft. Nicolas Baghdadi: conceptualization, methodology, validation, formal analysis, data curation, writing - original draft. Clayton Alcarde Alvares: conceptualization, validation, writing - review and editing. Jose Luiz Stape: conceptualization, validation, writing - review and editing. Jean Stéphane Bailly: validation, writing - review and editing. Henrique Ferraço Scolforo: conceptualization, validation, writing - review and editing. Mehrez Zribi: validation, writing - review and editing. Guerric Le Maire: conceptualization, validation, writing - review and editing.

ACKNOWLEDGMENT
The authors would like to thank the GEDI team and the NASA LPDAAC (Land Processes Distributed Active Archive Center) for providing GEDI data. The authors acknowledge Suzano's researchers Italo Ramos Cegatta, Renan Tarenta Meirelles Brasil and Carla Foster Feria for their technical support and the CIRAD Suzano project. Suzano SA Company supported the forest-field data collection.

REFERENCES
[1] M. Main-Knorn et al., “Monitoring coniferous forest biomass change using a Landsat trajectory-based approach,” Remote Sens. Environ., vol. 139, pp. 277–290, Dec. 2013.
[2] A. Peregon and Y. Yamagata, “The use of ALOS/PALSAR backscatter to estimate above-ground forest biomass: A case study in Western Siberia,” Remote Sens. Environ., vol. 137, pp. 139–146, Oct. 2013.
[3] R. Houghton, F. Hall, and S. J. Goetz, “Importance of biomass in the global carbon cycle,” J. Geophys. Res., Biogeosci., vol. 114, no. G2, 2009.
[4] T. E. Fatoyinbo and M. Simard, “Height and biomass of mangroves in Africa from ICESat/GLAS and SRTM,” Int. J. Remote Sens., vol. 34, no. 2, pp. 668–681, Jan. 2013.
[5] M. A. Lefsky et al., “Estimates of forest canopy height and aboveground biomass using ICESat: ICESAT Estimates of Canopy Height,” Geophys. Res. Lett., vol. 32, no. 22, Nov. 2005.
[6] T. Payn et al., “Changes in planted forests and future global implications,” Forest Ecol. Manage., vol. 352, pp. 57–67, 2015.
[7] P. Elias and D. Boucher, Planting for the Future. How Demand for Wood Products Could Be Friendly to Tropical Forests. Cambridge, MA, USA: Union Concerned Scientists, 2014.
[8] R. Pirard, L. Dal Secco, and R. Warman, “Do timber plantations contribute to forest conservation?,” Environ. Sci. Policy, vol. 57, pp. 122–130, Mar. 2016.
[9] M. A. Lefsky, W. B. Cohen, G. G. Parker, and D. J. Harding, “Lidar remote sensing for ecosystem studies,” Bio. Sci., vol. 52, no. 1, pp. 19, 2002.
[10] R. Nelson, K. J. Ranson, G. Sun, D. S. Kimes, V. Kharuk, and P. Montesano, “Estimating Siberian timber volume using MODIS and ICESat/GLAS,” Remote Sens. Environ., vol. 113, no. 3, pp. 691–701, Mar. 2009.
[15] Q. Chen, “Retrieving vegetation height of forests and woodlands over mountainous areas in the pacific coast region using satellite laser altimetry,” Remote Sens. Environ., vol. 114, no. 7, pp. 1610–1627, Jul. 2010.
[16] N. Baghdadi et al., “Testing different methods of forest height and aboveground biomass estimations from ICESat/GLAS data in eucalyptus plantations in Brazil,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 7, no. 1, pp. 290–299, Jan. 2014, doi: 10.1109/JSTARS.2013.2261978.
[17] J. Boudreau, R. Nelson, H. Margolis, A. Beaudoin, L. Guindon, and D. Kimes, “Regional aboveground forest biomass using airborne and spaceborne LiDAR in Québec,” Remote Sens. Environ., vol. 112, no. 10, pp. 3876–3890, Oct. 2008.
[18] M. El Hajj, N. Baghdadi, N. Labrière, J.-S. Bailly, and L. Villard, “Mapping of aboveground biomass in Gabon,” Comptes Rendus Geosci., vol. 351, no. 4, pp. 321–331, Apr. 2019.
[19] M. R. Pourrahmati et al., “Capability of GLAS/ICESat data to estimate forest canopy height and volume in mountainous forests of Iran,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 8, no. 11, pp. 5246–5261, Nov. 2015, doi: 10.1109/JSTARS.2015.2478478.
[20] M. R. Pourrahmati et al., “Mapping Lorey’s height over Hyrcanian forests of Iran using synergy of ICESat/GLAS and optical images,” Eur. J. Remote Sens., vol. 51, no. 1, pp. 100–115, Jan. 2018.
[21] A. Neuenschwander and K. Pitts, “The ATL08 land and vegetation product for the ICESat-2 mission,” Remote Sens. Environ., vol. 221, pp. 247–259, Feb. 2019.
[22] R. Dubayah et al., “The global ecosystem dynamics investigation: High-resolution laser ranging of the earth’s forests and topography,” Sci. Remote Sens., vol. 1, Jun. 2020, Art. no. 100002.
[23] A. Persson, J. Holmgren, and U. Soderman, “Detecting and measuring individual trees using an airborne laser scanner,” Photogramm. Eng. Remote Sens., vol. 68, no. 9, pp. 925–932, 2002.
[24] S. Saarela et al., “Hierarchical model-based inference for forest inventory utilizing three sources of information,” Ann. Forest Sci., vol. 73, no. 4, pp. 895–910, Dec. 2016.
[25] S. Holm, R. Nelson, and G. Ståhl, “Hybrid three-phase estimators for large-area forest inventory using ground plots, airborne LiDAR, and space LiDAR,” Remote Sens. Environ., vol. 197, pp. 85–97, Aug. 2017.
[26] T. R. Feldpausch et al., “Tree height integrated into pantropical forest biomass estimates,” Biogeosciences, vol. 9, no. 8, pp. 3381–3403, Aug. 2012.
[27] E. Kearsley et al., “Conventional tree height–diameter relationships significantly overestimate aboveground carbon stocks in the central Congo Basin,” Nat. Commun., vol. 4, no. 1, Oct. 2013.
[28] Y. Su, Q. Ma, and Q. Guo, “Fine-resolution forest tree height estimation across the Sierra Nevada through the integration of spaceborne LiDAR, airborne LiDAR, and optical imagery,” Int. J. Digit. Earth, vol. 10, no. 3, pp. 307–323, Mar. 2017.
[29] H. Tang et al., “Deriving and validating leaf area index (LAI) at multiple spatial scales through lidar remote sensing: A case study in Sierra National Forest, CA,” Remote Sens. Environ., vol. 143, pp. 131–141, Mar. 2014.
[30] Y. Wang et al., “Is field-measured tree height as reliable as believed – A comparison study of tree height estimates from field measurement, airborne laser scanning and terrestrial laser scanning in a Boreal forest,” ISPRS J. Photogramm. Remote Sens., vol. 147, pp. 132–145, Jan. 2019.
[31] R. Dubayah et al., “The global ecosystem dynamics investigation: High-resolution laser ranging of the earth’s forests and topography,” Sci. Remote Sens., vol. 1, Jun. 2020, Art. no. 100002.
[32] S. L. R. Dubayah, “GEDI L1B geolocated waveform data global footprint level V001,” NASA EOSDIS Land Processes DAAC, 2020. Accessed: Jul. 2021. [Online]. Available: https://doi.org/10.5067/GEDI/GEDI01_B.001
[33] S. L. R. Dubayah, “GEDI L2A elevation and height metrics data global footprint level V001,” NASA EOSDIS Land Processes DAAC, 2020.
[11] R. O. Dubayah et al., “Estimation of tropical forest height and biomass
Accessed: Jul. 2021. [Online]. Avilable: https://doi.org/10.5067/GEDI/
dynamics using Lidar remote sensing at La Selva, Costa Rica: Forest GEDI02_A.001
dynamics using LiDAR,” J. Geophys. Res., vol. 115, Jun. 2010.
[34] S. L. R. Dubayah, “GEDI L2B canopy cover and vertical profile metrics
[12] B. E. Schutz, H. J. Zwally, C. A. Shuman, D. Hancock, and J. P. DiMarzio,
data global footprint level V001,” NASA EOSDIS Land Processes DAAC,
“Overview of the ICESat mission,” Geophys. Res. Lett., vol. 32, no. 21,
2020. Accessed: Jul. 2021. [Online]. Avilable: https://doi: 10.5067/GEDI/
2005, Art. no. L21S01.. GEDI02_B.001
[13] Y. Pang, M. Lefsky, G. Sun, and J. Ranson, “Impact of footprint di-
[35] C. Hilbert and C. Schmullius, “Influence of surface topography on ICE-
ameter and off-nadir pointing on the precision of canopy height esti-
Sat/GLAS forest height estimation and waveform shape,” Remote Sens.,
mates from spaceborne LiDAR,” Remote Sens. Environ., vol. 115, no. 11,
vol. 4, no. 8, pp. 2210–2235, Jul. 2012.
pp. 2798–2809, Nov. 2011. [36] T. J. Urban, B. E. Schutz, and A. L. Neuenschwander, “A survey of
[14] I. Fayad et al., “Aboveground biomass mapping in French Guiana by
ICESat coastal altimetry applications: Continental coast, open ocean is-
combining remote sensing, forest inventories and environmental data,”
land, and inland river,” Terr. Atmospheric Ocean. Sci., vol. 19, pp. 1–19,
Int. J. Appl. Earth Observ. Geoinf., vol. 52, pp. 502–514, Oct. 2016.
2008.
[37] D. J. Harding, “ICESat waveform measurements of within-footprint topographic relief and vegetation vertical structure,” Geophys. Res. Lett., vol. 32, no. 21, 2005, Art. no. L21S10.
[38] M. A. Lefsky, M. Keller, Y. Pang, P. B. De Camargo, and M. O. Hunter, “Revised method for forest canopy height estimation from geoscience laser altimeter system waveforms,” J. Appl. Remote Sens., vol. 1, no. 1, 2007, Art. no. 013537.
[39] Y. Pang, M. Lefsky, H.-E. Andersen, M. E. Miller, and K. Sherrill, “Validation of the ICEsat vegetation product using crown-area-weighted mean height derived using crown delineation with discrete return lidar data,” Can. J. Remote Sens., vol. 34, pp. S471–S484, 2008.
[40] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[41] S. S. Saatchi et al., “Benchmark map of forest carbon stocks in tropical regions across three continents,” Proc. Nat. Acad. Sci., vol. 108, no. 24, pp. 9899–9904, Jun. 2011.
[42] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Selected Papers of Hirotugu Akaike, E. Parzen, K. Tanabe, and G. Kitagawa, Eds. New York, NY, USA: Springer, 1998, pp. 199–213.
[43] G. le Maire et al., “MODIS NDVI time-series allow the monitoring of eucalyptus plantation biomass,” Remote Sens. Environ., vol. 115, no. 10, pp. 2613–2625, Oct. 2011.

Ibrahim Fayad received the Engineering degree in computer and telecommunications in 2011 and the Ph.D. degree in automatic and microelectronic systems both from the University of Montpellier, Montpellier, France in 2015.
He is currently a Research Engineer with the National Research Institute for Agriculture, Food and the Environment, Montpellier, France. His research interests include machine learning for the retrieval of environmental parameters using remote sensing data.

Nicolas N. Baghdadi received the Ph.D. degree from the University of Toulon, Toulon, France in 1994.
From 1995 to 1997, he was a Postdoctoral Researcher with INRS Ete, Water Earth Environment Research Centre, Quebec University. From 1998 to 2008, he was with the French Geological Survey, Orléans, France. Since 2008, he has been a Senior Scientist with the National Research Institute for Agriculture, Food and the Environment, Montpellier, France. He is currently a Scientific Director with the THEIA Land Data Centre.
His research activities are in the areas of microwave remote sensing, image processing, and satellite and airborne remote sensing data analysis. His main field of interest is the analysis of remote sensing data (SAR, Lidar, optical) and the retrieval of environmental parameters (e.g., soil moisture content, surface roughness, biomass, etc.).

Clayton Alcarde Alvares received the Ph.D. degree in forestry science from the University of São Paulo, São Paulo, Brazil, in 2012.
His research focused on mapping and edaphoclimatic modeling of productivity of Eucalyptus plantations in Brazil. Since 2011, he has been a Forest Scientist with Suzano SA, São Paulo, Brazil, where he has been developing in-depth analyzes in climatology, remote sensing, production environments zoning, and applied ecophysiology, and is strongly committed to delivering tools for operational uses in the short, medium and long term. Since 2019, he has been a Professor and an Advisor of the Postgraduate Program in Forest Science, State University "Júlio Mesquita Filho" (UNESP), Faculty of Agronomic Sciences, Campus de Botucatu.

Jose Luiz Stape received the Ph.D. degree in forest ecology from Colorado State University, Fort Collins, CO, USA, in 2002.
He is a Permanent Graduate Professor of Forest Ecophysiology with Sao Paulo State University (UNESP, Brazil), Sao Paulo, Brazil. He was with the University of Sao Paulo, and with the North Carolina State University and across many countries and companies, looking to improve silvicultural recommendations for the sustainability of forest plantations including: clonal deployment; site-preparation; nutrition and spacing. To better evaluate the factors limiting forest productivity and controlling C allocation, he coordinated the establishment, with other scientists, of four large Eucalyptus and Pine cooperative research programs in Brazil via IPEF (BEPP, Eucflux, TECHS and PPPIB) and a research network at Suzano company (G2M2P2). Nowadays the use of remote sensing to improve monitoring, management and modeling planted forests has been a main focus of his research.

Jean Stéphane Bailly received the Engineering degree in agronomy, the M.Sc. degree in biostatistics, and the Ph.D. degree in hydrology from the University of Montpellier, Montpellier, France, in 1990, 2003, and 2007.
He is currently a Senior Lecturer of physical geography and geostatistics in AgroParis-Tech, Paris, France. He is currently a Research Fellow with the UM LISAHLab, Montpellier, France. His research interests include spatial observations and modeling for hydrological issues.

Henrique Ferraço Scolforo received the Ph.D. degree in forest biometrics from the North Carolina State University, Raleigh, NC, USA, in 2018.
His research focused on growth and yield modeling sensitive to climate and clonal variation applied to eucalypt stands in Brazil. Since 2018, he has been leading the biometrics, inventory, growth, and yield studies with Suzano SA, São Paulo, Brazil.

Mehrez Zribi received the B.E. degree in signal processing from the Ecole Nationale Supérieure d’Ingénieurs en Constructions Aéronautiques, Toulouse, France, in 1995, and the Ph.D. degree in signal processing and remote sensing from the Université Paul Sabatier, Toulouse, France, in 1998.
He is currently a Research Director with the Centre National de Recherche Scientifique, Paris, France. In 1995, he was with the Centre d’Etude des Environnements Terrestre et Planétaires Laboratory/Institut Pierre Simon Laplace, Vélizy, France. Since October 2008, he has been with the Centre d’Etudes Spatiales de la Biosphère, Toulouse, France. He has authored or coauthored more than 140 articles in refereed journals. He is currently the Director of Centre d’Etudes Spatiales de la Biosphère, Toulouse, France. His research interests include microwave remote sensing applied to hydrology, microwave modeling for land surface parameters estimations and finally airborne microwave instrumentation.

Guerric le Maire received the M.Sc. degree in agronomy, specialized in ecology, from the National Institute of Agronomy (INA P-G, Paris, France), Paris, France, in 2002 and the Ph.D. degree in plant ecophysiology from Paris XI University, Orsay, France, in 2005.
During 2006–2007, he was a Postdoctoral Researcher with the Le Laboratoire des Sciences du Climat et de l’Environnement Laboratory (Saclay-France). He was a Researcher with the CIRAD, Montpellier, France, in 2008. His research interests include remote-sensing image processing/analysis and process-based ecophysiological forest models development applied to tropical forest plantations.
