Appendix A. Some Detail about the Methods for Estimating National Accuracy Indicators and Their Variance
In this section, error matrices are assumed to be populated with estimated area proportions unless stated otherwise. As described in Section 2.3, estimates of country-wide accuracy indicators are computed using Equations (2)–(4) after creating a nationally aggregated error matrix based on proper weighting using Equation (6). We call this the regional error matrix aggregation method. The combined ratio estimator method described in Wickham et al. [37] is also suitable for computing estimates of user’s accuracies and producer’s accuracies. Below, both methods are described with examples, and their equivalence is then explained and illustrated. Lastly, formulas for the variance of estimated national accuracy indicators, including user’s accuracy (UA), producer’s accuracy (PA), and overall accuracy (OA), are revisited.
For the error-matrix-aggregation-based method, the key is to apply proper weights to the proportions ($\hat{p}_{h,ij}$) in the regional error matrices so that the re-weighted values of $\hat{p}_{h,ij}$ are expressed relative to the national total area ($N$) instead of region $h$’s total area ($N_h$), i.e., the weight attached to map category $i$ in region $h$ becomes the ratio of the area mapped as category $i$ in region $h$ to $N$. This is the case for the term that Equation (6) sums:

$$W_h\,\hat{p}_{h,ij} = W_h\,W_{hi}\,\frac{n_{h,ij}}{n_{h,i+}} \tag{A1}$$

where $W_h$ is the areal proportion of region $h$ in the whole study area ($N_h/N$), $W_{hi}$ is the class weight of Equation (1), representing the proportion of the area mapped as category $i$ in region $h$, $n_{h,ij}$ and $n_{h,i+}$ are the sample counts as in Equation (1) (assuming the error matrix of sample counts for region $h$ is similar to Table 3), and $W_h W_{hi}$ is the combined weight representing the ratio of the area mapped as category $i$ in region $h$ to the whole study area.
Thus, all proportions (for map-reference class pairs (i, j)) in regional error matrices are re-weighted nationally as desired (Equation (A1)). These re-weighted regional error matrices were aggregated with their corresponding proportions summed up, giving rise to the national error matrix with correct (national) proportions (Equation (6)).
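The aggregation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `aggregate_error_matrices` and `accuracies` are hypothetical, and the two-class, two-region matrices and weights are made-up values rather than figures from Table 2 or Table 7.

```python
def aggregate_error_matrices(regional_matrices, region_weights):
    """Weight each regional proportion matrix by W_h and sum them cell by cell."""
    if abs(sum(region_weights) - 1.0) > 1e-9:
        raise ValueError("regional weights W_h should sum to 1")
    k = len(regional_matrices[0])
    national = [[0.0] * k for _ in range(k)]
    for matrix, w_h in zip(regional_matrices, region_weights):
        for i in range(k):
            for j in range(k):
                national[i][j] += w_h * matrix[i][j]
    return national


def accuracies(p):
    """Return (UA per class, PA per class, OA) from a proportion matrix.

    Rows are map classes and columns are reference classes; every row and
    column total is assumed to be non-zero.
    """
    k = len(p)
    row_totals = [sum(p[i]) for i in range(k)]
    col_totals = [sum(p[i][j] for i in range(k)) for j in range(k)]
    ua = [p[i][i] / row_totals[i] for i in range(k)]
    pa = [p[j][j] / col_totals[j] for j in range(k)]
    oa = sum(p[i][i] for i in range(k))
    return ua, pa, oa


# Hypothetical inputs: each regional matrix sums to 1 within its own region.
region_1 = [[0.50, 0.10], [0.05, 0.35]]
region_2 = [[0.20, 0.05], [0.15, 0.60]]
weights = [0.25, 0.75]  # made-up areal proportions W_h

national = aggregate_error_matrices([region_1, region_2], weights)
ua, pa, oa = accuracies(national)
print(national, ua, pa, oa)
```

Because the weights sum to one and each regional matrix sums to one, the aggregated matrix also sums to one, which is a convenient sanity check before computing UA, PA, and OA.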
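For example, $\hat{p}_{22}$ for the forest-forest class pair in the national error matrix (shown in Table 7) is computed as the weighted sum of the corresponding regional proportions over the ten regions:

$$\hat{p}_{22} = \sum_{h=1}^{10} W_h\,\hat{p}_{h,22}$$

where $W_h$ is the areal proportion of region $h$ and $\hat{p}_{h,22}$ is the forest-forest proportion in the error matrix of region $h$.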
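After creation of the national error matrix, UA, PA, and OA can be easily computed using Equations (2)–(4), respectively. For example, based on the national error matrix in Table 7, UA and PA for forest are:

$$\hat{U}_2 = \frac{\hat{p}_{22}}{\hat{p}_{2+}}, \qquad \hat{P}_2 = \frac{\hat{p}_{22}}{\hat{p}_{+2}}$$

where $\hat{p}_{i+}$ and $\hat{p}_{+j}$ represent row $i$’s and column $j$’s totals in the national error matrix, respectively ($i = j = 2$ for forest). The estimate for OA is the sum of the diagonal proportions:

$$\hat{O} = \sum_{i} \hat{p}_{ii}$$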
By the combined ratio estimator method, on the other hand, UA and PA are estimated as a ratio $R = Y/X$, where $Y$ is the population total of $y_u$ and $X$ is the population total of $x_u$ ($u$ being a pixel in the population). Here, $y_u$ and $x_u$ are indicator functions for pixel $u$ on condition $A$ and condition $B$, respectively. For UA of a particular class, say “forest”, condition $A$ is that the map and reference labels are both forest, while condition $B$ is that the map label is forest. For PA of forest, condition $A$ remains the same, but condition $B$ is that the reference label is forest, as also explained in [38].
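The two indicator functions can be written out explicitly. The sketch below is only illustrative: the function name `indicators` and its string arguments are hypothetical, and it assumes a single reference label per pixel.

```python
def indicators(map_label, ref_label, target, kind):
    """Return (y_u, x_u) for one pixel u and one target class.

    y_u is 1 when condition A holds (map and reference labels are both the
    target class). x_u is 1 when condition B holds: the map label is the
    target class for UA, or the reference label is the target class for PA.
    """
    y_u = 1 if (map_label == target and ref_label == target) else 0
    if kind == "UA":
        x_u = 1 if map_label == target else 0
    elif kind == "PA":
        x_u = 1 if ref_label == target else 0
    else:
        raise ValueError("kind must be 'UA' or 'PA'")
    return y_u, x_u


# A pixel mapped as forest whose reference label is shrubland counts in the
# UA denominator for forest but not in the PA denominator.
print(indicators("forest", "shrubland", "forest", "UA"))
print(indicators("forest", "shrubland", "forest", "PA"))
```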
The combined ratio estimator for UA or PA is:

$$\hat{R} = \frac{\sum_{h=1}^{H} N_h\,\bar{y}_h}{\sum_{h=1}^{H} N_h\,\bar{x}_h} \tag{A2}$$

where $\bar{y}_h$ is the sample mean of $y_u$ in stratum $h$, $\bar{x}_h$ is the sample mean of $x_u$ in stratum $h$, $N_h$ is the population size in stratum $h$, and $H$ is the number of strata in the study area [37]. This ratio estimator is very general, as it can handle sample data with double stratification (as in this paper) and situations where there is no one-to-one correspondence between strata and the classes for which accuracy indicators need to be estimated.
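As a rough sketch of how the combined ratio estimator operates on per-stratum sample data, consider the following Python function. The name `combined_ratio` and the input layout (a list of `(N_h, y_values, x_values)` tuples) are assumptions made for illustration, and the numbers are invented.

```python
def combined_ratio(strata):
    """Combined ratio estimator: sum(N_h * ybar_h) / sum(N_h * xbar_h).

    `strata` is a list of (N_h, y_values, x_values) tuples, where y_values
    and x_values hold the indicators y_u and x_u of the sampled pixels.
    """
    numerator = sum(n_pop * sum(y) / len(y) for n_pop, y, _ in strata)
    denominator = sum(n_pop * sum(x) / len(x) for n_pop, _, x in strata)
    return numerator / denominator


# Invented example for UA of one class. In the first stratum every sampled
# pixel is mapped as the class; in the second only some are, which is the
# situation where strata and classes do not correspond one to one.
strata = [
    (1000, [1, 1, 0, 1], [1, 1, 1, 1]),
    (3000, [1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]),
]
r_hat = combined_ratio(strata)
print(r_hat)
```

Only the ratio of the stratum sizes matters here: multiplying every `N_h` by the same constant leaves the estimate unchanged.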
In this study, the sample data were collected following a two-level stratification (ten regions, each with nine or eight land cover classes), as described in Section 2.1. We can treat the sample data as consisting of 84 strata and calculate $\hat{R}$ with Equation (A2) ($H = 84$).
However, given the congruence among regional error matrices and the one-to-one correspondence between strata and mapped classes in individual regions, a simplified use of Equation (A2) is possible on the basis of regional error matrices. In other words, it is more sensible to treat the regions in the study as the strata when applying Equation (A2) (i.e., $H = 10$). Then, for a particular region $h$, $N_h$ refers to the region’s areal proportion (i.e., Table 2, bottom row). In addition, given regional error matrices, we can easily get the sample statistics required in Equation (A2). Specifically, $\bar{x}_h$ is class $i$’s sample proportion in the region (e.g., row or column $i$’s total in the error matrix for region $h$, $\hat{p}_{h,i+}$ or $\hat{p}_{h,+i}$, depending on whether UA or PA is concerned), while $\bar{y}_h$ is the proportion of sample pixels of reference class $i$ classified correctly as class $i$ in the region (e.g., $\hat{p}_{h,ii}$ in the error matrix for region $h$).
For example, using regional error matrices (Tables A1–A10, Appendix B), UA and PA for forest are computed as:
When using Equation (A2) to compute UA and PA above, proportionality between the stratum sizes $N_h$ and the regional areal proportions $W_h$ (non-zero ones) is implicitly applied.
For estimating national OA, we can apply Equation (A2) but only the numerator part, with summation over all map classes. Thus, based on regional error matrices (Tables A1–A10, Appendix B) again, we compute national OA as:
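Clearly, estimates for national UA and PA computed from the two methods are identical. The equivalence between the two methods’ results holds not only for forest but for any class $i$, as established (shown here for UA; the PA case follows by replacing row totals with column totals) by:

$$\hat{U}_i = \frac{\hat{p}_{ii}}{\hat{p}_{i+}} = \frac{\sum_{h} W_h\,\hat{p}_{h,ii}}{\sum_{h} W_h\,\hat{p}_{h,i+}} = \frac{\sum_{h} N_h\,\bar{y}_h}{\sum_{h} N_h\,\bar{x}_h} = \hat{R}$$

with $W_h = N_h/N$, $\bar{y}_h = \hat{p}_{h,ii}$, and $\bar{x}_h = \hat{p}_{h,i+}$, and where Equation (6) is applied for numerators and denominators, separately.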
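The equivalence can also be checked numerically. The snippet below is a small sanity check on invented two-region, two-class matrices, not on the study’s data; the variable names and the population size `n_total` are arbitrary assumptions.

```python
# Invented regional proportion matrices (rows: map class, columns: reference
# class) and invented areal proportions W_h.
regional = [
    [[0.50, 0.10], [0.05, 0.35]],
    [[0.20, 0.05], [0.15, 0.60]],
]
weights = [0.25, 0.75]
i = 0  # class of interest

# Method 1: aggregate first, then divide the national diagonal cell by the
# national row total.
p_ii = sum(w * m[i][i] for w, m in zip(weights, regional))
p_i_plus = sum(w * sum(m[i]) for w, m in zip(weights, regional))
ua_aggregated = p_ii / p_i_plus

# Method 2: combined ratio estimator with regions as strata, where
# N_h = W_h * N, ybar_h = p_h,ii and xbar_h = p_h,i+.
n_total = 1_000_000  # arbitrary population size; it cancels in the ratio
numerator = sum(w * n_total * m[i][i] for w, m in zip(weights, regional))
denominator = sum(w * n_total * sum(m[i]) for w, m in zip(weights, regional))
ua_ratio = numerator / denominator

print(ua_aggregated, ua_ratio)
```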
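As for the estimated variance of UA and PA, the method described in [38] is applicable. Specifically, the estimated variance of the combined ratio estimator is computed as:

$$\hat{V}(\hat{R}) = \frac{1}{\hat{X}^2}\sum_{h=1}^{H} N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_{yh}^2 + \hat{R}^2 s_{xh}^2 - 2\hat{R}\,s_{xyh}}{n_h}$$

where $\hat{X} = \sum_{h=1}^{H} N_h\,\bar{x}_h$ is the denominator of Equation (A2), $n_h$ is the sample size for stratum $h$ ($N_h$ being the population size for stratum $h$, as previously in Equation (A2)), $H$ is the number of strata (84 for the study in this paper), $s_{yh}^2$ and $s_{xh}^2$ are the sample variances of $y_u$ and $x_u$ for stratum $h$, and $s_{xyh}$ is the sample covariance of $y_u$ and $x_u$ in stratum $h$.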
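A compact Python sketch of this variance computation follows. It assumes the standard stratified form of the combined ratio variance with a finite population correction and at least two sampled pixels per stratum; the function names and the data are hypothetical.

```python
from statistics import mean


def sample_cov(a, b):
    """Sample covariance with an (n - 1) denominator; sample_cov(a, a) is the variance."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ratio_and_variance(strata):
    """Return (R_hat, V_hat(R_hat)) for strata given as (N_h, y_values, x_values)."""
    y_hat = sum(n_pop * mean(y) for n_pop, y, _ in strata)
    x_hat = sum(n_pop * mean(x) for n_pop, _, x in strata)
    r_hat = y_hat / x_hat
    total = 0.0
    for n_pop, y, x in strata:
        n_h = len(y)
        s2_y = sample_cov(y, y)
        s2_x = sample_cov(x, x)
        s_xy = sample_cov(y, x)
        fpc = 1 - n_h / n_pop  # finite population correction
        total += n_pop ** 2 * fpc * (s2_y + r_hat ** 2 * s2_x - 2 * r_hat * s_xy) / n_h
    return r_hat, total / x_hat ** 2


# Invented sample: indicators y_u and x_u for two strata.
strata = [
    (1000, [1, 1, 0, 1], [1, 1, 1, 1]),
    (3000, [1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]),
]
r_hat, var_r = ratio_and_variance(strata)
print(r_hat, var_r ** 0.5)  # estimate and its standard error
```

When `y_u` equals `x_u` for every sampled pixel, the bracketed term vanishes and the estimated variance is zero, which matches the intuition that a ratio observed to be exactly one in every stratum carries no sampling variability under this estimator.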
The estimated variance of national OA can be calculated using Equation (5). Adaptation is, however, required: the country-wide population must be viewed as consisting of 84 strata (region-class combinations), for which the combined weights need to be calculated properly (see Equation (A1) and the example for computing OA in Equation (A3)). Alternatively, we can compute the variance of national OA using Equation (A6):

$$\hat{V}(\hat{O}) = \frac{1}{N^2}\sum_{h=1}^{H} N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_{yh}^2}{n_h} \tag{A6}$$

where $s_{yh}^2$ is the sample variance of $y_u$ in stratum $h$ (with $y_u = 1$ if the map and reference labels of pixel $u$ agree and $y_u = 0$ otherwise), $N_h$ and $n_h$ are the population and sample sizes of stratum $h$, $N$ is the population size (total number of pixels) of the study area, and $H$ is the number of strata over which the summation runs (84 for the study in this paper). We employed Equation (A6) for computing the variance of estimated national OA, although we tested both methods (Equations (5) and (A6)) and obtained identical results.
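The stratified variance of OA can be sketched as follows. This is an illustrative reading of the formula with a finite population correction, using made-up agreement indicators; the function name `overall_accuracy_and_variance` is hypothetical, and each stratum needs at least two sampled pixels for `statistics.variance` to be defined.

```python
from statistics import mean, variance


def overall_accuracy_and_variance(strata):
    """Return (OA_hat, V_hat(OA_hat)) for strata given as (N_h, y_values).

    y_values holds 1 where the map and reference labels of a sampled pixel
    agree and 0 otherwise.
    """
    n_total = sum(n_pop for n_pop, _ in strata)
    oa_hat = sum(n_pop * mean(y) for n_pop, y in strata) / n_total
    total = 0.0
    for n_pop, y in strata:
        n_h = len(y)
        fpc = 1 - n_h / n_pop  # finite population correction
        total += n_pop ** 2 * fpc * variance(y) / n_h
    return oa_hat, total / n_total ** 2


# Invented agreement indicators for two strata.
strata = [
    (1000, [1, 1, 0, 1]),
    (3000, [1, 1, 1, 0, 1, 1]),
]
oa_hat, var_oa = overall_accuracy_and_variance(strata)
print(oa_hat, var_oa ** 0.5)  # estimate and its standard error
```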
Appendix B. Regional Error Matrices of Estimated Area Proportions when Defining Agreement at a Sample Pixel as a Match between the Map Label and Either the Primary or Alternate Reference Label
Tables A1–A10 show error matrices of estimated area proportions for GlobeLand30 2010 in all regions (R1–R10). Proportions, UA, and PA are given in percent. In Tables A1–A10, cultivated land is abbreviated as CuL, artificial surfaces as ArS, and permanent snow and ice as PSI; a dash (–) means there is no permanent snow and ice in a given region, as in Tables 5–7. User’s accuracy (UA) and producer’s accuracy (PA) are reported with standard errors (SE) in parentheses. These results are based on defining agreement as a match between the map label and the primary or alternate reference label at sample pixels.
Table A1.
The error matrix of estimated area proportions for R1: Overall accuracy is 90.3% (2.0%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 4.9 | 0 | 0.2 | 0.1 | 0 | 0 | 0.1 | 0.1 | 0 | 5.4 | 91(3) |
Forest | 0 | 0.8 | 0.1 | 0.2 | 0 | 0 | 0 | 0 | 0 | 1.2 | 68(5) |
Grassland | 0 | 0.4 | 19.3 | 1.3 | 0.2 | 0 | 0 | 0.4 | 0.2 | 21.9 | 88(3) |
Shrubland | 0 | 0 | 0.2 | 0.5 | 0 | 0 | 0 | 0.1 | 0 | 0.8 | 57(5) |
Wetland | 0 | 0 | 0 | 0 | 0.3 | 0 | 0 | 0 | 0 | 0.3 | 86(3) |
Water | 0 | 0 | 0 | 0 | 0 | 0.6 | 0 | 0 | 0 | 0.6 | 92(3) |
ArS | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0 | 0 | 0.3 | 76(4) |
Bareland | 0 | 0 | 4 | 1.3 | 0 | 0 | 0 | 61.6 | 0 | 67 | 92(3) |
PSI | 0 | 0 | 0.1 | 0 | 0 | 0 | 0 | 0.1 | 2.1 | 2.4 | 87(3) |
Total | 5 | 1.3 | 24.1 | 3.5 | 0.6 | 0.6 | 0.3 | 62.3 | 2.4 | 100 | |
PA | 99(0) | 63(16) | 80(5) | 14(4) | 44(17) | 100(0) | 67(15) | 99(1) | 90(8) | | |
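As a quick consistency check on Table A1, the overall accuracy in the caption can be recovered by summing the diagonal of the matrix, and UA by dividing a diagonal cell by its row total. The snippet below copies the diagonal and row totals from Table A1; the variable names are arbitrary. Because the printed cells are rounded to one decimal, UA recomputed this way matches the reported value for the larger classes but can be off by a point for small ones (for example, forest gives about 67 against the reported 68).

```python
classes = ["CuL", "Forest", "Grassland", "Shrubland", "Wetland",
           "Water", "ArS", "Bareland", "PSI"]

# Diagonal cells and row totals of Table A1 (R1), in percent of regional area.
diagonal = [4.9, 0.8, 19.3, 0.5, 0.3, 0.6, 0.2, 61.6, 2.1]
row_totals = [5.4, 1.2, 21.9, 0.8, 0.3, 0.6, 0.3, 67.0, 2.4]

# Overall accuracy is the sum of the diagonal proportions.
oa = sum(diagonal)

# User's accuracy is the diagonal cell divided by the row (map) total.
ua = {c: 100 * d / t for c, d, t in zip(classes, diagonal, row_totals)}

print(round(oa, 1))
print({c: round(v) for c, v in ua.items()})
```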
Table A2.
The error matrix of estimated area proportions for R2: Overall accuracy is 82.0% (2.7%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.4 | 64(5) |
Forest | 0 | 7.5 | 0.5 | 0.4 | 0 | 0 | 0 | 0 | 0 | 8.5 | 89(3) |
Grassland | 0 | 0.6 | 52.1 | 7.1 | 1.9 | 0 | 0 | 2.6 | 0 | 64.3 | 81(4) |
Shrubland | 0 | 0.1 | 0.4 | 1.3 | 0.1 | 0 | 0 | 0 | 0 | 2 | 68(5) |
Wetland | 0 | 0 | 0 | 0 | 0.2 | 0 | 0 | 0 | 0 | 0.2 | 85(4) |
Water | 0 | 0 | 0 | 0 | 0.1 | 2.5 | 0 | 0 | 0 | 2.6 | 97(2) |
ArS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59(5) |
Bareland | 0 | 0.2 | 2.7 | 0.2 | 0.2 | 0 | 0 | 14.4 | 0.2 | 17.8 | 81(4) |
PSI | 0 | 0.1 | 0.3 | 0.1 | 0 | 0 | 0 | 0.1 | 3.8 | 4.4 | 87(3) |
Total | 0.2 | 8.5 | 55.9 | 9.2 | 2.4 | 2.5 | 0 | 17.1 | 4 | 100 | |
PA | 99(0) | 88(7) | 93(1) | 14(3) | 6(3) | 99(0) | 46(11) | 84(6) | 95(4) | | |
Table A3.
The error matrix of estimated area proportions for R3: Overall accuracy is 85.9% (2.0%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 11.3 | 0.4 | 0.7 | 0.4 | 0 | 0 | 0.4 | 0 | 0 | 13.1 | 86(3) |
Forest | 0.1 | 8.7 | 0.6 | 1 | 0 | 0 | 0 | 0 | 0 | 10.5 | 83(4) |
Grassland | 0 | 0.5 | 40 | 2.8 | 0.9 | 0 | 0 | 1.8 | 0 | 46 | 87(3) |
Shrubland | 0 | 0 | 0.2 | 0.7 | 0 | 0 | 0 | 0 | 0 | 1.1 | 70(5) |
Wetland | 0 | 0 | 0 | 0 | 0.4 | 0 | 0 | 0 | 0 | 0.4 | 90(3) |
Water | 0 | 0 | 0 | 0 | 0 | 1.3 | 0 | 0 | 0 | 1.3 | 97(2) |
ArS | 0.1 | 0 | 0 | 0 | 0 | 0 | 0.4 | 0 | 0 | 0.6 | 78(4) |
Bareland | 0 | 0 | 2.9 | 0.5 | 0.5 | 0 | 0 | 22.6 | 0 | 26.6 | 85(4) |
PSI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 98(1) |
Total | 11.5 | 9.6 | 44.4 | 5.5 | 1.9 | 1.3 | 0.8 | 24.5 | 0.5 | 100 | |
PA | 98(1) | 91(5) | 90(2) | 14(3) | 20(8) | 100(0) | 52(14) | 92(3) | 96(3) | | |
Table A4.
The error matrix of estimated area proportions for R4: Overall accuracy is 76.0% (2.3%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 22.8 | 2.6 | 0 | 2 | 0.3 | 0.3 | 0.6 | 0 | 0 | 28.5 | 80(4) |
Forest | 0.5 | 37.9 | 0.9 | 5.5 | 0 | 0 | 0 | 0.9 | 0 | 45.7 | 83(4) |
Grassland | 1.2 | 1.9 | 11.2 | 5 | 0.4 | 0 | 0.2 | 0.8 | 0 | 20.8 | 54(5) |
Shrubland | 0.1 | 0.3 | 0.1 | 2.3 | 0 | 0 | 0 | 0 | 0 | 2.9 | 80(4) |
Wetland | 0 | 0 | 0 | 0 | 0.4 | 0 | 0 | 0 | 0 | 0.4 | 90(3) |
Water | 0 | 0 | 0 | 0 | 0 | 0.6 | 0 | 0 | 0 | 0.6 | 95(2) |
ArS | 0.1 | 0 | 0 | 0 | 0 | 0 | 0.4 | 0 | 0 | 0.6 | 75(4) |
Bareland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0 | 0.3 | 67(5) |
PSI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0.2 | 81(4) |
Total | 24.7 | 42.7 | 12.4 | 14.8 | 1.1 | 0.9 | 1.3 | 2 | 0.2 | 100 | |
PA | 92(3) | 89(2) | 91(5) | 15(2) | 33(13) | 64(21) | 35(13) | 11(4) | 92(3) | | |
Table A5.
The error matrix of estimated area proportions for R5: Overall accuracy is 87.7% (2.0%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 11.2 | 0 | 0.2 | 0.2 | 0 | 0 | 0.2 | 0.1 | – | 12 | 93(3) |
Forest | 0 | 11.1 | 0.1 | 0.3 | 0 | 0 | 0 | 0 | – | 11.5 | 96(2) |
Grassland | 2.9 | 0.5 | 38.7 | 2.4 | 0 | 0.5 | 0 | 2.9 | – | 47.8 | 81(4) |
Shrubland | 0.1 | 0 | 0.2 | 0.5 | 0 | 0 | 0 | 0 | – | 0.8 | 59(5) |
Wetland | 0 | 0 | 0 | 0 | 0.5 | 0 | 0 | 0 | – | 0.5 | 90(3) |
Water | 0 | 0 | 0 | 0 | 0 | 0.4 | 0 | 0 | – | 0.5 | 93(3) |
ArS | 0.1 | 0 | 0 | 0 | 0 | 0 | 0.6 | 0 | – | 0.8 | 80(4) |
Bareland | 0 | 0 | 0.8 | 0.5 | 0 | 0 | 0 | 24.8 | – | 26.1 | 95(2) |
PSI | – | – | – | – | – | – | – | – | – | – | – |
Total | 14.2 | 11.6 | 40.1 | 4 | 0.5 | 0.9 | 0.9 | 27.9 | – | 100 | |
PA | 79(6) | 96(4) | 97(1) | 12(4) | 97(2) | 47(25) | 72(14) | 89(4) | – | | |
Table A6.
The error matrix of estimated area proportions for R6: Overall accuracy is 77.4% (2.6%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 19 | 3.8 | 0.5 | 1.4 | 0.8 | 0.5 | 0.5 | 0.5 | – | 27.2 | 70(5) |
Forest | 1.2 | 51.4 | 1.2 | 6.1 | 0 | 0.6 | 0.6 | 0 | – | 61.2 | 84(4) |
Grassland | 0.5 | 1.3 | 2 | 1.6 | 0 | 0 | 0.3 | 0 | – | 5.8 | 35(5) |
Shrubland | 0 | 0.1 | 0 | 0.7 | 0 | 0 | 0 | 0 | – | 1 | 78(4) |
Wetland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0 | 67(5) |
Water | 0 | 0 | 0.1 | 0 | 0.1 | 2 | 0 | 0 | – | 2.3 | 87(3) |
ArS | 0.1 | 0.1 | 0 | 0.1 | 0 | 0 | 2.2 | 0 | – | 2.5 | 87(3) |
Bareland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0 | 56(5) |
PSI | – | – | – | – | – | – | – | – | – | – | – |
Total | 21 | 56.7 | 3.9 | 9.9 | 0.9 | 3.2 | 3.7 | 0.5 | – | 100 | |
PA | 91(4) | 91(2) | 52(13) | 7(2) | 2(1) | 63(14) | 58(12) | 1(1) | – | | |
Table A7.
The error matrix of estimated area proportions for R7: Overall accuracy is 78.1% (2.5%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 34.2 | 1.8 | 0 | 2.3 | 1.4 | 0.9 | 4.5 | 0 | – | 45 | 76(4) |
Forest | 1.3 | 35.9 | 1.7 | 3.4 | 0 | 0 | 0 | 0 | – | 42.3 | 85(4) |
Grassland | 0.7 | 0.8 | 2 | 1 | 0 | 0 | 0.2 | 0 | – | 5 | 41(5) |
Shrubland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0.1 | 78(4) |
Wetland | 0 | 0 | 0 | 0 | 0.3 | 0.1 | 0 | 0 | – | 0.3 | 81(4) |
Water | 0.1 | 0 | 0 | 0 | 0.1 | 2.6 | 0 | 0 | – | 2.8 | 92(3) |
ArS | 1.2 | 0.2 | 0 | 0 | 0 | 0 | 3.1 | 0 | – | 4.6 | 67(5) |
Bareland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0 | 54(5) |
PSI | – | – | – | – | – | – | – | – | – | – | – |
Total | 37.6 | 38.8 | 3.7 | 6.7 | 1.7 | 3.6 | 7.8 | 0.1 | – | 100 | |
PA | 91(2) | 93(2) | 54(13) | 1(0) | 15(7) | 72(13) | 39(7) | 21(17) | – | | |
Table A8.
The error matrix of estimated area proportions for R8: Overall accuracy is 82.4% (2.2%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 50.8 | 1.7 | 2.3 | 0 | 0 | 0.6 | 1.7 | 0.6 | – | 57.7 | 88(3) |
Forest | 0.7 | 13.8 | 0.7 | 1.7 | 0 | 0 | 0 | 0 | – | 16.8 | 82(4) |
Grassland | 1.7 | 0.6 | 9.6 | 3.4 | 0 | 0 | 0.2 | 0 | – | 15.5 | 62(5) |
Shrubland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0.1 | 67(5) |
Wetland | 0 | 0 | 0 | 0 | 0.3 | 0 | 0 | 0 | – | 0.3 | 85(4) |
Water | 0 | 0 | 0 | 0 | 0.1 | 1.6 | 0 | 0 | – | 1.7 | 92(3) |
ArS | 0.5 | 0.4 | 0.2 | 0.2 | 0.1 | 0 | 6.3 | 0 | – | 7.8 | 81(4) |
Bareland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0.1 | 37(5) |
PSI | – | – | – | – | – | – | – | – | – | – | – |
Total | 53.7 | 16.5 | 12.8 | 5.4 | 0.4 | 2.2 | 8.2 | 0.6 | – | 100 | |
PA | 95(1) | 83(5) | 75(7) | 1(0) | 60(12) | 73(19) | 77(9) | 4(4) | – | | |
Table A9.
The error matrix of estimated area proportions for R9: Overall accuracy is 84.0% (2.1%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 32.8 | 3.2 | 1.2 | 0.4 | 0 | 1.2 | 1.2 | 0 | – | 40 | 82(4) |
Forest | 0.4 | 40.2 | 0.4 | 3.1 | 0 | 0.4 | 0 | 0 | – | 44.6 | 90(3) |
Grassland | 0.8 | 0.8 | 1.5 | 0.7 | 0.1 | 0.2 | 0.2 | 0 | – | 4.4 | 35(5) |
Shrubland | 0 | 0 | 0 | 0.1 | 0 | 0 | 0 | 0 | – | 0.1 | 53(5) |
Wetland | 0 | 0 | 0 | 0 | 0.3 | 0.1 | 0 | 0 | – | 0.5 | 63(5) |
Water | 0.1 | 0 | 0 | 0 | 0.1 | 4 | 0 | 0 | – | 4.2 | 95(2) |
ArS | 0.7 | 0.2 | 0 | 0 | 0 | 0.1 | 5.2 | 0.1 | – | 6.2 | 84(4) |
Bareland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0.1 | 42(5) |
PSI | – | – | – | – | – | – | – | – | – | – | – |
Total | 34.9 | 44.4 | 3.2 | 4.3 | 0.5 | 6 | 6.6 | 0.1 | – | 100 | |
PA | 94(1) | 90(2) | 48(13) | 1(0) | 56(11) | 66(9) | 78(8) | 31(22) | – | | |
Table A10.
The error matrix of estimated area proportions for R10: Overall accuracy is 88.0% (1.6%).
Map class | CuL | Forest | Grassland | Shrubland | Wetland | Water | ArS | Bareland | PSI | Total | UA |
---|---|---|---|---|---|---|---|---|---|---|---|
CuL | 40.6 | 0.4 | 0.9 | 0.9 | 0 | 0 | 0.9 | 0 | – | 43.7 | 93(3) |
Forest | 0 | 35.4 | 0.4 | 1.9 | 0 | 0 | 0 | 0 | – | 37.6 | 94(2) |
Grassland | 1.9 | 3.2 | 6.5 | 0 | 0 | 0 | 0.4 | 0 | – | 12 | 54(5) |
Shrubland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | – | 0 | 69(5) |
Wetland | 0.3 | 0 | 0 | 0 | 1 | 0.1 | 0 | 0 | – | 1.4 | 73(4) |
Water | 0 | 0 | 0 | 0 | 0.1 | 1.4 | 0 | 0 | – | 1.5 | 94(2) |
ArS | 0.4 | 0.1 | 0 | 0 | 0 | 0 | 2.6 | 0 | – | 3.1 | 84(4) |
Bareland | 0 | 0 | 0.1 | 0 | 0 | 0 | 0 | 0.5 | – | 0.7 | 72(5) |
PSI | – | – | – | – | – | – | – | – | – | – | – |
Total | 43.3 | 39.1 | 7.9 | 2.8 | 1.1 | 1.5 | 3.9 | 0.5 | – | 100 | |
PA | 94(1) | 90(2) | 82(8) | 0(0) | 91(3) | 93(3) | 67(11) | 100(0) | – | | |