A Primer on Clustering of Forest Management Units for Reliable Design-Based Direct Estimates and Model-Based Small Area Estimation
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data
2.2.1. Sampled Data and Variables of Interest
2.2.2. Clustering Variables
2.3. Methodology
2.3.1. Clusterability and Variable Selection
2.3.2. Optimal Number of Clusters and Sample Size per Cluster
2.3.3. Optimal and Best Clustering Schemes
- First Methodology: This focused on 13 distinct clustering schemes defined by the PAM algorithm, the Euclidean distance metric, and 13 specific clustering variables (one scheme per variable). For this methodology, the number of clusters k was set to range from 2 to 50.
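- Second Methodology: This broader approach evaluated 468 individual clustering schemes, i.e., every combination of 9 clustering algorithms, 4 distance metrics, and 13 clustering variables (9 × 4 × 13 = 468). Each scheme was optimized separately under each of four clustering indices (Table 3), yielding 4 × 468 = 1872 optimal clustering scheme variants; together with the 13 schemes of the first methodology, this gives the 1885 optimal clustering schemes reported in Table 3. The range of k was narrowed from 2–50 to 8–50.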
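The scheme counts above follow from a full factorial enumeration. The Python sketch below reproduces that arithmetic, assuming every algorithm was crossed with every distance metric and every variable. The variable names are hypothetical placeholders (not all 13 variables are named individually here), and the algorithm, distance, and index labels are plain strings; nothing is actually clustered and no clustering library is called.

```python
from itertools import product

# Second-methodology factors as listed in Table 3 (labels only; nothing is clustered here).
ALGORITHMS = ["ward.D", "ward.D2", "single", "complete", "average",
              "mcquitty", "median", "centroid", "kmeans"]
DISTANCES = ["euclidean", "maximum", "manhattan", "minkowski"]
INDICES = ["CH", "FR", "KL", "S"]
# Hypothetical placeholder names for the 13 clustering variables.
VARIABLES = [f"var{i:02d}" for i in range(1, 14)]

# First methodology: PAM + Euclidean, optimized with S, one scheme per variable.
first = [("pam", "euclidean", v, "S") for v in VARIABLES]

# Second methodology: every algorithm x distance x variable combination ...
schemes_2nd = list(product(ALGORITHMS, DISTANCES, VARIABLES))
# ... each optimized separately under each of the four indices.
variants_2nd = list(product(schemes_2nd, INDICES))

total = len(first) + len(variants_2nd)
print(len(schemes_2nd), len(variants_2nd), total)  # 468 1872 1885
```

The same enumeration also explains the "Sum" row of Table 3: 13 variants from the first methodology plus 1872 from the second.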
2.3.4. Preprocessing and Dissimilarity Metrics
2.3.5. Clustering Algorithms
2.3.6. Clustering Indices
2.3.7. Evaluation of Design-Based Direct Estimates
2.3.8. Model-Based SAE: Correlation Analysis
3. Results
3.1. Analysis of Optimal Clustering Schemes
3.2. Design-Based Direct SAE
3.3. Correlation Analysis for Model-Based SAE
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1. Terminology
- Clustering/Clustering Analysis: The process of grouping similar objects or FMUs into clusters, based on shared attributes or characteristics (observations).
- Cluster: A group of similar objects or an aggregation of FMUs that are similar to each other within a clustering. Clusters are non-overlapping subsets of the population that can be considered small areas or forest subpopulations.
- Object: The individual elements that are subject to clustering, specifically referring to homogeneous FMUs or forest patches within the population.
- Observations: The measurable characteristics or attributes (clustering variables) that describe the objects and are used to form clusters.
- Clustering (auxiliary) variables: Clustering variables describe the objects or FMUs using specific attributes such as height metrics, past census data, centroids, or a combination thereof.
- Clustering Scheme: A specific approach used in clustering, including the selection of variables, algorithms, and similarity or distance metrics.
- Minimum and maximum number of clusters (k): The predetermined lower and upper bounds on the number of clusters considered in the clustering analysis.
- Optimal k: The ideal number of clusters into which to divide the data, as determined by the criterion of a specific clustering index.
- Optimal Clustering Scheme: The specific clustering approach (a defined set of variables, an algorithm, and a similarity or distance metric) that divides the data into the optimal number of clusters. This division is determined by the best value of a selected clustering index, i.e., its optimal index value, within a predefined range for the number of clusters (k).
- Clustering Indices: Metrics or (internal) measures used to evaluate the quality or suitability of a particular clustering scheme. Clustering indices determine the optimal number of clusters based on their optimal index value.
- Optimal Index Value: The best value of a clustering index for determining the optimal number of clusters for a clustering scheme.
- Best Index Value: A subset of the optimal index values, used to further refine the optimal clustering schemes and identify the most suitable ones with respect to the number of clusters (k).
- Best Clustering Schemes: The most effective or suitable clustering schemes are selected from among the optimal ones, based on the best index value.
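To make the distinction between an optimal k and an optimal index value concrete, the following Python sketch picks the optimal k for one clustering scheme from a mapping of index values over the tested range of k. The "maximum value" rule follows Appendix A.2 for CH, KL, and S; the `max_diff` rule is an assumed reading of the FR criterion ("maximum difference between hierarchy levels") and may not match the software used in the study. The function name `optimal_k` and the index values are hypothetical.

```python
def optimal_k(values_by_k, rule="max"):
    """Return the optimal number of clusters from a {k: index value} mapping.

    rule="max": the k with the largest index value (CH, KL, S in Appendix A.2).
    rule="max_diff": the k with the largest absolute change in the index from
        k-1 to k; an assumed reading of the FR criterion.
    """
    ks = sorted(values_by_k)
    if rule == "max":
        return max(ks, key=lambda k: values_by_k[k])
    if rule == "max_diff":
        # Only k values whose predecessor k-1 was also evaluated can be scored.
        scored = [k for k in ks if k - 1 in values_by_k]
        return max(scored, key=lambda k: abs(values_by_k[k] - values_by_k[k - 1]))
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical silhouette values for one clustering scheme over k = 2..6.
silhouette_by_k = {2: 0.41, 3: 0.52, 4: 0.61, 5: 0.47, 6: 0.44}
k_opt = optimal_k(silhouette_by_k)
print(k_opt, silhouette_by_k[k_opt])  # 4 0.61 (optimal k, optimal index value)
```

Repeating this for every scheme yields one (optimal k, optimal index value) pair per optimal clustering scheme, from which the best clustering schemes are then selected.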
Appendix A.2. Clustering Indices
Index Abbreviation | Index in Literature | Optimal Number of Clusters Defined by | Equation |
---|---|---|---|
CH | Caliński and Harabasz [45] | Maximum index value | |
FR | Friedman and Rubin [48] | Maximum difference between hierarchy levels of the index | |
KL | Krzanowski and Lai [46] | Maximum index value | |
S | Silhouette [47] | Maximum index value | |
- The within-cluster distance is the mean distance between an observation and all other observations in the same cluster.
- The nearest-neighbor distance is the mean distance between an observation and all observations of the nearest other cluster, i.e., the cluster, other than its own, for which this mean distance is smallest.
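For readers who want to check index values by hand, the sketch below computes the average silhouette width from the within-cluster and nearest-neighbor distances defined above, and the Caliński–Harabasz index as the ratio of between- to within-cluster dispersion, each divided by its degrees of freedom (k − 1 and n − k). It is a plain-Python illustration with illustrative function names, using Euclidean distance via `math.dist` (Python 3.8+); it is not the implementation used in the study. Singleton clusters receive a silhouette width of 0, following Rousseeuw's convention, and at least two clusters are required.

```python
import math
from collections import defaultdict


def _members(labels):
    """Map each cluster label to the indices of the observations it contains."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    return groups


def average_silhouette(points, labels, dist=math.dist):
    """Mean silhouette width over all observations (requires >= 2 clusters)."""
    groups = _members(labels)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:  # singleton cluster: width defined as 0
            widths.append(0.0)
            continue
        # Within-cluster distance: mean distance to the other members of i's cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # Nearest-neighbor distance: smallest mean distance to another cluster.
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in groups.items() if other != lab)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)] using squared Euclidean distances."""
    n, dims = len(points), len(points[0])
    groups = _members(labels)
    k = len(groups)
    grand = [sum(p[d] for p in points) / n for d in range(dims)]
    between = within = 0.0
    for idx in groups.values():
        centre = [sum(points[j][d] for j in idx) / len(idx) for d in range(dims)]
        between += len(idx) * math.dist(centre, grand) ** 2
        within += sum(math.dist(points[j], centre) ** 2 for j in idx)
    return (between / (k - 1)) / (within / (n - k))


# Toy example: two well-separated one-dimensional clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
print(round(average_silhouette(points, labels), 4))  # 0.8997
print(calinski_harabasz(points, labels))             # 200.0
```

Evaluating either function for each candidate k of a scheme gives the sequence of index values from which the optimal k is read off.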
Appendix A.3. PCA
Appendix A.4. Optimum Number of Clusters (k) as Influenced by Indices
Appendix A.5. Correlation
Appendix A.6. Clustering of the Aggregated Mean Height in FMUs
References
- Chukwu, O.; Dau, J.H. Forest Inventory: Challenges, Trend, and Relevance on Conservation and Restoration of Tropical Forests. In Handbook of Research on the Conservation and Restoration of Tropical Dry Forests; IGI Global: Hershey, PA, USA, 2020; pp. 306–322.
- Dau, J.H.; Mati, A.; Dawaki, S.A. Role of Forest Inventory in Sustainable Forest Management: A Review. Int. J. For. Hortic. 2015, 1, 33–40.
- Rao, J.N.; Molina, I. Small Area Estimation; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015.
- Rahman, A.; Harding, A. Small Area Estimation and Microsimulation Modeling; Chapman and Hall/CRC: New York, NY, USA, 2017.
- Giordani, P.; Ferraro, M.B.; Martella, F. An Introduction to Clustering with R; Springer: Berlin/Heidelberg, Germany, 2020.
- Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1979, 28, 100–108.
- Kaufman, L.; Rousseeuw, P.J. Partitioning around Medoids (Program PAM). In Finding Groups in Data: An Introduction to Cluster Analysis; Kaufman, L., Rousseeuw, P.J., Eds.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1990; pp. 68–125.
- Næsset, E. Determination of Mean Tree Height of Forest Stands by Digital Photogrammetry. Scand. J. For. Res. 2002, 17, 446–459.
- Immitzer, M.; Stepper, C.; Böck, S.; Straub, C.; Atzberger, C. Use of WorldView-2 stereo imagery and National Forest Inventory data for wall-to-wall mapping of growing stock. For. Ecol. Manag. 2016, 359, 232–246.
- Ullah, S.; Dees, M.; Datta, P.; Adler, P.; Saeed, T.; Khan, M.S.; Koch, B. Comparing the potential of stereo aerial photographs, stereo very high-resolution satellite images, and TanDEM-X for estimating forest height. Int. J. Remote Sens. 2020, 41, 6976–6992.
- Strunk, J.L.; Bell, D.M.; Gregory, M.J. Pushbroom Photogrammetric Heights Enhance State-Level Forest Attribute Mapping with Landsat and Environmental Gradients. Remote Sens. 2022, 14, 14.
- Fay, R.E.; Herriot, R.A. Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data. J. Am. Stat. Assoc. 1979, 74, 269–277.
- Battese, G.E.; Harter, R.M.; Fuller, W.A. An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data. J. Am. Stat. Assoc. 1988, 83, 28–36.
- Breidenbach, J.; Magnussen, S.; Rahlf, J.; Astrup, R. Unit-level and area-level small area estimation under heteroscedasticity using digital aerial photogrammetry data. Remote Sens. Environ. 2018, 212, 199–211.
- Goerndt, M.E. Comparison and Analysis of Small Area Estimation Methods for Improving Estimates of Selected Forest Attributes. Ph.D. Thesis, Oregon State University, Corvallis, OR, USA, 2010.
- Magnussen, S.; Mauro, F.; Breidenbach, J.; Lanz, A.; Kändler, G. Area-level analysis of forest inventory variables. Eur. J. For. Res. 2017, 136, 839–855.
- Chandra, H.; Chandra, G. Small Area Estimation for Total Basal Cover in The State of Maharashtra in India. In Statistical Methods and Applications in Forestry and Environmental Sciences. Forum for Interdisciplinary Mathematics; Chandra, G., Nautiyal, R., Chandra, H., Eds.; Springer: Singapore, 2020.
- McConville, K.S.; Moisen, G.G.; Frescino, T.S. A Tutorial on Model-Assisted Estimation with Application to Forest Inventory. Forests 2020, 11, 244.
- Newnham, R.M. Cluster analysis: An application in forest management planning. For. Chron. 1992, 68, 628–633.
- Smaltschinski, T.; Seeling, U.; Becker, G. Clustering forest harvest stands on spatial networks for optimized harvest scheduling. Ann. For. Sci. 2012, 69, 651–657.
- Vega, C.; Renaud, J.-P.; Sagar, A.; Bouriaud, O. A new small area estimation algorithm to balance between statistical precision and scale. Int. J. Appl. Earth Obs. Geoinf. 2021, 97, 102303.
- Georgakis, A. Stratification of Forest Stands as a Basis for Small Area Estimations. In Proceedings of the 33rd PanHellenic Statistics Conference, Statistics in the Economy and Administration, Larissa, Greece, 23–26 September 2021.
- University Forest Administration and Management Fund. Pertouli University Forest Management Plan 2019–2028; University Forest Administration and Management Fund: Thessaloniki, Greece, 2018.
- Kershaw, J.A., Jr.; Ducey, M.J.; Beers, T.W.; Husch, B. Forest Mensuration, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2016.
- Hosking, J.R.M. L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. J. R. Stat. Soc. Ser. B Stat. Methodol. 1990, 52, 105–124.
- Dolloff, J.T.; Theiss, H.J. Temporal correlation of metadata errors for commercial satellite images. Presentation and effects on stereo extraction accuracy. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, XXXIX-B1, 215–223.
- Neigh, C.S.R.; Carroll, M.L.; Montesano, P.M.; Slayback, D.A.; Wooten, M.R.; Lyapustin, A.I.; Shean, D.E.; Alexandrov, O.; Macander, M.J.; Tucker, C.J. An API for Spaceborne Sub-Meter Resolution Products for Earth Science. In IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium; IEEE: Piscataway, NJ, USA, 2019; pp. 5397–5400.
- Adolfsson, A.; Ackerman, M.; Brownstein, N.C. To cluster, or not to cluster: An analysis of clusterability methods. Pattern Recognit. 2019, 88, 13–26.
- Maechler, M. diptest: Hartigan's Dip Test Statistic for Unimodality - Corrected. R Package Version 0.75-7. 2015. Available online: https://CRAN.R-project.org/package=diptest (accessed on 13 August 2023).
- Hopkins, B.; Skellam, J.G. A new method for determining the type of distribution of plant individuals. Ann. Bot. 1954, 18, 213–227.
- Bezdek, J.C.; Hathaway, R.J. VAT: A Tool for Visual Assessment of (Cluster) Tendency. In Proceedings of the 2002 International Joint Conference on Neural Networks, IJCNN'02 (Cat. No. 02CH37290), Honolulu, HI, USA, 12–17 May 2002.
- Kassambara, A. Practical Guide To Cluster Analysis in R: Unsupervised Machine Learning; Sthda.com, 2017; Volume 1.
- Kassambara, A. Practical Guide To Principal Component Methods in R: PCA, M (CA), FAMD, MFA, HCPC, Factoextra; Sthda.com, 2017; Volume 2.
- McRoberts, R.E.; Gobakken, T.; Næsset, E. Post-stratified estimation of forest area and growing stock volume using lidar-based stratifications. Remote Sens. Environ. 2012, 125, 157–166.
- Westfall, J.A.; Patterson, P.L.; Coulston, J.W. Post-stratified estimation: Within-strata and total sample size recommendations. Can. J. For. Res. 2011, 41, 1130–1139.
- Scott, C.; Bechtold, W.; Reams, G.; Smith, W.; Hansen, M.; Moisen, G. Sample-based estimators used by the forest inventory and analysis national information management system. In Proceedings of the Enhanced Forest Inventory and Analysis Program, National Sampling Design and Estimation Procedures, Denver, CO, USA, 21–24 September 2004; Bechtold, W.A., Patterson, P.L., Eds.; USDA Forest Service, Southern Research Station: Asheville, NC, USA, 2005; pp. 43–67.
- Bechtold, W.; Scott, C. The Enhanced Forest Inventory and Analysis Program: National Sampling Design and Estimation Procedures. In Proceedings of the Enhanced Forest Inventory and Analysis Program, National Sampling Design and Estimation Procedures, Denver, CO, USA, 21–24 September 2004; Bechtold, W.A., Patterson, P.L., Eds.; USDA Forest Service, Southern Research Station: Asheville, NC, USA, 2005; pp. 27–42.
- Ruiz, L.; Hermosilla, T.; Mauro, F.; Godino, M. Analysis of the Influence of Plot Size and LiDAR Density on Forest Structure Attribute Estimates. Forests 2014, 5, 936.
- Chambers, R.; Clark, R. An Introduction To Model-Based Survey Sampling With Applications; OUP Oxford: Oxford, UK, 2012; Volume 37.
- Magnussen, S. Arguments for a model-dependent inference? For. Int. J. For. Res. 2015, 88, 317–325.
- Cochran, W.G. Sampling Techniques, 3rd ed.; Wiley: New York, NY, USA, 1977.
- Strunk, J.; Packalen, P.; Gould, P.; Gatziolis, D.; Maki, C.; Andersen, H.-E.; McGaughey, R.J. Large Area Forest Yield Estimation with Pushbroom Digital Aerial Photogrammetry. Forests 2019, 10, 397.
- Charrad, M.; Ghazzali, N.; Boiteau, V.; Niknafs, A. NbClust: An R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 2014, 61, 1–36.
- Ward, J.H. Hierarchical Grouping to Optimize an Objective Function. J. Am. Stat. Assoc. 1963, 58, 236–244.
- Xu, D.; Tian, Y. A Comprehensive Survey of Clustering Algorithms. Ann. Data Sci. 2015, 2, 165–193.
- Caliński, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. 1974, 3, 1–27.
- Krzanowski, W.J.; Lai, Y.T. A Criterion for Determining the Number of Groups in a Data Set Using Sum-of-Squares Clustering. Biometrics 1988, 44, 23–34.
- Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
- Friedman, H.P.; Rubin, J. On Some Invariant Criteria for Grouping Data. J. Am. Stat. Assoc. 1967, 62, 1159–1178.
- Dunn, J.C. Well-separated clusters and optimal fuzzy partitions. J. Cybern. 1974, 4, 95–104.
- Georgakis, A.; Diamantopoulou, M.J.; Trigkas, M. Methodology for the Establishment of Sample Plots and Estimation of Growing Stock Volume In Greek Forest Stands. In Proceedings of the 20th Panhellenic Forestry Conference, Trikala, Greece, 3–6 October 2021.
- Mauro, F.; Molina, I.; García-Abril, A.; Valbuena, R.; Ayuga-Téllez, E. Remote sensing estimates and measures of uncertainty for forest variables at different aggregation levels. Environmetrics 2016, 27, 225–238.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022. Available online: http://www.r-project.org/index.html (accessed on 13 August 2023).
- Maechler, M.; Rousseeuw, P.; Struyf, A.; Hubert, M.; Hornik, K. cluster: Cluster Analysis Basics and Extensions. R Package Version 2.1.4. 2022. Available online: https://CRAN.R-project.org/package=cluster (accessed on 13 August 2023).
- Molina, I.; Rao, J.; Datta, G.S. Small area estimation under a Fay–Herriot model with preliminary testing for the presence of random area effects. Surv. Methodol. 2015, 41, 1–19.
- Benavent, R.; Morales, D. Multivariate Fay–Herriot models for small area estimation. Comput. Stat. Data Anal. 2016, 94, 372–390.
- Pratesi, M.; Salvati, N. Small area estimation: The EBLUP estimator based on spatially correlated random area effects. Stat. Methods Appl. 2008, 17, 113–141.
- Ver Planck, N.R.; Finley, A.O.; Kershaw, J.A.; Weiskittel, A.; Kress, R.M.C. Hierarchical Bayesian models for small area estimation of forest variables using LiDAR. Remote Sens. Environ. 2018, 204, 287–295.
- Georgakis, A.; Stamatellos, G. Sampling Design Contribution to Small Area Estimation Procedure in Forest Inventories. Mod. Concep. Dev. Agrono. 2020, 7, 694–697.
- Hill, A. Integration of Small Area Estimation Procedures in Large-Scale Forest Inventories. Doctoral Dissertation, ETH Zurich, Zürich, Switzerland, 2018. Available online: http://hdl.handle.net/20.500.11850/305920 (accessed on 13 August 2023).
- Hill, A.; Mandallaz, D.; Langshausen, J. A Double-Sampling Extension of the German National Forest Inventory for Design-Based Small Area Estimation on Forest District Levels. Remote Sens. 2018, 10, 1052.
- Mandallaz, D. Design-based properties of some small-area estimators in forest inventory with two-phase sampling. Can. J. For. Res. 2013, 43, 441–449.
- Molefe, W.B. Sample Design for Small Area Estimation. Doctoral Thesis, University of Wollongong, Wollongong, Australia, 2011. Available online: https://ro.uow.edu.au/theses/3495 (accessed on 13 August 2023).
- Zimmermann, T. The Interplay between Sampling Design and Statistical Modelling in Small Area Estimation. Ph.D. Thesis, Trier University, Trier, Germany, 2018.
- Haakana, H.; Heikkinen, J.; Katila, M.; Kangas, A. Efficiency of post-stratification for a large-scale forest inventory: Case Finnish NFI. Ann. For. Sci. 2019, 76, 9.
- You, Y.; Chapman, B. Small area estimation using area level models and estimated sampling variances. Surv. Methodol. 2006, 32, 97.
- Georgakis, A. Further Improvements of Growing Stock Volume Estimations at Stratum-Level with the Application of Fay-Herriot Model. In Proceedings of the 33rd PanHellenic Statistics Conference, Statistics in the Economy and Administration, Larissa, Greece, 23–26 September 2021.
- Zulkarnain, R.; Jayanti, D.; Listianingrum, T. Improving the quality of disaggregated SDG indicators with cluster information for small area estimates. Stat. J. IAOS 2020, 36, 955–961.
- Torkashvand, E.; Jozani, M.J.; Torabi, M. Clustering in small area estimation with area level linear mixed models. J. R. Stat. Soc. Ser. A Stat. Soc. 2017, 180, 1253–1279.
- Anisa, R.; Kurnia, A.; Indahwati, I. Cluster Information of Non-Sampled Area In Small Area Estimation. IOSR J. Math. 2014, 10, 15–19.
- Desiyanti, A.; Ginanjar, I.; Toharudin, T. Application of an Empirical Best Linear Unbiased Prediction Fay-Herriot (EBLUP-FH) Multivariate Method with Cluster Information to Estimate Average Household Expenditure. Mathematics 2022, 11, 135.
- Ginanjar, I.; Wulandary, S.; Toharudin, T. Empirical Best Linear Unbiased Prediction Method with K-Medoids Cluster for Estimate Per Capita Expenditure of Sub-District Level. IAENG Int. J. Appl. Math. 2022, 52, 1–7.
- Blaschke, T. Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16.
- Jia, W.; Sun, Y.; Pukkala, T.; Jin, X. Improved Cellular Automaton for Stand Delineation. Forests 2020, 11, 37.
- Pukkala, T. Can Kohonen networks delineate forest stands? Scand. J. For. Res. 2021, 36, 198–209.
- Sun, Y.; Wang, W.; Pukkala, T.; Jin, X. Stand delineation based on laser scanning data and simulated annealing. Eur. J. For. Res. 2021, 140, 1065–1080.
- Pascual, A.; Tóth, S.F. Using mixed integer programming and airborne laser scanning to generate forest management units. J. For. Res. 2022, 33, 217–226.
- Georgakis, A.; Papageorgiou, V.E.; Stamatellos, G. Bivariate Fay-Herriot Model for Enhanced Small Area Estimation of Growing Stock Volume. In Proceedings of the International Conference on Applied Mathematics & Computer Science, IEEE Computer Society, Lefkada, Greece, 8–10 August 2023.
- Milligan, G.W.; Cooper, M.C. An Examination of Procedures for Determining the Number of Clusters in a Data Set. Psychometrika 1985, 50, 159–179.
Variable of Interest (Measurement Unit) | Mean ± Standard Error (SE) | Relative SE % of the Mean | Minimum | Maximum |
---|---|---|---|---|
Volume (m³/ha) | 303.96 ± 6.57 | 2.16 | 13.99 | 842.30 |
Basal Area (m²/ha) | 32.55 ± 0.64 | 1.95 | 3.77 | 84.16 |
Tree Density (Trees/ha) | 582.20 ± 1.62 | 2.78 | 170.00 | 1770.00 |
Mean Height (m) | 20.16 ± 0.17 | 1.08 | 7.87 | 27.04 |
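As a worked check of the "Relative SE % of the Mean" column, the relative standard error is the standard error expressed as a percentage of the mean. For the volume row, writing $\bar{y}$ for the sample mean (notation introduced here):

```latex
\mathrm{RSE}\,(\%) = 100 \times \frac{\mathrm{SE}}{\bar{y}}
  = 100 \times \frac{6.57~\mathrm{m^3/ha}}{303.96~\mathrm{m^3/ha}} \approx 2.16
```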
Data Type | Descriptive Statistics | Abbreviation | Description | Unit Metric |
---|---|---|---|---|
Height | Quantiles | h25; h50; h75; h95 | Percentiles of canopy height | Meters (m) |
Height | Central tendency | hmean; hmode | Cell height mean and mode (most frequent height in a cell) | m |
Height | Dispersion | hsd; hcv | Cell height standard deviation and coefficient of variation | m; ratio |
Height | L-Moments | L1; L2; L3; L4 | L1: mean height of all points in sample distribution; L2: similar to hsd; L3, L4: analogous to skewness/kurtosis (a measure of distribution shape) | m; m; ratio; ratio |
Height | L-ratios | hLcv; hLskew | hLcv = L2/L1, similar to hcv; hLskew = L3/L2 | ratio; ratio |
Census | Central tendency | FirTreeDensity88ha | Hybrid fir mean tree density (1988 census) | Trees/ha |
Census | Central tendency | FirGSV88ha | Hybrid fir mean volume (1988 census) | m³/ha |
Census | Central tendency | ForestDensity97ha | Mean all-species tree density (1997 census) | Trees/ha |
Census | Central tendency | ForestGSV97ha | Mean all-species volume (1997 census) | m³/ha |
Geolocation | Spatial | XY centroid coordinates | FMU centroid coordinates | m |
Combined variables for clustering (“_” used as variable delimiter) | hmean_hsd; hLcv_hLskew; ForestDensity97_hmean; h50_ForestDensity97; h50_ForestDensity97_X_Y; h50_X_Y |
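The L-moment variables in the table above (L1 to L4, hLcv, hLskew) can be computed from the height values of a cell with Hosking's unbiased probability-weighted-moment (PWM) estimators. The Python sketch below is a minimal illustration assuming those estimators; the study's height-processing software and its exact estimator are not stated in this section, and the returned key `hLkurt` is a hypothetical name for the L-kurtosis ratio. Here L3 and L4 are returned in metres, and their ratios to L2 are the dimensionless shape measures.

```python
def sample_l_moments(values):
    """Sample L-moments and L-moment ratios from Hosking's (1990) unbiased PWMs."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed to estimate L1 to L4")

    def pwm(r):
        # b_r = (1/n) * sum over ranks j > r of [(j-1)...(j-r)] / [(n-1)...(n-r)] * x_(j)
        total = 0.0
        for j in range(r + 1, n + 1):  # j is the 1-based rank
            weight = 1.0
            for m in range(1, r + 1):
                weight *= (j - m) / (n - m)
            total += weight * x[j - 1]
        return total / n

    b0, b1, b2, b3 = (pwm(r) for r in range(4))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return {
        "L1": l1, "L2": l2, "L3": l3, "L4": l4,
        "hLcv": l2 / l1,    # L-CV (tau_2), analogous to hcv
        "hLskew": l3 / l2,  # L-skewness (tau_3)
        "hLkurt": l4 / l2,  # L-kurtosis (tau_4); key name is hypothetical
    }


# Toy "canopy heights" (m) for one cell; a symmetric sample gives L3 close to 0.
print(sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

L1 equals the arithmetic mean, and L2 equals half of Gini's mean difference, which is why the table describes L2 as similar to hsd.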
Methodology ¹ | Range of k | Method/Algorithm | Distance Metric | Variables | Clustering Index ² | Number of Optimal Clustering Schemes | Best Index Value ³ |
---|---|---|---|---|---|---|---|
1st | 2–50 | PAM or k-medoids | Euclidean | 13 | S | 13 | 13 |
2nd | 8–50 | Ward.D, Ward.D2, single, complete, average, McQuitty, median, centroid, k-means | Euclidean, maximum, Manhattan, Minkowski | 13 | CH | 468 | 8 |
2nd | 8–50 | (as above) | (as above) | 13 | FR | 468 | 7 |
2nd | 8–50 | (as above) | (as above) | 13 | KL | 468 | 15 |
2nd | 8–50 | (as above) | (as above) | 13 | S | 468 | 8 |
Sum | | 1; 9 (1st; 2nd) | 1; 4 (1st; 2nd) | 13 (1st; 2nd) | 1; 4 (1st; 2nd) | 1885 | 51 |
SN | Dist | Algorithm | Clustering Variables | Index | Best Index Value | Clusters | Mean of RSEs | StD of RSEs | p90 of RSEs | Mean of nPlots | StD of nPlots | 1-Plot Clusters * |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Man | ward.D | h50_Dens_X_Y | CH | 50.47 | 13 | 0.08 | 0.03 | 0.10 | 18.38 | 10.95 | 0 |
2 | Eucl | k-means | h50_X_Y | CH | 81.57 | 8 | 0.07 | 0.03 | 0.10 | 29.88 | 13.27 | 0 |
3 | Eucl | k-means | hmean | S | 0.61 | 14 | 0.10 | 0.10 | 0.11 | 17.07 | 8.95 | 0 |
4 | Eucl | PAM (1) | h95 | S | 0.60 | 14 | 0.08 | 0.05 | 0.12 | 18.00 | 7.19 | 1 |
5 | Man | ward.D | hmean_Dens | KL | 2002.22 | 14 | 0.10 | 0.10 | 0.12 | 17.07 | 10.31 | 0 |
6 | Eucl | PAM (1) | h50_Dens_X_Y | S | 0.28 | 30 | 0.10 | 0.04 | 0.12 | 8.07 | 4.05 | 1 |
7 | Eucl | PAM (1) | h25 | S | 0.59 | 13 | 0.09 | 0.06 | 0.12 | 18.08 | 9.37 | 0 |
8 | Eucl | ward.D | h50 | KL | 141.49 | 19 | 0.08 | 0.03 | 0.12 | 14.75 | 7.04 | 3 |
9 | Max | k-means | h50_X_Y | S | 0.31 | 15 | 0.09 | 0.04 | 0.13 | 15.93 | 8.40 | 0 |
10 | Eucl | average | h50_Dens_X_Y | KL | 261.42 | 25 | 0.09 | 0.04 | 0.13 | 11.19 | 8.18 | 4 |
11 | Eucl | ward.D | hLcv_hLskew | KL | 2013.08 | 17 | 0.09 | 0.03 | 0.13 | 14.88 | 8.68 | 1 |
12 | Eucl | k-means | h50 | CH | 715.99 | 19 | 0.11 | 0.08 | 0.13 | 13.22 | 6.61 | 1 |
13 | Eucl | k-means | hLcv | CH | 600.54 | 17 | 0.10 | 0.04 | 0.15 | 14.88 | 5.52 | 1 |
14 | Eucl | PAM (1) | hmean | S | 0.60 | 22 | 0.09 | 0.04 | 0.16 | 11.65 | 7.46 | 2 |
15 | Eucl | k-means | h75 | S | 0.58 | 26 | 0.12 | 0.11 | 0.17 | 9.52 | 4.34 | 1 |
16 | Eucl | k-means | hLcv_hLskew | CH | 188.71 | 34 | 0.12 | 0.07 | 0.17 | 7.21 | 3.70 | 1 |
17 | Eucl | McQuitty | h75 | FR | 13781.53 | 36 | 0.12 | 0.10 | 0.17 | 7.77 | 4.73 | 6 |
18 | Eucl | PAM (1) | h75 | S | 0.58 | 33 | 0.10 | 0.04 | 0.18 | 8.21 | 4.52 | 5 |
19 | Eucl | PAM (1) | h50 | S | 0.60 | 31 | 0.11 | 0.05 | 0.18 | 8.56 | 4.43 | 4 |
20 | Eucl | k-means | hmean | FR | 2864.02 | 31 | 0.13 | 0.13 | 0.18 | 8.17 | 4.87 | 2 |
21 | Max | complete | h50_X_Y | CH | 77.08 | 33 | 0.12 | 0.06 | 0.19 | 7.65 | 3.82 | 2 |
22 | Eucl | complete | h50 | S | 0.64 | 35 | 0.13 | 0.07 | 0.19 | 8.03 | 4.35 | 6 |
23 | Eucl | average | hmean_hsd | KL | 574.46 | 26 | 0.11 | 0.07 | 0.19 | 12.83 | 11.75 | 8 |
24 | Eucl | PAM (1) | hmean_Dens | S | 0.40 | 41 | 0.12 | 0.05 | 0.20 | 6.54 | 3.99 | 6 |
25 | Eucl | PAM (1) | h50_Density | S | 0.43 | 42 | 0.12 | 0.07 | 0.20 | 6.36 | 3.91 | 6 |
26 | Max | median | hmean_Dens | KL | 540.85 | 28 | 0.13 | 0.07 | 0.23 | 12.78 | 18.02 | 9 |
27 | Eucl | k-means | h50_X_Y | S | 0.32 | 30 | 0.12 | 0.08 | 0.23 | 7.97 | 3.36 | 0 |
28 | Eucl | single | h95 | KL | 168.80 | 23 | 0.14 | 0.12 | 0.23 | 13.71 | 22.48 | 6 |
29 | Eucl | single | hmean | FR | 6240.10 | 32 | 0.13 | 0.12 | 0.24 | 9.63 | 8.72 | 8 |
30 | Eucl | median | h50_X_Y | CH | 41.68 | 26 | 0.14 | 0.09 | 0.24 | 13.53 | 11.04 | 9 |
31 | Eucl | PAM (1) | hLcv | S | 0.65 | 34 | 0.14 | 0.09 | 0.25 | 7.28 | 3.53 | 2 |
32 | Eucl | single | hmean | CH | 1265.23 | 36 | 0.14 | 0.12 | 0.27 | 8.25 | 8.11 | 8 |
33 | Eucl | ward.D | hLcv | S | 0.67 | 39 | 0.16 | 0.10 | 0.27 | 6.41 | 3.69 | 2 |
34 | Eucl | median | hLcv | FR | 7872.32 | 39 | 0.16 | 0.12 | 0.32 | 6.56 | 4.95 | 3 |
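The summary columns of the table above (Mean, StD, and p90 of RSEs; Mean and StD of nPlots; 1-plot clusters) can be sketched as follows. This Python example assumes that the direct estimate for each cluster is the sample mean of its plots with SE = s/√n (simple random sampling, no finite population correction), that RSEs are expressed as ratios as in the table, and that one-plot clusters are excluded from the RSE statistics because their SE cannot be estimated. These are assumptions, not a reproduction of the estimator in Section 2.3.7; `summarize_scheme` and the plot data are hypothetical, and at least two clusters with two or more plots are needed.

```python
import math
import statistics
from collections import defaultdict


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (the rule used by R's default quantile type 7)."""
    pos = (len(sorted_vals) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize_scheme(plot_values, cluster_ids):
    """Per-cluster direct estimates summarized as in the results table (assumed estimator)."""
    by_cluster = defaultdict(list)
    for y, c in zip(plot_values, cluster_ids):
        by_cluster[c].append(y)

    rses, n_plots, one_plot = [], [], 0
    for values in by_cluster.values():
        n_plots.append(len(values))
        if len(values) < 2:  # SE cannot be estimated from a single plot
            one_plot += 1
            continue
        mean = statistics.mean(values)
        se = statistics.stdev(values) / math.sqrt(len(values))  # SRS, no fpc
        rses.append(se / mean)

    rses.sort()
    return {
        "clusters": len(by_cluster),
        "mean_rse": statistics.mean(rses),
        "sd_rse": statistics.stdev(rses),
        "p90_rse": percentile(rses, 0.90),
        "mean_nplots": statistics.mean(n_plots),
        "sd_nplots": statistics.stdev(n_plots),
        "one_plot_clusters": one_plot,
    }


# Hypothetical plot-level volumes (m³/ha) and cluster labels for one scheme.
volumes = [100, 120, 80, 300, 310, 290, 305, 50]
clusters = ["a", "a", "a", "b", "b", "b", "b", "c"]
print(summarize_scheme(volumes, clusters))
```

Ranking schemes by `p90_rse`, as the table is ordered, favors schemes in which even the less precise clusters still have acceptable relative standard errors.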
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Georgakis, A.; Gatziolis, D.; Stamatellos, G. A Primer on Clustering of Forest Management Units for Reliable Design-Based Direct Estimates and Model-Based Small Area Estimation. Forests 2023, 14, 1994. https://doi.org/10.3390/f14101994