Finding Representative Points in Multivariate Data Using PCA
Authors:
Ashwinkumar Ganesan,
Tim Oates,
Matt Schmill
Abstract:
The idea of representation has been used in various fields of study, from data analysis to political science. In this paper, we define representativeness and describe a method to isolate data points that can represent the entire data set. We also show how the minimum set of representative data points can be generated. We use data from GLOBE (a project to study the effects on land change based on a set of parameters that includes temperature, forest cover, human population, atmospheric parameters and many other variables) to test and validate the algorithm. Principal Component Analysis (PCA) is used to reduce the dimensionality of the multivariate data set so that the representative points can be generated efficiently, and the representativeness of the resulting points is compared against random sampling of points from the data set.
Submitted 18 October, 2016;
originally announced October 2016.
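The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea (reduce dimensions with PCA, pick representative points, compare against random sampling), not the authors' method. It rests on assumptions that are not in the source: a point is taken to "represent" every point within a fixed `radius` of it in the reduced space, the representatives are picked by a greedy set-cover heuristic (which approximates, but does not guarantee, a minimum set), and the data are synthetic. The function names `pca_project`, `coverage`, and `greedy_representatives` are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def coverage(Z, chosen, radius):
    """Fraction of points within `radius` of at least one chosen point.

    ASSUMPTION: this radius-based notion of representativeness is
    illustrative only; the paper defines its own measure.
    """
    d = np.linalg.norm(Z[:, None, :] - Z[chosen][None, :, :], axis=2)
    return float((d.min(axis=1) <= radius).mean())

def greedy_representatives(Z, radius):
    """Greedily pick points until every point lies within `radius` of a pick.

    Greedy set cover: a heuristic, not a guaranteed minimum set.
    """
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    near = d <= radius                      # near[i, j]: i represents j
    uncovered = np.ones(len(Z), dtype=bool)
    chosen = []
    while uncovered.any():
        # Count how many still-uncovered points each candidate would cover.
        gains = (near & uncovered).sum(axis=1)
        best = int(gains.argmax())
        chosen.append(best)
        uncovered &= ~near[best]
    return chosen

# Synthetic stand-in for a multivariate data set (not GLOBE data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = pca_project(X, n_components=2)

reps = greedy_representatives(Z, radius=0.5)
random_pick = rng.choice(len(Z), size=len(reps), replace=False)

print("representatives chosen:", len(reps))
print("coverage (greedy):", coverage(Z, reps, 0.5))
print("coverage (random, same size):", coverage(Z, random_pick, 0.5))
```

The greedy selection covers every point by construction, so the informative comparison is how much of the data set a random sample of the same size leaves uncovered under the same assumed radius.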
Contextualizing the global relevance of local land change observations
Authors:
N. R. Magliocca,
E. C. Ellis,
T. Oates,
M. Schmill
Abstract:
To understand global changes in the Earth system, scientists must generalize globally from observations made locally and regionally. In land change science (LCS), local field-based observations are costly and time-consuming, and generally obtained by researchers working at disparate local and regional case-study sites chosen for different reasons. As a result, global synthesis efforts in LCS tend to be based on non-statistical inferences subject to geographic biases stemming from data limitations and fragmentation. Thus, a fundamental challenge is the production of generalized knowledge that links evidence of the causes and consequences of local land change to global patterns and vice versa. The GLOBE system was designed to meet this challenge. GLOBE aims to transform global change science by enabling new scientific workflows based on statistically robust, globally relevant integration of local and regional observations using an online social-computational and geovisualization system. Consistent with the goals of Digital Earth, GLOBE has the capability to assess the global relevance of local case-study findings within the context of over 50 global biophysical, land-use, climate, and socio-economic datasets. We demonstrate the implementation of one such assessment - a representativeness analysis - with a recently published meta-study of changes in swidden agriculture in tropical forests. The analysis provides a standardized indicator to judge the global representativeness of the trends reported in the meta-study, and a geovisualization is presented that highlights areas for which sampling efforts can be reduced and those in need of further study. GLOBE will enable researchers and institutions to rapidly share, compare, and synthesize local and regional studies within the global context, as well as contributing to the larger goal of creating a Digital Earth.
Submitted 25 July, 2013;
originally announced July 2013.