7423 CH 10
C. Jarvis
10.1 INTRODUCTION
Figure 10.1 The general nature of virtual organizations in the ‘Grid’ (after Foster et al.1).
More recently it has been suggested that ‘The ‘Grid’ … aims to provide an
infrastructure that enables flexible, secure, co-ordinated resource sharing among
dynamic collections of individuals, institutions and resources’1. This still
encompasses issues regarding computational systems and data storage, but is a
broader definition stressing collaborative (scientific) enterprise and transient virtual
organizations. These last points are critical. This rationale is a superset
encompassing both the earlier arguments in favor of intensive computing and also a
vision of the Grid’s potential to encourage changes to the very practice of science
itself. Adopting this wider stance, Table 10.1 highlights just a few of the areas in
which a Grid-enabled GIS might offer advantages over the status quo.
Table 10.1 Potential opportunities enabled by incorporating GIScience technologies within
Grid-enabled systems
Data (upper left quadrant)
• Finding appropriate data sets automatically
• Access to large data sets without downloading them completely, reducing data redundancy
• A potential means of linking data held at multiple organizations
• Mobile and real time sensors as input
  o Providing update through new observation
  o Requiring new computation of models
  o To give information to decision makers
Virtual organizations (upper right quadrant)
• A new way of carrying out integrative modelling experiments across multiple sites
• A means of bringing together elements of GI applications that plays to the strengths of
  individual researchers who are freed by access to appropriate interfaces
• A more equitable resourcing outcome, both for researchers and governments?
Models and modelling (lower left quadrant)
• Access to models too complex to run at the majority of locations
• A means of linking multiple models without overloading one computer system
• A means of linking models developed at multiple sites without the collocation of
  individuals or software code
• Data mining for associations/associated models
• Computing power to evaluate sensitivity of simulation models/evaluate uncertainties in
  approach
Visualization for control, monitoring and decision-making (lower right quadrant)
• Interactive, multi-site visualizations to allow discussions of emerging phenomena and to
  support multi-user decisions
• Multiple views based on a similar modelling flow, for example researchers, farmers,
  advisors and policy makers
• Visualization methods that might assist with the monitoring of Grid processing
Turning first to the left-hand quadrants of Table 10.1, the practical computational
challenges of processing increasing volumes of satellite and other data, and of
modelling inter-linked critical processes at global and regional scales, are
perennial issues. The pooling of available computer resources
across international and institutional boundaries has the potential to allow us to
pursue previously intractable questions, reduce redundancy in data archives,
process uncertainty bounds on simulation runs and explore geographically localized
models2. The use of computational Grids for the processing of remotely sensed data,
for example, has seen early progress3,4. Alternatively, Grid services could be used to
speed up applied models to provide more responsive ‘real-time’ risk assessments5.
E-science technologies also offer the possibility of drawing on expertise, data,
knowledge and models in-situ in different parts of the world, opening opportunities
for increased interdisciplinary collaboration and a richer set of research and
socio-political perspectives. Such collaboration may be deductive, or inductive through
the data mining opportunities that the Grid further facilitates. Grid services of the
future should, for example, be able to find appropriate GIS models, functions and
data dynamically, a considerable step forward from the currently used Web
Services model.
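The earlier point about processing uncertainty bounds on simulation runs with pooled resources can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from the cited work: `toy_model`, the nominal input values and the local thread pool (standing in for remote Grid resources) are all hypothetical.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def toy_model(params):
    """Hypothetical stand-in for a simulation model (not a real dispersion model)."""
    traffic, wind_speed = params
    return traffic / (1.0 + wind_speed)


def sample_parameters(n_runs, seed=0):
    """Draw perturbed inputs around assumed nominal values (illustrative numbers only)."""
    rng = random.Random(seed)
    return [(rng.gauss(100.0, 10.0), abs(rng.gauss(3.0, 0.5))) for _ in range(n_runs)]


def uncertainty_bounds(n_runs=200, workers=4):
    """Run an ensemble and return (5th percentile, median, 95th percentile)."""
    samples = sample_parameters(n_runs)
    # The thread pool stands in for pooled remote resources; each run is independent.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = sorted(pool.map(toy_model, samples))
    cuts = statistics.quantiles(outputs, n=20)  # 19 cut points at 5% steps
    return cuts[0], statistics.median(outputs), cuts[-1]


if __name__ == "__main__":
    low, mid, high = uncertainty_bounds()
    print(f"5%: {low:.1f}  median: {mid:.1f}  95%: {high:.1f}")
```

Because every ensemble member is independent, the same pattern could in principle be spread over many machines; the cost of moving inputs and outputs between sites is ignored here.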
To put these possibilities in context, consider a Grid approach for
management and research regarding the causes and effects of urban atmospheric
pollution from traffic. Figure 10.2 identifies just some of the databases, automated
sensors, computing power, models (geographical and non-geographical) and
expertise that are associated with these tasks. Many of these resources are currently
unconnected, either in terms of easy human access or web services, let alone via a
Grid; the bold lines in Figure 10.2 illustrate potential new connections across a Grid
network. At present, an efficient flow of digital information to support, for
example, management of risk to asthmatics from localized extreme episodes or
responses to the threat of an impending critical episode is hampered by cross-
institutional and cross-disciplinary barriers. The types of entity that might form a
virtual organization in this case vary considerably in nature; the hospital expertise
and patient data of Figure 10.2 require strong controls on the access to personal
data to be in place6, while sensor and meteorological data are less restricted.
Work on Grid accessibility to this second type of data set is consequently more
advanced, for example through projects such as the ‘NERC DataGrid’ (see
http://www.bodc.ac.uk/projects/ndg.html). Similar contrasts may be identified
between research and public service organizations, where progressing Grid services
is understandably more in keeping with the former at this early stage.
Hypothetically, advantages of adopting a Grid approach in this application area can
be identified from all quadrants of Table 10.1. This is just one very brief
snapshot of the potential of cross-disciplinary and cross-institutional Grid
computing in the service of an application area; more details may be found
elsewhere5. Examples of on-going Grid work that incorporates GIS or remotely
sensed data and/or functions and perspectives may be found in a diverse range of
subject areas connected with environmental decision-making, such as climate
modelling7, land-use change8 and hydrological modelling2 among others.
Figure 10.2 Inter-connected Grid resources for management and research regarding the causes and
effects of atmospheric pollution.
Before applying Grid-enabled GIS to science and decision making, however,
we need to establish how close we really are to practicing GIS technologies on the
Grid. The reality is that many developments in computer science will be required if
data access, model integration and computing power are to be available and
harnessed in a seamless and secure fashion. Figure 10.3 suggests a development
profile for Grid utilization in environmental science; currently, practice is moving
into the second stage but retains a data, as opposed to service, bias7,9 that still also
exists at stage one. Thus, we should not lose sight of the fact that using the Grid to
support GIS applications currently requires considerable computing expertise on
the part of developers; the average GIS user is a long way from logging on to the
Grid in the same way that he or she logs on to a PC and searches the web.
This chapter focuses on the technical and indeed cultural aspects of GIScience
that might be further developed such that ‘doing’ interdisciplinary collaborative
work that incorporates GIS across the Grid is both seamless and straightforward in
the years to come. In other words, as Grid technologies mature, what does
GIScience need to research in order that GridGIS functionality will be available to
researchers and even to users who might not necessarily know that GIS
technologies are serving their requests? Issues of particular current importance in
meeting this goal are outlined in the right hand panel of Figure 10.3, and include
further research regarding the linked themes of metadata and ontologies, distributed
processing and federated databases. Work to assist users in managing remote data
and processes intelligently is also relatively immature in GIS10, while the area of …
… geography will interact with models and geographical data in different ways, as
will decision makers. The question as to whether it is valuable to attempt to
concatenate local ontologies into global super-sets must be opened for debate, as
must the wisdom of adopting a hierarchical approach24 to ontology building. For
flexibility, given the number of permutations in ontology likely to arise when
working in a global, interdisciplinary Grid context, it may be that pursuing methods
to bridge ontologies through dynamic negotiation according to context will be a
more fruitful avenue of research. Furthermore, incorporating changing contexts or
perceptions within ontologies will be a necessary challenge, given that no ontology
can ever be considered complete and immutable.
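One way to picture bridging ontologies according to context, rather than merging them into a global super-set, is the Python sketch below. It is only an illustration: the class `OntologyBridge`, the ontology names, the terms and the contexts are all invented for this example and do not come from any existing ontology or toolkit.

```python
class OntologyBridge:
    """Hypothetical store of term equivalences agreed between two ontologies."""

    def __init__(self):
        # Key: (source ontology, target ontology, context or None, source term)
        self._mappings = {}

    def agree(self, source, target, context, term, equivalent):
        """Record a negotiated mapping; context=None marks a general default."""
        self._mappings[(source, target, context, term)] = equivalent

    def translate(self, source, target, context, term):
        """Prefer a context-specific mapping, then the default, else None."""
        specific = self._mappings.get((source, target, context, term))
        if specific is not None:
            return specific
        return self._mappings.get((source, target, None, term))


if __name__ == "__main__":
    bridge = OntologyBridge()
    # Invented terms: a general mapping and one that applies in a single context.
    bridge.agree("farm-survey", "land-cover-map", None, "pasture", "grassland")
    bridge.agree("farm-survey", "land-cover-map", "flood-modelling",
                 "pasture", "short vegetation")
    print(bridge.translate("farm-survey", "land-cover-map", "flood-modelling", "pasture"))
    print(bridge.translate("farm-survey", "land-cover-map", "policy", "pasture"))
```

A `None` result signals that no bridge has yet been agreed for that term, which fits the view that no ontology, and no set of bridges, is ever complete; new mappings can be added with `agree` as contexts change.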
10.2.1.2 Towards the ‘Invisible’ Grid: Accessing and Scheduling GIS Procedures
As noted above, the average user of a GIS will not wish to grapple with many
of the technical issues involved in Grid computing. The aim must rather be one of
‘invisible computing’, where the tools ‘fit the person and tasks so well, are
sufficiently unobtrusive and inter-connectivity seamless, that the technological
details become virtually invisible compared to the task’25. Such an aim can only be
achieved by identifying and implementing appropriate Grid tools for geographical
contexts. This theme links with the intelligent GIS discussed below, but also
incorporates the more practical aspects of enabling and scheduling GI procedures.
Examples of geographical tools that will be desirable if we are to maximize the
potential of the Grid include a comprehensive and accessible set of web services for
GI functions that match those available in current GIS and beyond, and which
dovetail with Grid middleware. Additionally, the creation of toolkits and
frameworks that simplify model development for the Grid, such that the current
extra effort in wrapping a model as a Grid service is removed, might do much to
make Grid computing a viable alternative for modellers5.
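As a rough illustration of what such a toolkit might offer, the Python sketch below registers an ordinary function as a discoverable service with one decorator. Everything here is hypothetical: `SERVICE_REGISTRY` is an in-memory stand-in for a service directory, and `grid_service`, `find_services` and `buffer_area` are invented names, not part of any Grid middleware.

```python
import inspect
import math

SERVICE_REGISTRY = {}  # hypothetical in-memory stand-in for a service directory


def grid_service(name, keywords):
    """Decorator that registers a plain function together with descriptive metadata."""
    def register(func):
        SERVICE_REGISTRY[name] = {
            "callable": func,
            "keywords": set(keywords),
            "parameters": list(inspect.signature(func).parameters),
        }
        return func  # the function itself is left unchanged
    return register


@grid_service("buffer_area", keywords=["buffer", "distance"])
def buffer_area(radius_m):
    """Area in square metres of a circular buffer around a point."""
    return math.pi * radius_m ** 2


def find_services(keyword):
    """Return the names of registered services tagged with the keyword."""
    return sorted(n for n, meta in SERVICE_REGISTRY.items() if keyword in meta["keywords"])


if __name__ == "__main__":
    for service_name in find_services("buffer"):
        entry = SERVICE_REGISTRY[service_name]
        print(service_name, entry["parameters"], entry["callable"](100.0))
```

The modeller writes only the function body; the metadata needed for discovery is collected as a side effect, which is the kind of reduced wrapping effort the text has in mind. Security, transport and remote execution are deliberately left out.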
A wide range of methods for specifying the processing sequence, or ‘workflow’,
that will collate and order services is currently under investigation throughout the
Grid literature26. Scheduling algorithms that distribute the modelling tasks specified
in the workflow across multiple machines are a fundamental component of
developing the Grid from a computer science perspective. This distribution will
vary according to the geographical and temporal configuration of the task and
resources available at any one point in time. Investigation of how these scheduling
algorithms support spatial processing in particular will be useful; both previous
research ‘parallelizing’ GI tasks27 and more recent Grid-focused work28,29 suggest
that optimizing the way in which geographical modelling tasks are decomposed and
scheduled over multiple machines may be specific to the spatial context. Indeed,
understanding the changing space-time geographies of the Grid itself is likely to
prove an interesting research area, since ‘data “locality” can seriously affect
performance’30.
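The interplay between a workflow and a locality-aware scheduler can be sketched in a few lines of Python. This is a simplified illustration, not one of the scheduling algorithms from the cited literature: the task names, the machine names and the greedy placement rule are assumptions, and the standard-library `graphlib` module (Python 3.9 or later) supplies the dependency ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "load_emissions": set(),
    "load_weather": set(),
    "dispersion_model": {"load_emissions", "load_weather"},
    "risk_map": {"dispersion_model"},
}

# Hypothetical placement of input data sets on machines.
DATA_LOCATION = {"load_emissions": "site_a", "load_weather": "site_b"}
MACHINES = ["site_a", "site_b"]


def schedule(workflow, data_location, machines):
    """Greedily assign each task, in dependency order, to a machine.

    A task tied to a data set runs where that data is held. Any other task runs
    where most of its predecessors ran, with ties going to the least loaded machine.
    """
    load = {m: 0 for m in machines}
    placement = {}
    for task in TopologicalSorter(workflow).static_order():
        if task in data_location:
            machine = data_location[task]
        else:
            # Count how many predecessor results each machine already holds.
            local = {m: 0 for m in machines}
            for dep in workflow[task]:
                local[placement[dep]] += 1
            machine = max(machines, key=lambda m: (local[m], -load[m]))
        placement[task] = machine
        load[machine] += 1
    return placement


if __name__ == "__main__":
    for task, machine in schedule(WORKFLOW, DATA_LOCATION, MACHINES).items():
        print(f"{task} -> {machine}")
```

The rule keeps computation close to the data it needs, reflecting the remark that data ‘locality’ can seriously affect performance. A scheduler tuned for spatial tasks would also need to consider how a geographical extent is decomposed, which this sketch does not attempt.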
10.3 CONCLUSIONS
… GIScience. This chapter began by noting two definitions of Grid computing. The
significance of the second definition, which stresses the virtual organization, requires
yet stronger emphasis if progress towards doing GIS over the Grid is not to be
thwarted, since there may be a lack of immediately applicable ‘big’ compute-intensive
applications. The Grid’s potential to empower through the use of remote resources and
interdisciplinary communication must also be further evaluated if Grid GIS is to
prosper. Linked to this theme, we also need to keep a careful watch on issues of
democracy in research, security, intellectual property and privacy when moving
more closely towards such a component-based, global digital research world.
The Grid has the potential to provide the technical support for exciting new
developments in relevant, global geographies for the 21st Century. However, in
closing, it is important to note that empirical geography supported by the Grid must be
matched by developments in theory, particularly in relation to how we integrate
across scales and between disciplines, if we are not simply to achieve a faster ‘old’
geography or a collection of small components pushed together instead of a
dynamic new version.
10.4 REFERENCES
1. Foster, I., Kesselman, C., and Tuecke, S., The anatomy of the Grid: Enabling scalable virtual
organisations, International Journal of Supercomputer Applications, 15, 200-222, 2001.
2. Beven, K. J., On environmental models everywhere on the Grid, Hydrological Processes, 17, 171-174,
2003.
3. Aloisio, G. and Cafaro, M., A dynamic earth observation system, Parallel Computing, 29, 1357-1362,
2003.
4. Shen, Z., Luo, J., Zhou, C., Cai, S., Zheng, J., Chen, Q., Ming, D., and Sun, Q., Architecture design of
grid GIS and its applications on image processing based on LAN, Information Sciences, 166, 1-17,
2004.
5. Mineter, M. J., Dowers, S., Skouloudis, A. N., and Jarvis, C. H., Towards use of grids in
environmental research, management and policy, International Journal of Environment and Pollution,
20, 297-308, 2003.
6. Hartswood, M., Ho, K., Procter, R., Slack, R., and Voss, A., Etiquettes of data sharing in healthcare
and healthcare research, in Proceedings of the 1st International Conference of E-Social Science,
Manchester, UK, 2005.
7. Chervenak, A., Deelman, E., Kesselman, C., Allcock, B., Foster, I., Nefedova, V., Lee, J., Sim, A.,
Shoshani, A., and Drach, B., High-performance remote access to climate simulation data: a challenge
problem for data grid technologies, Parallel Computing, 29, 1335-1356, 2003.
8. Edwards, P., Preece, A., Pignotti, E., Polhill, G., and Gotts, N., Lessons learnt from deployment of a
social simulation tool to the semantic Grid, in Proceedings of the 1st International Conference of
E-Social Science, Manchester, UK, 2005.
9. Ananthanarayan, A., Balachandram, R., Grossman, R., Gu, Y., Hong, X., Levera, J., and Mazzucco,
M., Data webs for earth science data, Parallel Computing, 29, 1363-1379, 2003.