634-639
ISSN 2320–088X
REVIEW ARTICLE
Abstract - A grid is a hardware and software infrastructure that allows service-oriented, flexible, and seamless
sharing of a diverse network of resources. A grid is used to compute data-intensive tasks and provides faster throughput
and scalability at lower cost. The aim of grid computing is to provide an affordable approach to large-scale
computing problems. The geographically isolated computational resources combined within a grid can be viewed as a
virtual supercomputer. This paper describes the various types of grid, grid architecture, OGSA, fault tolerance, load
balancing, and various challenges in grid computing.
Keywords — Grid Computing, Virtual Organizations, OGSA, Fault Tolerance, Load Balancing
I. INTRODUCTION
Grid computing is a parallel architecture in which resources shared across a network function like one large
supercomputer. A grid allows unused CPU capacity on the participating machines to be allocated to a single application
that is computation-intensive and programmed for parallel processing. Grid computing combines aspects of peer-to-peer
computing and distributed computing. It offers yet another way of sharing computing resources and improves efficiency
in both time and speed. Grid computing enables multiple applications to share computing infrastructure, resulting in
greater scalability, flexibility, cost and power efficiency, performance, and availability at the same time [1].
In 1998, Ian Foster and Carl Kesselman [2] provided a first definition: “A computational grid is a hardware and software
infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational
capabilities.” In a later article [9], the authors refined the definition to address social and policy issues, stating
that Grid computing is concerned with “coordinated resource sharing and problem solving in dynamic, multi-institutional
virtual organizations.” The key idea is the ability to negotiate resource-sharing arrangements among a set of
participating parties (providers and consumers) and then to use the resulting resource pool for some purpose. Grid
computing is an emerging technology in which computational and information resources are shared and managed by various
organizations in widespread locations (virtual organizations). Grid computing offers valuable services to work groups
such as scientific researchers, airplane designers, and drug-research firms. In grid environments, resources are usually
owned by different people, communities, or organizations with varied administrative policies. Managing resources in
such a distributed environment is a complex task because of the geographical distribution and dynamic behavior of the
resources. The complexity of the grid environment is growing with the number of projects and applications in this
domain. The demands exerted on a grid are highly dynamic in nature and vary from application to application. For
example, one application may need a large number of CPUs while another is memory-intensive. This makes it difficult to
identify idle resources and to map resources to applications. In such scenarios it is important that grids be managed
by a well-defined, scalable resource management system.
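The resource-to-application mapping problem described above can be illustrated with a minimal sketch. It assumes a
greedy first-fit policy and hypothetical `Resource` and `Job` records with only CPU and memory attributes; none of
these names come from a real grid middleware, and a production resource manager would also account for policies,
queues, and changing availability.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical description of a site's idle capacity.
    name: str
    free_cpus: int
    free_memory_gb: int


@dataclass
class Job:
    # Hypothetical description of an application's demands.
    name: str
    cpus: int
    memory_gb: int


def match_jobs(jobs, resources):
    """Greedy first-fit: give each job the first resource with enough idle capacity."""
    placement = {}
    for job in jobs:
        for res in resources:
            if res.free_cpus >= job.cpus and res.free_memory_gb >= job.memory_gb:
                res.free_cpus -= job.cpus
                res.free_memory_gb -= job.memory_gb
                placement[job.name] = res.name
                break
        else:
            placement[job.name] = None  # no resource can host this job right now
    return placement


resources = [
    Resource("site-a", free_cpus=64, free_memory_gb=32),
    Resource("site-b", free_cpus=8, free_memory_gb=512),
]
jobs = [
    Job("cpu-heavy", cpus=48, memory_gb=16),
    Job("memory-heavy", cpus=4, memory_gb=256),
    Job("too-large", cpus=128, memory_gb=8),
]
placement = match_jobs(jobs, resources)
print(placement)
```

In this example the CPU-intensive job lands on the site with many idle CPUs, the memory-intensive job on the site with
a large memory pool, and the oversized job stays unplaced, which is the situation a scalable resource manager has to
detect and handle.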
Grid Types
Computing grid: This grid is designed to provide as much computing power as possible. This environment usually
provides services for submitting, monitoring, and managing jobs. In a computational grid, most machines are
high-performance servers.
Data grid: A data grid stores data and provides reliable access to it across multiple organizations. It manages the
physical data storage, data access policies, and security issues of the stored data.
Service grid: A service grid provides services that cannot be provided by any single machine. It connects users and
applications into workgroups and enables real-time interaction between users and applications through a virtual
workspace. Service grids include on-demand, collaborative, and multimedia grid systems.
Decentralized control: A grid exercises decentralized control over its resources and accommodates different
administration policies and local management systems within the grid.
High quality service: A grid provides a high quality of service in terms of performance, availability, and security.
C. Grid Architecture
Grid architecture is developed for the establishment, management, and cross-organizational sharing of resources within
a virtual organization. It identifies the basic parts of a grid, defines the roles of the components, and shows how the
components interact with one another.
Fig. 1 illustrates the layered grid architecture and its relationship to the Internet protocol architecture.
Fig. 1
Fabric Layer: It defines the resources which are shareable. It includes data storage, networks, catalogs and other
computational resources.
Connectivity Layer: It defines the core communication and authentication protocols needed for grid-specific
networking services.
Resource Layer: This layer uses the communication and security protocols defined by the Connectivity layer to control
the secure negotiation, initiation, monitoring, metering, accounting, and payment involved in sharing individual
resources.
Collective Layer: It is responsible for all global resource management and for interaction with collections of
resources. This protocol layer implements a wide variety of sharing behaviors using a small number of Resource layer
and Connectivity layer protocols.
Application Layer: These are user applications, which are built using the services defined at each lower layer. Such
an application can access the resources directly, or can access them through the Collective Service interface APIs
(Application Programming Interfaces).
Fig. 2
Fig. 3
Globus Toolkit. From 1997 onward, the open source Globus Toolkit version 2 (GT2) [7] emerged as the de facto
standard for Grid computing. Focusing on usability and interoperability, GT2 defined and implemented the protocols,
APIs, and services used in thousands of Grid deployments worldwide. By providing solutions to common problems such
as authentication, resource discovery, and resource access, GT2 sped up the construction of real Grid applications. The
GT2 protocol suite leveraged existing Internet standards for transport, resource discovery, and security. Some elements
of the GT2 protocol suite were codified in formal technical specifications, reviewed within standards bodies, and
instantiated in multiple implementations: notably, the GridFTP [8] data transfer protocol and elements of the Grid
Security Infrastructure. In general, however, GT2 “standards” were neither formal nor subject to public review. Similar
comments apply to other important Grid technologies that emerged during this period, such as high-throughput computing
systems.
Checkpoint-recovery and job replication are two techniques used for fault tolerance. Checkpointing depends on the
system’s MTTR (Mean Time to Repair), while replication depends on the availability of sites on which to run replicas.
Checkpointing: Checkpointing is a technique for providing fault tolerance on unreliable systems. A checkpoint is a
snapshot of the entire system state from which the application can be restarted after a failure. Checkpoints can be
stored periodically on temporary as well as stable storage [10]. Frequent checkpointing may increase the overhead,
while lazy checkpointing may lead to the loss of a significant amount of computation. Deciding on the length of the
checkpointing interval and on the checkpointing technique is therefore a complicated task. The various types of
checkpointing optimization are:
Full checkpointing or incremental checkpointing
Unconditional periodic checkpointing or optimal (dynamic) checkpointing
Synchronous (coordinated) or asynchronous (uncoordinated) checkpointing
Kernel-level, application-level, or user-level checkpointing
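As an illustration of checkpoint-recovery, the following minimal sketch shows full, unconditional periodic,
application-level checkpointing for a toy computation. The function name `run`, the `fail_at` parameter used to
simulate a crash, and the use of a pickled dictionary as the "entire state" are all assumptions made for this example;
real grid jobs have far larger state and typically rely on dedicated checkpointing libraries.

```python
import os
import pickle
import tempfile


def run(total_steps, interval, path, fail_at=None):
    """Sum 0..total_steps-1, saving a full checkpoint every `interval` steps.

    If a checkpoint file already exists at `path`, execution resumes from it.
    `fail_at` simulates a crash just before the given step is executed.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)  # recovery: restart from the last snapshot
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            # Write to a temporary file and rename it, so that a crash during
            # the write never leaves a partially written checkpoint behind.
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp, path)
    return state["acc"]


workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "job.ckpt")
try:
    run(total_steps=100, interval=10, path=ckpt, fail_at=57)
except RuntimeError:
    pass  # the job crashed; work done since the checkpoint at step 50 is lost
result = run(total_steps=100, interval=10, path=ckpt)  # resumes at step 50
print(result)
```

The `interval` argument exposes the trade-off discussed above: a smaller value means more checkpoint writes (higher
overhead), while a larger value means more steps must be recomputed after a failure.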
Replication: This technique is based on the assumption that a single resource is much more likely to fail than
multiple resources are to fail simultaneously. Unlike checkpointing, replication avoids task recomputation by
executing several copies of the same task on more than one compute station. Job replication and determining the
best number of replicas involve many technical concerns. Task replication in grids has been studied in [12].
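The assumption behind replication can be made concrete with a short calculation. The sketch below assumes that
replicas fail independently, each with the same probability `p_fail`; this independence assumption and the function
names are introduced here for illustration only and are not taken from [12]. Under that assumption, the probability
that at least one of n replicas completes is 1 - p_fail^n.

```python
def success_probability(p_fail, replicas):
    """Probability that at least one of `replicas` independent copies completes."""
    return 1.0 - p_fail ** replicas


def replicas_needed(p_fail, target):
    """Smallest number of replicas whose success probability reaches `target`."""
    if not (0.0 < p_fail < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p_fail and target must lie strictly between 0 and 1")
    n = 1
    while success_probability(p_fail, n) < target:
        n += 1
    return n


# Example: each replica fails with probability 0.2 and the task should
# complete with probability at least 0.99.
needed = replicas_needed(p_fail=0.2, target=0.99)
print(needed, success_probability(0.2, needed))
```

In practice failures on a grid are often correlated (for example, several replicas at one site), so a figure obtained
this way is best read as an optimistic lower bound on the number of replicas.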
© 2014, IJCSMC All Rights Reserved 637
Ankit Punia et al, International Journal of Computer Science and Mobile Computing, Vol.3 Issue.4, April- 2014, pg. 634-639
Static vs. dynamic replication: In static replication [10], a replica that fails is not replaced by another replica.
The number of replicas of the original task is decided before execution starts. In dynamic replication, new replicas
can be generated at run time.
Active vs. passive replication: In active replication, the state of the replicas is kept closely synchronized; the
replicas service the same requests in parallel and undergo the same state transitions [9]. In passive replication, a
primary replica services the client requests while the other replicas are kept on standby and can take over if the
primary fails [11].
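The difference between active and passive replication can be sketched as follows. The `Replica` class, whose whole
state is a single running total, and the functions `active_request` and `passive_request` are hypothetical names
chosen for this example; the sketch also assumes failures are detected instantly, which a real system cannot
guarantee.

```python
class Replica:
    """A toy replica whose entire state is one running total."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.total = 0

    def handle(self, amount):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.total += amount
        return self.total


def active_request(replicas, amount):
    """Active replication: every live replica executes the same request."""
    results = [r.handle(amount) for r in replicas if r.alive]
    if not results:
        raise RuntimeError("no live replica")
    if len(set(results)) != 1:
        raise RuntimeError("replicas diverged")
    return results[0]


def passive_request(replicas, amount):
    """Passive replication: the first live replica acts as the primary."""
    for index, primary in enumerate(replicas):
        if not primary.alive:
            continue  # skip failed replicas; the next standby takes over
        result = primary.handle(amount)
        # Push the primary's new state to the standbys behind it.
        for standby in replicas[index + 1:]:
            standby.total = primary.total
        return result
    raise RuntimeError("no live replica")


active = [Replica("a1"), Replica("a2"), Replica("a3")]
active_request(active, 5)
active[0].alive = False                      # the survivors already hold the state
after_active = active_request(active, 3)

passive = [Replica("p1"), Replica("p2")]
passive_request(passive, 5)                  # p1 is primary, p2 receives the update
passive[0].alive = False                     # the primary fails
after_passive = passive_request(passive, 3)  # p2 takes over from total 5
print(after_active, after_passive)
```

Both groups reach the same result after a failure. The cost differs: active replication executes every request on
every replica, while passive replication executes it once and pays for state transfer to the standbys.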
LOAD BALANCING
A load balancing feature must always be integrated into the system to avoid processing delays and overcommitment of
resources. Such applications are built with schedulers and resource managers. The workload can be pushed to resources
based on their availability, or resources can pull jobs from the schedulers depending on their own availability. Load
balancing involves partitioning jobs, identifying resources, and queuing jobs. In some cases resource reservations
are needed, as is the ability to run multiple jobs in parallel. Load balancing also supports failure detection and
management. The grid model combines features of heterogeneous and homogeneous systems, and of different coupling
choices such as tightly coupled and loosely coupled systems. The main aim of load balancing techniques is to adapt the
distribution of load effectively to the surrounding environment.
Zaki et al. [15] take differing CPU execution speeds into account and balance jobs accordingly. Hendrickson and
Devine [14] evaluate several major classes of dynamic load balancing (DLB) techniques. For heterogeneous systems, they
focus on how the processors are interconnected and on differences in execution time.
Load balancing algorithms are defined by their implementation of the following policies [13]:
Information policy: specifies what load information is collected, when it is collected, and from where.
Location policy: uses the results of the resource type policy to find a suitable partner for a server or receiver.
Selection policy: defines the tasks that should be migrated from overloaded resources to the idlest ones.
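How the policies listed above fit together can be shown with a minimal sketch. It assumes a centralized balancer, tasks
described only by a positive cost, and a simplified location policy that pairs the most loaded node with the idlest
one instead of consulting a separate resource type policy. The `Node` class and all function names are hypothetical
and do not reproduce the algorithms of [13].

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    tasks: list = field(default_factory=list)  # each task is its cost (positive number)

    @property
    def load(self):
        return sum(self.tasks)


def information_policy(nodes):
    """Collect load information: here, a snapshot of every node's current load."""
    return {node.name: node.load for node in nodes}


def location_policy(loads):
    """Find a transfer pair: the most loaded node sends, the idlest one receives."""
    sender = max(loads, key=loads.get)
    receiver = min(loads, key=loads.get)
    return sender, receiver


def selection_policy(sender, receiver):
    """Pick the task to migrate: the largest one that still narrows the load gap."""
    gap = sender.load - receiver.load
    candidates = [t for t in sender.tasks if t < gap]
    return max(candidates) if candidates else None


def balance(nodes, max_rounds=100):
    """Repeat the three policies until no migration improves the balance."""
    by_name = {node.name: node for node in nodes}
    for _ in range(max_rounds):
        loads = information_policy(nodes)
        sender, receiver = location_policy(loads)
        task = selection_policy(by_name[sender], by_name[receiver])
        if task is None:
            break
        by_name[sender].tasks.remove(task)
        by_name[receiver].tasks.append(task)
    return information_policy(nodes)


nodes = [Node("a", [5, 3, 2, 4]), Node("b", [1]), Node("c", [])]
final_loads = balance(nodes)
print(final_loads)
```

Moving only tasks whose cost is smaller than the load gap guarantees that each migration reduces the difference between
the two nodes involved, so the loop cannot oscillate; `max_rounds` is an extra safeguard.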
REFERENCES
[1] H. Kargupta, C. Kamath, and P. Chan, Distributed and Parallel Data Mining: Emergence, Growth, and Future
Directions, In: Advances in Distributed and Parallel Knowledge Discovery.
[2] Ian Foster and Carl Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure, 1st edition, Morgan
Kaufmann Publishers, San Francisco, USA (1 November 1998), ISBN: 1558604758.
[4] Kunszt, Peter Z. (April 2002). The Open Grid Services Architecture – A Summary and Evaluation,
http://edms.cern.ch/file/350096/1/OGSAreview.pdf.
[5] Dialani, V., Miles, S., Moreau, L., Roure, D.D. and Luck, M. (August 2002). Transparent Fault Tolerance for Web
Services Based Architectures. Proceedings of 8th International Europar Conference (EURO-PAR ’02),
Paderborn, Germany. Lecture Notes in Computer Science, Springer-Verlag.
[6] Zhang, W., Zhang, J., Ma, D., Wang, B. and Chen, Y. (2004). Key Technique Research on Grid Mobile Service.
Proceedings of the 2nd International Conference on Information Technology for Application (ICITA 2004),
Harbin, China.
[7] Foster, I., and Kesselman, C., Globus: A metacomputing infrastructure toolkit, International Journal of
Supercomputer Applications 11(2), 115–129, 1998.
[8] Allcock, W., Bester, J., Bresnahan, J., Chervenak, A., Liming, L., and Tuecke, S., GridFTP: Protocol Extension to
FTP for the Grid. Global Grid Forum, 2001.
[9] Y. Li, Z. Lan, “Exploit failure prediction for adaptive fault-tolerance in cluster”. In: Proceedings of the Sixth
IEEE International Symposium on Cluster Computing and the Grid, Vol. 1, May 2006, pp. 531-538.
[10] Oliner, A.J., Sahoo, R.K., Moreira, J.E., Gupta, M.: “Performance Implications of Periodic Checkpointing on
Large-Scale Cluster Systems”, In Proceedings of the 19th IEEE International Parallel and Distributed Processing
Symposium, Washington, 2005.
[11] Y. Wang, Y. H. Jin, W. Guo, W. Q. Sun, W. S. Hu and M. Y. Wu, (2007) “Joint Scheduling for Optical Grid
Applications”, Journal of Optical Networking, Vol. 6, pp. 304-318.
[12] S. Agarwal, R. Garg, M. Gupta, and J. Moreira, “Adaptive Incremental Checkpointing for Massively Parallel
Systems,” In Proceedings of the 18th Ann. International Conf. Supercomputing, Nov. 2004.
[13] Leinberger, W., G. Karypis, V. Kumar and R. Biswas, 2000. Load balancing across near-homogeneous
multi-resource servers. In 9th Heterogeneous Computing Workshop, pp. 60-71.
[14] Berman, F., G. Fox and Y. Hey, 2003. Grid Computing: Making the Global Infrastructure a Reality. Wiley
Series in Comm. Networking & Distributed Systems.
[15] Foster, I. and C. Kesselman, 1997. Globus: a metacomputing infrastructure toolkit. Intl. J. Supercomputer and
High Performance Computing Applications, 11: 115-128.