
Cloud Computing

Dr. Rajeev Kumar Gupta


Assistant Professor
Pandit Deendayal Energy University
Gandhinagar, Gujarat
Sequential Computer Architecture

Computer Performance
1) Processor speed
2) Bandwidth
3) Latency

Sequential Computer Architecture with
Cache Memory

Performance Improvement
 From 1986 to 2002, microprocessor performance increased at a dramatic
rate.

 Since then, single-processor performance improvement has largely stalled.

Problem
 Until then, performance had increased by increasing the density of the
transistors.

 But there are inherent problems:

 Smaller transistors = Faster processors
 Faster processors = Increased power consumption
 Increased power consumption = Increased heat
 Increased heat = Unreliable processors

An Intelligent Solution
 Move away from single-core systems to multicore processors
 "Core" = Processing unit
 Introduction of parallelism

 But……

 Adding more processors doesn't help much if programmers aren't aware
of them or don't know how to use them.
 Serial programs don't benefit from multicore processors (in most cases).

Serial Computing V/s Parallel Computing
 Serial computing: processing in which one task runs at a time and all tasks
are executed by the processor in sequence.
 Any operating system running on a single processor is an example of a serial
operating system.
 Parallel computing: processing in which multiple tasks are completed at a
time by different processors.
 In parallel processing, more than one processor is involved.

Parallel Computing
 It is a form of computation in which many calculations are carried
out simultaneously, operating on the principle that a large problem can
be divided into smaller ones, which are then solved concurrently (in
parallel).
 So we need to rewrite serial programs so that they are parallel.

 One option is to write translation programs that automatically convert
serial programs into parallel programs.
 They are very difficult to write.

 Success is limited.

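To make the basic idea concrete, here is a minimal Java sketch (the class and variable names are illustrative, not from the slides): a single large summation is split into two halves that run concurrently on separate threads, and the partial results are combined at the end.

```java
import java.util.Arrays;

// A large problem (summing an array) divided into smaller ones solved
// concurrently: each thread sums one half, and the results are combined.
public class ParallelSum {
    public static void main(String[] args) throws InterruptedException {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);

        long[] partial = new long[2];
        Thread t1 = new Thread(() -> {            // first half of the problem
            for (int i = 0; i < data.length / 2; i++) partial[0] += data[i];
        });
        Thread t2 = new Thread(() -> {            // second half, in parallel
            for (int i = data.length / 2; i < data.length; i++) partial[1] += data[i];
        });
        t1.start(); t2.start();
        t1.join(); t2.join();                     // wait for both halves

        System.out.println(partial[0] + partial[1]); // 1000000
    }
}
```

On a multicore processor the two threads can run on different cores; the equivalent serial loop gains nothing from extra cores, which is exactly the point made above.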
Parallel Computing Basic Idea

How Many?......

Connecting Multiple Computers

Advantages:
1) Messages can be exchanged among processors
2) Scalable
Disadvantages:
1) Latency and communication overhead
Why Distributed System

 In the early days (roughly 1940–1960), high-speed computing was
implemented only in supercomputers for scientific research.
 Tools and systems then became available to implement and create high-
performance computing systems.
 The main area of the discipline is developing solutions that can divide
a task into small independent parts that can be executed
simultaneously by separate processors/systems.
 HPC systems have shifted from supercomputers to distributed systems.

Distributed System
 A Distributed System (DS) is a collection of autonomous computer systems
that are physically separated but are connected by a computer
network that is equipped with distributed system software.

 In a distributed system, each device or system has its own processing
capabilities and may also store and manage its own data.

 The autonomous computers communicate with each other by
sharing resources and files and performing the tasks assigned to them.

 These devices or systems work together to perform tasks and share
resources, with no single device serving as the central hub.

 These systems can be tightly coupled (processors share memory) or loosely
coupled (every processor has its own memory module) computers that
work together so that they act as a single entity.

 Each autonomous system has a common application that can have its own
data that is shared via the centralized database system.

 To transfer data to the autonomous systems, the centralized system should
have a middleware service and should be connected to a network.

 Middleware services enable some services which are not present by default in the
local systems or the centralized system, by acting as an interface
between the centralized system and the local systems. By using the
components of middleware services, systems communicate and manage
data.
Middleware in DS
 Middleware is an intermediate layer of software that sits between the
application and the network.

 It is used in distributed systems to provide common services, such as
authentication, authorization, compilation for best performance on particular
architectures, input/output translation, and error handling.

6 Key Components Of Distributed
Architectures
 Nodes
 These are the individual computers or servers that make up the
distributed system. Each node runs its own instances of the
applications and services that make up the system.
 Network
 This is the communication infrastructure that connects all
nodes. The network enables data and information to be exchanged
between nodes.
 Middleware
 This software layer provides a programming model for developers
and masks the heterogeneity of the underlying network, hardware, and
operating systems. It provides useful abstractions and services which
simplify the process of creating complex distributed systems.
 Shared Data/Database
 This is where the system stores and retrieves data. Depending on the
specific design of the distributed system, the data could be distributed
across multiple nodes, replicated, or partitioned.

 Distributed Algorithms
 These are the rules and procedures nodes follow to communicate and
coordinate with each other. They enable nodes to work together, even
when some nodes fail or network connections are unreliable.

 System Management Tools


 These tools help manage, monitor, and troubleshoot the distributed
system. They provide functionality for load balancing, fault tolerance,
system configuration, and more.

Why DS?

Why Not DS
 Communication may fail.

 Writing a program to run in a distributed system is difficult.

Pros and Cons of DS

Characteristics of Distributed System

Architecture of DS
 Distributed architectures vary in the way they organize nodes, share
computational tasks, or handle communication between different parts of
the system.

1) Client-Server Architecture

Peer-to-Peer (P2P)
 The P2P architecture is a unique type of distributed system that operates
without centralized control.
 In this architecture, any node, also referred to as a peer, can function as
either a client or a server. When a node requests a service, it acts as a
client and when it offers a service, it's considered a server.

Advantages of P2P

Disadvantages of P2P

Remote Procedure Call (RPC)
 RPC is a communication technology that one program uses to request a
service from another program on a network without
needing to know the network's details.

 It is based on the client-server concept.

 An RPC, like a local procedure call, is a synchronous operation:
the requesting application is suspended until the remote
procedure returns its results.

RPC Architecture
RPC architecture has five main components:
1. Client
2. Client Stub
3. RPC Runtime
4. Server Stub
5. Server

How RPC Works?
 Step 1) The client, the client stub, and one instance of the RPC Runtime execute on the
client machine.

 Step 2) The client starts the call by invoking the client stub with parameters in the usual way.
The client stub packs (marshals) the parameters into a message within the client's
own address space and asks the local RPC Runtime to send it to the server stub.

 Step 3) The user thus accesses RPC as if making a regular local procedure
call. The RPC Runtime manages the transmission of messages across the network
between client and server. It also performs retransmission, acknowledgment,
routing, and encryption.

 Step 4) After the server procedure completes, it returns to the server stub, which
packs (marshals) the return values into a message. The server stub then sends the
message back to the transport layer.

 Step 5) The transport layer sends the result message back to the client
transport layer, which hands the message to the client stub.

 Step 6) Finally, the client stub demarshals (unpacks) the return parameters from
the resulting packet, and execution returns to the caller.
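The six steps above are exactly what Java RMI, Java's built-in RPC mechanism, automates. Below is a minimal sketch that runs the server and the client in a single JVM for brevity; the interface name, registry port, and binding name are illustrative choices, not fixed by RPC itself.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// The remote interface is the contract shared by the client and server stubs.
interface Calculator extends Remote {
    int add(int a, int b) throws RemoteException;
}

// The actual server procedure that will be executed remotely.
class CalculatorImpl implements Calculator {
    public int add(int a, int b) { return a + b; }
}

public class RpcDemo {
    public static void main(String[] args) throws Exception {
        // Server side: exporting the object creates the server-side plumbing;
        // RMI marshals parameters and return values behind the scenes.
        Calculator stub = (Calculator) UnicastRemoteObject.exportObject(new CalculatorImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("calc", stub);

        // Client side: obtain the client stub and call it like a local procedure.
        // The call is synchronous - it blocks until the remote result arrives.
        Calculator remote = (Calculator) LocateRegistry.getRegistry("localhost", 1099).lookup("calc");
        System.out.println(remote.add(2, 3)); // prints 5
    }
}
```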
Application Areas of Distributed
System
 Finance and Commerce: Amazon, eBay, Online Banking, E-Commerce
websites.
 Information Society: Search Engines, Wikipedia, Social Networking,
Cloud Computing.
 Cloud Technologies: AWS, Salesforce, Microsoft Azure, SAP.
 Entertainment: Online Gaming, Music, YouTube.
 Healthcare: Online patient records, Health Informatics.
 Education: E-learning.
 Transport and logistics: GPS, Google Maps.
 Environment Management: Sensor technologies.

Cluster Computing History
 The first inspiration for cluster computing was developed in the 1960s
by IBM as an alternative way of linking large mainframes to provide a
more cost-effective form of commercial parallelism.

 Cluster computing did not gain momentum until the convergence of
four important trends in the 1980s:

 high-performance microprocessors,

 high-speed networks,

 standard tools for high-performance distributed computing, and

 the increasing need of computing power for computational science and
commercial applications, coupled with the high cost and low
accessibility of traditional supercomputers.
Cluster Computing: Definition &
Components
 A cluster is a type of parallel or distributed computer system, which
consists of a collection of interconnected stand-alone computers
working together as a single integrated computing resource.
 The clusters are generally connected through fast local area
networks (LANs).

 In most circumstances, all of the nodes use the same
hardware and the same operating system, although in some
setups the nodes can be geographically distributed.

 In other setups, different operating systems can be used on each computer,
and/or different hardware.

Advantages
 The emergence of cluster platforms was driven by a number of
academic projects, such as Beowulf [2], Berkeley NOW [3], and
HPVM [4].

 Advantages of Cluster Computing:

 low-entry costs to access supercomputing-level performance,

 incrementally upgradeable system,

 open source development platforms, and

 vendor independence
Cluster Architecture
Cluster: Fast Interconnection
Technologies

How do we select a fast interconnection technology?

 Compatibility with the cluster hardware
 Compatibility with the OS
 Cost efficiency
 Good performance
 Support for virtualization
 Number of nodes supported
Cluster: Interconnection Technologies
Single System Image (SSI)
 Represents the view of a distributed system as a single unified computing resource.

 Provides better usability for the users as it hides the complexities of the underlying
distributed and heterogeneous nature of clusters from them.

 SSI can be established through one or several mechanisms implemented at various levels
of abstraction in the cluster architecture: hardware, operating system, middleware, and
applications
SSI at OS Level
 The operating system in each of the cluster nodes provides the
fundamental system support for the combined operation of the
cluster.

 The operating system provides services such as protection
boundaries, process/thread coordination, inter-process
communication, and device handling, thus creating a high-level
software interface for user applications.
Resource Management System (RMS)
Middleware
 A cluster resource management system (RMS) acts as a cluster
middleware that implements the SSI for a cluster of machines.

 It enables users to execute jobs on the cluster without the need to
understand the complexities of the underlying cluster architecture.

 An RMS manages the cluster through four major branches, namely:
resource management, node monitoring, job scheduling, and job
management.
RMS Middleware
Cluster Programming Models
 Cluster computing programming models have traditionally been
divided into categories based on the relationship of programs to the
data the programs operate on:
 The Single-Instruction, Single-Data (SISD) model defines the
traditional von Neumann computer.
 Multiple-Instruction, Multiple-Data (MIMD) machines.
 Multiple-Instruction, Single-Data (MISD) model.
 Single-Instruction, Multiple-Data (SIMD) model.

 They can also be categorized by how they exploit a cluster's inherent
parallelism.

Flynn’s Classification
 Cluster computing programming models have traditionally
been divided into categories based on the relationship of
programs to the data the programs operate on.

 Flynn's taxonomy is a categorization of forms of parallel
computer architectures:

1) SISD – Single Instruction, Single Data
2) SIMD – Single Instruction, Multiple Data
3) MISD – Multiple Instruction, Single Data
4) MIMD – Multiple Instruction, Multiple Data

Single-instruction, single-data (SISD) systems –

 An SISD computing system is a uniprocessor machine which is capable of
executing a single instruction, operating on a single data stream.
 In SISD, machine instructions are processed in a sequential manner, and
computers adopting this model are popularly called sequential computers.

Single-instruction, multiple-data (SIMD) systems –

 An SIMD system is a multiprocessor machine capable of executing the
same instruction on all the CPUs but operating on different data streams.

 Machines based on an SIMD model are well suited to scientific computing
since they involve lots of vector and matrix operations.

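A rough single-machine illustration of the SIMD pattern - one operation applied uniformly across many data elements - is the vector scaling below. Real SIMD hardware performs this with vector instructions, so this Java sketch only conveys the programming model, not the hardware behavior.

```java
import java.util.Arrays;

// One instruction (multiply by 2) applied across a whole data stream.
public class SimdStyle {
    public static void main(String[] args) {
        double[] v = {1.0, 2.0, 3.0, 4.0};
        double[] out = Arrays.stream(v).map(x -> 2.0 * x).toArray(); // same op, many data
        System.out.println(Arrays.toString(out)); // [2.0, 4.0, 6.0, 8.0]
    }
}
```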
Multiple-instruction, single-data (MISD) systems –

 An MISD computing system is a multiprocessor machine capable of
executing different instructions on different processing elements (PEs), but with
all of them operating on the same data set.
 Example: Z = sin(x) + cos(x) + tan(x)

Multiple-instruction, multiple-data (MIMD) systems –
 An MIMD system is a multiprocessor machine which is capable of
executing multiple instructions on multiple data sets.

 Each PE in the MIMD model has separate instruction and data streams;
therefore machines built using this model are suited to any kind of
application.
 MIMD machines are broadly categorized into shared-memory
MIMD and distributed-memory MIMD based on the way PEs are
coupled to the main memory.

Based on Exploiting a Cluster’s Inherent Parallelism
 Cluster computing programming models can roughly be
divided into two categories:
 The first category of models allows a serial (non-parallel) application
to take advantage of a cluster's parallelism.
 The second category of programming models aids in the explicit
parallelization of a program.
 In SPPS (serial program, parallel subsystem), many instances of a
serial program are distributed on a cluster.
 A parallel subsystem provides input to each serial program instance,
and captures output from those programs, delivering that output to
users.
 Because there are multiple programs on the cluster, operating on
multiple data, SPPS is a form of MIMD.
Cluster Programming Models
 When many instances of a serial program operate in parallel,
those instances must coordinate work through a shared cluster
resource, such as distributed shared memory or a message
passing infrastructure.
 The primitive operations that coordinate the work of
concurrently executing serial programs on a cluster define a
coordination language.
 A coordination language is often described in terms of an
Application Programming Interface (API) to a parallel
subsystem.

Example: LINDA
 The Linda tuple-space system exploits distributed shared memory to facilitate the
parallel execution of a serial program on a cluster.
 Linda defines primitive operations on a shared memory resource, allowing data
items – “tuples” – to be written to that memory, read from shared memory, and
deleted from shared memory.
 A tuple is similar to a database relation.
 A serial process with access to the shared memory writes data items to memory,
marking each item with an attribute indicating that that item requires processing.
 Another process awaiting newly arriving tuples removes such an item from the
shared memory, performs computations on that data item, and deposits the results
into the shared memory.
 The original submitter of the job then collects all the results from the tuple-space.
 Each process operating on a tuple-space is typically a serial program, and the Linda
system facilitates the concurrent execution of many such programs.
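The toy Java sketch below mimics the workflow just described with an in-memory tuple space; the class names and the simple tag-based matching are simplifications of Linda's full template matching, not Linda's actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A toy tuple space: out() deposits a tuple; in(tag) removes the first tuple
// whose first field matches the tag, blocking until one is available.
class TupleSpace {
    private final List<Object[]> space = new ArrayList<>();

    public synchronized void out(Object... tuple) {
        space.add(tuple);
        notifyAll();                                  // wake any waiting reader
    }

    public synchronized Object[] in(String tag) throws InterruptedException {
        while (true) {
            Iterator<Object[]> it = space.iterator();
            while (it.hasNext()) {
                Object[] t = it.next();
                if (tag.equals(t[0])) { it.remove(); return t; } // read + delete
            }
            wait();                                   // no matching tuple yet
        }
    }
}

public class LindaSketch {
    public static void main(String[] args) throws InterruptedException {
        TupleSpace ts = new TupleSpace();

        // Worker process: take a task tuple, compute, deposit a result tuple.
        new Thread(() -> {
            try {
                Object[] task = ts.in("task");
                ts.out("result", (Integer) task[1] * 2);
            } catch (InterruptedException ignored) { }
        }).start();

        ts.out("task", 21);                     // submitter marks an item for processing
        Object[] r = ts.in("result");           // submitter collects the result
        System.out.println("result = " + r[1]); // result = 42
    }
}
```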
Example: JAVASPACES
 JavaSpaces is an object-oriented Linda system that takes advantage of Java’s
platform-independent code execution and mobile code facility.
 Mobile code allows not just data, but also code to move from one cluster node to
another at runtime.
 A master node runs a JavaSpace process, providing a shared memory resource to other
cluster nodes that act as workers.
 When a worker removes a job request from the shared JavaSpace, the code
implementing that job is dynamically downloaded to that worker. The worker executes
that downloaded code and places the output of that execution into the JavaSpace.
 JavaSpaces, therefore, facilitates the automatic runtime distribution of code to cluster
nodes.
 JavaSpaces also provides optional transactional access to the shared memory resource,
which is especially helpful in the case of very large clusters with frequent node
failures.
Types of Cluster Computing
 Computer clusters can generally be categorized as three types:
1. Highly available or fail-over
2. Load balancing
3. High-performance computing

 There are a few important terms to remember when discussing the
robustness of a system:
• Availability - the accessibility of a system or service over a period of
time,
• Resilience - how well a system recovers from failure
• Fault tolerance - the ability of a system to continue providing a service
in the event of a failure
• Reliability - the probability that a system will function as expected
• Redundancy - duplication of critical resources to improve system
reliability
1. High Availability (HA) and Failover Clusters

 These cluster models provide uninterrupted availability of services and
resources by using the system's implicit redundancy.

 The basic idea is that if a node fails, its applications and
services can be made available on different nodes.

 These kinds of clusters serve as the platform for critical missions:
mail, document, and application servers.

2. Load-balancing clusters:
 Workload is distributed across the multiple servers installed in
the cluster network.

3. High-performance (HP) clusters:
 This computer networking tactic uses supercomputers and
cluster computing to resolve complex and highly
advanced computation problems.

Application of Cluster Computing
 Compute-intensive applications
 Environment science
 Grand Challenge application
 Fluid mechanics
 Ecosystem simulation
 Biomedical images
 Biomolecular
 Applications that require high availability
 Applications that require scaling of machines
 Applications that require high performance
Case Study: Google Search Engine
 Google uses cluster computing to handle the huge volume of worldwide
search requests, which peak at thousands of queries per
second.
 A single Google query needs at least tens of billions of
processing cycles and accesses a few hundred megabytes of data to
return a satisfactory search result.
 Google uses cluster computing as its solution to the high demand on
system resources, since clusters have better price-performance ratios
than alternative high-performance computing platforms and use less
electrical power.
 Google focuses on 2 important design factors: reliability and request
throughput.
Case Study: Google Search Engine

 Google is able to achieve reliability at the software level, so that a
reliable computing infrastructure can be constructed on clusters of
15,000 commodity PCs distributed worldwide.

 The services for Google are also replicated across multiple machines
in the clusters to provide the necessary availability.

 Google maximizes overall request throughput by performing parallel
execution of individual search requests. This means that more search
requests can be completed within a specific time interval.
Case Study: Google Search Engine
 A typical Google search consists of the following operations:
1) An Internet user enters a query at the Google webpage.
2) The web browser looks up the Internet Protocol (IP) address of
www.google.com via the Domain Name Server (DNS).
3) Google uses a DNS-based load balancing system that maps the query to a
cluster that is geographically nearest to the user so as to minimize network
communication delay time. The IP address of the selected cluster is returned.
4) The web browser then sends the search request in Hypertext Transport Protocol
(HTTP) format to the selected cluster at the specified IP address.
5) The selected cluster then processes the query locally.
6) A hardware-based load balancer in the cluster monitors the available set of
Google Web Servers (GWSs) in the cluster and distributes the requests evenly
within the cluster.
7) A GWS machine receives the request, coordinates the query execution and
sends the search result back to the user’s browser.
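Step 6's even distribution of requests can be pictured with a toy round-robin balancer; the server names and the selection policy below are illustrative assumptions, not Google's actual implementation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy load balancer: requests are handed to the available web servers in turn.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) { this.servers = servers; }

    String pick() {
        // floorMod keeps the index valid even after the counter overflows
        return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("gws-1", "gws-2", "gws-3"));
        for (int i = 0; i < 6; i++) System.out.println(lb.pick()); // gws-1, gws-2, gws-3, gws-1, ...
    }
}
```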
Case Study: Google Search Engine
 The first phase of query execution involves index
servers consulting an inverted index that maps each
query keyword to a list of matching documents.
 Relevance scores are also computed for matching
documents so that the search result returned to the
user is ordered by score.

 In the second phase, document servers fetch each
document from disk to extract the title and the
keyword-in-context portion of the document.

 In addition to the 2 phases, the GWS also activates the
spell checker and the ad server.

 The spell checker verifies that the spelling of the query
keywords is correct, while the ad server generates
advertisements that relate to the query and may
therefore interest the user.
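A minimal sketch of the inverted-index idea follows (the documents and keywords are invented for illustration): each keyword maps to a posting list of document IDs, and a multi-keyword query intersects those lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Build a toy inverted index, then answer a two-keyword query by
// intersecting the posting lists of the keywords.
public class InvertedIndexSketch {
    public static void main(String[] args) {
        String[] docs = {
            "cloud computing on clusters",            // doc 0
            "cluster computing with commodity pcs",   // doc 1
            "grid computing and the cloud"            // doc 2
        };

        Map<String, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String word : docs[id].split("\\s+")) {
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(id);
            }
        }

        // Query "cloud computing": intersect the two posting lists.
        List<Integer> hits = new ArrayList<>(index.getOrDefault("cloud", List.of()));
        hits.retainAll(index.getOrDefault("computing", List.of()));
        System.out.println("docs matching 'cloud computing': " + hits); // [0, 2]
    }
}
```

Ranking by relevance score, as described above, would then order these document IDs before the result is returned to the user.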
Grid Computing
 Grid computing is a form of distributed computing whereby a "super and
virtual computer" is composed of a cluster of networked, loosely coupled
computers acting in concert to perform very large tasks.

 It is a form of decentralized computing, because every node manages its
resources independently.

 Grid computing is a computing infrastructure that combines computer
resources spread over different geographical locations to achieve a common
goal.

 All unused resources on multiple computers are pooled together and made
available for a single task.

 It enables communities ("virtual organizations") to share geographically
distributed resources as they pursue common goals.

Criteria for a Grid:
Coordinates resources that are not subject to centralized control.
Uses standard, open, general-purpose protocols and interfaces.
Delivers nontrivial qualities of service.
Reliability, Performance, Scalability, Security

Benefits
Exploit underutilized resources
Resource load balancing
Virtualize resources across an enterprise
Data grids, compute grids
Enable collaboration for virtual organizations
Grid Applications
Data and computationally intensive applications:
This technology has been applied to computationally intensive scientific,
mathematical, and academic problems like drug discovery, economic
forecasting, and seismic analysis, and to back-office data processing in
support of e-commerce.
 A chemist may utilize hundreds of processors to screen thousands of
compounds per hour.
 Teams of engineers worldwide pool resources to analyze terabytes of structural
data.
 Meteorologists seek to visualize and analyze petabytes of climate data with
enormous computational demands.

Resource sharing
 Computers, storage, sensors, networks, …
 Sharing always conditional: issues of trust, policy, negotiation, payment, …

Coordinated problem solving
 distributed data analysis, computation, collaboration, …
Grid Topologies
• Intragrid
– Local grid within an organisation
– Trust based on personal contracts

• Extragrid
– Resources of a consortium of organizations
connected through a (Virtual) Private Network
– Trust based on Business to Business contracts

• Intergrid
– Global sharing of resources through the internet
– Trust based on certification organizations
Computational Grid
 Computational and data grids are two key components in grid computing that
play distinct roles.

 Definition: A computational grid focuses on distributing computational tasks or
workloads across a network of interconnected computers.

 Functionality: It allows the sharing of processing power to solve large-scale
problems that require significant computational resources. Tasks are divided into
smaller sub-tasks, which are then distributed to different nodes (computers) on
the grid for parallel processing.

Key Features:
 Parallel Processing: Computational grids leverage parallel processing to
accelerate the execution of tasks by dividing them into smaller units that
can be processed simultaneously.
 Load Balancing: The grid system ensures an even distribution of
computational tasks among available resources, preventing overload on
specific nodes and optimizing overall performance.
Computational Grid
 However, compute grids are useful even if you don't need to split your
computation - they help you improve the overall scalability and fault
tolerance of your system by offloading your computations onto the
most available nodes.
Example: MapReduce (see the sketch below)

 "A computational grid is a hardware and software
infrastructure that provides dependable, consistent, and
inexpensive access to high-end computational capabilities."
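The MapReduce pattern mentioned above can be sketched on a single machine, with Java parallel streams standing in for grid nodes (a word count, the classic example; the input lines are invented for illustration): the map phase emits words concurrently, and the reduce phase counts them per key.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Map phase: split lines into words, processed concurrently.
// Reduce phase: count occurrences per word.
public class MapReduceSketch {
    public static void main(String[] args) {
        Map<String, Long> counts = Stream.of(
                "the grid shares resources",
                "the grid pools unused resources")
            .parallel()                                        // map tasks run concurrently
            .flatMap(line -> Stream.of(line.split("\\s+")))
            .collect(Collectors.groupingByConcurrent(
                word -> word, Collectors.counting()));         // reduce: sum per key

        System.out.println(counts); // e.g. {the=2, grid=2, resources=2, shares=1, ...}
    }
}
```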
Data Grid
 Definition: A data grid focuses on the management and distribution of large
amounts of data across a grid infrastructure.

 Functionality: It enables the sharing and access of data resources distributed
across the grid. Data grids are particularly useful when dealing with large
datasets that cannot be stored or processed on a single machine.

Key Features:
 Data Replication: Data grids often use replication to ensure data
availability and fault tolerance. Multiple copies of data are stored across
different nodes, reducing the risk of data loss due to hardware failures.
 Data Access and Retrieval: Data grids provide mechanisms for
efficiently accessing and retrieving data from distributed storage
locations. This involves addressing challenges related to data transfer
speed and latency.
 Data Caching: Frequently accessed data can be cached locally on
nodes to reduce the need for repeated data transfers across the grid.
 Data Grid is the storage component of a grid environment. Scientific
and engineering applications require access to large amounts of data,
and often this data is widely distributed. A data grid provides
seamless access to the local or remote data required to complete
compute intensive calculations.

Examples:
 Biomedical Informatics Research Network (BIRN),
 Southern California Earthquake Center (SCEC).
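The data-replication feature described above can be pictured with a toy replicated store (the node count and the API are invented for illustration): every write goes to several nodes, so a read still succeeds if any single replica is lost.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy data-grid replication: put() writes the same object to every storage
// node; get() returns the value from the first node that still holds it.
public class ReplicatedStore {
    private final List<Map<String, byte[]>> nodes = List.of(
            new HashMap<>(), new HashMap<>(), new HashMap<>());

    void put(String key, byte[] value) {
        for (Map<String, byte[]> node : nodes) node.put(key, value); // replicate
    }

    byte[] get(String key) {
        for (Map<String, byte[]> node : nodes) {   // any surviving replica answers
            byte[] v = node.get(key);
            if (v != null) return v;
        }
        return null;                               // all replicas lost
    }

    public static void main(String[] args) {
        ReplicatedStore grid = new ReplicatedStore();
        grid.put("dataset-42", new byte[]{1, 2, 3});
        grid.nodes.get(0).clear();                 // simulate losing one replica
        System.out.println(grid.get("dataset-42").length); // still prints 3
    }
}
```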
Methods of Grid Computing
1) Distributed Supercomputing
2) High-Throughput Computing
3) On-Demand Computing
4) Data-Intensive Computing
5) Collaborative Computing
6) Logistical Networking
Distributed Supercomputing
Combining multiple high-capacity resources on a
computational grid into a single, virtual distributed
supercomputer.
Tackle problems that cannot be solved on a single
system.
High-Throughput Computing
 Uses the grid to schedule large numbers of loosely coupled or
independent tasks, with the goal of putting unused processor
cycles to work.

On-Demand Computing
 Uses grid capabilities to meet short-term requirements for
resources that are not locally accessible.
 Models real-time computing demands.
Collaborative Computing
 Concerned primarily with enabling and enhancing human-to-
human interactions.
 Applications are often structured in terms of a virtual shared space.

Data-Intensive Computing
 The focus is on synthesizing new information from data that is
maintained in geographically distributed repositories, digital libraries,
and databases.

 Particularly useful for distributed data mining.

Logistical Networking
 Logistical networks focus on exposing storage resources
inside networks by optimizing the global scheduling of data
transport and data storage.
 Contrasts with traditional networking, which does not
explicitly model storage resources in the network.
 Provides high-level services for Grid applications.
 Called "logistical" because of the analogy it bears with the
systems of warehouses, depots, and distribution channels.
P2P Computing vs Grid Computing
 Both P2P and Grid computing involve the distributed use of resources.
 P2P networks are often more decentralized, with equal peers
contributing resources voluntarily, while Grid computing typically
involves a more centralized control structure for managing
distributed resources in a coordinated fashion.
 P2P networks are often used for file sharing, distributed computing
(such as SETI@home), and collaborative applications where peers
contribute resources for a common goal. Grids are typically used for
complex scientific, engineering, or business applications that
require significant computing power and resources.
 Grid system deals with more complex, more powerful, more diverse
and highly interconnected set of resources than
P2P.
A Typical View of Grid Environment
1) The Grid Information Service system collects the details of the available
Grid resources and passes the information to the resource broker.
2) A user sends a computation- or data-intensive application to the Global
Grid in order to speed up its execution.
3) The Resource Broker distributes the jobs in the application to the Grid
resources, based on the user's QoS requirements and the details of
available Grid resources, for further execution.
4) The Grid resources (cluster, PC, supercomputer, database, instruments,
etc.) in the Global Grid execute the user jobs; the processed jobs and
computation results are returned to the user.
Grid Middleware
 Grids are typically managed by gridware -
a special type of middleware that enables sharing and manages
grid components based on user requirements and resource
attributes (e.g., capacity, performance).
 Software that connects other software components or
applications to provide the following functions:
Run applications on suitable available resources
– Brokering, Scheduling
Provide uniform, high-level access to resources
– Semantic interfaces
– Web Services, Service Oriented Architectures
Address inter-domain issues of security, policy, etc.
– Federated Identities
Provide application-level status monitoring and control
Middlewares
Globus – Chicago Univ
Condor – Wisconsin Univ – high-throughput computing
Legion – Virginia Univ – virtual workspaces, collaborative computing
IBP – Internet Backplane Protocol – Tennessee Univ – logistical networking
NetSolve – solving scientific problems in heterogeneous environments –
high-throughput & data-intensive computing
The Hourglass Model
 Focus on architecture issues
 Propose a set of core services as basic infrastructure
 Used to construct high-level, domain-specific solutions (diverse)
 Design principles
 Keep participation cost low
 Enable local control
 Support for adaptation
 Analogous to the "IP hourglass" model: diverse applications and global
services sit above a narrow set of core services, which sit above the
diverse local OSes.
Layered Grid Architecture
(By Analogy to Internet Architecture)
 Application layer.
 Collective layer – "Coordinating multiple resources":
ubiquitous infrastructure services, app-specific distributed services.
(Internet analogy: Application.)
 Resource layer – "Sharing single resources":
negotiating access, controlling use.
 Connectivity layer – "Talking to things": communication
(Internet protocols) & security. (Internet analogy: Transport, Internet.)
 Fabric layer – "Controlling things locally": access
to, and control of, resources. (Internet analogy: Link.)
 We define Grid architecture in terms of a layered collection of
protocols.
 Fabric layer includes the protocols and interfaces that provide access to
the resources that are being shared, including computers, storage
systems, datasets, programs, and networks. This layer is a logical view
rather than a physical view. For example, the view of a cluster with a
local resource manager is defined by the local resource manager, and not
the cluster hardware. Likewise, the fabric provided by a storage system
is defined by the file system that is available on that system, not the raw
disk or tapes.
 The connectivity layer defines core protocols required for Grid-specific
network transactions. This layer includes the IP protocol stack (system
level application protocols [e.g. DNS, RSVP, Routing], transport and
internet layers), as well as core Grid security protocols for authentication
and authorization.

 Resource layer defines protocols to initiate and control sharing of (local)
resources. Services defined at this level are gatekeeper, GRIS, along
with some user-oriented application protocols from the Internet protocol
suite, such as file transfer.

 Collective layer defines protocols that provide system-oriented
capabilities that are expected to be wide-scale in deployment and generic
in function. This includes GIIS, bandwidth brokers, and resource
brokers.

 Application layer defines protocols and services that are parochial in
nature, targeted towards a specific application domain or class of
applications.

Scavenging Grid
 While similar to computational grids, CPU scavenging grids have many regular computers.
The term scavenging describes the process of searching for available computing
resources in a network of regular computers.

 While other network users access the computers for non-grid–related tasks, the grid software
uses these nodes when they are free. The scavenging grid is also known as CPU scavenging
or cycle scavenging.

Open Grid Services Architecture (OGSA)
 OGSA is a standard and open architecture for Grid systems, which
was designed by academia and industry.

 OGSA is based on fundamental concepts and technologies from Grid
computing and Web Services.

 OGSA defines a core set of standard service interfaces with their
associated semantics for purposes such as state management, fault
management, and service creation and management.
 It defines a set of rules that make up a grid service.

 It aims to define a common, standard, and open architecture for grid-
based applications.

 It is an architecture designed for building and managing distributed
computing environments.
 Features of OGSA

• OGSA is built on 'Service-Oriented Architecture' principles, which
means that it provides a way to access and use services distributed
across a network.
• In OGSA each service is designed to perform a specific function, and
services can be combined to create more complex applications.

• It is specifically designed for grid computing, which involves sharing
computing resources across a network. OGSA provides a way to manage
these resources and ensure that they are used efficiently.

What does OGSA do?

• OGSA is based on a service-oriented architecture where Grid resources
are exposed as services that can be accessed and combined to create
higher-level applications.
• OGSA provides a secure environment for sharing resources and
executing applications, with support for authentication, authorization,
and encryption.
• OGSA defines standard interfaces and protocols for managing
resources, publishing and discovering services, and executing
distributed applications.

Open Grid Services Infrastructure
(OGSI)
 OGSI stands for Open Grid Services Infrastructure, which is a
specification that defines a set of interfaces and protocols for
building grid services.

• It provides an infrastructure layer for the Open Grid Services
Architecture.
• OGSI is based on web services technologies, such as SOAP (Simple
Object Access Protocol) and WSDL (Web Services Description
Language).
• It is designed to provide a standard way to build and manage grid
services.
• It is based on WSRF (Web Services Resource Framework).

 What does OGSI do?
• OGSI provides a standardized programming model for building Grid
services, which allows developers to create powerful and scalable
distributed computing environments.
• By providing a consistent set of interfaces and protocols for managing
Grid resources, OGSI helps to simplify the development, deployment,
and management of Grid applications.

Example
• OGSA is the blueprint that the architect creates to show how the
building will look.
• OGSI is the structural design that the engineer creates to support the
architect's vision for the building.

Simulation Tools
 GridSim – job scheduling

 SimGrid – single-client multi-server scheduling

 Bricks – scheduling

 GangSim – Ganglia VO

 OptorSim – Data Grid simulations

 G3S – Grid Security Services Simulator – security services

Simulation Tool: GridSim
 GridSim is a Java-based toolkit for modeling and
simulation of distributed resource management and
scheduling for conventional Grid environments.

 GridSim is based on SimJava, a general-purpose
discrete-event simulation package implemented in
Java.

 All components in GridSim communicate with each
other through message-passing operations defined by
SimJava.
Salient Features of GridSim
 It allows modeling of heterogeneous types of resources.

 Resources can be modeled operating under space- or time-shared
mode.

 Resource capability can be defined in the form of MIPS (Million
Instructions Per Second) benchmark ratings.

 Resources can be located in any time zone.

 Weekends and holidays can be mapped depending on a resource's
local time to model non-Grid (local) workload.

 Resources can be booked for advance reservation.

 Applications with different parallel application models can be
simulated.
Salient Features of GridSim
 Application tasks can be heterogeneous, and they can be CPU- or I/O-
intensive.

 There is no limit on the number of application jobs that can be
submitted to a resource.

 Multiple user entities can submit tasks for execution simultaneously
on the same resource, which may be time-shared or space-shared.
This feature helps in building schedulers that can use different
market-driven economic models for selecting services competitively.

 Network speed between resources can be specified.

 It supports simulation of both static and dynamic schedulers.

 Statistics of all or selected operations can be recorded, and they can
be analyzed using GridSim statistics analysis methods.
A Modular Architecture for GridSim Platform and Components.
 GridSim's higher-level operation can be summarized in four distinct
operational steps.

 The first step is to identify and create Grid resources. These resources
could be of different sizes and configurations and are created in relation
to the experiment to be carried out.

 The next step is the creation of an application to use these resources.
These applications are defined as a collection of "Gridlets" or Grid jobs
that are created.

 The third step is the creation of the Grid user, which is the entity that
interacts with the broker. This interaction leads to the coordination of the
scheduling requirements for the simulation experiment.

 The final step is the creation of the entity responsible for allocating
resources to the jobs scheduled in the experiment.

Globus Toolkit

Functional Modules of Globus for Core
Services

Client Globus Interaction

