Complete Unit 1
Computer Performance
1) Processor speed
2) Bandwidth
3) Latency
Sequential Computer Architecture with Cache Memory
Improving Performance
From 1986 to 2002, microprocessor performance increased rapidly, year after year.
Problem
Until then, performance gains came mainly from increasing the density of transistors, but denser, faster circuits also dissipate more heat, which makes further single-processor speedups impractical.
An Intelligent Solution
Move away from single-core systems to multicore processors
“Core” = Processing unit
Introduction of parallelism
But…
Serial Computing V/s Parallel Computing
Serial computing: processing in which one task runs at a time and all tasks are executed by the processor in sequence.
Any operating system running on a single processor is an example of serial processing.
Parallel computing: processing in which multiple tasks are completed at a time by different processors.
In parallel processing, more than one processor is involved.
Parallel Computing
It is a form of computation in which many calculations are carried out simultaneously, operating on the principle that a large problem can be divided into smaller ones, which are then solved concurrently (in parallel).
So we need to rewrite serial programs so that they are parallel.
Tools that translate serial programs into parallel ones automatically have had only limited success.
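As a minimal illustration of this idea, the sketch below (Python, assuming a multi-core machine; all names are illustrative) splits a large summation into chunks that worker processes compute concurrently:

```python
# Divide-and-conquer parallelism: split a large problem (a big sum)
# into smaller chunks and solve the chunks concurrently.
from multiprocessing import Pool

def partial_sum(chunk):
    # The "smaller problem": sum one slice of the data serially.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    size = len(data) // n_workers
    # Divide the large problem into one chunk per worker.
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        # Each chunk is summed in a separate process, in parallel.
        partials = pool.map(partial_sum, chunks)
    # Combine the partial results into the final answer.
    print(sum(partials) == sum(data))  # True
```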
Parallel Computing Basic Idea
How Many?
Connecting Multiple Computers
Advantages:
1) Multiple messages can be shared across processors
2) Scalable
Disadvantages:
1) Latency and communication overhead
Why Distributed Systems?
Distributed System
A Distributed System (DS) is a collection of autonomous computer systems
that are physically separated but are connected by a centralized computer
network that is equipped with distributed system software.
The autonomous computers communicate with each other by sharing resources and files and performing the tasks assigned to them.
Each autonomous system runs a common application that can have its own data, which is shared through the centralized database system.
Middleware services provide capabilities that are not present by default in the local systems or the centralized system, acting as an interface between the centralized system and the local systems. Using middleware components, the systems communicate and manage data.
Middleware in DS
Middleware is an intermediate layer of software that sits between the
application and the network.
6 Key Components of Distributed Architectures
Nodes
These are the individual computers or servers that make up the
distributed system. Each node runs its own instances of the
applications and services that make up the system.
Network
This is the communication infrastructure that connects all
nodes. The network enables data and information to be exchanged
between nodes.
Middleware
This software layer provides a programming model for developers
and masks the heterogeneity of the underlying network, hardware, and
operating systems. It provides useful abstractions and services which
simplify the process of creating complex distributed systems.
Shared Data/Database
This is where the system stores and retrieves data. Depending on the
specific design of the distributed system, the data could be distributed
across multiple nodes, replicated, or partitioned.
Distributed Algorithms
These are the rules and procedures nodes follow to communicate and
coordinate with each other. They enable nodes to work together, even
when some nodes fail or network connections are unreliable.
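One classic coordination procedure of this kind is heartbeat-based failure detection. The toy sketch below (Python; a simplification, not any particular system's protocol) shows the idea: nodes report periodically, and a node that stays silent past a timeout is considered failed:

```python
import time

TIMEOUT = 2.0        # seconds of silence before a node is suspected failed
last_heartbeat = {}  # node id -> time of its most recent heartbeat

def record_heartbeat(node_id):
    # Called whenever a node reports "I am alive".
    last_heartbeat[node_id] = time.time()

def failed_nodes():
    # Nodes whose last heartbeat is older than the timeout.
    now = time.time()
    return [n for n, t in last_heartbeat.items() if now - t > TIMEOUT]

# Simulated run: node "B" goes silent while "A" keeps reporting.
record_heartbeat("A")
record_heartbeat("B")
time.sleep(2.5)
record_heartbeat("A")
print(failed_nodes())  # ['B']
```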
Why DS?
Why Not DS?
Communication may fail.
Pros and Cons of DS
Characteristics of Distributed System
Architecture of DS
Distributed architectures vary in the way they organize nodes, share
computational tasks, or handle communication between different parts of
the system.
1) Client-Server Architecture
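In the client-server model, clients send requests to a central server, which replies. A minimal sketch of that request-reply exchange, using Python's standard socket module (the host, port, and messages are placeholders):

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5000  # placeholder address

def server():
    # The server waits for a client, reads its request, and replies.
    with socket.socket() as s:
        s.bind((HOST, PORT))
        s.listen()
        conn, _ = s.accept()
        with conn:
            request = conn.recv(1024)
            conn.sendall(b"reply to: " + request)

threading.Thread(target=server, daemon=True).start()
time.sleep(0.5)  # give the server a moment to start listening

# The client connects, sends a request, and prints the server's reply.
with socket.socket() as c:
    c.connect((HOST, PORT))
    c.sendall(b"hello")
    print(c.recv(1024).decode())  # -> reply to: hello
```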
Peer-to-Peer (P2P)
The P2P architecture is a unique type of distributed system that operates
without centralized control.
In this architecture, any node, also referred to as a peer, can function as either a client or a server: when a node requests a service it acts as a client, and when it offers a service it acts as a server.
Advantages of P2P
Disadvantages of P2P
Remote Procedure Call (RPC)
RPC is a communication technology used by one program to request a service from another program over a network, without needing to know the network's details.
RPC Architecture
RPC architecture has five main components:
1. Client
2. Client Stub
3. RPC Runtime
4. Server Stub
5. Server
How RPC Works?
Step 1) The client, the client stub, and one instance of the RPC Runtime execute on the client machine.
Step 2) The client starts the call by invoking the client stub, passing parameters in the usual way. The client stub marshals the parameters into a message within the client's own address space and asks the local RPC Runtime to deliver it to the server stub.
Step 3) The user thus accesses RPC as if making a regular local procedure call. The RPC Runtime manages the transmission of messages across the network between client and server, and also handles retransmission, acknowledgment, routing, and encryption.
Step 4) After the server procedure completes, control returns to the server stub, which packs (marshals) the return values into a message and hands that message to the transport layer.
Step 5) The transport layer sends the result message back to the client transport layer, which passes it up to the client stub.
Step 6) The client stub demarshals (unpacks) the return parameters from the resulting packet, and execution returns to the caller.
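The steps above can be seen end to end in a runnable sketch using Python's standard xmlrpc modules, which generate the client and server stubs for us (the address, port, and add() procedure are illustrative):

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    # The remote procedure, executed on the server.
    return a + b

# Server side: register the procedure and serve in the background.
server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(add)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy acts as the client stub, marshalling the
# call into a message and unmarshalling the reply.
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000")
print(proxy.add(2, 3))  # looks like a local call; runs remotely -> 5
```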
Application Areas of Distributed Systems
Finance and Commerce: Amazon, eBay, Online Banking, E-Commerce
websites.
Information Society: Search Engines, Wikipedia, Social Networking,
Cloud Computing.
Cloud Technologies: AWS, Salesforce, Microsoft Azure, SAP.
Entertainment: Online Gaming, Music, YouTube.
Healthcare: Online patient records, Health Informatics.
Education: E-learning.
Transport and logistics: GPS, Google Maps.
Environment Management: Sensor technologies.
Cluster Computing History
The first inspiration for cluster computing was developed in the 1960s by IBM as a way of linking large mainframes to provide a more cost-effective form of commercial parallelism.
Key enablers: high-performance microprocessors and high-speed networks.
Advantages
The emergence of cluster platforms was driven by a number of academic projects, such as Beowulf [2], Berkeley NOW [3], and HPVM [4], which demonstrated advantages such as vendor independence.
Cluster Architecture
Cluster: Fast Interconnection Technologies
Single System Image (SSI)
Provides better usability for users, as it hides the complexities of the underlying distributed and heterogeneous nature of clusters from them.
SSI can be established through one or several mechanisms implemented at various levels of abstraction in the cluster architecture: hardware, operating system, middleware, and applications.
SSI at OS Level
The operating system in each of the cluster nodes provides the
fundamental system support for the combined operation of the
cluster.
Single-instruction, single-data (SISD) systems
Single-instruction, multiple-data (SIMD) systems
Multiple-instruction, single-data (MISD) systems
Multiple-instruction, multiple-data (MIMD) systems
An MIMD system is a multiprocessor machine which is capable of
executing multiple instructions on multiple data sets.
Each PE in the MIMD model has separate instruction and data streams; therefore, machines built using this model are capable of handling any kind of application.
MIMD machines are broadly categorized into shared-memory
MIMD and distributed-memory MIMD based on the way PEs are
coupled to the main memory.
Based on how they exploit a cluster's inherent parallelism, cluster computing programming models can roughly be divided into two categories:
The first category of models allows a serial (non-parallel) application to take advantage of a cluster's parallelism.
The second category of programming models aids in the explicit parallelization of a program.
In SPPS (serial program, parallel subsystem), many instances of a
serial program are distributed on a cluster.
A parallel subsystem provides input to each serial program instance,
and captures output from those programs, delivering that output to
users.
Because there are multiple programs on the clusters, operating on
multiple data, SPPS is a form of MIMD.
Cluster Programming Models
When many instances of a serial program operate in parallel, those instances must coordinate work through a shared cluster resource, such as distributed shared memory or a message-passing infrastructure.
The primitive operations that coordinate the work of
concurrently executing serial programs on a cluster define a
coordination language.
A coordination language is often described in terms of an Application Programming Interface (API) to a parallel subsystem.
Example: LINDA
The Linda tuple-space system exploits distributed shared memory to facilitate the
parallel execution of a serial program on a cluster.
Linda defines primitive operations on a shared memory resource, allowing data
items – “tuples” – to be written to that memory, read from shared memory, and
deleted from shared memory.
A tuple is similar to a record in a database relation.
A serial process with access to the shared memory writes data items to memory,
marking each item with an attribute indicating that that item requires processing.
Another process awaiting newly arriving tuples removes such an item from the
shared memory, performs computations on that data item, and deposits the results
into the shared memory.
The original submitter of the job then collects all the results from the tuple-space.
Each process operating on a tuple-space is typically a serial program, and the Linda system facilitates the concurrent execution of many such programs.
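A single-process sketch of these primitives in Python (the out/in_ names imitate Linda's operations, but the implementation is only illustrative; real Linda runs over a cluster's distributed shared memory):

```python
tuple_space = []  # the shared memory resource, here just a list

def out(t):
    # Write a tuple into the shared space.
    tuple_space.append(t)

def in_(pattern):
    # Remove and return the first tuple matching the pattern;
    # None in the pattern matches any value.
    for t in tuple_space:
        if len(t) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, t)):
            tuple_space.remove(t)
            return t
    return None

out(("task", 21))                 # submitter marks an item for processing
task = in_(("task", None))        # a worker removes the pending item
out(("result", task[1] * 2))      # ...computes, and deposits the result
print(in_(("result", None)))      # submitter collects -> ('result', 42)
```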
Example: JAVASPACES
JavaSpaces is an object-oriented Linda system that takes advantage of Java’s
platform-independent code execution and mobile code facility.
Mobile code allows not just data, but also code to move from one cluster node to
another at runtime.
A master node runs a JavaSpace process, providing a shared memory resource to other
cluster nodes that act as workers.
When a worker removes a job request from the shared JavaSpace, the code for that job is dynamically downloaded to that worker. The worker executes the downloaded code and places the output of that execution into the JavaSpace.
JavaSpaces therefore facilitates the automatic runtime distribution of code to cluster nodes.
JavaSpaces also provides optional transactional access to the shared memory resource, which is especially helpful in the case of very large clusters with frequent node failures.
Types of Cluster Computing
Computer clusters can generally be categorized as three types:
1. Highly available or fail-over
2. Load balancing
3. High-performance computing
The basic idea of a fail-over cluster is that if a node goes down, its applications and services can be made available on other nodes.
2. Load-balancing clusters:
Workload is distributed across multiple installed servers in
the cluster network.
Application of Cluster Computing
Compute-intensive applications:
Environmental science
Grand Challenge applications
Fluid mechanics
Ecosystem simulation
Biomedical imaging
Biomolecular simulation
Applications that require high availability
Applications that require scaling across machines
Applications that require high performance
Case Study: Google Search Engine
Google uses cluster computing to handle the huge volume of worldwide search requests, which peak at thousands of queries per second.
A single Google query needs to use at least tens of billions of processing cycles and access a few hundred megabytes of data to return satisfactory search results.
Google uses cluster computing as its solution to the high demand of
system resources since clusters have better price-performance ratios
than alternative high-performance computing platforms and use less
electrical power.
Google focuses on 2 important design factors: reliability and request
throughput.
Case Study: Google Search Engine
The services for Google are also replicated across multiple machines
in the clusters to provide the necessary availability.
Grid Computing
All unused resources on multiple computers are pooled together and made available for a single task.
Criteria for a Grid:
Coordinates resources that are not subject to centralized control.
Uses standard, open, general-purpose protocols and interfaces.
Delivers nontrivial qualities of service.
Reliability, Performance, Scalability, Security
Benefits
Exploit Underutilized resources
Resource load Balancing
Virtualize resources across an enterprise
Data Grids, Compute Grids
Enable collaboration for virtual organizations
Grid Applications
Data and computationally intensive applications:
This technology has been applied to computationally intensive scientific, mathematical, and academic problems like drug discovery, economic forecasting, seismic analysis, and back-office data processing in support of e-commerce.
A chemist may utilize hundreds of processors to screen thousands of
compounds per hour.
Teams of engineers worldwide pool resources to analyze terabytes of structural
data.
Meteorologists seek to visualize and analyze petabytes of climate data with
enormous computational demands.
Resource sharing
Computers, storage, sensors, networks, …
Sharing always conditional: issues of trust, policy, negotiation, payment, …
• Extragrid
– Resources of a consortium of organizations connected through a (Virtual) Private Network
– Trust based on business-to-business contracts
• Intergrid
– Global sharing of resources through the Internet
– Trust based on certification organizations
Computational Grid
Computational and data grids are two key components in grid computing that
play distinct roles.
Key Features:
Parallel Processing: Computational grids leverage parallel processing to
accelerate the execution of tasks by dividing them into smaller units that
can be processed simultaneously.
Load Balancing: The grid system ensures an even distribution of
computational tasks among available resources, preventing overload on
specific nodes and optimizing overall performance.
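A small sketch of the load-balancing idea in Python (illustrative only): each incoming task goes to the currently least-loaded node, tracked with a min-heap:

```python
import heapq

# (current_load, node_name) pairs; the least-loaded node is on top.
nodes = [(0, "node-A"), (0, "node-B"), (0, "node-C")]
heapq.heapify(nodes)

for cost in [5, 3, 8, 2, 7]:            # task costs in arbitrary units
    load, name = heapq.heappop(nodes)   # pick the least-loaded node
    heapq.heappush(nodes, (load + cost, name))  # assign the task to it

print(sorted(nodes))  # [(5, 'node-B'), (8, 'node-C'), (12, 'node-A')]
```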
Computational Grid
However, compute grids are useful even if you don't need to split your computation: they help you improve the overall scalability and fault tolerance of your system by offloading computations onto the most available nodes.
Example: MapReduce.
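A toy, in-memory sketch of the MapReduce model named above (real systems run the map and reduce tasks on different nodes): map emits (key, value) pairs from each input split, the pairs are grouped by key, and reduce combines each group:

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # Combine all values emitted for one key.
    return word, sum(counts)

splits = ["the grid runs jobs", "the grid balances load"]

grouped = defaultdict(list)        # the "shuffle": group values by key
for line in splits:                # in a real grid, one map task per split
    for word, count in map_phase(line):
        grouped[word].append(count)

print(dict(reduce_phase(w, c) for w, c in grouped.items()))
# {'the': 2, 'grid': 2, 'runs': 1, 'jobs': 1, 'balances': 1, 'load': 1}
```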
Data Grid
Key Features:
Data Replication: Data grids often use replication to ensure data
availability and fault tolerance. Multiple copies of data are stored across
different nodes, reducing the risk of data loss due to hardware failures.
Data Access and Retrieval: Data grids provide mechanisms for
efficiently accessing and retrieving data from distributed storage
locations. This involves addressing challenges related to data transfer
speed and latency.
Data Caching: Frequently accessed data can be cached locally on
nodes to reduce the need for repeated data transfers across the grid.
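A minimal sketch of the caching idea (fetch_remote() is a stand-in for a real grid data transfer; all names are illustrative):

```python
local_cache = {}

def fetch_remote(key):
    # Stand-in for a slow transfer from a remote storage node.
    print(f"transferring {key!r} across the grid...")
    return f"data-for-{key}"

def read(key):
    if key not in local_cache:           # cache miss: fetch and keep a copy
        local_cache[key] = fetch_remote(key)
    return local_cache[key]              # cache hit: served locally

read("climate-2023")  # first access triggers a grid transfer
read("climate-2023")  # repeated access is served from the local cache
```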
Data Grid is the storage component of a grid environment. Scientific
and engineering applications require access to large amounts of data,
and often this data is widely distributed. A data grid provides
seamless access to the local or remote data required to complete
compute intensive calculations.
Examples:
Biomedical Informatics Research Network (BIRN)
Southern California Earthquake Center (SCEC)
Methods of Grid Computing
1) Distributed Supercomputing
2) High-Throughput Computing
3) On-Demand Computing
4) Data-Intensive Computing
5) Collaborative Computing
6) Logistical Networking
Distributed Supercomputing
On-Demand Computing
Uses grid capabilities to meet short-term requirements for
resources that are not locally accessible.
Models real-time computing demands.
Collaborative Computing
Concerned primarily with enabling and enhancing human-to-
human interactions.
Applications are often structured in terms of a virtual shared space.
Data-Intensive Computing
The focus is on synthesizing new information from data that is
maintained in geographically distributed repositories, digital libraries,
and databases.
A user sends a computation- or data-intensive application to global Grids in order to speed up its execution. A resource broker distributes the jobs in the application to Grid resources, based on the user's QoS requirements and details of the available resources, for execution. Grid resources (clusters, PCs, supercomputers, databases, instruments, etc.) in the global Grid execute the user's jobs and return the computation results.
Grid Middleware
Grids are typically managed by gridware: a special type of middleware that enables sharing and manages grid components based on user requirements and resource attributes (e.g., capacity, performance).
Software that connects other software components or
applications to provide the following functions:
Run applications on suitable available resources
– Brokering, Scheduling
Provide uniform, high-level access to resources
– Semantic interfaces
– Web Services, Service Oriented Architectures
Address inter-domain issues of security, policy, etc.
– Federated Identities
Provide application-level status monitoring and control
Middlewares
Globus – University of Chicago
Condor – University of Wisconsin – high-throughput computing
Legion – University of Virginia – virtual workspaces, collaborative computing
IBP (Internet Backplane Protocol) – University of Tennessee – logistical networking
NetSolve – solving scientific problems in heterogeneous environments – high-throughput and data-intensive computing
The Hourglass Model
Focus on architecture issues: propose a set of core services as basic infrastructure, used to construct high-level, domain-specific solutions (diverse).
Design principles: keep participation cost low; enable local control; support adaptation.
[Figure: by analogy with the "IP hourglass" model, diverse applications and global services sit above a narrow neck of core services, which rest on the local OS.]
Layered Grid Architecture (By Analogy to Internet Architecture)
The Resource layer defines protocols to initiate and control the sharing of (local) resources. Services defined at this level include the gatekeeper and GRIS, along with some user-oriented application protocols from the Internet protocol suite, such as file transfer.
Scavenging Grid
While similar to computational grids, CPU scavenging grids are made up of many regular computers.
The term scavenging describes the process of searching for available computing
resources in a network of regular computers.
While other network users access the computers for non-grid–related tasks, the grid software
uses these nodes when they are free. The scavenging grid is also known as CPU scavenging
or cycle scavenging.
Open Grid Services Architecture (OGSA)
OGSA is a standard, open architecture for Grid systems, designed jointly by academia and industry.
What does OGSA do?
Open Grid Services Infrastructure (OGSI)
OGSI (Open Grid Services Infrastructure) is a specification that defines a set of interfaces and protocols for building grid services.
What does OGSI do?
• OGSI provides a standardized programming model for building Grid services, which allows developers to create powerful and scalable distributed computing environments.
• By providing a consistent set of interfaces and protocols for managing
Grid resources, OGSI helps to simplify the development, deployment,
and management of Grid applications.
Example
• OGSA is the blueprint that the architect creates to show how the
building will look.
• OGSI is the structural design that the engineer creates to support the
architect’s vision for the building.
Simulation tools
GridSim – job scheduling
Bricks – scheduling
GangSim – Ganglia VO
The first step is to identify and create Grid resources. These resources
could be of different sizes and configurations and are created in relation
to the experiment to be carried out.
The third step is the creation of the Grid user, which is the entity that
interacts with the broker. This interaction leads to the coordination of the
scheduling requirements for the simulation experiment.
The final step involves the entity responsible for allocating resources to the jobs scheduled in the experiment.
Globus Toolkit
Functional Modules of Globus for Core Services
Client Globus Interaction