Module-1 Notes
Module-01
Distributed System Models and Enabling Technologies
Scalable Computing Over the Internet
Scalable Computing over the Internet refers to the ability to dynamically allocate and
manage computing resources over the internet in a way that can handle growing demands.
This involves distributing computational tasks across multiple systems (often using cloud
computing or distributed computing platforms) to accommodate varying workloads.
Evolution of Computing Technology
Over the last 60 years, computing has evolved through multiple platforms and
environments.
Platform Evolution
First Generation (1950-1970): Mainframes like IBM 360 and CDC 6400.
Second Generation (1960-1980): Minicomputers like DEC PDP 11 and VAX.
Third Generation (1970-1990): Personal computers with VLSI microprocessors.
Fourth Generation (1980-2000): Portable and wireless computing devices.
Fifth Generation (1990-present): HPC and HTC systems in clusters, grids, and cloud
computing.
FIGURE 1.1 Evolutionary trend toward parallel, distributed, and cloud computing with clusters,
MPPs, P2P networks, grids, clouds, web services, and the Internet of Things.
Clusters: Homogeneous compute nodes working together.
Grids: Wide-area distributed computing infrastructures.
P2P Networks: Client machines globally distributed for file sharing and content
delivery.
Cloud Computing: Utilizes clusters, grids, and P2P technologies.
Future Computing Needs and Design Objectives
Transparency in data access, resource allocation, job execution, and failure recovery is
essential.
Application domains range from science, engineering, and business to web services and government; almost all of them demand high-performance or high-throughput computing.
Challenges include network saturation, security threats, and lack of software support.
Both HPC and HTC systems desire transparency in many application aspects. For example,
data access, resource allocation, process location, concurrency in execution, job replication,
and failure recovery should be made transparent to both users and system management.
Table 1.1 highlights a few key applications that have driven the development of parallel and
distributed systems over the years. These applications spread across many important domains
in science, engineering, business, education, health care, traffic control, Internet and web
services, military, and government applications. Almost all applications demand computing
economics, web-scale data collection, system reliability, and scalable performance. For
example, distributed transaction processing is often practiced in the banking and finance
industry. Transactions represent 90 percent of the existing market for reliable banking
systems. Users must deal with multiple database servers in distributed transactions.
Maintaining the consistency of replicated transaction records is crucial in real-time banking
services. Other complications include lack of software support, network saturation, and
security threats in these applications.
Emerging technologies highlighted at the time included speech recognition, predictive analytics, and media tablets.
Communication Models:
H2H (Human-to-Human)
H2T (Human-to-Thing)
T2T (Thing-to-Thing)
Smart Earth Vision: IoT aims to create intelligent cities, clean energy, better healthcare, and
sustainable environments.
CPS (cyber-physical systems) integrates computation, communication, and control (3C) into a closed intelligent feedback loop between the physical world and the information world.
Features:
IoT vs. CPS: IoT focuses on networking everyday objects, while CPS focuses on virtual reality (VR) applications in the physical world.
CPS enhances automation, intelligence, and interactivity in physical
environments.
Technologies for Network-Based Systems
This part of the module covers the hardware, software, and network technologies used in distributed computing, focusing on the design of distributed operating systems that can handle massive parallelism.
Modern CPUs use multicore architecture (dual, quad, six, or more cores).
Instruction-Level Parallelism (ILP) and Thread-Level Parallelism (TLP) improve
performance.
• Moore’s Law holds for CPU growth, but clock rates are limited (~5 GHz max) due to heat and
power constraints.
Caches are small, fast storage layers closer to the processor than the main memory. A typical hierarchy includes the L1, L2, and often L3 caches backed by main memory, ordered from smallest and fastest to largest and slowest.
How it Works:
Data Locality: Frequently accessed data is stored in the caches, exploiting spatial and
temporal locality to speed up access.
Multilevel Approach: The cache hierarchy ensures faster access to data by maintaining
different levels of speed and size. L1 handles immediate needs, while L2 and L3 back it
up.
In GPUs:
GPUs also employ cache hierarchies, but their design is optimized for parallel workloads.
o Shared Memory: A fast, programmable memory shared among threads within a
core.
o L1 and L2 Caches: Designed to handle specific workloads like texture and global
memory accesses efficiently.
In both CPUs and GPUs, cache hierarchies reduce dependence on the slower main memory,
improving overall performance.
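The payoff from locality can be observed directly from software. Below is a small illustrative sketch (not from the text) that sums a large matrix twice in Python: traversing row by row follows the row-major memory layout and is cache-friendly, while traversing column by column strides through memory and typically runs noticeably slower.

    # Illustration of spatial locality: contiguous vs. strided access (assumes NumPy is installed).
    import time
    import numpy as np

    a = np.random.rand(4000, 4000)          # stored in row-major (C) order

    t0 = time.perf_counter()
    row_sum = sum(a[i, :].sum() for i in range(a.shape[0]))   # contiguous rows
    t1 = time.perf_counter()
    col_sum = sum(a[:, j].sum() for j in range(a.shape[1]))   # strided columns
    t2 = time.perf_counter()

    print(f"row-wise   : {t1 - t0:.3f} s")
    print(f"column-wise: {t2 - t1:.3f} s   # usually slower due to cache misses")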
CPUs may scale to hundreds of cores but face memory wall limitations.
GPUs (Graphics Processing Units) are designed for massive parallelism and data-
level parallelism (DLP).
x86-based processors dominate HPC and HTC systems.
Trend towards heterogeneous processors combining CPU and GPU cores on a
single chip.
Multithreading Technology
Each processor category employs different scheduling patterns and techniques to exploit
Instruction-Level Parallelism (ILP) and Thread-Level Parallelism (TLP).
Fine-grain and coarse-grain multithreading focus on reducing idle cycles, while
superscalar, CMP, and SMT enhance parallel execution in various ways.
Blank slots in the functional units represent inefficiency, with SMT generally showing
the least idle time.
4. Dual-Core (CMP) Processor:
o Contains two independent processing cores, each functioning as a two-way superscalar processor.
o Executes instructions from different threads on separate cores, fully exploiting thread-level parallelism (TLP).
5. Simultaneous Multithreaded (SMT) Processor:
o Issues instructions from multiple threads in the same cycle, filling otherwise idle functional-unit slots.
• GPUs were initially graphics accelerators, now widely used for HPC and AI.
This section highlights the interaction between a CPU and a GPU for parallel execution of
floating-point operations:
1. CPU as the Controller:
o The CPU, with its multicore architecture, has limited parallelism.
o It offloads floating-point kernel computations to the GPU for massive data
processing.
2. GPU as the Workhorse:
o The GPU features a many-core architecture with hundreds of simple processing cores working in parallel.
3. Key Process:
o The CPU instructs the GPU to perform large-scale computations.
o Efficient communication requires matching bandwidths between the on-board
main memory and the GPU's on-chip memory.
4. NVIDIA CUDA Framework:
o This process is commonly implemented using CUDA programming.
o Examples of GPUs for such tasks include NVIDIA's GeForce 8800, Tesla, and
Fermi GPUs.
The setup ensures that the computational workload is distributed effectively, leveraging the
strengths of both CPU and GPU architectures.
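As a concrete, hedged sketch of this offloading pattern, the fragment below uses the Numba library's CUDA support to launch a simple SAXPY kernel from Python; it assumes an NVIDIA GPU, a CUDA driver, and the numba package are available. The text's own examples use CUDA C directly, so treat this only as an illustration of the CPU-launches/GPU-computes workflow.

    # CPU prepares data and launches the kernel; the GPU executes one element per thread.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def saxpy(a, x, y, out):
        i = cuda.grid(1)                    # global thread index
        if i < x.shape[0]:
            out[i] = a * x[i] + y[i]        # floating-point work done on the GPU

    n = 1_000_000
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(x)

    threads_per_block = 256
    blocks = (n + threads_per_block - 1) // threads_per_block
    saxpy[blocks, threads_per_block](np.float32(2.0), x, y, out)   # offload to the GPU
    print(out[:4])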
Example: The NVIDIA Fermi GPU Chip with 512 CUDA Cores
The NVIDIA Fermi GPU architecture is designed for high performance, efficiency, and programmability.
Structure:
o Built with 16 streaming multiprocessors (SMs), each containing 32 CUDA cores,
for a total of 512 CUDA cores.
o Each SM supports parallel execution, enabling massive computational power.
Key Features:
o Optimized for floating-point operations and parallel processing.
o Allows for high memory bandwidth between on-chip GPU memory and external
device memory.
o Enhanced programmability using NVIDIA’s CUDA platform.
Power Efficiency:
GPUs are optimized for high throughput and parallel processing, with explicit on-chip
memory management.
CPUs are designed for latency optimization in caches and memory, limiting their
parallelism.
Performance/Power Ratio:
In 2010, GPUs achieved a performance of 5 Gflops/watt per core, far exceeding CPUs at
less than 1 Gflop/watt.
• Memory wall problem: CPU speed increases faster than memory access speed, creating a
performance gap.
• Hard drive capacity growth: 260 MB (1981) → 250 GB (2004) → 3 TB (2011), increasing 10×
every 8 years.
Storage and Networking Trends:
Small Clusters: Often built using a multiport Gigabit Ethernet switch and copper cables; suitable when large distributed storage is not needed.
Network performance grows 2× per year, surpassing Moore’s Law for CPUs.
Conventional Computers: A conventional computer runs a single OS image that tightly couples its software to a specific hardware platform.
Purpose of Virtualization: Virtual machines decouple software from the underlying hardware, providing virtual storage and networking and allowing resources to be used more flexibly.
Virtual Machines
Example: An x86 desktop running Windows OS as the host.
VM Architectures:
(a) Physical Machine: The bare hardware running its host OS directly, with no virtualization layer.
(b) Native (Bare-Metal) VM: A hypervisor (e.g., XEN) runs directly on the hardware in privileged mode; the guest OS can differ from the usual host OS (e.g., a Linux guest on a machine that otherwise runs Windows).
(c) Hosted VM: VMM runs in nonprivileged mode on a host OS. No need to modify the
host.
(d) Dual-Mode VM: Splits VMM functions between user level and supervisor level. May
require minor modifications to the host OS.
VM Primitive Operations
Multiplexing: VMs are distributed across multiple hardware machines for efficient
resource use.
Suspension: A VM can be paused and stored in stable storage for later use.
Provision (Resumption): A suspended VM can be resumed on a new hardware
platform.
Live Migration: A VM can be moved from one hardware platform to another without
disruption.
Benefits:
Hardware Independence: VMs can run on any available hardware platform, regardless
of the underlying OS or architecture.
Flexibility: Enables seamless porting of distributed applications.
Enhanced Efficiency: Consolidates multiple server functions on a single hardware
platform, reducing server sprawl.
Improved Utilization: VMware estimates server utilization increases from 5–15% to 60–
80% with this approach.
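The four primitive operations can be pictured as simple state transitions. The toy sketch below is purely illustrative (no real hypervisor API; the class and host names are made up) and models multiplexing, suspension, resumption on a new host, and live migration.

    # Conceptual model of VM primitive operations; not tied to any hypervisor.
    from dataclasses import dataclass

    @dataclass
    class VirtualMachine:
        name: str
        host: str = "host-A"                # "host-A" etc. are made-up host names
        state: str = "running"              # running | suspended

        def suspend(self):                  # pause and store the VM image in stable storage
            self.state = "suspended"

        def resume(self, new_host):         # provision (resume) on a possibly different platform
            self.host, self.state = new_host, "running"

        def live_migrate(self, new_host):   # move while the VM keeps running
            self.host = new_host

    vms = [VirtualMachine(f"vm{i}") for i in range(3)]   # multiplexing: several VMs share hosts
    vms[0].suspend(); vms[0].resume("host-B")
    vms[1].live_migrate("host-C")
    print([(v.name, v.state, v.host) for v in vms])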
Virtual Infrastructures
A virtual infrastructure dynamically maps system resources (compute, storage, and network) to the distributed applications that use them, separating hardware from software.
Benefits: This decoupling lowers costs and improves resource utilization and manageability.
Future Support:
Virtualization is pivotal for clusters, clouds, and grids, as discussed in later chapters.
Data Center Trends
Figures in this section show customer spending, server installation growth, and cost breakdown over the years.
Highlights the "virtualization management gap," emphasizing the growing importance of
virtualization technologies.
• Cloud Architecture: Uses commodity hardware (x86 processors, low-cost storage, Gigabit
Ethernet).
• Design Priorities: Cost-efficiency over raw performance, focusing on storage and energy
savings.
Data Centre Growth & Cost Breakdown
Uses commodity x86 servers and Ethernet networks instead of expensive hardware.
SOA, Web 2.0, and Mashups: Advances in web technologies drive cloud adoption.
• Data Deluge: Massive data from sensors, web, simulations, requiring advanced data
management.
• Multicore & GPU Clusters: Boost computational power for scientific research.
• Cloud Computing & Data Science Convergence: Revolutionizes computing architecture and
programming models.
Node Interconnection:
LAN switches connect hundreds of machines; WANs link local clusters into massive
systems.
Massive Systems: Highly scalable systems, classified as clusters, P2P networks, computing grids, and Internet clouds, which have shown impressive results in handling heavy workloads with large data sets.
Cluster Architecture
Cluster Architecture:
Built around a low-latency, high-bandwidth interconnection network, such as SAN (e.g.,
Myrinet) or LAN (e.g., Ethernet).
Hierarchical construction allows for scalable clusters using Gigabit Ethernet, Myrinet, or
InfiniBand switches.
The cluster is connected to the Internet via a virtual private network (VPN) gateway.
Node Management:
Most clusters have loosely coupled node computers, each managed by its own operating
system (OS).
This results in multiple system images, as nodes operate autonomously under different
OS controls.
Shared Resources: Cluster nodes generally do not physically share memory; sharing is achieved through message passing or middleware such as distributed shared memory (DSM).
• Middleware is essential for SSI, high availability (HA), and distributed shared memory (DSM).
• Benefits of clusters: Scalability, efficient message passing, fault tolerance, and job management
Over the past 30 years, there has been significant progression in computing services:
Internet services like Telnet allow local computers to connect to remote ones.
Web services like HTTP enable access to web pages stored on distant computers.
Computational Grids
A computational grid couples geographically distributed computers, databases, and instruments into a single wide-area infrastructure for shared computing.
P2P Systems
In a P2P system, each node acts as both a client and a server, sharing system
resources.
Peer machines are simply client computers connected via the Internet, with the
autonomy to join or leave freely.
No master-slave relationship, central coordination, or global database exists.
The system is self-organizing with distributed control.
P2P networks form ad hoc physical networks across Internet domains, using standard Internet (TCP/IP) protocols.
Overlay Networks
1. Unstructured: Random graph, uses flooding for queries, causing high traffic and
unpredictable searches.
2. Structured: Follows specific topology and rules, with routing mechanisms for better
efficiency.
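A minimal sketch of how a structured overlay assigns keys to peers is shown below. It uses consistent hashing on an identifier ring (the successor rule used by Chord-style DHTs); the peer names and keys are made-up examples, not from the text.

    # Consistent hashing on an identifier ring: a key is served by its clockwise successor.
    import hashlib
    from bisect import bisect_left

    def ring_id(name, bits=16):
        """Hash a peer name or data key onto a 2**bits identifier ring."""
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

    peers = ["peer-A", "peer-B", "peer-C", "peer-D"]      # hypothetical peers
    ring = sorted((ring_id(p), p) for p in peers)

    def locate(key):
        k = ring_id(key)
        idx = bisect_left(ring, (k, ""))     # first peer with ID >= key ID
        return ring[idx % len(ring)][1]      # wrap around the ring

    for key in ["song.mp3", "dataset.csv"]:
        print(key, "->", locate(key))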
P2P Application Families:
Collaboration Platforms: Instant messaging and collaboration applications (e.g., MSN, Skype).
Distributed Computing: Specific applications for computing power, such as
SETI@home with 25 Tflops over 3 million Internet hosts.
Application Platforms: Systems like JXTA, .NET, and FightingAID@home offer support
for naming, discovery, security, communication, and resource aggregation.
P2P systems face heterogeneity issues in hardware, software, and network requirements.
Data locality, network proximity, and interoperability are key design objectives in P2P
applications.
Routing efficiency, fault tolerance, failure management, and load balancing significantly
impact performance.
Trust issues exist due to security, privacy, and copyright concerns among peers.
P2P networks improve robustness by avoiding a single point of failure and replicating
data across nodes.
Lack of centralization makes management complex.
Security risks arise as any client can potentially cause damage or abuse.
P2P networks are suitable for low-security applications without sensitive data.
IT trends are moving computing and data from desktops to large data centers.
Cloud computing emerged to handle the data explosion, offering on-demand software,
hardware, and data as a service.
Internet Clouds
Cloud Computing: Delivers on-demand hardware, software, and data as services over the Internet. Three service models are common:
1. Infrastructure as a Service (IaaS): Provides servers, storage, networks, and data center
resources. Users deploy and run VMs but don’t manage the cloud infrastructure.
2. Platform as a Service (PaaS): Enables deployment of user-built applications on a cloud
platform. Includes middleware, databases, tools, and APIs.
3. Software as a Service (SaaS): Browser-based software for business processes (CRM,
ERP, HR, etc.), eliminating upfront costs for users and reducing hosting costs for
providers.
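In practice, IaaS is consumed through a provider's API or SDK. The sketch below uses the AWS boto3 SDK to request a single virtual machine; it assumes AWS credentials are already configured, and the image ID is a placeholder rather than a real AMI.

    # IaaS in practice: the user requests a VM; the provider manages the physical infrastructure.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",    # placeholder machine image, not a real AMI
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])   # the user manages the VM, not the hardware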
Cloud Deployment Models: Private, public, managed, and hybrid clouds, each with differing security implications.
Cloud Advantages: On-demand access to hardware, software, and data services with reduced upfront costs for users and lower hosting costs for providers.
Service-Oriented Architecture (SOA)
Entity interfaces include WSDL, Java methods, and CORBA IDL for distributed systems.
High-level communication systems use SOAP, RMI, and IIOP for RPC, fault recovery,
and routing.
Middleware like WebSphere MQ or JMS supports routing and virtualization.
Fault tolerance is achieved via WSRM, adapting OSI layer features for entity
abstractions.
Security is implemented using OSI concepts like IPsec and secure sockets.
Higher-level services provide registries, metadata, and entity management.
Discovery tools like JNDI, UDDI, LDAP, and ebXML enable service discovery.
Management services include CORBA Life Cycle, Enterprise JavaBeans models, and
Jini’s lifetime model.
Shared memory models simplify information exchange.
Distributed systems improve performance using multiple CPUs and support software
reuse.
Older approaches like CORBA are being replaced by SOAP, XML, and REST in modern
systems.
Loose coupling and heterogeneous implementations make services more attractive than
distributed objects.
Service architectures include web services and REST systems, each with distinct
approaches to interoperability.
REST Systems: Favor simple, stateless interfaces in which each request carries all the information needed to process it (illustrated in the sketch just below).
In CORBA and Java, distributed entities are linked via RPCs.
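The REST style can be illustrated with a single stateless HTTP call. The sketch below uses the Python requests library against a hypothetical resource URL; the endpoint is not a real service.

    # A stateless REST-style GET: each request carries everything the server needs.
    import requests

    resp = requests.get("https://api.example.com/sensors/42", timeout=5)   # hypothetical endpoint
    print(resp.status_code)
    if resp.ok:
        print(resp.json())                  # resource representation returned to the client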
Sensors represent entities producing data, while grids and clouds handle multiple
message-based inputs and outputs.
Higher-level service composition supports business models such as two-phase transactions.
Workflow standards and approaches include BPEL, Pegasus, Taverna, Kepler, Trident,
and Swift.
Grids use static resources, while clouds focus on elastic resources.
Key difference: Dynamic resource allocation via virtualization and autonomic
computing.
Grids can be built from multiple clouds, offering better support for negotiated resource
allocation.
These combinations result in systems like clouds of clouds, grids of clouds, clouds of
grids, or inter-clouds in SOA architectures.
Distributed systems are loosely coupled, with multiple system images due to
independent operating systems on node machines.
A distributed OS promotes resource sharing and efficient communication among nodes.
Such a system is typically a closed system, relying on message passing and Remote
Procedure Calls (RPCs) for internode communication.
A distributed OS is essential for improving the performance, efficiency, and flexibility
of distributed applications.
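As an illustration of the RPC idea (not the mechanism of any particular distributed OS), the sketch below uses Python's standard xmlrpc module: a function registered on one node can be invoked from another as if it were local.

    # Server node: expose a function over RPC.
    from xmlrpc.server import SimpleXMLRPCServer

    def add(x, y):
        return x + y

    server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
    server.register_function(add, "add")    # remote callers invoke "add" by name
    # server.serve_forever()                # uncomment on the server node

    # Client node (run separately); "server-node" is a placeholder hostname:
    # from xmlrpc.client import ServerProxy
    # proxy = ServerProxy("http://server-node:8000/")
    # print(proxy.add(2, 3))                # the call is marshalled into a network message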
Network OS: Built over heterogeneous OS platforms, offers the lowest transparency,
and relies on file sharing for communication.
Middleware: Provides limited resource sharing, similar to MOSIX/OS for clustered
systems.
Truly Distributed OS: Achieves high resource usage and system transparency.
MOSIX2 (a distributed OS for managing resources in Linux clusters):
Manages Linux clusters or grids of multiple clusters with resource discovery and
process migration.
Allows flexible grid management, enabling resource sharing among cluster owners.
MOSIX-enabled grids can scale indefinitely, provided there is trust among cluster
owners.
It is being applied to manage resources in Linux clusters, GPU clusters, grids, and
clouds (when VMs are used).
Four programming models for distributed computing aim for scalable performance and
application flexibility.
These models are designed to support clusters, grid systems, and P2P systems with enhanced web services and utility computing applications.
MPI (Message Passing Interface): A library of subprograms, callable from C or FORTRAN, that serves as the primary low-level standard for writing message-passing parallel programs on distributed systems.
PVM (Parallel Virtual Machine): Another low-level primitive for distributed
programming.
Both MPI and PVM are discussed in detail by Hwang and Xu [28].
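A minimal message-passing sketch using the mpi4py binding is shown below (assuming an MPI runtime is installed; launch with something like mpiexec -n 4 python mpi_demo.py, where the file name is arbitrary).

    # Explicit message passing between processes, in the MPI style.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()                  # this process's ID
    size = comm.Get_size()                  # total number of processes

    if rank == 0:
        for worker in range(1, size):
            comm.send({"task": worker}, dest=worker, tag=0)
    else:
        task = comm.recv(source=0, tag=0)
        print(f"rank {rank} of {size} received {task}")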
Hadoop Library
Hadoop, originally developed by a Yahoo! group, is a software platform for writing and
running applications over vast distributed data.
It can scale easily to store and process petabytes of web data.
Hadoop is economical, offering an open-source version of MapReduce that reduces
overhead.
It is efficient, achieving high parallelism across many commodity nodes.
It is reliable, automatically maintaining multiple data copies for redeployment after
unexpected system failures.
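One common way to write such applications is Hadoop Streaming, where any executable that reads stdin and writes stdout can act as the mapper or reducer. The word-count sketch below is illustrative only; the file names are arbitrary, and the job would be submitted with the hadoop-streaming JAR shipped with the installation.

    # mapper.py -- emit (word, 1) pairs, one per line (file name is arbitrary)
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py -- Hadoop sorts mapper output by key before the reducer sees it
    import sys
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")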
Globus is a middleware library jointly developed by the U.S. Argonne National Laboratory and the USC Information Sciences Institute.
Implements OGSA standards for resource discovery, allocation, and security in grid
environments.
Supports multisite mutual authentication using PKI certificates.
The current version, GT 4, has been in use since 2008.
IBM has extended Globus for business applications.
Performance Metrics
Processor and network performance are estimated using metrics like CPU speed in
MIPS and network bandwidth in Mbps.
System throughput is measured in MIPS, Tflops, or TPS.
Additional performance measures include job response time and network latency.
A preferred interconnection network has low latency and high bandwidth.
System overhead factors include OS boot time, compile time, I/O data rate, and
runtime support system.
Other performance metrics involve QoS for Internet/web services, system availability
and dependability, and security resilience against network attacks.
Dimensions of Scalability
Scalable performance requires backward compatibility with existing hardware and
software.
Overdesign can be cost-ineffective, with scaling influenced by practical factors.
Size scalability involves adding processors, cache, memory, storage, or I/O channels to achieve higher performance or more functionality.
Application scalability aligns problem size with machine size for efficiency and cost-
effectiveness, often by enlarging problem size.
Amdahl’s Law
Amdahl's Law explains how much faster a program can run when split across multiple
processors.
Some part of the program (the sequential bottleneck) must always run on a single processor and cannot be parallelized; this fraction is denoted α.
The remaining fraction (1 − α) of the program can run in parallel across n processors.
The speedup on n processors is therefore S = 1 / (α + (1 − α)/n).
As more processors are added, the speedup increases, but only up to a limit: as n grows very large, S approaches 1/α, so the maximum speedup occurs when α is zero.
For example, if 25% of the code cannot be parallelized (α = 0.25), the best speedup achievable is 1/0.25 = 4, even with hundreds of processors.
In Amdahl's Law, the assumption is that the workload remains the same for both sequential and parallel execution, with a fixed problem size or data set. This is known as fixed-workload speedup.
The system efficiency E is defined as the speedup S divided by the number of processors n:
E = S/n = 1 / (αn + (1 − α))
In many cases, especially with large clusters, the system efficiency becomes quite low. For instance, when executing the program (with α = 0.25) on 256 processors, the efficiency is:
E = 1 / (0.25 × 256 + 0.75) ≈ 0.015, or about 1.5%.
This low efficiency occurs because only a few processors (e.g., 4) are actively used, while the
majority of the processors remain idle. As the cluster size increases, the parallel processing
becomes less efficient, mainly due to the sequential bottleneck (the part of the code that can't
be parallelized).
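A quick check of these numbers (my own calculation, following the formulas above):

    # Fixed-workload (Amdahl) speedup and efficiency.
    def amdahl_speedup(alpha, n):
        return 1.0 / (alpha + (1.0 - alpha) / n)

    alpha, n = 0.25, 256
    s = amdahl_speedup(alpha, n)
    print(f"speedup on {n} processors: {s:.2f}")   # ~3.95, approaching the limit 1/alpha = 4
    print(f"efficiency: {s / n:.3%}")              # ~1.5% -- most processors sit idle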
Gustafson’s Law
To improve efficiency when using large clusters, scaled-workload speedup is used, which
scales the problem size to match the cluster's capabilities. This approach, proposed by John
Gustafson (1988), adjusts the workload based on the number of processors.
Scaled-Workload Speedup:
Let W be the original workload. When using n processors, the workload is scaled to
W' = αW + (1 − α)nW,
where only the parallelizable portion of the workload is scaled by n.
The scaled-workload speedup is S' = W'/W = α + (1 − α)n, and the efficiency E' on n processors is:
E' = S'/n = α/n + (1 − α)
For example, if the workload is scaled for a 256-node cluster with α = 0.25, the efficiency is:
E' = 0.25/256 + 0.75 ≈ 0.751
When to Apply Amdahl's vs. Gustafson's Law:
Amdahl’s Law should be used for fixed workloads, where the problem size doesn't change.
Gustafson’s Law should be used when the workload is scaled according to the number of
processors, as it leads to better efficiency for large clusters.
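The contrast between the two laws for the same 256-node, α = 0.25 example can be verified with a short calculation (my own illustration of the formulas above):

    # Efficiency under fixed vs. scaled workloads.
    def fixed_workload_efficiency(alpha, n):       # Amdahl: E = 1 / (alpha*n + (1 - alpha))
        return 1.0 / (alpha * n + (1.0 - alpha))

    def scaled_workload_efficiency(alpha, n):      # Gustafson: E' = alpha/n + (1 - alpha)
        return alpha / n + (1.0 - alpha)

    alpha, n = 0.25, 256
    print(f"fixed workload : {fixed_workload_efficiency(alpha, n):.3%}")   # ~1.5%
    print(f"scaled workload: {scaled_workload_efficiency(alpha, n):.3%}")  # ~75.1%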