CC Module-1 Notes
Module-1
Distributed System Models and Enabling Technologies:
Scalable Computing Over the Internet, Technologies for Network Based Systems, System
Models for Distributed and Cloud Computing, Software Environments for Distributed Systems
and Clouds, Performance, Security and Energy Efficiency.
Textbook1: Chapter1:1.1 to 1.5
Grids enable access to shared computing power and storage capacity from your desktop.
Clouds enable access to leased computing power and storage capacity from your desktop.
• Grids are an open source technology. Resource users and providers alike can understand and
contribute to the management of their grid
• Clouds are a proprietary technology. Only the resource provider knows exactly how their
cloud manages data, job queues, security requirements and so on.
• The concept of grids was proposed in 1995. The Open Science Grid (OSG) started in 1995. The EDG (European Data Grid) project began in 2001.
• In the late 1990s, Oracle and EMC offered early private cloud solutions; however, the term cloud computing did not gain prominence until 2007.
o The speed of high-performance computing (HPC) applications is no longer optimal for measuring system performance.
• The emergence of computing clouds instead demands high-throughput computing (HTC) systems built with parallel and distributed computing technologies.
• We have to upgrade data centers using fast servers, storage systems, and high-bandwidth
networks.
• From 1950 to 1970, a handful of mainframes, including the IBM 360 and CDC 6400, were built to satisfy the demands of large businesses and government organizations.
Instead of using a centralized computer to solve computational problems, a parallel and distributed
computing system uses multiple computers to solve large-scale problems over the Internet. Thus,
distributed computing becomes data-intensive and network-centric.
o From 1980 to 2000, massive numbers of portable computers and pervasive devices appeared in
both wired and wireless applications
o Since 1990, the use of both HPC and HTC systems hidden in clusters, grids, or Internet clouds has
proliferated
High-Performance Computing (HPC) and High-Throughput Computing (HTC) have evolved
significantly, driven by advances in clustering, P2P networks, and cloud computing.
HPC Evolution:
o Traditional supercomputers (MPPs) are being replaced by clusters of cooperative
computers for better resource sharing.
o HPC has focused on raw speed performance, progressing from Gflops (1990s) to
Pflops (2010s).
o P2P networks facilitate distributed file sharing and content delivery using
globally distributed client machines.
o HTC applications dominate areas like Internet searches and web services for
millions of users.
Market Shift from HPC to HTC:
o HTC systems address challenges beyond speed, including cost, energy efficiency,
security, and reliability.
Emerging Paradigms:
o Advances in virtualization have led to the rise of Internet clouds, enabling service-
oriented computing.
o Technologies like RFID, GPS, and sensors are fueling the growth of the Internet
of Things (IoT).
The transition from HPC to HTC marks a strategic shift in computing paradigms, focusing on
scalability, efficiency, and real-world usability over pure processing power.
Computing Paradigm Distinctions
Centralized computing
A computing paradigm where all computer resources are centralized in a single physical
system. In this setup, processors, memory, and storage are fully shared and tightly integrated
within one operating system. Many data centers and supercomputers operate as centralized
systems, but they are also utilized in parallel, distributed, and cloud computing applications.
• Parallel computing
In parallel computing, processors are either tightly coupled with shared memory or loosely
coupled with distributed memory. Communication occurs through shared memory or message
passing. A system that performs parallel computing is a parallel computer, and the programs
running on it are called parallel programs. Writing these programs is referred to as parallel
programming.
• Distributed computing studies distributed systems, which consist of multiple autonomous
computers with private memory communicating through a network via message passing.
Programs running in such systems are called distributed programs, and writing them is known
as distributed programming.
Cloud computing refers to a system of Internet-based resources that can be either centralized
or distributed. It uses parallel, distributed computing, or both, and can be established with
physical or virtualized resources over large data centers. Some regard cloud computing as a
form of utility computing or service computing. Alternatively, terms such as concurrent
computing or concurrent programming are used within the high-tech community, typically
referring to the combination of parallel and distributed computing, although interpretations vary among practitioners.
• Ubiquitous computing refers to computing with pervasive devices at any place and time
using wired or wireless communication. The Internet of Things (IoT) is a networked
connection of everyday objects including computers, sensors, humans, etc. The IoT is
supported by Internet clouds to achieve ubiquitous computing with any object at any place
and time. Finally, the term Internet computing is even broader and covers all computing
paradigms over the Internet. This book covers all the aforementioned computing paradigms,
placing more emphasis on distributed and cloud computing and their working systems,
including the clusters, grids, P2P, and cloud systems.
Internet of Things The traditional Internet connects machines to machines or web pages to web
pages. The concept of the IoT was introduced in 1999 at MIT.
• The IoT refers to the networked interconnection of everyday objects, tools, devices, or computers.
One can view the IoT as a wireless network of sensors that interconnect all things in our daily life.
• It allows objects to be sensed and controlled remotely across existing network infrastructure
Key design objectives of HPC and HTC systems:
1. Efficiency – Maximizing resource utilization for HPC and job throughput for HTC.
2. Dependability – Ensuring reliability and quality of service (QoS), even under failure conditions.
3. Adaptation in the programming model – Supporting billions of job requests over massive data sets and virtualized cloud resources under various workload and service models.
4. Flexibility – Enabling HPC applications (scientific and engineering) and HTC applications
(business and cloud services) to run efficiently in distributed environments.
The future of distributed computing depends on scalable, efficient, and flexible architectures that
can meet the growing demand for computational power, throughput, and energy efficiency.
Scalable Computing Trends and New Paradigms
Technology trends such as Moore's law (processor performance doubling roughly every 18 months) and Gilder's law (network bandwidth doubling each year) have shaped modern computing. The increasing affordability of commodity hardware has also fueled the growth of large-scale distributed systems.
Degrees of Parallelism
Parallelism in computing has evolved from:
Bit-Level Parallelism (BLP) – Transition from serial to word-level processing.
Instruction-Level Parallelism (ILP) – Executing multiple instructions simultaneously through pipelining, superscalar execution, VLIW, and multithreading.
Data-Level Parallelism (DLP) – Applying the same operation across many data elements (SIMD and vector instructions).
Task-Level Parallelism (TLP) – Running independent tasks concurrently on multicore processors.
Domain – Applications
Science & Engineering – Weather forecasting, genomic analysis
Business, education, services industry, and health care – E-commerce, banking, stock exchanges
Internet and web services, and government applications – Cybersecurity, digital governance, traffic monitoring
Mission-Critical Systems – Military, crisis management
HTC systems prioritize task throughput over raw speed, addressing challenges like cost, energy
efficiency, security, and reliability.
Utility computing delivers computing resources as a paid, on-demand service. Cloud computing extends this concept, allowing distributed applications to run on edge networks.
Challenges include:
Virtualization middleware
New programming models
New technologies follow a hype cycle, progressing through a technology trigger, a peak of inflated expectations, a trough of disillusionment, a slope of enlightenment, and finally a plateau of productivity.
IoT: Interconnects everyday objects (sensors, RFID, GPS) to enable real-time tracking and
automation.
CPS: Merges computation, communication, and control (3C) to create intelligent systems
for virtual and physical world interactions.
Both IoT and CPS will play a significant role in future cloud computing and smart
infrastructure development.
Modern multicore processors integrate dual, quad, six, or more processing cores to
enhance parallelism at the instruction level (ILP) and task level (TLP).
Processor speed growth has followed Moore’s Law, increasing from 1 MIPS (VAX 780, 1978) to 22,000 MIPS (Sun Niagara 2, 2008) and 159,000 MIPS (Intel Core i7 990x, 2011).
Clock rates have increased from 10 MHz (Intel 286) to 4 GHz (Pentium 4) but have since stalled, limited by power consumption and heat dissipation.
Multicore processors house multiple processing units, each with private L1 cache and
shared L2/L3 cache for efficient data access.
Many-core GPUs (e.g., NVIDIA and AMD architectures) leverage hundreds to thousands
of cores, excelling in data-level parallelism (DLP) and graphics processing.
Example: Sun Niagara II – Built with eight cores, each supporting eight threads,
achieving a maximum parallelism of 64 threads.
Modern distributed computing systems rely on scalable multicore architectures and high-speed
networks to handle massive parallelism, optimize efficiency, and enhance overall performance.
Multicore CPUs continue to evolve from tens to hundreds of cores, but they face
challenges like the memory wall problem, limiting data-level parallelism (DLP).
Many-core GPUs, with hundreds to thousands of lightweight cores, excel in DLP and
task-level parallelism (TLP), making them ideal for massively parallel workloads.
Hybrid architectures are emerging, combining fat CPU cores and thin GPU cores on a
single chip for optimal performance.
GPUs exploit massive parallelism at both the data level (DLP) and task level (TLP):
o Modern GPUs (e.g., NVIDIA CUDA, Tesla, and Fermi) feature hundreds of cores,
handling thousands of concurrent threads.
Example: The NVIDIA Fermi GPU has 512 CUDA cores and delivers 82.4 teraflops,
contributing to the performance of top supercomputers like Tianhe-1A.
GPU vs. CPU Performance and Power Efficiency
GPUs prioritize throughput, while CPUs optimize latency using cache hierarchies.
Power efficiency is a key advantage of GPUs – GPUs consume roughly 1/10th of the power per instruction compared to CPUs.
Four major challenges have been identified for future exascale systems:
1. Energy and Power – Reducing the power consumed per operation and in data movement.
2. Memory and Storage – Closing the gap between processor speed and memory/storage bandwidth.
3. Concurrency and Locality – Improving software and compiler support for parallel
execution.
4. System Resiliency – Ensuring fault tolerance in large-scale computing environments.
The shift towards hybrid architectures (CPU + GPU) and the rise of power-aware computing
models will drive the next generation of HPC, HTC, and cloud computing systems.
Memory Technology
DRAM capacity has increased 4x every three years (from 16 KB in 1976 to 64 GB in
2011).
Memory access speed has not kept pace, causing the memory wall problem, where CPUs
outpace memory access speeds.
Solid-State Drives (SSDs) provide significant speed improvements and durability (300,000
to 1 million write cycles per block).
Ethernet speeds have evolved from 10 Mbps (1979) to 100 Gbps (2011), with 1 Tbps
links expected in the future.
• First, the VMs can be multiplexed between hardware machines, as shown in Figure 1.13(a).
• Second, a VM can be suspended and stored in stable storage, as shown in Figure 1.13(b).
• Third, a suspended VM can be resumed or provisioned to a new hardware platform, as shown in Figure 1.13(c).
Suspension & Migration – VMs can be paused, saved, or migrated across different servers.
Provisioning – VMs can be dynamically deployed based on workload demand.
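These operations can be sketched with the libvirt Python bindings, shown below as a minimal illustration (the library choice, the domain name "vm1", and the save path are assumptions of this example, not part of the notes):

    # Minimal sketch (assumed libvirt/QEMU setup): suspend, resume, save, and restore a VM.
    import libvirt

    conn = libvirt.open("qemu:///system")      # connect to a local hypervisor
    dom = conn.lookupByName("vm1")             # "vm1" is a hypothetical VM name

    dom.suspend()                              # pause the VM on its current host
    dom.resume()                               # resume execution on the same host

    dom.save("/tmp/vm1.state")                 # suspend the VM into stable storage
    conn.restore("/tmp/vm1.state")             # provision/resume it again from the saved state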
Virtual Infrastructure
Separates physical hardware from applications, enabling flexible resource management.
60% of data center costs go toward maintenance and management, emphasizing energy
efficiency over raw performance.
A low-cost design philosophy (commodity switches and x86 servers) helps reduce data center costs.
Cloud computing & parallel computing address the data deluge challenge.
MapReduce & Iterative MapReduce enable scalable data processing for big data and
machine learning applications.
The integration of memory, storage, networking, virtualization, and cloud data centers is
transforming distributed systems. By leveraging virtualization, scalable networking, and cloud
computing, modern infrastructures achieve higher efficiency, flexibility, and cost-effectiveness,
paving the way for future exascale computing.
Grids: Interconnect multiple clusters via WANs, allowing resource sharing across
thousands of computers.
P2P Networks: Form decentralized, cooperative networks with millions of nodes, used in
file sharing and content distribution.
Cluster Architecture
Server Clusters and System Models for Distributed Computing
1.3.1 Server Clusters and Interconnection Networks
Server clusters consist of multiple interconnected computers using high-bandwidth, low-latency
networks like Storage Area Networks (SANs), Local Area Networks (LANs), and InfiniBand.
These clusters are scalable, allowing thousands of nodes to be connected hierarchically.
Clusters are connected to the Internet via a VPN gateway, which assigns an IP address to
locate the cluster.
Each node operates independently, with its own OS, creating multiple system images
(MSI).
The cluster manages shared I/O devices and disk arrays, providing efficient resource
utilization.
Middleware supports features like high availability (HA), distributed shared memory
(DSM), and job scheduling.
Virtual clusters can be dynamically created using virtualization, optimizing resource
allocation on demand.
Cluster design issues can be characterized by their features, functional characterization, and feasible implementations.
Key challenges include efficient message passing, seamless fault tolerance, high
availability, and performance scalability.
Server clusters are scalable, high-performance computing systems that utilize networked
computing nodes for parallel and distributed processing. Achieving SSI and efficient
middleware support remains a key challenge in cluster computing. Virtual clusters and cloud
computing are evolving to enhance cluster flexibility and resource management.
1.3.2 Grid Computing Infrastructures
Used for scientific and enterprise applications, including SETI@Home and astrophysics
simulations.
Provides an integrated resource pool, enabling shared computing, data, and information
services.
1.3.3 Peer-to-Peer (P2P) Networks
Peers form an overlay network – a virtual network of logical links on top of the physical network. Two types of overlay exist:
o Unstructured overlays – Randomly connected peers, requiring flooding for data
retrieval (high traffic).
o Structured overlays – Use predefined rules for routing and data lookup, improving
efficiency.
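As a toy illustration of structured overlays (an assumption of this example; no specific protocol is named in the notes), the sketch below uses a consistent-hashing ring: every key is routed to the first peer clockwise from its hash, so lookups are deterministic and need no flooding.

    # Toy structured overlay: route each key to the first peer clockwise on a hash ring.
    import hashlib
    from bisect import bisect_right

    def ring_hash(value: str) -> int:
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    peers = ["peerA", "peerB", "peerC", "peerD"]           # hypothetical peer IDs
    ring = sorted((ring_hash(p), p) for p in peers)        # peer positions on the ring
    points = [pos for pos, _ in ring]

    def lookup(key: str) -> str:
        """Return the peer responsible for key (predefined rule, no flooding)."""
        i = bisect_right(points, ring_hash(key)) % len(ring)
        return ring[i][1]

    print(lookup("song.mp3"))   # always resolves to the same peer for the same key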
1.3.3.3 P2P Application Families
Distributed P2P computing – Example systems: SETI@Home, Genome@Home; key problems: security vulnerabilities and selfish nodes.
Security & Privacy – No central control means increased risk of data breaches and
malware.
P2P networks offer robust and decentralized computing, but lack security and reliability,
making them suitable only for low-security applications like file sharing and collaborative tools.
Both grid computing and P2P networks provide scalable, distributed computing models. While
grids are used for structured, high-performance computing, P2P networks enable
decentralized, user-driven resource sharing. Future developments will focus on security,
standardization, and efficiency improvements.
1.3.4.1 Internet Clouds
It offers elastic, scalable, and self-recovering computing power through server clusters
and large databases.
The cloud can be perceived as either a centralized resource pool or a distributed
computing platform.
Infrastructure as a Service (IaaS):
o Provides computing infrastructure such as virtual machines (VMs), storage, and
networking.
o Users deploy and manage their applications but do not control the underlying
infrastructure.
o Examples: Amazon EC2, Google Compute Engine.
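As a hedged illustration of the IaaS model, the sketch below requests a virtual machine from Amazon EC2 using the boto3 SDK (the SDK choice, region, AMI ID, and instance type are placeholder assumptions; real use needs valid AWS credentials):

    # Sketch: provisioning a VM from an IaaS provider (Amazon EC2 via boto3).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")   # region chosen for illustration
    response = ec2.run_instances(
        ImageId="ami-12345678",     # placeholder machine image
        InstanceType="t3.micro",    # placeholder instance size
        MinCount=1,
        MaxCount=1,
    )
    # The user manages the resulting VM; the provider manages the underlying hardware.
    print(response["Instances"][0]["InstanceId"])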
Hybrid Cloud – Combines public and private clouds, optimizing cost and security.
6. Enhanced service and data discovery for content and service distribution.
7. Security and privacy improvements, though challenges remain.
Cloud computing fundamentally changes how applications and services are developed, deployed,
and accessed. With virtualization, scalability, and cost efficiency, it has become the backbone of
modern Internet services and enterprise computing. Future advancements will focus on
security, resource optimization, and hybrid cloud solutions.
1.4 Software Environments for Distributed Systems and Clouds
This section introduces Service-Oriented Architecture (SOA) and other key software
environments that enable distributed and cloud computing systems. These environments define
how applications, services, and data interact within grids, clouds, and P2P networks.
SOA enables modular, scalable, and reusable software components that communicate over a network using standard protocols.
Middleware tools (e.g., WebSphere MQ, Java Message Service) manage messaging,
security, and fault tolerance.
Web Services provide structured, standardized communication but face challenges in
protocol agreement and efficiency.
REST is flexible and scalable, better suited for fast-evolving environments.
Integration of Services – Distributed systems use Remote Method Invocation (RMI) or
RPCs to link services into larger applications.
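To make the RPC idea concrete, here is a minimal sketch using Python's standard xmlrpc modules (the notes mention RMI/RPC in general; this particular library is an assumption of the example):

    # Server: expose a local function as a remote procedure.
    from xmlrpc.server import SimpleXMLRPCServer

    def add(a, b):
        return a + b

    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_function(add, "add")
    # server.serve_forever()   # commented out so this sketch does not block

    # Client: invoke the remote procedure as if it were a local call.
    import xmlrpc.client
    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    # print(proxy.add(2, 3))   # prints 5 once the server is running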
SOA has expanded from basic web services to complex multi-layered ecosystems:
Sensor Services (SS) – Devices like ZigBee, Bluetooth, GPS, and WiFi collect raw data.
Filter Services (FS) – Process data before feeding into computing, storage, or discovery
clouds.
Cloud Ecosystem – Integrates compute clouds, storage clouds, and discovery clouds for
managing large-scale applications.
SOA enables data transformation from raw data → useful information → knowledge →
wisdom → intelligent decisions.
SOA defines the foundation for web services, distributed systems, and cloud computing. By
integrating sensors, processing layers, and cloud resources, SOA provides a scalable, flexible
approach for modern computing applications. The future of distributed computing will rely on
intelligent data processing, automation, and service-driven architectures.
Grids use static resources, whereas clouds provide elastic, on-demand resources via
virtualization.
Clouds focus on automation and scalability, while grids are better for negotiated
resource allocation.
Hybrid models exist, such as clouds of grids, grids of clouds, and inter-cloud
architectures.
Traditional distributed systems run independent OS instances on each node, resulting in multiple
system images. A distributed OS manages all resources coherently and efficiently across nodes.
1. Network OS – Each node runs its own OS; remote resources are accessed explicitly, offering the lowest transparency.
2. Middleware-based OS – A middleware layer on top of each node's OS provides limited resource sharing.
3. Truly Distributed OS – Provides single-system image (SSI) with full transparency across
resources.
Amoeba (microkernel approach) offers a lightweight distributed OS model.
Users can switch between OS platforms and cloud services without being locked into
specific applications.
1.4.3.1 Message-Passing Interface (MPI)
Used for high-performance computing (HPC).
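A minimal message-passing sketch, assuming the mpi4py Python bindings (the binding choice and file name are assumptions; the notes only name MPI itself). Every process contributes a partial value and rank 0 collects the sum:

    # Run with, e.g.: mpiexec -n 4 python mpi_sum.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()       # this process's ID
    size = comm.Get_size()       # total number of processes

    local_value = rank + 1       # each process computes its own partial result
    total = comm.reduce(local_value, op=MPI.SUM, root=0)   # explicit message passing

    if rank == 0:
        print(f"Sum over {size} processes = {total}")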
1.4.3.2 MapReduce
A web programming model for scalable data processing over large clusters and data sets. Google executes thousands of MapReduce jobs daily for large-scale data analysis.
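Below is a pure-Python word-count sketch of the MapReduce idea (a toy model, not Google's or Hadoop's actual implementation): map emits (word, 1) pairs, a shuffle step groups them by key, and reduce sums each group.

    # Toy MapReduce: word count over a small in-memory "data set".
    from collections import defaultdict

    documents = ["cloud computing", "cloud storage", "distributed computing"]

    # Map phase: emit (key, value) pairs.
    pairs = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle phase: group values by key.
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)

    # Reduce phase: aggregate each group.
    counts = {word: sum(values) for word, values in groups.items()}
    print(counts)   # {'cloud': 2, 'computing': 2, 'storage': 1, 'distributed': 1}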
1.4.3.3 Hadoop
An open-source implementation of MapReduce that provides the Hadoop Distributed File System (HDFS) for scalable storage and processing of massive data sets.
Distributed OS models are evolving, with MOSIX2 enabling process migration and
resource sharing across Linux clusters.
Scalability is a fundamental requirement of distributed systems and has multiple dimensions:
1. Size Scalability – Expanding system resources (e.g., processors, memory, storage) to
improve performance.
2. Software Scalability – Upgrading OS, compilers, and libraries to accommodate larger
systems.
3. Application Scalability – Increasing problem size to match system capacity for cost-
effectiveness.
Speedup Formula: S = 1 / (α + (1 - α)/n)
where α is the fraction of the workload that is sequential and n is the number of processors.
Even with hundreds of processors, speedup is limited if sequential execution (α) is high.
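Worked example (illustrative numbers): with a sequential fraction α = 0.25 and n = 256 processors, S = 1 / (0.25 + 0.75/256) ≈ 3.9, so even 256 processors give less than a 4x speedup.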
Problem with Fixed Workload
In Amdahl’s law, we have assumed the same amount of workload for both sequential and parallel
execution of the program with a fixed problem size or data set. This was called fixed-workload
speedup by Hwang and Xu [14]. To execute a fixed workload on n processors, parallel processing
may lead to a system efficiency defined as follows:
Efficiency: E = S / n = 1 / (α n + 1 - α)
1.5.1.6 Gustafson’s Law (Scaled-Workload Speedup)
Instead of fixing workload size, this model scales the problem to match available
processors.
Speedup Formula: S' = α + (1 - α) n
This speedup is known as Gustafson’s law. By fixing the parallel execution time at level W, the following efficiency expression is obtained:
E' = S' / n = α / n + (1 - α)
More efficient for large clusters, as workload scales dynamically with system size.
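A short numeric comparison of the two laws (illustrative values only; a sketch, not part of the notes):

    # Compare fixed-workload (Amdahl) and scaled-workload (Gustafson) speedups.
    def amdahl(alpha: float, n: int) -> float:
        return 1.0 / (alpha + (1.0 - alpha) / n)

    def gustafson(alpha: float, n: int) -> float:
        return alpha + (1.0 - alpha) * n

    alpha, n = 0.05, 1024        # 5% sequential work, 1,024 processors (example values)
    print(f"Amdahl:    {amdahl(alpha, n):.1f}x")    # about 19.6x - fixed workload saturates
    print(f"Gustafson: {gustafson(alpha, n):.0f}x") # about 973x - workload scales with n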
System availability depends on Mean Time to Failure (MTTF) and Mean Time to
Repair (MTTR): Availability = MTTF / (MTTF + MTTR) (see the worked example after this list).
Eliminating single points of failure (e.g., hardware redundancy, fault isolation) improves
availability.
P2P networks are highly scalable but have low availability due to frequent peer failures.
Grids and clouds offer better fault isolation and thus higher availability than traditional
clusters.
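Worked example (illustrative numbers): if MTTF = 1,000 hours and MTTR = 10 hours, then Availability = 1000 / (1000 + 10) ≈ 0.99, i.e., about 99%, which still corresponds to roughly 87 hours of downtime per year.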
Amdahl’s Law limits speedup for fixed workloads, while Gustafson’s Law optimizes
large-scale computing.
High availability requires redundancy, fault tolerance, and system design
improvements.
Clouds and grids balance scalability and availability better than traditional SMP or
NUMA systems.
Network Threats, Data Integrity, and Energy Efficiency
This section highlights security challenges, energy efficiency concerns, and mitigation strategies
in distributed computing systems, including clusters, grids, clouds, and P2P networks.
Improper Authentication – Allows attackers to steal resources, modify data, and conduct
replay attacks.
IaaS: Users are responsible for most security aspects, while providers ensure availability.
Content poisoning and timestamped tokens help detect piracy and protect digital rights.
1.5.3.4 System Defense Technologies
Three generations of network security have evolved:
1. Prevention-based – Access control, cryptography.
2. Detection-based – Firewalls, intrusion detection systems (IDS), Public Key Infrastructure
(PKI).
3. Intelligent response systems – AI-driven threat detection and response.
Cloud security responsibilities vary based on the service model (SaaS, PaaS, IaaS).
Global energy cost of idle servers: $3.8 billion annually, with 11.8 million tons of CO₂
emissions.
IT departments must identify underutilized servers to reduce waste.
Energy consumption can be managed at multiple layers of a distributed system:
1. Application Layer – Optimize software to balance performance and energy consumption.
2. Middleware Layer – Apply energy-aware task scheduling and resource brokering.
3. Resource Layer – Use Dynamic Power Management (DPM) and Dynamic Voltage-Frequency Scaling (DVFS).
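As a rough illustration of why DVFS saves energy (using the standard CMOS dynamic-power model, an assumption beyond these notes): dynamic power P ≈ C · V² · f, so halving the clock frequency and lowering the supply voltage proportionally cuts dynamic power by roughly a factor of eight (2 × 2² = 8), traded against longer execution time.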