CC Module-1 Notes
Module-1
Distributed System Models and Enabling Technologies:
Scalable Computing Over the Internet, Technologies for Network Based Systems, System Models
for Distributed and Cloud Computing, Software Environments for Distributed Systems and
Clouds, Performance, Security and Energy Efficiency.
Textbook 1: Chapter 1: 1.1 to 1.5
• Grids enable access to shared computing power and storage capacity from your desktop.
• Clouds enable access to leased computing power and storage capacity from your desktop.
• Grids are an open source technology. Resource users and providers alike can understand and contribute to the management of their grid.
• Clouds are a proprietary technology. Only the resource provider knows exactly how their cloud
manages data, job queues, security requirements and so on.
• The concept of grids was proposed in 1995. The Open Science Grid (OSG) started in 1995, and the EDG (European Data Grid) project began in 2001.
• In the late 1990s, Oracle and EMC offered early private cloud solutions. However, the term cloud computing didn't gain prominence until 2007.
• The raw speed of high-performance computing (HPC) applications is no longer the optimal measure of system performance.
• The emergence of computing clouds instead demands high-throughput computing (HTC)
systems built with parallel and distributed computing technologies
• We have to upgrade data centers using fast servers, storage systems, and high-bandwidth
networks.
1.1 SCALABLE COMPUTING OVER THE INTERNET
Instead of using a centralized computer to solve computational problems, a parallel and distributed
computing system uses multiple computers to solve large-scale problems over the Internet. Thus,
distributed computing becomes data-intensive and network-centric.
The Age of Internet Computing
The Platform Evolution
o From 1950 to 1970, a handful of mainframes, including the IBM 360 and CDC 6400, were built to satisfy the demands of large businesses and government organizations.
o From 1960 to 1980, lower-cost minicomputers such as the DEC PDP 11 and VAX Series became widespread.
o From 1970 to 1990, we saw widespread use of personal computers built with VLSI microprocessors.
o From 1980 to 2000, massive numbers of portable computers and pervasive devices appeared in both
wired and wireless applications
o Since 1990, the use of both HPC and HTC systems hidden in clusters, grids, or Internet clouds has
proliferated
The transition from HPC to HTC marks a strategic shift in computing paradigms, focusing on
scalability, efficiency, and real-world usability over pure processing power.
Computing Paradigm Distinctions
• Centralized computing
A computing paradigm where all computer resources are centralized in a single physical
system. In this setup, processors, memory, and storage are fully shared and tightly integrated
within one operating system. Many data centers and supercomputers operate as centralized
systems, but they are also utilized in parallel, distributed, and cloud computing applications.
• Parallel computing
In parallel computing, processors are either tightly coupled with shared memory or loosely
coupled with distributed memory. Communication occurs through shared memory or message
passing. A system that performs parallel computing is a parallel computer, and the programs
running on it are called parallel programs. Writing these programs is referred to as parallel
programming.
• Distributed computing studies distributed systems, which consist of multiple autonomous
computers with private memory communicating through a network via message passing.
Programs running in such systems are called distributed programs, and writing them is known
as distributed programming.
• Cloud computing refers to a system of Internet-based resources that can be either centralized
or distributed. It uses parallel, distributed computing, or both, and can be established with
physical or virtualized resources over large data centers. Some regard cloud computing as a
form of utility computing or service computing. Alternatively, terms such as concurrent
computing or concurrent programming are used within the high-tech community, typically
referring to the combination of parallel and distributed computing, although interpretations
may vary among practitioners.
• Ubiquitous computing refers to computing with pervasive devices at any place and time
using wired or wireless communication. The Internet of Things (IoT) is a networked connection
of everyday objects including computers, sensors, humans, etc. The IoT is supported by Internet
clouds to achieve ubiquitous computing with any object at any place and time. Finally, the term
Internet computing is even broader and covers all computing paradigms over the Internet. This
book covers all the aforementioned computing paradigms, placing more emphasis on
distributed and cloud computing and their working systems, including the clusters, grids, P2P,
and cloud systems.
Internet of Things
The traditional Internet connects machines to machines or web pages to web pages. The concept of the IoT was introduced in 1999 at MIT.
• The IoT refers to the networked interconnection of everyday objects, tools, devices, or computers.
One can view the IoT as a wireless network of sensors that interconnect all things in our daily life.
• It allows objects to be sensed and controlled remotely across existing network infrastructure
HTC systems prioritize task throughput over raw speed, addressing challenges like cost, energy
efficiency, security, and reliability.
The Shift Toward Utility Computing
Utility computing follows a pay-per-use model where computing resources are delivered as a service.
Cloud computing extends this concept, allowing distributed applications to run on edge networks.
Challenges include:
• Efficient network processors
• Scalable storage and memory
• Virtualization middleware
• New programming models
The Hype Cycle of Emerging Technologies
New technologies follow a hype cycle, progressing through:
1. Technology Trigger – Early development and research.
2. Peak of Inflated Expectations – High expectations but unproven benefits.
3. Trough of Disillusionment – Realization of limitations.
4. Slope of Enlightenment – Gradual improvements.
5. Plateau of Productivity – Mainstream adoption.
For example, in 2010, cloud computing was moving toward mainstream adoption, while broadband
over power lines was expected to become obsolete.
The Internet of Things (IoT) and Cyber-Physical Systems (CPS)
• IoT: Interconnects everyday objects (sensors, RFID, GPS) to enable real-time tracking and
automation.
• CPS: Merges computation, communication, and control (3C) to create intelligent systems
for virtual and physical world interactions.
Both IoT and CPS will play a significant role in future cloud computing and smart infrastructure
development.
1.2 Technologies for Network-Based Systems
Advancements in multicore CPUs and multithreading technologies have played a crucial role in
the development of high-performance computing (HPC) and high-throughput computing (HTC).
Advances in CPU Processors
• Modern multicore processors integrate dual, quad, six, or more processing cores to
enhance parallelism at the instruction level (ILP) and task level (TLP).
• Processor speed growth has followed Moore’s Law, increasing from 1 MIPS (VAX 780,
1978) to 22,000 MIPS (Sun Niagara 2, 2008) and 159,000 MIPS (Intel Core i7 990x, 2011).
• Clock rates have increased from 10 MHz (Intel 286) to 4 GHz (Pentium 4) but have
stabilized due to heat and power limitations.
Multicore CPU and Many-Core GPU Architectures
• Multicore processors house multiple processing units, each with private L1 cache and
shared L2/L3 cache for efficient data access.
• Many-core GPUs (e.g., NVIDIA and AMD architectures) leverage hundreds to thousands
of cores, excelling in data-level parallelism (DLP) and graphics processing.
• Example: Sun Niagara II – Built with eight cores, each supporting eight threads, achieving
a maximum parallelism of 64 threads.
Key Trends in Processor and Network Technology
• Multicore chips continue to evolve with improved caching mechanisms and increased
processing cores per chip.
• Network speeds have improved from Ethernet (10 Mbps) to Gigabit Ethernet (1 Gbps)
and beyond 100 Gbps to support high-speed data communication.
Modern distributed computing systems rely on scalable multicore architectures and high-speed
networks to handle massive parallelism, optimize efficiency, and enhance overall performance.
Multicore CPU and Many-Core GPU Architectures
Advancements in multicore CPUs and many-core GPUs have significantly influenced modern
high-performance computing (HPC) and high-throughput computing (HTC) systems. As CPUs
approach their parallelism limits, GPUs have emerged as powerful alternatives for massive
parallelism and high computational efficiency.
Multicore CPU and Many-Core GPU Trends
• Multicore CPUs continue to evolve from tens to hundreds of cores, but they face challenges
like the memory wall problem, limiting data-level parallelism (DLP).
• Many-core GPUs, with hundreds to thousands of lightweight cores, excel in DLP and
task-level parallelism (TLP), making them ideal for massively parallel workloads.
• Hybrid architectures are emerging, combining fat CPU cores and thin GPU cores on a
single chip for optimal performance.
Multithreading Technologies in Modern CPUs
• Different microarchitectures exploit parallelism at instruction-level (ILP) and thread-
level (TLP):
o Superscalar Processors – Execute multiple instructions per cycle.
o Fine-Grained Multithreading – Switches between threads every cycle.
o Coarse-Grained Multithreading – Runs one thread for multiple cycles before
switching.
o Simultaneous Multithreading (SMT) – Executes multiple threads in the same cycle.
• Modern GPUs (e.g., NVIDIA CUDA, Tesla, and Fermi) feature hundreds of cores,
handling thousands of concurrent threads.
• Example: The NVIDIA Fermi GPU has 512 CUDA cores and delivers 82.4 teraflops,
contributing to the performance of top supercomputers like Tianhe-1A.
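The thread-level parallelism that these multithreaded CPUs and many-core GPUs exploit in hardware can be illustrated at the software level. The sketch below is a minimal Python illustration (not from the textbook): it splits an embarrassingly parallel job into eight independent tasks and maps them onto a pool of eight worker threads, loosely mirroring one Niagara II core with eight hardware threads. The function name, data size, and thread count are arbitrary assumptions; because of CPython's global interpreter lock, the sketch shows the programming pattern rather than real hardware speedup.
```python
# Illustrative only: software-level task parallelism, loosely analogous to the
# hardware TLP exploited by multithreaded CPUs and many-core GPUs.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Worker task: each chunk is independent of the others.
    return sum(x * x for x in chunk)

data = list(range(1_000_000))
chunks = [data[i::8] for i in range(8)]   # split the workload into 8 independent tasks

# Run the 8 tasks concurrently, mimicking 8 hardware threads on one core.
with ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total)
```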
Virtual machines (VMs) give these systems added flexibility in resource management. Four basic VM operations are distinguished:
• First, the VMs can be multiplexed between hardware machines, as shown in Figure 1.13(a).
• Second, a VM can be suspended and stored in stable storage, as shown in Figure 1.13(b).
• Third, a suspended VM can be resumed or provisioned to a new hardware platform, as shown in Figure 1.13(c).
• Finally, a VM can be migrated from one hardware platform to another, as shown in Figure 1.13(d).
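As a hedged sketch of these four operations, the snippet below uses the libvirt Python bindings against a QEMU/KVM host. The connection URIs, the domain name "vm1", the save-file path, and the destination host are hypothetical values chosen for illustration; a real deployment would add error handling plus the shared storage and migration flags its environment requires.
```python
# Sketch of the VM operations above using libvirt (assumed QEMU/KVM hosts).
# Domain name, URIs, and file path are hypothetical.
import libvirt

src = libvirt.open("qemu:///system")       # source hardware machine hosting multiplexed VMs
dom = src.lookupByName("vm1")              # one VM running on this host

# Suspend the VM and store its state in stable storage (Figure 1.13(b)).
dom.save("/var/tmp/vm1.state")

# Resume/provision the saved VM, here on the same platform (Figure 1.13(c)).
src.restore("/var/tmp/vm1.state")
dom = src.lookupByName("vm1")

# Live-migrate the running VM to another hardware platform (Figure 1.13(d)).
dst = libvirt.open("qemu+ssh://destination-host/system")
dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)
```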
The integration of memory, storage, networking, virtualization, and cloud data centers is
transforming distributed systems. By leveraging virtualization, scalable networking, and cloud
computing, modern infrastructures achieve higher efficiency, flexibility, and cost-effectiveness,
paving the way for future exascale computing.
1.3.1.1 Cluster Architecture
• Clusters are connected to the Internet via a VPN gateway, which assigns an IP address to
locate the cluster.
• Each node operates independently, with its own OS, creating multiple system images
(MSI).
• The cluster manages shared I/O devices and disk arrays, providing efficient resource
utilization.
1.3.1.2 Single-System Image (SSI)
An ideal cluster should merge multiple system images into a single-system image (SSI), where all
nodes appear as a single powerful machine.
• SSI is achieved through middleware or specialized OS support, enabling CPU, memory,
and I/O sharing across all cluster nodes.
• Clusters without SSI function as a collection of independent computers rather than a unified
system.
Critical cluster design issues and feasible implementations:
• Availability and Support – Hardware and software support for sustained high availability (HA) in the cluster; implementations include failover, failback, checkpointing, rollback recovery, nonstop OS, etc.
• Hardware Fault Tolerance – Automated failure management to eliminate all single points of failure; implementations include component redundancy, hot swapping, RAID, multiple power supplies, etc.
• Single System Image (SSI) – Achieving SSI at the functional level with hardware and software support, middleware, or OS extensions; implementations include hardware mechanisms or middleware support to achieve DSM at the coherent-cache level.
• Efficient Communications – Reducing message-passing system overhead and hiding latencies; implementations include fast message passing, active messages, enhanced MPI libraries, etc.
• Cluster-wide Job Management – Using a global job management system with better scheduling and monitoring; implementations include single-job management systems such as LSF, Codine, etc.
• Dynamic Load Balancing – Balancing the workload of all processing nodes along with failure recovery; implementations include workload monitoring, process migration, job replication, gang scheduling, etc.
• Scalability and Programmability – Adding more servers to a cluster, or more clusters to a grid, as the workload or data set increases; implementations include scalable interconnects, performance monitoring, distributed execution environments, and better software tools.
• Lack of a cluster-wide OS limits full resource sharing.
• Middleware solutions provide necessary functionalities like scalability, fault tolerance, and
job management.
• Key challenges include efficient message passing, seamless fault tolerance, high
availability, and performance scalability.
Server clusters are scalable, high-performance computing systems that utilize networked
computing nodes for parallel and distributed processing. Achieving SSI and efficient
middleware support remains a key challenge in cluster computing. Virtual clusters and cloud
computing are evolving to enhance cluster flexibility and resource management.
1.3.2 Grid Computing, Peer-to-Peer (P2P) Networks, and System Models
Grid Computing Infrastructures
Grid computing has evolved from Internet and web-based services to enable large-scale
distributed computing. It allows applications running on remote systems to interact in real-time.
1.3.2.1 Computational Grids
• A grid connects distributed computing resources (workstations, servers, clusters,
supercomputers) over LANs, WANs, and the Internet.
• Used for scientific and enterprise applications, including SETI@Home and astrophysics
simulations.
• Provides an integrated resource pool, enabling shared computing, data, and information
services.
1.3.2.2 Grid Families
Design issues compared across computational/data grids and P2P grids:
• Grid applications reported – Computational and data grids: distributed supercomputing, National Grid initiatives, etc. P2P grids: open grids with P2P flexibility, all resources contributed by client machines.
• Representative systems – Computational and data grids: TeraGrid built in the US, ChinaGrid in China, and the e-Science grid built in the UK. P2P grids: JXTA, FightAIDS@Home, SETI@home.
• Development lessons learned – Computational and data grids: restricted user groups, middleware bugs, protocols to acquire resources. P2P grids: unreliable user-contributed resources, limited to a few applications.
• Computational and Data Grids – Used in national-scale supercomputing projects (e.g.,
TeraGrid, ChinaGrid, e-Science Grid).
• P2P Grids – Utilize client machines for open, distributed computing (e.g., SETI@home, JXTA, FightAIDS@Home).
• Challenges include middleware bugs, security issues, and unreliable user-contributed
resources.
1.3.3 Peer-to-Peer (P2P) Network Families
P2P systems eliminate central coordination, allowing client machines to act as both servers and
clients.
1.3.3.1 P2P Systems
• Collaboration platforms such as Skype, MSN, and multiplayer online games face challenges including privacy risks, spam, and lack of trust among peers.
P2P systems support decentralized, user-driven resource sharing. Future developments will focus on security, standardization, and efficiency improvements.
Cloud Computing over the Internet
Cloud computing has emerged as a transformative on-demand computing paradigm, shifting
computation and data storage from desktops to large data centers. This approach enables the
virtualization of hardware, software, and data resources, allowing users to access scalable
services over the Internet.
1.3.4.1 Internet Clouds
• Public Cloud – Built and operated by a service provider and offered to the general public on a pay-as-you-go basis.
• Private Cloud – Built within an organization for internal use, giving tighter control over security and data.
• Hybrid Cloud – Combines public and private clouds, optimizing cost and security.
Advantages of Cloud Computing
Cloud computing provides several benefits over traditional computing paradigms, including:
1. Energy-efficient data centers in secure locations.
2. Resource sharing, optimizing utilization and handling peak loads.
3. Separation of infrastructure maintenance from application development.
4. Cost savings compared to traditional on-premise infrastructure.
5. Scalability for application development and cloud-based computing models.
6. Enhanced service and data discovery for content and service distribution.
7. Security and privacy improvements, though challenges remain.
8. Flexible service agreements and pricing models for cost-effective computing.
Cloud computing fundamentally changes how applications and services are developed, deployed,
and accessed. With virtualization, scalability, and cost efficiency, it has become the backbone of
modern Internet services and enterprise computing. Future advancements will focus on security,
resource optimization, and hybrid cloud solutions.
1.4 Software Environments for Distributed Systems and Clouds
This section introduces Service-Oriented Architecture (SOA) and other key software environments
that enable distributed and cloud computing systems. These environments define how
applications, services, and data interact within grids, clouds, and P2P networks.
1.4.1 Service-Oriented Architecture (SOA)
SOA enables modular, scalable, and reusable software components that communicate over a
network. It underpins web services, grids, and cloud computing environments.
1.4.1.1 Layered Architecture for Web Services and Grids
• Distributed computing builds on the OSI model, adding layers for service interfaces,
workflows, and management.
SOA has expanded from basic web services to complex multi-layered ecosystems:
• Sensor Services (SS) – Devices like ZigBee, Bluetooth, GPS, and WiFi collect raw data.
• Filter Services (FS) – Process data before feeding into computing, storage, or discovery
clouds.
• Cloud Ecosystem – Integrates compute clouds, storage clouds, and discovery clouds for
managing large-scale applications.
SOA enables data transformation from raw data → useful information → knowledge → wisdom
→ intelligent decisions.
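As a hedged illustration of this raw data → information → knowledge → decision chain, the sketch below models a sensor service, a filter service, and a decision step as plain Python functions. All names, data fields, and the 30-degree threshold are hypothetical; a real SOA deployment would expose these as networked services rather than local calls.
```python
# Hypothetical SOA-style pipeline: sensor service -> filter service -> decision.
def sensor_service():
    # Raw data, e.g., readings collected by ZigBee/GPS/WiFi devices.
    return [{"sensor": "s1", "temp_c": 21.5}, {"sensor": "s2", "temp_c": 34.2}]

def filter_service(raw):
    # Turn raw data into useful information: drop malformed readings, add a flag.
    return [dict(r, hot=r["temp_c"] > 30.0) for r in raw if "temp_c" in r]

def decision_service(info):
    # Derive knowledge and an intelligent decision from the filtered information.
    hot = [r["sensor"] for r in info if r["hot"]]
    return "trigger cooling for " + ", ".join(hot) if hot else "no action"

print(decision_service(filter_service(sensor_service())))
```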
SOA defines the foundation for web services, distributed systems, and cloud computing. By
integrating sensors, processing layers, and cloud resources, SOA provides a scalable, flexible
approach for modern computing applications. The future of distributed computing will rely on
intelligent data processing, automation, and service-driven architectures.
1.4.1.4 Grids vs. Clouds
• Grids use static resources, whereas clouds provide elastic, on-demand resources via
virtualization.
• Clouds focus on automation and scalability, while grids are better for negotiated
resource allocation.
• Hybrid models exist, such as clouds of grids, grids of clouds, and inter-cloud architectures.
• Distributed OS models are evolving, with MOSIX2 enabling process migration and
resource sharing across Linux clusters.
• Parallel programming models like MPI and MapReduce optimize large-scale computing (a MapReduce-style sketch follows this list).
• Cloud computing and grid computing continue to merge, leveraging virtualization and
elastic resource management.
• Standardized middleware (OGSA, Globus) enhances grid security, interoperability, and
automation.
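To make the MapReduce model mentioned in the list above concrete, here is a minimal single-process sketch of its map, shuffle, and reduce phases using the usual word-count example; a real framework such as Hadoop would distribute these phases across cluster nodes, and the function names here are illustrative, not a framework API.
```python
# Minimal, single-process sketch of the MapReduce programming model (word count).
from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) pairs for each word in a document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["clouds scale elastically", "grids share clouds and clusters"]
pairs = [pair for d in docs for pair in map_phase(d)]
print(reduce_phase(shuffle(pairs)))   # {'clouds': 2, 'scale': 1, ...}
```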
1.5 Performance, Security, and Energy Efficiency
This section discusses key design principles for distributed computing systems, covering
performance metrics, scalability, system availability, fault tolerance, and energy efficiency.
1.5.1 Performance Metrics and Scalability Analysis
Performance is measured using MIPS, Tflops, TPS, and network latency. Scalability is crucial in
distributed systems and has multiple dimensions:
1. Size Scalability – Expanding system resources (e.g., processors, memory, storage) to improve
performance.
2. Software Scalability – Upgrading OS, compilers, and libraries to accommodate larger
systems.
3. Application Scalability – Increasing problem size to match system capacity for cost-
effectiveness.
4. Technology Scalability – Adapting to new hardware and networking technologies while
ensuring compatibility.
1.5.1.3 Scalability vs. OS Image Count
• SMP systems scale up to a few hundred processors due to hardware constraints.
• NUMA systems use multiple OS images to scale to thousands of processors.
• Clusters and clouds scale further by using virtualization.
• Grids integrate multiple clusters, supporting hundreds of OS images.
• P2P networks scale to millions of nodes with independent OS images.
• Amdahl's Law (fixed workload): Speedup S = 1 / [α + (1 − α)/n], where α is the fraction of the workload that must be executed sequentially and n is the number of processors.
• Even with hundreds of processors, speedup is limited if the sequential fraction α is high; as n grows without bound, S approaches the upper bound 1/α.
Problem with Fixed Workload
• In Amdahl's law, the same amount of workload is assumed for both sequential and parallel execution of the program, with a fixed problem size or data set. This was called fixed-workload speedup by Hwang and Xu [14]. To execute a fixed workload on n processors, parallel processing leads to a system efficiency E = S/n = 1 / [αn + (1 − α)].
• Scaled-workload speedup: if the workload is scaled to W′ = αW + (1 − α)nW on n processors, the speedup becomes S′ = W′/W = α + (1 − α)n.
• This speedup is known as Gustafson's law. By fixing the parallel execution time at level W, the following efficiency expression is obtained: E′ = S′/n = α/n + (1 − α).
• More efficient for large clusters, as workload scales dynamically with system size.
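A short numerical sketch of the two laws above; the values of α and n are arbitrary examples, not figures from the text.
```python
# Compare Amdahl's (fixed-workload) and Gustafson's (scaled-workload) speedups.
def amdahl_speedup(alpha, n):
    # S = 1 / (alpha + (1 - alpha) / n) for a fixed workload on n processors.
    return 1.0 / (alpha + (1.0 - alpha) / n)

def gustafson_speedup(alpha, n):
    # S' = alpha + (1 - alpha) * n for a workload scaled with system size.
    return alpha + (1.0 - alpha) * n

alpha, n = 0.05, 256                 # 5% sequential code, 256 processors (illustrative)
print(amdahl_speedup(alpha, n))      # ~18.6: capped well below n by the sequential fraction
print(gustafson_speedup(alpha, n))   # ~243.3: scaled workload keeps the processors busy
```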
1.5.2 Fault Tolerance and System Availability
• High availability (HA) is essential in clusters, grids, P2P networks, and clouds.
• System availability depends on Mean Time to Failure (MTTF) and Mean Time to Repair (MTTR): Availability = MTTF / (MTTF + MTTR). A short worked example appears after this list.
• Eliminating single points of failure (e.g., hardware redundancy, fault isolation) improves
availability.
• P2P networks are highly scalable but have low availability due to frequent peer failures.
• Grids and clouds offer better fault isolation and thus higher availability than traditional
clusters.
• Scalability and performance depend on resource expansion, workload distribution, and
parallelization.
• Amdahl’s Law limits speedup for fixed workloads, while Gustafson’s Law optimizes
large-scale computing.
• High availability requires redundancy, fault tolerance, and system design improvements.
• Clouds and grids balance scalability and availability better than traditional SMP or
NUMA systems.
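A short worked example of the availability formula from Section 1.5.2; the MTTF and MTTR values below are illustrative assumptions, not measurements from the text.
```python
# Availability = MTTF / (MTTF + MTTR), with illustrative failure and repair times in hours.
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

print(availability(1000, 10))   # ~0.990: roughly "two nines" of availability
print(availability(1000, 1))    # ~0.999: faster repair raises availability
```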
Network Threats, Data Integrity, and Energy Efficiency
This section highlights security challenges, energy efficiency concerns, and mitigation strategies
in distributed computing systems, including clusters, grids, clouds, and P2P networks.
1.5.3 Network Threats and Data Integrity
Distributed systems require security measures to prevent cyberattacks, data breaches, and
unauthorized access.
1.5.3.1 Threats to Systems and Networks
• Improper Authentication – Allows attackers to steal resources, modify data, and conduct
replay attacks.
1.5.3.2 Security Responsibilities
Security in cloud computing is divided among different stakeholders based on the cloud service
model:
• SaaS: Cloud provider handles security, availability, and integrity.
• PaaS: Provider manages integrity and availability, while users control confidentiality.
• IaaS: Users are responsible for most security aspects, while providers ensure availability.
1.5.3.3 Copyright Protection
• Collusive piracy in P2P networks allows unauthorized file sharing.
• Content poisoning and timestamped tokens help detect piracy and protect digital rights.
1.5.3.4 System Defense Technologies
Three generations of network security have evolved:
1. Prevention-based – Access control, cryptography.
2. Detection-based – Firewalls, intrusion detection systems (IDS), Public Key Infrastructure
(PKI).
3. Intelligent response systems – AI-driven threat detection and response.
1.5.3.5 Data Protection Infrastructure
• Trust negotiation ensures secure data sharing.
• Worm containment & intrusion detection protect against cyberattacks.
• Cloud security responsibilities vary based on the service model (SaaS, PaaS, IaaS).