Cloud Computing - Unit-1
Dr. R. Thangarajan
Professor, Information Technology
Kongu Engineering College
Perundurai – 638 060, Erode
Enabling Technologies
❑This introductory section assesses the evolutionary changes in computing
and IT trends over the past 30 years.
❑We will study both high-performance computing (HPC) for scientific
computing and high throughput computing (HTC) systems for business
computing.
❑We will examine clusters/MPP, Grids, P2P networks, and Internet Clouds.
❑These systems are distinguished by their platform architectures, OS
platforms, processing algorithms, communication protocols, security
demands, and service models applied.
❑The study will emphasize scalability, performance, availability, security,
energy efficiency, workload outsourcing, data center protection, and so on.
Scalable Computing over the Internet
❑In this section, we assess evolutionary changes in machine
architecture, operating system platform, network connectivity, and
application workload.
❑Instead of using a centralized computer to solve computational
problems, a parallel and distributed computing system uses multiple
computers to solve large-scale problems over the Internet.
❑Thus, distributed computing becomes data-intensive and network-
centric.
❑This section identifies the applications of modern computer systems
that practice parallel and distributed computing.
❑These large-scale Internet applications have significantly enhanced
the quality of life and information services in society today.
The Age of Internet Computing
❑Due to billions of people using the Internet every day, supercomputer sites
and large data centers must provide high-performance computing services
to huge numbers of Internet users concurrently.
❑Because of this high demand, the Linpack Benchmark for high-
performance computing (HPC) applications is no longer optimal for
measuring system performance.
❑The emergence of computing clouds instead demands high-throughput
computing (HTC) systems built with parallel and distributed computing
technologies.
❑We have to upgrade data centers using fast servers, storage systems, and
high-bandwidth networks.
❑The purpose is to advance network-based computing and web services
with emerging new technologies.
Platform Evolution
Computer technology has undergone five generations of development, each lasting from
10 to 20 years.
Successive generations overlapped by about 10 years.
❑For instance, from 1950 to 1970, a handful of mainframes, including the IBM 360 and
CDC 6400, were built to satisfy the demands of large businesses and government
organizations.
❑From 1960 to 1980, lower-cost minicomputers such as the DEC PDP 11 and VAX Series
became popular among small businesses and college campuses.
❑From 1970 to 1990, we saw widespread use of personal computers built with VLSI
microprocessors.
❑From 1980 to 2000, massive numbers of portable computers and pervasive devices
appeared in both wired and wireless applications.
❑Since 1990, the use of both HPC and HTC systems hidden in clusters, grids, or Internet
clouds has proliferated.
❑These systems are employed by both consumers and high-end web-scale computing and
information services.
❑The general computing trend is to leverage shared web resources and massive
amounts of data over the Internet. The Figure below illustrates the evolution of HPC
and HTC systems.
❑On the HPC side, supercomputers (massively parallel processors or MPPs) are
gradually replaced by clusters of cooperative computers out of a desire to share
computing resources.
❑The cluster is often a collection of homogeneous compute nodes that are physically
connected in close range to one another.
❑On the HTC side, peer-to-peer (P2P) networks are formed for distributed file sharing
and content delivery applications.
❑A P2P system is built over many client machines.
❑Peer machines are globally distributed in nature. P2P, cloud computing, and web
service platforms are more focused on HTC applications than on HPC applications.
❑Clustering and P2P technologies lead to the development of computational grids or
data grids.
High Performance Computing (HPC)
❑For many years, HPC systems emphasized only raw speed performance.
❑The speed of HPC systems has increased from GFLOPS in the early 1990s to
PFLOPS by 2010.
❑This improvement was driven mainly by the demands from the scientific,
engineering, and manufacturing communities.
❑For example,
▪ The Top 500 most powerful computer systems in the world are measured by the
floating-point speed in Linpack benchmark results.
▪ However, the number of supercomputer users is limited to less than 10% of all
computer users.
▪ Today, the majority of computer users are using desktop computers or large servers
when they conduct Internet searches and market-driven computing tasks.
High Throughput Computing (HTC)
❑The development of market-oriented high-end computing systems is
undergoing a strategic change from an HPC paradigm to an HTC paradigm.
❑This HTC paradigm pays more attention to high-flux computing.
❑The main application for high-flux computing is in Internet searches and
web services by millions or more users simultaneously.
❑The performance goal thus shifts to measure high throughput or the
number of tasks completed per unit of time.
❑HTC technology needs to not only improve in terms of batch processing
speed, but also address the acute problems of cost, energy savings,
security, and reliability at many data and enterprise computing centers.
❑This course will address both HPC and HTC systems to meet the demands
of all computer users.
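The throughput goal above can be made concrete with a minimal Python sketch; the task body and counts are illustrative assumptions, not taken from the source.

```python
import time

def handle_request(i):
    # Stand-in for one lightweight service task (e.g., answering a query).
    return sum(range(1_000))

n_tasks = 50_000
start = time.perf_counter()
for i in range(n_tasks):
    handle_request(i)
elapsed = time.perf_counter() - start

# Throughput = number of tasks completed per unit of time
print(f"{n_tasks / elapsed:,.0f} tasks/second over {elapsed:.2f} s")
```

In an HTC setting, the same measurement would be taken across many servers handling requests concurrently.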
The Three New Computing Paradigms
“As of now, computer networks are still in their infancy, but as they grow up
and become sophisticated, we will probably see the spread of computer
utilities, which like present electric and telephone utilities, will service
individual homes and offices across the country.” – Leonard Kleinrock, UCLA
(1969).
Many people have redefined the term computer since that time.
▪ “The network is the computer.” - John Gage, Sun Microsystems (1984).
▪ “The data center is the computer. There are dramatic differences between
developing software for millions to use as a service versus distributing
software to run on their PCs.” – David Patterson, UC Berkeley (2008).
▪ “The cloud is the computer.” - Rajkumar Buyya, University of Melbourne.
Computing Paradigms and Their Distinctions
1. Centralized computing: This is a computing paradigm by which all computer
resources are centralized in one physical system.
❑All resources (processors, memory, and storage) are fully shared and tightly
coupled within one integrated OS.
❑Many data centers and supercomputers are centralized systems, but they are
used in parallel, distributed, and cloud computing applications.
2. Parallel computing: In parallel computing, all processors are either tightly
coupled with centralized shared memory or loosely coupled with distributed
memory.
❑It is known as parallel processing.
❑Inter-processor communication is accomplished through shared memory or via
message passing.
❑A computer system capable of parallel computing is commonly known as a
parallel computer.
❑Programs running in a parallel computer are called parallel programs. The
process of writing parallel programs is often referred to as parallel programming.
3. Distributed computing: This is a field of computer science/engineering that
studies distributed systems.
❑A distributed system consists of multiple autonomous computers, each having
its own private memory, communicating through a computer network.
❑Information exchange in a distributed system is accomplished through message
passing.
❑A computer program that runs in a distributed system is known as a distributed
program.
❑The process of writing distributed programs is referred to as distributed
programming.
4. Cloud computing: An Internet cloud of resources can be either a centralized or a
distributed computing system.
❑The cloud applies parallel or distributed computing or both.
❑Clouds can be built with physical or virtualized resources over large data centers
that are centralized or distributed.
❑Some authors consider cloud computing to be a form of utility computing or
service computing.
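A minimal Python sketch, not taken from the source, contrasting the two communication styles described above: threads that share memory within one parallel program, and processes with private memory that exchange messages.

```python
import threading
import multiprocessing as mp

def add_shared(counter, lock, n):
    # Parallel computing: threads share the same memory; a lock coordinates access.
    for _ in range(n):
        with lock:
            counter["value"] += 1

def worker(inbox, outbox):
    # Distributed computing: each worker has private memory and communicates
    # only by message passing (here via queues standing in for a network).
    n = inbox.get()                 # receive a message
    outbox.put(sum(range(n)))       # reply with a message

if __name__ == "__main__":
    # Shared-memory parallelism within one program
    counter, lock = {"value": 0}, threading.Lock()
    threads = [threading.Thread(target=add_shared, args=(counter, lock, 1000))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Message passing between separate processes (autonomous memory spaces)
    inbox, outbox = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put(1000)
    print(counter["value"], outbox.get())
    p.join()
```

On a real distributed system, the queues would be replaced by network messages (e.g., sockets or MPI), but the programming pattern is the same.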
Other terms in Computing
❑Concurrent computing or concurrent programming: These terms typically
refer to the union of parallel computing and distributed computing.
❑Ubiquitous computing refers to computing with pervasive devices at any
place and time using wired or wireless communication.
❑The Internet of Things (IoT) is a networked connection of everyday objects
including computers, sensors, humans, etc.
❑The IoT is supported by Internet clouds to achieve ubiquitous computing
with any object at any place and time.
❑Finally, the term Internet computing is even broader and covers all
computing paradigms over the Internet.
❑This course covers all the aforementioned computing paradigms, placing
more emphasis on distributed and cloud computing and their working
systems, including clusters, grids, P2P, and cloud systems.
Distributed System Families
❑Since the mid-1990s, technologies for building P2P networks and
networks of clusters have been consolidated into many national
projects designed to establish wide-area computing infrastructures,
known as computational grids or data grids.
❑Recently, we have witnessed a surge in interest in exploring Internet
cloud resources for data-intensive applications.
❑Internet clouds are the result of moving desktop computing to
service-oriented computing using server clusters and huge databases
at data centers.
❑This course introduces the basics of various parallel and distributed
families.
❑Grids and clouds are disparate systems that place great emphasis on
resource sharing in hardware, software, and data sets.
❑In the future, both HPC and HTC systems will demand
multicore or many-core processors that can handle
large numbers of computing threads per core.
❑Both HPC and HTC systems emphasize parallelism
and distributed computing.
❑Future HPC and HTC systems must be able to satisfy
this huge demand for computing power in terms of
throughput, efficiency, scalability, and reliability.
❑The system efficiency is decided by speed,
programming, and energy factors (i.e., throughput per
watt of energy consumed).
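As a worked illustration of throughput per watt (the numbers are illustrative, not from the source): a data center that completes 10,000 tasks per second while drawing 20 kW of power delivers 10,000 / 20,000 = 0.5 tasks per second per watt; raising throughput at the same power, or cutting power at the same throughput, improves this efficiency figure.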
Meeting these goals requires attention to the following design objectives:
❑Efficiency measures the utilization rate of resources in an execution model
by exploiting massive parallelism in HPC. For HTC, efficiency is more closely
related to job throughput, data access, storage, and power efficiency.
❑Dependability measures the reliability and self-management from the chip
to the system and application levels. The purpose is to provide high-
throughput service with Quality of Service (QoS) assurance, even under
failure conditions.
❑Adaptation in the programming model measures the ability to support
billions of job requests over massive data sets and virtualized cloud
resources under various workload and service models.
❑Flexibility in application deployment measures the ability of distributed
systems to run well in both HPC (science and engineering) and HTC
(business) applications.
Scalable Computing Trends and New Paradigms
❑Moore’s law indicates that processor speed doubles every
18 months. Although Moore’s law has been proven valid
over the last 30 years, it is difficult to say whether it will
continue to be true in the future.
❑Gilder’s law indicates that network bandwidth has doubled
each year in the past. Will that trend continue in the future?
❑The tremendous price/performance ratio of commodity
hardware was driven by the desktop, notebook, and tablet
computing markets.
❑This has also driven the adoption and use of commodity
technologies in large-scale computing.
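As a worked illustration of these doubling trends (the time horizon is chosen for illustration, not taken from the source): if processor performance doubles every 18 months, growth over t months is a factor of 2^(t/18), so 15 years (180 months) yield 2^10 ≈ 1,000×; if network bandwidth doubles every 12 months, the same 15 years yield 2^15 ≈ 33,000×, which is why bandwidth growth has outpaced processor growth.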
Degrees of Parallelism
❑Bit-level parallelism (BLP) converts bit-serial processing to word-level
processing gradually. Over the years, users graduated from 4-bit
microprocessors to 8-, 16-, 32-, and 64-bit CPUs.
❑Instruction-level parallelism (ILP) allows the processor to execute multiple
instructions simultaneously rather than only one instruction at a time. For
the past 30 years, we have practiced ILP through pipelining, superscalar
computing, VLIW (very long instruction word) architectures, and
multithreading. ILP requires branch prediction, dynamic scheduling,
speculation, and compiler support to work efficiently.
❑Data-level parallelism (DLP) was made popular through SIMD (single
instruction, multiple data) and vector machines using vector or array types
of instructions. DLP requires even more hardware support and compiler
assistance to work properly.
❑Ever since the introduction of multicore processors and chip
multiprocessors (CMPs), we have been exploring task-level parallelism
(TLP).
❑A modern processor explores all of the aforementioned parallelism
types. In fact, BLP, ILP, and DLP are well supported by advances in
hardware and compilers.
❑However, TLP is far from being very successful due to the difficulty in
programming and compilation of code for efficient execution on
multicore CMPs.
❑As we move from parallel processing to distributed processing, we
will see an increase in computing granularity to job-level parallelism
(JLP).
❑We can say that coarse-grain parallelism is built on top of fine-grain
parallelism.
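A minimal Python sketch of two of these levels, not taken from the source: DLP expressed as a vectorized NumPy operation, and TLP expressed as independent coarse-grained tasks on a process pool (NumPy and a multicore machine are assumed).

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

# Data-level parallelism (DLP): one operation applied to whole arrays,
# which NumPy maps onto SIMD/vector hardware where available.
a = np.arange(1_000_000, dtype=np.float64)
b = np.sqrt(a) + 2.0 * a          # element-wise work written as array operations

def task(chunk_id):
    # Task-level parallelism (TLP): independent tasks scheduled on separate cores.
    lo, hi = chunk_id * 100_000, (chunk_id + 1) * 100_000
    return float(np.sum(np.sqrt(np.arange(lo, hi, dtype=np.float64))))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        partial_sums = list(pool.map(task, range(10)))
    print(b[:3], sum(partial_sums))
```

Job-level parallelism (JLP) applies the same idea at a coarser grain, with whole jobs dispatched to different machines.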
Applications of HPC and HTC
Trend toward Utility Computing
❑Utility computing focuses on a business model in which customers
receive computing resources from a paid service provider.
❑All grid/cloud platforms are regarded as utility service providers.
❑However, cloud computing offers a broader concept than utility
computing.
❑Distributed cloud applications run on any available servers in some
edge networks.
❑Major technological challenges include all aspects of computer
science and engineering.
❑For example, users demand new network-efficient processors, scalable
memory and storage schemes, distributed OSes, middleware for machine
virtualization, new programming models, effective resource management,
and application program development.
Hype Cycle of New Technologies
Gartner, August 2010
Hype Cycle of New Technologies
Gartner, August 2022
Cyber-Physical Systems
❑A cyber-physical system (CPS) is the result of interaction between
computational processes and the physical world.
❑A CPS integrates “cyber” (heterogeneous, asynchronous) with
“physical” (concurrent and information-dense) objects.
❑A CPS merges the “3C” technologies of computation, communication,
and control into an intelligent closed feedback system between the
physical world and the information world, a concept that is actively
explored.
❑The IoT emphasizes various networking connections among physical
objects, while the CPS emphasizes the exploration of virtual reality (VR)
applications in the physical world.
❑We may transform how we interact with the physical world just like the
Internet transformed how we interact with the virtual world.
Technologies for Network-based Systems
1. Multicore CPUs and Multi-threading Technologies
GPU Computing (to Exascale and Beyond)
❑A GPU is a graphics co-processor or accelerator mounted on a computer’s graphics card or video
card.
❑A GPU offloads the CPU from tedious graphics tasks in video editing applications.
❑The world’s first GPU, the GeForce 256, was marketed by NVIDIA in 1999.
❑These GPU chips can process a minimum of 10 million polygons per second and are used in
nearly every computer on the market today.
❑Some GPU features were also integrated into certain CPUs.
❑Traditional CPUs are structured with only a few cores. For example, the Xeon X5670 CPU has six
cores. However, a modern GPU chip can be built with hundreds of processing cores.
❑Unlike CPUs, GPUs have a throughput architecture that exploits massive parallelism by
executing many concurrent threads slowly instead of quickly executing a single long thread in a
conventional microprocessor.
❑Lately, parallel GPUs or GPU clusters have been garnering a lot of attention compared with
CPUs, which offer only limited parallelism. General-purpose computing on GPUs, known as
GPGPU, has appeared in the HPC field.
❑NVIDIA’s CUDA model was developed for HPC using GPGPUs.
How do GPUs Work?
❑Early GPUs functioned as coprocessors attached to the CPU.
❑Today, the NVIDIA GPU has been upgraded to 128 cores on a single
chip.
❑Furthermore, each core on a GPU can handle eight threads of
instructions. This translates to having up to 1,024 threads executed
concurrently on a single GPU.
❑This is true massive parallelism, compared to only a few threads that
can be handled by a conventional CPU.
❑The CPU is optimized for latency caches, while the GPU is optimized
to deliver much higher throughput with explicit management of on-
chip memory.
❑Modern GPUs are not restricted to accelerated graphics or video
coding.
❑They are used in HPC systems to power supercomputers with
massive parallelism at multicore and multithreading levels.
❑GPUs are designed to handle large numbers of floating-point
operations in parallel.
❑In a way, the GPU offloads the CPU from all data-intensive
calculations, not just those that are related to video processing.
❑Conventional GPUs are widely used in mobile phones, game
consoles, embedded systems, PCs, and servers.
❑The NVIDIA CUDA Tesla or Fermi GPU is used in GPU clusters or in HPC
systems for the parallel processing of massive floating-point data.
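The many-thread execution style described above can be sketched with a simple vector-addition kernel. This is an illustrative example, not taken from the source; it assumes the Numba package with CUDA support and a CUDA-capable GPU.

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)                  # global index of this thread across all blocks
    if i < out.shape[0]:              # guard: the grid may be larger than the data
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

d_a, d_b = cuda.to_device(a), cuda.to_device(b)   # explicit host-to-device copies
d_out = cuda.device_array_like(a)

threads_per_block = 256               # a multiple of the 32-thread warp size
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks_per_grid, threads_per_block](d_a, d_b, d_out)

out = d_out.copy_to_host()            # device-to-host copy of the result
print(out[:4])
```

Each of the roughly one million array elements is handled by its own lightweight thread; the hardware schedules these threads in warps across the streaming multiprocessors, which is the source of the massive parallelism described above.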
GPU Programming Model
❑In November 2010, three of the five fastest supercomputers in the world (the
Tianhe-1a, Nebulae, and Tsubame) used large numbers of GPU chips to
accelerate floating-point computations.
❑The figure below shows the architecture of the Fermi GPU, a next-generation GPU
from NVIDIA.
❑This is a streaming multiprocessor (SM) module. Multiple SMs can be built on a
single GPU chip. The Fermi chip has 16 SMs implemented with 3 billion
transistors and comprises up to 512 streaming processors (SPs), known as
CUDA cores, in total.
❑The Fermi GPU is a newer generation of GPU, first appearing in 2011.
❑The Tesla or Fermi GPU can be used in desktop workstations to accelerate
floating-point calculations or for building large-scale data centers.
❑There are 32 CUDA cores per SM. Only one SM is shown in the Figure.
❑Each CUDA core has a simple pipelined integer ALU and an FPU that can be used
in parallel. Each SM has 16 load/store units allowing source and destination
addresses to be calculated for 16 threads per clock.
❑There are four special function units (SFUs) for executing transcendental
instructions. All functional units and CUDA cores are interconnected by an
NoC (network on chip) to a large number of SRAM banks (L2 caches).
❑Each SM has a 64 KB L1 cache. The 768 KB unified L2 cache is shared by all
SMs and serves all load, store, and texture operations.
❑Memory controllers are used to connect 6 GB of off-chip DRAM. The
SM schedules threads in groups of 32 parallel threads called warps.
❑In total, 512/256 FMA (fused multiply-add) operations can be done in
parallel to produce 32-bit/64-bit floating-point results, respectively.
❑The 512 CUDA cores in an SM can work in parallel to deliver up to 515
Gflops of double-precision results, if fully utilized. With 16 SMs, a single
GPU has a peak speed of 82.4 Tflops.
❑Only 12 Fermi GPUs have the potential to reach the Pflops performance level.
Memory, Storage and WAN
Virtualization and Virtualization Middleware
❑A conventional computer has a single OS image. This offers a rigid architecture
that tightly couples application software to a specific hardware platform.
❑Some software running well on one machine may not be executable on another
platform with a different instruction set under a fixed OS.
❑Virtual machines (VMs) offer novel solutions to underutilized resources,
application inflexibility, software manageability, and security concerns in existing
physical machines.
❑Today, to build large clusters, grids, and clouds, we need to access large amounts
of computing, storage, and networking resources in a virtualized manner.
❑We need to aggregate those resources, and hopefully, offer a single system
image.
❑In particular, a cloud of provisioned resources must dynamically rely on the
virtualization of processors, memory, and I/O facilities.
❑So what are virtualized resources, such as VMs, virtual storage, and virtual
networking, and what software or middleware performs this virtualization?
❑In the figure above, the host machine is equipped with the physical hardware, as shown at the bottom
of the figure. An example is an x-86 architecture desktop running its installed Windows OS, as shown in
part (a) of the figure.
❑The VM can be provisioned for any hardware system. The VM is built with virtual resources managed by
a guest OS to run a specific application.
❑Between the VMs and the host platform, one needs to deploy a middleware layer called a virtual
machine monitor (VMM).
❑Figure (b) shows a native VM installed with the use of a VMM called a hypervisor in privileged mode.
For example, the hardware has x-86 architecture running the Windows system. The guest OS could be a
Linux system and the hypervisor is the XEN system developed at Cambridge University. This hypervisor
approach is also called bare-metal VM because the hypervisor handles the bare hardware (CPU,
memory, and I/O) directly.
❑Another architecture is the host VM shown in Figure (c). Here the VMM runs in nonprivileged mode.
The host OS need not be modified.
❑The VM can also be implemented with a dual mode, as shown in Figure (d). Part of the VMM runs at the
user level and another part runs at the supervisor level. In this case, the host OS may have to be
modified to some extent.
❑Multiple VMs can be ported to a given hardware system to support the virtualization process. The VM
approach offers hardware independence of the OS and applications.
❑The user application running on its dedicated OS could be bundled together as a virtual appliance that
can be ported to any hardware platform. The VM could run on an OS different from that of the host
computer.
VM Primitive Operations
The VMM provides the VM abstraction to the guest OS.
With full virtualization, the VMM exports a VM abstraction identical to the
physical machine so that a standard OS such as Windows 2000 or Linux can
run just as it would on the physical hardware.
Low-level VMM operations, identified by Mendel Rosenblum, are
illustrated in the figure below.
First, the VMs can be multiplexed between hardware machines, as shown in
Figure (a).
Second, a VM can be suspended and stored in stable storage, as shown in
Figure (b).
Third, a suspended VM can be resumed or provisioned to a new hardware
platform, as shown in Figure (c).
Finally, a VM can be migrated from one hardware platform to another, as
shown in Figure (d).
❑These VM operations enable a VM to be provisioned to any available
hardware platform.
❑They also enable flexibility in porting distributed application
executions.
❑Furthermore, the VM approach will significantly enhance the
utilization of server resources.
❑Multiple server functions can be consolidated on the same hardware
platform to achieve higher system efficiency. This will eliminate server
sprawl via the deployment of systems as VMs, which move
transparently to the shared hardware.
❑With this approach, VMware claimed that server utilization could be
increased from its current 5–15 percent to 60–80 percent.
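The suspend, resume, save, and restore operations above can be sketched with the libvirt Python bindings; the connection URI, domain name, and file path are illustrative assumptions, not from the source.

```python
import libvirt  # Python bindings for the libvirt virtualization API

# Connect to a local hypervisor (e.g., QEMU/KVM) and look up a guest VM.
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("demo-vm")

dom.suspend()                          # pause the running VM (operation (b) above)
dom.resume()                           # resume it in place

dom.save("/var/tmp/demo-vm.sav")       # write the VM state to stable storage and stop it
conn.restore("/var/tmp/demo-vm.sav")   # re-provision the VM from the saved image (operation (c))

conn.close()
```

Live migration between hosts (operation (d)) follows the same pattern through the hypervisor's management API, moving the VM's state to another physical platform while it keeps running.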
Virtualization for Cloud Computing
Data Center Growth and Cost Breakdown: Shown in the figure below.
Low-Cost Design Philosophy:
• High-end switches or routers may be cost-prohibitive for building data centers. Thus, using
high-bandwidth networks may not fit the economics of cloud computing, given a fixed budget.
• Similarly, commodity x86 servers are preferred over expensive mainframes.
• The software layer handles network traffic balancing, fault tolerance, and expandability.
• Currently, nearly all cloud computing data centers use Ethernet as their fundamental network
technology.
Convergence of Technologies: Essentially, cloud computing is enabled by the convergence of
technologies in four areas:
• Hardware virtualization and multi-core chips,
• Utility and grid computing,
• SOA, Web 2.0, WS mashups
• Autonomic computing and data center automation.
SYSTEM MODELS FOR DISTRIBUTED AND CLOUD COMPUTING
Clusters of Cooperative
Computing – HPC, DSM
• Cluster Architecture
• Single–System Image
• Hardware, Software, and
Middleware Support
• Unfortunately, a cluster-wide
OS for complete resource
sharing is not available yet.
• Middleware or OS extensions
were developed at the user
space to achieve SSI at selected
functional levels.
• Without this middleware,
cluster nodes cannot work
together effectively to achieve
cooperative computing.
Grid Computing Infrastructures
P2P Networks
The Cloud Landscape
Software Environments for DS and Clouds - SOA
Parallel and Distributed Programming Models
Dimensions of Scalability
Size scalability: This refers to achieving higher performance or more functionality by increasing the machine
size. The word “size” refers to adding processors, cache, memory, storage, or I/O channels.
Software scalability: This refers to upgrades in the OS or compilers, adding mathematical and engineering
libraries, porting new application software, and installing more user-friendly programming environments. Some
software upgrades may not work with large system configurations.
Application scalability: This refers to matching problem size scalability with machine size scalability. Problem size
affects the size of the data set or the workload increase. Instead of increasing machine size, users can enlarge
the problem size to enhance system efficiency or cost-effectiveness.
Technology scalability: This refers to a system that can adapt to changes in building technologies, such as
component and networking technologies. When scaling a system design with new technology, one must
consider three aspects: time, space, and heterogeneity.
(1) Time refers to generation scalability. When changing to new-generation processors, one must consider
the impact on the motherboard, power supply, packaging and cooling, and so forth. Based on past
experience, most systems upgrade their commodity processors every three to five years.
(2) Space is related to packaging and energy concerns. Technology scalability demands harmony and
portability among suppliers.
(3) Heterogeneity refers to the use of hardware components or software packages from different vendors.
Heterogeneity may limit the scalability.
PERFORMANCE, SECURITY, AND ENERGY EFFICIENCY