Module 3 (Notes)
Over the past two decades, the world economy has moved rapidly from manufacturing to services. In 2010, 80 percent of the U.S. economy was driven by the service industry, leaving only 15 percent in manufacturing and 5 percent in agriculture and other areas. Cloud computing benefits the service industry most and advances business computing with a new paradigm.
In 2009, the global cloud service marketplace reached $17.4 billion. IDC predicted in 2010 that the cloud-based economy could grow to $44.2 billion by 2013. Developers of innovative cloud applications no longer need to acquire large capital equipment in advance; they simply rent resources from large data centers that have been automated for this purpose.
Users can access and deploy cloud applications from anywhere in the world at very
competitive costs. Virtualized cloud platforms are often built on top of large data
centers. With that in mind, we first examine the server cluster in a data center and its interconnection issues.
In other words, clouds aim to power the next generation of data centers by
architecting them as virtual resources over automated hardware, databases, user
interfaces, and application environments. In this sense, clouds grow out of the desire
to build better data centers through automated resource provisioning.
Public, Private, and Hybrid Clouds
Cloud computing has evolved from cluster, grid, and utility computing. Cluster and
grid computing use multiple computers in parallel to solve problems of various sizes.
Utility computing and Software as a Service (SaaS) introduce the model of pay-per-
use computing resources.
Cloud computing is a high-throughput computing (HTC) paradigm that delivers
services through large data centers or server farms. It allows users to access shared
resources from anywhere at any time using connected devices. Cloud computing
moves computations to the location of data rather than transferring large datasets to
multiple desktops, thus achieving better network bandwidth utilization.
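The bandwidth argument above can be made concrete with a back-of-the-envelope sketch (the payload sizes and link speed below are illustrative assumptions, not figures from the notes):

```python
def transfer_time_seconds(payload_bytes: float, link_bytes_per_sec: float) -> float:
    """Time needed to ship a payload across a network link."""
    return payload_bytes / link_bytes_per_sec

MB, GB = 10**6, 10**9
dataset = 1000 * GB        # assumed: a 1 TB dataset
program = 10 * MB          # assumed: a 10 MB analysis program
link = 125 * MB            # ~1 Gbps link, in bytes per second

ship_data = transfer_time_seconds(dataset, link)   # move the data to the code
ship_code = transfer_time_seconds(program, link)   # move the code to the data

assert ship_data == 8000.0      # over two hours of transfer
assert ship_code == 0.08        # under a tenth of a second
```

Shipping the program instead of the dataset cuts network traffic by a factor of 100,000 in this sketch, which is exactly the motivation for moving computation to the data.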
With machine virtualization, cloud computing achieves better resource utilization,
application flexibility, and cost efficiency. It uses a virtual platform with elastic
resources, provisioning hardware, software, and datasets dynamically on demand.
The cloud model replaces desktop computing with a service-oriented platform
supported by server clusters and large databases.
Cloud computing helps IT companies by eliminating the need to manage hardware
and system software. It enables developers to focus on application development and
business value rather than infrastructure setup. It offers simplicity and low cost to
both providers and end users.
Centralized versus Distributed Computing
Public Clouds
Private Clouds
• Providers offer a remote interface for users to create and manage VM instances.
• Operates on proprietary infrastructure managed by the provider.
• Delivers business processes, applications, and infrastructure as services.
• Uses a flexible, pay-per-use pricing model.
• Ex: Netflix uses AWS for streaming videos.
Hybrid Clouds
How It Works
1. Private clouds supplement their capacity by integrating with a public cloud.
2. Example: Flipkart during its Big Billion Days sale uses its private cloud to store customer data and a public cloud (AWS) to scale resources during peak demand.
3. Example: Research Compute Cloud (RC2) is a private cloud, built by IBM, that
interconnects the computing and IT resources at eight IBM Research Centers
scattered throughout the United States, Europe, and Asia.
1. Balances standardization & customization – Uses public clouds for flexibility while
keeping sensitive data private.
2. Preserves capital investment – Organizations don’t have to expand private
infrastructure but can use public cloud resources when needed.
3. Enhances security & efficiency – Keeps critical operations private while leveraging
public cloud scalability.
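The hybrid-cloud behavior described above, keeping sensitive work private and bursting overflow to the public cloud, can be sketched as a small placement policy (a minimal illustration; the function, capacities, and rules are assumptions, not any real cloud API):

```python
def placement(workload_units: int, private_capacity: int, sensitive: bool) -> dict:
    """Decide how many workload units run on the private vs. public cloud.

    Sensitive workloads stay entirely on the private cloud; everything else
    bursts to the public cloud only once private capacity is exhausted.
    """
    if sensitive:
        if workload_units > private_capacity:
            raise ValueError("sensitive workload exceeds private capacity")
        return {"private": workload_units, "public": 0}
    private = min(workload_units, private_capacity)
    return {"private": private, "public": workload_units - private}

# Normal day: everything fits on-premises.
assert placement(80, private_capacity=100, sensitive=False) == {"private": 80, "public": 0}
# Peak-sale day (the Flipkart scenario): the overflow bursts to the public cloud.
assert placement(250, private_capacity=100, sensitive=False) == {"private": 100, "public": 150}
# Customer records never leave the private cloud.
assert placement(40, private_capacity=100, sensitive=True) == {"private": 40, "public": 0}
```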
Data-Center Networking Structure
• Server Cluster (or VM Cluster) – The main computing units of the cloud.
• Compute Nodes – Servers performing user jobs.
• Control Nodes – Manage and monitor cloud activities.
• Gateway Nodes – Act as access points for external users and ensure security.
Cloud Networking
Different clouds require varied performance, security, and data protection levels.
SLAs help ensure both providers and users agree on service expectations.
Ex: Uber app
RSN: handles ride requests and ensures smooth communication between drivers and passengers.
ISN: provides real-time traffic updates and weather conditions for better route selection.
Cloud Ecosystem and Enabling Technologies
• Buy and Own – Organizations must purchase hardware, system software, and
applications.
• High Maintenance –
• Install, configure, test, verify, evaluate, and manage the system.
• Requires IT staff and infrastructure management.
• Fixed Costs – Pay for maximum capacity, even if it's not used all the time.
• Obsolescence Cycle – Every 18 months, hardware becomes outdated, requiring
costly upgrades.
• High Cost ($$$$$) – Large upfront capital investment is required.
Cloud Design Objectives (Shift Services, Scale Data, High Standards)
• Shifting computing from desktops to data centers: Computer processing, storage, and software delivery are shifted away from desktops and local servers toward data centers over the Internet.
• Service provisioning and cloud economics: Providers supply cloud services by signing SLAs with consumers and end users. The services must be efficient in terms of computing, storage, and power consumption. Pricing is based on a pay-as-you-go policy.
• Scalability in performance: The cloud platforms and software and infrastructure services must be able to scale in performance as the number of users increases.
• Data privacy protection: Can you trust data centers to handle your private data and
records? This concern must be addressed to make clouds successful as trusted services.
• High quality of cloud services: The QoS of cloud computing must be standardized to make
clouds interoperable among multiple providers.
• New standards and interfaces: This refers to solving the data lock-in problem associated
with data centers or cloud providers. Universally accepted APIs and access protocols are
needed to provide high portability and flexibility of virtualized applications.
Cost Model
Cloud Computing Cost Model (Figure 4.3b)
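The contrast the cost-model figure draws, fixed cost for peak capacity versus pay-per-use, can be sketched numerically (the demand profile and prices below are made-up illustrative values, not figures from the notes):

```python
def owned_cost(peak_units: int, unit_capex: float, days: int,
               daily_opex_per_unit: float) -> float:
    """Buy-and-own model: provision for peak demand, then pay for it every day."""
    return peak_units * (unit_capex + days * daily_opex_per_unit)

def cloud_cost(daily_demand_units: list[int], price_per_unit_day: float) -> float:
    """Pay-as-you-go model: pay only for the capacity each day actually uses."""
    return sum(daily_demand_units) * price_per_unit_day

# Assumed bursty workload: 10 units on a normal day, 100 on 5 peak days a year.
demand = [10] * 360 + [100] * 5

pay_per_use = cloud_cost(demand, price_per_unit_day=1.0)           # 4100.0
buy_and_own = owned_cost(100, unit_capex=30.0, days=365,
                         daily_opex_per_unit=0.25)                 # 12125.0
assert pay_per_use < buy_and_own
```

The wider the gap between average and peak demand, the more the pay-per-use model wins, which is why bursty workloads are the classic cloud use case.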
Cloud Ecosystems
With the growth of Internet clouds, a broad ecosystem of providers, users, and
technologies has emerged. This ecosystem is primarily built around public clouds, but
there's a growing interest in open-source cloud tools that help organizations
construct their own IaaS (Infrastructure as a Service) clouds using internal
infrastructure. Private and hybrid clouds also leverage public cloud elements,
allowing remote access through web service interfaces such as Amazon EC2.
Cloud Ecosystem Levels
VM Management – (Level d)
Many startups are moving to cloud-based IT strategies, reducing capital expenses by avoiding
the setup of dedicated IT infrastructure. This shift pushes the need for a flexible and open
cloud architecture that supports the construction of private/hybrid clouds. VI (virtual infrastructure) management is central to this goal.
• Dynamic VM placement
• Automatic load balancing
• Server consolidation
• Dynamic resizing and partitioning of infrastructure
Besides commercial clouds like Amazon EC2, open-source tools such as Eucalyptus and
Globus Nimbus are available for cloud infrastructure virtualization. Access to these tools is
provided through interfaces like:
• Amazon EC2WS
• Nimbus WSRF
• ElasticHost REST
For VM generation and control, tools like OpenNebula and VMware vSphere support
virtualization platforms such as Xen, KVM, and VMware.
Infrastructure-as-a-Service (IaaS)
🔹 PaaS (Platform-as-a-Service)
🔹 SaaS (Software-as-a-Service)
IaaS Components
Example: Amazon EC2 & S3: A startup uses Amazon EC2 for on-demand computing power
and S3 for cloud storage, reducing costs and improving flexibility.
Platform-as-a-Service (PaaS)
What is PaaS?
Features of PaaS
PaaS Examples
Example : Google App Engine for PaaS Applications
• Platform as a Service (PaaS): GAE is a cloud-based platform that allows developers to build and deploy applications without worrying about infrastructure.
• Automatic Scaling & Load Balancing: GAE automatically manages application scaling based on traffic demands and distributes the load among multiple servers.
• Task Scheduling: GAE has a distributed scheduling mechanism to trigger tasks at specified times and intervals.
• Local Development Environment: Developers can build, test, and debug applications on their local machines before deploying them to Google's cloud infrastructure.
What is SaaS?
o A user opens a web browser and logs into a SaaS application (e.g., Gmail).
o The application runs on cloud servers, not on the user's device.
o Data is stored in the cloud, either in a vendor’s proprietary cloud (Google Drive,
Microsoft OneDrive) or a publicly hosted cloud.
4.2 DATA-CENTER DESIGN AND INTERCONNECTION NETWORKS
Feature        Large-Scale (Warehouse) Data Centers               Small Modular Data Centers
Size           As large as a shopping mall (11x a football field) Fits inside a 40-ft truck container
Storage cost   $0.4 per GB                                        5.7 times higher than in large data centers
Example        Google, Microsoft, Amazon data centers             Edge computing units, emergency response centers
Most data centers are built using commercially available components. An off-the-shelf
server typically includes multicore CPUs with internal cache hierarchies, local shared
DRAM, and directly attached disk drives. These servers are connected via first-level rack
switches, and the entire rack structure is linked using a cluster-level switch. For example,
in a data center with 2,000 servers, each having 8 GB DRAM and four 1 TB disks, groups
of 40 servers are connected through 1 Gbps links to rack-level switches, which are in turn
connected to the cluster-level network.
There is a significant performance difference between local and off-rack storage. Local
disks provide 200 MB/s bandwidth, while off-rack disks offer only 25 MB/s due to shared
uplinks. Moreover, the total disk storage in such clusters is millions of times greater than local DRAM capacity, making latency, bandwidth, and capacity management a major challenge for large applications.
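A quick sanity check of the figures in the example above (2,000 servers, 40 per rack, 200 MB/s local versus 25 MB/s off-rack disk bandwidth, 8 GB DRAM and four 1 TB disks per server):

```python
servers, per_rack = 2000, 40
racks = servers // per_rack
assert racks == 50                     # 50 racks hang off the cluster switch

local_bw, off_rack_bw = 200, 25        # MB/s, per the text
assert local_bw / off_rack_bw == 8.0   # off-rack reads are 8x slower

dram_gb = 8                            # per server
disk_gb = 4 * 1000                     # four 1 TB disks per server
cluster_disk_vs_local_dram = (servers * disk_gb) / dram_gb
assert cluster_disk_vs_local_dram == 1_000_000.0   # "millions of times greater"
```

The last ratio compares the cluster's total disk pool to a single server's DRAM, which is the comparison the text's "millions of times" claim refers to.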
In large-scale data centers, components are cheaper, but failure is common—about 1%
of nodes may fail concurrently, either due to hardware issues like CPU, disk I/O, or
network failure, or due to software bugs. In severe cases, even the entire data center
may go down, such as during a power outage.
To ensure reliability, redundant hardware is used, and software must maintain multiple
data copies across different locations. This redundancy ensures that services and data
remain accessible even in the face of hardware or software failures.
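The "multiple data copies across different locations" idea can be sketched as a toy replica-placement routine (hash-based placement is an assumption made here for illustration; production systems such as GFS also spread replicas across racks and data centers):

```python
import hashlib

def replica_nodes(block_id: str, nodes: list[str], copies: int = 3) -> list[str]:
    """Place `copies` replicas of a data block on distinct nodes.

    Hashing the block id makes placement deterministic, so any client can
    recompute where the copies live without asking a central directory.
    """
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    start = int(hashlib.sha1(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]

cluster = [f"server-{i:02d}" for i in range(10)]
where = replica_nodes("block-42", cluster)
assert len(set(where)) == 3          # three distinct servers hold the block
# Any single failure leaves two live copies, so reads and writes continue.
```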
Figure 4.9 shows the layout and cooling facility of a warehouse in a data center.
The data-center room has raised floors for hiding cables, power lines, and cooling
supplies. The cooling system is somewhat simpler than the power system. The raised
floor has a steel grid resting on stanchions about 2–4 ft above the concrete floor. The
under-floor area is often used to route power cables to racks, but its primary use is to
distribute cool air to the server rack.
The CRAC (computer room air conditioning) unit pressurizes the raised floor plenum
by blowing cold air into the plenum. The cold air escapes from the plenum through
perforated tiles that are placed in front of server racks. Racks are arranged in long
aisles that alternate between cold aisles and hot aisles to avoid mixing hot and cold
air.
The hot air produced by the servers circulates back to the intakes of the CRAC units
that cool it and then exhaust the cool air into the raised floor plenum again. Typically,
the incoming coolant is at 12–14°C and the warm coolant returns to a chiller.
Newer data centers often insert a cooling tower to pre-cool the condenser water
loop fluid. Water-based free cooling uses cooling towers to dissipate heat. The cooling
towers use a separate cooling loop in which water absorbs the coolant’s heat in a
heat exchanger.
Data-Center Interconnection Networks
A critical core design of a data center is the interconnection network among all servers in the
data center cluster. This network design must meet five special requirements: low latency, high bandwidth, low cost, message-passing interface (MPI) communication support, and fault tolerance.
Data center networks must support diverse traffic patterns, especially MPI (Message
Passing Interface) communication for parallel applications. This includes both:
For example:
Therefore, the network topology must ensure high bisection bandwidth, so no part of the
cluster becomes a traffic bottleneck.
Network Expandability
Data centers often grow over time. The interconnection network must be:
Key concerns:
Containers can be added easily: just plug in power, cooling, and network, making
expansion more cost-effective and efficient.
Networks must tolerate failures and continue running smoothly even when:
Mechanisms for fault tolerance:
The design must avoid single points of failure and ensure graceful degradation: continued operation even when parts of the system fail.
The switch organization remains essential to ensure efficient traffic flow regardless of the
approach.
Structure
• Two-layer topology.
• Bottom layer: Server nodes connected to edge switches.
• Upper layer: Aggregation switches connecting edge switches.
• Core switches connect all pods together.
• Pods:
• Core Switches:
• Provide interconnection between pods.
• Enable communication across the whole data center.
• Redundancy and Fault Tolerance:
• Multiple paths exist between any two servers.
• Ensures alternate routes in case of link failure.
• Failure of:
• Edge switch: Affects only a few servers.
• Aggregation/Core switch: Does not affect overall connectivity.
• Each pod contains:
• Edge switches
• Aggregation switches
• Server nodes (leaf nodes)
• Benefits:
• High bandwidth.
• Scalability for cloud applications with massive data transfer.
• Fault-tolerant due to multiple paths.
• Uses low-cost Ethernet switches (cost-effective).
• Routing handled internally by switches.
• Server nodes remain unaffected during switch failures (unless all paths fail).
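The pod/edge/aggregation/core structure above matches the standard k-ary fat-tree construction, so the component counts follow directly from the switch port count k (these formulas come from the general fat-tree design, not from these notes):

```python
def fat_tree_sizes(k: int) -> dict:
    """Component counts for a fat-tree built entirely from k-port switches."""
    assert k % 2 == 0, "port count must be even"
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,          # k/2 edge switches per pod
        "aggregation_switches": k * half,   # k/2 aggregation switches per pod
        "core_switches": half * half,       # (k/2)^2 core switches
        "servers": k * half * half,         # k^3 / 4 servers in total
    }

small = fat_tree_sizes(4)
assert small["servers"] == 16 and small["core_switches"] == 4

# Commodity 48-port Ethernet switches already reach data-center scale:
assert fat_tree_sizes(48)["servers"] == 27_648
```

This is why the topology can use low-cost identical switches throughout: scale comes from the wiring pattern, not from expensive high-radix core hardware.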
2. Cooling Mechanism:
• Fans and heat exchangers cool the hot air from servers.
• Chilled air or cold water circulates in a loop to maintain optimal temperature.
• Advanced cooling technology helps reduce cooling costs by up to 80%.
3. Capacity:
4. Benefits:
Deployment Process:
• Starts with single server → moves to rack → expands to full container system.
• Building 40-server rack: ~half a day.
• Full container (1000+ servers): requires space planning, power, networking, and
cooling setup.
• Containers must be weatherproof and easy to transport.
• Suitable for cloud applications, like:
• Healthcare clinics needing local data centers.
Container-based data-center modules are meant for construction of even larger data
centers using a farm of container modules.
• Servers are shown as circles (e.g., 00, 01...).
• Switches are shown as rectangles (e.g., <0,0>, <1,0>...).
• Two levels in this example:
• Level 0: Basic server-to-switch connections (BCube0).
• Level 1: Higher-level connections that link different BCube0 groups.
• BCube0: Each switch connects to multiple servers (e.g., <0,0> connects to 00, 01, 02,
03).
• BCube1: Built from multiple BCube0 groups, connected using extra switches in Level
1.
• Servers have multiple ports.
• This allows each server to connect to:
• One Level 0 switch
• One Level 1 switch
• More ports → more redundancy and bandwidth.
• Many paths exist between any two servers.
• Ensures:
• High availability
• Fault tolerance
• Load balancing
• Better bandwidth utilization
Each container (like Container-00, Container-01, etc.) has a BCube network inside it
connecting servers and switches.
• MDCube = BCube + Cube:
MDCube combines:
• A BCube structure inside the container
• A cube-like (grid) network among containers
• 2D Grid Example:
In the figure, 9 BCube1 containers (3x3 layout) are connected to form a 2D MDCube.
• Servers inside each container can communicate with other containers via high-speed
links between switches.
You can build large data centers by just adding more containers to the grid.
MDCube supports large cloud-based applications by providing high bandwidth,
fault-tolerant paths, and modular design.
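The recursive BCube construction and the container grid scale predictably. Assuming the standard BCube formulas (a BCube_k built from n-port switches holds n^(k+1) servers on k+1 switch levels; this detail comes from the BCube design, not these notes), the figure's numbers work out as:

```python
def bcube_servers(n: int, k: int) -> int:
    """Servers in a BCube_k built from n-port switches: n^(k+1)."""
    return n ** (k + 1)

def bcube_switches(n: int, k: int) -> int:
    """Switches in a BCube_k: k+1 levels of n^k switches each."""
    return (k + 1) * n ** k

# The figure's BCube1 from 4-port switches: four BCube0 groups of 4 servers.
assert bcube_servers(4, 1) == 16
assert bcube_switches(4, 1) == 8       # 4 level-0 plus 4 level-1 switches

# Nine such containers in the 3x3 MDCube grid:
assert 9 * bcube_servers(4, 1) == 144  # servers across the whole MDCube
```

Because each server owns k+1 ports (one per switch level), adding a level multiplies the server count by n while keeping many disjoint paths between any pair.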
1. Scalability
• Easily add more servers, storage, or bandwidth as demand grows.
2. Virtualization
• Support both physical and virtual machines for flexible resource use.
3. Efficiency
• Combine hardware + software for easy and optimized operations.
4. Reliability
• Keep data in multiple locations (e.g., 3 disks in different data centers) to
avoid data loss.
Cloud Management:
These technologies play instrumental roles in making cloud computing a reality. Most of these technologies are mature enough today to meet increasing demand. In the hardware area, the
rapid progress in multicore CPUs, memory chips, and disk arrays has made it possible to
build faster data centers with huge amounts of storage space. Resource virtualization
enables rapid cloud deployment and disaster recovery.
Generic Cloud Architecture (Ex: Google Drive file upload)
Includes Storage Area Networks (SANs), database systems, firewalls, and security
devices.
Web service APIs allow developers to access and use cloud resources.
Monitoring and metering tools track performance and resource usage.
Cloud software must automate resource management and maintenance.
Layered Cloud Architectural Development (Ex: Netflix streaming service)
Application Layer (SaaS)
Objective
Ensures QoS (Quality of Service) based on SLA (Service Level Agreement) between cloud
providers and users.
Users/Brokers
Pricing Mechanism
Accounting Mechanism
VM Monitor
• Tracks resource entitlements for each VM.
• Supports flexible VM allocation across physical machines.
Dispatcher
Hardware virtualization
• Virtualization provides customized and separate environments for each user,
improving security and usability.
• Virtual disk storage and virtual networks support VM functionality.
• Virtualized resources (CPU, storage, etc.) are pooled and managed by virtual
integration managers.
• These managers handle load balancing, resource allocation, security, data, and
provisioning.
Virtualization Support in Public Clouds
• AWS (Amazon Web Services): Offers full VM-level virtualization, allowing users to
run custom applications.
• Microsoft Azure: Offers programming-level virtualization through the .NET
framework.
• Google App Engine (GAE): Provides limited application-level virtualization with only
Google-managed services.
• Tools:
• VMware: Used for workstations, servers, and virtual infrastructures.
• Microsoft Tools: Used on PCs and special servers.
• XenEnterprise: Only for Xen-based servers.
• Benefits: Virtualization enables high availability (HA), disaster recovery, dynamic
load balancing, and rich provisioning, key for cloud and utility computing.
Involves:
o Hardware configuration
o Operating system installation
o Backup agent setup
o Restarting system
This results in a long recovery time and is often complex and expensive.
VM-Based Recovery
Key Metrics
Security Consideration
During live migration, VM security must be enforced to protect data and system
integrity.
• Threats include traditional (DoS, malware) and cloud-specific (VM rootkits,
hypervisor attacks, man-in-the-middle attacks during migration).
• Need to secure both passive (data theft) and active (data manipulation) attacks.
1. Top Level – Users
Hardware Providers: Supply physical servers, storage systems, and networking gear.
Software Providers: Offer system software, virtualization platforms, and
middleware.
Together, they support the IaaS layer and indirectly all upper layers.
Google App Engine (GAE)
Google operates the world’s largest search engine, leveraging massive data
processing capabilities.
This has led to innovations in data center design and scalable programming models
like MapReduce.
Google owns hundreds of data centers with over 460,000 servers worldwide.
Up to 200 data centers can be used simultaneously for cloud applications.
Data storage includes text, images, and video, with replication for fault tolerance
and high availability (HA).
GAE Architecture
A distributed file system designed to manage large volumes of data across multiple
machines.
GFS Master: Manages metadata (file names, locations).
GFS Chunkservers: Store actual data chunks and serve them upon request.
Ensures fault tolerance through data replication.
2. MapReduce
3. BigTable
4. Chubby
5. Scheduler
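Of the components listed, MapReduce is the programming model. A minimal single-process word-count sketch of its two phases (real Google runs distribute the map and reduce tasks across many nodes, reading splits from GFS chunks):

```python
from collections import defaultdict
from itertools import chain

def map_phase(split: str):
    """Map task: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    """Reduce task: sum the counts for each word after the shuffle."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

splits = ["the cloud serves the user", "the user pays per use"]
# Here the splits are processed sequentially in one process; in a real
# deployment each map task runs on a different node, near its data.
counts = reduce_phase(chain.from_iterable(map_phase(s) for s in splits))
assert counts["the"] == 3 and counts["user"] == 2
```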
User sends requests through an application interface (like a browser or app).
The request enters the Google cloud infrastructure (shaded cloud area).
Inside the infrastructure:
o Scheduler allocates resources.
o Chubby coordinates access.
o GFS Master controls file system metadata.
o Multiple nodes run services like:
MapReduce Jobs
BigTable Server
GFS chunkserver
Scheduler slave
All run over Linux OS.
The GAE platform comprises the following five major components. The GAE is not an
infrastructure platform, but rather an application development platform for users. We
describe the component functionalities separately.
a. The datastore offers object-oriented, distributed, structured data storage services based on BigTable techniques. The datastore secures data management operations.
b. The application runtime environment offers a platform for scalable web programming and execution. It supports two development languages: Python and Java.
c. The software development kit (SDK) is used for local application development. The SDK allows users to execute test runs of local applications and upload application code.
e. The GAE web service infrastructure provides special interfaces to guarantee flexible use and management of storage and network resources by GAE.
GAE Applications
Well-known GAE applications include the Google Search Engine, Google Docs,
Google Earth, and Gmail. These applications can support large numbers of users
simultaneously.
Users can interact with Google applications via the web interface provided by each
application. Third-party application providers can use GAE to build cloud applications
for providing services.
The applications are all run in the Google data centers. Inside each data center, there
might be thousands of server nodes to form different clusters. Each cluster can run
multipurpose servers.
GAE supports many web applications. One is a storage service to store application-specific data in the Google infrastructure. The data can be persistently stored in the backend storage server while still providing the facility for queries, sorting, and even transactions similar to traditional database systems.
GAE also provides Google-specific services, such as the Gmail account service (which
is the login service, that is, applications can use the Gmail account directly). This can
eliminate the tedious work of building customized user management components in
web applications. Thus, web applications built on top of GAE can use the APIs for authenticating users and sending e-mail through Google accounts.
AWS
Amazon Web Services (AWS) is a public cloud computing platform using the
Infrastructure-as-a-Service (IaaS) model.
It allows developers and companies to build and run applications in the cloud using
virtual machines and other cloud-based resources.
AWS Architecture
Additional Features
CloudWatch: Monitors AWS resources like EC2 for CPU, disk, and network metrics.
Elastic Load Balancer (ELB): Distributes traffic across multiple EC2s for high
availability.
Auto Scaling: Automatically adjusts EC2 count based on load.
Amazon DevPay & FPS: Handle billing and payment integration for commercial cloud
services.
Amazon Mechanical Turk: Offers a workforce-as-a-service model for tasks like data
labeling or surveys.
MPI Clusters (since 2010): High-performance computing using cluster compute
instances.
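The Auto Scaling behavior described above can be sketched as a target-tracking policy (a simplified illustration; the formula, target, and bounds are assumptions, not the exact AWS algorithm):

```python
import math

def desired_instances(current: int, avg_cpu_pct: float,
                      target_pct: float = 60.0,
                      minimum: int = 1, maximum: int = 20) -> int:
    """Scale the fleet so average CPU utilization approaches the target.

    If the fleet averages 90% CPU against a 60% target, capacity must grow
    by 90/60 = 1.5x; the result is clamped to the configured fleet bounds.
    """
    wanted = math.ceil(current * avg_cpu_pct / target_pct)
    return max(minimum, min(maximum, wanted))

assert desired_instances(4, avg_cpu_pct=90.0) == 6   # overloaded: scale out
assert desired_instances(4, avg_cpu_pct=30.0) == 2   # underused: scale in
assert desired_instances(4, avg_cpu_pct=60.0) == 4   # on target: no change
```

Combined with the ELB spreading traffic across whatever instances exist, this loop is what lets the fleet track demand without manual intervention.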
Architecture
1. Live Service
o Access to Microsoft Live applications.
o Supports data usage across multiple machines concurrently.
2. .NET Service
o Enables application development on local hosts.
o Applications can run on cloud-based machines.
3. SQL Azure
o Provides access to relational database services using SQL Server in the cloud.
4. SharePoint Service
o A platform for developing custom business web applications.
o Scalable and manageable.
5. Dynamic CRM Service
o Supports business applications related to finance, marketing, sales, and
promotions.
Six Layers of Cloud Services Architecture
Cloud Service Tasks and Trends
4. Collocation Services
Layered Software Stack Structure
Platform Layer
Similar to cluster environments, cloud systems require runtime services for smooth
operation.
Ensures efficiency and proper functioning of the cloud cluster.
Cluster Monitoring
SaaS Runtime Environment
Data Storage