
CS12 - Capacity & Scheduling

Software as a Service – Summary

Enterprise Software Application


o Perform business functions
o Organize internal and external information
o Share data among internal and external users
o The most common type of software delivered through the SaaS model
o Examples: Salesforce.com CRM application, Siebel On-Demand application
Anatomy of a Cloud Ecosystem

Why Capacity Management in Cloud?


Capacity Management is:
• Strategic and Proactive
• Assists in managing service quality
• Assists in managing cost expenditures
• Aligns business and IT
• Matches needs and cost during growth

Following are five characteristics of Cloud:
• They provide on-demand provisioning of computational resources;
• They use virtualization technologies to lease these resources;
• They provide public and simple remote interfaces to manage those resources;
• They use a pay-as-you-go cost model, typically charging by the hour;
• They operate data centers large enough to provide a seemingly unlimited number of resources to their clients

What are the Challenges?

Importance of Capacity Management


When Capacity is Too Little (under capacity)
- Application Sizing not performed
- Controlled by limits
- Lower usage costs
- Service Levels affected – by how much?
- High impact on business?
• Loss of service
• SLA breach penalties
When Capacity is Too Much (over capacity)
• Unlimited access to resources
• Service Levels unaffected
• Costs – higher resource usage, software licensing
• Impacts on other services
- Poor performance
- Wasted resources (e.g., multi-vCPU (vSMP) VMs running single-threaded applications)
- VM Sprawl
- Increased pressure to manage virtual resources
So What’s the Way? – Find the Right Balance
We need to have the right capacity:
• Application Sizing performed
• Efficient use of IaaS
• Acceptable Service Levels
- continuous review and adjustment
• Controlled by shares, limits and reservations
• Continuous monitoring and tuning
- Configuration adjustments (CPU, Memory)
- VM consolidation
- Power off unused ESX hosts
• Right Balance
- Find the equilibrium (service / cost)
How To ? – Find the Right Balance
Automation and Control
• Rapid Elasticity – Guest Migration and Portability, Templates, Golden Host
• Resource Pools (limits, shares and reservations)
• Load Balancing
- DRS (Automatic, Partial or Manual)
• Affinity
- VM to VM or VM to Host
- CPU
• Fine grained tweaking for performance gains – Single Threaded Apps
• Licensing
Protecting your "loved ones"
• Critical business applications
• Service Levels must be met
• Highest Priority (shares, unlimited), CPU affinity
• Must guarantee resources (reservations)
• High Availability Clusters
• VMware - Fault Tolerance
• Trade-offs
- "Just in case" capacity management could impact other services
- Significant impact? Think about scaling out or up.
How To ? – Find the Right Balance
Old Habits – Potential Risks
• Gartner – “Through 2015, more than 70% of private cloud implementations will fail to deliver
operational, energy and environmental efficiencies.”
• Infrastructure or application hugging
- Silo mentality “what’s mine is mine”
- Lack of resource sharing
- Through lack of trust and confidence
- “Just in case” capacity planning
- Leads to over provisioning
What is Virtual Infrastructure Management?

So What is the Way Out? – The Virtual Infrastructure Manager (VIM)

What is a Virtual Infrastructure Manager?
 A VIM runs on top of a hypervisor in a virtualized environment. The hypervisor allocates and
manages virtual machines. The VIM deals with the allocation of resources in the virtual
infrastructure. They include computational resources (processors), storage, and network
resources. Virtual infrastructure management allows the allocation to happen based on current
requirements, rather than being statically allocated.
 The VIM carries out several tasks:
 Allocating resources in accordance with traffic engineering rules.
 Support for defining operational rules.
 Definition of hub-to-facility mapping.
 Providing information for provisioning virtual infrastructure orchestration (VIO).
Why a Virtual Infrastructure Manager?
 VMs are great!!...but something more is needed
 Where did/do I put my VM? (scheduling & monitoring)
 How do I provision a new cluster node? (clone)
 What IP addresses are available? (networking)
 Provide a uniform view of the resource pool
 Life-cycle management and monitoring of VM
 The VIM should integrate Image, Network and Virtualization

Why a Virtual Infrastructure Manager?


• Dynamic deployment and re-placement of virtual machines on a pool of physical resources
• Transform a rigid distributed physical infrastructure into a flexible and agile virtual infrastructure
• Backend of Public Cloud: Internal management of the infrastructure
• Private Cloud: Virtualization of cluster or data-center for internal users
• Cloud Interoperation: On-demand access to public clouds
Enter – Distributed Management of Virtual Resources
• The cloud ecosystem is inherently distributed. Resources are spread around.
• Location also plays an important role in the efficient selection / optimal selection & placement of
resources.
• The problem of efficiently selecting or scheduling computational resources is well known.
However, the state of the art in VM-based resource scheduling follows a static approach, where
resources are initially selected using a greedy allocation strategy, with minimal or no support
for other placement policies.
• To efficiently schedule resources, VI managers must be able to support flexible and complex
scheduling policies and must leverage (use) the ability of VMs to suspend, resume, and migrate.
So What is the Solution
• Managing VMs in a pool of distributed physical resources is a key concern in IaaS clouds,
requiring the use of a virtual infrastructure manager.
• OpenNebula is capable of managing groups of interconnected VMs—with support for the Xen, KVM, and VMware platforms—within data centers and private clouds that involve a large number of virtual and physical servers.
• OpenNebula can also be used to build hybrid clouds by interfacing with remote cloud sites.
OpenNebula VIM
What is OpenNebula
 OpenNebula is a simple, feature-rich and flexible solution for the management of virtualized
data centers.
 It enables private, public and hybrid clouds. Here are a few facts about this solution.
 OpenNebula is an open source cloud middleware solution that manages heterogeneous
distributed data center infrastructures.
 It is designed to be a simple but feature-rich, production-ready, customizable solution to build
and manage enterprise clouds—simple to install, update and operate by the administrators; and
simple to use by end users.
 OpenNebula combines existing virtualization technologies with advanced features for multi-
tenancy, automated provisioning and elasticity.
 A built-in virtual network manager maps virtual networks to physical networks.
 Distributions such as Ubuntu and Red Hat Enterprise Linux have already integrated OpenNebula.
 OpenNebula supports Xen, KVM and VMware hypervisors.
 Private Cloud to simplify and optimize internal operations
 Hybrid Cloud to supplement the capacity of the Private Cloud
 Public Cloud to expose your Private Cloud to external users
Stages of VM Life Cycle in OpenNebula
• The life cycle of a VM within OpenNebula follows several stages:
• Resource Selection: allowing site administrators to configure the scheduler to prioritize
the resources that are more suitable for the VM
• Resource Preparation: The disk images of the VM are transferred to the target physical
resource. During the boot process, the VM is contextualized, a process where the disk
images are specialized to work in a given environment.
• VM Creation: The VM is booted by the resource hypervisor
• VM Migration: The VM potentially gets migrated to a more suitable resource (e.g., to optimize the power consumption of the physical resources).
• VM Termination: When the VM is going to shut down, OpenNebula can transfer its disk images back to a known location. This way, changes in the VM can be kept for future use.
 Within OpenNebula, a VM is modeled as having the following attributes:
o A capacity in terms of memory and CPU
o A set of NICs attached to one or more virtual networks
o A set of disk images
o A state file (optional) or recovery file
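As a rough illustration, these attributes can be pictured as a simple record. The sketch below is plain Python with hypothetical values (names such as "private-lan" and "ubuntu-base.img" are invented for this example and are not taken from OpenNebula):

```python
# A minimal sketch of the attributes OpenNebula tracks for a VM,
# expressed as a plain Python dictionary. All values are hypothetical.
vm_model = {
    "capacity": {"cpu": 1, "memory_mb": 512},        # CPU and memory capacity
    "nics": [{"network": "private-lan"}],            # NICs attached to virtual networks
    "disks": [{"image": "ubuntu-base.img"}],         # disk images backing the VM
    "state_file": "/var/lib/one/checkpoint.42",      # optional state/recovery file
}

# Example: total memory requested by a group of interconnected VMs
vms = [vm_model, vm_model]
total_mb = sum(v["capacity"]["memory_mb"] for v in vms)
print(total_mb)  # 1024
```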
VM Management in OpenNebula
• OpenNebula manages VMs by interfacing with the physical resource's hypervisor (such as Xen, KVM, VMware, or Hyper-V) to control the VM (e.g., boot, stop, or shut it down);
• it does so using a set of pluggable drivers that decouple the management process from the underlying technology.
• Thus, whenever the core needs to manage a VM, it uses high-level commands such as “start
VM,” “stop VM,” and so on, which are translated by the drivers into commands that the virtual
machine manager can understand. By decoupling the OpenNebula core from the virtualization
technologies through the use of a driver-based architecture, adding support for additional virtual
machine managers only requires writing a driver for it.
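The decoupling described above can be sketched as follows. This is a minimal, hypothetical Python illustration of a driver-based design; the class and method names are invented for this example and are not OpenNebula's actual API.

```python
# Sketch of a driver-based core: high-level commands ("start VM", "stop VM")
# are translated by per-hypervisor drivers into technology-specific actions.
from abc import ABC, abstractmethod

class HypervisorDriver(ABC):
    @abstractmethod
    def start(self, vm_id: str) -> None: ...
    @abstractmethod
    def stop(self, vm_id: str) -> None: ...

class KvmDriver(HypervisorDriver):
    def start(self, vm_id: str) -> None:
        print(f"kvm: booting {vm_id}")        # would invoke KVM/libvirt tooling here
    def stop(self, vm_id: str) -> None:
        print(f"kvm: shutting down {vm_id}")

class Core:
    """The core only speaks 'start VM' / 'stop VM'; drivers do the translation."""
    def __init__(self, driver: HypervisorDriver):
        self.driver = driver
    def start_vm(self, vm_id: str) -> None:
        self.driver.start(vm_id)
    def stop_vm(self, vm_id: str) -> None:
        self.driver.stop(vm_id)

# Adding support for another hypervisor only requires writing a new driver class.
Core(KvmDriver()).start_vm("vm-42")
```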
Image Management in OpenNebula
• Image management involves transferring VM images from an image repository to the selected resource and creating on-the-fly temporary images.
• What is an image? A virtual disk that contains the OS and other additional software.
• Image management model
Networking OpenNebula
• In general, services deployed on a cloud require several interrelated VMs, with a virtual
application network (VAN) being the primary link between them.
• OpenNebula dynamically creates these VANs and tracks the MAC addresses leased in the
network to the service VMs.
• OpenNebula models the physical cluster as a set of hosts with one or more network interfaces, each of them connected to a different physical network.
Benefits of OpenNebula
For the Infrastructure Manager
• Centralized management of VM workload and distributed infrastructures
• Support for VM placement policies: balance of workload, server consolidation…
• Dynamic resizing of the infrastructure
• Dynamic partition and isolation of clusters
• Dynamic scaling of private infrastructure to meet fluctuating demands
• Lower infrastructure expenses combining local and remote Cloud resources
For the Infrastructure User
• Faster delivery and scalability of services
• Support for heterogeneous execution environments
• Full control of the lifecycle of virtualized services management
Features of OpenNebula

Feature | Function
Internal Interface | Unix-like CLI for full management of the VM life-cycle and physical boxes; XML-RPC API and libvirt virtualization API
Scheduler | Requirement/rank matchmaker allowing the definition of workload- and resource-aware allocation policies; support for advance reservation of capacity through Haizea
Virtualization Management | Xen, KVM, and VMware; generic libvirt connector (VirtualBox planned for 1.4.2)
Image Management | General mechanisms to transfer and clone VM images
Network Management | Definition of isolated virtual networks to interconnect VMs
Service Management and Contextualization | Support for multi-tier services consisting of groups of interconnected VMs, and their auto-configuration at boot time
Security | Management of users by the infrastructure administrator
Fault Tolerance | Persistent database backend to store host and VM information
Scalability | Tested in the management of medium-scale infrastructures with hundreds of servers and VMs (no scalability issues have been reported)
Flexibility and Extensibility | Open, flexible and extensible architecture, interfaces and components, allowing its integration with any product or tool

Comparison with Similar Technologies


Feature                        | Platform ISF | VMware vSphere | Eucalyptus | Nimbus   | OpenNebula
Virtualization Management      | VMware, Xen  | VMware         | Xen, KVM   | Xen      | Xen, KVM, VMware
Virtual Network Management     | Yes          | Yes            | No         | Yes      | Yes
Image Management               | Yes          | Yes            | Yes        | Yes      | Yes
Service Contextualization      | No           | No             | No         | Yes      | Yes
Scheduling                     | Yes          | Yes            | No         | No       | Yes
Administration Interface       | Yes          | Yes            | No         | No       | Yes
Hybrid Cloud Computing         | No           | No             | No         | No       | Yes
Cloud Interfaces               | No           | vCloud         | EC2        | EC2 WSRF | EC2 Query, OGF OCCI
Flexibility and Extensibility  | Yes          | No             | Yes        | Yes      | Yes
Open Source                    | No           | No             | GPL        | Apache   | Apache
Scheduling VM Workloads - What is a Cloud Workload
• A cloud workload is a specific application, service, capability or a specific amount of work that
can be run on a cloud resource. Virtual machines, databases, containers, Hadoop nodes and
applications are all considered cloud workloads.
• A workload is a collection of resources and code that delivers business value, such as a
customer-facing application or a backend process. A workload might consist of a subset of
resources in a single cloud account or be a collection of multiple resources spanning multiple
cloud accounts.
• Examples of workloads are marketing websites, e-commerce websites, the back-ends for a
mobile app, analytic platforms, etc. Workloads vary in levels of architectural complexity, from
static websites to architectures with multiple data stores and many components.

What are we Talking about


Workload: the amount of work (or load) that software imposes on the underlying computing resources.
Workload: the amount of time and computing resources required to perform a specific task or produce an output from the inputs provided.
Types of workload: Static (fixed work), Dynamic (ad-hoc requests), Analytical (big data), Transactional (mainframe).
Standardized metrics used to measure and report on an application's performance or load are collectively referred to as benchmarks.
Where do we Run Workloads?
Workload deployment -- determining where and how the workload runs -- is an essential part of
workload management. Today, an enterprise can choose to deploy a workload on premises, as well as to
a cloud.
Traditionally, workloads are deployed in the enterprise data center, which contains all of the server,
storage, network, services and other infrastructure required to operate the workload. The business owns
the data center facility and computing resources and fully controls the provisioning, optimization, and
maintenance of those resources. The enterprise establishes policies and practices for the data center
and workload deployment in order to meet prevailing business goals and regulatory obligations.
With the rise of the internet, cloud computing is now a viable alternative for many on-premises
workload deployments.
The challenge for any business is deciding just where to deploy a given workload. Today, most general-
purpose workloads can operate successfully in the public cloud, and, increasingly, applications are
designed and developed to run natively and solely in a public cloud.
Workload Challenges
Technologically, the most demanding workloads may struggle in the public cloud. Some workloads
require high-performance network storage or depend on internet throughput.
For example, database clusters that need high throughput and low latency may be unsuited to the cloud
-- and the cloud provider may offer high-performance database services as an alternative. Applications
that rely on low latency or are not designed for distributed computing infrastructures are usually kept on
premises.
Technical issues aside, a business may decide to keep workloads on premises for business continuance or
regulatory reasons.
Cloud clients have little actual insight into the underlying hardware and other infrastructure that hosts
the workloads and data. That can be problematic for businesses obligated to meet data security and
other regulatory requirements such as clear auditing and proof of data residency. Keeping those
sensitive workloads in the local data center allows the business to control its own infrastructure and
implement the necessary auditing and controls.
Cloud providers are also independent businesses that serve their own business interests and may not be
able to meet an enterprise's specific uptime or resilience expectations for a workload. Outages happen
and may last for hours -- even days -- adversely affecting client businesses and their customer base.
Consequently, organizations often opt to keep critical workloads in the local data center where dedicated
IT staff can maintain them.
Some organizations implement a hybrid cloud strategy that mixes on-premises, private cloud and public
cloud services. This provides flexibility to run workloads and manage data where it makes the most
sense, for reasons ranging from costs to security to governance and compliance. This presents tradeoffs
-- for example, an organization may keep sensitive data and workloads in its own data center to preserve
more direct control over them, but it also takes on more security responsibilities for them.
Cloud Workload – An Anatomy
Key Terms to Understand
Lease: A lease is defined as a contract between the Cloud Service Provider and the end user to facilitate
the usage of resources available with the CSP to execute a workload of the end user.
Application: One or more business process encapsulated in code which requires computing resources to
execute and provide business value.
Job: A Work Breakdown Structure of an application. An application may contain several jobs, and all of
them need to be executed to complete execution of the application.
Tasks: Steps that need to be performed to complete a job. Tasks can be sequential or parallel.
When we talk about a cloud workload, we are referring to the completion of a specific job which provides some business value.
Resource Allocation vs Task Scheduling
• They might look similar, but there are many differences
• Resource Allocation (RA): identifying and allocating a resource for a given type of workload
• Task Scheduling (TS): the ability to shuffle and manage tasks so they operate efficiently on the available resources
Amdahl’s Law
In computer programming, Amdahl's law states that, in a program with parallel processing, the relatively few instructions that must be performed in sequence place a limit on the program's speedup, so that adding more processors may not make the program run faster.
This is generally an argument against parallel processing for certain applications and, in general, against
overstated claims for parallel computing. Others argue that the kinds of applications for which parallel
processing is best suited tend to be larger problems in which scaling up the number of processors does
indeed bring a corresponding improvement in throughput and performance.
Amdahl's law formula calculates the expected speedup of the system if one part is improved. It
has three parts:
Smax, p, and s.
Smax is the maximum possible improvement of the overall system. It is expressed as a decimal
greater than 1. If the operation is improved to be done in half the time, Smax = 2. Higher means a
greater improvement.
p is the part of the system to be improved, expressed as a number between 0-1. If the part is
45% of the system, p = 0.45.
s is the improvement factor of p, expressed by how many times faster p can be done. If it can be
done in 1/3rd the time, then s = 3.
Essentially, the equation subtracts out the part to be improved, then puts it back in after it has
been improved.
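Putting the three parts together, the usual form of the formula is Smax = 1 / ((1 - p) + p / s). The short Python sketch below simply evaluates it for the example figures above (p = 0.45, s = 3):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Maximum overall speedup when a fraction p of the work is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

# 45% of the system improved to run 3x faster:
print(round(amdahl_speedup(0.45, 3), 2))  # ~1.43, i.e. Smax is about 1.43
```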
Scheduling Techniques
• While a VI manager like OpenNebula can handle all the minutiae of managing VMs in a pool of
physical resources, scheduling these VMs efficiently is a different and complex matter.
• An immediate provisioning model is used by commercial cloud providers, such as Amazon, since their data centers' capacity is assumed to be effectively infinite.
• Best-effort provisioning where requests have to be queued and prioritized
• Advance provisioning where resources are pre-reserved so they will be guaranteed to be
available at a given time period.
• However, when managing a private cloud with limited resources, an immediate provisioning
model is insufficient.
• Haizea is a lease-based resource provisioning system that can act as a scheduling back-end for OpenNebula, supporting provisioning models other than the immediate provisioning model used by existing cloud providers. In particular, Haizea adds support for both best-effort provisioning and advance reservations when managing a finite number of resources.
Scheduling Techniques – Existing Approaches
• Efficient reservation of resources in resource management systems has been studied
considerably, particularly in the context of job scheduling.
• In fact, most modern job schedulers support advance reservation of resources, but their
implementation falls short in several aspects.
• First of all, they are constrained by the job abstraction; when a user makes an advance reservation in a job-based system, the user does not get direct and unfettered access to the resources. Unlike cloud users, who can access the VMs they requested, job-system users are only allowed to submit jobs to the reserved resources.
• Example-1: XYZ Inc creates a new queue that will be bound to the reserved resources,
guaranteeing that jobs submitted to that queue will be executed on them (assuming they have
permission to do so)
Example-2: ABC Inc simply allows users to specify that a submitted job should use the reserved resources (if the submitting user has permission to do so).
• Additionally, advance reservations lead to utilization problems caused by the need to vacate
resources before a reservation can begin.
• Traditional job schedulers are unable to efficiently schedule workloads combining both best-
effort jobs and advance reservations.
• However, advance reservations can be supported more efficiently by using a scheduler capable
of preempting running jobs at the start of the reservation and resuming them at the end of the
reservation.
• Preemption can also be used to run large parallel jobs (which tend to have long queue times)
earlier, and it is especially relevant in the context of urgent computing, where resources have to
be provisioned on very short notice and the likelihood of having jobs already assigned to
resources is higher.
• While preemption can be accomplished by canceling a running job, the least disruptive form of preemption is checkpointing, where the preempted job's entire state is saved to disk, allowing it to resume its work from the last checkpoint.
• Additionally, some schedulers also support job migration, allowing check-pointed jobs to restart
on other available resources, instead of having to wait until the preempting job or reservation
has completed.
• Checkpointing-based preemption requires the job's executable itself to be checkpointable. An application can be made checkpointable by explicitly adding that functionality to the application (application-level and library-level checkpointing) OR transparently by using OS-level checkpointing, where the operating system (such as Cray, IRIX, and patched versions of Linux using BLCR [17]) checkpoints a process without rewriting the program or relinking it with checkpointing libraries.
• Thus, a job scheduler capable of checkpointing-based preemption and migration could be used to checkpoint jobs before the start of an advance reservation, minimizing their impact on the schedule.
• However, the application- and library-level checkpointing approaches burden the user with having to modify their applications to make them checkpointable, imposing a restriction on the software environment. On the other hand, OS-level checkpointing is a more appealing option, but it still imposes certain software restrictions on resource consumers.
Scheduling Techniques – Existing Approaches
• An alternative approach to supporting advance reservations was proposed by Nurmi et al. [18],
which introduced “virtual advance reservations for queues” (VARQ).
• This approach overlays advance reservations over traditional job schedulers by first predicting the time a job would spend waiting in a scheduler's queue and then submitting a job (representing the advance reservation) at a time such that, based on the wait-time prediction, the probability that it will be running at the start of the reservation is maximized.
• Since no actual reservations can be made, VARQ jobs can run on traditional job schedulers, which will not distinguish between regular best-effort jobs and VARQ jobs.
• Although this is an interesting approach that can be realistically implemented in practice (since it does not require modifications to existing schedulers), it still depends on the job abstraction.
Scheduling Techniques – VM Overheads
Virtualization technologies are a key enabler of many features found in IaaS clouds. Virtual machines are
also an appealing vehicle for implementing efficient reservation of resources due to:
• their ability to be suspended,
• potentially migrated,
• and resumed without modifying any of the applications running inside the VM.
However, virtual machines also raise additional challenges related to the overhead of using VMs:
• Preparation Overhead. When using VMs to implement reservations, a VM disk image must be either prepared on-the-fly or transferred to the physical node where it is needed. Since a VM disk image can have a size in the order of gigabytes, this preparation overhead can significantly delay the starting time of leases. This delay may, in some cases, be unacceptable for advance reservations that must start at a specific time.
• Runtime Overhead. Once a VM is running, scheduling primitives such as checkpointing and resuming can incur significant overhead, since a VM's entire memory space must be saved to disk and then read back from disk. Migration involves transferring this saved memory along with the VM disk image. Similar to the deployment overhead, this overhead can result in noticeable delays.
Reservation Based Provisioning
• A particularly interesting problem when provisioning virtual infrastructures is how to deal with
situations where the demand for resources is known beforehand—for example, when an
experiment depending on some complex piece of equipment is going to run from 2 pm to 4 pm,
and computational resources must be available at exactly that time to process the data
produced by the equipment.
• Commercial clouds are assumed to have effectively infinite resources to handle this situation. On the other hand, when dealing with finite capacity, a different approach is needed. However, the intuitively simple solution of reserving the resources beforehand is not so simple, because it is known to cause resources to be underutilized, due to the difficulty of scheduling other requests around an inflexible reservation.
Provisioning to meet SLA
• IaaS clouds can be used to deploy services that will be consumed by users other than the one
that has deployed the services.
• There is a distinction between the cloud consumer (i.e., the service owner; for instance, the
company that develops and manages the applications) and the end users of the resources
provisioned on the cloud (i.e., the service user; for instance, the users that access the
applications).
• Furthermore, service owners will enter into service-level agreements (SLAs) with their end
users, covering guarantees such as the timeliness with which these services will respond.
Requirements are formalized in infrastructure SLAs between the service owner and cloud provider,
separate from the high-level SLAs between the service owner and its end users.
• In many cases, either the service owner is not resourceful enough to perform an exact service
sizing or service workloads are hard to anticipate in advance.
• Therefore, to protect high-level SLAs, the cloud provider should cater for elasticity on demand.
Scaling and de-scaling of an application is best managed by the application itself. The reason is that, in many cases, resource allocation decisions are application-specific and are driven by application-level metrics.
Scheduling Techniques for Advance Reservation of Capacity
Virtualization and the suspend/resume/migrate capability of VMs make efficient advance reservation of capacity possible, but the preparation and runtime overheads described above must be addressed.

Solution – Haizea Scheduler


The Haizea project (http://haizea.cs.uchicago.edu/) was created to develop a scheduler that can efficiently support advance reservations by using the suspend/resume/migrate capability of VMs, while minimizing the overhead of using VMs.
The fundamental resource provisioning abstraction in Haizea is the lease, with three types of lease
currently supported:
Advance reservation leases, where the resources must be available at a specific time.
Best-effort leases, where resources are provisioned as soon as possible and requests are placed on a
queue if necessary.
Immediate leases, where resources are provisioned when requested or not at all.
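A compact way to picture the three lease types is as a request record. The sketch below is illustrative Python only, not Haizea's actual lease format; all field names and values are invented.

```python
# Illustrative representation of Haizea's three lease types.
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

class LeaseType(Enum):
    ADVANCE_RESERVATION = "AR"   # resources must be available at a specific time
    BEST_EFFORT = "BE"           # provisioned as soon as possible, queued if necessary
    IMMEDIATE = "IM"             # provisioned when requested, or not at all

@dataclass
class LeaseRequest:
    lease_type: LeaseType
    num_vms: int
    duration: timedelta
    start: Optional[datetime] = None   # only meaningful for advance reservations

# An advance reservation for 4 VMs, lasting 2 hours, starting at 2pm:
ar = LeaseRequest(LeaseType.ADVANCE_RESERVATION, num_vms=4,
                  duration=timedelta(hours=2), start=datetime(2024, 1, 1, 14, 0))
print(ar.lease_type.value, ar.start)
```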
What is Haizea Scheduler
When managing a private cloud with limited resources, an immediate provisioning model is insufficient. Haizea provides a lease-based resource provisioning model that can act as a scheduling back-end for OpenNebula, supporting provisioning models other than the immediate provisioning model found in existing cloud providers. In particular, Haizea adds support for both best-effort provisioning and advance reservations when managing a finite number of resources. The remainder of this section describes Haizea's leasing model and the algorithms Haizea uses to schedule these leases.
• We define a lease as "a negotiated and renegotiable agreement between a resource provider and a resource consumer, where the former agrees to make a set of resources available to the latter, based on a set of lease terms presented by the resource consumer."
• The terms must encompass the following:
- the hardware resources required by the resource consumer, such as CPUs, memory, and network bandwidth;
- a software environment required on the leased resources;
- and an availability period during which a user requests that the hardware and software resources be available.

How Haizea Scheduler Works


We focus on the availability dimension of a lease and, in particular, on how to efficiently support advance
reservations. Thus, we consider the following availability terms:
• Start time may be unspecified (a best-effort lease) or specified (an advance reservation lease). In the latter case, the user may specify either a specific start time or a time period during which the lease start may occur.
• Maximum duration refers to the total maximum amount of time that the leased resources will be
available.
• Leases can be preemptable. A preemptable lease can be safely paused without disrupting the
computation that takes place inside the lease.
Haizea's resource model considers that it manages W physical nodes capable of running virtual machines. Each node i has a number of CPUs, an amount of memory in megabytes (MB), and an amount of local disk storage in MB.
• We assume that all disk images required to run virtual machines are available in a repository from which they can be transferred to nodes as needed, and that all nodes are connected at a bandwidth of B MB/sec by a switched network.
• A lease is implemented as a set of N VMs, each allocated resources described by a tuple (p, m, d, b), where p is the number of CPUs, m is memory in MB, d is disk space in MB, and b is network bandwidth in MB/sec.
• A disk image I with a size of size(I) MB must be transferred from the repository to a node before the VM can start. When transferring a disk image to multiple nodes, we use multicasting and model the transfer time as size(I)/B (see the sketch below).
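The per-VM resource tuple and the multicast transfer-time model can be written down directly. The following Python sketch just restates the (p, m, d, b) tuple and size(I)/B from above with illustrative numbers; it is not Haizea's actual code.

```python
from dataclasses import dataclass

@dataclass
class VMAllocation:
    p: int      # number of CPUs
    m: int      # memory in MB
    d: int      # disk space in MB
    b: float    # network bandwidth in MB/sec

def transfer_time(image_size_mb: float, network_bandwidth_mb_s: float) -> float:
    """Multicast transfer of a disk image to one or more nodes: size(I) / B."""
    return image_size_mb / network_bandwidth_mb_s

# A lease of N identical VMs and the time to stage a 4 GB image over a 100 MB/s network:
lease = [VMAllocation(p=1, m=1024, d=4096, b=100.0) for _ in range(4)]
print(transfer_time(4096, 100.0))  # 40.96 seconds, regardless of node count (multicast)
```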
Haizea is designed to process lease requests and determine how those requests can be mapped to virtual machines, leveraging their suspend/resume/migrate capability, in such a way that the leases' requirements are satisfied.
• The scheduling component of Haizea allows best-effort leases to be preempted if resources have to be freed up for advance reservation requests.
• Additionally, to address the preparation and runtime overheads mentioned earlier, the scheduler allocates resources explicitly for the overhead activities (such as transferring disk images or suspending VMs) instead of assuming they should be deducted from the lease's allocation.
• Besides guaranteeing that certain operations complete on time (e.g., an image transfer before the start of a lease), the scheduler also attempts to minimize this overhead whenever possible, most notably by reusing disk image transfers and caching disk images on the physical nodes.
Best-effort leases are scheduled using a queue. When a best-effort lease is requested, the lease request is placed at the end of the queue, which is periodically evaluated using a backfilling algorithm to determine if any leases can be scheduled.
• The scheduler does this by first checking the earliest possible starting time for the lease on each physical node, which will depend on the required disk images. For example, if some physical nodes have cached the required disk image, it will be possible to start the lease earlier on those nodes.
• Once these earliest starting times have been determined, the scheduler chooses the nodes that allow the lease to start the soonest (a simplified sketch follows below).
• The use of VM suspension/resumption allows best-effort leases to be scheduled even if there are not enough resources available for their full requested duration.
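A highly simplified sketch of the node-selection step: for each node, compute the earliest time the lease could start (earlier on nodes that already cache the image), then pick the nodes that allow the soonest start. This is illustrative Python, not Haizea's actual code; the field names and numbers are invented.

```python
# Simplified earliest-start selection for a best-effort lease.
def earliest_start(now: float, node: dict, image: str, size_mb: float, bw: float) -> float:
    staging = 0.0 if image in node["cached_images"] else size_mb / bw  # skip transfer if cached
    return max(now, node["busy_until"]) + staging

def pick_nodes(now, nodes, image, size_mb, bw, needed):
    ranked = sorted(nodes, key=lambda n: earliest_start(now, n, image, size_mb, bw))
    return ranked[:needed]  # the nodes that let the lease start the soonest

nodes = [
    {"name": "n1", "busy_until": 0.0,   "cached_images": {"ubuntu.img"}},
    {"name": "n2", "busy_until": 120.0, "cached_images": set()},
    {"name": "n3", "busy_until": 0.0,   "cached_images": set()},
]
print([n["name"] for n in pick_nodes(0.0, nodes, "ubuntu.img", 4096, 100.0, 2)])  # ['n1', 'n3']
```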
Advance reservations, on the other hand, do not go through a queue, since they must start at either the requested time or not at all.
• Thus, scheduling this type of lease is relatively simple, because it mostly involves checking if there are enough resources available during the requested interval.
• However, the scheduler must also check if any associated overheads can be scheduled in such a way that the lease can still start on time.
• For the preparation overhead, the scheduler determines if the required images can be transferred on time. These transfers are scheduled using an Earliest Deadline First (EDF) algorithm, where the deadline for the image transfer is the start time of the advance reservation lease (a simplified check is sketched below).
• For the runtime overhead, the scheduler will attempt to schedule the lease without having to preempt other leases; if preemption is unavoidable, the necessary suspension operations are scheduled, provided they can be performed on time.
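The EDF step for image transfers can be pictured as follows: transfers are ordered by their deadline (the lease start time) and checked for feasibility back-to-back on the shared network link. A minimal, illustrative Python sketch with invented numbers:

```python
# Earliest Deadline First feasibility check for advance-reservation image transfers.
def edf_feasible(transfers, now):
    """transfers: list of (duration_seconds, deadline_seconds). True if all finish on time."""
    t = now
    for duration, deadline in sorted(transfers, key=lambda x: x[1]):  # order by deadline
        t += duration                 # transfers share one link, so they run back-to-back
        if t > deadline:              # the image would arrive after the lease must start
            return False
    return True

# Two pending transfers: 40 s due by t=100, 30 s due by t=60.
print(edf_feasible([(40, 100), (30, 60)], now=0))  # True: 30s done at t=30, 40s done at t=70
```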
Leasing Schedule – Best Effort Lease (BEL)

e.g., If we have three nodes N1, N2 and N3, and five processes have to be scheduled with the following entry times and durations (best-effort queue):

Process | Entry time | Time required | Node | Start time | End time
P1      | 9am        | 3 hours       | N2   | 9am        | 12pm
P2      | 10am       | 10 hours      | N2   | 12pm       | 10pm
P3      | 9.30am     | 4 hours       | N3   | 9.30am     | 1.30pm
P4      | 11am       | 5 hours       | N3   | 1.30pm     | 6.30pm
P5      | 8am        | 15 hours      | N1   | 8am        | 11pm

(Timeline, 8am to 11pm: N1 runs P5; N2 runs P1 then P2; N3 runs P3 then P4.)

Leasing Schedule – Advanced Reservation (AR)


e.g., If we have three nodes N1, N2 and N3, and the same five processes have to be scheduled with the entry times and durations of the previous example, but with P2 being an advance reservation (HW: next slide):

Process | Entry time | Time required | Node                                            | Start time | End time
P1      | 9am        | 3 hours       | N2                                              | 9am        | 12pm
P2 (AR) | 10am       | 10 hours      | N1                                              | 10am       | 8pm
P3      | 9.30am     | 4 hours       | N3                                              | 9.30am     | 1.30pm
P4      | 11am       | 5 hours       | N3                                              | 1.30pm     | 6.30pm
P5      | 8am        | 15 hours      | N1 (pre-empted at 10am), resumed on N2 at 12pm  | 8am        | 1am

e.g. (homework): If we have three nodes N1, N2 and N3 and five processes have to be scheduled with the start times and durations above, with both P2 and P5 being advance reservations.

CAPACITY MANAGEMENT TO MEET SLA COMMITMENTS

CAPACITY MANAGEMENT TO MEET SLA COMMITMENTS - Infrastructure SLAs


Multi Tenancy
What is Multi Tenancy?
• Multi-tenancy is an architectural pattern
• A single instance of the software is run on the service provider’s infrastructure
• Multiple tenants access the same instance.
• In contrast to the multi-user model, multi-tenancy requires customizing the single instance
according to the multi-faceted requirements of many tenants.
Multi-tenancy is an architecture in which a single instance of a software application serves multiple
customers. Each customer is called a tenant. Tenants may be given the ability to customize some parts of
the application, such as color of the user interface (UI) or business rules, but they cannot customize the
application's code.

Some Facts
• In the multi tenant architecture, the application is redesigned to handle the resource sharing
between the multiple tenants.
• For example, SalesForce.com (service provider) hosts the CRM application using their
infrastructure.
• A company that wants to use this hosted CRM application for its business is the customer (tenant), and the employees to whom the company grants privileges to access the CRM application are the actual users of the application.
• With this architecture, data, configuration, user management, tenant-specific functionality, etc., are shared between the tenants.
• MT contrasts with multi-instance architectures, where separate software instances operate on
behalf of different tenants.
• In virtualization, the user is given the illusion of owning the complete infrastructure on which the application is running, through the concept of a virtual machine.
• The hypervisor plays an important role in achieving the separation between the multiple users.
Multi Tenant vs Multi Instance

IAAS – MT Model
In Infrastructure as a Service (IaaS), a single physical or virtual infrastructure is shared among multiple tenants. Each tenant is
provided with its own logical environment, which includes compute resources such as virtual machines
(VMs), storage, networking, and other infrastructure components. There are several ways to implement
a multi-tenant deployment model for IaaS:
1. Virtual machine isolation: In this model, each tenant is provided with a set of virtual machines
that are completely isolated from other tenants. This approach provides strong security and
isolation, but can be less efficient than other approaches.
2. Network isolation: In this model, each tenant is provided with its own virtual network that is
isolated from other tenants. This approach provides better performance and efficiency than
virtual machine isolation, but may require more complex network management.
3. Resource pool isolation: In this model, each tenant is provided with a dedicated set of compute
resources such as CPU, memory, and storage that are isolated from other tenants. This approach
provides good performance and resource allocation, but may be less secure than other
approaches.
4. Hybrid model: In this model, a combination of the above models is used to meet the specific
requirements of each tenant. For example, some tenants may require strong security and
isolation, while others may prioritize performance and efficiency.
PAAS – MT Model
There are several ways to implement a multi-tenant deployment model for PaaS:
1. Application-level isolation: In this model, each tenant's application is isolated from other
tenants' applications at the application level. This can be achieved through containerization or
virtualization, and may also involve isolating the application's data and configuration settings.
2. Tenant-level isolation: In this model, each tenant is provided with a dedicated instance of the
platform, which is isolated from other tenants. This approach provides strong security and
isolation, but can be less efficient than other approaches.
3. Resource pool isolation: In this model, each tenant is provided with a dedicated set of compute
resources such as CPU, memory, and storage that are isolated from other tenants. This approach
provides good performance and resource allocation, but may be less secure than other
approaches.
4. Hybrid model: In this model, a combination of the above models is used to meet the specific
requirements of each tenant. For example, some tenants may require strong security and
isolation, while others may prioritize performance and efficiency.
SAAS – MT Model
There are several ways to implement a multi-tenant deployment model for SaaS:
1. Database-level isolation: In this model, each tenant's data is stored in a separate database
schema, which provides complete isolation and security. This approach can be efficient and
scalable, but may require complex database management.
2. Application-level isolation: In this model, each tenant's data is kept separate at the application
level, which may involve partitioning data and configuration settings. This approach can be more
flexible and easier to manage than the database-level isolation model, but may be less secure.
3. Hybrid model: In this model, a combination of the above models is used to meet the specific
requirements of each tenant. For example, some tenants may require complete data isolation,
while others may be willing to share some components of the application.
Benefits
Cost Savings –
• An application instance usually incurs a certain amount of memory and processing overhead
which can be substantial when multiplied by many customers, especially if the customers are
small.
• As the single instance is shared between multiple tenants this cost overhead can be segregated
between multiple tenants.
Data aggregation–
• In a non-MT architecture, the data for different customers is located in different database schemas, and pulling information from all of them can be a very cumbersome task.
• In MT architecture, instead of collecting data from multiple data sources, with potentially
different database schemas, all data for all customers is stored in a single database schema.
Thus, running queries across customers, mining data, and looking for trends is much simpler.
Release management –
• MT simplifies the release management process.
• In a traditional release management process, packages containing code and database changes
are distributed to client desktop and/or server machines.
• These packages then have to be installed on each individual machine.
• With the multitenant model, the package typically only needs to be installed on a single server.
This greatly simplifies the release management process.
Disadvantages
Complexity –
• Because of the additional customization complexity and the need to maintain per-tenant
metadata, multitenant applications require a larger development effort.
Risks-
• At the same time, multi-tenancy increases the risks and impacts inherent in applying a new
release version.
• As there is a single software instance serving multiple tenants, an update on this instance may
cause downtime for all tenants even if the update is requested and useful for only one tenant.
• Also, some bugs and issues resulting from applying the new release could manifest in other tenants' personalized views of the application.
• Because of possible downtime, the moment of applying the release may be restricted depending
on time usage schedule of more than one tenant.
Characteristics of MT Architecture
Customization –
• Multi-tenant applications are typically required to provide a high degree of customization to
support each target organization's needs. Customization typically includes the following aspects:
1. Branding: allowing each organization to customize the look-and-feel of the application to match
their corporate branding (often referred to as a distinct "skin").
2. Workflow: accommodating differences in workflow to be used by a wide range of potential
customers.
3. Extensions to the data model: supporting an extensible data model to give customers the ability
to customize the data elements managed by the application to meet their specific needs.
4. Access control: letting each client organization independently customize access rights and
restrictions for each user.
Quality of service
• Multitenant applications are expected to provide adequate isolation of security, robustness and
performance between multiple tenants which is provided by the layers below the application in
case of multi-instance applications
MT- Different Levels
• Implementing the highest degree of resource sharing for all resources may be prohibitively
expensive in development effort and complexity of the system.
• A balanced approach, where there is fine grained sharing of resources only for important
resources, may be the optimum approach.
• The four levels of multi-tenancy are described in the following list; for any given resource in a cloud system, the appropriate level can be selected:
• Custom instances
• Configurable instances
• Configurable, multi-tenant efficient instances
• Scalable, configurable, multi tenant efficient resources
Custom instances
 Lowest level of MT
 Each customer has their own custom version of the application
 Different versions of the application run for different customers
 Extremely difficult to manage, as it needs dedicated support for each customer
Configurable instances
 Same version of application is shared between the customers with customizations
specific to each customer
 Different instances of same application are running
 Supports customization like logos on the screen, tailor made workflows
 Managing the application is easier than with the custom-instances approach, as only one copy needs to be managed
 Upgrades are simple and seamless
Configurable, multi-tenant efficient instances
 Same version of application is shared between the customers through a single instance
of application
 More efficient usage of the resources
 Management is extremely simple
Scalable, configurable, multi tenant efficient resources
 All features of level 3 are supported.
 Application instances are installed on cluster of computers allowing it to scale as per
demand.
 Maximum level of resource sharing is achieved.
 Example, Gmail
Security in MT
• The key challenge in multi-tenancy is the secure sharing of resources. A very important
technology to ensure this is authentication.
• Clearly each tenant would like to specify the users who can log in to the cloud system. Unlike
traditional computer systems, the tenant would specify the valid users, but authentication still
has to be done by the cloud service provider.
• Two basic approaches can be used: a centralized authentication system or a decentralized
authentication system .
• In the centralized system, all authentication is performed using a centralized user data base.
The cloud administrator gives the tenant’s administrator rights to manage user accounts for that
tenant. When the user signs in, they are authenticated against the centralized database.
• In the decentralized system, each tenant maintains their own user data base, and the tenant
needs to deploy a federation service that interfaces between the tenant’s authentication
framework and the cloud system’s authentication service.
• Decentralized authentication is useful if single sign-on is important, since centralized
authentication systems will require the user to sign on to the central authentication system in
addition to signing on to the tenant’s authentication system.
• However, decentralized authentication systems have the disadvantage that they need a trust
relationship between the tenant’s authentication system and the cloud provider’s
authentication system. Given the self-service nature of the cloud (i.e., it is unlikely that the
cloud provider would have the resources to investigate each tenant, and ensure that their
authentication infrastructure is secure), centralized authentication seems to be more generally
applicable.

Multi Tenancy: Resource Sharing


• Two major resources that need to be shared are storage and servers.
• The basic principles for sharing of these resources are described first.
• This is followed by a deeper discussion that focuses on the question of how these resources can
be shared at a fine granularity, while allowing the tenants to customize the data to their
requirements.
Sharing storage resources:
• In a multi-tenant system, many tenants share the same storage system. Cloud applications may
use two kinds of storage systems: File systems and databases, where the term database is used
to mean not only relational databases, but NoSQL databases as well.
• The discussion is focused on sharing data for different users in a database. The focus is also on
multi-tenant efficient approaches where there is only one instance of the database which is
shared among all the tenants
Resource Sharing Approaches
STORAGE Sharing Scenarios in MT

Support for Customization


• It is important for the cloud infrastructure to support customization of the stored data, since it is
likely that different tenants may want to store different data in their tables.
• For example, for automobile repair shops, different shops may want to store different details about the repairs carried out.
• Three methods for doing this are described in the next slides. It is to be noted that difficulties for
customization occur only in the shared table method.
• In the dedicated table method, each tenant has their own table, and therefore can have different
schema.
Pre Allocated Columns:
• In the shared-table approach, it is complex to provide support for customizations.
• Each tenant might have unique requirements for the data stored in the tables, and with a shared table, meeting such requirements needs a carefully designed data architecture.
Pre allocated columns
• A fixed number of columns is reserved for custom fields (see the sketch below)
• If a tenant needs fewer custom columns than the number reserved, space is wasted
• If a tenant needs more custom columns than the number reserved, the customer will feel restricted
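A minimal sketch of the pre-allocated columns idea, using SQLite through Python; the table and column names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Shared table: every tenant's rows live together, with a fixed number of
# spare columns (CUSTOM1..CUSTOM3) reserved for per-tenant custom fields.
con.execute("""
    CREATE TABLE repairs (
        tenant_id  INTEGER,
        repair_id  INTEGER,
        vehicle    TEXT,
        CUSTOM1    TEXT,   -- tenant A: paint code; tenant B: unused (wasted space)
        CUSTOM2    TEXT,
        CUSTOM3    TEXT
    )
""")
con.execute("INSERT INTO repairs VALUES (1, 100, 'sedan', 'RAL3020', NULL, NULL)")
print(con.execute("SELECT vehicle, CUSTOM1 FROM repairs WHERE tenant_id = 1").fetchall())
```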

Name-Value Pairs:
• The major problem with the pre-allocated columns technique is that there could be a lot of
wasted space.
• If the number of columns is too low, then users will feel constrained in their customizations.
• However, if the number is too big, there will be a lot of space wastage.
Name-Value Pairs
• A metadata table per tenant is maintained
• A data table holds the standard common columns and has an extra column at the end that points to a name-value pair table
• The name-value pair table (aka pivot table) actually holds the custom fields for the record
• The actual custom fields are stored along with their data type and other metadata information
• Space-efficient compared to the pre-allocated columns method
• Joins are involved to fetch tenant-specific data
• Let's consider that tenants have the following table structure, which needs to be mapped to the name-value pair structure (see the sketch below). Each tenant table has:
• standard common fields and
• a few custom fields with varying data types
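The sketch below illustrates the name-value pair layout with SQLite through Python (hypothetical table and column names): a common data table plus a pivot table keyed by row, joined back when tenant-specific fields are needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Common columns shared by all tenants
    CREATE TABLE repairs (tenant_id INTEGER, repair_id INTEGER, vehicle TEXT);
    -- Pivot table holding each tenant's custom fields as name/value rows
    CREATE TABLE repair_ext (repair_id INTEGER, field_name TEXT,
                             field_type TEXT, field_value TEXT);
""")
con.execute("INSERT INTO repairs VALUES (1, 100, 'sedan')")
con.execute("INSERT INTO repair_ext VALUES (100, 'paint_code', 'TEXT', 'RAL3020')")
con.execute("INSERT INTO repair_ext VALUES (100, 'labour_hours', 'REAL', '2.5')")

# A join is needed to reassemble the record with its custom fields.
rows = con.execute("""
    SELECT r.vehicle, e.field_name, e.field_value
    FROM repairs r JOIN repair_ext e ON r.repair_id = e.repair_id
    WHERE r.tenant_id = 1
""").fetchall()
print(rows)  # the sedan's custom fields: paint_code and labour_hours
```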

Sample MT Architecture
Multi Tenancy: Resource Sharing
Did You Know ?
Here are some interesting facts about multi-tenancy in the cloud:
1. Multi-tenancy is one of the key features of cloud computing that makes it so attractive to
businesses. It allows multiple customers to share the same infrastructure and services, which
can lead to cost savings and improved efficiency.
2. One of the biggest challenges of multi-tenancy is ensuring that each tenant's data is kept
separate and secure. This requires careful planning and implementation of security measures to
prevent data leakage and unauthorized access.
3. Multi-tenancy can be implemented at different levels in the cloud, including infrastructure,
platform, and software. Each level requires different techniques and technologies to ensure
proper isolation and security.
4. In the public cloud model, multi-tenancy is typically implemented using virtualization and
resource sharing. Each tenant is assigned their own virtual resources, which are isolated from
other tenants.
Did You Know ?
1. In the private cloud model, multi-tenancy can be implemented using virtualization or
containerization, and is typically achieved through strict access controls and resource
partitioning.
2. Hybrid cloud environments can offer the benefits of both public and private cloud models, but
can also add complexity and require careful management to ensure proper isolation and
security.
3. Multi-tenancy can offer significant cost savings for businesses, as they only pay for the resources
they use. This can also lead to better resource utilization and scalability, as resources are shared
among multiple tenants.
4. Multi-tenancy is not suitable for all types of applications, and some applications may require
dedicated resources and environments. It is important to carefully evaluate the requirements of
each application and choose the appropriate deployment model.
Overall, multi-tenancy is a key feature of cloud computing that can offer significant benefits to
businesses, but it requires careful planning and implementation to ensure proper isolation and security.
SLA Management
What is SLA or Service Level Agreement
• Describes a set of non-functional requirements of the service.
• Example: RTO – Return to Operation Time in case of failure
• SLO – Service Level Objective. That is, the objective to be achieved.
• KPI – Key Performance Indicators
• Service Level Objective:
• Objective of service quality that has to be achieved.
• Set of measurable KPIs with thresholds to decide if the objective is fulfilled or not.
• Attainable
• Repeatable
• Measurable
• Understandable
• Meaningful
• Controllable
• Affordable
• Mutually acceptable
SLA Role in High Availability
• What is High Availability
• Driven by SLA (Service Level Agreement)
• High Availability must conform to SLA. Goal here is to meet promised quality
• Examples: Service is available 99.95%.
• Net Banking: guarantees banking 24x7. There are published downtimes called the "SYSTEM MAINTENANCE" window
• Duronto Express: promises point-to-point service with only service halts
SLA Role in High Availability
What deters HA
• Unplanned service outages caused by server failure or human error
• Service disruption by planned maintenance windows (downtime)
• Service performance degradation due to insufficient infrastructure or failure of critical components during peak load
• Bottom line: effective capacity planning is needed to meet the promised QoS, i.e., maintain HA and thereby adhere to the SLAs
Steps for HA
Steps to achieve high availability
• Build for server failure
• Have redundant servers which can be made online
• Build for zone failure
• Having DR sites in case of failure to switch over to backup site
• Build for Cloud failure
• Plan for your cloud setup to be robust and contained: errors should not cascade, and a fire-door policy should contain threats or errors that affect the ecosystem
• Automating and testing
• Test and test again, with minimal manual intervention
The 3 Initialisms
SLA vs SLO
The SLA is the entire agreement that specifies
o what service is to be provided,
o how it is supported,
o times,
o locations,
o costs,
o performance, and
o responsibilities of the parties involved.
SLOs are specific measurable characteristics of the SLA such as
• availability,
• throughput,
• frequency,
• response time, or quality.
SLIs are the actual measurements of the SLOs and serve as a benchmark to compare against the promised SLO values, e.g.:
• availability,
• throughput,
• frequency,
• response time, or quality.
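To make the relationship between the three initialisms concrete, here is a minimal, hypothetical sketch: the SLA bundles several SLOs, each SLO names a KPI and a threshold, and the SLIs are the measured values compared against those thresholds. The field names and numbers are illustrative, not from any particular SLA standard.

```python
from dataclasses import dataclass, field

@dataclass
class SLO:
    kpi: str          # the measurable characteristic, e.g. "availability"
    threshold: float  # the promised value
    unit: str

@dataclass
class SLA:
    service: str
    slos: list[SLO] = field(default_factory=list)

# SLIs: actual measurements collected from monitoring (hypothetical numbers).
slis = {"availability": 99.97, "response_time": 0.31}

sla = SLA("net-banking", [SLO("availability", 99.95, "%"),
                          SLO("response_time", 0.50, "s")])

for slo in sla.slos:
    measured = slis[slo.kpi]
    # availability must be >= threshold, response time must be <= threshold
    met = measured >= slo.threshold if slo.kpi == "availability" else measured <= slo.threshold
    print(f"{slo.kpi}: promised {slo.threshold}{slo.unit}, measured {measured} -> {'OK' if met else 'BREACH'}")
```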
The 3 Initialisms
SLA
• In the early days of web-application deployment, the performance of the application at peak load was the single most important criterion for provisioning server resources.
• Provisioning in those days involved deciding hardware configuration, determining the number of
physical machines, and acquiring them upfront so that the overall business objectives could be
achieved.
• The web applications were hosted on these dedicated individual servers within enterprises’ own
server rooms. These web applications were used to provide different kinds of e-services to
various clients.
• Due to the increasing complexity of managing the huge Data centres, enterprises started
outsourcing the application hosting to the infrastructure providers. They would procure the
hardware and make it available for application hosting.
• It necessitated the enterprises to enter into a legal agreement with the infrastructure service
providers to guarantee a minimum quality of service (QoS).
• Typically, the QoS parameters are related to the availability of the system CPU, data storage, and
network for efficient execution of the application at peak loads.
• This legal agreement is known as the service-level agreement (SLA)
Co-Hosting Application
The dedicated hosting practice resulted in massive redundancies within the application service providers' (ASPs') data centers due to the underutilization of many of their servers.
This is because the applications were not fully utilizing their servers' capacity at non-peak loads.
To reduce the redundancies and increase the server utilization in data centers, ASPs started co-hosting
applications with complementary work load patterns.
Co-hosting of applications means deploying more than one application on a single server. This led to
further cost advantage for both the ASPs and enterprises.
Co-Hosting Application: Issues
However, newer challenges such as application performance isolation and security guarantees have
emerged and needed to be addressed.
Performance isolation implies that one application should not steal the resources being utilized by other
co-located applications.
For example, assume that application A is required to use more quantity of a resource than originally
allocated to it for duration of time t. For that duration the amount of the same resource available to
application B is decreased. This could adversely affect the performance of application B.
Security guarantee: similarly, one application should not be able to access or destroy the data and other information of co-located applications. Hence, appropriate measures are needed to guarantee security and performance isolation.
Co-Hosting Application: Solution
These challenges prevented ASPs from fully realizing the benefits of co-hosting. Virtualization
technologies have been proposed to overcome the above challenges. The ASPs could exploit the
containerization features of virtualization technologies to provide performance isolation and guarantee
data security to different co-hosted applications.
The applications, instead of being hosted on the physical machines, can be encapsulated using virtual
machines.
System resource allocation to these virtual machines can be made in two modes:
(1) Conserving
(2) Non-conserving.
In the conserving mode, a virtual machine demanding more system resources (CPU and memory) than its specified quota cannot be allocated the spare resources that remain unutilized by the other co-hosted virtual machines.
In the non-conserving mode the spare resources that are not utilized by the co-hosted virtual machines
can be used by the virtual machine needing the extra amount of resource. If the resource requirements
of a virtual machine cannot be fulfilled from the current physical host, then the virtual machine can be
migrated to another physical machine capable of fulfilling the additional resource requirements.
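A minimal sketch of the two allocation modes described above; the resource amounts and function name are illustrative. In conserving mode a VM never receives more than its quota; in non-conserving mode it may borrow whatever the co-hosted VMs leave unused.

```python
def allocate(demand, quota, spare_on_host, conserving=True):
    """Return the resource share granted to a VM.

    demand        -- what the VM currently asks for
    quota         -- its contracted share
    spare_on_host -- capacity left unused by co-hosted VMs
    """
    if conserving or demand <= quota:
        return min(demand, quota)          # never exceed the quota
    # non-conserving: borrow from the unused spare capacity
    return min(demand, quota + spare_on_host)

print(allocate(demand=6, quota=4, spare_on_host=3, conserving=True))   # 4
print(allocate(demand=6, quota=4, spare_on_host=3, conserving=False))  # 6
# If even quota + spare is not enough, the VM would be migrated to
# another physical host that can satisfy the demand.
```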
Traditional SLO Management Approaches
Load balancing – distributes the incoming requests onto a set of physical machines, each hosting a replica of an application, so that the load on the machines is evenly distributed. The front-end node (the node facing the client) receives the incoming requests and distributes them to different physical machines for further execution; the back-end nodes serve these requests.
Class-agnostic load balancing – the front-end machine is agnostic to the nature of the request. It is neither aware of the type of client from which the request originates nor aware of the request category (a sketch contrasting the two schemes appears below).
Class-aware load balancing – the front-end node additionally inspects the type of client making the request and/or the type of service requested before deciding which back-end node should service the request.
Traditional SLO Management Approaches
Admission control – these algorithms play an important role in deciding the set of requests that should be admitted into the application server when the server experiences very heavy loads. The objective of admission control mechanisms is to police the incoming requests and identify when the system faces an overload situation.
Request-based algorithms – reject new requests once the servers are running at capacity. The disadvantage is that a client's session may consist of multiple related requests, so some requests of a session may be rejected even though others were honored.
Session-based algorithms – ensure that admitted sessions run to completion while new sessions are rejected. Once a session is admitted into the server, all future requests belonging to that session are admitted as well, even though new sessions are rejected by the system (see the sketch below).
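A minimal sketch contrasting the two admission-control policies; the capacity figure and the session bookkeeping are illustrative only.

```python
CAPACITY = 100            # max concurrent requests the server can take (illustrative)
active_requests = 0
admitted_sessions = set()

def admit_request_based(request_id):
    """Reject any new request once the server runs at capacity."""
    global active_requests
    if active_requests >= CAPACITY:
        return False
    active_requests += 1
    return True

def admit_session_based(session_id):
    """Admit every request of an already-admitted session; under overload,
    only requests that would start a new session are rejected."""
    global active_requests
    if session_id in admitted_sessions:
        active_requests += 1
        return True
    if active_requests >= CAPACITY:
        return False              # new session rejected
    admitted_sessions.add(session_id)
    active_requests += 1
    return True
```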
SLA Types
Infrastructure SLA – Infrastructure provider manages and offers guarantees on availability of the
infrastructure, namely server machine, power, network connectivity and so on. Enterprises manage their
applications that are deployed on these server machines. The machines are leased to customers and are
isolated from machines of other customers.
For example, SLAs can be
Hardware availability – 99% uptime in a calendar month
Power availability – 99.99% of the time in calendar month
Data center network availability – 99.99% of the time in calendar month
Application SLA – In the application co-hosting model, server capacity is made available to the applications based solely on their resource demands. Hence the service providers are flexible in allocating and de-allocating computing resources among the co-located applications. Therefore the service providers are also responsible for ensuring that their customers' application SLOs are met.
For example, SLAs can be
Web site response time (max of 3.5 sec per user request), Latency of web server (max of 0.2 sec per
request), Latency of DB (max of 0.5 sec per query)
Infrastructure SLA vs Capacity Management
If temporal behavior of services with respect to resource demands is highly predictable, then capacity
can be efficiently scheduled using reservations.
For less predictable elastic workloads, exact scheduling of capacity may not be possible; instead, capacity planning and optimization are required.
• IaaS providers perform two complementary management tasks:
(1) Capacity planning to make sure that SLA obligations are met as contracted with the service providers
and;
(2) Continuous optimization of resource utilization in specific workload to make the most efficient use of
the existing capacity.
Infrastructure SLA
Thus, to deploy a service on a cloud, a service provider orders suitable virtual hardware and installs its
application software on it.
From the IaaS provider's point of view, a given service configuration is an array of black-box virtual resources, each entry specifying the number of instances of a given resource type.
For example, a typical three-tier application may contain ten general-purpose small instances to run
Web front-ends, three large instances to run an application server cluster with load balancing and
redundancy, and two large instances to run a replicated database.
A risk mitigation mechanism to protect user experience in the IaaS model is offered by infrastructure
SLAs (i.e., the SLAs formalizing capacity availability) signed between service provider and IaaS
provider.
There is no universal approach to infrastructure SLAs. As the IaaS field matures and more experience is gained, some methodologies may become more popular than others; some methods may also be more suitable for specific workloads than others. There are three main approaches, as follows.
No SLAs. This approach is based on two premises:
(a) Cloud always has spare capacity to provide on demand, and
(b) services are not QoS sensitive and can withstand moderate performance degradation. This
methodology is best suited for the best effort workloads.
Probabilistic SLAs. (Epistemic) These SLAs allow us to trade capacity availability for cost of
consumption. Probabilistic SLAs specify clauses that determine availability percentile for contracted
resources computed over the SLA evaluation period. The lower the availability percentile, the cheaper
the cost of resource consumption. This type of SLA is suitable for small and medium businesses and for
many enterprise grade applications.
Deterministic SLAs. (Ontic) These are, in fact, probabilistic SLAs where resource availability percentile is
100%. These SLAs are most stringent and difficult to guarantee. From the provider’s point of view, they
do not admit capacity multiplexing. Therefore this is the most costly option for service providers, which
may be applied for critical services.
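To see how a probabilistic SLA might be evaluated, the sketch below computes the fraction of monitoring intervals in an SLA evaluation period during which the contracted resources were actually available, and compares it with the promised percentile; the sample data and the 98% figure are invented for illustration.

```python
def availability_percentile(samples):
    """samples: list of booleans, one per monitoring interval,
    True when the contracted capacity was fully available."""
    return 100.0 * sum(samples) / len(samples)

# One week of hourly samples with 2 hours of missing capacity (invented data).
samples = [True] * 166 + [False] * 2
measured = availability_percentile(samples)

promised = 98.0   # probabilistic SLA: cheaper than a 100% (deterministic) SLA
print(f"measured {measured:.2f}% vs promised {promised}% ->",
      "met" if measured >= promised else "violated")
```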
SLA Life Cycle
Each SLA goes through a sequence of steps starting from identification of terms and conditions,
activation and monitoring of the stated terms and conditions, and eventual termination of contract once
the hosting relationship ceases to exist. Such a sequence of steps is called SLA life cycle and consists of
the following five phases:
1. Contract definition
2. Publishing and discovery
3. Negotiation
4. Operationalization
5. De-commissioning
SLA Life Cycle
Contract Definition: Generally, service providers define a set of service offerings and corresponding
SLAs using standard templates. These service offerings form a catalog. Individual SLAs for enterprises
can be derived by customizing these base SLA templates.
Publication and Discovery. Service provider advertises these base service offerings through standard
publication media, and the customers should be able to locate the service provider by searching the
catalog. The customers can search different competitive offerings and shortlist a few that fulfill their
requirements for further negotiation.
Negotiation: Once the customer has discovered a service provider who can meet their application
hosting need, the SLA terms and conditions needs to be mutually agreed upon before signing the
agreement for hosting the application. For a standard packaged application which is offered as service,
this phase could be automated.
For customized applications that are hosted on cloud platforms, this phase is manual. The service
provider needs to analyze the application’s behavior with respect to scalability and performance before
agreeing on the specification of SLA. At the end of this phase, the SLA is mutually agreed by both
customer and provider and is eventually signed off. SLA negotiation can utilize the WS-negotiation
specification
SLA Life Cycle
Operational: SLA operation consists of SLA monitoring, SLA accounting, and SLA enforcement.
SLA monitoring involves measuring parameter values and calculating the metrics defined as a part of
SLA and determining the deviations.
On identifying the deviations, the concerned parties are notified. SLA Accounting involves capturing
and archiving the SLA adherence for compliance.
As part of accounting, the application’s actual performance and the performance guaranteed as a part
of SLA is reported.
De-commissioning : SLA decommissioning involves termination of all activities performed
under a particular SLA when the hosting relationship between the service provider and the
service consumer has ended.
SLA specifies the terms and conditions of contract termination and specifies situations under
which the relationship between a service provider and a service consumer can be considered to
be legally ended
SLA Life Cycle
• From the SLA perspective there are multiple challenges in provisioning the infrastructure on demand. These include:
• The application is a black box to the CSP (cloud service provider), and the CSP has virtually no knowledge about the application's runtime characteristics. The CSP needs to determine the right amount of computing resources required for the different components of the application at various workloads.
• The CSP needs to understand the performance bottlenecks and the scalability of the application.
• The CSP analyses the application before it goes live. However, subsequent changes made by customers to their applications, automatic updates, and the like can impact application performance, thereby putting the application SLA at risk.
• The risk of capacity planning lies with the service provider instead of the customer.
SLA LC Management – Cloud Applications
SLA management of applications hosted on cloud platforms involves five phases.
1. Feasibility
2. On-boarding
3. Pre-production
4. Production
5. Termination
SLA Cloud Application LC Management
Feasibility Analysis
The Managed Service Provider (MSP) conducts a feasibility study of hosting an application on its cloud platform.
This study involves three kinds of feasibility:
(a) Technical feasibility,
(b) Infrastructure feasibility
(c) Financial feasibility.
The technical feasibility of an application implies determining the following:
1. Ability of an application to scale out.
2. Compatibility of the application with the cloud platform being used within the MSP’s data
center.
3. The need and availability of a specific hardware and software required for hosting and
running of the application.
4. Preliminary information about the application's performance requirements and whether they can be met by the MSP.
SLA Cloud Application LC Management
Performing the infrastructure feasibility involves determining the availability of infrastructural
resources in sufficient quantity so that the projected demands of the application can be met.
The financial feasibility study involves determining the approximate cost to be incurred by the MSP and
the price the MSP charges the customer so that the hosting activity is profitable to both of them.
A feasibility report consists of the results of the above three feasibility studies. The report forms the
basis for further communication with the customer. Once the provider and customer agree upon the
findings of the report, the outsourcing of the application hosting activity proceeds to the next phase,
called “on boarding” of application.
Only the basic feasibility of hosting an application is carried out in this phase; the detailed runtime characteristics of the application are studied as part of the on-boarding activity.
On-boarding of Application:
Once the customer and the MSP agree in principle to host the application based on the findings of the
feasibility study, the application is moved from the customer servers to the hosting platform.
Moving an application to the MSP’s hosting platform is called on-boarding.
As part of the on-boarding activity, the MSP understands the application runtime characteristics using
runtime profilers*.
* Profiling is a method of gathering performance data in any development or deployment
scenario. This is for developers and system administrators who want to gather information about
application performance
SLA Cloud Application LC Management
On-boarding:
Packaging of the application for deployment on physical or virtual environments. Application packaging is the process of creating deployable components on the hosting platform (which could be physical or virtual). The Open Virtualization Format (OVF) standard is used for packaging the application for the cloud platform.
The packaged application is executed directly on the physical servers to capture and analyze the
application performance characteristics. It allows the functional validation of customer’s application.
Besides, it provides a baseline performance value for the application in non virtual environment.
This can be used as one of the data points for customer’s performance expectation and for
application SLA. Additionally, it helps to identify the nature of application—that is, whether it is CPU-
intensive or I/O intensive or network-intensive and the potential performance bottlenecks.
On-boarding :
The application is executed on a virtualized platform and the application performance characteristics
are noted again. Important performance characteristics like the application’s ability to scale (out and
up) and performance bounds (minimum and maximum performance) are noted.
Based on the measured performance characteristics, different possible SLAs are identified. The
resources required and the costs involved for each SLA are also computed.
Once the customer agrees to the set of SLAs and the cost, the MSP starts creating different policies
required by the data center for automated management of the application. This implies that the
management system should automatically infer the amount of system resources that should be
allocated/de-allocated to/from appropriate components of the application when the load on the system
increases/decreases.
Pre-Production
Once the determination of policies is completed as discussed in previous phase, the application is
hosted in a simulated production environment.
This enables the customer to verify and validate the MSP's findings on the application's runtime characteristics and agree on the defined SLA. Once both parties agree on the cost and the terms and conditions of the SLA, the customer sign-off is obtained. On successful completion of this phase, the MSP allows the application to go live.
Production
In this phase, the application is made accessible to its end users under the agreed SLA.
However, there could be situations when the managed application tends to behave differently in a
production environment compared to the preproduction environment.
This in turn may cause sustained breach of the terms and conditions mentioned in the SLA. Additionally,
customer may request the MSP for inclusion of new terms and conditions in the SLA.
If the application SLA is breached frequently or if the customer requests for a new non-agreed SLA, the
on-boarding process is performed again. In the case of the former, on-boarding activity is repeated to
analyze the application and its policies with respect to SLA fulfillment. In case of the latter, a new set of
policies are formulated to meet the fresh terms and conditions of the SLA.
Termination
When the customer wishes to withdraw the hosted application and does not wish to continue to avail
the services of the MSP for managing the hosting of its application, the termination activity is initiated.
On initiation of termination, all data related to the application are transferred to the customer and only
the essential information is retained for legal compliance. This ends the hosting relationship between the
two parties for that application, and the customer sign-off is obtained.
File System- Introduction
What is file system?
• File systems are an abstraction that enables users to read, manipulate, and organize data.
• Typically the data is stored in units known as files in a hierarchical tree, where the nodes are known as directories.
• The file system provides a uniform view, independent of the underlying storage devices, which can range from floppy drives to hard drives and flash memory cards.
• Since file systems evolved from stand-alone computers the connection between the logical file
system and the storage device was typically a one-to-one mapping.
• Even software RAID that is used to distribute the data on multiple storage devices is typically
implemented below the file system layer
Distributed File System- Introduction
What is Distributed File System?
• A distributed file system is a client/server-based application that allows clients to access and
process data stored on the server as if it were on their own computer.
• When a user accesses a file on the server, the server sends the user a copy of the file, which
is cached on the user's computer while the data is being processed and is then returned to the
server.
• Ideally, a distributed file system organizes file and directory services of individual servers into a
global directory in such a way that remote data access is not location-specific but is identical
from any client.
• All files are accessible to all users of the global file system and organization is hierarchical and
directory-based.
• Since more than one client may access the same data simultaneously, the server must have a
mechanism in place (such as maintaining information about the times of access) to organize
updates so that the client always receives the most current version of data and that data
conflicts do not arise.
Distributed file systems typically use file or database replication (distributing copies of data on
multiple servers) to protect against data access failures.
Distributed File System- Components
Service – software entity running on one or more machines and providing a particular type of function to
a priori unknown clients.
Server – service software running on a single machine.
Client – process that can invoke a service using a set of operations that forms its client interface.
A client interface for a file service is formed by a set of primitive file operations (create, delete, read,
write).
Client interface of a DFS should be transparent, i.e., not distinguish between local and remote files.
Goal:
• Provide a common view of a centralized file system, but with a distributed implementation.
• Ability to open and update any file on any machine on the network.
• All of the synchronization issues and capabilities of shared local files.
Distributed File System- Challenges
• Naming and Transparency
• Remote file access
• Caching
• Stateful vs Stateless service
• Replication
• Fault Tolerance
• Security
GFS (Google File System) – What is GFS
GFS is a scalable distributed file system for large data intensive applications built in 2003 by Google.
Shares many of the same goals as previous distributed file systems such as performance, scalability,
reliability, and availability.
The design of GFS is driven by four key observations
1. Component failures,
2. huge files,
3. mutation of files, and
4. benefits of co-designing the applications and file system API
GFS Assumptions
• GFS has high component failure rates
o The system is built from many inexpensive commodity components
• Modest number of huge files
o A few million files, each typically 100 MB or larger (multi-GB files are common)
o No need to optimize for small files
• Workloads: two kinds of reads, and writes
o Large streaming reads (1 MB or more) and small random reads (a few KB)
o Sequential appends to files by hundreds of data producers
• High sustained throughput is more important than latency
o Response time for individual reads and writes is not critical
GFS is not suited
GFS/HDFS are not a good fit for:
• Low-latency data access (in the milliseconds range)
• Many small files
• Constantly changing data
Note: not all details of GFS are public knowledge.
GFS Design Overview
• Single Master
o Centralized management
• Files stored as chunks
o With a fixed size of 64MB each.
• Reliability through replication
o Each chunk is replicated across 3 or more chunk servers
• No client data caching
o Caching offers little benefit because applications stream through very large data sets
• Interface
o Suitable to Google apps
o Create, delete, open, close, read, write, snapshot, record append
Files on GFS
• A single file can contain many objects (e.g. Web documents)
• Files are divided into fixed size chunks (64MB) with unique 64 bit identifiers
• IDs assigned by GFS master at chunk creation time
• Chunk servers store chunks on local disk as “normal” Linux files
• Reading & writing of data specified by the tuple (chunk_handle, byte_range)
GFS Master
• Files are replicated (by default 3 times) across all chunk servers
• Master maintains all file system metadata!
• Namespace, access control information, mapping from file to chunks, chunk locations, garbage
collection of orphaned chunks, chunk migration, …
• Heartbeat messages between master and chunk servers
• Is the chunk server still alive? What chunks are stored at the chunkserver?!
• To read/write data: client communicates with master (metadata operations) and chunk servers
(data)
Single master in a large cluster can become a bottleneck
• Goal: minimize the master's involvement in reads and writes; clients exchange only metadata with the master, while data flows directly to and from the chunk servers
CHUNKS
Chunks - Fixed size of 64 MB
Advantages
• Size of meta data is reduced
• Involvement of Master is reduced
• Network overhead is reduced
• Lazy space allocation avoids internal fragmentation
Disadvantages
• Hot spots
Solutions: increase the replication factor and stagger application start times; allow clients to read data
from other clients
GFS Meta Data
• Three major types of metadata
o The file and chunk namespaces
o The mapping from files to chunks
o Locations of each chunk’s replicas
• All the metadata is kept in the Master’s memory
• Master “operation log”
o Consists of namespaces and file to chunk mappings
o Replicated on remote machines
o 64MB chunk has 64 bytes of metadata
• Chunk locations
o Chunk servers keep track of their chunks and relay data to Master through HeartBeat
messages
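Using the figure quoted above (roughly 64 bytes of master metadata per 64 MB chunk), one can estimate how much master memory a given amount of stored data needs; the sketch below does only that arithmetic, and the 1 PB figure is just an example.

```python
CHUNK_SIZE = 64 * 1024**2        # 64 MB chunks
META_PER_CHUNK = 64              # ~64 bytes of metadata per chunk (figure from above)

def master_memory_bytes(stored_bytes):
    """Approximate master memory needed for the chunk metadata of `stored_bytes`
    of file data. Metadata is kept per logical chunk, not per replica."""
    chunks = -(-stored_bytes // CHUNK_SIZE)      # ceiling division
    return chunks * META_PER_CHUNK

one_pb = 1024**5
print(f"1 PB of data -> ~{master_memory_bytes(one_pb) / 1024**2:.0f} MB of chunk metadata")
# 1 PB / 64 MB = 16,777,216 chunks * 64 B = about 1 GB held in the master's memory.
```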
GFS OPERATIONAL LOGS
• Persistent record of critical metadata changes
• Critical to the recovery of the system
• Changes to metadata are only made visible to clients after they have been written to the
operation log
• Operation log replicated on multiple remote machines
• Before responding to a client operation, the log record must have been flushed locally and remotely
• The Master recovers its file system state from the latest checkpoint plus a replay of the operation log
GFS CHUNK REPLICA
• Chunk replica placement
o Creation of (initially empty) chunks
o Use under utilized chunk servers; spread across racks
o Limit number of recent creations on each chunk server
• Re-replication!
o Started once the available replicas fall below setting
o Master instructs chunkserver to copy chunk data directly from existing valid replica
o Number of active clone operations/bandwidth is limited
• Re-balancing!
o Changes in replica distribution for better load balancing;
o Gradual filling of new chunk servers
BIG DATA
DATA Evolution
• In the past decade, due to rapid progress in technology, several new sources of data have emerged, for example social networks, e-commerce, P2P networks, climate control systems, traffic cameras, etc.
• The amount of data produced by us from the beginning of time till 2003 was 5 billion gigabytes. Piled up in the form of disks, it could fill an entire football field.
• The same amount was created every two days in 2011, and every ten minutes in 2013. Today we create 2.5 quintillion bytes of data per day.
• However, 90% of the data was created in the past 2–5 years alone!
• Though all this information produced is meaningful and can be useful when processed, it is being
neglected
What is BIG DATA
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Big data is not merely data; it has become a complete subject in itself, involving various tools, techniques, and frameworks.
We are in a knowledge economy.
Data is an important asset to any organization
Discovery of knowledge; Enabling discovery; annotation of data
Complex computational models
No single environment is good enough: need elastic, on-demand capacities
We are looking at newer
Programming models, and
Supporting algorithms and data structures.
HADOOP
• Doug Cutting, Mike Cafarella and team took the solution provided by Google and started an
Open Source Project called HADOOP in 2005.
• Doug named it after his son's toy elephant. Now Apache Hadoop is a registered trademark of the
Apache Software Foundation.
• Hadoop runs applications using the MapReduce algorithm, where the data is processed in
parallel on different CPU nodes.
• In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis of huge amounts of data.
Hadoop – HDFS Design
• HDFS is a filesystem designed for storing very large files with streaming data access patterns,
running on clusters of commodity hardware.
• Very large files “Very large” in this context means files that are hundreds of megabytes,
gigabytes, or terabytes in size. There are Hadoop clusters running today that store petabytes of
data.
• Streaming data access HDFS is built around the idea that the most efficient data processing
pattern is a write-once, read-many-times pattern.
• A dataset is typically generated or copied from source, and then various analyses are performed
on that dataset over time.
• Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the
whole dataset is more important than the latency in reading the first record.
• Commodity hardware Hadoop doesn’t require expensive, highly reliable hardware.
• It’s designed to run on clusters of commodity hardware (commonly available hardware that can
be obtained from multiple vendors) for which the chance of node failure across the cluster is
high, at least for large clusters.
• HDFS is designed to carry on working without a noticeable interruption to the user in the face of
such failure.
HDFS is not Recommended when:
• Low-latency data access
• Applications that require low-latency access to data, in the tens of milliseconds range,
will not work well with HDFS.
• Remember, HDFS is optimized for delivering a high throughput of data, and this may be
at the expense of latency.
• Lots of small files
• Because the namenode holds filesystem metadata in memory, the limit to the number of
files in a filesystem is governed by the amount of memory on the namenode.
• As a rule of thumb, each file, directory, and block takes about 150 bytes. So, for example,
if you had one million files, each taking one block, you would need at least 300 MB of
memory.
• Although storing millions of files is feasible, billions is beyond the capability of current
hardware.
• Multiple writers, arbitrary file modifications
• Files in HDFS may be written to by a single writer. Writes are always made at the end of the file, in append-only fashion.
• There is no support for multiple writers or for modifications at arbitrary offsets in the file.
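The 150-bytes-per-object rule of thumb quoted under "Lots of small files" above can be turned into a quick namenode memory estimate; the sketch below reproduces the one-million-file example (one block per file, so two namespace objects per file), and the numbers are approximate.

```python
BYTES_PER_OBJECT = 150   # rule of thumb: each file, directory or block ~150 bytes

def namenode_memory_mb(num_files, blocks_per_file=1):
    """Rough namenode heap needed: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT / 1024**2

print(f"{namenode_memory_mb(1_000_000):.0f} MB for 1M single-block files")   # ~286 MB ("at least 300 MB")
print(f"{namenode_memory_mb(1_000_000_000):.0f} MB for 1B files")            # ~280,000 MB, i.e. hundreds of GB
```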
HDFS BLOCKS
• A disk has a block size, which is the minimum amount of data that it can read or write.
• File systems for a single disk build on this by dealing with data in blocks, which are an integral
multiple of the disk block size.
• File system blocks are typically a few kilobytes in size, whereas disk blocks are normally 512
bytes.
• This is generally transparent to the file system user who is simply reading or writing a file of
whatever length.
• HDFS, too, has the concept of a block, but it is a much larger unit 128 MB by default.
• Like in a file system for a single disk, files in HDFS are broken into block-sized chunks, which are
stored as independent units. Unlike a file system for a single disk, a file in HDFS that is smaller
than a single block does not occupy a full block’s worth of underlying storage. (For example, a 1
MB file stored with a block size of 128 MB uses 1 MB of disk space, not 128 MB.)
Note:
#Blocks = File Size/ Block size
#Chunks = File Size/ Chunk Size
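A worked example of the formula in the note above, using the 128 MB default block size; the file sizes are arbitrary.

```python
import math

BLOCK_SIZE = 128 * 1024**2   # HDFS default block size (128 MB)

def num_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    return math.ceil(file_size_bytes / block_size)

for size_mb in (1, 128, 300, 1024):
    size = size_mb * 1024**2
    print(f"{size_mb:5d} MB file -> {num_blocks(size)} block(s)")
# A 1 MB file still needs one block entry, but it uses only 1 MB of disk space.
```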
Why Is a Block in HDFS So Large? HDFS blocks are large compared to disk blocks, and the reason is to
minimize the cost of seeks.
• If the block is large enough, the time it takes to transfer the data from the disk can be
significantly longer than the time to seek to the start of the block.
• Thus, transferring a large file made of multiple blocks operates at the disk transfer rate.
• The default is actually 128 MB, although many HDFS installations use larger block sizes.
• This figure will continue to be revised upward as transfer speeds grow with new generations of
disk drives.
HDFS NAME NODES & DATA NODES
• An HDFS cluster has two types of nodes operating in a master − worker pattern: a namenode
(the master) and a number of datanodes (workers).
• The namenode manages the filesystem namespace.
• It maintains the filesystem tree and the metadata for all the files and directories in the tree.
• This information is stored persistently on the local disk in the form of two files:
• the namespace image and
• the edit log.
• The namenode also knows the datanodes on which all the blocks for a given file are located;
however, it does not store block locations persistently, because this information is reconstructed
from datanodes when the system starts.
HDFS NAME NODES & DATA NODES
• A client accesses the file system on behalf of the user by communicating with the namenode
and datanodes.
• The client presents a file system interface similar to a Portable Operating System Interface
(POSIX), so the user code does not need to know about the namenode and datanodes to
function.
• Datanodes are the workhorses of the filesystem.
• Datanodes store and retrieve blocks when they are told to (by clients or the namenode), and
they report back to the namenode periodically with lists of blocks that they are storing.
Without the namenode, the filesystem cannot be used. In fact, if the machine running the namenode
were obliterated, all the files on the filesystem would be lost since there would be no way of knowing
how to reconstruct the files from the blocks on the datanodes. For this reason, it is important to make
the namenode resilient to failure, and Hadoop provides two mechanisms for this.
HDFS REDUNDANCY (High Availability)
• The first way is to back up the files that make up the persistent state of the file system
metadata. Hadoop can be configured so that the namenode writes its persistent state to
multiple filesystems.
• These writes are synchronous and atomic. The usual configuration choice is to write to local disk
as well as a remote NFS mount.
• It is also possible to run a secondary namenode, which despite its name does not act as a namenode. Its main role is to periodically merge the namespace image with the edit log, to prevent the edit log from becoming too large.
• The secondary namenode usually runs on a separate physical machine because it requires plenty
of CPU and as much memory as the namenode to perform the merge.
• It keeps a copy of the merged namespace image, which can be used in the event of the
namenode failing.
However, the state of the secondary namenode lags that of the primary, so in the event of total failure of
the primary, data loss is almost certain. The usual course of action in this case is to copy the namenode’s
metadata files that are on NFS to the secondary and run it as the new primary.
HDFS BLOCK CACHING
• Normally a datanode reads blocks from disk, but for frequently accessed files the blocks may be
explicitly cached in the datanode’s memory, in an off-heap block cache.
• By default, a block is cached in only one datanode’s memory, although the number is
configurable on a per-file basis.
• Job schedulers (for MapReduce, Spark, and other frameworks) can take advantage of cached
blocks by running tasks on the datanode where a block is cached, for increased read
performance.
• A small lookup table used in a join is a good candidate for caching, for example. Users or
applications instruct the namenode which files to cache (and for how long) by adding a cache
directive to a cache pool.
• Cache pools are an administrative grouping for managing cache permissions and resource usage.
HDFS FEDERATION
• The namenode keeps a reference to every file and block in the filesystem in memory, which
means that on very large clusters with many files, memory becomes the limiting factor for
scaling
• HDFS federation, introduced in the 2.x release series, allows a cluster to scale by adding namenodes, each of which manages a portion of the filesystem namespace.
• For example, one namenode might manage all the files rooted under /user, say, and a second
namenode might handle files under /share.
• Under federation, each namenode manages a namespace volume, which is made up of the
metadata for the namespace, and a block pool containing all the blocks for the files in the
namespace.
• Namespace volumes are independent of each other, which means namenodes do not
communicate with one another, and furthermore the failure of one namenode does not affect
the availability of the namespaces managed by other name nodes.
Hadoop Distributed File System
HDFS File Read
1. The client opens the file it wishes to read by calling open() on the FileSystem object.
2. DistributedFileSystem calls the namenode, using remote procedure calls (RPCs), to determine
the locations of the first few blocks in the file. For each block, the namenode returns the
addresses of the datanodes that have a copy of that block.
3. The DistributedFileSystem returns an FSDataInputStream (an input stream that supports file
seeks) to the client for it to read data from.
4. FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode and
namenode I/ O. The client then calls read() on the stream.
5. 4 & 5. When the Firs block end is reached the DFFInpuitStream will automatically search for the
next node and close previous connections.
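The numbered steps above can be mirrored in a small, purely illustrative simulation. It uses no real Hadoop API; the class and method names are invented to stand in for the roles of the namenode, DistributedFileSystem and DFSInputStream.

```python
# Purely illustrative simulation of the HDFS read path; not the real Hadoop API.

class FakeNameNode:
    def __init__(self, block_locations):
        self.block_locations = block_locations      # path -> [(block id, replica datanodes)]

    def get_block_locations(self, path):
        return self.block_locations[path]           # step 2: RPC to the namenode

class FakeDFSInputStream:
    def __init__(self, namenode, datanodes, path):
        self.blocks = namenode.get_block_locations(path)
        self.datanodes = datanodes

    def read_all(self):
        data = b""
        for block_id, replicas in self.blocks:      # steps 4-5: read block by block,
            best = replicas[0]                      # picking the closest replica
            data += self.datanodes[best][block_id]
        return data

datanodes = {"dn1": {"blk_0": b"hello "}, "dn2": {"blk_1": b"world"}}
nn = FakeNameNode({"/user/demo/file.txt": [("blk_0", ["dn1"]), ("blk_1", ["dn2"])]})
print(FakeDFSInputStream(nn, datanodes, "/user/demo/file.txt").read_all())  # b'hello world'
```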
YARN
Apache YARN (Yet Another Resource Negotiator) is Hadoop’s cluster resource management system.
YARN was introduced in Hadoop 2 to improve the MapReduce implementation, but it is general enough
to support other distributed computing paradigms as well.
YARN provides APIs for requesting and working with cluster resources, but these APIs are not typically
used directly by user code. Instead, users write to higher-level APIs provided by distributed computing
frameworks, which themselves are built on YARN and hide the resource management details from the
user.
MapReduce (Data Processing Framework)
• MapReduce
• Software Framework for easily running applications
• Processes large amount of data in parallel
• Using large clusters having thousands of nodes
• Nodes of commodity hardware
• In a reliable and fault-tolerant manner
Hadoop – HDFS MAP REDUCE
MapReduce program executes in three stages, namely map stage, shuffle stage, and reduce stage.
Map stage : The map or mapper’s job is to process the input data. Generally the input data is in the form
of file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper
function line by line. The mapper processes the data and creates several small chunks of data.
Reduce stage : This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s
job is to process the data that comes from the mapper. After processing, it produces a new set of output,
which will be stored in the HDFS.
During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in
the cluster.
The framework manages all the details of data-passing such as issuing tasks, verifying task
completion, and copying data around the cluster between the nodes.
Most of the computing takes place on nodes with data on local disks that reduces the network traffic.
After completion of the given tasks, the cluster collects and reduces the data to form an appropriate
result, and sends it back to the Hadoop server.
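The canonical illustration of the map and reduce stages is word count; the sketch below expresses the two functions in plain Python and wires them together with an in-memory shuffle, which is the grouping step the framework would normally perform across the cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)

lines = ["Hadoop stores data in HDFS", "MapReduce processes data stored in HDFS"]

# Shuffle: group all mapper outputs by key (done by the framework in a real job).
groups = defaultdict(list)
for line in lines:
    for word, one in map_phase(line):
        groups[word].append(one)

for word in sorted(groups):
    print(reduce_phase(word, groups[word]))   # ('data', 2), ('hdfs', 2), ...
```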
MapReduce Processing flow
Classes of problems "mapreducable"
• Benchmark for comparing: Jim Gray's challenge on data-intensive computing. Ex: "Sort"
• Google uses it for word count, AdWords, PageRank, indexing data.
• Simple algorithms such as grep, text indexing, reverse indexing
• Bayesian classification: data mining domain
• Facebook uses it for various operations: demographics
• Financial services use it for analytics
• Astronomy: Gaussian analysis for locating extra-terrestrial objects
• Expected to play a critical role in the semantic web and in Web 3.0
Fault Tolerance in MapReduce
1. If a task crashes:
– Retry on another node
• OK for a map because it had no dependencies
• OK for reduce because map outputs are on disk
– If the same task repeatedly fails, fail the job or ignore that input block
2. If a node crashes:
– Relaunch its current tasks on other nodes
– Relaunch any maps the node previously ran
• Necessary because their output files were lost along
with the crashed node
3. If a task is going slowly (straggler):
– Launch second copy of task on another node
– Take the output of whichever copy finishes first, and kill the other one
• Critical for performance in large clusters
Some Facts about MR
· Hadoop requires numerous large data centers. The top five U.S. Internet firms operate over 300 data centers in the continental U.S., which cost around $1 billion each to construct and occupy over 200,000 square feet each. Each data center is two to three times larger than a Wal-Mart. The U.S. government operates over 2,000 data centers.
· Hadoop data centers employ few workers. Each $1 billion Hadoop data center employs fewer than 50 people, ten times fewer than a single Wal-Mart Super Center, so their operation contributes little to local economies. Internet firms are offered tax-free status, municipal bonds, and little or no-cost real estate for their data centers.
· Hadoop consumes a lot of U.S. energy. Hadoop clusters are estimated to consume 2.2% of America's power (i.e., 567 terawatt-hours out of the 25,776 terawatt-hours generated each year in the U.S.). U.S. nuclear power plants generate 400 terawatt-hours each year, so by this estimate Hadoop clusters consume all the power generated by U.S. nuclear power plants (and then some).
· Hadoop is not energy efficient. Hadoop clusters consume a lot of power even when idle. Power consumption starts at around 10 kilowatts, which is close to the idle power of the nodes in a small cluster; ten petabytes of raw storage takes four times that amount (i.e., 40 kilowatts of power in the idle state). A single Hadoop data center consumes about 36 megawatts, requiring highly specialized power, space, and cooling.
· Hadoop is capital intensive and creates bottlenecks. Hadoop clusters cost about $10 million, which creates problems for purchasing and engineering. Most firms don't have $10 million just lying around, and larger organizations purchase hardware in bigger increments. This creates bottlenecks for lean and agile methods, which complete requirements in a few hours, days, or weeks. Furthermore, Hadoop clusters require specialized power, space, and cooling, and initial clusters may be incompatible with existing data centers. It may be necessary to purchase multiple clusters before solutions are found, which increases costs and risks and hinders the engineering work flow.
· Hadoop doesn't scale in a linear fashion. Academia hosts an annual contest to see which computer system can sort a terabyte of data the fastest. Yahoo has designed a 1,000-node cluster just to compete in this contest and generally wins it every year using simple, low-level, Assembly-language-like Hadoop programs. Simple tests like this suggest that Hadoop scales linearly (i.e., adding another node does not incur any additional operating-system overhead in its "shared-nothing" paradigm). However, practical industrial-strength Hadoop applications create thousands if not millions of parallel operations during the shuffle and sort phase, which saturates the cluster's networking switch (the network being the only shared resource in this shared-nothing paradigm).
· Hadoop violates your privacy. Hadoop is a write-once, read-many-times paradigm: once information is written, it is stored forever. It is the basis for blockbuster Internet applications such as Facebook, Yahoo mail, Google mail, Amazon EC2, Microsoft's Azure, eBay, and many more, each with hundreds of millions of users. Once a Gmail message is sent, it is saved forever; users delete email messages, but they are merely quarantined from further use, not deleted. All of the sweet nothings you whisper to your significant other are saved forever in Twitter, Gmail, and Facebook. Government agencies make thousands of requests for such data each year, and law enforcement agencies monitor the social networks of international travelers and act upon misunderstood colloquialisms.
· Hadoop's complexity is understood by few. Hadoop is a complex, byte-level parallel-processing programming framework, akin to low-level operating-system file manipulation in the Assembly or C programming languages. Technologically speaking, Hadoop takes us back to the 1950s and 1960s in terms of operating systems, programming, and database processing in order to manipulate the petabytes of data produced by 3 billion Internet users each day. Only a few human-calculator-like programmers can really master it; fewer than 10% of Facebook's programmers have mastered Hadoop, and Facebook is the largest consumer of Hadoop technologies.