Collection of All Chapters

The document provides an introduction to distributed systems, defining their characteristics, advantages, and challenges. It highlights the differences between centralized and distributed systems, examples of distributed systems, and the importance of transparency and security. Additionally, it discusses computer clusters for scalable parallel computing, focusing on design principles, resource sharing, and the evolution of clustering technology.

Chapter 1

Introduction to Distributed Systems
Topics Covered
• Distributed systems
• Definitions
• Basic concepts
• Examples
• Advantages of distributed systems
• Characteristics of distributed systems
• Challenges
System
• Definition
Collection of components that are organized for a common purpose.

Computer System
Education System
Human Body System, etc.
History
Centralized System Characteristics
• A single computer resource handles all requests

• The single computer resource is shared by all users at all times

• All resources are accessible to users at all times

• Software runs in a single process

• Single point of control

• Single point of failure


Distributed systems

• Definition
A distributed system consists of a collection of autonomous
computers, connected through a network and distribution middleware,
which enables computers to coordinate their activities and to share the
resources of the system, so that users perceive the system as a single,
integrated computing facility.

Why distributed systems? The availability of powerful yet cheap microprocessors and continued advances in communication technology.


Distributed systems Characteristics
• Multiple autonomous components
• Components are not shared by all users
• Resources may not be accessible
• Software runs in concurrent processes on different processors
• Multiple Points of control
• Multiple Points of failure
Examples of Distributed Systems
• Local Area Network and Intranet

• Database Management System

• ATM Network

• Internet/World-Wide Web

• Mobile Computing

• Cloud Computing
DBMS: Example of a Distributed System
Automatic Teller Machine (ATM) Network: Example of a Distributed System
Advantages of Distributed Systems
• Resource sharing: Ability to use any hardware, software, or data anywhere in the system.

• Scalability: More nodes can easily be added to the distributed system, i.e., it can be scaled as required.

• Fault tolerance: Failure of one node does not lead to the failure of the entire distributed system. Other nodes can still communicate with each other.

• Redundancy: Several machines can provide the same services, so if one is unavailable, work does not stop (see the failover sketch after this list).

• Openness: Extensions and improvements through new components in distributed systems.

• Concurrency: Components in distributed systems are executed in concurrent processes.

• Transparency: Distributed systems should be perceived by users and application programmers as a whole rather than as a collection of cooperating components.
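
To make the redundancy point concrete, here is a minimal Python sketch (not from the original slides) of client-side failover: the client tries each replica in turn and only fails if every machine offering the service is unavailable. The replica URLs are hypothetical.

    import urllib.request
    import urllib.error

    # Hypothetical replicas that all offer the same service.
    REPLICAS = [
        "http://node1.example.com/status",
        "http://node2.example.com/status",
        "http://node3.example.com/status",
    ]

    def fetch_with_failover(urls, timeout=2.0):
        """Return the response from the first reachable replica."""
        last_error = None
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    return resp.read()
            except (urllib.error.URLError, OSError) as err:
                last_error = err  # this node is unavailable; try the next one
        raise RuntimeError(f"all replicas failed: {last_error}")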
Transparency
Access transparency: Enables local and remote information objects to be accessed using the same operations.
Location transparency: Enables information objects to be accessed without knowledge of their location.
Concurrency transparency: Enables several processes to operate concurrently using shared information objects without interference between them.
Replication transparency: Enables multiple instances of information objects to be used to increase reliability and performance without knowledge of the replicas by users.
Failure transparency: Enables users and applications to complete their tasks despite the failure of other components.
Migration transparency: Enables movement of information objects within a system without affecting the operations of users or programs.
Performance transparency: Enables the system to be reconfigured to improve performance as loads vary.
Scaling transparency: Enables the system and applications to expand in scale without change to the system structure.
Disadvantage of Distributed System
• It is difficult to provide security in distributed systems because the nodes as
well as the connections need to be secured.
• Some messages and data can be lost in the network while moving from one
node to another.
• The database connected to the distributed systems is difficult to handle as
compared to a single user system.
• Overloading may occur in the network if all the nodes of the distributed
system try to send data at once.
Challenges:
• Increased complexity
• Synchronization process challenges
• Imperfect scalability
• More complex security
• Increased opportunities for failure
• Communication
• Software structure
• System architecture
• Workload allocation
• Consistency maintenance
CHAPTER 2:
Computer Clusters for Scalable
Parallel Computing

SUMMARY
• Clustering of computers enables scalable parallel and distributed computing in
both science and business applications.
• This chapter is devoted to building cluster-structured massively parallel
processors.
• We focus on the design principles and assessment of the hardware, software,
middleware, and operating system support to achieve scalability, availability,
programmability, single-system images, and fault tolerance in clusters.
• Only physical clusters are studied in this chapter.
• Virtual clusters will be studied in Chapters 3 and 4.

CLUSTERING FOR MASSIVE PARALLELISM

• A computer cluster is a collection of interconnected stand-alone computers which can work together collectively and cooperatively as a single integrated computing resource pool.
• Clustering explores massive parallelism at the job level and achieves high availability
(HA) through stand-alone operations.
• The benefits of computer clusters and massively parallel processors (MPPs) include
scalable performance, HA, fault tolerance, modular growth, and use of commodity
components.
• These features can sustain the generation changes experienced in hardware,
software, and network components.
• Cluster computing became popular in the mid-1990s as traditional mainframes and
vector supercomputers were proven to be less cost-effective in many high-
performance computing (HPC) applications.

Cluster Development Trends
Milestone Cluster Systems
• Clustering has been a hot research challenge in computer architecture. Fast communication,
job scheduling, SSI, and HA are active areas in cluster research. Table 2.1 lists some milestone
cluster research projects and commercial cluster products. Details of these old clusters can be
found in [14].

2.1.2 Design Objectives of Computer Clusters

• Clusters have been classified in various ways in the literature. We classify clusters using six orthogonal attributes: scalability, packaging, control, homogeneity, programmability, and security.
• Scalability:
• Clustering of computers is based on the concept of modular growth.
• To scale a cluster from hundreds of uniprocessor nodes to a supercluster
with 10,000 multicore nodes is a nontrivial task.
• The scalability could be limited by a number of factors, such as the
multicore chip technology, cluster topology, packaging method, power
consumption, and cooling scheme applied.
• The purpose is to achieve scalable performance constrained by the
aforementioned factors.
• We have to also consider other limiting factors such as the memory wall,
disk I/O bottlenecks, and latency tolerance, among others.

2.1.2 Design Objectives of Computer Clusters

• Packaging:
• Cluster nodes can be packaged in a compact or a slack fashion.
• In a compact cluster, the nodes are closely packaged in one or more racks
sitting in a room, and the nodes are not attached to peripherals (monitors,
keyboards, mice, etc.).
• In a slack cluster, the nodes are attached to their usual peripherals (i.e.,
they are complete SMPs, workstations, and PCs), and they may be located
in different rooms, different buildings, or even remote regions.
• Packaging directly affects communication wire length, and thus the
selection of interconnection technology used.
• While a compact cluster can utilize a high-bandwidth, low-latency
communication network that is often proprietary, nodes of a slack cluster
are normally connected through standard LANs or WANs.

2.1.2 Design Objectives of Computer Clusters

• Control:
• A cluster can be either controlled or managed in a centralized or decentralized
fashion.
• A compact cluster normally has centralized control, while a slack cluster can be
controlled either way.
• In a centralized cluster, all the nodes are owned, controlled, managed, and
administered by a central operator.
• In a decentralized cluster, the nodes have individual owners. For instance,
consider a cluster comprising an interconnected set of desktop workstations in
a department, where each workstation is individually owned by an employee.
• The owner can reconfigure, upgrade, or even shut down the workstation at any
time. This lack of a single point of control makes system administration of such
a cluster very difficult.
• It also calls for special techniques for process scheduling, workload migration,
checkpointing, accounting, and other similar tasks.

2.1.2 Design Objectives of Computer Clusters

• Homogeneity:
• A homogeneous cluster uses nodes from the same platform, that is, the
same processor architecture and the same operating system; often, the
nodes are from the same vendors.
• A heterogeneous cluster uses nodes of different platforms.
Interoperability is an important issue in heterogeneous clusters.
• For instance, process migration is often needed for load balancing or
availability. In a homogeneous cluster, a binary process image can migrate
to another node and continue execution.
• This is not feasible in a heterogeneous cluster, as the binary code will not
be executable when the process migrates to a node of a different platform.

2.1.2 Design Objectives of Computer Clusters

• Security
• Intracluster communication can be either exposed or enclosed.
• In an exposed cluster, the communication paths among the nodes are exposed to
the outside world. An outside machine can access the communication paths, and
thus individual nodes, using standard protocols (e.g., TCP/IP).
• Such exposed clusters are easy to implement, but have several disadvantages: Being
exposed, intracluster communication is not secure, unless the communication
subsystem performs additional work to ensure privacy and security. Outside
communications may disrupt intracluster communications in an unpredictable
fashion.
• In an enclosed cluster, intracluster communication is shielded from the outside
world, which alleviates the aforementioned problems. A disadvantage is that there
is currently no standard for efficient, enclosed intracluster communication.

2.1.2.6 Dedicated versus Enterprise Clusters
• A dedicated cluster is typically installed in a deskside rack in a central computer room.
• It is homogeneously configured with the same type of computer nodes and managed
by a single administrator group like a frontend host.
• Dedicated clusters are used as substitutes for traditional mainframes or
supercomputers.
• A dedicated cluster is installed, used, and administered as a single machine. Many users
can log in to the cluster to execute both interactive and batch jobs.
• The cluster offers much enhanced throughput, as well as reduced response time.
• An enterprise cluster is mainly used to utilize idle resources in the nodes. Each node is
usually a full-fledged SMP, workstation, or PC, with all the necessary peripherals
attached.
• The nodes are typically geographically distributed, and are not necessarily in the same
room or even in the same building.
• The nodes are individually owned by multiple owners. The cluster administrator has only
limited control over the nodes, as a node can be turned off at any time by its owner.
• The owner’s “local” jobs have higher priority than enterprise jobs.
• The cluster is often configured with heterogeneous computer nodes.

2.2.1 Cluster Organization and Resource Sharing
2.2.1.1 A Basic Cluster Architecture
• Figure 2.4 shows the basic architecture of a computer cluster over PCs or
workstations.
• The figure shows a simple cluster of computers built with commodity components
and fully supported with desired SSI features and HA capability.
• The processing nodes are commodity workstations, PCs, or servers. These
commodity nodes are easy to replace or upgrade with new generations of
hardware.
• The node operating systems should be designed for multiuser, multitasking, and
multithreaded applications.
• The nodes are interconnected by one or more fast commodity networks. These networks use standard communication protocols and operate at a speed that should be two orders of magnitude faster than that of the current TCP/IP speed over Ethernet.
• The network interface card is connected to the node’s standard I/O bus (e.g., PCI).

2.2.1.1 A Basic Cluster Architecture
• When the processor or the operating system is changed, only the driver
software needs to change.
• We desire to have a platform-independent cluster operating system, sitting
on top of the node platforms. But such a cluster OS is not commercially
available.
• Instead, we can deploy some cluster middleware to glue together all node
platforms at the user space. An availability middleware offers HA services.
• An SSI layer provides a single entry point, a single file hierarchy, a single
point of control, and a single job management system. Single memory may
be realized with the help of the compiler or a runtime library.
• A single process space is not necessarily supported.

2.2.1.1 A Basic Cluster Architecture

• In general, an idealized cluster is supported by three subsystems:


• First, conventional databases and OLTP monitors offer users a
desktop environment in which to use the cluster.
• In addition to running sequential user programs, the cluster supports parallel programming based on standard languages and communication libraries using PVM, MPI, or OpenMP (see the message-passing sketch after this list). The programming environment also includes tools for debugging, profiling, monitoring, and so forth.
• A user interface subsystem is needed to combine the advantages of
the web interface and the Windows GUI. It should also provide user-
friendly links to various programming environments, job
management tools, hypertext, and search support so that users can
easily get help in programming the computer cluster.
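
As a rough illustration of the message-passing style that libraries such as MPI support, the sketch below uses only Python's standard multiprocessing module (chosen here for illustration; it is not MPI itself): a coordinator scatters chunks of data to worker processes over queues and gathers their partial results.

    from multiprocessing import Process, Queue

    def worker(rank, in_q, out_q):
        # Receive one chunk of work, compute a partial result, send it back.
        chunk = in_q.get()
        out_q.put((rank, sum(chunk)))

    if __name__ == "__main__":
        data = list(range(100))
        nworkers = 4
        in_qs = [Queue() for _ in range(nworkers)]
        out_q = Queue()
        procs = [Process(target=worker, args=(r, in_qs[r], out_q))
                 for r in range(nworkers)]
        for p in procs:
            p.start()
        for r in range(nworkers):          # scatter the data to the workers
            in_qs[r].put(data[r::nworkers])
        total = sum(out_q.get()[1] for _ in range(nworkers))  # gather partial sums
        for p in procs:
            p.join()
        print("total =", total)            # prints: total = 4950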

2.2.1.2 Resource Sharing in Clusters

There is no widely accepted standard for the memory bus. But there are such standards for the
I/O buses. One recent, popular standard is the PCI I/O bus standard. So, if you implement an NIC
card to attach a faster Ethernet network to the PCI bus you can be assured that this card can be
used in other systems that use PCI as the I/O bus.
2.2.1.2 Resource Sharing in Clusters
• The nodes of a cluster can be connected in one of three ways, as shown in
Figure 2.5:

1. The shared-nothing architecture is used in most clusters, where the nodes are
connected through the I/O bus. The shared-nothing configuration in Part (a) simply
connects two or more autonomous computers via a LAN such as Ethernet.
2. The shared-disk architecture is favored by small-scale availability clusters in
business applications. When one node fails, the other node takes over. A shared-
disk cluster is shown in Part (b). This is what most business clusters desire so that
they can enable recovery support in case of node failure. The shared disk can hold
checkpoint files or critical system images to enhance cluster availability (a minimal
checkpointing sketch follows this list). Without shared disks, checkpointing, rollback
recovery, failover, and failback are not possible in a cluster.
3. The shared-memory cluster in Part (c) is much more difficult to realize. The nodes
could be connected by a scalable coherence interface (SCI) ring, which is connected
to the memory bus of each node through an NIC module.
• In the other two architectures, the interconnect is attached to the I/O bus. The
memory bus operates at a higher frequency than the I/O bus.
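
A minimal sketch of the checkpointing idea mentioned above, assuming a hypothetical path on the shared disk: the application periodically serializes its state to the shared disk so that, after a failover, the surviving node can roll back to the last checkpoint rather than restart from scratch.

    import os
    import pickle
    import tempfile

    CHECKPOINT = "/shared_disk/app.ckpt"   # hypothetical shared-disk path

    def save_checkpoint(state, path=CHECKPOINT):
        """Write the state atomically; a half-written file is useless for rollback."""
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, path)

    def restore_checkpoint(path=CHECKPOINT, default=None):
        """After failover, reload the last saved state (or a default if none exists)."""
        try:
            with open(path, "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            return default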

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Clusters should be designed for scalability and availability. In this section, we
will cover the design principles of SSI, HA, fault tolerance, and rollback
recovery in general-purpose computers and clusters of cooperative
computers.
Single-System Image Features:
1) Single Entry Point
2) Single File Hierarchy
3) Visibility of Files
4) Support of Single-File Hierarchy
5) Single I/O, Networking, and Memory Space
6) Other Desired SSI Features

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Single-System Image Features:
SSI does not mean a single copy of an operating system image residing in memory, as
in an SMP or a workstation. Rather, it means the illusion of a single system, single
control, symmetry, and transparency as characterized in the following list:
• Single system: The entire cluster is viewed by users as one system that has multiple
processors. The user could say, “Execute my application using five processors.” This is
different from a distributed system.
• Single control: Logically, an end user or system user utilizes services from one place
with a single interface. For instance, a user submits batch jobs to one set of queues; a
system administrator configures all the hardware and software components of the
cluster from one control point.
• Symmetry: A user can use a cluster service from any node. In other words, all cluster
services and functionalities are symmetric to all nodes and all users, except those
protected by access rights.
• Location-transparent: The user is not aware of where the physical device
that eventually provides a service is located. For instance, the user can use a tape drive attached to
any cluster node as though it were physically attached to the local node.

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Single-System Image Features:
• The main motivation to have SSI is that it allows a cluster to be used, controlled,
and maintained as a familiar workstation is.
• The word “single” in “single-system image” is sometimes synonymous with “global”
or “central.”
• For instance, a global file system means a single file hierarchy, which a user can
access from any node. A single point of control allows an operator to monitor and
configure the cluster system.
• Although there is an illusion of a single system, a cluster service or functionality is
often realized in a distributed manner through the cooperation of multiple
components.
• From the viewpoint of a process P, cluster nodes can be
classified into three types.
– The home node of a process P is the node where P resided when it was
created.
– The local node of a process P is the node where P currently resides.
– All other nodes are remote nodes to P.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Single-System Image Features:
A node can be configured to provide multiple functionalities. For instance, a
node can be designated as a host, an I/O node, and a compute node at the
same time.
The illusion of an SSI can be obtained at several layers, three of which are
discussed in the following list. Note that these layers may overlap with one
another.
• Application software layer: Two examples are parallel web servers and
various parallel databases. The user sees an SSI through the application
and is not even aware that he is using a cluster.
• Hardware or kernel layer: Ideally, SSI should be provided by the operating
system or by the hardware. Unfortunately, this is not a reality yet.
Furthermore, it is extremely difficult to provide an SSI over heterogeneous
clusters. With most hardware architectures and operating systems being
proprietary, only the manufacturer can use this approach.
• Middleware layer: The most viable approach is to construct an SSI layer
just above the OS kernel. This approach is promising because it is platform-
independent and does not require application modification.

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.1 Single Entry Point
• Single-system image (SSI) is a very rich concept, consisting of single entry point,
single file hierarchy, single I/O space, single networking scheme, single control
point, single job management system, single memory space, and single process
space.
• The single entry point enables users to log in (e.g., through Telnet, rlogin, or HTTP)
to a cluster as one virtual host, although the cluster may have multiple physical host
nodes to serve the login sessions.
• The system transparently distributes the user’s login and connection requests to
different physical hosts to balance the load.
• Clusters could substitute for mainframes and supercomputers. Also, in an Internet
cluster server, thousands of HTTP or FTP requests may come simultaneously.
Establishing a single entry point with multiple hosts is not a trivial matter. Many
issues must be resolved. The following is just a partial list:
• Home directory: Where do you put the user's home directory?
• Authentication: How do you authenticate user logins?
• Multiple connections: What if the same user opens several sessions to the same user account?
• Host failure: How do you deal with the failure of one or more hosts?
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.1 Single Entry Point
The DNS translates the symbolic name and returns the IP address
159.226.41.150 of the least-loaded node, which happens to be node Host1. The
user then logs in using this IP address. The DNS periodically receives load
information from the host nodes to make load-balancing translation decisions.
In the ideal case, if 200 users simultaneously log in, the login sessions are evenly
distributed among our hosts with 50 users each. Without such balancing, a single
host would have to be four times as powerful to serve the same load.
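
A toy sketch of the translation decision described above, assuming the DNS-like entry point keeps a table of recent load reports from the hosts (the IP addresses other than Host1's and the load numbers are illustrative):

    import random

    # Load reports (e.g., run-queue length) periodically sent by each host node.
    host_load = {
        "159.226.41.150": 12,   # Host1, currently the least loaded
        "159.226.41.151": 45,
        "159.226.41.152": 30,
        "159.226.41.153": 41,
    }

    def resolve_least_loaded(loads):
        """Return the IP address of the least-loaded host, breaking ties randomly."""
        minimum = min(loads.values())
        candidates = [ip for ip, load in loads.items() if load == minimum]
        return random.choice(candidates)

    print(resolve_least_loaded(host_load))   # 159.226.41.150 in this snapshot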

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.2 Single File Hierarchy
• We use the term “single file hierarchy” to mean the illusion of a single,
huge file system image that transparently integrates local and global disks
and other file devices (e.g., tapes).
• All files a user needs are stored in some subdirectories of the root directory
/, and they can be accessed through ordinary UNIX calls such as open, read,
and so on.
• Multiple file systems can exist in a workstation as subdirectories of the
root directory. The functionalities of a single file hierarchy have already
been partially provided by existing distributed file systems such as Network
File System (NFS) and Andrew File System (AFS).

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.2 Single File Hierarchy
From the viewpoint of any process, files can reside on three types of
locations in a cluster, as shown in Figure 2.14.
Local storage is the disk on the local node of a process. The disks on remote
nodes are remote storage.
A stable storage requires two aspects: 1) It is persistent, which means data,
once written to the stable storage, will stay there for a sufficiently long time (e.g.,
a week), even after the cluster shuts down; and 2) it is fault-tolerant to some
degree, by using redundancy and periodic backup to tapes.

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.3 Visibility of Files
• The term “visibility” here means a process can use traditional UNIX system
or library calls such as fopen, fread, and fwrite to access files.
• Note that there are multiple local scratch directories in a cluster. The local
scratch directories in remote nodes are not in the single file hierarchy, and
are not directly visible to the process.
• The name “scratch” indicates that the storage is meant to act as a scratch
pad for temporary information storage. Information in the local scratch
space could be lost once the user logs out.
• Files in the global scratch space will normally persist even after the user
logs out, but will be deleted by the system if not accessed in a
predetermined time period.
• This is to free disk space for other users. The length of the period can be set
by the system administrator, and usually ranges from one day to several
weeks.
• Some systems back up the global scratch space to tapes periodically or
before deleting any files.

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.4 Support of Single-File Hierarchy
It is desired that a single file hierarchy have the SSI properties discussed, which are
reiterated for file systems as follows:
• Single system: There is just one file hierarchy from the user’s viewpoint.
• Symmetry: A user can access the global storage (e.g., /scratch) using a cluster service
from any node. In other words, all file services and functionalities are symmetric to all
nodes and all users, except those protected by access rights.
• Location-transparent: The user is not aware of the whereabouts of the physical
device that eventually provides a service. For instance, the user can use a RAID attached
to any cluster node as though it were physically attached to the local node. There may
be some performance differences, though.
• A cluster file system should maintain UNIX semantics: Every file operation (fopen,
fread, fwrite, fclose, etc.) is a transaction. When an fread accesses a file after an
fwrite modifies the same file, the fread should get the updated value.
• A desired approach is to utilize the local disks in all nodes to form global storage. This solves the
performance and availability problems of a single file server.

2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.5 Single I/O, Networking, and Memory Space
• To achieve SSI, we desire a single control point, a single address space, a
single job management system, a single user interface, and a single process
control, as depicted in Figure 2.15.
• In this example,
– each node has exactly one network connection. Two of the four nodes
each have two I/O devices attached.
– Single networking: A properly designed cluster should behave as one
system (the shaded area). In other words, it is like a big workstation
with four network connections and four I/O devices attached.
– Single I/O device: Any process on any node can use any network and
I/O device as though it were attached to the local node.

Thank You
Chapter-03
Virtual Machines and
Virtualization of Clusters
and Data Centers
Why learn virtualization?
• Modern computing is more efficient due to virtualization

• Virtualization can be used for mobile, personal and cloud computing

• You can also use virtualization in your personal life


This content will cover
• Understand the benefits of virtualization

• Be able to describe virtualization, virtual machines and hypervisors

• Describe typical data center components that are virtualized

• Become familiar with VMware technology popular in industry


Virtualization Benefits
• Have you ever wished you could clone yourself?

• If you could, would you be more efficient? Would you do more?

• Virtualization enables computers to be more efficient in a similar fashion

• Computers that use virtualization optimize the available compute resources


What is virtualization?
Hardware and Software
• Do you use a smartphone, laptop or home computer?

• Smartphones, laptops or home computers are hardware

• Similar to how your brain controls your actions, software controls hardware

• There are different types of software that control computer actions


Hardware
Processor - Also called CPU (Central Processing Unit)

RAM - Random Access Memory

Read-Only Memory - Non-volatile memory that stores BIOS


*BIOS is a type of software responsible for turning on (booting) the
computer
Motherboard - Printed Circuit Board (PCB) that holds processor, RAM, ROM,
network and Input/Output (I/O) and other components.
Chipset - Collection of microchips on motherboard that manage specific
functions.

Storage - A persistent (non-volatile) storage device such as a Hard Drive Disk


or Solid State Drive
Software
• System software is necessary for hardware to function

• Operating system controls the hardware

• Application software tells your system to execute a task you want


Now that you are aware of the roles of hardware and software, the concept of
virtualization will be easier to grasp. Virtualization is the “layer” of technology
that goes between the physical hardware of a device and the operating system
to create one or more copies of the device.
What is a VM?
• Virtualization creates virtual hardware by cloning physical hardware

• The hypervisor uses virtual hardware to create a virtual machine (VM)

• A VM is a set of files

• With a hypervisor and VMs, one computer can run multiple OS simultaneously
The Hypervisor
What is a Hypervisor?
• Software installed on top of hardware that creates the virtualization layer

• Hosts VMs

• Type 1 Hypervisor – Bare metal hypervisor (VMware ESXi)

• Type 2 Hypervisor – Hosted hypervisor (VMware Workstation)


Virtual Machine Files
• VMs can be exported and moved to other hosts

• Files are created by the hypervisor and stored in a directory

• Example VM files:
File Type | File Name | Description
Log File | <vmname>.log | Keeps a log of VM activity
Disk File | <vmname>.vmdk | Stores content of VM's disk drive
Snapshot Files | <vmname>.vmsd and <vmname>.vmsn | Stores information about VM snapshots (saved VM state)
Configuration File | <vmname>.vmx | Stores information about VM name, BIOS, guest OS, and memory
What is a snapshot?
• Working on a VM and need to save progress or state

• Snapshots are saved as files in the VM folder (<vmname>.vmsd and <vmname>.vmsn)

• What is saved by a snapshot?


- State of VM disks
- Contents of VM memory
- VM settings
The Data Center
What is a Data Center?
• Hardware infrastructure that
supports virtualization

• Focus is on processing large amounts


of data

• What are the three main


components?
- Compute
- Storage
- Networks
Compute Systems
• Hardware and operating system software that runs applications

• Difference between a PC and a server


- PCs have a user-friendly interface, while servers focus on running programs

• Types of servers:
- Tower
- Blade server
- Rack-mounted server

• What is the architecture of a server?


Networks
• Transfer data across the data center so devices can communicate

• What type of hardware is used for networking?


Storage
• Data center storage should have two features: availability and redundancy
Storage - RAID
• Redundant Array of Independent Disks

• Hard drives linked together to create a large volume of redundant storage

• What are the three methods of writing to RAID?


- Mirroring
- Striping
- Parity

• What do the RAID numbers mean (i.e., 0, 1, 5)?
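
To make the parity method concrete, here is a small Python sketch (illustrative only): the parity block of a stripe is the bitwise XOR of its data blocks, so any single lost block can be rebuilt from the surviving blocks plus the parity, which is the idea behind RAID 5.

    def xor_parity(blocks):
        """Compute the parity block for equal-length data blocks in one stripe."""
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return bytes(parity)

    def rebuild(surviving_blocks, parity):
        """Reconstruct the single missing block from the survivors and the parity."""
        return xor_parity(list(surviving_blocks) + [parity])

    stripe = [b"AAAA", b"BBBB", b"CCCC"]      # three data blocks in one stripe
    p = xor_parity(stripe)
    # Simulate losing the second block and rebuilding it:
    assert rebuild([stripe[0], stripe[2]], p) == stripe[1]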


Storage - Block vs. File Level
• Block-Level Storage – Data is written to and accessed from storage volumes (blocks)

• File-Level Storage – Data is written to disks but accessed from default file system
Storage – Types of Data Center Storage
• DAS – Storage device is directly attached to a server (block-level)

• NAS – Storage device is attached to a network, servers on the network can


access device (file-level)

• SAN – Clustered storage devices on their own network that servers can
connect to (block-level)
Common Data Center Storage Protocols
Protocol | Application

SCSI (Small Computer System Interface) | Medium-sized blade servers, enterprise servers, DAS

FC (Fiber Channel) | Enterprise servers, SAN

FCoE (Fiber Channel over Ethernet) | Enterprise servers, SAN

iSCSI (Internet Small Computer System Interface) | Enterprise servers, NAS
Virtual Data Center
Benefits of a Virtual Data Center
• Data centers use a lot of hardware and virtualization makes hardware more
efficient

• Increased computing resources results in higher availability of applications

• Less labor needed to monitor data center (administrator can monitor from desk
using a program)

• Software-defined data center (SDDC): Hypervisor pools physical data center


resources into a virtual data center
What is vSphere?
• Suite of virtualization technology designed for larger enterprise data center
management

• vSphere virtualization tools include:


- ESXi: Type 1 Hypervisor
- vCenter: Management software (installed on management server)
- vSphere Client: Program that controls host servers and VMs
vSphere
Server Virtualization
• Results in increased efficiency of data center servers because multiple VMs
can be hosted on one server

• Computing resources can be distributed to customers using less hardware


Storage Virtualization
• Storage capacity is pooled and distributed to
the VMs
- Physical storage devices are
partitioned into logical storage
(LUNs)
- LUNs are used to create a datastore

• How do VMs access data center storage?


- VMs are stored as VMDK (.vmdk)
files on datastore
- VM configuration files (VM settings)
are stored as VMX (.vmx) files
Network Virtualization
• Physical components that make up the physical
network are virtualized to create a virtual network

• What is a vSwitch?
- Virtual switch that virtual devices can
connect to in order to communicate
with each other

• What is a vLAN?
- Virtual Local Area Network that is
segmented into groups of ports isolated
from one another, creating different
network segments
Types of Virtual Networks
• Bridged Network: The host server and the VM are
connected to the same network, and the host shares
its IP address with the VM

• NAT: VMs use an IP translated from the host’s IP


(using NAT device) and communicate on a private
network set up on the host computer

• Host-only Network: VMs use a private network but


do not have translated IP addresses to connect to
external network, therefore can only communicate to
other VMs on the isolated host network
Application and Desktop Virtualization
• Why use virtualized applications?
- Some applications have specific system requirements
- VMware Thinapp creates a packaged virtual app, that contains the
program and system requirements, and delivers it to the end-user

• What is desktop virtualization?


- Designed to solve computing resource issues faced by the mobile
workforce (workers that need computing without the hardware)
- VMware Horizon takes the resources needed to create a desktop
environment from data centers and delivers it to the end-user’s device
What is the Cloud?
The Cloud

• Cloud computing is the delivery of


shared computing resources
(software and/or data) on-demand
through the internet
Types of Cloud Computing
Cloud computing is categorized into different
service models. The major types of cloud
computing are:

• Software as a Service (SaaS)


• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
Cloud Deployment Models
Cloud deployment models emphasize where the
hardware or software is running and who is controlling it.

• Private Cloud
• Community Cloud
• Public Cloud
• Hybrid Cloud
VMware Solutions
vMotion
• Move running virtual machines from one ESXi host to another ESXi host without
service interruption (live migration)

• Increases availability of data and computing resources


Storage vMotion
• Move the disks and configuration files of a running virtual machine from one
datastore to another datastore without service interruption

• Increases availability of storage


NSX
• Suite of virtualization solutions for data
center networking

• VMware NSX creates a ‘software


network’ on top of the physical network
that can be divided up into many virtual
networks

• Virtual networking components included


VMware Cloud Foundation
• Suite of virtualization solutions for data
center migration

• VMware Cloud Foundation makes it easy


to transition from an existing system to a
virtual data center

• Can be used to virtualize on-premises or


to migrate off-premises to cloud
environments such as Amazon Web
Services (AWS)
vCloud Automation Center
• Cloud management product to quickly
deliver and easily manage the
personalized infrastructure, applications,
and services for business needs

• Individuals can have access to a user-


friendly self-service portal to create their
own machines

• Ability to deliver services on different


platforms such as AWS and Azure
Cloud Health
• Analyze and report your cloud costs,
usage, performance, and security

• Monitor groups of resources or specific


resources such as CPU, memory, and disk
usage
Cloud platform architecture over
virtualized data centers
• Traditional Server Concept
• Outline
• Pros and Cons
• Virtual Server Concept
• Hypervisors and hosts
• Virtual machines
• Pros and Cons
• Cloud Computing
• Concept
• The Rise of Cloud
• Examples
• Data centres
• Overview
Two Technologies for Agility
• Virtualization:
• The ability to run multiple operating
systems on a single physical system and
share the underlying hardware resources*
• Cloud Computing:
• “The provisioning of services in a timely
(near on instant), on-demand manner, to
allow the scaling up and down of
resources”**
The Traditional Server Concept

(Figure: four dedicated physical servers)
Web Server: Windows, IIS
App Server: Linux, Glassfish
DB Server: Linux, MySQL
EMail Server: Windows, Exchange
The Traditional Server Concept

• System Administrators often talk about servers as a


whole unit that includes the hardware, the OS, the
storage, and the applications.
• Servers are often referred to by their function i.e. the
Exchange server, the SQL server, the File server,
etc.
• If the File server fills up, or the Exchange server
becomes overtaxed, then the System Administrators
must add in a new server.
And if something goes wrong ...

(Figure: the App Server host is DOWN; the Web Server (Windows, IIS), DB Server (Linux, MySQL), and EMail Server (Windows, Exchange) keep running.)
The Traditional Server Concept

• Unless there are multiple servers, if a


service experiences a hardware failure,
then the service is down.
• System Admins can implement clusters
of servers to make them more fault
tolerant. However, even clusters have
limits on their scalability, and not all
applications work in a clustered
environment.
The Traditional Server Concept

• Pros
• Easy to conceptualize
• Fairly easy to deploy
• Easy to back up
• Virtually any application/service can be run from this type of setup

• Cons
• Expensive to acquire and maintain hardware
• Not very scalable
• Difficult to replicate
• Redundancy is difficult to implement
• Vulnerable to hardware outages
• In many cases, the processor is under-utilized
The Virtual Server Concept

• Virtual servers seek to encapsulate the


server software away from the hardware
• This includes the OS, the applications, and the
storage for that server.
• Servers end up as mere files stored on a
physical box, or in enterprise storage.
• One host typically houses many virtual
servers (virtual machines or VMs).
• A virtual server can be serviced by one or
more hosts, e.g., for storage, services, etc.
The Virtual Server Concept

Hypervisor layer between Guest OS and hardware


Hypervisors And Hosts
• A hypervisor is a piece of computer software,
firmware or hardware that creates and runs virtual
machines.
• A computer on which a hypervisor is running one or
more virtual machines is defined as a host
machine.
• Each virtual machine has a guest operating
system, which is managed by the hypervisor.
• Multiple instances of a variety of operating systems
may share the virtualized hardware resources.
Hypervisors and Virtual Machines

(Figure: two virtual servers, Server 1 and Server 2, each with its own guest OS, and a service console run on the hypervisor; the hypervisor sits on the x86 architecture and intercepts hardware requests, and the virtual servers can be clustered.)
The Virtual Server Concept

• Virtual servers can still be referred to by


their function i.e. email server, database
server, etc.
• If the environment is built correctly,
virtual servers will not be affected by the
loss of a host.
• Hosts may be removed and introduced
almost at will to accommodate
maintenance.
The Virtual Server Concept
• Virtual servers can be scaled out easily.
• If the administrators find that the resources
supporting a virtual server are being taxed
too much, they can adjust the amount of
resources allocated to that virtual server
• Server templates can be created in a
virtual environment to be used to create
multiple, identical virtual servers
• Virtual servers themselves can be
migrated from host to host almost at will.
The Virtual Server Concept
• Pros
• Resource pooling
• Highly redundant
• Highly available
• Rapidly deploy new servers
• Easy to deploy
• Reconfigurable while services are running
• Optimizes physical resources by doing more with less

• Cons
• Slightly harder to conceptualize
• Slightly more costly (must buy hardware, OS, apps, and now the abstraction layer)
Cloud Computing

Cloud Computing?
• The cloud is Internet-based
computing, whereby shared
resources, software, and information
are provided to computers and other
devices on demand – pay per use.

• Cost-effective means of virtualising and making use of


resources more effectively
• Low start-up costs – pay for use helps to kick-start companies
• Scaling is proportional to demand (revenue) so it’s a good
business model
• Vast range of Cloud Computing applications
• Virtual private servers, Web hosting, data servers, fail-over
services, etc
Clouds are on the Rise …
Basic Cloud Characteristics
• The “no-need-to-know” principle: users need not know the underlying
details of the infrastructure; applications interface with
the infrastructure via the APIs.
• The “flexibility and elasticity” allows these systems
to scale up and down at will
• utilising the resources of all kinds
• CPU, storage, server capacity, load balancing, and databases
• The “pay as much as used and needed” type of
utility computing and the “always on, anywhere and
any place” type of network-based computing.

Basic Cloud Characteristics
• Clouds are transparent to users and
applications; they can be built in multiple
ways
• branded products, proprietary open source,
hardware or software, or just off-the-shelf PCs.
• In general, they are built on clusters of PC
servers and off-the-shelf components plus
Open Source software combined with in-
house applications and/or system software.

Motivation Example: Forbes.com

• You offer on-line real-time stock market data
• Why pay for capacity on weekends and overnight?

(Figure: the rate of server accesses peaks from 9 AM to 5 PM, Monday through Friday, and is low at all other times.)
Forbes' Solution

• Host the web site in Amazon's EC2


Elastic Compute Cloud
• Provision new servers every day, and
deprovision them every night
• Pay just $0.10* per server per hour
• * more for higher capacity servers
• Let Amazon worry about the hardware!
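
A back-of-the-envelope comparison, using the $0.10 per server-hour figure above and assuming (for illustration) a fleet of 20 servers that is only needed 9 AM - 5 PM on weekdays:

    rate = 0.10                   # dollars per server per hour (figure from the slide)
    servers = 20                  # assumed peak fleet size
    hours_always_on = 24 * 7      # hours per week if servers are never deprovisioned
    hours_business = 8 * 5        # hours per week for 9 AM - 5 PM, Monday-Friday

    weekly_always_on = servers * hours_always_on * rate
    weekly_elastic = servers * hours_business * rate
    print(weekly_always_on, weekly_elastic)   # 336.0 vs. 80.0 dollars per week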
Cloud computing takes
virtualization to the next step
• You don’t have to own the hardware
• You “rent” it as needed from a cloud
• There are public clouds
• e.g. Amazon EC2, and now many others
(Microsoft, IBM, Sun, and others ...)
• A company can create a private one
• With more control over security, etc.
Goal 1 – Cost Control
• Cost
• Many systems have variable
demands
• Batch processing (e.g. New York Times)
• Web sites with peaks (e.g. Forbes)
• Startups with unknown demand (e.g. the
Cash for Clunkers program)
• Reduce risk
• Don't need to buy hardware until you
need it
Goal 2 - Business Agility

• More than scalability - elasticity


• Eli Lilly in rapidly changing health care
business
• Used to take 3 - 4 months to give a department a
server cluster, then they would hoard it
• Using EC2, about 5 minutes
• And they give it back when they are done
• Scaling back is as important as scaling up
Goal 3 - Stick to Our Business

• Most companies don't WANT to do


system administration
• Forbes says:
• We are a publishing company, not a software
company
• But beware:
• Do you really save much on sys admin?
• You don't have the hardware, but you still
need to manage the OS!
Cloud Computing Overview
SaaS and PaaS
• SaaS is where an application is hosted as a service
provided to customers across the Internet.
• SaaS alleviates the burden of software maintenance/support
• but users relinquish control over software versions and
requirements.
• PaaS provides a computing platform and a solution stack
as a service.
• Consumer creates the software using tools and/or libraries from
the provider.
• The consumer also controls software deployment and
configuration settings. The provider provides the networks,
servers, storage and other services.
IaaS

• IaaS providers offer virtual machines, virtual-


machine image libraries, raw (block) and file-
based storage, firewalls, load balancers, IP
addresses, virtual local area networks
(VLANs), and software bundles.
• Pools of hypervisors can scale services up
and down according to customers' varying
requirements
• All infrastructure is provided on-demand
Cloud Service Models
(Figure: example offerings at each cloud service level)
Software as a Service (SaaS): SalesForce CRM, LotusLive
Platform as a Service (PaaS): Google App Engine
Infrastructure as a Service (IaaS): Amazon Web Services
Below IaaS: dedicated servers and managed hosting, e.g., web hosting from Rackspace

Adopted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell, Tim
Some Commercial Cloud
Offerings

Cloud Taxonomy

Where is all of this?
Data Centers

• $10 billion spent on electricity per year for data centers


• 3% of global energy use
• Clouds are the future of the way companies do business
on the Internet
Forecast: Data Centers, Worldwide, 2010-2016, 2Q12 Update

Table 1-1: Data Centers by Size and Region, 2010-2016
Sum of sites per year:
Region Site Class 2010 2011 2012 2013 2014 2015 2016
Asia/Pacific Single 769,012 769,455 780,252 792,999 819,342 851,068 900,039
Rack/Computer Room 58,702 60,311 63,183 66,916 70,173 72,564 74,249
Midsize DC 4,478 4,656 4,989 5,455 5,892 6,260 6,559
Enterprise DC 984 1,026 1,110 1,239 1,379 1,523 1,666
Large DC 106 110 120 136 156 179 204
Canada Single 66,519 66,012 64,384 62,311 61,575 62,246 63,956
Rack/Computer Room 14,390 14,194 13,828 13,241 12,686 12,068 11,328
Midsize DC 650 643 631 613 599 584 564
Enterprise DC 210 209 208 210 217 228 241
Large DC 22 22 23 25 28 32 37
Eastern Europe Single 147,274 151,468 161,916 176,398 194,389 213,602 238,792
Rack/Computer Room 28,750 28,892 28,829 28,761 28,799 29,077 29,380
Midsize DC 1,102 1,112 1,121 1,134 1,149 1,172 1,197
Enterprise DC 196 198 202 208 216 228 243
Large DC 44 44 46 49 53 58 65
Japan Single 286,416 274,109 251,600 225,292 212,947 213,494 222,012
Rack/Computer Room 27,532 27,050 25,885 23,673 21,100 18,334 15,706
Midsize DC 380 372 354 324 294 264 236
Enterprise DC 346 341 330 313 301 294 292
Large DC 85 84 83 83 86 92 101
Latin America Single 195,547 196,703 199,196 204,643 216,169 233,087 253,961
Rack/Computer Room 13,325 13,541 13,786 13,957 13,939 13,846 13,774
Midsize DC 821 832 845 858 868 881 899
Enterprise DC 213 216 222 230 241 258 278
Large DC 18 18 19 20 22 25 28
Middle East and Africa Single 108,868 114,214 126,323 143,733 162,834 182,687 207,031
Rack/Computer Room 21,549 21,793 22,133 22,412 22,650 22,793 22,963
Midsize DC 871 881 894 908 921 934 952
Enterprise DC 120 122 126 131 139 148 159
Large DC 21 21 22 23 24 26 27
United States Single 770,925 769,095 749,290 716,352 685,760 664,601 660,355
Rack/Computer Room 184,457 182,963 179,818 174,492 168,593 162,051 154,496
Midsize DC 2,506 2,483 2,435 2,372 2,319 2,276 2,223
Enterprise DC 2,404 2,392 2,377 2,382 2,438 2,539 2,660
Large DC 571 571 574 589 621 669 724
Western Europe Single 536,090 531,772 528,022 525,520 545,062 583,768 647,273
Rack/Computer Room 139,790 138,022 133,181 125,030 116,790 109,967 105,045
Midsize DC 4,860 4,788 4,608 4,337 4,093 3,921 3,822
Enterprise DC 1,196 1,181 1,148 1,106 1,089 1,105 1,153
Large DC 244 243 242 244 256 280 313
Grand Total 3,391,592 3,382,159 3,364,353 3,338,716 3,376,208 3,469,231 3,645,002
Source: Gartner (August 2012)
Summary Comments
• Virtualization of servers solves a lot of headaches
when deploying infrastructure and applications
• It allows servers to be backed up and moved
around seamlessly
• Migrating a server might allow an application to
speed up, e.g., by moving it to a faster machine
• Resizing (up or down) keeps costs proportional to
business model
• The model works for both private clouds or public
ones (insourcing or outsourcing)
• The cloud is easy to understand and a convenient
way of accessing infrastructure and services.
Service Oriented Architecture
for Distributed Computing

Chapter-05
Overview of the syllabus

 SOA characteristics
 Principles of service orientation
 Web service and its role in SOA
 Service oriented analysis
 Service oriented design
 SOA platforms
 SOA support in J2EE and .NET
 SOA standards
 Service composition (BPEL)
 Security in SOA
Overview of the content

Current trends
Software paradigms
Application architecture
Web based systems
2-tier and 3-tier architecture
Web based technologies
 component based systems
Current trends …

 Internet based solution


 Complexity of the software
 Growth in hardware mobile and other smart
devices
 Demand for novel / customized services
Software paradigms…

 Procedure oriented
 Object-oriented
 Component based
 Event-driven
 Logic based
 Aspect-oriented
 Service oriented
The monolithic mainframe application
architecture

 Separate, single-function applications, such


as order-entry or billing
 Applications cannot share data or other
resources
 Developers must create multiple instances of
the same functionality (service).
 Proprietary (user) interfaces
The distributed application
architecture

 Integrated applications
 Applications can share resources
 A single instance of functionality (service) can
be reused.
 Common user interfaces
 Bottom-up approach
 Real world scenario
Web based systems …

 Client-server model
 Client side technologies
 Server side technologies
 Web client, Web servers
 Application servers
Basic idea of Tiers

(Figure: a thick client sends a request to the web server and receives a response; the web server calls the application server, which calls the database server.)

Tier 1: GUI, interactions with the user, and basic validations
Tier 2: Application logic, transaction processing management, calls to the database server
Tier 3: Database
2-tier architecture

(Figure: in a 2-tier deployment the client holds the presentation logic, the business logic, and the database driver, forming the presentation/business layer; across the tier boundary, the data layer holds the database.)
Two tier architecture

• Deployment costs are high


• Database driver switching costs are high
• Business logic migration costs are high
• The client has to be recompiled if the business logic is changed
• Network performance suffers
N-Tier architecture

(Figure: in an N-tier deployment the presentation logic sits in its own tier; across a tier boundary, the business logic and database driver form the middle tier; across another tier boundary, the data layer holds the database.)
N-Tier architecture

• Deployment costs are low


• Database switching costs are low
• Business migration costs are low
• A firewall can secure parts of the
deployment
• Each tier can vary independently
• Communication performance suffers
• Maintenance costs are high
Presentation tier technologies
At client or server? | Property | Microsoft Technology | Sun Technology
Client | HTTP (Web) based | HTML browser (Internet Explorer), ActiveX Controls | HTML browser (Netscape Navigator), Java Applets
Client | Non-HTTP based | COM clients | CORBA clients
Client | Communication protocol between client and server | DCOM | RMI, IIOP
Server | For creating dynamic Web pages | ISAPI, ASP | NSAPI, Servlets, JSP
Server | Other pages | HTML, XML | HTML, XML

Business tier technologies

Purpose | Microsoft Technology | Sun Technology
Transaction handling, Business Objects | COM, MTS | EJB (Session Beans)
Queuing and Messaging | MSMQ | IBM's MQSeries, Java Messaging Service (JMS)
Database access | ADO, OLE, ODBC | JDBC, J/SQL (via Entity Beans)
Microsoft Web Technologies
(Figure)
Presentation tier: HTML browsers, COM clients, and ActiveX controls sit behind the firewall boundary and reach ASP and ISAPI on the web server, which serve HTML/XML pages; COM clients and ActiveX controls communicate over DCOM.
Business tier (reached over DCOM): MTS transactional components, MSMQ queuing services, ADO/OLE/ODBC database access.
Database tier: databases.
Sun’s Web Technologies
(Figure)
Presentation tier: HTML browsers, CORBA clients, and Java applets sit behind the firewall boundary and reach Servlets and JSP on the web server, which serve HTML/XML pages; CORBA clients and applets communicate over RMI/IIOP.
Business tier (reached over RMI/IIOP): EJB Session Beans, EJB Entity Beans, JDBC / SQL/J, MQSeries/Java Messaging Service (JMS).
Database tier: databases.
Component World …

 Justification for component


 Interface
 Implementation
 Reusability
 standards
Interface and Implementation

(Figure: an audio system. The external world (a user of the audio system) sees only the interface: the Eject and Skip buttons and the Volume and Bass controls. The actual implementation, in terms of voltages, signals, currents, etc., is hidden behind that interface.)
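
The same separation can be expressed in code. Below is a small, illustrative Python sketch (the class and method names are invented for the example): an abstract class plays the role of the interface (the buttons the outside world sees), and a concrete class plays the role of the implementation (the voltages and signals behind them).

    from abc import ABC, abstractmethod

    class AudioPlayer(ABC):
        """Interface: the operations visible to the external world."""
        @abstractmethod
        def eject(self): ...
        @abstractmethod
        def skip(self): ...
        @abstractmethod
        def set_volume(self, level): ...

    class CdPlayer(AudioPlayer):
        """Implementation: what actually happens behind the interface."""
        def eject(self):
            print("motor opens the tray")
        def skip(self):
            print("laser seeks the next track")
        def set_volume(self, level):
            print(f"amplifier gain set to {level}")

    player: AudioPlayer = CdPlayer()   # callers depend only on the interface
    player.skip()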
Technologies for implementing
components

 RMI / EJB
 CORBA
 COM, DCOM, COM+
 Limitations
 Web services (XML based standards)
Basic model of distributed system

(Figure: the service provider publishes its service to the service registry; the service requestor finds the service in the registry and then binds to the service provider.)
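
A toy in-memory version of this model, sketched in Python purely for illustration (the service name and endpoint are hypothetical): the provider publishes an endpoint, the requestor finds it in the registry and then binds to it.

    registry = {}   # the service registry: service name -> endpoint

    def publish(name, endpoint):
        """Provider side: register (publish) the endpoint of a service."""
        registry[name] = endpoint

    def find(name):
        """Requestor side: look the service up in the registry."""
        return registry[name]

    # Provider publishes its service.
    publish("quote-service", "http://provider.example.com/quote")
    # Requestor finds the service, then binds (i.e., calls the endpoint it found).
    endpoint = find("quote-service")
    print("binding to", endpoint)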
An Archetypal Distributed Objects System

(Figure: an object client and an object server each sit above a proxy, runtime support, and network support layer; an object registry lets the client locate the server. The logical data path runs directly between the client and server objects, while the physical data path runs down through the layers and across the network.)
Distributed Object Systems / Protocols

• The distributed object paradigm has been


widely adopted in distributed applications, for
which a large number of mechanisms based on
the paradigm are available. Among the most
well known of such mechanisms are:
~ Java Remote Method Invocation (RMI),
~ the Common Object Request Broker Architecture
(CORBA) systems,
~ the Distributed Component Object Model
(DCOM),
~ mechanisms that support the Simple Object
Access Protocol (SOAP).
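
The flavor of these mechanisms can be illustrated with Python's standard xmlrpc module, used here only as a stand-in for RMI/CORBA/DCOM/SOAP-style remote invocation (the method name and port are invented for the example): the server registers a function, and a client elsewhere on the network invokes it through a proxy as if it were local.

    # Server side: expose a function so a remote client can invoke it.
    from xmlrpc.server import SimpleXMLRPCServer

    def get_balance(account_id):
        return {"account": account_id, "balance": 42.0}   # stand-in business logic

    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_function(get_balance, "get_balance")
    # server.serve_forever()   # uncomment to actually run the server

    # Client side (run in another process):
    # import xmlrpc.client
    # proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    # print(proxy.get_balance("A-17"))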
RMI architecture

(Figure: a client browser fetches HTML pages and Java applets from the web server over HTTP; the applet's stub communicates with the skeleton of the object implementation on the application server over JRMP / IIOP.)
CORBA architecture

(Figure: a client browser fetches HTML pages and Java applets from the web server over HTTP; the applet uses the client ORB to communicate over IIOP with the server ORB in front of the object implementation on the application server.)
DCOM architecture

(Figure: a client browser fetches HTML pages and ActiveX controls from the web server over HTTP; the ActiveX control's proxy communicates over DCOM with the stub of the object implementation on the application server.)
Limitations of Components

Tightly coupled
Cross language/ platform issues
Interoperability issues
Maintenance and management
Security issues
Application Centric
(Figure: separate application silos for Finance, Supply, Manufacturing, and Distribution, each with its own application architecture, serving business consumers through limited business processes.)
• Narrow scope and limited business processes
• Integration bound to the EAI vendor
• Redundancy: overlapped resources and overlapped providers
• EAI 'leverages' application silos with the drawback of data and function redundancy
• Business functionality is duplicated in each application that requires it
Goal - Service Centric
What are services?
 A service is
Autonomous unit of automated business
logic
Accessible to other systems
 A service represents
Business process
Sub process
Activity (process step)
(or multiple)
What is Service Architecture?
• A collection of services

• classified into types

• arranged into layers

• Governed by architectural patterns and policies

(Figure source: TietoEnator AB, Kurts Bilder)
SOA Defined
SOA is a software architecture model
in which business functionality is logically grouped and
encapsulated into self-contained, distinct and reusable
units called services that
represent a high level business concept
can be distributed over a network
can be reused to create new business applications
contain contract with specification of the purpose,
functionality, interfaces (coarse grained), constraints,
usage

Services are autonomous, discrete and reusable units of
business functionality exposing their capabilities in the form of
contracts. Services can be independently evolved,
moved, and scaled, even at runtime.
Big (outer) vs. Little (inner) SOA
Service Relationships (figure, not a use case diagram): students use GUI
applications, which invoke services; services orchestrate / are composed of
other services and are exposed by business applications; students participate
in business processes and have goals; goals are supported by business
processes, which are (partially) realized by services; services help users
achieve those goals.
Why SOA?

 Heterogeneous cross-platform
 Reusability at the macro (service) level rather
than micro(object) level
 Interconnection to - and usage of - existing IT
(legacy) assets
 Granularity, modularity, composability,
componentization
 Compliance with industry standards
SOA is an evolutionary step
 for architecture
 in reusability and communication
 in distributed communications (figure comparing EAI, project-ware and SOA)

source: Sam Gentile
Features of SOA

 Self- describing Interface (WSDL)


 Message communication via formally defined
XML
 Services are maintained in a registry
 Each service has a Quality Of Service
 Applications adapt to changing technologies
 Easy integration of applications with other
systems
 Leverage existing investments in legacy
applications
Service Architecture Composition

 Service architectures are composed of


 Services
• Units of processing logic
• Example: Credit card Service
 Messages
• Units of communications between services
• Needed for services to do their job

 Operations
• Units of Work
• Example: Determine Cost of Attendance
 Processes
• Composed / orchestrated groups of services
• Example: Financial Aid Disbursement
SOA principles

 Service Encapsulation
 Service Loose coupling
 Service Contract
 Service abstraction
 Service Documentation
 Service reusability
 Service composability
 Service autonomy
 Service optimization and Discovery
 Service statelessness
Loose Coupling

“Service contracts impose low consumer coupling


requirements and are themselves decoupled from their
surrounding environment."

Create specific types of relationships within and outside


of service boundaries with a constant emphasis on
reducing (“loosening”) dependencies

between
 Service contract
 Service implementation
 Service consumers
Source: Thomas Erl
Standardized Service Contracts

 “Services within the same service inventory are in


compliance with the same contract design standards."

 Services use service contract to


 Express their purpose
 Express their capabilities

 Use formal, standardized service contracts

 Focus on the areas of


 Functional expression
 Data representation Source: Thomas Erl

 Policy
Abstraction
 “Service contracts only contain essential
information and information about services is
limited to what is published in service contracts”

 Avoid the proliferation of unnecessary service


information, meta-data.

 Hide as much of the underlying details of a


service as possible.
 Enables and preserves the loosely coupled
relationships
 Plays a significant role in the positioning and
design of service compositions Source: Thomas Erl
Reusability

 “Services contain and express agnostic logic and


can be positioned as reusable enterprise
resources."

 Reusable services have the


following characteristics:
 Defined by an agnostic functional context
 Logic is highly generic
 Has a generic and extensible contract Source: Thomas Erl
 Can be accessed concurrently
Composability

 "Services are effective


composition
participants,
regardless of the size
and complexity of the
composition."

 Ensures services are


able to participate in
multiple compositions
to solve multiple larger
Source: Thomas Erl
problems
Autonomy

 "Services exercise a high level of control over


their underlying runtime execution environment."

 Represents the ability of a service to carry out its


logic independently of outside influences

 To achieve this, services must be more isolated

 Primary benefits
 Increased reliability
 Behavioral predictability

Source: Thomas Erl


Discoverability

 "Services are supplemented with communicative


meta data by which they can be effectively
discovered and interpreted."

 Service contracts contain appropriate meta data for


discovery which also communicates purpose and
capabilities to humans

 Store meta data in a


service registry or profile
documents
Source: Thomas Erl
Statelessness

 "Services minimize resource consumption by


deferring the management of state information
when necessary."

 Incorporate state management deferral


extensions within a service design

 Goals
 Increase service scalability
 Support design of agnostic
logic and improve service reuse Source: Thomas Erl
Applying SOA - Governance
 Governance is a program that makes sure people
do what is ‘right’

 In conjunction with software, governance controls


the development and operation of software

Goal: Establish SOA organization governance (SOA


Board) that governs SOA efforts and breaks down
capabilities into non-overlapping services
Applying SOA - Governance

Policies
 Codification of laws, regulations, corporate
guidelines and best practices
 Must address all stages of the service lifecycle
(technology selection, design, development
practices, configuration management, release
management, runtime management, etc.)
Applying SOA - Governance

Processes
 Enforce policies
 System-driven processes (code check-in, code
builds, unit tests)
 Human-driven process (requests, design
reviews, code reviews, threat assessment, test
case review, release engineering, service
registration, etc.)
Applying SOA - Governance

Metrics
 Measurements of service reuse, compliance
with policy, etc.
 Organization
 Governance program should be run by SOA
Board, which should have cross-functional
representatives
Applying SOA – Service Blocks (figure): business capabilities are exposed as
foundation service blocks built on core APIs (GeoMedia, TerraShare, I/CAD,
G/Technology and other products). Governance roles shown: the SOA Board owns
the service implementation and deployment model and performs technical review
of services; Enterprise Architects maintain standards, policies, patterns and
the business service model; Designers produce the service specification and
service model; Software and IT Architects build the services.
Applying SOA - Challenges

 Service Orientation
   Business functionality has to be made available as services.
   Service contracts must be fixed.

 Reuse
   Implemented services must be designed with reuse in mind.
   This creates some overhead.

 Sharing of Responsibilities
   Potential service users must be involved in the design
   process and will have influence on the service design.

 Increased complexity!
Applying SOA – Renovation
Roadmap

(Source: Enterprise SOA: Service Oriented Architecture Best Practices


by Dirk Krafzig, Karl Banke, and Dirk Slama, Prentice Hall 2004)
Service Oriented Architecture model
Before SOA – After SOA

source:IBM
Why SOA?
To enable flexible, federated business processes (figure): a virtual federation
of participants collaborates in an end-to-end business process built from
services (identification, ordering, ticket sales, ticket collection, inventory,
logistics, availability, manufacturing); SOA enables alternative
implementations, reuse of services, virtualization of business resources and
aggregation from multiple providers.

source: TietoEnator AB, Kurts Bilder
Why SOA? To enable business process optimization
and the Real Time Enterprise (RTE) (figure): BPM is expressed in terms of
services provided and consumed, giving a seamless end-to-end process from the
service offered to customers, through the enterprise, to services from multiple
suppliers, reaching smart clients, store POS systems, mobile devices, 3rd-party
agents, portals and internal systems. SOA patterns illustrated: a single
multi-channel service for consistency, and a standardized service provided by
multiple suppliers.

source: TietoEnator AB, Kurts Bilder
Why SOA?
To enable structural improvement (figure): standardizing capabilities (e.g. a
single Customer Details service), information consistency and policy
consistency, reducing the impact of change, consolidation/selection of systems
(e.g. multiple CRM/ERP sources of customer details), encapsulating
implementation complexity, policy rationalization and evolution, and resource
virtualization.
Service Architecture Organized by Layers

Example layers (top to bottom): presentation & workflow; composed services;
basic services; underlying API.

Reasons for layering:
1. Flexible composition.
2. Reuse.
3. Functional standardization in lower levels.
4. Customization in higher layers.
5. Separation of concerns.
6. Policies may vary by layer.

according to: TietoEnator AB, Kurts Bilder
Different layers of SOA
Service Composition Example (figure): the Aid Disbursement process is realized
by an Aid Disburse service, which orchestrates services in the service
interface layer (FA Award service, Registration service, Loan service, Bursar
service, plus account info / debit account operations). These interface
services are not physical; they are executed in / controlled by the application
logic of the underlying systems: the FA system (Microsoft .NET), the Registrar
system (mainframe), the Dept of Ed (???) and the Bursar system (Java on Linux).
Applying services to the problem

Before (monolithic "The System"):
 System replacement is a total process
 System modules are tightly interdependent, making change difficult

After (services S1, S2, S3, S4):
 System composed of many logical service units (decomposition)
 Underlying business logic decoupled as much as possible from other
 services (autonomy and loose coupling)
Goal of SOA

 Loosely coupled
 The goal for a SOA is a world wide mesh of
collaborating services, which are published
and available for invocation on the Service
Bus.
 SOA is not just an architecture of services
seen from a technology perspective, but the
policies, practices, and frameworks by which
we ensure the right services are provided and
consumed.
Major service types

Basic Services:
 Data-centric and logic-centric services
 Encapsulate data behavior and the data model and ensure data
consistency (on one backend only).
 Basic services are stateless services with a high degree of reusability.
 Represent the fundamental SOA maturity level and are usually built on
top of an existing legacy API (underlying services).
Major service types

Composed Services :
expose harmonized access to inconsistent basic
services technology (gateways, adapters, façades,
and functionality-adding services).
Encapsulate business specific workflows or
orchestrated services.
Service Types (figure): SOA management & security (service mediation, routing,
trust enablement: ESB, service registry); multi-channel applications (mobile,
smart, thin and thick clients, portals); composite services (business-centric
services, orchestrated workflows, and intermediate services such as gateways
and façades); basic services (data-centric and logic-centric consistent
services, highly reusable and stateless); all resting on the foundation
service blocks and core APIs (GeoMedia, TerraShare, I/CAD, G/Technology and
other products) that expose the business capabilities.
SOA Benefits Summary
 Allow us to execute complex business
processes by composing systems from small,
less complex building blocks
 Fosters collaborative business and technical
environment through sharing and coordination
of services
 Create outward facing self-service applications
not constrained by organizational boundaries
 Enables creating adaptive, agile systems that
are resilient to changes in the business
environment
Conclusions
 SOA represents a fundamental change to the
way information systems will be designed in the
future
 Long term impact on IT portfolio management
is dramatic
 Adds a significant dimension to system
evaluation process
 Undertaking SOA requires commitment from
all levels of the organization and significant
investments (people, process, and tools)
Conclusion and Summary

 If done correct, SOA is “not just another


architectural fad”

 SOA seeks to bridge the gap between business


and technology promoting business agility (its all
about managing change)

 SOA
 Is complex
 Requires governance
 Requires executive management buy-in
 Requires commitment with resources
Thank You.
Chapter-06
Cloud programming and
software environments

1
Introduction: A Golden Era in
Computing
• Powerful multi-core processors
• General purpose graphic processors
• Explosion of domain applications
• Superior software methodologies
• Proliferation of devices
• Wider bandwidth for communication
• Virtualization leveraging the powerful hardware
Cloud Concepts, Enabling-
technologies, and Models:
The Cloud Context

3
Evolution of Internet Computing (figure): over time and with growing scale, the
web has moved from publish/inform, to interact, integrate and transact, to
discover (intelligence, semantic discovery, the deep web) and automate, and on
to HPC and cloud, data-intensive computing, social media and networking, and
data marketplaces and analytics.
Top Ten Largest Databases (2007) (bar chart, sizes in terabytes, up to roughly
7,000 TB): LOC, CIA, Amazon, YouTube, ChoicePoint, Sprint, Google, AT&T, NERSC,
Climate.

Ref: http://www.focus.com/fyi/operations/10-largest-databases-in-the-world/
Challenges
• Alignment with the needs of the business / user / non-
computer specialists / community and society
• Need to address the scalability issue: large scale data, high
performance computing, automation, response time, rapid
prototyping, and rapid time to production
• Need to effectively address (i) ever shortening cycle of
obsolescence, (ii) heterogeneity and (iii) rapid changes in
requirements
• Transform data from diverse sources into intelligence and
deliver intelligence to right people/user/systems
• What about providing all this in a cost-effective manner?

6
Enter the cloud

• Cloud computing is Internet-based computing,


whereby shared resources, software and
information are provided to computers and other
devices on-demand, like the electricity grid.
• The cloud computing is a culmination of numerous
attempts at large scale computing with seamless
access to virtually limitless resources.
o on-demand computing, utility computing, ubiquitous computing,
autonomic computing, platform computing, edge computing, elastic
computing, grid computing, …

7
“Grid Technology: A slide from my presentation

to Industry (2005)
• Emerging enabling technology.
• Natural evolution of distributed systems and the Internet.
• Middleware supporting network of systems to facilitate
sharing, standardization and openness.
• Infrastructure and application model dealing with sharing of
compute cycles, data, storage and other resources.
• Publicized by prominent industries as on-demand computing,
utility computing, etc.
• Move towards delivering “computing” to masses similar to
other utilities (electricity and voice communication).”
• Now,
Hmmm…sounds like the definition for cloud computing!!!!!

8
It is a changed world now…
• Explosive growth in applications: biomedical informatics, space
exploration, business analytics, web 2.0 social networking: YouTube,
Facebook
• Extreme scale content generation: e-science and e-business data
deluge
• Extraordinary rate of digital content consumption: digital gluttony:
Apple iPhone, iPad, Amazon Kindle
• Exponential growth in compute capabilities: multi-core, storage,
bandwidth, virtual machines (virtualization)
• Very short cycle of obsolescence in technologies: Windows Vista →
Windows 7; Java versions; C → C#; Python
• Newer architectures: web services, persistence models, distributed
file systems/repositories (Google, Hadoop), multi-core, wireless and
mobile
• Diverse knowledge and skill levels of the workforce
• You simply cannot manage this complex situation with your
traditional IT infrastructure:

9
Answer: The Cloud Computing?

• Typical requirements and models:


o platform (PaaS),
o software (SaaS),
o infrastructure (IaaS),
o Services-based application programming interface (API)

• A cloud computing environment can provide one


or more of these requirements for a cost
• Pay as you go model of business
• When using a public cloud the model is similar to
renting a property than owning one.
• An organization could also maintain a private cloud
and/or use both.

10
Enabling Technologies

Cloud applications (data-intensive, compute-intensive, storage-intensive) sit
on top of a web-services interface (SOA, WS standards), backed by storage
models (S3, BigTable, BlobStore, ...), virtualization (bare metal, hypervisor,
VM0..VMn), multi-core architectures and 64-bit processors, all connected by
high bandwidth (figure).
Common Features of Cloud Providers (figure):
• Development environment (IDE, SDK, plug-ins) and production environment
• Storage: simple tables, <key, value> stores, drives and blob storage,
  accessible through web services
• Management console and monitoring tools, with multi-level security
Windows Azure
• Enterprise-level on-demand capacity builder
• Fabric of cycles and storage available on-request
for a cost
• You have to use Azure API to work with the
infrastructure offered by Microsoft
• Significant features: web role, worker role , blob
storage, table and drive-storage

13
Amazon EC2
• Amazon EC2 is one large complex web service.
• EC2 provided an API for instantiating computing
instances with any of the operating systems
supported.
• It can facilitate computations through Amazon
Machine Images (AMIs) for various other models.
• Signature features: S3, Cloud Management
Console, MapReduce Cloud, Amazon Machine
Image (AMI)
• Excellent distribution, load balancing, cloud
monitoring tools

14
Google App Engine
• This is more a web interface for a development
environment that offers a one-stop facility for the design,
development and deployment of applications in Java, Go
and Python.
• Google offers the same reliability, availability and
scalability at par with Google’s own applications
• Interface is software programming based
• Comprehensive programming platform irrespective
of the size (small or large)
• Signature features: templates and appspot,
excellent monitoring and management console
15
Demos
• Amazon AWS: EC2 & S3 (among the many
infrastructure services)
o Linux machine
o Windows machine
o A three-tier enterprise application

• Google app Engine


o Eclipse plug-in for GAE
o Development and deployment of an application

• Windows Azure
o Storage: blob store/container
o MS Visual Studio Azure development and production environment

16
Cloud Programming Models

17
The Context: Big-data
• Mining the huge amounts of data collected in a wide range of
domains, from astronomy to healthcare, has become essential for
planning and performance.
• We are in a knowledge economy.
o Data is an important asset to any organization
o Discovery of knowledge; Enabling discovery; annotation of
data
o Complex computational models
o No single environment is good enough: need elastic, on-
demand capacities
• We are looking at newer
o Programming models, and
o Supporting algorithms and data structures.

18
Google File System
• The Internet introduced a new challenge in the form of web
logs and web crawler data: large, “peta scale” data sets.
• But observe that this type of data has a uniquely
different characteristic than your transactional or
“customer order” data: it is “write once read many
(WORM)” data;
• Privacy-protected healthcare and patient information;
• Historical financial data;
• Other historical data

• Google exploited this characteristic in its Google File
System (GFS)

19
What is Hadoop?
• At Google, MapReduce operations are run on a
special file system called the Google File System (GFS)
that is highly optimized for this purpose.
• GFS is not open source.
• Doug Cutting and others at Yahoo! reverse
engineered the GFS and called it the Hadoop Distributed
File System (HDFS).
• The software framework that supports HDFS,
MapReduce and other related entities is called the
project Hadoop or simply Hadoop.
• This is open source and distributed by Apache.
20
Fault tolerance

• Failure is the norm rather than the exception
• An HDFS instance may consist of thousands of server
machines, each storing part of the file system’s data.
• Since there is a huge number of components and each
component has a non-trivial probability of failure, there
is always some component that is non-functional.
• Detection of faults and quick, automatic recovery from
them is a core architectural goal of HDFS.
21
HDFS Architecture (figure): a Namenode holds the metadata (file names and
replica counts, e.g. /home/foo/data, 6, ...); clients send metadata operations
to the Namenode and perform block reads and writes directly against Datanodes,
which are spread across racks (Rack 1, Rack 2) and replicate blocks among
themselves.

Hadoop Distributed File System (figure): an HDFS client application works
through a local file system with a small block size (e.g. 2K), while the HDFS
server (master node with the Name Node) manages large, replicated blocks
(e.g. 128M).
What is MapReduce?
• MapReduce is a programming model Google has used
successfully in processing its “big-data” sets (~20 peta bytes
per day)
 A map function extracts some intelligence from raw data.
 A reduce function aggregates, according to some guides, the
data output by the map.
 Users specify the computation in terms of a map and a reduce
function.
 The underlying runtime system automatically parallelizes the
computation across large-scale clusters of machines, and
 also handles machine failures, efficient
communications, and performance issues.
-- Reference: Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data
processing on large clusters. Communications of the ACM 51, 1 (Jan. 2008),
107-113.
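The classic word-count example, written here as a minimal single-process Python sketch rather than a real Hadoop job (the names map_fn, reduce_fn and word_count are illustrative), shows the division of labor between the user-supplied map and reduce functions and the framework's grouping step:

from collections import defaultdict

def map_fn(document):
    # map: emit a <word, 1> pair for every word in the raw input
    for word in document.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # reduce: aggregate all values emitted for the same key
    return word, sum(counts)

def word_count(documents):
    groups = defaultdict(list)
    for doc in documents:                    # map phase
        for key, value in map_fn(doc):
            groups[key].append(value)        # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in groups.items())   # reduce phase

print(word_count(["hey jude hey", "let it be"]))
# {'hey': 2, 'jude': 1, 'let': 1, 'it': 1, 'be': 1}

In a real MapReduce run the grouping and the reduce calls are distributed over a cluster by the framework; only map_fn and reduce_fn are the user's responsibility.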

24
Classes of problems that are “mapreducable”

• Benchmark for comparing: Jim Gray’s challenge on data-
intensive computing. Ex: “Sort”
• Google uses it for wordcount, adwords, pagerank, indexing
data.
• Simple algorithms such as grep, text-indexing, reverse
indexing
• Bayesian classification: data mining domain
• Facebook uses it for various operations: demographics
• Financial services use it for analytics
• Astronomy: Gaussian analysis for locating extra-terrestrial
objects.
• Expected to play a critical role in the semantic web and in
web 3.0
25
MapReduce data flow (figure): large-scale data is split and fed to mappers,
which emit <key, value> pairs (e.g. <key, 1>); the pairs are parsed/hashed into
partitions (P-0000, P-0001, P-0002, ...) and reducers (say, Count) aggregate
each partition into counts (count1, count2, count3, ...).
MapReduce Engine
• MapReduce requires a distributed file system and an
engine that can distribute, coordinate, monitor and
gather the results.
• Hadoop provides that engine through HDFS (the file
system we discussed earlier) and the JobTracker +
TaskTracker system.
• JobTracker is simply a scheduler.
• TaskTracker is assigned a Map or Reduce (or other
operations); Map or Reduce run on a node and so does
the TaskTracker; each task is run in its own JVM on a
node.
27
Thank You

28
Grid Computing

Grid computing systems and resource


management
Chapter-07
Grid systems
• Many!!!
• Classification: (depends on the author)
– Computational grid:
• distributed supercomputing (parallel application
execution on multiple machines)
• high throughput (stream of jobs)
– Data grid: provides the way to solve large
scale data management problems
– Service grid: systems that provide services
that are not provided by any single local
machine.
• on demand: aggregate resources to enable new
services
• Collaborative: connect users and applications via a
virtual workspace
• Multimedia: infrastructure for real-time multimedia
applications
Taxonomy of Applications
 Distributed supercomputing consume CPU cycles
and memory

 High-Throughput Computing unused processor cycles

 On-Demand Computing meet short-term requirements


for resources that cannot be cost-effectively or
conveniently located locally.

 Data-Intensive Computing

 Collaborative Computing enabling and enhancing


human-to-human interactions (eg: CAVE5D system
supports remote, collaborative exploration of large
geophysical data sets and the models that generated
them)
Alternative classification
• independent tasks
• loosely-coupled tasks
• tightly-coupled tasks

4
Application Management (figure): an application goes through description,
partitioning, mapping and allocation onto grid nodes (grid node A, grid
node B), under continuous management.


Description
• Use a grid application description
language
• Grid-ADL and GEL
– One can take advantage of loop construct to
use compilation mechanisms for vectorization

6
Grid-ADL (figure): a traditional description lists every task in the workflow
graph individually (1, 2, ..., 5, 6), whereas the alternative notation
abbreviates a range of similar tasks (2 .. 5).
Partitioning/Clustering
• Application represented as a graph
– Nodes: job
– Edges: precedence
• Graph partitioning techniques:
– Minimize communication
– Increase throughput or speedup
– Need good heuristics
• Clustering
8
Graph Partitioning
• Optimally allocating the components
of a distributed program over several
machines
• Communication between machines is
assumed to be the major factor in
application performance
• NP-hard for case of 3 or more
terminals
9
Collapse the graph

• Given G = {N, E, M}
• N is the set of Nodes
• E is the set of Edges
• M is the set of
machine nodes

10
Dominant Edge
• Take node n and its
heaviest edge e
• Edges e1,e2,…er with
opposite end nodes not in
M
• Edges e´1,e´2,…e´k with
opposite end nodes in M
• If w(e) ≥ Sum(w(ei)) +
max(w(e´1),…,w(e´k))
• Then the min-cut does
not contain e
• So e can be collapsed
11
Machine Cut
• Let machine cut Mi be the
set of all edges between
a machine mi and non-
machine nodes N
• Let Wi be the sum of the
weights of all edges in the
machine cut Mi
• Wi’s are sorted so
W1 ≥ W2 ≥ …
• Any edge that has a
weight greater than W2
cannot be part of the min-
cut

12
Zeroing
• Assume that node n has edges to each of
the m machines in M with weights
w1 ≤ w2 ≤ … ≤ wm
• Reducing the weights of each of the m
edges from n to machines M by w1 doesn’t
change the assignment of nodes for the
min-cut
• It reduces the cost of the minimum cut by
(m-1)w1
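For example, if node n has edges of weights w1 = 2, w2 = 5 and w3 = 7 to the
three machine nodes, subtracting w1 = 2 from each edge leaves weights 0, 3
and 5; the min-cut assignment of nodes is unchanged and its cost drops by
(3-1)·2 = 4.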
13
Order of Application
• If the previous 3 techniques are repeatedly
applied on a graph until none of them are
applicable:
• Then the resulting reduced graph is
independent of the order of application of
the techniques

14
Output
• List of nodes collapsed into each of the
machine nodes
• Weight of edges connecting the machine
nodes

• Source: Graph Cutting Algorithms for Distributed Applications Partitioning, Karin


Hogstedt, Doug Kimelman, VT Rajan, Tova Roth, and Mark Wegman, 2001

• homepages.cae.wisc.edu/~ece556/fall2002/PROJECT/distributed_applications.ppt

15
Graph partitioning
• Hendrickson and Kolda, 2000: edge cuts:
– are not proportional to the total
communication volume
– try to (approximately) minimize the total
volume but not the total number of messages
– do not minimize the maximum volume and/or
number of messages handled by any single
processor
– do not consider distance between processors
(number of switches the message passes
through, for example)
– undirected graph model can only express
symmetric data dependencies.
16
Graph partitioning
• To avoid message contention and improve
the overall throughput of the message
traffic, it is preferable to have
communication restricted to processors
which are near each other

• But, edge-cut is appropriate to applications


whose graph has locality and few
neighbors

17
Kwok and Ahmad, 1999:
multiprocessor scheduling taxonomy

18
List Scheduling
• make an ordered list of processes by assigning them some
priorities
• repeatedly execute the following two steps until a valid schedule
is obtained:
– Select from the list, the process with the highest priority for
scheduling.
– Select a resource to accommodate this process.
• priorities are determined statically before the scheduling process
begins. The first step chooses the process with the highest
priority, the second step selects the best possible resource.
• Some known list scheduling strategies:
• Highest Level First algorithm or HLF
• Longest Path algorithm or LP
• Longest Processing Time
• Critical Path Method

• List scheduling algorithms only produce good results for coarse-


grained applications
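A minimal Python sketch of the list-scheduling loop described above (names are illustrative and precedence constraints are omitted for brevity): tasks are ordered by a static priority, and each task is placed on the resource that becomes free earliest.

import heapq

def list_schedule(tasks, num_resources):
    # tasks: (name, duration, priority); higher priority is scheduled first
    ordered = sorted(tasks, key=lambda t: -t[2])
    free_at = [(0, r) for r in range(num_resources)]   # (time resource is free, id)
    heapq.heapify(free_at)
    schedule = []
    for name, duration, _priority in ordered:
        start, resource = heapq.heappop(free_at)        # earliest-free resource
        schedule.append((name, resource, start, start + duration))
        heapq.heappush(free_at, (start + duration, resource))
    return schedule

tasks = [("t1", 4, 10), ("t2", 2, 8), ("t3", 3, 9), ("t4", 1, 5)]
for name, res, start, end in list_schedule(tasks, num_resources=2):
    print(f"{name} -> resource {res}, time {start}..{end}")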
Static scheduling task precedence graph
DSC: Dominance Sequence Clustering
• Yang and Gerasoulis, 1994: two step
method for scheduling with
communication:(focus on the critical path)
1) schedule an unbounded number of
completely connected processors (cluster of
tasks);
2) if the number of clusters is larger than the
number of available processors, then merge
the clusters until it gets the number of real
processors, considering the network
topology (merging step).
Graph partitioning
• Kumar and Biswas, 2002: MiniMax
– multilevel graph partitioning scheme
– Grid-aware
– consider two weighted undirected
graphs:
• a work-load graph (to model the problem
domain)
• a system graph (to model the
heterogeneous system)
21
Resource Management (figure: scheduling taxonomy, 1988).

Source: P. K. V. Mangan, Ph.D. Thesis, 2006


Resource Management
• The scheduling algorithm has four
components:
– transfer policy: when a node can take part of
a task transfer;
– selection policy: which task must be
transferred;
– location policy: which node to transfer to;
– information policy: when to collect system
state information.
23
Resource Management
• Location policy:
– Sender-initiated
– Receiver-initiated
– Symetrically-initiated

24
Scheduling mechanisms for grid
• Berman, 1998 (ext. by Kayser, 2006):
– Job scheduler
– Resource scheduler
– Application scheduler
– Meta-scheduler

25
Scheduling mechanisms for grid
• Legion
– University of Virginia (Grimshaw, 1993)
– Supercomputing 1997
– Currently Avaki commercial product

26
Legion
• is an object oriented infrastructure for grid
environments layered on top of existing
software services.
• uses the existing operating systems,
resource management tools, and security
mechanisms at host sites to implement
higher level system-wide services
• design is based on a set of core objects
27
Legion
• resource management is a negotiation between
resources and active objects that represent the
distributed application
• three steps to allocate resources for a task:
– Decision: considers task’s characteristics and
requirements, resource’s properties and policies, and
users’ preferences
– Enactment: the class object receives an activation
request; if the placement is acceptable, start the task
– Monitoring: ensures that the task is operating
correctly

28
Globus
• Toolkit with a set of components that implement basic
services:
– Security
– resource location
– resource management
– data management
– resource reservation
– Communication
• From version 1.0 in 1998 to the 2.0 release in 2002 and
the latest 3.0, the emphasis is to provide a set of
components that can be used either independently or
together to develop applications
• The Globus Toolkit version 2 (GT2) design is highly
related to the architecture proposed by Foster et al.
• The Globus Toolkit version 3 (GT3) design is based on
grid services, which are quite similar to web services.
GT3 implements the Open Grid Service Infrastructure
(OGSI).
• The current version, GT4, is also based on grid services,
but with some changes in the standard
Globus: scheduling
• GRAM: Globus Resource Allocation Manager
• Each GRAM responsible for a set of resources operating under the
same site-specific allocation policy, often implemented by a local
resource management
• GRAM provides an abstraction for remote process queuing and
execution with several powerful features such as strong security and
file transfer
• It does not provide scheduling or resource brokering capabilities but it
can be used to start programs on remote resources, despite local
heterogeneity due to the standard API and protocol.
• Resource Specification Language (RSL) is used to communicate
requirements.
• To take advantage of GRAM, a user still needs a system that can
remember what jobs have been submitted, where they are, and what
they are doing.
• To track large numbers of jobs, the user needs queuing, prioritization,
logging, and accounting. These services cannot be found in GRAM
alone, but are provided by systems such as Condor-G
30
MyGrid and OurGrid
• Mainly for bag-of-tasks (BoT) applications
• uses the dynamic algorithm Work Queue
with Replication (WQR)
• hosts that finished their tasks are assigned
to execute replicas of tasks that are still
running.
• Tasks are replicated until a predefined
maximum number of replicas is achieved
(in MyGrid, the default is one).
31
OurGrid
• An extension of MyGrid
• resource sharing system based on peer-
to-peer technology
• resources are shared according to a
“network of favors model”, in which each
peer prioritizes those who have credit in
their past history of interactions.

32
GrADS
• is an application scheduler
• The user invokes the Grid Routine component to execute an application
• The Grid Routine invokes the component Resource Selector
• The Resource Selector accesses the Globus MetaDirectory Service (MDS)
to get a list of machines that are alive and then contact the Network
Weather Service (NWS) to get system information for the machines.
• The Grid Routine then invokes a component called Performance Modeler
with the problem parameters, machines and machine information.
• The Performance Modeler builds the final list of machines and sends it to
the Contract Developer for approval.
• The Grid Routine then passes the problem, its parameters, and the final list
of machines to the Application Launcher.
• The Application Launcher spawns the job using the Globus management
mechanism (GRAM) and also spawns the Contract Monitor.
• The Contract Monitor monitors the application, displays the actual and
predicted times, and can report contract violations to a re-scheduler.

• Although the execution model is efficient from the application perspective, it


does not take into account the existence of other applications in the
system.
GrADS
• Vadhiyar and Dongarra, 2002: proposed a
metascheduling architecture in the context
of the GrADS Project.
• The metascheduler receives candidate
schedules of different application level
schedulers and implements scheduling
policies for balancing the interests of
different applications.

34
EasyGrid
• Mainly concerned with MPI applications
• Allows intercluster execution of MPI
processes

35
Nimrod
• uses a simple declarative parametric modeling language
to express parametric experiments
• provides machinery that automates:
– task of formulating,
– running,
– monitoring,
– collating results from the multiple individual experiments.
• incorporates distributed scheduling that can manage the
scheduling of individual experiments to idle computers in
a local area network
• has been applied to a range of application areas, e.g.:
Bioinformatics, Operations Research, Network
Simulation, Electronic CAD, Ecological Modelling and
Business Process Simulation.
36
Nimrod/G

37
AppLeS
• UCSD (Berman and Casanova)
• Application parameter Sweep Template
• Use scheduling based on min-min, min-
max, sufferage, but with heuristics to
estimate performance of resources and
tasks
– Performance information dependent
algorithms (pida)
• Main goal: to minimize file transfers
38
GRAnD [Kayser et al., CCP&E, 2007]

• Distributed submission control


• Data locality
• automatic staging of data
• optimization of file transfer

39
Vega GOS (the CNGrid OS)
GOS overview
A user-level middleware running on a client
machine

• GOS has 2 components: GOS and gnetd


- GOS is a daemon running on the client
machine
- gnetd is a daemon on the grid server
40
GOS
• Grid process and Grid thread
– Grid process is a unit for managing the whole resource of the Grid.
– Grid thread is a unit for executing computation on the Grid.

• GOS API
– GOS API for application developers
• grid(): constructs a Grid process on the client machine.
• gridcon(): grid process connects to the Grid system.
• gridclose(): close a connected grid.
– gnetd API for service developer on Grid servers
• grid_register(): register a service to Grid.
• grid_unregister(): unregister a service.

41
Grid
• Not yet mentioned:
– Simulation: SimGrid and GridSim
– Monitoring: RTM, MonaLisa, ...
– Portals: GridIce, Genius, ...

42
Introduction to P2P systems

Peer-to-Peer computing and


overlay networks
Chapter-08
P2P Systems
Use the vast resources of
machines at the edge of the
Internet to build a network
that allows resource sharing
without any central authority.
More than a system for
sharing pirated music/movies
Characteristics of P2P Systems
 Exploit edge resources.
 Storage, content, CPU, Human presence.
 Significant autonomy from any centralized
authority.
 Each node can act as a Client as well as a Server.
 Resources at edge have intermittent
connectivity, constantly being added &
removed.
 Infrastructure is untrusted and the components
are unreliable.
Overlay Network

A P2P network is an overlay network. Each link


between peers consists of one or more IP links.
Overlays : All in the application
layer
 Tremendous design
flexibility
 Topology, maintenance
 Message types
 Protocol
 Messaging over TCP or UDP
 Underlying physical
network is transparent
to developer
 But some overlays exploit
proximity
Overlay Graph
 Virtual edge
 TCP connection
 or simply a pointer to an IP address
 Overlay maintenance
 Periodically ping to make sure neighbor is still
alive
 Or verify aliveness while messaging
 If neighbor goes down, may want to establish new
edge
 New incoming node needs to bootstrap
 Could be a challenge under high rate of
churn
 Churn : dynamic topology and intermittent access
due to node arrival and failure
Overlay Graph
 Unstructured overlays
 e.g., new node randomly chooses existing
nodes as neighbors
 Structured overlays
 e.g., edges arranged in restrictive structure
P2P Applications
 P2P File Sharing
 Napster, Gnutella, Kazaa, eDonkey,
BitTorrent
 Chord, CAN, Pastry/Tapestry, Kademlia
 P2P Communications
 MSN, Skype, Social Networking Apps
 P2P Distributed Computing
 Seti@home
P2P File Sharing (example)
• Alice runs a P2P client application on her notebook computer.
• She intermittently connects to the Internet, getting a new IP address
  for each connection, and asks for the song “Hey Jude”.
• The application displays other peers that have a copy of Hey Jude.
• Alice chooses one of the peers, Bob.
• The file is copied from Bob’s PC to Alice’s notebook (P2P).
• While Alice downloads, other users upload from Alice (P2P).
P2P Communication
 Instant Messaging
 Skype is a VoIP P2P system

Example:
• Alice runs an IM client application on her notebook computer.
• She intermittently connects to the Internet, getting a new IP address
  for each connection, and registers herself with the “system”.
• She learns from the “system” that Bob, in her buddy list, is active.
• Alice initiates a direct TCP connection with Bob, then chats (P2P).
P2P/Grid Distributed
Processing
 seti@home
 Search for ET intelligence
 Central site collects radio telescope data
 Data is divided into work chunks of 300 Kbytes
 User obtains client, which runs in background
 Peer sets up TCP connection to central
computer, downloads chunk
 Peer does FFT on chunk, uploads results,
gets new chunk
 Not P2P communication, but exploit Peer
computing power
Promising properties of P2P
 Massive scalability
 Autonomy : non single point of failure
 Resilience to Denial of Service
 Load distribution
 Resistance to censorship
Key Issues
 Management
 How to maintain the P2P system under high rate
of churn efficiently
 Application reliability is difficult to guarantee
 Lookup
 How to find out the appropriate content/resource
that a user wants
 Throughput
 Content distribution/dissemination applications
 How to copy content fast, efficiently, reliably
Management Issue
 A P2P network must be self-organizing.
 Join and leave operations must be self-managed.
 The infrastructure is untrusted and the components are
unreliable.
 The number of faulty nodes grows linearly with system size.
 Tolerance to failures and churn
 Content replication, multiple paths
 Leverage knowledge of executing application
 Load balancing
 Dealing with freeriders
 Freerider : rational or selfish users who consume more than
their fair share of a public resource, or shoulder less than a
fair share of the costs of its production.
Lookup Issue
 How do you locate data/files/objects in a
large P2P system built around a dynamic set
of nodes in a scalable manner without any
centralized server or hierarchy?
 Efficient routing even if the structure of the
network is unpredictable.
 Unstructured P2P : Napster, Gnutella, Kazaa
 Structured P2P : Chord, CAN, Pastry/Tapestry,
Kademlia
Napster
 Centralized Lookup
 Centralized directory services
 Steps
 Connect to Napster server.
 Upload list of files to server.
 Give server keywords to search the full list
with.
 Select “best” of correct answers. (ping)
 Performance Bottleneck
 Lookup is centralized, but files are
copied in P2P manner
Gnutella
 Fully decentralized lookup for files
 The main representative of “unstructured P2P”
 Flooding based lookup
 Obviously inefficient lookup in terms of scalability
and bandwidth
Gnutella : Scenario
Step 0: Join the network
Step 1: Determining who is on the network
• "Ping" packet is used to announce your presence on the network.
• Other peers respond with a "Pong" packet.
• Also forwards your Ping to other connected peers
• A Pong packet also contains:
• an IP address
• port number
• amount of data that peer is sharing
• Pong packets come back via same route
Step 2: Searching
•Gnutella "Query" ask other peers (usually 7) if they have the file you desire
• A Query packet might ask, "Do you have any content that matches the string
‘Hey Jude"?
• Peers check to see if they have matches & respond (if they have any matches)
& send packet to connected peers if not (usually 7)
• Continues for TTL (how many hops a packet can go before it dies, typically 10 )
Step 3: Downloading
• Peers respond with a “QueryHit” (contains contact info)
• File transfers use direct connection using HTTP protocol’s GET method
Gnutella : Reachable Users
(analytical estimate)
T : TTL, N : Neighbors for Query
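A common back-of-the-envelope estimate, assuming the forwards form a loop-free
tree: with N neighbors per peer and TTL T, the number of reachable users is
roughly N + N(N-1) + N(N-1)^2 + ... + N(N-1)^(T-1); for example, N = 4 and
T = 5 reaches about 484 peers.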
Gnutella : Search Issue
 Flooding based search is extremely wasteful
with bandwidth
 A large (linear) part of the network is covered irrespective of
hits found
 Enormous number of redundant messages
 All users do this in parallel: local load grows linearly with
size
 What search protocols can we come up with in
an unstructured network
 Controlling topology to allow for better search
 Random walk, Degree-biased Random Walk
 Controlling placement of objects
 Replication
Gnutella : Random Walk
 Basic strategy
 In scale-free graph: high degree nodes are easy to find by (biased)
random walk
 Scale-free graph is a graph whose degree distribution follows a power
law
 And high degree nodes can store the
index about a large portion of the network
 Random walk
 avoiding the visit of last visited node
 Degree-biased random walk
 Select highest degree node, that has
not been visited
 This first climbs to highest degree node,
then climbs down on the degree sequence
 Provably optimal coverage
Gnutella : Replication
 Spread copies of objects to peers:
more popular objects can be found
easier
 Replication strategies
 When qi is the proportion of query for object i
 Owner replication
 Results in proportional replication to qi
 Path replication
 Results in square root replication to qi
 Random replication
 Same as path replication to qi, only using the given number of
random nodes, not the path
 But there is still the difficulty with rare objects.
KaZaA
 Hierarchical approach between
Gnutella and Napster
 Two-layered architecture.
 Powerful nodes (supernodes) act as local
index servers, and client queries are
propagated to other supernodes.
 Each supernode manages around 100-
150 children
 Each supernode connects to 30-50 other
supernodes
 More efficient lookup than Gnutella
and more scalable than Napster
KaZaA : SuperNode
 Nodes that have more connection bandwidth and are
more available are designated as supernodes
 Each supernode acts as a mini-Napster hub, tracking
the content (files) and IP addresses of its
descendants
 For each file: File name, File size, Content Hash, File
descriptors (used for keyword matches during query)
 Content Hash:
 When peer A selects file at peer B, peer A sends ContentHash
in HTTP request
 If download for a specific file fails (partially completes),
ContentHash is used to search for new copy of file.
KaZaA : Parallel Downloading and
Recovery
 If file is found in multiple nodes, user can
select parallel downloading
 Identical copies identified by ContentHash
 HTTP byte-range header used to request
different portions of the file from different
nodes
 Automatic recovery when server peer stops
sending file
 ContentHash
Unstructured vs Structured
 Unstructured P2P networks allow resources to be
placed at any node. The network topology is arbitrary,
and the growth is spontaneous.
 Structured P2P networks simplify resource
location and load balancing by defining a topology
and defining rules for resource placement.
 Guarantee efficient search for rare objects

What are the rules???

Distributed Hash Table (DHT)


Hash Tables
 Store arbitrary keys and
satellite data (value)
 put(key,value)
 value = get(key)
 Lookup must be fast
 Calculate hash function h()
on key that returns a
storage cell
 Chained hash table: Store
key (and optional value)
there
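As a rough illustration, a chained hash table with the put/get interface above might look like the following Python sketch (Python's built-in hash stands in for a real hash function such as SHA-1):

class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _h(self, key):
        # hash function h(): maps a key to a storage cell (bucket)
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)      # overwrite an existing key
                return
        bucket.append((key, value))           # otherwise chain a new entry

    def get(self, key):
        for k, v in self.buckets[self._h(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.put("hey jude", "10.0.0.7:6881")
print(table.get("hey jude"))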
Distributed Hash Table
 Hash table functionality in a P2P network :
lookup of data indexed by keys
 Key-hash  node mapping
 Assign a unique live node to a key
 Find this node in the overlay network quickly and
cheaply
 Maintenance, optimization
 Load balancing : maybe even change the key-hash
 node mapping on the fly
 Replicate entries on more nodes to increase
robustness
Distributed Hash Table
Structured P2P Systems
 Chord
 Consistent hashing based ring structure
 Pastry
 Uses ID space concept similar to Chord
 Exploits concept of a nested group
 CAN
 Nodes/objects are mapped into a d-dimensional
Cartesian space
 Kademlia
 Similar structure to Pastry, but the method to check the
closeness is XOR function
Chord
 Consistent hashing
based on an ordered
ring overlay
 Both keys and nodes
are hashed to 160 bit
IDs (SHA-1)
 Then keys are assigned
to nodes using
consistent hashing
 Successor in ID space
Chord : hashing properties
 Consistent hashing
 Randomized
 All nodes receive roughly equal share of load
 Local
 Adding or removing a node involves an O(1/N) fraction
of the keys getting new locations
 Actual lookup
 Chord needs to know only O(log N) nodes in
addition to successor and predecessor to achieve
O(log N) message complexity for lookup
Chord : Primitive Lookup
 Lookup query is
forwarded to
successor.
 one way
 Forward the query
around the circle
 In the worst case,
O(N) forwarding is
required
 In two ways, O(N/2)
Chord : Scalable Lookup

The i-th entry of a node's finger table points to the successor of the key
(nodeID + 2^i). A finger table has O(log N) entries, so the scalable lookup is
bounded by O(log N) hops.
Chord : Node join
 A new node has to
 Fill its own successor, predecessor and fingers
 Notify other nodes for which it can be a successor, predecessor
of finger
 Simpler way : Find its successor, then stabilize
 Immediately join the ring (lookup works), then modify the
structure
Chord : Stabilization
 If the ring is correct, then routing is correct,
fingers are needed for the speed only
 Stabilization
 Each node periodically runs the stabilization
routine
 Each node refreshes all fingers by periodically
calling find_successor(n + 2^(i-1)) for a random i
 Periodic cost is O(logN) per node due to finger
refresh
Chord : Failure handling
 Failed nodes are handled by
 Replication: instead of one successor, we keep r
successors
 More robust to node failure (we can find our new
successor if the old one failed)
 Alternate paths while routing
 If a finger does not respond, take the previous finger, or
the replicas, if close enough
 At the DHT level, we can replicate keys on
the r successor nodes
 The stored data becomes equally more robust
Pastry
 Applies a sorted ring in ID space like Chord
 Nodes and objects are assigned a 128-bit identifier
 NodeID is interpreted as sequences of digit
with base 2b
 In practice, the identifier is viewed in base 16.
 Nested groups
 Applies Finger-like shortcuts to speed up
lookup
 The node that is responsible for a key is
numerically closest (not the successor)
 Bidirectional and using numerical distance
Pastry : Nested group
 Simple example: nodes & keys have n-digit base-3
ids, eg, 02112100101022
 There are 3 nested groups for each group
 Each node knows IP address of one delegate node in
some of the other groups
 Suppose node in group 222… wants to lookup key k=
02112100210.
 Forward query to a node in 0…, then to a node in 02…, then
to a node in 021…, then so on.
Pastry : Routing table and
LeafSet
Base-4 routing table
 Routing table
 Provides delegate nodes in
nested groups
 Self-delegate for the nested
group where the node is
belong to
 O(log N) rows
 O(log N) lookup
 Leaf set
 Set of nodes which is
numerically closest to the
node
 L/2 smaller & L/2 higher
 Replication boundary
 Stop condition for lookup
 Support reliability and
consistency
 Cf) Successors in Chord
Pastry : Join and Failure
 Join
 Use routing to find numerically closest node already in
network
 Ask state from all nodes on the route and initialize own state
 Error correction
 Failed leaf node: contact a leaf node on the side of the failed
node and add appropriate new neighbor
 Failed table entry: contact a live entry with same prefix as
failed entry until new live entry found, if none found, keep
trying with longer prefix table entries
CAN : Content Addressable
Network
 Hash value is viewed as a point in a D-dimensional Cartesian
space
 Hash value points <n1, n2, …, nD>.
 Each node responsible for a D-dimensional “cube” in the space
 Nodes are neighbors if their cubes “touch” at more than just a
point

• Example: D=2
• 1’s neighbors: 2,3,4,6
• 6’s neighbors: 1,2,4,5
• Squares “wrap around”, e.g.,
7 and 8 are neighbors
• Expected # neighbors: O(D)
CAN : Routing
 To get to <n1, n2, …, nD> from <m1, m2, …, mD>
 choose a neighbor with smallest Cartesian distance from <m1,
m2, …, mD> (e.g., measured from neighbor’s center)

• e.g., region 1 needs to send to the node covering X
• Checks all neighbors, node 2 is closest
• Forwards message to node 2
• Cartesian distance monotonically decreases with each transmission
• Expected # overlay hops: (D/4)·N^(1/D)
CAN : Join
 To join the CAN:
 find some node in the CAN (via bootstrap process)
 choose a point in the space uniformly at random
 using CAN routing, inform the node that currently covers the space;
   that node splits its space in half
 1st split along 1st dimension
 if last split was along dimension i < D, next split along the (i+1)st
   dimension (e.g., for the 2-d case, split on the x-axis, then the y-axis)
 the node keeps half the space and gives the other half to the joining node

The likelihood of a rectangle being selected is proportional to its size,
i.e., big rectangles are chosen more frequently.
CAN Failure recovery

 View partitioning as a binary tree


 Leaves represent regions covered by overlay nodes
 Intermediate nodes represents “split” regions that could
be “reformed”
 Siblings are regions that can be merged together
(forming the region that is covered by their parent)
CAN Failure Recovery
 Failure recovery when leaf S is
removed
 Find a leaf node T that is either
 S’s sibling
 Descendant of S’s sibling where
T’s sibling is also a leaf node
 T takes over S’s region (move to
S’s position on the tree)
 T’s sibling takes over T’s previous
region
Kademlia : BitTorrent DHT
 For each nodes, files, keywords, deploy
SHA-1 hash into a 160 bits space.
 Every node maintains information about
files, keywords “close to itself”.
 The closeness between two objects
measure as their bitwise XOR
interpreted as an integer.
 D(a, b) = a XOR b
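A tiny illustration of the XOR metric (Python, with hypothetical 4-bit IDs chosen only for the example):

def xor_distance(a, b):
    # Kademlia distance: bitwise XOR of the two IDs, interpreted as an integer
    return a ^ b

node_ids = [0b0011, 0b0110, 0b1100, 0b1110]
key = 0b1101

closest = min(node_ids, key=lambda n: xor_distance(n, key))
print(f"closest node to key {key:04b} is {closest:04b}")
# each lookup step queries contacts with ever smaller XOR distance to the key,
# roughly halving the distance per hop, hence O(log N) steps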
Kademlia : Binary Tree

(figure) Subtrees for node 0011…; each subtree has a k-bucket of k delegate
nodes.
Kademlia : Lookup
(figure) When node 0011… wants to search for 1110…, each step moves to a
contact closer to the key in XOR distance, so the lookup takes O(log N) steps.
P2P Content Dissemination
Content dissemination
 Content dissemination is about allowing
clients to actually get a file or other
data after it has been located
 Important parameters
 Throughput
 Latency
 Reliability
P2P Dissemination
Problem Formulation
 Least time to disseminate:
 Fixed data D from one seeder to N
nodes
 Insights / Axioms
 Involving end-nodes speeds up the
process (Peer-to-Peer)
 Chunking the data also speeds up
the process
 Raises many questions
 How do nodes find other nodes for
exchange of chunks?
 Which chunks should be transferred?
 Is there an optimal way to do this?
Optimal Solution in a Homogeneous Network
 Least time to disseminate: all M chunks of data from the seeder to the
other N-1 peers
 Constraining the problem
 Homogeneous network: all links have the same throughput & delay
 Underlying network fully connected (Internet)

 Optimal solution (DIM): Log2(N) + 2(M-1)
 Ramp-Up: until each node has at least 1 chunk
 Sustained-Throughput: until all nodes have all chunks
 There is also an optimal chunk size

FARLEY, A. M. Broadcast time in communication networks. SIAM Journal on
Applied Mathematics (1980)
Ganesan, P. On Cooperative Content Distribution and the Price of Barter.
ICDCS 2005
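As a quick worked illustration of the bound above: with N = 8 peers and M = 4
chunks, Log2(8) + 2(4-1) = 3 + 6 = 9 time units — 3 for the ramp-up until every
node holds at least one chunk, then 6 more of sustained throughput until all
nodes hold all chunks.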
Practical Content
dissemination systems
 Centralized
 Server farms behind single domain name, load balancing
 Dedicated CDN
 CDN is independent system for typically many providers,
that clients only download from (use it as a service),
typically http
 Akamai, FastReplica
 End-to-End (P2P)
 Special client is needed and clients self-organize to form the
system themselves
 BitTorrent(Mesh-swarm), SplitStream(forest),
Bullet(tree+mesh), CREW(mesh)
Akamai
 Provider (eg CNN, BBC, etc) allows Akamai to
handle a subset of its domains (authoritive DNS)
 Http requests for these domains are redirected to
nearby proxies using DNS
 Akamai DNS servers use extensive monitoring info to
specify best proxy: adaptive to actual load, outages, etc
 Currently 20,000+ servers worldwide, claimed 10-
20% of overall Internet traffic is Akamai
 Wide area of services based on this architecture
 availability, load balancing, web based applications, etc
Decentralized Dissemination
Tree:
- Intuitive way to implement a decentralized solution
- Logic is built into the structure of the overlay
- Simpler to implement
However:
- Sophisticated mechanisms needed for heterogeneous networks (SplitStream)
- Fault-tolerance issues

Mesh-Based (BitTorrent, Bullet):
- Multiple overlay links; high-BW peers take more connections
- Neighbors exchange chunks
- Robust to failures: find new neighbors when links are broken
- Chunks can be received via multiple paths
BitTorrent
 Currently 20-50% of internet traffic is
BitTorrent
 Special client software is needed
 BitTorrent, BitTyrant, μTorrent, LimeWire …
 Basic idea
 Clients that download a file at the same time help
each other (ie, also upload chunks to each other)
 BitTorrent clients form a swarm : a random
overlay network
BitTorrent : Publish/download
 Publishing a file
 Put a “.torrent” file on the web: it contains the
address of the tracker, and information about the
published file
 Start a tracker, a server that
 Gives joining downloaders random peers to download from
and to
 Collects statistics about the swarm
 There are “trackerless” implementations by using
Kademlia DHT (e.g. Azureus)
 Download a file
 Install a bittorrent client and click on a “.torrent” file
BitTorrent : Overview

(figure) File.torrent contains:
- URL of the tracker
- File name
- File length
- Chunk length
- Checksum for each chunk (SHA1 hash)

Seeder – peer having the entire file
Leecher – peer downloading the file
BitTorrent : Client
 Client first asks 50 random peers from tracker
 Also learns about what chunks (256K) they have
 Pick a chunk and tries to download its pieces
(16K) from the neighbors that have them
 Download does not work if neighbor is disconnected or
denies download (choking)
 Only a complete chunk can be uploaded to others
 Allow only 4 neighbors to download (unchoking)
 Periodically (30s) optimistic unchoking : allows
download to random peer
 important for bootstrapping and optimization
 Otherwise unchokes peer that allows the most
download (each 10s)
BitTorrent : Tit-for-Tat
 Tit-for-tat
 Cooperate first, then do what the opponent
did in the previous game
 BitTorrent enables tit-for-tat
 A client unchokes other peers (allow them
to download) that allowed it to download
from them
 Optimistic unchoking is the initial
cooperation step that bootstraps the exchange
BitTorrent : Chunk selection
 What chunk to select to download?
 Clients select the chunk that is rarest among
the neighbors ( Local decision )
 Increases diversity in the pieces downloaded;
Increase throughput
 Increases likelihood all pieces still available even if
original seed leaves before any one node has
downloaded entire file
 Except the first chunk
 Select a random one (to make it fast: many
neighbors must have it)
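A small Python sketch of the local rarest-first decision (the data structures here are invented purely for illustration): count how many neighbors advertise each chunk we still need, then pick one of the least-replicated chunks at random.

import random
from collections import Counter

def pick_chunk(needed, neighbour_bitfields):
    # neighbour_bitfields: iterable of sets of chunk indices each peer holds
    availability = Counter()
    for held in neighbour_bitfields:
        for chunk in held & needed:
            availability[chunk] += 1
    candidates = [c for c in needed if availability[c] > 0]
    if not candidates:
        return None                       # nobody has anything we need yet
    rarest = min(availability[c] for c in candidates)
    return random.choice([c for c in candidates if availability[c] == rarest])

needed = {0, 1, 2, 3}
peers = [{0, 1}, {1, 2}, {1, 3}]
print(pick_chunk(needed, peers))          # 0, 2 or 3 (each held by only one peer)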
BitTorrent : Pros/Cons
 Pros
 Proficient in utilizing partially downloaded files
 Encourages diversity through “rarest-first”
 Extends lifetime of swarm
 Works well for “hot content”
 Cons
 Assumes all interested peers active at same time;
performance deteriorates if swarm “cools off”
 Even worse: no trackers for obscure content
Overcome tree structure –
SplitStream, Bullet
 Tree
 Simple, efficient, scalable
 But: vulnerable to failures, load-unbalanced, no
bandwidth constraint
 SplitStream
 Forest (multiple trees)
 Bullet
 Tree (metadata) + Mesh (data)
 CREW
 Mesh (data, metadata)
SplitStream
 Forest based dissemination
 Basic idea
 Split the stream into K stripes (with MDC coding)
 For each stripe create a multicast tree such that
the forest
 Contains interior-node-disjoint trees
 Respects nodes’ individual bandwidth constraints
 Approach
 Built on top of Pastry and Scribe (pub/sub)
SplitStream : MDC coding
 Multiple Description Coding
 Fragments a single media stream
into M substreams (M ≥ 2)
 K packets are enough for full decoding (K < M)
 Fewer than K packets can still be used to
approximate the content
 Useful for multimedia (video, audio) but not for
other data
 Cf. erasure coding for large data files
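A toy illustration of the general idea, assuming a naive round-robin "description" scheme (real MDC codecs are far more sophisticated, and this toy only gives full quality when all M descriptions arrive):

# Split a sequence of frames into M descriptions by round-robin; any subset of
# descriptions yields a coarser approximation, all M together yield the full stream.
def split_descriptions(frames, m):
    return [frames[i::m] for i in range(m)]

def approximate(descriptions_received, m, total):
    """Merge whatever descriptions arrived; missing frames stay None."""
    out = [None] * total
    for i, desc in descriptions_received:        # (description index, its frames)
        for k, frame in enumerate(desc):
            out[i + k * m] = frame
    return out

frames = list(range(12))                         # 12 "frames"
descs = split_descriptions(frames, m=4)
# Receive only descriptions 0 and 2 -> every other frame is recoverable.
print(approximate([(0, descs[0]), (2, descs[2])], m=4, total=12))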
SplitStream : Interior-node-disjoint tree
 Each node in the set of trees is an interior
node in at most one tree and a leaf node
in all the other trees.
 Each substream is disseminated over its
own subtree
(Figure: source S feeding three stripe trees rooted at nodes a (ID = 0x…), d (ID = 1x…) and g (ID = 2x…), with leaves b, c, e, f, h, i.)
SplitStream : Constructing the forest
 Each stripe has its own groupId
 Each groupId starts with a different digit
 A stripe's tree is formed by the routes from all
group members to the groupId
 The nodeIds of all interior nodes share some
number of starting digits with the stripe's
groupId
 All nodes have incoming capacity
requirements (the number of stripes they need)
and outgoing capacity limits
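A toy illustration of why the trees end up interior-node-disjoint, assuming Pastry-style prefix routing as described above (the function interior_stripe and the example IDs are made up):

# A node can only be an interior node of the stripe whose groupId starts with
# the same digit as its own nodeId, so it is a leaf in every other stripe tree.
def interior_stripe(node_id, stripe_group_ids):
    """Return the single stripe this node may be interior in (or None)."""
    for stripe, group_id in stripe_group_ids.items():
        if group_id[0] == node_id[0]:
            return stripe
    return None

stripes = {0: "0A61", 1: "1F02", 2: "2B77"}      # one groupId per stripe
for node in ["0C12", "1A90", "2E45", "0F00"]:
    print(node, "-> interior only in stripe", interior_stripe(node, stripes))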
Bullet
 Layers a mesh on top of an overlay tree
to increase overall bandwidth
 Basic Idea
 Use a tree as a basis
 In addition, each node continuously looks
for peers to download from
 In effect, the overlay is a tree combined
with a random network (mesh)
Bullet : RanSub
 Two phases
 Collect phase : using the tree,
membership info is propagated
upward (a random sample and the
subtree size from each subtree)
 Distribution phase : moving
down the tree, all nodes are
provided with a random sample
from the entire tree, or from
the non-descendant part of the
tree
(Figure: a tree rooted at S with interior nodes A, B, C and leaves D, E, annotated with the random sample sets collected upward and distributed downward.)
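A rough, self-contained sketch of the two phases on a static tree (the sample bound of 3 and the helper names are arbitrary; real RanSub carefully delivers each node a sample of the non-descendant part of the tree):

import random

# tree: dict mapping node -> list of children; leaves map to [].
def collect(tree, node, samples):
    """Collect phase: propagate a bounded random sample of each subtree upward."""
    members = [node]
    for child in tree[node]:
        members += collect(tree, child, samples)
    samples[node] = random.sample(members, min(3, len(members)))
    return members

def distribute(tree, node, sample_from_above, samples, delivered):
    """Distribution phase: hand each child a random sample drawn from what we know."""
    delivered[node] = sample_from_above
    for child in tree[node]:
        pool = samples[node] + sample_from_above
        distribute(tree, child, random.sample(pool, min(3, len(pool))),
                   samples, delivered)

tree = {"S": ["A", "B"], "A": ["D", "E"], "B": [], "D": [], "E": []}
samples, delivered = {}, {}
collect(tree, "S", samples)
distribute(tree, "S", samples["S"], samples, delivered)
print(delivered)   # each node ends up with a small random sample of the tree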
Bullet : Informed content
delivery
 When selecting a peer, a similarity
measure is calculated first
 Based on summary sketches
 Before the exchange, missing packets need to be
identified
 A Bloom filter of available packets is exchanged
 Old packets are removed from the filter
 To keep the size of the set constant
 Periodically re-evaluate senders
 If needed, senders are dropped and new ones are
requested
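A minimal Bloom-filter sketch of the "which packets does my peer already have?" exchange (the class below is a generic textbook Bloom filter, not Bullet's actual data structure; the sizes m and k are arbitrary):

import hashlib

class Bloom:
    """Tiny Bloom filter over an integer bitset."""
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, 0
    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m
    def add(self, item):
        for pos in self._hashes(item):
            self.bits |= (1 << pos)
    def __contains__(self, item):
        return all(self.bits & (1 << pos) for pos in self._hashes(item))

# The peer advertises its available packets as a Bloom filter ...
peer_filter = Bloom()
for pkt in [1, 2, 3, 5, 8]:
    peer_filter.add(pkt)

# ... and we request only packets the filter does not (appear to) contain.
wanted = [p for p in range(10) if p not in peer_filter]
print(wanted)   # packets to request (false positives may cause a few misses)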
Gossip-based Broadcast
 A probabilistic approach with good fault-tolerance properties
 Each node chooses a destination node uniformly at random and sends it the message
 After log(N) rounds, all nodes will have the message w.h.p.
 Requires N*log(N) messages in total
 Needs a ‘random sampling’ service
 Usually implemented as
 Rebroadcast ‘fanout’ times
 Using UDP: fire and forget
 BiModal Multicast (99), Lpbcast (DSN 01), Rodrigues’04 (DSN), Brahami ’04, Verma’06 (ICDCS),
Eugster’04 (Computer), Koldehofe’04, Periera’03
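A small push-gossip simulation of the log(N)-rounds claim above (n, fanout and the stopping rule are illustrative choices, not taken from any of the cited systems; pure push misses a few nodes, which push-pull variants fix):

import random, math

# Push gossip: when a node first receives the message it forwards it to
# 'fanout' nodes chosen uniformly at random, then stays silent.
def gossip(n=1000, fanout=4, seed=1):
    random.seed(seed)
    infected = {0}
    frontier = [0]                      # nodes that still have to rebroadcast
    rounds = 0
    while frontier:
        rounds += 1
        new = []
        for node in frontier:
            for _ in range(fanout):
                peer = random.randrange(n)
                if peer not in infected:
                    infected.add(peer)
                    new.append(peer)
        frontier = new
    return rounds, len(infected)

rounds, reached = gossip()
print(rounds, reached, "of", 1000)      # typically ~log(N) rounds, most nodes reached
print(round(math.log(1000), 1))         # ln(1000) ≈ 6.9, for comparison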
Gossip-based Broadcast:
Drawbacks
 Problems
 More faults → higher fanout needed (not dynamically adjustable)
 Higher redundancy → lower system throughput → slower dissemination
 Scalable view & buffer management
 Adapting to nodes’ heterogeneity
 Adapting to congestion in the underlying network
CREW: Preliminaries
 Deshpande, M., et al. "CREW: A Gossip-based Flash-Dissemination System." IEEE International
Conference on Distributed Computing Systems (ICDCS), 2006.
CREW (Concurrent Random
Expanding Walkers) Protocol
 Basic idea: servers ‘serve’ data to only a
few clients
 Who in turn become servers and ‘recruit’
more servers
 Split data into chunks
 Chunks are concurrently disseminated
through random walks
 Self-scaling and self-tuning to heterogeneity
(Figure: numbered nodes illustrating the walkers expanding outward hop by hop.)
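A toy sketch of this expanding-walker idea (the parameters serve_count, max_tries and the termination rule are made up; the real CREW protocol adds pull/handshake steps that catch stragglers):

import random

# A node holding a chunk serves it to a couple of peers, which then recruit
# further servers for that chunk; different chunks spread concurrently.
def spread_chunk(n, serve_count=2, max_tries=20, seed=0):
    random.seed(seed)
    have, frontier = {0}, [0]           # node 0 is the original server
    while frontier and len(have) < n:
        new_servers = []
        for node in frontier:
            recruited = tries = 0
            while recruited < serve_count and tries < max_tries:
                peer = random.randrange(n)
                tries += 1
                if peer not in have:    # hand the chunk to a peer that lacks it
                    have.add(peer)
                    new_servers.append(peer)
                    recruited += 1
        frontier = new_servers          # the recruited peers now serve in turn
    return len(have)

n = 1000
coverage = [spread_chunk(n, seed=c) for c in range(4)]   # 4 chunks, spread independently
print(coverage)   # each chunk reaches (nearly) all nodes in this toy model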
What is new about CREW
 No need to pre-decide the fanout, or a complex protocol to adjust it
 Deterministic termination
 Autonomic adaptation to the fault level (more faults → more pulls)
 Scalable, real-time and low-overhead view management
 Number of neighbors as low as log(N) (expander overlay)
 Neighbors detect and remove a dead node → it disappears from all nodes’ views
instantly
 List of node addresses is not transmitted in each gossip message
 Use of metadata plus a handshake to reduce data overhead
 No transmission of redundant chunks
 Handshake overloading
 Used for ‘random sampling’ of the overlay
 Quick feedback about system-wide properties
 Quick adaptation
 Use of TCP as the underlying transport
 Automatic flow and congestion control at the network level
 Less complexity in the application layer
 Implemented using RPC middleware
CREW Protocol: Latency, Reliability
(Figure: latency and reliability results, not reproduced here.)
RapID – layered architecture:
 Information Reintegration Module
 Chunk Forwarding Module / Neighbor Maintenance Module
 CORBA-based Middleware (ICE)
 Network / OS
FastReplica
 Disseminates a large file to a large set of edge
servers or distributed CDN servers
 Goal: minimization of the overall replication time for
replicating a file F across n nodes N1, … , Nn.
 File F is divided into n equal, subsequent subfiles
F1, … , Fn, where Size(Fi) = Size(F) / n bytes
for each i = 1, … , n.
 Two steps of dissemination
 Distribution and Collection
FastReplica : Distribution
(Figure: origin node N0 holding file F = F1 F2 … Fn and sending F1 to N1, F2 to N2, … , Fn to Nn over n concurrent connections.)
 The origin node N0 opens n concurrent connections to
nodes N1, … , Nn and sends to each node Ni the
following items:
 a distribution list of nodes R = {N1, … , Nn} to which subfile
Fi has to be sent in the next step;
 subfile Fi .
FastReplica : Collection
(Figure: node N1 forwarding its subfile F1 to N2, N3, … , Nn while the other nodes do the same with their own subfiles.)
 After receiving Fi , node Ni opens (n-1) concurrent
network connections to the remaining nodes in the
group and sends subfile Fi to them
FastReplica : Collection (overall)
(Figure: all nodes exchanging their subfiles simultaneously, so every node ends up with F1 … Fn.)
 Each node Ni has:
 (n - 1) outgoing connections for sending its subfile Fi ,
 (n - 1) incoming connections from the remaining
nodes in the group, over which it receives the complementary
subfiles F1, … , Fi-1 , Fi+1 , … , Fn.
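A minimal in-memory sketch of the two steps above (the function names and the byte-string "file" are illustrative; real FastReplica transfers the subfiles over concurrent network connections):

# Divide F into n subsequent subfiles of (roughly) Size(F)/n bytes,
# then run the Distribution and Collection steps on plain Python data.
def split(file_bytes, n):
    size = -(-len(file_bytes) // n)            # ceiling division
    return [file_bytes[i * size:(i + 1) * size] for i in range(n)]

def fast_replica(file_bytes, n):
    subfiles = split(file_bytes, n)
    nodes = [dict() for _ in range(n)]         # what each node N1..Nn has received
    # Distribution: origin N0 sends subfile Fi (plus the node list) to node Ni.
    for i in range(n):
        nodes[i][i] = subfiles[i]
    # Collection: each Ni forwards its Fi to the remaining n-1 nodes.
    for i in range(n):
        for j in range(n):
            if j != i:
                nodes[j][i] = subfiles[i]
    # Every node can now reassemble the original file from the n subfiles.
    return [b"".join(node[i] for i in range(n)) for node in nodes]

copies = fast_replica(b"x" * 1000, n=4)
print(all(c == b"x" * 1000 for c in copies))   # True: every node holds the full file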
FastReplica : Benefits
 Instead of the typical replication of the entire file F to n
nodes using n Internet paths, FastReplica exploits (n x
n) different Internet paths within the replication
group, where each path is used for transferring only
1/n-th of the file F.
 Benefits:
 The impact of congestion along any of the involved paths
is limited to a transfer of 1/n-th of the file,
 FastReplica takes advantage of the upload and
download bandwidth of the recipient nodes.