Collection of All Chapters
Introduction to Distributed Systems
Table of Contents
• Distributed systems
• Definitions,
• Basic Concepts,
• Examples
• Advantages of distributed systems
• Characteristics of distributed systems
• Challenges
System
• Definition
A collection of components that are organized for a common purpose.
Examples: a computer system, an education system, the human body system, etc.
History
Centralized System Characteristics
• A single computer resource handles all requests
• Definition
A distributed system consists of a collection of autonomous
computers, connected through a network and distribution middleware,
which enables computers to coordinate their activities and to share the
resources of the system, so that users perceive the system as a single,
integrated computing facility.
• ATM Network
• Internet/World-Wide Web
• Mobile Computing
• Cloud Computing
Examples of Distributed Systems
• Distributed DBMS
• Automatic Teller Machine (ATM) network
Advantage of Distributed System
• Resource sharing: the ability to use any hardware, software, or data anywhere in the system.
• Scalability: more nodes can easily be added to the distributed system, i.e., it can be scaled as required.
• Fault tolerance: the failure of one node does not lead to the failure of the entire distributed system; other
nodes can still communicate with each other.
• Redundancy: several machines can provide the same services, so if one is unavailable, work does not
stop.
• Openness: the system can be extended and improved by integrating new components.
• Concurrency: components in distributed systems execute as concurrent processes.
• Scaling transparency: enables the system and applications to expand in scale without changes to the
system structure
Disadvantage of Distributed System
• It is difficult to provide security in distributed systems because the nodes as
well as the connections need to be secured.
• Some messages and data can be lost in the network while moving from one
node to another.
• The database connected to the distributed systems is difficult to handle as
compared to a single user system.
• Overloading may occur in the network if all the nodes of the distributed
system try to send data at once.
Challenges:
• Increased complexity
• Synchronization process challenges
• Imperfect scalability
• More complex security
• Increased opportunities for failure
• Communication
• Software structure
• System architecture
• Workload allocation
• Consistency maintenance
CHAPTER 2:
Computer Clusters for Scalable
Parallel Computing
SUMMARY
• Clustering of computers enables scalable parallel and distributed computing in
both science and business applications.
• This chapter is devoted to building cluster-structured massively parallel
processors.
• We focus on the design principles and assessment of the hardware, software,
middleware, and operating system support to achieve scalability, availability,
programmability, single-system images, and fault tolerance in clusters.
• Only physical clusters are studied in this chapter.
• Virtual clusters will be studied in Chapters 3 and 4.
CLUSTERING FOR MASSIVE PARALLELISM
Cluster Development Trends
Milestone Cluster Systems
• Clustering has been a hot research challenge in computer architecture. Fast communication,
job scheduling, SSI, and HA are active areas in cluster research. Table 2.1 lists some milestone
cluster research projects and commercial cluster products. Details of these old clusters can be
found in [14].
2.1.2 Design Objectives of Computer Clusters
• Packaging:
• Cluster nodes can be packaged in a compact or a slack fashion.
• In a compact cluster, the nodes are closely packaged in one or more racks
sitting in a room, and the nodes are not attached to peripherals (monitors,
keyboards, mice, etc.).
• In a slack cluster, the nodes are attached to their usual peripherals (i.e.,
they are complete SMPs, workstations, and PCs), and they may be located
in different rooms, different buildings, or even remote regions.
• Packaging directly affects communication wire length, and thus the
selection of interconnection technology used.
• While a compact cluster can utilize a high-bandwidth, low-latency
communication network that is often proprietary, nodes of a slack cluster
are normally connected through standard LANs or WANs.
2.1.2 Design Objectives of Computer Clusters
• Control:
• A cluster can be either controlled or managed in a centralized or decentralized
fashion.
• A compact cluster normally has centralized control, while a slack cluster can be
controlled either way.
• In a centralized cluster, all the nodes are owned, controlled, managed, and
administered by a central operator.
• In a decentralized cluster, the nodes have individual owners. For instance,
consider a cluster comprising an interconnected set of desktop workstations in
a department, where each workstation is individually owned by an employee.
• The owner can reconfigure, upgrade, or even shut down the workstation at any
time. This lack of a single point of control makes system administration of such
a cluster very difficult.
• It also calls for special techniques for process scheduling, workload migration,
check pointing, accounting, and other similar tasks.
2.1.2 Design Objectives of Computer Clusters
• Homogeneity:
• A homogeneous cluster uses nodes from the same platform, that is, the
same processor architecture and the same operating system; often, the
nodes are from the same vendors.
• A heterogeneous cluster uses nodes of different platforms.
Interoperability is an important issue in heterogeneous clusters.
• For instance, process migration is often needed for load balancing or
availability. In a homogeneous cluster, a binary process image can migrate
to another node and continue execution.
• This is not feasible in a heterogeneous cluster, as the binary code will not
be executable when the process migrates to a node of a different platform.
2.1.2 Design Objectives of Computer Clusters
• Security
• Intracluster communication can be either exposed or enclosed.
• In an exposed cluster, the communication paths among the nodes are exposed to
the outside world. An outside machine can access the communication paths, and
thus individual nodes, using standard protocols (e.g., TCP/IP).
• Such exposed clusters are easy to implement, but have several disadvantages: Being
exposed, intracluster communication is not secure, unless the communication
subsystem performs additional work to ensure privacy and security. Outside
communications may disrupt intracluster communications in an unpredictable
fashion.
• In an enclosed cluster, intracluster communication is shielded from the outside
world, which alleviates the aforementioned problems. A disadvantage is that there
is currently no standard for efficient, enclosed intracluster communication.
2.1.2.6 Dedicated versus Enterprise Clusters
• A dedicated cluster is typically installed in a deskside rack in a central computer room.
• It is homogeneously configured with the same type of computer nodes and managed
by a single administrator group like a frontend host.
• Dedicated clusters are used as substitutes for traditional mainframes or
supercomputers.
• A dedicated cluster is installed, used, and administered as a single machine. Many users
can log in to the cluster to execute both interactive and batch jobs.
• The cluster offers much enhanced throughput, as well as reduced response time.
• An enterprise cluster is mainly used to utilize idle resources in the nodes. Each node is
usually a full-fledged SMP, workstation, or PC, with all the necessary peripherals
attached.
• The nodes are typically geographically distributed, and are not necessarily in the same
room or even in the same building.
• The nodes are individually owned by multiple owners. The cluster administrator has only
limited control over the nodes, as a node can be turned off at any time by its owner.
• The owner’s “local” jobs have higher priority than enterprise jobs.
• The cluster is often configured with heterogeneous computer nodes.
2.2.1 Cluster Organization and Resource Sharing
2.2.1.1 A Basic Cluster Architecture
• Figure 2.4 shows the basic architecture of a computer cluster over PCs or
workstations.
• The figure shows a simple cluster of computers built with commodity components
and fully supported with desired SSI features and HA capability.
• The processing nodes are commodity workstations, PCs, or servers. These
commodity nodes are easy to replace or upgrade with new generations of
hardware.
• The node operating systems should be designed for multiuser, multitasking, and
multithreaded applications.
• The nodes are interconnected by one or more fast commodity networks. These
networks use standard communication protocols and operate at a speed that should
be two orders of magnitude faster than that of the current TCP/IP speed over
Ethernet.
• The network interface card is connected to the node’s standard I/O bus (e.g., PCI).
2.2.1.1 A Basic Cluster Architecture
• When the processor or the operating system is changed, only the driver
software needs to change.
• We desire to have a platform-independent cluster operating system, sitting
on top of the node platforms. But such a cluster OS is not commercially
available.
• Instead, we can deploy some cluster middleware to glue together all node
platforms at the user space. An availability middleware offers HA services.
• An SSI layer provides a single entry point, a single file hierarchy, a single
point of control, and a single job management system. Single memory may
be realized with the help of the compiler or a runtime library.
• A single process space is not necessarily supported.
2.2.1.2 Resource Sharing in Clusters
There is no widely accepted standard for the memory bus. But there are such standards for the
I/O buses. One recent, popular standard is the PCI I/O bus standard. So, if you implement an NIC
card to attach a faster Ethernet network to the PCI bus you can be assured that this card can be
used in other systems that use PCI as the I/O bus.
2.2.1.2 Resource Sharing in Clusters
• The nodes of a cluster can be connected in one of three ways, as shown in
Figure 2.5:
1. The shared-nothing architecture is used in most clusters, where the nodes are
connected through the I/O bus. The shared-nothing configuration in Part (a) simply
connects two or more autonomous computers via a LAN such as Ethernet.
2. The shared-disk architecture is in favor of small-scale availability clusters in
business applications. When one node fails, the other node takes over. A shared-
disk cluster is shown in Part (b). This is what most business clusters desire so that
they can enable recovery support in case of node failure. The shared disk can hold
checkpoint files or critical system images to enhance cluster availability. Without
shared disks, check-pointing, rollback recovery, failover, and failback are not
possible in a cluster.
3. The shared-memory cluster in Part (c) is much more difficult to realize. The nodes
could be connected by a scalable coherence interface (SCI) ring, which is connected
to the memory bus of each node through an NIC module.
• In the other two architectures, the interconnect is attached to the I/O bus. The
memory bus operates at a higher frequency than the I/O bus.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Clusters should be designed for scalability and availability. In this section, we
will cover the design principles of SSI, HA, fault tolerance, and rollback
recovery in general-purpose computers and clusters of cooperative
computers.
Single-System Image Features:
1) Single Entry Point
2) Single File Hierarchy
3) Visibility of Files
4) Support of Single-File Hierarchy
5) Single I/O, Networking, and Memory Space
6) Other Desired SSI Features
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Single-System Image Features:
SSI does not mean a single copy of an operating system image residing in memory, as
in an SMP or a workstation. Rather, it means the illusion of a single system, single
control, symmetry, and transparency as characterized in the following list:
• Single system: The entire cluster is viewed by users as one system that has multiple
processors. The user could say, “Execute my application using five processors.” This is
different from a distributed system.
• Single control: Logically, an end user or system user utilizes services from one place
with a single interface. For instance, a user submits batch jobs to one set of queues; a
system administrator configures all the hardware and software components of the
cluster from one control point.
• Symmetry: A user can use a cluster service from any node. In other words, all cluster
services and functionalities are symmetric to all nodes and all users, except those
protected by access rights.
• Location-transparent: The user is not aware of where the physical device that
eventually provides a service is located. For instance, the user can use a tape drive attached to
any cluster node as though it were physically attached to the local node.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Single-System Image Features:
• The main motivation to have SSI is that it allows a cluster to be used, controlled,
and maintained as a familiar workstation is.
• The word “single” in “single-system image” is sometimes synonymous with “global”
or “central.”
• For instance, a global file system means a single file hierarchy, which a user can
access from any node. A single point of control allows an operator to monitor and
configure the cluster system.
• Although there is an illusion of a single system, a cluster service or functionality is
often realized in a distributed manner through the cooperation of multiple
components.
• From the viewpoint of a process P, cluster nodes can be
classified into three types.
– The home node of a process P is the node where P resided when it was
created.
– The local node of a process P is the node where P currently resides.
– All other nodes are remote nodes to P.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
Single-System Image Features:
A node can be configured to provide multiple functionalities. For instance, a
node can be designated as a host, an I/O node, and a compute node at the
same time.
The illusion of an SSI can be obtained at several layers, three of which are
discussed in the following list. Note that these layers may overlap with one
another.
• Application software layer: Two examples are parallel web servers and
various parallel databases. The user sees an SSI through the application
and is not even aware that he is using a cluster.
• Hardware or kernel layer: Ideally, SSI should be provided by the operating
system or by the hardware. Unfortunately, this is not a reality yet.
Furthermore, it is extremely difficult to provide an SSI over heterogeneous
clusters. With most hardware architectures and operating systems being
proprietary, only the manufacturer can use this approach.
• Middleware layer: The most viable approach is to construct an SSI layer
just above the OS kernel. This approach is promising because it is platform-
independent and does not require application modification.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.1 Single Entry Point
• Single-system image (SSI) is a very rich concept, consisting of single entry point,
single file hierarchy, single I/O space, single networking scheme, single control
point, single job management system, single memory space, and single process
space.
• The single entry point enables users to log in (e.g., through Telnet, rlogin, or HTTP)
to a cluster as one virtual host, although the cluster may have multiple physical host
nodes to serve the login sessions.
• The system transparently distributes the user’s login and connection requests to
different physical hosts to balance the load.
• Clusters could substitute for mainframes and supercomputers. Also, in an Internet
cluster server, thousands of HTTP or FTP requests may come simultaneously.
Establishing a single entry point with multiple hosts is not a trivial matter. Many
issues must be resolved. The following is just a partial list:
• Home directory: Where do you put the user’s home directory?
• Authentication: How do you authenticate user logins?
• Multiple connections: What if the same user opens several sessions to the same user account?
• Host failure: How do you deal with the failure of one or more hosts?
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.1 Single Entry Point
The DNS translates the symbolic name and returns the IP address
159.226.41.150 of the least loaded node, which happens to be node Host1. The
user then logs in using this IP address. The DNS periodically receives load
information from the host nodes to make load-balancing translation decisions.
In the ideal case, if 200 users simultaneously log in, the login sessions are evenly
distributed among our hosts with 50 users each. This allows a single host to be
four times more powerful.
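A toy sketch of the load-balancing idea behind this translation (host names, load figures, and the update rule are invented for illustration; this is not the actual DNS mechanism):

```python
# Minimal sketch of load-balanced login dispatch, assuming a DNS-like
# resolver keeps a table of recently reported loads per physical host.
# Host names and load values are hypothetical.
host_load = {"Host1": 12, "Host2": 47, "Host3": 31, "Host4": 25}

def resolve_login(load_table):
    """Return the least-loaded host to serve the next login session."""
    return min(load_table, key=load_table.get)

for _ in range(3):
    chosen = resolve_login(host_load)
    print("login directed to", chosen)
    host_load[chosen] += 1  # the new session adds load to the chosen host
```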
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.2 Single File Hierarchy
• We use the term “single file hierarchy” to mean the illusion of a single,
huge file system image that transparently integrates local and global disks
and other file devices (e.g., tapes).
• All files a user needs are stored in some subdirectories of the root directory
/, and they can be accessed through ordinary UNIX calls such as open, read,
and so on.
• Multiple file systems can exist in a workstation as subdirectories of the
root directory. The functionalities of a single file hierarchy have already
been partially provided by existing distributed file systems such as Network
File System (NFS) and Andrew File System (AFS).
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.2 Single File Hierarchy
From the viewpoint of any process, files can reside on three types of
locations in a cluster, as shown in Figure 2.14.
Local storage is the disk on the local node of a process. The disks on remote
nodes are remote storage.
A stable storage requires two aspects: 1) It is persistent, which means data,
once written to the stable storage, will stay there for a sufficiently long time (e.g.,
a week), even after the cluster shuts down; and 2) it is fault-tolerant to some
degree, by using redundancy and periodic backup to tapes.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.3 Visibility of Files
• The term “visibility” here means a process can use traditional UNIX system
or library calls such as fopen, fread, and fwrite to access files.
• Note that there are multiple local scratch directories in a cluster. The local
scratch directories in remote nodes are not in the single file hierarchy, and
are not directly visible to the process.
• The name “scratch” indicates that the storage is meant to act as a scratch
pad for temporary information storage. Information in the local scratch
space could be lost once the user logs out.
• Files in the global scratch space will normally persist even after the user
logs out, but will be deleted by the system if not accessed in a
predetermined time period.
• This is to free disk space for other users. The length of the period can be set
by the system administrator, and usually ranges from one day to several
weeks.
• Some systems back up the global scratch space to tapes periodically or
before deleting any files.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.4 Support of Single-File Hierarchy
It is desired that a single file hierarchy have the SSI properties discussed, which are
reiterated for file systems as follows:
• Single system: There is just one file hierarchy from the user’s viewpoint.
• Symmetry: A user can access the global storage (e.g., /scratch) using a cluster service
from any node. In other words, all file services and functionalities are symmetric to all
nodes and all users, except those protected by access rights.
• Location-transparent: The user is not aware of the whereabouts of the physical
device that eventually provides a service. For instance, the user can use a RAID attached
to any cluster node as though it were physically attached to the local node. There may
be some performance differences, though.
• A cluster file system should maintain UNIX semantics: Every file operation (fopen,
fread, fwrite, fclose, etc.) is a transaction. When an fread accesses a file after an
fwrite modifies the same file, the fread should get the updated value.
• It is desired to utilize the local disks in all nodes to form global storage. This solves the
performance and availability problems of a single file server.
2.3 DESIGN PRINCIPLES OF COMPUTER CLUSTERS
2.3.1.5 Single I/O, Networking, and Memory Space
• To achieve SSI, we desire a single control point, a single address space, a
single job management system, a single user interface, and a single process
control, as depicted in Figure 2.15.
• In this example,
– each node has exactly one network connection. Two of the four nodes
each have two I/O devices attached.
– Single networking: A properly designed cluster should behave as one
system (the shaded area). In other words, it is like a big workstation
with four network connections and four I/O devices attached.
– Single I/O device: Any process on any node can use any network and
I/O device as though it were attached to the local node.
Thank You
Chapter-03
Virtual Machines and
Virtualization of Clusters
and Data Centers
Why learn virtualization?
• Modern computing is more efficient due to virtualization
• Similar to how your brain controls your actions, software controls hardware
• A VM is a set of files
• With a hypervisor and VMs, one computer can run multiple operating systems simultaneously
The Hypervisor
What is a Hypervisor?
• Software installed on top of hardware that creates a virtualization layer
• Hosts VMs
• Example VM files (listed by file type, file name, and description)
• Types of servers:
- Tower
- Blade server
- Rack-mounted server
• File-Level Storage – Data is written to disks but accessed from default file system
Storage – Types of Data Center Storage
• DAS – Storage device is directly attached to a server (block-level)
• SAN – Clustered storage devices on their own network that servers can
connect to (block-level)
Common Data Center Storage Protocols (protocol and typical application)
• Less labor needed to monitor data center (administrator can monitor from desk
using a program)
• What is a vSwitch?
- Virtual switch that virtual devices can
connect to in order to communicate
with each other
• What is a vLAN?
- Virtual Local Area Network that is
segmented into groups of ports isolated
from one another, creating different
network segments
Types of Virtual Networks
• Bridged Network: The host server and the VM are
connected to the same network, and the VM obtains
its own IP address on that network
• Private Cloud
• Community Cloud
• Public Cloud
• Hybrid Cloud
VMware Solutions
vMotion
• Move running virtual machines from one ESXi host to another ESXi host without
service interruption (live migration)
• Pros:
- Easy to conceptualize
- Fairly easy to deploy
- Easy to back up
- Virtually any application/service can be run from this type of setup
• Cons:
- Expensive to acquire and maintain hardware
- Not very scalable
- Difficult to replicate
- Redundancy is difficult to implement
- Vulnerable to hardware outages
- In many cases, the processor is under-utilized
The Virtual Server Concept
Figure: on the x86 architecture, the virtualization layer intercepts hardware requests from the virtual servers
Source: http://www.free-pictures-photos.com/
Cloud Computing?
• The cloud is Internet-based
computing, whereby shared
resources, software, and information
are provided to computers and other
devices on demand – pay per use.
Basic Cloud Characteristics
• Clouds are transparent to users and
applications; they can be built in multiple
ways:
• branded products, proprietary or open source,
hardware or software, or just off-the-shelf PCs.
• In general, they are built on clusters of PC
servers and off-the-shelf components plus
open source software combined with in-
house applications and/or system software.
Motivation Example: Forbes.com
Figure: the rate of server accesses peaks during business hours (9 AM - 5 PM, M-F) and is much lower at all other times
Forbes' Solution
• SalesForce CRM
• LotusLive
• Google App Engine
Adopted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell and Tim Grance
Some Commercial Cloud
Offerings
Cloud Taxonomy
Where is all of this?
Data Centers
Table 1-1: Data Centers by Size and Region, 2010-2016 (sum of sites per year)
Region          Site Class           2010    2011    2012    2013    2014    2015    2016
Asia/Pacific Single 769,012 769,455 780,252 792,999 819,342 851,068 900,039
Rack/Computer Room 58,702 60,311 63,183 66,916 70,173 72,564 74,249
Midsize DC 4,478 4,656 4,989 5,455 5,892 6,260 6,559
Enterprise DC 984 1,026 1,110 1,239 1,379 1,523 1,666
Large DC 106 110 120 136 156 179 204
Canada Single 66,519 66,012 64,384 62,311 61,575 62,246 63,956
Rack/Computer Room 14,390 14,194 13,828 13,241 12,686 12,068 11,328
Midsize DC 650 643 631 613 599 584 564
Enterprise DC 210 209 208 210 217 228 241
Large DC 22 22 23 25 28 32 37
Eastern Europe Single 147,274 151,468 161,916 176,398 194,389 213,602 238,792
Rack/Computer Room 28,750 28,892 28,829 28,761 28,799 29,077 29,380
Midsize DC 1,102 1,112 1,121 1,134 1,149 1,172 1,197
Enterprise DC 196 198 202 208 216 228 243
Large DC 44 44 46 49 53 58 65
Japan Single 286,416 274,109 251,600 225,292 212,947 213,494 222,012
Rack/Computer Room 27,532 27,050 25,885 23,673 21,100 18,334 15,706
Midsize DC 380 372 354 324 294 264 236
Enterprise DC 346 341 330 313 301 294 292
Large DC 85 84 83 83 86 92 101
Latin America Single 195,547 196,703 199,196 204,643 216,169 233,087 253,961
Rack/Computer Room 13,325 13,541 13,786 13,957 13,939 13,846 13,774
Midsize DC 821 832 845 858 868 881 899
Enterprise DC 213 216 222 230 241 258 278
Large DC 18 18 19 20 22 25 28
Middle East and Africa Single 108,868 114,214 126,323 143,733 162,834 182,687 207,031
Rack/Computer Room 21,549 21,793 22,133 22,412 22,650 22,793 22,963
Midsize DC 871 881 894 908 921 934 952
Enterprise DC 120 122 126 131 139 148 159
Large DC 21 21 22 23 24 26 27
United States Single 770,925 769,095 749,290 716,352 685,760 664,601 660,355
Rack/Computer Room 184,457 182,963 179,818 174,492 168,593 162,051 154,496
Midsize DC 2,506 2,483 2,435 2,372 2,319 2,276 2,223
Enterprise DC 2,404 2,392 2,377 2,382 2,438 2,539 2,660
Large DC 571 571 574 589 621 669 724
Western Europe Single 536,090 531,772 528,022 525,520 545,062 583,768 647,273
Rack/Computer Room 139,790 138,022 133,181 125,030 116,790 109,967 105,045
Midsize DC 4,860 4,788 4,608 4,337 4,093 3,921 3,822
Enterprise DC 1,196 1,181 1,148 1,106 1,089 1,105 1,153
Large DC 244 243 242 244 256 280 313
Grand Total 3,391,592 3,382,159 3,364,353 3,338,716 3,376,208 3,469,231 3,645,002
Source: Gartner (August 2012)
Summary Comments
• Virtualization of servers solves a lot of headaches
when deploying infrastructure and applications
• It allows servers to be backed up and moved
around seamlessly
• Migrating a server might allow an application's
speed to increase, e.g., by moving it to a faster machine
• Resizing (up or down) keeps costs proportional to the
business model
• The model works for both private and public clouds
(insourcing or outsourcing)
• The cloud is easy to understand and a convenient
way of accessing infrastructure and services.
Service Oriented Architecture
for Distributed Computing
Chapter-05
Overview of the syllabus
SOA characteristics
Principles of service orientation
Web service and its role in SOA
Service oriented analysis
Service oriented design
SOA platforms
SOA support in J2EE and .NET
SOA standards
Service composition (BPEL)
Security in SOA
Overview of the content
Current trends
Software paradigms
Application architecture
Web based systems
2-tier and 3-tier architecture
Web based technologies
component based systems
Current trends …
Procedure oriented
Object-oriented
Component based
Event-driven
Logic based
Aspect-oriented
Service oriented
The monolithic mainframe application
architecture
Integrated applications
Applications can share resources
A single instance of functionality (service) can
be reused.
Common user interfaces
Bottom-up approach
Real world scenario
Web based systems …
Client-server model
Client side technologies
Server side technologies
Web client, Web servers
Application servers
Basic idea of Tiers
Figure: a thick client sends a request to the web server and receives a response; Tier 1 is the GUI, handling interactions with the user and basic validations
Figure: application server and database server; the presentation logic and database driver form the presentation/business layer, and a tier boundary separates it from the data layer (the database)
Two tier architecture
Figure: the presentation logic is separated by a tier boundary from multiple business logic components and the database driver, which are in turn separated by another tier boundary from the data layer (the database)
N-Tier architecture
Figure: HTML browsers, COM clients, and ActiveX controls access ASP/ISAPI on the presentation tier through a firewall (HTML/XML pages); DCOM connects to the business tier, which accesses the database tier (databases)
Sun’s Web Technologies
Figure: HTML browsers, CORBA clients, and Java applets access servlets/JSP on the presentation tier through a firewall (HTML/XML pages); RMI/IIOP and MQSeries/Java Messaging Service (JMS) connect to the business tier, which accesses the database tier (databases)
Component World …
Figure: an audio system as a component; the external world (a user of the audio system) sees only the interface (Eject, Skip, - Volume +, - Bass +), while the actual implementation works in terms of voltages, signals, currents, etc.
Technologies for implementing
components
RMI / EJB
CORBA
COM, DCOM, COM+
Limitations
Web services (XML based standards)
Basic model of distributed system
Figure: the service provider publishes its service description to a service registry; the service requestor finds the service in the registry and binds to the provider
An Archetypal Distributed Objects System
Figure: client and server each consist of a proxy, runtime support, and network support; an object registry sits between them; the physical data path goes through the network support layers, while the logical data path connects client and server directly
Distributed Object Systems / Protocols
Figure: a client browser talks HTTP to the web server (HTML pages, Java applets); applets invoke the object implementation on the application server through a stub and skeleton over JRMP/IIOP
CORBA architecture
Figure: a client browser talks HTTP to the web server (HTML pages, Java applets); applets reach the object implementation on the application server through a client-side ORB and a server-side ORB communicating over IIOP
DCOM architecture
Figure: a client browser talks HTTP to the web server (HTML pages, ActiveX controls); ActiveX controls reach the object implementation on the application server through a proxy and stub over DCOM
Limitations of Components
Tightly coupled
Cross language/ platform issues
Interoperability issues
Maintenance and management
Security issues
Application Centric
Figure: application-centric architecture has a narrow scope with limited business consumers and business processes; finance, supply, manufacturing, and distribution applications are integrated through an EAI-vendor-bound architecture, leading to redundancy, overlapped resources, and overlapped providers and services
• Governed by architectural
patterns and policies
Source: TietoEnator AB, Kurts Bilder
SOA Defined
SOA is a software architecture model
in which business functionality is logically grouped and
encapsulated into self-contained, distinct, and reusable
units called services that
represent a high-level business concept
can be distributed over a network
can be reused to create new business applications
contain a contract with a specification of the purpose,
functionality, interfaces (coarse-grained), constraints,
and usage
Figure: a GUI uses applications (e.g., student applications); applications invoke services; services participate in and are exposed by business processes; business goals are supported by business processes
Why SOA?
Heterogeneous cross-platform
Reusability at the macro (service) level rather
than micro(object) level
Interconnection to - and usage of - existing IT
(legacy) assets
Granularity, modularity, composability,
componentization
Compliance with industry standards
SOA is an evolutionary step
for architecture
SOA is an evolutionary step
in distributed communications
source:Sam Gentile
Features of SOA
Operations
• Units of Work
• Example: Determine Cost of Attendance
Processes
• Composed / orchestrated groups of services
• Example: Financial Aid Disbursement
SOA principles
Service Encapsulation
Service Loose coupling
Service Contract
Service abstraction
Service Documentation
Service reusability
Service composability
Service autonomy
Service optimization and Discovery
Service statelessness
Loose Coupling
between
Service contract
Service implementation
Service consumers
Source: Thomas Erl
Standardized Service Contracts
Policy
Abstraction
“Service contracts only contain essential
information and information about services is
limited to what is published in service contracts”
Primary benefits
Increased reliability
Behavioral predictability
Goals
Increase service scalability
Support design of agnostic
logic and improve service reuse Source: Thomas Erl
Applying SOA - Governance
Governance is a program that makes sure people
do what is ‘right’
Policies
Codification of laws, regulations, corporate
guidelines and best practices
Must address all stages of the service lifecycle
(technology selection, design, development
practices, configuration management, release
management, runtime management, etc.)
Applying SOA - Governance
Processes
Enforce policies
System-driven processes (code check-in, code
builds, unit tests)
Human-driven process (requests, design
reviews, code reviews, threat assessment, test
case review, release engineering, service
registration, etc.)
Applying SOA - Governance
Metrics
Measurements of service reuse, compliancy
with policy, etc.
Organization
Governance program should be run by SOA
Board, which should have cross-functional
representatives
Foundation
Figure: business capabilities (I/CAD, GeoMedia, G/Tech, and other products) are exposed as foundation services
Figure: technical view of applying SOA; an SOA board maintains standards, policies, models, and patterns; software and IT architects own the service implementation and deployment model, enterprise architects own the service specification and service model, and designers own the business service model (business services view)
Applying SOA - Challenges
Service Orientation
Business functionality has to be made available as services.
Service contracts must be fixed
Sharing of Responsibilities
Potential service users must be involved in the design process and will have influence on the service design
Increased complexity!
Applying SOA – Renovation
Roadmap
source:IBM
Why SOA?
To enable Flexible, Federated Business Processes
• Enabling a virtual federation of participants to collaborate in an end-to-end business process
• Enabling alternative implementations
• Enabling reuse of services
Figure: BPM expressed in terms of services provided/consumed; inventory, logistics, availability, and manufacturing services are consumed through smart clients, store POS, mobile devices, 3rd-party agents, portals, and internal systems; policy consistency is achieved through shared services, e.g., a single Customer Details service
Example Layers and Reasons for Layering
• Layers: presentation & workflow, composed services, basic services, underlying API
• Reasons for layering:
1. Flexible composition
2. Reuse
3. Functional standardization in lower levels
4. Customization in higher layers
5. Separation of concerns
Figure: an Aid Disbursement process is realized by application logic spread across the FA System (Microsoft .NET), the Registrar System (mainframe), the Dept of Ed (platform unknown), and the Bursar (Java on Linux)
Applying services to the problem
Figure: before, "The System" is monolithic; after, it is decomposed into loosely coupled services S1, S2, S3, S4
The goal for a SOA is a world wide mesh of
collaborating services, which are published
and available for invocation on the Service
Bus.
SOA is not just an architecture of services
seen from a technology perspective, but the
policies, practices, and frameworks by which
we ensure the right services are provided and
consumed.
Major service types
Basic Services:
Data-centric and logic-centric services
Encapsulate the data behavior and data model and
ensure data consistency (only on one
backend).
Basic services are stateless services with a high
degree of reusability.
They represent the fundamental SOA maturity level and
are usually built on top of existing legacy APIs
(underlying services).
Major service types
Composed Services :
expose harmonized access to inconsistent basic
services technology (gateways, adapters, façades,
and functionality-adding services).
Encapsulate business specific workflows or
orchestrated services.
Service Types
Figure: SOA management & security (service mediation, routing, trust enablement; ESB, service registry) spans all layers; smart clients and portals consume composite services and infrastructure services (facades); basic services are data-centric and logic-centric consistent services, highly reusable and stateless; they sit on foundation service blocks and core APIs that expose the business capabilities (I/CAD, GeoMedia, G/Tech, and other products)
SOA Benefits Summary
Allow us to execute complex business
processes by composing systems from small,
less complex building blocks
Fosters collaborative business and technical
environment through sharing and coordination
of services
Create outward facing self-service applications
not constrained by organizational boundaries
Enables creating adaptive, agile systems that
are resilient to changes in the business
environment
Conclusions
SOA represents a fundamental change to the
way information systems will be designed in the
future
Long term impact on IT portfolio management
is dramatic
Adds a significant dimension to system
evaluation process
Undertaking SOA requires commitment from
all levels of the organization and significant
investments (people, process, and tools)
Conclusion and Summary
SOA
Is complex
Requires governance
Requires executive management buy-in
Requires commitment with resources
Thank You.
Chapter-06
Cloud programming and
software environments
Introduction: A Golden Era in
Computing
Figure: converging enablers - powerful multi-core processors; general-purpose graphic processors; explosion of domain applications; superior software methodologies; proliferation of devices; virtualization leveraging the powerful hardware; wider bandwidth for communication
Cloud Concepts, Enabling-
technologies, and Models:
The Cloud Context
Figure: the web evolving in scale and capability - publish, inform, interact, integrate, transact, discover (intelligence), automate (semantic discovery); supported by HPC, cloud, data-intensive computing, and the deep web
Top Ten Largest Databases
Chart: top ten largest databases (2007), sized in terabytes - LOC, CIA, Amazon, YouTube, ChoicePoint, Sprint, Google, AT&T, NERSC, Climate
Ref: http://www.focus.com/fyi/operations/10-largest-databases-in-the-world/
Challenges
• Alignment with the needs of the business / user / non-
computer specialists / community and society
• Need to address the scalability issue: large scale data, high
performance computing, automation, response time, rapid
prototyping, and rapid time to production
• Need to effectively address (i) ever shortening cycle of
obsolescence, (ii) heterogeneity and (iii) rapid changes in
requirements
• Transform data from diverse sources into intelligence and
deliver intelligence to right people/user/systems
• What about providing all this in a cost-effective manner?
Enter the cloud
“Grid Technology: A slide from my presentation
to Industry (2005)
• Emerging enabling technology.
• Natural evolution of distributed systems and the Internet.
• Middleware supporting network of systems to facilitate
sharing, standardization and openness.
• Infrastructure and application model dealing with sharing of
compute cycles, data, storage and other resources.
• Publicized by prominent industries as on-demand computing,
utility computing, etc.
• Move towards delivering “computing” to masses similar to
other utilities (electricity and voice communication).”
• Now,
Hmmm…sounds like the definition for cloud computing!!!!!
It is a changed world now…
• Explosive growth in applications: biomedical informatics, space
exploration, business analytics, web 2.0 social networking: YouTube,
Facebook
• Extreme scale content generation: e-science and e-business data
deluge
• Extraordinary rate of digital content consumption: digital gluttony:
Apple iPhone, iPad, Amazon Kindle
• Exponential growth in compute capabilities: multi-core, storage,
bandwidth, virtual machines (virtualization)
• Very short cycle of obsolescence in technologies: Windows Vista to
Windows 7; Java versions; C to C#; Python
• Newer architectures: web services, persistence models, distributed
file systems/repositories (Google, Hadoop), multi-core, wireless and
mobile
• Diverse knowledge and skill levels of the workforce
• You simply cannot manage this complex situation with your
traditional IT infrastructure:
Answer: Cloud Computing?
Enabling Technologies
Figure: enabling technologies - network bandwidth, web services (WS) interfaces, 64-bit processors
Common Features of Cloud Providers
• Development environment: IDE, SDK, plugins
• Production environment
Windows Azure
• Enterprise-level on-demand capacity builder
• Fabric of cycles and storage available on-request
for a cost
• You have to use Azure API to work with the
infrastructure offered by Microsoft
• Significant features: web role, worker role , blob
storage, table and drive-storage
Amazon EC2
• Amazon EC2 is one large complex web service.
• EC2 provides an API for instantiating computing
instances with any of the operating systems
supported.
• It can facilitate computations through Amazon
Machine Images (AMIs) for various other models.
• Signature features: S3, Cloud Management
Console, MapReduce Cloud, Amazon Machine
Image (AMI)
• Excellent distribution, load balancing, cloud
monitoring tools
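As an illustrative aside (not from the slides), launching a single EC2 instance from an AMI with boto3, the AWS SDK for Python, might look roughly like this; the AMI ID, key pair name, and region are placeholder assumptions:

```python
# Illustrative sketch only: start one EC2 instance from an Amazon
# Machine Image (AMI) using boto3, the AWS SDK for Python.
# The AMI ID, key pair name, and region below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI (machine image)
    InstanceType="t2.micro",          # small instance type
    KeyName="my-keypair",             # hypothetical SSH key pair
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched instance:", instance_id)
```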
Google App Engine
• This is more of a web interface to a development
environment that offers a one-stop facility for the
design, development, and deployment of
applications in Java, Go, and Python.
• Google offers the same reliability, availability and
scalability at par with Google’s own applications
• Interface is software programming based
• Comprehensive programming platform irrespective
of the size (small or large)
• Signature features: templates and appspot,
excellent monitoring and management console
Demos
• Amazon AWS: EC2 & S3 (among the many
infrastructure services)
o Linux machine
o Windows machine
o A three-tier enterprise application
• Windows Azure
o Storage: blob store/container
o MS Visual Studio Azure development and production environment
Cloud Programming Models
The Context: Big-data
• Data mining huge amounts of data collected in a wide range of
domains from astronomy to healthcare has become essential for
planning and performance.
• We are in a knowledge economy.
o Data is an important asset to any organization
o Discovery of knowledge; Enabling discovery; annotation of
data
o Complex computational models
o No single environment is good enough: need elastic, on-
demand capacities
• We are looking at newer
o Programming models, and
o Supporting algorithms and data structures.
Google File System
• Internet introduced a new challenge in the form web
logs, web crawler’s data: large scale “peta scale”
• But observe that this type of data has a uniquely
different characteristic than your transactional or
“customer order” data: it is “write once read many
(WORM)”;
• Privacy protected healthcare and patient information;
• Historical financial data;
• Other historical data
What is Hadoop?
• At Google, MapReduce operations are run on a
special file system called Google File System (GFS)
that is highly optimized for this purpose.
• GFS is not open source.
• Doug Cutting and others at Yahoo! reverse
engineered the GFS and called it Hadoop Distributed
File System (HDFS).
• The software framework that supports HDFS,
MapReduce, and other related entities is called the
project Hadoop or simply Hadoop.
• This is open source and distributed by Apache.
Fault tolerance
HDFS Architecture
Figure: clients send metadata operations to the Namenode, which holds the metadata (file name, number of replicas, e.g., /home/foo/data, 6); clients perform block operations (read/write) directly against the Datanodes, which store the blocks and replicate them among themselves
Hadoop Distributed File System
Figure: an application uses the HDFS client to talk to the HDFS server (master/name node); the local file system uses a small block size (e.g., 2K), whereas HDFS uses a large block size (e.g., 128M) and replicates each block
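As a rough illustration of the block-and-replica idea in the figure above (block size, replication factor, and node names are assumptions for this sketch, not HDFS source code):

```python
# Minimal sketch of HDFS-style block placement: a file is cut into
# fixed-size blocks and each block is assigned to several data nodes.
# The 128 MB block size and the node names are illustrative assumptions.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the HDFS-style large block
REPLICATION = 3
DATA_NODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]

def place_blocks(file_size_bytes):
    """Return a block -> replica-nodes mapping for a file of the given size."""
    num_blocks = -(-file_size_bytes // BLOCK_SIZE)   # ceiling division
    placement = {}
    for b in range(num_blocks):
        # round-robin placement of the replicas of block b
        placement[b] = [DATA_NODES[(b + r) % len(DATA_NODES)]
                        for r in range(REPLICATION)]
    return placement

# Example: a 300 MB file needs 3 blocks, each stored on 3 data nodes.
print(place_blocks(300 * 1024 * 1024))
```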
What is MapReduce?
• MapReduce is a programming model Google has used
successfully in processing its “big-data” sets (~20,000 petabytes
per day)
A map function extracts some intelligence from raw data.
A reduce function aggregates, according to some guides, the
data output by the map.
Users specify the computation in terms of a map and a reduce
function.
The underlying runtime system automatically parallelizes the
computation across large-scale clusters of machines, and
also handles machine failures, efficient
communications, and performance issues.
-- Reference: Dean, J. and Ghemawat, S. 2008. MapReduce: Simplified Data
Processing on Large Clusters. Communications of the ACM 51, 1 (Jan. 2008),
107-113.
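To make the map and reduce roles concrete, here is a hedged word-count sketch in plain Python that only mimics the model on one machine (the real systems run the map and reduce tasks in parallel on a cluster):

```python
# Minimal word-count sketch of the MapReduce programming model.
# This imitates the model sequentially; Google MapReduce and Hadoop
# distribute the same map/shuffle/reduce phases across many machines.
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one line of raw input."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: aggregate all counts emitted for the same word."""
    return word, sum(counts)

def run_mapreduce(lines):
    # shuffle phase: group intermediate values by key
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(k, v) for k, v in grouped.items())

print(run_mapreduce(["the cat sat", "the cat ran"]))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```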
Classes of problems “mapreducable”
Figure: large-scale data is divided into splits; map tasks emit <key, value> pairs (e.g., <key, 1>); a parse/hash step routes the pairs to partitions P-0000, P-0001, P-0002, ...; reducers (say, Count) aggregate each partition into count1, count2, count3, ...
MapReduce Engine
• MapReduce requires a distributed file system and an
engine that can distribute, coordinate, monitor and
gather the results.
• Hadoop provides that engine through HDFS (the file system
we discussed earlier) and the JobTracker +
TaskTracker system.
• JobTracker is simply a scheduler.
• A TaskTracker is assigned a Map or Reduce task (or other
operations); the Map or Reduce task runs on a node and so does
the TaskTracker; each task is run in its own JVM on a
node.
Thank You
Grid Computing
Data-Intensive Computing
Application Management
• Description
• Partitioning
• Mapping
• Allocation
Figure: an application passes through partitioning, mapping, and allocation stages under application management
Grid-ADL
Figure: in traditional systems the task graph enumerates every node (1, 2, ..., 5, 6, 7); alternative systems describe the repeated tasks compactly as 2 .. 5
Partitioning/Clustering
• Application represented as a graph
– Nodes: job
– Edges: precedence
• Graph partitioning techniques:
– Minimize communication
– Increase throughput or speedup
– Need good heuristics
• Clustering
Graph Partitioning
• Optimally allocating the components
of a distributed program over several
machines
• Communication between machines is
assumed to be the major factor in
application performance
• NP-hard for case of 3 or more
terminals
Collapse the graph
• Given G = {N, E, M}
• N is the set of Nodes
• E is the set of Edges
• M is the set of
machine nodes
Dominant Edge
• Take node n and its
heaviest edge e
• Edges e1,e2,…er with
opposite end nodes not in
M
• Edges e´1,e´2,…e´k with
opposite end nodes in M
• If w(e) ≥ Sum(w(ei)) +
max(w(e´1),…,w(e´k))
• Then the min-cut does
not contain e
• So e can be collapsed
Machine Cut
• Let machine cut Mi be the
set of all edges between
a machine mi and non-
machine nodes N
• Let Wi be the sum of the
weights of all edges in the
machine cut Mi
• Wi’s are sorted so
W1 ≥ W2 ≥ …
• Any edge that has a
weight greater than W2
cannot be part of the min-
cut
Zeroing
• Assume that node n has edges to each of
the m machines in M with weights
w1 ≤ w2 ≤ … ≤ wm
• Reducing the weights of each of the m
edges from n to machines M by w1 doesn’t
change the assignment of nodes for the
min-cut
• It reduces the cost of the minimum cut by
(m-1)w1
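As a toy illustration of the zeroing rule above (edge weights are invented; this is only the single reduction step, not a complete min-cut procedure):

```python
# Minimal sketch of "zeroing": for a node n with edges to all m machine
# nodes, subtract the smallest such weight w1 from every one of those
# edges. The min-cut assignment is unchanged and its cost drops by
# (m - 1) * w1. The weights below are illustrative.
def zero_node_edges(machine_edge_weights):
    """machine_edge_weights: dict machine -> weight of the edge from node n."""
    w1 = min(machine_edge_weights.values())
    reduced = {m: w - w1 for m, w in machine_edge_weights.items()}
    cost_reduction = (len(machine_edge_weights) - 1) * w1
    return reduced, cost_reduction

edges_from_n = {"m1": 3, "m2": 7, "m3": 9}
print(zero_node_edges(edges_from_n))
# ({'m1': 0, 'm2': 4, 'm3': 6}, 6)
```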
Order of Application
• If the previous 3 techniques are repeatedly
applied on a graph until none of them are
applicable:
• Then the resulting reduced graph is
independent of the order of application of
the techniques
Output
• List of nodes collapsed into each of the
machine nodes
• Weight of edges connecting the machine
nodes
• homepages.cae.wisc.edu/~ece556/fall2002/PROJECT/distributed_applications.ppt
Graph partitioning
• Hendrickson and Kolda, 2000: edge cuts:
– are not proportional to the total
communication volume
– try to (approximately) minimize the total
volume but not the total number of messages
– do not minimize the maximum volume and/or
number of messages handled by any single
processor
– do not consider distance between processors
(number of switches the message passes
through, for example)
– undirected graph model can only express
symmetric data dependencies.
Graph partitioning
• To avoid message contention and improve
the overall throughput of the message
traffic, it is preferable to have
communication restricted to processors
which are near each other
Kwok and Ahmad, 1999:
multiprocessor scheduling taxonomy
List Scheduling
• make an ordered list of processes by assigning them some
priorities
• repeatedly execute the following two steps until a valid schedule
is obtained:
– Select from the list, the process with the highest priority for
scheduling.
– Select a resource to accommodate this process.
• priorities are determined statically before the scheduling process
begins. The first step chooses the process with the highest
priority, the second step selects the best possible resource.
• Some known list scheduling strategies:
• Highest Level First algorithm or HLF
• Longest Path algorithm or LP
• Longest Processing Time
• Critical Path Method
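A toy sketch of the two-step loop described above, assuming static priorities and identical resources (task names, priorities, and durations are invented for illustration):

```python
# Minimal list-scheduling sketch: processes are ordered by a static
# priority, then repeatedly (1) the highest-priority unscheduled process
# is selected and (2) the resource that becomes free earliest takes it.
# Task priorities/durations and the resource count are illustrative.
tasks = {"A": (10, 5), "B": (8, 3), "C": (9, 7)}   # name -> (priority, duration)
resource_free_at = [0, 0]                          # two identical resources

schedule = []
# Step 1 of each iteration: pick the highest-priority remaining task.
for name, (prio, dur) in sorted(tasks.items(), key=lambda kv: -kv[1][0]):
    # Step 2: pick the best resource (here, the one that is free earliest).
    r = min(range(len(resource_free_at)), key=resource_free_at.__getitem__)
    start = resource_free_at[r]
    resource_free_at[r] = start + dur
    schedule.append((name, r, start, start + dur))

print(schedule)   # [('A', 0, 0, 5), ('C', 1, 0, 7), ('B', 0, 5, 8)]
```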
Scheduling mechanisms for grid
• Berman, 1998 (ext. by Kayser, 2006):
– Job scheduler
– Resource scheduler
– Application scheduler
– Meta-scheduler
Scheduling mechanisms for grid
• Legion
– University of Virginia (Grimshaw, 1993)
– Supercomputing 1997
– Currently Avaki commercial product
Legion
• is an object oriented infrastructure for grid
environments layered on top of existing
software services.
• uses the existing operating systems,
resource management tools, and security
mechanisms at host sites to implement
higher level system-wide services
• design is based on a set of core objects
Legion
• resource management is a negotiation between
resources and active objects that represent the
distributed application
• three steps to allocate resources for a task:
– Decision: considers task’s characteristics and
requirements, resource’s properties and policies, and
users’ preferences
– Enactment: the class object receives an activation
request; if the placement is acceptable, start the task
– Monitoring: ensures that the task is operating
correctly
Globus
• Toolkit with a set of components that implement basic
services:
– Security
– resource location
– resource management
– data management
– resource reservation
– Communication
• From version 1.0 in 1998 to the 2.0 release in 2002 and
the latest 3.0, the emphasis is to provide a set of
components that can be used either independently or
together to develop applications
• The Globus Toolkit version 2 (GT2) design is highly
related to the architecture proposed by Foster et al.
• The Globus Toolkit version 3 (GT3) design is based on
grid services, which are quite similar to web services.
GT3 implements the Open Grid Service Infrastructure
(OGSI).
• The current version, GT4, is also based on grid services,
but with some changes in the standard
Globus: scheduling
• GRAM: Globus Resource Allocation Manager
• Each GRAM responsible for a set of resources operating under the
same site-specific allocation policy, often implemented by a local
resource management
• GRAM provides an abstraction for remote process queuing and
execution with several powerful features such as strong security and
file transfer
• It does not provide scheduling or resource brokering capabilities but it
can be used to start programs on remote resources, despite local
heterogeneity due to the standard API and protocol.
• Resource Specification Language (RSL) is used to communicate
requirements.
• To take advantage of GRAM, a user still needs a system that can
remember what jobs have been submitted, where they are, and what
they are doing.
• To track large numbers of jobs, the user needs queuing, prioritization,
logging, and accounting. These services cannot be found in GRAM
alone, but are provided by systems such as Condor-G
MyGrid and OurGrid
• Mainly for bag-of-tasks (BoT) applications
• uses the dynamic algorithm Work Queue
with Replication (WQR)
• hosts that finished their tasks are assigned
to execute replicas of tasks that are still
running.
• Tasks are replicated until a predefined
maximum number of replicas is achieved
(in MyGrid, the default is one).
OurGrid
• An extension of MyGrid
• resource sharing system based on peer-
to-peer technology
• resources are shared according to a
“network of favors model”, in which each
peer prioritizes those who have credit in
their past history of interactions.
GrADS
• is an application scheduler
• The user invokes the Grid Routine component to execute an application
• The Grid Routine invokes the component Resource Selector
• The Resource Selector accesses the Globus MetaDirectory Service (MDS)
to get a list of machines that are alive and then contact the Network
Weather Service (NWS) to get system information for the machines.
• The Grid Routine then invokes a component called Performance Modeler
with the problem parameters, machines and machine information.
• The Performance Modeler builds the final list of machines and sends it to
the Contract Developer for approval.
• The Grid Routine then passes the problem, its parameters, and the final list
of machines to the Application Launcher.
• The Application Launcher spawns the job using the Globus management
mechanism (GRAM) and also spawns the Contract Monitor.
• The Contract Monitor monitors the application, displays the actual and
predicted times, and can report contract violations to a re-scheduler.
EasyGrid
• Mainly concerned with MPI applications
• Allows intercluster execution of MPI
processes
Nimrod
• uses a simple declarative parametric modeling language
to express parametric experiments
• provides machinery that automates:
– task of formulating,
– running,
– monitoring,
– collating results from the multiple individual experiments.
• incorporates distributed scheduling that can manage the
scheduling of individual experiments to idle computers in
a local area network
• has been applied to a range of application areas, e.g.:
Bioinformatics, Operations Research, Network
Simulation, Electronic CAD, Ecological Modelling and
Business Process Simulation.
Nimrod/G
AppLeS
• UCSD (Berman and Casanova)
• Application parameter Sweep Template
• Use scheduling based on min-min, min-
max, sufferage, but with heuristics to
estimate performance of resources and
tasks
– Performance information dependent
algorithms (pida)
• Main goal: to minimize file transfers
GRAnD [Kayser et al., CCP&E, 2007]
Vega GOS (the CNGrid OS)
GOS overview
A user-level middleware running on a client
machine
• GOS API
– GOS API for application developers
• grid(): constructs a Grid process on the client machine.
• gridcon(): grid process connects to the Grid system.
• gridclose(): close a connected grid.
– gnetd API for service developer on Grid servers
• grid_register(): register a service to Grid.
• grid_unregister(): unregister a service.
Grid
• Not yet mentioned:
– Simulation: SimGrid and GridSim
– Monitoring: RTM, MonaLisa, ...
– Portals: GridIce, Genius, ...
Introduction to P2P systems
• The application displays other peers that have a copy of Hey Jude
• Alice chooses one of the peers, Bob
• Example: D=2
• 1’s neighbors: 2,3,4,6
• 6’s neighbors: 1,2,4,5
• Squares “wrap around”, e.g.,
7 and 8 are neighbors
• Expected # neighbors: O(D)
CAN : Routing
To get to <n1, n2, …, nD> from <m1, m2, …, mD>,
choose the neighbor with the smallest Cartesian distance to <n1,
n2, …, nD> (e.g., measured from the neighbor’s center)
O(log N)
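A toy sketch of this greedy next-hop rule (the 2-D coordinates are invented; a real CAN node knows only its own zone and its neighbors' zones):

```python
# Minimal sketch of CAN-style greedy routing: among the current node's
# neighbors, forward to the one whose zone-center coordinates are
# closest, in Cartesian distance, to the destination point.
import math

def next_hop(neighbors, destination):
    """neighbors: dict node_id -> center coordinates; destination: tuple."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(neighbors, key=lambda n: dist(neighbors[n], destination))

# Illustrative 2-D zone centers for node 1's neighbors (D = 2 example).
neighbors_of_1 = {2: (0.75, 0.25), 3: (0.25, 0.75), 4: (0.75, 0.75), 6: (0.1, 0.4)}
print(next_hop(neighbors_of_1, (0.9, 0.9)))   # forwards toward node 4
```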
P2P Content Dissemination
Content dissemination
Content dissemination is about allowing
clients to actually get a file or other
data after it has been located
Important parameters
Throughput
Latency
Reliability
P2P Dissemination
Problem Formulation
Least time to disseminate:
Fixed data D from one seeder to N
nodes
Insights / Axioms
Involving end-nodes speeds up the
process (Peer-to-Peer)
Chunking the data also speeds up
the process
Raises many questions
How do nodes find other nodes for
exchange of chunks?
Which chunks should be transferred?
Is there an optimal way to do this?
Optimal Solution in
Homogeneous Network
Figure: least time to disseminate M chunks of data from the seeder
However:
- Sophisticated mechanisms are needed for heterogeneous networks (SplitStream)
- Fault-tolerance issues
BitTorrent
Currently 20-50% of internet traffic is
BitTorrent
Special client software is needed
BitTorrent, BitTyrant, μTorrent, LimeWire …
Basic idea
Clients that download a file at the same time help
each other (ie, also upload chunks to each other)
BitTorrent clients form a swarm : a random
overlay network
BitTorrent : Publish/download
Publishing a file
Put a “.torrent” file on the web: it contains the
address of the tracker, and information about the
published file
Start a tracker, a server that
Gives joining downloaders random peers to download from
and to
Collects statistics about the swarm
There are “trackerless” implementations by using
Kademlia DHT (e.g. Azureus)
Download a file
Install a bittorrent client and click on a “.torrent” file
BitTorrent : Overview
File.torrent contains:
- URL of the tracker
- File name
- File length
- Chunk length
- Checksum for each chunk (SHA1 hash)
Seeder – a peer having the entire file
Leecher – a peer downloading the file
BitTorrent : Client
The client first asks the tracker for 50 random peers
It also learns about what chunks (256K) they have
It picks a chunk and tries to download its pieces
(16K) from the neighbors that have them
The download does not work if the neighbor is disconnected or
denies the download (choking)
Only a complete chunk can be uploaded to others
It allows only 4 neighbors to download from it (unchoking)
Periodically (30s) optimistic unchoking: allows
download to a random peer
important for bootstrapping and optimization
Otherwise it unchokes the peers that allowed it the most
download (every 10s)
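A toy sketch of this choking/unchoking policy (peer names and rates are invented; real clients measure rolling download rates and rotate the optimistic slot on its own timer):

```python
# Minimal sketch of BitTorrent-style unchoking: keep the 4 peers that
# gave us the best download rates unchoked, plus one random "optimistic"
# peer. Peer names and download rates are illustrative.
import random

download_rate = {"p1": 80, "p2": 15, "p3": 60, "p4": 5, "p5": 40, "p6": 55}

def choose_unchoked(rates, regular_slots=4):
    best = sorted(rates, key=rates.get, reverse=True)[:regular_slots]
    others = [p for p in rates if p not in best]
    optimistic = [random.choice(others)] if others else []
    return best + optimistic    # peers allowed to download from us

print(choose_unchoked(download_rate))
```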
BitTorrent : Tit-for-Tat
Tit-for-tat
Cooperate first, then do what the opponent
did in the previous game
BitTorrent enables tit-for-tat
A client unchokes other peers (allow them
to download) that allowed it to download
from them
Optimistic unchoking is the initial
cooperation step for bootstrapping
BitTorrent : Chunk selection
What chunk to select to download?
Clients select the chunk that is rarest among
the neighbors ( Local decision )
Increases diversity in the pieces downloaded;
Increase throughput
Increases likelihood all pieces still available even if
original seed leaves before any one node has
downloaded entire file
Except the first chunk
Select a random one (to make it fast: many
neighbors must have it)
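A toy sketch of the local rarest-first decision (the neighbor bitfields are invented for illustration):

```python
# Minimal sketch of rarest-first chunk selection: among the chunks we
# still need, pick the one owned by the fewest neighbors (a local
# decision based only on the bitfields neighbors advertise to us).
neighbor_chunks = {
    "bob":   {0, 1, 2},
    "carol": {1, 2, 3},
    "dave":  {2, 3},
}
needed = {0, 1, 2, 3}

def rarest_first(neighbors, needed_chunks):
    counts = {c: sum(c in owned for owned in neighbors.values())
              for c in needed_chunks}
    # keep only chunks that at least one neighbor can provide
    available = {c: n for c, n in counts.items() if n > 0}
    return min(available, key=available.get) if available else None

print(rarest_first(neighbor_chunks, needed))   # chunk 0 is rarest (1 owner)
```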
BitTorrent : Pros/Cons
Pros
Proficient in utilizing partially downloaded files
Encourages diversity through “rarest-first”
Extends lifetime of swarm
Works well for “hot content”
Cons
Assumes all interested peers active at same time;
performance deteriorates if swarm “cools off”
Even worse: no trackers for obscure content
Overcome tree structure –
SplitStream, Bullet
Tree: simple, efficient, scalable; but vulnerable to failures, load-unbalanced, no bandwidth constraint
SplitStream: forest (multiple trees)
Bullet: tree (metadata) + mesh (data)
CREW: mesh (data, metadata)
SplitStream
Forest based dissemination
Basic idea
Split the stream into K stripes (with MDC coding)
For each stripe create a multicast tree such that
the forest
Contains interior-node-disjoint trees
Respects nodes’ individual bandwidth constraints
Approach
On the Pastry and Scribe(pub/sub)
SplitStream : MDC coding
Multiple Description coding
Fragments a single media stream
into M substreams (M ≥ 2 )
K packets are enough for decoding (K < M)
Less than K packets can be used to
approximate content
Useful for multimedia (video, audio) but not for
other data
Cf) erasure coding for large data file
SplitStream : Interior-node-
disjoint tree
Each node in a set of trees is interior
node in at most one tree and leaf node
in the other trees.
Each substream is disseminated over
subtrees
BiModal Multicast (99), Lpbcast (DSN 01), Rodrigues’04 (DSN), Brahami ’04, Verma’06 (ICDCS),
Eugster’04 (Computer), Koldehofe’04, Periera’03
Gossip-based Broadcast:
Drawbacks
Problems
More faults: a higher fanout is needed (not dynamically adjustable)
Higher redundancy: lower system throughput and slower dissemination
Scalable view & buffer management
Adapting to nodes’ heterogeneity
Adapting to congestion in underlying network
CREW: Preliminaries
Figure: the origin node N0 splits file F into subfiles F1, F2, F3, ..., Fn-1, Fn and sends subfile Fi to node Ni
Origin node N0 opens n concurrent connections to
nodes N1, … , Nn and sends to each node the
following items:
a distribution list of nodes R = {N1, … , Nn} to which subfile
Fi has to be sent on the next step;
subfile Fi .
FastReplica : Collection
Figure: node N1 sends its subfile F1 to nodes N2, N3, ..., Nn-1, Nn; the other nodes do the same with their own subfiles
After receiving Fi , node Ni opens (n-1) concurrent
network connections to remaining nodes in the
group and sends subfile Fi to them
FastReplica : Collection
(overall)
Figure: overall collection step; each node Ni ends up with all subfiles F1, F2, F3, ..., Fn-1, Fn
Benefits:
The impact of congestion along the involved paths
is limited, since each path carries only 1/n-th of the file;
FastReplica takes advantage of the upload and
download bandwidth of the recipient nodes.
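A toy, in-memory sketch of the two FastReplica steps described above (node names and data are invented; a real deployment transfers subfiles over network connections):

```python
# Minimal sketch of FastReplica on in-memory data: the origin splits the
# file into n subfiles, sends subfile Fi to node Ni (distribution), and
# each Ni then forwards Fi to every other node (collection).
def split(data, n):
    size = -(-len(data) // n)                      # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]

def fast_replica(data, nodes):
    subfiles = split(data, len(nodes))
    storage = {node: {} for node in nodes}
    # Distribution: origin N0 sends subfile Fi to node Ni.
    for i, node in enumerate(nodes):
        storage[node][i] = subfiles[i]
    # Collection: each node forwards its subfile to all the other nodes.
    for i, node in enumerate(nodes):
        for other in nodes:
            if other != node:
                storage[other][i] = subfiles[i]
    # Every node can now reassemble the full file from its subfiles.
    return {node: b"".join(parts[i] for i in sorted(parts))
            for node, parts in storage.items()}

print(fast_replica(b"abcdefghij", ["N1", "N2", "N3"]))
```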