MTorrent: A Multicast Enabled BitTorrent Protocol
A thesis submitted in partial fulfilment
of the requirements for the degree
Bachelor-Master of Technology (Dual Degree)
by
Piyush Agrawal
Y3167218
Supervised By
Prof. R. K. Ghosh
Department of Computer Science and Engineering

Indian Institute of Technology, Kanpur
May 16, 2008
Dedicated to
My family and
a special friend.
Contents
1 Introduction
1.1 Models of Content Distribution
1.1.1 The World-Wide Web
1.1.2 Peer-to-Peer Systems
1.2 The problem of data repetivity
1.3 Related Work
1.3.1 Peer-to-Peer Systems
1.3.2 BitTorrent Protocol
1.3.3 BitTorrent Location-aware Protocol
1.3.4 Caching techniques
1.4 Contributions
1.5 Organization
2 Emulab
2.1 Need for Emulab
2.2 Hardware and Software Resources
2.3 Access to Emulab
2.4 Specification of topology
2.5 The Emulab Control Network
3 Performance Study of Content Distribution Models
3.1 Experimental Setup
3.1.1 Network Topology
3.1.2 Performance Metrics
3.2 Used implementations of content distribution models
3.3 Performance Evaluation
3.3.1 WWW Model
3.3.2 Peer-to-Peer Model
3.4 Motivation for MTorrent
4 Multicast
4.1 IP Multicast
4.2 IP Multicast as a Content Distribution Model
4.3 Observations
5 MTorrent
5.1 Overview of the protocol
5.2 Performance Evaluation
5.2.1 Topology
5.2.2 IP Multicast support on islands
5.2.3 Performance Metrics
5.2.4 Comparison of MTorrent, BitTorrent and WWW
5.2.5 MTorrent and Multicast Unreliability
6 Conclusions
List of Figures
1.1 Concurrent downloads cause heavy load on server bandwidth and network resources
2.1 Screen shot showing operations available through the Emulab portal
2.2 A simple topology to illustrate the use of NS2 scripts
2.3 Control network interfaces (blue) to an experiment topology (red). Source: [4]
3.1 The experimental setup used for the performance study
4.1 Multicast Transmission Sends a Single Multicast Packet Addressed to All Intended
Recipients (Source: [8])
5.1 Flow chart showing MTorrent operations
5.2 Topology used for performance evaluation
5.3 Cumulative Distribution Function of time for download by each client
5.4 Amount of data transferred over each link using MTorrent, BitTorrent or WWW
5.5 Stress on link using MTorrent, BitTorrent or WWW
5.6 Time for download with Packet Loss Percentage of each LAN
5.7 Average amount of data transferred with Packet Loss Percentage of each LAN
5.8 Link stress with Packet Loss Percentage of each LAN
5.9 Time for download with varying congestion level
List of Tables
3.1 Link Statistics for file download using Wget
3.2 Link Statistics for file download using BitTorrent
5.1 Mapping between link name and link label
5.2 Download time (in seconds) statistics of MTorrent, BitTorrent and WWW
5.3 Amount of data (in Megabytes) statistics for all type of links
5.4 Amount of data (in Megabytes) statistics for core links
5.5 Amount of data (in Megabytes) statistics for access links
5.6 Stress statistics for all type of links
5.7 Stress statistics for core links
5.8 Stress statistics for access links
Acknowledgements
This work would not have been possible without the constant support and encouragement
of my thesis supervisor, Prof. Ratan Kumar Ghosh. I am highly indebted to him
for discussing the ideas at great length and for encouraging me all through.
I would also like to thank Dr. Venkata N. Padmanabhan of Microsoft Research for
discussing various ideas with me and pointing me to current state-of-the-art research in
this field. Special thanks to the Flux Group at the University of Utah, which hosts the
Emulab network testbed, for allowing me to use their facility for carrying out experimental
studies. The prompt help from the Emulab support staff helped me progress smoothly
through my research.
I would also like to thank Hitesh Khandelwal, who worked with me during the later
part of this thesis. His fresh ideas and energy often helped us break out of deadlocks.
My friends at IIT Kanpur have been very understanding and helpful throughout. I
especially thank all my wing mates, who unfailingly shared the ups and downs of my
journey during the last five years.
I would like to thank my parents and my elder brother for encouraging me to make
my own decisions and for helping me maintain a balanced perspective in life.
I cannot end without thanking IIT Kanpur in general and the CSE department
in particular. The environment of this place always had something to offer. My stay
here, though demanding, was enjoyable, fruitful and memorable.
Piyush Agrawal
Abstract
Over the last decade, the Internet has grown exponentially. With more and more
people using it, efficient data delivery over the Internet has become a key issue.
Various specialized content delivery systems, such as content delivery networks
(CDNs) and peer-to-peer (P2P) file sharing systems, have been used to achieve better
performance in data delivery. While each of these specialized systems has improved
the user experience, some of their fundamental shortcomings have often been
neglected. These shortcomings have become severe bottlenecks in the scalability of
the Internet. In this work, we focus on the problem of repetivity of the data
transmitted by several state-of-the-art content distribution systems and show how
severely it impacts network economics and the end-user experience. We base our
findings on real-world, large-scale measurement studies conducted on Emulab, a
network testbed hosted by the University of Utah. We observe that P2P systems like
BitTorrent have gained popularity in recent years and have been widely deployed. We
then focus on the problem of repetivity in BitTorrent and propose a novel solution
that restricts it by exploiting IP Multicast functionality wherever available. We call
the proposed protocol MTorrent. Compared to the popular and widely deployed
BitTorrent system, our approach leads to a 44% reduction in download time, a 65%
reduction in traffic load on Internet links and a 40% reduction in the download of
redundant packets. The new protocol is interoperable with BitTorrent and requires
only a few changes at the client end. We have implemented a prototype of our protocol
over a network consisting of 44 nodes, spread over 9 LANs, with 36 end clients.
Chapter 1
Introduction
Over the last decade, the Internet has grown exponentially. With more and more
people using it, efficient data delivery over the Internet has become a key issue.
Various specialized Content Delivery Systems (CDS) have been used to achieve
better performance and scalability in data delivery. While such specialized systems
have improved the user experience, some of their fundamental shortcomings have been
either unattended or neglected. Since CDS have been deployed massively over the
Internet, these shortcomings have severely impacted its scalability.
The need to scale content delivery systems has been felt continuously and has led to
the development of thousand-node clusters, global-scale content delivery networks and,
more recently, self-managing peer-to-peer structures. These content delivery mechanisms
have changed the nature of Internet content delivery and traffic. Therefore, to exploit
the full potential of the modern Internet, a detailed understanding of these new
mechanisms and the data they serve is required.
Content distribution on the Internet uses many different service architectures,
ranging from centralized client-server to fully distributed systems. The recent widespread
use of peer-to-peer applications such as SETI, Napster, and Gnutella indicates that there
are many potential benefits to deploying a fully distributed peer-to-peer system. Two of
the obvious advantages are greater resilience and higher availability through large-scale
replication of content at a large number of peers. Current work on peer-to-peer content
location has focused on designing scalable algorithms. However, in a heterogeneous
environment such as the Internet, performance is an equally important consideration.
1.1 Models of Content Distribution
Several content distribution models have been proposed. The two main models are
the client/server-oriented World-Wide Web (WWW) and peer-to-peer networks. At a high
level, the purpose of each of these models is to enable efficient delivery of data to
end users. Efficiency can be measured in terms of metrics such as the total download
time and the load distribution on the network. At the architectural level, these systems
differ significantly. These differences affect not only performance but also the class of
applications each model suits: each system performs optimally on some applications
and sub-optimally on others.
1.1.1 The World-Wide Web
Data transfer over the World-Wide Web is the simplest and most widely used
method. The architecture is simple: a web server hosts the content, and clients use
the HTTP protocol [19] to request content from the server. Each client requests content
from the server independently, and no coordination amongst the clients is assumed.
Several studies have examined various aspects of the web, including web workloads [15]
and the characteristics of web objects [16]. They suggest that most of these objects are
small (5-10 KB), but the distribution of object sizes is heavy-tailed and very large
objects also exist.
1.1.2 Peer-to-Peer Systems
Peer-to-peer file sharing systems have become increasingly popular in recent years. In a
P2P system, peers collaborate to form a distributed system for exchanging content.
Peers that connect to the system behave both as servers and as clients. A file that one
peer downloads is often made available for upload to other peers. Participation is
purely voluntary. However, a recent study [26] has shown that most content-serving
hosts run by end users suffer from low availability and have relatively low-capacity
network connections (modems, cable modems, or DSL routers).
Users interact with a P2P system in two ways: they attempt to locate objects of
interest by issuing search queries, and once relevant objects have been located, they
issue download requests for the content. P2P systems differ in how downloads proceed
once an object of interest has been located. Most systems transfer content over a direct
connection between the object provider and the object seeker. A latency-improving
optimization in some systems is to download multiple object fragments in parallel from
multiple replicas. A recent study [21] has found the peer-to-peer traffic of a small ISP
to be highly repetitive, exhibiting great potential for caching and other techniques for
enhancing performance.
Figure 1.1: Concurrent downloads cause heavy load on server bandwidth and network
resources
1.2 The problem of data repetivity

Consider the scenario shown in Figure 1.1. The network topology contains a file server
hosting a file to be downloaded by 9 clients. The file server is connected to a core
router, which is in turn connected to three access routers. All the clients are connected
to the access routers.
Each client establishes an independent TCP connection to the file server to fetch the
file. If all the clients need to download the file at the same time, nine parallel TCP
connections with the file server as the source have to be started. This means that the
server opens 9 different sockets, one per TCP connection, and transmits essentially the
same data through each of them. Thus, nine exact copies of the file are sent across the
link connecting the file server and the core router. The core router in turn sends 3
copies of the same data on each of the access links, one for each of the three clients
behind that access router.
Now imagine that the number of interested clients increases from nine to a few
hundred. This is common when new files (such as movies) are hosted on websites or
when software companies release critical security patches. In that case, a large share
of the bandwidth of the server and of the access routers is wasted on duplicate
transfers. Each client then gets a low download rate and a bad user experience.
We call this the problem of data repetivity and work towards solving it by
proposing MTorrent.
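The arithmetic of this scenario can be sketched in a few lines of Python. The sketch is only an illustration: the function names and link labels are hypothetical, and it assumes three clients behind each of the three access routers, which is consistent with the 9 clients and the 3 copies per access link described above. For comparison it also computes a hypothetical lower bound in which every link carries the file at most once.

```python
def unicast_copies(clients_per_access_router):
    """Copies of the file crossing each link when every client fetches
    the file over its own TCP connection to the server."""
    copies = {"server->core": sum(clients_per_access_router)}
    for i, clients in enumerate(clients_per_access_router, start=1):
        copies[f"core->access{i}"] = clients
    return copies


def single_copy_lower_bound(clients_per_access_router):
    """Hypothetical lower bound: a link carries the file once if at
    least one client sits behind it, and not at all otherwise."""
    return {link: min(n, 1)
            for link, n in unicast_copies(clients_per_access_router).items()}
```

For the layout assumed here, `unicast_copies([3, 3, 3])` reports 9 copies on the server link and 3 on each access link, while the lower bound is a single copy per link. With a hundred clients behind each access router, the count on the server link grows to 300 while the lower bound stays at one.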
1.3 Related Work
1.3.1 Peer-to-Peer Systems
A peer-to-peer computer network relies on the diverse connectivity between participants
in the network and on their cumulative bandwidth, rather than on conventional
centralized resources where a relatively small number of servers provide the core value
of a service or application. Peer-to-peer networks are typically used for connecting nodes
via largely ad hoc connections. Such networks are useful for many purposes, such as
sharing content files containing audio, video, data or anything else in digital format,
and carrying real-time data such as telephony traffic.
Peer-to-peer networks can be classified by what they are used for:
• file sharing
• voice data (telephony)
• media streaming (audio, video)
• discussion forums
Peer-to-peer networks can also be classified according to their degree of centralization.
In 'pure' peer-to-peer networks:
• Peers act as equals, merging the roles of client and server
• There is no central server managing the network
• There is no central router
Some examples of pure peer-to-peer application-layer networks designed for file sharing
are Gnutella and Freenet. There also exist countless hybrid peer-to-peer systems, which
share the following characteristics:
• A central server keeps information on peers and responds to requests for
that information.
• Peers are responsible for hosting the available resources (as the central server does
not have them), for letting the central server know which resources they want to share,
and for making their shareable resources available to peers that request them.
A popular example of a hybrid peer-to-peer application is BitTorrent [18].
1.3.2 BitTorrent Protocol
Consider a scenario where we need to transfer a piece of data quickly to a group of
people interested in it. Such scenarios arise when software companies release critical
patches (e.g. security patches and kernel updates for an OS) to be applied to all
customer machines, when network administrators install new software on a large number
of machines, or when distributors publish digital content such as movies, videos and
music. In such scenarios, the transfer must take place as fast as possible, the load on
the distributor must not be high, and the whole transfer process must scale to a large
number of downloaders.
When a file is made available for download over HTTP, the entire upload cost is
placed on the server hosting the file. In scenarios like those mentioned above, this leads
to a serious scalability problem: as the number of simultaneous downloaders increases,
the HTTP server has to upload the same data to more and more clients using unicast.
The network links near the server become the bottleneck, thus increasing the download
time. BitTorrent is a peer-to-peer file sharing protocol, originally designed and
implemented by Bram Cohen [18]. With BitTorrent, when multiple clients download the
same file at the same time, they upload pieces of the file to each other, redistributing
the cost of upload to the downloaders. Thus, BitTorrent makes hosting a file with a
potentially unlimited number of downloaders affordable.
To start a BitTorrent deployment, a static file with a .torrent extension, accessible
to all downloaders, is placed on an ordinary server. The torrent file contains
information about the file, such as its length, name and hashing information, and the
URL of the tracker. Trackers help downloaders find each other, speaking a simple
protocol layered on top of HTTP. On initialization, a downloader contacts the tracker,
sending information such as the name of the file it is downloading and the port on which
it is listening. The tracker then responds with a list of peers which are also downloading
the same file. Based on this list, the downloaders connect to each other. To start the
download, at least one downloader that already has the complete file (called a seeder)
must be started. The role of the tracker is essentially limited to helping peers find each
other and keeping statistics, so the load on it is minimal. In fact, even if the tracker goes
offline after all the downloaders have started, the protocol is not severely affected.
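The tracker's role described above can be illustrated with a toy in-memory sketch. This is not the real tracker protocol, which runs over HTTP; the class name `ToyTracker`, its `announce` method and the `max_peers` cap are all hypothetical names chosen for illustration. The sketch only shows the bookkeeping: remember who announced which file, and reply with a random list of other peers.

```python
import random


class ToyTracker:
    """In-memory stand-in for a tracker: it only remembers which peers
    announced which torrent and hands out random peer lists."""

    def __init__(self, max_peers=50, rng=None):
        self.max_peers = max_peers        # assumed cap on the reply size
        self.rng = rng or random.Random()
        self.swarms = {}                  # torrent id -> set of (host, port)

    def announce(self, torrent_id, host, port):
        """Register the caller and return a random list of other peers."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {(host, port)})
        swarm.add((host, port))
        count = min(self.max_peers, len(others))
        return self.rng.sample(others, count)
```

The first peer to announce receives an empty list, which is why at least one seeder has to be running before any data can flow. Note also that the reply is a random sample, with no regard to where the peers are located.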
To keep track of which peers have what, BitTorrent partitions each file into
pieces of fixed size, typically 256 KB each. Each downloader reports to its peers which
pieces it has. The SHA1 hashes of all pieces are included in the torrent file, and these
hashes are used to verify the integrity of each downloaded piece.
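A minimal sketch of this piece bookkeeping, using Python's standard `hashlib` module, is given below. The helper names are hypothetical and the piece size of 256 KiB is an assumed value for illustration. The sketch covers only splitting and hash verification, not the torrent file format itself.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # assumed piece size in bytes


def split_into_pieces(data, piece_size=PIECE_SIZE):
    """Cut the file contents into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA1 digest of every piece, in order, as the publisher would compute them."""
    return [hashlib.sha1(piece).digest()
            for piece in split_into_pieces(data, piece_size)]


def verify_piece(index, piece, expected_hashes):
    """True if a downloaded piece matches the published hash for its index."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]
```

A downloader would call `verify_piece` on every piece it receives and discard any piece that fails the check before offering it to other peers.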
Selecting pieces to download in a good order is important for good performance.
For example, a poor piece selection criterion can result in all peers downloading
the same set of pieces, leaving none of them with any piece to upload to the others.
BitTorrent follows a strict priority rule: once a single sub-piece has been requested,
the remaining sub-pieces of that particular piece are requested before sub-pieces of
any other piece. Several piece selection criteria have been suggested, including the
following:
Rarest First: Under this policy, peers download the pieces which are rarest amongst
their own peers. This ensures that downloaders have pieces which their peers
want. Pieces which are commonly available amongst the peers are left for later
download, which reduces the likelihood that a peer currently offering uploads will
later have nothing of interest.
Random First: When downloading starts, a peer has nothing to upload, so it is
important to get a complete piece as quickly as possible. The pieces to download
are selected at random until the first complete piece has been assembled. After
that, the strategy changes to rarest first.
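The two policies above can be combined into one selection function, sketched below. The function name and the data layout (a set of owned piece indices and a mapping from peer to the set of pieces it advertises) are assumptions made for illustration, and ties among equally rare pieces are broken at random.

```python
import random
from collections import Counter


def choose_piece(have, peer_pieces, rng=random):
    """Pick the next piece index to request, or None if nothing is useful.

    have        -- set of piece indices this downloader already holds
    peer_pieces -- dict mapping each peer to the set of pieces it advertises
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)
    candidates = [p for p in availability if p not in have]
    if not candidates:
        return None
    if not have:
        # Random First: no complete piece yet, so any piece will do.
        return rng.choice(candidates)
    # Rarest First: prefer pieces held by the fewest peers.
    rarest = min(availability[p] for p in candidates)
    return rng.choice([p for p in candidates if availability[p] == rarest])
```

For example, if a downloader holds only piece 2 and its peers advertise `{0, 1, 2}`, `{1, 2}` and `{2}`, piece 0 is held by a single peer and is therefore requested first.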
The peers are responsible for maximizing their own download rates. They do
this by downloading from whichever peers they can and by deciding which peers to
upload to via a variant of tit-for-tat. To cooperate, peers upload; to not cooperate,
they choke peers. The BitTorrent choking algorithm attempts to achieve Pareto
efficiency¹ by having peers reciprocate, uploading to the peers which upload to them.
Unused connections are also uploaded to on a trial basis to see whether better transfer
rates can be found through them.
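A simplified sketch of this reciprocation rule follows. It is an illustration rather than the choking algorithm of any particular client: the function name, the number of regular upload slots and the single trial slot are assumptions. Peers are ranked by the rate at which they currently upload to us, the best ones are unchoked, and one further peer is unchoked at random as the trial connection.

```python
import random


def select_unchoked(download_rates, regular_slots=3, rng=random):
    """Return the set of peers to upload to in the next round.

    download_rates -- dict mapping each peer to the rate at which it
                      currently uploads to us (any consistent unit)
    regular_slots  -- assumed number of tit-for-tat upload slots
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])   # reciprocate the best uploaders
    others = ranked[regular_slots:]
    if others:
        unchoked.add(rng.choice(others))     # one trial (optimistic) slot
    return unchoked
```

Every peer not in the returned set is choked for that round. Re-running the selection periodically lets a trial peer that turns out to upload quickly earn a regular slot.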
Although BitTorrent is a novel peer-to-peer protocol, it has the following drawbacks
as well:
• For small files, BitTorrent tends to show higher latency and overhead.
• Even though several downloaders might be physically close to each other and
downloading the same file (e.g. several clients on a LAN downloading a software
patch), the tracker returns a random list of peers for a new downloader to connect
to. This wastes resources, because peers close to each other redundantly download
the same pieces.
¹ Given a set of alternative allocations of, say, goods or income for a set of individuals, a movement
from one allocation to another that can make at least one individual better off without making any
other individual worse off is called a Pareto improvement. An allocation is Pareto efficient or Pareto
optimal when no further Pareto improvements can be made.
1.3.3 BitTorrent Location-aware Protocol
As mentioned above, the original BitTorrent protocol can lead to geographically
distant peers exchanging data even when nearby peers are present, leading to
suboptimal performance. A location-aware BitTorrent protocol has been proposed
in [1]. However, the proposal is in a very loose form, with no real-world implementation
or performance results. It requires each BitTorrent client to supply its approximate
geographical location (longitude and latitude) when contacting the tracker to get the
peer list. The tracker knows the geographical locations of all downloaders and thus
returns to the requester a list of the peers closest to it, instead of a random list (as the
original BitTorrent tracker does).
Several issues arise here. Firstly, this protocol is not compatible with the original
BitTorrent protocol and requires changes at the trackers. Secondly, it is not realistic
to assume that the geographical location of a client is known. Thirdly, clients located
close to each other geographically may not have a fast network link between them
and might be separated by several routing hops. Finally, the absence of any
implementation of this protocol makes one skeptical about its relative performance
gain.
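To make the idea concrete, the following sketch shows how a tracker could order peers by geographical distance. It is not taken from the proposal in [1]; the function names are hypothetical, the great-circle (haversine) distance is one plausible choice of metric, and the coordinates in the usage note are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(a, b):
    """Haversine distance between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_peers(requester, peers, count):
    """Names of the `count` peers geographically closest to the requester.

    peers -- dict mapping a peer name to its (latitude, longitude)
    """
    return sorted(peers, key=lambda name: great_circle_km(requester, peers[name]))[:count]
```

A requester near Lucknow, roughly (26.85, 80.95), would be handed peers near Kanpur and Delhi before a peer in Utah. The third issue listed above is visible in the sketch: the ordering depends only on coordinates and says nothing about the capacity of the links or the number of routing hops between the peers.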
1.3.4 Caching techniques
Traffic generated by P2P systems accounts for a major fraction of Internet traffic.
Its sheer volume has led to negative consequences, including: (1) significantly
increased load on the Internet backbone and hence higher chances of congestion; and
(2) increased cost for Internet Service Providers (ISPs) and hence higher service charges
for Internet users. A potential solution for overcoming these difficulties is to cache
P2P traffic so that future requests for the same objects can be served from a local
cache.
There has been a good deal of research [21, 20] highlighting the importance and
feasibility of caching P2P traffic. P2P traffic is found to be highly repetitive and thus
responds well to caching [21]. Most current P2P protocols are not ISP-friendly, because
they impose unnecessary traffic on ISPs [20]. To solve this, researchers have suggested
deploying smart caching techniques or making P2P protocols locality-aware [29, 25].
1.4 Contributions
The main contribution of this work is the proposal, implementation and performance
analysis of MTorrent, a new protocol for content distribution which can co-exist with
the standard BitTorrent protocol. MTorrent exploits IP Multicast functionality
wherever available to solve the problem of data repetivity.
We carried out extensive experiments on Emulab [28] (a network testbed) to determine
the effect of data repetivity on today’s networks. We found that state-of-the-art
P2P content distribution protocols like BitTorrent still suffer from severe data repetivity
on the most commonly configured network topologies. On the basis of this observation,
we designed MTorrent. Its key features are the simplicity of its design (and thus ease
of implementation) and its interoperability with other BitTorrent clients.
We then carried out a detailed experimental study of data repetivity in MTorrent
and compared it with that of BitTorrent and the client-server model. Our experiments
show that, compared to BitTorrent, MTorrent reduces the download time by 44%, the
load on Internet links by 65%, and the download of redundant data packets by as
much as 40%.
1.5 Organization
This thesis is organized as follows. In Chapter 2, we discuss the issues in planning
experiments on Emulab [28], a large-scale network testbed. In Chapter 3, we present
the results of a large-scale experimental study of the performance of each of the
content distribution models and provide the motivation for MTorrent. In Chapter 4,
we examine how IP Multicast can be leveraged for large-scale content distribution.
In Chapter 5, we present the design of MTorrent and compare its performance against
BitTorrent and the WWW-based model. We conclude in Chapter 6 with a discussion
of the implications of this work.
Chapter 2
Emulab
Emulab¹ [28] is a network testbed run by the School of Computing at the University of
Utah. It provides researchers working in areas like networking and distributed systems
with an environment in which they can develop, debug and evaluate their systems.
The facility is available free of cost to most researchers. It provides integrated
access to a wide range of experimental environments, including emulation, live-Internet
experimentation and simulation.
This chapter discusses the issues involved in planning experiments on Emulab. It
first explains why emulation is preferable to simulation, especially in research on
networks and distributed systems. It then describes how experiments can be planned
and conducted on Emulab. It also identifies certain problems which an experimenter
may encounter while conducting an Emulab experiment.
¹ http://www.emulab.net
2.1 Need for Emulab

Researchers working in networking and distributed systems often face several
challenges in performing large-scale experiments to check the validity of their systems.
Below, we present some of the common ways in which papers (not using a facility
like Emulab) present their results:
• “We evaluated our system on five nodes.”: Many inexperienced researchers end up
developing highly complicated systems but testing them only on a very small scale
like this. The results presented in many papers are based merely on pilot studies
conducted on small testbeds comprising just 5-10 nodes.
• “We evaluated our Web proxy design with 10 clients on 100Mbit Ethernet.”: Most
universities and research labs today have access to high-speed local area
networks (LANs). There is a general tendency to implement new research prototypes
and evaluate their performance on such high-speed, error-free and congestion-free
networks. Therefore, when systems based on these prototypes are actually
deployed on the real-world Internet, which is full of packet re-ordering, congestion
etc., most of them end up with unrecoverable failures.
• “Simulation results indicate ...”: Due to the lack of access to a large-scale network
testbed on the Internet, a natural tendency is to convert research ideas into
network simulations (using tools like ns-2, OPNET etc.) and evaluate the
performance on simulated large-scale topologies. However, this approach has two main
drawbacks: (1) Simulated networks are unable to accurately model all the
phenomena happening at the various networking layers. For example, simulation
results often ignore memory and CPU demands on individual nodes, or ignore
interrupt handling overhead. (2) Most network simulators require prototype
applications to be written in specific programming languages, using a particular
set of APIs provided by the simulator. For example, applications evaluated with
the ns-2 network simulator must be written against its own C++ and Tcl interfaces.
As a result, applications prepared for such evaluations cannot be used in the real
world, and a separate implementation might be needed to run them on real machines.
• “Experimental network X runs FreeBSD 2.2.x...”: Small-scale network testbeds
have been established at many research facilities. However, the software and
hardware available on these testbeds are extremely inflexible, being tied to
particular operating systems etc.
It is clear that the diverse requirements of network and distributed systems research
cannot be met by a single experimental environment. Packet-level discrete event
simulation and live network experimentation are two extremes. The benefit of simulations
is that they provide a controlled and repeatable environment for experiments. However,
it is well known that the level of abstraction of simulators is generally too high to
capture various low-level effects, such as the impact of interrupts under heavy load, as
mentioned earlier. On the other hand, live networks are closer to reality, but provide
neither a repeatable environment nor the ability to modify internal router behaviour.
Emulation [14, 27, 24] is a hybrid approach that subjects real applications, protocols,
and operating systems to a synthetic network environment. There are some single-node
wide area network emulators like Dummynet [24], which can introduce artificial
delays, losses, and bandwidth constraints in a controlled manner. However, they require
tedious manual configuration, forcing researchers to concentrate on the environment
rather than on their experiments.
2.2 Hardware and Software Resources

The Emulab software controls a cluster of around 200 PCs. Any of these PCs can
function as an edge node, a traffic generator, or a router. Each machine has five
100 Mbit/s Ethernet interfaces. One of them is on a dedicated control and data
acquisition network, while the others are for arbitrary use by experiments.
There is ample disk space and memory available on each node to support
computation and logging of monitoring data. To support arbitrary and isolated
topologies and to provide security to Emulab users, Virtual LANs (VLANs) are employed.
A VLAN is a switch technology that restricts traffic to the subnet defined by its members.
Emulab uses Dummynet and VLANs to emulate wide-area links within the local-area
environment. A Dummynet node is automatically inserted between two physical
nodes and enforces queue and bandwidth limitations, introducing delays and packet loss.
Dummynet nodes act as Ethernet bridges and are transparent to experimental traffic.
2.3 Access to Emulab

A new project on Emulab can be started by submitting a simple web form, which is then
approved by the Emulab staff. Subsequently, the web interface acts as the universally
accessible portal to Emulab. An experimenter can create or terminate experiments, view
the corresponding virtual topology, or configure node properties through this portal.
Figure 2.1 shows a screenshot of the controlling portal.
Figure 2.1: Screen shot showing operations available through the Emulab portal
2.4 Specification of topology

An Emulab experiment can be configured with a Network Simulator (NS2) [9] script
written in Tcl. The choice of NS2 facilitates validation and comparison, since
NS-specified topologies, traffic generation, and events can be reproduced in an emulated
or wide-area environment.
The following NS snippet illustrates the specification of the simple network topology
shown in Figure 2.2:
# This is a simple ns script. Comments start with #.
set ns [new Simulator]
source tb_compat.tcl
set nodeA [$ns node]
set nodeB [$ns node]
set nodeC [$ns node]
set nodeD [$ns node]
set link0 [$ns duplex-link $nodeB $nodeA 30Mb 50ms DropTail]
tb-set-link-loss $link0 0.01
set lan0 [$ns make-lan "$nodeD $nodeC $nodeB " 100Mb 0ms]
# Set the OS on a couple.
tb-set-node-os $nodeA FBSD-STD
tb-set-node-os $nodeC RHL-STD
$ns rtproto Static
# Go!
$ns run
Figure 2.2: A simple topology to illustrate the use of NS2 scripts
Emulab also provides a Java GUI, which can alternatively be used to generate the
topologies. The GUI in turn generates an NS configuration file. In addition, several
standard topology generator tools, such as GT-ITM [30] and BRITE [23], may be
used to generate an NS script.
2.5 The Emulab Control Network

Every physical node in the testbed has one interface connected to a common 100Mb
LAN. This interface is used by the testbed infrastructure to configure experiments (e.g.
distribute account info, load disks, etc.). The experimenters communicate with the nodes
from outside Emulab or from users.emulab.net (e.g. through ssh) using this interface.
This interface can also be used to monitor activity during an experiment. Control net
links have fixed addresses and are not configured by the user.
The control network is distinct from the experimental network, which is the set
of links specified in the experiment topology over which experiment applications should
communicate. The difference can be understood with the help of the following NS2
snippet, which provides the configuration for a simple two-nodes-and-a-router topology:
source tb_compat.tcl
set ns [new Simulator]
set node1 [$ns node]
set router [$ns node]
set node2 [$ns node]
set linkA [$ns duplex-link $node1 $router 100Mb 0ms DropTail]
set linkB [$ns duplex-link $router $node2 1Mb 10ms DropTail]
$ns rtproto Static
$ns run
The above user-specified topology is instantiated as depicted within the dashed box in
Figure 2.3.
Figure 2.3: Control network interfaces (blue) to an experiment topology (red).
Source: [4]
A typical node in the experiment (e.g., node2) can be accessed from anywhere
on the Internet or from the “users.emulab.net” machine, either via its Emulab name (e.g.,
“pc12.emulab.net”) or via the DNS alias which is assigned when the experiment is
created (e.g., “node2.foo.testbed.emulab.net”). Here, “foo” refers to the name of the
experiment and “testbed” is the name of the project. Both types of names use the fixed
100Mb control network link (IP: 155.101.132.12) to access the node. These control net
links connect experiment nodes to outside entities in Figure 2.3.
From inside an experiment node, things look different. Since the control network is
not a part of the user-specified topology, applications running inside the experiment
should ideally not even be aware of its existence. However, for
several practical reasons, the control net must be visible to experiment nodes. Most
importantly, the control network allows login from remote sites.
As the control net is visible to applications on a node, it can be used inadvertently
by applications. In the above example, consider a ping from node1 to node2. The
traffic is expected to pass through the router and over the experiment links to
node2. Since linkB adds a delay of 10 ms in each direction, the expected round-trip
time is about 20 ms:
node1.foo.testbed.emulab.net> ping node2
PING node2-linkB (10.1.1.2) from 10.1.2.2 : 56(84) bytes of data.
64 bytes from node2-linkB (10.1.1.2): icmp_seq=1 ttl=63 time=20.3 ms
64 bytes from node2-linkB (10.1.1.2): icmp_seq=2 ttl=63 time=20.3 ms
...
But if the ping travels over the control network rather than the experimental network,
there will be no delay:
node1.foo.testbed.emulab.net> ping node2.foo.testbed.emulab.net
PING pc12.emulab.net (155.101.132.12) from 155.101.132.10 : 56(84) bytes of data.
64 bytes from pc12.emulab.net (155.101.132.12): icmp_seq=1 ttl=64 time=0.291 ms
64 bytes from pc12.emulab.net (155.101.132.12): icmp_seq=2 ttl=64 time=0.124 ms
...
Notice that the correct ping command uses an unqualified local name (node2),
while the incorrect one uses the fully qualified name (node2.foo.testbed.emulab.net).
Fully qualified names are resolved by the Emulab nameserver
and should be used for accessing nodes from outside the Emulab network, for control
purposes. Unqualified names are resolved from a local /etc/hosts file that is created on
each node; they should therefore be used from within the experiment. Figure 2.3 shows
the various names by which each node can be addressed, and the interface to which each
name resolves. Accidental use of the control net interface in an experiment can occur
when a fully qualified name is used in place of a local name, or when the application
chooses an interface on its own.
During our work with Emulab, we faced the above problems several times because we
did not understand this distinction, and learnt the correct usage the hard way.
Subsequently, we modified several of the implementations used in our experiments, such
as the BitTorrent client and tracker, which by default selected the first interface and
therefore sometimes used the control network.
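The interface-selection problem described above can be illustrated with a short Python sketch. It is not the change made to the BitTorrent client or tracker. It assumes that experimental interfaces are numbered from 10.0.0.0/8 (as in the ping output above), and the function name pick_experiment_address is hypothetical.

```python
import ipaddress

# Assumption: experimental links are numbered from 10.0.0.0/8, as in the
# ping output above (10.1.1.2); the control network uses other addresses.
EXPERIMENT_NET = ipaddress.ip_network("10.0.0.0/8")


def pick_experiment_address(addresses):
    """Return the first address that lies on the experimental network.

    `addresses` is the list of local IPv4 addresses in interface order.
    Returns None if no experimental address is configured.
    """
    for addr in addresses:
        if ipaddress.ip_address(addr) in EXPERIMENT_NET:
            return addr
    return None


# The control interface is listed first here, so a client that blindly
# binds to the first interface would use the control network.
local_addresses = ["155.101.132.10", "10.1.2.2"]
chosen = pick_experiment_address(local_addresses)
print(chosen)
```

Binding the client and tracker sockets to the address returned here, rather than to the first interface, keeps experiment traffic on the experimental links.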
Chapter 3
Performance Study of Content Distribution Models
In Chapter 1, we discussed the various content distribution models, including the
World Wide Web model and the Peer-to-Peer Systems model. With the help of an
example scenario, we also illustrated the problem of the same data being re-transmitted
over internet links, leading to degraded performance and higher running costs. In this
chapter, we present the results of a large scale experimental study to understand the
performance of each of the content distribution models. The study was conducted using
the Emulab [28] emulation facility. We describe the experimental setup
on Emulab used for this study, the choice of available implementations of the two content
distribution models, the performance of each of the models and their interpretations.
3.1 Experimental Setup
3.1.1 Network Topology
The first step towards performing experiments on Emulab is to specify the network
topology and the hardware and software on each node of the network.
This is done with the help of a topology specification script written in the Tcl
programming language, in a format identical to that of NS2.
The Internet can be assumed to be composed of two kinds of entities:
Backbone Network: It consists of the high bandwidth, high delay, long distance
network links, which typically run across continents and countries. These backbone
links are generally hosted by various Internet Service Providers (ISPs) and account
for the main cost in running the Internet.
High Speed LANs: Most organizations today have access to high speed local area
networks (LANs), which in turn are connected to the backbone via particular
nodes (routers). Such LANs are generally error-free and congestion-free and
are administered by the local organizations.
Since the major cost in running the Internet is in maintaining the backbone network, the
ISPs are generally concerned about transferring the data across backbone links in the
most cost-effective manner. The cost for a link is proportional to the amount of data
(or the number of bytes) transferred across the link. In this study, we try to understand
the typical amount of traffic which the ISPs need to transfer to support the different
content distribution models.
Also, as we show in this study, most of the current models end up sending the same
data again and again over the same links. We are interested in designing protocols which
restrict such retransmissions.
Figure 3.1 illustrates the network topology used for this performance study on
Emulab. The internet backbone is made up of four core routers, named coreRouter0,
coreRouter1, coreRouter2 and coreRouter3. Each of the core routers runs the Red Hat
Linux 9.0 Standard operating system. The four core routers are all connected to each
other in a symmetrical manner, and thus there are six core links in total, named corelink0
... corelink5. Each core link is a 10Mb link with a 20 ms end-to-end delay and a
Drop Tail queue.
Three of the core routers (coreRouter0, coreRouter1 and coreRouter2) are each
connected to a set of three high speed LANs via a router (router0, router1 and router2
respectively). Each of the three routers runs FreeBSD 6.0. The link
between a router and a core router is a 2Mb link with a 10 ms end-to-end delay and a
Drop Tail queue. Each router is in turn connected to three 10 Mbps LANs (for example,
router0 is connected to lan0, lan1 and lan2). Each LAN is composed of 4 end nodes and
a switch. The nodes are named node0 to node35 (36 end nodes/clients in total).
A dedicated node (named seeder) is connected to coreRouter3 via a 2Mb link with
a 10 ms end-to-end delay and a Drop Tail queue. This node is used for initially hosting
the file which is to be distributed.
Figure 3.1: The experimental setup used for the performance study
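The counts quoted in this section (six core links, nine LANs, 36 end nodes) follow directly from the topology description. The Python sketch below enumerates them. The sequential assignment of node names to LANs (node0 to node3 on lan0, and so on) is an assumption made only for illustration.

```python
from itertools import combinations

core_routers = [f"coreRouter{i}" for i in range(4)]
# Full mesh among the core routers: one link per unordered pair.
core_links = list(combinations(core_routers, 2))

routers = [f"router{i}" for i in range(3)]
LANS_PER_ROUTER = 3
NODES_PER_LAN = 4

# Map each LAN to its end nodes (sequential numbering is an assumption).
lans = {}
node_id = 0
for r in range(len(routers)):
    for j in range(LANS_PER_ROUTER):
        lan_name = f"lan{r * LANS_PER_ROUTER + j}"
        lans[lan_name] = [f"node{node_id + k}" for k in range(NODES_PER_LAN)]
        node_id += NODES_PER_LAN

print(len(core_links), len(lans), node_id)
```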
3.1.2 Performance Metrics
In this study, we are concerned with quantifying the amount of data transmitted over
backbone links in the various content distribution models. Thus, we measure two key
metrics in each experiment run, for each link, in each direction:
Number of Bytes: This represents the raw amount of data transferred over a link in a
particular direction.
Stress: This represents the ratio of the total number of packets transmitted over a link
to the number of unique packets transmitted over the link. For example, a stress of
2 represents a case where each packet is transferred twice over a link.
As mentioned earlier, the running cost of a link for the ISP is proportional to the raw
amount of data transferred over the link. A higher link stress indicates more
redundant transmissions of the same data over the link, which waste
bandwidth.
Emulab has simple support for tracing and monitoring links and LANs. For example,
to trace a link:
set link0 [$ns duplex-link $nodeB $nodeA 30Mb 50ms DropTail]
$link0 trace
The default mode for tracing a link (or a LAN) is to capture just the packet headers (first
64 bytes of the packet) and store them in a tcpdump [12] output file. Instead of
capturing just the packet headers, one may also capture the entire packet:
$link0 trace packet
By default, all packets traversing the link are captured by the tracing agent. To narrow
the scope of the packets that are captured, one may supply any valid tcpdump (pcap)
style expression:
$link0 trace monitor "icmp or tcp"
One may also set the snaplen for a link or lan, which sets the number of bytes that will
be captured by each of the trace agents:
$link0 trace_snaplen 128
In our experiments, we set the snaplen to 1600 bytes.
For each link (say link0, between nodeA and nodeB), two trace files of interest are
generated by tcpdump: trace_nodeA-link0.recv and trace_nodeB-link0.recv. The
first trace file stores the packets sent by nodeA to nodeB over link0, while the second
file stores the packets sent by nodeB to nodeA over link0.
To analyse the tcpdump trace files, we modified the well-known tool tcptrace [13]. We
added a module to the tcptrace code which calculates the MD5 checksum of the payload of
each TCP packet and stores the checksums of all payloads in a file. The number of checksums
is equal to the total number of packets transmitted over a link. We then calculate the
number of unique checksums in the file, which represents the number of unique packets
transmitted. The ratio of these two gives the link stress. The total number of bytes
in the payloads of all TCP packets on a link can also be obtained easily from tcptrace.
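The stress computation described above can be sketched in a few lines of Python. This is an illustration of the calculation only, not the module added to tcptrace. The function name link_stress is hypothetical, and the payloads are assumed to have already been extracted from a trace file. Reporting a stress of 0 for a link that carried no packets is also an assumption of the sketch.

```python
import hashlib


def link_stress(payloads):
    """Return (total, unique, stress) for a sequence of packet payloads."""
    # One MD5 checksum per packet payload, as in the procedure above.
    digests = [hashlib.md5(p).hexdigest() for p in payloads]
    total = len(digests)
    unique = len(set(digests))
    # An unused link carries no packets; report a stress of 0 for it.
    stress = total / unique if unique else 0.0
    return total, unique, stress


# Four packets, two distinct payloads: each unique payload is sent twice
# on average.
payloads = [b"piece-A", b"piece-B", b"piece-A", b"piece-B"]
total, unique, stress = link_stress(payloads)
print(total, unique, stress)
```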
3.2 Used implementations of content distribution models
We study the performance of two content distribution models: the World Wide Web
(WWW) and the Peer-to-Peer Systems model.
The WWW model is adopted in most real-world applications today. For
example, while browsing the Internet, a user communicates with web servers and
fetches their contents via HTTP.
We use GNU Wget [6] to download the data using the WWW model. GNU Wget is
a free software package for retrieving files using HTTP, HTTPS and FTP, the most
widely-used Internet protocols. It is a non-interactive command line tool, so it may
easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
The most popular Peer-to-Peer file sharing system used over the Internet today is
BitTorrent. As mentioned earlier, BitTorrent is a method of distributing large amounts of
data widely without the original distributor incurring the entire costs of hardware,
hosting, and bandwidth resources. When data is distributed using the BitTorrent protocol,
each recipient supplies pieces of the data to the newer recipients, reducing the cost and
burden on any given individual source, providing redundancy against system problems,
and reducing dependence on the original distributor.
For our experiments, we needed a BitTorrent client which supports at least the
following features:
• Must support a console based interface to allow remote execution over Emulab
nodes
• Should preferably be in C/C++ so that BSD sockets could be used to extend it
to support IP multicast
An exhaustive comparison of different BitTorrent implementations is available at:
http://en.wikipedia.org/wiki/Comparison_of_BitTorrent_software
We found CTorrent [5] to be a lightweight BitTorrent client, written in C++ and
actively supported. We based our experiments with the Peer-to-Peer model on
CTorrent.
3.3 Performance Evaluation

For each of the content distribution models, we initially host a 1.2 MB file on the seeder
node, which is then downloaded by all 36 end nodes using one of the models. Below,
we present the evaluations for each of the models.
3.3.1 WWW Model
Table 3.1 shows the link statistics for the file download using GNU Wget on each of the
end/client nodes.
In the WWW model, each client downloads the file from the server (the seeder in this
case) independently of the others, by establishing its own HTTP connection over
TCP with the server. Each such connection carries traffic in two directions: (1) from
client to server, for request and acknowledgement packets, and (2) from server to
client, for the data packets. Intuitively, the bandwidth utilization in
the uplink direction (from client to server) is much lower than in the downlink
direction (from server to client).
Table 3.1: Link Statistics for file download using Wget
Link       Direction                    No. of Bytes   Stress
coreLink0  coreRouter0 -> coreRouter1   0              0
coreLink3  coreRouter0 -> coreRouter3   1.3 KB         12.000
coreLink3  coreRouter3 -> coreRouter0   14.5 MB        11.585
link0      coreRouter0 -> router0       14.5 MB        11.585
link0      router0 -> coreRouter0       1.3 KB         12.000
link3      coreRouter3 -> seeder        3.9 KB         37.000
link3      seeder -> coreRouter3        43.6 MB        8.315
Table 3.1 shows that corelink0, corelink1 and corelink2 do not carry any data.
This is due to the static routing paths between the seeder and the clients, which do
not include these 3 links but use the other 3 parallel and equivalent (in terms of bandwidth
and delay) links: corelink3, corelink4 and corelink5. Table 3.1 also illustrates that the
amount of data transferred in the uplink direction is much lower than that transferred
in the downlink direction.
The most important result shown in Table 3.1 is the number of bytes transferred
on each link in the downlink direction. For example, on coreLink3, 14.5 MB of data is
transferred downlink. This is explained by the fact that coreLink3 connects coreRouter3
(and hence the seeder) with coreRouter0, which in turn connects to all clients on lan0,
lan1 and lan2. Since each of the 12 clients on lan0, lan1 and lan2 downloads the data
independently, the shared file of 1.2 MB is downloaded 12 times through coreLink3.
The extra bytes transferred correspond to protocol overheads such as HTTP headers.
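The argument above can be checked with simple arithmetic. The Python sketch below compares the expected payload volume (number of independent downloads times the file size) with the values reported in Table 3.1. The variable names are illustrative only.

```python
FILE_MB = 1.2

# 12 clients sit behind coreRouter0; all 36 clients fetch from the seeder.
expected_core = 12 * FILE_MB
expected_seeder = 36 * FILE_MB

# Downlink byte counts reported in Table 3.1 (MB).
measured_core = 14.5      # coreLink3, coreRouter3 -> coreRouter0
measured_seeder = 43.6    # link3, seeder -> coreRouter3

# The difference between measured and expected volume is the overhead.
overhead_core = measured_core - expected_core
overhead_seeder = measured_seeder - expected_seeder

print(round(expected_core, 1), round(overhead_core, 1))
print(round(expected_seeder, 1), round(overhead_seeder, 1))
```

The measured volumes exceed the expected payload by roughly 0.1 MB and 0.4 MB respectively, which is consistent with the overhead explanation given above.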
Another interesting result shown in Table 3.1 is the stress on each of the internet
links. We see that a stress of around 4 is common on most links in the downlink
direction. Note that stress is not necessarily equal to the ratio of the total bytes
transferred over a link to the number of bytes in the downloaded file. This is because,
for calculating link stress, we calculate the MD5 checksum of each TCP packet payload,
which contains HTTP headers, etc. For packets which have similar data contents, these
HTTP headers may differ slightly, for example only in their timestamps. Due to this, the
MD5 checksums of similar packet payloads differ, thus reducing the measured stress. A
potential way of dealing with this problem is to calculate the checksum with another hash
function, one which is suitable for detection of near-duplicate contents. One such hash
function is Charikar’s hash [17], also referred to as simhash [22].
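To illustrate why a similarity-preserving hash helps here, the following Python sketch implements a simple, unweighted simhash over overlapping 4-byte shingles. It was not used in the experiments. The shingle length, the 64-bit fingerprint size, the use of MD5 as the per-feature hash and the sample payloads are all assumptions chosen for illustration.

```python
import hashlib


def simhash(payload, shingle=4, bits=64):
    """Return a `bits`-bit simhash fingerprint of a byte string."""
    counts = [0] * bits
    for i in range(max(1, len(payload) - shingle + 1)):
        feature = payload[i:i + shingle]
        # Hash each shingle to a `bits`-bit integer.
        h = int.from_bytes(hashlib.md5(feature).digest()[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


body = bytes(range(256)) * 4
pkt_a = b"Date: 10:00:01\r\n" + body                      # original packet
pkt_b = b"Date: 10:00:02\r\n" + body                      # timestamp differs
pkt_c = b"Date: 10:00:01\r\n" + bytes(reversed(range(256))) * 4  # other data

fa, fb, fc = simhash(pkt_a), simhash(pkt_b), simhash(pkt_c)
near, far = hamming(fa, fb), hamming(fa, fc)
print(near, far)
```

Payloads whose fingerprints differ in only a few bits could then be counted as duplicates, whereas MD5 treats pkt_a and pkt_b as entirely different packets.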
3.3.2 Peer-to-Peer Model
Table 3.2 shows the link statistics for the file download using BitTorrent on each of the
end/client nodes.
Table 3.2: Link Statistics for file download using BitTorrent
Link       Direction                    No. of Bytes   Stress
coreLink3  coreRouter0 -> coreRouter3   15 KB          2.800
link0      router0 -> coreRouter0       6.9 MB         7.013
link3      coreRouter3 -> seeder        47 KB          3.184
link3      seeder -> coreRouter3        3.0 MB         2.797
In the P2P model, clients download the file in a collaborative manner. Instead of
depending only on the seeder for the file download, each client also fetches data packets
from other clients, which may be downloading the same file. Thus, in this case,
clients have TCP connections between them, in addition to TCP connections with the
seeder. Unlike the WWW model, where the uplink capacity of most links remains largely
unutilized, in the P2P model the uplink capacity is also used, since clients are
responsible for uploading packets to other clients.
Table 3.2 shows several important trends. All the links see data transfers of the order
of 4-6 MB, unlike the WWW model, where several links had to transfer as much
as 14 MB of data. Data transfer happens in both directions (uplink and downlink).
The other important observation concerns the link stress. We observe that link
stress values are smaller on the core links. This means that fewer
duplicate packet transmissions happen over the internet links, thus avoiding the
wastage of resources. This is due to the fact that each client observes the data pieces
which are available with other clients and fetches them from those clients, instead of
always fetching pieces from the seeder.
3.4 Motivation for MTorrent

The results in Table 3.1 and Table 3.2 show that internet links suffer from non-ideal
stress values. Since the coordination (if any) among the downloading clients is at a
very high level (application layer), redundant downloads of packets do happen. This
is particularly the case when several downloading clients are located on the same or
nearby LANs, while the individual clients are unaware of their physical and logical
proximity. Due to this, in the case of BitTorrent, although clients collaborate in
downloading by uploading their packets to each other, they may be physically and
logically separated from each other by several internet backbone links.
Today, a lot of P2P traffic is due to BitTorrent downloads of large files, such as
music and videos. Additionally, when such files are made available for the first time
on the Internet, millions of people tend to download them at the same time. If the WWW
model is used for the dissemination of these files, WWW servers experience heavy load,
consequently slowing down the download process. Even in the case of the P2P model, several
downloaders from the same LANs (for example, those located inside the same university
network) download the files from external networks. Since there is no coordination
protocol amongst such clients, redundant downloads of pieces happen.
Another application where simultaneous downloads happen is the distribution
of critical software updates to a network of clients by software companies. This also
includes distribution of virus updates, etc. In such cases, millions of clients need to be
updated within the shortest possible time. For example, a Dutch university uses BitTorrent
as a network management tool to distribute software to 6500 desktop computers in
16 different locations throughout the Netherlands [2]. Instead of distributing software
updates and images from several centralized servers, the university utilizes the efficiency
of BitTorrent, and uses all the computers in the network to help distribute the files.
Before the university decided to use BitTorrent, more than 20 servers were needed to
distribute 25.6 TB of data to the desktops, and even then it could take up to 4 days to
update them all. With BitTorrent, this process has sped up significantly, and all computers
are updated with the latest software in less than 4 hours. The data does not have to
be distributed from one location, since all the workstations connected to the network
actively help in the distribution.
We propose to augment the BitTorrent protocol with a simple yet efficient coordination
protocol, through which redundant downloads of pieces using BitTorrent can be
avoided, thus saving network resources and speeding up the download process. In
particular, we utilize the IP Multicast functionality available in some networks to share
the pieces downloaded by BitTorrent with the local peers, instead of letting them
download those pieces from remote peers. Our coordination protocol acts as a helper module
to the BitTorrent client, without interfering with the standard-compliant operation of
BitTorrent.
Chapter 4
Multicast
Multicast is the delivery of information to a group of destinations simultaneously using
the most efficient strategy to deliver the messages over each link of the network only
once, creating copies only when the links to the destinations split. In this chapter, we
examine how IP Multicast can be leveraged for large scale content distribution.
4.1 IP Multicast
Internet Protocol (IP) multicast is a bandwidth-conserving technology that reduces traf-
fic by simultaneously delivering a single stream of information to thousands of recipients.
The applications that take advantage of multicast include videoconferencing, corporate
communications, distance learning, and distribution of software, stock quotes, and news.
IP Multicast delivers source traffic to multiple receivers without adding any addi-
tional burden on the source or the receivers. Multicast packets are replicated in the
network by routers enabled with Protocol Independent Multicast (PIM) and other sup-
porting multicast protocols. All alternatives to IP Multicast require the source to send
more than one copy of the data. Some even require the source to send an individual copy
to each receiver. If there are thousands of receivers, even low-bandwidth applications
benefit from using IP Multicast. Figure 4.1 demonstrates how data from one source is
delivered to several interested recipients using IP multicast.
Figure 4.1: Multicast Transmission Sends a Single Multicast Packet Addressed to All
Intended Recipients (Source: [8])
Multicast is based on the concept of a group. An arbitrary group of receivers
expresses an interest in receiving a particular data stream. This group does not have any
physical or geographical boundaries, and the hosts can be located anywhere on the
Internet. The hosts that are interested in receiving data flowing to a particular group must
join the group using IGMP. A host must be a member of the group to receive the
data stream.
Multicast addresses specify an arbitrary group of IP hosts that have joined the group
and want to receive traffic sent to this group. The Internet Assigned Numbers Authority
(IANA) controls the assignment of IP multicast addresses. It has assigned the old Class
D address space to be used for IP multicast. This means that all IP multicast group
addresses will fall in the range of 224.0.0.0 to 239.255.255.255.
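Whether an address falls in this range can be checked with Python's standard ipaddress module. The helper name is_class_d is hypothetical and the sample addresses are chosen to probe both ends of the range.

```python
import ipaddress


def is_class_d(address):
    """True if `address` is an IPv4 multicast (former class D) address."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_multicast


# Probe both ends of the range and the addresses just outside it.
for addr in ["224.0.0.0", "239.255.255.255", "223.255.255.255", "240.0.0.0"]:
    print(addr, is_class_d(addr))
```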
More information about IP Multicast can be obtained from RFC 3170 [10].
4.2 IP Multicast as a Content Distribution Model

As explained earlier, IP Multicast provides the most efficient strategy to deliver data
simultaneously to multiple clients from a single source. This is due to the fact that each
data packet traverses each link only once, with copies being created only when the links to
the destinations split.
In Chapter 3, we discussed several content distribution models for the scenario
where hundreds or thousands of clients are interested in downloading the same data
within the shortest amount of time. IP Multicast is a particularly attractive alternative for
content distribution in such scenarios. All the clients can initially send IGMP request
messages to join a multicast group, and the source (or seeder, as in Chapter 3) can
multicast the data to this group. Since routers are aware of the physical topology and
positions of clients, the data traverses the shortest path to reach each of the clients,
guaranteeing optimal download time.
Although such an approach is promising, it is not viable in today’s Internet because
of the lack of support for IP Multicast on the Internet. This means that two nodes on the
Internet do not necessarily have a route between them which is IP Multicast enabled.
There are several reasons why IP Multicast is not available on the Internet. These
include:
• Most routers on the Internet lack support for IP Multicast. Recall that to
support IP Multicast, a router needs to perform several additional operations, such as
duplication of packets with PIM, IGMP support and multicast forwarding. The
routers deployed on the Internet simply do not have the resources or capabilities to
perform all such operations. Upgrading such existing routers is clearly infeasible.
• Congestion control schemes are not well defined for multicast.
• Pricing policies for multicast are not clear. Hence, there are no incentives for
ISPs to deploy multicast support in their networks.
Therefore, utilizing IP-level multicast for large scale content
distribution in the above-mentioned scenarios is not feasible.
4.3 Observations
Although IP Multicast is not available on the Internet, we have observed that most
organizations have it enabled on their local networks. This is so because upgrading a
few routers to support IP Multicast on a local network is a relatively easy task
compared to upgrading millions of routers on the Internet. Besides, problems like the
absence of congestion and rate control mechanisms for IP Multicast are less severe on
local networks, which are typically high speed and free from errors and congestion. Lastly,
pricing policies for the use of links within the local network are not very important, as
these links are hosted by the organizations themselves and not by external ISPs.
For the rest of this thesis, we call local networks run by organizations as islands.
Based on our observations, we assume that most of such islands are IP Multicast enabled.
In Chapter 3, we observed that bulk data download happens using Peer-to-Peer
systems and that there are several instances where multiple downloading clients exist
within the same island (for example, a university network). These clients are
unaware of each other's presence and fetch data packets from outside the island over
BitTorrent.
A heavy coordination protocol amongst such clients, which enables each client to know
which other clients are available on the island, would involve huge overheads and
eventually slow down the downloading process. Besides, clients may not be ready to
reveal their identity to other clients on the same island during the download due to
privacy issues.
We propose to use IP Multicast as an enabling technology to anonymously share
downloaded packets with the local clients on the same island. This forms the basis of
our proposed protocol, which we call MTorrent.
Chapter 5
MTorrent
We propose a protocol which can co-exist with the standard BitTorrent protocol and
leverage IP Multicast to distribute downloaded pieces to other BitTorrent clients on the
same island. We call this protocol MTorrent. In this chapter, we first explain the
design of the MTorrent protocol. Then we develop a performance model for the evaluation of
MTorrent over Emulab and compare its performance against that of the BitTorrent
and WWW (HTTP) protocols.
5.1 Overview of the protocol

We associate a class D IP Multicast address with each file to be downloaded. This
IP address can be embedded in the torrent file available on the web server. Note that
each file to be downloaded has a corresponding torrent file, which in any case needs to be
downloaded by each of the BitTorrent clients. Before starting the file download, the
BitTorrent client reads the information available in the torrent file, which comprises the
URL of the tracker, the piece size, checksums of the pieces, etc. We have modified the
client to also read the embedded class D IP address.
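As an illustration of how a client might read such an embedded address, the Python sketch below decodes a small bencoded torrent dictionary (bencoding is the encoding used by torrent files). It is not the modification made to the client. The key name mcast-addr, its placement at the top level of the dictionary, the tracker URL and the address 239.1.2.3 are all hypothetical, since the text does not specify them.

```python
def bdecode(data, i=0):
    """Decode one bencoded value at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                        # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                        # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)          # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def bstr(s):
    """Bencode a byte string (helper used to build the sample below)."""
    return str(len(s)).encode() + b":" + s


# A toy torrent dictionary; "mcast-addr" is a hypothetical key name.
torrent = (b"d" + bstr(b"announce") + bstr(b"http://tracker.example/announce")
           + bstr(b"info") + b"d" + bstr(b"piece length") + b"i262144e" + b"e"
           + bstr(b"mcast-addr") + bstr(b"239.1.2.3") + b"e")

meta, _ = bdecode(torrent)
group = meta.get(b"mcast-addr", b"").decode()
print(group)
```

A client that does not know the extra key simply ignores it, which is one way the embedded address could remain compatible with unmodified BitTorrent clients.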
Conceptually, the class D IP address associated with a file is the multicast address
of the group of clients who are interested in this file. Before the download starts, each
client tries to join the multicast group by sending IGMP messages. If the network to
which the client is connected is IP Multicast enabled, the client successfully joins the
group, and thus becomes a part of the island. Otherwise, the client remains an isolated
island, which means it does not receive or send data via IP multicast.
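On a host with BSD-style sockets, joining a group amounts to setting the IP_ADD_MEMBERSHIP socket option; the operating system then sends the IGMP membership report. The Python sketch below shows this under stated assumptions and is not the MTorrent implementation. The function names, the port parameter and the choice of the default interface (0.0.0.0) are hypothetical. Note that a successful call only shows that the local host accepted the membership; it does not by itself prove that the surrounding network forwards multicast traffic.

```python
import socket
import struct


def build_mreq(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure (group address, local interface)."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))


def try_join(group, port):
    """Try to join `group`; return a bound UDP socket, or None on failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # The kernel emits the IGMP membership report on our behalf.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group))
        return sock
    except OSError:
        # Treat any failure as "isolated island": fall back to unicast only.
        sock.close()
        return None


mreq = build_mreq("239.1.2.3")
print(len(mreq))
```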
Note that the same class D address is used by all the clients in different islands, which are
connected to each other by normal unicast links, without any support for IP Multicast.
Due to this, several groups which share the same class D IP address may be formed on
the Internet.
In MTorrent, each client participates in the normal download process using unicast.
As soon as a piece has been successfully downloaded over unicast by a client, it multicasts
the piece over the local multicast group (corresponding to this file). Since other clients
on the same island must have also joined this group, they receive the pieces on the
multicast socket, in addition to the normal pieces on the unicast socket.
Each client runs a helper thread, which periodically checks for any multicast
packets received. Since all clients are trying to download the same file, it is very likely
that other clients are also waiting for the same pieces over unicast links. When
the helper thread detects that a piece has arrived on the multicast socket, it
verifies the integrity of the piece by computing its checksum and comparing it
with the checksum read from the torrent file. If the checksums match, the
helper thread places the received piece at its correct location on the disk and updates the
local data structures to indicate that the client no longer needs to download this piece.
The helper thread also immediately cancels all requests for the received piece that may
have been sent to other clients over unicast links, to avoid receiving the same piece
again over unicast.
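The decision logic of the helper thread can be sketched as follows. BitTorrent piece checksums are SHA-1 digests; the data structures used here (a set of completed piece indices and a dictionary of outstanding requests) and the function name are assumptions made for illustration, and writing the piece to disk is omitted.

```python
import hashlib


def accept_multicast_piece(index, data, piece_hashes, have, pending):
    """Process one piece received on the multicast socket.

    piece_hashes: list of 20-byte SHA-1 digests read from the torrent file
    have:         set of piece indices that are already stored
    pending:      dict mapping piece index -> set of peers with open requests
    Returns the set of peers to which a cancel message should be sent.
    """
    if index in have:
        return set()                         # already downloaded: discard
    if hashlib.sha1(data).digest() != piece_hashes[index]:
        return set()                         # integrity check failed: discard
    have.add(index)                          # piece is now available locally
    return pending.pop(index, set())         # cancel outstanding unicast requests
```

Because a corrupted or duplicate piece is silently dropped, the unicast download continues unaffected in those cases.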
Figure 5.1 summarizes the sequence of operations performed when the torrent file is
uploaded to the web server and when a file is downloaded by an MTorrent client.
One problem with IP Multicast is that, unlike TCP, it is unreliable: multicast datagrams
are carried over UDP. This means that there is no guarantee that a packet multicast over
UDP will be successfully received by other clients. Since IP Multicast does not have
any mechanisms for rate control or for detecting packet losses (due to random errors, etc.),
pieces shared by a client are not necessarily received by all other clients
on the island. Clients which have small receive buffers or which are busy with other
operations are often unable to receive all the packets sent over multicast.
However, a simple observation makes the operation of MTorrent much easier. Since
MTorrent has been designed as a module completely independent of standard
BitTorrent, the clients which are unable to successfully receive multicast pieces can
subsequently receive them using the normal unicast connections. Since the fraction of
the clients which are unable to receive multicast pieces is very small, the benefits
obtained from multicast packets offset the cost of fetching some packets over unicast,
and consequently the overall download process becomes much more efficient.
Figure 5.1: Flow chart showing MTorrent operations
5.2 Performance Evaluation
5.2.1 Topology
We evaluate the performance of MTorrent on the same large-scale topology as that used
in Chapter 3. For the sake of completeness, the topology is shown again in Figure 5.2.
The components of the topology are the end clients (node0 to node35), the access
routers (router0 to router2) and the core routers (coreRouter0 to coreRouter2). There
are two types of links in this topology:
Core links: which carry traffic across the Internet by connecting the core routers;
and
Access links: which provide Internet access to the islands consisting of
various high-speed LANs.
Since the two types of links carry different types of traffic, we evaluate the two
types separately.
5.2.2 IP Multicast support on islands
Each island in our experimental topology consists of 3 high-speed (10 Mbps) LANs. All
the LANs are connected to each other via the access router (i.e., router0, router1 or
router2). Each of the access routers runs the FreeBSD 6.10 operating system. In order
to allow IP Multicast across different LANs on the same island, we run mrouted [7] on
each of the access routers. The mrouted utility is an implementation of the
Distance-Vector Multicast Routing Protocol (DVMRP), an earlier version of which is
specified in RFC-1075 [3]. It maintains topological knowledge via a distance-vector
routing protocol (like RIP, described in RFC-1058 [11]), upon which it implements a
multicast datagram forwarding algorithm called Reverse Path Multicasting.
Figure 5.2: Topology used for performance evaluation
The mrouted utility forwards a multicast datagram along a shortest (reverse) path
tree rooted at the subnet on which the datagram originates. The multicast delivery tree
may be thought of as a broadcast delivery tree that has been pruned back so that it
does not extend beyond those subnetworks that have members of the destination group.
Hence, datagrams are not forwarded along those branches which have no listeners of the
multicast group. The IP time-to-live of a multicast datagram can be used to limit the
range of multicast datagrams.
Thus, any multicast packet sent on one of the LANs reaches all other LANs on the same
island, provided there are clients on the other LANs who have subscribed to the
corresponding multicast group.
Also, we set the TTL value of multicast packets to 3 to allow them to cross multiple
levels of multicast-enabled routers. Note that a TTL value of 1 means that packets are
limited to the same subnet.
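With the sockets API, the TTL of outgoing multicast datagrams is a per-socket option. The following minimal sketch shows how a sending socket with a TTL of 3 could be created; the function name is an assumption made for illustration.

```python
import socket
import struct


def make_multicast_sender(ttl: int = 3) -> socket.socket:
    """Create a UDP socket whose multicast datagrams carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A one-byte value is accepted on both Linux and BSD-derived systems.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock
```

If the option is left unset, the operating system typically applies a default multicast TTL of 1, which keeps datagrams on the local subnet.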
5.2.3 Performance Metrics
Number of Bytes This represents the raw amount of data transferred over a link in a
particular direction.
Stress This represents the ratio of number of total packets transmitted over a link and
the number of unique packets transmitted over the link.
Time for Download This represents the total time each client takes to download the
file.
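As a small illustration of the stress metric, the sketch below computes it from a list of packet identifiers observed on one link. How a packet is identified as "unique" (for example, by piece index and offset) is an assumption of this example; a link that carried no packets is given a stress of 0 here.

```python
def link_stress(packet_ids):
    """Total packets divided by unique packets; 0 for a link that carried nothing."""
    total = len(packet_ids)
    unique = len(set(packet_ids))
    return total / unique if unique else 0.0


# Five packets on the link, but only three distinct ones.
observed = ["p0", "p1", "p0", "p2", "p0"]
print(link_stress(observed))
```

A stress of 1 therefore means that no packet crossed the link more than once.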
5.2.4 Comparison of MTorrent, BitTorrent and WWW
For all the experiments in this section, we run a seeder and a tracker at the seeder node
(connected to coreRouter3 in the topology shown in Figure 5.2). The seeder serves a file
of size 1.2 MB. All the results reported in this section have been obtained after proper
averaging over 5 to 10 runs of each experiment.
Figure 5.3: Cumulative Distribution Function of time for download by each client
Also, for the sake of convenience in representation of results, each of the links has
been identified with a label as described in Table 5.1.
Figure 5.3 shows the Cumulative Distribution Function (CDF) of the download time
at each client. The X-axis shows the download time in seconds, while the Y-axis
represents the cumulative percentage of clients which have completed their download
by that time. Note that the steeper the plot, the sooner all the clients complete
their download.
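The points of such a plot can be derived from the per-client download times as in the following minimal sketch. The function name and the sample values are assumptions made for illustration and are not measurements from the experiments.

```python
def download_time_cdf(times):
    """Return (time, percent of clients finished by that time), sorted by time."""
    ordered = sorted(times)
    n = len(ordered)
    return [(t, 100.0 * (i + 1) / n) for i, t in enumerate(ordered)]


# Four hypothetical clients finishing after 10, 20, 30 and 40 seconds.
for t, pct in download_time_cdf([30, 10, 20, 40]):
    print(t, pct)
```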
Figure 5.3 shows that 100% of the clients complete their download within 30 seconds
when using the MTorrent protocol. Within the first 30 seconds, none of the clients using
either the BitTorrent protocol or the WWW (HTTP) protocol completes the download. It
takes about 60 seconds for all the nodes to complete their download using BitTorrent.
The WWW protocol performs worst: it takes about 180 seconds for
all nodes to complete the download.
The reason for the degraded performance of the WWW model is the heavy load on
each of the core and access links, and particularly on the link connecting the seeder to
the core router. Since all the clients try to download the data at the same time, the
effective download rate obtained by each of them is very low. In the case of MTorrent and
BitTorrent, the download load is distributed over several links and the clients collaborate
with each other in fetching data. Further, in the case of MTorrent, data packets fetched
from outside the island are immediately shared with other clients via IP Multicast.
Consequently, the wait time for each packet is reduced considerably.
Table 5.1: Mapping between link name and link label
Link (Source -> Destination)      Label assigned
coreRouter0 -> coreRouter1        1
coreRouter0 -> router0            13
router0 -> coreRouter0            14
coreRouter3 -> seeder             19
seeder -> coreRouter3             20
Table 5.2: Download time (in seconds) statistics of MTorrent, BitTorrent and WWW
              Minimum   Maximum   Average
MTorrent      22.5      26.3      24.6
BitTorrent    34.2      51.0      43.4
WWW           147.4     181.2     171.2
Figure 5.4: Amount of data transferred over each link using MTorrent, BitTorrent or
WWW
Table 5.2 shows the download time statistics of MTorrent, BitTorrent and the WWW
model. We report the minimum, maximum and average download time over all the
36 clients. Based on these statistics, we conclude that the proposed MTorrent protocol
improves the download time of files by about 44% over the conventional
BitTorrent protocol and by about 86% over the WWW (HTTP) protocol.
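The improvement figures can be checked against the averages in Table 5.2 with a short calculation. The sketch below assumes that "improvement" means the relative reduction of the average download time; under that assumption the averages give roughly 43% over BitTorrent and 86% over WWW, in line with the approximate figures quoted above.

```python
def improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Average download times in seconds, taken from Table 5.2.
average = {"MTorrent": 24.6, "BitTorrent": 43.4, "WWW": 171.2}

for other in ("BitTorrent", "WWW"):
    gain = improvement(average[other], average["MTorrent"])
    print(f"MTorrent vs {other}: {gain:.1f}% shorter download time")
```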
Figure 5.4 shows the amount of data transferred over each of the links using MTorrent,
BitTorrent and the WWW protocol. Note that the link labels in this figure are
mapped to links as in Table 5.1. Links 1 to 12 are the core links, while links 13
to 20 are the access links connecting island/access routers to the core routers. From
Figure 5.4, it is clear that the amount of data transferred over the access links is much
higher than that over the core links. Also, in the case of the WWW protocol, some of the
links observe zero utilization because each client has a single fixed route to the
seeder, and some of the links do not lie on any such path. Further, in the case of the WWW
protocol, there is a clear difference in the utilization of the uplink and downlink
directions due to the asymmetric nature of the download.
Table 5.3: Amount of data (in Megabytes) statistics for all types of links
            Minimum   Maximum   Average
MTorrent    0.007     4.5       1.4
WWW         0         30.3      5.8
Table 5.4: Amount of data (in Megabytes) statistics for core links
            Minimum   Maximum   Average
MTorrent    0.007     2.4       0.7
WWW         0         14.5      3.6
All the links carry considerably less data in the case of the MTorrent
protocol than with either BitTorrent or WWW. This is explained by the fact that
in MTorrent, packets once downloaded in an island normally need not be downloaded
again from the Internet. Thus, the number of bytes transferred on Internet links is lower.
Table 5.3 shows the statistics for the amount of data transferred over all types of
links using the MTorrent, BitTorrent or WWW protocol. Minimum, maximum and average
amounts are shown over all the 20 links.
Also, Table 5.4 shows the statistics for the amount of data transferred only on the
core links. This is useful for understanding the amount of traffic which ISPs need to support.
Table 5.5: Amount of data (in Megabytes) statistics for access links
            Minimum   Maximum   Average
MTorrent    0.01      4.5       2.4
WWW         0.001     30.3      9.2
Figure 5.5: Stress on link using MTorrent, BitTorrent or WWW
Finally, Table 5.5 shows the statistics for only the access links, for which the island
owners must pay based on usage. Note that the file being downloaded by the clients
is of size 1.2 MB. In the case of MTorrent, each access link transfers on average
about 2.4 MB of data to serve the 12 downloading clients on the corresponding island.
This means that roughly 2 copies of the original file are downloaded by all 12
clients collectively. On the other hand, in the case of BitTorrent, each access link
transfers on average about 6.6 MB of data, which means that around 5 copies of the same
file are downloaded in each island. WWW performs worst, with around 8 copies of the
original file being downloaded in each island.
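The number of file copies per island follows from dividing the average access-link traffic by the file size. The sketch below repeats this arithmetic using the averages quoted in the text; the function name is an assumption made for illustration.

```python
FILE_SIZE_MB = 1.2   # size of the file served by the seeder


def copies_per_island(avg_access_link_mb: float) -> float:
    """Number of file-sized copies carried by an average access link."""
    return avg_access_link_mb / FILE_SIZE_MB


# Average access-link traffic in MB, as quoted in the text and Table 5.5.
for name, traffic in (("MTorrent", 2.4), ("BitTorrent", 6.6), ("WWW", 9.2)):
    print(f"{name}: about {copies_per_island(traffic):.1f} copies")
```

The exact ratios (2.0, 5.5 and about 7.7) correspond to the rounded counts of 2, 5 and 8 copies given above.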
Based on the above statistics shown in Table 5.3, we conclude that an improvement
of about 65 % over BitTorrent and about 75 % over WWW protocol in amount of data
transferred is achievable using the proposed MTorrent protocol.
Figure 5.5 shows the stress observed over each link using MTorrent, BitTorrent and
the WWW protocol. It can be recalled that stress on a link is the ratio of total packets
and unique packets passing over the link.
It is clear from this figure that the stress on each of the links is much lower when
using the MTorrent protocol than when using either BitTorrent or the WWW
protocol. This indicates that MTorrent is successful in reducing the duplicate download
of packets over the Internet links by effectively sharing downloaded packets with local
clients.
Table 5.6: Stress statistics for all types of links
            Minimum   Maximum   Average
MTorrent    1         7.3       3.4
WWW         0         39        8.4
Table 5.7: Stress statistics for core links
            Minimum   Maximum   Average
MTorrent    1.2       7.3       3.1
WWW         0         14        5.4
Table 5.6 shows the statistics for stress on all types of links using the 3 protocols.
Based on this, we conclude that MTorrent is about 40% more effective than BitTorrent
and about 60% more effective than the WWW protocol in reducing the redundant traffic
load on Internet links.
Table 5.7 and Table 5.8 show the stress statistics for core and access links
respectively. It can be seen that the minimum stress achieved on an access link is as low
as 1,¹ which is the theoretical lower bound on the stress that any link can have. Thus, in
one of the islands (the island containing lan0, lan1 and lan2 in the topology shown in
Figure 5.2), exactly one copy of the original file was downloaded using the proposed
MTorrent protocol, which is the best performance achievable by any content distribution
protocol.
¹ Stress is the ratio of the number of total packets to the number of unique packets
traversing a link. Each packet must traverse the link at least once; thus, if no packet
traverses a link more than once, the stress on this link is 1.
Table 5.8: Stress statistics for access links
            Minimum   Maximum   Average
MTorrent    1         5.7       3.8
WWW         0         39        13.0
5.2.5 MTorrent and Multicast Unreliability
In Section 5.1, we discussed the problem of unreliability in IP Multicast. Since
there is no guarantee that multicast packets reach their destinations, the effectiveness
of MTorrent, which shares packets with local clients via IP Multicast, needs to be
evaluated.
Multicast packets on an island can be lost or delayed for two reasons:
• The clients and links on a LAN show abnormal behaviour (due to load or
misconfiguration), leading to random packet losses.
• There is congestion on the LANs due to other heavy traffic being exchanged by
clients, e.g., VoIP.
In this section, we study the effectiveness of MTorrent and compare its robustness
with BitTorrent protocol under such scenarios.
Effect of random link losses
To model the random behavior of the clients and the links, a random packet loss module
is installed in each of the LANs, whose packet loss rate can be configured. We varied
the packet loss rate of each LAN from 0% to 5% and repeated the experiments for each
case to measure the various performance metrics for both MTorrent and BitTorrent.
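To give a feeling for how a per-packet loss rate affects whole pieces, the sketch below models independent random losses. It assumes that a piece is carried in several datagrams and is usable only if every one of them arrives; the number of datagrams per piece is a free parameter of this example and is not taken from the experiments.

```python
import random


def piece_survival_probability(loss_rate: float, packets_per_piece: int) -> float:
    """Probability that all datagrams of one piece arrive (independent losses)."""
    return (1.0 - loss_rate) ** packets_per_piece


def simulate_pieces(num_pieces, loss_rate, packets_per_piece, seed=0):
    """Count pieces received completely via multicast in one simulated run."""
    rng = random.Random(seed)
    received = 0
    for _ in range(num_pieces):
        if all(rng.random() >= loss_rate for _ in range(packets_per_piece)):
            received += 1
    return received


# Hypothetical setting: 20 datagrams per piece, loss rates from 0% to 5%.
for percent in range(0, 6):
    p = piece_survival_probability(percent / 100.0, 20)
    print(f"{percent}% packet loss: {100.0 * p:.0f}% of pieces arrive intact")
```

Under these assumptions, even a small per-packet loss rate removes a noticeable share of complete pieces, and the missing ones must then be fetched over unicast.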
Figure 5.6 shows the average time for download over all the clients with increasing
values of packet loss percentage. As we observed from our earlier results, at 0% packet
loss percentage, MTorrent completes the download much faster than BitTorrent. In this
case, no artificial random packet losses were injected into the links. However, some
multicast packets might still get lost due to receiver buffer overflow, etc.
As the random packet loss percentage increases, the effectiveness of sharing packets
via IP Multicast decreases. The packets that fail to arrive have to be obtained by
the clients via the normal unicast procedure. Thus, the time for download in the case
of MTorrent increases with increasing packet loss percentage. Note that the time for
download in the case of BitTorrent also increases with random losses, because unicast
packets are dropped as well and hence more TCP retransmissions occur, which increases
the download time.
Figure 5.6: Time for download with Packet Loss Percentage of each LAN
At packet loss percentages of more than 4%, the time for download using MTorrent
is higher than that using BitTorrent. This is because at such high loss rates, most
multicast packets are lost and thus these packets have to be fetched using the normal
unicast procedure. In addition, the module which sends and receives multicast packets
incurs extra overhead on the client, which exceeds any possible gains in download time
due to the multicast packets that are successfully received.
However, random packet loss percentages as high as 4% are quite rare in most LANs
today and thus represent an unusual scenario. In the more common scenarios,
MTorrent is shown to perform better than BitTorrent.
Figure 5.7 shows the average amount of data transferred on all Internet links with
increasing values of packet loss percentage. The amount of data transferred increases
with random packet loss in the case of both MTorrent and BitTorrent. However, MTorrent
still performs better than BitTorrent by downloading less data from the Internet.
Finally, Figure 5.8 shows the variation of average link stress with packet loss
percentage. Stress on the Internet links increases with random packet loss due to the
higher number of TCP retransmissions needed to deliver data across islands. Note that
more retransmissions mean that the same data packets traverse Internet links again and
again.
Figure 5.7: Average amount of data transferred with Packet Loss Percentage of each
LAN
Figure 5.8: Link stress with Packet Loss Percentage of each LAN
Effect of congestion
To model congestion in each island, we start a Constant Bit Rate (CBR)
traffic source on each of the LANs, which sends traffic to one of the clients on another
LAN in the same island. Thus, each island has 3 CBR traffic sources. The rate of
CBR traffic for each source is varied from 0 Mbps to 10 Mbps to model the severity of
congestion.
Figure 5.9 shows the variation of the average time for download over all clients with
increasing CBR traffic rates. Although MTorrent consistently completes the
download faster than BitTorrent even at high levels of congestion, we do not observe
any clear dependence of the effectiveness of IP Multicast, and thus of MTorrent, on the
congestion level.
Figure 5.9: Time for download with varying congestion level
Chapter 6
Conclusions
In this work, we considered the various content distribution models and highlighted the
problem of downloading redundant data in each case. In addition to theoretical explana-
tions, we carried out a detailed performance measurement study of content distribution
models on Emulab, which is a large scale network testbed, hosted by the University of
Utah. We designed several tools to facilitate the process of experimentation on Emulab
and generation of suitable performance metrics. Our tools can be easily reused by other
researchers on Emulab or other network testbeds. To the best of our knowledge, this
is the first large scale study highlighting the problem of suboptimal operation of the
prevalent content distribution models like WWW and Peer-to-Peer systems.
We then studied possible approaches, such as caching and sharing of data with local
clients on multicast-enabled network islands, to solve the redundant download problem.
Our observations led to MTorrent, a P2P content distribution protocol which is
completely interoperable with the widely used BitTorrent protocol.
We obtained the following three important results with MTorrent:
• Reduction in download time of each client using MTorrent by 44% over BitTorrent
and by 86% over WWW protocol.
• Reduction in traffic load on Internet links and ISPs by 65% and 75% over BitTor-
rent and WWW protocols respectively.
• Reduction in the wastage of resources like bandwidth due to redundant packet
downloads by 40% over BitTorrent and 60% over WWW protocol.
Download time is the most critical performance metric for normal Internet users,
whose experience with the system is largely determined by how fast they can download
files from the Internet. Also, recent applications of Peer-to-Peer systems, like
distributing software updates and operating system images over large, geographically
distributed networks, depend heavily on the download time for
each computer. As mentioned in Chapter 3, a university in the Netherlands has a large
network of around 6500 desktop computers in 16 different locations throughout the country,
to which around 25.6 TB of data needs to be regularly distributed. With the WWW
based model, this task took around 4 days to complete. With BitTorrent, they
are able to reduce this time to about 4 hours. Since MTorrent reduces the download time
by around 44% over BitTorrent, we expect that this software distribution time can be
further reduced to roughly 2 hours. As another example, consider the downloading of movies
(typical size around 1 GB) using BitTorrent by clients connected to
networks having good access to the Internet (such as IIT Kanpur). This process requires
about 5 hours, assuming a download speed of 50 KB/s, which is typically observed with
BitTorrent. By using MTorrent, this time can be brought down to under 3 hours when
there are multiple active downloaders.
Most ISPs today observe heavy traffic load on their Internet links due to the increasing
number of users of Peer-to-Peer file sharing systems. Due to competition, ISPs
are forced to reduce tariffs continuously, which reduces their profit margins.
However, with more users migrating to a system like MTorrent, the load on ISP resources
(Internet links) can be reduced by as much as 65% for a comparable amount of
downloads by end clients. Thus, the profit margins of ISPs can improve substantially
if they encourage more users to switch to MTorrent. The load on access links is also
reduced by similar proportions by the use of MTorrent. The island owners have to pay
for the Internet access links on the basis of the usage of such links. With reduced
usage of access links, the Internet consumption bills for island owners can be reduced
considerably, which in turn will motivate them to enable IP Multicast support
on their networks, even though this requires software (and in some cases hardware)
upgrades. Thus, MTorrent is economically sustainable.
Finally, our work on MTorrent is distinct from other similar research for the
following reasons:
Standard compliance: The proposed MTorrent protocol is interoperable with the
BitTorrent protocol. It only requires changes at the end client level, unlike other
solutions, which would need network-wide support.
Simplicity: The protocol is easy to understand and implement, and thus can be readily
used in most high-end BitTorrent clients today, like Azureus and BitComet.
Actual Implementation: Instead of relying on theoretical results or network simulations,
we implemented a prototype of our protocol and evaluated it on a large-scale real
network.
Bibliography
[1] BitTorrent location-aware protocol 1.0 specification.
http://wiki.theory.org/BitTorrent_Location-aware_Protocol_1.0_Specification.
[2] BitTorrent used to update workstations.
http://torrentfreak.com/university-uses-utorrent-080306.
[3] Distance vector multicast routing protocol. http://www.ietf.org/rfc/rfc1075.txt.
[4] Emulab documentation. http://www.emulab.net/doc.php3.
[5] Enhanced CTorrent. http://www.rahul.net/dholmes/ctorrent.
[6] GNU Wget. http://www.gnu.org/software/wget.
[7] How to set up Linux for multicast routing.
http://www.jukie.net/~bart/multicast/Linux-Mrouted-MiniHOWTO.html.
[8] IP multicast.
http://www.cisco.com/en/US/docs/internetworking/technology/handbook/IP-Multi.html.
[9] The network simulator - ns-2. http://www.isi.edu/nsnam/ns.
[10] RFC 3170 - IP multicast applications: Challenges and solutions.
http://www.faqs.org/rfcs/rfc3170.html.
[11] Routing information protocol. http://www.ietf.org/rfc/rfc1058.txt.
[12] Tcpdump/libpcap. http://www.tcpdump.org.
[13] tcptrace. http://www.tcptrace.org.
[14] Jong Suk Ahn, Peter B. Danzig, Zhen Liu, and Limin Yan. Evaluation of TCP
Vegas: Emulation and experiment. In Proc. of SIGCOMM, pages 185–205, 1995.
[15] J. Almeida, V. Almeida, and D. Yates. Measuring the behavior of a world-wide web
server. Technical Report 1996-025, Boston University, October 1996.
[16] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like
distributions: Evidence and implications. In Proc. of IEEE INFOCOM, 1999.
[17] M. Charikar. Similarity estimation techniques from rounding algorithms. In Proc. of
the thirty-fourth annual ACM symposium on Theory of computing, 2002.
[18] B. Cohen. Incentives build robustness in BitTorrent. In Proc. of the First Workshop
on the Economics of Peer-to-Peer Systems, 2003.
[19] Internet Engineering Task Force. Hypertext transfer protocol – HTTP 1.1. RFC
2068, 1997.
[20] T. Karagiannis, P. Rodriguez, and K. Papagiannaki. Should internet service
providers fear peer-assisted content distribution? In Proc. of IMC, pages 63–76,
2005.
[21] N. Leibowitz, A. Bergman, R. Ben-Shaul, and A. Shavit. Are file swapping networks
cacheable? Characterizing P2P traffic. In Proc. of the 7th Int. WWW Caching
Workshop, 2002.
[22] G. S. Manku, A. Jain, and A. Das Sarma. Detecting near-duplicates for web crawling.
In Proc. of the 16th international conference on World Wide Web, 2007.
[23] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: An
approach to universal topology generation. In Proc. of the International Workshop on
Modeling, Analysis and Simulation of Computer and Telecommunications Systems
- MASCOTS, 2001.
[24] L. Rizzo. Dummynet and forward error correction. In Proc. of Freenix, 1998.
[25] Osama Saleh and Mohamed Hefeeda. Modeling and caching of peer-to-peer traffic.
In Proc. of IEEE International Conference on Network Protocols, pages 249–258,
2006.
[26] S. Saroiu, P. K. Gummadi, and S. D. Gribble. A measurement study of peer-to-peer
file sharing systems. In Proc. of Multimedia Computing and Networking, 2002.
[27] Amin Vahdat, Ken Yocum, Kevin Walsh, Priya Mahadevan, Dejan Kostić, Jeff
Chase, and David Becker. Scalability and accuracy in a large-scale network
emulator. In Proc. of OSDI, 2002.
[28] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler,
C. Barb, and A. Joglekar. An integrated experimental environment for distributed
systems and networks. In Proc. of the 5th Symposium on Operating Systems Design
and Implementation (OSDI), 2002.
[29] A. Wierzbicki, N. Leibowitz, M. Ripeanu, and R. Wozniak. Cache replacement
policies revisited: the case of P2P traffic. In Proc. of IEEE International Symposium
on Cluster Computing and the Grid (CCGrid), pages 182–189, 2004.
[30] Ellen W. Zegura, Ken Calvert, and S. Bhattacharjee. How to model an internetwork.
In Proc. of IEEE Infocom, 1996.
