Bittorrent (Protocol) : File Sharing
BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data. BitTorrent is
one of the most common protocols for transferring large files, and it has been estimated that it accounted
for roughly 27% to 55% of all Internet traffic (depending on geographical location) as of February 2009.[1]
Programmer Bram Cohen designed the protocol in April 2001 and released a first implementation on July
2, 2001.[2] It is now maintained by Cohen's company BitTorrent, Inc. There are numerous BitTorrent
clients available for a variety of computing platforms.
Description
The BitTorrent protocol can distribute a large file without the heavy load on the source computer and
network. Rather than downloading a file from a single source, the BitTorrent protocol allows users to join a
"swarm" of hosts to download and upload from each other simultaneously. The protocol works as an
alternative method to distribute data and can work over networks with low bandwidth so even small
computers, like mobile phones, are able to distribute files to many recipients.
A user who wants to upload a file first creates a small torrent descriptor file and distributes it by
conventional means (web, email, etc.). The user then makes the file itself available through a BitTorrent node
acting as a seed. Those with the torrent descriptor file can give it to their own BitTorrent nodes which,
acting as peers or leechers, download the file by connecting to the seed and/or other peers.
The file being distributed is divided into segments called pieces. As each peer receives a new piece of the
file it becomes a source of that piece to other peers, relieving the seed from having to send a copy to every
peer. With BitTorrent, the task of distributing the file is shared by those who want it; it is entirely possible for
the seed to send only a single copy of the file itself to an unlimited number of peers.
Each piece is protected by a cryptographic hash contained in the torrent descriptor.[3] This prevents nodes
from maliciously modifying the pieces they pass on to other nodes. If a node starts with an authentic copy
of the torrent descriptor, it can verify the authenticity of the actual file it has received.
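The per-piece check described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular client: the 40-byte payload, the 16-byte piece length, and the function names are assumptions chosen for readability, and SHA-1 is used because it is the hash the torrent file carries for each piece.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]

def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Accept a received piece only if its digest matches the descriptor."""
    return hashlib.sha1(piece).digest() == expected[index]

# Hypothetical 40-byte "file" and a tiny piece length, for illustration only.
payload = bytes(range(40))
hashes = piece_hashes(payload, piece_length=16)

good = payload[16:32]          # the authentic second piece
tampered = b"X" + good[1:]     # same piece with its first byte altered

print(len(hashes))                        # 3 (pieces of 16, 16 and 8 bytes)
print(verify_piece(1, good, hashes))      # True
print(verify_piece(1, tampered, hashes))  # False
```

A node that holds an authentic list of digests can therefore reject a modified piece no matter which peer supplied it.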
When a peer completely downloads a file, it becomes an additional seed. This eventual shift from peers to
seeders determines the overall "health" of the file (as determined by the number of times a file is available
in its complete form).
This distributed nature of BitTorrent leads to a flood-like spreading of a file throughout peers. As more
peers join the swarm, the likelihood of a successful download increases. Relative to standard Internet
hosting, this provides a significant reduction in the original distributor's hardware and bandwidth resource
costs. It also provides redundancy against system problems, reduces dependence on the original
distributor[4] and provides sources for the file that are generally temporary and therefore harder to trace
than the enduring host used in standard file distribution techniques.
Operation
In this animation, the colored bars beneath each of the seven clients in the upper region represent the file, with each
color representing an individual piece of the file. After the initial pieces transfer from the seed (the large system at the
bottom), the pieces are transferred individually from client to client. The original seeder needs to send out only one copy
of the file for all the clients to receive a copy.
A BitTorrent client is any program that implements the BitTorrent protocol. Each client is capable of
preparing, requesting, and transmitting any type of computer file over a network, using the protocol. A peer
is any computer running an instance of a client.
To share a file or group of files, a peer first creates a small file called a "torrent" (e.g. MyFile.torrent). This
file contains metadata about the files to be shared and about the tracker, the computer that coordinates the
file distribution. Peers that want to download the file must first obtain a torrent file for it and connect to the
specified tracker, which tells them from which other peers to download the pieces of the file.
Though both ultimately transfer files over a network, a BitTorrent download differs from a classic download
(as is typical with an HTTP or FTP request, for example) in several fundamental ways:
BitTorrent makes many small data requests over different TCP connections to different machines,
while classic downloading is typically made via a single TCP connection to a single machine.
BitTorrent downloads in a random or in a "rarest-first"[5] approach that ensures high availability, while
classic downloads are sequential.
Taken together, these differences allow BitTorrent to achieve much lower cost to the content provider,
much higher redundancy, and much greater resistance to abuse or to "flash crowds" than regular server
software. However, this protection, theoretically, comes at a cost: downloads can take time to rise to full
speed because it may take time for enough peer connections to be established, and it may take time for a
node to receive sufficient data to become an effective uploader. This contrasts with regular downloads
(such as from an HTTP server, for example) that, while more vulnerable to overload and abuse, rise to full
speed very quickly and maintain this speed throughout.
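The "rarest-first" approach mentioned above can be illustrated with a short sketch. It assumes each peer has advertised the set of piece indices it holds; the peer names, the function name, and the tie-breaking rule are hypothetical, and real clients typically add randomization.

```python
from collections import Counter
from typing import Optional

def rarest_first(have: set[int], peer_pieces: dict[str, set[int]]) -> Optional[int]:
    """Return the index of the missing piece held by the fewest peers."""
    availability = Counter()
    for pieces in peer_pieces.values():
        # Count only pieces we still need.
        availability.update(pieces - have)
    if not availability:
        return None  # nothing left that any connected peer can supply
    # Ties are broken by lowest index here purely for determinism.
    return min(availability, key=lambda idx: (availability[idx], idx))

# Hypothetical swarm: piece 1 is held by two peers, pieces 2 and 3 by one each.
peers = {
    "peer_a": {0, 1, 2},
    "peer_b": {0, 1},
    "peer_c": {0, 3},
}
print(rarest_first(have={0}, peer_pieces=peers))  # 2
```

Requesting scarce pieces first spreads them to more peers early, which is what keeps every piece available even if the peers that originally held it leave.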
In general, BitTorrent's non-contiguous download methods have prevented it from supporting "progressive
downloads" or "streaming playback". However, comments made by Bram Cohen in January 2007 suggested
that streaming torrent downloads would soon be commonplace, and ad-supported streaming appears to be the
result of those comments. In January 2011 Cohen demonstrated an early version of BitTorrent streaming,
saying the feature would be available by summer 2011.[5]
The exact information contained in the torrent file depends on the version of the BitTorrent protocol. By
convention, the name of a torrent file has the suffix .torrent. Torrent files have an "announce" section,
which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files,
their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which are used by clients
to verify the integrity of the data they receive.
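As an illustration of this layout, the sketch below builds a single-file torrent in memory. Torrent files are serialized with a simple format called bencoding, and the key names used here ("announce", "info", "name", "length", "piece length", "pieces") follow the published protocol specification. The tracker URL, file name, payload, and piece length are hypothetical placeholders, and the encoder is a minimal one rather than the code of any real client.

```python
import hashlib

def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts in bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

# Hypothetical payload and piece length, for illustration only.
data = b"hello world, this is a tiny example payload"
piece_length = 16
pieces = b"".join(
    hashlib.sha1(data[i:i + piece_length]).digest()
    for i in range(0, len(data), piece_length)
)

metainfo = {
    "announce": "http://tracker.example.invalid/announce",  # placeholder URL
    "info": {
        "name": "example.txt",          # suggested file name
        "length": len(data),            # file length in bytes
        "piece length": piece_length,
        "pieces": pieces,               # concatenated 20-byte SHA-1 digests
    },
}

torrent_bytes = bencode(metainfo)
# The SHA-1 of the bencoded "info" section identifies the torrent (its info-hash).
info_hash = hashlib.sha1(bencode(metainfo["info"])).hexdigest()
print(torrent_bytes[:11])  # b'd8:announce'
```

Because the info-hash covers the whole "info" section, changing any field inside it, such as a file name or a piece hash, produces a different torrent identity.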
Torrent files are typically published on websites or elsewhere, and registered with at least one tracker. The
tracker maintains lists of the clients currently participating in the torrent.[7] Alternatively, in a trackerless
system (decentralized tracking) every peer acts as a tracker. Azureus was the first[citation needed] BitTorrent
client to implement such a system through the distributed hash table (DHT) method. An alternative and
incompatible DHT system, known as Mainline DHT, was later developed and adopted by
the BitTorrent (Mainline), µTorrent, Transmission, rTorrent, KTorrent, BitComet, and Deluge clients.
After the DHT was adopted, a "private" flag (analogous to the broadcast flag) was unofficially
introduced, telling clients to restrict the use of decentralized tracking regardless of the user's desires.[8] The
flag is intentionally placed in the info section of the torrent so that it cannot be disabled or removed without
changing the identity of the torrent. The purpose of the flag is to prevent torrents from being shared with
clients that do not have access to the tracker. The flag was requested for inclusion in the official
specification in August, 2008, but has not been accepted.[9] Clients that have ignored the private flag were
banned by many trackers, discouraging the practice.[10]
Clients incorporate mechanisms to optimize their download and upload rates; for example they download
pieces in a random order to increase the opportunity to exchange data, which is only possible if two peers
have different pieces of the file.
The effectiveness of this data exchange depends largely on the policies that clients use to determine to
whom to send data. Clients may prefer to send data to peers that send data back to them (a tit for
tat scheme), which encourages fair trading. But strict policies often result in suboptimal situations, such as
when newly joined peers are unable to receive any data because they don't have any pieces yet to trade
themselves or when two peers with a good connection between them do not exchange data simply
because neither of them takes the initiative. To counter these effects, the official BitTorrent client program
uses a mechanism called "optimistic unchoking", whereby the client reserves a portion of its available
bandwidth for sending pieces to random peers (not necessarily known good partners, so-called preferred
peers) in hopes of discovering even better partners and to ensure that newcomers get a chance to join the
swarm.[11]
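A simplified version of this policy can be sketched as follows. It is not the official client's algorithm: the number of regular slots, the single optimistic slot, and the rate figures are illustrative assumptions.

```python
import random
from typing import Optional

def choose_unchoked(
    download_rates: dict[str, float],
    regular_slots: int = 3,
    rng: Optional[random.Random] = None,
) -> set[str]:
    """Pick the peers to upload to during the next interval.

    Tit for tat: the peers that sent us the most data keep their slots.
    Optimistic unchoking: one extra slot goes to a random remaining peer,
    so that newcomers with nothing to trade still receive some pieces.
    """
    rng = rng or random.Random()
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    remaining = ranked[regular_slots:]
    if remaining:
        unchoked.add(rng.choice(remaining))
    return unchoked

# Hypothetical rates in KiB/s received from each peer; "d" and "e" just joined.
rates = {"a": 50.0, "b": 40.0, "c": 30.0, "d": 0.0, "e": 0.0}
print(sorted(choose_unchoked(rates, rng=random.Random(0))))
```

The three best uploaders are always selected, while the fourth slot rotates among the others and gives a new peer the first pieces it needs to start trading.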
Although swarming scales well to tolerate flash crowds for popular content, it is less useful for unpopular
content. Peers arriving after the initial rush might find the content unavailable and need to wait for the
arrival of a seed in order to complete their downloads. The seed arrival, in turn, may take a long time to happen
(this is termed the seeder promotion problem). Since maintaining seeds for unpopular content entails high
bandwidth and administrative costs, this runs counter to the goals of publishers that value BitTorrent as a
cheap alternative to a client-server approach. This occurs on a huge scale; measurements have shown that
38% of all new torrents become unavailable within the first month.[12] A strategy adopted by many
publishers which significantly increases availability of unpopular content consists of bundling multiple files
in a single swarm.[13] More sophisticated solutions have also been proposed; generally, these use cross-
torrent mechanisms through which multiple torrents can cooperate to better share content.[14]
BitTorrent does not offer its users anonymity. It is possible to obtain the IP addresses of all current and
possibly previous participants in a swarm from the tracker. This may expose users with insecure systems to
attacks.[11] It may also expose users to the risk of being sued, if they are distributing files without permission
from the copyright holder(s). However, there are ways to promote anonymity; for example,
the OneSwarm project layers privacy-preserving sharing mechanisms on top of the original BitTorrent
protocol.
Adoption
A growing number of individuals and organizations are using BitTorrent to distribute their own or licensed
material. Independent adopters report that without using BitTorrent technology and its dramatically reduced
demands on their private networking hardware and bandwidth, they could not afford to distribute their
files.[15]
BitTorrent Inc. has amassed a number of licenses from Hollywood studios for distributing popular
content from their websites.
Sub Pop Records releases tracks and videos via BitTorrent Inc.[16] to distribute its 1000+
albums. Babyshambles and The Libertines (both bands associated with Pete Doherty) have
extensively used torrents to distribute hundreds of demos and live videos. US industrial rock band Nine
Inch Nails frequently distributes albums via BitTorrent.
Podcasting software is starting to integrate BitTorrent to help podcasters deal with the download
demands of their MP3 "radio" programs. Specifically, Juice and Miro (formerly known as Democracy
Player) support automatic processing of .torrent files from RSS feeds. Similarly, some BitTorrent
clients, such as µTorrent, are able to process web feeds and automatically download content found
within them.
Broadcasters
In 2008, the CBC became the first public broadcaster in North America to make a full show (Canada's
Next Great Prime Minister) available for download using BitTorrent.[18]
The Norwegian Broadcasting Corporation (NRK) has experimented with BitTorrent distribution since
March 2008, with the material available online.[19] Only selected material for which NRK owns all royalties is published.
Responses have been very positive, and NRK is planning to offer more content.
The Dutch VPRO broadcasting organization released three documentaries under a Creative
Commons license using the content distribution feature of the Mininova tracker.
Personal material
The Amazon S3 "Simple Storage Service" is a scalable Internet-based storage service with a
simple web service interface, equipped with built-in BitTorrent support.
Blog Torrent offers a simplified BitTorrent tracker to enable bloggers and non-technical users to host a
tracker on their site. Blog Torrent also allows visitors to download a "stub" loader, which acts as a
BitTorrent client to download the desired file, allowing users without BitTorrent software to use the
protocol.[20] This is similar to the concept of a self-extracting archive.
Software
Blizzard Entertainment uses BitTorrent (via a proprietary client called the "Blizzard Downloader") to
distribute most content for StarCraft II and World of Warcraft, including the games themselves.[21]
Many games, especially those whose large size makes them difficult to host due to bandwidth
limits, extremely frequent downloads, and unpredictable changes in network traffic, instead distribute
a specialized, stripped-down BitTorrent client with enough functionality to download the game
from the other running clients and the primary server (which is maintained in case not enough peers
are available).
Many major open source and free software projects encourage BitTorrent as well as conventional
downloads of their products (via HTTP, FTP etc.) to increase availability and to reduce load on their
own servers, especially when dealing with larger files.[22]
Entropia Universe has also begun distributing the client file(s) through BitTorrent.
Government
The UK government used BitTorrent to distribute details about how the tax money of UK citizens was
spent.[23][24]
Network impact
CableLabs, the research organization of the North American cable industry, estimates that BitTorrent
represents 18% of all broadband traffic.[28][dated info] In 2004, CacheLogic put that number at roughly 35% of
all traffic on the Internet.[29][dated info] The discrepancies in these numbers are caused by differences in the
method used to measure P2P traffic on the Internet.[30]
Routers that use network address translation (NAT) must maintain tables of source and destination IP
addresses and ports. Typical home routers are limited to about 2000 table entries while some more
expensive routers have larger table capacities. BitTorrent frequently contacts 300–500 servers per second,
rapidly filling the NAT tables. This is a common cause of home routers locking up.[31]
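A back-of-the-envelope calculation shows how quickly this can happen. The sketch below uses the figures quoted above and makes two simplifying assumptions that are not from the text: every contact creates a new table entry, and no entry expires before the table is full.

```python
def seconds_to_fill(table_entries: int, contacts_per_second: int) -> float:
    """Time until a NAT table is full, assuming one new entry per contact
    and no entries expiring in the meantime (simplifying assumptions)."""
    return table_entries / contacts_per_second

TABLE_ENTRIES = 2000  # typical home router capacity quoted in the text

for rate in (300, 500):
    print(rate, round(seconds_to_fill(TABLE_ENTRIES, rate), 1))
# 300 6.7
# 500 4.0
```

Under these assumptions a 2000-entry table fills in roughly four to seven seconds. Real routers expire idle entries, so the practical effect depends on their timeout settings.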
Indexing
The BitTorrent protocol provides no way to index torrent files. As a result, a comparatively small number of
websites have hosted a large majority of torrents, many linking to copyrighted material without the
authorization of copyright holders, rendering those sites especially vulnerable to lawsuits.[32] Several types
of websites support the discovery and distribution of data on the BitTorrent network.
Public torrent hosting sites such as The Pirate Bay allow users to search and download from their collection
of torrent files. Users can typically also upload torrent files for content they wish to distribute. Often, these
sites also run BitTorrent trackers for their hosted torrent files, but these two functions are not mutually
dependent: a torrent file could be hosted on one site and tracked by another, unrelated site.
Private host/tracker sites operate like public ones except that they restrict access to registered users and
keep track of the amount of data each user uploads and downloads, in an attempt to reduce leeching.
Search engines allow the discovery of torrent files that are hosted and tracked on other sites; examples
include Mininova, BTJunkie, Torrentz, The Pirate Bay, Eztorrent and isoHunt. These sites allow the user to
ask for content meeting specific criteria (such as containing a given word or phrase) and retrieve a list of
links to torrent files matching those criteria. This list can often be sorted with respect to several criteria,
relevance (seeders-leechers ratio) being one of the most popular and useful (due to the way the protocol
behaves, the download bandwidth achievable is very sensitive to this value). Bram Cohen launched a
BitTorrent search engine on http://www.bittorrent.com/search that co-mingles licensed content with search
results.[33] Metasearch engines allow one to search several BitTorrent indices and search engines at once.
The BitTorrent protocol is still under development and therefore may still acquire new features and other
enhancements such as improved efficiency.
Distributed trackers
On May 2, 2005, Azureus 2.3.0.0 (now known as Vuze) was released,[34] introducing support for
"trackerless" torrents through a system called the "distributed database." This system is
a DHT implementation which allows the client to use torrents that do not have a working BitTorrent tracker.
The following month, BitTorrent, Inc. released version 4.2.0 of the Mainline BitTorrent client, which
supported an alternative DHT implementation (popularly known as "Mainline DHT") that is incompatible
with that of Azureus. Current versions of the official BitTorrent client, µTorrent, BitComet, and BitSpirit all
share compatibility with Mainline DHT. Both DHT implementations are based on Kademlia.[35] As of version
3.0.5.0, Azureus also supports Mainline DHT in addition to its own distributed database through use of an
optional application plugin.[36] This potentially allows the Azureus client to reach a bigger swarm.
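Kademlia, on which both implementations are based, measures the distance between two identifiers as their bitwise XOR, and a lookup repeatedly moves toward the nodes closest to the target key. The sketch below shows only that distance metric and the "k closest nodes" selection; the 4-bit identifiers and the function names are illustrative assumptions (real DHT identifiers are much longer), and no network messages are modeled.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers: their bitwise XOR."""
    return a ^ b

def closest_nodes(target: int, known_nodes: list[int], k: int = 2) -> list[int]:
    """Return the k known node IDs closest to the target key."""
    return sorted(known_nodes, key=lambda node: xor_distance(node, target))[:k]

# Hypothetical 4-bit identifiers, for readability only.
target_key = 0b1010
nodes = [0b1000, 0b0010, 0b1011, 0b0101]

for node in nodes:
    print(f"{node:04b} distance {xor_distance(node, target_key)}")
print([f"{n:04b}" for n in closest_nodes(target_key, nodes)])  # ['1011', '1000']
```

In a trackerless torrent the target key is derived from the torrent itself, and the nodes closest to it are the ones asked to remember which peers are in the swarm.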
Another idea that has surfaced in Vuze is that of virtual torrents. This idea is based on the distributed
tracker approach and is used to describe some web resource. Currently, it is used for instant messaging. It
is implemented using a special messaging protocol and requires an appropriate plugin. Anatomic P2P is
another approach, which uses a decentralized network of nodes that route traffic to dynamic trackers.
Most BitTorrent clients also use Peer exchange (PEX) to gather peers in addition to trackers and DHT.
Peer exchange checks with known peers to see if they know of any other peers. With the 3.0.5.0 release of
Vuze, all major BitTorrent clients now have compatible peer exchange.
Web seeding
Web seeding was implemented in 2006 as the ability of BitTorrent clients to download torrent pieces from
an HTTP source in addition to the swarm. The advantage of this feature is that a website may distribute a
torrent for a particular file or batch of files and make those files available for download from that same web
server; this can simplify long-term seeding and load balancing through the use of existing, cheap, web
hosting setups. In theory, this would make using BitTorrent almost as easy for a web publisher as creating
a direct HTTP download. In addition, it would allow the "web seed" to be disabled if the swarm becomes
too popular while still allowing the file to be readily available.
The first web seeding specification was created by John "TheSHAD0W" Hoffman, who created BitTornado.[37] From version 5.0
onward, the Mainline BitTorrent client also supports web seeds, and the BitTorrent web site had[38] a simple
publishing tool that creates web seeded torrents.[39] µTorrent added support for web seeds in version
1.7. BitComet added support for web seeds in version 1.14. This first specification requires running a web
service that serves content by info-hash and piece number, rather than filename.
In September 2010, a new service named Burnbit was launched which generates a torrent from any URL
using web seeding.[41]
RSS feeds
A technique called Broadcatching combines RSS with the BitTorrent protocol to create a content delivery
system, further simplifying and automating content distribution. Steve Gillmor explained the concept in a
column for Ziff-Davis in December, 2003.[42] The discussion spread quickly among bloggers (Ernest
Miller,[43] Chris Pirillo, etc.). In an article entitled Broadcatching with BitTorrent, Scott Raymond explained:
I want RSS feeds of BitTorrent files. A script would periodically check the feed for new items, and use them
to start the download. Then, I could find a trusted publisher of an Alias RSS feed, and "subscribe" to all
new episodes of the show, which would then start downloading automatically — like the "season pass"
feature of the TiVo.
—Scott Raymond, scottraymond.net[44]
The RSS feed will track the content, while BitTorrent ensures content integrity
with cryptographic hashing of all data, so feed subscribers will receive uncorrupted content.
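The kind of script Raymond describes can be sketched with Python's standard XML parser. This is a minimal illustration under stated assumptions: the feed is an in-memory sample rather than a downloaded one, the URLs are placeholders, and handing each new torrent to a BitTorrent client is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with two torrent enclosures and one MP3 enclosure.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.invalid/ep1.torrent"
                 type="application/x-bittorrent" length="1234"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.invalid/ep2.torrent"
                 type="application/x-bittorrent" length="5678"/>
    </item>
    <item>
      <title>Trailer</title>
      <enclosure url="http://example.invalid/trailer.mp3"
                 type="audio/mpeg" length="999"/>
    </item>
  </channel>
</rss>"""

def new_torrent_urls(feed_xml: str, seen: set[str]) -> list[str]:
    """Return torrent enclosure URLs not seen before, and remember them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url", "")
        if enclosure.get("type") == "application/x-bittorrent" and url not in seen:
            fresh.append(url)
            seen.add(url)
    return fresh

seen_urls: set[str] = set()
print(new_torrent_urls(SAMPLE_FEED, seen_urls))  # both .torrent URLs
print(new_torrent_urls(SAMPLE_FEED, seen_urls))  # [] on the next poll
```

A subscriber would run this check periodically and pass each new URL to a BitTorrent client, which gives the "season pass" behavior described in the quotation.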
One of the first and most popular software clients (free and open source) for broadcatching is Miro. Other free
software clients such as PenguinTV and KatchTV are also now supporting broadcatching.
The BitTorrent web-service MoveDigital has the ability to make torrents available to any web application
capable of parsing XML through its standard REST-based interface.[45] Additionally, Torrenthut is
developing a similar torrent API that will provide the same features, as well as further intuition to help bring
the torrent community to Web 2.0 standards. Alongside this release is a first PHP application built using the
API called PEP, which will parse any Really Simple Syndication (RSS 2.0) feed and automatically create
and seed a torrent for each enclosure found in that feed.[46]
Since BitTorrent makes up a large proportion of total traffic, some ISPs have chosen to throttle (slow down)
BitTorrent transfers to ensure network capacity remains available for other uses. For this reason, methods
have been developed to disguise BitTorrent traffic in an attempt to thwart these efforts.[47]
Protocol header encrypt (PHE) and Message stream encryption/Protocol encryption (MSE/PE) are features
of some BitTorrent clients that attempt to make BitTorrent hard to detect and throttle. At the
moment Vuze, BitComet, KTorrent, Transmission, Deluge, µTorrent, MooPolice, Halite, rTorrent and the
latest official BitTorrent client (v6) support MSE/PE encryption.
In September 2006 it was reported that some software could detect and throttle BitTorrent traffic
masquerading as HTTP traffic.[48]
Reports in August 2007 indicated that Comcast was preventing BitTorrent seeding by monitoring and
interfering with the communication between peers. Protection against these efforts is provided
by proxying the client-tracker traffic via an encrypted tunnel to a point outside of the Comcast
network.[49] Comcast has more recently called a "truce" with BitTorrent, Inc. with the intention of shaping
traffic in a protocol-agnostic manner.[50] Questions about the ethics and legality of Comcast's behavior have
led to renewed debate about Net neutrality in the United States.[51]
In general, although encryption can make it difficult to determine what is being shared, BitTorrent is
vulnerable to traffic analysis. Thus even with MSE/PE, it may be possible for an ISP to recognize BitTorrent
and also to determine that a system is no longer downloading but only uploading data, and terminate its
connection by injecting TCP RST (reset flag) packets.
Multitracker
Another unofficial feature is an extension to the BitTorrent metadata format proposed by John
Hoffman[52] and implemented by several indexing websites. It allows the use of multiple trackers per file, so
if one tracker fails, others can continue to support file transfer. It is implemented in several clients, such
as BitComet, BitTornado, BitTorrent, KTorrent, Transmission, Deluge, µTorrent, rTorrent, and Vuze.
Trackers are placed in groups, or tiers, with a tracker randomly chosen from the top tier and tried, moving
to the next tier if all the trackers in the top tier fail.
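The tier logic can be sketched as follows. The tracker names and the `try_announce` callback are hypothetical stand-ins for real network requests, and the sketch omits details such as remembering which tracker in a tier last succeeded.

```python
import random
from typing import Callable, Optional, Sequence

def announce_with_tiers(
    tiers: Sequence[Sequence[str]],
    try_announce: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try trackers tier by tier; return the first one that responds.

    Within a tier the order is random; the next tier is used only
    after every tracker in the current tier has failed.
    """
    rng = rng or random.Random()
    for tier in tiers:
        candidates = list(tier)
        rng.shuffle(candidates)
        for url in candidates:
            if try_announce(url):
                return url
    return None

# Hypothetical setup: both top-tier trackers are down, the backup is up.
tiers = [["tracker-1a", "tracker-1b"], ["tracker-2"]]
alive = {"tracker-2"}
attempts: list[str] = []

def fake_announce(url: str) -> bool:
    attempts.append(url)
    return url in alive

print(announce_with_tiers(tiers, fake_announce))  # tracker-2
print(len(attempts))                              # 3
```

Both top-tier trackers are attempted, in random order, before the client falls back to the second tier.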
Torrents with multiple trackers[53] can decrease the time it takes to download a file, but they also have a few
consequences:
Poorly implemented[54] clients may contact multiple trackers, leading to more overhead-traffic.
Torrents from closed trackers suddenly become downloadable by non-members, as they can connect
to a seed via an open tracker.
Decentralized keyword search
Even with distributed trackers, a third party is still required to find a specific torrent. This is usually done in
the form of a hyperlink from the website of the content owner or through indexing websites like The Pirate
Bay or Torrentz.
The Tribler BitTorrent client is the first to incorporate decentralized search capabilities. With Tribler, users
can find .torrent files that are hosted among other peers, instead of on centralized index sites. It adds
such an ability to the BitTorrent protocol using a gossip protocol, somewhat similar to the eXeem network
which was shut down in 2005. The software includes the ability to recommend content as well. After a
dozen downloads the Tribler software can roughly estimate the download taste of the user and recommend
additional content.[55]
In May 2007 Cornell University published a paper proposing a new approach to searching a peer-to-peer
network for inexact strings,[56] which could replace the functionality of a central indexing site. A year later,
the same team implemented the system as a plugin for Vuze called Cubit[57] and published a follow-up
paper reporting its success.[58]
A somewhat similar facility but with a slightly different approach is provided by the BitComet client through
its "Torrent Exchange"[59] feature. Whenever two peers using BitComet (with Torrent Exchange enabled)
connect to each other they exchange lists of all the torrents (name and info-hash) they have in the Torrent
Share storage (torrent files which were previously downloaded and for which the user chose to enable
sharing by Torrent Exchange).
Thus each client builds up a list of all the torrents shared by the peers it connected to in the current session
(or it can even maintain the list between sessions if instructed). At any time the user can search that
Torrent Collection list for a certain torrent and sort the list by categories. When the user chooses to
download a torrent from that list, the .torrent file is automatically searched for (by info-hash value) in
the DHT Network and, when found, it is downloaded by the querying client, which can then create and
initiate a downloading task.
Implementations
The BitTorrent specification is free to use and many clients are open source, so BitTorrent clients have
been created for all common operating systems using a variety of programming languages. The official
BitTorrent client, µTorrent, Vuze, Transmission, and BitComet are some of the most popular clients.[citation needed]
Some BitTorrent implementations such as MLDonkey and Torrentflux are designed to run as servers. For
example, this can be used to centralize file sharing on a single dedicated server which users share access
to on the network.[60] Server-oriented BitTorrent implementations can also be hosted by hosting
providers at co-located facilities with high bandwidth Internet connectivity (e.g., a datacenter) which can
provide dramatic speed benefits over using BitTorrent from a regular home broadband connection.
Services such as ImageShack can download files on BitTorrent for the user, allowing them to download the
entire file by HTTP once it is finished.
The Opera web browser supports BitTorrent,[61] as does Wyzo. BitLet allows users to download Torrents
directly from their browser using a Java applet. Sites such as xFiles and DuShare allow users to transfer large files
directly using BitTorrent inside Adobe Flash.
An increasing number of hardware devices are being made to support BitTorrent. These include routers
and NAS devices containing BitTorrent-capable firmware like OpenWrt.
Proprietary versions of the protocol which implement DRM, encryption, and authentication are found within
managed clients such as Pando.
Development
An unimplemented (as of February 2008) unofficial feature is Similarity Enhanced Transfer (SET), a
technique for improving the speed at which peer-to-peer file sharing and content distribution systems can
share data. SET, proposed by researchers Pucha, Andersen, and Kaminsky, works by spotting chunks of
identical data in files that are an exact or near match to the one needed and transferring these data to the
client if the "exact" data are not present. Their experiments suggested that SET will help greatly with less
popular files, but not as much for popular data, where many peers are already downloading it.[62] Andersen
believes that this technique could be immediately used by developers with the BitTorrent file sharing
system.[63]
As of December 2008, BitTorrent, Inc. is working with Oversi on new Policy Discover Protocols that query
the ISP for capabilities and network architecture information. Oversi's ISP hosted NetEnhancer box is
designed to "improve peer selection" by helping peers find local nodes, improving download speeds while
reducing the loads into and out of the ISP's network.[64]
Legal issues
There has been much controversy over the use of BitTorrent trackers. BitTorrent metafiles themselves do
not store file contents. Whether the publishers of BitTorrent metafiles violate copyrights by linking to
copyrighted material without the authorization of copyright holders is controversial.
Various jurisdictions have pursued legal action against websites that host BitTorrent trackers. High-profile
examples include the closing of Suprnova.org, Torrentspy, LokiTorrent, Mininova and OiNK.cd. The Pirate
Bay torrent website, formed by a Swedish group, is noted for the "legal" section of its website in which
letters and replies on the subject of alleged copyright infringements are publicly displayed. On 31 May
2006, The Pirate Bay's servers in Sweden were raided by Swedish police on allegations by the MPAA of
copyright infringement;[65] however, the tracker was up and running again three days later.
Several studies on BitTorrent have indicated that a large portion of files available for download via
BitTorrent contain malware. In particular, one small sample[66] indicated that 18% of all executable
programs available for download contained malware. Another study[67] claims that as much as 14.5% of
BitTorrent downloads contain zero-day malware, and that BitTorrent was used as the distribution
mechanism for 47% of all zero-day malware they have found.