A BitTorrent Client in Python 3
A BitTorrent client in Python 3.5
Python 3.5 comes with support for asynchronous IO, which seems like a perfect fit when
implementing a BitTorrent client. This article will guide you through the BitTorrent protocol
details while showcasing how a small client was implemented using it.
Posted in Code with tags Python, BitTorrent, at Wednesday, August 24, 2016
When Python 3.5 was released with the new async / await syntax for the asyncio module, I was curious to give it a try.
Recently I decided to implement a simple BitTorrent client using asyncio. I have always been interested in
peer-to-peer protocols and it seemed like a perfect fit.
The project is named Pieces, all of the source code is available at GitHub and released under the Apache 2 license.
Feel free to learn from it, steal from it, improve it, laugh at it or just ignore it.
I previously posted a short introduction to Python's async module. If this is your first time looking at asyncio
it might be a good idea to read through that one first.
An introduction to BitTorrent
BitTorrent has been around since 2001 when Bram Cohen authored the first version of the protocol. The big
breakthrough was when sites such as The Pirate Bay made it popular to use for downloading pirated material. Streaming
sites, such as Netflix, might have resulted in a decrease of people using BitTorrent for downloading movies. But
BitTorrent is still used in a number of different, legal, solutions where distribution of larger files is important.
Facebook uses it to distribute updates within their huge data centers
Amazon S3 implements it for downloading of static files
Traditional downloads, where it is still used for larger files such as Linux distributions
BitTorrent is a peer-to-peer protocol, where peers join a swarm of other peers to exchange pieces of data between
each other. Each peer is connected to multiple peers at the same time, and thus downloading or uploading to multiple
peers at the same time. This is great in terms of limiting bandwidth compared to when a file is downloaded from a
central server. It is also great for keeping a file available as it does not rely on a single source being online.
While going through the implementation it might be good to have read, or to have another tab open with the
Unofficial BitTorrent Specification. This is without a doubt the best source of information on the BitTorrent
protocol. The official specification is vague and lacks certain details so the unofficial is the one you want to study.
Parsing a .torrent file
The first thing a client needs to do is to find out what it is supposed to download and from where. This information is
what is stored in the .torrent file, a.k.a. the meta-info. There is a number of properties stored in the meta-info that
we need in order to successfully implement a client.
Things like:
The name of the file to download
The size of the file to download
The URL to the tracker to connect to
All these properties are stored in a binary format called Bencoding.
Bencoding supports four different data types: dictionaries, lists, integers and strings. It is fairly easy to translate to
Python's object literals or JSON.
Below is bencoding described in Augmented Backus-Naur Form courtesy of the Haskell library.
In pieces the encoding and decoding of bencoded data is implemented in the pieces.bencoding module (source
code).
Here are a few examples decoding bencoded data into a Python representation using that module.
Likewise, a Python object structure can be encoded into a bencoded byte string using the same module.
>>> Encoder(123).encode()
b'i123e'
>>> d = OrderedDict()
>>> d['cow'] = 'moo'
>>> d['spam'] = 'eggs'
>>> Encoder(d).encode()
bytearray(b'd3:cow3:moo4:spam4:eggse')
These examples can also be found in the unit tests.
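To make the four bencoded types concrete, here is a minimal standalone decoder sketch. It is not the pieces.bencoding implementation; the names bdecode and _decode_at are made up for this illustration, and error handling is kept to a bare minimum.

```python
def bdecode(data: bytes):
    """Decode one bencoded value and return it as a Python object."""
    value, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode_at(data: bytes, pos: int):
    """Decode the value starting at pos; return (value, next position)."""
    token = data[pos:pos + 1]
    if token == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if token == b"l":                      # list: l<values>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos + 1
    if token == b"d":                      # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = _decode_at(data, pos)
            result[key], pos = _decode_at(data, pos)
        return result, pos + 1
    if token.isdigit():                    # string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("unexpected byte at offset {}".format(pos))
```

Strings are returned as bytes rather than str, since bencoding is a binary format and not every value is valid text.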
The parser implementation is pretty straightforward, no asyncio is used here though, not even reading the .torrent
from disk.
Here you can see some of the metadata such as the name of the destination file (ubuntu-16.04-desktop-amd64.iso)
and the total size in bytes (1485881344).
Notice how the keys used in the OrderedDict are binary strings. Bencoding is a binary protocol, and using UTF-8
strings as keys will not work!
A wrapper class pieces.torrent.Torrent exposing these properties is implemented, abstracting the binary strings
and other details away from the rest of the client. This class only implements the attributes used in the pieces client.
I will not go through which attributes are available; instead the rest of this article will refer back to attributes found
in the .torrent / meta-info where they are used.
Connecting to the tracker
Now that we can decode a .torrent file and we have a Python representation of the data, we need to get a list of
peers to connect with. This is where the tracker comes in. A tracker is a central server keeping track of available peers
for a given torrent. A tracker does NOT contain any of the torrent data, only which peers that can be connected to and
their statistics.
Building the request
The announce property in the metainfo is the HTTP URL to the tracker to connect to using the following URL
parameters:
Parameter   Description
info_hash   The SHA1 hash of the info dict found in the .torrent
peer_id     A unique ID generated for this client
uploaded    The total number of bytes uploaded
downloaded  The total number of bytes downloaded
left        The number of bytes left to download for this client
port        The TCP port this client listens on
compact     Whether or not the client accepts a compacted list of peers
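A minimal sketch of how these parameters could be assembled into an announce URL using only the standard library. The function name build_announce_url, the default port 6889 and the example tracker URL in the usage note are assumptions for illustration, not taken from Pieces. The sketch expects the info dict as already bencoded bytes.

```python
import hashlib
from urllib.parse import quote, urlencode


def build_announce_url(announce, info_bencoded, peer_id, total_size,
                       uploaded=0, downloaded=0, port=6889):
    """Return the announce URL with the query parameters from the table."""
    params = {
        "info_hash": hashlib.sha1(info_bencoded).digest(),  # raw 20 bytes
        "peer_id": peer_id,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": total_size - downloaded,
        "port": port,
        "compact": 1,
    }
    # quote_via=quote percent-encodes the binary info_hash byte by byte
    return announce + "?" + urlencode(params, quote_via=quote)
```

Calling build_announce_url("http://tracker.example/announce", b"d4:name3:fooe", b"-PC0001-123456789012", 100) yields a URL whose query string starts with the percent-encoded info_hash.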
The peer_id needs to be exactly 20 bytes, and there are two major conventions used on how to generate this ID.
Pieces follows the Azureus-style convention generating peer id like:
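A minimal sketch of Azureus-style generation: a dash, a two-letter client code, a four-digit version, another dash, and then random digits to fill the 20 bytes. The client code PC, the version 0001 and the function name are assumptions for illustration.

```python
import random


def generate_peer_id(client_code="PC", version="0001"):
    """Return a 20-character Azureus-style peer id, e.g. -PC0001-<12 digits>."""
    prefix = "-{}{}-".format(client_code, version)            # 8 characters
    digits = "".join(str(random.randint(0, 9)) for _ in range(12))
    return prefix + digits
```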
A tracker request can look like this using httpie:
d8:completei3651e10:incompletei385e8:intervali1800e5:peers300:%yOk.
@_<K+
\mb^Tn^ O
A*1*>B)/u
...
The response data is truncated since it contains binary data that screws up the Markdown formatting.
From the tracker response, there are two properties of interest:
interval: The interval in seconds until the client should make a new announce call to the tracker.
peers: The list of peers is a binary string with a length that is a multiple of 6 bytes, where each peer consists of a 4 byte
IP address and a 2 byte port number (since we are using the compact format).
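The compact format can be unpacked with the standard library alone. A minimal sketch, with decode_compact_peers being a hypothetical helper name rather than something from Pieces:

```python
import socket
import struct


def decode_compact_peers(peers: bytes):
    """Split a compact peer string into a list of (ip, port) tuples."""
    if len(peers) % 6 != 0:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    result = []
    for offset in range(0, len(peers), 6):
        ip = socket.inet_ntoa(peers[offset:offset + 4])              # 4-byte IPv4 address
        (port,) = struct.unpack(">H", peers[offset + 4:offset + 6])  # 2-byte big-endian port
        result.append((ip, port))
    return result
```

For example, the six bytes b"\x7f\x00\x00\x01\x1a\xe1" decode to the peer ("127.0.0.1", 6881).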
So, a successful announce call made to the tracker gives you a list of peers to connect to. This might not be all
available peers in this swarm, only the peers the tracker assigned your client to connect to. A subsequent call to the
tracker might result in another list of peers.
Async HTTP
Python does not come with built-in support for async HTTP and my beloved requests library does not implement
asyncio either. Scouting around the Internet it looks like most use aiohttp, which implements both an HTTP client and
server.
The method is declared using async and uses the new asynchronous context manager async with to allow
being suspended while the HTTP call is being made. Given a successful response, this method will be suspended again
while reading the binary response data await response.read() . Finally the response data is wrapped in a
TrackerResponse instance containing the list of peers, alternatively an error message.
The loop
Everything up to this point could really have been made synchronously, but now that we are about to connect to
multiple peers we need to go asynchronous.
import asyncio
import logging
loop = asyncio.get_event_loop()
client = TorrentClient(Torrent(args.torrent))
task = loop.create_task(client.start())
try:
    loop.run_until_complete(task)
except asyncio.CancelledError:
    logging.warning('Event loop was canceled')
Is that it? No, not really. We have our own loop (not the event loop) implemented in the pieces.client.TorrentClient
that sets up the peer connections, schedules the announce call, etc.
TorrentClient is something like a work coordinator, it starts by creating an asyncio.Queue which will hold the list of
available peers that can be connected to.
Then it constructs N number of pieces.protocol.PeerConnection which will consume peers from off the queue.
These PeerConnection instances will wait ( await ) until there is a peer available in the Queue for one of them to
connect to (not blocking).
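The consumer pattern can be sketched with nothing but asyncio.Queue. This is an illustration and not the Pieces code: the worker coroutine stands in for a PeerConnection, and asyncio.sleep(0) stands in for actually opening a connection.

```python
import asyncio


async def worker(name, available_peers, connected):
    """Wait for a peer on the queue, then pretend to connect to it."""
    while True:
        ip, port = await available_peers.get()   # suspends this worker, does not block the loop
        await asyncio.sleep(0)                   # stand-in for opening the TCP connection
        connected.append((name, ip, port))
        available_peers.task_done()


async def main(peers, n_workers=2):
    available_peers = asyncio.Queue()
    connected = []
    workers = [asyncio.ensure_future(worker(i, available_peers, connected))
               for i in range(n_workers)]
    for peer in peers:
        available_peers.put_nowait(peer)
    await available_peers.join()                 # wait until every queued peer was handled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return connected
```

Running asyncio.run(main([("10.0.0.1", 6881), ("10.0.0.2", 6881)])) returns one entry per peer, tagged with the worker that picked it up.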
Let's have a look at this loop:
while True:
    if self.piece_manager.complete:
        break
    if self.abort:
        break
    current = time.time()
    # Announce on the first pass, or once the tracker's interval has passed
    if (not previous) or (previous + interval < current):
        response = await self.tracker.connect(
            first=previous if previous else False,
            uploaded=self.piece_manager.bytes_uploaded,
            downloaded=self.piece_manager.bytes_downloaded)
        if response:
            previous = current
            interval = response.interval
            # Replace the queued peers with the ones from this announce
            self._empty_queue()
            for peer in response.peers:
                self.available_peers.put_nowait(peer)
    else:
        await asyncio.sleep(5)
self.stop()
Basically, what that loop does is to:
1. Check if we have downloaded all pieces
2. Check if the user aborted the download
3. Make an announce call to the tracker if needed
4. Add any retrieved peers to a queue of available peers
5. Sleep 5 seconds
So, each time an announce call is made to the tracker, the list of peers to connect to is reset, and if no peers are
retrieved, no PeerConnection will run. This goes on until the download is complete or aborted.
The peer protocol
After receiving a peer IP and port number from the tracker, our client will open a TCP connection to that peer.
Once the connection is open, these peers will start to exchange messages using the peer protocol.
First, let's go through the different parts of the peer protocol, and then go through how it is all implemented.
Handshake
The first message sent needs to be a Handshake message, and it is the connecting client that is responsible for
initiating this.
Immediately after sending the Handshake, our client should receive a Handshake message sent from the remote peer.
The Handshake carries two fields of interest:
peer_id: The unique ID of either peer
info_hash: The SHA1 hash value for the info dict
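For reference, the handshake defined by the BitTorrent specification is 68 bytes: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info_hash and the 20-byte peer_id. A minimal sketch using struct follows; the function names are hypothetical and not taken from Pieces.

```python
import struct

PSTR = b"BitTorrent protocol"
# pstrlen, pstr, 8 reserved (zero) bytes, info_hash, peer_id
HANDSHAKE_FORMAT = ">B19s8x20s20s"


def encode_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Build the 68-byte handshake message."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be exactly 20 bytes")
    return struct.pack(HANDSHAKE_FORMAT, len(PSTR), PSTR, info_hash, peer_id)


def decode_handshake(data: bytes):
    """Return (info_hash, peer_id) from a received handshake."""
    pstrlen, pstr, info_hash, peer_id = struct.unpack(HANDSHAKE_FORMAT, data)
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return info_hash, peer_id
```

After decoding the remote handshake, the returned info_hash is what gets compared against our own before the connection continues.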
Each client starts in the state choked and not interested. That means that the client is not allowed to request pieces
from the remote peer, nor have we signalled any intent of being interested.
Choked: A choked peer is not allowed to request any pieces from the other peer.
Unchoked: An unchoked peer is allowed to request pieces from the other peer.
Interested: Indicates that a peer is interested in requesting pieces.
Not interested: Indicates that the peer is not interested in requesting pieces.
Consider Choked and Unchoked to be rules and Interested and Not Interested to be intents between two
peers.
The following sequence of messages is what we are aiming for when setting up a PeerConnection:
          Handshake
client --------------> peer    We are initiating the handshake
          Handshake
client <-------------- peer    Comparing the info_hash with our hash
          BitField
client <-------------- peer    Might be receiving the BitField
         Interested
client --------------> peer    Let peer know we want to download
          Unchoke
client <-------------- peer    Peer allows us to start requesting pieces
Requesting pieces
As soon as the client gets into an unchoked state it will start requesting pieces from the connected peer. The details
surrounding which piece to request are covered later, in Managing the pieces.
Other messages
Have
KeepAlive
Implementation
The PeerConnection opens a TCP connection to a remote peer using asyncio.open_connection to asynchronously
open a TCP connection that returns a tuple of StreamReader and a StreamWriter . Given that the connection was
created successfully, the PeerConnection will send and receive a Handshake message.
Once a handshake is made, the PeerConnection will use an asynchronous iterator to return a stream of PeerMessages
and take the appropriate action.
Upon iterating (calling next) the PeerStreamIterator will read data from the StreamReader and if enough data is
available try to parse and return a valid PeerMessage .
The BitTorrent protocol uses messages with variable length, where all messages take the form:
<length><id><payload>
length is a 4 byte integer value
id is a single decimal byte
payload is variable and message dependent
So as soon as the buffer has enough data for the next message it will be parsed and returned from the iterator.
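A minimal sketch of such an asynchronous iterator over length-prefixed messages. It is not the PeerStreamIterator from Pieces: the class name MessageStream is made up, it yields plain (id, payload) tuples instead of PeerMessage objects, and it relies on StreamReader.readexactly rather than managing its own buffer.

```python
import asyncio
import struct


class MessageStream:
    """Async iterator yielding (message_id, payload) tuples from a StreamReader."""

    def __init__(self, reader):
        self.reader = reader

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            header = await self.reader.readexactly(4)
            (length,) = struct.unpack(">I", header)      # 4-byte big-endian length
            body = await self.reader.readexactly(length)
        except asyncio.IncompleteReadError:
            raise StopAsyncIteration from None           # connection was closed
        if length == 0:
            return None, b""                             # KeepAlive: no id, no payload
        return body[0], body[1:]


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"\x00\x00\x00\x05\x04\x00\x00\x00!")   # a Have message
    reader.feed_data(b"\x00\x00\x00\x01\x01")                 # an Unchoke message
    reader.feed_eof()
    return [message async for message in MessageStream(reader)]
```

In demo the reader is fed by hand; in a real client it would be the StreamReader returned by asyncio.open_connection.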
All messages are decoded using Python's module struct which contains functions to convert to and from Python
values and C structs. Struct uses compact strings as descriptors on what to convert, e.g. >Ib reads as Big-Endian, 4
byte unsigned integer, 1 byte character.
Note that all messages use Big-Endian in BitTorrent.
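As a small illustration of the >Ib descriptor, here is a sketch that packs and unpacks a message header (length 5, id 4). The variable names are only for this example.

```python
import struct

# 4-byte big-endian unsigned length followed by a 1-byte message id
header = struct.pack(">Ib", 5, 4)
length, message_id = struct.unpack(">Ib", header)

# The same length in little-endian order, to show what ">" changes
little_endian_length = struct.pack("<I", 5)
```

Here header is b"\x00\x00\x00\x05\x04", whereas the little-endian variant puts the \x05 byte first.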
This makes it easy to create unit tests to encode and decode messages. Let's have a look at the tests for the Have
message:
class HaveMessageTests(unittest.TestCase):
    def test_can_construct_have(self):
        have = Have(33)
        self.assertEqual(
            have.encode(),
            b"\x00\x00\x00\x05\x04\x00\x00\x00!")
    def test_can_parse_have(self):
        have = Have.decode(b"\x00\x00\x00\x05\x04\x00\x00\x00!")
        self.assertEqual(33, have.index)
From the raw binary string we can tell that the Have message has a length of 5 bytes \x00\x00\x00\x05 , an id of
value 4 \x04 and the payload is 33 \x00\x00\x00! .
Since the message length is 5 and the ID only uses a single byte we know that we have four bytes to interpret as the
payload value. Using struct.unpack we can easily convert it to a Python integer like:
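A minimal sketch of that conversion, slicing the four payload bytes out of the raw Have message used in the tests (the variable names are only for illustration):

```python
import struct

raw = b"\x00\x00\x00\x05\x04\x00\x00\x00!"
# Skip the 4-byte length and the 1-byte id, then read a big-endian unsigned int
(index,) = struct.unpack(">I", raw[5:9])
```

The trailing ! is simply how Python displays the byte 0x21, which is 33 in decimal.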
That is basically it regarding the protocol, all messages follow the same procedure and the iterator keeps reading from
the socket until it gets disconnected. See the source code for details on all messages.
Managing the pieces
So far we have only discussed pieces, pieces of data being exchanged by two peers. It turns out that pieces is not the
entire truth, there is one more concept: blocks. If you have looked through any of the source code you might have
seen code referring to blocks, so let's go through what a piece really is.
A piece is, unsurprisingly, a partial piece of the torrent's data. A torrent's data is split into N number of pieces of equal
size (except the last piece in a torrent, which might be of smaller size than the others). The piece length is specified in
the .torrent file. Typically pieces are of sizes 512 kB or less, and should be a power of 2.
Pieces are still too big to be shared efficiently between peers, so pieces are further divided into something referred to
as blocks. Blocks are the chunks of data that are actually requested between peers, but pieces are still used to indicate
which peer has which pieces. If only blocks had been used it would increase the overhead in the
protocol greatly (resulting in longer BitFields, more Have messages and larger .torrent files).
A block is 2^14 (16384) bytes in size, except the final block that most likely will be of a smaller size.
name: foo.txt
length: 135168
piece length: 49152
That small torrent would result in 3 pieces:
piece 0:
    block 0: 16 384 bytes (2^14)
    block 1: 16 384 bytes
    block 2: 16 384 bytes
          =  49 152 bytes
piece 1:
    block 0: 16 384 bytes
    block 1: 16 384 bytes
    block 2: 16 384 bytes
          =  49 152 bytes
piece 2:
    block 0: 16 384 bytes
    block 1: 16 384 bytes
    block 2:  4 096 bytes
          =  36 864 bytes
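The same layout can be computed from the total length and the piece length. A minimal sketch, where piece_layout is a hypothetical helper and not part of Pieces:

```python
BLOCK_SIZE = 2 ** 14  # 16384 bytes


def piece_layout(total_length, piece_length, block_size=BLOCK_SIZE):
    """Return one list of block sizes per piece."""
    pieces = []
    for piece_start in range(0, total_length, piece_length):
        # The last piece may be shorter than piece_length
        piece_size = min(piece_length, total_length - piece_start)
        # The last block of a piece may be shorter than block_size
        blocks = [min(block_size, piece_size - block_start)
                  for block_start in range(0, piece_size, block_size)]
        pieces.append(blocks)
    return pieces
```

Calling piece_layout(135168, 49152) gives three pieces, the last one ending in a 4096 byte block, matching the listing above.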
Exchanging these blocks between peers is basically what BitTorrent is about. Once all blocks for a piece are done, that
piece is complete and can be shared with other peers (the Have message is sent to connected peers). And once all
pieces are complete the peer transforms from a downloader to only being a seeder.
Two notes on where the official specification is a bit off:
1. The official specification refers to both pieces and blocks as just pieces, which is quite confusing. The unofficial
specification and others seem to have agreed upon using the term block for the smaller piece, which is what we
will use as well.
2. The official specification states another block size than what we use. Reading the unofficial specification, it
seems that 2^14 bytes is what is agreed among implementers regardless of the official specification.
The implementation
When a TorrentClient is constructed, so is a PieceManager with the responsibility to:
Determine which block to request next
Persist received blocks to file
Determine when a download is complete.
This way the blocks and pieces will be requested in order. However, multiple pieces might be ongoing based on which
piece a client has.
Since pieces aims to be a simple client, no effort has been made on implementing a smart or efficient strategy for
which pieces to request. A better solution would be to request the rarest piece first, which would make the entire
swarm healthier as well.
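A minimal sketch of what rarest-first selection could look like. This is not implemented in Pieces; the function name and the peer_bitfields mapping (peer id to the set of piece indexes that peer has) are hypothetical.

```python
from collections import Counter


def rarest_piece(missing, peer_bitfields):
    """Pick the missing piece held by the fewest peers (but by at least one)."""
    availability = Counter()
    for pieces in peer_bitfields.values():
        availability.update(pieces)
    candidates = [index for index in missing if availability[index] > 0]
    if not candidates:
        return None
    # Ties are broken by the lowest piece index
    return min(candidates, key=lambda index: (availability[index], index))
```

With peers holding {0, 1}, {0, 2} and {0, 1}, piece 2 is only available from one peer and would be requested first.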
Whenever a block is received from a peer, it is stored (in memory) by the PieceManager. When all blocks for a piece are
retrieved, a SHA1 hash is made on the piece. This hash is compared to the SHA1 hashes included in the .torrent info
dict; if it matches, the piece is written to disk.
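The info dict stores the expected hashes as one long byte string, 20 bytes per piece. A minimal sketch of the comparison, where piece_is_valid and the pieces_field argument are names chosen for this illustration:

```python
import hashlib


def piece_is_valid(index, piece_data, pieces_field):
    """Compare a piece's SHA1 digest with its 20-byte entry in the info dict."""
    expected = pieces_field[index * 20:(index + 1) * 20]
    return hashlib.sha1(piece_data).digest() == expected
```

A piece that fails this check would have to be discarded and requested again.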
When all pieces are accounted for (matching hashes) the torrent is considered to be complete, which stops the
TorrentClient , closing any open TCP connection, and as a result the program exits with a message that the torrent is
downloaded.
Future work
Seeding is not yet implemented, but it should not be that hard to implement. What is needed is something along the
lines of this:
Having seeding implemented would make Pieces a good citizen, supporting both downloading and uploading of data
within the swarm.
Additional features that probably can be added without too much effort are:
Resume a download, by seeing what parts of the file(s) are already downloaded (verified by making SHA1
hashes).
Summary
It was real fun to implement a BitTorrent client, having to handle binary protocols and networking was great to
balance all that recent web development I have been doing.
Python continues to be one of my favourite programming languages. Handling binary data was a breeze given the
struct module and the recent addition asyncio feels very pythonic. Using an async iterator to implement the protocol
turned out to be a good fit as well.
Hopefully this article inspired you to write a BitTorrent client of your own, or to extend pieces in some way. If you
spot any error in the article or the source code, feel free to open an issue over at GitHub.
Markus Eliasson.
A thorough technical lead with a passion for producing valuable and clean code. Tends to occasionally blog about building software and can't seem to make up his
mind on which programming language to use next.