An Introduction to Peer-to-Peer Networks
Presentation for MIE456 - Information Systems Infrastructure II

Vinod Muthusamy
October 30, 2003
Agenda
n Overview of P2P
n Characteristics
n Benefits

n Unstructured P2P systems
n Napster (Centralized)
n Gnutella (Distributed)
n Kazaa/FastTrack (Super-peers)

n Structured P2P systems (DHTs)
n Chord
n Pastry
n CAN

n Conclusions
Client/Server Architecture
n Well known, powerful, reliable server is a data source
n Clients request data from server
n Very successful model
n WWW (HTTP), FTP, Web services, etc.

[Figure: four clients connected to a server through the Internet]
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt


Client/Server Limitations
n Scalability is hard to achieve
n Presents a single point of failure
n Requires administration
n Unused resources at the network edge

n P2P systems try to address these limitations


P2P Computing*
n P2P computing is the sharing of computer resources and services by direct exchange between systems.

n These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files.

n P2P computing takes advantage of existing computing power, computer storage and networking connectivity, allowing users to leverage their collective power to the benefit of all.

* From http://www-sop.inria.fr/mistral/personnel/Robin.Groenevelt/Publications/Peer-to-Peer_Introduction_Feb.ppt
P2P Architecture
n All nodes are both clients and servers
n Provide and consume data
n Any node can initiate a connection
n No centralized data source
n The ultimate form of democracy on the Internet
n The ultimate threat to copyright protection on the Internet

[Figure: four nodes connected to one another through the Internet]
* Content from http://project-iris.net/talks/dht-toronto-03.ppt


P2P Network Characteristics
n Clients are also servers and routers
n Nodes contribute content, storage, memory, CPU
n Nodes are autonomous (no administrative
authority)
n Network is dynamic: nodes enter and leave the
network frequently
n Nodes collaborate directly with each other (not
through well-known servers)
n Nodes have widely varying capabilities
P2P Benefits
n Efficient use of resources
n Unused bandwidth, storage, processing power at the edge of the network

n Scalability
n Consumers of resources also donate resources
n Aggregate resources grow naturally with utilization

n Reliability
n Replicas
n Geographic distribution
n No single point of failure

n Ease of administration
n Nodes self organize
n No need to deploy servers to satisfy demand (cf. scalability)
n Built-in fault tolerance, replication, and load balancing
P2P Applications
n Are these P2P systems?

n File sharing (Napster, Gnutella, Kazaa)

n Multiplayer games (Unreal Tournament, DOOM)

n Collaborative applications (ICQ, shared whiteboard)

n Distributed computation (SETI@home)

n Ad-hoc networks
Popular P2P Systems
n Napster, Gnutella, Kazaa, Freenet

n Large scale sharing of files.


n User A makes files (music, video, etc.) on their
computer available to others
n User B connects to the network, searches for
files and downloads files directly from user A

n Issues of copyright infringement


Napster
n A way to share music files with others
n Users upload their list of files to the Napster server
n You send queries to the Napster server for files of interest
n Keyword search (artist, song, album, bitrate, etc.)
n Napster server replies with the IP addresses of users with matching files
n You connect directly to user A to download the file

* Figure from http://computer.howstuffworks.com/file-sharing.htm
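The upload/query/reply steps above can be sketched as a small in-memory index. This is only an illustration of a centralized search index, not Napster's actual protocol: the class name `CentralIndex`, its methods, the peer addresses, and the keyword splitting are all assumptions made for the example.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a Napster-style central server (hypothetical names)."""

    def __init__(self):
        # keyword -> set of (peer_address, filename)
        self._index = defaultdict(set)

    def upload_file_list(self, peer_address, filenames):
        """A peer registers the files it is willing to share."""
        for name in filenames:
            # Assumed keyword scheme: lowercase words of the filename.
            for word in name.lower().replace(".", " ").split():
                self._index[word].add((peer_address, name))

    def search(self, keyword):
        """Return (peer_address, filename) pairs matching one keyword."""
        return sorted(self._index.get(keyword.lower(), set()))


index = CentralIndex()
index.upload_file_list("10.0.0.1:6699", ["Blue Train.mp3"])
index.upload_file_list("10.0.0.2:6699", ["Blue Monk.mp3", "Giant Steps.mp3"])

# The server only answers "who has it"; the download itself would then
# go directly from the requesting peer to one of the returned addresses.
hits = index.search("blue")
```

Only the search goes through the server here, which is why the server is both the scalability bottleneck and the single point of failure.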


Napster
n Central Napster server
n Can ensure correct results
n Bottleneck for scalability
n Single point of failure
n Susceptible to denial of service
n Malicious users
n Lawsuits, legislation

n Search is centralized
n File transfer is direct (peer-to-peer)
Gnutella
n Share any type of files (not just music)
n Decentralized search unlike Napster
n You ask your neighbours for files of interest
n Neighbours ask their neighbours, and so on
n TTL field quenches messages after a number of hops
n Users with matching files reply to you

* Figure from http://computer.howstuffworks.com/file-sharing.htm
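A rough sketch of TTL-limited flooding follows. It models the overlay as a plain dictionary and is not the Gnutella wire protocol; the function name `flood_query`, the five-node topology, and the substring match are assumptions chosen for illustration.

```python
from collections import deque


def flood_query(neighbours, files, start, keyword, ttl):
    """Breadth-first flood of a query; each hop uses up one unit of TTL.

    neighbours: dict mapping node -> list of adjacent nodes
    files: dict mapping node -> list of shared filenames
    Returns (nodes with a matching file, number of messages sent).
    """
    seen = {start}
    queue = deque([(start, ttl)])
    matches, messages = [], 0
    while queue:
        node, remaining = queue.popleft()
        if node != start and any(keyword in f for f in files.get(node, [])):
            matches.append(node)
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for peer in neighbours.get(node, []):
            messages += 1  # every forwarded copy costs a message
            if peer not in seen:  # a peer processes a given query only once
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return matches, messages


# Hypothetical overlay: A is the querying node, D and E hold the file.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
shared = {"D": ["song.mp3"], "E": ["song.mp3"]}

near, _ = flood_query(overlay, shared, "A", "song", ttl=2)
far, _ = flood_query(overlay, shared, "A", "song", ttl=3)
```

With `ttl=2` the query reaches D but not E, which shows how the TTL bounds traffic at the cost of missing more distant copies.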


Gnutella
n Decentralized
n No single point of failure
n Not as susceptible to denial of service
n Cannot ensure correct results

n Flooding queries
n Search is now distributed but still not scalable
Kazaa (FastTrack network)
n Hybrid of centralized Napster and decentralized Gnutella
n Super-peers act as local search hubs
n Each super-peer is similar to a Napster server for a small portion of the network
n Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time)
n Users upload their list of files to a super-peer
n Super-peers periodically exchange file lists
n You send queries to a super-peer for files of interest
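The super-peer idea can be sketched as several small indexes that periodically swap their file lists. FastTrack is a proprietary protocol, so nothing below reflects its real messages; `SuperPeer`, `exchange_with`, and the peer names are hypothetical.

```python
class SuperPeer:
    """Hypothetical super-peer: a small index for its own group of peers."""

    def __init__(self, name):
        self.name = name
        self.local = {}   # filename -> peers attached to this hub
        self.remote = {}  # filename -> peers learned from other hubs

    def upload_file_list(self, peer, filenames):
        for f in filenames:
            self.local.setdefault(f, set()).add(peer)

    def exchange_with(self, other):
        """Periodic exchange: each hub copies the other's local file list."""
        for src, dst in ((self, other), (other, self)):
            for f, peers in src.local.items():
                dst.remote.setdefault(f, set()).update(peers)

    def search(self, filename):
        found = self.local.get(filename, set()) | self.remote.get(filename, set())
        return sorted(found)


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.upload_file_list("alice", ["a.mp3"])
sp2.upload_file_list("bob", ["a.mp3", "b.mp3"])

before = sp1.search("b.mp3")  # sp1 has not yet heard about bob's files
sp1.exchange_with(sp2)
after = sp1.search("b.mp3")
```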
Free riding*
n File sharing networks rely on users sharing data

n Two types of free riding


n Downloading but not sharing any data
n Not sharing any interesting data

n On Gnutella
n 15% of users contribute 94% of content
n 63% of users never responded to a query
n Didn't have interesting data

* Data from E. Adar and B.A. Huberman (2000), Free Riding on Gnutella
Anonymity
n Napster, Gnutella, Kazaa don't provide anonymity
n Users know who they are downloading from
n Others know who sent a query

n Freenet
n Designed to provide anonymity among other
features
Freenet
n Data flows in reverse path of query
n Impossible to know if a user is initiating or forwarding a query
n Impossible to know if a user is consuming or forwarding data

n Smart queries
n Requests get routed to correct peer by incremental discovery
Structured P2P
n Second generation P2P overlay networks

n Self-organizing
n Load balanced
n Fault-tolerant

n Scalable guarantees on the number of hops to answer a query
n Major difference with unstructured P2P systems

n Based on a distributed hash table interface


Distributed Hash Tables (DHT)
n Distributed version of a hash table data structure
n Stores (key, value) pairs
n The key is like a filename
n The value can be file contents

n Goal: Efficiently insert/lookup/delete (key, value) pairs
n Each peer stores a subset of (key, value) pairs in the system
n Core operation: Find node responsible for a key
n Map key to node
n Efficiently route insert/lookup/delete request to this node
DHT Generic Interface
n Node id: m-bit identifier (similar to an IP address)
n Key: sequence of bytes
n Value: sequence of bytes

n put(key, value)
n Store (key,value) at the node responsible for the key
n value = get(key)
n Retrieve value associated with key (from the
appropriate node)
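A minimal sketch of the put/get interface is shown below, with all nodes held in one process. The slide does not say how a key is mapped to a node, so the example assumes a Chord-style rule (the first node id at or after the key's hash) and SHA-1 hashing; `ToyDHT`, `key_id`, and the node ids are made up for illustration.

```python
import hashlib


def key_id(key: bytes, m: int = 16) -> int:
    """Hash a key to an m-bit identifier (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big") % (2 ** m)


class ToyDHT:
    """All nodes live in one process; a real DHT routes over the network."""

    def __init__(self, node_ids):
        # node id -> that node's local (key, value) store
        self.stores = {nid: {} for nid in sorted(node_ids)}

    def responsible_node(self, key: bytes) -> int:
        """First node id >= hash(key), wrapping to the smallest id."""
        kid = key_id(key)
        ids = list(self.stores)
        return next((n for n in ids if n >= kid), ids[0])

    def put(self, key: bytes, value: bytes) -> None:
        self.stores[self.responsible_node(key)][key] = value

    def get(self, key: bytes):
        return self.stores[self.responsible_node(key)].get(key)


dht = ToyDHT([1000, 20000, 40000, 60000])
dht.put(b"song.mp3", b"...file contents...")
value = dht.get(b"song.mp3")
```

The caller never names a node: both `put` and `get` derive the responsible node from the key alone, which is the core operation the slides describe.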
DHT Applications
n Many services can be built on top of a DHT
interface
n File sharing
n Archival storage
n Databases
n Naming, service discovery
n Chat service
n Rendezvous-based communication
n Publish/Subscribe
DHT Desirable Properties
n Keys mapped evenly to all nodes in the
network
n Each node maintains information about only
a few other nodes
n Messages can be routed to a node
efficiently
n Node arrival/departures only affect a few
nodes
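The last property can be illustrated with a small experiment, assuming keys are assigned to their successor node on a hash ring (as in consistent hashing). The helper names, the 32-bit identifier size, and the node and key labels are assumptions for the sketch.

```python
import bisect
import hashlib


def ident(data: str, m: int = 32) -> int:
    """Hash a label to an m-bit identifier (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(data.encode()).digest(), "big") % (2 ** m)


def owner(sorted_ids, kid):
    """Successor of kid on the ring of sorted node ids."""
    i = bisect.bisect_left(sorted_ids, kid)
    return sorted_ids[i % len(sorted_ids)]


nodes = sorted(ident(f"node-{i}") for i in range(20))
keys = [ident(f"key-{i}") for i in range(2000)]
before = {k: owner(nodes, k) for k in keys}

# One node arrives; every other node keeps its identifier.
newcomer = ident("node-new")
nodes_after = sorted(nodes + [newcomer])
after = {k: owner(nodes_after, k) for k in keys}

# Keys whose responsible node changed because of the arrival.
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the newcomer, and all of them were taken from a single existing node (the newcomer's successor), so the arrival disturbs only one other node.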
DHT Routing Protocols
n DHT is a generic interface

n There are several implementations of this interface


n Chord [MIT]
n Pastry [Microsoft Research UK, Rice University]
n Tapestry [UC Berkeley]
n Content Addressable Network (CAN) [UC Berkeley]

n SkipNet [Microsoft Research US, Univ. of Washington]


n Kademlia [New York University]
n Viceroy [Israel, UC Berkeley]
n P-Grid [EPFL Switzerland]
n Freenet [Ian Clarke]

n These systems are often referred to as P2P routing substrates or P2P overlay networks
Chord API
n Node id: unique m-bit identifier
(hash of IP address or other unique ID)
n Key: m-bit identifier (hash of a sequence of bytes)
n Value: sequence of bytes

n API
n insert(key, value): store key/value at r nodes
n lookup(key)
n update(key, newval)
n join(n)
n leave()
Chord Identifier Circle
n Nodes organized in an identifier circle based on node identifiers
n Keys assigned to their successor node in the identifier circle
n Hash function ensures even distribution of nodes and keys on the circle
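The successor rule can be sketched in a few lines. The 6-bit identifier space and the node ids below are illustrative values, not taken from the slides.

```python
import bisect

M = 6  # identifier bits, so the circle has 2**M = 64 positions


def successor(node_ids, ident):
    """First node whose id is >= ident, wrapping around the circle."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, ident % (2 ** M))
    return ids[i % len(ids)]


ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]

# Key 10 is stored at node 14, key 38 at node 38 itself,
# and key 60 wraps past the top of the circle to node 1.
assignments = {k: successor(ring, k) for k in (10, 24, 38, 54, 60)}
```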
Chord Finger Table
n O(logN) table size
n ith finger points to first node that succeeds n by at least 2^(i-1)
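As a sketch, a finger table can be computed directly from the rule above when the full node list is known (a real node learns its fingers through lookups instead). The ring and the value of `M` are illustrative assumptions.

```python
import bisect

M = 6  # identifier bits; also the number of fingers per node
RING = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # sorted, illustrative node ids


def successor(ident):
    """First node whose id is >= ident, wrapping around the circle."""
    i = bisect.bisect_left(RING, ident % (2 ** M))
    return RING[i % len(RING)]


def finger_table(n):
    """finger[i] = successor(n + 2**(i-1)) for i = 1..M (list is 0-indexed)."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


fingers_of_8 = finger_table(8)
```

Node 8 ends up with fingers 14, 14, 14, 21, 32 and 42: nearby fingers repeat, while the last one reaches halfway around the circle.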
Chord Key Location
n Look up in the finger table the furthest node that precedes the key
n Query homes in on target in O(logN) hops
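The lookup rule can be sketched as a loop that keeps jumping to the furthest finger preceding the key. This is a simplified, single-process reading of the idea, assuming a stable ring with correct finger tables; the ring, `M`, and the function names are illustrative.

```python
import bisect

M = 6
RING = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # sorted, illustrative node ids


def successor(ident):
    i = bisect.bisect_left(RING, ident % (2 ** M))
    return RING[i % len(RING)]


def fingers(n):
    """Finger i (0-indexed) is the successor of n + 2**i."""
    return [successor(n + 2 ** i) for i in range(M)]


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    return (a < x < b) if a < b else (x > a or x < b)


def lookup(start, key):
    """Return (node responsible for key, list of nodes the query visited)."""
    node, path = start, [start]
    while True:
        succ = fingers(node)[0]
        # If key is in (node, succ], the immediate successor is responsible.
        if key == succ or in_open(key, node, succ):
            return succ, path
        # Otherwise forward to the furthest finger that still precedes the key.
        node = next((f for f in reversed(fingers(node))
                     if in_open(f, node, key)), succ)
        path.append(node)


result = lookup(8, 54)
```

Starting at node 8, the query for key 54 visits nodes 8, 42 and 51 before node 56 is identified as responsible; each jump roughly halves the remaining distance, which is where the O(logN) bound comes from.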
Chord Properties
n In a system with N nodes and K keys, with
high probability
n each node receives at most (1+ε)K/N keys
n each node maintains info. about O(logN) other
nodes
n lookups resolved with O(logN) hops

n No delivery guarantees
n No consistency among replicas
n Hops have poor network locality
Network locality
n Nodes close on ring can be far in the network.

[Figure: ring nodes N20, N40, N41 and N80 shown among hosts labelled vu.nl, Lulea.se, OR-DSL, CMU, MIT, MA-Cable, Cisco, CA-T1, Cornell, CCI, NYU, Aros and Utah]
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt


Pastry
n Similar interface to Chord
n Considers network locality to minimize the distance that messages travel
n New node needs to know a nearby node to achieve locality
n Each routing hop matches the destination identifier by one more digit
n Many choices in each hop (locality possible)
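Prefix routing can be sketched as follows. This is a strong simplification of Pastry: every node is assumed to know all others, the destination is assumed to be an existing node id, and ties between candidates are broken by smallest id where real Pastry would prefer a nearby node. The node ids and function names are made up.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(nodes, start, dest):
    """Greedy prefix routing over equal-length hex node ids.

    Each hop moves to a node that shares a longer prefix with dest,
    choosing the smallest improvement available.
    """
    path, current = [start], start
    while current != dest:
        have = shared_prefix_len(current, dest)
        better = [n for n in nodes if shared_prefix_len(n, dest) > have]
        # Several candidates usually qualify; this is where real Pastry
        # would pick the one closest in the underlying network.
        current = min(better, key=lambda n: (shared_prefix_len(n, dest), n))
        path.append(current)
    return path


NODES = ["65a1", "d13d", "d4a2", "d462", "d467", "d46a", "d471"]
path = route(NODES, "65a1", "d46a")
```

Along the resulting path 65a1, d13d, d471, d462, d46a, each node matches the destination in one more leading digit than the previous one.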
CAN
n Based on a d-dimensional Cartesian coordinate space on a d-torus
n Each node owns a distinct zone in the space
n Each key hashes to a point in the space
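A small sketch of the key-to-zone mapping follows, using a 2-dimensional unit square split into four equal zones. The hash construction, the zone layout, and all names are assumptions for illustration; CAN itself does not prescribe them here.

```python
import hashlib


def key_to_point(key: str, d: int = 2):
    """Hash a key to a point in [0, 1)^d, using 4 digest bytes per coordinate."""
    digest = hashlib.sha256(key.encode()).digest()
    return tuple(int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2 ** 32
                 for i in range(d))


# Each zone is (lower corner, upper corner); together they tile the space.
ZONES = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 0.5), (1.0, 1.0)),
}


def owner(point):
    """Node whose zone contains the point."""
    for node, (lo, hi) in ZONES.items():
        if all(l <= x < h for x, l, h in zip(point, lo, hi)):
            return node
    raise ValueError("zones do not cover the point")


def torus_distance(p, q):
    """Euclidean distance where every coordinate wraps around at 1.0."""
    return sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)) ** 0.5


point = key_to_point("song.mp3")
node = owner(point)
```

Because the space is a torus, the points (0.05, 0.5) and (0.95, 0.5) are only 0.1 apart, so zones on opposite edges of the square are neighbours.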
CAN Routing and Node Arrival
P2P Review
n Two key functions of P2P systems
n Sharing content
n Finding content

n Sharing content
n Direct transfer between peers
n All systems do this
n Structured vs. unstructured placement of data
n Automatic replication of data

n Finding content
n Centralized (Napster)
n Decentralized (Gnutella)
n Probabilistic guarantees (DHTs)
Conclusions
n P2P connects devices at the edge of the Internet

n Popular in industry
n Napster, Kazaa, etc. allow users to share data
n Legal issues still to be resolved

n Exciting research in academia


n DHTs (Chord, Pastry, etc.)
n Improve properties/performance of overlays

n Applications other than file sharing are being developed
