
Lecture 22:

Peer-to-Peer (P2P) Databases


Nov. 10, 2006

ChengXiang Zhai

Most slides are taken from the following presentations:


[Joe Hellerstein 04] http://db.cs.berkeley.edu/jmh/talks/vldb04-p2ptut-final.ppt
[Aline Viana et al. 03] http://www.euronetlab.com/seminar/viana_040703.ppt
[Ryan Huebsch 03] http://www.cs.berkeley.edu/~kubitron/courses/cs294-4-F03/slides/lec20-dbp2p.ppt

CS511 Advanced Database Management Systems 1


What is Peer-to-Peer (P2P)?
P2P is a class of applications that take advantage of
resources – storage, cycles, content, human presence –
available at the edges of the Internet.
Clay Shirky (www.shirky.com)

P2P refers to a class of systems and applications that
employ distributed resources to perform a critical
function in a decentralized manner.
Milojicic et al. (HP)

Napster? Gnutella? TVUPlayer?

http://www.tvunetworks.com/player/index.html

P2P Properties
• No central control, no central database
– Deployable in an ad-hoc fashion

• No hierarchy
– Every node is both a client and a server
– The communication between peers is symmetric

• No global view of the system (all local decisions)


• Peers are autonomous
• System globally unreliable
– Robustness and security issues



Examples of p2p usage
• File-sharing applications
• Distributed databases
• Distributed computing
• Collaboration
• Distributed games
• Ad hoc networks
• Application-level multicast
• Etc.

Basic P2P



Centralized model (Napster)
• File-sharing system
• Almost distributed system
– The location of a document is centralized
– The "transfer" is peer-to-peer

• Problems
– Robustness
– Scalability
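
The split between centralized location and peer-to-peer transfer can be sketched as follows. This is a minimal single-process illustration, not Napster's actual protocol: the class names `LocationServer` and `Peer`, the method names, and the example addresses are all hypothetical.

```python
from collections import defaultdict


class LocationServer:
    """Central index: maps a document name to the peers that registered it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer_addr, doc):
        self._index[doc].add(peer_addr)

    def locate(self, doc):
        # Sorted so that results are deterministic.
        return sorted(self._index.get(doc, ()))


class Peer:
    def __init__(self, addr, server):
        self.addr = addr
        self.server = server
        self.docs = {}

    def share(self, doc, content):
        self.docs[doc] = content
        self.server.register(self.addr, doc)

    def fetch(self, doc, peers_by_addr):
        # Step 1: centralized lookup. Step 2: direct peer-to-peer transfer.
        for addr in self.server.locate(doc):
            holder = peers_by_addr[addr]
            if doc in holder.docs:
                return holder.docs[doc]
        return None


server = LocationServer()
a = Peer("10.0.0.1", server)
z = Peer("10.0.0.2", server)
peers = {p.addr: p for p in (a, z)}
a.share("x", b"contents of x")
result = z.fetch("x", peers)
```

Every lookup goes through the single `LocationServer` object, which is where the robustness and scalability problems listed above come from.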



Centralized model (Napster)
[Figure: a peer registers document x with the location server ("Register x"); peer Z asks the server "Document x?" and receives "OK: IP = a.b.c.d"; the document is then fetched from that peer across the Internet ("Document x!").]


Non-structured system (Gnutella-like)
• Two phases (like Napster)
– Localization + exchange

• No server
• Open source
– gnutella.wego.com

• Distributed search
– The query is flooded
– Loop avoidance
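
A flooded query with loop avoidance can be sketched as below. This is a toy model under stated assumptions, not the Gnutella wire protocol: the hop limit `ttl`, the function name `flood_query`, and the example graph are illustrative choices, and loop avoidance is modeled as each peer processing a given query only once.

```python
from collections import deque


def flood_query(graph, holders, start, ttl):
    """Flood a query from `start`; return (peers with a hit, messages sent).

    graph:   dict mapping each peer to a list of neighbor peers
    holders: set of peers that hold the requested document
    ttl:     maximum number of hops a query may travel (an assumption here)
    """
    seen = {start}            # loop avoidance: each peer handles a query once
    hits = set()
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if peer in holders:
            hits.add(peer)
        if hops_left == 0:
            continue
        for neighbor in graph[peer]:
            messages += 1     # the message is sent even if it is a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits_near, msgs_near = flood_query(graph, {"E"}, "A", ttl=2)
hits_far, msgs_far = flood_query(graph, {"E"}, "A", ttl=3)
```

With a hop limit of 2 the query never reaches E, which is three hops from A; raising the limit to 3 finds it at the cost of more messages.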



Gnutella



Lessons and Limitations
• Client-Server performs well
– But not always feasible
• Ideal performance is often not the key issue!

• Things that flood-based systems do well


– Organic scaling
– Decentralization of visibility and liability
– Finding popular stuff
– Fancy local queries

• Things that flood-based systems do poorly


– Finding unpopular stuff [Loo, et al VLDB 04]

– Fancy distributed queries


– Vulnerabilities: data poisoning, tracking, etc.
– Guarantees about anything (answer quality, privacy, etc.)



Gossip Protocols (Epidemic Algorithms)
• Originally targeted at database replication [Demers, et al. PODC ‘87]

– Especially nice for unstructured networks


– Rumor-mongering: propagate newly-received update to k random neighbors

• Extended to routing
– Point-to-point routing [Vahdat/Becker TR, ‘00]
– Rumor-mongering of queries instead of flooding [Haas, et al Infocom ‘02]

• Extended to aggregate computation [Kempe, et al, FOCS 03]

• Mostly theoretical analyses


– Usually of two forms:
• What is the “tipping point” where an epidemic infects the whole population? (Percolation
theory)
• What is the expected # of messages for infection?

• A Cornell specialty
– Demers, Kleinberg, Gehrke, Halpern, …
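
Rumor-mongering can be simulated in a few lines. The sketch below assumes a fully connected population in which any peer can be picked as a "neighbor", synchronous rounds, and peers that never stop forwarding; these simplifications and the name `rumor_rounds` are illustrative and are not taken from the cited papers.

```python
import random


def rumor_rounds(n, k, seed=0, max_rounds=1000):
    """Push gossip on a fully connected population of n peers.

    Each round, every informed peer forwards the update to k peers chosen
    uniformly at random. Returns (rounds used, messages sent, peers informed).
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = messages = 0
    while len(informed) < n and rounds < max_rounds:
        newly = set()
        for _ in informed:
            for target in rng.sample(range(n), k):
                messages += 1
                newly.add(target)
        informed |= newly
        rounds += 1
    return rounds, messages, len(informed)


rounds, messages, informed = rumor_rounds(n=1000, k=3)
```

The returned round and message counts are the quantities that the theoretical analyses mentioned above try to bound.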



Why P2P Databases?



Infecting the Network, Peer-to-Peer
• The Internet is hard to change.
• But Overlay Nets are easy!
– P2P is a wonderful “host” for infecting network designs
– The “next” Internet is likely to be very different
• “Naming” is a key design issue today
• Querying and data independence key tomorrow?

• Don’t forget:
– The Internet was originally an overlay on the telephone network
– There is no money to be made in the bit-shipping business
• A modest goal for DB research:
– Don’t query the Internet. Be the Internet.


Why Databases?
• The problem is placement and retrieval of data…
that would be a data management (or DB) problem
• P2P world is lacking
– Semantics
– Data transformation
– Data relationships

• All of which are core strengths of the DB community


• P2P brings a new environment for DB query
processing systems
– increased scalability, reliability, and performance

Some of the p2p DB groups
• PIER
– http://pier.cs.berkeley.edu
• Stanford Peers
– http://www-db.stanford.edu/peers/
• P-Grid
– http://www.p-grid.org/ (EPFL)
• Pepper
– http://www.cs.cornell.edu/database/pepper/pepper.htm
• BestPeer (PeerDB)
– http://xena1.ddns.comp.nus.edu.sg/p2p/
• Hyperion
– http://www.cs.toronto.edu/db/hyperion/
• Piazza
– http://data.cs.washington.edu/p2p/piazza/


PIER
• Peer-to-Peer Information Exchange & Retrieval
– Aggressively uses DHTs

• Deployed
– Running queries on ~400 nodes around the world (PlanetLab)
– Simulated on up to 10K nodes

• Current Applications
– Improved Filesharing
– Internet Monitoring
– Customizable Routing via Recursive Queries

http://pier.cs.berkeley.edu


Vision: Network Oracle
• Suppose there existed a Network Oracle
– Answering questions about current Internet state
• Routing tables, link loads, latencies, firewall events, etc.

– How would this change things?


• Social change (Public Health, safe computing)
• Medium term change in distributed application design
– Currently distributed apps do some of this on their own
• Long term change in network protocols
– App-specific custom routing
– Fault diagnosis
– Etc.



Public Health for the Internet
• Security tools focused on “medicine”
– Vaccines for Viruses
– Improving the world one patient at a time

• Weakness/opportunity in the “Public Health” arena


– Public Health: population-focused, community-oriented
– Epidemiology: incidence, distribution, and control in a population

• A New Approach
– Perform population-wide measurement
– Enable massive sharing of data and query results
• The “Internet Screensaver”
– Engage end users: education and prevention
– Understand risky behaviors, at-risk populations.

• Prototype running over PIER



Routing: Overlay networks
[Figure: an overlay network layered on top of the IP network]


Structured Overlays: Distributed Hash Tables (DHTs)


High-Level Idea: Indirection
• Indirection in space
– Logical (content-based) IDs, routing to those IDs
• “Content-addressable” network
– Tolerant of churn
• nodes joining and leaving the network
[Figure: a message addressed "to h" reaches node y while h = y, and node z once h = z]

• Indirection in time
– Want some scheme to temporally decouple send and receive
– Persistence required. Typical Internet solution: soft state
• Combo of persistence via storage and via retry

• Metaphor: Distributed Hash Table


What is a DHT?
• Hash Table
– data structure that maps “keys” to “values”
– essential building block in software systems

• Distributed Hash Table (DHT)


– similar, but spread across the Internet

• Interface
– insert(key, value)
– lookup(key)
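
The insert/lookup interface can be tried out with a single-process stand-in. The sketch below assumes SHA-1 hashing onto a ring and a "successor owns the key" rule; the class name `ToyDHT`, the helper `ring_id`, and the node names are hypothetical, and there is no network, routing, or churn handling.

```python
import bisect
import hashlib


def ring_id(name, bits=32):
    """Hash a string onto a ring of 2**bits positions (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """One dict per node; a key is stored at its successor node on the ring."""

    def __init__(self, node_names):
        self.ring = sorted((ring_id(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.stores = {n: {} for n in node_names}

    def owner(self, key):
        # First node at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[i][1]

    def insert(self, key, value):
        self.stores[self.owner(key)][key] = value

    def lookup(self, key):
        return self.stores[self.owner(key)].get(key)


dht = ToyDHT(["node-%d" % i for i in range(8)])
dht.insert("K1", "V1")
value = dht.lookup("K1")
```

The caller only ever names a key; which node ends up holding it is decided by the hash, which is the indirection described earlier.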



How?

Every DHT node supports a single operation:
– Given key as input; route messages toward node holding key


DHT in action
[Figure: overlay nodes, each holding a local table of (K, V) pairs]
Operation: take key as input; route messages to node holding key

DHT in action: put()
[Figure: insert(K1,V1) is issued at one node and routed across the overlay until (K1,V1) is stored at the node responsible for K1]

DHT in action: get()
[Figure: retrieve (K1) is routed across the overlay to the node holding K1]

Iterative vs. Recursive Routing
Previously showed recursive.
Another option: iterative
[Figure: retrieve (K1) resolved iteratively across the overlay]
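
The difference between the two routing styles is who sends the next message. The sketch below assumes a toy ring in which every ID slot is occupied and each node knows only its successor (so routing takes a linear number of hops); `Node`, `build_ring`, `recursive_get`, and `iterative_get` are hypothetical names, and function calls stand in for network messages.

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = None
        self.store = {}

    def owns(self, key_id):
        # In this toy ring a node owns exactly the key equal to its own ID.
        return key_id == self.id

    def next_hop(self, key_id):
        # Successor-only routing, chosen for brevity.
        return self.successor


def build_ring(n):
    nodes = [Node(i) for i in range(n)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % n]
    return nodes


def recursive_get(node, key_id, hops=0):
    """Each node forwards the request itself; the answer unwinds back."""
    if node.owns(key_id):
        return node.store.get(key_id), hops
    return recursive_get(node.next_hop(key_id), key_id, hops + 1)


def iterative_get(start, key_id):
    """The requester asks each node for the next hop and contacts it directly."""
    node, hops = start, 0
    while not node.owns(key_id):
        node = node.next_hop(key_id)   # reply to requester: "try this node"
        hops += 1
    return node.store.get(key_id), hops


nodes = build_ring(8)
nodes[5].store[5] = "V1"
rec = recursive_get(nodes[2], 5)
itr = iterative_get(nodes[2], 5)
```

Both variants visit the same nodes. In the iterative form the requester stays in control of every step, which is what makes it possible to challenge a node before routing through it.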
DHT Design Goals
• An “overlay” network with:
– Flexible mapping of keys to physical nodes
– Small network diameter
– Small degree (fanout)
– Local routing decisions
– Robustness to churn
– Routing flexibility
– Decent locality (low “stretch”)

• A “storage” or “memory” mechanism with


– No guarantees on persistence
– Maintenance via soft state



An Example DHT: Chord
• Assume n = 2^m nodes for a moment
– A “complete” Chord ring
– We’ll generalize shortly
• Overlayed 2^k-Gons


Routing in Chord
• At most one of each Gon
• E.g. 1-to-0
• What happened?
– We constructed the binary number 15!
– Routing from x to y is like computing y - x mod n by summing powers of 2
[Figure: the route from node 1 to node 0, made of hops of length 1, 2, 4 and 8]
Diameter: log n (1 hop per gon type)
Degree: log n (one outlink per gon type)
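
The "summing powers of 2" view of routing can be checked with a short sketch. It assumes a complete ring of n = 2^m nodes where node i has outlinks to (i + 2^k) mod n for k = 0..m-1, and it takes the largest useful hop first; the hop order and the function name `chord_route` are illustrative choices.

```python
def chord_route(x, y, m):
    """Route from node x to node y on a complete Chord ring of n = 2**m nodes.

    Greedy rule: always take the largest hop that does not overshoot y.
    Returns (nodes visited, hop lengths).
    """
    n = 2 ** m
    path = [x]
    hops = []
    current = x
    while current != y:
        remaining = (y - current) % n
        hop = 1 << (remaining.bit_length() - 1)   # largest power of 2 <= remaining
        current = (current + hop) % n
        hops.append(hop)
        path.append(current)
    return path, hops


path, hops = chord_route(1, 0, m=4)
# hops == [8, 4, 2, 1]: the binary digits of (0 - 1) mod 16 = 15
```

Each power of two is used at most once, so a route never needs more than m = log n hops.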
Content-Addressable Networks (CAN)
• Provides a large scale distributed hash table
– Keys are mapped into values

• CAN defines a d-dimensional virtual space


– No relationship with the physical space

• The virtual space is completely distributed among the


peers
– Each peer is responsible for one share of the space
– The peer that is responsible for region R is also
responsible for the values inside R

• Documents must be uniquely identified
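
The mapping from a document ID to the responsible peer can be sketched as follows. The sketch assumes a unit d-dimensional space, SHA-1 to derive coordinates, and zones that are halved when a peer joins; which zone gets split is chosen by hand here, and the names `key_to_point`, `Zone`, and `responsible_peer` are hypothetical.

```python
import hashlib


def key_to_point(key, d=2):
    """Hash a document ID to a point in the unit d-dimensional space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use 4 bytes of the digest per coordinate (so d <= 5 for SHA-1).
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2 ** 32 for i in range(d)
    )


class Zone:
    def __init__(self, owner, lo, hi):
        self.owner, self.lo, self.hi = owner, lo, hi

    def contains(self, point):
        return all(l <= p < h for l, p, h in zip(self.lo, point, self.hi))

    def split(self, new_owner, axis):
        """Halve this zone along `axis`; the new peer takes the upper half."""
        mid = (self.lo[axis] + self.hi[axis]) / 2
        upper_lo = list(self.lo)
        upper_lo[axis] = mid
        lower_hi = list(self.hi)
        lower_hi[axis] = mid
        new_zone = Zone(new_owner, tuple(upper_lo), self.hi)
        self.hi = tuple(lower_hi)
        return new_zone


def responsible_peer(zones, key):
    point = key_to_point(key, d=len(zones[0].lo))
    return next(z.owner for z in zones if z.contains(point))


zones = [Zone("peer1", (0.0, 0.0), (1.0, 1.0))]
zones.append(zones[0].split("peer2", axis=0))   # peer2 joins: split on x
zones.append(zones[0].split("peer3", axis=1))   # peer3 joins: split on y
owner = responsible_peer(zones, "document-x")
```

Because the zones partition the whole space, every document ID falls into exactly one zone and therefore has exactly one responsible peer.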


Example
[Figure sequence: the virtual space is split step by step among the peers as they join (peers labeled 1 to 7)]


Association ID → node
[Figure: the virtual space divided among the peers; a document's ID falls in the region of the peer that stores it]
Ex: Node 3 holds this document


Network Data Independence
SIGMOD Record, Sep. 2003

Recall Codd’s Data Independence
• Decouple app-level API from data organization
– Can make changes to data layout without modifying
applications

– Simple version: location-independent names

– Fancier: declarative queries



The Pillars of Data Independence
• Indexes (DBMS: B-Tree; P2P: Content-Addressable Overlay Networks (DHTs))
– Value-based lookups have to compete with direct access
– Must adapt to shifting data distributions
– Must guarantee performance

• Query Optimization (DBMS: Join Ordering, AM Selection, etc.; P2P: Multiquery dataflow sharing?)
– Support declarative queries beyond lookup/search
– Must adapt to shifting data distributions
– Must adapt to changes in environment


Complex Query Processing



DHTs Gave Us Equality Lookups
• What else might we want?
– Range Search
– Aggregation
– Group By
– Join
– Intelligent Query Dissemination

• Theme
– All can be built elegantly on DHTs!
• This is the approach taken in PIER
– But in some instances other schemes are also reasonable



Range Search
• Numerous proposals in recent years
– Chord w/o hashing, + load-balancing [Karger/Ruhl SPAA ‘04, Ganesan/Bawa VLDB ‘04]
– Mercury [Bharambe, et al. SIGCOMM ‘04]. Specialized “small-world” DHT.
– P-tree [Crainiceanu et al. WebDB ‘04]. A “wrapped” B-tree variant.
– P-Grid [Aberer, CoopIS ‘01]. A distributed trie with random links.
– …


Aggregation
• Two key observations for DHTs
– DHTs are multi-hop, so hierarchical aggregation can reduce bandwidth (BW)
– DHTs provide tree construction in a very natural way


Consider Aggregation in Chord
• Everybody sends their message to node 0
• Assume greedy jumps (increasing Gon-order)
• Intercept messages and aggregate along the way
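
In-network aggregation along the routes to node 0 can be sketched as follows. The sketch assumes a complete ring of n = 2^m nodes and reads "increasing Gon-order" as taking the smallest needed power-of-two jump first; the names `parent` and `aggregate_sum` are hypothetical, and a single process stands in for the network.

```python
def parent(x, n):
    """Next hop from x toward node 0, taking the smallest needed jump first."""
    remaining = (-x) % n
    lowest_bit = remaining & -remaining
    return (x + lowest_bit) % n


def aggregate_sum(values):
    """Sum one value per node at node 0, combining partial sums on the way.

    values[i] is node i's local value; len(values) must be a power of two.
    Returns (total at node 0, number of messages sent).
    """
    n = len(values)
    partial = list(values)
    messages = 0

    def hops_left(x):
        # Number of jumps from x to node 0 = set bits of the remaining distance.
        return bin((-x) % n).count("1")

    # Nodes with more hops left send first, so every node has heard from all
    # of its children before it forwards its own partial sum.
    for x in sorted(range(1, n), key=hops_left, reverse=True):
        partial[parent(x, n)] += partial[x]
        messages += 1
    return partial[0], messages


total, messages = aggregate_sum(list(range(16)))
```

With aggregation each node sends a single message, 15 in total for 16 nodes. Routing every value to node 0 without combining would cost one message per hop, which is 32 on the same ring.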


So what if I don’t have a DHT?
• Need another tree-construction mechanism
– There are many in the NW literature (e.g. for multicast)
– Require maintenance messages akin to DHTs
• Do you maintain for the life of your query engine? Or setup/teardown as needed?

• Can pick a tree shape of your own


– Not at the mercy of the DHT topologies
– E.g. could do high fan-in trees to minimize latency
• Or, can do aggregation via gossip [Kempe, et al FOCS ‘03]



Group By
• A piece of cake in a DHT
– Every node sends tuples toward the hash ID of the grouping columns
– An agg tree is naturally constructed per group

• Note nice dual-purpose use of DHT


– Hash-based partitioning for parallel group by
• Just like parallel DBMS (Gamma, the Exchange op in Volcano)
– Agg tree construction in multi-hop overlay network
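
Hash-based partitioning for a group-by can be sketched as below. The sketch assumes SUM as the aggregate, SHA-1 modulo the node count as the hash ID, and direct delivery in place of multi-hop routing; `home_node` and `distributed_group_sum` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def home_node(group_key, num_nodes):
    """Node responsible for a group: hash of the grouping columns."""
    digest = hashlib.sha1(repr(group_key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def distributed_group_sum(local_tuples, num_nodes):
    """local_tuples[i] is the list of (group, value) tuples stored at node i.

    Returns a mapping: node -> {group -> sum}, as held after all sends.
    """
    inbox = defaultdict(lambda: defaultdict(int))
    for node_tuples in local_tuples:
        for group, value in node_tuples:
            inbox[home_node(group, num_nodes)][group] += value
    return inbox


local_tuples = [
    [("a", 1), ("b", 2)],
    [("a", 3)],
    [("b", 4), ("c", 5)],
]
per_node = distributed_group_sum(local_tuples, num_nodes=4)
totals = {g: s for groups in per_node.values() for g, s in groups.items()}
```

All tuples of a group meet at one node, so that node can finish the group's aggregate without coordinating with anyone else.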



Hash Join
• We just did hash-based group by.
• Hash-based join is roughly the same deal, twice:
– Given R.a Join S.b
– Each node:
• sends each R tuple toward H(R.a)
• sends each S tuple toward H(S.b)

• Again, DHT gives


– Hash-based partitioning for parallel hash join
– Tree construction (no reduction along the way here, though)

• Note the resulting communication pattern


– A tree is constructed per hash destination!
• That’s a lot of trees!
• No big deal for the DHT -- it already had that topology there.
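
The rehash step of the join can be sketched in one process. The sketch assumes an equi-join R.a = S.b, SHA-1 modulo the node count for H, and a local in-memory hash join at each destination; `home` and `dht_hash_join` are hypothetical names, and direct delivery stands in for DHT routing.

```python
import hashlib
from collections import defaultdict


def home(value, num_nodes):
    """H(value): the node that receives all tuples with this join value."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def dht_hash_join(r_fragments, s_fragments, num_nodes):
    """Each fragment is the list of tuples at one node.

    R tuples are (a, payload) and S tuples are (b, payload).
    Returns sorted (join value, R payload, S payload) triples.
    """
    r_at = defaultdict(list)
    s_at = defaultdict(list)
    for fragment in r_fragments:
        for a, r_payload in fragment:
            r_at[home(a, num_nodes)].append((a, r_payload))
    for fragment in s_fragments:
        for b, s_payload in fragment:
            s_at[home(b, num_nodes)].append((b, s_payload))

    results = []
    for node in range(num_nodes):
        # Local hash join at each destination node.
        table = defaultdict(list)
        for a, r_payload in r_at[node]:
            table[a].append(r_payload)
        for b, s_payload in s_at[node]:
            for r_payload in table.get(b, ()):
                results.append((b, r_payload, s_payload))
    return sorted(results)


r_fragments = [[(1, "r1"), (2, "r2")], [(3, "r3")]]
s_fragments = [[(2, "s2")], [(3, "s3a"), (3, "s3b"), (4, "s4")]]
joined = dht_hash_join(r_fragments, s_fragments, num_nodes=4)
```

Matching R and S tuples hash to the same node, so no node needs to see tuples outside its own partition.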



Query Dissemination
• How do nodes find out about a query?
• Case 1: Broadcast
– All nodes need to participate
– Need to have a broadcast tree out of the query node
– This is the opposite of an aggregation tree!
• But how to instantiate it?

• Naïve solution: Flood


– Each node sends query to all its neighbors
– Problem: nodes will receive query multiple times
• wasted bandwidth



Security and Trust



Trustworthy P2P
• Many challenges here. Examples:
– Authenticating peers
– Authenticating/validating data
• Stored (poisoning) and in flight
– Ensuring communication
– Validating distributed computations
– Avoiding Denial of Service
• Ensuring fair resource/work allocation
– Ensuring privacy of messages
• Content, quantity, source, destination
– Abusing the power of the network
– …



Free Riders
• Filesharing studies
– Lots of people download
– Few people serve files

• Is this bad?
– If there’s no incentive to serve, why do people do so?
– What if there are strong disincentives to being a major server?


Simple Solution: Thresholds
• Many programs allow a threshold to be set
– Don’t upload a file to a peer unless it shares > k files

• Problems:
– What’s k?
– How to ensure the shared files are interesting?



BitTorrent
• Server-based search
– suprnova.org, chat rooms, etc. serve “.torrent” files
• metadata including “tracker” machine for a file

• Bartered “Tit for Tat” download bandwidth


– Download one (random) chunk from a storage peer, slowly
– Subsequent chunks bartered with concurrent downloaders
• As tracked by the tracker for the file
– The more chunks you can upload, the more you can download
• Download speed starts slow, then goes fast
– Great for large files
• Mostly videos, warez



One Slide on Game Theory
• Typical game theory setup
– Assume self-interested (selfish) parties, acting autonomously
– Define some benefit & cost functions
– Parties make “moves” in the game
• With resulting costs and benefits for themselves and others
– A Nash equilibrium:
• A state where no party can increase its benefit by moving unilaterally
• Note:
– Equilibria need not be unique nor equal
– Time to equilibrium is an interesting computational twist

• Mechanism Design
– Design the states/moves/costs/benefits of a game
– To achieve particular globally-acceptable equilibria
• I.e. selfish play leads to global good



DAMD P2P!
• Distributed Algorithmic Mechanism Design (DAMD)
– A natural approach for P2P
• An Example: Fair-share storage [Ngan, et al., Fudico04]
– Every node n maintains a usage record:
• Advertised capacity
• Hosted list of objects n is hosting (nodeID, objID)
• Published list of objects people host for n (nodeID, objID)
– Can publish if capacity - p⋅∑(published list) > 0
• Recipient of publish request should check n’s usage record
– Need schemes to authenticate/validate usage records
• Selfish Audits: n periodically checks that the elements of its hosted list appear in published lists of publishers
• Random Audits: n periodically picks a peer and checks all its hosted list items
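
The publish check and the selfish audit can be sketched as below. Several points are assumptions rather than details from the cited paper: ∑(published list) is read as the total size of the objects in the published list, p is treated as an opaque scaling factor, and `UsageRecord`, `can_publish`, `selfish_audit`, and the `sizes` table are hypothetical names. Authentication of the records is left out.

```python
from dataclasses import dataclass, field


@dataclass
class UsageRecord:
    capacity: int                                    # advertised capacity
    hosted: list = field(default_factory=list)       # (owner nodeID, objID) this node hosts
    published: list = field(default_factory=list)    # (host nodeID, objID) others host for it


def can_publish(record, sizes, p):
    """The slide's rule: capacity - p * sum(published list) > 0."""
    used = sum(sizes[obj_id] for _, obj_id in record.published)
    return record.capacity - p * used > 0


def selfish_audit(n_id, records):
    """n checks that each object it hosts appears in the owner's published list.

    Returns the (owner, objID) pairs that the owner fails to account for.
    """
    missing = []
    for owner_id, obj_id in records[n_id].hosted:
        if (n_id, obj_id) not in records[owner_id].published:
            missing.append((owner_id, obj_id))
    return missing


sizes = {"o1": 20}
records = {
    "n": UsageRecord(capacity=100, hosted=[("m", "o1")]),
    "m": UsageRecord(capacity=50, published=[("n", "o1")]),
}
ok_small_p = can_publish(records["m"], sizes, p=2)   # 50 - 2*20 > 0
ok_large_p = can_publish(records["m"], sizes, p=3)   # 50 - 3*20 < 0
honest = selfish_audit("n", records)

records["m"].published = []                          # m under-reports its usage
cheating = selfish_audit("n", records)
```

A node that hides entries from its published list to appear under quota is caught by the peers that host those objects, since they have an incentive to run the audit.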



Secure Routing in DHTs
• The “Sybil” attack [Douceur, IPTPS 02]
– Register many times with multiple identities
– Control enough of the space to capture particular traffic


Squelching Sybil
• Certificate authority
– Centralize one thing: the signing of ID certificates
• Central server is otherwise out of the loop
– Or have an “inner ring” of trusted nodes do this
• Using practical Byzantine agreement protocols [Castro/Liskov OSDI ‘01]

• Weak secure IDs


– ID = SHA-1(IP address)
– Assume attacker controls a modest number of nodes
– Before routing through a node, challenge it to produce the right IP address
• Requires iterative routing
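
The weak secure ID check amounts to recomputing a hash. In the sketch below the ID is the hex SHA-1 digest of the dotted-quad address string, and the "challenge" is modeled simply as comparing against the address the node actually answers from; the function names and example addresses are hypothetical.

```python
import hashlib


def node_id(ip):
    """ID = SHA-1(IP address), as a hex string."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def passes_challenge(claimed_id, ip_seen_on_wire):
    """True if the address the node answers from hashes to its claimed ID."""
    return node_id(ip_seen_on_wire) == claimed_id


honest_id = node_id("192.0.2.10")
accepted = passes_challenge(honest_id, "192.0.2.10")
rejected = passes_challenge(honest_id, "192.0.2.99")
```

An attacker cannot pick an arbitrary ID without also controlling the matching address, which limits it to as many IDs as it has addresses.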



Piazza

• Peers form small groups called spheres of cooperation.
– May follow administrative boundaries
– Spheres of cooperation are nested

• Query Optimization problems:


– Exploit commonalities between queries
– Decide where to place data
– What queries to materialize (store answers)

• To make the problem tractable, optimization occurs within a sphere of cooperation.

Piazza II



Piazza III

• Propagating Information
– Node advertises its materialized views to its neighbors
– Nodes consolidate info they receive and propagate
– Type of gossiping protocol

• Consolidating Queries
– Some queries cannot be evaluated if data is not locally available
– Broadcast all un-evaluatable queries to local sphere of cooperation, and try to answer them collectively


Data Placement Problem
• Setup
– Set of cooperating nodes (no adversaries)
– Bottlenecks: network, CPU, or memory
– Nodes serve four roles
• Data Origin – producers
• Storage Provider
• Query Evaluator
• Query Initiator – consumers
– Cost of query = (Origin or Storage → Evaluator) + (Evaluator → Initiator)

Design Choices
• Scope of decision making
– Global (hard, optimal) or local (easy, short-sighted)
– Similar to multi-query optimization

• Extent of knowledge sharing
– Knowledge of materialized views on other nodes (a catalog)
– Centralized or distributed? Hierarchical (like DNS)?

• Heterogeneity of information sources
– Few authoritative sources, lots of data producers
– Heterogeneous data → different schemas


Design Choices II
• Dynamicity of participants
– Node churn
– Some nodes act like servers, some like workstations
– Could place all data on servers → reduced flexibility and performance
• Data granularity
– Atomic granularity → indivisible objects (complete file)
– Hierarchical granularity → groups (albums, directories)
– Value-based granularity → objects composed of atomic values (tuples composed of values)

Design Choices III
• Degrees of replication
– One copy all the way to fully replicated
– More replicas make updates harder
– Also makes retrieval harder (more choices)
– Consistency is harder; a typical solution is to have a master replica
• Freshness and update consistency
– Invalidation messages, pushed by server on update or pulled by client on request
– Timeout-based: lower overhead, looser guarantees about freshness and consistency


Metareferences
• Your favorite search engine should find the inline refs
• Project IRIS has a lot of participants’ papers online
– http://www.project-iris.org
• IEEE Distributed Systems Online
– http://dsonline.computer.org/os/related/p2p/
• O’Reilly OpenP2P
– http://www.openp2p.com
• Karl Aberer’s ICDE 2002 tutorial
– http://lsirpeople.epfl.ch/aberer/Talks/ICDE2002-Tutorial.pdf
• Ross/Rubenstein InfoCom 2003 tutorial
– http://cis.poly.edu/~ross/tutorials/P2PtutorialInfocom.pdf
• PlanetLab
– http://www.planet-lab.org
• OpenDHT
– http://www.opendht.org


PlanetLab

• Consortium of academia and industry


– Catalyzed by Intel Research in 2002
– Now hosted at Princeton U
– 25% of SOSP ‘03 papers used PlanetLab

• DB folks should get more involved!

OpenDHT
• A shared DHT service
– The Bamboo DHT
– Hosted on PlanetLab
– Simple RPC API
– You don’t need to deploy or host to play with a real DHT!

• A playground for killer apps?
– Needn’t be as big as PIER!
– Example: FreeDB replacement

• Research in sharing DHT svc!


The DB Community Has Much to Offer
• Complex (multi-operator) queries & optimization
– NW folks have tended to build single-operator “systems”
• E.g. aggregation only, or multi-d range-search only
– Adaptivity required
• But may not look like adaptive QP in databases…

• Declarative language semantics


– Deal with streaming, clock jitter and soft state!

• Data reduction techniques


– For visualization, approximate query processing

• Bulk-computation workloads
– Quite different from the ones the NW and systems folks envision

• Recursive query processing


– The network is a graph!
