Moderndbs
Databases for Ruby Folks Like You (and me)
Eric Redmond
@coderoshi
crudcomic.com
Database Genres
Relational
Graph
Columnar
Key/Value
Document
ECOSYSTEM
http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/
HISTORY
http://blogs.the451group.com/information_management/2011/04/20/necessity-is-the-mother-of-nosql/
(Figure: Document, Graph, Key/Value)
RELATIONAL MODELS
Relational Models
PostgreSQL (full-featured über OSS)
http://bitbucket.org/ged/ruby-pg
http://github.com/Casecommons/pg_search
http://github.com/tenderlove/texticle
http://rubygems.org/gems/mysql <= turd
http://github.com/brianmario/mysql2
http://github.com/oldmoe/mysqlplus
http://drizzle.org/ Drizzle (lightest)
To the ORM-obsessed:
-- cube_distance and cube_enlarge come from PostgreSQL's cube extension
SELECT *, cube_distance(ranks, '1,0,0') dist
FROM movie_genres
WHERE cube_enlarge('(1,0,0)'::cube, 0, 3) @> ranks
ORDER BY dist;
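For anyone who wants to see what that query computes, here is a pure-Ruby sketch that handles points only. It mimics cube_distance (Euclidean distance), cube_enlarge (a box of radius r around a point; this sketch ignores the function's dimension-count argument) and the @> containment operator. The movie rows and their three-number ranks vectors are made-up sample data, not a real movie_genres table.

```ruby
# Pure-Ruby sketch of the cube query (points only, hypothetical rows).

# cube_distance between two points: plain Euclidean distance.
def cube_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# cube_enlarge of a point by radius r: a box from point - r to point + r.
def cube_enlarge(point, r)
  [point.map { |x| x - r }, point.map { |x| x + r }]
end

# box @> point: every coordinate lies within the box's bounds.
def contains?(box, point)
  lower, upper = box
  point.each_with_index.all? { |x, i| x.between?(lower[i], upper[i]) }
end

# WHERE cube_enlarge(target, radius) @> ranks ORDER BY dist
def nearest(rows, target, radius)
  box = cube_enlarge(target, radius)
  rows.select { |row| contains?(box, row[:ranks]) }
      .map { |row| row.merge(dist: cube_distance(row[:ranks], target)) }
      .sort_by { |row| row[:dist] }
end

movie_genres = [
  { title: 'Alien',     ranks: [1, 0, 0] },
  { title: 'Aliens',    ranks: [2, 1, 0] },
  { title: 'Toy Story', ranks: [0, 5, 3] }
]

nearest(movie_genres, [1, 0, 0], 1).map { |row| row[:title] } # => ["Alien", "Aliens"]
```

With radius 0, as in the SQL, only exact matches pass the @> filter; a larger radius lets in more candidates, which ORDER BY dist then ranks.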
BIGTABLE/COLUMNAR
Bigtable/Columnar
HBase
Google-licious Bigtable
http://github.com/CompanyBook/massive_record
http://github.com/greglu/hbase-stargate (slow)
http://rubygems.org/gems/thrift
Hypertable
HQL. Very queryable. http://hypertable.org/
Cassandra
Hybrid. Node architecture like Dynamo, data structure like Bigtable w/ column families.
http://github.com/fauna/cassandra
http://github.com/NZKoz/cassandra_object
Columns
(Diagram: each row, e.g. "a key", has a row key and column families; each column family holds columns such as column: "value")
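To make the columns picture concrete, here is a tiny in-memory stand-in for a Bigtable-style table. The ToyTable class is hypothetical (it is not the HBase API); it only shows the shape: row key, then column family, then a free-form qualifier, with rows read back in sorted key order the way an HBase scanner walks them.

```ruby
# Toy in-memory model of a Bigtable-style table (hypothetical class, not the
# HBase API): row key -> column family -> column qualifier -> value.
class ToyTable
  def initialize(*families)
    @families = families
    @rows = {} # row key => { family => { qualifier => value } }
  end

  # Columns are named "family:qualifier", as in 'text:' or 'title:'.
  # Families are fixed up front; qualifiers are free-form.
  def put(row_key, column, value)
    family, qualifier = column.split(':', 2)
    raise ArgumentError, "unknown column family #{family}" unless @families.include?(family)
    ((@rows[row_key] ||= {})[family] ||= {})[qualifier.to_s] = value
  end

  def get(row_key, column)
    family, qualifier = column.split(':', 2)
    @rows.dig(row_key, family, qualifier.to_s)
  end

  # Rows are read in sorted key order, starting at a given key.
  def scan(start = '')
    @rows.keys.sort.select { |key| key >= start }.map { |key| [key, @rows[key]] }
  end
end

wiki = ToyTable.new('title', 'text')
wiki.put('Home', 'title:', 'Home')
wiki.put('Home', 'text:', 'Welcome!')
wiki.put('About', 'text:', 'About us')
wiki.scan.map(&:first)      # => ["About", "Home"]
wiki.scan('B').map(&:first) # => ["Home"]
```

The fixed family list is the "strong (and flexible) schema" idea in miniature: families are declared up front, columns inside them are not.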
launch_hbase.rb
require 'thrift'
require 'hbase'

class Object
  include Apache::Hadoop::Hbase::Thrift

  # Lazily build one Thrift client for the HBase Thrift server,
  # open the transport, yield the client, and always close the transport.
  def thrift
    unless defined?(@@hclient)
      @@tsocket    = Thrift::Socket.new('127.0.0.1', 9090)
      @@ttransport = Thrift::BufferedTransport.new(@@tsocket)
      @@tprotocol  = Thrift::BinaryProtocol.new(@@ttransport)
      @@hclient    = Hbase::Client.new(@@tprotocol)
    end
    @@ttransport.open
    yield @@hclient
  ensure
    @@ttransport.close
  end
end
HBase Migration
class CreateWikis < ActiveRecord::Migration
  def self.up
    thrift do |hbase|
      # 'text:' keeps up to 10 versions; 'title:' uses the default
      hbase.createTable('wiki', [
        ColumnDescriptor.new(:name => 'text:', :maxVersions => 10),
        ColumnDescriptor.new(:name => 'title:')
      ])
    end
  end
end
wiki.rb
def self.all(start = '')
  wikis = []
  thrift do |hbase|
    # scan rows in key order, beginning at start
    scanner = hbase.scannerOpen('wiki', start, ['title:', 'text:'])
    while (row = hbase.scannerGet(scanner)).present?
      row.each do |v|
        wikis << Wiki.new(:title => v.columns['title:'].value,
                          :text  => v.columns['text:'].value)
      end
    end
    hbase.scannerClose(scanner) # release the server-side scanner
  end
  wikis
end
wiki.rb
def self.find(title)
  thrift do |hbase|
    hbase.getRow('wiki', title).each do |v|
      return Wiki.new(:title => v.columns['title:'].value,
                      :text  => v.columns['text:'].value)
    end
  end
  nil # no row with that title
end
wiki.rb
def history
  historical_text = []
  thrift do |hbase|
    # up to 10 stored versions of the 'text:' column for this title
    hbase.getVer('wiki', title, 'text:', 10).each do |v|
      historical_text << v.value.dup
    end
  end
  historical_text
end
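The history method leans on HBase keeping several timestamped versions of a cell (here 'text:' with :maxVersions => 10). A toy sketch of that bookkeeping, using a hypothetical VersionedCell class rather than anything HBase exposes: each write adds a version, only the newest max_versions survive, and reads come back newest first, the order getVer returns them in.

```ruby
# Toy sketch of HBase-style cell versioning (hypothetical class): each put
# adds a timestamped version and only the newest max_versions are kept.
class VersionedCell
  def initialize(max_versions)
    @max_versions = max_versions
    @versions = [] # [[timestamp, value], ...], newest first
  end

  def put(value, timestamp)
    @versions.unshift([timestamp, value])
    @versions.sort_by! { |ts, _| -ts }          # newest first
    @versions = @versions.first(@max_versions) # drop the oldest extras
  end

  # Like asking for n versions of one column: values, newest first.
  def get_versions(n)
    @versions.first(n).map { |_, value| value }
  end

  def latest
    @versions.first&.last
  end
end

text = VersionedCell.new(3)
%w[v1 v2 v3 v4].each_with_index { |v, i| text.put(v, i + 1) }
text.get_versions(10) # => ["v4", "v3", "v2"]
```

Asking for more versions than are stored simply returns what is left, which is why the wiki history can always request 10.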
Thrift!
http://thrift.apache.org/
Use Thrift to interact with both HBase and Cassandra; which one to use is largely an architectural/ecosystem consideration.
Google: HBase
Reddit: Cassandra
Baidu: Hypertable
HBase Benefits
Strong (and flexible) columnar schema
Sequential reads and column versioning
MapReduce via Hadoop integration
Consistent (configurable to Available)
Great for Wide Area Networks (Google, Facebook)
Cassandra Benefits
Sequential reads of ordered keys (scannable)
Columnar schema
Built-in versioning
Available (configurable to Consistent)
Optimized for hundreds of nodes (Digg, Twitter)
DOCUMENT
Document Datastore
MongoDB
http://github.com/mongodb/mongo-ruby-driver
http://mongoid.org/
CouchDB
http://github.com/couchrest/couchrest_model
http://github.com/peritor/simply_stored
http://tilgovi.github.com/couchdb-lounge
RavenDB
Ruby Driver??
Document
{
  "_id" : ObjectId("4db7ca268e236e5bf9a52224"),
  "_rev" : "2612672603",
  "name" : "Sant Julià de Lòria",
  "country" : "AD",
  "timezone" : "Europe/Andorra",
  "population" : 8022,
  "location" : {
    "latitude" : 42.46372,
    "longitude" : 1.49129
  }
}
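From Ruby, a document like this is just a nested hash. A small sketch using the standard json library: ObjectId(...) is Mongo shell syntax rather than JSON, so the sketch stores the id as a plain string, and the second "Encamp" document is a made-up example of a record with fewer fields living alongside the first.

```ruby
require 'json'

# The document above as plain JSON (ObjectId replaced by a string).
doc_json = <<~JSON
  {
    "_id": "4db7ca268e236e5bf9a52224",
    "_rev": "2612672603",
    "name": "Sant Julià de Lòria",
    "country": "AD",
    "timezone": "Europe/Andorra",
    "population": 8022,
    "location": { "latitude": 42.46372, "longitude": 1.49129 }
  }
JSON

doc = JSON.parse(doc_json)
# Nested fields are nested hashes: no join, no separate table.
lat = doc.dig('location', 'latitude')
lng = doc.dig('location', 'longitude')
puts "#{doc['name']} (#{doc['country']}): #{lat}, #{lng}"

# Another document may simply lack fields; there is no schema to violate.
other = JSON.parse('{"name": "Encamp", "country": "AD"}')
other.dig('location', 'latitude') # => nil
```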
Document Principle
Schemas are awesome until they aren't
Mongo Cluster
(Diagram: clients -> mongos, backed by three config servers (mongod) -> replica sets)
Couch
Couch has a heavy reliance on map/reduce to create views. If you want ad-hoc queries
Futon (web console)
BigCouch (Dynamo-style NRW)
Lounge (clustering, sharding)
Lounge
(Diagram: clients reach CouchDB through the lounge)
Big Couch
(Diagram: hash ring with N=3, Q=12; vnodes are dealt in turn to Nodes A, B, and C)
Relax
$ sudo couchdb
Apache CouchDB 1.0.1 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.31.0>] Apache CouchDB has started on http://127.0.0.1:5984/
$ curl http://127.0.0.1:5984/
{"couchdb":"Welcome","version":"1.0.1"}
Mapreduce
function map(doc) {
  // emits one pair per doc: [{smith: 1}, {smith: 1}, {smith: 1}]
  emit(doc.last_name, 1);
}
function reduce(key, values, rereduce) {
  // {smith: 35, smyth: 2}
  // sum() is built into CouchDB's JavaScript view server
  return sum(values);
}
Re-Reduce
(Diagram: db.runCommand({'mapReduce'...}) runs map and reduce on both mongod 1 and mongod 2, then reduces their partial results again)
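The re-reduce step is easy to miss, so here is a toy simulation of the last-name count spread over two nodes. The helper names and documents are hypothetical; the point is that a sum can be reduced per node and then reduced again over the partial sums, which is exactly what the rereduce flag signals in CouchDB and what the final reduce does in the Mongo diagram.

```ruby
# Toy simulation (hypothetical data and helper names) of a sum map/reduce
# with a re-reduce step, spread over two nodes.

# map: emit [last_name, 1] for each document
def map_fn(doc)
  [[doc[:last_name], 1]]
end

# reduce: for a sum, reducing raw values and re-reducing partial sums
# are the same operation, so the rereduce flag is ignored here
def reduce_fn(_key, values, _rereduce)
  values.sum
end

def map_reduce(partitions)
  # each partition maps and reduces its own documents, per key
  partials = partitions.map do |docs|
    docs.flat_map { |doc| map_fn(doc) }
        .group_by(&:first)
        .map { |key, pairs| [key, reduce_fn(key, pairs.map(&:last), false)] }
  end
  # the partial results are grouped by key again and re-reduced
  partials.flatten(1)
          .group_by(&:first)
          .map { |key, pairs| [key, reduce_fn(key, pairs.map(&:last), true)] }
          .to_h
end

node1 = [{ last_name: 'smith' }, { last_name: 'smyth' }, { last_name: 'smith' }]
node2 = [{ last_name: 'smith' }, { last_name: 'smyth' }]
counts = map_reduce([node1, node2]) # returns { "smith" => 3, "smyth" => 2 }
```

A reduce that is not associative (an average, say) would need its rereduce branch to do something different from the first pass.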
Mongo v Couch
Mongo: consistency focused, master/slave
  Ad-hoc queries; comfortable to SQL users; built to run on clusters; capped collections, very fast writes
Couch: availability focused, master/master
  Mapreduce views; comfortable to client/server authors; runs on nearly anything; changes API, long-polling reads
RavenDB
  LINQ queries; comfortable to .NET developers; built to run on clusters; good security (Kerberos), reports (index replication)
KEY/VALUE
Riak
Ripple: http://seancribbs.github.com/ripple
Session: http://rubygems.org/gems/riak-sessions
(Diagram: Riak ring with N=3, Q=12; vnodes are dealt in turn to Nodes A, B, and C)
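The ring picture (N=3, Q=12) can be sketched in a few lines. This is a simplified model under stated assumptions, not Riak's or BigCouch's code: the helper names are hypothetical, the 160-bit SHA-1 keyspace is cut into Q equal partitions, partitions are dealt round-robin to the physical nodes, and a key's replicas go to the N consecutive partitions starting where its hash lands.

```ruby
require 'digest/sha1'

# Toy Dynamo-style ring (hypothetical helpers, simplified claim strategy).
RING_SIZE = 2**160

# vnode index => owning node, dealt round-robin
def claim(q, nodes)
  Array.new(q) { |i| nodes[i % nodes.size] }
end

# the nodes holding a key's N replicas
def preference_list(key, owners, n)
  q = owners.size
  position = Digest::SHA1.hexdigest(key).to_i(16)
  first = position * q / RING_SIZE # partition index in 0...q
  Array.new(n) { |i| owners[(first + i) % q] }
end

owners = claim(12, %w[A B C])                       # Q=12
replicas = preference_list('artist/REM', owners, 3) # N=3
```

Because Q=12 divides evenly among three nodes dealt in turn, any three consecutive vnodes sit on three different machines, so each key's replicas never share a node.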
N/R/W
CAP can't be beat, but it can be tweaked
N/R/W
N = number of nodes to write to (per bucket)
W = number of nodes written to before success
R = number of nodes read from before success
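Quorum
(Diagram: N=3, W=2, R=2. Version B is written to two nodes while one still holds A; a two-node read returns [B, A], so it always includes B.)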
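Why W=2, R=2 works with N=3: any two-node read set must overlap any two-node write set, so at least one replica in every read holds the latest write. A toy check of that rule with hypothetical helpers; it uses a single integer clock per version, a simplification of the vector clocks Riak actually uses.

```ruby
# Toy model of N/R/W (hypothetical helpers, one integer clock per version).

# True when every possible W-node write set shares a node with every
# possible R-node read set, i.e. reads always see the latest write.
def quorum_overlaps?(n, r, w)
  nodes = (0...n).to_a
  nodes.combination(w).all? do |written|
    nodes.combination(r).all? { |read| (written & read).any? }
  end
end

# A read asks R replicas and keeps the version with the newest clock.
def read_latest(replicas, read_set)
  read_set.map { |i| replicas[i] }.max_by { |version| version[:clock] }
end

# N=3, W=2: version B reached replicas 0 and 1; replica 2 still holds A.
replicas = [
  { value: 'B', clock: 2 },
  { value: 'B', clock: 2 },
  { value: 'A', clock: 1 }
]
read_latest(replicas, [1, 2])[:value] # => "B"
quorum_overlaps?(3, 2, 2)             # => true  (R + W > N)
quorum_overlaps?(3, 1, 1)             # => false (a read can miss the write)
```

Lowering R or W buys speed and availability; once R + W drops to N or below, a read can land entirely on stale replicas.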
Key/Value Stores
Memcached: Very fast. Don't use it.
Kyoto Cabinet: Very durable data structure store. Don't use it.
Redis
http://redis.io/
Fast and durable. Multiple types & simple syntax for complex operations
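A toy, in-process imitation of a few Redis commands, to show what "multiple types, simple syntax" means in practice. ToyRedis is a hypothetical class, not redis-rb or a Redis server; it mirrors the behavior of INCR, LPUSH, LRANGE, SADD, and SINTER for the simple cases shown (real INCR stores a string, and real LRANGE clamps out-of-range indexes more generously).

```ruby
require 'set'

# Toy imitation of a few Redis commands (hypothetical class): each key holds
# one type of value, and each command is one short call.
class ToyRedis
  def initialize
    @data = {}
  end

  # INCR: increment an integer value in place
  def incr(key)
    @data[key] = (@data[key] || 0).to_i + 1
  end

  # LPUSH: push onto the head of a list; returns the new length
  def lpush(key, value)
    (@data[key] ||= []).unshift(value).size
  end

  # LRANGE: inclusive range; -1 means the last element
  def lrange(key, start, stop)
    (@data[key] || [])[start..stop] || []
  end

  # SADD: returns 1 if the member was added, 0 if it was already there
  def sadd(key, member)
    (@data[key] ||= Set.new).add?(member) ? 1 : 0
  end

  # SINTER: members common to every given set
  def sinter(*keys)
    keys.map { |k| @data[k] || Set.new }.reduce(:&)
  end
end

r = ToyRedis.new
2.times { r.incr('hits') }               # 'hits' is now 2
r.lpush('log', 'a')
r.lpush('log', 'b')
r.lrange('log', 0, -1)                   # => ["b", "a"]
r.sadd('rubyists', 'eric')
r.sadd('rubyists', 'jim')
r.sadd('dbas', 'eric')
common = r.sinter('rubyists', 'dbas')    # a set containing only "eric"
```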
Redis Drivers/Projects
http://github.com/ezmobius/redis-rb
http://github.com/nateware/redis-objects
http://github.com/jodosha/redis-store
http://github.com/defunkt/resque
Fun!
http://rediscookbook.org/
http://www.paperplanes.de/2010/2/16/a_collection_of_redis_use_cases.html
Pub/Sub
SUBSCRIBE chat
PUBLISH chat "hello there"
The subscribed client receives:
1) "message"
2) "chat"
3) "hello there"
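The same exchange, sketched in plain Ruby threads. ToyPubSub is a hypothetical in-process class, not Redis; it gives each subscriber its own queue and delivers the ["message", channel, payload] triple that the subscribed client prints, and publish returns the number of receivers, as PUBLISH does.

```ruby
require 'thread'

# Toy in-process pub/sub (hypothetical class): one queue per subscriber.
class ToyPubSub
  def initialize
    @channels = Hash.new { |h, k| h[k] = [] }
    @lock = Mutex.new
  end

  def subscribe(channel)
    queue = Queue.new
    @lock.synchronize { @channels[channel] << queue }
    queue
  end

  # Returns how many subscribers received the message.
  def publish(channel, message)
    subscribers = @lock.synchronize { @channels[channel].dup }
    subscribers.each { |q| q << ['message', channel, message] }
    subscribers.size
  end
end

bus = ToyPubSub.new
inbox = bus.subscribe('chat')
listener = Thread.new { inbox.pop }           # blocks until a message arrives
delivered = bus.publish('chat', 'hello there') # => 1
received = listener.value                      # => ["message", "chat", "hello there"]
```

Like Redis pub/sub, nothing is stored for channels nobody is listening to: publishing to an empty channel just returns 0.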
Redis is Fast
100,000 SETs / second
80,000 GETs / second
GRAPH
Graph Datastore
Neo4j
Neo4j.rb (runs in JRuby) / Neography
http://github.com/andreasronge/neo4j
http://github.com/maxdemarzi/neography
GraphDB
Written in .NET
FlockDB
http://github.com/twitter/flockdb-client
https://github.com/twitter/flockdb
The Matrix
http://wiki.neo4j.org/content/The_Matrix
Neo4j Is Chatty
JRuby interface, Cypher, Gremlin, SPARQL (for RDF data)
Java, Clojure, Erlang, Groovy, PHP, Python, Ruby, Scala, REST
Gremlin on Neo4j
bacon = g.V[[name:'Kevin Bacon']].next()
elvis = g.V[[name:'Elvis Presley']].next()
(elvis.costars.loop(1){
  it.loops < 4 & !it.object.equals(bacon)
}.filter{
  it.equals(bacon)
}.paths >> 1).name.grep{it}

Elvis Presley
Double Trouble
Roddy McDowall
The Big Picture
Kevin Bacon
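The Gremlin loop is a bounded path search over co-star links. The same question in plain Ruby is a breadth-first search; this sketch uses a tiny hypothetical actor/movie edge list built to mirror the path above (plus one dead-end movie) and, unlike the Gremlin version, it has no depth limit, it just returns the shortest chain or nil.

```ruby
# Toy breadth-first search over a hypothetical actor/movie graph.
EDGES = [
  ['Elvis Presley', 'Double Trouble'],
  ['Roddy McDowall', 'Double Trouble'],
  ['Roddy McDowall', 'The Big Picture'],
  ['Kevin Bacon', 'The Big Picture'],
  ['Elvis Presley', 'Blue Hawaii']
].freeze

# undirected adjacency list: actor <-> movie
def graph(edges)
  adj = Hash.new { |h, k| h[k] = [] }
  edges.each { |a, b| adj[a] << b; adj[b] << a }
  adj
end

def shortest_path(adj, from, to)
  parents = { from => nil } # also serves as the visited set
  queue = [from]
  until queue.empty?
    node = queue.shift
    return path_to(parents, to) if node == to
    adj[node].each do |nxt|
      next if parents.key?(nxt)
      parents[nxt] = node
      queue << nxt
    end
  end
  nil # no connection
end

# walk parent links back to the start
def path_to(parents, node)
  path = []
  while node
    path.unshift(node)
    node = parents[node]
  end
  path
end

path = shortest_path(graph(EDGES), 'Elvis Presley', 'Kevin Bacon')
# path alternates actor, movie, actor, ... ending at Kevin Bacon
```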
Use REST
require 'json'
require 'faraday'

# make a connection to the Neo4j REST server
conn = Faraday.new(:url => 'http://localhost:7474/') do |builder|
  builder.adapter :net_http
end

# Gremlin script: the name of every vertex
script = "g.V.name"
r = conn.post('/db/data/ext/GremlinPlugin/graphdb/execute_script',
              JSON.generate('script' => script),
              'Content-Type' => 'application/json')

# the plugin answers with the script's result, here a JSON list of names
p JSON.parse(r.body) if r.status == 200
FlockDB
Distributed
Cannot do node traversals
Database             CAP   Genre       Use case
MySQL/Postgres       CA
HBase                CP
Cassandra            AP
Mongo                CP    document    insurance
Couch                AP    document    mobile interfaces
Neo4j                CA    graph       genealogy
FlockDB              AP    graph       social network
Riak                 AP    key/value   huge catalog
Memcached/KC/Redis   AP    key/value   session data
Sites
A great list: http://nosql-database.org/
The book website: http://sevenweeks.org/
The slides will be there: http://github.com/coderoshi/
Papers
Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
  people.csail.mit.edu/sethg/pubs/BrewersConjecture-SigAct.pdf
Dynamo: Amazon's Highly Available Key-value Store
  allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Bigtable: A Distributed Storage System for Structured Data
  labs.google.com/papers/bigtable-osdi06.pdf
MapReduce: Simplified Data Processing on Large Clusters
  labs.google.com/papers/mapreduce.html
Papers
PNUTS: Platform for Nimble Universal Table Storage
  http://research.yahoo.com/project/212
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
  http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
Design and Evaluation of a Continuous Consistency Model for Replicated Services
  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.7743&rep=rep1&type=pdf
Indexed Database API
  http://www.w3.org/TR/IndexedDB