
Moderndbs

This document provides an overview of modern database options for Ruby developers, including relational databases like PostgreSQL and MySQL, NoSQL databases like MongoDB, Cassandra, and HBase, and document databases like CouchDB. It discusses the benefits of each type of database and provides Ruby library recommendations for interacting with each one programmatically.

Uploaded by

eric8855
Copyright
© Attribution Non-Commercial (BY-NC)

Modern Databases for Ruby Folks Like You (and me)

Eric Redmond, @coderoshi, crudcomic.com

Database Genres

Relational

Graph

Columnar

Key/Value

Document

ECOSYSTEM

http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/

HISTORY

http://blogs.the451group.com/information_management/2011/04/20/necessity-is-the-mother-of-nosql/

Our Fine Selection


Relational: MySQL, PostgreSQL
Columnar: Cassandra, HBase
Document: MongoDB, CouchDB
Graph: Neo4j, FlockDB
Key/Value: Riak, Memcached, Kyoto Cabinet, Redis

PostgreSQL MySQL Drizzle VoltDB

RELATIONAL MODELS

Relational Models
PostgreSQL (full featured, über OSS)
http://bitbucket.org/ged/ruby-pg
http://github.com/Casecommons/pg_search
http://github.com/tenderlove/texticle

MySQL (lighter, barely OSS)
http://rubygems.org/gems/mysql <= turd
http://github.com/brianmario/mysql2
http://github.com/oldmoe/mysqlplus

Drizzle (lightest)
http://drizzle.org/

VoltDB (largest scale)
http://voltdb.com/

To the ORM-obsessed
SELECT *, cube_distance(ranks, '1,0,0') dist
FROM movie_genres
WHERE cube_enlarge('(1,0,0)'::cube, 0, 3) @> ranks
ORDER BY dist;
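The query above leans on PostgreSQL's cube module: `cube_enlarge(c, r, n)` grows a cube by radius `r`, `@>` tests whether the left cube contains the right one, and `cube_distance` returns the Euclidean distance. As a rough plain-Ruby sketch of the same filter-then-sort logic, assuming `ranks` is a 3-dimensional point (the `MOVIE_GENRES` rows, movie names, and helper names are hypothetical):

```ruby
# Euclidean distance between two points, which is what cube_distance
# returns when both cubes are points.
def cube_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# True when `point` lies inside the box made by growing `center` by
# `radius` in every dimension (cube_enlarge(center, radius, n) @> point).
def within_enlarged?(center, radius, point)
  center.zip(point).all? { |c, p| (p - c).abs <= radius }
end

# Hypothetical rows standing in for the movie_genres table.
MOVIE_GENRES = [
  { name: "Movie A", ranks: [1, 0, 0] },
  { name: "Movie B", ranks: [1, 1, 0] },
  { name: "Movie C", ranks: [0, 0, 3] }
]

# Keep rows inside the box, then order them by distance from the target.
def nearest(rows, target, radius)
  rows.select { |row| within_enlarged?(target, radius, row[:ranks]) }
      .sort_by { |row| cube_distance(row[:ranks], target) }
end

p nearest(MOVIE_GENRES, [1, 0, 0], 1).map { |row| row[:name] }
# => ["Movie A", "Movie B"]
```

With a radius of 0, as in the query above, the enlarged cube is just the point (1,0,0), so only rows whose `ranks` equal it pass the containment test.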

HBase Hypertable Cassandra

BIGTABLE/COLUMNAR

Bigtable/Columnar
HBase
Google-licious Bigtable
http://github.com/CompanyBook/massive_record
http://github.com/greglu/hbase-stargate (slow)
http://rubygems.org/gems/thrift

Hypertable
HQL. Very queryable
http://hypertable.org/

Cassandra
Hybrid: node architecture like Dynamo, data structure like Bigtable with column families
http://github.com/fauna/cassandra
http://github.com/NZKoz/cassandra_object

Columns
[Diagram: each row is identified by a row key (e.g. "a key") and stores column: "value" pairs grouped into column families; different rows can hold different numbers of columns.]
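To make the row/column-family layout concrete, here is a toy in-memory model in plain Ruby. It is only a sketch under our own assumptions (the `ToyColumnTable` class and its methods are hypothetical, not an HBase or Cassandra API): column families are fixed when the table is created, columns inside a family are free-form `family:qualifier` names, and rows come back in sorted row-key order, as a scan from a start row would return them.

```ruby
# Toy model of a Bigtable-style table: row key => { "family:qualifier" => value }.
class ToyColumnTable
  def initialize(families)
    @families = families # fixed up front, e.g. ["title", "text"]
    @rows = {}
  end

  # Store a value; the family part of the column name must already exist.
  def put(row_key, column, value)
    family = column.split(":", 2).first
    raise ArgumentError, "unknown column family: #{family}" unless @families.include?(family)
    (@rows[row_key] ||= {})[column] = value
  end

  # All columns stored for one row (empty hash if the row is missing).
  def get(row_key)
    @rows.fetch(row_key, {})
  end

  # Rows with key >= start_key, in sorted key order.
  def scan(start_key = "")
    @rows.keys.sort.select { |key| key >= start_key }.map { |key| [key, @rows[key]] }
  end
end

table = ToyColumnTable.new(["title", "text"])
table.put("b key", "text:", "second page")
table.put("a key", "title:", "First")
table.put("a key", "text:body", "first page")
p table.scan.map(&:first) # => ["a key", "b key"]
```

Note that "a key" ends up with two columns and "b key" with one, even though both rows share the same two families.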

launch_hbase.rb
require 'hbase'  # Thrift-generated HBase client bindings

class Object
  include Apache::Hadoop::Hbase::Thrift

  # Open a connection to the HBase Thrift server, yield the client,
  # and always close the transport afterwards.
  def thrift
    unless defined?(@@hclient)
      @@tsocket    = Thrift::Socket.new('127.0.0.1', 9090)
      @@ttransport = Thrift::BufferedTransport.new(@@tsocket)
      @@tprotocol  = Thrift::BinaryProtocol.new(@@ttransport)
      @@hclient    = Hbase::Client.new(@@tprotocol)
    end
    @@ttransport.open
    yield @@hclient
  ensure
    @@ttransport.close
  end
end

HBase Migration
class CreateWikis < ActiveRecord::Migration
  def self.up
    thrift do |hbase|
      # keep up to 10 versions of each text cell
      hbase.createTable('wiki', [
        ColumnDescriptor.new(:name => 'text:', :maxVersions => 10),
        ColumnDescriptor.new(:name => 'title:')
      ])
    end
  end
end

wiki.rb
def self.all(start = '')
  wikis = []
  thrift do |hbase|
    scanner = hbase.scannerOpen('wiki', start, ['title:', 'text:'])
    while (rows = hbase.scannerGet(scanner)).present?
      rows.each do |v|
        wikis << Wiki.new(:title => v.columns['title:'].value,
                          :text  => v.columns['text:'].value)
      end
    end
    hbase.scannerClose(scanner)  # release the server-side scanner
  end
  wikis
end

wiki.rb
def self.find(title)
  thrift do |hbase|
    hbase.getRow('wiki', title).each do |v|
      return Wiki.new(:title => v.columns['title:'].value,
                      :text  => v.columns['text:'].value)
    end
  end
  nil  # no row with this title
end

wiki.rb
def history
  historical_text = []
  thrift do |hbase|
    # fetch up to 10 stored versions of this page's text
    hbase.getVer('wiki', title, 'text:', 10).each do |v|
      historical_text << v.value.dup
    end
  end
  historical_text
end
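The `:maxVersions => 10` setting and the `getVer` call rely on HBase keeping several timestamped versions of each cell. A minimal plain-Ruby sketch of that behavior, assuming newest-first ordering and a hypothetical `VersionedCell` class (not part of any HBase client):

```ruby
# Keeps at most max_versions [timestamp, value] pairs, newest first.
class VersionedCell
  def initialize(max_versions)
    @max_versions = max_versions
    @versions = []
  end

  # Add a version, then drop the oldest ones beyond the limit.
  def put(value, timestamp)
    @versions << [timestamp, value]
    @versions = @versions.sort_by { |ts, _| -ts }.first(@max_versions)
  end

  def latest
    @versions.first&.last
  end

  # Up to n values, newest first (compare getVer(table, row, column, n)).
  def versions(n)
    @versions.first(n).map(&:last)
  end
end

cell = VersionedCell.new(3)
(1..5).each { |ts| cell.put("edit #{ts}", ts) }
p cell.versions(10) # => ["edit 5", "edit 4", "edit 3"]
```

Asking for more versions than the cell keeps simply returns what is left after older versions were trimmed.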

Thrift!
http://thrift.apache.org/

Use Thrift to interact with both. The question of which to use is largely an architectural/ecosystem consideration.
Google: HBase
Reddit: Cassandra
Baidu: Hypertable

HBase Benefits
Strong (and flexible) columnar schema
Sequential reads and column versioning
MapReduce via Hadoop integration
Consistent (configurable to Available)
Great for wide area networks (Google, Facebook)

Cassandra Benefits
Sequential reads of ordered keys (scannable)
Columnar schema
Built-in versioning
Available (configurable to Consistent)
Optimized for hundreds of nodes (Digg, Twitter)

MongoDB CouchDB RavenDB

DOCUMENT

Document Datastore
MongoDB
http://github.com/mongodb/mongo-ruby-driver
http://mongoid.org/

CouchDB
http://github.com/couchrest/couchrest_model
http://github.com/peritor/simply_stored
http://tilgovi.github.com/couchdb-lounge

RavenDB
Ruby Driver??

Document
{
  "_id" : ObjectId("4db7ca268e236e5bf9a52224"),
  "_rev" : "2612672603",
  "name" : "Sant Julià de Lòria",
  "country" : "AD",
  "timezone" : "Europe/Andorra",
  "population" : 8022,
  "location" : {
    "latitude" : 42.46372,
    "longitude" : 1.49129
  }
}

Document Principle
Schemas are awesome until they aren't

Mongo Cluster
[Diagram: clients connect to mongos, which is backed by three config servers (mongod). Data is split across shard1, shard2, and shard3; each shard is a replica set of three mongod processes.]
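In this layout mongos decides which shard receives each request: MongoDB splits a sharded collection into chunks, each covering a contiguous range of shard-key values, and the config servers hold that chunk map. A rough plain-Ruby sketch of range-based routing, using a hypothetical chunk table (the boundaries, shard names, and `shard_for` helper are made up for illustration; `nil` marks an open-ended range):

```ruby
# Hypothetical chunk table: each chunk owns shard keys in [min, max).
CHUNKS = [
  { min: nil, max: "h", shard: "shard1" },
  { min: "h", max: "p", shard: "shard2" },
  { min: "p", max: nil, shard: "shard3" }
]

# Find the chunk whose range contains the key and return its shard.
def shard_for(key, chunks = CHUNKS)
  chunk = chunks.find do |c|
    (c[:min].nil? || key >= c[:min]) && (c[:max].nil? || key < c[:max])
  end
  chunk && chunk[:shard]
end

p shard_for("eric") # => "shard1"
p shard_for("jim")  # => "shard2"
```

Because each chunk is a contiguous range, a query on a range of shard-key values only needs to reach the shards owning the overlapping chunks.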

Couch
Couch relies heavily on map/reduce to create views, so it is a weaker fit if you want ad-hoc queries.
Futon (web console)
BigCouch (Dynamo-style N/R/W)
Lounge (clustering, sharding)

Lounge
[Diagram: clients connect to lounge, which spreads data across shard1, shard2, and shard3; each shard is three couchdb servers kept in sync by replication.]

Big Couch
[Diagram: a consistent hashing ring with N=3 and Q=12. Hash values increase around the ring from 0 to 2^160; the ring is split into vnodes (vnode 1, vnode 2, ...) distributed among Node A, Node B, and Node C.]

Relax
$ sudo couchdb
Apache CouchDB 1.0.1 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.31.0>] Apache CouchDB has started on http://127.0.0.1:5984/
$ curl http://127.0.0.1:5984/
{"couchdb":"Welcome","version":"1.0.1"}

Mapreduce
function map(doc) {
  // [{smith : 1}, {smith : 1}, {smith : 1}, ...]
  emit(doc.last_name, 1);
}
function reduce(key, values, rereduce) {
  // {smith : 35, smyth : 2, ...}
  return sum(values);
}
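To see what the view computes, the same map and reduce steps can be sketched in plain Ruby over a few hypothetical documents. Results are grouped by key, roughly what querying the view with `group=true` returns; `map_doc`, `reduce_values`, and `run_view` are illustrative names, not CouchDB APIs.

```ruby
# Hypothetical documents standing in for a CouchDB database.
DOCS = [
  { "last_name" => "smith" }, { "last_name" => "smyth" },
  { "last_name" => "smith" }, { "last_name" => "smith" }
]

# map: emit one [key, value] pair per document, like emit(doc.last_name, 1)
def map_doc(doc)
  [[doc["last_name"], 1]]
end

# reduce: add up the values emitted for one key, like sum(values)
def reduce_values(_key, values, _rereduce)
  values.sum
end

# Run map over every document, group the pairs by key, reduce each group.
def run_view(docs)
  docs.flat_map { |doc| map_doc(doc) }
      .group_by(&:first)
      .each_with_object({}) { |(key, pairs), result| result[key] = reduce_values(key, pairs.map(&:last), false) }
end

run_view(DOCS).each { |name, count| puts "#{name}: #{count}" }
# prints "smith: 3" then "smyth: 1"
```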

Mapreduce (in Ruby)

# get all rooms
rooms = Room.all
# convert array of rooms to array of capacities
caps = rooms.map { |room| room.capacity }
# Symbol#to_proc shortcut: caps = rooms.map(&:capacity)
# start with 0, then sum each value in the array
result = caps.reduce(0) { |sum, capacity| sum + capacity }

Re-Reduce
db.runCommand({'mapReduce'...})
[Diagram: map tasks run on mongod 1 and mongod 2; each node reduces its own map output, and a final reduce combines (re-reduces) those partial results.]
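Re-reduce works because the reduce function can also combine its own partial results. A plain-Ruby sketch of that two-phase shape, assuming a hypothetical split of documents across two nodes and a count-style reduce (the constants and method names are illustrative, not MongoDB or CouchDB APIs):

```ruby
# Hypothetical data split across two nodes.
NODE_1 = [{ "last_name" => "smith" }, { "last_name" => "smyth" }]
NODE_2 = [{ "last_name" => "smith" }, { "last_name" => "smith" }]

# For counting, the same reduce works on raw 1s and on partial counts.
def reduce_counts(values, _rereduce)
  values.sum
end

# Phase 1 (per node): map each doc to a 1 under its key, reduce per key.
def local_reduce(docs)
  docs.group_by { |doc| doc["last_name"] }
      .transform_values { |group| reduce_counts(group.map { 1 }, false) }
end

# Phase 2: merge the per-node partial results, re-reducing per key.
def rereduce(partials)
  partials.flat_map(&:to_a)
          .group_by(&:first)
          .transform_values { |pairs| reduce_counts(pairs.map(&:last), true) }
end

combined = rereduce([local_reduce(NODE_1), local_reduce(NODE_2)])
combined.each { |name, count| puts "#{name}: #{count}" }
```

The constraint this illustrates: reduce must give the same answer whether it sees raw values or partial results. A sum qualifies; a naive average of per-node averages would not.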

Mongo v Couch
Mongo:
Consistency focused
Master/Slave
Ad-hoc queries
Comfortable for SQL users
Built to run on clusters
Capped collections, very fast writes

Couch:
Availability focused
Master/Master
Mapreduce views
Comfortable for client/server authors
Runs on nearly anything
Changes API, long-polling reads

What about Raven?
Consistency focused
Master/Slave
LINQ queries
Comfortable for .NET developers
Built to run on clusters
Good security (Kerberos), reports (index replication)

Riak Memcached Kyoto Cabinet Voldemort Redis

KEY/VALUE

Dynamo K/V Style
Riak
Pretty documenty

Risky ORM
https://github.com/aphyr/risky

Ripple
http://seancribbs.github.com/ripple

Riak Session
http://rubygems.org/gems/riak-sessions

Consistently Hashed Cluster
[Diagram: a ring with N=3 and Q=12. Hash values increase around the ring from 0 to 2^160; the ring is split into 12 vnodes (vnode 1 through vnode 12) distributed among Node A, Node B, and Node C.]
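A small plain-Ruby sketch of how a key finds its place on such a ring, assuming SHA-1 hashing of the key alone onto the 0 to 2^160 range, Q=12 equal partitions handed to Node A, B, and C in round-robin order, and N=3 replicas on consecutive partitions. Riak's real claim algorithm and defaults differ; `build_ring`, `partition_for`, and `preference_list` are hypothetical names.

```ruby
require "digest"

RING_SIZE = 2**160 # SHA-1 produces 160-bit hashes

# Q equal partitions (vnodes), assigned to physical nodes in turn.
def build_ring(nodes, q)
  Array.new(q) { |i| nodes[i % nodes.size] }
end

# Map a key's SHA-1 hash to a partition index in 0...q.
def partition_for(key, q)
  Digest::SHA1.hexdigest(key).to_i(16) * q / RING_SIZE
end

# The key's own partition plus the next n - 1 partitions around the ring
# hold its n replicas.
def preference_list(ring, key, n)
  first = partition_for(key, ring.size)
  Array.new(n) { |offset| ring[(first + offset) % ring.size] }
end

ring = build_ring(["Node A", "Node B", "Node C"], 12)
puts preference_list(ring, "lunch", 3).inspect
```

Because partitions alternate between nodes, any three consecutive partitions land on three different machines, so each key's N=3 replicas are spread across Node A, Node B, and Node C.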

N/R/W
CAP can't be beaten, but it can be tweaked with N/R/W:
N = number of nodes each value is written to (set per bucket)
W = nodes written to before a write succeeds
R = nodes read from before a read succeeds

What does this mean?


Support both CP and AP in one database

Quorum
[Diagram: N=3, W=2, R=2. The three replicas hold version: B, version: B, and version: A; the read results shown are version: B and version: [B, A].]
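W=2 and R=2 work with N=3 because of overlap: when R + W > N, every set of replicas a read contacts shares at least one replica with every set a write reached. A brute-force plain-Ruby check of that rule (`always_overlaps?` is a hypothetical helper, not a Riak API):

```ruby
# True if every possible read set of size r intersects every possible
# write set of size w among n replicas.
def always_overlaps?(n, r, w)
  replicas = (0...n).to_a
  replicas.combination(w).all? do |write_set|
    replicas.combination(r).all? { |read_set| !(write_set & read_set).empty? }
  end
end

puts always_overlaps?(3, 2, 2) # true:  2 + 2 > 3
puts always_overlaps?(3, 1, 2) # false: a read may hit only the replica the write skipped
```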

Key/Value Stores
Memcached
Very fast. Don't use it

Kyoto Cabinet
Very durable data structure store. Don't use it

Redis
http://redis.io/
Fast and durable. Multiple types and simple syntax for complex operations

Redis
Drivers/Projects
http://github.com/ezmobius/redis-rb
http://github.com/nateware/redis-objects
http://github.com/jodosha/redis-store
http://github.com/defunkt/resque

Fun!
http://rediscookbook.org/
http://www.paperplanes.de/2010/2/16/a_collection_of_redis_use_cases.html

Redis Knows Lists
RPUSH lunch "pizza"
RPUSH lunch "pie"
RPUSH lunch "juice"
LRANGE lunch 0 1
1. "pizza"
2. "pie"
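LRANGE's stop index is inclusive, which is why `LRANGE lunch 0 1` returns two items; negative indexes count from the end (-1 is the last element), and out-of-range indexes are clamped rather than raising an error. A plain-Ruby sketch of those index rules (`lrange` is a hypothetical helper, not part of redis-rb):

```ruby
# Redis-style LRANGE: inclusive stop, negative offsets from the end,
# out-of-range indexes clamped instead of raising.
def lrange(list, start, stop)
  len = list.length
  start += len if start < 0
  stop += len if stop < 0
  start = 0 if start < 0
  stop = len - 1 if stop >= len
  return [] if start > stop
  list[start..stop]
end

lunch = ["pizza", "pie", "juice"]
p lrange(lunch, 0, 1)  # => ["pizza", "pie"]
p lrange(lunch, 0, -1) # => ["pizza", "pie", "juice"]
```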

Redis Knows Hashes
HSET conferences rmc "KC"
HSET conferences sxsw "Austin"
HGET conferences rmc
"KC"
HGET conferences sxsw
"Austin"

Redis Knows Sets
SADD person "Eric"
SADD person "Jim"
SMEMBERS person
1. "Eric"
2. "Jim"

SADD owns_pet "Eric"
SINTER person owns_pet
1. "Eric"
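For readers mapping these commands onto familiar Ruby, here is a hypothetical in-memory stand-in (`ToySetStore` is not redis-rb and talks to no server) that mirrors what SADD, SMEMBERS, and SINTER return, including SADD replying with how many members were actually added:

```ruby
require "set"

# In-memory stand-in for three Redis set commands.
class ToySetStore
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # SADD replies with the number of members actually added.
  def sadd(key, *members)
    members.count { |member| @sets[key].add?(member) }
  end

  def smembers(key)
    @sets[key].to_a
  end

  # SINTER: members that appear in every listed set.
  def sinter(*keys)
    keys.map { |key| @sets[key] }.reduce(:&).to_a
  end
end

store = ToySetStore.new
store.sadd("person", "Eric", "Jim")
store.sadd("owns_pet", "Eric")
p store.sinter("person", "owns_pet") # => ["Eric"]
```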

Pub/Sub
SUBSCRIBE chat
PUBLISH chat "hello there"   (from a second client)
The subscriber receives:
1) "message" 2) "chat" 3) "hello there"
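The three-part reply above is the shape of every pub/sub message: the word "message", the channel, and the payload. A hypothetical in-process sketch of that flow in plain Ruby (`ToyPubSub` is illustrative only; unlike Redis, delivery here is synchronous and stays inside one process):

```ruby
# In-process stand-in for Redis pub/sub: subscribers get
# ["message", channel, payload], the same three parts redis-cli prints.
class ToyPubSub
  def initialize
    @handlers = Hash.new { |hash, channel| hash[channel] = [] }
  end

  def subscribe(channel, &handler)
    @handlers[channel] << handler
  end

  # Deliver to every handler on the channel and return how many got it
  # (Redis PUBLISH also replies with the number of receiving clients).
  def publish(channel, payload)
    receivers = @handlers.fetch(channel, [])
    receivers.each { |handler| handler.call(["message", channel, payload]) }
    receivers.size
  end
end

bus = ToyPubSub.new
bus.subscribe("chat") { |message| p message }
bus.publish("chat", "hello there") # prints ["message", "chat", "hello there"]
```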

Redis is Fast
100,000 SETs / second
80,000 GETs / second

Neo4j GraphDB FlockDB

GRAPH

Graph Datastore
Neo4j
Neo4j.rb (runs in JRuby), Neography
http://github.com/andreasronge/neo4j
http://github.com/maxdemarzi/neography

GraphDB
Written in .NET

FlockDB
http://github.com/twitter/flockdb-client
https://github.com/twitter/flockdb

The Matrix
http://wiki.neo4j.org/content/The_Matrix

Neo4j Is Chatty
JRuby interface, Cypher, Gremlin, SPARQL (for RDF data)
Java, Clojure, Erlang, Groovy, PHP, Python, Ruby, Scala, REST

Gremlin on Neo4j
bacon = g.V[[name:'Kevin Bacon']].next()
elvis = g.V[[name:'Elvis Presley']].next()
(elvis.costars.loop(1){
  it.loops < 4 & !it.object.equals(bacon)
}.filter{ it.equals(bacon) }.paths >> 1).name.grep{it}

Elvis Presley
Double Trouble
Roddy McDowall
The Big Picture
Kevin Bacon
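The Gremlin query is a bounded path search from one vertex to another. For comparison, a plain breadth-first search in Ruby over a tiny hard-coded graph built only from the path shown above (this is not a translation of Gremlin's `loop` step, whose counter is per `costars` step rather than per edge; `EDGES`, `adjacency`, and `shortest_path` are hypothetical):

```ruby
# Hypothetical graph built only from the path shown above: actors and
# movies are both vertices, linked when an actor appears in a movie.
EDGES = [
  ["Elvis Presley", "Double Trouble"],
  ["Roddy McDowall", "Double Trouble"],
  ["Roddy McDowall", "The Big Picture"],
  ["Kevin Bacon", "The Big Picture"]
]

# Undirected adjacency list.
def adjacency(edges)
  graph = Hash.new { |hash, key| hash[key] = [] }
  edges.each do |a, b|
    graph[a] << b
    graph[b] << a
  end
  graph
end

# Breadth-first search: the first path that reaches `to` is a shortest one.
# Paths are not extended beyond max_edges edges.
def shortest_path(graph, from, to, max_edges)
  queue = [[from]]
  seen = { from => true }
  until queue.empty?
    path = queue.shift
    return path if path.last == to
    next if path.size - 1 >= max_edges
    graph[path.last].each do |neighbor|
      next if seen[neighbor]
      seen[neighbor] = true
      queue << path + [neighbor]
    end
  end
  nil
end

graph = adjacency(EDGES)
puts shortest_path(graph, "Elvis Presley", "Kevin Bacon", 4).join(" -> ")
```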

Use REST
require 'json'
require 'faraday'

# make a connection to the Neo4j REST server
conn = Faraday.new(:url => 'http://localhost:7474/') do |builder|
  builder.adapter :net_http
end

# Gremlin script to run on the server: the name of every vertex
script = "g.V.name"
r = conn.post("/db/data/ext/GremlinPlugin/graphdb/execute_script",
              JSON.generate({ "script" => script }),
              { 'Content-Type' => 'application/json' })

# print the parsed JSON result on success
p JSON.parse(r.body) if r.status == 200

FlockDB
Distributed
Cannot do node traversals

Try Them All


Why not? It's a big decision.

Psst! Get a Mac

brew install mysql
brew install postgresql
brew install hbase
brew install cassandra
brew install riak
brew install mongodb
brew install couchdb
brew install memcached
brew install kyoto-cabinet
brew install redis
brew install neo4j

Database | CAP | Genre | Example use
MySQL/Postgres | CA | relational | bank
HBase | CP | columnar | search engine
Cassandra | AP | columnar | SETI
Mongo | CP | document | insurance
Couch | AP | document | mobile interfaces
Neo4j | CA | graph | genealogy
FlockDB | AP | graph | social network
Riak | AP | key/value | huge catalog
Memcached/KC/Redis | AP | key/value | session data

Sites
http://nosql-database.org/
A great list

http://sevenweeks.org/
The book website

http://github.com/coderoshi/
The slides will be there

Papers
Bigtable: A Distributed Storage System for Structured Data
labs.google.com/papers/bigtable-osdi06.pdf
MapReduce: Simplified Data Processing on Large Clusters
labs.google.com/papers/mapreduce.html
Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
people.csail.mit.edu/sethg/pubs/BrewersConjecture-SigAct.pdf
Dynamo: Amazon's Highly Available Key-value Store
allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Papers
PNUTS: Platform for Nimble Universal Table Storage
http://research.yahoo.com/project/212
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
Design and Evaluation of a Continuous Consistency Model for Replicated Services
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.7743&rep=rep1&type=pdf
Indexed Database API
http://www.w3.org/TR/IndexedDB
