NoSQL Introduction OpenWest
NoSQL Introduction
• Understand what NoSQL is and what it is not.
• Why would you want to use NoSQL within your project, and which NoSQL database would you use?
• Explore the relationships between NoSQL and RDBMS.
• Understand how to choose between an RDBMS (MySQL and PostgreSQL), a Document Database (MongoDB), a Key-Value Store, a Graph Database, a Columnar database, or a combination of these.
Thursday May 8th 2014, 3:00pm-3:50pm SB 139
Slides and Feedback at: http://joind.in/11012
NoSQL
• History
• Popular NoSQL Databases
• NoSQL Database Comparisons
• Terminology
• Consistency, Replication, Performance
• NoSQL Implementation CRUD Operations
NoSQL History
http://www.w3resource.com/mongodb/nosql.php
NoSQL History
• 1998: Carlo Strozzi's NoSQL, a lightweight command-line relational database without an SQL interface
• June 11, 2009 Meetup
– Open Source, Distributed, Non-Relational DB
– Eric Evans (Rackspace)
– Johan Oskarsson (Last.fm)
NoSQL History
• Bad name, but it stuck!
• Not a definitive term
• Generally, newer databases solving new and different problems
• Not Only SQL: http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html
Most Popular NoSQL
• MongoDB – Document Store
• Cassandra – Wide Column Store
• Solr – Search Engine
• Redis – Key-Value Store
• HBase – Wide Column Store
• Memcached – Key-Value Store
• CouchDB – Document Store
• Neo4j – Graph Database
• Riak – Key-Value Store
• SimpleDB – Key-Value Store within the Amazon cloud
NoSQL “Bleeding Edge”
• Several solutions are mature and stable
enough to run large scale production
environments
• Not all permutations have been considered
• Several (but not all) optimization strategies
have been published
• Crucial elements such as security may be a secondary add-on, traded off in favor of performance.
NoSQL “Bleeding Edge”
Sun Microsystems csh man page:
“Although robust enough for
general use, adventures into the
esoteric periphery of the C shell
may reveal unexpected quirks.”
Key-Value Stores
code bucket
Key                Value
code:java          17.316% Lowest rank on Feb 2014
code:C             18.334% Lowest rank on August 2013
code:Objective-C   Lowest rank on Dec 2007 11.341%
code:C++           {"score": "6.892%", "low rank": "Feb 2008"}

drink bucket
Key                Value
drink:java         coffee
drink:punch        Sprite + pineapple juice
drink:pop          Carbonated Soda

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
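The slide's point can be sketched in Python, with a plain dict standing in for a key-value store (an assumption; real stores such as Redis or Riak add persistence, networking and replication). Keys are namespaced by bucket with a colon, and the store never looks inside a value: the `code:C++` entry is only structured because the client chooses to parse it as JSON. The `put`/`get` helpers are hypothetical.

```python
import json

# A plain dict stands in for the key-value store (illustrative only).
store = {}

def put(bucket, key, value):
    """Store an opaque string under 'bucket:key'."""
    store[f"{bucket}:{key}"] = value

def get(bucket, key):
    """Return the raw string, or None if the key is absent."""
    return store.get(f"{bucket}:{key}")

put("code", "java", "17.316% Lowest rank on Feb 2014")
put("code", "C++", '{"score": "6.892%", "low rank": "Feb 2008"}')
put("drink", "java", "coffee")

# The same short key "java" lives in two buckets without colliding.
print(get("code", "java"))   # 17.316% Lowest rank on Feb 2014
print(get("drink", "java"))  # coffee

# Interpreting a value is the client's job, not the store's.
cpp = json.loads(get("code", "C++"))
print(cpp["score"])          # 6.892%
```

Because the store treats values as opaque, mixing formats within one bucket (as the slide does) is allowed; any validation has to happen in the application.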
Column Oriented Database
Neo4j
Document Oriented Database
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
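Because the whole record is one nested document, any field can be reached by walking a path through it. The helper below is a hypothetical Python sketch of the dotted-path idea MongoDB uses in queries such as `{"name.last": "Backus"}` (including numeric segments for array positions); it is not MongoDB's implementation and handles only dicts and lists.

```python
# The document from the slide, as a Python dict.
backus = {
    "_id": 1,
    "name": {"first": "John", "last": "Backus"},
    "contribs": ["Fortran", "ALGOL", "FP"],
    "awards": [
        {"award": "W.W. McDowell Award", "year": 1967, "by": "IEEE Computer Society"},
        {"award": "Draper Prize", "year": 1993, "by": "National Academy of Engineering"},
    ],
}

def get_path(doc, path):
    """Follow a dotted path such as 'name.last' or 'awards.1.year' (hypothetical helper)."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]   # numeric segment indexes into an array
        else:
            current = current[part]        # other segments are dict keys
    return current

print(get_path(backus, "name.last"))          # Backus
print(get_path(backus, "awards.1.year"))      # 1993
print([a["year"] for a in backus["awards"]])  # [1967, 1993]
```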
Document Oriented Database
{ "faculty" :
  [
    {
      "_id" : 1,
      "name" : { "first" : "John", "last" : "Backus" },
      "contribs" : [ "Fortran", "ALGOL", "FP" ],
      "awards" : [
        { "award" : "W.W. McDowell Award",
          "year" : 1967,
          "by" : "IEEE Computer Society" },
        { "award" : "Draper Prize",
          "year" : 1993,
          "by" : "National Academy of Engineering" }
      ]
    },
    {
      "_id" : 2,
      "name" : { "first" : "David", "last" : "Williams" },
      "contribs" : [ "C#", "Java", "PHP" ],
      "awards" : [
        { "award" : "Sherman Peabody Award II",
          "year" : 2095,
          "location" : "Paris",
          "by" : "Intergalactic Continuum" },
        { "award" : "Sherman Peabody Award IX",
          "year" : 2090,
          "location" : "Paris",
          "by" : "Intergalactic Continuum" },
        { "award" : "Sherman Peabody Award IV",
          "year" : 2093,
          "location" : "Paris",
          "by" : "Intergalactic Continuum" }
      ]
    }
  ]
}
Document Oriented Database
http://chris.photobooks.com/json/
Download NoSQL v95141.3
Released 4/1/2014
http://www.nosql.org/downloads/ymbkm.zip
NoSQL Terminology and Concepts
Map Reduce
Divides work across distributed systems
Parallel processing of large data sets
Divide – Conquer – Consolidate
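Often implemented by defining Map and Reduce classes or functions
Figure: 1+2+3+6+7+8+9 = ? is divided between two nodes, one summing 2+6+8 = 16 and the other 1+7+3+9 = 20; consolidating the partial sums gives 16 + 20 = 36.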
Twitter Example:
https://dev.twitter.com/docs/api/1.1 (GET and POST only)
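The summing figure can be written as a minimal single-process Python sketch of the map/reduce pattern. The two chunks mirror the figure's split; a real framework such as Hadoop would run each map call on a different machine and move results over the network, which this sketch does not attempt.

```python
from functools import reduce

numbers = [1, 2, 3, 6, 7, 8, 9]

# Divide: split the input into chunks, one per (simulated) node.
chunks = [[2, 6, 8], [1, 7, 3, 9]]

def map_chunk(chunk):
    """Map step: each node computes a partial sum of its own chunk."""
    return sum(chunk)

def reduce_sums(a, b):
    """Reduce step: combine two partial results."""
    return a + b

partials = list(map(map_chunk, chunks))   # Conquer: [16, 20]
total = reduce(reduce_sums, partials)     # Consolidate: 36
print(partials, total)                    # [16, 20] 36
```

Summing works here because addition is associative, so partial results can be combined in any grouping; a reduce function without that property could not be split across nodes this freely.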
Database SELECT Statements
Oracle
SELECT * FROM relationships
MongoDB
db.relationships.find()
Cassandra (CQL)
SELECT * FROM relationships
Neo4j (Cypher)
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
original_id = ObjectId()
db.employer.insert({
"_id": original_id,
"name": "Broadway Tech",
"url": "bc.example.net" })
// employer_id is a manual reference to the employer document's _id
db.people.insert({
"name": "Erin",
"employer_id": original_id,
"url": "bc.example.net/Erin" })
http://docs.mongodb.org/manual/reference/database-references/#document-references
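MongoDB does not resolve a manual reference for you: the application reads the person, then issues a second query for the employer. This Python sketch imitates that two-step lookup with lists of dicts standing in for the `employer` and `people` collections; `uuid` stands in for `ObjectId()`, and the `find_first` helper is hypothetical.

```python
import uuid

# Stand-in for ObjectId(): any unique value works for this sketch.
original_id = uuid.uuid4().hex

employer = [{"_id": original_id, "name": "Broadway Tech", "url": "bc.example.net"}]
people = [{"name": "Erin", "employer_id": original_id, "url": "bc.example.net/Erin"}]

def find_first(collection, **criteria):
    """Return the first document whose fields equal all criteria, else None (hypothetical helper)."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in criteria.items()):
            return doc
    return None

# Query 1: fetch the person.
erin = find_first(people, name="Erin")
# Query 2: follow the stored reference; there is no server-side JOIN.
erins_employer = find_first(employer, _id=erin["employer_id"])
print(erins_employer["name"])  # Broadway Tech
```

The cost of the second round trip is the usual argument for embedding the employer inside the person document instead, when the data is read together and rarely changes.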
Replication Challenge is Write Consistency
ACID, BASE, CAP, CPR
1979 Gray, 1983 Reuter & Härder - ACID
Atomic, Consistent, Isolated, Durable
All or nothing (rollback on failure), follows the rules, concurrent transactions do not interfere, committed data is never dropped
1997 Brewer - BASE
Basically Available, Soft-state, Eventually consistent
2000 Brewer – CAP (Pick Two)
Consistency, Availability, Partition Tolerance
CPR (Pick Two)
Consistency, Performance, Replication/Redundancy
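Atomicity ("all or nothing") is easy to see with any transactional RDBMS. This Python sketch uses the standard-library `sqlite3` module with a hypothetical `accounts` table: a transfer that fails halfway is rolled back, so neither balance changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(amount):
    """Move money from alice to bob as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
    except sqlite3.IntegrityError:
        pass  # the CHECK constraint failed; bob's credit was rolled back as well

transfer(30)    # succeeds
transfer(500)   # would make alice negative, so the whole transfer is undone
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))  # {'alice': 70, 'bob': 30}
```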
CPR
Figure: the data A B C D, with the labels Consistency, Performance, and Redundancy.
Figure: four replicas, two already updated to A B C E and two still holding A B C D (labels: Consistency, Redundancy).
Figure: four replicas, all holding A B C D; one update locks all nodes (labels: Consistency, Performance, Redundancy).
CRUD
Create
Read
Update
Delete
Redis CRUD
http://redis.io/commands
http://redis.io/topics/data-types-intro
http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/
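The four CRUD operations map onto a handful of Redis commands: `SET` creates, `GET` reads, `SET` again overwrites (update), and `DEL` deletes. The Python class below is a hypothetical in-memory stand-in written for illustration, not a Redis client; each method notes the redis-cli command whose reply it imitates.

```python
class FakeRedis:
    """Tiny dict-backed imitation of a few Redis string commands (illustrative only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):      # redis-cli: SET key value  -> OK
        self._data[key] = str(value)
        return "OK"

    def get(self, key):             # redis-cli: GET key  -> value, or nil if missing
        return self._data.get(key)

    def delete(self, *keys):        # redis-cli: DEL key [key ...]
        removed = 0
        for key in keys:
            if self._data.pop(key, None) is not None:
                removed += 1
        return removed              # DEL replies with the number of keys removed

r = FakeRedis()
r.set("drink:pop", "Carbonated Soda")        # Create
print(r.get("drink:pop"))                    # Read: Carbonated Soda
r.set("drink:pop", "Soda")                   # Update (SET overwrites)
print(r.delete("drink:pop", "drink:none"))   # Delete: 1
print(r.get("drink:pop"))                    # None
```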
Neo4j
Neo4j – Graph Database
http://www.neo4j.org/learn/try
http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125
Neo4j CRUD
Must try dragging nodes at: http://www.neo4j.org/learn/try
http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
MATCH (n)-[r]->(m) RETURN n,r,m
Matches any relationship between “n” and “m”
http://www.neo4j.org/learn/cypher
Neo4j
(LUKE {name:"Luke Skywalker"}),
(HAN {name:"Han Solo"}),
(LEIA {name:"Princess Leia Organa"}),
(OBI_WAN {name:"Obi Wan Kenobi"}),
(YODA {name:"Yoda"}),
(VADER {name:"Darth Vader"}),
(C3PO {name:"C3PO", droid:true}),
(R2D2 {name:"R2D2", droid:true}),
(CHEWBACCA {name:"Chewbacca"}),
(TATOOINE {name:"Tatooine", distance:13184}),
(DAGOBAH {name:"Dagobah", distance:15407}),
(JEDI {name:"Jedi"}),
(SITH {name:"Sith"}),
(REBELLION {name:"Rebellion"}),
(EMPIRE {name:"Empire"}),
(DARK_SIDE {name:"Dark Side"}),
(LIGHT_SIDE {name:"Light Side"}),
…
(LUKE)-[:FRIENDS_WITH]->(HAN),
(LUKE)-[:FRIENDS_WITH]->(LEIA),
(HAN)-[:FRIENDS_WITH]->(CHEWBACCA),
(YODA)-[:TEACHES]->(OBI_WAN),
(YODA)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:KNOWS]->(VADER),
(LUKE)-[:KNOWS]->(R2D2),
(R2D2)-[:KNOWS]->(C3PO),
(LUKE)-[:LIVED_ON]->(TATOOINE),
(HAN)-[:LIVED_ON]->(CORELLIA),
(LEIA)-[:LIVED_ON]->(ALDERAAN),
(YODA)-[:LIVED_ON]->(DAGOBAH),
(LUKE)-[:DEVOTED_TO]->(JEDI),
(LUKE)-[:DEVOTED_TO]->(REBELLION),
(LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE),
(VADER)-[:DEVOTED_TO]->(SITH),
(VADER)-[:DEVOTED_TO]->(EMPIRE),
(VADER)-[:DEVOTED_TO]->(DARK_SIDE),
(LEIA)-[:DEVOTED_TO]->(REBELLION),
(HAN)-[:DEVOTED_TO]->(REBELLION),
…
https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125

MATCH (y)-[r]-(other)
WHERE y.name = 'Yoda'
RETURN y.name, type(r), other.name
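To see what the Yoda query returns, here is a hypothetical Python sketch that stores a few of the slide's relationships as (start, type, end) triples and matches them in either direction, as the undirected pattern `(y)-[r]-(other)` does. Neo4j evaluates this with its own storage and query planner; the list scan here only reproduces the result. The optional `rel_type` argument plays the role of a typed pattern such as `[r:LIKES]`.

```python
names = {
    "LUKE": "Luke Skywalker", "OBI_WAN": "Obi Wan Kenobi", "YODA": "Yoda",
    "VADER": "Darth Vader", "HAN": "Han Solo", "DAGOBAH": "Dagobah",
}

# A subset of the slide's relationships as (start, type, end) triples.
edges = [
    ("YODA", "TEACHES", "OBI_WAN"),
    ("YODA", "TEACHES", "LUKE"),
    ("OBI_WAN", "TEACHES", "LUKE"),
    ("OBI_WAN", "KNOWS", "VADER"),
    ("LUKE", "FRIENDS_WITH", "HAN"),
    ("YODA", "LIVED_ON", "DAGOBAH"),
]

def match(node_name, rel_type=None):
    """Like MATCH (y)-[r]-(other) WHERE y.name = node_name; direction is not reported."""
    rows = []
    for start, rtype, end in edges:
        if rel_type is not None and rtype != rel_type:
            continue
        if names[start] == node_name:
            rows.append((names[start], rtype, names[end]))
        elif names[end] == node_name:
            rows.append((names[end], rtype, names[start]))
    return rows

for row in match("Yoda"):
    print(row)
# ('Yoda', 'TEACHES', 'Obi Wan Kenobi')
# ('Yoda', 'TEACHES', 'Luke Skywalker')
# ('Yoda', 'LIVED_ON', 'Dagobah')
```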
Google BigTable
• White Paper published in 2006
• Many databases, such as HBase, are based upon BigTable
• 13 pages, readable for many non-techies
• Offers insight into the early days of NoSQL
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
HBase
Large-scale, column-oriented database
Consistency, Performance, Fault-Tolerant, ACID via Locking
Tables are created before initial data is added
Tables have:
row keys – indexed row identifier strings
column families – each contains one or more columns
timestamps for version control
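The HBase data model can be pictured as a nested map: row key → column family → column qualifier → timestamp → value. This hypothetical Python sketch uses nested dicts to show that column families are fixed when the table is created while columns within a family are not, that a cell keeps timestamped versions, and that a read returns the newest version. It keeps every version (real HBase keeps a configurable number) and does not model sorting or on-disk storage; the author values are made up.

```python
import itertools

class Table:
    """Toy model of an HBase table: {row_key: {family: {qualifier: {timestamp: value}}}}."""

    def __init__(self, *families):
        self.families = set(families)         # column families are declared up front
        self.rows = {}
        self._clock = itertools.count(1)      # stand-in for HBase cell timestamps

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        cell = self.rows.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})
        cell[next(self._clock)] = value       # older versions are kept, not overwritten

    def get(self, row, column):
        """Return the newest version of a cell, or None if it was never written."""
        family, qualifier = column.split(":", 1)
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        return versions[max(versions)] if versions else None

wiki = Table("text", "revision")
wiki.put("first page", "revision:author", "alice")
wiki.put("first page", "revision:author", "bob")           # second version of the same cell
print(wiki.get("first page", "revision:author"))           # bob
print(len(wiki.rows["first page"]["revision"]["author"]))  # 2
print("text" in wiki.rows["first page"])                   # False: unused family stores nothing
```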
HBase
The row key unifies a row's column families.
If a row does not insert values into a column family, no disk space is used within that column family.
Write-Ahead Logging (WAL), similar to file system journaling
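Write-ahead logging means every change is appended to a durable log before it is applied to the in-memory store, so a crash can be recovered by replaying the log. The Python sketch below shows the idea with a hypothetical `wal.log` file of JSON lines in a temporary directory; HBase's actual WAL format, sync policy and log splitting are far more involved.

```python
import json
import os
import tempfile

class WalStore:
    """Key-value store that logs each write before applying it (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memstore = {}
        self._replay()                           # recover state from any existing log

    def _replay(self):
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    entry = json.loads(line)
                    self.memstore[entry["key"]] = entry["value"]

    def put(self, key, value):
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())               # 1. make the log entry durable
        self.memstore[key] = value               # 2. only then apply it in memory

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = WalStore(log_path)
store.put("first page", "text v1")
store.put("first page", "text v2")

# Simulate a crash: the in-memory state is lost, only the log survives.
del store
recovered = WalStore(log_path)
print(recovered.memstore)   # {'first page': 'text v2'}
```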
HBase CRUD
create 'wiki_table', 'text_column_family', 'revision_column_family'
create 'wiki', 'text', 'revision'
put 'wiki', 'first page', 'text:', '…'
put 'wiki', 'first page', 'revision:author', '…'
get 'wiki', 'first page', ['revision:author', 'revision:comment']
delete 'wiki', 'first page', 'revision:author'
scan 'wiki'    (equivalent to SELECT * FROM wiki)
MongoDB Simple Database
http://media.mongodb.org/zips.json
{"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"}
{"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"}
{"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"}
{"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"}
{"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"}
{"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"}
{"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"}
{"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"}
{"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"}
{"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"}
{"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"}
{"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"}
{"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"}
{"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"}
{"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"], "state": "UT", "version": "2.1", "_id": "84321"}
{"city": "Olivehurst, CA", "additionally": ["Arboga, CA", "Plumas Lake, CA", "West Linda, CA"], "state": "CA", "version": "2.1", "_id": "95961"}
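Because documents in one collection need not share a schema, queries simply skip documents that lack a field. In the mongo shell the first query below would be `db.zips.find({state: "AL", pop: {$gt: 10000}})`. This Python sketch evaluates the same kind of query document over a few of the slide's records (the `loc` field is omitted for brevity) with a hypothetical `matches` helper that understands only plain equality, `$gt` and `$exists`; it is not MongoDB's query engine.

```python
zips = [
    {"city": "ACMAR", "pop": 6055, "state": "AL", "_id": "35004"},
    {"city": "ADAMSVILLE", "pop": 10616, "state": "AL", "_id": "35005"},
    {"city": "KEYSTONE", "pop": 14218, "state": "AL", "_id": "35007"},
    {"city": "NEW SITE", "pop": 19942, "state": "AL", "_id": "35010"},
    {"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"],
     "state": "UT", "version": "2.1", "_id": "84321"},
]

def matches(doc, query):
    """Check one document against a query such as {"pop": {"$gt": 10000}} (hypothetical helper)."""
    for field, cond in query.items():
        if isinstance(cond, dict):
            if "$exists" in cond and (field in doc) != cond["$exists"]:
                return False
            if "$gt" in cond and not (field in doc and doc[field] > cond["$gt"]):
                return False
        elif doc.get(field) != cond:
            return False
    return True

def find(collection, query):
    """Return the city of every matching document."""
    return [doc["city"] for doc in collection if matches(doc, query)]

print(find(zips, {"state": "AL", "pop": {"$gt": 10000}}))  # ['ADAMSVILLE', 'KEYSTONE', 'NEW SITE']
print(find(zips, {"additionally": {"$exists": True}}))      # ['Logan, UT']
print(find(zips, {"pop": {"$gt": 0}}))  # the Logan document has no "pop", so it is skipped
```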
Cassandra Characteristics
Scalable, High-availability Wide-columnar datastore
Peer-to-peer rather than master-slave clusters
Tunable consistency: reads and writes can require a single node, a quorum of nodes, or all nodes
Recommends static and dynamic column families
Static column families contain predefined columns
Contact Info: phone, address, email, web
Dynamic column families have variable numbers of similar columns
Students enrolled in a course
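Tunable consistency comes down to counting replicas. Cassandra computes a quorum as `replication_factor / 2 + 1` (integer division); in the simple model sketched here, if the replicas acknowledging a write (W) plus the replicas consulted on a read (R) exceed the replication factor (N), every read overlaps the latest write. The Python functions below only do that arithmetic for the ONE, QUORUM and ALL levels; they do not talk to a cluster.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")

def read_sees_latest_write(read_level, write_level, replication_factor):
    """R + W > N means the read and write replica sets must overlap."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor

N = 3
for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_sees_latest_write(read_level, write_level, N))
# ONE ONE False        fast, but a read may hit a stale replica
# QUORUM QUORUM True   2 + 2 > 3
# ONE ALL True         1 + 3 > 3, but a single down node blocks writes
```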
Cassandra CRUD
http://www.datastax.com/docs/0.8/references/cql
http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt
Cassandra CRUD
No JOIN operations or FOREIGN KEYS
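Without JOINs or foreign keys, Cassandra data is usually modeled around the queries, writing the same fact to one table per access pattern. As a hypothetical Python sketch (the table names `students_by_course` and `courses_by_student`, and the sample students and courses, are invented for illustration), enrolling a student writes to two dict-based "tables" so that each lookup is a single keyed read rather than a join.

```python
from collections import defaultdict

# Each dict plays the role of a table whose partition key is the dict key.
students_by_course = defaultdict(set)
courses_by_student = defaultdict(set)

def enroll(student, course):
    """Write the same enrollment to both query tables (application-side denormalization)."""
    students_by_course[course].add(student)
    courses_by_student[student].add(course)

def drop(student, course):
    """Deletes must also touch both tables, or they drift apart."""
    students_by_course[course].discard(student)
    courses_by_student[student].discard(course)

enroll("Erin", "CS101")
enroll("Erin", "DB200")
enroll("Sam", "DB200")
drop("Erin", "CS101")

print(sorted(students_by_course["DB200"]))  # ['Erin', 'Sam']
print(sorted(courses_by_student["Erin"]))   # ['DB200']
```

The price of this design is that the application, not the database, keeps the copies in step.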