Graph Neo4j
Databases
[Figure: information connectivity. Labels: text documents, hypertext, feeds, blogs, UGC, wikis, tagging, folksonomies, RDFa]
Data Is Increasingly Semi-Structured:
• If you tried to collect all the data of every
movie ever made, how would you model it?
• Actors, Characters, Locations, Dates, Costs,
Ratings, Showings, Ticket Sales, etc.
Architecture Changes Over Time
• 1980’s: Single Application
– Diagram labels: Application, DB
• 1990’s: Integration Database Antipattern
– Diagram labels: DB
• 2000’s: SOA
– Diagram labels: DB, DB, DB
Side note: RDBMS performance
[Figure labels: Salary list, Social Network, Location-based services]
NOSQL
Not Only SQL
Less than 10% of the NOSQL Vendors
Key Value Stores
• Came from a research article written by
Amazon (Dynamo)
– Global Distributed Hash Table
• Global collection of key value pairs
Four NOSQL Categories
Key Value Stores
• Most are based on Dynamo: Amazon’s Highly
Available Key-Value Store
• Data Model:
– Global key-value mapping
– Big scalable HashMap
– Highly fault tolerant (typically)
• Examples:
– Redis, Riak, Voldemort
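
The "big scalable HashMap" idea can be sketched in a few lines. This is only an illustration of the data model, not how Redis, Riak, or Voldemort work internally; the class name `TinyKeyValueStore` and the fixed partition count are assumptions made for the example.

```python
import hashlib


class TinyKeyValueStore:
    """Toy key-value store: keys are hashed onto a fixed set of partitions."""

    def __init__(self, partitions=4):
        # Each dict stands in for one machine in a distributed hash table.
        self._partitions = [dict() for _ in range(partitions)]

    def _partition_for(self, key):
        # A stable hash, so the same key always lands on the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._partitions[int(digest, 16) % len(self._partitions)]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        return self._partition_for(key).get(key, default)

    def delete(self, key):
        self._partition_for(key).pop(key, None)


store = TinyKeyValueStore()
store.put("user:42", {"name": "Neo"})
print(store.get("user:42"))
```

The only operations are put, get, and delete by key, which is why the model is simple to scale and awkward for complex, interrelated data.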
Key Value Stores: Pros and Cons
• Pros:
– Simple data model
– Scalable
• Cons:
– Poor for complex data
Column Family
• Most are based on Bigtable, Google’s distributed
storage system for structured data
• Data Model:
– A big table, with column families
• Every row can have its own schema
• Helps capture more “messy” data
– MapReduce for querying/processing
• Examples:
– HBase, Hypertable, Cassandra
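
A rough way to picture "every row can have its own schema" is a nested map: row key, then column family, then column. The sketch below is an in-memory illustration only, not the HBase or Cassandra API; the row keys, family names, and values are made up for the example.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

# Row "movie:1" uses two column families.
table["movie:1"]["info"]["title"] = "The Matrix"
table["movie:1"]["info"]["year"] = 1999
table["movie:1"]["sales"]["tickets"] = 1000  # made-up number

# Row "movie:2" has fewer columns and no "sales" family at all.
table["movie:2"]["info"]["title"] = "Untitled Project"  # made-up title


def columns_of(row_key):
    """Return the columns actually present in a row, grouped by family."""
    return {family: sorted(cols) for family, cols in table[row_key].items()}


print(columns_of("movie:1"))
print(columns_of("movie:2"))
```

Rows only store the columns they actually have, which is what makes the model tolerant of "messy" data.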
Column Family: Pros and Cons
• Pros:
– Supports Semi-Structured Data
– Naturally Indexed (columns)
– Scalable
• Cons:
– Poor for interconnected data
Document Databases
• Inspired by Lotus Notes
– A collection of key-value pair collections (called
documents)
• Data Model:
– A collection of documents
– A document is a key value collection
– Index-centric, lots of MapReduce
• Examples:
– CouchDB, MongoDB
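
The document model above can be sketched as a map from document id to a key-value collection, plus an index on one field. This is a toy illustration, not the CouchDB or MongoDB API; `TinyCollection` and its methods are hypothetical names.

```python
from collections import defaultdict


class TinyCollection:
    """Toy document collection with one secondary index (illustrative only)."""

    def __init__(self, indexed_field):
        self._docs = {}                  # document id -> document (a dict)
        self._indexed_field = indexed_field
        self._index = defaultdict(set)   # field value -> ids of matching documents

    def insert(self, doc_id, document):
        # Drop the stale index entry if this id is being overwritten.
        old = self._docs.get(doc_id)
        if old is not None and self._indexed_field in old:
            self._index[old[self._indexed_field]].discard(doc_id)
        self._docs[doc_id] = document
        if self._indexed_field in document:
            self._index[document[self._indexed_field]].add(doc_id)

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, value):
        # Index lookup: no scan over the whole collection.
        return [self._docs[i] for i in sorted(self._index.get(value, ()))]


movies = TinyCollection(indexed_field="year")
movies.insert("m1", {"title": "The Matrix", "year": 1999})
movies.insert("m2", {"title": "The Matrix Reloaded", "year": 2003, "sequel": True})
print(movies.find(1999))
```

Documents in the same collection need not share fields, and lookups are fast only by id or by an indexed field.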
Document Databases: Pros and Cons
• Pros:
– Simple, powerful data model
– Scalable
• Cons:
– Poor for interconnected data
– Query model limited to keys and indexes
– MapReduce for larger queries
Graph Databases
• Data Model:
– Nodes and Relationships
• Examples:
– Neo4j, OrientDB, InfiniteGraph, AllegroGraph
Graph Databases: Pros and Cons
• Pros:
– Powerful data model, as general as RDBMS
– Connected data locally indexed
– Easy to query
• Cons:
– Sharding (lots of people working on this)
• Scales up reasonably well
– Requires rewiring your brain
What are graphs good for?
• Recommendations
• Business intelligence
• Social computing
• Geospatial
• Systems management
• Web of things
• Genealogy
• Time series data
• Product catalogue
• Web analytics
• Scientific computing (especially bioinformatics)
• Indexing your slow RDBMS
• And much more!
What is a Graph?
• An abstract representation of a set of objects
where some pairs are connected by links.
• Pseudo Graph
• Multi Graph
• Hyper Graph
More Kinds of Graphs
• Weighted Graph
• Labeled Graph
• Property Graph
What is a Graph Database?
• A database with an explicit graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of
a local step (or hop) remains the same
• Plus an Index for lookups
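
The points above can be sketched with plain objects: each node holds direct references to its neighbors, and a separate index finds a starting node by name. This is a minimal in-memory illustration, not Neo4j's storage engine or API; `Node`, `create_node`, and `index` are names invented for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []   # direct references to adjacent nodes

    def connect(self, other):
        # Store the link on both ends so either node can reach the other.
        self.neighbors.append(other)
        other.neighbors.append(self)


index = {}   # name -> node, used only to find a starting point


def create_node(name):
    node = Node(name)
    index[name] = node
    return node


neo = create_node("Neo")
morpheus = create_node("Morpheus")
trinity = create_node("Trinity")
neo.connect(morpheus)
neo.connect(trinity)

start = index["Neo"]                       # one index lookup
names = [n.name for n in start.neighbors]  # one hop, no global scan
print(names)
```

A hop touches only the neighbors of the current node, so its cost depends on that node's degree rather than on the total number of nodes in the graph.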
Relational Databases
Graph Databases
Neo4j Tips
• Each entity table is represented by a label on
nodes
• Each row in an entity table is a node
• Columns on those tables become node
properties.
• Join tables are transformed into relationships,
columns on those tables become relationship
properties
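
The mapping rules above can be sketched on a tiny data set. The tables, column names, and the `ACTED_IN` relationship type are illustrative assumptions, and the result is built as plain Python structures rather than through a Neo4j driver.

```python
# Entity tables: table name -> rows.
tables = {
    "Person": [
        {"id": 1, "name": "Keanu Reeves"},
        {"id": 2, "name": "Carrie-Anne Moss"},
    ],
    "Movie": [
        {"id": 10, "title": "The Matrix"},
    ],
}

# Join table linking people to movies, with one extra column.
acted_in = [
    {"person_id": 1, "movie_id": 10, "role": "Neo"},
    {"person_id": 2, "movie_id": 10, "role": "Trinity"},
]

# Each entity table becomes a label; each row becomes a node;
# the non-key columns become node properties.
nodes = {}
for label, rows in tables.items():
    for row in rows:
        properties = {k: v for k, v in row.items() if k != "id"}
        nodes[(label, row["id"])] = {"label": label, "properties": properties}

# Each join-table row becomes a relationship; its extra columns
# become relationship properties.
relationships = []
for row in acted_in:
    relationships.append({
        "start": ("Person", row["person_id"]),
        "type": "ACTED_IN",
        "end": ("Movie", row["movie_id"]),
        "properties": {"role": row["role"]},
    })

print(len(nodes), len(relationships))
```

The foreign keys of the join table disappear as data: they survive only as the start and end of each relationship.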
Node in Neo4j
Relationships in Neo4j
• Relationships between nodes are a key part of
Neo4j.
Twitter and relationships
Properties
• Both nodes and relationships can have
properties.
• Properties are key-value pairs where the key is
a string.
• Property values can be either a primitive or an
array of one primitive type.
• For example, String, int and int[] values are
valid for properties.
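
The property rules above can be expressed as a small check. This sketch maps the rule onto Python types (`bool`, `int`, `float`, `str`) as a stand-in for the primitive types the slide refers to; that mapping, the function name `is_valid_property`, and the decision to reject empty lists (their element type cannot be inferred here) are assumptions of the example, not Neo4j behavior.

```python
PRIMITIVE_TYPES = (bool, int, float, str)


def is_valid_property(key, value):
    """Check a key/value pair against the rules listed above."""
    if not isinstance(key, str):
        return False                      # keys must be strings
    if type(value) in PRIMITIVE_TYPES:
        return True                       # a single primitive value
    if isinstance(value, list) and value:
        element_type = type(value[0])
        # An array must hold exactly one primitive type.
        return element_type in PRIMITIVE_TYPES and all(
            type(v) is element_type for v in value
        )
    return False                          # nested maps, mixed arrays, etc.


print(is_valid_property("name", "Neo"))
print(is_valid_property("scores", [1, 2, 3]))
print(is_valid_property("mixed", [1, "a"]))
```

Nested structures are rejected, which is why richer data is usually modeled as additional nodes and relationships rather than as a property value.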
Paths in Neo4j
• A path is one or more nodes with connecting
relationships, typically retrieved as a query or
traversal result.
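
A path can be pictured as an alternating sequence of nodes and relationships. The sketch below finds one with a breadth-first search over a hand-written relationship list; the characters and `KNOWS` relationships are illustrative data, and `shortest_path` is a hypothetical helper, not a Neo4j traversal call.

```python
from collections import deque

# (start node, relationship type, end node)
relationships = [
    ("Neo", "KNOWS", "Morpheus"),
    ("Morpheus", "KNOWS", "Trinity"),
    ("Morpheus", "KNOWS", "Cypher"),
    ("Cypher", "KNOWS", "Agent Smith"),
]


def shortest_path(start, goal):
    """Breadth-first search following relationships in their stored direction."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == goal:
            return path
        for source, rel_type, target in relationships:
            if source == current and target not in visited:
                visited.add(target)
                # Extend the path with the relationship and the node it leads to.
                queue.append(path + [rel_type, target])
    return None   # no path exists


print(shortest_path("Neo", "Agent Smith"))
```

A path of length zero is just the start node itself, which matches the "one or more nodes" wording above.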
Starting and Stopping
Creating a small graph
Print the data
Remove the data
The Matrix Graph Database