More Details On Data Models
Relationships
• Aggregates are useful in that they put together
data that is commonly accessed together.
• But there are still lots of cases where data that’s
related is accessed differently.
• An important aspect of relationships between
aggregates is how they handle updates.
• If you update multiple aggregates at once, you
have to handle a failure partway through
yourself.
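The last point can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store in which each key holds one aggregate and there is no transaction spanning two keys; the names `store` and `close_order` and the simulated failure are illustrative only, not part of any real database API.

```python
import copy

# Hypothetical in-memory aggregate store: one key per aggregate.
store = {
    "order:1": {"status": "open", "customer": "c1"},
    "customer:c1": {"open_orders": 1},
}

def close_order(store, order_key, customer_key, fail_midway=False):
    """Update two aggregates; undo the first update if the second fails."""
    previous_status = store[order_key]["status"]
    store[order_key]["status"] = "closed"            # first aggregate updated
    try:
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        store[customer_key]["open_orders"] -= 1      # second aggregate updated
    except RuntimeError:
        # No cross-aggregate transaction exists here, so the application
        # compensates by restoring the first aggregate itself.
        store[order_key]["status"] = previous_status
        return False
    return True

failed_result = close_order(store, "order:1", "customer:c1", fail_midway=True)
state_after_failure = copy.deepcopy(store)   # snapshot for inspection
ok_result = close_order(store, "order:1", "customer:c1")
```

After the simulated failure both aggregates are back in their original state; only the second call leaves the order closed and the counter decremented.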
Graph Databases
• Graph databases are an odd fish in the NoSQL pond.
• Most NoSQL databases were inspired by the need
to run on clusters, which led to aggregate-oriented
data models of large records with simple
connections.
• Graph databases are motivated by a different
frustration with relational databases and thus have
an opposite model—small records with complex
interconnections
Graph Databases
• “Graph” here does not mean a bar chart or histogram; instead, we
refer to a graph data structure of nodes connected
by edges.
• The figure shows a web of information whose nodes
are very small (nothing more than a name) but there
is a rich structure of interconnections between them.
• With this structure, we can ask questions such as
“find the books in the Databases category that are
written by someone whom a friend of mine likes.”
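The example question can be sketched over a tiny in-memory graph. This is only an illustration: the edge labels (`friend`, `likes`, `wrote`, `category`), the people, and the book titles are all made up for the example, and a real graph database would offer its own query language rather than nested loops.

```python
# Hypothetical edges stored as (source, label, target) triples.
edges = [
    ("Me", "friend", "Anna"),
    ("Me", "friend", "Barbara"),
    ("Anna", "likes", "Carol"),
    ("Barbara", "likes", "Dave"),
    ("Carol", "wrote", "Book A"),
    ("Carol", "wrote", "Book B"),
    ("Dave", "wrote", "Book C"),
    ("Eve", "wrote", "Book D"),
    ("Book A", "category", "Databases"),
    ("Book B", "category", "Cooking"),
    ("Book C", "category", "Databases"),
    ("Book D", "category", "Databases"),
]

def targets(source, label):
    """Return every node reached from `source` along edges with `label`."""
    return {t for s, l, t in edges if s == source and l == label}

def database_books_by_authors_my_friends_like(me):
    """Books in 'Databases' written by someone a friend of `me` likes."""
    books = set()
    for friend in targets(me, "friend"):
        for person in targets(friend, "likes"):
            for book in targets(person, "wrote"):
                if "Databases" in targets(book, "category"):
                    books.add(book)
    return books

result = database_books_by_authors_my_friends_like("Me")
```

With this data the result contains "Book A" and "Book C": "Book B" is in another category, and the author of "Book D" is not liked by any friend.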
Graph Databases
• Graph databases specialize in capturing this sort of
information—but on a much larger scale than a
readable diagram could capture
• The fundamental data model of a graph database is
very simple: nodes connected by edges (also called
arcs).
• Beyond this essential characteristic there is a lot of
variation in data models—in particular, what
mechanisms you have to store data in your nodes
and edges.
Graph Databases
• A quick sample of some current capabilities
illustrates this variety of possibilities:
– FlockDB is simply nodes and edges with no
mechanism for additional attributes
– Neo4j allows you to attach Java objects as
properties to nodes and edges in a schemaless
fashion
– InfiniteGraph stores your Java objects, which are
subclasses of its built-in types, as nodes and edges.
Graph Databases
• Once you have built up a graph of nodes and edges,
a graph database allows you to query that network
with query operations designed with this kind of
graph in mind
• An important difference between relational and graph
databases:
– Relational databases can implement relationships using
foreign keys, but the joins required to navigate around can get
quite expensive.
– Graph databases make traversal along the relationships
very cheap. A large part of this is because graph databases
shift most of the work of navigating relationships from
query time to insert time.
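One way to picture the shift from query time to insert time is the sketch below. It is a simplified assumption about the idea, not a description of how any particular graph database is implemented; `TinyGraph` and its methods are hypothetical names.

```python
from collections import defaultdict

class TinyGraph:
    """Keeps a flat edge table and an adjacency map built at insert time."""

    def __init__(self):
        self.edge_rows = []                  # flat "table" of (source, target)
        self.adjacency = defaultdict(set)    # maintained on every insert

    def add_edge(self, source, target):
        self.edge_rows.append((source, target))
        self.adjacency[source].add(target)   # extra work paid at insert time

    def neighbors_by_scan(self, node):
        # Join-style lookup: every row is examined at query time.
        return {t for s, t in self.edge_rows if s == node}

    def neighbors_by_traversal(self, node):
        # Traversal: follow the structure prepared at insert time.
        return set(self.adjacency.get(node, ()))

graph = TinyGraph()
graph.add_edge("Anna", "Barbara")
graph.add_edge("Anna", "Carol")
graph.add_edge("Barbara", "Carol")
```

Both lookups return the same neighbors, but the scan touches every edge on each query, while the traversal only reads what was already organized when the edges were inserted.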
Graph Databases
Most of the time you find data by navigating
through the network of edges, with queries such
as “tell me all the things that both Anna and
Barbara like.” You do need a starting place,
however, so usually some nodes can be indexed
by an attribute such as ID. So you might start
with an ID lookup (i.e., look up the people
named “Anna” and “Barbara”) and then start
using the edges. Still, graph databases expect
most of your query work to be navigating
relationships.
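A minimal sketch of this pattern: an index lookup to find the starting nodes, followed by edge navigation. The index, node IDs, and liked items below are invented for illustration.

```python
# Hypothetical attribute index: name -> node ID (the starting place).
name_index = {"Anna": 1, "Barbara": 2}

# Hypothetical "likes" edges: node ID -> set of liked things.
likes = {
    1: {"Book A", "Jazz", "Hiking"},
    2: {"Jazz", "Book A", "Chess"},
}

def liked_by_both(first_name, second_name):
    """Look up both people through the index, then follow their edges."""
    first_id = name_index[first_name]      # index lookup
    second_id = name_index[second_name]    # index lookup
    return likes[first_id] & likes[second_id]   # navigate edges, intersect

common = liked_by_both("Anna", "Barbara")
```

Only the first two steps use the index; everything after that is navigation along edges, which is where graph databases expect most of the work to happen.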
Which Model to Use When
Key Value
– We use it for: storing session information, user
profiles, preferences, shopping cart data.
– We would avoid it: when we need to query data
that has relationships between entities.
Column based
– We use it for: content management systems, blogging
platforms, log aggregation.
– We would avoid it for: systems that are in early
development and whose query patterns are still changing.
Which Model to Use When
Document Based
– We use it for: content management systems, blogging
platforms, web analytics, real-time analytics, e-commerce
applications.
– We would avoid it for: systems that need complex
transactions spanning multiple operations or queries
against varying aggregate structures.
Graph Based
– It is well suited for : connected data, such as social
networks, spatial data, routing information for goods and
supply.
Schemaless Databases
• A key-value store allows you to store any data you like under a
key
• Document databases make no restrictions on the structure of
the documents you store
• Column-family databases allow you to store any data under
any column you like
• Graph databases allow you to freely add new edges and freely
add properties to nodes and edges as you wish
Schemaless Databases
• NoSQL lets you easily change the data store
as you learn more about the project.
• NoSQL lets you add new things and stop
adding things that are no longer needed.
• A schemaless store also makes it easier to deal with nonuniform data:
data where each record has a different set of
fields.
Pros and cons of schemaless data
• Pros:
– More freedom and flexibility
– You can easily change your data organization
– You can deal with non-uniform data
• Cons:
– A program that accesses data:
• almost always relies on some form of implicit schema
• assumes that certain fields are present
– The implicit schema is shifted into the application code that accesses
data
• To understand what data is present, you have to look at the application code
– The schema cannot be used to:
• decide how to store and retrieve data efficiently
• ensure data consistency
– Problems if multiple applications, developed by different people, access
the same database.
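The implicit schema can be seen in a small sketch. The records and field names below are hypothetical; the point is only that the reading code, not the database, decides which fields must exist.

```python
# Nonuniform records: the store accepts both, although their fields differ.
profiles = [
    {"id": 1, "name": "Anna", "email": "anna@example.com"},
    {"id": 2, "name": "Barbara", "phone": "555-0100"},   # no "email" field
]

def email_strict(profile):
    # Implicit schema: assumes every record has an "email" field.
    return profile["email"]

def email_tolerant(profile):
    # Makes the assumption explicit by allowing the field to be absent.
    return profile.get("email")

def try_strict(profile):
    """Show what happens when the implicit assumption does not hold."""
    try:
        return email_strict(profile)
    except KeyError:
        return "missing field"

strict_results = [try_strict(p) for p in profiles]
tolerant_results = [email_tolerant(p) for p in profiles]
```

Nothing in the stored data says that `email` is expected; a second application written by someone else would have to read this code to find that out.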
Schemaless Database
• A schemaless database shifts the schema into the
application code that accesses it. Two approaches
reduce the problems this causes when several
applications share a database:
– Encapsulate all database interaction within a single
application and integrate it with other applications
using web services. This fits in well with many
people’s current preference for using web services
for integration.
– Clearly define different areas of an aggregate for
access by different applications. These could be
different sections in a document database or
different column families in a column-family
database.
Schemaless Database
• Schemalessness does have a big impact on
changes to a database’s structure over time,
particularly for more uniform data.
• You have to exercise control when changing how
you store data in a schemaless database so that
you can easily access both old and new data.
• The flexibility that schemalessness gives you
only applies within an aggregate—if you need
to change your aggregate boundaries, the
migration is every bit as complex as it is in the
relational case.
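Keeping both old and new data accessible usually means the reading code must understand every format still in the store. A minimal sketch, assuming a hypothetical change in which a single `name` field was later split into `first_name` and `last_name`:

```python
# Hypothetical records written before and after a change in data layout.
old_record = {"id": 1, "name": "Anna Smith"}
new_record = {"id": 2, "first_name": "Barbara", "last_name": "Jones"}

def read_name(record):
    """Return (first, last) for records in either the old or the new layout."""
    if "first_name" in record:
        # New layout: the name is already split into two fields.
        return record["first_name"], record["last_name"]
    # Old layout: split the single "name" field at the first space.
    first, _, last = record["name"].partition(" ")
    return first, last

old_name = read_name(old_record)
new_name = read_name(new_record)
```

The database accepts both layouts without complaint, so the discipline of handling them lives entirely in code such as `read_name`.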
Materialized Views
• A relational view is a table defined by computation over the
base tables
• Materialized views: computed in advance and cached on
disk
• NoSQL databases:
– do not have views
– have precomputed and cached queries, usually called
“materialized views”
• Strategies for building a materialized view
– Eager approach
• The materialized view is updated at the same time as the base data.
Good when you have more frequent reads than writes.
– Detached approach
• Batch jobs update the materialized views at regular intervals. Good when
you don’t want to pay an overhead on each update.
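The two strategies can be contrasted in a short sketch. The `OrderStore` class, its `eager` flag, and the per-product totals are assumptions made for illustration; in practice the detached refresh would be triggered by a scheduled batch job rather than called by hand.

```python
from collections import Counter

class OrderStore:
    """Base data (orders) plus a materialized view (totals per product)."""

    def __init__(self, eager):
        self.eager = eager
        self.orders = []                     # base data
        self.totals_by_product = Counter()   # materialized view

    def add_order(self, product, quantity):
        self.orders.append((product, quantity))
        if self.eager:
            # Eager: update the view in the same step as the base data.
            self.totals_by_product[product] += quantity

    def refresh_view(self):
        # Detached: recompute the whole view from the base data.
        totals = Counter()
        for product, quantity in self.orders:
            totals[product] += quantity
        self.totals_by_product = totals

eager_store = OrderStore(eager=True)
detached_store = OrderStore(eager=False)
for s in (eager_store, detached_store):
    s.add_order("book", 2)
    s.add_order("book", 3)

stale_total = detached_store.totals_by_product["book"]   # view not yet refreshed
detached_store.refresh_view()
fresh_total = detached_store.totals_by_product["book"]
```

The eager store pays a little on every write and always reads fresh totals; the detached store writes cheaply but serves stale totals until the next refresh.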
Modeling for Data Access
When modeling data aggregates, we need to consider how the data is going to be read as
well as the side effects on data related to those aggregates.