More Details On Data Models

Graph databases are suited for data with complex interconnected relationships. They model data as nodes connected by edges, allowing for efficient traversal of relationships. In contrast, most NoSQL databases use a simpler aggregate-oriented model with references between large records. Materialized views can pre-compute and cache queries to provide alternative structures for accessing data organized in aggregates.

Uploaded by

chitraalavani

More Details on Data Models

Relationships
• Aggregates are useful in that they put together
data that is commonly accessed together.
• But there are still lots of cases where data that’s
related is accessed differently.
• An important aspect of relationships between
aggregates is how they handle updates.
• If you update multiple aggregates at once, you
have to deal yourself with a failure partway
through.
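The last point can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory store (a plain dict) in which each single write is atomic but a pair of writes is not; the key names, the `save_order` helper and the `fail_midway` flag are invented for illustration and do not belong to any real database API.

```python
# Hypothetical in-memory aggregate store: each key maps to one aggregate.
store = {
    "customer:1": {"name": "Anna", "order_ids": []},
}

def save_order(store, customer_key, order_key, order, fail_midway=False):
    """Write an Order aggregate, then add its id to the Customer aggregate.

    The two writes are not atomic as a pair, so the function undoes the
    first write by hand if the second one fails.
    """
    store[order_key] = order  # first aggregate written
    try:
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        store[customer_key]["order_ids"].append(order["id"])  # second aggregate
    except Exception:
        del store[order_key]  # compensate: remove the orphaned order
        raise

# Normal case: both aggregates are updated.
save_order(store, "customer:1", "order:99", {"id": 99, "total": 25})

# Failure case: the second write never happens, so the first is rolled back.
try:
    save_order(store, "customer:1", "order:100", {"id": 100, "total": 5},
               fail_midway=True)
except RuntimeError:
    pass  # the caller decides what to do next; the store is consistent again
```

The cleanup in the `except` branch is the part the application has to write itself; a relational database would normally handle it through a transaction.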
Graph Databases
• Graph databases are an odd fish in the NoSQL pond.
• Most NoSQL databases were inspired by the need
to run on clusters, which led to aggregate-oriented
data models of large records with simple
connections.
• Graph databases are motivated by a different
frustration with relational databases and thus have
an opposite model—small records with complex
interconnections
Graph Databases
• Graph isn’t a bar chart or histogram; instead, we
refer to a graph data structure of nodes connected
by edges.
• The figure shows a web of information whose nodes
are very small (nothing more than a name) but with
a rich structure of interconnections between them.
• With this structure, we can ask questions such as
“find the books in the Databases category that are
written by someone whom a friend of mine likes.”
Graph Databases
• Graph databases specialize in capturing this sort of
information—but on a much larger scale than a
readable diagram could capture
• The fundamental data model of a graph database is
very simple: nodes connected by edges (also called
arcs).
• Beyond this essential characteristic there is a lot of
variation in data models—in particular, what
mechanisms you have to store data in your nodes
and edges.
Graph Databases
• A quick sample of some current capabilities
illustrates this variety of possibilities:
– FlockDB is simply nodes and edges with no
mechanism for additional attributes
– Neo4J allows you to attach Java objects as
properties to nodes and edges in a schemaless
fashion
– Infinite Graph stores your Java objects, which are
subclasses of its built-in types, as nodes and edges.
Graph Databases
• Once you have built up a graph of nodes and edges,
a graph database allows you to query that network
with query operations designed with this kind of
graph in mind
• Important difference between relational and graph
databases:
– Relational databases can implement relationships using
foreign keys, but the joins required to navigate around can get
quite expensive.
– Graph databases make traversal along the relationships
very cheap. A large part of this is because graph databases
shift most of the work of navigating relationships from
query time to insert time.
Graph Databases
Most of the time you find data by navigating
through the network of edges, with queries such
as “tell me all the things that both Anna and
Barbara like.” You do need a starting place,
however, so usually some nodes can be indexed
by an attribute such as ID. So you might start
with an ID lookup (i.e., look up the people
named “Anna” and “Barbara”) and then start
using the edges. Still, graph databases expect
most of your query work to be navigating
relationships.
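The lookup-then-traverse pattern described above can be sketched in plain Python. This is only an illustration with dictionaries, not the API of any graph database; the node ids, the `likes` edge label and the book names are made up.

```python
# Hypothetical toy graph: node id -> {edge label -> set of target node ids}.
edges = {
    "person_1": {"likes": {"book_a", "book_b"}},
    "person_2": {"likes": {"book_b", "book_c"}},
}

# Index on the "name" attribute: it provides the starting nodes.
name_index = {"Anna": "person_1", "Barbara": "person_2"}

def liked_by(node_id):
    """Follow the outgoing 'likes' edges of one node."""
    return edges.get(node_id, {}).get("likes", set())

def liked_by_both(name_a, name_b):
    """Index lookup for the two starting nodes, then pure edge traversal."""
    start_a = name_index[name_a]
    start_b = name_index[name_b]
    return liked_by(start_a) & liked_by(start_b)

common = liked_by_both("Anna", "Barbara")
```

Only the two name lookups touch an index; everything after that follows edges that were stored when the data was inserted.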
Which Model to Use When
Key Value
– We use it for: storing session information, user
profiles, preferences, shopping cart data.
– We would avoid it: when we need to query data
having relationships between entities.
Column Based
– We use it for: content management systems, blogging
platforms, log aggregation.
– We would avoid it for: systems that are in early
development or whose query patterns are still changing.
Which Model to Use When
Document Based
– We use it for: content management systems, blogging
platforms, web analytics, real-time analytics, e-commerce
applications.
– We would avoid it for: systems that need complex
transactions spanning multiple operations or queries
against varying aggregate structures.
Graph Based
– It is well suited for: connected data, such as social
networks, spatial data, routing information for goods and
supply.
Schemaless Databases
• A key-value store allows you to store any data you like under a
key
• Document databases make no restrictions on the structure of
the documents you store
• Column-family databases allow you to store any data under
any column you like
• Graph databases allow you to freely add new edges and freely
add properties to nodes and edges as you wish
Schemaless Databases
• A schemaless store lets you easily change the data store
as you learn more about the project.
• You can add new things as you discover them and stop
storing things that are no longer needed.
• A schemaless store also makes it easier to deal with nonuniform data
– data where each record has a different set of
fields.
Pros and cons of schemaless data
• Pros:
– More freedom and flexibility
– You can easily change your data organization
– You can deal with non-uniform data
• Cons:
– A program that accesses data:
• almost always relies on some form of implicit schema
• assumes that certain fields are present
– The implicit schema is shifted into the application code that accesses
the data
• To understand what data is present you have to look at the application code
– The schema cannot be used to:
• decide how to store and retrieve data efficiently
• ensure data consistency
– Problems if multiple applications, developed by different people, access
the same database.
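The implicit schema mentioned in the cons can be made visible with a short Python sketch. The records, the field names and the `REQUIRED_FIELDS` set are hypothetical; the point is only that the reading code, not the database, decides which fields must exist.

```python
# Two records in a schemaless store; the second writer used "cost"
# where the first used "price". The store accepts both.
orders = [
    {"id": 1, "customer": "Anna", "price": 30, "quantity": 2},
    {"id": 2, "customer": "Barbara", "cost": 12, "quantity": 1},
]

# The implicit schema, written down in application code (an assumption
# of this sketch, not something the database knows about).
REQUIRED_FIELDS = {"id", "customer", "price", "quantity"}

def missing_fields(record):
    """Return the required fields that this record does not have."""
    return REQUIRED_FIELDS - set(record)

def order_total(record):
    """Compute price * quantity, failing loudly if the record does not fit."""
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"order {record.get('id')} is missing {sorted(missing)}")
    return record["price"] * record["quantity"]
```

Calling `order_total(orders[0])` works, while `order_total(orders[1])` raises a `ValueError`, which is the kind of problem that appears when applications written by different people share one database.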
Schemaless Database
• A schemaless database shifts the schema into the
application code that accesses it, which causes problems
when several applications share the database.
• One remedy: encapsulate all database interaction within a single
application and integrate it with other applications
using web services. This fits in well with many
people’s current preference for using web services
for integration.
• Another remedy: clearly define different areas of an aggregate for
access by different applications. These could be
different sections in a document database or
different column families in a column-family
database.
Schemaless Database
• Schemalessness does have a big impact on
changes to a database’s structure over time,
particularly for more uniform data.
• You have to exercise control when changing how
you store data in a schemaless database so that
you can easily access both old and new data.
• The flexibility that schemalessness gives you
only applies within an aggregate—if you need
to change your aggregate boundaries, the
migration is every bit as complex as it is in the
relational case.
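One way to keep both old and new data readable after a change is to let the reading code accept both shapes. The sketch below assumes a hypothetical rename of an order-line field from `price` to `unit_price`; both names are invented for the example.

```python
def unit_price(order_line):
    """Read the price from either the new or the legacy field name."""
    if "unit_price" in order_line:
        return order_line["unit_price"]  # records written after the change
    return order_line["price"]           # legacy records written before it

old_line = {"product": "book", "price": 30}
new_line = {"product": "book", "unit_price": 32}

prices = [unit_price(old_line), unit_price(new_line)]
```

This only works because the change stays inside one aggregate; moving data across aggregate boundaries would still need a migration.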
Materialized Views
• A relational view is a table defined by computation over the
base tables
• Materialized views: computed in advance and cached on
disk
• NoSQL databases:
– do not have views
– may have precomputed and cached queries, usually called
“materialized views”
• Strategies for building a materialized view
– Eager approach
• the materialized view is updated at the same time as the base data;
good when you have more frequent reads than writes
– Detached approach
• batch jobs update the materialized views at regular intervals; good when
you don’t want to pay an overhead on each update
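The two strategies can be compared in a small Python sketch. The base data is a list of orders and the "view" is a total quantity per product; all names (`add_order`, `rebuild_detached_view`, the dictionaries) are hypothetical, and a real system would run the rebuild as a scheduled batch job rather than a direct call.

```python
from collections import defaultdict

orders = []                       # base data
eager_totals = defaultdict(int)   # view updated on every write
detached_totals = {}              # view rebuilt only by the batch job

def add_order(product, quantity):
    """Write to the base data and update the eager view in the same step."""
    orders.append({"product": product, "quantity": quantity})
    eager_totals[product] += quantity

def rebuild_detached_view():
    """Batch job: recompute the whole view from the base data."""
    totals = defaultdict(int)
    for order in orders:
        totals[order["product"]] += order["quantity"]
    detached_totals.clear()
    detached_totals.update(totals)

add_order("book", 2)
rebuild_detached_view()
add_order("book", 3)   # the detached view is now stale until the next rebuild
```

At this point the eager view reports 5 for `"book"` while the detached view still reports 2: the eager approach pays on every write, the detached approach accepts stale reads between batch runs.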
Modeling for Data Access
When modeling data aggregates, we need to consider how the data is going to be read as
well as the side effects on data related to those aggregates.

• The application can read the customer’s
information and all the related data by
using the key
• If the requirements are to read the
orders or the products sold in each order,
the whole object has to be read and then
parsed on the client side to build the
results.
• When references are needed, we could
switch to document stores and then
query inside the documents, or even
change the data for the key-value store
to split the value object into Customer
and Order objects and then
maintain these objects’ references to
each other.
Modeling for Data Access
We can now find the orders
independently from the Customer, and
with the orderId reference in the
Customer we can find all Orders for the
Customer. Using aggregates this way
allows for read optimization, but we have
to push the orderId reference into
Customer every time a new Order is created.
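The split into Customer and Order objects can be sketched with a dict standing in for the key-value store. The key format (`customer:<id>`, `order:<id>`) and the field names `order_ids` and `customer_id` are assumptions made for this example.

```python
kv = {}  # stands in for a key-value store

def put_customer(customer_id, name):
    """Store a Customer aggregate with an empty list of order references."""
    kv[f"customer:{customer_id}"] = {"id": customer_id, "name": name,
                                     "order_ids": []}

def put_order(order_id, customer_id, items):
    """Store the Order, then push its id into the Customer aggregate."""
    kv[f"order:{order_id}"] = {"id": order_id, "customer_id": customer_id,
                               "items": items}
    kv[f"customer:{customer_id}"]["order_ids"].append(order_id)

def orders_for(customer_id):
    """Follow the order id references held by the Customer."""
    customer = kv[f"customer:{customer_id}"]
    return [kv[f"order:{order_id}"] for order_id in customer["order_ids"]]

put_customer(1, "Anna")
put_order(99, 1, ["book"])
put_order(100, 1, ["pen", "ink"])
```

An Order can now be read directly by its own key, and `orders_for(1)` returns both orders, at the cost of the extra write to the Customer in `put_order`.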
Key Points
• Aggregate-oriented databases make inter-aggregate
relationships more difficult to handle than intra-aggregate
relationships.
• Graph databases organize data into node and edge graphs;
they work best for data that has complex relationship
structures.
• Schemaless databases allow you to freely add fields to records,
but there is usually an implicit schema expected by users of
the data.
• Aggregate-oriented databases often compute materialized
views to provide data organized differently from their primary
aggregates. This is often done with map-reduce computations.
