IAU ST Lecture3

This document discusses data models and query languages, highlighting the differences between relational and document models, as well as the emergence of NoSQL databases. It covers the challenges of object-relational mapping, schema flexibility, and the performance implications of various data models. Additionally, it explores graph-like data models and their associated query languages, emphasizing the evolution and capabilities of different database systems.


Big Data Analytics

Lecture 3
Mohammad Hamzei
Department of Computer Engineering
Islamic Azad University, South Tehran Branch
Data Models and Query Languages
Introduction

• Data models are perhaps the most important
part of developing software, because they have
such a profound effect:
– not only on how the software is written,
– but also on how we think about the problem that we
are solving.
• Most applications are built by layering one data
model on top of another.
– each layer hides the complexity of the layers below
it by providing a clean data model.
Relational Model vs. Document Model

• The best-known data model today is probably
that of SQL, based on the relational model
proposed by Edgar Codd in 1970:
– Data is organized into relations (called tables in SQL).
– Each relation is an unordered collection of tuples
(rows in SQL).
The Birth of NoSQL
• NoSQL is the latest attempt to overthrow the
relational model’s dominance.

• Driving forces behind the adoption of NoSQL databases:
– A need for greater scalability than relational databases can
easily achieve, including very large datasets or very high write
throughput
– Specialized query operations that are not well supported by
the relational model
– Frustration with the restrictiveness of relational schemas, and
a desire for a more dynamic and expressive data model
The Object-Relational Mismatch

• If data is stored in relational tables, an awkward
translation layer is required between the objects
in the application code and the database model
of tables, rows, and columns (the impedance mismatch)

• Object-relational mapping (ORM) frameworks
reduce the amount of boilerplate code required
for this translation layer
One-to-Many Relationship

Example: Resume data model

1- SQL model: put positions, education, and
contact information in separate tables, with a
foreign key reference to the users table
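The multi-table layout can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the schema from the slides: the column names and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one row per user, one row per position or school,
# each child row pointing back at its user through a foreign key.
conn.executescript("""
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE positions (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES users(user_id),
        job_title    TEXT,
        organization TEXT
    );
    CREATE TABLE education (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL REFERENCES users(user_id),
        school_name TEXT
    );
""")

conn.execute("INSERT INTO users VALUES (1, 'Ada', 'Example')")
conn.executemany(
    "INSERT INTO positions (user_id, job_title, organization) VALUES (?, ?, ?)",
    [(1, "Engineer", "Acme"), (1, "Manager", "Initech")],
)
conn.execute(
    "INSERT INTO education (user_id, school_name) VALUES (1, 'Example University')"
)

# Reassembling the resume requires a join (or one query per table).
rows = conn.execute(
    """
    SELECT u.first_name, p.job_title, p.organization
    FROM users AS u
    JOIN positions AS p ON p.user_id = u.user_id
    WHERE u.user_id = ?
    ORDER BY p.id
    """,
    (1,),
).fetchall()
print(rows)
```

Each additional one-to-many item (education, contact information) adds another table and another join or query when the full resume is loaded.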
Example: Resume data model


2- Later versions of the SQL standard added
support for structured datatypes and XML data:
– This allowed multi-valued data to be stored within a
single row, with support for querying and indexing
inside those documents.
– These features are supported to varying degrees by
Oracle, IBM DB2, MS SQL Server, and PostgreSQL.
– A JSON datatype is also supported by several
databases, including IBM DB2, MySQL, and
PostgreSQL.
Example: Resume data model

3- Encode jobs, education, and contact info as a
JSON or XML document and store it in a text column
in the database
– let the application interpret its structure and
content.
– In this setup, you typically cannot use the database
to query for values inside that encoded column.
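A rough sketch of this third option, assuming a hypothetical resumes table with a plain text column named body. Because the database sees only an opaque string here, the filtering has to happen in application code after every row has been fetched and decoded.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the whole resume is serialized into one text column.
conn.execute("CREATE TABLE resumes (user_id INTEGER PRIMARY KEY, body TEXT)")

resume = {"positions": [{"job_title": "Engineer", "organization": "Acme"}]}
conn.execute("INSERT INTO resumes VALUES (?, ?)", (1, json.dumps(resume)))
conn.execute("INSERT INTO resumes VALUES (?, ?)", (2, json.dumps({"positions": []})))


def users_who_worked_at(organization):
    """Scan every row and interpret the encoded column in the application."""
    matches = []
    query = "SELECT user_id, body FROM resumes ORDER BY user_id"
    for user_id, body in conn.execute(query):
        doc = json.loads(body)
        positions = doc.get("positions", [])
        if any(p.get("organization") == organization for p in positions):
            matches.append(user_id)
    return matches


print(users_who_worked_at("Acme"))
```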
Example: Resume data model

• For a data structure like a resume, which is
mostly a self-contained document, a JSON
representation can be quite appropriate

• Document-oriented databases like MongoDB,
RethinkDB, CouchDB, and Espresso support this
data model
Example: Resume data model

• Resume JSON model (document data model):
– The JSON representation has better locality than the
multi-table schema
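As an illustration of what such a document might look like, here is a small sketch using Python's json module. The field names and values are invented for the example and are not taken from the slides.

```python
import json

# Hypothetical resume document: the one-to-many items are nested
# inside the parent record instead of living in separate tables.
resume = {
    "user_id": 1,
    "first_name": "Ada",
    "last_name": "Example",
    "positions": [
        {"job_title": "Engineer", "organization": "Acme"},
        {"job_title": "Manager", "organization": "Initech"},
    ],
    "education": [
        {"school_name": "Example University", "start": 2001, "end": 2005},
    ],
    "contact_info": {"blog": "https://example.com"},
}

# One encode/decode round trip stands in for one write and one read.
encoded = json.dumps(resume)
decoded = json.loads(encoded)

titles = [p["job_title"] for p in decoded["positions"]]
print(titles)
```

Everything needed to render the resume arrives in a single read, which is the locality advantage the slide refers to.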
Many-to-One and Many-to-Many Relationships

• In document databases, joins are not needed for
one-to-many tree structures, and support for
joins is often weak
• If the database itself does not support joins, you
have to emulate a join in application code by
making multiple queries to the database.
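A minimal sketch of such an emulated join. Two in-memory dictionaries stand in for document collections, and the helper get_document is a hypothetical placeholder for one round trip to the database; none of these names come from a real driver.

```python
# Hypothetical "collections", keyed by document id.
organizations = {
    "org1": {"name": "Acme"},
    "org2": {"name": "Initech"},
}
users = {
    "u1": {
        "name": "Ada",
        "positions": [
            {"title": "Engineer", "org_id": "org1"},
            {"title": "Manager", "org_id": "org2"},
        ],
    },
}


def get_document(collection, doc_id):
    """Stand-in for a single query to the database."""
    return collection[doc_id]


def resume_with_org_names(user_id):
    """Resolve each document reference with a follow-up query."""
    user = get_document(users, user_id)  # first query
    result = []
    for position in user["positions"]:
        org = get_document(organizations, position["org_id"])  # one more per reference
        result.append((position["title"], org["name"]))
    return result


print(resume_with_org_names("u1"))
```

The number of round trips grows with the number of references, which is one reason a join inside the database is usually faster.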
Ex. Many-to-Many Relationships
Are Document Databases Repeating History?

• While many-to-many relationships and joins are
routinely used in relational databases, document
databases and NoSQL reopened the debate on
how best to represent such relationships in a
database
• History:
– Hierarchical model (difficulty with many-to-many
relationships)
– Relational model and network model (both proposed
to overcome that limitation)
Document databases

• Document databases reverted to the
hierarchical model in one aspect:
– storing nested records (one-to-many relationships)
within their parent record rather than in a separate
table.
Document databases

• Many-to-one and many-to-many relationships:
– In these cases relational and document databases
are not fundamentally different
– In both cases, the related item is referenced by a
unique identifier, which is called a foreign key in the
relational model and a document reference in the
document model. That identifier is resolved at read
time by using a join or follow-up queries.
Relational Versus Document Data model

• Document data model:
– schema flexibility
– better performance due to locality
– for some applications it is closer to the data
structures used by the application
• Relational data model:
– better support for joins, and for many-to-one
and many-to-many relationships
Relational Versus Document Data model

• If your application does use many-to-many
relationships, the document model becomes less
appealing.

• It’s possible to reduce the need for joins by
denormalizing, but then the application code
needs to do additional work to keep the
denormalized data consistent.
Relational Versus Document Data model

• Joins can be emulated in application code by
making multiple requests to the database
– but that also moves complexity into the application
and is usually slower than a join performed by
specialized code inside the database
Schema flexibility in the document model

• No schema is enforced by the database in the
document model (schema-on-read)
– Arbitrary keys and values can be added to a
document
– When reading, clients have no guarantees as to what
fields the documents may contain.
• Example: changing the format of data
– In the document model:
• Start writing new documents in the new format; the
application handles old documents when they are read
– In the relational model:
• Perform a migration in the database
• Schema changes can be slow and may require downtime
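The document-model side of this example can be sketched as follows. The scenario is an assumption for illustration: older documents store a single name field, newer ones also carry first_name, and the reader fills in the missing field on the fly instead of migrating stored data.

```python
def read_user(doc):
    """Return a copy of the document in the new format (schema-on-read)."""
    user = dict(doc)
    # Old documents (written before the format change) lack first_name,
    # so derive it at read time from the older name field.
    if "first_name" not in user and user.get("name"):
        user["first_name"] = user["name"].split(" ")[0]
    return user


old_doc = {"name": "Ada Example"}                            # old format
new_doc = {"name": "Grace Example", "first_name": "Grace"}   # new format

print(read_user(old_doc)["first_name"])
print(read_user(new_doc)["first_name"])
```

The stored documents are left untouched; only the reading code knows about both formats.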
Schema flexibility in the document model

• The schema-on-read approach is advantageous if
the items in the collection don’t all have the
same structure
– There are many different types of objects
– The structure of the data is determined by external
systems
Data locality in the document model

• If the application often needs to access the
entire document, there is a performance
advantage to storage locality
– The locality advantage only applies if you need large
parts of the document at the same time
– On updates to a document, the entire document
usually needs to be rewritten
– These performance limitations significantly reduce
the set of situations in which document databases
are useful
* The column-family concept in the Bigtable data model (used in
Cassandra and HBase) has a similar purpose of managing locality
Convergence of document and relational databases

• Most relational database systems support XML
and/or JSON data
– including the ability to index and query inside documents

• On the document database side, RethinkDB
supports relational-like joins in its query
language, and some MongoDB drivers
automatically resolve database references
Query Languages for Data

• Declarative languages (e.g., SQL, CSS)
– We just specify the pattern of the data we want
(what conditions the results must meet, and how the
data should be transformed, e.g., sorted, grouped, and
aggregated), but not how to achieve that goal
• Imperative languages (e.g., C/C++, Java)
– An imperative language tells the computer to
perform certain operations in a certain order
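The contrast can be sketched with one small query written both ways. The animals data and the function names are made up for the example; the declarative version runs SQL through Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical sample data: (name, family) pairs.
animals = [("Great white", "Sharks"), ("Lion", "Felidae"), ("Hammerhead", "Sharks")]


def get_sharks_imperative(rows):
    """Imperative: spell out the loop, the test, and the order of operations."""
    sharks = []
    for name, family in rows:
        if family == "Sharks":
            sharks.append(name)
    return sharks


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)", animals)


def get_sharks_declarative():
    """Declarative: state the condition; the database decides how to evaluate it."""
    query = "SELECT name FROM animals WHERE family = 'Sharks'"
    return [name for (name,) in conn.execute(query)]


print(sorted(get_sharks_imperative(animals)))
print(sorted(get_sharks_declarative()))
```

The SQL version says nothing about iteration order or indexes, so the database could add an index or change its execution strategy without the query changing.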
Declarative query language (SQL)

• It hides implementation details of the database
engine
• It is up to the database system’s query optimizer
to decide which indexes and which join methods
to use, and in which order to execute various
parts of the query
• Declarative languages lend themselves to
parallel execution
– They specify only the pattern of the results, not the
algorithm, so the database is free to use a parallel
implementation of the query language
MapReduce Querying

• MapReduce is a programming model for
processing large amounts of data in bulk across
many machines, popularized by Google

• A limited form of MapReduce is supported by
some NoSQL data stores, including MongoDB
and CouchDB, as a mechanism for performing
read-only queries across many documents.
MapReduce model
• MapReduce is a fairly low-level programming
model for distributed execution on a cluster of
machines
• Word-Count Example
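A single-process sketch of the word-count example. It only imitates the three phases (map, group by key, reduce) in plain Python; a real MapReduce framework would run the map and reduce functions on many machines. All function names here are illustrative.

```python
from collections import defaultdict


def map_words(document):
    """Map phase: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce phase: combine all values emitted for one key."""
    return word, sum(counts)


def map_reduce(documents):
    # Shuffle step: group the emitted values by key before reducing.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_words(doc):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


counts = map_reduce(["the quick fox", "the lazy dog", "The end"])
print(counts["the"])
```

Both functions are pure: they only use the data passed in, which is what lets a framework run them anywhere and rerun them on failure.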
MapReduce Querying

• MongoDB’s MapReduce Example:


Graph-Like Data Models

• If your application has mostly one-to-many
relationships (tree-structured data) or no
relationships between records, the document
model is appropriate

• The relational model can handle simple cases of
many-to-many relationships, but as the
connections within your data become more
complex, it becomes more natural to start
modeling your data as a graph.
Graph-Like Data Models

• A graph consists of two kinds of objects:
– vertices (also known as nodes or entities)
– edges (also known as relationships or arcs)

• Examples:
– Social graphs
– The web graph
• PageRank can be used on the web graph to determine the
popularity of a web page and thus its ranking in search
results.
– Road or rail networks
Graph-Like Data Models

• Data structures
– property graph model (implemented by Neo4j, Titan,
and InfiniteGraph)
– triple-store model (implemented by Datomic, …)
• Query languages
– declarative query languages for graphs:
• Cypher, SPARQL, and Datalog
– imperative graph query languages
• Gremlin
– graph processing frameworks
• Pregel
Property Graphs
• Each vertex consists of:
– A unique identifier
– A set of outgoing edges
– A set of incoming edges
– A collection of properties (key-value pairs)
• Each edge consists of:
– A unique identifier
– The vertex at which the edge starts (the tail vertex)
– The vertex at which the edge ends (the head vertex)
– A label to describe the type of relationship
– A collection of properties (key-value pairs)
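The two lists above map fairly directly onto two record types. The following is a minimal in-memory sketch using Python dataclasses; the class names, the connect helper, and the sample vertices are assumptions for illustration, not part of any graph database API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    vertex_id: int                                   # unique identifier
    properties: dict = field(default_factory=dict)   # key-value pairs
    outgoing: list = field(default_factory=list)     # edges that start here
    incoming: list = field(default_factory=list)     # edges that end here


@dataclass(eq=False)
class Edge:
    edge_id: int                                     # unique identifier
    tail: Vertex                                     # vertex where the edge starts
    head: Vertex                                     # vertex where the edge ends
    label: str                                       # type of relationship
    properties: dict = field(default_factory=dict)   # key-value pairs


def connect(edge_id, tail, head, label, **properties):
    """Create an edge and register it on both of its endpoints."""
    edge = Edge(edge_id, tail, head, label, properties)
    tail.outgoing.append(edge)
    head.incoming.append(edge)
    return edge


lucy = Vertex(1, {"name": "Lucy"})
idaho = Vertex(2, {"name": "Idaho", "type": "state"})
born_in = connect(1, lucy, idaho, "born_in")

print(lucy.outgoing[0].head.properties["name"])
```

Because every vertex keeps both its incoming and outgoing edges, the graph can be traversed in either direction.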
Representing a property graph using a relational schema
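The slide's figure is not reproduced here. One plausible layout, sketched with sqlite3, uses a vertices table and an edges table, with properties stored as JSON text and an index on each end of the edge so that both incoming and outgoing edges can be looked up. Table, column, and function names are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (
        vertex_id  INTEGER PRIMARY KEY,
        properties TEXT               -- JSON-encoded key-value pairs
    );
    CREATE TABLE edges (
        edge_id     INTEGER PRIMARY KEY,
        tail_vertex INTEGER REFERENCES vertices (vertex_id),
        head_vertex INTEGER REFERENCES vertices (vertex_id),
        label       TEXT,
        properties  TEXT              -- JSON-encoded key-value pairs
    );
    CREATE INDEX edges_tails ON edges (tail_vertex);
    CREATE INDEX edges_heads ON edges (head_vertex);
""")

conn.executemany(
    "INSERT INTO vertices VALUES (?, ?)",
    [(1, json.dumps({"name": "Lucy"})), (2, json.dumps({"name": "Idaho"}))],
)
conn.execute("INSERT INTO edges VALUES (1, 1, 2, 'born_in', '{}')")


def outgoing(vertex_id):
    """Edges that start at the given vertex, as (label, head_vertex) pairs."""
    query = "SELECT label, head_vertex FROM edges WHERE tail_vertex = ? ORDER BY edge_id"
    return conn.execute(query, (vertex_id,)).fetchall()


def incoming(vertex_id):
    """Edges that end at the given vertex, as (label, tail_vertex) pairs."""
    query = "SELECT label, tail_vertex FROM edges WHERE head_vertex = ? ORDER BY edge_id"
    return conn.execute(query, (vertex_id,)).fetchall()


print(outgoing(1), incoming(2))
```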
Graph Queries in SQL

• If we put graph data in a relational structure, can
we also query it using SQL?
– Yes, but with some difficulty
• In a graph query, you may need to traverse a
variable number of edges before you find the
vertex you’re looking for
– the number of joins is not fixed in advance
Graph Queries in SQL

• Since SQL:1999, this idea of variable-length
traversal paths in a query can be expressed using
recursive common table expressions (the WITH
RECURSIVE syntax)

• Supported in PostgreSQL, IBM DB2, Oracle, and
SQL Server
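A small sketch of a recursive common table expression, run here through SQLite (which also accepts WITH RECURSIVE) so that it is self-contained. The tables, the sample places, and the places_within helper are assumptions for illustration. The query follows 'within' edges an arbitrary number of steps to collect every place inside a region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (tail_vertex INTEGER, head_vertex INTEGER, label TEXT);
    INSERT INTO vertices VALUES
        (1, 'Idaho'), (2, 'United States'), (3, 'North America'), (4, 'France');
    INSERT INTO edges VALUES (1, 2, 'within'), (2, 3, 'within');
""")

QUERY = """
WITH RECURSIVE in_region(vertex_id) AS (
    -- Base case: the region itself.
    SELECT vertex_id FROM vertices WHERE name = ?
    UNION
    -- Recursive step: anything with a 'within' edge into the set so far.
    SELECT e.tail_vertex
    FROM edges AS e
    JOIN in_region AS r ON e.head_vertex = r.vertex_id
    WHERE e.label = 'within'
)
SELECT v.name
FROM vertices AS v
JOIN in_region AS r ON v.vertex_id = r.vertex_id
ORDER BY v.name
"""


def places_within(region_name):
    """Names of the region and everything transitively within it."""
    return [name for (name,) in conn.execute(QUERY, (region_name,))]


print(places_within("North America"))
```

The recursion stops when a step adds no new rows, so the number of joins does not have to be known in advance.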
The Cypher Query Language

• Cypher is a declarative query language for
property graphs, created for the Neo4j graph
database
• Example query: find the names of all the people
who emigrated from the United States to Europe
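The Cypher query itself is not reproduced here. To show what such a query has to compute, the sketch below performs the same search by hand in Python over a tiny made-up graph: a person matches if their born_in place lies within the first region and their lives_in place lies within the second, following zero or more within edges. All data and names are hypothetical, and this is an imperative stand-in, not Cypher.

```python
# Hypothetical edge list: (tail, label, head).
edges = [
    ("lucy", "born_in", "idaho"),
    ("lucy", "lives_in", "london"),
    ("alain", "born_in", "beaune"),
    ("alain", "lives_in", "london"),
    ("idaho", "within", "usa"),
    ("london", "within", "england"),
    ("england", "within", "europe"),
    ("beaune", "within", "france"),
    ("france", "within", "europe"),
]
names = {"lucy": "Lucy", "alain": "Alain"}


def heads(tail, label):
    """Vertices reached from tail by one edge with the given label."""
    return [h for t, l, h in edges if t == tail and l == label]


def is_within(place, region):
    """Follow zero or more 'within' edges from place, looking for region."""
    seen = set()
    frontier = [place]
    while frontier:
        current = frontier.pop()
        if current == region:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(heads(current, "within"))
    return False


def emigrated(from_region, to_region):
    """Names of people born within from_region who now live within to_region."""
    result = []
    for person, name in names.items():
        born = any(is_within(p, from_region) for p in heads(person, "born_in"))
        lives = any(is_within(p, to_region) for p in heads(person, "lives_in"))
        if born and lives:
            result.append(name)
    return sorted(result)


print(emigrated("usa", "europe"))
```

A declarative graph query expresses the same pattern without the explicit traversal loop, leaving the search strategy to the database.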
Triple-Stores and SPARQL

• The triple-store model is mostly equivalent to
the property graph model
• In a triple-store, all information is stored in the
form of very simple three-part statements:
(subject, predicate, object)
– Example: (Jim, likes, bananas)
Triple-Stores

• The subject of a triple is equivalent to a vertex in
a graph. The object is one of two things:
• 1. A value in a primitive datatype, such as a
string or a number.
– In that case, the predicate and object of the triple
are equivalent to the key and value of a property on
the subject vertex.
– For example, (lucy, age, 33) is like a vertex lucy with
properties {"age":33}.
Triple-Stores

• 2. Another vertex in the graph.
– In that case, the predicate is an edge in the graph,
the subject is the tail vertex, and the object is the
head vertex.
– For example, in (lucy, marriedTo, alain) the subject
and object lucy and alain are both vertices, and the
predicate marriedTo is the label of the edge that
connects them.
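The two cases can be sketched as a small conversion from triples to a property graph. The Ref wrapper is a hypothetical convention invented for this example to mark an object that refers to another vertex rather than holding a primitive value; the extra triple for alain is made-up sample data.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical marker: the object of the triple is another vertex."""
    name: str


triples = [
    ("lucy", "age", 33),                   # case 1: primitive value
    ("lucy", "marriedTo", Ref("alain")),   # case 2: another vertex
    ("alain", "name", "Alain"),
]


def to_property_graph(triples):
    """Split triples into vertex properties and labeled edges."""
    vertices = {}   # vertex name -> properties dict
    edges = []      # (tail vertex, label, head vertex)
    for subject, predicate, obj in triples:
        vertices.setdefault(subject, {})
        if isinstance(obj, Ref):
            # The predicate is an edge label; subject is the tail, object the head.
            vertices.setdefault(obj.name, {})
            edges.append((subject, predicate, obj.name))
        else:
            # The predicate and object are a property key and value on the subject.
            vertices[subject][predicate] = obj
    return vertices, edges


vertices, edges = to_property_graph(triples)
print(vertices)
print(edges)
```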
