IAU ST Lecture3

This document discusses data models and query languages, highlighting the differences between relational and document models, as well as the emergence of NoSQL databases. It covers the challenges of object-relational mapping, schema flexibility, and the performance implications of various data models. Additionally, it explores graph-like data models and their associated query languages, emphasizing the evolution and capabilities of different database systems.

Uploaded by

asa5tanha

Big Data Analytics

Lecture 3
Mohammad Hamzei
Department of Computer Engineering
Islamic Azad University, South Tehran Branch
Data Models and Query Languages
Introduction

• Data models are perhaps the most important part of developing software, because they have such a profound effect:
– not only on how the software is written,
– but also on how we think about the problem that we are solving.
• Most applications are built by layering one data model on top of another.
– Each layer hides the complexity of the layers below it by providing a clean data model.
Relational Model vs. Document Model

• The best-known data model today is probably that of SQL, based on the relational model proposed by Edgar Codd in 1970:
– data is organized into relations (called tables in SQL)
– each relation is an unordered collection of tuples (rows in SQL)
The Birth of NoSQL
• NoSQL is the latest attempt to overthrow the relational model’s dominance.
• Driving forces behind the adoption of NoSQL databases:
– A need for greater scalability than relational databases can easily achieve, including very large datasets or very high write throughput
– Specialized query operations that are not well supported by the relational model
– Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model
The Object-Relational Mismatch

• If data is stored in relational tables, an awkward translation layer is required between the objects in the application code and the database model of tables, rows, and columns (the impedance mismatch).
• Object-relational mapping (ORM) frameworks reduce the amount of boilerplate code required for this translation layer.
Many-to-One Relationship

Example: Resume data model

1- SQL model: put positions, education, and contact information in separate tables, with a foreign key reference to the users table
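
The multi-table layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are assumptions made for the example and do not come from the lecture's figures.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    -- Each position row points back to its user through a foreign key.
    CREATE TABLE positions (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER REFERENCES users(user_id),
        job_title    TEXT,
        organization TEXT
    );
    CREATE TABLE education (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER REFERENCES users(user_id),
        school_name TEXT
    );
    INSERT INTO users VALUES (1, 'Ada', 'Example');
    INSERT INTO positions VALUES (1, 1, 'Engineer', 'Example Corp');
    INSERT INTO positions VALUES (2, 1, 'Analyst', 'Sample Ltd');
    INSERT INTO education VALUES (1, 1, 'Example University');
""")

# Reassembling one resume needs a join (or one query per table).
rows = conn.execute("""
    SELECT u.first_name, p.job_title, p.organization
    FROM users AS u
    JOIN positions AS p ON p.user_id = u.user_id
    WHERE u.user_id = ?
    ORDER BY p.id
""", (1,)).fetchall()
print(rows)
```

Note that the resume is spread over several tables, so reading it back as one object requires either a multi-way join or several queries.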
Example: Resume data model

2- Later versions of the SQL standard added support for structured datatypes and XML data:
– This allowed multi-valued data to be stored within a single row, with support for querying and indexing inside those documents.
– These features are supported to varying degrees by Oracle, IBM DB2, MS SQL Server, and PostgreSQL.
– A JSON datatype is also supported by several databases, including IBM DB2, MySQL, and PostgreSQL.
Example: Resume data model

3- Encode jobs, education, and contact info as a JSON or XML document, and store it in a text column in the database
– Let the application interpret its structure and content.
– In this setup, you typically cannot use the database to query for values inside that encoded column.
Example: Resume data model

• For a data structure like a resume, which is mostly a self-contained document, a JSON representation can be quite appropriate.
• Document-oriented databases like MongoDB, RethinkDB, CouchDB, and Espresso support this data model.
Example: Resume data model

• Resume JSON model (document data model):
– The JSON representation has better locality than the multi-table schema
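
As a rough sketch of what such a document might look like, the snippet below builds one resume as a single nested structure and round-trips it through JSON. The field names and values are hypothetical and are not taken from the lecture's figure.

```python
import json

# One self-contained resume document (hypothetical fields and values).
resume = {
    "user_id": 1,
    "first_name": "Ada",
    "last_name": "Example",
    "positions": [
        {"job_title": "Engineer", "organization": "Example Corp"},
        {"job_title": "Analyst", "organization": "Sample Ltd"},
    ],
    "education": [{"school_name": "Example University"}],
    "contact_info": {"blog": "https://example.com"},
}

# A document store would persist something like this encoded form.
encoded = json.dumps(resume)

# One read returns the whole resume; no joins are needed to reach
# the nested one-to-many items.
decoded = json.loads(encoded)
job_titles = [p["job_title"] for p in decoded["positions"]]
print(job_titles)
```

The nested lists make the one-to-many tree structure explicit: positions and education live inside their parent record, which is where the locality advantage comes from.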
Many-to-One and Many-to-Many Relationships

• In document databases, joins are not needed for one-to-many tree structures, and support for joins is often weak.
• If the database itself does not support joins, you have to emulate a join in application code by making multiple queries to the database.
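
Emulating a join in application code might look roughly like the sketch below. The two dictionaries stand in for collections in a store without joins, and `find_one`, the field names, and the sample data are all hypothetical.

```python
# Two hypothetical "collections"; users reference regions by ID.
users = {
    1: {"name": "Ada", "region_id": "r1"},
    2: {"name": "Lin", "region_id": "r2"},
}
regions = {
    "r1": {"name": "Greater Example Area"},
    "r2": {"name": "Sample Valley"},
}

query_count = 0  # counts simulated round trips to the database


def find_one(collection, key):
    """Stand-in for a single lookup query against the database."""
    global query_count
    query_count += 1
    return collection[key]


def user_with_region(user_id):
    """Resolve the region reference with a follow-up query."""
    user = find_one(users, user_id)                 # query 1
    region = find_one(regions, user["region_id"])   # query 2
    return {"name": user["name"], "region": region["name"]}


profile = user_with_region(1)
print(profile, query_count)
```

Each reference that has to be resolved costs an extra round trip, which is the complexity and latency that a database-side join would otherwise absorb.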
Ex. Many-to-Many Relationships
Are Document Databases Repeating History?

• While many-to-many relationships and joins are routinely used in relational databases, document databases and NoSQL reopened the debate on how best to represent such relationships in a database.
• History:
– Hierarchical model (difficulty representing many-to-many relationships)
– Relational model and network model (proposed to solve the limitations of the hierarchical model)
Document databases

• Document databases reverted to the hierarchical model in one aspect:
– storing nested records (one-to-many relationships) within their parent record rather than in a separate table.
Document databases

• Many-to-one and many-to-many relationships:
– In these cases relational and document databases are not fundamentally different.
– In both cases, the related item is referenced by a unique identifier, which is called a foreign key in the relational model and a document reference in the document model. That identifier is resolved at read time by using a join or follow-up queries.
Relational Versus Document Data model

• Document data model:
– schema flexibility
– better performance due to locality
– for some applications it is closer to the data structures used by the application
• Relational data model:
– better support for joins, and for many-to-one and many-to-many relationships
Relational Versus Document Data model

• If your application does use many-to-many relationships, the document model becomes less appealing.
• It’s possible to reduce the need for joins by denormalizing, but then the application code needs to do additional work to keep the denormalized data consistent.
Relational Versus Document Data model

• Joins can be emulated in application code by making multiple requests to the database
– but that also moves complexity into the application and is usually slower than a join performed by specialized code inside the database
Schema flexibility in the document model

• No schema is enforced by the database in the document model (schema-on-read)
– arbitrary keys and values can be added to a document
– when reading, clients have no guarantees as to what fields the documents may contain.
• Example: changing the format of the data
– In the document model:
• Start writing new documents in the new format, and let the application handle old documents when they are read
– In the relational model:
• Perform a migration in the database
• Schema changes can be slow and may require downtime
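
A minimal sketch of schema-on-read, assuming a hypothetical format change in which older documents store a single `name` field and newer ones store a separate `first_name`. The application, not the database, reconciles the two shapes at read time.

```python
def first_name(user):
    """Return the first name from either document format.

    New documents carry "first_name"; old documents only carry a
    full "name", so the first word is used as a fallback.
    """
    if "first_name" in user:
        return user["first_name"]
    if user.get("name"):
        return user["name"].split(" ")[0]
    return None  # no guarantee that either field exists


old_doc = {"name": "Ada Example"}                         # old format
new_doc = {"first_name": "Lin", "last_name": "Sample"}    # new format
unknown_doc = {"email": "someone@example.com"}            # neither field

names = [first_name(d) for d in (old_doc, new_doc, unknown_doc)]
print(names)
```

In a relational database the same change would typically be an `ALTER TABLE` followed by an `UPDATE` that backfills the new column.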
Schema flexibility in the document model

• The schema-on-read approach is advantageous if the items in the collection don’t all have the same structure:
– There are many different types of objects
– The structure of the data is determined by external systems
Data locality in the document model

• If the application often needs to access the entire document, storage locality gives a performance advantage
– The locality advantage only applies if you need large parts of the document at the same time
– On updates to a document, the entire document usually needs to be rewritten
– These performance limitations significantly reduce the set of situations in which document databases are useful
* The column-family concept in the Bigtable data model (used in Cassandra and HBase) has a similar purpose of managing locality
Convergence of document and relational databases

• Most relational database systems support XML and/or JSON data
– including the ability to index and query inside documents
• On the document database side, RethinkDB supports relational-like joins in its query language, and some MongoDB drivers automatically resolve database references
Query Languages for Data

• Declarative languages (e.g., SQL, CSS)
– We just specify the pattern of the data we want (what conditions the results must meet, and how the data should be transformed, e.g., sorted, grouped, and aggregated), but not how to achieve that goal
• Imperative languages (e.g., C/C++, Java)
– An imperative language tells the computer to perform certain operations in a certain order
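
The contrast can be sketched by answering the same question twice: once with an explicit loop and once with a SQL query run through Python's sqlite3 module. The `animals` table and its rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: (name, family).
animals = [
    ("Great white shark", "Sharks"),
    ("Lion", "Felidae"),
    ("Whale shark", "Sharks"),
]


def get_sharks(rows):
    """Imperative: spell out how to walk the data, step by step."""
    sharks = []
    for name, family in rows:
        if family == "Sharks":
            sharks.append(name)
    return sharks


imperative = get_sharks(animals)

# Declarative: state which rows are wanted; the database engine
# decides how to find them (scan, index, ordering of steps).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)", animals)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM animals WHERE family = 'Sharks' ORDER BY name"
    )
]
print(imperative, declarative)
```

Both versions return the same names, but only the loop fixes the order in which rows are visited, which is what makes it harder to optimize or parallelize behind the scenes.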
Declarative query language (SQL)

• It hides implementation details of the database engine
• It is up to the database system’s query optimizer to decide which indexes and which join methods to use, and in which order to execute various parts of the query
• Declarative languages have a better chance of getting faster in parallel execution
– The database is free to use a parallel implementation of the query language
MapReduce Querying

• MapReduce is a programming model for processing large amounts of data in bulk across many machines, popularized by Google
• A limited form of MapReduce is supported by some NoSQL data stores, including MongoDB and CouchDB, as a mechanism for performing read-only queries across many documents.
MapReduce model
• MapReduce is a fairly low-level programming model for distributed execution on a cluster of machines
• Word-Count Example
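
The word-count example can be sketched in a single Python process as below. This is not a distributed implementation: the function names and input strings are assumptions, and the shuffle step that a real framework performs across machines is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return (word, sum(counts))


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(w, c) for w, c in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(counts)
```

Because `map_phase` and `reduce_phase` only use the data passed to them, a framework is free to run them on different machines and in any order.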
MapReduce Querying

• MongoDB’s MapReduce Example:

Graph-Like Data Models

• If your application has mostly one-to-many relationships (tree-structured data) or no relationships between records, the document model is appropriate.
• The relational model can handle simple cases of many-to-many relationships, but as the connections within your data become more complex, it becomes more natural to start modeling your data as a graph.
Graph-Like Data Models

• A graph consists of two kinds of objects:
– vertices (also known as nodes or entities)
– edges (also known as relationships or arcs)
• Examples:
– Social graphs
– The web graph
• PageRank can be used on the web graph to determine the popularity of a web page and thus its ranking in search results.
– Road or rail networks
Graph-Like Data Models

• Data structures
– property graph model (implemented by Neo4j, Titan, and InfiniteGraph)
– triple-store model (implemented by Datomic, …)
• Query languages
– declarative query languages for graphs:
• Cypher, SPARQL, and Datalog
– imperative query languages for graphs:
• Gremlin
– graph processing frameworks:
• Pregel
Property Graphs
• Each vertex consists of:
– A unique identifier
– A set of outgoing edges
– A set of incoming edges
– A collection of properties (key-value pairs)
• Each edge consists of:
– A unique identifier
– The vertex at which the edge starts (the tail vertex)
– The vertex at which the edge ends (the head vertex)
– A label to describe the type of relationship
– A collection of properties (key-value pairs)
Representing a property graph using a relational schema
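
One way such a relational representation might look is sketched below with sqlite3: one table for vertices and one for edges, with properties stored as JSON text. The table names, column names, index names, and sample data are assumptions for illustration, not the schema from the lecture's figure.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (
        vertex_id  INTEGER PRIMARY KEY,
        properties TEXT                -- key-value pairs encoded as JSON
    );
    CREATE TABLE edges (
        edge_id     INTEGER PRIMARY KEY,
        tail_vertex INTEGER REFERENCES vertices(vertex_id),  -- edge starts here
        head_vertex INTEGER REFERENCES vertices(vertex_id),  -- edge ends here
        label       TEXT,              -- type of relationship
        properties  TEXT
    );
    -- Indexes make both outgoing and incoming edges cheap to find.
    CREATE INDEX edges_tails ON edges (tail_vertex);
    CREATE INDEX edges_heads ON edges (head_vertex);
""")

conn.executemany("INSERT INTO vertices VALUES (?, ?)", [
    (1, json.dumps({"type": "person", "name": "Lucy"})),
    (2, json.dumps({"type": "state", "name": "Idaho"})),
    (3, json.dumps({"type": "country", "name": "United States"})),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "born_in", "{}"),
    (2, 2, 3, "within", "{}"),
])

# Outgoing edges of vertex 1 and incoming edges of vertex 3.
outgoing = conn.execute(
    "SELECT label, head_vertex FROM edges WHERE tail_vertex = ?", (1,)
).fetchall()
incoming = conn.execute(
    "SELECT label, tail_vertex FROM edges WHERE head_vertex = ?", (3,)
).fetchall()
print(outgoing, incoming)
```

Any vertex can be connected to any other vertex here, since nothing in the schema restricts which kinds of things an edge may link.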
Graph Queries in SQL

• If we put graph data in a relational structure, can we also query it using SQL?
– Yes, but with some difficulty
• In a graph query, you may need to traverse a variable number of edges before you find the vertex you’re looking for
– the number of joins is not fixed in advance
Graph Queries in SQL

• Since SQL:1999, this idea of variable-length traversal paths in a query can be expressed using recursive common table expressions (the WITH RECURSIVE syntax)
• Supported in PostgreSQL, IBM DB2, Oracle, and SQL Server
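
A small sketch of a recursive common table expression follows. It uses SQLite through Python's sqlite3 module only because that is easy to run locally; SQLite is not one of the systems named above, and the tables and place names are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (tail_vertex INTEGER, head_vertex INTEGER, label TEXT);
    INSERT INTO vertices VALUES
        (1, 'Idaho'), (2, 'United States'), (3, 'North America'),
        (4, 'London'), (5, 'England'), (6, 'Europe');
    INSERT INTO edges VALUES
        (1, 2, 'within'), (2, 3, 'within'),
        (4, 5, 'within'), (5, 6, 'within');
""")

# Follow 'within' edges from a starting vertex for as many hops as
# needed; the number of joins is not written into the query.
query = """
    WITH RECURSIVE containing(vertex_id) AS (
        -- base case: places directly containing the start vertex
        SELECT head_vertex FROM edges
        WHERE tail_vertex = :start AND label = 'within'
      UNION
        -- recursive case: places containing anything found so far
        SELECT e.head_vertex
        FROM edges AS e
        JOIN containing AS c ON e.tail_vertex = c.vertex_id
        WHERE e.label = 'within'
    )
    SELECT v.name
    FROM containing AS c
    JOIN vertices AS v ON v.vertex_id = c.vertex_id
    ORDER BY v.vertex_id
"""
places = [row[0] for row in conn.execute(query, {"start": 1})]
print(places)
```

The same query works unchanged whether the chain of `within` edges is two hops long or twenty, which is what a fixed number of joins cannot express.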
The Cypher Query Language

• Cypher is a declarative query language for property graphs, created for the Neo4j graph database
• Example query: find the names of all the people who emigrated from the United States to Europe
Triple-Stores and SPARQL

• The triple-store model is mostly equivalent to the property graph model
• In a triple-store, all information is stored in the form of very simple three-part statements: (subject, predicate, object)
– Example: (Jim, likes, bananas)
Triple-Stores

• The subject of a triple is equivalent to a vertex in a graph. The object is one of two things:
• 1. A value in a primitive datatype, such as a string or a number.
– In that case, the predicate and object of the triple are equivalent to the key and value of a property on the subject vertex.
– For example, (lucy, age, 33) is like a vertex lucy with properties {"age":33}.
Triple-Stores

• 2. Another vertex in the graph.
– In that case, the predicate is an edge in the graph, the subject is the tail vertex, and the object is the head vertex.
– For example, in (lucy, marriedTo, alain) the subject and object lucy and alain are both vertices, and the predicate marriedTo is the label of the edge that connects them.
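
The two cases can be sketched as a small conversion from triples to a property graph. The `Ref` wrapper is a hypothetical marker introduced here to tell a vertex reference apart from a primitive value; real triple-store formats have their own notation for this distinction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical marker: the object of the triple is another vertex."""
    name: str


triples = [
    ("lucy", "age", 33),                  # case 1: primitive value
    ("lucy", "marriedTo", Ref("alain")),  # case 2: another vertex
    ("alain", "name", "Alain"),           # case 1 again
]


def to_property_graph(triples):
    """Split triples into vertex properties and labeled edges."""
    properties = {}  # vertex -> {key: value}
    edges = []       # (tail vertex, label, head vertex)
    for subject, predicate, obj in triples:
        properties.setdefault(subject, {})
        if isinstance(obj, Ref):
            # The predicate becomes an edge label: subject -> object.
            properties.setdefault(obj.name, {})
            edges.append((subject, predicate, obj.name))
        else:
            # The predicate and object become a property key and value.
            properties[subject][predicate] = obj
    return properties, edges


properties, edges = to_property_graph(triples)
print(properties, edges)
```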
