From Relational to Graph Databases
Relational DBs
Relational databases are powerful and underpin most current software applications
The relational model is well suited to modelling many real-world problems…
but not all or, at least, not in the best-performing manner!
Inspired by https://phauer.com/2015/relational-databases-strength-weaknesses-mongodb/
Weaknesses of the relational model
Impedance mismatch between the object-oriented and the relational world
The relational data model doesn’t fit in with every domain
Difficult schema evolution due to an inflexible data model
Weak distributed availability due to poor horizontal scalability
Performance hit due to joins, ACID transactions and strict consistency constraints (especially in distributed environments)
Impedance Mismatch
The application layer of a software system is often based on an object-oriented (OO) model
The relational model stores data in two-dimensional tables
To store an object graph in a relational database, you have to slice and flatten it until it fits into multiple normalized tables
Relational databases are not well suited to highly connected data, because they do not store relationships between data elements robustly: relationships are only implied by foreign keys and rebuilt by joins at query time
Unsuitable Data Model for Certain Domains
Some domains do not fit naturally into the relational model
Think about hierarchies or graphs
They involve recursion to address several information needs
Recursion is expensive
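To make the recursion point concrete, here is a minimal sketch of a hierarchy stored in a single relational table and queried with a recursive common table expression. The `category` table, its columns, and the sample rows are hypothetical, chosen only for illustration; SQLite (via Python's standard `sqlite3` module) is assumed as the engine.

```python
import sqlite3

# Hypothetical hierarchy: each row points to its parent (adjacency-list design).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES category(id)
    );
    INSERT INTO category VALUES
        (1, 'Science',  NULL),
        (2, 'Biology',  1),
        (3, 'Genetics', 2),
        (4, 'Physics',  1),
        (5, 'Arts',     NULL);
""")


def descendants(conn, root_name):
    """Return (name, depth) for every category below root_name."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(id, name, depth) AS (
            -- anchor: the root of the subtree
            SELECT id, name, 0 FROM category WHERE name = ?
            UNION ALL
            -- recursive step: one self-join per level of the hierarchy
            SELECT c.id, c.name, s.depth + 1
            FROM category AS c
            JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT name, depth FROM subtree WHERE depth > 0 ORDER BY depth, name
        """,
        (root_name,),
    ).fetchall()


print(descendants(conn, "Science"))
```

Each level of the hierarchy costs another self-join inside the recursive step, which is where the expense mentioned above comes from as hierarchies get deeper.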
A classical example
Return the departments where Alice works
We need to join the Persons table with the Dept_Members table
and join the result set with the Department table
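A minimal sketch of this query, assuming SQLite through Python's `sqlite3` module. The table names come from the slide; the column names (`id`, `name`, `person_id`, `dept_id`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Join table: one row per (person, department) membership.
    CREATE TABLE Dept_Members (
        person_id INTEGER REFERENCES Persons(id),
        dept_id   INTEGER REFERENCES Department(id)
    );
    INSERT INTO Persons    VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Department VALUES (1, 'Engineering'), (2, 'Research'), (3, 'Sales');
    INSERT INTO Dept_Members VALUES (1, 1), (1, 2), (2, 3);
""")


def departments_of(conn, person_name):
    """Return the names of the departments where the given person works."""
    rows = conn.execute(
        """
        SELECT d.name
        FROM Persons AS p
        JOIN Dept_Members AS m ON m.person_id = p.id   -- first join
        JOIN Department   AS d ON d.id = m.dept_id     -- second join
        WHERE p.name = ?
        ORDER BY d.name
        """,
        (person_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(departments_of(conn, "Alice"))
```

The `Dept_Members` table holds no information of its own; it exists only to encode the relationship, and every question about that relationship has to pass through it.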
A classical example
https://www.youtube.com/watch?v=uuTL321Oyto&ab_channel=Neo4j
Return the departments where Alice works
In a graph model, we can remove an entire table and two joins
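The same question can be sketched over a graph structure. The tiny in-memory `Graph` class below is a hypothetical stand-in for a graph DBMS, and the relationship type `WORKS_IN` is an assumed name. The point is only that the membership becomes an edge stored next to the node, so no join table is needed.

```python
from collections import defaultdict


class Graph:
    """Toy directed graph with typed edges (illustrative only, not a real DBMS)."""

    def __init__(self):
        # node -> list of (relationship type, target node)
        self._out = defaultdict(list)

    def add_edge(self, source, rel_type, target):
        self._out[source].append((rel_type, target))

    def neighbours(self, source, rel_type):
        """Follow outgoing edges of one type directly from the source node."""
        return [t for r, t in self._out.get(source, []) if r == rel_type]


g = Graph()
g.add_edge("Alice", "WORKS_IN", "Engineering")
g.add_edge("Alice", "WORKS_IN", "Research")
g.add_edge("Bob", "WORKS_IN", "Sales")

# One hop from the Alice node: no intermediate table, no joins.
print(g.neighbours("Alice", "WORKS_IN"))
```

In Cypher, a roughly equivalent pattern would be `MATCH (:Person {name: 'Alice'})-[:WORKS_IN]->(d:Department) RETURN d.name`, where the labels and property names are again assumptions.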
SQL strain?
Large number of Joins
Numerous Self-JOINs (or Recursive JOINs)
Frequent Schema Changes
Slow-Running Queries (Despite Extensive Tuning & Hardware)
Pre-Computing Your Results
Relationships
Relational DBs are not good with relationships
1 Storing relationships comes at a cost (complexity)
2 Performance degrades as the number of joins grows
3 SQL was built for set theory, not graph theory
4 New data types and relationships require schema redesign
NoSQL databases?
1 Storing relationships comes at a cost (complexity)
2 Speed plummets as you try to join data together in the application
3 Lots of “almost SQL” languages, and they are terrible at joins
4 No ACID guarantees = risk of data corruption
From Relational Modelling…
Relational Model and Relational DBMS
… to Graph Modelling
Graph-based Models and Graph DBMS
From real-world problems to graph-based solutions
Processing and cleaning
Python, Jupyter notebooks, … Kaggle datasets, challenges
Modelling
Ontologies, property graphs
Storing and querying
SPARQL, Cypher
Linked (Open) Data
Advanced topics
Open Science
Data Citation
Data Provenance
Data Pricing
What you need to know before studying graph DBs
Relational Model
Relational DBMS
SQL query language
Python/Java
Algorithmic paradigms: Divide and Conquer, Dynamic Programming, Greedy
Time and Space Complexity
What is Open Data
• “Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
• Open data must have a license to state that it’s open
• This license might require users of data to:
• credit the publisher (attribution)
• publish results as open data if they've mixed open data with other data
History
The concept of open data is not new.
In 1942, Robert King Merton (American sociologist, 1910-2003) explained the importance of open data and its benefits for the scientific world.
Merton claims that “each researcher must contribute to the common pot and give up intellectual property rights to allow knowledge to move forward”
In 1995, the term “Open Data” first appeared in a document from an American scientific agency that was trying to encourage people to exchange their scientific information throughout the world
From 2007 to today, the idea of open data has become far more feasible than ever before.
Why Open Data
Open data can be used to design new products, provide community services, and open up new business opportunities
Open data can help with decision making in your own life
It allows individuals to be more active in society
It improves the quality of the services offered to the public
It makes government more efficient, which in turn reduces costs
Linked/Open Data
Open Data – can exist without linking
Linked Data – can exist without being open
Linked Open Data – is open data designed to support linking to other open data resources
Both – can be offered as file dumps and/or live services
Linked Data: History of Semantic Web
“Semantic nets” were first devised for computers by Richard H. Richens in 1956
The Semantic Network Model was introduced by cognitive scientist Allan M. Collins, linguist M. Ross Quillian and psychologist Elizabeth F. Loftus in the early 1960s.
Tim Berners-Lee coined the term “Semantic Web”: a web of data that can be processed directly and indirectly by machines
Web of Data
A Web of data consists of data from around the world that is linked together so that it can be found, browsed, crawled, integrated, and so on
HTTP and Architecture of the Web
Three rules for data publication
1 Use HTTP URIs to name everything. HTTP URIs are names (URIs) supporting lookup (HTTP access)
2 The server must provide descriptive information about the resource identified by that URI using the Web standard languages (RDF)
3 A server must include links to HTTP URIs of other things so that Web clients can discover more things by looking up these new HTTP URIs recursively at will
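The third rule can be sketched as a small "follow the links" client. To stay runnable offline, the sketch replaces real HTTP lookups with an in-memory dictionary: `WEB`, `lookup`, and all the `http://example.org/...` URIs are hypothetical, and the traversal uses a queue rather than literal recursion.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> description with outgoing links.
WEB = {
    "http://example.org/id/alice": {
        "label": "Alice",
        "links": ["http://example.org/id/padua", "http://example.org/id/bob"],
    },
    "http://example.org/id/bob": {
        "label": "Bob",
        "links": ["http://example.org/id/alice"],
    },
    "http://example.org/id/padua": {
        "label": "Padua",
        "links": ["http://example.org/id/italy", "http://example.org/id/missing"],
    },
    "http://example.org/id/italy": {"label": "Italy", "links": []},
}


def lookup(uri):
    """Simulate dereferencing an HTTP URI; returns None if nothing is served."""
    return WEB.get(uri)


def discover(start_uri, max_resources=100):
    """Breadth-first discovery of resources reachable by following links."""
    found, visited = [], set()
    queue = deque([start_uri])
    while queue and len(found) < max_resources:
        uri = queue.popleft()
        if uri in visited:
            continue  # avoid looping on cyclic links
        visited.add(uri)
        description = lookup(uri)
        if description is None:
            continue  # dangling link: nothing to learn here
        found.append(uri)
        queue.extend(description["links"])
    return found


print(discover("http://example.org/id/alice"))
```

Starting from a single URI, the client reaches every described resource without knowing any of them in advance, which is the property the three rules are meant to guarantee.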
URLs
http://ohsu.eagle-i.net/i/0000013c-a6f8-a7ad-c825-3bd680000000
Identifies, on the Web, the existence of the mouse located at the Bill Horton Laboratory with a single point mutation (A52S)
URLs, URNs, URIs, IRIs
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.
A Uniform Resource Name (URN) is a URI under the urn scheme, intended to serve as a persistent, location-independent resource identifier.
A Uniform Resource Locator (URL) is a URI that, in addition to identifying a resource, provides a means of locating the resource by describing its primary access mechanism
An Internationalized Resource Identifier (IRI) is defined similarly to a URI, but the character set is extended to the Universal Coded Character Set (Unicode)
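The distinctions above can be illustrated with a rough classifier built on Python's standard `urllib.parse`. This is a simplified heuristic, not a validator: treating any identifier with non-ASCII characters as an IRI and treating only `http`, `https` and `ftp` as locator schemes are assumptions made for the sketch, and the sample identifiers are illustrative.

```python
from urllib.parse import urlsplit

# Schemes treated as "locators" in this sketch (an assumption, not a full list).
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify(identifier: str) -> str:
    """Return a rough label: 'IRI', 'URN', 'URL', or plain 'URI'."""
    if not identifier.isascii():
        return "IRI"  # characters beyond ASCII are only allowed in IRIs
    scheme = urlsplit(identifier).scheme
    if scheme == "urn":
        return "URN"  # a name: persistent, location-independent
    if scheme in LOCATOR_SCHEMES:
        return "URL"  # also says how to access the resource
    return "URI"      # identifies something, nothing more is assumed


for example in [
    "urn:isbn:0451450523",
    "https://dbpedia.org/resource/Padua",
    "https://example.org/città",
    "mailto:someone@example.org",
]:
    print(example, "->", classify(example))
```

All four labels describe URI-like identifiers; the classifier only reports the most specific category it can recognise.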
Differentiate URLs and URIs
The best practice is to use different URIs to identify different resources
Are we talking about a real-world resource or about a document
describing a resource?
From “Semantic Web for the Working Ontologist”
Domain Name Resolution
HTTP URIs rely on the Domain Name System (DNS) for this
function. The DNS is a decentralized hierarchical naming
system for identifying devices and services connected to the
Internet
In the context of dereferenceable HTTP URIs, the DNS makes
sure that if anyone in the world resolves a particular HTTP URI,
they will all get to the same place
From “Semantic Web for the Working Ontologist”
Content Negotiation
There is only one Web with all its different facets interlinked, including data, hypertext, schemas, services, and so on
Both a human user and a software agent must be able to obtain a description of any resource described on the Web in the format most suitable for them: for example, a web page in HTML for a human reader and an RDF/XML description for a Web robot
https://dbpedia.org/page/Padua
Content Negotiation
The content negotiation mechanism, sometimes called conneg for short, is part of the HTTP standard. The HTTP protocol allows Web clients to set headers in the requests they send in order to specify their preferences, in particular in terms of format
It is the responsibility of the server to match the preferences of the client against the options it has available, and to select the best response for the client
HTTP content negotiation can be used by linked data applications to negotiate the RDF syntax they prefer (XML, Turtle, JSON-LD) or alternate representations for their interfaces (for example, HTML, text, image, etc.)
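The server-side matching step can be sketched as follows. This is a deliberately simplified reading of the `Accept` header: it honours `q` weights and wildcards such as `*/*` and `text/*`, but takes the highest matching weight rather than applying the full precedence rules of the HTTP specification. The function names and the sample header values are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, quality) pairs."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        preferences.append((media_range, quality))
    return preferences


def matches(media_range: str, offered: str) -> bool:
    """True if the client's media range covers the offered media type."""
    if media_range in ("*/*", offered):
        return True
    return media_range.endswith("/*") and offered.startswith(media_range[:-1])


def negotiate(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick the available media type the client weights highest, if any."""
    preferences = parse_accept(accept_header)
    best, best_quality = None, 0.0
    for offered in available:
        quality = max(
            (q for media_range, q in preferences if matches(media_range, offered)),
            default=0.0,
        )
        if quality > best_quality:
            best, best_quality = offered, quality
    return best


OFFERS = ["text/html", "text/turtle", "application/rdf+xml"]
BROWSER = "text/html,application/xhtml+xml;q=0.9,*/*;q=0.5"
ROBOT = "text/turtle, application/rdf+xml;q=0.8"

print(negotiate(BROWSER, OFFERS))  # the HTML page for a human reader
print(negotiate(ROBOT, OFFERS))    # an RDF serialization for a machine
```

The same URI can therefore yield an HTML page or an RDF document depending only on the header the client sends.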
Example: The Universal Protein Resource
https://www.uniprot.org/
A protein is identified by a URI. For instance, the Cell surface glycoprotein MUC18 is identified by https://www.uniprot.org/uniprot/P43121
URL redirection at work
If you call the URL from a Web browser, you get the HTML page describing the protein
If the URL is called from a machine, you get https://www.uniprot.org/uniprot/P43121.ttl
The Five Stars of LOD
Looking into Linked Data
LD Principles in practice
Uniform Resource Identifiers
RDF and SPARQL
SPARQL can be really complex
5-star schema of LOD
★ Data available as open access
★ Pros and Cons
★★ Make it available as structured data
★★ Pros and Cons
All the benefits of ★ plus:
★★★ Use non-proprietary formats
★★★ Pros and Cons
All the benefits of ★★ plus:
★★★★ Use URIs to denote things
★★★★★ Link your data to other data to provide context
★★★★★ Pros and Cons
The role of Linked Open Data
The Web of Data accounts for more than 100 billion facts.
LOD is an emerging paradigm for publishing, accessing and reusing data in machine-readable formats.
LOD is progressively shifting from a publishing paradigm to a knowledge creation and sharing one.
LOD and Data Sharing
LOD is well suited to enabling data sharing, which is fundamental for:
reproducing or verifying research
making results of publicly funded research available to the public
enabling others to ask new questions of extant data
advancing the state of research and innovation
C. L. Borgman, 2012, “The Conundrum of Sharing Research Data”, Journal of the American Society for Information Science and Technology, 63(6):1059–1078
LOD and Data Citation
Giving credit to data creators and curators
Data Citation is fundamental for:
Referencing data in order to identify, discover and retrieve them
Building and propagating knowledge