Link Open Data Graph DBMS

The document discusses the transition from relational databases to graph databases, highlighting the limitations of the relational model such as impedance mismatch, inflexible schema evolution, and performance issues with complex queries. It emphasizes the advantages of graph databases in handling highly connected data and recursive relationships more efficiently. Additionally, it touches on the concept of open data and linked open data, which facilitate data sharing and citation in a machine-readable format.

Uploaded by

mzaheerlion

From Relational to Graph Databases

Relational DBs

Relational databases are powerful and are the basis of most current software applications

The relational model is well-suited to modelling many real-world problems…

but not all

or, at least, not in the best-performing manner!

Inspired by https://phauer.com/2015/relational-databases-strength-weaknesses-mongodb/

Weaknesses of the relational model

Impedance mismatch between the object-oriented and the relational world

The relational data model doesn’t fit in with every domain

Difficult schema evolution due to an inflexible data model

Weak distributed availability due to poor horizontal scalability

Performance hit due to joins, ACID transactions and strict consistency constraints (especially in distributed environments)

Impedance Mismatch

The application layer of software is often based on an object-oriented (OO) model

The relational model saves data in two-dimensional tables

If you want to store your object graph in a relational database, you have to slice and flatten it until it fits into multiple normalized tables

Relational databases are not well-suited to highly connected data, because they don’t robustly store relationships between data elements

Unsuitable Data Model for Certain Domains

Some domains do not naturally fit the relational model

Think about hierarchies or graphs

They involve recursion to address several information needs

Recursion is expensive

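To make the cost of recursion concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `employees` table, its columns and the sample names are assumptions made for illustration; they do not come from the slides. A hierarchy stored in a single table has to be walked with a recursive query, whose recursive step re-joins the table with the rows found so far, once per level.

```python
import sqlite3

def reports_of(conn, manager):
    """Return everyone below `manager` in the hierarchy, with their depth."""
    query = """
        WITH RECURSIVE chain(name, depth) AS (
            -- anchor: direct reports of the given manager
            SELECT name, 1 FROM employees WHERE manager = ?
            UNION ALL
            -- recursive step: join the table with the rows found so far
            SELECT e.name, c.depth + 1
            FROM employees AS e JOIN chain AS c ON e.manager = c.name
        )
        SELECT name, depth FROM chain ORDER BY depth, name
    """
    return conn.execute(query, (manager,)).fetchall()

# Hypothetical sample data: a four-level reporting chain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, manager TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", None), ("Bob", "Alice"), ("Carol", "Bob"), ("Dave", "Carol")],
)
print(reports_of(conn, "Alice"))
```

The deeper the hierarchy, the more rounds of joining the database has to perform, which is the expense the slide refers to.
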
A classical example

Return the departments where Alice works

We need to join the Persons table with the Dept_Members table

and join the result set with the Department table

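The query described on this slide can be sketched with Python's built-in sqlite3 module. The table names (Persons, Dept_Members, Department) are taken from the slide; the column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT);
    -- link table: one row per (person, department) membership
    CREATE TABLE Dept_Members (person_id INTEGER, dept_id INTEGER);
    INSERT INTO Persons VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO Dept_Members VALUES (1, 10), (1, 20), (2, 10);
""")

def departments_of(conn, person_name):
    """Two joins: Persons -> Dept_Members -> Department."""
    rows = conn.execute(
        """
        SELECT d.name
        FROM Persons AS p
        JOIN Dept_Members AS m ON m.person_id = p.id
        JOIN Department AS d ON d.id = m.dept_id
        WHERE p.name = ?
        ORDER BY d.name
        """,
        (person_name,),
    ).fetchall()
    return [name for (name,) in rows]

print(departments_of(conn, "Alice"))
```

The Dept_Members table carries no information of its own; it exists only to express the relationship between the other two tables.
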
A classical example

https://www.youtube.com/watch?v=uuTL321Oyto&ab_channel=Neo4j

Return the departments where Alice works

In a graph model, we can remove an entire table and two joins

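As a rough illustration of why the link table and the joins disappear, the following sketch stores relationships directly on the nodes. It is plain Python, not the API of any graph DBMS; the `TinyGraph` class and the `WORKS_IN` relationship type are hypothetical names chosen for this example.

```python
from collections import defaultdict

class TinyGraph:
    """A toy in-memory graph: each node keeps its own outgoing relationships."""

    def __init__(self):
        # node -> list of (relationship type, target node)
        self._out = defaultdict(list)

    def relate(self, source, rel_type, target):
        self._out[source].append((rel_type, target))

    def neighbours(self, source, rel_type):
        # Follow the stored relationships; no lookup in a separate link table.
        return [t for (r, t) in self._out.get(source, []) if r == rel_type]

g = TinyGraph()
g.relate("Alice", "WORKS_IN", "Sales")
g.relate("Alice", "WORKS_IN", "Research")
g.relate("Bob", "WORKS_IN", "Sales")

print(g.neighbours("Alice", "WORKS_IN"))
```

Answering the question becomes a traversal from the Alice node along its relationships, rather than a match across three tables.
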
SQL strain?

Large number of Joins

Numerous Self-JOINs (or Recursive JOINs)

Frequent Schema Changes

Slow-Running Queries (Despite Extensive Tuning & Hardware)

Pre-Computing Your Results

Relationships

Relational DBs are not good with Relationships

1 Storing relationships comes at a cost (complexity)

2 Performance degrades when the number of joins grows

3 SQL was built for set theory, not graph theory

4 New data types and relationships require schema redesign

NoSQL databases?

1 Storing relationships comes at a cost (complexity)

2 Speed plummets as you try to join data together in the application

3 Lots of “almost SQL” languages and terrible at joins

4 No ACID = data corruption

From Relational Modelling…

Relational Model and Relational DBMS

… to Graph Modelling

Graph-based Models and Graph DBMS

From real-world problems to graph-based solutions

Processing and cleaning: Python, Jupyter notebooks, … Kaggle datasets, challenges

Modelling: Ontologies, property graphs

Storing and querying: SPARQL, Cypher

Linked (Open) Data

Advanced topics

Open Science

Data Citation

Data Provenance

Data Pricing

What do you need to know before studying Graph DBs

Relational Model

Relational DBMS

SQL query language

Python/Java

Algorithmic Paradigms: Divide and Conquer, Dynamic Programming, Greedy

Time and Space Complexity


What is Open Data

• “Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”

• Open data must have a license to state that it’s open

• This license might require users of the data to:
  • credit the publisher (attribution)
  • publish results as open data if they've mixed open data with other data

History

The concept of open data is not new.

In 1942, Robert King Merton (American sociologist, 1910-2003) explained the importance of open data and its benefits for the scientific world.

Merton claims that “each researcher must contribute to the common pot and give up intellectual property rights to allow knowledge to move forward”

In 1995, the term “Open Data” first appeared in a document from an American scientific agency that was trying to encourage people to exchange their scientific information throughout the world

From 2007 to today, the idea of open data has become more feasible than ever before.

Why Open Data

Open data can be used to design new products, provide community services, and open up new business opportunities

Open data can help with decision making in your own life

It allows an individual to be more active in society

It improves the quality of service you offer the public

It makes government more efficient, in turn reducing costs

Linked/Open Data

Open Data – can exist without linking

Linked Data – can exist without being open

Linked Open Data – is open data designed to support linking to other open data resources

Both – can be offered as file dumps and/or live services

Linked Data: History of Semantic Web

'Semantic Nets' were first invented for computers by Richard H. Richens in 1956

Introduction of the Semantic Network Model by cognitive scientist Allan M. Collins, linguist M. Ross Quillian and psychologist Elizabeth F. Loftus in the early 1960s.

Tim Berners-Lee coined the term “Semantic Web”: a web of data that can be processed directly and indirectly by machines

Web of Data

A Web of data consists of data from around the world that is linked together so that it can be found, browsed, crawled, integrated, and so on

HTTP and Architecture of the Web

Three rules for data publication

1 Use HTTP URIs to name everything. HTTP URIs are names (URIs) supporting lookup (HTTP access)

2 The server must provide descriptive information about the resource identified by that URI using the Web standard languages (RDF)

3 A server must include links to HTTP URIs of other things so that Web clients can discover more things by looking up these new HTTP URIs recursively at will

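The three rules can be illustrated with a small offline sketch. Instead of real HTTP requests, a dictionary stands in for the Web: looking up a URI returns a description, and the description contains links to further URIs. All URIs under example.org, the `FAKE_WEB` dictionary and both function names are hypothetical and exist only for this illustration.

```python
from collections import deque

# Hypothetical stand-in for the Web: each HTTP URI maps to the description a
# server would return, reduced here to (predicate, object) pairs.
FAKE_WEB = {
    "http://example.org/id/alice": [
        ("http://example.org/vocab/name", "Alice"),
        ("http://example.org/vocab/worksIn", "http://example.org/id/sales"),
    ],
    "http://example.org/id/sales": [
        ("http://example.org/vocab/name", "Sales"),
        ("http://example.org/vocab/partOf", "http://example.org/id/acme"),
    ],
    "http://example.org/id/acme": [
        ("http://example.org/vocab/name", "ACME"),
    ],
}

def lookup(uri):
    """Rules 1 and 2: looking up a URI yields a description of the resource."""
    return FAKE_WEB.get(uri, [])

def discover(start_uri):
    """Rule 3: follow links to other URIs until nothing new turns up."""
    seen, queue = [], deque([start_uri])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.append(uri)
        for _predicate, obj in lookup(uri):
            # Only objects that are themselves known URIs are followed.
            if obj.startswith("http://") and obj in FAKE_WEB:
                queue.append(obj)
    return seen

print(discover("http://example.org/id/alice"))
```

Starting from a single URI, the client reaches every linked resource without knowing any of them in advance.
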
URLs

http://ohsu.eagle-i.net/i/0000013c-a6f8-a7ad-c825-3bd680000000

Identifies, on the Web, the existence of the mouse located at the Bill Horton Laboratory with a single point mutation (A52S)

URLs, URNs, URIs, IRIs

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.

A Uniform Resource Name (URN) is a URI in the scheme urn intended to serve as a persistent, location-independent resource identifier.

A Uniform Resource Locator (URL) is a URI that, in addition to identifying a resource, provides a means of locating the resource by describing its primary access mechanism

An Internationalized Resource Identifier (IRI) is defined similarly to a URI, but the character set is extended to the Universal Coded Character Set

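The distinctions above can be explored with Python's standard urllib.parse module. This is a rough sketch rather than a validator: the `describe` and `iri_to_uri` helpers are hypothetical, the list of schemes treated as locators is an assumption, and the sample identifiers are made up for illustration.

```python
from urllib.parse import quote, urlsplit

def describe(identifier):
    """Classify an identifier by its scheme; a rough heuristic, not a validator."""
    scheme = urlsplit(identifier).scheme
    if scheme == "urn":
        return "URN"  # persistent, location-independent name
    if scheme in {"http", "https", "ftp"}:
        return "URL"  # the scheme names an access mechanism
    return "URI"      # some other kind of identifier

def iri_to_uri(iri):
    """Percent-encode non-ASCII characters as UTF-8 so the IRI becomes a URI."""
    # Reserved URI characters are left untouched; only the rest is encoded.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")

print(describe("urn:isbn:0451450523"))
print(describe("https://dbpedia.org/resource/Padua"))
print(iri_to_uri("https://example.org/città"))
```

Note that every URN and every URL is also a URI; the function only reports the most specific label. Non-ASCII host names need a different treatment, which this sketch does not cover.
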
Differentiate URLs and URIs

The best practice is to use different URIs to identify different resources

Are we talking about a real-world resource or about a document describing a resource?

From “Semantic Web for the Working Ontologist”

Domain Name Resolution

HTTP URIs rely on the Domain Name System (DNS) for this function. The DNS is a decentralized hierarchical naming system for identifying devices and services connected to the Internet

In the context of dereferenceable HTTP URIs, the DNS makes sure that if anyone in the world resolves a particular HTTP URI, they will all get to the same place

From “Semantic Web for the Working Ontologist”

Content Negotiation

There is only one Web with all its different facets interlinked, including data, hypertext, schemas, services, and so on

A human user and a software agent must both be able to obtain descriptions of any particular resource described on the Web in the format most suitable for them, for example, a web page in HTML for a human reader and an RDF/XML description for a Web robot

https://dbpedia.org/page/Padua

Content Negotiation

The content negotiation mechanism, sometimes called conneg for short, is part of the HTTP standard. The HTTP protocol allows Web clients to set headers in the requests they send in order to specify their preferences, in particular in terms of format

It is the responsibility of the server to match the preferences of the client against the options it has available, and to select the best response for the client

HTTP content negotiation can be used by linked data applications to negotiate the RDF syntax they prefer (XML, Turtle, JSON-LD) or alternate representations for their interfaces (for example, HTML, text, image, etc.)

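The server side of this matching can be sketched in a few lines of Python. This is a simplified illustration, not a complete implementation of the HTTP rules: it reads quality values (`q=`) from the Accept header and handles the `*/*` wildcard, but ignores partial wildcards such as `text/*`. The function names and the sample headers are assumptions made for this example.

```python
def parse_accept(header):
    """Parse an Accept header into (media type, quality) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, quality = fields[0], 1.0  # quality defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                quality = float(value)
        if media_type:
            prefs.append((media_type, quality))
    # sorted() is stable, so equal-quality types keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]  # anything goes: use the server's default
    return None

server_formats = ["text/html", "application/rdf+xml"]
client_header = "text/turtle, application/rdf+xml;q=0.8, text/html;q=0.5"
print(negotiate(client_header, server_formats))
```

Here the client would prefer Turtle, but the server does not offer it, so the next best match, RDF/XML, is selected.
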
Example: The Universal Protein Resource

https://www.uniprot.org/

A protein is identified by a URI. For instance, the Cell surface glycoprotein MUC18 is identified by https://www.uniprot.org/uniprot/P43121

URL redirection at work

If you call the URL from a Web browser, you get the web page for the protein

If the URL is called from a machine, then you get https://www.uniprot.org/uniprot/P43121.ttl

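On the client side, a program states its preferred format through the Accept header. The sketch below uses Python's standard urllib.request module to build such a request for the protein URI. Whether the server answers a `text/turtle` preference with a redirect to the `.ttl` address is an assumption here; the `fetch` function needs network access and is therefore defined but not called.

```python
from urllib.request import Request, urlopen

PROTEIN_URI = "https://www.uniprot.org/uniprot/P43121"

def build_request(uri, media_type):
    """Build a GET request that states the preferred format in the Accept header."""
    return Request(uri, headers={"Accept": media_type})

def fetch(uri, media_type):
    """Send the request; urlopen follows redirects. Needs network access."""
    with urlopen(build_request(uri, media_type)) as response:
        # geturl() reports the final address after any redirection.
        return response.geturl(), response.read()

# A machine asking for Turtle and a browser-like client asking for HTML
# use the same URI and differ only in the header they send.
machine_request = build_request(PROTEIN_URI, "text/turtle")
browser_request = build_request(PROTEIN_URI, "text/html")
print(machine_request.get_header("Accept"))
```
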
The Five Stars of LOD

Looking into Linked Data

LD Principles in practice

Uniform Resource Identifiers

RDF and SPARQL

SPARQL can be really complex

5 star-schema of LOD

★ Data available as open access

★ Pros and Cons

★ ★ Make it available as structured data

★ ★ Pros and Cons

All the benefits of ★ plus:

★ ★ ★ Use non-proprietary formats

★ ★ ★ Pros and Cons

All the benefits of ★ ★ plus:

★ ★ ★ ★ Use URIs to denote things

★ ★ ★ ★ ★ Link your data to other data to provide context

★ ★ ★ ★ ★ Pros and Cons

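The five levels build on each other: a dataset earns a star only if it also meets every lower level. A minimal sketch of that cumulative rule follows; the `Dataset` class, its field names and the `star_rating` function are hypothetical and simply mirror the five slide titles.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """One flag per level of the scheme, in order."""
    open_access: bool = False             # 1 star
    structured: bool = False              # 2 stars
    non_proprietary_format: bool = False  # 3 stars
    uses_uris: bool = False               # 4 stars
    linked_to_other_data: bool = False    # 5 stars

def star_rating(dataset):
    """Count stars from the bottom up, stopping at the first unmet level."""
    criteria = [
        dataset.open_access,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars

# Hypothetical example: an openly published spreadsheet in a proprietary format.
spreadsheet = Dataset(open_access=True, structured=True)
print(star_rating(spreadsheet))
```

A dataset that uses URIs but is not openly available would still score zero under this rule, since the first level is not met.
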
The role of Linked Open Data

The Web of Data accounts for more than 100 billion facts.

LOD is the emerging paradigm for publishing, accessing and re-using data in machine-readable format.

LOD is progressively shifting from a publishing paradigm to a knowledge creation and sharing one.

LOD and Data Sharing

LOD is well-suited for enabling data sharing

Data sharing is fundamental for:

reproducing or verifying research

making results of publicly funded research available to the public

enabling others to ask new questions of extant data

advancing the state of research and innovation

C. L. Borgman, 2012, “The Conundrum of Sharing Research Data”, Journal of the American Society for Information Science and Technology, 63(6):1059–1078

LOD and Data Citation

Giving credit to data creators and curators

Data Citation is fundamental for:

Referencing data in order to identify, discover and retrieve them

Building and propagating knowledge
