EKG Guide
This tutorial is based on the work being done in the MindLab, an industrial research project for building
knowledge graphs to be consumed by conversational agents in domains like tourism.
An extensive version of the content of this tutorial can be found in our upcoming book “Knowledge
Graphs in Use” (working title).
https://mindlab.ai
For academics:
A brief overview of the literature and an introduction to some tools, especially for knowledge curation.
Relevant Literature:
https://mindlab.ai/en/publications/ - An extensive list of the literature on knowledge graphs and their
applications with conversational agents
● An agent interprets a knowledge graph to make rational decisions about which actions to take to
reach its goals
● KGs do not have a big TBox, but have a very large ABox. There is not much to reason over.
● No strict schema: good for integrating heterogeneous sources, but problematic for data quality.
● A knowledge graph may cost 0.1 to 6 USD per fact [Paulheim, 2018]
Fixing TBox
- We accept schema.org (and its extensions) as the gold standard, so there is no problem here.
Fixing ABox
- This is where knowledge curation comes in.
http://www.schema.org/
[Excerpt of a schema.org JSON-LD annotation: "longitude": "10.9136698539673"]
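The stray longitude value above comes from a larger schema.org annotation; a minimal illustrative reconstruction could look as follows (every value except the longitude is hypothetical):

```json
{
  "@context": "http://schema.org/",
  "@type": "Place",
  "name": "Example Place",
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "47.2692124",
    "longitude": "10.9136698539673"
  }
}
```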
A tool, developed as a research project, that has grown into a full-stack annotation creation, validation, and publication framework!
[Screenshot of an annotation editor form with the fields Ort (place), Name, Datum (date); example value “Karlsruhe”]
Kärle & Simsek, September 9, 2019
2. Knowledge Creation - tools - semantify.it
2) How to create those JSON-LD files?
- Semantify.it editor & instant annotations
- based on Domain Specifications (DS)
- inside the platform (big DS files)
- or Instant Annotations (IA),
portable to every website (based on JS)
- mappers (RocketRML)
- wrapper framework
- semi-automatic
RocketRML ⇒
● Before starting the mapping process for a TriplesMap, we check whether the TriplesMap is in the
join condition of another TriplesMap. If it is, we get the parent path of the join condition and
evaluate it. The value is then cached as a path-value pair.
● After everything is mapped, we go through the two caches and join the objects with matching child
and parent values.
https://semantifyit.github.io/RocketRML/ (Node.js implementation)
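The two-cache join described above can be sketched roughly as follows; the data structures and names are illustrative, not RocketRML's actual internals:

```javascript
// Cache 1: parent path -> value -> subjects produced by the parent TriplesMap
// (filled while the parent TriplesMap is mapped).
const parentCache = { "$.country.code": { AT: ["ex:Austria"] } };

// Cache 2: results of the child TriplesMap, each remembering the child value
// of the join condition so it can be joined later.
const childCache = [
  { subject: "ex:Innsbruck", joinValue: "AT" },
  { subject: "ex:Springfield", joinValue: "US" },
];

// After all mappings ran, join objects with matching child and parent values.
function joinCaches(parentPath, predicate) {
  const triples = [];
  const byValue = parentCache[parentPath] || {};
  for (const child of childCache) {
    for (const parentSubject of byValue[child.joinValue] || []) {
      triples.push([child.subject, predicate, parentSubject]);
    }
  }
  return triples;
}

const result = joinCaches("$.country.code", "ex:locatedIn");
// "ex:Springfield" has no matching parent value, so only one triple is produced
```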
- semantify.it stores all created annotations and provides them over an API
(http://smtfy.it/sj7Fie2 OR http://smtfy.it/url/http//... OR http://smtfy.it/cid/374fm38dkgi...)
- publication of annotations over JS or into popular CMSs through plugins (WordPress, TYPO3, etc.)
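The three retrieval patterns above can be illustrated with a small helper; `annotationUrl` is hypothetical and only mirrors the URL shapes shown, not the actual semantify.it API:

```javascript
// Hypothetical helper for the three retrieval patterns: short id, source
// web-page URL, or content id. The real API may differ.
const BASE = "http://smtfy.it";

function annotationUrl(kind, value) {
  if (kind === "short") return `${BASE}/${value}`;   // e.g. /sj7Fie2
  if (kind === "url") return `${BASE}/url/${value}`; // by source web page
  if (kind === "cid") return `${BASE}/cid/${value}`; // by content id
  throw new Error(`unknown kind: ${kind}`);
}

// An annotation could then be fetched with any HTTP client, e.g.:
// const annotation = await fetch(annotationUrl("short", "sj7Fie2")).then(r => r.json());
```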
Either a URL:
- to identify resources http://fritz.phantom.com
- to refer to properties of an ontology http://schema.org/name
- to refer to types of an ontology http://schema.org/Person
or a literal
- String: “Fritz Phantom”
- Date: “1.1.19??”
- Number: 42
{
  "@id": "https://fritz.phantom.com",
  "@type": "Person",
  "livesIn": "Innsbruck",
  "born": "19??-01-01",
  "worksFor": { "@type": "schema:Organisation" }
}
Either as
1) JSON-LD
or as
2) Knowledge Graph
Summary (JSON-LD files):
- works very well with tens of millions of JSON-LD files
- we replicate this data periodically into a graph database for “real” Knowledge Graph usage
Summary (graph database):
- overhead aside: great for big knowledge graphs
● Knowledge Cleaning
● Knowledge Enrichment
2. A finite number of type definitions isA(t1,t2), where t1 and t2 are elements of T. isA is reflexive and
transitive.
○ Range definition for a property p, where p is an element of P and t1 and t2 are elements of T.
■ Simple definition: global property definition: hasRange(p,t2)
■ Refined definition: local property definition: hasRange(p,t2) for domain t1, short:
hasLocalRange(p,t1,t2)
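The definitions above can be made concrete with a small sketch; the type hierarchy and property names are invented for illustration:

```javascript
// isA is stored as direct edges; subtypeOf computes the reflexive,
// transitive closure on demand.
const isA = { Hotel: ["LodgingBusiness"], LodgingBusiness: ["Organization"] };

function subtypeOf(t1, t2) {
  if (t1 === t2) return true; // reflexive
  return (isA[t1] || []).some((s) => subtypeOf(s, t2)); // transitive
}

// Global definition hasRange(p,t2) vs. local definition hasLocalRange(p,t1,t2):
const hasRange = { founder: "Thing" }; // global
const hasLocalRange = { founder: { Hotel: "Person" } }; // local, per domain t1

function rangeFor(p, subjectType) {
  const local = hasLocalRange[p] || {};
  for (const t1 of Object.keys(local)) {
    if (subtypeOf(subjectType, t1)) return local[t1]; // local definition wins
  }
  return hasRange[p]; // fall back to the global definition
}
```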
● Various dimensions for data quality assessment have been introduced ([Batini & Scannapieco, 2006],
[Färber et al., 2018], [Pipino et al., 2002], [Wang, 1998], [Wang & Strong, 1996], [Wang et al., 2001],
[Zaveri et al., 2016])
● SWIQA (Semantic Web Information Quality Assessment Framework) [Fürber & Hepp, 2011]:
uses network features to assess data quality (e.g. counting open chains to find wrongly
asserted isSameAs relationships)
An online tool that checks the conformance of RDF graphs against ShEx (Shape Expressions) schemas.
● Luzzu (A Quality Assessment Framework for Linked Open Datasets) [Debattista et al., 2016]
https://eis-bonn.github.io/Luzzu/downloads.html
Allows declarative definitions of quality metrics and produces machine-readable assessment reports
based on the Dataset Quality Vocabulary.
A framework that assesses linked data quality based on test cases defined in various ways (e.g.
RDFS/OWL axioms can be converted into constraints)
Uses statistical distributions to predict the types of instances. Incoming and outgoing properties are
used as indicators for the types of resources.
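A naive sketch of this idea; the per-property type distributions and weights below are invented for illustration, not taken from the actual approach:

```javascript
// For each property, the observed distribution of the types it appears with:
const typeDistribution = {
  "schema:checkinTime": { "schema:LodgingBusiness": 0.9, "schema:Event": 0.1 },
  "schema:author": { "schema:Book": 0.6, "schema:Article": 0.4 },
};

// Score candidate types by summing the distributions of the properties
// observed on a resource, then pick the best-scoring type.
function predictType(observedProperties) {
  const scores = {};
  for (const p of observedProperties) {
    for (const [type, w] of Object.entries(typeDistribution[p] || {})) {
      scores[type] = (scores[type] || 0) + w;
    }
  }
  return Object.entries(scores).sort((a, b) => b[1] - a[1])[0]?.[0];
}
```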
○ Data Quality Indicators: various types of (meta)data that can be used to assess data quality, e.g. data
about the dataset provider or user ratings
○ Scoring Functions: a set of functions that support the calculation of assessment metrics based on the
indicators
○ Assessment Metrics: metrics like relevancy and timeliness that help users assess the quality for an
intended use
○ Aggregate Metrics: allow users to build new metrics by aggregating simple assessment metrics.
○ Error detection
○ Error correction
Errors in instance assertions isElementOf(i,t) and their corrections:
- i is not a proper instance identifier ⇒ delete the assertion or correct i
- t is not a valid type identifier ⇒ delete the assertion or correct t
- the instance assertion is semantically incorrect ⇒ delete the assertion or find the proper t
Errors in property assertions p(i1,i2) and their corrections:
- p is not a valid property ⇒ delete the assertion or correct p
- i1 is not a valid instance identifier ⇒ delete the assertion or correct i1
- i1 is not in any domain of p ⇒ delete the assertion, or add the assertion isElementOf(i1,t)
where t is in a domain of p
Range-side errors for property assertions p(i1,i2) and their corrections:
- i2 is not a valid instance identifier ⇒ delete the assertion or correct i2
- i2 is not in any range of p where i1 is an element of a domain of p ⇒ delete the assertion;
or add the assertion isElementOf(i1,t1), given that hasLocalRange(t1,p,t2) and isElementOf(i2,t2);
or add the assertion isElementOf(i2,t2), given that hasLocalRange(t1,p,t2) and isElementOf(i1,t1)
Errors in equality assertions (e.g. isSameAs(i1,i2)) and their corrections:
- i1 is not a valid instance identifier ⇒ delete the assertion or correct i1
- i2 is not a valid instance identifier ⇒ delete the assertion or correct i2
- the equality assertion is semantically wrong ⇒ delete the assertion or loosen the semantics
(e.g. replace it with a SKOS operator)
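The detection side of the tables above can be sketched for a property assertion p(i1,i2); the TBox/ABox structures below are illustrative:

```javascript
// Toy TBox/ABox used only for this sketch.
const validProperties = new Set(["worksFor"]);
const instanceTypes = { fritz: ["Person"], acme: ["Organisation"] };
const domains = { worksFor: ["Person"] };
const ranges = { worksFor: ["Organisation"] };

// Returns the list of detected errors; each error corresponds to a
// correction option in the tables above.
function checkPropertyAssertion(p, i1, i2) {
  const errors = [];
  if (!validProperties.has(p)) errors.push("p is not a valid property");
  if (!instanceTypes[i1]) errors.push("i1 is not a valid instance identifier");
  else if (!instanceTypes[i1].some((t) => (domains[p] || []).includes(t)))
    errors.push("i1 is not in any domain of p");
  if (!instanceTypes[i2]) errors.push("i2 is not a valid instance identifier");
  else if (!instanceTypes[i2].some((t) => (ranges[p] || []).includes(t)))
    errors.push("i2 is not in any range of p");
  return errors;
}
```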
An error detection and correction tool based on integrity constraints to identify conflicting and invalid
values, external information to support the constraints, and quantitative statistics to detect outliers.
Learns the relationships between data columns and validates the learned patterns with the help of
existing knowledge bases and the crowd, in order to detect errors in the data. Afterwards, it also
suggests possible repairs.
Uses statistical distributions to detect erroneous statements that connect two resources. Statements
with less frequent predicate-object pairs are selected as candidates for being wrong.
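A minimal sketch of this frequency heuristic; the data and threshold are invented for illustration:

```javascript
const statements = [
  ["ex:Innsbruck", "ex:locatedIn", "ex:Austria"],
  ["ex:Salzburg", "ex:locatedIn", "ex:Austria"],
  ["ex:Kufstein", "ex:locatedIn", "ex:Austria"],
  ["ex:Hallstatt", "ex:locatedIn", "ex:Germany"], // suspicious statement
];

// Count predicate-object pairs and flag statements whose pair is rare.
function rareStatements(triples, minCount = 2) {
  const counts = {};
  for (const [, p, o] of triples) {
    const key = `${p} ${o}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return triples.filter(([, p, o]) => counts[`${p} ${o}`] < minCount);
}

const rare = rareStatements(statements); // only the ex:Germany statement
```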
Two approaches that aim to verify RDF graphs against a specification (so-called shapes).
For a comparison of the two approaches, see Chapter 7 in [Gayo et al., 2017].
Detects and corrects syntactic errors (e.g. bad encoding, broken IRIs), replaces blank nodes with IRIs,
removes duplicates in dirty linked open data and re-publishes it in a canonical format.
A framework that tries to identify the time interval where a statement was correct. It uses external
knowledge bases and the web content to extract evidence to assess the validity of a statement for a
time interval.
● Integration of TBox
○ We assume that all data sources are mapped to schema.org
○ Non-RDF sources can be also mapped with the techniques described in Knowledge Creation
Tackling these issues can be realized by:
● Dedupe: https://github.com/dedupeio/dedupe
A Python library that uses machine learning to find duplicates in a dataset and to link two datasets.
Uses various similarity metrics to detect duplicates in a dataset or link records between two datasets based
on a given configuration. The configuration parameters can be
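A generic, configuration-driven duplicate-detection sketch in the spirit of the tools above; it mirrors none of their actual APIs, and the Jaccard-style similarity and all parameters are invented:

```javascript
// Hypothetical configuration: which fields to compare, with what weight,
// and the decision threshold.
const config = {
  threshold: 0.8,
  fields: [{ name: "name", weight: 1.0 }],
};

// Crude similarity: weighted shared-token (Jaccard) ratio per field.
function similarity(a, b) {
  let total = 0, weights = 0;
  for (const f of config.fields) {
    const ta = new Set(String(a[f.name]).toLowerCase().split(/\s+/));
    const tb = new Set(String(b[f.name]).toLowerCase().split(/\s+/));
    const inter = [...ta].filter((t) => tb.has(t)).length;
    const union = new Set([...ta, ...tb]).size;
    total += f.weight * (inter / union);
    weights += f.weight;
  }
  return total / weights;
}

function isDuplicate(a, b) {
  return similarity(a, b) >= config.threshold;
}
```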
A record linkage tool that utilizes the Concise Bounded Description* of resources for comparison.
*https://www.w3.org/Submission/2004/SUBM-CBD-20040930/#r6
A link discovery approach that benefits from metric spaces (in particular the triangle inequality) to
reduce the number of comparisons between the source and target datasets.
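The triangle-inequality trick can be sketched as follows; the numbers are illustrative:

```javascript
// In a metric space, d(x, y) >= |d(x, e) - d(e, y)| for any exemplar e.
// If precomputed distances to e already give a lower bound above the
// matching threshold, d(x, y) never needs to be computed at all.
function lowerBound(dXE, dEY) {
  return Math.abs(dXE - dEY);
}

function canSkip(dXE, dEY, threshold) {
  return lowerBound(dXE, dEY) > threshold;
}
```

The fewer pairs survive this filter, the fewer expensive full distance computations are needed between the source and target datasets.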
A link discovery tool that utilizes string similarity functions on “label properties” without prior
knowledge of the data or schema.
A link discovery tool with declarative linkage rules applying different similarity metrics (e.g. string,
taxonomic, set) that also supports policies for notifying datasets when one of them
publishes new links to others.
A framework for fusing geospatial data. It suggests fusion strategies based on two datasets with
geospatial data and a set of linked entities.
A framework that allows the application of different methods on different attributes in the same
dataset for identification of duplicates and resolves inconsistencies caused by the fusion of linked
instances.
A framework that contains a fusion module that allows users to configure conflict resolution policies
based on different functions (e.g. AVG, MAX, CONCAT) that can be applied on conflicting property
values.
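The per-property conflict-resolution policies described above (AVG, MAX, CONCAT) can be sketched as follows; the property names are hypothetical:

```javascript
// One resolution function per conflicting property.
const policies = {
  price: (vs) => vs.reduce((a, b) => a + b, 0) / vs.length, // AVG
  starRating: (vs) => Math.max(...vs),                      // MAX
  description: (vs) => vs.join(" "),                        // CONCAT
};

// Fuse linked records by applying each property's policy to all
// values observed for that property.
function fuse(records) {
  const fused = {};
  for (const prop of Object.keys(policies)) {
    const values = records.map((r) => r[prop]).filter((v) => v !== undefined);
    if (values.length) fused[prop] = policies[prop](values);
  }
  return fused;
}

const fused = fuse([
  { price: 100, starRating: 3 },
  { price: 120, starRating: 4, description: "Nice" },
]);
```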