Semantic Search Algo Primer
"The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge representation system to be exported onto the Web."
T. Berners-Lee, J. Hendler, O. Lassila, Semantic Web, 2001
Outline
1. Introduction to Semantic Web
   - Concept and History of Development
   - Architecture of Semantic Web
   - Concept of Semantic Search
2. Three Algorithms for Semantic Search
   - Minimal Answers
   - Concept Matching
   - Computing Interconnections
3. Directions for Further Research
Motivating Scenarios
A person asks a web agent:
- Book a ticket for the movie The Lives of Others at the nearest cinema that shows it this evening
- Find a suitable wine for every item on this menu. If possible, choose French
- Microwave, please go to the website of the dish manufacturer and download the optimal parameters for cooking
Timeline
1994: Foundation of the W3C, which develops standards such as HTML, URL, XML, HTTP, PNG, SVG, and CSS
1998: Tim Berners-Lee published the Semantic Web Road Map
1999: The W3C launched groups for designing Semantic Web foundations; the first version of RDF was published
2000: An American defence research institution started investigating ontology descriptions (the DAML+OIL project)
2001: The Semantic Web paper in Scientific American
2004: New version of RDF; ontology description language OWL
2006: Candidate recommendation of SPARQL, a query language for the Semantic Web
Naïve Plan
1. Develop a MEGA-language that is powerful enough to describe all human knowledge and is machine-understandable at the same time.
2. Force all web publishers to translate their websites into this language.
3. Write programs that can search in and reason about all the information on the web.
There is a more practical solution for the first step.
- Syntax for knowledge representation (done: RDF)
- Ontology description language (done: OWL)
- Web-services description language (started: OWL-S)
- Tools for reading/publishing Semantic Web documents (started: Jena, Haystack, Protege)
- Query language for data represented by RDF (started: SPARQL)
- Logic reasoning about RDF statements (to be done)
- Semantic search and semantic agents (to be done)
XRANK: Model
- The database is a set of XML documents
- There are hyperlinks between nodes
- Every node contains some text
- A query is a short list of keywords
- A complete answer is a node that, together with its descendants, contains all query terms
Minimal Answers
A node v is called a minimal answer if
$$\forall k \in Q:\ [v \text{ contains } k]\ \lor\ [\exists u,\ \text{a child of } v,\ \text{such that } u \text{ contains } k \text{ and } u \text{ is not a complete answer}]$$
Search task: find all minimal answers and rank them according to link/containment popularity.
Dewey Code
Nodes in the database have Dewey codes $n_1.n_2.\ldots.n_h$. For example, the Dewey code 7.2.12 denotes the 12th child (counting from the left) of the 2nd child of the root of the 7th document in our collection.
For every keyword, the Dewey inverted index stores a list (DIL) of the Dewey codes of the nodes that directly contain this keyword.
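As a rough illustration of the Dewey inverted index, the sketch below assigns Dewey codes during a preorder traversal and records, for each word, the codes of the nodes that directly contain it. The toy collection, the `(text, children)` node encoding, and the name `build_dil` are assumptions made for this example; they are not taken from XRANK.

```python
from collections import defaultdict

# Hypothetical toy collection: each node is (text, children).
# Documents and children are numbered from 1, as in the 7.2.12 example.
documents = [
    ("semantic web", [("rdf syntax", []), ("owl ontology", [("rdf", [])])]),
    ("xml search", [("keyword search", [])]),
]

def build_dil(documents):
    """Map every word to the Dewey codes (as tuples) of nodes that directly contain it."""
    index = defaultdict(list)

    def visit(node, code):
        text, children = node
        for word in sorted(set(text.split())):
            index[word].append(code)
        for position, child in enumerate(children, start=1):
            visit(child, code + (position,))

    for doc_id, root in enumerate(documents, start=1):
        visit(root, (doc_id,))
    return index

def format_code(code):
    """Render a Dewey code tuple in dotted form, e.g. (1, 2, 1) -> '1.2.1'."""
    return ".".join(str(part) for part in code)

dil = build_dil(documents)
rdf_codes = [format_code(code) for code in dil["rdf"]]
```

Because the traversal is preorder, each list comes out sorted by Dewey code; here `rdf_codes` is `['1.1', '1.2.1']`.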
Task: given the Dewey inverted lists for all query terms, return the list of Dewey codes of all minimal answers.
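To make the task concrete, here is a brute-force reference sketch that evaluates the minimal-answer definition directly on Dewey inverted lists. It is not the single-pass XRANK merge and it ignores ranking. It assumes that "v contains k" means direct containment and that a child "contains k" when the keyword occurs anywhere in its subtree. The toy lists and all function names are hypothetical.

```python
def in_subtree(code, root):
    """A node lies in the subtree of `root` when `root` is a prefix of its Dewey code."""
    return code[:len(root)] == root

def subtree_has(root, keyword, dil):
    return any(in_subtree(code, root) for code in dil.get(keyword, []))

def is_complete(node, query, dil):
    """Complete answer: the node together with its descendants contains every query term."""
    return all(subtree_has(node, k, dil) for k in query)

def relevant_nodes(query, dil):
    """Every node with some query keyword in its subtree: all prefixes of all postings."""
    nodes = set()
    for k in query:
        for code in dil.get(k, []):
            for depth in range(1, len(code) + 1):
                nodes.add(code[:depth])
    return nodes

def minimal_answers(query, dil):
    nodes = relevant_nodes(query, dil)
    answers = []
    for v in sorted(nodes):
        children = [u for u in nodes if len(u) == len(v) + 1 and in_subtree(u, v)]

        def covered(k):
            if v in dil.get(k, []):  # v directly contains k
                return True
            return any(subtree_has(u, k, dil) and not is_complete(u, query, dil)
                       for u in children)

        if all(covered(k) for k in query):
            answers.append(v)
    return answers

# Hypothetical Dewey inverted lists for a single document.
dil = {
    "xml": [(1, 1, 1), (1, 2), (1, 3)],
    "search": [(1, 1, 2), (1, 2)],
}
result = minimal_answers(["xml", "search"], dil)
```

On this data `result` is `[(1, 1), (1, 2)]`. The root `(1,)` is a complete answer but not a minimal one, because every child that supplies "search" is itself complete.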
- Find the lowest common ancestor w of v and u
- Consider the ancestors of u sequentially from bottom to top, adding the keywords of u to each ancestor's set in the Dewey stack
- Stop at the root, when an update leaves a set unchanged, or at the first complete node
- In the last case, append this node to the list of minimal answers
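Two ingredients of the steps above can be sketched in isolation, assuming Dewey codes are stored as tuples. The lowest common ancestor of two nodes is the longest common prefix of their codes. Bottom-up propagation can stop as soon as an ancestor's keyword set does not change, since every higher ancestor must already hold that keyword. This simplified sketch keeps a dictionary instead of the Dewey stack and omits the stop at the first complete node. The function names and postings are hypothetical.

```python
def lowest_common_ancestor(v, u):
    """Longest common prefix of two Dewey codes (empty if they are in different documents)."""
    prefix = []
    for a, b in zip(v, u):
        if a != b:
            break
        prefix.append(a)
    return tuple(prefix)

def propagate(postings):
    """postings: (dewey_code, keyword) pairs. Returns code -> keywords found in its subtree."""
    seen = {}
    for code, keyword in postings:
        # Walk from the node itself up to the document root.
        for depth in range(len(code), 0, -1):
            found = seen.setdefault(code[:depth], set())
            if keyword in found:
                break  # unchanged set: all higher ancestors already have this keyword
            found.add(keyword)
    return seen

# Hypothetical postings, sorted by Dewey code.
postings = [
    ((1, 1, 1), "xml"),
    ((1, 1, 2), "search"),
    ((1, 2), "xml"),
    ((1, 2), "search"),
    ((1, 3), "xml"),
]
seen = propagate(postings)
complete = sorted(code for code, found in seen.items() if found == {"xml", "search"})
lca = lowest_common_ancestor((7, 2, 12), (7, 2, 3, 1))
```

Here `lca` is `(7, 2)` and `complete` lists the complete nodes `(1,)`, `(1, 1)` and `(1, 2)`.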
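Similarity Formula
$$\mathrm{TreeSim}(Q, R) = \mathrm{NodeSim}(q_0, r_0) + \max_{\text{children matching } M} \sum_{(Q_i, R_j) \in M} \mathrm{TreeSim}(Q_i, R_j)$$
Here $q_0$ and $r_0$ are the roots of Q and R, and $Q_i$, $R_j$ are the subtrees rooted at their children.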
- Compute TreeSim for every pair of children of the roots of Q and R
- Find the best matching by applying the Bellman-Ford algorithm
- Complexity for l-branch trees of depth d: $C(d+1) = l^2 C(d) + l^4 + \mathrm{const}$, hence $C(d) = O(l^{2d+2}) = O(n^2 l^2)$, where $n = O(l^d)$ is the number of nodes
- In general, the time complexity is $O(n^4)$
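The recursion above can be sketched as follows. Two simplifications are assumptions of this example and not part of the original method: NodeSim is taken to be exact label equality, and the best children matching is found by brute force over permutations instead of the Bellman-Ford-based step, so it is only practical for small branching factors. Trees are encoded as hypothetical `(label, children)` pairs.

```python
from itertools import permutations

def node_sim(q_label, r_label):
    """Assumed node similarity: 1 for identical labels, 0 otherwise."""
    return 1.0 if q_label == r_label else 0.0

def tree_sim(q, r):
    """Similarity of two trees given as (label, children) pairs."""
    q_label, q_children = q
    r_label, r_children = r
    score = node_sim(q_label, r_label)
    if not q_children or not r_children:
        return score
    # TreeSim for every pair of children of the two roots.
    pair = [[tree_sim(qc, rc) for rc in r_children] for qc in q_children]
    # Put the shorter side on the rows so each permutation is a one-to-one matching.
    if len(pair) > len(pair[0]):
        pair = [list(column) for column in zip(*pair)]
    rows, cols = len(pair), len(pair[0])
    best = max(
        sum(pair[i][j] for i, j in enumerate(choice))
        for choice in permutations(range(cols), rows)
    )
    return score + best

query = ("movie", [("title", []), ("director", [])])
resource = ("movie", [("director", []), ("year", []), ("title", [])])
similarity = tree_sim(query, resource)
```

For these two trees the roots match and both query children find a partner, so `similarity` is `3.0`.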
XSEarch Model
- Database: a huge XML tree with labels on internal nodes and keywords on leaves
- Query terms: label:keyword, label:, :keyword
- Answer: a set of interconnected nodes that together satisfy all query terms
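The three query term forms can be parsed with a single split on the colon. The sketch below is only illustrative: the matching rule in `satisfies` (each non-empty part of the term must match the node's label or one of the keywords below it) is an assumption, and both function names are hypothetical.

```python
def parse_term(term):
    """Split 'label:keyword' into (label, keyword); an empty side becomes None."""
    label, separator, keyword = term.partition(":")
    if not separator:
        raise ValueError(f"query term must contain ':': {term!r}")
    return (label or None, keyword or None)

def satisfies(node_label, node_keywords, term):
    """Assumed rule: every non-empty part of the term must match the node."""
    label, keyword = parse_term(term)
    label_ok = label is None or label == node_label
    keyword_ok = keyword is None or keyword in node_keywords
    return label_ok and keyword_ok

parsed = [parse_term(t) for t in ["title:semantic", "author:", ":web"]]
```

For example, `satisfies("title", {"semantic", "web"}, ":web")` holds because the term leaves the label unconstrained.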
Interconnection
Nodes u and v are interconnected if and only if, on the shortest path between them, the only labels that may coincide are those of u and v.
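The definition can be checked directly by walking the tree path between the two nodes and comparing labels. This is a plain per-pair check, not an optimized algorithm. The parent-array encoding and the toy tree are assumptions made for the example.

```python
# Hypothetical toy tree: parent[i] is the parent of node i (None for the root).
parent = [None, 0, 0, 1, 1, 2]
label = ["library", "book", "book", "title", "author", "title"]

def path_between(u, v, parent):
    """Nodes on the tree path from u to v, endpoints included."""
    up_from_u = []
    x = u
    while x is not None:
        up_from_u.append(x)
        x = parent[x]
    position = {node: i for i, node in enumerate(up_from_u)}
    down_to_v = []
    x = v
    while x not in position:  # climb from v until the lowest common ancestor
        down_to_v.append(x)
        x = parent[x]
    return up_from_u[: position[x] + 1] + down_to_v[::-1]

def interconnected(u, v, parent, label):
    """True if no two distinct path nodes share a label, except possibly u and v."""
    path = path_between(u, v, parent)
    last = len(path) - 1
    for i in range(len(path)):
        for j in range(i + 1, len(path)):
            if label[path[i]] == label[path[j]] and (i, j) != (0, last):
                return False
    return True

same_book = interconnected(3, 4, parent, label)    # title and author of one book
across_books = interconnected(3, 5, parent, label)  # titles of two different books
```

Here `same_book` is `True`, while `across_books` is `False` because the path between the two titles passes through two distinct `book` nodes.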
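Properties of Interconnection
For u an ancestor of v:
$$\mathrm{InCon}[u, v] = \mathrm{InCon}[u, \mathrm{parent}(v)] \land (\mathrm{label}(u) \neq \mathrm{label}(\mathrm{parent}(v))) \land \mathrm{InCon}[\mathrm{son}_v(u), v] \land (\mathrm{label}(\mathrm{son}_v(u)) \neq \mathrm{label}(v))$$
where $\mathrm{son}_v(u)$ is the child of u on the path to v. Otherwise:
$$\mathrm{InCon}[u, v] = \mathrm{InCon}[u, \mathrm{parent}(v)] \land (\mathrm{label}(u) \neq \mathrm{label}(\mathrm{parent}(v))) \land \mathrm{InCon}[\mathrm{parent}(u), v] \land (\mathrm{label}(\mathrm{parent}(u)) \neq \mathrm{label}(v))$$
Using these formulas we can compute InCon for all pairs of nodes in $O(|T|^2)$ time by dynamic programming.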
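A memoized version of the two recurrences might look like the sketch below. It assumes label inequality inside the recurrences and adds a base case that the slides do not state: a node is interconnected with itself and with its parent. The toy tree and helper names are hypothetical. Memoization via `lru_cache` stands in for an explicit dynamic-programming table, and the recursion depth limits it to small trees.

```python
from functools import lru_cache

# Hypothetical toy tree: parent[i] is the parent of node i (None for the root).
parent = [None, 0, 0, 1, 1, 2]
label = ["library", "book", "book", "title", "author", "title"]

@lru_cache(maxsize=None)
def is_ancestor(u, v):
    """True if u is a proper ancestor of v."""
    p = parent[v]
    if p is None:
        return False
    return p == u or is_ancestor(u, p)

@lru_cache(maxsize=None)
def child_toward(u, v):
    """son_v(u): the child of u on the path down to v (u is a proper ancestor of v)."""
    return v if parent[v] == u else child_toward(u, parent[v])

@lru_cache(maxsize=None)
def incon(u, v):
    # Assumed base case: identical or adjacent nodes are interconnected.
    if u == v or parent[u] == v or parent[v] == u:
        return True
    if is_ancestor(v, u):
        u, v = v, u  # the relation is symmetric; make u the ancestor
    if is_ancestor(u, v):
        s = child_toward(u, v)
        return (incon(u, parent[v]) and label[u] != label[parent[v]]
                and incon(s, v) and label[s] != label[v])
    return (incon(u, parent[v]) and label[u] != label[parent[v]]
            and incon(parent[u], v) and label[parent[u]] != label[v])

n = len(parent)
table = [[incon(u, v) for v in range(n)] for u in range(n)]
```

Each table entry is derived from entries for strictly shorter paths, so every pair is evaluated once. On this tree `table[3][4]` is `True` and `table[3][5]` is `False`.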
Directions for Further Research
- Algorithms for online conceptual graph matching
- Queries using arithmetic: what is the most popular movie (according to IMDB) I have not seen yet?
- Automated inference for RDF statements?
- Semantic search for the case when the answer is not in the DB, but can be derived from it
Highlights
- XRANK: merging Dewey inverted lists in a single pass
- Concept matching: finding the tree most similar to the query tree
- XSEarch: computing interconnection by dynamic programming
References
- Course homepage. http://logic.pdmi.ras.ru/~yura/webguide.html
- L. Guo, F. Shao, C. Botev, J. Shanmugasundaram. XRANK: Ranked Keyword Search over XML Documents. http://www.cs.fiu.edu/~vagelis/classes/COP6727/publications/XRank.pdf
- S. Cohen, J. Mamou, Y. Kanza, Y. Sagiv. XSEarch: A Semantic Search Engine for XML. http://wwwdb.informatik.uni-rostock.de/Archiv/vldb2003/papers/S03P02.pdf
- J. Zhong, H. Zhu, J. Li, Y. Yu. Conceptual Graph Matching for Semantic Search. http://apex.sjtu.edu.cn/docs/iccs2002.pdf
- R. Guha, R. McCool, E. Miller. Semantic Search. http://learning.ncsa.uiuc.edu/lmarini/papers/p700-guha.pdf
- S. Harris. SPARQL query processing with conventional relational database systems. http://eprints.ecs.soton.ac.uk/11126/01/harris-ssws05.pdf
- E. Brill, S. Dumais, M. Banko. An Analysis of the AskMSR Question-Answering System. http://www.stanford.edu/class/linguist180/EMNLP2002.pdf
- T. Berners-Lee, J. Hendler, O. Lassila. Semantic Web. http://wireless.ictp.trieste.it/school 2002/lectures/canessa/0501berners-lee.ps