Data Models and Information Accesses: (Set, Graph, Map, Archetype) (Relations, XML, KML, ADL) (List)
CSV888-Special Module
Lecture 2
2015
-Subhash Bhalla
Data Interchange
Program 1: CSV (comma-separated values)
Program 2: CSV values
Information Interchange
Information System 1: Amazon Java books
Information System 2: Amazon books
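As a minimal sketch of data interchange through CSV, the snippet below has one function standing in for Program 1 (the writer) and another for Program 2 (the reader). The function names, the column names, and the sample record are assumptions made for illustration; they do not come from the slides.

```python
import csv
import io

FIELDS = ["title", "price"]  # hypothetical column names


def export_books(rows):
    """Program 1: serialize a list of dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def import_books(text):
    """Program 2: parse the CSV text back into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical record; the comma inside the title is quoted by the writer.
books = [{"title": "Java, in Brief", "price": "30"}]
csv_text = export_books(books)
print(csv_text)
print(import_books(csv_text) == books)
```

CSV carries only values, so both programs must agree on the column meaning in advance; every value also comes back as a string.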
ADT
section   employee
s1        e1
s1        e2
s2        e3
<company id="c1">
<section id="s1">
<employee id="e1"/>
<employee id="e2"/>
</section>
<section id="s2">
<employee id="e3"/>
</section>
</company>
<employeeList>
<employee id="e1">
<company id="c1"/>
<section id="s1"/>
</employee>
<employee id="e2">
<company id="c1"/>
<section id="s1"/>
</employee>
<employee id="e3">
<company id="c1"/>
<section id="s2"/>
</employee>
</employeeList>
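The two documents above hold the same facts in two shapes: a hierarchy (company, section, employee) and a flat list of employees. A minimal sketch of the conversion from the first shape to the second, using Python's standard `xml.etree.ElementTree`, could look as follows. The function name `to_employee_list` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

COMPANY_XML = """
<company id="c1">
  <section id="s1">
    <employee id="e1"/>
    <employee id="e2"/>
  </section>
  <section id="s2">
    <employee id="e3"/>
  </section>
</company>
"""


def to_employee_list(company_xml):
    """Flatten company/section/employee nesting into an employeeList element."""
    company = ET.fromstring(company_xml)
    root = ET.Element("employeeList")
    for section in company.findall("section"):
        for employee in section.findall("employee"):
            entry = ET.SubElement(root, "employee", id=employee.get("id"))
            # The parent information moves from nesting into child elements.
            ET.SubElement(entry, "company", id=company.get("id"))
            ET.SubElement(entry, "section", id=section.get("id"))
    return root


employee_list = to_employee_list(COMPANY_XML)
print(ET.tostring(employee_list, encoding="unicode"))
```

In the flat form the company and section identifiers are repeated for each employee, which is the price paid for list-style access.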
Comparison of methods
- Old models: Hierarchical Model, a variation over the list structure
- Started from the bottom: query on list
- Network Model: a variation over list
Knowledge Management: ontologies, semantic nets
Bioinformatics: molecular structures, bio-pathways
CAD: electrical circuits, IC designs
Directed Graphs: structural similarity, substructure, reachability
Weighted Graphs: shortest paths, best matching substructure
Labeled Graphs: labeled structural similarity, unlabeled structural similarity
Structural similarity
Given a graph database G = {G1, G2, ..., Gn}, a query
graph Q, and a threshold t, return all graphs Gi where the
edit distance between Q and Gi is at most t.
The edit distance between two graphs is the number of
edge modifications (additions, deletions) required to
rewrite one graph into the other.
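The definition above can be sketched in a few lines under one strong simplifying assumption: all graphs use the same vertex identifiers, so no search over vertex correspondences is needed and the edit distance reduces to the size of the symmetric difference of the edge sets. In the general case the best vertex mapping must also be found, which is much more expensive. The graph names and edges below are hypothetical.

```python
def edge_set(edges):
    """Normalize an undirected edge list into a set of frozensets."""
    return {frozenset(edge) for edge in edges}


def edit_distance(g1, g2):
    """Edge additions plus deletions needed to rewrite g1 into g2.

    Assumes both graphs share the same vertex identifiers.
    """
    return len(edge_set(g1) ^ edge_set(g2))


def similar_graphs(database, query, t):
    """Return the names of all graphs within edit distance t of the query."""
    return [name for name, g in database.items() if edit_distance(query, g) <= t]


# Hypothetical graph database
database = {
    "G1": [("a", "b"), ("b", "c")],
    "G2": [("a", "b"), ("b", "c"), ("c", "d")],
    "G3": [("a", "d")],
}
query = [("a", "b"), ("b", "c"), ("c", "a")]

print(similar_graphs(database, query, 2))  # ['G1', 'G2']
```

Here G1 differs from the query by one edge, G2 by two, and G3 by four, so only G1 and G2 pass the threshold t = 2.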
Data Graph
- Storage Models for Graphs
- Data Models for Graph Databases
- Structural Indexes
- Mining Frequent Subgraphs
gSpan (graph-based Substructure pattern mining)
FBT (Graph Data and Mining)
Structural Queries
In graph databases, structure matching has to be performed
against a set of graphs!
The method of storage, pre-processing, and index structures
are crucial if structural searches are to be practical.
[Figure: example graphs with labels A, B, C, D and p, q, r, s, t]
<node id="B">
  <node id="C">
  </node>
  <node id="D">
  </node>
</node>
<node id="C">
</node>
<node id="D">
</node>
Web Documents
[Figure: ISBN or keyword; Amazon; customer reviews; book catalogs; book data; XML DB]
Web pages
News
Video
Maps
Blogs
[Figure: system architecture]
- External services: Technorati Search API; Yahoo! Search Web Service; Amazon E-Commerce Service
- Operations: Retrieval, Delete, Update, Search, Search & Suggestion, Registration
- Data: book catalogs, alternate keywords, update time, mark up, current state
- Storage: XML DB Handler; KB (book data); IRUN (need data)
Amazon
Web Service
<Book>
<Catalog/>
</Book>
RDB
XML DB
Web Document
C) Frequent Structural Change
Add comment
<Book>
<Catalog/>
</Book>
<Book>
<Catalog/>
<Comment/>
</Book>
Relational DB:
XML DB:
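The structural change above (adding a comment to a book) can be sketched on the XML side with Python's standard `xml.etree.ElementTree`. The function name `add_comment` and the comment text are assumptions for illustration. In a relational store the same change would typically require altering the schema or adding a table.

```python
import xml.etree.ElementTree as ET


def add_comment(book_xml, text):
    """Append a <Comment> child to a <Book> document and return the new XML."""
    book = ET.fromstring(book_xml)
    comment = ET.SubElement(book, "Comment")
    comment.text = text  # hypothetical comment body
    return ET.tostring(book, encoding="unicode")


before = "<Book><Catalog/></Book>"
after = add_comment(before, "Good introduction")
print(after)
```

No schema migration is involved: documents with and without a Comment element can sit side by side in the same collection.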
Update information:
- Added time
- Commented time
- Recommended time
- Searched time
Current state of a book
Semi-structured Data
Web Data
SQL to XML
SQL:2003 supports nested XML output
Each tuple maps to an XML element
<bank>
  <account>
    <row>
      <account-number> A-101 </account-number>
      <branch-name> Downtown </branch-name>
      <balance> 500 </balance>
    </row>
    <row>
      more data ...
    </row>
  </account>
  ...
</bank>
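The mapping shown above (database as root element, relation as child, one row element per tuple, one element per column) can be sketched in Python. This is an illustration of the mapping only, not the SQL/XML functions of any database product; the function name `relation_to_xml` is an assumption.

```python
import xml.etree.ElementTree as ET


def relation_to_xml(db_name, relation_name, columns, tuples):
    """Map each tuple of a relation to a <row> element with one child per column."""
    db = ET.Element(db_name)
    relation = ET.SubElement(db, relation_name)
    for tup in tuples:
        row = ET.SubElement(relation, "row")
        for column, value in zip(columns, tup):
            ET.SubElement(row, column).text = str(value)
    return db


bank = relation_to_xml(
    "bank",
    "account",
    ["account-number", "branch-name", "balance"],
    [("A-101", "Downtown", 500)],
)
print(ET.tostring(bank, encoding="unicode"))
```

Column names become element names, so they must be valid XML names; values lose their SQL types and are written as text.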
E.g.
<university>
<department>
<dept_name> Comp. Sci. </dept_name>
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<course>
<course_id> CS-101 </course_id>
<title> Intro. to Computer Science </title>
<dept_name> Comp. Sci. </dept_name>
<credits> 4 </credits>
</course>
</university>
<university-3>
  <department dept_name="Comp. Sci.">
    <building> Taylor </building>
    <budget> 100000 </budget>
  </department>
  <department dept_name="Biology">
    <building> Watson </building>
    <budget> 90000 </budget>
  </department>
  <course course_id="CS-101" dept_name="Comp. Sci."
          instructors="10101 83821">
    <title> Intro. to Computer Science </title>
    <credits> 4 </credits>
  </course>
  ...
  <instructor IID="10101" dept_name="Comp. Sci.">
    <name> Srinivasan </name>
    <salary> 65000 </salary>
  </instructor>
  ...
</university-3>
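In the attribute-based form above, the course refers to its instructors by identifier instead of nesting them. A minimal sketch of following those references in Python is given below. It assumes the attribute names are written `course_id`, `dept_name`, `instructors`, and `IID`, and it keeps only the instructor that the slide actually shows, so the second reference stays unresolved. The function name `instructors_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

UNIVERSITY_XML = """
<university-3>
  <course course_id="CS-101" dept_name="Comp. Sci." instructors="10101 83821">
    <title>Intro. to Computer Science</title>
    <credits>4</credits>
  </course>
  <instructor IID="10101" dept_name="Comp. Sci.">
    <name>Srinivasan</name>
    <salary>65000</salary>
  </instructor>
</university-3>
"""


def instructors_of(root, course_id):
    """Resolve the space-separated instructor references of one course to names."""
    by_id = {i.get("IID"): i.findtext("name") for i in root.findall("instructor")}
    for course in root.findall("course"):
        if course.get("course_id") == course_id:
            # A reference with no matching instructor element resolves to None.
            return [by_id.get(ref) for ref in course.get("instructors").split()]
    return []


root = ET.fromstring(UNIVERSITY_XML)
print(instructors_of(root, "CS-101"))
```

The references turn the tree into a graph: the link from course to instructor is not expressed by nesting and has to be resolved by lookup.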
Summary - 1
1. Content model: usage, interface, query (users)
2. Representation
   1. storage level
   2. content level
3. XML: widely researched and supported (authoring, editing, parsing, ...)
Summary - 2
1. XML query tools
   XPath; XQuery; XSLT (all use XPath)
   tree / arbitrary graph
2. SQL can query GIS data and relational data
   (XML converted to relational form)
3. Query interfaces: Type A and Type B
4. EHRs: AQL (uses SQL structure + XML addresses); XML templates
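As a small illustration of path-based querying, the sketch below runs an XPath-style expression over a cut-down university document. It relies on the limited XPath subset built into Python's `xml.etree.ElementTree`, not a full XPath or XQuery engine, and the helper name `course_titles` is an assumption.

```python
import xml.etree.ElementTree as ET

UNIVERSITY_XML = """
<university>
  <department>
    <dept_name>Comp. Sci.</dept_name>
    <building>Taylor</building>
    <budget>100000</budget>
  </department>
  <course>
    <course_id>CS-101</course_id>
    <title>Intro. to Computer Science</title>
    <dept_name>Comp. Sci.</dept_name>
    <credits>4</credits>
  </course>
</university>
"""


def course_titles(root, course_id):
    """Select the titles of all courses whose course_id child equals course_id."""
    path = f".//course[course_id='{course_id}']/title"
    return [title.text for title in root.findall(path)]


root = ET.fromstring(UNIVERSITY_XML)
print(course_titles(root, "CS-101"))  # ['Intro. to Computer Science']
```

The path names a location in the tree and a condition on a child element, which is the addressing style that XQuery and XSLT build on.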
Summary - 3
1. SQL for map data
XML
XML Examples
- Internet: RSS, Atom
- XHTML
- Web service formats: SOAP, WSDL
- File formats: Microsoft Office, OpenOffice, Apple's iWork
Research Issues
1. Data: chemistry structures, EHRs.
   Structural information is captured in a
   tree model or a graph model for querying.
2. The graph model is more flexible.
3. The tree model is simple: single root, no
   cycles, unique path from the root to each leaf.
   Graph: pointers to ancestors and descendants.
4. Semi-structured data: schema sharing
YAML
Designed around the common data types of different programming
languages.
http://www.yaml.org/spec/1.2/spec.html (and many more)
CANDLE MARKUP
Example ( XML )
<menu id="file" value="File">
<popup>
<menuitem value="New" onclick="CreateNewDoc()" />
<menuitem value="Open" onclick="OpenDoc()" />
<menuitem value="Close" onclick="CloseDoc()" />
</popup>
</menu>
Example ( JSON )
{"menu": {
"id": "file", "value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
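The two examples above encode the same menu. A minimal sketch of converting the XML form into the JSON form with Python's standard library follows. The mapping rule (attributes become keys, repeated `menuitem` elements become an array) is written for this one document shape and is an assumption, not a general XML-to-JSON convention.

```python
import json
import xml.etree.ElementTree as ET

MENU_XML = """
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
"""


def menu_to_dict(menu_xml):
    """Turn the menu document into the nested dict used by the JSON example."""
    menu = ET.fromstring(menu_xml)
    # Each repeated <menuitem> becomes one object in a JSON array.
    items = [dict(item.attrib) for item in menu.findall("popup/menuitem")]
    return {"menu": {**menu.attrib, "popup": {"menuitem": items}}}


print(json.dumps(menu_to_dict(MENU_XML), indent=2))
```

The XML form distinguishes attributes from child elements, while JSON has only objects and arrays, so a converter has to choose a convention for each.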
Conclusions
1. Information Interchange is common
3. Storage, transform, query