
Data Models and Information Accesses - : (Set, Graph, Map, Archetype) (Relations, XML, KML, ADL) (List)

This document discusses data models and information access. It begins by covering tree structured data and how it is represented on both the client and storage sides. It then discusses the relationship between programs and data, and how structured data can be represented as objects in languages like Java and C++ or in a database using a data dictionary. It also covers semi-structured web data. The document then discusses different data interchange formats like CSV, database dumps, and newer standards like JSON, YAML, and XML. It covers how information can be represented as objects, documents, or graphs. The rest of the document discusses older data models based on lists, basic data elements like sets and bags, representing data as trees or graphs, abstract data types, and

Uploaded by

Surbhi Jain

Data Models and Information Accesses
CSV888-Special Module
Lecture 2

2015
(set, graph, map, archetype)
(relations, XML, KML, ADL)
(list)
-Subhash Bhalla
1

Application Design and Development

Application Programs and User Interfaces


Tree Structured Data
(a) Client-side: XHTML, CSS, JavaScript
(b) Storage side: RDBMS, XML, JSON, YAML, DOT
Web Fundamentals: client + web server + database server
Servlets and JSP, PHP
Application Architectures
Rapid Application Development
Application Performance
Application Security
Encryption and Its Applications
2

Programs and Data


Program → Data
Direct access to the data/medium
(format: CSV, space-separated; columns, data types, variables)
(hardwired to data)
Program [structured data: Java/C++ objects]
Database: 3-level data dictionary
- Structured data (DBMS (schema): db (data))
- sets, relations, lists, bags

Web Data: semi-structured data (LaTeX, HTML, ...)
- Objects, object-class, sets, ...
3

Data Interchange
Program 1: CSV (comma-separated values)
Program 2: CSV values

Program: data dump
Stacks, arrays, abstract data types
ORACLE database dump

PROGRAMMING to upload and process:
requires knowledge of syntax and semantics
NEW TRENDS (data sharing among multiple applications):
YAML, JSON, XML, Candle
4
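A minimal sketch of CSV interchange between two programs, using Python's standard `csv` module. The record fields (`id`, `title`, `price`) and the function names are illustrative assumptions, not part of the lecture. It shows why the receiving program needs prior knowledge of the syntax (column order, quoting) and semantics (what each column means).

```python
import csv
import io

# Hypothetical records held by "Program 1" (names and values are illustrative).
books = [
    {"id": "b1", "title": "Data Models", "price": "30"},
    {"id": "b2", "title": "XML, JSON and YAML", "price": "25"},
]

def export_csv(rows):
    """Program 1: serialize records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def import_csv(text):
    """Program 2: parse CSV text; it must already know what each column means."""
    return list(csv.DictReader(io.StringIO(text)))

csv_text = export_csv(books)
received = import_csv(csv_text)
```

Note that every value arrives as a string: CSV carries no data types, so the receiving program has to supply them itself.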

Information Interchange
Information System 1: Amazon Java books
Information System 2: Amazon books

1. Objects: books, rooms with id and (x,y) coordinates, students, courses, ...
2. Documents: web documents, financial statements of companies, ...
3. Graphs and structures: proteins, maps, ...
Information → sets (RDB), relation → DB
Tree-structured data (XML)
Syntax and semantics
Tree-structured data: XML, JSON, Candle Markup
5

Old Data Models: List Processing

1. Hierarchical Model: tree (rooted, acyclic, unique path from root to leaf)
2. Network Model: linked list
Both 1) and 2) are influenced by list structure
6

Basic Data Elements


1. Set: no duplicates and no order
[(3,1,1) is not a set; set (3,1) is the same as set (1,3)]
2. Bag: duplicates allowed, no order
[(3,1,1) is the same as (1,3,1)]
3. List: duplicates allowed, has order
[(3,1,1) is not the same as (1,3,1)]
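The three element types can be checked directly in Python. This is a small sketch that uses the built-in `set` and `list` and takes `collections.Counter` as a stand-in for a bag (Python has no dedicated bag type).

```python
from collections import Counter

# Set: no duplicates, no order.
sets_equal = {3, 1} == {1, 3}
set_size = len({3, 1, 1})  # the duplicate 1 collapses

# Bag (multiset): duplicates count, order does not.
bag_a = Counter([3, 1, 1])
bag_b = Counter([1, 3, 1])
bags_equal = bag_a == bag_b
bag_differs_from_set = bag_a != Counter([3, 1])  # multiplicity matters

# List: duplicates count and order matters.
lists_equal = [3, 1, 1] == [1, 3, 1]
```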
7

Content has no form: an island

1. Set
Set = Relation
2. Stored over a list
3. List processed by von Neumann architecture / Turing machine
8

Abstract Data Type (ADT)


Abstract data type (ADT): a mathematical model for a certain class of data structures that have similar behavior, or for certain data types of one or more programming languages that have similar semantics.
An abstract data type is defined indirectly, by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations.
Example: an abstract stack defined by three operations: push, pop, and peek.
When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many items have been pushed onto the stack, and that the stack uses a constant amount of storage for each element.
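One possible implementation of the abstract stack, sketched in Python. The class name, the `is_empty` helper and the choice of a Python list as the hidden representation are assumptions made for illustration; any structure that honors push, pop and peek would serve equally well.

```python
class Stack:
    """A stack ADT: clients rely only on push, pop and peek."""

    def __init__(self):
        # Hidden representation; it could be swapped for a linked list
        # without changing any client code.
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items


s = Stack()
s.push(1)
s.push(2)
top = s.peek()     # looks at the top item without removing it
popped = s.pop()   # removes and returns the top item
```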

ADT

Abstract data types are purely theoretical entities,
1. used to simplify the description of abstract algorithms,
2. to classify and evaluate data structures,
3. to formally describe the type systems of programming languages.
4. An ADT may be implemented by specific data types or data structures, in many ways and in many programming languages;
5. or described in a formal specification language.
6. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints.
7. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs.
10

Content: Table → Set/bag (represent as?)

company  section  employee
c1       s1       e1
c1       s1       e2
c1       s2       e3

<company id="c1">
  <section id="s1">
    <employee id="e1"/>
    <employee id="e2"/>
  </section>
  <section id="s2">
    <employee id="e3"/>
  </section>
</company>
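The step from the flat table to the nested tree can be sketched with Python's standard `xml.etree.ElementTree`. The function name `rows_to_tree` is hypothetical; the rows are the three (company, section, employee) tuples from the slide.

```python
import xml.etree.ElementTree as ET

# Rows of the (company, section, employee) table from the slide.
rows = [("c1", "s1", "e1"), ("c1", "s1", "e2"), ("c1", "s2", "e3")]

def rows_to_tree(rows):
    """Group flat rows into company > section > employee elements.

    Returns one <company> element per distinct company id.
    """
    companies = {}
    sections = {}
    for company_id, section_id, employee_id in rows:
        company = companies.get(company_id)
        if company is None:
            company = ET.Element("company", id=company_id)
            companies[company_id] = company
        key = (company_id, section_id)
        section = sections.get(key)
        if section is None:
            section = ET.SubElement(company, "section", id=section_id)
            sections[key] = section
        ET.SubElement(section, "employee", id=employee_id)
    return list(companies.values())

company = rows_to_tree(rows)[0]
xml_text = ET.tostring(company, encoding="unicode")
```

The tree stores the company and section ids once per group, whereas the table repeats them in every row.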
11

Data in Tree (list)


<sectionList>
<section id="s1">
<company id="c1"/>
<employee id="e1"/>
<employee id="e2"/>
</section>
<section id="s2">
<company id="c1"/>
<employee id="e3"/>
</section>
</sectionList>

<employeeList>
<employee id="e1">
<company id="c1"/>
<section id="s1"/>
</employee>
<employee id="e2">
<company id="c1"/>
<section id="s1"/>
</employee>
<employee id="e3">
<company id="c1"/>
<section id="s2"/>
</employee>
</employeeList>
12

Relational Model (set): E. F. Codd, 1970 (IBM)

A. Two levels
1. User: sets and set operations
---------------------------------------
2. Storage: list
- User [needs elements] (no navigation)
- Storage [stores over a list; provides elements through an index or list search]
- User [needs set operations]: do them on your own
13

Table form: set (product set)

company  section  employee
c1       s1       e1
c1       s1       e2
c1       s2       e3

Table form of data: set, or bag
Operations: set operations
Query language
14

Comparison of methods
Old models:
Hierarchical Model: variation over list structure
Started from the bottom: query on list
Network Model: variation over list

Relational Model: top-down approach
Set + set operations: two layers
No navigation as in old models
Influence over query operations
15

Text book, research book

1. SQL
a) 1971-1976
b) SQL 2 (1992)
c) SQL 3 (Object-Relational Data Models) (1999)
d) SQL 4 (Web Data, XML) (2003)
e) Web Services (data resource sharing) (2005-2010)
f) Semantic Web: using the web (2005-2020)
16

[ New ] DB Forms of content


1. Web documents
2. Maps: Google Maps, Yahoo Maps, MS maps
3. Bio-medical informatics: web data resources (complex chains of molecules in proteins)
4. Electronic Health Records
17

[ New ] DB Forms of content


Content has a form (structure)
(not islands of data)
Representation:
1. list (too simple)
2. set
3. graph
Low level (disk/memory): list
Processing: content; intermediate representation (maybe); storage (list)
18

[ New ] DB Forms of Content


Web documents: XML
Web-based maps: KML (Google)
Bio-medical data resources: XML, or similar to XML
Electronic Health Records: ADL (similarities with XML, used in conjunction with XML)
1. Document form: graph (not set)
19

Content (not an island): Graph Data

A graph G = (V,E) is a collection of nodes (vertices) and edges.
A graph represents a relationship structure among different data elements.
A graph database is a collection of different graphs representing different relationship structures.
Notes:
a) storage level: list structures, b) multiple levels,
c) intermediate forms (XML, lists)
20

Compare: Graph database and (set) Relational database

A relational database maintains different instances of the same relationship structure (represented by its ER schema).
A graph database maintains different relationship structures:
Web documents, maps, bio-medical informatics, Electronic Health Records
Stored in intermediate forms: XML, KML, ADL
21

Queries over New DB Contents


Attribute Queries (Type A)
Queries over attributes and values in nodes and edges. (Equivalent to a relational query within a given schema.)

Structural Queries (Type B)
[Not the main focus of our discussion]
Queries over the relationship structure itself. Examples: structural similarity, substructure, template matching, etc.
22

Graph Database Applications(Type A and Type B)


Software Engineering
UML diagrams, flowcharts, state machines,

Knowledge Management
Ontologies, Semantic nets,

Bioinformatics
Molecular structures, bio-pathways,

CAD
Electrical circuits, IC designs,

Cartography, XML Bases, HTML Webs,


23

Structural Queries on Graph Data


(Type B)
Undirected Graphs
Structural similarity, substructure

Directed Graphs
Structural similarity, substructure, reachability

Weighted Graphs
Shortest paths, best matching substructure

Labeled Graphs
Labeled structural similarity, unlabeled
structural similarity
24

Structural Queries (Type B)


Substructure query
Given a graph database G = {G1, G2, ..., Gn} and a query graph Q, return all graphs Gi where Q is a subgraph of Gi.

Structural similarity
Given a graph database G = {G1, G2, ..., Gn}, a query graph Q and a threshold t, return all graphs Gi where the edit distance between Q and Gi is at most t.
The edit distance between two graphs is the number of edge modifications (additions, deletions) required to rewrite one graph into the other.
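Both queries can be sketched in a few lines under one strong simplifying assumption: every node carries a unique label shared across all graphs, so no node matching is needed. General subgraph matching and graph edit distance are far more expensive. Under that assumption a subgraph test is an edge-subset test, and the edit distance is the size of the symmetric difference of the edge sets. The graph names, edges and function names below are made up for illustration.

```python
def edges(*pairs):
    """Build an undirected edge set from (u, v) pairs."""
    return {frozenset(p) for p in pairs}

# A tiny graph database; names and edges are illustrative only.
database = {
    "G1": edges(("A", "B"), ("B", "C"), ("A", "C")),
    "G2": edges(("A", "B"), ("B", "C")),
    "G3": edges(("A", "D")),
}

def substructure_query(db, q):
    """Return names of graphs that contain every edge of q."""
    return [name for name, g in db.items() if q <= g]

def edit_distance(g, h):
    """Edge additions plus deletions needed to turn g into h."""
    return len(g ^ h)

def similarity_query(db, q, t):
    """Return names of graphs within edit distance t of q."""
    return [name for name, g in db.items() if edit_distance(q, g) <= t]

query = edges(("A", "B"), ("B", "C"))
contains_query = substructure_query(database, query)
near_query = similarity_query(database, query, 1)
exact_match = similarity_query(database, query, 0)
```

Even in this simplified form, each query scans every graph in the database, which is why index structures matter for structural search.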
25

Data Graph
- Storage Models for Graphs
- Data Models for Graph Databases
- Structural Indexes
- Mining Frequent Subgraphs
gSpan (graph-based Substructure
pattern mining)
FBT (Graph Data and Mining )
26

Structural Queries
In graph databases, structure matching has to be performed against a set of graphs!
The method of storage, pre-processing and index structures are crucial
(if structural searches are to be practical)
27

Storing Graph Data set


Attributed Relational Graphs (ARGs)
[Figure: a labeled graph over nodes A, B, C, D, stored as one row per edge (node, node, edge label)]

A  B  q
B  C  s
B  D  t
A  C  p
A  D  r

28

Storing Graph Data


ARGs
ARGs store a graph as a set of rows, each depicting an edge
Amenable to storage in an RDBMS and easy attribute searches using SQL
New query languages (research, Type A)
Costly structural searches, requiring complex nesting of SELECT statements
Each graph needs a separate table
Type B (VLDB, SIGMOD, many forums)
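A sketch of the ARG idea using Python's built-in `sqlite3` module. The table name `edge`, the column names `src`, `dst` and `label`, and the reading of each row as a directed edge are assumptions; the five rows are the ones listed on the previous slide. The first query is a simple attribute search, while the second shows how even a two-edge path already needs a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?)",
    [("A", "B", "q"), ("B", "C", "s"), ("B", "D", "t"),
     ("A", "C", "p"), ("A", "D", "r")],
)

# Attribute query (Type A): which edges leave node A?
out_of_a = conn.execute(
    "SELECT dst, label FROM edge WHERE src = 'A' ORDER BY dst"
).fetchall()

# A two-step path needs a self-join; longer patterns need deeper nesting.
two_step = conn.execute(
    "SELECT e1.src, e2.dst FROM edge e1 JOIN edge e2 ON e1.dst = e2.src "
    "ORDER BY e1.src, e2.dst"
).fetchall()
```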
29

1. Storing Graph Data in XML


(rooted tree, acyclic, unique path from root)
<node id="A">
  <node id="B">
    <node id="C">
    </node>
    <node id="D">
    </node>
  </node>
  <node id="C">
  </node>
  <node id="D">
  </node>
</node>

30

2. Storing Graph Data in XML


(arbitrary graph)
XML with IDs and IDREFS:
<node id="A" adj="C D">
  <node id="B">
    <node id="C">
    </node>
    <node id="D">
    </node>
  </node>
</node>

31

Storing Graph Data


XML (with or without IDREFS)
Reduces graph database to an XML base
Use XPath / XQuery engines for attribute queries and structural queries
Widely supported by a variety of XML parsers
Costly structure/sub-structure matching
Needs distinction between IDREF edges and hierarchy edges
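The distinction between the two kinds of edges can be sketched with `xml.etree.ElementTree`. The document below is the arbitrary-graph example with attribute values quoted so that it parses. Treating `adj` as a space-separated list of references is a convention assumed here, since ElementTree does not validate ID/IDREFS types, and `collect_edges` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The slide's second example, with attribute values quoted so it parses.
doc = """
<node id="A" adj="C D">
  <node id="B">
    <node id="C"></node>
    <node id="D"></node>
  </node>
</node>
"""

def collect_edges(root):
    """Return (hierarchy_edges, reference_edges) as lists of (from, to) ids."""
    hierarchy, references = [], []
    for parent in root.iter("node"):
        # Hierarchy edges come from element nesting.
        for child in parent.findall("node"):
            hierarchy.append((parent.get("id"), child.get("id")))
        # Reference edges come from the adj attribute.
        for target in parent.get("adj", "").split():
            references.append((parent.get("id"), target))
    return hierarchy, references

hierarchy_edges, reference_edges = collect_edges(ET.fromstring(doc))
```

Together the two lists recover the five edges of the graph, but a query engine has to know which list each edge came from.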

32

Contents- 1. Web Documents


1. Input ISBNs or keywords (of author or title).
2. Send request data to Amazon Web Service.
3. Receive response.
4. Extract documents from the response.
5. Add update data and state data to the book catalog.
6. Store these data into KB.
33

KB: Data Structure


Book
- URL of image: text
- ASIN: number (1)
- Title: text
- Average rating: number (2)
- Author name: text
- URL of detail page: text
- Price: text
- Publisher name: text
- Publication date: text
- Number of pages: text
- Sales rank: number
...

34

Web Documents

Web Services / Web API
1. Amazon E-Commerce Service
2. Yahoo! Search Web Service
3. Google AJAX Search API
4. Technorati Search API
[Figure: an ISBN or keyword is sent to Amazon; customer reviews and book catalogs are returned]

XML DB: DBMS for XML
1. Knowledge Base (KB)
A collection of book data for BUS.
2. Information Repository of Users' Needs (IRUN)
A collection of data that consists of users' interests and needs.
[Figure: users' data and book data are stored in the XML DB]
35

Amazon E-Commerce Service

Product information (e.g. catalogs, reviews, ratings) retrieval for:
1. Books
2. Music
3. DVD
4. Electronics
5. Kitchen
6. Software
7. Video Games
8. Toys
36

Yahoo! Search Web Service


Web information (e.g. URL, content
or hit count) retrieval:
1. Web pages
2. Images
3. Movies

37

Google AJAX Search API


Embed a search box in a web page and display search results of:
1. Web pages
2. News
3. Video
4. Maps
5. Blogs

38

Book Utilization System


[Figure: architecture of the Book Utilization System, in three layers.
Web Service Handler: Google AJAX Search API, Technorati Search API, Yahoo! Search Web Service, Amazon E-Commerce Service.
Web User Interface: blogs (display, retrieval); book catalogs (search, registration, update, delete, mark-up; update time, current state); alternate keywords (search and suggestion); book reviews and needs (retrieval, evaluation, registration; evaluated value, users' interests and needs).
XML DB Handler: KB (book data), IRUN (need data).]

39

A). Direct Storage of XML

Amazon Web Service
<Book>
<Catalog/>
</Book>

XML data can be directly stored in XML DB.
[Figure: RDB and XML DB as storage targets]

40

Semi-structured Data Handling


B). Semi-structured Data Handling
Book data: Catalog + Comment; Catalog + Articles; Catalog only
The structure of book data is different from book to book.
<Book>
<Book>
<Articles/>
<Catalog/>
</Book>
</Book>
<Book>
<Catalog/>
<Comment/>
</Book>
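A sketch of how a program can cope with books whose structure differs from record to record: it inspects which child elements are present instead of assuming a fixed schema. The `<Books>` wrapper, the `id` attributes and the three sample books are assumptions made for illustration, chosen to mirror the three shapes named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: each book carries a different set of child elements.
doc = """
<Books>
  <Book id="b1"><Catalog/><Articles/></Book>
  <Book id="b2"><Catalog/></Book>
  <Book id="b3"><Catalog/><Comment/></Book>
</Books>
"""

def describe(book):
    """List the parts a book actually has instead of assuming a fixed schema."""
    return sorted(child.tag for child in book)

shapes = {book.get("id"): describe(book) for book in ET.fromstring(doc)}
with_comment = [b for b, parts in shapes.items() if "Comment" in parts]
```

In a relational table the missing parts would have to be stored as NULL columns or split into extra tables; the XML form simply omits them.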

XML DB
1

41

Web Document
C). Frequent Structural Change

Add comment

<Book>
<Catalog/>
</Book>

<Book>
<Catalog/>
<Comment/>
</Book>

Relational DB:

XML DB:
1

42

Content 1. Web Document

Update information:
- Added time
- Commented time
- Recommended time
- Searched time
Current state of a book

Comment added by a user.

43

Semi-structured Data
Web Data

Information interchange, exchange

document structure
Semi-structured Data
{ name: Alan, tel: 2157786,
email: [email protected] }
44

Web Data - Labels


Duplicate labels
{ name: Alan, tel: 2157786, tel: 3782535 }
Many labels or missing labels
{
person:
{name: Alan,tel:2157786, email: [email protected]},
person:
{name: {first: Sara, last:Green},
tel: 2136877, email: [email protected]},
person:
{name: Fred, tel: 4257783, Height: 183 }
}
45

A relation and its XML form


Fruits-table = (fruit-name: string(6), color: string(5))
[ Apple, Green ]
[ Apple, Red ]
<?xml version="1.0" standalone="yes"?>
<Fruits-table>
<FRUITS>
<FRUIT> <NAME> Apple </NAME>
<COLOR> Green </COLOR>
</FRUIT>
<FRUIT> <NAME> Apple </NAME>
<COLOR> Red </COLOR>
</FRUIT>
</FRUITS>
</Fruits-table>
46

SQL Extensions (SQL 2003)


xmlelement creates XML elements
xmlattributes creates attributes

select xmlelement (name "account",
    xmlattributes (account_number as account_number),
    xmlelement (name "branch_name", branch_name),
    xmlelement (name "balance", balance))
from account

47

SQL XML
SQL 2003: nested XML output
Each tuple becomes an XML element
<bank>
<account>
<row>
<account-number> A-101
</account-number>
<branch-name> Downtown </branch-name>
<balance>
500
</balance>
</row>
<row>
more data .. . .
</row>
</account>
. . .. . . . .
</bank>

48

Data in XML SQL 2003

Ability to specify new tags and create nested tag structures: XML is a way to exchange data (db) and documents.
XML is used extensively in data exchange applications.
Tags make data (relatively) self-documenting.

E.g.

<university>
<department>
<dept_name> Comp. Sci. </dept_name>
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<course>
<course_id> CS-101 </course_id>
<title> Intro. to Computer Science </title>
<dept_name> Comp. Sci </dept_name>
<credits> 4 </credits>
</course>
</university>
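Because the tags name each value, a receiving program can pick out fields by path without knowing column positions. A brief sketch with `xml.etree.ElementTree`, using the university document above; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

doc = """
<university>
  <department>
    <dept_name> Comp. Sci. </dept_name>
    <building> Taylor </building>
    <budget> 100000 </budget>
  </department>
  <course>
    <course_id> CS-101 </course_id>
    <title> Intro. to Computer Science </title>
    <dept_name> Comp. Sci </dept_name>
    <credits> 4 </credits>
  </course>
</university>
"""

root = ET.fromstring(doc)

# Tags name each value, so a reader can select fields by path.
titles = [t.text.strip() for t in root.findall("./course/title")]
budget = int(root.findtext("./department/budget"))
tags = sorted({child.tag for child in root})
```

The values still arrive as text, so the numeric budget has to be converted explicitly.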

49

Data in XML (new std. SQL2003)

<university-3>
<department dept_name="Comp. Sci.">
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<department dept_name="Biology">
<building> Watson </building>
<budget> 90000 </budget>
</department>
<course course_id="CS-101" dept_name="Comp. Sci." instructors="10101 83821">
<title> Intro. to Computer Science </title>
<credits> 4 </credits>
</course>
...
<instructor IID="10101" dept_name="Comp. Sci.">
<name> Srinivasan </name>
<salary> 65000 </salary>
</instructor>
...
</university-3>

50

1. Contents- web documents


Web document                Semi-structured data   Web query
Multiple Web documents      Semi-structured data   Web mining
Web structure and links     Structured data        Web mining
Web usage logs and tables   Structured data        Web mining

51

Summary - 1
1. Content model: usage, interface, query; users
2. Representation
1. storage level
2. content level
3. XML: widely researched and supported (authoring, editing, parsing, ...)
52

Summary -2
1. XML query tools
XPath; XQuery; XSLT (all use XPath)
tree / arbitrary graph
2. SQL can query GIS data and relational data (XML converted to relational form)
3. Query interfaces: Type A and Type B
4. EHRs: AQL (uses SQL structure + XML addresses); XML templates
53

Summary - 3
1. SQL for map data
2. a) XML, b) XML query languages, c) Berkeley DB XML (free download)
3. Web services and web resources
4. Recent increase in research activity: new query language interfaces
5. High-level user interfaces
54

XML

55

XML Examples
Internet: RSS, Atom
- XHTML; Web service formats: SOAP, WSDL
File formats: Microsoft Office, Open Office, Apple's iWork

Industrial: insurance (ACORD)
- Clinical trials (CDISC)
- Financial (FIX, FpML)
- Many applications use XML for storage or data exchange

56

57

Research Issues
1. Data: chemistry structures, EHRs
Structural information is captured in a tree model or graph model for querying
2. A graph is more flexible
3. The tree model is simple: single root, no cycles, unique path from root to a leaf.
A graph allows pointers to ancestors and descendants
4. Semi-structured data: schema sharing
58

XML: Most Recent Innovations

Can be a tree with UNIX directory-style paths
Can maintain redundant IDs to know the linked information

65

XML STYLE MARKUP LANGUAGES

Data mark-up: configuration files, Internet messages, sharing data and objects between programming languages
Document mark-up: web documents, database contents
Purpose: exchange of data or exchange of documents, storage
YAML: cross-language, Unicode-based data serialization language (data mark-up)
Candle Markup (document mark-up for static data)
The syntax is based on XML, but has many differences
69

YAML
Designed around the common data types of different programming languages.

Superset of JSON (YAML version 1.2)


Goals:
1. easily readable by humans.
2. portable between programming languages.
3. matches the native data structures of most programming
languages.
4. has a consistent model to support generic tools.
5. supports one-pass processing.
6. expressive and extensible.
7. is easy to implement and use.
70

YAML

YAML integrates and builds upon concepts (many tools + software) described by C, Java, Perl, Python, Ruby, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), XML, SAX, SOAP, and JSON.
Reference: http://www.yaml.org/spec/1.2/spec.html (many more)
71

CANDLE MARKUP

Candle Markup: document markup
Can do data markup easily
It is an ideal format for general-purpose data serialization.
It works well for both structured object data and mixed text content.
It has a terse and readable syntax, as well as a clean and strongly typed data model.
It is presented as an improvement over many existing textual serialization formats: XML, JSON, YAML.

Candle Markup is a subset of the Candle language, used as a document format for static data.
The syntax of Candle Markup is designed based on XML
72

CANDLE MARKUP
Example ( XML )
<menu id="file" value="File">
<popup>
<menuitem value="New" onclick="CreateNewDoc()" />
<menuitem value="Open" onclick="OpenDoc()" />
<menuitem value="Close" onclick="CloseDoc()" />
</popup>
</menu>
Example ( JSON )
{"menu": {
"id": "file", "value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
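To check that the XML and JSON examples carry the same content, the sketch below parses both with Python's standard library and maps the XML form onto the nested dictionary shape the JSON form uses. The helper `menu_from_xml` is hypothetical and is written only for this particular menu layout.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = """
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
"""

json_doc = """
{"menu": {
  "id": "file", "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
"""

def menu_from_xml(text):
    """Map the XML form onto the same nested dict shape the JSON form uses."""
    menu = ET.fromstring(text)
    items = [dict(item.attrib) for item in menu.findall("./popup/menuitem")]
    return {"menu": {**menu.attrib, "popup": {"menuitem": items}}}

same_content = menu_from_xml(xml_doc) == json.loads(json_doc)
```

The JSON form maps directly onto native dictionaries and lists, while the XML form needs a small, layout-specific mapping step.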

73

CANDLE MARKUP CANDLE OBJECT NOTATION


<?cmk1.0?>
menu {
id=file value="File"
popup {
menuitem { value="New" onclick="CreateNewDoc()" }
menuitem { value="Open" onclick="OpenDoc()" }
menuitem { value="Close" onclick="CloseDoc()" }
}
}
Candle Object Notation (comparison with JSON):
objects have an explicit name (instead of encoding it as a key string);
attribute names do not need to be double-quoted;
there is no need for a delimiter, like a comma, between the attributes.
74

DOT (graph description language)


Example script that describes the bonding structure of an ethane molecule. This is an undirected graph and contains edge attributes.
graph ethane {
    C_0 -- H_0 [type=s];
    C_0 -- H_1 [type=s];
    C_0 -- H_2 [type=s];
    C_0 -- C_1 [type=s];
    C_1 -- H_3 [type=s];
    C_1 -- H_4 [type=s];
    C_1 -- H_5 [type=s];
}
Many interfaces for graphic visualization and query
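A small sketch of querying the ethane graph from Python. The regular expression is an assumption that handles only simple `a -- b [key=value];` statements like the ones above, not the full DOT grammar; a real application would use a Graphviz-aware library. Counting the degree of each node recovers the number of bonds per atom.

```python
import re
from collections import Counter

dot_text = """
graph ethane {
    C_0 -- H_0 [type=s];
    C_0 -- H_1 [type=s];
    C_0 -- H_2 [type=s];
    C_0 -- C_1 [type=s];
    C_1 -- H_3 [type=s];
    C_1 -- H_4 [type=s];
    C_1 -- H_5 [type=s];
}
"""

# Matches only simple "a -- b [key=value];" statements, not full DOT syntax.
EDGE = re.compile(r"(\w+)\s*--\s*(\w+)\s*\[(\w+)=(\w+)\]\s*;")

edge_list = [(a, b, {key: value}) for a, b, key, value in EDGE.findall(dot_text)]

# Degree of each node = number of bonds the atom takes part in.
degree = Counter()
for a, b, _ in edge_list:
    degree[a] += 1
    degree[b] += 1
```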
75

Conclusions
1. Information interchange is common
2. ADTs: objects with schema details
Languages (XML, JSON, ...)
3. Storage, transform, query
76
