Adbms Unit1

Uploaded by Amrin Mulani
Copyright © All Rights Reserved
ADBMS

Ramez Elmasri, Shamkant B. Navathe (2016). Fundamentals of Database Systems (7th Edition). Pearson. ISBN-10: 0-13-397077-9; ISBN-13: 978-0-13-397077-7.
Unit 1
Introduction
Structured, Semistructured, and Unstructured Data.
XML Hierarchical (Tree) Data Model.
XML Documents, DTD, and XML Schema.
XML Documents and Databases.
XML Querying.
XPath
XQuery
Introduction

• XML: Extensible Markup Language
• Defined by the World Wide Web Consortium (W3C)
• Originally intended as a document markup language, not a database language
  – Documents have tags giving extra information about sections of the document
  – E.g. <title> XML </title> <slide> Introduction …</slide>
• Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
• Extensible, unlike HTML
  – Users can add new tags, and separately specify how the tag should be handled for display
• Goal was (is?) to replace HTML as the language for publishing documents on the Web
XML Introduction (Cont.)

• The ability to specify new tags, and to create nested tag structures, made XML a great way to exchange data, not just documents.
  – Much of the use of XML has been in data exchange applications, not as a replacement for HTML
• Tags make data (relatively) self-documenting
  – E.g.
<bank>
  <account>
    <account-number> A-101 </account-number>
    <branch-name> Downtown </branch-name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account-number> A-101 </account-number>
    <customer-name> Johnson </customer-name>
  </depositor>
</bank>
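To see what "self-documenting" buys the application that receives this data, here is a minimal sketch using Python's standard xml.etree.ElementTree module (an illustrative choice; any XML parser would do). It assumes the <bank> document above arrives as a string, and the helper name records is hypothetical. Each <account> and <depositor> becomes a dictionary keyed by its tag names, so the receiver needs no separate description of the field layout.

```python
import xml.etree.ElementTree as ET

# The <bank> document from the slide above, received as a string.
BANK_XML = """
<bank>
  <account>
    <account-number> A-101 </account-number>
    <branch-name> Downtown </branch-name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account-number> A-101 </account-number>
    <customer-name> Johnson </customer-name>
  </depositor>
</bank>
"""


def records(root, tag):
    """Turn every <tag> child of root into a dict keyed by its sub-element tag names."""
    return [
        {child.tag: child.text.strip() for child in elem}  # tags double as field names
        for elem in root.findall(tag)
    ]


root = ET.fromstring(BANK_XML)
accounts = records(root, "account")
depositors = records(root, "depositor")

print(accounts)    # [{'account-number': 'A-101', 'branch-name': 'Downtown', 'balance': '500'}]
print(depositors)  # [{'account-number': 'A-101', 'customer-name': 'Johnson'}]
```

Every value comes back as a string: the tags say what a value means, but not what data type it has.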
XML: Motivation

• Data interchange is critical in today’s networked world
  – Examples:
    – Banking: funds transfer
    – Order processing (especially inter-company orders)
    – Scientific data
      – Chemistry: ChemML, …
      – Genetics: BSML (Bio-Sequence Markup Language), …
  – Paper flow of information between organizations is being replaced by electronic flow of information
• Each application area has its own set of standards for representing information
• XML has become the basis for many new-generation data interchange formats
XML Motivation (Cont.)
• Earlier-generation formats were based on plain text with line headers indicating the meaning of fields
  – Similar in concept to email headers
  – Do not allow for nested structures; no standard “type” language
  – Tied too closely to low-level document structure (lines, spaces, etc.)
• Each XML-based standard defines what the valid elements are, using XML type specification languages to specify the syntax
  – DTD (Document Type Definition)
  – XML Schema
  – Plus textual descriptions of the semantics
• XML allows new tags to be defined as required
  – However, this may be constrained by DTDs
• A wide variety of tools is available for parsing, browsing and querying XML documents/data
Structure of XML Data

• Tag: label for a section of data
• Element: section of data beginning with <tagname> and ending with matching </tagname>
• Elements must be properly nested
  – Proper nesting
    <account> … <balance> …. </balance> </account>
  – Improper nesting
    <account> … <balance> …. </account> </balance>
  – Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
• Every document must have a single top-level element
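The nesting rules can be checked mechanically. As a sketch, Python's xml.etree.ElementTree (backed by the non-validating expat parser) refuses any input that is not well-formed, so the hypothetical helper is_well_formed below simply tries to parse. This checks well-formedness only, not validity against a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Hypothetical helper: True if xml_text parses as one properly nested XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


proper = "<account><balance>400</balance></account>"
improper = "<account><balance>400</account></balance>"
two_roots = "<account></account><account></account>"

print(is_well_formed(proper))     # True
print(is_well_formed(improper))   # False: </account> arrives while <balance> is still open
print(is_well_formed(two_roots))  # False: more than one top-level element
```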
Example of Nested Elements

<bank-1>
  <customer>
    <customer-name> Hayes </customer-name>
    <customer-street> Main </customer-street>
    <customer-city> Harrison </customer-city>
    <account>
      <account-number> A-102 </account-number>
      <branch-name> Perryridge </branch-name>
      <balance> 400 </balance>
    </account>
    <account>
      …
    </account>
  </customer>
  …
</bank-1>
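Because each <account> sits inside its <customer>, a program can recover which accounts belong to which customer just by walking the tree. A minimal sketch with Python's xml.etree.ElementTree, using only the parts of the example that are spelled out (the elided second account and further customers are left out); the function name accounts_by_customer is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <bank-1> example from the slide, without the elided parts.
BANK1_XML = """
<bank-1>
  <customer>
    <customer-name> Hayes </customer-name>
    <customer-street> Main </customer-street>
    <customer-city> Harrison </customer-city>
    <account>
      <account-number> A-102 </account-number>
      <branch-name> Perryridge </branch-name>
      <balance> 400 </balance>
    </account>
  </customer>
</bank-1>
"""


def accounts_by_customer(xml_text):
    """Map each customer name to a list of (account number, balance) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for cust in root.findall("customer"):
        name = cust.findtext("customer-name").strip()
        # Ownership comes from nesting: the accounts are children of this customer.
        result[name] = [
            (acc.findtext("account-number").strip(), int(acc.findtext("balance")))
            for acc in cust.findall("account")
        ]
    return result


print(accounts_by_customer(BANK1_XML))  # {'Hayes': [('A-102', 400)]}
```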
Motivation for Nesting

• Nesting of data is useful in data transfer
  – Example: elements representing customer-id, customer name, and address nested within an order element
• Nesting is not supported, or is discouraged, in relational databases
  – With multiple orders, customer name and address are stored redundantly
  – Normalization replaces nested structures in each order by a foreign key into a table storing customer name and address information
  – Nesting is supported in object-relational databases
• But nesting is appropriate when transferring data
  – An external application does not have direct access to data referenced by a
foreign key
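The normalization step described above can be sketched in a few lines of Python. The order feed below is hypothetical (element names follow the slide's example of customer-id, customer name, and address nested within an order; the values are made up). The nested copies are split into a customer table, stored once per customer-id, and an order table that keeps only the customer-id as a foreign key.

```python
import xml.etree.ElementTree as ET

# Hypothetical order feed: customer details are repeated inside every order.
ORDERS_XML = """
<orders>
  <order>
    <order-id>O-1</order-id>
    <customer-id>C-7</customer-id>
    <customer-name>Hayes</customer-name>
    <address>Main, Harrison</address>
  </order>
  <order>
    <order-id>O-2</order-id>
    <customer-id>C-7</customer-id>
    <customer-name>Hayes</customer-name>
    <address>Main, Harrison</address>
  </order>
</orders>
"""


def normalize(xml_text):
    """Split nested orders into a customer table and an order table linked by customer-id."""
    root = ET.fromstring(xml_text)
    customers = {}  # customer-id -> (name, address); the redundant copies collapse here
    orders = []     # (order-id, customer-id); customer-id plays the foreign-key role
    for order in root.findall("order"):
        cid = order.findtext("customer-id")
        customers[cid] = (order.findtext("customer-name"), order.findtext("address"))
        orders.append((order.findtext("order-id"), cid))
    return customers, orders


customers, orders = normalize(ORDERS_XML)
print(customers)  # {'C-7': ('Hayes', 'Main, Harrison')}
print(orders)     # [('O-1', 'C-7'), ('O-2', 'C-7')]
```

Going the other way, when exporting to an outside application, means copying the customer details back into each order, since the receiver cannot follow the foreign key.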
Structure of XML Data (Cont.)

• Mixture of text with sub-elements is legal in XML.
  – Example:
<account>
  This account is seldom used any more.
  <account-number> A-102</account-number>
  <branch-name> Perryridge</branch-name>
  <balance>400 </balance>
</account>
  – Useful for document markup, but discouraged for data
representation
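One reason mixed content is awkward for data: parsers have to put the loose text somewhere outside the regular fields. In Python's xml.etree.ElementTree, for example, text before the first child ends up in the parent's .text, and text after a child ends up in that child's .tail. A minimal sketch on the <account> example:

```python
import xml.etree.ElementTree as ET

# The mixed-content <account> example from the slide.
ACCOUNT_XML = """<account>
This account is seldom used any more.
<account-number> A-102</account-number>
<branch-name> Perryridge</branch-name>
<balance>400 </balance>
</account>"""

account = ET.fromstring(ACCOUNT_XML)

# The free-text sentence is not a field: it is the parent's leading text.
note = account.text.strip()
# The structured part is still reachable through the child elements.
fields = {child.tag: child.text.strip() for child in account}

print(note)    # This account is seldom used any more.
print(fields)  # {'account-number': 'A-102', 'branch-name': 'Perryridge', 'balance': '400'}
```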
Structured, Semistructured and Unstructured Data
Structured Data:
• Represented by columns and rows in a database.
• Databases that hold tables in this form are called relational databases.
• The mathematical term “relation” refers to a formally defined set of data held as a table.
• Every row in a table has the same set of columns.
• SQL (Structured Query Language) is the programming language used to query and manage structured
data.
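A small illustration of structured data, using Python's built-in sqlite3 module as a stand-in relational database (any SQL system would show the same idea). The table and column names are assumptions chosen to mirror the <account> data in this unit; the row for "Redwood" is hypothetical. The schema is declared once, every row has exactly those columns, and a row that does not fit is rejected.

```python
import sqlite3

# In-memory database; the schema is fixed before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, branch_name TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400)],
)

# Every row has the same declared columns, so a query can name them directly.
rows = conn.execute(
    "SELECT account_number, balance FROM account WHERE balance > 450"
).fetchall()
print(rows)  # [('A-101', 500)]

# A row that does not match the table's column list is refused.
try:
    conn.execute("INSERT INTO account VALUES (?, ?)", ("A-103", "Redwood"))
    rejected = False
except sqlite3.Error:
    rejected = True
print(rejected)  # True

conn.close()
```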
Structured, Semistructured and Unstructured Data
• Semistructured Data:
  • Data is collected in an ad hoc manner before it is known how it will be stored and managed.
  • This data may have a certain structure, but not all the information collected will have identical structure.
  • The schema information is mixed in with the data values, since each data object can have different attributes that are not known in advance.
  • Hence, this type of data is sometimes referred to as
self-describing data.
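"Self-describing" can be made concrete with JSON, another common semistructured format, parsed here with Python's json module. The project records below are hypothetical. Each object carries its own attribute names, the two objects do not share the same set, and the closest thing to a schema is whatever names actually appear in the data.

```python
import json

# Hypothetical records gathered ad hoc: each object names its own attributes.
RAW = """[
  {"project": "P1", "number": 1, "location": "Houston"},
  {"project": "P2", "number": 2, "workers": [{"ssn": "123456789", "hours": 32.5}]}
]"""

records = json.loads(RAW)

# No external schema is consulted: attribute names are read from each object.
for rec in records:
    print(sorted(rec))
# ['location', 'number', 'project']
# ['number', 'project', 'workers']

# The "schema" is just the union of the attribute names observed so far.
observed = sorted(set().union(*records))
print(observed)  # ['location', 'number', 'project', 'workers']
```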
Structured, Semistructured and Unstructured Data (contd.)
• Semi-structured data may be displayed as a directed graph...
• The labels or tags on the directed edges represent the schema names—the names
of attributes, object types (or entity types or classes), and relationships.
• The internal nodes represent individual objects or composite attributes.
• The leaf nodes represent actual data values of simple (atomic) attributes.
FIGURE 1 Representing semistructured data as a graph.
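The graph view can be sketched in Python by treating nested dictionaries as internal nodes and scalar values as leaves; a list stands for several edges that share one label. The data and the generated node ids ("n0", "n1", ...) are hypothetical and are not taken from Figure 1. Each emitted triple is one labelled edge: (source node, label, target).

```python
import itertools

# Hypothetical semistructured object: dicts are internal nodes, scalars are
# leaf values, and a list means several edges with the same label.
DATA = {
    "project": [
        {"name": "P1", "number": 1, "location": "Houston"},
        {"name": "P2", "number": 2},
    ]
}


def to_edges(obj):
    """Return the labelled edges (source, label, target) of the data graph.

    Internal nodes get generated ids 'n0', 'n1', ...; leaf targets are the atomic values.
    """
    ids = itertools.count()
    edges = []

    def visit(node):
        node_id = f"n{next(ids)}"
        for label, value in node.items():
            for item in value if isinstance(value, list) else [value]:
                # Edge to a nested object (internal node) or to an atomic value (leaf).
                target = visit(item) if isinstance(item, dict) else item
                edges.append((node_id, label, target))
        return node_id

    visit(obj)
    return edges


for edge in to_edges(DATA):
    print(edge)
```

The edge labels ("project", "name", "location", ...) play the role of schema names, while the leaves hold the actual values.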
Structured, Semistructured and Unstructured Data (contd.)

• Unstructured Data:
  • Either is not organized in a predefined manner or does not have a predefined data model.
  • Typically text-heavy, but may also contain data such as numbers, dates, and facts.
  • Videos, audio, and binary data files might not have a specific structure.
XML Hierarchical (Tree) Data Model

Figure: A complex XML element called <projects>.
XML Hierarchical (Tree) Data Model (contd.)

• The basic object in XML is the XML document.
• There are two main structuring concepts that are used to construct an XML document:
  • Elements
  • Attributes
• Attributes in XML provide additional information that describes
elements.
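The difference between elements and attributes shows up directly when a document is parsed. A minimal sketch with Python's xml.etree.ElementTree, on a small <book> element modelled on the bookstore example in this unit: attributes are name/value pairs attached to an element (.attrib, .get()), whereas sub-elements are children of it.

```python
import xml.etree.ElementTree as ET

# A small <book> element with one attribute on each of its two elements.
book = ET.fromstring(
    '<book category="cooking"><title lang="en">Everyday Italian</title></book>'
)

# Attributes describe the element itself; they are not child elements.
print(book.tag, book.attrib)          # book {'category': 'cooking'}
print([child.tag for child in book])  # ['title']

title = book.find("title")
print(title.text, title.get("lang"))  # Everyday Italian en
```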
XML Hierarchical (Tree) Data Model (contd.)
• As in HTML, elements are identified in a document by their start tag and end tag.
• The tag names are enclosed between angle brackets <…>, and end tags are further identified by a slash </…>.
• Complex elements are constructed from other elements hierarchically, whereas simple elements contain data values.
• In the tree representation, internal nodes represent complex elements, whereas leaf nodes represent simple elements.
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<root> <author>Giada De Laurentiis</author>
<child> <year>2005</year>
<subchild>.....</subchild> <price>30.00</price>
</child> </book>
</root> <book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
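The complex/simple distinction maps directly onto the tree: an element with sub-elements is an internal node, and one without them is a leaf holding a value. A minimal sketch with Python's xml.etree.ElementTree that classifies every element of the bookstore example (the XML declaration is omitted from the string for brevity).

```python
import xml.etree.ElementTree as ET

# The bookstore example, without the XML declaration.
BOOKSTORE_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

complex_tags, simple_tags = set(), set()
for elem in root.iter():  # visits every element, starting at the root
    # Sub-elements present: complex element (internal node); otherwise simple (leaf).
    if len(elem):
        complex_tags.add(elem.tag)
    else:
        simple_tags.add(elem.tag)

print(sorted(complex_tags))  # ['book', 'bookstore']
print(sorted(simple_tags))   # ['author', 'price', 'title', 'year']
print([t.text for t in root.iter("title")])  # ['Everyday Italian', 'Learning XML']
```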
XML Hierarchical (Tree) Data Model (contd.)

Three main types of XML documents:

1. Data-centric XML documents:
• Have many small data items that follow a specific structure and hence may be extracted from a structured database. They are formatted as XML documents in order to exchange them or display them over the Web.
2. Document-centric XML documents:
• Have large amounts of text, such as news articles or books. There are few or no structured data elements in these documents.
3. Hybrid XML documents:
• Have parts that contain structured data and other parts that are predominantly textual or unstructured.
