DB Unit-3

The document discusses the classification of data into structured, semi-structured, and unstructured types, detailing their characteristics and storage methods. It also covers XML data models, including DTD and XML Schema, along with querying languages like XPath and XQuery. Additionally, it explains the differences between XML-enabled and native XML databases, and the benefits of using XQuery for data retrieval.

Uploaded by

Manoj D

UNIT-3

Structured Data vs Unstructured Data vs Semi-Structured Data
We can classify data as structured data, semi-structured
data, or unstructured data.
Structured data resides in predefined formats and
models.
Unstructured data is stored in its natural format until it’s
extracted for analysis.
Semi-structured data is a mix of structured and
unstructured data.
What Is Data?
Data is a set of facts such as descriptions, observations,
and numbers used in decision making.
We can classify data as structured, unstructured, or
semi-structured data.
1) Structured Data
➢ Structured data is generally tabular data that is
represented by columns and rows in a database.
➢ Databases that hold tables in this form are called
relational databases.
➢ In structured data, every row in a table has the same
set of columns.
➢ SQL (Structured Query Language) is the language used
to query structured data.
2) Semi-structured Data
➢ Semi-structured data is information that does not fit
the tabular model of a relational database but still has
some structure to it, such as tags or key/value markers
(for example XML or JSON).
3) Unstructured Data
➢ Unstructured data is information that either is not
organized in a pre-defined manner or does not have a
pre-defined data model.
➢ Videos, audio, and binary data files might not have a
specific structure.
Characteristics Of Structured (Relational) and Unstructured
(Non-Relational) Data
Relational Data
➢ Relational databases provide one of the most
well-understood models for holding data.
➢ We can communicate with relational databases using
Structured Query Language (SQL).
➢ SQL allows tables to be joined.
➢ Examples of relational databases: MySQL, PostgreSQL.
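The points above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are assumptions made for the example, not part of the notes. It shows that every row shares the same columns and that SQL can join two tables.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    INSERT INTO city VALUES (1, 'Ghaziabad', 'Uttar Pradesh');
    INSERT INTO student VALUES (1, 'Tamanna', 1);
""")


def students_with_city(connection):
    """Join the two tables so each result row pairs a student with a city."""
    query = """
        SELECT student.name, city.city
        FROM student
        JOIN city ON student.city_id = city.id
        ORDER BY student.name
    """
    return connection.execute(query).fetchall()


rows = students_with_city(conn)
```

Each tuple in rows has the same shape (name, city), which is the fixed-column property that defines structured data.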
Non-Relational Data
➢ Non-relational databases permit us to store data in a
format that more closely meets the original structure.
➢ A non-relational database is a database that does not
use the tabular schema of columns and rows.
➢ In a non-relational database the data may be stored as
JSON documents, as simple key/value pairs, or as a
graph consisting of edges and vertices.
➢ Examples of non-relational databases: Redis, JanusGraph,
MongoDB, Apache Cassandra
Document Data Stores
A document data store manages a set of named string
fields and object data values in an entity referred to as a
document.
Columnar Data Stores
A columnar or column-family data store organizes data
into rows and columns. The columns are divided into groups
known as column families.
Key/Value Data Stores
A key/value store is essentially a large hash table.
Graph Data Stores
A graph data store handles two types of information:
nodes and edges.
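The four kinds of stores described above can be pictured with plain Python data structures. This is a conceptual sketch only: the keys, field names, and the WORKS_AT edge label are made up for illustration, and real products add persistence, indexing, and distribution on top of these shapes.

```python
import json

# Document store: one self-contained document per entity (here parsed from JSON).
document = json.loads('{"name": "Manoj", "company": "SKP", "phones": ["123"]}')

# Key/value store: conceptually a large hash table from a key to an opaque value.
key_value_store = {}
key_value_store["contact:1"] = json.dumps(document)

# Column-family store: each row key maps to named groups of columns.
column_family_row = {
    "contact:1": {
        "identity": {"name": "Manoj"},
        "work": {"company": "SKP"},
    }
}

# Graph store: nodes plus the edges that connect them.
nodes = {"Manoj", "SKP"}
edges = [("Manoj", "WORKS_AT", "SKP")]


def neighbours(node, edge_list):
    """Return the nodes reachable from `node` by one outgoing edge."""
    return [target for source, _label, target in edge_list if source == node]
```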
Structured                     Unstructured
Data in rows and columns       Not in rows and columns
Numbers, dates, strings        Images, audio, video
Requires less storage          Requires more storage
Easy to manage and protect     Difficult to manage and protect
XML Hierarchical (Tree) Data Model
An XML document has a self-descriptive structure. It
forms a tree structure, which is referred to as an XML tree.
A tree structure contains a root element (as parent), child
elements, and so on.
Starting from the root, it is easy to traverse all
succeeding branches, sub-branches, and leaf nodes.
Example of an XML document
<?xml version="1.0"?>
<college>
<student>
<firstname>Tamanna</firstname>
<lastname>Bhatia</lastname>
<contact>09990449935</contact>
<email>[email protected]</email>
<address>
<city>Ghaziabad</city>
<state>Uttar Pradesh</state>
<pin>201007</pin>
</address>
</student>
</college>
XML Tree Rules
These rules are used to figure out the relationship of the
elements.
Descendants: If element A is contained by element B, then A
is known as a descendant of B. In the above example "college"
is the root element and all the other elements are
descendants of "college".
Ancestors: An element that contains other elements is
called an "ancestor" of those elements. In the above
example the root element (college) is the ancestor of all
other elements.
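The two rules can be tried out with Python's standard xml.etree.ElementTree module. The sketch below uses a shortened copy of the college document (contact and email are left out), and the helper names descendants and ancestors are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET

COLLEGE_XML = """<?xml version="1.0"?>
<college>
  <student>
    <firstname>Tamanna</firstname>
    <lastname>Bhatia</lastname>
    <address>
      <city>Ghaziabad</city>
      <state>Uttar Pradesh</state>
      <pin>201007</pin>
    </address>
  </student>
</college>"""

root = ET.fromstring(COLLEGE_XML)


def descendants(element):
    """Tag names of every element contained in `element`, in document order."""
    return [node.tag for node in element.iter() if node is not element]


def ancestors(root_element, target_tag):
    """Tag names of the elements that contain the first `target_tag` element."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root_element.iter() for child in parent}
    node = root_element.find(f".//{target_tag}")
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return chain
```

Calling descendants(root) lists every element below college, while ancestors(root, "city") walks upward from city through address and student to college.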

Elements in XML Tree Model:
Complex Element: It is constructed from other elements
hierarchically.
Simple Element: It contains data values.
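As a rough illustration of the distinction, the following Python sketch labels an element as complex when it has child elements and as simple otherwise. The classify helper and the shortened student fragment are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

student = ET.fromstring(
    "<student><firstname>Tamanna</firstname>"
    "<address><city>Ghaziabad</city><pin>201007</pin></address></student>"
)


def classify(element):
    """Label an element 'complex' if it has child elements, else 'simple'."""
    return "complex" if len(element) > 0 else "simple"


# Map every tag in the fragment to its kind.
kinds = {node.tag: classify(node) for node in student.iter()}
```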
We can characterize three main types of XML documents:
Data-centric XML documents: These documents have
many small data items that follow a specific structure.
Document-centric XML documents: These are
documents with large amounts of text, such as news articles
or books.
Hybrid XML documents: These documents may have
some parts that contain structured data and other parts
that are predominantly textual or unstructured.

XML DTD:
The XML Document Type Definition, commonly known
as DTD, is a way to describe the structure of an XML
language. DTDs check the vocabulary and validity of XML
documents.
An XML DTD can either be specified inside the
document, or it can be kept in a separate document and then
linked from it.
Syntax
Basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
• The DTD starts with the <!DOCTYPE delimiter.
• The element name tells the parser to parse the document
from the specified root element.
• The DTD identifier is an identifier for the document type
definition, which may be the path to a file on the system
or a URL.
• The square brackets [ ] enclose an optional list of markup
declarations (elements, attributes, entities) called the
internal subset.

Internal DTD

A DTD is referred to as an internal DTD if its elements are
declared within the XML file itself.

A document that relies only on an internal DTD can set the
standalone attribute in the XML declaration to yes, because
it needs no external declarations.

Syntax

<!DOCTYPE root-element [element-declarations]>

Example:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>

<address>
<name>Manoj</name>
<company>SKP</company>
<phone>123</phone>
</address>

Start Declaration:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
DTD:
<!DOCTYPE address [
DTD Body:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
End Declaration:
]>
Rules
• The document type declaration must appear at the start
of the document, preceded only by the XML declaration.
• Similar to the DOCTYPE declaration, the element
declarations must start with an exclamation mark.
• The Name in the document type declaration must match
the element type of the root element.
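The rules above can be checked by hand in Python. Note that the standard library parsers do not validate against a DTD, so this is only a sketch: check_address is a hypothetical helper that compares the DOCTYPE name with the root element and hard-codes the content model (name,company,phone) from the example.

```python
from xml.dom import minidom

ADDRESS_XML = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE address [
  <!ELEMENT address (name,company,phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<address>
  <name>Manoj</name>
  <company>SKP</company>
  <phone>123</phone>
</address>"""


def check_address(xml_text):
    """Hand-written checks for this one DTD; not a general validator."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # Ignore whitespace text nodes and keep only child elements.
    children = [n.tagName for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
    return {
        # Rule: the DOCTYPE name must match the root element type.
        "doctype_matches_root": doc.doctype is not None
        and doc.doctype.name == root.tagName,
        # Content model (name,company,phone): exactly these children, in order.
        "children_match_model": children == ["name", "company", "phone"],
    }


result = check_address(ADDRESS_XML)
```

A validating parser would derive the content model from the declarations instead of hard-coding it.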

External DTD

In an external DTD, elements are declared outside the XML
file. They are accessed by specifying a system identifier,
which may be either the path to a .dtd file or a valid URL.

Because the document depends on declarations kept outside
it, the standalone attribute in the XML declaration must be
set to no (this is also the value assumed when the attribute
is omitted).

Syntax
<!DOCTYPE root-element SYSTEM "file-name">
Example
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Manoj</name>
<company>SKP</company>
<phone>123</phone>
</address>
The content of the DTD file address.dtd is as shown −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Types
You can refer to an external DTD by using either system
identifiers or public identifiers.
System Identifiers
A system identifier enables you to specify the location of
an external file containing DTD declarations.
Syntax
<!DOCTYPE name SYSTEM "address.dtd" [...]>
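Public Identifiers
A public identifier names the DTD independently of its
location. In XML it must be followed by a system identifier
that the parser can fall back on.
Syntax
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">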
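To see how a parser exposes these identifiers, the sketch below reads them back with Python's xml.dom.minidom. The external_ids helper and the two one-line documents are assumptions for illustration; the parser only records the identifiers here and does not fetch address.dtd.

```python
from xml.dom import minidom


def external_ids(xml_text):
    """Return (name, publicId, systemId) from the DOCTYPE declaration."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.publicId, doctype.systemId


# System identifier only: the location of the DTD file.
system_doc = '<!DOCTYPE address SYSTEM "address.dtd"><address/>'

# Public identifier followed by a system identifier as a fallback location.
public_doc = (
    '<!DOCTYPE address PUBLIC "-//Beginning XML//DTD Address Example//EN" '
    '"address.dtd"><address/>'
)

system_ids = external_ids(system_doc)
public_ids = external_ids(public_doc)
```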
XML SCHEMA:
XML Schema is commonly known as XML Schema
Definition (XSD). It is used to describe and validate the
structure and the content of XML data.
The schema element supports namespaces.
Checking Validation
An XML document is called "well-formed" if it follows
the correct XML syntax. A well-formed and valid XML document
is one that has additionally been validated against a schema.

XML Schema Example [employee.xsd]

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.javatpoint.com"
xmlns="http://www.javatpoint.com"
elementFormDefault="qualified">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

Let's see the xml file using XML schema or XSD file.
employee.xml
<?xml version="1.0"?>
<employee
xmlns="http://www.javatpoint.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.javatpoint.com employee.xsd">
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>[email protected]</email>
</employee>
Description of XML Schema
<xs:element name="employee"> : It declares an element
named employee.
<xs:complexType> : It defines that the element 'employee' is
of complex type.
<xs:sequence> : It defines that the complex type is a
sequence of elements.
<xs:element name="firstname" type="xs:string"/> : It
defines that the element 'firstname' is of string/text type.
<xs:element name="lastname" type="xs:string"/> : It
defines that the element 'lastname' is of string/text type.
<xs:element name="email" type="xs:string"/> : It defines
that the element 'email' is of string/text type.
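The description above can be made concrete with a small Python sketch. Python's standard library has no XSD validator, so this code only reads the xs:sequence out of the schema and compares it with the child order of the instance document; it is a simplified structural check, not real schema validation. The helper names and the email value vimal@example.com are placeholders introduced for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

EMPLOYEE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.javatpoint.com"
           xmlns="http://www.javatpoint.com"
           elementFormDefault="qualified">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# The email value is a placeholder, not taken from the notes.
EMPLOYEE_XML = """<?xml version="1.0"?>
<employee xmlns="http://www.javatpoint.com">
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@example.com</email>
</employee>"""


def expected_children(schema_root, element_name):
    """Names listed in the xs:sequence of the given top-level element."""
    for decl in schema_root.findall(f"{XS}element"):
        if decl.get("name") == element_name:
            sequence = decl.find(f"{XS}complexType/{XS}sequence")
            return [child.get("name") for child in sequence.findall(f"{XS}element")]
    return None


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def follows_sequence(instance_root, expected):
    """True if the instance has exactly the expected children, in order."""
    return [local_name(child.tag) for child in instance_root] == expected


expected = expected_children(ET.fromstring(EMPLOYEE_XSD), "employee")
is_ordered = follows_sequence(ET.fromstring(EMPLOYEE_XML), expected)
```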
XML Schema Data types
XML Schema defines two kinds of data types.
1. simpleType
2. complexType
Simple type
A simple type describes text-only content. An element of
simple type cannot have attributes or child elements.
Complex type
A complex type can hold multiple attributes and child
elements. It can contain additional sub-elements and can
be left empty.
XML DATABASE
An XML database is a data persistence software system
used for storing large amounts of information in XML
format.
You can query the stored data by using XQuery.
Types of XML databases
There are two types of XML databases.
1. XML-enabled database
2. Native XML database (NXD)

XML-enabled Database

An XML-enabled database works just like a relational
database. In this database, data is stored in tables, in the
form of rows and columns.
Native XML Database
A native XML database is used to store large amounts of
XML data. Instead of a table format, a native XML database
is based on a container format.
You can query its data with XPath expressions.
Example of XML database:
<?xml version="1.0"?>
<contact-info>
<contact1>
<name>Vimal Jaiswal</name>
<company>SSSIT.org</company>
<phone>(0120) 4256464</phone>
</contact1>
<contact2>
<name>Mahesh Sharma </name>
<company>SSSIT.org</company>
<phone>09990449935</phone>
</contact2>
</contact-info>
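As a rough stand-in for querying such a store, the sketch below loads the document above into memory and looks up a phone number with a path expression. It uses Python's xml.etree.ElementTree, which supports only a subset of XPath; phone_of is a hypothetical helper, and a real native XML database would run the query inside the database instead.

```python
import xml.etree.ElementTree as ET

CONTACTS_XML = """<?xml version="1.0"?>
<contact-info>
  <contact1>
    <name>Vimal Jaiswal</name>
    <company>SSSIT.org</company>
    <phone>(0120) 4256464</phone>
  </contact1>
  <contact2>
    <name>Mahesh Sharma</name>
    <company>SSSIT.org</company>
    <phone>09990449935</phone>
  </contact2>
</contact-info>"""

contacts = ET.fromstring(CONTACTS_XML)


def phone_of(root, person):
    """Phone number of the contact whose <name> text equals `person`."""
    # The predicate [name='...'] keeps children that have a matching <name>.
    # `person` must not contain a single quote in this simple sketch.
    entry = root.find(f"./*[name='{person}']")
    return None if entry is None else entry.findtext("phone")


phone = phone_of(contacts, "Mahesh Sharma")
```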

XPath:
XPath defines a pattern or path expression to select
nodes or node sets in an XML document. These patterns are
used by XSLT to perform transformations.
XPath specifies seven types of nodes that an XPath
expression can return:
o Root
o Element
o Text
o Attribute
o Comment
o Processing Instruction
o Namespace
Syntax:
o //tagname[@attribute = 'value']

XPath Expressions:

Symbol      Description
//          Selects nodes in the document from the current node that match the selection, no matter where they are
/           Selects the root node
tagname     Tag name of the current node
@           Selects the attribute
attribute   Attribute name of the node
value       Value of the attribute

Example:
//input[@id = 'fakebox-input']
In this example, we are locating the 'input' element whose
'id' is equal to 'fakebox-input'.
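The same expression can be tried in Python with xml.etree.ElementTree, which implements a limited XPath subset. The small page below is made up for the example, and the expression needs a leading "." because ElementTree evaluates paths relative to the element it is called on.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page invented for this example.
PAGE = """<html>
  <body>
    <form>
      <input id="fakebox-input" type="text"/>
      <input id="other" type="submit"/>
    </form>
  </body>
</html>"""

page = ET.fromstring(PAGE)

# //input[@id='fakebox-input'] written relative to the parsed root element.
matches = page.findall(".//input[@id='fakebox-input']")
ids = [element.get("id") for element in matches]
```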
Types of XPath:
1. Absolute XPath
2. Relative XPath
Absolute XPath:
An absolute XPath starts at the root element of the
HTML/XML document and lists every element needed
to reach the desired element. It starts with a single forward
slash '/'.
Relative XPath:
A relative XPath begins with a double forward
slash '//', which means it can match the element anywhere in
the document.
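The difference between the two styles can be sketched in Python. One caveat: xml.etree.ElementTree does not accept a path that starts with "/" on an element, so the absolute path /college/student/address/city is approximated here by spelling out every step below the root. The shortened college document is an assumption for the example.

```python
import xml.etree.ElementTree as ET

college = ET.fromstring(
    "<college><student><address><city>Ghaziabad</city></address></student></college>"
)

# Absolute style, /college/student/address/city: every step is spelled out.
# The leading /college step is implied because the search starts at the root.
absolute_style = [e.text for e in college.findall("./student/address/city")]

# Relative style, //city: match <city> at any depth below the root.
relative_style = [e.text for e in college.findall(".//city")]
```

Both lists contain the same city here, but the absolute form breaks if an intermediate element is renamed or moved, while the relative form keeps matching.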

XQuery
XQuery is a functional language that is used to retrieve
information stored in XML format.
XQuery can be used on XML documents and on relational
databases that contain data in XML format.
Characteristics
• Functional Language − XQuery is a functional language
for retrieving and querying XML-based data.
• Analogous to SQL − XQuery is to XML what SQL is to
relational databases.
• XPath based − XQuery uses XPath expressions to
navigate through XML documents.
• Widely supported − XQuery is supported by many major
database systems.
• W3C Standard − XQuery is a W3C standard.
Benefits of XQuery
• Using XQuery, both hierarchical and tabular data can be
retrieved.
• XQuery can be directly used to build webpages.
• XQuery can be used to transform XML documents.
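Python's standard library cannot run XQuery, so the sketch below only mimics the shape of an XQuery FLWOR expression (for, where, order by, return) with ordinary Python. The XQuery text in the comment is shown for comparison, contacts.xml is a hypothetical file name, and names_at is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# XQuery shown for comparison only; "contacts.xml" is a hypothetical file:
#   for $c in doc("contacts.xml")/contact-info/*
#   where $c/company = "SSSIT.org"
#   order by $c/name
#   return $c/name/text()

contacts = ET.fromstring(
    "<contact-info>"
    "<contact1><name>Vimal Jaiswal</name><company>SSSIT.org</company></contact1>"
    "<contact2><name>Mahesh Sharma</name><company>SSSIT.org</company></contact2>"
    "</contact-info>"
)


def names_at(root, company):
    """Python stand-in for the FLWOR expression in the comment above."""
    # for + where: iterate over the contacts and keep the matching ones.
    selected = [c for c in root if c.findtext("company") == company]
    # order by: sort the kept contacts by name.
    selected.sort(key=lambda c: c.findtext("name"))
    # return: project each contact to its name text.
    return [c.findtext("name") for c in selected]


names = names_at(contacts, "SSSIT.org")
```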
