
Xml-Introduction Structured, Semi-Structured and Unstructured Data

XML is a text-based markup language used to store and organize data. It allows users to create custom tags to describe data. The document discusses structured, semi-structured, and unstructured data. Structured data has a predefined format like databases, semi-structured data uses tags like XML but has some flexibility, and unstructured data does not have a clear structure like text, images, and videos. Understanding how to categorize different data types is important for performing effective data analysis on business information.

XML-INTRODUCTION

STRUCTURED, SEMI-STRUCTURED AND UNSTRUCTURED DATA
SUBMITTED TO- DR. MAMTA JUNEJA
SUBMITTED BY- MALYA SINGH (20-310)
MONIKA SHARMA (20-315)
CLASS- ME-CSE (1st SEM)
XML-INTRODUCTION
 XML stands for Extensible Markup Language.
It is a text-based markup language derived
from Standard Generalized Markup Language
(SGML).
 XML tags identify the data and are used to store and organize it, whereas HTML tags specify how to display the data.
 XML simplifies data sharing, data transport,
platform changes and data availability.
 It is designed to be self-descriptive.
Several characteristics make XML useful in a variety of systems and solutions:
 XML is extensible: you can create your own self-descriptive tags (there are no predefined tags), or a language that suits your application.
 XML carries the data but does not present it: you can store the data irrespective of how it will be presented.
 XML is a public standard: it was developed by the World Wide Web Consortium (W3C) and is available as an open standard.
 XML is a markup language: it defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
EXAMPLE
<?xml version="1.0"?>
<contact-info>
    <name>Aarna Malhotra</name>
    <company>Google</company>
    <phone>(011)123-4567</phone>
</contact-info>
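To illustrate how a program can read such a document, here is a minimal sketch in Python using the standard-library `xml.etree.ElementTree` module. The helper name `parse_contact` is an assumption made for this illustration and is not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The contact record from the example above, held as a string.
XML_TEXT = """<?xml version="1.0"?>
<contact-info>
    <name>Aarna Malhotra</name>
    <company>Google</company>
    <phone>(011)123-4567</phone>
</contact-info>"""


def parse_contact(xml_text: str) -> dict:
    """Return a {tag: text} mapping for each child of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


contact = parse_contact(XML_TEXT)
print(contact["name"])
```

Because the tags name the data they hold, the program can look up a value by its tag without knowing anything about how the data will be displayed.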

Difference between XML and HTML-
 XML was designed to carry data, with a focus on what the data is.
 HTML was designed to display data, with a focus on how the data looks.
 XML tags are not predefined, as HTML tags are.
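As a rough illustration of tags not being predefined, the sketch below builds an XML fragment from tag names chosen by the application. It uses Python's standard `xml.etree.ElementTree`; the function `build_record` and the tag names `book`, `title` and `shelf-code` are hypothetical, invented only for this example.

```python
import xml.etree.ElementTree as ET


def build_record(root_tag: str, fields: dict) -> str:
    """Build a one-level XML record whose tag names come from the caller."""
    root = ET.Element(root_tag)
    for tag, value in fields.items():
        # Any well-formed name can be used as a tag; none are predefined.
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical application-specific tags.
xml_out = build_record("book", {"title": "XML Basics", "shelf-code": "A-12"})
print(xml_out)
```

The same function works for any vocabulary of tags, which is what makes XML suitable for describing data from very different applications.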
STRUCTURED DATA-
 Structured data is data that conforms to a data model, has a well-defined structure, follows a consistent order and can be easily accessed and used by a person or a computer program. Structured data is usually stored in well-defined schemas such as databases. It is generally tabular, with columns and rows that clearly define its attributes.

Example:  Excel files or SQL databases
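A minimal sketch of what "conforms to a data model" can mean in practice, using Python's built-in `sqlite3` module with an in-memory database. The table name `contact`, its columns and the second sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is the data model: every row must supply these columns.
conn.execute(
    "CREATE TABLE contact ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " company TEXT NOT NULL,"
    " phone TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO contact (name, company, phone) VALUES (?, ?, ?)",
    [
        ("Aarna Malhotra", "Google", "(011)123-4567"),
        ("Sample Person", "Example Corp", "(011)000-0000"),  # hypothetical row
    ],
)

# Columns make the data easy to access by attribute.
rows = conn.execute(
    "SELECT name, phone FROM contact WHERE company = ?", ("Google",)
).fetchall()

# A row that does not fit the schema (no phone) is rejected.
try:
    conn.execute(
        "INSERT INTO contact (name, company) VALUES (?, ?)",
        ("No Phone", "Google"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```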


EXAMPLE-
• Structured data has a well-defined structure that helps in easy storage and access of data.
• Data can be indexed based on text strings as well as attributes. This makes search operations hassle-free.
• Data mining is easy, i.e., knowledge can be easily extracted from data.
• Operations such as updating and deleting are easy due to the well-structured form of the data.
• Business intelligence operations such as data warehousing can be easily undertaken.
• Easily scalable when the volume of data increases.
• Securing the data is easy.
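The indexing, updating and deleting points above can be sketched with Python's built-in `sqlite3` module. The `product` table, its columns and its rows are hypothetical sample data, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("A1", "pen", 1.5), ("B2", "notebook", 3.0), ("C3", "stapler", 7.25)],
)

# Index an attribute so that searches on it do not need to scan every row.
conn.execute("CREATE INDEX idx_product_name ON product (name)")

# Updating and deleting address rows by attribute value.
conn.execute("UPDATE product SET price = ? WHERE sku = ?", (2.0, "A1"))
conn.execute("DELETE FROM product WHERE sku = ?", ("C3",))

remaining = conn.execute("SELECT sku, price FROM product ORDER BY sku").fetchall()
print(remaining)
```

Each operation is a single statement because the column names give the database an unambiguous way to locate the affected values.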
UNSTRUCTURED DATA-
 Unstructured data is data that does not have a predefined structure, or data that does not fit the semantics of rows and columns. In the modern world, abundant unstructured data arises from sensor readings, IoT applications, text data, and many more sources.
 Unstructured data has an internal structure, but it is not predefined through data models. This type of data might be human- or machine-generated, in a textual or non-textual format.
 In a simple sense, unstructured data can be thought of as data that is not managed in a relational database management system (RDBMS).

 So we can say that unstructured data is data that does not fit a structured form (spreadsheets, the relational model, etc.). It includes things like video, audio, image files, log files and social media posts. Even email has some unstructured aspects to it.
 The ability to store and process unstructured data has grown greatly in recent years, with many new technologies, such as MongoDB, available to store specialized types of unstructured data.
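As a small illustration of why unstructured text is harder to process, the sketch below has no tags or columns to rely on, so it has to search the text for a pattern. The note text and the helper `extract_dates` are hypothetical, and the regular expression only recognizes dates written as YYYY-MM-DD.

```python
import re

# A free-text note: nothing marks which parts are dates, names or topics.
NOTE = "Called Aarna on 2021-03-04 about the invoice; she will reply by 2021-03-10."

# Assumed date format for this sketch: four digits, two digits, two digits.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_dates(text: str) -> list:
    """Return every substring of the text that looks like a YYYY-MM-DD date."""
    return DATE_PATTERN.findall(text)


dates = extract_dates(NOTE)
print(dates)
```

Any date written in another style would be missed, which is the kind of fragility that a predefined data model avoids.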
EXAMPLE OF UNSTRUCTURED DATA
 Example: radiology reports (X-ray, MRI, ultrasound), which are generated from vendor-neutral reporting templates and are unstructured.
EXAMPLES OF UNSTRUCTURED DATA
SEMI-STRUCTURED DATA
 Semi-structured data is often based on markup languages such as Extensible Markup Language (XML). Semi-structured data does not have the same level of organization as structured data. HTML is another example alongside XML: like XML it is derived from SGML, but its tags are predefined rather than extensible, so only a part of the structure is understandable.
 The data that is considered semi-structured does not reside in fixed fields or records but does contain elements that can separate the data into various hierarchies.
 A typical example of semi-structured data is a photo taken with a smartphone. Every photo contains a mixture of unstructured image content and structured metadata such as the time, location, and other identifiable information.
 Semi-structured data is a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as a self-describing structure.
 Semi-structured data is a combination of both structured and unstructured data, for example email and social media.
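A minimal sketch of the "tags instead of fixed fields" idea, using Python's standard `xml.etree.ElementTree`. The two `contact` records deliberately carry different elements. The second record, the `contacts` wrapper and the helper `flatten` are hypothetical, added only for illustration.

```python
import xml.etree.ElementTree as ET

# Two records with different fields and different nesting depth.
XML_TEXT = """<contacts>
    <contact>
        <name>Aarna Malhotra</name>
        <phone>(011)123-4567</phone>
    </contact>
    <contact>
        <name>Sample Person</name>
        <email>person@example.com</email>
        <address><city>Example City</city></address>
    </contact>
</contacts>"""


def flatten(element, prefix=""):
    """Map the tag path of each leaf element to its text.

    Repeated sibling tags would overwrite each other in this simple sketch.
    """
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    result = {}
    for child in children:
        result.update(flatten(child, path))
    return result


root = ET.fromstring(XML_TEXT)
records = [flatten(contact) for contact in root]
for record in records:
    print(record)
```

The records do not share a fixed set of columns, yet each value can still be located because its tags describe where it sits in the hierarchy.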
EXAMPLE: WEB PAGE OF AN HTML WEBSITE
A RENDERED HTML WEBSITE IS AN EXAMPLE OF SEMI-STRUCTURED DATA
EXAMPLES OF SEMI-STRUCTURED DATA
COMPARISON OF DIFFERENT DATA TYPES
SUMMARY
 Every business or organization has access to massive amounts of structured, semi-structured, and unstructured data from different sources in all formats. To make the most of the data, robust data analytics must be performed. Separating the data according to its type is the first step in performing analysis.
 To recap, structured data follows a rigid format and is easy to organize; unstructured data is complex, often highly qualitative information that is impossible to reduce to or organize in a relational database; and semi-structured data has elements of both.
THANK YOU!
