Chapter 2 - Types of Digital Data

Chapter 2 discusses the types of digital data, classifying it into structured, semi-structured, and unstructured categories. It highlights that 80% of data generated in enterprises is unstructured, with structured data being easily manipulable and beneficial for SEO. The chapter also outlines the advantages and disadvantages of structured data compared to unstructured data, emphasizing their differing characteristics and storage methods.


Chapter 2

Types of Digital Data

“Reimagining Data Visualization Using Python”


Seema Acharya
Copyright © 2022 Wiley India Pvt. Ltd. All rights reserved.
In the presentation

• Digital data
• Classification of digital data
• Structured data
• Benefits of structured data
• Disadvantages of structured data
• Semi-structured data
• Unstructured data
• Structured vs. unstructured data

Digital Data

• Irrespective of the size of an enterprise (big or small), data assume significance as a precious and irreplaceable asset.
• Data are present inside the enterprise, and data also exist outside its four walls and firewalls.
• Data are present in homogeneous sources as well as in heterogeneous sources.

Classification of Digital Data

Digital data is classified into three types:
• Structured data
• Semi-structured data
• Unstructured data

Almost 80% of the data generated in any enterprise today is unstructured. Structured and semi-structured data account for roughly 10% each.

• Structured data: 10%
• Semi-structured data: 10%
• Unstructured data: 80%


Structured Data

When do we say that data are structured?

The simple answer: when data conform to a pre-defined schema or structure, we say they are structured data.

Figure: Sources of structured data
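
To make "conforming to a pre-defined schema" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample barcodes are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import sqlite3


def build_sales_db():
    """Create an in-memory table whose schema every row must satisfy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sales (
            barcode  TEXT    NOT NULL,
            quantity INTEGER NOT NULL CHECK (typeof(quantity) = 'integer'),
            price    REAL    NOT NULL CHECK (price >= 0)
        )
        """
    )
    return conn


def insert_sale(conn, barcode, quantity, price):
    """Return True if the row fits the schema, False if the database rejects it."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", (barcode, quantity, price))
        return True
    except sqlite3.IntegrityError:
        return False


def total_quantity(conn):
    """Every row has the same shape, so an aggregate is a single SQL query."""
    return conn.execute("SELECT SUM(quantity) FROM sales").fetchone()[0]


# Hypothetical sample rows (illustrative values only).
conn = build_sales_db()
insert_sale(conn, "8901234567890", 2, 49.5)
insert_sale(conn, "8901234567891", 3, 120.0)
rejected = insert_sale(conn, "8901234567892", "three", 10.0)  # quantity is not an integer
print(rejected, total_quantity(conn))
```

The third row is refused because it does not match the declared structure, which is the property that makes structured data easy to query and aggregate.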

Benefits of Structured Data

• It can be easily used by machine learning algorithms. Structured data is easy to manipulate and query.
• It can be easily used by an average business user.
• Several tools are available in the market to work with and analyze structured data.
• Structured data is recommended for use on websites. It helps to mark up a webpage so that search engines can crawl it quickly. It tells the search engine what is on each webpage and lets it easily pick out the important bits of information it needs. This could lead to improved SEO (Search Engine Optimization), because search engines can display relevant content more accurately.
• Structured data requires less storage space. It is formatted to fit a pre-defined structure before being loaded into data storage.
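
One common way to add structured data to a webpage is a JSON-LD block that uses the schema.org vocabulary. The sketch below builds such a block with Python's `json` module. The function name `article_markup` and the headline, author, and date values are placeholders for illustration, not details of any real page.

```python
import json


def article_markup(headline, author, date_published):
    """Build a schema.org Article description wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values (assumptions for the example only).
print(article_markup("Types of Digital Data", "Example Author", "2022-01-01"))
```

The resulting tag would be placed in the page's HTML so that a crawler can read the headline, author, and date without having to infer them from the page layout.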

Disadvantages of Structured Data

• Storage inflexibility: Structured data is generally stored in relational databases or data warehouses, both of which have highly rigid and stringent structures.
• Limited use cases: Pre-defined, structured data can only be used for its intended purpose, which limits its use cases.

Semi-Structured Data

• Semi-structured data is also referred to as having a self-describing structure.
• It does not conform to the data models that one typically associates with relational databases or any other form of data tables.
• It uses tags to segregate semantic elements. Tags are also used to enforce hierarchies of records and fields within the data.

Figure: Sources of semi-structured data
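
The role of tags can be illustrated with a small XML document parsed by Python's standard `xml.etree.ElementTree` module. The `employees` document and its fields are invented for this sketch. Note that the two records carry different fields, which a relational table would not allow without schema changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: tags name each field and nest to form a hierarchy.
DOC = """
<employees>
  <employee id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </employee>
  <employee id="2">
    <name>Ravi</name>
    <phone>555-0100</phone>
    <skills>
      <skill>Python</skill>
      <skill>SQL</skill>
    </skills>
  </employee>
</employees>
"""


def records(xml_text):
    """Turn each <employee> element into a dict keyed by its child tag names."""
    root = ET.fromstring(xml_text.strip())
    out = []
    for emp in root.findall("employee"):
        record = {"id": emp.get("id")}
        for child in emp:
            if len(child):  # element has children: keep the nested level as a list
                record[child.tag] = [c.text for c in child]
            else:
                record[child.tag] = child.text
        out.append(record)
    return out


for rec in records(DOC):
    print(rec)
```

Each record describes itself through its tags, so the reader discovers the fields from the data rather than from a fixed table definition.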

Unstructured Data

• Unstructured data does not conform to any data model.
• Its structure is quite unpredictable.
• Human generated: social media comments, emails, word processing documents, PowerPoint presentations, etc.
• Machine generated: satellite images, scientific data, surveillance images and videos, etc.

Figure: Sources of unstructured data


Structured vs. Unstructured Data

Aspect | Structured data | Unstructured data
------ | --------------- | -----------------
Who | Self-service access; business users | Data scientists
What | Only select data types | Varied data types
When | Schema on write | Schema on read
Where | Commonly stored in data warehouses | Commonly stored in data lakes
How | Predefined format | Native format
Generated by (human) | Spreadsheets | Social media comments, emails, word processing documents, PowerPoint presentations, etc.
Generated by (machine) | Weblog statistics; point-of-sale data such as barcodes and quantity | Satellite images, scientific data, surveillance images and videos, etc.
Characteristic | Quantitative, factual | Qualitative
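
The "schema on write" versus "schema on read" distinction in the comparison above can be sketched in a few lines of Python. Everything here is an illustrative assumption: the `SCHEMA` dictionary, the list-based `warehouse` and `lake` stores, and the "N units" pattern stand in for real storage systems and real parsing logic.

```python
import re

# Hypothetical schema for the structured store.
SCHEMA = {"barcode": str, "quantity": int}


def write_structured(store, row):
    """Schema on write: validate the row against SCHEMA before storing it."""
    if set(row) != set(SCHEMA):
        raise ValueError("row does not match the schema fields")
    for field, expected in SCHEMA.items():
        if not isinstance(row[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    store.append(row)


def write_raw(lake, blob):
    """Schema on read: store the content in its native format, unvalidated."""
    lake.append(blob)


def read_quantities(lake):
    """Impose a structure only at read time; skip content that does not fit it."""
    pattern = re.compile(r"(\d+)\s+units?")
    found = []
    for blob in lake:
        match = pattern.search(blob)
        if match:
            found.append(int(match.group(1)))
    return found


warehouse, lake = [], []
write_structured(warehouse, {"barcode": "8901234567890", "quantity": 2})
write_raw(lake, "Customer email: please ship 3 units by Friday")
write_raw(lake, "Great product, loved it!")
print(read_quantities(lake))
```

With schema on write, bad rows are rejected before they are stored. With schema on read, anything can be stored, and the cost of interpreting it is paid each time the data is read.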
Thank you
