Chapter 2 - Types of Digital Data

Chapter 2 of 'Data Visualization Using Python' discusses the types and classification of digital data, highlighting structured, semi-structured, and unstructured data. It emphasizes that structured data is easily manipulated and beneficial for machine learning and SEO, while unstructured data accounts for about 80% of data generated in enterprises. The chapter also contrasts structured and unstructured data in terms of storage, access, and generation characteristics.

Uploaded by

bristi79661

Chapter 2

Types of Digital Data

“Data Visualization Using Python”


Raj Kumar Samanta
Ref: Seema Acharya, Wiley India Pvt. Ltd.
In the presentation

► Digital Data
► Classification of digital data
► Structured data
► Benefits of structured data
► Disadvantages of structured data
► Semi-structured data
► Unstructured data
► Structured Vs. Unstructured data
Digital Data

► Irrespective of the size of an enterprise (big or small), data assume significance as a
precious and irreplaceable asset.
► Data are present inside the enterprise and data also exist outside the four walls and
firewalls of the enterprise.
► Data are present in homogeneous sources and data are also there in heterogeneous
sources.
Classification of Digital Data

Digital data
► Structured data
► Semi-structured data
► Unstructured data
Classification of Digital Data

Almost 80% of data generated in any enterprise today is unstructured data. Roughly
10% of data falls in each of the structured and semi-structured categories.
Structured Data

When do we say that the data are structured?

The simple answer: when data conform to a pre-defined schema or structure, we say they
are structured data.
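
A minimal sketch of what "conforming to a pre-defined schema" can look like in practice, assuming a relational table built with Python's standard sqlite3 module. The table name, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)

# Every row supplies exactly the columns the schema declares.
rows = [(1, "Asha", 52000.0), (2, "Ravi", 48000.0)]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# Because all rows share one structure, a declarative query works directly.
high_paid = conn.execute(
    "SELECT name FROM employee WHERE salary > ? ORDER BY name", (50000,)
).fetchall()
print(high_paid)
```

The schema is declared once, before any data is stored, and every later query can rely on it.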

Sources of structured data


Benefits of Structured Data

► It can be easily used by machine learning algorithms. It is easy to manipulate and
query structured data.
► It can be easily used by an average business user.
► There are several tools available in the market to work with and analyze structured
data.
► Structured data is recommended for use on websites. Structured data helps to
mark up your webpage so that search engines can quickly crawl it. It tells the
search engine what is on each webpage and allows it to easily pick out the
important bits of information that it needs. This could lead to improved SEO
(Search Engine Optimization). It allows search engines to display relevant content
more accurately.
► Structured data requires less storage space. It is formatted to fit a pre-defined
structure before being loaded into data storage.
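
The webpage markup point can be sketched as follows. The slides do not name a markup format, so this example assumes JSON-LD with the schema.org vocabulary, which is one common choice. The page details are hypothetical.

```python
import json

# Hypothetical page details, used only for illustration.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Types of Digital Data",
    "author": {"@type": "Person", "name": "Example Author"},
}

# JSON-LD is embedded in a page inside a script tag of this type.
json_ld = json.dumps(article, indent=2)
snippet = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(snippet)
```

A crawler that understands this vocabulary can read the headline and author as named fields instead of guessing them from the page layout.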
Disadvantages of Structured Data

► Storage inflexibility: Structured data is generally stored in relational databases or data
warehouses, both of which have highly rigid and stringent structures.
► Limited use cases: Pre-defined, structured data can only be used for its intended
purpose, which limits its use cases.
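
The storage inflexibility point can be illustrated with a small sketch, assuming a relational table created with Python's standard sqlite3 module. The table and the extra `coupon` field are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration.
conn.execute("CREATE TABLE sale (item TEXT NOT NULL, quantity INTEGER NOT NULL)")
conn.execute("INSERT INTO sale (item, quantity) VALUES (?, ?)", ("pen", 3))

# A record carrying a field the schema never declared is rejected outright.
try:
    conn.execute(
        "INSERT INTO sale (item, quantity, coupon) VALUES (?, ?, ?)",
        ("ink", 1, "SAVE10"),
    )
    rejected = False
except sqlite3.OperationalError as err:
    rejected = True
    print("rejected:", err)

# Accepting the new field requires changing the schema first.
conn.execute("ALTER TABLE sale ADD COLUMN coupon TEXT")
conn.execute(
    "INSERT INTO sale (item, quantity, coupon) VALUES (?, ?, ?)",
    ("ink", 1, "SAVE10"),
)
count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(count)
```

The new kind of record is stored only after the schema itself has been altered, which is the rigidity the slide describes.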
Semi-Structured Data

► Semi-structured data is also referred to as having a self-describing structure.
► It does not conform to the data models that one typically associates with relational
databases or any other form of data tables.
► It uses tags to segregate semantic elements. Tags are also used to enforce hierarchies
of records and fields within data.
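
As a rough illustration of tags that separate semantic elements and express hierarchy, the sketch below parses a small XML document with Python's standard xml.etree.ElementTree module. The document content is hypothetical, and XML is only one example of a tagged format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: records share tags but not identical fields.
doc = """
<students>
  <student id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </student>
  <student id="2">
    <name>Ravi</name>
    <phone>555-0100</phone>
  </student>
</students>
"""

root = ET.fromstring(doc)
records = []
for student in root.findall("student"):
    # The tags themselves say what each value means.
    record = {child.tag: child.text for child in student}
    record["id"] = student.get("id")
    records.append(record)

print(records)
```

The two records carry different fields, yet both remain readable because each value travels with its own tag. A fixed table would need a column for every possible field.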

Sources of Semi-structured data


Unstructured Data
► Unstructured data does not conform to any data model.
► Its structure is quite unpredictable.
► Human generated – social media comments, emails, word processing, PowerPoint
presentations etc.
► Machine generated – satellite images, scientific data, surveillance images and videos
etc.
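
Because unstructured data has no fields to query, any structure has to be imposed by the reader. The sketch below shows one simple way to do that, counting words in free-text comments. The comments are invented for illustration, and word counting is only one of many possible approaches.

```python
import re
from collections import Counter

# Hypothetical social media comments, invented for illustration.
comments = [
    "Loved the new phone, battery lasts all day!",
    "battery died in 2 hours. Not happy :(",
    "Camera is great but the battery is average",
]

# There are no columns to query, so we tokenize the text ourselves.
words = Counter()
for text in comments:
    words.update(re.findall(r"[a-z]+", text.lower()))

print(words["battery"])
```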

Sources of Unstructured data


Structured Vs. Unstructured data

                Structured data                     Unstructured data

Who             Self-service access;                Data scientists
                business users
What            Only select data types              Varied data types
When            Schema on write                     Schema on read
Where           Commonly stored in data             Commonly stored in data lakes
                warehouses
How             Predefined format                   Native format
Generated by    Human generated – spreadsheets      Human generated – social media comments,
                Machine generated – weblog          emails, word processing, PowerPoint
                statistics, point-of-sale data      presentations etc.
                such as barcodes and quantity       Machine generated – satellite images,
                                                    scientific data, surveillance images
                                                    and videos etc.
Characteristic  Quantitative, factual               Qualitative
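
The "schema on write" versus "schema on read" distinction can be sketched as follows. This is a simplified illustration using Python's standard sqlite3 and json modules. The table, the raw lines, and the `read_values` helper are all hypothetical.

```python
import json
import sqlite3

# Schema on write: structure is enforced at the moment data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor TEXT NOT NULL, value REAL NOT NULL)")
try:
    conn.execute("INSERT INTO reading VALUES (?, ?)", ("s1", None))
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True  # the NOT NULL constraint rejects the bad row

# Schema on read: raw lines are kept as-is and interpreted only when read.
raw_lines = ['{"sensor": "s1", "value": 21.5}', '{"sensor": "s2"}', "free text note"]


def read_values(lines):
    """Hypothetical reader that applies a structure while reading."""
    values = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not parseable under this reader's schema
        if isinstance(record, dict) and "value" in record:
            values.append(float(record["value"]))
    return values


values = read_values(raw_lines)
print(write_rejected, values)
```

In the first half the store refuses malformed data up front. In the second half everything is stored, and each reader decides what counts as a valid record.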


Thank you
