
Unstructured Data

Unstructured data is information that lacks a fixed format, making it challenging to organize and analyze, and includes formats such as text documents, images, and videos. It is characterized by its lack of structure, variety, large volume, and diverse sources, but poses challenges in storage, management, and security. Solutions for handling unstructured data include converting it to manageable formats, using content addressable storage systems, and employing data mining tools for information extraction.

Uploaded by

g.mahalakshmi

What is Unstructured Data?

Last Updated : 26 Sep, 2024



Unstructured data is information that has no fixed format or structure, which
makes it difficult to organize and analyze. Unlike structured data, which is
neatly arranged in tables, unstructured data comes in a variety of formats
such as text documents, images, and videos.
Unstructured Data
• Unstructured data refers to information that does not have a predefined
data model or structure, making it challenging to collect, process, and analyze
using traditional data management tools.
• Unlike structured data, which is organized in a well-defined format
(like rows and columns in a relational database), unstructured data can
come in various forms and formats.
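To make the contrast concrete, here is a minimal Python sketch. The record, the email text, and the `extract_order` helper are all invented for illustration; the regular expression is an assumption about how this one message happens to be worded, not a general technique.

```python
import re

# Structured: every record has the same named fields, so a query is a lookup.
structured_row = {"customer_id": 42, "order_total": 99.50, "currency": "USD"}

# Unstructured: the same facts buried in free text (a made-up email body).
email_body = "Hi, this is customer 42. I was charged 99.50 USD for my last order."


def extract_order(text):
    """Recover the fields from free text with a hand-written pattern.

    The pattern only fits messages worded like the sample above;
    a differently phrased email simply will not match.
    """
    match = re.search(r"customer (\d+).*?(\d+\.\d{2}) ([A-Z]{3})", text)
    if match is None:
        return None
    return {
        "customer_id": int(match.group(1)),
        "order_total": float(match.group(2)),
        "currency": match.group(3),
    }


recovered = extract_order(email_body)
not_recovered = extract_order("Please refund my last order.")
```

The structured row can be used directly, while the free text needs a custom parser that returns nothing as soon as the wording changes. That fragility is the practical meaning of "no predefined data model".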
Characteristics of Unstructured Data
1. Lack of Format: Unstructured data does not fit neatly into tables or
databases. It can be textual or non-textual, which makes it difficult to
categorize and organize.
2. Variety: This type of data spans a wide range of formats, such as:
• Text documents (e.g., emails, reports, articles)
• Multimedia files (e.g., images, audio, video)
• Social media content (e.g., posts, comments, tweets)
• Web pages and blogs
3. Volume: Unstructured data represents a significant portion of the data
generated today, and its volume is often larger than that of structured data.
4. Diverse Sources: It can originate from many sources, including
user-generated content, sensor data, customer interactions, and more.
Sources of Unstructured Data:
• Web pages
• Images (JPEG, GIF, PNG, etc.)
• Videos
• Memos
• Reports
• Word documents and PowerPoint presentations
• Surveys
Advantages of Unstructured Data:
• It accommodates data that lacks a proper format or sequence.
• The data is not constrained by a fixed schema.
• It is very flexible because there is no schema to conform to.
• The data is portable.
• It is very scalable.
• It can deal easily with heterogeneous sources.
• This type of data has a variety of business intelligence and analytics
applications.
Disadvantages of Unstructured Data:
• It is difficult to store and manage unstructured data because it lacks a
schema and structure.
• Indexing the data is difficult and error-prone because the structure is
unclear and there are no pre-defined attributes. As a result, search results
are not very accurate.
• Ensuring the security of the data is difficult.
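The indexing problem can be illustrated with a naive word index. This is a minimal sketch over three invented memos; `build_index` is a hypothetical helper, and real search engines add stemming, synonyms, and ranking that are deliberately left out here.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,!?")].add(doc_id)
    return index


# Made-up free-text memos with no pre-defined attributes to index on.
documents = {
    "memo1": "Server crashed during the backup.",
    "memo2": "Two crashes were reported after the update.",
    "memo3": "The backup completed without errors.",
}

index = build_index(documents)

# Only the exact word form matches, so memo2 ("crashes") is missed.
hits_crashed = sorted(index.get("crashed", set()))
hits_backup = sorted(index.get("backup", set()))
```

A search for "crashed" finds memo1 but not memo2, even though both describe the same kind of event. With no attribute such as an incident type to index on, the search has only raw words to go by.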
Challenges in Storing Unstructured Data:
• Unstructured data requires a lot of storage space.
• Videos, images, audio, and similar files are difficult to store.
• Because the structure is unclear, operations like update, delete, and search
are very difficult.
• Storage cost is high compared to structured data.
• Indexing unstructured data is difficult.
Solution for Storing Unstructured Data
• Unstructured data can be converted to easily manageable formats.
• A content addressable storage (CAS) system can be used to store unstructured
data. It stores data along with its metadata and assigns a unique name to
every object stored in it. An object is retrieved based on its content, not
its location.
• Unstructured data can be stored in XML format.
• Unstructured data can be stored in an RDBMS that supports BLOBs.
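Two of the options above, content addressable storage and BLOB columns, can be combined in a small sketch. It assumes the unique name is a SHA-256 hash of the object's bytes, which is a common choice but is not stated in the text above. `ContentStore` is a hypothetical class backed by Python's built-in `sqlite3` module, not a production CAS product.

```python
import hashlib
import sqlite3


class ContentStore:
    """Toy content-addressable store backed by a SQLite BLOB column."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS objects "
            "(digest TEXT PRIMARY KEY, body BLOB NOT NULL)"
        )

    def put(self, data: bytes) -> str:
        # Assumption: the object's name is the SHA-256 digest of its content.
        digest = hashlib.sha256(data).hexdigest()
        # Identical content yields the same digest, so it is stored only once.
        self.conn.execute(
            "INSERT OR IGNORE INTO objects (digest, body) VALUES (?, ?)",
            (digest, data),
        )
        self.conn.commit()
        return digest

    def get(self, digest: str) -> bytes:
        row = self.conn.execute(
            "SELECT body FROM objects WHERE digest = ?", (digest,)
        ).fetchone()
        if row is None:
            raise KeyError(digest)
        return row[0]

    def count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]


store = ContentStore()
payload = b"\x89PNG fake image bytes"  # stand-in for a real image file
key_a = store.put(payload)
key_b = store.put(payload)  # same content stored a second time
```

Because the key is derived from the content, storing the same bytes twice returns the same key and keeps a single copy. The caller never needs to know where the object physically lives.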
Extracting Information from Unstructured Data:
Unstructured data has no inherent structure, so conventional algorithms cannot
easily interpret it. It is also difficult to tag and index, which makes
extracting information from it a tough job. Here are possible solutions:
• Taxonomies, or classifications of data, organize data in a hierarchical
structure, which makes searching easier.
• Data can be stored in a virtual repository and tagged automatically, for
example with Documentum.
• Application platforms like XOLAP can be used. XOLAP helps in extracting
information from e-mails and XML-based documents.
• Various data mining tools can be used.
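The taxonomy and automatic tagging ideas can be sketched as simple keyword matching. The two-level `TAXONOMY`, its keywords, and the `tag_document` helper are invented for illustration; this is not how Documentum or any specific data mining tool works.

```python
# Hypothetical two-level taxonomy: category -> subcategory -> trigger keywords.
TAXONOMY = {
    "finance": {
        "invoices": {"invoice", "payment", "due"},
        "payroll": {"salary", "payslip"},
    },
    "support": {
        "outages": {"outage", "downtime", "crashed"},
    },
}


def tag_document(text, taxonomy=TAXONOMY):
    """Return sorted 'category/subcategory' tags whose keywords occur in text."""
    words = {w.strip(".,!?:;").lower() for w in text.split()}
    tags = []
    for category, subcategories in taxonomy.items():
        for subcategory, keywords in subcategories.items():
            if words & keywords:  # at least one keyword is present
                tags.append(f"{category}/{subcategory}")
    return sorted(tags)


tags = tag_document(
    "Payment for invoice 1183 is due Friday; the portal had downtime."
)
untagged = tag_document("Lunch menu for next week")
```

Once documents carry hierarchical tags like these, a search can filter by category first and only then look at the free text, which is what makes the taxonomy approach easier to search.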
