Week 3 Assignment
Question: Provide an overview of the different types of datasets used in data analysis (Structured,
Unstructured, and Semi-Structured).
Structured data
Structured data is data whose elements are addressable for effective analysis. It has been organized
into a formatted repository, typically a database. It makes up about 10%-20% of generated data
and has clearly defined data types and patterns, which make it easy to store and organize into rows and
columns. It is usually stored in a relational database (queried with SQL) or in spreadsheets. Records have
relational keys and can easily be mapped into pre-designed fields. Today, structured data is the most
commonly processed kind of data and the simplest to manage. Example: Relational data.
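As a minimal sketch of what "rows, columns, and relational keys" look like in practice, the example below uses Python's built-in sqlite3 module with an in-memory database. The table names (customers, orders), the column names, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0);
        """
    )
    return conn


def total_per_customer(conn):
    """Join the two tables on the relational key and sum order amounts per customer."""
    query = """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(total_per_customer(build_demo_db()))
```

Because every value sits in a typed column and the orders table refers to customers through a key, a single SQL join is enough to answer the question.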
Semi-Structured data
Semi-structured data is information that does not reside in a relational database but has some
organizational properties, such as tags or key-value markers, that make it easier to analyze. With some
processing, it can be stored in a relational database (although this can be very hard for some kinds of
semi-structured data), but semi-structured formats exist to save space. Example: XML data.
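The sketch below illustrates the idea with a small XML document parsed by Python's standard xml.etree.ElementTree module. The document, its tag names, and the function customers_to_rows are hypothetical. Note that the two records do not carry the same fields, which is typical of semi-structured data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tags give the data some structure, but the two
# records do not share exactly the same fields.
XML_DOC = """
<customers>
  <customer id="1">
    <name>Ada</name>
    <email>ada@example.com</email>
  </customer>
  <customer id="2">
    <name>Grace</name>
    <phone>555-0100</phone>
  </customer>
</customers>
""".strip()


def customers_to_rows(xml_text):
    """Flatten each <customer> element into a dict with a fixed set of keys."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("customer"):
        rows.append(
            {
                "id": int(node.get("id")),
                "name": node.findtext("name"),
                "email": node.findtext("email"),  # None when the tag is absent
                "phone": node.findtext("phone"),  # None when the tag is absent
            }
        )
    return rows


if __name__ == "__main__":
    for row in customers_to_rows(XML_DOC):
        print(row)
```

Mapping the records onto fixed columns leaves gaps (None) wherever a tag is missing, which is one reason loading semi-structured data into a relational table can be awkward.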
Unstructured data
Unstructured data is data that is not organized in a predefined manner or does not have a
predefined data model, so it is not a good fit for a mainstream relational database. Alternative
platforms exist for storing and managing unstructured data. It is increasingly prevalent
in IT systems and is used by organizations in a variety of business intelligence and analytics applications.
It makes up about 80% of generated data and cannot be organized into rows and columns. Examples: Word
documents, PDF files, text, and media logs.
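Since unstructured text has no fields to query, analysis usually starts from the raw text itself. The following is a minimal sketch, assuming a simple keyword count is the goal. The sample note, the function keyword_counts, and the keyword list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical free-text note: there are no columns or tags to query.
NOTE = (
    "Customer called about a late delivery. "
    "Delivery was rescheduled; customer was satisfied."
)


def keyword_counts(text, keywords):
    """Count case-insensitive occurrences of each keyword in free text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    return {keyword: counts[keyword.lower()] for keyword in keywords}


if __name__ == "__main__":
    print(keyword_counts(NOTE, ["delivery", "customer", "refund"]))
```

Unlike the structured and semi-structured cases, nothing here tells the program which word is a customer name or an order; only textual matching is available.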
Differences between Structured, Semi-structured and Unstructured data:
Property           Structured data           Semi-structured data      Unstructured data
Query performance  Structured queries allow  Queries over anonymous    Only textual queries
                   complex joins             nodes are possible        are possible