
Facets of Data:

1. Structured data
2. Semi-structured data
3. Unstructured data

Structured data
• Predefined fields.
• Can be arranged in tables or relational databases.
• Quantitative and highly organized.
• Supports effective analysis.
• Easy to export, store, and organize in a database.
• Definite format.
• Easy to search (see the sketch below).
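
For instance, here is a minimal sketch of structured data using Python's built-in sqlite3 module; the table and column names are invented for illustration:

    import sqlite3

    # Structured data: predefined fields with a definite format
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
    conn.execute("INSERT INTO students VALUES (1, 'Alishba', 3.7)")
    # Because every row fits the same predefined fields, search is easy
    for row in conn.execute("SELECT name, gpa FROM students WHERE gpa > 3.0"):
        print(row)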

Semi-structured (aka self-describing structure)


• Semi-structured data is a form of structured data that does not obey the tabular structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data.
• Example: email, XML, JSON (see the sketch below).
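
As an illustration, the small JSON document below (its contents are invented) shows why the format is called self-describing: the keys act as tags that separate semantic elements, and the nesting enforces a hierarchy:

    import json

    # Keys act as tags separating semantic elements; nesting gives hierarchy
    doc = json.loads("""
    {
        "from": "sender@example.com",
        "subject": "Lab report",
        "attachments": [{"name": "report.pdf", "size_kb": 120}]
    }
    """)
    # No fixed table schema, yet the markers let us navigate the record
    print(doc["attachments"][0]["name"])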

Unstructured
• Qualitative data. Example: text, images, videos, sound.
• Difficult to analyze.
• Can be structured with machine learning techniques to extract insights. Example: language – parsing; images – segmentation (see the sketch below).
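
A toy sketch of the language/parsing case, using an invented sentence; splitting the raw text into tokens imposes just enough structure to extract a simple insight:

    from collections import Counter

    # Invented raw text: no fields, no schema, just a blob of language
    raw_text = "data science turns raw data into insights about data"
    tokens = raw_text.split()            # a crude stand-in for parsing
    # The extracted structure (a word-frequency table) is now analyzable
    print(Counter(tokens).most_common(3))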

Big Data:
• Mixed data sets that are very large and are a mixture of structured and unstructured data.
V’s of Big Data:
• Volume
• Velocity
• Value
• Veracity
• Variety

Interaction data comes from recording activities in our day-to-day (digital) lives.

Volume
• The amount of data.

Velocity
• The speed at which data arrives. Because velocity increases data volume, often exponentially, it can shorten the window of data retention or application.

Value
• The usefulness of the gathered data for your business.
• Regardless of its volume, bulk data usually isn’t very useful; to be valuable, it needs to be converted into insights or information, and that is where data science and analytics step in.

Veracity
• The assurance of quality or credibility of the collected data.

Variety
• Structured, unstructured, and semi-structured data.

Data Science and Analytics
• The process of discovering valuable information from very large databases, using algorithms that discover hidden patterns in data.
• The analysis of data to draw hidden insights to aid decision making.
• The aim in analyzing all this data is to uncover patterns and connections that might otherwise be invisible, and that might provide valuable insights about the users who created it.

Types:
• Descriptive
• Diagnostic
• Predictive
• Prescriptive

2ND SLIDE:
Data Science Process
• Understand the Business Problem
1. Data Acquisition/Collection
2. Data Preparation
3. Data Exploration
4. Data Modeling (in-depth analysis)
5. Visualization

Understand the Business Problem
The first thing you must do before you solve a problem is to define exactly what it is.

1. Data Acquisition/Collection
• This part of the process involves finding suitable data and getting access to the
data from the data owner
• The result of this step is data in its raw form, which probably needs polishing and
transformation before it becomes usable.
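
A minimal sketch of this step in Python, assuming the data owner has handed over a CSV export (the file name is hypothetical):

    import pandas as pd

    # Data in its raw form, as received from the data owner; it will
    # probably need polishing and transformation before it is usable
    raw = pd.read_csv("customer_export.csv")   # hypothetical file name
    print(raw.shape)    # how much data did we actually receive?
    print(raw.head())   # a first look at the raw records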

2. Data Preparation
• Transforming data from a raw form into data that is directly usable in your
model.
• (Data Splitting, Data integration, Feature Selection)
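
One possible sketch of these three activities with pandas and scikit-learn; the file names, the customer_id join key, and the column names are all assumptions for illustration:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Raw data from the acquisition step (hypothetical files)
    customers = pd.read_csv("customer_export.csv")
    orders = pd.read_csv("orders_export.csv")
    # Data integration: combine the sources on a shared key (assumed name)
    data = customers.merge(orders, on="customer_id")
    # Feature selection: keep only the columns the model will use (assumed)
    features = data[["age", "monthly_spend"]]
    target = data["churned"]
    # Data splitting: hold out a test set for honest evaluation later
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.2, random_state=0)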

3. Data Exploration
• The goal of this step is to gain a deep understanding of the information contained in the data.
• Look for patterns, correlations, and deviations using visual and descriptive techniques.
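
A brief sketch of exploration with pandas, reusing the hypothetical file and column names from the preparation step:

    import pandas as pd

    data = pd.read_csv("customer_export.csv")          # hypothetical file
    # Descriptive techniques: summary statistics and correlations
    print(data.describe())                             # central tendency, spread
    print(data[["age", "monthly_spend"]].corr())       # assumed column names
    # Deviations: rows far outside the typical range deserve a closer look
    print(data[data["monthly_spend"] > data["monthly_spend"].quantile(0.99)])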

4. Data Modeling
• It is now that you attempt to gain the insights or make the predictions stated in
your project charter (business problem).
• This step of the process is where you will have to apply your statistical,
mathematical and technological knowledge and leverage all of the data science
tools at your disposal to crunch the data and find every insight you can.
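
As a hedged illustration, suppose the project charter asks us to predict customer churn; a simple scikit-learn model (hypothetical file and column names again) might look like:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    data = pd.read_csv("customer_export.csv")    # hypothetical file
    X = data[["age", "monthly_spend"]]           # assumed feature columns
    y = data["churned"]                          # assumed target column
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    # Crunch the data: fit a simple model and check how well it generalizes
    model = LogisticRegression().fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))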

5. Visualization
All the analysis and technical results that you come up with are of little value unless
you can explain to your stakeholders what the results mean, in a way that’s
comprehensible and compelling. Data storytelling is a critical and underrated skill
that you will build and use here.
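
A small matplotlib sketch of turning one result into a chart a stakeholder can read (same hypothetical data set):

    import matplotlib.pyplot as plt
    import pandas as pd

    data = pd.read_csv("customer_export.csv")    # hypothetical file
    # One clear, labeled chart per message beats a wall of numbers
    data.groupby("age")["monthly_spend"].mean().plot(kind="line")
    plt.title("Average monthly spend by age")
    plt.xlabel("Age")
    plt.ylabel("Average monthly spend")
    plt.show()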

3RD SLIDE:
Data Modeling:
• The process of sorting and storing data is called "data modeling".
• It is the process of creating a data model for the data to be stored in a database.
• Data modeling helps in the visual representation of data and enforces regulations on the data.

Data Model:
• A data model is a method by which we can organize and store data.
• A data model is a conceptual representation of data objects, the associations between different data objects, and the rules.
• Data models ensure consistency in naming conventions, default values, semantics, and security, while ensuring the quality of the data.
• The data model is defined as an abstract model that organizes data description, data semantics, and consistency constraints of data.
• The data model emphasizes what data is needed and how it should be organized, instead of what operations will be performed on the data.
• A data model is like an architect's building plan: it helps to build conceptual models and set relationships between data items.

Levels of Data Modeling:
1. Conceptual
2. Logical
3. Physical

Conceptual Model (Summary-Level Data Model / Domain Model)
• Conceptual Data Model defines WHAT the system contains.
• The purpose is to organize the scope and define business concepts and rules.
• Highly abstract in nature.
• A Conceptual Data Model is an organized view of your data and the relationships within data. The purpose of creating a conceptual data model is to establish entities, their attributes, and relationships.
• Offers organization-wide coverage of business concepts.
• Designed and developed for a business audience.
• Creates a common vocabulary for all stakeholders by establishing basic concepts and scope.
The three basic components of a Conceptual Data Model are (see the sketch after this list):
1. Entity: a real-world thing
2. Attribute: a characteristic or property of an entity (zero or extremely limited in number)
3. Relationship: a dependency or association between two entities
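
As a rough code analogy (the Customer and Order entities here are invented for illustration), the three components might map to:

    from dataclasses import dataclass

    # Entity: a real-world thing
    @dataclass
    class Customer:
        # Attribute: a characteristic or property of the entity
        name: str

    # Entity with a Relationship: an Order is placed by a Customer
    @dataclass
    class Order:
        customer: Customer   # association between the two entities
        total: float

    order = Order(customer=Customer(name="Alishba"), total=49.99)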

Logical Data Model
• Defines HOW the system should be implemented regardless of the DBMS.
• The Logical Data Model is used to define the structure of data elements and
to set relationships between them.
• A logical data model is a fully attributed data model that is independent of
DBMS, technology, data storage or organizational constraints. It typically
describes data requirements from the business point of view.

Physical Data Model
• This Data Model describes how the system will be implemented using a specific
DBMS system.
• A physical data model is a fully attributed data model that is dependent upon
a specific version of a data persistence technology.
• It offers database abstraction and helps generate the schema. This is
because of the richness of meta-data offered by a Physical Data Model.
• The physical data model also helps in visualizing database structure by
replicating database column keys, constraints, indexes, triggers, …
• The physical data model describes data for a single project or application though
it may be integrated with other physical data models based on project scope.
• Columns should have exact datatypes, lengths and default values.
• Primary and Foreign keys, views, indexes, access profiles, and authorizations,
etc. are defined.
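
A small sketch of this physical-level detail using Python's built-in sqlite3 module; the tables, types, keys, and index are illustrative only:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Exact datatypes, defaults, and a primary key
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Foreign key defines the physical link between tables
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        total       REAL DEFAULT 0.0
    );
    -- Index to speed up lookups on the foreign key column
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    """)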

Benefits of Data Models
• Performance: Good data models can help us quickly query the required data and reduce I/O throughput.
• Cost: Good data models can significantly reduce unnecessary data redundancy, reuse computing results, and reduce the storage and computing costs of the big data system.
• Efficiency: Good data models can greatly improve user experience and increase the efficiency of data utilization.
• Quality: Good data models make data statistics more consistent and reduce the possibility of computing errors.

Schema on Read
• In Schema on Read we upload data as it arrives without any changes or
transformations.
• Schema-on-read has fast data ingestion because the data doesn’t have to follow any internal schema; you are just copying or moving files.

Schema on Write
• Schema on write is defined as creating a schema for data before writing into the
database.
• This is schema-on-write: the approach in which we define the columns, data format, relationships of columns, etc. before the actual data upload.
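
A compact sketch of the contrast in Python, with an invented record and file name: schema-on-read lands the raw record as-is and interprets the fields only when reading, while schema-on-write defines the columns and types before any data is loaded:

    import json
    import sqlite3

    record = '{"user": "a1", "spend": "42.5"}'      # invented raw record

    # Schema on read: ingest as-is, interpret fields only when reading
    with open("landing_zone.jsonl", "a") as f:      # hypothetical landing file
        f.write(record + "\n")
    spend = float(json.loads(record)["spend"])      # schema applied at read time

    # Schema on write: define columns and types before any data is loaded
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spend (user TEXT, spend REAL)")
    row = json.loads(record)
    conn.execute("INSERT INTO spend VALUES (?, ?)", (row["user"], float(row["spend"])))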
