Facets of Data: Self-Describing Structure
1. Structured data
2. Semi-structured data
3. Unstructured data
Structured data
Predefined fields.
Can be arranged in tables or relational databases.
Quantitative and highly organized.
Lends itself to effective analysis.
Easy to export, store, and organize in a database.
Definite format.
Easy to search (see the sketch below).
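A minimal sketch of what structured data looks like in practice, shown here with pandas (a tool choice of these notes, not of the slide); the table, column names, and values are hypothetical.

    import pandas as pd

    # Structured data: every record has the same predefined fields,
    # so it fits naturally into a table or a relational database.
    customers = pd.DataFrame({
        "customer_id": [101, 102, 103],            # hypothetical identifiers
        "name": ["Alice", "Bob", "Carol"],
        "signup_date": pd.to_datetime(["2021-01-05", "2021-02-17", "2021-03-02"]),
        "monthly_spend": [49.99, 19.99, 99.00],    # quantitative, easy to analyze
    })

    # A definite format makes search and aggregation easy.
    print(customers[customers["monthly_spend"] > 30.0])
    print(customers["monthly_spend"].mean())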
Unstructured data
Qualitative data. Examples: text, images, videos, sound.
Difficult to analyze in its raw form.
Can be given structure with machine learning techniques to extract insights (see the sketch below).
Examples: language (parsing), images (segmentation).
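As one illustration of giving raw text some structure, the sketch below uses a bag-of-words representation via scikit-learn's CountVectorizer; the example sentences and the choice of technique are assumptions of these notes, not of the slide.

    from sklearn.feature_extraction.text import CountVectorizer

    # Unstructured text, e.g. customer reviews (made-up examples).
    documents = [
        "The delivery was fast and the product works great",
        "Terrible support, the product stopped working",
        "Great product, fast delivery, friendly support",
    ]

    # A bag-of-words model imposes structure: each document becomes a row
    # of word counts over a fixed vocabulary (the derived "fields").
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(documents)

    print(vectorizer.get_feature_names_out())
    print(counts.toarray())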
Big Data:
Very large data sets that are a mixture of structured and unstructured data.
V’s of Big Data:
Volume
Velocity
Value
Veracity
Variety
Volume
Amount of data
Velocity
The speed at which data is generated and processed. As velocity increases, data volume grows, often exponentially, which may shorten the window of data retention or application.
Value
usefulness of gathered data for your business.
Regardless of its volume, bulk data usually isn’t very useful — to be valuable, it
needs to be converted into insights or information, and that is where data
science and analytics step in.
Veracity
assurance of quality or credibility of the collected data.
Variety
Structured/unstructured/semi-structured
Types of analytics:
Descriptive
Diagnostic
Predictive
Prescriptive
2ND SLIDE
Data Science Process
Understand the Business Problem
1. Data Acquisition/Collection
2. Data Preparation
3. Data Exploration
4. Data Modeling (in-depth analysis)
5. Visualization
1. Data Acquisition/Collection
• This part of the process involves finding suitable data and getting access to the data from the data owner.
• The result of this step is data in its raw form, which probably needs polishing and transformation before it becomes usable (a small sketch follows).
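A minimal sketch of the acquisition step, assuming the data owner hands over a CSV export; the file name and the use of pandas are hypothetical.

    import pandas as pd

    # Acquisition: pull the raw data from wherever the owner exposes it
    # (a CSV export here; it could equally be a database or an API).
    raw = pd.read_csv("sales_export_2021.csv")  # hypothetical file

    # The result is data in its raw form; a first look usually reveals
    # missing values and columns that will need transformation.
    print(raw.shape)
    print(raw.head())
    print(raw.isna().sum())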
2. Data Preparation
• Transforming data from a raw form into data that is directly usable in your model.
• Typical tasks: data splitting, data integration, feature selection (see the sketch below).
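A small sketch of the preparation step, assuming two hypothetical raw tables and scikit-learn for the split; it touches data integration, cleaning, a very simple form of feature selection, and data splitting.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical raw tables from the acquisition step.
    orders = pd.DataFrame({
        "customer_id": [101, 102, 103, 104],
        "amount": [20.0, 35.5, None, 12.0],
    })
    customers = pd.DataFrame({
        "customer_id": [101, 102, 103, 104],
        "region": ["north", "south", "north", "east"],
        "churned": [0, 0, 1, 0],
    })

    # Data integration: combine sources on a shared key.
    data = orders.merge(customers, on="customer_id")

    # Cleaning / transformation: fill missing values, encode categories.
    data["amount"] = data["amount"].fillna(data["amount"].median())
    data = pd.get_dummies(data, columns=["region"])

    # Feature selection (kept deliberately simple): drop the identifier.
    X = data.drop(columns=["customer_id", "churned"])
    y = data["churned"]

    # Data splitting: hold out part of the data for later evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    print(X_train.shape, X_test.shape)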
3. Data Exploration
• The goal of this step is to gain a deep understanding of the information
contained in the data.
• The goal is to look for patterns, correlations, and deviations based on visual and
descriptive techniques.
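A brief sketch of the exploration step on a small synthetic data set (a stand-in for the prepared data); pandas and matplotlib are assumed.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Synthetic stand-in for the prepared data from the previous step.
    rng = np.random.default_rng(0)
    data = pd.DataFrame({
        "amount": rng.normal(50, 15, 200),
        "visits": rng.poisson(5, 200),
    })
    data["churned"] = (data["visits"] < 3).astype(int)

    # Descriptive techniques: summary statistics and correlations.
    print(data.describe())
    print(data.corr())

    # Visual techniques: distributions and pairwise relationships.
    data.hist(figsize=(8, 6))
    data.plot.scatter(x="visits", y="amount")
    plt.show()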
4. Data Modeling
• It is now that you attempt to gain the insights or make the predictions stated in
your project charter (business problem).
• This step of the process is where you will have to apply your statistical,
mathematical and technological knowledge and leverage all of the data science
tools at your disposal to crunch the data and find every insight you can.
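A minimal modeling sketch with scikit-learn; the synthetic data set and the choice of logistic regression are assumptions for illustration, since the slides do not prescribe a model.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Stand-in data; in practice X and y come from the preparation step.
    X, y = make_classification(n_samples=500, n_features=6, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit a simple model and check how well it generalizes to held-out data.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))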
5. Visualization
All the analysis and technical results that you come up with are of little value unless
you can explain to your stakeholders what the results mean, in a way that’s
comprehensible and compelling. Data storytelling is a critical and underrated skill
that you will build and use here.
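A short sketch of presenting a result to stakeholders with matplotlib; the churn-by-region numbers are illustrative, not real results.

    import matplotlib.pyplot as plt

    # Hypothetical outcome of the analysis: churn rate per region.
    regions = ["north", "south", "east", "west"]
    churn_rate = [0.12, 0.08, 0.21, 0.10]

    # One clear, labeled chart usually tells the story better than a table.
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(regions, churn_rate)
    ax.set_title("Churn rate by region (illustrative numbers)")
    ax.set_ylabel("Share of customers lost")
    plt.tight_layout()
    plt.show()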
3RD SLIDE:
Data Modeling:
• The process of sorting and storing data is called "data modeling".
• More precisely, it is the process of creating a data model for the data to be stored in a database.
• Data modeling helps in the visual representation of data and enforces rules (constraints) on the data.
Data Model:
• A data model is a method by which we can organize and store data.
• A data model is a conceptual representation of data objects, the associations between different data objects, and the rules that govern them.
• Data models ensure consistency in naming conventions, default values, semantics, and security, while ensuring the quality of the data.
• The Data Model is defined as an abstract model that organizes data description,
data semantics, and consistency constraints of data.
• The data model emphasizes what data is needed and how it should be organized
instead of what operations will be performed on data.
• Data Model is like an architect's building plan, which helps to build conceptual
models and set a relationship between data items.
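A toy sketch of a data model expressed as Python dataclasses; the entities, fields, and the rule between them are hypothetical, chosen only to show what data is needed and how it is organized.

    from dataclasses import dataclass
    from datetime import date

    # Two data objects and the association between them.
    @dataclass
    class Customer:
        customer_id: int
        name: str

    @dataclass
    class Order:
        order_id: int
        customer_id: int   # rule: every order references an existing customer
        order_date: date
        amount: float

    # The model says what data is needed and how it is organized,
    # not which operations will later be performed on it.
    alice = Customer(101, "Alice")
    first_order = Order(1, alice.customer_id, date(2021, 3, 2), 49.99)
    print(first_order)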
Schema on Read
• In schema-on-read, we upload data as it arrives, without any changes or transformations.
• Schema-on-read gives fast data ingestion because the data does not have to follow any internal schema; you are just copying or moving files, and structure is applied only when the data is read (see the sketch below).
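A hedged sketch of schema-on-read: raw JSON records are stored exactly as they arrive, and pandas infers a schema only at read time; the file name and fields are hypothetical.

    import json
    import pandas as pd

    # Ingestion: store the records as-is, with no table or column definitions.
    raw_records = [
        {"user": "alice", "event": "click", "ts": "2021-03-02T10:00:00"},
        {"user": "bob", "event": "purchase", "ts": "2021-03-02T10:05:00",
         "amount": 19.99},
    ]
    with open("events.json", "w") as f:
        json.dump(raw_records, f)

    # The schema is applied only when the data is read: columns are inferred
    # from whatever fields are present (missing ones become NaN).
    events = pd.read_json("events.json")
    print(events.dtypes)
    print(events)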
Schema on Write
• Schema on write means creating a schema for the data before writing it into the database.
• With schema-on-write, we define the columns, data formats, relationships between columns, etc. before the actual data upload (see the sketch below).
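A matching sketch of schema-on-write using Python's built-in sqlite3: the table structure is declared before any data is loaded; table and column names are hypothetical.

    import sqlite3

    # Schema on write: columns, types, and constraints are defined up front,
    # and every insert must conform to them.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE events (
            user   TEXT NOT NULL,
            event  TEXT NOT NULL,
            ts     TEXT NOT NULL,
            amount REAL              -- optional value, but its type is fixed
        )
    """)

    conn.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                 ("alice", "click", "2021-03-02T10:00:00", None))
    conn.commit()

    for row in conn.execute("SELECT * FROM events"):
        print(row)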