Unit II

Data formats refer to the structure in which data is stored and processed, including structured, semi-structured, and unstructured types. A data model is a conceptual representation that organizes data in databases, with key elements like entities, attributes, and relationships, and various types including hierarchical and relational models. Data marts are smaller, subject-oriented data warehouses that provide faster access to specific data compared to larger data warehouses, with advantages and disadvantages in terms of cost, complexity, and data analysis capabilities.

Uploaded by

Shafeen Nagoor

Unit II: Data Models

What is a Data Format?

• Data format refers to the structure or arrangement in which data is stored, processed, and
exchanged between systems. It specifies how information is organized for storage and
retrieval.
• In Big Data, understanding different data formats is crucial for efficient storage and
retrieval.
• Data can come in a variety of formats (structured, semi-structured, and unstructured),
depending on its source and use case.

Types of Data Formats

1. Structured Data
o This data is well organized and stored according to a predefined schema, such as the rows
and columns of a relational database table.
o It is easy to store, access, and process.
o Example: Sales records, employee data in an Excel sheet, or SQL databases.
2. Semi-structured Data
o This type of data does not follow a rigid schema but has some organizational
properties, making it easier to analyze than unstructured data.
o Common formats include XML and JSON. CSV files are often listed here too, although their
fixed tabular layout is frequently treated as structured.
o Example: Web server logs, sensor data, and emails.
3. Unstructured Data
o This data does not have a predefined format, making it the most challenging to store
and analyze.
o It requires specialized tools for processing.
o Example: Images, videos, social media posts, and multimedia files.
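As a rough illustration of the three categories, the sketch below handles one small example of each using only Python's standard library. The sample values and variable names are made up for illustration and do not come from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured_text = "employee_id,name,salary\n1,Asha,52000\n2,Ravi,48000\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total_salary = sum(int(row["salary"]) for row in rows)

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured_text = (
    '[{"sensor": "t1", "temp": 21.5},'
    ' {"sensor": "t2", "temp": 19.0, "unit": "C"}]'
)
readings = json.loads(semi_structured_text)
keys_per_record = [sorted(reading.keys()) for reading in readings]

# Unstructured: free text with no fields; any structure must be imposed by us.
unstructured_text = "Great phone, battery lasts long. Camera could be better."
word_count = len(unstructured_text.split())

print(total_salary)
print(keys_per_record)
print(word_count)
```

The structured rows can be aggregated directly by column name, the JSON records have to be inspected for which keys they carry, and the free text supports only crude operations such as word counting until further processing is applied.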

Importance of Understanding Data Formats:

• Helps in choosing the appropriate storage system.
• Optimizes data processing and retrieval.
• Improves decision-making by enabling better data analysis.

What is a Data Model?

A data model is a conceptual representation of how data is organized, stored, and accessed in
databases. It acts as a blueprint for designing and managing data in both traditional databases and
Big Data environments. Data models help manage large and complex datasets, ensuring efficiency
in data storage, retrieval, and analysis.

Key Elements of a Data Model

1. Entities:
o Objects or concepts about which data is stored.
o Example: In a university database, entities can be Students, Courses, and
Departments.
2. Attributes:
o Characteristics or properties of entities.
o Example: For the Student entity, attributes could be Name, ID Number, and Date of
Birth.
3. Relationships:
o Define how entities are related to each other.
o Example: A Student entity is related to a Course entity through an "enrolled in"
relationship.
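The three elements above can be sketched with Python dataclasses. This is only one possible way to represent them; the class layout, the `enroll` method, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Course:
    # Entity with two attributes (both hypothetical).
    code: str
    title: str


@dataclass
class Student:
    # Entity; name, id_number and date_of_birth are its attributes.
    name: str
    id_number: str
    date_of_birth: date
    # Relationship "enrolled in": one student can link to many courses.
    enrolled_in: list[Course] = field(default_factory=list)

    def enroll(self, course: Course) -> None:
        """Record that this student is enrolled in the given course."""
        self.enrolled_in.append(course)


databases = Course(code="CS301", title="Database Systems")
student = Student(name="Asha", id_number="S001", date_of_birth=date(2004, 5, 17))
student.enroll(databases)

print([course.code for course in student.enrolled_in])
```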

Types of Data Models

1. Hierarchical Data Model:
o Data is organized in a tree-like structure. Each child node has a single parent.
o Example: File systems, where folders have subfolders or files.
o Usage: Early mainframe databases.
2. Relational Data Model:
o Represents data in tables (rows and columns) with relationships between tables.
o Example: SQL-based databases like MySQL and PostgreSQL.
o Usage: Banking systems, customer relationship management (CRM).
3. Network Data Model:
o Similar to the hierarchical model, but a record can have more than one parent, which
allows many-to-many relationships between entities.
o Usage: Complex applications like telecommunications networks.
4. Document Data Model:
o Stores data as documents, typically in JSON or BSON format.
o Example: MongoDB.
o Usage: Web applications, content management systems.
5. Key-Value Data Model:
o Stores data as key-value pairs for fast access.
o Example: Redis, Amazon DynamoDB.
o Usage: Real-time applications and caching.
6. Graph Data Model:
o Represents data as nodes and edges to show relationships.
o Example: Neo4j.
o Usage: Social network analysis, recommendation systems, fraud detection.
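To make the differences more concrete, the sketch below represents four of these models (hierarchical, key-value, document, and graph) with plain Python structures. It is an in-memory illustration only, not how any of the named databases store data, and all names and values are hypothetical.

```python
# Hierarchical: a tree in which every child has exactly one parent.
hierarchical = {"root": {"docs": {"notes.txt": None}, "images": {}}}

# Key-value: an opaque value looked up by a unique key.
key_value = {"session:42": "user=asha;cart=3"}

# Document: each record is a self-contained, nested document.
document = {"_id": 1, "title": "Unit II", "tags": ["data models", "data marts"]}

# Graph: nodes plus edges, stored here as an adjacency list.
graph = {"asha": ["ravi"], "ravi": ["meena"], "meena": []}


def parent_of(tree, target, parent=None):
    """Return the single parent of `target` in a nested-dict tree, or None."""
    for name, children in tree.items():
        if name == target:
            return parent
        if children:
            found = parent_of(children, target, name)
            if found is not None:
                return found
    return None


def friends_of_friends(adjacency, person):
    """Return nodes two hops from `person`, excluding direct neighbours."""
    direct = set(adjacency[person])
    two_hops = {friend for d in direct for friend in adjacency[d]}
    return sorted(two_hops - direct - {person})


print(parent_of(hierarchical, "notes.txt"))
print(friends_of_friends(graph, "asha"))
```

The tree answers "who is my parent?" naturally, while the adjacency list makes multi-hop relationship queries straightforward, which is why graph models suit social network analysis.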

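Example of a Data Model in a Relational Database

• Entity: Customer
o Attributes: Customer ID, Name, Email, Phone Number
• Entity: Orders
o Attributes: Order ID, Order Date, Amount
• Relationship: Customer places Orders (one customer can place many orders; in a relational
schema this is typically implemented with a Customer ID foreign key in the Orders table).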
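A minimal sketch of this model using Python's built-in `sqlite3` module is shown below. The table and column names, the sample rows, and the `customer_id` foreign key in `orders` are assumptions added for illustration; the notes list only the entities and their attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT,
        phone       TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        amount      REAL NOT NULL,
        -- Assumed foreign key that realises "Customer places Orders".
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    """
)
conn.execute(
    "INSERT INTO customer VALUES (1, 'Asha', 'asha@example.com', '555-0100')"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(10, "2024-01-05", 250.0, 1), (11, "2024-02-11", 100.0, 1)],
)

# Join the two tables to count and total each customer's orders.
totals = conn.execute(
    """
    SELECT c.name, COUNT(o.order_id), SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    """
).fetchall()
print(totals)
```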

Benefits of Appropriate Models and Storage Environments in Big Data

Selecting the right data model and storage environment is crucial for the success of Big Data
projects. The benefits include:

1. Scalability:
o Allows systems to handle large datasets and grow as data volume increases.
o Example: Hadoop Distributed File System (HDFS) and Amazon S3 provide
scalable storage solutions.
2. High Performance:
o Optimized data models enhance query performance and reduce latency.
o Example: Columnar storage like Apache Parquet for fast read-heavy analytics.
3. Cost-Effectiveness:
o Appropriate models and storage systems reduce hardware and maintenance costs.
o Example: Cloud-based services such as Google BigQuery remove the need to buy and
maintain physical infrastructure.
4. Data Integration:
o Enables seamless integration of data from multiple sources, improving data quality
and consistency.
o Example: Data lakes integrate structured, semi-structured, and unstructured data.
5. Real-Time Analytics:
o Supports real-time data processing for immediate insights and decision-making.
o Example: Kafka and Spark Streaming for processing streaming data.
6. Data Security and Compliance:
o Modern storage environments offer advanced security features to protect sensitive
data.
o Example: Role-based access control (RBAC) and encryption in cloud storage
systems like Azure Data Lake.
7. Flexibility:
o Different models cater to different types of data (structured, semi-structured,
unstructured).
o Example: Document databases (MongoDB) for flexible schema designs.
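The performance point about columnar storage (item 2) can be illustrated with a toy in-memory comparison of row-oriented and column-oriented layouts. This is a conceptual sketch only; it does not reproduce the Apache Parquet file format, and the `to_columns` helper and sample records are hypothetical.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# In the row layout, an aggregate over one field visits every whole record.
total_from_rows = sum(record["amount"] for record in rows)
# In the column layout, it reads just the one list it needs.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both layouts hold the same data and give the same total; the difference is how much unrelated data an analytical query has to read, which is the main reason column-oriented formats suit read-heavy analytics.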

Examples of Storage Environments:

• Hadoop Distributed File System (HDFS): Manages large datasets across distributed
clusters.
• Amazon S3: Cloud-based object storage for scalable and secure data storage.
• Google BigQuery: Serverless cloud data warehouse for running SQL analytics over large
datasets.


What is a Data Mart?

A data mart is a small, subject-oriented data warehouse that stores data specific to a business
function or department, such as sales or marketing. It provides faster access to relevant data for
analysis and decision-making.

Features of a Data Mart:

• Smaller and more focused than a data warehouse.
• Provides fast access to specific data.
• Helps in targeted business analysis.
• Easier to maintain and administer.

How is a data mart different from a data warehouse?

A data warehouse stores the entire organization’s data, while a data mart stores data related to a
specific department or function. Data marts are smaller, faster, and more focused.

Types of Data Marts

1. Dependent Data Mart
o Extracts data from a central data warehouse.
o Ensures consistency with enterprise-wide data.
o Example: A sales department data mart derived from a company’s main data
warehouse.
2. Independent Data Mart
o Created directly from operational systems without relying on a central data
warehouse.
o Suitable for smaller organizations.
o Example: A stand-alone marketing data mart.
3. Hybrid Data Mart
o Combines data from both a data warehouse and external operational systems.
o Offers more flexibility in integrating data.
o Example: A finance data mart that combines internal financial data with external
market trends.
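The dependent data mart described above can be sketched with `sqlite3` as a department-specific subset derived from a warehouse table. The `warehouse_facts` and `sales_mart` tables, their columns, and the sample figures are all hypothetical; a real warehouse would have a far richer schema and a proper extraction process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical enterprise-wide warehouse table.
    CREATE TABLE warehouse_facts (
        department TEXT,
        metric     TEXT,
        value      REAL
    );
    INSERT INTO warehouse_facts VALUES
        ('sales',     'revenue',  1200.0),
        ('sales',     'units',      30.0),
        ('marketing', 'ad_spend',  400.0),
        ('finance',   'payroll',  9000.0);

    -- Dependent data mart: a sales-only subset derived from the warehouse.
    CREATE TABLE sales_mart AS
        SELECT metric, value FROM warehouse_facts WHERE department = 'sales';
    """
)

mart_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
warehouse_count = conn.execute(
    "SELECT COUNT(*) FROM warehouse_facts"
).fetchone()[0]
print(mart_rows, warehouse_count)
```

Because the mart is built from the warehouse rather than from operational systems, its figures stay consistent with enterprise-wide data while queries touch only the sales rows.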

Advantages of Data Marts

1. Simpler, more focused, and more flexible.
2. Low cost for both hardware and software.
3. Faster and cheaper to build.
4. Stores data closer to its users, which improves performance.

Disadvantages of Data Marts

1. Maintaining a large number of independent data marts can be time-consuming,
resource-intensive, and challenging to manage.
2. A data mart cannot provide company-wide data analysis because its data set is limited.
3. Without proper planning and coordination, the development of data marts can become
unorganized.
4. As a data mart grows in size, problems such as performance degradation and data
inconsistency can appear.

Data Mart vs Data Warehouse

Data Mart                                | Data Warehouse
-----------------------------------------+---------------------------------------------------
It is a decentralized system             | It is a centralized system
It is a bottom-up model                  | It is a top-down model
Building a mart is easy                  | Building a warehouse is difficult
Smaller, handles a subset of data        | Larger, handles vast amounts of data
Limited to a few sources                 | Multiple sources across the organization
Focused on one department or group       | Serves multiple departments and users
Simpler design, fewer integration needs  | More complex, integrates data from diverse sources
Lower cost, faster implementation        | Higher cost, longer implementation
Better performance, faster queries       | May have slower queries due to large data volume
