
Data Integration Approaches

The document discusses various data integration methods including Federated Databases, Memory-Mapped Data Structures, and Data Warehousing, each with distinct advantages, challenges, and use cases. It emphasizes the importance of Data Quality and Data Integrity in maintaining accurate and reliable data within databases. Examples illustrate how these methods are applied in real-world scenarios like healthcare, stock trading, and retail analysis.

Uploaded by

kruthika c g

1. Federated Databases

A federated database system (FDBS) is a type of virtual database that integrates multiple independent databases while allowing them to maintain their autonomy.

How It Works:

 A middleware layer connects different databases without physically moving or replicating the data.
 Queries are processed by translating them into the format of the underlying databases.
 The system returns a unified result to the user without requiring a centralized storage system.

Advantages:

 Real-time data access from multiple sources.
 No need for duplicating or transferring large amounts of data.
 Maintains autonomy of individual databases.

Challenges:

 Performance issues due to query translation overhead.
 Complex implementation, as different databases may have different formats and query languages.

Use Cases:

 Organizations with multiple database systems that need unified access (e.g., banks, hospitals).
 Real-time reporting from distributed databases.

A virtual database is a system that provides a unified view of multiple databases without physically copying or moving the data. Instead of storing data in one place, it dynamically retrieves and integrates data from different sources in real time.

How a Virtual Database Works:

1. When a user queries data, the virtual database does not store the data itself.
2. It translates the query into multiple sub-queries that are sent to the original databases.
3. The results from different databases are combined and presented as a single, unified dataset.

Key Features of a Virtual Database

✅ No data duplication – Data remains in its original source.
✅ Real-time access – Fetches the latest data on demand.
✅ Flexible integration – Can connect different databases (SQL, NoSQL, Cloud, etc.).
✅ Cost-effective – No need for large storage infrastructure.
Example of a Virtual Database

Use Case: Banking System

A bank has separate databases for:

 Customer information (MySQL)
 Loan records (PostgreSQL)
 Credit card transactions (Oracle DB)

Instead of merging all data into a single database, the bank sets up a virtual database
that integrates these three sources. When a bank employee searches for a customer’s
financial profile, the virtual database retrieves and combines real-time data from all
three sources without duplicating it.
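The fan-out-and-merge flow above can be sketched in a few lines of Python. The three dicts below are illustrative stand-ins for the MySQL, PostgreSQL, and Oracle sources (all names and values are invented); a real federated system would route translated sub-queries through middleware rather than reading local structures.

```python
# Minimal sketch of the fan-out-and-merge idea behind a virtual database.
# The dicts stand in for the bank's three independent databases; no data
# is copied into a central store — the unified view is built on demand.

customers = {"C1": {"name": "Asha Rao", "branch": "Mumbai"}}   # "MySQL"
loans     = {"C1": [{"loan_id": "L9", "balance": 250000}]}     # "PostgreSQL"
cards     = {"C1": [{"card_id": "X2", "last_txn": 1999.50}]}   # "Oracle DB"

def financial_profile(customer_id):
    """Fan one query out as sub-queries to each source, then merge."""
    return {
        "customer": customers.get(customer_id),   # sub-query 1
        "loans": loans.get(customer_id, []),      # sub-query 2
        "cards": cards.get(customer_id, []),      # sub-query 3
    }

profile = financial_profile("C1")
print(profile["customer"]["name"])  # unified view, fetched live from all sources
```

The key property is that `financial_profile` holds no data of its own; each call reaches back to the original sources, mirroring how a virtual database returns current data without duplication.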

2. Memory-Mapped Data Structures

Memory-mapped data integration is an in-memory approach where data is directly accessed in RAM instead of traditional disk-based storage.

How It Works:

 Data from different sources is mapped into memory so that applications can access it instantly.
 It uses shared memory techniques to integrate data without duplicating it.
 Typically used in real-time analytics and high-speed transaction processing.
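As a rough illustration of the mechanism (the section does not prescribe a tool), Python's standard `mmap` module maps a file into the process's address space, so byte ranges can be read and updated through memory rather than through explicit disk reads; the file name and contents below are invented:

```python
# Sketch of memory-mapped access: the OS maps the file into RAM-backed
# pages, so slicing the map reads/writes memory, not a read() call.
import mmap
import os
import tempfile

# Create a small data file to map (illustrative content).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"sensor:42;sensor:43;")

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # map the whole file
        first = mm[0:9]                    # byte-range read through memory
        mm[7:9] = b"99"                    # update in place through memory
        mm.flush()                         # persist dirty pages (crash safety)

with open(path, "rb") as f:
    final = f.read()
os.remove(path)
print(first, final)  # b'sensor:42' b'sensor:99;sensor:43;'
```

The `flush()` call addresses the volatility caveat below: without it, an update that lives only in mapped pages can be lost on a crash.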

Advantages:

 Extremely fast, as it eliminates disk I/O operations.
 Reduces storage redundancy.
 Ideal for big data and real-time applications.

Challenges:

 Requires large RAM capacity, which can be expensive.
 Data is volatile—if the system crashes, data may be lost unless persisted.

Use Cases:

 Financial trading systems that require real-time price updates.
 IoT applications where real-time data processing is crucial.
 In-memory databases like Redis and Apache Ignite.

3. Data Warehousing

A data warehouse is a centralized repository that integrates structured data from multiple sources for analysis and reporting.

How It Works:

 Data is extracted from various operational databases (the ETL process).
 The extracted data is cleaned, transformed, and stored in a structured format.
 Users can query, analyze, and generate reports from the warehouse.
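The three steps above can be sketched as a toy ETL run. Here `sqlite3` stands in for the warehouse purely for illustration (real warehouses would be systems like Redshift or BigQuery), and the source rows are invented:

```python
# Toy ETL: extract from operational sources, transform, load, then query.
import sqlite3

# Extract: rows from two hypothetical operational systems.
pos_rows = [("2024-01-05", "store", 120.0), ("2024-01-06", "store", None)]
web_rows = [("2024-01-05", "web", 80.0)]

# Transform: combine sources and drop incomplete records.
clean = [r for r in pos_rows + web_rows if r[2] is not None]

# Load: into a structured warehouse table.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales (day TEXT, channel TEXT, amount REAL)")
wh.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)

# Query: a typical reporting aggregate over the loaded data.
total = wh.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 200.0
```

Note the batch character of the flow: the warehouse answers queries from its own loaded copy, not live from the operational systems, which is why the challenges below flag it as unsuitable for real-time processing.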

Advantages:

 Stores historical data, enabling trend analysis.
 Optimized for business intelligence (BI) and decision-making.
 Improves data consistency by standardizing formats.

Challenges:

 High initial setup costs and maintenance.
 Not suitable for real-time data processing (batch-oriented).
 Requires ETL processes, which can be complex.

Use Cases:

 Business Intelligence & Reporting (e.g., sales performance tracking).
 Customer analytics (e.g., analyzing shopping behavior in e-commerce).
 Enterprise data consolidation (e.g., integrating data from different departments).

Comparison of Methods

Feature            Federated Databases              Memory-Mapped Data Structures   Data Warehousing
Data Storage       Distributed                      In-memory                       Centralized
Speed              Medium                           Very High                       Medium (Batch)
Real-Time Access   Yes                              Yes                             No
Scalability        High                             Limited by RAM                  High
Complexity         High                             Medium                          High
Use Case           Unified access to multiple DBs   Real-time analytics             Business Intelligence

Which One to Choose?

 Use federated databases if you need real-time access to multiple databases without moving data.
 Use memory-mapped data structures if you need high-speed data access for real-time processing.
 Use data warehousing if you need historical data analysis and business intelligence.


Here are examples of how each data integration method is used in real-world scenarios:

1. Federated Databases Example – Healthcare System Integration

Scenario:

A hospital group has multiple branches, each using its own database for patient records, lab results, and billing. The hospital wants a unified view of patient data without moving or duplicating data.

How Federated Databases Work:

 The hospital implements a federated database system that connects all the branch databases.
 When a doctor queries a patient’s history, the system fetches real-time data from multiple databases.
 The doctor sees a combined view of the patient’s medical history, lab results, and billing details.

Example Tools: IBM InfoSphere Federation Server, Microsoft SQL Server PolyBase

Benefits:

✅ Real-time patient data access from different locations
✅ No need to duplicate sensitive patient data
✅ Each branch maintains autonomy over its database

2. Memory-Mapped Data Structures Example – Stock Market Trading

Scenario:
A stock exchange system needs to process millions of stock price updates per second to provide real-time insights to traders.

How Memory-Mapped Data Works:

 Stock price data is stored in memory-mapped files instead of traditional databases.
 Traders' software can instantly read and analyze stock prices without disk access.
 The system updates stock prices in real time, allowing traders to make quick decisions.
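A minimal sketch of this layout, assuming Python's `mmap` with one fixed-width 8-byte float slot per ticker (the tickers, slot scheme, and prices are invented; production systems would use engines such as Redis or Apache Ignite):

```python
# Prices live in a memory-mapped file of fixed-width float slots: a writer
# overwrites slot i in place, and readers see the new value with no disk
# read() on the hot path.
import mmap
import os
import struct
import tempfile

SLOT = struct.calcsize("d")            # one 8-byte float per ticker
tickers = {"AAPL": 0, "MSFT": 1}       # ticker -> slot index (illustrative)

fd, path = tempfile.mkstemp()
os.ftruncate(fd, SLOT * len(tickers))  # pre-size the shared file
mm = mmap.mmap(fd, SLOT * len(tickers))

def publish(ticker, price):
    """Writer side: overwrite the ticker's slot through memory."""
    i = tickers[ticker] * SLOT
    mm[i:i + SLOT] = struct.pack("d", price)

def read_price(ticker):
    """Reader side: pure memory access to the same mapped pages."""
    i = tickers[ticker] * SLOT
    return struct.unpack("d", mm[i:i + SLOT])[0]

publish("AAPL", 189.25)
publish("MSFT", 411.10)
aapl, msft = read_price("AAPL"), read_price("MSFT")
print(aapl, msft)

mm.close(); os.close(fd); os.remove(path)
```

Because the file can be mapped by several processes at once, the same scheme lets one feed-handler process publish while many trader processes read, which is the shared-memory integration idea described above.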

Example Tools: Apache Ignite, Redis, Oracle TimesTen

Benefits:

✅ Ultra-fast stock price updates with zero disk I/O
✅ Low latency for real-time trading decisions
✅ Efficient memory usage for large-scale trading

3. Data Warehousing Example – Retail Sales Analysis

Scenario:

A large retail company (e.g., Walmart, Amazon) wants to analyze customer purchasing trends over the past 5 years to improve marketing and inventory planning.

How Data Warehousing Works:

 Sales data is collected from all retail stores and online platforms.
 The data is extracted, cleaned, and stored in a data warehouse (e.g., Amazon Redshift, Google BigQuery).
 Analysts run queries and reports to identify trends, such as seasonal buying patterns or popular products.
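The kind of trend query analysts would run can be sketched with `sqlite3` standing in for Redshift or BigQuery (dates and amounts are invented for illustration):

```python
# Monthly sales totals over stored history — the shape of a typical
# warehouse trend query (GROUP BY a derived time bucket).
import sqlite3

wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales (day TEXT, amount REAL)")
wh.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2023-12-20", 500.0), ("2023-12-24", 700.0), ("2024-03-02", 150.0),
])

rows = wh.execute(
    "SELECT strftime('%Y-%m', day) AS month, SUM(amount) "
    "FROM sales GROUP BY month ORDER BY month"
).fetchall()
print(rows)  # [('2023-12', 1200.0), ('2024-03', 150.0)]
```

A query like this over five years of rows is exactly the historical, batch-friendly workload a warehouse is optimized for.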

Example Tools: Snowflake, Amazon Redshift, Google BigQuery

Benefits:

✅ Enables historical trend analysis for better decision-making
✅ Supports business intelligence (BI) dashboards
✅ Data is structured and optimized for fast analytics

Comparison Table with Examples

Feature         Federated Databases (Hospital)           Memory-Mapped (Stock Trading)        Data Warehousing (Retail)
Data Location   Distributed (multiple hospitals)         In-memory (real-time stock prices)   Centralized (retail sales data)
Speed           Medium (real-time queries)               Very High (microseconds)             Medium (batch processing)
Use Case        Patient record access across hospitals   High-speed stock trading             Analyzing sales trends
Example Tools   IBM InfoSphere, SQL PolyBase             Redis, Apache Ignite                 Snowflake, Amazon Redshift

Which One Should You Use?

 If you need real-time access to multiple databases → Use Federated Databases
 If you need extremely fast real-time processing → Use Memory-Mapped Data
 If you need historical data analysis for reporting → Use Data Warehousing

What is Data Quality?

Data Quality refers to how accurate, complete, and reliable data is. Good data quality means the data is correct, up-to-date, and useful for making decisions.

Why is Data Quality Important?

Imagine you are filling out a form to apply for a job. If you enter the wrong phone number or leave your email address blank, the company cannot contact you—this is an example of bad data quality.

What is Data Integrity?

Data Integrity ensures that the data in a database is correct, consistent, and does not get changed accidentally. It involves rules that protect data from mistakes or unauthorized changes.

How Does Data Integrity Work?

In a database, data integrity is maintained using rules and constraints. Let’s understand some key concepts with an example:

Example: Employee Database

Imagine we have an Employee Table in a company's database. The table has the following columns:

EmpNo (Primary Key)   DeptNo (Foreign Key)   EmpName (Not Null)   Salary (Check Constraint)
101                   1                      John Doe             50000
102                   2                      Alice Smith          60000
103                   3                      Bob Williams         55000
Data Integrity Rules Explained with the Example:

Primary Key (PK) – Unique & Not Null

EmpNo (Employee Number) is the Primary Key, so each employee must have a unique EmpNo.
Example: There cannot be two employees with EmpNo 101.
It also cannot be empty (NULL), meaning every employee must have an EmpNo.

Foreign Key (FK) – Linking Tables

DeptNo (Department Number) is a Foreign Key, meaning it must refer to an existing department in the Department Table.
Example: If DeptNo 5 does not exist in the Department Table, you cannot assign an employee to DeptNo 5.

Not Null Constraint – No Empty Values

EmpName has a Not Null constraint, meaning every employee must have a name.
Example: If we try to insert a new employee without an EmpName, the database will reject the entry.

Check Constraint – Business Rules

Salary has a Check Constraint to ensure employees earn at least $30,000.
Example: If someone tries to enter a salary of $20,000, the database won’t allow it because it violates the rule.
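All four rules can be seen in action with a short script. `sqlite3` is used here purely as an illustrative stand-in for any relational database (note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`):

```python
# The four integrity rules from the Employee example, enforced by the
# database itself: each bad row below is rejected with IntegrityError.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")            # SQLite needs this for FKs
db.execute("CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY)")
db.execute("""CREATE TABLE Employee (
    EmpNo   INTEGER PRIMARY KEY,                  -- unique & not null
    DeptNo  INTEGER NOT NULL REFERENCES Department(DeptNo),
    EmpName TEXT NOT NULL,                        -- no empty names
    Salary  INTEGER CHECK (Salary >= 30000)       -- business rule
)""")
db.executemany("INSERT INTO Department VALUES (?)", [(1,), (2,), (3,)])
db.execute("INSERT INTO Employee VALUES (101, 1, 'John Doe', 50000)")

for bad_row in [
    (101, 2, "Alice Smith", 60000),   # duplicate EmpNo  -> PK violation
    (104, 5, "New Hire", 40000),      # DeptNo 5 missing -> FK violation
    (105, 1, None, 45000),            # no name          -> NOT NULL violation
    (106, 1, "Low Pay", 20000),       # below $30,000    -> CHECK violation
]:
    try:
        db.execute("INSERT INTO Employee VALUES (?, ?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
```

After the loop, the table still holds only the one valid row: the constraints, not application code, are what kept the bad data out.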
Why is Data Integrity Important?
✅ Prevents duplicate records (e.g., two employees with the same EmpNo).
✅ Ensures accuracy (e.g., an employee cannot be assigned to a non-existent department).
✅ Protects data consistency (e.g., salaries must follow company rules).
✅ Helps in reliable decision-making (e.g., HR can trust that all employees have valid salaries).

Conclusion
Data Quality means the data is correct and useful.
Data Integrity means the data follows rules to keep it accurate and consistent.
Databases use Primary Keys, Foreign Keys, Not Null, and Check Constraints to protect data integrity.

What is Data Quality?

Data Quality means that data is accurate, complete, consistent, and useful for making decisions.

Why is Data Quality Important?

🔹 If data is incorrect, wrong decisions can be made.
🔹 Bad data can cause errors in reports.
🔹 Good data saves time and money in businesses and organizations.

📌 Example:

 If a school records a student’s birthdate incorrectly, the student might not be placed in the correct grade.

What is Data Integrity?

Data Integrity ensures that data is correct, reliable, and follows rules to avoid mistakes or loss of information.

Example of Data Integrity in a Database

Imagine a company has an Employee Table storing employee details. It follows data integrity rules to keep the data correct.

EmpNo (Primary Key)   DeptNo (Foreign Key)   EmpName (Not Null)   Age (Check Constraint)
1001                  D01                    John Mathews         25
1010                  D02                    Elan Hayden          27
1011                  D01                    Jennifer Augustine   23

🔹 Each column follows a rule to maintain integrity.

Types of Data Integrity Rules

1️⃣ Primary Key (PK) – Uniqueness Rule

 Each employee must have a unique EmpNo (Employee Number).
 Example: No two employees can have EmpNo = 1001.

2️⃣ Foreign Key (FK) – Relationship Rule

 DeptNo (Department Number) must exist in the Department Table.
 Example: If the Department Table has "D01" (Finance), an employee can be assigned to it.
 If DeptNo D05 doesn’t exist, we cannot assign an employee to it.

DeptNo (From Department Table)   Dept Name
D01                              Finance
D02                              Purchase
D03                              Human Resources
D04                              Sales

3️⃣ Not Null – No Empty Values Rule

 EmpName must always be filled.
 Example: If we try to insert an employee without a name, the database will reject it.

4️⃣ Check Constraint – Business Rule

 Age should be between 18 and 65.
 Example: If someone tries to enter Age 17, the database will reject it.

Why is Data Integrity Important?

✅ Prevents duplicate records
✅ Ensures data accuracy and reliability
✅ Protects business operations from errors
✅ Keeps relationships between tables correct

Final Summary

📌 Data Quality means data is accurate and useful.
📌 Data Integrity ensures data follows rules to stay correct.
📌 Primary Key, Foreign Key, Not Null, and Check Constraints help maintain data integrity.
