Data Integration Approaches
Federated Databases
How It Works:
A middleware layer connects different databases without physically moving or replicating the
data.
Queries are processed by translating them into the format of the underlying databases.
The system returns a unified result to the user without requiring a centralized storage
system.
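The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for independent systems, and the schemas, the names branch_a and branch_b, the TRANSLATIONS table, and find_patient are all hypothetical. A real federated system would talk to different database products over the network.

```python
import sqlite3


def build_sources():
    """Create two independent databases with different schemas (hypothetical stand-ins)."""
    branch_a = sqlite3.connect(":memory:")
    branch_a.execute("CREATE TABLE patients (patient_id INTEGER, name TEXT)")
    branch_a.executemany(
        "INSERT INTO patients VALUES (?, ?)", [(1, "Ana Ruiz"), (2, "Li Wei")]
    )

    branch_b = sqlite3.connect(":memory:")
    branch_b.execute("CREATE TABLE clients (id INTEGER, full_name TEXT)")
    branch_b.executemany(
        "INSERT INTO clients VALUES (?, ?)", [(2, "Li Wei"), (3, "Omar Haddad")]
    )
    return {"branch_a": branch_a, "branch_b": branch_b}


# Middleware translation table: the same unified request expressed in each
# source's own schema (assumed mapping, for illustration only).
TRANSLATIONS = {
    "branch_a": "SELECT patient_id, name FROM patients WHERE name = ?",
    "branch_b": "SELECT id, full_name FROM clients WHERE full_name = ?",
}


def find_patient(sources, name):
    """Send the translated query to every source and merge the rows into one result."""
    unified = []
    for source_name, conn in sources.items():
        for patient_id, patient_name in conn.execute(TRANSLATIONS[source_name], (name,)):
            unified.append(
                {"source": source_name, "patient_id": patient_id, "name": patient_name}
            )
    return unified
```

Calling find_patient(build_sources(), "Li Wei") returns one row from each source in a single list, while the data itself stays in the two original databases.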
Advantages:
Challenges:
Use Cases:
Organizations with multiple database systems that need unified access (e.g., banks,
hospitals).
Real-time reporting from distributed databases.
1. The virtual database stores no data itself; it only receives the user's query.
2. It translates the query into multiple sub-queries, which are sent to the original databases.
3. The results from different databases are combined and presented as a single, unified
dataset.
Instead of merging all data into a single database, the bank sets up a virtual database
that integrates these three sources. When a bank employee searches for a customer’s
financial profile, the virtual database retrieves and combines real-time data from all
three sources without duplicating it.
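A rough way to try this idea is SQLite's ATTACH DATABASE together with a temporary view. In the sketch below the three source names (accounts_db, loans_db, cards_db), their tables, and the sample values are assumptions made for illustration; the notes do not name the bank's actual sources. SQLite is a single engine, so this only imitates a virtualization layer that would normally sit in front of separate systems.

```python
import sqlite3


def build_virtual_bank():
    """Attach three separate databases and expose one combined view over them."""
    conn = sqlite3.connect(":memory:")  # the "virtual" layer; it holds no customer tables
    for schema in ("accounts_db", "loans_db", "cards_db"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")

    # Hypothetical source tables, one per attached database.
    conn.execute("CREATE TABLE accounts_db.accounts (customer_id INTEGER, balance REAL)")
    conn.execute("CREATE TABLE loans_db.loans (customer_id INTEGER, outstanding REAL)")
    conn.execute("CREATE TABLE cards_db.cards (customer_id INTEGER, credit_limit REAL)")
    conn.execute("INSERT INTO accounts_db.accounts VALUES (7, 1200.0)")
    conn.execute("INSERT INTO loans_db.loans VALUES (7, 5000.0)")
    conn.execute("INSERT INTO cards_db.cards VALUES (7, 3000.0)")

    # The view stores only a query definition; rows are read from the sources
    # each time the view is queried.
    conn.execute(
        """
        CREATE TEMP VIEW customer_profile AS
        SELECT a.customer_id, a.balance, l.outstanding, c.credit_limit
        FROM accounts_db.accounts AS a
        JOIN loans_db.loans AS l ON l.customer_id = a.customer_id
        JOIN cards_db.cards AS c ON c.customer_id = a.customer_id
        """
    )
    return conn


def customer_profile(conn, customer_id):
    """Return the combined profile row for one customer, or None if not found."""
    return conn.execute(
        "SELECT * FROM customer_profile WHERE customer_id = ?", (customer_id,)
    ).fetchone()
```

Because the view is evaluated at query time, a change made in one source (for example an updated loan balance) shows up in the next call to customer_profile without any copy step.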
How It Works:
Data from different sources is mapped into memory so that applications can access it
instantly.
It uses shared memory techniques to integrate data without duplicating it.
Typically used in real-time analytics and high-speed transaction processing.
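The shared-memory idea can be tried with Python's standard multiprocessing.shared_memory module. The sketch below is an assumption-laden illustration: the record layout (an 8-byte symbol plus a 64-bit float price), the sample quotes, and the function names are hypothetical, and both handles live in one process for simplicity. In practice the reader would usually be a separate process that attaches to the block by name.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical fixed-size record: 8-byte symbol followed by a float64 price.
RECORD = struct.Struct("<8sd")


def write_quotes(shm, quotes):
    """Pack each (symbol, price) pair directly into the shared block."""
    for index, (symbol, price) in enumerate(quotes):
        RECORD.pack_into(shm.buf, index * RECORD.size, symbol.encode(), price)


def read_quote(shm, index):
    """Read one record straight from the shared block."""
    raw_symbol, price = RECORD.unpack_from(shm.buf, index * RECORD.size)
    return raw_symbol.rstrip(b"\0").decode(), price


def share_quotes_demo():
    quotes = [("ACME", 101.5), ("GLOBEX", 42.25)]
    producer = shared_memory.SharedMemory(create=True, size=RECORD.size * len(quotes))
    try:
        write_quotes(producer, quotes)
        # A second handle attaches to the same block by name; it sees the same
        # bytes rather than a private copy of the data set.
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            return [read_quote(consumer, i) for i in range(len(quotes))]
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()  # release the block once nobody needs it
```

The fixed-size record layout is a design choice of this sketch: it lets a reader jump to any record by offset, which is one reason such layouts are common when many updates must be read quickly.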
Advantages:
Challenges:
Use Cases:
How It Works:
Advantages:
Challenges:
Use Cases:
Comparison of Methods
You said:
with example
ChatGPT said:
Here are examples of how each data integration method is used in real-world
scenarios:
Scenario:
A hospital group has multiple branches, each using its own database for patient
records, lab results, and billing. The hospital wants a unified view of patient data
without moving or duplicating data.
The hospital implements a federated database system that connects all the branch
databases.
When a doctor queries a patient’s history, the system fetches real-time data from multiple
databases.
The doctor sees a combined view of the patient’s medical history, lab results, and billing
details.
Benefits:
Scenario:
A stock exchange system needs to process millions of stock price updates per
second to provide real-time insights to traders.
Benefits:
Scenario:
Sales data is collected from all retail stores and online platforms.
The data is extracted, cleaned, and stored in a data warehouse (e.g., Amazon Redshift,
Google BigQuery).
Analysts run queries and reports to identify trends, such as seasonal buying patterns or
popular products.
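The extract, clean, and store steps above can be sketched as a tiny pipeline. Everything concrete here is assumed for illustration: the sample records, the field names, and an in-memory SQLite database standing in for a warehouse such as Amazon Redshift or Google BigQuery.

```python
import sqlite3

# Hypothetical raw records from two channels, with inconsistent formatting.
STORE_SALES = [
    {"product": " Umbrella ", "amount": "19.99", "month": "2024-04"},
    {"product": "scarf", "amount": "12.50", "month": "2024-12"},
    {"product": "scarf", "amount": None, "month": "2024-12"},  # incomplete record
]
ONLINE_SALES = [
    {"product": "Scarf", "amount": "12.50", "month": "2024-12"},
    {"product": "umbrella", "amount": "19.99", "month": "2024-04"},
    {"product": "SCARF", "amount": "12.50", "month": "2024-12"},
]


def extract():
    """Collect raw sales records from every channel."""
    return STORE_SALES + ONLINE_SALES


def clean(rows):
    """Drop incomplete records and normalize product names and amounts."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append(
            (row["product"].strip().lower(), float(row["amount"]), row["month"])
        )
    return cleaned


def load(rows):
    """Store the cleaned rows in the stand-in warehouse."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (product TEXT, amount REAL, month TEXT)")
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return warehouse


def sales_count_by_month(warehouse):
    """Analyst-style report: number of sales per month and product."""
    return warehouse.execute(
        "SELECT month, product, COUNT(*) FROM sales "
        "GROUP BY month, product ORDER BY month, product"
    ).fetchall()
```

Running sales_count_by_month(load(clean(extract()))) groups the cleaned rows so that, for instance, all spellings of "scarf" in December are counted together.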
Benefits:
EmpNo (Primary Key)   DeptNo (Foreign Key)   EmpName (Not Null)   Salary (Check Constraint)
101                   1                      John Doe             50000
102                   2                      Alice Smith          60000
103                   3                      Bob Williams         55000
Data Integrity Rules Explained with the Example:
Primary Key (PK) – Unique & Not Null
EmpNo (Employee Number) is the Primary Key, so each employee must have a unique EmpNo.
Example: There cannot be two employees with EmpNo 101.
It also cannot be empty (NULL), meaning every employee must have an EmpNo.
Foreign Key (FK) – Linking Tables
DeptNo (Department Number) is a Foreign Key, meaning it must refer to an existing department in
the Department Table.
Example: If DeptNo 5 does not exist in the Department Table, you cannot assign an employee to
DeptNo 5.
Not Null Constraint – No Empty Values
EmpName has a Not Null constraint, meaning every employee must have a name.
Example: If we try to insert a new employee without an EmpName, the database will reject the entry.
Check Constraint – Business Rules
Conclusion
Data Quality means the data is correct and useful.
Data Integrity means the data follows rules to keep it accurate and consistent.
Databases use Primary Keys, Foreign Keys, Not Null, and Check Constraints to protect data integrity.
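The four constraints from the Employee example can be tried out with SQLite through Python's sqlite3 module. Some details are assumptions: the department names are made up, and the notes do not state the actual salary rule, so CHECK (Salary > 0) is only a placeholder business rule. SQLite also enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3


def build_db():
    """Create the Department and Employee tables with the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.execute(
        "CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT NOT NULL)"
    )
    conn.execute(
        """
        CREATE TABLE Employee (
            EmpNo   INTEGER PRIMARY KEY,                      -- unique per employee
            DeptNo  INTEGER REFERENCES Department(DeptNo),    -- must exist in Department
            EmpName TEXT NOT NULL,                            -- a name is required
            Salary  INTEGER CHECK (Salary > 0)                -- assumed business rule
        )
        """
    )
    # Department names are hypothetical; the notes only give DeptNo values.
    conn.executemany(
        "INSERT INTO Department VALUES (?, ?)",
        [(1, "Sales"), (2, "Finance"), (3, "IT")],
    )
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?, ?)",
        [(101, 1, "John Doe", 50000), (102, 2, "Alice Smith", 60000),
         (103, 3, "Bob Williams", 55000)],
    )
    conn.commit()
    return conn


def try_insert(conn, row):
    """Attempt to insert one employee row and report what the database decided."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
```

With this setup, try_insert rejects a second employee with EmpNo 101, an employee assigned to the non-existent DeptNo 5, an employee without a name, and a non-positive salary, while a row that satisfies every rule is accepted.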
📌 Example:
If a school records a student’s birthdate incorrectly, the student might not be placed in the
correct grade.
Final Summary
📌 Data Quality means data is accurate and useful.
📌 Data Integrity ensures data follows rules to stay correct.
📌 Primary Key, Foreign Key, Not Null, and Check Constraints help maintain
data integrity.