advanced database individual assignment
1. Introduction to Distributed Databases
A distributed database is a collection of databases spread across different physical locations and
connected through a network. The data is not stored at a single site but across several sites,
allowing it to be accessed, updated, and queried from different locations. Key characteristics include:
Distribution of Data: The database is not confined to one location, but the data is distributed across
multiple servers or nodes.
Transparency: The system should present itself as a single logical database to users and applications,
despite the physical distribution.
Replication: Data can be replicated across multiple locations to improve availability and fault tolerance.
Autonomy: Each node in a distributed database can operate independently, retaining local control
over its data.
Concurrency Control: Ensures that multiple users can access and modify the database without conflicts.
2. Data Fragmentation, Replication, and Allocation Techniques for Distributed Database Design
Data Fragmentation:
Data fragmentation is the process of breaking a database into smaller pieces (fragments), which are
then distributed across different nodes. There are three main types of fragmentation:
1. Horizontal Fragmentation:
The database is split into rows, where each fragment contains a subset of the rows. For example, users
from different regions may have their data stored in different fragments.
Example: A customer table may be fragmented into multiple parts where each part stores data for
customers from a specific region (e.g., Asia, Europe, America).
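The regional split described above can be sketched in a few lines of Python. This is a minimal illustration, not a real DBMS mechanism; the customer rows and the `region` field are made-up assumptions:

```python
# Horizontal fragmentation: split a customer table into row subsets by region.
# The table contents and region names are illustrative assumptions.
customers = [
    {"id": 1, "name": "Aiko",  "region": "Asia"},
    {"id": 2, "name": "Lena",  "region": "Europe"},
    {"id": 3, "name": "Ravi",  "region": "Asia"},
    {"id": 4, "name": "Maria", "region": "America"},
]

def horizontal_fragment(rows, key):
    """Group rows into fragments keyed by the value of `key`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

fragments = horizontal_fragment(customers, "region")
# Each fragment holds only the rows for one region and would be
# allocated to the site closest to those customers.
print(sorted(fragments))       # ['America', 'Asia', 'Europe']
print(len(fragments["Asia"]))  # 2
```

Note that every row lands in exactly one fragment, so the union of all fragments reconstructs the original table.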
2. Vertical Fragmentation:
The database is split into columns, where each fragment contains a subset of the columns. This is useful
when users typically access only certain attributes of the data.
Example: In a Customer table with Name, Address, Email, and Phone, the Name and Email columns may
be stored together in one fragment, and the Address and Phone in another.
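A corresponding sketch of vertical fragmentation, again with illustrative data; each fragment keeps the key column (`id`) so the original rows can be rebuilt with a join:

```python
# Vertical fragmentation: split each row of a Customer table into column
# subsets. Column groupings are illustrative; both fragments keep the key
# (id) so rows can be reconstructed later with a join on id.
customers = [
    {"id": 1, "name": "Aiko", "email": "aiko@example.com",
     "address": "Tokyo", "phone": "555-0101"},
    {"id": 2, "name": "Lena", "email": "lena@example.com",
     "address": "Berlin", "phone": "555-0102"},
]

def vertical_fragment(rows, key, columns):
    """Project each row onto `columns`, always keeping the key column."""
    return [{k: row[k] for k in (key, *columns)} for row in rows]

contact_frag = vertical_fragment(customers, "id", ("name", "email"))
location_frag = vertical_fragment(customers, "id", ("address", "phone"))

print(contact_frag[0])   # {'id': 1, 'name': 'Aiko', 'email': 'aiko@example.com'}
print(location_frag[0])  # {'id': 1, 'address': 'Tokyo', 'phone': '555-0101'}
```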
3. Mixed (Hybrid) Fragmentation:
A combination of horizontal and vertical fragmentation. A table may be fragmented both horizontally
and vertically, depending on the access patterns.
Data Replication:
Data replication involves storing copies of the same data at multiple sites. Replication increases data
availability and fault tolerance, as multiple copies exist in case of site failures.
Example: In an e-commerce system, product catalog data might be replicated across all regions to
ensure quick access for users globally, while user-specific data may only be replicated in specific regions.
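The catalog-versus-user-data distinction above can be sketched with in-memory dictionaries standing in for regional sites. The site names and data are assumptions for illustration:

```python
import copy

# Replication sketch: the product catalog is copied to every regional site,
# while user profiles are kept only at their home region. Site names and
# contents are illustrative assumptions.
catalog = {"sku-1": "Keyboard", "sku-2": "Monitor"}
sites = {"asia": {}, "europe": {}, "america": {}}

def replicate_catalog(catalog, sites):
    """Store an independent copy of the catalog at every site."""
    for store in sites.values():
        store["catalog"] = copy.deepcopy(catalog)

replicate_catalog(catalog, sites)

# Every site can now answer catalog reads locally, even if another site fails.
assert all(s["catalog"] == catalog for s in sites.values())

# User data is allocated to a single region rather than replicated everywhere.
sites["europe"]["users"] = {"u42": {"name": "Lena"}}
print("users" in sites["asia"])  # False
```

Using `deepcopy` makes each replica independent; in a real system the hard part, not shown here, is keeping those copies consistent when the catalog changes.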
Data Allocation:
Data allocation involves deciding where to store fragments and replicas of the database to optimize
performance, cost, and reliability. The allocation strategies depend on factors like data access
frequency, network latency, and available storage resources.
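A simple cost-based allocation decision can be sketched as: place a fragment at the site minimizing total access cost, modeled as access frequency times network latency. All frequencies and latencies below are made-up numbers for illustration:

```python
# Allocation sketch: choose the host site that minimizes total access cost,
# modeled as (access frequency from each site) x (latency to the host).
# Frequencies and latencies are illustrative assumptions.
access_freq = {"asia": 120, "europe": 30, "america": 10}   # queries per hour
latency_ms = {                                             # symmetric latencies
    ("asia", "asia"): 1, ("asia", "europe"): 90, ("asia", "america"): 110,
    ("europe", "europe"): 1, ("europe", "america"): 80,
    ("america", "america"): 1,
}

def latency(a, b):
    """Look up latency in either direction (the table is symmetric)."""
    return latency_ms.get((a, b), latency_ms.get((b, a)))

def best_site(freq):
    """Pick the host site with the lowest total weighted latency."""
    def cost(host):
        return sum(n * latency(site, host) for site, n in freq.items())
    return min(freq, key=cost)

print(best_site(access_freq))  # 'asia' -- most queries originate there
```

With these numbers, hosting in Asia costs 120*1 + 30*90 + 10*110 = 3920 ms-queries, far less than hosting in Europe or America, so the fragment is allocated there.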
3. Types of Distributed Database Systems
Homogeneous Systems:
All sites in the system use the same DBMS software. The data schema and structure are the same across
all sites, making data integration and communication easier.
Example: A company using Oracle at all its locations worldwide can implement a homogeneous
distributed database system.
Heterogeneous Systems:
Sites may run different DBMS software with differing schemas and data models, so translation or
middleware layers are needed for the sites to communicate.
Example: A system where one site uses MySQL, another uses SQL Server, and a third uses
PostgreSQL.
Federated Systems:
In a federated system, multiple autonomous databases are linked, but each database retains its
independence. Federated systems can combine homogeneous and heterogeneous components, with
each member database keeping some degree of autonomy.
Example: An organization with multiple departments (sales, HR, finance), each using their own
database system, but all part of a federated structure where queries can span across databases.
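The departmental example above can be sketched as a thin mediator that fans a lookup out to each autonomous store and merges the matching records. Department stores and record shapes are illustrative assumptions:

```python
# Federated sketch: each department keeps its own autonomous store; a thin
# mediator queries each one and merges matching records. Data is illustrative.
sales_db = [{"emp": "Lena", "deals": 7}]
hr_db = [{"emp": "Lena", "title": "Account Exec"}]

def federated_lookup(emp, sources):
    """Query each autonomous source and merge matching records."""
    merged = {}
    for db in sources:
        for rec in db:
            if rec.get("emp") == emp:
                merged.update(rec)
    return merged

print(federated_lookup("Lena", [sales_db, hr_db]))
# {'emp': 'Lena', 'deals': 7, 'title': 'Account Exec'}
```

Each department's store is untouched by the mediator: it only reads, which is the sense in which the members stay autonomous.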
4. Query Processing in Distributed Databases
Query processing in distributed databases is the process of translating, optimizing, and executing
queries on a distributed system. The aim is efficient query execution across multiple sites while
ensuring consistency, fault tolerance, and minimal latency.
1. Query Decomposition:
The first step is to break the query into smaller subqueries that can be executed at different sites. For
example, a query to fetch a customer’s order history might involve accessing data from both the
customer table and the orders table, which could reside at different sites.
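The order-history example above can be sketched as two subqueries, one per site holding the relevant table. The placement of tables and the row contents are illustrative assumptions:

```python
# Decomposition sketch: a cross-site "order history" query becomes two
# subqueries, one per site. Table placement and rows are illustrative.
customer_site = [{"cust_id": 1, "name": "Aiko"}]      # held at site A
orders_site = [                                       # held at site B
    {"order_id": 10, "cust_id": 1, "total": 30},
    {"order_id": 11, "cust_id": 2, "total": 99},
]

def order_history(cust_id):
    # Subquery 1: fetch the customer record from the customer site.
    customer = next(c for c in customer_site if c["cust_id"] == cust_id)
    # Subquery 2: fetch that customer's orders from the orders site.
    orders = [o for o in orders_site if o["cust_id"] == cust_id]
    # Combine the two partial results into the final answer.
    return {"name": customer["name"], "orders": orders}

print(order_history(1))
```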
2. Data Localization:
Data localization refers to the process of rewriting the query in such a way that it accesses only the
relevant data from the appropriate sites. This involves identifying where the data fragments are stored
and ensuring that the query is sent only to those sites.
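Localization can be sketched as routing: a query with a region predicate is shipped only to the one site holding that fragment, while a query without a predicate must visit every fragment site. The fragment-to-site map is an illustrative assumption:

```python
# Localization sketch: ship a query only to sites whose fragments can
# contain matching rows. The fragment map is an illustrative assumption.
fragment_site = {"Asia": "site-1", "Europe": "site-2", "America": "site-3"}

def target_sites(predicate_region=None):
    """With a region predicate the query touches one fragment; without a
    predicate it must be shipped to every fragment site."""
    if predicate_region is not None:
        return [fragment_site[predicate_region]]
    return sorted(fragment_site.values())

print(target_sites("Europe"))  # ['site-2']
print(target_sites())          # ['site-1', 'site-2', 'site-3']
```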
3. Optimization:
The query optimizer evaluates different execution plans to determine the most efficient way to execute
the query, considering network cost, local processing power, and storage resources at each site.
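One classic optimization decision is which table to ship across the network for a distributed join. A toy cost comparison, with made-up cardinalities and row sizes, might look like:

```python
# Optimizer sketch: compare two join-shipping plans by estimated network
# cost (rows shipped x row size). Cardinalities and sizes are illustrative.
stats = {
    "customers": {"rows": 1_000, "row_bytes": 120},
    "orders": {"rows": 50_000, "row_bytes": 80},
}

def ship_cost(table):
    """Estimated bytes on the wire if this table is shipped to the other site."""
    return stats[table]["rows"] * stats[table]["row_bytes"]

plans = {
    "ship_customers_to_orders_site": ship_cost("customers"),
    "ship_orders_to_customer_site": ship_cost("orders"),
}
best = min(plans, key=plans.get)
print(best)  # 'ship_customers_to_orders_site' -- far fewer bytes on the wire
```

Here shipping the small customers table (120 KB) is much cheaper than shipping all orders (4 MB), so the optimizer would execute the join at the orders site.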
4. Execution:
The optimized query is executed across the distributed sites. Subqueries are executed at the appropriate
locations, and intermediate results may be transmitted between sites to combine results.
5. Result Integration:
After execution, the results from different sites are integrated. This may involve combining rows from
different sites or performing join operations. Once integrated, the final result is returned to the user.
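Both integration steps mentioned above, unioning rows from horizontal fragments and joining them with data from another site, can be sketched with illustrative data:

```python
# Integration sketch: union partial row sets from two regional sites, then
# join the result with a lookup fetched from a third site. Data is illustrative.
asia_rows = [{"cust_id": 1, "total": 30}]
europe_rows = [{"cust_id": 2, "total": 99}]
names = {1: "Aiko", 2: "Lena"}  # fetched from the customer site

# Step 1: combine horizontally fragmented results (a union of rows).
all_rows = asia_rows + europe_rows

# Step 2: join the combined rows with customer names before returning them.
final = [{"name": names[r["cust_id"]], "total": r["total"]} for r in all_rows]
print(final)  # [{'name': 'Aiko', 'total': 30}, {'name': 'Lena', 'total': 99}]
```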