Distributed Database Management Systems
regions and providing disaster recovery capabilities. This geographic distribution also
helps comply with data sovereignty requirements and regulatory constraints.
Federated architecture: In this architecture, each site in the distributed database system
maintains its own independent database, but the databases are integrated through a middleware
layer that provides a common interface for accessing and querying the data.
Distributed databases may be homogeneous or heterogeneous.
In a homogeneous distributed database system, all physical locations use the same underlying
hardware and run the same operating systems and database applications. Users perceive
a homogeneous distributed database system as a single system, and designing and maintaining
it can be much easier. For a distributed database system to be homogeneous, the data structures
at each location must be similar or compatible. The database applications in each location must
also be similar or compatible.
A heterogeneous distributed database can have different hardware, operating systems, and
database applications at each location. Although a schema difference may make query and
transaction processing more difficult, different sites can use different schemas and software. Users
in one location may be able to read data in another, but not upload or modify it. Because
heterogeneous distributed databases are complex to integrate and maintain, many businesses
find them too costly to employ.
There are two ways in which data can be stored on different sites:
1. Replication
In this approach, the entire relation is stored redundantly at two or more sites. If the
entire database is available at all sites, it is a fully redundant database. Hence, in
replication, systems maintain copies of data. This is advantageous as it increases the
availability of data at different sites, and query requests can be processed in
parallel. However, it has certain disadvantages as well. Data needs to be constantly
updated: any change made at one site must be recorded at every site where that relation is
stored, or else it may lead to inconsistency, which is a lot of overhead. Concurrency
control also becomes far more complex, as concurrent access now needs to be checked over
a number of sites.
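A minimal sketch of full replication, with in-memory dictionaries and hypothetical site names standing in for real DBMS instances:

```python
# Full replication: every write is applied at every site, so any site
# can answer reads locally. Site names and the dict "databases" are
# illustrative stand-ins, not a real replication protocol.

sites = {"site_a": {}, "site_b": {}, "site_c": {}}

def replicated_write(key, value):
    """Propagate a write to every site to keep the copies consistent."""
    for db in sites.values():
        db[key] = value

def local_read(site, key):
    """Any single site can serve a read from its own copy."""
    return sites[site].get(key)

replicated_write("account:42", {"balance": 100})
print(local_read("site_b", "account:42"))  # every replica sees the update
```

The overhead described above shows up directly here: every write touches every site, and a real system must also handle a site being unreachable mid-write.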
2. Fragmentation
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and
each of the fragments is stored at the different sites where they’re required. It must be
ensured that the fragments can be used to reconstruct the original relation
(i.e., there isn’t any loss of data). Fragmentation is advantageous because it doesn’t create
copies of data, so consistency is not a problem.
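A sketch of horizontal fragmentation, using a hypothetical customers relation split by region; the union of the fragments reconstructs the original relation with no loss of data:

```python
# Horizontal fragmentation: rows of one relation are split by a
# predicate (region here) and stored where they are used. The relation
# and region values are illustrative.

customers = [
    {"id": 1, "region": "EU", "name": "Ana"},
    {"id": 2, "region": "US", "name": "Bob"},
    {"id": 3, "region": "EU", "name": "Eva"},
]

# Each site stores only its own region's rows (no duplicated data).
fragments = {
    "eu_site": [r for r in customers if r["region"] == "EU"],
    "us_site": [r for r in customers if r["region"] == "US"],
}

# Reconstruction: the union of the fragments restores the relation.
reconstructed = fragments["eu_site"] + fragments["us_site"]
assert sorted(r["id"] for r in reconstructed) == [1, 2, 3]
```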
3) Use Strong Authentication Mechanisms
Multi-Factor Authentication (MFA): Implement strong authentication methods, including MFA,
to verify user identities before granting access to the database.
Role-Based Access Control: Limit user permissions based on roles to ensure that individuals only
have access to data necessary for their job functions, adhering to the principle of least privilege.
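Role-based access control can be sketched as a mapping from roles to permission sets; the role names and permissions below are illustrative, not from any particular product:

```python
# Role-based access control: each role maps to the smallest set of
# permissions its job function needs (principle of least privilege).

ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "clerk": {"read", "insert"},
    "dba": {"read", "insert", "update", "delete"},
}

def is_allowed(role: str, action: str) -> bool:
    """Grant access only if the role's permission set includes the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("analyst", "read")
assert not is_allowed("analyst", "delete")  # least privilege in action
```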
4) Regular Monitoring and Auditing
Activity Monitoring: Continuously monitor database activities to detect and respond to suspicious
behavior promptly. Implement alerts for unusual access patterns.
Regular Audits: Conduct periodic audits of user permissions and database activity logs to identify
potential vulnerabilities or unauthorized access attempts.
5) Data Classification and Inventory
Data Discovery: Maintain an inventory of all data stored within the database, classifying it based
on sensitivity. This helps in implementing appropriate security measures tailored to the type of
data.
Data Minimization: Only retain data necessary for business operations and regularly purge
unnecessary historical information to reduce exposure risk.
6) Establish a Zero Trust Architecture
Continuous Verification: Adopt a Zero Trust model where every access request is verified,
regardless of its origin within or outside the network perimeter. This approach assumes that threats
can arise from both internal and external sources.
7) Physical Security Measures
Secure Hardware: Ensure that physical servers are located in secure environments with restricted
access. This includes using locked rooms and monitoring who accesses the hardware.
8) Regular Backups and Recovery Plans
Backup Strategies: Implement a robust backup strategy that includes regular backups stored
securely offsite. This ensures quick recovery in case of data loss or corruption due to breaches or
failures.
9) Database Hardening
Remove Unnecessary Services: Harden databases by disabling unused features and services,
applying security patches promptly, and enforcing strict password policies.
10) Utilize Database Firewalls
Access Control: Deploy database-specific firewalls that deny all traffic by default, allowing only
trusted applications or users to access the database.
Distributed Database Management Security
Distributed Database Management Systems (DDBMS) face unique security challenges due to their
architecture, which involves multiple sites, various users, and diverse data types. As a result,
security measures in DDBMS must be more comprehensive compared to centralized systems. Key
areas of focus include communication security, data security, and auditing practices.
Key Security Aspects
1. Communication Security
Threats: DDBMS are vulnerable to two main types of intruders: passive eavesdroppers, who
monitor communications to gather sensitive information, and active attackers, who can corrupt
data by modifying or injecting new information.
Protection Measures:
Secure communication channels are essential to prevent unauthorized access and ensure data
integrity during transmission.
Technologies such as Transport Layer Security (TLS) and Virtual Private Networks
(VPNs) are commonly employed to secure communications.
2. Data Security
Access Control: Implementing robust authentication and authorization protocols is crucial. This
typically involves:
Username/password combinations and digital certificates for user verification.
Discretionary access control (DAC) and mandatory access control (MAC) mechanisms to regulate
user permissions based on their roles and data sensitivity levels.
Data Encryption: Two primary approaches for encrypting data in distributed systems include:
Internal encryption, where applications encrypt data before storing it in the database.
External encryption, where the database system handles encryption transparently for the
applications.
Input Validation: Ensuring that all user inputs are validated before processing helps prevent
various attacks, including SQL injection and buffer overflow exploits.
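The SQL injection defense can be sketched with Python's built-in sqlite3 module; the table and data are illustrative. The key point is that user input is passed as a bound parameter, never concatenated into the SQL string:

```python
# Parameterized queries as input validation against SQL injection.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # a classic injection attempt

# The driver treats the bound parameter as a literal value, so the
# injection string matches no real user instead of returning every row.
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()
assert rows == []
```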
3. Auditing and Monitoring
Regular auditing is vital for identifying security violations and ensuring compliance with security
policies. Audit logs should capture:
i) Access attempts (both successful and failed).
ii) Significant modifications to the database.
iii) Patterns of data access across multiple sites.
These logs should ideally be stored on separate servers to enhance security against
potential attacks.
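A minimal in-process sketch of such an audit trail; the event fields and site names are hypothetical:

```python
# Audit-log capture: every access attempt (successful or failed) and
# every significant modification is recorded with its site of origin,
# so cross-site access patterns can be reviewed later.
import time

audit_log = []  # in production this would live on a separate log server

def audit(site, user, action, success):
    audit_log.append({
        "ts": time.time(), "site": site,
        "user": user, "action": action, "success": success,
    })

audit("site_a", "alice", "login", True)
audit("site_b", "mallory", "login", False)
audit("site_a", "alice", "UPDATE accounts", True)

failed = [e for e in audit_log if not e["success"]]
assert len(failed) == 1  # failed attempts are easy to isolate for review
```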
Challenges in DDBMS Security
• Network Complexity: Distributed systems rely on network communication between
nodes, which introduces complexity and overhead. Managing network latency, bandwidth
limitations, and packet loss can be challenging, particularly in large-scale deployments
spanning multiple geographic locations.
• Consistency and Coordination: Maintaining data consistency across distributed nodes is
challenging due to the possibility of concurrent updates and network partitions. Achieving
strong consistency requires coordination mechanisms like distributed transactions and
consensus protocols, which can introduce latency and overhead.
• Fault Tolerance: Distributed systems must be resilient to hardware failures, software bugs,
and network issues. Implementing fault tolerance mechanisms, such as replication,
redundancy, and failure detection, adds complexity and overhead to the system
architecture.
• Concurrency Control: Coordinating concurrent access to shared resources in a distributed
environment is challenging. Distributed systems must implement efficient concurrency
control mechanisms to prevent data corruption, race conditions, and deadlocks while
maximizing throughput and performance.
• Security: Distributed systems face various security threats, including unauthorized access,
data breaches, and denial-of-service attacks. Securing communication channels,
authenticating users and nodes, and implementing access control policies are critical to
protecting sensitive data and ensuring system integrity.
3) Domain Integrity: Enforces valid entries for a given column by restricting the type,
format, or range of values that can be entered. This helps maintain the quality of data stored
in the database.
4) User-defined Integrity: Allows users to create additional business rules or constraints that
ensure data meets specific organizational requirements, enhancing the flexibility of data
management within the DDBMS.
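A domain-integrity rule like the one above can be expressed as a SQL CHECK constraint; the sketch below uses Python's built-in sqlite3 with an illustrative accounts table:

```python
# Domain integrity via a CHECK constraint: the balance column is
# restricted to a valid range, and the DBMS rejects out-of-range rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance REAL CHECK (balance >= 0)  -- domain rule: no negatives
    )
""")

conn.execute("INSERT INTO accounts VALUES (1, 100.0)")  # accepted
try:
    conn.execute("INSERT INTO accounts VALUES (2, -5.0)")  # rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

User-defined integrity works the same way, with CHECK expressions encoding business rules rather than basic type or range restrictions.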
Challenges to Integrity in DDBMS
1) Concurrency Control: Multiple transactions may attempt to access and modify the same
data simultaneously, leading to potential conflicts and inconsistencies. Effective
concurrency control mechanisms are required to manage these interactions without
compromising integrity.
2) Site and Network Failures: DDBMS must ensure consistency even in the event of failures
at individual sites or network disruptions. Reliable commit protocols and transaction
management strategies are essential to maintain integrity during such incidents.
3) Enforcement of Integrity Constraints: Ensuring that integrity constraints are enforced
across multiple databases can be complex. A distributed integrity manager may be
necessary to oversee these constraints effectively across different systems.
Techniques for Maintaining Integrity
1) Automated Validation Mechanisms: Implementing automated checks during data entry
and processing can significantly reduce errors and enhance data integrity. These
mechanisms can include input validation, error-checking algorithms, and regular audits.
2) Metadata Management: Maintaining accurate metadata about the structure and
relationships of data across distributed systems helps enforce integrity constraints and
supports effective query processing.
3) Use of Checksums and Hash Functions: These tools can detect data corruption during
transmission or storage, ensuring that any alterations are identified promptly.
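A sketch of this corruption check with Python's standard hashlib:

```python
# Checksum-based corruption detection: the sender attaches a SHA-256
# digest, and the receiver recomputes it to detect any alteration in
# transit or storage.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

payload = b"transfer 100 to account 42"
digest = checksum(payload)  # sent or stored alongside the data

# Intact data verifies; a single changed byte does not.
assert checksum(payload) == digest
corrupted = b"transfer 900 to account 42"
assert checksum(corrupted) != digest  # corruption detected
```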
In conclusion, maintaining data integrity in a Distributed Database Management System is
multifaceted, involving various types of integrity constraints and techniques tailored to address the
unique challenges posed by distributed architectures. By implementing robust mechanisms for
entity, referential, domain, and user-defined integrity, organizations can ensure that their data
remains accurate and reliable throughout its lifecycle, thus supporting effective decision-making
and operational efficiency.
Types of Scalability
Vertical Scalability (Scale-Up):
Involves enhancing the capacity of a single server by adding more resources such as CPU, RAM,
or storage.
Limitations:
1) There is a finite limit to how much a single server can be upgraded.
2) Upgrading can be costly and may lead to downtime.
3) It does not eliminate the risk of a single point of failure, which can impact availability.
Horizontal Scalability (Scale-Out):
Involves adding more servers or nodes to distribute the load across multiple machines.
Advantages:
1) More cost-effective than vertical scaling, especially for large-scale applications.
2) Provides better resilience and fault tolerance since the failure of one node does not affect
the entire system.
3) Allows for seamless handling of increased data volumes and user requests without
significant performance degradation.
Benefits of Horizontal Scalability in DDBMS
1) Cost-Effectiveness: Organizations can expand their database infrastructure by adding
inexpensive commodity hardware rather than investing in high-end servers.
2) Improved Performance: By distributing data and workloads across multiple nodes,
DDBMS can maintain high performance levels even under heavy loads. Techniques such
as sharding (dividing data into smaller, manageable pieces) and replication (duplicating
data across nodes) enhance read performance and fault tolerance.
3) Geographical Distribution: DDBMS can be designed to operate across multiple
locations, allowing for local access to data, which reduces latency for users distributed
globally. This is particularly beneficial for applications like banking and e-commerce,
where quick access to data is critical.
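Sharding, mentioned above, can be sketched as a deterministic mapping from keys to nodes; the node names here are hypothetical:

```python
# Hash-based sharding: a stable hash of the key decides which node
# stores each row, spreading data and load across the cluster.
import hashlib

NODES = ["node_0", "node_1", "node_2"]

def shard_for(key: str) -> str:
    """Map a key deterministically to one node."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# The same key always lands on the same node, so lookups are cheap:
# the client can route a query directly to the owning node.
assert shard_for("customer:42") == shard_for("customer:42")
placement = {k: shard_for(k) for k in ("customer:1", "customer:2", "customer:3")}
print(placement)
```

Note that this simple modulo scheme reshuffles most keys when a node is added; production systems typically use consistent hashing to limit that movement.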
Challenges in Achieving Scalability
While scaling out offers many benefits, it also presents challenges that must be addressed:
1) Data Consistency: Maintaining consistency across distributed nodes can be complex due
to the CAP theorem, which states that during a network partition a distributed system
cannot simultaneously guarantee both consistency and availability. DDBMS must
often choose between strong consistency (which may increase latency) and eventual
consistency (which risks serving stale data).
2) Operational Complexity: Managing a distributed infrastructure requires advanced tools
and skilled personnel to handle the complexities associated with synchronization, resource
allocation, and load balancing.
3) Performance Management: As systems scale, ensuring sustained performance in terms
of query execution speed and transaction handling becomes increasingly challenging.
Please Note: Scalability is a defining feature of Distributed Database Management Systems,
allowing them to efficiently manage growing amounts of data and user requests. By leveraging
horizontal scalability through techniques like sharding and replication, DDBMS can provide high
availability and performance while minimizing costs. However, organizations must navigate
challenges related to data consistency and operational complexity to fully realize the benefits of
scalable distributed databases.
Rollback and Forward Recovery:
Rollback Recovery: This technique involves reverting the system to a previous checkpoint when
an error is detected, effectively undoing any changes made since that point.
Forward Recovery: Instead of reverting, this approach attempts to move the system from an
erroneous state to a correct one without going back to earlier checkpoints.
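Rollback recovery can be sketched with an in-memory state and checkpoint, both simple dicts standing in for real persistent storage:

```python
# Rollback recovery via checkpoints: the system periodically snapshots
# its state, and on error it reverts to the last checkpoint, discarding
# everything done since.
import copy

state = {"x": 1, "y": 2}
checkpoint = None

def take_checkpoint():
    global checkpoint
    checkpoint = copy.deepcopy(state)

def rollback():
    """Revert to the last checkpoint when an error is detected."""
    global state
    state = copy.deepcopy(checkpoint)

take_checkpoint()
state["x"] = 99          # changes after the checkpoint...
state["z"] = 3
rollback()               # ...are discarded on rollback
assert state == {"x": 1, "y": 2}
```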
Commit Protocols: Commit protocols ensure that transactions maintain ACID properties
(Atomicity, Consistency, Isolation, and Durability) even in the event of failures:
Two-Phase Commit Protocol: This protocol ensures that all nodes involved in a transaction either
commit or abort changes collectively. It involves a prepare phase where nodes confirm readiness
and a commit phase where changes are finalized.
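The two phases can be sketched as follows, with participants simulated in-process rather than on separate nodes:

```python
# Two-phase commit: the coordinator first asks every participant to
# prepare, and only if all vote "yes" does it tell them to commit;
# otherwise every participant aborts.

class Participant:
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.state = name, healthy, "init"

    def prepare(self):               # phase 1: vote on readiness
        self.state = "prepared" if self.healthy else "abort"
        return self.healthy

    def commit(self):                # phase 2a: finalize changes
        self.state = "committed"

    def abort(self):                 # phase 2b: undo changes
        self.state = "aborted"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

assert two_phase_commit([Participant("a"), Participant("b")]) == "committed"
assert two_phase_commit([Participant("a"), Participant("b", healthy=False)]) == "aborted"
```

A real implementation must also log each phase durably and handle a coordinator crash between the phases; this sketch shows only the voting logic.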
Transaction Undo and Redo:
Undo: If a transaction fails before it commits, any changes must be undone.
Redo: If a transaction commits but a subsequent failure occurs, its changes must be reapplied
during recovery.
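Undo and redo can be sketched with a simple write log, using a dict as a stand-in for the database:

```python
# Log-based undo and redo: each write records the old and new values,
# so an uncommitted transaction can be undone and a committed one can
# be reapplied after a crash.

db = {"x": 1}
log = []  # log entries: (key, old_value, new_value)

def write(key, value):
    log.append((key, db.get(key), value))
    db[key] = value

def undo():
    """Undo an uncommitted transaction: restore old values, newest first."""
    for key, old, _new in reversed(log):
        if old is None:
            db.pop(key, None)   # the key did not exist before
        else:
            db[key] = old

def redo():
    """Redo a committed transaction after a failure: reapply new values."""
    for key, _old, new in log:
        db[key] = new

write("x", 5)
write("y", 7)
undo()                       # transaction failed before commit
assert db == {"x": 1}
redo()                       # as if it had committed and we recover
assert db == {"x": 5, "y": 7}
```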
Failure and recovery mechanisms in Distributed Database Management Systems are vital for
ensuring data integrity and system reliability. By employing robust logging, check pointing, and
commit protocols, DDBMS can effectively manage various types of failures while maintaining
consistent operations. Understanding these mechanisms helps organizations design resilient
systems capable of quickly recovering from disruptions.