Advanced Databases Assignment 1 (1)
ASSIGNMENT COVER
REGION: HARARE
MAILING ADDRESS: [email protected]
OVERALL MARK: 88%
MARKER’S NAME: Zvirikuzhe
Question 1
(a) Compare and contrast relational databases with non-relational databases in terms of
structure, scalability, and typical use cases. Provide examples. (5 Marks)
Relational databases organize data in tables with predefined schemas and enforce integrity through
primary and foreign key constraints. They typically scale vertically, requiring hardware upgrades for
increased loads, though some support horizontal scaling via sharding.
Non-relational (NoSQL) databases store semi-structured or unstructured data without a fixed
schema. They are designed for horizontal scalability, distributing data across multiple
servers, which suits high-volume, rapidly changing applications.
Relational Databases: Best for applications where data integrity is critical, such as banking and
ERP systems (e.g., MySQL, PostgreSQL).
Non-relational Databases: Suitable for flexible, scalable applications such as Big Data and real-time web
apps (e.g., MongoDB, Redis).
[5]
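The structural difference described above can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite (from Python's standard library) stands in for a relational system, plain dictionaries stand in for a document store, and the table names, field names, and sample values are hypothetical.

```python
import sqlite3

# Relational side: fixed schema, integrity enforced by a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE accounts (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           balance REAL NOT NULL)"""
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Tariro')")
conn.execute("INSERT INTO accounts (customer_id, balance) VALUES (1, 250.0)")


def insert_orphan_account(connection):
    """Try to add an account for a customer that does not exist."""
    try:
        connection.execute(
            "INSERT INTO accounts (customer_id, balance) VALUES (99, 10.0)"
        )
        return True
    except sqlite3.IntegrityError:
        # The foreign key constraint rejects the row.
        return False


orphan_accepted = insert_orphan_account(conn)

# Non-relational side (hypothetical documents): each record carries its own
# structure, so two documents in one collection may have different fields.
documents = [
    {"_id": 1, "name": "Tariro", "accounts": [{"balance": 250.0}]},
    {"_id": 2, "name": "Farai", "tags": ["new"], "last_login": "2024-01-01"},
]
field_sets = [set(doc) for doc in documents]
```

Here the relational table refuses a row that breaks a relationship, while the document list accepts records of different shapes. That is the trade-off between integrity and flexibility the answer describes.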
Max: 70 words.
(b) Explain the importance of ACID properties in a database system. Discuss how these properties
are managed in a distributed database environment. (5 Marks)
Max: 80 words
(c) What is the role of indexing in query optimization? Provide an example of how improper
indexing can impact database performance. (5 Marks)
Indexing speeds up data retrieval by allowing quick access to specific rows
without scanning the entire table. When a query is executed, the database management system
(DBMS) uses indexes to locate data efficiently, significantly reducing response times.
However, improper indexing, such as creating excessive indexes or indexing rarely queried columns, can
degrade performance. Every index must be updated on each insert, update, or delete, so too many
indexes slow down write operations, while unused indexes waste storage and memory. Effective
indexing therefore balances faster reads against this overhead.
[4]
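As a minimal sketch of the point about avoiding full table scans, the snippet below asks SQLite for its query plan before and after an index is created. SQLite is used only because it ships with Python; the orders table, its columns, and the index name are assumptions made for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan_for(connection, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index on customer_id, the planner has to scan the whole table.
plan_without_index = plan_for(conn, query)

# With the index, the planner can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan_for(conn, query)

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of orders and the second a search using idx_orders_customer. The same index would also have to be maintained on every write to orders, which is the overhead the answer warns about.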
Max: 80 Words
(d) Discuss the importance of data privacy and compliance with regulations such as GDPR in
database management. (5 Marks)
Max: 80 Words
Question 2
(a) Explain the primary characteristics of NoSQL databases. Provide a use case where a
NoSQL database would be more suitable than a relational database, and justify
your choice (10 Marks)
NoSQL databases are tailored for diverse data models, making them ideal for large-scale storage
and real-time web applications. Key features include:
Schema Flexibility: NoSQL databases can store unstructured or semi-structured data
without a fixed schema, adapting easily to changing needs.
Horizontal Scalability: They scale out by adding servers, accommodating increased loads
without relying on upgraded hardware.
High Performance: Optimized for rapid read and write operations, crucial for real-time data
processing.
Variety of Data Models: They include document stores (like MongoDB), key-value stores
(like Redis), and more, allowing choice based on use case.
Eventual Consistency: Many prioritize availability over immediate consistency, ensuring
operation during network partitions.
[10]
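The eventual consistency feature can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how any particular NoSQL product replicates data: two in-memory replicas, a hypothetical pending queue standing in for the network, and version numbers used to order updates.

```python
class Replica:
    """A toy key-value replica that keeps the highest-versioned value per key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


primary, secondary = Replica(), Replica()
pending = []  # updates accepted by the primary but not yet delivered


def write(key, version, value):
    """Accept the write immediately and replicate it later."""
    primary.apply(key, version, value)
    pending.append((key, version, value))


def sync():
    """Deliver all queued updates to the secondary replica."""
    while pending:
        secondary.apply(*pending.pop(0))


write("cart:7", 1, ["book"])
write("cart:7", 2, ["book", "pen"])

stale_read = secondary.read("cart:7")  # replication has not happened yet
sync()
fresh_read = secondary.read("cart:7")  # replicas have now converged
```

The write is acknowledged before the secondary has seen it, so a read from the secondary can briefly return old data. Once the queued updates are delivered, both replicas agree.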
1. Query Optimization: ML analyzes historical query patterns to suggest efficient execution plans
and indexing strategies.
2. Anomaly Detection: ML monitors database activity to identify unusual patterns, alerting for
potential security breaches or failures.
3. Automated Indexing: ML automates index creation by analyzing workloads, reducing manual
efforts for administrators.
4. Predictive Maintenance: ML predicts hardware failures or performance issues, allowing proactive
maintenance.
5. Data Classification: ML classifies and clusters data, improving organization and retrieval speeds.
6. Enhanced Security: ML flags deviations in user behavior to detect unauthorized access.
7. Intelligent Caching: ML optimizes caching based on access patterns, improving response times.
[10]
(b) Explore the emerging trend of integrating machine learning with database systems. How can
machine learning enhance database performance?
Integrating machine learning (ML) with database systems enhances performance and efficiency in
several ways:
1. Query Optimization: ML analyzes historical data to predict efficient execution plans,
reducing query latency.
2. Intelligent Indexing: Automated indexing based on access patterns optimizes response
times without manual effort.
3. Anomaly Detection: ML detects unusual patterns in transaction logs, alerting for potential
security breaches or system issues.
4. Predictive Analytics: ML forecasts future trends, aiding proactive decision-making.
5. Automated Data Management: Streamlines tasks like load balancing through dynamic
resource allocation.
6. Enhanced Data Retrieval: Natural language processing allows intuitive querying for
non-technical users.
7. Performance Tuning: Continuous monitoring suggests adjustments for optimal resource
usage.
Overall, ML integration leads to smarter, more adaptive database management.
Max: 120 Words
(c) Discuss the Two-Phase Commit Protocol and its role in ensuring data consistency.
(10 Marks)
The Two-Phase Commit (2PC) protocol is a distributed algorithm that ensures all participants in a
transaction either commit or abort, maintaining data consistency across multiple databases or systems. It
consists of two phases:
Prepare Phase: The coordinator sends a “prepare” request to all participant nodes, which perform
local checks to determine if they can commit. Each participant replies with either a “vote commit”
or “vote abort.”
Commit Phase: The coordinator evaluates the votes. If all participants vote to commit, a “commit”
command is sent. If any participant votes to abort, an “abort” command is issued to all participants.
While 2PC ensures atomicity and consistency, it faces challenges: blocking (participants that have
voted to commit must hold their locks until the coordinator's decision arrives, so a coordinator or
network failure can leave them waiting), a single point of failure at the coordinator, and performance
overhead from the two rounds of messages each transaction requires.
[6]
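The two phases described above can be sketched as a small in-memory simulation. This is a simplified illustration, not a production protocol: the Participant class, its can_commit flag, and the participant names are hypothetical, and there is no networking, logging, or timeout handling.

```python
class Participant:
    """A toy participant that votes in the prepare phase and obeys the decision."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumed result of the local checks
        self.state = "init"

    def prepare(self):
        # Phase 1: vote commit (True) or vote abort (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Act as the coordinator: collect every vote, then broadcast one decision."""
    # Phase 1 (prepare): ask every participant and gather all votes.
    votes = [p.prepare() for p in participants]

    # Phase 2 (commit): commit only if every vote was "commit".
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


outcome_ok = two_phase_commit([Participant("orders"), Participant("payments")])

failing = [Participant("orders"), Participant("payments", can_commit=False)]
outcome_fail = two_phase_commit(failing)
```

In the second run a single "vote abort" from payments forces orders to abort as well, which is the all-or-nothing behaviour that gives 2PC its atomicity. The sketch leaves out coordinator failure, which is where the blocking problem arises in practice.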