Database Assignments
Logical Data Independence refers to the capacity to change the conceptual schema without
affecting the external schema or application programs. This means that changes can be made to
the database structure without impacting how the data is accessed by users or applications.
Physical Data Independence is the ability to change the internal schema without affecting the
conceptual schema. This pertains to how the data is actually stored on disk, including storage
structures and access methods.
In conclusion, logical data independence is generally harder to achieve than physical data
independence because application programs are often closely tied to the logical structure of the
data. Changes at the logical level may require significant alterations to the application code,
whereas changes at the physical level are usually absorbed by the DBMS without affecting the
higher-level schemas.
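The two kinds of independence can be illustrated with a small sketch. It assumes SQLite (through
Python's standard sqlite3 module) and uses a view as the external schema; the view name
StudentDirectory, the index name and the sample row are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the base table.
conn.execute(
    "CREATE TABLE Students ("
    "StudentId INTEGER PRIMARY KEY, StudentName TEXT, CampusAddress TEXT)"
)
conn.execute("INSERT INTO Students VALUES (12, 'Student A', 'Fern Hill')")

# External schema: the application only ever queries this (hypothetical) view.
conn.execute(
    "CREATE VIEW StudentDirectory AS SELECT StudentId, StudentName FROM Students"
)
before = conn.execute("SELECT * FROM StudentDirectory").fetchall()

# Logical change: the conceptual schema gains a column.
# The view, and therefore the application, is unaffected.
conn.execute("ALTER TABLE Students ADD COLUMN Programme TEXT")
after = conn.execute("SELECT * FROM StudentDirectory").fetchall()

# Physical change: a new access path is added.
# Query results are the same; only the way the DBMS may fetch them changes.
conn.execute("CREATE INDEX idx_students_name ON Students (StudentName)")
after_index = conn.execute("SELECT * FROM StudentDirectory").fetchall()

print(before, after, after_index)
```

A change that removed or renamed StudentName would break the view, which is why logical data
independence is the harder of the two to guarantee.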
Question 2
Insertion Anomalies:
These occur when new data cannot be added to the database without the presence of additional,
sometimes unrelated, information. For example, if a new course "INSY124 - Artificial
Intelligence" is added before any student has enrolled in it, the course cannot be recorded
because there is no related student record.
Deletion Anomalies:
These happen when the deletion of one piece of data inadvertently results in the loss of other
valuable data. For instance, if deleting StudentId 12 from the database also removes all the
courses associated with that student, information that should be retained for historical
analysis is lost.
Update Anomalies:
These occur when a single change in the database requires multiple updates to ensure
consistency. For example, if a student's campus address is stored in multiple places and the
student moves to a new location, every instance of the old address must be updated. Failure to
do so can result in different parts of the database having different addresses for the same
student.
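The three anomalies can be reproduced with a minimal sketch. It assumes a single unnormalised
table, here called Enrolment (a hypothetical name), that combines the student, course and grade
data used later in this answer, with (StudentId, CourseId) as the key. SQLite is used through
Python's sqlite3 module, and grades are stored as text. StudentId 13 is used for the deletion
case because that student is the only one enrolled in INSY121.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enrolment (
        StudentId     INTEGER NOT NULL,
        StudentName   TEXT,
        CampusAddress TEXT,
        CourseId      TEXT NOT NULL,
        CourseTitle   TEXT,
        Grade         TEXT,
        PRIMARY KEY (StudentId, CourseId)
    )
""")
rows = [
    (12, "Student A", "Fern Hill", "INSY122", "Database", "1"),
    (12, "Student A", "Fern Hill", "INSY123", "System Analysis", "2.1"),
    (13, "Student C", "Greenside", "INSY121", "Computational Mathematics", "2.2"),
    (14, "Student D", "Chikanga", "INSY122", "Database", "3"),
    (14, "Student D", "Chikanga", "INSY123", "System Analysis", "2.1"),
]
conn.executemany("INSERT INTO Enrolment VALUES (?, ?, ?, ?, ?, ?)", rows)

# Insertion anomaly: a course with no student has no StudentId,
# so the key is incomplete and the row is rejected.
try:
    conn.execute(
        "INSERT INTO Enrolment (CourseId, CourseTitle) "
        "VALUES ('INSY124', 'Artificial Intelligence')"
    )
    insertion_blocked = False
except sqlite3.IntegrityError:
    insertion_blocked = True

# Update anomaly: the address is repeated per enrolment, so changing
# only one row leaves two different addresses for the same student.
conn.execute(
    "UPDATE Enrolment SET CampusAddress = 'New Campus' "
    "WHERE StudentId = 12 AND CourseId = 'INSY122'"
)
addresses = {
    r[0] for r in conn.execute(
        "SELECT CampusAddress FROM Enrolment WHERE StudentId = 12"
    )
}

# Deletion anomaly: removing the only student on INSY121 also removes
# every trace of the course itself.
conn.execute("DELETE FROM Enrolment WHERE StudentId = 13")
course_left = conn.execute(
    "SELECT COUNT(*) FROM Enrolment WHERE CourseId = 'INSY121'"
).fetchone()[0]

print(insertion_blocked, addresses, course_left)
```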
2NF (Second Normal Form)
Rules:
1. The table must first satisfy all the rules of First Normal Form (1NF).
2. Eliminate any partial dependencies by splitting the table into separate tables.
Violation of 2NF
Students Table
StudentId  StudentName  Campus Address  Programme
12         Student A    Fern Hill       Information Systems
13         Student C    Greenside       Mathematics
14         Student D    Chikanga        Information Systems
Courses Table
Course   CourseTitle                InstructorName  Instructor Address
INSY122  Database                   Lecturer A      Block A
INSY123  System Analysis            Lecturer B      Block B
INSY121  Computational Mathematics  Lecturer C      Block C
StudentCourse Table
StudentId  CourseId  Grade
12         INSY122   1
12         INSY123   2.1
13         INSY121   2.2
14         INSY122   3
14         INSY123   2.1
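The 2NF decomposition above could be expressed as a schema along the following lines. This is a
sketch only: it assumes SQLite, writes the column names without spaces (CampusAddress,
InstructorAddress), stores grades as text, and uses CourseId as the key of the Courses table.
None of these choices is fixed by the tables themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students (
        StudentId INTEGER PRIMARY KEY,
        StudentName TEXT, CampusAddress TEXT, Programme TEXT);
    CREATE TABLE Courses (
        CourseId TEXT PRIMARY KEY,
        CourseTitle TEXT, InstructorName TEXT, InstructorAddress TEXT);
    CREATE TABLE StudentCourse (
        StudentId INTEGER NOT NULL REFERENCES Students(StudentId),
        CourseId  TEXT    NOT NULL REFERENCES Courses(CourseId),
        Grade     TEXT,
        PRIMARY KEY (StudentId, CourseId));
""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", [
    (12, "Student A", "Fern Hill", "Information Systems"),
    (13, "Student C", "Greenside", "Mathematics"),
    (14, "Student D", "Chikanga", "Information Systems"),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?, ?)", [
    ("INSY122", "Database", "Lecturer A", "Block A"),
    ("INSY123", "System Analysis", "Lecturer B", "Block B"),
    ("INSY121", "Computational Mathematics", "Lecturer C", "Block C"),
])
conn.executemany("INSERT INTO StudentCourse VALUES (?, ?, ?)", [
    (12, "INSY122", "1"), (12, "INSY123", "2.1"), (13, "INSY121", "2.2"),
    (14, "INSY122", "3"), (14, "INSY123", "2.1"),
])

# A course can now exist before any student enrols in it.
conn.execute(
    "INSERT INTO Courses (CourseId, CourseTitle) "
    "VALUES ('INSY124', 'Artificial Intelligence')"
)
course_count = conn.execute("SELECT COUNT(*) FROM Courses").fetchone()[0]

# A join rebuilds the combined view when it is needed.
titles = [r[0] for r in conn.execute(
    "SELECT c.CourseTitle FROM StudentCourse sc "
    "JOIN Courses c ON c.CourseId = sc.CourseId "
    "WHERE sc.StudentId = 12 ORDER BY c.CourseId"
)]
print(course_count, titles)
```

Each non-key column now depends on the whole key of its own table, which is what removing
partial dependencies means in practice.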
3NF (Third Normal Form)
Rules:
1. Ensure that the table is in 1NF and 2NF.
2. Eliminate any transitive dependencies by splitting the table into separate tables.
Violation of 3NF
There was a transitive dependency in the Courses table: Instructor Address depends on
InstructorName, which in turn depends on Course.
Instructors Table
Instructor Name  Instructor Location
Lecturer A       Block A
Lecturer B       Block B
Lecturer C       Block C
Students Table
StudentId  StudentName  Campus Address  Programme
12         Student A    Fern Hill       Information Systems
13         Student C    Greenside       Mathematics
14         Student D    Chikanga        Information Systems
StudentCourse Table
StudentId  CourseId  Grade
12         INSY122   1
12         INSY123   2.1
13         INSY121   2.2
14         INSY122   3
14         INSY123   2.1
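A transitive dependency can be spotted by testing functional dependencies against sample rows.
The helper below, holds, is a hypothetical function written for this illustration, and the
dictionary keys are the column names written without spaces. Note that a dependency holding in
sample data is only evidence; whether it is a real rule comes from the meaning of the data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True


courses = [
    {"Course": "INSY122", "InstructorName": "Lecturer A", "InstructorAddress": "Block A"},
    {"Course": "INSY123", "InstructorName": "Lecturer B", "InstructorAddress": "Block B"},
    {"Course": "INSY121", "InstructorName": "Lecturer C", "InstructorAddress": "Block C"},
]
student_course = [
    {"StudentId": 12, "CourseId": "INSY122", "Grade": "1"},
    {"StudentId": 12, "CourseId": "INSY123", "Grade": "2.1"},
    {"StudentId": 13, "CourseId": "INSY121", "Grade": "2.2"},
    {"StudentId": 14, "CourseId": "INSY122", "Grade": "3"},
    {"StudentId": 14, "CourseId": "INSY123", "Grade": "2.1"},
]

# Course -> InstructorName and InstructorName -> InstructorAddress together
# give the transitive chain Course -> InstructorName -> InstructorAddress.
key_to_instructor = holds(courses, ["Course"], ["InstructorName"])
instructor_to_address = holds(courses, ["InstructorName"], ["InstructorAddress"])
transitive = key_to_instructor and instructor_to_address

# Grade needs the whole key: StudentId alone does not determine it.
grade_by_student_alone = holds(student_course, ["StudentId"], ["Grade"])
grade_by_full_key = holds(student_course, ["StudentId", "CourseId"], ["Grade"])

print(transitive, grade_by_student_alone, grade_by_full_key)
```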
Question 3
'The CAP theorem states that when a network partition occurs, a distributed database
system must choose between consistency and availability'. Justify this assertion.
The CAP theorem asserts that in a distributed database system, it is impossible to
simultaneously achieve all three of the following guarantees:
Consistency requires that all nodes see the same data at the same time. If a network partition
prevents communication between nodes, maintaining consistency would mean that some nodes
may not be able to respond to read or write requests until the partition is resolved.
Availability ensures that every request receives a response, even if some of the returned data
may be outdated. If a partition occurs, prioritizing availability allows nodes to continue
responding to requests with the most recent version of the data they hold, which may not be
up to date.
Partition Tolerance means that the system continues to operate despite any number of
communication breakdowns between nodes. Since network partitions are a reality in distributed
systems, a system must be tolerant to partitions to function effectively.
Therefore, when a network partition happens, a distributed system must make a trade-off between
consistency and availability. If it chooses consistency, it sacrifices availability, because
some nodes will not be able to respond to requests when they cannot guarantee that their data is
current. For example, a banking service can choose consistency of its transactions over
availability.
If it chooses availability, it sacrifices consistency, because the system will respond with the
most recent data available to each node, which may not be the latest because of the partition.
For example, a social media platform can choose availability of its services over consistency.
In essence, the CAP theorem highlights the inherent trade-offs in distributed system design and
helps system architects make informed decisions based on the specific requirements and
priorities of their applications.
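The trade-off can be made concrete with a toy two-node model. This is a simplified sketch, not
a real replication protocol: the Node class, the "CP" and "AP" mode labels, the Unavailable
exception and the helper functions are all hypothetical names introduced for the illustration.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request in order to protect consistency."""


class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode          # "CP" favours consistency, "AP" availability
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the peer cannot be reached

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable(f"{self.name}: peer unreachable, write refused")
            self.value = value    # AP: accept locally, the peer becomes stale
            return
        self.value = value        # no partition: update both copies
        self.peer.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"{self.name}: cannot confirm the data is current")
        return self.value


def make_cluster(mode):
    a, b = Node("A", mode), Node("B", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, partitioned):
    a.partitioned = b.partitioned = partitioned


# Availability first: both nodes keep answering, but they disagree.
a, b = make_cluster("AP")
a.write(100)
set_partition(a, b, True)
a.write(50)
ap_result = (a.read(), b.read())

# Consistency first: the write is refused rather than risk divergence.
a, b = make_cluster("CP")
a.write(100)
set_partition(a, b, True)
try:
    a.write(50)
    cp_result = "accepted"
except Unavailable:
    cp_result = "refused"

print(ap_result, cp_result)
```

In the AP run node B still reports the old value, which is the stale read a social media
platform might tolerate; in the CP run the request fails, which is the behaviour a banking
service would prefer.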
Question 4
Provide a scenario in which you would choose to use replication instead of a fragmentation
strategy in a distributed environment. In your answer, provide the advantages of
replication over fragmentation. [6]
Scenario:
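Imagine a multinational corporation with offices around the globe, all of which require access
to the same customer database. The database contains sensitive customer information that is
frequently accessed and updated by various departments, including sales, customer service, and
marketing. Because every office needs the complete data set, keeping a full copy at several
sites (replication) suits this case better than splitting the data across sites (fragmentation).
The advantages of replication over fragmentation are: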
Improved Availability: Replication ensures that the database remains available even if one site
fails. If the primary site goes down, users can still access the data from a replicated site.
Enhanced Performance: With replication, query requests can be processed in parallel across
multiple sites. This can lead to faster response times, especially for read-intensive operations.
Disaster Recovery: Replication provides a robust disaster recovery solution. In the event of a
catastrophic failure at one site, the data can be recovered from another replicated site.
Modular Growth: As the demand increases, new sites can be added without significant
reconfiguration of the database. Replication allows for easy scaling of the database system.
Data Integration: Replication can integrate data from two or more existing systems without the
need to combine them, maintaining a unified view of data across different organizational
structures.
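The availability argument can be sketched with a toy model of three sites. Everything here is
hypothetical and simplified: the Site class, the site names, and the rule that a fragmented
record lives on the site given by customer_id modulo the number of sites.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


def replicated_write(sites, customer_id, record):
    # Replication: every site stores a full copy.
    for site in sites:
        site.data[customer_id] = record


def replicated_read(sites, customer_id):
    # Any reachable site can answer.
    for site in sites:
        if site.up:
            return site.data[customer_id]
    raise ConnectionError("no replica is reachable")


def fragmented_write(sites, customer_id, record):
    # Fragmentation: exactly one site owns each record.
    sites[customer_id % len(sites)].data[customer_id] = record


def fragmented_read(sites, customer_id):
    owner = sites[customer_id % len(sites)]
    if not owner.up:
        raise ConnectionError(f"{owner.name} is down and owns this record")
    return owner.data[customer_id]


replicas = [Site(f"site-{i}") for i in range(3)]
fragments = [Site(f"site-{i}") for i in range(3)]
replicated_write(replicas, 7, "Customer 7")
fragmented_write(fragments, 7, "Customer 7")  # 7 % 3 == 1, so site-1 owns it

# The same site fails in both deployments.
replicas[1].up = False
fragments[1].up = False

replica_answer = replicated_read(replicas, 7)
try:
    fragmented_read(fragments, 7)
    fragment_answer = "ok"
except ConnectionError:
    fragment_answer = "unavailable"

print(replica_answer, fragment_answer)
```

The price of replication is visible in replicated_write: every update must reach every copy,
so the benefit is greatest when reads and availability matter more than write cost.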