
SAMPLE 1

1. What are the key benefits of using conceptual data models in the early stages of database
development?
Clear Communication:
• Conceptual models are easy to understand for both technical and non-technical stakeholders,
facilitating communication and collaboration.
Requirement Gathering and Validation:
• They help ensure that the database structure aligns with user requirements and business rules,
reducing misunderstandings.
Focus on Business Logic:
• By abstracting technical details, conceptual models focus on business entities and
relationships, ensuring that the database meets organizational goals.
Simplified Design Process:
• They provide a blueprint for designing logical and physical data models, reducing
complexity during implementation.
2. Explain the difference between strong entities and weak entities.

Strong Entity vs. Weak Entity:
• A strong entity always has a primary key, while a weak entity has only a partial key (discriminator).
• A strong entity does not depend on any other entity, whereas a weak entity depends on a strong (owner) entity.
• A strong entity is represented by a single rectangle; a weak entity is represented by a double rectangle.
• A relationship between two strong entities is represented by a single diamond, while the identifying relationship between a strong and a weak entity is represented by a double diamond.
• Strong entities may have either total or partial participation; a weak entity always has total participation in its identifying relationship.

3. What is a derived attribute, and how is it typically represented in a data model?


Derived Attribute:
• An attribute whose value is calculated from other attributes rather than being stored
directly.
• Example: Age derived from Date of Birth.
Representation:
• Typically represented in a data model with a dashed oval connected to the entity.
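As a sketch of how this looks in a relational schema (PostgreSQL syntax assumed; table and column names are illustrative), the derived attribute is computed in a view rather than stored:

CREATE TABLE Person (
    PersonID    INT PRIMARY KEY,
    Name        VARCHAR(100),
    DateOfBirth DATE                -- stored attribute
);

-- Age is derived from DateOfBirth at query time; it is never stored.
CREATE VIEW PersonWithAge AS
SELECT PersonID,
       Name,
       DateOfBirth,
       EXTRACT(YEAR FROM age(current_date, DateOfBirth)) AS Age
FROM Person;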

4. Why is it important to specify cardinality in a relationship?


Defines Constraints: Cardinality specifies the number of instances of one entity that can or must
be associated with instances of another entity.
Ensures Data Integrity: It helps enforce business rules and ensures the database structure aligns
with real-world relationships.

5. How do you represent a multivalued attribute in an E/R diagram?


A multivalued attribute is represented in an E/R diagram as a double oval connected to the entity it
belongs to.
Example: A "Student" entity with a multivalued attribute "Phone Numbers" would show "Phone
Numbers" in a double oval linked to "Student."
6. How does an ER diagram help in visualizing a database design?
1. Simplifies Structure: It provides a clear graphical representation of entities, relationships,
and attributes.
2. Facilitates Understanding: Helps stakeholders visualize the data model and identify potential
issues early.

7. What is a "many-to-many" cardinality constraint, and how is it represented in ER diagrams?


1. Definition: A "many-to-many" constraint means multiple instances of one entity can be
associated with multiple instances of another entity.
2. Representation: Represented using a diamond (relationship) connected to both entities,
with 'M:N' noted on the lines.

8. Provide an example of a weak entity and explain its relationship to a strong entity.
1. Example: A "Dependent" is a weak entity that relies on the "Employee" (strong entity).
2. Relationship: The "Dependent" is identified by a combination of its partial key (e.g.,
Dependent Name) and the primary key of "Employee."
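A minimal SQL sketch of this identifying relationship (names illustrative): the weak entity's primary key combines the owner's key with its partial key, and the dependents disappear with their owner:

CREATE TABLE Employee (
    EmpID INT PRIMARY KEY,
    Name  VARCHAR(100)
);

CREATE TABLE Dependent (
    EmpID         INT,                  -- owner (strong entity) key
    DependentName VARCHAR(100),         -- partial key (discriminator)
    Relationship  VARCHAR(50),
    PRIMARY KEY (EmpID, DependentName), -- full key = owner key + partial key
    FOREIGN KEY (EmpID) REFERENCES Employee(EmpID) ON DELETE CASCADE
);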
9. Explain the representation of a ternary relationship in an EER diagram.
1. Definition: A ternary relationship involves three entities.
2. Representation: Represented with a diamond connected to three entities, and cardinality
constraints specify associations.

10. What is an entity in ER modeling, and how does it differ from an attribute?
1. Entity: Represents a real-world object or concept with unique identification (e.g.,
"Student").
2. Attribute: Represents a property or characteristic of an entity (e.g., "Name" of Student).

11. What is the role of subclasses in the Enhanced ER model?


1. Role: Subclasses allow specialization, where an entity type is divided into subgroups with
additional attributes or relationships.
2. Example: A "Person" entity can have subclasses "Employee" and "Customer."

12. What is the concept of UNION types, and how are they represented in the Enhanced ER
model?
1. Concept: UNION types (categories) combine multiple entity types into a single superclass.
2. Representation: Represented with a circle labeled "U" connected to the participating entity
types and the resulting union type.

SECTION B

1. How do entities in a data model relate to real-world objects or concepts? Provide an example
of an entity in a healthcare system.
Entities in a data model represent real-world objects or concepts that have a distinct identity and
are essential to the system being modeled. These entities encapsulate attributes that describe the
object and can relate to other entities through relationships.
Example in Healthcare:
In a healthcare system, the "Patient" entity represents individuals receiving medical care. Attributes
of this entity might include:
• Patient ID: A unique identifier for each patient.
• Name: The patient's full name.
• Date of Birth: To calculate age or verify identity.
• Medical History: A summary of diagnoses, treatments, or medications.
The "Patient" entity reflects real-world individuals and ensures their information is accurately
represented and accessible in the system.

2. Explain the concept of composite attributes and provide an example. How are composite
attributes useful in representing real-world data more effectively?
Concept of Composite Attributes:
A composite attribute is an attribute that can be divided into smaller, meaningful parts. Each part
represents a sub-component of the main attribute.
Example:
A "Full Address" attribute can be broken into:
• Street Address
• City
• State
• Postal Code
Usefulness:
• Granularity: Enables detailed data representation and retrieval. For instance, you can query
customers in a specific city.
• Flexibility: Allows targeted updates (e.g., changing only the postal code).
• Real-World Alignment: Matches how addresses or similar data are structured in the real
world, improving usability and accuracy.
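In a relational schema the composite attribute is typically flattened into one column per component, which is what makes the component-level queries above possible; a minimal sketch (names illustrative):

CREATE TABLE Customer (
    CustomerID    INT PRIMARY KEY,
    -- "Full Address" decomposed into its components:
    StreetAddress VARCHAR(100),
    City          VARCHAR(50),
    State         VARCHAR(50),
    PostalCode    VARCHAR(10)
);

-- Component-level query: customers in a specific city.
-- SELECT * FROM Customer WHERE City = 'Pune';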

3. Discuss how entities and relationships in an ER model are mapped to tables in a relational
database schema.
1. Mapping Entities:
o Each entity is mapped to a table. The attributes of the entity become the columns of
the table.
o Example: The "Student" entity becomes a table with columns like Student_ID, Name,
and Age.
2. Mapping Relationships:
o One-to-One: Add the foreign key of one table to the other or merge the two tables.
o One-to-Many: Add the primary key of the "one" table as a foreign key in the "many"
table.
o Many-to-Many: Create a new table (junction table) containing the primary keys of
both entities as foreign keys.
3. Primary and Foreign Keys:
o Primary keys ensure each row is unique, while foreign keys establish relationships
between tables.
4. Constraints:
o Enforce data integrity and the rules defined by the ER model.
This process ensures the logical structure in the ER model is accurately reflected in the relational
schema.
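A minimal SQL sketch of these mapping rules, reusing the Student/Course example (names illustrative):

CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    Name      VARCHAR(100),
    Age       INT
);

CREATE TABLE Course (
    CourseID   INT PRIMARY KEY,
    CourseName VARCHAR(100)
);

-- A many-to-many relationship becomes a junction table whose key
-- combines the primary keys of both entities.
CREATE TABLE Enrolls (
    StudentID INT,
    CourseID  INT,
    Grade     CHAR(2),   -- attribute of the relationship itself
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
    FOREIGN KEY (CourseID)  REFERENCES Course(CourseID)
);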

4. How do you represent relationships between multiple entities in an ER diagram, and what
rules govern their representation?
1. Representation in ER Diagram:
o Relationships between entities are represented by diamonds connecting the entities.
o Lines link the diamond to the entities, with cardinality (e.g., 1:1, 1:N, or M:N)
specified near the lines.
2. Rules Governing Representation:
o Entity Participation: Each entity must participate in the relationship with appropriate
cardinality (total or partial participation).
o Attributes of Relationships: If a relationship has attributes (e.g., date of association),
they are represented as ovals connected to the diamond.
o Complex Relationships: For relationships involving more than two entities (e.g.,
ternary relationships), all entities connect to a single diamond.
Example:
A "Doctor" can treat multiple "Patients," and each "Patient" can be treated by multiple "Doctors"
(M:N relationship). This is represented by a "Treats" relationship diamond between the "Doctor"
and "Patient" entities.

5. Describe the relationship between entity types and their instances in a data model. How are
these represented in ER diagrams?
1. Entity Types:
o Represent a class of objects or concepts, defined by their attributes (e.g., "Student").
2. Entity Instances:
o Represent specific occurrences of the entity type (e.g., a student with ID 101 and
name "John").
3. Representation in ER Diagram:
o Entity types are represented by rectangles.
o Attributes (oval shapes) define the properties of the entity type. Instances are not
explicitly shown in ER diagrams but exist as rows in the corresponding relational
table.
4. Example:
o Entity Type: "Employee" with attributes like Employee_ID, Name, and Role.
o Instances: Actual employees like Employee_ID = 001, Name = "Alice", Role =
"Manager".
This distinction helps abstract the structure (entity type) from the data (instances).

6. Describe the concept of a derived attribute and explain how it differs from other types of
attributes in a data model.
1. Derived Attribute:
o An attribute whose value is calculated or derived from other attributes in the
database, rather than being stored directly.
2. Example:
o Age derived from Date of Birth.
3. Difference from Other Attributes:
o Regular attributes store directly entered values (e.g., Name, Date of Birth). Derived
attributes are computed dynamically based on existing data.
4. Representation:
o In ER diagrams, derived attributes are shown with a dashed oval.
This distinction helps optimize storage by avoiding redundancy and improving data consistency.

7. What is a relationship in data modeling? Discuss how relationships are represented in ER diagrams and give an example of a relationship in an employee-management database.
1. Definition:
o A relationship represents an association between two or more entities in a data
model.
2. Representation in ER Diagram:
o Relationships are depicted using diamonds, with lines connecting them to the
associated entities.
o Cardinality (1:1, 1:N, or M:N) is specified near the connecting lines.
3. Example:
o In an employee-management database:
▪ Entities: Employee and Manager.
▪ Relationship: "Manages," representing the association where a Manager
supervises Employees (1:N relationship).
This captures the real-world interaction between employees and their managers.

8. What is the difference between specialization and generalization in ER modeling? Explain with
examples.
1. Specialization:
o The process of dividing a higher-level entity into lower-level entities based on specific
characteristics.
o Example: "Employee" specialized into "Manager" and "Technician."
2. Generalization:
o The process of combining lower-level entities into a higher-level entity.
o Example: "Car" and "Truck" generalized into "Vehicle."
3. Key Difference:
o Specialization moves from general to specific, while generalization moves from
specific to general.
These concepts help model hierarchies and inheritance in databases effectively.
SAMPLE 2
1. What is the key difference between a primary index and a secondary index in a database
system?
• Primary Index: Built on the primary key of a table, and each record in the database is stored
in the order of this index. It ensures the uniqueness of records and defines the physical
order of the data.
• Secondary Index: Built on non-primary key attributes to allow efficient retrieval of data
based on different search criteria. Secondary indexes do not affect the physical storage of
records and can support multiple indexes on different attributes.

2. Consider a B+ tree in which the maximum number of keys in a node is 5. What is the minimum
number of keys in any non-root node?
• Answer: If a node can hold at most 5 keys, the order of the B+ tree (maximum number of
children per node) is 6. A non-root internal node must have at least ⌈6/2⌉ = 3 children,
i.e., at least 2 keys, while a non-root leaf node must hold at least ⌈5/2⌉ = 3 keys.
These minimum-occupancy rules keep the tree balanced and searching efficient. The root is
exempt and may have as few as 2 children.

3. Given a B-tree where each node can contain at most 5 keys, determine the order of the B-tree.
• Answer: The order of a B-tree is defined as the maximum number of children a node can
have. The relationship between the number of keys and the order is:
o Order = max keys + 1.
For a B-tree where each node can hold a maximum of 5 keys, the order of the tree is
6. This means each node can have up to 6 children, ensuring efficient data structure
for searching and insertion operations.

4. What is the difference between a "committed" transaction and a "rolled-back" transaction in the context of transaction control?
• Committed Transaction: A transaction is considered committed when all its changes have
been permanently saved to the database. This means all operations in the transaction are
complete and can no longer be undone. The database is now in a consistent state after the
transaction.
• Rolled-Back Transaction: A transaction is rolled back if an error occurs or the transaction is
canceled before it is committed. All changes made during the transaction are undone, and
the database is restored to its previous consistent state. Rollback ensures that no partial,
inconsistent data is saved.
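A small SQL sketch of both outcomes (standard transaction syntax; the Account table is illustrative):

-- Committed transaction: both updates become permanent together.
BEGIN;
UPDATE Account SET Balance = Balance - 100 WHERE AccountID = 'A';
UPDATE Account SET Balance = Balance + 100 WHERE AccountID = 'B';
COMMIT;

-- Rolled-back transaction: the update is undone and never becomes visible.
BEGIN;
UPDATE Account SET Balance = Balance - 100 WHERE AccountID = 'A';
ROLLBACK;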
5. What is two-phase locking (2PL) in transaction control, and how does it help in ensuring
transaction isolation?
• Two-Phase Locking (2PL): 2PL is a locking protocol used to ensure transaction isolation by
dividing the transaction into two phases:
o Growing Phase: A transaction acquires all the locks it needs but does not release any
locks.
o Shrinking Phase: Once the transaction begins releasing locks, it cannot acquire any
more locks.
• Ensures Isolation: 2PL ensures that no other transaction can access locked data until the
current transaction is completed. This prevents conflicts and ensures that the transactions
are serializable, meaning they behave as if executed one after another.

6. Explain what is meant by serializability in transaction control and why it is important.


• Serializability: Serializability refers to the concept that the result of executing multiple
transactions concurrently should be the same as if the transactions were executed in some
serial order, one after the other.
• Importance: It is critical to ensure consistency and correctness in a database, as concurrent
transactions could lead to anomalies like lost updates, temporary inconsistencies, or dirty
reads. Serializability guarantees that despite parallel execution, the final database state
remains consistent and as if transactions were executed serially.

7. What is the purpose of a lock in transaction control? Explain how locks help in ensuring the
isolation property of transactions.
• Purpose of Locks: Locks are mechanisms used to prevent multiple transactions from
accessing the same data simultaneously in conflicting ways. They ensure that one
transaction does not overwrite or read inconsistent data while another is modifying it.
• Ensuring Isolation: By acquiring locks on data, transactions prevent other transactions from
interfering until they complete. This prevents issues like dirty reads, non-repeatable reads,
and phantom reads. Locks enforce the isolation property, ensuring that each transaction
operates as if it were the only one running.

8. What is the difference between shared lock and exclusive lock in locking protocols?
• Shared Lock: A shared lock allows multiple transactions to read the data simultaneously but
prevents any of them from modifying the data. It is typically used for read-only operations.
• Exclusive Lock: An exclusive lock gives a transaction sole access to the data, preventing
others from reading or modifying it. It is used when a transaction needs to write data,
ensuring no other transaction can access it simultaneously.
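In SQL these two lock modes surface, for example, as row-level locking clauses (PostgreSQL syntax assumed; names illustrative):

BEGIN;
-- Shared lock: other transactions may still read this row, but none may modify it.
SELECT Balance FROM Account WHERE AccountID = 'A' FOR SHARE;

-- Exclusive lock: sole access; concurrent writers (and FOR SHARE readers) must wait.
SELECT Balance FROM Account WHERE AccountID = 'A' FOR UPDATE;
UPDATE Account SET Balance = Balance - 50 WHERE AccountID = 'A';
COMMIT;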

9. What is the primary objective of the timestamp ordering protocol in a database management
system (DBMS)?
• Objective: The timestamp ordering protocol ensures transactions are executed in the order
of their timestamps, without requiring locks. It guarantees serializability by using
timestamps to enforce a total order on transactions.
• How it Works: Each transaction is assigned a unique timestamp. If a transaction attempts to
access a data item, it is allowed only if it does not conflict with a previously executed
transaction. This method avoids deadlock and lock contention by controlling the order of
transaction execution.

10. What is lock point in a transaction schedule?


• Lock Point: The lock point in a transaction schedule is the point at which a transaction
acquires its final lock. It is significant because it determines the order in which transactions
should be serialized. After the lock point, the transaction is guaranteed to be able to
proceed without acquiring further locks. Lock points are used to maintain the serializability
and consistency of the database.

SECTION B

1. What is indexing in databases? Explain the differences between primary, secondary, and
clustered indexes with examples. (2+3)
• Indexing in Databases: Indexing is a technique used to improve the speed of data retrieval
operations on a database table. An index is a data structure that stores a sorted mapping of
keys to data, allowing faster search operations. It works like an index in a book, where you
can quickly locate the page corresponding to a specific topic.
• Primary Index: The primary index is built on the primary key of a table. It ensures unique
identification of records and dictates the physical order of the data. For example, in a
Student table, a primary index on Student_ID helps quickly locate students based on their
IDs.
• Secondary Index: This index is created on non-primary key columns to enable quick lookups
for alternate search criteria. Unlike primary indexes, secondary indexes do not determine
the physical order of rows. For example, in the Student table, a secondary index on
Last_Name allows quick access based on the student’s last name.
• Clustered Index: A clustered index is a type of index where the rows of the table are stored
in the same order as the index. Only one clustered index is allowed per table. For instance,
in the Student table, if the Student_ID is indexed, the actual data is stored in that same
order. This is typically used when the primary key is indexed.
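A short DDL sketch (index syntax varies by DBMS; names illustrative):

-- Primary index: most systems build it implicitly for the primary key.
CREATE TABLE Student (
    Student_ID INT PRIMARY KEY,
    Last_Name  VARCHAR(50)
);

-- Secondary index on a non-key search column.
CREATE INDEX idx_student_lastname ON Student (Last_Name);

-- Clustered index: in SQL Server, for example, the CLUSTERED keyword
-- makes the table rows physically ordered by the indexed column:
-- CREATE CLUSTERED INDEX idx_student_id ON Student (Student_ID);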

2. Write short notes on B-tree and B+ tree? (5)


• B-tree: A B-tree is a balanced tree data structure used in databases and file systems for
indexing. It maintains sorted data and allows searches, insertions, deletions, and sequential
access in logarithmic time. Each node can contain multiple keys and children, reducing the
height of the tree and improving efficiency. B-trees are used for storing large volumes of
data on disk, where every node is stored in a single disk block.
o Properties of B-tree:
▪ Each node can have multiple keys.
▪ Nodes are balanced, with all leaf nodes at the same level.
▪ A B-tree is self-balancing; it adjusts its structure as data is inserted or deleted.
• B+ tree: The B+ tree is an extension of the B-tree that stores all values in the leaf nodes
while internal nodes only store keys for navigation. This makes it more efficient for range
queries since the leaf nodes form a linked list, allowing for easier sequential access. The B+
tree is widely used in databases and file systems.
o Properties of B+ tree:
▪ Internal nodes store only keys, and all actual data records are stored in the leaf
nodes.
▪ Leaf nodes are linked to facilitate fast range queries.
▪ It supports both exact match and range queries efficiently.

3. Consider a B+ tree in which the search key is 12 bytes long, block size is 1024 bytes, and record
pointer is 8 bytes long. The maximum number of keys that can be accommodated in each non-
leaf node of the tree? (5)
• Calculation:
Each key is 12 bytes long, and each record pointer is 8 bytes long. In a non-leaf node, we
need to store both the keys and the pointers.
o Total space required for each key-pointer pair = 12 bytes (key) + 8 bytes (pointer) = 20
bytes.
o The block size is 1024 bytes. Therefore, the number of key-pointer pairs that can be
accommodated in a node is:
Number of key-pointer pairs = 1024 / 20 = 51.2
Since the number of pairs must be a whole number, we round it down to 51.
Thus, the maximum number of keys that can be accommodated in each non-leaf
node is 51. (Strictly, a node with n pointers holds n − 1 keys, so solving
12(n − 1) + 8n ≤ 1024 gives n = 51 pointers and 50 keys; 51 is the answer
under the simpler key-pointer-pair convention used here.)
List the ACID properties. How does each component (ACID) ensure reliable transaction control?
(2+3)
• ACID Properties:
o Atomicity: Ensures that a transaction is treated as a single unit, meaning either all its
operations are completed successfully or none of them are. If a transaction fails, all
changes are rolled back, preserving the integrity of the database.
o Consistency: Guarantees that a transaction brings the database from one valid state
to another, maintaining all integrity constraints, rules, and relationships defined in
the schema.
o Isolation: Ensures that concurrent transactions do not affect each other's execution.
Intermediate states of a transaction are invisible to other transactions, preserving
data consistency.
o Durability: Ensures that once a transaction has been committed, its effects are
permanently recorded in the database, even in the event of system crashes or
failures.
• How ACID Ensures Reliable Transaction Control:
o Atomicity: Prevents partial updates or incomplete transactions, ensuring no
inconsistent data is left in the database.
o Consistency: Ensures that transactions only produce valid data according to the rules
of the database schema.
o Isolation: Prevents transactions from interfering with each other, ensuring that the
final database state is consistent regardless of transaction order.
o Durability: Guarantees that the results of a transaction are permanent, making the
system resilient to hardware failures.

2. Explain the difference between conflict serializability (CSS) and view serializability (VSS). How
do these concepts impact the correctness of a transaction schedule? (2+3)
• Conflict Serializability (CSS):
A schedule is conflict serializable if it can be transformed into a serial schedule by
swapping adjacent non-conflicting operations. Two operations conflict only when they belong
to different transactions, access the same data item, and at least one of them is a write.
The final database state is then the same as for that serial order.
• View Serializability (VSS):
A schedule is view serializable if it is view equivalent to some serial schedule: every
transaction reads the same values (the same initial reads and reads-from relationships),
and the final write on each data item is the same.
• Impact on Correctness:
o CSS gives an easily checkable guarantee (an acyclic precedence graph) that the schedule
is equivalent to a serial execution.
o VSS is more permissive: it additionally admits some schedules with blind writes that are
not conflict serializable, while still guaranteeing the same reads and final state as a
serial execution. Both notions ensure that the database state after the transactions
complete is correct.

3. Consider the following read-write schedule S over three transactions T1, T2, and T3,
where the subscripts in the schedule indicate transaction IDs:
S: r1(z); w1(z); r2(x); r3(y); w3(y); r2(y); w2(x); w2(y);
Which serial schedule(s) is/are conflict equivalent to S? Write down the process
of finding equivalent serial schedules. (2+3)
• Schedule S:
r1(z); w1(z); r2(x); r3(y); w3(y); r2(y); w2(x); w2(y);
• Conflict Analysis (a conflict is a pair of operations from different transactions on the
same data item, at least one of which is a write):
o z is accessed only by T1, so it creates no cross-transaction conflicts.
o x is accessed only by T2 (r2(x) and w2(x) belong to the same transaction), so no conflicts.
o On y: w3(y) precedes r2(y), w3(y) precedes w2(y), and r3(y) precedes w2(y); all three
conflicts require T3 to come before T2.
The precedence graph therefore has the single edge T3 → T2, and T1 is unconstrained.
Every serial order that places T3 before T2 is conflict equivalent to S:
o Serial Schedule 1: T1 → T3 → T2
o Serial Schedule 2: T3 → T1 → T2
o Serial Schedule 3: T3 → T2 → T1
4. Let Ri(z) and Wi(z) denote read and write operations on a data element z
by a transaction Ti, respectively. Consider the schedule S with four transactions:
S: R4(x) R2(x) R3(x) R1(y) W1(y) W2(x) W3(y) R4(y)
Which serial schedule(s) is/are conflict equivalent to S? Write down the process
of finding equivalent serial schedules. (2+3)
• Schedule S:
R4(x) R2(x) R3(x) R1(y) W1(y) W2(x) W3(y) R4(y)
• Conflict Analysis (two reads never conflict; a conflict needs different transactions,
the same item, and at least one write):
o On x: R4(x) precedes W2(x), giving T4 → T2; R3(x) precedes W2(x), giving T3 → T2
(R2(x) and W2(x) belong to the same transaction, so they do not conflict).
o On y: R1(y) and W1(y) precede W3(y), giving T1 → T3; W1(y) precedes R4(y),
giving T1 → T4; W3(y) precedes R4(y), giving T3 → T4.
The precedence graph edges are T1 → T3, T1 → T4, T3 → T4, T3 → T2, and T4 → T2.
The only topological order of this graph is T1, T3, T4, T2, so exactly one serial
schedule is conflict equivalent to S:
o Serial Schedule: T1 → T3 → T4 → T2

5. Let ri(z) and wi(z) denote read and write operations, respectively, on a
data item z by a transaction Ti. Consider the following two schedules:
S1: r1(x) r1(y) r2(x) r2(y) w2(y) w1(x)
S2: r1(x) r2(x) r2(y) w2(y) r1(y) w1(x)
Which schedules are conflict serializable? Mention the reason. (2+3)
• Conflict Analysis for S1: r1(x) r1(y) r2(x) r2(y) w2(y) w1(x)
o r1(y) precedes w2(y): T1 → T2.
o r2(x) precedes w1(x): T2 → T1.
o Conclusion: The precedence graph contains the cycle T1 → T2 → T1, so S1 is NOT
conflict serializable.
• Conflict Analysis for S2: r1(x) r2(x) r2(y) w2(y) r1(y) w1(x)
o w2(y) precedes r1(y): T2 → T1.
o r2(x) precedes w1(x): T2 → T1.
o Conclusion: The only edge is T2 → T1, so the graph is acyclic and S2 IS conflict
serializable, equivalent to the serial schedule T2 → T1.

6. Show that the two-phase locking protocol ensures conflict serializability, and that transactions
can be serialized according to their lock points.
• Two-Phase Locking Protocol: The two-phase locking protocol ensures conflict serializability
by requiring transactions to acquire all necessary locks before releasing any locks (the
"growing" phase), and once they release a lock, they cannot acquire any new locks (the
"shrinking" phase). This protocol prevents cycles in the locking graph, thus ensuring that the
schedule is serializable.
o Why Conflict Serializable:
▪ The two-phase locking protocol guarantees that no transaction can perform
operations that conflict with others after it releases a lock. This prevents race
conditions and ensures that the final schedule is conflict serializable.
▪ Transactions are ordered based on their lock points, and no new locks are
acquired once the first lock is released. This serializes the schedule according
to transaction order.
o Serialization Based on Lock Points:
The lock point of a transaction is the point at which it acquires its last lock
(the end of its growing phase). The equivalent serial order of the transactions is
the order of their lock points, so the schedule is serializable according to that order.
SECTION C

1. Define a B-Tree. Explain its properties and how it differs from a binary search tree. Illustrate
with an example how insertions and deletions are handled in a B-Tree.
Definition:
A B-Tree is a self-balancing search tree in which nodes can have multiple keys and children. It is
designed to efficiently handle large amounts of data stored on disk by minimizing disk I/O
operations.
Properties:
1. All leaf nodes are at the same depth.
2. A node can have at most m − 1 keys and m children, where m is the order of the B-Tree.
3. Each node contains keys in sorted order.
4. A non-leaf node with n keys has n + 1 child pointers.
5. Keys separate the subtrees: all keys in the subtree to the left of a key are smaller,
and all keys in the subtree to its right are greater.
Difference from Binary Search Tree (BST):
• Structure: B-Tree nodes store multiple keys, whereas BST nodes store only one key.
• Balancing: B-Trees are balanced by design, while BSTs can become unbalanced, resulting in
poor performance.
• Disk I/O: B-Trees are optimized for disk I/O by grouping keys and pointers into blocks, while
BSTs are designed for in-memory operations.
Insertion Example in a B-Tree (maximum 3 keys per node, i.e., order 4):
Insert keys: 10, 20, 5, 6, 15, 30, 25.
1. Insert 10 → [10]
2. Insert 20 → [10, 20]
3. Insert 5 → [5, 10, 20]
4. Insert 6 → Split into [10], children: [5, 6] and [20].
5. Insert 15 → [10], children: [5, 6], [15, 20].
6. Insert 30 → [10], children: [5, 6], [15, 20, 30].
7. Insert 25 → Split second child, new root: [10, 20], children: [5, 6], [15], [25, 30].
Deletion Example:
Delete 20:
1. Replace 20 with its inorder predecessor or successor (e.g., 15).
2. Adjust nodes to maintain balance.

2. Explain the structure of a B+ Tree and its advantages over a B-Tree. Discuss the significance of
leaf nodes in a B+ Tree with respect to range queries.
Structure of a B+ Tree:
1. Similar to a B-Tree, but all keys are stored in the leaf nodes.
2. Internal nodes only store keys and pointers to child nodes.
3. Leaf nodes are linked to each other to facilitate range queries.
Advantages over B-Tree:
1. Efficient Range Queries: Linked leaf nodes allow sequential access to keys, making range
queries faster.
2. Separation of Index and Data: Internal nodes act as an index, and leaf nodes store the actual
data, improving search efficiency.
3. Better Disk Utilization: Internal nodes are smaller, reducing disk I/O during searches.
Significance of Leaf Nodes in Range Queries:
Leaf nodes contain all the keys in sorted order and are linked, allowing fast traversal of a specific
range of keys without additional lookups. For example, finding all keys between 10 and 50 involves
traversing the linked list of leaf nodes.

3. A. B-Tree Calculation
• Search key size = 9 bytes
• Disk block size = 512 bytes
• Record pointer = 7 bytes
• Block pointer = 6 bytes
Let m be the order of the B-tree. A B-tree node with m child (block) pointers holds
m − 1 keys and m − 1 record pointers, all of which must fit in one disk block:
(m − 1) × (9 + 7) + m × 6 ≤ 512
16m − 16 + 6m ≤ 512 ⟹ 22m ≤ 528 ⟹ m ≤ 24
Order of the B-tree = 24
B. B+ Tree Calculation:
• Child pointer = 6 bytes
• Search field value = 14 bytes
• Block size = 512 bytes
Each internal node of a B+ tree has m − 1 search keys and m child pointers
(record pointers appear only in the leaf nodes):
(m − 1) × 14 + m × 6 ≤ 512
14m − 14 + 6m ≤ 512 ⟹ 20m ≤ 526 ⟹ m = ⌊26.3⌋ = 26
Order of the B+ tree = 26

4. What is a recoverable schedule? Why is recoverability of schedules desirable? Are there circumstances under which nonrecoverable schedules are desirable?
Recoverable Schedule:
A schedule is recoverable if no transaction commits until all transactions whose changes it depends
on have also committed. This prevents cascading rollbacks.
Desirability:
• Ensures consistent database state by avoiding committing transactions dependent on
uncommitted changes.
• Prevents loss of committed data in case of transaction failure.
Nonrecoverable Schedules:
Nonrecoverable schedules may be desirable in highly time-sensitive systems (e.g., real-time
databases) where performance outweighs the need for strict recoverability, though it risks data
inconsistency.

5. What is a cascadeless schedule? Why is cascadelessness desirable? Are noncascadeless schedules ever desirable?
Cascadeless Schedule:
A schedule is cascadeless if no transaction reads uncommitted data from another transaction. This
avoids cascading rollbacks.
Desirability:
• Simplifies recovery by ensuring that a transaction failure does not propagate to others.
• Improves system stability and reduces complexity in handling rollbacks.
Noncascadeless Schedules:
Noncascadeless schedules may be desirable in performance-critical systems where the overhead of
ensuring cascadelessness (e.g., locking protocols) is too high. However, this increases the risk of
cascading rollbacks.

6. A. Benefits and Disadvantages of Strict Two-Phase Locking (2+3)


• Benefits:
o Ensures serializability and avoids cascading rollbacks.
o Provides a clear point at which all locks are released (transaction commit/abort).
• Disadvantages:
o May lead to deadlocks as transactions hold locks for long periods.
o Decreases concurrency by preventing other transactions from accessing locked
resources.
B. Benefits and Comparison of Rigorous Two-Phase Locking (2+3)
• Benefits:
o All locks (both shared and exclusive) are held until the transaction commits or aborts,
ensuring strict serializability.
o Avoids inconsistencies caused by premature lock releases.
• Comparison:
o Rigorous locking is stricter than basic two-phase locking, as it delays lock release until
commit/abort.
o While it increases reliability, it further reduces concurrency compared to standard
2PL.
SET 3
1. Highlight the benefits of using a DBMS over traditional file systems:
o A DBMS provides data abstraction, data consistency, concurrent access, and data
integrity through constraints.
o It supports ACID transactions, scalability, and easier management of large
datasets compared to traditional file systems.
2. Compare the use of primary and foreign key constraints in ensuring data integrity
(a short DDL sketch follows below):
o A primary key ensures uniqueness and non-null values in a column of a table.
o A foreign key ensures referential integrity by establishing a relationship between
two tables.
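A minimal DDL sketch of both constraints (table names illustrative):

CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY              -- unique and NOT NULL
);

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customer (CustomerID)  -- must match an existing customer
);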
3. Determine the total number of superkeys for R:
Given the functional dependencies, the candidate key is A. A superkey is any subset of
R's 6 attributes that contains A, so the total number of superkeys = 2^(6−1) = 2^5 = 32.
Answer: 32 superkeys.
4. Compare JOIN and Correlated Subquery in SQL (see the sketch below):
o JOIN combines rows from two or more tables based on a related column.
o A correlated subquery is re-evaluated for each row of the outer query, which is
often less efficient on large datasets.
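Both in one hedged sketch (Employee/Department names illustrative):

-- JOIN: one set-oriented pass over both tables.
SELECT E.Name, D.DeptName
FROM Employee E
JOIN Department D ON E.DeptID = D.DeptID;

-- Correlated subquery: the inner query runs once per outer row
-- (employees earning above their department's average salary).
SELECT E1.Name
FROM Employee E1
WHERE E1.Salary > (SELECT AVG(E2.Salary)
                   FROM Employee E2
                   WHERE E2.DeptID = E1.DeptID);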
5. Explain Dense Clustering and Sparse Clustering:
o Dense clustering groups objects with many closely packed data points.
o Sparse clustering forms groups with fewer data points, often suited for dispersed
data.
6. Purpose of the Cartesian Product in Relational Algebra:
o It combines every tuple of one relation with every tuple of another relation,
forming a larger relation.
o Often used as an intermediate step in JOIN operations.
7. Define the degree and cardinality of a relation:
o Degree: Number of attributes (columns) in a relation.
o Cardinality: Number of tuples (rows) in a relation.
8. Explain dense clustering and its significance in database indexing:
o A dense index (dense clustering) stores an index entry for every search-key value
in the data file.
o It speeds up data retrieval, especially lookups and sequential range scans, at the
cost of a larger index.
9. Normalization form of R(X, Y, Z, W):
Given X → Y, Y → Z, Z → W, and W → X, the dependencies form a cycle, so each of X, Y,
Z, and W is a candidate key. Every determinant is therefore a candidate key and every
attribute is prime, so the relation is in BCNF (and hence also in 2NF and 3NF).
10. Describe the augmentation axiom in Armstrong's axioms:
o If X → Y holds, then XZ → YZ holds for any set of attributes Z.
o It allows the same attributes to be appended to both sides of a functional dependency.
11. Explain the purpose of referential integrity in DBMS:
o It ensures consistency between related tables by maintaining valid references in
foreign key relationships.
12. Highlight the problems in the Basic 2-Phase Locking Protocol:
o Cascading rollbacks: a transaction may read data written by an uncommitted
transaction that later aborts.
o Deadlocks: transactions can end up waiting for each other's locks in a cycle.

Section B

1. Analyze the advantages of achieving data independence in database management systems, particularly in terms of system maintenance, adaptability, and overall efficiency.
Answer: Achieving data independence in a DBMS offers several benefits:
• System Maintenance: Changes in data storage or structure can be made without
impacting the applications using the data. This minimizes downtime and reduces the
need for costly application modifications.
• Adaptability: As business needs change, data can be restructured without requiring
changes to the application layer.
• Efficiency: Data independence allows for more optimized storage and retrieval
techniques to be used, enhancing overall system performance while minimizing
redundancy and complexity.

2. Explain the importance of integrity constraints in maintaining accurate and consistent data in a database system. Include examples of different types of integrity constraints.
Answer: Integrity constraints ensure that data in the database is both accurate and
consistent, preventing errors such as data duplication, invalid entries, or incomplete
records.
• Types of Integrity Constraints:
1. Primary Key: Ensures that each record is unique. For example, an employee's ID
number.
2. Foreign Key: Ensures relationships between tables are consistent. For example,
an order in an Orders table must reference an existing customer in the
Customers table.
3. Check Constraint: Ensures that data adheres to specified conditions. For
example, ensuring that an employee's salary is above a certain threshold.
4. Unique Constraint: Ensures all values in a column are distinct. For example,
email addresses in a user registration system.
OR
SQL Commands:
-- (i) Increase the salary of employees in departments where the average salary is less than
50,000 by 10%.
UPDATE Employee
SET Salary = Salary * 1.10
WHERE Department IN (SELECT Department FROM Employee GROUP BY Department
HAVING AVG(Salary) < 50000);

-- (ii) Set the ManagerID to NULL for employees in departments where the total salary
-- exceeds the department's Budget.
UPDATE Employee
SET ManagerID = NULL
WHERE Department IN (SELECT E.Department FROM Employee E GROUP BY E.Department
HAVING SUM(E.Salary) > (SELECT D.Budget FROM Department D WHERE D.DeptID = E.Department));

3. Given relations:
Student (StudentID, StudentName, DeptID)
Course (CourseID, CourseName, DeptID)
Enrolls (StudentID, CourseID, Grade)
Write a relational algebra expression to find the names of students who have enrolled in
all courses offered by the "Computer Science" department and received a grade higher
than "B" in each course. Show every step of your solution, including the intermediate
operations.
Answer (Course stores only DeptID, so the "Computer Science" name is assumed to resolve
to a department id, written CS_DeptID below):
1. First, get the courses offered by the "Computer Science" department:
o CS_Courses ← π CourseID (σ DeptID = CS_DeptID (Course))
2. Then, get the (student, course) pairs with a grade higher than 'B':
o GoodEnrolls ← π StudentID, CourseID (σ Grade > 'B' (Enrolls))
3. To find students who have such an enrollment in all of those courses, use the
division operator:
o π StudentName (Student ⨝ (GoodEnrolls ÷ CS_Courses))
OR
Demonstrate the application of the division operator in relational algebra, using examples
from real-world scenarios. Explain the steps involved and clearly illustrate how the operator
can be used to solve practical problems.
Answer: In relational algebra, the division operator is used to find tuples in one relation that
are related to all tuples in another relation.
Example:
• Scenario: A company wants to find employees who have worked on all projects
assigned to their department.
o Employee (EmpID, DeptID)
o Project (ProjID, DeptID)
o WorksOn (EmpID, ProjID)
Relational Algebra:
• First, find the set of projects assigned to the department of interest:
o π ProjID (σ DeptID = 'Dept_X' (Project))
• Then, use the division operator to find employees who have worked on all those
projects:
o π EmpID (WorksOn ÷ π ProjID (σ DeptID = 'Dept_X' (Project)))
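SQL has no division operator, so the same pattern is usually phrased as a double NOT EXISTS ("there is no Dept_X project this employee has not worked on"); a sketch using the relations above:

SELECT DISTINCT E.EmpID
FROM Employee E
WHERE NOT EXISTS (
    SELECT 1
    FROM Project P
    WHERE P.DeptID = 'Dept_X'
      AND NOT EXISTS (
          SELECT 1
          FROM WorksOn W
          WHERE W.EmpID = E.EmpID
            AND W.ProjID = P.ProjID
      )
);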
4. Explain the concept of "ON DELETE CASCADE" in foreign key constraints with suitable
examples and necessary SQL queries.
Answer: The "ON DELETE CASCADE" action in foreign key constraints automatically deletes
rows in the child table when the corresponding row in the parent table is deleted. It ensures
referential integrity by maintaining the consistency of data when a record is deleted in the
referenced table.
Example:
• Tables: Orders (OrderID, CustomerID) and Customer (CustomerID)
o If a customer is deleted from the Customer table, all their associated orders in
the Orders table should also be deleted.
SQL Query:
ALTER TABLE Orders   -- "Order" is a reserved word in SQL, so the table is named Orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (CustomerID)
REFERENCES Customer (CustomerID)
ON DELETE CASCADE;
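With the constraint in place, a single delete cascades (assuming a customer with CustomerID 42 exists):

DELETE FROM Customer WHERE CustomerID = 42;
-- All rows in Orders with CustomerID = 42 are removed automatically.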
OR
Justify the statement “Every candidate key is a super key but every super key is not a
candidate key” with suitable examples.
Answer:
• Candidate Key: A minimal superkey. It uniquely identifies a record and does not
contain any unnecessary attributes.
• Superkey: A set of attributes that uniquely identifies a record, but may contain extra
attributes beyond what is required.
Example:
• Consider a relation Employee(EmpID, Name, Email):
o Candidate Key: EmpID (since it uniquely identifies each employee).
o Superkey: EmpID, Name (this still uniquely identifies an employee but contains
extra information).

5. Explain why the set difference R1−R2 increases as R1 grows and decreases as R2 grows.
Provide examples.
Answer: The set difference R1 − R2 consists of tuples that are in R1 but not in R2. As the size
of R1 increases, more tuples are available for difference, increasing the result size.
Conversely, as R2 grows, more tuples are removed from R1, decreasing the result size.
Example:
• Let R1 = {1, 2, 3, 4} and R2 = {2, 3}.
R1 − R2 = {1, 4}
• If R2 increases to {1, 2, 3, 4}, R1 − R2 = {} (result size is 0).

6. Given the relation R = (A, B, C, D, E, G) and the functional dependencies
{AB → C, AC → B, AD → E, B → D, BC → A, E → G}, determine if the decomposition {AB, BC,
ABDE, EG} is a lossless join decomposition. Provide justification for your answer.
Answer: The "common attributes contain a key" test applies only to binary decompositions;
with four relations we use the chase (matrix) test. Start with one row per relation, with
'a' symbols in the columns of that relation's attributes and distinct 'b' symbols
elsewhere, then apply the FDs:
• B → D: the AB, BC, and ABDE rows agree on B, so their D symbols are equated to the 'a'
from the ABDE row.
• AD → E: the AB and ABDE rows now agree on A and D, so the AB row gets E = 'a'.
• E → G: the AB, ABDE, and EG rows agree on E, so G becomes 'a' in all of them.
No further rule changes the matrix: the only row with C = 'a' is the BC row, and its A, E,
and G symbols can never be promoted to 'a', so no FD application ever copies C = 'a' into
another row. Since no row becomes all 'a' symbols, the decomposition {AB, BC, ABDE, EG}
is NOT a lossless join decomposition.

7. In a system using the Timestamping Protocol, the following operations are executed by
two transactions, T1 and T2:
T1: Read(A), Write(A), Read(B), Write(B)
T2: Write(A), Read(B), Write(C)
Assume the timestamps of T1 and T2 are:
T1: Timestamp= 1, T2: Timestamp= 2
Using the Timestamping Protocol, check if the above schedule will lead to any conflict
serializability violations. Identify the operation sequence and explain the conflict
resolution mechanism if violations occur.
Answer: Basic timestamp ordering tags each data item with the largest read-timestamp and
write-timestamp of the transactions that have accessed it, and rejects any operation that
arrives "too late":
• A Read(X) by Ti is rejected if TS(Ti) < write-TS(X).
• A Write(X) by Ti is rejected if TS(Ti) < read-TS(X) or TS(Ti) < write-TS(X).
With TS(T1) = 1 and TS(T2) = 2, consider the interleaving in which T1 executes Read(A)
first and T2 then executes Write(A); the write is allowed, since TS(T2) = 2 ≥ read-TS(A) = 1.
When T1 subsequently attempts Write(A), we have TS(T1) = 1 < write-TS(A) = 2, a violation
of timestamp order, so T1's write is rejected and T1 is rolled back and restarted with a
new timestamp. The protocol thus never produces a non-serializable schedule: any operation
that would break the timestamp order is refused and the offending transaction restarts.

SECTION C

1. i). Describe BCNF, its stricter requirements compared to 3NF, and illustrate the
difference with an example.
Answer:
• BCNF (Boyce-Codd Normal Form) is a stricter version of the Third Normal Form (3NF).
A relation is in BCNF if, for every non-trivial functional dependency X → Y, X is a
superkey. Unlike 3NF, there is no exception for dependencies whose right-hand side is
a prime attribute.
Difference between BCNF and 3NF:
o In 3NF, a relation is in 3NF if for every functional dependency X → Y, either:
1. X is a superkey, or
2. Y is a prime attribute (i.e., part of a candidate key).
o In BCNF, the requirement is stricter. Every determinant X in the functional
dependency must be a superkey, with no exceptions.
Example:
• Consider the relation R(A, B, C) with the functional dependencies A → B and B → C,
where A is the candidate key.
o B → C violates BCNF because B is not a superkey (A → B is fine, since A is a key).
So this relation is not in BCNF.
o To achieve BCNF, we can decompose the relation into two:
1. R1(A, B) with A → B (BCNF)
2. R2(B, C) with B → C (BCNF)

1. ii). Consider the following functional dependencies:


F = {P → Q, Q → R, R → S, PQ → S}
G = {P → Q, P → R, P → S}
a. Split the dependencies in F into their canonical form and compute the minimal cover.
Answer:
• Canonical form of a set of functional dependencies requires that:
1. Each functional dependency has a single attribute on the right-hand side.
2. The left side of each functional dependency has no extraneous attributes.
• F = {P → Q, Q → R, R → S, PQ → S} already has single-attribute right-hand sides.
o Check PQ → S for extraneous left-hand attributes: S ∈ P⁺ (P → Q → R → S), so Q is
extraneous and PQ → S reduces to P → S.
Now F = {P → Q, Q → R, R → S, P → S}.
Minimal cover:
• Remove P → S, because it can be derived from the remaining dependencies:
o P → S follows from P → Q, Q → R, and R → S.
Thus, the minimal cover for F is:
• F = {P → Q, Q → R, R → S}
b. Are F and G equivalent? Justify your answer.
Answer:
• F = {P → Q, Q → R, R → S}
• G = {P → Q, P → R, P → S}
Two sets of dependencies are equivalent only if each implies the other, which we check
with attribute closures.
• Does F imply G? Under F, P⁺ = {P, Q, R, S}, so P → Q, P → R, and P → S all follow.
Yes, F implies G.
• Does G imply F? Under G, Q⁺ = {Q} and R⁺ = {R}, so neither Q → R nor R → S can be
derived. No, G does not imply F.
Therefore, F and G are NOT equivalent: F is strictly stronger than G.

2. Demonstrate why the statement "Every view-serializable schedule is also conflict serializable" is false. Justify your explanation with a counterexample of a schedule.
Answer: The statement is false because view serializability is strictly weaker than
conflict serializability: every conflict-serializable schedule is view serializable, but
the converse fails for schedules containing blind writes (writes not preceded by a read
of the same item in the same transaction).
Counterexample:
• Schedule S: r1(A); w2(A); w1(A); w3(A)
o Conflict serializability: r1(A) before w2(A) forces T1 → T2, while w2(A) before w1(A)
forces T2 → T1. The precedence graph has a cycle, so S is NOT conflict serializable.
o View serializability: S is view equivalent to the serial schedule T1 → T2 → T3. In
both, T1 reads the initial value of A, no other reads occur, and T3 performs the final
write of A; the intermediate blind writes are never read.
Thus, S is view serializable but not conflict serializable, so the statement is false.

3. Consider the following relations:


Student (StudentID, Name, Age, Major, TeacherID)
Teacher (TeacherID, Name, Department)
Write SQL queries for the following:
i) Find the names of all students along with their respective teacher's name.
Answer:
SELECT Student.Name AS StudentName, Teacher.Name AS TeacherName
FROM Student
JOIN Teacher ON Student.TeacherID = Teacher.TeacherID;
ii) List the names of teachers who are supervising students in the "Computer Science"
major.
Answer:
SELECT DISTINCT Teacher.Name
FROM Teacher
JOIN Student ON Teacher.TeacherID = Student.TeacherID
WHERE Student.Major = 'Computer Science';
iii) Find the name of the teacher with the highest number of students assigned to them.
Answer:
SELECT Teacher.Name
FROM Teacher
JOIN Student ON Teacher.TeacherID = Student.TeacherID
GROUP BY Teacher.Name
ORDER BY COUNT(Student.StudentID) DESC
LIMIT 1;
iv) Find the average age of students supervised by each teacher.
Answer:
SELECT Teacher.Name, AVG(Student.Age) AS AverageAge
FROM Teacher
JOIN Student ON Teacher.TeacherID = Student.TeacherID
GROUP BY Teacher.Name;
v) List all teachers who supervise students older than 25 years.
Answer:
SELECT DISTINCT Teacher.Name
FROM Teacher
JOIN Student ON Teacher.TeacherID = Student.TeacherID
WHERE Student.Age > 25;

4. Consider a schedule S with transactions T1, T2, and T3. The transactions perform the
following operations:
T1: Transfer Rs. 100 from account A to account B, then transfer Rs. 50 from account A to
account C.
T2: Add Rs. 200 to account A, then transfer Rs. 150 from account B to account C.
T3: Add Rs. 100 to account B, then transfer Rs. 75 from account C to account A.
Prepare a concurrent schedule for these three transactions following the two-phase
locking protocol (2PL).
Answer: Under 2PL every transaction must acquire all the locks it needs (growing phase)
before releasing any lock (shrinking phase). A valid concurrent schedule:
• T1: Lock-X(A); Lock-X(B); Lock-X(C); transfer Rs. 100 from A to B; transfer Rs. 50 from
A to C; Unlock(A); Unlock(B); Unlock(C).
• T2: waits for T1's locks, then Lock-X(A); Lock-X(B); Lock-X(C); add Rs. 200 to A;
transfer Rs. 150 from B to C; Unlock(A); Unlock(B); Unlock(C).
• T3: waits for T2's locks, then Lock-X(B); Lock-X(C); Lock-X(A); add Rs. 100 to B;
transfer Rs. 75 from C to A; Unlock(B); Unlock(C); Unlock(A).
Because each transaction finishes acquiring locks (reaches its lock point) before
releasing any, the schedule is conflict serializable in lock-point order T1 → T2 → T3.
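A minimal sketch of how T1 could be written in SQL, using SELECT ... FOR UPDATE to take all exclusive row locks up front (PostgreSQL/MySQL syntax assumed; the Account table and its rows are illustrative):

BEGIN;
-- Growing phase: lock every row T1 touches before any update.
SELECT Balance FROM Account WHERE AccountID IN ('A', 'B', 'C') FOR UPDATE;
-- Transfer Rs. 100 from A to B.
UPDATE Account SET Balance = Balance - 100 WHERE AccountID = 'A';
UPDATE Account SET Balance = Balance + 100 WHERE AccountID = 'B';
-- Transfer Rs. 50 from A to C.
UPDATE Account SET Balance = Balance - 50 WHERE AccountID = 'A';
UPDATE Account SET Balance = Balance + 50 WHERE AccountID = 'C';
COMMIT; -- shrinking phase: all locks are released together at commit (strict 2PL)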
