DBMS 02
SECTION A:
Data Abstraction:
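Data abstraction is the process of hiding the complexities of the database from users and
providing only essential information. It simplifies the interaction with the database by
separating data into three abstraction levels: the physical level (how data is stored), the
logical level (what data is stored and how it is related), and the view level (the part of the
database a particular user sees).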
Data Independence:
Data independence refers to the ability to modify one level of the database schema without
affecting other levels.
1. Physical Data Independence: Changes in physical storage do not affect the logical schema.
o Example: Moving data from HDD to SSD should not impact applications.
2. Logical Data Independence: Changes in the logical schema do not affect the view schema.
o Example: Adding a new column to a table should not affect user views.
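The logical data independence example can be tried out with a small sketch. It assumes SQLite through Python's built-in sqlite3 module; the table student, the view student_names, and the email column are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")

# The view is what the user sees (view level).
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")
before = conn.execute("SELECT * FROM student_names").fetchall()

# Logical schema change: add a new column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# The view still returns the same rows, so its users are unaffected.
after = conn.execute("SELECT * FROM student_names").fetchall()
print(before == after)
```

Because the view names its columns explicitly, the added column never reaches it.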
2. Differentiate between DDL and DML.
Open-Source DBMS:
1. MySQL
2. PostgreSQL
3. SQLite
4. MariaDB
Commercial DBMS:
1. Oracle Database
2. Microsoft SQL Server
3. IBM Db2
4. SAP HANA
Query Optimization:
Query optimization is the process of improving the efficiency of SQL queries to minimize
execution time and resource usage.
Key Techniques:
Example:
A query without optimization:
Role of Indices:
An index is a data structure that improves the speed of data retrieval operations in a database
at the cost of additional storage.
Advantages:
Types of Indexes:
1. Clustered Index: Stores the table's rows in the order of the index key, so a table can have only one.
2. Non-Clustered Index: Creates a separate structure that points to the data.
3. Unique Index: Ensures that all values in a column are unique.
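The effect of an index on a query can be observed with a small sketch. It assumes SQLite through Python's sqlite3 module; the employees table and the index names are hypothetical. SQLite does not expose the clustered/non-clustered distinction in the same way as some other systems, so only a secondary index and a unique index are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [(f"emp{i}", f"dept{i % 10}") for i in range(1000)],
)

def plan(sql):
    """Return the plan text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

query = "SELECT name FROM employees WHERE dept = 'dept3'"
plan_before = plan(query)  # no index on dept yet: full table scan

# Secondary index: a separate structure that points back to the rows.
conn.execute("CREATE INDEX idx_emp_dept ON employees (dept)")
plan_after = plan(query)   # the plan now mentions idx_emp_dept

# Unique index: rejects duplicate values in the indexed column.
conn.execute("CREATE UNIQUE INDEX idx_emp_name ON employees (name)")
try:
    conn.execute("INSERT INTO employees (name, dept) VALUES ('emp1', 'dept1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_before)
print(plan_after)
print(duplicate_rejected)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only relies on whether the index name appears in it.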
1. Lost Update: Occurs when two transactions update the same data and one overwrites the
other's change.
o Example: T1 and T2 both update a salary value simultaneously.
2. Dirty Read: A transaction reads uncommitted data from another transaction.
o Example: T2 reads data modified by T1, but T1 rolls back.
3. Non-Repeatable Read: A transaction reads the same data twice and gets different
values due to updates by another transaction.
o Example: T1 reads a value; T2 updates it; T1 reads it again and sees a different value.
4. Phantom Read: A transaction reads a set of rows, but another transaction inserts or
deletes rows, altering the result.
o Example: T1 queries a range; T2 adds a new record in the range.
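The lost update problem can be reproduced with a minimal simulation. It is a sketch in plain Python, not a real DBMS: the dictionary db stands in for the database, and the raise amounts 500 and 300 are made-up values.

```python
def run_interleaved(initial_salary):
    """T1 and T2 both read before either writes, with no concurrency control."""
    db = {"salary": initial_salary}
    t1_read = db["salary"]          # T1 reads the salary
    t2_read = db["salary"]          # T2 reads the same value before T1 writes
    db["salary"] = t1_read + 500    # T1 writes its raise
    db["salary"] = t2_read + 300    # T2 overwrites T1's write: T1's update is lost
    return db["salary"]

def run_serial(initial_salary):
    """T1 runs to completion, then T2 runs."""
    db = {"salary": initial_salary}
    db["salary"] = db["salary"] + 500   # T1
    db["salary"] = db["salary"] + 300   # T2 sees T1's result
    return db["salary"]

interleaved = run_interleaved(1000)
serial = run_serial(1000)
print(interleaved, serial)  # 1300 1800
```

The interleaved schedule ends with only T2's raise applied, while the serial schedule keeps both.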
1. System Crash:
o Cause: Hardware or software failures.
o Example: Power outage, operating system crash.
2. Media Failure:
o Cause: Corruption of storage media.
o Example: Hard disk crash.
3. Transaction Failure:
o Cause: Errors within a transaction.
o Example: Division by zero, logical constraints violated.
4. Application Errors:
o Cause: Bugs in the application accessing the database.
o Example: Incorrect SQL logic.
5. Disk Failure:
o Cause: Damage to disk sectors or data corruption.
o Example: Loss of a database file.
Intrusion Detection:
Intrusion detection involves monitoring database systems to identify unauthorized access or
suspicious activities.
Types:
Example Threats:
• SQL injection.
• Privilege escalation attacks.
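As an illustration of the SQL injection threat, the sketch below compares a query built by string concatenation with a parameterized query. It assumes SQLite through Python's sqlite3 module; the users table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the WHERE clause.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is passed as a parameter and treated purely as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```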
Features:
Examples:
Advantages:
SECTION B:
Integrity Constraints:
Integrity constraints are rules enforced in a database to maintain data accuracy, consistency,
and validity. They ensure that the database adheres to its schema and real-world expectations.
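A minimal sketch of how such rules are declared and enforced, assuming SQLite through Python's sqlite3 module. The department and employee tables are hypothetical; the constraints shown are PRIMARY KEY, NOT NULL, UNIQUE, CHECK, and a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary >= 0),
        dept_id INTEGER REFERENCES department (dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 50000, 1)")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "duplicate primary key": violates("INSERT INTO employee VALUES (1, 'Dup', 100, 1)"),
    "missing name": violates("INSERT INTO employee VALUES (2, NULL, 100, 1)"),
    "negative salary": violates("INSERT INTO employee VALUES (3, 'Neg', -5, 1)"),
    "unknown department": violates("INSERT INTO employee VALUES (4, 'Orphan', 100, 99)"),
}
print(results)
```

Each of the four inserts is rejected, so the table keeps only the valid row.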
1. Hashing:
o Hashing uses a hash function to compute the location of data in a hash table.
o Advantages:
▪ Fast retrieval for exact matches.
▪ Efficient for primary key lookups.
o Disadvantages:
▪ Not suitable for range queries.
▪ Collisions require additional handling (e.g., chaining or open addressing).
2. B-Trees:
o A B-tree is a balanced tree structure used for indexing, where data is stored in
sorted order.
o Advantages:
▪ Supports range queries and ordered traversal.
▪ Handles insertions, deletions, and updates efficiently.
o Disadvantages:
▪ Slower than hashing for exact lookups (O(log n) versus O(1) on average).
▪ Requires extra space for internal nodes and pointers.
• Hashing: Ideal for exact match queries (e.g., retrieving records by primary key).
• B-Trees: Suitable for range queries, ordered traversal, and multi-level indexing.
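The difference between the two access patterns can be sketched in Python. A dict serves as the hash index, and a sorted list searched with bisect stands in for the ordered leaf level of a B-tree. It is not a real B-tree, only an illustration of ordered access, and the records are made up.

```python
import bisect

records = [(17, "Meera"), (3, "Arjun"), (42, "Zoya"), (8, "Kabir"), (25, "Nina")]

# Hash index: key -> value, good for exact matches only.
hash_index = {key: name for key, name in records}

# Ordered index: keys kept in sorted order, as in the leaves of a B-tree.
sorted_records = sorted(records)
sorted_keys = [key for key, _ in sorted_records]

def range_query(low, high):
    """Return all records with low <= key <= high using binary search."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_records[start:end]

exact = hash_index[42]          # single lookup by key
in_range = range_query(5, 25)   # needs ordering; a hash table cannot do this directly
print(exact, in_range)
```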
Authentication:
Authentication verifies the identity of a user or system trying to access a database. It ensures
that only legitimate users can log in.
Methods:
Example:
Authorization:
Authorization determines what actions a user or system is allowed to perform after
authentication. It ensures that users can only access resources they have permissions for.
Types of Authorization:
1. Role-Based Access Control (RBAC): Users are assigned roles, and each role has specific
permissions.
o Example: Admin can modify data; users can only read data.
2. Discretionary Access Control (DAC): Permissions are assigned directly to users or roles.
o Example: A user can be given SELECT and UPDATE permissions for a table:
GRANT SELECT, UPDATE ON Employees TO user1;
3. Mandatory Access Control (MAC): Access is controlled based on security levels.
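Role-based access control can be sketched as a lookup from users to roles and from roles to permissions. The role names, user names, and permission sets below are hypothetical, and a real DBMS enforces this inside the server rather than in application code.

```python
# Each role carries a set of permitted actions.
ROLE_PERMISSIONS = {
    "admin": {"SELECT", "INSERT", "UPDATE", "DELETE"},
    "user": {"SELECT"},
}

# Each user is assigned one role.
USER_ROLES = {"alice": "admin", "bob": "user"}

def is_allowed(username, action):
    """Check whether the user's role grants the requested action."""
    role = USER_ROLES.get(username)
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("alice", "UPDATE"))    # True: admin can modify data
print(is_allowed("bob", "UPDATE"))      # False: users can only read
print(is_allowed("mallory", "SELECT"))  # False: unknown user has no role
```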
Importance:
Key Models:
b) Distributed Databases:
A distributed database is a collection of data stored across multiple locations, connected
through a network.
Key Features:
Advantages:
Challenges:
Example: A banking system with branches in multiple cities using a distributed database to
manage customer accounts locally while synchronizing globally.
Features:
Example:
An employee record is stored as an object with properties (Name, Salary) and methods
(CalculateBonus()).
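The employee example can be written as a small Python class to show data and behaviour bundled in one object. The 10 percent default bonus rate is an assumption for illustration; the source does not say how the bonus is calculated.

```python
class Employee:
    """An object holding properties (name, salary) and a method."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def calculate_bonus(self, rate_percent=10):
        # Assumed rule: the bonus is a fixed percentage of salary.
        return self.salary * rate_percent / 100

emp = Employee("Priya", 60000)
bonus = emp.calculate_bonus()
print(emp.name, bonus)
```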
Object-Relational Databases (ORDB):
ORDB extends relational databases by adding object-oriented features.
Features:
Example: PostgreSQL supports JSON and array data types, allowing it to handle
semi-structured data.
SECTION C:
7. What is a Data Model? State and explain various data models with suitable
examples.
A data model is an abstract representation of the structure of data, the operations that can be
performed on the data, and the relationships between different data elements. It is used to
define how data is stored, organized, and manipulated in a database. Data models help in
designing the database structure and provide a framework for managing and interacting with
data. There are several types of data models in Database Management Systems (DBMS),
each with its own approach to data organization and manipulation. Below are the most
commonly used data models:
• Description: The hierarchical data model organizes data in a tree-like structure where each
record has a single parent, and each parent can have multiple children. The structure is a set
of hierarchical relationships between data elements.
• Example: A typical example is the organizational chart of a company. Each department can
have multiple employees, and each employee belongs to a specific department.
• Advantages: Data retrieval is fast if the hierarchy is small.
• Disadvantages: It is rigid and not flexible for representing many-to-many relationships.
• Description: The relational data model organizes data in tables (also called relations), where
each table is made up of rows (tuples) and columns (attributes). Data is related using keys,
such as primary keys and foreign keys.
• Example: A "Student" table where each row represents a student, and columns represent
attributes like student ID, name, and age.
• Advantages: Simple and flexible, supports powerful querying using SQL (Structured Query
Language).
• Disadvantages: Not well-suited for complex hierarchical or network relationships.
• Description: This model is used in NoSQL databases, where data is stored as documents,
typically in JSON, BSON, or XML formats. It is flexible, allowing for semi-structured or
unstructured data.
• Example: A collection of documents where each document represents a product, with
attributes like "name," "price," and "description" stored in a JSON format.
• Advantages: Highly flexible and scalable, suitable for applications with large and diverse
datasets.
• Disadvantages: Queries that combine data across documents can be less efficient than in relational models.
8. Write notes on the following:
a) Relational Algebra
• Description: Relational algebra is a procedural query language used to query and manipulate
relational databases. It consists of a set of operations that take one or more relations as
input and produce a new relation as output, so operations can be composed into larger
expressions.
• Operations:
1. Select (σ): Filters rows based on a condition. Example: σ (Age > 20)(Student).
2. Project (π): Extracts specific columns from a table. Example: π (Name,
Age)(Student).
3. Union (∪): Combines two union-compatible relations (same attributes) into one,
eliminating duplicates. Example: Student ∪ Teacher.
4. Set Difference (−): Returns rows that are in the first relation but not in the second;
the relations must be union-compatible. Example: Student − Graduate.
5. Cartesian Product (×): Combines each row from the first relation with every row
from the second relation. Example: Student × Course.
6. Join: Combines related rows from two relations based on a common attribute.
Example: Student ⨝ Enrollment.
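The operations above can be sketched in Python by treating a relation as a list of dictionaries, one per row. This is an illustrative model, not how a DBMS implements them; the sample relations and the helper names are hypothetical, and the join shown is a natural join on shared attribute names.

```python
def _key(row):
    """Hashable form of a row, used to detect duplicates."""
    return tuple(sorted(row.items()))

def _dedupe(rows):
    """Relations are sets, so drop duplicate rows (order kept for readability)."""
    seen, out = set(), []
    for row in rows:
        if _key(row) not in seen:
            seen.add(_key(row))
            out.append(row)
    return out

def select(relation, predicate):            # σ
    return [row for row in relation if predicate(row)]

def project(relation, attrs):               # π
    return _dedupe({a: row[a] for a in attrs} for row in relation)

def union(r, s):                            # ∪
    return _dedupe(list(r) + list(s))

def difference(r, s):                       # −
    s_keys = {_key(row) for row in s}
    return [row for row in r if _key(row) not in s_keys]

def product(r, s):                          # ×, assumes disjoint attribute names
    return [{**a, **b} for a in r for b in s]

def natural_join(r, s):                     # ⨝ on all shared attributes
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]

student = [
    {"Name": "Asha", "Age": 22},
    {"Name": "Ben", "Age": 19},
    {"Name": "Chen", "Age": 22},
]
graduate = [{"Name": "Ben", "Age": 19}]
enrollment = [{"Name": "Asha", "Course": "DBMS"}, {"Name": "Chen", "Course": "OS"}]

older = select(student, lambda row: row["Age"] > 20)
ages = project(student, ["Age"])
not_graduated = difference(student, graduate)
joined = natural_join(student, enrollment)
print(older, ages, not_graduated, joined, sep="\n")
```

Every function returns a new relation, which is why the results can be fed into one another, for example project(select(student, ...), ["Name"]).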
b) Normal Forms
c) Query Processing
• Description: Query processing is the set of steps that a DBMS follows to execute a query.
This includes parsing, optimization, and execution of the query.
o Parsing: The query is checked for syntax and semantic errors and translated into an internal representation.
o Optimization: The DBMS creates an optimized execution plan that minimizes cost
(such as disk I/O or CPU time).
o Execution: The optimized query plan is executed, and the results are returned to the
user.
d) Join Strategies
• Description: Join strategies are algorithms used by a DBMS to perform joins between two or
more tables. Some common join strategies include:
1. Nested Loop Join: For each row in one table, scan all rows in the other table.
2. Sort-Merge Join: Both tables are sorted on the join column, and the rows are
merged based on the sorted order.
3. Hash Join: A hash table is built for one table, and the other table is probed to find
matching rows.
4. Index Join: Uses indexes to speed up the search for matching rows in a table.
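Two of these strategies, the nested loop join and the hash join, can be sketched in Python over in-memory lists of rows. The tables and the join column sid are hypothetical, and a real DBMS works on disk pages rather than Python lists.

```python
from collections import defaultdict

def nested_loop_join(left, right, key):
    """For each row in left, scan every row in right."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    """Build a hash table on left, then probe it with each row of right."""
    buckets = defaultdict(list)
    for l in left:                          # build phase
        buckets[l[key]].append(l)
    out = []
    for r in right:                         # probe phase
        for l in buckets.get(r[key], []):
            out.append({**l, **r})
    return out

def canonical(rows):
    """Order-independent form, used only to compare the two results."""
    return sorted(tuple(sorted(row.items())) for row in rows)

students = [
    {"sid": 1, "name": "Asha"},
    {"sid": 2, "name": "Ben"},
    {"sid": 3, "name": "Chen"},
]
enrollments = [
    {"sid": 1, "course": "DBMS"},
    {"sid": 3, "course": "OS"},
    {"sid": 1, "course": "OS"},
]

nl_result = nested_loop_join(students, enrollments, "sid")
hj_result = hash_join(students, enrollments, "sid")
print(canonical(nl_result) == canonical(hj_result))
```

Both functions return the same rows, possibly in a different order. The nested loop compares every pair of rows, while the hash join reads each table once.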
ACID stands for Atomicity, Consistency, Isolation, and Durability. These are the four key
properties that ensure reliable transaction processing in a DBMS.
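Atomicity and consistency can be illustrated with a funds transfer that is rolled back as a whole when one step fails. This is a sketch assuming SQLite through Python's sqlite3 module; the account table, the balances, and the transfer function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def balances():
    return conn.execute("SELECT balance FROM account ORDER BY id").fetchall()

def transfer(src, dst, amount):
    """Move money between accounts; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

failed = transfer(1, 2, 500)      # debit would violate the CHECK constraint
after_failed = balances()         # the credit to account 2 was rolled back too
succeeded = transfer(1, 2, 30)
after_success = balances()
print(failed, after_failed, succeeded, after_success)
```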
Lock-based Protocol
• Description: The lock-based protocol requires transactions to acquire locks on data items
before accessing them. A lock prevents other transactions from accessing the same item in a
conflicting way at the same time, thereby preventing conflicts.
• Types of Locks:
1. Shared Lock (S-lock): Allows multiple transactions to read a data item but prevents
them from writing to it.
2. Exclusive Lock (X-lock): Prevents other transactions from both reading and writing
to the data item.
• Two-Phase Locking Protocol (2PL): Requires each transaction to proceed in two phases: the
growing phase, where locks can be acquired but not released, and the shrinking phase, where
locks can be released but no new locks acquired. This guarantees conflict serializability,
although deadlocks can still occur.
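The lock compatibility rules and the two-phase rule can be sketched with a toy lock manager. The class and method names are hypothetical, and the sketch leaves out waiting, lock upgrades, and deadlock handling that a real DBMS needs.

```python
class Transaction:
    def __init__(self, name):
        self.name = name
        self.shrinking = False  # becomes True after the first release


class LockManager:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holder names)

    def acquire(self, txn, item, mode):
        """Try to grant an 'S' or 'X' lock; return True if granted."""
        if txn.shrinking:
            raise RuntimeError("2PL violation: no new locks after a release")
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn.name})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":   # shared locks are compatible
            holders.add(txn.name)
            return True
        return False                           # conflict: caller would have to wait

    def release(self, txn, item):
        """Release a lock the transaction holds and start its shrinking phase."""
        txn.shrinking = True
        _, holders = self.locks[item]
        holders.discard(txn.name)
        if not holders:
            del self.locks[item]


lm = LockManager()
t1, t2 = Transaction("T1"), Transaction("T2")

t1_shared = lm.acquire(t1, "A", "S")      # granted
t2_shared = lm.acquire(t2, "A", "S")      # granted: S is compatible with S
t2_exclusive = lm.acquire(t2, "A", "X")   # refused: A is share-locked

lm.release(t1, "A")                       # T1 enters its shrinking phase
try:
    lm.acquire(t1, "B", "S")
    violation_detected = False
except RuntimeError:
    violation_detected = True

print(t1_shared, t2_shared, t2_exclusive, violation_detected)
```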
Timestamp-based Protocol