DBMS
• Schema Definition: Defining the logical structure of the database, including tables, indexes,
views, and relationships.
• Data Modeling: Creating data models that outline the database's structure and ensure it
meets business requirements.
• Physical Database Design: Determining how data will be stored, indexed, and managed on
the storage media to ensure efficient data retrieval and storage.
• Access Control: Managing user permissions and roles to ensure that only authorized users
have access to specific data and functionalities.
• Audit Trails: Monitoring and logging database activity to track access and modifications,
ensuring accountability and compliance with regulations.
• Backup and Recovery: Implementing regular backup routines to ensure data can be restored
in case of hardware failure, data corruption, or other disasters.
• Alert Management: Setting up alerts for specific events or thresholds, ensuring timely
intervention when issues arise.
• Software Updates: Keeping the DBMS software up-to-date by applying patches, updates,
and new releases to ensure the system is secure and running optimally.
• Compatibility Testing: Ensuring that updates and patches are compatible with existing
applications and systems.
• Planning and Execution: Planning and executing database upgrades with minimal disruption
to operations, including thorough testing and fallback plans.
What is a DBMS?
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to
store and retrieve database information that is both convenient and efficient. A DBMS comprises
the database itself along with the software used to operate on it, performing operations such as
insertion, retrieval, updating, and deletion of data.
The entity-relationship data model perceives the real world as consisting of basic objects,
called entities, and relationships among these objects. It was developed to facilitate database design.
E-R Diagram
Entities: Objects or concepts that can have data stored about them. Entities are typically
represented by rectangles.
Attributes: Properties or details that describe an entity. Attributes are usually represented by ovals
connected to their respective entity rectangles.
Relationships: Associations among two or more entities. Relationships are typically represented by
diamonds connecting the related entity rectangles.
Cardinality: Specifies the number of instances of one entity that can be associated with instances of
another entity. Common cardinalities include one-to-one, one-to-many, and many-to-many.
What is the use of relational query language in DBMS? Use examples to explain tuple and
domain relational calculus.
1. Data Retrieval: Query languages allow users to extract specific data from the database using
precise conditions.
2. Data Manipulation: They enable users to insert, update, and delete records in the database.
3. Data Definition: They help define the structure of the database, including tables, views, and
indexes.
4. Data Control: They provide commands to control access to the data, ensuring security and
integrity.
Tuple Relational Calculus is a non-procedural query language that specifies what data to retrieve
rather than how to retrieve it. In TRC, queries are expressed using tuples (rows).
Example:
To find the names of all students majoring in 'Computer Science' using TRC:
• T is a tuple variable that ranges over all tuples in the Students relation.
• The query returns the Name attribute of all tuples T where the Major attribute is 'Computer
Science'.
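The corresponding TRC expression, as a hedged sketch assuming a relation Students(StudentID, Name, Age, Major) as in the DRC example below:
  { T.Name | T ∈ Students ∧ T.Major = 'Computer Science' }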
Domain Relational Calculus is another non-procedural query language that uses domain variables,
which take values from an attribute's domain, rather than entire tuples.
Example:
Using the same Students table, to find the names of all students majoring in 'Computer Science'
using DRC:
• The condition specifies that for the returned Name, there must exist a StudentID and Age
such that the combination (StudentID, Name, Age, 'Computer Science') is a tuple in the
Students relation.
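A matching DRC sketch, using domain variables n (Name), s (StudentID), and a (Age):
  { <n> | ∃s ∃a ( <s, n, a, 'Computer Science'> ∈ Students ) }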
LOSSLESS DECOMPOSITION:
A decomposition of a relation schema R into R1 and R2 is lossless if, for every relation instance r of
R that satisfies the FDs in F, the natural join of the projections of r onto R1 and R2 reproduces r
exactly: ΠR1(r) ⋈ ΠR2(r) = r.
DEPENDENCY PRESERVATION:
A decomposition is dependency preserving if every functional dependency in F can be checked
within a single decomposed relation, without computing joins.
Example:
Let R(A,B,C) and F = {A→B}. Then the decomposition of R into R1(A,B) and R2(A,C) is lossless,
because R1 ∩ R2 = {A} and A→B makes A a key of R1; it is also dependency preserving, since
A→B can be checked in R1 alone.
Partial dependency:
Let K be a candidate key. If X is a proper subset of K and F |= X→A for a non-prime attribute A,
then A is said to be partially dependent on K.
NORMALIZATION
Redundant data causes wastage of storage space and an increase in the total size of the data stored.
Relations are normalized so that when relations in a database are altered during the lifetime of the
database, update, insertion, and deletion anomalies do not arise.
1. 1NF
Every attribute must hold only atomic (indivisible) values; repeating groups are not allowed.
2. 2NF
A non-prime attribute cannot depend on a proper part of a candidate key, i.e., no partial
dependencies (a 2NF decomposition is sketched after this list).
3. 3NF
A non-prime attribute must not be derived transitively from a candidate key through another
non-prime attribute (no transitive dependencies).
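A minimal sketch of removing a partial dependency to reach 2NF (hypothetical tables):
  -- OrderLines(OrderID, ProductID, ProductName, Qty) violates 2NF: ProductName
  -- depends only on ProductID, a proper part of the key {OrderID, ProductID}.
  CREATE TABLE Products (
    ProductID   INT PRIMARY KEY,
    ProductName VARCHAR(100)
  );
  CREATE TABLE OrderLines (
    OrderID   INT,
    ProductID INT REFERENCES Products(ProductID),
    Qty       INT,
    PRIMARY KEY (OrderID, ProductID)
  );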
Schedules: sequences that indicate the chronological order in which instructions of
concurrent transactions are executed. A schedule for a set of transactions must consist of all
the instructions of those transactions and must preserve the order in which the instructions
appear in each individual transaction.
Serializability:
• Basic Assumption – Each transaction preserves database consistency. Thus, serial
execution of a set of transactions also preserves database consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial
schedule. Different forms of schedule equivalence give rise to the notions of:
o conflict serializability
o view serializability
• We ignore operations other than read and write instructions, and we assume that
transactions may perform arbitrary computations on data in local buffers
between reads and writes. Our simplified schedules consist of only read and
write instructions.
Conflict Serializability
Instructions li and lj of transactions Ti and Tj respectively conflict if and only if
there exists some item Q accessed by both li and lj, and at least one of these
instructions wrote Q.
Intuitively, a conflict between li and lj forces a (logical) temporal order between
them. If li and lj are consecutive in a schedule and they do not conflict, their
results would remain the same even if they had been interchanged in the schedule.
A schedule is conflict serializable if it can be transformed into a serial schedule by a
series of swaps of non-conflicting adjacent instructions.
Types of Locks
1. Shared (S) Lock:
• Allows multiple transactions to read a data item but not modify it.
• Other transactions can also acquire a shared lock on the same data item.
2. Exclusive (X) Lock:
• Allows a single transaction to both read and modify a data item.
• Prevents other transactions from acquiring any type of lock on the data item.
Locking Protocols
Locking protocols define the rules for acquiring and releasing locks to ensure transaction isolation
and serializability. Common protocols include:
1. Two-Phase Locking (2PL):
• Growing Phase: A transaction can acquire locks but not release any lock.
• Shrinking Phase: A transaction can release locks but cannot acquire any new lock.
2. Strict Two-Phase Locking (sketched after this list):
• All exclusive locks are held until the transaction commits or aborts.
3. Rigorous Two-Phase Locking:
• Both shared and exclusive locks are held until the transaction commits or aborts.
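A hedged sketch of strict two-phase locking in SQL (PostgreSQL/MySQL-style row locking; the accounts table and its columns are hypothetical):
  BEGIN;
  -- Growing phase: acquire an exclusive row lock.
  SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
  COMMIT; -- all locks are released only here, so there is no early shrinking phase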
Timestamp-Based Protocols: Key Concepts
1. Timestamps: Unique identifiers assigned to transactions. These are usually generated based
on the system clock or a logical counter, ensuring that each transaction has a distinct
timestamp.
2. Read Timestamp (RTS): For each data item, this is the timestamp of the last transaction that
successfully read the item.
3. Write Timestamp (WTS): For each data item, this is the timestamp of the last transaction
that successfully wrote the item. (The read/write rules built on RTS and WTS are sketched
after this list.)
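A sketch of the standard timestamp-ordering rules built on these values, where Ti is a transaction and Q a data item:
  read(Q) by Ti: if TS(Ti) < WTS(Q), reject the read and roll back Ti; otherwise allow it and set RTS(Q) := max(RTS(Q), TS(Ti)).
  write(Q) by Ti: if TS(Ti) < RTS(Q) or TS(Ti) < WTS(Q), reject the write and roll back Ti; otherwise allow it and set WTS(Q) := TS(Ti).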
Role Based Access Control: Role-Based Access Control (RBAC) is a security model used in
Database Management Systems (DBMS) to control access to data based on the roles of individual
users within an organization. In RBAC, permissions are assigned to roles, and users are then assigned
to specific roles. This simplifies the process of managing user permissions and access rights by
grouping users with similar responsibilities into roles and granting permissions to those roles.
1. Roles: Represents a set of permissions that are associated with a particular job function or
responsibility within the organization. Examples include "Manager," "Sales Representative,"
or "Administrator."
2. Permissions: Actions that users are allowed or denied to perform on resources within the
database. Permissions can include read, write, update, delete, execute, etc.
3. Users: Individuals who interact with the system. Users are assigned to one or more roles
based on their job responsibilities.
RBAC Model:
The RBAC model consists of three primary components:
1. Role Assignment: Users are assigned to roles based on their job responsibilities. A user can
be assigned to multiple roles, and a role can have multiple users assigned to it.
2. Role Permission Assignment: Permissions are assigned to roles based on the tasks or
operations associated with each role. Each role is associated with a set of permissions that
define the actions users assigned to that role can perform.
3. User-Role Activation: Users inherit the permissions associated with the roles to which they
are assigned. When a user activates a role, they gain the permissions associated with that
role for the duration of their session.
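A minimal sketch of RBAC using standard SQL role statements (role, user, and table names are hypothetical):
  CREATE ROLE sales_rep;
  -- Role-permission assignment: grant permissions to the role, not to users.
  GRANT SELECT, INSERT, UPDATE ON orders TO sales_rep;
  -- Role assignment: attach users to the role.
  GRANT sales_rep TO alice;
  GRANT sales_rep TO bob;
  -- Revoking a permission from the role changes access for all its users at once.
  REVOKE UPDATE ON orders FROM sales_rep;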
Data Warehousing : A data warehouse is a centralized repository designed to store large
volumes of structured data from multiple sources. The primary purpose of a data warehouse is to
facilitate complex queries and analysis, providing a unified view of the data across the organization.
Key characteristics:
1. Integration: Combines data from various sources into a single, coherent data store.
2. Subject-Oriented: Organized around key subjects or business areas, such as sales, finance, or
customer data.
Key components:
1. ETL (Extract, Transform, Load): Processes that extract data from operational databases,
transform it into a suitable format, and load it into the data warehouse.
2. Data Storage: Optimized for query performance, often organized into fact tables (containing
measures) and dimension tables (containing context for measures); a star-schema sketch
follows this list.
3. Metadata: Data about the data, such as definitions, source information, and usage statistics.
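A minimal star-schema sketch (hypothetical retail tables; standard SQL syntax):
  -- Dimension table: descriptive context for the measures.
  CREATE TABLE dim_store (
    store_id INT PRIMARY KEY,
    city     VARCHAR(50)
  );
  -- Fact table: numeric measures keyed by dimension references.
  CREATE TABLE fact_sales (
    sale_date DATE,
    store_id  INT REFERENCES dim_store(store_id),
    revenue   DECIMAL(10,2)
  );
  -- Typical warehouse query: aggregate a measure by a dimension attribute.
  SELECT d.city, SUM(f.revenue) AS total_revenue
  FROM fact_sales f JOIN dim_store d ON f.store_id = d.store_id
  GROUP BY d.city;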
Example Use Case: A retail company might use a data warehouse to store and analyze sales data
from various stores. This helps in generating reports on sales trends, inventory management, and
customer purchasing patterns.
Data Mining : Data mining involves analyzing large datasets to discover patterns, correlations,
trends, and insights that are not immediately obvious. It utilizes statistical, mathematical, and
machine learning techniques to extract valuable information from data.
1. Association Rule Learning: Discovering interesting relations between variables, e.g., market
basket analysis.
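As an illustration of association analysis (hypothetical basket_items table), the co-occurrence count underlying market basket analysis can start as a simple self-join:
  SELECT a.product_id AS p1, b.product_id AS p2, COUNT(*) AS times_together
  FROM basket_items a
  JOIN basket_items b
    ON a.basket_id = b.basket_id AND a.product_id < b.product_id
  GROUP BY a.product_id, b.product_id
  ORDER BY times_together DESC;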
Example Use Case: A telecom company might use data mining to predict customer churn by
analyzing usage patterns, call records, and customer service interactions.
Concurrency Control : Concurrency control in Database Management Systems (DBMS) is the
process of managing simultaneous access to shared data by multiple users or transactions. It ensures
that transactions execute correctly and produce consistent results despite the concurrent execution
of multiple operations. Concurrency control mechanisms prevent conflicts, such as lost updates and
inconsistent reads, which can occur when multiple transactions access and modify the same data
concurrently.
1. Transaction: A logical unit of work that consists of one or more database operations (e.g.,
read, write, update) that must be executed as a single, indivisible unit.
2. Transaction Isolation: Each transaction should appear to execute in isolation from other
transactions, as if it were the only transaction running on the system. This ensures that the
effects of one transaction are not visible to other transactions until the transaction is
committed.
Data Consistency: Ensures that transactions execute correctly and produce consistent results, even
in a multi-user environment.
Isolation: Prevents interference between transactions, ensuring that each transaction operates on a
consistent snapshot of the database.
SQL Injection: SQL injection is a type of security vulnerability that occurs when an attacker
injects malicious SQL code into input fields or parameters of a web application that interacts with a
database. This allows the attacker to manipulate the database and execute unauthorized SQL
queries. SQL injection attacks can lead to data breaches, data loss, unauthorized access to sensitive
information, and even complete compromise of the affected system.
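An illustration (hypothetical users table and login query, with user input spliced directly into the SQL string):
  -- Intended query, where <input> is substituted verbatim:
  SELECT * FROM users WHERE username = 'alice' AND password = '<input>';
  -- Entering   ' OR '1'='1   as the password produces:
  SELECT * FROM users WHERE username = 'alice' AND password = '' OR '1'='1';
  -- This evaluates as (username = 'alice' AND password = '') OR ('1'='1'),
  -- which is always true and bypasses authentication. The standard defense is
  -- parameterized queries/prepared statements instead of string concatenation.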
Armstrong’s Axioms: Armstrong's Axioms are a set of inference rules that help determine
functional dependencies in a relational database schema. These axioms were introduced by William
W. Armstrong in the context of relational database theory and are fundamental to the process of
normalization and database design.
Armstrong's Axioms:
1. Reflexivity: If Y ⊆ X, then X → Y.
2. Augmentation: If X → Y, then XZ → YZ for any set of attributes Z.
3. Transitivity: If X → Y and Y → Z, then X → Z. For example, from A → B and B → C we can
infer A → C.
1. Atomicity
• Ensures that a transaction is treated as a single, indivisible unit: either all of its
operations take effect or none do.
• If any part of the transaction fails, the entire transaction is rolled back, and the
database remains in its original state.
2. Consistency
• Ensures that a transaction brings the database from one valid state to another valid
state, maintaining all predefined rules, such as integrity constraints.
• Any data written to the database must be valid according to all defined rules,
including constraints, cascades, and triggers.
• Example: Inserting a new record in a table must adhere to all constraints such as
primary keys, foreign keys, and unique constraints. If a transaction violates these
constraints, it is rolled back to maintain database consistency.
3. Isolation
• Ensures that the operations of a transaction are isolated from the operations of
other concurrent transactions. Intermediate states of a transaction are invisible to
other transactions.
• Guarantee: Concurrent transactions do not interfere with each other. The outcome
should be the same as if the transactions were executed serially, one after the other.
• If two transactions are simultaneously updating the same set of records, isolation
ensures that each transaction's operations are not affected by the other. This
prevents scenarios such as dirty reads, non-repeatable reads, and phantom reads.
4. Durability
• Ensures that once a transaction has been committed, it will remain so, even in the
event of a system failure.
• Changes made by committed transactions are permanent and must survive system
crashes and hardware failures.
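A minimal sketch of these properties in SQL (hypothetical accounts table):
  -- Transfer 100 from account 1 to account 2 as one atomic unit.
  BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
  COMMIT; -- both updates become permanent together (atomicity + durability)
  -- Had either update failed, a ROLLBACK would restore the original state, so
  -- other transactions never see a half-finished transfer (isolation).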
Update, insertion, and deletion anomalies occur due to redundancy and inconsistency in a
database's design. Here's why each anomaly happens:
1. Update Anomaly
• This anomaly occurs when an update is made in one table but not in all related tables, which can
result in inconsistencies in the DB. This happens because the DB is not properly normalized.
• For example, if you update a customer's address in one table but not in another table where their
orders are stored, it creates a mismatch.
2. Deletion Anomaly
• This arises when deleting data unintentionally removes other necessary data.
• For example, if deleting a product from a table also removes information about customers who
purchased that product, it's a deletion anomaly, because customer data should be independent of
product data.
3. Insertion Anomaly
• It happens when you can't add data into the DB without adding additional, unrelated data.
• For example, if you can't add a new customer record without them placing an order, it's an
insertion anomaly, because a customer should be able to exist in the DB without placing an order.
Data Mining: Data mining in the context of a Database Management System (DBMS) refers to the
process of discovering patterns, correlations, anomalies, and useful information from large sets of
data stored in databases. It involves using sophisticated algorithms and statistical methods to extract
hidden knowledge that can be used for various applications such as decision making, prediction, and
data analysis.
1. Predictive Models: Creating models that can predict future trends based on historical data.
2. Clustering: Grouping a set of objects in such a way that objects in the same group are more
similar to each other than to those in other groups.
3. Data Integration: Combining data from multiple sources into a coherent data store.
Candidate Key: A candidate key is a set of one or more columns that can uniquely identify each row
in a table. A table can have multiple candidate keys, but only one can be chosen as the primary key.
EmpID  SSN          EmpName  DeptID
1      123-45-6789  Alice    10
2      987-65-4321  Bob      20
3      111-22-3333  Charlie  10
In this table, both EmpID and SSN can be candidate keys because each can uniquely identify a row.
Super Key: A super key is any combination of columns that uniquely identifies a row in a table. It can
contain additional columns that are not necessary for unique identification. All candidate keys are
super keys, but not all super keys are candidate keys.
EmpID  SSN          EmpName  DeptID
1      123-45-6789  Alice    10
2      987-65-4321  Bob      20
3      111-22-3333  Charlie  10
{EmpID} is a super key. {SSN} is a super key. {EmpID, SSN} is a super key. {EmpID, EmpName} is a
super key.
Primary Key: A primary key is a special candidate key chosen by the database designer to uniquely
identify rows in a table. It must contain unique values and cannot contain NULLs.
EmpID  SSN          EmpName  DeptID
1      123-45-6789  Alice    10
2      987-65-4321  Bob      20
3      111-22-3333  Charlie  10
Here, EmpID is the primary key because it uniquely identifies each row and does not allow NULL
values.
Foreign Key
A foreign key is a column or a set of columns in one table that references the primary key columns in
another table. Foreign keys establish relationships between tables and ensure referential integrity.
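A brief sketch tying these key types together (hypothetical employees and departments tables):
  CREATE TABLE departments (
    DeptID INT PRIMARY KEY
  );
  CREATE TABLE employees (
    EmpID   INT PRIMARY KEY,          -- the candidate key chosen as primary key
    SSN     CHAR(11) UNIQUE NOT NULL, -- the other candidate key
    EmpName VARCHAR(50),
    DeptID  INT,
    FOREIGN KEY (DeptID) REFERENCES departments(DeptID) -- enforces referential integrity
  );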
What are the typical phases of query processing? With a sketch, discuss these phases in high-level
query processing.
Query processing in a Database Management System (DBMS) involves several phases that transform
a high-level query, typically written in SQL, into an efficient execution plan that retrieves the
requested data. The main phases of query processing are:
1. Parsing and Translation: Convert the high-level SQL query into an internal representation, typically
a parse tree or an abstract syntax tree (AST).
Steps:
• Lexical Analysis: The SQL query string is broken into tokens (keywords, identifiers, operators,
etc.).
• Syntax Analysis: Tokens are analyzed against the grammar rules of SQL to form a parse tree.
• Semantic Analysis: Ensures that the query is meaningful, checking for the existence of tables
and columns, verifying data types, and ensuring that the operations are valid.
Example:
• Parse Tree: Represents the structure of the query, breaking it into SELECT, FROM, and
WHERE components.
2. Optimization: Transform the internal representation of the query into an efficient execution plan.
Steps:
• Logical Optimization: Rewrite the relational-algebra representation using equivalence rules
(e.g., pushing selections down, reordering joins) to obtain a cheaper logical plan.
• Physical Optimization: Select the best physical execution plan based on available access
paths (indexes, sequential scans) and cost estimation.
Example:
• Logical Plan: An optimized tree structure that represents the query in an algebraic form.
• Physical Plan: A sequence of operations (e.g., index scan, join operations) with specific
methods for executing each operation.
3. Evaluation/Execution: Execute the optimized plan to retrieve the result set from the database.
Steps:
• Execution: The DBMS follows the physical plan to access data, perform joins, apply filters,
and produce the final result set.
Example:
• Physical Plan Execution: Executes a series of operations (e.g., index scan on employees, filter
dept = 'HR', project name).
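For instance (PostgreSQL-style syntax; the employees table is the one assumed in the example above), the plan chosen by the optimizer can be inspected with EXPLAIN:
  EXPLAIN
  SELECT name FROM employees WHERE dept = 'HR';
  -- Typical output lists the chosen access path, e.g. an index scan on a dept
  -- index, the filter dept = 'HR', and the projection of the name column.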
When is the decomposition of relation schema R into two relation schemes X and Y, said to be a
loss-less-join decomposition? Why is this property so important? Explain with example.
A decomposition of relation schema R into two relation schemas X and Y is said to be a lossless-join
decomposition if every instance of R can be reconstructed from the instances of X and Y using a
natural join, without losing any information.
This property is important because it ensures that we can always retrieve the original data
accurately after decomposition. It prevents anomalies like loss of data (or creation of spurious
tuples) during join operations.
Example: Let R(A,B,C). If we decompose R into two relation schemas X(A,B) and Y(B,C), this
decomposition is lossless-join provided the common attribute B is a key for at least one of the
schemas (e.g., B→C holds), because then we can join X and Y on attribute B to reconstruct the
original relation R without losing any information.
1. Generalization
• It involves abstracting common properties or attributes from multiple entities to create a more
generalized entity.
• Higher-level, more abstract entities are created from lower-level, more specific entities.
• Example: Creating a Person entity from Student, Faculty, and Staff entities in a university DB.
2. Specialization:
• It creates lower-level entities that inherit attributes and relationships from a higher-level entity.
• Example: Creating Student, Faculty, and Staff entities from a Person entity in a university DB.
3. Aggregation
• It treats a relationship set, together with its participating entities, as a single higher-level entity
so that it can take part in further relationships.
• Example: Aggregating individual book copies into a single Book entity in a library DB.
Discuss the advantages and disadvantages of using DBMS as compared to a conventional file
system.
Advantages of DBMS:
1. Data Integrity and Consistency:
• DBMS: Ensures data integrity and consistency through constraints and rules. ACID
properties (Atomicity, Consistency, Isolation, Durability) ensure that transactions are
processed reliably.
• File System: Managing data integrity and consistency is more complex and prone to
errors without built-in mechanisms.
2. Data Security:
• DBMS: Provides fine-grained security mechanisms such as authentication, role-based
access control, and view-level restrictions.
• File System: Security measures are typically more basic and may rely on operating
system-level file permissions, which are less granular.
3. Data Redundancy:
• DBMS: Reduces redundancy through normalization and a centralized, shared data store.
• File System: Often leads to data redundancy and duplication, as managing related
data across multiple files is cumbersome.
Disadvantages of DBMS:
1. Cost:
• DBMS: Can be expensive to purchase, install, and maintain, especially for large-scale
systems. This includes costs for software, hardware, and skilled personnel.
• File System: Generally less costly, as it uses basic file storage mechanisms provided
by the operating system.
2. Complexity:
• DBMS: More complex to set up and manage, requiring specialized knowledge and
skills.
• File System: Simpler to implement and use, with less overhead for small-scale or
simple applications.
3. Performance:
• DBMS: May introduce overhead due to its abstraction layers, especially for simple,
read-heavy workloads where the overhead of a DBMS may not be justified.
• File System: Can be faster for straightforward, sequential file operations where the
additional functionality of a DBMS is not needed.
What is a weak entity set? Explain with a suitable example. How are weak entities represented as
relational schemas?
A weak entity set is an entity set that cannot be uniquely identified by its own attributes alone. It
relies on a "strong" or "owner" entity set to ensure its unique identification. A weak entity is
dependent on a strong entity, and this relationship is often represented through a special type of
relationship called an "identifying relationship."
1. Dependence on Strong Entity: A weak entity set depends on a strong entity set for its
existence and cannot exist independently.
2. Partial Key: A weak entity set has a partial key, which is an attribute or set of attributes that
can uniquely identify weak entities within the context of a specific strong entity.
3. Identifying Relationship: The relationship between a weak entity and its corresponding
strong entity is called an identifying relationship. This relationship helps to uniquely identify
the weak entity.
Example
Consider a university database in which each Department (a strong entity with key DeptID) offers
Courses, and course numbers are unique only within a department. In this scenario, Course is a
weak entity because CourseID alone cannot uniquely identify a course across all departments.
However, within the context of a specific department, the combination of DeptID and CourseID can
uniquely identify a course.
To represent the weak entity set and its relationship with the strong entity set in a relational
schema, you follow these steps:
1. Strong Entity Schema: Create a table for the strong entity with its own primary key.
2. Weak Entity Schema: Create a table for the weak entity, including a foreign key to the strong
entity's primary key and the weak entity's partial key; its primary key is the combination of the
two (see the sketch below).
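A hedged sketch of the resulting schemas (column names are hypothetical):
  CREATE TABLE Department (
    DeptID   INT PRIMARY KEY,
    DeptName VARCHAR(50)
  );
  -- Weak entity: its primary key combines the owner's key with the partial key.
  CREATE TABLE Course (
    DeptID   INT REFERENCES Department(DeptID),
    CourseID INT,          -- partial key: unique only within a department
    Title    VARCHAR(100),
    PRIMARY KEY (DeptID, CourseID)
  );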