DBMS
Data abstraction in DBMS plays a crucial role in simplifying and securing data
access while managing its complexity. It essentially hides the intricacies of
how data is stored and accessed from users, providing a clear and concise
interface for interaction. Here's a breakdown of its key aspects:
Levels of Abstraction:
● Physical Level: The lowest level, dealing with the actual physical
storage of data like disk blocks and pointers. Users have no direct
access to this level.
● Logical Level: Defines the overall database structure, including tables,
columns, data types, and relationships. Users interact with this level
through queries and data manipulation languages (DMLs).
● View Level: Presents customized subsets of data based on specific user
needs and access privileges. This allows different users to see different
versions of the same data, enhancing security and privacy.
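The view level is usually realized with SQL views. A minimal sketch, assuming a
hypothetical employees table and a restricted role named directory_user (both
illustrative), where salary details are hidden from ordinary users:

    CREATE VIEW employee_directory AS
    SELECT emp_id, name, department      -- salary column deliberately omitted
    FROM employees;

    GRANT SELECT ON employee_directory TO directory_user;  -- users query the view, not the base table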
Data Definition Language (DDL) is a subset of SQL used to create, modify, and
delete database objects. Think of it as the architect's blueprint for your
database, defining the structure and organization of your data. Unlike Data
Manipulation Language (DML), which focuses on retrieving and manipulating
data, DDL deals with the "what" of your data (its structure) rather than with
how it is accessed.
● Object Creation: CREATE statements define new tables, views, indexes, and
other objects.
● Object Modification: ALTER statements change the structure of existing
objects, for example adding a column.
● Object Deletion: DROP statements remove objects and their data from the
database.
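A minimal DDL sketch covering all three operations, assuming a hypothetical
employees table (names and columns are illustrative):

    CREATE TABLE employees (             -- object creation
        emp_id     INT PRIMARY KEY,
        name       VARCHAR(100) NOT NULL,
        department VARCHAR(50)
    );

    ALTER TABLE employees ADD COLUMN hire_date DATE;   -- object modification

    DROP TABLE employees;                              -- object deletion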
Data Manipulation Language (DML) is your magic wand for interacting with the
actual data stored within your database. It's the counterpart to DDL, which
focuses on defining the "what" (database structure), while DML deals with the
"how" (manipulating data). Think of it as the instructions you give your
database to retrieve, insert, update, or delete data.
● SELECT: This retrieves data from one or more tables based on specified
criteria. You can filter, sort, and aggregate data to extract valuable
insights.
● INSERT: This adds new rows of data into a table, following the defined
schema and constraints.
● UPDATE: This modifies existing data in a table, changing specific values
or columns based on conditions.
● DELETE: This removes unwanted rows of data from a table permanently.
Typical DML tasks include:
● Running a query to find all customers who made purchases in the last
month.
● Adding a new employee record to a company database.
● Updating the price of a product in an online store.
● Deleting outdated order records from a database.
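A hedged sketch of these four tasks, assuming hypothetical customers, orders,
employees, and products tables; exact date-arithmetic syntax varies by DBMS:

    SELECT c.customer_id, c.name
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_date >= CURRENT_DATE - INTERVAL '1' MONTH;   -- purchases in the last month

    INSERT INTO employees (emp_id, name, department)
    VALUES (1001, 'Asha Rao', 'Sales');                        -- add a new employee record

    UPDATE products SET price = 499.00 WHERE product_id = 42;  -- change a product's price

    DELETE FROM orders WHERE order_date < DATE '2020-01-01';   -- remove outdated order records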
Data models are blueprints for organizing and accessing data in databases.
Here's a comparison of four key models:
1. Hierarchical Model:
2. Network Model:
3. Relational Model:
The best model depends on your specific needs and data complexity.
Integrity constraints are the rules that ensure the validity, consistency, and
accuracy of data within a database. They act as safeguards, preventing invalid
data from entering the system, and maintaining the logical relationships
between various data elements.
1. Domain Constraints:
● These define the valid values that can be stored in a specific column.
For example, a "customer age" column might only allow values between
1 and 120.
● Types:
○ Data type constraints: Specify the data type like integer, string,
date, etc.
○ Range constraints: Limit the range of acceptable values (e.g., age
between 18 and 65).
○ Check constraints: Define custom validation rules for specific
data formats or patterns.
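A small sketch of these constraint types on a hypothetical customers table;
CHECK constraints are widely supported, though exact syntax can vary:

    CREATE TABLE customers (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,                 -- data type constraint
        age         INT CHECK (age BETWEEN 1 AND 120),     -- range constraint
        email       VARCHAR(255) CHECK (email LIKE '%@%')  -- simple custom pattern check
    );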
Data manipulation operations are the actions you take to interact with the data
stored in a database. These operations cover a wide range of tasks, from
simply retrieving specific data points to transforming and analyzing entire
datasets.
Basic Operations:
● Read (SELECT): This retrieves data from one or more tables based on
specified criteria. You can filter, sort, and aggregate data to extract
valuable insights.
● Create (INSERT): This adds new rows of data into a table, following the
defined schema and constraints.
● Update (UPDATE): This modifies existing data in a table, changing
specific values or columns based on conditions.
● Delete (DELETE): This removes unwanted rows of data from a table
permanently.
Advanced Operations:
Beyond these basic operations, DML also supports joins that combine rows from
several tables, aggregation with GROUP BY, subqueries, and set operations such
as UNION.
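A short sketch of a join with aggregation, assuming hypothetical customers and
orders tables:

    SELECT c.name, COUNT(*) AS order_count, SUM(o.total) AS total_spent
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.name
    HAVING COUNT(*) > 5            -- only customers with more than five orders
    ORDER BY total_spent DESC;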
For storing data, there are different types of storage options available. These
storage types differ from one another in speed, cost, and accessibility. The
following types of storage devices are used for storing data:
○ Primary Storage
○ Secondary Storage
○ Tertiary Storage
Primary Storage (RAM)
Fastest access, but volatile and expensive. Used for active data sets in use.
Primary storage is the area that offers the quickest access to stored data. It
is also known as volatile storage because it does not store data permanently:
if the system loses power or crashes, the data is lost. Main memory and cache
are the types of primary storage.
○ Main Memory: This is where data is held while it is being operated on, and
where each instruction of the computer is handled. Main memory can store
gigabytes of data but is usually too small to hold an entire database. It
loses all of its contents if the system shuts down because of a power
failure or other reasons.
○ Cache: The cache is one of the costliest storage media, but it is also the
fastest. It is a tiny storage area that is usually managed by the computer
hardware. When designing algorithms and query processors for data
structures, designers take cache effects into account.
Secondary Storage (Hard Disk Drives, SSDs)
Slower access than RAM, but persistent and more affordable. Used for storing
large datasets.
Secondary storage is also called online storage. It is the storage area that
allows the user to save and store data permanently. This type of memory does
not lose data due to a power failure or system crash, which is why it is also
called non-volatile storage.
There are some commonly described secondary storage media which are available
in almost every type of computer system:
○ Flash Memory: Flash memory stores data in devices such as USB (Universal
Serial Bus) keys, which are plugged into the USB slots of a computer. USB
keys make it easy to transfer data between systems, although they vary in
capacity. Unlike main memory, flash memory retains stored data even after a
power cut. It is commonly used in server systems for caching frequently
used data, which improves performance, and it can hold larger amounts of
data than main memory.
○ Magnetic Disk Storage: This type of storage media is also known as online
storage. A magnetic disk stores data for a long time and can hold an entire
database. The computer system is responsible for loading data from the disk
into main memory so it can be accessed, and if any operation modifies the
data, the modified data must be written back to the disk. A great strength
of magnetic disks is that data survives a system crash or power failure;
however, a failure of the disk itself can destroy the stored data.
Tertiary Storage (Tape Drives, Cloud Storage)
Very slow access, but extremely inexpensive. Used for long-term archival
purposes.
Tertiary storage is external to the computer system. It has the slowest speed
but can store very large amounts of data. It is also known as offline storage
and is generally used for data backup. The following tertiary storage devices
are available:
○ Tape Storage: Tape is cheaper than disk and is generally used for archiving
or backing up data. It provides slow access because data is read
sequentially from the start, so tape storage is also known as
sequential-access storage. Disk storage, by contrast, is known as
direct-access storage because data can be read directly from any location
on the disk.
Imagine sorting all the books in a library by author's name instead of browsing
each shelf randomly. An index in a DBMS does something similar. It acts as a
sorted data structure based on specific columns, allowing for rapid
identification and retrieval of data rows that match a query's criteria.
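A minimal sketch of creating and using an index, assuming a hypothetical
customers table; most relational systems accept this syntax:

    CREATE INDEX idx_customers_last_name ON customers (last_name);

    -- Queries filtering on the indexed column can locate matching rows
    -- without scanning the whole table:
    SELECT * FROM customers WHERE last_name = 'Sharma';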
B-trees are a self-balancing tree data structure designed for efficient data
storage and retrieval, particularly in databases. They offer several advantages
over simpler tree structures like binary search trees, especially when dealing
with large datasets.
Sr. No.   Algorithm   Time Complexity
1.        Search      O(log n)
2.        Insert      O(log n)
3.        Delete      O(log n)
● Multiple children per node: Unlike binary search trees which have at
most two child nodes, B-tree nodes can have a minimum and maximum
number of children (often denoted by "t"). This allows for storing more
data points in each node and reducing overall tree height.
● Balanced structure: B-trees automatically adjust their structure to
maintain a roughly consistent height across the tree. This ensures
efficient searches, regardless of the data distribution, because the
number of levels to traverse remains predictable.
● Ordered data: Data within each node is kept sorted in ascending order.
This facilitates faster searching by quickly narrowing down the potential
location of the target data point.
● Dynamic insertion and deletion: B-trees can efficiently handle data
insertion and deletion without compromising the balanced structure.
They automatically redistribute data or split/merge nodes to maintain
order and search performance.
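Most relational databases implement their default indexes as B-trees (or
B+-trees). As a hedged illustration, MySQL lets you name the index type
explicitly; the orders table here is hypothetical:

    CREATE INDEX idx_orders_order_date
    ON orders (order_date)
    USING BTREE;   -- MySQL syntax; most systems use a B-tree by default anyway

    -- Range queries benefit from the ordered, balanced structure of the B-tree:
    SELECT * FROM orders
    WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';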
Hashing in DBMS plays a crucial role in optimizing data access and retrieval.
It's a powerful technique that leverages hash functions to transform large,
variable-length data into short, fixed-length strings called hash values. These
values essentially act as fingerprints for your data, enabling quick
identification and comparison, especially within large datasets.
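As a hedged illustration, PostgreSQL can build a hash index instead of its
default B-tree; the users table and email column are hypothetical:

    CREATE INDEX idx_users_email_hash
    ON users USING hash (email);    -- PostgreSQL hash index

    -- An equality lookup hashes the search key and jumps straight to the matching bucket:
    SELECT * FROM users WHERE email = 'someone@example.com';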
There are two main hashing schemes:
○ Static Hashing
○ Dynamic Hashing: the dynamic hashing method is used to overcome the
problems of static hashing, such as bucket overflow.
Static Hashing:
● Concept: The number of hash buckets and the hash function are fixed at
the time the hash table is created. Data is evenly distributed across the
pre-defined number of buckets based on their hash values.
● Advantages:
○ Simple and efficient: Easy to implement and understand, offering
predictable performance for operations like insertion and search.
○ Less overhead: Requires minimal memory and processing
resources for maintenance.
○ Suitable for static datasets: Works well for situations where the
data size and access patterns are relatively stable.
● Disadvantages:
○ Performance bottleneck: Can suffer from collisions and
performance degradation as the data grows and fills up buckets
unevenly.
○ Limited scalability: Difficult to adapt to changes in data size or
access patterns, requiring rebuilding the entire hash table if
significant changes are needed.
○ Wasteful space: May lead to empty buckets if the data distribution
is uneven, potentially wasting storage space.
Dynamic Hashing:
● Concept: The number of buckets is not fixed; buckets are added or removed as
the data grows and shrinks, which avoids the bucket-overflow problems of
static hashing.
Shared Lock: A shared lock is also known as a read lock, which allows multiple
transactions to read the same data item at the same time. A transaction
which is holding a shared lock can only read the data item; it cannot update it.
Exclusive Lock : Exclusive lock is also known as the write lock. Exclusive
lock allows a transaction to update a data item. Only one transaction can
hold the exclusive lock on a data item at a time.
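A hedged sketch of how these locks surface in SQL; the FOR SHARE and FOR UPDATE
clauses are PostgreSQL/MySQL-style row locks, and the accounts table is
hypothetical:

    BEGIN;
    SELECT balance FROM accounts WHERE acct_id = 1 FOR SHARE;    -- shared (read) lock
    COMMIT;

    BEGIN;
    SELECT balance FROM accounts WHERE acct_id = 1 FOR UPDATE;   -- exclusive (write) lock
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
    COMMIT;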
Two-Phase Locking Protocol (2PL) – a cornerstone of transaction processing
in database management systems! It's one of the most widely used
concurrency control mechanisms for ensuring data consistency and
preventing interference between concurrent transactions accessing the same
data.
2PL divides a transaction's execution into two phases. In the growing phase,
once a transaction acquires a lock, that lock cannot be released until the
transaction has obtained all the locks it needs on every data
item. Once the transaction starts releasing the locks (the shrinking phase),
it cannot acquire any new locks.
Lock Types:
● Shared Lock: Allows other transactions to read the data, but prevents
them from modifying it.
● Exclusive Lock: Allows only the holding transaction to both read and
write the data, blocking other transactions from accessing it in any way.
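A hedged illustration of strict two-phase locking in SQL, where locks are
acquired as statements run and released only at COMMIT; the accounts table and
FOR UPDATE row locks are illustrative (most engines take such locks implicitly):

    BEGIN;                                                         -- growing phase starts
    SELECT balance FROM accounts WHERE acct_id = 1 FOR UPDATE;     -- acquire exclusive lock on account 1
    SELECT balance FROM accounts WHERE acct_id = 2 FOR UPDATE;     -- acquire exclusive lock on account 2
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;
    COMMIT;                                                        -- shrinking phase: all locks released together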
Benefits of 2PL:
● Guarantees conflict-serializable schedules, so concurrent transactions
produce the same result as some serial execution, preserving consistency.
Advantages of Concurrency
In general, concurrency means that more than one transaction can work on a
system at the same time. The advantages of a concurrent system are:
● Reduced waiting time
● Better response time
● Higher resource utilization
● Greater efficiency (throughput)
Disadvantages of Concurrency
● Overhead: Implementing concurrency control requires additional processing
and memory to manage locks and to keep the results accurate.
ACID Properties
A transaction is a single logical unit of work that accesses and possibly
modifies the contents of a database. Transactions access data using read
and write operations.
In order to maintain consistency in a database, before and after the
transaction, certain properties are followed. These are called ACID
properties.
Atomicity: Imagine a bank transfer. Atomicity guarantees that either the entire
transfer happens successfully (money deducted from sender, credited to
receiver) or not at all. No partial transfers! This prevents inconsistent states
and incomplete changes.
Consistency: Think of updating a shopping cart. Consistency ensures that the
database remains in a valid state after a transaction. For example, updating
product availability only after successfully deducting the quantity from
inventory maintains consistency. In the bank transfer example above, the total
amount of money before and after the transaction must be the same.
Isolation: Picture multiple users booking movie tickets at the same time.
Isolation ensures that one user's booking doesn't interfere with another's.
Even if multiple bookings happen concurrently, each appears to complete in
its own isolated environment, preventing overbooking or inconsistent seat
allocations.
Durability: Imagine a power outage during a purchase. Durability guarantees
that once a transaction is committed (successfully completed), its changes are
permanently stored in the database, even if the system crashes. No more
worrying about lost data!
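A hedged sketch of the bank-transfer transaction, assuming a hypothetical
accounts table; either both updates take effect or neither does:

    BEGIN;
    UPDATE accounts SET balance = balance - 500 WHERE acct_id = 'sender';
    UPDATE accounts SET balance = balance + 500 WHERE acct_id = 'receiver';
    COMMIT;      -- durability: once committed, the transfer survives a crash
    -- On any error before COMMIT, issuing ROLLBACK undoes the partial transfer (atomicity).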
Serializability:
● A schedule represents the interleaved operations of different transactions
in a concurrent run.
● Two schedules are considered equivalent if they produce the same final
database state.
● A schedule is serializable if it is equivalent to some serial schedule.
Types of Serializability:
● Strict serializability: The most restrictive form, ensuring the final state is
identical to a serial schedule where transactions are executed in order
of their start times.
● Conflict serializability: Allows more flexibility, as long as conflicting
operations from different transactions appear in the same relative order
in all equivalent serial schedules.
Benefits of Serializability:
● Concurrent transactions behave as if they had run one after another, so the
database stays consistent even when many transactions are interleaved.
Types of Failures:
● Transaction failure: a single transaction aborts because of a logical error
or a deadlock.
● System crash: power loss or a software fault wipes out the contents of main
memory.
● Disk failure: the storage media itself is damaged and the data on it is lost.
Log-Based Recovery:
Transaction logs track changes made to the database. By replaying the redo
log from the point of failure, the database can be brought back to a consistent
state.
Undo Logging (Rollback/Undo):
● Benefits:
○ Efficient recovery: Undoing changes directly is often faster than
replaying redo logs, especially for short-lived transactions.
○ Minimizes data loss: Only unwanted changes are reversed,
potentially preserving some recent data compared to restoring
from a backup.
○ Easy to understand: The concept of undoing actions is intuitive
and easy to comprehend.
● Drawbacks:
○ Increased overhead: Maintaining undo logs adds overhead to the
system, consuming storage space and requiring processing
power to keep them updated.
○ Limited effectiveness: Undo logs typically only store information
for recent transactions. Recovering from older failures might
require other techniques like backups.
Redo Logging (Commit/Redo):
● Benefits:
○ Guaranteed consistency: Redo logs ensure that only successful
transactions are applied, guaranteeing data integrity and
consistency even after failures.
○ Scalability: Redo logs can be large enough to store information
for all recent transactions, allowing recovery from older failures
compared to undo logs.
○ Efficient for long-lived transactions: Replaying committed
changes can be faster than reversing a large number of undo
operations for complex or long-running transactions.
● Drawbacks:
○ Increased overhead: Redo logs can be large and require
significant storage space and processing power to manage.
○ Potential data loss: If a failure occurs before the transaction is
logged, its changes might be lost and need to be recovered from
backups.
○ More complex: The concept of replaying logs might be less
intuitive compared to simply undoing actions.
Feature            Rollback/Undo                           Commit/Redo
Recovery action    Reverses changes of uncommitted work    Replays changes of committed work
Best suited for    Short-lived transactions                Long-lived or complex transactions
Main risk          Covers only recent transactions         Changes lost if failure occurs before logging
Unit 5
Database security encompasses everything you do to safeguard your
database from unauthorized access, malicious attacks, and accidental or
intentional damage. It's like a sturdy vault securing your most valuable
information, ensuring its confidentiality, integrity, and availability.
Authentication:
● Concept: Verifying that users are who they claim to be before they are
allowed into the database. Think of it as checking ID at the door.
● Techniques: Passwords, multi-factor authentication, biometrics, and
certificate- or token-based logins.
● Importance: Keeps unknown or impersonating users out, forming the first
line of defence for the data.
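A minimal sketch of creating a database login, in PostgreSQL-style SQL (other
systems use variants such as CREATE USER ... IDENTIFIED BY); the user name is
hypothetical:

    CREATE USER report_app WITH PASSWORD 'change-me';   -- the DBMS can now verify this login
    -- Connecting then requires presenting this username and password
    -- (or a stronger factor such as a client certificate, where supported).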
Authorization:
● Concept: Determining what actions a user can perform after they've
been authenticated. Think of it like assigning roles and permissions in a
team project.
● Techniques: Access control lists (ACLs), user roles and groups, and
resource-based policies where permissions are assigned based on specific
resources.
● Importance: Limits user actions based on their roles and needs,
preventing unauthorized modifications or misuse of data.
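A minimal authorization sketch using standard SQL GRANT/REVOKE; the user and
table names are hypothetical:

    GRANT SELECT, INSERT ON orders TO clerk_user;   -- clerk may read and add orders
    GRANT SELECT ON orders TO auditor_user;         -- auditor may only read
    REVOKE INSERT ON orders FROM clerk_user;        -- later withdraw the insert right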
Access Control:
Discretionary Access Control (DAC) leaves it to the owner of a data object to
decide who else may access it.
Strengths:
● Simple to implement: Users have direct control over their data and can
easily share it with others.
● User autonomy: Users can manage access based on their own needs
and preferences.
● Flexible: Can be adapted to various situations and user groups.
Weaknesses:
● Security depends on individual owners making sound decisions, so permissions
are easy to misconfigure and hard to audit across a large organization.
Mandatory Access Control (MAC) instead enforces a central policy: data carries
classification labels, users carry clearances, and the system rather than the
owner decides who may access what.
Strengths:
● Centrally enforced policy gives strong, consistent protection for sensitive
data.
Weaknesses:
● Rigid and administratively heavy, making it a poor fit for environments that
need flexible, ad hoc sharing.
Role-Based Access Control (RBAC) takes a different direction than DAC and
MAC, offering a well-structured approach to access control. Imagine assigning
responsibilities and permissions based on roles in a play, with actors having
access to props and areas relevant to their assigned roles. RBAC works
similarly, assigning pre-defined roles with associated permissions to users,
granting access based on their assigned roles.
Strengths:
● Scales well: permissions are managed per role rather than per user, which
simplifies administration and supports the principle of least privilege.
Weaknesses:
● Can suffer from role explosion when many slightly different roles are
needed, and it is less fine-grained than per-user or per-resource controls.
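A hedged RBAC sketch in PostgreSQL-style SQL; the role, user, and table names
are illustrative:

    CREATE ROLE sales_rep;                           -- define the role once
    GRANT SELECT, INSERT ON orders TO sales_rep;     -- attach permissions to the role
    GRANT SELECT ON products TO sales_rep;

    CREATE USER asha WITH PASSWORD 'change-me';      -- then assign the role to users
    GRANT sales_rep TO asha;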
SQL injection usually occurs when you ask a user for input, such as their
username or user ID, and instead of a name or ID the user supplies an SQL
fragment that you then unknowingly run against your database.
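A hedged illustration of what the injected input does to the query, assuming a
hypothetical users table and an attacker who types ' OR '1'='1 as the user ID:

    -- Intended query, built by pasting the user's input into the SQL string:
    SELECT * FROM users WHERE user_id = 'alice';

    -- Query that actually runs after the malicious input is pasted in:
    SELECT * FROM users WHERE user_id = '' OR '1'='1';   -- always true: every row is returned

    -- Defence: pass the input as a bound parameter instead of splicing it into the SQL text.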