DBMS Notes

Database systems comprise complex data structures. To make the system efficient in terms of data retrieval and to reduce complexity for users, developers use data abstraction, i.e. they hide irrelevant details from the users. This approach simplifies database design.

Levels of Abstraction in a DBMS


There are mainly 3 levels of data abstraction:

● Physical or Internal Level

● Logical or Conceptual Level

● View or External Level

Data abstraction in DBMS plays a crucial role in simplifying and securing data
access while managing its complexity. It essentially hides the intricacies of
how data is stored and accessed from users, providing a clear and concise
interface for interaction. Here's a breakdown of its key aspects:

Levels of Abstraction:

● Physical Level: The lowest level, dealing with the actual physical
storage of data like disk blocks and pointers. Users have no direct
access to this level.
● Logical Level: Defines the overall database structure, including tables,
columns, data types, and relationships. Users interact with this level
through queries and data manipulation languages (DMLs).
● View Level: Presents customized subsets of data based on specific user
needs and access privileges. This allows different users to see different
versions of the same data, enhancing security and privacy.

Benefits of Data Abstraction in DBMS:

● Simplified interaction: Users don't need to understand the underlying


storage mechanisms, making data access easier and more intuitive.
● Improved security: Encapsulation restricts unauthorized access to
specific data elements or tables.
● Enhanced maintainability: Changes to the physical data storage can be
implemented without affecting the logical level, simplifying database
updates.
● Data independence: Applications are insulated from changes in the
physical or logical structure of the database, improving code stability
and reusability.

Data independence is a crucial concept in database management systems


(DBMS) that refers to the ability to modify one level of the database schema
without affecting the next higher level. This separation between different levels
of data organization offers several key benefits:

Types of Data Independence:

● Physical Data Independence: This occurs when changes to the physical


storage structure of data (e.g., file sizes, indexes) do not impact the
logical view of the data accessed by users and applications. This allows
efficient storage optimization without rewriting queries or application
logic.
● Logical Data Independence: This refers to the ability to modify the
logical schema (e.g., adding/removing columns, tables) without affecting
existing applications. This makes database evolution more flexible and
minimizes application code changes.

Benefits of Data Independence:

● Increased flexibility and agility: Databases can adapt to changing data


needs without disrupting applications, enabling faster response to
evolving business requirements.
● Improved developer productivity: Applications written against a stable
logical schema require fewer modifications when the physical storage or
internal database structure changes.
● Enhanced data integrity and security: Data access controls and
constraints defined at the logical level remain independent of physical
storage specifics, promoting data consistency and protection.
● Simplified database administration: Changes to the physical level can
be implemented transparently, minimizing the need for extensive
schema modifications at higher levels.

Data Definition Language (DDL) Explained

Data Definition Language (DDL) is the subset of SQL used to create, modify, and
delete database objects. Think of it as the architect's blueprint for your
database, defining the structure and organization of your data. Unlike Data
Manipulation Language (DML), which focuses on retrieving and manipulating
data, DDL deals with the "what" (the structure of your data) rather than the
"how" of working with the stored data.

Here's a breakdown of DDL's key functionalities:

Object Creation:

● CREATE TABLE: Defines the structure of a table, specifying column


names, data types, constraints, and relationships.
● CREATE INDEX: Speeds up data retrieval by organizing data based on
frequently used query criteria.
● CREATE USER: Establishes user accounts with defined roles and
access privileges for secure data management.
● CREATE VIEW: Presents customized subsets of data based on specific
user needs and access limitations.

Object Modification:

● ALTER TABLE: Modifies an existing table structure, like


adding/removing columns, changing data types, or updating
constraints.
● ALTER INDEX: Changes the structure or definition of an existing index
to optimize database performance.
● ALTER USER: Updates user roles, passwords, or access privileges to
maintain data security.

Object Deletion:

● DROP TABLE: Permanently removes a table and its associated data


from the database.
● DROP INDEX: Deletes an existing index to free up storage space or
optimize database performance.
● DROP USER: Removes a user account and its access privileges from
the database.
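To make the commands above concrete, here is a small, hedged sketch of typical DDL statements; the table, column, view, and index names (customers, idx_customers_email, active_customers) are illustrative assumptions, and exact syntax varies slightly between database systems.

-- Object creation: define a table, an index, and a view
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255) UNIQUE
);

CREATE INDEX idx_customers_email ON customers (email);

CREATE VIEW active_customers AS
    SELECT customer_id, name FROM customers WHERE email IS NOT NULL;

-- Object modification: add a column to the existing table
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);

-- Object deletion: remove the view and then the table (its data is lost permanently)
DROP VIEW active_customers;
DROP TABLE customers;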

Benefits of using DDL:

● Organized and consistent data structures: Ensures clear and defined


data organization, facilitating efficient data access and manipulation.
● Improved data security: Enables granular control over user access and
data visibility, safeguarding sensitive information.
● Flexible database evolution: Allows modifying the database structure
without impacting existing applications, promoting easy data schema
updates.
● Simplified data management: Provides a standardized way to define and
manage database objects, reducing complexity and maintenance
overhead.

Data Manipulation Language (DML) is your magic wand for interacting with the
actual data stored within your database. It's the counterpart to DDL, which
focuses on defining the "what" (database structure), while DML deals with the
"how" (manipulating data). Think of it as the instructions you give your
database to retrieve, insert, update, or delete data.

Here's a breakdown of DML's core functionalities:

● SELECT: This retrieves data from one or more tables based on specified
criteria. You can filter, sort, and aggregate data to extract valuable
insights.
● INSERT: This adds new rows of data into a table, following the defined
schema and constraints.
● UPDATE: This modifies existing data in a table, changing specific values
or columns based on conditions.
● DELETE: This removes unwanted rows of data from a table permanently.
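As a quick, hedged illustration of these four commands, the sketch below assumes a simple customers table with customer_id, name, and city columns (illustrative names, not from the notes):

-- Retrieve: filter and sort rows
SELECT customer_id, name
FROM customers
WHERE city = 'Pune'
ORDER BY name;

-- Insert: add a new row that follows the table's schema and constraints
INSERT INTO customers (customer_id, name, city)
VALUES (101, 'Asha', 'Pune');

-- Update: change specific values based on a condition
UPDATE customers
SET city = 'Mumbai'
WHERE customer_id = 101;

-- Delete: remove unwanted rows permanently
DELETE FROM customers
WHERE customer_id = 101;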

Benefits of using DML:

● Efficient data access and manipulation: DML provides powerful


commands to extract, add, modify, and remove data with precision and
ease.
● Flexible data analysis: You can filter, sort, and combine data from
different tables to answer complex questions and gain valuable insights.
● Simplified data management: DML offers a standardized way to interact
with your data, reducing operational complexity and maintenance
overhead.
● Increased productivity: Developers and analysts can focus on
data-driven tasks without needing to write low-level data access code.

Real-world examples of DML in action:

● Running a query to find all customers who made purchases in the last
month.
● Adding a new employee record to a company database.
● Updating the price of a product in an online store.
● Deleting outdated order records from a database.

Data Models: A Comparative Look

Data models are blueprints for organizing and accessing data in databases.
Here's a comparison of four key models:

1. Entity-Relationship Model (ER Model):

● Focus: Representing real-world entities and their relationships.


● Components:
○ Entities: Real-world objects (e.g., customers, products).
○ Attributes: Properties of entities (e.g., customer name, product
price).
○ Relationships: Connections between entities (e.g., "purchases").
● Advantages: Easy to understand, good for conceptual design.
● Disadvantages: Doesn't directly translate to physical database
structures, limited for complex relationships.

2. Network Model:

● Focus: Representing data as a network of records with linked pointers.


● Components:
○ Record types: Groups of similar records (e.g., customers, orders).
○ Sets: Relationships between record types (e.g., a customer places
an order).
○ Links: Pointers connecting records within sets.
● Advantages: Flexible for complex relationships, efficient for navigating
related data.
● Disadvantages: Difficult to understand and maintain, not widely used in
modern databases.

3. Relational Model:

● Focus: Organizing data in tables with rows and columns.


● Components:
○ Tables: Collections of related data points (e.g., customers table,
orders table).
○ Columns: Attributes of the data (e.g., customer name, order date).
○ Rows: Records of individual data points (e.g., specific customer
information, specific order details).
○ Relationships: Defined through foreign keys connecting related
tables.
● Advantages: Dominant model in modern databases, simple to
understand and manage, supports efficient data retrieval and
manipulation.
● Disadvantages: Less flexible for complex relationships than network
model, data normalization can be complex.

4. Object-Oriented Data Model (OODM):

● Focus: Combining data and operations into self-contained objects.


● Components:
○ Objects: Encapsulated data and functions acting on that data.
○ Classes: Templates for creating objects.
○ Inheritance: Ability of objects to inherit properties and behavior
from other classes.
● Advantages: Natural fit for object-oriented programming, flexible and
powerful for complex data structures.
● Disadvantages: Not all DBMS support true OODM, learning curve can be
steeper than relational models.

Choosing the right model:

The best model depends on your specific needs and data complexity.

● ER model: Good for understanding data relationships and designing


conceptual database models.
● Network model: Useful for very complex relationships, but not widely
used anymore.
● Relational model: Dominant choice for most business-oriented
databases, due to its simplicity and efficiency.
● OODM model: Ideal for applications with complex data structures and
object-oriented programming.

Integrity constraints are the rules that ensure the validity, consistency, and
accuracy of data within a database. They act as safeguards, preventing invalid
data from entering the system, and maintaining the logical relationships
between various data elements.

Here's a breakdown of different types of integrity constraints:

1. Domain Constraints:
● These define the valid values that can be stored in a specific column.
For example, a "customer age" column might only allow values between
1 and 120.
● Types:
○ Data type constraints: Specify the data type like integer, string,
date, etc.
○ Range constraints: Limit the range of acceptable values (e.g., age
between 18 and 65).
○ Check constraints: Define custom validation rules for specific
data formats or patterns.

2. Entity Integrity Constraints:

● These ensure that each row in a table has a unique identifier to


distinguish it from other rows.
● Types:
○ Primary key: A unique column or set of columns that uniquely
identifies each row in a table.
○ Candidate key: Any column or set of columns that could
potentially serve as a primary key.

3. Referential Integrity Constraints:

● These enforce relationships between tables by ensuring that foreign key


values in one table reference existing primary key values in another
related table.
● Types:
○ Foreign key: A column in one table that references the primary
key of another table.
○ Referential actions: Define what happens to rows in a child table
when the referenced row in the parent table is deleted or updated
(e.g., cascade, set null, restrict).

Benefits of Integrity Constraints:

● Improved data quality: By preventing invalid data entry, constraints


ensure accuracy and consistency, leading to reliable data analysis and
decision-making.
● Enhanced data relationships: Referential constraints maintain the
integrity of relationships between tables, preventing orphaned or
inconsistent data.
● Simplified data management: Constraints automate data validation and
enforcement, reducing manual effort and potential errors.

Examples of Integrity Constraints:

● A "product_id" column in an "orders" table must reference a valid


"product_id" in the "products" table (referential integrity).
● A "customer_email" column must be unique and in a valid email format
(domain and entity integrity).
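A hedged SQL sketch of the constraints described above; the table and column names are assumptions for illustration, and CHECK/foreign-key syntax varies slightly across systems.

CREATE TABLE products (
    product_id INT PRIMARY KEY                          -- entity integrity
);

CREATE TABLE customers (
    customer_id    INT PRIMARY KEY,                     -- entity integrity
    customer_email VARCHAR(255) UNIQUE NOT NULL,        -- entity integrity (uniqueness)
    age            INT CHECK (age BETWEEN 1 AND 120)    -- domain constraint
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    product_id  INT NOT NULL,
    customer_id INT NOT NULL,
    FOREIGN KEY (product_id) REFERENCES products (product_id),    -- referential integrity
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
        ON DELETE RESTRICT                                         -- referential action
);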

Data manipulation operations are the actions you take to interact with the data
stored in a database. These operations cover a wide range of tasks, from
simply retrieving specific data points to transforming and analyzing entire
datasets.

Here's a breakdown of some key data manipulation operations:

Basic Operations:

● Read (SELECT): This retrieves data from one or more tables based on
specified criteria. You can filter, sort, and aggregate data to extract
valuable insights.
● Create (INSERT): This adds new rows of data into a table, following the
defined schema and constraints.
● Update (UPDATE): This modifies existing data in a table, changing
specific values or columns based on conditions.
● Delete (DELETE): This removes unwanted rows of data from a table
permanently.

Advanced Operations:

● Join: Combines data from multiple tables based on related columns,


allowing you to analyze data across different entities.
● Union/Intersection/Difference: Perform set operations on tables to
identify overlapping or unique data points.
● Aggregation: Summarizes data using functions like SUM, AVG, COUNT,
etc., providing insights into overall trends and patterns.
● Subqueries: Nested queries that leverage the results of one query within
another, enabling complex data analysis tasks.
● Data Transforms: Modify existing data by applying various calculations,
formatting, or manipulations to create new derived data points.
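A brief, hedged sketch combining a join, aggregation, and a subquery; the customers and orders tables, their columns, and the cutoff date are assumed for illustration.

-- Total order value per customer since a given date
SELECT c.name,
       COUNT(o.order_id) AS order_count,
       SUM(o.amount)     AS total_spent
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id            -- join on related columns
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY c.name                                            -- aggregation
HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders);   -- subquery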
Unit 3

Storage strategies in DBMS

Storage strategies in DBMS (Database Management Systems) are crucial for optimizing data accessibility, performance, and cost. A database system presents users with an abstract view of the stored data; however, the data itself is stored as bits and bytes on different storage devices.

Types of Data Storage

For storing data, there are different types of storage options available. These storage types differ from one another in speed and accessibility. The following types of storage devices are used for storing data:

○ Primary Storage

○ Secondary Storage

○ Tertiary Storage
Primary Storage (RAM)

Fastest access, but volatile and expensive; used for data sets actively in use.

Primary storage offers quick access to the stored data. It is also known as volatile storage because this type of memory does not store data permanently: if the system suffers a power cut or a crash, the data is lost. Main memory and cache are the two types of primary storage.

○ Main Memory: This is where the data currently being operated on is held and where each instruction of the machine is processed. Main memory can hold gigabytes of data, but it is usually too small (and too expensive) to hold an entire database, and its contents are lost if the system shuts down because of a power failure or other reasons.

○ Cache: The cache is the fastest, and also one of the costliest, storage media. It is a tiny storage area that is usually maintained by the computer hardware. Designers of query processors and data structures take cache effects into account when designing their algorithms.
Secondary Storage (Hard Disk Drives, SSDs)

Slower access than RAM, but persistent and more affordable; used for storing large datasets.

Secondary storage is also called online storage. It is the storage area that allows the user to save and store data permanently. This type of memory does not lose data due to a power failure or system crash, which is why it is also called non-volatile storage.

There are some commonly described secondary storage media which are available
in almost every type of computer system:

○ Flash Memory: Flash memory stores data in USB (Universal Serial Bus) keys that plug into the USB slots of a computer system. USB keys make it easy to transfer data to a computer, though their capacities vary. Unlike main memory, flash memory retains stored data through a power cut or similar failure. This type of storage is commonly used in server systems for caching frequently used data, which improves performance, and it can hold larger amounts of data than main memory.

○ Magnetic Disk Storage: This type of storage media is also known as online storage. A magnetic disk is used for storing data over a long period and is capable of holding an entire database. The computer system is responsible for bringing data from disk into main memory for access, and if any operation modifies the data, the modified data must be written back to the disk. A great strength of a magnetic disk is that its data is not affected by a system crash or power failure, although a disk failure itself can destroy the stored data.

Tertiary Storage (Tape Drives, Cloud Storage)

Very slow access, but extremely inexpensive; used for long-term archival purposes.

Tertiary storage is external to the computer system. It has the slowest speed but can store very large amounts of data. It is also known as offline storage and is generally used for data backup. The following tertiary storage devices are available:

○ Optical Storage: Optical media can store megabytes or gigabytes of data. A Compact Disc (CD) can store about 700 megabytes of data, roughly 80 minutes of playtime, while a Digital Video Disc (DVD) can store 4.7 or 8.5 gigabytes of data on each side of the disk.

○ Tape Storage: Tape is a cheaper storage medium than disk. Tapes are generally used for archiving or backing up data. Access is slow because data is read sequentially from the start, so tape storage is also known as sequential-access storage, whereas disk storage is known as direct-access storage because data can be accessed directly from any location on the disk.

Indexes play a crucial role in DBMS performance by optimizing data retrieval.


They're like detailed roadmaps in your database, helping queries reach the
desired data much faster. Here's a breakdown of how they work:

What are Indices?

Imagine sorting all the books in a library by author's name instead of browsing
each shelf randomly. An index in a DBMS does something similar. It acts as a
sorted data structure based on specific columns, allowing for rapid
identification and retrieval of data rows that match a query's criteria.

Benefits of using Indices:

● Faster query execution: Especially for queries involving the indexed


column(s), the sorted structure enables rapid identification and retrieval
of relevant data rows.
● Improved performance for specific operations: Filtering, sorting, and
joining based on indexed columns become significantly faster.
● Reduced overall workload: By minimizing disk access for frequent
queries, the entire database system operates more efficiently.
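A short, hedged example of creating and using an index; the table and index names are assumptions, and most systems build a B-tree index by default.

-- Create an index on the column used most often for lookups
CREATE INDEX idx_books_author ON books (author_name);

-- Queries that filter or sort on the indexed column can now use the index
SELECT title, author_name
FROM books
WHERE author_name = 'E. F. Codd'
ORDER BY author_name;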

B-trees are a self-balancing tree data structure designed for efficient data storage and retrieval, particularly in databases. They offer several advantages over simpler tree structures such as binary search trees, especially when dealing with large datasets.

The limitations of traditional binary search trees can be frustrating: when storing and searching large amounts of data, they become impractical due to poor performance and high memory usage. The B-tree (balanced tree) is a self-balancing, multi-way tree that was specifically designed to overcome these limitations and can handle massive amounts of data with ease.

Time Complexity of B-Tree:

Sr. No. | Algorithm | Time Complexity
1       | Search    | O(log n)
2       | Insert    | O(log n)
3       | Delete    | O(log n)

Here's a breakdown of what makes B-trees special:

Key Features of B-trees:

● Multiple children per node: Unlike binary search trees which have at
most two child nodes, B-tree nodes can have a minimum and maximum
number of children (often denoted by "t"). This allows for storing more
data points in each node and reducing overall tree height.
● Balanced structure: B-trees automatically adjust their structure to
maintain a roughly consistent height across the tree. This ensures
efficient searches, regardless of the data distribution, because the
number of levels to traverse remains predictable.
● Ordered data: Data within each node is kept sorted in ascending order.
This facilitates faster searching by quickly narrowing down the potential
location of the target data point.
● Dynamic insertion and deletion: B-trees can efficiently handle data
insertion and deletion without compromising the balanced structure.
They automatically redistribute data or split/merge nodes to maintain
order and search performance.

Hashing in DBMS plays a crucial role in optimizing data access and retrieval.
It's a powerful technique that leverages hash functions to transform large,
variable-length data into short, fixed-length strings called hash values. These
values essentially act as fingerprints for your data, enabling quick
identification and comparison, especially within large datasets.

There are two main hashing schemes in a DBMS:

○ Static Hashing

○ Dynamic Hashing

The dynamic hashing method was introduced to overcome the problems of static hashing, such as bucket overflow. In this method, data buckets grow or shrink as the number of records increases or decreases; it is also known as the extendable hashing method. This makes hashing dynamic, allowing insertions and deletions without resulting in poor performance.

Static Hashing:

● Concept: The number of hash buckets and the hash function are fixed at
the time the hash table is created. Data is evenly distributed across the
pre-defined number of buckets based on their hash values.
● Advantages:
○ Simple and efficient: Easy to implement and understand, offering
predictable performance for operations like insertion and search.
○ Less overhead: Requires minimal memory and processing
resources for maintenance.
○ Suitable for static datasets: Works well for situations where the
data size and access patterns are relatively stable.
● Disadvantages:
○ Performance bottleneck: Can suffer from collisions and
performance degradation as the data grows and fills up buckets
unevenly.
○ Limited scalability: Difficult to adapt to changes in data size or
access patterns, requiring rebuilding the entire hash table if
significant changes are needed.
○ Wasteful space: May lead to empty buckets if the data distribution
is uneven, potentially wasting storage space.

Dynamic Hashing:

● Concept: The number of hash buckets can dynamically adjust as the


data volume changes. The hash function may also be modified based on
the current distribution of data to minimize collisions.
● Advantages:
○ Highly scalable: Adapts readily to changes in data size and
access patterns, reducing performance bottlenecks and
maintaining efficient operations.
○ Better collision handling: Employs various techniques to handle
collisions and distribute data evenly across buckets, even with
uneven growth.
○ More efficient space utilization: Minimizes wasted space by
dynamically allocating and deallocating buckets based on actual
data needs.
● Disadvantages:
○ More complex: Requires more sophisticated algorithms and
implementation than static hashing, potentially increasing
maintenance overhead.
○ Potentially slower insertion/deletion: Dynamic adjustments to the
hash table structure can introduce additional processing
overhead for inserting or deleting data compared to static
hashing.
○ Less predictable performance: Performance may vary depending
on the data distribution and chosen dynamic hashing algorithm.
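As a small, hedged illustration of hashing applied to data access: some systems let you request a hash-based index explicitly. The USING HASH clause below is PostgreSQL syntax; other systems differ, and the accounts table is assumed.

-- A hash index is good for equality lookups, but not for range scans
CREATE INDEX idx_accounts_number ON accounts USING HASH (account_number);

-- This equality query can be answered via the hash index
SELECT balance FROM accounts WHERE account_number = 'AC-1001';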
Unit 4
Concurrency control is a crucial aspect of database management systems
(DBMS) that ensures orderly and consistent access to data by multiple users
or processes at the same time. Without it, chaos would ensue, leading to data
corruption, inconsistencies, and unreliable results.

Concurrency Control Protocols


Concurrency control protocols are the sets of rules maintained in order to solve concurrency problems in the database. They ensure that concurrent transactions can execute properly while maintaining database consistency, providing atomicity, consistency, isolation, durability, and serializability for concurrently executing transactions.

● Lock-based concurrency control protocol

● Timestamp-based concurrency control protocol

Lock-based Protocol

In a lock-based protocol, each transaction must acquire locks before it starts accessing or modifying data items. There are two types of locks used in databases.

Shared Lock : Shared lock is also known as read lock which allows

multiple transactions to read the data simultaneously. The transaction

which is holding a shared lock can only read the data item but it can not

modify the data item.

Exclusive Lock : Exclusive lock is also known as the write lock. Exclusive
lock allows a transaction to update a data item. Only one transaction can
hold the exclusive lock on a data item at a time.
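A hedged sketch of shared and exclusive locks at the SQL level; the FOR SHARE / FOR UPDATE clauses follow PostgreSQL/MySQL-style syntax and the accounts table is assumed. In most engines these locks are held until COMMIT or ROLLBACK.

-- Transaction 1: shared (read) lock; other readers are allowed, writers are blocked
BEGIN;
SELECT balance FROM accounts WHERE account_id = 1 FOR SHARE;
COMMIT;

-- Transaction 2: exclusive (write) lock; only this transaction may modify the row
BEGIN;
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
COMMIT;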
Two-Phase Locking Protocol (2PL) – a cornerstone of transaction processing
in database management systems! It's one of the most widely used
concurrency control mechanisms for ensuring data consistency and
preventing interference between concurrent transactions accessing the same
data.

Here's a breakdown of how 2PL works:

Phases of the Protocol:

1. Growing Phase: In this phase, the transaction acquires the locks it needs before performing any operation on the data items. While in the growing phase, a transaction may acquire new locks but cannot release any lock.

2. Shrinking Phase: In this phase, the transaction releases its acquired locks once it has performed its operations on the data items. Once a transaction releases its first lock, it cannot acquire any further locks.

Lock Types:

● Shared Lock: Allows other transactions to read the data, but prevents
them from modifying it.
● Exclusive Lock: Allows only the holding transaction to both read and
write the data, blocking other transactions from accessing it in any way.

Benefits of 2PL:

● Guarantees serializability: Ensures that concurrent transactions appear


to execute one after the other, even though they are actually running
concurrently. This prevents conflicting modifications and maintains data
consistency.
● A caveat on deadlocks: basic 2PL does not guarantee freedom from deadlocks; two or more transactions can still wait indefinitely for locks held by each other. Deadlocks must be handled separately (e.g. by detection and abort), or avoided with variants such as conservative 2PL, which acquires all locks up front.
● Simple and predictable: Easy to understand and implement compared to
other concurrency control mechanisms like timestamps.

Timestamp-based Protocol

● In this protocol each transaction has a timestamp attached to it. The timestamp is the time at which the transaction enters the system.

● Conflicting pairs of operations are resolved by the timestamp-ordering protocol using the timestamp values of the transactions, thereby guaranteeing that the transactions take effect in the correct order.

Advantages of Concurrency
In general, concurrency means that more than one transaction can work on the system at the same time. The advantages of a concurrent system are:

● Waiting Time: Concurrency leads to less waiting time.

● Response Time: Concurrency leads to less response time.

● Resource Utilization: Concurrency leads to higher resource utilization.

● Efficiency: Concurrency leads to greater efficiency.

Disadvantages of Concurrency
● Overhead: Implementing concurrency control requires additional

overhead, such as acquiring and releasing locks on database

objects. This overhead can lead to slower performance and


increased resource consumption, particularly in systems with

high levels of concurrency.

● Deadlocks: Deadlocks can occur when two or more transactions

are waiting for each other to release resources, causing a circular

dependency that can prevent any of the transactions from

completing. Deadlocks can be difficult to detect and resolve, and

can result in reduced throughput and increased latency.

● Reduced concurrency: Concurrency control can limit the number

of users or applications that can access the database

simultaneously. This can lead to reduced concurrency and slower

performance in systems with high levels of concurrency.

● Complexity: Implementing concurrency control can be complex,

particularly in distributed systems or in systems with complex

transactional logic. This complexity can lead to increased

development and maintenance costs.

● Inconsistency: In some cases, concurrency control can lead to

inconsistencies in the database. For example, a transaction that is

rolled back may leave the database in an inconsistent state, or a

long-running transaction may cause other transactions to wait for

extended periods, leading to data staleness and reduced

accuracy.
ACID property
A transaction is a single logical unit of work that accesses and possibly
modifies the contents of a database. Transactions access data using read
and write operations.
In order to maintain consistency in a database, before and after the
transaction, certain properties are followed. These are called ACID
properties.

Atomicity: Imagine a bank transfer. Atomicity guarantees that either the entire
transfer happens successfully (money deducted from sender, credited to
receiver) or not at all. No partial transfers! This prevents inconsistent states
and incomplete changes.
Consistency: Think of updating a shopping cart. Consistency ensures that the
database remains in a valid state after a transaction. For example, updating
product availability only after successfully deducting the quantity from
inventory maintains consistency.

For example, suppose a transaction T transfers 100 from account A (balance 500) to account B (balance 200), where T1 debits A and T2 credits B. The total amount before and after the transaction must be the same:

Total before T occurs = 500 + 200 = 700.

Total after T occurs = 400 + 300 = 700.

Therefore, the database is consistent. Inconsistency would occur if T1 completed but T2 failed, leaving T incomplete.

Isolation: Picture multiple users booking movie tickets at the same time.
Isolation ensures that one user's booking doesn't interfere with another's.
Even if multiple bookings happen concurrently, each appears to complete in
its own isolated environment, preventing overbooking or inconsistent seat
allocations.
Durability: Imagine a power outage during a purchase. Durability guarantees
that once a transaction is committed (successfully completed), its changes are
permanently stored in the database, even if the system crashes. No more
worrying about lost data!
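A minimal, hedged sketch of the bank-transfer example as a single atomic transaction; the accounts table is assumed, and the balances match the consistency example above.

-- Either both updates take effect together, or neither does
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';  -- debit A (500 -> 400)
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';  -- credit B (200 -> 300)

COMMIT;  -- durability: once committed, the transfer survives crashes
-- If anything fails before COMMIT, a ROLLBACK undoes the partial work (atomicity)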

Serializability in scheduling – a crucial concept in database management systems (DBMS) that ensures the correctness of concurrent transactions.

Imagine multiple transactions accessing and modifying data in a database at


the same time. Serializability guarantees that the outcome of these concurrent
executions is the same as if they had happened one after the other, in a
specific order (serial schedule). This ensures data consistency and prevents
anomalies from arising.

Formalizing the concept:

● Two schedules are considered equivalent if they produce the same final
database state.
● A schedule is serializable if it is equivalent to some serial schedule.
● A schedule represents the operations from different transactions in a
concurrent run.

Types of Serializability:

● Strict serializability: The most restrictive form; the schedule must be equivalent to a serial schedule that respects the real-time order of the transactions (if one transaction finishes before another begins, it must come first in the equivalent serial order).
● Conflict serializability: Allows more flexibility, as long as conflicting
operations from different transactions appear in the same relative order
in all equivalent serial schedules.

Benefits of Serializability:

● Data integrity: Guarantees accurate and consistent database state, even


with concurrent transactions.
● Predictable behavior: Developers can write code with confidence,
knowing the outcomes of transactions regardless of their interleaving.
● Simplified debugging: Makes it easier to diagnose and fix issues in
concurrent programs, as the behavior is equivalent to a sequential
execution.

Both multi-version and optimistic concurrency control (OCC) schemes are


alternative approaches to traditional locking mechanisms in database
management systems (DBMS). They offer different ways to manage concurrent
access to data while still maintaining data integrity and consistency. Let's dive
into their key characteristics:

Multi-version Concurrency Control (MVCC):

● Concept: Instead of locking data elements, MVCC maintains multiple


versions of data as transactions modify it. Each version is associated
with a specific timestamp, representing the time of the transaction that
created it.
● Strengths:
○ Increased concurrency: By eliminating explicit locking, MVCC
allows more concurrent reads and writes compared to locking
schemes.
○ Read-write isolation: Reads never block writes and vice versa,
ensuring efficient concurrent access.
○ No deadlocks: MVCC inherently avoids deadlocks, simplifying
concurrency management.
● Weaknesses:
○ Increased storage overhead: Maintaining multiple versions of data
can consume more storage space.
○ Complex implementation: MVCC algorithms require careful design
and implementation to ensure consistency and avoid anomalies.
○ Potentially slower writes: Version management can introduce
additional overhead for write operations compared to locking.
Optimistic Concurrency Control (OCC):

● Concept: Unlike locking or MVCC, OCC assumes transactions will not


conflict with each other and allows them to proceed without acquiring
locks or maintaining versions. Conflicts are detected and resolved only
when transactions attempt to commit.
● Strengths:
○ Improved performance: No locking overhead leads to potentially
faster transaction execution, especially for read-heavy workloads.
○ Simple implementation: OCC requires minimal changes to
existing database systems compared to locking or MVCC.
○ Scalable: Can handle a large number of concurrent transactions
efficiently.
● Weaknesses:
○ Increased conflict detection overhead: Validating all changes
during commit can be expensive, especially with frequent
conflicts.
○ Potentially higher abort rates: Conflicts can lead to transaction
aborts and re-executions, impacting performance.
○ Not suitable for all scenarios: May not be ideal for applications
with high write contention or long-running transactions.
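As a hedged illustration of MVCC seen from the SQL level: in an MVCC engine such as PostgreSQL, a REPEATABLE READ transaction reads from a snapshot, so concurrent writers do not block its reads. The two sessions and the accounts table below are assumed.

-- Session 1: reads from a consistent snapshot under MVCC
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE account_id = 1;   -- sees the snapshot value

-- Session 2 (concurrently): this write is not blocked by Session 1's read
UPDATE accounts SET balance = balance + 50 WHERE account_id = 1;

-- Session 1: still sees the original snapshot value, not Session 2's update
SELECT balance FROM accounts WHERE account_id = 1;
COMMIT;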
Comparing the two approaches:

● Versioning: MVCC maintains multiple versions of data, each associated with the modifying transaction's timestamp, which allows read-write isolation without blocking. OCC operates on the current single version of the data, with no versioning overhead, but with potential for lost updates or phantom reads if conflicts occur.

● Conflict handling: MVCC resolves conflicts through versioning as data is read and written. OCC assumes transactions are conflict-free until commit time; conflicts are detected during commit validation and resolved, usually by aborting one transaction.

● Deadlocks: MVCC inherently avoids deadlocks due to timestamp ordering and versioning, with no waiting for locks. Under OCC it is possible to have deadlocks if transactions wait for each other during conflict resolution.

● Overhead: MVCC may have higher storage overhead due to maintaining multiple data versions, and implementing and maintaining the versioning logic can be more complex. OCC has minimal overhead since it requires neither locks nor versioning, but conflict detection and resolution at commit can be expensive, especially with frequent conflicts.

● Performance: MVCC can offer higher concurrency for read-heavy workloads due to concurrent reads and writes, though writes might be slower because of version management. OCC can offer faster initial execution due to no locking overhead, but frequent conflicts leading to aborts and re-executions can impact performance.

● Suitability: MVCC is suitable for applications with frequent reads and writes, high concurrency needs, and no tolerance for deadlocks. OCC is suitable for applications with read-heavy workloads, simple transactions, and potentially high conflict tolerance.

In summary:

Feature             | Multi-version Concurrency Control (MVCC)                    | Optimistic Concurrency Control (OCC)
Conflict resolution | At read time, through versioning                            | At commit time
Data management     | Multiple data versions with timestamps                      | Single data version
Deadlocks           | No deadlocks possible                                       | Deadlocks possible during conflict resolution
Overhead            | Higher storage and implementation overhead                  | Lower overhead; no locks or versions
Performance         | High concurrency for reads and writes; writes slower        | Faster initial execution; slower with frequent conflicts
Suitability         | Frequent reads/writes, high concurrency, deadlock avoidance | Read-heavy workloads, simple transactions, conflict tolerance
Database Recovery

Database recovery encompasses a set of techniques and procedures used to restore a database to a consistent and valid state after a failure or error. It's like a safety net, ensuring your critical data remains accessible and reliable even in the face of unexpected events. Database systems, like any other computer system, are subject to failures, but the data stored in them must be available as and when required. When a database fails, it must possess facilities for fast recovery. It must also provide atomicity: either a transaction completes successfully and is committed (its effect is recorded permanently in the database), or the transaction has no effect on the database at all.
Here's a breakdown of its core concepts:

Types of Failures:

● Hardware failures: Disk crashes, power outages, etc.


● Software errors: Bugs, configuration issues, application crashes, etc.
● Human errors: Accidental data deletion, incorrect updates, etc.
● Natural disasters: Floods, earthquakes, etc.

Types of Recovery Techniques in DBMS

● Rollback/Undo Recovery Technique

● Commit/Redo Recovery Technique

Rollback/Undo Recovery: This technique focuses on reversing unwanted or incomplete changes made to the database. Think of it like hitting the "undo" button. Transaction (undo) logs record the old values of the data changed by each transaction; by applying these log records backwards from the point of failure, the effects of uncommitted transactions are undone and the database is brought back to a consistent state.
● Benefits:
○ Efficient recovery: Undoing changes directly is often faster than
replaying redo logs, especially for short-lived transactions.
○ Minimizes data loss: Only unwanted changes are reversed,
potentially preserving some recent data compared to restoring
from a backup.
○ Easy to understand: The concept of undoing actions is intuitive
and easy to comprehend.

● Drawbacks:
○ Increased overhead: Maintaining undo logs adds overhead to the
system, consuming storage space and requiring processing
power to keep them updated.
○ Limited effectiveness: Undo logs typically only store information
for recent transactions. Recovering from older failures might
require other techniques like backups.
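At the SQL level, the same undo machinery is what makes an explicit ROLLBACK possible; a minimal, hedged sketch (the orders table and the date cutoff are assumed):

BEGIN;
DELETE FROM orders WHERE order_date < DATE '2020-01-01';  -- oops, wrong cutoff

ROLLBACK;  -- the undo log restores the deleted rows; the database state is unchanged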

Commit/Redo Recovery: This technique focuses on replaying successful


transactions to bring the database back to a consistent state. Imagine
revisiting and applying all the completed steps (commits) to ensure everything
is up-to-date.

● Benefits:
○ Guaranteed consistency: Redo logs ensure that only successful
transactions are applied, guaranteeing data integrity and
consistency even after failures.
○ Scalability: Redo logs can be large enough to store information
for all recent transactions, allowing recovery from older failures
compared to undo logs.
○ Efficient for long-lived transactions: Replaying committed
changes can be faster than reversing a large number of undo
operations for complex or long-running transactions.
● Drawbacks:
○ Increased overhead: Redo logs can be large and require
significant storage space and processing power to manage.
○ Potential data loss: If a failure occurs before the transaction is
logged, its changes might be lost and need to be recovered from
backups.
○ More complex: The concept of replaying logs might be less
intuitive compared to simply undoing actions.
Feature       | Rollback/Undo                                         | Commit/Redo
Focus         | Reverses unwanted changes within transactions         | Replays successful transactions for state consistency
Data handling | Discards uncommitted changes, restores previous state | Applies only committed changes, maintains integrity
Efficiency    | Fast for short transactions, error correction         | Faster for long transactions, consistent state
Data loss     | Less loss (only unwanted changes are undone)          | Potential loss if failure occurs before logging
Overhead      | Lower (undo logs)                                     | Higher (redo logs, resource usage)
Application   | Error correction, isolated rollbacks, storage limits  | Data consistency, major failures, complex transactions
Analogy       | Discarding mistaken batter before baking              | Recreating a perfectly baked cake after oven failure

Unit 5
Database security encompasses everything you do to safeguard your
database from unauthorized access, malicious attacks, and accidental or
intentional damage. It's like a sturdy vault securing your most valuable
information, ensuring its confidentiality, integrity, and availability.

Here are the key pillars of database security:

1. Authentication: Verifying the identity of users trying to access the database.

2. Authorization: Determining what actions each user can perform based on their roles and permissions.

3. Access control: Enforcing those permissions when data is actually accessed.

Authentication:

● Concept: Verifying the identity of someone trying to access your


system. Think of it like checking IDs at a nightclub.
● Methods: Passwords, PINs, biometrics (fingerprints, facial recognition),
multi-factor authentication (combining different methods).
● Importance: Ensures only authorized individuals can access your
system, preventing unauthorized access and potential attacks.

Authorization:
● Concept: Determining what actions a user can perform after they've
been authenticated. Think of it like assigning roles and permissions in a
team project.
● Techniques: Access control lists (ACLs), user roles and groups, and role-based access control (RBAC), where permissions are assigned to roles rather than to individual users.
● Importance: Limits user actions based on their roles and needs,
preventing unauthorized modifications or misuse of data.

Access Control:

● Concept: Enforcing the rules established by authorization. Think of it as


a bouncer at the nightclub ensuring only those with proper permissions
can enter specific areas.
● Techniques: Firewalls, network segmentation, data encryption, and software controls within applications; within the DBMS itself, access control is implemented by restricting user access to specific data based on permissions.
● Importance: Implements the limits set by authorization, preventing
unauthorized access to specific data or resources, even if a user might
be authenticated.

Feature-by-feature comparison:

● Concept: Authentication verifies user identity; authorization defines user permissions; access control enforces those permissions within the database.

● Analogy: Authentication is checking a library card; authorization is assigning document-editing rights; access control is the librarian restricting access to certain books.

● Methods: Authentication uses passwords, PINs, biometrics, MFA; authorization uses database logins, client certificates, Kerberos; access control uses ACLs, roles and groups, row-level security.

● Importance: Authentication prevents unauthorized access; authorization limits user actions; access control implements the authorization restrictions.

● Focus: Authentication asks who can access the database; authorization asks what users can do within the database; access control asks how permissions are enforced.

● Database considerations: Authentication involves data sensitivity, application integration, and audit trails; authorization involves granular control, user roles, and dynamic permissions; access control involves database engine integration and row-level security.

● Examples: Authentication: SQL Server logins, Oracle Database users. Authorization: granting SELECT, INSERT, UPDATE permissions on tables. Access control: restricting access to specific rows based on user attributes.
DAC, MAC, and RBAC are three prominent access control models used to regulate user access
to resources, including databases. Each model operates with a distinct approach and
philosophy, offering varying levels of granularity and control. Here's a breakdown of their key
characteristics:

Discretionary Access Control (DAC):

Discretionary Access Control (DAC) is a popular access control model where


the owner of a resource (data, file, object) directly manages who can access it
and what actions they can perform. Think of it like the owner of a house
sharing their keys with specific guests and granting them permission to
specific rooms or objects within the house.

Strengths:

● Simple to implement: Users have direct control over their data and can
easily share it with others.
● User autonomy: Users can manage access based on their own needs
and preferences.
● Flexible: Can be adapted to various situations and user groups.

Weaknesses:

● Lack of centralized control: Difficult to enforce consistent security


policies across the system.
● Not scalable: Managing access becomes complex as the number of
users and resources grows.
● Limited accountability: Difficult to track who has access to what and
who granted it.

Mandatory Access Control (MAC):

● Concept: System enforces pre-defined security labels on resources and


user clearances. Think of it as government documents with classified
levels and authorized personnel clearance requirements.
● Strengths: High security, prevents unauthorized access to sensitive
data.
● Weaknesses: Complex to implement, inflexible, users have limited
control over access.
● Suitable for: Highly sensitive data, environments with strict security
regulations.
Mandatory Access Control (MAC) takes a radically different approach to access control compared to DAC: access to resources is governed by pre-defined security labels assigned to both data and users based on sensitivity levels. Unlike DAC's user-driven approach, MAC prioritizes strict security and centralized control.

Strengths:

● High Security: Strict separation of data based on sensitivity levels


minimizes unauthorized access risks.
● Centralized Control: System administrators define and enforce security
policies, ensuring consistency and compliance.
● Minimal Misconfiguration Risk: User control over access is limited,
reducing the chance of accidental breaches.
● Auditable and Traceable: Access logs track user activity, facilitating
accountability and anomaly detection.

Weaknesses:

● Complex to Implement: Requires careful policy definition and system


configuration.
● Inflexible: Users have limited control over access, and adapting to diverse data sensitivity levels can be complex.
● User Autonomy Limited: Users might feel restricted by pre-defined
access levels.
● Scalability Concerns: Implementing and managing MAC for large
systems can be challenging.

Role-Based Access Control (RBAC) takes a different direction than DAC and
MAC, offering a well-structured approach to access control. Imagine assigning
responsibilities and permissions based on roles in a play, with actors having
access to props and areas relevant to their assigned roles. RBAC works
similarly, assigning pre-defined roles with associated permissions to users,
granting access based on their assigned roles.

Here's a breakdown of RBAC's key characteristics:

Strengths:

● Granular Control: Permissions are defined at the role level, enabling


flexible and detailed access management.
● Efficient Management: Centralized control of roles simplifies managing
access for large user groups.
● Scalable: Adapts well to growing systems and complex user needs.
● Improved Accountability: Roles clearly define user responsibilities and
access privileges.

Weaknesses:

● Requires Careful Role Definition: Defining roles and permissions


effectively can be complex.
● Overlapping Permissions: Potential for confusion or misuse if roles
share similar permissions.
● Limited User Autonomy: Users have less control over access compared
to DAC.
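A short, hedged sketch of RBAC in SQL; most systems support roles and role grants, the syntax below is PostgreSQL/standard-SQL-style, and the role, user, and table names are assumptions.

-- Define a role and attach its permissions
CREATE ROLE editor;
GRANT SELECT, INSERT, UPDATE ON articles TO editor;

-- Grant access by assigning the role, not individual permissions
GRANT editor TO alice;

-- Revoking the role removes all of its permissions at once
REVOKE editor FROM alice;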

SQL injection is a code injection technique that might destroy your database.

SQL injection is one of the most common web hacking techniques.

SQL injection is the placement of malicious code in SQL statements via web page input.

SQL injection usually occurs when you ask a user for input, such as their username/user id, and instead of a name/id the user supplies an SQL fragment that you unknowingly run on your database.
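A classic, hedged illustration: suppose an application builds a login query by pasting user input directly into the SQL text (the users table and the exact query shape are assumptions).

-- Intended query when the user types alice / secret:
SELECT * FROM users WHERE username = 'alice' AND password = 'secret';

-- If the attacker types   ' OR '1'='1   into both fields, the concatenated statement becomes:
SELECT * FROM users WHERE username = '' OR '1'='1' AND password = '' OR '1'='1';
-- The trailing OR '1'='1' is always true, so the check is bypassed and rows are returned.

-- Mitigation: never concatenate input into SQL; use parameterized queries / prepared
-- statements so that user input is treated as data, not as SQL code.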
Role-Based Access Control (RBAC):

● Concept: Users are assigned roles with pre-defined permissions, access


is granted based on assigned roles. Think of it as assigning access
based on job titles (e.g., editor, manager, administrator) with specific
duties and permissions.
● Strengths: Granular control, efficient management, scalable for large
systems.
● Weaknesses: Requires careful role definition and maintenance, potential
for misuse if roles are poorly defined.
● Suitable for: Large organizations, centralized management,
environments with diverse user needs and data sensitivity levels.
