db assignment
DBMSs have been developed to resolve the limitations of traditional data management systems.
Here are 10 key characteristics of a modern DBMS:
Data Integrity: DBMS ensures data accuracy and consistency, which is crucial for maintaining correct
information in multi-user environments.
Database architecture defines the framework that outlines how data is managed, stored, and
accessed within a database system. The architecture is typically layered to separate user
interactions, application logic, and data management, ensuring system efficiency and
flexibility. Common components include:
One-tier Architecture:
o All elements are on a single system; commonly used for simple, standalone
applications.
o Diagram: Show a single layer with the database and application on the same
level.
Two-tier Architecture:
o Involves a client and a server. The client interacts with the database via direct
SQL queries.
o Common in applications with a direct connection to the database, such as
desktop applications.
o Diagram: Show client and database on two separate layers.
Three-tier Architecture:
o Consists of three layers: the presentation (client), application (middleware),
and database (server).
o Supports complex applications, typically web-based, where the application
layer handles business logic.
o Diagram: Show three layers with client, application server, and database
server.
In one-tier architecture, the database, application logic, and user interface are all located on
the same system. This architecture is commonly used in standalone, personal applications
where the user directly interacts with the database on a local machine. Since all components
are on a single device, there’s no network communication required, making it fast but
limiting its scalability and suitability for multi-user environments.
Key Characteristics:
Diagram:
+---------------------+
|     Application     |
| (Database + Logic + |
|   User Interface)   |
+---------------------+
Key Characteristics:
Diagram:
   Client Side                           Server Side
+---------------+                     +---------------+
|    Client     |                     |   Database    |
|  Application  | <---- Network ----> |    Server     |
+---------------+                     +---------------+
1. Presentation Layer (Client): The user interface, like a web browser or desktop
application.
2. Application Layer (Middleware): Contains the business logic and processes data
requests.
3. Database Layer (Server): Manages data storage and retrieval.
This architecture is highly scalable and is ideal for complex applications, such as web
applications or enterprise systems, where multiple users access a database through a secure,
multi-layered process.
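The separation of the three layers can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory SQLite database stands in for the database server; the names `DatabaseLayer`, `ApplicationLayer`, `presentation_layer`, and the `accounts` table are hypothetical and not part of the original material.

```python
import sqlite3


class DatabaseLayer:
    """Database tier: the only layer that issues SQL (in-memory SQLite here)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER)"
        )
        self.conn.execute("INSERT INTO accounts VALUES ('alice', 100)")

    def get_balance(self, owner):
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE owner = ?", (owner,)
        ).fetchone()
        return None if row is None else row[0]


class ApplicationLayer:
    """Application tier: business rules only, no SQL and no user interface."""

    def __init__(self, db):
        self.db = db

    def can_withdraw(self, owner, amount):
        balance = self.db.get_balance(owner)
        return balance is not None and 0 < amount <= balance


def presentation_layer(app, owner, amount):
    """Presentation tier: turns the decision into text for the user."""
    verdict = "approved" if app.can_withdraw(owner, amount) else "declined"
    return f"Withdrawal of {amount} for {owner}: {verdict}"


app = ApplicationLayer(DatabaseLayer())
message_ok = presentation_layer(app, "alice", 40)
message_bad = presentation_layer(app, "alice", 500)
```

Because the presentation code never touches SQL, the database tier could in principle be moved to another machine without changing the other two layers, which is the property the three-tier design is after.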
Key Characteristics:
Aspect | ODBC | JDBC
Language | Primarily used with languages like C and C++ | Only for Java applications
Setup Requirements | Requires the ODBC driver manager to be installed on the system | Works directly with JDBC drivers in Java applications
Aspect | Procedural DML | Non-Procedural DML
Examples | PL/SQL (Procedural Language/Structured Query Language), T-SQL | SQL (Structured Query Language)
Control | Gives explicit control over how queries are processed | Focuses only on desired outcomes without specifying procedures
Common Usage | Useful for complex tasks like loops, conditional statements, and transactions | Used for straightforward querying and data manipulation
Execution Speed | Can be slower if complex logic adds overhead | Often faster since the database engine optimizes the operations
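The contrast in the table can be illustrated with a small sketch. It assumes Python with an in-memory SQLite database, where a Python loop stands in for procedural code such as a PL/SQL cursor loop; the `salaries` table and the raise rule are hypothetical.

```python
import sqlite3


def make_db():
    """Build a fresh in-memory database with the same three rows each time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE salaries (name TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO salaries VALUES (?, ?)",
        [("Alice", 1000), ("Bob", 2000), ("Charlie", 3000)],
    )
    return conn


# Procedural style: spell out HOW, row by row, with a loop and a condition.
procedural = make_db()
rows = procedural.execute("SELECT name, amount FROM salaries").fetchall()
for name, amount in rows:
    if amount < 2500:
        procedural.execute(
            "UPDATE salaries SET amount = ? WHERE name = ?", (amount + 100, name)
        )

# Non-procedural style: state WHAT is wanted; the engine decides how to do it.
declarative = make_db()
declarative.execute("UPDATE salaries SET amount = amount + 100 WHERE amount < 2500")

query = "SELECT name, amount FROM salaries ORDER BY name"
procedural_rows = procedural.execute(query).fetchall()
declarative_rows = declarative.execute(query).fetchall()
```

Both versions end with the same data; the second leaves the choice of access path to the database engine, which is where the "often faster" entry in the table comes from.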
b. Client-Server Database
Definition: A client-server database divides the database application into two parts:
the client (front-end) and the server (back-end). The client typically handles the user
interface and interacts with the server, which manages data storage and processing.
Characteristics:
o Clients send requests to the server, which processes data and returns results.
o Data management, storage, and processing are handled by the server, while the
client handles user interactions.
Advantages:
o Improved security and centralized control of the database on the server side.
o Better performance and reliability for multi-user environments.
Disadvantages:
o Requires network infrastructure, which could introduce latency.
o Potential for server overload if too many clients access it simultaneously.
c. Parallel Database
d. Distributed Database
Qn 6 a. Inner Join
An inner join returns only the rows where there is a match between both tables based on the
specified condition. If there is no match, the row is excluded from the results.
Example
Employees
EmployeeID Name DepartmentID
1 Alice 101
2 Bob 102
3 Charlie 103
4 David NULL
Departments
DepartmentID DepartmentName
101 HR
102 IT
103 Finance
104 Marketing
SQL Query:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name DepartmentName
Alice HR
Bob IT
Charlie Finance
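To check this result, the two tables and the query can be run against an in-memory SQLite database from Python. This is a minimal sketch: the column name `EmployeeID` for the first Employees column is an assumption (the name is not given above), and an `ORDER BY` is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
    -- EmployeeID is an assumed name for the first column of Employees.
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
    INSERT INTO Departments VALUES (101, 'HR'), (102, 'IT'), (103, 'Finance'), (104, 'Marketing');
    INSERT INTO Employees VALUES (1, 'Alice', 101), (2, 'Bob', 102), (3, 'Charlie', 103), (4, 'David', NULL);
""")

# David is dropped because NULL never equals any DepartmentID;
# Marketing is dropped because no employee references 104.
inner_rows = conn.execute("""
    SELECT Employees.Name, Departments.DepartmentName
    FROM Employees
    INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID
    ORDER BY Employees.EmployeeID
""").fetchall()
```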
b. Outer Join
An outer join returns all the rows from one or both tables, even if there is no match. There
are three types of outer joins: left join, right join, and full join.
i. Left Join
A left join (or left outer join) returns all rows from the left table and the matching rows from
the right table. If there is no match, the result will include NULL values for the columns from
the right table.
SQL Query:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name DepartmentName
Alice HR
Bob IT
Charlie Finance
David NULL
In this example, David is included even though he does not have a matching department in
the Departments table. The DepartmentName column shows NULL for him.
ii. Right Join
A right join (or right outer join) returns all rows from the right table and the matching rows
from the left table. If there is no match, the result will include NULL values for the columns
from the left table.
SQL Query:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
RIGHT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name DepartmentName
Alice HR
Bob IT
Charlie Finance
NULL Marketing
Here, the Marketing department appears in the result, even though no employees are assigned
to it. The Name column shows NULL for this department.
iii. Full Join
A full join (or full outer join) returns all rows from both tables, pairing them where the
join condition matches. Rows without a match show NULL for the columns of the other table.
SQL Query:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
FULL JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name DepartmentName
Alice HR
Bob IT
Charlie Finance
David NULL
NULL Marketing
Both David and the Marketing department are included in the results. David has no
department, so DepartmentName is NULL for him, and no employee is assigned to
Marketing, so Name is NULL for the Marketing department.
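The three outer joins can be compared side by side with a short Python and SQLite sketch. Two assumptions apply: `EmployeeID` is an assumed column name, and the right and full joins are emulated (a left join with the tables swapped, and a left join plus the unmatched departments) so that the sketch also runs on SQLite versions older than 3.39, which lack RIGHT JOIN and FULL JOIN. SQL NULL appears as `None` in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
    INSERT INTO Departments VALUES (101, 'HR'), (102, 'IT'), (103, 'Finance'), (104, 'Marketing');
    INSERT INTO Employees VALUES (1, 'Alice', 101), (2, 'Bob', 102), (3, 'Charlie', 103), (4, 'David', NULL);
""")

# Left join: every employee, with NULL where no department matches.
left_rows = conn.execute("""
    SELECT e.Name, d.DepartmentName
    FROM Employees e
    LEFT JOIN Departments d ON e.DepartmentID = d.DepartmentID
    ORDER BY e.EmployeeID
""").fetchall()

# Right join, written as a left join with the tables swapped.
right_rows = conn.execute("""
    SELECT e.Name, d.DepartmentName
    FROM Departments d
    LEFT JOIN Employees e ON e.DepartmentID = d.DepartmentID
    ORDER BY d.DepartmentID
""").fetchall()

# Full join: the left join rows plus the departments that matched no employee.
full_rows = conn.execute("""
    SELECT e.Name, d.DepartmentName
    FROM Employees e
    LEFT JOIN Departments d ON e.DepartmentID = d.DepartmentID
    UNION ALL
    SELECT NULL, d.DepartmentName
    FROM Departments d
    WHERE NOT EXISTS (SELECT 1 FROM Employees e WHERE e.DepartmentID = d.DepartmentID)
""").fetchall()
```

The row order of `full_rows` is not guaranteed because the combined query has no `ORDER BY`, so it is best compared as a set of rows.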
c. Natural Join
A natural join automatically joins tables based on columns with the same names and
compatible data types in both tables. It doesn’t require specifying the join condition
explicitly, as it assumes the matching is based on columns with identical names.
Example
Given the same Employees and Departments tables, a natural join will automatically use the
DepartmentID column to join these tables, as it exists in both.
SQL Query:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
NATURAL JOIN Departments;
Result:
Name DepartmentName
Alice HR
Bob IT
Charlie Finance
In this case, the result is the same as an inner join because DepartmentID is the only common
column name in both tables. David and the Marketing department are excluded because there
is no matching DepartmentID in one of the tables.
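The claim that the natural join matches the inner join here can be checked with a small Python and SQLite sketch. As before, `EmployeeID` is an assumed column name and the `ORDER BY` is added only for a predictable row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
    INSERT INTO Departments VALUES (101, 'HR'), (102, 'IT'), (103, 'Finance'), (104, 'Marketing');
    INSERT INTO Employees VALUES (1, 'Alice', 101), (2, 'Bob', 102), (3, 'Charlie', 103), (4, 'David', NULL);
""")

# No ON clause: the join column is inferred from the shared name DepartmentID.
natural_rows = conn.execute("""
    SELECT Employees.Name, Departments.DepartmentName
    FROM Employees
    NATURAL JOIN Departments
    ORDER BY Employees.EmployeeID
""").fetchall()

# The same join with the condition written out explicitly.
inner_rows = conn.execute("""
    SELECT Employees.Name, Departments.DepartmentName
    FROM Employees
    INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID
    ORDER BY Employees.EmployeeID
""").fetchall()

same_result = natural_rows == inner_rows
```

One design caution: if both tables later gain another column with the same name, a natural join will silently join on that column as well, which is why an explicit `ON` condition is often preferred.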
b. Object-Oriented Model
c. Network Model
d. Hierarchical Model
Definition: The hierarchical data model organizes data in a tree-like structure, where
each record has a single parent, and records are connected by parent-child
relationships. This model was widely used in early database systems such as IBM’s
IMS.
Components:
o Nodes: Represent individual records in the hierarchy.
o Parent-Child Relationships: Define the hierarchical structure; each child has
a single parent, but each parent can have multiple children.
o Root Node: The topmost record in the hierarchy from which all records
descend.
Advantages:
o Efficient for representing and querying hierarchical data, such as
organizational structures or file systems.
o Simple to navigate as relationships are predefined.
Disadvantages:
o Limited flexibility, as data relationships must conform to a strict hierarchy.
o Inefficient for many-to-many relationships, requiring redundant data or
complex workarounds to represent them.
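The components listed above (nodes, parent-child links, and a root) can be sketched as a small tree in Python. This is an illustration only, not how IMS stores data; the `Node` class and the sample organization are hypothetical.

```python
class Node:
    """One record in the hierarchy: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Follow parent links up to the root and report root-to-node order."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Node("Company")          # root node: the only record without a parent
hr = Node("HR", parent=root)
it = Node("IT", parent=root)
alice = Node("Alice", parent=hr)

alice_path = alice.path()
root_children = [child.name for child in root.children]
```

Navigation is simple because each node has a single predefined parent. The same restriction is what makes many-to-many data awkward: to place Alice under both HR and IT, a second "Alice" node would have to be created, duplicating the record.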
Qn 8. A deadlock in a database system occurs when two or more transactions are waiting for
each other to release locks on resources, resulting in a cycle of dependencies where none of
the transactions can proceed. Deadlocks are problematic because they can halt progress in a
database, impacting performance and availability.
Consider two transactions, T1 and T2, that both require access to two resources (data items A
and B) to complete their tasks. Here’s a scenario illustrating a deadlock:
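One such scenario can be simulated deterministically, without real threads. The sketch below assumes a specific interleaving (T1 locks A, T2 locks B, then each asks for the other's item) and uses a hypothetical `request_lock` helper with exclusive locks only.

```python
# Lock table: data item -> transaction currently holding its exclusive lock.
holders = {}
# Blocked transactions: waiting transaction -> transaction it is waiting for.
waiting_for = {}
log = []


def request_lock(txn, item):
    """Grant the lock if the item is free; otherwise record that txn must wait."""
    owner = holders.get(item)
    if owner is None or owner == txn:
        holders[item] = txn
        log.append(f"{txn} locks {item}")
        return True
    waiting_for[txn] = owner
    log.append(f"{txn} waits for {owner} to release {item}")
    return False


# Assumed interleaving that leads to deadlock.
request_lock("T1", "A")   # T1 now holds A
request_lock("T2", "B")   # T2 now holds B
request_lock("T1", "B")   # T1 blocks: B is held by T2
request_lock("T2", "A")   # T2 blocks: A is held by T1

# Each transaction waits for the other, so neither can ever proceed.
deadlocked = waiting_for.get("T1") == "T2" and waiting_for.get("T2") == "T1"
```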
To manage deadlocks, database systems use strategies for detecting and avoiding them. Here
are some commonly used techniques:
1. Deadlock Detection:
o Wait-for Graph: The system maintains a graph where transactions are nodes,
and an edge from T1 to T2 indicates that T1 is waiting for T2. The system
periodically checks this graph for cycles. If a cycle is found, a deadlock exists.
o Resolution: Once a deadlock is detected, the system rolls back (undoes)
one of the transactions involved in the deadlock to break the cycle.
2. Deadlock Prevention:
o Timeouts: Each transaction is assigned a timeout period. If it cannot obtain
the resources within this time, it’s assumed to be in a deadlock, and the
transaction is rolled back.
o Wait-Die and Wound-Wait Schemes:
Wait-Die: Older transactions are allowed to wait if a resource is held
by a newer transaction. However, if a newer transaction requests a
resource held by an older transaction, it is rolled back.
Wound-Wait: If an older transaction requests a resource held by a
younger transaction, the younger transaction is rolled back (wounded).
If a younger transaction requests a resource held by an older one, it is
allowed to wait.
o Resource Ordering: Establishing a consistent order for accessing resources.
Each transaction must request resources in a predefined order, eliminating
circular wait conditions.
3. Deadlock Avoidance:
o Banker’s Algorithm: Before granting a resource, the system checks if doing
so would lead to a potential deadlock. If granting the resource could cause a
deadlock, the request is denied, and the transaction waits until it can proceed
without causing a deadlock.
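The wait-for graph check described under deadlock detection amounts to finding a cycle in a directed graph. Below is a minimal sketch using depth-first search; the function names are hypothetical, and the victim policy (roll back the youngest transaction in the cycle) is an assumption, since the text only says that one of the transactions is rolled back.

```python
def find_cycle(wait_for):
    """Return the transactions forming a cycle, or None if there is no cycle.

    wait_for maps each transaction to the set of transactions it waits for.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(txn):
        colour[txn] = GREY
        path.append(txn)
        for other in sorted(wait_for.get(txn, ())):
            state = colour.get(other, WHITE)
            if state == GREY:         # back edge: other is on the current path
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        colour[txn] = BLACK
        return None

    for txn in sorted(wait_for):
        if colour.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


def choose_victim(cycle, start_time):
    """Assumed policy: roll back the youngest transaction (largest start time)."""
    return max(cycle, key=lambda txn: start_time[txn])


graph = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
cycle = find_cycle(graph)
victim = choose_victim(cycle, {"T1": 10, "T2": 20, "T3": 30, "T4": 40})
no_cycle = find_cycle({"T1": {"T2"}, "T2": set()})
```

T4 waits for T1 but is not part of the cycle, so it is not a candidate victim; it simply resumes once the cycle is broken.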
Each of these strategies has trade-offs. Deadlock detection is often simpler but reactive, while
prevention techniques proactively manage deadlocks, though they may introduce overhead.
The choice of technique depends on the specific requirements and performance
characteristics of the database system in use.
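The Wait-Die and Wound-Wait rules listed under deadlock prevention reduce to a comparison of two timestamps. The sketch below assumes that a smaller timestamp means an older transaction and that timestamps are unique; the function names and return strings are hypothetical.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "rollback requester"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger one waits."""
    return "rollback holder" if requester_ts < holder_ts else "wait"


# Timestamp 1 is the older transaction, timestamp 2 the younger one.
older_requests_wd = wait_die(requester_ts=1, holder_ts=2)
younger_requests_wd = wait_die(requester_ts=2, holder_ts=1)
older_requests_ww = wound_wait(requester_ts=1, holder_ts=2)
younger_requests_ww = wound_wait(requester_ts=2, holder_ts=1)
```

In both schemes it is always the younger transaction that is rolled back, so waiting only ever happens in one direction of age and a cycle of waits cannot form.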
Without concurrency control, transactions executed concurrently may interfere with each
other, leading to issues such as inconsistent data, lost updates, dirty reads, and phantom
reads. Concurrency control mechanisms prevent these issues, enabling reliable and correct
transaction execution.
Concurrency control is necessary to handle problems that arise when multiple transactions
access the same data simultaneously. Here are common problems it addresses:
1. Lost Update: Two transactions read the same data and then update it simultaneously.
The changes made by one transaction can be overwritten by the other.
2. Dirty Read: A transaction reads data modified by another uncommitted transaction,
potentially reading data that may later be rolled back.
3. Non-Repeatable (Unrepeatable) Read: A transaction reads the same data multiple times and gets
different results each time because another transaction modified the data in the
meantime.
4. Phantom Read: A transaction reads a set of rows that match a condition. Another
transaction then inserts or deletes rows that would have met that condition, changing
the result of the initial read.
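The lost update problem from the list above can be reproduced with plain Python variables. This sketch assumes a single data item A with a starting value of 100 and a hand-written interleaving of two hypothetical transactions, T1 (deposit 50) and T2 (withdraw 30).

```python
balance = {"A": 100}

# Both transactions read the same starting value.
t1_read = balance["A"]            # T1 reads 100
t2_read = balance["A"]            # T2 reads 100

# Each writes back a value computed from its own, now stale, read.
balance["A"] = t1_read + 50       # T1 writes 150
balance["A"] = t2_read - 30       # T2 writes 70, overwriting T1's update

lost_update_result = balance["A"]

# Serial execution for comparison: T2 starts only after T1 has written.
balance["A"] = 100
balance["A"] = balance["A"] + 50  # T1
balance["A"] = balance["A"] - 30  # T2
serial_result = balance["A"]
```

The interleaved run ends at 70 instead of 120 because T1's deposit is silently discarded, which is exactly what concurrency control is meant to prevent.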
To ensure that transactions do not interfere with each other, concurrency control relies on
several mechanisms and protocols:
1. Lock-Based Protocols
o Locks are used to control access to database items. Transactions must acquire
locks on data items before reading or writing.
o Types of Locks:
Shared Lock (Read Lock): Allows multiple transactions to read a
data item simultaneously but prevents any of them from modifying it.
Exclusive Lock (Write Lock): Allows only one transaction to both
read and write a data item, preventing other transactions from
accessing it.
o Two-Phase Locking (2PL): Enforces a rule where each transaction must
obtain all the locks it needs before releasing any. This protocol has two
phases:
Growing Phase: The transaction acquires locks but does not release
any.
Shrinking Phase: The transaction releases locks and cannot acquire
any new locks.
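o Strict Two-Phase Locking: A more restrictive form where exclusive (write)
locks are held until the transaction commits or rolls back. This preserves
serializability and also prevents cascading rollbacks, because no transaction
can read uncommitted data. Holding every lock, shared ones included, until
that point is often called rigorous 2PL.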
2. Timestamp-Based Protocols
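o Timestamp Ordering: Assigns a unique timestamp to each transaction and
uses this timestamp to manage access to data items. Conflicting operations
must take effect in timestamp order; a transaction that arrives too late (for
example, one trying to write an item already read by a younger transaction)
is rolled back and restarted with a new timestamp.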
o Thomas’s Write Rule: An optimization in timestamp ordering that allows
certain writes to be ignored if they won’t affect the final outcome, enhancing
concurrency.
3. Optimistic Concurrency Control (OCC)
o Optimistic Protocols assume that transactions will rarely conflict and let them
proceed without restrictions. Concurrency issues are checked only at the end,
during the commit phase.
o Phases in OCC:
Read Phase: Transactions read and make changes in a temporary
space.
Validation Phase: Checks if any conflicts occurred since the
transaction started.
Write Phase: If no conflicts are detected, changes are applied to the
database; otherwise, the transaction rolls back.
o This is efficient in scenarios with low contention, as transactions don’t need to
wait for locks.
4. Multiversion Concurrency Control (MVCC)
o MVCC maintains multiple versions of data items to allow readers to access a
consistent snapshot of the data without being blocked by writers.
o Snapshot Isolation: Each transaction sees a “snapshot” of the database at the
time it started. This avoids many concurrency issues and is commonly used in
systems like PostgreSQL.
o MVCC is particularly useful in read-heavy environments, as it reduces locking
overhead.
5. Serialization and Serializability
o Serial Schedule: A schedule in which transactions are executed one after
another with no overlap.
o Serializable Schedule: A concurrent transaction schedule that produces the
same result as a serial schedule. Ensures data consistency by maintaining the
effect of transactions as if they were executed serially.
o Achieving serializability is a key objective of concurrency control and is often
enforced through locking or timestamp-based protocols.
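The read, validation, and write phases of optimistic concurrency control described above can be sketched with per-item version numbers. This is a simplified illustration, not a production protocol: it assumes commits run one at a time, and the `Store`, `Transaction`, and `Conflict` names are hypothetical.

```python
class Conflict(Exception):
    """Raised when validation fails and the transaction must be rolled back."""


class Store:
    def __init__(self):
        self.data = {}       # key -> committed value
        self.version = {}    # key -> number of committed writes to that key


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # key -> version seen during the read phase
        self.writes = {}          # private workspace, invisible to others

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version.get(key, 0))
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation phase: everything this transaction read must be unchanged.
        for key, seen in self.read_versions.items():
            if self.store.version.get(key, 0) != seen:
                raise Conflict(key)
        # Write phase: publish the private workspace and bump the versions.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1


store = Store()
setup = Transaction(store)
setup.write("A", 100)
setup.commit()

t1, t2 = Transaction(store), Transaction(store)
t1.write("A", t1.read("A") + 50)
t2.write("A", t2.read("A") - 30)
t1.commit()                        # succeeds: A is unchanged since T1 read it
try:
    t2.commit()
    t2_outcome = "committed"
except Conflict:
    t2_outcome = "rolled back"     # T1's commit invalidated T2's read of A
final_value = store.data["A"]
```

Neither transaction ever waits for a lock; the cost is that T2's work is thrown away and has to be retried, which is why the approach suits low-contention workloads.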
Types of Locks
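The shared and exclusive lock modes described earlier under lock-based protocols can be sketched as a small lock manager. This is a minimal illustration with hypothetical names; a refused request simply returns False instead of queuing the transaction.

```python
class LockManager:
    """Grants shared ("S") and exclusive ("X") locks on data items."""

    def __init__(self):
        self.locks = {}   # item -> (mode, set of transactions holding it)

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        if holders == {txn}:
            # The sole holder may keep or upgrade its own lock.
            new_mode = "X" if "X" in (held_mode, mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        if held_mode == "S" and mode == "S":
            holders.add(txn)          # shared is compatible with shared
            return True
        return False                  # any combination involving X conflicts

    def release_all(self, txn):
        """Drop every lock held by txn, removing items with no holders left."""
        for item in list(self.locks):
            _mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]


lm = LockManager()
r1 = lm.acquire("T1", "A", "S")   # granted
r2 = lm.acquire("T2", "A", "S")   # granted: readers can share
r3 = lm.acquire("T3", "A", "X")   # refused: a writer needs exclusivity
lm.release_all("T1")
lm.release_all("T2")
r4 = lm.acquire("T3", "A", "X")   # granted once the readers are gone
r5 = lm.acquire("T1", "A", "S")   # refused: an exclusive lock blocks readers too
```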
1. Deadlock Prevention:
o Use timeout policies to detect and resolve deadlocks by rolling back
transactions if they exceed a wait time.
o Implement Wait-Die or Wound-Wait schemes:
Wait-Die: If an older transaction requests a lock held by a newer
transaction, the older transaction waits; if a newer transaction requests
a lock held by an older one, it’s rolled back.
Wound-Wait: An older transaction requesting a lock preempts (or
“wounds”) a younger one, causing the younger transaction to roll back.
If a younger transaction requests a lock held by an older one, it waits.
2. Deadlock Detection:
o Use a wait-for graph to detect cycles (indicative of deadlocks) and break the
cycle by rolling back one of the involved transactions.
3. Resource Ordering:
o Define a consistent ordering for resource access, ensuring that transactions
request resources in a specific sequence, thereby reducing circular waits and
preventing deadlocks.
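Resource ordering can be demonstrated with two Python threads. In this sketch the global order is simply the alphabetical order of the item names, which is an assumption made for illustration; `run_transaction` and the `locks` table are hypothetical.

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}
completed = []


def run_transaction(name, items):
    """Acquire every needed lock in the global order, work, then release."""
    ordered = sorted(items)                 # global order: alphabetical by item
    for item in ordered:
        locks[item].acquire()
    try:
        completed.append(name)              # stands in for the real work
    finally:
        for item in reversed(ordered):
            locks[item].release()


# T1 asks for A then B, T2 asks for B then A. Without ordering this pair could
# deadlock; with ordering both take A first, so one simply waits for the other.
threads = [
    threading.Thread(target=run_transaction, args=("T1", ["A", "B"])),
    threading.Thread(target=run_transaction, args=("T2", ["B", "A"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

Because every transaction climbs the same ordering, no transaction can hold a later item while waiting for an earlier one, so a circular wait cannot arise.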