DBMS - Notes Full
1. Overview of Database
• Definition: A database is an organized collection of structured data that is stored
electronically and can be efficiently retrieved, managed, and updated.
• Purpose: Databases are used to store and manage data in a way that ensures easy
access, retrieval, and management of information.
• Examples: A university storing student records, an e-commerce platform storing
product catalogs, etc.
2. Database Management System (DBMS)
• Definition: A Database Management System (DBMS) is software that enables users
to define, create, maintain, and control access to the database. It acts as an interface
between the users and the database, ensuring efficient data management.
• Functions:
1. Data Storage: Manages how data is stored on disk.
2. Data Retrieval: Provides query processing capabilities (e.g., SQL) to retrieve
data efficiently.
3. Data Manipulation: Allows for data modification (insert, update, delete).
4. Data Security: Controls access to the database, ensuring that only authorized
users can perform specific operations.
5. Concurrency Control: Ensures that multiple users can access the database
simultaneously without conflicts.
6. Backup and Recovery: Helps recover the database to a consistent state in
case of a failure.
• Examples of DBMS: MySQL, Oracle, PostgreSQL, Microsoft SQL Server.
3. DBMS Architecture
DBMS architecture describes the structure and organization of the DBMS components. It is
typically divided into the following layers:
a) 1-Tier Architecture:
• Definition: In 1-tier architecture, the DBMS and user interface both reside on the
same machine. The database directly interacts with the user without any
intermediary layers.
• Example: A stand-alone database application like Microsoft Access.
b) 2-Tier Architecture:
• Definition: In 2-tier architecture, the DBMS is located on a server, and the application
(or client) resides on the user’s machine. The client directly communicates with the
database server.
• Components:
1. Client Application: Sends queries and receives responses from the DBMS.
2. Database Server: Manages the actual data storage and processing.
• Example: A desktop application connecting to a remote database server.
c) 3-Tier Architecture:
• Definition: In 3-tier architecture, an intermediate layer (application server or
business logic layer) exists between the client and the database server. It allows for
more scalability and flexibility.
• Components:
1. Presentation Layer: The client interface that users interact with.
2. Application Layer (Business Logic): Handles the business logic, processing the
client's request before sending it to the database.
3. Database Layer: The backend where data is stored and managed.
• Example: A web-based application where the user interface is a browser, the
application server processes the requests, and the database server stores the data.
4. Data Independence
Data independence refers to the capacity to change the database schema without affecting
the higher-level application programs. It is classified into two types:
a) Logical Data Independence:
• Definition: The ability to change the logical schema (e.g., adding new tables or fields)
without affecting the external schema (user views or application programs).
• Importance: This allows flexibility in the logical design and structure of the database
without disturbing how users access the data.
b) Physical Data Independence:
• Definition: The ability to change the physical storage structure (e.g., indexing,
storage devices) without affecting the logical schema or application programs.
• Importance: This allows for optimizations at the storage level without disrupting the
application-level functionality.
5. Integrity Constraints
Integrity constraints are rules that ensure the accuracy and consistency of data in a
database. They are enforced to maintain data quality and avoid anomalies.
a) Types of Integrity Constraints:
1. Domain Constraints:
o Definition: Specifies the permissible values that a column in a table can hold.
For example, the age field can only hold integer values between 0 and 150.
o Example: CHECK (age >= 0 AND age <= 150) ensures that the age is within the
valid range.
2. Entity Integrity Constraint:
o Definition: Ensures that every table has a primary key, and that the primary
key cannot be NULL. This guarantees that each record in the table is uniquely
identifiable.
o Example: A "Student ID" column in a student table must be unique and
cannot contain NULL.
3. Referential Integrity Constraint:
o Definition: Ensures that relationships between tables remain consistent. It
enforces that a foreign key in one table matches a valid primary key in
another table.
o Example: In an "Orders" table, the "Customer ID" field must refer to a valid
"Customer ID" in the "Customers" table.
4. Key Constraints:
o Definition: A rule that requires a set of attributes (columns) in a relation
(table) to uniquely identify a record. This is usually enforced using primary
keys and unique constraints.
o Example: A unique constraint on "Email" in a user table ensures that no two
users have the same email address.
5. NOT NULL Constraint:
o Definition: Ensures that a column cannot have NULL values. It is applied when
a column must have a value for every record.
o Example: A "Phone Number" column in a contacts table must not contain any
NULL values.
b) Importance of Integrity Constraints:
• Data Accuracy: Ensures that only valid and consistent data is entered into the
database.
• Data Consistency: Maintains the relationship between different tables, ensuring no
data anomalies.
• Prevents Data Redundancy: Constraints like primary keys and foreign keys help
eliminate redundancy and enforce relationships.
• Maintains Business Rules: Ensures that the data adheres to specific rules and
conditions defined by the business.
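As a minimal sketch, the constraint types above might be declared together in a single table definition (all table and column names here are illustrative, not from a specific schema):
CREATE TABLE students (
    student_id INT PRIMARY KEY,                 -- entity integrity: unique, non-NULL identifier
    email VARCHAR(100) UNIQUE,                  -- key constraint: no two students share an email
    phone_number VARCHAR(20) NOT NULL,          -- NOT NULL: every record must have a value
    age INT CHECK (age >= 0 AND age <= 150),    -- domain constraint on permissible values
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)  -- referential integrity
);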
Functional Dependencies
a) Definition:
A functional dependency (FD) is a relationship between two sets of attributes in a relation (table), where one set of attributes determines another. For example, StudentID → StudentName means that a StudentID value uniquely determines the StudentName. It is a fundamental concept used to identify redundancy in a database.
3. Partial Dependency: A non-prime attribute depends on only part of a composite primary key rather than on the whole key.
4. Transitive Dependency: A non-prime attribute depends on another non-prime attribute rather than directly on the primary key.
b) Importance:
• Functional dependencies are essential for normalization since they help identify redundancy and ensure the correctness of database design.
• They aid in eliminating anomalies like insertion, deletion, and update anomalies.
a) Definition of Normalization:
Normalization is the process of organizing data in a database to reduce redundancy and improve
data integrity. It involves dividing large tables into smaller, more manageable tables and defining
relationships between them. The goal is to eliminate undesirable characteristics like data anomalies
and data duplication.
b) Normal Forms:
There are several levels of normalization, called normal forms, each addressing specific types of
redundancy and anomalies:
1. First Normal Form (1NF):
o Requirements: The table should only contain atomic (indivisible) values. Repeating groups or arrays are not allowed.
o Example: A column containing multiple phone numbers should be split into separate rows, each with one phone number.
2. Second Normal Form (2NF):
o Requirements: The table must first satisfy 1NF and eliminate partial dependencies, meaning every non-prime attribute must depend on the entire primary key, not just a part of it.
o Example: If a table has a composite key (e.g., "OrderID, ProductID") and a column (e.g., "ProductName") depends only on "ProductID", then "ProductName" should be moved to a separate table.
3. Third Normal Form (3NF):
o Requirements: The table must satisfy 2NF and eliminate transitive dependencies, meaning no non-prime attribute should depend on another non-prime attribute.
4. Boyce-Codd Normal Form (BCNF):
o Requirements: The table must satisfy 3NF, and every determinant (the left side of a functional dependency) must be a candidate key.
o Example: If a non-candidate key column determines part of the primary key, BCNF suggests restructuring the table.
c) Importance of Normalization:
• Reduces Data Redundancy: By organizing data into multiple related tables, normalization
minimizes the duplication of data across the database.
• Improves Data Integrity: Ensures that the relationships between tables are consistent and
accurate, reducing the risk of inconsistencies.
3. Data Redundancy
a) Definition:
Data redundancy occurs when the same piece of data is stored in multiple places within a database.
This duplication of data leads to increased storage usage and the potential for data inconsistencies.
b) Problems Caused by Data Redundancy:
1. Increased Storage Costs: Storing the same data multiple times unnecessarily increases the storage required.
2. Data Inconsistency: Redundant data may lead to inconsistencies when different copies of the
same data are updated independently, causing mismatches.
o Example: If a customer's address is stored in multiple tables and updated in only one
place, the data becomes inconsistent.
3. Maintenance Issues: Managing and synchronizing redundant data requires more effort,
leading to complex and error-prone updates.
c) How Normalization Helps:
Normalization reduces data redundancy by breaking larger tables into smaller, related tables, ensuring that each piece of data is stored in only one place and can be referenced through foreign keys when necessary.
4. Update Anomalies
a) Definition:
Update anomalies occur when changes to the data are incorrectly or inefficiently propagated across
the database, leading to inconsistencies. These anomalies are often caused by poor database design
or data redundancy.
1. Insertion Anomaly:
o Definition: Inability to insert new data due to the absence of other related data.
o Example: If course details can only be recorded together with an enrolled student, a new course cannot be added until at least one student enrolls in it.
2. Deletion Anomaly:
o Definition: Deleting a record unintentionally removes other, unrelated data that was stored in the same row.
o Example: If deleting a student record from a table that also contains course details results in the loss of course information, this is a deletion anomaly.
3. Update Anomaly:
o Definition: Inconsistent data results when updating one instance of a piece of data while other instances remain unchanged.
o Example: If an employee's phone number is stored in several tables and updated in only one of them, the remaining copies become stale.
o Solution: Normalize the data to ensure the phone number is stored in only one table, with references (foreign keys) in other tables.
b) How Normalization Resolves Anomalies:
• Insertion Anomaly: By structuring data into independent tables, normalization ensures that new data can be inserted without needing unrelated data.
• Deletion Anomaly: Normalized tables ensure that deleting a record does not unintentionally
remove related but independent data.
• Update Anomaly: By removing data redundancy, normalization ensures that updates to data
are made in only one place, preventing inconsistencies.
Normal Forms
Normal forms are standards or guidelines for organizing databases to reduce redundancy and
prevent anomalies like insertion, deletion, and update issues. The goal is to structure data efficiently
and maintain data integrity. The most commonly used normal forms are First Normal Form (1NF),
Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF).
1. First Normal Form (1NF):
• Conditions: Every column must hold atomic (indivisible) values, with no repeating groups.
• Example: A table that stores several course names (or phone numbers) in a single cell violates 1NF due to repeating groups; each value must be moved to its own row.
2. Second Normal Form (2NF):
• Conditions:
1. It is in 1NF.
2. All non-key attributes are fully functionally dependent on the entire primary key (no partial dependencies).
• Example: Consider a table with the composite primary key (StudentID, CourseID):
StudentID CourseID CourseName
101 C101 Math
102 C102 Chemistry
• Here, CourseName depends only on CourseID (not the entire primary key StudentID, CourseID), which is a partial dependency. To convert this into 2NF, split the table as follows:
• Student-Course Table:
StudentID CourseID
101 C101
102 C102
• Course Table:
CourseID CourseName
C101 Math
C102 Chemistry
3. Third Normal Form (3NF):
• Conditions:
1. It is in 2NF.
2. No non-key attribute depends on another non-key attribute (no transitive dependencies).
• Example: Consider a table with primary key EmployeeID:
EmployeeID DepartmentID DepartmentName
201 D01 Sales
202 D02 HR
• Here, DepartmentName depends on DepartmentID, a non-key attribute, which is a transitive dependency. To convert this into 3NF, split the table as follows:
• Employee Table:
EmployeeID DepartmentID
201 D01
202 D02
• Department Table:
DepartmentID DepartmentName
D01 Sales
D02 HR
4. Boyce-Codd Normal Form (BCNF):
• Conditions:
1. It is in 3NF.
2. Every determinant is a candidate key.
• Explanation: BCNF is a stricter version of 3NF. It deals with certain types of anomalies that 3NF cannot handle. BCNF ensures that every determinant (the left side of the functional dependency) is a candidate key.
• Example: Consider a table with columns (Professor, Course, Department). Here, Professor determines Department, but Course also determines Department. This violates BCNF because Course is not a candidate key, yet it determines Department. To convert this table into BCNF, we split it as follows:
• Professor Table:
Professor Course
• Course-Department Table:
Course Department
Math Science
History Humanities
2. De-Normalization
a) Definition:
De-normalization is the process of combining tables that were separated during normalization to
improve database performance. It involves intentionally introducing redundancy into a database
schema to reduce the time taken to retrieve data by minimizing the number of joins required.
b) When to Use De-Normalization:
De-normalization is appropriate when:
• Performance is more critical than redundancy, and frequent joins across tables slow down query execution.
• Data retrieval needs to be optimized for read-heavy applications, such as reporting systems.
c) Example of De-Normalization:
Customer Table:
CustomerID CustomerName
C01 John
C02 Alice
Order Table: holds, for example, the columns OrderID, CustomerID, and ProductID.
Product Table: holds, for example, the columns ProductID, ProductName, and Price.
In a normalized database, retrieving the complete order details would require joining these three tables. To optimize for faster retrieval, we can de-normalize by combining them into a single table:
De-Normalized Table: a single table combining, for example, OrderID, CustomerName, ProductName, and Price.
d) Advantages of De-Normalization:
1. Improved Performance: Queries can retrieve data faster since fewer joins are needed.
2. Simplified Queries: Fewer joins result in simpler and more readable queries.
e) Disadvantages of De-Normalization:
1. Data Inconsistency: Because the same data is intentionally duplicated, copies can drift out of sync if they are not all updated together.
2. Update Anomalies: With data duplicated across multiple places, updating one piece of data (e.g., product price) requires updating it in multiple locations.
3. More Storage Required: Since the same data is repeated, de-normalization increases the
amount of storage used.
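To make the trade-off concrete, here is a sketch of the three-table join that de-normalization avoids, followed by the equivalent single-table read (table and column names follow the illustrative schema above):
-- Normalized design: three tables must be joined
SELECT c.CustomerName, p.ProductName, p.Price
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID
JOIN Products p ON p.ProductID = o.ProductID;

-- De-normalized design: one table, no joins
SELECT CustomerName, ProductName, Price
FROM DeNormalizedOrders;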
Introduction to SQL
a) What is SQL?
• SQL (Structured Query Language) is the standard language for defining, manipulating, and retrieving data in relational database systems.
• SQL is used for querying, updating, and managing data in relational databases like MySQL, PostgreSQL, SQL Server, and Oracle.
• It is a declarative language, meaning users specify what data they need without having to
specify how to retrieve it.
b) Uses of SQL:
• Data Retrieval: SQL allows you to query the database and retrieve data using SELECT statements.
• Data Manipulation: You can add, update, or delete records in a database using commands
like INSERT, UPDATE, and DELETE.
• Database Management: SQL lets you create, modify, and manage databases and their
structure using DDL commands (CREATE, ALTER, DROP).
• Security: SQL provides features like roles, permissions, and views to control who can access
and manipulate the data.
SQL Commands
SQL commands are categorized into several types based on their functionality. The main categories
are:
1. DDL (Data Definition Language): DDL commands deal with the structure or schema of the database and its objects like tables, indexes, and views.
• CREATE: Used to create new database objects (tables, views, indexes).
• ALTER: Used to modify the structure of an existing object.
• DROP: Used to delete an entire database object.
• TRUNCATE: Used to remove all records from a table without deleting the table itself.
2. DML (Data Manipulation Language): DML commands manage the data stored in tables.
• INSERT: Adds new records.
• UPDATE: Modifies existing records.
• DELETE: Removes records.
3. DCL (Data Control Language): DCL commands are used to control access to the data stored in a database.
• GRANT: Gives users access privileges.
• REVOKE: Removes previously granted privileges.
4. TCL (Transaction Control Language): TCL commands manage transactions.
• COMMIT: Makes the changes of a transaction permanent.
• ROLLBACK: Undoes the changes of an uncommitted transaction.
• SAVEPOINT: Sets a point within a transaction to which you can later roll back.
5. DQL (Data Query Language):
• SELECT: The only command in this category, used to query the database and retrieve data from one or more tables.
Data Types in SQL
SQL supports various data types, which are used to define the type of data that can be stored in each column of a table. These data types vary slightly between different SQL implementations, but the main categories are:
• INT or INTEGER: Used for whole numbers (e.g., INT, BIGINT, SMALLINT).
• DECIMAL(p, s) or NUMERIC(p, s): Used for fixed-point numbers, with precision p and scale s.
• CHAR(n) and VARCHAR(n): Used for fixed-length and variable-length character strings, respectively.
• DATE: Stores calendar dates (year, month, day).
• TIMESTAMP: Stores date and time, often used for tracking a record's creation or modification time.
• BLOB (Binary Large Object): Used to store binary data such as images, videos, or audio files.
DDL Statements
DDL (Data Definition Language) consists of commands that define and modify the structure of
database objects like tables, indexes, and views. The key DDL commands are CREATE, ALTER, DROP,
and TRUNCATE.
1. CREATE Statement
The CREATE statement is used to create a new database object, such as a table, view, or index.
• Syntax:
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    ...
);
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    salary DECIMAL(10, 2),
    hire_date DATE
);
This creates an employees table with 5 columns: employee_id, first_name, last_name, salary, and hire_date.
2. ALTER Statement
The ALTER statement is used to modify the structure of an existing database object. It can be used to
add, modify, or drop columns.
• Syntax:
ALTER TABLE table_name
ADD column_name datatype;

ALTER TABLE table_name
MODIFY column_name datatype;

ALTER TABLE table_name
DROP column_name;
(MODIFY is the MySQL form; some systems use ALTER COLUMN instead.)
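For instance, a sketch of adding a column to the employees table from the CREATE example (the email column is illustrative):
ALTER TABLE employees
ADD email VARCHAR(100);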
3. DROP Statement
The DROP statement is used to delete an entire database object, such as a table or a view.
• Syntax:
DROP TABLE table_name;
• Example:
DROP TABLE employees;
This deletes the employees table, along with all the data stored in it.
4. TRUNCATE Statement
The TRUNCATE statement is used to remove all records from a table, but it does not delete the table
itself. Unlike DELETE, it is faster and cannot be rolled back.
• Syntax:
TRUNCATE TABLE table_name;
• Example:
TRUNCATE TABLE employees;
This removes all records from the employees table, but the table structure remains.
• DELETE removes rows one at a time, can be filtered with a WHERE clause, and can be rolled back; TRUNCATE removes all data from the table but retains the structure for future use.
Summary
• SQL commands are categorized into DML (for data manipulation), DDL (for defining data
structures), DCL (for controlling access), and TCL (for managing transactions).
• Data types help define the type of data that can be stored in a table, ensuring data integrity.
• DDL statements (CREATE, ALTER, DROP, and TRUNCATE) are used to define and modify
database objects like tables.
DML Statements
DML statements in SQL are used to manage and manipulate the data stored in the database. The most commonly used DML commands are INSERT, UPDATE, and DELETE.
1. INSERT Statement
The INSERT statement is used to add new records to a table. You can insert a single record or
multiple records into a table.
a) Basic Syntax:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);
• Example:
INSERT INTO employees (employee_id, first_name, last_name, department_id)
VALUES (101, 'John', 'Doe', 10);
This inserts a new row with employee_id as 101, first_name as 'John', last_name as 'Doe', and department_id as 10 into the employees table.
Multiple rows can be inserted in a single statement (values illustrative):
INSERT INTO employees (employee_id, first_name, last_name, department_id)
VALUES
    (102, 'Jane', 'Smith', 20),
    (103, 'Sam', 'Brown', 10);
2. UPDATE Statement
The UPDATE statement is used to modify existing data in a table. You can update one or more
records based on a specific condition.
a) Basic Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
• Example:
UPDATE employees
SET department_id = 15
WHERE employee_id = 101;
This updates the department_id for the employee with employee_id 101 to 15.
To update multiple columns at once (new values illustrative):
UPDATE employees
SET first_name = 'Jane', last_name = 'Smith'
WHERE employee_id = 101;
This updates the first_name and last_name of the employee with employee_id 101.
3. DELETE Statement
The DELETE statement is used to remove records from a table. The deletion is permanent unless
rolled back using a transaction control command.
a) Basic Syntax:
DELETE FROM table_name
WHERE condition;
• Example:
DELETE FROM employees;
Without a WHERE clause, this removes all rows from the employees table, but keeps the table structure intact.
WHERE Clause
The WHERE clause is used to filter records based on specified conditions. It can be applied to SELECT, UPDATE, and DELETE statements.
a) Basic Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition;
• Example:
SELECT *
FROM employees
WHERE department_id = 10;
This selects all records from the employees table where the department_id is 10.
b) Comparison Operators:
• =: Equal to
• <> or !=: Not equal to
• >, <, >=, <=: Greater than, less than, greater than or equal to, less than or equal to
• LIKE: Pattern matching using the wildcards % and _
• Example:
SELECT *
FROM employees
WHERE first_name LIKE 'J%';
This selects all employees whose first name starts with the letter "J".
Compound WHERE Clause
The compound WHERE clause allows you to filter records based on multiple conditions using the logical operators AND and OR.
a) AND Operator:
• Used to combine multiple conditions where all conditions must be true for the record to be
selected.
• Example:
SELECT *
FROM employees
WHERE department_id = 10 AND salary > 60000;
This selects all employees who work in department 10 and have a salary greater than 60,000.
b) OR Operator:
• Used to combine multiple conditions where at least one condition must be true for the
record to be selected.
• Example:
SELECT *
FROM employees
WHERE department_id = 10 OR department_id = 20;
This selects all employees who work in either department 10 or department 20.
c) Combining AND and OR:
You can combine AND and OR in a compound WHERE clause. Be mindful of parentheses to ensure the correct logic is applied.
• Example:
SELECT *
FROM employees
WHERE (department_id = 10 OR department_id = 20) AND salary > 50000;
This selects employees who work in either department 10 or 20 and have a salary greater than 50,000.
Joins in SQL
Joins are used to retrieve data from multiple tables based on related columns between them. There
are several types of joins:
a) Inner Join:
• An Inner Join returns only the rows that have matching values in both tables.
• Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
• Example:
SELECT employees.first_name, departments.department_name
FROM employees
INNER JOIN departments
ON employees.department_id = departments.department_id;
This selects the first name of employees and their corresponding department name where the department_id matches in both tables.
b) Left Join:
• A Left Join returns all rows from the left table, and the matching rows from the right table. If there is no match, NULL values are returned for columns from the right table.
• Syntax:
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
• Example:
SELECT employees.first_name, departments.department_name
FROM employees
LEFT JOIN departments
ON employees.department_id = departments.department_id;
This returns all employees, even if they don’t belong to a department. For employees without a
department, department_name will be NULL.
c) Right Join:
• A Right Join returns all rows from the right table, and the matching rows from the left table. If there is no match, NULL values are returned for columns from the left table.
• Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
• Example:
SELECT employees.first_name, departments.department_name
FROM employees
RIGHT JOIN departments
ON employees.department_id = departments.department_id;
This selects all departments and their employees, even if some departments don't have employees.
The employee columns will contain NULL if there are no matches.
d) Full Join:
• A Full Join returns all rows from both tables, with NULLs in places where there is no match.
• Syntax:
SELECT columns
FROM table1
FULL JOIN table2
ON table1.column = table2.column;
• Example:
SELECT employees.first_name, departments.department_name
FROM employees
FULL JOIN departments
ON employees.department_id = departments.department_id;
This selects all employees and departments, and displays NULL where there is no match.
Summary
• INSERT, UPDATE, and DELETE are key DML commands to manipulate data in a table.
• Compound WHERE clauses can combine multiple conditions using AND and OR.
• Joins are used to retrieve data from multiple related tables, with common types being Inner
Join, Left Join, Right Join, and Full Join.
Sub-queries - Simple & Correlated Using IN, EXISTS, NOT EXISTS
Sub-queries in SQL
A sub-query (or nested query) is a query within another SQL query. Sub-queries can be used to
perform operations that require multiple steps or to return data that will be used in the main query.
They can be classified into two main types: Simple Sub-queries and Correlated Sub-queries.
1. Simple Sub-queries
A simple sub-query is a standalone query that can be executed independently of the main query. It
typically returns a single value or a set of values that can be used in the main query.
Using IN Operator
The IN operator allows you to specify multiple values in a WHERE clause. You can use a simple sub-
query with the IN operator to filter records based on the results of the sub-query.
• Syntax:
SELECT column1, column2
FROM table1
WHERE column_name IN (SELECT column_name FROM table2 WHERE condition);
• Example:
SELECT first_name, last_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE department_name = 'Sales');
In this example, the sub-query retrieves department_id values from the departments table where department_name is 'Sales', and the main query selects employees whose department_id matches those values.
Using EXISTS Operator
The EXISTS operator is used to test for the existence of any record in a sub-query. It returns TRUE if the sub-query returns one or more records, and FALSE if it returns no records.
• Syntax:
SELECT column1
FROM table1
WHERE EXISTS (SELECT 1 FROM table2 WHERE condition);
• Example:
SELECT first_name, last_name
FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d WHERE d.department_id = e.department_id);
Here, the main query selects employees for whom there is a matching record in the departments table based on department_id.
Using NOT EXISTS Operator
The NOT EXISTS operator is the opposite of EXISTS. It returns TRUE if the sub-query returns no records.
• Syntax:
SELECT column1
FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE condition);
• Example:
SELECT first_name, last_name
FROM employees e
WHERE NOT EXISTS (SELECT 1 FROM departments d WHERE d.department_id = e.department_id);
This example retrieves employees who do not belong to any department listed in the departments table.
2. Correlated Sub-queries
A correlated sub-query is a sub-query that refers to columns from the outer query. This means that
the sub-query cannot be executed independently of the outer query because it relies on the outer
query for its values.
• Syntax:
SELECT column1, column2
FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.column = t1.column);
• Example:
SELECT e.first_name, e.last_name
FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d WHERE d.manager_id = e.employee_id);
In this example, the inner query filters departments based on the manager_id from the outer employees table, so the query returns employees who manage a department (column names are illustrative).
• Another Example:
SELECT e.first_name, e.last_name
FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d
              WHERE d.department_id = e.department_id
                AND d.location = 'New York');
This retrieves employees who are part of departments located in New York (the location column is illustrative).
Summary of Sub-queries
Simple sub-queries can run on their own and are evaluated once, while correlated sub-queries reference columns of the outer query and are logically evaluated once per outer row. IN filters against a list of values, EXISTS tests whether any matching row exists, and NOT EXISTS tests that none does.
DCL Statements
DCL (Data Control Language) statements are used to control access to data in a database. The primary DCL commands are GRANT and REVOKE, which manage permissions and access rights.
1. GRANT Statement
The GRANT statement is used to give users access privileges to database objects such as tables,
views, and procedures.
Syntax:
GRANT privilege_type
ON object_name
TO user_name;
• Example:
GRANT SELECT, INSERT
ON employees
TO user1;
This example grants the SELECT and INSERT privileges on the employees table to user1.
To grant all privileges on the table:
GRANT ALL PRIVILEGES
ON employees
TO user1;
2. REVOKE Statement
The REVOKE statement is used to remove previously granted privileges from users.
Syntax:
REVOKE privilege_type
ON object_name
FROM user_name;
• Example:
REVOKE INSERT
ON employees
FROM user1;
This example removes the INSERT privilege on the employees table from user1.
To revoke all privileges:
REVOKE ALL PRIVILEGES
ON employees
FROM user1;
GROUP BY Clause
The GROUP BY clause is used in collaboration with aggregate functions (like COUNT, SUM, AVG, etc.)
to group the result set by one or more columns.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1;
Example:
SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;
This query counts the number of employees in each department, grouping the results by department_id.
Grouping by Multiple Columns (the job_title column is illustrative):
SELECT department_id, job_title, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id, job_title;
HAVING Clause
The HAVING clause is used to filter summarized (grouped) results and is often used in conjunction with the GROUP BY clause. It is similar to the WHERE clause, but HAVING is applied after grouping.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1
HAVING condition;
Example:
SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;
This query retrieves departments that have more than 5 employees.
You can use logical operators in the HAVING clause to apply multiple conditions:
SELECT department_id, AVG(salary) AS avg_salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 60000 AND COUNT(*) > 5;
This query retrieves departments with an average salary greater than 60,000 and more than 5 employees.
Views in SQL
A view is a virtual table in a database that is based on the result set of a SQL query. It does not store the data itself but provides a way to represent data from one or more tables in a specific format.
1. Creating Views
You can create a view using the CREATE VIEW statement. This statement allows you to define a view
based on a SELECT query.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;
Example:
CREATE VIEW employee_view AS
SELECT first_name, last_name
FROM employees
WHERE status = 'active';
In this example, employee_view is a view that includes only active employees from the employees table (the status column is illustrative).
2. Benefits of Views
• Data Abstraction: Views provide a way to present data to users in a simplified manner
without exposing the underlying table structure.
• Security: You can grant access to views rather than the underlying tables, limiting users'
access to specific data.
• Simplified Queries: Views can encapsulate complex queries, making it easier for users to
retrieve data without needing to know the underlying query structure.
• Consistency: Views ensure that users see a consistent dataset, even if the underlying tables
change.
• Join and Aggregate Data: Views can combine data from multiple tables, allowing for more
complex reporting and analysis without changing the underlying data structure.
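As a sketch of that last point, a view could combine the employees and departments tables used throughout these notes (column names assumed):
CREATE VIEW department_summary AS
SELECT d.department_name,
       COUNT(e.employee_id) AS employee_count,
       AVG(e.salary) AS avg_salary
FROM departments d
LEFT JOIN employees e ON e.department_id = d.department_id
GROUP BY d.department_name;
Users can then simply run SELECT * FROM department_summary; without knowing the underlying join.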
3. Altering Views
You can modify an existing view using the CREATE OR REPLACE VIEW statement. This allows you to
redefine the view with a new query.
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2
FROM new_table_name
WHERE new_condition;
Example:
CREATE OR REPLACE VIEW employee_view AS
SELECT first_name, last_name, hire_date
FROM employees
WHERE hire_date > '2020-01-01';
This example updates the employee_view to include the hire_date column and filters for employees hired after January 1, 2020.
4. Dropping Views
You can remove a view from the database using the DROP VIEW statement. This permanently deletes
the view definition.
Syntax:
DROP VIEW view_name;
Example:
DROP VIEW employee_view;
In this example, the employee_view is dropped from the database, and it will no longer be available
for use.
Summary of Views
Creating Defines a virtual table based CREATE VIEW employee_view AS SELECT first_name,
Views on a SELECT query. last_name FROM employees;
- Data abstraction
- Security
Benefits of
- Simplified queries N/A
Views
- Consistency
- Join and aggregate data
Joins (inner join, outer join, cross join, self join), write
complex queries using joins
Joins in SQL
Joins are used in SQL to combine rows from two or more tables based on a related column between
them. There are several types of joins, including INNER JOIN, OUTER JOIN, CROSS JOIN, and SELF
JOIN.
1. INNER JOIN
The INNER JOIN keyword selects records that have matching values in both tables.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;
Example:
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;
This query retrieves employee names along with their department names for those employees who belong to a department.
2. OUTER JOIN
OUTER JOIN can be classified into three types: LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
• LEFT JOIN: Returns all records from the left table and matched records from the right table; if
no match, NULL values are returned for right table columns.
Syntax:
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;
Example:
SELECT e.first_name, d.department_name
FROM employees e
LEFT JOIN departments d
ON e.department_id = d.department_id;
This query retrieves all employees and their department names. If an employee does not belong to any department, the department_name will be NULL.
• RIGHT JOIN: Returns all records from the right table and matched records from the left table;
if no match, NULL values are returned for left table columns.
Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.common_column = table2.common_column;
Example:
SELECT e.first_name, d.department_name
FROM employees e
RIGHT JOIN departments d
ON e.department_id = d.department_id;
This query retrieves all departments and the employees working in them. If a department has no employees, the employee columns will be NULL.
• FULL OUTER JOIN: Returns all records when there is a match in either left or right table
records.
Syntax:
SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.common_column = table2.common_column;
Example:
SELECT e.first_name, d.department_name
FROM employees e
FULL OUTER JOIN departments d
ON e.department_id = d.department_id;
This query retrieves all employees and departments, including employees without departments and departments without employees. (Note: MySQL does not support FULL OUTER JOIN directly; it can be emulated with a UNION of a LEFT and a RIGHT join.)
3. CROSS JOIN
The CROSS JOIN returns the Cartesian product of two tables, meaning it will return all possible
combinations of rows from both tables.
Syntax:
SELECT columns
FROM table1
CROSS JOIN table2;
Example:
SELECT e.first_name, d.department_name
FROM employees e
CROSS JOIN departments d;
This query retrieves every employee's name combined with every department name.
4. SELF JOIN
A SELF JOIN is a regular join but the table is joined with itself. It is often used to compare rows within
the same table.
Syntax:
SELECT a.columns, b.columns
FROM table_name a
JOIN table_name b
ON a.column = b.column
WHERE condition;
Example:
SELECT e1.first_name AS employee, e2.first_name AS manager
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;
This query retrieves employee names along with their managers' names from the same employees table (the manager_id column, referencing employee_id, is illustrative).
Here are a few complex queries that utilize different types of joins (the projects table and the location column are illustrative):
1. Employees, their departments, and their projects:
SELECT e.first_name, d.department_name, p.project_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
INNER JOIN projects p ON p.employee_id = e.employee_id
WHERE d.location = 'New York';
This query retrieves employees from departments located in New York and the projects they are working on.
2. Employee counts per department, including empty departments:
SELECT d.department_name, COUNT(e.employee_id) AS employee_count
FROM departments d
LEFT JOIN employees e ON e.department_id = d.department_id
GROUP BY d.department_name;
This query counts the number of employees in each department, including departments that have no employees.
3. Departments with a high average salary:
SELECT d.department_name, AVG(e.salary) AS avg_salary
FROM departments d
INNER JOIN employees e ON e.department_id = d.department_id
GROUP BY d.department_name
HAVING AVG(e.salary) > 50000;
This query retrieves departments with an average salary greater than 50,000.
4. CROSS JOIN with filtering:
SELECT e.first_name, d.department_name
FROM employees e
CROSS JOIN departments d
WHERE d.location = 'California';
This query retrieves employee names and department names for those departments located in California, showcasing a combination of CROSS JOIN and filtering.
Summary of Joins
Type of
Description Example
Join
Returns all rows from the left SELECT e.first_name, d.department_name FROM
LEFT JOIN table and matched rows from the employees e LEFT JOIN departments d ON
right table; NULLs if no match. e.department_id = d.department_id;
Type of
Description Example
Join
Returns all rows from the right SELECT e.first_name, d.department_name FROM
RIGHT
table and matched rows from the employees e RIGHT JOIN departments d ON
JOIN
left table; NULLs if no match. e.department_id = d.department_id;
FULL Returns all rows when there is a SELECT e.first_name, d.department_name FROM
OUTER match in either left or right table employees e FULL OUTER JOIN departments d ON
JOIN records. e.department_id = d.department_id;
Stored Programs in MySQL
Stored Programs in MySQL are routines that are stored in the database and can be executed by calling them. They help encapsulate complex operations and logic within the database, allowing for more efficient and maintainable code. Stored programs include:
1. Stored Procedures: These are collections of SQL statements that can be executed as a single
unit. They can accept parameters, perform operations, and return results.
2. Stored Functions: These are similar to stored procedures but are used to compute and
return a single value.
3. Triggers: These are special types of stored programs that automatically execute in response
to certain events on a specified table, such as INSERT, UPDATE, or DELETE.
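As a sketch of the third kind, a MySQL trigger that stamps a row whenever it is updated might look like this (it assumes the employees table has a last_modified column):
DELIMITER //
CREATE TRIGGER before_employee_update
BEFORE UPDATE ON employees
FOR EACH ROW
BEGIN
    SET NEW.last_modified = NOW();  -- record when the row was changed
END //
DELIMITER ;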
Stored programs provide several benefits that enhance database management and application
development:
1. Improved Performance:
o Reduced Network Traffic: Since the logic is stored in the database, multiple SQL
statements can be executed with a single call, minimizing the amount of data sent
over the network.
o Execution Plan Reuse: Stored programs allow the database to reuse execution plans,
which can speed up query processing.
2. Enhanced Security:
o Data Validation: Logic can be centralized within stored programs, ensuring data
integrity and validation before operations are performed.
3. Maintainability and Reusability:
o Code Reusability: Common logic can be written once in a stored program and reused across multiple applications, reducing code duplication and maintenance efforts.
o Easier Updates: Changes can be made to the stored program without altering the application code, simplifying updates and maintenance.
4. Consistency:
o Centralized Business Logic: Business rules can be enforced at the database level, ensuring consistency across different applications that access the database.
5. Error Handling:
o Centralized Handling: Errors can be caught and handled inside the routine (for example, with MySQL condition handlers), keeping error logic close to the data it concerns.
6. Complex Operations:
o Support for Complex Logic: Stored procedures can include control flow structures
(such as loops and conditional statements), enabling complex business logic to be
implemented directly in the database.
7. Scheduled Tasks:
o Automation: Stored programs, especially events, can automate routine tasks (like
data archiving or cleanup) at specified intervals without manual intervention.
Example: a minimal stored procedure (the procedure name and column list are illustrative):
DELIMITER //
CREATE PROCEDURE GetEmployees()
BEGIN
    SELECT first_name, last_name
    FROM employees;
END //
DELIMITER ;
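Once created, the procedure is executed with CALL:
CALL GetEmployees();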
Constraints in SQL
In SQL, constraints are rules applied to columns in a table to ensure the integrity, accuracy, and consistency of data. Constraints are critical to maintain the quality of the data stored in the database. They can be applied when creating or altering a table. The main types of constraints are:
• Primary Key
• Foreign Key
• Unique
• Not Null
• Default
• Check
1. Primary Key Constraint
A Primary Key is a column (or a combination of columns) that uniquely identifies each row in a table. Each table can have only one primary key, and it ensures that:
• Every value in the key column(s) is unique.
• The key column(s) can never contain NULL.
• Syntax:
CREATE TABLE table_name (
    column_name1 datatype PRIMARY KEY,
    column_name2 datatype,
    ...
);
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);
This creates an employees table where employee_id is the primary key, meaning each employee must have a unique employee_id, and it cannot be NULL.
• Composite Primary Key: A primary key can be a combination of more than one column, known as a composite key (the order_items table name is illustrative):
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    PRIMARY KEY (order_id, product_id)
);
2. Foreign Key Constraint
A Foreign Key is a column or set of columns in a table that establishes a link between data in two tables. It enforces referential integrity by ensuring that values in the foreign key column must match values in the primary key column of another table.
• Syntax:
CREATE TABLE table_name (
    column_name datatype,
    column_name2 datatype,
    FOREIGN KEY (column_name) REFERENCES parent_table(parent_column)
);
• Example:
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50)
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
Here, department_id in the employees table is a foreign key that references the department_id in
the departments table. This ensures that department_id in employees must have a corresponding
value in the departments table.
3. Unique Constraint
A Unique constraint ensures that all values in a column (or combination of columns) are distinct. It
allows NULL values, unlike the primary key constraint, but still ensures uniqueness.
• Syntax:
CREATE TABLE table_name (
    column_name datatype UNIQUE
);
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE,
    first_name VARCHAR(50)
);
In this example, the email column must contain unique values for each employee, but it can contain NULL values if necessary.
• Composite Unique Constraint:
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    UNIQUE (order_id, product_id)
);
This ensures that the combination of order_id and product_id must be unique.
4. Not Null Constraint
A Not Null constraint ensures that a column cannot contain a NULL value. By default, columns in a table can contain NULL unless this constraint is applied.
• Syntax:
CREATE TABLE table_name (
    column_name datatype NOT NULL
);
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);
In this example, the first_name and last_name columns must contain values and cannot be left NULL.
5. Default Constraint
A Default constraint provides a default value for a column if no value is provided when inserting
data. This helps to ensure that every column has a value, even if the user doesn't explicitly provide
one.
• Syntax:
CREATE TABLE table_name (
    column_name datatype DEFAULT default_value
);
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    hire_date DATE DEFAULT (CURRENT_DATE)
);
In this case, if no hire_date is provided when a new employee is inserted, the default value will be the current date. (The exact syntax for a current-date default varies by DBMS; MySQL 8.0.13+ accepts the parenthesized expression shown.)
6. Check Constraint
A Check constraint ensures that all values in a column satisfy a specific condition. This allows you to
enforce additional rules on the data that can be inserted.
• Syntax:
CREATE TABLE table_name (
    column_name datatype CHECK (condition)
);
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    salary DECIMAL(10, 2) CHECK (salary > 0)
);
In this example, the salary column must have a value greater than 0. Any attempt to insert or update
a salary with a value less than or equal to 0 will result in an error.
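For instance, with the table above, an insert that violates the condition would be rejected (in systems that enforce CHECK, such as MySQL 8.0.16 and later; values illustrative):
INSERT INTO employees (employee_id, first_name, salary)
VALUES (1, 'John', -100);  -- fails: salary must be greater than 0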
Summary of Constraints
• Primary Key: Uniquely identifies each row in a table and ensures no NULL values.
• Foreign Key: Ensures referential integrity between two tables by linking one column to another table's primary key.
• Unique: Ensures all values in a column or combination of columns are unique, allowing NULL values.
• Not Null: Ensures that a column cannot contain NULL values.
• Default: Supplies a default value for a column when none is provided.
• Check: Ensures that all values in a column satisfy a specified condition.
Introduction to Cursors
A cursor is a database object used to retrieve, manipulate, and
navigate through the result set (the collection of rows) retrieved from
a query. Cursors allow row-by-row processing of the data, giving
developers more control over how records are accessed and
modified. Unlike simple SQL queries that process all rows
simultaneously, cursors provide mechanisms to traverse through
each record one by one.
Types of Cursors
1. Implicit Cursor:
o Automatically created by the database system for single
row queries or SELECT statements that return only one
row.
o They are managed internally, and developers typically
don't need to define or open them explicitly.
2. Explicit Cursor:
o Defined explicitly by the developer to handle multiple
rows returned by a query.
o Must be declared, opened, fetched, and closed manually.
Types of explicit cursors include:
o Static Cursor: The result set is determined when the
cursor is opened and cannot be changed during its
lifetime.
o Dynamic Cursor: Reflects changes made to the rows in the
result set (like insertions or updates) while the cursor is
open.
o Forward-only Cursor: Allows fetching rows in one
direction only (from the first to the last row).
o Scroll Cursor: Allows moving both forward and backward
through the result set.
3. Cursor for Loops:
o Simplifies the usage of explicit cursors by automatically
opening, fetching, and closing the cursor during iteration.
4. Parameterised Cursor:
o Accepts parameters, allowing flexibility in fetching data
based on varying inputs.
Advantages of Cursors
1. Row-by-row Processing:
o Cursors provide fine-grained control, allowing operations
on individual rows. This is useful for complex business
logic that requires iterative processing.
2. Handling Large Datasets:
o Instead of loading all data into memory, cursors handle
records one at a time, making it feasible to work with
large result sets.
3. Complex Processing:
o Ideal for scenarios where you need to perform complex
operations on each row, which would be difficult to
achieve with set-based operations.
4. Custom Navigation:
o Cursors allow moving forward or backward through a
result set, skipping rows, or even fetching specific rows
multiple times.
Disadvantages of Cursors
1. Performance Overhead:
o Cursors tend to be slower than set-based operations
because they process data row by row, which can be
inefficient for large datasets.
2. Resource-Intensive:
o Cursors consume more memory and CPU resources since
they keep a lock on the result set and maintain pointers
for navigation.
3. Concurrency Issues:
o Long-running cursors can lead to locking problems, where
other transactions cannot access the same data until the
cursor is closed.
4. Reduced Scalability:
o Due to the row-by-row processing nature, cursors do not
scale well with large volumes of data, making them
unsuitable for high-performance applications.
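The following is a minimal sketch of an explicit cursor inside a MySQL stored procedure; the employees table, its columns, and the 5% raise are all illustrative assumptions:
DELIMITER //
CREATE PROCEDURE raise_low_salaries()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE emp_id INT;
    -- explicit cursor over the rows to be processed one by one
    DECLARE cur CURSOR FOR
        SELECT employee_id FROM employees WHERE salary < 30000;
    -- handler fires when FETCH runs out of rows
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO emp_id;
        IF done THEN
            LEAVE read_loop;
        END IF;
        UPDATE employees SET salary = salary * 1.05
        WHERE employee_id = emp_id;
    END LOOP;
    CLOSE cur;
END //
DELIMITER ;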
ACID Properties - Example
Let's consider a banking transaction as an example of the four ACID properties (Atomicity, Consistency, Isolation, Durability) in action:
properties in action:
Transaction: Transfer $500 from Account X to Account Y.
• Atomicity: The transaction should either debit $500 from
Account X and credit $500 to Account Y, or neither
operation should happen. If there’s an error, the
transaction is rolled back.
• Consistency: The total balance of both accounts before
and after the transaction should remain unchanged. If
Account X has $1,000 and Account Y has $2,000, after
the transaction, Account X should have $500, and
Account Y should have $2,500.
• Isolation: If two users are transferring money
simultaneously, one transaction should not affect the
outcome of the other. For example, if User 1 transfers
$500 from Account X to Y, User 2 should not see an
intermediate state where only $500 has been deducted
from X but not yet added to Y.
• Durability: Once the transaction is committed, it should
be saved in the database, even in the case of a system
crash or power failure. After the transaction is
completed, the new balances of both accounts should
be retained in the database.
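In SQL, the transfer could be wrapped in a transaction like this sketch (an accounts table with account_id and balance columns is assumed):
START TRANSACTION;

UPDATE accounts SET balance = balance - 500 WHERE account_id = 'X';  -- debit Account X
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'Y';  -- credit Account Y

COMMIT;  -- atomicity: both updates become permanent together; ROLLBACK would undo both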
Concurrency Control
1. Concurrent Transactions
• Definition: Concurrent transactions refer to the
execution of multiple transactions at the same time in a
database system. Each transaction may involve read and
write operations on the shared data.
• Challenge: When multiple transactions run
simultaneously, there is a potential for conflicts like lost
updates, dirty reads, uncommitted data overwrites,
and inconsistent data.
2. Need for Concurrency Control
• Concurrency Control: It refers to the techniques and
mechanisms used to ensure correct execution of
transactions in a concurrent environment while
maintaining database consistency and integrity.
• Problems without Concurrency Control:
o Lost Update: Occurs when two transactions
simultaneously read and update the same data,
causing one update to be overwritten by another.
o Dirty Read: A transaction reads data that has been
written by another uncommitted transaction,
leading to incorrect results if the uncommitted
transaction rolls back.
o Unrepeatable Read: A transaction reads the same
data twice and gets different results because
another transaction has modified the data between
reads.
o Phantom Reads: A transaction retrieves a set of
rows based on a condition, and later in the same
transaction, re-executes the query and finds
additional rows inserted by another transaction.
3. Locking Techniques
Locking is a widely used concurrency control mechanism that
ensures that data used by one transaction is not
simultaneously used by another in a way that could lead to
inconsistencies.
a) Types of Locks:
• Shared Lock (S-Lock): Acquired by transactions that are
only reading data. Multiple transactions can hold a
shared lock on the same data, allowing for concurrent
reads.
• Exclusive Lock (X-Lock): Acquired by transactions that
need to modify the data. Only one transaction can hold
an exclusive lock at any time, preventing other
transactions from accessing the data.
b) Lock Granularity:
• Row-level Lock: Locks only a specific row in a table,
providing high concurrency but higher locking overhead.
• Page-level Lock: Locks a disk page, which could contain
multiple rows. This provides a balance between
concurrency and overhead.
• Table-level Lock: Locks the entire table, ensuring low
overhead but limiting concurrency.
c) Two-Phase Locking Protocol (2PL):
• Growing Phase: A transaction can acquire locks but not
release any.
• Shrinking Phase: Once the transaction starts releasing
locks, it cannot acquire any more locks.
• Purpose: Guarantees serializability (ensuring that
transactions execute as if they were in serial order),
preventing lost updates and dirty reads.
d) Locking Schemes:
1. Pessimistic Locking: Assumes conflicts will occur and
locks data before accessing it. Suitable for high-
contention environments.
2. Optimistic Locking: Assumes conflicts are rare and does
not lock data when reading but checks for conflicts
before committing the transaction. Suitable for low-
contention environments.
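As a sketch of pessimistic locking in MySQL/InnoDB, a transaction can take an exclusive row lock before modifying the row (reusing the illustrative accounts table from the ACID example):
START TRANSACTION;

-- acquires an exclusive lock on the row; other transactions block until COMMIT
SELECT balance FROM accounts WHERE account_id = 'X' FOR UPDATE;

UPDATE accounts SET balance = balance - 500 WHERE account_id = 'X';

COMMIT;  -- releases the lock
Optimistic locking, by contrast, is usually implemented with a version or timestamp column that is re-checked at commit time instead of locking up front.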
e) Deadlocks:
• Definition: A deadlock occurs when two or more
transactions are waiting for each other to release locks,
creating a cycle of dependencies that cannot be
resolved.
• Prevention Techniques:
o Wait-Die Scheme: Older transactions wait for
younger ones to release locks, while younger ones
are aborted.
o Wound-Wait Scheme: Older transactions force
younger ones to abort, while younger ones must
wait for older ones.
o Timeouts: Abort transactions that are waiting for
too long.
4. Deadlock Detection and Recovery:
• Detection: Periodically checking for cycles in the wait-for
graph (a graph where nodes represent transactions, and
edges represent dependencies).
• Recovery: Once a deadlock is detected, one of the
involved transactions is rolled back to break the cycle.
5. Other Concurrency Control Techniques:
• Timestamp-based Protocols: Transactions are ordered
based on their timestamps to ensure consistency.
Transactions with earlier timestamps get priority.
• Validation-based Protocols: Transactions proceed
without locks but are validated at the end. If they pass
validation, they are committed; otherwise, they are
rolled back.
Recovery in DBMS
Recovery mechanisms are needed for the following reasons:
• Data Integrity: In the event of system failures, such as crashes or power outages, recovery ensures that the database is restored to a consistent state.
• Durability (ACID Property): The results of committed transactions must be preserved, even
in case of failures.
• Handling Transaction Failures: When a transaction fails due to errors, the changes made by
the transaction should be undone to prevent inconsistencies.
• Prevent Data Loss: Recovery mechanisms help avoid data loss in case of unexpected failures,
ensuring minimal downtime and data restoration.
3. Types of Errors
Several types of errors can affect the database, making recovery mechanisms necessary:
a) Transaction Failure:
• Reason: Occurs when a transaction cannot complete successfully due to errors like invalid
inputs, deadlock, or insufficient resources.
b) System Crash:
• Reason: Caused by hardware failures, power outages, or operating system crashes. The
system stops abruptly, potentially leaving transactions incomplete.
• Impact: Requires recovering from the last consistent state before the crash.
c) Media Failure:
• Reason: Occurs due to physical issues like hard disk crashes or data corruption. The storage
media holding the database may become unavailable.
d) Logical Errors:
• Reason: Caused by software bugs, human errors (such as accidental deletion), or corruption
of data structures within the database.
• Impact: Data corruption may spread if not identified early, requiring partial or full recovery
of data.
4. Recovery Techniques
To handle various types of errors and failures, databases implement several recovery techniques:
a) Log-Based Recovery:
• Concept: Uses logs to keep a record of all the transactions that modify the database. The log
contains information like transaction start, data changes, and commit status.
• Steps:
1. Write-Ahead Logging (WAL): Before making changes to the database, the log is
updated. This ensures that no change is applied before it is recorded.
2. Undo (Rollback): If a transaction fails or is aborted, the database is rolled back to its
original state using the log.
3. Redo (Roll Forward): If a transaction was committed before a failure but changes
weren't reflected in the database, the log helps apply these changes during recovery.
b) Checkpoints:
• Concept: A checkpoint is a saved point in time when the database state is considered
consistent. All the transactions completed before the checkpoint are permanently saved in
the database.
• Benefits:
o Limits the amount of log information that needs to be processed during recovery.
c) Deferred Update:
• Concept: Changes made by a transaction are not applied to the database until the transaction commits. If a transaction fails, no changes are written, simplifying recovery.
• Advantages: It eliminates the need to undo changes for failed transactions, as no changes
are made until the commit point.
d) Immediate Update:
• Concept: The database is updated immediately as a transaction makes changes, but the
changes are also recorded in the log. If a failure occurs, uncommitted changes are undone.
• Steps:
1. After a failure, use logs to undo the changes of uncommitted transactions.
2. Use logs to redo the changes for committed transactions that may not have been applied due to failure.
e) Shadow Paging:
• Concept: Shadow paging maintains two versions of the database: a current page table (in
use) and a shadow page table (backup). The current page table is updated with new changes,
but the shadow table remains unchanged.
• Steps:
1. During updates, changes are made to a copy of the page (current page table), not
the original data.
2. If a transaction is committed, the new page table replaces the shadow page.
3. If a transaction fails, the system reverts to the shadow page table.
• Advantages: It provides fast recovery because there’s no need for logging or undoing. The
system only switches between the two page tables.
f) Backups:
• Concept: Regular backups are taken to prevent data loss in case of media failure or data corruption. When a failure occurs, the most recent backup is restored, and log files are applied to recover recent transactions.
• Types of Backups:
1. Full Backup: A complete copy of the entire database.
2. Incremental Backup: A copy of only the changes made since the last backup.
3. Differential Backup: A copy of all changes made since the last full backup.
g) Two-Phase Commit (2PC): For distributed databases, the two-phase commit protocol ensures that a transaction commits either at all participating sites or at none:
1. Phase 1 (Prepare): The coordinator asks all participating sites to prepare for commit.
2. Phase 2 (Commit): If all sites are ready, the coordinator sends a commit signal;
otherwise, it sends a rollback signal.