Unit-III - SQL & Schema Refinement


Form of basic SQL query
Regular expressions in the SELECT Command
SQL provides support for pattern matching through the LIKE operator, along with the use of the wild-card symbols.

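Regular expression: a sequence of characters that defines a search pattern, mainly used for pattern matching against strings. The LIKE operator supports only a simple form of pattern matching based on two wildcards, not full regular-expression syntax.

Examples:
`LIKE 'a%'` or `LIKE '%a'` finds names that start or end with "a".
`LIKE 'a__%'` finds names that start with "a" and are at least 3 characters in length.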

LIKE: The LIKE operator is used in a `WHERE` clause to search for a specified pattern in a column.

Wildcards: There are two primary wildcards used in conjunction with the `LIKE` operator:

Percent sign (%): represents zero, one, or multiple characters.

Underscore sign (_): represents a single character.
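The two wildcards can be tried out with a small sketch. It uses Python's built-in `sqlite3` module only so that the queries are runnable; the `Customers` table, its `name` column, and the sample names are assumptions made for illustration. Note that SQLite's `LIKE` is case-insensitive for ASCII letters by default, which may differ from other systems.

```python
import sqlite3

# Hypothetical table and data, used only to exercise the patterns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?)",
    [("anna",), ("al",), ("bob",), ("maria",), ("amit",)],
)


def names_like(where_clause):
    """Return the sorted names that satisfy the given WHERE clause."""
    sql = f"SELECT name FROM Customers WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


# '%' matches zero or more characters: names starting or ending with "a".
starts_or_ends_with_a = names_like("name LIKE 'a%' OR name LIKE '%a'")

# Each '_' matches exactly one character: "a", two more characters, then anything.
starts_with_a_min_3 = names_like("name LIKE 'a__%'")

print(starts_or_ends_with_a)
print(starts_with_a_min_3)
```

Here "al" satisfies the first condition but not the second, because it has only two characters.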


UNION operator
The UNION operator combines the result sets of two or more SELECT statements into a single result. By default it returns only distinct rows; to keep duplicates, use UNION ALL. The combined SELECT statements must return the same number of columns, with compatible data types.
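The difference between UNION and UNION ALL can be seen in a minimal sketch, run here through Python's built-in `sqlite3` module. The `Customers` and `Suppliers` tables and their `city` values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (city TEXT);
    CREATE TABLE Suppliers (city TEXT);
    INSERT INTO Customers VALUES ('Hyderabad'), ('Bengaluru');
    INSERT INTO Suppliers VALUES ('Hyderabad'), ('Chennai');
""")

# UNION removes the duplicate 'Hyderabad'.
union_rows = [r[0] for r in conn.execute(
    "SELECT city FROM Customers UNION SELECT city FROM Suppliers ORDER BY city"
)]

# UNION ALL keeps every row from both SELECT statements.
union_all_rows = [r[0] for r in conn.execute(
    "SELECT city FROM Customers UNION ALL SELECT city FROM Suppliers ORDER BY city"
)]

print(union_rows)
print(union_all_rows)
```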
INTERSECT
EXCEPT
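As a rough illustration of these two set operators: INTERSECT returns the rows that appear in both result sets, and EXCEPT returns the rows of the first result set that do not appear in the second. The sketch below runs on SQLite through Python's `sqlite3` module; the tables and values are hypothetical, and support or spelling may differ in other systems (Oracle, for example, has traditionally used MINUS for EXCEPT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (city TEXT);
    CREATE TABLE Suppliers (city TEXT);
    INSERT INTO Customers VALUES ('Hyderabad'), ('Bengaluru');
    INSERT INTO Suppliers VALUES ('Hyderabad'), ('Chennai');
""")

# Cities that have both a customer and a supplier.
intersect_rows = [r[0] for r in conn.execute(
    "SELECT city FROM Customers INTERSECT SELECT city FROM Suppliers"
)]

# Cities that have a customer but no supplier.
except_rows = [r[0] for r in conn.execute(
    "SELECT city FROM Customers EXCEPT SELECT city FROM Suppliers"
)]

print(intersect_rows)
print(except_rows)
```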
Nested Queries
● A nested query (subquery) is a query placed inside another query. Nesting allows more complex and specific data retrieval.
● The output of the inner query is used by the outer query.
● A nested query has at least two SELECT statements: one for the inner query and one for the outer query.
Syntax of Nested Queries
Types of Nested Queries in SQL
Independent Nested Queries
In independent nested queries, the execution order is from the innermost query to the outer query. An outer query
won't be executed until its inner query completes its execution. The outer query uses the result of the inner query.
Operators such as IN, NOT IN, ALL, and ANY are used to write independent nested queries.

● The IN operator checks if a column value in the outer query's result is present in the inner query's
result. The final result will have rows that satisfy the IN condition.
● The NOT IN operator checks if a column value in the outer query's result is not present in the inner
query's result. The final result will have rows that satisfy the NOT IN condition.
● The ALL operator compares a value from the outer query with every value in the inner query's result and returns the row only if the comparison holds for all of them.
● The ANY operator compares a value from the outer query with the values in the inner query's result and returns the row if the comparison holds for at least one of them.
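The four operators can be sketched on a small, made-up `Employees` table (the table, columns, and rows are assumptions). The queries run on SQLite through Python's `sqlite3` module. SQLite does not implement `ALL` and `ANY`, so those two are shown as comments next to an equivalent `MAX`/`MIN` form, which matches as long as the inner query returns at least one row and no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO Employees VALUES
        ('Asha', 'HR', 30000), ('Ravi', 'HR', 40000),
        ('Kiran', 'IT', 35000), ('Meena', 'IT', 50000);
""")


def names(sql):
    """Run a query that selects one column and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]


# IN: employees in a department where someone earns more than 45000.
in_rows = names("""
    SELECT name FROM Employees
    WHERE dept IN (SELECT dept FROM Employees WHERE salary > 45000)
    ORDER BY name
""")

# NOT IN: employees in all the other departments.
not_in_rows = names("""
    SELECT name FROM Employees
    WHERE dept NOT IN (SELECT dept FROM Employees WHERE salary > 45000)
    ORDER BY name
""")

# ALL: salary > ALL (SELECT salary FROM Employees WHERE dept = 'HR')
# Equivalent here: greater than the largest HR salary.
all_rows = names("""
    SELECT name FROM Employees
    WHERE salary > (SELECT MAX(salary) FROM Employees WHERE dept = 'HR')
    ORDER BY name
""")

# ANY: salary > ANY (SELECT salary FROM Employees WHERE dept = 'HR')
# Equivalent here: greater than the smallest HR salary.
any_rows = names("""
    SELECT name FROM Employees
    WHERE salary > (SELECT MIN(salary) FROM Employees WHERE dept = 'HR')
    ORDER BY name
""")

print(in_rows, not_in_rows, all_rows, any_rows)
```

In each query the inner SELECT does not mention the outer row, so it can be evaluated once and its result reused.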
Co-related Nested Queries
In co-related (correlated) nested queries, the inner query refers to values from the current row of the outer query, so it is conceptually evaluated once for every row the outer query processes. Because of this repeated evaluation, co-related nested queries can run slowly on large tables.
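A minimal sketch of a co-related query, again on a made-up `Employees` table run through Python's `sqlite3` module: it finds the employees who earn more than the average salary of their own department. The inner query mentions `e.dept`, a value from the outer row, which is what makes it co-related.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO Employees VALUES
        ('Asha', 'HR', 30000), ('Ravi', 'HR', 40000),
        ('Kiran', 'IT', 35000), ('Meena', 'IT', 50000);
""")

# The inner query is re-evaluated for the department of each outer row e.
above_dept_average = [row[0] for row in conn.execute("""
    SELECT e.name
    FROM Employees e
    WHERE e.salary > (SELECT AVG(i.salary)
                      FROM Employees i
                      WHERE i.dept = e.dept)
    ORDER BY e.name
""")]

print(above_dept_average)
```

The HR average is 35000 and the IT average is 42500, so one employee from each department qualifies.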
Aggregation Operators / Functions

○ An SQL aggregation function performs a calculation over multiple rows of a single column of a table and returns a single value.
○ It is also used to summarize data.
Aggregation Operators

● Aggregation operators are used to perform operations on a group of values to return a single
summarizing value. The most common aggregation operators include COUNT, SUM, AVG, MIN, and
MAX.
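-- Count all rows in the table
SELECT COUNT(*)
FROM PRODUCT_MAST;

-- Count only the rows with a RATE of at least 20
SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE >= 20;

-- Count the number of different companies
SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;

-- Count the rows for each company
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;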
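To see what these aggregate queries return, here is a small runnable sketch using Python's `sqlite3` module. Only the table name `PRODUCT_MAST` and the columns `COMPANY` and `RATE` come from the queries above; the `PRODUCT` column and all the rows are assumptions. It also adds SUM, AVG, MIN, and MAX, which are listed among the common aggregation operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, RATE INTEGER);
    INSERT INTO PRODUCT_MAST VALUES
        ('Item1', 'Com1', 10), ('Item2', 'Com2', 25),
        ('Item3', 'Com1', 30), ('Item4', 'Com3', 10);
""")

total_rows = conn.execute("SELECT COUNT(*) FROM PRODUCT_MAST").fetchone()[0]

rows_rate_20_plus = conn.execute(
    "SELECT COUNT(*) FROM PRODUCT_MAST WHERE RATE >= 20"
).fetchone()[0]

distinct_companies = conn.execute(
    "SELECT COUNT(DISTINCT COMPANY) FROM PRODUCT_MAST"
).fetchone()[0]

# One output row per company.
rows_per_company = conn.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST GROUP BY COMPANY ORDER BY COMPANY"
).fetchall()

# The remaining common aggregates, all computed over the RATE column.
rate_summary = conn.execute(
    "SELECT SUM(RATE), AVG(RATE), MIN(RATE), MAX(RATE) FROM PRODUCT_MAST"
).fetchone()

print(total_rows, rows_rate_20_plus, distinct_companies)
print(rows_per_company)
print(rate_summary)
```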
Triggers
● A trigger is a predefined action that the database automatically executes in response to certain events on
a particular table or view. Triggers are typically used to maintain the integrity of the data, automate
data-related tasks, and extend the database functionalities.
● There are various types of triggers based on when they are executed:

BEFORE: Trigger is executed before the triggering event.
AFTER: Trigger is executed after the triggering event.
INSTEAD OF: Trigger is used to override the triggering event, primarily for views.

● They can also be categorized by the triggering event:

INSERT: Trigger is executed when a new row is inserted.
UPDATE: Trigger is executed when a row is updated.
DELETE: Trigger is executed when a row is deleted.
The basic syntax for creating a trigger in SQL, using MySQL as an example, has the following parts:

trigger_name: Name of the trigger.

trigger_time: BEFORE or AFTER in MySQL; some other systems also offer INSTEAD OF.

trigger_event: INSERT, UPDATE, or DELETE.

table_name: The name of the table associated with the trigger.

trigger_body: The set of SQL statements to be executed.


Key Features of Triggers
1. Automatic Execution: Triggers run automatically in response to data modification events. You don't have to explicitly call them.
2. Event-Driven: They are defined to execute before or after INSERT, UPDATE, and DELETE events.
3. Transitional Access: Triggers can access the "old" (pre-modification) and "new" (post-modification) values of the rows
affected.

Example of a Trigger
Suppose we have an `Employees` table and we want to maintain an `AuditLog` table that keeps a record of
salary changes for employees.
How the Trigger Works

- The trigger is named `AfterSalaryUpdate`.


- It activates `AFTER` an `UPDATE` on the `Employees` table.
- It compares the old and new salary values. If there's a change (`OLD.Salary != NEW.Salary`), it inserts a new record into the
`AuditLog` table with the details of the change and the current date and time (`NOW()`).

With this trigger in place, every time an employee's salary is updated in the `Employees` table, an entry is automatically added
to the `AuditLog` table recording the change.
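A trigger matching this description can be sketched as follows. To keep it runnable, it is written in SQLite's dialect and executed through Python's `sqlite3` module, so it is not the MySQL form: the comparison sits in a `WHEN` clause rather than an `IF` inside the body, and `datetime('now')` stands in for `NOW()`. The column names (`EmpID`, `Name`, `OldSalary`, `NewSalary`, `ChangedAt`) and the sample employee are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Name TEXT, Salary REAL);
    CREATE TABLE AuditLog (
        EmpID INTEGER, OldSalary REAL, NewSalary REAL, ChangedAt TEXT
    );

    -- Fires once per updated row, but only when the salary actually changed.
    CREATE TRIGGER AfterSalaryUpdate
    AFTER UPDATE ON Employees
    FOR EACH ROW
    WHEN OLD.Salary != NEW.Salary
    BEGIN
        INSERT INTO AuditLog (EmpID, OldSalary, NewSalary, ChangedAt)
        VALUES (OLD.EmpID, OLD.Salary, NEW.Salary, datetime('now'));
    END;

    INSERT INTO Employees VALUES (1, 'Madhu', 50000);
""")

# A salary change: the trigger adds one audit row.
conn.execute("UPDATE Employees SET Salary = 55000 WHERE EmpID = 1")
# A name change only: the WHEN condition is false, so nothing is logged.
conn.execute("UPDATE Employees SET Name = 'Madhu K' WHERE EmpID = 1")

audit_rows = conn.execute(
    "SELECT EmpID, OldSalary, NewSalary FROM AuditLog"
).fetchall()
print(audit_rows)
```

No code calls the trigger explicitly; the `UPDATE` statement alone causes the audit row to be written.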
Active Databases

An active database is a database that uses triggers and other event-driven functionalities. The term "active" signifies that the DBMS reacts
automatically to changes in data and predefined events. Triggers are a primary mechanism that makes a database "active."

Key Features of Active Databases

1. Event-Condition-Action (ECA) Rule: This is the foundational concept of active databases. When a specific event occurs, the database
checks a particular condition, and if that condition is met, an action is executed.
2. Reactive Behavior: The database can react to changes without external applications or users having to intervene, thanks to the ECA rules.
3. Flexibility: Active databases provide more flexibility in data management and can help enforce data integrity and security rules inside the database itself.

Why are Active Databases Important?

● Integrity Maintenance: Active databases can enforce more complex business rules that can't be enforced using standard integrity
constraints.
● Automation: They can automate certain tasks, reducing manual interventions.
● Alerts: They can notify users or applications when specific conditions are met.
Relation between Triggers and Active Databases

● Triggers are what give an active database its "active" nature. The ability of the database to
react to events automatically is primarily because of triggers that execute in response to
these events.
● In essence, while "trigger" refers to the specific procedural code blocks that run in response
to events, "active database" refers to the broader capability of a DBMS to support and use
such event-driven functionalities.
Data Redundancy
● Data redundancy means that duplicate copies of the same data exist. It may be introduced intentionally, to keep the same piece of data in different places, or it may occur accidentally.
● In a DBMS, storing the same data in different tables causes data redundancy.
● Sometimes it is done on purpose, for recovery or backup of data, faster access to data, or easier updating of data. Redundant data costs extra money, demands higher storage capacity, and requires extra effort to keep all the copies up to date.
● Unintentional duplication of data can prevent the database from working properly or make it harder for the end user to access data. Identical copies also occupy space unnecessarily, and the resulting space constraints are one of the major problems.
● In the below example, there is a "Student" table that contains data such as "Student_id", "Name", "Course",
"Session", "Fee", and "Department". As you can see, some data is repeated in the table, which causes
redundancy.
Problems caused by redundancy
Data redundancy in databases refers to the unnecessary duplication of data. It can arise from poor database design or lack of proper normalization.
Redundancy can cause several issues:


1. Wasted Storage

Storing duplicate data consumes more storage than necessary.

2. Data Anomalies

These are inconsistencies that arise due to redundancy.

● Update Anomalies: When the same piece of data is stored in multiple places, updating it in one place leads to inconsistency unless it is updated everywhere.
● Insertion Anomalies: Inserting a new fact may require re-entering data that is already stored, or may be impossible until unrelated data is available; each repeated entry is another chance for inconsistency.
● Deletion Anomalies: Deleting a row might unintentionally remove the only copy of other data that is still needed.
3. Increased Complexity

Querying and maintaining redundant data can be more complex.

4. Performance Issues

Duplicate data can slow down search, update, and insert operations.

5. Data Integrity Issues

If data is inconsistent across tables, it can lead to data integrity issues.


Example for Problems Caused by Redundancy:
Let's consider a simplistic example. Suppose you have a table called "Orders" with the following structure and data:

| OrderID | CustomerName | Product  | CustomerAddress |
|---------|--------------|----------|-----------------|
| 1       | Madhu        | Laptop   | Hyderabad       |
| 2       | Madhu        | Mouse    | Hyderabad       |
| 3       | Naveen       | Keyboard | Bengaluru       |

From the table:

Redundancy: The `CustomerName` "Madhu" and his `CustomerAddress` "Hyderabad" are repeated for two orders.
Problems:

1. Update Anomaly: If Madhu moves to a new address, you'd have to update multiple rows. If you forget to update all the rows, it leads to inconsistent data.
2. Insertion Anomaly: To insert a new order for Madhu, you have to re-enter his address, leading to further redundancy. A customer who has not yet placed an order cannot be recorded at all without leaving the order columns empty.
3. Deletion Anomaly: If you delete the keyboard order, which is Naveen's only order, Naveen's name and address are lost along with it.

Solution:

Normalizing the database can resolve these problems. In this example, splitting the table into two tables, `Orders` and `Customers`, would be a start:

1. Customers Table:

| CustomerID | CustomerName | CustomerAddress |
|------------|--------------|-----------------|
| 101        | Madhu        | Hyderabad       |
| 102        | Naveen       | Bengaluru       |


2. Orders Table:

| OrderID | CustomerID | Product  |
|---------|------------|----------|
| 1       | 101        | Laptop   |
| 2       | 101        | Mouse    |
| 3       | 102        | Keyboard |

This design reduces redundancy and eliminates the anomalies.
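The two-table design can be checked with a short sketch that builds the `Customers` and `Orders` tables from the example and runs them on SQLite through Python's `sqlite3` module. The column types and the new address 'Chennai' are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        CustomerName TEXT,
        CustomerAddress TEXT
    );
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID),
        Product TEXT
    );
    INSERT INTO Customers VALUES
        (101, 'Madhu', 'Hyderabad'), (102, 'Naveen', 'Bengaluru');
    INSERT INTO Orders VALUES
        (1, 101, 'Laptop'), (2, 101, 'Mouse'), (3, 102, 'Keyboard');
""")

# Update: the address is stored once, so a single row changes.
conn.execute(
    "UPDATE Customers SET CustomerAddress = 'Chennai' WHERE CustomerID = 101"
)

# Delete: removing an order does not remove the customer who placed it.
conn.execute("DELETE FROM Orders WHERE OrderID = 3")

# A join rebuilds the original wide view whenever it is needed.
orders_view = conn.execute("""
    SELECT o.OrderID, c.CustomerName, o.Product, c.CustomerAddress
    FROM Orders o
    JOIN Customers c ON c.CustomerID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()

customer_names = [row[0] for row in conn.execute(
    "SELECT CustomerName FROM Customers ORDER BY CustomerID"
)]

print(orders_view)
print(customer_names)
```

Both of Madhu's orders show the new address after one update, and Naveen is still recorded after his only order is deleted.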


Decomposition and its problems

● Decomposition in the context of database design refers to the process of breaking down a single table into multiple tables in order to eliminate
redundancy, reduce data anomalies, and achieve normalization. Decomposition is typically done using rules defined by normalization forms.
● However, while decomposition can be helpful, it is not without challenges. Done incorrectly, decomposition can lead to its own set of problems.

Problems Related to Decomposition

1. Loss of Information
● Lossless (non-loss) decomposition: When a relation is decomposed into two or more smaller relations and the original relation can be perfectly reconstructed by taking the natural join of the decomposed relations, the decomposition is termed lossless. If not, it is termed "lossy decomposition."
● Example: Let's consider a table `R(A, B, C)` with a dependency `A → B`. If you decompose it into `R1(A, B)` and `R2(B, C)`, the decomposition is lossy in general: the only shared attribute, `B`, determines neither `A` nor `C`, so the natural join can produce spurious rows that were not in the original table. Decomposing into `R1(A, B)` and `R3(A, C)` instead would be lossless, because the shared attribute `A` determines `B`.
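The lossy case can be made concrete with a tiny Python sketch that treats relations as sets of tuples. The two rows of `R` are made up; they satisfy `A → B`. The relation `R3(A, C)` is an additional decomposition introduced here for comparison.

```python
# A made-up instance of R(A, B, C) in which A -> B holds.
R = {(1, "x", "p"), (2, "x", "q")}

# Project R onto R1(A, B) and R2(B, C).
R1 = {(a, b) for (a, b, c) in R}
R2 = {(b, c) for (a, b, c) in R}

# Natural join of R1 and R2 on the shared attribute B.
joined = {(a, b, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

# Rows produced by the join that were never in R.
spurious = joined - R

# Comparison: R1(A, B) with R3(A, C) shares A, and A -> B holds.
R3 = {(a, c) for (a, b, c) in R}
rejoined = {(a, b, c) for (a, b) in R1 for (a2, c) in R3 if a == a2}

print(sorted(spurious))
print(rejoined == R)
```

The join of `R1` and `R2` returns four rows instead of two. No original row is missing, but the extra rows make it impossible to tell which combinations were really stored, and that is the information that has been lost.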
Functional Dependencies and its reasoning
Functional dependencies play a vital role in the normalization process in relational database design. They help in defining the relationships between
attributes in a relation and are used to formalize the properties of the relation and drive the process of decomposition.

Functional Dependencies (FD)


A functional dependency `\( X \rightarrow Y \)` between two sets of attributes X and Y in a relation R is defined as: if two tuples (rows) of R have the
same value for attributes X, then they must also have the same values for attributes Y. In other words, the values of X determine the values of Y.

1. sid functionally determines sname, because for a given student ID there is only one possible student name.
2. zipcode functionally determines cityname: a specific zip code should determine a unique city name.
3. cityname functionally determines state: a city name could determine a state.

Mathematically, these functional dependencies can be represented as:

\( sid \rightarrow sname \)
\( zipcode \rightarrow cityname \)
\( cityname \rightarrow state \)
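The definition translates directly into a check on a table instance: group the rows by their X values and confirm that each group has only one combination of Y values. The sketch below is one way to write that in Python; the function name `holds` and the sample rows are assumptions. A sample can only refute a dependency. The fact that a dependency holds in a few rows does not prove that it holds for the relation in general.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first Y value seen for this X value.
        if seen.setdefault(key, value) != value:
            return False  # same X values, different Y values
    return True


# Made-up sample rows, for illustration only.
students = [
    {"sid": 1, "sname": "Madhu", "zipcode": "500001", "cityname": "Hyderabad"},
    {"sid": 2, "sname": "Naveen", "zipcode": "560001", "cityname": "Bengaluru"},
    {"sid": 3, "sname": "Madhu", "zipcode": "500001", "cityname": "Hyderabad"},
]

sid_determines_sname = holds(students, ["sid"], ["sname"])
zipcode_determines_cityname = holds(students, ["zipcode"], ["cityname"])
sname_determines_sid = holds(students, ["sname"], ["sid"])

print(sid_determines_sname, zipcode_determines_cityname, sname_determines_sid)
```

Two different students are both named Madhu, so sname does not determine sid even though sid determines sname.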
Reasoning About Functional Dependencies
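Reasoning about functional dependencies means working out which further dependencies follow from the ones already stated. A standard tool for this is the attribute closure: the set of all attributes determined by a given set of attributes. The sketch below is one simple way to compute it in Python, applied to the three dependencies of the student example (sid determines sname, zipcode determines cityname, cityname determines state). The function name `closure` and the data layout are assumptions.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined,
            # the right-hand side is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"sid"}, {"sname"}),
    ({"zipcode"}, {"cityname"}),
    ({"cityname"}, {"state"}),
]

zip_closure = closure({"zipcode"}, fds)
sid_closure = closure({"sid"}, fds)

print(sorted(zip_closure))
print(sorted(sid_closure))
```

Because state is in the closure of zipcode, the dependency zipcode → state follows from the given ones by transitivity, even though it was never stated directly.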
Introduction to Normal Forms
● In database management systems (DBMS), normalization is used to organize relational databases efficiently: it eliminates redundant data, ensures that data dependencies make sense, and protects data integrity. The process of normalization is divided into several stages, called "normal forms." Each normal form has a specific set of rules and criteria that a database schema must meet.
● Normalization often involves trade-offs. While higher normal forms eliminate redundancy and improve
data integrity, they can also result in more complex relational schemas and sometimes require more
joins, which can affect performance. As such, it's essential to understand the data and the specific
application's requirements when deciding the level of normalization suitable for a particular situation.
Sometimes, denormalization (intentionally introducing redundancy) is implemented to improve
performance, especially in read-heavy databases.

Types of Normal Forms

1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF or Project-Join Normal Form - PJNF)
7. Sixth Normal Form (6NF)
First Normal Form (1NF) in DBMS
Second Normal Form (2NF) in DBMS
Example for Second Normal Form
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF) in DBMS
Fourth Normal Form (4NF) in DBMS
Fifth Normal Form (5NF or PJNF) in DBMS
Now, these decomposed tables eliminate the redundancy caused by the specific constraints and join dependencies of the original
relation. When you take the natural join of these tables, you will get back the original table.
It's worth noting that reaching 5NF can lead to an increased number of tables, which can complicate queries and database
operations. Thus, achieving 5NF should be a conscious decision made based on the specific requirements and constraints of a
given application.
