Unit-III - SQL & Schema Refinement
Form of basic SQL query
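The general form is the SELECT-FROM-WHERE block. A minimal sketch (table and column names are illustrative, reusing the Student table discussed later):

```sql
-- General form of a basic SQL query:
-- SELECT  select_list      (attributes or expressions to return)
-- FROM    table_list       (one or more relations)
-- WHERE   condition        (optional filter on rows)

SELECT Name, Fee
FROM   Student
WHERE  Department = 'CSE';
```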
Regular expressions in the SELECT Command
SQL provides support for pattern matching through the LIKE operator, along with the use of the wild-card symbols.
Regular expression: a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings (string matching).
Examples:
Finds names that start or end with "a".
Finds names that start with "a" and are at least 3 characters in length.
LIKE: The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
Wildcards: There are two primary wildcards used with the `LIKE` operator: `%` matches zero or more characters, and `_` matches exactly one character.
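The two example patterns above can be written as follows (the table and column names are illustrative):

```sql
-- Names that start or end with 'a'
SELECT Name FROM Student
WHERE Name LIKE 'a%' OR Name LIKE '%a';

-- Names that start with 'a' and are at least 3 characters long
-- ('_' matches exactly one character, '%' matches zero or more)
SELECT Name FROM Student
WHERE Name LIKE 'a__%';
```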
● The IN operator checks if a column value in the outer query's result is present in the inner query's
result. The final result will have rows that satisfy the IN condition.
● The NOT IN operator checks if a column value in the outer query's result is not present in the inner
query's result. The final result will have rows that satisfy the NOT IN condition.
● The ALL operator compares a value from the outer query's result with every value of the inner query's
result and returns the row only if the comparison holds for all of them.
● The ANY operator compares a value from the outer query's result with the inner query's result values
and returns the row if the comparison holds for at least one of them.
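As a sketch, the four operators might be used like this (the Employees/AuditLog tables echo the trigger example later; the DeptID column is an assumption):

```sql
-- IN: employees who appear in the audit log
SELECT Name FROM Employees
WHERE EmployeeID IN (SELECT EmployeeID FROM AuditLog);

-- ANY: employees earning more than at least one employee in department 10
SELECT Name FROM Employees
WHERE Salary > ANY (SELECT Salary FROM Employees WHERE DeptID = 10);

-- ALL: employees earning more than every employee in department 10
SELECT Name FROM Employees
WHERE Salary > ALL (SELECT Salary FROM Employees WHERE DeptID = 10);
```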
Correlated Nested Queries
In correlated nested queries, the inner query uses values from the outer query, so the inner query is executed
once for every row processed by the outer query. Correlated nested queries tend to run slowly because of this
repeated execution of the inner query.
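A minimal sketch of a correlated subquery (table and column names are illustrative):

```sql
-- Employees who earn more than the average salary of their own department.
-- The inner query references e.DeptID from the outer row, so it is
-- re-evaluated for each employee.
SELECT e.Name, e.Salary
FROM   Employees e
WHERE  e.Salary > (SELECT AVG(d.Salary)
                   FROM   Employees d
                   WHERE  d.DeptID = e.DeptID);
```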
Aggregation Operators / Functions
○ SQL aggregation function is used to perform the calculations on multiple rows of a single column of a
table. It returns a single value.
○ It is also used to summarize the data.
Aggregation Operators
● Aggregation operators are used to perform operations on a group of values to return a single
summarizing value. The most common aggregation operators include COUNT, SUM, AVG, MIN, and
MAX.
SELECT COUNT(*)
FROM PRODUCT_MAST;

SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE >= 20;
Example of a Trigger
Suppose we have an `Employees` table and we want to maintain an `AuditLog` table that keeps a record of
salary changes for employees.
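A sketch of such a trigger in MySQL-style syntax (the AuditLog column names and the exact Employees schema are assumptions):

```sql
-- Assumes Employees(EmployeeID, Name, Salary) and
-- AuditLog(EmployeeID, OldSalary, NewSalary, ChangedAt).
CREATE TRIGGER trg_salary_audit
AFTER UPDATE ON Employees
FOR EACH ROW
BEGIN
    -- Event: UPDATE on Employees; Condition: the salary changed;
    -- Action: record the old and new values in the audit table.
    IF OLD.Salary <> NEW.Salary THEN
        INSERT INTO AuditLog (EmployeeID, OldSalary, NewSalary, ChangedAt)
        VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary, NOW());
    END IF;
END;
```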
How the Trigger Works
With this trigger in place, every time an employee's salary is updated in the `Employees` table, an entry is automatically added
to the `AuditLog` table recording the change.
Active Databases
An active database is a database that uses triggers and other event-driven functionalities. The term "active" signifies that the DBMS reacts
automatically to changes in data and predefined events. Triggers are a primary mechanism that makes a database "active."
1. Event-Condition-Action (ECA) Rule: This is the foundational concept of active databases. When a specific event occurs, the database
checks a particular condition, and if that condition is met, an action is executed.
2. Reactive Behavior: The database can react to changes without external applications or users having to intervene, thanks to the ECA rules.
3. Flexibility: Active databases provide more flexibility in data management and ensure better data integrity and security.
● Integrity Maintenance: Active databases can enforce more complex business rules that can't be enforced using standard integrity
constraints.
● Automation: They can automate certain tasks, reducing manual interventions.
● Alerts: They can notify users or applications when specific conditions are met.
Relation between Triggers and Active Databases
● Triggers are what give an active database its "active" nature. The ability of the database to
react to events automatically is primarily because of triggers that execute in response to
these events.
● In essence, while "trigger" refers to the specific procedural code blocks that run in response
to events, "active database" refers to the broader capability of a DBMS to support and use
such event-driven functionalities.
Data Redundancy
● Data redundancy means the occurrence of duplicate copies of the same data. It may be introduced
intentionally, to keep the same piece of data at different places, or it may occur accidentally.
● In DBMS, when the same data is stored in different tables, it causes data redundancy.
● Sometimes, it is done on purpose for recovery or backup of data, faster access of data, or updating data
easily. Redundant data costs extra money, demands higher storage capacity, and requires extra effort to
keep all the files up to date.
● Sometimes, unintentional duplication of data prevents the database from working properly, or makes it
harder for the end user to access data. Redundant data unnecessarily occupies space in the database to
store identical copies, and the resulting space constraints are one of the major problems.
● In the below example, there is a "Student" table that contains data such as "Student_id", "Name", "Course",
"Session", "Fee", and "Department". As you can see, some data is repeated in the table, which causes
redundancy.
Problems caused by redundancy
Data redundancy in databases refers to the unnecessary duplication of data. It can arise from poor database design or lack of proper normalization.
Redundancy can cause several issues:
1. Wasted Storage
2. Data Anomalies
● Update Anomalies: When you have the same piece of data stored in multiple places, updating it in one place can lead to inconsistency if it's not updated
everywhere.
● Insertion Anomalies: You might have to insert redundant data in multiple places, leading to inconsistencies.
● Deletion Anomalies: Deleting data in one table might unintentionally remove necessary data that's needed elsewhere.
3. Increased Complexity
4. Performance Issues
Duplicate data can slow down search, update, and insert operations.
| OrderID | CustomerName | CustomerAddress | Product  |
|---------|--------------|-----------------|----------|
| 1       | Madhu        | Hyderabad       | Laptop   |
| 2       | Madhu        | Hyderabad       | Mouse    |
| 3       | …            | …               | Keyboard |
Redundancy: The `CustomerName` "Madhu" and his `CustomerAddress` "Hyderabad" are repeated for two orders.
Problems:
1. Update Anomaly: If Madhu moves to a new address, you'd have to update multiple rows. If you forget to update all the rows, it leads to inconsistent data.
2. Insertion Anomaly: To insert a new order for Madhu, you have to re-enter his address, leading to further redundancy.
3. Deletion Anomaly: If you delete the mouse order and remove Madhu's details along with it, you would also lose the customer data associated with his laptop order.
Solution:
Normalizing the database can resolve these problems. In this example, splitting the table into two tables, `Orders` and `Customers`, would be a start:
1. Customers Table:
| CustomerID | CustomerName | CustomerAddress |
|------------|--------------|-----------------|
| 101        | Madhu        | Hyderabad       |
| 102        | …            | …               |
2. Orders Table:
| OrderID | CustomerID | Product  |
|---------|------------|----------|
| 1 | 101 | Laptop |
| 2 | 101 | Mouse |
| 3 | 102 | Keyboard |
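As a sketch, the normalized design could be declared like this (the data types and sizes are assumptions):

```sql
CREATE TABLE Customers (
    CustomerID      INT PRIMARY KEY,
    CustomerName    VARCHAR(100),
    CustomerAddress VARCHAR(200)
);

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    Product    VARCHAR(100),
    -- Each order references exactly one customer row, so the
    -- address is stored once, eliminating the redundancy above.
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
```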
● Decomposition in the context of database design refers to the process of breaking down a single table into multiple tables in order to eliminate
redundancy, reduce data anomalies, and achieve normalization. Decomposition is typically done using rules defined by normalization forms.
● However, while decomposition can be helpful, it is not without challenges. Done incorrectly, decomposition can lead to its own set of problems.
1. Loss of Information
● Non-loss decomposition: When a relation is decomposed into two or more smaller relations, and the original relation can be perfectly reconstructed
by taking the natural join of the decomposed relations, then it is termed as lossless decomposition. If not, it is termed "lossy decomposition."
● Example: Let's consider a table `R(A, B, C)` with a dependency `A → B`. If you decompose it into `R1(A, B)` and `R2(B, C)`, it would be lossy
because you can't recreate the original table using natural joins.
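A concrete instance of this (the tuple values are illustrative): suppose R contains (a1, b, c1) and (a2, b, c2). Then R1 = {(a1, b), (a2, b)} and R2 = {(b, c1), (b, c2)}, and joining them on B produces the spurious tuples (a1, b, c2) and (a2, b, c1) in addition to the originals. In SQL:

```sql
-- Lossy decomposition demo (values are illustrative)
CREATE TABLE R1 (A VARCHAR(10), B VARCHAR(10));
CREATE TABLE R2 (B VARCHAR(10), C VARCHAR(10));

INSERT INTO R1 VALUES ('a1', 'b'), ('a2', 'b');
INSERT INTO R2 VALUES ('b', 'c1'), ('b', 'c2');

-- Returns 4 rows, including the spurious tuples
-- (a1, b, c2) and (a2, b, c1) that were never in R.
SELECT R1.A, R1.B, R2.C
FROM   R1 JOIN R2 ON R1.B = R2.B;
```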
Functional Dependencies and its reasoning
Functional dependencies play a vital role in the normalization process in relational database design. They help in defining the relationships between
attributes in a relation and are used to formalize the properties of the relation and drive the process of decomposition.
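A functional dependency A → B means that whenever two rows agree on A, they must also agree on B. As a sketch, a dependency such as Student_id → Name can be checked in SQL (the table and column names follow the earlier Student example):

```sql
-- Rows returned here are violations of Student_id -> Name:
-- the same Student_id associated with more than one Name.
SELECT Student_id
FROM   Student
GROUP BY Student_id
HAVING COUNT(DISTINCT Name) > 1;
```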