DBMS Module 3 Study Notes

Uploaded by

forosih265

Module 3: Normalization and SQL

This module covers the fundamental concepts of database normalization, focusing on
functional and multivalued dependencies, and provides an introduction to the SQL
language for data definition and manipulation.

Introduction to Normalization
Normalization is a formal process for analyzing relation schemas based on their
Functional Dependencies (FDs) and primary keys. Its main goals are:
1.​ Minimizing redundancy.
2.​ Minimizing insertion, deletion, and update anomalies.

Database design can follow two approaches:


●​ Bottom-up (Design by Synthesis): Starts with relationships among individual
attributes to construct relation schemas. Less popular in practice.
●​ Top-down (Design by Analysis): Starts with existing attribute groupings and
analyzes/decomposes them until desirable properties are met. This is the
approach used in normalization.
Implicit goals of design activity:
●​ Information Preservation: Maintaining all concepts from the conceptual design
(attribute types, entity types, relationship types, etc.).
●​ Minimum Redundancy: Minimizing redundant storage and reducing the need for
multiple updates to maintain consistency.

Informal Design Guidelines for Relation Schema


These are four informal guidelines to assess the quality of a relation schema design:
1.​ Imparting Clear Semantics to Attributes in Relations:
○​ Guideline 1: Design a relation schema so its meaning is easy to explain. Do
not combine attributes from multiple entity types and relationship types into a
single relation.
○​ Explanation: Each relation should represent one entity type or one
relationship type. Mixing them leads to semantic ambiguities.
○​ Example: An EMPLOYEE relation with attributes like Ename, Ssn, Bdate,
Address, Dnumber (department number) is clear. An EMP_DEPT relation
mixing employee and department attributes (Ename, Ssn, ..., Dname,
Dmgr_ssn) violates this guideline if used as a base relation.
2.​ Reducing Redundant Information in Tuples and Update Anomalies:
○​ Grouping attributes can lead to storing the same information multiple times
(redundancy). This causes update anomalies:
■​ Insertion Anomalies: Cannot insert information about an entity unless it
is related to another entity.
■ Example (EMP_DEPT): A new department that has no employees yet cannot
be inserted without NULLs in the employee attributes, which violates entity
integrity if Ssn is the primary key.
■​ Deletion Anomalies: Deleting information about one entity inadvertently
deletes information about another.
■​ Example (EMP_DEPT): Deleting the last employee in a department
loses all information about that department.
■​ Modification Anomalies: Changing a value requires updating multiple
tuples to maintain consistency.
■​ Example (EMP_DEPT): Changing a department manager requires
updating all employee tuples in that department.
○​ Guideline 2: Design base relation schemas to avoid insertion, deletion, or
modification anomalies. If unavoidable, note them and handle them in update
programs.
3.​ Reducing NULL Values in Tuples:
○​ Many NULL values waste space and can make attribute meanings unclear.
Aggregate operations and comparisons with NULLs can be unpredictable.
○​ Interpretations of NULL: Attribute doesn't apply, value is unknown, value is
known but not recorded.
○​ Guideline 3: Avoid placing attributes whose values are frequently NULL in
base relations. If NULLs are unavoidable, they should apply only in
exceptional cases.
4.​ Disallowing the Possibility of Generating Spurious Tuples:
○ Joining relations on attributes that are not a (primary key, foreign key)
pair can produce tuples that were never in the original data (spurious tuples).
○​ Example: Joining EMP_PROJ1 and EMP_LOCS on Plocation when Plocation is
neither a primary nor foreign key in either relation can generate spurious
tuples.
○​ Guideline 4: Design relation schemas so they can be joined correctly using
primary key/foreign key pairs, guaranteeing no spurious tuples are generated.
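Guideline 4 can be illustrated with a small sketch. The two sample rows and the simplified attribute lists below are invented for illustration; relations are modelled as Python sets of tuples, and the join is on Plocation, which is not a key of either projection.

```python
# Hypothetical sample state of EMP_PROJ: (Ssn, Ename, Pnumber, Plocation)
emp_proj = {
    ("111", "Smith", 1, "Bellaire"),
    ("222", "Wong", 2, "Bellaire"),
}

# Project onto EMP_LOCS(Ename, Plocation) and EMP_PROJ1(Ssn, Pnumber, Plocation).
emp_locs = {(ename, ploc) for (_, ename, _, ploc) in emp_proj}
emp_proj1 = {(ssn, pnum, ploc) for (ssn, _, pnum, ploc) in emp_proj}

# Natural join of the two projections on Plocation.
joined = {
    (ssn, ename, pnum, ploc)
    for (ssn, pnum, ploc) in emp_proj1
    for (ename, ploc2) in emp_locs
    if ploc == ploc2
}

# Tuples in the join that were never in the original relation are spurious.
spurious = joined - emp_proj
print(sorted(spurious))
```

With this data the join returns every original tuple plus two extra ones that pair each employee with the other employee's project, because both projects share a location.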

Functional Dependencies
A formal tool for analyzing relational schemas.
● Definition: A functional dependency (FD), denoted by X→Y, between sets of
attributes X and Y (subsets of a relation schema R) means that for any two tuples
t1 and t2 in any legal relation state of R, if t1[X] = t2[X], then t1[Y] = t2[Y].
●​ Meaning: The values of X uniquely determine the values of Y. Y is functionally
dependent on X.
●​ Relation states satisfying FDs are called legal relation states.
●​ FDs are derived from the real-world semantics of the data.
●​ A single counterexample in a relation state is sufficient to disprove an FD.
●​ Examples (from EMP_PROJ schema):
○​ Ssn → Ename (Employee's SSN determines their name)
○​ Pnumber → {Pname, Plocation} (Project number determines project name
and location)
○​ {Ssn, Pnumber} → Hours (Employee's SSN and project number determine
hours worked on that project)
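The counterexample idea above can be sketched as a small check. The function name fd_holds and the sample rows are assumptions made for this illustration. A True result only says that this particular state does not contradict the FD; whether the FD really holds is a matter of the data's semantics.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False  # a single counterexample disproves the FD
    return True


# Hypothetical state of EMP_PROJ (only some attributes shown).
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 10.0, "Ename": "Smith"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 5.0, "Ename": "Smith"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20.0, "Ename": "Wong"},
]

print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))             # True
print(fd_holds(emp_proj, ["Ssn"], ["Hours"]))             # False
print(fd_holds(emp_proj, ["Ssn", "Pnumber"], ["Hours"]))  # True
```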

Normal Forms Based on Primary Keys


Normalization proceeds by testing relation schemas against normal form criteria and
decomposing them if necessary.
●​ Keys:
○ Superkey: A set of attributes S in R such that no two distinct tuples t1 and
t2 in any legal state of R have t1[S] = t2[S].
○​ Key (Candidate Key): A superkey K where no proper subset of K is a
superkey.
○​ Primary Key: One candidate key arbitrarily designated as the primary key.
○​ Secondary Key: Other candidate keys.
○​ Prime Attribute: An attribute that is a member of some candidate key.
○​ Nonprime Attribute: An attribute that is not a member of any candidate key.
●​ First Normal Form (1NF):
○​ Definition: A relation is in 1NF if it does not contain:
■​ Composite attributes
■​ Multivalued attributes
■​ Nested relations (attributes with non-atomic values)
○​ Note: 1NF is considered a basic property of relations in the relational model.
All relations are assumed to be in 1NF by default.
○​ Normalization to 1NF: Involves flattening the relation by creating separate
tuples for each value in a multivalued attribute or nested relation, often
duplicating other attribute values. This can introduce redundancy.
○​ Example: A DEPARTMENT relation with a multivalued Dlocations attribute is
not in 1NF. To make it 1NF, create multiple tuples for the same department,
each with a single Dlocation value.
●​ Second Normal Form (2NF):
○​ Definition: A relation schema R is in 2NF if every non-prime attribute A in R is
fully functionally dependent on the primary key.
○ Full Functional Dependency: An FD X→Y where removing any attribute A from
X means the FD no longer holds, i.e., (X−{A}) does not determine Y.
○ Partial Dependency: An FD X→Y where some attribute A∈X can be removed
and (X−{A})→Y still holds.
○​ Condition for 2NF: No non-prime attribute is partially dependent on the
primary key.
○​ Normalization to 2NF: Decompose the relation into smaller relations where
partial dependencies are removed.
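○ Example: If {Ssn, Pnumber} is the primary key of EMP_PROJ, then Ssn →
Ename is a partial dependency: the non-prime attribute Ename depends on only
part of the key. Pnumber → {Pname, Plocation} is partial for the same reason,
so EMP_PROJ is not in 2NF. Decompose into EP1(Ssn, Pnumber, Hours),
EP2(Ssn, Ename), and EP3(Pnumber, Pname, Plocation).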
●​ Third Normal Form (3NF):
○​ Definition: A relation schema R is in 3NF if it is in 2NF, and no non-prime
attribute A in R is transitively dependent on the primary key.
○ Transitive Functional Dependency: An FD X→Z is transitive if there is a set of
attributes Y, neither a candidate key nor a subset of any key, such that both
X→Y and Y→Z hold.
○​ Condition for 3NF: No non-prime attribute is transitively dependent on the
primary key.
○​ Normalization to 3NF: Decompose the relation to remove transitive
dependencies.
○​ Example: If EMP_DEPT has primary key Ssn and FDs Ssn → Dnumber and
Dnumber → Dmgr_ssn, and Dnumber is not a candidate key, then Dmgr_ssn is
transitively dependent on Ssn. EMP_DEPT is not in 3NF. Decompose into
ED1(Ssn, Ename, Bdate, Address, Dnumber) and ED2(Dnumber, Dname,
Dmgr_ssn).
●​ Informal Summary of Normal Forms:
○​ 1NF: All attributes depend on the key (no composite, multivalued, or nested
attributes).
○​ 2NF: All attributes depend on the whole key (no partial dependencies on the
primary key for non-prime attributes).
○​ 3NF: All attributes depend on nothing but the key (no transitive dependencies
on the primary key for non-prime attributes).
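The key and 2NF definitions above can be tried out with a short sketch based on attribute closure. The helper names (closure, candidate_keys, partial_dependencies) are assumptions for this illustration, and the check looks at every candidate key rather than only the primary key. It enumerates attribute subsets, so it suits only small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


def partial_dependencies(schema, fds):
    """Non-prime attributes determined by a proper subset of some candidate key."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                dependents = closure(part, fds) - prime
                if dependents:
                    found.append((set(part), dependents))
    return found


EMP_PROJ = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
FDS = [
    (frozenset({"Ssn"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),
    (frozenset({"Ssn", "Pnumber"}), frozenset({"Hours"})),
]

print(candidate_keys(EMP_PROJ, FDS))
print(partial_dependencies(EMP_PROJ, FDS))
```

For EMP_PROJ the only candidate key found is {Ssn, Pnumber}, and both Ssn and Pnumber on their own determine non-prime attributes, which is why the relation fails 2NF.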

Boyce-Codd Normal Form (BCNF)


A stricter version of 3NF.
● Definition: A relation schema R is in BCNF if whenever a nontrivial FD X→A
holds in R, then X is a superkey of R.
● Comparison with 3NF: BCNF is strictly stronger than 3NF. Every BCNF relation is
in 3NF, but not vice versa. A relation is in 3NF but not BCNF when some
nontrivial FD X→A has an X that is not a superkey, and in every such FD A is a
prime attribute.
●​ Goal: Design relations in BCNF (or at least 3NF).
●​ Normalization to BCNF: Decompose the relation to satisfy the BCNF condition.
This decomposition might not preserve all functional dependencies.
● Example: The TEACH relation with FDs {student, course} → instructor
and instructor → course. {student, course} is a candidate key. instructor
→ course violates BCNF because instructor is not a superkey, but course
is a prime attribute. This relation is in 3NF but not BCNF.
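A minimal sketch of the BCNF test applied to TEACH follows. The function names are assumptions for this illustration, and the check looks only at the FDs listed for the schema.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Listed FDs that are nontrivial and whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(schema)
    ]


TEACH = {"student", "course", "instructor"}
TEACH_FDS = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]

violations = bcnf_violations(TEACH, TEACH_FDS)
print(violations)  # [(('instructor',), ('course',))]
```

Splitting TEACH into (instructor, course) and (instructor, student) removes the violation, but {student, course} → instructor can then no longer be checked within a single relation, which is the loss of dependency preservation mentioned above.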

Multivalued Dependency and Fourth Normal Form


Addresses issues beyond functional dependencies.
●​ Multivalued Dependency (MVD):
○ Definition: An MVD X→→Y between sets of attributes X and Y (subsets of R)
means that if t1 and t2 are two tuples in a legal state of R such that
t1[X] = t2[X], then there exist tuples t3 and t4 in R such that:
■ t3[X] = t4[X] = t1[X]
■ t3[Y] = t1[Y] and t4[Y] = t2[Y]
■ t3[Z] = t2[Z] and t4[Z] = t1[Z] (where Z = R−(X∪Y))
○​ Meaning: The set of Y values associated with a given X value is independent
of the set of Z values associated with the same X value.
○​ MVDs occur when two or more attributes in a table are independent of each
other but both depend on a third attribute. An MVD involves at least three
attributes.
○​ Notation: X→→Y or X→→Y∣Z (where Z is the remaining attributes).
○​ Example: In a relation Name Course_work Hobby, if Name →→ Course_work
and Name →→ Hobby hold, it means the set of courses taken by a student is
independent of the set of hobbies for that student.
●​ Fourth Normal Form (4NF):
○​ Definition: A relation schema R is in 4NF if, whenever a non-trivial multivalued
dependency X→→Y holds in R, X is a superkey of R.
○​ Non-trivial MVD: Y is not a subset of X, and X∪Y is a proper subset of R.
○​ Condition for 4NF: No non-trivial MVD exists unless the determinant (X) is a
superkey.
○​ Relationship: 4NF builds on BCNF. A relation in 4NF is also in BCNF.
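The MVD definition can be checked against a concrete relation state with a short sketch. The function name mvd_holds and the sample rows are assumptions for this illustration, and X and Y are taken to be disjoint. Because the loop visits every ordered pair (t1, t2), the required tuple t4 is covered when the pair is visited in the other order.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one relation state (a set of tuples over attrs)."""
    rows = set(rows)
    pos = {a: i for i, a in enumerate(attrs)}
    for t1 in rows:
        for t2 in rows:
            if any(t1[pos[a]] != t2[pos[a]] for a in x):
                continue  # the definition only constrains tuples that agree on X
            # Build t3: X and Y values from t1, remaining (Z) values from t2.
            t3 = tuple(
                t1[pos[a]] if (a in x or a in y) else t2[pos[a]] for a in attrs
            )
            if t3 not in rows:
                return False
    return True


ATTRS = ("Name", "Course_work", "Hobby")
# Hypothetical state: every course of Ann is paired with every hobby of Ann.
state = {
    ("Ann", "DBMS", "Chess"),
    ("Ann", "DBMS", "Tennis"),
    ("Ann", "OS", "Chess"),
    ("Ann", "OS", "Tennis"),
}

print(mvd_holds(state, ATTRS, ["Name"], ["Course_work"]))  # True
print(mvd_holds(state - {("Ann", "OS", "Tennis")}, ATTRS, ["Name"], ["Course_work"]))  # False
```

Name is not a superkey here, so this relation is not in 4NF; the usual repair is to split it into (Name, Course_work) and (Name, Hobby).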
Join Dependencies and Fifth Normal Form
Addresses cases where decomposition is necessary even if no MVDs exist.
●​ Join Dependency (JD):
○ Definition: A relation R has a join dependency JD(R1, R2, ..., Rn) if every legal
state of R is equal to the join of its projections onto R1, R2, ..., Rn.
○ Meaning: The relation R can be losslessly decomposed into R1, R2, ..., Rn.
○ A JD holds when joining the decomposed tables recovers exactly the original
table R, with no lost data and no spurious tuples.
●​ Fifth Normal Form (5NF):
○ Definition: A relation R is in 5NF if and only if it satisfies the following
conditions:
1. R is in 4NF.
2. It cannot be further non-loss decomposed in a useful way: every non-trivial
join dependency JD(R1, R2, ..., Rn) that holds in R has each Ri as a superkey of R.
○​ Note: 5NF is also known as Project-Join Normal Form (PJNF).
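Whether a given relation state satisfies a join dependency can be tested by projecting and re-joining, as in the sketch below. The helper names and the SUPPLY-style sample rows are assumptions for this illustration, and the components are assumed to cover all attributes of the relation.

```python
def project(rows, attrs, names):
    """Project a set of tuples over attrs onto the attribute list names."""
    positions = [attrs.index(a) for a in names]
    return {tuple(t[i] for i in positions) for t in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two relation states; returns (rows, attribute list)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                joined.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return joined, attrs1 + extra


def jd_holds(rows, attrs, components):
    """True if this state equals the join of its projections onto components."""
    attrs = list(attrs)
    joined, joined_attrs = project(rows, attrs, components[0]), list(components[0])
    for names in components[1:]:
        joined, joined_attrs = natural_join(
            joined, joined_attrs, project(rows, attrs, names), list(names)
        )
    # Put the columns back in the original order before comparing.
    restored = {tuple(t[joined_attrs.index(a)] for a in attrs) for t in joined}
    return restored == set(rows)


ATTRS = ["Sname", "Part_name", "Proj_name"]
COMPONENTS = [["Sname", "Part_name"], ["Sname", "Proj_name"], ["Part_name", "Proj_name"]]
supply = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Smith", "Bolt", "ProjY"),
}

print(jd_holds(supply, ATTRS, COMPONENTS))                                    # True
print(jd_holds(supply - {("Smith", "Bolt", "ProjY")}, ATTRS, COMPONENTS))     # False
```

In the second call the join of the three projections regenerates the removed tuple, so the smaller state does not equal its own re-join and the dependency fails for it.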

SQL (Structured Query Language)


SQL is the standard language for relational databases.

SQL Data Definition and Data Types


● Terminology: SQL uses the terms table, row, and column for the relational
model terms relation, tuple, and attribute.
●​ CREATE statement: Main command for data definition.
●​ SQL Schema: A named collection of schema elements (tables, constraints, views,
domains, etc.).
○​ Syntax: CREATE SCHEMA schema_name AUTHORIZATION user_id;
○​ Example: CREATE SCHEMA COMPANY AUTHORIZATION 'Jsmith';
●​ Catalog: A named collection of schemas.
●​ CREATE TABLE Command: Specifies a new relation.
○​ Syntax:​
CREATE TABLE table_name (​
attribute1 datatype [constraints],​
attribute2 datatype [constraints],​
...​
[table_constraints]​
);​

○​ Can specify schema: CREATE TABLE COMPANY.EMPLOYEE ...


●​ Base Tables: Relations whose tuples are physically stored.
●​ Virtual Relations (Views): Created by CREATE VIEW, do not correspond to
physical files.
●​ Attribute Data Types:
○ Numeric: INTEGER (INT), SMALLINT, FLOAT, REAL, DOUBLE PRECISION, and
formatted numbers DECIMAL(i, j) or NUMERIC(i, j).
○​ Character-string: CHAR(n) (fixed), VARCHAR(n) (varying).
○​ Bit-string: BIT(n) (fixed), BIT VARYING(n) (varying).
○ Boolean: BOOLEAN, with values TRUE and FALSE; because of NULLs, a third
value, UNKNOWN, is also possible (three-valued logic).
○​ DATE: YYYY-MM-DD.
○​ TIMESTAMP: Includes date and time, optional time zone.
○​ INTERVAL: Relative value for date/time arithmetic.
●​ Domain: A named set of allowed values for an attribute.
○​ Syntax: CREATE DOMAIN domain_name AS data_type [constraints];
○​ Example: CREATE DOMAIN SSN_TYPE AS CHAR(9);
●​ User Defined Types (UDTs): Supported for object-oriented features.
○​ Syntax: CREATE TYPE type_name ...;
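The CREATE TABLE syntax above can be tried with Python's built-in sqlite3 module, as in this sketch. SQLite is an assumption made only so the example runs without a server: it has no CREATE SCHEMA or CREATE DOMAIN, and it treats declared types such as CHAR(9) loosely (type affinity) rather than enforcing them. The column list is a shortened, illustrative EMPLOYEE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.execute("""
    CREATE TABLE EMPLOYEE (
        Ssn     CHAR(9)        NOT NULL,
        Ename   VARCHAR(30)    NOT NULL,
        Bdate   DATE,
        Salary  DECIMAL(10, 2),
        Dno     INT
    )
""")

# PRAGMA table_info is SQLite-specific; each row describes one column:
# (cid, name, declared type, notnull flag, default value, pk flag).
columns = conn.execute("PRAGMA table_info(EMPLOYEE)").fetchall()
for _, name, declared_type, notnull, _, _ in columns:
    print(name, declared_type, "NOT NULL" if notnull else "")
```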

Schema Change Statements


●​ DROP Command: Removes named schema elements.
○​ Syntax: DROP element_type element_name {CASCADE | RESTRICT};
○​ CASCADE: Drops dependent objects.
○​ RESTRICT: Drops only if no dependent objects exist.
○​ Example: DROP SCHEMA COMPANY CASCADE; (Removes schema and all its
elements)
○​ Example: DROP TABLE EMPLOYEE CASCADE; (Deletes data and table
definition)
●​ ALTER Command: Changes the definition of schema elements (primarily tables).
○​ Actions: Add/drop columns, change column definitions, add/drop constraints.
○​ Add Column Syntax: ALTER TABLE table_name ADD COLUMN column_name
datatype [constraints];
○​ Example: ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job
VARCHAR(12);
○​ Drop Column Syntax: ALTER TABLE table_name DROP COLUMN
column_name {CASCADE | RESTRICT};
○​ Example: ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address
CASCADE;
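A short sketch of schema changes, again using Python's sqlite3 module as an assumed stand-in for a full SQL system. SQLite does not accept the CASCADE and RESTRICT keywords on DROP, so they are omitted here; the table and row are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn CHAR(9) PRIMARY KEY, Ename VARCHAR(30))")
conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith')")

# ALTER TABLE ... ADD COLUMN: rows that already exist get NULL in the new column.
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN Job VARCHAR(12)")
rows_after_add = conn.execute("SELECT Ssn, Ename, Job FROM EMPLOYEE").fetchall()
print(rows_after_add)  # [('123456789', 'Smith', None)]

# DROP TABLE removes both the data and the table definition.
conn.execute("DROP TABLE EMPLOYEE")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)  # []
```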
Specifying Constraints in SQL
●​ Basic Relational Model Constraints:
○​ Key Constraint: Primary key values must be unique.
○​ Entity Integrity Constraint: Primary key values cannot be NULL.
○​ Referential Integrity Constraint: Foreign key values must either be NULL or
match a primary key value in the referenced table.
●​ Attribute Constraints (within CREATE TABLE):
○​ DEFAULT <value>: Specifies a default value.
○​ NOT NULL: Prohibits NULL values.
○​ CHECK (condition): Specifies a condition that must be true for the attribute
value.
■​ Example: Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber <
21);
●​ Specifying Key and Referential Integrity Constraints:
○​ PRIMARY KEY (attribute_list): Specifies the primary key.
■​ Example: Dnumber INT PRIMARY KEY; or PRIMARY KEY (Dnumber);
○​ UNIQUE (attribute_list): Specifies an alternate (candidate) key.
■​ Example: Dname VARCHAR(15) UNIQUE; or UNIQUE (Dname);
○​ FOREIGN KEY (attribute_list) REFERENCES referenced_table
(referenced_attribute_list) [ON DELETE action] [ON UPDATE action]: Specifies
a foreign key and referential triggered actions.
■​ Referential Triggered Actions: SET NULL, CASCADE, SET DEFAULT.
■​ Example: FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT
(Dnumber) ON DELETE SET NULL ON UPDATE CASCADE;
●​ Naming Constraints: Use the CONSTRAINT constraint_name clause for easier
management.
○​ Example: CONSTRAINT PK_EMPLOYEE PRIMARY KEY (Ssn);
● Additional Constraints: CHECK clauses at the end of CREATE TABLE are row-based:
they are checked for each tuple individually when it is inserted or modified.
○​ Example: CHECK (Dept_create_date <= Mgr_start_date);
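The constraints above can be seen in action with the sketch below, which uses Python's sqlite3 module as an assumed test bed. Two SQLite specifics apply: foreign keys are enforced only after PRAGMA foreign_keys = ON, and constraint violations surface in Python as sqlite3.IntegrityError. The helper rejected and the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21),
        Dname   VARCHAR(15) NOT NULL,
        CONSTRAINT PK_DEPARTMENT PRIMARY KEY (Dnumber),
        UNIQUE (Dname)
    );
    CREATE TABLE EMPLOYEE (
        Ssn     CHAR(9) NOT NULL,
        Ename   VARCHAR(30),
        Dnumber INT,
        CONSTRAINT PK_EMPLOYEE PRIMARY KEY (Ssn),
        FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT (Dnumber)
            ON DELETE SET NULL ON UPDATE CASCADE
    );
    INSERT INTO DEPARTMENT VALUES (5, 'Research');
    INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith', 5);
""")


def rejected(sql):
    """Run sql and report whether a constraint stopped it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


check_rejected = rejected("INSERT INTO DEPARTMENT VALUES (25, 'Sales')")        # CHECK
unique_rejected = rejected("INSERT INTO DEPARTMENT VALUES (6, 'Research')")     # UNIQUE
fk_rejected = rejected("INSERT INTO EMPLOYEE VALUES ('999999999', 'Wong', 7)")  # FOREIGN KEY

# ON UPDATE CASCADE: the employee's Dnumber follows the department's new number.
conn.execute("UPDATE DEPARTMENT SET Dnumber = 6 WHERE Dnumber = 5")
after_update = conn.execute("SELECT Dnumber FROM EMPLOYEE").fetchall()

# ON DELETE SET NULL: deleting the department leaves the employee with NULL.
conn.execute("DELETE FROM DEPARTMENT WHERE Dnumber = 6")
after_delete = conn.execute("SELECT Dnumber FROM EMPLOYEE").fetchall()

print(check_rejected, unique_rejected, fk_rejected, after_update, after_delete)
```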

Retrieval Queries in SQL


●​ SELECT Statement: The basic command for retrieving data.
●​ Multiset/Bag Behavior: SQL tables can have duplicate tuples (unlike the strict
relational model).
○​ DISTINCT: Used in the SELECT clause to remove duplicate tuples from the
result.
●​ Basic Form:​
SELECT [DISTINCT] attribute_list​
FROM table_list​
[WHERE condition];​
●​ Logical Comparison Operators: =, <, <=, >, >=, <>.
●​ Projection Attributes: Attributes listed after SELECT.
●​ Selection Condition: Boolean condition in the WHERE clause. Includes join
conditions.
●​ Ambiguous Attribute Names: Qualify attribute names with the relation name if
they exist in multiple tables in the FROM clause.
○​ Example: SELECT EMPLOYEE.Ename, DEPARTMENT.Dname ... FROM
EMPLOYEE, DEPARTMENT WHERE ...
●​ Aliasing and Renaming: Use AS to define aliases for table names (tuple
variables) or rename result columns.
○​ Table Alias Example: FROM EMPLOYEE AS E, EMPLOYEE AS S WHERE
E.Super_ssn = S.Ssn;
○​ Column Rename Example: SELECT 1.1 * E.Salary AS Increased_sal ...
● Unspecified WHERE Clause: No condition is applied, so all tuples qualify. With
more than one table in the FROM clause, the result is the CROSS PRODUCT
(Cartesian Product) of those tables.
●​ Asterisk (*): Selects all attributes of the selected tuples. Can be prefixed by a
relation name (EMPLOYEE.*).
● Set Operations: UNION, EXCEPT (difference), INTERSECT treat results as sets and
eliminate duplicates. UNION ALL, EXCEPT ALL, INTERSECT ALL keep duplicates
(multisets). The operand tables must be type compatible.
●​ Substring Pattern Matching: LIKE operator.
○​ %: Matches zero or more characters.
○ _: Matches exactly one character.
○​ Example: WHERE Address LIKE '%Houston, TX%';
●​ BETWEEN Operator: Checks if a value is within a range.
○​ Example: WHERE Salary BETWEEN 30000 AND 40000;
●​ Arithmetic Operators: +, -, *, / can be used in the SELECT clause.
●​ ORDER BY Clause: Sorts the result.
○ Syntax: ORDER BY attribute1 [ASC | DESC], attribute2 [ASC | DESC], ...; (ASC is
the default)
○​ Example: ORDER BY Dname DESC, Lname ASC;
●​ Complete SELECT Syntax:​
SELECT [DISTINCT] attribute_list​
FROM table_list​
[WHERE conditions]​
[GROUP BY grouping_attribute_list]​
[HAVING condition]​
[ORDER BY attribute_list];​
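Several of the retrieval features above are combined in the following sketch, run through Python's sqlite3 module (an assumption made so the queries are executable; the tables are cut down and the rows are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INT PRIMARY KEY, Dname VARCHAR(15));
    CREATE TABLE EMPLOYEE (
        Ssn CHAR(9) PRIMARY KEY, Ename VARCHAR(30),
        Address VARCHAR(40), Salary INT, Dno INT
    );
    INSERT INTO DEPARTMENT VALUES (4, 'Administration'), (5, 'Research');
    INSERT INTO EMPLOYEE VALUES
        ('111', 'Smith',  '731 Fondren, Houston, TX', 30000, 5),
        ('222', 'Wong',   '638 Voss, Houston, TX',    40000, 5),
        ('333', 'Zelaya', '3321 Castle, Spring, TX',  25000, 4);
""")


def query(sql):
    return conn.execute(sql).fetchall()


# Join condition in WHERE, table aliases, and a two-level ORDER BY.
by_dept = query("""
    SELECT E.Ename, D.Dname
    FROM EMPLOYEE AS E, DEPARTMENT AS D
    WHERE E.Dno = D.Dnumber
    ORDER BY D.Dname DESC, E.Ename ASC
""")

# LIKE pattern matching combined with BETWEEN.
houston = query("""
    SELECT Ename FROM EMPLOYEE
    WHERE Address LIKE '%Houston, TX%' AND Salary BETWEEN 30000 AND 40000
    ORDER BY Ename
""")

# Duplicates are kept unless DISTINCT is given.
all_dnos = query("SELECT Dno FROM EMPLOYEE ORDER BY Dno")
distinct_dnos = query("SELECT DISTINCT Dno FROM EMPLOYEE ORDER BY Dno")

# No WHERE clause over two tables gives the cross product: 3 x 2 rows.
cross_count = query("SELECT COUNT(*) FROM EMPLOYEE, DEPARTMENT")[0][0]

# GROUP BY with HAVING keeps only departments with more than one employee.
big_depts = query("SELECT Dno, COUNT(*) FROM EMPLOYEE GROUP BY Dno HAVING COUNT(*) > 1")

print(by_dept, houston, all_dnos, distinct_dnos, cross_count, big_depts, sep="\n")
```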
INSERT, DELETE, and UPDATE Statements
Commands for modifying data.
●​ INSERT: Adds tuples to a relation.
○​ Syntax (single tuple): INSERT INTO table_name [(attribute_list)] VALUES
(value_list);
○​ Attribute values must match order and type. Constraints are enforced.
○​ Syntax (from query result): INSERT INTO table_name [(attribute_list)]
SELECT ... FROM ... WHERE ...;
○ Bulk Loading (example syntax; support and exact form vary by DBMS): CREATE
TABLE D5EMPS LIKE EMPLOYEE (SELECT E.* FROM EMPLOYEE AS E WHERE E.Dno = 5)
WITH DATA; (Creates and loads a new table)
●​ DELETE: Removes tuples from a relation.
○​ Syntax: DELETE FROM table_name [WHERE condition];
○​ Removes tuples satisfying the WHERE condition.
○​ A missing WHERE clause deletes all tuples.
○​ Referential integrity is enforced (potentially triggering CASCADE if defined).
●​ UPDATE: Modifies attribute values of selected tuples.
○​ Syntax: UPDATE table_name SET attribute1 = value1, attribute2 = value2, ...
[WHERE condition];
○​ Modifies tuples satisfying the WHERE condition.
○​ Referential integrity is enforced.
○ Example: UPDATE PROJECT SET Plocation = 'Bellaire', Dnum = 5 WHERE
Pnumber = 10;
○​ Example: UPDATE EMPLOYEE SET Salary = Salary * 1.1 WHERE Dno = 5; (Gives
a 10% raise to employees in department 5)
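The three modification commands can be exercised with the sketch below, using Python's sqlite3 module as an assumed test bed with invented rows. SQLite does not support the CREATE TABLE ... LIKE ... WITH DATA form, so its own CREATE TABLE ... AS SELECT is used here as a stand-in for creating and loading D5EMPS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Ssn CHAR(9) PRIMARY KEY, Ename VARCHAR(30), Salary INT, Dno INT)"
)

# INSERT a single tuple with an explicit attribute list.
conn.execute(
    "INSERT INTO EMPLOYEE (Ssn, Ename, Salary, Dno) VALUES ('111', 'Smith', 30000, 5)"
)
# INSERT several tuples; the ? placeholders are filled in by the sqlite3 module.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [("222", "Wong", 40000, 5), ("333", "Zelaya", 25000, 4)],
)

# Create and load a table from a query result (SQLite's form, see note above).
conn.execute("CREATE TABLE D5EMPS AS SELECT E.* FROM EMPLOYEE AS E WHERE E.Dno = 5")
d5_count = conn.execute("SELECT COUNT(*) FROM D5EMPS").fetchone()[0]

# UPDATE: 10% raise for department 5 only (results are floating point).
conn.execute("UPDATE EMPLOYEE SET Salary = Salary * 1.1 WHERE Dno = 5")
salaries = dict(conn.execute("SELECT Ssn, Salary FROM EMPLOYEE"))

# DELETE with a WHERE clause removes matching tuples only.
deleted = conn.execute("DELETE FROM EMPLOYEE WHERE Dno = 4").rowcount
# DELETE without WHERE empties the table but keeps its definition.
conn.execute("DELETE FROM D5EMPS")
d5_left = conn.execute("SELECT COUNT(*) FROM D5EMPS").fetchone()[0]

print(d5_count, salaries, deleted, d5_left)
```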
Additional Features of SQL
●​ Techniques for complex queries (subqueries, joins, etc.).
●​ Programming interfaces: Embedded SQL, Dynamic SQL, SQL/CLI (ODBC),
SQL/PSM (Stored Modules).
●​ Commands for physical database design (CREATE INDEX).
●​ Transaction control commands (COMMIT, ROLLBACK).
●​ Specifying privileges (GRANT, REVOKE).
●​ Constructs for triggers (CREATE TRIGGER).
●​ Object-relational features (UDTs, defining relations as classes).
●​ Integration with technologies like XML and OLAP.
