The document discusses SQL queries and database concepts like SELECT statements, FROM and WHERE clauses, Cartesian products, and different types of joins. It provides examples of SQL queries on sample relations to retrieve certain fields based on conditions. Key concepts covered include selecting fields, filtering tuples based on conditions, joining multiple relations based on common attributes, and using subqueries.
Advance SQL
SQL and More Databases Final
Simple SQL Queries
An SQL query has the form:

    SELECT ... FROM ... WHERE ...;

The SELECT clause indicates which attributes should appear in the output.
The FROM clause gives the relation(s) the query refers to.
The WHERE clause is a Boolean expression indicating which tuples are of interest.
The query result is a relation. Note that the result relation is unnamed.

Example SQL Query
Relation schema: Course (courseNumber, name, noOfCredits)
Query: Find all the courses stored in the database
Query in SQL:
    SELECT * FROM Course;
Note: * means all the attributes in the relations involved.

Example SQL Query
Relation schema: Movie (title, year, length, filmType)
Query: Find the titles of all movies stored in the database
Query in SQL:
    SELECT title FROM Movie;

Example SQL Query
Relation schema: Student (ID, firstName, lastName, address, GPA)
Query: Find the ID of every student who has GPA > 3
Query in SQL:
    SELECT ID FROM Student WHERE GPA > 3;

Example SQL Query
Relation schema: Student (ID, firstName, lastName, address, GPA)
Query: Find the ID and last name of every student with first name John, who has GPA > 3
Query in SQL:
    SELECT ID, lastName FROM Student WHERE firstName = 'John' AND GPA > 3;

WHERE clause
The expressions that may follow WHERE are conditions.
Standard comparison operators include { =, <>, <, >, <=, >= }.
The values that may be compared include constants and attributes of the relation(s) mentioned in the FROM clause.
Simple expressions:
    A op Value
    A op B
where A and B are attributes and op is a comparison operator.
We may also apply the usual arithmetic operators, +, -, *, /, etc.
to numeric values before comparing them, for example:
    (year - 1930) * (year - 1930) < 100
The result of a comparison is a Boolean value, TRUE or FALSE.
Boolean expressions can be combined by the logical operators AND, OR, and NOT.

Example SQL Query
Relation schema: Movie (title, year, length, filmType)
Query: Find the titles of all color movies produced in 1990
Query in SQL:
    SELECT title FROM Movie WHERE filmType = 'color' AND year = 1990;

Example SQL Query
Relation schema: Movie (title, year, length, filmType)
Query: Find the titles of all color movies that are either made after 1970 or are less than 90 minutes long
Query in SQL:
    SELECT title FROM Movie WHERE (year > 1970 OR length < 90) AND filmType = 'color';
Note on precedence rules: AND takes precedence over OR, and NOT takes precedence over both.

Products and Joins
SQL has a simple way to couple relations in one query: list each relevant relation in the FROM clause.
All the relations in the FROM clause are coupled through the Cartesian product (×, in algebra).

Cartesian Product
From set theory: the Cartesian product of two sets R and S is the set of all pairs (a, b) such that a ∈ R and b ∈ S. It is denoted R × S.
Note: In general, R × S ≠ S × R.

Example
Instance R:
    A  B
    1  2
    3  4
Instance S:
    B  C   D
    2  5   6
    4  7   8
    9  10  11
R × S:
    A  R.B  S.B  C   D
    1  2    2    5   6
    1  2    4    7   8
    1  2    9    10  11
    3  4    2    5   6
    3  4    4    7   8
    3  4    9    10  11

Example
    SELECT * FROM Student, Course;
Result:
    ID   firstName  lastName  GPA  Address      courseNumber  name             noOfCredits
    111  Joe        Smith     4.0  45 Pine av.  Comp352       Data structures  3
    111  Joe        Smith     4.0  45 Pine av.  Comp353       Databases        4
    222  Sue        Brown     3.1  71 Main st.  Comp352       Data structures  3
    222  Sue        Brown     3.1  71 Main st.  Comp353       Databases        4
    333  Ann        Johns     3.7  39 Bay st.   Comp352       Data structures  3
    333  Ann        Johns     3.7  39 Bay st.   Comp353       Databases        4
Instance of Student:
    ID   firstName  lastName  GPA  Address
    111  Joe        Smith     4.0  45 Pine av.
    222  Sue        Brown     3.1  71 Main st.
    333  Ann        Johns     3.7  39 Bay st.
Instance of Course:
    courseNumber  name             noOfCredits
    Comp352       Data structures  3
    Comp353       Databases        4

Example (same instances of Student and Course)
    SELECT ID, courseNumber FROM Student, Course;
Result:
    ID   courseNumber
    111  Comp352
    111  Comp353
    222  Comp352
    222  Comp353
    333  Comp352
    333  Comp353

Example
Relation schemas:
    Student (ID, firstName, lastName, address, GPA)
    Ugrad (ID, major)
Query: Find all information available about every undergraduate student
We can try to compute the Cartesian product (×):
    SELECT * FROM Student, Ugrad;

Example
Instance of Student:
    ID   firstName  lastName  GPA  Address
    111  Joe        Smith     4.0  45 Pine av.
    222  Sue        Brown     3.1  71 Main st.
    333  Ann        Johns     3.7  39 Bay st.
Instance of Ugrad:
    ID   major
    111  CS
    333  EE
    SELECT * FROM Student, Ugrad;
Result:
    ID   firstName  lastName  GPA  Address      ID   major
    111  Joe        Smith     4.0  45 Pine av.  111  CS
    111  Joe        Smith     4.0  45 Pine av.  333  EE
    222  Sue        Brown     3.1  71 Main st.  111  CS
    222  Sue        Brown     3.1  71 Main st.  333  EE
    333  Ann        Johns     3.7  39 Bay st.   111  CS
    333  Ann        Johns     3.7  39 Bay st.   333  EE
Which tuples should be in the query result and which shouldn't?

Example (same instances of Student and Ugrad)
    SELECT * FROM Student, Ugrad WHERE Student.ID = Ugrad.ID;
Result:
    ID   firstName  lastName  GPA  Address      ID   major
    111  Joe        Smith     4.0  45 Pine av.  111  CS
    333  Ann        Johns     3.7  39 Bay st.   333  EE
Joins in SQL
The above query is an example of a join operation.
There are various kinds of joins and we will study them later in detail.
To join relations R1, ..., Rn in SQL:
    List all these relations in the FROM clause
    Express the conditions in the WHERE clause in order to get the desired join

Joining Relations
Relation schemas:
    Movie (title, year, length, filmType)
    Owns (title, year, studioName)
Query: Find title, length, and studio name of every movie
Query in SQL:
    SELECT Movie.title, Movie.length, Owns.studioName
    FROM Movie, Owns
    WHERE Movie.title = Owns.title AND Movie.year = Owns.year;
Is Owns in Owns.studioName necessary?
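The Student and Ugrad example above can be replayed with Python's built-in sqlite3 module. This is only a sketch: the in-memory database, the column types, and the choice to project three columns in the join are assumptions made for illustration; the rows are the ones from the slides.

```python
import sqlite3

# In-memory database holding the two sample instances from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (ID INTEGER, firstName TEXT, lastName TEXT,
                          address TEXT, GPA REAL);
    CREATE TABLE Ugrad (ID INTEGER, major TEXT);
    INSERT INTO Student VALUES (111, 'Joe', 'Smith', '45 Pine av.', 4.0);
    INSERT INTO Student VALUES (222, 'Sue', 'Brown', '71 Main st.', 3.1);
    INSERT INTO Student VALUES (333, 'Ann', 'Johns', '39 Bay st.', 3.7);
    INSERT INTO Ugrad VALUES (111, 'CS');
    INSERT INTO Ugrad VALUES (333, 'EE');
""")

# Cartesian product: every Student tuple paired with every Ugrad tuple.
product = conn.execute("SELECT * FROM Student, Ugrad").fetchall()

# Join: keep only the pairs that agree on ID.
joined = conn.execute(
    "SELECT Student.ID, lastName, major FROM Student, Ugrad "
    "WHERE Student.ID = Ugrad.ID ORDER BY Student.ID"
).fetchall()

print(len(product))  # 3 students x 2 undergrad rows
print(joined)
```

The product has 3 × 2 = 6 tuples, while the join keeps only the two pairs whose IDs match, which is the difference the two result tables illustrate.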
Joining Relations
Relation schemas:
    Movie (title, year, length, filmType)
    Owns (title, year, studioName)
Query: Find the title and length of every movie produced by Disney
Query in SQL:
    SELECT Movie.title, length
    FROM Movie, Owns
    WHERE Movie.title = Owns.title AND Movie.year = Owns.year
      AND studioName = 'Disney';

Joining Relations
Relation schemas:
    Movie (title, year, length, filmType)
    Owns (title, year, studioName)
    StarsIn (title, year, starName)
Query: Find the title and length of each movie with Julia Roberts, produced by Disney
Query in SQL:
    SELECT Movie.title, Movie.length
    FROM Movie, Owns, StarsIn
    WHERE Movie.title = Owns.title AND Movie.year = Owns.year
      AND Movie.title = StarsIn.title AND Movie.year = StarsIn.year
      AND studioName = 'Disney' AND starName = 'Julia Roberts';

Example (JR stands for Julia Roberts)
StarsIn:
    title  year  starName
    T1     1990  JR
    T2     1991  JR
Owns:
    title  year  studioName
    T1     1990  Disney
    T2     1991  MGM
Movie:
    title  year  length  filmType
    T1     1990  124     color
    T2     1991  144     color
Result of the query above:
    title  length
    T1     124

Example
Relation schemas:
    Movie (title, year, length, filmType, studioName, producerC#)
    Exec (name, address, cert#, netWorth)
Query: Find the name of the producer of Star Wars
Query in SQL:
    SELECT Exec.name
    FROM Movie, Exec
    WHERE Movie.title = 'Star Wars' AND Movie.producerC# = Exec.cert#;

Example (same schemas and query)
Query with subquery:
    SELECT name
    FROM Exec
    WHERE cert# = (SELECT producerC# FROM Movie WHERE title = 'Star Wars');

Example
Relation schemas:
    Movie (title, year, length, filmType, studioName, producerC#)
    Exec (name, address, cert#, netWorth)
    StarsIn (title, year, starName)
Query: Find the names of the producers of
Harrison Ford's movies
Query in SQL:
    SELECT name
    FROM Exec
    WHERE cert# IN (SELECT producerC#
                    FROM Movie
                    WHERE (title, year) IN (SELECT title, year
                                            FROM StarsIn
                                            WHERE starName = 'Harrison Ford'));

Example (same schemas and query, written as a join)
Query in SQL:
    SELECT Exec.name
    FROM Exec, Movie, StarsIn
    WHERE Exec.cert# = Movie.producerC#
      AND Movie.title = StarsIn.title AND Movie.year = StarsIn.year
      AND starName = 'Harrison Ford';

Correlated Subqueries
Relation schema: Movie (title, year, length, filmType, studioName, producerC#)
Query: Find movie titles that appear more than once
Query in SQL:
    SELECT title
    FROM Movie Old
    WHERE year < ANY (SELECT year FROM Movie WHERE title = Old.title);
Note the scopes of the variables in this query.

Correlated Subqueries
The condition in the outer WHERE is true only if there is a movie with the same title as Old.title that has a later year.
The query will produce a title one fewer times than there are movies with that title.
What would be the result if we used <> instead of < ?
For a movie title appearing 3 times, we would get 3 copies of the title in the output.

Aggregation in SQL
SQL provides five operators that apply to a column of a relation and produce some kind of summary.
These operators are called aggregations.
They are used by applying them to a scalar-valued expression, typically a column name, in a SELECT clause.

Aggregation Operators
    SUM    the sum of values in the column
    AVG    the average of values in the column
    MIN    the least value in the column
    MAX    the greatest value in the column
    COUNT  the number of values in the column, including duplicates, unless the keyword DISTINCT is used explicitly

Example
Relation schema: Exec (name, address, cert#, netWorth)
Query: Find the average net worth of all movie executives
Query in SQL:
    SELECT AVG(netWorth) FROM Exec;
The result is the sum of all values in the column netWorth divided by the number of these values.
In general, if a tuple appears n times in a relation, it will be counted n times when computing the average.

Example
Relation schema: Exec (name, address, cert#, netWorth)
Query: How many tuples are there in the Exec relation?
Query in SQL:
    SELECT COUNT(*) FROM Exec;
The use of * as a parameter is unique to COUNT; using * does not make sense for other aggregation operations.

Example
Relation schema: Exec (name, address, cert#, netWorth)
Query: How many different names are there in the Exec relation?
Query in SQL:
    SELECT COUNT(DISTINCT name) FROM Exec;
At query processing time, the system first eliminates the duplicates from column name, and then counts the number of values left.

Aggregation -- Grouping
Often we need to consider the tuples in an SQL query in groups, with regard to the value of some other column(s).
Example: suppose we want to compute the total length in minutes of movies produced by each studio:
    Movie (title, year, length, filmType, studioName, producerC#)
We must group the tuples in the Movie relation according to their studio, and get the sum of the length values within each group; the result would be something like:
    studio  SUM(length)
    Disney  12345
    MGM     54321
Aggregation -- Grouping
Relation schema: Movie (title, year, length, filmType, studioName, producerC#)
Query: What is the total length in minutes produced by each studio?
Query in SQL:
    SELECT studioName, SUM(length)
    FROM Movie
    GROUP BY studioName;
Whatever aggregation is used in the SELECT clause will be applied only within groups.
Only those attributes mentioned in the GROUP BY clause may appear unaggregated in the SELECT clause.
Can we use GROUP BY without using aggregation? (Yes/No)

Aggregation -- Grouping
Relation schemas:
    Movie (title, year, length, filmType, studioName, producerC#)
    Exec (name, address, cert#, netWorth)
Query: For each producer (name), list the total length of the films produced
Query in SQL:
    SELECT Exec.name, SUM(Movie.length)
    FROM Exec, Movie
    WHERE Movie.producerC# = Exec.cert#
    GROUP BY Exec.name;

Aggregation -- HAVING clause
We might be interested in not all but only some groups of tuples, namely those that satisfy certain conditions.
We can follow a GROUP BY clause with a HAVING clause.
HAVING is followed by some conditions about the group.
HAVING is normally used together with GROUP BY; if GROUP BY is omitted, standard SQL treats the whole result as a single group.

Aggregation -- HAVING clause
Relation schemas:
    Movie (title, year, length, filmType, studioName, producerC#)
    Exec (name, address, cert#, netWorth)
Query: For those producers who made at least one film prior to 1930, list the total length of the films produced
Query in SQL:
    SELECT Exec.name, SUM(Movie.length)
    FROM Exec, Movie
    WHERE producerC# = cert#
    GROUP BY Exec.name
    HAVING MIN(Movie.year) < 1930;

Aggregation -- HAVING clause
This query chooses the group based on the property of the group
SELECT Exec.name, SUM(Movie.length) FROM Exec, Movie WHERE producerC# = cert# GROUP BY Exec.name HAVING MIN(Movie.year) < 1930;
This query chooses the movies based on the property of each movie tuple
SELECT Exec.name, SUM(Movie.length) FROM Exec, Movie WHERE producerC# = cert# AND Movie.year < 1930 GROUP BY Exec.name;
Note the difference!
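The difference between the two queries can be checked on a tiny instance with Python's sqlite3 module. This is a hedged sketch: the rows are made up for illustration, and the attribute names producerC and cert drop the trailing # of the slides so that they need no quoting in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Exec  (name TEXT, cert INTEGER);
    CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, producerC INTEGER);
    -- Hypothetical data: Ann produced one film before 1930 and one after.
    INSERT INTO Exec  VALUES ('Ann', 1);
    INSERT INTO Exec  VALUES ('Bob', 2);
    INSERT INTO Movie VALUES ('Silent One', 1925, 60, 1);
    INSERT INTO Movie VALUES ('Talkie One', 1950, 100, 1);
    INSERT INTO Movie VALUES ('Talkie Two', 1960, 90, 2);
""")

# HAVING filters whole groups: Ann qualifies, and ALL her films are summed.
having_rows = conn.execute("""
    SELECT Exec.name, SUM(Movie.length)
    FROM Exec, Movie
    WHERE producerC = cert
    GROUP BY Exec.name
    HAVING MIN(Movie.year) < 1930
""").fetchall()

# WHERE filters tuples before grouping: only Ann's pre-1930 film is summed.
where_rows = conn.execute("""
    SELECT Exec.name, SUM(Movie.length)
    FROM Exec, Movie
    WHERE producerC = cert AND Movie.year < 1930
    GROUP BY Exec.name
""").fetchall()

print(having_rows)  # total over both of Ann's films
print(where_rows)   # total over Ann's 1925 film only
```

On this instance the HAVING version reports 160 minutes for Ann, while the WHERE version reports 60, because the 1950 film was filtered out before the groups were formed.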
Order By
The SQL queries we have looked at so far return an unordered relation/bag; adding ORDER BY sorts the result.
Movie (title, year, length, filmType, studioName, producerC#)
SELECT Exec.name, SUM(Movie.length) FROM Exec, Movie WHERE producerC# = cert# GROUP BY Exec.name HAVING MIN(Movie.year) < 1930 ORDER BY Exec.name ASC;
In general: ORDER BY A1 ASC, B DESC, C ASC;
Database Modifications
SQL & Database Modifications
Next we will look at SQL statements that do not return a result, but rather change the state of the database.
There are three types of such SQL statements/transactions:
    Insert tuples into a relation
    Delete certain tuples from a relation
    Update values of certain components of certain existing tuples
We refer to these types of operations collectively as database modifications, and refer to such requests as transactions.

Insertion
The insertion statement consists of:
    The keyword INSERT INTO
    The name of a relation R
    A parenthesized list of attributes of the relation R
    The keyword VALUES
    A tuple expression, that is, a parenthesized list of concrete values, one for each attribute in the attribute list
The form of an insert statement:
    INSERT INTO R(A1, ..., An)
    VALUES (v1, ..., vn);
A tuple is created and added, where vi is the value of attribute Ai, for i = 1, 2, ..., n.

Insertion
Relation schema: StarsIn (title, year, starName)
Update the database: Add Sydney Greenstreet to the list of stars of The Maltese Falcon
In SQL:
    INSERT INTO StarsIn (title, year, starName)
    VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet');
Another formulation of this query:
    INSERT INTO StarsIn
    VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet');
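As a small illustration, both formulations can be run against an in-memory SQLite database from Python. The column types below are assumptions. Since this StarsIn table declares no key, running both statements leaves two copies of the same tuple, which shows that an SQL relation is a bag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StarsIn (title TEXT, year INTEGER, starName TEXT)")

# Formulation 1: explicit attribute list.
conn.execute(
    "INSERT INTO StarsIn (title, year, starName) "
    "VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet')"
)
# Formulation 2: no attribute list, values given in schema order.
conn.execute(
    "INSERT INTO StarsIn "
    "VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet')"
)

rows = conn.execute("SELECT title, year, starName FROM StarsIn").fetchall()
print(rows)
```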
Insertion The previous insertion statement was very simple It added only one tuple into a relation Instead of using explicit values for one tuple, we can compute a set of tuples to be inserted using a subquery This subquery replaces the keyword VALUES and the tuple expression in the INSERT statement Insertion Database schema: Studio(name, address, presC#) Movie(title, year, length, filmType, studioName, producerC#) Update the database: Add to Studio, all studio names mentioned in the Movie relation
If the list of attributes does not include all attributes of relation R, then the tuple created has default values for the missing attributes Since there is no way to determine an address or a president for such a studio value, NULL will be used for the attributes address and presC# Insertion Database schema: Studio(name, address, presC#) Movie(title, year, length, filmType, studioName, producerC#) Update the database: Add to Studio, all studio names mentioned in the Movie relation
In SQL:
    INSERT INTO Studio(name)
    SELECT DISTINCT studioName
    FROM Movie
    WHERE studioName NOT IN (SELECT name FROM Studio);

Deletion
A deletion statement consists of:
    The keyword DELETE FROM
    The name of a relation R
    The keyword WHERE
    A condition
The form of a delete statement:
    DELETE FROM R WHERE <condition>;
The effect of executing this statement is that every tuple in relation R satisfying the condition will be deleted from R.
Note: unlike INSERT, a DELETE normally carries a WHERE clause; if it is omitted, every tuple of R is deleted.

Deletion
Relation schema: StarsIn (title, year, starName)
Update: Delete the fact that Sydney Greenstreet was a star in The Maltese Falcon
In SQL:
    DELETE FROM StarsIn
    WHERE title = 'The Maltese Falcon' AND starName = 'Sydney Greenstreet';

Deletion
Relation schema: Exec (name, address, cert#, netWorth)
Update: Delete every movie executive whose net worth is < $10,000,000
In SQL: DELETE FROM Exec WHERE netWorth < 10,000,000;
Anything wrong here?! (Hint: a numeric constant cannot contain commas; it should be written 10000000.)

Deletion
Relation schemas:
    Studio (name, address, presC#)
    Movie (title, year, length, filmType, studioName, producerC#)
Update: Delete from Studio all studios not mentioned in Movie (i.e., we don't want to have non-producing studios)
In SQL:
    DELETE FROM Studio
    WHERE name NOT IN (SELECT studioName FROM Movie);

Defining Database Schema
To create a table in SQL:
    CREATE TABLE name (list of elements);
Principal elements are attributes and their types, but key declarations and constraints may also appear.
Example:
    CREATE TABLE Star (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE
    );

Defining Database Schema
To delete a table:
    DROP TABLE name;
Example:
    DROP TABLE Star;

Data types
    INT or INTEGER
    REAL or FLOAT
    DECIMAL(n, d) -- NUMERIC(n, d); e.g., DECIMAL(6, 2) holds values such as 0123.45
    CHAR(n) / BIT(n): fixed-length character/bit string; the unused part is padded with the "pad character"
    VARCHAR(n) / BIT VARYING(n): variable-length strings of up to n characters

Data types (cont'd)
    Time: SQL2 format is TIME 'hh:mm:ss[.ss...]'
    Date: SQL2 format is DATE 'yyyy-mm-dd' (the first m is 0 or 1)
The default format of date in Oracle is dd-mon-yy.
Example:
    CREATE TABLE Days (d DATE);
    INSERT INTO Days VALUES ('08-aug-02');
The Oracle function to_date converts a specified format into the default format, e.g.,
    INSERT INTO Days VALUES (to_date('2002-08-08', 'yyyy-mm-dd'));

Altering Relation Schemas
Adding Columns
Add an attribute to a relation R with
    ALTER TABLE R ADD <column declaration>;
Example: Add attribute phone to table Star
    ALTER TABLE Star ADD phone CHAR(16);
Removing Columns
Remove an attribute from a relation R using DROP:
    ALTER TABLE R DROP COLUMN <column_name>;
Example: Remove column phone from Star
    ALTER TABLE Star DROP COLUMN phone;
Note: A column can't be dropped if it is the only column.

Attribute Properties
We can assert the following about the value of an attribute:
    NOT NULL: every tuple must have a real (non-null) value for this attribute
    DEFAULT value: NULL is the default value for every attribute; the owner of the relation can define some other value as the default, instead of NULL

Attribute Properties
    CREATE TABLE Star (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1) DEFAULT '?',
        birthdate DATE NOT NULL
    );
Example: Add an attribute with a
default value:
    ALTER TABLE Star ADD phone CHAR(16) DEFAULT 'unlisted';

    INSERT INTO Star(name, birthdate) VALUES ('Sally', '0000-00-00');
Resulting tuple:
    name   address  gender  birthdate   phone
    Sally  NULL     ?       0000-00-00  unlisted

    INSERT INTO Star(name, phone) VALUES ('Sally', '333-2255');
This insertion could not be performed, since no value for birthdate is given and NULL is not allowed as its default.

Schema Refinement
Functional Dependencies: Essential to Normalization Theory
Functional Dependencies
Consider the relation:
    Movie (title, year, length, filmType, studioName, starName)
What are the functional dependencies?
    title, year → length
    title, year → filmType
    title, year → studioName
    title, year → length, filmType, studioName
Note that the FD title, year → starName does not hold.
Logical Implication: Reasoning with FDs
Consider relation R(A, B, C) with the set of FDs F = {A → B, B → C}.
We can deduce from F that A → C also holds on R. How? Apply the definition.
To detect possible redundancy, is it necessary to consider all the given FDs?
As shown above, there might be some additional hidden (nontrivial) FDs implied by a given set of FDs.

Logical Implication (cont'd)
Consider R(A1, A2, A3, A4, A5) with FDs:
    F = { A1 → A2, A2 → A3, A2A3 → A4, A2A3A4 → A5 }
Prove that F ⊭ A5 → A1
Solution method: Provide a counter-example; give a relation instance r of R that satisfies every FD in F but not A5 → A1.
A desired instance r of R:
         A1  A2  A3  A4  A5
    t1:  0   1   1   1   1
    t2:  1   1   1   1   1
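The counter-example can be checked mechanically. The sketch below is illustrative only: the helper name satisfies is hypothetical, and the placement of the arrows in F (A1 → A2, A2 → A3, A2A3 → A4, A2A3A4 → A5) is an assumed reading of the slide.

```python
def satisfies(rows, lhs, rhs):
    """Return True if every two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault returns the value stored by an earlier row with the same key.
        if seen.setdefault(key, val) != val:
            return False
    return True

attrs = ["A1", "A2", "A3", "A4", "A5"]
# The two tuples t1 and t2 of the instance r.
r = [dict(zip(attrs, t)) for t in [(0, 1, 1, 1, 1), (1, 1, 1, 1, 1)]]

# Assumed reading of F; each FD is a (left side, right side) pair.
F = [(["A1"], ["A2"]),
     (["A2"], ["A3"]),
     (["A2", "A3"], ["A4"]),
     (["A2", "A3", "A4"], ["A5"])]

holds_F = all(satisfies(r, lhs, rhs) for lhs, rhs in F)
holds_target = satisfies(r, ["A5"], ["A1"])
print(holds_F, holds_target)
```

The instance satisfies every FD in F, yet t1 and t2 agree on A5 and differ on A1, so A5 → A1 fails on r and therefore cannot be implied by F.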
Closure of a set of FDs
Defn: The closure of F, denoted F⁺, is the set of FDs that are logically implied by F.
How can we compute F⁺?
Definitely, F⁺ includes F, but possibly more FDs.
We need to know how to reason about FDs.
Equivalence
Defn: Suppose R is a relation schema, and S and T are sets of functional dependencies on R. T and S are equivalent (S ≡ T) if S⁺ = T⁺.
Example: Suppose R = {A, B, C}, and
    S = {A → B, B → C, A → C}
    T = {A → B, B → C}
We can show that S ≡ T.

Armstrong's Axioms [1974]
R is a relation schema, and X, Y and Z are subsets of R.
    Reflexivity:   If Y ⊆ X, then X → Y (trivial FDs)
    Augmentation:  If X → Y, then XZ → YZ, for every Z
    Transitivity:  If X → Y and Y → Z, then X → Z
These are sound and complete inference rules for FDs.

Additional rules / axioms
Other useful rules that follow from Armstrong's Axioms:
    Union (Combining) Rule:         If X → Y and X → Z, then X → YZ
    Decomposition (Splitting) Rule: If X → YZ, then X → Y and X → Z
    Pseudotransitivity Rule:        If X → Y and WY → Z, then XW → Z
NOTE: X, Y, Z, and W are sets of attributes.

Example: Discovering hidden FDs
Consider a relation schema R = {A, B, C, G, H, I} with FDs
    F = { A → B, A → C, CG → H, CG → I, B → H }
Using these rules, we can derive the following FDs:
    Since A → B and B → H, then A → H, by transitivity
    Since CG → H and CG → I, then CG → HI, by union
    Since A → C, then AG → CG, by augmentation
    Now, since AG → CG and CG → I, then AG → I, by transitivity
    (Exercise: derive AG → H)
Many trivial dependencies can be derived(!) by augmentation.

Computing the Closure of Attributes
Given a set F of FDs and a set X of attributes, how do we compute the closure of X w.r.t. F?
Starting with X, we repeatedly expand the set by adding the right hand side (RHS) of every FD whose left hand side (LHS) is already included in the set.
When the set cannot be expanded anymore, we have obtained the result, X⁺.

An Algorithm to Compute X⁺ under F
    X⁺ := X                          (initialization step)
    repeat
        for each FD W → Z in F do
            if W ⊆ X⁺ then
                X⁺ := X⁺ ∪ Z         // include Z in the result
    until X⁺ does not change anymore
Complexity question: In the worst case, how many times will the repeat loop be executed?
Example
Consider a relation schema R = {A, B, C, D, E, F} with the set of FDs
    { AB → C, BC → AD, D → E, CF → B }
Compute {A, B}⁺
Execution result at each iteration:
    X⁺ = {A, B}
    Using AB → C, we get X⁺ = {A, B, C}
    Using BC → AD, we get X⁺ = {A, B, C, D}
    Using D → E, we get X⁺ = {A, B, C, D, E}
    No more change to X⁺ is possible.
X⁺ = {A, B}⁺ = {A, B, C, D, E}
Does the order in which FDs appear matter in this computation?
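The closure algorithm and the example above can be sketched in Python as follows. The representation is an assumption made for brevity: attribute names are single letters, so a string such as "AB" stands for the set {A, B}, and each FD is a (LHS, RHS) pair of such strings.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds.

    attrs: iterable of single-letter attribute names, e.g. "AB".
    fds:   list of (lhs, rhs) pairs, e.g. ("BC", "AD") for BC -> AD.
    """
    result = set(attrs)                      # initialization step: X+ := X
    changed = True
    while changed:                           # repeat ... until no change
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already covered, add any missing RHS attributes.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
ab_closure = closure("AB", F)
d_closure = closure("D", F)
print(sorted(ab_closure))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(d_closure))   # ['D', 'E']
```

Reordering the FDs in F may change how many passes the loop needs, but not the final set, since every pass only adds attributes that are implied.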
Implication Problem Revisited
Is a given FD X → Y implied by a set F of FDs? That is, is X → Y in F⁺?
How to answer this question? An algorithm for this:
    Compute X⁺ under F, and
    Check if Y is in X⁺
    If yes, then F ⊨ X → Y
    Otherwise F ⊭ X → Y

Example
Consider a relation schema R = {A, B, C, D, E, F} with the FDs
    F = { AB → C, BC → AD, D → E, CF → B }
True/false: F ⊨ AB → D?
Two steps:
    Compute X⁺ = {A, B}⁺ = {A, B, C, D, E}
    Check if D ∈ X⁺
Yes, AB → D is implied by F.

Example
Consider a relation schema R = {A, B, C, D, E, F} with FDs
    F = { AB → C, BC → AD, D → E, CF → B }
Is D → A implied by F?
Two steps:
    Compute X⁺ = {D}⁺ = {D, E}
    Check if A ∈ X⁺
Since A is not in {D, E}, we conclude that D → A is not implied by F.

Closures and Keys
When will X⁺ include all attributes of a relation R?
Exactly when X is a superkey of R.
To check if X is a candidate key of R, we may check if:
    1. X⁺ contains all attributes of R, i.e., X⁺ = R, and
    2. No proper subset S of X has this property, i.e., for every A ∈ X, (X − {A})⁺ ≠ R
Knowledge about keys is essential for normal forms.

Canonical Cover
The number of iterations of the algorithm for computing the closure of a set of attributes depends on the number of FDs in F.
The same will be observed for other algorithms that we will study (such as the decomposition algorithms).
Can we minimize F?

Covers
FDs can be represented in several different ways without changing the set of legal/valid instances of the relation.
Let F and G be sets of FDs. We say G follows from F if every relation instance that satisfies F also satisfies G. In symbols: F ⊨ G.
We may also say: G is implied by F, or G is covered by F.
If both F ⊨ G and G ⊨ F hold, then we say that G and F are equivalent and denote this by F ≡ G.
Note that F ≡ G iff F⁺ = G⁺.
If F ≡ G, we may also say: G is a cover of F, and vice versa.
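The implication test, the equivalence of two FD sets, and the candidate-key check described above all reduce to attribute closures. The following Python sketch assumes single-letter attribute names and FDs given as (LHS, RHS) string pairs; the function names are illustrative, not taken from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F implies lhs -> rhs iff rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

def equivalent(f, g):
    """F and G are equivalent iff each implies every FD of the other."""
    return (all(implies(f, l, r) for l, r in g)
            and all(implies(g, l, r) for l, r in f))

def is_candidate_key(x, schema, fds):
    """x is a candidate key iff it is a superkey and no attribute can be dropped."""
    if closure(x, fds) != set(schema):
        return False
    return all(closure(set(x) - {a}, fds) != set(schema) for a in x)

F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
S = [("A", "B"), ("B", "C"), ("A", "C")]
T = [("A", "B"), ("B", "C")]

print(implies(F, "AB", "D"))                # AB -> D is implied
print(implies(F, "D", "A"))                 # D -> A is not implied
print(equivalent(S, T))                     # S and T are equivalent
print(is_candidate_key("ABF", "ABCDEF", F)) # ABF is a candidate key
print(is_candidate_key("ACF", "ACBDEF", F)) # superkey, but CF alone suffices
```

Dropping one attribute at a time is enough for the minimality check, because the closure is monotone: if a smaller subset were a superkey, so would be every set obtained by removing just one attribute outside it.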
Canonical Cover
Let F be a set of FDs. A canonical / minimal cover of F is a set G of FDs that satisfies the following:
    1. G is equivalent to F; that is, G ≡ F
    2. G is minimal; that is, if we obtain a set H of FDs from G by deleting one or more of its FDs, or by deleting one or more attributes from some FD in G, then F ≢ H
    3. Every FD in G is of the form X → A, where A is a single attribute

Canonical Cover
A canonical cover G is minimal in two respects:
    1. Every FD in G is required in order for G to be equivalent to F
    2. Every FD in G is as small as possible, that is, each attribute on the left hand side is necessary
Recall: the RHS of every FD in G is a single attribute.

Computing Canonical Cover
Given a set F of FDs, how to compute a canonical cover G of F?
Step 1: Put the FDs in the standard form
    Initialize G := F
    Replace each FD X → A1A2...Ak in G with X → A1, X → A2, ..., X → Ak
Step 2: Minimize the left hand side of each FD
    E.g., for each FD AB → C in G, check if A or B on the LHS is redundant, i.e., (G − {AB → C} ∪ {A → C})⁺ = F⁺ ?
Step 3: Delete redundant FDs
    For each FD X → A in G, check if it is redundant, i.e., whether (G − {X → A})⁺ = F⁺ ?

Computing Canonical Cover (worked example)
    R = { A, B, C, D, E, H }
    F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step 1: put FDs in the standard form
    All FDs present are already in the standard form:
    G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step 2: check for left redundancy
    For every FD X → A in G, check if the closure of a proper subset of X determines A. If so, remove the redundant attribute(s) from X.
    A → B:   obviously OK (no left redundancy)
    DE → A:  D⁺ = D, E⁺ = E; OK (no left redundancy)
Computing Canonical Cover (worked example, continued)
    R = { A, B, C, D, E, H }
    F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step 2 (continued), with G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }:
    BC → E:   B⁺ = B, C⁺ = C; OK (no left redundancy)
    AC → E:   A⁺ = AB, C⁺ = C; OK (no left redundancy)
    BCD → A:  B⁺ = B, C⁺ = C, D⁺ = D, BC⁺ = BCE, CD⁺ = CD, BD⁺ = BD; OK (no left redundancy)
    AED → B:  A⁺ = AB, so E and D are redundant; we can remove them from AED → B
    G = { A → B, DE → A, BC → E, AC → E, BCD → A, A → B }
    After dropping the duplicate A → B:
    G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Step 3: check for redundant FDs
    For every FD X → A in G:
        Remove X → A from G; call the result G′
        Compute X⁺ under G′
        If A ∈ X⁺, then X → A is redundant and hence we remove it from G (that is, we rename G′ to G)
    G = { DE → A, BC → E, AC → E, BCD → A, A → B }
    Remove DE → A:   G′ = { BC → E, AC → E, BCD → A, A → B }
                     DE⁺ = DE (computed under G′); since A ∉ DE⁺, DE → A is not redundant
    Remove BC → E:   G′ = { DE → A, AC → E, BCD → A, A → B }
                     BC⁺ = BC; BC → E is not redundant
    Remove AC → E:   G′ = { DE → A, BC → E, BCD → A, A → B }
                     AC⁺ = ACBE; since E ∈ ACBE, AC → E is redundant; remove it from G
    G = { DE → A, BC → E, BCD → A, A → B }
    Remove BCD → A:  G′ = { DE → A, BC → E, A → B }
                     BCD⁺ = BCDEA; this FD is redundant; remove it from G
    G = { DE → A, BC → E, A → B }
    Remove A → B:    G′ = { DE → A, BC → E }
                     A⁺ = A; this FD is not redundant (Another reason why this is true?)
    G = { DE → A, BC → E, A → B }
G is a minimal cover for F.

Several Canonical Covers Possible?
Relation R = {A, B, C} with F = { A → B, A → C, B → A, B → C, C → B, C → A }
Several canonical covers exist:
    G = { A → B, B → A, B → C, C → B }
    G = { A → B, B → C, C → A }
[Figure: dependency graphs over A, B, C]
Can you find more?

How to Deal with Redundancy?
Relation schema: Star (name, address, representingFirm, spokesPerson)
Relation instance:
    Name           Address       RepresentingFirm  SpokesPerson
    Carrie Fisher  123 Maple     Star One          Joe Smith
    Harrison Ford  789 Palm dr.  Star One          Joe Smith
    Mark Hamill    456 Oak rd.   Movies & Co       Mary Johns
F = { name → address, representingFirm, spokesPerson;  representingFirm → spokesPerson }
We can decompose this relation into two smaller relations.

How to Deal with Redundancy?
Relation schema: Star (name, address, representingFirm, spokesPerson)
Decompose this relation into the following relations:
    Star (name, address, representingFirm) with F1 = { name → address, representingFirm }
    and
    Firm (representingFirm, spokesPerson) with F2 = { representingFirm → spokesPerson }
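Returning to the canonical-cover computation worked through earlier, the three steps can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: attribute names are single letters, FDs are (LHS, RHS) string pairs, and attributes are tried for removal in alphabetical order. A different order can pass through different intermediate FDs than the slides do (here AED → B is first reduced to DE → B), although the final cover for this F is the same.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_cover(fds):
    # Step 1: standard form, one attribute on every right hand side.
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            if (frozenset(lhs), a) not in g:
                g.append((frozenset(lhs), a))
    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    for i in range(len(g)):
        for b in sorted(g[i][0]):
            lhs, a = g[i]
            reduced = lhs - {b}
            if reduced and a in closure(reduced, g):
                g[i] = (reduced, a)
    g = list(dict.fromkeys(g))               # remove duplicates, keep order
    # Step 3: drop FDs that follow from the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g

F = [("A", "B"), ("DE", "A"), ("BC", "E"),
     ("AC", "E"), ("BCD", "A"), ("AED", "B")]
G = canonical_cover(F)
for lhs, a in G:
    print("".join(sorted(lhs)), "->", a)
```

For the F of the worked example this yields A → B, DE → A and BC → E, matching the minimal cover obtained by hand.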
How to Deal with Redundancy?
Relation instance before decomposition:
    Name           Address       RepresentingFirm  SpokesPerson
    Carrie Fisher  123 Maple     Star One          Joe Smith
    Harrison Ford  789 Palm dr.  Star One          Joe Smith
    Mark Hamill    456 Oak rd.   Movies & Co       Mary Johns
Relation instances after decomposition:
    Name           Address       RepresentingFirm
    Carrie Fisher  123 Maple     Star One
    Harrison Ford  789 Palm dr.  Star One
    Mark Hamill    456 Oak rd.   Movies & Co

    RepresentingFirm  SpokesPerson
    Star One          Joe Smith
    Movies & Co       Mary Johns

Decomposition
A decomposition of a relation schema R consists of replacing R by two or more non-empty relation schemas such that each one is a subset of R and together they include all attributes of R.
Formally, {R1, ..., Rm} is a decomposition of R if all conditions below hold:
    (0) Ri ≠ ∅, for all i in {1, ..., m}
    (1) R1 ∪ ... ∪ Rm = R
    (2) Ri ≠ Rj, for different i and j in {1, ..., m}
When m = 2, the decomposition {R1, R2} is called binary.
Not every decomposition of R is desirable.
Properties of a decomposition?
    (1) Lossless-join: this is a must
    (2) Dependency-preserving: this is desirable
Explanation follows.

Example
Relation instance:
    A  B  C
    1  2  3
    4  2  5
Decomposed into:
    A  B          B  C
    1  2          2  3
    4  2          2  5
To recover information, we join the relations:
    A  B  C
    1  2  3
    4  2  5
    4  2  3
    1  2  5
Why do we have new tuples?

Lossless-Join Decomposition
R is a relation schema and F is a set of FDs over R. A binary decomposition of R into relation schemas R1 and R2 with attribute sets X and Y is said to be a lossless-join decomposition with respect to F if, for every instance r of R that satisfies F, we have π_X(r) ⋈ π_Y(r) = r.
Thm: Let R be a relation schema and F a set of FDs on R. A binary decomposition of R into R1 and R2 with attribute sets X and Y is lossless iff X ∩ Y → X or X ∩ Y → Y, i.e., this binary decomposition is lossless if the common attributes of X and Y form a key of R1 or R2.

Example: Lossless-join
F = { B → C }
Relation instance:
    A  B  C
    1  2  3
    4  2  3
Decomposed into:
    A  B          B  C
    1  2          2  3
    4  2
To recover the original relation r, we join the two relations:
    A  B  C
    1  2  3
    4  2  3
No new tuples!

Example: Dependency Preservation
F = { B → C, B → D, A → D }
Relation instance:
    A  B  C  D
    1  2  5  7
    4  3  6  8
Decomposed into:
    A  B          B  C  D
    1  2          2  5  7
    4  3          3  6  8
Can we enforce A → D? How?

Dependency-Preserving Decomposition
A dependency-preserving decomposition allows us to enforce every FD, on each insertion or modification of a tuple, by examining just one single relation instance.
Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y, and let F be a set of FDs over R.
The projection of F on X (denoted by F_X) is the set of FDs in F⁺ that involve only attributes in X.
Recall that an FD U → V in F⁺ is in F_X if all the attributes in U and V are in X; in this case we say this FD is relevant to X.
The decomposition of <R, F> into two schemas with attribute sets X and Y is dependency-preserving if (F_X ∪ F_Y)⁺ = F⁺.
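Both properties can be tested with attribute closures. The sketch below is hedged: it assumes single-letter attribute names and FDs as (LHS, RHS) string pairs, and for dependency preservation it does not materialize F_X and F_Y; instead it uses a common textbook shortcut that grows the left hand side of an FD using only closures restricted to each component schema. That shortcut is a swapped-in technique, not the definition given in the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(x, y, fds):
    """Binary decomposition test: the common attributes determine X or Y."""
    common = closure(set(x) & set(y), fds)
    return set(x) <= common or set(y) <= common

def preserves(fd, parts, fds):
    """True if fd can be enforced using only FDs local to the parts."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # Attributes of this part derivable from what we already have in it.
            gained = closure(result & set(part), fds) & set(part)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

# Lossless example from the slides: R(A, B, C) split into AB and BC.
print(is_lossless("AB", "BC", [("B", "C")]))  # lossless when B -> C holds
print(is_lossless("AB", "BC", []))            # not guaranteed without it

# Dependency-preservation example: R(A, B, C, D) split into AB and BCD.
F = [("B", "C"), ("B", "D"), ("A", "D")]
print(is_lossless("AB", "BCD", F))
print(preserves(("B", "C"), ["AB", "BCD"], F))
print(preserves(("A", "D"), ["AB", "BCD"], F))
```

For the second example the decomposition is lossless, and B → C and B → D can be checked inside BCD alone, but A → D cannot be enforced without joining the two relations, so the decomposition is not dependency-preserving.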
Normal Forms
Given a relation schema R, we must be able to determine whether it is "good" or whether we need to decompose it into smaller relations, and if so, how.
To address these issues, we need to study normal forms.
If a relation schema is in one of these normal forms, we know that it is in some "good" shape, in the sense that certain kinds of problems (related to redundancy) cannot arise.
[Figure: 1NF, 2NF, 3NF, BCNF]

Normal Forms
The normal forms based on FDs are:
    First normal form (1NF)
    Second normal form (2NF)
    Third normal form (3NF)
    Boyce-Codd normal form (BCNF)
These normal forms have increasingly restrictive requirements.