Database For Final
• Relationship types of degree 3 are called ternary and of degree n are called n-ary
• If needed, the binary and n-ary relationships can all be included in the schema
design
• For example, the TAUGHT_DURING binary relationship in Figure 3.18 (see next
slide) can be derived from the ternary relationship OFFERS (based on the meaning of the
relationships)
• The (min, max) constraints can be displayed on the edges – however, they do not
fully describe the constraints
– An M or N indicates no constraint
• In general, both (min, max) and 1, M, or N are needed to fully describe the
constraints
Enhanced Entity-Relationship (EER) Modelling
– Includes all modelling concepts of basic ER. EER diagrams are basically a more
expansive version of ER diagrams.
– Additional concepts:
• subclasses/superclasses
• specialization/generalization
• The additional EER concepts are used to model applications more completely and more
accurately
– MANAGER
– SALARIED_EMPLOYEE, HOURLY_EMPLOYEE
– EMPLOYEE/SECRETARY
– EMPLOYEE/TECHNICIAN
– EMPLOYEE/MANAGER
– …
• Example:
– In the previous slide, SECRETARY (as well as TECHNICIAN and ENGINEER) inherit
the attributes Name, SSN, …, from EMPLOYEE
Specialization (1)
Specialization (2)
Specialization (3)
Generalization
Generalization (2)
– We do not use this notation because it is often subjective as to which process is
more appropriate for a particular situation
– We can call all entity types (and their corresponding collections) classes,
whether they are entity types, superclasses, or subclasses.
– Disjointness Constraint:
– Specifies that the subclasses of the specialization must be disjoint
– If not disjoint, the specialization is overlapping:
• that is, the same entity may be a member of more than one subclass of the
specialization
– Specified by o in EER diagram
• Completeness Constraint:
– Total specifies that every entity in the superclass must be a member of some subclass
in the specialization/generalization
• Shown in EER diagrams by a double line
– Overlapping, total
– Overlapping, partial
• Note: Generalization usually is total because the superclass is derived from the
subclasses.
• The relational model’s basic components are entities, attributes, and
relationships among entities
– Table
• Table:
• A table is also called a relation because the relational model’s creator, Codd, used the
term relation as a synonym for table
• Think of a table as a persistent relation:
Keys
(row)
– If you know the value of attribute A, you can look up (determine) the value of
attribute B
• Composite key
• Key attribute
• Candidate key
Keys (continued)
– An attribute whose values match primary key values in the related table
• Referential integrity
– FK contains a value that refers to an existing valid tuple (row) in another relation
• Secondary key
Null Values
• No data entry
Chapter 4
Normalization for Relational
Databases
Chapter Outline
1. Informal Design Guidelines for Relational Databases
2.1 Definition of FD
• Bottom Line:
– Design a schema that can be explained easily relation by relation.
– Wastes storage
• update anomalies
• Insertion anomalies
• Deletion anomalies
• Update Anomaly:
• Insert Anomaly:
• Conversely
• Delete Anomaly:
– When a project is deleted, it will result in deleting all the employees who work
on that project. Alternately, if an employee is the sole employee on a project, deleting
that employee would result in deleting the corresponding project.
• GUIDELINE 2:
– Design a schema that does not suffer from the insertion, deletion and update
anomalies.
– If there are any anomalies present, then note them so that applications can be
made to take them into account.
– Relations should be designed such that their tuples will have as few NULL values
as possible
– Attributes that are NULL frequently could be placed in separate relations (with
the primary key)
• Reasons for nulls:
• GUIDELINE 4:
– The relations should be designed to satisfy the lossless join condition.
– Functional dependencies (FDs) are constraints that are derived from the meaning and
interrelationships of the data attributes
– They play a vital role in distinguishing good from bad database design.
• The attribute set on the left side of the arrow, X, is called the determinant, while
the set on the right side, Y, is called the dependent.
• X -> Y holds if whenever two tuples have the same value for X, they must have
the same value for Y
– For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]
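The tuple-pair definition above can be checked mechanically on one relation instance. The sketch below is only an illustration: the helper name `fd_holds` and the sample rows are invented here, and a single instance can refute an FD but never prove that it holds for every legal state.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by this instance.

    rows: list of dicts (one dict per tuple); lhs, rhs: lists of attribute names.
    """
    for t1, t2 in combinations(rows, 2):
        same_x = all(t1[a] == t2[a] for a in lhs)
        same_y = all(t1[a] == t2[a] for a in rhs)
        if same_x and not same_y:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# Hypothetical EMP_PROJ-style instance, invented for illustration.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 20, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 15, "ENAME": "Wong"},
]

ssn_determines_ename = fd_holds(emp_proj, ["SSN"], ["ENAME"])
ssn_determines_hours = fd_holds(emp_proj, ["SSN"], ["HOURS"])
```

Here SSN -> ENAME survives the check, while SSN -> HOURS is refuted by the two tuples for SSN 111.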
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
– (Notation: XZ stands for X U Z)
• IR1, IR2, IR3 form a sound and complete set of inference rules
– These rules hold, and all other rules that hold can be deduced from these
Inference Rules for FDs (2)
• The last three inference rules, as well as any other inference rules, can be
deduced from IR1, IR2, and IR3 (completeness property)
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
• X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
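In practice X+ (the set of attributes determined by X) is usually computed with a simple fixed-point loop rather than by applying the rules by hand. The sketch below assumes FDs are given as pairs of attribute sets; the function name and the PNAME/PLOCATION attributes are illustrative assumptions, not taken from these notes.

```python
def attribute_closure(attrs, fds):
    """Compute X+: every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains the left side, it gains the right side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# Hypothetical FD set for an EMP_PROJ-style relation.
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

ssn_closure = attribute_closure({"SSN"}, fds)
key_closure = attribute_closure({"SSN", "PNUMBER"}, fds)
```

If the closure of a set of attributes contains every attribute of the relation, that set is a superkey; here that is true of {SSN, PNUMBER} but not of {SSN} alone.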
• Definition (Covers):
2. We cannot remove any dependency from F and have a set of dependencies that is
equivalent to F.
• Normalization:
• Normal form:
– Condition using keys and FDs of a relation to certify whether a relation schema is
in a particular normal form
• There are three stages of normal forms known as first normal form (or 1NF),
second normal form (or 2NF), and third normal form (or 3NF).
• 4NF
• 5NF
• Normalization is carried out in practice so that the resulting designs are of high
quality and meet the desirable properties
• The practical utility of these normal forms becomes questionable when the
constraints on which they are based are hard to understand or to detect
• The database designers need not normalize to the highest possible normal form
• Denormalization:
– The process of storing the join of higher normal form relations as a base
relation, which is in a lower normal form.
• A superkey of a relation schema R = {A1, A2, ...., An} is a set of attributes S
subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will
have t1[S] = t2[S]
• A key K is a superkey with the additional property that removal of any attribute
from K will cause K not to be a superkey any more.
• If a relation schema has more than one key, each is called a candidate key.
– One of the candidate keys is arbitrarily designated to be the primary key, and the
others are called secondary keys.
• A Prime attribute must be a member of some candidate key
• Disallows
– composite attributes
– multivalued attributes
• Definitions
• Examples:
– {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER
-> HOURS holds
– {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency)
since SSN -> ENAME also holds
• Definition:
– Transitive functional dependency: a FD X -> Z that can be derived from two FDs
X -> Y and Y -> Z
• Examples:
• Since there is no set of attributes X where SSN -> X and X -> ENAME
Third Normal Form (2)
• R can be decomposed into 3NF relations via the process of 3NF normalization
• NOTE:
– In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if
Y is not a candidate key.
• Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
• The following more general definitions take into account relations with multiple
candidate keys
• Definition:
• (a) X is a superkey of R, or
• (b) A is a prime attribute of R
• {student, course} is a candidate key for this relation, and the dependencies
shown follow the pattern in Figure 10.12 (b).
• Out of the above three, only the 3rd decomposition will not generate spurious
tuples after join (and hence has the non-additivity property).
Chapter Outline
• Relational Algebra
• Relational Calculus
• These operations enable a user to specify basic retrieval requests (or queries)
• The result of an operation is a new relation, which may have been formed from
one or more input relations
– This property makes the algebra “closed” (all objects in relational algebra are
relations)
– The result of a relational algebra expression is also a relation that represents the
result of a database query (or retrieval request)
• Muhammad ibn Musa al-Khwarizmi (800-847 CE) wrote a book titled al-jabr
about arithmetic of variables
• CARTESIAN PRODUCT ( x )
• DIVISION
– Keeps only those tuples that satisfy the qualifying condition
– Tuples satisfying the condition are selected whereas the other tuples are
discarded (filtered out)
• Examples:
• σ DNO = 4 (EMPLOYEE)
– SELECT is commutative:
• σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))
– A cascade of SELECT operations can be replaced by a single selection with a
conjunction of all the conditions:
• σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond1> AND <cond2> AND <cond3>(R)
– The number of tuples in the result of a SELECT is less than (or equal to) the
number of tuples in the input relation R
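A minimal sketch of SELECT over a relation held as a list of dicts, assuming a hypothetical `select` helper and invented EMPLOYEE rows. It shows that a cascade of selections gives the same result as one selection with the conditions ANDed together, and that the result never has more tuples than the input.

```python
def select(relation, predicate):
    """SELECT: keep only the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]


# Invented EMPLOYEE tuples, for illustration only.
employee = [
    {"FNAME": "John", "LNAME": "Smith", "SALARY": 30000, "DNO": 5},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SALARY": 25000, "DNO": 4},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SALARY": 43000, "DNO": 4},
]

dno4 = select(employee, lambda t: t["DNO"] == 4)

# Cascade of two selections versus one selection with a conjunction.
cascade = select(select(employee, lambda t: t["DNO"] == 4),
                 lambda t: t["SALARY"] > 25000)
combined = select(employee, lambda t: t["DNO"] == 4 and t["SALARY"] > 25000)
```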
• This operation keeps certain columns (attributes) from a relation and discards the
other columns.
• Example: To list each employee’s first and last name and salary, the following is
used:
– π LNAME, FNAME, SALARY(EMPLOYEE)
– π<attribute list>(R)
– The project operation removes any duplicate tuples
– This is because the result of the project operation must be a set of tuples
– The number of tuples in the result of projection π<list>(R) is always less than or equal
to the number of tuples in R
• If the list of attributes includes a key of R, then the number of tuples in the result
of PROJECT is equal to the number of tuples in R
• π<list1>(π<list2>(R)) = π<list1>(R) as long as <list2> contains the attributes
in <list1>
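The effect of duplicate elimination on the size of a projection can be seen in a small sketch. The `project` helper and the rows are assumptions made for illustration.

```python
def project(relation, attrs):
    """PROJECT: keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:  # the result must be a set of tuples
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# Invented EMPLOYEE tuples; SSN plays the role of the key.
employee = [
    {"SSN": "123", "LNAME": "Smith", "DNO": 5},
    {"SSN": "453", "LNAME": "English", "DNO": 5},
    {"SSN": "987", "LNAME": "Wallace", "DNO": 4},
]

dept_numbers = project(employee, ["DNO"])        # duplicates collapse
with_key = project(employee, ["SSN", "DNO"])     # key included: no tuples lost
```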
– We can apply one operation at a time and create intermediate result relations.
• In the latter case, we must give names to the relations that hold the intermediate
results.
• To retrieve the first name, last name, and salary of all employees who work in
department number 5, we must apply a select and a project operation
– DEP5_EMPS ← σ DNO=5(EMPLOYEE)
– RESULT ← π FNAME, LNAME, SALARY(DEP5_EMPS)
• In some cases, we may want to rename the attributes of a relation or the relation
name or both
• The general RENAME operation ρ can be expressed by any of the following forms:
– ρS(R) changes:
– If we write:
• RESULT will have the same attribute names as DEP5_EMPS (same attributes as
EMPLOYEE)
• If we write:
– The result of R ∪ S is a relation that includes all tuples that are either in R or in S
or in both R and S
– The two operand relations R and S must be “type compatible” (or UNION
compatible)
• Example:
– To retrieve the social security numbers of all employees who either work in
department 5 (RESULT1 below) or directly supervise an employee who works in
department 5 (RESULT2 below)
– We can use the UNION operation as follows:
RESULT1 ← π SSN(DEP5_EMPS)
RESULT2(SSN) ← π SUPERSSN(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
– The union operation produces the tuples that are in either RESULT1 or RESULT2 or
both
• R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:
• The resulting relation for R1 ∪ R2 (also for R1 ∩ R2, or R1 – R2, see next slides) has the
same attribute names as the first operand relation R1 (by convention)
• INTERSECTION is denoted by ∩
• The result of the operation R ∩ S is a relation that includes all tuples that are in
both R and S
– The attribute names in the result will be the same as the attribute names in R
• The result of R – S is a relation that includes all tuples that are in R but not in S
– The attribute names in the result will be the same as the attribute names in R
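Because relations are sets of tuples, UNION, INTERSECTION and set difference map directly onto Python set operators. The sketch below uses invented DEP5_EMPS rows; both operands are sets of one-element tuples, so they are type compatible.

```python
# Invented DEP5_EMPS tuples (employees of department 5 and their supervisors).
dep5_emps = [
    {"SSN": "123", "SUPERSSN": "333"},
    {"SSN": "333", "SUPERSSN": "888"},
    {"SSN": "453", "SUPERSSN": "333"},
]

# Projections onto a single attribute, stored as sets of 1-tuples.
result1 = {(t["SSN"],) for t in dep5_emps}
result2 = {(t["SUPERSSN"],) for t in dep5_emps}

union = result1 | result2          # in RESULT1 or RESULT2 or both
intersection = result1 & result2   # in both
difference = result1 - result2     # in RESULT1 but not in RESULT2
```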
Student Instructor
– The resulting relation state has one tuple for each combination of tuples, one
from R and one from S.
– FEMALE_EMPS ← σ SEX='F'(EMPLOYEE)
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT (cont.) - Example
– A special operation, called JOIN, combines this sequence (CARTESIAN PRODUCT
followed by SELECT) into a single operation
– This operation is very important for any relational database with more than a
single relation, because it allows us to combine related tuples from various relations
– The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1,
B2, . . ., Bm) is:
R ⋈<join condition> S
– where R and S can be any relations that result from general relational algebra
expressions.
• Example: Suppose that we want to retrieve the name of the manager of each
department.
– To get the manager’s name, we need to combine each DEPARTMENT tuple with
the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department
tuple.
– We do this by using the join operation.
– Combines each department record with the employee who manages the
department
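The manager example can be sketched as a nested loop over the two relations, which is exactly a Cartesian product filtered by the join condition. The `theta_join` helper and all rows are invented for illustration; merging dicts this way assumes the two relations share no attribute names.

```python
def theta_join(r, s, condition):
    """JOIN: pair every tuple of r with every tuple of s, keep pairs that satisfy condition."""
    # {**tr, **ts} assumes r and s have no attribute names in common.
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]


# Invented DEPARTMENT and EMPLOYEE tuples.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987"},
]
employee = [
    {"SSN": "333", "FNAME": "Franklin", "LNAME": "Wong"},
    {"SSN": "987", "FNAME": "Jennifer", "LNAME": "Wallace"},
    {"SSN": "123", "FNAME": "John", "LNAME": "Smith"},
]

# DEPARTMENT joined with EMPLOYEE on MGRSSN = SSN.
dept_mgr = theta_join(department, employee, lambda d, e: d["MGRSSN"] == e["SSN"])
managers = [(t["DNAME"], t["LNAME"]) for t in dept_mgr]
```

Since the only comparison used is equality, this particular join is also an EQUIJOIN.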
EQUIJOIN Operation
• The most common use of join involves join conditions with equality comparisons
only
• Such a join, where the only comparison operator used is =, is called an EQUIJOIN.
• because one of each pair of attributes with identical values is superfluous
– The standard definition of natural join requires that the two join attributes, or
each pair of corresponding join attributes, have the same name in both relations
– The implicit join condition includes each pair of attributes with the same name,
“AND”ed together:
• Q(A,B,C,D,E)
• DIVISION Operation
– R(Z) ÷ S(X), where X subset Z. Let Y = Z - X (and hence Z = X ∪ Y); that is, let Y be
the set of attributes of R that are not attributes of S.
– The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear
in R with tR[Y] = t, and with tR[X] = tS for every tuple tS in S.
– For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R
in combination with every tuple in S.
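The definition of DIVISION translates into grouping R by its Y values and keeping the groups whose X values cover all of S. The sketch below is one possible reading; the `divide` helper and the WORKS_ON-style rows are assumptions made for illustration.

```python
def divide(r, s, x_attrs):
    """DIVISION R(Z) / S(X): return tuples over Y = Z - X paired in r with every tuple of s."""
    y_attrs = [a for a in r[0] if a not in x_attrs] if r else []
    s_values = {tuple(t[a] for a in x_attrs) for t in s}
    # Group the X-values seen in r by their Y-value.
    seen = {}
    for t in r:
        y = tuple(t[a] for a in y_attrs)
        seen.setdefault(y, set()).add(tuple(t[a] for a in x_attrs))
    # Keep a Y-value only if its X-values include every tuple of s.
    return [dict(zip(y_attrs, y)) for y, xs in seen.items() if s_values <= xs]


# Invented rows: which employee (SSN) works on which project (PNO).
works_on = [
    {"SSN": "123", "PNO": 1}, {"SSN": "123", "PNO": 2},
    {"SSN": "453", "PNO": 1},
    {"SSN": "333", "PNO": 2}, {"SSN": "333", "PNO": 1}, {"SSN": "333", "PNO": 3},
]
projects = [{"PNO": 1}, {"PNO": 2}]

# Employees who work on every project listed in projects.
works_on_all = divide(works_on, projects, ["PNO"])
```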
Example of DIVISION
• Examples of such functions include retrieving the average or total salary of all
employees or the total number of employee tuples.
– These functions are used in simple statistical queries that summarize information
from the database tuples.
• Common functions applied to collections of numeric values include
– ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value from the
EMPLOYEE relation
– ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value from the
EMPLOYEE relation
– ℱSUM Salary (EMPLOYEE) retrieves the sum of the Salary from the EMPLOYEE
relation
• Note: count just counts the number of rows, without removing duplicates
ALL_EMPS ← π SSN(EMPLOYEE)
EMPS_WITH_DEPS(SSN) ← π ESSN(DEPENDENT)
Relational Calculus
– This is the main distinguishing feature between relational algebra and relational
calculus.
• The tuple relational calculus is based on specifying a number of tuple variables.
• Each tuple variable usually ranges over a particular database relation, meaning
that the variable may take as its value any individual tuple from that relation.
• A simple tuple relational calculus query is of the form
{t | COND(t)}
– The result of such a query is the set of all tuples t that satisfy COND (t).
Tuple Relational Calculus
• Example: To find the first and last names of all employees whose salary is above
$50,000, we can write the following tuple calculus expression:
{t.FNAME, t.LNAME | EMPLOYEE(t) AND t.SALARY>50000}
• The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is
EMPLOYEE.
• The first and last name (PROJECTION π FNAME, LNAME) of each EMPLOYEE tuple t
that satisfies the condition t.SALARY>50000 (SELECTION σ SALARY >50000) will be
retrieved.
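A set comprehension reads almost like the calculus expression: it states which tuples are wanted, not the order of operations. The rows below are invented for illustration.

```python
# Invented EMPLOYEE tuples; t ranges over this relation.
employee = [
    {"FNAME": "John", "LNAME": "Smith", "SALARY": 30000},
    {"FNAME": "James", "LNAME": "Borg", "SALARY": 55000},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SALARY": 43000},
]

# {t.FNAME, t.LNAME | EMPLOYEE(t) AND t.SALARY > 50000}
high_earners = {(t["FNAME"], t["LNAME"]) for t in employee if t["SALARY"] > 50000}
```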
Relational Algebra
Relational Algebra is a procedural language. In Relational Algebra, the order in which
the operations have to be performed is specified. In Relational Algebra, frameworks are
created to implement the queries. The basic operations included in relational algebra are:
1. Select (σ)
6. Rename (ρ)
Relational Calculus
In Relational Calculus, the order in which the operations have to be performed is not
specified.
Where t is the resulting set of tuples and p is the condition that is true for those
tuples.
▪ Where
▪ Joins
▪ Grouping
SQL
SQL commands can be used interactively as a query language within the DBMS.
Data definition language (DDL) is a language that allows the user to define the
data and their relationship to other types of data.
Data Definition language statements work with the structure of the database
table.
DDL Commands
◼ Table name.
Data types
When a table is created, each column in the table is assigned a data type.
◼ Varchar2
◼ Char
◼ Number
CREATE
....
);
ALTER
This command is used to add, delete or change columns in the existing table.
DROP
This command is used to remove an existing table along with its structure from
the Database.
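The three DDL commands can be tried out with Python's built-in sqlite3 module. This is only a sketch: the course table and its columns are hypothetical, and SQLite is used for convenience, so Oracle-specific types such as Varchar2 are replaced by VARCHAR and INTEGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE: define a new table and its columns.
conn.execute("CREATE TABLE course (course_id VARCHAR(10), hours INTEGER)")

# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE course ADD COLUMN title VARCHAR(40)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

# DROP: remove the table together with its structure.
conn.execute("DROP TABLE course")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'course'"
).fetchall()
```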
Alter
Drop
INSERT
UPDATE
DELETE
SELECT
VALUES (value-list)
VALUES ('MIS499',4);
VALUES ('MIS499','',4);
COLUMN
Deleting Data
DELETE COURSE; deletes all rows. Be careful!! This deletes ALL of the rows in
your table. If you use this command in error, you can use ROLLBACK to undo the changes.
DELETE COURSE WHERE HOURS=4; deletes a group of rows
delete course where hours<4;
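The ROLLBACK safety net can be demonstrated with sqlite3. This sketch assumes a hypothetical course table with invented rows, and it uses the standard `DELETE FROM` spelling because SQLite requires the FROM keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("MIS499", 4), ("MIS101", 3), ("MIS205", 4)],  # invented rows
)
conn.commit()


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]


# Delete every row by mistake, then undo it before committing.
conn.execute("DELETE FROM course")
after_delete = count_rows()
conn.rollback()
after_rollback = count_rows()

# Delete only a group of rows and make the change permanent.
conn.execute("DELETE FROM course WHERE hours = 4")
conn.commit()
after_filtered_delete = count_rows()
```

Once a change has been committed, ROLLBACK can no longer undo it.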
Updating Data
Applies to
• Inserts,
• Updates, and
• Deletes
FROM table_name
WHERE condition/criteria;
This statement will retrieve the specified field values for all rows in the specified
table that meet the specified conditions.
Every SELECT statement returns a recordset.
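A short sqlite3 sketch of SELECT with WHERE conditions. The column names follow the Customer table in these notes, while the rows are invented. It also shows that AND binds more tightly than OR unless parentheses are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_name VARCHAR(20), balance FLOAT, "
    "credit_limit FLOAT, state_in VARCHAR(10))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [  # invented rows
        ("A", 500.0, 1000.0, "OH"),
        ("B", 50.0, 2000.0, "NY"),
        ("C", 700.0, 500.0, "NY"),
        ("D", 200.0, 1000.0, "OH"),
    ],
)

or_rows = conn.execute(
    "SELECT customer_name FROM customer "
    "WHERE state_in = 'OH' OR credit_limit > 1000 ORDER BY customer_name"
).fetchall()

# Without parentheses this is read as: OH OR (credit_limit > 1000 AND balance < 100).
unparenthesized = conn.execute(
    "SELECT customer_name FROM customer "
    "WHERE state_in = 'OH' OR credit_limit > 1000 AND balance < 100 "
    "ORDER BY customer_name"
).fetchall()

parenthesized = conn.execute(
    "SELECT customer_name FROM customer "
    "WHERE (state_in = 'OH' OR credit_limit > 1000) AND balance < 100 "
    "ORDER BY customer_name"
).fetchall()
```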
WHERE Conditions
COLUMNS
Customer Table
Customer_name varchar(20),
Balance float,
Credit_limit float,
State_in varchar(10)
);
LOOK FOR
AND/OR/NOT Conditions
OR CREDIT_LIMIT>1000;
TWO COMPARISONS
STATE<>'OH'
More on AND/OR/NOT
D OH 1000 200
• A primary key is a field in a table which uniquely identifies each row/record in a
database table. Primary keys must contain unique values.
• A table can have only one primary key, which may consist of single or multiple
fields.
• When multiple fields are used as a primary key, they are called a composite key.
• If a table has a primary key defined on any field(s), then you cannot have two
records having the same value of that field(s).
• To create a PRIMARY KEY constraint on the "ID" column when the CUSTOMERS
table already exists, use the following SQL syntax:
ALTER TABLE CUSTOMERS ADD PRIMARY KEY (ID);
• For defining a PRIMARY KEY constraint on multiple columns, use the SQL syntax
given below.
• To create a PRIMARY KEY constraint on the "ID" and "NAMES" columns when
CUSTOMERS table already exists, use the following SQL syntax.
• You can clear the primary key constraints from the table with the syntax given below.
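A composite primary key can be tried out with sqlite3. Note the assumptions: SQLite cannot add a primary key with ALTER TABLE, so this sketch declares the key in CREATE TABLE instead, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER NOT NULL,"
    " name VARCHAR(20) NOT NULL,"
    " age INTEGER,"
    " PRIMARY KEY (id, name))"  # composite key on two fields
)

conn.execute("INSERT INTO customers VALUES (1, 'Ann', 32)")
conn.execute("INSERT INTO customers VALUES (1, 'Bob', 25)")  # same id, different name: allowed

try:
    conn.execute("INSERT INTO customers VALUES (1, 'Ann', 40)")  # repeats (id, name)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```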
Foreign Key
• A foreign key is a key used to link two tables together. This is sometimes also
called a referencing key.
• The relationship between 2 tables matches the Primary Key in one of the tables
with a Foreign Key in the second table.
• If a table has a primary key defined on any field(s), then you cannot have two
records having the same value of that field(s).
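Referential integrity in action, sketched with sqlite3 on hypothetical customers and orders tables. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other systems usually enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(20))")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(id),"  # foreign key
    " amount FLOAT)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")  # refers to an existing customer

try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 80.0)")  # customer 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```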
DISTINCT
Arithmetic operators: +, -, *, /
Comparison operators: =, >, >=, <, <=, <>
Concatenation operator: ||
Substring comparisons: %, _
BETWEEN
AND, OR
ORDER BY Clause
IN
Joins:
For every relationship among the tables in the FROM clause, you need one
WHERE condition (2 tables - 1 join, 3 tables - 2 joins…)
SQL
◼ Order by
◼ Group by
◼ Distinct keyword
Advanced SQL
◼ Constraints
◼ Using Joins
◼ Using Views
◼ Indexes
◼ Union Clause
Customers Table
Orders Table
SQL Order by
The SQL ORDER BY clause is used to sort the data in ascending or descending
order, based on one or more columns. Some databases sort the query results in
ascending order by default.
Syntax
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
You can use more than one column in the ORDER BY clause. Make sure that whatever
column you use to sort is in the column-list.
Example: The following code block has an example, which would sort the result in
an ascending order by the NAME and the SALARY
Example 2: The following code block has an example, which would sort the result
in the descending order by NAME.
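Both sorts can be reproduced with sqlite3. This is a sketch on a hypothetical customers table with invented rows, not the original slide example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name VARCHAR(20), salary FLOAT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Dana", 2000.0), (2, "Ben", 1500.0), (3, "Alice", 6500.0), (4, "Ben", 900.0)],
)

# Ascending by NAME, then by SALARY within equal names.
ascending = conn.execute(
    "SELECT name, salary FROM customers ORDER BY name, salary"
).fetchall()

# Descending by NAME.
descending_names = [
    row[0]
    for row in conn.execute("SELECT name FROM customers ORDER BY name DESC")
]
```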
SQL Group by
The GROUP BY clause follows the WHERE clause in a SELECT statement and
precedes the ORDER BY clause.
Syntax
WHERE [ conditions ]
In SQL, we use the GROUP BY clause to group rows based on the value of columns.
Example:
◼ If you want to know the total amount of the salary on each customer, then the
GROUP BY query would be as follows.
GROUP BY NAME;
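A runnable version of a total-per-customer query, sketched with sqlite3. The customers table and its rows are invented, with one name repeated so that the grouping is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name VARCHAR(20), salary FLOAT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ben", 1500.0), (2, "Ben", 900.0), (3, "Alice", 6500.0)],
)

# One result row per distinct NAME, with the salaries in each group summed.
totals = conn.execute(
    "SELECT name, SUM(salary) FROM customers GROUP BY name ORDER BY name"
).fetchall()
```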
The SQL DISTINCT keyword is used in conjunction with the SELECT statement to
eliminate all the duplicate records and fetch only unique records.
Syntax:
FROM table_name
WHERE [condition]
Example:
ORDER BY SALARY;
The SQL Joins clause is used to combine records from two or more tables in a
database.
A JOIN is a means for combining fields from two tables by using values common
to each.
Several operators can be used to join tables, such as =, <, >, <>, <=, >=, !=,
BETWEEN, LIKE, and NOT. However, the most
common operator is the equal to symbol.
SQL – Using Joins
◼ LEFT JOIN − returns all rows from the left table, even if there are no matches in
the right table.
◼ RIGHT JOIN − returns all rows from the right table, even if there are no matches in
the left table.
◼ FULL JOIN − returns rows when there is a match in one of the tables.
◼ SELF JOIN − is used to join a table to itself as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
◼ CARTESIAN JOIN − returns the Cartesian product of the sets of records from the
two or more joined tables.
Example:
Syntax:
SELECT column_name(s)
FROM table1
ON table1.column_name = table2.column_name;
FROM Orders
Inner Join
The INNER JOIN keyword selects records that have matching values in both tables.
FROM Products
Categories.CategoryID;
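The difference between INNER JOIN and LEFT JOIN is easiest to see on a small data set. The sketch below uses sqlite3 with hypothetical customers and orders tables; the customer Cara has no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name VARCHAR(20))")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount FLOAT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cara")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(100, 1, 250.0), (101, 1, 80.0), (102, 2, 40.0)])

# INNER JOIN: only customers that have a matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c "
    "INNER JOIN orders o ON c.id = o.customer_id ORDER BY o.order_id"
).fetchall()

# LEFT JOIN: every customer, with NULL (None) where no order matches.
left_rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c "
    "LEFT JOIN orders o ON c.id = o.customer_id ORDER BY c.id, o.order_id"
).fetchall()
```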
A view is nothing more than a SQL statement that is stored in the database with
an associated name.
A view is actually a composition of a table in the form of a predefined SQL query.
A view can contain all rows of a table or select rows from a table.
A view can be created from one or many tables which depends on the written SQL
query to create a view. Views are a type of virtual table.
◼ Structure data in a way that users or classes of users find natural or intuitive.
◼ Restrict access to the data in such a way that a user can see and (sometimes)
modify exactly what they need and no more.
◼ Summarize data from various tables which can be used to generate reports.
Views can be created from a single table, multiple tables or another view. You can
also drop views, insert rows into views and delete rows from views.
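A view behaves like a stored query, as the sqlite3 sketch below illustrates on a hypothetical customers table. One assumption to note: SQLite views are read-only, so the update here goes through the base table and the view simply reflects it; other systems allow updates through some views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name VARCHAR(20), age INTEGER, salary FLOAT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)",
                 [(1, "Ann", 32, 2000.0), (2, "Bob", 25, 1500.0)])

# The view exposes only name and age, hiding salary.
conn.execute("CREATE VIEW customers_view AS SELECT name, age FROM customers")
view_rows = conn.execute(
    "SELECT name, age FROM customers_view ORDER BY name"
).fetchall()

# The view is recomputed from the base table, so it sees later changes.
conn.execute("UPDATE customers SET age = 35 WHERE name = 'Ann'")
ann_age = conn.execute(
    "SELECT age FROM customers_view WHERE name = 'Ann'"
).fetchone()[0]

conn.execute("DROP VIEW customers_view")
```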
WHERE [condition];
FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual
table. Following is an example for the same.
UPDATE CUSTOMERS_VIEW
SET AGE = 35
SQL - Indexes
Indexes are special lookup tables that the database search engine can use to
speed up data retrieval.
An index helps to speed up SELECT queries and WHERE clauses, but it slows
down data input, with the UPDATE and the INSERT statements.
Indexes can be created or dropped with no effect on the data. Creating an index
involves the CREATE INDEX statement,
ON table_name (column_name);
Implicit Indexes: are indexes that are automatically created by the database
server when an object is created. Indexes are automatically created for primary key
constraints and unique constraints.
Unique Indexes: are used not only for performance, but also for data integrity.
The DROP INDEX Command: An index can be dropped using SQL DROP command.
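Creating, using and dropping indexes can be sketched with sqlite3. The table, the index names and the rows are all invented for illustration; listing indexes through `sqlite_master` is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(20), email VARCHAR(40))")

# An ordinary index for faster lookups, and a unique index for data integrity.
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")
conn.execute("CREATE UNIQUE INDEX idx_customers_email ON customers (email)")


def index_names():
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'customers' ORDER BY name"
    )
    return [row[0] for row in rows]


before_drop = index_names()

conn.execute("INSERT INTO customers VALUES (1, 'Ann', 'ann@example.com')")
try:
    conn.execute("INSERT INTO customers VALUES (2, 'Bob', 'ann@example.com')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:  # the unique index refuses the repeated email
    duplicate_email_rejected = True

# Dropping an index leaves the table data untouched.
conn.execute("DROP INDEX idx_customers_name")
after_drop = index_names()
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```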
SHORT_DESC VARCHAR(100),
AUTHOR VARCHAR(40),
PUBLISHER VARCHAR(40),
PRICE FLOAT,
);
Poorly designed SQL indexes and a lack of them are primary sources of database and
application performance issues. Here are a few indexing strategies that should be
considered when indexing tables:
Avoid indexing highly used table/columns – The more indexes on a table the
bigger the effect will be on the performance of Insert, Update, Delete, and Merge
statements because all indexes must be modified appropriately. This means that SQL
Server will have to do page splitting, move data around, and it will have to do that for all
indexes affected by those DML statements
Use narrow index keys whenever possible – Keep indexes narrow, that is, with as
few columns as possible. Exact numeric keys are the most efficient SQL index keys (e.g.
integers). These keys require less disk space and maintenance overhead
SQL – Indexing strategy guidelines
Use clustered indexes on unique columns – Consider columns that are unique or
contain many distinct values and avoid them for columns that undergo frequent changes
Cover SQL indexes for big performance gains – Improvements are attained when
the index holds all columns in the query
The SQL UNION clause/operator is used to combine the results of two or more
SELECT statements without returning any duplicate rows.
[WHERE condition]
UNION
[WHERE condition]
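A small sqlite3 sketch of UNION on two hypothetical tables with invented rows. UNION ALL is included only for contrast, since it keeps the duplicate rows that UNION removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name VARCHAR(20))")
conn.execute("CREATE TABLE suppliers (name VARCHAR(20))")
conn.executemany("INSERT INTO customers VALUES (?)", [("Ann",), ("Bob",)])
conn.executemany("INSERT INTO suppliers VALUES (?)", [("Bob",), ("Cara",)])

# UNION: combined result without duplicate rows.
union_rows = conn.execute(
    "SELECT name FROM customers UNION SELECT name FROM suppliers ORDER BY name"
).fetchall()

# UNION ALL: combined result with duplicates kept.
union_all_rows = conn.execute(
    "SELECT name FROM customers UNION ALL SELECT name FROM suppliers ORDER BY name"
).fetchall()
```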