Module 2
Examples of domains:
Attribute: the name of the role played by some value (coming from some domain) in the
context of a relational schema. The domain of attribute A is denoted dom(A).
Tuple: A tuple is a mapping from attributes to values drawn from the respective domains of
those attributes. A tuple is intended to describe some entity (or relationship between entities)
in the miniworld. As an example, a tuple for a PERSON entity might be
Relation: A (named) set of tuples all of the same form (i.e., having the same set of
attributes). The term table is a loose synonym. (Some database purists would argue that a
table is "only" a physical manifestation of a relation.)
Relational Schema: used for describing (the structure of) a relation. E.g., R(A1, A2, ..., An)
says that R is a relation with attributes A1, ..., An. The degree of a relation is the number of
attributes it has, here n.
One would think that a "complete" relational schema would also specify the domain of each
attribute.
Relational Database: A collection of relations, each one consistent with its specified
relational schema. The model was first introduced by Ted Codd of IBM Research in 1970. It
uses the concept of a mathematical relation; hence, the database is a collection of relations. A
relation can be thought of as a table of values in which each row represents a collection of
related data values. In relational model terminology, a row is called a tuple and a column header
is called an attribute.
The degree of a relation is the number of attributes in its relation schema. For example, a
relation schema of degree 7, STUDENT(Name, SSN, HPhone, Address, WPhone, Age, GPA),
describes students. The relation STUDENT can be shown as follows:
Characteristics of Relations
Ordering of Attributes: A tuple is best viewed as a mapping from its attributes (i.e.,
the names we give to the roles played by the values comprising the tuple) to the
corresponding values. Hence, the order in which the attributes are listed in a table is
irrelevant. (Note that, unfortunately, the set theoretic operations in relational algebra (at least
how E&N define them) make implicit use of the order of the attributes. Hence, E&N view
attributes as being arranged as a sequence rather than a set.)
Values of Attributes: For a relation to be in First Normal Form, each of its attribute
domains must consist of atomic (neither composite nor multi-valued) values. Much of the
theory underlying the relational model was based upon this assumption. Chapter 10 addresses
the issue of including non-atomic values in domains. (Note that in the latest edition of C.J.
Date's book, he explicitly argues against this idea, admitting that he has been mistaken in the
past.) The null value is used to represent values that are unknown or not applicable.
Keep in mind that some relations represent facts about entities (e.g., students) whereas others
represent facts about relationships between entities (e.g., students and course sections).
The closed world assumption states that the only true facts about the miniworld are those
represented by whatever tuples currently populate the database.
A tuple can be considered as a set of (<attribute>, <value>) pairs. Thus the following two
tuples are identical:
Tuple ordering is not part of a relation; that is, the following relation is identical to the
table considered previously.
An attribute A can be qualified with the relation name R using the dot notation R.A, for
example, STUDENT.Name or STUDENT.Age.
R(A1, A2, ..., An) is a relational schema of degree n denoting that there is a relation R having as
its attributes A1, A2, ..., An. By convention, Q, R, and S denote relation names. By convention,
q, r, and s denote relation states. For example, r(R) denotes one possible state of relation R. If
R is understood from context, this could be written, more simply, as r. By convention, t, u,
and v denote tuples. The "dot notation" R.A (e.g., STUDENT.Name) is used to qualify an
attribute name, usually for the purpose of distinguishing it from a same-named attribute in a
different relation (e.g., DEPARTMENT.Name).
schema-based: can be expressed using DDL; this kind is the focus of this section.
application-based: are specific to the "business rules" of the miniworld and typically
difficult or impossible to express and enforce within the data model. Hence, they are left to
application programs to enforce.
Domain Constraints: Each attribute value must be either null (which is really a non-value) or
drawn from the domain of that attribute. Note that some DBMSs allow you to impose the not
null constraint upon an attribute, which is to say that that attribute may not have the
(non-)value null.
Key Constraints: A relation is a set of tuples, and each tuple's "identity" is given by the
values of its attributes. Hence, it makes no sense for two tuples in a relation to be identical
(because then the two tuples are actually one and the same tuple). That is, no two tuples may
have the same combination of values in their attributes.
Usually the miniworld dictates that there be (proper) subsets of attributes for which no two
tuples may have the same combination of values. Such a set of attributes is called a superkey
of its relation. From the fact that no two tuples can be identical, it follows that the set of all
attributes of a relation constitutes a superkey of that relation.
A key is a minimal superkey, i.e., a superkey such that, if we were to remove any of its
attributes, the resulting set of attributes fails to be a superkey.
Example: Suppose that we stipulate that a faculty member is uniquely identified by Name
and Address and also by Name and Department, but by no single one of the three attributes
mentioned. Then {Name, Address, Department} is a (non-minimal) superkey and each of
{Name, Address} and {Name, Department} is a key (i.e., minimal superkey).
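The superkey and key definitions can be tried out on a small relation state. The following is only a sketch: the FACULTY rows and the helper names is_superkey and is_key are made up for illustration. Keep in mind that a key constraint is a statement about every legal state of the relation, so a single sample state can refute a proposed key but can never prove it.

```python
# Hypothetical FACULTY state, chosen so that it agrees with the example above.
FACULTY = [
    {"Name": "Lee", "Address": "12 Oak St", "Department": "Math"},
    {"Name": "Lee", "Address": "40 Elm St", "Department": "Physics"},
    {"Name": "Kim", "Address": "12 Oak St", "Department": "Math"},
    {"Name": "Kim", "Address": "40 Elm St", "Department": "Chemistry"},
]


def is_superkey(rows, attrs):
    """True if no two tuples in this state agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        signature = tuple(row[a] for a in sorted(attrs))
        if signature in seen:
            return False
        seen.add(signature)
    return True


def is_key(rows, attrs):
    """True if attrs is a superkey and dropping any one attribute breaks that.

    Checking single-attribute removals is enough, because every superset
    of a superkey is itself a superkey.
    """
    attrs = set(attrs)
    if not is_superkey(rows, attrs):
        return False
    return all(not is_superkey(rows, attrs - {a}) for a in attrs)


print(is_superkey(FACULTY, {"Name", "Address", "Department"}))  # superkey
print(is_key(FACULTY, {"Name", "Address", "Department"}))       # not minimal
print(is_key(FACULTY, {"Name", "Address"}))
print(is_key(FACULTY, {"Name", "Department"}))
```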
Candidate key: any key! (Hence, it is not clear what distinguishes a key from a candidate
key.)
Primary key: a key chosen to act as the means by which to identify tuples in a relation.
Typically, one prefers a primary key to be one having as few attributes as possible.
A relational database schema is a set of schemas for its relations together with a set of
integrity constraints. A relational database state/instance/snapshot is a set of states of its
relations such that no integrity constraint is violated. Figure 3.5 shows a relational database
schema that we call COMPANY = {EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT,
WORKS_ON, DEPENDENT}. The underlined attributes represent primary keys.
When we refer to a relational database, we implicitly include both its schema and its current
state. A database state that does not obey all the integrity constraints is called an invalid
state, and a state that satisfies all the constraints in the defined set is called a valid state.
Entity Integrity Constraint: In a tuple, none of the values of the attributes forming the
relation's primary key may have the (non-)value null. Or is it that at least one such attribute
must have a non-null value? In my opinion, E&N do not make it clear!
(R is called the referencing relation and S the referenced relation.) For this to make sense, the
set of attributes of R forming the foreign key should "correspond to" some superkey of S.
Indeed, by definition we require this superkey to be the primary key of S.
This constraint says that, for every tuple in R, the tuple in S to which it refers must actually be
in S. Note that a foreign key may refer to a tuple in the same relation and that a foreign key
may be part of a primary key (indeed, for weak entity types, this will always occur). A
foreign key may have value null (necessarily in all its attributes??), in which case it does not
refer to any tuple in the referenced relation.
For each of the update operations (Insert, Delete, and Update), we consider what kinds of
constraint violations may result from applying it and how we might choose to react.
Insert:
If an insertion violates one or more constraints, the default option is to reject the insertion.
Alternatively, the user can be given the opportunity to try again with different attribute
values.
Operation:
Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, NULL , ‘1960-04-05’, ‘6357 Windy Lane, Katy,
Result: This insertion violates the entity integrity constraint ( NULL for the
Operation:
Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘1960-04-05’, ‘6357 Windy Lane, Katy,
Result: This insertion violates the key constraint because another tuple with
the same Ssn value already exists in the EMPLOYEE relation, and so it is
rejected.
Operation:
Operation:
Delete: Referential integrity violation: a tuple referring to the deleted one exists. Three
options for dealing with it:
Operation:
Delete the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10.
Operation:
Operation:
Result: This deletion will result in even worse referential integrity violations,
Operation:
Update the salary of the EMPLOYEE tuple with Ssn = ‘999887777’ to 28000.
Result: Acceptable.
Operation:
Result: Acceptable.
Operation:
Operation:
‘987654321’.
Result: Unacceptable, because it violates the primary key constraint by repeating a value that
already exists as a primary key in another tuple; it also violates referential integrity
constraints because there are other relations that refer to the existing value of Ssn.
1.4 Transactions:
This concept is relevant in the context where multiple users and/or application
programs are accessing and updating the database concurrently. A transaction is a logical unit
of work that may involve several accesses and/or updates to the database (such as what might
be required to reserve several seats on an airplane flight). The point is that, even though
several transactions might be processed concurrently, the end result must be as though the
transactions were carried out sequentially. (Example of simultaneous withdrawals from same
checking account.)
JOIN operations .
σ_{Dno = 4}(EMPLOYEE)
σ_{Salary > 30000}(EMPLOYEE)
where the symbol σ (sigma) is used to denote the SELECT operator and the selection
condition is a Boolean expression (condition) specified on the attributes of relation R. Notice
that R is generally a relational algebra expression whose result is a relation—the simplest
such expression is just the name of a database relation. The relation resulting from the
SELECT operation has the same attributes as R.
The Boolean expression specified in <selection condition> is made up of a number of clauses
of the form
<attribute name> <comparison op> <constant value>
where <attribute name> is the name of an attribute of R, <comparison op> is normally one of
the operators {=, <, ≤, >, ≥, ≠}, and <constant value> is a constant value from the attribute
domain. Clauses can be connected by the standard Boolean operators and, or, and not to form
a general selection condition. For example, to select the tuples for all employees who either
work in department 4 and make over $25,000 per year, or work in department 5 and make
over $30,000, we can specify the following SELECT operation:
σ_{(Dno = 4 AND Salary > 25000) OR (Dno = 5 AND Salary > 30000)}(EMPLOYEE)
selected. All the selected tuples appear in the result of the SELECT operation. The Boolean
conditions AND, OR, and NOT have their normal interpretation, as follows:
(cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise, it is FALSE.
(cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE; otherwise, it is FALSE.
(NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
The SELECT operator is unary; that is, it is applied to a single relation. Moreover, the
selection operation is applied to each tuple individually; hence, selection conditions cannot
involve more than one tuple. The degree of the relation resulting from a SELECT operation
(its number of attributes) is the same as the degree of R. The number of tuples in the
resulting relation is always less than or equal to the number of tuples in R. That is,
|σ_{c}(R)| ≤ |R| for any condition c. The fraction of tuples selected by a selection
condition is referred to as the selectivity of the condition.
The SELECT operation is commutative; that is,
σ_{<cond1>}(σ_{<cond2>}(R)) = σ_{<cond2>}(σ_{<cond1>}(R))
Hence, a sequence of SELECTs can be applied in any order. In addition, we can always
combine a cascade (or sequence) of SELECT operations into a single SELECT operation
with a conjunctive (AND) condition; that is,
σ_{<cond1>}(σ_{<cond2>}(...(σ_{<condn>}(R))...)) = σ_{<cond1> AND <cond2> AND ... AND <condn>}(R)
In SQL, the SELECT condition is typically specified in the WHERE clause of a query.
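The properties of SELECT described above can be checked on a small example. This is a minimal sketch, not a DBMS implementation: a relation is modelled as a Python list of dictionaries, the helper name select is an assumption, and the EMPLOYEE rows are illustrative sample data.

```python
# Illustrative sample rows; only the attributes needed here are included.
EMPLOYEE = [
    {"Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Lname": "Zelaya", "Dno": 4, "Salary": 25000},
    {"Lname": "Wallace", "Dno": 4, "Salary": 43000},
    {"Lname": "Borg", "Dno": 1, "Salary": 55000},
]


def select(relation, condition):
    """SELECT: keep the tuples for which the condition is true.

    The condition is applied to one tuple at a time, so it can never
    involve more than one tuple.
    """
    return [t for t in relation if condition(t)]


# The compound condition from the text.
selected = select(
    EMPLOYEE,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)
names = [t["Lname"] for t in selected]
selectivity = len(selected) / len(EMPLOYEE)  # fraction of tuples selected


def cond1(t):
    return t["Dno"] == 4


def cond2(t):
    return t["Salary"] > 30000


# Commutativity, and a cascade collapsed into one conjunctive condition.
order_a = select(select(EMPLOYEE, cond2), cond1)
order_b = select(select(EMPLOYEE, cond1), cond2)
combined = select(EMPLOYEE, lambda t: cond1(t) and cond2(t))

print(names, selectivity)
print(order_a == order_b == combined)
```

The result never has more tuples than the operand, and every result tuple keeps all of the operand's attributes, which matches the degree and cardinality remarks above.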
The PROJECT operation keeps only certain attributes (columns) of a relation R, namely those
specified in an attribute list L.
Form of operation: π_{L}(R). The resulting relation has only those attributes of R specified in
L. The PROJECT operation eliminates duplicate tuples in the resulting relation so that it
remains a mathematical set (no duplicate elements).
Example: π_{SEX,SALARY}(EMPLOYEE)
If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the
resulting relation.
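Duplicate elimination is the part of PROJECT that is easiest to overlook, so here is a short sketch of it. The helper name project and the sample rows are assumptions made for illustration.

```python
# Illustrative sample rows: two male employees share the salary 30000.
EMPLOYEE = [
    {"Lname": "Smith", "Sex": "M", "Salary": 30000},
    {"Lname": "Narayan", "Sex": "M", "Salary": 30000},
    {"Lname": "Zelaya", "Sex": "F", "Salary": 25000},
]


def project(relation, attrs):
    """PROJECT: keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        values = tuple(t[a] for a in attrs)
        if values not in seen:  # a relation is a set, so keep one copy only
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


sex_salary = project(EMPLOYEE, ["Sex", "Salary"])
print(sex_salary)
```

Three input tuples yield two result tuples, because the two <M, 30000> combinations collapse into one.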
Sequences of operations:
Example: Retrieve the names and salaries of employees who work in department 4:
π_{FNAME,LNAME,SALARY}(σ_{DNO=4}(EMPLOYEE))
The same query can be written as a sequence of operations by naming the intermediate relation:
DEPT4_EMPS ← σ_{DNO=4}(EMPLOYEE)
Attributes can optionally be renamed in the resulting left-hand-side relation (this may be
required for some operations that will be presented later):
DEPT4_EMPS ← σ_{DNO=4}(EMPLOYEE)
RESULT(FIRSTNAME,LASTNAME,SALARY) ← π_{FNAME,LNAME,SALARY}(DEPT4_EMPS)
For ∪, ∩, and –, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the
same number of attributes, and the domains of corresponding attributes must be compatible;
that is, dom(Ai) = dom(Bi) for i=1, 2, ..., n. This condition is called union compatibility. The
resulting relation for ∪, ∩, or – has the same attribute names as the first operand relation R1
(by convention).
The relation RESULT1 has the Ssn of all employees who work in department 5, whereas
RESULT2 has the Ssn of all employees who directly supervise an employee who works in
department 5. The UNION operation produces the tuples that are in either RESULT1 or
RESULT2 or both, while eliminating any duplicates. Thus, the Ssn value ‘333445555’
appears only once in the result.
We can define the three operations UNION, INTERSECTION, and SET DIFFERENCE
on two union-compatible relations R and S as follows:
UNION: The result of this operation, denoted by R ∪ S, is a relation that includes all tuples
that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation that includes
all tuples that are in both R and S.
SET DIFFERENCE (or MINUS): The result of this operation, denoted by R – S, is a
relation that includes all tuples that are in R but not in S.
Figure 6.4 illustrates the three operations. The relations STUDENT and INSTRUCTOR in
Figure 6.4(a) are union compatible and their tuples represent the names of students and the
names of instructors, respectively. The result of the UNION operation in Figure 6.4(b) shows
the names of all students and instructors. Note that duplicate tuples appear only once in the
result. The result of the
INTERSECTION operation (Figure 6.4(c)) includes only those who are both students and
instructors.
CARTESIAN PRODUCT
R(A1, A2, ..., Am, B1, B2, ..., Bn) ← R1(A1, A2, ..., Am) × R2(B1, B2, ..., Bn)
A tuple t exists in R for each combination of tuples t1 from R1 and
t2 from R2 such that t[A1, A2, ..., Am] = t1 and t[B1, B2, ..., Bn] = t2.
Example: Combine each DEPARTMENT tuple with the EMPLOYEE tuple of the
manager.
DEP_EMP ← DEPARTMENT × EMPLOYEE
DEPT_MANAGER ← σ_{MGRSSN=SSN}(DEP_EMP)
The JOIN operation, denoted by ⋈, is used to combine related tuples from two relations into
single “longer” tuples. To illustrate JOIN, suppose that we want to retrieve the name of the
manager of each department. To get the manager’s name, we need to combine each
department tuple with the employee tuple whose Ssn value matches the Mgr_ssn value in the
department tuple.
DEPT_MGR ← DEPARTMENT ⋈_{Mgr_ssn = Ssn} EMPLOYEE
RESULT ← π_{Dname, Lname, Fname}(DEPT_MGR)
The first operation is illustrated in Figure 6.6. Note that Mgr_ssn is a foreign key of the
DEPARTMENT relation that references Ssn, the primary key of the EMPLOYEE relation.
This referential integrity constraint plays a role in having matching tuples in the referenced
relation EMPLOYEE.
The JOIN operation can be specified as a CARTESIAN PRODUCT operation followed by a
SELECT operation. However, JOIN is very important because it is used very frequently
when specifying database queries. Consider the earlier example illustrating CARTESIAN
PRODUCT, which included the following sequence of operations:
The general form of a JOIN operation on two relations R(A1, A2, ..., An) and
S(B1, B2, ..., Bm) is
R ⋈_{<join condition>} S
EQUIJOIN: The join condition c includes one or more equality comparisons involving
attributes from R1 and R2. That is, c is of the form:
RESULT ← π_{DNAME,FNAME,LNAME}(T)
Example: Retrieve each EMPLOYEE's name and the name of the DEPARTMENT he/she
works for:
Example: Retrieve each EMPLOYEE's name and the name of his/her SUPERVISOR:
SUPERVISOR(SUPERSSN,SFN,SLN) ← π_{SSN,FNAME,LNAME}(EMPLOYEE)
T ← EMPLOYEE * SUPERVISOR
RESULT ← π_{FNAME,LNAME,SFN,SLN}(T)
Note: In the original definition of NATURAL JOIN, the join attributes were required to have
the same names in both relations. There can be more than one set of join attributes, each with a
different meaning, between the same two relations. For example:
JOIN ATTRIBUTES
RELATIONSHIP
Example: Retrieve each EMPLOYEE's name and the name of the DEPARTMENT he/she
works for:
EMPLOYEE(1).SUPERSSN=
EMPLOYEE(2).SSN
EMPLOYEE(1)
Example: Retrieve each EMPLOYEE's name and the name of his/her SUPERVISOR:
T ← EMPLOYEE ⋈_{SUPERSSN=SSSN} SUPERVISOR
RESULT ← π_{FNAME,LNAME,SFN,SLN}(T)
It has been shown that the set of relational algebra operations {σ, π, ∪, ρ, –, ×} is a
complete set; that is, any of the other original relational algebra operations can be expressed
as a sequence of operations from this set. For example, the INTERSECTION operation can
be expressed by using UNION and MINUS as follows:
R ∩ S ≡ (R ∪ S) – ((R – S) ∪ (S – R))
Although, strictly speaking, INTERSECTION is not required, it is inconvenient to
specify this complex expression every time we wish to specify an intersection. As another
example, a JOIN operation can be specified as a CARTESIAN PRODUCT followed by a
SELECT operation, as we discussed:
R ⋈_{<condition>} S ≡ σ_{<condition>}(R × S)
Similarly, a NATURAL JOIN can be specified as a CARTESIAN PRODUCT
preceded by RENAME and followed by SELECT and PROJECT operations.
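Both equivalences above can be checked mechanically on small relations. In this sketch a relation is a Python set of positional tuples, so the built-in set operators play the role of ∪, ∩, and –. The helper names product and select, and all sample data, are assumptions made for illustration.

```python
def product(r, s):
    """CARTESIAN PRODUCT: concatenate every tuple of r with every tuple of s."""
    return {t1 + t2 for t1 in r for t2 in s}


def select(r, condition):
    """SELECT on a set of positional tuples."""
    return {t for t in r if condition(t)}


# INTERSECTION expressed with UNION and MINUS only.
R = {("a1",), ("a2",), ("a3",)}
S = {("a2",), ("a3",), ("a4",)}
via_union_and_minus = (R | S) - ((R - S) | (S - R))

# JOIN expressed as CARTESIAN PRODUCT followed by SELECT.
# DEPARTMENT(Dname, Mgr_ssn) and EMPLOYEE(Lname, Ssn), illustrative rows.
DEPARTMENT = {("Research", "333445555"), ("Headquarters", "888665555")}
EMPLOYEE = {
    ("Wong", "333445555"),
    ("Borg", "888665555"),
    ("Smith", "123456789"),
}
# In the product, position 1 is Mgr_ssn and position 3 is Ssn.
DEPT_MGR = select(product(DEPARTMENT, EMPLOYEE), lambda t: t[1] == t[3])

print(via_union_and_minus == (R & S))
print(sorted(DEPT_MGR))
```

The product has 2 × 3 = 6 tuples, of which only the two with matching Mgr_ssn and Ssn survive the selection. This is also why a dedicated JOIN is worth having: it states the intent directly instead of building the full product first.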
The preceding operations are shown in Figure 6.8(a).
Figure 6.8(b) illustrates a DIVISION operation where X = {A}, Y = {B}, and Z = {A,B}.
Notice that the tuples (values) b1 and b4 appear in R in combination with all three tuples in
S; that is why they appear in the resulting relation T. All other values of B in R do not appear
with all the tuples in S and are not selected: b2 does not appear with a2, and b3 does not
appear with a1. The DIVISION operation can be expressed as a sequence of π, ×, and –
operations as follows:
T1 ← π_{Y}(R)
T2 ← π_{Y}((S × T1) – R)
T ← T1 – T2
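The three-step expression for DIVISION can be followed literally in code. This is a sketch with X = {A} and Y = {B}: R is a set of (a, b) pairs and S is a set of plain a values. The sample data is made up, but it is chosen to agree with the description above (b1 and b4 occur with a1, a2, and a3; b2 lacks a2; b3 lacks a1).

```python
# R(A, B) as a set of (a, b) pairs; S(A) as a set of a values.
R = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}
S = {"a1", "a2", "a3"}

# T1 <- pi_Y(R): every B value that occurs in R at all.
T1 = {b for (_, b) in R}

# S x T1: every (a, b) pairing that would have to be in R for b to qualify.
S_TIMES_T1 = {(a, b) for a in S for b in T1}

# T2 <- pi_Y((S x T1) - R): B values with at least one required pairing missing.
T2 = {b for (_, b) in (S_TIMES_T1 - R)}

# T <- T1 - T2: B values that are paired with every tuple of S.
T = T1 - T2

print(sorted(T2), sorted(T))
```

The only missing pairings are (a2, b2) and (a1, b3), so T2 disqualifies b2 and b3 and T keeps b1 and b4.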
Figure 6.9 shows a query tree for Query 2 (see Section 4.3.1): For every project
located in ‘Stafford’, list the project number, the controlling department number, and
the department manager’s last name, address, and birth date. In Figure 6.9, the three leaf
nodes P, D, and E represent the three relations PROJECT,
DEPARTMENT, and EMPLOYEE. The relational algebra operations in the expression are
represented by internal tree nodes. The query tree signifies an explicit order of execution in
the following sense. In order to execute Q2, the node marked (1) in Figure 6.9 must begin
execution before node (2) because some resulting tuples of operation (1) must be available
before we can begin to execute operation (2). Similarly, node (2) must begin to execute and
produce results before node (3) can start execution, and so on.
Generalized Projection
The generalized projection operation extends the projection operation by allowing functions
of attributes to be included in the projection list. The generalized form can be expressed as:
AGGREGATE FUNCTIONS (ℱ)
Functions such as SUM, COUNT, AVERAGE, MIN, and MAX are often applied to sets of
values or sets of tuples in database applications.
Example 1: Retrieve the average salary of all employees (no grouping needed):
Example 2: For each department, retrieve the department number, the number of employees,
and the average salary (in the department):
R(DNO, NUMEMPS, AVGSAL) ← DNO ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE)
DNO is called the grouping attribute in the above example.
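Grouping followed by aggregation, as in Example 2, can be sketched in a few lines. The function name count_and_average_by is hypothetical, the rows are illustrative sample data, and skipping null (None) values when counting and averaging is an assumption that mirrors the usual SQL behaviour.

```python
from collections import defaultdict

# Illustrative sample rows.
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000},
    {"SSN": "333445555", "DNO": 5, "SALARY": 40000},
    {"SSN": "999887777", "DNO": 4, "SALARY": 25000},
    {"SSN": "987654321", "DNO": 4, "SALARY": 43000},
    {"SSN": "888665555", "DNO": 1, "SALARY": 55000},
]


def count_and_average_by(relation, group_attr, count_attr, avg_attr):
    """Partition the tuples by group_attr, then aggregate each partition."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[group_attr]].append(t)

    result = []
    for key in sorted(groups):
        members = groups[key]
        count = sum(1 for t in members if t[count_attr] is not None)
        values = [t[avg_attr] for t in members if t[avg_attr] is not None]
        average = sum(values) / len(values) if values else None
        result.append((key, count, average))
    return result


# One result tuple (DNO, NUMEMPS, AVGSAL) per distinct DNO value.
R = count_and_average_by(EMPLOYEE, "DNO", "SSN", "SALARY")
print(R)
```

Example 1 (no grouping) is the special case in which all tuples fall into a single group.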
all employees e″ directly supervised by each employee e′, all employees e‴ directly
supervised by each employee e″, and so on.
For example, to specify the Ssns of all employees e′ directly supervised (at level one) by
the employee e whose name is ‘James Borg’ (see Figure 3.6), we can apply the following
operation:
BORG_SSN ← π_{Ssn}(σ_{Fname = ‘James’ AND Lname = ‘Borg’}(EMPLOYEE))
SUPERVISION(Ssn1, Ssn2) ← π_{Ssn, Super_ssn}(EMPLOYEE)
RESULT1(Ssn) ← π_{Ssn1}(SUPERVISION ⋈_{Ssn2 = Ssn} BORG_SSN)
To retrieve all employees supervised by Borg at level 2, that is, all employees e″ supervised
by some employee e′ who is directly supervised by Borg, we can apply another JOIN to the
result of the first query, as follows:
RESULT2(Ssn) ← π_{Ssn1}(SUPERVISION ⋈_{Ssn2 = Ssn} RESULT1)
To get both sets of employees supervised at levels 1 and 2 by ‘James Borg’, we can
apply the UNION operation to the two results, as follows:
RESULT ← RESULT2 ∪ RESULT1
The results of these queries are illustrated in Figure 6.11.
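The level-by-level queries above translate directly into code. In this sketch SUPERVISION is a set of (Ssn1, Ssn2) pairs, the pairs themselves are illustrative sample data, and the helper names supervised_by and all_levels are assumptions. The last function repeats the join step until nothing new is found, which is a loop in the host language rather than a single relational algebra expression.

```python
# SUPERVISION(Ssn1, Ssn2): Ssn1 is directly supervised by Ssn2.
SUPERVISION = {
    ("123456789", "333445555"),
    ("333445555", "888665555"),
    ("999887777", "987654321"),
    ("987654321", "888665555"),
    ("666884444", "333445555"),
    ("888665555", None),  # the top-level employee has no supervisor
}


def supervised_by(supervision, bosses):
    """One join step: project Ssn1 from the pairs whose Ssn2 is a boss."""
    return {ssn1 for (ssn1, ssn2) in supervision if ssn2 in bosses}


BORG_SSN = {"888665555"}
RESULT1 = supervised_by(SUPERVISION, BORG_SSN)  # level 1
RESULT2 = supervised_by(SUPERVISION, RESULT1)   # level 2
RESULT = RESULT2 | RESULT1                      # levels 1 and 2 together


def all_levels(supervision, bosses):
    """Repeat the join step until no new employees turn up."""
    found = set()
    frontier = set(bosses)
    while frontier:
        frontier = supervised_by(supervision, frontier) - found
        found |= frontier
    return found


print(sorted(RESULT1))
print(sorted(RESULT2))
print(all_levels(SUPERVISION, BORG_SSN) == RESULT)
```

In this sample the hierarchy is only two levels deep below the top, so the loop finds the same employees as RESULT; with a deeper hierarchy it would keep going.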
all those in S, or all those in both relations in the result of the JOIN, regardless of whether or
not they have matching tuples in the other relation. Some queries require all tuples in R1 (or R2
or both) to appear in the result. When no matching tuples are found, nulls are placed for the
missing attributes.
For example, suppose that we want a list of all employee names as well as the name of the
departments they manage if they happen to manage a department; if they do not manage one,
we can indicate it with a NULL value. We can apply an operation LEFT OUTER JOIN,
denoted by ⟕, to retrieve the result as follows:
TEMP ← (EMPLOYEE ⟕_{Ssn = Mgr_ssn} DEPARTMENT)
RESULT ← π_{Fname, Minit, Lname, Dname}(TEMP)
The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R in R ⟕ S; if
no matching tuple is found in S, then the attributes of S in the join result are filled or padded
with NULL values. The result of these operations is shown in Figure 6.12.
A similar operation, RIGHT OUTER JOIN, denoted by ⟖, keeps every tuple in the second, or
right, relation S in the result of R ⟖ S. A third operation, FULL OUTER JOIN, denoted by ⟗,
keeps all tuples in both the left and the right relations when no matching tuples are found,
padding them with NULL values as needed.
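The padding behaviour of LEFT OUTER JOIN is sketched below. Relations are lists of dictionaries, None stands in for NULL, and the helper name left_outer_join, its right_attrs parameter, and the sample rows are all assumptions made for illustration.

```python
# Illustrative sample rows.
EMPLOYEE = [
    {"Lname": "Smith", "Ssn": "123456789"},
    {"Lname": "Wong", "Ssn": "333445555"},
    {"Lname": "Borg", "Ssn": "888665555"},
]
DEPARTMENT = [
    {"Dname": "Research", "Mgr_ssn": "333445555"},
    {"Dname": "Headquarters", "Mgr_ssn": "888665555"},
]


def left_outer_join(left, right, condition, right_attrs):
    """Keep every left tuple; pad with None where no right tuple matches."""
    result = []
    for l in left:
        matches = [r for r in right if condition(l, r)]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            padding = {a: None for a in right_attrs}
            result.append({**l, **padding})
    return result


TEMP = left_outer_join(
    EMPLOYEE,
    DEPARTMENT,
    lambda e, d: e["Ssn"] == d["Mgr_ssn"],
    right_attrs=["Dname", "Mgr_ssn"],
)
RESULT = [(t["Lname"], t["Dname"]) for t in TEMP]
print(RESULT)
```

Smith manages no department, so an ordinary JOIN would drop that tuple; here it is kept with Dname set to None. A RIGHT OUTER JOIN is the same operation with the operands swapped.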
Department, Rank). Tuples from the two relations are matched based on having the same
combination of values of the shared attributes (Name, Ssn, Department). The resulting
relation, STUDENT_OR_INSTRUCTOR, will have the following attributes:
Query 1. Retrieve the name and address of all employees who work for the ‘Research’
department.
RESEARCH_DEPT ← σ_{Dname = ‘Research’}(DEPARTMENT)
RESEARCH_EMPS ← (RESEARCH_DEPT ⋈_{Dnumber = Dno} EMPLOYEE)
RESULT ← π_{Fname, Lname, Address}(RESEARCH_EMPS)
As a single in-line expression, this query becomes:
π_{Fname, Lname, Address}(σ_{Dname = ‘Research’}(DEPARTMENT ⋈_{Dnumber = Dno} EMPLOYEE))
Query 2. For every project located in ‘Stafford’, list the project number, the controlling
department number, and the department manager’s last name, address, and birth date.
Query 3. Find the names of employees who work on all the projects controlled by
department number 5.
Query 4. Make a list of project numbers for projects that involve an employee whose
last name is ‘Smith’, either as a worker or as a manager of the department that controls
the project.
We use the COMPANY database example to illustrate the mapping procedure. The COMPANY
ER schema is shown again in Figure 9.1, and the corresponding COMPANY relational database
schema is shown in Figure 9.2 to illustrate the mapping steps.
SQL
The name SQL stands for Structured Query Language.
The SQL language may be considered as one of the major reasons for the success of relational
databases in the commercial world.
SQL is a comprehensive database language because:
It has statements for data definition, database construction, and database manipulation.
It does automatic query optimization.
It has facilities for defining views on the database.
It has facilities for specifying security and authorization.
It has facilities for defining integrity constraints.
It has facilities for specifying transaction controls.
It also has rules for embedding SQL statements into a general-purpose programming language
such as Java, COBOL, or C/C++.
The SQL command for data definition is the CREATE statement, which can be used to
create schemas, tables (relations), and domains as well as other constructs such as views,
assertions, and triggers.
Schema elements include tables, constraints, views, domains, and other constructs (such as
authorization grants) that describe the schema.
A schema is created via the CREATE SCHEMA statement, which can include all the
schema elements' definitions.
The schema can be assigned a name and authorization identifier, and the elements can be
defined later.
For example, the following statement creates a schema called COMPANY, owned by the
user with authorization identifier SMITH:
CREATE SCHEMA COMPANY AUTHORIZATION SMITH;
In general, not all users are authorized to create schemas and schema elements. The
privilege to create schemas, tables, and other constructs must be explicitly granted to the
relevant user accounts by the system administrator or DBA.
Integrity constraints such as referential integrity can be defined between relations only if
they exist in schemas within the same catalog.
The attributes are specified first, and each attribute is given a name, a data type to specify its
domain of values, and any attribute constraints, such as NOT NULL.
The key, entity integrity, and referential integrity constraints can be specified within the
CREATE TABLE statement after the attributes are declared, or they can be added later using
the ALTER TABLE command.
We can explicitly attach the schema name to the relation name, separated by a period. For
example, we can write
CREATE TABLE COMPANY.EMPLOYEE ...
rather than
CREATE TABLE EMPLOYEE ...
The relations declared through CREATE TABLE statements are called “base tables” or base
relations; this means that the relation and its rows are actually created and stored as a file by
the DBMS.
Base relations are distinguished from “virtual relations”, created through the CREATE
VIEW statement, which may or may not correspond to an actual physical file.
In SQL the attributes in a base table are considered to be ordered in the sequence in which
they are specified in the CREATE TABLE statement. However, rows are not considered to
be ordered within a table.
Figure 8.1 shows sample data definition statements in SQL for the COMPANY database.
When specifying a literal string value, it is placed between single quotation marks
(apostrophes), and it is case sensitive (a distinction is made between uppercase and
lowercase).
For fixed-length strings, a shorter string is padded with blank characters to the right.
For example, if the value 'Smith' is for an attribute of type CHAR(10), it is padded with five
blank characters to become 'Smith     ' if needed.
Padded blanks are generally ignored when strings are compared. For comparison purposes,
strings are considered ordered in alphabetic order: if a string str1 appears before another
string str2 in alphabetic order, then str1 is considered to be less than str2.
There is also a concatenation operator denoted by || (double vertical bar) that can
concatenate two strings in SQL.
c) Bit-string data types are either of fixed length, BIT(n), or of varying length, BIT VARYING(n),
where n is the maximum number of bits.
The default for n, the length of a character string or bit string, is 1.
Literal bit strings are placed between single quotes but preceded by a B to distinguish them
from character strings; for example, B'10101'.
d) A Boolean data type has the traditional values of TRUE or FALSE in SQL.
Because of the presence of NULL values, a three-valued logic is used, so a third possible
value for a boolean data type is UNKNOWN.
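The effect of three-valued logic is easy to observe in a running system. The sketch below uses Python's built-in sqlite3 module purely as a convenient test bed. SQLite is an assumption here, not the DBMS these notes are based on: it reports TRUE and FALSE as 1 and 0, and UNKNOWN comes back to Python as None. The table T and the helper evaluate are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate one SQL expression and return its single value."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


# Comparing NULL with anything, even NULL, yields UNKNOWN (None in Python).
null_equals_null = evaluate("(NULL = NULL)")

# UNKNOWN combined with a definite value can still be decided:
unknown_and_false = evaluate("(NULL = NULL) AND (1 = 0)")  # FALSE
unknown_or_true = evaluate("(NULL = NULL) OR (1 = 1)")     # TRUE
not_unknown = evaluate("NOT (NULL = NULL)")                # still UNKNOWN

# A WHERE clause keeps only the rows whose condition is TRUE,
# so rows for which it is UNKNOWN are dropped.
conn.execute("CREATE TABLE T (X INT)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (None,)])
rows_equal_null = evaluate("COUNT(*) FROM T WHERE X = NULL")
rows_is_null = evaluate("COUNT(*) FROM T WHERE X IS NULL")

print(null_equals_null, unknown_and_false, unknown_or_true, not_unknown)
print(rows_equal_null, rows_is_null)
```

The last two lines show why null values are tested with IS NULL rather than with the = operator.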
e) New data types for date and time were added in SQL2.
The DATE data type has ten positions, and its components are YEAR, MONTH, and DAY in
the form YYYY-MM-DD.
The TIME data type has at least eight positions, with the components HOUR, MINUTE, and
SECOND in the form HH:MM:SS.
The < (less than) comparison can be used with dates or times: an earlier date is considered
to be smaller than a later date, and similarly with time.
Literal values are represented by single-quoted strings preceded by the keyword DATE or
TIME; for example, DATE '2002-09-27' or TIME '09:12:47'.
f) A timestamp data type (TIMESTAMP) includes both the DATE and TIME fields, plus a
minimum of six positions for decimal fractions of seconds.
g) Another data type related to DATE, TIME, and TIMESTAMP is the INTERVAL data type.
This specifies an interval, a “relative value” that can be used to increment or decrement an
absolute value of a date, time, or timestamp.
Domain
It is possible to specify the data type of each attribute directly, as in Figure 8.1;
alternatively, a domain can be declared, and the domain name can be used with the attribute
specification. For example, we can create a domain SSN_TYPE by the following statement:
CREATE DOMAIN SSN_TYPE AS CHAR(9);
We can use SSN_TYPE in place of CHAR(9) in Figure 8.1 for the attributes SSN and SUPERSSN of
EMPLOYEE, MGRSSN of DEPARTMENT, ESSN of WORKS_ON, and ESSN of DEPENDENT.
NOT NULL is always implicitly specified for the attributes that are part of the primary key
of each relation, but it can be specified for any other attributes whose values are required not
to be NULL, as shown in Figure 8.1.
It is also possible to define a default value for an attribute by appending the clause
DEFAULT <value> to an attribute definition.
The default value is included in any new tuple if an explicit value is not provided for that
attribute. Figure 8.2 illustrates examples of specifying default values for various attributes.
If no default clause is specified, the default value is NULL for attributes that do not have
the NOT NULL constraint.
Another type of constraint can restrict attribute or domain values using the CHECK clause
following an attribute or domain definition.
For example, suppose that department numbers are restricted to integer numbers between 1
and 20; then, we can change the attribute declaration of DNUMBER in the DEPARTMENT
table (see Figure 8.1) to the following:
DNUMBER INT NOT NULL CHECK (DNUMBER > 0 AND DNUMBER < 21);
The CHECK clause can also be used in conjunction with the CREATE DOMAIN statement.
CREATE DOMAIN D_NUM AS INTEGER CHECK (D_NUM > 0 AND D_NUM < 21);
We can then use the created domain D_NUM as the attribute type for all attributes that refer to
department numbers in Figure 8.1, such as DNUMBER of DEPARTMENT, DNUM of PROJECT,
DNO of EMPLOYEE, and so on.
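To see a CHECK clause reject a value, the DNUMBER declaration above can be tried in SQLite through Python's sqlite3 module. This is a sketch under stated assumptions: SQLite is only a stand-in DBMS, it does not support CREATE DOMAIN (so only the attribute-level form is shown), and the cut-down DEPARTMENT table and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down DEPARTMENT table carrying the attribute-level CHECK from the text.
conn.execute(
    """
    CREATE TABLE DEPARTMENT (
        DNAME   VARCHAR(15) NOT NULL,
        DNUMBER INT NOT NULL CHECK (DNUMBER > 0 AND DNUMBER < 21),
        PRIMARY KEY (DNUMBER)
    )
    """
)

# A department number inside the allowed range is accepted.
conn.execute("INSERT INTO DEPARTMENT VALUES ('Research', 5)")

# A department number outside the range violates the CHECK constraint.
try:
    conn.execute("INSERT INTO DEPARTMENT VALUES ('Overflow', 21)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM DEPARTMENT").fetchone()[0]
print(rejected, stored)
```

The second insertion is refused as a whole, so only the first row is stored.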
If a primary key has a single attribute, the clause can follow the attribute directly. For
example, the primary key of DEPARTMENT can be specified as follows:
DNUMBER INT PRIMARY KEY;
A referential integrity constraint is violated when rows are inserted or deleted, or when a
foreign key or primary key attribute value is modified.
The default action that SQL takes for an integrity violation is to reject the update operation
that will cause a violation.
However, the schema designer can specify an alternative action to be taken if a referential
integrity constraint is violated, by attaching a referential triggered action clause to any
foreign key constraint.
The options include SET NULL, CASCADE, and SET DEFAULT. An option must be
qualified with either ON DELETE or ON UPDATE.
We illustrate this with the examples shown in Figure 8.2. Here, the database designer chooses
SET NULL ON DELETE and CASCADE ON UPDATE for the foreign key SUPERSSN of
EMPLOYEE (Figure 8.3).
This means that if the row for a supervising employee is deleted, the value of SUPERSSN is
automatically set to NULL for all employee rows that were referencing the deleted employee tuple.
On the other hand, if the SSN value for a supervising employee is updated (say, because it was
entered incorrectly), the new value is cascaded to SUPERSSN for all employee tuples referencing the
updated employee tuple.
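The behaviour just described for SUPERSSN can be reproduced with a minimal sketch in Python's sqlite3 module. The three-column EMPLOYEE table, the constraint name EMPSUPERFK, and the shortened SSN values are assumptions for the example; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys without this
conn.execute("""
    CREATE TABLE EMPLOYEE (
        SSN      CHAR(9) PRIMARY KEY,
        LNAME    VARCHAR(15) NOT NULL,
        SUPERSSN CHAR(9),
        CONSTRAINT EMPSUPERFK FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE (SSN)
            ON DELETE SET NULL ON UPDATE CASCADE
    )
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("111", "Borg", None),    # top-level supervisor
    ("222", "Wong", "111"),   # supervised by Borg
    ("333", "Smith", "222"),  # supervised by Wong
])

# ON UPDATE CASCADE: correcting Wong's SSN also rewrites Smith's SUPERSSN.
conn.execute("UPDATE EMPLOYEE SET SSN = '999' WHERE SSN = '222'")
after_update = conn.execute(
    "SELECT SUPERSSN FROM EMPLOYEE WHERE LNAME = 'Smith'").fetchone()[0]

# ON DELETE SET NULL: deleting Borg sets Wong's SUPERSSN to NULL.
conn.execute("DELETE FROM EMPLOYEE WHERE SSN = '111'")
after_delete = conn.execute(
    "SELECT SUPERSSN FROM EMPLOYEE WHERE LNAME = 'Wong'").fetchone()[0]

print(after_update, after_delete)  # 999 None
```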
Figure 8.3: One possible database state for the COMPANY database
A constraint name is used to identify a particular constraint in case the constraint must be
dropped later and replaced with another constraint. Giving names to constraints is optional.
CHECK clauses specified at the end of a CREATE TABLE statement can be called tuple-based
constraints because they apply to each tuple individually and are checked whenever a tuple is
inserted or modified.
For example, suppose that the DEPARTMENT table in Figure 8.1 had an additional attribute
DEPT_CREATE_DATE, which stores the date when the department was created. Then we
could add the following CHECK clause at the end of the CREATE TABLE statement for the
DEPARTMENT table to make sure that a manager's start date is greater than the department
creation date:
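The basic form of the SELECT statement is
SELECT <attribute list> FROM <table list> WHERE <condition>;
where
<attribute list> is a list of attribute names whose values are to be retrieved by the query.
<table list> is a list of the relation names required to process the query.
<condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.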
In SQL, the basic logical comparison operators for comparing attribute values with one
another and with literal constants are =, <, <=, >, >=, and <>.
These correspond to the relational algebra operators =, <, ≤, >, ≥, and ≠, respectively, and to
the C/C++ programming language operators ==, <, <=, >, >=, and !=.
Examples:
This query involves only the EMPLOYEE relation listed in the FROM clause. The query selects the
EMPLOYEE tuples that satisfy the condition of the WHERE clause, then projects the result on the
BDATE and ADDRESS attributes listed in the SELECT clause.
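A select-project query of this kind can be sketched as follows with Python's sqlite3 module. The two sample tuples and the reduced attribute list are assumptions for the example; the second query simply exercises the >= and <> comparison operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, BDATE TEXT, ADDRESS TEXT, SALARY INT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", [
    ("John", "Smith", "1965-01-09", "731 Fondren, Houston, TX", 30000),
    ("Franklin", "Wong", "1955-12-08", "638 Voss, Houston, TX", 40000),
])

# WHERE selects the matching tuples; SELECT projects them on BDATE and ADDRESS.
q1_rows = conn.execute("""
    SELECT BDATE, ADDRESS
    FROM   EMPLOYEE
    WHERE  FNAME = 'John' AND LNAME = 'Smith'
""").fetchall()
print(q1_rows)  # [('1965-01-09', '731 Fondren, Houston, TX')]

# Comparison operators >= and <> in a selection condition.
high_paid = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE SALARY >= 40000 AND LNAME <> 'Smith'").fetchall()
print(high_paid)  # [('Wong',)]
```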
In SQL, the same name can be used for two or more attributes as long as the attributes are in
different relations. If this is the case, and if a query refers to two or more attributes with
the same name, we must qualify the attribute name with the relation name to prevent ambiguity.
This is done by prefixing the relation name to the attribute name and separating the two by
a period.
To illustrate this, suppose that the DNO and LNAME attributes of the EMPLOYEE relation were
called DNUMBER and NAME, and the DNAME attribute of DEPARTMENT was also called
NAME; then, to prevent ambiguity, query Q1 would be rephrased as shown in Q1A.
Ambiguity also arises in the case of queries that refer to the same relation twice, as in the
following example:
In this case, we are allowed to declare alternative relation names E and S, called “aliases” or
“tuple variables”, for the EMPLOYEE relation.
An alias can follow the keyword AS, as shown in Q8, or it can directly follow the relation
name, for example, by writing EMPLOYEE E, EMPLOYEE S in the FROM clause of Q8.
It is also possible to rename the relation attributes within the query in SQL by giving them
aliases.
For example, if we write EMPLOYEE AS E(FN, MI, LN, SSN, SD, ADDR, SEX, SAL,
SSSN, DNO) in the FROM clause, FN becomes an alias for FNAME, MI for MINIT, LN for
LNAME, and so on.
In Q8, we can think of E and S as two different copies of the EMPLOYEE relation.
Whenever one or more aliases are given to a relation, we can use these names to represent different
references to that relation. This permits multiple references to the same relation within a query. We
could specify query Q1A as in Q1B:
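The sketch below, written with Python's sqlite3 module, illustrates qualified names and aliases in the spirit of the queries discussed above; it is not a reproduction of the original Q1A, Q8, or Q1B listings. The table layout (both relations having NAME and DNUMBER attributes) follows the supposition in the text, while the sample tuples and shortened SSN values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (NAME TEXT, SSN TEXT, SUPERSSN TEXT, DNUMBER INT)")
conn.execute("CREATE TABLE DEPARTMENT (NAME TEXT, DNUMBER INT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("Borg", "1", None, 1),
    ("Wong", "2", "1", 5),
    ("Smith", "3", "2", 5),
])
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                 [("Research", 5), ("Headquarters", 1)])

# An unqualified NAME is ambiguous because both relations have that attribute.
try:
    conn.execute("SELECT NAME FROM EMPLOYEE, DEPARTMENT")
    ambiguous_rejected = False
except sqlite3.OperationalError:
    ambiguous_rejected = True

# Aliases E and D shorten the qualified attribute names.
research = conn.execute("""
    SELECT E.NAME
    FROM   EMPLOYEE AS E, DEPARTMENT AS D
    WHERE  D.NAME = 'Research' AND D.DNUMBER = E.DNUMBER
    ORDER  BY E.NAME
""").fetchall()

# The same relation referenced twice: E plays the supervisee, S the supervisor.
supervisors = conn.execute("""
    SELECT E.NAME, S.NAME
    FROM   EMPLOYEE AS E, EMPLOYEE AS S
    WHERE  E.SUPERSSN = S.SSN
    ORDER  BY E.NAME
""").fetchall()

print(ambiguous_rejected)  # True
print(research)            # [('Smith',), ('Wong',)]
print(supervisors)         # [('Smith', 'Wong'), ('Wong', 'Borg')]
```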
If more than one relation is specified in the FROM clause and there is no WHERE clause,
then the CROSS PRODUCT (all possible tuple combinations) of these relations is selected.
For example, Query 9 selects all EMPLOYEE SSNs (Figure 8.3e), and Query 10 selects all
combinations of an EMPLOYEE SSN and a DEPARTMENT DNAME (Figure 8.3f).
It is extremely important to specify every selection and join condition in the WHERE
clause; if any such condition is overlooked then incorrect and very large relations may result.
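The size difference between a cross product and a proper join shows up even on tiny tables. This sketch uses Python's sqlite3 module with three hypothetical employees and two hypothetical departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT, LNAME TEXT, DNO INT)")
conn.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("123456789", "Smith", 5),
    ("333445555", "Wong", 5),
    ("888665555", "Borg", 1),
])
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                 [("Research", 5), ("Headquarters", 1)])

# No WHERE clause: every EMPLOYEE tuple is paired with every DEPARTMENT tuple.
cross = conn.execute("SELECT SSN, DNAME FROM EMPLOYEE, DEPARTMENT").fetchall()

# With the join condition, only the matching pairs remain.
joined = conn.execute(
    "SELECT SSN, DNAME FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUMBER").fetchall()

print(len(cross), len(joined))  # 6 3
```

With m employees and n departments the cross product has m × n tuples, which is why an overlooked join condition can produce a very large result.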
To retrieve all the attribute values of the selected tuples, we do not have to list the attribute
names explicitly in SQL; we just specify an asterisk (*), which stands for all the attributes.
Query Q1C retrieves all the attribute values of any EMPLOYEE who works in
DEPARTMENT number 5 (Figure 8.3g).
Query Q1D retrieves all the attributes of an EMPLOYEE and the attributes of the
DEPARTMENT in which he or she works, for every employee of the 'Research' department.
Query Q10A specifies the CROSS PRODUCT of the EMPLOYEE and DEPARTMENT
relations.
SQL treats a table as a multiset rather than a set, so the same tuple can appear more than
once in a table and in the result of a query.
SQL does not automatically eliminate duplicate tuples in the results of queries, for the
following reasons:
Duplicate elimination is an expensive operation. One way to implement it is to sort the tuples
first and then eliminate duplicates.
The user may want to see duplicate tuples in the result of a query.
If we do want to eliminate duplicate tuples from the result of an SQL query, we use the
keyword DISTINCT in the SELECT clause, meaning that only distinct tuples should remain
in the result.
In general, a query with SELECT DISTINCT eliminates duplicates, whereas a query with
SELECT ALL does not.
Specifying SELECT with neither ALL nor DISTINCT, as in our previous examples, is
equivalent to SELECT ALL.
Query 11 retrieves the salary of every employee; if several employees have the same
salary, that salary value will appear as many times in the result of the query, as shown in
Figure 8.4(a).
By using the keyword DISTINCT as in Q11A, we get only the distinct salary values, as
shown in Figure 8.4(b).
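The difference between SELECT ALL and SELECT DISTINCT can be checked with a short sketch in Python's sqlite3 module. The salary values are made up for the example and do not come from Figure 8.4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [
    ("Smith", 30000), ("Wong", 40000), ("Zelaya", 25000),
    ("Narayan", 25000), ("Wallace", 43000),
])

# SELECT ALL (the default) keeps one result tuple per EMPLOYEE tuple.
all_salaries = [row[0] for row in conn.execute("SELECT ALL SALARY FROM EMPLOYEE")]

# SELECT DISTINCT removes the repeated 25000.
distinct_salaries = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT SALARY FROM EMPLOYEE"))

print(len(all_salaries))   # 5
print(distinct_salaries)   # [25000, 30000, 40000, 43000]
```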
SQL has directly incorporated some of the set operations of relational algebra.
The relations resulting from these set operations are sets of tuples; that is, duplicate tuples are
eliminated from the result.
Because these set operations apply only to union-compatible relations, we must make sure
that the two relations on which we apply the operation have the same attributes and that the
attributes appear in the same order in both relations.
The first SELECT query retrieves the projects that involve a 'Smith' as manager of the
department that controls the project.
The second SELECT retrieves the projects that involve a 'Smith' as a worker on the project.
Applying the UNION operation to the two SELECT queries gives the desired result.
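A query of this shape can be sketched with Python's sqlite3 module as follows. The four tables are cut down to the attributes the query needs, and the sample tuples are assumptions; UNION ALL is run as well to show that plain UNION removes the duplicate project number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT, LNAME TEXT)")
conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INT, MGRSSN TEXT)")
conn.execute("CREATE TABLE PROJECT (PNUMBER INT, DNUM INT)")
conn.execute("CREATE TABLE WORKS_ON (ESSN TEXT, PNO INT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [("1", "Smith"), ("2", "Wong")])
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)", [(5, "1"), (4, "2")])
conn.executemany("INSERT INTO PROJECT VALUES (?, ?)", [(10, 5), (20, 4), (30, 4)])
conn.executemany("INSERT INTO WORKS_ON VALUES (?, ?)", [("1", 10), ("1", 20), ("2", 30)])

# Projects controlled by a department that a 'Smith' manages.
as_manager = """
    SELECT PNUMBER FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE  DNUM = DNUMBER AND MGRSSN = SSN AND LNAME = 'Smith'
"""
# Projects on which a 'Smith' works.
as_worker = """
    SELECT PNUMBER FROM PROJECT, WORKS_ON, EMPLOYEE
    WHERE  PNUMBER = PNO AND ESSN = SSN AND LNAME = 'Smith'
"""

union_rows = conn.execute(
    f"{as_manager} UNION {as_worker} ORDER BY PNUMBER").fetchall()
union_all_rows = conn.execute(
    f"{as_manager} UNION ALL {as_worker} ORDER BY PNUMBER").fetchall()

print(union_rows)      # [(10,), (20,)]
print(union_all_rows)  # [(10,), (10,), (20,)]
```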
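Comparison conditions on only parts of a character string can be specified using the LIKE
comparison operator.
Partial strings are specified using two reserved characters: % replaces an arbitrary number of zero or
more characters, and the underscore (_) replaces a single character.
To retrieve all employees who were born during the 1950s, we can use Query 12A. Here,
'5' must be the third character of the string (according to our format for date), so we use the
value '__5_______', with each underscore serving as a placeholder for an arbitrary character.
If an underscore or % is needed as a literal character in the string, the character should be
preceded by an escape character, which is specified after the string using the keyword ESCAPE.
For example, 'AB\_CD\%EF' ESCAPE '\' represents the literal string 'AB_CD%EF', because \ is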
specified as the escape character. Any character not used in the string can be chosen as the escape
character.
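The pattern-matching rules above can be exercised with Python's sqlite3 module. The sketch assumes dates stored as ten-character 'YYYY-MM-DD' strings and uses made-up employee tuples; raw Python strings (r"...") keep the backslash escape character intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, BDATE TEXT, ADDRESS TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("Smith", "1965-01-09", "731 Fondren, Houston, TX"),
    ("Wong", "1955-12-08", "638 Voss, Houston, TX"),
    ("Zelaya", "1968-07-19", "3321 Castle, Spring, TX"),
])

# % matches zero or more characters on either side of the substring.
in_houston = conn.execute("""
    SELECT LNAME FROM EMPLOYEE
    WHERE  ADDRESS LIKE '%Houston, TX%' ORDER BY LNAME
""").fetchall()

# Each underscore matches exactly one character; '5' must be the third one.
born_1950s = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE BDATE LIKE '__5_______'").fetchall()

# With ESCAPE '\', \_ and \% stand for the literal characters _ and %.
literal_match = conn.execute(
    r"SELECT 'AB_CD%EF' LIKE 'AB\_CD\%EF' ESCAPE '\'").fetchone()[0]
wildcard_miss = conn.execute(
    r"SELECT 'ABxCDyyEF' LIKE 'AB\_CD\%EF' ESCAPE '\'").fetchone()[0]

print(in_houston)                    # [('Smith',), ('Wong',)]
print(born_1950s)                    # [('Wong',)]
print(literal_match, wildcard_miss)  # 1 0
```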
The standard arithmetic operators for addition (+), subtraction (-), multiplication (*), and
division (/) can be applied to numeric values or attributes with numeric domains.
For string data types, the concatenate operator ‘||’ can be used in a query to append two
string values.
For date, time, timestamp, and interval data types, operators include incrementing (+) or
decrementing (-) a date, time, or timestamp by an interval.
Another comparison operator that can be used for convenience is BETWEEN, which is
illustrated in Query 14.
We can specify the keyword DESC if we want to see the result in a descending order of
values.
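The arithmetic, concatenation, BETWEEN, and ORDER BY ... DESC features above can be combined in one small sketch using Python's sqlite3 module. The employee tuples are hypothetical, and the raise is written as SALARY + SALARY / 10 (integer arithmetic) rather than 1.1 * SALARY purely so that the printed results are exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SALARY INT, DNO INT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("John", "Smith", 30000, 5),
    ("Franklin", "Wong", 40000, 5),
    ("Alicia", "Zelaya", 25000, 4),
    ("Jennifer", "Wallace", 43000, 4),
])

# BETWEEN is inclusive at both ends; || concatenates; DESC sorts high to low.
between_rows = conn.execute("""
    SELECT FNAME || ' ' || LNAME, SALARY
    FROM   EMPLOYEE
    WHERE  (SALARY BETWEEN 30000 AND 40000) AND DNO = 5
    ORDER  BY SALARY DESC
""").fetchall()

# Arithmetic in the SELECT clause: salaries after a 10 percent raise.
raised_rows = conn.execute("""
    SELECT LNAME, SALARY + SALARY / 10
    FROM   EMPLOYEE
    WHERE  DNO = 4
    ORDER  BY LNAME
""").fetchall()

print(between_rows)  # [('Franklin Wong', 40000), ('John Smith', 30000)]
print(raised_rows)   # [('Wallace', 47300), ('Zelaya', 27500)]
```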
The values should be listed in the same order in which the corresponding attributes were
specified in the CREATE TABLE command.
For example, to add a new tuple to the EMPLOYEE relation, we can use U1:
A second form of the INSERT statement allows the user to specify explicit attribute names
that correspond to the values provided in the INSERT command.
This is useful if a relation has many attributes but only a few of those attributes are to be
assigned values in the new tuple.
However, the values must include all attributes with NOT NULL specification and no default
value.
Attributes with NULL allowed or DEFAULT values are the ones that can be left out.
For example, to enter a tuple for a new EMPLOYEE for whom we know only the
It is also possible to insert into a relation multiple tuples separated by commas in a single
INSERT command. The attribute values forming each tuple are enclosed in parentheses.
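The three forms of INSERT described above can be compared side by side in a sketch using Python's sqlite3 module. The reduced EMPLOYEE schema, the default department number, and all sample values are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        FNAME  TEXT NOT NULL,
        LNAME  TEXT NOT NULL,
        SSN    TEXT NOT NULL PRIMARY KEY,
        SALARY INT,
        DNO    INT  NOT NULL DEFAULT 1
    )
""")

# Form 1: values listed in the order of the CREATE TABLE attributes.
conn.execute("INSERT INTO EMPLOYEE VALUES ('Richard', 'Marini', '653298653', 37000, 4)")

# Form 2: explicit attribute names; SALARY becomes NULL and DNO takes its default.
conn.execute(
    "INSERT INTO EMPLOYEE (FNAME, LNAME, SSN) VALUES ('Robert', 'Hatcher', '980760540')")

# Form 3: several tuples, each in parentheses, in a single INSERT command.
conn.execute("""
    INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO)
    VALUES ('Alicia', 'Zelaya', '999887777', 4),
           ('Ahmad', 'Jabbar', '987987987', 4)
""")

total_rows = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
hatcher = conn.execute(
    "SELECT SALARY, DNO FROM EMPLOYEE WHERE LNAME = 'Hatcher'").fetchone()
print(total_rows, hatcher)  # 4 (None, 1)
```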
A DBMS that fully implements SQL-99 should support and enforce all the integrity
constraints that can be specified in the DDL. However, some DBMSs do not incorporate all
the constraints (like referential integrity), in order to maintain the efficiency of the DBMS and
because of the complexity of enforcing all constraints.
If a system does not support some constraint, the users or programmers must enforce the
constraint.
For example, if we issue the command in U2 on the database shown in Table 5.1, a DBMS
not supporting referential integrity will do the insertion even though no DEPARTMENT
tuple exists in the database with DNUMBER = 2.
It is the responsibility of the user to check that any such constraints whose checks are not
implemented by the DBMS are not violated.
A variation of the INSERT command inserts multiple tuples into a relation in
conjunction with creating the relation and loading it with the result of a query.
For example, to create a temporary table DEPTS_INFO that has the name, number of
employees, and total salaries for each department, we can write the statements in U3A and
U3B:
We can now query DEPTS_INFO as we would any other relation; when we do not need it any
more, we can remove it by using the DROP TABLE command.
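The create-then-load pattern can be sketched with Python's sqlite3 module as follows. The attribute names DEPT_NAME, NO_OF_EMPS, and TOTAL_SAL and the sample data are assumptions made for the example rather than a reproduction of U3A and U3B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INT)")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INT, DNO INT)")
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                 [("Research", 5), ("Administration", 4)])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4)])

# Step 1: create the (initially empty) summary table.
conn.execute(
    "CREATE TABLE DEPTS_INFO (DEPT_NAME TEXT, NO_OF_EMPS INT, TOTAL_SAL INT)")

# Step 2: load it with the result of a query, one tuple per department.
conn.execute("""
    INSERT INTO DEPTS_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
    SELECT DNAME, COUNT(*), SUM(SALARY)
    FROM   DEPARTMENT, EMPLOYEE
    WHERE  DNUMBER = DNO
    GROUP  BY DNAME
""")
depts_info = conn.execute(
    "SELECT * FROM DEPTS_INFO ORDER BY DEPT_NAME").fetchall()
print(depts_info)  # [('Administration', 1, 25000), ('Research', 2, 70000)]

# Step 3: remove the table once it is no longer needed.
conn.execute("DROP TABLE DEPTS_INFO")
still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'DEPTS_INFO'").fetchone()[0]
print(still_exists)  # 0
```

Note that DEPTS_INFO is a snapshot: later changes to EMPLOYEE or DEPARTMENT are not reflected in it unless it is reloaded.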
The DELETE command removes tuples from a relation. It includes a WHERE clause to select
the tuples to be deleted. Tuples are explicitly deleted from only one table at a time.
However, the deletion may propagate to tuples in other relations if referential triggered
actions are specified in the referential integrity constraints of the DDL.
Depending on the number of tuples selected by the condition in the WHERE clause, zero,
one, or several tuples can be deleted by a single DELETE command.
A missing WHERE clause specifies that all tuples in the relation are to be deleted; however,
the table remains in the database as an empty table.
The DELETE commands in U4A to U4D, if applied independently to the database of Table
5.1, will delete zero, one, four, and all tuples, respectively, from the EMPLOYEE relation:
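Four DELETE commands with this behaviour can be sketched with Python's sqlite3 module. Each statement runs against a fresh copy of a small hypothetical EMPLOYEE table (five tuples, four of them in department 5), so the counts differ from those for Table 5.1 only in the last case; the cursor's rowcount attribute reports how many tuples were deleted.

```python
import sqlite3

def fresh_db():
    """Build a new in-memory EMPLOYEE table so each DELETE runs independently."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT, DNO INT)")
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
        ("Smith", "123456789", 5),
        ("Wong", "333445555", 5),
        ("Narayan", "666884444", 5),
        ("English", "453453453", 5),
        ("Borg", "888665555", 1),
    ])
    return conn

statements = [
    "DELETE FROM EMPLOYEE WHERE LNAME = 'Brown'",    # no tuple matches
    "DELETE FROM EMPLOYEE WHERE SSN = '123456789'",  # exactly one tuple matches
    "DELETE FROM EMPLOYEE WHERE DNO = 5",            # four tuples match
    "DELETE FROM EMPLOYEE",                          # no WHERE clause: all tuples
]
deleted_counts = [fresh_db().execute(sql).rowcount for sql in statements]
print(deleted_counts)  # [0, 1, 4, 5]

# After deleting every tuple, the table itself still exists and can be queried.
conn = fresh_db()
conn.execute("DELETE FROM EMPLOYEE")
remaining = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(remaining)  # 0
```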
The UPDATE command is used to modify attribute values of one or more selected
tuples.
As in the DELETE command, a WHERE clause in the UPDATE command selects the
tuples to be modified from a single relation.
However, updating a primary key value may propagate to the foreign key values of
tuples in other relations if such a referential triggered action is specified in the
referential integrity constraints of the DDL.
For example, to change the location and controlling department number of project
number 10 to 'Bellaire' and 5, respectively, we use U5:
Several tuples can also be modified with a single UPDATE command. An example is to give
all employees in the 'Research' department a 10 percent raise in salary.
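Both kinds of update can be sketched with Python's sqlite3 module. The tables and sample tuples are assumptions for the example, and the raise is written as SALARY * 11 / 10 so that the arithmetic stays in integers; a real schema with a DECIMAL salary could multiply by 1.1 directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROJECT (PNUMBER INT, PLOCATION TEXT, DNUM INT)")
conn.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INT)")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INT, DNO INT)")
conn.execute("INSERT INTO PROJECT VALUES (10, 'Stafford', 4)")
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                 [("Research", 5), ("Administration", 4)])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4)])

# Modify two attributes of the single tuple for project number 10.
conn.execute("UPDATE PROJECT SET PLOCATION = 'Bellaire', DNUM = 5 WHERE PNUMBER = 10")

# Modify several tuples at once: a 10 percent raise for the 'Research' department.
conn.execute("""
    UPDATE EMPLOYEE
    SET    SALARY = SALARY * 11 / 10
    WHERE  DNO IN (SELECT DNUMBER FROM DEPARTMENT WHERE DNAME = 'Research')
""")

project_row = conn.execute(
    "SELECT PLOCATION, DNUM FROM PROJECT WHERE PNUMBER = 10").fetchone()
salaries = dict(conn.execute("SELECT LNAME, SALARY FROM EMPLOYEE"))
print(project_row)  # ('Bellaire', 5)
print(salaries)     # {'Smith': 33000, 'Wong': 44000, 'Zelaya': 25000}
```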
SQL has the capability to specify more general constraints, called assertions, using the
CREATE ASSERTION statement.
SQL has language constructs for specifying views, also known as virtual tables, using
the CREATE VIEW statement. Views are derived from the base tables declared
through the CREATE TABLE statement.
SQL has several different techniques for writing programs in various programming
languages that can include SQL statements to access one or more databases. These
include embedded SQL, dynamic SQL, SQL/CLI (Call Level Interface) and its
predecessor ODBC (Open Database Connectivity), and SQL/PSM (Persistent Stored
Modules).
Each commercial RDBMS will have, in addition to the SQL commands, a set of
commands for specifying physical database design parameters, file structures for
relations, and access paths such as indexes. We call these commands a storage
definition language (SDL).
SQL has transaction control commands. These are used to specify units of database
processing for concurrency control and recovery purposes.
SQL has language constructs for specifying the granting and revoking of privileges to
users. Privileges typically correspond to the right to use certain SQL commands to
access certain relations. Each relation is assigned an owner, and either the owner or
the DBA staff can grant to selected users the privilege to use an SQL statement (such
as SELECT, INSERT, DELETE, or UPDATE) to access the relation. In addition, the
DBA staff can grant the privileges to create schemas, tables, or views to certain users.
These SQL commands are called GRANT and REVOKE.
SQL has language constructs for creating Triggers. These are generally referred to as
active database techniques, since they specify actions that are automatically triggered
by events such as database updates.
SQL has incorporated many features from object-oriented models to have more
powerful capabilities, leading to enhanced relational systems known as object-
relational. These capabilities include creating complex-structured attributes (also called
nested relations), specifying abstract data types (called UDTs or user-defined types)
for attributes and tables, creating object identifiers for referencing tuples, and
specifying operations on these types.
SQL and relational databases can interact with new technologies such as XML
(eXtensible Markup Language) and OLAP (Online Analytical Processing for Data
Warehouses).