Fundamentals of Database Systems 4e - Elmasri
Chapter 1
Introduction and
Conceptual Modeling
car1: ((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999, (red, black))
car2: ((ABC 123, NEW YORK), WP9872, Nissan 300ZX, 2-door, 2002, (blue))
car3: ((VSY 720, TEXAS), TD729, Buick LeSabre, 4-door, 2003, (white, blue))
...
ENTITY TYPE
RELATIONSHIP TYPE
ATTRIBUTE
KEY ATTRIBUTE
MULTIVALUED ATTRIBUTE
COMPOSITE ATTRIBUTE
DERIVED ATTRIBUTE
E1 R E2: TOTAL PARTICIPATION OF E2 IN R
E1 1 R N E2: CARDINALITY RATIO 1:N FOR E1:E2 IN R
R (min,max) E: STRUCTURAL CONSTRAINT (min, max) ON PARTICIPATION OF E IN R
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Chapter 3-68
ER DIAGRAM – Entity Types are:
EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT
[Figure: relationship instances r1 to r7 linking employee entities e1 to e7 with department entities d1 to d3]
[Figure: relationship instances r1 to r9 linking employee entities e1 to e7 with project entities p1 to p3]
[Figure: recursive relationship instances r1 to r6 among employee entities e1 to e7, each instance connecting one employee in role 1 with another in role 2]
© The Benjamin/Cummings Publishing Company, Inc. 1994, Elmasri/Navathe, Fundamentals of Database Systems, Second Edition
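(min,max) constraint: each entity e in E participates in at least min and at most max relationship instances in R
Default (no constraint): min=0, max=n
Examples:
A department has exactly one manager and an employee can manage at most
one department.
– Specify (0,1) for participation of EMPLOYEE in MANAGES
– Specify (1,1) for participation of DEPARTMENT in MANAGES
An employee can work for exactly one department but a department can have
[Figure: (min,max) notation, showing the pairs (0,1) and (1,1) on one relationship type and (1,1) and (1,N) on another]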
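The following Python sketch makes the (min, max) semantics concrete. It counts how many relationship instances each entity takes part in and checks each count against the bounds. The function `check_min_max` and the sample MANAGES pairs are hypothetical, chosen to mirror the MANAGES example.

```python
from collections import Counter


def check_min_max(entities, instances, position, min_count, max_count=None):
    """Check a (min, max) structural constraint on one participation.

    entities   -- all entities e of the entity type E
    instances  -- relationship instances of R, each a tuple of entities
    position   -- which component of each instance belongs to E
    max_count  -- None stands for 'n' (no upper bound)
    """
    counts = Counter(inst[position] for inst in instances)
    for e in entities:
        c = counts.get(e, 0)  # an entity in no instance participates 0 times
        if c < min_count or (max_count is not None and c > max_count):
            return False
    return True


# Hypothetical MANAGES instances as (employee, department) pairs.
employees = ["e1", "e2", "e3"]
departments = ["d1", "d2"]
manages = [("e1", "d1"), ("e3", "d2")]

print(check_min_max(employees, manages, 0, 0, 1))    # EMPLOYEE (0,1): True
print(check_min_max(departments, manages, 1, 1, 1))  # DEPARTMENT (1,1): True
```

Because e2 manages nothing, a (1,1) constraint on EMPLOYEE would fail. A second manager for d1 would break DEPARTMENT's (1,1).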
Let \(R \subseteq S_1 \times S_2\)
Informal Term            Formal Term
Table                    Relation
Column                   Attribute/Domain
Row                      Tuple
Values in a column       Domain
Table Definition         Schema of a Relation
Populated Table          Extension
Example - Figure 5.1
Notation:
- We refer to component values of a tuple t
by t[Ai] = vi (the value of attribute Ai for
tuple t).
Similarly, t[Au, Av, ..., Aw] refers to the
subtuple of t containing the values of
attributes Au, Av, ..., Aw, respectively.
SELECT operation is used to select a subset of the tuples from a relation that
satisfy a selection condition. It is a filter that keeps only those tuples that
satisfy a qualifying condition – those satisfying the condition are selected
while others are discarded.
Example: To select the EMPLOYEE tuples whose department number is
four or those whose salary is greater than $30,000 the following notation is
used:
\(\sigma_{DNO = 4}(EMPLOYEE)\)
\(\sigma_{SALARY > 30000}(EMPLOYEE)\)
In general, the select operation is denoted by \(\sigma_{\langle \text{selection condition} \rangle}(R)\), where the
symbol \(\sigma\) (sigma) is used to denote the select operator, and the selection
condition is a Boolean expression specified on the attributes of relation R.
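Here is a minimal Python sketch of SELECT as a filter. It assumes a relation is represented as a list of dicts, one per tuple. The `select` helper and the sample EMPLOYEE rows are hypothetical.

```python
# Hypothetical EMPLOYEE tuples, each a dict from attribute name to value.
EMPLOYEE = [
    {"FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
]


def select(relation, condition):
    """sigma_condition(relation): keep the tuples for which condition holds."""
    return [t for t in relation if condition(t)]


dno4 = select(EMPLOYEE, lambda t: t["DNO"] == 4)
high_paid = select(EMPLOYEE, lambda t: t["SALARY"] > 30000)
# Conditions combine with Boolean operators, e.g. (DNO = 4) OR (SALARY > 30000).
either = select(EMPLOYEE, lambda t: t["DNO"] == 4 or t["SALARY"] > 30000)
print([t["FNAME"] for t in either])  # ['Franklin', 'Jennifer']
```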
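– \(\pi_{\langle list1 \rangle}(\pi_{\langle list2 \rangle}(R)) = \pi_{\langle list1 \rangle}(R)\) as long as <list2> contains
the attributes in <list1>
\(\rho_{S(B_1, B_2, \ldots, B_n)}(R)\) is a renamed relation S based on R with column names \(B_1, B_2, \ldots, B_n\)
\(\rho_{S}(R)\) is a renamed relation S based on R (which does not specify column names).
\(\rho_{(B_1, B_2, \ldots, B_n)}(R)\) is a renamed relation with column names \(B_1, B_2, \ldots, B_n\), which does
not specify a new relation name.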
\(STUDENT \cup INSTRUCTOR\)
The result of this operation, denoted by \(R \cap S\), is a relation that includes all
tuples that are in both R and S. The two operands must be "type compatible".
\(STUDENT \cap INSTRUCTOR\)
Example: The figure shows the names of students who are not instructors,
and the names of instructors who are not students.
\(STUDENT - INSTRUCTOR\)
\(INSTRUCTOR - STUDENT\)
Example:
\(FEMALE\_EMPS \leftarrow \sigma_{SEX='F'}(EMPLOYEE)\)
\(EMPNAMES \leftarrow \pi_{FNAME, LNAME, SSN}(FEMALE\_EMPS)\)
DIVISION Operation
– The division operation is applied to two relations
\(R(Z) \div S(X)\), where \(X \subseteq Z\). Let \(Y = Z - X\) (and hence
\(Z = X \cup Y\)); that is, let Y be the set of attributes of R that are
not attributes of S.
– The result of DIVISION is a relation T(Y) that includes a
tuple t if tuples \(t_R\) appear in R with \(t_R[Y] = t\), and with
\(t_R[X] = t_s\) for every tuple \(t_s\) in S.
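The definition of DIVISION translates fairly directly into code. The sketch below assumes rows are Python dicts. The `divide` helper and the sample ESSN/PNO data are hypothetical. The code groups R by its Y-values and keeps a Y-value only if the set of X-values paired with it covers every tuple of S.

```python
def divide(r_rows, s_rows, x_attrs):
    """R(Z) ÷ S(X) with X a subset of Z and Y = Z - X.

    Returns the set of Y-values t such that, for every tuple ts in S,
    some tuple tR in R has tR[Y] = t and tR[X] = ts.
    Rows are dicts; Y is every attribute of R not listed in x_attrs.
    """
    if not r_rows:
        return set()
    y_attrs = [a for a in r_rows[0] if a not in x_attrs]
    required = {tuple(ts[a] for a in x_attrs) for ts in s_rows}
    # For each Y-value, collect the X-values it is paired with in R.
    paired = {}
    for tr in r_rows:
        y = tuple(tr[a] for a in y_attrs)
        paired.setdefault(y, set()).add(tuple(tr[a] for a in x_attrs))
    # Keep t only if it is paired with every tuple of S.
    return {y for y, xs in paired.items() if required <= xs}


# Hypothetical data: which employee works on which project, and a set of projects.
R = [{"ESSN": "A", "PNO": 1}, {"ESSN": "A", "PNO": 2},
     {"ESSN": "B", "PNO": 1},
     {"ESSN": "C", "PNO": 1}, {"ESSN": "C", "PNO": 2}, {"ESSN": "C", "PNO": 3}]
S = [{"PNO": 1}, {"PNO": 2}]
print(sorted(divide(R, S, ["PNO"])))  # [('A',), ('C',)]
```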
Example: To find the first and last names of all employees whose salary is above
$50,000, we can write the following tuple calculus expression:
\(\{t.FNAME, t.LNAME \mid EMPLOYEE(t) \text{ and } t.SALARY > 50000\}\)
Two special symbols called quantifiers can appear in formulas; these are the
universal quantifier \((\forall)\) and the existential quantifier \((\exists)\).
Informally, a tuple variable t is bound if it is quantified, meaning that it
appears in an \((\exists t)\) or \((\forall t)\) clause; otherwise, it is free.
Query :
\(\{e.LNAME, e.FNAME \mid EMPLOYEE(e) \text{ and } ((\forall x)(\text{not}(PROJECT(x)) \text{ or } \text{not}(x.DNUM=5) \text{ or } ((\exists w)(WORKS\_ON(w) \text{ and } w.ESSN=e.SSN \text{ and } x.PNUMBER=w.PNO))))\}\)
Exclude from the universal quantification all tuples that we are not interested in
by making the condition true for all such tuples. The first tuples to exclude (by
making them evaluate automatically to true) are those that are not in the relation
R of interest.
In the query above, using the expression not(PROJECT(x)) inside the universally
quantified formula evaluates to true all tuples x that are not in the PROJECT
relation. Then we exclude the tuples we are not interested in from R itself. The
expression not(x.DNUM=5) evaluates to true all tuples x that are in the project
relation but are not controlled by department 5.
Finally, we specify a condition that must hold on all the remaining tuples in R.
\(((\exists w)(WORKS\_ON(w) \text{ and } w.ESSN=e.SSN \text{ and } x.PNUMBER=w.PNO))\)
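The structure of this query (universal quantifier, excluded tuples, remaining condition) maps onto Python's `all()` and `any()`. In this sketch x ranges only over PROJECT tuples, so the loop itself plays the role of the not(PROJECT(x)) disjunct. The sample data and the `works_on_all_dept5_projects` helper are hypothetical.

```python
# Hypothetical sample data; only the attributes used by the query are kept.
EMPLOYEE = [{"SSN": "e1", "LNAME": "Smith", "FNAME": "John"},
            {"SSN": "e2", "LNAME": "Wong", "FNAME": "Franklin"}]
PROJECT = [{"PNUMBER": 1, "DNUM": 5}, {"PNUMBER": 2, "DNUM": 5},
           {"PNUMBER": 3, "DNUM": 4}]
WORKS_ON = [{"ESSN": "e1", "PNO": 1}, {"ESSN": "e1", "PNO": 2},
            {"ESSN": "e2", "PNO": 1}, {"ESSN": "e2", "PNO": 3}]


def works_on_all_dept5_projects(e):
    # (forall x): iterating over PROJECT only already excludes tuples not in PROJECT
    return all(
        x["DNUM"] != 5  # not(x.DNUM=5): other departments' projects evaluate to true
        or any(w["ESSN"] == e["SSN"] and x["PNUMBER"] == w["PNO"] for w in WORKS_ON)
        for x in PROJECT
    )


result = [(e["LNAME"], e["FNAME"]) for e in EMPLOYEE if works_on_all_dept5_projects(e)]
print(result)  # [('Smith', 'John')]
```

If no project were controlled by department 5, `all()` would be vacuously true and every employee would qualify. This matches the semantics of the universally quantified formula.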
Languages Based on Tuple Relational
Calculus
The language SQL is based on tuple calculus. It uses the basic
SELECT <list of attributes>
FROM <list of relations>
WHERE <conditions>
block structure to express the queries in tuple calculus where the SELECT clause
mentions the attributes being projected, the FROM clause mentions the relations
needed in the query, and the WHERE clause mentions the selection as well as the
join conditions.
SQL syntax is expanded further to accommodate other operations. (See Chapter 8).
Query :
\(\{uv \mid (\exists q)(\exists r)(\exists s)(\exists t)(\exists w)(\exists x)(\exists y)(\exists z)(EMPLOYEE(qrstuvwxyz) \text{ and } q='John' \text{ and } r='B' \text{ and } s='Smith')\}\)
Ten variables for the employee relation are needed, one to range over the
domain of each attribute in order. Of the ten variables q, r, s, . . ., z, only u and
v are free.
Specify the requested attributes, BDATE and ADDRESS, by the free domain
variables u for BDATE and v for ADDRESS.
Specify the condition for selecting a tuple following the bar ( | )—namely, that
the sequence of values assigned to the variables qrstuvwxyz be a tuple of the
employee relation and that the values for q (FNAME), r (MINIT), and s
(LNAME) be ‘John’, ‘B’, and ‘Smith’, respectively.
– For each weak entity type W in the ER schema with owner entity type E, create
a relation R and include all simple attributes (or simple components of
composite attributes) of W as attributes of R.
– In addition, include as foreign key attributes of R the primary key attribute(s)
of the relation(s) that correspond to the owner entity type(s).
– The primary key of R is the combination of the primary key(s) of the owner(s)
and the partial key of the weak entity type W, if any.
For each binary 1:1 relationship type R in the ER schema, identify the relations
S and T that correspond to the entity types participating in R. There are three
possible approaches:
(1) Foreign Key approach: Choose one of the relations, say S, and include as a foreign key in S the
primary key of T. It is better to choose an entity type with total participation in R in the role of S.
Example: The 1:1 relationship type MANAGES is mapped by choosing the participating entity type
DEPARTMENT to serve in the role of S, because its participation in the MANAGES relationship
type is total.
(2) Merged relation option: An alternate mapping of a 1:1 relationship type is possible by merging
the two entity types and the relationship into a single relation. This may be appropriate when both
participations are total.
(3) Cross-reference or relationship relation option: The third alternative is to set up a third relation R
for the purpose of cross-referencing the primary keys of the two relations S and T representing the
entity types.
– For each regular binary 1:N relationship type R, identify the relation S
that represents the participating entity type at the N-side of the
relationship type.
– Include as foreign key in S the primary key of the relation T that
represents the other entity type participating in R.
– Include any simple attributes of the 1:N relationship type as attributes of S.
– For each regular binary M:N relationship type R, create a new relation S
to represent R.
– Include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types; their combination will form the
primary key of S.
– Also include any simple attributes of the M:N relationship type (or simple
components of composite attributes) as attributes of S.
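Here is a minimal SQLite sketch of the 1:N and M:N mapping rules above, using Python's standard `sqlite3` module. The table and attribute names follow the document's examples. The reduced attribute lists and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE DEPARTMENT (
    DNUMBER INTEGER PRIMARY KEY,
    DNAME   TEXT
);
-- 1:N (a department has many employees): foreign key DNO goes into EMPLOYEE, the N-side.
CREATE TABLE EMPLOYEE (
    SSN   TEXT PRIMARY KEY,
    LNAME TEXT,
    DNO   INTEGER REFERENCES DEPARTMENT(DNUMBER)
);
CREATE TABLE PROJECT (
    PNUMBER INTEGER PRIMARY KEY,
    PNAME   TEXT
);
-- M:N WORKS_ON: a new relation; the two foreign keys together form its primary key,
-- and HOURS is a simple attribute of the relationship itself.
CREATE TABLE WORKS_ON (
    ESSN  TEXT    REFERENCES EMPLOYEE(SSN),
    PNO   INTEGER REFERENCES PROJECT(PNUMBER),
    HOURS REAL,
    PRIMARY KEY (ESSN, PNO)
);
INSERT INTO DEPARTMENT VALUES (5, 'Research');
INSERT INTO EMPLOYEE VALUES ('123', 'Smith', 5);
INSERT INTO PROJECT VALUES (1, 'ProductX');
INSERT INTO WORKS_ON VALUES ('123', 1, 32.5);
""")


def try_insert(sql):
    """Run one INSERT; return False if SQLite rejects it as a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("INSERT INTO WORKS_ON VALUES ('123', 1, 10)"))  # False: duplicate key
print(try_insert("INSERT INTO WORKS_ON VALUES ('999', 1, 10)"))  # False: no such employee
```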
FIGURE 7.7
An ER schema for a SHIP_TRACKING database.
The database users must still enter a value for the new attribute
JOB for each EMPLOYEE tuple. This can be done using the
UPDATE command.
EMPLOYEE.LNAME, DEPARTMENT.DNAME
Q1C: SELECT *
FROM EMPLOYEE
WHERE DNO=5
Q1D: SELECT *
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME='Research' AND
DNO=DNUMBER
USE OF DISTINCT
SQL does not treat a relation as a set; duplicate tuples can
appear
To eliminate duplicate tuples in a query result, the keyword
DISTINCT is used
For example, the result of Q11 may have duplicate SALARY
values whereas Q11A does not have any duplicate values
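Q11 and Q11A are not reproduced in this excerpt. The two queries below are hypothetical stand-ins run against a throwaway SQLite table. They show a multiset result alongside the same query with DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Smith", 30000), ("Wong", 40000), ("English", 25000), ("Zelaya", 25000)],
)

# Without DISTINCT the result is a multiset: 25000 appears twice.
all_salaries = [row[0] for row in conn.execute(
    "SELECT SALARY FROM EMPLOYEE ORDER BY SALARY")]
# With DISTINCT each salary value appears once.
distinct_salaries = [row[0] for row in conn.execute(
    "SELECT DISTINCT SALARY FROM EMPLOYEE ORDER BY SALARY")]
print(all_salaries)       # [25000, 25000, 30000, 40000]
print(distinct_salaries)  # [25000, 30000, 40000]
```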
or as:
– In this case, the grouping and functions are applied after the joining of
the two relations
The LIKE operator allows us to get around the fact that each
value is considered atomic and indivisible; hence, in SQL,
character string attribute values are not atomic
ARITHMETIC OPERATIONS
The standard arithmetic operators '+', '-', '*', and '/' (for addition,
subtraction, multiplication, and division, respectively) can be
applied to numeric values in an SQL query result
Query 27: Show the effect of giving all employees who work
on the 'ProductX' project a 10% raise.
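The SQL text of Query 27 is not shown here. One possible formulation, run with SQLite against a few hypothetical rows, puts the arithmetic expression in the SELECT list. This affects only the query result; the stored salaries are unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT, SALARY INTEGER);
CREATE TABLE PROJECT  (PNAME TEXT, PNUMBER INTEGER);
CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER);
INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '1', 30000),
                            ('Joyce', 'English', '2', 25000),
                            ('Franklin', 'Wong', '3', 40000);
INSERT INTO PROJECT VALUES ('ProductX', 1), ('ProductY', 2);
INSERT INTO WORKS_ON VALUES ('1', 1), ('2', 1), ('3', 2);
""")

# Arithmetic in the SELECT list: the raise appears only in the query result.
rows = conn.execute("""
    SELECT FNAME, LNAME, 1.1 * SALARY AS INCREASED_SAL
    FROM EMPLOYEE, WORKS_ON, PROJECT
    WHERE SSN = ESSN AND PNO = PNUMBER AND PNAME = 'ProductX'
    ORDER BY LNAME
""").fetchall()
for fname, lname, new_sal in rows:
    print(fname, lname, round(new_sal, 2))
```

The raised salaries are floating-point results of `1.1 * SALARY`, so tiny rounding differences are expected. To store the raise permanently, an UPDATE statement would be needed instead.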
MORE SQL:
Assertions,
Views, and
Programming
Techniques
Disconnection
DISCONNECT connection-name;
A stored function
CREATE FUNCTION fun-name (params) RETURNS return-type
local-declarations
function-body;
Definition:
Transitive functional dependency - a FD X -> Z
that can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD since
SSN -> DNUMBER and DNUMBER -> DMGRSSN
hold
- SSN -> ENAME is non-transitive since there is no set
of attributes X where SSN -> X and X -> ENAME
Definition:
Superkey of relation schema R - a set of attributes
S of R that contains a key of R
A relation schema R is in third normal form (3NF)
if whenever a FD X -> A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
NOTE: Boyce-Codd normal form disallows condition (b)
above
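The definitions above can be checked mechanically with attribute closure. This sketch assumes an EMP_DEPT-style schema whose FDs reproduce the SSN -> DNUMBER -> DMGRSSN example. The helper names are hypothetical. For simplicity, `violates_3nf` tests only the FDs it is given rather than every FD in the closure F+.

```python
def closure(attrs, fds):
    """Attribute closure X+: all attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains the left side, it must contain the right side too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A superkey is a set of attributes whose closure is the whole schema."""
    return closure(attrs, fds) >= set(all_attrs)


def violates_3nf(lhs, a, all_attrs, fds, prime_attrs):
    """FD lhs -> a violates 3NF unless lhs is a superkey or a is prime.
    Dropping the 'a is prime' escape gives the BCNF test."""
    return not is_superkey(lhs, all_attrs, fds) and a not in prime_attrs


# Assumed EMP_DEPT schema and FDs, matching the SSN/DNUMBER examples above.
EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
PRIME = {"SSN"}  # assumption: SSN is the only candidate key

print("DMGRSSN" in closure({"SSN"}, F))                          # True: reached via DNUMBER
print(violates_3nf({"DNUMBER"}, "DMGRSSN", EMP_DEPT, F, PRIME))  # True
```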
Algorithm | Input | Output | Properties/Purpose | Remarks
11.1 | A decomposition D of R and a set F of functional dependencies | Boolean result: yes or no for lossless join property | Testing for nonadditive join decomposition | See a simpler test in Section 11.1.4 for binary decompositions
11.2 | Set of functional dependencies F | A set of relations in 3NF | Dependency preservation | No guarantee of satisfying lossless join property
11.3 | Set of functional dependencies F | A set of relations in BCNF | Lossless join decomposition | No guarantee of dependency preservation
11.4 | Set of functional dependencies F | A set of relations in 3NF | Lossless join and dependency preserving decomposition | May not achieve BCNF
11.4a | Relation schema R with a set of functional dependencies F | Key K of R | To find a key K (which is a subset of R) | The entire relation R is always a default superkey
3. Multivalued Dependencies and Fourth
Normal Form (1)
(a) The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>> DNAME. (b)
Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and
EMP_DEPENDENTS.
Then, we get:
blocking factor \(Bfr = \lfloor B/R \rfloor = \lfloor 512/150 \rfloor = 3\) records/block
number of file blocks \(b = \lceil r/Bfr \rceil = \lceil 30000/3 \rceil = 10000\) blocks
For an index on the SSN field, assume the field size \(V_{SSN} = 9\) bytes and
the record pointer size \(P_R = 7\) bytes. Then:
index entry size \(R_I = V_{SSN} + P_R = 9 + 7 = 16\) bytes
index blocking factor \(Bfr_I = \lfloor B/R_I \rfloor = \lfloor 512/16 \rfloor = 32\) entries/block
number of index blocks \(b_I = \lceil r/Bfr_I \rceil = \lceil 30000/32 \rceil = 938\) blocks
a binary search on the index needs \(\lceil \log_2 b_I \rceil = \lceil \log_2 938 \rceil = 10\) block accesses
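The arithmetic above can be reproduced in a few lines. Here B = 512 bytes, R = 150 bytes and r = 30000 records are taken from the numbers used in the calculation; the variable names are ours. Floor division gives the blocking factors, and ceiling gives the block counts.

```python
import math

B = 512        # block size in bytes
R = 150        # record size in bytes
r = 30000      # number of records
V_SSN = 9      # SSN field size in bytes
P_R = 7        # record pointer size in bytes

bfr = B // R                           # floor(512/150) = 3 records/block
b = math.ceil(r / bfr)                 # 10000 data blocks
R_I = V_SSN + P_R                      # 16-byte index entries
bfr_I = B // R_I                       # 32 entries/block
b_I = math.ceil(r / bfr_I)             # ceil(937.5) = 938 index blocks
search_cost = math.ceil(math.log2(b_I))  # ceil(9.87...) = 10 block accesses

print(bfr, b, R_I, bfr_I, b_I, search_cost)  # 3 10000 16 32 938 10
```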
Primary Index
– Defined on an ordered data file
– The data file is ordered on a key field
– Includes one index entry for each block in the data file; the
index entry has the key field value for the first record in the
block, which is called the block anchor
– A similar scheme can use the last record in a block.
– A primary index is a nondense (sparse) index, since it
includes an entry for each disk block of the data file and the
keys of its anchor record rather than for every search value.
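Here is a sketch of how the sparse primary index is used for a lookup. It assumes a tiny hypothetical data file whose blocks are Python lists. The lookup binary-searches the anchors for the last one not greater than the search key, then scans only that block.

```python
from bisect import bisect_right

# Hypothetical data file ordered on its key field; each inner list is one block.
blocks = [
    [3, 7, 9],
    [12, 15, 20],
    [25, 31, 40],
]
# Primary (sparse) index: one entry per block holding the block anchor,
# i.e. the key of the first record in that block.
index = [blk[0] for blk in blocks]  # [3, 12, 25]


def lookup(key):
    """Return (block number, found?) using the sparse index plus one block scan."""
    i = bisect_right(index, key) - 1  # last anchor <= key
    if i < 0:
        return None, False            # key is smaller than every anchor
    return i, key in blocks[i]


print(lookup(15))  # (1, True)
print(lookup(16))  # (1, False): would be in block 1 if it existed
```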
FIGURE 14.1 Primary index on the ordering key field of the file shown in Figure 13.7.
Clustering Index
Note: The above figure is now called Figure 15.3 (continued) in Edition 4
Algorithms for SELECT and JOIN Operations
(14)
Implementing the JOIN Operation (cont.):
Factors affecting JOIN performance
– Available buffer space
\(RESULT \leftarrow TEMP1 \cup TEMP2\)
6. Combining Operations using Pipelining (1)
Motivation
– A query is mapped into a sequence of operations.
– Each execution of an operation produces a temporary
result.
– Generating and saving temporary files on disk is time
consuming and expensive.
Alternative:
– Avoid constructing temporary results as much as
possible.
– Pipeline the data through multiple operations - pass the
result of a previous operator to the next without waiting
to complete the previous operation.
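Python generators give a compact model of pipelining. Each operator pulls one tuple at a time from the previous one, so no temporary result is built. The operator functions and PROJECT rows below are hypothetical. The `scanned` list only records how far the scan has progressed.

```python
scanned = []  # records which input tuples have been read so far


def scan(relation):
    for t in relation:
        scanned.append(t["PNUMBER"])
        yield t


def select(tuples, condition):
    for t in tuples:
        if condition(t):
            yield t  # handed to the next operator at once; no temporary file


def project(tuples, attrs):
    for t in tuples:
        yield {a: t[a] for a in attrs}


# Hypothetical PROJECT tuples.
PROJECT = [
    {"PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
    {"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]

pipeline = project(select(scan(PROJECT), lambda t: t["PLOCATION"] == "Stafford"),
                   ["PNUMBER", "DNUM"])
first = next(pipeline)
print(first, scanned)  # {'PNUMBER': 10, 'DNUM': 4} [1, 10]: the third tuple is not read yet
```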
Example:
For every project located in ‘Stafford’, retrieve the project
number, the controlling department number and the department
manager’s last name, address and birthdate.
Relational algebra:
\(\pi_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((\sigma_{PLOCATION='STAFFORD'}(PROJECT)) \bowtie_{DNUM=DNUMBER} (DEPARTMENT)) \bowtie_{MGRSSN=SSN} (EMPLOYEE))\)
SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION='STAFFORD';
Using Heuristics in Query Optimization (4)
Note: The above figure is now called Figure 15.4 (continued) in Edition 4
Using Heuristics in Query Optimization (6)
Heuristic Optimization of Query Trees:
The same query could correspond to many different relational
algebra expressions — and hence many different query trees.
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND PNUMBER=PNO
AND ESSN=SSN AND BDATE > '1957-12-31';
Using Heuristics in Query Optimization (7)
Issues
– Cost function
– Number of execution strategies to be considered
Left-deep tree: a binary tree where the right child of each non-
leaf node is always a base relation.
– Amenable to pipelining
– Could utilize any access paths on the base relation (the right child)
when executing the join.
9. Overview of Query Optimization in Oracle
Oracle DBMS V8
Rule-based query optimization: the optimizer chooses
execution plans based on heuristically ranked operations.
(Currently it is being phased out)
Cost-based query optimization: the optimizer examines
alternative access paths and operator algorithms and chooses
the execution plan with the lowest estimated cost. The query cost is
calculated based on the estimated usage of resources such as
I/O, CPU and memory needed.
Application developers could specify hints to the ORACLE
query optimizer. The idea is that an application developer
might know more information about the data.
10. Semantic Query Optimization
Semantic Query Optimization: Uses constraints
specified on the database schema in order to modify one
query into another query that is more efficient to execute.
Goal:
– To make application run faster
– To lower the response time of queries/transactions
– To improve the overall throughput of transactions
ACID properties:
Atomicity: A transaction is an atomic unit of
processing; it is either performed in its entirety
or not performed at all.
In Sa, the operations w2(X) and w3(X) are blind writes, since T2 and T3
do not read the value of X.
Type of Violation:
Isolation level      Dirty read   Nonrepeatable read   Phantom
READ UNCOMMITTED     yes          yes                  yes
READ COMMITTED       no           yes                  yes
REPEATABLE READ      no           no                   yes
SERIALIZABLE         no           no                   no
        Read   Write
Read    Y      N
Write   N      N
T1:
read_lock(Y); read_item(Y); unlock(Y);
write_lock(X); read_item(X); X := X + Y; write_item(X); unlock(X);
T2:
read_lock(X); read_item(X); unlock(X);
write_lock(Y); read_item(Y); Y := X + Y; write_item(Y); unlock(Y);
Result (initial values X=20, Y=30):
Serial execution T1 followed by T2: X=50, Y=80.
Serial execution T2 followed by T1: X=70, Y=50.
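The two serial results can be checked by running the read/compute/write steps of T1 and T2 in each order. This sketch leaves out the locks because, in a serial execution, no two transactions overlap. The function names are hypothetical.

```python
def t1(db):
    # read_item(Y); read_item(X); X := X + Y; write_item(X)
    y = db["Y"]
    x = db["X"]
    db["X"] = x + y


def t2(db):
    # read_item(X); read_item(Y); Y := X + Y; write_item(Y)
    x = db["X"]
    y = db["Y"]
    db["Y"] = x + y


def run_serial(*transactions):
    """Run whole transactions one after another, starting from X=20, Y=30."""
    db = {"X": 20, "Y": 30}
    for txn in transactions:
        txn(db)
    return db


print(run_serial(t1, t2))  # {'X': 50, 'Y': 80}
print(run_serial(t2, t1))  # {'X': 70, 'Y': 50}
```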
T’1:
read_lock(Y); read_item(Y); write_lock(X); unlock(Y);
read_item(X); X := X + Y; write_item(X); unlock(X);
T’2:
read_lock(X); read_item(X); write_lock(Y); unlock(X);
read_item(Y); Y := X + Y; write_item(Y); unlock(Y);
T’1 and T’2 follow the two-phase policy, but they are subject to deadlock, which must be dealt with.
The two-phase policy generates two locking algorithms: (a) Basic and (b)
Conservative.
Conservative: Prevents deadlock by locking all desired data items before the
transaction begins execution.
Basic: A transaction locks data items incrementally. This may cause deadlock,
which must then be dealt with.
Strict: A stricter version of the Basic algorithm, where unlocking is
performed after a transaction terminates (commits, or aborts and is rolled back).
This is the most commonly used two-phase locking algorithm.
Deadlock
T’1 and T’2 follow the two-phase policy, but they are deadlocked:
T’1: read_lock(Y);
T’1: read_item(Y);
T’2: read_lock(X);
T’2: read_item(X);
T’1: write_lock(X); (waits for X)
T’2: write_lock(Y); (waits for Y)
Deadlock (T’1 and T’2)
Deadlock prevention
A transaction locks all data items it refers to before it begins execution.
This way of locking prevents deadlock since a transaction never waits
for a data item. The conservative two-phase locking uses this approach.
Deadlock avoidance
There are many variations of the two-phase locking algorithm. Some avoid
deadlock by not letting a cycle complete. That is, as soon as the
algorithm discovers that blocking a transaction is likely to create a
cycle, it rolls back the transaction. The Wound-Wait and Wait-Die
algorithms use timestamps to avoid deadlocks by rolling back a
victim.
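Both schemes compare the timestamps of the transaction Ti requesting an item and the transaction Tj holding it, with a smaller timestamp meaning an older transaction. The text names the schemes but does not spell out the rules. This sketch follows the usual textbook formulation, and the function names are hypothetical.

```python
def wait_die(ts_requester, ts_holder):
    """Ti (requester) wants an item locked by Tj (holder); smaller timestamp = older."""
    if ts_requester < ts_holder:
        return "requester waits"  # an older transaction may wait for a younger one
    return "abort requester"      # a younger requester dies; it restarts later with its original timestamp


def wound_wait(ts_requester, ts_holder):
    if ts_requester < ts_holder:
        return "abort holder"     # the older requester wounds (aborts) the younger holder
    return "requester waits"      # a younger requester may wait for an older holder


print(wait_die(1, 2), "|", wound_wait(1, 2))  # requester waits | abort holder
print(wait_die(2, 1), "|", wound_wait(2, 1))  # abort requester | requester waits
```

In both schemes only the younger transaction is ever aborted. Because a restarted transaction keeps its original timestamp, it eventually becomes the oldest and is no longer chosen as the victim.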
Starvation
Starvation occurs when a particular transaction consistently waits or is
restarted and never gets a chance to proceed further. In a deadlock
resolution it is possible that the same transaction may consistently be
selected as victim and rolled-back. This limitation is inherent in all
priority based scheduling mechanisms. In Wound-Wait scheme a
younger transaction may always be wounded (aborted) by a long
running older transaction which may create starvation.
Timestamp
A monotonically increasing variable (integer) indicating the age of an
operation or a transaction. A larger timestamp value indicates a more
recent event or operation.
A timestamp-based algorithm uses timestamps to serialize the execution of
concurrent transactions.
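The text does not give the rules. One widely used timestamp-based scheme is basic timestamp ordering (TO), sketched below. The sketch assumes each item keeps the largest read and write timestamps seen so far. Class and function names are hypothetical.

```python
class Item:
    """A data item X with the largest read and write timestamps seen so far."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def read_item(ts, item):
    """Transaction with timestamp ts reads X; False means the transaction is aborted."""
    if item.write_ts > ts:  # a younger transaction has already written X
        return False
    item.read_ts = max(item.read_ts, ts)
    return True


def write_item(ts, item):
    """Transaction with timestamp ts writes X; False means the transaction is aborted."""
    if item.read_ts > ts or item.write_ts > ts:  # a younger transaction already read or wrote X
        return False
    item.write_ts = ts
    return True


X = Item()
print(read_item(5, X))   # True: read_TS(X) becomes 5
print(write_item(3, X))  # False: older T3 would overwrite a value T5 has already read
print(write_item(7, X))  # True: write_TS(X) becomes 7
print(read_item(6, X))   # False: T6 is older than the last writer T7
```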
Concept
Note
In multiversion 2PL read and write operations from conflicting
transactions can be processed concurrently. This improves
concurrency but it may delay transaction commit because of
obtaining certify locks on all its writes. It avoids cascading abort but
like the strict two-phase locking scheme, conflicting transactions may get
deadlocked.
When validating Ti, the first condition is checked first for each
transaction Tj, since (1) is the simplest condition to check. If (1) is
false, then (2) is checked, and if (2) is false, then (3) is checked. If
none of these conditions holds, the validation fails and Ti is aborted.
[Figure: granularity hierarchy with the database DB at the root, files f1 and f2 below it, and records (r111 ... r11j, and so on) at the leaves]
IS IX S SIX X
IS yes yes yes yes no
IX yes yes no no no
S yes no yes no no
SIX yes no no no no
X no no no no no
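The compatibility matrix above can be encoded as a lookup table. This sketch, with hypothetical names, grants a requested lock only if it is compatible with every mode other transactions already hold on that node.

```python
MODES = ["IS", "IX", "S", "SIX", "X"]
# One row per mode, columns in MODES order, copied from the table above.
ROWS = {
    "IS":  "yes yes yes yes no",
    "IX":  "yes yes no  no  no",
    "S":   "yes no  yes no  no",
    "SIX": "yes no  no  no  no",
    "X":   "no  no  no  no  no",
}
COMPATIBLE = {
    (m1, m2): flag == "yes"
    for m1, row in ROWS.items()
    for m2, flag in zip(MODES, row.split())
}


def can_grant(requested, held_by_others):
    """Grant a lock in `requested` mode only if it is compatible with every
    mode that other transactions currently hold on the same node."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("IX", ["IS", "IX"]))  # True
print(can_grant("S", ["IX"]))         # False
```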
2 Types of Failure
The database may become unavailable for use due to
• Transaction failure: Transactions may fail because of
incorrect input, deadlock, incorrect synchronization.
• System failure: System may fail because of addressing
error, application error, operating system fault, RAM
failure, etc.
• Media failure: Disk head crash, power disruption, etc.
We show the process of roll-back with the help of the following three transactions: T1,
T2, and T3.
T1: read_item(A); read_item(D); write_item(D)
T2: read_item(B); write_item(B); read_item(D); write_item(D)
T3: read_item(C); write_item(B); read_item(A); write_item(A)
[start_transaction, T3]
[read_item, T3, C]
* [write_item, T3, B, 15, 12] 12
[start_transaction,T2]
[read_item, T2, B]
** [write_item, T2, B, 12, 18] 18
[start_transaction,T1]
[read_item, T1, A]
[read_item, T1, D]
[write_item, T1, D, 20, 25] 25
[read_item, T2, D]
** [write_item, T2, D, 25, 26] 26
[read_item, T3, A]
---- system crash ----
* T3 is rolled back because it did not reach its commit point.
** T2 is rolled back because it reads the value of item B written by T3.
Database Recovery
Roll-back: One execution of T1, T2 and T3 as recorded in the log.
[Figure: timeline of transactions T1 to T5, with a checkpoint at time t1 and a system crash at time t2]
T2 and T3 are ignored because they did not reach their commit points.
T4 is redone because its commit point is after the last checkpoint.
During recovery, all transactions in the commit table are redone, and all
transactions in the active table are ignored, since none of their AFIMs
reached the database. A transaction in the commit table may be
redone twice, but this does not create any inconsistency because redo
is "idempotent": redoing an AFIM once is equivalent to redoing the same
AFIM multiple times.
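Redo is safe to repeat because it writes absolute after images rather than applying changes relative to the current value. The sketch below contrasts the two approaches using hypothetical log records.

```python
# Hypothetical committed updates from the log: (transaction, item, AFIM).
commit_log = [("T4", "A", 25), ("T4", "B", 18)]


def redo(db, log):
    """Redo by writing each after image (AFIM); the old value is never read."""
    for _txn, item, afim in log:
        db[item] = afim
    return db


once = redo({"A": 20, "B": 12}, commit_log)
twice = redo(redo({"A": 20, "B": 12}, commit_log), commit_log)
print(once == twice)  # True: repeating the redo changes nothing


# Contrast: re-applying a relative change ("add 5") is not idempotent.
def apply_delta(db, item, delta):
    db[item] += delta
    return db


print(apply_delta(apply_delta({"A": 20}, "A", 5), "A", 5))  # {'A': 30}, not {'A': 25}
```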
Undo/Redo Algorithm (Single-user environment)
Recovery schemes of this category apply undo and also redo for
recovery. In a single-user environment no concurrency control is
required but a log is maintained under WAL. Note that at any time there
will be one transaction in the system and it will be either in the commit
table or in the active table. The recovery manager performs:
[Figure: data items X and Y in the database, with updated copies X' and Y']
A log record stores (a) the previous LSN of that transaction, (b) the
transaction ID, and (c) the type of log record.
For efficient recovery, the following tables are also stored in the log during
checkpointing:
Dirty Page table: Contains an entry for each dirty page in the buffer,
which includes the page ID and the LSN corresponding to the earliest
update to that page.
Object Database
Standards, Languages,
and Design
interface Date:Object {
enum weekday{sun,mon,tue,wed,thu,fri,sat};
enum Month{jan,feb,mar,…,dec};
unsigned short year();
unsigned short month();
unsigned short day();
…
boolean is_equal(in Date other_date);
};
class Degree {
attribute string college;
attribute string degree;
attribute string year;
};
interface Shape {
attribute struct point {…} reference_point;
float perimeter ();
…
};
define has_minor(dept_name) as
select s
from s in students
where s.minor_in.dname=dept_name
has_minor can now be used in queries
Single Elements from Collections
d_Extent<Person> All_Persons(CS101)
Relationships in ODB:
– relationships are handled by reference attributes
that include OIDs of related objects
– single and collection of references are allowed
– references for binary relationships can be
expressed in single direction or both directions
via inverse operator
Relationships in RDB:
– Relationships among tuples are specified by
attributes with matching values (via foreign
keys)
– Foreign keys are single-valued
– M:N relationships must be represented via a
separate relation (table)
Types of Security
– Legal and ethical issues
– Policy issues
– System-related issues
– The need to identify multiple security levels
Threats to databases
- Loss of integrity
- Loss of availability
- Loss of confidentiality
1. Account creation
2. Privilege granting
3. Privilege revocation
4. Security level assignment
To keep a record of all updates applied to the database and of the particular
user who applied each update, we can modify the system log, which
includes an entry for each operation applied to the database that may be
required for recovery from a transaction failure or system crash.
If any tampering with the database is suspected, a database audit is
performed, which consists of reviewing the log to examine all accesses
and operations applied to the database during a certain time period.
A database log that is used mainly for security purposes is sometimes
called an audit trail.
Suppose that the DBA creates four accounts (A1, A2, A3, and A4) and
wants only A1 to be able to create base relations; then the DBA must
issue the following GRANT command in SQL:
In SQL2 the same effect can be accomplished by having the DBA issue a
CREATE SCHEMA command as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
User account A1 can create tables under the schema called EXAMPLE.
Suppose that A1 wants to grant A2 the privilege to insert and delete tuples
in both of these relations, but A1 does not want A2 to be able to
propagate these privileges to additional accounts:
GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;
EMPLOYEE(NAME, SSN, BDATE, ADDRESS, SEX, SALARY, DNO)
DEPARTMENT(DNUMBER, DNAME, MGRSSN)
The two keys used for public key encryption are referred
to as the public key and the private key.
– the private key is kept secret, but it is referred to as private key
rather than a secret key (the key used in conventional encryption)
to avoid confusion with conventional encryption.
The public key is made public, and the private key is known only
by its owner.
A general-purpose public key cryptographic algorithm relies
on one key for encryption and a different but related one
for decryption. The essential steps are as follows:
1. Each user generates a pair of keys to be used for the encryption and
decryption of messages.
2. Each user places one of the two keys in a public register or other
accessible file. This is the public key. The companion key is kept
private.
Introduction
Structured, Semi structured, and Unstructured
Data.
XML Hierarchical (Tree) Data Model.
XML Documents, DTD, and XML Schema.
XML Documents and Databases.
XML Querying.
– Xpath
– XQuery
FIGURE 26.3 A complex XML element called <projects>.
The basic object in XML is the XML document. There are two
main structuring concepts that are used to construct an XML
document: elements and attributes. Attributes in XML provide
additional information that describes elements.
As in HTML, elements are identified in a document by their
start tag and end tag. The tag names are enclosed between
angled brackets <…>, and end tags are further identified by a
slash </…>. Complex elements are constructed from other
elements hierarchically, whereas simple elements contain data
values.
It is straightforward to see the correspondence between the
XML textual representation and the tree structure. In the tree
representation, internal nodes represent complex elements,
whereas leaf nodes represent simple elements. That is why the
XML model is called a tree model or a hierarchical model.
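Here is a short sketch using Python's standard `xml.etree.ElementTree` module. It parses a small hypothetical document shaped like the <projects> example and labels each element as complex (has child elements) or simple (a leaf). The `classify` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the style of the <projects> example.
doc = """<?xml version="1.0"?>
<projects>
  <project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Location>Bellaire</Location>
  </project>
</projects>"""

root = ET.fromstring(doc)


def classify(elem, out):
    """Walk the tree: elements with children are complex (internal nodes),
    elements without children are simple (leaves holding data values)."""
    out.append((elem.tag, "complex" if len(elem) else "simple"))
    for child in elem:
        classify(child, out)
    return out


for tag, kind in classify(root, []):
    print(tag, kind)
```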
XML Hierarchical (Tree) Data Model (cont.)
It is possible to characterize three main types of XML
documents:
1. Data-centric XML documents:
These documents have many small data items that follow a
specific structure, and hence may be extracted from a
structured database. They are formatted as XML documents in
order to exchange them or display them over the Web.
2. Document-centric XML documents:
These are documents with large amounts of text, such as news
articles or books. There are few or no structured data elements
in these documents.
3. Hybrid XML documents:
These documents may have parts that contain structured data
and other parts that are predominantly textual or unstructured.
XML Documents, DTD, and XML Schema.
Well-Formed
– It must start with an XML declaration to indicate the version of XML being used—as
well as any other relevant attributes.
– It must follow the syntactic guidelines of the tree model. This means that there should
be a single root element, and every element must include a matching pair of start tag
and end tag within the start and end tags of the parent element.
– A well-formed XML document is syntactically correct. This allows it to be
processed by generic processors that traverse the document and create an internal tree
representation.
DOM (Document Object Model) - Allows programs to manipulate the resulting
tree representation corresponding to a well-formed XML document. The whole
document must be parsed beforehand when using DOM.
SAX - Allows processing of XML documents on the fly by notifying the
processing program whenever a start or end tag is encountered.
Valid
– A stronger criterion is for an XML document to be valid. In this case, the document
must be well-formed, and in addition the element names used in the start and end tag
pairs must follow the structure specified in a separate XML DTD (Document Type
Definition) file or XML schema file.
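Both ideas can be tried with Python's standard library. The sketch treats a parse without errors as a stand-in for the tree-structure part of well-formedness; ElementTree does not demand the XML declaration. It then uses `xml.sax` to show the event-driven style. `is_well_formed` and `TagLogger` are hypothetical names.

```python
import xml.etree.ElementTree as ET
import xml.sax


def is_well_formed(text):
    """Tree-model check only: a single root and properly nested, matching tags.
    (ElementTree does not insist on the XML declaration.)"""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagLogger(xml.sax.ContentHandler):
    """SAX style: the parser calls these methods as it meets each tag,
    without first building the whole tree as DOM does."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


print(is_well_formed("<a><b>1</b></a>"))  # True
print(is_well_formed("<a><b>1</a></b>"))  # False: tags overlap instead of nesting

handler = TagLogger()
xml.sax.parseString(b"<a><b>1</b></a>", handler)
print(handler.events)  # [('start', 'a'), ('start', 'b'), ('end', 'b'), ('end', 'a')]
```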
XML Documents, DTD, and XML Schema (cont.)
A * following the element name means that the element can be repeated zero or more
times in the document. This can be called an optional multivalued (repeating) element.
A + following the element name means that the element can be repeated one or more
times in the document. This can be called a required multivalued (repeating) element.
A ? following the element name means that the element can be repeated zero or one
times. This can be called an optional single-valued (non-repeating) element.
An element appearing without any of the preceding three symbols must appear exactly
once in the document. This can be called a required single-valued (non-repeating)
element.
The type of the element is specified via parentheses following the element. If the
parentheses include names of other elements, these would be the children of the
element in the tree structure. If the parentheses include the keyword #PCDATA or one
of the other data types available in XML DTD, the element is a leaf node. PCDATA
stands for parsed character data, which is roughly similar to a string data type.
Parentheses can be nested when specifying elements.
A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the document.
A further limitation is that DTD elements are always forced to follow the specified
ordering in the document, so unordered elements are not permitted.
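The DTD operators (sequence with commas, `|`, `*`, `+`, `?`, nested parentheses) behave like regular-expression operators. A content model can therefore be checked by translating it into a regex over the ordered list of child element names. The sketch below assumes this translation approach and handles element content only; `#PCDATA` and mixed content are out of scope. The helpers and the sample model are hypothetical and not part of any DTD tool.

```python
import re

# Tokens of a DTD content model: element names and the structural symbols.
TOKEN = re.compile(r"[A-Za-z_][\w.\-:]*|[(),|*+?]")

def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate e.g. "(Name, Number, Location+)" into a regex over "<name>" tokens."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")            # nested group, non-capturing
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue                       # sequence: children must appear in this order
        elif tok in {"|", "*", "+", "?"}:
            parts.append(tok)              # choice, zero or more, one or more, zero or one
        else:
            # Wrap each name so a following *, + or ? applies to the whole element.
            parts.append("(?:" + re.escape(f"<{tok}>") + ")")
    return re.compile("".join(parts))

def children_match(model: str, children: list[str]) -> bool:
    """True if the ordered child element names conform to the content model."""
    encoded = "".join(f"<{name}>" for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None

MODEL = "(Name, Number, Location+, Dept_no?, Worker*)"  # hypothetical content model
print(children_match(MODEL, ["Name", "Number", "Location"]))      # True
print(children_match(MODEL, ["Name", "Number"]))                  # False: Location+ needs one
print(children_match(MODEL, ["Number", "Name", "Location"]))      # False: order is fixed
print(children_match("(Phone | Email)", ["Email"]))               # True
```

The third call fails because the comma sequence fixes the order of the children, which is the ordering restriction described above.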
Overview of Data Warehousing and OLAP
2 Multimedia Databases
2.1 The Nature of Multimedia Data and Applications
2.2 Data Management Issues
2.3 Open Research Problems
2.4 Multimedia Database Applications
Wireless Communications
The wireless medium has bandwidth significantly lower
than that of a wired network.
– The current generation of wireless technology has data rates ranging
from the tens to hundreds of kilobits per second (2G cellular
telephony) to tens of megabits per second (wireless Ethernet,
popularly known as WiFi).
– Modern (wired) Ethernet, by comparison, provides data rates on
the order of hundreds of megabits per second.
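The practical effect of these data rates follows from simple arithmetic: transfer time is the number of bits sent divided by the link rate. The specific rates below are assumed values chosen from the ranges quoted above. The 5 MB payload is hypothetical, and the calculation ignores protocol overhead, latency, and interference.

```python
# Illustrative link rates picked from the ranges quoted above (assumptions, not measurements).
RATES_BPS = {
    "2G cellular (100 kbit/s)": 100e3,
    "WiFi (20 Mbit/s)": 20e6,
    "Wired Ethernet (100 Mbit/s)": 100e6,
}

def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time: bits to send divided by bits per second (no overhead)."""
    return size_bytes * 8 / rate_bps

FILE_BYTES = 5e6  # hypothetical 5 MB query result

for link, rate in RATES_BPS.items():
    print(f"{link}: {transfer_seconds(FILE_BYTES, rate):.1f} s")
```

For the assumed 5 MB result this gives 400 s over 2G, 2 s over WiFi, and 0.4 s over wired Ethernet.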
Other characteristics that distinguish wireless connectivity
options include:
– interference,
– locality of access,
– range,
– support for packet switching,
– seamless roaming throughout a geographical region.
Some wireless networks, such as WiFi and Bluetooth, use
unlicensed areas of the frequency spectrum, which may
cause interference with other appliances, such as cordless
telephones.
Modern wireless networks can transfer data in units called
packets, as wired networks do, in order to
conserve bandwidth.
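Packet transmission can be sketched as splitting a message into numbered units that travel independently and are reassembled by sequence number at the receiver. This minimal sketch assumes a hypothetical `Packet` format carrying a sequence number and a total count. It shows only splitting and out-of-order reassembly, not how a real wireless protocol shares the channel or retransmits lost packets.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    seq: int        # position of this chunk in the original message
    total: int      # number of packets the message was split into
    payload: bytes

def packetize(message: bytes, size: int) -> list[Packet]:
    """Cut a message into fixed-size chunks, each tagged with its sequence number."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]

def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message from packets that may arrive out of order."""
    if not packets:
        return b""
    total = packets[0].total
    by_seq = {p.seq: p.payload for p in packets}
    missing = [s for s in range(total) if s not in by_seq]
    if missing:
        raise ValueError(f"missing packets: {missing}")
    return b"".join(by_seq[s] for s in range(total))

msg = b"SELECT * FROM EMPLOYEE WHERE DNO = 5"
pkts = packetize(msg, size=8)
random.shuffle(pkts)            # packets may arrive in any order
print(reassemble(pkts) == msg)  # True
```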
Client/Network Relationships
Mobile units can move freely in a geographic mobility
domain, an area that is circumscribed by wireless network
coverage.
– To manage it, the entire mobility domain is divided into one or more
smaller domains, called cells, each of which is supported by at
least one base station.
– Mobile units can move unrestricted throughout the cells of the domain,
while maintaining information access contiguity.
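Movement across cells can be sketched as a sequence of handoffs: as a mobile unit moves, responsibility for it passes from one base station to the next. This sketch assumes that a unit is served by the geometrically nearest base station (real networks typically decide on signal strength). The cell names and coordinates are hypothetical.

```python
import math

# Hypothetical base stations, one per cell, at (x, y) positions in km.
BASE_STATIONS = {"cell-A": (0.0, 0.0), "cell-B": (5.0, 0.0), "cell-C": (10.0, 0.0)}

def serving_cell(position: tuple[float, float]) -> str:
    """Assumption: a unit is served by the nearest base station."""
    return min(BASE_STATIONS, key=lambda c: math.dist(position, BASE_STATIONS[c]))

def track(path: list[tuple[float, float]]) -> list[tuple[str, str]]:
    """Return the handoffs (old cell, new cell) a mobile unit makes along its path."""
    handoffs = []
    current = serving_cell(path[0])
    for pos in path[1:]:
        cell = serving_cell(pos)
        if cell != current:
            handoffs.append((current, cell))  # connection passes to the new base station
            current = cell
    return handoffs

route = [(0.0, 0.0), (2.0, 0.0), (3.0, 0.0), (7.0, 1.0), (9.0, 0.0)]
print(track(route))  # [('cell-A', 'cell-B'), ('cell-B', 'cell-C')]
```

The unit on this route is handed from cell-A to cell-B and then to cell-C, but at every point it has a serving base station.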
The communication architecture described earlier is designed
to give the mobile unit the impression that it is attached to
a fixed network, emulating a traditional client-server
architecture.
Wireless communications, however, make other architectures
possible. One alternative is a mobile ad-hoc network
(MANET), illustrated in Figure 29.2.
In a MANET, co-located mobile units do not need to
communicate via a fixed network; instead, they form their
own network using cost-effective technologies such as Bluetooth.
In a MANET, mobile units are responsible for routing their
own data, effectively acting as base stations as well as
clients.
– Moreover, they must be robust enough to handle changes in the
network topology, such as the arrival or departure of other mobile
units.
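Because every mobile unit in a MANET may forward data for others, a route must be discovered over whatever links currently exist and rediscovered when the topology changes. The sketch below uses breadth-first search over a hypothetical neighbor graph as a simplified stand-in for real MANET routing protocols (on-demand protocols such as AODV flood route requests in a similar spirit). The unit names and helpers are hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical topology: which mobile units are currently in radio range of each other.
topology: dict[str, set[str]] = {
    "m1": {"m2", "m3"},
    "m2": {"m1", "m4"},
    "m3": {"m1", "m5"},
    "m4": {"m2", "m5"},
    "m5": {"m3", "m4"},
}

def find_route(links: dict[str, set[str]], src: str, dst: str) -> Optional[list[str]]:
    """Breadth-first route discovery over the current neighbor graph.
    Every intermediate unit on the returned path must forward the data."""
    previous: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in sorted(links.get(node, set())):  # sorted for a deterministic result
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # destination unreachable in the current topology

def depart(links: dict[str, set[str]], unit: str) -> None:
    """A unit leaves: drop it and every link that pointed to it."""
    for neighbor in links.pop(unit, set()):
        links.get(neighbor, set()).discard(unit)

print(find_route(topology, "m1", "m4"))  # ['m1', 'm2', 'm4']
depart(topology, "m2")                   # topology change: m2 moves out of range
print(find_route(topology, "m1", "m4"))  # ['m1', 'm3', 'm5', 'm4']
```

After m2 departs, the old route is no longer valid. A longer route through m3 and m5 is discovered instead.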
MANET applications can be considered as peer-to-peer,
meaning that a mobile unit is simultaneously a client and a
server.
– Transaction processing and data consistency control become more
difficult since there is no central control in this architecture.
– Resource discovery and data routing by mobile units make
computing in a MANET even more complicated.
– Sample MANET applications are multi-user games, shared
whiteboard, distributed calendars, and battle information sharing.
Performance
– For multimedia applications involving only documents
and text, performance constraints are subjectively
determined by the user.
– For applications involving video playback or audio-video
synchronization, physical limitations dominate.
Histology and cell biology delve into the tissue and cellular
levels and provide knowledge about the inner structure
and function of the cell. This wealth of information that
has been generated, classified, and stored for centuries
has only recently become a major application of
database technology.
Database | Major content | Initial technology | Current technology | DB problem areas | Primary data types
GenBank | DNA/RNA sequence, protein | Text files | Flat-file/ASN.1 | Schema browsing, schema evolution, linking to other dbs | Text, numeric, some complex types
OMIM | Disease phenotypes and genotypes, etc. | Index cards/text files | Flat-file/ASN.1 | Unstructured, free text entries, linking to other dbs | Text
GDB | Genetic map linkage data | Flat file | Relational | Schema expansion/evolution, complex objects, linking to other dbs | Text, numeric
ACEDB | Genetic map linkage data, sequence data (non-human) | OO | OO | Schema expansion/evolution, linking to other dbs | Text, numeric
HGMDB | Sequence and sequence variants | Flat file, application-specific | Flat file, application-specific | Schema expansion/evolution, linking to other dbs | Text