Rdms Full
Database-Management System
A database-management system (DBMS) is a collection of interrelated data and a set
of programs to access those data. The collection of data, usually referred to as the
database, contains information relevant to an enterprise. The primary goal of a DBMS
is to provide a way to store and retrieve database information that is both convenient
and efficient. Examples include MySQL and Oracle Database.
Database-System Applications
Databases are widely used. Here are some representative applications:
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets, and other accounting
information.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly
statements.
• Universities: For student information, course registrations, and grades (in addition
to standard enterprise information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to
use databases in a geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about the
communication networks.
The importance of database systems can be judged in another way: today, database
system vendors like Oracle are among the largest software companies in the world, and
database systems form an important part of the product line of Microsoft and IBM.
Purpose of Database Systems
Database systems arose in response to early methods of computerized management
of commercial data. As an example of such methods, typical of the 1960s, consider
part of a university organization that, among other data, keeps information about all
instructors, students, departments, and course offerings. One way to keep this
information on a computer is to store it in operating-system files and to write
application programs that manipulate the files, including programs to:
• Add new students, instructors, and courses.
• Register students for courses and generate class rosters.
• Assign grades to students, compute grade point averages (GPA), and generate
transcripts.
Keeping organizational information in such a file-processing system has a number of
major disadvantages:
• Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different
structures and the programs may be written in several programming languages. Moreover,
the same information may be duplicated in several files. This redundancy can lead to
data inconsistency; that is, the various copies of the same data may no longer agree.
• Difficulty in accessing data. Suppose that one of the university clerks needs to find
out the names of all students who live within a particular postal-code area. The clerk
asks the data-processing department to generate such a list. Because the designers of
the original system did not anticipate this request, there is no application program
on hand to meet it.
• Data isolation. Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the appropriate data
is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types
of consistency constraints. Suppose the university maintains an account for each
department, and records the balance amount in each account. Suppose also that the
university requires that the account balance of a department may never fall below
zero. In a file-processing system, such constraints have to be enforced by code in the
various application programs.
• Concurrent-access anomalies. For the sake of overall performance of the system and
faster response, many systems allow multiple users to update the data simultaneously.
Indeed, today, the largest Internet retailers may have millions of accesses per day to
their data by shoppers. Unless the system supervises them, concurrent updates can
interact and leave the data inconsistent.
• Security problems. Not every user of the database system should be able to access all
the data. For example, in a university, payroll personnel need to see only that part of
the database that has financial information.
View of Data
A database system is a collection of interrelated data and a set of programs that
allow users to access and modify these data. A major purpose of a database system is
to provide users with an abstract view of the data. That is, the system hides certain
details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency
has led designers to use complex data structures to represent data in the database.
Developers hide this complexity from users through several levels of abstraction:
• Physical level. The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in
the database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple
structures.
• View level. The highest level of abstraction describes only part of the entire
database. Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database.
Database Languages
A database system provides a data-definition language to specify the database
schema and a data-manipulation language to express database queries and updates.
In practice, the data-definition and data-manipulation languages are not two
separate languages; instead they simply form parts of a single database language,
such as the widely used SQL language.
SQL statements are commonly grouped into a data-manipulation language (DML), a
data-definition language (DDL), a data control language (DCL), a transaction control
language (TCL), and a data query language (DQL).
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or
manipulate data as organized by the appropriate data model. There are basically two
types of DML:
• Procedural DMLs require a user to specify what data are needed and how to get those
data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify
what data are needed without specifying how to get those data.
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language
called a data-definition language (DDL). The DDL is also used to specify additional
properties of the data.
• Domain constraints. A domain of possible values must be associated with every
attribute (for example, integer types, character types, date/time types).
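The split between DDL and DML can be tried out with a minimal sketch. It assumes Python's standard-library sqlite3 module as a stand-in DBMS; the instructor table and its two rows are illustrative sample data, not something these notes prescribe. Note that SQLite treats declared column types loosely, so the domain constraint here is weaker than in most systems.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# DDL: define the schema. The column types name each attribute's domain.
conn.execute(
    """
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        dept_name TEXT,
        salary    REAL
    )
    """
)

# DML: insert tuples (sample rows, chosen for illustration only).
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("22222", "Einstein", "Physics", 95000),
    ],
)

# Declarative DML: state what data are needed, not how to find them.
physics_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ?", ("Physics",)
    )
]
print(physics_names)
```

The select statement never says which file or index to read; choosing an access path is left to the system.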
Relational Databases
A relational database is based on the relational model and uses a collection of tables
to represent both data and the relationships among those data. It also includes a
DML and DDL.
Tables
Each table has multiple columns and each column has a unique name. Figure 1.2 presents
a sample relational database comprising two tables: one shows details of university
instructors and the other shows details of the various university departments.
Two data models are worth noting here:
• Relational model. The relational model uses a collection of tables to represent both
data and the relationships among those data. Each table has multiple columns, and each
column has a unique name.
• Object-based data model. Object-oriented programming (especially in Java, C++, or C#)
has become the dominant software-development methodology.
Database Design
Database systems are designed to manage large bodies of information. These large
bodies of information do not exist in isolation. They are part of the operation of some
enterprise whose end product may be information from the database or may be
some device or service for which the database plays only a supporting role.
Design Process
A high-level data model provides the database designer with a conceptual framework
in which to specify the data requirements of the database users, and how the
database will be structured to fulfill these requirements. The schema developed at
this conceptual-design phase provides a detailed overview of the enterprise. In a
specification of functional requirements, users describe the kinds of operations (or
transactions) that will be performed on the data. In the logical-design phase, the
designer maps the high-level conceptual schema onto the implementation data
model of the database system that will be used. The designer uses the resulting
system-specific database schema in the subsequent physical-design phase, in which
the physical features of the database are specified.
The Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called
entities, and relationships among these objects. An entity is a “thing” or “object” in
the real world that is distinguishable from other objects. For example, each person is
an entity, and bank accounts can be considered as entities.
Normalization
Another method for designing a relational database is to use a process commonly known
as normalization. The goal is to generate a set of relation schemas that allows us to
store information without unnecessary redundancy, yet also allows us to retrieve
information easily.
Data Storage and Querying
A database system is partitioned into modules that deal with each of the
responsibilities of the overall system. The functional components of a database system
can be broadly divided into the storage manager and the query processor
components. The query processor is important because it helps the database system
to simplify and facilitate access to data. The query processor allows database users to
obtain good performance while being able to work at the view level and not be
burdened with understanding the physical-level details of the implementation of the
system.
Storage Manager
The storage manager is the component of a database system that provides the
interface between the low-level data stored in the database and the application
programs and queries submitted to the system. The storage manager is responsible
for the interaction with the file manager. The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent
(correct) state despite system failures, and that concurrent transaction executions
proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory.
The Query Processor
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the
data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
Transaction Management
Often, several operations on the database form a single logical unit of work. An
example is a funds transfer, in which one department account (say A) is debited
and another department account (say B) is credited. Clearly, it is essential that either
both the credit and debit occur, or that neither occur. That is, the funds transfer must
happen in its entirety or not at all. This all-or-none requirement is called atomicity. A
transaction is a collection of operations that performs a single logical function in a
database application. Ensuring atomicity is the responsibility of the database system
itself, specifically of the recovery manager. In the absence of failures, all
transactions complete successfully, and atomicity is achieved easily.
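The all-or-none behaviour of the funds transfer can be illustrated with a small sketch. It assumes Python's standard-library sqlite3 module; the account table, the helper names transfer and balances, and the amounts are all hypothetical. A CHECK constraint stands in for the rule that a department balance may never fall below zero, so an oversized debit fails and the credit made just before it is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (dept TEXT PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit; return True if committed."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE dept = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE dept = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated CHECK (balance >= 0); the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a {dept: balance} dictionary."""
    return dict(conn.execute("SELECT dept, balance FROM account"))


ok_small = transfer(conn, "A", "B", 30.0)    # fits within A's balance
ok_large = transfer(conn, "A", "B", 500.0)   # would drive A below zero
final = balances(conn)
print(ok_small, ok_large, final)
```

After the failed second transfer, account B does not keep the 500 it was briefly credited with: neither half of the transfer survives.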
Database Architecture
We are now in a position to provide a single picture (Figure 1.5) of the various
components of a database system and the connections among them. The
architecture of a database system is greatly influenced by the underlying computer
system on which the database system runs. Database systems can be centralized, or
client-server, where one server machine executes work on behalf of multiple client
machines. Database systems can also be designed to exploit parallel computer
architectures.
Data Mining and Information Retrieval
The term data mining refers loosely to the process of semiautomatically analyzing
large databases to find useful patterns. Like knowledge discovery in artificial
intelligence (also called machine learning) or statistical analysis, data mining attempts
to discover rules and patterns from data. Some queries are complex, however, and
certain types of information cannot be extracted even by using SQL. Several techniques
and tools are available to help with decision support. Several tools for data analysis
allow analysts to view data in different ways. Other analysis tools precompute summaries
of very large amounts of data, in order to give fast responses to queries. The SQL
standard contains additional constructs to support data analysis.
Database Users and Administrators
A primary goal of a database system is to retrieve information from and store new
information into the database. People who work with a database can be categorized
as database users or database administrators.
Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way
they expect to interact with the system. Different types of user interfaces have been
designed for the different types of users. Sophisticated users interact with the system
without writing programs. Instead, they form their requests either using a database
query language or by using tools such as data analysis software. Specialized users are
sophisticated users who write specialized database applications that do not fit into
the traditional data-processing framework.
Database Administrator
One of the main reasons for using a DBMS is to have central control of both the data
and the programs that access those data. A person who has such central control over
the system is called a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set
of data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or
to alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access.
• Routine maintenance. Examples of the database administrator's routine maintenance
activities are:
◦ Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
History of Database Systems
Information processing drives the growth of computers, as it has from the earliest
days of commercial computers. In fact, automation of data processing tasks predates
computers. Punched cards, invented by Herman Hollerith, were used at the very
beginning of the twentieth century to record U.S. census data, and mechanical
systems were used to process the cards and tabulate results. Punched cards were
later widely used as a means of entering data into computers.
Techniques for data storage and processing have evolved over the years:
1950s and early 1960s: Magnetic tapes were developed for data storage. Data
processing tasks such as payroll were automated, with data stored on tapes.
Processing of data consisted of reading data from one or more tapes and writing data
to a new tape.
Late 1960s and 1970s: Widespread use of hard disks in the late 1960s changed the
scenario for data processing greatly, since hard disks allowed direct access to data.
1980s: Although academically interesting, the relational model was not used in
practice initially, because of its perceived performance disadvantages; relational
databases could not match the performance of existing network and hierarchical
databases.
Early 1990s: The SQL language was designed primarily for decision support
applications, which are query-intensive, yet the mainstay of databases in the 1980s
was transaction-processing applications, which are update-intensive.
1990s: The major event of the 1990s was the explosive growth of the World Wide
Web. Databases were deployed much more extensively than ever before.
2000s: The first half of the 2000s saw the emergence of XML and the associated query
language XQuery as a new database technology. Although XML is widely used for
data exchange, as well as for storing certain complex data types, relational databases
still form the core of a vast majority of large-scale database applications.
Unit-2
Structure of Relational Databases
A relational database consists of a collection of tables, each of which is assigned a
unique name. For example, consider the instructor table, which stores information about
instructors. The table has four column headers: ID, name, dept_name, and salary.
Each row of this table records information about an instructor, consisting of the
instructor's ID, name, dept_name, and salary. Similarly, the course table of Figure 2.2
stores information about courses, consisting of a course_id, title, dept_name, and
credits, for each course. Note that each instructor is identified by the value of the
column ID, while each course is identified by the value of the column course_id.
Database Schema
When we talk about a database, we must differentiate between the database
schema, which is the logical design of the database, and the database instance, which
is a snapshot of the data in the database at a given instant in time. The concept of a
relation corresponds to the programming-language notion of a variable, while the
concept of a relation schema corresponds to the programming-language notion of
type definition.
Some of the relation schemas of the university database are:
• student (ID, name, dept_name, tot_cred)
• advisor (s_id, i_id)
• takes (ID, course_id, sec_id, semester, year, grade)
• classroom (building, room_number, capacity)
• time_slot (time_slot_id, day, start_time, end_time)
Keys
We must have a way to specify how tuples within a given relation are distinguished.
This is expressed in terms of their attributes. That is, the values of the attributes
of a tuple must be such that they can uniquely identify the tuple. In other
words, no two tuples in a relation are allowed to have exactly the same value for all
attributes. A superkey is a set of one or more attributes that, taken collectively, allow
us to identify uniquely a tuple in the relation. We are often interested in superkeys
for which no proper subset is a superkey. Such minimal superkeys are called
candidate keys. We shall use the term primary key to denote a candidate key that is
chosen by the database designer as the principal means of identifying tuples within a
relation. A relation, say r1, may include among its attributes the primary key of
another relation, say r2. This attribute is called a foreign key from r1, referencing r2.
The relation r1 is also called the referencing relation of the foreign key dependency,
and r2 is called the referenced relation of the foreign key. For example, the attribute
dept_name in instructor is a foreign key from instructor, referencing department, since
dept_name is the primary key of department.
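How a system enforces primary and foreign keys can be seen in a minimal sketch, assuming Python's standard-library sqlite3 module. The tables are cut-down versions of instructor and department, and the helper try_insert and the sample rows are hypothetical. SQLite only checks foreign keys once the foreign_keys pragma is switched on, which is a quirk of that system rather than of the relational model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT)")
conn.execute(
    """
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        dept_name TEXT REFERENCES department (dept_name)
    )
    """
)
conn.execute("INSERT INTO department VALUES ('Physics', 'Watson')")
conn.execute("INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics')")


def try_insert(row):
    """Insert an instructor row; return None on success, else the error text."""
    try:
        conn.execute("INSERT INTO instructor VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Same ID as Einstein: rejected, because ID is the primary key.
duplicate_id = try_insert(("22222", "Gold", "Physics"))
# No 'Astronomy' row in department: rejected by the foreign key.
dangling_dept = try_insert(("33456", "Gold", "Astronomy"))
print(duplicate_id)
print(dangling_dept)
```

Here instructor is the referencing relation and department the referenced relation, matching the terminology above.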
Schema Diagrams
A database schema, along with primary key and foreign key dependencies, can be
depicted by schema diagrams. Figure 2.8 shows the schema diagram for our
university organization. Each relation appears as a box, with the relation name at the
top in blue, and the attributes listed inside the box. Primary key attributes are shown
underlined. Foreign key dependencies appear as arrows from the foreign key
attributes of the referencing relation to the primary key of the referenced relation.
Referential integrity constraints other than foreign key constraints are not shown
explicitly in schema diagrams.
Relational Query Languages
A query language is a language in which a user requests information from the
database. These languages are usually on a level higher than that of a standard
programming language. Query languages can be categorized as either procedural or
nonprocedural. In a procedural language, the user instructs the system to perform a
sequence of operations on the database to compute the desired result. In a
nonprocedural language, the user describes the desired information without giving a
specific procedure for obtaining that information.
There are a number of “pure” query languages: The relational algebra is procedural,
whereas the tuple relational calculus and domain relational calculus are
nonprocedural. These query languages are terse and formal, lacking the “syntactic
sugar” of commercial languages, but they illustrate the fundamental techniques for
extracting data from the database.
For a union operation r ∪ s to be valid, two conditions must hold:
1. The relations r and s must be of the same arity. That is, they must have the same
number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the same,
for all i.
The Set-Difference Operation
The set-difference operation, denoted by −, allows us to find tuples that are in one
relation but are not in another. The expression r − s produces a relation containing
those tuples in r but not in s.
The Cartesian-Product Operation
The Cartesian-product operation, denoted by a cross (×), allows us to combine
information from any two relations. We write the Cartesian product of relations r1 and
r2 as r1 × r2.
A query can also be written so that the selection that restricts dept_name to Physics
is applied to instructor first, and the Cartesian product is applied subsequently; in
contrast, the Cartesian product was applied before the selection in the earlier query.
However, the two queries are equivalent; that is, they give the same result on any
database.
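The equivalence of the two orderings can be checked on a toy instance. The sketch below models relations as Python sets of tuples; the helper names select and cartesian, the predicate functions, and the sample tuples are all assumptions made for illustration, and agreement on one small instance is of course not a proof of equivalence.

```python
from itertools import product

# Relations as sets of tuples (sample data):
# instructor(ID, name, dept_name) and teaches(ID, course_id).
instructor = {
    ("10101", "Srinivasan", "Comp. Sci."),
    ("22222", "Einstein", "Physics"),
}
teaches = {("10101", "CS-101"), ("22222", "PHY-101")}


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def cartesian(r1, r2):
    """Cartesian product: concatenate every tuple of r1 with every tuple of r2."""
    return {a + b for a, b in product(r1, r2)}


def is_physics(t):
    return t[2] == "Physics"   # dept_name sits at position 2 in both plans


def ids_match(t):
    return t[0] == t[3]        # instructor.ID = teaches.ID in the product


# Plan 1: Cartesian product first, then both selections.
plan1 = select(select(cartesian(instructor, teaches), is_physics), ids_match)
# Plan 2: restrict instructor to Physics first, then take the product.
plan2 = select(cartesian(select(instructor, is_physics), teaches), ids_match)

# Set difference r - s: the instructors who are not in Physics.
not_physics = instructor - select(instructor, is_physics)
print(plan1 == plan2, not_physics)
```

Plan 2 builds a smaller intermediate product, which is why a query optimizer typically prefers to apply selections early.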
The Rename Operation
Unlike relations in the database, the results of relational-algebra expressions do not
have a name that we can use to refer to them. It is useful to be able to give them
names; the rename operator, denoted by the lowercase Greek letter rho (ρ), lets us do
this. Given a relational-algebra expression E, the expression ρ_x(E) returns the result
of expression E under the name x.
Formal Definition of the Relational Algebra
The operations described above allow us to give a complete definition of an expression
in the relational algebra. A basic expression in the relational algebra consists of
one of the following:
• A relation in the database
• A constant relation
Additional Relational-Algebra Operations
The fundamental operations of the relational algebra are sufficient to express any
relational-algebra query. However, if we restrict ourselves to just the fundamental
operations, certain common queries are lengthy to express. Therefore, we define
additional operations that do not add any power to the algebra, but simplify
common queries.
The Set-Intersection Operation
The first additional relational-algebra operation that we shall define is set
intersection (∩). Suppose that we wish to find the set of all courses taught in both
the Fall 2009 and the Spring 2010 semesters. The result relation for this query appears
in Figure 6.13. Note that we can rewrite any relational-algebra expression that uses
set intersection by replacing the intersection operation with a pair of set-difference
operations.
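The rewrite with a pair of set differences is the identity r ∩ s = r − (r − s). A minimal sketch with Python sets follows; the function name intersect_via_difference and the course_id values are made up for illustration.

```python
# course_ids taught in each semester (made-up sample values).
fall_2009 = {"CS-101", "CS-347", "PHY-101"}
spring_2010 = {"CS-101", "CS-315", "FIN-201"}


def intersect_via_difference(r, s):
    """Compute r ∩ s as r − (r − s), using only set difference."""
    return r - (r - s)


both = intersect_via_difference(fall_2009, spring_2010)
print(both)
```

The inner difference r − s removes everything r shares with s; subtracting that from r leaves exactly the shared tuples.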
The Natural-Join Operation
It is often desirable to simplify certain queries that require a Cartesian product.
Usually, a query that involves a Cartesian product includes a selection operation on
the result of the Cartesian product. The selection operation most often requires that
all attributes that are common to the relations that are involved in the Cartesian
product be equated.
The Natural Join
In our example query that combined information from the instructor and teaches tables,
the matching condition required instructor.ID to be equal to teaches.ID. These are the
only attributes in the two relations that have the same name. In fact this is a common
case; that is, the matching condition in the from clause most often requires all
attributes with matching names to be equated.
Additional Basic Operations
There are a number of additional basic operations that are supported in SQL.
The Rename Operation
Consider again the query that we used earlier. The names of the attributes in the
result are derived from the names of the attributes in the relations in the from
clause. We cannot, however, always derive names in this way, for several reasons:
First, two relations in the from clause may have attributes with the same name, in
which case an attribute name is duplicated in the result. Second, if we used an
arithmetic expression in the select clause, the resultant attribute does not have a
name.
String Operations
SQL specifies strings by enclosing them in single quotes, for example, 'Computer'. A
single quote character that is part of a string can be specified by using two single
quote characters; for example, the string "It's right" can be specified by
'It''s right'. The SQL standard specifies that the equality operation on strings is
case sensitive; as a result the expression 'comp. sci.' = 'Comp. Sci.' evaluates to
false. However, some database systems, such as MySQL and SQL Server, do not distinguish
uppercase from lowercase when matching strings; as a result, 'comp. sci.' =
'Comp. Sci.' would evaluate to true on these systems.
Attribute Specification in Select Clause
The asterisk symbol “*” can be used in the select clause to denote “all attributes.”
Thus, the use of instructor.* in the select clause of a query indicates that all
attributes of instructor are to be selected.
Ordering the Display of Tuples
SQL offers the user some control over the order in which tuples in a relation are
displayed. The order by clause causes the tuples in the result of a query to appear in
sorted order.
Where Clause Predicates
SQL includes a between comparison operator to simplify where clauses that specify
that a value be less than or equal to some value and greater than or equal to some
other value.
Set Operations
The SQL operations union, intersect, and except operate on relations and correspond
to the mathematical set-theory operations ∪, ∩, and −. We shall now construct queries
involving the union, intersect, and except operations over two sets.
The Union Operation
The union operation automatically eliminates duplicates, unlike the select clause.
Thus, using the section relation of Figure 2.6, where two sections of CS-319 are
offered in Spring 2010, and a section of CS-101 is offered in the Fall 2009 as well as
in the Fall 2010 semester, CS-101 and CS-319 each appear only once in the result.
The Except Operation
The except operation also eliminates duplicates; to retain them, we write except all
in place of except. The number of duplicate copies of a tuple in the result is then
equal to the number of duplicate copies in c1 minus the number of duplicate copies in
c2, provided that the difference is positive, where c1 and c2 are the two input
collections. Thus, if 4 sections of ECE-101 were taught in the Fall 2009 semester and
2 sections of ECE-101 were taught in Spring 2010, then there are 2 tuples with ECE-101
in the result.
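The duplicate-counting rule can be mimicked with multisets. The sketch below uses Python's collections.Counter rather than SQL, because not every system supports except all (SQLite, for one, does not); the section counts other than the ECE-101 figures from the text are made up.

```python
from collections import Counter

# Multisets of course_ids; each count is the number of sections offered.
c1 = Counter({"ECE-101": 4, "CS-101": 1})   # Fall 2009 (CS-101 count is made up)
c2 = Counter({"ECE-101": 2, "CS-319": 2})   # Spring 2010 (CS-319 count is made up)

# union: duplicates are eliminated, so only the distinct values matter.
union = set(c1) | set(c2)

# except all: count in c1 minus count in c2, kept only when positive.
# Counter subtraction follows the same rule and drops non-positive counts.
except_all = c1 - c2

# except (without all): plain set difference on the distinct values.
except_distinct = set(c1) - set(c2)

print(sorted(union), dict(except_all), except_distinct)
```

ECE-101 survives except all with 4 − 2 = 2 copies, but disappears from plain except because it occurs in both inputs.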
Aggregation with Grouping
There are circumstances where we would like to apply the aggregate function not only
to a single set of tuples, but also to a group of sets of tuples; we specify this wish
in SQL using the group by clause. The attribute or attributes given in the group by
clause are used to form groups.
The Having Clause
At times, it is useful to state a condition that applies to groups rather than to
tuples. For example, we might be interested in only those departments where the average
salary of the instructors is more than $42,000.
Aggregation with Null and Boolean Values
Null values, when they exist, complicate the processing of aggregate operators. For
example, assume that some tuples in the instructor relation have a null value for
salary, and consider a query that totals all salary amounts with sum(salary). Rather
than making the overall sum null, the SQL standard says that the sum operator should
ignore null values in its input.
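Grouping, the having clause, and the treatment of null salaries can be tried together in one short sketch. It assumes Python's standard-library sqlite3 module, and the four instructor rows (including one null salary) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("1", "A", "Physics", 90000),
        ("2", "B", "Physics", 80000),
        ("3", "C", "Music", 40000),
        ("4", "D", "Music", None),   # null salary
    ],
)

# group by forms one group per department; having filters whole groups.
high_paying = conn.execute(
    """
    SELECT dept_name, AVG(salary)
    FROM instructor
    GROUP BY dept_name
    HAVING AVG(salary) > 42000
    ORDER BY dept_name
    """
).fetchall()

# sum skips the null salary instead of making the total null.
total = conn.execute("SELECT SUM(salary) FROM instructor").fetchone()[0]
# count(*) counts tuples; count(salary) counts only non-null salaries.
counts = conn.execute("SELECT COUNT(*), COUNT(salary) FROM instructor").fetchone()
print(high_paying, total, counts)
```

The Music average is computed over the single non-null salary, so the department falls below the threshold and is filtered out as a group.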
Nested Subqueries
SQL provides a mechanism for nesting subqueries. A subquery is a select-from-where
expression that is nested within another query. A common use of subqueries is to
perform tests for set membership, make set comparisons, and determine set
cardinality, by nesting subqueries in the where clause. We study such uses of nested
subqueries in the where clause.
Set Membership
SQL allows testing tuples for membership in a relation. The in connective tests for
set membership, where the set is a collection of values produced by a select clause.
The not in connective tests for the absence of set membership.
Set Comparison
As an example of the ability of a nested subquery to compare sets, consider the query
“Find the names of all instructors whose salary is greater than at least one instructor
in the Biology department.”
Test for Empty Relations
SQL includes a feature for testing whether a subquery has any tuples in its result.
The exists construct returns the value true if the argument subquery is nonempty.
Using the exists construct, we can write the query “Find all courses taught in both the
Fall 2009 semester and the Spring 2010 semester” in yet another way.
Test for the Absence of Duplicate Tuples
SQL includes a boolean function for testing whether a subquery has duplicate tuples in
its result. The unique construct returns the value true if the argument subquery
contains no duplicate tuples.
Subqueries in the From Clause
SQL allows a subquery expression to be used in the from clause.
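Three of these uses (in, exists, and a subquery in the from clause) can be sketched against a small section table. The example assumes Python's standard-library sqlite3 module, and the rows are sample data. The set-comparison and unique constructs are left out because SQLite does not implement them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE section (course_id TEXT, semester TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?)",
    [
        ("CS-101", "Fall", 2009),
        ("CS-101", "Spring", 2010),
        ("CS-347", "Fall", 2009),
        ("CS-319", "Spring", 2010),
    ],
)

# Set membership: Fall 2009 courses whose course_id is IN the Spring 2010 set.
with_in = conn.execute(
    """
    SELECT DISTINCT course_id FROM section
    WHERE semester = 'Fall' AND year = 2009
      AND course_id IN (SELECT course_id FROM section
                        WHERE semester = 'Spring' AND year = 2010)
    """
).fetchall()

# The same question with EXISTS and a correlated subquery.
with_exists = conn.execute(
    """
    SELECT DISTINCT S.course_id FROM section AS S
    WHERE S.semester = 'Fall' AND S.year = 2009
      AND EXISTS (SELECT * FROM section AS T
                  WHERE T.semester = 'Spring' AND T.year = 2010
                    AND T.course_id = S.course_id)
    """
).fetchall()

# Subquery in the FROM clause: count sections per course, then filter.
busy = conn.execute(
    """
    SELECT course_id, n
    FROM (SELECT course_id, COUNT(*) AS n FROM section GROUP BY course_id)
    WHERE n > 1
    """
).fetchall()
print(with_in, with_exists, busy)
```

The in and exists formulations return the same answer here; which one reads better is largely a matter of taste.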
Modification of the Database
We have restricted our attention until now to the extraction of information from the
database. Now, we show how to add, remove, or change information with SQL.
Deletion
A delete request is expressed in much the same way as a query.
Insertion
To insert data into a relation, we either specify a tuple to be inserted or write a
query whose result is a set of tuples to be inserted. Obviously, the attribute values
for inserted tuples must be members of the corresponding attribute's domain.
Updates
In certain situations, we may wish to change a value in a tuple without changing all
values in the tuple. For this purpose, the update statement can be used. As we could
for insert and delete, we can choose the tuples to be updated by using a query.
Join Expressions
Earlier, we introduced the natural join operation. SQL provides other forms of the join
operation, including the ability to specify an explicit join predicate, and the ability
to include in the result tuples that are excluded by natural join. We shall discuss
these forms of join in this section.
Join Conditions
The result of the above query is shown in Figure 4.3. The on condition can express any
SQL predicate, and thus a join expression using the on condition can express a richer
class of join conditions than natural join. However, as illustrated by our preceding
example, a query that uses an on condition can be rewritten without it by moving the
predicate to the where clause.
Outer Joins
Suppose we wish to display a list of all students, displaying their ID, name,
dept_name, and tot_cred, along with the courses that they have taken. A natural join of
student and takes may appear to retrieve the required information, but a student who
has taken no courses would not appear in its result. The outer join operation preserves
such tuples by padding the missing attributes with null values.
Join Types
A join clause can specify inner join instead of outer join to specify that a normal
join is to be used. The keyword inner is, however, optional. The default join type,
when the join clause is used without the outer prefix, is the inner join.
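The difference between an inner and an outer join shows up as soon as one student has no takes tuples. The sketch below assumes Python's standard-library sqlite3 module; the two tables are cut down to the columns needed and the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE takes (ID TEXT, course_id TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)", [("1", "Shankar"), ("2", "Snow")]
)
conn.execute("INSERT INTO takes VALUES ('1', 'CS-101')")   # Snow takes no courses

# Inner (natural) join: students without a matching takes tuple are dropped.
inner = conn.execute(
    "SELECT name, course_id FROM student NATURAL JOIN takes ORDER BY name"
).fetchall()

# Left outer join: every student is kept; missing course_id becomes null.
left_outer = conn.execute(
    "SELECT name, course_id FROM student NATURAL LEFT OUTER JOIN takes ORDER BY name"
).fetchall()
print(inner)
print(left_outer)
```

In the outer-join result the student with no courses appears with None (SQL null) in place of a course_id.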
Views
In our examples up to this point, we have operated at the logical-model level. That is,
we have assumed that the relations in the collection we are given are the actual
relations stored in the database.
View Definition
We define a view in SQL by using the create view command. To define a view, we must give
the view a name and must state the query that computes the view. The form of the create
view command is create view v as <query expression>, where <query expression> is any
legal query expression and v is the view name.
Using Views in SQL Queries
Once we have defined a view, we can use the view name to refer to the virtual relation
that the view generates. Using the view physics_fall_2009, we can find all Physics
courses offered.
Materialized Views
Certain database systems allow view relations to be stored, but they make sure that, if
the actual relations used in the view definition change, the view is kept up-to-date.
Such views are called materialized views.
Update of a View
Although views are a useful tool for queries, they present serious problems if we
express updates, insertions, or deletions with them.
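A view can be sketched in a few lines, assuming Python's standard-library sqlite3 module. The view name faculty and the sample rows are hypothetical. SQLite treats views as read-only, so the attempted insert at the end simply fails; other systems allow updates through some simple views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary REAL)"
)
conn.execute("INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics', 95000)")

# A view that hides the salary attribute (hypothetical name `faculty`).
conn.execute("CREATE VIEW faculty AS SELECT ID, name, dept_name FROM instructor")

before = conn.execute("SELECT name FROM faculty ORDER BY name").fetchall()

# The view is not stored: it is recomputed from instructor on every use,
# so new base tuples show up without redefining the view.
conn.execute(
    "INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)"
)
after = conn.execute("SELECT name FROM faculty ORDER BY name").fetchall()

# Inserting through the view: SQLite rejects this because its views are read-only.
try:
    conn.execute("INSERT INTO faculty VALUES ('30765', 'Green', 'Music')")
    view_insert_error = None
except sqlite3.OperationalError as exc:
    view_insert_error = str(exc)
print(before, after, view_insert_error)
```

The rejected insert hints at the underlying difficulty: the view has no salary column, so the system would have to invent a value for it in the base relation.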
Transactions
A transaction consists of a sequence of query and/or update statements. The SQL
standard specifies that a transaction begins implicitly when an SQL statement is
executed. One of the following SQL statements must end the transaction:
• Commit work commits the current transaction; that is, it makes the updates
performed by the transaction become permanent in the database. After the transaction
is committed, a new transaction is automatically started.
• Rollback work causes the current transaction to be rolled back; that is, it undoes
all the updates performed by the SQL statements in the transaction. Thus, the
database state is restored to what it was before the first statement of the
transaction was executed.
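Commit and rollback can be tried out with a small sketch in Python's sqlite3 module, where conn.commit() and conn.rollback() play the roles of commit work and rollback work. The account table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text primary key, balance integer)")

# First transaction: begins implicitly with the insert and is made permanent by commit.
conn.execute("insert into account values ('A-101', 500)")
conn.commit()

# Second transaction: the update is visible inside the transaction...
conn.execute("update account set balance = balance - 100 where account_number = 'A-101'")
inside = conn.execute("select balance from account").fetchone()[0]

# ...but rollback restores the state from before the transaction's first statement.
conn.rollback()
after_rollback = conn.execute("select balance from account").fetchone()[0]
```

The value read inside the transaction is 400, and after the rollback it is 500 again.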
Integrity Constraints
Integrity constraints ensure that changes made to the database by authorized users
do not result in a loss of data consistency. Thus, integrity constraints guard against
accidental damage to the database. Examples of integrity constraints are:
• An instructor name cannot be null.
• No two instructors can have the same instructor ID.
• Every department name in the course relation must have a matching department
name in the department relation.
• The budget of a department must be greater than $0.00.
Constraints on a Single Relation We described how to define tables using the create
table command. The create table command may also include integrity-constraint
statements.
Not Null Constraint As we discussed in Chapter 3, the null value is a member of all
domains, and as a result is a legal value for every attribute in SQL by default. For
certain attributes, however, null values may be inappropriate.
The check Clause When applied to a relation declaration, the clause check(P)
specifies a predicate P that must be satisfied by every tuple in a relation.
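As a minimal sketch of both constraints, the example below declares a department table whose budget is not null and must satisfy check (budget > 0), matching the budget rule listed earlier. The table layout and the inserted rows are illustrative assumptions, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table department (
        dept_name text primary key,
        budget    numeric not null check (budget > 0)
    )
""")

def try_insert(dept_name, budget):
    """Hypothetical helper: report whether the database accepts the tuple."""
    try:
        conn.execute("insert into department values (?, ?)", (dept_name, budget))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert("History", None),   # violates not null
    try_insert("Music", 0),        # violates check (budget > 0)
    try_insert("Biology", 90000),  # satisfies both constraints
]
```

The database system itself rejects the first two tuples, so the application does not have to repeat these checks.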
SQL Data Types and Schemas
We covered a number of built-in data types supported in SQL, such as integer types,
real types, and character types. There are additional built-in data types supported by
SQL, which we describe below. We also describe how to create basic user-defined
types in SQL.
Date and Time Types in SQL In addition to the basic data types we introduced in
Section 3.2, the SQL standard supports several data types relating to dates and times:
• date: A calendar date containing a (four-digit) year, month, and day of the month.
• time: The time of day, in hours, minutes, and seconds. A variant, time(p), can be
used to specify the number of fractional digits for seconds (the default being 0). It
is also possible to store time-zone information along with the time by specifying
time with timezone.
• timestamp: A combination of date and time. A variant, timestamp(p), can be used to
specify the number of fractional digits for seconds (the default here being 6).
Default Values The default value of the tot_cred attribute is declared to be 0. As a
result, when a tuple is inserted into the student relation, if no value is provided for
the tot_cred attribute, its value is set to 0.
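A short sketch of this behaviour with Python's sqlite3 module follows. The student columns follow the attributes named in these notes, while the inserted row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student (
        ID        text primary key,
        name      text not null,
        dept_name text,
        tot_cred  integer default 0
    )
""")

# No value is supplied for tot_cred, so the declared default is used.
conn.execute("insert into student (ID, name, dept_name) values ('12789', 'Newman', 'Comp. Sci.')")
tot_cred = conn.execute("select tot_cred from student where ID = '12789'").fetchone()[0]
```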
Index Creation An index on an attribute of a relation is a data structure that allows
the database system to find those tuples in the relation that have a specified value
for that attribute efficiently, without scanning through all the tuples of the relation.
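The sketch below creates an index and asks SQLite how it would run a lookup on the indexed attribute. The index name dept_index and the sample rows are assumptions. The explain query plan statement is specific to SQLite, and other systems expose their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (ID text primary key, name text, dept_name text, tot_cred integer)")
conn.executemany(
    "insert into student values (?, ?, ?, ?)",
    [("1", "Shankar", "Comp. Sci.", 32), ("2", "Snow", "Physics", 0), ("3", "Peltier", "Physics", 56)],
)

# Build an index on the dept_name attribute of student.
conn.execute("create index dept_index on student (dept_name)")

# Ask SQLite for its plan; the last column of each row is a textual description.
plan = conn.execute(
    "explain query plan select name from student where dept_name = 'Physics'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

names = sorted(r[0] for r in conn.execute("select name from student where dept_name = 'Physics'"))
```

The plan description names dept_index, which indicates that the lookup goes through the index rather than scanning every tuple.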
Large-Object Types For result tuples containing large objects (multiple megabytes to
gigabytes), it is inefficient or impractical to retrieve an entire large object into
memory. Instead, an application would usually use an SQL query to retrieve a
“locator” for a large object and then use the locator to manipulate the object from
the host language in which the application itself is written.
User-Defined Types SQL supports two forms of user-defined data types. The first
form, which we cover here, is called distinct types. The other form, called structured
data types, allows the creation of complex data types with nested record structures
and arrays. A good type system should be able to detect assignments or comparisons
between values that share an underlying type but mean different things. To support
such checks, SQL provides the notion of distinct types. The create type clause can be
used to define new types.
Unit-4
The Tuple Relational Calculus
When we write a relational-algebra expression, we provide a sequence of procedures
that generates the answer to our query. The tuple relational calculus, by contrast, is
a nonprocedural query language. It describes the desired information without giving a
specific procedure for obtaining that information. A query in the tuple relational
calculus is expressed as: {t | P(t)}, that is, the set of all tuples t such that
predicate P is true for t.
Example Queries Find the ID, name, dept_name, salary for instructors whose salary is
greater than $80,000: {t | t ∈ instructor ∧ t[salary] > 80000} Suppose that we want
only the ID attribute, rather than all attributes of the instructor relation. To write
this query in the tuple relational calculus, we need to write an expression for a
relation on the schema (ID). We need those tuples on (ID) such that there is a tuple in
instructor with the salary attribute > 80000. To express this request, we need the
construct “there exists” from mathematical logic.
Formal Definition We are now ready for a formal definition. A
tuple-relational-calculus expression is of the form: {t | P(t)} where P is a formula.
Several tuple variables may appear in a formula. A tuple variable is said to be a free
variable unless it is quantified by a ∃ or ∀. Thus, in: t ∈ instructor ∧ ∃ s ∈
department (t[dept_name] = s[dept_name]) t is a free variable. Tuple variable s is said
to be a bound variable. A tuple-relational-calculus formula is built up out of atoms.
An atom has one of the following forms:
• s ∈ r, where s is a tuple variable and r is a relation.
• s[x] Θ u[y], where s and u are tuple variables, x and y are attributes, and Θ is a
comparison operator (<, ≤, =, ≠, >, ≥).
• s[x] Θ c, where c is a constant in the domain of attribute x.
Safety of Expressions There is one final issue to be addressed. A
tuple-relational-calculus expression may generate an infinite relation. Suppose that
we write the expression: {t | ¬ (t ∈ instructor)} There are infinitely many tuples that
are not in instructor. Most of these tuples contain values that do not even appear in
the database! Clearly, we do not wish to allow such expressions. To help us define a
restriction of the tuple relational calculus, we introduce the concept of the domain
of a tuple relational formula, P. Intuitively, the domain of P, denoted dom(P), is the
set of all values referenced by P. They include values mentioned in P itself, as well
as values that appear in a tuple of a relation mentioned in P.
Expressive Power of Languages The tuple relational calculus restricted to safe
expressions is equivalent in expressive power to the basic relational algebra (with
the operators ∪, −, ×, σ, and ρ, but without the extended relational operations such as
generalized projection and aggregation (G)). Thus, for every relational-algebra
expression using only the basic operations, there is an equivalent expression in the
tuple relational calculus, and for every tuple-relational-calculus expression, there
is an equivalent relational-algebra expression.
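The set-builder notation of the tuple relational calculus maps closely onto Python set comprehensions, which also state what is wanted rather than how to loop. The sketch below mirrors the two example queries. The three instructor tuples are hypothetical sample data, and the comprehension is only an analogy, not how a database evaluates the query.

```python
from collections import namedtuple

# Hypothetical instance of the instructor relation.
Instructor = namedtuple("Instructor", ["ID", "name", "dept_name", "salary"])
instructor = {
    Instructor("10101", "Srinivasan", "Comp. Sci.", 65000),
    Instructor("22222", "Einstein", "Physics", 95000),
    Instructor("12121", "Wu", "Finance", 90000),
}

# {t | t ∈ instructor ∧ t[salary] > 80000}
high_paid = {t for t in instructor if t.salary > 80000}

# {t | ∃ s ∈ instructor (t[ID] = s[ID] ∧ s[salary] > 80000)}
# The result is a relation on the schema (ID), so each tuple has one component.
high_paid_ids = {(s.ID,) for s in instructor if s.salary > 80000}
```

Both comprehensions range only over tuples that are actually in instructor, which is why they are safe in the sense described above: they cannot produce values from outside the database.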
The Domain Relational Calculus A second form of relational calculus, called domain
relational calculus, uses domain variables that take on values from an attribute's
domain, rather than values for an entire tuple. The domain relational calculus,
however, is closely related to the tuple relational calculus. Domain relational
calculus serves as the theoretical basis of the widely used QBE language (see Appendix
B.1), just as relational algebra serves as the basis for the SQL language.
Formal Definition An expression in the domain relational calculus is of the form
{< x1, x2,...,xn > | P(x1, x2,...,xn)} where x1, x2,...,xn represent domain variables.
P represents a formula composed of atoms, as was the case in the tuple relational
calculus. An atom in the domain relational calculus has one of the following forms:
• < x1, x2,...,xn > ∈ r, where r is a relation on n attributes and x1, x2,...,xn are
domain variables or domain constants.
• x Θ y, where x and y are domain variables and Θ is a comparison operator (<, ≤, =,
≠, >, ≥). We require that attributes x and y have domains that can be compared.
Example Queries We now give domain-relational-calculus queries for the examples
that we considered earlier. Note the similarity of these expressions and the
corresponding tuple-relational-calculus expressions.
• Find the instructor ID, name, dept_name, and salary for instructors whose salary is
greater than $80,000: {< i, n, d, s > | < i, n, d, s > ∈ instructor ∧ s > 80000}
• Find all instructor IDs for instructors whose salary is greater than $80,000:
{< i > | ∃ n, d, s (< i, n, d, s > ∈ instructor ∧ s > 80000)}
Although the second query appears similar to the one that we wrote for the tuple
relational calculus, there is an important difference. In the tuple calculus, when we
write ∃ s for some tuple variable s, we bind it immediately to a relation by writing
∃ s ∈ r. However, when we write ∃ n in the domain calculus, n refers not to a tuple,
but to a domain value.
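The domain-calculus versions can be sketched in Python in the same spirit, this time with one variable per attribute instead of one variable per tuple. The instructor rows are hypothetical sample data.

```python
# Hypothetical instance of instructor as plain (ID, name, dept_name, salary) tuples.
instructor = {
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("22222", "Einstein", "Physics", 95000),
    ("12121", "Wu", "Finance", 90000),
}

# {<i, n, d, s> | <i, n, d, s> ∈ instructor ∧ s > 80000}
high_paid = {(i, n, d, s) for (i, n, d, s) in instructor if s > 80000}

# {<i> | ∃ n, d, s (<i, n, d, s> ∈ instructor ∧ s > 80000)}
# n, d and s are bound by the loop; only the free variable i appears in the result.
high_paid_ids = {(i,) for (i, n, d, s) in instructor if s > 80000}
```

Unpacking each tuple into i, n, d, s is the Python counterpart of writing < i, n, d, s > ∈ instructor: the variables stand for individual domain values, not for the tuple as a whole.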
Safety of Expressions We noted that, in the tuple relational calculus (Section 6.2), it
is possible to write expressions that may generate an infinite relation. That led us
to define safety for tuple-relational-calculus expressions. A similar situation arises
for the domain relational calculus. An expression such as {< i, n, d, s > | ¬(< i, n,
d, s > ∈ instructor)} is unsafe, because it allows values in the result that are not in
the domain of the expression. For the domain relational calculus, we must be concerned
also about the form of formulae within “there exists” and “for all” clauses. Consider
the expression {< x > | ∃ y (< x, y > ∈ r) ∧ ∃ z (¬(< x, z > ∈ r) ∧ P(x, z))} where P is
some formula involving x and z. We can test the first part of the formula, ∃ y (< x, y
> ∈ r), by considering only the values in r.
Expressive Power of Languages When the domain relational calculus is restricted to
safe expressions, it is equivalent in expressive power to the tuple relational
calculus restricted to safe expressions. Since we noted earlier that the restricted
tuple relational calculus is equivalent to the relational algebra, all three of the
following are equivalent:
• The basic relational algebra (without the extended relational-algebra operations)
• The tuple relational calculus restricted to safe expressions
• The domain relational calculus restricted to safe expressions
Overview of the Design Process
The design of a complete database application environment that meets the needs of
the enterprise being modeled requires attention to a broad set of issues. These
additional aspects of the expected use of the database influence a variety of design
choices at the physical, logical, and view levels.
Design Phases The initial phase of database design is to characterize fully the data
needs of the prospective database users. The database designer needs to interact
extensively with domain experts and users to carry out this task. The outcome of this
phase is a specification of user requirements. Next, the designer chooses a data
model and, by applying the concepts of the chosen data model, translates these
requirements into a conceptual schema of the database. The schema developed at
this conceptual-design phase provides a detailed overview of the enterprise.
Design Alternatives A major part of the database design process is deciding how to
represent in the design the various types of “things” such as people, places, products,
and the like. We use the term entity to refer to any such distinctly identifiable item.
In a university database, examples of entities would include instructors, students,
departments, courses, and course offerings.
Redundancy: A bad design may repeat information. For example, if we store the course
identifier and title of a course with each course offering, the title would be stored
redundantly (that is, multiple times, unnecessarily) with each course offering.
Incompleteness: A bad design may make certain aspects of the enterprise difficult or
impossible to model. For example, suppose that, as in case (1) above, we only had
entities corresponding to course offering, without having an entity corresponding to
courses.
The Entity-Relationship Model
The entity-relationship (E-R) data model was developed to facilitate database design
by allowing specification of an enterprise schema that represents the overall logical
structure of a database. The E-R model is very useful in mapping the meanings and
interactions of real-world enterprises onto a conceptual schema. Because of this
usefulness, many database-design tools draw on concepts from the E-R model.
Entity Sets An entity is a “thing” or “object” in the real world that is distinguishable
from all other objects. For example, each person in a university is an entity. An entity
has a set of properties, and the values for some set of properties may uniquely
identify an entity. For instance, a person may have a person_id property whose value
uniquely identifies that person. An entity is represented by a set of attributes.
Attributes are descriptive properties possessed by each member of an entity set. The
designation of an attribute for an entity set expresses that the database stores
similar information concerning each entity in the entity set; however, each entity may
have its own value for each attribute.
Relationship Sets A relationship is an association among several entities. For example,
we can define a relationship advisor that associates instructor Katz with student
Shankar. This relationship specifies that Katz is an advisor to student Shankar. A
relationship set is a set of relationships of the same type. Formally, it is a
mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1, E2,...,En are
entity sets, then a relationship set R is a subset of {(e1, e2,...,en) | e1 ∈ E1, e2 ∈
E2,...,en ∈ En} where (e1, e2,...,en) is a relationship.
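The formal definition can be checked on a tiny example: a relationship set is just a subset of the Cartesian product of the participating entity sets. In the sketch below, Katz and Shankar come from the text, while the other two entities are hypothetical additions so that the product has more than one element.

```python
from itertools import product

# Entity sets, identified here simply by name.
instructor = {"Katz", "Crick"}     # "Crick" is a hypothetical extra entity
student = {"Shankar", "Tanaka"}    # "Tanaka" is a hypothetical extra entity

# {(e1, e2) | e1 ∈ instructor, e2 ∈ student}: every possible pairing.
all_pairs = set(product(instructor, student))

# The advisor relationship set contains only the associations that actually hold.
advisor = {("Katz", "Shankar")}

# A relationship set must be a subset of the Cartesian product.
is_relationship_set = advisor <= all_pairs
```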
Attributes For each attribute, there is a set of permitted values, called the
domain, or value set, of that attribute. The domain of attribute course_id might be the
set of all text strings of a certain length. Similarly, the domain of attribute semester
might be strings from the set {Fall, Winter, Spring, Summer}.
Reduction to Relational Schemas
We can represent a database that conforms to an E-R database schema by a
collection of relation schemas. For each entity set and for each relationship set in
the database design, there is a unique relation schema to which we assign the name of
the corresponding entity set or relationship set. Both the E-R model and the
relational database model are abstract, logical representations of real-world
enterprises. Because the two models employ similar design principles, we can convert
an E-R design into a relational design.
Representation of Strong Entity Sets with Simple Attributes Let E be a strong entity
set with only simple descriptive attributes a1, a2,...,an. We represent this entity by a
schema called E with n distinct attributes. Each tuple in a relation on this schema
corresponds to one entity of the entity set E. For schemas derived from strong entity
sets, the primary key of the entity set serves as the primary key of the resulting
schema. This follows directly from the fact that each tuple corresponds to a specific
entity in the entity set. As an illustration, consider the entity set student of the E-R
diagram in Figure 7.15. This entity set has three attributes: ID, name, tot_cred. We
represent this entity set by a schema called student with three attributes:
student (ID, name, tot_cred)
Representation of Strong Entity Sets with Complex Attributes When a strong entity set
has nonsimple attributes, things are a bit more complex. We handle composite
attributes by creating a separate attribute for each of the component attributes; we
do not create a separate attribute for the composite attribute itself. Multivalued
attributes are treated differently from other attributes. We have seen that attributes
in an E-R diagram generally map directly into attributes for the appropriate relation
schemas. Multivalued attributes, however, are an exception; new relation schemas are
created for these attributes, as we shall see shortly.
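Representation of Weak Entity Sets For schemas derived from a weak entity set, the
combination of the primary key of the strong entity set and the discriminator of the
weak entity set serves as the primary key of the schema. In addition to creating a
primary key, we also create a foreign-key constraint on the relation A for the weak
entity set, specifying that the attributes taken from the strong entity set reference
the primary key of the relation for the strong entity set.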
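A sketch of this rule in SQL, run through Python's sqlite3 module, is shown below. It assumes, for illustration only, that section is a weak entity set that depends on the strong entity set course, with discriminator (sec_id, semester, year). These table and column names are not given in this passage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    create table course (course_id text primary key, title text);

    -- Primary key = primary key of the strong entity set + discriminator.
    create table section (
        course_id text,
        sec_id    text,
        semester  text,
        year      integer,
        primary key (course_id, sec_id, semester, year),
        foreign key (course_id) references course (course_id)
    );

    insert into course values ('CS-101', 'Intro. to Computer Science');
    insert into section values ('CS-101', '1', 'Fall', 2009);
""")

# A section whose course does not exist violates the foreign-key constraint.
try:
    conn.execute("insert into section values ('BIO-999', '1', 'Fall', 2009)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

section_count = conn.execute("select count(*) from section").fetchone()[0]
```

The foreign key captures the existence dependency: a weak entity cannot be stored without its identifying strong entity.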
Representation of Relationship Sets Let R be a relationship set, let a1, a2,...,am be
the set of attributes formed by the union of the primary keys of each of the entity
sets participating in R, and let the descriptive attributes (if any) of R be
b1, b2,...,bn. We represent this relationship set by a relation schema called R with
one attribute for each member of the set: {a1, a2,...,am} ∪ {b1, b2,...,bn}
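The union in this rule is easy to compute mechanically. The helper below, relationship_schema, is a hypothetical illustration, and so are the key and attribute names passed to it. It also assumes that the participating primary keys use different attribute names. When two entity sets share a key name, the attributes have to be renamed first.

```python
def relationship_schema(participant_keys, descriptive=()):
    """Return {a1,...,am} ∪ {b1,...,bn} as an ordered list of attribute names.

    participant_keys: one tuple of primary-key attributes per participating entity set.
    descriptive: descriptive attributes of the relationship set itself, if any.
    """
    attributes = []
    for key in participant_keys:
        for name in key:
            if name not in attributes:   # union: each attribute appears once
                attributes.append(name)
    for name in descriptive:
        if name not in attributes:
            attributes.append(name)
    return attributes

# Hypothetical example: takes relates student (key ID) to section
# (key course_id, sec_id, semester, year) and has a descriptive attribute grade.
takes = relationship_schema(
    [("ID",), ("course_id", "sec_id", "semester", "year")],
    descriptive=("grade",),
)
```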
Redundancy of Schemas A relationship set linking a weak entity set to the
corresponding strong entity set is treated specially. As we noted in Section 7.5.6,
these relationships are many-to-one and have no descriptive attributes. Furthermore,
the primary key of a weak entity set includes the primary key of the strong entity set.
Entity-Relationship Design Issues The notions of an entity set and a relationship set
are not precise, and it is possible to define a set of entities and the relationships
among them in a number of different ways. In this section, we examine basic issues in
the design of an E-R database schema. Section 7.10 covers the design process in
further detail.
Use of Entity Sets versus Attributes Consider the entity set instructor with the
additional attribute phone_number (Figure 7.17a). It can easily be argued that a
phone is an entity in its own right with attributes phone_number and location; the
location may be the office or home where the phone is located, with mobile (cell)
phones perhaps represented by the value “mobile”.
Use of Entity Sets versus Relationship Sets It is not always clear whether an object is
best expressed by an entity set or a relationship set. In Figure 7.15, we used the takes
relationship set to model the situation where a student takes a (section of a) course.
An alternative is to imagine that there is a course-registration record for each course
that each student takes. Then, we have an entity set to represent the
course-registration record.
Binary versus n-ary Relationship Sets Relationships in databases are often binary.
Some relationships that appear to be nonbinary could actually be better represented
by several binary relationships. For instance, one could create a ternary relationship
parent, relating a child to his/her mother and father. However, such a relationship
could also be represented by two binary relationships, mother and father, relating a
child to his/her mother and father separately.
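Placement of Relationship Attributes The cardinality ratio of a relationship can affect
the placement of relationship attributes. Thus, attributes of one-to-one or
one-to-many relationship sets can be associated with one of the participating entity
sets, rather than with the relationship set. For instance, let us specify that advisor
is a one-to-many relationship set such that one instructor may advise several
students, but each student can be advised by only a single instructor. In that case,
an attribute of advisor can be stored with the student entity set instead.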
Extended E-R Features
Although the basic E-R concepts can model most database features, some aspects of
a database may be more aptly expressed by certain extensions to the basic E-R
model. In this section, we discuss the extended E-R features of specialization,
generalization, higher- and lower-level entity sets, attribute inheritance, and
aggregation.
Specialization An entity set may include subgroupings of entities that are distinct in
some way from other entities in the set. For instance, a subset of entities within an
entity set may have attributes that are not shared by all the entities in the entity
set. The E-R model provides a means for representing these distinctive entity
groupings.
Attribute Inheritance A crucial property of the higher- and lower-level entities
created by specialization and generalization is attribute inheritance. The
attributes of the higher-level entity sets are said to be inherited by the lower-level
entity sets. For example, student and employee inherit the attributes of person.
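Attribute inheritance behaves much like class inheritance in a programming language, which gives a quick way to see it in action. In the sketch below, the entity sets person, student, and employee come from the text, but the specific attributes (ID, name, tot_cred, salary) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:            # higher-level entity set
    ID: str
    name: str

@dataclass
class Student(Person):   # lower-level entity set: inherits ID and name
    tot_cred: int = 0

@dataclass
class Employee(Person):  # lower-level entity set: inherits ID and name
    salary: int = 0

# Each lower-level entity set has the inherited attributes plus its own.
student_attrs = [f.name for f in fields(Student)]
employee_attrs = [f.name for f in fields(Employee)]
```

The analogy is loose: E-R specialization describes sets of entities, not program objects, but the way attributes flow from the higher level to the lower level is the same.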
Constraints on Generalizations To model an enterprise more accurately, the database
designer may choose to place certain constraints on a particular generalization. One
type of constraint involves determining which entities can be members of a given
lower-level entity set.
Generalization The refinement from an initial entity set into successive levels of
entity subgroupings represents a top-down design process in which distinctions are
made explicit. The design process may also proceed in a bottom-up manner.
The Unified Modeling Language UML Entity-relationship diagrams help model the
data representation component of a software system. Data representation, however,
forms only one part of an overall system design. Other components include models
of user interactions with the system, specification of functional modules of the
system and their interaction, etc. The Unified Modeling Language (UML) is a standard
for creating specifications of various components of a software system.
Other Aspects of Database Design
Our extensive discussion of schema design in this chapter may create the false
impression that schema design is the only component of a database design. There are
indeed several other considerations that we address more fully in subsequent
chapters, and survey briefly here.
Data Constraints and Relational Database Design We have seen a variety of data
constraints that can be expressed using SQL, including primary-key constraints,
foreign-key constraints, check constraints, assertions, and triggers. Constraints serve
several purposes. The most obvious one is the automation of consistency
preservation. By expressing constraints in the SQL data-definition language, the
designer is able to ensure that the database system itself enforces the constraints.
Usage Requirements: Queries, Performance Database system performance is a
critical aspect of most enterprise information systems. Performance pertains not only
to the efficient use of the computing and storage hardware being used, but also to
the efficiency of people who interact with the system and of processes that depend
upon database data. Two common metrics are:
• Throughput: the number of queries or updates processed per unit of time.
• Response time: the amount of time a single transaction takes.
Authorization Requirements Authorization constraints affect design of the database as
well, because SQL allows access to be granted to users on the basis of components of
the logical design of the database. A relation schema may need to be decomposed into
two or more schemas to facilitate the granting of access rights in SQL. For example,
an employee record may include data relating to payroll, job functions, and medical
benefits.
Data Flow, Workflow Database applications are often part of a larger enterprise
application that interacts not only with the database system but also with various
specialized applications. For example, in a manufacturing company, a computer-aided
design (CAD) system may assist in the design of new products. The CAD system
may extract data from the database via an SQL statement, process the data internally,
perhaps interacting with a product designer, and then update the database.
Other Issues in Database Design Database design is usually not a one-time activity.
The needs of an organization evolve continually, and the data that it needs to store
also evolve correspondingly. During the initial database-design phases, or during the
development of an application, the database designer may realize that changes are
required at the conceptual, logical, or physical schema levels.