i Bcom(CA) Database Management Systems
UNIT – I
1.1 INTRODUCTION
UNIT – II
2.4 KEY
2.7 SQL
UNIT – III
3.3 VIEW
3.4 TRANSACTIONS
3.5 AUTHORIZATION
3.8 TRIGGERS
UNIT – IV
4.4 CONSTRAINTS
UNIT – V
OBJECTIVE TEST-I
SYLLABUS
UNIT – IV: Database Design and the E-R Model: Overview of the Design
Process – The Entity-Relationship Model – Constraints – Entity-Relationship
Diagram – Entity-Relationship Design Issues – Extended E-R Features.
Relational Database Design: Atomic Domain and First Normal Form –
Decomposition using Functional Dependency – Functional Dependency
Theory – Decomposition using Multivalued Dependencies – More Normal
Forms.
time market data to enable on-line trading by customers and
automated.
Sales:-
For customer, product and purchase information.
On-line retailers:-
For sales data noted above plus on-line order tracking,
generation of recommendation lists, and maintenance of on-line
product evaluations.
Manufacturing:-
For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores,
and orders for items.
Human resources:-
For information about employees, salaries, payroll taxes,
benefits, and for generation of paychecks.
Data Independence:-
Reduced application development time:-
The DBMS supports many important functions that are common
to applications accessing data stored in the DBMS.
1.4 VIEW OF DATA
The overview of data is used to explain how to organize information in
a DBMS, how to maintain it, and how to retrieve it effectively. That is, it
explains how to design a database and use a DBMS effectively.
Database Design:-
It is used to describe a real-world enterprise in terms of the
data stored in a DBMS.
It explains the factors to be considered when organizing the
data.
Data analysis:-
It describes how users pose queries over the data stored in the
DBMS and how the DBMS helps to answer them.
Concurrency and Robustness:-
It describes how a DBMS allows many users to access data
concurrently, and how the DBMS protects the data in the event of
system failures.
Efficiency and Scalability:-
Logical level:-
This describes what data are stored in the database and what
relationships exist among those data. The entire database is thus
described in terms of a small number of relatively simple structures.
View level:-
[Figure: levels of abstraction, showing the conceptual schema, the physical schema, and the disk]
1.5 DATA MODELS
There are three levels of data modeling: the conceptual data model, the
logical data model, and the physical data model.
Hierarchical database model
Relational model
Network model
Object-oriented database model
Entity-relationship model
Conceptual Data Model
Logical Data Model
Physical Data Model
Hierarchical Database Model:-
The hierarchical model organizes data into a tree-like structure,
where each record has a single parent or root. Sibling records are sorted
in a particular order. That order is used as the physical order for storing
the database. This model is good for describing many real-world
relationships.
Relational Model:-
The most common model, the relational model sorts data into
tables, also known as relations, each of which consists of columns and
rows. Each column lists an attribute of the entity in question, such as
price, zip code, or birth date.
The set of values permitted for an attribute is called its domain. A
particular attribute or combination of attributes is chosen as a primary key,
which can be referred to in other tables, where it is called a foreign key.
Each row, also called a tuple, includes data about a specific instance
of the entity in question, such as a particular employee.
The model also accounts for the types of relationships between
those tables, including one-to-one, one-to-many, and many-to-many
relationships.
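The ideas of primary key, foreign key, and a one-to-many relationship can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the department and employee tables and their rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,          -- primary key
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    dept_id  INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Accounts');
INSERT INTO employee VALUES (10, 'Anu', 1), (11, 'Bala', 1), (12, 'Chitra', 2);
""")

# One department row is related to many employee rows (one-to-many).
rows = conn.execute("""
    SELECT d.dept_name, e.emp_name
    FROM department d JOIN employee e ON e.dept_id = d.dept_id
    ORDER BY d.dept_name, e.emp_name
""").fetchall()

# A row that refers to a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (13, 'Deepa', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Each tuple in rows pairs a department name with one of its employees, and fk_rejected records whether the database refused the dangling foreign key.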
Network Model:-
The network model builds on the hierarchical model by allowing
many-to-many relationships between linked records, implying multiple
parent records.
Based on mathematical set theory, the model is constructed with
sets of related records. Each set consists of one owner or parent record
and one or more member or child records. A record can be a member or
child in multiple sets, allowing this model to convey complex relationships.
It was most popular in the 70s after it was formally defined by the
Conference on Data Systems Languages (CODASYL).
Object-Oriented Database Model:-
This model defines a database as a collection of objects, or reusable
software elements, with associated features and methods. There are
several kinds of object-oriented databases:
A multimedia database incorporates media, such as images, that
could not be stored in a relational database.
A hypertext database allows any object to link to any other
object. It’s useful for organizing lots of disparate data, but it’s not ideal for
numerical analysis.
The object-oriented database model is the best known post-
relational database model, since it incorporates tables, but isn’t limited to
tables. Such models are also known as hybrid database models.
Entity-Relationship Model:-
This model captures the relationships between real-world entities
much like the network model, but it isn’t as directly tied to the physical
structure of the database. Instead, it’s often used for designing a
database conceptually.
Here, the people, places, and things about which data points are
stored are referred to as entities, each of which has certain attributes.
The cardinality of the relationships between entities is mapped as
well.
Conceptual Data Model:-
A conceptual data model identifies the highest-level relationships
between the different entities. Features of conceptual data model include:
Includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.
From the figure above, we can see that the only information shown via
the conceptual data model is the entities that describe the data and the
relationships between those entities. No other information is shown
through the conceptual data model.
Normalization occurs at this level.
The steps for designing the logical data model are as follows:-
Specify primary keys for all entities.
Find the relationships between different entities.
Find all attributes for each entity.
Resolve many-to-many relationships.
Normalization.
Comparing the logical data model shown above with the conceptual
data model diagram, we see the main differences between the two:
In a logical data model, primary keys are present, whereas in a
conceptual data model, no primary key is present.
In a logical data model, all attributes are specified within an entity.
No attributes are specified in a conceptual data model.
Comparing the physical data model shown above with the logical data
model diagram, we see the main differences between the two:
Entity names are now table names.
Attributes are now column names.
Data type for each column is specified. Data types can be different
depending on the actual database being used.
Two types of DML are:-
Procedural DMLs:-
It requires a user to specify what data are needed and how to get
those data.
Nonprocedural DMLs :-
It requires a user to specify what data are needed without specifying
how to get those data.
It is also referred to as declarative DMLs.
It is easy to learn. The DML component of the SQL language is
nonprocedural.
A query is a statement requesting the retrieval of information.
The portion of a DML that involves information retrieval is called
a query language.
Example: the SQL DML consists of the select, insert, update, and delete
commands. The following query retrieves the name of the customer whose
id is 192-83-7465:
select customer.customer_name
from customer
where customer.customer_id = '192-83-7465'
1.7 RELATIONAL DATABASE
A relational database stores data in a series of tables so that the data
models the mathematical theory of relations.
The model allows for queries based on projection, selection and join,
among other operations, and connects the data in the tables by way
of keys. The queries are expressed in a standard syntax called SQL,
the Structured Query Language, which is common to the various
vendors of relational databases.
The theory of relations states that data is arranged as various sets
of tuples, called relations, where a tuple is a collection of values for
attributes. A relation states which attributes it collects.
Concretely speaking, the attributes are the columns of a table, and
the tuples are rows in the table.
Constraints among the attributes allow only certain tuples to be
valid members of the relation, and the database should not allow
rows to be inserted in the table if they would violate the
constraints.
For instance, the mathematical theory says that if two tuples agree in the
value of all attributes, they are the same tuple.
In a table, it is possible for two distinct rows to contain the same data for
all columns. However, the database should prevent this from happening
because that would not be consistent with the mathematical model.
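The point about duplicate rows can be checked with a short sketch using Python's built-in sqlite3 module. The two table names and the sample row are assumptions made for illustration. A table without a key accepts the same row twice, while a table with a primary key rejects the second copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the same columns, without and with a primary key.
conn.execute("CREATE TABLE account_nokey (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, branch_name TEXT, balance INTEGER)")

row = ("A-101", "Ammapet", 600)

# Without a key, two identical rows are both stored.
conn.execute("INSERT INTO account_nokey VALUES (?, ?, ?)", row)
conn.execute("INSERT INTO account_nokey VALUES (?, ?, ?)", row)
dup_count = conn.execute("SELECT COUNT(*) FROM account_nokey").fetchone()[0]

# With a primary key, the second identical row violates the constraint.
conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
try:
    conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Declaring a key is therefore one way to make a table behave like a mathematical relation.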
Account (Account-schema)
Rs-Puram,Sbi Coimbatore 60000
1.9 DATA STORAGE AND QUERY
Speed with which data can be accessed
Cost per unit of data
Reliability:-
Data loss on power failure or system crash
Physical failure of the storage device
Can differentiate storage into:-
Volatile storage: loses contents when power is switched off
Nonvolatile storage: Contents persist even when power
is switched off. Includes secondary and tertiary storage, as well as
battery backed up main memory.
TRANSACTION:-
Data are scattered in various files and files may be in different
formats, writing new application programs to retrieve the
appropriate data is difficult.
Durability
After a transaction completes successfully, the changes it has
made to the database persist, even if there are system failures.
1.11 DATABASE SYSTEM ARCHITECTURE
A database system is partitioned into several modules, each of which
deals with one of the responsibilities of the overall system.
The DBMS accepts SQL commands generated from a variety of user
interfaces and returns the answers. When a user issues a query, the DBMS
uses information about how the data is stored to produce an efficient
execution plan for evaluating the query.
The architecture of a database system is greatly influenced by the
underlying computer system on which it runs, and in particular by such
aspects of computer architecture as networking, parallelism and distribution:
Networking of computers allows some tasks to be executed on a
server system, and some tasks to be executed on client systems. This
division of work has led to client-server database systems.
Parallel processing within a computer system allows database
system activities to be speeded up, allowing faster response to transactions,
as well as more transactions per second. The need for parallel query
processing has led to parallel database systems.
Keeping multiple copies of the database across different sites
allows large organizations to continue their database operations even
when one site is affected by a natural disaster. Distributed
database systems handle geographically or administratively distributed
data spread across multiple database systems.
Functional components of DBMS are:-
Storage manager Components
Query processor Components
[Figure: System structure. Users (naïve users, application programmers, sophisticated users, database administrators) work through application interfaces, application programs, query tools, and the database scheme; these pass requests to the query processor and the storage manager, which access the data files, data dictionary, indices, and statistical data on disk storage.]
a. Storage manager
A storage manager is a program module that provides the interface
between the low level data stored in a database and the application
programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file
manager.
Translation of the various DML statements into low-level file-system
commands is managed by the storage manager.
The storage manager is responsible for storing, retrieving, and
updating data in the database.
Components of Storage manager:-
Authorization and Integrity manager
Tests for the satisfaction of integrity constraints and checks
the authority of users to access data.
Transaction manager
Ensures that the database remains in a consistent state despite
system failures.
File manager
Manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
Buffer manager
Is responsible for fetching data from disk storage into main
memory and deciding what data to cache in main memory.
Data Structures in Storage Manager:-
Data files
Store the database itself.
Data Dictionary
Stores the metadata about the structure of the database.
Indices
Provide fast access to data items that hold particular
values.
Statistical data
Stores statistical information about the data in the
database.
b. Query Processor Components
DML compiler
It translates DML statements in a query language into low level
instructions that a query evaluation engine understands.
Database Users:-
Naïve users
Application programmers
Sophisticated users
Specialized users
Naïve Users:-
They are unsophisticated users.
They interact with the system by invoking the application programs.
Application programmers:-
Schema Definition:-
The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
Storage structure and access-method definition:-
The DBA creates appropriate storage structures and access methods by
writing a set of definitions, which are translated by the data-storage and
data-definition-language compiler.
Schema and physical organization modification:-
The DBA carries out changes to the schema and physical organization to
reflect the changing needs of the organization.
Granting of authorization for data access:-
The authorization information is kept in a special system structure
that the database system consults whenever someone attempts to access the
data in the system.
Routine maintenance:-
Periodically backing up the data.
Ensuring that enough free disk space is available for normal
operations, and upgrading disk space as required.
Monitoring jobs.
Edgar Codd worked for IBM in the development of hard disk
systems, and he was not happy with the lack of a search engine in
the CODASYL approach, and the IMS model.
He wrote a series of papers, in 1970, outlining novel ways to
construct databases. His ideas eventually evolved into a paper
titled A Relational Model of Data for Large Shared Data
Banks, which described a new method for storing data and processing
large databases.
Records would not be stored in a free-form list of linked records, as
in CODASYL navigational model, but instead used a “table with
fixed-length records.”
IBM had invested heavily in the IMS model, and wasn’t terribly
interested in Codd’s ideas.
Fortunately, some people who didn’t work for IBM “were” interested.
In 1973, Michael Stonebraker and Eugene Wong (both then at UC
Berkeley) made the decision to research relational database
systems.
The project was called INGRES (Interactive Graphics and Retrieval
System), and it successfully demonstrated that a relational model could be
efficient and practical. INGRES worked with a query language known
as QUEL, which in turn pressured IBM to develop SQL in 1974, which was
more advanced (SQL became an ANSI standard in 1986 and an ISO
standard in 1987).
SQL quickly replaced QUEL as the more functional query language.
RDBMSs were an efficient way to store and process
structured data. Then processing speeds got faster, and
“unstructured” data (art, photographs, music, etc.) became much
more commonplace.
Unstructured data is both non-relational and schema-less, and
Relational Database Management Systems simply were not
designed to handle this kind of data.
UNIT – II
2.1 RELATIONAL DATABASE
A relational database is a digital database whose organization is
based on the relational model of data. The various software systems used
to maintain relational databases are known as relational database
management systems (RDBMS). Virtually all relational database systems
use SQL (Structured Query Language) as the language for querying and
maintaining the database.
104 Ammapet 600
In this table, let D1 denote the set of all account numbers,
D2 denote the set of all branch names, and
D3 denote the set of all balances.
Any row of account is a 3-tuple (V1, V2, V3)
where
V1 is an account number (V1 is in domain D1)
V2 is a branch name (V2 is in domain D2)
V3 is a balance (V3 is in domain D3)
In general, account will contain only a subset of the set of all
possible rows. Account is a subset of D1 × D2 × D3.
Generally, a table of n attributes must be a subset of
D1 × D2 × ... × Dn-1 × Dn.
A relation is a subset of the Cartesian product of a list of domains. The
terms table and relation are used interchangeably here. A tuple variable
is a variable whose domain is the set of all tuples. In the above table,
we have 4 tuples.
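The statement that a relation is a subset of the Cartesian product of its domains can be illustrated with a minimal Python sketch. The three small domains and the two account rows below are made-up sample values, not data from the table above.

```python
from itertools import product

# Hypothetical, deliberately tiny domains.
D1 = {"A-101", "A-102"}          # account numbers
D2 = {"Ammapet", "Rs-Puram"}     # branch names
D3 = {600, 60000}                # balances

# D1 x D2 x D3: every possible (account number, branch name, balance) row.
all_rows = set(product(D1, D2, D3))

# The account relation holds only some of the possible rows.
account = {
    ("A-101", "Ammapet", 600),
    ("A-102", "Rs-Puram", 60000),
}

# A relation must be a subset of the Cartesian product of its domains.
is_relation = account <= all_rows
```

Here all_rows has 2 × 2 × 2 = 8 tuples, and account selects 2 of them.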
A database schema can be divided broadly into two categories −
2.4 KEY
A key is an attribute or collection of attributes that uniquely identifies
an entity within an entity set.
For example, the roll number of a student makes him/her
identifiable among students.
Super Key − A set of attributes (one or more) that collectively
identifies an entity in an entity set.
Candidate Key − A minimal super key is called a candidate key.
An entity set may have more than one candidate key.
Primary Key − A primary key is one of the candidate keys chosen
by the database designer to uniquely identify entities in the entity set.
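The difference between a super key and a candidate key can be sketched in Python. Note the assumption: a real key is a property of every legal state of the entity set, whereas this sketch only checks one sample instance, so it can rule attribute sets out but cannot prove that they are keys. The student rows and the helper names is_super_key and candidate_keys are hypothetical.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows of this sample agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal super keys of the sample, found by trying smaller sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any set that contains an already found (smaller) key.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys

students = [
    {"roll_no": 1, "email": "ravi@example.com", "name": "Ravi"},
    {"roll_no": 2, "email": "ravi2@example.com", "name": "Ravi"},
    {"roll_no": 3, "email": "meena@example.com", "name": "Meena"},
]
keys = candidate_keys(students, ("roll_no", "email", "name"))
```

In this sample, (roll_no, name) is a super key but not a candidate key, because roll_no alone already identifies each student. The designer would pick one of the candidate keys as the primary key.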
Users can be granted access to log into individual schemas on a
case-by-case basis, and ownership is transferable. Since each object is
associated with a particular schema, which serves as a kind of
namespace, it’s helpful to give some synonyms, which allows other users
to access that object without first referring to the schema it belongs to.
These schemas do not necessarily indicate the ways that the data
files are stored physically. Instead, schema objects are stored logically
within a table space. The database administrator can specify how much
space to assign to a particular object within a data file.
Finally, schemas and table spaces don’t necessarily line up
perfectly: objects from one schema can be found in multiple table spaces,
while a table space can include objects from several schemas.
relational DBMS, the internal and conceptual schemas, as well as the
views, are defined by relations.
2.7 SQL
SQL is a programming language for Relational Databases. It is
designed over relational algebra and tuple relational calculus. SQL comes
as a package with all major distributions of RDBMS.
SQL comprises both data definition and data manipulation
languages. Using the data definition properties of SQL, one can design
and modify a database schema, whereas the data manipulation properties
allow SQL to store and retrieve data from the database.
2.9 SQL DATA DEFINITION
SQL uses the following set of commands to define database schema
−
CREATE
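Creates new databases, tables and views in an RDBMS.
For example:-
CREATE DATABASE tutorialspoint;
CREATE TABLE article (article_id INT, title VARCHAR(100));  -- column list is illustrative
CREATE VIEW for_students AS SELECT title FROM article;  -- query is illustrative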
DROP
The select clause corresponds to the projection operation of the
relational algebra. It is used to list the attributes desired in the
result of a query.
The from clause corresponds to the Cartesian-product operation
of the relational algebra.
The where clause corresponds to the selection predicate of the
relational algebra.
SQL query form:-
select A1, A2, ..., An
from r1, r2, ..., rm
where P
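The correspondence between the three clauses and the relational-algebra operations can be traced step by step. The sketch below is only an illustration: the borrower and loan rows are made-up sample data, and the plain-Python steps show the meaning of the query, not how a DBMS actually evaluates it.

```python
import sqlite3
from itertools import product

# Hypothetical sample relations.
borrower = [("Kumar", "L-17"), ("Latha", "L-23")]   # (customer_name, loan_number)
loan = [("L-17", 95000), ("L-23", 20000)]           # (loan_number, amount)

# from borrower, loan  ->  Cartesian product
pairs = list(product(borrower, loan))
# where borrower.loan_number = loan.loan_number  ->  selection predicate P
selected = [(b, l) for b, l in pairs if b[1] == l[0]]
# select customer_name, amount  ->  projection
by_hand = sorted((b[0], l[1]) for b, l in selected)

# The same query written in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
conn.executemany("INSERT INTO borrower VALUES (?, ?)", borrower)
conn.executemany("INSERT INTO loan VALUES (?, ?)", loan)
by_sql = sorted(conn.execute("""
    SELECT customer_name, amount
    FROM borrower, loan
    WHERE borrower.loan_number = loan.loan_number
""").fetchall())
```

Both routes give the same rows; the Cartesian product alone has 2 × 2 = 4 pairs before the where clause filters them.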
select loan-number from loan where amount between 90000
and 100000
From clause:-
The from clause by itself defines a Cartesian product of the relations
in the clause.
Example: For all customers who have a loan from the bank, find their
names and loan amounts.
SELECT [DISTINCT] select-list
FROM from-list
WHERE qualification
SELECT specifies the columns to be retained in the result.
FROM specifies a cross-product of tables.
WHERE specifies the selection condition on the tables.
The query returns the listed columns of the rows that satisfy the condition.
Example:
SELECT DISTINCT S.sname, S.age FROM Sailors S
It produces the set of distinct <sname, age> pairs.
If we omit DISTINCT, it returns a multiset of rows (i.e.) with
duplicates.
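The effect of DISTINCT can be seen on a small Sailors table. This is a minimal sketch using Python's sqlite3 module; the three sailor rows are assumed sample data in which two sailors share the same name and age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
# Hypothetical rows: two different sailors named Horatio, both aged 35.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "Dustin", 7, 45.0),
    (64, "Horatio", 7, 35.0),
    (74, "Horatio", 9, 35.0),
])

with_distinct = conn.execute(
    "SELECT DISTINCT S.sname, S.age FROM Sailors S ORDER BY S.sname"
).fetchall()
without_distinct = conn.execute(
    "SELECT S.sname, S.age FROM Sailors S ORDER BY S.sname"
).fetchall()
```

with_distinct holds each <sname, age> pair once, while without_distinct keeps the repeated Horatio pair.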
Range Variable:-
SELECT S.sid, S.Sname, S.rating, S.age
FROM Sailors As S
WHERE S.rating > 7
Here S, introduced by AS, is called a range variable. It is a convenient
shorthand for the table name in SQL.
Example:
(select sname from student) intersect (select ename from
employee)
Except(-):-
The except operation automatically eliminates duplicate values. If
we want to retain all duplicates, we must write except all in place of
except.
Example:
(select sname from student) except (select ename from
employee)
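The intersect and except queries above can be run on sample data with Python's sqlite3 module. The names in the two tables are hypothetical. One assumption about the engine: SQLite does not accept parentheses around the operands of a set operation, so they are left out here, and it offers no intersect all or except all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sname TEXT)")
conn.execute("CREATE TABLE employee (ename TEXT)")
# Hypothetical names; 'Bala' appears twice in student.
conn.executemany("INSERT INTO student VALUES (?)", [("Anu",), ("Bala",), ("Bala",), ("Chitra",)])
conn.executemany("INSERT INTO employee VALUES (?)", [("Bala",), ("Deepa",)])

# Names that appear in both tables; duplicates are eliminated.
in_both = sorted(conn.execute(
    "SELECT sname FROM student INTERSECT SELECT ename FROM employee"
).fetchall())

# Names of students who are not employees.
only_students = sorted(conn.execute(
    "SELECT sname FROM student EXCEPT SELECT ename FROM employee"
).fetchall())
```

Although 'Bala' is stored twice in student, it appears only once in in_both.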
2.12 NULL VALUES
SQL allows the use of null values to indicate the absence of
information about the value of an attribute. SQL provides a special
comparison operator, is null, to test whether a column value is null.
NOT NULL constraints are used to disallow null values. The attributes
of a primary key are not allowed to take on null values.
Disadvantage:-
An issue in the presence of null values is the definition of when two
rows in a relation instance are regarded as duplicates. COUNT (*) counts
rows containing null values like any other rows. All other aggregate
operations simply discard null values.
Example:
Select loan-no from loan where amount is null
Disallowing Null values:-
It can be done by specifying NOT NULL as a part of the field
definition.
Ex: sname CHAR(20) NOT NULL
The fields of a primary key are not allowed to take on null values.
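The behaviour of null values described in this section can be checked with a short sketch using Python's sqlite3 module. The loan rows and the sailor table are assumed sample data; Python's None stands for SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_no TEXT PRIMARY KEY, amount INTEGER)")
# Hypothetical rows; the amount of loan L-14 is unknown (NULL).
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L-11", 900), ("L-14", None), ("L-15", 1500)])

# is null finds the row; comparing with = NULL never matches anything.
is_null_rows = conn.execute("SELECT loan_no FROM loan WHERE amount IS NULL").fetchall()
equals_null_rows = conn.execute("SELECT loan_no FROM loan WHERE amount = NULL").fetchall()

# COUNT(*) counts every row; the other aggregates discard nulls.
count_all, count_amount, total = conn.execute(
    "SELECT COUNT(*), COUNT(amount), SUM(amount) FROM loan"
).fetchone()

# A NOT NULL field refuses a null value.
conn.execute("CREATE TABLE sailor (sname CHAR(20) NOT NULL)")
try:
    conn.execute("INSERT INTO sailor VALUES (NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

The difference between count_all and count_amount shows that COUNT (*) includes the row whose amount is null, while COUNT (amount) and SUM (amount) skip it.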
2.13 AGGREGATE FUNCTIONS
In addition to simply retrieving data, we often need to perform some
computation or summarization.
SQL Supports Five Aggregation Operators:-
Count
Sum
Avg
Max
Min
Count:-
It is used to count the number of occurrence
Example:
Count the number of sailors
SELECT COUNT (*) FROM Sailors S
It computes the number of rows in Sailors. To compute the number of
distinct sailor names, use COUNT (DISTINCT S.sname) instead.
Sum:-
It is used to compute the sum of a set of values.
Avg:-
It is used to calculate the average value.
Example:
Find the average age of all sailors.
SELECT AVG (S.age) FROM Sailors S
Find the average age of sailors with a rating of 10.
SELECT AVG (S.age) FROM Sailors S WHERE S.rating = 10;
Max:-
It is used to return the maximum value
Example:
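Find the name and age of the oldest sailor.
The following query is not legal in SQL, because the SELECT clause uses
both an aggregate and an ordinary column without a GROUP BY clause:
SELECT S.sname, MAX (S.age) FROM Sailors S
Instead, a nested query is used:
SELECT S.sname, S.age FROM Sailors S WHERE S.age =
(SELECT MAX (S2.age) FROM Sailors S2)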
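The aggregate operators of this section can be tried on a small Sailors table. This is a minimal sketch using Python's sqlite3 module; the six sailor rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "Dustin", 7, 45.0),
    (31, "Lubber", 8, 55.5),
    (58, "Rusty", 10, 35.0),
    (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0),
    (74, "Horatio", 9, 35.0),
])

sailor_count = conn.execute("SELECT COUNT(*) FROM Sailors S").fetchone()[0]
name_count = conn.execute("SELECT COUNT(DISTINCT S.sname) FROM Sailors S").fetchone()[0]
avg_age_rating_10 = conn.execute(
    "SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10"
).fetchone()[0]

# Oldest sailor: compare each age with the maximum found by a nested query.
oldest = conn.execute("""
    SELECT S.sname, S.age FROM Sailors S
    WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2)
""").fetchall()
```

The nested query returns a single value, the maximum age, and the outer query then picks every sailor whose age equals it.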
Min:-
It is used to return the minimum value.
Example:
Find the name and age of the youngest sailor.
SELECT S.sname, S.age FROM Sailors S WHERE S.age =
(SELECT MIN (S2.age) FROM Sailors S2)
Example:
Find the names of sailors who have reserved boat number 103.
SELECT S.sname FROM Sailors S
WHERE EXISTS (SELECT * FROM Reserves R
WHERE R.bid = 103 AND R.sid = S.sid)
EXISTS is another set-comparison operator, like IN and NOT IN.
It allows us to test whether a set is non-empty. Here, for each sailor
row S, we test whether the set of Reserves rows
R such that R.bid = 103 AND S.sid = R.sid is non-empty.
If so, sailor S has reserved boat 103, and we retrieve the name.
The subquery clearly depends on the current row S and must be
re-evaluated for each row in Sailors.
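The correlated EXISTS query can be run on sample data with Python's sqlite3 module. The Sailors and Reserves rows below are assumptions made for illustration. The same question is also answered with IN so that the two forms can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT)")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")])
conn.executemany("INSERT INTO Reserves VALUES (?, ?)",
                 [(22, 103), (31, 102), (58, 103)])

# For each sailor S, the subquery is evaluated with that sailor's sid.
with_exists = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R
                  WHERE R.bid = 103 AND R.sid = S.sid)
    ORDER BY S.sname
""").fetchall()

# The same result using IN with an uncorrelated subquery.
with_in = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
    ORDER BY S.sname
""").fetchall()
```

Lubber reserved only boat 102, so the set of matching Reserves rows for that sailor is empty and the name is not returned.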
Example:
To insert row values directly
INSERT INTO student VALUES (1001, 'vikky', 80, 60, 90, 240,
80.00, 'pass');
To insert row values through the keyboard (using substitution variables)
INSERT INTO student VALUES (&regno, '&name', &m1, &m2,
&m3, &total, &percentage, '&result');
To insert row values by naming the columns
INSERT INTO student
(regno, name, m1, m2, m3, total,
percentage, result) VALUES (1001, 'vikky', 80, 60, 90, 240,
80.00, 'pass');
To insert row values into a table from another table
INSERT INTO student SELECT rno, name, tot, res FROM
academic;
Deletion:-
The delete command is used to remove rows from a table.
Using this command, either of the following is possible:
All rows can be deleted from a table.
Selected rows can be removed from a table.
A delete command operates on only one relation.
Syntax:
DELETE FROM <table name> WHERE predicate;
Example:
To delete all rows from the table student, the query is
DELETE FROM student;
To delete those rows whose result is fail, the query is
DELETE FROM student WHERE result = 'fail';
Update:-
To change a value in a tuple without changing all values in the
tuple.
Syntax:
UPDATE <table name> SET <column> = <expression> WHERE predicate;
Example:
Annual interest payments are being made, and all balances are to be
increased by 5 percent.
UPDATE account
SET balance = balance * 1.05
If interest is to be paid only to accounts with a balance of $1000 or
more
UPDATE account
SET balance = balance * 1.05
WHERE balance >= 1000
SQL first tests all tuples in the relation to see whether they should
be updated, and carries out the updates afterward.
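The conditional update can be tried with Python's sqlite3 module. The three account rows are assumed sample data, and the balance column is declared REAL here so that the 5 percent increase is stored as a decimal number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500.0), ("A-102", 1000.0), ("A-103", 2000.0)])

# Pay 5 percent interest only to accounts holding 1000 or more.
cursor = conn.execute("""
    UPDATE account
    SET balance = balance * 1.05
    WHERE balance >= 1000
""")
rows_updated = cursor.rowcount  # number of tuples that satisfied the predicate

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
```

Only the tuples that satisfy the where predicate are changed; the account with a balance of 500 keeps its old value.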
UNIT – III
3.1 INTERMEDIATE SQL
This Intermediate/Advanced SQL Tutorial will cover the SELECT
statement in great detail. The SELECT statement is the core of SQL, and it
is likely that the vast majority of your SQL commands will be SELECT
statements. Because of the large number of options available for the
SELECT statement, this entire tutorial has been dedicated to it.
Table Geography
Region_Name    Store_Name
East           Boston
West           Los Angeles
West 2050
The first two lines tell SQL to select two fields, the first one is the field
"Region_Name" from table Geography (aliased as REGION), and the
second one is the sum of the field "Sales" from
table Store_Information (aliased as SALES). Notice how the table aliases
are used here: Geography is aliased as A1, and Store_Information is
aliased as A2. Without the aliasing, the first line would become
SELECT Geography.Region_Name REGION,
SUM(Store_Information.Sales) SALES
This is much more cumbersome. In essence, table aliases make the
entire SQL statement easier to understand, especially when multiple
tables are included.
An alternative way to specify a join between tables is to use
the JOIN and ON keywords. In the current example, the SQL query would
be,
SELECT A1.Region_Name REGION, SUM(A2.Sales) SALES
FROM Geography A1
JOIN Store_Information A2
ON A1.Store_Name = A2.Store_Name
GROUP BY A1.Region_Name;
Several different types of joins can be performed in SQL. The key ones are
as follows:
Inner Join
Outer Join
Left Outer Join
Cross Join
Inner Join:-
An inner join in SQL returns rows where there is at least one match
on both tables. Let's assume that we have the following two tables,
Table Store_Information
Table Geography
Region_Name    Store_Name
East           Boston
West           Los Angeles
We want to find out sales by store, and we only want to see stores
with sales listed in the report. To do this, we can use the following SQL
statement using INNER JOIN:
SELECT A1.Store_Name STORE, SUM(A2.Sales) SALES
FROM Geography A1
INNER JOIN Store_Information A2
ON A1.Store_Name = A2.Store_Name
GROUP BY A1.Store_Name;
Result:-
STORE SALES
Los Angeles 1800
By using INNER JOIN, the result shows 3 stores, even though we are
selecting from the Geography table, which has 4 rows. The row "New York"
is not selected because it is not present in the Store_Information table.
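The inner join can be reproduced with Python's sqlite3 module. The full contents of the two tables are not shown in this text, so the rows below are assumed sample values, chosen to agree with the description above (4 rows in Geography, 3 stores in the result, no sales for New York).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Store_Information (Store_Name TEXT, Sales INTEGER)")
conn.execute("CREATE TABLE Geography (Region_Name TEXT, Store_Name TEXT)")
# Assumed sample rows (not taken from the original tables).
conn.executemany("INSERT INTO Store_Information VALUES (?, ?)", [
    ("Los Angeles", 1500), ("San Diego", 250), ("Los Angeles", 300), ("Boston", 700),
])
conn.executemany("INSERT INTO Geography VALUES (?, ?)", [
    ("East", "Boston"), ("East", "New York"), ("West", "Los Angeles"), ("West", "San Diego"),
])

# Only stores present in both tables appear in the result.
inner_rows = conn.execute("""
    SELECT A1.Store_Name STORE, SUM(A2.Sales) SALES
    FROM Geography A1
    INNER JOIN Store_Information A2
    ON A1.Store_Name = A2.Store_Name
    GROUP BY A1.Store_Name
    ORDER BY A1.Store_Name
""").fetchall()
```

The ORDER BY clause is added only to make the order of the result rows predictable.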
Outer Join:-
Previously, we looked at the inner join, where we select
rows common to the tables participating in the join. What about the cases
where we are interested in selecting elements in a table regardless of
whether they are present in the second table? We will now need to use
the SQL OUTER JOIN command.
Table Store_Information
Table Geography
Region_Name    Store_Name
East           Boston
West           Los Angeles
Store_Information table. Therefore, we need to perform an outer join on
the two tables above:
SELECT A1.Store_Name, SUM(A2.Sales) SALES
FROM Geography A1, Store_Information A2
WHERE A1.Store_Name = A2.Store_Name (+)
GROUP BY A1.Store_Name;
Note that in this case, we are using the Oracle syntax for outer join.
Result:-
Store_Name SALES
Boston 700
Left Outer Join:-
In a left outer join, all rows from the first table mentioned in the
SQL query are selected, regardless of whether there is a matching row in the
second table mentioned in the SQL query. Let's assume that we have the
following two tables,
Table Store_Information
Table Geography
Region_Name    Store_Name
East           Boston
West           Los Angeles
We want to find out sales by store, and we want to see the results
for all stores regardless of whether there is a sale in
the Store_Information table. To do this, we can use the following SQL
statement using LEFT OUTER JOIN:
SELECT A1.Store_Name STORE, SUM(A2.Sales) SALES
FROM Geography A1
LEFT OUTER JOIN Store_Information A2
ON A1.Store_Name = A2.Store_Name
GROUP BY A1.Store_Name;
Result:-
STORE SALES
Boston 700
By using LEFT OUTER JOIN, all four rows in the Geography table are
listed. Since there is no match for "New York" in
the Store_Information table, the Sales total for "New York" is NULL. Note
that it is NULL and not 0, as NULL indicates there is no match.
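The left outer join can be reproduced with Python's sqlite3 module. As the complete tables are not shown in this text, the rows below are assumed sample values that agree with the description (four Geography rows, no sales recorded for New York). SQL NULL is returned to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Store_Information (Store_Name TEXT, Sales INTEGER)")
conn.execute("CREATE TABLE Geography (Region_Name TEXT, Store_Name TEXT)")
# Assumed sample rows (not taken from the original tables).
conn.executemany("INSERT INTO Store_Information VALUES (?, ?)", [
    ("Los Angeles", 1500), ("San Diego", 250), ("Los Angeles", 300), ("Boston", 700),
])
conn.executemany("INSERT INTO Geography VALUES (?, ?)", [
    ("East", "Boston"), ("East", "New York"), ("West", "Los Angeles"), ("West", "San Diego"),
])

# Every Geography row is kept, with or without matching sales.
left_rows = conn.execute("""
    SELECT A1.Store_Name STORE, SUM(A2.Sales) SALES
    FROM Geography A1
    LEFT OUTER JOIN Store_Information A2
    ON A1.Store_Name = A2.Store_Name
    GROUP BY A1.Store_Name
    ORDER BY A1.Store_Name
""").fetchall()
```

The New York row is present in left_rows with None as its sales total, which corresponds to NULL rather than 0.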
Cross Join:-
A cross join (also called a Cartesian join) is a join of tables without
specifying the join condition. In this scenario, the query would return all
possible combinations of the rows of the tables in the SQL query. To see
this in action, let's use the following example:
Table Store_Information
Table Geography
Region_Name    Store_Name
East           Boston
West           Los Angeles
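A cross join of the two tables can be sketched with Python's sqlite3 module. The rows below are assumed sample values, since the complete tables are not shown in this text. With 4 rows in each table, the cross join returns 4 × 4 = 16 combinations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Store_Information (Store_Name TEXT, Sales INTEGER)")
conn.execute("CREATE TABLE Geography (Region_Name TEXT, Store_Name TEXT)")
# Assumed sample rows (not taken from the original tables).
conn.executemany("INSERT INTO Store_Information VALUES (?, ?)", [
    ("Los Angeles", 1500), ("San Diego", 250), ("Los Angeles", 300), ("Boston", 700),
])
conn.executemany("INSERT INTO Geography VALUES (?, ?)", [
    ("East", "Boston"), ("East", "New York"), ("West", "Los Angeles"), ("West", "San Diego"),
])

# No join condition: every Geography row is paired with every sales row.
implicit = conn.execute("""
    SELECT A1.Store_Name, A2.Store_Name, A2.Sales
    FROM Geography A1, Store_Information A2
""").fetchall()

# The same result written with the CROSS JOIN keyword.
explicit = conn.execute("""
    SELECT A1.Store_Name, A2.Store_Name, A2.Sales
    FROM Geography A1 CROSS JOIN Store_Information A2
""").fetchall()
```

Because a cross join grows with the product of the table sizes, it is usually followed by a condition that keeps only the meaningful pairs.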
3.3 VIEW
In some cases, it is not desirable for all users to see the entire
logical model (that is, all the actual relations stored in the database).
Consider a person who needs to know an instructor's ID, name and
department, but not the salary. This person should see a relation
described, in SQL, by
select ID, name, dept_name
from instructor
A view provides a mechanism to hide certain data from the view of
certain users.
Any relation that is not part of the logical model but is made visible
to a user as a “virtual relation” is called a view.
View Definition:-
A view is defined using the create view statement which has the
form
Create view v as < query expression >
Where <query expression> is any legal SQL expression. The view
name is represented by v.
Once a view is defined, the view name can be used to refer to the
virtual relation that the view generates.
View definition is not the same as creating a new relation by
evaluating the query expression
Rather, a view definition causes the saving of an expression; the
expression is substituted into queries using the view.
Example:-
A view of instructors without their salary
create view faculty as
select ID, name, dept_name
from instructor
Find all instructors in the Biology department
select name
from faculty
where dept_name = 'Biology'
Create a view of department salary totals
create view departments_total_salary(dept_name, total_salary) as
select dept_name, sum (salary)
from instructor
group by dept_name;
create view physics_fall_2009 as
select course.course_id, sec_id, building, room_number
from course, section
where course.course_id = section.course_id
and course.dept_name = 'Physics'
and section.semester = 'Fall'
and section.year = '2009';
create view physics_fall_2009_watson as
select course_id, room_number
from physics_fall_2009
where building = 'Watson';
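The faculty view can be tried with Python's sqlite3 module. The instructor rows below are assumed sample data. The sketch also shows that a view stores the query expression and not its result: a row inserted into instructor after the view was created is visible through the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("76766", "Crick", "Biology", 72000),
])

# A view of instructors without their salary.
conn.execute("""
    CREATE VIEW faculty AS
    SELECT ID, name, dept_name
    FROM instructor
""")

cursor = conn.execute("SELECT * FROM faculty")
view_columns = [column[0] for column in cursor.description]

biology_before = conn.execute(
    "SELECT name FROM faculty WHERE dept_name = 'Biology' ORDER BY name"
).fetchall()

# A row added to the base table later is seen through the view as well.
conn.execute("INSERT INTO instructor VALUES ('98345', 'Kim', 'Biology', 80000)")
biology_after = conn.execute(
    "SELECT name FROM faculty WHERE dept_name = 'Biology' ORDER BY name"
).fetchall()
```

The salary column never appears in view_columns, so a user who is given access only to faculty cannot read it.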
Expand use of a view in a query/another view
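A view relation v1 is said to depend directly on a view relation v2 if
v2 is used in the expression defining v1.
A view relation v1 is said to depend on view relation v2 if either v1
depends directly on v2 or there is a path of dependencies from v1 to
v2.
A view relation v is said to be recursive if it depends on itself.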
View Expansion:-
A way to define the meaning of views defined in terms of other
views.
Let view v1 be defined by an expression e1 that may itself contain
uses of view relations.
View expansion of an expression repeats the following replacement
step:
repeat
    Find any view relation vi in e1
    Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will
terminate.
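The replacement loop above can be sketched in Python. This is only an illustration under stated assumptions: the views are held in a hypothetical dictionary VIEWS that maps a view name to its defining expression, and expansion is done by whole-word text substitution rather than by a real SQL parser.

```python
import re

# Hypothetical view catalogue: view name -> defining query expression.
VIEWS = {
    "physics_fall_2009": (
        "select course.course_id, sec_id, building, room_number "
        "from course, section "
        "where course.course_id = section.course_id "
        "and course.dept_name = 'Physics' "
        "and section.semester = 'Fall' and section.year = '2009'"
    ),
    "physics_fall_2009_watson": (
        "select course_id, room_number from physics_fall_2009 "
        "where building = 'Watson'"
    ),
}

def expand(expr, views=VIEWS):
    """Repeat the replacement step until no view names remain in expr."""
    while True:
        for name, definition in views.items():
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, expr):
                # Replace the view relation by the expression defining it.
                expr = re.sub(pattern, lambda m: "(" + definition + ")", expr)
                break
        else:
            # No view name was found in this pass, so expansion is complete.
            return expr

expanded = expand("select * from physics_fall_2009_watson")
```

Because neither definition refers back to itself, the loop stops after two replacements; with a recursive view it would never finish, which is the reason for the non-recursion condition.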
Update Of A View:-
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
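Read together, the two blocks above debit 500 from A's account and credit 500 to B's account. A minimal sketch of running both steps as one transaction follows, using Python's built-in sqlite3 module. The table name account, its columns, and the opening balances are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (name text primary key, balance integer)")
conn.executemany("insert into account values (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst; both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "update account set balance = balance - ? where name = ?",
            (amount, src))
        conn.execute(
            "update account set balance = balance + ? where name = ?",
            (amount, dst))

transfer(conn, "A", "B", 500)
balances = dict(conn.execute("select name, balance from account"))
```

If the second update failed, the with block would roll back the first one, so the 500 would not disappear from A without arriving at B.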
3.5 AUTHORIZATION
Forms of authorization on parts of the database:
Read - allows reading, but not modification of data.
Insert - allows insertion of new data, but not modification of existing
data.
Update - allows modification, but not deletion of data.
Delete - allows deletion of data.
Forms of authorization to modify the database schema:
Index - allows creation and deletion of indices.
Resources - allows creation of new relations.
Alteration - allows addition or deletion of attributes in a relation.
Drop - allows deletion of relations.
Authorization Specification in SQL
The grant statement is used to confer authorization.
Select: allows read access to a relation, or the ability to query using a
view.
Example: grant users U1, U2, and U3 select authorization on the instructor
relation:
grant select on instructor to U1, U2, U3
Insert: the ability to insert tuples
Update: the ability to update using the SQL update statement
Delete: the ability to delete tuples.
All privileges: used as a short form for all the allowable privileges
Revoking Authorization in SQL:-
The revoke statement is used to revoke authorization.
revoke <privilege list>
on <relation name or view name> from <user list>
Example:
revoke select on branch from U1, U2, U3
<privilege list> may be all, to revoke all privileges the revokee may
hold.
If <user list> includes public, all users lose the privilege except
those granted it explicitly.
If the same privilege was granted twice to the same user by
different grantors, the user may retain the privilege after the
revocation.
All privileges that depend on the privilege being revoked are also
revoked.
Roles:-
create role instructor;
grant instructor to Amit;
Privileges can be granted to roles:
grant select on takes to instructor;
Roles can be granted to users, as well as to other roles:
create role teaching_assistant;
grant teaching_assistant to instructor;
instructor inherits all privileges of teaching_assistant.
Chain of roles:
create role dean;
grant instructor to dean;
grant dean to Satoshi;
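How privileges flow along a chain of roles can be sketched in Python. This is not how any particular database stores grants; the two dictionaries below are a hypothetical in-memory catalogue that mirrors the role grants above, and effective_privileges is an illustrative helper.

```python
# Hypothetical catalogue mirroring the grant statements above.
# Privileges granted directly to each role, as (privilege, relation) pairs.
role_privileges = {
    "instructor": {("select", "takes")},
    "teaching_assistant": set(),
    "dean": set(),
}
# granted_roles[x] = roles granted to x, where x may be a user or a role.
granted_roles = {
    "instructor": {"teaching_assistant"},
    "dean": {"instructor"},
    "Satoshi": {"dean"},
}

def effective_privileges(grantee):
    """Collect the privileges reachable from grantee through the role chain."""
    seen, stack, privileges = set(), [grantee], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        privileges |= role_privileges.get(current, set())
        stack.extend(granted_roles.get(current, ()))
    return privileges

satoshi_privileges = effective_privileges("Satoshi")
```

Satoshi holds no privilege directly, yet ends up with select on takes through dean and instructor.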
Authorization on Views:-
create view geo_instructor as
(select * from instructor where dept_name = 'Geology');
grant select on geo_instructor to geo_staff;
Suppose that a geo_staff member issues
select * from geo_instructor;
What if geo_staff does not have permissions on instructor?
What if the creator of the view did not have some permissions on instructor?
Other Authorization Features:-
The references privilege is needed to create a foreign key:
grant references (dept_name) on department to Mariano;
Why is this required?
Transfer of privileges:
grant select on department to Amit with grant option;
revoke select on department from Amit, Satoshi cascade;
revoke select on department from Amit, Satoshi restrict;
3.6 ADVANCED SQL
SQL allows us to perform various operations on the underlying
database data. It allows the user to express requests, from simple to complex, in
an efficient way. The basic command used to retrieve data from the
database is SELECT.
Find the name and address of each customer that has more than one
account.
select customer_name, customer_street, customer_city
from customer
where account_count (customer_name ) > 1
Table Functions:-
SQL:2003 added functions that return a relation as a result
Example: Return all accounts owned by a given customer
create function accounts_of (customer_name char(20))
returns table ( account_number char(10),
branch_name char(15),
balance numeric(12,2))
return table
(select account_number, branch_name, balance
from account A where exists (
select *
from depositor D
where D.customer_name = accounts_of.customer_name
and D.account_number = A.account_number ))
Usage:
select *
from table (accounts_of ('Smith'))
SQL Procedures:-
The account_count function could instead be written as a
procedure:
create procedure account_count_proc (in customer_name varchar(20),
out a_count integer)
begin
select count(*) into a_count
from depositor
where depositor.customer_name =
account_count_proc.customer_name
end
Procedures can be invoked either from an SQL procedure or
from embedded SQL, using the call statement.
declare a_count integer;
call account_count_proc('Smith', a_count);
Procedures and functions can also be invoked from dynamic SQL.
SQL:1999 allows more than one function/procedure of the same
name (called name overloading), as long as the number of
arguments differs, or at least the types of the arguments differ.
Compound statement: begin … end
May contain multiple SQL statements between begin and end.
Local variables can be declared within a compound statement.
While and repeat statements:
declare n integer default 0;
while n < 10 do
set n = n + 1
end while
repeat
set n = n - 1
until n = 0
end repeat
For loop
Permits iteration over all results of a query
Example: find total of all balances at the Perryridge branch
declare n integer default 0;
for r as
select balance from account
where branch_name = 'Perryridge'
do
set n = n + r.balance
end for
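The same iteration over a query result can be sketched from a host language. The example below uses Python's built-in sqlite3 module instead of the SQL for loop; the account rows and their balances are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [("A-101", "Perryridge", 400),
     ("A-102", "Perryridge", 900),
     ("A-201", "Downtown", 500)],
)

n = 0
# Plays the role of: for r as select balance ... do set n = n + r.balance end for
for (balance,) in conn.execute(
        "select balance from account where branch_name = ?", ("Perryridge",)):
    n = n + balance
```

With the sample rows above, only the two Perryridge accounts contribute to n.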
Conditional statements (if-then-else)
E.g. To find sum of balances for each of three categories of accounts
(with balance <1000, >=1000 and <5000, >= 5000)
if r.balance < 1000
then set l = l + r.balance
elseif r.balance < 5000
then set m = m + r.balance
else set h = h + r.balance
end if
SQL:1999 also supports a case statement similar to the C case
statement.
Signaling of exception conditions, and declaring handlers for
exceptions:
declare out_of_stock condition
declare exit handler for out_of_stock
begin
…
.. signal out_of_stock
end
The handler here is exit; it causes the enclosing begin..end to be exited.
3.8 TRIGGERS
You can write triggers that fire whenever one of the following
operations occurs:
DML statements (INSERT, UPDATE, DELETE) on a particular table or
view, issued by any user
DDL statements (CREATE or ALTER primarily) issued either by a
particular schema/user or by any schema/user in the database
Database events, such as logon/logoff, errors, or startup/shutdown,
also issued either by a particular schema/user or by any
schema/user in the database
Triggers are similar to stored procedures. A trigger stored in the
database can include SQL and PL/SQL or Java statements to run as a
unit and can invoke stored procedures. However, procedures and
triggers differ in the way that they are invoked. A procedure is
explicitly run by a user, application, or trigger. Triggers are implicitly
fired by Oracle when a triggering event occurs, no matter which
user is connected or which application is being used.
Consider a database application with some SQL statements that implicitly fire
several triggers stored in the database. Note that the database
stores triggers separately from their associated tables.
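The implicit firing described above can be tried out on a small scale. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle, so the trigger syntax is SQLite's; the tables account and audit_log and the trigger name log_balance_change are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table account (account_number text primary key, balance integer);
create table audit_log (account_number text, old_balance integer, new_balance integer);

-- Fires implicitly after every update of balance on account.
create trigger log_balance_change
after update of balance on account
for each row
begin
    insert into audit_log values (old.account_number, old.balance, new.balance);
end;

insert into account values ('A-101', 500);
update account set balance = 700 where account_number = 'A-101';
""")

log_rows = conn.execute("select * from audit_log").fetchall()
```

No statement in the script calls the trigger by name; the update alone causes the row to appear in audit_log.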
are also known as attributes. All these names are used interchangeably in
relational database.
It can also use logical AND, OR and NOT operators to combine the
various filtering conditions.
This operation can be represented as below:
σ p (r)
where σ is the symbol for the select operation, r represents the
relation/table, and p is the logical formula or filtering condition used to get
the subset. Let us see an example:
σ STD_NAME = "James" (STUDENT)
What does the above relational algebra expression do? It selects the record/tuple from the
STUDENT table whose student name is 'James'.
σ dept_id = 20 AND salary >= 10000 (EMPLOYEE) - Selects the records from the
EMPLOYEE table with department ID 20 and a salary of
at least 10000.
Project (∏) :-
This is a unary operator, like the select operation above. It
creates a subset of the relation, but it picks columns rather than rows:
it keeps only the chosen columns/attributes of the relation, giving a
vertical subset of the relation.
The select operation above, by contrast, creates a subset of the tuples but keeps all the
attributes in the relation. Projection is denoted as below:
∏ a1, a2, a3 (r)
where ∏ is the operator for projection, r is the relation
and a1, a2, a3 are the attributes of the relation which will be shown
in the resultant subset.
∏ std_name, address, course (STUDENT) - This will select all the records from the
STUDENT table but only the selected columns: std_name, address and
course. Suppose we have to select only three columns for a
particular student. Then we have to combine both the project and select
operations:
∏ STD_ID, address, course (σ STD_NAME = "James" (STUDENT))
This selects the record for 'James' and displays only the std_ID, address
and course columns. Here two unary operators are
combined, so two operations are performed.
First it selects the tuple for 'James' from the STUDENT table. The
resultant subset of STUDENT is treated as an intermediate
relation.
It is temporary and exists only until the end of this operation. The projection then
keeps the 3 columns from this temporary relation.
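The select-then-project sequence can be sketched in Python, treating a relation as a list of dictionaries. The STUDENT rows and the helper names select and project are made up for illustration and are not part of any library.

```python
# Hypothetical STUDENT relation; each tuple is a dict of attribute -> value.
STUDENT = [
    {"std_id": 100, "std_name": "James", "address": "Troy", "course": "DBMS"},
    {"std_id": 101, "std_name": "Kathy", "address": "Novi", "course": "Java"},
]

def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate (horizontal subset)."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """pi: keep only the named attributes (vertical subset), dropping duplicates."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

# The intermediate relation: all attributes, only the tuple for 'James'.
james = select(STUDENT, lambda t: t["std_name"] == "James")
# The final relation: only three columns of that intermediate relation.
james_details = project(james, ["std_id", "address", "course"])
```

The variable james plays the part of the temporary relation: it exists only so that the projection has something to work on.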
Rename (ρ) :-
This is a unary operator used to rename the tables and columns of a
relation. When we perform a self join, we have to
differentiate between two copies of the same table.
In such a case the rename operator on tables comes into the picture.
When we join two or more tables that have the same
column names, it is always better to rename the columns to
differentiate them. This occurs when we perform a Cartesian product
operation.
ρ R (E) - where ρ is the rename operator, E is the existing relation
name, and R is the new relation name.
ρ STUDENT (STD_TABLE) - Renames the STD_TABLE table to STUDENT.
Let us see another example, renaming the columns of the table. If
the STUDENT table has ID, NAME and ADDRESS columns and they
have to be renamed to STD_ID, STD_NAME, STD_ADDRESS, then we
have to write as follows:
ρ STD_ID, STD_NAME, STD_ADDRESS (STUDENT) - It will rename the columns in the
order the names appear in the table.
Cartesian product (X):-
This is a binary operator. It combines the tuples of two relations into
one relation.
R X S - where R and S are two relations and X is the operator.
If relation R has m tuples and relation S has n tuples, then the
resultant relation will have m × n tuples. For example, if we perform a
Cartesian product on the EMPLOYEE (5 tuples) and DEPT (3
tuples) relations, then we will have a new relation with 15 tuples.
EMPLOYEE X DEPT:-
This operator simply pairs the tuples of each
table, i.e. each employee in the EMPLOYEE table is paired with
each department in the DEPT table. The diagram below depicts the result of the
Cartesian product.
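The 5 × 3 = 15 count can be checked with a short Python sketch. The employee and department tuples are invented; only their numbers (5 and 3) come from the text.

```python
from itertools import product

# Hypothetical relations: 5 EMPLOYEE tuples and 3 DEPT tuples.
EMPLOYEE = [(100, "James"), (101, "Ravi"), (102, "Mei"),
            (103, "Omar"), (104, "Kathy")]
DEPT = [(10, "Design"), (20, "Testing"), (30, "Quality")]

# Every employee tuple is paired with every department tuple.
EMPLOYEE_X_DEPT = [e + d for e, d in product(EMPLOYEE, DEPT)]
tuple_count = len(EMPLOYEE_X_DEPT)
```

Each result tuple carries the attributes of both relations, for example (100, "James", 10, "Design").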
Set-difference (-) :-
This is a binary operator. It creates a new relation with the
tuples that are in one relation but not in the other. It is denoted by the
'-' symbol.
R–S
Where R and S are the relations.
Suppose we want to retrieve the employees who are working in the
Design department but not in Testing.
DESIGN_EMPLOYEE - TESTING_EMPLOYEE
Set Intersection:-
This is a binary operation. It results in a relation with the
tuples that are in both relations. It is denoted by '∩'.
R ∩ S
where R and S are the relations. It picks all the tuples that are
present in both R and S, and returns them as a new relation.
Suppose we have to find the employees who are working in both the
design and testing departments.
If we have tuples as in the above example, the new result relation will
not have any tuples. Suppose we have tuples like those below, and see the new
relation after set intersection.
Assignment:-
As the name indicates, the assignment operator '←' is used to
assign the result of a relational operation to a temporary relational
variable.
This is useful when there are multiple steps in a relational operation
and handling everything in one single expression is difficult.
Assigning the results to a temporary relation and using this
temporary relation in the next operation makes the task simple and easy.
T ← S - denotes that relation S is assigned to temporary relation T.
A relational operation ∏ a1, a2 (σ p (E)) with selection and projection can be
divided as below.
T ← σ p (E)
S ← ∏ a1, a2 (T)
Our projection example above, which gets STD_ID, ADDRESS and
COURSE for the student 'James',
∏ STD_ID, address, course (σ STD_NAME = "James" (STUDENT))
can be rewritten as below.
T ← σ STD_NAME = "James" (STUDENT)
S ← ∏ STD_ID, address, course (T)
Set intersection can also be written as a combination of set difference
operations.
R ∩ S = R - (R - S)
i.e. it first evaluates R - S to get the tuples which are present only in R, and then
it gets the tuples which are present in R but not in the
result of R - S.
In the above example of employees,
DESIGN_EMPLOYEE - (DESIGN_EMPLOYEE - TESTING_EMPLOYEE)
It first finds those employees who are only design employees:
(104, Kathy). This result is then subtracted from the
design employees.
This finds those employees who are design employees but are not in the
new result: (100, James).
Thus it gives the tuple for the employee who is both a designer and a tester. We
can see here that a fundamental relational operator is used twice to get
set intersection. Hence set intersection is not a fundamental operation.
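The identity can be verified with Python sets. The tuples (100, James) and (104, Kathy) follow the narrative above; the tester-only tuple (102, Ravi) is an assumption added so that the two relations differ in both directions.

```python
# James is both a designer and a tester, Kathy is a designer only,
# and Ravi (hypothetical) is a tester only.
DESIGN_EMPLOYEE = {(100, "James"), (104, "Kathy")}
TESTING_EMPLOYEE = {(100, "James"), (102, "Ravi")}

only_design = DESIGN_EMPLOYEE - TESTING_EMPLOYEE   # R - S
both = DESIGN_EMPLOYEE - only_design               # R - (R - S)
same_as_intersection = both == (DESIGN_EMPLOYEE & TESTING_EMPLOYEE)
```

Python's & operator computes the intersection directly, so the last line compares the two-step difference with the built-in result.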
Division:-
This operation is used to answer queries containing the phrase 'for all'. It is
denoted by '÷'. Suppose we want to see all the employees who
work in all of the departments. What are the steps involved to find this?
First we find all the department IDs: T1 ← ∏ DEPT_ID (DEPARTMENT)
The next step is to list all the employees and their departments: T2 ←
∏ EMP_ID, DEPT_ID (EMPLOYEE)
In the third step we find the employees in T2 that are paired with every
department ID in T1. This is obtained by using the division operation:
T2 ÷ T1
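The three steps can be sketched in Python with sets. The contents of T1 and T2 are invented sample data, and divide is an illustrative helper rather than a library function.

```python
# Hypothetical data: T1 holds all department IDs,
# T2 holds (EMP_ID, DEPT_ID) pairs.
T1 = {10, 20, 30}
T2 = {(100, 10), (100, 20), (100, 30), (104, 10), (104, 30)}

def divide(pairs, divisor):
    """Return the EMP_IDs that are paired with every value in divisor."""
    employees = {emp for emp, _ in pairs}
    return {
        emp for emp in employees
        if divisor <= {dept for e, dept in pairs if e == emp}
    }

works_in_all = divide(T2, T1)
```

Employee 104 is left out because there is no (104, 20) pair, so 104 does not work in all departments.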
3.11 TUPLE RELATIONAL CALCULUS
A tuple variable is also called a range variable: it ranges over some
relation.
A tuple variable is a variable that
takes on tuples of a particular relation schema as values.
A tuple relational calculus query has the form {T | P(T)}, where T is a
tuple variable and P(T) denotes a formula that describes T.
To find all teachers whose salary is above 10000:
Query: {t | TEACHER(t) and t.salary > 10000}
To retrieve first and last names:
Query: {t.fname, t.lname | TEACHER(t) and t.salary >
100000}
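A tuple relational calculus query reads much like a set comprehension, and the two queries above can be mirrored in Python. The TEACHER tuples are invented sample data; the thresholds 10000 and 100000 are the ones that appear in the two queries.

```python
from collections import namedtuple

Teacher = namedtuple("Teacher", ["fname", "lname", "salary"])

# Hypothetical contents of the TEACHER relation.
TEACHER = {
    Teacher("Asha", "Rao", 120000),
    Teacher("Ben", "Lee", 12000),
    Teacher("Cy", "Park", 8000),
}

# Whole tuples t such that TEACHER(t) and t.salary > 10000.
above_10000 = {t for t in TEACHER if t.salary > 10000}

# Only fname and lname of tuples t with TEACHER(t) and t.salary > 100000.
names_above_100000 = {(t.fname, t.lname) for t in TEACHER if t.salary > 100000}
```

The part before the for keyword corresponds to what the query returns, and the if condition corresponds to the formula P(t).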
Syntax of TRC Queries:
A predicate followed by its arguments is called an atomic formula. An
atomic formula is one of the following:
R ∈ Rel
R.a op S.b
R.a op constant, or constant op R.a
A formula is recursively defined to be one of the following, where p and q are
themselves formulas and p(R) denotes a formula in which the
variable R appears:
Any atomic formula
¬p, p ∧ q, p ∨ q, or p ⇒ q
∃R (p(R)), where R is a tuple variable
∀R (p(R)), where R is a tuple variable
The quantifiers ∃ and ∀ are said to bind the variable R.
Example: DBMS(x), COMPANY(y)
A language consists of symbols, i.e. variables, constants and
predicates.
Example: P(x) ∨ Q(y)
P(x) ∧ Q(y)
UNIT - IV
4.1 DATABASE DESIGN AND THE E-R MODEL
Database design is the process of producing a detailed data
model of a database. This data model contains all the logical and
physical design choices and physical storage parameters needed to
generate a design in a data definition language, which can then be used
to create a database. A fully attributed data model contains detailed
attributes for each entity.
A data processing system may involve some combination of:
Attributes:-
There exists a domain or range of values that can be assigned to
attributes. For example, a student's name cannot be a numeric value. It
has to be alphabetic. A student's age cannot be negative, etc.
Types of Attributes:-
Candidate Key − A minimal super key is called a candidate key.
An entity set may have more than one candidate key.
Primary Key − A primary key is one of the candidate keys chosen
by the database designer to uniquely identify the entity set.
Relationship:-
Relationship Set:-
Degree of Relationship:-
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities:-
Many-to-one − More than one entity from entity set A can be
associated with at most one entity of entity set B; however, an entity
from entity set B can be associated with more than one entity from
entity set A.
Many-to-many − One entity from A can be associated with more
than one entity from B and vice versa.
Let us now learn how the ER Model is represented by means of an
ER diagram. Any object, for example, entities, attributes of an
entity, relationship sets, and attributes of relationship sets, can be
represented with the help of an ER diagram.
Entity:-
Entities are represented by means of rectangles. Rectangles are
named with the entity set they represent.
Attributes:-
Multivalued attributes are depicted by double ellipse.
Relationship:-
Binary Relationship and Cardinality:-
Many-to-many − The following image reflects that more than one
instance of an entity on the left and more than one instance of an
entity on the right can be associated with the relationship. It depicts
many-to-many relationship.
Participation Constraints:-
4.4 CONSTRAINTS
Constraints enforce limits on the data or type of data that can be
inserted into, updated in, or deleted from a table. The whole purpose of constraints is
to maintain data integrity during an update/delete/insert on a
table. In this section we will learn several types of constraints that can be
created in an RDBMS.
Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Domain constraints
Mapping constraints
Not Null:-
The NOT NULL constraint makes sure that a column does not hold a NULL
value. When we don't provide a value for a particular column while inserting
a record into a table, it takes the NULL value by default. By specifying the NOT NULL
constraint, we can be sure that the particular column(s) cannot have NULL
values.
Example:
CREATE TABLE STUDENT(
Key Constraints:-
Primary Key:
Each table has a certain set of columns, and each column allows only the
same type of data, based on its data type. The column does not accept
values of any other data type.
A domain constraint is a user-defined data type, and we can define it
like this:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE /
PRIMARY KEY / FOREIGN KEY / CHECK / DEFAULT)
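The NOT NULL, UNIQUE, CHECK and DEFAULT constraints listed in this section can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the student table, its columns and the sample rows are invented, and the behaviour shown is SQLite's, which may differ in detail from other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student (
        roll_no  integer not null,            -- NOT NULL
        name     text    not null,
        email    text    unique,              -- UNIQUE
        age      integer check (age >= 0),    -- CHECK
        city     text    default 'Unknown'    -- DEFAULT
    )
""")
conn.execute(
    "insert into student (roll_no, name, email, age) "
    "values (1, 'Asha', 'a@x.org', 19)")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_rejected = rejected(
    "insert into student (roll_no, name) values (2, NULL)")
dup_rejected = rejected(
    "insert into student (roll_no, name, email) values (3, 'Ben', 'a@x.org')")
check_rejected = rejected(
    "insert into student (roll_no, name, age) values (4, 'Cy', -1)")
default_city = conn.execute(
    "select city from student where roll_no = 1").fetchone()[0]
```

Each rejected insert breaks exactly one constraint, and the first row picks up the default city because no city was supplied.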
At first glance an entity relationship diagram looks very much like
a flowchart. It is the specialized symbols, and the meanings of those
symbols, that make it unique.
Common Entity Relationship Diagram Symbols:-
An ER diagram is a means of visualizing how the information a
system produces is related. There are five main components of an ERD:
Entity Relationship Diagram Symbols — Chen notation
Attributes
Multivalued attribute − An attribute that can have many
values (there are many distinct values
entered for it in the same column of
the table). A multivalued attribute is
depicted by a dual oval.
Derived attribute − An attribute whose value is calculated
(derived) from other attributes. The
derived attribute may or may not be
physically stored in the database. In
the Chen notation, this attribute is
represented by a dashed oval.
Relationships
Strong relationship − A relationship where an entity is
existence-independent of other
entities, and the PK of the Child doesn't
contain a PK component of the Parent Entity.
A strong relationship is represented by
a single rhombus.
Weak (identifying) relationship − A relationship where the Child entity is
existence-dependent on the parent, and the
PK of the Child Entity contains a PK
component of the Parent Entity. This
relationship is represented by a double
rhombus.
Symbol Meaning
Relationships (Cardinality and Modality)
Zero or One
One or More
Zero or More
Many - to - One
a one through many notation on one side of a relationship and a one and only one on the other
a zero through many notation on one side of a relationship and a one and only one on the other
a one through many notation on one side of a relationship and a zero or one notation on the other
a zero through many notation on one side of a relationship and a zero or one notation on the other
Many - to - Many
a zero through many on both sides of a relationship
The notions of an entity set and a relationship set are not precise,
and it is possible to define a set of entities and the relationships among
them in a number of different ways. In this section, we examine basic
issues in the design of an E-R database schema.
For example, it is incorrect to model customer-id as an attribute
of loan even if each loan had only one customer.
The relationship borrower is the correct way to represent the
connection between loans and customers, since it makes their
connection explicit, rather than implicit via an attribute.
Another related mistake that people sometimes make is to
designate the primary key attributes of the related entity sets as
attributes of the relationship set. This should not be done, since the
primary key attributes are already implicit in the relationship.
Use of Entity Sets versus Relationship Sets
This approach can also be useful in deciding whether certain
attributes may be more appropriately expressed as relationships.
Binary versus n-ary Relationship Sets
• (ei, bi) in RB
• (ei, ci) in RC
We can generalize this process in a straightforward manner to n-ary
relationship sets.
Thus, conceptually, we can restrict the E-R model to include only
binary relationship sets.
However, this restriction is not always desirable.
An identifying attribute may have to be created for the entity set
created to represent the relationship set.
This attribute, along with the extra relationship sets required,
increases the complexity of the design and (as we shall see in
Section 2.9) overall storage requirements.
An n-ary relationship set shows more clearly that several entities
participate in a single relationship.
There may not be a way to translate constraints on the ternary
relationship into constraints on the binary relationships.
For example, consider a constraint that says that R is many-to-one
from A, B to C; that is, each pair of entities from A and B is
associated with at most one C entity. This constraint cannot be
expressed by using cardinality constraints on the relationship
sets RA, RB , and RC .
We cannot directly split works-on into binary relationships between
employee and branch and between employee and job.
If we did so, we would be able to record that Jones is a manager and
an auditor and that Jones works at Perryridge and Downtown;
however,
we would not be able to record that Jones is a manager at
Perryridge and an auditor at Downtown, but is not an auditor at
Perryridge or a manager at Downtown.
The relationship set works-on can be split into binary relationships
by creating a new entity set as described above. However, doing so
would not be very natural.
Placement of Relationship Attributes
that a customer may have one or more accounts, and that an
account can be held by one or more customers.
If we are to express the date on which a specific customer last
accessed a specific account, access-date must be an attribute of
the depositor relationship set, rather than either one of the
participating entities.
If access-date were an attribute of account, for instance, we could
not determine which customer made the most recent access to a
joint account.
When an attribute is determined by the combination of participating
entity sets, rather than by either entity separately, that attribute
must be associated with the many-to-many relationship set. This is why
access-date is placed as a relationship attribute.
Specialization
Consider an entity set person, with attributes name, street, and city.
A person may be further classified as one of the following:
• customer
• employee
Each of these person types is described by a set of attributes that
includes all the attributes of entity set person plus possibly additional
attributes. For example, customer entities may be described further by
the attribute customer-id, whereas employee entities may be described
further by the attributes employee-id and salary. The process of
designating subgroupings within an entity set is called specialization.
The specialization of person allows us to distinguish among persons
according to whether they are employees or customers.
• officer
• teller
• secretary
Each of these employee types is described by a set of attributes
that includes all the attributes of entity set employee plus additional
attributes. For example, officer entities may be described further by the
attribute office-number, teller entities by the attributes station-number
and hours-per-week, and secretary entities by the attribute
hours-per-week. Further, secretary entities may participate in a
relationship secretary-for, which identifies which employees are assisted
by a secretary.
An entity set may be specialized by more than one distinguishing
feature. In our example, the distinguishing feature among employee
entities is the job the employee performs. Another, coexistent,
specialization could be based on whether the person is a temporary
(limited-term) employee or a permanent employee, resulting in the entity
sets temporary-employee and permanent-employee. When more than one
specialization is formed on an entity set, a particular entity may belong to
multiple specializations. For instance, a given employee may be a
temporary employee who is a secretary.
Generalization
There are similarities between the customer entity set and
the employee entity set in the sense that they have several attributes in
common. This commonality can be expressed by generalization, which
is a containment relationship that exists between a higher-level entity set
and one or more lower-level entity sets. In our example, person is the
higher-level entity set and customer and employee are lower-level entity
sets. Higher- and lower-level entity sets also may be designated by the
terms superclass and subclass, respectively. The person entity set is
the superclass of the customer and employee subclasses.
Differences in the two approaches may be characterized by their starting
point and overall goal.
Attribute Inheritance
participates. The officer, teller, and secretary entity sets can participate in
the works-for relationship set, since the superclass employee participates
in the works-for relationship. Attribute inheritance applies through all tiers
of lower-level entity sets. The above entity sets can participate in any
relationships in which the person entity set participates.
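Attribute inheritance through several tiers resembles class inheritance in a programming language. The Python sketch below is an analogy only, not part of the E-R model: the attribute names come from the text (with hyphens written as underscores), while the sample values are invented.

```python
from dataclasses import dataclass

@dataclass
class Person:                  # higher-level entity set
    name: str
    street: str
    city: str

@dataclass
class Employee(Person):        # lower-level entity set of person
    employee_id: str
    salary: int

@dataclass
class Officer(Employee):       # lower-level entity set of employee
    office_number: int

# An officer carries the attributes of employee and of person as well.
officer = Officer("Jones", "Main", "Harrison", "E-17", 50000, 12)
officer_attributes = list(officer.__dataclass_fields__)
```

The list officer_attributes starts with the three person attributes, followed by the employee attributes and finally office_number, which shows inheritance passing through both tiers.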
Whether a given portion of an E-R model was arrived at by
specialization or generalization, the outcome is basically the same:
A higher-level entity set with attributes and relationships that
apply to all of its lower-level entity sets
Lower-level entity sets with distinctive features that apply only
within a particular lower-level entity set
In what follows, although we often refer to only generalization, the
properties that we discuss belong fully to both processes.
In a hierarchy, a given entity set may be involved as a lower-level
entity set in only one ISA relationship; that is, entity sets in this diagram
have only single inheritance. If an entity set is a lower-level entity set in
more than one ISA relationship, then the entity set has multiple
inheritance, and the resulting structure is said to be a lattice.
Constraints on Generalizations
account and savings-account into account is a total, disjoint
generalization. The completeness and disjointness constraints, however,
do not depend on each other. Constraint patterns may also be partial-
disjoint and total-overlapping.
Aggregation
combine them into a single relationship, since
some employee, branch, job combinations may not have a manager.
The best way to model a situation such as the one just described is
to use aggregation. Aggregation is an abstraction through which
relationships are treated as higher-level entities. Thus, for our example,
we regard the relationship set works-on (relating the entity
sets employee, branch, and job) as a higher-level entity set called works-on.
Such an entity set is treated in the same manner as is any other entity
set. We can then create a binary relationship manages between works-on
and manager to represent who manages what tasks.
A properly designed database has many benefits. The processes of
adding, editing, deleting, and retrieving table data are greatly facilitated
by a properly designed database. In addition, reports are easier to build.
Most importantly, the database becomes easy to modify and maintain.
"Atomic" has never really meant "indivisible", which is why that term is
finally falling out of favor. Loosely speaking, "atomic" means if a value has
component parts, the dbms either ignores the existence of those parts, or
it provides functions to manipulate them. For example, a timestamp value
has these parts.
Year
Month
Day
Hours
Minutes
Seconds
Milliseconds
Rules of atomicity:-
Rule 1: a column with atomic data can't have several values of the
same type of data in the same column.
Rule 2: a table with atomic data can't have several columns with the
same type of data.
For example, a fullname column cannot be called atomic, because it can be
further divided into lastname and firstname. A column holding interests could also
be divided further. A column which can't be divided further is known as atomic.
Example :-
Domains and values
Below is a table that stores the names and
telephone numbers of customers. One requirement, though, is to retain
multiple telephone numbers for some customers. The simplest way of
satisfying this requirement is to allow the "Telephone Number" column in
any given row to contain more than one value:
Customer
Customer ID   First Name   Surname   Telephone Number
123           Pooja        Singh     555-861-2025, 192-122-1111
456           Zhang        San       (555) 403-1659 Ext. 53; 182-929-2929
would be fine. But it's not arbitrary text at all: we obviously intended this
numbers, the text contains more than one number in two of our rows. This
columns contain non-atomic values, and they contain more than one of
them.
To bring the model into the first normal form, we split the strings we
Note that the "ID" is no longer unique in this solution with duplicated
and the number tables. A row in the "parent" table, Customer Name, can
be associated with many telephone number rows in the "child"
belongs to one, and only one customer. It is worth noting that this design
meets the additional requirements for second and third normal form.
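The split into a parent table and a child table can be sketched in Python. The two customer rows are the ones from the Customer table above; the list names customer_name and customer_telephone, and the assumption that numbers are separated by a comma or a semicolon, are made for illustration.

```python
import re

# The Customer rows from the table above, with the phone numbers in one string.
customers = [
    (123, "Pooja", "Singh", "555-861-2025, 192-122-1111"),
    (456, "Zhang", "San", "(555) 403-1659 Ext. 53; 182-929-2929"),
]

customer_name = []        # parent table: one row per customer
customer_telephone = []   # child table: one row per (customer, number)

for cust_id, first, last, phones in customers:
    customer_name.append((cust_id, first, last))
    # Assumes numbers are separated by a comma or a semicolon.
    for number in re.split(r"[,;]", phones):
        customer_telephone.append((cust_id, number.strip()))
```

Every row of customer_telephone now holds exactly one number, and the customer ID links it back to a single row of customer_name.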
Customer
Atomicity
'atomicity'. Codd states that the "values in the domains on which each
than one kind of data in it such that what one part means to the DBMS
Hugh Darwen and Chris Date have suggested that Codd's concept of
would seem to imply that few, if any, data types are atomic:
A character string would seem not to be atomic, as the RDBMS typically
provides operators to decompose it into substrings.
A fixed-point number would seem not to be atomic, as the RDBMS
typically provides operators to decompose it into integer and fractional
components.
An ISBN would seem not to be atomic, as it includes language and
publisher identifier.
Date suggests that "the notion of atomicity has no absolute meaning":
a value may be considered atomic for some purposes, but may be
considered an assemblage of more basic elements for other purposes. If
this position is accepted, 1NF cannot be defined with reference to
atomicity. Columns of any conceivable data type (from string types and
numeric types to array types and table types) are then acceptable in a
1NF table—although perhaps not always desirable; for example, it would
be more desirable to separate a Customer Name column into two
separate columns as First Name, Surname.
First normal form, as defined by Chris Date, permits relation-
valued attributes (tables within tables). Date argues that relation-
valued attributes, by means of which a column within a table can contain
a table, are useful in rare cases.
1NF tables as representations of relations
According to Date's definition, a table is in first normal form if and
only if it is "isomorphic to some relation", which means, specifically, that it
satisfies the following five conditions:
1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from
the applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such
as row IDs, object IDs, or hidden timestamps].
Violation of any of these conditions would mean that the table is not
strictly relational, and therefore that it is not in first normal form.
Examples of tables (or views) that would not meet this definition of
first normal form are:
A table that lacks a unique key. Such a table would be able to
accommodate duplicate rows, in violation of condition 3.
A view whose definition mandates that results be returned in a
particular order, so that the row-ordering is an intrinsic and
meaningful aspect of the view. This violates condition 1.
The tuples in true relations are not ordered with respect to each
other.
A table with at least one nullable attribute. A nullable attribute
would be in violation of condition 4, which requires every column to
contain exactly one value from its column's domain. It should be
noted, however, that this aspect of condition 4 is controversial. It
marks an important departure from Codd's later vision of
the relational model, which made explicit provision for nulls.
α → β
holds on R if and only if for any legal relation r(R), whenever any
two tuples t1 and t2 of r agree on the attributes α, they also agree on
the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
K → R, and for no α ⊂ K, α → R
R = (A, B, C, G, H, I)
F = { A → B
A → C
CG → H
CG → I
B → H }
Some members of F+:
A → H by transitivity from A → B and B → H
AG → I by augmenting A → C with G, to get AG → CG,
and then transitivity with CG → I
CG → HI by augmenting CG → I to infer CG → CGI, and
augmenting CG → H to infer CGI → HI, and then transitivity
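One way to check such members of F+ mechanically is to compute an attribute closure. The sketch below is a minimal illustration, assuming single-letter attribute names so that a string such as "CG" stands for the set {C, G}; the function names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds (rhs lies in the closure of lhs)."""
    return set(rhs) <= closure(lhs, fds)

# The set F from the example above, one (left side, right side) pair per dependency.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

a_h = implies(F, "A", "H")      # A -> H
ag_i = implies(F, "AG", "I")    # AG -> I
cg_hi = implies(F, "CG", "HI")  # CG -> HI
```

All three checks succeed for the set F given above, while a dependency such as A → I is not implied, because the closure of A does not contain G and so CG → I can never be applied.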
Formal theory of functional dependencies
Rules for computing with these
Can be used when showing that a database satisfies normal forms
And for giving a formal definition of e.g. dependency preserving
decomposition
Also important for decomposition algorithms. An implied functional
dependency is one that follows from other stated ones. Assume, e.g.,
ID → dept_name
dept_name → budget
Then the following dependency is implied, by transitivity:
ID → budget
The definitions of the normal forms talk about all functional
dependencies, including the implied ones.
Decomposition using Multivalued Dependencies
But wait! There’s more! Recall that, at the beginning of this whole
exercise, we established that the ultimate goal of all of this is to
eliminate redundancy.
Unfortunately, BCNF does not completely eradicate all cases of
redundancy. There isn’t a precise rule that characterizes such a
schema, but a general intuitive notion is for schemas that involve a
relationship among entities which either have multivalued attributes
or are involved in other 1-to-many relationships. Such schemas will
allow copies of data without violating any functional dependencies.
This final redundancy is dealt with using a new type of constraint,
called a multivalued dependency, and an accompanying normal
form, fourth normal form (4NF).
To define a multivalued dependency, we begin with the observation
that a functional dependency rules out tuples — it prevents certain
tuples from appearing in a relation. But what about the converse?
What if we want to require the presence of a tuple? That’s where
multivalued dependencies come in.
Because of this distinction, functional dependencies are sometimes
referred to as equality-generating, and multivalued dependencies
are referred to as tuple-generating.
Let R be a relation schema with attribute sets α ⊆ R and β ⊆ R. A
multivalued dependency of β on α, written as α →→ β, holds on R if,
for any legal relation r(R) and all tuples t1, t2 ∈ r with t1[α] = t2[α],
there exist tuples t3, t4 ∈ r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]
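The definition can be turned into a direct check on a small relation. The following is a minimal sketch; the function name `satisfies_mvd`, the schema (ID, child, phone), and the sample rows are hypothetical. Because every ordered pair (t1, t2) is examined, the tuple t4 for one pair is the tuple t3 for the swapped pair, so only t3 is constructed explicitly.

```python
from itertools import product

def satisfies_mvd(rows, schema, alpha, beta):
    """Check whether alpha ->> beta holds on a relation given as a list of tuples."""
    idx = {a: i for i, a in enumerate(schema)}
    relation = set(rows)
    for t1, t2 in product(relation, repeat=2):
        # Only pairs that agree on alpha impose a requirement.
        if any(t1[idx[a]] != t2[idx[a]] for a in alpha):
            continue
        # t3 takes its beta values from t1 and all other values (R - beta) from t2.
        t3 = tuple(t1[idx[a]] if a in beta else t2[idx[a]] for a in schema)
        if t3 not in relation:
            return False
    return True

schema = ("ID", "child", "phone")
incomplete = [(1, "a", "x"), (1, "b", "y")]
complete = incomplete + [(1, "a", "y"), (1, "b", "x")]

holds_incomplete = satisfies_mvd(incomplete, schema, ["ID"], ["child"])
holds_complete = satisfies_mvd(complete, schema, ["ID"], ["child"])
```

The first relation violates ID →→ child because the required tuples (1, a, y) and (1, b, x) are absent; adding them makes the dependency hold, which illustrates the tuple-generating character of multivalued dependencies.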
First normal form excludes variable repeating fields and groups. This
is not so much a design guideline as a matter of definition. Relational
database theory doesn't deal with records having a variable number of
fields.
Second And Third Normal Forms:-
Second and third normal forms deal with the relationship between
non-key and key fields.
Under second and third normal forms, a non-key field must provide
a fact about the key, the whole key, and nothing but the key. In
addition, the record must satisfy first normal form.
We deal now only with "single-valued" facts. The fact could be a
one-to-many relationship, such as the department of an employee, or a
one-to-one relationship, such as the spouse of an employee. Thus the
phrase "Y is a fact about X" signifies a one-to-one or one-to-many
relationship between Y and X. In the general case, Y might consist of one
or more fields, and so might X. In the following example, QUANTITY is a
fact about the combination of PART and WAREHOUSE.
Second normal form is violated when a non-key field is a fact about
a subset of a key. It is only relevant when the key is composite, i.e.,
consists of several fields. Consider the following inventory record:
| PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
The key here consists of the PART and WAREHOUSE fields together,
but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. The
basic problems with this design are:
The warehouse address is repeated in every record that refers to a
part stored in that warehouse.
If the address of the warehouse changes, every record referring to a
part stored in that warehouse must be updated.
Because of the redundancy, the data might become inconsistent,
with different records showing different addresses for the same
warehouse.
If at some point in time there are no parts stored in the warehouse,
there may be no record in which to keep the warehouse's address.
To satisfy second normal form, the record shown above should be
decomposed into (replaced by) the two records:
| PART | WAREHOUSE | QUANTITY |
| WAREHOUSE | WAREHOUSE-ADDRESS |
When a data design is changed in this way, replacing unnormalized
records with normalized records, the process is referred to as
normalization. The term "normalization" is sometimes used relative to a
particular normal form. Thus a set of records may be normalized with
respect to second normal form but not with respect to third.
The normalized design enhances the integrity of the data, by
minimizing redundancy and inconsistency, but at some possible
performance cost for certain retrieval applications. Consider an
application that wants the addresses of all warehouses stocking a certain
part. In the unnormalized form, the application searches one record type.
With the normalized design, the application has to search two record
types, and connect the appropriate pairs.
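The trade-off described above can be illustrated with a small Python sketch. The sample data, the names `stock` and `warehouse`, and the function `addresses_stocking` are hypothetical; the sketch only shows that the normalized design stores each address once and that the query must connect two record types.

```python
# Hypothetical unnormalized inventory records:
# (PART, WAREHOUSE, QUANTITY, WAREHOUSE-ADDRESS)
inventory = [
    ("bolt", "W1", 100, "12 Dock Road"),
    ("nut", "W1", 250, "12 Dock Road"),
    ("bolt", "W2", 40, "7 Mill Lane"),
]

# Normalized design: QUANTITY is a fact about (PART, WAREHOUSE),
# while the address is a fact about WAREHOUSE alone and is stored once.
stock = [(part, wh, qty) for part, wh, qty, _ in inventory]
warehouse = {wh: addr for _, wh, _, addr in inventory}

def addresses_stocking(part):
    """Connect stock records to warehouse records to find the addresses for a part."""
    return sorted(warehouse[wh] for p, wh, _ in stock if p == part)

bolt_addresses = addresses_stocking("bolt")
```

In the unnormalized list the address of W1 appears twice; in the normalized form it appears once, at the cost of a lookup in the second record type for each matching stock record.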
Third Normal Form:-
In this case, it turns out that we can reconstruct all the true facts
from a normalized form consisting of three separate record types, each
containing two fields:
UNIT - V
5.1 DATABASE SYSTEM ARCHITECTURE
The architecture of a database system is greatly influenced by the
underlying computer system on which it runs, in particular by such aspects
of computer architecture as networking, parallelism, and distribution.
Networking of computers allows some tasks to be executed on server
systems and some tasks to be executed on client systems. This division of
work has led to client-server database systems.
Parallel processing within a computer system allows database
system activities to be sped up, allowing faster response to transactions,
as well as more transactions per second. The need for parallel query
processing has led to parallel database systems.
Keeping multiple copies of the database across different sites also
allows large organizations to continue their database operations even
when one site is affected by a natural disaster. Distributed
database systems handle geographically or administratively distributed
data spread across multiple database systems.
5.2 CENTRALIZED AND CLIENT-SERVER ARCHITECTURES
Centralized database systems are those that run on a single
computer system and do not interact with other computer systems.
Client-server systems have functionality split between a server system and
multiple client systems.
Centralized Systems:-
Figure: Centralized system (components shown: disks, printer, tape drives, controllers, system bus, memory)
The CPUs have local cache memories that store local copies of parts
of the memory, to speed up access to data. Each device controller is in
charge of a specific type of device.
Personal computers and workstations fall into the category of single-user
systems. A typical single-user system is a desktop unit used by a single
person, usually with only one CPU and one or two hard disks, and with only
one person using the machine at a time.
A multi-user system has more disks, more memory, and multiple CPUs,
and has a multi-user operating system. It serves a large number of users
who are connected to the system via terminals. Database systems designed
for use by single users usually do not provide many of the facilities that
a multi-user database provides.
Although general-purpose computer systems have multiple processors,
they have coarse-granularity parallelism, with only a few processors, all
sharing the main memory. Such systems support a higher throughput; that is,
they allow a greater number of transactions to run per second, although
individual transactions do not run any faster.
Databases designed for single-processor machines provide multitasking,
allowing multiple processes to run on the same processor in a time-shared
manner, giving the user a view of multiple processes running in parallel.
Machines with fine-granularity parallelism have a large
number of processors, and database systems running on such machines
attempt to parallelize single tasks submitted by users.
Client-Server Systems:-
Figure: Client-server system (labels shown: server, network, back-end)
Data-Server Systems:-
Lock Manager Process: This process implements lock manager
functionality, which includes lock grant, lock release, and deadlock
detection.
Database Writer Process: There are one or more processes that
output modified buffer blocks back to disk on a continuous basis.
Log Writer Process: This process outputs log records from the log
record buffer to stable storage.
Checkpoint Process: This process performs periodic
checkpoints.
Process Monitor Process: This process monitors other processes,
and if any of them fails, it takes recovery actions for the process, such
as aborting any transaction being executed by the failed process,
and then restarting the process.
The shared memory contains all shared data, such as
Buffer pool
Lock table
Log buffer, containing log records waiting to be output to the log on
stable storage.
Cached query plans, which can be reused if the same query is
submitted again.
Data Servers:-
page, or fine granularity, such as a tuple. Use the term item to refer
to both tuples and objects.
Locking: Locks are usually granted by the server for the data items
that it ships to the client machines. Techniques for lock
de-escalation have been proposed, in which the server can request its
clients to transfer back locks on prefetched items. If the client
machine does not need a prefetched item, it can transfer locks on
the item back to the server, and the locks can then be allocated to
other clients.
Data Caching: Data that are shipped to a client on behalf of a
transaction can be cached at the client, even after the transaction
completes, if sufficient storage space is available.
Lock Caching: If the data are mostly partitioned among the clients,
with clients rarely requesting data that are also requested by other
clients, locks can also be cached at the client machine. The server must
keep track of cached locks; if a client requests a lock from the
server, the server must call back all conflicting locks on the data
item from any other client machines that have cached the locks.
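The bookkeeping that lock caching requires of the server can be sketched as follows. This is a deliberately simplified toy, not a real lock manager: the class name `LockServer` is hypothetical, only exclusive locks are modelled, and a callback is assumed to succeed immediately.

```python
class LockServer:
    """Toy server that tracks which client has cached the lock on each item."""

    def __init__(self):
        self.cached = {}     # item -> client that currently caches its lock
        self.callbacks = []  # record of (item, client) callbacks issued

    def request_lock(self, client, item):
        holder = self.cached.get(item)
        if holder is not None and holder != client:
            # Conflict: call the lock back from the client that cached it.
            self.callbacks.append((item, holder))
        # Assumption: the callback succeeds at once, so the lock can be granted.
        self.cached[item] = client
        return True

server = LockServer()
server.request_lock("c1", "x")  # first request, nothing to call back
server.request_lock("c1", "x")  # same client again, still no conflict
server.request_lock("c2", "x")  # conflicting request forces a callback from c1
```

When the data are well partitioned, most requests take the cheap path with no callback, which is the situation in which lock caching pays off.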
Figure: Speedup with Increasing Resources (horizontal axis: resources; curve shown: linear speedup)
There are two kinds of scaleup that are relevant in parallel database
systems, depending on how the size of the task is measured:
In batch scaleup, the size of the database increases, and the tasks
are large jobs whose runtime depends on the size of the database.
In transaction scaleup, the rate at which transactions are submitted to the
database increases and the size of the database increases proportionally
to the transaction rate.
A number of factors work against efficient parallel operation and can
diminish both speedup and scaleup.
Bus All the system components can send data on and receive data
from a single communication bus.
Mesh The components are nodes in a grid, and each component
connects to all its adjacent components in the grid. In a two-dimensional
mesh each node connects to four adjacent nodes. The
number of communication links grows as the number of
components grows, and the communication capacity of a mesh
therefore scales better with increasing parallelism.
Hypercube The components are numbered in binary, and a
component is connected to another if the binary representations of
their numbers differ in exactly one bit. Thus, each of the n
components is connected to log(n) other components.
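The hypercube rule can be made concrete with a few lines of Python. The sketch assumes that the number of components n is a power of two and that components are numbered from 0; the function name `hypercube_neighbors` is hypothetical.

```python
def hypercube_neighbors(node, n):
    """Neighbors of `node` in a hypercube of n components (n a power of two)."""
    dims = n.bit_length() - 1  # log2(n) when n is a power of two
    # Flipping exactly one bit of the node number yields one neighbor.
    return [node ^ (1 << bit) for bit in range(dims)]

neighbors_of_5 = hypercube_neighbors(5, 8)  # 5 is 101 in binary
```

With n = 8 components, node 5 (binary 101) is connected to nodes 4 (100), 7 (111), and 1 (001), that is, to log(8) = 3 other components.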
Parallel Database Architectures:-
Shared Memory:-
Shared Disk:-
provide a degree of fault tolerance: If a processor fails, the other
processors can take over its tasks, since the database is resident on disks
that are accessible from all processors.
The main problem with a shared-disk
system is again scalability. Compared to shared-memory systems, shared-disk
systems can scale to a somewhat larger number of processors, but
communication across processors is slower, since it has to go through a
communication network.
Shared Nothing:-
Hierarchical:-
5.5 DISTRIBUTED SYSTEMS
In a distributed database system, the database is stored on several
computers. The computers in a distributed system communicate with one
another through various communication media, such as high–speed
networks or telephone lines. They do not share main memory or disks.
The computers in a distributed system may vary in size and function,
ranging from workstations up to mainframe systems. The computers in a
distributed system are referred to by a number of different names, such
as sites or nodes, depending on the context in which they are
mentioned. A local transaction is one that accesses data only from
the site where the transaction was initiated. A global transaction, on the
other hand, is one that either accesses data in a site different from the
one at which the transaction was initiated, or accesses data in several
different sites.
There are several reasons for building distributed database systems:
recover from the failure. It is crucial for database systems used for
real-time applications.
Figure: A Distributed System (sites A, B, and C communicating via a network)
Implementation Issues:-
Local-Area Networks
Wide-Area Networks
Local-Area Networks:-
High availability, since data is still accessible even if a
computer fails.
Wide-Area Network:-
Replication The system maintains several identical replicas of the
relation, and stores each replica at a different site. The alternative
to replication is to store only one copy of relation r.
Fragmentation The system partitions the relation into several
fragments, and stores each fragment at a different site.
a. Data Replication
If relation r is replicated, a copy of relation r is stored in two or more
sites. In full replication, a copy is stored at every site in the system.
Horizontal Fragmentation
Vertical Fragmentation
1. Horizontal Fragmentation
Horizontal fragmentation splits the relation by assigning
each tuple of r to one or more fragments. A relation r is partitioned into a
number of subsets, r1, r2, …, rn. Each tuple of relation r must belong to at
least one of the fragments, so that the original relation can be
reconstructed.
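A minimal Python sketch of this idea follows. The relation `account`, its attributes, and the choice to fragment by branch name are hypothetical; the sketch only shows that the union of the fragments gives back the original relation.

```python
# Hypothetical relation: (account_number, branch_name, balance)
account = [
    ("A-305", "Hillside", 500),
    ("A-226", "Hillside", 336),
    ("A-177", "Valleyview", 205),
]

def horizontal_fragments(relation, key):
    """Assign each tuple to the fragment selected by key(tuple)."""
    fragments = {}
    for t in relation:
        fragments.setdefault(key(t), []).append(t)
    return fragments

def reconstruct(fragments):
    """Rebuild the original relation as the union of all fragments."""
    result = set()
    for fragment in fragments.values():
        result |= set(fragment)
    return result

fragments = horizontal_fragments(account, lambda t: t[1])  # one fragment per branch
rebuilt = reconstruct(fragments)
```

Each fragment could then be stored at the site of its branch, and any query that needs the whole relation takes the union of the fragments.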
Example
2. Vertical Fragmentation
Transparency:-
should be able to find any data as long as the data identifier is
supplied by the user transaction.
Local Transaction
Global Transaction
The Local Transactions are those that access and update data in
only one local database. The Global Transactions are those that access
and update data in several local databases.
System Structure:-
Each site has its own local transaction manager, whose function is
to ensure the ACID properties of those transactions that
execute at that site. To understand how such a manager can be
implemented, consider an abstract model of a transaction system, in
which each site contains two subsystems:
Coordinating the termination of the transaction, which may result in the
transaction being committed at all sites or aborted at all sites.
Figure: System architecture (a transaction coordinator TC1 ... TCn and a transaction manager TM1 ... TMn at each of computer 1 ... computer n)
A distributed system may also suffer from the same types of failure
that a centralized system does. The basic failure types are
Failure of a site
Loss of messages
Failure of a communication link
Network partition
The loss or corruption of messages is always a possibility in a
distributed system. The system uses transmission-control protocols, such
as TCP/IP, to handle such errors.
Ci adds the record <prepare T> to the log, and forces the log onto
stable storage. It then sends a prepare T message to all sites at which T
executed. On receiving such a message, the transaction manager at that
site determines whether it is willing to commit its portion of T. If the
answer is no, it adds a record <no T> to the log, and then responds by
sending an abort T message to Ci. If the answer is yes, it adds a record
<ready T> to the log, and forces the log onto stable storage. The
transaction manager then replies with a ready T message to Ci.
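The message exchange of this first phase can be sketched as a small simulation. The function name `phase_one` and the site names are hypothetical, logs are modelled as plain lists rather than stable storage, and message delivery is assumed to be reliable.

```python
def phase_one(transaction, votes):
    """Simulate phase 1 of two-phase commit.

    `votes` maps each participating site to True (willing to commit its
    portion of the transaction) or False.
    """
    # The coordinator logs <prepare T> before sending any prepare message.
    coordinator_log = [f"<prepare {transaction}>"]
    site_logs, replies = {}, {}
    for site, willing in votes.items():
        if willing:
            site_logs[site] = [f"<ready {transaction}>"]
            replies[site] = f"ready {transaction}"
        else:
            site_logs[site] = [f"<no {transaction}>"]
            replies[site] = f"abort {transaction}"
    return coordinator_log, site_logs, replies

log, sites, replies = phase_one("T", {"S1": True, "S2": False})
all_ready = all(reply.startswith("ready") for reply in replies.values())
```

In this run site S2 answers with abort T, so `all_ready` is false; in two-phase commit the coordinator can decide to commit only if every site replies ready.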
Phase 2
c. Handling of Failures
d. Network Partition
The coordinator and its participants belong to several partitions.
From the view point of the sites in one of the partitions, it appears
that the sites in other partitions have failed.
e. Recovery and Concurrency Control
f. Three-Phase Commit
For example
a. Majority-Based Approach
actions such as concurrency control and recovery are performed at a
single site, and only data and log records are replicated at the other site.
Remote backup systems offer a lower-cost approach to high availability
than replication. On the other hand, replication can provide greater
availability by having multiple replicas available, and using the majority
protocol.
d. Coordinator Selection
Vendors of cloud services
Traditional computing vendors, Amazon, Google
Cloud-based database:-
Web applications need to store and retrieve data for very large
numbers of users
Traditional parallel databases not designed to scale to 1000’s of
nodes (and expensive)
Value availability and scalability over consistency
Storing and retrieving data items by key value is minimum
functionality
Key-value stores
Systems for data storage on the cloud:-
Bigtable from Google
HBase, an open source clone of Bigtable
Dynamo, which is a key-value storage system from Amazon
Simple Storage Service (S3) from Amazon
Cassandra from Facebook
Sherpa/PNUTs from Yahoo!
without having to share a file system. Several Directory Access
Protocols have been developed to provide a standardized way of
accessing data in a directory. The most widely used among them today is
the Lightweight Directory Access Protocol (LDAP).
d. Data Manipulation
exchanging information. The querying mechanism in LDAP is very simple,
consisting of just selections and projections, without any join. A query
must specify the following:
SECTION - A
2. Draw an E-R Model for an Employee Data
3. What are the functions of a DBA
4. List out the Fundamental operations of a Relational Algebra
5. What are the string operations in SQL?
6. Define a Null value
7. Define Referential Integrity
8. Define Triggers
9. What is atomic domain?
10. Define commit protocols?
SECTION - B
Answer ALL the Questions. (5 X 5 = 25)
11. a).Explain the Levels of abstraction
(Or)
b).Write short notes on the following
i) Types of Attributes ii) Symbols in ER Diagram
12. a). Explain the Tuple Relational Calculus
(Or)
b). Explain the Domain Relational Calculus
13. a). Write short notes on Views
(Or)
b). Explain the Modification of database in SQL
14. a). Write about Joined relations with examples
(Or)
b).What are the Domain types/Data types in SQL
15. a).Explain in detail about distributed transactions?
(Or)
b).Explain the concepts of cloud based database?
SECTION – C
Commerce
DATABASE MANAGEMENT SYSTEM
Time: Three hours Maximum marks: 75 marks
SECTION - A
1. Define DBMS.
2. Define Atomicity.
3. Differentiate Unary & Binary Algebra Operators.
4. Give the Syntax for DRC.
5. List out the Symbols of ER-Diagram.
6. Differentiate Strong & Weak Entity set.
7. What is homogeneous and heterogeneous database?
8. Define constraints
9. What is authorization?
10. What is null values with examples?
SECTION – B
15. a. Briefly explain about the network types?
(Or)
b. Briefly explain about the basic concepts of parallel systems?
SECTION – C
Answer ANY THREE Questions. (3 X 10 = 30)
16. Explain the E-R model with examples.
17. Describe the basic structure of SQL query.
18. Explain the relational algebra with examples?
19. Briefly explain about the decomposition using multivalued
dependency?
20. Write in detail about the directory systems with examples.
OBJECTIVE TEST-I
3. A set of operations that take one or two relations as input and produce
a new relation as their result is called as -------
5. The operation returns all rows that appear in either or both of two
tables is ----------------------