DBP Notes
UNIT 2
RELATIONAL DATA MODEL
Entity Relationship Model – Relational Data Model – Mapping Entity Relationship Model to Relational
Model – Relational Algebra – Structured Query Language – Database Normalization.
1. INTRODUCTION
Data: Known facts that can be recorded that have implicit meaning.
E.g. student roll numbers, names, addresses, etc.
DBMS: A DBMS is a collection of interrelated data and a set of programs to access those data. The primary
goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and
efficient.
Database Applications
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Credit card transactions
Telecommunications & Finance
For example: If a failure occurs during a transfer of funds from account A to account B, the amount may
be debited from A but not credited to B, leaving the database in an inconsistent state.
vi. Concurrent Access Anomalies
In order to improve the overall performance of the system and obtain a faster response
time many systems allow multiple users to update the data simultaneously. In such
environment, interaction of concurrent updates may result in inconsistent data.
For example: Consider bank account A, containing $500. If two customers withdraw funds (say $50 and
$100, respectively) from account A at about the same time, the result of the concurrent executions may
leave the account in an incorrect (or inconsistent) state: the balance may be $450 or $400 instead of the
correct $350. To protect against this possibility, the system must maintain some form of supervision.
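A minimal Python sketch of one such interleaving follows. It does not use real threads; instead it replays, step by step, the read-modify-write sequence that two unsynchronized withdrawal programs might execute. The function names are hypothetical.

```python
# Deterministic replay of two withdrawals whose read/write steps interleave.
# Each "transaction" reads the balance into a local variable, subtracts its
# amount, and later writes the local value back (no locking of any kind).

def interleaved_withdrawals(balance, first, second):
    account = {"A": balance}

    # Step 1: both transactions read the same starting balance.
    local_1 = account["A"]
    local_2 = account["A"]

    # Step 2: each computes its new balance privately.
    local_1 -= first
    local_2 -= second

    # Step 3: both write back; the second write overwrites the first,
    # so the first customer's update is lost.
    account["A"] = local_1
    account["A"] = local_2
    return account["A"]


def serial_withdrawals(balance, first, second):
    # One withdrawal runs to completion before the other starts.
    balance -= first
    balance -= second
    return balance


print(interleaved_withdrawals(500, 50, 100))  # 400 (incorrect)
print(serial_withdrawals(500, 50, 100))       # 350 (correct)
```

Calling interleaved_withdrawals(500, 100, 50), which lets the $50 withdrawal write last, leaves $450 instead, so the outcome of unsupervised concurrent updates depends on timing.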
vii. Security problems
Not every user of the database system should be able to access all the data. System should
be protected using proper security.
For example: In a banking system, payroll personnel should be given authority to see only the part of
the database that has information about the various bank employees. They do not need access to
information about customer accounts.
Since application programs are added to the system in an ad hoc manner, it is difficult to enforce
such security constraints.
viii. Integrity problems
The data values stored in the database must satisfy certain types of consistency constraints.
For example: The balance of a bank account may never fall below a prescribed amount (say
$100). These constraints are enforced in the system by adding appropriate code in the various
application programs.
Advantages of Database
A database is a way to consolidate and control the operational data centrally. It is a better way to control
the operational data. The advantages of having centralized control of data are:
When the same data is duplicated and changes are made at one site but are not propagated
to the other site, it gives rise to inconsistency: the two entries regarding the same data will not
agree. So, if the redundancy is removed, the chances of having inconsistent data are also removed.
iii. The data can be shared
The data stored by one application can be used by another application. Thus, the data stored in the
database for one application can be shared with new applications.
iv. Standards can be enforced
With central control of the database, the DBA can ensure that all applicable standards are
observed in the representation of the data.
v. Security can be enforced
The DBA can define the access paths for accessing the data stored in the database and can define
authorization checks whenever access to sensitive data is attempted.
vi. Integrity can be maintained
Integrity means that the data in the database is accurate. Centralized control of the data allows
the administrator to define integrity constraints on the data in the database.
3. VIEW OF DATA
A major purpose of a database system is to provide users with an abstract view of the data. That is,
the system hides certain details of how the data are stored and maintained.
Data abstraction
The complexity is hidden from users through several levels of abstraction. There are three levels
of data abstraction:
i. Physical level: It is the lowest level of abstraction that describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
ii. Logical level: It is the next higher level of abstraction that describes what data are stored in the
database and what relationships exist among those data.
iii. View level: It is the highest level of abstraction that describes only part of the entire database.
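As a rough illustration of the view level, the sketch below uses Python's built-in sqlite3 module (an assumption; any SQL DBMS would do) to define a view that exposes only part of a hypothetical employee table. The sample rows are also hypothetical.

```python
import sqlite3

# In-memory database; table, view and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    dept   TEXT NOT NULL,
    salary INTEGER NOT NULL)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Hari", "Sales", 2000), (2, "Ravi", "Accounts", 2500)])

# View level: users of this view see names and departments only; the
# salary column, and how rows are physically stored, stay hidden.
conn.execute("CREATE VIEW emp_directory AS SELECT name, dept FROM employee")

rows = conn.execute("SELECT * FROM emp_directory ORDER BY name").fetchall()
cols = [d[0] for d in conn.execute("SELECT * FROM emp_directory").description]
print(rows)  # [('Hari', 'Sales'), ('Ravi', 'Accounts')]
print(cols)  # ['name', 'dept']
```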
Data Independence
The ability to modify a schema definition at one level without affecting a schema definition at the
next higher level is called data independence. There are two kinds of data independence:
1. Physical data independence is the ability to modify the physical schema without causing application
programs to be rewritten. Modifications at the physical level are occasionally necessary in order to improve
performance.
2. Logical data independence is the ability to modify the conceptual schema without causing application
programs to be rewritten. Modifications at the conceptual level are necessary whenever the logical structure
of the database is altered.
Logical data independence is more difficult to achieve than physical data independence since
application programs are heavily dependent on the logical structure of the data they access.
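Physical data independence can be seen in a small sketch, again assuming SQLite through Python's sqlite3 module: adding an index changes the physical access paths, yet the application's query text and its result stay the same. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, branch TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("A-101", "Downtown", 500), ("A-215", "Mianus", 700),
                  ("A-305", "Downtown", 350)])

# The "application program": a fixed query that knows only the logical schema.
QUERY = "SELECT acc_no FROM account WHERE branch = ? ORDER BY acc_no"

before = conn.execute(QUERY, ("Downtown",)).fetchall()

# Physical-level change: add an index, i.e. a new access path.
conn.execute("CREATE INDEX idx_account_branch ON account(branch)")

after = conn.execute(QUERY, ("Downtown",)).fetchall()
print(before == after, after)  # True [('A-101',), ('A-305',)]
```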
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database.
iii. Subschema: A database may also have several schemas at the view level, called subschemas,
that describe different views of the database.
4. DATA MODELS
The underlying structure of a database is called a data model.
It is a collection of conceptual tools for describing data, data relationships, data semantics,
and consistency constraints.
It is a way to describe the design of the database at the physical, logical and view levels.
Different types of data models are:
Relational Model
The relational model uses a collection of tables to represent both data and the relationship among
those data.
Each table has multiple columns and each column has a unique name.
Database systems such as Oracle, Microsoft SQL Server and Sybase are based on the relational model.
The relational model is an example of a record-based model, which is based on fixed-format records of several types.
Hierarchical Model
A hierarchical database organizes data into a tree structure such that each record type has only
one owner (parent).
Hierarchical structures were widely used in the first mainframe database management systems.
Links are possible vertically but not horizontally or diagonally.
Advantages
High speed of access to large datasets.
Ease of updates.
Simplicity: the design of a hierarchical database is simple.
Data security: The hierarchical model was the first database model that offered data security
that is provided and enforced by the DBMS.
Efficiency: The hierarchical database model is a very efficient one when the database
contains a large number of transactions, using data whose relationships are fixed.
Disadvantages
Implementation complexity
Database management problems
Lack of structural independence
Network Model
The model is based on directed graph theory.
The network model replaces the hierarchical tree with a graph thus allowing more
general connections among the nodes.
The main difference of the network model from the hierarchical model is its ability to handle
many-to-many (n:n) relationships; in other words, it allows a record to have more than one parent.
An example is an employee working for two departments.
Sample network model
5. DATABASE LANGUAGES
Data definition and data manipulation languages are not two separate languages; they are parts of a single
database language such as SQL.
Data definition language
DDL specifies the database schema and some additional properties of the data.
The storage structure and access methods are specified using a special type of DDL called
the data storage and definition language.
The data values stored in the database must satisfy certain consistency constraints. For example,
suppose the balance on an account should not fall below $100.
Database systems concentrate on integrity constraints that can be tested with minimal overhead.
1. Domain Constraints:
A domain of possible values must be associated with every attribute.
E.g. integer type, character type, date/time type.
Declaring an attribute to be of a particular domain acts as a constraint on the values it can take.
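The $100 rule and the domain declarations can be written directly into the schema. The sketch below assumes SQLite via Python's sqlite3 module; note that SQLite applies declared types loosely (type affinity), so here the CHECK clause is what actually enforces the rule. The table and the withdraw function are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain declarations plus a CHECK clause for "balance must not fall below $100".
conn.execute("""
    CREATE TABLE account (
        account_number TEXT    PRIMARY KEY,
        branch_name    TEXT    NOT NULL,
        balance        INTEGER NOT NULL CHECK (balance >= 100)
    )""")
conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500)")

def withdraw(conn, acc, amount):
    """Try a withdrawal; return True if the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, acc))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

print(withdraw(conn, "A-101", 300))  # True  (balance becomes 200)
print(withdraw(conn, "A-101", 150))  # False (would leave 50, below 100)
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(balance)  # 200
```

The constraint is declared once in the schema, instead of being repeated as checking code in every application program.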
A data-manipulation language (DML) is a language that helps users to access or manipulate data.
Application program interface standards like ODBC and JDBC are used for interaction between the
client and the server.
Two tier architecture
In a three-tier architecture, the client machine acts merely as a front end and does not contain any direct
database calls.
The client end communicates with an application server through an interface.
Storage Manager
A storage manager is a program module that provides the interface between the low level data stored in
the database and the application programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file manager.
The storage manager translates the various DML statements into low-level file system commands. Thus,
the storage manager is responsible for storing, retrieving, and updating data in the database.
Components of the storage manager are:
1. Authorization and integrity manager: It tests for satisfaction of various integrity constraints and
checks the authority of users accessing the data.
2. Transaction manager: It ensures that the database remains in a consistent state despite system
failures, and concurrent executions proceed without conflicting.
3. File manager: It manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
4. Buffer manager: It is responsible for fetching data from disk storage into main memory and for
deciding what data to cache in main memory. It enables the database to handle data sizes that are much
larger than the size of main memory.
The storage manager implements several data structures as part of the physical system implementation:
i. Data files, which store the database itself.
ii. Data dictionary, which contains metadata, that is, data about data. The schema of a table is an
example of metadata. A database system consults the data dictionary before reading and
modifying actual data.
iii. Indices, which provide fast access to data items that hold particular values.
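To make the buffer manager's job concrete, here is a toy buffer pool in Python that keeps a fixed number of disk blocks in memory and evicts the least recently used (LRU) block on a miss. LRU is only one possible replacement policy, the dictionary standing in for the disk is hypothetical, and a real buffer manager also pins blocks and writes modified blocks back to disk; both are omitted here.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool with least-recently-used (LRU) replacement."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict standing in for disk blocks
        self.capacity = capacity      # number of blocks that fit in memory
        self.pool = OrderedDict()     # block_id -> contents, oldest use first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.pool:
            # Hit: mark the block as most recently used.
            self.pool.move_to_end(block_id)
            return self.pool[block_id]
        # Miss: read from "disk", evicting the LRU block if the pool is full.
        self.disk_reads += 1
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)
        self.pool[block_id] = self.disk[block_id]
        return self.pool[block_id]

disk = {i: f"block-{i}" for i in range(10)}
bm = BufferManager(disk, capacity=2)
for b in [0, 1, 0, 2, 0, 1]:
    bm.get_block(b)
print(bm.disk_reads, list(bm.pool))  # 4 [0, 1]
```

Six requests cost only four disk reads because block 0 stays cached while it keeps being used.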
The Query Processor
The query processor is an important part of the database system. It helps the database system to simplify
and facilitate access to data. The query processor components include:
1. DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
2. DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can be translated into any number of evaluation plans that all give the same result.
The DML compiler also performs query optimization; that is, it picks the lowest-cost
evaluation plan from among the alternatives.
3. Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
• Schema definition. The DBA creates the original database schema by executing a set of data definition
statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the schema and
physical organization to reflect the changing needs of the organization.
• Granting of authorization for data access. By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access.
Authorization information is kept in a special system structure that the database system consults whenever
someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator's routine maintenance activities are:
1. Periodically backing up the database.
2. Ensuring that enough free disk space is available for normal operations.
3. Monitoring jobs running on the database.
4. Ensuring that performance is not degraded by very expensive tasks submitted by some users.
An entity may be concrete, such as a person or a book, or it may be abstract, such as a loan or a
holiday.
An entity set is a set of entities of the same type that share the same properties, or attributes.
For example, all persons who are customers at a given bank can be defined as the entity set customer.
The properties that describe an entity are called attributes.
Types of relationships
i) Unary relationship: A unary relationship exists when an association is maintained within a single entity.
Figure: Unary relationship on Employee (Boss/Manager and Worker)
ii) Binary relationship: A binary relationship exists when two entities are associated.
iii) Ternary relationship: A ternary relationship exists when there are three entities associated.
iv) Quaternary relationship: A quaternary relationship exists when there are four entities associated.
Figure: Quaternary relationship Studies among Teacher, Student, Course material and Subject
The number of entity sets participating in a relationship set is called the degree of the relationship set.
A binary relationship set is of degree 2; a ternary relationship set is of degree 3.
Entity role: The function that an entity plays in a relationship is called that entity's role. A role is
one end of an association.
Figure: Person Works-for Company, with Person in the role Employee
Here the entity role is Employee.
3. Attributes
The properties that describe an entity are called attributes.
The attributes of the customer entity set are customer_id, customer_name and city.
Each attribute has a set of permitted values called the domain or value set.
Each entity will have a value for each of its attributes.
Example:
Customer Name: John
Customer Id: 321
Types of attributes: simple, composite, single-valued, multi-valued, derived
1) Simple attribute:
This type of attribute cannot be divided into subparts.
Example: Age, sex, GPA
2) Composite attribute:
This type of attribute can be subdivided.
Example: Address: street, city, state, zip
3) Single-valued attribute:
This type of attribute can have only a single value.
Example: Social security number
4) Multi-valued attribute:
A multi-valued attribute can have many values.
Example: A person may have several college degrees or phone numbers.
5) Derived attribute:
A derived attribute can be calculated or derived from other related attributes or entities.
Example: Age can be derived from D.O.B.
6) Stored attributes:
The attributes stored in a data base are called stored attributes.
An attribute takes a null value when an entity does not have a value for it.
Null values indicate that the value for the particular attribute does not exist or is unknown.
E.g.: 1. Middle name may not be present for a person (non-existence case).
2. Apartment number may be missing or unknown.
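The attribute types above can be mirrored in a Python data class, as a sketch: a nested class models a composite attribute, a list models a multi-valued attribute, a method computes a derived attribute, and None stands in for a null value. All field names and values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Address:                       # composite attribute: has sub-parts
    street: str
    city: str
    state: str
    zip_code: str

@dataclass
class Person:
    ssn: str                         # simple, single-valued attribute
    first_name: str
    middle_name: Optional[str]       # None plays the role of a null value
    dob: date                        # stored attribute
    address: Address
    phones: List[str] = field(default_factory=list)  # multi-valued attribute

    def age(self, on: date) -> int:  # derived attribute, computed from dob
        had_birthday = (on.month, on.day) >= (self.dob.month, self.dob.day)
        return on.year - self.dob.year - (0 if had_birthday else 1)

p = Person(ssn="123-45-6789", first_name="John", middle_name=None,
           dob=date(2000, 5, 17),
           address=Address("Main St", "Harrison", "NY", "10528"),
           phones=["555-0100", "555-0101"])
print(p.age(date(2024, 5, 16)))                      # 23
print(p.address.city, len(p.phones), p.middle_name)  # Harrison 2 None
```

Because age is computed rather than stored, it never goes out of date.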
CONSTRAINTS
An E-R enterprise schema may define certain constraints to which the contents of a database system
must conform.
Three types of constraints are
1. Mapping cardinalities
2. Key constraints
3. Participation constraints
1. Mapping cardinalities
Mapping cardinalities express the number of entities to which another entity can be associated
via a relationship set.
Cardinality in an E-R diagram is represented in two ways:
i) a directed line (arrow) ii) an undirected line
ii) One-to-many: An entity in A is associated with any number of entities (zero or more) in B. An
entity in B, however, can be associated with at most one entity in A.
iii) Many-to-one: An entity in A is associated with at most one entity in B. An entity in B, however, can
be associated with any number (zero or more) of entities in A.
Example: Many employees work for a company. This relationship is many-to-one, as shown
below.
Figure: Employees Works-for Company (many-to-one)
iv) Many-to-many: An entity in A is associated with any number (zero or more) of entities in B, and an
entity in B is associated with any number (zero or more) of entities in A.
Example: An employee works on a number of projects, and a project is handled by a number of employees.
Therefore, the relationship between employee and project is many-to-many, as shown below.
Figure: Employee Works-on Project (many-to-many)
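When these relationships are eventually stored in tables, a common approach (sketched here with SQLite via Python's sqlite3 module, with hypothetical table names and data) is to place a foreign key on the "many" side of a many-to-one relationship and to use a separate table of pairs for a many-to-many relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE company  (company_id INTEGER PRIMARY KEY, name TEXT);
    -- Many-to-one: each employee row holds at most one company.
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT,
        company_id INTEGER REFERENCES company(company_id));
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Many-to-many: a separate works_on table pairs employees and projects.
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id));
""")
conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.executemany("INSERT INTO employee VALUES (?, ?, 1)", [(1, "Hari"), (2, "Ravi")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Payroll"), (20, "Website")])
conn.executemany("INSERT INTO works_on VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# Both "projects per employee" and "employees per project" can exceed one.
per_emp = dict(conn.execute(
    "SELECT emp_id, COUNT(*) FROM works_on GROUP BY emp_id ORDER BY emp_id"))
per_proj = dict(conn.execute(
    "SELECT proj_id, COUNT(*) FROM works_on GROUP BY proj_id ORDER BY proj_id"))
print(per_emp, per_proj)  # {1: 2, 2: 1} {10: 2, 20: 1}
```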
2. Keys
A key is a set of attributes that allows us to distinguish entities from each other.
Keys also help uniquely identify relationships, and thus distinguish relationships from each other.
Superkey: Any attribute or combination of attributes that uniquely identifies a row in the table.
Example: The Roll_No attribute of the entity set 'student' distinguishes one student entity from
another. Customer_name and Customer_id together form a superkey.
Candidate Key: A minimal superkey, that is, a superkey that does not contain a subset of attributes that
is itself a superkey.
Example: Student_name and Student_street together are sufficient to uniquely identify one particular
student.
Primary Key: The candidate key selected to uniquely identify all rows. It should rarely be changed and
cannot contain null values.
Secondary Key: An attribute or combination of attributes used to make data retrieval more efficient.
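The superkey and candidate-key definitions can be tested against a relation instance with a short Python sketch. Keep in mind that a key is really a constraint on the schema: an instance can show that a set of attributes is not a key, but it cannot prove that it always will be. The student rows are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys of this particular relation instance."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Supersets of a key already found are superkeys but not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

students = [
    {"roll_no": 1, "name": "Anu",  "street": "Main"},
    {"roll_no": 2, "name": "Anu",  "street": "Park"},
    {"roll_no": 3, "name": "Bala", "street": "Main"},
]
print(candidate_keys(students, ["roll_no", "name", "street"]))
# [('roll_no',), ('name', 'street')]
```

In this instance neither name nor street alone is a key, but the pair is, matching the Student_name and Student_street example.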
3. Participation Constraint
9. ENTITY-RELATIONSHIP (E-R) DIAGRAMS
An E-R diagram can express the overall logical structure of a database graphically.
An E-R diagram consists of the following major components:
Double lines are used in an E-R diagram to indicate that the participation of an entity set in a
relationship set is total; that is, each entity in the entity set occurs in at least one relationship in that
relationship set.
The number of times an entity participates in a relationship can be specified using complex
cardinalities.
An edge between an entity set and a binary relationship set can have an associated minimum and
maximum cardinality, assigned in the form l..h, where:
l - Minimum cardinality
h - Maximum cardinality
A minimum value of 1 indicates total participation of the entity set in the relationship set.
A maximum value of 1 indicates that the entity participates in at most one relationship.
A maximum value of * indicates no limit.
A label 1..* on an edge is equivalent to a double line.
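A small Python sketch of checking an l..h constraint: it counts how many times each entity occurs in a relationship set, represented here (as an assumption) as a list of (entity, related entity) pairs, and passing high=None plays the role of *. The customer and loan values are hypothetical.

```python
from collections import Counter

def satisfies_cardinality(entities, relationship, low, high=None):
    """Check that every entity appears in `relationship` between `low` and
    `high` times (high=None stands for the unbounded '*')."""
    counts = Counter(e for e, _ in relationship)
    for e in entities:
        n = counts.get(e, 0)
        if n < low or (high is not None and n > high):
            return False
    return True

customers = ["Hayes", "Jones", "Smith"]
borrower = [("Hayes", "L-15"), ("Jones", "L-17"), ("Jones", "L-23")]

print(satisfies_cardinality(customers, borrower, 0, None))  # True  (0..*)
print(satisfies_cardinality(customers, borrower, 1, None))  # False (Smith has no loan)
print(satisfies_cardinality(customers, borrower, 0, 1))     # False (Jones has two)
```

The 1..* check fails exactly when some entity does not participate, which is what total participation (a double line) rules out.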
An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a
weak entity set.
An entity set that has a primary key is termed a strong entity set.
A weak entity set is associated with another entity set called the identifying or owner entity set; i.e.,
the weak entity set is said to be existence dependent on the identifying entity set.
The identifying entity set is said to own the weak entity set.
The relationship between the weak entity set and the identifying entity set is called the identifying
relationship.
The discriminator of a weak entity set, also called the partial key, is a set of attributes that distinguishes
among the entities of the weak entity set that depend on the same identifying entity.
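One common way to store a weak entity set, sketched here with SQLite via Python's sqlite3 module and hypothetical loan/payment tables, is to make its primary key the owner's primary key plus the discriminator, and to let deletions of the owner cascade to the weak entities, reflecting existence dependence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE loan (                     -- strong (identifying) entity set
        loan_number TEXT PRIMARY KEY,
        amount      INTEGER);
    CREATE TABLE payment (                  -- weak entity set
        loan_number    TEXT REFERENCES loan(loan_number) ON DELETE CASCADE,
        payment_number INTEGER,             -- discriminator (partial key)
        payment_amount INTEGER,
        PRIMARY KEY (loan_number, payment_number));
""")
conn.execute("INSERT INTO loan VALUES ('L-17', 1000)")
conn.execute("INSERT INTO loan VALUES ('L-23', 2000)")
# payment_number 1 occurs for both loans: it is unique only within one owner.
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)",
                 [("L-17", 1, 200), ("L-17", 2, 300), ("L-23", 1, 500)])

# Deleting the owner removes the payments that depend on it.
conn.execute("DELETE FROM loan WHERE loan_number = 'L-17'")
remaining = conn.execute("SELECT loan_number, payment_number FROM payment").fetchall()
print(remaining)  # [('L-23', 1)]
```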
Extended E-R Features
An ER model extended with additional semantic concepts is called the extended entity-relationship
model, or EER model.
EER model deals with
1. Specialization
2. Generalization
3. Aggregation
1. Specialization:
The process of designating subgroupings within an entity set is called specialization.
Specialization is a top-down process.
Consider an entity set person. A person may be further classified as one of the following:
• Customer
• Employee
All persons share a common set of attributes, and each subgroup may have additional attributes of its own.
If an entity set is involved as a lower-level entity set in more than one ISA relationship, then the
entity set has multiple inheritance and the resulting structure is said to be a lattice.
Constraints on Generalizations
1. One type of constraint determining which entities can be members of a lower-level entity set. Such
membership may be one of the following:
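• Condition-defined. In a condition-defined lower-level entity set, membership is evaluated on
the basis of whether or not an entity satisfies an explicit condition or predicate.
• User-defined. User-defined lower-level entity sets are not constrained by a membership condition;
rather, the database user assigns entities to a given entity set.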
2. A second type of constraint relates to whether or not entities may belong to more than one lower-
level entity set within a single generalization. The lower-level entity sets may be one of the
following:
• Disjoint. A disjointness constraint requires that an entity belong to no more than one lower-
level entity set.
• Overlapping. Same entity may belong to more than one lower-level entity set within a single
generalization.
3. A final constraint, the completeness constraint, specifies whether or not an entity in the higher-level
entity set must belong to at least one of the lower-level entity sets. This constraint may be one of the
following:
• Total generalization or specialization. Each higher-level entity must belong to a lower-level
entity set. It is represented by double line.
• Partial generalization or specialization. Some higher-level entities may not belong to any
lower-level entity set.
3. Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships.
Consider the ternary relationship works-on between employee, branch, and job. Now, suppose
we want to record managers for tasks performed by an employee at a branch. For this, another
entity set, manager, is created.
The best way to model such a situation is to use aggregation.
Aggregation is an abstraction through which relationships are treated as higher-level entities.
In our example, works-on acts as a higher-level entity.
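As a sketch of how the aggregation could be stored (SQLite via Python's sqlite3 module, hypothetical table names, and the assumption that each works-on combination has at most one manager), the manages table refers to a whole works_on row through a composite foreign key, so the relationship itself acts like an entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE works_on (               -- the ternary relationship set
        emp_id TEXT, branch TEXT, job TEXT,
        PRIMARY KEY (emp_id, branch, job));
    CREATE TABLE manages (                -- relationship with the aggregate
        emp_id TEXT, branch TEXT, job TEXT,
        manager_id TEXT,
        PRIMARY KEY (emp_id, branch, job),
        FOREIGN KEY (emp_id, branch, job)
            REFERENCES works_on (emp_id, branch, job));
""")
conn.execute("INSERT INTO works_on VALUES ('E1', 'Downtown', 'teller')")
conn.execute("INSERT INTO manages VALUES ('E1', 'Downtown', 'teller', 'M7')")

# A manager can only be recorded for a works_on combination that exists.
try:
    conn.execute("INSERT INTO manages VALUES ('E2', 'Mianus', 'auditor', 'M7')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```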
Summary of ER diagram
UNIT II RELATIONAL MODEL
The Relational Model – The Catalog – Types – Keys – Relational Algebra – Domain Relational Calculus –
Tuple Relational Calculus – Fundamental Operations – Additional Operations – SQL Fundamentals –
Integrity – Triggers – Security – Advanced SQL Features – Embedded SQL – Dynamic SQL – Missing
Information – Views – Introduction to Distributed Databases and Client/Server Databases
A relational database consists of a collection of tables, each of which is assigned a unique name.
Each column header is an attribute. Each attribute has a set of permitted values called the domain of
that attribute.
Mathematically, a table is called a relation and the rows of a table are called tuples.
The tuples in a relation can be either sorted or unsorted.
Several attributes can have same domain. E.g.: customer_name, employee_name.
Account Table
2. THE CATALOG
The catalog is a place where all the schemas and the corresponding mappings are kept.
The catalog contains detailed information, also called descriptor information or metadata.
Descriptor information is essential for the system to perform its job properly.
For example the authorization subsystem uses catalog information about users and security
constraints to grant or deny access to a particular user.
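SQLite keeps its catalog in a table named sqlite_master, and column descriptors are available through PRAGMA table_info. The sketch below (Python's sqlite3 module, hypothetical table) reads schema objects and column descriptors back out of the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE INDEX idx_balance ON account(balance)")

# sqlite_master is SQLite's catalog: one row per schema object
# (tables, indexes, views, triggers), including automatic indexes.
for obj_type, name, tbl in conn.execute(
        "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"):
    print(obj_type, name, tbl)

# Column descriptors for one table, also taken from the catalog.
# table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]
print(columns)  # [('account_number', 'TEXT'), ('balance', 'INTEGER')]
```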
E1 – E2 (set difference)
E1 × E2 (Cartesian product)
σP(E1), where P is a predicate on attributes in E1 (selection)
ΠS(E1), where S is a list consisting of some of the attributes in E1 (projection)
ρx(E1), where x is the new name for the result of E1 (rename)
Additional operations that can be expressed in terms of the basic operations are set intersection,
natural join, division and assignment.
The other three operations (union, set difference, cartesian product) operate on pairs of relations and
are, therefore, called binary operations.
It allows several predicates to be combined using the connectives and (∧), or (∨), and not (¬).
E.g.: 1. σ amount>1200 (loan)
2. σ branch-name = "Perryridge" ∧ amount>1200 (loan)
Other Examples
Consider following Book relation.
Book_Id Title Author Publisher Year Price
B001 DBMS Korth McGraw_Hill 2000 250
B002 Compiler Ulman 2004 350
B003 OOMD Rambaugh 2003 450
B004 PPL Sabista 2000 500
Query 1: σyear=2000(Book)
The output of query 1 is shown below.
Book_Id Title Author Publisher Year Price
B001 DBMS Korth McGraw_Hill 2000 250
B004 PPL Sabista 2000 500
Example 3: Select the tuples for all books whose publishing year is 2000 or price is greater than 300.
Query 3: σ(year=2000) OR (price>300)(Book)
The output of query 3 is shown below.
Book_Id Title Author Publisher Year Price
B001 DBMS Korth McGraw_Hill 2000 250
B002 Compiler Ulman 2004 350
B003 OOMD Rambaugh 2003 450
B004 PPL Sabista 2000 500
Example 4: Select the tuples for all books whose publishing year is 2000 and price is greater than 300.
Query 4: σ(year=2000) AND (price>300)(Book)
The output of query 4 is shown below.
Book_Id Title Author Publisher Year Price
B004 PPL Sabista 2000 500
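A minimal Python sketch of the select operation on the Book relation above, representing a relation as a list of dictionaries (an assumption made only for illustration). Publisher is left as None where the table above is blank.

```python
# Book relation from the example above (Publisher is None where blank).
BOOK = [
    {"Book_Id": "B001", "Title": "DBMS", "Author": "Korth",
     "Publisher": "McGraw_Hill", "Year": 2000, "Price": 250},
    {"Book_Id": "B002", "Title": "Compiler", "Author": "Ulman",
     "Publisher": None, "Year": 2004, "Price": 350},
    {"Book_Id": "B003", "Title": "OOMD", "Author": "Rambaugh",
     "Publisher": None, "Year": 2003, "Price": 450},
    {"Book_Id": "B004", "Title": "PPL", "Author": "Sabista",
     "Publisher": None, "Year": 2000, "Price": 500},
]

def select(predicate, relation):
    """σ_predicate(relation): keep the tuples for which predicate is true."""
    return [t for t in relation if predicate(t)]

q1 = select(lambda t: t["Year"] == 2000, BOOK)
q3 = select(lambda t: t["Year"] == 2000 or t["Price"] > 300, BOOK)
q4 = select(lambda t: t["Year"] == 2000 and t["Price"] > 300, BOOK)
print([t["Book_Id"] for t in q1])  # ['B001', 'B004']
print([t["Book_Id"] for t in q3])  # ['B001', 'B002', 'B003', 'B004']
print([t["Book_Id"] for t in q4])  # ['B004']
```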
The project operation selects certain columns from a table while discarding others. It removes any
duplicate tuples from the result relation.
Syntax
Π<attributelist> ( R )
Example: The following are the examples of project operation on Book relation.
Example 1: Display all titles with author name.
Query 1: ΠTitle, Author (Book)
The output of query 1 is shown below.
Title Author
DBMS Korth
Compiler Ulman
OOMD Rambaugh
PPL Sabista
Example 2: Display all book titles with authors and price.
Query 2: ΠTitle, Author, Price (Book)
The output of query 2 is shown below.
Title Author Price
DBMS Korth 250
Compiler Ulman 350
OOMD Rambaugh 450
PPL Sabista 500
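The same list-of-dictionaries representation gives a short sketch of project. Projecting on Year shows the duplicate elimination described above: the year 2000 appears in two books but only once in the result. Returning plain tuples instead of dictionaries is a simplification.

```python
BOOK = [
    {"Title": "DBMS", "Author": "Korth", "Year": 2000, "Price": 250},
    {"Title": "Compiler", "Author": "Ulman", "Year": 2004, "Price": 350},
    {"Title": "OOMD", "Author": "Rambaugh", "Year": 2003, "Price": 450},
    {"Title": "PPL", "Author": "Sabista", "Year": 2000, "Price": 500},
]

def project(attributes, relation):
    """Π_attributes(relation): keep only the listed attributes and drop
    duplicate tuples, preserving the order in which tuples first appear."""
    result, seen = [], set()
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

print(project(["Title", "Author"], BOOK))
# [('DBMS', 'Korth'), ('Compiler', 'Ulman'), ('OOMD', 'Rambaugh'), ('PPL', 'Sabista')]
print(project(["Year"], BOOK))  # [(2000,), (2004,), (2003,)]
```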
Composition of select and project operations
The relational operations select and project can be combined to form a complicated query.
Π customer-name (σ customer-city = "Harrison" (customer))
Output:
Customer-name
Hayes
Example: Display the titles of books having price greater than 300.
Query: ΠTitle (σprice>300 (Book))
The output of the query is shown below.
Title
Compiler
OOMD
PPL
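Composition amounts to feeding the output of one operation into another. A minimal Python sketch, with small select and project helpers defined here only for illustration:

```python
BOOK = [
    {"Title": "DBMS", "Price": 250},
    {"Title": "Compiler", "Price": 350},
    {"Title": "OOMD", "Price": 450},
    {"Title": "PPL", "Price": 500},
]

def select(predicate, relation):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """Π: keep the listed attributes; dict.fromkeys drops duplicates in order."""
    return list(dict.fromkeys(tuple(t[a] for a in attributes) for t in relation))

# Π Title (σ price>300 (Book)): the selection runs first, then the projection.
titles = project(["Title"], select(lambda t: t["Price"] > 300, BOOK))
print(titles)  # [('Compiler',), ('OOMD',), ('PPL',)]
```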
Example 1: The first expression below renames both the relation and its attributes, the second renames the
relation only, and the third renames the attributes only:
ρS(B1, B2, …, Bn)(R)
ρS(R)
ρ(B1, B2, …, Bn)(R)
6. Cartesian-Product Operation (×)
Cartesian product is also known as CROSS PRODUCT or CROSS JOINS.
Publisher_Info
Publisher_code Name
P0001 McGraw_Hill
P0002 PHI
P0003 Pearson
Book_Info
Book_ID Title
B0001 DBMS
B0002 Compiler
The Cartesian product of Publisher_Info and Book_Info is given in the figure below.
Publisher_Info X Book_Info
Publisher_code Name Book_ID Title
P0001 McGraw_Hill B0001 DBMS
P0002 PHI B0001 DBMS
P0003 Pearson B0001 DBMS
P0001 McGraw_Hill B0002 Compiler
P0002 PHI B0002 Compiler
P0003 Pearson B0002 Compiler
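A Python sketch of the Cartesian product using itertools.product, assuming (as in this example) that the two relations share no attribute names; otherwise the attributes would need renaming first. Tuple order differs from the table above, which does not matter because a relation is a set.

```python
from itertools import product

PUBLISHER_INFO = [
    {"Publisher_code": "P0001", "Name": "McGraw_Hill"},
    {"Publisher_code": "P0002", "Name": "PHI"},
    {"Publisher_code": "P0003", "Name": "Pearson"},
]
BOOK_INFO = [
    {"Book_ID": "B0001", "Title": "DBMS"},
    {"Book_ID": "B0002", "Title": "Compiler"},
]

def cartesian_product(r, s):
    """r × s: pair every tuple of r with every tuple of s.
    Assumes r and s have no attribute names in common."""
    return [{**tr, **ts} for tr, ts in product(r, s)]

result = cartesian_product(PUBLISHER_INFO, BOOK_INFO)
print(len(result))  # 6 = 3 publishers × 2 books
print(result[0])    # {'Publisher_code': 'P0001', 'Name': 'McGraw_Hill', 'Book_ID': 'B0001', 'Title': 'DBMS'}
```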
The result of intersection operation is a relation that includes all tuples that are in both Relation1 and
Relation2.
The intersection operation is denoted by ∩.
Syntax: Relation1 ∩ Relation2
Example:
Π customer-name (borrower) ∩ Π customer-name (depositor)
The result relation for this query:
2. Natural join (⋈)
The natural join operation forms a Cartesian product of its two arguments, performs a selection forcing
equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.
Syntax: Relation1 ⋈ Relation2
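Example: Consider the two relations Employee (Emp_code, Emp_name) and Salary (Emp_code, Salary):
Employee: Emp_code Emp_name
E0001 Hari
Salary: Emp_code Salary
E0001 2000
Query: Π emp_name, salary (Employee ⋈ Salary)
For the rows shown, the output of the query contains the tuple (Hari, 2000).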
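A minimal Python sketch of the natural join. The row for Hari comes from the example above; the E0002 and E0003 rows are hypothetical, added to show that tuples without a partner are dropped. The schema is read from the first tuple of each relation, which assumes non-empty inputs.

```python
EMPLOYEE = [{"Emp_code": "E0001", "Emp_name": "Hari"},
            {"Emp_code": "E0002", "Emp_name": "Ravi"}]   # E0002 is hypothetical
SALARY = [{"Emp_code": "E0001", "Salary": 2000},
          {"Emp_code": "E0003", "Salary": 3000}]         # E0003 is hypothetical

def natural_join(r, s):
    """r ⋈ s: combine tuples that agree on every common attribute; merging
    the dictionaries keeps each common attribute only once."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

joined = natural_join(EMPLOYEE, SALARY)
print(joined)  # [{'Emp_code': 'E0001', 'Emp_name': 'Hari', 'Salary': 2000}]
print([(t["Emp_name"], t["Salary"]) for t in joined])  # [('Hari', 2000)]
```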
Division operation is suited to queries that include the phrase 'for all'.
Depositor Relation
Suppose that we wish to find all customers who have an account at all the
branches located in Brooklyn.
Step 1: We can obtain all branches in Brooklyn by the expression
r1 = Π branch-name (σ branch-city = "Brooklyn" (branch))
The result relation for this expression is shown in the figure.
Step 2: We can find all (customer-name, branch-name) pairs for which the customer has an account at a
branch by writing
r2 = Π customer-name, branch-name (depositor ⋈ account)
The figure shows the result relation for this expression.
Now, we need to find customers who appear in r2 with every branch name in r1. The operation that
provides exactly those customers is the divide operation.
Thus, the query is
Π customer-name, branch-name (depositor ⋈ account)
÷ Π branch-name (σ branch-city = "Brooklyn" (branch))
The result of this expression is a relation that has the schema (customer-name) and that contains the tuple
(Johnson).
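The division step can be sketched in Python with sets of tuples. The sample pairs below are hypothetical, chosen to be consistent with the result (Johnson) stated above.

```python
# Hypothetical sample data in the spirit of the bank example.
r1 = {"Brighton", "Downtown"}                          # branches in Brooklyn
r2 = {("Hayes", "Perryridge"), ("Johnson", "Downtown"),
      ("Johnson", "Brighton"), ("Jones", "Brighton"),
      ("Smith", "Mianus")}                             # (customer, branch) pairs

def divide(r, s):
    """r ÷ s for r(A, B) and s(B): the A-values paired in r with every B in s."""
    candidates = {a for a, _ in r}
    return {a for a in candidates if all((a, b) in r for b in s)}

print(divide(r2, r1))  # {'Johnson'}
```

Jones has an account at Brighton but not at Downtown, so Jones is excluded: division keeps only customers paired with all the branches in r1.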
4. The Assignment Operation (←)
The assignment operation works like assignment in a programming language.
Example:
Result of the expression to the right of the ← is assigned to the relation variable on the left of the←.
With the assignment operation, a query can be written as a sequential program consisting of a series
of assignments followed by an expression whose value is displayed as the result of the query.
1. Generalized projection
2. Aggregate operations
3. Outer join.
1. Generalized projection
The generalized-projection operation extends the projection operation by allowing arithmetic
functions to be used in the projection list.
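The generalized projection operation has the form Π F1, F2, …, Fn (E), where E is any relational-algebra
expression and each of F1, F2, …, Fn is an arithmetic expression involving constants and attributes in the
schema of E.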
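As a sketch, generalized projection can be modeled in Python by giving each output attribute a function of the input tuple. The credit-info relation and its attribute names are hypothetical; this version also skips duplicate elimination for brevity.

```python
# Hypothetical credit-info relation: credit limit and current balance.
CREDIT_INFO = [
    {"customer_name": "Curry", "limit": 2000, "credit_balance": 1750},
    {"customer_name": "Hayes", "limit": 1500, "credit_balance": 1500},
]

def generalized_project(functions, relation):
    """Π F1, ..., Fn (E): each output attribute is computed by a function
    of the input tuple (a plain attribute is just a lookup)."""
    return [{name: f(t) for name, f in functions.items()} for t in relation]

result = generalized_project(
    {"customer_name": lambda t: t["customer_name"],
     "credit_available": lambda t: t["limit"] - t["credit_balance"]},
    CREDIT_INFO)
print(result)
# [{'customer_name': 'Curry', 'credit_available': 250},
#  {'customer_name': 'Hayes', 'credit_available': 0}]
```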
2. Aggregate Functions
Aggregate functions take a collection of values and return a single value as a result. A few aggregate
functions are:
1. Avg
2. Min
3. Max
4. Sum
5. Count
1. Avg: The aggregate function avg returns the average of the values.
Example: Use the pt-works relation in the figure.
G avg(salary) (pt-works)
Result: Salary
2062.5
The attribute branch-name in the left-hand subscript of G indicates that the input relation pt-
works must be divided into groups based on the value of branch-name.
The calculated sum is placed under the attribute name sum-salary and the maximum salary is
placed under the attribute max-salary.
3. Sum:
The aggregate function sum returns the total of the values.
Example: Suppose that we want to find out the total sum of salaries.
G sum(salary)(pt-works)
The symbol G is the letter G in calligraphic font; read it as "calligraphic G."
Result:
Salary
16500
4. Count:
Returns the number of elements in the collection.
Result: The result of this query is a single row containing the value 3.
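The aggregate functions correspond directly to SQL. The sketch below uses SQLite via Python's sqlite3 module; the pt_works rows are an assumption, chosen so that the average and sum match the results quoted above (8 rows, total 16500). On this data COUNT(DISTINCT branch_name) is 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pt_works (employee_name TEXT, branch_name TEXT, salary INTEGER)")
# Assumed sample rows, chosen so the totals match the results quoted above.
conn.executemany("INSERT INTO pt_works VALUES (?, ?, ?)", [
    ("Adams", "Perryridge", 1500), ("Brown", "Perryridge", 1300),
    ("Gopal", "Perryridge", 5300), ("Johnson", "Downtown", 1500),
    ("Loreena", "Downtown", 1300), ("Peterson", "Downtown", 2500),
    ("Rao", "Austin", 1500), ("Sato", "Austin", 1600),
])

avg_salary, total, branches = conn.execute(
    "SELECT AVG(salary), SUM(salary), COUNT(DISTINCT branch_name) FROM pt_works"
).fetchone()
print(avg_salary, total, branches)  # 2062.5 16500 3

# Grouping, as with branch-name in the left-hand subscript of G.
per_branch = conn.execute("""
    SELECT branch_name, SUM(salary) AS sum_salary, MAX(salary) AS max_salary
    FROM pt_works GROUP BY branch_name ORDER BY branch_name""").fetchall()
print(per_branch)
# [('Austin', 3100, 1600), ('Downtown', 5300, 2500), ('Perryridge', 8100, 5300)]
```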
3. Outer join
Joins are classified into three types namely:
1. Inner Join
2. Outer Join
3. Natural Join
Inner Join (⋈)
Inner join returns the matching rows from the tables that are being joined.
Example: Consider the two relations
Example:
Result:
Outer Join
The outer-join operation is an extension of the join operation to deal with missing information.
Outer-join operations avoid loss of information.
Outer Joins are classified into three types namely:
1. Left Outer Join
2. Right Outer Join
3. Full Outer Join
The left outer join (⟕) takes all tuples in the left relation that did not match with any tuple in the
right relation, pads the tuples with null values for all other attributes from the right relation, and adds them
to the result of the natural join.
Example:
Result:
Example:
Result:
The full outer join (⟗) does both of those operations, padding tuples from the left relation that did
not match any from the right relation, as well as tuples from the right relation that did not match any from
the left relation, and adding them to the result of the join.
Example:
Result:
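The three outer joins can be sketched in Python with relations as lists of dictionaries and None standing in for null. The loan and borrower tuples are hypothetical, and the schema is read from each relation's first tuple, so non-empty inputs are assumed.

```python
def outer_join(r, s, kind):
    """Natural outer join of r and s on their common attributes.
    kind is 'left', 'right' or 'full'; unmatched tuples are padded with None."""
    r_attrs, s_attrs = list(r[0]), list(s[0])
    common = [a for a in r_attrs if a in s_attrs]

    def match(tr, ts):
        return all(tr[a] == ts[a] for a in common)

    result = [{**tr, **ts} for tr in r for ts in s if match(tr, ts)]
    if kind in ("left", "full"):      # unmatched tuples of the left relation
        for tr in r:
            if not any(match(tr, ts) for ts in s):
                result.append({**{a: None for a in s_attrs}, **tr})
    if kind in ("right", "full"):     # unmatched tuples of the right relation
        for ts in s:
            if not any(match(tr, ts) for tr in r):
                result.append({**{a: None for a in r_attrs}, **ts})
    return result

LOAN = [{"loan_number": "L-170", "branch": "Downtown", "amount": 3000},
        {"loan_number": "L-260", "branch": "Perryridge", "amount": 1700}]
BORROWER = [{"customer": "Jones", "loan_number": "L-170"},
            {"customer": "Hayes", "loan_number": "L-155"}]

for kind in ("left", "right", "full"):
    print(kind, len(outer_join(LOAN, BORROWER, kind)))
# left 2
# right 2
# full 3
```

L-260 has no borrower, so it survives only in the left and full joins (with customer None); Hayes's L-155 has no loan tuple, so it survives only in the right and full joins (with branch and amount None).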
4. RELATIONAL CALCULUS
Relational Calculus is a formal query language where we can write one declarative expression to
specify a retrieval request and hence there is no description of how to retrieve it.
A calculus expression specifies what is to be retrieved rather than how to retrieve it.
2. s[x] Θ u[y], where s and u are tuple variables, x is an attribute on which s is defined, y is an attribute
on which u is defined, and Θ is a comparison operator (<, ≤, =, ≠, >, ≥).
Equivalence relation in Tuple relational calculus
P1 ∧ P2 is equivalent to ¬ (¬ (P1) ∨ ¬ (P2)).
3. Find the names of all customers having a loan, an account, or both at the bank
4. Find the names of all customers who have a loan and an account at the bank
5. Find the names of all customers having a loan at the Perryridge branch
Safety of Expressions
To guard against expressions that generate an infinite relation, a domain is defined for every tuple relational calculus formula P.
It is denoted by dom(P); it is the set of all values that appear explicitly in P or in a tuple of a relation mentioned in P.
An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P.
Domain relational calculus serves as the theoretical basis of the widely used Query By Example (QBE) language.
1. Find the loan_number, branch_name, and amount for loans of over $1200
2. Find the names of all customers who have a loan of over $1200
3. Find the names of all customers who have a loan from the Perryridge branch and the loan
amount:
{⟨c, a⟩ | ∃l (⟨c, l⟩ ∈ borrower ∧ ∃b (⟨l, b, a⟩ ∈ loan ∧ b = "Perryridge"))}
{⟨c, a⟩ | ∃l (⟨c, l⟩ ∈ borrower ∧ ⟨l, "Perryridge", a⟩ ∈ loan)}
Safety of Expressions
The expression {⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn)} is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P) (that is, the values appear either in P or in a tuple of a relation mentioned in P).
2. For every "there exists" subformula of the form ∃x (P1(x)), the subformula is true if and only if there is a value of x in dom(P1) such that P1(x) is true.
3. For every "for all" subformula of the form ∀x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
5. SQL FUNDAMENTALS
5.1. Introduction
SQL is a standard language used to communicate with relational database management systems.
SQL enables end users and systems personnel to work with any of the many database management systems that support it.
Applications written in SQL can be ported across systems with relatively little change.
Data-definition language (DDL). The SQL DDL provides commands for defining relation
schemas, deleting relations, and modifying relation schemas.
Interactive data-manipulation language (DML). The SQL DML includes a query language based
on both the relational algebra and the tuple relational calculus. It also includes commands to insert
tuples into, delete tuples from, and modify tuples in the database.
View definition. The SQL DDL includes commands for defining views.
Transaction control. SQL includes commands for specifying the beginning and ending of
transactions.
Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can
be embedded within general-purpose programming languages, such as C, C++, Java, PL/I, COBOL,
Pascal, and FORTRAN.
Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored
in the database must satisfy. Updates that violate integrity constraints are disallowed.
Authorization. The SQL DDL includes commands for specifying access rights to relations and
views.
5.4. Domain Types in SQL
1. Char (n): Fixed length character string, with user-specified length n.
2. varchar(n): Variable length character strings, with user-specified maximum length n.
3. int: Integer (a finite subset of the integers that is machine-dependent).
4. Smallint: Small integer (a machine-dependent subset of the integer domain type).
5. numeric (p,d): Fixed-point number, with user-specified precision of p digits, with d digits to the right of the decimal point.
6. Real, double precision: Floating-point and double-precision floating-point numbers, with machine-dependent precision.
7. float (n): Floating-point number, with user-specified precision of at least n digits.
8. Date: Dates, containing a (4-digit) year, month and day of the month.
Example: date '2005-7-27'
9. Time: Time of day, in hours, minutes and seconds.
Example: time '09:00:30', time '09:00:30.75'
10. Timestamp: Date plus time of day.
Example: timestamp '2005-7-27 09:00:30.75'
11. Interval: Period of time.
i) Syntax: create table <table name> (columnname1 data type (size), columnname2 data type (size), ...);
ii) Syntax: alter table <table name> modify (columnname new data type (size));
Example: alter table customer modify (social_security_no varchar2 (11));
iii) Syntax: alter table <table name> add (new columnname data type (size));
Example: alter table customer add (acc_no varchar2(5));
Example:
a) Find the names of all branches in the loan table
select branch_name from loan;
b) List all account numbers held at the brighton branch
select acc_no from account where branch_name = 'brighton';
c) List the customers who are living in the city harrison
select cust_name from customer where cust_city = 'harrison';
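The basic structure of an SQL query is:
select A1, A2, ..., An
from R1, R2, ..., Rm
where P
where each Ai represents an attribute, each Ri represents a relation, and P is a predicate.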
Example:
"Find the names of all branches in the loan relation":
select branch_name from loan
select distinct branch_name from loan /* distinct eliminates duplicates */
select all branch_name from loan /* duplicates are not removed */
"Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200."
select loan_number from loan where branch_name = 'Perryridge' and amount > 1200
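These queries can be tried in SQLite through Python's standard `sqlite3` module. This is a sketch on hypothetical loan rows; it uses underscored names such as branch_name, since hyphenated names such as branch-name are not valid SQL identifiers unless quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text primary key, branch_name text, amount numeric)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
])

# "all" keeps duplicates (one row per loan); "distinct" removes them.
all_branches = [r[0] for r in conn.execute("select all branch_name from loan")]
distinct_branches = sorted(r[0] for r in conn.execute("select distinct branch_name from loan"))

# Loans made at Perryridge with an amount greater than 1200.
big_perryridge = [r[0] for r in conn.execute(
    "select loan_number from loan where branch_name = 'Perryridge' and amount > 1200 "
    "order by loan_number")]
```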
Rename Operation
The SQL allows renaming relations and attributes using the as clause:
Old-name as new-name
Example: Find the name, loan number and loan amount of all customers; rename the column name
loan_number as loan_id.
select customer_name, borrower.loan_number as loan_id, amount from borrower, loan where
borrower.loan_number = loan.loan_number
Tuple Variables
Tuple variables are defined in the from clause via the use of the as clause.
Example: Find the customer names and their loan numbers for all customers having a loan at some
branch.
select customer_name, T.loan_number, S.amount from borrower as T, loan as S
where T.loan_number = S.loan_number
String Operation
SQL includes a string-matching operator, like, for comparisons on character strings. Patterns are described using two special characters:
o percent (%): the % character matches any substring.
o underscore (_): the _ character matches any single character.
Example:
'Perry%' matches any string beginning with "Perry".
'%idge%' matches any string containing "idge" as a substring, for example, 'Perryridge', 'Rock Ridge', 'Mianus Bridge', and 'Ridgeway'.
'___' matches any string of exactly three characters.
'___%' matches any string of at least three characters.
Example: select * from customer where customer_name like 'j%';
select * from customer where customer_street like '_a%';
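The patterns above can be checked in SQLite from Python. A sketch on a hypothetical branch table; the helper `like` is invented for illustration. One SQLite-specific caveat: its LIKE ignores case for ASCII letters by default, whereas several other systems (PostgreSQL, for example) treat LIKE as case sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text)")
conn.executemany("insert into branch values (?)", [
    ("Perryridge",), ("Rock Ridge",), ("Mianus Bridge",), ("Ridgeway",),
    ("Downtown",), ("Elm",), ("Ab",),
])

def like(pattern):
    """Return the branch names matching a LIKE pattern, sorted."""
    rows = conn.execute("select branch_name from branch where branch_name like ?", (pattern,))
    return sorted(r[0] for r in rows)

starts_perry = like("Perry%")      # prefix match
contains_idge = like("%idge%")     # substring match
exactly_three = like("___")        # exactly three characters
at_least_three = like("___%")      # three or more characters
lowercase_perry = like("perry%")   # SQLite's LIKE ignores ASCII case by default
```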
Ordering the Display of Tuples
The order by clause causes the tuples in the result of a query to appear in sorted order.
Example: select distinct customer_name from borrower, loan where borrower.loan_number = loan.loan_number and branch_name = 'Perryridge' order by customer_name
We may specify desc for descending order or asc for ascending order for each attribute; ascending order is the default.
Example: order by customer_name desc
Set Operations
Set operators combine the results of two queries into a single one.
1. Union – returns all distinct rows selected by either query.
Example: Find all customers having a loan, an account or both at the bank
Query: select cust_name from depositor union select cust_name from borrower;
2. Intersect – returns only rows that are common to both queries.
Example: Find all customers who have both a loan and an account at the bank.
Query: select cust_name from depositor intersect select cust_name from borrower;
3. Minus – returns all distinct rows selected by the first query but not by the second (minus is Oracle's keyword; the SQL standard uses except).
Example: Find all customers who have an account but no loan at the bank.
Query: select cust_name from depositor minus select cust_name from borrower;
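A quick way to see the three set operators side by side is SQLite via Python's `sqlite3`. SQLite supports union and intersect, and spells minus as except, the SQL-standard keyword. The depositor and borrower rows are hypothetical; the helper `names` is invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table depositor (cust_name text, acc_no text)")
conn.execute("create table borrower (cust_name text, loan_no text)")
conn.executemany("insert into depositor values (?, ?)", [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305")])
conn.executemany("insert into borrower values (?, ?)", [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14")])

def names(query):
    """Run a query returning one column and sort the values."""
    return sorted(r[0] for r in conn.execute(query))

either = names("select cust_name from depositor union select cust_name from borrower")
both = names("select cust_name from depositor intersect select cust_name from borrower")
# Oracle's minus is written except in SQLite and in standard SQL.
account_only = names("select cust_name from depositor except select cust_name from borrower")
```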
Aggregate Function
These functions operate on the multiset of values of a column of a relation and return a single value.
(a) AVG - To find the average of values.
Example: Find the average of account balance from the account table
Query: select avg (balance) from account;
(b) SUM – To find the sum of values.
Example: Find the sum of account balances from the account table
Query: select sum (balance) from account;
Example: Find the average account balance at the brighton branch
Query: select branch_name, avg (balance) from account group by branch_name having branch_name = 'brighton';
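The aggregate queries can be run against a small hypothetical account table in SQLite. The last query is an addition for contrast: having is more typically used to filter groups on an aggregate value, whereas a condition on branch_name alone could equally go in a where clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (acc_no text primary key, branch_name text, balance numeric)")
conn.executemany("insert into account values (?, ?, ?)", [
    ("A-101", "brighton", 500), ("A-215", "brighton", 700),
    ("A-102", "downtown", 400), ("A-305", "downtown", 350)])

avg_all = conn.execute("select avg(balance) from account").fetchone()[0]
sum_all = conn.execute("select sum(balance) from account").fetchone()[0]

# Query from the notes: group by branch, then keep only the brighton group.
brighton = conn.execute(
    "select branch_name, avg(balance) from account "
    "group by branch_name having branch_name = 'brighton'").fetchall()

# having filtering on an aggregate: branches whose average balance exceeds 450.
rich_branches = conn.execute(
    "select branch_name from account group by branch_name "
    "having avg(balance) > 450").fetchall()
```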
Null Values
It is possible for tuples to have a null value, denoted by null, for some of their attributes
Null signifies an unknown value or that a value does not exist.
The predicate is null can be used to check for null values.
Example: Find all loan numbers that appear in the loan relation with a null value for amount.
select loan_number from loan where amount is null
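Any comparison involving null (for example, amount > 1000 when amount is null) evaluates to a third truth value, unknown. The Boolean operators are extended as follows:
and: The result of true and unknown is unknown, false and unknown is false, while unknown and unknown is unknown.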
or: The result of true or unknown is true, false or unknown is unknown, while unknown or unknown is
unknown.
not: The result of not unknown is unknown.
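The and/or/not rules above form a small three-valued truth table. A minimal Python sketch, assuming None is used to represent unknown; the function names and3, or3 and not3 are hypothetical.

```python
from typing import Optional

# None stands for SQL's third truth value, unknown.
Truth = Optional[bool]

def and3(a: Truth, b: Truth) -> Truth:
    if a is False or b is False:   # false dominates and
        return False
    if a is None or b is None:
        return None
    return True

def or3(a: Truth, b: Truth) -> Truth:
    if a is True or b is True:     # true dominates or
        return True
    if a is None or b is None:
        return None
    return False

def not3(a: Truth) -> Truth:
    return None if a is None else not a
```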
Nested Subqueries
SQL provides a mechanism for the nesting of subqueries.
A subquery is a select-from-where expression that is nested within another query.
A common use of subqueries is to perform tests for set membership, set comparisons, and
set cardinality.
Example 1: Find all information about the customer who holds account number A-101.
Query: select * from customer where cust_name=(select cust_name
from depositor where acc_no='A-101');
Example 2: Find the names and loan numbers of all customers who have a loan from the bank.
Query: select cust_name, loan_no from borrower where loan_no in
(select loan_no from loan);
1. Set membership
IN
Example: select * from customer where customer_name in ('Hays', 'Jones');
NOT IN
Example: select * from customer where customer_name not in ('Hays', 'Jones');
2. Set comparisons
SQL combines the comparison operators <, <=, =, >, >=, <> with some (or its synonym any) and all, for example > some or <= all, to compare a value with a set of values.
Example 1: select * from borrower where loan_number < any (select loan_number from loan where branch_name = 'Perryridge');
Example 2: select loan_no from loan where amount <= 30000;
3. Test for empty relations
The exists construct returns true if the argument subquery returns at least one row.
Example: select title from book where exists (select * from orders where book.book_id = orders.book_id);
Similar to exists, we can also use not exists.
Example: select title from book where not exists (select * from orders where book.book_id = orders.book_id);
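The membership and emptiness tests can be tried in SQLite from Python. SQLite does not implement the some/any/all comparisons, so this sketch covers only in, not in, exists and not exists. The table is named orders because order is a reserved word in SQL; all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (customer_name text primary key);
    insert into customer values ('Hays'), ('Jones'), ('Smith');
    create table book (book_id integer primary key, title text);
    create table orders (order_id integer primary key, book_id integer);
    insert into book values (1, 'DBMS'), (2, 'Networks'), (3, 'Compilers');
    insert into orders values (10, 1), (11, 1), (12, 3);
""")

def col(query):
    """Run a one-column query and return its values sorted."""
    return sorted(r[0] for r in conn.execute(query))

in_list = col("select customer_name from customer where customer_name in ('Hays', 'Jones')")
not_in_list = col("select customer_name from customer where customer_name not in ('Hays', 'Jones')")
# Correlated subqueries: the inner query refers to book from the outer query.
ordered = col("select title from book where exists "
              "(select * from orders where book.book_id = orders.book_id)")
never_ordered = col("select title from book where not exists "
                    "(select * from orders where book.book_id = orders.book_id)")
```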
1. Derived Relations
SQL allows a subquery expression to be used in the from clause. If we use such an expression, we must give the result relation a name, and we can rename its attributes, using the as clause.
For example: "Find the average account balance of those branches where the average account balance is greater than $1200."
select branch_name, avg_balance from (select branch_name, avg (balance) from account
group by branch_name) as branch_avg (branch_name, avg_balance) where avg_balance > 1200
Here the subquery result is named branch_avg, with attributes branch_name and avg_balance.
2. with clause.
The with clause provides a way of defining a temporary view, whose definition is available
only to the query in which the with clause occurs.
Consider the following query, which selects accounts with the maximum balance; if there are
many accounts with the same maximum balance, all of them are selected.
with max_balance (value) as
(select max(balance)
from account)
select account_number
from account, max_balance
where account.balance = max_balance.value
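SQLite accepts this with clause (including the column list after the temporary view's name), so the query can be checked from Python. The account rows are hypothetical and include two accounts tied for the maximum, to show that both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text primary key, balance numeric)")
conn.executemany("insert into account values (?, ?)", [
    ("A-101", 500), ("A-102", 400), ("A-201", 900), ("A-215", 900)])

# max_balance is visible only inside this one statement.
query = """
    with max_balance (value) as (select max(balance) from account)
    select account_number
    from account, max_balance
    where account.balance = max_balance.value
"""
richest = sorted(r[0] for r in conn.execute(query))
```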
6. INTEGRITY
Integrity constraints ensure that changes made to the database by authorized users do not result in a
loss of data consistency.
It is a mechanism used to prevent invalid data entry into the table.
Example: create table student (name char(15) not null, student_id char(10), degree_level char(15),
primary key (student_id), check (degree_level
in ('bachelors', 'master', 'doctorate')));
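The student table can be created in SQLite to watch each constraint reject a bad row. A sketch: the helper `try_insert` and the sample rows are hypothetical, and SQLite reports every one of these violations as `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student (
        name char(15) not null,
        student_id char(10),
        degree_level char(15),
        primary key (student_id),
        check (degree_level in ('bachelors', 'master', 'doctorate')))
""")

def try_insert(name, student_id, level):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("insert into student values (?, ?, ?)", (name, student_id, level))
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert("Asha", "S1", "master")
bad_level = try_insert("Ravi", "S2", "diploma")    # violates the check clause
no_name = try_insert(None, "S3", "doctorate")      # violates not null
dup_id = try_insert("Meena", "S1", "bachelors")    # duplicate primary key
```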
The create domain clause can be used to define new domains. For example, the statements:
create domain Dollars numeric(12,2)
create domain Pounds numeric(12,2)
2. Entity integrity Constraints
The entity integrity constraints state that no primary key value can be null. This is because the
primary key value is used to identify individual tuples in a relation.
Types
Unique Constraint
Primary key Constraint
a) UNIQUE – avoids duplicate values: unique(Aj1, Aj2, …, Ajm)
The unique specification says that attributes Aj1, Aj2, …, Ajm form a candidate key. These attributes must have distinct values.
Syntax :create table <table name>(columnname data type (size) constraint
constraint_name unique);
b) Composite UNIQUE – Multicolumn unique key is called composite unique key
Syntax : create table <table name>(columnname1 data type (size), columnname2 data type
(size), constraint constraint_name unique (columnname1, columnname2));
c) PRIMARY KEY – It will not allow null values and avoid duplicate values.
Syntax : create table <table name>(columnname data type (size) constraint constraint_name
primary key);
d) Composite PRIMARY KEY – Multicolumn primary key is called composite primary key
Syntax : create table <table name>(columnname1 data type (size), columnname2 data type
(size), constraint constraint_name primary key (columnname1, columnname2));
3. REFERENTIAL INTEGRITY
Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. This condition is called referential integrity.
Reference key (foreign key) – It represents relationships between tables. A foreign key is a column whose values are derived from the primary key of the same or some other table.
Syntax: create table <table name>(columnname data type (size) constraint constraint_name
references parent_table_name);
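Referential integrity can be observed in SQLite, with one caveat: SQLite enforces foreign keys only after `pragma foreign_keys = on` has been issued on the connection. The branch and account rows, the constraint name fk_branch and the helper `accepted` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite checks foreign keys only when enabled
conn.execute("create table branch (branch_name text primary key, branch_city text)")
conn.execute("""
    create table account (
        acc_no text primary key,
        branch_name text constraint fk_branch references branch (branch_name),
        balance numeric)
""")
conn.execute("insert into branch values ('Perryridge', 'Horseneck')")
conn.execute("insert into account values ('A-102', 'Perryridge', 400)")

def accepted(sql):
    """Run a statement; report whether referential integrity allowed it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A child row must reference an existing parent row...
orphan_insert = accepted("insert into account values ('A-999', 'Nowhere', 100)")
# ...and a parent row that is still referenced cannot be deleted.
parent_delete = accepted("delete from branch where branch_name = 'Perryridge'")
```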
Formal Definition
Let r1(R1) and r2(R2) be relations with primary keys K1 and K2 respectively.
The subset α of R2 is a foreign key referencing K1 in relation r1 if, for every t2 in r2, there must be a tuple t1 in r1 such that t1[K1] = t2[α].
Referential integrity constraint is also called a subset dependency, since it can be written as
Πα(r2) ⊆ ΠK1(r1)
Assertions
An assertion is a predicate expressing a condition that we wish the database always to satisfy.
An assertion in SQL takes the form
Create assertion <assertion-name> check <predicate>
When an assertion is made, the system tests it for validity, and tests it again on every update that
may violate the assertion
Asserting "for all X, P(X)" is achieved in a roundabout fashion using "not exists X such that not P(X)".
Assertion Example
The sum of all loan amounts for each branch must be less than the sum of all account balances at
the branch.
create assertion sum_constraint check (not exists (select * from branch where (select
sum(amount) from loan where loan.branch_name = branch.branch_name) >= (select
sum(balance) from account where account.branch_name = branch.branch_name)))
7. TRIGGERS
A trigger is a statement that is executed automatically by the system as a side effect of a
modification to the database.
To design a trigger mechanism, we must:
Specify the conditions under which the trigger is to be executed.
Specify the actions to be taken when the trigger executes.
Trigger Example
Suppose that instead of allowing negative account balances, the bank deals with overdrafts by
setting the account balance to zero.
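A simplified sketch of such a trigger in SQLite's trigger syntax, which differs from the SQL:1999 form used in many textbooks. It only resets the balance to zero and does nothing else; the trigger name overdraft_trigger and the sample account are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text primary key, balance numeric);
    insert into account values ('A-101', 500);

    -- Condition: fires after an update of balance that leaves it negative.
    -- Action: resets that account's balance to zero.
    create trigger overdraft_trigger after update of balance on account
    when new.balance < 0
    begin
        update account set balance = 0
        where account_number = new.account_number;
    end;
""")

# Withdrawing 700 from a balance of 500 would leave -200; the trigger fixes it.
conn.execute("update account set balance = balance - 700 where account_number = 'A-101'")
balance = conn.execute(
    "select balance from account where account_number = 'A-101'").fetchone()[0]
```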
8. SECURITY
Security of data is an important concept in a DBMS because the data must be safeguarded against unauthorized users.
It is protection from malicious attempts to steal or modify data.
There are five different levels of security
1. Database system level
Authentication and authorization mechanism to allow specific users access only to required data.
2. Operating system level
Protection from invalid logins
File-level access protection
Protection from improper use of "superuser" authority.
Protection from improper use of privileged machine instructions.
3. Network level
Each site must ensure that it communicates with trusted sites.
Links must be protected from theft or modification of messages.
Mechanisms used
Identification protocol (password based)
Cryptography
4. Physical level
Protection of equipment from floods, power failure, etc.
Protection of disks from theft, erasure, physical damage, etc.
Protection of network and terminal cables from wiretaps, non-invasive electronic eavesdropping, physical damage, etc.
Solution
Replicated hardware: mirrored disks, dual busses, etc.
Multiple access paths between every pair of devices.
Physical security by locks, police, etc.
Authorization
revoke select on branch from U1, U2, U3 restrict
With restrict, the revoke command fails if cascading revokes are required.
Roles
Roles permit common privileges for a class of users to be specified just once, by creating a corresponding "role".
Privileges can be granted to or revoked from roles, just like users.
Roles can be assigned to users, and even to other roles.
o create role teller
create role manager
o grant select on branch to teller
grant update (balance) on account to teller
grant all privileges on account to manager
grant teller to manager
grant teller to alice, bob
grant manager to avi
Authorization and Views
Users can be given authorization on views, without being given any authorization on the relations
used in the view definition
Ability of views to hide data serves both to simplify usage of the system and to enhance security by
allowing users access only to data they need for their job
A combination of relation-level security and view-level security can be used to limit a user's access to precisely the data that user needs.
Granting of Privileges
The passage of authorization from one user to another may be represented by an authorization grant
graph.
The nodes of this graph are the users.
The root of the graph is the database administrator.
Consider graph for update authorization on loan.
An edge Ui → Uj indicates that user Ui has granted update authorization on loan to Uj.
Authorization Grant Graph
If the database administrator revokes authorization from U2, U2 retains authorization through U3.
If authorization is revoked subsequently from U3, U3 appears to retain authorization through U2.
When the database administrator revokes authorization from U3, the edges from U3 to U2 and from U2 to U3 are no longer part of a path starting with the database administrator.
The edges between U2 and U3 are therefore deleted, and in the resulting authorization graph neither U2 nor U3 holds authorization.
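The revocation rule can be sketched as a reachability check: after an edge is removed, every edge whose grantor no longer has a path from the database administrator is deleted too. A minimal Python sketch; the graph (including U1) and the helpers `authorized` and `revoke` are hypothetical, chosen to replay the U2/U3 scenario.

```python
def authorized(edges, root="DBA"):
    """Users reachable from root in the grant graph (excluding root)."""
    reached, stack = {root}, [root]
    while stack:
        grantor = stack.pop()
        for a, b in edges:
            if a == grantor and b not in reached:
                reached.add(b)
                stack.append(b)
    return reached - {root}

def revoke(edges, grantor, grantee, root="DBA"):
    """Delete one grant edge, then every edge whose grantor lost its path from root."""
    remaining = edges - {(grantor, grantee)}
    alive = authorized(remaining, root) | {root}
    return {(a, b) for a, b in remaining if a in alive}

# Hypothetical graph: the DBA grants to U1, U2, U3; U2 and U3 grant to each other.
graph = {("DBA", "U1"), ("DBA", "U2"), ("DBA", "U3"), ("U2", "U3"), ("U3", "U2")}

after_u2 = revoke(graph, "DBA", "U2")      # U2 still holds authority via U3
after_u3 = revoke(after_u2, "DBA", "U3")   # the U2/U3 cycle is now cut off
```

Checking reachability from the root, rather than simply whether a user has any incoming edge, is what stops the U2/U3 cycle from keeping itself alive.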
Audit Trails
An audit trail is a log of all changes (inserts/deletes/updates) to the database along with information
such as which user performed the change, and when the change was performed.
Used to track erroneous/fraudulent updates.
Can be implemented using triggers, but many database systems provide direct support.
Limitations of SQL Authorization
SQL does not support authorization at a tuple level
o E.g. we cannot restrict students to see only (the tuples storing) their own grades
With the growth in Web access to databases, database accesses come primarily from application
servers.
o End users don't have database user ids, they are all mapped to the same database user id
All end-users of an application (such as a web application) may be mapped to a single database
user
The task of authorization in above cases falls on the application program, with no support from
SQL
o Benefit: fine grained authorizations, such as to individual tuples, can be implemented by the
application.
o Drawback: Authorization must be done in application code, and may be dispersed all over
an application
o Checking for absence of authorization loopholes becomes very difficult since it requires
reading large amounts of application code
Encryption
Data Encryption Standard (DES) substitutes characters and rearranges their order on the basis of an
encryption key which is provided to authorized users via a secure mechanism. Scheme is no more secure
than the key transmission mechanism since the key has to be shared.
Advanced Encryption Standard (AES) is a new standard replacing DES, and is based on the Rijndael
algorithm, but is also dependent on shared secret keys
Public-key encryption is based on each user having two keys:
o public key – publicly published key used to encrypt data, but cannot be used to decrypt data
o private key -- key known only to individual user, and used to decrypt data.
Need not be transmitted to the site doing encryption.
Encryption scheme is such that it is impossible or extremely hard to decrypt data given only the
public key.
The RSA public-key encryption scheme is based on the hardness of factoring a very large number
(100's of digits) into its prime components.
Authentication (Challenge response system)
Password based authentication is widely used, but is susceptible to sniffing on a network
Challenge-response systems avoid transmission of passwords
o DB sends a (randomly generated) challenge string to user
o User encrypts string and returns result.
o DB verifies identity by decrypting result
o Can use a public-key encryption system, with the DB sending a message encrypted using the user's public key, and the user decrypting and sending the message back
Digital signatures are used to verify authenticity of data
o Private key is used to sign data and the signed data is made public.
o Anyone can verify the signed data with the public key, but no one can generate signed data without the private key.
o Digital signatures also help ensure nonrepudiation: sender
cannot later claim to have not created the data
Digital Certificates
Digital certificates are used to verify authenticity of public keys.
Problem: when you communicate with a web site, how do you know if you are talking with the
genuine web site or an imposter?
o Solution: use the public key of the web site
o Problem: how to verify if the public key itself is genuine?
Solution:
o Every client (e.g. browser) has public keys of a few root-level certification authorities
o A site can get its name/URL and public key signed by a certification authority: signed
document is called a certificate
o Client can use public key of certification authority to verify certificate
o Multiple levels of certification authorities can exist. Each certification authority
presents its own public-key certificate signed by a higher level authority, and
Uses its private key to sign the certificate of other web sites/authorities
9. EMBEDDED SQL
Embedded SQL consists of SQL statements included in a host programming language.
The SQL standard defines embeddings of SQL in a variety of programming languages such as C,
Java, and Cobol.
A language to which SQL queries are embedded is referred to as a host language, and the SQL
structures permitted in the host language comprise embedded SQL.
The embedded SQL program should be preprocessed prior to compilation.
The preprocessor replaces embedded SQL requests with host language declarations and procedure
calls.
The resulting program is compiled by host language compiler.
EXEC SQL statement is used to identify embedded SQL request to the preprocessor
o EXEC SQL <embedded SQL statement > END_EXEC
Note: this varies by language (for example, the Java embedding uses # SQL { …. }; and the C
language uses a semicolon instead of END_EXEC)
Example Query
From within a host language, find the names and cities of customers with more than the variable
amount dollars in some account.
Specify the query in SQL and declare a cursor for it
EXEC SQL
declare c cursor for
select depositor.customer_name, customer_city
from depositor, customer, account
where depositor.customer_name = customer.customer_name
and depositor.account_number = account.account_number
and account.balance > :amount
END_EXEC
The open statement causes the query to be evaluated
EXEC SQL open c END_EXEC
The fetch statement causes the values of one tuple in the query result to be placed on host
language variables.
EXEC SQL fetch c into :cn, :cc END_EXEC
Repeated calls to fetch get successive tuples in the query result
A variable called SQLSTATE in the SQL communication area (SQLCA) gets set to '02000' to
indicate no more data is available
The close statement causes the database system to delete the temporary relation that holds the result
of the query.
EXEC SQL close c END_EXEC
Note: above details vary with language. For example, the Java embedding defines Java iterators to step
through result tuples.
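Python has no embedded-SQL preprocessor, but its DB-API cursors follow the same declare/open/fetch/close pattern, so the example query can be sketched as an analogy rather than an equivalent. Assumptions: SQLite's named parameter `:amount` plays the role of the host variable, `fetchone()` returning None plays the role of SQLSTATE '02000', and the rows and the function name `customers_above` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (customer_name text primary key, customer_city text);
    create table depositor (customer_name text, account_number text);
    create table account (account_number text primary key, balance numeric);
    insert into customer values ('Johnson', 'Palo Alto'), ('Smith', 'Rye'), ('Hayes', 'Harrison');
    insert into depositor values ('Johnson', 'A-101'), ('Smith', 'A-215'), ('Hayes', 'A-102');
    insert into account values ('A-101', 500), ('A-215', 700), ('A-102', 400);
""")

def customers_above(amount):
    """Names and cities of customers with more than `amount` in some account."""
    cur = conn.cursor()                      # roughly "declare c cursor for ..."
    cur.execute("""
        select depositor.customer_name, customer_city
        from depositor, customer, account
        where depositor.customer_name = customer.customer_name
          and depositor.account_number = account.account_number
          and account.balance > :amount
        order by depositor.customer_name""", {"amount": amount})  # roughly "open c"
    results = []
    while True:
        row = cur.fetchone()                 # roughly "fetch c into :cn, :cc"
        if row is None:                      # no more rows (the SQLSTATE '02000' case)
            break
        cn, cc = row
        results.append((cn, cc))
    cur.close()                              # roughly "close c"
    return results
```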
Updates Through Cursors
Can update tuples fetched by cursor by declaring that the cursor is for update
o declare c cursor for select * from account
where branch_name = 'Perryridge'
for update
To update tuple at the current location of cursor c
update account set balance = balance + 100 where current of c
SQLPREPPED identifies an SQL variable. It holds the compiled version of the SQL statement whose
source form is given in SQLSOURCE.
The prepare statement takes the source statement and prepares it to produce an executable version,
which is stored in SQLPREPPED.
EXECUTE statement executes the SQLPREPPED version.
EXECUTE IMMEDIATE statement combines the functions of PREPARE and EXECUTE in a
single operation.
Call Level Interface
The SQL Call Level Interface (SQL/CLI) is based on Microsoft's Open Database Connectivity (ODBC).
It allows applications to be written in which the exact SQL code is not known until run time.
Two principal reasons for using SQL/CLI:
Dynamic SQL is a source code facility: it requires some kind of SQL compiler (preprocessor) to process operations like PREPARE and EXECUTE. SQL/CLI does not require any special compiler; it uses the host language compiler, and it is in object code form.
SQL/CLI is DBMS independent, i.e., it allows applications to be created that can be used with several different DBMSs.
Example for SQL/CLI
strcpy(sqlsource, "delete from account where amount > 10000");
rc = SQLExecDirect(hstmt, (SQLCHAR *)sqlsource, SQL_NTS);
strcpy is used to copy the source form of the delete statement into the sqlsource variable.
SQLExecDirect executes the SQL statement contained in sqlsource and assigns the return code to the variable rc.
Two standards for connecting to an SQL database and performing queries and updates are ODBC and JDBC.
Open Database Connectivity (ODBC) was initially developed for the C language and extended to other languages like C++, C# and Visual Basic.
Java Database Connectivity (JDBC) is an application program interface for the Java language.
Users and applications connect to an SQL server by establishing a session, execute a series of statements, and finally disconnect the session.
In addition to normal SQL commands, a session can also contain commands to commit the work carried out in the session or to roll it back.
11. VIEWS
A view is an object that gives the user a logical view of data from an underlying table or tables.
It is not desirable for all users to see the entire logical model.
Security considerations may require that certain data be hidden from users.
Any relation that is not part of the logical model, but is made visible to a user as a virtual relation, is called a view.
Creating Views
Syntax: create view <view name> as <query expression>;
Updating a View
Views can be used for data manipulation, i.e., the user can perform insert, update, and delete operations on the view.
The views on which data manipulation can be done are called updatable views; the views that do not allow data manipulation are called read-only views.
Destroying a View
Syntax: drop view <view name>;
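Creating, querying and dropping a view can be sketched in SQLite. In SQLite a plain view is read-only (updates need extra instead-of triggers), so the insert below is rejected; other systems allow updates through simple views. The loan rows, the view name perryridge_loans and the helper `succeeds` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loan (loan_number text primary key, branch_name text, amount numeric);
    insert into loan values ('L-15', 'Perryridge', 1500), ('L-16', 'Perryridge', 1300),
                            ('L-17', 'Downtown', 1000);
    -- The view exposes only Perryridge loans, and only two columns.
    create view perryridge_loans as
        select loan_number, amount from loan where branch_name = 'Perryridge';
""")

def succeeds(sql):
    """Run a statement and report whether SQLite accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.Error:
        return False

visible = sorted(r[0] for r in conn.execute("select loan_number from perryridge_loans"))
# A plain SQLite view is read-only, so this insert is rejected.
insert_ok = succeeds("insert into perryridge_loans values ('L-99', 100)")
conn.execute("drop view perryridge_loans")          # destroying the view
query_after_drop = succeeds("select * from perryridge_loans")
base_rows = conn.execute("select count(*) from loan").fetchone()[0]  # base table untouched
```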
2 Mark Questions
16 Mark Questions
1. Discuss about various operations in Relational algebra (Fundamental operations – Additional operation)
2. Discuss in detail about an Integrity, Triggers and Security.
3. Explain Embedded and Dynamic SQL.
4. Explain String Operations and Aggregate functions used in SQL.
5. Explain domain relational calculus in detail.
6. Explain tuple relational calculus in detail.
7. Explain distributed databases and client/server databases in detail.
UNIT III
DATABASE DESIGN
1. INTRODUCTION
Relational database design requires a "good" collection of relation schemas.
Pitfalls in Relational Database Design
A bad design may lead to
• Repetition of information
• Inability to represent certain information
Design Goals
a) Avoid redundant data.
b) Ensure that relationships among attributes are represented.
c) Facilitate the checking of updates for violation of database integrity constraints.
Example: Consider the relation schema:
Lending-schema = (branch_name, branch_city, assets, customer_name, loan_no, amount).
branch_name branch_city assets customer_name loan_no amount
Here branch Downtown details are represented 2 times. This leads to a redundancy problem.
Redundancy leads to
(a) Wastage of space.
(b) Complicates updating, introduces inconsistency.
Null values
(a) Cannot store information about a branch if no loan exist.
(b) Can use null values, but they are difficult to handle.
2. FUNCTIONAL DEPENDENCIES
Functional dependencies are constraints on the set of legal relations.
Let R be a relation schema with α ⊆ R and β ⊆ R. The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
It requires that the value for a certain set of attributes determines uniquely the value for another set
of attributes.
In a given relation R, X and Y are attributes. Attributes Y is functionally dependent on attribute X if
each value of X determines exactly one value of Y, which is represented as
X –> Y
i.e., "X determines Y" or "Y is functionally dependent on X".
X –> Y does not imply Y –> X.
For example, in a student relation, if the value of the attribute Marks is known, then the value of the attribute Grade is determined, since
Marks –> Grade
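Whether a relation instance satisfies X –> Y can be checked mechanically: remember the Y-value seen for each X-value and fail on any conflict. A sketch with hypothetical student rows; note that one instance can only show that a dependency is violated, not that it holds on the schema for all legal relations. The function name `fd_holds` is invented for illustration.

```python
def fd_holds(rows, X, Y):
    """Check whether X -> Y is satisfied by this relation instance."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

student = [
    {"student_no": 1, "course_no": "C1", "marks": 85, "grade": "A"},
    {"student_no": 1, "course_no": "C2", "marks": 72, "grade": "B"},
    {"student_no": 2, "course_no": "C1", "marks": 90, "grade": "A"},
    {"student_no": 2, "course_no": "C2", "marks": 72, "grade": "B"},
]

marks_grade = fd_holds(student, ["marks"], ["grade"])   # Marks -> Grade
grade_marks = fd_holds(student, ["grade"], ["marks"])   # Grade -> Marks (the converse)
key_marks = fd_holds(student, ["student_no", "course_no"], ["marks"])
sno_marks = fd_holds(student, ["student_no"], ["marks"])
```

Here Marks –> Grade is satisfied but Grade –> Marks is not, which illustrates that X –> Y does not imply Y –> X.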
Types
(a) Full functional dependency
(b) Partial functional dependency
(c) Transitive functional dependency
(a) Full dependencies
In a relation R with attributes X and Y, Y is fully functionally dependent on X if X –> Y holds and no proper subset of X functionally determines Y.
In the above example, marks is fully functionally dependent on student_no and course_no together, and not on any proper subset of {student_no, course_no}.
This means marks cannot be determined by student_no or course_no alone; it can be determined only by student_no and course_no together.
Hence marks is fully functionally dependent on {student_no, course_no}.
(b) Partial dependencies
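Y is partially dependent on X if X –> Y holds and Y is also functionally dependent on some proper subset of X.
(c) Transitive dependencies
If X –> Y and Y –> Z, then Z is transitively dependent on X, that is, X –> Z.
For example, grade depends on marks, and marks in turn depends on {student_no, course_no}; hence grade depends transitively on {student_no, course_no}.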
Use of Functional Dependencies
We use functional dependencies to:
o Test relations to see if they are legal under a given set of functional dependencies.
If a relation r is legal under a set F of functional dependencies, we say that r satisfies
F.
o specify constraints on the set of legal relations
We say that F holds on R if all legal relations on R satisfy the set of functional
dependencies F.
2.1. CLOSURE OF A SET OF FUNCTIONAL DEPENDENCIES
Given a set of functional dependencies F, there are certain other functional dependencies that are
logically implied by F.
o For example: If A → B and B → C, then we can infer that A → C.
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.
We can find all of F+ by applying Armstrong's Axioms:
o Reflexivity Rule
If α is a set of attributes and β ⊆ α, then α → β holds.
o Augmentation Rule
If α → β holds and γ is a set of attributes, then γα → γβ holds.
o Transitivity Rule
If α → β holds and β → γ holds, then α → γ holds.
These rules are sound (they generate only functional dependencies that actually hold) and complete (they generate all functional dependencies that hold).
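Procedure for Computing F+: To compute the closure of a set of functional dependencies F:
F+ = F
repeat
    for each functional dependency f in F+
        apply the reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
2.2. CLOSURE OF ATTRIBUTE SETS
Given a set of attributes α, the closure of α under F, written α+, is the set of attributes functionally determined by α under F. It is computed as follows:
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
Example: R = (A, B, C, G, H, I), F = {A → B, A → C, CG → H, CG → I, B → H}. Computing (AG)+:
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)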
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for superkey:
o To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.
Testing functional dependencies:
o To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
o That is, we compute α+ by using attribute closure, and then check if it contains β.
o This is a simple and cheap test, and very useful.
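The attribute-closure algorithm and both of its uses fit in a few lines of Python. This sketch assumes the example schema R = (A, B, C, G, H, I) with F = {A → B, A → C, CG → H, CG → I, B → H}, single-letter attribute names, and dependencies written as (lhs, rhs) string pairs; the helper names are invented for illustration.

```python
def attribute_closure(alpha, fds):
    """Compute alpha+ under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(alpha)
    changed = True
    while changed:                 # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # If beta ⊆ result, then result := result ∪ gamma.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

def is_superkey(alpha):
    """alpha is a superkey of R when alpha+ contains every attribute of R."""
    return attribute_closure(alpha, F) >= R

def fd_in_closure(alpha, beta):
    """alpha -> beta is in F+ exactly when beta ⊆ alpha+."""
    return set(beta) <= attribute_closure(alpha, F)
```

For example, A+ = ABCH, so A → H is in F+ even though it is not listed in F, while A alone is not a superkey.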
2.3. CANONICAL COVER
Suppose a relation schema R has a set F of functional dependencies.
Whenever a user performs an update on the relation, the database system must ensure that the update
does not violate any functional dependencies.
The system must roll back the update if it violates any functional dependencies in the set F.
The violation can be checked by testing a simplified set of functional dependencies.
If the simplified set of functional dependencies is satisfied, then the original set is satisfied, and vice versa.
Sets of functional dependencies may have redundant dependencies that can be inferred from the
others.
A canonical cover of F is a "minimal" set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.
Extraneous Attributes
An attribute of a functional dependency is said to be extraneous if we can remove it without
changing the closure of the set of functional dependencies.
Consider a set F of functional dependencies and the functional dependency α → β in F.
o Attribute A is extraneous in α if A ∈ α and F logically implies
(F – {α → β}) ∪ {(α – A) → β}.
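Because F already contains every dependency in F – {α → β}, F implies the reduced set exactly when F implies (α – A) → β, that is, when β ⊆ (α – A)+ computed under F. The Python sketch below covers only this left-side test; it assumes single-letter attribute names, and both the helper name `extraneous_in_lhs` and the set F = {AB → C, A → C} are hypothetical.

```python
from typing import List, Set, Tuple

FDs = List[Tuple[str, str]]


def closure(alpha: str, fds: FDs) -> Set[str]:
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(attr: str, lhs: str, rhs: str, fds: FDs) -> bool:
    """Is `attr` extraneous in the left side of lhs -> rhs (an FD in fds)?

    True when rhs is a subset of (lhs - attr)+ computed under the full F.
    """
    if attr not in lhs:
        return False
    reduced = lhs.replace(attr, "")  # alpha - A (single-letter attributes)
    return set(rhs) <= closure(reduced, fds)


# Hypothetical F
F = [("AB", "C"), ("A", "C")]
print(extraneous_in_lhs("B", "AB", "C", F))  # True: A -> C already holds
print(extraneous_in_lhs("A", "AB", "C", F))  # False: B+ = {B}
```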
3. NORMALIZATION
Normalization of data is a process of analyzing the given relational schemas based on their functional
dependencies and primary keys to achieve the following desirable properties:
Minimize redundancy
Minimize insert, delete and update anomalies during database activities
Normalization is an essential part of database design.
The concept of normalization helps the designer to build an efficient design.
Purpose of Normalization:
Minimize redundancy in data.
Remove insert, delete and update anomaly during database activities.
Department
In our example, the Department relation is not in 1NF because Dlocation is a multivalued attribute.
There are 3 main techniques to achieve 1NF for such relation.
1. Remove the Dlocation that violates 1NF and place it in a separate relation Dept_location
along with primary key Dnumber of department. The primary key of this relation is the
combination of {Dnumber, Dlocation}.
Dept_location
Dnumber Dlocation
5 Bellaire
5 Sugarland
5 Houston
4 Stafford
1 Houston
2. Expand the key so that there is a separate tuple in the original Department relation for each
location of a department. The primary key becomes {Dnumber, Dlocation}. This solution has the
disadvantage of introducing redundancy in the relation.
3. If a maximum number of values is known for the attribute (for example, if it is known that
at most three locations can exist for a department), replace Dlocation by Dlocation1,
Dlocation2 and Dlocation3. This solution has the disadvantage of introducing null values if
most departments have fewer than three locations.
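Technique 1 amounts to flattening each multivalued Dlocation into one (Dnumber, Dlocation) tuple per value. A minimal Python sketch, assuming the Department data is reduced to a mapping from Dnumber to its list of locations (the other Department attributes are left out, and `to_dept_location` is a hypothetical name):

```python
from typing import Dict, List, Tuple

# Dnumber -> its multivalued Dlocation values, as in the Dept_location rows above
# (other Department attributes are omitted in this sketch)
DEPARTMENT_LOCATIONS: Dict[int, List[str]] = {
    5: ["Bellaire", "Sugarland", "Houston"],
    4: ["Stafford"],
    1: ["Houston"],
}


def to_dept_location(locations: Dict[int, List[str]]) -> List[Tuple[int, str]]:
    """Technique 1: emit one (Dnumber, Dlocation) tuple per location, so every
    attribute value is atomic and {Dnumber, Dlocation} is the key."""
    rows: List[Tuple[int, str]] = []
    for dnumber, dlocations in locations.items():
        for dlocation in dlocations:
            rows.append((dnumber, dlocation))
    return rows


dept_location = to_dept_location(DEPARTMENT_LOCATIONS)
for dnumber, dlocation in dept_location:
    print(dnumber, dlocation)
```

No tuple is repeated, which is what allows {Dnumber, Dlocation} to serve as the primary key of Dept_location.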
EMP_PROJ1    EMP_PROJ2
Eid Pnumber Hours
Eid Ename
In the EMP_PROJ example above, {Ssn, Pnumber} is the primary key.
The table is in 1NF.
FD1 satisfies 2NF, but FD2 and FD3 violate 2NF.
The attributes Ename (FD2) and Pname, Plocation (FD3) are partially dependent on the primary key
{Ssn, Pnumber}: Ename depends on Ssn alone, and Pname and Plocation depend on Pnumber alone.
A relation that is not in second normal form can be brought into 2NF by decomposing it into a
number of relations such that each nonprime attribute is fully functionally dependent on the
primary key of its relation.
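2NF decomposition of EMP_PROJ:
EP1 (Ssn, Pnumber, Hours), holding FD1: {Ssn, Pnumber} → Hours
EP2 (Ssn, Ename), holding FD2: Ssn → Ename
EP3 (Pnumber, Pname, Plocation), holding FD3: Pnumber → {Pname, Plocation}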
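A 2NF violation can be found mechanically: compute the attribute closure of every proper subset of the primary key and see which nonprime attributes it reaches. A minimal Python sketch, assuming the EMP_PROJ dependencies are FD1: {Ssn, Pnumber} → Hours, FD2: Ssn → Ename and FD3: Pnumber → {Pname, Plocation}; `partial_dependencies` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key: Set[str], nonprime: Set[str], fds: List[FD]) -> Dict[str, Set[str]]:
    """Map each nonprime attribute that depends on a proper subset of the
    primary key to the smallest such subset found (a 2NF violation)."""
    found: Dict[str, Set[str]] = {}
    ordered = sorted(key)
    for size in range(1, len(ordered)):          # proper subsets only, smallest first
        for part in combinations(ordered, size):
            for attr in closure(part, fds) & nonprime:
                found.setdefault(attr, set(part))
    return found


# Assumed dependencies of EMP_PROJ
EMP_PROJ_FDS = [
    ({"Ssn", "Pnumber"}, {"Hours"}),          # FD1
    ({"Ssn"}, {"Ename"}),                     # FD2
    ({"Pnumber"}, {"Pname", "Plocation"}),    # FD3
]
violations = partial_dependencies(
    {"Ssn", "Pnumber"}, {"Hours", "Ename", "Pname", "Plocation"}, EMP_PROJ_FDS
)
print(violations)
```

Each entry of the result suggests one relation of the decomposition (Ssn with Ename, Pnumber with Pname and Plocation), while Hours stays with the full key, matching EP1, EP2 and EP3.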
EMP_DEPT is decomposed into:
ED1 (Ename, Eid, DOB, Address, Dnumber)
ED2 (Dnumber, Dname, DMGRid)
The dependency Eid → DMGRid is transitive through Dnumber in EMP_DEPT, because both the
dependencies Eid → Dnumber and Dnumber → DMGRid hold.
Dnumber is neither a key itself nor a subset of the key of EMP_DEPT; therefore the EMP_DEPT
relation schema is not in 3NF.
The relation is in 2NF because there are no partial dependencies on the key attribute.
We can normalize EMP_DEPT by decomposing it into two 3NF relational schemas ED1 and ED2.
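The 3NF condition (for every X → A that holds on the schema with A ∉ X, either X is a superkey or A is a prime attribute) can be checked by brute force on small schemas. A minimal Python sketch, assuming the dependencies Eid → {Ename, DOB, Address, Dnumber} and Dnumber → {Dname, DMGRid} (the text above only states Eid → Dnumber and Dnumber → DMGRid); the helper names are hypothetical, and the subset enumeration is exponential, so this is for illustration only.

```python
from itertools import combinations
from typing import Iterable, Iterator, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def subsets(attrs: Set[str]) -> Iterator[Set[str]]:
    ordered = sorted(attrs)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield set(combo)


def candidate_keys(schema: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Minimal subsets of `schema` whose closure covers the whole schema."""
    keys: List[Set[str]] = []
    for x in subsets(schema):                    # smallest subsets first
        if any(k <= x for k in keys):
            continue                             # superset of a key: not minimal
        if schema <= closure(x, fds):
            keys.append(x)
    return keys


def is_3nf(schema: Set[str], fds: List[FD]) -> bool:
    """For every X -> A implied by F with X, A inside `schema` and A not in X,
    X must be a superkey of the schema or A must be prime."""
    prime = set().union(*candidate_keys(schema, fds))
    for x in subsets(schema):
        x_plus = closure(x, fds)
        if schema <= x_plus:
            continue                             # X is a superkey
        if (x_plus & schema) - x - prime:
            return False                         # X -> A with A nonprime, X not a superkey
    return True


# Assumed dependencies
FDS = [
    ({"Eid"}, {"Ename", "DOB", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "DMGRid"}),
]
EMP_DEPT = {"Ename", "Eid", "DOB", "Address", "Dnumber", "Dname", "DMGRid"}
ED1 = {"Ename", "Eid", "DOB", "Address", "Dnumber"}
ED2 = {"Dnumber", "Dname", "DMGRid"}
print(is_3nf(EMP_DEPT, FDS), is_3nf(ED1, FDS), is_3nf(ED2, FDS))  # False True True
```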
2NF
o Test: For relations where the primary key contains multiple attributes, no nonkey attribute
should be functionally dependent on a part of the primary key.
o Remedy (normalization): Decompose and set up a new relation for each partial key with its
dependent attributes. Make sure to keep a relation with the original primary key and any
attributes that are fully functionally dependent on it.
3NF
o Test: Relations should not have a nonkey attribute functionally determined by another nonkey
attribute, i.e., there should be no transitive dependency of a nonkey attribute on the primary key.
o Remedy (normalization): Decompose and set up a relation that includes the nonkey attributes
that functionally determine other nonkey attributes.
o R1 ∩ R2 → R1
o R1 ∩ R2 → R2
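These two conditions form the usual binary lossless-join test: decomposing R into R1 and R2 is lossless if at least one of them is in F+, which again reduces to an attribute closure of R1 ∩ R2. A minimal Python sketch, assuming the hypothetical R = (A, B, C) with F = {A → B, B → C} and single-letter attribute names; `is_lossless_binary` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

FDs = List[Tuple[str, str]]


def closure(alpha: Iterable[str], fds: FDs) -> Set[str]:
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1: str, r2: str, fds: FDs) -> bool:
    """Lossless if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is in F+."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


# Hypothetical R = (A, B, C), F = {A -> B, B -> C}
F = [("A", "B"), ("B", "C")]
print(is_lossless_binary("AB", "BC", F))  # True:  B -> BC
print(is_lossless_binary("AC", "BC", F))  # False: C+ = {C}
```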
If it is decomposed into
8. DEPENDENCY PRESERVATION
Let F be a set of functional dependencies on a schema R, and let R1, R2, . . . , Rn be a decomposition of
R.
The restriction of F to Ri is the set Fi of all functional dependencies in F+ that include only
attributes of Ri.
Example
F = {A → B, B → C}
The restriction of F to (A, C) contains A → C, since A → C is in F+, even though it is not in F.
Let F′ = F1 ∪ F2 ∪ … ∪ Fn. In general F′ ≠ F; however, even if F′ ≠ F, it may be that F′+ = F+.
A decomposition having the property F′+ = F+ is a dependency-preserving decomposition.
The input to the algorithm is a set of decomposed relational schemas D = {R1, R2, R3, …, Rn} and a
set F of functional dependencies.
This algorithm is expensive, since it requires the computation of F+.
An alternative method to test dependency preservation is as follows.
The test is applied to each functional dependency α → β in F:
result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
If result contains all attributes in β, then the functional dependency α → β is preserved.
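The alternative test translates directly into Python. In this sketch the closure (result ∩ Ri)+ is taken with respect to F itself, so F+ is never built. It assumes the hypothetical R = (A, B, C) with F = {A → B, B → C} and single-letter attribute names; `preserves` and `is_dependency_preserving` are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple

FDs = List[Tuple[str, str]]


def closure(alpha: Iterable[str], fds: FDs) -> Set[str]:
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(alpha: str, beta: str, decomposition: List[str], fds: FDs) -> bool:
    """Is alpha -> beta preserved by the decomposition?"""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            r = set(ri)
            t = closure(result & r, fds) & r     # t = (result ∩ Ri)+ ∩ Ri
            if not t <= result:
                result |= t                      # result = result ∪ t
                changed = True
    return set(beta) <= result


def is_dependency_preserving(decomposition: List[str], fds: FDs) -> bool:
    return all(preserves(a, b, decomposition, fds) for a, b in fds)


# Hypothetical R = (A, B, C), F = {A -> B, B -> C}
F = [("A", "B"), ("B", "C")]
print(is_dependency_preserving(["AB", "BC"], F))  # True
print(is_dependency_preserving(["AB", "AC"], F))  # False: B -> C is lost
```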
Example:
EMP
Ename   Pname   Dname
Smith   X       John
Smith   Y       Anna
Smith   X       Anna
Smith   Y       John
EMP is decomposed into EMP_PROJECTS (Ename, Pname) and EMP_DEPENDENTS (Ename, Dname).
The constraint states that every legal state r of R should have a nonadditive join decomposition into
R1, R2, R3, …, Rn, i.e., for every such relation r we have
*(πR1(r), πR2(r), …, πRn(r)) = r
A JD denoted as JD(R1, R2) implies an MVD (R1 ∩ R2) →→ (R1 – R2).
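The nonadditive-join condition can be checked on a given instance by projecting r onto each Ri and natural-joining the projections back. A minimal Python sketch using the EMP rows from the example, assuming the components R1 = (Ename, Pname) and R2 = (Ename, Dname); the helper names `project`, `natural_join` and `satisfies_jd` are hypothetical, and a single instance can only refute a JD, not prove it for every legal state.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Row = Dict[str, str]
Tup = FrozenSet[Tuple[str, str]]  # a tuple as a set of (attribute, value) pairs


def project(rows: List[Row], attrs: Sequence[str]) -> Set[Tup]:
    """pi_attrs(r): keep only `attrs`; duplicates vanish because we build a set."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left: Set[Tup], right: Set[Tup]) -> Set[Tup]:
    """Combine every pair of tuples that agree on their common attributes."""
    out: Set[Tup] = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def satisfies_jd(rows: List[Row], components: List[Sequence[str]]) -> bool:
    """True if *(pi_R1(r), ..., pi_Rn(r)) = r on this instance."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == {frozenset(row.items()) for row in rows}


EMP = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]
jd = [("Ename", "Pname"), ("Ename", "Dname")]
print(satisfies_jd(EMP, jd))      # True
print(satisfies_jd(EMP[:3], jd))  # False: the join adds (Smith, Y, John)
```

With two components this is the MVD Ename →→ Pname: dropping any one EMP row makes the join produce a spurious tuple.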
FIFTH NORMAL FORM (5NF)
A relational schema R is in fifth normal form, or Project-Join Normal Form (PJNF), with respect to a
set F of functional, multivalued and join dependencies if, for every nontrivial join dependency
JD(R1, R2, R3, …, Rn) in F+, every Ri is a superkey of R.
SUPPLY and its projections R1, R2 and R3
SUPPLY
Smith     Bolt    Y

R2                  R3
Smith     X         Bolt    X
Smith     Y         Nut     Y
Adamsky   Y         Bolt    Y
Walton    Z         Nut     Z
Adamsky   X         Nail    X
ANOMALIES IN DATABASES
There are three types of anomalies:
1. Insert Anomalies
2. Update Anomalies
3. Delete Anomalies
1. Insert Anomalies:
The inability to insert part of information into a relational schema due to the unavailability of part of
the remaining information is called Insert Anomalies.
Example: If there is a guide who has no students registered under him, then we cannot insert the
guide's information into the project schema.
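2. Update Anomalies:
If the same information is stored in several tuples and an update changes only some of those
copies, the database becomes inconsistent; this is called an update anomaly.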
3. Delete Anomalies:
If the deletion of some information leads to loss of some other information, then we say there is a
deletion anomaly.
Example: If a guide guides only one student and that student discontinues the course, then the
information about the guide will be lost.
2 Mark Questions
16 Mark Questions
1. Explain in detail functional dependencies.
2. Explain in detail the first, second and third normal forms.
3. Explain in detail Boyce-Codd normal form and fifth normal form.
4. Explain in detail decomposition using functional dependencies.
5. Explain in detail decomposition using multivalued dependencies.