i Bcom(CA) Database Management Systems

The document outlines a comprehensive syllabus for a Database Management System course, covering various units that include topics such as database applications, data models, SQL, database design, and system architectures. It emphasizes the importance of data independence, efficient data access, and data integrity while detailing different database models like relational, hierarchical, and object-oriented. Additionally, it provides a structured approach to database design and management, including the use of entity-relationship models and normalization techniques.

Uploaded by

govindanm223
Copyright
© © All Rights Reserved

CONTENTS

UNIT – I

1.1 INTRODUCTION
1.2 DATABASE SYSTEM APPLICATIONS
1.3 PURPOSE OF DATABASE SYSTEMS
1.4 VIEW OF DATA
1.5 DATA MODELS
1.6 DATABASE LANGUAGES
1.7 RELATIONAL DATABASE
1.8 DATABASE DESIGN
1.9 DATA STORAGE AND QUERY
1.10 TRANSACTION MANAGEMENT
1.11 DATABASE SYSTEM ARCHITECTURE
1.12 DATABASE USERS AND ADMINISTRATORS
1.13 HISTORY OF DATABASE SYSTEM

UNIT – II

2.1 RELATIONAL DATABASE
2.2 STRUCTURE OF RELATIONAL DATABASE
2.3 DATABASE SCHEMAS
2.4 KEY
2.5 SCHEMA DIAGRAMS
2.6 RELATIONAL QUERY LANGUAGE
2.7 SQL
2.8 OVERVIEW OF THE SQL QUERY LANGUAGE
2.9 SQL DATA DEFINITION
2.10 BASIC STRUCTURE OF SQL QUERIES
2.11 SET OPERATIONS
2.12 NULL VALUES
2.13 AGGREGATE FUNCTIONS
2.14 NESTED SUBQUERIES
2.15 MODIFICATION OF DATABASE

UNIT – III

3.1 INTERMEDIATE SQL
3.2 JOIN EXPRESSIONS
3.3 VIEW
3.4 TRANSACTIONS
3.5 AUTHORIZATION
3.6 ADVANCED SQL
3.7 FUNCTIONS AND PROCEDURES
3.8 TRIGGERS
3.9 FORMAL RELATIONAL QUERIES LANGUAGES
3.10 THE RELATIONAL ALGEBRA
3.11 TUPLE RELATIONAL CALCULUS
3.12 DOMAIN RELATIONAL CALCULUS

UNIT – IV

4.1 DATABASE DESIGN AND THE E-R MODEL
4.2 OVERVIEW OF THE DATA PROCESS
4.3 THE ENTITY-RELATIONSHIP MODEL
4.4 CONSTRAINTS
4.5 ENTITY-RELATIONSHIP DIAGRAM
4.6 ENTITY-RELATIONSHIP DESIGN ISSUES
4.7 EXTENDED E-R FEATURES
4.8 RELATIONAL DATABASE DESIGN
4.9 ATOMIC DOMAIN AND FIRST NORMAL FORM
4.10 DECOMPOSITION USING FUNCTIONAL DEPENDENCY
4.11 FUNCTIONAL-DEPENDENCY THEORY
4.12 DECOMPOSITION USING MULTIVALUED DEPENDENCIES
4.13 MORE NORMAL FORM

UNIT – V

5.1 DATABASE SYSTEM ARCHITECTURE
5.2 CENTRALIZED AND CLIENT-SYSTEM ARCHITECTURES
5.3 SERVER SYSTEM ARCHITECTURE
5.4 PARALLEL SYSTEMS
5.5 DISTRIBUTED SYSTEMS
5.6 NETWORK TYPES
5.7 DISTRIBUTED DATABASES
5.8 HOMOGENEOUS AND HETEROGENEOUS DATABASES
5.9 DISTRIBUTED DATA STORAGE
5.10 DISTRIBUTED TRANSACTIONS
5.11 COMMIT PROTOCOLS
5.12 CLOUD BASED DATABASE
5.13 DIRECTORY SYSTEMS

MODEL QUESTION PAPER-I
MODEL QUESTION PAPER-II
OBJECTIVE TEST-I

SYLLABUS

ALLIED II - DATABASE MANAGEMENT SYSTEM

UNIT – I: Introduction: Database System Applications – Purpose of
Database Systems – View of Data – Data Models – Database Languages –
Relational Database – Database Design – Data Storage and Query –
Transaction Management – Database Architecture – Database Users and
Administrators – History of Database Systems.

UNIT – II: Relational Database: Structure of Relational Databases –
Database Schemas – Keys – Schema Diagrams – Relational Query
Language. SQL: Overview of the SQL Query Language – SQL Data
Definition – Basic Structure of SQL Queries – Set Operations – Null Values –
Aggregate Functions – Nested Subqueries – Modification of the Database.

UNIT – III: Intermediate SQL: Join Expressions – View – Transactions –
Authorization. Advanced SQL: Functions and Procedures – Triggers. Formal
Relational Query Languages: The Relational Algebra – The Tuple
Relational Calculus – The Domain Relational Calculus.

UNIT – IV: Database Design and the E-R Model: Overview of the Data
Process – The Entity-Relationship Model – Constraints – Entity-Relationship
Diagram – Entity-Relationship Design Issues – Extended E-R Features.
Relational Database Design: Atomic Domain and First Normal Form –
Decomposition using Functional Dependency – Functional Dependency
Theory – Decomposition using Multivalued Dependencies – More Normal
Form.

UNIT – V: Database System Architectures: Centralized and Client-System
Architectures – Server System Architectures – Parallel Systems –
Distributed Systems – Network Types. Distributed Databases:
Homogeneous and Heterogeneous Databases – Distributed Data Storage –
Distributed Transactions – Commit Protocols – Cloud Based Databases –
Directory Systems.

TEXT BOOK: 1. “Database System Concepts” - Abraham Silberschatz,
Henry F. Korth, S. Sudarshan, 6th Edition, McGraw-Hill International Edition.

UNIT – I
1.1 INTRODUCTION
A database management system (DBMS) is a collection of interrelated
data and a set of programs to access those data.
The collection of data is usually referred to as the database; it
contains the information of interest.

1.2 DATABASE SYSTEM APPLICATIONS


A database management system is software designed to assist in
maintaining and utilizing large collections of data.
Databases are widely used in the following applications:-
• Banking:-
  For customer information, accounts, loans, and banking
  transactions.
• Airlines:-
  For reservations and schedule information. Airlines were
  among the first to use databases in a geographically distributed
  manner.
• Universities:-
  For student information, course registrations, and grades.
• Credit card transactions:-
  For purchases on credit cards and generation of monthly
  statements.
• Telecommunications:-
  For keeping records of calls made, generating monthly bills,
  maintaining balances on prepaid calling cards, and storing
  information about the communication networks.
• Finance:-
  For storing information about holdings, sales, and purchases of
  financial instruments such as stocks and bonds. Also for storing
  real-time market data to enable on-line trading by customers and
  automated trading by the firm.
• Sales:-
  For customer, product, and purchase information.
• On-line retailers:-
  For the sales data noted above plus on-line order tracking,
  generation of recommendation lists, and maintenance of on-line
  product evaluations.
• Manufacturing:-
  For management of the supply chain and for tracking production of
  items in factories, inventories of items in warehouses and stores,
  and orders for items.
• Human resources:-
  For information about employees, salaries, payroll taxes, and
  benefits, and for generation of paychecks.

1.3 PURPOSE OF DATABASE SYSTEMS

• Data Independence:-
  Application programs should be as independent as possible from
  details of data representation and storage. The DBMS provides
  an abstract view of the data that hides such details.
• Efficient Data Access:-
  A DBMS uses a variety of sophisticated techniques to store and
  retrieve data efficiently.
• Data Integrity and Security:-
  If data is always accessed through the DBMS, the DBMS can enforce
  integrity constraints on the data. It also provides security by
  using levels of abstraction to control which data each class of
  users can see.
• Data Administration:-
  One of the main reasons for using a DBMS is to have central
  control of both the data and the programs that access those data.
  The person who has such central control over the system is called
  the DBA (Database Administrator).
• Concurrent Access and Crash Recovery:-
  To improve overall performance and response time, many systems
  allow multiple users to update data simultaneously. The DBMS
  coordinates such concurrent access and protects users from the
  effects of system failures.
• Reduced application development time:-
  The DBMS supports many important functions that are common
  to applications accessing data stored in the DBMS.
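
The point about integrity enforcement can be illustrated with a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for a full DBMS. The account table, its columns, and the rule that a balance may not be negative are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical schema: an account whose balance may never be negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()


def try_update(new_balance):
    """Return True if the DBMS accepts the update, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE account SET balance = ? WHERE account_number = 'A-101'",
                (new_balance,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_update(300)   # valid: the balance stays non-negative
rejected = try_update(-50)   # invalid: violates the CHECK constraint
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(accepted, rejected, balance)
```

Because the rule lives in the schema rather than in each application program, every program that goes through the DBMS is subject to it.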
1.4 VIEW OF DATA
This overview of data explains how to organize information in a DBMS
and how to maintain and retrieve it effectively. That is, it explains
how to design a database and use a DBMS effectively.
• Database Design:-
  Describes a real-world enterprise in terms of the data stored in
  a DBMS, and the factors to be considered when organizing the data.
• Data analysis:-
  Describes how a user can pose queries over the data in the DBMS
  to answer questions about the enterprise.
• Concurrency and Robustness:-
  Describes how a DBMS allows many users to access data
  concurrently, and how the DBMS protects the data in the event of
  system failures.
• Efficiency and Scalability:-
  The primary goal of a database is to store and retrieve
  information in a manner that is both efficient and convenient.
• Data Abstraction:-
  A major purpose of a database system is to provide users with an
  abstract view of data. That is, the system hides the complexity
  from users through several levels of abstraction, to simplify
  users' interactions with the system.
  - Physical level:-
    The lowest level of abstraction describes how the data are
    actually stored.
  - Logical level:-
    Describes what data are stored in the database and what
    relationships exist among those data. The entire database is
    thus described in terms of a small number of relatively simple
    structures.
  - View level:-
    The highest level of abstraction describes only part of the
    entire database. Most users do not need to know about the entire
    database and its complexity; they need to access only a part of
    the database.
Example:
  Students(sid: string, login: string, age: integer, gpa: real)
  Faculty(fid: string, fname: string, sal: real)
  Courses(cid: string, cname: string, credits: integer)

    View 1               View 2        ...     View n
    (External schema 1)  (External schema 2)   (External schema n)
                             |
                     Conceptual schema
                             |
                      Physical schema
                             |
                            Disk

Three Levels of abstraction
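
As a rough illustration of the logical and view levels, the sketch below (Python with the standard-library sqlite3 module) stores the Students relation from the example and exposes only part of it through a view. The view name student_logins and the sample rows are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full Students relation from the example.
conn.execute(
    "CREATE TABLE students (sid TEXT PRIMARY KEY, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [("S1", "asha", 19, 8.4), ("S2", "ravi", 20, 7.1)],  # made-up rows
)

# View level: a hypothetical view that hides age and gpa from its users.
conn.execute("CREATE VIEW student_logins AS SELECT sid, login FROM students")

cursor = conn.execute("SELECT * FROM student_logins ORDER BY sid")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who is given only the view never sees the hidden columns, and how SQLite lays the rows out on disk (the physical level) is invisible to both the table and the view.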

1.5 DATA MODELS
This section describes the main database models and the three levels
of data modeling: the conceptual, logical, and physical data models.
• Hierarchical database model
• Relational model
• Network model
• Object-oriented database model
• Entity-relationship model
• Conceptual Data Model
• Logical Data Model
• Physical Data Model
Hierarchical Database Model:-
The hierarchical model organizes data into a tree-like structure,
where each record has a single parent or root. Sibling records are sorted
in a particular order. That order is used as the physical order for storing
the database. This model is good for describing many real-world
relationships.

Relational Model:-
The most common model, the relational model organizes data into
tables, also known as relations, each of which consists of columns and
rows. Each column lists an attribute of the entity in question, such as
price, zip code, or birth date.
Each attribute has a domain, the set of values it is permitted to
take. A particular attribute or combination of attributes is chosen as a
primary key; when other tables refer to it, it is called a foreign key in
those tables.
Each row, also called a tuple, includes data about a specific instance
of the entity in question, such as a particular employee.
The model also accounts for the types of relationships between
those tables, including one-to-one, one-to-many, and many-to-many
relationships.
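
A small sketch of a primary key, a foreign key, and a one-to-many relationship, written in Python with the standard-library sqlite3 module. The department and employee tables and their rows are hypothetical and serve only to illustrate the terms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"  # foreign key
)
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(10, "Meena", 1), (11, "Kumar", 1)],  # one department, many employees
)

# Follow the foreign key back to the department row with a join.
pairs = conn.execute(
    "SELECT e.name, d.name FROM employee e"
    " JOIN department d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()

# A row that points at a non-existent department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (12, 'Latha', 99)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(pairs, orphan_allowed)
```

Each employee row is a tuple, emp_id is the primary key of employee, and dept_id in employee is a foreign key that refers to the primary key of department.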

Network Model:-
The network model builds on the hierarchical model by allowing
many-to-many relationships between linked records, implying multiple
parent records.
Based on mathematical set theory, the model is constructed with
sets of related records. Each set consists of one owner or parent record
and one or more member or child records. A record can be a member or
child in multiple sets, allowing this model to convey complex relationships.
It was most popular in the 1970s, after it was formally defined by the
Conference on Data Systems Languages (CODASYL).
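
The owner and member structure can be sketched in a few lines of Python. This is only an in-memory illustration of the idea, not how a CODASYL system stores records; the Record and RecordSet classes and the supplier data are invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str


@dataclass
class RecordSet:
    """One set: a single owner (parent) record and its member (child) records."""
    owner: Record
    members: list = field(default_factory=list)


# Hypothetical data: one part that is supplied by two suppliers.
supplier_a, supplier_b = Record("Supplier A"), Record("Supplier B")
bolt = Record("Bolt")

supplies_a = RecordSet(owner=supplier_a, members=[bolt])
supplies_b = RecordSet(owner=supplier_b, members=[bolt])


def owners_of(record, sets):
    """Return the names of every owner whose set contains the given record."""
    return [s.owner.name for s in sets if any(m is record for m in s.members)]


bolt_owners = owners_of(bolt, [supplies_a, supplies_b])
print(bolt_owners)
```

The same bolt record is a member of two sets, so it has two parents, which a strict hierarchical tree would not allow.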

Object-Oriented Database Model:-
This model defines a database as a collection of objects, or reusable
software elements, with associated features and methods. There are
several kinds of object-oriented databases:
• A multimedia database incorporates media, such as images, that
  are not easily stored in a relational database.
• A hypertext database allows any object to link to any other
  object. It is useful for organizing lots of disparate data, but it is
  not ideal for numerical analysis.
The object-oriented database model is the best-known post-relational
database model, since it incorporates tables but is not limited to
tables. Such models are also known as hybrid database models.
Entity-Relationship Model:-
This model captures the relationships between real-world entities
much like the network model, but it is not as directly tied to the physical
structure of the database. Instead, it is often used for designing a
database conceptually.
Here, the people, places, and things about which data points are
stored are referred to as entities, each of which has certain attributes
that describe it. The cardinality of the relationships between entities
(one-to-one, one-to-many, or many-to-many) is mapped as well.

Conceptual Data Model:-
A conceptual data model identifies the highest-level relationships
between the different entities. Features of a conceptual data model include:
• Includes the important entities and the relationships among them.
• No attribute is specified.
• No primary key is specified.

The figure below is an example of a conceptual data model.

From the figure above, we can see that the only information shown via
the conceptual data model is the entities that describe the data and the
relationships between those entities. No other information is shown
through the conceptual data model.

Logical Data Model:-


A logical data model describes the data in as much detail as possible,
without regard to how they will be physically implemented in the database.
Features of a logical data model include:
• Includes all entities and relationships among them.
• All attributes for each entity are specified.
• The primary key for each entity is specified.
• Foreign keys (keys identifying the relationship between different
  entities) are specified.
• Normalization occurs at this level.
The steps for designing the logical data model are as follows:-
• Specify primary keys for all entities.
• Find the relationships between different entities.
• Find all attributes for each entity.
• Resolve many-to-many relationships.
• Normalization.

The figure below is an example of a logical data model.

Comparing the logical data model shown above with the conceptual
data model diagram, we see the main differences between the two:
• In a logical data model, primary keys are present, whereas in a
  conceptual data model, no primary key is present.
• In a logical data model, all attributes are specified within an entity.
  No attributes are specified in a conceptual data model.
• Relationships between entities are specified using primary keys and
  foreign keys in a logical data model. In a conceptual data model, the
  relationships are simply stated, not specified, so we simply know
  that two entities are related, but we do not specify what attributes
  are used for this relationship.
Physical Data Model:-
A physical data model represents how the model will be built in the
database. A physical database model shows all table structures, including
column name, column data type, column constraints, primary key, foreign
key, and relationships between tables.
Features of a physical data model include:-
• Specification of all tables and columns.
• Foreign keys are used to identify relationships between tables.
• De-normalization may occur based on user requirements.
• Physical considerations may cause the physical data model to be
  quite different from the logical data model.
• The physical data model will be different for different RDBMSs. For
  example, the data type for a column may be different between MySQL
  and SQL Server.
The steps for physical data model design are as follows:-
• Convert entities into tables.
• Convert relationships into foreign keys.
• Convert attributes into columns.
• Modify the physical data model based on physical constraints /
  requirements.

The figure below is an example of a physical data model.

Comparing the physical data model shown above with the logical data
model diagram, we see the main differences between the two:
• Entity names are now table names.
• Attributes are now column names.
• The data type for each column is specified. Data types can be
  different depending on the actual database being used.
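
The conversion steps listed above can be sketched as a small Python script that turns a logical description into CREATE TABLE statements and runs them in SQLite (standard-library sqlite3). The customer and orders entities, the dictionary layout, and the type mapping are all assumptions made for illustration; a real design tool would be far more thorough.

```python
import sqlite3

# Hypothetical logical model: entity -> attributes with logical data types.
logical_model = {
    "customer": {"customer_id": "integer", "customer_name": "string"},
    "orders": {"order_id": "integer", "customer_id": "integer", "amount": "decimal"},
}
primary_keys = {"customer": "customer_id", "orders": "order_id"}
# The one-to-many relationship customer -> orders becomes a foreign key.
foreign_keys = {"orders": ("customer_id", "customer")}
# Assumed type mapping for SQLite; another RDBMS would need a different one.
sqlite_types = {"integer": "INTEGER", "string": "TEXT", "decimal": "NUMERIC"}


def to_ddl(entity):
    """Convert one entity into a CREATE TABLE statement."""
    cols = []
    for attr, logical_type in logical_model[entity].items():
        col = f"{attr} {sqlite_types[logical_type]}"  # attribute -> column
        if attr == primary_keys[entity]:
            col += " PRIMARY KEY"
        cols.append(col)
    if entity in foreign_keys:  # relationship -> foreign key
        column, target = foreign_keys[entity]
        cols.append(
            f"FOREIGN KEY ({column}) REFERENCES {target}({primary_keys[target]})"
        )
    return f"CREATE TABLE {entity} ({', '.join(cols)})"  # entity -> table


conn = sqlite3.connect(":memory:")
for entity in logical_model:
    conn.execute(to_ddl(entity))

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(to_ddl("orders"))
print(tables)
```

Only the sqlite_types mapping would change when targeting a different RDBMS, which mirrors the remark that data types depend on the actual database being used.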

1.6 DATABASE LANGUAGES

A database system provides two types of languages:

• Data Definition Language
• Data Manipulation Language
Data Definition Language:-
A database schema is specified by a set of definitions expressed in a
special language called a data definition language (DDL). Compiling DDL
statements produces a set of tables that is stored in a special file
called the data dictionary. The data dictionary is a file that contains
metadata (i.e., data about data).
The storage structure and access methods used by the database system
are specified by a set of definitions in a special type of DDL called a
data storage and definition language.
Example:
• Create, alter, and drop table commands.
• create table account (account_number char(10), balance integer);
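
The three DDL commands named above can be tried out with a short Python sketch using the standard-library sqlite3 module. The branch_name column added by ALTER is a made-up example, and the columns helper is specific to this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def columns(table):
    """Read the column names of a table back from SQLite's catalog."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE defines a new table.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")
after_create = columns("account")

# ALTER changes the schema of an existing table.
conn.execute("ALTER TABLE account ADD COLUMN branch_name VARCHAR(20)")
after_alter = columns("account")

# DROP removes the table and its definition.
conn.execute("DROP TABLE account")
after_drop = columns("account")  # empty: the table no longer exists
print(after_create, after_alter, after_drop)
```

None of these statements touches individual rows; they only change the definitions recorded in the data dictionary.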
Data Manipulation Language:-
Data manipulation means:
• Retrieval of information stored in the database.
• Insertion of new information into the database.
• Deletion of information from the database.
• Modification of information stored in the database.
A DML is a language that enables users to access or manipulate
data as organized by the appropriate data model.

Two types of DML are:-
Procedural DMLs:-
Require a user to specify what data are needed and how to get
those data.
Nonprocedural DMLs:-
Require a user to specify what data are needed without specifying
how to get those data.
• They are also referred to as declarative DMLs.
• They are easier to learn. The DML component of the SQL language
  is nonprocedural.
• A query is a statement requesting the retrieval of information.
  The portion of a DML that involves information retrieval is called
  a query language.
Example: select, insert, update, and delete commands.
    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465';
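
The four kinds of data manipulation can be exercised in sequence with a short Python sketch using the standard-library sqlite3 module. The customer name stored here is a made-up value, and name_of is a helper written only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)")

# Insertion of new information (the name is a made-up value).
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson')")


def name_of(customer_id):
    """Retrieval: the query says what is wanted, not how to find it."""
    row = conn.execute(
        "SELECT customer.customer_name FROM customer"
        " WHERE customer.customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


inserted = name_of("192-83-7465")

# Modification of stored information.
conn.execute(
    "UPDATE customer SET customer_name = 'Johnson R.'"
    " WHERE customer_id = '192-83-7465'"
)
updated = name_of("192-83-7465")

# Deletion of information.
conn.execute("DELETE FROM customer WHERE customer_id = '192-83-7465'")
deleted = name_of("192-83-7465")
print(inserted, updated, deleted)
```

The SELECT statement is nonprocedural: it names the customer wanted and leaves the choice of access path to the database system.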
1.7 RELATIONAL DATABASE
• A relational database stores data in a series of tables so that the
  data models the mathematical theory of relations.
• The model allows for queries based on projection, selection, and
  join, among other operations, and connects the data in the tables by
  way of keys. The queries are expressed in a standard syntax called
  SQL, the Structured Query Language, which is supported by most
  vendors of relational databases.
• The theory of relations states that data is arranged as sets of
  tuples, called relations, where a tuple is a collection of values
  for attributes. A relation states which attributes it collects.
• Concretely speaking, the attributes are the columns of a table, and
  the tuples are the rows of the table.
• Constraints among the attributes allow only certain tuples to be
  valid members of the relation, and the database should not allow
  rows to be inserted in the table if they would violate the
  constraints.
For instance, the mathematical theory says that if two tuples agree in the
value of all attributes, they are the same tuple.
In a table, it is possible for two distinct rows to contain the same data for
all columns. However, the database should prevent this from happening
because that would not be consistent with the mathematical model.
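
Selection, projection, and the rejection of duplicate tuples can be sketched in Python with the standard-library sqlite3 module. The branch table and its rows are sample data chosen for illustration, and declaring a primary key over all columns is just one assumed way to make a table behave like a set of tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A key over all columns makes the table behave like a set of tuples.
conn.execute(
    "CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER,"
    " PRIMARY KEY (branch_name, branch_city, assets))"
)
rows = [
    ("Ponnamapet", "Salem", 50000),
    ("Hasthampatti", "Salem", 70000),
    ("Rs-Puram", "Coimbatore", 60000),
]
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)", rows)

# Selection: keep only the tuples that satisfy a condition.
salem = conn.execute(
    "SELECT * FROM branch WHERE branch_city = 'Salem' ORDER BY assets"
).fetchall()

# Projection: keep only some attributes; DISTINCT removes duplicate tuples.
cities = conn.execute(
    "SELECT DISTINCT branch_city FROM branch ORDER BY branch_city"
).fetchall()

# Inserting an identical tuple a second time violates the key.
try:
    conn.execute("INSERT INTO branch VALUES ('Rs-Puram', 'Coimbatore', 60000)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(salem, cities, duplicate_allowed)
```

Without the key, SQLite would accept the second identical row, which is exactly the mismatch between tables and mathematical relations described above.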

1.8 DATABASE DESIGN

• The database schema is the logical design of the database. A
  database instance is a snapshot of the data in the database at a
  given instant in time.
• The concept of a relation corresponds to the notion of a variable in
  a programming language. The concept of a relation schema corresponds
  to the notion of a type definition in a programming language.
• By convention, lowercase names are used for relations and names
  beginning with an uppercase letter for relation schemas.
Example: Account-schema denotes the relation schema for the
relation account.

    Account-schema = (account-number, branch-name, balance)

account is a relation on Account-schema:

    account(Account-schema)

    BRANCH NAME          BRANCH CITY    ASSETS
    Ponnamapet, Sbi      Salem          50000
    Hasthampatti, Sbi    Salem          70000
    Thilainagar, Sbi     Trichy         50000
    Rs-Puram, Sbi        Coimbatore     60000

    Table: branch relation

    Branch-schema = (branch-name, branch-city, assets)

1.9 DATA STORAGE AND QUERY
Physical storage media can be classified by:-
• Speed with which data can be accessed
• Cost per unit of data
• Reliability:-
  - Data loss on power failure or system crash
  - Physical failure of the storage device
Storage can be differentiated into:-
• Volatile storage: loses its contents when power is switched off.
• Nonvolatile storage: contents persist even when power is switched
  off. Includes secondary and tertiary storage, as well as
  battery-backed main memory.

1.10 TRANSACTION MANAGEMENT

TRANSACTION:-

• A transaction is a collection of operations that performs a single
  logical function in a database application.
• The transaction manager is responsible for ensuring that the
  database remains in a correct (consistent) state despite system
  failures.
• Ensuring the atomicity and durability properties is the
  responsibility of the database system itself, specifically of the
  transaction-management component.
The properties of a transaction are called the ACID properties:-
• Atomicity
  Either all operations of the transaction are reflected properly in
  the database, or none are. Example: a money transfer must happen
  entirely or not at all.
• Consistency
  Execution of a transaction in isolation preserves the consistency
  of the database.
• Isolation
  Even though multiple transactions may execute concurrently, each
  transaction is unaware of the other transactions executing
  concurrently in the system.
• Durability
  After a transaction completes successfully, the changes it has
  made to the database persist, even if there are system failures.
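
The money-transfer example of atomicity can be sketched in Python with the standard-library sqlite3 module. The account names, the balances, and the rule that a balance may not go negative are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(source, target, amount):
    """Move money between accounts as one transaction; return True on commit."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))


ok = transfer("A", "B", 30)        # both updates are applied
failed = transfer("A", "B", 500)   # the debit would make A negative
final = balances()
print(ok, failed, final)
```

In the failed transfer the credit to B has already been executed when the debit from A is rejected; the rollback undoes the credit as well, so the database never shows half a transfer.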
1.11 DATABASE SYSTEM ARCHITECTURE
A database system is partitioned into modules that deal with each of
the responsibilities of the overall system.
The DBMS accepts SQL commands generated from a variety of user
interfaces and returns the answers. When a user issues a query, the DBMS
uses information about how the data is stored to produce an efficient
execution plan for evaluating the query.
The architecture of a database system is greatly influenced by the
underlying computer system on which it runs, in particular by such
aspects of computer architecture as networking, parallelism, and
distribution:
Networking of computers allows some tasks to be executed on a
server system and some tasks to be executed on client systems. This
division of work has led to client-server database systems.
Parallel processing within a computer system allows database
system activities to be sped up, allowing faster response to transactions
as well as more transactions per second. The need for parallel query
processing has led to parallel database systems.
Keeping multiple copies of the database across different sites
allows large organizations to continue their database operations even
when one site is affected by a natural disaster. Distributed
database systems handle geographically or administratively distributed
data spread across multiple database systems.
Functional components of a DBMS are:-
• Storage manager components
• Query processor components

17
Naïve Applicatio Sophistica Database
users n ted users administra use
tor rs
Programm
ers
Applic
Applica Query Database
ation
tion tools scheme
Interfa
ce progra
ms

Embedded DML DML queries DDL


Precompiler interpreter

Application DML compiler and


program object organizer
code
Query Evaluation
Engine

Buffer File Authorizat Transactio


Manager manager ion and n manager
integrity
manager

Query processor

Storage manager

Indices
Data
Data
dictionar
files
Statisti y
cal data

System structure
Disk
storage

18
a. Storage manager
A storage manager is a program module that provides the interface
between the low-level data stored in a database and the application
programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file
manager.
The storage manager translates the various DML statements into
low-level file-system commands.
Thus, the storage manager is responsible for storing, retrieving, and
updating data in the database.
Components of Storage manager:-
• Authorization and integrity manager
  Tests for the satisfaction of integrity constraints and checks
  the authority of users to access data.
• Transaction manager
  Ensures that the database remains in a consistent state despite
  system failures.
• File manager
  Manages the allocation of space on disk storage and the data
  structures used to represent information stored on disk.
• Buffer manager
  Is responsible for fetching data from disk storage into main
  memory and deciding what data to cache in main memory.
Data Structures in Storage Manager:-
• Data files
  Store the database itself.
• Data dictionary
  Stores the metadata about the structure of the database.
• Indices
  Provide fast access to data items that hold particular
  values.
• Statistical data
  Stores statistical information about the data in the
  database.
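
As one concrete case of a data dictionary, SQLite keeps its metadata in a catalog table named sqlite_master. The Python sketch below (standard-library sqlite3) creates a hypothetical account table and an index on it, then reads both definitions back from the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE INDEX idx_balance ON account (balance)")

# sqlite_master is SQLite's catalog table: it holds data about the data.
# Entries whose names start with "sqlite" are internal and are filtered out.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master"
    " WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
).fetchall()
print(catalog)
```

Other database systems expose the same kind of information under different names, so the catalog query itself is specific to SQLite.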
b. Query Processor Components
• DML compiler
  Translates DML statements in a query language into low-level
  instructions that the query evaluation engine understands.

  Query language (input) -> DML compiler -> Low-level instructions (output)

• Embedded DML precompiler
  Converts DML statements embedded in an application program
  into normal procedure calls in the host language. The precompiler
  must interact with the DML compiler to generate the appropriate code.
• DDL interpreter
  Interprets DDL statements and records them in a set of tables
  containing metadata.
• Query evaluation engine
  Executes the low-level instructions generated by the DML compiler.
1.12 DATABASE USERS AND ADMINISTRATORS

A primary goal of a database system is to retrieve information from
the stored database. People who work with a database can be
categorized into two types:
• Database Users
• Database Administrators

Database Users:-

Database system users are classified into four types:-
• Naïve users
• Application programmers
• Sophisticated users
• Specialized users
Naïve Users:-
• They are unsophisticated users.
• They interact with the system by invoking application programs.
Application programmers:-
• They are computer professionals who specialize in writing,
  developing, or maintaining application programs.
Sophisticated Users:-
• These users interact with the system without writing programs.
• They form their requests in a database query language.
• Analysts come under this category.
• They use two types of tools for their analysis tasks:
  - Online Analytical Processing (OLAP): simplifies analysts' tasks
    by letting them view summaries of data in different ways.
  - Data mining: helps to find certain kinds of patterns in data.
Specialized Users:-
They are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework.
Database Administrators:-

A person who is responsible for the design, development, operation,
protection, management, maintenance, and use of a database is called a
database administrator (DBA). The functions of a DBA include:

Schema Definition:-
The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
Storage structure and access-method definition:-
The DBA creates appropriate storage structures and access methods by
writing a set of definitions, which are translated by the data-storage and
data-definition-language compiler.
Schema and physical organization modification:-
The DBA carries out changes to the schema and physical organization to
reflect the changing needs of the organization.
Granting of authorization for data access:-
The authorization information is kept in a special system structure
that the database system consults whenever someone attempts to access the
data in the system.
Routine maintenance:-
 Ensuring that enough free disk space is available for normal
operations, and upgrading disk space as required.
 Periodically backing up the data.
 Monitoring jobs.

1.13 HISTORY OF DATABASE SYSTEM

 A Database Management System allows a person to organize, store,
and retrieve data from a computer. It is a way of communicating
with a computer's "stored memory." In the very early years of
computers, "punch cards" were used for input, output, and data
storage.
 Punch cards offered a fast way to enter data, and to retrieve it.
Herman Hollerith is given credit for adapting the punch cards used
for weaving looms to act as the memory for a mechanical tabulating
machine, in 1890. Much later, databases came along.
 Databases (or DBs) have played a very important part in the recent
evolution of computers. The first computer programs were
developed in the early 1950s, and focused almost completely on
coding languages and algorithms.
 At the time, computers were basically giant calculators and data
(names, phone numbers) was considered the leftovers of processing
information.
 Computers were just starting to become commercially available,
and when business people started using them for real-world
purposes, this leftover data suddenly became important.
 Enter the Database Management System (DBMS). A database, as a
collection of information, can be organized so that a Database
Management System can access and pull specific information. In the
early 1960s, Charles W. Bachman designed the Integrated Data Store
(IDS), often described as the "first" DBMS. IBM, not wanting to be left
out, created a database system of its own, known as IMS. Both database
systems are described as the forerunners of navigational databases.
 By the mid-1960s, as computers developed speed and flexibility,
and started becoming popular, many kinds of general use database
systems became available. As a result, customers demanded a
standard be developed, in turn leading to Bachman forming the
Database Task Group.
 This group was formed within CODASYL (Conference on Data Systems
Languages), the body responsible for the design and standardization of
the Common Business Oriented Language (COBOL). The Database Task
Group presented its database standard in 1971, which came to be
known as the "CODASYL approach."
 The CODASYL approach was a very complicated system and
required substantial training. It depended on a “manual” navigation
technique using a linked data set, which formed a large network.
Searching for records could be accomplished by one of three techniques:
 Using the primary key (also known as the CALC key)
 Navigating relationships (also called sets) from one record to another
 Scanning all records in sequential order
 Eventually, the CODASYL approach lost its popularity as simpler,
easier-to-work-with systems came on the market.

 Edgar Codd worked for IBM in the development of hard disk
systems, and he was not happy with the lack of a search engine in
the CODASYL approach, and the IMS model.
 He wrote a series of papers, in 1970, outlining novel ways to
construct databases. His ideas eventually evolved into a paper
titled A Relational Model of Data for Large Shared Data
Banks, which described a new method for storing data and processing
large databases.
 Records would not be stored in a free-form list of linked records, as
in the CODASYL navigational model, but would instead use a "table
with fixed-length records."
 IBM had invested heavily in the IMS model, and wasn’t terribly
interested in Codd’s ideas.
 Fortunately, some people who didn’t work for IBM “were” interested.
In 1973, Michael Stonebraker and Eugene Wong (both then at UC
Berkeley) made the decision to research relational database
systems.
 The project was called INGRES (Interactive Graphics and Retrieval
System), and successfully demonstrated that a relational model could
be efficient and practical. INGRES worked with a query language known
as QUEL, in turn pressuring IBM to develop SQL in 1974, which was
more advanced (SQL became an ANSI standard in 1986 and an ISO
standard in 1987).
 SQL quickly replaced QUEL as the more functional query language.
 Relational database management systems (RDBMSs) were an efficient
way to store and process structured data. Then processing speeds got
faster, and "unstructured" data (art, photographs, music, etc.) became
much more commonplace.
 Unstructured data is both non-relational and schema-less, and
Relational Database Management Systems simply were not
designed to handle this kind of data.

UNIT – II
2.1 RELATIONAL DATABASE
A relational database is a digital database whose organization is
based on the relational model of data. The software systems used
to maintain relational databases are known as relational database
management systems (RDBMSs). Virtually all relational database systems
use SQL (Structured Query Language) as the language for querying and
maintaining the database.

2.2 STRUCTURE OF RELATIONAL DATABASE


 A relational database is based on the relational data model.
 Most commercial relational database systems employ the SQL query
language
 A relational database consists of a collection of tables, each of which
is assigned a unique name.
 A row in a table represents a relationship among a set of values.
 The concept of a relation corresponds to the programming-language
notion of a variable.
Basic Structure:-

For each attribute there is a set of permitted values, called the domain
of that attribute.


Example:
D1               D2             D3
ACCOUNT NUMBER   BRANCH NAME    BALANCE
101              Shevapet       500
102              Alagapuram     700
103              Hasthampatti   500
104              Ammapet        600

Table: account relation

In this table, let
 D1 denote the set of all account numbers
 D2 denote the set of all branch names
 D3 denote the set of all balances.
 Any row of account is a 3-tuple (V1, V2, V3)
where
 V1 is an account number (V1 is in domain D1)
 V2 is a branch name (V2 is in domain D2)
 V3 is a balance (V3 is in domain D3)
 In general, account will contain only a subset of the set of all
possible rows. That is, account is a subset of D1 × D2 × D3.
 Generally, a table of n attributes must be a subset of
D1 × D2 × … × Dn-1 × Dn.
 A relation is a subset of a Cartesian product of a list of domains, so
tables and relations are essentially the same thing. A tuple variable is
a variable whose domain is the set of all tuples. In the above table,
we have 4 tuples.
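The subset relationship can be checked mechanically. The following is a minimal Python sketch, assuming deliberately tiny domains (the sets D1, D2, and D3 below are invented for illustration; real domains would be far larger):

```python
from itertools import product

# Hypothetical, deliberately tiny domains for illustration only.
D1 = {101, 102, 103, 104}                                   # account numbers
D2 = {"Shevapet", "Alagapuram", "Hasthampatti", "Ammapet"}  # branch names
D3 = {500, 600, 700}                                        # balances

# The account relation from the table above: a set of 3-tuples.
account = {
    (101, "Shevapet", 500),
    (102, "Alagapuram", 700),
    (103, "Hasthampatti", 500),
    (104, "Ammapet", 600),
}

# Every possible row is an element of the Cartesian product D1 x D2 x D3.
all_possible_rows = set(product(D1, D2, D3))

# A relation is any subset of that product.
is_relation = account <= all_possible_rows
print(len(all_possible_rows), len(account), is_relation)
```

Only 4 of the possible rows actually appear in account, which is what "a subset of the Cartesian product" means in practice.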

2.3 DATABASE SCHEMAS

A database schema is the skeleton structure that represents the
logical view of the entire database. It defines how the data is organized
and how the relations among the data are associated. It formulates all
the constraints that are to be applied on the data.
A database schema defines its entities and the relationships among
them. It contains a descriptive detail of the database, which can be
depicted by means of schema diagrams. It is the database designers who
design the schema to help programmers understand the database and
make it useful.

A database schema can be divided broadly into two categories −

 Physical Database Schema − This schema pertains to the actual
storage of data and its form of storage like files, indices, etc. It
defines how the data will be stored in secondary storage.
 Logical Database Schema − This schema defines all the logical
constraints that need to be applied on the data stored. It defines
tables, views, and integrity constraints.

2.4 KEY
A key is an attribute or collection of attributes that uniquely identifies
an entity within an entity set.
For example, the roll number of a student makes him/her
identifiable among students.
 Super Key − A set of attributes (one or more) that collectively
identifies an entity in an entity set.
 Candidate Key − A minimal super key is called a candidate key.
An entity set may have more than one candidate key.
 Primary Key − A primary key is one of the candidate keys chosen
by the database designer to uniquely identify entities in the entity set.
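The difference between a super key and a candidate key can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names and student rows are invented, and the check is run against one sample of data, whereas a real key is a property of every legal instance of the schema.

```python
from itertools import combinations

# Hypothetical student rows; attribute names are made up for this sketch.
attributes = ("roll_no", "email", "name")
students = [
    (1, "asha@example.com", "Asha"),
    (2, "ravi@example.com", "Ravi"),
    (3, "ravi2@example.com", "Ravi"),   # same name, different student
]

def is_superkey(attrs, rows):
    """True if no two rows agree on all of the given attributes."""
    idx = [attributes.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows):
    """Minimal super keys: no proper subset is also a super key."""
    supers = [
        set(c)
        for n in range(1, len(attributes) + 1)
        for c in combinations(attributes, n)
        if is_superkey(c, rows)
    ]
    return [s for s in supers if not any(other < s for other in supers)]

keys = candidate_keys(students)
print(keys)
```

Here {roll_no, name} is a super key but not a candidate key, because roll_no alone already identifies each row. The designer would then pick one of the candidate keys as the primary key.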

2.5 SCHEMA DIAGRAMS


A database schema represents the logical configuration of all or
part of a relational database. It can exist both as a visual representation
and as a set of formulas known as integrity constraints that govern a
database. These formulas are expressed in a data definition language,
such as SQL. As part of a data dictionary, a database schema indicates
how the entities that make up the database relate to one another,
including tables, views, stored procedures, and more.

Users can be granted access to log into individual schemas on a
case-by-case basis, and ownership is transferable. Since each object is
associated with a particular schema, which serves as a kind of
namespace, it is helpful to define synonyms, which allow other users
to access that object without first referring to the schema it belongs to.
These schemas do not necessarily indicate the ways that the data
files are stored physically. Instead, schema objects are stored logically
within a table space. The database administrator can specify how much
space to assign to a particular object within a data file.
Finally, schemas and table spaces don’t necessarily line up
perfectly: objects from one schema can be found in multiple table spaces,
while a table space can include objects from several schemas.

2.6 RELATIONAL QUERY LANGUAGE


Relational databases are the most widely used database paradigm,
and are the basis for commercial developments in object-relational
paradigms.
Relations are subsets of a cross product of domains (sets of values), and
relational models are defined as collections of relations. With a relational
model, very abstract algebraic (relational algebra) and logic
(relational calculus) languages can be used to specify transactions. With a
relational DBMS, the internal and conceptual schemas, as well as the
views, are defined by relations.

2.7 SQL
SQL is a programming language for Relational Databases. It is
designed over relational algebra and tuple relational calculus. SQL comes
as a package with all major distributions of RDBMS.
SQL comprises both data definition and data manipulation
languages. Using the data definition properties of SQL, one can design
and modify a database schema, whereas the data manipulation properties
allow one to store and retrieve data from the database.

2.8 OVERVIEW OF THE SQL QUERY LANGUAGE


SQL is the most widely used commercial relational database
language. It was originally developed as part of the System R project at
IBM's San Jose Research Laboratory. The language was originally called
Sequel; its name was later changed to Structured Query Language (SQL).
ANSI and ISO standard SQL:-
 SQL-86
 SQL-89
 SQL-92
 SQL:1999 (language name became Y2K compliant!)
 SQL:2003
Commercial systems offer most, if not all, SQL-92 features, plus
varying feature sets from later standards and special proprietary features.
Not all examples here may work on your particular system.

2.9 SQL DATA DEFINITION
SQL uses the following set of commands to define a database schema.

CREATE
 Creates new databases, tables, and views in the RDBMS.
For example:-
 Create database tutorialspoint;
 Create table article;
 Create view for_students;
(In practice, a create table statement also lists the columns and their
types, and a create view statement also gives the defining query.)
DROP
 Drops views, tables, and databases from the RDBMS.
 Drop object_type object_name;
For example:-
 Drop database tutorialspoint;
 Drop table article;
 Drop view for_students;
ALTER
 Modifies the database schema.
 Alter object_type object_name parameters;
For example−
 Alter table article add subject varchar;
 This command adds an attribute named subject, of string type, to
the relation article.
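The CREATE, ALTER, and DROP commands can be tried out with a small script. This is a sketch only, assuming Python's built-in sqlite3 module as the RDBMS; the id and title columns are invented for illustration, and SQLite has no CREATE DATABASE statement (a database is simply a file or an in-memory connection), so that command is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

# CREATE: a table needs a column list; these columns are hypothetical.
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title VARCHAR(100))")

# ALTER: add the attribute 'subject' of string type to the relation article.
conn.execute("ALTER TABLE article ADD COLUMN subject VARCHAR")

# CREATE VIEW: a view needs a defining query.
conn.execute("CREATE VIEW for_students AS SELECT title, subject FROM article")

# Column names of article after the ALTER.
columns = [row[1] for row in conn.execute("PRAGMA table_info(article)")]
print(columns)

# DROP: remove the view first, then the table it was built on.
conn.execute("DROP VIEW for_students")
conn.execute("DROP TABLE article")

# Nothing should be left in the schema catalog.
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
).fetchall()
print(remaining)
conn.close()
```

Other systems differ in detail (for example in which data types they accept), so the exact statements may need adjusting.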

2.10 BASIC STRUCTURE OF SQL QUERIES

A relational database consists of a collection of relations, each of
which is assigned a unique name. SQL allows the use of null values to
indicate that the value either is unknown or does not exist. The basic
structure of an SQL expression consists of three clauses: select, from, and
where.

 The select clause corresponds to the projection operation of the
relational algebra. It is used to list the attributes desired in the
result of a query.
 The from clause corresponds to the Cartesian-product operation
of the relational algebra.
 The where clause corresponds to the selection predicate of the
relational algebra.
SQL query form:-
select A1, A2, …, An
from r1, r2, …, rm
where P
Each Ai represents an attribute and each ri a relation. P is a predicate.
The query is equivalent to the relational-algebra expression
$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$



Select clause:-
 Find the names of all branches in the loan relation
select branch-name from loan
 Suppose we want to force the elimination of duplicates, we insert
the keyword distinct after select. We can rewrite the preceding
query as
select distinct branch-name from loan
Where clause:
 Find all loan numbers for loans made at the Perryridge branch with
loan amounts greater than $1200
select loan-number from loan where branch-name = 'Perryridge'
and amount > 1200
SQL uses the logical connectives and, or, and not, and also includes a
between comparison operator.
 Find the loan numbers of those loans with loan amounts between
$90,000 and $100,000
select loan-number from loan where amount between 90000
and 100000
From clause:-
The from clause by itself defines a Cartesian product of the relations
in the clause.
Example: For all customers who have a loan from the bank, find their
names, loan numbers, and loan amounts
select customer-name, borrower.loan-number, amount
from borrower, loan
where borrower.loan-number = loan.loan-number
 The qualification in the where clause is a Boolean combination of
conditions.
 The distinct keyword is optional. It indicates that the result should
not contain duplicates.
 The default is that duplicates are not eliminated.
 The select list is the list of column names to appear in the result.
 The from list in a from clause is the list of table names.
Example:
 Find the sname and sregno of every student in the student relation
select sname, sregno from student;
 Find the sname and sregno of students whose smark1 is greater
than 60
select sname, sregno from student where smark1 > 60;
Syntax:
SELECT [DISTINCT] select-list
FROM from-list
WHERE qualification
SELECT -> specifies the columns to be retained in the result
FROM -> specifies a cross-product of tables
WHERE -> specifies selection conditions on the tables
 The WHERE clause is used to select the rows that satisfy the condition.
Example:
SELECT DISTINCT S.Sname, S.age FROM Sailors S
 It produces the set of distinct <sname, age> pairs.
 If we omit DISTINCT, then it returns a multiset of rows (i.e., with
duplicates).

Range Variable:-
SELECT S.sid, S.Sname, S.rating, S.age
FROM Sailors As S
WHERE S.rating > 7
Here S is called a range variable; the keyword AS that introduces it is
optional. Range variables are a convenient shorthand in SQL.
Example:

 Find the sids of sailors who have reserved a red boat.
SELECT R.Sid FROM Boats B, Reserves R
WHERE B.Bid = R.Bid AND B.Color = 'red'
 Find the names of sailors who have reserved a red boat.
SELECT S.Sname FROM Sailors S, Reserves R, Boats B
WHERE S.Sid = R.Sid AND R.Bid = B.Bid AND B.Color = 'red'
 Find the colors of boats reserved by Lubber.
SELECT B.Color
FROM Sailors S, Reserves R, Boats B
WHERE S.Sid = R.Sid AND R.Bid = B.Bid AND S.Sname = 'Lubber'
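The select-from-where queries on Sailors, Boats, and Reserves can be run end to end. The sketch below assumes Python's built-in sqlite3 module; the rows, and the columns Bname and day, are sample data invented for this example rather than taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, invented for this sketch.
    CREATE TABLE Sailors  (Sid INTEGER PRIMARY KEY, Sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Boats    (Bid INTEGER PRIMARY KEY, Bname TEXT, Color TEXT);
    CREATE TABLE Reserves (Sid INTEGER, Bid INTEGER, day TEXT);

    INSERT INTO Sailors VALUES (22, 'Dustin', 7, 45.0), (31, 'Lubber', 8, 55.5),
                               (58, 'Rusty', 10, 35.0);
    INSERT INTO Boats VALUES (101, 'Interlake', 'blue'), (102, 'Interlake', 'red'),
                             (103, 'Clipper', 'green');
    INSERT INTO Reserves VALUES (22, 101, '1998-10-10'), (22, 102, '1998-10-10'),
                                (31, 102, '1998-11-10'), (31, 103, '1998-11-06');
""")

# Range variable S, with a selection on rating.
high_rated = conn.execute(
    "SELECT S.Sid, S.Sname FROM Sailors AS S WHERE S.rating > 7 ORDER BY S.Sid"
).fetchall()

# Names of sailors who have reserved a red boat (three-table join).
red_boat_sailors = conn.execute("""
    SELECT DISTINCT S.Sname
    FROM Sailors S, Reserves R, Boats B
    WHERE S.Sid = R.Sid AND R.Bid = B.Bid AND B.Color = 'red'
    ORDER BY S.Sname
""").fetchall()

# Colors of boats reserved by Lubber.
lubber_colors = conn.execute("""
    SELECT B.Color
    FROM Sailors S, Reserves R, Boats B
    WHERE S.Sid = R.Sid AND R.Bid = B.Bid AND S.Sname = 'Lubber'
    ORDER BY B.Color
""").fetchall()

print(high_rated, red_boat_sailors, lubber_colors)
conn.close()
```

Note that every table named in the from clause needs a join condition in the where clause; leaving one out silently produces a Cartesian product.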

2.11 SET OPERATIONS


The following set operations are used in SQL:
 UNION
 INTERSECT
 EXCEPT
They are also called set-manipulation constructs. By default each of
them eliminates duplicates; the all variants return a multiset of rows.
Union:-
The union operation automatically eliminates duplicate values. If we
want to retain all duplicates, we must write union all in place of union.
Example:
(select sname from student) union (select ename from
employee)
Intersect:-
The intersect operation automatically eliminates duplicate values. If
we want to retain all duplicates, we must write intersect all in place of
intersect.

Example:
(select sname from student) intersect (select ename from
employee)
Except(-):-
The except operation automatically eliminates duplicate values. If
we want to retain all duplicates, we must write except all in place of
except.
Example:
(select sname from student) except (select ename from
employee)
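The effect of the three set operations, and of union all, can be seen on a small example. This sketch assumes Python's built-in sqlite3 module and invented names. Two SQLite-specific caveats apply: it does not accept parentheses around the operands of a compound select, and it supports union all but not intersect all or except all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical names, invented for this sketch.
    CREATE TABLE student  (sname TEXT);
    CREATE TABLE employee (ename TEXT);
    INSERT INTO student  VALUES ('Anu'), ('Bala'), ('Bala'), ('Chitra');
    INSERT INTO employee VALUES ('Bala'), ('Devi');
""")

def names(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

union_rows     = names("SELECT sname FROM student UNION     SELECT ename FROM employee")
union_all_rows = names("SELECT sname FROM student UNION ALL SELECT ename FROM employee")
intersect_rows = names("SELECT sname FROM student INTERSECT SELECT ename FROM employee")
except_rows    = names("SELECT sname FROM student EXCEPT    SELECT ename FROM employee")

print(union_rows, union_all_rows, intersect_rows, except_rows)
conn.close()
```

'Bala' appears once in the union result but three times with union all (twice from student, once from employee).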
2.12 NULL VALUES
SQL allows the use of null values to indicate the absence of
information about the value of an attribute. SQL provides a special
comparison operator, is null, to test whether a column value is null.
Not null constraints are used to disallow null values. The attributes of
a primary key are not allowed to take on null values.
Disadvantage:-
The presence of null values complicates the definition of when two
rows in a relation instance are regarded as duplicates. Count(*) counts
every row, including rows that contain null values. All other aggregate
operations simply discard null values.
Example:
select loan-no from loan where amount is null
Disallowing Null values:-
 It can be done by specifying NOT NULL as a part of the field
definition.
 Ex: CHAR(20) NOT NULL
 The fields of a primary key are not allowed to take on null values.
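The behaviour of null values in comparisons, aggregates, and NOT NULL constraints can be sketched as follows. The example assumes Python's built-in sqlite3 module; the loan rows and the branch table are invented for illustration, and loan_no replaces the hyphenated name used in the text because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical loan data; L-3 has an unknown (null) amount.
    CREATE TABLE loan (loan_no TEXT, amount INTEGER);
    INSERT INTO loan VALUES ('L-1', 1000), ('L-2', 2000), ('L-3', NULL);
""")

# IS NULL finds the row; "= NULL" never evaluates to true, so it finds nothing.
is_null_rows = conn.execute("SELECT loan_no FROM loan WHERE amount IS NULL").fetchall()
eq_null_rows = conn.execute("SELECT loan_no FROM loan WHERE amount = NULL").fetchall()

# COUNT(*) counts all rows; COUNT(amount) and AVG(amount) ignore the null.
count_all, count_amount, avg_amount = conn.execute(
    "SELECT COUNT(*), COUNT(amount), AVG(amount) FROM loan"
).fetchone()

# A NOT NULL column rejects null values.
conn.execute("CREATE TABLE branch (bname CHAR(20) NOT NULL)")
try:
    conn.execute("INSERT INTO branch VALUES (NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(is_null_rows, eq_null_rows, count_all, count_amount, avg_amount, rejected)
conn.close()
```

The average is computed over the two known amounts only, which is why null values can make aggregate results surprising.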
2.13 AGGREGATE FUNCTIONS
In addition to simply retrieving data, we often need to perform some
computation or summarization.
SQL supports five aggregate operators:-
 Count
 Sum
 Avg
 Max
 Min
Count:-
It is used to count the number of occurrences.
Example:
Count the number of sailors
SELECT COUNT(*) FROM Sailors S
Count the number of distinct sailor names
SELECT COUNT(DISTINCT S.Sname) FROM Sailors S
Sum:-
It is used to compute the sum of a set of values.
Avg:-
It is used to calculate the average value.
Example:
Find the average age of all sailors
SELECT AVG(S.age) FROM Sailors S
Find the average age of sailors with a rating of 10
SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10;
Max:-
It is used to return the maximum value.
Example:
Find the name and age of the oldest sailor
The query SELECT S.Sname, MAX(S.age) FROM Sailors S is not legal in
standard SQL, because the select list mixes an aggregate with a
non-aggregated column and there is no GROUP BY clause. A nested
query is used instead:
SELECT S.Sname, S.age FROM Sailors S WHERE S.age =
(SELECT MAX(S2.age) FROM Sailors S2)
Min:-
It is used to return the minimum value.

Example:
Find the age of the youngest sailor
SELECT MIN(S.age) FROM Sailors S
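The five aggregate operators can be exercised together on one small table. This sketch assumes Python's built-in sqlite3 module; the four sailors are invented sample rows, with two sailors deliberately sharing a name so that COUNT(*) and COUNT(DISTINCT ...) differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sailors, invented for this sketch.
    CREATE TABLE Sailors (Sid INTEGER PRIMARY KEY, Sname TEXT, rating INTEGER, age REAL);
    INSERT INTO Sailors VALUES (22, 'Dustin', 7, 45.0), (31, 'Lubber', 8, 55.0),
                               (58, 'Rusty', 10, 35.0), (71, 'Rusty', 10, 25.0);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

n_sailors  = scalar("SELECT COUNT(*) FROM Sailors S")
n_names    = scalar("SELECT COUNT(DISTINCT S.Sname) FROM Sailors S")
total_age  = scalar("SELECT SUM(S.age) FROM Sailors S")
avg_age_10 = scalar("SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10")
youngest   = scalar("SELECT MIN(S.age) FROM Sailors S")

# Name and age of the oldest sailor, using a nested query for MAX.
oldest = conn.execute("""
    SELECT S.Sname, S.age FROM Sailors S
    WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2)
""").fetchall()

print(n_sailors, n_names, total_age, avg_age_10, youngest, oldest)
conn.close()
```

The nested form for the oldest sailor returns every sailor who shares the maximum age, which a plain MAX in the select list could not express.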

2.14 NESTED SUBQUERIES


 A nested query is a query that has another query embedded within it.
 The embedded query is called a subquery.
 The inner subquery can be independent of the outer query.
 Find the names of sailors who have reserved boat 103
SELECT S.Sname
FROM Sailors S
WHERE S.Sid IN (SELECT R.Sid FROM Reserves R WHERE
R.Bid = 103)
Multiply-nested subqueries:-
A query can contain multiple levels of queries embedded within it.
Example:
 Find the names of sailors who have reserved a red boat.
SELECT S.Sname FROM Sailors S
WHERE S.Sid IN (SELECT R.Sid FROM Reserves R
WHERE R.Bid IN (SELECT B.Bid FROM Boats B
WHERE B.Color = 'red'))
 Find the names of sailors who have not reserved a red boat.
SELECT S.Sname FROM Sailors S
WHERE S.Sid NOT IN (SELECT R.Sid FROM Reserves R
WHERE R.Bid IN (SELECT B.Bid FROM Boats B
WHERE B.Color = 'red'))
Correlated Nested Queries:-
Here the inner subquery can depend on the row that is currently
being examined in the outer query.

Example:
 Find the names of sailors who have reserved boat number 103.
SELECT S.Sname FROM Sailors S
WHERE EXISTS (SELECT * FROM Reserves R
WHERE R.Bid = 103 AND R.Sid = S.Sid)
 EXISTS is another set-comparison operator, like IN and NOT IN.
 It allows us to test whether a set is non-empty. Here, for each sailor
row S, we test whether the set of Reserves rows R such that
R.Bid = 103 AND S.Sid = R.Sid is non-empty.
 If so, sailor S has reserved boat 103, and we retrieve the name.
 The subquery clearly depends on the current row S and must be
re-evaluated for each row in Sailors.
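The nested and correlated queries above can be compared directly. The sketch below assumes Python's built-in sqlite3 module and a cut-down, invented version of the Sailors, Boats, and Reserves tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, invented for this sketch.
    CREATE TABLE Sailors  (Sid INTEGER PRIMARY KEY, Sname TEXT);
    CREATE TABLE Boats    (Bid INTEGER PRIMARY KEY, Color TEXT);
    CREATE TABLE Reserves (Sid INTEGER, Bid INTEGER);
    INSERT INTO Sailors  VALUES (22, 'Dustin'), (31, 'Lubber'), (58, 'Rusty');
    INSERT INTO Boats    VALUES (101, 'blue'), (102, 'red'), (103, 'green');
    INSERT INTO Reserves VALUES (22, 101), (22, 102), (31, 102), (31, 103);
""")

def names(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Independent subquery with IN.
reserved_103_in = names("""
    SELECT S.Sname FROM Sailors S
    WHERE S.Sid IN (SELECT R.Sid FROM Reserves R WHERE R.Bid = 103)
""")

# Two levels of nesting.
reserved_red = names("""
    SELECT S.Sname FROM Sailors S
    WHERE S.Sid IN (SELECT R.Sid FROM Reserves R
                    WHERE R.Bid IN (SELECT B.Bid FROM Boats B
                                    WHERE B.Color = 'red'))
""")
not_reserved_red = names("""
    SELECT S.Sname FROM Sailors S
    WHERE S.Sid NOT IN (SELECT R.Sid FROM Reserves R
                        WHERE R.Bid IN (SELECT B.Bid FROM Boats B
                                        WHERE B.Color = 'red'))
""")

# Correlated subquery: the inner query refers to the outer row S.
reserved_103_exists = names("""
    SELECT S.Sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R
                  WHERE R.Bid = 103 AND R.Sid = S.Sid)
""")

print(reserved_103_in, reserved_red, not_reserved_red, reserved_103_exists)
conn.close()
```

The IN form and the EXISTS form of the boat 103 query return the same sailors; they differ only in whether the subquery depends on the outer row.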

2.15 MODIFICATION OF DATABASE


The modification operations that can be performed on a database
are
 Insertion
 Deletion
 Updation
Insertion:-
While inserting a single row of data into the table, the insert
operation:
Creates a new row in the database table.
Loads the values passed into all the columns specified.
Syntax:
INSERT INTO tablename (attributename1, attributename2, …,
attributenamen)
VALUES (expression1, expression2, …, expressionn);
Where
tablename -> Name of the table.
attributename -> Name of a column.
expression -> Value for the corresponding column.
Character expressions must be enclosed in single quotes.

Example:
 To insert row values directly
INSERT INTO student VALUES (1001, 'vikky', 80, 60, 90, 240,
80.00, 'pass');
 To insert row values through the keyboard (the & substitution
variables are prompted for by interactive tools such as Oracle SQL*Plus)
INSERT INTO student VALUES (&regno, '&name', &m1, &m2,
&m3, &total, &percentage, '&result');
 To insert row values into named columns
INSERT INTO student
(regno, name, m1, m2, m3, total, percentage, result)
VALUES (1001, 'vikky', 80, 60, 90, 240, 80.00, 'pass');
 To insert row values into a table from another table (the number and
types of the selected columns must match the target columns)
INSERT INTO student SELECT rno, name, tot, res FROM
academic;
Deletion:-
 The delete command is used to remove rows from a table.
 Using this command, either of the following is possible:
 All rows can be deleted from a table.
 Selected rows can be removed from a table.
 A delete command operates on only one relation.
Syntax:
DELETE FROM <table name> WHERE predicate;
Example:
 To delete all rows from the table student, the query is
DELETE FROM student;
 To delete those rows whose result is fail, the query is
DELETE FROM student WHERE result = 'fail';
Update:-
The update statement changes a value in a tuple without changing all
values in the tuple.
Syntax:
UPDATE <table name> SET <attribute> = <expression> WHERE predicate;
Example:
 Annual interest payments are being made, and all balances are to be
increased by 5 percent.
UPDATE account
SET balance = balance * 1.05
 If interest is to be paid only to accounts with a balance of $1000 or
more
UPDATE account
SET balance = balance * 1.05
WHERE balance >= 1000
 SQL first tests all tuples in the relation to see whether they should
be updated, and carries out the updates afterward.
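Insertion, update, and deletion can be tried in sequence on the account relation. This is a sketch assuming Python's built-in sqlite3 module; the column names and the three rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the account relation used in the text.
conn.execute(
    "CREATE TABLE account (account_number INTEGER PRIMARY KEY, "
    "branch_name TEXT, balance REAL)"
)

# Insertion: one tuple per row, with values passed as parameters.
conn.executemany(
    "INSERT INTO account (account_number, branch_name, balance) VALUES (?, ?, ?)",
    [(101, "Shevapet", 500), (102, "Alagapuram", 2000), (103, "Ammapet", 1000)],
)

# Update: 5 percent interest, only for balances of 1000 or more.
conn.execute("UPDATE account SET balance = balance * 1.05 WHERE balance >= 1000")

# Deletion: remove selected rows only.
conn.execute("DELETE FROM account WHERE branch_name = 'Shevapet'")
conn.commit()

rows = conn.execute(
    "SELECT account_number, balance FROM account ORDER BY account_number"
).fetchall()
print(rows)
conn.close()
```

Omitting the where clause from the update or delete statement would apply it to every row of the table.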

UNIT – III
3.1 INTERMEDIATE SQL
This Intermediate/Advanced SQL Tutorial will cover the SELECT
statement in great detail. The SELECT statement is the core of SQL, and it
is likely that the vast majority of your SQL commands will be SELECT
statements. Because of the large number of options available for the
SELECT statement, this entire tutorial has been dedicated to it.

3.2 JOIN EXPRESSIONS


 Join operations take two relations and return as a result another
relation.
 A join operation is a Cartesian product which requires that tuples in
the two relations match (under some condition). It also specifies the
attributes that are present in the result of the join
 The join operations are typically used as subquery expressions in
the from clause
 Now we want to look at joins. To do joins correctly in SQL requires
many of the elements we have introduced so far. Let's assume that
we have the following two tables
Table Store_Information
Store_Name    Sales   Txn_Date
Los Angeles   1500    Jan-05-1999
San Diego     250     Jan-07-1999
Los Angeles   300     Jan-08-1999
Boston        700     Jan-08-1999

Table Geography
Region_Name   Store_Name
East          Boston
East          New York
West          Los Angeles
West          San Diego

We see that table Geography includes information on regions and
stores, and table Store_Information contains sales information for each
store. To get the sales information by region, we have to combine the
information from the two tables. Examining the two tables, we find that
they are linked via the common field, "Store_Name". We will first present
the SQL statement and explain the use of each segment later:
 SELECT A1.Region_Name REGION, SUM(A2.Sales) SALES
FROM Geography A1, Store_Information A2
WHERE A1.Store_Name = A2.Store_Name
GROUP BY A1.Region_Name;
Result:-
REGION   SALES
East     700
West     2050

The first two lines tell SQL to select two fields, the first one is the field
"Region_Name" from table Geography (aliased as REGION), and the
second one is the sum of the field "Sales" from
table Store_Information (aliased as SALES). Notice how the table aliases
are used here: Geography is aliased as A1, and Store_Information is
aliased as A2. Without the aliasing, the first line would become
 SELECT Geography.Region_Name REGION,
SUM(Store_Information.Sales) SALES
This is much more cumbersome. In essence, table aliases make the
entire SQL statement easier to understand, especially when multiple
tables are included.
An alternative way to specify a join between tables is to use
the JOIN and ON keywords. In the current example, the SQL query would
be,
 SELECT A1.Region_Name REGION, SUM(A2.Sales) SALES
FROM Geography A1
JOIN Store_Information A2
ON A1.Store_Name = A2.Store_Name
GROUP BY A1.Region_Name;
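Both forms of the join can be run against the two tables shown earlier to confirm that they return the same result. The sketch assumes Python's built-in sqlite3 module; the column types are assumptions, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store_Information (Store_Name TEXT, Sales INTEGER, Txn_Date TEXT);
    CREATE TABLE Geography (Region_Name TEXT, Store_Name TEXT);
    INSERT INTO Store_Information VALUES
        ('Los Angeles', 1500, 'Jan-05-1999'), ('San Diego', 250, 'Jan-07-1999'),
        ('Los Angeles', 300, 'Jan-08-1999'), ('Boston', 700, 'Jan-08-1999');
    INSERT INTO Geography VALUES
        ('East', 'Boston'), ('East', 'New York'),
        ('West', 'Los Angeles'), ('West', 'San Diego');
""")

# Join condition written in the WHERE clause.
where_form = conn.execute("""
    SELECT A1.Region_Name REGION, SUM(A2.Sales) SALES
    FROM Geography A1, Store_Information A2
    WHERE A1.Store_Name = A2.Store_Name
    GROUP BY A1.Region_Name
    ORDER BY A1.Region_Name
""").fetchall()

# The same join written with the JOIN and ON keywords.
join_form = conn.execute("""
    SELECT A1.Region_Name REGION, SUM(A2.Sales) SALES
    FROM Geography A1
    JOIN Store_Information A2
    ON A1.Store_Name = A2.Store_Name
    GROUP BY A1.Region_Name
    ORDER BY A1.Region_Name
""").fetchall()

print(where_form, join_form)
conn.close()
```

The JOIN ... ON form keeps the join condition next to the tables it relates, which tends to be easier to read once a query also has ordinary filter conditions.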
Several different types of joins can be performed in SQL. The key ones are
as follows:
 Inner Join
 Outer Join
 Left Outer Join
 Cross Join
Inner Join:-
An inner join in SQL returns rows where there is at least one match
on both tables. Let's assume that we have the following two tables,
Table Store_Information
Store_Name    Sales   Txn_Date
Los Angeles   1500    Jan-05-1999
San Diego     250     Jan-07-1999
Los Angeles   300     Jan-08-1999
Boston        700     Jan-08-1999

Table Geography
Region_Name   Store_Name
East          Boston
East          New York
West          Los Angeles
West          San Diego

We want to find out sales by store, and we only want to see stores
with sales listed in the report. To do this, we can use the following SQL
statement using INNER JOIN:
 SELECT A1.Store_Name STORE, SUM(A2.Sales) SALES
FROM Geography A1
INNER JOIN Store_Information A2
ON A1.Store_Name = A2.Store_Name
GROUP BY A1.Store_Name;
Result:-
STORE         SALES
Los Angeles   1800
San Diego     250
Boston        700

By using INNER JOIN, the result shows 3 stores, even though we are
selecting from the Geography table, which has 4 rows. The row "New York"
is not selected because it is not present in the Store_Information table.
Outer Join:-
Previously, we looked at the inner join, where we select rows common
to the tables participating in a join. What about the cases where we are
interested in selecting elements in a table regardless of whether they
are present in the second table? We will now need to use the SQL
OUTER JOIN command.

The syntax for performing an outer join in SQL is database-dependent.
For example, in Oracle, we place a "(+)" in the WHERE clause on the
other side of the table for which we want to include all the rows.

Let's assume that we have the following two tables,

Table Store_Information
Store_Name    Sales   Txn_Date
Los Angeles   1500    Jan-05-1999
San Diego     250     Jan-07-1999
Los Angeles   300     Jan-08-1999
Boston        700     Jan-08-1999

Table Geography
Region_Name   Store_Name
East          Boston
East          New York
West          Los Angeles
West          San Diego

If we do a regular join, we will not be able to get what we want
because we will have missed "New York," since it does not appear in the
Store_Information table. Therefore, we need to perform an outer join on
the two tables above:
 SELECT A1.Store_Name, SUM(A2.Sales) SALES
FROM Geography A1, Store_Information A2
WHERE A1.Store_Name = A2.Store_Name (+)
GROUP BY A1.Store_Name;
Note that in this case, we are using the Oracle syntax for outer join.

Result:-
Store_Name    SALES
Boston        700
New York      NULL
Los Angeles   1800
San Diego     250

Left Outer Join:-

In a left outer join, all rows from the first table mentioned in the
SQL query are selected, regardless of whether there is a matching row in
the second table mentioned in the SQL query. Let's assume that we have
the following two tables,

Table Store_Information
Store_Name    Sales   Txn_Date
Los Angeles   1500    Jan-05-1999
San Diego     250     Jan-07-1999
Los Angeles   300     Jan-08-1999
Boston        700     Jan-08-1999

Table Geography
Region_Name   Store_Name
East          Boston
East          New York
West          Los Angeles
West          San Diego

We want to find out sales by store, and we want to see the results
for all stores regardless of whether there is a sale in
the Store_Information table. To do this, we can use the following SQL
statement using LEFT OUTER JOIN:

 SELECT A1.Store_Name STORE, SUM(A2.Sales) SALES


FROM Geography A1
LEFT OUTER JOIN Store_Information A2
ON A1.Store_Name = A2.Store_Name
GROUP BY A1.Store_Name;

Result:-
STORE         SALES
Los Angeles   1800
San Diego     250
New York      NULL
Boston        700

By using LEFT OUTER JOIN, all four rows in the Geography table are
listed. Since there is no match for "New York" in
the Store_Information table, the Sales total for "New York" is NULL. Note
that it is NULL and not 0, as NULL indicates there is no match.
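The difference between the inner join and the left outer join can be seen by running both on the same data. This sketch assumes Python's built-in sqlite3 module, where a SQL NULL is returned as Python's None; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store_Information (Store_Name TEXT, Sales INTEGER, Txn_Date TEXT);
    CREATE TABLE Geography (Region_Name TEXT, Store_Name TEXT);
    INSERT INTO Store_Information VALUES
        ('Los Angeles', 1500, 'Jan-05-1999'), ('San Diego', 250, 'Jan-07-1999'),
        ('Los Angeles', 300, 'Jan-08-1999'), ('Boston', 700, 'Jan-08-1999');
    INSERT INTO Geography VALUES
        ('East', 'Boston'), ('East', 'New York'),
        ('West', 'Los Angeles'), ('West', 'San Diego');
""")

QUERY = """
    SELECT A1.Store_Name STORE, SUM(A2.Sales) SALES
    FROM Geography A1
    {join_type} Store_Information A2
    ON A1.Store_Name = A2.Store_Name
    GROUP BY A1.Store_Name
    ORDER BY A1.Store_Name
"""

# Only stores that appear in both tables.
inner = conn.execute(QUERY.format(join_type="INNER JOIN")).fetchall()

# Every store in Geography, with NULL sales where there is no match.
left_outer = conn.execute(QUERY.format(join_type="LEFT OUTER JOIN")).fetchall()

print(inner)
print(left_outer)
conn.close()
```

New York is missing from the inner join result and appears with a NULL total in the left outer join result.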
Cross Join:-
A cross join (also called a Cartesian join) is a join of tables without
specifying the join condition. In this scenario, the query would return all
possible combinations of rows from the tables in the SQL query. To see
this in action, let's use the following example:

Table Store_Information
Store_Name    Sales   Txn_Date
Los Angeles   1500    Jan-05-1999
San Diego     250     Jan-07-1999
Los Angeles   300     Jan-08-1999
Boston        700     Jan-08-1999

Table Geography
Region_Name   Store_Name
East          Boston
East          New York
West          Los Angeles
West          San Diego


 SELECT A1.Store_Name STORE1, A2.Store_Name STORE2, A2.Sales SALES
FROM Geography A1
CROSS JOIN Store_Information A2;
Alternatively, the same result is obtained by listing both tables in the
FROM clause without a join condition:
 SELECT A1.Store_Name STORE1, A2.Store_Name STORE2, A2.Sales SALES
FROM Geography A1, Store_Information A2;
A cross join is seldom the desired result. Rather, it is an indication that
some required join condition is missing in the SQL query.
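The size of a cross join result is easy to verify: with 4 rows in each table, every pairing gives 4 × 4 = 16 rows. The sketch below assumes Python's built-in sqlite3 module and the same two tables; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store_Information (Store_Name TEXT, Sales INTEGER, Txn_Date TEXT);
    CREATE TABLE Geography (Region_Name TEXT, Store_Name TEXT);
    INSERT INTO Store_Information VALUES
        ('Los Angeles', 1500, 'Jan-05-1999'), ('San Diego', 250, 'Jan-07-1999'),
        ('Los Angeles', 300, 'Jan-08-1999'), ('Boston', 700, 'Jan-08-1999');
    INSERT INTO Geography VALUES
        ('East', 'Boston'), ('East', 'New York'),
        ('West', 'Los Angeles'), ('West', 'San Diego');
""")

# Explicit CROSS JOIN keyword.
cross = conn.execute("""
    SELECT A1.Store_Name STORE1, A2.Store_Name STORE2, A2.Sales SALES
    FROM Geography A1
    CROSS JOIN Store_Information A2
""").fetchall()

# Comma-separated tables with no join condition give the same rows.
comma_form = conn.execute("""
    SELECT A1.Store_Name STORE1, A2.Store_Name STORE2, A2.Sales SALES
    FROM Geography A1, Store_Information A2
""").fetchall()

print(len(cross), len(comma_form))
conn.close()
```

A row count that equals the product of the table sizes is a quick sign that a join condition has been forgotten.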

3.3 VIEW
In some cases, it is not desirable for all users to see the entire
logical model (that is, all the actual relations stored in the database).
Consider a person who needs to know an instructor's name and
department, but not the salary. This person should see a relation
described, in SQL, by
select ID, name, dept_name
from instructor
A view provides a mechanism to hide certain data from the view of
certain users.
Any relation that is not part of the conceptual model but is made
visible to a user as a "virtual relation" is called a view.
View Definition:-

 A view is defined using the create view statement which has the
form
Create view v as < query expression >
 Where <query expression> is any legal SQL expression. The view
name is represented by v.
 Once a view is defined, the view name can be used to refer to the
virtual relation that the view generates.
 View definition is not the same as creating a new relation by
evaluating the query expression
 Rather, a view definition causes the saving of an expression; the
expression is substituted into queries using the view.
Example:-
 A view of instructors without their salary
create view faculty as
select ID, name, dept_name
from instructor
 Find all instructors in the Biology department
select name
from faculty
where dept_name = 'Biology'
 Create a view of department salary totals
create view departments_total_salary(dept_name, total_salary) as
select dept_name, sum (salary)
from instructor
group by dept_name;
 Views defined using other views:
create view physics_fall_2009 as
select course.course_id, sec_id, building, room_number
from course, section
where course.course_id = section.course_id
and course.dept_name = 'Physics'
and section.semester = 'Fall'
and section.year = '2009';
create view physics_fall_2009_watson as
select course_id, room_number
from physics_fall_2009
where building = 'Watson';
 Expanding the use of a view in a query or in another view replaces
the view name by its defining expression:
create view physics_fall_2009_watson as
select course_id, room_number
from (select course.course_id, building, room_number
from course, section
where course.course_id = section.course_id
and course.dept_name = 'Physics'
and section.semester = 'Fall'
and section.year = '2009')
where building = 'Watson';
 One view may be used in the expression defining another view.
 A view relation v1 is said to depend directly on a view relation v2 if
v2 is used in the expression defining v1.
 A view relation v1 is said to depend on view relation v2 if either v1
depends directly on v2 or there is a path of dependencies from v1 to
v2.
 A view relation v is said to be recursive if it depends on itself.

View Expansion:-
 A way to define the meaning of views defined in terms of other
views.
 Let view v1 be defined by an expression e1 that may itself contain
uses of view relations.
 View expansion of an expression repeats the following replacement
step:
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
 As long as the view definitions are not recursive, this loop will
terminate
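The effect of view expansion can be checked on a small example. The sketch below uses Python's built-in sqlite3 module; the sample rows and the view name biology_faculty are made up for illustration, and SQLite is only a stand-in for whatever DBMS the course uses. It compares a query through a view defined on another view with the same query written out by hand after expansion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text primary key, name text, dept_name text, salary real);
    insert into instructor values
        ('10101', 'Srinivasan', 'Comp. Sci.', 65000),
        ('76766', 'Crick', 'Biology', 72000),
        ('15151', 'Mozart', 'Music', 40000);

    -- View that hides the salary column.
    create view faculty as
        select ID, name, dept_name from instructor;

    -- View defined on top of another view (hypothetical name).
    create view biology_faculty as
        select name from faculty where dept_name = 'Biology';
""")

# Query through the view that depends on another view.
via_view = conn.execute("select name from biology_faculty").fetchall()

# The same query with every view name replaced by its defining expression.
expanded = conn.execute("""
    select name
    from (select ID, name, dept_name from instructor)
    where dept_name = 'Biology'
""").fetchall()

print(via_view, expanded)
```

Both queries return the same rows, which is what the replacement loop above is meant to guarantee.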

Update Of A View:-

 Add a new tuple to the faculty view defined earlier:
insert into faculty values ('30765', 'Green', 'Music');
 This insertion must be represented by the insertion of the tuple
('30765', 'Green', 'Music', null) into the instructor relation,
because the view does not supply a value for salary.
3.4 TRANSACTIONS
 A transaction is a unit of work.
 An atomic transaction is either fully executed or rolled back as if
it never occurred.
 Each transaction is isolated from concurrent transactions.
 Transactions begin implicitly and are ended by commit work or
rollback work.
 By default, however, most databases commit each SQL statement
automatically.
 Auto commit can be turned off for a session (e.g. using an API).
 In SQL:1999, begin atomic … end can be used.
 This is not supported on most databases.
 A transaction can be defined as a group of tasks. A single task is the
minimum processing unit which cannot be divided further.
 Let’s take an example of a simple transaction. Suppose a bank
employee transfers Rs 500 from A's account to B's account. This
very simple and small transaction involves several low-level tasks.

A’s Account

Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account

Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
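The transfer above can be sketched as one atomic unit of work. This is a minimal illustration using Python's sqlite3 module, not the method of any particular banking system; the table layout, the starting balances and the rule that a balance may not go negative are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (name text primary key, "
    "balance integer check (balance >= 0))"
)
conn.execute("insert into account values ('A', 800), ('B', 200)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "update account set balance = balance + ? where name = ?",
                (amount, dst),
            )
            conn.execute(
                "update account set balance = balance - ? where name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("select name, balance from account"))

ok = transfer(conn, "A", "B", 500)       # enough money: committed
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)   # A would go negative: rolled back
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

In the second call the credit to B has already been executed when the debit from A fails, and the rollback undoes it, so the balances are left exactly as they were before the call.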

3.5 AUTHORIZATION
 Forms of authorization on parts of the database:
 Read - allows reading, but not modification of data.
 Insert - allows insertion of new data, but not modification of existing
data.
 Update - allows modification, but not deletion of data.
 Delete - allows deletion of data.
 Forms of authorization to modify the database schema
 Index - allows creation and deletion of indices.
 Resources - allows creation of new relations.
 Alteration - allows addition or deletion of attributes in a relation.
 Drop - allows deletion of relations.
 Authorization Specification in SQL
 The grant statement is used to confer authorization

grant <privilege list>
on <relation name or view name> to <user list>
<user list> is one of:
 a user-id
 public, which allows all valid users the privilege granted
 a role (more on this later)
 Granting a privilege on a view does not imply granting any
privileges on the underlying relations.
 The grantor of the privilege must already hold the privilege on the
specified item (or be the database administrator).
Privileges in SQL:-

Select: allows read access to relation, or the ability to query using the
view
Example: grant users U1, U2, and U3 select authorization on the instructor
relation:
grant select on instructor to U1, U2, U3
Insert: the ability to insert tuples
Update: the ability to update using the SQL update statement
Delete: the ability to delete tuples.
All privileges: used as a short form for all the allowable privileges
Revoking Authorization in SQL:-
 The revoke statement is used to revoke authorization.
 revoke <privilege list>
 on <relation name or view name> from <user list>
Example:
 revoke select on branch from U1, U2, U3
 <privilege list> may be all to revoke all privileges the revokee may
hold.
 If <user list> includes public, all users lose the privilege except
those granted it explicitly.
 If the same privilege was granted twice to the same user by
different grantors, the user may retain the privilege after the
revocation.
 All privileges that depend on the privilege being revoked are also
revoked.
Roles:-
 create role instructor;
 grant instructor to Amit;
 Privileges can be granted to roles:
 grant select on takes to instructor;
 Roles can be granted to users, as well as to other roles:
 create role teaching_assistant;
 grant teaching_assistant to instructor;
 instructor inherits all privileges of teaching_assistant
 Chain of roles:
 create role dean;
 grant instructor to dean;
 grant dean to Satoshi;
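How privileges flow along a chain of roles can be sketched with a small graph walk. This is only a model in plain Python, not how any DBMS stores its grants; the privileges attached to teaching_assistant and dean below are invented for illustration.

```python
# Grantee -> roles granted to it, mirroring the grant statements above.
granted_roles = {
    "instructor": {"teaching_assistant"},
    "dean": {"instructor"},
    "Satoshi": {"dean"},
}

# Role -> privileges granted directly to it, as (privilege, relation) pairs.
direct_privileges = {
    "instructor": {("select", "takes")},
    "teaching_assistant": {("select", "section")},   # invented for the example
    "dean": {("update", "department")},              # invented for the example
}

def effective_privileges(grantee):
    """Collect the privileges of grantee and of every role reachable from it."""
    seen, stack, result = set(), [grantee], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the role graph
        seen.add(current)
        result |= direct_privileges.get(current, set())
        stack.extend(granted_roles.get(current, ()))
    return result

print(sorted(effective_privileges("Satoshi")))
```

Satoshi holds the dean role, which includes instructor, which in turn includes teaching_assistant, so all three sets of privileges are returned.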
Authorization on Views:-
 create view geo_instructor as
(select * from instructor where dept_name = ’Geology’);
 grant select on geo_instructor to geo_staff
 Suppose that a geo_staff member issues
 select * from geo_instructor;
 What if geo_staff does not have permissions on instructor?
 What if the creator of the view did not have some permissions on
instructor?
Other Authorization Features:-
 The references privilege is needed to create a foreign key:
 grant references (dept_name) on department to Mariano;
 Why is this required? A foreign key restricts deletions and updates
on the referenced relation, so creating one affects other users of
that relation.
 Transfer of privileges:
 grant select on department to Amit with grant option;
 revoke select on department from Amit, Satoshi cascade;
 revoke select on department from Amit, Satoshi restrict;

3.6 ADVANCED SQL
SQL allows us to perform various operations on the underlying
database data. It lets the user express requests, from simple to
complex, and retrieve the results efficiently. The basic command used to
retrieve data from the database is SELECT.

3.7 FUNCTIONS AND PROCEDURES

 SQL:1999 supports functions and procedures


 Functions/procedures can be written in SQL itself, or in an external
programming language
 Functions are particularly useful with specialized data types such as
images and geometric objects
 Example: functions to check if polygons overlap, or to compare
images for similarity
 Some database systems support table-valued functions, which can
return a relation as a result
 SQL:1999 also supports a rich set of imperative constructs,
including loops, if-then-else and assignment
 Many databases have proprietary procedural extensions to SQL that
differ from SQL:1999
SQL Functions:-
Define a function that, given the name of a customer, returns the
count of the number of accounts owned by the customer.
create function account_count (customer_name varchar(20))
returns integer
begin
declare a_count integer;
select count(*) into a_count
from depositor
where depositor.customer_name = customer_name;
return a_count;
end

Find the name and address of each customer that has more than one
account.
select customer_name, customer_street, customer_city
from customer
where account_count (customer_name ) > 1
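The same idea can be tried outside a DBMS that supports SQL functions. The sketch below is an approximation in Python with sqlite3: account_count is an ordinary Python function that runs the counting query, and the filter in the where clause is applied in Python. The sample customers and account numbers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (customer_name text primary key,
                           customer_street text, customer_city text);
    create table depositor (customer_name text, account_number text);
    insert into customer values ('Smith', 'North', 'Rye'),
                                ('Jones', 'Main', 'Harrison');
    insert into depositor values ('Smith', 'A-215'), ('Smith', 'A-101'),
                                 ('Jones', 'A-217');
""")

def account_count(customer_name):
    """Return the number of accounts owned by the named customer."""
    row = conn.execute(
        "select count(*) from depositor where customer_name = ?",
        (customer_name,),
    ).fetchone()
    return row[0]

# Customers with more than one account: the function is evaluated per row.
customers = conn.execute(
    "select customer_name, customer_street, customer_city "
    "from customer order by customer_name"
).fetchall()
multi = [c for c in customers if account_count(c[0]) > 1]
print(multi)
```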
Table Functions:-
 SQL:2003 added functions that return a relation as a result
 Example: Return all accounts owned by a given customer
create function accounts_of (customer_name char(20))
returns table (account_number char(10),
branch_name char(15),
balance numeric(12,2))
return table
(select account_number, branch_name, balance
from account A
where exists (
select *
from depositor D
where D.customer_name = accounts_of.customer_name
and D.account_number = A.account_number))
Usage:
select *
from table (accounts_of ('Smith'))
SQL Procedures:-
 The account_count function could instead be written as a
procedure:
create procedure account_count_proc (in customer_name varchar(20),
out a_count integer)
begin
select count(*) into a_count
from depositor
where depositor.customer_name =
account_count_proc.customer_name;
end
 Procedures can be invoked either from an SQL procedure or
from embedded SQL, using the call statement.
declare a_count integer;
call account_count_proc('Smith', a_count);
 Procedures and functions can be invoked also from dynamic SQL
 SQL:1999 allows more than one function/procedure of the same
name (called name overloading), as long as the number of
arguments differ, or at least the types of the arguments differ
 Compound statement: begin … end
 May contain multiple SQL statements between begin and end.
 Local variables can be declared within a compound statement.
 While and repeat statements:
declare n integer default 0;
while n < 10 do
set n = n + 1;
end while
repeat
set n = n - 1;
until n = 0
end repeat
For loop
 Permits iteration over all results of a query
 Example: find the total of all balances at the Perryridge branch
declare n integer default 0;
for r as
select balance from account
where branch_name = 'Perryridge'
do
set n = n + r.balance;
end for
 Conditional statements (if-then-else)
E.g. to find the sum of balances for each of three categories of accounts
(with balance < 1000, >= 1000 and < 5000, >= 5000):
if r.balance < 1000
then set l = l + r.balance
elseif r.balance < 5000
then set m = m + r.balance
else set h = h + r.balance
end if
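The for loop and the if-then-else above combine naturally. The following is a sketch of the same logic in plain Python rather than SQL/PSM; the account numbers and balances are made up, and the names low, mid and high stand in for l, m and h.

```python
# (account_number, balance) pairs standing in for the rows of account.
accounts = [
    ("A-101", 500),
    ("A-215", 700),
    ("A-102", 4000),
    ("A-305", 5000),
    ("A-201", 9000),
]

low = mid = high = 0
for _number, balance in accounts:   # plays the role of "for r as select ... do"
    if balance < 1000:
        low += balance              # category: balance < 1000
    elif balance < 5000:
        mid += balance              # category: >= 1000 and < 5000
    else:
        high += balance             # category: >= 5000

print(low, mid, high)
```

A balance of exactly 5000 falls into the last category, because the second test is a strict comparison.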
 SQL:1999 also supports a case statement similar to the C case
statement
 Signaling of exception conditions, and declaring handlers for
exceptions:
declare out_of_stock condition
declare exit handler for out_of_stock
begin

.. signal out_of_stock
end
 The handler here is exit: it causes the enclosing begin..end to be
exited

3.8 TRIGGERS
 You can write triggers that fire whenever one of the following
operations occurs:
 DML statements (INSERT, UPDATE, DELETE) on a particular table or
view, issued by any user
 DDL statements (CREATE or ALTER primarily) issued either by a
particular schema/user or by any schema/user in the database
 Database events, such as logon/logoff, errors, or startup/shutdown,
also issued either by a particular schema/user or by any
schema/user in the database
 Triggers are similar to stored procedures. A trigger stored in the
database can include SQL and PL/SQL or Java statements to run as a
unit and can invoke stored procedures. However, procedures and
triggers differ in the way that they are invoked. A procedure is
explicitly run by a user, application, or trigger. Triggers are implicitly
fired by Oracle when a triggering event occurs, no matter which
user is connected or which application is being used.

 Consider a database application whose SQL statements implicitly
fire several triggers stored in the database. The database stores
triggers separately from their associated tables.
 A trigger can also call out to a C procedure, which is useful for
computationally intensive operations.
 The events that fire a trigger include the following:
 DML statements that modify data in a table (INSERT, UPDATE,
or DELETE)
 DDL statements
 System events such as startup, shutdown, and error messages
 User events such as logon and logoff
How Triggers Are Used:-
 Triggers supplement the standard capabilities of Oracle to provide a
highly customized database management system. For example, a
trigger can restrict DML operations against a table to those issued
during regular business hours. You can also use triggers to:
 Automatically generate derived column values
 Prevent invalid transactions
 Enforce complex security authorizations
 Enforce referential integrity across nodes in a distributed database
 Enforce complex business rules
 Provide transparent event logging
 Provide auditing
 Maintain synchronous table replicates
 Gather statistics on table access
 Modify table data when DML statements are issued against views
 Publish information about database events, user events, and SQL
statements to subscribing applications
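Two of the uses listed above, auditing and preventing invalid transactions, can be sketched with DML triggers. The notes describe Oracle triggers; the example below uses SQLite through Python's sqlite3 module instead, so the trigger syntax is SQLite's and differs from PL/SQL. The table and trigger names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text primary key, balance integer);
    create table account_audit (account_number text,
                                old_balance integer, new_balance integer);

    -- Fired implicitly after every balance update (auditing).
    create trigger log_balance_change
    after update of balance on account
    for each row
    begin
        insert into account_audit
        values (old.account_number, old.balance, new.balance);
    end;

    -- Rejects an invalid change before it reaches the table.
    create trigger no_negative_balance
    before update of balance on account
    for each row
    when new.balance < 0
    begin
        select raise(abort, 'balance cannot be negative');
    end;

    insert into account values ('A-101', 500);
""")

# A valid update: the audit trigger records the old and new balance.
conn.execute("update account set balance = 300 where account_number = 'A-101'")

# An invalid update: the second trigger aborts the statement.
try:
    conn.execute("update account set balance = -50 where account_number = 'A-101'")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

audit = conn.execute("select * from account_audit").fetchall()
balance = conn.execute("select balance from account").fetchone()[0]
print(rejected, audit, balance)
```

Neither trigger is called by name anywhere in the application code; both fire because of the update statements.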

3.9 FORMAL RELATIONAL QUERY LANGUAGES


A relational query language is the language through which a user
communicates with the database. It breaks a user request down into
operations on relations and instructs the DBMS to execute them.
Relational query languages can be procedural or non-procedural.
Procedural Query Language:-
A procedural query language uses a set of queries that instruct the
DBMS to perform operations in a given sequence to meet the user
request. For example, a get_CGPA procedure will have several queries to
get the marks of a student in each subject, calculate the total marks, and
then decide the CGPA based on the total. A procedural query
language tells the database both what is required and how
to get it. Relational algebra is a procedural query
language.
Non-Procedural Query Language:-
A non-procedural query is a single query on one or more tables
that gets the result from the database. For example, getting the name and
address of the student with a particular ID needs a single query on the
STUDENT table. Relational calculus is a non-procedural language: it states
what is wanted from the tables, but not how to obtain it.
These query languages operate on the tables of the
database. In a relational database, a table is known as a relation,
the records/rows of the table are referred to as tuples, and the columns of
the table are also known as attributes. All these names are used
interchangeably in relational databases.

3.10 THE RELATIONAL ALGEBRA


 Relational algebra is a procedural query language. Each operation
takes one or more relations (tables) as input and produces a new
relation as its result.
 Suppose we have to retrieve the student name, address and class for
a given ID. Relational algebra filters the name, address and class
from the STUDENT table for the input ID.
 In mathematical terms, relational algebra has produced a subset of
the STUDENT table for the given ID.
 Relational algebra uses operators to indicate the operations. An
operator applied to a single relation is called unary; one applied to
two relations is called binary. The subset produced by applying an
operation is itself a new relation.
 Some operations involve multiple steps. The subsets produced at the
intermediate steps are also relations.
 This becomes clearer with the individual operations below.
 Relational algebra has 6 fundamental operations: select, project,
union, set difference, Cartesian product and rename. Several other
operations are defined in terms of these fundamental operations.
Select (σ):-
 This is a unary relational operation. This operation pulls the
horizontal subset (subset of rows) of the relation that satisfies the
conditions.
 The condition can use operators like <, >, <=, >=, = and != to filter
the data from the relation.
 It can also use the logical AND, OR and NOT operators to combine
several filtering conditions.
This operation can be represented as below:-
σ p (r), where σ is the symbol for the select operation, r represents the
relation/table, and p is the logical formula (the filtering condition) that
picks the subset. Let us see an example:
σ STD_NAME = “James” (STUDENT)

Acc-no   Yr-pub   title
734216   1982     Algorithm design
237235   1995     Database systems
631523   1992     Compiler design
543211   1991     Programming
376112   1992     Machine design

What does the above expression do? It selects the records/tuples of the
STUDENT table whose student name is ‘James’.
σ dept_id = 20 AND salary >= 10000 (EMPLOYEE) selects the records of the
EMPLOYEE table with department ID 20 and a salary of at least 10000.
Project (∏) :-
 This is a unary operator. Like select, it produces a subset of the
relation, but it keeps only the listed columns/attributes: a
vertical subset of the relation.
 The select operation above, in contrast, returns a subset of the rows
with all the attributes of the relation. Projection is denoted as below:
 ∏a1, a2, a3 (r), where ∏ is the operator for projection, r is the relation
and a1, a2, a3 are the attributes of the relation that will be shown
in the result.
 ∏std_name, address, course (STUDENT) returns all the records of the
STUDENT table, but only the columns std_name, address and
course. To get only the required columns for one particular student,
we have to combine the project and select operations.

∏STD_ID, address, course (σ STD_NAME = “James” (STUDENT))

 This selects the record for ‘James’ and displays only the STD_ID,
address and course columns. Two unary operators are combined here,
so two operations are performed.
 First it selects the tuple for ‘James’ from the STUDENT table. The
resulting subset of STUDENT is an intermediate relation.
 This intermediate relation is temporary and exists only until the end
of the expression. The projection then keeps the 3 columns from it.
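The combination of select and project can be sketched in a few lines. This is a minimal illustration in plain Python, with a relation modelled as a list of dictionaries; the helper names select and project and the sample STUDENT rows are assumptions made for the example.

```python
STUDENT = [
    {"STD_ID": 100, "STD_NAME": "James", "address": "Troy", "course": "DBMS"},
    {"STD_ID": 101, "STD_NAME": "Kathy", "address": "Lakeside", "course": "Java"},
    {"STD_ID": 102, "STD_NAME": "Rita", "address": "Troy", "course": "DBMS"},
]

def select(predicate, relation):
    """Sigma: keep the tuples (rows) that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """Pi: keep only the named attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

# Step 1: the intermediate relation holding only the tuple for James.
james = select(lambda t: t["STD_NAME"] == "James", STUDENT)

# Step 2: keep three columns of that intermediate relation.
answer = project(["STD_ID", "address", "course"], james)
print(answer)
```

Because a relation is a set, project also removes duplicates: projecting STUDENT on address and course alone yields two tuples, not three.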
Rename (ρ) :-
 This is a unary operator used to rename the tables and columns of a
relation. When we perform a self join, we have to
tell the two copies of the same table apart.
 In such a case the rename operator on tables comes into the picture.
 When we join two or more tables that have the same
column names, it is better to rename the columns to
tell them apart. This occurs when we perform a Cartesian product.
 ρ R (E), where ρ is the rename operator, E is the existing relation
name, and R is the new relation name.
 ρ STUDENT (STD_TABLE) renames the STD_TABLE table to STUDENT.
 Columns can be renamed as well. If
the STUDENT table has the columns ID, NAME and ADDRESS and they
have to be renamed to STD_ID, STD_NAME and STD_ADDRESS, we
write:
 ρ STD_ID, STD_NAME, STD_ADDRESS (STUDENT), which renames the columns in
the order in which they appear in the table.

Cartesian product (X):-
 This is a binary operator. It combines the tuples of two relations into
one relation.
 R X S, where R and S are two relations and X is the operator.
 If relation R has m tuples and relation S has n tuples, then the
resulting relation has m × n tuples. For example, the
Cartesian product of EMPLOYEE (5 tuples) and DEPT (3
tuples) is a new relation with 15 tuples.
EMPLOYEE X DEPT:-
This operator simply pairs every tuple of one table with every tuple
of the other, i.e. each employee in the EMPLOYEE table is paired with
each department in the DEPT table. The diagram below depicts the result of
the Cartesian product.
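Rename and Cartesian product work together when both relations have a column with the same name. The sketch below uses plain Python with relations as lists of dictionaries; the helper names and the sample rows (2 employees, 3 departments) are made up for illustration.

```python
EMPLOYEE = [
    {"EMP_ID": 100, "NAME": "James"},
    {"EMP_ID": 104, "NAME": "Kathy"},
]
DEPT = [
    {"DEPT_ID": 10, "NAME": "Design"},
    {"DEPT_ID": 20, "NAME": "Testing"},
    {"DEPT_ID": 30, "NAME": "Sales"},
]

def rename(mapping, relation):
    """Rho: rename attributes according to mapping {old_name: new_name}."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]

def product(r, s):
    """Cartesian product: pair every tuple of r with every tuple of s."""
    return [{**tr, **ts} for tr in r for ts in s]

# Both relations have a NAME column, so rename it on each side first.
employees = rename({"NAME": "EMP_NAME"}, EMPLOYEE)
departments = rename({"NAME": "DEPT_NAME"}, DEPT)

pairs = product(employees, departments)
print(len(pairs))
```

With m = 2 and n = 3 the product has 6 tuples, each carrying all four attributes. Without the rename step, the two NAME columns would collide in the merged dictionary.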

Set-difference (-) :-
This is a binary operator. It creates a new relation with the
tuples that are in one relation but not in the other. It is denoted by the
‘-’ symbol.
R – S
where R and S are the relations.
Suppose we want to retrieve the employees who work in the
Design department but not in Testing:
DESIGN_EMPLOYEE − TESTING_EMPLOYEE

Set Intersection:-
This is a binary operation. It results in a relation with the
tuples that are in both relations. It is denoted by ‘∩’.
R ∩ S
where R and S are the relations. It picks all the tuples that are
present in both R and S and returns them as a new relation.
Suppose we have to find the employees who work in both the
Design and the Testing departments.
With the tuples in the example above, the result relation has
no tuples. Suppose we have tuples like those below, and see the new
relation after set intersection.
Assignment:-
 As the name indicates, the assignment operator ‘←’ is used to
assign the result of a relational operation to a temporary relation
variable.
 This is useful when a relational operation has multiple steps
and handling everything in one single expression is difficult.
 Assigning intermediate results to a temporary relation and using this
temporary relation in the next operation makes the task simple.
 T ← S denotes that relation S is assigned to the temporary relation T.
A relational operation ∏a1, a2 (σ p (E)) with selection and projection can be
divided as below.
T ← σ p (E)
S ← ∏a1, a2 (T)
Our earlier projection example, which gets STD_ID, ADDRESS and
COURSE for the student ‘James’, can be rewritten in the same way.
∏STD_ID, address, course (σ STD_NAME = “James” (STUDENT))
becomes
T ← σ STD_NAME = “James” (STUDENT)
S ← ∏STD_ID, address, course (T)

Set intersection can also be written as a combination of set difference
operations.
R ∩ S = R − (R − S)
i.e. it first evaluates R − S to get the tuples that are present only in R,
and then keeps the tuples of R that are not in that intermediate
result.
In the above example of employees,
DESIGN_EMPLOYEE – (DESIGN_EMPLOYEE – TESTING_EMPLOYEE)
 It first finds the employees who are only design employees:
(104, Kathy). This result is then subtracted from
DESIGN_EMPLOYEE.
 This leaves the design employees who are not in the
intermediate result: (100, James).
 Thus it gives the tuple that is both designer and tester. The
fundamental set difference operator is used twice to get the
set intersection, so set intersection is not a fundamental operation.
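The identity R ∩ S = R − (R − S) is easy to check with Python sets, where each tuple is a Python tuple. The two design employees are the ones named in the text; the tester (105, Rose) is made up so that the two relations differ.

```python
DESIGN_EMPLOYEE = {(100, "James"), (104, "Kathy")}
TESTING_EMPLOYEE = {(100, "James"), (105, "Rose")}

# Step 1: R - S, the employees who are only design employees.
only_design = DESIGN_EMPLOYEE - TESTING_EMPLOYEE

# Step 2: R - (R - S), the design employees not in the intermediate result.
both = DESIGN_EMPLOYEE - only_design

print(only_design, both)
```

The second result is the same set that Python's built-in intersection operator (&) produces.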
Division:-
 This operation answers queries that contain the phrase ‘for all’. It is
denoted by ‘÷’. Suppose we want to see the employees who
work in all of the departments. The steps are:
 First, find all the department IDs: T1 ← ∏DEPT_ID (DEPARTMENT)
 Next, list all the employees and their departments: T2 ←
∏EMP_ID, DEPT_ID (EMPLOYEE)
 Finally, find the employees in T2 that are paired with every
department ID in T1. This is obtained by the division operation:
T2 ÷ T1

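The three steps of the division example can be sketched as follows. This is a plain Python illustration, not a general division operator: it assumes T2 holds (EMP_ID, DEPT_ID) pairs and T1 holds DEPT_ID values, and the data is made up.

```python
# T1: all department IDs; T2: (EMP_ID, DEPT_ID) pairs.
T1 = {10, 20}
T2 = {(100, 10), (100, 20), (104, 10), (105, 20)}

def divide(pairs, required):
    """pairs / required: first components paired with every required value."""
    candidates = {emp for emp, _dept in pairs}
    return {
        emp
        for emp in candidates
        if all((emp, dept) in pairs for dept in required)
    }

in_all_departments = divide(T2, T1)
print(in_all_departments)
```

Only employee 100 appears with both department 10 and department 20, so only 100 survives the division.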
3.11 TUPLE RELATIONAL CALCULUS
 A tuple variable, also called a range variable, ranges over some
relation.
 A tuple variable is a variable that takes on tuples of a particular
relation schema as values.
 A tuple relational calculus query has the form {T | P(T)}, where T is a
tuple variable and P(T) denotes a formula that describes T.
 To find all teachers whose salary is above 10000:
Query: {t | TEACHER(t) and t.salary > 10000}
 To retrieve only the first and last names of the teachers whose salary
is above 100000:
Query: {t.fname, t.Lname | TEACHER(t) and t.salary > 100000}
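A TRC query reads almost like a comprehension in a programming language: it states which tuples are wanted, not how to find them. The sketch below mirrors the two queries above in Python; the TEACHER rows are made up.

```python
TEACHER = [
    {"fname": "Anil", "Lname": "Rao", "salary": 120000},
    {"fname": "Mary", "Lname": "Jacob", "salary": 15000},
    {"fname": "Ravi", "Lname": "Kumar", "salary": 8000},
]

# {t | TEACHER(t) and t.salary > 10000}: whole tuples are returned.
well_paid = [t for t in TEACHER if t["salary"] > 10000]

# {t.fname, t.Lname | TEACHER(t) and t.salary > 100000}: two attributes only.
names = [(t["fname"], t["Lname"]) for t in TEACHER if t["salary"] > 100000]

print(well_paid, names)
```

In both comprehensions, t plays the role of the tuple variable ranging over TEACHER, and the condition after "if" is the formula P(t).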
Syntax of TRC Queries:
 A predicate followed by its arguments is called an atomic formula.
With Rel a relation name, R and S tuple variables, a an attribute of R,
b an attribute of S, and op a comparison operator, an atomic formula is
one of:
R ∈ Rel
R.a op S.b
R.a op constant, or constant op R.a
 A formula is recursively defined to be one of the following, where p
and q are themselves formulas and p(R) denotes a formula in which the
variable R appears:
any atomic formula
¬p, p ∧ q, p ∨ q, or p ⇒ q
∃R (p(R)), where R is a tuple variable
∀R (p(R)), where R is a tuple variable
The quantifiers ∃ and ∀ are said to bind the variable R.
Example DBMS (x).COMPANY (y)
 A language consists of symbols, i.e. variables, constants and
predicates.
Example: P(x) ∨ Q(y)
P(x) ∧ Q(y)

3.12 DOMAIN RELATIONAL CALCULUS


 A domain variable is a variable that ranges over the values in the
domain of some attribute.
 A DRC query has the form {<x1, x2, …, xn> | P(<x1, x2, …, xn>)},
where P(<x1, x2, …, xn>) denotes a DRC formula whose only free variables
are the variables among the xi, 1 ≤ i ≤ n.
Syntax:
{x | F(x)}
where F is a formula on x
and x is a set of domain variables
Example
 Find all sailors with a rating above 7.
{<I, N, T, A> | <I, N, T, A> ∈ Sailors ∧ T > 7}
 Find the names of sailors who have reserved boat 103.
{<N> | ∃I, T, A (<I, N, T, A> ∈ Sailors ∧ ∃Ir, Br, D (<Ir, Br, D> ∈ Reserves
∧ Ir = I ∧ Br = 103))}
 Find the names of sailors who have reserved a red boat.
{<N> | ∃I, T, A (<I, N, T, A> ∈ Sailors ∧ ∃<I, Br, D> ∈ Reserves ∧ ∃<Br,
Bn, ‘red’> ∈ Boats)}
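The three DRC queries can be mirrored in Python, with one variable per attribute value instead of one variable per tuple. The sketch below uses sets of tuples; the rows of Sailors, Reserves and Boats are sample data chosen for illustration, and any() stands in for the existential quantifier.

```python
# Sailors(id, name, rating, age), Reserves(sailor id, boat id, day),
# Boats(boat id, boat name, colour)
Sailors = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
Reserves = {(22, 101, "10/10/98"), (58, 103, "11/12/96"), (31, 102, "11/10/96")}
Boats = {(101, "Interlake", "blue"), (102, "Interlake", "red"), (103, "Clipper", "green")}

# All sailors with a rating above 7.
rating_above_7 = {(I, N, T, A) for (I, N, T, A) in Sailors if T > 7}

# Names of sailors who have reserved boat 103.
reserved_103 = {
    N
    for (I, N, T, A) in Sailors
    if any(Ir == I and Br == 103 for (Ir, Br, D) in Reserves)
}

# Names of sailors who have reserved a red boat.
reserved_red = {
    N
    for (I, N, T, A) in Sailors
    if any(
        Ir == I and any(B == Br and C == "red" for (B, Bn, C) in Boats)
        for (Ir, Br, D) in Reserves
    )
}

print(rating_above_7, reserved_103, reserved_red)
```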

UNIT - IV
4.1 DATABASE DESIGN AND THE E-R MODEL
Database design is the process of producing a detailed data
model of a database. This data model contains all the logical and
physical design choices and physical storage parameters needed to
generate a design in a data definition language, which can then be used
to create a database. A fully attributed data model contains detailed
attributes for each entity.

The term database design can be used to describe many different
parts of the design of an overall database system. Principally, and most
correctly, it can be thought of as the logical design of the base data
structures used to store the data. In the relational model these are
the tables and views.

In an object database the entities and relationships map directly to
object classes and named relationships. However, the term database
design could also be used to apply to the overall process of designing, not
just the base data structures, but also the forms and queries used as part
of the overall database application within the database management
system (DBMS).

An entity–relationship model (ER model) describes inter-related
things of interest in a specific domain of knowledge. An ER model is
composed of entity types (which classify the things of interest) and
specifies relationships that can exist between instances of those entity
types.

4.2 OVERVIEW OF THE DATA PROCESS


A data processing system is a combination of machines, people,
and processes that for a set of inputs produces a defined set of outputs.
The inputs and outputs are interpreted as data, facts, information, and so
on, depending on the interpreter's relation to the system. A common
synonymous term is "information system".

A data processing system may involve some combination of:

 Conversion – converting data to another format.
 Validation – ensuring that supplied data is "clean, correct and
useful."
 Sorting – "arranging items in some sequence and/or in different
sets."
 Summarization – reducing detail data to its main points.
 Aggregation – combining multiple pieces of data.
 Analysis – the "collection, organization, analysis, interpretation
and presentation of data."
 Reporting – listing detail or summary data or computed information.
The ER model defines the conceptual view of a database. It works
around real-world entities and the associations among them. At view
level, the ER model is considered a good option for designing databases.

4.3 THE ENTITY-RELATIONSHIP MODEL


Entity:-

An entity can be a real-world object, either animate or inanimate,
that is easily identifiable. For example, in a school database,
students, teachers, classes, and courses offered can be considered as
entities. All these entities have some attributes or properties that give
them their identity.

An entity set is a collection of similar types of entities. An entity set
may contain entities whose attributes share similar values. For example, a
Students set may contain all the students of a school; likewise a Teachers
set may contain all the teachers of a school from all faculties. Entity sets
need not be disjoint.

Attributes:-

Entities are represented by means of their properties,
called attributes. All attributes have values. For example, a student
entity may have name, class, and age as attributes.

There exists a domain or range of values that can be assigned to
attributes. For example, a student's name cannot be a numeric value. It
has to be alphabetic. A student's age cannot be negative, etc.

Types of Attributes:-

 Simple attribute − Simple attributes are atomic values, which
cannot be divided further. For example, a student's phone number
is an atomic value of 10 digits.
 Composite attribute − Composite attributes are made of more
than one simple attribute. For example, a student's complete name
may have first_name and last_name.
 Derived attribute − Derived attributes are the attributes that do
not exist in the physical database, but their values are derived from
other attributes present in the database. For example,
average_salary in a department should not be saved directly in the
database; instead it can be derived. As another example, age can
be derived from date_of_birth.
 Single-value attribute − Single-value attributes contain a single
value. For example − Social_Security_Number.
 Multi-value attribute − Multi-value attributes may contain more
than one value. For example, a person can have more than one
phone number, email_address, etc.
These attribute types can come together in a way like −
 simple single-valued attributes
 simple multi-valued attributes
 composite single-valued attributes
 composite multi-valued attributes
Entity-Set and Keys:-

A key is an attribute or collection of attributes that uniquely identifies
an entity within an entity set.

For example, the roll_number of a student makes him/her
identifiable among students.
 Super Key − A set of attributes (one or more) that collectively
identifies an entity in an entity set.
 Candidate Key − A minimal super key is called a candidate key.
An entity set may have more than one candidate key.
 Primary Key − A primary key is one of the candidate keys chosen
by the database designer to uniquely identify the entities of the entity set.
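The difference between a super key and a candidate key can be made concrete with a small check. The sketch below is plain Python over made-up student data. One caveat: it only tests uniqueness in the sample shown, whereas a real key must hold for every legal state of the entity set, so a check like this can refute a key but never prove one.

```python
from itertools import combinations

students = [
    {"roll_number": 1, "name": "Asha", "class": "I BCom", "email": "asha@example.com"},
    {"roll_number": 2, "name": "Ravi", "class": "I BCom", "email": "ravi@example.com"},
    {"roll_number": 3, "name": "Asha", "class": "II BCom", "email": "asha2@example.com"},
]

def is_super_key(attrs, entities):
    """True if no two entities agree on all of the given attributes."""
    seen = {tuple(e[a] for a in attrs) for e in entities}
    return len(seen) == len(entities)

def is_candidate_key(attrs, entities):
    """A super key is a candidate key if no proper subset is a super key."""
    if not is_super_key(attrs, entities):
        return False
    return not any(
        is_super_key(subset, entities)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_super_key(["roll_number", "name"], students))      # super key
print(is_candidate_key(["roll_number", "name"], students))  # but not minimal
print(is_candidate_key(["roll_number"], students))          # minimal
```

Here roll_number and email are both candidate keys in the sample; the designer would pick one of them as the primary key.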
Relationship:-

The association among entities is called a relationship. For example,
an employee works_at a department and a student enrolls in a course.
Here, Works_at and Enrolls are called relationships.

Relationship Set:-

A set of relationships of similar type is called a relationship set. Like
entities, a relationship too can have attributes. These attributes are
called descriptive attributes.

Degree of Relationship:-

The number of participating entities in a relationship defines the
degree of the relationship.

 Binary = degree 2
 Ternary = degree 3
 n-ary = degree n

Mapping Cardinalities:-

Cardinality defines the number of entities in one entity set that
can be associated with entities of the other set via a relationship
set.

 One-to-one − One entity from entity set A can be associated with
at most one entity of entity set B and vice versa.
 One-to-many − One entity from entity set A can be associated
with more than one entity of entity set B; however, an entity from
entity set B can be associated with at most one entity of entity set A.
 Many-to-one − More than one entity from entity set A can be
associated with at most one entity of entity set B; however, an entity
from entity set B can be associated with more than one entity from
entity set A.
 Many-to-many − One entity from A can be associated with more
than one entity from B and vice versa.
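The four cases can be told apart by counting how often each entity appears in a relationship set. The sketch below is plain Python over (a, b) pairs; the function name and the sample entities are made up. It classifies only the pairs it is given, whereas a mapping cardinality in an ER design is a constraint on every possible state of the relationship set.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (a, b) pairs."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _b in pairs)   # B entities linked to each A entity
    b_counts = Counter(b for _a, b in pairs)   # A entities linked to each B entity
    a_many = any(n > 1 for n in a_counts.values())
    b_many = any(n > 1 for n in b_counts.values())
    if a_many and b_many:
        return "many-to-many"
    if a_many:
        return "one-to-many"
    if b_many:
        return "many-to-one"
    return "one-to-one"

# One department has many employees; each employee has one department.
print(mapping_cardinality([("dept1", "emp1"), ("dept1", "emp2")]))
# Students take many courses and courses have many students.
print(mapping_cardinality([("s1", "c1"), ("s1", "c2"), ("s2", "c1")]))
```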
 Let us now learn how the ER Model is represented by means of an
ER diagram. Any object, for example, entities, attributes of an
entity, relationship sets, and attributes of relationship sets, can be
represented with the help of an ER diagram.
Entity:-
 Entities are represented by means of rectangles. Rectangles are
named with the entity set they represent.

Attributes:-

Attributes are the properties of entities. Attributes are represented
by means of ellipses. Every ellipse represents one attribute and is directly
connected to its entity (rectangle).

If the attributes are composite, they are further divided in a tree-like
structure. Every node is then connected to its attribute. That is,
composite attributes are represented by ellipses that are connected with
an ellipse.

 Multivalued attributes are depicted by a double ellipse.

 Derived attributes are depicted by a dashed ellipse.

Relationship:-

Relationships are represented by a diamond-shaped box. The name of the
relationship is written inside the diamond. All the entities
(rectangles) participating in a relationship are connected to it by a line.

Binary Relationship and Cardinality:-

A relationship in which two entities participate is called a binary
relationship. Cardinality is the number of instances of an entity
that can be associated with the relationship.

 One-to-one − When only one instance of each entity is associated
with the relationship, it is marked as '1:1'. The following image
reflects that only one instance of each entity should be associated
with the relationship. It depicts a one-to-one relationship.

 One-to-many − When one instance of the entity on the left can be
associated with more than one instance of the entity on the right,
the relationship is marked as '1:N'. The following
image reflects that only one instance of the entity on the left and more
than one instance of the entity on the right can be associated with
the relationship. It depicts a one-to-many relationship.

 Many-to-one − When more than one instance of the entity on the left
can be associated with only one instance of the entity on the right,
the relationship is marked as 'N:1'. The following
image reflects that more than one instance of the entity on the left
and only one instance of the entity on the right can be associated
with the relationship. It depicts a many-to-one relationship.

 Many-to-many − The following image reflects that more than one
instance of the entity on the left and more than one instance of the
entity on the right can be associated with the relationship. It depicts
a many-to-many relationship.

Participation Constraints:-

 Total participation − Every entity in the entity set is involved in the
relationship. Total participation is represented by double lines.

 Partial participation − Not all entities in the entity set are involved in
the relationship. Partial participation is represented by single lines.

4.4 CONSTRAINTS
Constraints enforce limits on the data, or the type of data, that can be
inserted into, updated in, or deleted from a table. The purpose of
constraints is to maintain data integrity during insert, update, and
delete operations on a table. In this section we will learn several types
of constraints that can be created in an RDBMS.

Types of constraints

 NOT NULL
 UNIQUE
 DEFAULT
 CHECK
 Key Constraints – PRIMARY KEY, FOREIGN KEY
 Domain constraints
 Mapping constraints
Not Null:-
The NOT NULL constraint makes sure that a column does not hold a NULL
value. When we don’t provide a value for a particular column while inserting
a record into a table, it takes the NULL value by default. By specifying the
NOT NULL constraint, we can be sure that a particular column cannot have
NULL values.
Example:
CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR(35) NOT NULL,
    STU_AGE INT NOT NULL,
    STU_ADDRESS VARCHAR(235),
    PRIMARY KEY (ROLL_NO)
);
Unique:-
The UNIQUE constraint enforces a column or set of columns to have
unique values. If a column has a unique constraint, that column cannot
have duplicate values in the table.
CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR(35) NOT NULL UNIQUE,
    STU_AGE INT NOT NULL,
    STU_ADDRESS VARCHAR(35) UNIQUE,
    PRIMARY KEY (ROLL_NO)
);
Default:-
The DEFAULT constraint provides a default value to a column when
there is no value provided while inserting a record into a table.
CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR(35) NOT NULL,
    STU_AGE INT NOT NULL,
    EXAM_FEE INT DEFAULT 10000,
    STU_ADDRESS VARCHAR(35),
    PRIMARY KEY (ROLL_NO)
);
Check:-
The CHECK constraint specifies the range of values allowed for a
particular column of a table. When this constraint is set on a column, it
ensures that every value stored in that column falls within the
specified range.
CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL CHECK(ROLL_NO > 1000),
    STU_NAME VARCHAR(35) NOT NULL,
    STU_AGE INT NOT NULL,
    EXAM_FEE INT DEFAULT 10000,
    STU_ADDRESS VARCHAR(35),
    PRIMARY KEY (ROLL_NO)
);
In the above example we have set the check constraint on the ROLL_NO
column of the STUDENT table. Now, the ROLL_NO field must have a value
greater than 1000.
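
The NOT NULL, UNIQUE, DEFAULT and CHECK constraints described above can be tried out together. The following is only a sketch, not part of the original notes: it assumes SQLite, driven through Python's standard sqlite3 module (other RDBMS products may report violations differently), and the helper try_insert and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE STUDENT(
        ROLL_NO     INT NOT NULL CHECK(ROLL_NO > 1000),
        STU_NAME    VARCHAR(35) NOT NULL UNIQUE,
        STU_AGE     INT NOT NULL,
        EXAM_FEE    INT DEFAULT 10000,
        STU_ADDRESS VARCHAR(35),
        PRIMARY KEY (ROLL_NO)
    )
""")

def try_insert(sql, params):
    """Hypothetical helper: run an INSERT, return True if it was accepted."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:  # raised by SQLite on a constraint violation
        return False

INSERT = "INSERT INTO STUDENT(ROLL_NO, STU_NAME, STU_AGE) VALUES (?, ?, ?)"

ok = try_insert(INSERT, (1001, "Asha", 19))          # satisfies every constraint
null_name = try_insert(INSERT, (1002, None, 20))     # violates NOT NULL
dup_name = try_insert(INSERT, (1003, "Asha", 21))    # violates UNIQUE
low_roll_no = try_insert(INSERT, (999, "Ravi", 18))  # violates CHECK

# EXAM_FEE was not supplied for roll number 1001, so the DEFAULT is stored.
fee = conn.execute(
    "SELECT EXAM_FEE FROM STUDENT WHERE ROLL_NO = 1001").fetchone()[0]
print(ok, null_name, dup_name, low_roll_no, fee)
```

Only the first row is accepted; the other three inserts are rejected by the DBMS itself, which is the point of declaring constraints in the table definition rather than checking them in application code.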

Key Constraints:-

Primary Key:

A primary key uniquely identifies each record in a table. It must have
unique values and cannot contain nulls. In the example below the
ROLL_NO field is marked as the primary key, which means the ROLL_NO field
cannot have duplicate or null values.
CREATE TABLE STUDENT(
    ROLL_NO INT NOT NULL,
    STU_NAME VARCHAR(35) NOT NULL UNIQUE,
    STU_AGE INT NOT NULL,
    STU_ADDRESS VARCHAR(35) UNIQUE,
    PRIMARY KEY (ROLL_NO)
);
Foreign Key:-
A foreign key is a column, or set of columns, of one table that points to
the primary key of another table. Foreign keys act as a cross-reference
between tables.
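
The notes give no example for foreign keys, so here is a minimal sketch under stated assumptions: SQLite through Python's standard sqlite3 module, and a hypothetical ENROLLMENT table whose ROLL_NO column points to the primary key of STUDENT. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON; most other RDBMS products enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE STUDENT(
        ROLL_NO  INT NOT NULL,
        STU_NAME VARCHAR(35) NOT NULL,
        PRIMARY KEY (ROLL_NO)
    )
""")
# ENROLLMENT is a hypothetical second table; its ROLL_NO refers to STUDENT.
conn.execute("""
    CREATE TABLE ENROLLMENT(
        COURSE_CODE VARCHAR(10) NOT NULL,
        ROLL_NO     INT NOT NULL,
        PRIMARY KEY (COURSE_CODE, ROLL_NO),
        FOREIGN KEY (ROLL_NO) REFERENCES STUDENT(ROLL_NO)
    )
""")
conn.execute("INSERT INTO STUDENT VALUES (1001, 'Asha')")
conn.execute("INSERT INTO ENROLLMENT VALUES ('DBMS', 1001)")  # student exists

try:
    # Roll number 2002 is not in STUDENT, so this row would be an orphan.
    conn.execute("INSERT INTO ENROLLMENT VALUES ('DBMS', 2002)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

print(orphan_accepted)
```

The second insert is rejected because it would cross-reference a student that does not exist.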
Domain constraints:-

Each table has a certain set of columns, and each column allows only one
type of data, based on its data type. The column does not accept
values of any other data type.
A domain constraint is a user-defined data type, and we can define it
like this:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE /
PRIMARY KEY / FOREIGN KEY / CHECK / DEFAULT)

4.5 ENTITY-RELATIONSHIP DIAGRAM


An entity relationship diagram (ERD) shows the relationships of
entity sets stored in a database. An entity in this context is a component
of data. In other words, ER diagrams illustrate the logical structure of
databases.

At first glance an entity relationship diagram looks very much like
a flowchart. It is the specialized symbols, and the meanings of those
symbols, that make it unique.
Common Entity Relationship Diagram Symbols:-
An ER diagram is a means of visualizing how the information a
system produces is related. There are five main components of an ERD:
Entity Relationship Diagram Symbols: Chen notation

Entities

 Entity − An entity is represented by a rectangle which contains the
entity’s name.
 Weak Entity − An entity that cannot be uniquely identified by its
attributes alone. The existence of a weak entity is dependent upon
another entity called the owner entity. The weak entity’s identifier is a
combination of the identifier of the owner entity and the partial key of
the weak entity.
 Associative Entity − An entity used in a many-to-many relationship
(represents an extra table). All relationships for the associative entity
should be many.

Attributes

 Attribute − In the Chen notation, each attribute is represented by an
oval containing the attribute’s name.
 Key attribute − An attribute that uniquely identifies a particular
entity. The name of a key attribute is underscored.
 Multivalued attribute − An attribute that can have many values (there
are many distinct values entered for it in the same column of the
table). A multivalued attribute is depicted by a dual oval.
 Derived attribute − An attribute whose value is calculated (derived)
from other attributes. The derived attribute may or may not be
physically stored in the database. In the Chen notation, this attribute is
represented by a dashed oval.

Relationships

 Strong relationship − A relationship where the child entity is
existence-independent of other entities, and the primary key (PK) of the
child entity does not contain a PK component of the parent entity. A
strong relationship is represented by a single rhombus.
 Weak (identifying) relationship − A relationship where the child
entity is existence-dependent on the parent, and the PK of the child
entity contains a PK component of the parent entity. This relationship is
represented by a double rhombus.

Entity Relationship Diagram Symbols: Crow’s Foot notation

Relationships (Cardinality and Modality)

 Zero or One
 One or More
 One and only One
 Zero or More

Many-to-One

 A one through many notation on one side of a relationship and a one
and only one on the other
 A zero through many notation on one side of a relationship and a one
and only one on the other
 A one through many notation on one side of a relationship and a zero
or one notation on the other
 A zero through many notation on one side of a relationship and a zero
or one notation on the other

Many-to-Many

 A zero through many on both sides of a relationship
 A zero through many on one side and a one through many on the other
 A one through many on both sides of a relationship

One-to-One

 A one and only one notation on one side of a relationship and a zero
or one on the other
 A one and only one notation on both sides

4.6 ENTITY-RELATIONSHIP DESIGN ISSUES


Design Issues:-

The notions of an entity set and a relationship set are not precise,
and it is possible to define a set of entities and the relationships among
them in a number of different ways. In this section, we examine basic
issues in the design of an E-R database schema.

Use of Entity Sets versus Attributes

Consider the entity set employee with attributes employee-name and
telephone-number. It can easily be argued that a telephone is an entity
in its own right with attributes telephone-number and location (the
office where the telephone is located). If we take this point of view, we
must redefine the employee entity set as:
 The employee entity set with attribute employee-name
 The telephone entity set with attributes telephone-number and
location
 The relationship set emp-telephone, which denotes the association
between employees and the telephones that they have
 What, then, is the main difference between these two definitions of
an employee? Treating a telephone as an attribute telephone-number
implies that employees have precisely one telephone number each.
Treating a telephone as an entity telephone permits employees to
have several telephone numbers (including zero) associated with them.
 However, we could instead easily define telephone-number as a
multivalued attribute to allow multiple telephones per employee.
 The main difference then is that treating a telephone as an entity
better models a situation where one may want to keep extra
information about a telephone, such as its location, its type
(mobile, video phone, or plain old telephone), or who shares the
telephone.
 Thus, treating telephone as an entity is more general than treating
it as an attribute and is appropriate when the generality may be
useful.
 In contrast, it would not be appropriate to treat the
attribute employee-name as an entity; it is difficult to argue
that employee-name is an entity in its own right (in contrast to the
telephone).
 Thus, it is appropriate to have employee-name as an attribute of
the employee entity set.
 Two natural questions thus arise: What constitutes an attribute, and
what constitutes an entity set? Unfortunately, there are no simple
answers.
 The distinctions mainly depend on the structure of the real-world
enterprise being modeled, and on the semantics associated with the
attribute in question.
 A common mistake is to use the primary key of an entity set as an
attribute of another entity set, instead of using a relationship.
 For example, it is incorrect to model customer-id as an attribute
of loan even if each loan had only one customer.
 The relationship borrower is the correct way to represent the
connection between loans and customers, since it makes their
connection explicit, rather than implicit via an attribute.
 Another related mistake that people sometimes make is to
designate the primary key attributes of the related entity sets as
attributes of the relationship set. This should not be done, since the
primary key attributes are already implicit in the relationship.
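
The telephone-as-entity design discussed above can be sketched as three tables. This is an illustration only, assuming SQLite through Python's standard sqlite3 module; the column sizes, the choice of employee-name as key, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Telephone modeled as its own entity set, with extra information (location).
    CREATE TABLE employee(
        employee_name VARCHAR(35) PRIMARY KEY
    );
    CREATE TABLE telephone(
        telephone_number VARCHAR(15) PRIMARY KEY,
        location         VARCHAR(35)
    );
    -- Relationship set emp-telephone: which employees have which telephones.
    CREATE TABLE emp_telephone(
        employee_name    VARCHAR(35) REFERENCES employee(employee_name),
        telephone_number VARCHAR(15) REFERENCES telephone(telephone_number),
        PRIMARY KEY (employee_name, telephone_number)
    );
    INSERT INTO employee  VALUES ('Jones'), ('Smith'), ('Hayes');
    INSERT INTO telephone VALUES ('555-0101', 'Office 12'), ('555-0102', 'Lobby');
    INSERT INTO emp_telephone VALUES
        ('Jones', '555-0101'), ('Jones', '555-0102'),  -- Jones has two telephones
        ('Smith', '555-0102');                         -- Smith shares one of them
""")

# Count telephones per employee; Hayes has none, which this design allows.
phones_per_employee = dict(conn.execute("""
    SELECT e.employee_name, COUNT(t.telephone_number)
    FROM employee e LEFT JOIN emp_telephone t USING (employee_name)
    GROUP BY e.employee_name
"""))
print(phones_per_employee)
```

An employee can have zero, one, or several telephones, a telephone can be shared, and the location is recorded once per telephone. None of this is possible when telephone-number is a single-valued attribute of employee.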

Use of Entity Sets versus Relationship Sets

 It is not always clear whether an object is best expressed by an
entity set or a relationship set.
 We assumed that a bank loan is modeled as an entity. An alternative
is to model a loan not as an entity, but rather as a relationship
between customers and branches, with loan-number and amount as
descriptive attributes.
 Each loan is represented by a relationship between a customer and
a branch.
 If every loan is held by exactly one customer and is associated with
exactly one branch, we may find satisfactory the design where a
loan is represented as a relationship.
 However, with this design, we cannot represent conveniently a
situation in which several customers hold a loan jointly.
 To handle such a situation, we must define a separate relationship
for each holder of the joint loan.
 Then, we must replicate the values for the descriptive
attributes loan-number and amount in each such relationship.
 Each such relationship must, of course, have the same value for the
descriptive attributes loan-number and amount.
Two problems arise as a result of the replication:
 The data are stored multiple times, wasting storage space, and
 Updates potentially leave the data in an inconsistent state, where
the values differ in two relationships for attributes that are supposed
to have the same value.
 The issue of how to avoid such replication is treated formally
by normalization theory.
 The problem of replication of the attributes loan-number and amount
is absent in the original design.
 One possible guideline in determining whether to use an entity set
or a relationship set is to designate a relationship set to describe
an action that occurs between entities.
 This approach can also be useful in deciding whether certain
attributes may be more appropriately expressed as relationships.
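
The replication problem with joint loans can be made concrete. The sketch below assumes SQLite through Python's standard sqlite3 module; the table names loan_rel, loan and borrower, the keys chosen for them, and the sample loan L-15 are hypothetical illustrations of the two designs, not definitions from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design 1: loan as a relationship between customer and branch.
    -- loan_number and amount are descriptive attributes, repeated per holder.
    CREATE TABLE loan_rel(
        customer_name VARCHAR(35),
        branch_name   VARCHAR(35),
        loan_number   VARCHAR(10),
        amount        INT,
        PRIMARY KEY (customer_name, branch_name, loan_number)
    );
    INSERT INTO loan_rel VALUES
        ('Jones', 'Perryridge', 'L-15', 1500),
        ('Smith', 'Perryridge', 'L-15', 1500);   -- joint loan: amount stored twice

    -- Design 2: loan as an entity set; borrower links customers to loans.
    CREATE TABLE loan(
        loan_number VARCHAR(10) PRIMARY KEY,
        amount      INT
    );
    CREATE TABLE borrower(
        customer_name VARCHAR(35),
        loan_number   VARCHAR(10) REFERENCES loan(loan_number),
        PRIMARY KEY (customer_name, loan_number)
    );
    INSERT INTO loan VALUES ('L-15', 1500);
    INSERT INTO borrower VALUES ('Jones', 'L-15'), ('Smith', 'L-15');
""")

# An update that reaches only one holder's row leaves Design 1 inconsistent.
conn.execute("UPDATE loan_rel SET amount = 1300 WHERE customer_name = 'Jones'")
amounts_design1 = {row[0] for row in conn.execute(
    "SELECT amount FROM loan_rel WHERE loan_number = 'L-15'")}

# In Design 2 the amount exists in exactly one place, so it cannot disagree.
conn.execute("UPDATE loan SET amount = 1300 WHERE loan_number = 'L-15'")
amounts_design2 = {row[0] for row in conn.execute(
    "SELECT l.amount FROM borrower b JOIN loan l USING (loan_number)")}

print(amounts_design1, amounts_design2)
```

After the partial update, the first design reports two different amounts for the same loan, while the second design still reports one.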
Binary versus n-ary Relationship Sets

 Relationships in databases are often binary. Some relationships that
appear to be nonbinary could actually be better represented by
several binary relationships.
 For instance, one could create a ternary relationship parent, relating
a child to his/her mother and father.
 However, such a relationship could also be represented by two
binary relationships, mother and father, relating a child to his/her
mother and father separately.
 Using the two relationships mother and father allows us to record a
child’s mother, even if we are not aware of the father’s identity; a
null value would be required if the ternary relationship parent is
used.
 Using binary relationship sets is preferable in this case.
 In fact, it is always possible to replace a nonbinary (n-ary, for n > 2)
relationship set by a number of distinct binary relationship sets.
 For simplicity, consider the abstract ternary (n = 3) relationship
set R, relating entity sets A, B, and C.
We replace the relationship set R by an entity set E, and create
three relationship sets:
• R_A, relating E and A
• R_B, relating E and B
• R_C, relating E and C
If the relationship set R had any attributes, these are assigned to
entity set E; further, a special identifying attribute is created for E (since it
must be possible to distinguish different entities in an entity set on the
basis of their attribute values). For each relationship (a_i, b_i, c_i) in the
relationship set R, we create a new entity e_i in the entity set E. Then, in
each of the three new relationship sets, we insert a relationship as follows:
• (e_i, a_i) in R_A
• (e_i, b_i) in R_B
• (e_i, c_i) in R_C
 We can generalize this process in a straightforward manner to n-ary
relationship sets.
 Thus, conceptually, we can restrict the E-R model to include only
binary relationship sets.
 However, this restriction is not always desirable.
 An identifying attribute may have to be created for the entity set
created to represent the relationship set.
 This attribute, along with the extra relationship sets required,
increases the complexity of the design and (as we shall see in
Section 2.9) overall storage requirements.
 An n-ary relationship set shows more clearly that several entities
participate in a single relationship.
 There may not be a way to translate constraints on the ternary
relationship into constraints on the binary relationships.
 For example, consider a constraint that says that R is many-to-one
from A, B to C; that is, each pair of entities from A and B is
associated with at most one C entity. This constraint cannot be
expressed by using cardinality constraints on the relationship
sets R_A, R_B, and R_C.
 Consider the ternary relationship set works-on, relating employee,
branch, and job. We cannot directly split works-on into binary
relationships between employee and branch and between employee
and job.
 If we did so, we would be able to record that Jones is a manager and
an auditor and that Jones works at Perryridge and Downtown;
however, we would not be able to record that Jones is a manager at
Perryridge and an auditor at Downtown, but is not an auditor at
Perryridge or a manager at Downtown.
 The relationship set works-on can be split into binary relationships
by creating a new entity set as described above. However, doing so
would not be very natural.
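
The replacement of a ternary relationship set R by an entity set E and three binary relationship sets, and the information lost by a direct split of works-on, can both be checked with a short sketch. It uses plain Python sets; the function names decompose_ternary and recompose and the generated identifiers e1, e2 are hypothetical, and the Jones data follows the example in the text.

```python
def decompose_ternary(R):
    """Replace ternary relationship set R (a set of (a, b, c) triples)
    by an entity set E and three binary relationship sets RA, RB, RC."""
    E, RA, RB, RC = set(), set(), set(), set()
    for i, (a, b, c) in enumerate(sorted(R), start=1):
        e = f"e{i}"          # special identifying attribute created for E
        E.add(e)
        RA.add((e, a))
        RB.add((e, b))
        RC.add((e, c))
    return E, RA, RB, RC

def recompose(E, RA, RB, RC):
    """Rebuild the original triples by following each e through RA, RB, RC."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return {(a_of[e], b_of[e], c_of[e]) for e in E}

# works-on relates employee, branch, and job.
works_on = {("Jones", "Perryridge", "manager"),
            ("Jones", "Downtown", "auditor")}

E, RA, RB, RC = decompose_ternary(works_on)
round_trip = recompose(E, RA, RB, RC)

# A direct split into employee-branch and employee-job pairs, by contrast,
# can no longer tell which job goes with which branch.
emp_branch = {(emp, branch) for emp, branch, _ in works_on}
emp_job = {(emp, job) for emp, _, job in works_on}
direct_join = {(emp, branch, job)
               for emp, branch in emp_branch
               for emp2, job in emp_job if emp == emp2}

print(round_trip == works_on, len(direct_join))
```

Going through E recovers exactly the two original facts, whereas joining the directly split pairs yields four combinations, including the spurious "Jones is an auditor at Perryridge".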

Placement of Relationship Attributes

 The cardinality ratio of a relationship can affect the placement of
relationship attributes.
 Thus, attributes of one-to-one or one-to-many relationship sets can
be associated with one of the participating entity sets, rather than
with the relationship set.
 For instance, let us specify that depositor is a one-to-many
relationship set such that one customer may have several accounts,
but each account is held by only one customer.
 In this case, the attribute access-date, which specifies when the
customer last accessed that account, could be associated with
the account entity set. To keep the figure simple, only some of the
attributes of the two entity sets are shown.
 Since each account entity participates in a relationship with at most
one instance of customer, making this attribute designation would
have the same meaning as would placing access-date with
the depositor relationship set.
 Attributes of a one-to-many relationship set can be repositioned to
only the entity set on the “many” side of the relationship.
 For one-to-one relationship sets, on the other hand, the relationship
attribute can be associated with either one of the participating
entities.
 The design decision of where to place descriptive attributes in such
cases (as a relationship or entity attribute) should reflect the
characteristics of the enterprise being modeled.
 The designer may choose to retain access-date as an attribute
of depositor to express explicitly that an access occurs at the point
of interaction between the customer and account entity sets.
 The choice of attribute placement is more clear-cut for many-to-many
relationship sets.
 Returning to our example, let us specify the perhaps more realistic
case that depositor is a many-to-many relationship set expressing
that a customer may have one or more accounts, and that an
account can be held by one or more customers.
 If we are to express the date on which a specific customer last
accessed a specific account, access-date must be an attribute of
the depositor relationship set, rather than either one of the
participating entities.
 If access-date were an attribute of account, for instance, we could
not determine which customer made the most recent access to a
joint account.
 When an attribute is determined by the combination of participating
entity sets, rather than by either entity separately, that attribute
must be associated with the many-to-many relationship set. The
placement of access-date as a relationship attribute of depositor is
an example.
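
The many-to-many case can be sketched as follows, assuming SQLite through Python's standard sqlite3 module. The table layout, the account number A-101, and the dates are hypothetical sample data chosen only to show why access-date has to sit on the depositor relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer(customer_name  VARCHAR(35) PRIMARY KEY);
    CREATE TABLE account (account_number VARCHAR(10) PRIMARY KEY, balance INT);
    -- Many-to-many depositor: access_date belongs to the (customer, account) pair.
    CREATE TABLE depositor(
        customer_name  VARCHAR(35) REFERENCES customer(customer_name),
        account_number VARCHAR(10) REFERENCES account(account_number),
        access_date    DATE,
        PRIMARY KEY (customer_name, account_number)
    );
    INSERT INTO customer VALUES ('Jones'), ('Smith');
    INSERT INTO account  VALUES ('A-101', 500);
    INSERT INTO depositor VALUES
        ('Jones', 'A-101', '2024-05-24'),
        ('Smith', 'A-101', '2024-06-03');   -- joint account, separate access dates
""")

# Who made the most recent access to the joint account A-101?
latest = conn.execute("""
    SELECT customer_name, access_date
    FROM depositor
    WHERE account_number = 'A-101'
    ORDER BY access_date DESC
    LIMIT 1
""").fetchone()
print(latest)
```

Because each (customer, account) pair carries its own access date, the query can name the customer who accessed the joint account last. A single access_date column on account could store only one date and no customer.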

4.7 EXTENDED E-R FEATURES


Although the basic E-R concepts can model most database features,
some aspects of a database may be more aptly expressed by certain
extensions to the basic E-R model. In this section, we discuss the
extended E-R features of specialization, generalization, higher- and lower-
level entity sets, attribute inheritance, and aggregation.

Specialization

An entity set may include subgroupings of entities that are distinct
in some way from other entities in the set. For instance, a subset of
entities within an entity set may have attributes that are not shared by all
the entities in the entity set. The E-R model provides a means for
representing these distinctive entity groupings.

Consider an entity set person, with attributes name, street, and city.
A person may be further classified as one of the following:

• customer
• employee

Each of these person types is described by a set of attributes that
includes all the attributes of entity set person plus possibly additional
attributes. For example, customer entities may be described further by
the attribute customer-id, whereas employee entities may be described
further by the attributes employee-id and salary. The process of
designating subgroupings within an entity set is called specialization.
The specialization of person allows us to distinguish among persons
according to whether they are employees or customers.

As another example, suppose the bank wishes to divide accounts
into two categories, checking account and savings account. Savings
accounts need a minimum balance, but the bank may set interest rates
differently for different customers, offering better rates to favored
customers. Checking accounts have a fixed interest rate, but offer an
overdraft facility; the overdraft amount on a checking account must be
recorded.

The bank could then create two specializations of account,
namely savings-account and checking-account. As we saw earlier, account
entities are described by the attributes account-number and balance. The
entity set savings-account would have all the attributes of account and an
additional attribute interest-rate. The entity set checking-account would
have all the attributes of account, and an additional
attribute overdraft-amount.

We can apply specialization repeatedly to refine a design scheme.
For instance, bank employees may be further classified as one of the
following:

• officer
• teller
• secretary

Each of these employee types is described by a set of attributes
that includes all the attributes of entity set employee plus additional
attributes. For example, officer entities may be described further by the
attribute office-number, teller entities by the attributes station-number
and hours-per-week, and secretary entities by the attribute
hours-per-week. Further, secretary entities may participate in a
relationship secretary-for, which identifies which employees are assisted
by a secretary.
An entity set may be specialized by more than one distinguishing
feature. In our example, the distinguishing feature among employee
entities is the job the employee performs. Another, coexistent,
specialization could be based on whether the person is a temporary
(limited-term) employee or a permanent employee, resulting in the entity
sets temporary-employee and permanent-employee. When more than one
specialization is formed on an entity set, a particular entity may belong to
multiple specializations. For instance, a given employee may be a
temporary employee who is a secretary.

In terms of an E-R diagram, specialization is depicted by
a triangle component labeled ISA. The label ISA stands for “is a” and
represents, for example, that a customer “is a” person. The ISA
relationship may also be referred to as a superclass-subclass
relationship. Higher- and lower-level entity sets are depicted as
regular entity sets, that is, as rectangles containing the name of the
entity set.

Generalization

The refinement from an initial entity set into successive levels of
entity subgroupings represents a top-down design process in which
distinctions are made explicit. The design process may also proceed in
a bottom-up manner, in which multiple entity sets are synthesized into a
higher-level entity set on the basis of common features. The database
designer may have first identified a customer entity set with the
attributes name, street, city, and customer-id, and an employee entity set
with the attributes name, street, city, employee-id, and salary.

There are similarities between the customer entity set and
the employee entity set in the sense that they have several attributes in
common. This commonality can be expressed by generalization, which
is a containment relationship that exists between a higher-level entity set
and one or more lower-level entity sets. In our example, person is the
higher-level entity set and customer and employee are lower-level entity
sets. Higher- and lower-level entity sets also may be designated by the
terms superclass and subclass, respectively. The person entity set is
the superclass of the customer and employee subclasses.

For all practical purposes, generalization is a simple inversion of
specialization. We will apply both processes, in combination, in the course
of designing the E-R schema for an enterprise. In terms of the E-R diagram
itself, we do not distinguish between specialization and generalization.
New levels of entity representation will be distinguished (specialization)
or synthesized (generalization) as the design schema comes to express
fully the database application and the user requirements of the database.
Differences in the two approaches may be characterized by their starting
point and overall goal.

Specialization stems from a single entity set; it emphasizes
differences among entities within the set by creating distinct lower-level
entity sets. These lower-level entity sets may have attributes, or may
participate in relationships, that do not apply to all the entities in the
higher-level entity set. Indeed, the reason a designer applies
specialization is to represent such distinctive features. If customer and
employee neither have attributes that person entities do not have nor
participate in different relationships than those in which person entities
participate, there would be no need to specialize the person entity set.

Generalization proceeds from the recognition that a number of
entity sets share some common features (namely, they are described by
the same attributes and participate in the same relationship sets). On the
basis of their commonalities, generalization synthesizes these entity sets
into a single, higher-level entity set. Generalization is used to emphasize
the similarities among lower-level entity sets and to hide the differences;
it also permits an economy of representation in that shared attributes are
not repeated.

Attribute Inheritance

A crucial property of the higher- and lower-level entities created by
specialization and generalization is attribute inheritance. The attributes
of the higher-level entity sets are said to be inherited by the lower-level
entity sets. For example, customer and employee inherit the attributes
of person. Thus, customer is described by its name, street,
and city attributes, and additionally a customer-id attribute; employee is
described by its name, street, and city attributes, and
additionally employee-id and salary attributes.

A lower-level entity set (or subclass) also inherits participation in the
relationship sets in which its higher-level entity (or superclass)
participates. The officer, teller, and secretary entity sets can participate in
the works-for relationship set, since the superclass employee participates
in the works-for relationship. Attribute inheritance applies through all tiers
of lower-level entity sets. The above entity sets can participate in any
relationships in which the person entity set participates.
Whether a given portion of an E-R model was arrived at by
specialization or generalization, the outcome is basically the same:
 A higher-level entity set with attributes and relationships that
apply to all of its lower-level entity sets
 Lower-level entity sets with distinctive features that apply only
within a particular lower-level entity set
In what follows, although we often refer to only generalization, the
properties that we discuss belong fully to both processes.
In a hierarchy, a given entity set may be involved as a lower-level
entity set in only one ISA relationship; that is, entity sets in this diagram
have only single inheritance. If an entity set is a lower-level entity set in
more than one ISA relationship, then the entity set has multiple
inheritance, and the resulting structure is said to be a lattice.
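
Attribute inheritance through several tiers resembles class inheritance in a programming language. The following sketch uses Python dataclasses purely as an analogy: the class names mirror the entity sets in the text, while the helper attribute_names and the attribute types are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:                 # higher-level entity set (superclass)
    name: str
    street: str
    city: str

@dataclass
class Customer(Person):       # lower-level entity set: customer ISA person
    customer_id: str

@dataclass
class Employee(Person):       # lower-level entity set: employee ISA person
    employee_id: str
    salary: int

@dataclass
class Officer(Employee):      # second tier: officer ISA employee ISA person
    office_number: str

def attribute_names(entity_set):
    """List the attributes of an entity set, inherited ones first."""
    return [f.name for f in fields(entity_set)]

print(attribute_names(Customer))
print(attribute_names(Officer))
```

Customer lists name, street, and city before its own customer_id, and Officer additionally picks up employee_id and salary from Employee, which shows inheritance applying through all tiers.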
Constraints on Generalizations

To model an enterprise more accurately, the database designer may
choose to place certain constraints on a particular generalization. One
type of constraint involves determining which entities can be members of
a given lower-level entity set. Such membership may be one of the
following:

• Condition-defined. In condition-defined lower-level entity sets,
membership is evaluated on the basis of whether or not an entity satisfies
an explicit condition or predicate. For example, assume that the
higher-level entity set account has the attribute account-type. All account
entities are evaluated on the defining account-type attribute. Only those
entities that satisfy the condition account-type = “savings account” are
allowed to belong to the lower-level entity set savings-account. All
entities that satisfy the condition account-type = “checking account” are
included in checking-account. Since all the lower-level entities are
evaluated on the basis of the same attribute (in this case, on
account-type), this type of generalization is said to be attribute-defined.

• User-defined. User-defined lower-level entity sets are not constrained
by a membership condition; rather, the database user assigns entities to a
given entity set. For instance, let us assume that, after 3 months of
employment, bank employees are assigned to one of four work teams. We
therefore represent the teams as four lower-level entity sets of the
higher-level employee entity set. A given employee is not assigned to a
specific team entity automatically on the basis of an explicit defining
condition. Instead, the user in charge of this decision makes the team
assignment on an individual basis. The assignment is implemented by an
operation that adds an entity to an entity set.

A second type of constraint relates to whether or not entities may
belong to more than one lower-level entity set within a single
generalization. The lower-level entity sets may be one of the following:

• Disjoint. A disjointness constraint requires that an entity belong to no
more than one lower-level entity set. In our example, an account entity
can satisfy only one condition for the account-type attribute; an entity can
be either a savings account or a checking account, but cannot be both.

• Overlapping. In overlapping generalizations, the same entity may
belong to more than one lower-level entity set within a single
generalization. For an illustration, consider the employee work team
example, and assume that certain managers participate in more than one
work team. A given employee may therefore appear in more than one of
the team entity sets that are lower-level entity sets of employee. Thus,
the generalization is overlapping.

As another example, suppose generalization applied to entity sets
customer and employee leads to a higher-level entity set person. The
generalization is overlapping if an employee can also be a customer.
Lower-level entity overlap is the default case; a disjointness
constraint must be placed explicitly on a generalization (or specialization).
We can note a disjointness constraint in an E-R diagram by adding the
word disjoint next to the triangle symbol.

A final constraint, the completeness constraint on a
generalization or specialization, specifies whether or not an entity in the
higher-level entity set must belong to at least one of the lower-level entity
sets within the generalization/specialization. This constraint may be one of
the following:

• Total generalization or specialization. Each higher-level entity must
belong to a lower-level entity set.

• Partial generalization or specialization. Some higher-level entities
may not belong to any lower-level entity set.

Partial generalization is the default. We can specify total
generalization in an E-R diagram by using a double line to connect the box
representing the higher-level entity set to the triangle symbol. (This
notation is similar to the notation for total participation in a relationship.)

The account generalization is total: All account entities must be


either a savings account or a checking account. Because the higher-level
entity set arrived at through generalization is generally composed of only
those entities in the lower-level entity sets, the completeness constraint
for a generalized higher-level entity set is usually total. When the
generalization is partial, a higher-level entity is not constrained to appear
in a lower-level entity set. The work team entity sets illustrate a partial
specialization. Since employees are assigned to a team only after 3
months on the job, some employee entities may not be members of any of
the lower-level team entity sets.

We may characterize the team entity sets more fully as a partial,


overlapping specialization of employee. The generalization of
checking-account and savings-account into account is a total, disjoint
generalization. The completeness and disjointness constraints, however,
do not depend on each other. Constraint patterns may also be partial-
disjoint and total-overlapping.

We can see that certain insertion and deletion requirements follow


from the constraints that apply to a given generalization or specialization.
For instance, when a total completeness constraint is in place, an entity
inserted into a higher-level entity set must also be inserted into at least
one of the lower-level entity sets. With a condition-defined constraint, all
higher-level entities that satisfy the condition must be inserted into that
lower-level entity set. Finally, an entity that is deleted from a higher-level
entity set also is deleted from all the associated lower-level entity sets to
which it belongs.
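
The insertion requirements above can be checked mechanically. The following is a minimal sketch in Python, not part of the E-R model itself; the function name check_generalization and the account numbers are hypothetical and chosen only for illustration. It reports entities that violate a total (completeness) constraint or a disjointness constraint, and lower-level entities that are missing from the higher-level entity set.

```python
def check_generalization(higher, lower_sets, total=False, disjoint=False):
    """Return a list of constraint violations for one generalization.

    higher     : set of entity identifiers in the higher-level entity set
    lower_sets : dict mapping a lower-level entity set name to its members
    """
    problems = []
    for entity in sorted(higher):
        homes = [name for name, members in lower_sets.items() if entity in members]
        if total and not homes:
            problems.append(f"{entity}: belongs to no lower-level entity set")
        if disjoint and len(homes) > 1:
            problems.append(f"{entity}: belongs to {len(homes)} lower-level entity sets")
    # Every lower-level entity must also appear in the higher-level entity set.
    for name, members in lower_sets.items():
        for entity in sorted(members - higher):
            problems.append(f"{entity}: in {name} but missing from the higher-level set")
    return problems


# Hypothetical data: account is a total, disjoint generalization.
accounts = {"A-101", "A-102", "A-103"}
lower = {"savings-account": {"A-101"}, "checking-account": {"A-102"}}
violations = check_generalization(accounts, lower, total=True, disjoint=True)
print(violations)  # ['A-103: belongs to no lower-level entity set']
```

With a partial, overlapping specialization (the defaults total=False and disjoint=False), the same data would produce no violations.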

Aggregation

One limitation of the E-R model is that it cannot express


relationships among relationships. To illustrate the need for such a
construct, consider the ternary relationship works-on, which we saw
earlier, between employee, branch, and job. Now, suppose we want to
record managers for tasks performed by an employee at a branch; that is,
we want to record managers for (employee, branch, job) combinations.
Let us assume that there is an entity set manager.

One alternative for representing this relationship is to create a
quaternary relationship manages between employee, branch, job, and
manager. (A quaternary relationship is required: a binary relationship
between manager and employee would not permit us to represent which
(branch, job) combinations of an employee are managed by which
manager.) We have omitted the attributes of the entity sets, for simplicity.

It appears that the relationship sets works-on and manages can be
combined into one single relationship set. Nevertheless, we should not
combine them into a single relationship, since some employee, branch,
job combinations may not have a manager.

There is redundant information in the resultant figure, however,


since every employee, branch, job combination in manages is also
in works-on. If the manager were a value rather than a manager entity,
we could instead make manager a multivalued attribute of the
relationship works-on. But doing so makes it more difficult (logically as
well as in execution cost) to find, for example, employee-branch-job
triples for which a manager is responsible. Since the manager is
a manager entity, this alternative is ruled out in any case.

The best way to model a situation such as the one just described is
to use aggregation. Aggregation is an abstraction through which
relationships are treated as higher-level entities. Thus, for our example,
we regard the relationship set works-on (relating the entity
sets employee, branch, and job) as a higher-level entity set called works-
on. Such an entity set is treated in the same manner as is any other entity
set. We can then create a binary relationship manages between works-on
and manager to represent who manages what tasks.
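
As an illustration of aggregation, the sketch below treats each works-on relationship instance as a single unit that the binary relationship manages can refer to. It is only a sketch under stated assumptions: the employee, branch, job, and manager names are hypothetical, and the helper functions assign_manager and tasks_managed_by are invented for this example.

```python
# Hypothetical works-on relationship instances: (employee, branch, job).
works_on = {
    ("Hayes", "Downtown", "teller"),
    ("Hayes", "Perryridge", "auditor"),
    ("Jones", "Downtown", "teller"),
}

# manages is a binary relationship between an aggregated works-on
# instance and a manager entity, stored as (combo, manager) pairs.
manages = set()


def assign_manager(combo, manager):
    """Record that manager manages combo; combo must exist in works_on."""
    if combo not in works_on:
        raise ValueError(f"{combo} is not a works-on relationship instance")
    manages.add((combo, manager))


def tasks_managed_by(manager):
    """Return the (employee, branch, job) combinations a manager is responsible for."""
    return sorted(combo for combo, m in manages if m == manager)


assign_manager(("Hayes", "Downtown", "teller"), "Smith")
assign_manager(("Jones", "Downtown", "teller"), "Smith")

# Some works-on combinations legitimately have no manager.
unmanaged = sorted(works_on - {combo for combo, _ in manages})
print(tasks_managed_by("Smith"))
print(unmanaged)
```

Because manages refers to whole works-on instances, no (employee, branch, job) combination is stored twice, and combinations without a manager simply do not appear in manages.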

Alternative E-R Notations

There is no universal standard for E-R diagram notation, and


different books and E-R diagram software use different notations. An
entity set may be represented as a box with the name outside, and the
attributes listed one below the other within the box. The primary key
attributes are indicated by listing them at the top, with a line separating
them from the other attributes.

Cardinality constraints can be indicated in several different ways.


The labels ∗ and 1 on the edges out of the relationship are sometimes
used for depicting many-to-many, one-to-one, and many-to-one
relationships. The case of one-to-many is symmetric to many-to-one, and
is not shown. In another alternative notation, relationship sets are
represented by lines between entity sets, without diamonds; only binary
relationships can be modeled thus. Cardinality constraints in such a
notation are shown by “crow’s foot” notation.

4.8 RELATIONAL DATABASE DESIGN


The number one goal of relational database design is to, as closely
as possible, develop a database that models some real-world system. This
involves breaking the real-world system into tables and fields and
determining how the tables relate to each other. Although on the surface
this task might appear to be trivial, it can be an extremely cumbersome
process to translate a real-world system into tables and fields.

A properly designed database has many benefits. The processes of
adding, editing, deleting, and retrieving table data are greatly facilitated
by a properly designed database. In addition, reports are easier to build.
Most importantly, the database becomes easy to modify and maintain.

4.9 ATOMIC DOMAIN AND FIRST NORMAL FORM

"Atomic" has never really meant "indivisible", which is why that term is
finally falling out of favor. Loosely speaking, "atomic" means if a value has
component parts, the dbms either ignores the existence of those parts, or
it provides functions to manipulate them. For example, a timestamp value
has these parts.

• Year
• Month
• Day
• Hours
• Minutes
• Seconds
• Milliseconds

That kind of value is obviously divisible, and all database management
systems provide functions to manipulate those parts. They also provide a
way to select a timestamp as a single value. (Which, of course, it is.)
In the looser sense used in the rules below, atomic data is data that the
design has no need to divide further.

Rules of atomicity:-

• Rule 1: a column with atomic data cannot hold several values of the
same type of data in the same column.
• Rule 2: a table with atomic data cannot have several columns that hold
the same type of data.

For example, a fullname column is not atomic, because it can be divided
further into lastname and firstname. A column holding interests could also
be divided further. A column that cannot be divided in this way is known
as atomic.
First normal form (1NF) is a property of a relation in a relational


database. A relation is in first normal form if and only if the domain of
each attribute contains only atomic (indivisible) values, and the value of
each attribute contains only a single value from that domain. The first
definition of the term, in a 1971 conference paper by Edgar Codd, defined
a relation to be in first normal form when none of its domains have any
sets as elements.

First normal form is an essential property of a relation in a relational


database. Database normalization is the process of representing a
database in terms of relations in standard normal forms, where first
normal form is a minimal requirement. First normal form enforces these
criteria:

Eliminate repeating groups in individual tables:-

 Create a separate table for each set of related data.

 Identify each set of related data with a primary key.

Example :-

The following scenario illustrates how a database design might


violate first normal form.

Domains and values

Below is a table that stores the names and
telephone numbers of customers. One requirement, though, is to retain
multiple telephone numbers for some customers. The simplest way of
satisfying this requirement is to allow the "Telephone Number" column in
any given row to contain more than one value:

Customer

| Customer ID | First Name | Surname | Telephone Number                     |
| 123         | Pooja      | Singh   | 555-861-2025, 192-122-1111           |
| 456         | Zhang      | San     | (555) 403-1659 Ext. 53; 182-929-2929 |
| 789         | John       | Doe     | 555-808-9633                         |

Note that the telephone number column simply contains text:

numbers of different formats, and more importantly, more than one

number for two of the customers. We are duplicating related information

in the same column. If we would be satisfied with such arbitrary text, we

would be fine. But it's not arbitrary text at all: we obviously intended this

column to contain telephone number(s). Seen as telephone numbers, the

text is not atomic: it can be subdivided. As well, when seen as telephone

numbers, the text contains more than one number in two of our rows. This

representation of telephone numbers is not in first normal form: our

columns contain non-atomic values, and they contain more than one of

them.

A design that complies with 1NF

To bring the model into the first normal form, we split the strings we

used to hold our telephone number information into "atomic" (i.e.

indivisible) entities: single phone numbers. And we ensure no row

contains more than one phone number.

Note that the "ID" is no longer unique in this solution with duplicated
customers (see the Customer table below). To uniquely identify a row, we
need to use a combination of (ID, Telephone Number).

An alternative design uses two tables. A one-to-many relationship then
exists between the name and the number tables. A row in the "parent"
table, Customer Name, can be associated with many telephone number rows
in the "child" table, Customer Telephone Number, but each telephone
number belongs to one, and only one, customer. It is worth noting that
this two-table design meets the additional requirements for second and
third normal form.

Customer

| Customer ID | First Name | Surname | Telephone Number       |
| 123         | Pooja      | Singh   | 555-861-2025           |
| 123         | Pooja      | Singh   | 192-122-1111           |
| 456         | Zhang      | San     | 182-929-2929           |
| 456         | Zhang      | San     | (555) 403-1659 Ext. 53 |
| 789         | John       | Doe     | 555-808-9633           |
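
The conversion from the multi-valued telephone column to one number per row can be sketched in a few lines of Python. This is an illustrative sketch only: the function name to_first_normal_form is hypothetical, and it assumes that commas and semicolons are the only separators used inside the original column, as in the sample data above.

```python
import re

# The unnormalized rows: the last field may hold several telephone numbers.
raw_rows = [
    (123, "Pooja", "Singh", "555-861-2025, 192-122-1111"),
    (456, "Zhang", "San", "(555) 403-1659 Ext. 53; 182-929-2929"),
    (789, "John", "Doe", "555-808-9633"),
]


def to_first_normal_form(rows):
    """Emit one row per (customer, telephone number) pair."""
    flat = []
    for cust_id, first, last, phones in rows:
        # Assumption: numbers are separated by "," or ";" only.
        for phone in re.split(r"[,;]", phones):
            phone = phone.strip()
            if phone:
                flat.append((cust_id, first, last, phone))
    return flat


flat_rows = to_first_normal_form(raw_rows)
for row in flat_rows:
    print(row)

# (ID, Telephone Number) identifies each row, while ID alone does not.
keys = {(cust_id, phone) for cust_id, _, _, phone in flat_rows}
print(len(flat_rows), len(keys))  # 5 5
```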

Atomicity

Edgar F. Codd's definition of 1NF makes reference to the concept of
'atomicity'. Codd states that the "values in the domains on which each
relation is defined are required to be atomic with respect to the
DBMS." Codd defines an atomic value as one that "cannot be decomposed
into smaller pieces by the DBMS (excluding certain special
functions)", meaning a column should not be divided into parts with more
than one kind of data in it such that what one part means to the DBMS
depends on another part of the same column.

Hugh Darwen and Chris Date have suggested that Codd's concept of

an "atomic value" is ambiguous, and that this ambiguity has led to

widespread confusion about how 1NF should be understood. In particular,

the notion of a "value that cannot be decomposed" is problematic, as it

would seem to imply that few, if any, data types are atomic:
 A character string would seem not to be atomic, as the RDBMS typically
provides operators to decompose it into substrings.
 A fixed-point number would seem not to be atomic, as the RDBMS
typically provides operators to decompose it into integer and fractional
components.
 An ISBN would seem not to be atomic, as it includes language and
publisher identifier.
Date suggests that "the notion of atomicity has no absolute meaning":
a value may be considered atomic for some purposes, but may be
considered an assemblage of more basic elements for other purposes. If
this position is accepted, 1NF cannot be defined with reference to
atomicity. Columns of any conceivable data type (from string types and
numeric types to array types and table types) are then acceptable in a
1NF table—although perhaps not always desirable; for example, it would
be more desirable to separate a Customer Name column into two
separate columns as First Name, Surname.
First normal form, as defined by Chris Date, permits relation-
valued attributes (tables within tables). Date argues that relation-
valued attributes, by means of which a column within a table can contain
a table, are useful in rare cases.
1NF tables as representations of relations

According to Date's definition, a table is in first normal form if and
only if it is "isomorphic to some relation", which means, specifically, that it
satisfies the following five conditions:

1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from
the applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such
as row IDs, object IDs, or hidden timestamps].

Violation of any of these conditions would mean that the table is not
strictly relational, and therefore that it is not in first normal form.

Examples of tables (or views) that would not meet this definition of
first normal form are:

• A table that lacks a unique key. Such a table would be able to
accommodate duplicate rows, in violation of condition 3.
• A view whose definition mandates that results be returned in a
particular order, so that the row-ordering is an intrinsic and
meaningful aspect of the view. This violates condition 1. The tuples
in true relations are not ordered with respect to each other.
• A table with at least one nullable attribute. A nullable attribute
would be in violation of condition 4, which requires every column to
contain exactly one value from its column's domain. It should be
noted, however, that this aspect of condition 4 is controversial. It
marks an important departure from Codd's later vision of
the relational model, which made explicit provision for nulls.

4.10 DECOMPOSITION USING FUNCTIONAL DEPENDENCY


• Constraints on the set of legal relations.
• Require that the value for a certain set of attributes determines
uniquely the value for another set of attributes.
• A functional dependency is a generalization of the notion of a key.
• Let R be a relation schema, with
α ⊆ R and β ⊆ R
• The functional dependency
α → β
holds on R if and only if, for any legal relation r(R), whenever any
two tuples t1 and t2 of r agree on the attributes α, they also agree on
the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
• Example: Consider r(A, B) with the following instance of r.
• On this instance, A → B does NOT hold, but B → A does hold.
• K is a superkey for relation schema R if and only if K → R
• K is a candidate key for R if and only if
K → R, and for no α ⊂ K, α → R

• Functional dependencies allow us to express constraints that cannot
be expressed using superkeys.
• We use functional dependencies to specify constraints on the set of
legal relations. If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
• We say that F holds on R if all legal relations on R satisfy the set of
functional dependencies F.
• A specific instance of a relation schema may satisfy a functional
dependency even if that dependency does not hold on all legal instances.
For example, a specific instance of loan may, by chance, satisfy
amount → customer_name.
• A functional dependency is trivial if it is satisfied by all instances of
a relation.
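
The definition of a functional dependency can be tested directly on a relation instance. The instance of r(A, B) referred to in the example above is not reproduced in these notes, so the sketch below assumes a small hypothetical instance that is consistent with the stated result (A → B does not hold, B → A holds). The function name fd_holds is also an assumption made for this illustration. Note that such a test only shows whether one instance satisfies a dependency, not whether the dependency holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies the dependency lhs -> rhs.

    rows : list of dicts mapping attribute name to value
    lhs  : list of attribute names on the left-hand side
    rhs  : list of attribute names on the right-hand side
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instance of r(A, B); not taken from the original notes.
r = [
    {"A": 1, "B": 4},
    {"A": 1, "B": 5},
    {"A": 3, "B": 7},
]

print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: each B value appears with one A value
```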

4.11 FUNCTIONAL-DEPENDENCY THEORY


• We now consider the formal theory that tells us which functional
dependencies are implied logically by a given set of functional
dependencies.
• We then develop algorithms to generate lossless decompositions
into BCNF and 3NF.
• We then develop algorithms to test if a decomposition is
dependency-preserving.
Closure of a Set of Functional Dependencies

• Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• For example: If A → B and B → C, then we can infer that A → C.
• The set of all functional dependencies logically implied by F is the
closure of F.
• We denote the closure of F by F+.
• We can find all of F+ by applying Armstrong's Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β, and β → γ, then α → γ (transitivity)
These rules are
• sound (generate only functional dependencies that actually hold)
and
• complete (generate all functional dependencies that hold).

Example:

R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }

Some members of F+:
A → H by transitivity from A → B and B → H
AG → I by augmenting A → C with G, to get AG → CG,
and then transitivity with CG → I
CG → HI by augmenting CG → I to infer CG → CGI, and
augmenting CG → H to infer CGI → HI, and then transitivity
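
The members of F+ listed above can be confirmed with a short program. The sketch below does not apply Armstrong's axioms one by one; it uses the attribute-closure technique instead, which is a common shortcut: α → β is in F+ exactly when β is contained in the closure of α under F. The function names closure and implies are hypothetical, and each attribute is assumed to be a single letter so that a string such as "CG" can stand for the set {C, G}.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs is logically implied by fds."""
    return set(rhs) <= closure(lhs, fds)


# F from the example; single-letter attribute names are an assumption of this sketch.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("AG", F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(implies(F, "A", "H"))      # True
print(implies(F, "AG", "I"))     # True
print(implies(F, "CG", "HI"))    # True
print(implies(F, "A", "I"))      # False: I needs G as well
```

Since the closure of AG contains every attribute of R, the same computation also shows that AG is a superkey of R.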
Formal theory of functional dependencies
• Rules for computing with these
• Can be used when showing that a database satisfies normal forms
• And for giving a formal definition of, e.g., dependency-preserving
decomposition
• Also important for decomposition algorithms
• An implied functional dependency is one that follows from other stated
ones. Assume, e.g.,
ID → dept_name
dept_name → budget
Then the following dependency is implied:
ID → budget
• The definitions of the normal forms talk about all functional
dependencies, including the implied ones.
Decomposition using Multivalued Dependencies
 But wait! There’s more! Recall that, at the beginning of this whole
exercise, we established that the ultimate goal of all of this is to
eliminate redundancy.
 Unfortunately, BCNF does not completely eradicate all cases of
redundancy. There isn’t a precise rule that characterizes such a
schema, but a general intuitive notion is for schemas that involve a
relationship among entities which either have multivalued attributes
or are involved in other 1-to-many relationships. Such schemas will
allow copies of data without violating any functional dependencies.
 This final redundancy is dealt with using a new type of constraint,
called a multivalued dependency, and an accompanying normal
form, fourth normal form (4NF).
 To define a multivalued dependency, we begin with the observation
that a functional dependency rules out tuples — it prevents certain
tuples from appearing in a relation. But what about the converse?
What if we want to require the presence of a tuple? That’s where
multivalued dependencies come in.
 Because of this distinction, functional dependencies are sometimes
referred to as equality-generating, and multivalued dependencies
are referred to as tuple-generating.
 Given a relation schema R and attribute sets α ⊆ R and β ⊆ R. A
multivalued dependency of β on α, written as α →→ β, holds on R if,
for any legal relation r(R), ∀ t1, t2 ∈ r | t1[α] = t2[α], ∃ t3, t4 ∈ r such
that:
t1[α] = t2[α] = t3[α] = t4[α]

t3[β] = t1[β]

t3[R − β] = t2[R − β]

t4[β] = t2[β]

t4[R − β] = t1[R − β]

 Think about this as mix-and-match: if α →→ β, then two tuples with a


matching α will have corresponding tuples with the same α such
that the resulting four tuples match all combinations of β and R − β.
 As before, the term trivial is attached to the dependency if it is
satisfied by all possible r(R). For example, α →→ β is trivial if β ⊆ α or
β ∪ α = R — in these cases, tuples play double duty and actually
match themselves (or each other) to fulfill the above conditions, so
all possible relations will indeed follow these multivalued
dependencies.
 Like functional dependencies, multivalued dependencies can be
inferred from each other, and so the notion of closure also applies:
given a set D of functional and multivalued dependencies, D+ is the
set of all functional and multivalued dependencies logically implied
by D.
 Also, every functional dependency is also a multivalued
dependency.
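
The mix-and-match reading of the definition translates almost literally into code. The following is a minimal sketch, assuming a hypothetical function name mvd_holds and a hypothetical employee, skill, language relation. For every pair of tuples t1, t2 that agree on α, it builds t3 (β from t1, everything else from t2) and checks that t3 is present; t4 is covered when the same pair is visited in the opposite order.

```python
def mvd_holds(rows, attrs, alpha, beta):
    """Check whether alpha ->> beta holds on one relation instance.

    rows  : list of dicts mapping attribute name to value
    attrs : all attribute names of the schema R, in a fixed order
    """
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3[beta] = t1[beta] and t3[R - beta] = t2[R - beta]
                t3 = tuple(t1[a] if a in beta else t2[a] for a in attrs)
                if t3 not in present:
                    return False
    return True


ATTRS = ["employee", "skill", "language"]

# Hypothetical data: every skill is paired with every language.
independent = [
    {"employee": "Smith", "skill": "cook", "language": "French"},
    {"employee": "Smith", "skill": "cook", "language": "German"},
    {"employee": "Smith", "skill": "type", "language": "French"},
    {"employee": "Smith", "skill": "type", "language": "German"},
]

# Hypothetical data: two of the four pairings are missing.
partial = independent[0:1] + independent[3:4]

print(mvd_holds(independent, ATTRS, ["employee"], ["skill"]))  # True
print(mvd_holds(partial, ATTRS, ["employee"], ["skill"]))      # False
```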

4.12 DECOMPOSITION USING MULTIVALUED DEPENDENCIES


For readers interested in pursuing the technical background of
fourth normal form a bit further, we mention that fourth normal form is
defined in terms of multivalued dependencies, which correspond to our
independent multi-valued facts. Multivalued dependencies, in turn, are
defined essentially as relationships which accept the "cross-product"
maintenance policy mentioned above. That is, for our example, every one
of an employee's skills must appear paired with every one of his
languages.
It may or may not be obvious to the reader that this is equivalent to
our notion of independence: since every possible pairing must be present,
there is no "information" in the pairings. Such pairings convey information
only if some of them can be absent, that is, only if it is possible that some
employee cannot perform some skill in some language. If all pairings are
always present, then the relationships are really independent.
We should also point out that multivalued dependencies and fourth
normal form apply as well to relationships involving more than two fields.
For example, suppose we extend the earlier example to include projects,
in the following sense:
 An employee uses certain skills on certain projects.
 An employee uses certain languages on certain projects.
If there is no direct connection between the skills and languages that an
employee uses on a project, then we could treat this as two independent
many-to-many relationships of the form EP:S and EP:L, where "EP"
represents a combination of an employee with a project. A record
including employee, project, skill, and language would violate fourth
normal form. Two records, containing fields E,P,S and E,P,L, respectively,
would satisfy fourth normal form.

4.13 MORE NORMAL FORM


First Normal Form:-

First normal form excludes variable repeating fields and groups. This
is not so much a design guideline as a matter of definition. Relational
database theory doesn't deal with records having a variable number of
fields.
Second And Third Normal Forms:-
Second and third normal forms deal with the relationship between
non-key and key fields.

Under second and third normal forms, a non-key field must provide
a fact about the key, the whole key, and nothing but the key. In
addition, the record must satisfy first normal form.
We deal now only with "single-valued" facts. The fact could be a
one-to-many relationship, such as the department of an employee, or a
one-to-one relationship, such as the spouse of an employee. Thus the
phrase "Y is a fact about X" signifies a one-to-one or one-to-many
relationship between Y and X. In the general case, Y might consist of one
or more fields, and so might X. In the following example, QUANTITY is a
fact about the combination of PART and WAREHOUSE.
Second normal form is violated when a non-key field is a fact about
a subset of a key. It is only relevant when the key is composite, i.e.,
consists of several fields. Consider the following inventory record:

PART | WAREHOUSE | QUANTITY |WAREHOUSE-ADDRESS

The key here consists of the PART and WAREHOUSE fields together,
but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. The
basic problems with this design are:
 The warehouse address is repeated in every record that refers to a
part stored in that warehouse.
 If the address of the warehouse changes, every record referring to a
part stored in that warehouse must be updated.
 Because of the redundancy, the data might become inconsistent,
with different records showing different addresses for the same
warehouse.
 If at some point in time there are no parts stored in the warehouse,
there may be no record in which to keep the warehouse's address.
To satisfy second normal form, the record shown above should be
decomposed into (replaced by) the two records:
| PART | WAREHOUSE | QUANTITY |
| WAREHOUSE | WAREHOUSE-ADDRESS |

When a data design is changed in this way, replacing unnormalized
records with normalized records, the process is referred to as
normalization. The term "normalization" is sometimes used relative to a
particular normal form. Thus a set of records may be normalized with
respect to second normal form but not with respect to third.
The normalized design enhances the integrity of the data, by
minimizing redundancy and inconsistency, but at some possible
performance cost for certain retrieval applications. Consider an
application that wants the addresses of all warehouses stocking a certain
part. In the unnormalized form, the application searches one record type.
With the normalized design, the application has to search two record
types, and connect the appropriate pairs.
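
The trade-off described above can be made concrete with a small sketch. The part names, warehouse codes, and addresses below are hypothetical, as is the function addresses_stocking. The unnormalized inventory records are split into the two normalized record types, and the retrieval then has to connect them, while an address change touches a single record.

```python
# Hypothetical unnormalized records: (PART, WAREHOUSE, QUANTITY, WAREHOUSE-ADDRESS).
inventory = [
    ("bolt", "W1", 100, "12 Dock Rd"),
    ("nut", "W1", 250, "12 Dock Rd"),
    ("bolt", "W2", 40, "7 Quay St"),
]

# Normalized record type 1: (PART, WAREHOUSE) -> QUANTITY.
stock = {(part, wh): qty for part, wh, qty, _ in inventory}

# Normalized record type 2: WAREHOUSE -> WAREHOUSE-ADDRESS (stored once per warehouse).
warehouse_address = {wh: addr for _, wh, _, addr in inventory}


def addresses_stocking(part):
    """Addresses of all warehouses stocking `part`; needs both record types."""
    return sorted(warehouse_address[wh] for (p, wh) in stock if p == part)


print(addresses_stocking("bolt"))  # ['12 Dock Rd', '7 Quay St']

# An address change is now a single update, so no inconsistency can arise.
warehouse_address["W1"] = "99 New Rd"
print(addresses_stocking("nut"))   # ['99 New Rd']
```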
Third Normal Form:-

Third normal form is violated when a non-key field is a fact about


another non-key field, as in

| EMPLOYEE | DEPARTMENT | LOCATION |

The EMPLOYEE field is the key. If each department is located in one


place, then the LOCATION field is a fact about the DEPARTMENT -- in
addition to being a fact about the EMPLOYEE. The problems with this
design are the same as those caused by violations of second normal form:
 The department's location is repeated in the record of every
employee assigned to that department.
 If the location of the department changes, every such record must
be updated.
 Because of the redundancy, the data might become inconsistent,
with different records showing different locations for the same
department.
 If a department has no employees, there may be no record in which
to keep the department's location.
To satisfy third normal form, the record shown above should be
decomposed into the two records:
| EMPLOYEE | DEPARTMENT |
| DEPARTMENT | LOCATION |
To summarize, a record is in second and third normal forms if every
field is either part of the key or provides a (single-valued) fact about
exactly the whole key and nothing else.
Fourth And Fifth Normal Forms:-
Fourth [5] and fifth [6] normal forms deal with multi-valued facts.
The multi-valued fact may correspond to a many-to-many relationship, as
with employees and skills, or to a many-to-one relationship, as with the
children of an employee (assuming only one parent is an employee). By
"many-to-many" we mean that an employee may have several skills, and
a skill may belong to several employees.
Note that we look at the many-to-one relationship between children
and fathers as a single-valued fact about a child but a multi-valued fact
about a father.
Fifth Normal Form:-
Fifth normal form deals with cases where information can be
reconstructed from smaller pieces of information that can be maintained
with less redundancy. Second, third, and fourth normal forms also serve
this purpose, but fifth normal form generalizes to cases not covered by
the others.
We will not attempt a comprehensive exposition of fifth normal form, but illustrate the
central concept with a commonly used example, namely one involving agents, companies,
and products. If agents represent companies, companies make products, and agents sell
products, then we might want to keep a record of which agent sells which product for which
company. This information could be kept in one record type with three fields:
| AGENT | COMPANY | PRODUCT |
| Smith | Ford    | car     |
| Smith | GM      | truck   |

This form is necessary in the general case. For example, although


agent Smith sells cars made by Ford and trucks made by GM, he does not
sell Ford trucks or GM cars. Thus we need the combination of three fields
to know which combinations are valid and which are not.
But suppose that a certain rule was in effect: if an agent sells a certain product, and he
represents a company making that product, then he sells that product for that company.
| AGENT | COMPANY | PRODUCT |
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |

In this case, it turns out that we can reconstruct all the true facts
from a normalized form consisting of three separate record types, each
containing two fields:

| AGENT | COMPANY |    | COMPANY | PRODUCT |    | AGENT | PRODUCT |
| Smith | Ford    |    | Ford    | car     |    | Smith | car     |
| Smith | GM      |    | Ford    | truck   |    | Smith | truck   |
| Jones | Ford    |    | GM      | car     |    | Jones | car     |
                        | GM      | truck   |
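
The claim that the three two-field record types reconstruct all the true facts can be checked by joining them back together. The sketch below is illustrative only; the function names project and rejoin are hypothetical. It also includes a second, hypothetical data set (not from the notes) in which the rule does not hold, to show that the same decomposition then produces a spurious record.

```python
def project(records):
    """Split (agent, company, product) records into the three two-field record types."""
    agent_company = {(a, c) for a, c, p in records}
    company_product = {(c, p) for a, c, p in records}
    agent_product = {(a, p) for a, c, p in records}
    return agent_company, company_product, agent_product


def rejoin(agent_company, company_product, agent_product):
    """Join the three record types back into (agent, company, product) records."""
    return {
        (a, c, p)
        for a, c in agent_company
        for c2, p in company_product
        if c == c2 and (a, p) in agent_product
    }


# The five records from the example in which the rule is in effect.
with_rule = {
    ("Smith", "Ford", "car"),
    ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"),
    ("Smith", "GM", "truck"),
    ("Jones", "Ford", "car"),
}
print(rejoin(*project(with_rule)) == with_rule)  # True

# Hypothetical data where the rule does not hold: Smith does not sell GM cars.
without_rule = {
    ("Smith", "Ford", "car"),
    ("Smith", "GM", "truck"),
    ("Jones", "GM", "car"),
}
spurious = rejoin(*project(without_rule)) - without_rule
print(spurious)  # {('Smith', 'GM', 'car')}
```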

UNIT - V
5.1 DATABASE SYSTEM ARCHITECTURE
The architecture of a database system is greatly influenced by the
underlying computer system on which it runs, in particular by such
aspects of computer architecture as networking, parallelism, and
distribution.
• Networking of computers allows some tasks to be executed on a server
system, and some tasks to be executed on client systems. This division of
work has led to client-server database systems.
• Parallel processing within a computer system allows database
system activities to be sped up, allowing faster response to transactions,
as well as more transactions per second. The need for parallel query
processing has led to parallel database systems.
• Keeping multiple copies of the database across different sites also
allows large organizations to continue their database operations even
when one site is affected by a natural disaster. Distributed database
systems handle geographically or administratively distributed data
spread across multiple database systems.
5.2 CENTRALIZED AND CLIENT-SERVER ARCHITECTURES
Centralized database systems are those that run on a single
computer system and do not interact with other computer systems.
Client-server systems have functionality split between a server system
and multiple client systems.
Centralized Systems:-

A general-purpose computer system consists of one to a few CPUs


and a number of device controllers that are connected through a common
bus that provides access to shared memory.

[Figure: A centralized computer system. Labeled components: disks,
printer, tape drives; CPU, disk controller, printer controller, tape-drive
controller; system bus; memory controller; memory.]

The CPUs have local cache memories that store local copies of parts
of the memory, to speed up access to data. Each device controller is in
charge of a specific type of device.

Personal computers and workstations fall into the category of
single-user systems. A typical single-user system is a desktop unit used
by a single person, usually with only one CPU and one or two hard disks,
and usually only one person using the machine at a time. A multi-user
system has more disks, more memory, multiple CPUs, and a multi-user
operating system. It serves a large number of users who are connected to
the system via terminals.

Database systems designed for use by single users usually do not
provide many of the facilities that a multi-user database provides.

Although general-purpose computer systems have multiple processors,
they have coarse-granularity parallelism, with only a few processors, all
sharing the main memory. Such systems support a higher throughput; that
is, they allow a greater number of transactions to run per second,
although individual transactions do not run any faster.

Databases designed for single-processor machines provide multitasking,
allowing multiple processes to run on the same processor in a time-shared
manner, giving a view to the user of multiple processes running in
parallel.

In contrast, machines with fine-granularity parallelism have a large
number of processors, and database systems running on such machines
attempt to parallelize single tasks submitted by users.

Client-Server Systems:-

The centralized systems act as server systems that satisfy requests
generated by client systems. The figure below shows the general structure
of a client-server system.

[Figure: General structure of a client-server system. Labeled components:
four clients, a network, and a server.]

The database functionality can be broadly divided into two parts, the
front end and the back end, as in the following figure:

[Figure: Front-end and back-end functionality. Front end: SQL user
interface, forms interface, report writer, graphical interface. Interface
between the two: SQL + API. Back end: SQL engine.]

The back end manages access structures, query evaluation and
optimization, concurrency control, and recovery. The front end consists of
tools such as forms, report writers, and graphical user interface facilities.
The interface between the front end and back end is through SQL, or
through an application program interface.

Application development tools are used to construct user interfaces;
they provide graphical tools that can be used to construct interfaces
without any programming. Transaction-processing systems provide a
transactional remote procedure call interface to connect clients with a
server.

5.3 SERVER SYSTEM ARCHITECTURE


The server system can be broadly categorized as transaction
servers and data servers.
Transaction–Server:-
Transaction-server systems, also called query-server systems, provide an
interface to which clients can send requests to perform an action, in
response to which they execute the action and send back results to the
client. The requests may be specified by using SQL, or through a
specialized application program interface.

Data-Server Systems:-

Data-server systems allow clients to interact with the servers by
making requests to read or update data, in units such as files or pages.
For example, file servers provide a file-system interface where clients
can create, update, read, and delete files.
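
The difference between the two kinds of server can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names TransactionServer and DataServer, the page size, and the account data are all hypothetical, and a real server would communicate over a network rather than through method calls. A transaction server evaluates the request itself and ships back only the results, whereas a data server ships whole pages and leaves the filtering to the client.

```python
PAGE_SIZE = 2  # rows per page; an arbitrary value for this sketch

# Hypothetical account rows: (account number, balance).
ACCOUNTS = [
    ("A-101", 500),
    ("A-102", 900),
    ("A-103", 50),
    ("A-104", 1200),
    ("A-105", 700),
]


class TransactionServer:
    """Executes the requested action on the server and returns only the results."""

    def __init__(self, rows):
        self._rows = list(rows)

    def execute(self, min_balance):
        return [row for row in self._rows if row[1] >= min_balance]


class DataServer:
    """Ships data to the client in units of pages; the client does the work."""

    def __init__(self, rows):
        self._pages = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]

    def page_count(self):
        return len(self._pages)

    def read_page(self, n):
        return list(self._pages[n])


def client_side_query(server, min_balance):
    """Fetch every page from a data server and filter the rows on the client."""
    shipped, result = 0, []
    for n in range(server.page_count()):
        page = server.read_page(n)
        shipped += len(page)
        result.extend(row for row in page if row[1] >= min_balance)
    return result, shipped


ts_result = TransactionServer(ACCOUNTS).execute(700)
ds_result, rows_shipped = client_side_query(DataServer(ACCOUNTS), 700)
print(len(ts_result), rows_shipped)  # 3 5
```

Both approaches return the same three rows, but the data server ships all five rows to the client in order to produce them.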

Transaction Server Process Structure:-

A transaction-server system consists of multiple processes accessing
data in shared memory. The processes that form part of the database
system include:

 Server Processes: These are processes that receive user queries
(transactions), execute them, and send the results back. The queries
may be submitted to the server processes from a user interface, or
from a user process running embedded SQL, or via JDBC, ODBC, or
similar protocols. A server process may be multithreaded: a thread is
like a process, but multiple threads execute as part of the same
process, and all threads within a process run in the same virtual
memory space. Many database systems use a hybrid architecture, with
multiple processes, each one running multiple threads.
 Lock Manager Process: This process implements lock manager
functionality, which includes lock grant, lock release, and deadlock
detection.
 Database Writer Process: There are one or more processes that
output modified buffer blocks back to disk on a continuous basis.
 Log Writer Process: This process outputs log records from the log
record buffer to stable storage.
 Checkpoint Process: This process performs periodic
checkpoints.
 Process Monitor Process: This process monitors other processes,
and if any of them fails, it takes recovery actions for the process,
such as aborting any transaction being executed by the failed process,
and then restarting the process.
The shared memory contains all shared data, such as:

 Buffer pool
 Lock table
 Log buffer, containing log records waiting to be output to the log on
stable storage.
 Cached query plans, which can be reused if the same query is
submitted again.
Data Servers:-

Data-server systems are used in local-area networks, where there is a
high-speed connection between the clients and the server, the client
machines are comparable in processing power to the server machine, and
the tasks to be executed are computation intensive. Data-server
architectures have been particularly popular in object-oriented database
systems. Because the time cost of communication between the client and
the server is high compared to that of a local memory reference, the
following issues arise:

 Page Shipping versus Item Shipping: The unit of communication for
data can be of coarse granularity, such as a page, or fine granularity,
such as a tuple. We use the term item to refer to both tuples and
objects.
 Locking: Locks are usually granted by the server for the data items
that it ships to the client machines. Techniques for lock de-escalation
have been proposed, where the server can request its clients to
transfer back locks on prefetched items. If the client machine does not
need a prefetched item, it can transfer locks on the item back to the
server, and the locks can then be allocated to other clients.
 Data Caching: Data that are shipped to a client on behalf of a
transaction can be cached at the client, even after the transaction
completes, if sufficient storage space is available.
 Lock Caching: If the data are mostly partitioned among the clients,
with clients rarely requesting data that are also requested by other
clients, locks can also be cached at the client machine. The server must
keep track of cached locks; if a client requests a lock from the
server, the server must call back all conflicting locks on the data
item from any other client machines that have cached the locks.
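The lock call-back idea can be sketched as follows. This is a minimal illustration, not a real lock manager: the class and method names are hypothetical, and every lock is treated as exclusive, so any lock cached at another client counts as conflicting.

```python
class LockServer:
    """Tracks which clients hold cached locks on each data item."""
    def __init__(self):
        self.cached = {}      # item -> set of client ids caching a lock on it
        self.callbacks = []   # (client, item) pairs whose lock was called back

    def request_lock(self, client, item):
        holders = self.cached.setdefault(item, set())
        # Call back the conflicting locks cached at every other client.
        for other in sorted(holders - {client}):
            self.callbacks.append((other, item))
        holders.clear()
        holders.add(client)
        return True

server = LockServer()
server.request_lock("c1", "x")
server.request_lock("c1", "x")   # c1 already caches the lock: no call-back
server.request_lock("c2", "x")   # conflicts with c1: c1's lock is called back
```

As long as each client keeps to its own partition of the data, no call-backs occur and lock requests need no extra messages, which is why lock caching pays off when data are mostly partitioned among clients.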

5.4 PARALLEL SYSTEMS

Parallel systems improve processing and I/O speeds by using multiple
CPUs and disks in parallel. They are driven by applications that must
query extremely large databases or process an extremely large number of
transactions per second; centralized and client-server database systems
are not powerful enough to handle such applications. In parallel
processing, many operations are performed simultaneously, as opposed to
serial processing, in which the computational steps are performed
sequentially. A coarse-grain parallel machine consists of a small number
of powerful processors; a massively parallel or fine-grain parallel
machine uses thousands of smaller processors.

There are two main measures of performance of a database system:
throughput, the number of tasks that can be completed in a given time
interval, and response time, the amount of time it takes to complete a
single task from the time it is submitted. A system that processes a
large number of small transactions can improve throughput by processing
many transactions in parallel. A system that processes large
transactions can improve response time as well as throughput by
performing subtasks of each transaction in parallel.

Speedup and Scaleup:-


Two important issues in studying parallelism are speedup and
scaleup. Running a given task in less time by increasing the degree of
parallelism is called speedup. Handling larger tasks by increasing the
degree of parallelism is called scaleup.

Figure: Speedup with Increasing Resources (speed plotted against
resources, with one curve for linear speedup and one for sublinear
speedup)

Suppose the execution time of a task on the larger machine is TL,
and the execution time of the same task on the smaller machine is TS. The
speedup due to parallelism is defined as TS/TL. The parallel system is said
to demonstrate linear speedup if the speedup is N when the larger
system has N times the resources of the smaller system. If the speedup
is less than N, the system is said to demonstrate sublinear speedup.
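As a quick worked example, the definition can be written as a small Python function. The timings below are made-up numbers chosen only for illustration, and the "superlinear" branch is included just to make the function complete; the text defines only the linear and sublinear cases.

```python
import math

def speedup(t_small, t_large):
    """Speedup = time on the smaller machine / time on the larger machine."""
    return t_small / t_large

def classify(t_small, t_large, n):
    """Classify the speedup when the larger machine has n times the resources."""
    s = speedup(t_small, t_large)
    if math.isclose(s, n):
        return "linear"
    return "sublinear" if s < n else "superlinear"

# Hypothetical timings in seconds, with a machine 4 times larger (N = 4).
linear_case = classify(120.0, 30.0, 4)      # speedup 4.0
sublinear_case = classify(120.0, 40.0, 4)   # speedup 3.0
```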

There are two kinds of scaleup that are relevant in parallel database
systems, depending on how the size of the task is measured:

 In batch scaleup, the size of the database increases, and the tasks
are large jobs whose runtime depends on the size of the database.
 In transaction scaleup, the rate at which transactions are submitted
to the database increases, and the size of the database increases
proportionally to the transaction rate.

A number of factors work against efficient parallel operation and can
diminish both speedup and scaleup.

 Startup Costs: There is a startup cost associated with initiating a
single process. In a parallel operation consisting of thousands of
processes, the startup time may overshadow the actual processing
time, affecting speedup adversely.
 Skew: By breaking down a single task into a number of parallel
steps, we reduce the size of the average step. However, the service
time of the single slowest step determines the service time of the task
as a whole, so speedup suffers when the steps are of unequal size.
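The effect of skew can be seen with a small calculation. The sketch below assumes an idealized machine on which all steps run fully in parallel with no startup cost; the task size of 100 and the step sizes are illustrative numbers.

```python
def parallel_time(step_sizes):
    # All steps run at the same time, so the largest step sets the finish time.
    return max(step_sizes)

total = 100
even = [10] * 10                 # ten equal steps of size 10
skewed = [20] + [80 / 9] * 9     # one step of size 20, the other nine share 80

speedup_even = total / parallel_time(even)      # 100 / 10
speedup_skewed = total / parallel_time(skewed)  # 100 / 20
```

With ten equal steps the speedup is 10, but a single step of size 20 halves it to 5, even though the average step size is the same in both cases.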
Interconnection Network:-
Parallel systems consist of a set of components (processors,
memory, and disks) that can communicate with each other via an
interconnection network.
The following are three commonly used types of interconnection
networks.

 Bus: All the system components can send data on and receive data
from a single communication bus.
 Mesh: The components are nodes in a grid, and each component
connects to all its adjacent components in the grid. In a two-
dimensional mesh, each node connects to four adjacent nodes. The
number of communication links grows as the number of
components grows, and the communication capacity of a mesh
therefore scales better with increasing parallelism.
 Hypercube: The components are numbered in binary, and a
component is connected to another if the binary representations of
their numbers differ in exactly one bit. Thus, each of the n
components is connected to log(n) other components (logarithm to base 2).
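The hypercube rule is easy to check in code: the neighbours of a node are obtained by flipping one bit of its number at a time. The function name is hypothetical, and `dimensions` is log(n) for an n-node hypercube.

```python
def hypercube_neighbors(node, dimensions):
    # Flipping bit k of the node number gives the neighbour along dimension k.
    return [node ^ (1 << k) for k in range(dimensions)]

# Node 5 (binary 101) in an 8-node hypercube, which has 3 dimensions.
neighbors_of_5 = hypercube_neighbors(5, 3)
```

Node 101 is connected to 100, 111, and 001, that is, to nodes 4, 7, and 1: exactly log(8) = 3 neighbours.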
Parallel Database Architectures:-

There are several architectural models for parallel machines.

Shared Memory:-

In a shared-memory architecture, the processors and disks have
access to a common memory, typically via a bus or through an
interconnection network. The benefit of shared memory is extremely
efficient communication between processors: data in shared memory can be
accessed by any processor without being moved by software. A processor
can send messages to other processors much faster by using memory writes
than by sending a message through a communication mechanism. The
drawback is that the architecture does not scale beyond a small number
of processors, since the bus or the interconnection network becomes a
bottleneck. Shared-memory architectures usually have large memory caches
at each processor. The caches need to be kept coherent; that is, if a
processor performs a write to a memory location, the data in that memory
location should be either updated at or removed from any processor where
the data are cached.

Shared Disk:-

In the shared-disk model, all the processors share a common set of
disks. Shared-disk systems are sometimes called clusters. Since each
processor has its own memory, the memory bus is not a bottleneck. The
architecture also offers a cheap way to provide a degree of fault
tolerance: if a processor fails, the other processors can take over its
tasks, since the database is resident on disks that are accessible from
all processors. The main problem with a shared-disk system is again
scalability. Compared to shared-memory systems, shared-disk systems can
scale to a somewhat larger number of processors, but communication
across processors is slower, since it has to go through a communication
network.

Shared Nothing:-

In a shared-nothing system, the processors share neither a common
memory nor a common disk. Each node of the machine consists of a
processor, memory, and one or more disks. The processor at one node may
communicate with a processor at another node through a high-speed
interconnection network. The interconnection networks for shared-nothing
systems are usually designed to be scalable, so that their transmission
capacity increases as more nodes are added. Shared-nothing architectures
are therefore more scalable and can easily support a large number of
processors. The main drawbacks of this architecture are the costs of
communication and of nonlocal disk access, which are higher than in a
shared-memory or shared-disk architecture, since sending data involves
software interaction at both ends.

Hierarchical:-

The hierarchical architecture combines the characteristics of shared-
memory, shared-disk, and shared-nothing architectures. At the top level,
the system consists of nodes that are connected by an interconnection
network and do not share disks or memory with one another. Each node of
the system could actually be a shared-memory system with a few
processors. Alternatively, each node could be a shared-disk system, and
each of the systems sharing a set of disks could be a shared-memory
system. Thus, a system could be built as a hierarchy, with a
shared-memory architecture with a few processors at the base, a
shared-nothing architecture at the top, and possibly a shared-disk
architecture in the middle.

5.5 DISTRIBUTED SYSTEMS
In a distributed database system, the database is stored on several
computers. The computers in a distributed system communicate with one
another through various communication media, such as high-speed
networks or telephone lines. They do not share main memory or disks.
The computers in a distributed system may vary in size and function,
ranging from workstations up to mainframe systems. The computers in a
distributed system are referred to by a number of different names, such
as sites or nodes, depending on the context in which they are
mentioned.

A local transaction is one that accesses data only from the site where
the transaction was initiated. A global transaction, on the other hand,
is one that either accesses data in a site different from the one at
which the transaction was initiated, or accesses data in several
different sites.

There are several reasons for building distributed database systems:

 Sharing Data. The major advantage in building a distributed
database system is the provision of an environment where users at
one site may be able to access the data residing at other sites.
 Autonomy. The primary advantage of sharing data by means of
data distribution is that each site is able to retain a degree of
control over data that are stored locally. In a distributed system,
there is a global database administrator responsible for the entire
system, and a part of these responsibilities is delegated to the
local database administrator for each site. Depending on the design
of the distributed database system, each local administrator may have
a different degree of local autonomy. The possibility of local
autonomy is often a major advantage of distributed databases.
 Availability. If one site fails in a distributed system, the remaining
sites may be able to continue operating. If data items are replicated in
several sites, a transaction needing a particular data item may find
that item in any of several sites. The failure of one site must be
detected by the system, and appropriate action may be needed to
recover from the failure. Availability is crucial for database systems
used for real-time applications.

Figure: A Distributed System (Site A, Site B, and Site C communicating
via a network)

Implementation Issues:-

Atomicity of transactions is an important issue in building a
distributed database system. If a transaction runs across two sites, unless
the system designers are careful, it may commit at one site and abort at
another, leading to an inconsistent state. Transaction commit protocols
ensure that such a situation cannot arise. The two-phase commit protocol
(2PC) is the most widely used of these protocols.

Concurrency control is another issue in a distributed database. Since a
transaction may access data items at several sites, transaction managers
at several sites may need to coordinate to implement concurrency control.

Replication of data items, which is the key to the continued
functioning of distributed databases when failures occur, further
complicates concurrency control.

The primary disadvantage of distributed database systems is the added
complexity required to ensure proper coordination among the sites. This
increased complexity takes various forms:

 Software-development Cost: It is more difficult to implement a
distributed database system; thus, it is more costly.
 Greater Potential for Bugs: Since the sites that constitute the
distributed system operate in parallel, it is harder to ensure the
correctness of algorithms, especially operation during failures of part
of the system, and recovery from failures. The potential exists for
extremely subtle bugs.
 Increased Processing Overhead: The exchange of messages and the
additional computation required to achieve intersite coordination are a
form of overhead that does not arise in centralized systems.

5.6 NETWORK TYPES

Distributed databases and client-server systems are built around
communication networks. There are basically two types of networks:

 Local-Area Networks
 Wide-Area Networks
Local-Area Networks:-

Local-area networks (LANs) emerged in the early 1970s as a way for
computers to communicate and to share data with one another. They are
generally used in an office environment. All the sites in such systems
are close to one another, so the communication links tend to have a
higher speed and lower error rate than do their counterparts in
wide-area networks. The most common links in a local-area network are
twisted pair, coaxial cable, fiber optics, and wireless connections.

A Storage-Area Network (SAN) is a special type of high-speed local-area
network designed to connect large banks of storage devices to the
computers that use the data. Thus, storage-area networks help build
large-scale shared-disk systems. The motivation for using storage-area
networks to connect multiple computers to large banks of storage devices
is essentially the same as that for shared-disk databases, namely:

 Scalability by adding more computers.
 High availability, since data are still accessible even if a
computer fails.
Wide-Area Network:-

Wide-area networks (WANs) emerged in the late 1960s, mainly to serve a
wide community of users. The first WAN to be designed and developed was
the Arpanet. The Arpanet has grown from a four-site experimental network
into a worldwide network of networks, the Internet, comprising hundreds
of millions of computer systems. The links on the Internet include
fiber-optic lines and satellite channels.

WANs can be classified into two types:

 In discontinuous connection WANs, such as those based on
wireless connections, hosts are connected to the network only part
of the time.
 In continuous connection WANs, such as the wired Internet,
hosts are connected to the network at all times.

5.7 DISTRIBUTED DATABASES

A distributed database is a database that is physically stored in two
or more computer systems. Although the data are geographically
dispersed, a distributed database system manages and controls the entire
database as a single collection of data.

5.8 HOMOGENEOUS AND HETEROGENEOUS DATABASES

In a homogeneous distributed database, all sites have identical
database management system software, are aware of one another, and
agree to cooperate in processing users' requests. The software at each
site must also cooperate with other sites in exchanging information about
transactions, to make transaction processing possible across multiple
sites. In a heterogeneous distributed database, different sites may use
different schemas and different database management system software.
5.9 DISTRIBUTED DATA STORAGE
Consider a relation r that is to be stored in the database. There are two
approaches to storing this relation in the distributed database:

 Replication: The system maintains several identical replicas of the
relation, and stores each replica at a different site. The alternative
to replication is to store only one copy of relation r.
 Fragmentation: The system partitions the relation into several
fragments, and stores each fragment at a different site.

a. Data Replication
If relation r is replicated, a copy of relation r is stored in two or more
sites. In full replication, a copy is stored in every site in the system.

There are a number of advantages and disadvantages to replication.

 Availability. If one of the sites containing relation r fails, then
the relation r can be found in another site.
 Increased Parallelism. When the majority of accesses to the relation
r result in only the reading of the relation, several sites can process
queries involving r in parallel. Data replication also minimizes
movement of data between sites, since the needed data are more likely
to be found at the site where the transaction executes.
 Increased Overhead on Update. The system must ensure that all
replicas of a relation r are consistent; otherwise, erroneous
computations may result. Thus, whenever r is updated, the update must
be propagated to all sites containing replicas. This results in
increased overhead.
b. Data Fragmentation
If relation r is fragmented, r is divided into a number of fragments r1,
r2, ..., rn. These fragments contain sufficient information to allow
reconstruction of the original relation r.

There are two different schemes for fragmenting a relation:

 Horizontal Fragmentation
 Vertical Fragmentation
1. Horizontal Fragmentation

Horizontal fragmentation splits the relation by assigning each tuple
of r to one or more fragments. The relation r is partitioned into a
number of subsets, r1, r2, ..., rn. Each tuple of relation r must belong
to at least one of the fragments, so that the original relation can be
reconstructed.

Example

The account relation can be divided into several different fragments,
each of which consists of tuples of accounts belonging to a particular
branch. Horizontal fragmentation is usually used to keep tuples at the
sites where they are used the most, to minimize data transfer.
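The account example can be sketched in Python. The tuples, branch names, and function name below are hypothetical and serve only to illustrate the idea: each fragment is a selection on the branch name, and the original relation is rebuilt as the union of the fragments.

```python
from collections import defaultdict

# Hypothetical account tuples: (account_number, branch_name, balance)
account = {
    ("A-101", "Hillside", 500),
    ("A-215", "Valleyview", 700),
    ("A-305", "Hillside", 350),
}

def fragment_by_branch(relation):
    """Assign each tuple to the fragment for its branch."""
    fragments = defaultdict(set)
    for row in relation:
        fragments[row[1]].add(row)
    return dict(fragments)

fragments = fragment_by_branch(account)

# Reconstruction: r is the union of r1, r2, ..., rn.
reconstructed = set().union(*fragments.values())
```

In a real system, each fragment would be stored at the site of its own branch, so most queries about a branch's accounts would need no data transfer.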

2. Vertical Fragmentation

Vertical fragmentation splits the relation by decomposing the scheme
R of relation r. Vertical fragmentation of r(R) involves the definition
of several subsets of attributes R1, R2, ..., Rn of the schema R so that
R = R1 ∪ R2 ∪ ... ∪ Rn
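A minimal sketch of vertical fragmentation follows. It assumes, as is commonly done but not stated above, that every fragment keeps a shared tuple identifier so that the fragments can be joined back together; the attribute names and data are hypothetical.

```python
# Hypothetical relation with schema
# R = (tuple_id, account_number, branch_name, balance)
rows = [
    {"tuple_id": 1, "account_number": "A-101", "branch_name": "Hillside", "balance": 500},
    {"tuple_id": 2, "account_number": "A-215", "branch_name": "Valleyview", "balance": 700},
]

def project(relation, attributes):
    """Keep only the listed attributes of each tuple."""
    return [{a: row[a] for a in attributes} for row in relation]

# R1 and R2 both keep tuple_id, and together they cover every attribute of R.
r1 = project(rows, ["tuple_id", "account_number", "branch_name"])
r2 = project(rows, ["tuple_id", "balance"])

def join_on_tuple_id(left, right):
    """Rebuild the original tuples by matching fragments on tuple_id."""
    by_id = {row["tuple_id"]: row for row in right}
    return [{**row, **by_id[row["tuple_id"]]} for row in left]

rebuilt = join_on_tuple_id(r1, r2)
```

Because R1 ∪ R2 contains all attributes of R and the two fragments share the identifier, joining them recovers the original relation.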

Transparency:-

The user of a distributed database system should not be required to
know either where the data are physically located or how the data can be
accessed at the specific local site. This characteristic, called data
transparency, can take several forms:

 Fragmentation Transparency: Users are not required to know
how a relation has been fragmented.
 Replication Transparency: Users do not have to be concerned with
what data objects have been replicated, or where replicas have
been placed.
 Location Transparency: Users are not required to know the
physical location of the data. The distributed database system
should be able to find any data as long as the data identifier is
supplied by the user transaction.

5.10 DISTRIBUTED TRANSACTIONS


Access to the various data items in a distributed system is usually
accomplished through transactions, which must preserve the ACID
properties. There are two types of transactions:

 Local Transaction
 Global Transaction
Local transactions are those that access and update data in only one
local database. Global transactions are those that access and update
data in several local databases.

System Structure:-

Each site has its own local transaction manager, whose function is
to ensure the ACID properties of those transactions that execute at that
site. To understand how such a manager can be implemented, consider an
abstract model of a transaction system, in which each site contains two
subsystems:

 The Transaction Manager manages the execution of those
transactions that access data stored in a local site. Each such
transaction may be either a local transaction or part of a global
transaction. The transaction manager is responsible for:
    Maintaining a log for recovery purposes.
    Participating in an appropriate concurrency-control scheme to
   coordinate the concurrent execution of the transactions executing
   at that site.
 The Transaction Coordinator coordinates the execution of the
various transactions initiated at that site. For each such transaction,
the coordinator is responsible for:
    Starting the execution of the transaction.
    Breaking the transaction into a number of subtransactions and
   distributing these subtransactions to the appropriate sites for
   execution.
    Coordinating the termination of the transaction, which may result
   in the transaction being committed at all sites or aborted at all
   sites.

Diagram: System Architecture (each of computer 1, ..., computer n runs
a transaction coordinator TC1, ..., TCn and a transaction manager
TM1, ..., TMn)

System Failure Modes

A distributed system may suffer from the same types of failure that a
centralized system does. It must also deal with additional failure
types, the basic ones being:

 Failure of a site
 Loss of messages
 Failure of a communication link
 Network partition
The loss or corruption of messages is always a possibility in a
distributed system. The system uses transmission-control protocols, such
as TCP/IP, to handle such errors.

The sites of a distributed system can be connected in different network
configurations. Each configuration has advantages and disadvantages. The
configurations can be compared with one another, based on the following
criteria:

 Installation cost: The cost of physically linking the sites in the
system.
 Communication cost: The cost in time and money to send a
message from site A to site B.
 Availability: The degree to which data can be accessed despite the
failure of some links or sites.

5.11 COMMIT PROTOCOLS


The simplest and most widely used commit protocol is the two-
phase commit protocol (2PC). An alternative is the three-phase commit
protocol (3PC), which avoids certain disadvantages of the 2PC protocol but
adds to complexity and overhead.
a. Two-Phase Commit
Consider a transaction T initiated at site Si, where the transaction
coordinator is Ci.
b. The Commit Protocol
When T completes its execution, that is, when all the sites at which T
has executed inform Ci that T has completed, Ci starts the 2PC protocol.
Phase 1

Ci adds the record <prepare T> to the log, and forces the log onto
stable storage. It then sends a prepare T message to all sites at which T
executed. On receiving such a message, the transaction manager at that
site determines whether it is willing to commit its portion of T. If the
answer is no, it adds a record <no T> to the log, and then responds by
sending an abort T message to Ci. If the answer is yes, it adds a record
<ready T> to the log, and forces the log onto stable storage. The
transaction manager then replies with a ready T message to Ci.

Phase 2

When Ci receives responses to the prepare T message from all the
sites, or when a prespecified interval of time has elapsed since the
prepare T message was sent out, Ci can determine whether the
transaction T can be committed or aborted. Transaction T can be
committed if Ci received a ready T message from all the participating
sites. Otherwise, transaction T must be aborted. Depending on the
verdict, either a record <commit T> or a record <abort T> is added to
the log, and the log is forced onto stable storage. The coordinator then
sends either a commit T or an abort T message to all participating
sites, and each site records that message in its log.
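The two phases can be sketched as a small simulation. This is an illustrative sketch only: the class and function names are hypothetical, Python lists stand in for logs on stable storage, and timeouts, message loss, and site failures are left out. The last loop models the usual final step of 2PC, in which the coordinator tells every participant the outcome.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []

    def prepare(self, t):
        # Phase 1: record the vote in the log before replying.
        if self.can_commit:
            self.log.append(f"<ready {t}>")
            return "ready"
        self.log.append(f"<no {t}>")
        return "abort"

    def finish(self, t, decision):
        # Record the coordinator's final decision.
        self.log.append(f"<{decision} {t}>")

def two_phase_commit(t, participants, coordinator_log):
    # Phase 1: log <prepare T>, then collect a vote from every site.
    coordinator_log.append(f"<prepare {t}>")
    votes = [p.prepare(t) for p in participants]
    # Phase 2: commit only if every site answered ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append(f"<{decision} {t}>")
    for p in participants:
        p.finish(t, decision)
    return decision

log_ok, log_bad = [], []
outcome_ok = two_phase_commit(
    "T1", [Participant("S1"), Participant("S2")], log_ok)
unwilling = Participant("S2", can_commit=False)
outcome_bad = two_phase_commit("T2", [Participant("S1"), unwilling], log_bad)
```

A single unwilling site is enough to abort the transaction everywhere, which is how the protocol prevents a transaction from committing at one site and aborting at another.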

c. Handling of Failures

The 2PC protocol responds in different ways to various types of failures:

1. Failure of a Participating Site

If the coordinator Ci detects that a site has failed, it takes these
actions: if the site fails before responding with a ready T message to Ci,
the coordinator assumes that it responded with an abort T message. If the
site fails after the coordinator has received the ready T message from the
site, the coordinator executes the rest of the commit protocol in the
normal fashion, ignoring the failure of the site.

2. Failure of the Coordinator

If the coordinator fails in the midst of the execution of the commit
protocol for transaction T, then the participating sites must decide the
fate of T. In certain cases, the participating sites cannot decide
whether to commit or abort T, and therefore these sites must wait for the
recovery of the failed coordinator. This is known as the blocking
problem.

d. Network Partition

When a network partition occurs, two possibilities exist:

 The coordinator and all its participants remain in one partition. In
this case, the failure has no effect on the commit protocol.
 The coordinator and its participants belong to several partitions.
From the viewpoint of the sites in one of the partitions, it appears
that the sites in other partitions have failed.
e. Recovery and Concurrency Control

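When a failed site restarts, it performs recovery using the recovery
algorithm. For transactions that were in doubt at the time of the failure
(those with a <ready T> log record but neither a <commit T> nor an
<abort T> record), the recovering site must determine the commit-abort
status by contacting other sites. To speed up this process, recovery
algorithms typically provide support for noting lock information in the
log.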

f. Three-Phase Commit

 The 3PC protocol is an extension of the two-phase commit protocol that
avoids the blocking problem under certain assumptions: that no network
partition occurs and that no more than k sites fail, where k is a
predetermined number. The protocol avoids blocking by introducing an
extra third phase in which multiple sites are involved in the decision
to commit.
 While the 3PC protocol has the desirable property of not blocking
unless k sites fail, it has the drawback that a partitioning of the
network will appear to be the same as more than k sites failing,
which would lead to blocking. The 3PC protocol is not widely used.
g. Alternative Models of Transaction Processing

Persistent messaging can be used to avoid the problems of distributed
commit; workflows address the larger issue of processing that spans
several steps and sites. Persistent messages are messages that are
guaranteed to be delivered to the recipient exactly once, regardless of
failures, if the transaction sending the message commits, and are
guaranteed not to be delivered if the transaction aborts. Database
recovery techniques are used to implement persistent messaging on top of
the normal network channels. Error handling is more complicated with
persistent messaging than with two-phase commit. The application
programs that send and receive persistent messages must include code to
handle exception conditions and bring the system back to a consistent
state. There are many applications where the benefit of eliminating
blocking is well worth the extra effort to implement systems that use
persistent messages.

Workflows provide a general model of transaction processing involving
multiple sites and possibly human processing of certain steps.

Persistent messaging can be implemented on top of an unreliable
messaging infrastructure, which may lose messages or deliver them
multiple times, by these protocols:

 Sending Site Protocol: When a transaction wishes to send a
persistent message, it writes a record containing the message in a
special relation, messages-to-send, instead of directly sending out
the message. The message is also given a unique message identifier. A
message delivery process sends the message only after the writing
transaction commits, and resends it until the receiving site
acknowledges it.
 Receiving Site Protocol: When a site receives a persistent
message, it runs a transaction that adds the message to a special
received-messages relation, provided it is not already present in the
relation. After the transaction commits, or if the message was
already present in the relation, the receiving site sends an
acknowledgement back to the sending site.
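The two protocols can be sketched as follows. This is a simplified illustration under stated assumptions: in-memory dictionaries stand in for the messages-to-send and received-messages relations, the class and method names are hypothetical, and the transactions that would protect each step in a real database are omitted. The `duplicates` parameter simulates a network that delivers a message more than once.

```python
class ReceivingSite:
    def __init__(self):
        self.received = {}    # message id -> payload (received-messages relation)
        self.processed = []   # payloads acted on, in arrival order

    def deliver(self, msg_id, payload):
        # Add the message only if it is not already present, then acknowledge.
        if msg_id not in self.received:
            self.received[msg_id] = payload
            self.processed.append(payload)
        return ("ack", msg_id)

class SendingSite:
    def __init__(self):
        self.to_send = {}     # message id -> payload (messages-to-send relation)
        self.next_id = 1

    def send_persistent(self, payload):
        # Record the message with a unique identifier instead of sending it.
        msg_id = self.next_id
        self.next_id += 1
        self.to_send[msg_id] = payload
        return msg_id

    def deliver_all(self, receiver, duplicates=1):
        for msg_id, payload in list(self.to_send.items()):
            ack = None
            for _ in range(duplicates):      # simulate repeated delivery
                ack = receiver.deliver(msg_id, payload)
            if ack == ("ack", msg_id):
                del self.to_send[msg_id]     # forget only after acknowledgement

sender, receiver = SendingSite(), ReceivingSite()
sender.send_persistent("credit A-101 by 50")
sender.deliver_all(receiver, duplicates=3)
```

Although the message arrives three times, it is processed once, because the receiving site checks the message identifier before acting on it.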
h. Availability

One of the goals in using distributed databases is high availability;
that is, the database must function almost all the time. The ability to
continue functioning even during failures is referred to as robustness. For
a distributed system to be robust, it must detect failures, reconfigure the
system so that computation may continue, and recover when a processor
or a link is repaired. The different types of failures are handled in
different ways.

For example, message loss is handled by retransmission. Repeated
retransmission of a message across a link without receipt of an
acknowledgement is usually a symptom of a link failure. The network
usually attempts to find an alternate route for the message. Failure to
find such a route is usually a symptom of network partition.

a. Majority-Based Approach

In this approach, each data object stores with it a version number to
detect when it was last written. Whenever a transaction writes an
object, it also updates the version number, in this way:

 If data object a is replicated in n different sites, then a lock-request
message must be sent to more than one-half of the n sites in which
a is stored. The transaction does not operate on a until it has
obtained a lock on a majority of the replicas of a.
 Read operations look at all replicas on which a lock has been
obtained, and read the value from the replica that has the highest
version number.
 Write operations find the highest version number among the locked
replicas, as reads do, and write the new value to all the locked
replicas with a version number one more than that.
 Failures during a transaction can be tolerated as long as the sites
available at commit contain a majority of replicas of all the objects
written to, and during reads, a majority of replicas are read to find
the version number.
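The role of the version number can be sketched with a small simulation. The names are hypothetical and the sketch makes simplifying assumptions: locking is reduced to checking that a majority of replicas is reachable, every reachable replica is locked rather than just a bare majority, and a write sets the version to one more than the highest version it sees.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0
        self.up = True

def lock_majority(replicas):
    """Return the reachable replicas, or fail if they are not a majority."""
    available = [r for r in replicas if r.up]
    if len(available) <= len(replicas) // 2:
        raise RuntimeError("no majority of replicas available")
    return available

def read(replicas):
    locked = lock_majority(replicas)
    newest = max(locked, key=lambda r: r.version)
    return newest.value

def write(replicas, value):
    locked = lock_majority(replicas)
    new_version = max(r.version for r in locked) + 1
    for r in locked:
        r.value, r.version = value, new_version

replicas = [Replica() for _ in range(5)]
write(replicas, "v1")
replicas[0].up = replicas[1].up = False    # two sites fail
write(replicas, "v2")                      # sites 2, 3, 4 still form a majority
replicas[0].up = replicas[1].up = True     # failed sites return with stale data
replicas[3].up = replicas[4].up = False    # a different pair of sites fails
latest = read(replicas)
```

Any two majorities share at least one site, so the read is guaranteed to see a replica with the latest version even though two of the three replicas it reaches are stale.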
b. Read One, Write All Available Approach

In the read one, write all protocol, a read can use any single replica,
while a write must be applied to all replicas. There is no need to use
version numbers; however, if even a single site containing a data item
fails, no write to the item can proceed, since the write quorum will not
be available. To allow work to proceed in the event of failures, we would
like to be able to use a read one, write all available protocol. In this
approach, a read operation proceeds as in the read one, write all scheme:
any available replica can be read, and a read lock is obtained at that
replica. A write operation is shipped to all replicas, and write locks are
acquired on all the replicas; if a site is down, the transaction manager
proceeds without waiting for the site to recover. This approach can be
used if there is never any network partitioning, but it can result in
inconsistencies in the event of network partitions.
c. Comparison with Remote Backup

Remote backup systems and replication in distributed databases are two
alternative approaches to providing high availability. The main
difference between the two schemes is that with remote backup systems,
actions such as concurrency control and recovery are performed at a
single site, and only data and log records are replicated at the other
site. Remote backup systems therefore offer a lower-cost approach to
high availability than replication. On the other hand, replication can
provide greater availability by having multiple replicas available, and
using the majority protocol.

d. Coordinator Selection

One way to continue execution is by maintaining a backup to the
coordinator, which is ready to assume responsibility if the coordinator
fails. A backup coordinator is a site that, in addition to other tasks,
maintains enough information locally to allow it to assume the role of
coordinator with minimal disruption to the distributed system. It executes
the same algorithms and maintains the same internal state information as
does the actual coordinator. The prime advantage of the backup approach
is the ability to continue processing immediately. The backup coordinator
approach thus avoids a substantial amount of delay while the distributed
system recovers from a coordinator failure.

5.12 CLOUD BASED DATABASE


 A new concept in computing that emerged in the late 1990s and the
2000s.
 Vendors of software services provided specific customizable
applications that they hosted on their own machines
 Then, generic computers as a service
 Clients runs its own software, but runs it on vendor’s computers.
 These machines are called virtual machines, which are simulated by
software that allows a single real computer to simulate several
independent computers
 Clients can add machines as needed to meet demand and release
them at times of light load.
 Data storage services, map services, and other services can be
accessed using a Web-service API.

 Vendors of cloud services include traditional computing vendors,
Amazon, and Google
Cloud-based database:-
 Web applications need to store and retrieve data for very large
numbers of users
 Traditional parallel databases were not designed to scale to thousands
of nodes (and are expensive)
 Such applications value availability and scalability over consistency
 Storing and retrieving data items by key value is the minimum
functionality required; systems that provide it are called key-value
stores
Systems for data storage on the cloud:-
 Bigtable from Google
 HBase, an open source clone of Bigtable
 Dynamo, which is a key-value storage system from Amazon
 Simple Storage Service (S3) from Amazon
 Cassandra from Facebook
 Sherpa/PNUTS from Yahoo!
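
The minimum functionality of a key-value store (storing and retrieving data items by key) can be illustrated with a small Python sketch. It is not how any of the systems listed above is implemented; the class name, the number of nodes, and the sample keys are assumptions, and each "node" is just an in-memory dictionary.

```python
import zlib


class KeyValueStore:
    """Toy key-value store that spreads keys over several nodes by hashing."""

    def __init__(self, num_nodes=4):
        # Each node is simulated by a dictionary; a real system would use
        # separate machines.
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash partitioning: the key alone decides which node stores the item.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = KeyValueStore()
store.put("user:1001", {"name": "Asha", "cart": ["pen", "notebook"]})
store.put("user:1002", {"name": "Ravi", "cart": []})
print(store.get("user:1001")["name"])
print(store.get("user:9999"))
```

Since every operation touches only the node chosen by the key, more nodes can be added to handle more users. The price is that the store offers no joins and no multi-item consistency guarantees.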

5.13 DIRECTORY SYSTEMS


A directory is a listing of information about some class of objects,
such as persons. Directories can be used to find information about
a specific object, or, in the reverse direction, to find objects that
meet a certain requirement. In the world of physical telephone directories,
directories that satisfy lookups in the forward direction are called white
pages, while directories that satisfy lookups in the reverse direction are
called yellow pages.

a. Directory Access Protocols

Directory information can be made available through web
interfaces, as many organizations do. Directories can also be used for
storing other kinds of information, much like file system directories. For
example, if application settings are stored in a directory, a user can access
the same settings from multiple locations, such as at home and at work,
without having to share a file system. Several Directory Access
Protocols have been developed to provide a standardized way of
accessing data in a directory. The most widely used among them today is
the Lightweight Directory Access Protocol (LDAP).

b. LDAP: Lightweight Directory Access Protocol

The X.500 directory access protocol, defined by the
International Organization for Standardization (ISO), is a standard for
accessing directory information. However, the protocol is rather complex,
and is not widely used. LDAP provides many of the X.500 features,
but with less complexity, and is widely used.

c. LDAP Data Model

In LDAP, directories store entries, which are similar to objects. Each
entry must have a distinguished name (DN), which uniquely identifies
the entry. A DN is in turn made up of a sequence of relative
distinguished names (RDNs). For example, an entry may have the
following distinguished name:

cn=Silberschatz, ou=Bell Labs, o=Lucent, c=USA

The set of RDNs for a DN is defined by the schema of the directory
system. LDAP allows the definition of object classes with attribute
names and types. Inheritance can be used in defining object classes. The
entries are organized into a Directory Information Tree (DIT),
according to their distinguished names. Entries at the leaf level of the tree
usually represent specific objects. Entries that are internal nodes
represent objects such as organizational units, organizations, or countries.
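
To make the relationship between a DN, its RDNs, and the DIT concrete, here is a minimal Python sketch that splits a DN into its RDNs and then reverses them to obtain the path from the root of the tree down to the entry. The function name `parse_dn` is hypothetical, and the parsing is deliberately naive: it assumes that no attribute value contains a comma or an equals sign, which real LDAP DNs may.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific RDN first."""
    rdns = []
    for part in dn.split(","):
        # Each RDN has the form attribute=value; surrounding spaces are ignored.
        attribute, _, value = part.partition("=")
        rdns.append((attribute.strip().lower(), value.strip()))
    return rdns


dn = "cn=Silberschatz, ou=Bell Labs, o=Lucent, c=USA"
rdns = parse_dn(dn)

# The DN is written leaf-first, so reversing it gives the path from the
# root of the Directory Information Tree down to the entry.
path_from_root = list(reversed(rdns))

for level, (attribute, value) in enumerate(path_from_root):
    print("  " * level + attribute + "=" + value)
```

The indented output shows the country at the root, the organization and organizational unit as internal nodes, and the person entry at the leaf.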

d. Data Manipulation

LDAP does not define either a data-definition language or a
data-manipulation language. Instead, LDAP defines a network protocol for
carrying out data definition and manipulation. LDAP also defines a file
format called LDAP Data Interchange Format (LDIF) that can be used for
storing and exchanging information. The querying mechanism in LDAP is very
simple, consisting of just selections and projections, without any join. A
query must specify the following:

 A base, that is, a node within a DIT, specified by giving its
distinguished name.
 A search condition, which can be a Boolean combination of conditions
on individual attributes.
 A scope, which can be just the base, the base and its children, or
the entire subtree beneath the base.
 Attributes to return.
 Limits on number of results and resource consumption.
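
The five parts of a query listed above can be illustrated with a small in-memory search in Python. This sketch does not use the LDAP protocol or any LDAP library; the directory contents, the function names, and the scope names ("base", "base_and_children", "subtree") are hypothetical and chosen only to mirror the description in this section.

```python
# A tiny hypothetical directory: DN -> attributes of the entry.
ENTRIES = {
    "c=US": {"objectclass": "country"},
    "o=Example,c=US": {"objectclass": "organization"},
    "ou=Sales,o=Example,c=US": {"objectclass": "organizationalUnit"},
    "cn=Alice,ou=Sales,o=Example,c=US": {
        "objectclass": "person", "cn": "Alice",
        "city": "Chennai", "telephone": "555-0100"},
    "cn=Bob,ou=Sales,o=Example,c=US": {
        "objectclass": "person", "cn": "Bob",
        "city": "Delhi", "telephone": "555-0101"},
    "cn=Carol,ou=Research,o=Example,c=US": {
        "objectclass": "person", "cn": "Carol",
        "city": "Chennai", "telephone": "555-0102"},
}


def depth_below(dn, base):
    """Levels by which dn lies below base, or None if outside base's subtree."""
    if dn == base:
        return 0
    if dn.endswith("," + base):
        return dn[: -len(base) - 1].count(",") + 1
    return None


def search(entries, base, scope, condition, attributes, size_limit=None):
    """Selection (condition) and projection (attributes) over part of the DIT."""
    max_depth = {"base": 0, "base_and_children": 1, "subtree": None}[scope]
    results = []
    for dn, attrs in entries.items():
        depth = depth_below(dn, base)
        if depth is None or (max_depth is not None and depth > max_depth):
            continue  # entry is outside the requested scope
        if not condition(attrs):
            continue  # entry fails the search condition
        results.append((dn, {a: attrs[a] for a in attributes if a in attrs}))
        if size_limit is not None and len(results) >= size_limit:
            break  # limit on the number of results
    return results


matches = search(
    ENTRIES,
    base="ou=Sales,o=Example,c=US",
    scope="subtree",
    condition=lambda e: e.get("objectclass") == "person"
    and e.get("city") == "Chennai",
    attributes=["cn", "telephone"],
)
print(matches)
```

Only Alice is returned: Bob fails the search condition, and Carol lies outside the subtree of the chosen base. Note that no join is involved; each entry is tested on its own.
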
e. Distributed Directory Trees

The information about an organization may be split into multiple
DITs, each of which stores information about some entries. A node in a DIT
may contain a referral to another node in another DIT. Referrals are the
key component that helps organize a distributed collection of directories
into an integrated system. Many LDAP implementations support master-slave
and multimaster replication of DITs, although replication is not part
of the current LDAP version 3 standard. Work on standardizing replication
in LDAP is in progress.

MODEL QUESTION PAPER-I


B.COM (CA) DEGREE EXAMINATION
Second semester
Commerce
DATABASE MANAGEMENT SYSTEM
Time: Three hours Maximum marks: 75

SECTION - A

Answer ALL the Questions. (10 X 2 = 20)
1. What are the differences between a File System and Database System?

2. Draw an E-R Model for an Employee Data
3. What are the functions of a DBA
4. List out the Fundamental operations of a Relational Algebra
5. What are the string operations in SQL?
6. Define a Null value
7. Define Referential Integrity
8. Define Triggers
9. What is atomic domain?
10. Define commit protocols.
SECTION - B
Answer ALL the Questions. (5 X 5 = 25)
11. a).Explain the Levels of abstraction
(Or)
b).Write short notes on the following
i) Types of Attributes ii) Symbols in ER Diagram
12. a). Explain the Tuple Relational Calculus
(Or)
b). Explain the Domain Relational Calculus
13. a). Write short notes on Views
(Or)
b). Explain the Modification of database in SQL

14. a). Write about Joined relations with examples
(Or)
b).What are the Domain types/Data types in SQL
15. a).Explain in detail about distributed transactions?
(Or)
b).Explain the concepts of cloud based database?
SECTION – C

Answer ANY THREE Questions. (3 X 10 = 30)

16. Explain the Overall structure of database in detail.

17. Explain briefly the Relational algebra operations.

18. Describe tuple and domain relational calculus?

19. What is a Trigger? Explain its functions with neat diagrams

20. Explain the server and client system architecture?

Commerce
DATABASE MANAGEMENT SYSTEM
Time: Three hours Maximum marks: 75

SECTION - A

Answer ALL the Questions. (10 X 2 = 20)

1. Define DBMS.
2. Define Atomicity.
3. Differentiate Unary & Binary Algebra Operators.
4. Give the Syntax for DRC.
5. List out the Symbols of ER-Diagram.
6. Differentiate Strong & Weak Entity set.
7. What is homogeneous and heterogeneous database?
8. Define constraints
9. What is authorization?
10. What are null values? Give examples.
SECTION – B

Answer All the questions ( 5 X 5 = 25 )


11. a. Define Data Abstraction and its levels?
(Or)
b. Write short notes on transaction management.
12. a. Write down the Structure of SQL query?
(Or)
b. List out various set operations used in SQL?
13. a. Write short notes on transactions?
(Or)
b. What is a trigger? Explain with examples.
14. a. Explain in detail about extended E-R features?
(Or)
b. Explain about relational database design?

15. a. Briefly explain about the network types?
(Or)
b. Briefly explain about the basic concepts of parallel systems?

SECTION – C
Answer ANY THREE Questions. (3 X 10 = 30)
16. Explain the E-R model with examples.
17. Describe the basic structure of SQL query.
18. Explain the relational algebra with examples?
19. Briefly explain about the decomposition using multivalued
dependency?
20. Write in detail about the directory systems with examples.

OBJECTIVE TEST-I

Title: DATABASE MANAGEMENT SYSTEM


1. The collection of data is called ------

a) Database b) Data model c) Entity d) Attribute

2. The symbol which represents entity sets is ------

a) Rectangle b) Ellipse c) Line d) Diamond

3. A set of operations that take one or two relations as input and produce
a new relation as their result is called as -------

a)Relational Algebra b)Relational model

c)Relational Calculus d)Relational database

4. DDL Stands for ------

a) Data-Definition Language b)Data-Difficulty Language

c)Data-language d)Data Detecting Language

5. The operation that returns all rows that appear in either or both of two
tables is ----------------------

a) Union Operation b) Intersect Operation

c) Except Operation d)None

6. ---------- returns the minimum value of particular attribute

a) AVG b)MIN c)MAX d)SUM

7. A -------------is a statement that is executed automatically by the system

a) Trigger b) SELECT c) UPDATE d) INSERT

8. BCNF Stands for--------------

a) Binary Code Normal Form b) Boyce-Codd Normal Form

c)Batch Code Normal Form d)Branch Code Normal Form


9. Data structure diagrams are used in the ------------ model.

a)E-R b) Network c) Hierarchical d)Physical

10. In the hierarchical model, the records are organized as collections of
-----------------

a) Trees b) Records c) Entities d) Relationships

