LM Dbmsnotes 2

Uploaded by

raisinghdaksh2

Unit – I Introduction to Database Systems

1. File Processing System


File processing is the process of creating, storing and accessing the contents of
files. It can be used to open saved files for read-only access, to save a new file, or
to replace an existing one. Existing files can also be modified through this process.

Advantages of File Processing System


1. It supports heterogeneous operating systems, including all flavors of the UNIX
operating system as well as Linux and Windows.
2. Multiple client machines can access a single resource simultaneously.
3. It enables sharing of common application binaries and read-only information instead
of putting them on every machine. This reduces overall disk storage cost and
administration overhead.
4. It gives groups of users access to uniform data.
5. It is useful when many users exist on many systems and each user's home directory
would otherwise be located on every machine. A network file system allows you to
keep all users' home directories on a single machine under /home.

Drawbacks of File Processing System / Need of Database Management System

1. Data Redundancy
Data redundancy means that the same information is duplicated in several files. When
software is developed independently within the file processing system, this can

Database Management System Page 1


lead to unwanted duplicated files. This is wasteful, since file duplication uses up
extra space on the hard drives which could otherwise be put to better use. E.g., the
address and telephone number of a particular customer may appear both in a file
generated by the order filling system and in a file generated by the invoicing system.
This redundancy leads to higher storage and excess cost, and it can also lead to
inconsistency.

2. Data Inconsistency
There are different copies of the same data, but their contents do not match as they
should. This problem occurs when data is updated but some copies of the data
remain old or are not updated. E.g., a changed customer address may be reflected in
the customer master file generated by the order filling system but not in the customer
master file generated by the invoicing system.

3. Difficulty in Accessing Data


It is not easy to retrieve information using a conventional file processing system;
convenient and efficient retrieval is almost impossible. Suppose that the company
owner wants the list of customers who live in Surat city.
He has 2 choices:
- List all cust_names and manually extract the required information.
- Ask the data processing dept. to have a system programmer write the necessary
application program. Because the designers of the original system did not
anticipate this request, there is no application program on hand to meet it.
Suppose that such a program is written and that, several days later, the owner
needs to trim that list to include only those customers whose order amount
is Rs. 10,000 or more. As expected, a program to generate such a list does not
exist. Again, the owner has the preceding two options, neither of which is
satisfactory.
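
By contrast, a DBMS lets both of the owner's requests be answered with ad hoc queries rather than new programs. The following is only a minimal sketch using Python's built-in sqlite3 module; the customer table, its columns other than cust_name, and the sample rows are hypothetical and invented for illustration.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_name TEXT, city TEXT, order_amount INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Asha", "Surat", 12000), ("Ravi", "Surat", 4000), ("Meena", "Pune", 15000)],
)

# First request: customers who live in Surat.
surat = [row[0] for row in conn.execute(
    "SELECT cust_name FROM customer WHERE city = ? ORDER BY cust_name",
    ("Surat",))]

# Follow-up request: only those whose order amount is Rs. 10,000 or more.
big_orders = [row[0] for row in conn.execute(
    "SELECT cust_name FROM customer"
    " WHERE city = ? AND order_amount >= ? ORDER BY cust_name",
    ("Surat", 10000))]

print(surat)
print(big_orders)
```

The second request needs only one extra condition in the query, not a second application program.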

4. Data Isolation
Data are scattered in various files, and the files may be in different formats, so
writing a new application program to retrieve the data is difficult.

5. Integrity Problems
The data values may need to satisfy some integrity constraints. For example, a phone
number must have a minimum of 6 digits. In a file processing system we have to enforce
this through program code, but in a database we can declare the integrity constraints
along with the data definition itself.
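
As a rough illustration of declaring such a constraint with the definition, the sketch below uses Python's sqlite3 module and a CHECK clause. The table and column names are assumptions made for this example, and the phone number is stored as text so that its length can be checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint is declared once, as part of the table definition
# (hypothetical table; the 6-digit minimum comes from the text above).
conn.execute(
    "CREATE TABLE customer ("
    " name TEXT NOT NULL,"
    " phone TEXT NOT NULL CHECK (length(phone) >= 6))"
)

conn.execute("INSERT INTO customer VALUES (?, ?)", ("Asha", "2612345"))

try:
    # Only 3 digits: the DBMS itself refuses the row.
    conn.execute("INSERT INTO customer VALUES (?, ?)", ("Ravi", "123"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, row_count)
```

No application program has to repeat the check; every insert or update goes through the same declared rule.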

6. Atomicity Problem
Atomicity means that an operation must happen in its entirety or not at all. It is
difficult to ensure atomicity in a file processing system. For example, consider
transferring Rs. 100 from account A to account B. If a failure occurs during execution,
Rs. 100 could be deducted from account A but not credited to account B, resulting in an
inconsistent database state. It is essential that either both the credit and the debit
occur, or that neither occurs. That is, the funds transfer must be atomic.
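
The funds transfer can be sketched with a database transaction as follows. This is an illustrative example only, using Python's sqlite3 module; the account table, the starting balances and the fail_midway flag (which simulates a failure between the debit and the credit) are all assumptions introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:
                # Hypothetical failure injected between debit and credit.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(conn, "A", "B", 100, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back
ok = transfer(conn, "A", "B", 100)
after_success = balances(conn)   # both debit and credit applied
print(after_failure, after_success)
```

After the simulated failure the balances are unchanged, so Rs. 100 is never deducted from A without being credited to B.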

7. Concurrent Access anomalies


If multiple users update the same data simultaneously, the data can be left in an
inconsistent state. In a file processing system it is very difficult to handle this
using program code, and the result is concurrent access anomalies.
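
One such anomaly, often called a lost update, can be imitated in a few lines of plain Python. This is only a simulation under assumed values: a dictionary stands in for a shared file, and the two "users" are interleaved by hand rather than running concurrently.

```python
# A plain dict stands in for a shared file holding one account balance.
store = {"balance": 100}

# Both users read the balance before either one writes it back.
read_by_user1 = store["balance"]
read_by_user2 = store["balance"]

store["balance"] = read_by_user1 + 50   # user 1 deposits 50
store["balance"] = read_by_user2 - 30   # user 2 withdraws 30, overwriting user 1

lost_update_result = store["balance"]
correct_result = 100 + 50 - 30          # what serial execution would give

print(lost_update_result, correct_result)
```

User 1's deposit disappears because user 2 wrote back a value computed from a stale read.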

8. Security Problems
Enforcing security constraints in a file processing system is very difficult because
application programs are added to the system in an ad hoc manner.

9. Limited Data Sharing


There is no centralized data control in a file system. Each application has its own
private files, and users have little chance to share data outside their own
application.

10. Lengthy Development Time


For each new application, programmers must design their own file formats and
descriptions from scratch.

11. Excess Program Maintenance


80% of the budget is spent on program maintenance.

Data and Information

Data is “raw, unanalysed facts, figures and events”. Data refers to the lowest level of
abstraction: raw input which, when processed or arranged, produces meaningful output.

Information is “useful knowledge derived from the data”. Information is usually the
processed outcome of data. More specifically, it is derived from data.

Knowledge is the appropriate collection of information, such that its intent is to be
useful.

For example:

Researchers who conduct a market research survey might ask members of the public to
complete questionnaires about a product or a service. These completed questionnaires
are data; they are processed and analyzed in order to prepare a report on the survey.
The resulting report is information.

Data → Information → Knowledge → Action

2. Database Management System

A database is a collection of information that is organized so that it can easily be
accessed, managed, and updated. In one view, databases can be classified according to
types of content: bibliographic, full-text, numeric, and images.

A Database Management System is a collection of programs that enables users to create
and maintain a database. In other words, it is general-purpose software that provides
users with the processes of defining, constructing and manipulating the database
for various applications. Examples include Microsoft Access, MySQL, Microsoft SQL
Server and Oracle. The database and the DBMS software together are called a database
system.

Disadvantages of Database Management System

1. Complexity
A database system creates additional complexity and requirements. Supplying and
operating a database management system with several users and databases is
quite costly and demanding.
2. Qualified personnel
The professional operation of a database system requires appropriately trained
staff. Without a qualified database administrator nothing will work for long.
3. Cost of hardware/software
A processor with a high data processing speed and a large memory is
required to run the DBMS software. This means that you have to upgrade the
hardware used for the file-based system. Similarly, DBMS software is also very
costly.
4. Lower efficiency
A database system is multi-purpose software, which is often less efficient than
specialized software that is produced and optimized for exactly one problem.
5. Danger of overkill
For small and simple applications with a single user, a database system is often not
advisable.
6. Database damage
In most organizations, all data is integrated into a single database. If the
database is damaged due to an electrical failure or is corrupted on the storage
media, then your valuable data may be lost forever.
7. Cost of data conversion
When a computer file-based system is replaced with a database system, the data
stored in data files must be converted to database files. Converting the data of
data files into a database is a very difficult and costly process. You have to hire
database and system designers along with application programmers. Alternatively,
you have to take the services of some software house. So a lot of money has to be
paid for developing software.
8. Complexity of backup and recovery

ACID Properties
ACID properties are an important concept for databases. The acronym stands for Atomicity,
Consistency, Isolation, and Durability.

The ACID properties of a DBMS allow safe sharing of data. Without these ACID properties,
everyday occurrences such as using computer systems to buy products would be difficult and
the potential for inaccuracy would be huge. Imagine more than one person trying to buy the
same size and color of a sweater at the same time -- a regular occurrence. The ACID
properties make it possible for the merchant to keep these sweater purchasing transactions
from overlapping each other -- saving the merchant from erroneous inventory and account
balances.

• Atomicity

The phrase "all or nothing" succinctly describes the first ACID property of atomicity.
When an update occurs to a database, either all or none of the update becomes
available to anyone beyond the user or application performing the update. This update
to the database is called a transaction, and it either commits or aborts. This means that
a mere fragment of the update can never be placed into the database, even if a problem
occurs with either the hardware or the software involved.

• Consistency

It states that only valid data will be written to the database. If, for some reason, a
transaction is executed that violates the database’s consistency rules, the entire
transaction will be rolled back and the database will be restored to a state consistent
with those rules. On the other hand, if a transaction successfully executes, it will take
the database from one state that is consistent with the rules to another state that is also
consistent with the rules.

• Isolation

It requires that multiple transactions occurring at the same time do not impact each
other’s execution; that is, one transaction does not interfere with another. If two
transactions run at the same time using the same resource, the system
may not remain consistent. Isolation therefore makes sure that the transactions do not
interfere with each other and that each one executes as if it were running on its own.
When only one transaction is running, isolation is not an issue.

• Durability

A committed (Saved) transaction will not be lost. Durability is ensured through the
use of database backups and transaction logs that facilitate the restoration of
committed transactions in spite of any subsequent software or hardware failures.

Data Abstraction (ANSI-SPARC)


We know that the same thing, if viewed from different angles, produces different
sights. Likewise, the database that we have already created can reveal different aspects
if seen from different levels of abstraction. Let us illustrate this with a simple
example.

A computer reveals the minimum of its internal details when seen from outside. We
do not know what parts it is built with. This is the highest level of abstraction,
meaning very few details are visible. If we open the computer case and look inside at
the hard disc, motherboard, CD drive, CPU and RAM, we are at the middle level of
abstraction. If we move on to open the hard disc and examine its tracks, sectors and
read-write heads, we are at the lowest level of abstraction, where no details are
hidden.

In the same manner, the database can also be viewed from different levels of
abstraction to reveal different levels of detail. Working from the bottom up, we
find that there are three levels of abstraction, or views, in the database. We discuss
them here.

The word schema means arrangement – how we want to arrange things that we have
to store.

• Physical level

It is the lowest level of abstraction. It is also known as the Internal View. It deals
with the description of how raw data items (like 1, ABC, KOL, H2 etc.) are stored
in physical storage (hard disc, CD, tape drive etc.). It also describes the data
type of these data items, the size of the items in the storage media, the location
(physical address) of the items in the storage device and so on. The database
design at the physical level is called the physical schema. This schema is useful for
database application developers and database administrators.


• Logical level

It is the middle level of abstraction. It is known as the Conceptual View, and
deals with the structure of the entire database. This means we want to know about
the attributes of each table, the common attributes in different tables that allow
them to be combined, what kind of data can be entered into these attributes, and so
on. The database design at the logical level is called the logical schema. The
conceptual or logical schema is very useful for database administrators, whose
responsibility is to maintain the entire database.

• External View

The highest level of abstraction is the User View. This is targeted at the end
users. An end user does not need to know everything about the structure of
the entire database, only the details he/she needs to work with.
We may not want the end user to be confused by an astounding amount of
detail by allowing him/her to look at the entire database, or we may
disallow this for the purpose of security, where sensitive information must
remain hidden from unwanted persons. The database administrator may want to
create custom-made tables, keeping in mind the specific needs of each
user. These tables are also known as virtual tables, because they have no separate
physical existence. They are created dynamically for the users at runtime. For
example, in the sample database we created earlier, we have a special officer
whose responsibility is to keep in touch with the parents of any underage student
living in the hostels. That officer does not need to know any details except the
Roll, Name, Address and Age. The database administrator may create a virtual
table with only these four attributes, solely for the use of this officer.
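
The officer's virtual table can be sketched as an SQL view. The example below is a minimal illustration using Python's sqlite3 module; the student table, the extra hostel and fees_due columns, the age limit of 18 and the sample rows are assumptions, since the sample database itself is not shown in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical student table; only Roll, Name, Address and Age come from the text.
conn.execute(
    "CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, address TEXT,"
    " age INTEGER, hostel TEXT, fees_due INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "Asha", "Surat", 16, "H2", 0),
     (2, "Ravi", "Pune", 19, "H1", 500),
     (3, "Meena", "Kolkata", 17, None, 0)],
)

# The view stores no rows of its own; it is evaluated whenever it is queried.
conn.execute(
    "CREATE VIEW underage_hostel_student AS"
    " SELECT roll, name, address, age FROM student"
    " WHERE age < 18 AND hostel IS NOT NULL"
)

cur = conn.execute("SELECT * FROM underage_hostel_student")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns, rows)
```

The officer sees only the four attributes and only the relevant students; the hostel and fee details stay hidden.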

Data Independence
Logical data independence: The ability to change the logical (conceptual) schema
without changing the External schema (User View) is called logical data
independence. For example, the addition or removal of new entities, attributes, or
relationships to the conceptual schema should be possible without having to change
existing external schemas or having to rewrite existing application programs.

Physical data independence: The ability to change the physical schema without
changing the logical schema is called physical data independence. For example, a
change to the internal schema, such as using different file organization or storage
structures, storage devices, or indexing strategy, should be possible without having to
change the conceptual or external schemas.
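
A small sketch of physical data independence, using Python's sqlite3 module: an index is added (a change to the storage and access structures) while the table definition and the query text stay the same. The table, index name and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Surat"), (2, "Pune"), (3, "Surat")])

query = "SELECT cust_id FROM customer WHERE city = 'Surat' ORDER BY cust_id"

before = conn.execute(query).fetchall()

# Physical-level change only: the logical schema and the query are untouched.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

after = conn.execute(query).fetchall()
print(before, after)
```

The query returns the same rows before and after; only the way the DBMS may locate them has changed.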

There is no data independence to speak of at the view level: a change to an external
schema affects no other level, because no level exists above the view level.

Instances and Schemas


The overall design of the database is called the database schema. A relation schema
can be thought of as the basic information describing a table or relation. This includes
a set of column names, the data types associated with each column, and the name
associated with the entire table. For example, a relation schema for the relation called
Students could be expressed using the following representation:
Students(sid: string, name: string, login: string, age: integer, gpa: real)

Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the
database. In practice, the term instance is also used to describe a complete database
environment, including the RDBMS software, table structure, stored procedures and
other functionality.
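
The difference between a schema and an instance can be sketched with the Students relation. This is an illustrative example using Python's sqlite3 module; the mapping of string, integer and real to SQLite's TEXT, INTEGER and REAL, and the inserted row, are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)


def schema(conn):
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, default value, pk).
    return [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(Students)")]


def instance(conn):
    return conn.execute("SELECT * FROM Students ORDER BY sid").fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

# Made-up sample row: the instance changes, the schema does not.
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

schema_after = schema(conn)
instance_after = instance(conn)
print(schema_after, instance_after)
```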

External schema – It is also known as a sub-schema or view schema. It describes the
subset of the database that a particular user group is interested in, in the format
the user wants, and hides the rest. It may contain virtual data that is derived from
the files but is not explicitly stored.

Conceptual schema – It is also known as logical schema. It hides the details of
physical storage structures and concentrates on describing entities, data types,
relationships, operations, and constraints.

Internal schema – It is also known as physical schema. It describes the physical
storage structure of the database. It uses a low-level (physical) data model to
describe the complete details of data storage and access paths.

Database Languages
A database system provides a data definition language to specify the database
schema and a data manipulation language to express database queries and updates.

• Data Manipulation Language

A data-manipulation language (DML) is a language that enables users to access
or manipulate data as organized by the appropriate data model. The types of
access are:

- The retrieval of information stored in the database (SELECT)
- The insertion of new information into the database (INSERT)
- The deletion of information from the database (DELETE)
- The modification of information stored in the database (UPDATE)

There are two types of DML:

- Procedural DMLs require a user to specify what data are needed and
how to get those data. This means that the user must express all the
data access operations that are to be used by calling appropriate
procedures to obtain the required information. Such a procedural DML
retrieves a record, processes it and, based on the results obtained by this
processing, retrieves another record that is processed similarly,
and so on. This process of retrieval continues until the requested data
has been gathered. Typically, procedural DMLs are
embedded in a high-level programming language that contains constructs
to facilitate iteration and handle navigation logic. Network and
hierarchical databases use procedural DMLs.
- Nonprocedural DMLs require a user to specify what data are needed
without specifying how to get those data. They allow
the required data to be specified in a single retrieval or update
statement. The DBMS
translates a DML statement into one or more procedures that
manipulate the required sets of records. This frees the user from having
to know how data structures are internally implemented and what
algorithms are required to retrieve and possibly transform the data,
thus providing users with a considerable degree of data independence.
Non-procedural languages are also called declarative languages.
Relational DBMSs usually include some form of non-procedural
language for data manipulation. Non-procedural DMLs are normally
easier to learn and use than procedural DMLs, as less work is done by
the user and more by the DBMS. SQL is a non-procedural DML. A query
is a statement requesting the retrieval of information. The portion of a
DML that involves information retrieval is called a query language.
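
The contrast between the two styles can be imitated in a short sketch with Python's sqlite3 module. The first half only mimics the record-at-a-time pattern of a procedural DML (it is not a real navigational database API); the second half states the request declaratively in SQL. The table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("Asha", "Surat"), ("Ravi", "Pune"), ("Meena", "Surat")])

# Procedural style: fetch record after record and decide what to do with each.
procedural = []
cursor = conn.execute("SELECT cust_name, city FROM customer")
record = cursor.fetchone()
while record is not None:
    if record[1] == "Surat":
        procedural.append(record[0])
    record = cursor.fetchone()

# Non-procedural style: state what is wanted; the DBMS decides how to find it.
declarative = [r[0] for r in conn.execute(
    "SELECT cust_name FROM customer WHERE city = 'Surat'")]

print(sorted(procedural), sorted(declarative))
```

Both approaches produce the same names, but in the second the loop and the test are left to the DBMS.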

In general, a query (noun) is a question, often required to be expressed in a
formal way. The word derives from the Latin quaere (the imperative form of
quaerere, meaning to ask or seek). In computers, what a user of a search
engine or database enters is sometimes called the query. To query (verb)
means to submit a query (noun).

A database query can be either a select query or an action query. A select
query is simply a data retrieval query. An action query can ask for additional
operations on the data, such as insertion, updating, or deletion.

Languages used to interact with databases are called query languages, of
which the Structured Query Language (SQL) is the well-known standard.

• Data Definition Language

A database schema is specified by a set of definitions expressed in a special
language called a data-definition language (DDL). The result of compiling
DDL statements is a set of tables that is stored in a special file called the data
dictionary or data directory.

A data dictionary is a file that contains metadata, that is, data about data. This
file is consulted before actual data are read or modified in the database system.
The storage structure and access methods used by the database system are
specified by a set of definitions in a special type of DDL called a data storage
and definition language.

With the DDL, a database schema can be defined and also changed later.
Typical DDL operations (with their respective keywords in the structured
query language SQL):

- Creation of tables and definition of attributes (CREATE TABLE ...)
- Change of tables by adding or deleting attributes (ALTER TABLE …)
- Deletion of a whole table including its content (!) (DROP TABLE …)
- TRUNCATE - remove all records from a table, including all space
allocated for the records
- COMMENT - add comments to the data dictionary
- RENAME - rename an object
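
Some of these DDL operations can be tried out with Python's sqlite3 module, as in the sketch below. The product table is hypothetical, and the example is limited to statements SQLite accepts (SQLite has no TRUNCATE statement). SQLite's sqlite_master catalog table plays the role of the data dictionary here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def columns(conn, table):
    # Column names as recorded by the DBMS for the given table.
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


def tables(conn):
    # sqlite_master is SQLite's catalog: metadata about the stored objects.
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]


conn.execute("CREATE TABLE product (product_code TEXT, price INTEGER)")
after_create = columns(conn, "product")

conn.execute("ALTER TABLE product ADD COLUMN description TEXT")
after_alter = columns(conn, "product")

conn.execute("ALTER TABLE product RENAME TO item")
after_rename = tables(conn)

conn.execute("DROP TABLE item")
after_drop = tables(conn)

print(after_create, after_alter, after_rename, after_drop)
```

Each DDL statement changes the metadata held in the catalog, which is what the helper functions read back.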

The set of rules for constructing queries is known as a query language.
Different DBMSs support different query languages, although there is a semi-
standardized query language called SQL (structured query language).
Sophisticated languages for managing database systems are called fourth-
generation languages, or 4GLs for short.

• Data Control Language

Data Control Language (DCL) statements control access to the database. Some examples:

- GRANT - gives users access privileges to the database
- REVOKE - withdraws access privileges given with the GRANT
command

• Transaction Control Language

Transaction Control Language (TCL) statements are used to manage the changes made
by DML statements. They allow statements to be grouped together into logical
transactions.

- COMMIT - save work done
- SAVEPOINT - identify a point in a transaction to which you can later
roll back
- ROLLBACK - restore the database to its state as of the last COMMIT
- SET TRANSACTION - change transaction options like isolation level
and what rollback segment to use
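
COMMIT, SAVEPOINT and ROLLBACK can be sketched with Python's sqlite3 module as follows. The log table and its entries are hypothetical; SET TRANSACTION is left out because SQLite does not provide it.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so that
# BEGIN, COMMIT and ROLLBACK are issued explicitly by the code below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (entry TEXT)")


def entries(conn):
    return [r[0] for r in conn.execute("SELECT entry FROM log ORDER BY rowid")]


conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('first')")
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO log VALUES ('second')")
conn.execute("ROLLBACK TO sp1")   # undoes only the work done after the savepoint
conn.execute("COMMIT")            # makes 'first' permanent
after_commit = entries(conn)

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('third')")
conn.execute("ROLLBACK")          # discards everything since the last COMMIT
after_rollback = entries(conn)

print(after_commit, after_rollback)
```

Only the entry inserted before the savepoint and then committed survives both transactions.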

Database Users
There are four different types of database system users, differentiated by the way that
they expect to interact with the system. A primary goal of a database system is to
provide an environment for retrieving information from and storing new information
into the database.

• Naive Users
Naive users are unsophisticated users who interact with the system by
invoking one of the application programs that have been written previously. For
example, a bank teller who needs to transfer $50 from account A to account B
invokes a program called transfer. This program asks the teller for the amount of
money to be transferred, the account from which the money is to be transferred,
and the account to which the money is to be transferred. As another example,
consider a user who wishes to find her account balance over the World Wide
Web. Such a user may access a form, where she enters her account number. An
application program at the Web server then retrieves the account balance, using
the given account number, and passes this information back to the user. The
typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated
from the database.
• Application Programmers
Application programmers are computer professionals who write application
programs and interact with the system through DML calls, which are embedded in
a program written in a host language (for example COBOL, C). Since the DML
syntax is different from the host language syntax, DML calls are usually prefaced
by a special character so that the appropriate code can be generated. A special
preprocessor, called the DML precompiler, converts the DML statements to
normal procedure calls in the host language. There are special types of
programming languages that combine control structures of Pascal-like languages
with control structures for the manipulation of a database object.
• Sophisticated Users
Sophisticated users interact with the system without writing programs. Instead,
they form their requests in a database query language. Each such query is
submitted to a query processor, whose function is to break down DML statements
into instructions that the storage manager understands.
• Specialized Users
Specialized users are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework. Among
these applications are computer aided design systems, knowledge-base and expert
systems.

Database Administrator
The database administrator (DBA) is the person who controls the data in the
organization. The DBA's main function is to control the information of the
particular enterprise.

Functions of a DBA
The functions performed by a DBA are the following:
• Schema Definition
The Database Administrator creates the database schema by executing DDL
statements. The schema includes the logical structure of database tables (relations),
such as the data types of attributes, the length of attributes, integrity constraints etc.
• Storage structure and access method definition
Database tables or indexes are stored in structures such as flat files, heaps, B+
trees etc.
• Schema and physical organization modification
The DBA carries out changes to the existing schema and physical organization.
• Granting authorization for data access
The DBA grants different access rights to users according to their level.
Ordinary users might have highly restricted access to data; as you go up the
hierarchy towards the administrator, you get more access rights.
• Routine Maintenance
Some of the routine maintenance activities of a DBA are given below.
o Taking backups of the database periodically.
o Ensuring enough disk space is available at all times.
o Monitoring jobs running on the database.
o Ensuring that performance is not degraded by an expensive task submitted by
some user.
o Performance tuning.

Database System Structure


The major components of a database system are described below.
• Data Definition Language Compiler
The DDL Compiler converts the data definition statements into a set of tables.
These tables contain the metadata concerning the database and are in a form
that can be used by other components of the DBMS.

• Data Manager
The data manager is the central software component of the DBMS. It is
sometimes referred to as the database control system. One of the functions of
the data manager is to convert operations in the user’s queries, coming directly
via the query processor or indirectly via an application program, from the
user’s logical view to a physical file system. The data manager is responsible
for interfacing with the file system. In addition, the tasks of enforcing
constraints to maintain the consistency and integrity of the data, as well as its
security, are also performed by the data manager. Synchronizing the
simultaneous operations performed by concurrent users is under the control of
the data manager. It is also entrusted with backup and recovery operations.
• File Manager
Responsibility for the structure of the files and for managing the file space rests
with the file manager. It is also responsible for locating the block containing
the required record, requesting this block from the disk manager, and
transmitting the required record to the data manager. The file manager can be
implemented using an interface to the existing file subsystem provided by the
operating system of the host computer, or it can include a file
subsystem written especially for the DBMS.
• Disk Manager
The disk manager is part of the operating system of the host computer, and all
physical input and output operations are performed by it. The disk manager
transfers the block or page requested by the file manager so that the latter need
not be concerned with the physical characteristics of the underlying storage
media.
• Query Processor
The database user retrieves data by formulating a query in the data
manipulation language provided with the database. The query processor is
used to interpret the online user’s query and convert it into an efficient series
of operations in a form capable of being sent to the data manager for execution.
The query processor uses the data dictionary to find the structure of the
relevant portion of the database and uses this information in modifying the
query and preparing an optimal plan to access the database.
• Data Files
Data files contain the data portion of the database.
• Data Dictionary
Information pertaining to the structure and usage of data contained in the
database, the metadata, is maintained in a data dictionary. The term system
catalog also describes this metadata. The data dictionary, which is a database
itself, documents the data. Each database user can consult the data dictionary
to learn what each piece of data and the various synonyms of the data fields mean.
In an integrated system (i.e., in a system where the data dictionary is
part of the DBMS) the data dictionary stores information concerning the
external, conceptual and internal levels of the database. It contains the source
of each data-field value, the frequency of its use and an audit trail concerning
updates, including the who and when of each update.
Currently data dictionary systems are available as add-ons to the
DBMS. Standards have yet to be evolved for integrating the data dictionary
facility with the DBMS so that the two databases, one for metadata and the
other for data, can be manipulated using a unified DDL/DML.
• Access Aids
To improve the performance of a DBMS, a set of access aids in the form of
indexes is usually provided in a database system. Commands are provided to
build and destroy additional temporary indexes.

Data Models


A data model is a collection of concepts that can be used to describe the structure of a
database. The way in which information is subdivided and managed within a database
is referred to as the data model used by the DBMS. Each DBMS is based on a
particular data model. A data model is a description of both a container for data and
methodology for storing and retrieving data from container. A user must choose a
DBMS which is suitable for the project.

Data models can be classified into three major groups. They are:

• Object-Based logical models.
• Record-Based logical models.
• Physical models.

A) Object Based logical Models:

These models are used to describe data at the logical and view levels. They include
the entity-relationship (E-R) model, which will be covered in detail shortly.

B) Record Based Logical models:

These models are used to describe data at the logical and view levels. The database is
structured in fixed-format records of different types. Each record type has a fixed
number of fields, and each field is of fixed length. The following are the three
important record-based logical models.

• Relational Model
• Network Model
• Hierarchical Model

Relational Model:

The relational model was developed by E. F. Codd. A relational database allows the
definition of data structures, storage and retrieval operations and integrity
constraints. In such a database the data and the relations between them are organized
in tables. A table is a collection of records, and each record in a table contains the
same fields.

Properties of Relational Tables:

• Values are atomic
• Each row is unique
• Column values are of the same kind
• The sequence of columns is insignificant
• The sequence of rows is insignificant
• Each column has a unique name

Certain fields may be designated as keys, which means that searches for specific
values of that field will use indexing to speed them up. Where fields in two different
tables take values from the same set, a join operation can be performed to select
related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders"
table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs so to calculate a given customer's bill you would
sum the prices of all products ordered by that customer by joining on the product-code
fields of the two tables. This can be extended to joining multiple tables on multiple
fields. Because these relationships are only specified at retrieval time, relational
databases are classed as dynamic database management systems. The relational
database model is based on relational algebra.
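The bill calculation described above can be sketched with Python's built-in sqlite3 module. The two tables follow the text; the column spellings (customer_id, product_code), the sample rows and the helper name customer_bill are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price INTEGER);
    CREATE TABLE orders   (customer_id INTEGER, product_code TEXT);
    -- Hypothetical sample rows, prices in whole currency units
    INSERT INTO products VALUES ('P1', 10), ('P2', 25), ('P3', 40);
    INSERT INTO orders   VALUES (1, 'P1'), (1, 'P3'), (2, 'P2');
""")

def customer_bill(customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        """SELECT COALESCE(SUM(p.price), 0)
           FROM orders AS o
           JOIN products AS p ON o.product_code = p.product_code
           WHERE o.customer_id = ?""",
        (customer_id,),
    ).fetchone()
    return row[0]

bill = customer_bill(1)  # 10 + 40 = 50 with the sample rows above
```

The join is written only in the query, not in the table definitions, which is the sense in which the relationship is "specified at retrieval time".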

Network Model:

The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than one
parent per child. So, the network model permitted the modeling of many-to-many
relationships in data. The basic data modeling construct in the network model is the
set construct. A set consists of an owner record type, a set name, and a member record
type. A member record type can have that role in more than one set, hence the
multiparent concept is supported. An owner record type can also be a member or
owner in another set. The data model is a simple network, and link and intersection
record types (called junction records by IDMS) may exist, as well as sets between
them. Thus, the complete network of relationships is represented by several pairwise
sets; in each set some (one) record type is owner (at the tail of the network arrow) and
one or more record types are members (at the head of the relationship arrow). Usually,
a set defines a 1:M relationship, although 1:1 is permitted.

• Data are represented as collections of records
• Relationships are represented as links
• Each record is a collection of fields:

type customer = record
    customer-name: string;
    customer-street: string;
    customer-city: string;
end

type account = record
    account-number: string;
    balance: integer;
end
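As a rough illustration of records connected by links, the two record types above can be mimicked in Python. This is only a sketch: the accounts list standing in for links, the sample records and the helper name owners are assumptions, not part of any network DBMS.

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    account_number: str
    balance: int

@dataclass
class Customer:
    customer_name: str
    customer_street: str
    customer_city: str
    # Links from this customer record to its account records
    accounts: list = field(default_factory=list)

# Hypothetical sample records
a1 = Account("A-101", 500)
a2 = Account("A-215", 700)
alice = Customer("Alice", "Main", "Surat", accounts=[a1, a2])
bob = Customer("Bob", "Park", "Baroda", accounts=[a2])  # a2 is shared by two customers

def owners(account, customers):
    """Follow the links backwards: which customers point at this account?"""
    return [c.customer_name for c in customers if account in c.accounts]
```

Here owners(a2, [alice, bob]) finds both customers, which shows a member record having more than one parent.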

Hierarchical Model:

The hierarchical data model organizes data in a tree structure. There is a hierarchy of
parent and child data segments. This structure implies that a record can have repeating
information, generally in the child data segments. To create links between these
record types, the hierarchical model uses Parent Child Relationships. These are a 1:N
mapping between record types. For example, an organization might store information
about an employee, such as name, employee number, department, salary. The
organization might also store information about an employee's children, such as name
and date of birth. The employee and children data form a hierarchy, where the
employee data represents the parent segment and the children data represents the child
segment. If an employee has three children, then there would be three child segments
associated with one employee segment. In a hierarchical database the parent-child
relationship is one to many. This restricts a child segment to having only one parent
segment. Hierarchical DBMSs were popular from the late 1960s, with the
introduction of IBM's Information Management System (IMS) DBMS, through the
1970s.

• Records and links are also used (as in the network model)
• The database is a collection of rooted trees, i.e. a forest

C) Physical Data Models:

A physical data model is a representation of a data design which takes into account
the facilities and constraints of a given database management system. In the lifecycle
of a project it is typically derived from a logical data model. A physical database
model shows all table structures, including column name, column data type,
constraints definition, indexes, table partitioning, linking tables etc. The physical data
model can usually be used to calculate storage estimates and may include specific
storage allocation details for a given database system.

Entity and Attributes


Entity : An entity is a thing or object in the real world that has an independent existence
and is distinguishable from other objects. Every object has some characteristics, which are
recorded as the attributes of the entity. An entity is therefore represented by a set of
attributes, that is, descriptive properties of the entity. For example, every fruit has a
taste, a color and vitamins. As another example, every employee has an employee_id, Name,
Designation and Salary; all of these are attributes of the employee entity.

Entity Type : A collection (set) of entities that have the same attributes.

Entity Set : The collection of all entities of a particular entity type in the database.
The attributes of an entity type are possessed by every member of the entity set.

For example

A company has many employees. Each employee is an entity (e1, e2, e3, ...). Because all
these entities have the same attributes, they belong to the ENTITY TYPE Employee, and the
set {e1, e2, ...} is called the entity set.

By analogy with fruit:

fruit = entity type (like Employee)
apple, banana or orange = entity (like e1, e2 or e3)
bucket of apples, bananas and mangoes = entity set (like {e1, e2, e3, ...})

Domain
For each attribute there is a set of permitted values, called the domain of that attribute.
For example, the permitted values of the attribute RollNo might be 1, 2, 3, 4, ... or
09BCA01, 09BCA02, ... (where 09 is the joining year of the BCA course and 01 is the roll
number). All of these values together form the domain of the attribute RollNo.

Types of Attributes
• Simple attribute : A simple attribute consists of a single atomic value, that is, a
value that cannot be subdivided. For example, the attributes age and sex are simple
attributes.
• Composite attribute : A composite attribute is an attribute that can be further
subdivided. For example, the attribute ADDRESS can be subdivided into street, city,
state and zip code.
• Single-valued attribute : A single-valued attribute can have only one value. For
example, a person has only one 'date of birth' and one 'age'. A single-valued attribute
may be simple or composite: 'date of birth' is a composite attribute (day, month, year)
and 'age' is a simple attribute, but both are single-valued.
• Multi-valued attribute : A multi-valued attribute can have multiple values. For
instance, a person may have multiple phone numbers or multiple degrees.
• Stored and derived attributes : The value of a derived attribute is computed from a
stored attribute, which supplies a value to the related attribute. For example, 'Date of
Birth' (DOB) of a person is a stored attribute, and the value of the attribute 'AGE' can
be derived by subtracting the date of birth from the current date.
• Complex attribute : A complex attribute is both composite and multi-valued.
• Prime and non-prime attributes : A prime attribute is an attribute that is part of any
candidate key. Attributes that are not part of any candidate key are non-prime attributes.
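The attribute types above can be pictured with a small Python sketch. The class names Person and Address, the field names and the sample values are all hypothetical; the point is only to show where each kind of attribute would sit.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:              # composite attribute: made of several parts
    street: str
    city: str
    state: str
    zip_code: str

@dataclass
class Person:
    name: str               # simple, single-valued attribute
    date_of_birth: date     # stored attribute
    address: Address        # composite attribute
    phone_numbers: list = field(default_factory=list)  # multi-valued attribute

    def age(self, today):
        """Derived attribute: computed from the stored date_of_birth."""
        years = today.year - self.date_of_birth.year
        # Subtract one if this year's birthday has not happened yet
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years

# Hypothetical sample entity
p = Person("Jigisha", date(1990, 6, 15),
           Address("Ring Road", "Surat", "Gujarat", "395002"),
           ["98250 00000", "0261 000000"])
```

Because age is computed on demand, it never has to be stored or kept up to date, which is the usual reason for treating it as a derived attribute.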

The CREATE TABLE Command

The CREATE TABLE command defines each column of the table uniquely. Each column has at
least three properties: a name, a datatype and a size (i.e. column width). Column
definitions are separated from one another by commas. A table name may use the letters
A-Z and a-z, the digits 0-9 and the underscore character _ (no other special characters
are allowed), and it must start with a letter. SQL reserved words such as CREATE and
SELECT cannot be used as table names.

• Syntax

CREATE TABLE table_name
(ColumnName1 Datatype(size),
ColumnName2 Datatype(size),
. . ,
ColumnNameN Datatype(size))

• Example

CREATE TABLE Student_Info
(RollNo Number,
Name Text(20),
Address Text(50))

Executing the above DDL statement creates the table Student_Info, which holds
information about students: their RollNo, Name and Address. In this table a name can
hold at most 20 characters and an address at most 50 characters. The statement also
updates the data dictionary.
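To see a table definition being recorded in a catalog, the statement above can be run in SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite is used only because it ships with Python, it accepts the type names Number and Text(20) but does not enforce the declared lengths, and its catalog (sqlite_master, PRAGMA table_info) is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE Student_Info
    (RollNo  Number,
     Name    Text(20),
     Address Text(50))
""")

# SQLite keeps its catalog of tables in sqlite_master
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info lists the columns recorded for one table:
# each row is (cid, name, declared type, notnull, default, pk)
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(Student_Info)")]
```

After the CREATE TABLE statement runs, tables contains Student_Info and columns lists its three columns, without any row of student data having been inserted.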

Inserting Data into tables

Once a table is created, the most natural thing to do is to fill it with data to be
manipulated later. When inserting a single row of data into the table, the insert
operation creates a new (empty) row in the database table and then fills the specified
columns with the values passed by the SQL INSERT statement.
Note : character expressions placed within the INSERT INTO statement must be enclosed in
quotes. Standard SQL uses single quotes; some systems also accept double quotes.

• Syntax

INSERT INTO Table_Name(ColumnName1, ColumnName2, ……, ColumnNameN)
VALUES(expression1, expression2, ………… expressionN)

• Example

INSERT INTO Student_Info(RollNo, Name, Address)
VALUES(1,'jigisha','surat')

In an INSERT INTO statement, table columns and values have a one-to-one relationship:
the first value is inserted into the first column, the second value into the second
column, and so on.
Hence, if an INSERT INTO statement supplies exactly as many values as there are columns
and the values are listed in exactly the same order as the table columns, there is no
need to give the column names.
In two situations, however, the column names must be given:
• There are fewer values than there are columns in the table. In this case, specify the
names of the columns to be filled together with their corresponding values.
• The sequence of the columns in the table is not known. In this case, specify the column
names along with the values.

• Example
• If we know the sequence of the columns:
INSERT INTO Student_Info
VALUES(2,'Ami','Mumbai')
• If we do not know the sequence of the columns:
INSERT INTO Student_Info(Address, Name, RollNo)
VALUES('Baroda','Mehul',3)
• If the RollNo column is an auto-number column:
INSERT INTO Student_Info(Name,Address)
VALUES('Mukesh','Surat')
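The three forms of INSERT can be tried together in SQLite through Python's sqlite3 module. One assumption is needed for the auto-number case: the sketch declares RollNo as INTEGER PRIMARY KEY, which is SQLite's way of generating a number when the column is omitted; other systems use different syntax for this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: INTEGER PRIMARY KEY stands in for an auto-number column in SQLite
conn.execute("""CREATE TABLE Student_Info
                (RollNo INTEGER PRIMARY KEY, Name TEXT, Address TEXT)""")

# All columns named, values in the same order
conn.execute("INSERT INTO Student_Info(RollNo, Name, Address) VALUES (1, 'jigisha', 'surat')")
# No column list: values must follow the table's column order
conn.execute("INSERT INTO Student_Info VALUES (2, 'Ami', 'Mumbai')")
# Column list in a different order: each value goes to the column named at its position
conn.execute("INSERT INTO Student_Info(Address, Name, RollNo) VALUES ('Baroda', 'Mehul', 3)")
# RollNo omitted: SQLite generates the next number
conn.execute("INSERT INTO Student_Info(Name, Address) VALUES ('Mukesh', 'Surat')")

rows = conn.execute(
    "SELECT RollNo, Name, Address FROM Student_Info ORDER BY RollNo").fetchall()
```

Reading rows back shows that the third insert stored 'Mehul' under Name and 'Baroda' under Address even though the values were supplied in a different order.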

Keys Constraints
A key is a single field or a combination of multiple fields. Its purpose is to access or
retrieve data rows from a table as required. Keys are defined on tables so that the stored
data can be accessed or sequenced quickly and smoothly. They are also used to create links
between different tables.

• Super Key
A super key is a combination of one or more columns in a table that can be used to
identify a record in the table uniquely. A table can have any number of super keys. For
example,
Employee(Empl_ID, Name, Address, Salary, Department_ID)

Super keys
1 Empl_ID
2 Empl_ID, Name
3 Empl_ID, Address
4 Empl_ID, Department_ID
5 Empl_ID, Salary
6 Name, Address
7 Name, Address, Department_ID
and so on, since any combination of columns that identifies the records uniquely is a
super key.

• Candidate Key

A column or combination of columns that uniquely identifies a record in a table
without the need for any external data is called a candidate key. A candidate key is a
minimal super key: if any column is removed from it, the remaining columns no longer
identify records uniquely. The candidate keys are therefore a subset of the super keys.
Depending on the need and situation, a table may have one or more candidate keys, and
one of them can be used as the primary key of the table.

Candidate keys for the above example

1 Empl_ID
2 Name, Address

A super key may contain extraneous attributes, but {Empl_ID} and {Name, Address} do
not: each is just enough to retrieve a unique record, so both are candidate keys.
Here Empl_ID, Name and Address are prime attributes. The other attributes, which are not
part of any candidate key, are non-prime attributes.
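The difference between super keys and candidate keys can be checked mechanically on a small sample. The following sketch is illustrative only: the four rows are hypothetical, and uniqueness on a sample can only suggest keys, since a real key is a rule about every row the table may ever hold.

```python
from itertools import combinations

ATTRS = ("Empl_ID", "Name", "Address", "Salary", "Department_ID")

# Hypothetical sample rows, in ATTRS order
ROWS = [
    (1, "Asha", "Surat", 1000, "D1"),
    (2, "Asha", "Baroda", 1000, "D1"),
    (3, "Ravi", "Surat", 1000, "D1"),
    (4, "Ravi", "Baroda", 1000, "D1"),
]

def is_unique(cols):
    """True if no two sample rows agree on every column in cols."""
    idx = [ATTRS.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

# Every non-empty attribute combination that is unique on the sample
super_keys = [combo
              for size in range(1, len(ATTRS) + 1)
              for combo in combinations(ATTRS, size)
              if is_unique(combo)]

# A candidate key is a super key with no smaller super key inside it
candidate_keys = [k for k in super_keys
                  if not any(set(other) < set(k) for other in super_keys)]
```

On this sample there are many super keys but only two candidate keys, (Empl_ID) and (Name, Address), matching the example in the text.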

• Primary Key
A candidate key that the database designer chooses for the unique identification of
each row in a table, and that has the NOT NULL constraint attached to it, is known as the
primary key. It can either be part of the actual record itself, or it can be an artificial
field (one that has nothing to do with the actual record). A primary key can consist of
one or more fields of a table. When multiple fields are used as a primary key, they are
called a composite key.
In the above example the designer can select {Empl_ID} as the primary key. If the designer
selects {Name, Address} instead, it becomes a composite key (composite primary key).

Primary key constraint defined at column level

• Syntax
ColumnName Datatype(size) PRIMARY KEY

• Example

CREATE TABLE Student
(RollNo Number PRIMARY KEY,
Name Text,
Address Text)

Primary key constraint defined at table level

• Syntax
PRIMARY KEY(ColumnName, ColumnName)

• Example

CREATE TABLE Student
(RollNo Number,
Name Text,
Address Text,
PRIMARY KEY(Name, Address))
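The effect of the table-level (composite) primary key can be observed in SQLite through Python's sqlite3 module. The sample rows are hypothetical, and the column types are written in SQLite's own spelling (INTEGER, TEXT) as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Student
                (RollNo INTEGER,
                 Name TEXT,
                 Address TEXT,
                 PRIMARY KEY (Name, Address))""")

conn.execute("INSERT INTO Student VALUES (1, 'Ami', 'Mumbai')")
# Same Name but a different Address: the (Name, Address) pair is still unique
conn.execute("INSERT INTO Student VALUES (2, 'Ami', 'Surat')")

try:
    # Repeats an existing (Name, Address) pair, so the key is violated
    conn.execute("INSERT INTO Student VALUES (3, 'Ami', 'Mumbai')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
```

Only the combination of the two columns has to be unique; either column on its own may repeat.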

• Alternate Key
An alternate key cannot be defined separately from a candidate key. If a table has two
or more candidate keys and one of them is chosen as the primary key, the other candidate
keys are known as the alternate keys of that table. Here, if the designer chooses
{Empl_ID} as the primary key, then {Name, Address} becomes an alternate key.

• Unique Key
A unique key is a column or combination of columns that can be used to identify a
record in a table uniquely, but which may contain NULL values. It is similar to a primary
key, with the following differences:

1. A table may have more than one unique key, depending on the design of the table,
but it can have only one primary key.

2. The values in unique key columns may be NULL. The values in primary key
columns, however, cannot be NULL.

Unique key constraint defined at column level

• Syntax
ColumnName Datatype(size) UNIQUE

• Example

CREATE TABLE Student
(RollNo Number UNIQUE,
Name Text,
Address Text)

Unique key constraint defined at table level

• Syntax
UNIQUE(ColumnName, ColumnName)

• Example

CREATE TABLE Student
(RollNo Number,
Name Text,
Address Text,
UNIQUE(Name, Address))
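A column-level UNIQUE constraint can be tried in SQLite through Python's sqlite3 module. The sample rows are hypothetical. Note as an assumption of this sketch that SQLite lets several rows hold NULL in a unique column; how many NULLs a unique key may contain differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER UNIQUE, Name TEXT, Address TEXT)")

conn.execute("INSERT INTO Student VALUES (1, 'Ami', 'Mumbai')")
conn.execute("INSERT INTO Student VALUES (NULL, 'Mehul', 'Baroda')")  # NULL is accepted
conn.execute("INSERT INTO Student VALUES (NULL, 'Mukesh', 'Surat')")  # SQLite accepts a second NULL

try:
    conn.execute("INSERT INTO Student VALUES (1, 'Jigisha', 'Surat')")  # repeats RollNo 1
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE RollNo IS NULL").fetchone()[0]
```

The repeated value 1 is refused, while the NULL entries are stored, which is the main practical difference from a primary key.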

• Secondary Key
Attributes that do not form a super key but can still be used to identify records
(not uniquely) are known as a secondary key.
For example, Department_ID does not contain unique information, because more than
one employee works in a department. The Department_ID field can nevertheless be used to
find the employees who belong to a particular department, for example to list the
employees who work in the accounts department.

• Foreign Key
Foreign keys represent relationships between tables. A foreign key is a column or a
group of columns whose values are derived from the primary key or unique key of
some other table or of the same table. The table in which the foreign key is defined is
called the foreign table, detail table or child table. The table that defines the primary
or unique key and is referenced by the foreign key is called the primary table, master
table or parent table. For example,

• Create master table (Department table)

CREATE TABLE Department
(DeptCode Text(5) PRIMARY KEY,
DeptName Text)

DEPARTMENT
DeptCode  DeptName
AC        Accounting
IS        Information Systems
MK        Marketing
RI        Receiving & Inventory
SL        Sales

• Create detail table (Employee table)

CREATE TABLE Employee
(EmployeeID Text(10) PRIMARY KEY,
LastName Text(20),
FirstName Text(20),
DeptCode Text(5) REFERENCES Department)

EMPLOYEE
EmployeeID  LastName    FirstName   DeptCode
EN1-10      Schaaf      Carol       SL
EN1-12      Murray      Gayle       AC
EN1-15      Baranco     Steve       MK
EN1-16      Racich      Kristine    RI
EN1-19      Zumbo       Barbara     IS
EN1-20      Gordon      Daniel      SL
EN1-22      Rivet       Jacqueline  MK
EN1-23      Rosyln      Betsy       RI
EN1-25      Strick      Will        IS
EN1-26      Shipe       Susan       MK
EN1-27      Fink        Joseph      SL
EN1-28      Rubinstein  Sara        AC
EN1-30      Coleman     Michael     RI

In the above example, Department is the master table and Employee is the detail table.
The DeptCode attribute of the Department table is used as a foreign key in the Employee
table.

• This constraint establishes a relationship between records. It ensures that records
cannot be inserted into a detail table if the corresponding records do not exist in the
master table. For example, suppose we create a new purchase department whose DeptCode
is PR. We cannot assign any employee to this department (that is, we cannot enter any
record in the Employee table that has PR as its DeptCode) until we make an entry for the
purchase department in the Department table (master table).
• This constraint also ensures that records of the master table cannot be deleted
while corresponding records exist in the detail table. For example, if we want to close
the Information Systems (IS) department, we cannot directly delete its entry from the
Department table (master table). First we have to release all the employees who work in
the IS department, which means deleting every record whose DeptCode is IS from the
Employee table (detail table). Only then can we remove the department IS from the
Department table (master table).
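Both rules can be watched in action with SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite checks foreign keys only after PRAGMA foreign_keys = ON, only two departments and one employee from the sample data are loaded, the employee 'EN1-31' is hypothetical, and the helper name rejected is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Department (DeptCode TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee   (EmployeeID TEXT PRIMARY KEY,
                             LastName TEXT, FirstName TEXT,
                             DeptCode TEXT REFERENCES Department(DeptCode));
    INSERT INTO Department VALUES ('IS', 'Information Systems'), ('SL', 'Sales');
    INSERT INTO Employee   VALUES ('EN1-19', 'Zumbo', 'Barbara', 'IS');
""")

def rejected(sql):
    """Run one statement and report whether a constraint stopped it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# There is no 'PR' row in Department yet, so this (hypothetical) employee is refused
insert_blocked = rejected("INSERT INTO Employee VALUES ('EN1-31', 'Doe', 'Pat', 'PR')")
# 'IS' still has an employee, so the department row cannot be deleted
delete_blocked = rejected("DELETE FROM Department WHERE DeptCode = 'IS'")

# Remove the dependent employees first; then the same delete goes through
conn.execute("DELETE FROM Employee WHERE DeptCode = 'IS'")
delete_blocked_after_cleanup = rejected("DELETE FROM Department WHERE DeptCode = 'IS'")
```

The first two statements are refused and the last one succeeds, which mirrors the order of operations described in the two points above.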

• Strong Entity Set and Weak Entity Set

An entity set that does not possess sufficient attributes to form a primary key is called
a weak entity set. One that does have a primary key is called a strong entity set.

The discriminator (or partial key) of a weak entity set is the set of attributes that
distinguishes among the entities of the weak entity set that depend on the same
identifying entity.

The existence of a weak entity set depends on the existence of an identifying entity set.
The weak entity set must relate to the identifying entity set via a total, one-to-many
relationship set from the identifying entity set to the weak entity set.

• Partial Key
An attribute is a partial key if a key from a related entity type must be used in
conjunction with it to uniquely identify instances of the corresponding entity set. A
partial key is thus only partially unique. Partial keys are used for weak entities.
For example, suppose "Course Number" is an attribute of the Course entity type in our
design for a University database. Suppose Course Number alone cannot uniquely
identify courses. Rather, to identify a course we must include the Department Number
attribute as well. Course Number is a partial key.
Although a weak entity set does not have a primary key, we nevertheless need
a means of distinguishing among all those entities in the weak entity set that depend
on one particular strong entity. The discriminator or partial key of a weak entity set
is a set of attributes that allows this distinction to be made. For example, the
discriminator of the weak entity set payment is the attribute payment-number, since,
for each loan, a payment number uniquely identifies one single payment for that loan.

Types of Integrity Constraints

There are many types of constraints. We have already seen the primary key, unique key
and foreign key constraints. We now look at some other constraints.

• Domain Constraint
A domain of possible values must be associated with every attribute. Declaring an
attribute to be of a particular domain acts as a constraint on the values that it can take.
For example, if we declare the RollNo attribute from the earlier example as an integer,
its domain contains only 1, 2, 3, and so on (09BCA01, 09BCA02, ... are no longer
in the domain of RollNo). This is a domain constraint. Domain constraints are
the most elementary form of integrity constraint. They are tested easily by the system
whenever a new data item is entered into the database.

• NOT NULL Constraint

By default, a column can hold NULL. If you do not want to allow NULL values in a
column, place a constraint on the column specifying that NULL is not an allowable
value. For example,

CREATE TABLE Customer
(CID Number NOT NULL,
Name Text NOT NULL,
Address Text)

Now the columns CID and Name cannot contain NULL values, while Address can.

An attempt to execute the following SQL statement,

INSERT INTO Customer (Name, Address) VALUES ('Hitesh','Surat');

will result in an error, because it would leave column "CID" NULL, which
violates the NOT NULL constraint on that column.
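The failing insert can be reproduced in SQLite through Python's sqlite3 module. The column types are written in SQLite's spelling as an assumption, and the first, successful row ('Ami') is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customer
                (CID INTEGER NOT NULL,
                 Name TEXT NOT NULL,
                 Address TEXT)""")

# Address is left out and simply stays NULL, which is allowed
conn.execute("INSERT INTO Customer (CID, Name) VALUES (1, 'Ami')")

try:
    # CID is left out, so it would be NULL: the constraint refuses the row
    conn.execute("INSERT INTO Customer (Name, Address) VALUES ('Hitesh', 'Surat')")
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)

stored = conn.execute("SELECT CID, Name, Address FROM Customer").fetchall()
```

Only the first row is stored; error holds the message that the database reported for the second one.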

• Default Constraint

The default constraint provides a default value for a column when the INSERT INTO
statement does not provide a specific value. For example,

CREATE TABLE Student
(SID Integer UNIQUE,
Name Text,
Marks Number DEFAULT 20)

If we then execute the following SQL statement,

INSERT INTO Student (SID, Name)
VALUES (10,'Johnson')

the table will look like the following:

SID  Name     Marks
10   Johnson  20

Even though we did not specify a value for the "Marks" column in the INSERT INTO
statement, it is assigned the default value of 20, since we had already set 20 as
the default value for this column.
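The same behaviour can be confirmed in SQLite through Python's sqlite3 module. The second row ('Smith', 35) is a hypothetical addition that shows an explicit value overriding the default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Student
                (SID INTEGER UNIQUE,
                 Name TEXT,
                 Marks INTEGER DEFAULT 20)""")

conn.execute("INSERT INTO Student (SID, Name) VALUES (10, 'Johnson')")           # Marks omitted
conn.execute("INSERT INTO Student (SID, Name, Marks) VALUES (11, 'Smith', 35)")  # Marks supplied

rows = conn.execute("SELECT SID, Name, Marks FROM Student ORDER BY SID").fetchall()
```

The default is used only when the column is left out of the INSERT statement.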

• Entity Integrity
Let us look at the effect of null values in prime attributes. A null value for an attribute
is a value that is either not known at the time or does not apply to a given instance of
the object. It may also be that a particular tuple does not have a value for an
attribute; this fact can be represented by a null value.
If any attribute of a primary key were permitted to have null values, then,
because the attributes in the key must be non-redundant, the key could not be used for
the unique identification of tuples. This contradicts the requirements for a primary key.

(a) Relation R without null values    (b) Relation S with null values
Id    Name                            Id    Name
101   Jones                           101   Jones
103   Smith                           @     Smith
104   Lalonde                         104   Lalonde
107   Evan                            107   Evan
110   Drew                            110   Drew
112   Smith                           @     Lalonde
                                      @     Smith

Consider the relation S, in which the attribute Id is the primary key. If null values,
represented here as @, were permitted, then the two tuples <@, Smith> would be
indistinguishable, even though they may represent two different instances of the entity
type employee. Similarly, the tuples <@, Lalonde> and <104, Lalonde> would, for all
intents and purposes, also be indistinguishable. Instances of entities are
distinguishable, and so must be their surrogates in the model. Therefore no prime
attribute value may be null. This rule is referred to as the entity integrity rule: if
attribute A of relation R(R) is a prime attribute of R(R), then A cannot accept null
values.

• Referential Integrity

A relation R may contain references to another relation S. Relations R and S need not
be distinct. Suppose the reference in R is via a set of attributes that forms a primary
key of the relation S. This set of attributes in R is a foreign key. A valid relationship
between a tuple in R and one in S requires that the values of the attributes in the
foreign key of R correspond to the primary key of a tuple in S. This ensures that the
reference from a tuple of the relation R is made unambiguously to an existing tuple in
the S relation. The referencing attribute in the R relation can have a null value; in
this case, it is not referencing any tuple in the S relation. However, if the value is
not null, it must exist as the primary key value of a tuple of the S relation. If the
referencing attribute in R has a value that is nonexistent in S, R is attempting to refer
to a nonexistent tuple and hence to a nonexistent instance of the corresponding entity.

Emp_ID  Name     Manager
101     Jones    @
112     Smith    112
110     Drew     112
103     Smith    110
107     Evan     110
104     Lalonde  107

For example, consider employees and their managers. Each employee has a manager, and
since managers are also employees, we may represent managers by their employee numbers,
provided the employee number is a key of the relation employee. The Manager attribute is
then a foreign key that refers to the primary key of the same relation. An employee can
only have a manager who is also an employee. The CEO of the company can have himself or
herself as the manager, or the Manager value may be null. Some employees may also be
temporarily without managers, and this can be represented by the Manager attribute
taking a null value.

To summarize referential integrity: given two relations R and S, suppose R refers to the
relation S via a set of attributes that forms the primary key of S, and this set of
attributes forms a foreign key in R. Then the value of the foreign key in a tuple in R
must either be equal to the primary key of a tuple of S or be entirely null.

If we delete a tuple that is the target of a foreign key reference, then three explicit
possibilities exist to maintain database integrity.

1) All tuples that contain references to the deleted tuple are also deleted.
This may cause, in turn, the deletion of other tuples. This option is referred to
as a domino or cascading deletion, since one deletion leads to another.
2) Only tuples that are not referenced by any other tuple can be deleted. A tuple
referred to by other tuples in the database cannot be deleted.
3) The tuple is deleted. However, to avoid the domino effect, the pertinent
foreign key attributes of all referencing tuples are set to null.

The choice of the option to use during a tuple deletion depends on the application.
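In SQL the three options correspond to the ON DELETE actions CASCADE, RESTRICT and SET NULL of a foreign key. The sketch below compares them in SQLite through Python's sqlite3 module; the cut-down Department and Employee tables, their two rows each and the helper name delete_department are assumptions made for the example.

```python
import sqlite3

def delete_department(on_delete):
    """Delete department 'IS' under the given ON DELETE rule; return the Employee rows left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
    conn.executescript(f"""
        CREATE TABLE Department (DeptCode TEXT PRIMARY KEY);
        CREATE TABLE Employee (EmployeeID TEXT PRIMARY KEY,
                               DeptCode TEXT REFERENCES Department(DeptCode)
                                             ON DELETE {on_delete});
        INSERT INTO Department VALUES ('IS'), ('SL');
        INSERT INTO Employee VALUES ('EN1-19', 'IS'), ('EN1-10', 'SL');
    """)
    try:
        conn.execute("DELETE FROM Department WHERE DeptCode = 'IS'")
    except sqlite3.IntegrityError:
        pass  # option 2: the referenced tuple cannot be deleted
    return conn.execute(
        "SELECT EmployeeID, DeptCode FROM Employee ORDER BY EmployeeID").fetchall()

cascade = delete_department("CASCADE")    # option 1: referencing tuples are deleted too
restrict = delete_department("RESTRICT")  # option 2: the delete is refused
set_null = delete_department("SET NULL")  # option 3: referencing foreign keys become NULL
```

With CASCADE the IS employee disappears, with RESTRICT nothing changes, and with SET NULL the employee remains but no longer points to any department.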

Relationship Set

A relationship is an association among several entities. The number of entity sets
that participate in a relationship is known as the degree of the relationship. A
relationship set is a set of relationships of the same type. For example, consider the two
entity sets customer and account. We define the relationship set CustAcct to denote the
association between customers and their accounts. This is a binary relationship set (each
relationship involves 2 entities). The relationship set CustAcct is a subset of all the
possible customer and account pairings. The degree of the relationship is 2.

[Diagram: Customer -- CustAcct -- Account]

If a relationship type is between entities of a single entity type, it is called a unary
relationship type. For example, an employee works for and reports to a manager who is also
an employee. Since one employee reports to another employee, only one entity set is
involved, so this is a unary relationship set. The degree of the relationship is 1.

[Diagram: Employee -- Reports_to -- Employee, with roles manager and managed]
It is also possible to model relationship types involving more than two entity types. A
relationship type in which three entity types are involved is said to be a ternary
relationship type. For example, after examining a patient, a doctor prescribes medicines
to that patient according to the diagnosis. Three entities are involved here: Doctor,
Patient and Medicine, so Prescribe is a ternary relationship set. The degree of the
relationship is 3.

[Diagram: Doctor -- Prescribe -- Patient, with Medicine also attached to Prescribe]

A relationship that involves N entities is called an N-ary relationship. The degree of
the relationship is N.

Mapping Cardinality
Mapping cardinality is also known as cardinality ratio. It expresses the number of entities to
which another entity can be associated via a relationship set. Mapping Cardinalities are useful
in describing the binary relationship sets. For a binary relationship between two entity sets X
and Y we have the following mapping cardinalities.

One-to-one mapping:

An entity in X is associated with at most one entity in Y, and an entity in Y is associated
with at most one entity in X. Example: husband - wife. For each entity there is one and only
one matching entity.

One-to-many mapping:

An entity in X is associated with any number (zero or more) of entities in Y, and an entity
in Y is associated with at most one entity in X. Example: father and child. A father may
have more than one child.

Many-to-one mapping:

An entity in X is associated with at most one entity in Y. An entity in Y can be associated
with any number (zero or more) of entities in X. Example: vendor and goods/product. One
product may have more than one vendor.

Many-to-many mapping:

An entity in X is associated with any number of entities in Y, and an entity in Y is
associated with any number of entities in X. Example: vendor and client. A vendor may have
more than one client, and a client may have more than one vendor.
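The four cases can be told apart by counting partners on each side of the relationship. The function below is a sketch: its name and the sample pairs are hypothetical, and it only describes the pairs it is given, whereas a mapping cardinality in a design is a rule that all future data must obey.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a binary relationship given as (x, y) pairs."""
    ys_of_x = defaultdict(set)
    xs_of_y = defaultdict(set)
    for x, y in pairs:
        ys_of_x[x].add(y)
        xs_of_y[y].add(x)
    x_side_many = any(len(xs) > 1 for xs in xs_of_y.values())  # some y has several x's
    y_side_many = any(len(ys) > 1 for ys in ys_of_x.values())  # some x has several y's
    if x_side_many and y_side_many:
        return "many-to-many"
    if y_side_many:
        return "one-to-many"
    if x_side_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical sample data: one father with two children
father_child = [("father1", "child1"), ("father1", "child2")]
kind = mapping_cardinality(father_child)  # "one-to-many"
```

Swapping the order of each pair turns a one-to-many result into many-to-one, which reflects that the two names describe the same situation seen from opposite sides.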

Entity Relationship Diagram

An E-R diagram can express the overall logical structure of a database graphically.

Joins

An SQL join clause combines records from two or more tables in a database. It creates a set
that can be saved as a table or used as is. A JOIN is a means of combining fields from two
tables by using values common to each. ANSI-standard SQL specifies five types of JOIN:
INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS. As a special case, a table (base
table, view, or joined table) can JOIN to itself in a self-join.

Customers:

CustomerID  FirstName  LastName  Email              DOB         Phone
1           John       Smith     [email protected]  2/4/1968    626 222-2222
2           Steven     Goldfish  [email protected]  4/4/1974    323 455-4545
3           Paula      Brown     [email protected]  5/24/1978   416 323-3232
4           James      Smith     [email protected]  20/10/1980  416 323-8888

Sales:

CustomerID  Date       SaleAmount
2           5/6/2004   $100.22
1           5/7/2004   $99.95
3           5/7/2004   $122.95
3           5/13/2004  $100.00
4           5/22/2004  $555.55

As you can see, those 2 tables have a common field called CustomerID, and thanks to that we
can extract information from both tables by matching their CustomerID columns.

Consider the following SQL statement:

SELECT Customers.FirstName, Customers.LastName, SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers, Sales
WHERE Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName

The SQL expression above selects all distinct customers (their first and last names) and
the total amount of dollars each of them has spent.
The SQL JOIN condition is specified in the SQL WHERE clause and says that the
2 tables have to be matched by their respective CustomerID columns.

Here is the result of this SQL statement:

FirstName  LastName  SalesPerCustomer
John       Smith     $99.95
Steven     Goldfish  $100.22
Paula      Brown     $222.95
James      Smith     $555.55

The SQL statement above can be re-written using the SQL JOIN clause like this:

SELECT Customers.FirstName, Customers.LastName, SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName

There are two types of SQL JOINs: INNER JOINs and OUTER JOINs. If you write JOIN without
the INNER or OUTER keyword in front of it, an INNER JOIN is used. In short,
"INNER JOIN" = "JOIN" (note that different databases have different syntax for their JOIN
clauses).

The INNER JOIN returns only those rows for which there is a match in both tables on the
columns we are matching on. If the Customers table contains a customer who has not yet
made any purchases (there are no entries for this customer in the Sales table), this
customer will not be listed in the result of our SQL query above.

If the Sales table has the following rows:

CustomerID  Date      SaleAmount
2           5/6/2004  $100.22
1           5/6/2004  $99.95

And we use the same SQL JOIN statement from above:

SELECT Customers.FirstName, Customers.LastName,
       SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName

We'll get the following result:

FirstName  LastName  SalesPerCustomer
John       Smith     $99.95
Steven     Goldfish  $100.22

Even though Paula and James are listed as customers in the Customers table they won't be
displayed because they haven't purchased anything yet.

But what if you want to display all the customers and their sales, no matter if they have
ordered something or not? We’ll do that with the help of the SQL OUTER JOIN clause.

The second type of SQL JOIN is called SQL OUTER JOIN. The two sub-types covered here are
LEFT OUTER JOIN and RIGHT OUTER JOIN (standard SQL also defines FULL OUTER JOIN, which is
not discussed in these notes).

The LEFT OUTER JOIN or simply LEFT JOIN (you can omit the OUTER keyword in
most databases) selects all the rows from the first table listed after the FROM keyword,
whether or not they have matches in the second table.

If we slightly modify our last SQL statement to:

SELECT Customers.FirstName, Customers.LastName,
       SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers LEFT JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName


and the Sales table still has the following rows:

CustomerID  Date      SaleAmount
2           5/6/2004  $100.22
1           5/6/2004  $99.95

The result will be the following:

FirstName  LastName  SalesPerCustomer
John       Smith     $99.95
Steven     Goldfish  $100.22
Paula      Brown     NULL
James      Smith     NULL

As you can see, we have selected every row from Customers (the first table). For each row
of Customers that has no match in Sales (the second table), the SalesPerCustomer
column is NULL (NULL means the column holds no value).
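
The difference between the two join types can be checked directly. This is a small sketch with Python's sqlite3 module under assumed simplifications (only the name columns of Customers are kept, the date column is named SaleDate, and amounts are integer cents). It runs the same query once with JOIN and once with LEFT JOIN against the reduced Sales table; the helper name run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Sales (CustomerID INTEGER, SaleDate TEXT, SaleAmount INTEGER);
INSERT INTO Customers VALUES
  (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish'),
  (3, 'Paula', 'Brown'), (4, 'James', 'Smith');
-- Reduced Sales table: only customers 1 and 2 have bought something.
INSERT INTO Sales VALUES (2, '2004-05-06', 10022), (1, '2004-05-06', 9995);
""")


def run(join_keyword):
    """Run the totals query with the given join keyword ('JOIN' or 'LEFT JOIN')."""
    query = f"""
    SELECT Customers.FirstName, Customers.LastName,
           SUM(Sales.SaleAmount) AS SalesPerCustomer
    FROM Customers {join_keyword} Sales
    ON Customers.CustomerID = Sales.CustomerID
    GROUP BY Customers.FirstName, Customers.LastName
    """
    return {(first, last): total for first, last, total in conn.execute(query)}


inner_result = run("JOIN")       # customers without sales are dropped
left_result = run("LEFT JOIN")   # every customer is kept

print(sorted(inner_result))
print(left_result[("Paula", "Brown")])  # SQL NULL arrives in Python as None
```

In Python, the NULL totals for Paula and James show up as None.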

The RIGHT OUTER JOIN or just RIGHT JOIN behaves exactly as SQL LEFT JOIN,
except that it returns all rows from the second table (the right table in our SQL JOIN
statement).
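
Because RIGHT JOIN support varies between database engines and versions, the sketch below runs the equivalent LEFT JOIN with the table order swapped: Customers RIGHT JOIN Sales returns the same rows as Sales LEFT JOIN Customers. The sale for CustomerID 9, which has no matching customer, is a hypothetical row added only to make the effect visible; amounts are in integer cents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Sales (CustomerID INTEGER, SaleDate TEXT, SaleAmount INTEGER);
INSERT INTO Customers VALUES (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish');
-- The last sale refers to a customer that does not exist (hypothetical row).
INSERT INTO Sales VALUES
  (2, '2004-05-06', 10022), (1, '2004-05-06', 9995), (9, '2004-05-08', 5000);
""")

# Intended query:  FROM Customers RIGHT JOIN Sales ON ...
# Equivalent form: FROM Sales LEFT JOIN Customers ON ...
query = """
SELECT Customers.FirstName, Sales.SaleAmount
FROM Sales LEFT JOIN Customers
ON Customers.CustomerID = Sales.CustomerID
ORDER BY Sales.SaleAmount
"""
rows = conn.execute(query).fetchall()
for first_name, amount in rows:
    print(first_name, amount)
```

Every row of Sales appears in the result, and the sale without a matching customer carries NULL (None in Python) in the FirstName column.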


Functional Dependency
If A and B are attributes of a table, B is functionally dependent on A (written A → B) if
each value of A is associated with exactly one value of B (so you can say 'A functionally
determines B').

Functional dependency between A and B: A → B

The attribute or group of attributes on the left-hand side of the arrow of a functional
dependency is referred to as the 'determinant'.

A simple example: StaffID functionally determines Position in the above tables.
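
As an illustration, the definition can be tested against a concrete set of rows. The sketch below is an assumption-laden example: the helper holds and the staff rows are hypothetical and are not taken from the tables referred to above. Note that sample data can only refute a functional dependency; whether it holds in general depends on the meaning of the attributes.

```python
def holds(rows, lhs, rhs):
    """Return True if rows that agree on all `lhs` attributes also agree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # Remember the first rhs value seen for each lhs value; any other value
        # for the same lhs value contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows.
staff = [
    {"StaffID": "S1", "Name": "Ann", "Position": "Manager", "BranchID": "B1"},
    {"StaffID": "S2", "Name": "Bob", "Position": "Clerk", "BranchID": "B1"},
    {"StaffID": "S3", "Name": "Cid", "Position": "Clerk", "BranchID": "B2"},
]

staffid_determines_position = holds(staff, ["StaffID"], ["Position"])
position_determines_staffid = holds(staff, ["Position"], ["StaffID"])
print(staffid_determines_position, position_determines_staffid)
```

Here StaffID → Position is consistent with the rows, while Position → StaffID is refuted because two different staff members are clerks.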

• Full Functional Dependency: if A and B are attributes (columns) of a table, B is fully
  functionally dependent on A if B is functionally dependent on A but not on any proper
  subset of A.
  E.g. StaffID → BranchID
• Partial Functional Dependency: if A and B are attributes of a table, B is partially
  dependent on A if there is some attribute that can be removed from A and yet the
  dependency still holds.
  For example, consider the following functional dependency that exists in the Tbl_Staff
  table:
  StaffID, Name → BranchID
  BranchID is functionally dependent on a subset of A = (StaffID, Name), namely StaffID.
• Transitive Functional Dependency: if A, B and C are attributes of a table such that B is
  functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then
  C is transitively dependent on A via B. For example, consider the following functional
  dependencies that exist in the Tbl_Staff_Branch table:

  StaffID → BranchID
  BranchID → Br_Address
  So the StaffID attribute functionally determines Br_Address via the BranchID attribute.

• Trivial Functional Dependency: a functional dependency X → Y is called trivial if and
  only if the right-hand side Y is a subset (not necessarily a proper subset) of the
  left-hand side X.

  STUDENT (Student_ID, First_Name, Last_Name)

  The FD {First_Name, Last_Name} → {First_Name} is true but of little informational
  interest.

• Non-trivial FDs are those that are not trivial by this definition.
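
The kinds of dependency listed above can be told apart mechanically once a set of FDs is given. The following is a sketch, not a definitive implementation: the function names closure, is_trivial and is_full are hypothetical, and the FD set is the one from the Tbl_Staff_Branch example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is a subset of X."""
    return set(rhs) <= set(lhs)


def is_full(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not set(rhs) <= closure(lhs, fds):
        return False
    for size in range(1, len(lhs)):  # all non-empty proper subsets of lhs
        for subset in combinations(lhs, size):
            if set(rhs) <= closure(subset, fds):
                return False  # a smaller determinant exists: partial dependency
    return True


fds = [(("StaffID",), ("BranchID",)), (("BranchID",), ("Br_Address",))]

full = is_full(("StaffID",), ("BranchID",), fds)
partial = not is_full(("StaffID", "Name"), ("BranchID",), fds)
transitive = "Br_Address" in closure(("StaffID",), fds)
trivial = is_trivial(("First_Name", "Last_Name"), ("First_Name",))
print(full, partial, transitive, trivial)
```

The check for a partial dependency simply looks for a proper subset of the left-hand side that already determines the right-hand side.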

Closure of a set of Functional Dependencies

1. We need to consider all functional dependencies that hold. Given a set F of functional
dependencies, we can prove that certain other ones also hold. We say these ones are
logically implied by F.

2. Suppose we are given a relation scheme R = (A,B,C,G,H,I), and the set of functional
dependencies:

A→B
A→C
CG → H
CG → I
B→H

Then the functional dependency A → H is logically implied.

3. To see why, let t1 and t2 be tuples such that

t1[A] = t2[A]

As we are given A → B, it follows that we must also have

t1[B] = t2[B]

Further, since we also have B → H, we must also have

t1[H] = t2[H]

Thus, whenever two tuples have the same value on A, they must also have the same
value on H, and we can say that A → H.

4. The closure of a set F of functional dependencies is the set of all functional
dependencies logically implied by F.

5. We denote the closure of F by F+.

6. To compute F+, we can use some rules of inference called Armstrong's Axioms:

• Reflexivity rule: if α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule: if α → β holds, and γ is a set of attributes, then γα → γβ holds.
• Transitivity rule: if α → β holds, and β → γ holds, then α → γ holds.

7. These rules are sound because they do not generate any incorrect functional
dependencies. They are also complete as they generate all of F+.

8. To make life easier we can use some additional rules, derivable from Armstrong's
Axioms:

• Union rule: if α → β and α → γ hold, then α → βγ holds.
• Decomposition rule: if α → βγ holds, then α → β and α → γ both hold.
• Pseudotransitivity rule: if α → β holds, and γβ → δ holds, then αγ → δ holds.

9. Applying these rules to the scheme and set F mentioned above, we can derive the
following:

• A → H. Since A → B and B → H hold, we apply the transitivity rule.
• CG → HI. Since CG → H and CG → I, the union rule implies that CG → HI.
• AG → I. Since A → C and CG → I, the pseudotransitivity rule implies that
  AG → I holds.
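
The three derivations above can be double-checked by computing attribute closures. This sketch does not enumerate F+ itself; it relies on the standard fact that α → β is in F+ exactly when β is contained in the closure of α under F. The function names closure and implies are hypothetical, and attributes are written as single letters so that a string such as "CG" stands for the set {C, G}.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already known, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by the FDs."""
    return set(rhs) <= closure(lhs, fds)


# F for the relation scheme R = (A, B, C, G, H, I).
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("A", F)))
print(implies(F, "A", "H"), implies(F, "CG", "HI"), implies(F, "AG", "I"))
```

A dependency that is not implied, such as G → H, is rejected by the same test because the closure of G contains only G.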

Boyce-Codd Normal Form


Take the following table structure as an example:

schedule(campus, course, class, time, room/bldg)

Take the following sample data:

campus  course       class  time         room/bldg
East    English 101  1      8:00-9:00    212 AYE
East    English 101  2      10:00-11:00  305 RFK
West    English 101  3      8:00-9:00    102 PPR

Note that no two buildings on any of the university campuses have the same name, thus
ROOM/BLDG → CAMPUS. As the determinant ROOM/BLDG is not a candidate key, this table is
NOT in Boyce-Codd normal form.

This table should be decomposed into the following relations:

R1(course, class, room/bldg, time)

R2(room/bldg, campus)
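
A quick way to see that this decomposition loses no information is to project the sample rows onto R1 and R2 and join them back together. The sketch below uses plain Python sets and only the three sample rows shown above, so it illustrates the idea on this data rather than proving it for every possible instance.

```python
# Sample rows of schedule(campus, course, class, time, room/bldg).
schedule = {
    ("East", "English 101", 1, "8:00-9:00", "212 AYE"),
    ("East", "English 101", 2, "10:00-11:00", "305 RFK"),
    ("West", "English 101", 3, "8:00-9:00", "102 PPR"),
}

# Projections onto R1(course, class, room/bldg, time) and R2(room/bldg, campus).
r1 = {(course, cls, room, time) for campus, course, cls, time, room in schedule}
r2 = {(room, campus) for campus, course, cls, time, room in schedule}

# Natural join of R1 and R2 on room/bldg, put back into the original column order.
rejoined = {
    (campus, course, cls, time, room)
    for course, cls, room, time in r1
    for room2, campus in r2
    if room == room2
}

print(rejoined == schedule)
```

The join reproduces exactly the original rows because room/bldg determines campus, so each R1 row matches a single R2 row.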

Normalization Drawbacks

• By limiting redundancy, normalization helps maintain consistency and saves space.
• But query performance can suffer, because related information that was stored in
  a single relation is now distributed among several relations.
• Example: a join is required to get the names and grades of all students taking CS343
  in 2007F.

Student(Id,Name)

Transcript(StudId,CrsCode,Sem,Grade)

SELECT S.Name, T.Grade
FROM Student S, Transcript T
WHERE S.Id = T.StudId AND
      T.CrsCode = 'CS343' AND T.Sem = '2007F'
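
To try the query, the sketch below loads a few hypothetical rows into an in-memory SQLite database through Python's sqlite3 module. The student names and grades are invented for illustration; only the table and column names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Sem TEXT, Grade TEXT);
-- Hypothetical sample data.
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO Transcript VALUES
  (1, 'CS343', '2007F', 'A'),
  (2, 'CS343', '2007F', 'B'),
  (3, 'CS343', '2006F', 'C'),
  (1, 'CS101', '2007F', 'B');
""")

# Names live in Student and grades in Transcript, so the two must be joined.
query = """
SELECT S.Name, T.Grade
FROM Student S, Transcript T
WHERE S.Id = T.StudId AND
      T.CrsCode = 'CS343' AND T.Sem = '2007F'
"""
grades = sorted(conn.execute(query).fetchall())
print(grades)
```

Had Name been stored redundantly in Transcript, the same answer could be read from one table, which is the trade-off described above.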

Normalization Example
Grade_report(StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName, InstrucLocn,
Grade)

• Functional dependencies

StudNo → StudName
CourseNo → Ctitle, InstrucName
InstrucName → InstrucLocn
StudNo, CourseNo, Major → Grade
StudNo, Major → Advisor
Advisor → Major

• Unnormalised

Grade_report(StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName, InstrucLocn,
Grade)

• 1NF: Remove repeating groups

Student (StudNo, StudName)
StudMajor (StudNo, Major, Advisor)
StudCourse (StudNo, Major, CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)

• 2NF: Remove partial key dependencies

Student (StudNo, StudName)
StudMajor (StudNo, Major, Advisor)
StudCourse (StudNo, Major, CourseNo, Grade)
Course (CourseNo, Ctitle, InstrucName, InstrucLocn)

• 3NF: Remove transitive dependencies

Student (StudNo, StudName)
StudCourse (StudNo, Major, CourseNo, Grade)
Course (CourseNo, Ctitle, InstrucName)
Instructor (InstrucName, InstrucLocn)
StudMajor (StudNo, Major, Advisor)

• BCNF: Every determinant is a candidate key

  • Student: the only determinant is StudNo
  • StudCourse: the only determinant is (StudNo, Major, CourseNo)
  • Course: the only determinant is CourseNo
  • Instructor: the only determinant is InstrucName
  • StudMajor: the determinants are (StudNo, Major) and Advisor

Of the StudMajor determinants, only (StudNo, Major) is a candidate key, so StudMajor is
not in BCNF and is decomposed.

• BCNF

Student (StudNo, StudName)
StudCourse (StudNo, Major, CourseNo, Grade)
Course (CourseNo, Ctitle, InstrucName)
Instructor (InstrucName, InstrucLocn)
StudMajor (StudNo, Advisor)
Advisor (Advisor, Major)
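
The BCNF test used in the last step ("every determinant is a candidate key") can be sketched in code. The example below is a minimal illustration under assumptions: the function names closure and bcnf_violations are hypothetical, the attribute spelling Advisor is used throughout, and each relation is checked only against the dependencies listed for it.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """List the non-trivial FDs whose left side does not determine all of `attrs`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                 # skip trivial FDs
        and not set(attrs) <= closure(lhs, fds)     # determinant is not a superkey
    ]


# StudMajor(StudNo, Major, Advisor) before the BCNF step.
stud_major_fds = [(("StudNo", "Major"), ("Advisor",)), (("Advisor",), ("Major",))]
before = bcnf_violations({"StudNo", "Major", "Advisor"}, stud_major_fds)

# Advisor(Advisor, Major) after the BCNF step.
after = bcnf_violations({"Advisor", "Major"}, [(("Advisor",), ("Major",))])

print(before)
print(after)
```

Only Advisor → Major is reported for the original StudMajor, which matches the reasoning above: Advisor is a determinant but does not determine StudNo.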
