Mit (CS) 402
Unit-1
1.1 Learning Objectives
1.2 Introduction
1.3 Database system and application
1.4 Purpose of database system
1.5 Characteristics and Benefits of a Database
1.6 Components of DBMS
1.7 Merits and Demerits of DBMS
1.8 Database Architecture
1.9 Traditional file systems
1.10 View of data
1.11 Database languages
1.12 Data Dictionary
1.13 Types of DBMS
1.13.1 Centralized DBMS
1.13.2 Parallel DBMS
1.13.3 Distributed DBMS
1.13.4 Client-Server DBMS
1.14 Relational databases
1.15 Database Design
1.16 Database Administrator
1.17 Check Your Progress
1.18 Answer to Check Your Progress
1.1 Learning Objectives
After going through this unit, the learner will be able to learn:
1.2 Introduction
There are a number of characteristics that distinguish the database approach from the
file-based system or approach. This unit describes the benefits (and features) of the database
system.
Self-describing nature of a database system
A database system is referred to as self-describing because it not only contains the database
itself, but also metadata which defines and describes the data and relationships between tables
in the database. This information is used by the DBMS software or database users if needed.
This separation of data and information about the data makes a database system totally
different from the traditional file-based system in which the data definition is part of the
application programs.
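As a small illustration of the self-describing property, the sketch below uses Python's built-in `sqlite3` module. SQLite is our choice for illustration only; this unit does not prescribe a particular DBMS. After a table is created, its definition can be read back from SQLite's own catalogue (the `sqlite_master` table and `PRAGMA table_info`) rather than from any application program. The table `student` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define a hypothetical table; the DBMS records its definition as metadata.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The catalogue lists the tables the database contains.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)   # ['student']
print(columns)  # [('roll_no', 'INTEGER'), ('name', 'TEXT')]
```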
Insulation between program and data
In the file-based system, the structure of the data files is defined in the application programs
so if a user wants to change the structure of a file, all the programs that access that file might
need to be changed as well.
On the other hand, in the database approach, the data structure is stored in the system
catalogue and not in the programs. Therefore, a change to the structure of a file needs to be
made only once, in the catalogue, rather than in every program that uses the file. This
insulation between the programs and data is also called program-data independence.
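A minimal sketch of program-data independence, again assuming SQLite through Python's `sqlite3` module. The function `employee_names` stands in for a hypothetical application program; the table structure is then changed with a single statement, and the program keeps working without modification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi')")

def employee_names(db):
    """Hypothetical 'application program' that refers to columns by name only."""
    return [row[0] for row in db.execute("SELECT name FROM employee ORDER BY emp_id")]

before = employee_names(conn)

# Change the stored structure: one statement, recorded in the catalogue.
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")

after = employee_names(conn)  # the program runs unchanged
print(before == after)  # True
```

The independence here depends partly on how the program is written: a program that used `SELECT *` and unpacked rows by position could still break when a column is added, so naming columns explicitly is the safer habit.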
Support for multiple views of data
A database supports multiple views of data. A view is a subset of the database, which is
defined and dedicated for particular users of the system. Multiple users in the system might
have different views of the system. Each view might contain only the data of interest to a user
or group of users.
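In SQL-based systems, a view of this kind is usually defined with `CREATE VIEW`. The sketch below (SQLite via Python's `sqlite3`, with hypothetical table and view names) gives a hypothetical group of Sales users a view that contains only their department's employees and hides the salary column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "Asha", "Sales", 52000.0),
    (2, "Ravi", "IT", 61000.0),
    (3, "Meena", "Sales", 48000.0),
])

# A view for Sales users: only their department's rows, and no salary column.
conn.execute("CREATE VIEW sales_staff AS "
             "SELECT emp_id, name FROM employee WHERE dept = 'Sales'")

rows = conn.execute("SELECT * FROM sales_staff ORDER BY emp_id").fetchall()
print(rows)  # [(1, 'Asha'), (3, 'Meena')]
```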
Sharing of data and multiuser system
Current database systems are designed for multiple users. That is, they allow many users to
access the same database at the same time. This access is achieved through features called
concurrency control strategies. These strategies ensure that the data accessed are always
correct and that data integrity is maintained.
The design of modern multiuser database systems is a great improvement over those of the
past, which restricted usage to one person at a time.
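Concurrency control strategies differ between products. SQLite, used here through Python's `sqlite3` module purely as an illustration, takes a simple locking approach: only one connection may hold the write lock at a time. In the sketch below, two connections to the same (temporary) database file play the part of two users; the account table and the amounts are hypothetical. `timeout=0` makes the second user fail immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile

# Two connections to the same file act as two concurrent users.
path = os.path.join(tempfile.mkdtemp(), "bank.db")
user_a = sqlite3.connect(path, timeout=0, isolation_level=None)  # explicit transactions
user_b = sqlite3.connect(path, timeout=0, isolation_level=None)

user_a.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
user_a.execute("INSERT INTO account VALUES (1, 100)")

# User A starts a write transaction and takes SQLite's write lock.
user_a.execute("BEGIN IMMEDIATE")
user_a.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")

# User B tries to start writing at the same time and is refused.
try:
    user_b.execute("BEGIN IMMEDIATE")
    b_was_blocked = False
except sqlite3.OperationalError:  # "database is locked"
    b_was_blocked = True

user_a.execute("COMMIT")  # releases the lock

# Now B can apply its update on top of A's committed result.
user_b.execute("BEGIN IMMEDIATE")
user_b.execute("UPDATE account SET balance = balance + 10 WHERE id = 1")
user_b.execute("COMMIT")

final_balance = user_a.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(b_was_blocked, final_balance)  # True 80
```

Because the two updates are serialized, neither one is lost: the balance ends at 100 - 30 + 10 = 80. Larger DBMSs typically use finer-grained locking or multiversion schemes, but the goal of keeping concurrent updates correct is the same.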
1.6 Components of DBMS
The main components of a DBMS are:
Software
Hardware
Data
Procedures
Users
HARDWARE: The hardware is the actual computer system used for keeping and
accessing the database. It consists of a set of physical electronic devices such as
computers (together with associated I/O devices like disk drives), storage devices, I/O
channels, and electromechanical devices that interface between computers and
real-world systems. It is impossible to implement the DBMS without the hardware
devices. In a network, a powerful computer with high data-processing speed and a
storage device with large storage capacity is required as the database server.
SOFTWARE: The main component of a DBMS is the software. It is the set of
programs used to handle the database and to control and manage the overall
computerized database. The DBMS software itself is the most important software
component in the overall system.
DATA: Data is the most important component of the DBMS. The main purpose of
DBMS is to process the data. In DBMS, databases are defined, constructed and then
data is stored, updated and retrieved to and from the databases. The database contains
both the actual data and the metadata.
PROCEDURES: Procedures refer to the instructions and rules that help to design the
database and to use the DBMS. The users that operate and manage the DBMS require
documented procedures on how to use or run the database management system.
USERS: The users are the people who manage the databases and perform different
operations on the databases in the database system. There are three kinds of people
who play different roles in a database system:
Application Programmers: These users implement specific application programs to
access the stored data. They must be familiar with the DBMSs to accomplish their
task.
Database Administrators: This may be one person or a group of people in an
organization responsible for authorizing access to the database, monitoring its use and
managing all of the resources to support the use of the entire database system.
End-Users: End users are the people whose jobs require access to a database for
querying, updating and generating reports.
1.7 Merits and Demerits of DBMS
Merits of DBMS
1. Improved data sharing.
The DBMS helps create an environment in which end users have better access to more and
better-managed data.
2. Improved data security.
The more users access the data, the greater the risk of data security breaches. For this
reason, a DBMS provides a framework for better enforcement of data privacy and security
policies.
3. Better data integration.
It is much easier to see how actions in one segment of the company affect other segments.
4. Minimized data inconsistency.
Data inconsistency exists when different versions of the same data appear in different places.
The probability of data inconsistency is greatly reduced in a properly designed database.
Before the advent of modern general-purpose database systems, traditional file
systems were used to store, manipulate, retrieve and delete data. These file systems
consisted of two major components: the data, stored as a collection of files, and several
application programs that accessed and manipulated the files. These systems were very
complicated for a variety of reasons. Data was not centralized, meaning there could
be two or more copies of the same data, resulting in redundancy. Redundancy made it
difficult to maintain consistency throughout the system. Application programs were often
written to handle only the main queries, such as adding new records, removing
some data or changing existing data. For each new query, either a new
application had to be written, or several existing programs had to be combined
in a complex manner to achieve the desired result.
As technology improved, more and more advances were made in the existing database
systems. Progress was made towards more efficient systems, more application programs were
added to generalize individual systems, among other changes. This resulted in different types
of database systems, as follows:
1. Single file or flat file database system: This is the traditional file-based data storage
system, consisting of a collection of files and related application programs.
2. Hierarchical system: Data is stored in a hierarchical format, with parent entities at the
top, linked to their child entities. As the name suggests, a hierarchical system enforces a
tree-like structure, in which the only links possible are between a parent and its children.
There is no link between siblings, which may or may not be a disadvantage, depending on the
user requirements. The implementation is straightforward, which makes it well suited to
storing and managing structured data. An example approach is using
XML (eXtensible Markup Language).
3. Networked system: A network system allows for more interconnections between data;
in particular, when many-to-many relationships exist between records, a network system
should be used. It is more efficient and useful for large-scale projects, such as maintaining
company records, where records may not necessarily be stored in a hierarchy. Network-based
DBMS are useful for managing structured data. High-level languages such as
Pascal, COBOL and C++ are used to implement network-based database systems.
4. Relational database system: This type of system is one of the most popular systems in
use today. It emphasizes the importance of relations between records and entities. This
paves the way for more flexibility and extensibility. Data is stored in tabular form, and
relationships are clearly defined between each entity or table. Such DBMS are also called
RDBMS. Several popular RDBMS are Oracle, MS Access and SQL Server.
5. Object-oriented approach: This is also one of the very popular systems, and the most
modern one. Classes and objects are used to describe real-world entities. It has high
significance in today's world, where unstructured and semi-structured data have become
mainstream. Object-oriented DBMS consist of objects and their behavior. Objects
are implemented with the help of a variety of data structures, and this ability to store
information of all types and in all formats is what makes this approach attractive.
The behavior of objects is implemented with the help of subprograms called methods,
subroutines, functions, etc. One major disadvantage is cost: these systems can be
expensive and are best suited to large-scale projects. They can be implemented using
high-level languages such as Java and C++.
Some examples of general-purpose database systems are MySQL, MS Access and SQL Server. It is
possible for programmers to build their own database system for specific requirements.
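The hierarchical system (item 2 above) mentions XML as an example approach. The sketch below stores a hypothetical company hierarchy as XML and reads it with Python's standard `xml.etree.ElementTree`. Every element has exactly one parent, and the code navigates only from a parent down to its children; ElementTree itself offers no sibling-to-sibling navigation, which mirrors the "no link between siblings" restriction.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy: company -> departments -> employees.
doc = """<company>
  <department name="Sales">
    <employee>Asha</employee>
    <employee>Meena</employee>
  </department>
  <department name="IT">
    <employee>Ravi</employee>
  </department>
</company>"""

root = ET.fromstring(doc)

# Access starts at the root and follows parent-to-child links only.
tree = {
    dept.get("name"): [emp.text for emp in dept.findall("employee")]
    for dept in root.findall("department")
}
print(tree)  # {'Sales': ['Asha', 'Meena'], 'IT': ['Ravi']}
```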
Data Independence
The ability to modify a schema definition at one level without affecting the schema definition
at the next higher level is called Data Independence. There are two levels of data
independence.
i. Physical Data Independence: The ability to modify the schema followed at the
physical level without affecting the schema at the conceptual level.
ii. Logical Data Independence: The ability to modify the conceptual schema
without causing any changes in the schemas followed at the view level; that is,
application programs remain the same even after changes at the conceptual level.
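Logical data independence is often achieved with views. In the sketch below (SQLite via Python's `sqlite3`; all table names are hypothetical), applications were originally written against a single table `employee(emp_id, name, dept)`. The conceptual schema is then reorganized so that department details live in their own table, and a view named `employee` reproduces the old shape, so the application's query still runs unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- New conceptual schema: department details moved to their own table.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee_v2 (emp_id INTEGER PRIMARY KEY, name TEXT,
                          dept_id INTEGER REFERENCES department(dept_id));
INSERT INTO department VALUES (10, 'Sales'), (20, 'IT');
INSERT INTO employee_v2 VALUES (1, 'Asha', 10), (2, 'Ravi', 20);

-- External (view-level) schema: the old single-table shape, rebuilt as a view.
CREATE VIEW employee AS
  SELECT e.emp_id, e.name, d.dept_name AS dept
  FROM employee_v2 AS e JOIN department AS d ON e.dept_id = d.dept_id;
""")

# The application's query, written for the old table, is unchanged.
rows = conn.execute("SELECT name, dept FROM employee ORDER BY emp_id").fetchall()
print(rows)  # [('Asha', 'Sales'), ('Ravi', 'IT')]
```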
Data Models
High-level Conceptual Data Models
High-level conceptual data models provide concepts for presenting data in ways that are close
to the way people perceive data. A typical example is the entity relationship model, which
uses main concepts like entities, attributes and relationships. An entity represents a real-world
object such as an employee or a project. The entity has attributes that represent properties
such as an employee's name, address and birthdate. A relationship represents an association
among entities; for example, an employee works on many projects. A relationship exists
between the employee and each project.
Record-based Logical Data Models
Record-based logical data models provide concepts users can understand but are not too far
from the way data is stored in the computer. Three well-known data models of this type are
relational data models, network data models and hierarchical data models.
The relational model represents data as relations, or tables. For example, in the
membership system at Science World, each membership has many members. The
membership identifier, expiry date and address information are fields in the
membership. The members are individuals such as Mickey, Minnie, Mighty, Door,
Tom, King, Man and Moose. Each record is said to be an instance of the
membership table.
The network model represents data as record types. This model also represents a
limited type of one-to-many relationship called a set type, as shown in the figure below.
Figure: Network model diagram
The hierarchical model represents data as a hierarchical tree structure. Each branch
of the hierarchy represents a number of related records. The figure below shows
this schema in hierarchical model notation.
A database system provides a data-definition language (DDL) to specify the database schema
and a data-manipulation language (DML) to express database queries and updates. A DDL is a
language used to define data structures within a database; a DML is responsible for retrieval,
insertion, deletion and modification of information in the database.
Data-Manipulation Language
A data manipulation language (DML) is a family of syntax elements, similar to a computer
programming language, used for selecting, inserting, deleting and updating data in a database.
Performing read-only queries of data is sometimes also considered a component of DML. A
data manipulation language is a language that enables the database user to access and
manipulate the data as organized by the appropriate data model. DML statements are used
for managing data within schema objects; for example:
INSERT - insert data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table (all records if no WHERE condition is given); the table itself, and the space allocated to it, remain
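The three DML statements, plus a read-only SELECT, can be tried out with SQLite through Python's `sqlite3` module. The `student` table and its rows below are hypothetical. The `?` placeholders pass values separately from the SQL text, which is the idiomatic way to supply data from a program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# INSERT: add new rows.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Anil", 72), (2, "Bela", 85), (3, "Chetan", 64)])

# UPDATE: change existing data; WHERE limits which rows are affected.
conn.execute("UPDATE student SET marks = marks + 5 WHERE roll_no = 3")

# DELETE: remove rows that match the condition (without WHERE, every row goes).
conn.execute("DELETE FROM student WHERE marks < 70")

# SELECT: a read-only query.
remaining = conn.execute("SELECT name, marks FROM student ORDER BY roll_no").fetchall()
print(remaining)  # [('Anil', 72), ('Bela', 85)]
```

Chetan's marks become 69 after the UPDATE, so the DELETE removes that row.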
Data-Definition Language
A DDL is a language used to define data structures within a database. The basic idea is to hide
implementation details of the database schemas from the users. DDL statements are
compiled, resulting in a set of tables stored in a special file called a data dictionary or data
directory. The data directory contains metadata (data about data). DDL is typically considered to
be a subset of SQL, but the term can also refer to languages that define other types of data. The
DDL concept and name were first introduced in relation to the CODASYL database model, where the
schema of the database was written in a language syntax describing the records, fields, and
sets of the user data model.
Here are some examples of Data Definition Language (DDL) statements which are used to
define the database structure or schema.
CREATE - creates objects in the database
ALTER - alters the structure of the database
DROP - deletes objects from the database
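The three DDL statements change the schema rather than the data, and the effect is visible in the catalogue. A short sketch with SQLite through Python's `sqlite3` module; the `course` table is hypothetical, and the helper functions simply read SQLite's catalogue (the f-string in `column_names` is acceptable here only because the table name is a trusted constant).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(db):
    return sorted(r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"))

def column_names(db, table):
    return [r[1] for r in db.execute(f"PRAGMA table_info({table})")]

conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")  # CREATE
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")                  # ALTER
cols_after_alter = column_names(conn, "course")

conn.execute("DROP TABLE course")                                              # DROP
tables_after_drop = table_names(conn)

print(cols_after_alter)   # ['course_id', 'title', 'credits']
print(tables_after_drop)  # []
```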
ii. Parallel DBMS: A parallel database system seeks to improve performance through
parallelization of various operations, such as loading data, building indexes and
evaluating queries. Although data may be stored in a distributed fashion, the
distribution is governed solely by performance considerations. Parallel databases
improve processing and input/output speeds by using multiple CPUs and disks in
parallel. Centralized and client–server database systems are not powerful enough to
handle applications that need this degree of performance. In parallel processing, many operations are performed
simultaneously, as opposed to serial processing, in which the computational steps are
performed sequentially.
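The core idea of parallel query evaluation is partition, compute, combine: the data is split across processors or disks, each part is processed independently, and the partial results are merged. The sketch below shows that structure for a filtered SUM over a hypothetical `orders` relation, using Python's `concurrent.futures.ThreadPoolExecutor`. This is an illustration of the structure only: CPython threads do not run CPU-bound Python code truly in parallel, whereas a real parallel DBMS uses multiple CPUs and disks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical relation of (order_id, amount) tuples, split into 4 partitions,
# as a parallel DBMS might spread them across 4 disks.
orders = [(i, i % 50) for i in range(1, 1001)]
partitions = [orders[k::4] for k in range(4)]

def partial_sum(partition, threshold):
    """Work done by one 'processor': filter its own partition and aggregate it."""
    return sum(amount for _, amount in partition if amount > threshold)

# Evaluate SUM(amount) WHERE amount > 40 on all partitions at once, then combine.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, partitions, [40] * 4))
total = sum(partials)

serial_total = partial_sum(orders, 40)  # the same query evaluated serially
print(total, total == serial_total)  # 8100 True
```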
A relational database is a system for storing and accessing data organized into
relations. A relation is a bag of tuples. Each tuple is an ordered sequence of attributes. Each
attribute is a data value belonging to some data type. All of the tuples in a relation have the
same number of attributes. In addition, the relation has a schema that is imposed on each
tuple in the relation, specifying what data type each attribute in each tuple will have. For
example, the relation's schema might specify that the first attribute in each tuple is an integer.
The relation's schema also gives a name to each attribute. The attribute names give us a
convenient way of referring to tuple attributes without having to say "the first attribute", "the
fourth attribute", etc.
Typical relational databases support numeric and text data types as tuple attributes. Many
relational databases also support "BLOBs" (Binary Large OBjects) as attributes. A BLOB is
a large, uninterpreted chunk of data. BLOBs are useful for storing files, images, and other
large chunks of data in a relational database.
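A sketch of storing and reading back a BLOB, assuming SQLite through Python's `sqlite3` module, which maps Python `bytes` to the BLOB type. The `document` table and the file name are hypothetical, and the payload is arbitrary bytes standing in for an image file. The DBMS stores the bytes without interpreting them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (doc_id INTEGER PRIMARY KEY, filename TEXT, content BLOB)")

# 1 KiB of arbitrary binary data, standing in for the contents of an image file.
payload = bytes(range(256)) * 4
conn.execute("INSERT INTO document (filename, content) VALUES (?, ?)", ("logo.png", payload))

# length() on a BLOB returns its size in bytes.
stored, size = conn.execute(
    "SELECT content, length(content) FROM document WHERE doc_id = 1").fetchone()
print(stored == payload, size)  # True 1024
```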
Tables
A table is a collection of rows having one or more columns. A row is an instance of a row
type. Every row of the same table has the same row type. The value of the n-th field of every
row in a table is the value of the n-th column of that row in the table. The row is the smallest
unit of data that can be inserted into a table and deleted from a table.
The number of rows in a table is its cardinality. A table whose cardinality is 0 (zero) is said to
be empty.
The following pictorial presentation shows a table and its different components:
The same column name can be used in more than one table; linking tables through such shared
columns maintains the integrity of data and reduces redundancy. This link is called a relationship.
Elements of a table
The information in a table is stored under headings called fields or columns. Columns run
vertically in a table.
Each field or column has an individual name. A table cannot contain two columns with the
same name.
A row contains one value for each column of the table. Each row holds all the information
about one individual item.
author_id author_lastname author_firstname
1 Smallfinger F.G.
2 Whittlbey W.H.J.
3 Earwig Lettice
4 Lightly W.E.
5 Tacticus Callus
The books relation has been changed so that the author of each book is represented by a
unique integer identifier, the author_id attribute. This attribute also exists in the authors
relation. So, the author of each book tuple in the books relation is represented indirectly, by
reference to a matching author tuple in the authors relation.
1.15 Database Design
Database design is the process of producing a detailed data model of a database. This
data model contains all the needed logical and physical design choices and physical storage
parameters needed to generate a design in a data definition language, which can then be used
to create a database. A fully attributed data model contains detailed attributes for each entity.
The term database design can be used to describe many different parts of the design of an
overall database system. Principally, and most correctly, it can be thought of as the logical
design of the base data structures used to store the data. In the relational model these are the
tables and views. In an object database the entities and relationships map directly to object
classes and named relationships. However, the term database design could also be used to
apply to the overall process of designing, not just the base data structures, but also the forms
and queries used as part of the overall database application within the database management
system (DBMS).
The process of doing database design generally consists of a number of steps which will be
carried out by the database designer. Usually, the designer must:
Within the relational model the final step above can generally be broken down into two
further steps, that of determining the grouping of information within the system, generally
determining what are the basic objects about which information is being stored, and then
determining the relationships between these groups of information, or objects. This step is not
necessary with an Object database.
1.2 Introduction
A database model is a type of data model that determines the logical structure of a
database and fundamentally determines in which manner data can be stored, organized and
manipulated. The most popular example of a database model is the relational model, which
uses a table-based format.
1.3 What is a Data Model?
In the database design phases, data are represented using a certain data model. The
data model is a collection of concepts or notations for describing data, data relationships, data
semantics and data constraints. Most data models also include a set of basic operations for
manipulating data in the database.
1.4 Need for a Data Model
The purpose of a data model is to represent data and to make the data understandable.
There have been many data models proposed in the literature. They fall into three broad
categories:
Object Based Data Models
Physical Data Models
Record Based Data Models
The object based and record based data models are used to describe data at the conceptual
and external levels; the physical data model is used to describe data at the internal level.
Object Based Data Models
Object based data models use concepts such as entities, attributes, and relationships. An
entity is a distinct object (a person, place, concept or event) in the organization that is to be
represented in the database. An attribute is a property that describes some aspect of the object
that we wish to record, and a relationship is an association between entities.
Some of the more common types of object based data model are:
• Entity-Relationship
• Object Oriented
• Semantic
• Functional
The Entity-Relationship model has emerged as one of the main techniques for modeling
database design and forms the basis for the database design methodology. The object oriented
data model extends the definition of an entity to include, not only the attributes that describe
the state of the object but also the actions that are associated with the object, that is, its
behavior. The object is said to encapsulate both state and behavior. Entities in semantic
systems represent the equivalent of a record in a relational system or an object in an OO
system, but they do not include behaviour (methods). They are abstractions used to represent
real-world (e.g. customer) or conceptual (e.g. bank account) objects. The functional data
model is now almost twenty years old. The original idea was to view the database as a
collection of extensionally defined functions and to use a functional language for querying
the database.
Physical Data Models
Physical data models describe how data is stored in the computer, representing information
such as record structures, record ordering, and access paths. There are not as many physical
data models as logical data models, the most common one being the Unifying Model.
Record Based Logical Models
Record based logical models are used in describing data at the logical and view levels. In
contrast to object based data models, they are used to specify the overall logical structure of
the database and to provide a higher-level description of the implementation. Record based
models are so named because the database is structured in fixed format records of several
types. Each record type defines a fixed number of fields, or attributes, and each field is
usually of a fixed length.
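Fixed-format records are easy to see at the byte level. The sketch below uses Python's standard `struct` module to pack a hypothetical employee record type with three fixed-length fields (a 4-byte integer id, a 10-byte name and an 8-byte float salary). Because every record has the same length, record n can be located directly at byte offset n × record size, without scanning the file.

```python
import struct

# Hypothetical record type: id (4-byte int), name (10 bytes), salary (8-byte float).
RECORD = struct.Struct("<i10sd")  # '<' = little-endian, no padding between fields

def pack(emp_id, name, salary):
    # struct pads a shorter name with null bytes (or truncates a longer one) to 10 bytes.
    return RECORD.pack(emp_id, name.encode("ascii"), salary)

def unpack(raw):
    emp_id, name, salary = RECORD.unpack(raw)
    return emp_id, name.rstrip(b"\x00").decode("ascii"), salary

file_bytes = pack(1, "Asha", 52000.0) + pack(2, "Ravi", 61000.0)

# Record n starts at byte n * RECORD.size (counting from 0).
n = 1
second = unpack(file_bytes[n * RECORD.size:(n + 1) * RECORD.size])
print(RECORD.size, second)  # 22 (2, 'Ravi', 61000.0)
```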
The three most widely accepted record based data models are:
Hierarchical Model
Network Model
Relational Model
The relational model has gained favor over the other two in recent years. The network and
hierarchical models are still used in a large number of older databases.
The entity relationship (ER) data model has existed for over 35 years. It is well suited
to data modeling for use with databases because it is fairly abstract and is easy to discuss and
explain. ER models are readily translated to relations. An ER model, also called an ER schema,
is represented by an ER diagram.
ER modeling is based on two concepts:
• Entities, defined as tables that hold specific information (data)
• Relationships, defined as the associations or interactions between entities
We will use a sample database called the COMPANY database to illustrate the concepts of
the ER model. This database contains information about employees, departments and
projects. Important points to note include:
There are several departments in the company. Each department has a unique
identification, a name, location of the office and a particular employee who manages
the department.
A department controls a number of projects, each of which has a unique name, a
unique number and a budget.
Each employee has a name, identification number, address, salary and birthdate. An
employee is assigned to one department but can join in several projects. We need to
record the start date of the employee in each project. We also need to know the direct
supervisor of each employee.
We want to keep track of the dependents for each employee. Each dependent has a
name, birthdate and relationship with the employee.
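Since ER models are readily translated to relations, the COMPANY requirements above can be sketched as SQL tables. The version below is one possible translation, run on SQLite through Python's `sqlite3` module; all table and column names are our own assumptions, not a schema given in this unit. The many-to-many "works on" relationship becomes its own table carrying the start date, and dependents are keyed through their employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address    TEXT,
    salary     REAL,
    birthdate  TEXT,
    dept_id    INTEGER REFERENCES department(dept_id),  -- assigned to one department
    supervisor INTEGER REFERENCES employee(emp_id)      -- direct supervisor
);
CREATE TABLE department (
    dept_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    location   TEXT,
    manager_id INTEGER REFERENCES employee(emp_id)      -- employee who manages it
);
CREATE TABLE project (
    proj_no    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    budget     REAL,
    dept_id    INTEGER NOT NULL REFERENCES department(dept_id)  -- controlling department
);
CREATE TABLE works_on (            -- many-to-many: employees join several projects
    emp_id     INTEGER REFERENCES employee(emp_id),
    proj_no    INTEGER REFERENCES project(proj_no),
    start_date TEXT,
    PRIMARY KEY (emp_id, proj_no)
);
CREATE TABLE dependent (           -- identified through the employee it belongs to
    emp_id       INTEGER REFERENCES employee(emp_id),
    name         TEXT,
    birthdate    TEXT,
    relationship TEXT,
    PRIMARY KEY (emp_id, name)
);
""")

tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)  # ['department', 'dependent', 'employee', 'project', 'works_on']
```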
Entity, Entity Set and Entity Type
An entity is an object in the real world with an independent existence that can be
differentiated from other objects. An entity might be
• An object with physical existence (e.g., a lecturer, a student, a car)
• An object with conceptual existence (e.g., a course, a job, a position)
Entities can be classified based on their strength. An entity is considered weak if it is
existence dependent.
• That is, it cannot exist without a relationship with another entity
• Its primary key is derived from the primary key of the parent entity
The Spouse table, in the COMPANY database, is a weak entity because its primary key is
dependent on the Employee table. Without a corresponding employee record, the spouse
record would not exist.
An entity is considered strong if it can exist apart from all of its related entities.
• Kernels are strong entities.
• A table without a foreign key or a table that contains a foreign key that can contain
nulls is a strong entity
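The existence dependence of a weak entity can be enforced with a foreign key. The sketch below, using SQLite through Python's `sqlite3` module, models the Spouse example: the spouse table's primary key is the owning employee's key, a spouse row for a non-existent employee is rejected, and `ON DELETE CASCADE` removes the spouse when its employee is deleted. The table layout and sample names are assumptions for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE spouse (
    emp_id INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
    name   TEXT NOT NULL,
    PRIMARY KEY (emp_id)   -- key derived from the owning (strong) entity
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO spouse VALUES (1, 'Kiran');
""")

# A spouse row cannot exist without a matching employee...
try:
    conn.execute("INSERT INTO spouse VALUES (99, 'Nobody')")
    orphan_rejected = False
except sqlite3.IntegrityError:  # FOREIGN KEY constraint failed
    orphan_rejected = True

# ...and it disappears together with its employee.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
spouses_left = conn.execute("SELECT COUNT(*) FROM spouse").fetchone()[0]
print(orphan_rejected, spouses_left)  # True 0
```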
Another term to know is entity type which defines a collection of similar entities.
An entity set is a collection of entities of an entity type at a particular point of time. In an
entity relationship diagram (ERD), an entity type is represented by a name in a box. For
example, in the following Figure the entity type is EMPLOYEE.
Simple attributes
Simple attributes are those drawn from the atomic value domains; they are also called single-
valued attributes. In the COMPANY database, an example of this would be: Name = {John} ;
Age = {23}
Composite attributes
Composite attributes are those that consist of a hierarchy of attributes. Using our database
example, and shown in the following Figure, Address may consist of Number, Street and
Suburb. So this would be written as → Address = {59 + 'Meek Street' + 'Kingsford'}
1.
i) F ii) T iii) T iv) F v) T
vi) F vii) T viii) T ix) F x) T
2.
(a) T (b) F (c) T (d) T (e) F
3.
(a) association (b) attributes (c) E-R
(d) diamond (e) many-to-many
Unit-3
1.1 Learning Objectives
1.2 Introduction
1.3 What is Relational data model
1.4 Relation, Tuple, Attribute, Cardinality, Degree, Domain
1.5 Check Your Progress
1.6 Answer to Check Your Progress
After going through this unit, the learner will be able to learn:
About the relational data model
About Relation, Tuple, Attribute, Cardinality, Degree and Domain
1.2 Introduction
A relational database is a system for storing and accessing data organized into
relations. A relation is a bag of tuples. Each tuple is an ordered sequence of attributes. Each
attribute is a data value belonging to some data type. All of the tuples in a relation have the
same number of attributes. In addition, the relation has a schema that is imposed on each
tuple in the relation, specifying what data type each attribute in each tuple will have. For
example, the relation's schema might specify that the first attribute in each tuple is an integer.
The relational data model was introduced by E. F. Codd in 1970. Currently, it is the most
widely used data model.
The relational model has provided the basis for:
Research on the theory of data/relationship/constraint
Numerous database design methodologies
The standard database access language called structured query language (SQL)
Almost all modern commercial database management systems
The relational data model describes the world as "a collection of inter-related relations (or
tables)."
Fundamental Concepts in the Relational Data Model
Relation
A relation, also known as a table or file, is a subset of the Cartesian product of a list of
domains characterized by a name. And within a table, each row represents a group of related
data values. A row, or record, is also known as a tuple.
Each column in a table is a field and is also referred to as an attribute. You can also think of it
this way: an attribute is used to define the record and a record contains a set of attributes.
The steps below outline the logic between a relation and its domains.
1. Given n domains denoted by $D_1, D_2, \ldots, D_n$,
2. and a relation $r$ defined on these domains,
3. then $r \subseteq D_1 \times D_2 \times \cdots \times D_n$.
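The subset relationship can be checked directly on small sets. In the sketch below, the two domains and the stored relation are hypothetical; `itertools.product` builds the Cartesian product $D_1 \times D_2$ (every possible pair), and the relation that is actually stored is one particular subset of it.

```python
from itertools import product

# Two small hypothetical domains.
D1 = {"Asha", "Ravi"}          # employee names
D2 = {"Sales", "IT", "HR"}     # department names

cartesian = set(product(D1, D2))          # D1 x D2: all 2 * 3 possible (name, dept) pairs
r = {("Asha", "Sales"), ("Ravi", "IT")}   # the relation actually stored

print(len(cartesian))   # 6
print(r <= cartesian)   # True: r is a subset of D1 x D2
```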
Table
A database is composed of multiple tables and each table holds the data. Figure 7.1 shows a
database that contains three tables.
Column
A database stores pieces of information or facts in an organized way. Understanding how to
use and get the most out of databases requires us to understand that method of organization.
The principal storage units are called columns or fields or attributes. These house the basic
components of data into which your content can be broken down. When deciding which
fields to create, you need to think generically about your information, for example, drawing
out the common components of the information that you will store in the database and
avoiding the specifics that distinguish one item from another.
Look at the example of an ID card in the following Figure to see the relationship between
fields and their data.
Domain
A domain is the original set of atomic values used to model data. By atomic value, we mean
that each value in the domain is indivisible as far as the relational model is concerned. For
example:
The domain of Marital Status has a set of possibilities: Married, Single, Divorced.
The domain of Shift has the set of all possible days: {Mon, Tue, Wed…}.
The domain of Salary is the set of all floating-point numbers greater than 0 and less
than 200,000.
The domain of First Name is the set of character strings that represents names of
people.
In summary, a domain is a set of acceptable values that a column is allowed to contain. This
is based on various properties and the data type for the column.
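One common way to enforce the example domains above is with CHECK constraints. The sketch below assumes SQLite through Python's `sqlite3` module; because SQLite's column types are loose, the CHECK constraints do the real work of restricting each column to its domain. The `staff` table and sample rows are hypothetical. A value outside the domain makes the INSERT fail with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE staff (
    first_name     TEXT NOT NULL,
    marital_status TEXT CHECK (marital_status IN ('Married', 'Single', 'Divorced')),
    shift          TEXT CHECK (shift IN ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')),
    salary         REAL CHECK (salary > 0 AND salary < 200000)
)""")

def try_insert(row):
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:  # raised when a CHECK constraint fails
        return "rejected"

results = [
    try_insert(("Asha", "Single", "Mon", 52000.0)),     # every value is in its domain
    try_insert(("Ravi", "Engaged", "Tue", 61000.0)),    # marital status not in the domain
    try_insert(("Meena", "Married", "Wed", 250000.0)),  # salary out of range
]
print(results)  # ['accepted', 'rejected', 'rejected']
```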
Records
Just as the content of any one document or item needs to be broken down into its constituent
bits of data for storage in the fields, the link between them also needs to be available so that
they can be reconstituted into their whole form.
Records allow us to do this. Records contain fields that are related, such as a customer or an
employee. As noted earlier, a tuple is another term used for record.
Records and fields form the basis of all databases. A simple table gives us the clearest picture
of how records and fields work together in a database storage project.
The simple table example in the figure above shows us how fields can hold a range of different
sorts of data. This one has:
A Record ID field: this is an ordinal number; its data type is integer.
A PubDate field: this is displayed as day/month/year; its data type is date.
An Author field: this is displayed as Initial. Surname; its data type is text.
A Title field: free text can be entered here; its data type is text.
You can command the database to sift through its data and organize it in a particular way. For
example, you can request that a selection of records be limited by date: 1. all before a given
date, 2. all after a given date or 3. all between two given dates. Similarly, you can choose to
have records sorted by date. Because the field containing the data is set up as a
Date field, the database reads the information in the Date field not just as numbers separated
by slashes, but rather as dates that must be ordered according to a calendar system.
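The difference between treating a day/month/year value as text and treating it as a date is easy to demonstrate. The sketch below uses Python's standard `datetime` module on hypothetical (record_id, pub_date) pairs: sorted as text, the values are ordered by their leading characters; sorted as dates, they follow the calendar. It also shows the "all between two given dates" selection.

```python
from datetime import date, datetime

# Hypothetical records: (record_id, pub_date as entered in day/month/year).
records = [(1, "05/11/2019"), (2, "21/03/2018"), (3, "10/02/2020"), (4, "30/06/2019")]

def as_date(text):
    return datetime.strptime(text, "%d/%m/%Y").date()

# As plain text, values sort by their first characters (the day), not by calendar order.
text_order = [rid for rid, _ in sorted(records, key=lambda r: r[1])]

# As dates, they sort chronologically.
date_order = [rid for rid, _ in sorted(records, key=lambda r: as_date(r[1]))]

# "All between two given dates" (inclusive).
start, end = date(2019, 1, 1), date(2019, 12, 31)
in_2019 = [rid for rid, d in records if start <= as_date(d) <= end]

print(text_order)  # [1, 3, 2, 4]
print(date_order)  # [2, 4, 1, 3]
print(in_2019)     # [1, 4]
```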
Degree
The degree is the number of attributes in a table. In our example in the figure above, the degree
is 4.
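Degree and row count can both be read from a query result. The sketch below, using SQLite through Python's `sqlite3` module, recreates the four fields of the simple table (the table name and all row values are hypothetical placeholders). The degree is the number of columns reported in `cursor.description`, and the number of rows is what Unit 1 calls the table's cardinality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publication "
             "(record_id INTEGER PRIMARY KEY, pub_date TEXT, author TEXT, title TEXT)")
conn.executemany("INSERT INTO publication VALUES (?, ?, ?, ?)", [
    (1, "2019-11-05", "F.G. Smallfinger", "Sample Title A"),
    (2, "2018-03-21", "W.E. Lightly", "Sample Title B"),
    (3, "2020-02-10", "C. Tacticus", "Sample Title C"),
])

cursor = conn.execute("SELECT * FROM publication")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)
print(degree, cardinality)  # 4 3
```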
Properties of a Table
A table has a name that is distinct from all other tables in the database.
There are no duplicate rows; each row is distinct.
Entries in columns are atomic. The table does not contain repeating groups or
multivalued attributes.
Entries from columns are from the same domain based on their data type including:
number (numeric, integer, float, smallint,…)
character (string)
date
logical (true or false)
Operations combining different data types are disallowed.
Each attribute has a distinct name.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
In this relation, there are four attributes called author_lastname, author_firstname, title,
and ISBN. Each attribute is a text string.
Databases with multiple relations
Databases will typically have many relations. One motivation for allowing multiple relations
in a database is to avoid storing redundant information. For example, in the relation above,
there are two tuples representing books by the same author, Callus Tacticus. Because the
author name is represented twice, there is the possibility that this information might not be
recorded consistently if the relation were modified.
We can avoid this redundancy by splitting the database into two relations, books and
authors:
author_id  author_lastname  author_firstname
1          Smallfinger      F.G.
2          Whittlbey        W.H.J.
3          Earwig           Lettice
4          Lightly          W.E.
5          Tacticus         Callus
The books relation has been changed so that the author of each book is represented by a
unique integer identifier, the author_id attribute. This attribute also exists in the authors
relation. So, the author of each book tuple in the books relation is represented indirectly, by
reference to a matching author tuple in the authors relation.
Queries, SQL
This query will match a single tuple in the books relation, and return a single title value, A
History of Hats.
Joins
A join is a query which retrieves information from multiple relations. Joins are a powerful
way to exploit associations between tuples in different relations. A query retrieving
information from multiple relations specifies a join condition which links attribute values
in tuples of the two relations.
Let's consider how to find the titles of all books by F.G. Smallfinger in the second version
of the database, where we have two relations, books and authors. We need to join both
relations in order to connect the author name and book title, which are now stored in
different relations:
select books.title
from books, authors
where books.author_id = authors.author_id
and authors.author_lastname = 'Smallfinger' and
authors.author_firstname = 'F.G.'
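The join above can be tried out with SQLite through Python's built-in sqlite3 module. The sketch assumes the two-relation layout described in the text (books holding an author_id that refers to authors). The authors rows are the ones listed above; the book titles are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        author_id        INTEGER PRIMARY KEY,
        author_lastname  TEXT,
        author_firstname TEXT
    );
    -- Each book refers to its author indirectly, through author_id.
    CREATE TABLE books (
        title     TEXT,
        author_id INTEGER REFERENCES authors(author_id)
    );
""")
conn.executemany("INSERT INTO authors VALUES (?, ?, ?)", [
    (1, "Smallfinger", "F.G."),
    (2, "Whittlbey", "W.H.J."),
    (3, "Earwig", "Lettice"),
    (4, "Lightly", "W.E."),
    (5, "Tacticus", "Callus"),
])
# Hypothetical titles, for illustration only.
conn.executemany("INSERT INTO books VALUES (?, ?)", [
    ("Title A", 1),
    ("Title B", 5),
    ("Title C", 5),
])

# The join from the text: the join condition links books to authors via author_id.
rows = conn.execute("""
    SELECT books.title
    FROM books, authors
    WHERE books.author_id = authors.author_id
      AND authors.author_lastname = 'Smallfinger'
      AND authors.author_firstname = 'F.G.'
""").fetchall()
print(rows)  # [('Title A',)]
```

The author's name is stored once in authors, even though author 5 has two books.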
Cardinality
Cardinality describes the relationship between two data tables by expressing the minimum
and maximum number of entity occurrences associated with one occurrence of a related
entity. In the following Figure, you can see that cardinality is represented by the innermost
markings on the relationship symbol. In this figure, the cardinality is 0 (zero) on the right and
1 (one) on the left.
The outermost marking on the relationship symbol, on the other hand, represents the
connectivity between the two tables. Connectivity is the relationship between two tables, e.g.,
one to one or one to many. The only time it is zero is when the FK can be null. When it
comes to participation, there are three options for the relationship between these entities:
0 (zero), 1 (one) or many. In the following Figure, for example, the connectivity is 1
(one) on the outer, left-hand side of this line and many on the outer, right-hand side.
Figure: Position of connectivity and cardinality on a relationship
symbol, by A. Watt.
The following Figure shows the symbol that represents a one to many relationship.
In the following Figure, both inner (representing cardinality) and outer (representing
connectivity) markers are shown. The left side of this symbol is read as minimum 1 and
maximum 1. On the right side, it is read as: minimum 1 and maximum many.
Degree
The degree of a relationship is the number of occurrences in one entity which are associated
with occurrences in another. There are three degrees of relationship, known as:
1. one-to-one
2. one-to-many
3. many-to-many
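One common way to express the three degrees of relationship in a schema is sketched below, using SQLite via Python's sqlite3 module. All table and column names (department, employee, parking, project, works_on) are hypothetical: a plain foreign key gives one-to-many, a UNIQUE foreign key gives one-to-one, and a junction table with a composite primary key gives many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    -- one-to-many: many employees may reference the same department
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    -- one-to-one: UNIQUE allows each employee at most one parking spot
    CREATE TABLE parking (
        spot_id INTEGER PRIMARY KEY,
        emp_id  INTEGER UNIQUE REFERENCES employee(emp_id)
    );
    -- many-to-many: a junction table pairs employees with projects
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
""")

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.executemany("INSERT INTO employee VALUES (?, ?, 1)", [(1, "Asha"), (2, "Bikash")])
conn.execute("INSERT INTO parking VALUES (10, 1)")
try:
    conn.execute("INSERT INTO parking VALUES (11, 1)")  # a second spot for employee 1
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

conn.executemany("INSERT INTO project VALUES (?, ?)", [(100, "Alpha"), (200, "Beta")])
conn.executemany("INSERT INTO works_on VALUES (?, ?)", [(1, 100), (1, 200), (2, 100)])

(staff_in_sales,) = conn.execute("SELECT COUNT(*) FROM employee WHERE dept_id = 1").fetchone()
(assignments,) = conn.execute("SELECT COUNT(*) FROM works_on").fetchone()
print(one_to_one_enforced, staff_in_sales, assignments)  # True 2 3
```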
Domain
In data management and database analysis, a data domain refers to all the values which a data
element may contain. The rule for determining the domain boundary may be as simple as a
data type with an enumerated list of values.
For example, a database table that has information about people, with one record per person,
might have a "gender" column. This gender column might be declared as a string data type,
and allowed to have one of two known code values: "M" for male, "F" for female—and
NULL for records where gender is unknown or not applicable (or arguably "U" for unknown
as a sentinel value). The data domain for the gender column is: "M", "F".
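A domain like the one for the gender column can be enforced with a CHECK constraint. This minimal SQLite sketch (via Python's sqlite3 module) assumes a hypothetical person table. NULL is still accepted, because a CHECK condition that evaluates to UNKNOWN does not reject the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint restricts the column to its domain: 'M' or 'F' (or NULL).
conn.execute("""
    CREATE TABLE person (
        name   TEXT,
        gender TEXT CHECK (gender IN ('M', 'F'))
    )
""")
conn.execute("INSERT INTO person VALUES ('Asha', 'F')")
conn.execute("INSERT INTO person VALUES ('Ravi', NULL)")  # unknown: allowed

try:
    conn.execute("INSERT INTO person VALUES ('Nobody', 'X')")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

(stored,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
print(rejected, stored)  # True 2
```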
a. relational database
b. attribute
c. join
d. query
e. tables
Block-2
Unit-1
1.1 Learning Objectives
1.2 Introduction
1.3 Characteristic of SQL
1.4 Basic Structure of SQL Queries
1.5 Basic Data Types
1.6 SQL Commands
1.7 Useful Relational Operator
1.8 Aggregate Functions
1.9 SUM function
1.10 AVG Function
1.11 Check Your Progress
1.12 Answer to Check Your Progress
After going through this unit, the learner will be able to learn:
SQL data definition
the basic structure of SQL queries
set operations
aggregate functions
SQL commands
1.2 Introduction
In June 1970 Dr. E. F. Codd published the paper "A Relational Model of Data for
Large Shared Data Banks" in Communications of the ACM, the journal of the Association
for Computing Machinery (ACM). Codd's model is now accepted as the definitive model for
relational database management systems (RDBMS).
Based on Codd's model, the language Structured English Query Language (SEQUEL) was
developed by IBM Corporation at its San Jose Research Laboratory. The language was first
called SEQUEL but was later renamed SQL; the official pronunciation of SQL is "ess-cue-ell".
In 1979 Oracle introduced the first commercially available implementation of SQL. Other
vendors soon followed, and today SQL is accepted as the standard RDBMS language.
We human beings communicate with each other with the help of language. Similarly,
SQL (Structured Query Language) is the language that a database understands, and we
communicate with the database using SQL. SQL is a fourth-generation database language,
standardized by ANSI (American National Standards Institute), for managing data held in an
RDBMS (Relational Database Management System).
What is SQL?
SQL works with database programs like DB2, MySQL, PostgreSQL, Oracle, SQLite,
SQL Server, Sybase, MS Access and many more. There are many different versions of
the SQL language, but to be in compliance with the ANSI standard, they all support the
major keywords such as SELECT, UPDATE, DELETE, INSERT, WHERE, and others.
The following picture shows communication with an RDBMS using SQL.
Each column in an RDBMS table declares the type of data that the column stores. This
enables the RDBMS to use storage space more efficiently by internally storing different
types of data in different ways. ANSI SQL includes the following basic data types.
1. CHARACTER
- CHARACTER(n) or CHAR(n): Fixed width n-character string, padded with
spaces as needed.
- CHARACTER VARYING(n) or VARCHAR(n): Variable width string with a
maximum size of n characters.
- NATIONAL CHARACTER(n) or NCHAR(n): Fixed width n-character string
supporting an international character set (Unicode characters).
- NATIONAL CHARACTER VARYING(n) or NVARCHAR(n): Variable width
string with a maximum size of n characters supporting an international character set.
2. NUMERIC
- SMALLINT, INTEGER or INT
- FLOAT, REAL and DOUBLE PRECISION
- NUMERIC(precision, scale) or DECIMAL(precision, scale) (e.g. 1234.56)
The number 1234.56 has a precision of 6 and a scale of 2. A scale of 0 indicates that
the number is an integer.
3. DATE
- DATE: Date values (e.g. 1999-12-31).
- TIME: Time values (e.g. 23:30:10). The precision of time values is
implementation-dependent (for example, SQL Server's TIME type is precise to 100 nanoseconds).
- TIMESTAMP: Date and a Time put together (e.g. 1999-12-31 23:30:10).
4. BIT
- BIT: bit values (e.g. 0 or 1)
Additional large-object and binary data types are also available, such as CLOB and BLOB,
and vendor-specific types such as LONG, RAW and LONG RAW (Oracle) and VARBINARY(n) (SQL Server).
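The precision/scale rule for NUMERIC(precision, scale) can be checked mechanically. The hypothetical helper precision_and_scale below uses Python's decimal module to count the significant digits (precision) and the digits after the decimal point (scale) of a plain decimal literal. It is a sketch of the definition, not of how any particular DBMS stores NUMERIC values.

```python
from decimal import Decimal


def precision_and_scale(literal):
    """Return (precision, scale) of a plain decimal literal such as '1234.56'.

    precision = total number of significant digits,
    scale     = number of digits to the right of the decimal point.
    """
    _sign, digits, exponent = Decimal(literal).as_tuple()
    scale = max(0, -exponent)
    # A positive exponent (e.g. '1.2E+3' is stored as 12 x 10**2) adds trailing zeros.
    precision = max(len(digits) + max(0, exponent), scale)
    return precision, scale


print(precision_and_scale("1234.56"))  # (6, 2)
print(precision_and_scale("1234"))     # (4, 0): scale 0, i.e. an integer
print(precision_and_scale("0.05"))     # (2, 2)
```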
The scope of SQL includes schema creation and modification, data access control, data
insert, query, update and delete. SQL Commands are broadly classified into the following
categories:
Data Definition Language (DDL): These statements are used to define the database
structure (also known as the database schema). The DDL statements are:
CREATE – CREATE is used for creating database objects, such as tables, views,
indexes, etc.
DROP – DROP is used for deleting database objects.
ALTER – ALTER is used for modifying the structure of database objects.
RENAME – RENAME is used for renaming database objects.
TRUNCATE – TRUNCATE is used for deleting all records from a table.
COMMENT – COMMENT is used for adding comments to a data dictionary.
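The DDL commands can be tried in SQLite through Python's sqlite3 module. This sketch follows SQLite's dialect, which differs from the list above: renaming is written ALTER TABLE ... RENAME TO, and SQLite has no TRUNCATE or COMMENT statements (a DELETE without a WHERE clause plays the role of TRUNCATE). The student table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new table.
conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT)")

# ALTER: change the structure of an existing table (here, add a column).
conn.execute("ALTER TABLE student ADD COLUMN address TEXT")

# RENAME: SQLite expresses renaming through ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE student RENAME TO pupil")

# PRAGMA table_info returns one row per column; field 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(pupil)")]
print(columns)  # ['roll_no', 'name', 'address']

# DROP: remove the table definition (and its data) entirely.
conn.execute("DROP TABLE pupil")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```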
Data Manipulation Language (DML): These statements are used to manage data within
database objects. Given below are the DML commands:
SELECT – SELECT is used to retrieve data from a database.
INSERT – INSERT is used to insert data into a table.
UPDATE – UPDATE is used to update existing data within a table.
DELETE – DELETE is used to delete records from a table (every record, if no WHERE
clause is given); the space allocated for the records remains.
MERGE – MERGE is used to perform an UPSERT operation (conditional INSERT/UPDATE).
CALL – CALL is used to call a PL/SQL or Java subprogram.
EXPLAIN PLAN – EXPLAIN PLAN is used to explain access path to data.
LOCK TABLE – LOCK TABLE is used to control concurrency.
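A minimal sketch of the core DML commands, using SQLite via Python's sqlite3 module and the first three rows of the product data that appears with the SUM() example. It shows in particular that DELETE with a WHERE clause removes only the matching rows. MERGE, CALL, EXPLAIN PLAN and LOCK TABLE are not shown, since their syntax varies between systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_id TEXT PRIMARY KEY, prod_name TEXT, prod_qty INTEGER)")

# INSERT adds new rows.
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("P1", "T-Shirts", 200), ("P2", "Jeans", 150), ("P3", "Trousers", 100)])

# UPDATE changes the existing rows that match the WHERE clause.
conn.execute("UPDATE product SET prod_qty = 120 WHERE prod_id = 'P3'")

# DELETE removes only the rows that match the WHERE clause;
# without a WHERE clause it would remove every row.
conn.execute("DELETE FROM product WHERE prod_id = 'P1'")

# SELECT retrieves the remaining data.
rows = conn.execute("SELECT prod_id, prod_qty FROM product ORDER BY prod_id").fetchall()
print(rows)  # [('P2', 150), ('P3', 120)]
```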
Data Query Language (DQL): Data Query Language (DQL) mainly deals with the SQL
SELECT statement for retrieving data from a database.
SELECT – the SELECT statement is used to retrieve data from a database.
Transaction Control Language (TCL): The TCL statements are used to manage the
changes made by DML statements. It allows statements to be grouped together into logical
transactions.
COMMIT – COMMIT saves the work done.
SAVEPOINT – SAVEPOINT identifies a point in a transaction to which you can later
roll back.
ROLLBACK – ROLLBACK undoes the changes made since the last COMMIT, restoring the
database to its state at that point.
SET TRANSACTION – SET TRANSACTION changes transaction options like
isolation level and what rollback segment to use.
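The TCL commands can be sketched with SQLite via Python's sqlite3 module. Opening the connection with isolation_level=None stops the module from starting transactions on its own, so the BEGIN, SAVEPOINT, ROLLBACK and COMMIT statements below are exactly what runs. SQLite has no SET TRANSACTION statement, and the account table is hypothetical.

```python
import sqlite3

# isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("SAVEPOINT before_b")                  # mark a point to roll back to
conn.execute("INSERT INTO account VALUES ('B', 50)")
conn.execute("ROLLBACK TO SAVEPOINT before_b")     # undo only the work after the savepoint
conn.execute("COMMIT")                              # make the remaining work permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")                            # undo everything since BEGIN

rows = conn.execute("SELECT name, balance FROM account").fetchall()
print(rows)  # [('A', 100)]
```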
Data Control Language (DCL): The DCL statements are:
GRANT – GRANT gives users access privileges to the database.
REVOKE – REVOKE withdraws access privileges given with the GRANT
command.
UNION operator: The UNION relational operator combines the rows from two or more tables
(or queries) that have the same structure. Having the same structure means:
The tables must have the same number of columns.
The corresponding columns must have compatible (ideally identical) data types and lengths.
The syntax for SQL UNION is:
SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2
The union of two tables returns all the rows that appear in either table; duplicate rows are
eliminated. To keep duplicate rows, the UNION ALL operator is used. Its syntax is:
SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2
Suppose there are two tables called EMP and CUST, having three columns each, and the
data types of the corresponding columns in the two tables are the same.
Suppose the tables have the following data in them.
SELECT * FROM EMP;
First Name Last Name Age
Manab Nath 32
Rahul Sarma 40
Binod Moshahary 35
SELECT * FROM CUST;
First Name Last Name Age
Hemant Rajkonwar 36
Manab Nath 32
Manash Kalita 25
Now, the union of the above two tables displays a virtual result table containing all
rows of the first table as well as all rows of the second table, with the duplicate row
(Manab Nath 32) appearing only once.
SELECT * FROM EMP
UNION
SELECT * FROM CUST;
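The difference between UNION and UNION ALL can be checked with the EMP and CUST data above, here loaded into SQLite through Python's sqlite3 module. This is a sketch; the column names are assumed from the table headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp  (first_name TEXT, last_name TEXT, age INTEGER)")
conn.execute("CREATE TABLE cust (first_name TEXT, last_name TEXT, age INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Manab", "Nath", 32), ("Rahul", "Sarma", 40), ("Binod", "Moshahary", 35)])
conn.executemany("INSERT INTO cust VALUES (?, ?, ?)",
                 [("Hemant", "Rajkonwar", 36), ("Manab", "Nath", 32), ("Manash", "Kalita", 25)])

union_rows = conn.execute("SELECT * FROM emp UNION SELECT * FROM cust").fetchall()
union_all_rows = conn.execute("SELECT * FROM emp UNION ALL SELECT * FROM cust").fetchall()

print(len(union_rows))      # 5: the shared row ('Manab', 'Nath', 32) appears once
print(len(union_all_rows))  # 6: duplicates are kept
```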
INTERSECT operator: The SQL INTERSECT operator also operates on two tables but, unlike
the UNION operator, it returns only those rows that appear in both tables. It removes
duplicate rows from the final result table; the INTERSECT ALL operator (where supported)
does not remove duplicate rows from the result table. The syntax for the INTERSECT operator is:
[SQL statement1]
INTERSECT
[SQL statement2]
For example,
SELECT * FROM EMP
INTERSECT
SELECT * FROM CUST;
The result table would look as follows:
First Name Last Name Age
Manab Nath 32
EXCEPT operator: The SQL EXCEPT operator returns all rows that appear in the first table
but do not appear in the second table. (In Oracle this operator is traditionally written MINUS.)
Example:
SELECT * FROM EMP
EXCEPT
SELECT * FROM CUST;
The output table would be as follows:
First Name Last Name Age
Binod Moshahary 35
Rahul Sarma 40
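The INTERSECT and EXCEPT examples can be reproduced the same way. This sketch again loads the EMP and CUST rows into SQLite via Python's sqlite3 module; SQLite supports INTERSECT and EXCEPT but not INTERSECT ALL. An ORDER BY is added because, without one, SQL does not guarantee the order of the result rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp  (first_name TEXT, last_name TEXT, age INTEGER)")
conn.execute("CREATE TABLE cust (first_name TEXT, last_name TEXT, age INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Manab", "Nath", 32), ("Rahul", "Sarma", 40), ("Binod", "Moshahary", 35)])
conn.executemany("INSERT INTO cust VALUES (?, ?, ?)",
                 [("Hemant", "Rajkonwar", 36), ("Manab", "Nath", 32), ("Manash", "Kalita", 25)])

# Rows present in both tables.
both = conn.execute("SELECT * FROM emp INTERSECT SELECT * FROM cust").fetchall()

# Rows in EMP that are not in CUST, sorted for a predictable order.
only_emp = conn.execute(
    "SELECT * FROM emp EXCEPT SELECT * FROM cust ORDER BY first_name"
).fetchall()

print(both)      # [('Manab', 'Nath', 32)]
print(only_emp)  # [('Binod', 'Moshahary', 35), ('Rahul', 'Sarma', 40)]
```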
JOIN operator: The JOIN operator is a powerful relational operator which can combine data
from multiple tables. Tables are usually joined on columns that have compatible data types
(typically the same data type and width). SQL supports different types of JOIN operations:
INNER, OUTER and CROSS.
An INNER JOIN whose condition compares two columns of two different tables with the
equality operator '=' is known as an equi-join. The syntax for this type of join is as follows:
SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Suppose the PRODUCT and SALES tables have the following data:
PRODUCT table:
Prod_id prod_name prod_qty
P1 Shirt 100
P2 Trouser 100
P3 T-shirt 80
SALES table:
cust_id Prod_id quantity
C1 P1 35
C5 P3 55
Now, to list the quantity of the products sold, the SQL statement would be as :
SELECT Product.prod_name, Sales.quantity
FROM Product
INNER JOIN Sales
ON Product.Prod_id=Sales.Prod_id
The result table would be:
Prod_name quantity
Shirt 35
T-shirt 55
OUTER JOIN is similar to INNER JOIN but is more flexible in selecting the data from the
tables. This type of Join is used to select data or rows from the table on the left or right or
both regardless of whether the other table has common values.
The syntax for LEFT Join is :
SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Suppose we want to list all the products along with the quantities sold; the SQL statement
would be:
SELECT Product.prod_name, Product.prod_qty,
Sales.quantity
FROM Product
LEFT JOIN Sales
ON Product.Prod_id=Sales.Prod_id
The result table would be:
prod_name prod_qty quantity
Shirt 100 35
Trouser 100 NULL
T-shirt 80 55
The LEFT Join returns all the rows from the left table even if there are no matches in the
right table.
The RIGHT Join returns all the rows from the right table even if there are no matches in the
left table. The syntax is as follows:
SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
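The INNER and LEFT joins can be checked with SQLite via Python's sqlite3 module. The SALES rows come from the text; the PRODUCT rows (P1 Shirt 100, P2 Trouser 100, P3 T-shirt 80) are an assumption inferred from the result tables above, in particular the assignment of P2 to Trouser.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_id TEXT, prod_name TEXT, prod_qty INTEGER)")
conn.execute("CREATE TABLE sales (cust_id TEXT, prod_id TEXT, quantity INTEGER)")
# Product rows inferred from the result tables; treat them as an assumption.
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("P1", "Shirt", 100), ("P2", "Trouser", 100), ("P3", "T-shirt", 80)])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("C1", "P1", 35), ("C5", "P3", 55)])

# INNER JOIN: only products that have a matching sale.
inner = conn.execute("""
    SELECT product.prod_name, sales.quantity
    FROM product
    INNER JOIN sales ON product.prod_id = sales.prod_id
    ORDER BY product.prod_id
""").fetchall()

# LEFT JOIN: every product; quantity is NULL (None in Python) when there is no sale.
left = conn.execute("""
    SELECT product.prod_name, product.prod_qty, sales.quantity
    FROM product
    LEFT JOIN sales ON product.prod_id = sales.prod_id
    ORDER BY product.prod_id
""").fetchall()

print(inner)  # [('Shirt', 35), ('T-shirt', 55)]
print(left)   # [('Shirt', 100, 35), ('Trouser', 100, None), ('T-shirt', 80, 55)]
```

On SQLite versions older than 3.39.0, which lack RIGHT JOIN, the same result as a RIGHT JOIN can be obtained by swapping the two tables in a LEFT JOIN.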
CROSS JOIN returns a Cartesian product: the number of rows returned is equal to the product
of the numbers of rows in the two tables being joined. That is, it combines every row from
the left table with every row from the right table. For example, if the first table has 20 rows
and the second table has 10 rows, the result will have 20 * 10, or 200, rows. On large tables
this type of query can take a long time to execute. The syntax for a cross join is:
SELECT column_name(s)
FROM table_name1
CROSS JOIN table_name2
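The 20 * 10 = 200 arithmetic can be confirmed with a small SQLite sketch via Python's sqlite3 module, using two hypothetical single-column tables t1 and t2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE TABLE t2 (b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(20)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(j,) for j in range(10)])

# Every row of t1 is paired with every row of t2.
(count,) = conn.execute("SELECT COUNT(*) FROM t1 CROSS JOIN t2").fetchone()
print(count)  # 200
```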
The SUM() function returns the total sum of the numeric values of a column. The
syntax is as follows:
SELECT SUM(<column_name>) FROM <tablename>;
To understand this function better, let us take an example. Suppose we have the
following data in the Product table.
PROD_ID PROD_NAME PROD_QTY
P1 T-Shirts 200
P2 Jeans 150
P3 Trousers 100
P4 Pull Overs 80
P5 Shirts 200
SELECT SUM(prod_qty) AS "Total Quantity" FROM Product;
The result will be the total of the product quantities in the Product table. So, the
output of the above statement will be:
Total Quantity
730
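The SUM() example can be reproduced with SQLite via Python's sqlite3 module. This sketch loads the five Product rows shown above and also reads back the column alias from the cursor description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_id TEXT, prod_name TEXT, prod_qty INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("P1", "T-Shirts", 200), ("P2", "Jeans", 150), ("P3", "Trousers", 100),
    ("P4", "Pull Overs", 80), ("P5", "Shirts", 200),
])

cur = conn.execute('SELECT SUM(prod_qty) AS "Total Quantity" FROM product')
column_name = cur.description[0][0]  # the alias given with AS
(total,) = cur.fetchone()
print(column_name, total)  # Total Quantity 730
```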
a. SUM()
b. Data
c. INNER JOIN
d. JOIN
e. Tables
f. relational
Unit -2
1.1 Learning Objectives
1.2 Introduction
1.3 Compound Conditions and Logical Operators
1.4 AND Operator
1.5 OR Operator
1.6 Combining AND and OR Operators
1.7 IN Operator
1.8 BETWEEN Operator
1.9 NOT Operator
1.10 Order of Precedence for Logical Operators
1.11 LIKE Operator
1.12 Concatenation Operator
1.13 Alias Column Names
1.14 ORDER BY Clause
1.15 Handling NULL Values
1.16 Check Your Progress
1.17 Answer to Check Your Progress
1.2 Introduction
We have already learnt in the previous unit about the basics of Structured Query
Language (SQL), including the creation, insertion and deletion of tables and of the data
in them. SQL can also execute queries against a database, retrieve data from it and update
the data in its tables.
In this unit we will discuss the operators that are used to operate on the data in database
tables. Operators are used in almost every SQL statement to compare, evaluate or calculate
values. They tell SQL how to evaluate an SQL expression or a conditional statement. We will
find here that operators are mostly used in the WHERE clause of an SQL statement.
1.3 Compound Conditions and Logical Operators
SQL AND joins together two or more conditions and returns a result only when all of
the conditions are true. AND helps to query for very specific records. There is no limit to the
number of AND conditions that can be applied in the WHERE clause of a query.
For example, the following query will find only the rows where the customer
identity is 'c10' and the product identity is 'p5'. It will not find a row for customer identity
'c12' and product identity 'p5'.
SELECT *
FROM Sales
WHERE cust_id = 'c10'
AND prod_id = 'p5';
The output of the above query will be as follows:
1.5 OR Operator
Logical OR combines two Boolean expressions and returns TRUE when either of the
conditions is TRUE and FALSE when both are FALSE; otherwise (when one operand is
UNKNOWN, for example because it involves a NULL, and the other is not TRUE) it returns UNKNOWN.
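SQL's three-valued logic for OR can be observed directly in SQLite (via Python's sqlite3 module), which represents TRUE as 1, FALSE as 0 and UNKNOWN as NULL, returned to Python as None. This sketch shows SQLite's behaviour, which follows the rules described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# TRUE = 1, FALSE = 0, UNKNOWN = NULL (None in Python).
truth_table = conn.execute("""
    SELECT (1 OR 0), (0 OR 0), (1 OR NULL), (0 OR NULL), (NULL OR NULL)
""").fetchone()
print(truth_table)  # (1, 0, 1, None, None)
```

Note that TRUE OR UNKNOWN is TRUE: one TRUE operand is enough, whatever the other one is.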
Example :
To get the data of 'cust_code', 'cust_name', 'cust_city', 'cust_country' and 'grade' from the
'customer' table with the following conditions:
1. either 'cust_country' is 'USA',
2. or 'grade' of the 'customer' is 3,
the following SQL statement can be used:
SELECT cust_code, cust_name, cust_city, cust_country, grade
FROM customer
WHERE cust_country = 'USA' OR grade = 3;
Output :
Conditions in SQL statements can be combined using the logical operators AND and OR.
The SQL engine displays the rows for which all the conditions joined with AND are satisfied,
or for which any of the conditions joined with OR is satisfied. Let us take an example to
understand how to combine these operators into a single statement.
Suppose the 'Employee' table contains the following data:
Fig : Table 1
Now, combining the AND and OR operators into a single statement,
SELECT emp_id, emp_name
FROM Employee
WHERE Position = 'Manager' AND sex = 'M'
OR Salary < 80000;
Because AND is evaluated before OR, this statement returns all the records where the
employee's position is manager and the employee is male. It also returns the records where
the salary of the employee is less than 80000.
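Because the data of Table 1 is not reproduced here, this SQLite sketch (via Python's sqlite3 module) uses four hypothetical Employee rows, chosen so that one row satisfies only the AND part, one satisfies only the OR part and two satisfy neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee
                (emp_id TEXT, emp_name TEXT, position TEXT, sex TEXT, salary INTEGER)""")
# Hypothetical rows; the original Table 1 data is not reproduced here.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    ("E1", "Arup",  "Manager", "M", 90000),  # male manager: matches the AND part
    ("E2", "Rima",  "Manager", "F", 95000),  # female manager, high salary: no match
    ("E3", "Dipen", "Clerk",   "M", 30000),  # low salary: matches the OR part
    ("E4", "Nila",  "Clerk",   "F", 85000),  # matches neither condition
])

rows = conn.execute("""
    SELECT emp_id, emp_name
    FROM employee
    WHERE position = 'Manager' AND sex = 'M'
       OR salary < 80000
    ORDER BY emp_id
""").fetchall()
print(rows)  # [('E1', 'Arup'), ('E3', 'Dipen')]
```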
1.7 IN Operator
Suppose, from the above table, we need the records of the products with IDs P2, P4 and P5.
The SQL statement for this query is as follows:
SELECT *
FROM Products
WHERE prod_id IN ('P2', 'P4', 'P5');
The result-set will look as follows:
The BETWEEN operator is used with a WHERE clause to test whether a value lies in
a specified range of values, including both end values: x BETWEEN a AND b means that
x >= a and x <= b. BETWEEN is also a comparison operator. The syntax of the BETWEEN
operator is as follows:
SELECT <column_name(s)>
FROM <table_name>
WHERE <column_name>
BETWEEN value1 AND value2
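IN and BETWEEN can be tried with the product data listed for the SUM() example in the previous unit, loaded into SQLite via Python's sqlite3 module. The sketch assumes the table is named products; note that BETWEEN includes both end values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_id TEXT, prod_name TEXT, prod_qty INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    ("P1", "T-Shirts", 200), ("P2", "Jeans", 150), ("P3", "Trousers", 100),
    ("P4", "Pull Overs", 80), ("P5", "Shirts", 200),
])

# IN: shorthand for prod_id = 'P2' OR prod_id = 'P4' OR prod_id = 'P5'.
in_rows = conn.execute(
    "SELECT prod_id FROM products WHERE prod_id IN ('P2', 'P4', 'P5') ORDER BY prod_id"
).fetchall()

# BETWEEN is inclusive: prod_qty >= 100 AND prod_qty <= 150.
between_rows = conn.execute(
    "SELECT prod_id FROM products WHERE prod_qty BETWEEN 100 AND 150 ORDER BY prod_id"
).fetchall()

print(in_rows)       # [('P2',), ('P4',), ('P5',)]
print(between_rows)  # [('P2',), ('P3',)]
```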
Suppose the 'EMPLOYEE' table contains the following data:
Operator precedence determines the sequence in which the operations are performed
in an SQL expression with multiple operators. While evaluating the SQL statements with
multiple operators, the operators with higher precedence are evaluated first before evaluating
those with lower precedence.
When more than one logical operator is present in an SQL expression, NOT is
evaluated first, then AND and finally OR.
The order of precedence for logical operators is shown in the table below. Arithmetic
operators and comparison operators take higher precedence than logical operators.
Order Evaluated Operator
1 Arithmetic operators
2 Comparison operators
3 NOT
4 AND
5 OR
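The precedence order can be observed by evaluating a few constant expressions in SQLite (via Python's sqlite3 module). Each expression gives a different answer depending on which operator is applied first, so the results show the order SQLite uses, which matches the table for these operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT
        1 OR 0 AND 0,     -- AND first: 1 OR (0 AND 0) = 1; (1 OR 0) AND 0 would be 0
        NOT 0 AND 0,      -- NOT first: (NOT 0) AND 0 = 0; NOT (0 AND 0) would be 1
        1 + 1 = 2 AND 1   -- arithmetic, then comparison, then AND: (2 = 2) AND 1 = 1
""").fetchone()
print(row)  # (1, 0, 1)
```

When in doubt, parentheses make the intended grouping explicit regardless of precedence.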
The ORDER BY clause is used to display the output table of a query in either
ascending or descending order. It sorts the rows of the result. The syntax
for the ORDER BY clause is:
SELECT <column_names>
FROM <Tablename>
WHERE <predicates>
ORDER BY <column_name>;
The default sort order for the ORDER BY clause is ascending: [a-z] for characters
and [0-9] for numbers. SQL can also sort the records in descending order, [z-a]. The syntax is:
SELECT <column_names>
FROM <Tablename>
WHERE <predicates>
ORDER BY <column_name> DESC;
Example 1:
SELECT emp_name, salary
FROM Employee
ORDER BY salary ASC;
This would return the records sorted by the salary field in ascending order as given
below:
emp_name salary
Babita 15000
Konika 16000
Manoj 17000
Kabita 20000
Neel 25000
Example 2:
SELECT emp_name, salary
FROM Employee
ORDER BY salary DESC;
This would return all the records sorted by the salary field in descending order as
given below:
emp_name salary
Neel 25000
Kabita 20000
Manoj 17000
Konika 16000
Babita 15000
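The two ORDER BY examples can be reproduced with SQLite via Python's sqlite3 module. The rows are the five employees shown above, inserted deliberately out of order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("Kabita", 20000), ("Babita", 15000), ("Neel", 25000),
    ("Manoj", 17000), ("Konika", 16000),
])

ascending = [name for (name,) in conn.execute(
    "SELECT emp_name FROM employee ORDER BY salary ASC")]
descending = [name for (name,) in conn.execute(
    "SELECT emp_name FROM employee ORDER BY salary DESC")]

print(ascending)   # ['Babita', 'Konika', 'Manoj', 'Kabita', 'Neel']
print(descending)  # ['Neel', 'Kabita', 'Manoj', 'Konika', 'Babita']
```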
A NULL value means an unknown or missing data value. Sometimes there may be records in a
table with no value in some field, either because the data was not available at data-entry time
or because that field is not applicable to those rows. By default, a table column can hold
NULL values. NULL is different from values such as a blank or a zero: zero is a numeric value
and a blank space is a character value. Likewise, standard SQL treats a zero-length string as a
value distinct from NULL, although some systems, notably Oracle, treat a zero-length string as NULL.
NULL values in a column of a database table can be tested using the operators IS NULL
and IS NOT NULL. For example, to display the records with NULL values in the Phone column
of the Employee table, the following statement can be written in SQL:
SELECT emp_name, phone FROM Employee
WHERE phone IS NULL;
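The following SQLite sketch (via Python's sqlite3 module) contrasts IS NULL with the common mistake of writing = NULL. The Employee rows and phone values are hypothetical; the empty string in one row shows that SQLite, like standard SQL, does not treat '' as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, phone TEXT)")
# Hypothetical rows: one phone missing (NULL), one empty string.
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Babita", "12345"), ("Neel", None), ("Manoj", "")])

is_null = conn.execute("SELECT emp_name FROM employee WHERE phone IS NULL").fetchall()
eq_null = conn.execute("SELECT emp_name FROM employee WHERE phone = NULL").fetchall()

print(is_null)  # [('Neel',)]  the empty string is not NULL here
print(eq_null)  # []           phone = NULL is UNKNOWN for every row
```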
Constraints can be applied to table columns to prevent the addition of invalid data or
deletion of data that is required to maintain the overall consistency of the database. The
constraints are used to control the data being entered into a database table.
In SQL, the NOT NULL constraint can be defined at the column level, in addition to the
primary and foreign key constraints. When defined on a column, this constraint means that
the column cannot be left empty: it becomes a mandatory column and a value must be entered
into it.
The syntax for it is as:
<column_name> <datatype>(<size>) NOT NULL
For example, to create a student table, the SQL statement is –
CREATE TABLE Student(
Roll_no varchar2(10),
Name varchar2(20),
D_o_b date NOT NULL,
Address varchar2(30));
When inserting values into the columns of the Student table, if the date of birth (D_o_b)
field is not entered, an error message is displayed. A value for this field has to be entered
for each row in the table.
The NOT NULL constraint can be applied only at the column level of a table.
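The NOT NULL behaviour of the Student table can be checked in SQLite via Python's sqlite3 module. SQLite accepts Oracle-style type names such as varchar2(10) but does not enforce their lengths, so only the NOT NULL rule matters here; the inserted values are hypothetical, and the exact wording of the error message is SQLite's own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student(
        Roll_no varchar2(10),
        Name    varchar2(20),
        D_o_b   date NOT NULL,
        Address varchar2(30))
""")
conn.execute("INSERT INTO Student VALUES ('R1', 'Asha', '2001-05-14', 'Guwahati')")

try:
    # No value for D_o_b, so the NOT NULL constraint rejects the row.
    conn.execute("INSERT INTO Student (Roll_no, Name) VALUES ('R2', 'Ravi')")
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)

(stored,) = conn.execute("SELECT COUNT(*) FROM Student").fetchone()
print(error)   # e.g. NOT NULL constraint failed: Student.D_o_b
print(stored)  # 1
```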
Normalization
After going through this unit, the learner will be able to:
understand what normalization is
list the objectives of normalization
describe the 1st, 2nd and 3rd normal forms
illustrate the fourth and fifth normal forms
1.2 Introduction
In the previous unit, we learnt about two important concepts in database design, i.e.
functional dependency and decomposition, which help in minimizing redundancy in
database design. In this unit, we will concentrate on normalization.
Normalization is the process of efficiently organizing data in a database. Normalization is the
name given to the process of simplifying the relationship among data elements in a record.
We will introduce the different normal forms and briefly discuss each of them.
1.3 Normalization and Its Objectives
Normalization aims at:
i. minimizing redundancy, and
ii. minimizing insertion, deletion and update anomalies.
The process of normalization was first proposed by E.F. Codd. Normalization is a bottom-up
design technique for relational databases.
The objectives of the normalization process are:
1. To create a formal framework for analyzing relation schemas based on their keys and
on the functional dependencies among their attributes.
2. To free relations from undesirable insertion, update and deletion anomalies.
3. To reduce the need for restructuring the relations as new data types are introduced.
4. To carry out a series of tests on individual relation schemas so that the relational
database can be normalized to some degree. When a test fails, the relation violating
that test must be decomposed into relations that individually meet the normalization
tests.
1.4 Normal Forms
Whenever the simple rules of functional dependencies are applied to a relation, they transform
the relation into a state called a normal form. Normal forms are used to ensure that
various types of anomalies and inconsistencies are not introduced into the database. The
normal forms used in relational databases are: first (1NF), second (2NF), third (3NF),
Boyce-Codd (BCNF), fourth (4NF) and fifth (5NF) normal form.
A relation is said to be in first normal form if the values in the domain of each attribute of the
relation are atomic. The first normal form prohibits multivalued attributes, composite
attributes and their combinations. That is, 1NF disallows having a set of values, a tuple
of values, or a combination of both as an attribute value for a single tuple.
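A multivalued attribute can be flattened into 1NF with a small sketch in plain Python. The rows are hypothetical: each PERSON carries a list of VISITED_CITY values (anticipating the TRAVEL_INFO example that follows), and the hypothetical helper to_first_normal_form emits one (PERSON, city) pair per atomic value.

```python
# Hypothetical unnormalized rows: VISITED_CITY holds a set of values, which 1NF forbids.
unnormalized = [
    {"PERSON": "Ajay", "VISITED_CITY": ["Delhi", "Mumbai"]},
    {"PERSON": "Bina", "VISITED_CITY": ["Shillong"]},
]


def to_first_normal_form(rows):
    """Pair each PERSON with every city in its VISITED_CITY list, one atomic value per row."""
    return [(row["PERSON"], city) for row in rows for city in row["VISITED_CITY"]]


normalized = to_first_normal_form(unnormalized)
print(normalized)  # [('Ajay', 'Delhi'), ('Ajay', 'Mumbai'), ('Bina', 'Shillong')]
```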
Let us consider the relation TRAVEL_INFO as shown in the figure.
Here, in the relation, the domain of VISITED_CITY is not simple. Hence, the relation is
unnormalized. Now, pairing each value in VISITED_CITY with the corresponding value of
the attribute PERSON gives the resultant relation shown below:
A relation or table is said to be in third normal form (3NF) if the relation is in 2NF and its
non-prime attributes are:
a. mutually independent, and
b. functionally dependent on the primary key.
That is, no attribute of the relation should be transitively functionally dependent on the
primary key. Thus, in 3NF, no non-prime attribute is functionally dependent on another non-
prime attribute. This means that a relation in 3NF consists of the primary key and a set of
independent non-prime attributes. 3NF is based on the problem of transitive dependency, and
it eliminates the transitive-dependency problems that can remain in 2NF.
In our example, relation PATIENT_DOCTOR in Fig 8.5, there is no dependency
between the attributes P_NAME and DURATION. However, DOB depends on P_NAME, so
P_NAME and DOB are not mutually independent. So, the relation is not in 3NF.
To bring the relation into 3NF, it has to be decomposed so that the attributes that are not
directly dependent on the primary key are removed. DOB is linked to the primary key only
transitively, through its dependency on P_NAME. The functional dependency diagram is
shown below. The resulting relations are:
Here, the dependency Roll_No -> Hostal_Name is transitive through the following two
dependencies :
Roll_No -> Year,
Year -> Hostal_Name
Thus, the STUDENT relation is not in 3NF. To bring the relation into 3NF we can
decompose the relation into two relations, STUD1 and STUD2, as shown below.
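A transitive dependency like Roll_No -> Year -> Hostal_Name can be detected with the standard attribute-closure algorithm, sketched here in plain Python. The sketch assumes STUDENT has only the attributes Roll_No, Year and Hostal_Name, with the two FDs given in the text; the helper name closure is hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Assumed STUDENT(Roll_No, Year, Hostal_Name) with the two FDs from the text.
fds = [({"Roll_No"}, {"Year"}), ({"Year"}, {"Hostal_Name"})]

# Roll_No determines Year and, transitively, Hostal_Name.
print(sorted(closure({"Roll_No"}, fds)))
# Year determines Hostal_Name but not Roll_No, so Year is not a key:
# Hostal_Name depends on a non-key attribute, which violates 3NF.
print(sorted(closure({"Year"}, fds)))
```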
In the above examples, the conversion into 3NF is not hard, but whenever a relation has more
than one combination of attributes that may be considered as the primary key, the conversion
becomes problematic. Let us consider the following relation UTILIZE, shown below.
The relation stores information about the machines used by projects and project managers.
Each project has one project manager and each project manager manages one project. It is
clear from the table that either of two attribute combinations, {Project, Machine} or
{Proj_Manager, Machine}, can be considered as the primary key. The FDD (functional
dependency diagram) for relation UTILIZE is shown below.
In the relation, there is only one non-prime attribute, Qty_Used, which is fully
functionally dependent on each of the two candidate keys. Thus, the relation UTILIZE is in 2NF.
Moreover, since Qty_Used is the only non-prime attribute, there can be no dependencies
between non-prime attributes. Thus the relation is also in 3NF.
Problems in 3NF:
The above relation UTILIZE, although in 3NF, has the following
undesirable properties:
a. The project manager of each project is stored more than once.
b. A project manager cannot be stored until the project has ordered some machines.
c. A project cannot be entered unless that project's manager is known.
d. If a project's manager changes, some rows must also be changed.
The redundancy and other problems of 3NF can be eliminated by using the Boyce-Codd
normal form (BCNF), which was proposed by R. F. Boyce and E. F. Codd.
A relation (or table) R is said to be in BCNF if, for every nontrivial FD X --> Y that holds
in R, X is a super key of R. From this condition we see that every determinant in the relation
must be a candidate key (or contain one). Any relation in BCNF is also in 3NF and
consequently in 2NF. However, a relation in 3NF is not necessarily in BCNF.
The difference between 3NF and BCNF is this: if a functional dependency A --> B holds where
B is a prime attribute (part of a candidate key) and A is not a super key, 3NF allows this
dependency in a relation, but BCNF does not.
BCNF allows a functional dependency A --> B only when A is a super key.
In our example, the relation UTILIZE (Fig 8.12) does not satisfy the condition of BCNF, as it
contains the following two functional dependencies:
Proj_Manager --> Project
Project --> Proj_Manager
But neither Proj_Manager nor Project is a super key.
Now the relation UTILIZE can be decomposed into the following two BCNF relations:
UTILIZE(Project, Machine, Qty_Used)
PROJECTS(Project, Proj_Manager)
Both of the above relations are in BCNF. The only FD between the UTILIZE attributes is
Project, Machine --> Qty_Used
and (Project, Machine) is a super key.
The two FDs between the PROJECTS attributes are
Project --> Proj_Manager
Proj_Manager --> Project
Both Project and Proj_Manager are super keys of relation PROJECTS and PROJECTS is in
BCNF.
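The BCNF test used above (the left-hand side of every nontrivial FD must be a super key) can be sketched in plain Python with the attribute-closure idea. The FD lists are taken from the text for UTILIZE, for the decomposed UTILIZE and for PROJECTS; the helper names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the nontrivial FDs whose left-hand side is not a super key of relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= relation]


utilize = {"Project", "Proj_Manager", "Machine", "Qty_Used"}
utilize_fds = [
    ({"Project", "Machine"}, {"Qty_Used"}),
    ({"Proj_Manager", "Machine"}, {"Qty_Used"}),
    ({"Project"}, {"Proj_Manager"}),
    ({"Proj_Manager"}, {"Project"}),
]
before = bcnf_violations(utilize, utilize_fds)

# After decomposition, using the FDs listed in the text for each new relation.
after_utilize = bcnf_violations({"Project", "Machine", "Qty_Used"},
                                [({"Project", "Machine"}, {"Qty_Used"})])
after_projects = bcnf_violations({"Project", "Proj_Manager"},
                                 [({"Project"}, {"Proj_Manager"}),
                                  ({"Proj_Manager"}, {"Project"})])

print(len(before), len(after_utilize), len(after_projects))  # 2 0 0
```

The two violations found before decomposition are exactly Project --> Proj_Manager and Proj_Manager --> Project.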
LET US KNOW
A multi-valued dependency (MVD), written X -->--> Y, holds in a relation R(X, Y, Z) if each
X value is associated with a set of Y values in a way that does not depend on the Z values.
Here, X and Y are both subsets of R. The notation X -->--> Y indicates that the set of
attributes Y shows a multi-valued dependency on the set of attributes X.
Always remember that:
1. For a relation (or table) to contain an MVD, it must have three or more attributes.
2. It is possible to have a table containing two or more attributes which are inter-
dependent multi-valued facts about another attribute. For the relation to have an MVD,
however, these attributes must be independent of each other.
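The definition of an MVD can be turned into a brute-force check in plain Python: for every pair of tuples that agree on X, the tuple that takes its X and Y values from the first and its Z values from the second must also be present. The STUDENT-style rows are hypothetical, and the helper holds_mvd is an illustrative sketch, not an efficient algorithm.

```python
from itertools import product


def holds_mvd(rows, x, y, attributes):
    """Return True if X ->-> Y holds in rows (a list of dicts over `attributes`)."""
    z = [a for a in attributes if a not in x and a not in y]
    present = {tuple(row[a] for a in attributes) for row in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # Build t3: X and Y values from t1, the remaining (Z) values from t2.
            t3 = {a: t1[a] for a in list(x) + list(y)}
            t3.update({a: t2[a] for a in z})
            if tuple(t3[a] for a in attributes) not in present:
                return False
    return True


attributes = ["Student", "Subject", "Language"]
# Hypothetical rows: subjects and languages vary independently for student S1.
full = [
    {"Student": "S1", "Subject": "Maths",   "Language": "English"},
    {"Student": "S1", "Subject": "Maths",   "Language": "Hindi"},
    {"Student": "S1", "Subject": "Physics", "Language": "English"},
    {"Student": "S1", "Subject": "Physics", "Language": "Hindi"},
]
# Dropping one row means Subject and Language are no longer independent.
partial = full[:3]

print(holds_mvd(full, ["Student"], ["Subject"], attributes))     # True
print(holds_mvd(partial, ["Student"], ["Subject"], attributes))  # False
```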
Let us consider the following relation to gain more concept on MVD.
Here, suppose that X is Person and Y is Skill_Type; then Z becomes the combination
{Project, Machine}. Suppose a particular value of Person, "John", is selected. Consider all
rows (tuples) that have some value of Z, for example, Project = "P1" and Machine = "Shovel".
The value of Y in this tuple is 'Programmer'.
Consider also all tuples with the same value of X, that is Person, but with some other value of Z,
say Project = "P2" and Machine = "Welding". The value of Y in these tuples is again
'Programmer'.
The same set of values of Y is obtained for Person = "John", irrespective of the values chosen
for Project and Machine. Hence, X -->--> Y, or Person -->--> Skill_Type. If we find all the
possible MVDs, the results would be as follows:
A table is in the fourth normal form (4NF) if it is in BCNF and does not have any
independent multi-valued parts of the primary key.
The fourth normal form is related to the concept of a multi-valued dependency (MVD). In
simple terms, if there are two columns - A and B - and if for a given A, there can be multiple
values of B, then we say that an MVD exists between A and B.
The fourth normal form is largely theoretical in nature. In practice, normalization up to and
including the third normal form is generally adequate. In certain situations, designers may also
have to consider BCNF. However, 4NF is rarely employed in real-life designs.
Let us consider the following STUDENT table (or relation):
We can see that there are two independent MVD facts in this relationship :
a) A student can study many subjects (i.e. Student --> --> Subject)
b) A student can learn many languages (i.e. Student -->--> Language)
The primary key for the STUDENT table is currently a composite key made up of all the
three columns in the table - Student, Subject and Language. In other words, the primary key
of the table is Student + Subject + Language.
To bring this table into 4NF, we split the independent multi-valued components of the
primary key into two tables.
Therefore, let us split these two independent multi-valued dependencies into two separate
tables namely Student_Subject and Student_Language. The resulting tables are shown below:
This decomposition reduces redundancy with respect to both of the independent MVD
relationships, that is, subject and language.
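The decomposition can be checked with a small Python sketch. The rows below are hypothetical (the original STUDENT table is not reproduced here); the point is that projecting onto (Student, Subject) and (Student, Language) and then joining the two projections on Student gives back exactly the original rows, so no information is lost.

```python
# Hypothetical rows of the STUDENT table: (Student, Subject, Language)
student = {
    ("Ravi", "Mathematics", "English"),
    ("Ravi", "Mathematics", "Hindi"),
    ("Ravi", "Physics", "English"),
    ("Ravi", "Physics", "Hindi"),
    ("Meena", "Chemistry", "English"),
}

# Decompose into the two 4NF tables by projection (sets drop duplicates).
student_subject = {(s, subj) for s, subj, _ in student}
student_language = {(s, lang) for s, _, lang in student}

# Natural join of the two projections on Student.
rejoined = {
    (s, subj, lang)
    for s, subj in student_subject
    for s2, lang in student_language
    if s == s2
}

print(len(student), len(student_subject), len(student_language))  # 5 3 3
print(rejoined == student)  # True
```

Ravi's two subjects and two languages are stored as 2 + 2 rows instead of 2 x 2 combinations; the saving grows as the number of subjects and languages grows.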
A relation (or table) is said to be in 5NF if and only if it is in 4NF and every join
dependency in it is implied by the candidate keys.
Some relations cannot be decomposed into two higher-normal-form relations by means of the
projection methods discussed in 1NF, 2NF, 3NF and BCNF. Such relations are decomposed into
three or more relations, which can be reconstructed by means of a three-way (or higher) join
operation. This is called fifth normal form (5NF). 5NF eliminates the problems of 4NF and
deals with relations that have join dependencies. Any relation that is in 5NF is also in the
lower normal forms, namely 1NF, 2NF, 3NF, BCNF and 4NF. 5NF is mainly of theoretical
interest and is rarely used in practical database design.
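A small Python sketch, using a hypothetical Supplier-Part-Project relation of the kind commonly used to illustrate join dependencies, shows what "reconstructed by a three-way join" means. Joining any two of the three binary projections produces spurious tuples, while joining all three returns exactly the original relation.

```python
# Hypothetical relation SPJ(Supplier, Part, Project)
spj = {
    ("s1", "p1", "j2"),
    ("s1", "p2", "j1"),
    ("s2", "p1", "j1"),
    ("s1", "p1", "j1"),
}

# The three binary projections.
sp = {(s, p) for s, p, _ in spj}
pj = {(p, j) for _, p, j in spj}
js = {(j, s) for s, _, j in spj}

# Each possible two-way join, returned as (Supplier, Part, Project) triples.
sp_pj = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}
sp_js = {(s, p, j) for s, p in sp for j, s2 in js if s == s2}
pj_js = {(s, p, j) for p, j in pj for j2, s in js if j == j2}

# Three-way join: (SP join PJ) join JS, matching on both Project and Supplier.
three_way = {(s, p, j) for s, p, j in sp_pj if (j, s) in js}

print(sorted(sp_pj - spj))  # [('s2', 'p1', 'j2')]
print(three_way == spj)     # True
```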
a) Normalization is a process of
b) A normal form is
i. a state of a relation that results from applying simple rules regarding FDs.
ii. the highest normal form condition that it meets.
iii. an indication of the degree to which it has been normalized.
iv. all of these.
c) In 1NF,
d) 2NF is always in
i. 1NF
ii. BCNF
iii. MVD
iv. none of these
i. if it is in 1NF.
ii. every non-prime attribute of R is fully functionally dependent on each
relation key of R.
iii. if it is in BCNF.
iv. both (i) & (ii).
i. relation R is in 2NF
ii. non-prime attributes are mutually independent.
iii. functionally dependent on the primary key.
iv. all of these.
g) 4NF is concerned with dependencies between the elements of compound keys composed
of
i. one attribute
ii. two attributes
iii. three or more attributes
iv. none of these
Unit-1
After going through this unit, the learner will be able to:
Learn about the concept of key and its uses
Learn the different types of keys like super key, candidate key, alternate key, primary
key, foreign key etc.
Define primary and foreign keys in a relation
Use composite keys
1.2 Introduction
In the previous unit, we saw that in the relational model the database is logically
represented in the form of tables so that it can be easily understood and visualized by
everyone. Keys play a very important role in relational databases. In fact, without keys a
relational database would not be usable at all.
In this unit, we will discuss the concept of keys in a database. The use of different types of
keys will be covered in this unit.
1.3 Key
In a relational model, a database consists of relations (tables), which consist of tuples (or
records/rows), which in turn consist of attributes (or fields/columns). We must have a way to
specify how tuples within a relation are distinguished. Each relation in a relational database
must have an attribute or combination of attributes that can uniquely identify each tuple.
This unique identifier is called a key. A key is the data item that exclusively identifies a
record or tuple. It may consist of one or more attributes. We can split related data into
different relations or tables and logically link them together with the help of keys. Without
this unique identifier, there is no way to retrieve a particular tuple from a relation.
For example, let us consider the following relation (table). In this unit we may use the
terms table, row or record, and field in place of relation, tuple and attribute
respectively.
STUDENT
Table 5.1
The above table gives the marks and grades of students of a particular class. There are six
records in the table "STUDENT". Each record has the following four fields: Roll_no, Name,
Marks and Grade. Among the fields Name, Marks and Grade, no single field can identify a
record in the table uniquely. The Name field cannot be used as a key because several students
might have the same name. The Marks field contains repeated values, and similarly, more than
one student can have the same Grade. So these three fields cannot be used as a key. However,
the field Roll_no can easily identify any row in the table uniquely, because the roll numbers
of students in a particular class are all different. Such fields can be used as a key.
1.4 Types of Key
Keys, which all rest on the property of uniqueness, can be classified as follows:
Super Key
Candidate Key
Primary Key
Alternate Key
Composite Key
Foreign Key
1.4.1 Super Key
A super key is a set of columns that uniquely identifies every row in a table. For example, if
there is a table STUDENT with only two columns, Roll_no and Name, then
{ Roll_no, Name }
is a super key, assuming that no two students in the class have the same combination of
Roll_no and Name.
Similarly, let us consider an EMPLOYEE table (Table 5.2) consisting of the columns Emp_ID,
Name and Post. We could use Emp_ID in combination with any or all other columns of
this table to uniquely identify a row in the table. Examples of super keys in this table would be
{Emp_ID}, {Emp_ID, Name} and {Emp_ID, Name, Post}.
In a real database we do not need values for all of those columns to identify a row. We only
need a minimal set of columns that can be used to identify a single row. In our example, the
set {Emp_ID} is the minimal super key.
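The idea of super keys and minimal super keys can be illustrated by brute force in Python. This sketch assumes a few made-up EMPLOYEE rows and tests every combination of columns for uniqueness. Note that a single table instance can only show that a set of columns is not a key; whether a set really is a key is a design decision that must hold for every possible state of the table.

```python
from itertools import combinations

COLUMNS = ("Emp_ID", "Name", "Post")

# Hypothetical EMPLOYEE rows; two employees share both Name and Post.
employee = [
    ("E01", "Asha", "Clerk"),
    ("E02", "Asha", "Clerk"),
    ("E03", "Bimal", "Manager"),
]


def is_unique(rows, cols):
    """True if the given columns take a different combination of values in every row."""
    idx = [COLUMNS.index(c) for c in cols]
    combos = [tuple(row[i] for i in idx) for row in rows]
    return len(combos) == len(set(combos))


# Every non-empty combination of columns that is unique in these rows.
super_keys = [
    set(cols)
    for size in range(1, len(COLUMNS) + 1)
    for cols in combinations(COLUMNS, size)
    if is_unique(employee, cols)
]
# Minimal super keys: no other super key is a proper subset of them.
minimal = [k for k in super_keys if not any(other < k for other in super_keys)]

print(len(super_keys), minimal)  # 4 [{'Emp_ID'}]
```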
1.4.2 Candidate Key
A table can have more than one column (or set of columns) that could be chosen as the key
because each individually has the capability to identify a record uniquely. These are termed
candidate keys. In other words, a candidate key is any set of one or more columns whose
combined values are unique among all occurrences (i.e., tuples, rows or records). Since a
null value is not guaranteed to be unique, no component of a candidate key is allowed to be
null. Candidate keys are those attributes of a relation which have the properties of
uniqueness and irreducibility. These two properties are explained below:
Let K be a set of attributes of relation R. Then K is a candidate key for R if and only if it
possesses both of the following properties:
Uniqueness: No legal value of R ever contains two distinct tuples with the same value for K.
Irreducibility: No proper subset of K has the uniqueness property.
Let us consider the following relation EMP_INFO containing some personal information about
employees working in an office. Suppose all of them have passports.
The attributes Emp_ID and Passport_no each hold a unique value for every employee.
Therefore, either of these two attributes can be chosen as the key. These two are examples of
candidate keys in the above relation. The attribute Name cannot be a candidate key, as more
than one employee might have the same name. Similarly, several employees might have the same
blood group, so Blood Group cannot be chosen as a key.
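The uniqueness and irreducibility properties can be checked directly. The following Python sketch is illustrative only: the EMP_INFO rows and passport numbers are made up, and, as with any check on sample data, it can confirm only that the rows are consistent with a candidate key, not that the key holds for every future state of the table. It also rejects NULL (None) components, as required above.

```python
from itertools import combinations

# Hypothetical EMP_INFO rows (the real table is not reproduced here).
emp_info = [
    {"Emp_ID": "E01", "Name": "Asha", "Passport_no": "K1234567", "Blood_Group": "O+"},
    {"Emp_ID": "E02", "Name": "Asha", "Passport_no": "L7654321", "Blood_Group": "O+"},
    {"Emp_ID": "E03", "Name": "Bimal", "Passport_no": "M1122334", "Blood_Group": "B+"},
]


def has_uniqueness(rows, attrs):
    """True if no two rows share the same values for all attributes in attrs."""
    return len({tuple(r[a] for a in attrs) for r in rows}) == len(rows)


def is_candidate_key(rows, key):
    """Check no-NULL components, uniqueness and irreducibility on these rows."""
    key = tuple(key)
    if any(r[a] is None for r in rows for a in key):
        return False
    if not has_uniqueness(rows, key):
        return False
    # Irreducibility: no proper non-empty subset of the key is unique by itself.
    return not any(
        has_uniqueness(rows, subset)
        for size in range(1, len(key))
        for subset in combinations(key, size)
    )


print([c for c in emp_info[0] if is_candidate_key(emp_info, [c])])
# ['Emp_ID', 'Passport_no']
```

The pair {Emp_ID, Passport_no} is also unique, but it fails the irreducibility test because {Emp_ID} alone is already unique: it is a super key, not a candidate key.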
1.4.3 Primary Key
Every database table should have one or more columns designated as the primary key. The
value of this key should be unique for each record in the table. A table can have multiple
candidate keys; out of all the available candidate keys, the database designer selects one as
the primary key. The primary key should be chosen such that its attributes never, or only
very rarely, change.
A primary key is a field or combination of fields that uniquely identifies a record in a table, so
that an individual record can be located without confusion. Depending on its design, a table
or relation may have any number of unique keys but at most one primary key. For example,
let us assume we have a table called EMPLOYEE_ADDRESS that contains some
information for every employee in an organization. We need to select an appropriate
primary key that will uniquely identify each employee. Our first thought might be to use
the employee's name, i.e., Emp_Name. But this would not work properly, because two or more
employees in the organization might have the same name. The Location field of a
person cannot be chosen as the primary key since it is likely to change. A better choice is
to use a unique Emp_ID number that the organization assigns to each employee on
appointment. Emp_ID can be the primary key, as it does not change as long as the person
works in the same organization.
In Table 5.1, the student's Roll_no would be a good choice for the primary key of the
STUDENT table. The student's name would not be a good choice, as there is always the
chance that more than one student has the same name. Some other examples of primary keys are
Social Security Numbers (associated with a specific person) and ISBN_no (associated with a
specific book).
A primary key is a special case of a unique key. A unique key constraint prevents the
duplication of key values within the rows of a table but allows null values. A primary key
allows each row in a table to be uniquely identified, and ensures that no duplicate rows exist
and that no null values are entered. Thus the primary key constraint can be defined as a rule
that says the primary key fields cannot be null and cannot contain duplicate data.
Once we decide upon a primary key and set it up in the database, the database management
system (DBMS) will enforce the uniqueness of the key. If we try to insert a record into a
table with a primary key value that duplicates an existing record, the insert will fail.
Sometimes a table has no natural column that can serve as the primary key. In such cases, we
may need to introduce an additional column which contains unique values. Most databases are
also capable of generating their own primary keys. Microsoft Access, for example, may be
configured to use the AutoNumber data type to assign a unique ID to each record in the table.
While effective, this is a bad design practice because it leaves us with a meaningless value in
each record in the table. It is better to use that space for storing useful data.
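The behaviour described above can be tried out with Python's built-in sqlite3 module and an in-memory SQLite database. This is a sketch, not the text's own example: the Email column is a hypothetical unique key added for contrast, and all values are made up. SQLite is used only because it needs no server; other DBMSs enforce the same rules but differ in details, for example in how many NULLs a UNIQUE column accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite, for historical reasons, accepts NULL in a non-INTEGER PRIMARY KEY
# column unless NOT NULL is declared, so NOT NULL is spelled out here.
conn.execute(
    """
    CREATE TABLE EMPLOYEE_ADDRESS (
        Emp_ID   TEXT NOT NULL PRIMARY KEY,
        Emp_Name TEXT NOT NULL,
        Email    TEXT UNIQUE,   -- hypothetical unique key: no duplicates, NULL allowed
        Location TEXT
    )
    """
)
conn.execute("INSERT INTO EMPLOYEE_ADDRESS VALUES ('E01', 'Asha', NULL, 'Guwahati')")


def try_insert(row):
    """Insert one row; return the error message, or None if the insert succeeded."""
    try:
        conn.execute("INSERT INTO EMPLOYEE_ADDRESS VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_pk = try_insert(("E01", "Bimal", None, "Jorhat"))  # rejected: duplicate key
null_pk = try_insert((None, "Chitra", None, "Tezpur"))       # rejected: NULL key
second_null = try_insert(("E02", "Dipak", None, "Silchar"))  # accepted: NULL in UNIQUE column

print(duplicate_pk is not None, null_pk is not None, second_null is None)  # True True True
```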
Properties of Primary Key
To qualify as a primary key for an entity, an attribute must have the following properties:
Stable:
The value of a primary key must not change or become NULL throughout the life
of an entity. A stable primary key helps to keep the model stable. For example, if we consider
a patient record, the value of the primary key (patient number) must not change with time, as
would happen with the age field.
Minimal:
The primary key should be composed of the minimum number of fields that ensures the
occurrences are unique.
Definitive:
A value must exist for every record at creation time, because an entity occurrence cannot be
substantiated unless its primary key value also exists.
Accessible:
Anyone who wants to create, read or delete a record must be able to see the primary key
value.
1.4.4 Alternate key
As we have seen, it is possible for a relation to have two or more candidate keys. If we choose
any one of them as the primary key, then the remaining keys are termed alternate keys. The
alternate key (or secondary key) is any candidate key which is not selected to be the primary
key. To illustrate alternate keys, let us consider the following table ELEMENT, which
stores information such as the name, symbol and atomic number of the elements of the
periodic table.
All three fields can individually identify each element in the table, so any of these three
fields can be chosen as the primary key. If we choose Symbol as the primary key, Name and
Atomic_no would then be alternate keys. Similarly, in EMP_INFO (Table 5.3), if we
consider Emp_ID as the primary key, then Passport_no will be the alternate key.
1.4.5 Composite Key
In some situations, while designing a database, there may not be a particular column or field
that can individually identify a record uniquely in a table. In such cases, we may need to
select two or more fields so that their combination can identify each record uniquely.
Such a combination of fields is known as a composite key. It is used when a record cannot be
uniquely identified by a single field.
To illustrate a composite key, let us consider the following table ITEM with the fields
Supplier_ID, Item_ID, Item_Name and Quantity. This table tells us which supplier sells
which item. As we can see, none of these fields can individually identify a row in the table
uniquely. But if we combine Supplier_ID and Item_ID, then together they can easily identify
any row in the table uniquely. Thus, Supplier_ID and Item_ID together form a composite
key.
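A composite key is declared with a table-level PRIMARY KEY clause that lists both columns. The following Python/SQLite sketch uses made-up ITEM rows: the same supplier may appear with different items and the same item with different suppliers, but the pair (Supplier_ID, Item_ID) may not repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ITEM (
        Supplier_ID TEXT NOT NULL,
        Item_ID     TEXT NOT NULL,
        Item_Name   TEXT,
        Quantity    INTEGER,
        PRIMARY KEY (Supplier_ID, Item_ID)   -- composite key
    )
    """
)
conn.executemany(
    "INSERT INTO ITEM VALUES (?, ?, ?, ?)",
    [
        ("S1", "I1", "Bolt", 100),
        ("S1", "I2", "Nut", 250),  # same supplier, different item: allowed
        ("S2", "I1", "Bolt", 80),  # same item, different supplier: allowed
    ],
)

try:
    conn.execute("INSERT INTO ITEM VALUES ('S1', 'I1', 'Bolt', 5)")  # repeats (S1, I1)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ITEM").fetchone()[0]
print(duplicate_rejected, row_count)  # True 3
```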
1.4.6 Foreign Key
One important type of key that we will discuss in this unit is the foreign key. These keys are
used to create relationships between tables.
A foreign key is a field in a relational table that matches the primary key column of another
table. It identifies a column or a set of columns in one (referencing) table that refers to a
column or set of columns in another (referenced) table. The referenced columns must be the
primary key or another candidate key of the referenced table. The values in one row
of the referencing columns must occur in a single row of the referenced table. Thus, a row in
the referencing table cannot contain values that do not exist in the referenced table. In this
way, references can be made to link information together, and this is an essential part of
database normalization. Multiple rows in the referencing table may refer to the same row in the
referenced table.
For example, in an employee database, let us imagine that we want to add a table
DEPARTMENT containing departmental information to the database. We would also want to
include information about the employees in each department, but it would be redundant to have
the same information in two tables (EMPLOYEE and DEPARTMENT). Instead, we can
create a relationship between the two tables.
Let us assume that the DEPARTMENT table uses the Department_Name column as the
primary key. To create a relationship between the two tables, we add a new column called
Department_Name to the EMPLOYEE table. We then fill in the name of the department to
which each employee belongs. The Department_Name column in the EMPLOYEE table is a
foreign key (FK) that references the DEPARTMENT table. The database will then enforce
referential integrity by ensuring that all of the values in the Department_Name column of the
EMPLOYEE table have corresponding entries in the DEPARTMENT table.
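The EMPLOYEE/DEPARTMENT example can be sketched with Python's sqlite3 module. Column names other than Department_Name, and all sample rows, are assumptions. SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued on the connection, which is why that line is needed here; most other DBMSs enforce declared foreign keys without such a switch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE DEPARTMENT (Department_Name TEXT NOT NULL PRIMARY KEY, Location TEXT)"
)
conn.execute(
    """
    CREATE TABLE EMPLOYEE (
        Emp_ID          TEXT NOT NULL PRIMARY KEY,
        Name            TEXT,
        Department_Name TEXT REFERENCES DEPARTMENT (Department_Name)  -- foreign key
    )
    """
)
conn.execute("INSERT INTO DEPARTMENT VALUES ('Accounts', 'Block A')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('E01', 'Asha', 'Accounts')")  # valid reference

try:
    # 'Research' does not exist in DEPARTMENT, so this row would be an orphan.
    conn.execute("INSERT INTO EMPLOYEE VALUES ('E02', 'Bimal', 'Research')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```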
Again, let us consider a book database. The BOOKS table has a link to the PUBLISHERS table.
The Pub_ID column is the primary key of the PUBLISHERS table and ISBN_no is the
primary key of the BOOKS table. The BOOKS table also contains a Pub_ID column, which
matches the primary key column of the PUBLISHERS table. This Pub_ID is the foreign key in the
BOOKS table; it indicates which publisher a book belongs to.
Although the primary purpose of a foreign key constraint is to control the data that can be
stored in the foreign key table, it also controls changes to data in the primary key table. For
example, if the row for a publisher is deleted from the PUBLISHERS table while the publisher's
ID is still used for books in the BOOKS table, the referential integrity between the two tables
is broken: the deleted publisher's books are orphaned in the BOOKS table without a link to the
data in the PUBLISHERS table. A foreign key constraint prevents this situation. The constraint
enforces referential integrity by ensuring that changes cannot be made to data in the primary
key table if those changes invalidate the link to data in the foreign key table. If an attempt is
made to delete a row in the primary key table or to change a primary key value, the action
will fail if the deleted or changed primary key value corresponds to a value in the foreign key
constraint of another table. To successfully change or delete such a row, we must first either
delete the foreign key data in the foreign key table or change it, thereby linking the foreign
key to different primary key data. Similarly, a primary key constraint cannot be dropped while
it is referenced by a foreign key constraint in another table; the foreign key constraint must be
dropped first.
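The same rule, seen from the primary key side, as a Python/SQLite sketch with hypothetical PUBLISHERS and BOOKS rows: deleting a publisher that is still referenced fails, but after the book is re-linked to another publisher, the delete succeeds. Because no ON DELETE clause is given, the foreign key uses the default action, which rejects the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.execute("CREATE TABLE PUBLISHERS (Pub_ID TEXT NOT NULL PRIMARY KEY, Pub_Name TEXT)")
conn.execute(
    """
    CREATE TABLE BOOKS (
        ISBN_no TEXT NOT NULL PRIMARY KEY,
        Title   TEXT,
        Pub_ID  TEXT REFERENCES PUBLISHERS (Pub_ID)  -- no ON DELETE clause: default action
    )
    """
)
conn.executemany(
    "INSERT INTO PUBLISHERS VALUES (?, ?)", [("P1", "Alpha Press"), ("P2", "Beta Books")]
)
conn.execute("INSERT INTO BOOKS VALUES ('ISBN-0001', 'Database Systems', 'P1')")


def succeeds(sql):
    """Run one statement; return False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


first_delete = succeeds("DELETE FROM PUBLISHERS WHERE Pub_ID = 'P1'")      # P1 still referenced
relinked = succeeds("UPDATE BOOKS SET Pub_ID = 'P2' WHERE Pub_ID = 'P1'")  # point the book at P2
second_delete = succeeds("DELETE FROM PUBLISHERS WHERE Pub_ID = 'P1'")     # no references left

print(first_delete, relinked, second_delete)  # False True True
```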
After going through this unit, the learner will be able to:
understand the importance of backup in a database
know the recovery process
learn the different types of recovery operations
understand the difference between the backup and recovery processes
1.2 Introduction
Most database systems have incorporated backup and recovery tools into their interfaces and
infrastructure. With the growing dependency of the workplace on information in general, and on
the information in databases specifically, safe backups and reliable recoveries have never been
more important. It is not just the data files that need to be part of the backup process; you
must also back up the transaction logs of the database. Without the transaction logs, the data
files are useless in a recovery event. How often you choose to perform these backup routines
really depends on the data requirements of a company.
Backing up a database is only the first step in the process. The next step is to make sure
those backups are protected. You also need to test the backups and to ensure that they can be
used to restore your database. Probably the most common database backup technique
involves backing up the database to a disk on the same server. This is fine, provided the disk
is a separate physical RAID array from the one that your database sits on. Since backups are
used to recover from worst-case scenarios, they need to be protected from such disasters.
After all, what good are backups that will just be lost when the server fails?
The two most common things to do with these backups are to either back them up to tape
shortly after they are saved to disk or move them to another server for long-term storage.
Either solution is acceptable since you are left with a secondary backup. This way if the
server fails and takes your original backups with it, you can still restore the database.
Once you have your backups saved to another location (either tape or another server), you
then must ask yourself whether those backups can be restored. Ensuring that your backups are
going to be useful is an important step in the backup process. Using the Verify Backup
Integrity checkbox in the maintenance plan is not enough, because it simply makes sure that
the header of the backup is correct without verifying the validity of the backup.
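Testing a backup means actually restoring it and checking the result. The Python sketch below shows the idea with SQLite rather than SQL Server: Connection.backup copies a database, the copy is restored into a fresh database, and PRAGMA integrity_check plus a comparison of the data confirm that the restore worked. In-memory databases stand in for the separate disk, tape or server discussed above; the table and values are made up.

```python
import sqlite3

# Hypothetical source database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
source.executemany("INSERT INTO accounts (balance) VALUES (?)", [(100,), (250,)])
source.commit()

# 1. Take the backup (in practice: a file on another disk, tape or server).
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# 2. Test the backup by restoring it into a fresh database.
restored = sqlite3.connect(":memory:")
backup_copy.backup(restored)

# 3. Check the restored database itself, not just the backup's header.
integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
query = "SELECT id, balance FROM accounts ORDER BY id"
same_data = restored.execute(query).fetchall() == source.execute(query).fetchall()

print(integrity, same_data)  # ok True
```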
One of the major advantages that enterprise-class databases offer over their desktop
counterparts is a robust backup and recovery feature set. Microsoft SQL Server provides
database administrators with the ability to customize a database backup and recovery plan to
the business and technical requirements of an organization.
In this unit, we will explore the process of backing up data with Microsoft SQL Server. When
you create a backup plan, you'll need to create an appropriate mix of backups with varying
backup scopes and backup types that meet the recovery objectives of your organization and
are suitable for your technical environment.
1.3.2 Backup Scopes
The scope of a backup defines the portion of the database covered by the backup. It identifies
the database, file(s) and/or file group(s) that SQL Server will back up. There are three
different types of backup scope available in Microsoft SQL Server:
Database backups cover the entire database including all structural schema
information, the entire data contents of the database and any portion of the transaction
log necessary to restore the database from scratch to its state at the time of the backup.
Database backups are the simplest way to restore your data in the event of a disaster,
but they consume a large amount of disk space and time to complete.
Partial backups are a good alternative to database backups for very large databases
that contain significant quantities of read-only data. If you have read-only file groups
in your database, it probably doesn't make sense to back them up frequently, as they
do not change. Therefore, the scope of a partial backup includes all files in the
primary file group, all read/write file groups, and any read-only file groups that you
explicitly specify.
File backups allow you to back up individual files and/or file groups from your
database. They may be used to complement partial backups by creating one-time-only
backups of your read-only file groups. They may also play a role in complex backup
models.
The second decision you need to make when planning a SQL Server database backup model
is the type of each backup included in your plan. The backup type describes the temporal
coverage of the database backup. SQL Server supports two different backup types:
Full Backups include all data within the backup scope. For example, a full database
backup will include all data in the database, regardless of when it was last created or
modified. Similarly, a full partial backup will include the entire contents of every file
and file group within the scope of that partial backup.
Differential Backups include only that portion of the data that has changed since the
last full backup. For example, if you perform a full database backup on Monday
morning and then perform a differential database backup on Monday evening, the
differential backup will be a much smaller file (that takes much less time to create)
that includes only the data changed during the day on Monday.
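The difference between the two backup types can be modelled in a few lines of Python. This is a toy model, not how SQL Server stores backups: the database is a dictionary of numbered pages, and every write records the page as changed since the last full backup. A differential backup copies only those pages, and a restore applies the full backup followed by the latest differential (each differential already contains every change since the full backup).

```python
class ToyDatabase:
    """A database modelled as numbered pages, with change tracking."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.changed_since_full = set()  # pages written since the last full backup

    def write(self, page_id, value):
        self.pages[page_id] = value
        self.changed_since_full.add(page_id)

    def full_backup(self):
        """Copy every page and start a new differential base."""
        self.changed_since_full.clear()
        return dict(self.pages)

    def differential_backup(self):
        """Copy only the pages changed since the last FULL backup."""
        return {p: self.pages[p] for p in self.changed_since_full}


def restore(full, differential=None):
    """Restore the full backup, then overlay the latest differential."""
    db = dict(full)
    if differential:
        db.update(differential)
    return db


db = ToyDatabase({1: "a", 2: "b", 3: "c", 4: "d"})
monday_full = db.full_backup()           # Monday morning: all 4 pages
db.write(2, "B")
monday_diff = db.differential_backup()   # Monday evening: only page 2
db.write(4, "D")
tuesday_diff = db.differential_backup()  # Tuesday: pages 2 and 4 (cumulative)

print(len(monday_full), len(monday_diff), len(tuesday_diff))  # 4 1 2
```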
You should keep in mind that the scope and type of a backup are two independent decisions
made when creating your backup plan. As described above, each type and scope allows you
to customize the amount of data included in the backup and, therefore, the amount of time
required to back up and restore the database in the event of a disaster.
Using the SQL Database Backup and Restore console agent, we can automate the backup of
SQL databases. The agent reads its input from an ini file, so it can run without user
interaction. The agent can write event logs. Using this agent we can create compressed backups
of SQL databases, restore an SQL database from such a backup and delete the intermediate file.
For best results, this agent should be used with Mobility backup software in client-server mode.
Other backup and recovery tools include:
Active@ Partition Recovery
Memory Card Recovery Software
Acronis True Image Home Upgrade
R-Drive Image
Paragon Drive Backup Personal
Recover USB Drive Files
BootMaster Rescue Disk for Windows
Driver Magician
Handy Recovery
WordFIX Data Recovery
USB External Drive Recovery
1.4 Types of Database Failure
Database failures can be classified as transaction failures, media failures and system failures.
Some of the causes for which a database transaction may fail in the middle of execution are:
1. System crash or computer failure: A hardware, software or network failure or
error may occur in the computer system during transaction execution.
2. Media failures: hardware crashes are generally media failures, such as main
memory failure.
3. A transaction or system error: Some operation in the transaction may cause it to
fail, such as integer overflow or division by zero. It may also occur due to a
logical error.
4. Concurrency control enforcement: The concurrency control method may decide
to abort the transaction, to be restarted later, because it violates serializability or
because several transactions are in a state of deadlock.
5. Disk failure: Some disk blocks may lose their data because of a read or write
malfunction or because of a disk read/write head crash.
6. Physical problems: This refers to an endless list of problems that include
power or air conditioning failure, fire, theft, overwriting of disks or tapes by mistake,
and mounting of the wrong tape by the operator.
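Whatever the cause, a transaction that fails part-way must not leave partial changes behind. The Python/SQLite sketch below illustrates this for the third case above: in a hypothetical transfer between two made-up accounts, the second UPDATE violates a CHECK constraint (standing in for the failing operation), so the first UPDATE is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(amount):
    """Move amount from account A to account B as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = 'B'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


committed = transfer(30)  # A: 100 -> 70, B: 50 -> 80
aborted = transfer(500)   # second UPDATE fails the CHECK; the first is undone too
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(committed, aborted, balances)  # True False {'A': 70, 'B': 80}
```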
Whether a business is small, medium or large, it must have a well-written plan for
backing up its servers. Planning a backup strategy up-front and documenting not only the
backup process but also the restore process will save you a ton of time in the end. Because
data is valuable to the company and often sensitive, its classification must be
carefully considered in the planning stage. Based upon these classifications, the backup and
restore plan will need to be tested and adjusted. During the planning stage, data should be
ranked according to its sensitivity and value to the business.
With data that is highly valuable to a company, plans must include an increased backup
frequency because of the costs incurred in recapturing the data in case of a
disaster. Recoverability plans must also consider the availability requirements of this data.
With highly sensitive data, plans must include encryption of backups, especially when this
data is stored offsite.
With SQL Server, DBAs should also be concerned with the OS, the applications that the
server runs and, finally, the databases. In other words, the entire server needs a backup and
recovery plan. User databases are critical to the backup plan, but system databases, which
contain significant information such as users, SQL jobs and other system functionality, must
also be taken into account.
A database can become unusable because of hardware or software failure, or both. You may,
at one time or another, encounter storage problems, power interruptions, or application
failures, and each failure scenario requires a different recovery action. Protect your data
against the possibility of loss by having a well-rehearsed recovery strategy in place. Some of
the questions that you should answer when developing your recovery strategy are:
1. Will the database be recoverable?
2. How much time can be spent recovering the database?
3. How much time will pass between backup operations?
4. How much storage space can be allocated for backup copies and archived logs?
5. Will table space level backups be sufficient, or will full database backups be
necessary?
A database recovery strategy should ensure that all information is available when it is
required for database recovery. It should include a regular schedule for taking database
backups and, in the case of partitioned database systems, include backups when the system is
scaled (when database partition servers or nodes are added or dropped). Your overall strategy
should also include procedures for recovering command scripts, applications, user-defined
functions and stored procedure code.
The concept of a database backup is the same as any other data backup: taking a copy of the
data and then storing it on a different medium in case of failure or damage to the original.
The simplest case of a backup involves shutting down the database to ensure that no further
transactions occur, and then simply backing it up. You can then rebuild the database if it
becomes damaged or corrupted in some way.
The rebuilding of the database is called recovery. Version recovery is the restoration of a
previous version of the database, using an image that was created during a backup operation.
Roll forward recovery is the reapplication of transactions recorded in the database log files
after a database or a table space backup image has been restored.
Crash recovery is the automatic recovery of the database if a failure occurs before all of the
changes that are part of one or more units of work (transactions) are completed and
committed. This is done by rolling back incomplete transactions and completing committed
transactions that were still in memory when the crash occurred.
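The two passes of crash recovery can be sketched in Python. This is a simplified model, not DB2's actual algorithm: the log is a list of hypothetical WRITE records (with old and new values) and COMMIT records, the disk state after the crash may contain some uncommitted changes and miss some committed ones, and it is assumed that no transaction overwrites an item written by another transaction that has not yet committed.

```python
def crash_recovery(disk, log):
    """Recover after a crash: redo committed transactions, undo incomplete ones.

    disk: dict item -> value, as found on disk after the crash.
    log:  records ("WRITE", txn, item, old, new) and ("COMMIT", txn), in log order.
    """
    db = dict(disk)
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # Redo pass (forward): reapply every write of a committed transaction,
    # in case it was still only in memory when the crash happened.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, item, _old, new = rec
            db[item] = new

    # Undo pass (backward): restore old values written by incomplete transactions.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _, item, old, _new = rec
            db[item] = old
    return db


log = [
    ("WRITE", "T1", "A", 100, 70),
    ("WRITE", "T1", "B", 50, 80),
    ("COMMIT", "T1"),
    ("WRITE", "T2", "C", 10, 99),  # T2 never committed
]
# At the crash, A=70 had reached the disk, B=80 had not, and T2's C=99 had.
disk_after_crash = {"A": 70, "B": 50, "C": 99}

print(crash_recovery(disk_after_crash, log))  # {'A': 70, 'B': 80, 'C': 10}
```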
Recovery log files and the recovery history file are created automatically when a database is
created (Figure 1). These log files are important if you need to recover data that is lost or
damaged.
Each database includes recovery logs, which are used to recover from application or system
errors. In combination with the database backups, they are used to recover the consistency of
the database right up to the point in time when the error occurred.
The recovery history file contains a summary of the backup information that can be used to
determine recovery options, if all or part of the database must be recovered to a given point in
time. It is used to track recovery-related events such as backup and restore operations, among
others. This file is located in the database directory.
The table space change history file, which is also located in the database directory, contains
information that can be used to determine which log files are required for the recovery of a
particular table space.
1.5.3 Structure of Recovery
You cannot directly modify the recovery history file or the table space change history file;
however, you can delete entries from the files using the PRUNE HISTORY command. You
can also use the rec_his_retentn database configuration parameter to specify the number of
days that these history files will be retained.
Figure 1. Database recovery files
Data that is easily recreated can be stored in a non-recoverable database. This includes data
from an outside source that is used for read-only applications, and tables that are not often
updated, for which the small amount of logging does not justify the added complexity of
managing log files and rolling forward after a restore operation. Non-recoverable databases
have the logarchmeth1 and logarchmeth2 database configuration parameters set to "OFF".
This means that the only logs that are kept are those required for crash recovery. These logs
are known as active logs, and they contain current transaction data. Version recovery using
offline backups is the primary means of recovery for a non-recoverable database. (An offline
backup means that no other application can use the database when the backup operation is in
progress.) Such a database can only be restored offline. It is restored to the state it was in
when the backup image was taken and roll forward recovery is not supported.
Data that cannot be easily recreated should be stored in a recoverable database. This includes
data whose source is destroyed after the data is loaded, data that is manually entered into
tables, and data that is modified by application programs or users after it is loaded into the
database. Recoverable databases have the logarchmeth1 or logarchmeth2 database
configuration parameters set to a value other than "OFF". Active logs are still available for
crash recovery, but you also have the archived logs, which contain committed transaction
data. Such a database can only be restored offline. It is restored to the state it was in when the
backup image was taken. However, with roll forward recovery, you can roll the database
forward (that is, past the time when the backup image was taken) by using the active and
archived logs to either a specific point in time, or to the end of the active logs.
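Restoring a backup image and then rolling forward through the logs, optionally stopping at a chosen point in time, can be modelled as follows. This Python sketch is a simplification with made-up data: each log entry is assumed to hold the changes of one committed transaction together with its commit time (written as an HHMM integer), whereas real DB2 logs are far more detailed.

```python
def roll_forward(backup_image, log, until=None):
    """Restore backup_image, then reapply committed transactions from the log.

    log:   list of (commit_time, changes) in log order; changes maps item -> new value.
    until: stop after the last transaction committed at or before this time;
           None means roll forward to the end of the logs.
    """
    db = dict(backup_image)  # version recovery: the state when the image was taken
    for commit_time, changes in log:
        if until is not None and commit_time > until:
            break
        db.update(changes)
    return db


backup_image = {"A": 100, "B": 50, "C": 10}  # image taken at 08:00
log = [
    (900, {"A": 70, "B": 80}),
    (1000, {"C": 5}),
    (1100, {"A": 0}),  # e.g. an erroneous update we want to exclude
]

print(roll_forward(backup_image, log, until=1000))  # {'A': 70, 'B': 80, 'C': 5}
```

With until=None all three transactions are reapplied (A becomes 0), and with an empty log the result is simply the backup image, which corresponds to version recovery without roll forward.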
Recoverable database backup operations can be performed either offline or online (online
meaning that other applications can connect to the database during the backup operation).
Online table space restore and roll forward operations are supported only if the database is
recoverable. If the database is non-recoverable, database restore and roll forward operations
must be performed offline. During an online backup operation, roll forward recovery ensures
that all table changes are captured and reapplied if that backup is restored.
If you have a recoverable database, you can back up, restore, and roll individual table spaces
forward, rather than the entire database. When you back up a table space online, it is still
available for use, and simultaneous updates are recorded in the logs. When you perform an
online restore or roll forward operation on a table space, the table space itself is not available
for use until the operation completes, but users are not prevented from accessing tables in
other table spaces.
1.5.5 Automated Backup Operations
Since it can be time-consuming to determine whether and when to run maintenance activities,
such as backup operations, you can use the Configure Automatic Maintenance wizard to do
this for you. With automatic maintenance, you specify your maintenance objectives,
including when automatic maintenance can run. DB2 then uses these objectives to determine
if the maintenance activities need to be done and then runs only the required maintenance
activities during the next available maintenance window (a user-defined time period for the
running of automatic maintenance activities).
Note: You can still perform manual backup operations when automatic maintenance is
configured. DB2 will only perform automatic backup operations if they are required.
1.6 Database Security
Database security protects the database against access by unauthorized persons, whether to a
certain part of the database or to the whole database. Database security is a very broad area
that addresses many issues.
Types of Security:
1. In a database management system, some information may be deemed private, so that it
cannot be accessed by any unauthorized user.
2. Database security involves policies, at the governmental, institutional and corporate
levels, about what kind of information should not be made publicly available.
3. System-related security.
4. An organization should maintain security at various levels and categorize both the data
and the users according to these levels. For example, super admin level users have the
authority to insert, read and write (modify or delete) data; admin level users can insert
and read data but cannot permanently delete or modify it; and general level users can
only read data from the database server.
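The three user levels in item 4 can be modelled as a simple permission table. In SQL-based systems such levels are normally expressed with `GRANT` and `REVOKE`; the sketch below only models the access decision. The level names and the `is_allowed` function are hypothetical, and "write" is assumed to mean update and delete.

```python
# Hypothetical user levels and the operations each may perform, following the
# example above ("write" is taken here to mean update and delete).
PERMISSIONS = {
    "super_admin": {"insert", "read", "update", "delete"},
    "admin": {"insert", "read"},
    "general": {"read"},
}


def is_allowed(level: str, operation: str) -> bool:
    """Return True if users at this level may perform the operation.
    Unknown levels get no permissions (deny by default)."""
    return operation in PERMISSIONS.get(level, set())


for level in ("super_admin", "admin", "general"):
    print(level, "can delete:", is_allowed(level, "delete"))
```

Denying unknown levels by default means a misspelled or unclassified user gets no access rather than full access.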
Goals of database security (protecting the database against the following threats):
1. Loss of integrity: integrity is lost if unauthorized changes are made to the data by
intentional or accidental acts. If the loss of system or data integrity is not corrected,
continued use of the corrupted data could result in inaccuracy, fraud or erroneous
decisions.
2. Loss of availability: availability is lost when the data or the system cannot be accessed
by users and programs that have a legitimate right to it.
3. Loss of confidentiality: database confidentiality refers to the protection of data from
unauthorized users. Unauthorized, unanticipated or unintentional disclosure could
result in loss of public confidence, embarrassment, or legal action against the
organization.
1.7 Check Your Progress
1.1 Learning Objectives
After going through this unit, the learner will be able to:
Describe the relational model and its advantages.
State the different integrity constraints.
Describe how data are organized in the form of tables.
1.2 Introduction
In the previous unit, we discussed the properties of basic and commercial data
models and the details of the Entity-Relationship model.
This unit introduces the concepts of the relational model. Most of the commercial
DBMS products available in the industry are relational at their core. In this unit we
will discuss the terminology, operators and operations used in the relational model. Certain
restrictions apply when formulating a relation (table); those restrictions are also discussed
in this unit.
1.3 Model Concept
A model in a database system defines the structure or organization of the data and a set of
operations on that data. The relational model is a simple model in which the database is
represented as a collection of "relations", where each relation is represented by a
two-dimensional table. Because of this simplicity, it is the most commonly used model in
practice. The following table represents a simple relation:
R_NO   S_NAME         ADDRESS              MARKS
10     Sanjib Kaur    Block-4, Noonmati    69
12     Padip Sen      Ganeshguri           75
15     Bipul Prasad   Bamunimaidan         58
Each row in a table is called a tuple and each column name is called an attribute. For
example, Figure 4.3.1 represents a STUDENT relation where R_NO (roll number), S_NAME
(student name), ADDRESS and MARKS are the attributes, and each row of values for these
attributes is a tuple of the relation STUDENT.
Domain:
A domain is the set of all possible values from which the values of a given column or
attribute are drawn. So, every attribute in a table has a specific domain, and values outside
that domain cannot be assigned to the attribute. For example, the domain of the attribute
S_NAME is the set of all alphabetic strings of finite length, and the domain of the MARKS
attribute contains no value greater than 100 for the relation STUDENT in Figure 4.1.
Relation:
The table with all its tuples and attributes is called a relation. It has three components:
the Name, represented by the title of the relation; the Degree, the number of columns in the
table; and the Cardinality, the number of rows in the table. For example, Figure 4.3.1
represents a relation named STUDENT of degree 4, because it has four attributes, and the
cardinality of this relation is 3 (the number of rows).
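As a small illustration, a relation can be modelled as a heading (its attribute names) plus a set of tuples; degree and cardinality are then just the sizes of the two. The names `STUDENT_ATTRIBUTES`, `STUDENT_TUPLES`, `degree` and `cardinality` are hypothetical choices for this sketch.

```python
# The STUDENT relation from the table above: a heading of attribute names plus a
# set of tuples. A set is used because a relation contains no duplicate tuples.
STUDENT_ATTRIBUTES = ("R_NO", "S_NAME", "ADDRESS", "MARKS")
STUDENT_TUPLES = {
    (10, "Sanjib Kaur", "Block-4, Noonmati", 69),
    (12, "Padip Sen", "Ganeshguri", 75),
    (15, "Bipul Prasad", "Bamunimaidan", 58),
}


def degree(attributes) -> int:
    """Degree = number of attributes (columns)."""
    return len(attributes)


def cardinality(tuples) -> int:
    """Cardinality = number of tuples (rows)."""
    return len(tuples)


print(degree(STUDENT_ATTRIBUTES), cardinality(STUDENT_TUPLES))  # 4 3
```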
1.3.2 Relational Schema and Instances
The term integrity refers to the accuracy or correctness of the data in the database.
Integrity constraints are specified on the database schema and are expected to hold on every
database instance of that schema. The relational model includes two general integrity
constraints. They are:
Entity Integrity Constraint states that no primary key value can be NULL. This is because
we use the primary key value to identify individual tuples in a relation. It ensures that
instances of the entities are distinguishable, i.e., each must have a unique identification of
some kind. Primary keys perform that unique identification function in a relational database.
Referential Integrity Constraint is specified between two relations and is used to maintain
consistency among the tuples of the two relations (which need not be distinct). It uses the
concept of a foreign key, which will be explained in more detail in the next unit. Informally,
it states that a tuple in one relation that refers to another relation must refer to an
existing tuple in that relation. Consider the following relations:
EMPLOYEE
ENO (p.k)   ENAME      DNO
101         Robert     10
102         Smith      12
103         Robindra   12
104         John       10

DEPARTMENT
DNO (p.k)   DNAME            LOCATION
10          Comp. Sc.        Jalukbari
12          Electronic Sc.   Guwahati
In the above figure, EMPLOYEE and DEPARTMENT are two relations whose primary keys are ENO
and DNO respectively. Here the attribute DNO of the EMPLOYEE table is a foreign key that
gives the number of the department in which each employee works. Hence its value in each
EMPLOYEE tuple must match the DNO value of some tuple in the DEPARTMENT relation.
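Both general integrity constraints can be tried out with Python's built-in `sqlite3` module, using SQLite as a stand-in RDBMS. Two SQLite-specific assumptions apply: foreign keys are checked only after `PRAGMA foreign_keys = ON`, and `NOT NULL` is declared explicitly because SQLite does not make most PRIMARY KEY columns NOT NULL automatically. The function `try_insert_employee` and the extra employee names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
# NOT NULL is spelled out because SQLite, unlike the SQL standard, does not
# make most PRIMARY KEY columns NOT NULL by itself.
conn.execute("""CREATE TABLE DEPARTMENT (
    DNO      INT NOT NULL PRIMARY KEY,
    DNAME    TEXT,
    LOCATION TEXT)""")
conn.execute("""CREATE TABLE EMPLOYEE (
    ENO   INT NOT NULL PRIMARY KEY,
    ENAME TEXT,
    DNO   INT REFERENCES DEPARTMENT (DNO))""")
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
                 [(10, "Comp. Sc.", "Jalukbari"), (12, "Electronic Sc.", "Guwahati")])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [(101, "Robert", 10), (102, "Smith", 12),
                  (103, "Robindra", 12), (104, "John", 10)])
conn.commit()


def try_insert_employee(eno, ename, dno) -> str:
    """Insert one EMPLOYEE tuple; return SQLite's message if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", (eno, ename, dno))
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert_employee(None, "Asha", 10))  # entity integrity: NULL primary key
print(try_insert_employee(101, "Ravi", 12))   # entity integrity: duplicate key
print(try_insert_employee(105, "Mina", 99))   # referential integrity: no department 99
print(try_insert_employee(106, "Arup", 10))   # valid tuple
```

The first three inserts are rejected by the DBMS itself, so no application code has to re-check the constraints.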
1.5 Domain Constraints
A domain constraint specifies that each attribute in a relation must contain only atomic
values drawn from the corresponding domain. The data types used for domains in commercial
RDBMSs include:
Standard numeric data types for integers
Real numbers
Characters
Fixed-length and variable-length strings
Thus, a domain constraint specifies the condition that we want to put on each instance of the
relation: the values that appear in each column must be drawn from the domain associated
with that column.
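A domain such as "marks between 0 and 100" can be declared in SQL with a `CHECK` clause. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in; because SQLite column types are only type affinities, it is the `CHECK` clause that actually restricts MARKS. The `add_student` function and the out-of-range student are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite column types are only type affinities, so the CHECK clause is what
# actually restricts MARKS to the domain 0..100.
conn.execute("""CREATE TABLE STUDENT (
    R_NO    INTEGER PRIMARY KEY,
    S_NAME  TEXT NOT NULL,
    ADDRESS TEXT,
    MARKS   INTEGER CHECK (MARKS BETWEEN 0 AND 100))""")


def add_student(r_no, s_name, address, marks) -> bool:
    """Insert one tuple; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
                         (r_no, s_name, address, marks))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_student(10, "Sanjib Kaur", "Block-4, Noonmati", 69))  # True
print(add_student(16, "Test Student", "Guwahati", 150))         # False: outside the domain
```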
E.F. Codd formulated twelve (12) rules for RDBMSs (published in 1985) to define more
rigorously the requirements that a single product must meet to be considered relational. In
reality, they do not all carry the same degree of importance, but an RDBMS that satisfies all
twelve rules can be expected to give good results. The rules are:
Rule 1: The information rule
All information is explicitly and logically represented in exactly one way: by data values in
tables. In simple terms, this means that if an item of data does not reside somewhere in a
table in the database, then it does not exist. This should be extended to the point where even
information such as table, view and column names, to mention just a few, is contained
somewhere in table form.
Rule 2: The rule of guaranteed access
Every item of data must be logically addressable by a combination of table name, primary key
value and column name. For a table-like storage structure, this rule says that at the
intersection of a column and a row one necessarily finds exactly one value of a data item, or
a null.
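The guaranteed-access rule can be pictured as a three-part address: table name, then primary key value, then column name. The nested-dictionary layout, the `DATABASE` name and the `access` function below are assumptions of this sketch, with `None` standing in for NULL; in SQL the same lookup would be a query such as `SELECT ENAME FROM EMPLOYEE WHERE ENO = 103`.

```python
# A toy database: table name -> {primary key value -> {column name -> value}}.
# None would stand in for NULL (an assumption of this sketch).
DATABASE = {
    "EMPLOYEE": {
        101: {"ENAME": "Robert", "DNO": 10},
        102: {"ENAME": "Smith", "DNO": 12},
        103: {"ENAME": "Robindra", "DNO": 12},
        104: {"ENAME": "John", "DNO": 10},
    },
    "DEPARTMENT": {
        10: {"DNAME": "Comp. Sc.", "LOCATION": "Jalukbari"},
        12: {"DNAME": "Electronic Sc.", "LOCATION": "Guwahati"},
    },
}


def access(table: str, pk, column: str):
    """Return the single value at (table, primary key, column); KeyError if absent."""
    return DATABASE[table][pk][column]


print(access("EMPLOYEE", 103, "ENAME"))       # Robindra
print(access("DEPARTMENT", 12, "LOCATION"))  # Guwahati
```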
Rule 3: Systematic treatment of null values
NULL values must be supported in the DBMS to represent missing and inapplicable
information. This support for null values must be consistent throughout the DBMS and
independent of data type.
Rule 4: Database description rule
The description of the database is held and maintained using the same logical structures used
to define the data, thus allowing users with appropriate authority to query such information in
the same way, and using the same languages, as they would any other data in the database. This
implies that there must be a data dictionary within the RDBMS that is constructed of tables
and/or views that can be examined using SQL. Therefore, a data dictionary is mandatory for an
RDBMS.
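In SQLite the catalogue is an ordinary table called `sqlite_master`, so it can be read with the same SQL used for user data; other products use different catalogue names (the SQL standard defines INFORMATION_SCHEMA views). The sketch uses Python's `sqlite3` module and is only an illustration of the idea, not a claim that SQLite satisfies all twelve rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (R_NO INTEGER PRIMARY KEY, S_NAME TEXT, "
             "ADDRESS TEXT, MARKS INTEGER)")

# The catalogue is itself a table and is queried with an ordinary SELECT.
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
# PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]

print(tables)   # [('STUDENT',)]
print(columns)  # ['R_NO', 'S_NAME', 'ADDRESS', 'MARKS']
```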
Rule 5: Comprehensive sub-language rule
There must be at least one language whose statements can be expressed as character strings
conforming to some well-defined syntax. In real terms, the RDBMS must be completely
manageable through its own extension of SQL.
Rule 6: View updating rule
All views that can be defined using combinations of base tables, and that are theoretically
updatable, must also be updatable by the system. This is quite a difficult rule to interpret:
with all sorts of aggregates and virtual columns, it is obviously not possible to update
through some views.
Rule 7: Insert and update rule
An RDBMS must do more than just retrieve relational data sets. It has to be capable of
inserting, updating and deleting data as a relational set.
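Set-level updating means that a single statement acts on every tuple satisfying a condition, without the application looping over rows. The sketch below uses SQLite through Python's `sqlite3` module; the "+5 grace marks" update is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (R_NO INTEGER PRIMARY KEY, S_NAME TEXT, "
             "ADDRESS TEXT, MARKS INTEGER)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (10, "Sanjib Kaur", "Block-4, Noonmati", 69),
    (12, "Padip Sen", "Ganeshguri", 75),
    (15, "Bipul Prasad", "Bamunimaidan", 58),
])

# One statement updates the whole set of qualifying tuples (hypothetical grace marks).
cur = conn.execute("UPDATE STUDENT SET MARKS = MARKS + 5 WHERE MARKS < 70")
updated = cur.rowcount
marks = [m for (m,) in conn.execute("SELECT MARKS FROM STUDENT ORDER BY R_NO")]
conn.commit()

print(updated, marks)  # 2 [74, 75, 63]
```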
Rule 8: Physical independence rule
User access to the database, via monitors and application programs, must remain logically
consistent whenever the storage representation or the access methods to the data are
changed. For example, if the DBA builds or drops an index on a table, any user should still
retrieve the same data from that table.
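The index example can be checked directly: the same query returns the same tuples before an index exists, while it exists, and after it is dropped. The sketch uses SQLite through Python's `sqlite3` module; the index name `idx_student_marks` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (R_NO INTEGER PRIMARY KEY, S_NAME TEXT, "
             "ADDRESS TEXT, MARKS INTEGER)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (10, "Sanjib Kaur", "Block-4, Noonmati", 69),
    (12, "Padip Sen", "Ganeshguri", 75),
    (15, "Bipul Prasad", "Bamunimaidan", 58),
])

QUERY = "SELECT R_NO, S_NAME FROM STUDENT WHERE MARKS > 60 ORDER BY R_NO"

before = conn.execute(QUERY).fetchall()
# The DBA changes only the access path (hypothetical index name).
conn.execute("CREATE INDEX idx_student_marks ON STUDENT (MARKS)")
with_index = conn.execute(QUERY).fetchall()
conn.execute("DROP INDEX idx_student_marks")
after = conn.execute(QUERY).fetchall()

print(before == with_index == after, before)  # True [(10, 'Sanjib Kaur'), (12, 'Padip Sen')]
```

The index may change how fast the query runs, but never what it returns.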
Rule 9: Logical data independence
Application programs must be independent of changes made to the base tables. This allows
many types of database design changes to be made dynamically, without users being aware of
them.
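One common way to obtain logical data independence is to let applications read through a view, so that changes to the base table do not reach them. The sketch below adds a column to the base table and shows the application's query result is unchanged. It uses SQLite through Python's `sqlite3` module; the view `STUDENT_RESULT`, the `EMAIL` column and `application_query` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (R_NO INTEGER PRIMARY KEY, S_NAME TEXT, MARKS INTEGER)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                 [(10, "Sanjib Kaur", 69), (12, "Padip Sen", 75)])
# The application reads through a view (hypothetical name), not the base table.
conn.execute("CREATE VIEW STUDENT_RESULT AS SELECT R_NO, S_NAME, MARKS FROM STUDENT")


def application_query():
    """The application's only dependency is the view's columns."""
    return conn.execute("SELECT S_NAME, MARKS FROM STUDENT_RESULT ORDER BY R_NO").fetchall()


before = application_query()
conn.execute("ALTER TABLE STUDENT ADD COLUMN EMAIL TEXT")  # design change to the base table
after = application_query()

print(before == after)  # True
```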
Rule 10: Integrity rule
The relational model includes two general integrity rules, which we have already discussed
in this unit. These integrity rules implicitly or explicitly define the set of consistent
database states, or changes of state, or both. Other integrity constraints can be specified
during database design.
Rule 11: Distribution rule
An RDBMS must have distribution independence. Thus, an RDBMS package must make it
possible for the database to be distributed across multiple computers, even when those
computers have heterogeneous hardware and operating-system platforms.
This is one of the most attractive aspects of RDBMSs: database systems built on the relational
framework are well suited to today's client/server database design.
Rule 12: No subversion rule
If an RDBMS supports a lower-level language that permits, for example, row-at-a-time
processing, then this language must not be able to bypass any integrity rules or constraints of
the relational language.
3.
(a) T (b) F (c) F (d) T (e) T
1.9 Possible Questions