RDBMS
B.Com II Year
Syllabus
Unit – I Evolution of Database Technology, File-Oriented system, Database system, Client Server
Platform, Database system in the organization: Database and Data sharing, Strategic
database planning, Management control, risks and cost of database: logical and physical
data representation.
Unit – II Database Development Life Cycle (DDLC). Principles of conceptual Database Design,
Objects, Specialization Generalization, Relationship, Cardinality, Attributes, Relational
data Model: Fundamental Concepts, Normalization process (1NF, 2NF, 3NF, BCNF, 4NF)
Transforming Conceptual Model to a Relational Model.
Unit – III Relational Algebra, Relational implementation with SQL, introduction, Data Definition
Language (DDL), Data Manipulation Language (DML), Data control Language (DCL),
Transaction control Language (TCL), Schema and Table Definition, SQL functions:
Mathematical functions, Group functions, view definition: Introduction command to
create VIEW.
Unit – IV Physical storage media, Disk performance factors, Data storage format, File organization
and addressing methods, Implementing and managing the database environment: Database
administration and control, DBA functions, goals, integrity, security and recovery.
Unit – V Introduction to SQL: Components of SQL (DDL, DML, Query Language, DCL, TCL, SCL etc.),
invoking SQL*Plus, the Oracle data types, two-dimensional matrix creation, insertion,
updation and deletion operations, the many faces of SELECT, creating tables
using a query, inserting data using a query, modifying the structure of tables, renaming
tables, dropping tables, dropping columns, logical operators, range searching, pattern
matching, use of alias, Oracle functions. Accessing data from multiple tables. Set
operations: Union, Intersect, Minus. Data constraints: I/O constraints, business rule
constraints. Grouping data. SAVEPOINT, ROLLBACK & COMMIT, creating user
accounts, granting permission, revoking permission.
B.Com IInd Year. Subject- Relational Database Management System
Unit 1
The Evolution :
Flat Files:
Earlier, punched-card technology was used to store data; later, files were used. Files, however, offer
few advantages and have several limitations.
Advantages:
Various access methods, e.g., sequential, indexed, random.
Limitations:
Requires extensive programming in a third-generation language such as COBOL or BASIC.
Separation and isolation: each program maintains its own set of data, so users of one program may
not be aware of data held or blocked by other programs that are being used elsewhere by another user.
Duplication of data: the same data is held by different programs, which wastes space and resources.
High maintenance costs, such as ensuring data consistency and controlling access.
Sharing granularity is very coarse.
Weak security.
[1968-1980] Era of Hierarchical Databases: The prominent hierarchical database model was IBM’s first
DBMS, called IMS (Information Management System).
In the mid-1960s, Rockwell collaborated with IBM to create the Information Management System (IMS).
IMS led the mainframe database market in the 1970s and early 1980s.
In this model, files are related in a parent/child manner, with each child file having at most one parent
file.
Advantages:
Efficient searching.
Less redundant data.
Data independence.
Database security and integrity.
Limitations:
Complex implementation.
Difficult to manage and lack of standards; can’t easily handle many-many relationships.
Lacks structural independence.
In the early 1960s, Charles Bachman developed the first DBMS, the Integrated Data Store (IDS), at
General Electric.
In the network data model, files are related as owners and members, similar to the hierarchical model
except that each member file can have more than one owner.
Advantages:
Ability to handle more relationship types.
Data integrity.
Data independence.
Limitations:
System complexity; difficult to design and maintain.
The prominent network database model was the CODASYL DBTG model, whereas IDMS was the most popular
network DBMS.
General Comparison:
Here is a glimpse of all those database models we have discussed till now.
Object-oriented DBMS (OODBMS):
Advantages:
Can efficiently manage a large number of different data types.
Objects with complex behaviors are easy to handle using inheritance, polymorphism etc.
Reduces the large number of relations by creating objects.
Limitations:
Switching an existing database to OODBMS requires an entire change from scratch.
An OODBMS is typically tied to a specific programming language and an API; this reduces its flexibility.
Ad-hoc queries are difficult to implement, as one cannot join two classes as one can join two
tables in RDBMS. Therefore, queries depend upon the design of the system.
Creates problems when deleting data in bulk.
Object-relational model:
Advantages:
Large storage capacity.
High access speed.
Limitations:
The architecture of the object-relational model is not appropriate for web applications.
1970: Ted Codd at IBM’s San Jose Lab proposed the relational model.
Two major projects started, and both were operational in the late 1970s:
INGRES at the University of California, Berkeley, which became commercial and was followed by
POSTGRES, which was incorporated into Informix.
System R at IBM’s San Jose Lab, which later evolved into DB2, one of the first DBMS
products based on the relational model. (Oracle produced a similar product just prior to DB2.)
1980s: Maturation of relational database technology; more relational DBMSs were developed
and the SQL standard was adopted by ISO and ANSI.
1990s: Incorporation of object-orientation in relational DBMSs; new application areas such as data
warehousing and OLAP, the web and the Internet; interest in text and multimedia; enterprise resource
planning (ERP) and manufacturing resource planning (MRP).
1992: Microsoft ships Access, a personal DBMS created as an element of Windows, which gradually
supplanted all other personal DBMS products.
1997: XML is applied to database processing, which addresses some long-standing database problems.
Major vendors begin to integrate XML into DBMS products.
Obviously, we cannot discuss all of the history material here, so if anyone wants to study more, here are
the names of the models proposed up till now:
A manual database system is also known as a document management system. A manual database is a
hard-file storage system that consists of paper records, folders and filing cabinets or storage boxes. A
quality manual database system makes it easy to retrieve documents and information when they are
needed. The Environmental Protection Agency recommends organizing a manual database
alphabetically, chronologically, numerically or with an alphanumeric system. Other tips include
minimizing record storage to necessary records, cleaning out documents frequently and using the most
obvious organizational method.
In practice, manual databases are file boxes full of paper records and folders. Manual databases are
still used in some smaller libraries and in places where a client register is needed, for example
hospitals. Using a manual database is more laborious than using an electronic database, because you
need to go through the records yourself.
Maintaining a manual database is also a lot more complicated. Manual databases do not update
automatically, so the records need to be cleaned out once in a while and all unnecessary records
thrown out. If the database holds a huge number of files, many of them may be useless, which can
encumber searching. Manual databases also need a lot more space. They are less convenient in many
ways, so you might wonder why they still exist at all. The reason is that it takes a lot of time, work
and money to computerize an entire filing system.
A File Management System is a DBMS that allows access to a single file or table at a time. In a file
system, data is stored directly in a set of files. It contains flat files that have no relation to other
files (when only one table is stored in a single file, that file is known as a flat file).
1) Data redundancy
In a computer system, the files are likely to be in different formats and the programs written in
different programming languages. Moreover, the same information may be duplicated in several files;
this duplication of data is known as data redundancy.
Example: The address and telephone number of a particular customer may appear in a file that consists
of savings account records and in a file that consists of checking account records.
2) Data inconsistency
The various copies of the same data may no longer agree, which means that different copies of the
same data may contain different information.
Example: A changed customer address may be reflected in savings account records but not elsewhere in
the system.
3) Difficulty in accessing data
In a file processing system it is very difficult to access the data in a specific way; each new task
requires a special application program to carry it out.
4) Data isolation
Because data are scattered in various files and files may be in different formats, writing new applications
program to retrieve the appropriate data is difficult.
5) Integrity problem
The data must satisfy certain consistency constraints. In a file processing system these constraints
are enforced by adding code to the application programs.
Example: The balance of a bank account may never fall below a prescribed amount.
6) Atomicity problem
A computer system, like any other mechanical or electrical device, is subject to failure. In many
applications, it is crucial that if a failure occurs, the data be restored to the consistent state that
existed prior to the failure.
7) Concurrent access anomalies
If two programs run concurrently, their access to the data must be supervised. But supervision is
difficult to provide because data is decentralized in a file processing system. In such an environment,
interacting updates may result in inconsistent data.
8) Security problems
Not every user of the database system should be able to access all the data.
[Figure: Disadvantages of file processing system: data redundancy, data inconsistency, difficulty in
accessing data, data isolation, integrity problem, atomicity problem, concurrent access anomalies,
security problems.]
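The redundancy and inconsistency problems listed above can be sketched in a few lines of Python. This is only an illustration: the two dictionaries stand in for the separate files of two application programs, and the customer record and function name are hypothetical.

```python
# Two application programs each keep their own copy of a customer's address
# (hypothetical records, for illustration only). Holding the same data twice
# is data redundancy.
savings_file = {"C001": {"name": "Asha", "address": "12 MG Road"}}
checking_file = {"C001": {"name": "Asha", "address": "12 MG Road"}}


def change_address_in_savings(customer_id, new_address):
    """Update only the savings file, as a savings-only program would."""
    savings_file[customer_id]["address"] = new_address


change_address_in_savings("C001", "45 Park Street")

# The two copies now disagree: this is data inconsistency.
inconsistent = (
    savings_file["C001"]["address"] != checking_file["C001"]["address"]
)
print(inconsistent)
```

Nothing in the file approach forces the second copy to be updated, which is why the checking file still holds the old address.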
1. Backup:
It is possible to take faster and automatic back-up of database stored in files of computer-based
systems.
Computer systems provide functionalities to serve this purpose. It is also possible to develop
specific application program for this purpose.
2. Compactness:
3. Data Retrieval:
Computer-based systems provide enhanced data retrieval techniques to retrieve data stored in
files in easy and efficient way.
4. Editing:
Specific application programs or editing software can be used for this purpose.
5. Remote Access:
So, to access data it is not necessary for a user to remain present at location where these data are
kept.
6. Sharing:
Data stored in files of computer-based systems can be shared among multiple users at the same
time.
A Database Management System (DBMS) is system software that allows users to efficiently
define, create, maintain and share databases.
Defining a database involves specifying the data types, structures and constraints of the data to
be stored in the database.
Creating a database involves storing the data on some storage medium that is controlled by
DBMS.
Maintaining a database involves updating the database whenever required to evolve and reflect
changes in the mini world and also generating reports for each change.
Sharing a database involves allowing multiple users to access the database.
DBMS also serves as an interface between the database and end users or application programs. It
provides controlled access to the data and ensures that data is consistent and correct by defining
rules on it.
An application program accesses the database by sending queries or requests for data to the
DBMS. A query causes some data to be retrieved from database.
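The four activities described above (defining, creating, maintaining and querying) can be tried out with a short sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for a full DBMS; the customer table and its rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a full DBMS (an assumption
# made only for this illustration).
conn = sqlite3.connect(":memory:")

# Defining: specify the data types, structure and constraints.
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)

# Creating: store the data on a medium controlled by the DBMS.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Indore"), (2, "Ravi", "Bhopal")],
)

# Maintaining: update the database to reflect a change in the mini world.
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Ujjain", 2))
conn.commit()

# Querying: the application program sends a request and gets data back.
rows = conn.execute("SELECT name, city FROM customer ORDER BY id").fetchall()
print(rows)
```

The application program never opens the underlying storage itself; every request goes through the DBMS.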
Data redundancy and inconsistency – Redundancy is the repetition of data, i.e. the same data item
may have more than a single copy. The file system cannot control redundancy of data, as
each user defines and maintains the files needed for a specific application. Two users may
therefore maintain files holding the same data for different applications. A change made by one
user is then not reflected in the files used by the second user, which leads to
inconsistency of data. A DBMS, in contrast, controls redundancy by maintaining a single repository
of data that is defined once and is accessed by many users. As there is little or no redundancy,
data remains consistent.
Data sharing – A file system does not allow sharing of data, or sharing is too complex. In a
DBMS, data can be shared easily because the system is centralized.
Data concurrency – Concurrent access to data means that more than one user is accessing the same
data at the same time. Anomalies occur when changes made by one user are lost because of
changes made by another user. A file system does not provide any procedure to prevent such
anomalies, whereas a DBMS provides a locking system to prevent them.
Data searching – For every search operation performed on a file system, a different application
program has to be written, while a DBMS provides built-in searching operations. The user only has
to write a small query to retrieve data from the database.
Data integrity – There may be cases when some constraints need to be applied to the data
before inserting it into the database. The file system does not provide any procedure to check these
constraints automatically, whereas a DBMS maintains data integrity by enforcing user-defined
constraints on the data itself.
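The data searching point can be illustrated with a small sketch that runs the same search in both styles. It assumes Python's csv and sqlite3 modules; the student data, table name and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical student data held as a flat file (CSV text).
flat_file = "roll,name,marks\n1,Asha,82\n2,Ravi,47\n3,Meena,91\n"


def search_file(min_marks):
    """File approach: a purpose-written program scans every record."""
    reader = csv.DictReader(io.StringIO(flat_file))
    return [row["name"] for row in reader if int(row["marks"]) >= min_marks]


# DBMS approach: load the same data once, then search with a short query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
for row in csv.DictReader(io.StringIO(flat_file)):
    conn.execute(
        "INSERT INTO student VALUES (?, ?, ?)",
        (int(row["roll"]), row["name"], int(row["marks"])),
    )


def search_db(min_marks):
    """DBMS approach: the search is expressed as a small query."""
    query = "SELECT name FROM student WHERE marks >= ? ORDER BY roll"
    return [name for (name,) in conn.execute(query, (min_marks,))]


from_file = search_file(80)
from_db = search_db(80)
print(from_file, from_db)
```

A different search on the flat file (say, by name) would need new scanning code, whereas with the DBMS only the text of the query changes.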
From the early flat-file systems to relational and object-relational systems, database technology has
gone through several generations, and its history is spread over more than 40 years now.
File Management System vs. Database Management System:
1. File system: A general, easy-to-use system to store general files which require less security and
   fewer constraints.
   DBMS: Used when security constraints are high.
2. File system: Data redundancy is more.
   DBMS: Data redundancy is less.
3. File system: Data inconsistency is more.
   DBMS: Data inconsistency is less.
4. File system: Centralization is hard to achieve.
   DBMS: Centralization is achieved.
5. File system: The user locates the physical address of the files to access data.
   DBMS: The user is unaware of the physical address where data is stored.
6. File system: Security is low.
   DBMS: Security is high.
7. File system: Stores unstructured data as isolated data files/entities.
   DBMS: Stores structured data which have well-defined constraints and interrelations.
Database Applications:
Many persons are involved in the design, use and maintenance of any database. These persons can be
classified into 2 types as below.
[Figure: People who deal with the database: Database Administrators, Database Designers, System
Analysts, Application Programmers, Operators and maintenance personnel.]
The people whose jobs involve the day-to-day use of a database are called 'actors on the scene'; they
are listed below.
The DBA is responsible for authorizing access to the database, for coordinating and monitoring its use
and for acquiring software and hardware resources as needed. DBAs are the people who design and
maintain the database on a daily basis.
The DBA is responsible for interacting with the users of the system to understand what data is to be
stored in the DBMS and how it is likely to be used. The DBA creates the original schema by writing a set
of definitions, which is permanently stored in the 'Data Dictionary'.
The DBA is responsible for ensuring that unauthorized data access is not permitted. The granting of
different types of authorization allows the DBA to regulate which parts of the database various users can
access.
The DBA creates appropriate storage structures and access methods by writing a set of definitions,
which are translated by the DDL compiler.
The DBA must take steps to ensure that if the system fails, users can continue to access as much of the
uncorrupted data as possible. The DBA also works to restore the data to a consistent state.
Database Tuning:
The DBA is responsible for modifying the database to ensure adequate Performance as requirements
change.
The integrity constraints are kept in a special system structure that is consulted by the database
system whenever an update takes place in the system.
2. Database Designers:
Database designers are responsible for identifying the data to be stored in the database and for choosing
appropriate structures to represent and store this data.
3. End Users:
End users are the people who wish to store and use data in a database. Their jobs require access
to the database for querying, updating and generating reports. The categories of end users are listed
below.
These people occasionally access the database, but they may need different information each time.
Their job function revolves around constantly querying and updating the database using standard types
of queries and updates.
These include engineers, scientists, business analysts and others who familiarize themselves with the
DBMS in order to implement applications that meet their complex requirements.
These people maintain personal databases by using ready-made program packages that provide easy to
use menu based interfaces.
4. System Analyst:
These people determine the requirements of end users and develop specifications for transactions.
These people can test, debug, document and maintain the specified transactions.
Workers behind the scene:
These are the people who design and implement the DBMS modules and interfaces as a software package.
2. Tool Developers:
These are the people who design and implement tools, that is, the software packages for database
design, performance monitoring, prototyping and test data generation.
These are the system administration personnel who are responsible for the actual running and
maintenance of the hardware and software environment for the database system.
Purpose of Database Management Systems
Organizations use large amounts of data. A database management system (DBMS) is a software tool
that makes it possible to organize data in a database.
The standard acronym for database management system is DBMS, so you will often see this instead of
the full name. The ultimate purpose of a database management system is to store and transform data
into information to support making decisions.
1. The physical database: the collection of files that contain the data
2. The database engine: the software that makes it possible to access and modify the contents of
the database
3. The database schema: the specification of the logical structure of the data stored in the
database
Functions of a DBMS
So, what does a DBMS really do? It organizes your files to give you more control over your data.
A DBMS makes it possible for users to create, edit and update data in database files. Once created, the
DBMS makes it possible to store and retrieve data from those database files.
Concurrency: concurrent access (meaning 'at the same time') to the same database by multiple
users
Backup and recovery: processes to back-up the data regularly and recover data if a problem
occurs
Integrity: database structure and rules improve the integrity of the data
Database administrators also control access and security aspects. For example, different people within
an organization use databases in different ways. Some employees may simply want to view the data and
perform basic analysis. Other employees are actively involved in adding data to the database or updating
existing data. This means that the database administrator needs to set the user permissions. You don't
want someone who only needs to view the database to accidentally delete parts of the database.
There are a number of characteristics that distinguish the database approach from the file-based system
or approach. This chapter describes the benefits (and features) of the database system.
A database system is referred to as self-describing because it not only contains the database itself, but
also metadata which defines and describes the data and relationships between tables in the database.
This information is used by the DBMS software or database users if needed. This separation of data and
information about the data makes a database system totally different from the traditional file-based
system in which the data definition is part of the application programs.
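The self-describing nature of a database can be seen by asking the DBMS for its own metadata. The sketch below assumes SQLite through Python's sqlite3 module, where the catalogue is exposed as the sqlite_master table and the PRAGMA table_info command; other DBMS products expose their catalogue differently. The account table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " holder TEXT NOT NULL,"
    " balance REAL)"
)

# The catalogue table sqlite_master lists the objects stored in the database.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")
]

print(tables)
print(columns)
```

The description of the account table comes from the database itself, not from the program that uses it.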
In the file-based system, the structure of the data files is defined in the application programs so if a user
wants to change the structure of a file, all the programs that access that file might need to be changed as
well.
On the other hand, in the database approach, the data structure is stored in the system catalogue and not
in the programs. Therefore, one change is all that is needed to change the structure of a file. This
insulation between the programs and data is also called program-data independence.
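A minimal sketch of program-data independence, assuming SQLite through Python's sqlite3 module: the function list_names plays the role of an existing application program, and the employee table and department column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee (id, name) VALUES (1, 'Asha')")


def list_names():
    """An existing 'application program' that names only the columns it needs."""
    query = "SELECT name FROM employee ORDER BY id"
    return [name for (name,) in conn.execute(query)]


before = list_names()

# One change to the stored structure; the program above is not edited.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
conn.execute(
    "INSERT INTO employee (id, name, department) VALUES (2, 'Ravi', 'Sales')"
)

after = list_names()
print(before, after)
```

The structure changed in one place (the catalogue), and the unchanged program keeps working against the new structure.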
A database supports multiple views of data. A view is a subset of the database, which is defined and
dedicated for particular users of the system. Multiple users in the system might have different views of
the system. Each view might contain only the data of interest to a user or group of users.
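As an illustration of a view, the sketch below defines a subset of a table for one group of users. It assumes SQLite through Python's sqlite3 module; the employee table, the sales_staff view and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Sales", 30000),
        (2, "Ravi", "Accounts", 28000),
        (3, "Meena", "Sales", 35000),
    ],
)

# A view for the sales desk: only Sales staff, and no salary column.
conn.execute(
    "CREATE VIEW sales_staff AS"
    " SELECT id, name FROM employee WHERE dept = 'Sales'"
)

sales_view = conn.execute(
    "SELECT id, name FROM sales_staff ORDER BY id"
).fetchall()
print(sales_view)
```

Users of sales_staff see only the rows and columns of interest to them, while the data is still stored once in the employee table.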
Current database systems are designed for multiple users. That is, they allow many users to access the
same database at the same time. This access is achieved through features called concurrency control
strategies. These strategies ensure that the data accessed are always correct and that data integrity is
maintained.
The design of modern multiuser database systems is a great improvement from those in the past which
restricted usage to one person at a time.
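The kind of anomaly that concurrency control is meant to prevent can be simulated on a single connection. This is only a sketch under stated assumptions: it uses Python's sqlite3 module, interleaves two hypothetical users by hand rather than with real parallel sessions, and does not show the locking that a real DBMS applies internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('pen', 100)")


def read_qty():
    """Return the current quantity of pens."""
    return conn.execute("SELECT qty FROM stock WHERE item = 'pen'").fetchone()[0]


# Uncontrolled interleaving: both users read 100 before either one writes.
seen_by_a = read_qty()
seen_by_b = read_qty()
conn.execute("UPDATE stock SET qty = ? WHERE item = 'pen'", (seen_by_a - 10,))
conn.execute("UPDATE stock SET qty = ? WHERE item = 'pen'", (seen_by_b - 5,))
lost_update_result = read_qty()  # user A's sale of 10 has been overwritten

# Controlled alternative: each change is a single statement applied in turn.
conn.execute("UPDATE stock SET qty = 100 WHERE item = 'pen'")
conn.execute("UPDATE stock SET qty = qty - 10 WHERE item = 'pen'")
conn.execute("UPDATE stock SET qty = qty - 5 WHERE item = 'pen'")
correct_result = read_qty()

print(lost_update_result, correct_result)
```

In the first case 15 pens were sold but the stock fell by only 5; a concurrency control strategy ensures that the second behaviour is what users get.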
In the database approach, ideally, each data item is stored in only one place in the database. In some
cases, data redundancy still exists to improve system performance, but such redundancy is controlled by
application programming and kept to a minimum by introducing as little redundancy as possible when
designing the database.
Data sharing
The integration of all the data, for an organization, within a database system has many advantages. First,
it allows for data sharing among employees and others who have access to the system. Second, it gives
users the ability to generate more information from a given amount of data than would be possible
without the integration.
Database management systems must provide the ability to define and enforce certain constraints to
ensure that users enter valid information and maintain data integrity. A database constraint is a
restriction or rule that dictates what can be entered or edited in a table such as a postal code using a
certain format or adding a valid city in the City field.
There are many types of database constraints. Data type, for example, determines the sort of data
permitted in a field, such as numbers only. Data uniqueness, such as the primary key, ensures that no
duplicates are entered. Constraints can be simple (field based) or complex (programming).
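The constraints described above can be sketched as follows, assuming SQLite through Python's sqlite3 module. The customer table, the six-character postal code rule and the helper try_insert are hypothetical choices made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"  # uniqueness: no duplicate ids
    " name TEXT NOT NULL,"  # a value must be supplied
    " postal_code TEXT CHECK (length(postal_code) = 6)"  # simple format rule
    ")"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha', '452001')")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_key = try_insert((1, "Ravi", "462001"))  # id 1 already exists
bad_postal_code = try_insert((2, "Ravi", "46"))  # wrong length
valid_row = try_insert((2, "Ravi", "462001"))

print(duplicate_key, bad_postal_code, valid_row)
```

The rules are declared once with the table, and the DBMS itself rejects any row that breaks them, whichever program tries to insert it.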
Not all users of a database system will have the same accessing privileges. For example, one user might
have read-only access (i.e., the ability to read a file but not make changes), while another might have read
and write privileges, which is the ability to both read and modify a file. For this reason, a database
management system should provide a security subsystem to create and control different types of user
accounts and restrict unauthorized access.
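The idea of different access privileges can be sketched in plain Python. This is not how any particular DBMS implements its security subsystem; the privilege table, the account names and the function is_allowed are all hypothetical.

```python
# Hypothetical privilege table: which operations each user account may perform.
privileges = {
    "clerk": {"read"},  # read-only access
    "manager": {"read", "write"},  # read and write privileges
}


def is_allowed(user, operation):
    """Return True only if the user account holds the requested privilege."""
    return operation in privileges.get(user, set())


clerk_can_read = is_allowed("clerk", "read")
clerk_can_write = is_allowed("clerk", "write")
manager_can_write = is_allowed("manager", "write")
unknown_can_read = is_allowed("guest", "read")  # no account, so no access

print(clerk_can_read, clerk_can_write, manager_can_write, unknown_can_read)
```

In a real DBMS the equivalent table is maintained by the database administrator, and the check is made by the system before every operation.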
Data independence
Another advantage of a database management system is how it allows for data independence. In other
words, the system data descriptions or data describing data (metadata) are separated from the
application programs. This is possible because changes to the data structure are handled by the
database management system and are not embedded in the program itself.
Transaction processing
A database management system must include concurrency control subsystems. This feature ensures that
data remains consistent and valid during transaction processing even if several users update the same
information.
By its very nature, a DBMS permits many users to have access to its database either individually or
simultaneously. Users do not need to be aware of how and where the data they access is stored.
Backup and recovery are methods that allow you to protect your data from loss. The database system
provides a separate process, from that of a network backup, for backing up and recovering data. If a hard
drive fails and the database stored on the hard drive is not accessible, the only way to recover the
database is from a backup.
If a computer system fails in the middle of a complex update process, the recovery subsystem is
responsible for making sure that the database is restored to its original state. These are two more
benefits of a database management system.
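Restoring the database to its original state after a failed update can be sketched with a transaction that is rolled back. The example assumes SQLite through Python's sqlite3 module; the account table, the CHECK rule and the transfer function are hypothetical, and a constraint violation stands in for a system failure in the middle of the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(1, 2, 200)  # both updates succeed and are committed
failed = transfer(1, 2, 1000)  # the debit breaks the CHECK, so the credit is undone
balances = conn.execute(
    "SELECT acc_no, balance FROM account ORDER BY acc_no"
).fetchall()
print(ok, failed, balances)
```

After the failed transfer the balances are exactly what they were before it started; the half-finished credit to account 2 does not survive.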
The client/server architecture significantly decreased network traffic by providing a query response
rather than total file transfer. It allows multi-user updating through a GUI front end to a shared
database. Remote Procedure Calls (RPCs) or Structured Query Language (SQL) statements are typically
used to communicate between the client and server.
Two-tier architecture is where a client talks directly to a server, with no intervening server. It is
typically used in small environments (fewer than 50 users).
In two tier client/server architectures, the user interface is placed at user's desktop environment and
the database management system services are usually in a server that is a more powerful machine that
provides services to the many clients. Information processing is split between the user system interface
environment and the database management server environment.
If the client nodes are increased beyond capacity in the structure, then the server is not able to
handle the request overflow and performance of the system degrades.
The three tier architecture is introduced to overcome the drawbacks of the two tier architecture. In the
three tier architecture, a middleware is used between the user system interface client environment and
the database management server environment.
The three tier structure provides much better service and fast performance.
The structure can be scaled according to requirements without any problem.
Data security is much improved in the three tier structure.
1) Combination of a client or front-end portion that interacts with the user, and a server or back-end
portion that interacts with the shared resource. The client process contains solution-specific logic and
provides the interface between the user and the rest of the application system. The server process acts
as a software engine that manages shared resources such as databases, printers, modems, or high
powered processors.
2) The front-end task and back-end task have fundamentally different requirements for computing
resources such as processor speeds, memory, disk speeds and capacities, and input/output devices.
3) The environment is typically heterogeneous and multivendor. The hardware platform and operating
system of client and server are not usually the same. Client and server processes communicate through a
well-defined set of standard application program interfaces (API's) and RPC's.
Basic Components
A client is any process that requests specific services from server processes.
A server is a process that provides requested services for clients.
Both clients and servers can reside in the same computer or in different computers connected by
a network.
Databases also allow organisations to work more effectively with their customers and suppliers. They
augment workers, allowing them to do their jobs better and faster. They have also created the digital
businesses we use every day, like Amazon and eBay.
Customer Relationship Management (CRM) systems allow organisations to build strong customer
profiles from the moment a customer becomes a lead (i.e. when a customer first contacts an
organisation). They allow for targeted marketing and better communication, and are also becoming more
connected with social media and other platforms that are commonly used for customer service and
marketing.
Supplier management has become much easier thanks to Supply Chain Management (SCM) systems.
These allow organisations to do the (previously) impossible. For example, they can fulfill orders made at
the last minute and automatically coordinate thousands of suppliers and logistics companies to ensure
products reach customers on time. SCM systems can be used to look at the feasibility of a customer
request and to ensure that enough of a product will exist at times of peak demand. Large-scale SCM is
always a challenge, and even companies like Nintendo and Apple cannot cope with the level of demand
their products attract, despite having state-of-the-art systems.
Traditionally, ERP, CRM and SCM systems have been the domain of multinationals with multi-million
dollar budgets. The startup culture of the last 20 years has spawned alternatives to the SAPs and Oracles
of the world. One of the best examples is Salesforce, a CRM that takes advantage of recent mobile and
cloud services and which offers their CRM system using the Software as a Service model (so the software
is cloud-based and delivered using apps and web browsers). This type of service is much cheaper than
traditional providers, and this means that even the smallest startup can afford to use and benefit from
having a customer database. Open source systems are also freely available (for example, SugarCRM) and
can be deployed with no upfront cost. However, support contracts and hiring of programmers and
administrators will never be free.
DATA SHARING:
The most significant difference between file-based systems and database systems is data sharing.
Data sharing also requires a major change in the way data are handled and managed within the
organization. Data sharing is of three types: sharing between functional areas, sharing between
different levels of users, and sharing between different locations.
The term data sharing suggests that people in different functional areas use a common pool of data,
each for their own applications. Without data sharing, the marketing group has its own data files, the
purchasing group has its own data files, the accounts group has its own data files, and so on; each
group benefits only from its own data.
In contrast, the effect of combining data into a database is synergistic; that is, the combined data
are more valuable than the sum of the data in separate files. Not only does each group continue to have
access to its own data but, within reasonable limits and controls, it has access to other data as well.
In this environment the marketing department, for example, is better off because it has access to data
from purchasing, especially product evolution, which provides valuable input for marketing campaigns.
Different levels of users also need to share data. The three different levels of users are
Operations level,
Middle management level,
Executive level.
These three levels correspond to three different types of system: transaction processing systems at
the operations level, management information systems (MIS) at the middle management level and
decision support systems (DSS) at the executive level.
These levels of users and systems naturally require three different types of data. Users at the
operational level need data for transaction processing; this includes data for new accounts or changes
to existing accounts. Middle management uses a management information system, which utilizes summaries
to indicate, for example, which sales representatives were most or least productive. Executives at the
highest level use a decision support system (DSS) to discover long-term trends that apply to their own
corporation as well as to identify the economic, social and political environment in which they operate.
The DSS helps them make decisions such as building a new factory or starting or dropping a
product line. It uses summary data from within the company as well as market and other data
from outside sources.
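The difference between operational data and management summaries can be sketched with a small example. This is only an illustration using Python's built-in sqlite3 module; the table name sale, its columns and the sample rows are assumptions made for the sketch, not part of the syllabus material.

```python
import sqlite3

# Hypothetical shared table of sales transactions (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, rep TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sale (rep, amount) VALUES (?, ?)",
    [("Asha", 1200.0), ("Ravi", 300.0), ("Asha", 800.0), ("Meena", 1500.0)],
)

# Operational level: work with one individual transaction.
one_sale = conn.execute(
    "SELECT rep, amount FROM sale WHERE sale_id = 2"
).fetchone()

# Middle management level: summarize the same rows per sales representative,
# most productive first.
summary = conn.execute(
    "SELECT rep, SUM(amount) AS total FROM sale GROUP BY rep ORDER BY total DESC"
).fetchall()

print(one_sale)
print(summary)
```

Both queries read the same shared table; only the level of detail differs.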
A company with several locations has important data distributed over a wide geographical area, and
sharing these data is a significant problem. A centralized database is physically confined to a single
location and controlled by a single computer. Most functions for which databases
are created are accomplished more easily if the database is centralized: it is easier to update,
back up, recover and control access to a database if we know exactly where it is located and what
software controls it.
Data must be viewed as a corporate resource, and other corporate resources must be devoted to the
development, implementation and use of one or more databases. Database planning is a strategic corporate
effort to determine the information needs of the organization for an extended period into the future. A
successful database planning project will precede operational projects to design and implement new
databases to satisfy the organization’s information needs.
The need for database planning: Database planning is directed by the information needs of the
organization, which in turn are determined by the company’s business plan. The process is shown in the
diagram below.
Business Plan → Information Needs → Database Plan → Database Development Projects
The corporation formulates its strategic business plan for the next 5 (five) years. Accomplishing the
objectives of this plan depends on the availability of certain identified types of information. The
information can be obtained only if the data sources are identified in the database plan or are already
in place. This indicates the need for database development projects, which create new databases or
enhance or integrate existing databases.
Developing a new database or maintaining an existing one under the database approach involves risk and
requires an investment of money, time and a suitable environment. The following points should be
considered.
The database approach causes some additional costs and risks that must be recognized and managed
when implementing this approach.
New, Specialized Personnel: Frequently, organizations that adopt the database approach need to hire
or train individuals to design and implement databases. This personnel increase seems to be expensive,
but an organization should not minimize the need for these specialized skills.
Installation and Management Cost and Complexity: A multi-user database management system is
large and complex software that has a high initial cost. It requires trained personnel to install and
operate, and also has annual maintenance costs. Installing such a system may also require upgrades to
the hardware and data communications systems in the organization.
Conversion Costs: The term “legacy systems” is used to refer to older applications in an organization
that are based on file processing. The cost of converting these older systems to modern database
technology may seem prohibitive to an organization.
Need for Explicit Backup and Recovery: A shared database must be accurate and available at all times.
This raises the need to have backup copies of data for restoring a database when damage occurs. A
modern database management system normally automates recovery tasks.
Organizational Conflict: A database requires an agreement on data definitions and ownership as well
as responsibilities for accurate data maintenance. Conflicts over data definitions, data formats and
coding, and the updating of shared data can arise. Handling these issues requires organizational
commitment to the database approach.
Data modeling
Data modeling is the process of creating a data model for the data to be stored in a Database. This data
model is a conceptual representation of
Data objects
The associations between different data objects
The rules.
Data modeling helps in the visual representation of data and enforces business rules, regulatory
compliance, and government policies on the data. Data models ensure consistency in naming
conventions, default values, semantics and security while ensuring the quality of the data.
A data model emphasizes what data is needed and how it should be organized rather than what
operations need to be performed on the data. A data model is like an architect's building plan: it helps
to build a conceptual model and set the relationships between data items.
A data model ensures that all data objects required by the database are accurately represented.
Omission of data will lead to the creation of faulty reports and produce incorrect results.
A data model helps design the database at the conceptual, physical and logical levels.
Data Model structure helps to define the relational tables, primary and foreign keys and stored
procedures.
It provides a clear picture of the base data and can be used by database developers to create a
physical database.
It is also helpful to identify missing and redundant data.
Though the initial creation of data model is labor and time consuming, in the long run, it makes
your IT infrastructure upgrade and maintenance cheaper and faster.
Types of Data Models
1. Conceptual: This Data Model defines WHAT the system contains. This model is typically created
by Business stakeholders and Data Architects. The purpose is to organize, scope and define
business concepts and rules.
2. Logical: Defines HOW the system should be implemented regardless of the DBMS. This model is
typically created by Data Architects and Business Analysts. The purpose is to develop a
technical map of rules and data structures.
3. Physical: This Data Model describes HOW the system will be implemented using a specific DBMS.
This model is typically created by DBAs and developers. The purpose is the actual
implementation of the database.
Conceptual Model
The main aim of this model is to establish the entities, their attributes, and their relationships. In this
Data modeling level, there is hardly any detail available of the actual Database structure.
For example:
Customer and Product are two entities. Customer number and name are attributes of the
Customer entity
Product name and price are attributes of product entity
Sale is the relationship between the customer and product
Logical Model
Logical data models add further information to the conceptual model elements. They define the structure
of the data elements and set the relationships between them.
The advantage of the Logical data model is to provide a foundation to form the base for the Physical
model. However, the modeling structure remains generic.
At this Data Modeling level, no primary or secondary key is defined. At this Data modeling level, you
need to verify and adjust the connector details that were set earlier for relationships.
Describes data needs for a single project but could integrate with other logical data models
based on the scope of the project.
Designed and developed independently from the DBMS.
Data attributes will have datatypes with exact precisions and length.
Physical Model
A Physical Data Model describes the database-specific implementation of the data model. It offers an
abstraction of the database and helps generate the schema. This is because of the richness of meta-data
offered by a Physical Data Model.
This type of data model also helps to visualize the database structure. It helps to model database
columns, keys, constraints, indexes, triggers, and other RDBMS features.
The physical data model describes the data needs of a single project or application, though it may be
integrated with other physical data models based on project scope.
The data model contains relationships between tables, which address the cardinality and
nullability of the relationships.
Developed for a specific version of a DBMS, location, data storage or technology to be used in the
project.
Columns should have exact datatypes, lengths assigned and default values.
Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.
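As a rough illustration, the conceptual example given earlier (Customer, Product and the Sale relationship) could be carried down to a physical model as follows. This is a minimal sketch using Python's sqlite3 module; the column names, data types, default value and index name are assumptions chosen for the example, and a different DBMS would use its own types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Physical level: exact column types, defaults, keys and an index are all spelled out.
conn.executescript("""
CREATE TABLE customer (
    customer_number INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE product (
    product_name TEXT PRIMARY KEY,
    price        NUMERIC NOT NULL DEFAULT 0
);
CREATE TABLE sale (
    sale_id         INTEGER PRIMARY KEY,
    customer_number INTEGER NOT NULL REFERENCES customer(customer_number),
    product_name    TEXT    NOT NULL REFERENCES product(product_name)
);
CREATE INDEX idx_sale_customer ON sale(customer_number);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The entities and the relationship are the same as in the conceptual model; what the physical model adds is the DBMS-specific detail.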
Advantages and Disadvantages of Data Model:
The main goal of designing a data model is to make certain that data objects offered by the
functional team are represented accurately.
The data model should be detailed enough to be used for building the physical database.
The information in the data model can be used for defining the relationship between tables,
primary and foreign keys, and stored procedures.
A data model helps the business to communicate within and across organizations.
A data model helps to document data mappings in the ETL process.
It helps to recognize the correct sources of data to populate the model.
Disadvantage: to develop a data model, one should know the characteristics of the physically stored data.
Unit II
A core aspect of software engineering is the subdivision of the development process into a series of
phases, or steps, each of which focuses on one aspect of the development. The collection of these steps is
sometimes referred to as the software development life cycle (SDLC). The software product moves
through this life cycle (sometimes repeatedly as it is refined or redeveloped) until it is finally retired
from use. Ideally, each phase in the life cycle can be checked for correctness before moving on to the
next phase.
Let us start with an overview of the waterfall model such as you will find in most software engineering
textbooks. This waterfall figure, seen in Figure 13.1, illustrates a general waterfall model that could
apply to any computer system development. It shows the process as a strict sequence of steps where the
output of one step is the input to the next and all of one step has to be completed before moving onto the
next.
Figure 13.1: Waterfall model.
We can use the waterfall process as a means of identifying the tasks that are required, together with the
input and output for each activity. What is important is the scope of the activities, which can be
summarized as follows:
Establishing requirements involves consultation with, and agreement among, stakeholders about
what they want from a system, expressed as a statement of requirements.
Analysis starts by considering the statement of requirements and finishes by producing a system
specification. The specification is a formal representation of what a system should do, expressed
in terms that are independent of how it may be realized.
Design begins with a system specification, produces design documents and provides a detailed
description of how a system should be constructed.
Testing compares the implemented system against the design documents and requirements
specification and produces an acceptance report or, more usually, a list of errors and bugs that
require a review of the analysis, design and implementation processes to correct (testing is
usually the task that leads to the waterfall model iterating through the life cycle).
We can use the waterfall cycle as the basis for a model of database development that incorporates three
assumptions:
1. We can separate the development of a database – that is, specification and creation of a schema
to define data in a database – from the user processes that make use of the database.
2. We can use the three-schema architecture as a basis for distinguishing the activities associated
with a schema.
3. We can represent the constraints to enforce the semantics of the data once within a database,
rather than within every user process that uses the data.
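The third assumption can be illustrated with a small sketch. Here a rule ("salary must be positive") is declared once in the table definition, and any program that inserts data is subject to it. The example uses Python's sqlite3 module; the employee table, the rule itself and the helper function add_employee are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (salary > 0)   -- the rule is stated once, here
    )
""")

def add_employee(name, salary):
    """Stand-in for any user process; it does not re-implement the salary rule."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (name, salary) VALUES (?, ?)", (name, salary)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the database itself rejected the row

accepted = add_employee("Asha", 25000)
rejected = add_employee("Ravi", -10)
print(accepted, rejected)
```

Because the check lives in the schema, a second or third application using the same table would be held to the same rule without any extra code.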
Figure 13.2: A waterfall model of the activities and their outputs for database development.
Using these assumptions and Figure 13.2, we can see that this diagram represents a model of the
activities and their outputs for database development. It is applicable to any class of DBMS, not just a
relational approach.
Requirements Gathering
The first step is requirements gathering. During this step, the database designers have to interview the
customers (database users) to understand the proposed system and obtain and document the data and
functional requirements. The result of this step is a document that includes the detailed requirements
provided by the users.
Establishing requirements involves consultation with, and agreement among, all the users as to what
persistent data they want to store along with an agreement as to the meaning and interpretation of the
data elements. The data administrator plays a key role in this process as they overview the business,
legal and ethical issues within the organization that impact on the data requirements.
The data requirements document is used to confirm the understanding of requirements with users. To
make sure that it is easily understood, it should not be overly formal or highly encoded. The document
should give a concise summary of all users’ requirements – not just a collection of individuals’
requirements – as the intention is to develop a single shared database.
The requirements should not describe how the data is to be processed, but rather what the data items
are, what attributes they have, what constraints apply and the relationships that hold between the data
items.
Analysis
Data analysis begins with the statement of data requirements and then produces a conceptual data
model. The aim of analysis is to obtain a detailed description of the data that will suit user requirements
so that both high and low level properties of data and their use are dealt with. These include properties
such as the possible range of values that can be permitted for attributes (e.g., in the school database
example, the student course code, course title and credit points).
The conceptual data model provides a shared, formal representation of what is being communicated
between clients and developers during database development – it is focused on the data in a database,
irrespective of the eventual use of that data in user processes or implementation of the data in specific
computer environments. Therefore, a conceptual data model is concerned with the meaning and
structure of data, but not with the details affecting how they are implemented.
The conceptual data model then is a formal representation of what data a database should contain and
the constraints the data must satisfy. This should be expressed in terms that are independent of how the
model may be implemented. As a result, analysis focuses on the questions, “What is required?” not “How
is it achieved?”
Logical Design
Database design starts with a conceptual data model and produces a specification of a logical schema;
this will determine the specific type of database system (network, relational, object-oriented) that is
required. The relational representation is still independent of any specific DBMS; it is another
conceptual data model.
We can use a relational representation of the conceptual data model as input to the logical design
process. The output of this stage is a detailed relational specification, the logical schema, of all the tables
and constraints needed to satisfy the description of the data in the conceptual data model. It is during
this design activity that choices are made as to which tables are most appropriate for representing the
data in a database. These choices must take into account various design criteria including, for example,
flexibility for change, control of duplication and how best to represent the constraints. It is the tables
defined by the logical schema that determine what data are stored and how they may be manipulated in
the database.
Database designers familiar with relational databases and SQL might be tempted to go directly to
implementation after they have produced a conceptual data model. However, such a direct
transformation of the relational representation to SQL tables does not necessarily result in a database
that has all the desirable properties: completeness, integrity, flexibility, efficiency and usability. A good
conceptual data model is an essential first step towards a database with these properties, but that does
not mean that the direct transformation to SQL tables automatically produces a good database. This first
step will accurately represent the tables and constraints needed to satisfy the conceptual data model
description, and so will satisfy the completeness and integrity requirements, but it may be inflexible or
offer poor usability. The first design is then flexed to improve the quality of the database
design. Flexing is a term that is intended to capture the simultaneous ideas of bending something for a
different purpose and weakening aspects of it as it is bent.
The accompanying figure summarizes the iterative (repeated) steps involved in database design, based on the overview given. Its
main purpose is to distinguish the general issue of what tables should be used from the detailed
definition of the constituent parts of each table – these tables are considered one at a time, although they
are not independent of each other. Each iteration that involves a revision of the tables would lead to a
new design; collectively they are usually referred to as second-cut designs, even if the process iterates for
more than a single loop.
First, for a given conceptual data model, it is not necessary that all the user requirements it represents
be satisfied by a single database. There can be various reasons for the development of more than one
database, such as the need for independent operation in different locations or departmental control over
“their” data. However, if the collection of databases contains duplicated data and users need to access
data in more than one database, then there are possible reasons that one database can satisfy multiple
requirements, or issues related to data replication and distribution need to be examined.
Second, one of the assumptions about database development is that we can separate the development of
a database from the development of user processes that make use of it. This is based on the expectation
that, once a database has been implemented, all data required by currently identified user processes
have been defined and can be accessed; but we also require flexibility to allow us to meet future
requirements changes. In developing a database for some applications, it may be possible to predict the
common requests that will be presented to the database and so we can optimize our design for the most
common requests.
Third, at a detailed level, many aspects of database design and implementation depend on the particular
DBMS being used. If the choice of DBMS is fixed or made prior to the design task, that choice can be used
to determine design criteria rather than waiting until implementation. That is, it is possible to
incorporate design decisions for a specific DBMS rather than produce a generic design and then tailor it
to the DBMS during implementation.
It is not uncommon to find that a single design cannot simultaneously satisfy all the properties of a good
database. So it is important that the designer has prioritized these properties (usually using information
from the requirements specification); for example, to decide if integrity is more important than
efficiency and whether usability is more important than flexibility in a given development.
At the end of our design stage, the logical schema will be specified by SQL data definition language (DDL)
statements, which describe the database that needs to be implemented to meet the user requirements.
Implementation
In practice, implementation of the logical schema in a given DBMS requires a very detailed knowledge of
the specific features and facilities that the DBMS has to offer. In an ideal world, and in keeping with good
software engineering practice, the first stage of implementation would involve matching the design
requirements with the best available implementing tools and then using those tools for the
implementation. In database terms, this might involve choosing vendor products with DBMS and SQL
variants most suited to the database we need to implement. However, we don’t live in an ideal world and
more often than not, hardware choice and decisions regarding the DBMS will have been made well in
advance of consideration of the database design. Consequently, implementation can involve additional
flexing of the design to overcome any software or hardware limitations.
After the logical design has been created, we need our database to be created according to the
definitions we have produced. For an implementation with a relational DBMS, this will probably involve
the use of SQL to create tables and constraints that satisfy the logical schema description and the choice
of appropriate storage schema (if the DBMS permits that level of control).
One way to achieve this is to write the appropriate SQL DDL statements into a file that can be executed
by a DBMS so that there is an independent record, a text file, of the SQL statements defining the
database. Another method is to work interactively using a database tool like SQL Server Management
Studio or Microsoft Access. Whatever mechanism is used to implement the logical schema, the result is
that a database, with tables and constraints, is defined but will contain no data for the user processes.
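The first approach, keeping the DDL in a text file and executing it, might look like the sketch below. It uses Python's sqlite3 module in place of a full DBMS; the file name schema.sql and the two tables are assumptions made for the example.

```python
import os
import sqlite3
import tempfile

ddl = """
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)
);
"""

# Keep the DDL in a text file as an independent record of the schema.
script_path = os.path.join(tempfile.mkdtemp(), "schema.sql")
with open(script_path, "w", encoding="utf-8") as f:
    f.write(ddl)

# Executing the file creates the tables and constraints.
conn = sqlite3.connect(":memory:")
with open(script_path, encoding="utf-8") as f:
    conn.executescript(f.read())

# The database is now defined but still holds no data.
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(row_count)
```

The same file can be kept under version control and re-run to rebuild an empty copy of the database.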
After a database has been created, there are two ways of populating the tables – either from existing
data or through the use of the user applications developed for the database.
For some tables, there may be existing data from another database or data files. For example, in
establishing a database for a hospital, you would expect that there are already some records of all the
staff that have to be included in the database. Data might also be brought in from an outside agency
(address lists are frequently brought in from external companies) or produced during a large data entry
task (converting hard-copy manual records into computer files can be done by a data entry agency). In
such situations, the simplest approach to populate the database is to use the import and export facilities
found in the DBMS.
Facilities to import and export data in various standard formats are usually available (these functions
are also known in some systems as loading and unloading data). Importing enables a file of data to be
copied directly into a table. When data are held in a file format that is not appropriate for using the
import function, then it is necessary to prepare an application program that reads in the old data,
transforms them as necessary and then inserts them into the database using SQL code specifically
produced for that purpose. The transfer of large quantities of existing data into a database is referred to
as a bulk load. Bulk loading of data may involve very large quantities of data being loaded, one table at a
time so you may find that there are DBMS facilities to postpone constraint checking until the end of the
bulk loading.
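A bulk load with postponed constraint checking can be sketched as follows. This example assumes SQLite (through Python's sqlite3 module), where a foreign key declared DEFERRABLE INITIALLY DEFERRED is checked only when the transaction commits; other DBMSs offer their own facilities. The CSV contents and table names are made up for the illustration.

```python
import csv
import io
import sqlite3

# Hypothetical exported data; in practice this would be read from files.
staff_csv = io.StringIO("emp_id,name,dept_id\n1,Asha,10\n2,Ravi,20\n")
dept_csv = io.StringIO("dept_id,name\n10,Accounts\n20,Purchasing\n")

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions managed by hand
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE staff (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
                DEFERRABLE INITIALLY DEFERRED
    )
""")

conn.execute("BEGIN")
# One table at a time: staff is loaded before the departments it refers to exist.
staff_rows = [(int(r["emp_id"]), r["name"], int(r["dept_id"]))
              for r in csv.DictReader(staff_csv)]
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", staff_rows)
dept_rows = [(int(r["dept_id"]), r["name"]) for r in csv.DictReader(dept_csv)]
conn.executemany("INSERT INTO department VALUES (?, ?)", dept_rows)
conn.execute("COMMIT")  # the deferred foreign key is checked at this point

loaded = conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0]
print(loaded)
```

Without the deferred option, the staff rows would be rejected as soon as they were inserted, because their departments had not yet been loaded.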
Note: These are general guidelines that will assist in developing a strong basis for the actual database
design (the logical model).
2. Document all attributes that belong to each entity. Select candidate and primary keys. Ensure
that all non-key attributes for each entity are full-functionally dependent on the primary key.
3. Develop an initial ER diagram and review it with appropriate personnel. (Remember that this is
an iterative process.)
4. Create new entities (tables) for multivalued attributes and repeating groups. Incorporate these
new entities (tables) in the ER diagram. Review with appropriate personnel.
The relational data model was introduced by E. F. Codd in 1970. Currently, it is the most widely used
data model.
Relational databases are accessed through the standard database access language called structured query
language (SQL).
The relational data model describes the world as “a collection of inter-related relations (or tables).”
Relation
A relation, also known as a table or file, is a subset of the Cartesian product of a list of domains
characterized by a name. Within a table, each row represents a group of related data values. A row,
or record, is also known as a tuple. A column in a table is a field and is also referred to as an attribute.
You can also think of it this way: an attribute is used to define the record and a record contains a set of
attributes.
The steps below outline the logic between a relation and its domains.
1. Let D1, D2, …, Dn be n domains.
2. Let r be a relation defined on these domains.
3. Then r ⊆ D1×D2×…×Dn
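The statement r ⊆ D1×D2×…×Dn can be checked on a tiny example. The sketch below uses two small domains invented for the purpose and Python's itertools.product to build the Cartesian product.

```python
from itertools import product

# Two small, hypothetical domains.
D1 = {"Asha", "Ravi"}                    # domain of First Name
D2 = {"Married", "Single", "Divorced"}   # domain of Marital Status

# The Cartesian product holds every possible (name, status) pair.
cartesian = set(product(D1, D2))

# A relation r holds only the tuples that are actually recorded.
r = {("Asha", "Married"), ("Ravi", "Single")}

print(len(cartesian))    # 2 * 3 possible tuples
print(r <= cartesian)    # True when r is a subset of D1 x D2
```

Every row of a table is therefore one choice of value from each column's domain, and the table as a whole is one selection of such rows.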
Table
A database is composed of multiple tables and each table holds the data. Figure 7.1 shows a database
that contains three tables.
Column
A database stores pieces of information or facts in an organized way. Understanding how to use and get
the most out of databases requires us to understand that method of organization.
The principal storage units are called columns or fields or attributes. These house the basic components
of data into which your content can be broken down. When deciding which fields to create, you need to
think generically about your information, for example, drawing out the common components of the
information that you will store in the database and avoiding the specifics that distinguish one item from
another.
Look at the example of an ID card in Figure 7.2 to see the relationship between fields and their data.
Domain
A domain is the original set of atomic values used to model data. By atomic value, we mean that each
value in the domain is indivisible as far as the relational model is concerned. For example:
The domain of Marital Status is a set of possibilities: {Married, Single, Divorced}.
The domain of Shift is the set of all possible days: {Mon, Tue, Wed…}.
The domain of Salary is the set of all floating-point numbers greater than 0 and less than
200,000.
The domain of First Name is the set of character strings that represents names of people.
In summary, a domain is a set of acceptable values that a column is allowed to contain. This is based on
various properties and the data type for the column.
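A domain can be thought of as a membership test on candidate values. The following sketch expresses two of the domains above as Python functions; the function names are hypothetical, and a real DBMS would express the same idea through data types and constraints.

```python
MARITAL_STATUS = {"Married", "Single", "Divorced"}

def in_marital_status_domain(value):
    """True if value is one of the listed possibilities."""
    return value in MARITAL_STATUS

def in_salary_domain(value):
    """True if value is a number greater than 0 and less than 200,000."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0 < value < 200_000

print(in_marital_status_domain("Single"))   # inside the domain
print(in_marital_status_domain("Widowed"))  # outside the domain as listed
print(in_salary_domain(45000.0))
print(in_salary_domain(250000.0))
```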
Records
Just as the content of any one document or item needs to be broken down into its constituent bits of data
for storage in the fields, the link between them also needs to be available so that they can be
reconstituted into their whole form. Records allow us to do this. Records contain fields that are related,
such as a customer or an employee. As noted earlier, a tuple is another term used for record.
Records and fields form the basis of all databases. A simple table gives us the clearest picture of how
records and fields work together in a database storage project.
The simple table example in Figure shows us how fields can hold a range of different sorts of data. This
one has:
An Author field: this is displayed as Initial. Surname; its data type is text.
You can command the database to sift through its data and organize it in a particular way. For example,
you can request that a selection of records be limited by date: (1) all before a given date, (2) all after a
given date or (3) all between two given dates. Similarly, you can choose to have records sorted by date.
Because the field, or record, containing the data is set up as a Date field, the database reads the
information in the Date field not just as numbers separated by slashes, but rather, as dates that must be
ordered according to a calendar system.
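The effect of treating a value as a date rather than as text can be seen in a short sketch. The sample values and the day/month/year format are assumptions made for the example.

```python
from datetime import date, datetime

raw = ["15/03/2021", "02/11/2020", "28/01/2021"]   # day/month/year stored as text

# Sorted as text: characters are compared, so calendar order is lost.
as_text = sorted(raw)

# Sorted as dates: values are ordered according to the calendar.
as_dates = sorted(datetime.strptime(s, "%d/%m/%Y").date() for s in raw)

# Range search: all records between two given dates.
start, end = date(2021, 1, 1), date(2021, 12, 31)
between = [d for d in as_dates if start <= d <= end]

print(as_text)
print(as_dates)
print(between)
```

In the text ordering, 15/03/2021 comes before 28/01/2021 even though January precedes March; the date ordering puts them the right way round.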
Degree
The degree is the number of attributes in a table. In our example in Figure 7.3, the degree is 4.
Properties of a Table
A table has a name that is distinct from all other tables in the database.
Entries in columns are atomic. The table does not contain repeating groups or multivalued
attributes.
Entries from columns are from the same domain based on their data type including:
o character (string)
o date
The entity relationship (ER) data model has existed for over 35 years. It is well suited to data modelling
for use with databases because it is fairly abstract and is easy to discuss and explain. ER models are
readily translated to relations. ER models, also called an ER schema, are represented by ER diagrams.
Here is an example of how these two concepts might be combined in an ER data model: Prof. Ba
(entity) teaches (relationship) the Database Systems course (entity).
For the rest of this chapter, we will use a sample database called the COMPANY database to illustrate the
concepts of the ER model. This database contains information about employees, departments and
projects. Important points to note include:
There are several departments in the company. Each department has a unique identification, a
name, location of the office and a particular employee who manages the department.
A department controls a number of projects, each of which has a unique name, a unique number
and a budget.
Each employee has a name, identification number, address, salary and birthdate. An employee is
assigned to one department but can join in several projects. We need to record the start date of
the employee in each project. We also need to know the direct supervisor of each employee.
We want to keep track of the dependents for each employee. Each dependent has a name,
birthdate and relationship with the employee.
An entity is an object in the real world with an independent existence that can be differentiated from
other objects. An entity might be
Entities can be classified based on their strength. An entity is considered weak if its tables are existence
dependent.
Its primary key is derived from the primary key of the parent entity
o The Spouse table, in the COMPANY database, is a weak entity because its primary key is
dependent on the Employee table. Without a corresponding employee record, the spouse
record would not exist.
An entity is considered strong if it can exist apart from all of its related entities.
A table without a foreign key or a table that contains a foreign key that can contain nulls is a
strong entity
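Existence dependence can be demonstrated with the Spouse example. In this sketch (Python's sqlite3 module, with column names assumed for illustration), a spouse row cannot be inserted without a matching employee, and deleting the employee removes the spouse row as well. The ON DELETE CASCADE choice is one possible design, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    eid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Weak entity: its primary key includes the parent's key (eid).
CREATE TABLE spouse (
    eid  INTEGER NOT NULL REFERENCES employee(eid) ON DELETE CASCADE,
    name TEXT NOT NULL,
    PRIMARY KEY (eid, name)
);
INSERT INTO employee VALUES (1, 'Asha');
INSERT INTO spouse   VALUES (1, 'Vikram');
""")

def try_orphan():
    """Attempt to store a spouse for an employee that does not exist."""
    try:
        conn.execute("INSERT INTO spouse VALUES (99, 'Nobody')")
        return True
    except sqlite3.IntegrityError:
        return False

orphan_allowed = try_orphan()

# Removing the employee removes the dependent spouse record too.
conn.execute("DELETE FROM employee WHERE eid = 1")
spouses_left = conn.execute("SELECT COUNT(*) FROM spouse").fetchone()[0]
print(orphan_allowed, spouses_left)
```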
Another term to know is entity type which defines a collection of similar entities.
An entity set is a collection of entities of an entity type at a particular point of time. In an entity
relationship diagram (ERD), an entity type is represented by a name in a box. For example, in Figure 8.1,
the entity type is EMPLOYEE.
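Existence dependency
An entity is existence dependent if it cannot exist without a related entity, as with the Spouse table
described above.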
Kinds of Entities
You should also be familiar with different kinds of entities including independent entities, dependent
entities and characteristic entities. These are described below.
Independent entities
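Independent entities, also referred to as kernels, are the backbone of the database. They are what other
tables are based on. Kernels have the following characteristics: their primary key may be simple or
composite, their primary key is not a foreign key, and they do not depend on another entity for their
existence.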
If we refer back to our COMPANY database, examples of an independent entity include the Customer
table, Employee table or Product table.
Dependent entities
Dependent entities, also referred to as derived entities, depend on other tables for their meaning. These
entities have the following characteristics:
Many to many relationships become associative tables with at least two foreign keys.
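As a rough sketch of an associative table, the example below links employees and projects, which the COMPANY description says are related many to many with a start date per assignment. The table name WorksOn, the column names and the sample rows are assumptions made for illustration, and SQLite (through Python's sqlite3 module) stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee (EID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Project  (PNo INTEGER PRIMARY KEY, PName TEXT);
    -- Associative table: one row per (employee, project) pair,
    -- with two foreign keys that together form the primary key.
    CREATE TABLE WorksOn (
        EID       INTEGER NOT NULL REFERENCES Employee(EID),
        PNo       INTEGER NOT NULL REFERENCES Project(PNo),
        StartDate TEXT,
        PRIMARY KEY (EID, PNo)
    );
    INSERT INTO Employee VALUES (1, 'John'), (2, 'Asha');
    INSERT INTO Project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO WorksOn  VALUES (1, 10, '2024-01-15'),
                                (1, 20, '2024-03-01'),
                                (2, 10, '2024-02-10');
""")

# Who works on which project?
rows = conn.execute("""
    SELECT e.Name, p.PName
    FROM WorksOn w
    JOIN Employee e ON e.EID = w.EID
    JOIN Project  p ON p.PNo = w.PNo
    ORDER BY e.Name, p.PName
""").fetchall()
```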
Characteristic entities
Characteristic entities provide more information about another table. The primary key of a characteristic
entity is chosen in one of two ways:
1. Use a composite of the foreign key plus a qualifying column.
2. Create a new simple primary key.
In the COMPANY database, these might include:
Employee (EID, Name, Address, Age, Salary) – EID is the simple primary key.
EmployeePhone (EID, Phone) – EID is part of a composite primary key. Here, EID
is also a foreign key.
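The Employee and EmployeePhone pair above might be declared as follows. This is a minimal sketch using SQLite through Python's sqlite3 module; the reduced column list and the phone numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE EmployeePhone (
        EID   INTEGER NOT NULL REFERENCES Employee(EID),
        Phone TEXT    NOT NULL,
        PRIMARY KEY (EID, Phone)   -- composite key: FK plus qualifying column
    );
    INSERT INTO Employee VALUES (1, 'John');
    INSERT INTO EmployeePhone VALUES (1, '555-0101'), (1, '555-0102');
""")

# One employee, several phone numbers.
phones = [p for (p,) in conn.execute(
    "SELECT Phone FROM EmployeePhone WHERE EID = 1 ORDER BY Phone")]
```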
Attributes
Each entity is described by a set of attributes (e.g., Employee = (Name, Address, Birthdate (Age), Salary)).
Each attribute has a name, and is associated with an entity and a domain of legal values. However, the
information about attribute domain is not presented on the ERD.
In the entity relationship diagram, shown in Figure 8.2, each attribute is represented by an oval with a
name inside.
Types of Attributes
There are a few types of attributes you need to be familiar with. Some of these are to be left as is, but
some need to be adjusted to facilitate representation in the relational model. This first section will
discuss the types of attributes. Later on we will discuss fixing the attributes to fit correctly into the
relational model.
Simple attributes
Simple attributes are those drawn from the atomic value domains; they are also called single-valued
attributes. In the COMPANY database, an example of this would be: Name = {John} ; Age = {23}
Composite attributes
Composite attributes are those that consist of a hierarchy of attributes. Using our database example, and
shown in Figure 8.3, Address may consist of Number, Street and Suburb. So this would be written as →
Address = {59 + ‘Meek Street’ + ‘Kingsford’}
Multivalued attributes
Multivalued attributes are attributes that have a set of values for each entity. An example of a
multivalued attribute from the COMPANY database, as seen in Figure 8.4, is the set of degrees of an
employee: BSc, MIT, PhD.
Derived attributes
Derived attributes are attributes that contain values calculated from other attributes. An example of this
can be seen in Figure 8.5. Age can be derived from the attribute Birthdate. In this situation, Birthdate
is called a stored attribute, which is physically saved to the database.
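A derived attribute is computed when needed rather than stored. The sketch below derives Age from Birthdate; the function name age_on is a hypothetical helper, not something defined in these notes.

```python
from datetime import date

def age_on(birthdate: date, today: date) -> int:
    """Derive Age (in whole years) from the stored attribute Birthdate."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)
```

For example, age_on(date(2000, 6, 15), date(2023, 6, 14)) gives 22, and one day later it gives 23. Storing only Birthdate avoids having to update a stored Age every year.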
Keys
An important constraint on an entity is the key. The key is an attribute or a group of attributes whose
values can be used to uniquely identify an individual entity in an entity set.
Types of Keys
Candidate key
A candidate key is a simple or composite key that is unique and minimal. It is unique because no two
rows in a table may have the same value at any time. It is minimal because every column is necessary in
order to attain uniqueness.
From our COMPANY database example, if the entity is Employee(EID, First Name, Last Name, SIN,
Address, Phone, BirthDate, Salary, DepartmentID), possible candidate keys are:
EID, SIN
First Name and Last Name – assuming there is no one else in the company with the same name
Last Name and DepartmentID – assuming two people with the same last name don’t work in the
same department
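The "unique and minimal" test can be tried out on sample rows. The sketch below is only an aid to understanding: the three employee rows are invented, and uniqueness in a small sample does not prove that an attribute set is a key for all future data.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, attributes):
    """Return the minimal unique attribute sets for THIS sample of rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip supersets of keys already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical sample data
employees = [
    {"EID": 1, "FirstName": "John", "LastName": "Smith", "DepartmentID": 10},
    {"EID": 2, "FirstName": "Mary", "LastName": "Smith", "DepartmentID": 20},
    {"EID": 3, "FirstName": "John", "LastName": "Chan",  "DepartmentID": 10},
]
keys = candidate_keys(employees, ["EID", "FirstName", "LastName", "DepartmentID"])
```

With this sample, keys holds EID alone, FirstName with LastName, and LastName with DepartmentID, which matches the candidate keys listed above.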
Composite key
A composite key is made up of two or more attributes, but it must still be minimal.
Using the example from the candidate key section, possible composite keys are:
First Name and Last Name – assuming there is no one else in the company with the same name
Last Name and Department ID – assuming two people with the same last name don’t work in the
same department
Primary key
The primary key is a candidate key that is selected by the database designer to be used as an identifying
mechanism for the whole entity set. It must uniquely identify tuples in a table and not be null. The
primary key is indicated in the ER model by underlining the attribute.
In the example below, EID is the primary key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
Secondary key
A secondary key is an attribute used strictly for retrieval purposes (can be composite), for example:
Phone and Last Name.
Alternate key
Alternate keys are all candidate keys not chosen as the primary key.
Foreign key
A foreign key (FK) is an attribute in a table that references the primary key in another table, or it can be
null. Both foreign and primary keys must be of the same data type. In the example below, DepartmentID
is the foreign key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
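The foreign key rule (match a primary key in the other table, or be null) can be seen in action in the sketch below. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle; the Department table, the shortened Employee column list and the helper insert_employee are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DName TEXT);
    CREATE TABLE Employee (
        EID          INTEGER PRIMARY KEY,
        LastName     TEXT,
        DepartmentID INTEGER REFERENCES Department(DepartmentID)  -- nullable FK
    );
    INSERT INTO Department VALUES (10, 'Sales');
""")

def insert_employee(eid, last_name, dept_id):
    """Return True if the row is accepted, False if the FK check rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                     (eid, last_name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    insert_employee(1, "Smith", 10),    # matches an existing department
    insert_employee(2, "Chan", None),   # a null foreign key is allowed
    insert_employee(3, "Dobbs", 99),    # there is no department 99
]
```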
Nulls
A null is a special symbol, independent of data type, which means either unknown or inapplicable. It
does not mean zero or blank. Features of null include:
No data entry
Can represent an unknown value or a value that does not apply
Can create problems when functions such as COUNT, AVERAGE and SUM are used
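The effect of nulls on group functions is easy to miss, so here is a small sketch using SQLite through Python's sqlite3 module. The table and salaries are made up, and note that the SQL function for an average is spelled AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EID INTEGER PRIMARY KEY, Salary REAL);
    INSERT INTO Employee VALUES (1, 1000), (2, NULL), (3, 3000);
""")

# COUNT(*) counts rows; COUNT(Salary), SUM and AVG skip the null salary.
count_all, count_salary, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary) FROM Employee"
).fetchone()
```

Here there are 3 rows but only 2 salaries are counted, and the average is 2000 because it is taken over the two known salaries, not over all three employees.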
Normalization
Normalization should be part of the database design process. However, it is difficult to separate the
normalization process from the ER modelling process so the two techniques should be used
concurrently.
Use an entity relationship diagram (ERD) to provide the big picture, or macro view, of an organization’s data
requirements and operations. This is created through an iterative process that involves identifying
relevant entities, their attributes and their relationships.
The normalization procedure focuses on the characteristics of specific entities and represents the micro
view of entities within the ERD.
What Is Normalization?
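Normalization is the branch of relational theory that provides design insights. It is the process of
determining how much redundancy exists in a table. The goals of normalization are to characterize the
level of redundancy in a relational schema and to provide mechanisms for transforming schemas so that
redundancy is removed.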
Normalization theory draws heavily on the theory of functional dependencies. Normalization theory
defines six normal forms (NF). Each normal form involves a set of dependency properties that a schema
must satisfy and each normal form gives guarantees about the presence and/or absence of update
anomalies. This means that higher normal forms have less redundancy, and as a result, fewer update
problems.
Normal Forms
All the tables in any database can be in one of the normal forms we will discuss next. Ideally we only
want minimal redundancy for PK to FK. Everything else should be derived from other tables. There are
six normal forms, but we will only look at the first four, which are first normal form (1NF), second
normal form (2NF), third normal form (3NF) and Boyce-Codd normal form (BCNF).
In the first normal form, only single values are permitted at the intersection of each row and column;
hence, there are no repeating groups.
To normalize a relation that contains a repeating group, remove the repeating group and form two new
relations.
The PK of the new relation is a combination of the PK of the original relation plus an attribute from the
newly created relation for unique identification.
We will use the Student_Grade_Report table below, from a School database, as our example to explain
the process for 1NF.
In the Student Grade Report table, the repeating group is the course information. A student can
take many courses.
Remove the repeating group. In this case, it’s the course information for each student.
The PK must uniquely identify the attribute value (StudentNo and CourseNo).
After removing all the attributes related to the course and student, you are left with the student
course table (StudentCourse).
The Student table (Student) is now in first normal form with the repeating group removed.
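The 1NF steps above can be sketched in a few lines of Python. The Student_Grade_Report table itself is not reproduced in these notes, so the attribute names other than StudentNo and CourseNo, and all of the sample values, are assumptions.

```python
# Hypothetical unnormalized records: each student carries a repeating group.
report = [
    {"StudentNo": 1, "StudentName": "Asha",
     "Courses": [{"CourseNo": "C1", "CourseName": "Database", "Grade": "A"},
                 {"CourseNo": "C2", "CourseName": "Accounting", "Grade": "B"}]},
    {"StudentNo": 2, "StudentName": "Ravi",
     "Courses": [{"CourseNo": "C1", "CourseName": "Database", "Grade": "C"}]},
]

def to_first_normal_form(records):
    """Remove the repeating group and form two relations."""
    student = []
    student_course = []
    for rec in records:
        student.append((rec["StudentNo"], rec["StudentName"]))
        for c in rec["Courses"]:
            # PK of the new relation: (StudentNo, CourseNo)
            student_course.append(
                (rec["StudentNo"], c["CourseNo"], c["CourseName"], c["Grade"]))
    return student, student_course

student, student_course = to_first_normal_form(report)
```

Every row in both results now holds a single value in each column, and each StudentCourse row is identified by StudentNo together with CourseNo.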
For the second normal form, the relation must first be in 1NF. A relation in 1NF is automatically in 2NF
if its PK comprises a single attribute.
If the relation has a composite PK, then each non-key attribute must be fully dependent on the entire PK
and not on a subset of the PK (i.e., there must be no partial dependency or augmentation).
When examining the Student Course table, we see that not all the attributes are fully dependent
on the PK; specifically, all course information. The only attribute that is fully dependent is grade.
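Removing the partial dependency might look like the sketch below. It assumes a StudentCourse relation with the columns (StudentNo, CourseNo, CourseName, Grade); CourseName and the sample rows are invented for illustration.

```python
# StudentCourse rows in 1NF: (StudentNo, CourseNo, CourseName, Grade).
student_course = [
    (1, "C1", "Database",   "A"),
    (1, "C2", "Accounting", "B"),
    (2, "C1", "Database",   "C"),
]

# CourseName depends only on CourseNo (part of the PK), so it moves
# to its own Course relation, stored once per course.
course = sorted({(course_no, name) for _, course_no, name, _ in student_course})

# What remains depends on the whole key (StudentNo, CourseNo).
student_course_2nf = [(s, c, grade) for s, c, _, grade in student_course]
```

After the split, the name "Database" is stored once in course instead of once per enrolled student.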
To be in third normal form, the relation must be in second normal form. Also all transitive dependencies
must be removed; a non-key attribute may not be functionally dependent on another non-key attribute.
Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have
a transitive relationship.
Check new table(s) as well as table(s) modified to make sure that each table has a determinant
and that no table contains inappropriate dependencies.
At this stage, there should be no anomalies in third normal form. Let’s look at the dependency diagram
(Figure 12.1) for this example. The first step is to remove repeating groups, as discussed above.
To recap the normalization process for the School database, review the dependencies shown in Figure.
When a table has more than one candidate key, anomalies may result even though the relation is in
3NF. Boyce-Codd normal form is a special case of 3NF. A relation is in BCNF if, and only if, every
determinant is a candidate key.
BCNF Example 1
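The semantic rules (business rules applied to the database) for this table, St_Maj_Adv (Student_id, Major,
Advisor), are:
For each major, a given student has only one advisor.
Each advisor advises only one major.
The functional dependencies for this table are listed below. The first one is a candidate key; the second is
not.
1. Student_id, Major → Advisor
2. Advisor → Major
Anomalies for this table include:
1. Delete – deleting a student can delete advisor information
2. Insert – a new advisor cannot be recorded without a student
3. Update – inconsistencies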
To reduce the St_Maj_Adv relation to BCNF, you create two new tables:
St_Adv table
Student_id Advisor
111 Smith
111 Chan
320 Dobbs
671 White
803 Smith
Adv_Maj table
Advisor Major
Smith Physics
Chan Music
Dobbs Math
White Physics
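To check that nothing is lost by the split, the two tables above can be joined back on Advisor. The sketch below uses the rows exactly as listed; holding Adv_Maj in a Python dict is an illustrative choice that makes Advisor the key, so each advisor has exactly one major.

```python
st_adv = [(111, "Smith"), (111, "Chan"), (320, "Dobbs"),
          (671, "White"), (803, "Smith")]
adv_maj = {"Smith": "Physics", "Chan": "Music",
           "Dobbs": "Math", "White": "Physics"}

# Joining on Advisor rebuilds (Student_id, Major, Advisor) rows.
st_maj_adv = [(sid, adv_maj[adv], adv) for sid, adv in st_adv]
```

Each advisor's major is now stored in one place only, so changing it cannot leave inconsistent copies.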
BCNF Example 2
A relation is in BCNF if, and only if, every determinant is a candidate key. We need to create a table that
incorporates the first three FDs (Client_Interview2 table) and another table (StaffRoom table) for the
fourth FD.
Client_Interview2 table
StaffRoom table
During the normalization process of database design, make sure that proposed entities meet the required
normal form before table structures are created. Many real-world databases have been improperly
designed, or have become burdened with anomalies after being improperly modified over time. You may be
asked to redesign and modify existing databases. This can be a large undertaking if the tables are not
properly normalized.
Functional dependency
In a given table, an attribute Y is said to have a functional dependency on a set of
attributes X (written X → Y) if and only if each X value is associated with precisely one Y value.
For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of
Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold. It follows from
the previous two sentences that each {Employee ID} is associated with precisely one {Employee Date of
Birth}.
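The definition can be turned into a simple check over sample rows. This is a sketch only: the helper fd_holds and the sample data are assumptions, and a dependency that holds in a sample is not guaranteed to hold for all data.

```python
def fd_holds(rows, lhs, rhs):
    """True if every lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for this X; any different Y breaks X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample data
employees = [
    {"EmployeeID": 1, "DateOfBirth": "1990-05-01"},
    {"EmployeeID": 2, "DateOfBirth": "1990-05-01"},
    {"EmployeeID": 3, "DateOfBirth": "1985-11-23"},
]

id_determines_birth = fd_holds(employees, ["EmployeeID"], ["DateOfBirth"])
birth_determines_id = fd_holds(employees, ["DateOfBirth"], ["EmployeeID"])
```

The first check succeeds, but the reverse fails because two employees share a date of birth, which shows that a functional dependency has a direction.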
Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue
of X→Y and Y→Z.
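One way to see a transitive dependency is to compute everything a set of attributes determines by applying the known dependencies repeatedly. The sketch below does this; the function name closure and the attributes EID, DepartmentID and DeptLocation are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical FDs: EID -> DepartmentID and DepartmentID -> DeptLocation
fds = [(("EID",), ("DepartmentID",)), (("DepartmentID",), ("DeptLocation",))]

from_eid = closure({"EID"}, fds)
```

DeptLocation appears in from_eid only because EID gives DepartmentID, which in turn gives DeptLocation. That indirect link is the transitive dependency.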
Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a
table implies the presence of certain other rows.
Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables
each having a subset of the attributes of T.
Modification Anomalies
Once our E-R model has been converted into relations, we may find that some relations are not properly
specified. There can be a number of problems:
Deletion Anomaly: Deleting one fact or data point from a relation results in other information
being lost.
Insertion Anomaly: Inserting a new fact or tuple into a relation requires we have information
from two or more entities – this situation might not be feasible.
Update Anomaly: Updating one fact in a relation requires us to update multiple tuples.
Here is a quick example to illustrate these anomalies: A company has a Purchase Order form:
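The form itself is not reproduced in these notes. As a rough stand-in, the sketch below assumes the form is stored as one flat relation; the column names, vendors and cities are all invented for illustration.

```python
# Hypothetical flat relation: (PONumber, Item, Qty, VendorName, VendorCity)
purchase_orders = [
    (101, "Paper", 10, "Acme",   "Indore"),
    (102, "Ink",    5, "Acme",   "Indore"),
    (103, "Pens",  20, "Zenith", "Bhopal"),
]

# Update anomaly: changing Acme's city means touching every Acme row.
rows_to_update = sum(1 for po in purchase_orders if po[3] == "Acme")

# Deletion anomaly: removing PO 103 also removes the only record of Zenith.
remaining = [po for po in purchase_orders if po[0] != 103]
vendors_after_delete = {po[3] for po in remaining}

# Insertion anomaly: a new vendor with no order yet has no PONumber,
# so it cannot be stored without inventing a purchase order.
```

Keeping vendors in a separate relation, referenced from the purchase orders by a foreign key, would avoid all three problems.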
Normalization Process
Relations can fall into one or more categories (or classes) called Normal Forms
Normal Form: A class of relations free from a certain set of modification anomalies.
Normal forms are given names such as:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
Fourth normal form (4NF)
Fifth normal form (5NF)
Domain-Key normal form (DK/NF)
These forms are cumulative. A relation in Third normal form is also in 2NF and 1NF.
The Normalization Process for a given relation consists of:
a. Specify the Key of the relation
b. Specify the functional dependencies of the relation.
Sample data (tuples) for the relation can assist with this step.
c. Apply the definition of each normal form (starting with 1NF).
d. If a relation fails to meet the definition of a normal form, change the relation (most often
by splitting the relation into two new relations) until it meets the definition.
e. Re-test the modified/new relations to ensure they meet the definitions of each normal
form.
In the next set of notes, each of the normal forms will be defined along with an example of the
normalization steps.