Chapter 1
Today, databases are essential to every business. They are used to maintain
internal records, to present data to customers and clients on the World Wide
Web, and to support many other commercial processes. Databases are likewise
found at the core of many modern organizations.
The power of databases comes from a body of knowledge and technology that
has developed over several decades and is embodied in specialized software
called a database management system, or DBMS. A DBMS is a powerful tool for
creating and managing large amounts of data efficiently and allowing it to
persist over long periods of time, safely. These systems are among the most
complex types of software available.
Data management has passed through different levels of development along with
advances in technology and services. These levels are best described in three
categories. Even though each new level brings an advantage and overcomes a
problem of the previous one, all three methods of data handling are still in
use to some extent. The three major levels are:
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach
In the manual approach, data storage and retrieval follow the primitive and
traditional way of information handling, where cards and paper are used for the
purpose. Data storage and retrieval are performed using human labour.
Files, one for each event and object the organization deals with, are used to
store information.
Each file, containing various kinds of information, is labelled and
stored in one or more cabinets.
The cabinets can be kept in safe places for security purposes, based on the
sensitivity of the information they contain.
Insertion and retrieval are done by searching first for the right cabinet, then
for the right file, and then for the information.
One could have an indexing system to facilitate access to the data.
2. Traditional File Based Approach
After computers were introduced to the business community for data processing,
the need to use them for data storage and processing increased. There were, and
still are, several computer applications with file based processing used for
data handling. Even though the approach has evolved over time, the basic
structure is still similar if not identical.
File based systems were an early attempt to computerize the manual filing
system.
This approach is the decentralized computerized data handling method.
A collection of application programs performs services for the end users. In
such systems, every application program that provides a service to end
users defines and manages its own data.
Such systems have a number of programs for each of the different
applications in the organization.
Since every application defines and manages its own data, the system is
subject to a serious data duplication problem.
A file, in the traditional file based approach, is a collection of records that
contain logically related data.
[Figure: file based processing. Sales: data entry and reports, with file handling routines and file definition inside the Sales application programs, which use the Sales files. Contracts: data entry and reports, with file handling routines and file definition inside the Contracts application programs, which use the Contracts files.]
Sales Files
Property_for_Rent(Property Number, Street, Area, City, Post Code, Property Type,
Number of Rooms, Monthly Rent, Owner Number)
Owner(Owner Number, First Name, Last Name, Address, Telephone Number)
Renter(Renter Number, First Name, Last Name, Address, Telephone Number,
Preferred Type, Maximum Rent)
Contracts Files
Lease(Lease Number, Property Number, Renter Number, Monthly Rent,
Payment Method, Deposit, Paid, Rent Start Date, Rent Finish Date, Duration)
Property_for_Rent(Property Number, Street, Area, City, Post Code, Monthly Rent)
Renter(Renter Number, First Name, Last Name, Address, Telephone Number)
Limitations of the Traditional File Based approach
As business applications became more complex, demanding more flexible and
reliable data handling methods, the shortcomings of the file based system
became evident. These shortcomings include, but are not limited to:
Separation or isolation of data: information available in one application
may not be known to the others. Data synchronisation is done manually.
Limited data sharing: every application maintains its own data.
Lengthy development and maintenance time.
Duplication or redundancy of data (costing money and time, and causing loss of
data integrity).
Data dependency on the application: the data structure is embedded in the
application; hence, a change in the data structure requires a change to the
application as well.
Incompatible file formats or data structures between different applications
and programs (e.g. programs written in C and in COBOL), creating inconsistency
and making the data difficult to process jointly.
Fixed query processing, which is defined during application development.
The limitations of the traditional file based data handling approach arise
from two basic reasons.
1. The definition of the data is embedded in the application program, which
makes it difficult to modify the data definition.
2. There is no control over the access and manipulation of the data beyond that
imposed by the application programs.
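The first reason can be illustrated with a minimal Python sketch. It assumes a hypothetical fixed-width renter file; the field widths, the sample record and the function name parse_renter are invented for illustration and do not come from any real system. Because the record layout lives inside the program, widening one field in the file makes the unchanged program misread the data.

```python
# Hypothetical fixed-width layout, hard-coded in the application:
# renter number (4 chars), last name (10 chars), telephone (8 chars).
def parse_renter(line):
    """Parse one record using offsets embedded in the program."""
    return {
        "renter_no": line[0:4].strip(),
        "last_name": line[4:14].strip(),
        "telephone": line[14:22].strip(),
    }

# A record written with the layout the program expects.
old_record = "R001" + "Abebe".ljust(10) + "5551234".ljust(8)
print(parse_renter(old_record))

# The file layout changes: the last name field is widened to 15 characters.
# The program still uses the old offsets, so the telephone is misread.
new_record = "R001" + "Abebe".ljust(15) + "5551234".ljust(8)
print(parse_renter(new_record))
```

With the widened layout the parser returns only a fragment of the telephone number, and nothing warns the user. Every program that reads the file would have to be edited and redeployed together with the file.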
The most significant problem of the traditional file based approach
to data handling can be formalized as what are called “update anomalies”. There
are three types of update anomalies:
1. Modification anomalies: a problem experienced when one or more data
values are modified in one application program but not in the others
containing the same data set.
2. Deletion anomalies: a problem encountered when a record is
deleted from one application but remains untouched in other application
programs.
3. Insertion anomalies: a problem experienced whenever a new data
item is to be recorded and the recording is not made in all the applications.
Also, when the same data item is inserted in different applications, there could
be encoding errors that cause the new data item to be treated as
a totally different object.
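The three anomalies can be reproduced in a few lines. The sketch below is only an illustration: it models the private files of two applications as Python dictionaries, and the renter records and the helper inconsistent_keys are hypothetical.

```python
# Two applications, each with its own private copy of the renter data
# (hypothetical records, keyed by renter number).
sales_renters = {"R001": {"last_name": "Kebede", "telephone": "5551234"}}
contracts_renters = {"R001": {"last_name": "Kebede", "telephone": "5551234"}}

def inconsistent_keys(a, b):
    """Return the keys whose records differ or exist in only one copy."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

# Modification anomaly: the phone number is changed in Sales only.
sales_renters["R001"]["telephone"] = "5559999"
after_modify = inconsistent_keys(sales_renters, contracts_renters)

# Insertion anomaly: a new renter is recorded in Sales only.
sales_renters["R002"] = {"last_name": "Tesfaye", "telephone": "5550000"}
after_insert = inconsistent_keys(sales_renters, contracts_renters)

# Deletion anomaly: R001 is deleted from Contracts but survives in Sales.
del contracts_renters["R001"]
after_delete = inconsistent_keys(sales_renters, contracts_renters)

print(after_modify, after_insert, after_delete)
```

Each step leaves the two copies disagreeing about at least one renter, and neither application has any way of detecting this on its own.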
3. Database Approach
Following a famous paper written by Dr. Edgar Frank Codd in 1970, database
systems changed significantly. Codd proposed that database systems should
present the user with a view of data organized as tables called relations. Behind
the scenes, there might be a complex data structure that allowed rapid response
to a variety of queries. But, unlike the user of earlier database systems, the user
of a relational system would not be concerned with the storage structure. Queries
could be expressed in a very high-level language, which greatly increased the
efficiency of database programmers. The database approach emphasizes the
integration and sharing of data throughout the organization.
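As a small illustration of a high-level query over a relation, the sketch below uses Python's built-in sqlite3 module. SQLite stands in for a relational DBMS here, and the table name renter, its columns and the sample rows are assumptions made for the example. The query states which rows are wanted and says nothing about how they are stored or located.

```python
import sqlite3

# An in-memory SQLite database stands in for a relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE renter (renter_no TEXT PRIMARY KEY, last_name TEXT, max_rent REAL)"
)
conn.executemany(
    "INSERT INTO renter VALUES (?, ?, ?)",
    [("R001", "Kebede", 450.0), ("R002", "Tesfaye", 600.0), ("R003", "Alemu", 750.0)],
)

# A declarative query: it describes the result, not the access path.
rows = conn.execute(
    "SELECT renter_no, last_name FROM renter "
    "WHERE max_rent >= ? ORDER BY renter_no",
    (500,),
).fetchall()
print(rows)
```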
Benefits of the database approach
Data can be shared: two or more users can access and use the same data instead
of storing the data redundantly for each user.
Improved accessibility of data: by using structured query languages, users
can easily access data without programming experience.
Redundancy can be reduced: isolated data is integrated in the database to
decrease the redundant data stored by different applications.
Quality data can be maintained: the integrity constraints of the
database approach maintain data quality, leading to better decision
making.
Inconsistency can be avoided: controlled data redundancy avoids
inconsistency of the data in the database to some extent.
Transaction support can be provided: the basic demands of any transaction
support system are implemented in a full scale DBMS.
Integrity can be maintained: data from different applications is integrated
together with additional constraints to facilitate the validity and consistency
of the shared data resource.
Security measures can be enforced: the shared data can be secured by having
different levels of clearance and other data security mechanisms.
Improved decision support: the database provides information useful for
decision making.
Standards can be enforced: the different ways in which the units of an
organization use and deal with data can be balanced and standardized by
using the database approach.
Compactness: since it is an electronic data handling method, the data is
stored compactly (no voluminous papers).
Speed: data storage and retrieval are fast because they use modern, fast
computer systems.
Less labour: unlike the other data handling methods, data maintenance does
not demand much resource.
Centralized information control: since the relevant data in the organization is
stored in one repository, it can be controlled and managed at the
central level.
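The integrity benefit can be made concrete with a short sketch, again using SQLite through Python's sqlite3 module as a stand-in DBMS. The tables, constraints and sample rows are assumptions chosen for illustration. The rules are declared once in the database, so every application that writes to it is checked in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE owner (owner_no TEXT PRIMARY KEY, last_name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE property (
        property_no  TEXT PRIMARY KEY,
        monthly_rent REAL CHECK (monthly_rent > 0),
        owner_no     TEXT NOT NULL REFERENCES owner(owner_no)
    )
    """
)
conn.execute("INSERT INTO owner VALUES ('O1', 'Haile')")
conn.execute("INSERT INTO property VALUES ('P1', 400, 'O1')")

def try_insert(row):
    """Attempt to insert a property row and report the DBMS's decision."""
    try:
        conn.execute("INSERT INTO property VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("P2", -50, "O1")),  # negative rent violates the CHECK constraint
    try_insert(("P3", 300, "O9")),  # unknown owner violates the foreign key
    try_insert(("P1", 300, "O1")),  # duplicate property number violates the primary key
    try_insert(("P4", 300, "O1")),  # valid row
]
print(results)
```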
[Figure: database processing. The Sales application programs and the Contracts application programs (data entry and reports) both access, through the DBMS, a single database holding Property, Owner, Renter and Lease details together with the file definitions.]
Database Management System (DBMS)
A Database Management System (DBMS) is a software package that provides
EFFICIENT, CONVENIENT and SAFE MULTI-USER (many people or programs accessing the
same database, or even the same data, simultaneously) storage of and access to MASSIVE
amounts of PERSISTENT (the data outlives the programs that operate on it) data. A DBMS also
provides a systematic method for creating, updating, storing and retrieving data in a
database. It further provides the services of controlling data access, enforcing
data integrity, managing concurrency control, and recovery. With this in
mind, a full scale DBMS should provide at least these services to
the user.
DBMS and Components of DBMS Environment
[Figure: components of the DBMS. Programmers, users and the DBA supply application programs, queries and the database schema to the DBMS. Components shown include the DML preprocessor, program object code, database manager, dictionary manager, access methods, file manager and system buffers, on top of the database and system catalog.]
Data Manipulation Language (DML):
o Is the core set of commands used by end users and programmers to store,
retrieve, and access the data in the database, e.g. SQL.
o Since the data required by the user is extracted by a query written in
this type of language, it is also called a "Query Language".
Data Dictionary:
o Because a database is a self-describing system, this tool,
the Data Dictionary, is used to store and organize information about
the data stored in the database.
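Both ideas can be tried out with Python's built-in sqlite3 module. The sketch below is illustrative only: the owner table and its rows are hypothetical, and SQLite's catalog table sqlite_master and its table_info pragma play the role of the data dictionary in this particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE owner (owner_no TEXT PRIMARY KEY, last_name TEXT, telephone TEXT)"
)

# DML: store, modify, delete and retrieve data.
conn.execute("INSERT INTO owner VALUES ('O1', 'Haile', '5551111')")
conn.execute("INSERT INTO owner VALUES ('O2', 'Girma', '5552222')")
conn.execute("UPDATE owner SET telephone = '5553333' WHERE owner_no = 'O1'")
conn.execute("DELETE FROM owner WHERE owner_no = 'O2'")
owners = conn.execute("SELECT owner_no, last_name, telephone FROM owner").fetchall()

# Data dictionary: the database describes itself. In SQLite the catalog
# lists the tables, and table_info lists the columns of one table.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
columns = [row[1] for row in conn.execute("PRAGMA table_info(owner)")]

print(owners, tables, columns)
```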
The DBMS is a software package that helps to design, manage, and use data using
the database approach. Taking a DBMS as a system, one can describe it with
respect to its environment or the other systems interacting with the DBMS. The DBMS
environment has five components. To design and use a database, there will be
interaction or integration of Hardware, Software, Data, Procedure and
People.
1. Hardware: the components that one can touch and feel. These
components comprise various types of personal computers,
mainframes or any server computers to be used in a multi-user system,
network infrastructure, and other peripherals required in the system.
3. Data: since the goal of any database system is to have better control of
the data and to make data useful, data is the most important component to
the user of the database. There are two categories of data in any database
system: operational data and metadata. Operational data is the data
actually stored in the system to be used by the user. Metadata is the data
that is used to store information about the database itself.
The structure of the data in the database is called the schema, which is
composed of the entities, the properties of entities, the relationships between
entities, and business constraints.
4. Procedure: the rules and regulations on how to design and use a
database. It includes procedures such as how to log on to the DBMS, how to
use facilities, how to start and stop the DBMS, how to make backups, how to
treat hardware and software failures, and how to change the structure of the
database.
Database Development Life Cycle (DDLC)
Roles in Database Design and Use
As people are one of the components of the DBMS environment, there is a group of
roles played by the different stakeholders in the design and operation of a
database system.
1. Logical and Conceptual DBD
Identifies the data (entities, attributes and relationships) relevant
to the organization.
Identifies the constraints on each data item.
Understands the data and business rules in the organization.
Sees the database independent of any data model at the
conceptual level and considers one specific data model at the
logical design phase.
2. Physical DBD
Takes the logical design specification as input and decides how it
should be physically realized.
Maps the logical data model onto the specified DBMS with respect
to tables and integrity constraints (DBMS dependent designing).
Selects the specific storage structures and access paths to the database.
Designs the security measures required on the database.
4. End Users
Workers whose jobs require accessing the database frequently for various
purposes. There are different groups of users in this category.
1. Naïve Users:
Sizable proportion of users
Unaware of the DBMS
Only access the database based on their access level and
demand
Use standard and pre-specified types of queries.
2. Sophisticated Users
Users familiar with the structure of the Database and facilities of
the DBMS.
Have complex requirements
Have higher level queries
Are most of the time engineers, scientists, business analysts, etc
3. Casual Users
Users who access the database occasionally.
Need different information from the database each time.
Use sophisticated database queries to satisfy their needs.
Are most of the time middle to high level managers.
These users can be again classified as “Actors on the Scene” and “Workers
Behind the Scene”.
ANSI-SPARC Architecture
The purpose and origin of the Three-Level database architecture
All users should be able to access the same data. This is important since
the database has a shared data feature, where all the data is
stored in one location and all users have their own customized
way of interacting with the data.
A user's view is unaffected by, or immune to, changes made in other
views. Since the requirements of one user are independent of the others, a
change made in one user’s view should not affect other users.
Users should not need to know physical database storage details. As
there are naïve users of the system, hardware level or physical details
should be a black box for such users.
The DBA should be able to change database storage structures without
affecting the users' views. A change in file organization or access method
should not affect the structure of the data, which in turn will have no
effect on the users.
The internal structure of the database should be unaffected by changes to
physical aspects of storage, such as a change of hard disk.
The DBA should be able to change the conceptual structure of the database
without affecting all users. In any database system, the DBA has
the privilege to change the structure of the database, for example adding tables,
adding and deleting attributes, and changing the specification of the
objects in the database.
All of the above, and much more functionality, is possible due to the
three level ANSI-SPARC architecture.
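[Figure: the three-level ANSI-SPARC architecture. External level: User 1, User 2, ..., User n, each with their own view (View 1, View 2, ..., View n). Conceptual level: the conceptual schema. Internal level: the internal schema. Below these: the physical data organization of the database.]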
ANSI-SPARC Architecture and Database Design Phases
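[Figure: external schema, conceptual schema, internal schema and physical storage, with logical/conceptual database design marked alongside the external and conceptual schemas and physical database design alongside the internal schema and physical storage.]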
External Level: Users' view of the database. It describes that part of database
that is relevant to a particular user. Different users have their own
customized view of the database independent of other users.
The following example illustrates the difference between
the three levels of the ANSI-SPARC database architecture, where:
The first level is concerned with the groups of users and their
respective data requirements, each independent of the others.
The second level describes the whole content of the database,
where each piece of information is represented once.
The third level
External level: External view 1, External view 2
Conceptual level:
    Staff_No | FName | LName | DOB | Salary | Branch_No
Internal level:
    struct STAFF {
        int Staff_No;
        int Branch_No;
        char FName[15];
        char LName[15];
        struct date Date_of_Birth;
        float Salary;
        struct STAFF *next;   /* pointer to next Staff record */
    };
    index Staff_No; index Branch_No;   /* define indexes for staff */
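The same three levels can be approximated in a relational DBMS. The following is a rough sketch using SQLite through Python's sqlite3 module: the base table plays the role of the conceptual level, two views play the role of external views, and an index stands for an internal-level storage choice. The view names payroll_view and directory_view, the columns each view exposes and the sample row are assumptions for illustration, since the contents of the two external views are not given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one Staff relation describing all the data once.
conn.execute(
    """
    CREATE TABLE staff (
        staff_no  INTEGER PRIMARY KEY,
        fname     TEXT,
        lname     TEXT,
        dob       TEXT,
        salary    REAL,
        branch_no INTEGER
    )
    """
)
conn.execute("INSERT INTO staff VALUES (1, 'Sara', 'Bekele', '1990-04-12', 3000.0, 10)")

# External level: each user group sees only the part relevant to it.
conn.execute("CREATE VIEW payroll_view AS SELECT staff_no, lname, salary FROM staff")
conn.execute(
    "CREATE VIEW directory_view AS SELECT staff_no, fname, lname, branch_no FROM staff"
)

# Internal level: an index is a storage decision that no view mentions.
conn.execute("CREATE INDEX idx_staff_branch ON staff (branch_no)")

payroll = conn.execute("SELECT * FROM payroll_view").fetchall()
directory = conn.execute("SELECT * FROM directory_view").fetchall()
print(payroll, directory)
```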
Data Independence
Logical Data Independence:
Refers to the immunity of external schemas to changes in the conceptual
schema.
Conceptual schema changes, e.g. addition or removal of entities,
should not require changes to the external schemas or rewrites of
application programs.
The capacity to change the conceptual schema without having to
change the external schemas and their application programs.
Physical Data Independence:
The ability to modify the physical schema without changing the
logical schema.
Applications depend on the logical schema.
In general, the interfaces between the various levels and
components should be well defined so that changes in some parts
do not seriously influence others.
The capacity to change the internal schema without having to
change the conceptual schema.
[Figure: the external/conceptual mapping above the conceptual schema provides logical data independence; the internal schema lies below the conceptual schema.]
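Both kinds of data independence can be observed in a small experiment. The sketch below assumes SQLite (via Python's sqlite3 module) as the DBMS; the staff table, the staff_names view and the function app_query are hypothetical. The application only ever queries the view, and its result stays the same after a conceptual-level change (a new attribute) and after an internal-level change (a new index).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, lname TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Bekele", 3000.0), (2, "Tadesse", 2500.0)],
)
# External schema used by the application.
conn.execute("CREATE VIEW staff_names AS SELECT staff_no, lname FROM staff")

def app_query():
    """The application's query, written once against the external view."""
    return conn.execute(
        "SELECT staff_no, lname FROM staff_names ORDER BY staff_no"
    ).fetchall()

before = app_query()

# Logical data independence: the conceptual schema gains an attribute.
conn.execute("ALTER TABLE staff ADD COLUMN branch_no INTEGER")
after_logical = app_query()

# Physical data independence: a new access path is added.
conn.execute("CREATE INDEX idx_staff_lname ON staff (lname)")
after_physical = app_query()

print(before == after_logical == after_physical)
```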
Database Languages
Data Definition Language (DDL)
Allows the DBA or user to describe and name the entities, attributes and
relationships required for the application.
Specification notation for defining the database schema.
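A brief sketch of DDL in use, assuming SQLite's dialect of SQL run through Python's sqlite3 module. The two tables are loosely modelled on the Owner and Property_for_Rent files shown earlier; the column names and types are assumptions. The CREATE TABLE statements name two entities, their attributes, and one relationship between them (the foreign key).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe and name entities, attributes and a relationship.
conn.executescript(
    """
    CREATE TABLE owner (
        owner_no  TEXT PRIMARY KEY,
        last_name TEXT NOT NULL
    );
    CREATE TABLE property_for_rent (
        property_no  TEXT PRIMARY KEY,
        city         TEXT,
        monthly_rent REAL,
        owner_no     TEXT REFERENCES owner(owner_no)
    );
    """
)

# Read the resulting schema back from SQLite's catalog.
tables = sorted(name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
))
columns = [row[1] for row in conn.execute("PRAGMA table_info(property_for_rent)")]
# Each foreign_key_list row holds the referenced table, the local column
# and the referenced column at positions 2, 3 and 4.
fks = [(row[2], row[3], row[4]) for row in conn.execute(
    "PRAGMA foreign_key_list(property_for_rent)"
)]
print(tables, columns, fks)
```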
A Classification of data models
Data Model
A specific DBMS has its own specific Data Definition Language to define a
database schema, but this type of language is too low level to describe the
data requirements of an organization in a way that is readily
understandable by a variety of users.
1. Hierarchical Model
The simplest data model
Record type is referred to as node or segment
The top node is the root node
Nodes are arranged in a hierarchical structure as a sort of upside-down
tree
A parent node can have more than one child node
A child node can only have one parent node
The relationship between parent and child is one-to-many
Relation is established by creating physical link between stored
records (each is stored with a predefined access path to other
records)
To add new record type or relationship, the database must be
redefined and then stored in a new form.
[Figure: example hierarchy with Department as the parent node and Employee and Job as its child nodes.]
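The parent and child rules above can be sketched as a small tree structure in Python. This is an illustration of the rules only, not of how any hierarchical DBMS stores its records; the class Node, its methods and the extra Project node are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A record type (segment) in a hierarchy."""
    name: str
    # The parent is left out of repr and comparison to avoid infinite recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        """Link a child; a child may have only one parent."""
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        """The predefined access path from the root down to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

department = Node("Department")              # root node
employee = department.add_child(Node("Employee"))
job = department.add_child(Node("Job"))      # one parent, many children

print(employee.path())

try:
    Node("Project").add_child(employee)      # a second parent is not allowed
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False
```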
2. Network Model
Allows record types to have more than one parent, unlike the
hierarchical model
The network data model sees records as set members
Each set has an owner and one or more members
Allows no many-to-many relationships between entities
Like the hierarchical model, the network model is a collection of physically
linked records.
Allows member records to have more than one owner
[Figure: example network with the record types Department, Job, Employee, Activity and Time Card.]
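The owner and member idea can be sketched with a toy structure in Python. The class NetworkDB, the set names and the sample records are hypothetical, and which record types own which in the sets below is an assumption made for the example. The point is that one member record (an employee) is owned by two different owners through two different sets, which a strict hierarchy does not allow.

```python
from collections import defaultdict

class NetworkDB:
    """Toy network database: named sets linking owner records to members."""

    def __init__(self):
        # set name -> owner record -> list of member records
        self.sets = defaultdict(lambda: defaultdict(list))

    def connect(self, set_name, owner, member):
        """Make `member` a member of the set occurrence owned by `owner`."""
        self.sets[set_name][owner].append(member)

    def members(self, set_name, owner):
        """Return the members of one set occurrence."""
        return list(self.sets[set_name][owner])

    def owners_of(self, member):
        """Return every (set name, owner) pair that owns `member`."""
        return sorted(
            (set_name, owner)
            for set_name, by_owner in self.sets.items()
            for owner, members in by_owner.items()
            if member in members
        )

db = NetworkDB()
db.connect("dept_employee", "Sales Dept", "Employee 7")
db.connect("dept_employee", "Sales Dept", "Employee 8")
db.connect("job_employee", "Clerk", "Employee 7")

print(db.members("dept_employee", "Sales Dept"))
print(db.owners_of("Employee 7"))
```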
3. Relational Data Model
Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A
Relational Model of Data for Large Shared Data Banks')
Terminology originates from the branches of mathematics called set
theory and predicate logic and is based on the mathematical concept
called a relation
Can define more flexible and complex relationships
Viewed as a collection of tables called “Relations”, equivalent to a
collection of record types
Relation: a two-dimensional table
Stores information or data in the form of tables, with rows and columns
A row of the table is called a tuple, equivalent to a record
A column of a table is called an attribute, equivalent to a field
A data value is the value of an attribute
Records are related by the data stored jointly in the fields of records in
two tables or files. The related tables contain the information that creates
the relation
The tables seem to be independent but are related somehow.
No physical consideration of the storage is required by the user
Many tables are merged together to come up with a new virtual view
of the relationship
Alternative terminologies
Relation    Table     File
Tuple       Row       Record
Attribute   Column    Field
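To make the terminology concrete, here is a minimal sketch using Python's sqlite3 module, with SQLite standing in for a relational DBMS. The two relations, their attributes and the sample tuples are invented for illustration. The tables are stored independently and are related only by the value they share in owner_no; joining them yields a new virtual table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE owner (owner_no TEXT PRIMARY KEY, last_name TEXT);
    CREATE TABLE property (property_no TEXT PRIMARY KEY, city TEXT, owner_no TEXT);
    INSERT INTO owner VALUES ('O1', 'Haile'), ('O2', 'Girma');
    INSERT INTO property VALUES
        ('P1', 'Adama', 'O1'), ('P2', 'Hawassa', 'O1'), ('P3', 'Adama', 'O2');
    """
)

# The relations are linked only by the values stored in owner_no.
cur = conn.execute(
    """
    SELECT p.property_no AS property_no, p.city AS city, o.last_name AS last_name
    FROM property AS p
    JOIN owner AS o ON p.owner_no = o.owner_no
    ORDER BY p.property_no
    """
)
attributes = [d[0] for d in cur.description]  # columns of the result
tuples = cur.fetchall()                       # rows of the result

print(attributes)
print(tuples)
```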