
Chapter 1

The document discusses databases and database management systems. It defines what a database is, describes properties of databases, and provides examples of different types of databases. It also explains what a database management system is and its key functions like defining, constructing, manipulating and sharing databases. The document contrasts file-based systems with database systems, outlining disadvantages of file-based systems.

Uploaded by

fentahunmuluye23
Copyright
© All Rights Reserved

Chapter One: Introduction

1.1. Introduction to database system


A database is a collection of related data. By data, we mean known facts that can be recorded and
that have implicit meaning. For example, consider the names, telephone numbers, and addresses
of the people you know. You may have recorded this data in an indexed address book or you may
have stored it on a hard drive, using a personal computer and software such as Microsoft Access
or Excel. This collection of related data with an implicit meaning is a database.

The preceding definition of database is quite general; for example, we may consider the
collection of words that make up this page of text to be related data and hence to constitute a
database. However, the common use of the term database is usually more restricted. A database
has the following implicit properties:

• A database represents some aspect of the real world, sometimes called the mini-world or
the universe of discourse (UoD). Changes to the mini-world are reflected in the database.
• A database is a logically coherent collection of data with some inherent meaning. A
random assortment of data cannot correctly be referred to as a database.
• A database is designed, built, and populated with data for a specific purpose. It has an
intended group of users and some preconceived applications in which these users are
interested.

In other words, a database has some source from which data is derived, some degree of
interaction with events in the real world, and an audience that is actively interested in its
contents. The end users of a database may perform business transactions (for example, a customer
buys a camera) or events may happen (for example, an employee has a baby) that cause the
information in the database to change. In order for a database to be accurate and reliable at all
times, it must be a true reflection of the mini-world that it represents. Therefore, changes must be
reflected in the database as soon as possible.

A database can be of any size and complexity. For example, the list of names and addresses
referred to earlier may consist of only a few hundred records, each with a simple structure. On
the other hand, the computerized catalog of a large library may contain half a million entries
organized under different categories—by primary author’s last name, by subject, by book title—
with each category organized alphabetically.

A database of even greater size and complexity is maintained by the Internal Revenue Service
(IRS) to monitor tax forms filed by taxpayers. If we assume that there are 100 million taxpayers
and each taxpayer files an average of five forms with approximately 400 characters of
information per form, we would have a database of 100 × 10^6 × 400 × 5 characters (bytes) of
information. If the IRS keeps the past three returns of each taxpayer in addition to the current
return, we would have a database of 8 × 10^11 bytes (800 gigabytes). This huge amount of
information must be organized and managed so that users can search for, retrieve, and update the
data as needed.
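
The size estimate above can be checked with a few lines of arithmetic. The following is only a sketch: the constant names are made up for illustration, the figures are the ones assumed in the text, and one gigabyte is taken to be 10^9 bytes.

```python
# Figures assumed in the IRS example; the constant names are illustrative only.
TAXPAYERS = 100 * 10**6     # 100 million taxpayers
FORMS_PER_TAXPAYER = 5      # average number of forms filed per taxpayer
CHARS_PER_FORM = 400        # approximate characters (bytes) per form
RETURNS_KEPT = 4            # the current return plus the past three


def current_year_bytes() -> int:
    """Bytes needed for one year of returns."""
    return TAXPAYERS * FORMS_PER_TAXPAYER * CHARS_PER_FORM


def total_bytes() -> int:
    """Bytes needed when the past three returns are kept as well."""
    return current_year_bytes() * RETURNS_KEPT


if __name__ == "__main__":
    print(current_year_bytes())        # 200000000000, i.e. 2 x 10^11
    print(total_bytes())               # 800000000000, i.e. 8 x 10^11
    print(total_bytes() // 10**9)      # 800 (gigabytes, with 1 GB = 10^9 bytes)
```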

A database may be generated and maintained manually or it may be computerized. For example,
a library card catalog is a database that may be created and maintained manually. A
computerized database may be created and maintained either by a group of application programs
written specifically for that task or by a database management system.

A database management system (DBMS) is a collection of programs that enables users to
create and maintain a database. The DBMS is a general-purpose software system that facilitates
the processes of defining, constructing, manipulating, and sharing databases among various
users and applications.

Defining a database involves specifying the data types, structures, and constraints of the data to
be stored in the database. The database definition or descriptive information is also stored by the
DBMS in the form of a database catalog or dictionary; it is called meta-data.

Constructing the database is the process of storing the data on some storage medium that is
controlled by the DBMS.

Manipulating a database includes functions such as querying the database to retrieve specific
data, updating the database to reflect changes in the mini-world, and generating reports from the
data.

Sharing a database allows multiple users and programs to access the database simultaneously.

An application program accesses the database by sending queries or requests for data to the
DBMS. A query typically causes some data to be retrieved; a transaction may cause some data
to be read and some data to be written into the database.
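
The functions described above (defining, constructing, and manipulating a database) can be sketched with Python's built-in sqlite3 module standing in for the DBMS. This is only an illustration: the table `contact`, its columns, the sample rows, and the helper function names are all hypothetical and do not come from the text.

```python
import sqlite3


def build_address_book() -> sqlite3.Connection:
    """Define and construct a tiny address-book database held in memory."""
    conn = sqlite3.connect(":memory:")
    # Defining: specify structure, data types, and constraints.
    conn.execute(
        """
        CREATE TABLE contact (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            phone   TEXT,
            address TEXT
        )
        """
    )
    # Constructing: store the data under the control of the DBMS.
    with conn:  # commits if the block succeeds, rolls back on an exception
        conn.executemany(
            "INSERT INTO contact (name, phone, address) VALUES (?, ?, ?)",
            [
                ("Alice", "555-0101", "12 Oak Street"),
                ("Bob", "555-0102", "34 Pine Street"),
            ],
        )
    return conn


def find_phone(conn: sqlite3.Connection, name: str):
    """Manipulating (query): retrieve data without changing it."""
    row = conn.execute(
        "SELECT phone FROM contact WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def move_contact(conn: sqlite3.Connection, name: str, new_address: str) -> None:
    """Manipulating (update): a small transaction that writes to the database."""
    with conn:
        conn.execute(
            "UPDATE contact SET address = ? WHERE name = ?", (new_address, name)
        )


if __name__ == "__main__":
    db = build_address_book()
    print(find_phone(db, "Alice"))
    move_contact(db, "Alice", "9 Elm Street")
```

Here `find_phone` plays the role of a query, while `move_contact` is a transaction in the sense used above, because it writes data.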

Other important functions provided by the DBMS include protecting the database and
maintaining it over a long period of time.

Protection includes system protection against hardware or software malfunction (or crashes) and
security protection against unauthorized or malicious access. A typical large database may have
a life cycle of many years, so the DBMS must be able to maintain the database system by
allowing the system to evolve as requirements change over time.

It is not absolutely necessary to use general-purpose DBMS software to implement a
computerized database. We could write our own set of programs to create and maintain the
database, in effect creating our own special-purpose DBMS software. In either case, whether
we use a general-purpose DBMS or not, we usually have to deploy a considerable amount of
complex software. In fact, most DBMSs are very complex software systems. We will call the
database and DBMS software together a database system. The end user accesses the database
system through application programs and queries.

1.2. File based versus Database approach

Storage of large amounts of data has always been a major concern. In the early days, file-based
systems were used. In such a system, data was stored in discrete files, and a collection of such
files was stored on a computer. These could be accessed by a computer operator. Files of
archived data were called tables because they looked like tables used in traditional file keeping.
Rows in the table were called records and columns were called fields.
Before database systems evolved, data in software systems was conventionally stored in flat
files.
Disadvantages of File-based Systems
In a file-based system, different programs in the same application may be interacting with
different private data files. There is no system enforcing any standardized control on the
organization and structure of these data files.

Data redundancy and inconsistency


Since data resides in different private data files, there is a risk of redundancy and resulting
inconsistency.
Consider the duplication of data between the payroll and HR-personnel departments. If an
employee moves house and the change of address is communicated only to personnel and not to
payroll, the person's pay slip will be sent to the wrong address. A more serious problem occurs if
an employee is promoted to a more senior position with an associated increase in salary. Again,
personnel is notified of the change, but the change does not filter through to payroll, so the
employee may be paid the wrong salary.

Three types of anomalies can occur:

• Modification Anomalies

• Deletion Anomalies

• Insertion Anomalies
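
The payroll and personnel scenario described earlier, which is an instance of a modification anomaly, can be sketched as follows. Plain Python dictionaries stand in for the two private files; the employee record, the field names, and the function names are hypothetical and serve only as an illustration.

```python
# Two private "files", each maintained by a different department.
# The employee data below is made up for the example.
personnel_file = {
    "E100": {"name": "A. Worker", "address": "1 Old Road", "salary": 3000},
}
payroll_file = {
    "E100": {"name": "A. Worker", "address": "1 Old Road", "salary": 3000},
}


def record_move(personnel: dict, emp_id: str, new_address: str) -> None:
    """Update only the personnel copy, as happens in the scenario in the text."""
    personnel[emp_id]["address"] = new_address


def find_inconsistencies(personnel: dict, payroll: dict) -> list:
    """Return (employee id, field) pairs whose values differ between the files."""
    problems = []
    for emp_id in sorted(personnel.keys() & payroll.keys()):
        shared_fields = personnel[emp_id].keys() & payroll[emp_id].keys()
        for field in sorted(shared_fields):
            if personnel[emp_id][field] != payroll[emp_id][field]:
                problems.append((emp_id, field))
    return problems


if __name__ == "__main__":
    print(find_inconsistencies(personnel_file, payroll_file))  # no differences yet
    record_move(personnel_file, "E100", "7 New Avenue")
    print(find_inconsistencies(personnel_file, payroll_file))  # address now differs
```

Because the address is stored twice, nothing forces the second copy to follow the first. Storing each fact once, as a database approach aims to do, removes this particular source of inconsistency.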

Unanticipated queries
In a file-based system, handling sudden/ad-hoc queries can be difficult, since it requires changes
in the existing programs. For example, the bank officer needs to generate a list of all the
customers who have an account balance of $20,000 or more. The bank officer has two choices:
either obtain the list of all customers and have the needed information extracted manually, or hire
a system programmer to design the necessary application program. Both alternatives are
obviously unsatisfactory. Suppose that such a program is written, and several days later the
officer needs to trim that list to include only those customers who opened their accounts one
year ago. Because no program exists to generate that list, the officer again has difficulty
getting at the data.
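
For contrast, here is a minimal sketch of how the bank officer's two requests might be answered with a query language, using Python's sqlite3 module. The `account` table, its columns, and the sample rows are assumptions for illustration. The second request is interpreted here as "opened on or before a given cutoff date", which is one possible reading of the text.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create a small in-memory bank database with made-up accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (customer TEXT, balance INTEGER, opened TEXT)"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [
            ("Alice", 25000, "2020-03-01"),
            ("Bob", 18000, "2019-07-15"),
            ("Carol", 40000, "2024-01-10"),
        ],
    )
    conn.commit()
    return conn


def high_balance(conn: sqlite3.Connection, minimum: int) -> list:
    """First ad hoc request: customers with at least the given balance."""
    rows = conn.execute(
        "SELECT customer FROM account WHERE balance >= ? ORDER BY customer",
        (minimum,),
    )
    return [row[0] for row in rows]


def high_balance_opened_by(conn: sqlite3.Connection, minimum: int, cutoff: str) -> list:
    """Follow-up request: the same list, trimmed by account opening date.

    Dates are stored as ISO 8601 text, so string comparison orders them correctly.
    """
    rows = conn.execute(
        "SELECT customer FROM account "
        "WHERE balance >= ? AND opened <= ? ORDER BY customer",
        (minimum, cutoff),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    bank = make_bank()
    print(high_balance(bank, 20000))
    print(high_balance_opened_by(bank, 20000, "2023-01-01"))
```

The follow-up request only needs one more condition in the query; no new application program has to be designed.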

Data isolation
Data are scattered in various files, and the files may be in different formats. Though data used by
different programs in the application may be related, they reside as isolated data files.

Concurrent access anomalies


In large multi-user systems, the same file or record may need to be accessed by multiple users
simultaneously. Handling this in a file-based system is difficult.
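
One well-known difficulty of this kind is the lost update. The sketch below simulates it deterministically in plain Python rather than with real threads or files; the function names and the deposit amounts are hypothetical.

```python
def uncontrolled(balance: int, deposits: list) -> int:
    """All users read the same starting balance before anyone writes.

    Each user then writes back "the value I read plus my deposit", so every
    write overwrites the previous one and only the last deposit survives.
    """
    values_read = [balance for _ in deposits]   # everyone reads first
    for seen, amount in zip(values_read, deposits):
        balance = seen + amount                 # later writes clobber earlier ones
    return balance


def controlled(balance: int, deposits: list) -> int:
    """Users take turns: each read-modify-write completes before the next starts."""
    for amount in deposits:
        balance = balance + amount
    return balance


if __name__ == "__main__":
    print(uncontrolled(100, [50, 30]))  # 130: the deposit of 50 is lost
    print(controlled(100, [50, 30]))    # 180: both deposits are applied
```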

Security problems
In data-intensive applications, security of data is a major concern. Users should be given access
only to required data and not to the whole database.
For example, in a banking system, payroll personnel need to view only that part of the database
that has information about the various bank employees. They do not need access to information
about customer accounts. Since application programs are added to the system in an ad-hoc
manner, it is difficult to enforce such security constraints. In a file-based system, this can be
handled only by additional programming in each application.

Integrity problems
In any application, there will be certain data integrity rules, which need to be maintained. These
could be in the form of certain conditions/constraints on the elements of the data records. In the
savings bank application, one such integrity rule could be 'Customer ID, which is the unique
identifier for a customer record, should not be empty'. There can be several such integrity rules.

In a file-based system, all these rules need to be explicitly programmed in the application
program. Although these are common concerns for any data-intensive application, each
application has to handle all of them on its own. The application programmer has to
worry not only about implementing the application's business rules but also about handling these
common issues.
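
By way of contrast, a DBMS lets such a rule be declared once with the data. The following sketch uses Python's sqlite3 module; the `customer` table, its columns, and the helper names are assumptions for illustration. The rule "Customer ID should not be empty" is expressed here as NOT NULL plus a CHECK against the empty string, and uniqueness comes from PRIMARY KEY.

```python
import sqlite3


def make_customers() -> sqlite3.Connection:
    """Create a table whose integrity rules are declared in the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customer (
            customer_id TEXT PRIMARY KEY NOT NULL CHECK (customer_id <> ''),
            name        TEXT NOT NULL
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, customer_id, name) -> bool:
    """Attempt an insert; return False if the DBMS rejects it as a rule violation."""
    try:
        with conn:  # commit on success, roll back on failure
            conn.execute(
                "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
                (customer_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_customers()
    print(try_insert(db, "C001", "Alice"))   # accepted
    print(try_insert(db, None, "Bob"))       # rejected: missing ID
    print(try_insert(db, "", "Carol"))       # rejected: empty ID
    print(try_insert(db, "C001", "Dave"))    # rejected: duplicate ID
```

Every program that inserts customers is subject to the same rules, without repeating the checks in each application.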

1.3. Database approach


The benefits of acquiring a DBMS are not so easy to measure and quantify. A DBMS has several
intangible advantages over traditional file systems, such as ease of use, consolidation of
company-wide information, wider availability of data, and faster access to information. With
Web-based access, certain parts of the data can be made globally accessible to employees as well
as external users. More tangible benefits include reduced application development cost, reduced
redundancy of data, and better control and security. Although databases have been firmly
entrenched in most organizations, the decision of whether to move an application from a file-
based to a database-centered approach comes up frequently.
This move is generally driven by the following factors:

1. Data complexity: As data relationships become more complex, the need for a DBMS is
felt more strongly.
2. Sharing among applications: The greater the sharing among applications, the more the
redundancy among files, and hence the greater the need for a DBMS.
3. Dynamically evolving or growing data: If the data changes constantly, it is easier to cope
with these changes using a DBMS than using a file system.
4. Frequency of ad hoc requests for data: File systems are not at all suitable for ad hoc
retrieval of data.
5. Data volume and need for control: The sheer volume of data and the need to control it
sometimes demand a DBMS.
1.4. Characteristics of the Database Approach
The following are the main characteristics of the database approach:

• Self-Describing Nature of a Database System
• Insulation between Programs and Data, and Data Abstraction
• Support of Multiple Views of the Data
• Sharing of Data and Multiuser Transaction Processing

1.4.1 Self-Describing Nature of a Database System

The database system contains a complete definition or description of the database structure and
constraints. This definition is stored in the system catalog, which contains information such as
the structure of each file, the type and storage format of each data item, and various constraints
on the data. The information stored in the catalog is called meta-data, and it describes the
structure of the primary database.
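
As a concrete illustration, SQLite keeps its catalog in a table named `sqlite_master` and exposes column descriptions through `PRAGMA table_info`. The sketch below reads that meta-data from Python; the `student` table and its columns are made up for the example, and other DBMSs organize their catalogs differently.

```python
import sqlite3


def table_names(conn: sqlite3.Connection) -> list:
    """List the user tables recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def describe(conn: sqlite3.Connection, table: str) -> list:
    """Return (column name, declared type, NOT NULL flag) for each column.

    The table name is interpolated into the statement, so it must come from
    a trusted source such as table_names().
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2], bool(row[3])) for row in rows]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE student (name TEXT NOT NULL, student_number INTEGER, major TEXT)"
    )
    for table in table_names(db):
        print(table, describe(db, table))
```

Nothing about the `student` table is hard-coded in `describe`; the program learns the structure by asking the catalog.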

1.4.2 Insulation between Programs and Data, and Data Abstraction

The structure of data files is stored in the DBMS catalog separately from the access programs.
We call this property program-data independence. If we want to add another piece of data to
each STUDENT record, say the Birth-date, in a DBMS environment, we just need to change the
description of STUDENT records in the catalog to reflect the inclusion of the new data item
Birth-date; no programs are changed. The next time a DBMS program refers to the catalog, the new
structure of STUDENT records will be accessed and used.
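
A minimal sketch of this property, using Python's sqlite3 module, is given below. The table layout, the sample row, and the function names are assumptions for illustration. The point is that `student_names`, which plays the role of an existing access program, is not edited when the Birth-date item is added.

```python
import sqlite3


def build_students() -> sqlite3.Connection:
    """Create a STUDENT table without a birth date (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, student_number INTEGER)")
    conn.execute("INSERT INTO student VALUES ('Smith', 17)")
    conn.commit()
    return conn


def student_names(conn: sqlite3.Connection) -> list:
    """An existing access program: it names only the data items it needs."""
    return [row[0] for row in conn.execute("SELECT name FROM student ORDER BY name")]


def add_birth_date(conn: sqlite3.Connection) -> None:
    """Change the description of STUDENT records; no program is touched."""
    conn.execute("ALTER TABLE student ADD COLUMN birth_date TEXT")


def column_names(conn: sqlite3.Connection) -> list:
    """Read the current structure of STUDENT back from the catalog."""
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]


if __name__ == "__main__":
    db = build_students()
    print(student_names(db))
    add_birth_date(db)
    print(student_names(db))   # same result as before the change
    print(column_names(db))    # now includes birth_date
```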

1.4.3 Support of Multiple Views of the Data

A database typically has many users, each of whom may require a different perspective or view
of the database. A multiuser DBMS whose users have a variety of applications must provide
facilities for defining multiple views. For example, one user of the database may be interested
only in the transcript of each student; a second user may be interested only in checking that
students have taken all the prerequisites of each course for which they register.
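
The two perspectives in this example can be sketched as SQL views over the same stored tables, here with Python's sqlite3 module. All table names, view names, and sample rows are hypothetical; a real university schema would be richer.

```python
import sqlite3


def build_registrar() -> sqlite3.Connection:
    """Create three base tables and two views over them (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE completed    (student TEXT, course TEXT, grade TEXT);
        CREATE TABLE registration (student TEXT, course TEXT);
        CREATE TABLE prerequisite (course TEXT, requires TEXT);

        INSERT INTO completed    VALUES ('Smith', 'CS101', 'B'), ('Brown', 'MATH101', 'A');
        INSERT INTO registration VALUES ('Smith', 'CS201'), ('Brown', 'CS201');
        INSERT INTO prerequisite VALUES ('CS201', 'CS101');

        -- View for the user who only cares about transcripts.
        CREATE VIEW transcript AS
            SELECT student, course, grade FROM completed;

        -- View for the user who only checks prerequisites.
        CREATE VIEW missing_prerequisite AS
            SELECT r.student, r.course, p.requires
            FROM registration AS r
            JOIN prerequisite AS p ON p.course = r.course
            WHERE NOT EXISTS (
                SELECT 1 FROM completed AS c
                WHERE c.student = r.student AND c.course = p.requires
            );
        """
    )
    return conn


def transcript_of(conn: sqlite3.Connection, student: str) -> list:
    """Courses and grades for one student, read through the transcript view."""
    return conn.execute(
        "SELECT course, grade FROM transcript WHERE student = ? ORDER BY course",
        (student,),
    ).fetchall()


def prerequisite_problems(conn: sqlite3.Connection) -> list:
    """Registrations whose prerequisite has not been completed."""
    return conn.execute(
        "SELECT student, course, requires FROM missing_prerequisite ORDER BY student"
    ).fetchall()


if __name__ == "__main__":
    db = build_registrar()
    print(transcript_of(db, "Smith"))
    print(prerequisite_problems(db))
```

Both views are derived from the same stored data, so neither user needs a private copy of it.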

1.4.4 Sharing of Data and Multiuser Transaction Processing


A multiuser DBMS, as its name implies, must allow multiple users to access the database at the
same time. This is essential if data for multiple applications is to be integrated and maintained in
a single database. The DBMS must include concurrency control software to ensure that several
users trying to update the same data do so in a controlled manner so that the result of the updates
is correct.

Examples of database applications include the following:

• Computerized library systems

• Automated teller machines

• Flight reservation systems

• Computerized parts inventory systems

1.5. Actors and Users of Database System


For a small personal database, such as the list of addresses, one person typically defines,
constructs, and manipulates the database. However, many persons are involved in the design,
use, and maintenance of a large database with a few hundred users. We call the people whose
jobs involve the day-to-day use of a large database the "actors on the scene." We also
consider people who may be called "workers behind the scene": those who work to maintain the
database system environment, but who are not actively interested in the database itself.

The Database Administrator (DBA) is responsible for authorizing access to the database,
coordinating and monitoring its use, and acquiring software and hardware resources as needed.

Database designers are responsible for identifying the data to be stored in the database and for
choosing appropriate structures to represent and store this data.

End users are the people whose jobs require access to the database for querying, updating, and
generating reports; the database primarily exists for their use. There are several categories of end
users:

• Casual end users occasionally access the database, but they may need different
information each time. They use a sophisticated database query language to
specify their requests and are typically middle- or high-level managers or other occasional
browsers.
• Naive or parametric end users make up a sizable portion of database end users. Their
main job function revolves around constantly querying and updating the database, using
standard types of queries and updates, called canned transactions, that have been
carefully programmed and tested. The tasks that such users perform are varied:

Bank tellers check account balances and post withdrawals and deposits.

Reservation agents for airlines, hotels, and car rental companies check availability for a
given request and make reservations.

Employees at receiving stations for shipping companies enter package identifications via
bar codes and descriptive information through buttons to update a central database of
received and in-transit packages.

• Sophisticated end users include engineers, scientists, business analysts, and others who
thoroughly familiarize themselves with the facilities of the DBMS in order to implement
their own applications to meet their complex requirements.
• Standalone users maintain personal databases by using ready-made program packages
that provide easy-to-use menu-based or graphics-based interfaces. An example is the user
of a tax package that stores a variety of personal financial data for tax purposes.

A typical DBMS provides multiple facilities to access a database. Naive end users need to
learn very little about the facilities provided by the DBMS; they simply have to
understand the user interfaces of the standard transactions designed and implemented for
their use. Casual users learn only a few facilities that they may use repeatedly.
Sophisticated users try to learn most of the DBMS facilities in order to achieve their
complex requirements. Standalone users typically become very proficient in using a
specific software package.

• System analysts determine the requirements of end users, especially naive and parametric
end users, and develop specifications for standard canned transactions that meet these
requirements.
• Application programmers implement these specifications as programs; then they test,
debug, document, and maintain these canned transactions. Such analysts and programmers
(commonly referred to as software developers or software engineers) should be
familiar with the full range of capabilities provided by the DBMS to accomplish their
tasks.
DBMS Languages

Once the design of a database is completed and a DBMS is chosen to implement the database,
the first order of the day is to specify conceptual and internal schemas for the database and any
mappings between the two. In many DBMSs where no strict separation of levels is maintained,
one language, called the data definition language (DDL), is used by the DBA and by database
designers to define both schemas. The DBMS will have a DDL compiler whose function is to
process DDL statements in order to identify descriptions of the schema constructs and to store
the schema description in the DBMS catalog.

In DBMSs where a clear separation is maintained between the conceptual and internal levels, the
DDL is used to specify the conceptual schema only. Another language, the storage definition
language (SDL), is used to specify the internal schema. The mappings between the two schemas
may be specified in either one of these languages. For a true three-schema architecture, we
would need a third language, the view definition language (VDL), to specify user views and
their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both
conceptual and external schemas.

Once the database schemas are compiled and the database is populated with data, users must
have some means to manipulate the database. Typical manipulations include retrieval, insertion,
deletion, and modification of the data. The DBMS provides a data manipulation language (DML)
for these purposes.

2.2 DBMS Architecture and Data Independence


One fundamental characteristic of the database approach is that it provides some level of data
abstraction by hiding details of data storage that are not needed by most database users. There are
three distinct levels of data abstraction at which data items can be described. The levels form a
three-level architecture comprising an external, a conceptual, and an internal level. The objective
of the three-level architecture is to separate each user's view of the database from the way it is
physically represented.

In a DBMS based on the three-schema architecture, each user group refers only to its own
external schema. Hence, the DBMS must transform a request specified on an external schema
into a request against the conceptual schema, and then into a request on the internal schema for
processing over the stored database.
For example, if the request is a database retrieval, the data extracted from the stored database
must be reformatted to match the user's external view. The processes of transforming requests
and results between levels are called mappings.

In this architecture, schemas can be defined at the following three levels:

• The internal level has an internal schema, which describes the physical storage
structure of the database. The internal schema uses a physical data model and
describes the complete details of data storage and access paths for the database. The data
actually exists only at the physical level.

• The conceptual level has a conceptual schema, which describes the structure of the
whole database for a community of users. The conceptual schema hides the details of
physical storage structures and concentrates on describing entities, data types,
relationships, user operations, and constraints. A high-level data model or an
implementation data model can be used at this level.

• The external or view level includes a number of external schemas or user views.
Each external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group. A high-level data
model or an implementation data model can be used at this level.
2.2.2 Data Independence

The three-schema architecture can be used to explain the concept of data independence, which
can be defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level. We can define two types of data
independence:

• Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item), or
to reduce the database (by removing a record type or data item). In the latter case,
external schemas that refer only to the remaining data should not be affected.
Application programs that reference the external schema constructs must work as
before after the conceptual schema undergoes a logical reorganization. Changes to
constraints can also be applied to the conceptual schema without affecting the
external schemas or application programs.

• Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes to the internal
schema may be needed because some physical files had to be reorganized (for
example, by creating additional access structures) to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not
have to change the conceptual schema.
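
Physical data independence can be illustrated by adding an access structure and confirming that an existing query is unaffected. The sketch below uses Python's sqlite3 module; the `employee` table, the index name `idx_employee_name`, and the sample rows are hypothetical.

```python
import sqlite3


def build_employees() -> sqlite3.Connection:
    """Create a small EMPLOYEE table with made-up rows and no index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [("Smith", 3000), ("Wong", 4000), ("Zelaya", 2500)],
    )
    conn.commit()
    return conn


def salary_of(conn: sqlite3.Connection, name: str):
    """A query written against the table definition, not against storage details."""
    row = conn.execute(
        "SELECT salary FROM employee WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def add_access_path(conn: sqlite3.Connection) -> None:
    """Internal-level change: create an additional access structure (an index)."""
    conn.execute("CREATE INDEX idx_employee_name ON employee (name)")


if __name__ == "__main__":
    db = build_employees()
    before = salary_of(db, "Wong")
    add_access_path(db)
    after = salary_of(db, "Wong")
    print(before == after)  # the query text and its result are unchanged
```

The index may change how the DBMS locates the row, but the table definition and the query stay the same.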

2.1.1 Categories of Data Models


A data model is a collection of concepts that can be used to describe the structure of a database.
By structure of a database we mean the data types, relationships, and constraints that should
hold on the data. Most data models also include a set of basic operations for specifying
retrievals and updates on the database.

In addition to the basic operations provided by the data model, it is becoming more common to
include concepts in the data model to specify the dynamic aspect or behavior of a database
application. This allows the database designer to specify a set of valid user-defined operations
that are allowed on the database objects. An example of a user-defined operation could be
COMPUTE_GPA, which can be applied to a STUDENT object.
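
A user-defined operation of this kind can be sketched as a method on an object. In the Python example below, the `Student` class, the `grades` layout, and the 4-point grade scale are all assumptions made for illustration; the text names only the operation COMPUTE_GPA and the STUDENT object.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed 4-point scale; real institutions may use a different mapping.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class Student:
    name: str
    # Each entry is (letter grade, credit hours) for one completed course.
    grades: List[Tuple[str, int]] = field(default_factory=list)

    def compute_gpa(self) -> float:
        """User-defined operation: credit-weighted grade point average."""
        total_credits = sum(credits for _, credits in self.grades)
        if total_credits == 0:
            return 0.0  # no completed courses yet
        points = sum(GRADE_POINTS[grade] * credits for grade, credits in self.grades)
        return points / total_credits


if __name__ == "__main__":
    smith = Student("Smith", [("A", 3), ("B", 3)])
    print(smith.compute_gpa())  # 3.5
```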

Databases can be differentiated based on functions and model of the data. A data model
describes a container for storing data, and the process of storing and retrieving data from that
container. The analysis and design of data models has been the basis of the evolution of
databases.
Many data models have been proposed, and we can categorize them according to the types of
concepts they use to describe the database structure.

• High-level or conceptual data models provide concepts that are close to the way many
users perceive data. Conceptual data models use concepts such as entities, attributes, and
relationships. An entity represents a real-world object or concept, such as an employee or
a project, that is described in the database. An attribute represents some property of
interest that further describes an entity, such as the employee's name or salary. A
relationship among two or more entities represents an interaction among the entities.
• Low-level or physical data models provide concepts that describe the details of how
data is stored in the computer. Concepts provided by low-level data models are generally
meant for computer specialists, not for typical end users. Physical data models describe
how data is stored in the computer by representing information such as record formats,
record orderings, and access paths. An access path is a structure that makes the search for
particular database records efficient.
• Representational (or implementation) data models provide concepts that may
be understood by end users but that are not too far removed from the way data is
organized within the computer. Representational data models hide some details of data
storage but can be implemented on a computer system in a direct way.
