Unit 1
1.1 Introduction
Data - Data are known raw facts that can be recorded and processed; once processed and
given meaning, they become information.
Database - A database is a collection of interrelated and organized
data. In general, it is a collection of files (tables).
DBMS - A Database Management System (DBMS) is a collection of interrelated data
(usually called the database) and a set of programs to access, update, and manage those data
(which form the management system).
OR
It is a software package that facilitates the creation and maintenance of a computerized database.
Importance: Database systems have become an essential component of life in modern society,
in that many frequently occurring events trigger the accessing of at least one database:
bibliographic library searches, bank transactions, hotel/airline reservations, grocery store
purchases, online (Web) purchases, etc., etc.
The applications mentioned above are all "traditional" ones for which the use of rigidly-
structured textual and numeric data suffices. Recent advances have led to the application of
database technology to a wider class of data. Examples include multimedia databases
(involving pictures, video clips, and sound messages) and geographic databases (involving
maps, satellite images).
Also, database search techniques are applied by some WWW search engines.
Definitions
The term database is often used, rather loosely, to refer to just about any collection of related
data. E&N say that, in addition to being a collection of related data, a database must have the
following properties:
It represents some aspect of the real (or an imagined) world, called the miniworld or
universe of discourse. Changes to the miniworld are reflected in the database. Imagine,
for example, a UNIVERSITY miniworld concerned with students, courses, course
sections, grades, and course prerequisites.
It is a logically coherent collection of data, to which some meaning can be attached.
(Logical coherency requires, in part, that the database not be self-contradictory.)
It has a purpose: there is an intended group of users and some preconceived applications
that the users are interested in employing.
To summarize: a database has some source (i.e., the miniworld) from which data are derived,
some degree of interaction with events in the represented miniworld (at least insofar as the data
is updated when the state of the miniworld changes), and an audience that is interested in using
it.
An Aside: data vs. information vs. knowledge: Data is the representation of "facts" or
"observations" whereas information refers to the meaning thereof (according to some
interpretation). Knowledge, on the other hand, refers to the ability to use information to achieve
intended ends.
Computerized vs. manual: Not surprisingly (this being a CS course), our concern will be with
computerized database systems, as opposed to manual ones, such as the card catalog-based
systems that were used in libraries in ancient times (i.e., before the year 2000). (Some authors
wouldn't even recognize a non-computerized collection of data as a database, but E&N do.)
Size/Complexity: Databases run the range from being small/simple (e.g., one person's recipe
database) to being huge/complex (e.g., Amazon's database that keeps track of all its products,
customers, and suppliers).
More specifically, a DBMS is a general purpose software system facilitating each of the
following (with respect to a database):
definition: specifying data types (and other constraints to which the data must conform)
and data organization
construction: the process of storing the data on some medium (e.g., magnetic disk) that
is controlled by the DBMS
manipulation: querying, updating, report generation
sharing: allowing multiple users and programs to access the database "simultaneously"
system protection: preventing the database from becoming corrupted when hardware or
software failures occur
security protection: preventing unauthorized or malicious access to the database.
Given all its responsibilities, it is not surprising that a typical DBMS is a complex piece of
software.
A database together with the DBMS software is referred to as a database system.
An Example:
UNIVERSITY database
Among the main ideas illustrated in this example is that each file/relation/table has a set of
named fields/attributes/columns, each of which is specified to be of some data type. (In addition
to a data type, we might put further restrictions upon a field, e.g., the Grade field of GRADE_REPORT must have
a value from the set {'A', 'B', ..., 'F'}.)
The idea is that, of course, each table will be populated with data in the form of
records/tuples/rows, each of which represents some entity (in the miniworld) or some
relationship between entities.
For example, each record in the STUDENT table represents a student. Similarly for the
COURSE and SECTION tables.
On the other hand, each record in GRADE_REPORT represents a relationship between a
student and a section of a course, and each record in PREREQUISITE represents a relationship
between two courses.
Of course, a query/update must be conveyed to the DBMS in a precise way (via the query
language of the DBMS) in order to be processed.
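As a rough illustration of definition, construction, and manipulation being conveyed to a DBMS in its query language, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS. The schema is a cut-down, hypothetical version of the UNIVERSITY example: column names such as Name, Major, SectionId, and Grade, and the sample rows, are assumptions chosen for illustration, not the full schema from the notes.

```python
import sqlite3

def build_university() -> sqlite3.Connection:
    """Create a small, hypothetical version of the UNIVERSITY database in memory."""
    conn = sqlite3.connect(":memory:")
    # Definition: table structure, data types, and a restriction on Grade.
    conn.executescript("""
        CREATE TABLE STUDENT (StudentNumber INTEGER PRIMARY KEY, Name TEXT, Major TEXT);
        CREATE TABLE COURSE (CourseNumber TEXT PRIMARY KEY, CourseName TEXT);
        CREATE TABLE SECTION (SectionId INTEGER PRIMARY KEY, CourseNumber TEXT, Semester TEXT);
        CREATE TABLE GRADE_REPORT (
            StudentNumber INTEGER, SectionId INTEGER,
            Grade TEXT CHECK (Grade IN ('A', 'B', 'C', 'D', 'F'))
        );
        CREATE TABLE PREREQUISITE (CourseNumber TEXT, PrerequisiteNumber TEXT);
    """)
    # Construction: storing (illustrative) data under the DBMS's control.
    conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                     [(17, "Smith", "CS"), (8, "Brown", "CS")])
    conn.executemany("INSERT INTO COURSE VALUES (?, ?)",
                     [("CS1310", "Intro to Computer Science"), ("CS3380", "Database")])
    conn.executemany("INSERT INTO SECTION VALUES (?, ?, ?)",
                     [(112, "CS1310", "Fall"), (135, "CS3380", "Fall")])
    conn.executemany("INSERT INTO GRADE_REPORT VALUES (?, ?, ?)",
                     [(17, 112, "B"), (8, 135, "A")])
    conn.execute("INSERT INTO PREREQUISITE VALUES ('CS3380', 'CS1310')")
    conn.commit()
    return conn

def grades_of(conn: sqlite3.Connection, name: str):
    """Manipulation: a query expressed precisely in SQL, the DBMS's query language."""
    return conn.execute("""
        SELECT c.CourseNumber, g.Grade
        FROM STUDENT s
        JOIN GRADE_REPORT g ON g.StudentNumber = s.StudentNumber
        JOIN SECTION sec ON sec.SectionId = g.SectionId
        JOIN COURSE c ON c.CourseNumber = sec.CourseNumber
        WHERE s.Name = ?
        ORDER BY c.CourseNumber
    """, (name,)).fetchall()

if __name__ == "__main__":
    conn = build_university()
    print(grades_of(conn, "Smith"))  # [('CS1310', 'B')]
```

Note that the GRADE_REPORT rows link a student to a section, and the PREREQUISITE row links two courses, matching the relationships described above.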
As with software in general, developing a new database (or a new application for an existing
database) proceeds in phases, including requirements analysis and various levels of design
(conceptual (e.g., Entity-Relationship Modeling), logical (e.g., relational), and physical (file
structures)).
For example, a university's Registrar's Office would maintain data (and programs) relevant to
student grades and course enrollments. The Bursar's Office would maintain data (and programs)
pertaining to fees owed by students for tuition, room and board, etc. (Most likely, the people in
these offices would not be in direct possession of their data and programs, but rather the
university's Information Technology Department would be responsible for providing services
such as data storage, report generation, and programming.)
One result of this (file-processing) approach is, typically, data redundancy, which not only wastes storage space
but also makes it more difficult to keep changing data items consistent with one another, as a
change to one copy of a data item must be made to all of them (called duplication-of-effort).
Inconsistency results when one (or more) copies of a datum are changed but not others. (E.g., If
you change your address, informing the Registrar's Office should suffice to ensure that your
grades are sent to the right place, but does not guarantee that your next bill will be, as the copy of
your address "owned" by the Bursar's Office might not have been changed.)
In the database approach, a single repository of data is maintained that is used by all the
departments in the organization. (Note that "single repository" is used in the logical sense. In
physical terms, the data may be distributed among various sites, and possibly mirrored.)
1. Self-Description: A database system includes —in addition to the data stored that is of
relevance to the organization— a complete definition/description of the database's
structure and constraints. This meta-data (i.e., data about data) is stored in the so-called
system catalog, which contains a description of the structure of each file, the type and
storage format of each field, and the various constraints on the data (i.e., conditions that
the data must satisfy).
The system catalog is used not only by users (e.g., who need to know the names of tables
and attributes, and sometimes data type information and other things), but also by the
DBMS software, which certainly needs to "know" how the data is structured/organized in
order to interpret it in a manner consistent with that structure. Recall that a DBMS is
general purpose, as opposed to being a specific database application. Hence, the structure
of the data cannot be "hard-coded" in its programs (such as is the case in typical file
processing approaches), but rather must be treated as a "parameter" in some sense.
If, for some reason, we decide to change the structure of the data (e.g., by adding the first
two digits to the YEAR field, in order to make the program Y2K compliant!), every
application in which a description of that file's structure is hard-coded must be changed!
In contrast, DBMS access programs, in most cases, do not require such changes, because
the structure of the data is described (in the system catalog) separately from the programs
that access it and those programs consult the catalog in order to ascertain the structure of
the data (i.e., providing a means by which to determine boundaries between records and
between fields within records) so that they interpret that data properly.
In other words, the DBMS provides a conceptual or logical view of the data to
application programs, so that the underlying implementation may be changed without the
programs being modified. (This is referred to as program-data independence.)
Also, the access paths (e.g., indexes) that exist are listed in the catalog, helping the DBMS
to determine the most efficient way to search for items in response to a query.
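To make the idea of a self-describing system catalog concrete, the following sketch queries SQLite's catalog through Python's sqlite3 module. In SQLite the catalog is the table sqlite_master, and PRAGMA table_info reports each column's declared type; other DBMSs expose similar metadata under different names (for example, INFORMATION_SCHEMA views). The STUDENT table and the index name idx_student_major are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentNumber INTEGER PRIMARY KEY, Name TEXT, Major TEXT)")
# An access path; the catalog records that it exists, so the DBMS can consider it for queries.
conn.execute("CREATE INDEX idx_student_major ON STUDENT (Major)")

def catalog_objects(conn):
    """List (type, name) entries recorded in SQLite's system catalog, sqlite_master."""
    return conn.execute("SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()

def column_descriptions(conn, table):
    """Return (column name, declared type) pairs as described by the catalog."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    # The table name is a trusted constant here, not user input.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

print(catalog_objects(conn))
print(column_descriptions(conn, "STUDENT"))
```

A general-purpose program can use this metadata to discover a table's structure at run time instead of having it hard-coded.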
Data Abstraction:
A data model is used to hide storage details and present the users with a conceptual view of the
database.
Programs refer to the data model constructs rather than data storage details.
Note: In fairness to COBOL, it should be pointed out that it has a COPY feature that
allows different application programs to make use of the same file descriptor stored in a
"library". This provides some degree of program-data independence, but not nearly as
much as a good DBMS does. End of note.
Example by which to illustrate this concept: Suppose that you are given the task of
developing a program that displays the contents of a particular data file. Specifically,
each record should be displayed as follows:
Record #i:
value of first field
value of second field
...
...
value of last field
To keep things very simple, suppose that the file in question has fixed-length records of
57 bytes with six fixed-length fields of lengths 12, 4, 17, 2, 15, and 7 bytes, respectively,
all of which are ASCII strings. Developing such a program would not be difficult.
However, the obvious solution would be tailored specifically for a file having the
particular structure described here and would be of no use for a file with a different
structure.
Now suppose that the problem is generalized to say that the program you are to develop
must be able to display any file having fixed-length records with fixed-length fields that
are ASCII strings. Impossible, you say? Well, yes, unless the program has the ability to
access a description of the file's structure (i.e., lengths of its records and the fields
therein), in which case the problem is not hard at all. This illustrates the power of
metadata, i.e., data describing other data.
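The generalized display program described above can be sketched in a few lines once the field lengths are supplied as metadata. This is a minimal sketch, assuming records are raw ASCII bytes and that short values are padded with trailing spaces (the padding convention is an assumption); the function name format_records and the sample values are hypothetical.

```python
from typing import List, Sequence

def format_records(data: bytes, field_lengths: Sequence[int]) -> List[str]:
    """Render every fixed-length record in `data`, using `field_lengths` as metadata.

    Nothing about a particular file is hard-coded: the record length is derived
    from the field lengths, so the same function handles any such file.
    """
    record_length = sum(field_lengths)
    if record_length == 0 or len(data) % record_length != 0:
        raise ValueError("data is not a whole number of records")
    lines = []
    for i in range(len(data) // record_length):
        record = data[i * record_length:(i + 1) * record_length]
        lines.append(f"Record #{i + 1}:")
        offset = 0
        for length in field_lengths:
            # Assumes space padding; strip it for display.
            lines.append(record[offset:offset + length].decode("ascii").rstrip())
            offset += length
    return lines

if __name__ == "__main__":
    # The 57-byte layout from the text: six fields of 12, 4, 17, 2, 15 and 7 bytes.
    layout = [12, 4, 17, 2, 15, 7]
    values = ["Smith", "17", "Computer Science", "CS", "Houston", "A"]
    sample = b"".join(v.ljust(n).encode("ascii") for v, n in zip(values, layout))
    print("\n".join(format_records(sample, layout)))
```

Only the `layout` list describes the file; handing the function a different list is enough to display a file with a different structure.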
A view designed for an academic advisor might give the appearance that the data is
structured to point out the prerequisites of each course.
A good DBMS has facilities for defining multiple views. This is not only convenient for
users, but also addresses security issues of data access. (E.g., The Registrar's Office view
should not provide any means to access financial data.)
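A minimal sketch of a view for the Registrar's Office, assuming a hypothetical STUDENT table that mixes academic data with a financial column, TuitionOwed. SQLite is used only because it ships with Python. It has no user accounts, so the view here only shapes what is visible; in a multi-user DBMS the security benefit comes from granting the office privileges on the view rather than on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base table mixing academic and financial data.
    CREATE TABLE STUDENT (StudentNumber INTEGER PRIMARY KEY, Name TEXT,
                          Major TEXT, TuitionOwed REAL);
    INSERT INTO STUDENT VALUES (17, 'Smith', 'CS', 4200.0), (8, 'Brown', 'CS', 0.0);

    -- View for the Registrar's Office: the financial column is not exposed.
    CREATE VIEW REGISTRAR_STUDENT AS
        SELECT StudentNumber, Name, Major FROM STUDENT;
""")

def view_columns(conn, view):
    """Column names visible to a user of `view` (the view name is a trusted constant)."""
    cursor = conn.execute(f"SELECT * FROM {view} LIMIT 0")
    return [d[0] for d in cursor.description]

print(view_columns(conn, "REGISTRAR_STUDENT"))  # ['StudentNumber', 'Name', 'Major']
```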
4. Data Sharing and Multi-user Transaction Processing: As you learned about (or will)
in the OS course, the simultaneous access of computer resources by multiple
users/processes is a major source of complexity. The same is true for multi-user DBMS's.
Arising from this is the need for concurrency control, which is supposed to ensure that
several users trying to update the same data do so in a "controlled" manner so that the
results of the updates are as though they were done in some sequential order (rather than
interleaved, which could result in data being incorrect).
This gives rise to the concept of a transaction, which is a process that makes one or more
accesses to a database and which must have the appearance of executing in isolation from
all other transactions (even ones that access the same data at the "same time") and of
being atomic (in the sense that, if the system crashes in the middle of its execution, the
database contents must be as though it did not execute at all).
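The atomicity property can be observed with a small sketch using Python's sqlite3 module. Assumptions: a hypothetical ACCOUNT table whose CHECK constraint forbids negative balances, and a failing constraint standing in for a crash in the middle of the transaction. The credit runs before the debit, so when the debit fails, the already-executed credit must be undone for the database to look as though the transaction never ran.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (Id TEXT PRIMARY KEY, "
             "Balance INTEGER NOT NULL CHECK (Balance >= 0))")
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE ACCOUNT SET Balance = Balance + ? WHERE Id = ?", (amount, dst))
            conn.execute("UPDATE ACCOUNT SET Balance = Balance - ? WHERE Id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was rolled back along with the failed debit

def balances(conn):
    return dict(conn.execute("SELECT Id, Balance FROM ACCOUNT"))

print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```

Isolation between concurrent transactions is handled by the DBMS's concurrency control and is not shown in this single-connection sketch.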
These apply to "large" databases, not "personal" databases that are defined, constructed, and used
by a single person via, say, Microsoft Access.
1. Database Administrator (DBA): This is the chief administrator, who oversees and
manages the database system (including the data and software). Duties include
authorizing users to access the database, coordinating/monitoring its use, acquiring
hardware/software for upgrades, etc. In large organizations, the DBA might have a
support staff.
2. Database Designers: They are responsible for identifying the data to be stored and for
choosing an appropriate way to organize it. They also define views for different
categories of users. The final design must be able to support the requirements of all the
user sub-groups.
3. End Users: These are persons who access the database for querying, updating, and
report generation. They are the main reason for the database's existence!
o Casual end users: use database occasionally, needing different information each
time; use query language to specify their requests; typically middle- or high-level
managers.
o Naive/Parametric end users: Typically the biggest group of users; frequently
query/update the database using standard canned transactions that have been
carefully programmed and tested in advance. Examples:
bank tellers check account balances, post withdrawals/deposits
reservation clerks for airlines, hotels, etc., check availability of seats/rooms and make
reservations.
shipping clerks (e.g., at UPS) who use buttons, bar code scanners, etc., to update status of
in-transit packages.
o Sophisticated end users: engineers, scientists, business analysts who implement
their own applications to meet their complex needs.
o Stand-alone users: Use "personal" databases, possibly employing a special-
purpose (e.g., financial) software package. Mostly maintain personal databases
using ready-to-use packaged applications.
o An example is a tax program user that creates its own internal database.
o Another example is maintaining an address book
4. System Analysts, Application Programmers, Software Engineers:
o System Analysts: determine needs of end users, especially naive and parametric
users, and develop specifications for canned transactions that meet these needs.
o Application Programmers: Implement, test, document, and maintain programs
that satisfy the specifications mentioned above.
Capabilities/Advantages of DBMS's
On the other hand, redundancy can be used to improve performance of queries. Indexes,
for example, are entirely redundant, but help the DBMS in processing queries more
quickly.
The query processing and optimization module is responsible for choosing an efficient
query execution plan for each query submitted to the system. (See Chapter 15.)
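The sketch below shows controlled redundancy in action: an index duplicates data already in the table, yet the optimizer switches to it when choosing a plan. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table, the index name idx_student_major, and the generated rows are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentNumber INTEGER PRIMARY KEY, Name TEXT, Major TEXT)")
conn.executemany("INSERT INTO STUDENT (Name, Major) VALUES (?, ?)",
                 [(f"student{i}", "CS" if i % 2 else "MATH") for i in range(1000)])

QUERY = "SELECT Name FROM STUDENT WHERE Major = 'CS'"

def plan(conn, sql):
    """Return the optimizer's plan steps; the human-readable detail is the last column."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(conn, QUERY)  # typically a full scan of STUDENT
# The index duplicates (Major, rowid) pairs already stored in STUDENT: redundant by design.
conn.execute("CREATE INDEX idx_student_major ON STUDENT (Major)")
after = plan(conn, QUERY)   # typically a search using idx_student_major

print(before)
print(after)
```

The query text itself never changes; only the plan the DBMS picks does.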
5. Providing Backup and Recovery: The subsystem having this responsibility ensures that
recovery is possible in the case of a system crash during execution of one or more
transactions.
6. Providing Multiple User Interfaces: For example, query languages for casual users,
programming language interfaces for application programmers, forms and/or command
codes for parametric users, menu-driven interfaces for stand-alone users.
7. Representing Complex Relationships Among Data: A DBMS should have the
capability to represent such relationships and to retrieve related data quickly.
8. Enforcing Integrity Constraints: Most database applications are such that the semantics
(i.e., meaning) of the data require that it satisfy certain restrictions in order to make sense.
Perhaps the most fundamental constraint on a data item is its data type, which specifies the
universe of values from which its value may be drawn. (E.g., a Grade field could be
defined to be of type Grade_Type, which, say, we have defined as including precisely the
values in the set { "A", "A-", "B+", ..., "F" }.)
Another kind of constraint is referential integrity, which says that if the database includes
an entity that refers to another one, the latter entity must exist in the database. For
example, if (R56547, CIL102) is a tuple in the Enrolled_In relation, indicating that a
student with ID R56547 is taking a course with ID CIL102, there must be a tuple in the
Student relation corresponding to a student with that ID.
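Both kinds of constraint can be declared once and enforced by the DBMS itself, as in this sketch with Python's sqlite3 module. Assumptions: the Course table and the exact list of grade values are illustrative (the notes elide part of the set), and SQLite checks foreign keys only after PRAGMA foreign_keys = ON. The helper try_insert is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Student (StudentId TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Course (CourseId TEXT PRIMARY KEY, Title TEXT);
    CREATE TABLE Enrolled_In (
        StudentId TEXT REFERENCES Student (StudentId),   -- referential integrity
        CourseId  TEXT REFERENCES Course (CourseId),
        -- Domain constraint; this grade list is an assumed example of Grade_Type.
        Grade TEXT CHECK (Grade IN ('A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D', 'F'))
    );
    INSERT INTO Course VALUES ('CIL102', 'Hypothetical course title');
""")

def try_insert(sql, params):
    """Return None if the DBMS accepts the row, or its error message if it rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

enroll = "INSERT INTO Enrolled_In VALUES (?, ?, ?)"
print(try_insert(enroll, ("R56547", "CIL102", "A")))  # rejected: no such Student yet
with conn:
    conn.execute("INSERT INTO Student VALUES ('R56547', 'Hypothetical student')")
print(try_insert(enroll, ("R56547", "CIL102", "Z")))  # rejected: Grade outside the domain
print(try_insert(enroll, ("R56547", "CIL102", "A")))  # None: accepted
```

No application program has to repeat these checks; every insert, from any program, passes through the same declared rules.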
9. Permitting Inferencing and Actions Via Rules: In a deductive database system, one
may specify declarative rules that allow the database to infer new data! E.g., figure out
which students are on academic probation. Such capabilities would take the place of
application programs that would be used to ascertain such information otherwise.
Active database systems go one step further by allowing "active rules" that can be used to
initiate actions automatically.
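SQL triggers are one common way relational DBMSs provide active rules in the event-condition-action style. The sketch below, using Python's sqlite3 module, records a probation notice automatically when a student's GPA falls below a threshold. The tables, the trigger name probation_rule, and the 2.0 threshold are hypothetical; a true deductive system (e.g., Datalog-style rules) works differently and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentId TEXT PRIMARY KEY, GPA REAL);
    CREATE TABLE Probation_Notice (StudentId TEXT, GPA REAL);

    -- Event: a GPA is updated. Condition: it drops below 2.0 (assumed threshold).
    -- Action: record a notice, with no application program involved.
    CREATE TRIGGER probation_rule
    AFTER UPDATE OF GPA ON Student
    FOR EACH ROW WHEN NEW.GPA < 2.0 AND OLD.GPA >= 2.0
    BEGIN
        INSERT INTO Probation_Notice VALUES (NEW.StudentId, NEW.GPA);
    END;

    INSERT INTO Student VALUES ('R56547', 3.1), ('R10001', 2.4);
""")

with conn:
    conn.execute("UPDATE Student SET GPA = 1.8 WHERE StudentId = 'R10001'")
    conn.execute("UPDATE Student SET GPA = 2.9 WHERE StudentId = 'R56547'")

print(conn.execute("SELECT * FROM Probation_Notice").fetchall())  # [('R10001', 1.8)]
```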
Object-Oriented Database Management Systems (OODBMSs) were introduced in late 1980s and
early 1990s to cater to the need of complex data processing in CAD and other applications.
Their use has not taken off much.
Many relational DBMSs have incorporated object database concepts, leading to a new category
called object-relational DBMSs (ORDBMSs)
Extended relational systems add further capabilities (e.g. for multimedia data, XML, and other
data types)
Relational DBMS Products emerged in the 1980s
Data on the Web and E-commerce Applications:
Web contains data in HTML (HyperText Markup Language) with links
among pages.
This has given rise to a new set of applications, and E-commerce is using
new standards like XML (eXtensible Markup Language).
Script programming languages such as PHP and JavaScript allow
generation of dynamic Web pages that are partially generated from a
database. They also allow database updates through Web pages.
New functionality is being added to DBMSs in the following areas:
Scientific Applications
XML (eXtensible Markup Language)
Image Storage and Management
Audio and Video data management
Data Warehousing and Data Mining
Spatial data management
Time Series and Historical Data Management
The above gives rise to new research and development in incorporating
new data types, complex data structures, new operations, and storage and indexing
schemes in database systems.
A DBMS is general-purpose software that facilitates the following:
1. Defining: Specifying data types and structures, and constraints for data to be
stored.
2. Constructing: Storing data in a storage medium.
3. Manipulating: Involves querying, updating and generating reports.
4. Sharing: Allowing multiple users and programs to access data simultaneously.
Examples of DBMSs: Access, dBase, FileMaker Pro, FoxBASE, Oracle, etc.
1. To provide a way to store and retrieve database information that is both convenient and
efficient.
2. To manage large and small bodies of information. It involves defining structures for
storage of information and providing mechanism for manipulation of information.
3. It should ensure safety of information stored, despite system crashes or attempts at
unauthorized access.
4. If data are to be shared among several users, then the system should avoid possible
anomalous results.
Various views of Data
Data abstraction:
It can be summed up as follows.
1. When the DBMS hides certain details of how data is stored and maintained, it provides
what is called the abstract view of data.
2. This is to simplify user-interaction with the system.
3. Complexity (of data and data structure) is hidden from users through several levels of
abstraction.
Features:
a) It is the next-higher level of abstraction. Here the whole database is divided into small,
simple structures.
b) Users at this level need not be aware of the physical-level complexity used to
implement the simple structures.
c) Here the aim is ease of use.
d) Generally, database administrators (DBAs) work at the logical level of abstraction.
3. View level: Application programs hide details of data types. Views can also hide
information (e.g., salary) for security purposes.
Features:
a) It is the highest level of abstraction.
b) It describes only a part of the whole database for a particular group of users.
c) This view hides all complexity.
d) It exists only to simplify user interaction with system.
e) The system may provide many views for the whole system.
Data Models
A data model is a collection of concepts that can be used to describe the structure of a
database and provides the necessary means to achieve this abstraction; here, the structure of a
database means the data types, relationships, and constraints that should hold on the data.
A data model is thus a collection of conceptual tools for describing data, data relationships,
data semantics, and consistency constraints. The various data models that have been proposed
fall into three different groups: object-based logical models, record-based logical models, and
physical models.
Object-Based Logical Models: They are used in describing data at the logical and view
levels. They are characterized by the fact that they provide fairly flexible structuring
capabilities and allow data constraints to be specified explicitly. There are many different
models and more are likely to come. Several of the more widely known ones are:
·The E-R model
·The object-oriented model
·The semantic data model
·The functional data model
The E-R Model
The E-R data model is based on a perception of a real world that consists of a collection
of basic objects, called entities, and of relationships among these objects.
The overall logical structure of a database can be expressed graphically by an E-R
diagram, which is built up from the following components:
· Rectangles, which represent entity sets
· Ellipses, which represent attributes
· Diamonds, which represent relationships among entity sets
· Lines, which link attributes to entity sets and entity sets to relationships.
E.g., suppose we have two entities, customer and account; these two entities
can be modeled as follows:
The Object-Oriented Model
Like the E-R model the object-oriented model is based on a collection of objects. An object
contains values stored in instance variables within the object. An object also
contains bodies of code that operate on the object. These bodies of code are called
methods.
Classes: A class is a collection of objects that consist of the same types of values and the
same methods.
E.g. account number & balance are instance variables; pay-interest is a method that uses
the above two variables and adds interest to the balance.
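The account example maps directly onto a class in an object-oriented language. A minimal Python sketch, assuming the interest rate is passed in as a parameter (the notes do not say where the rate comes from) and using a hypothetical account number:

```python
from dataclasses import dataclass

@dataclass
class Account:
    """All Account objects share the same instance variables and the same methods."""
    account_number: str  # instance variable
    balance: float       # instance variable

    def pay_interest(self, rate: float) -> None:
        """Method: uses the instance variables to add interest to the balance."""
        self.balance += self.balance * rate

acct = Account("A-101", 500.0)
acct.pay_interest(0.05)
print(acct.balance)  # 525.0
```

Code and data travel together in the object, which is the key difference from the E-R model, where entities carry only attribute values.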
Semantic Models
These include the extended relational, the semantic network and the functional models.
They are characterized by their provision of richer facilities for capturing the meaning
of data objects and hence of maintaining database integrity. Systems based on these
models exist in prototype form at the time of writing and will begin to filter through over the
next decade.
a. System complexity – In a network model, data are accessed one record at a time. This
makes it essential for the database designers, administrators, and programmers to be
familiar with the internal data structures to gain access to the data. Therefore, a user-
friendly database management system cannot be created using the network model.
b. Lack of structural independence – Making structural modifications to the database is
very difficult in the network database model, as the data access method is navigational.
Any changes made to the database structure require the application programs to be
modified before they can access data. Though the network database model achieves data
independence, it still fails to achieve structural independence.
1. Implementation Complexity – Although the hierarchical database model is conceptually
simple and easy to design, it is quite complex to implement. The database designers should
have very good knowledge of the physical data storage characteristics.
2. Database management problems – If you make any changes in the database
structure
We can explain the overall structure of DBMS/System structure and its components by
the diagram given below:
Figure: System Structure
1. Database systems are partitioned into modules for different functions. Some functions
(e.g. file systems) may be provided by the operating system.
2. Broadly the functional components of a database system are:
Advantages:
• Simpler to use
• Less expensive
• Fits the needs of many small businesses and home users
• Popular FMSs are packaged along with the operating systems of personal computers
(e.g., Microsoft Cardfile and Microsoft Works)
• Good for database solutions for handheld devices such as the Palm Pilot
Disadvantages:
• Typically does not support multi-user access
• Limited to smaller databases
• Limited functionality (i.e., no support for complicated transactions, recovery, etc.)
• Decentralization of data
• Redundancy and integrity issues
Disadvantages of File Processing System:
1. Data Redundancy – Since different programmers create the files and application
programs over a long period, the various files are likely to have different formats, and the
programs may be written in several programming languages. Moreover, the same
information may be duplicated in several files; this duplication of data over several files is
known as data redundancy. Eg. The address and telephone number of a particular
customer may appear in a file that consists of savings-account records and in a file that
consists of checking-account records. This redundancy leads to higher storage and access
cost, and it also leads to inconsistency, discussed next.
2. Data Inconsistency – The various copies of same data may no longer agree i.e.
various copies of the same data may contain different information. Eg. A changed
customer address may be reflected in savings-account records but not elsewhere in the
system.
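A toy illustration of how redundancy turns into inconsistency, with Python dictionaries standing in for two independently maintained files. The customer ID, names, and addresses are hypothetical; the point is only that a program which knows about one file updates one copy.

```python
# Two independently maintained "files", each keeping its own copy of the address.
savings_file = {"C-101": {"name": "Hayes", "address": "Main St", "balance": 500}}
checking_file = {"C-101": {"name": "Hayes", "address": "Main St", "balance": 200}}

def change_address_in_savings(customer_id, new_address):
    """An application program that only knows about the savings file."""
    savings_file[customer_id]["address"] = new_address

change_address_in_savings("C-101", "North St")
consistent = savings_file["C-101"]["address"] == checking_file["C-101"]["address"]
print(consistent)  # False: the two copies of the same fact now disagree
```

Storing the address once, in a single customer record that both account files refer to, removes the second copy and with it this kind of inconsistency.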
5. Integrity problems – The data stored in the database must satisfy certain types of
consistency constraints. Eg. The balance of a bank account may never fall below a
prescribed amount (say, ICICI 2500/- ). Developers enforce these constraints in the
system by adding appropriate code in the various application programs. However, when
new constraints are added, it is difficult to change the programs to enforce them. The
problem is compounded when constraints involve several data items from different files.
6. Atomicity problems – A computer system, like any other mechanical or electrical device, is
subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to
the consistent state that existed prior to the failure. Eg. Before system
8. Security Problems – Not every user of the database system should be able to access
all the data. Eg. In a bank system, payroll personnel need to see only that part of the
database that has information about the various bank employees. They do not need access
to information about customer accounts. But, since application programs are added to the
system in an ad hoc manner, enforcing such security constraints is difficult.
Difference between DBMS and File-processing system:
6. DBMS: Several users can access data at the same time, i.e., concurrently, without problems.
   File-processing system: Concurrent accesses may cause problems such as inconsistencies.
7. DBMS: Security features can be enabled very easily.
   File-processing system: It may be difficult to enforce security features.
• Minimal Data Redundancy - Since the whole data resides in one central database, the
various programs in the application can access data in different data files. Hence data
present in one file need not be duplicated in another.
This reduces data redundancy. However, this does not mean all redundancy can be
eliminated. There could be business or technical reasons for having some amount of
redundancy. Any such redundancy should be carefully controlled and the DBMS should
be aware of it.
• Data Consistency - Reduced data redundancy leads to better data consistency.
• Application Development Ease - The application programmer need not build the
functions for handling issues like concurrent access, security, data integrity, etc. The
programmer only needs to implement the application business rules. This brings in
application development ease. Adding additional functional modules is also easier than in
file-based systems.
• Better Controls - Better controls can be achieved due to the centralized nature of the
system.
• Data Independence - The architecture of the DBMS can be viewed as a 3-level system
comprising the following:
- The internal or the physical level where the data resides.
- The conceptual level which is the level of the DBMS functions
- The external level which is the level of the application programs or the end user.
Data Independence is isolating an upper level from the changes in the organization or
structure of a lower level. For example, if changes in the file organization of a data file do
not demand changes in the functions of the DBMS or in the application programs,
data independence is achieved. Thus Data Independence can be defined as immunity of
applications to change in physical representation and access technique. The provision of
data independence is a major objective for database systems.
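Data independence can be checked in a small experiment: run the same application query before and after changing how the data is stored and structured. This sketch uses Python's sqlite3 module with a hypothetical CUSTOMER table and sample rows; the new index is a change in access technique (physical level), and the added Phone column is a structural change the application never mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CustomerId TEXT PRIMARY KEY, Name TEXT, City TEXT);
    INSERT INTO CUSTOMER VALUES ('C-101', 'Hayes', 'Harrison'), ('C-102', 'Jones', 'Brooklyn');
""")

def application_query(conn):
    """The application program: it names only logical constructs (table and columns)."""
    return conn.execute(
        "SELECT Name FROM CUSTOMER WHERE City = ? ORDER BY Name", ("Harrison",)
    ).fetchall()

before = application_query(conn)

# Change in access technique: a new index. The application code is untouched.
conn.execute("CREATE INDEX idx_customer_city ON CUSTOMER (City)")
# Change in structure: a new column that this application never refers to.
conn.execute("ALTER TABLE CUSTOMER ADD COLUMN Phone TEXT")

after = application_query(conn)
print(before == after)  # True
```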
• Reduced Maintenance - Maintenance is less and easy, again, due to the centralized
nature of the system.
Centralized DBMS:
• User can still connect through a remote terminal – however, all processing is done
at centralized site.
As prices of hardware declined, most users replaced their terminals with PCs and
workstations. At first, database systems used these computers in the same way they had
used display terminals, so the DBMS itself was still a centralized DBMS in which all the
DBMS functionality, application program execution, and user interface processing were
carried out on one machine.
Basic 2-tier Client-Server Architectures
Clients
• Provide appropriate interfaces through a client software module to access and
utilize the various server resources.
• Clients may be diskless machines or PCs or Workstations with disks with only the
client software installed.
• Connected to the servers via some form of a network.
• (LAN: local area network, wireless network, etc.)
DBMS Server
• A client program may connect to several DBMSs, sometimes called the data
sources.
• In general, data sources can be files or other non-DBMS software that manages
data. Other variations of clients are possible: e.g., in some object DBMSs, more
functionality is transferred to clients including data dictionary functions,
optimization and recovery across multiple servers, etc.