
UNIT-I: INTRODUCTION

History of Database Systems, Introduction to DBMS, Database System Applications, Database
Systems Versus File Systems, View of Data, Data Models, Database Languages - DDL and DML
Commands and Examples of Basic SQL Queries, Database Users and Administrators, Transaction
Management, Database System Structure, Application Architectures.

History of Database Systems


Information processing has driven the growth of computers since the earliest
days of commercial computing. In fact, automation of data processing tasks
predates computers. Punched cards, invented by Herman Hollerith, were used at
the very beginning of the twentieth century to record U.S. census data, and
mechanical systems were used to process the cards and tabulate results. Punched
cards were later widely used as a means of entering data into computers.

Techniques for data storage and processing have evolved over the years:
1950s and early 1960s: Magnetic tapes were developed for data storage. Data
processing tasks such as payroll were automated, with data stored on tapes.
Processing of data consisted of reading data from one or more tapes and writing
data to a new tape. Data could also be input from punched card decks, and
output to printers.
For example, salary raises were processed by entering the raises on punched
cards and reading the punched card deck in synchronization with a tape
containing the master salary details. The records had to be in the same sorted
order. The salary raises would be added to the salary read from the master tape,
and written to a new tape; the new tape would become the new master tape. Tapes
(and card decks) could be read only sequentially, and data sizes were much larger
than main memory.

Late 1960s and 1970s: Widespread use of hard disks in the late 1960s changed
the scenario for data processing greatly, since hard disks allowed direct access to
data.
With disks, network and hierarchical databases could be created that allowed data
structures such as lists and trees to be stored on disk. Programmers could
construct and manipulate these data structures.
A landmark paper by Codd [1970] defined the relational model. The simplicity of
the relational model and the possibility of hiding implementation details
completely from the programmer were enticing indeed. Codd later won the
prestigious Association for Computing Machinery (ACM) Turing Award for his work.

1980s: Although academically interesting, the relational model was not used in
practice initially, because of its perceived performance disadvantages; relational
databases could not match the performance of existing network and hierarchical
databases.
That changed with System R, a groundbreaking project at IBM Research that
developed techniques for the construction of an efficient relational database
system. Excellent overviews of System R are provided by Astrahan et al. [1976]
and Chamberlin et al. [1981]. The fully functional System R prototype led to IBM’s
first relational database product, SQL/DS. At the same time, the Ingres system
was being developed at the University of California at Berkeley.

It led to a commercial product of the same name. Initial commercial relational
database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a
major role in advancing techniques for efficient processing of declarative queries.

The 1980s also saw much research on parallel and distributed databases, as well
as initial work on object-oriented databases.

Early 1990s: The SQL language was designed primarily for decision support
applications, which are query-intensive.

Many database vendors introduced parallel database products in this period.


Database vendors also began to add object-relational support to their databases.

1990s: The major event of the 1990s was the explosive growth of the World Wide
Web. Database systems also had to support Web interfaces to data.

2000s: The first half of the 2000s saw the emergence of XML and the associated
query language XQuery as a new database technology. Although XML is widely
used for data exchange, as well as for storing certain complex data types,
relational databases still form the core of a vast majority of large-scale database
applications.

This period also saw significant growth in the use of open-source database
systems, particularly PostgreSQL and MySQL.

Introduction to DBMS
A database-management system (DBMS) is a collection of interrelated data and a
set of programs to access those data. The collection of data, usually referred to as
the database, contains information relevant to an enterprise. The primary goal of
a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.
Database systems are designed to manage large bodies of information.
Management of data involves both defining structures for storage of information
and providing mechanisms for the manipulation of information. In addition, the
database system must ensure the safety of the information stored, despite system
crashes or attempts at unauthorized access. If data are to be shared among
several users, the system must avoid possible anomalous results.
Database System Applications

Databases are widely used. Here are some representative applications:


Enterprise Information

∑ Sales: For customer, product, and purchase information.


∑ Accounting: For payments, receipts, account balances, assets and other
accounting information.
∑ Human resources: For information about employees, salaries, payroll taxes,
and benefits, and for generation of paychecks.
∑ Manufacturing: For management of the supply chain and for tracking
production of items in factories, inventories of items in warehouses and
stores, and orders for items.
∑ Online retailers: For sales data noted above plus online order tracking,
generation of recommendation lists, and maintenance of online product
evaluations.
Banking and Finance
∑ Banking: For customer information, accounts, loans, and banking
transactions.
∑ Credit card transactions: For purchases on credit cards and generation of
monthly statements.
∑ Finance: For storing information about holdings, sales, and purchases of
financial instruments such as stocks and bonds; also for storing real-time
market data to enable online trading by customers and automated trading
by the firm.
Universities: For student information, course registrations, and grades (in
addition to standard enterprise information such as human resources and
accounting).
Airlines: For reservations and schedule information. Airlines were among the first
to use databases in a geographically distributed manner.
Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about the
communication networks.

Database Systems versus File Systems

The conventional file processing system suffers from the following shortcomings.
1. Data Redundancy
2. Data Inconsistency
3. Difficulty in Accessing Data
4. Data Isolation
5. Integrity Problems
6. Atomicity Problem
7. Concurrent Access anomalies
8. Security Problems

∑ Data Redundancy: Data redundancy means that the same information is
duplicated in several files, wasting storage space and effort.
∑ Data Inconsistency: Data inconsistency means that different copies of the
same data do not match; that is, different versions of the same basic data
exist. This occurs when an update operation fails to update the same data
stored at different places.
∑ Example: Address information of a customer is recorded differently
in different files.
∑ Difficulty in Accessing Data: It is not easy to retrieve information using a
conventional file processing system; convenient and efficient information
retrieval is almost impossible.
∑ Data Isolation: Data are scattered in various files, and the files may be in
different formats, so writing new application programs to retrieve data is
difficult.
∑ Integrity Problems: The data values may need to satisfy some integrity
constraints. For example, the value of a balance field may be required to be
greater than 5000. In a file processing system we have to enforce this through
program code, but in a database we can declare the integrity constraints along
with the data definition itself.
∑ Atomicity Problem: It is difficult to ensure atomicity in a file processing
system. For example, consider transferring $100 from account A to account B.
If a failure occurs during execution, $100 could be deducted from account A
but never credited to account B.
∑ Concurrent Access Anomalies: If multiple users update the same data
simultaneously, the data can be left in an inconsistent state. In a file
processing system it is very difficult to prevent this through program code.
∑ Security Problems: Enforcing Security Constraints in file processing
system is very difficult as the application programs are added to the system
in an ad-hoc manner.
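The integrity and atomicity shortcomings above can be contrasted with the database approach, where a constraint is declared with the data definition itself and the system enforces it on every update. A minimal sketch using SQLite from Python; the account table and the 5000 threshold are illustrative, mirroring the balance example above:

```python
import sqlite3

# In-memory database; the account table and the >5000 rule are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acct_no TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance > 5000)"  # constraint declared with the definition
    ")"
)
conn.execute("INSERT INTO account VALUES ('A-101', 9000)")  # satisfies the constraint

rejected = False
try:
    conn.execute("INSERT INTO account VALUES ('A-102', 100)")  # violates the CHECK
except sqlite3.IntegrityError:
    rejected = True  # the DBMS itself rejects the row; no application code needed
print("second insert rejected:", rejected)
```

In a file processing system, every application program touching this data would have to repeat the balance check; here it is stated once, in the schema.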
Advantages of DBMS:

Reduced data redundancy: In a conventional file processing system, every user
group maintains its own files for handling its data. This may lead to:
¸ duplication of the same data in different files;
¸ wastage of storage space, since duplicated data is stored;
¸ errors caused by updating the same data in different files;
¸ time wasted entering the same data again and again;
¸ needless use of computer resources.
Elimination of inconsistency:
In a file processing system, information is duplicated throughout the system, so
changes made in one file may need to be carried over to other files; failing to
do so leads to inconsistent data. Eliminating this duplication of data across
multiple files eliminates the inconsistency.
Better data integration:
Data integration involves combining data residing in different sources and
providing users with a unified view of these data. Since data in the database
approach is centralized and used by a number of users at a time, it is essential
to enforce integrity constraints.
Data independence from applications programs:
There are two types of data independence:
1. Logical data independence: This is the capacity to change the
conceptual schema without having to change external schema or application
programs. We can change the conceptual schema to expand the database or to
reduce the database.
2. Physical data independence: This is the capacity to change the internal
schema without having to change the conceptual or external schema.
Improved data access to users:
The DBMS makes it possible to produce quick answers to ad hoc queries. From a
database perspective, a query is a specific request issued to the DBMS for data
manipulation (e.g., reading or updating the data). An ad hoc query is a
spur-of-the-moment question. The DBMS sends back an answer (the query result
set) to the application.
Improved data security:
Security means the protection of data against unauthorized disclosure. In
conventional systems, applications are developed in an ad hoc manner, and
different systems of an organization access different components of the
operational data; in such an environment, enforcing security can be quite
difficult.
Setting up of a database makes it easier to enforce security restrictions since data
is now centralized. Different checks can be established for each type of access
(retrieve, modify, delete etc.) to each piece of information in the database.
Improved data sharing:
DBMS allows data to be shared by two or more users that means the same data
can be accessed by multiple users at the same time.
Improved decision making:
Better managed data and improved data access make it possible to generate better
quality information, on which better decisions are based.
Concurrent access and crash recovery:
A DBMS schedules concurrent access to the data in such a manner that each user
can think of the data as being accessed by only one user at a time. Further,
the DBMS protects users from the effects of system failures.
Reduced application development time:
The DBMS supports many important functions that are common to many
applications. Hence it reduces the application development time.

View of Data
A database system is a collection of interrelated data and a set of programs that
allow users to access and modify these data. A major purpose of a database
system is to provide users with an abstract view of the data. That is, the system
hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. Since many
database-system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users’ interactions
with the system:
∑ Physical level. The lowest level of abstraction describes how the data are
actually stored. The physical level describes complex low-level data
structures in detail, e.g., indexes, B-trees, and hashing.
∑ Logical level. The next-higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data.
∑ View level. The highest level of abstraction describes only part of the entire
database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The
system may provide many views for the same database.
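The view level can be sketched with a SQL view that exposes only part of the database to a user. A minimal illustration using SQLite from Python; the employee table, its columns, and its data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 40000), (2, "Ravi", 55000)])

# View level: expose only part of the database (the salary column is hidden).
conn.execute("CREATE VIEW emp_public AS SELECT id, name FROM employee")
rows = conn.execute("SELECT * FROM emp_public ORDER BY id").fetchall()
print(rows)  # only id and name are visible through this view
```

The same database can back many such views, one per class of user, which is exactly the point of the view level of abstraction.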
Instances and Schemas
Databases change over time as information is inserted and deleted. The collection
of information stored in the database at a particular moment is called an instance
of the database. The overall design of the database is called the database schema.
Schemas are changed infrequently, if at all.
The concept of database schemas and instances can be understood by analogy to
a program written in a programming language. A database schema corresponds to
the variable declarations (along with associated type definitions) in a program.
Each variable has a particular value at a given instant. The values of the variables
in a program at a point in time correspond to an instance of a database schema.

Database systems have several schemas, partitioned according to the levels of
abstraction. The physical schema describes the database design at the physical
level, while the logical schema describes the database design at the logical
level. A database may also have several schemas at the view level, sometimes
called subschemas, that describe different views of the database.
Data Independence
The ability to modify a scheme definition in one level without affecting a scheme
definition in a higher level is called data independence. There are two kinds:
∑ Physical data independence: the ability to modify the physical scheme
without causing application programs to be rewritten. Modifications at this
level are usually made to improve performance.
∑ Logical data independence: the ability to modify the conceptual scheme
without causing application programs to be rewritten. This is usually done
when the logical structure of the database is altered.
Logical data independence is harder to achieve, as application programs are
usually heavily dependent on the logical structure of the data.

Data Models
The Evolution of Data Models
∑ Hierarchical
∑ Network
∑ Relational
∑ Entity relationship
∑ Object oriented
Hierarchical data model:-
This is one of the traditional data models, developed in the 1960s. In this
model, the records are represented in the form of a tree. The tree consists of
many levels; the topmost level is considered the root or parent of the levels
below.
∑ Each parent can have many children
∑ Each child has only one parent
∑ The tree is defined by paths that trace parent segments to child segments,
beginning from the left

Fig: Hierarchical structure

Example:-

            Universities
             /        \
          JNTU        SVU
          /  \           \
       SVNE  SVCE       SVUCE

Advantages:-
∑ Conceptual simplicity
∑ Database security
∑ Data independence
∑ Database integrity
∑ Efficiency
Disadvantages:-
∑ Complex implementation
∑ Difficult to manage
∑ Lacks structural independence
∑ Complex applications programming and use
∑ Implementation limitations

Network data model:-


It was developed in the 1970s to overcome problems encountered in using
hierarchical models. The model was standardized by CODASYL (Conference on Data
Systems Languages) through its Database Task Group (DBTG).
In this model, the database is represented as a collection of records, and
relationships between these records are defined in the form of links. It
supports many-to-many relationships.
Advantages:-
∑ Conceptual simplicity
∑ Handles more relationship types
∑ Data access flexibility
∑ Promotes database integrity
∑ Data independence
Disadvantages:-
∑ System complexity
∑ Lack of structural independence

Relational Model:-
It is one of the most popular data models, introduced in 1970 by E. F. Codd. In
this model, data and the relationships among data are represented in the form
of tables consisting of rows and columns. The columns represent sets of
attributes, and each row represents an instance of the entity. It performs the
same basic functions provided by hierarchical and network DBMSs, plus other
functions.

Advantages
∑ Structural independence
∑ Improved conceptual simplicity
∑ Easier database design, implementation, management, and use
∑ Ad hoc query capability
∑ Powerful database management system
Disadvantages
∑ Substantial hardware and system software overhead
∑ Can facilitate poor design and implementation

Entity Relationship Model


It is one of the most widely accepted and adopted graphical tools for data
modeling, introduced by Chen in 1976. It gives a graphical representation of
entities and their relationships in a database structure.
An entity relationship diagram (ERD) uses graphic representations to model
database components, and an entity is mapped to a relational table.
∑ An entity instance (or occurrence) is a row in a table
∑ An entity set is a collection of like entities
∑ Connectivity labels the types of relationships
o A diamond is connected to related entities through relationship
lines

Advantages
∑ Exceptional conceptual simplicity
∑ Visual representation
∑ Effective communication tool
∑ Integrated with the relational data model
Disadvantages
∑ Limited constraint representation
∑ Limited relationship representation
∑ No data manipulation language
∑ Loss of information content

Object oriented data model:-

The object-oriented data model can be seen as extending the E-R model with
notions of encapsulation, methods (functions), and object identity.
Inheritance, object identity, and encapsulation (information hiding), with
methods to provide an interface to objects, are among the key concepts of
object-oriented programming that have found applications in data modeling.

Advantages
∑ It can store a large number of different data types, including
text, audio, video, and graphics.
∑ It supports the concepts of inheritance and polymorphism.

Disadvantages
∑ It is difficult to manage

Semi-structured data model:-

Semi-structured data models permit the specification of data where individual
data items of the same type may have different sets of attributes.

Example: - XML

Database Languages - DDL and DML Commands and Examples of Basic SQL Queries:-

Data Definition Language (DDL)


∑ Used to specify a database scheme as a set of definitions expressed in a
DDL
∑ DDL statements are compiled, resulting in a set of tables stored in a special
file called a data dictionary or data directory.
∑ The data directory contains metadata (data about data)
∑ The storage structure and access methods used by the database system are
specified by a set of definitions in a special type of DDL called a data storage
and definition language
∑ Basic idea: hide implementation details of the database schemes from the
users
Data Manipulation Language (DML)
∑ Data Manipulation is:
o retrieval of information from the database
o insertion of new information into the database
o deletion of information in the database
o modification of information in the database
∑ A DML is a language which enables users to access and manipulate data.
∑ The goal is to provide efficient human interaction with the system.
∑ There are two types of DML:
o procedural: the user specifies what data is needed and how to get it
o nonprocedural: the user only specifies what data is needed
ß Easier for user
ß May not generate code as efficient as that produced by
procedural languages
∑ A query language is a portion of a DML involving information retrieval only.
The terms DML and query language are often used synonymously.
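The DDL and DML operations described above can be illustrated with basic SQL queries. A sketch using SQLite from Python; the student table and its data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (a real DBMS records this in its data dictionary).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# DML: insertion of new information.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Kiran", 72), (2, "Meena", 85), (3, "Arjun", 64)])

# DML: retrieval -- a nonprocedural query states what data is needed, not how to get it.
top = conn.execute(
    "SELECT name FROM student WHERE marks > 70 ORDER BY marks DESC").fetchall()

# DML: modification and deletion.
conn.execute("UPDATE student SET marks = marks + 5 WHERE roll_no = 3")
conn.execute("DELETE FROM student WHERE roll_no = 1")
print(top)
```

The SELECT statement is the query-language portion of the DML: the user specifies the condition (marks > 70) and the system decides how to evaluate it.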
_________________________________________________________________________________
Database Users and Administrators
A primary goal of a database system is to retrieve information from and store new
information into the database. People who work with a database can be
categorized as database users or database administrators.
The database users fall into several categories:
∑ Naive users are unsophisticated users who interact with the system by
using permanent application programs (e.g. automated teller machine).
∑ Application programmers are computer professionals interacting with the
system through DML calls embedded in a program written in a host
language (e.g. C, PL/1, Pascal).These programs are called application
programs.
∑ Sophisticated users interact with the system without writing programs.
They form requests by writing queries in a database query language.
∑ Specialized users are sophisticated users writing special database
application programs (tools).

Functions of a Database Administrator
One of the main reasons for using DBMSs is to have central control of both the
data and the programs that access those data. The database administrator is a
person having central control over data and programs accessing that data. Duties
of the database administrator include:
∑ Scheme definition: the creation of the original database scheme. This
involves writing a set of definitions in a DDL (data storage and definition
language), compiled by the DDL compiler into a set of tables stored in the
data dictionary.
∑ Storage structure and access method definition: writing a set of
definitions translated by the data storage and definition language compiler
∑ Scheme and physical organization modification: writing a set of
definitions used by the DDL compiler to generate modifications to
appropriate internal system tables (e.g. data dictionary). This is done rarely,
but sometimes the database scheme or physical organization must be
modified.
∑ Granting of authorization for data access: granting different types of
authorization for data access to various users
∑ Routine maintenance: Examples of the database administrator’s routine
maintenance activities are:
∑ Periodically backing up the database, either onto tapes or onto
remote servers, to prevent loss of data in case of disasters such as
flooding.
∑ Ensuring that enough free disk space is available for normal
operations, and upgrading disk space as required.
∑ Monitoring jobs running on the database and ensuring that
performance is not degraded by very expensive tasks submitted by
some users.
_________________________________________________________________________________
Transaction Management
A transaction is a collection of operations that performs a single logical function
in a database application.
An example is a funds transfer, in which one department account (say A) is
debited and another department account (say B) is credited. Clearly, it is essential
that either both the credit and debit occur, or that neither occur. That is, the
funds transfer must happen in its entirety or not at all. This all-or-none
requirement is called atomicity.
In addition, it is essential that the execution of the funds transfer preserve the
consistency of the database. That is, the value of the sum of the balances of A and
B must be preserved. This correctness requirement is called consistency.
The database system must also ensure that concurrently executing transactions
do not interfere with one another, so that each transaction appears to execute
alone; this property is called isolation. Finally, after the successful
execution of a funds transfer, the new values of the balances of accounts A
and B must persist, despite the possibility of system failure. This persistence
requirement is called durability.
Each transaction is a unit of both atomicity and consistency. Thus, we require
that transactions do not violate any database consistency constraints. That is, if
the database was consistent when a transaction started, the database must be
consistent when the transaction successfully terminates. However, during the
execution of a transaction, it may be necessary temporarily to allow inconsistency,
since either the debit of A or the credit of B must be done before the other. This
temporary inconsistency, although necessary, may lead to difficulty if a failure
occurs.
It is the programmer’s responsibility to define the various transactions
properly, so that each preserves the consistency of the database. For example,
the transaction to transfer funds from the account of department A to the
account of department B could be defined to be composed of two separate
programs: one that debits account A, and another that credits account B. The
execution of these two programs one after the other will indeed preserve
consistency.
Ensuring the atomicity and durability properties is the responsibility of the
database system itself—specifically, of the recovery manager. If any transaction
fails then the database system performs failure recovery that detects system
failures and restores the database to the state that existed prior to the occurrence
of the failure.
Finally, when several transactions update the database concurrently, the
consistency of data may no longer be preserved, even though each individual
transaction is correct. It is the responsibility of the concurrency-control
manager to control the interaction among the concurrent transactions, to ensure
the consistency of the database. The transaction manager consists of the
concurrency-control manager and the recovery manager.
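The funds-transfer example above can be sketched in code: either both updates are committed, or a failure causes a rollback to the state that existed before the transaction began. A minimal illustration using SQLite from Python; the accounts, amounts, and the simulated mid-transfer failure are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one atomic unit: either both
    updates persist (commit) or neither does (rollback)."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:
            raise RuntimeError("simulated system failure between debit and credit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # recovery: restore the state before the transaction

transfer(conn, "A", "B", 400, fail_midway=True)  # fails; balances must be unchanged
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

In a real DBMS the recovery manager automates this through logging; the explicit rollback here only sketches the all-or-none (atomicity) behavior.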

Database System Structure


A database System is divided into modules based on their function. The functional
components of a database system can be broadly divided into the storage manager
and the query processor components.
∑ Storage Manager
∑ Query Processor
Storage Manager
The storage manager is important because databases typically require a large
amount of storage space, so it is very important to use storage efficiently and
to minimize the movement of data to and from disk.
A storage manager is a program module that provides the interface between the
low-level data stored in the database and the application programs and the
queries submitted to the system. The Storage manager is responsible for the
interaction with the file manager. The Storage manager translates the various
DML statements into low level file system commands. Thus the storage manager is
responsible for storing, retrieving, and updating data in the database. The storage
manager components include the following.
∑ Authorization and Integrity Manager
∑ Transaction Manager
∑ File Manager and Buffer Manager
∑ The Authorization and Integrity Manager tests for the satisfaction of
integrity constraints and checks the authority of users to access data.
∑ The Transaction Manager ensures that the database remains in a consistent
state while allowing concurrent transactions to proceed without conflicting.
∑ The file manager manages the allocation of space on disk storage and the
data structures used to represent information stored on disk.
∑ The Buffer manager is responsible for fetching the data from disk storage
into main memory and deciding what data to cache in main memory.
∑ The storage manager implements the following data structures as part of
the physical system implementation: data files, the data dictionary, and
indices. Data files store the database itself. The data dictionary stores
metadata about the structure of the database, in particular the schema of
the database. Indices provide fast access to data items.
The Query Processor:-
The query processor simplifies and facilitates access to data. The query
processor includes the following components.
∑ DDL Interpreter
∑ DML Compiler
∑ Query Evaluation Engine
∑ The DDL interpreter interprets DDL statements and records the definition in
the data dictionary. The DML compiler translates DML statements in a query
language into an evaluation plan consisting of low-level instructions that the
query evaluation engine understands.
∑ The DML compiler also performs query optimization; that is, it picks the
lowest-cost evaluation plan from among the alternatives.
∑ The query evaluation engine executes the low-level instructions generated
by the DML compiler.
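The idea that the compiler picks the lowest-cost evaluation plan can be observed in a real system. A sketch using SQLite's EXPLAIN QUERY PLAN from Python; the table and index names are illustrative, and the exact plan text varies across SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")

# Without an index, the cheapest available plan is a full table scan.
scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE roll_no = 2").fetchall()

# After CREATE INDEX, the optimizer picks a cheaper index search instead.
conn.execute("CREATE INDEX idx_roll ON student(roll_no)")
search = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE roll_no = 2").fetchall()

print(scan[0][-1])    # plan detail describes a SCAN of student
print(search[0][-1])  # plan detail describes a SEARCH using idx_roll
```

The same query text produces two different evaluation plans; the choice between them is made by the query optimizer, not the user.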

Application Architectures
Most users of a database system today are not present at the site of the
database system, but connect to it through a network. We can therefore
differentiate between client machines, on which remote database users work,
and server machines, on which the database system runs. Database applications
are usually partitioned into two or three parts.
In two-tier architecture, the application resides at the client machine, where it
invokes database system functionality at the server machine through query
language statements. Application program interface standards like ODBC and
JDBC are used for interaction between the client and the server.
In contrast, in three-tier architecture, the client machine acts as merely a front
end and does not contain any direct database calls. Instead, the client end
communicates with an application server, usually through a forms interface. The
application server in turn communicates with a database system to access data.
The business logic of the application, which says what actions to carry out
under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more
appropriate for large applications, and for applications that run on the World
Wide Web.

TYPES OF DATABASES:
∑ A DBMS can support many different types of databases. Databases can be
classified as follows:
o According to the number of users
o Based on the location of the database
o Based on usage (how they will be used)
∑ According to the number of users, databases are classified as:
ß Single-user databases (support only one user at a time)
ß Multi-user databases (support multiple users at the same time).
These are of two types:
∑ Workgroup databases (support a small number of users, fewer
than about 50)
∑ Enterprise databases (support many users, more than about 50)
∑ Based on the location of the database, databases are classified as:
ß Centralized databases (data located at a single site)
ß Distributed databases (data located at different sites)
∑ Based on usage, databases are classified as:
ß Operational databases (support day-to-day operations)
ß Data warehouses (contain historical data)
NOTE:
∑ An operational database may also be referred to as a transactional or
production database
∑ A data warehouse can store data derived from many sources
