Dbms U-1 (P-I) PDF
Techniques for data storage and processing have evolved over the years:
1950s and early 1960s: Magnetic tapes were developed for data storage. Data
processing tasks such as payroll were automated, with data stored on tapes.
Processing of data consisted of reading data from one or more tapes and writing
data to a new tape. Data could also be input from punched card decks, and
output to printers.
For example, salary raises were processed by entering the raises on punched
cards and reading the punched card deck in synchronization with a tape
containing the master salary details. The records had to be in the same sorted
order. The salary raises would be added to the salary read from the master tape,
and written to a new tape; the new tape would become the new master tape. Tapes
(and card decks) could be read only sequentially, and data sizes were much larger
than main memory.
Late 1960s and 1970s: Widespread use of hard disks in the late 1960s changed
the scenario for data processing greatly, since hard disks allowed direct access to
data.
With disks, network and hierarchical databases could be created that allowed data
structures such as lists and trees to be stored on disk. Programmers could
construct and manipulate these data structures.
A landmark paper by Codd [1970] defined the relational model. The simplicity of
the relational model and the possibility of hiding implementation details
completely from the programmer were enticing indeed. Codd later won the
prestigious Association for Computing Machinery (ACM) Turing Award for his work.
1980s: Although academically interesting, the relational model was not used in
practice initially, because of its perceived performance disadvantages; relational
databases could not match the performance of existing network and hierarchical
databases.
That changed with System R, a groundbreaking project at IBM Research that
developed techniques for the construction of an efficient relational database
system. Excellent overviews of System R are provided by Astrahan et al. [1976]
and Chamberlin et al. [1981]. The fully functional System R prototype led to IBM’s
first relational database product, SQL/DS. At the same time, the Ingres system
was being developed at the University of California at Berkeley.
The 1980s also saw much research on parallel and distributed databases, as well
as initial work on object-oriented databases.
Early 1990s: The SQL language was designed primarily for decision support
applications, which are query-intensive.
1990s: The major event of the 1990s was the explosive growth of the World Wide
Web. Database systems also had to support Web interfaces to data.
2000s: The first half of the 2000s saw the emergence of XML and the associated
query language XQuery as a new database technology. Although XML is widely
used for data exchange, as well as for storing certain complex data types,
relational databases still form the core of the vast majority of large-scale
database applications.
This period also saw significant growth in the use of open-source database
systems, particularly PostgreSQL and MySQL.
Introduction to DBMS
A database-management system (DBMS) is a collection of interrelated data and a
set of programs to access those data. The collection of data, usually referred to as
the database, contains information relevant to an enterprise. The primary goal of
a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.
Database systems are designed to manage large bodies of information.
Management of data involves both defining structures for storage of information
and providing mechanisms for the manipulation of information. In addition, the
database system must ensure the safety of the information stored, despite system
crashes or attempts at unauthorized access. If data are to be shared among
several users, the system must avoid possible anomalous results.
Database System Applications
The conventional file processing system suffers from the following shortcomings.
1. Data redundancy
2. Data inconsistency
3. Difficulty in accessing data
4. Data isolation
5. Integrity problems
6. Atomicity problems
7. Concurrent-access anomalies
8. Security problems
Reduced data redundancy: In the conventional file processing system, every user
group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated by updating the same data in different files.
• Time wasted in entering the same data again and again.
• Needless use of computer resources.
Elimination of inconsistency:
In the file processing system, information is duplicated throughout the system,
so changes made in one file may need to be carried over to other files. If this
is not done, the data become inconsistent. We therefore need to remove this
duplication of data across multiple files to eliminate inconsistency.
Better data integration:
Data integration involves combining data residing in different sources and
providing users with a unified view of these data. Since the data of an
organization using the database approach is centralized and used by a number of
users at a time, it is essential to enforce integrity constraints.
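As a concrete sketch of a DBMS-enforced integrity constraint, the following uses Python's built-in sqlite3 module; the employee table, its columns, and the salary rule are illustrative assumptions, not part of the notes:

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary > 0)   -- integrity constraint
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 30000)")

# The DBMS itself rejects data that violates the constraint,
# regardless of which application issues the insert.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Kiran', -500)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
```

Because the constraint lives in the schema rather than in any one application, every program that touches the centralized data is subject to the same check.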
Data independence from applications programs:
There are two types of data independence:
1. Logical data independence: This is the capacity to change the
conceptual schema without having to change external schema or application
programs. We can change the conceptual schema to expand the database or to
reduce the database.
2. Physical data independence: This is the capacity to change the internal
schema without having to change the conceptual or external schema.
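Logical data independence can be sketched with a view standing in for an external schema, using Python's built-in sqlite3 module; all table, column, and view names here are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema (names are illustrative).
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, marks INTEGER)")
conn.execute("INSERT INTO student VALUES (101, 'Anil', 82)")

# External schema: the application sees only this view.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")

# Expand the conceptual schema: add a column.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# The view, and any program written against it, still works unchanged.
rows = conn.execute("SELECT * FROM student_names").fetchall()
```

The conceptual schema grew, but nothing written against the view had to change, which is exactly what logical data independence promises.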
Improved data access to users:
The DBMS makes it possible to produce quick answers to ad hoc queries. From a
database perspective, a query is a specific request issued to the DBMS for data
manipulation (e.g., for reading or updating data). An ad hoc query is a
spur-of-the-moment question. The DBMS sends back an answer (the query result
set) to the application.
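A minimal sketch of an ad hoc query, using Python's built-in sqlite3 module; the product table and the price condition are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (pid INTEGER, pname TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, 'pen', 10.0), (2, 'book', 55.0), (3, 'bag', 400.0)])

def ad_hoc_query(conn, max_price):
    """A spur-of-the-moment request: products cheaper than max_price."""
    return conn.execute(
        "SELECT pname FROM product WHERE price < ? ORDER BY price",
        (max_price,)).fetchall()

# The DBMS sends back the query result set to the application.
result_set = ad_hoc_query(conn, 100.0)
```

No program had to be written in advance for this particular question; the condition is supplied at the moment the question is asked.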
Improved data security:
Security means the protection of data against unauthorized disclosure. In
conventional systems, applications are developed in an ad hoc manner. Often,
different systems of an organization access different components of the
operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security
restrictions, since the data is now centralized. Different checks can be
established for each type of access (retrieve, modify, delete, etc.) to each
piece of information in the database.
Improved data sharing:
A DBMS allows data to be shared by two or more users; that is, the same data
can be accessed by multiple users at the same time.
Improved decision making:
Better managed data and improved data access make it possible to generate better
quality information, on which better decisions are based.
Concurrent access and crash recovery:
A DBMS schedules concurrent access to the data in such a manner that each user
can think of the data as being accessed by only one user at a time. Further,
the DBMS protects users from the effects of system failures.
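The atomicity guarantee behind crash recovery can be sketched with a transaction rollback, using Python's built-in sqlite3 module; the account data and the simulated failure are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 500.0)])
conn.commit()

# Transfer 300 from account 1 to account 2; a failure mid-way must leave
# the database as if the transaction never started.
simulate_crash = True
try:
    conn.execute("UPDATE account SET balance = balance - 300 WHERE acc_no = 1")
    if simulate_crash:
        raise RuntimeError("simulated crash before the credit step")
    conn.execute("UPDATE account SET balance = balance + 300 WHERE acc_no = 2")
    conn.commit()
except RuntimeError:
    conn.rollback()   # the DBMS undoes the partial transaction

balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
```

After the rollback, neither the debit nor the credit is visible, so no money has vanished: the transaction either happens entirely or not at all.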
Reduced application development time:
The DBMS supports many important functions that are common to many
applications. Hence it reduces the application development time.
View of Data
A database system is a collection of interrelated data and a set of programs that
allow users to access and modify these data. A major purpose of a database
system is to provide users with an abstract view of the data. That is, the system
hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. Since many
database-system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users’ interactions
with the system:
• Physical level. The lowest level of abstraction describes how the data are
actually stored. The physical level describes complex low-level data
structures in detail, e.g., indexes, B-trees, hashing.
• Logical level. The next-higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data.
• View level. The highest level of abstraction describes only part of the entire
database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The
system may provide many views for the same database.
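The three levels can be loosely mapped onto familiar SQL objects, as in this sketch using Python's built-in sqlite3 module; all table, index, and view names are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data are stored and what relationships exist.
conn.execute("CREATE TABLE instructor (id INTEGER, name TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO instructor VALUES (1, 'Rao', 'CSE', 65000)")

# Physical level: low-level storage structures such as indexes
# (stored as B-trees in SQLite) that users never touch directly.
conn.execute("CREATE INDEX idx_dept ON instructor(dept)")

# View level: only part of the database, e.g. hiding the salary column.
conn.execute("CREATE VIEW instructor_public AS SELECT id, name, dept FROM instructor")
visible_columns = [c[0] for c in
                   conn.execute("SELECT * FROM instructor_public").description]
```

A user of the view never sees the salary attribute, and no user at any level sees the B-tree that the index maintains underneath.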
Instances and Schemas
Databases change over time as information is inserted and deleted. The collection
of information stored in the database at a particular moment is called an instance
of the database. The overall design of the database is called the database schema.
Schemas are changed infrequently, if at all.
The concept of database schemas and instances can be understood by analogy to
a program written in a programming language. A database schema corresponds to
the variable declarations (along with associated type definitions) in a program.
Each variable has a particular value at a given instant. The values of the variables
in a program at a point in time correspond to an instance of a database schema.
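The schema-versus-instance distinction can be sketched as follows, using Python's built-in sqlite3 module; the course table is an illustrative assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: the overall design, like variable declarations in a program.
conn.execute("CREATE TABLE course (course_id TEXT, title TEXT)")
schema_columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

# The schema rarely changes; the instance changes as rows come and go.
conn.execute("INSERT INTO course VALUES ('CS101', 'DBMS')")
instance_now = conn.execute("SELECT * FROM course").fetchall()    # one instance

conn.execute("DELETE FROM course")
instance_later = conn.execute("SELECT * FROM course").fetchall()  # a later instance
```

The two SELECTs return different instances of the same unchanged schema, just as a program's variables hold different values at different moments under the same declarations.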
Data Models
The Evolution of Data Models
• Hierarchical
• Network
• Relational
• Entity relationship
• Object oriented
Hierarchical data model: -
This is one of the traditional data models, developed in the 1960s. In this
model, the records are represented in the form of a tree. It consists of many
levels; the topmost level of the tree is considered the root, or parent, of the
levels below.
• Each parent can have many children
• Each child has only one parent
• The tree is defined by the path that traces parent segments to child
segments, beginning from the left
[Figure: a hierarchical tree with root "Universities" and children "JNTU" and "SVU"]
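The parent-child structure can be sketched as a nested mapping in Python; the university names follow the figure, while the department names are illustrative assumptions:

```python
# Each child has exactly one parent; a parent may have many children.
# University names come from the figure; departments are illustrative.
tree = {
    "Universities": {
        "JNTU": {"CSE": {}, "ECE": {}},
        "SVU":  {"CSE": {}},
    }
}

def path_to(node, target, trail=()):
    """Trace the single parent-to-child path that locates a record."""
    for child, subtree in node.items():
        if child == target:
            return trail + (child,)
        found = path_to(subtree, target, trail + (child,))
        if found:
            return found
    return None
```

Every record is reached only by walking down from the root, which is exactly why access in this model is tied to the tree's structure.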
Advantages:-
• Conceptual simplicity
• Database security
• Data independence
• Database integrity
• Efficiency
Disadvantages:-
• Complex implementation
• Difficult to manage
• Lacks structural independence
• Complex application programming and use
• Implementation limitations
Relational Model:-
This is one of the most popular data models, developed in the 1970s by
E. F. Codd. In this model, data and the relationships between data are
represented in the form of tables consisting of rows and columns. The columns
represent the set of attributes, and each row represents an instance of the
entity. It performs the same basic functions provided by hierarchical and
network DBMS systems, plus other functions.
Advantages
• Structural independence
• Improved conceptual simplicity
• Easier database design, implementation, management, and use
• Ad hoc query capability
• Powerful database management system
Disadvantages
• Substantial hardware and system software overhead
• Can facilitate poor design and implementation
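A minimal sketch of rows, columns, and a relationship expressed through matching column values, using Python's built-in sqlite3 module; the table and column names are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two relations; the dept_id column relates rows across them.
conn.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
conn.execute("CREATE TABLE emp  (emp_id INTEGER, name TEXT, dept_id INTEGER)")
conn.execute("INSERT INTO dept VALUES (10, 'CSE')")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 'Asha', 10), (2, 'Vijay', 10)])

# An ad hoc join expresses the relationship through values alone,
# without the stored pointers of the hierarchical and network models.
pairs = conn.execute("""
    SELECT emp.name, dept.dept_name
    FROM emp JOIN dept ON emp.dept_id = dept.dept_id
    ORDER BY emp.emp_id
""").fetchall()
```

Because the relationship is carried by data values rather than by physical links, the query works no matter how the rows are stored, which is the source of the model's structural independence.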
Entity Relationship Model:-
In this model, data is represented as entities, their attributes, and the
relationships among entities, usually depicted visually in an E-R diagram.
Advantages
• Exceptional conceptual simplicity
• Visual representation
• Effective communication tool
• Integrated with the relational data model
Disadvantages
• Limited constraint representation
• Limited relationship representation
• No data manipulation language
• Loss of information content
Object-oriented Model:-
The object-oriented data model can be seen as extending the E-R model with
notions of encapsulation, methods (functions), and object identity.
Inheritance, object identity, and encapsulation (information hiding), with
methods to provide an interface to objects, are among the key concepts of
object-oriented programming that have found applications in data modeling.
Advantages
• It stores a large number of different data types, including text,
audio, video, and graphics.
• It supports the concepts of inheritance and polymorphism.
Disadvantages
• It is difficult to manage
Example: XML
Application Architectures
Most users of a database system today are not present at the site of the
database system, but connect to it through a network. We can therefore
differentiate between client machines, on which remote database users work, and
server machines, on which the database system runs. Database applications are
usually partitioned into two or three parts, as shown in the figure below.
In a two-tier architecture, the application resides at the client machine,
where it invokes database-system functionality at the server machine through
query-language statements. Application program interface standards such as
ODBC and JDBC are used for interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts merely as a
front end and does not contain any direct database calls. Instead, the client
end communicates with an application server, usually through a forms interface.
The application server in turn communicates with a database system to access
data. The business logic of the application, which says what actions to carry
out under what conditions, is embedded in the application server instead of
being distributed across multiple clients. Three-tier applications are more
appropriate for large applications, and for applications that run on the
World Wide Web.
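A toy sketch of the three tiers in Python, with the built-in sqlite3 module standing in for the database server and a single function standing in for the application server; all names and the business rule are illustrative assumptions:

```python
import sqlite3

# Tier 3: the database server (SQLite stands in for a real server here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")

# Tier 2: the application server holds the business logic and issues
# all database calls; the 10000 limit is an illustrative business rule.
def place_order(order_id, amount):
    if amount <= 0 or amount > 10000:          # business rule check
        return "rejected"
    db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
    return "accepted"

# Tier 1: the client front end calls the application server only;
# it never touches the database directly.
status_ok  = place_order(1, 250.0)
status_bad = place_order(2, -5.0)
```

Because the rule lives in one place (the middle tier), changing it requires no update to any client, which is the main argument for three-tier designs in large applications.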
TYPES OF DATABASES:
A DBMS can support many different types of databases. Databases can be
classified as follows:
• According to the number of users
• Based on the location of the database
• Based on the usage (how they will be used)
According to the number of users, the databases are classified as:
• Single-user databases (support only one user at a time)
• Multi-user databases (support multiple users at the same time); these
are of two types:
  o Workgroup databases (support a small number of users, fewer than 50)
  o Enterprise databases (support many users, more than 50)
Based on the location of the database, the databases are classified as:
• Centralized database (data located at a single site)
• Distributed database (data located at different sites)
Based on the usage of the database, they are classified as:
• Operational database (supports day-to-day operations)
• Data warehouse (contains historical data)
NOTE:
• The operational database is also referred to as a transactional database
or production database.
• A data warehouse can store data derived from many sources.