Unit-I Notes DBMS
Data:
Data consists of raw facts, figures, or entities.
When activities in the organization take place, the effect of these activities needs to be
recorded; this recorded effect is known as data.
Information:
Processed data is called information.
The purpose of data processing is to generate the information required for carrying
out the business activities.
Metadata:
Metadata (meta data, or sometimes meta information) is "data about data", of any
sort in any media. An item of metadata may describe a collection of data including
multiple content items and hierarchical levels, for example a database schema. In data
processing, metadata is definitional data that provides information about or
documentation of other data managed within an application or environment. The term
should be used with caution as all data is about something, and is therefore metadata.
Database
A database may be defined in simple terms as a collection of related data.
The database can be of any size and of varying complexity.
A database may be generated and maintained manually or it may be computerized.
Characteristics of DBMS
• To incorporate the requirements of the organization, the system should be designed for
easy maintenance.
• Information systems should allow interactive access to data to obtain new information
without writing fresh programs.
• The system should be designed to correlate different data to meet new requirements.
• An independent central repository, which gives information about and the meaning of
the available data, is required.
• An integrated database helps in understanding the inter-relationships between data
stored in different applications.
• The stored data should be made available for access by different users simultaneously.
• Automatic recovery features have to be provided to overcome the problems caused by
processing-system failures.
DBMS Utilities
A data loading utility: allows easy loading of data from an external format without
writing programs.
A backup utility: allows making copies of the database periodically to help in cases of
crashes and disasters.
A recovery utility: allows reconstructing the correct state of the database from a backup
and the history of transactions.
Monitoring tools: monitor performance so that the internal schema can be changed and
database access can be optimized.
A file reorganization utility: allows restructuring the data from one type of file
organization to another.
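For illustration, a minimal sketch of invoking a loading utility, assuming PostgreSQL's COPY
command and a hypothetical student table and students.csv file (other systems ship comparable
bulk-load tools):

    -- Load rows from an external CSV file into the student table
    COPY student (roll_no, name, dept)
    FROM '/tmp/students.csv'
    WITH (FORMAT csv, HEADER true);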
DBMS:
1. A DBMS is a collection of data, and the user is not required to write procedures for
managing the database.
2. A DBMS provides an abstract view of data that hides the details.
3. A DBMS is efficient to use, since there is a wide variety of sophisticated techniques to
store and retrieve the data.
4. A DBMS takes care of concurrent access using some form of locking.
5. A DBMS has a crash recovery mechanism; it protects users from the effects of
system failures.
6. A DBMS has a good protection mechanism.
Database Users:
There are four different types of database-system users, differentiated by the way they
expect to interact with the system. Different types of user interfaces have been
designed for the different types of users.
Naive users are unsophisticated users who interact with the system by invoking one of
the application programs that have been written previously.
For example, a bank teller who needs to transfer $50 from account A to
account B invokes a program called transfer. This program asks the teller
for the amount of money to be transferred, the account from which the
money is to be transferred, and the account to which the money is to be
transferred.
As another example, consider a user who wishes to find her account
balance over the World Wide Web. Such a user may access a form, where
she enters her account number. An application program at the Web server
then retrieves the account balance, using the given account number, and
passes this information back to the user. The typical user interface for
naive users is a forms interface, where the user can fill in appropriate
fields of the form. Naive users may also simply read reports generated
from the database.
Application programmers are computer professionals who write application programs. They
can choose from many tools, such as rapid application development (RAD) tools, to develop
user interfaces.
Sophisticated users interact with the system without writing programs. Instead, they
form their requests in a database query language. They submit each such query to a
query processor, whose function is to break down DML statements into instructions
that the storage manager understands. Analysts who submit queries to explore data in
the database fall in this category.
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them
view summaries of data in different ways. For instance, an analyst can see total sales by
region (for example, North, South, East, and West), or by product, or by a combination
of region and product (that is, total sales of each product in each region). The tools also
permit the analyst to select specific regions, look at data in more detail (for example,
sales by city within a region) or look at the data in less detail (for example, aggregate
products together by category).
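As a rough SQL illustration (the sales table with region, product, city and amount columns is
assumed for the example), the summaries above correspond to grouped aggregate queries:

    -- Total sales of each product in each region
    SELECT region, product, SUM(amount) AS total_sales
    FROM sales
    GROUP BY region, product;

    -- Drill down: sales by city within a single region
    SELECT city, SUM(amount) AS total_sales
    FROM sales
    WHERE region = 'North'
    GROUP BY city;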
Another class of tools for analysts is data mining tools, which help them find certain
kinds of patterns in data.
Specialized users are sophisticated users who write specialized database applications that
do not fit into the traditional data-processing framework.
Advantages of DBMS.
Due to its centralized nature, the database system can overcome the disadvantages of
the file-system-based approach.
1. Data independence:
Application programs should not be exposed to the details of data representation and
storage; the DBMS provides an abstract view that hides these details.
2. Efficient data access:
DBMS utilizes a variety of sophisticated techniques to store and retrieve data
efficiently.
3. Data integrity and security:
Since data is accessed through the DBMS, the DBMS can enforce integrity constraints,
e.g., checking that salary information inserted for an employee satisfies the declared
constraints.
4. Data administration:
When users share data, centralizing its administration is an important task. Experienced
professionals can minimize data redundancy and perform fine tuning, which reduces
retrieval time.
5. Concurrent access and crash recovery:
The DBMS schedules concurrent accesses to the data and protects users from the effects
of system failures.
6. Reduced application development time:
The DBMS supports important functions that are common to many applications.
Some specialized database applications include:
• Multimedia databases
• Geographic information systems (GIS)
• Data warehouses
• Real-time and active databases
Model:
A model is an abstraction process that hides superfluous details. Data modeling is used
for representing the entities of interest and their relationships in the database. A data
model is a collection of concepts that can be used to describe the structure of a database,
and it provides the necessary means to achieve this abstraction. The structure of a
database means the elements that hold the data:
Data types
Relationships
Constraints
1. High-level (conceptual) data model: The user-level data model is the high-level or
conceptual model. It provides concepts that are close to the way many users perceive data.
2. Low-level (physical) data model: Provides concepts that describe the details of how data
is stored in the computer. The low-level data model is only for computer specialists, not
for end-users.
3. Representational data model: It lies between the high-level and low-level data models,
and provides concepts that may be understood by end-users but that are not too far removed
from the way data is organized within the computer.
The most common data models are
1. Relational Model
The relational model uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a
unique name.
(Figure: customer and account tables; customers Preethi and Rocky share the same account
number A-111.)
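A minimal sketch of how such tables might be declared and populated in SQL (the table and
column names are assumptions for illustration):

    -- Each table is a collection of rows; each column has a unique name.
    CREATE TABLE account (
        account_no CHAR(5) PRIMARY KEY,
        balance    DECIMAL(10, 2)
    );

    CREATE TABLE customer (
        customer_name VARCHAR(30),
        account_no    CHAR(5) REFERENCES account(account_no)
    );

    -- Two customers sharing account A-111 is simply two rows
    -- that reference the same account row:
    INSERT INTO account VALUES ('A-111', 5000.00);
    INSERT INTO customer VALUES ('Preethi', 'A-111');
    INSERT INTO customer VALUES ('Rocky', 'A-111');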
Advantages:
1. The main advantage of this model is its ability to represent data in a simplified format.
2. The process of manipulating records is simplified with the use of certain key attributes
used to retrieve data.
3. Representation of different types of relationship is possible with this model.
2. Network Model
The data in the network model are represented by collections of records, and relationships
among data are represented by links, which can be viewed as pointers.
The records in the database are organized as collections of arbitrary graphs.
Advantages:
1. The representation of relationships between entities is implemented using pointers,
which allows the representation of arbitrary relationships.
2. Unlike the hierarchical model, the network model can represent arbitrary relationships
easily.
3. Data manipulation can be done easily with this model.
3. Hierarchical Model
A hierarchical data model is a data model in which the data is organized into a tree-like
structure. The structure allows repeating information using parent/child relationships:
each parent can have many children, but each child has only one parent. All attributes
of a specific record are listed under an entity type.
Advantages:
1. The representation of records is done using an ordered tree, which is a natural method
of implementing one-to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
3. It allows the use of virtual records. This results in a stable database, especially when
modifications of the database are made.
4. Object-Relational Model
• Schema Diagram:
An illustrative display of (most aspects of) a database schema.
• Schema Construct:
A component of the schema or an object within the schema, e.g., STUDENT,
COURSE.
Database State:
The actual data stored in a database at a particular moment in time. This includes
the collection of all the data in the database. It is also called a database instance (or
occurrence, or snapshot).
• The term instance is also applied to individual database components, e.g. record instance,
table instance, entity instance
• Initial Database State: Refers to the database state when it is initially loaded into the
system.
• Valid State: A state that satisfies the structure and constraints of the database.
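The distinction can be made concrete in SQL. The schema is what the DDL declares; the state
is whatever rows are stored at a given moment (the STUDENT columns below are assumed for
illustration):

    -- Schema construct: declared once, changes rarely
    CREATE TABLE student (
        roll_no INT PRIMARY KEY,
        name    VARCHAR(30),
        dept    VARCHAR(10)
    );

    -- Database state: the rows present right now; every INSERT,
    -- UPDATE or DELETE moves the database to a new state
    INSERT INTO student VALUES (1, 'Preethi', 'CSE');
    INSERT INTO student VALUES (2, 'Rocky', 'ECE');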
Data independence can be defined as the capacity to change the schema at one
level without changing the schema at the next higher level. There are two types of data
independence:
1. Logical data independence.
2. Physical data independence.
1. Logical data independence is the capacity to change the conceptual schema without
having to change the external schema.
2. Physical data independence is the capacity to change the internal schema without
changing the conceptual schema.
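For instance (a sketch reusing the assumed student table above), adding an index is an
internal-schema change; the conceptual schema and every existing query remain valid, which
is exactly physical data independence:

    -- Internal-schema change: affects only how rows are located on disk
    CREATE INDEX idx_student_dept ON student (dept);

    -- Conceptual-level query: identical before and after the index exists
    SELECT name FROM student WHERE dept = 'CSE';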
Database management systems are complex software systems which were often developed and
optimized over years. From the user's point of view, however, most of them have a quite
similar basic architecture. The discussion of this basic architecture shall help in
understanding the connection with data modeling and the 'data independence' of the database
approach postulated in the introduction to this module.
Three-Scheme Architecture
Knowing about the conceptual and the derived logical scheme (discussed in the unit Database
Models, Schemes and Instances), this unit explains two additional schemes - the external
scheme and the internal scheme - which help in understanding the DBMS architecture.
External Scheme:
An external data scheme describes the information about the user view of specific users (single
users and user groups) and the specific methods and constraints connected with this information.
Internal Scheme:
The internal data scheme describes the content of the data and the required service functionality
which is used for the operation of the DBMS.
Therefore, the internal scheme describes the data from a view very close to the computer or
system in general. It completes the logical scheme with technical aspects such as storage
methods or helper functions for greater efficiency.
Together, the internal, logical and external schemes are called the three-schemes
architecture.
While the internal scheme describes the physical grouping of the data and the use of the storage space,
the logical scheme (derived from the conceptual scheme) describes the basic construction of the data
structure. The external scheme of a specific application, generally, only highlights that part of the
logical scheme which is relevant for its application. Therefore, a database has exactly one internal and
one logical scheme but may have several external schemes for several applications using this
database.
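In SQL terms, an external scheme is typically realized as a set of views over the logical
scheme; a sketch, again using the assumed student table:

    -- External scheme for a user group that may see names and
    -- departments but not roll numbers
    CREATE VIEW student_public AS
        SELECT name, dept
        FROM student;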
The aim of the three-schemes architecture is the separation of the user applications from the
physical database, i.e., the stored data. Physically, the data exists only at the internal
level, while the other forms of representation are calculated or derived when needed. The
DBMS has the task of realizing the mapping between each of these levels.
Data Independence
With knowledge of the three-schemes architecture, the term data independence can be explained
as follows: each higher level of the data architecture is immune to changes at the next lower
level of the architecture.
Physical Independence:
Therefore, the logical scheme may stay unchanged even though the storage space or type of some
data is changed for reasons of optimization or reorganization.
Logical Independence:
Also the external scheme may stay unchanged for most changes of the logical scheme. This is
especially desirable as in this case the application software does not need to be modified or newly
translated.
A database system is partitioned into modules that deal with each of the responsibilities of
the overall system. The functional components of a database system can be broadly
divided into the storage manager and the query processor components.
The storage manager is important because databases typically require a large amount of
storage space. Corporate databases range in size from hundreds of gigabytes to, for the
largest databases, terabytes of data. A gigabyte is 1000 megabytes (1 billion bytes), and a
terabyte is 1 million megabytes (1 trillion bytes). Since the main memory of computers
cannot store this much information, the information is stored on disks. Data are moved
between disk storage and main memory as needed. Since the movement of data to and
from disk is slow relative to the speed of the central processing unit, it is imperative that
the database system structure the data so as to minimize the need to move data between
disk and main memory.
The query processor is important because it helps the database system simplify and
facilitate access to data. High-level views help to achieve this goal; with them, users of
the system need not be burdened unnecessarily with the physical details of the
implementation of the system. However, quick processing of updates and queries is
important. It is the job of the database system to translate updates and queries written in a
nonprocedural language, at the logical level, into an efficient sequence of operations at
the physical level.
Storage Manager
A storage manager is a program module that provides the interface between the low-level
data stored in the database and the application programs and queries submitted to the
system. The storage manager is responsible for the interaction with the file manager.
The raw data are stored on the disk using the file system, which is usually provided by a
conventional operating system. The storage manager translates the various DML
statements into low-level file-system commands. Thus, the storage manager is
responsible for storing, retrieving, and updating data in the database.
The storage manager components include:
Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without
conflicting.
File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than
the size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
Indices, which provide fast access to data items that hold particular values.
The query processor components include:
DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
Figure shows these components and the connections among them.
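Many systems expose the data dictionary itself through queryable tables; a sketch assuming a
DBMS that implements the standard information_schema catalog:

    -- Metadata query: list the columns of the student table
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'student';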
Application Architectures
Most users of a database system today are not present at the site of the database system,
but connect to it through a network. We can therefore differentiate between client
machines, on which remote database users work, and server machines, on which the
database system runs.
Database applications are usually partitioned into two or three parts, as in the figure. In
a two-tier architecture, the application is partitioned into a component that resides at the
client machine, which invokes database system functionality at the server machine through
query language statements. Application program interface standards like ODBC and JDBC
are used for interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end
and does not contain any direct database calls. Instead, the client end communicates with
an application server, usually through a forms interface. The application server in turn
communicates with a database system to access data. The business logic of the
application, which says what actions to carry out under what conditions, is embedded in
the application server, instead of being distributed across multiple clients. Three-tier
applications are more appropriate for large applications, and for applications that run on
the World Wide Web.
Data processing drives the growth of computers, as it has from the earliest days of
commercial computers. In fact, automation of data processing tasks predates computers.
Punched cards, invented by Hollerith, were used at the very beginning of the twentieth
century to record U.S. census data, and mechanical systems were used to process the cards
and tabulate results. Punched cards were later widely used as a means of entering data into
computers.
Centralized Systems
Run on a single computer system and do not interact with other computer systems.
General-purpose computer system: one to a few CPUs and a number of device
controllers that are connected through a common bus that provides access to shared
memory.
Single-user system (e.g., personal computer or workstation): desktop unit, single
user, usually has only one CPU and one or two hard disks; the OS may support only
one user.
Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS.
Serves a large number of users who are connected to the system via terminals. Often
called server systems.
Client-Server Systems
Server systems satisfy requests generated at client systems, whose general structure is
shown in the figure.
Transaction Servers
Also called query server systems or SQL server systems; clients send requests to
the server system where the transactions are executed, and results are shipped
back to the client.
Requests specified in SQL, and communicated to the server through a remote
procedure call (RPC) mechanism.
Transactional RPC allows many RPC calls to collectively form a transaction.
Open Database Connectivity (ODBC) is an application program interface
standard from Microsoft for connecting to a server, sending SQL requests, and
receiving results.
Data Servers
Used in LANs, where there is a very high speed connection between the clients and
the server, the client machines are comparable in processing power to the server
machine, and
the tasks to be executed are compute intensive.
Ship data to client machines where processing is performed, and then ship results
back to the server machine.
This architecture requires full back-end functionality at the clients.
Used in many object-oriented database systems
Data inconsistency is likely to occur when there is data redundancy. Data redundancy
occurs when the data file/database file contains redundant, unnecessarily duplicated
data. That is why one major goal of good database design is to eliminate data
redundancy.
Physical level: This is the lowest level of the 3-level data abstraction architecture. It
describes how the data is actually stored.
Logical level: This is the middle level of the 3-level data abstraction architecture. It
describes what data is stored in the database.
View level: Highest level of data abstraction. This level describes the user interaction
with the database system.
At the physical level these records can be described as blocks of storage (bytes,
gigabytes, terabytes, etc.) in memory. These details are often hidden from the
programmers.
At the logical level these records can be described as fields and attributes along with
their data types, and the relationships among them can be logically implemented.
Programmers generally work at this level because they are aware of such details about
database systems.
At the view level, users just interact with the system with the help of a GUI and enter
details at the screen; they are not aware of how the data is stored or what data is stored,
as such details are hidden from them.
Data independence means that "the application is independent of the storage
structure and access strategy of data". In other words, the ability to modify the schema
definition at one level should not affect the schema definition at the next higher level.
Two types of Data Independence:
1. Physical Data Independence: Modification at the physical level should not affect the
logical level.
2. Logical Data Independence: Modification at the logical level should not affect the view
level.
✔ Failures may leave the database in an inconsistent state with partial updates carried out.
✔ E.g., a transfer of funds from one account to another should either complete or not
happen at all.
✔ Concurrent access by multiple users must be controlled.
Traditionally, manual files were used to store all internal and external data within an
organization. These files were stored in cabinets and, for security purposes, the cabinets
were locked or located in a secure area. When any information is needed, you may have to
search starting from the first page until you find the information that you are looking for.
To speed up the searching process, you may create an indexing system to help you locate the
information that you are looking for quickly. You may have such a system to store all your
records or important documents.
The manual filing system works well if the number of items stored is not large. However, this
kind of system may fail if you want to cross-reference or process any of the information
in the files. Then computer-based data processing emerged, and it replaced the traditional
filing system with computer-based data processing systems, or file-based systems. However,
instead of having a centralized store for the organization's operational data, a
decentralized approach was taken. In this approach, each department would have its own
file-based system which it would monitor and control separately.
File processing system at Make-Believe real estate company: Make-Believe real estate
company has three departments, that is, Sales, Contract and Personnel. Each of these
departments is physically located in the same building, but on a separate floor, and each has
its own file-based system. The function of the Sales department is to sell and rent
properties. The function of the Contract department is to handle the lease agreements
associated with properties for rent. The function of the Personnel department is to store
information about the staff. The figure illustrates the file-based system for the
Make-Believe real estate company. Each department has its own application program that
handles similar operations like data entry, file maintenance and generation of reports.
2. What are the anomalies related to traditional file processing systems?
Data Redundancy: Data redundancy means the same information is duplicated in several files.
Data Inconsistency: Data inconsistency means different copies of the same data do not match;
that is, different versions of the same basic data exist. This occurs as a result of update
operations that do not update the same data stored at different places.
Difficulty in Accessing Data: It is not easy to retrieve information using a conventional
file processing system; convenient and efficient information retrieval is almost impossible.
Data Isolation: Data are scattered in various files, and the files may be in different
formats, so writing new application programs to retrieve data is difficult.
Integrity Problems: The data values may need to satisfy some integrity constraints. For
example, the balance field value must be greater than 5000. In file processing systems we
have to handle this through program code, but in a database we can declare the integrity
constraints along with the definition itself.
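A sketch of such a declarative constraint, assuming a simple account table:

    -- The DBMS itself rejects any row or update that would make the
    -- balance fall to 5000 or below; no application code is needed
    CREATE TABLE account (
        account_no CHAR(5) PRIMARY KEY,
        balance    DECIMAL(10, 2) CHECK (balance > 5000)
    );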
Atomicity Problem: It is difficult to ensure atomicity in a file processing system. For
example, consider transferring $100 from account A to account B. If a failure occurs during
execution, there could be a situation where $100 is deducted from account A but not credited
to account B.
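In a DBMS the transfer is wrapped in a transaction, so either both updates happen or neither
does; a sketch using standard SQL transaction statements and the account table assumed above:

    BEGIN; -- some dialects write START TRANSACTION
    UPDATE account SET balance = balance - 100 WHERE account_no = 'A-101';
    UPDATE account SET balance = balance + 100 WHERE account_no = 'A-102';
    COMMIT; -- both updates become permanent together; a failure
            -- before COMMIT causes both to be rolled back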
Concurrent Access Anomalies: If multiple users update the same data simultaneously, it may
result in an inconsistent data state. In a file processing system it is very difficult to
handle this using program code; this results in concurrent access anomalies.
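A sketch of how a DBMS serializes such updates with locking (SELECT ... FOR UPDATE is
supported by several SQL dialects; the account table is the assumed one above):

    BEGIN;
    -- Lock the row: a concurrent transaction that tries to read the
    -- same balance FOR UPDATE must wait instead of overwriting our change
    SELECT balance FROM account WHERE account_no = 'A-101' FOR UPDATE;
    UPDATE account SET balance = balance - 100 WHERE account_no = 'A-101';
    COMMIT;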
Security Problems: Enforcing Security Constraints in file processing system is very difficult as
the application programs are added to the system in an ad-hoc manner.
3. What are the disadvantages of File Processing System? What are the
advantages of DBMS?
Ans: The database approach offers a number of potential advantages compared to
traditional file processing systems.
3. Difficulty in accessing data: In classical file organization the data is stored in
files. Whenever data has to be retrieved as per new requirements, a new application
program has to be written. This is a tedious process.
4. Data isolation: - Since data is scattered in various files, and files may be in different
formats, it is difficult to write new application programs to retrieve the appropriate data.
7. Integrity Problem: - The data values stored in the database must satisfy certain types
of consistency constraints. For example, the balance of a bank account may never fall
below a prescribed amount. These constraints are enforced in the system by adding
appropriate code in the various application programs. However, when new constraints
are added, it is difficult to change the programs to enforce them. The problem is
compounded when constraints involve several data items from different files.
There are two important reasons that database applications can often be developed
much more rapidly than conventional file applications.
a) Assuming that the database and the related data capture and maintenance
applications have already been designed and implemented, the programmer can
concentrate on the specific functions required for the new application, without having to
worry about file design or low-level implementation details.
Disadvantages:-
1. It occupies a large amount of space, since it is generic.
3. More complex and expensive hardware and software resources are needed.
The use of external views representing subsets of the database has some important
advantages:
It makes it easy to identify specific data required to support each business unit’s operations.
It makes the designer’s job easy by providing feedback about the model’s adequacy.
Specifically, the model can be checked to ensure that it supports all processes as defined by
their external models, as
well as all operational requirements and constraints.
It helps to ensure security constraints in the database design. Damaging an entire database is
more difficult when each business unit works with only a subset of data.
Having identified the external views, a conceptual model is used, graphically represented by
an ERD, to integrate all external views into a single view. The conceptual model represents a
global view of the entire database as viewed by the entire organization. That is, the
conceptual model integrates all external views (entities, relationships, constraints, and
processes) into a single global view of the data in the enterprise. The conceptual model yields
some very important advantages.
First, it provides a relatively easily understood bird's-eye (macro level) view of the data
environment.
Second, the conceptual model is independent of both software and hardware. Software
independence means that the model does not depend on the DBMS software used to
implement the model.
Hardware independence means that the model does not depend on the hardware used in the
implementation of the model. Therefore, changes in either the hardware or the DBMS
software will have no effect on the database design at the conceptual level. Generally, the
term logical design is used to refer to the task of creating a conceptual data model that could
be implemented in any DBMS.
The Physical Model: The physical model operates at the lowest level of abstraction,
describing the way data are saved on storage media such as disks or tapes. The storage
structures used are dependent on the software (the DBMS and the operating system) and on
the type of storage devices that the computer can handle. The precision required in the
physical model’s definition demands that database designers who work at this level have a
detailed knowledge of the hardware and software used to implement the database design. As
noted earlier, the physical model is dependent on the DBMS, methods of accessing files, and
types of hardware storage devices supported by the operating system. When you can change
the physical model without affecting the internal model, you have physical independence.
Therefore, a change in storage devices or methods and even a change in operating system will
not affect the internal model.
• Constraints are derived from the rules in the mini-world that the database represents.
First of all, there are always very special types of data which require special forms of
representation and/or inference. Some examples of such limitations regarding special forms
of data are the following:
• Temporal data
• Spatial data
• Multimedia data
• Unstructured data (warehousing/mining)
• Document libraries (digital libraries)
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is
called a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization,
the database administrator can regulate which parts of the database various users can
access. The authorization information is kept in a special system structure that the
database system consults whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator's routine maintenance
activities are: periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding; ensuring that
enough free disk space is available for normal operations, and upgrading disk space as
required; and monitoring jobs running on the database and ensuring that performance is
not degraded by very expensive tasks submitted by some users.
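Granting and revoking authorization is itself done with SQL; a sketch with a hypothetical
user clerk and the assumed student table:

    -- The clerk may read and insert student rows but nothing else
    GRANT SELECT, INSERT ON student TO clerk;

    -- Withdraw part of that access when it is no longer needed
    REVOKE INSERT ON student FROM clerk;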
8. How different users interact with the database system?
Ans: A primary goal of a database system is to retrieve information from and store new
information in the database. People who work with a database can be categorized as database
users or database administrators.
There are four different types of database-system users, differentiated by the way they
expect to interact with the system: naive users, application programmers, sophisticated
users, and specialized users. Different types of user interfaces have been designed for the
different types of users; these categories are described in detail in the Database Users
section above.
Other problems arise with file management systems. The first problem is data redundancy:
as applications and their data files were created by different programmers over a period of
time, the same data could be duplicated in several files. In the university example, each
data file will contain records about students, many of whom will be represented in other
data files. Therefore, student files in the aggregate will contain some amount of duplicate
data. This wastes physical computer storage media, the students' time and effort, and the
clerks' time needed to enter and maintain the data.
Data redundancy leads to the potential for data inconsistency. Data inconsistency means
that the actual values across various copies of the data no longer agree or are not
synchronized. For example, if a student changes his or her address, the new address must be
changed across all applications in the university that require the address.
File organization also leads to difficulty in accessing data from different applications, a
problem called data isolation. With applications uniquely designed and implemented, data
files are likely to be organized differently, stored in different formats (e.g., height in
inches versus height in centimeters), and often physically inaccessible to other
applications. In the university example, an administrator who wanted to know which students
taking advanced courses were also starting players on the football team would most likely
not be able to get the answer from the computer-based file system. He or she would probably
have to manually compare printed output data from two data files. This process would take a
great deal of time and effort and would ignore the greatest strengths of computers: fast and
accurate processing.