Unit-I Notes DBMS


Introduction to Database:

A database is a collection of related data. A database management system (DBMS) is
software designed to assist in the maintenance and utilization of large collections of
data. The first general-purpose DBMS, the Integrated Data Store, was designed by
Charles Bachman in the early 1960s. Later in the 1960s, IBM brought out IMS, the
Information Management System. In 1970, Edgar Codd at IBM proposed the relational
model, the basis of the RDBMS. In the 1980s came SQL, the Structured Query
Language. From the 1980s to the 1990s there were further advances in DBMSs,
e.g. DB2 and ORACLE.

Data:
 Data is raw facts, figures, or entities.
 When activities in the organization take place, the effect of these activities needs
to be recorded; this recorded effect is known as data.

Information:
 Processed data is called information
 The purpose of data processing is to generate the information required for carrying
out the business activities.


In general, data management consists of the following tasks:


 Data capture: Which is the task associated with gathering the data as and when they
originate.
 Data classification: Captured data has to be classified based on the nature and
intended usage.
 Data storage: The segregated data has to be stored properly.
 Data arranging: It is very important to arrange the data properly.
 Data retrieval: Data will be required frequently for further processing, hence it is very
important to create some indexes so that data can be retrieved easily (see the sketch
after this list).
 Data maintenance: Maintenance is the task concerned with keeping the data up-to-
date.
 Data Verification: Before storing the data it must be verified for any error.
 Data Coding: Data will be coded for easy reference.
 Data Editing: Editing means re-arranging the data or modifying the data for
presentation.
 Data transcription: This is the activity where the data is converted from one form into
another.
 Data transmission: This is a function where data is forwarded to the place where it
would be used further.
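As a hedged illustration of the indexing idea mentioned under data retrieval, here is a
minimal SQL sketch; the customer table, its columns, and the index name are all
hypothetical, not from these notes:

    -- A hypothetical table and an index on its city column; the index lets
    -- the DBMS find matching rows without scanning the whole table.
    CREATE TABLE customer (
        cust_id   INTEGER PRIMARY KEY,
        cust_name VARCHAR(50),
        city      VARCHAR(30)
    );

    CREATE INDEX idx_customer_city ON customer (city);

    -- A query such as this can now be answered via the index:
    SELECT cust_name FROM customer WHERE city = 'Chennai';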

Metadata (meta data, or sometimes meta information) is "data about data", of any
sort in any media. An item of metadata may describe a collection of data including
multiple content items and hierarchical levels, for example a database schema. In data
processing, metadata is definitional data that provides information about or
documentation of other data managed within an application or environment. The term
should be used with caution as all data is about something, and is therefore metadata.
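As a small, hedged example of metadata in practice: most relational DBMSs expose
"data about data" through catalog views. In systems that implement the SQL-standard
information_schema (e.g. PostgreSQL or MySQL), the schema of the hypothetical
customer table from the earlier sketch can itself be queried:

    -- Query the metadata (data about data) rather than the data itself:
    -- list each column of the 'customer' table and its data type.
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'customer';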

Database
 Database may be defined in simple terms as a collection of data
 A database is a collection of related data.
 The database can be of any size and of varying complexity.
 A database may be generated and maintained manually or it may be computerized

Database Management System


 A Database Management System (DBMS) is a collection of programs that enables users
to create and maintain a database.
 The DBMS is hence a general-purpose software system that facilitates the processes of
defining, constructing, and manipulating databases for various applications.

Characteristics of DBMS
 To incorporate the requirements of the organization, the system should be designed for
easy maintenance.
 Information systems should allow interactive access to data to obtain new information
without writing fresh programs.
 System should be designed to co-relate different data to meet new requirements.
 An independent central repository, which gives information and meaning of available
data is required.
 Integrated database will help in understanding the inter-relationships between data
stored in different applications.
 The stored data should be made available for access by different users simultaneously.
 Automatic recovery feature has to be provided to overcome the problems with
processing system failure.

DBMS Utilities
 A data loading utility: allows easy loading of data from an external format
without writing programs.
 A backup utility: allows making copies of the database periodically, to help in
case of crashes and disasters.
 A recovery utility: allows reconstructing the correct state of the database from a
backup and the history of transactions.
 Monitoring tools: monitor performance so that the internal schema can be
changed and database access can be optimized.
 A file organization utility: allows restructuring data from one organization to another.

Difference between File system & DBMS


File System:
1. A file system is a collection of data. For any management of that data, the user has to
write the procedures.
2. A file system exposes the details of data representation and storage of data.
3. In a file system, storing and retrieving data cannot be done efficiently.
4. Concurrent access to the data in a file system has many problems, such as one user
reading a file while another is deleting or updating some of its information.
5. A file system does not provide a crash recovery mechanism.
E.g., if the system crashes while we are entering some data into a file, the contents of
the file may be lost.
6. Protecting a file under a file system is very difficult.

DBMS:
1. A DBMS is a collection of data, and the user is not required to write procedures for
managing the database.
2. A DBMS provides an abstract view of the data that hides these details.
3. A DBMS is efficient to use, since a wide variety of sophisticated techniques are
available to store and retrieve the data.
4. A DBMS takes care of concurrent access using some form of locking (see the sketch
below).
5. A DBMS has a crash recovery mechanism that protects users from the effects of
system failures.
6. A DBMS has a good protection mechanism.
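A hedged SQL sketch of point 4, in dialects that support SELECT ... FOR UPDATE
(e.g. PostgreSQL, MySQL, Oracle); the account table here is hypothetical:

    -- Session 1: lock the row, then update it. A concurrent session that
    -- issues the same SELECT ... FOR UPDATE waits until this COMMIT.
    BEGIN;
    SELECT balance FROM account WHERE account_no = 'A-111' FOR UPDATE;
    UPDATE account SET balance = balance - 50 WHERE account_no = 'A-111';
    COMMIT;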
Database Users:
 There are four different types of database-system users, differentiated by the way they
expect to interact with the system. Different types of user interfaces have been
designed for the different types of users.
 Naive users are unsophisticated users who interact with the system by invoking one of
the application programs that have been written previously.
 For example, a bank teller who needs to transfer $50 from account A to
account B invokes a program called transfer. This program asks the teller
for the amount of money to be transferred, the account from which the
money is to be transferred, and the account to which the money is to be
transferred.
 As another example, consider a user who wishes to find her account
balance over the World Wide Web. Such a user may access a form, where
she enters her account number. An application program at the Web server
then retrieves the account balance, using the given account number, and
passes this information back to the user. The typical user interface for
naive users is a forms interface, where the user can fill in appropriate
fields of the form. Naive users may also simply read reports generated
from the database.

 Application programmers are computer professionals who write application
programs. Application programmers can choose from many tools to develop user
interfaces. Rapid application development (RAD) tools are tools that enable an
application programmer to construct forms and reports without writing a program.
There are also special types of programming languages that combine imperative
control structures (for example, for loops, while loops and if-then-else statements) with
statements of the data manipulation language. These languages, sometimes called
fourth-generation languages, often include special features to facilitate the generation
of forms and the display of data on the screen. Most major commercial database
systems include a fourth generation language.

 Sophisticated users interact with the system without writing programs. Instead, they
form their requests in a database query language. They submit each such query to a
query processor, whose function is to break down DML statements into instructions
that the storage manager understands. Analysts who submit queries to explore data in
the database fall in this category.

 Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them
view summaries of data in different ways. For instance, an analyst can see total sales by
region (for example, North, South, East, and West), or by product, or by a combination
of region and product (that is, total sales of each product in each region). The tools also
permit the analyst to select specific regions, look at data in more detail (for example,
sales by city within a region) or look at the data in less detail (for example, aggregate
products together by category).
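As a hedged SQL sketch of this kind of summarization (the sales table and its columns
are assumptions, not from the notes); GROUP BY ROLLUP is part of SQL:1999 and is
supported, with minor syntax variations, by most major systems:

    -- Total sales by region and product, plus per-region subtotals and a
    -- grand total (the extra rows that ROLLUP generates).
    SELECT region, product, SUM(amount) AS total_sales
    FROM sales
    GROUP BY ROLLUP (region, product);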

Another class of tools for analysts is data mining tools, which help them find certain
kinds of patterns in data.

 Specialized users are sophisticated users who write specialized database applications that
do not fit into the traditional data-processing framework.

Among these applications are computer-aided design systems, knowledge base
and expert systems, systems that store data with complex data types (for example,
graphics data and audio data), and environment-modeling systems.

Advantages of DBMS.
Due to its centralized nature, a database system can overcome the disadvantages of
a file-system-based system.

1. Data independence:
 Application programs should not be exposed to details of data representation and
storage.
 The DBMS provides an abstract view that hides these details.
2. Efficient data access:
 A DBMS utilizes a variety of sophisticated techniques to store and retrieve data
efficiently.
3. Data integrity and security:
 Since data is accessed through the DBMS, it can enforce integrity constraints.
E.g.: inserting salary information for an employee (see the sketch after this list).
4. Data administration:
 When users share data, centralizing the data is an important task. Experienced
professionals can minimize data redundancy and perform fine tuning, which reduces
retrieval time.
5. Concurrent access and crash recovery:
 The DBMS schedules concurrent access to the data and protects users from the effects
of system failure.
6. Reduced application development time:
 The DBMS supports important functions that are common to many applications.
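A hedged SQL sketch of point 3; the employee table and the particular salary rule are
hypothetical, chosen only to match the example above:

    -- The DBMS rejects any insert or update that violates the constraints,
    -- e.g. a missing name or a non-positive salary.
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name VARCHAR(50) NOT NULL,
        salary   NUMERIC(10, 2) CHECK (salary > 0)
    );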

Database Applications & Introduction to Different Data Models


Traditional Applications:

 Numeric and Textual Databases

More Recent Applications:

 Multimedia Databases
 Geographic Information Systems (GIS)
 Data Warehouses
 Real-time and Active Databases

Model:
A model is an abstraction process that hides superfluous details. Data modeling is used
for representing entities of interest and their relationships in the database. A data model
is a collection of concepts that can be used to describe the structure of a database, and it
provides the necessary means to achieve this abstraction. The structure of a database
here means the data types, relationships, and constraints that hold for the data:
 Data types
 Relationships
 Constraints

Types of Data Models


1. High level: conceptual data model.
2. Low level: physical data model.
3. Relational or representational data model.
4. Object-oriented data models.
5. Object-relational models.

1. High-level (conceptual) data model: the user-level data model is the high-level or
conceptual model. It provides concepts that are close to the way many users perceive data.
2. Low-level (physical) data model: provides concepts that describe the details of how data
is stored in the computer. A low-level data model is intended only for computer specialists,
not for end users.
3. Representational data model: lies between the high-level and low-level data models. It
provides concepts that may be understood by end users but that are not too far removed
from the way data is organized within the computer.
The most common data models are

1. Relational Model
The relational model uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a
unique name.

Figure (not reproduced here): a relational database comprising two tables, including a
Customer table; customers Preethi and Rocky share the same account number A-111.
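Since the original table figures are not reproduced, the following hedged SQL sketch
reconstructs the idea from the caption; all table and column names are assumptions:

    -- Hypothetical reconstruction of the two related tables.
    CREATE TABLE account (
        account_no VARCHAR(10) PRIMARY KEY,
        balance    NUMERIC(10, 2)
    );

    CREATE TABLE customer (
        cust_name  VARCHAR(50),
        account_no VARCHAR(10) REFERENCES account (account_no)
    );

    INSERT INTO account VALUES ('A-111', 500.00);
    INSERT INTO customer VALUES ('Preethi', 'A-111'); -- Preethi and Rocky
    INSERT INTO customer VALUES ('Rocky', 'A-111');   -- share account A-111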

Advantages:

1. The main advantage of this model is its ability to represent data in a simplified format.
2. The process of manipulating records is simplified with the use of certain key attributes
used to retrieve data.
3. Representation of different types of relationships is possible with this model.

2. Network Model

The data in the network model are represented by collections of records, and relationships
among data are represented by links, which can be viewed as pointers.
The records in the database are organized as collections of arbitrary graphs.

Advantages:
1. Representation of relationships between entities is implemented using pointers, which
allows the representation of arbitrary relationships.
2. Unlike the hierarchical model, it can represent many-to-many relationships easily.
3. Data manipulation can be done easily with this model.

3. Hierarchical Model
A hierarchical data model is a data model in which the data is organized into a tree-like
structure. The structure allows repeating information using parent/child relationships:
each parent can have many children but each child only has one parent. All attributes
of a specific record are listed under an entity type.

Advantages:
1. The representation of records is done using an ordered tree, which is natural method
of implementation of one–to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
3. Allows the use of virtual records. This results in a stable database, especially when
modification of the database is made.

4. Object-oriented Data Models


 Several models have been proposed for implementing object orientation in a database
system.
 One set comprises models of persistent O-O programming languages such as
C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE).
 Additionally, there are systems like O2, ORION (at MCC, later ITASCA), and IRIS
(at H.P., used in Open OODB).

5. Object-Relational Models

 The most recent trend; started with the Informix Universal Server.
 Relational systems incorporate concepts from object databases, leading to object-
relational systems.
 Object database standards: ODMG-93, ODMG version 2.0, ODMG version 3.0.
 Exemplified in the latest versions of Oracle, DB2, and SQL Server, among other
DBMSs.
 Standards included in SQL:1999 and expected to be enhanced in future SQL standards.

Concepts of Schema, Instance and Data Independence:

Schemas versus Instances


Database Schema:
The description of a database. Includes descriptions of the database structure,
data types, and the constraints on the database.

• Schema Diagram:
An illustrative display of (most aspects of) a database schema.
• Schema Construct:
A component of the schema or an object within the schema, e.g., STUDENT,
COURSE.

 Database State:
The actual data stored in a database at a particular moment in time. This includes
the collection of all the data in the database. Also called database instance (or
occurrence or snapshot)

• The term instance is also applied to individual database components, e.g. record instance,
table instance, entity instance

Database Schema vs. Database State

• Database State: Refers to the content of a database at a moment in time.

• Initial Database State: Refers to the database state when it is initially loaded into the
system.

• Valid State: A state that satisfies the structure and constraints of the database.

• Distinction: The database schema changes very infrequently, whereas the database
state changes every time the database is updated.

• Schema is also called intension


• State is also called extension
Example of a database schema and example of a database state (figures not reproduced
here):
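A minimal hedged SQL sketch of the distinction; the student table and its rows are
hypothetical:

    -- Schema (intension): the description, which changes very infrequently.
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    VARCHAR(50)
    );

    -- State (extension): the stored data at a moment in time; it changes
    -- with every update.
    INSERT INTO student VALUES (1, 'Preethi');
    INSERT INTO student VALUES (2, 'Rocky');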


Data Independence

Data independence can be defined as the capacity to change the schema at one
level without changing the schema at the next higher level. There are two types of data
independence. They are
1. Logical data independence.
2. Physical data independence.

1. Logical data independence is the capacity to change the conceptual schema without
having to change the external schema.
2. Physical data independence is the capacity to change the internal schema without
changing the conceptual schema.
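A hedged SQL illustration of logical data independence, reusing the hypothetical
student table from the sketch above: an external view can stay stable while the
conceptual schema changes.

    -- External schema: applications query the view, not the base table.
    CREATE VIEW student_names AS
        SELECT roll_no, name FROM student;

    -- Conceptual schema change: a column is added to the base table.
    ALTER TABLE student ADD COLUMN email VARCHAR(80);

    -- The view, and programs written against it, need not change.
    SELECT name FROM student_names;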

Three tier schema Architecture for data Independence

Database management systems are complex software’s which were often developed and
optimized over years. From the view of the user, however, most of them have a quite similar
basic architecture. The discussion of this basic architecture shall help to understand the
connection with data modeling and the introduction to this module postulated 'data
independence' of the database approach.
Three-Scheme Architecture

Knowing about the conceptual and the derived logical scheme (discussed in the unit Database Models,
Schemes and Instances), this unit explains two additional schemes - the external scheme and the internal
scheme - which help to understand the DBMS architecture.

External Scheme:
An external data scheme describes the information about the user view of specific users (single
users and user groups) and the specific methods and constraints connected with this information.
Internal Scheme:
The internal data scheme describes the content of the data and the required service functionality
which is used for the operation of the DBMS.
Therefore, the internal scheme describes the data from a view very close to the computer or system
in general. It completes the logical scheme with data technical aspects like storage methods or help
functions for more efficiency.

This representation is also called the three-schemes architecture: internal, logical and
external scheme.

While the internal scheme describes the physical grouping of the data and the use of the storage space,
the logical scheme (derived from the conceptual scheme) describes the basic construction of the data
structure. The external scheme of a specific application, generally, only highlights that part of the
logical scheme which is relevant for its application. Therefore, a database has exactly one internal and
one logical scheme but may have several external schemes for several applications using this
database.
The aim of the three-schemes architecture is the separation of the user applications from the
physical database, the stored data. Physically, the data exists only on the internal level, while other
forms of representation are calculated or derived from it when needed. The DBMS has the task of
realizing the mapping between each of these levels.

Data Independence
With knowledge about the three-schemes architecture, the term data independence can be explained as
follows: each higher level of the data architecture is immune to changes of the next lower level of the
architecture.
Physical Independence:
Therefore, the logical scheme may stay unchanged even though the storage space or type of some
data is changed for reasons of optimization or reorganization.
Logical Independence:
Also the external scheme may stay unchanged for most changes of the logical scheme. This is
especially desirable as in this case the application software does not need to be modified or newly
translated.

Database System Structure, Environment

A database system is partitioned into modules that deal with each of the responsibilities of
the overall system. The functional components of a database system can be broadly
divided into the storage manager and the query processor components.

The storage manager is important because databases typically require a large amount of
storage space. Corporate databases range in size from hundreds of gigabytes to, for the
largest databases, terabytes of data. A gigabyte is 1000 megabytes (1 billion bytes), and a
terabyte is 1 million megabytes (1 trillion bytes). Since the main memory of computers
cannot store this much information, the information is stored on disks. Data are moved
between disk storage and main memory as needed. Since the movement of data to and
from disk is slow relative to the speed of the central processing unit, it is imperative that
the database system structure the data so as to minimize the need to move data between
disk and main memory.

The query processor is important because it helps the database system simplify and
facilitate access to data. High-level views help to achieve this goal; with them, users of
the system are not burdened unnecessarily with the physical details of the
implementation of the system. However, quick processing of updates and queries is
important. It is the job of the database system to translate updates and queries written in a
nonprocedural language, at the logical level, into an efficient sequence of operations at
the physical level.

Storage Manager

A storage manager is a program module that provides the interface between the low-level
data stored in the database and the application programs and queries submitted to
the system. The storage manager is responsible for the interaction with the file manager.
The raw data are stored on the disk using the file system, which is usually provided by a
conventional operating system. The storage manager translates the various DML
statements into low-level file-system commands. Thus, the storage manager is
responsible for storing, retrieving, and updating data in the database.

The storage manager components include:

Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.

Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without
conflicting.

File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.

Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than
the size of main memory.

The storage manager implements several data structures as part of the physical system
implementation:

Data files, which store the database itself.

Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.

Indices, which provide fast access to data items that hold particular values.

The Query Processor


The query processor components include
DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.

DML compiler, which translates DML statements in a query language into an evaluation plan consisting of
low-level instructions that the query evaluation engine understands.

A query can usually be translated into any of a number of alternative evaluation
plans that all give the same result. The DML compiler also performs query
optimization, that is, it picks the lowest cost evaluation plan from among the
alternatives.
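Many systems let you inspect the chosen plan. As a hedged example, in PostgreSQL
(other DBMSs have similar commands), EXPLAIN shows the plan the optimizer picked
for the hypothetical customer table used earlier:

    -- Ask the optimizer which plan it chose; with an index on city it may
    -- report an index scan instead of a sequential scan.
    EXPLAIN SELECT cust_name FROM customer WHERE city = 'Chennai';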

Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
The figure shows these components and the connections among them.

Application Architectures
Most users of a database system today are not present at the site of the database system,
but connect to it through a network. We can therefore differentiate between client
machines, on which remote database users work, and server machines, on which the
database system runs.

Database applications are usually partitioned into two or three parts, as in the figure. In
a two-tier architecture, the application is partitioned into a component that resides at the
client machine, which invokes database system functionality at the server machine through
query language statements. Application program interface standards like ODBC and JDBC
are used for interaction between the client and the server.

In contrast, in a three-tier architecture, the client machine acts as merely a front end
and does not contain any direct database calls. Instead, the client end communicates with
an application server, usually through a forms interface. The application server in turn
communicates with a database system to access data. The business logic of the
application, which says what actions to carry out under what conditions, is embedded in
the application server, instead of being distributed across multiple clients. Three-tier
applications are more appropriate for large applications, and for applications that run on
the World Wide Web.

Data processing drives the growth of computers, as it has from the earliest days of
commercial computers. In fact, automation of data processing tasks predates computers.
Punched cards, invented by Hollerith, were used at the very beginning of the twentieth
century to record U.S. census data, and mechanical systems were used to process the cards
and tabulate results. Punched cards were later widely used as a means of entering data into
computers.

Centralized and Client and Server architecture for the database

Centralized Systems
 Run on a single computer system and do not interact with other computer systems.
 General-purpose computer system: one to a few CPUs and a number of device
controllers that are connected through a common bus that provides access to shared
memory.
 Single-user system (e.g., personal computer or workstation):desk-top unit, single
user, usually has only one CPU and one or two hard disks; the OS may support only
one user.
 Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS.
Serves a large number of users who are connected to the system via terminals. Often
called server systems.

Client-Server Systems
Server systems satisfy requests generated at client systems, whose general structure is
shown below:

 Database functionality can be divided into:


1. Back-end: manages access structures, query evaluation and optimization,
concurrency control and recovery.
2. Front-end: consists of tools such as forms, report-writers, and graphical user
interface facilities.
The interface between the front-end and the back-end is through SQL
or through an application program interface.
 Advantages of replacing mainframes with networks of workstations or personal
computers connected to back-end server machines:
1. Better functionality for the cost
2. Flexibility in locating resources and expanding facilities
3. Better user interfaces
4. Easier maintenance
 Server systems can be broadly categorized into two kinds:

1. Transaction servers, which are widely used in relational database systems, and
2. Data servers, used in object-oriented database systems.

Transaction Servers
Also called query server systems or SQL server systems; clients send requests to
the server system where the transactions are executed, and results are shipped
back to the client.
 Requests specified in SQL, and communicated to the server through a remote
procedure call (RPC) mechanism.
Transactional RPC allows many RPC calls to collectively form a transaction.
 Open Database Connectivity (ODBC) is an application program interface
standard from Microsoft for connecting to a server, sending SQL requests, and
receiving results.
Data Servers
 Used in LANs, where there is a very high speed connection between the clients and
the server, the client machines are comparable in processing power to the server
machine, and
the tasks to be executed are compute intensive.
 Ship data to client machines where processing is performed, and then ship results
back to the server machine.
 This architecture requires full back-end functionality at the clients.
 Used in many object-oriented database systems

Objective type of questions (Very short notes)

1. What do you mean by data inconsistency?


Ans: Data inconsistency exists when different and conflicting versions of the same data
appear in different places. Data inconsistency creates unreliable information, because it
will be difficult to determine which version of the information is correct.

Data inconsistency is likely to occur when there is data redundancy. Data redundancy
occurs when the data file/database file contains redundant, unnecessarily duplicated
data. That is why one major goal of good database design is to eliminate data
redundancy.

2. What is the importance of atomicity?


Ans: Atomicity is a property of database systems dictating that a transaction must be
all-or-nothing. That is, the transaction must either fully happen, or not happen at all. It
must not complete partially.
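A hedged SQL sketch of atomicity, reusing the hypothetical account table from the
earlier sketches:

    -- Both updates succeed or fail together; a crash or ROLLBACK before
    -- COMMIT leaves no half-finished transfer visible.
    BEGIN;
    UPDATE account SET balance = balance - 100 WHERE account_no = 'A';
    UPDATE account SET balance = balance + 100 WHERE account_no = 'B';
    COMMIT;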

3. What does durability mean?


Ans: In database systems, durability is the ACID property which guarantees that
transactions that have committed will survive permanently. For example, if a flight
booking reports that a seat has successfully been booked, then the seat will remain
booked even if the system crashes.

4. What are the disadvantages of a File Processing System?


Ans: 1) Data redundancy and inconsistency.
2) Difficulty in accessing data.
3) Data isolation.
4) Data integrity.
5) Concurrent access is not possible.
6) Security Problems.
5. What are the physical, logical, and view levels of data abstraction?

Ans: We have three levels of abstraction:


Physical level: This is the lowest level of data abstraction. It describes how data is
actually stored in the database. You can get the complex data-structure details at this level.

Logical level: This is the middle level of the 3-level data abstraction architecture. It
describes what data is stored in the database.

View level: This is the highest level of data abstraction. It describes the user's interaction
with the database system.

6. Why does a database system offer different levels of abstraction?

Ans: At the physical level these records can be described as blocks of storage (bytes,
gigabytes, terabytes, etc.) in memory. These details are often hidden from the
programmers.

At the logical level these records can be described as fields and attributes along with
their data types, and their relationships among each other can be logically implemented.
The programmers generally work at this level because they are aware of such things
about database systems.

At the view level, users just interact with the system with the help of a GUI and enter
details on the screen; they are not aware of how the data is stored or what data is stored;
such details are hidden from them.

7. What is a data model?


Ans: A data model is a relatively simple representation, usually graphical, of complex
real-world data structures, and a communication tool to facilitate interaction among the
designer, the applications programmer, and the end user.

8. What is the importance of data model?


Ans: End users have different views of and needs for data, and a data model organizes
data for the various users.

9. What is Data Independence?

Ans: Data independence means that "the application is independent of the storage
structure and access strategy of data". In other words, The ability to modify the schema
definition in one level should not affect the schema definition in the next higher level.
Two types of Data Independence:

1. Physical Data Independence: Modification at the physical level should not affect the
logical level.
2. Logical Data Independence: Modification at the logical level should not affect the view
level.

10. State the role of buffer manager.


Ans: The buffer manager intelligently shuffles data between main memory and disk; it is
transparent to higher levels of DBMS operation.

11. Explain five duties of Database Administrator.


Ans:
 Deciding the information content of the database
 Deciding the storage structure and access strategy
 Liaising with the users
 Defining authorization checks and validation procedures
 Defining a strategy for backup and recovery
 Monitoring performance and responding to changes in requirements
12. What is the purpose of Transaction Manager?
Ans: A user’s program may carry out many operations on the data retrieved from the
database, but the DBMS is only concerned about what data is read/written from/to the
database. Transaction Manager controls the execution of transactions.

13. State the function of the file manager.


Ans: The file manager manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.

14. Define the terms 1) physical schema 2) logical schema.


Ans: Physical schema: The physical schema describes the database design at the physical
level, which is the lowest level of abstraction describing how the data are actually stored.
Logical schema: The logical schema describes the database design at the logical level, which
describes what data are stored in the database and what relationship exists among the data.

1. How does a conventional file processing system work?


Ans: A file-based system is a collection of application programs that perform services for
end users, such as student reports for the academic office and lecturer reports for the dean's
office. Each program defines and manages its own data.

Traditionally, manual files were used to store all internal and external data within an
organization. These files were stored in cabinets and, for security purposes, the cabinets
were locked or located in a secure area. When any information was needed, you might have
to search starting from the first page until you found the information that you were looking
for. To speed up the searching process, you might create an indexing system to help you
locate the information quickly. You may have such a system storing all your results or
important documents.

The manual filing system works well if the number of items stored is not large. However, this
kind of system may fail if you want to do a cross-reference or process any of the information
in the file. Computer-based data processing then emerged, and it replaced the traditional
filing system with a computer-based data processing system, or file-based system. However,
instead of having a centralized store for the organization's operational data, a decentralized
approach was taken. In this approach, each department has its own file-based system, which
it monitors and controls separately.

Let's refer to the following example.

File processing system at the Make-Believe real estate company: the Make-Believe real
estate company has three departments, namely Sales, Contract, and Personnel. Each of these
departments is physically located in the same building, but on a separate floor, and each has
its own file-based system. The function of the Sales department is to sell and rent properties.
The function of the Contract department is to handle the lease agreements associated with
properties for rent. The function of the Personnel department is to store information about
the staff. The figure illustrates the file-based system for the Make-Believe real estate
company. Each department has its own application programs that handle similar operations
like data entry, file maintenance, and generation of reports.
2. What are the anomalies related to traditional file processing systems?

Ans: The anomalies related to traditional file processing systems are:

Data Redundancy: Data redundancy means the same information is duplicated in several
files.

Data Inconsistency: Data inconsistency means different copies of the same data do not
match, i.e., different versions of the same basic data exist. This occurs as the result of
update operations that do not update the same data stored at different places.

Example: address information of a customer is recorded differently in different files.

Difficulty in Accessing Data: It is not easy to retrieve information using a conventional file
processing system. Convenient and efficient information retrieval is almost impossible using
a conventional file processing system.

Data Isolation: Data are scattered in various files, and the files may be in different formats,
so writing new application programs to retrieve data is difficult.

Integrity Problems: The data values may need to satisfy some integrity constraints. For
example, the balance field value must be greater than 5000. We have to handle this through
program code in file processing systems, but in a database we can declare the integrity
constraints along with the schema definition itself.

Atomicity Problem: It is difficult to ensure atomicity in a file processing system. For
example, consider transferring $100 from account A to account B. If a failure occurs during
execution, there could be a situation where $100 is deducted from account A but not credited
to account B.

Concurrent Access Anomalies: If multiple users update the same data simultaneously, it can
result in an inconsistent data state. In a file processing system it is very difficult to handle
this using program code. This results in concurrent access anomalies.

Security Problems: Enforcing security constraints in a file processing system is very
difficult, as the application programs are added to the system in an ad hoc manner.

3. What are the disadvantages of File Processing System? What are the
advantages of DBMS?
Ans: The database approach offers a number of potential advantages compared to
traditional file processing systems.

1. Program-Data Independence: The separation of data descriptions from the


application programs that use the data is called data independence. With the database
approach, data descriptions are stored in a central location called the repository. This
property of database systems allows an organization's data to change without changing
the application programs that process the data.
2. Data Redundancy and Inconsistency: In a file-processing system, files may have
different formats, and application programs may be created by different programmers.
Similarly, different programs may be written in several programming languages. The
same information may be placed in different files, which causes redundancy and
inconsistency and, consequently, higher storage and access costs. For example, the address
and telephone number of a person may exist in two files containing savings account records
and checking account records. A change in the person's address may then be reflected in the
savings account records but nowhere else in the whole system. This results in data
inconsistency. One solution to avoid this data redundancy is to replace the multiple copies
of the same information with a system where the address and telephone number are stored
in just one place physically, while remaining accessible to all applications from there. A
DBMS can handle data redundancy and inconsistency.

3. Difficulty in accessing Data: - In classical file organization the data is stored in
files. Whenever data has to be retrieved as per the requirements, a new application
program has to be written. This is a tedious process.

4. Data isolation: - Since data is scattered in various files, and files may be in different
formats, it is difficult to write new application programs to retrieve the appropriate data.

5. Concurrent access: - There is no central control of data in classical file organization.


So, the concurrent access of data by many users is difficult to implement.

6. Security Problems: - Since there is no centralized control of data in classical file
organization, security enforcement is difficult in a file-processing system.

7. Integrity Problem: - The data values stored in the database must satisfy certain types
of consistency constraints. For example, the balance of a bank account may never fall
below a prescribed amount. These constraints are enforced in the system by adding
appropriate code in the various application programs. However, when new constraints
are added, it is difficult to change the programs to enforce them. The problem is
compounded when constraints involve several data items from different files.

8. Improved Data Sharing: - A database is designed as a shared corporate resource.


Authorized internal and external users are granted permission to use the database, and
each user is provided one or more user views to facilitate this use. A user view is a
logical description of some portion of the database that is required by a user to perform
some task.

9. Increased Productivity of Application Development: - A major advantage of the
database approach is that it greatly reduces the cost and time of developing new
business applications.

There are two important reasons that data base applications can often be developed
much more rapidly than conventional file applications.
a) Assuming that the database and the related data capture and maintenance
applications have already been designed and implemented, the programmer can
concentrate on the specific functions required for the new application, without having to
worry about file design or low-level implementation details.

b) The database management system provides a number of high-level productivity
tools, such as forms and reports generators and high-level languages that automate some
of the activities of database design and implementation.

Disadvantages:-
1. It occupies a larger amount of space, since it is generic.

2. It takes more time to access data.

3. More complex and expensive hardware and software resources are needed.

4. Sophisticated security measures must be implemented to prevent unauthorized access
of sensitive data in online storage.

4. Explain the different levels of data abstraction.


Ans: The database designer starts with an abstract view of the overall data environment
and adds details as the design comes closer to implementation. Using levels of
abstraction can also be very helpful in integrating multiple (and sometimes conflicting)
views of data as seen at different levels of an organization. The ANSI/SPARC
architecture (as it is often referred to) defines three levels of data abstraction: external,
conceptual, and internal. You can use this framework to better understand database
models, as shown in the figure.
The External Model:
The external model is the end users’ view of the data environment. The term end
users refers to people who use the application programs to manipulate the data and generate
information. End users usually operate in an environment in which an application has a
specific business unit focus. Companies are generally divided into several business units,
such as sales, finance, and marketing. Each business unit is subject to specific constraints and
requirements, and each one uses a data subset of the overall data in the organization.
Therefore, end users working within those business units view
their data subsets as separate from or external to other units within the organization.

The use of external views representing subsets of the database has some important
advantages:

It makes it easy to identify specific data required to support each business unit’s operations.

It makes the designer’s job easy by providing feedback about the model’s adequacy.
Specifically, the model can be checked to ensure that it supports all processes as defined by
their external models, as
well as all operational requirements and constraints.

It helps to ensure security constraints in the database design. Damaging an entire database is
more difficult when each business unit works with only a subset of data.

It makes application program development much simpler.

The Conceptual Model:

Having identified the external views, a conceptual model is used, graphically represented by
an ERD to integrate all external views into a single view. The conceptual model represents a
global view of the entire database as viewed by the entire organization. That is, the
conceptual model integrates all external views (entities, relationships, constraints, and
processes) into a single global view of the data in the enterprise. The conceptual model yields
some very important advantages.

First, it provides a relatively easily understood bird's-eye (macro-level) view of the data
environment.

Second, the conceptual model is independent of both software and hardware. Software
independence means that the model does not depend on the DBMS software used to
implement the model.

Hardware independence means that the model does not depend on the hardware used in the
implementation of the model. Therefore, changes in either the hardware or the DBMS
software will have no effect on the database design at the conceptual level. Generally, the
term logical design is used to refer to the task of creating a conceptual data model that could
be implemented in any DBMS.

The Internal Model:


Once a specific DBMS has been selected, the internal model maps the conceptual model to
the DBMS. The internal model is the representation of the database as “seen” by the DBMS.
In other words, the internal model requires the designer to match the conceptual model’s
characteristics and constraints to those of the selected implementation model. An internal
schema depicts a specific representation of an internal model, using the database constructs
supported by the chosen database. Because the internal model depends on specific database
software, it is said to be software dependent. Therefore, a change in the DBMS software
requires that the internal model be changed to fit the characteristics and requirements of the
implementation database model.

The Physical Model: The physical model operates at the lowest level of abstraction,
describing the way data are saved on storage media such as disks or tapes. The storage
structures used are dependent on the software (the DBMS and the operating system) and on
the type of storage devices that the computer can handle. The precision required in the
physical model’s definition demands that database designers who work at this level have a
detailed knowledge of the hardware and software used to implement the database design. As
noted earlier, the physical model is dependent on the DBMS, methods of accessing files, and
types of hardware storage devices supported by the operating system. When you can change
the physical model without affecting the internal model, you have physical independence.
Therefore, a change in storage devices or methods and even a change in operating system will
not affect the internal model.

5. Explain in brief object based and semi-structured databases.


Ans:
A semi-structured database (SSDB) is a collection of objects in which attribute labels may
be associated with values of different types, even within the same complex object. For this
reason, it is usually difficult or impossible to define a schema for the data that has a complete,
independent description of all objects in the database. Instead, each object typically has its
type annotated to itself, i.e., a "local schema". As a consequence, the actual schema of the
database, i.e., the union of all local schemas, is relatively large compared to the data alone,
thus blurring the line that separates data from metadata. Figure 1(a) depicts a textual
description of a hypothetical semi-structured database of bibliographic entries, used as a
running example. The example is not intended to cover all aspects of semi-structured data,
but rather to give a flavor of the kinds of constructs involved. In particular, note the irregular
type for book objects and the reference from the article object to the first book object, via
its oid.

Semi-structured data models: semi-structured databases are often modeled by rooted
directed graphs, in which vertices represent objects and edges represent relationships among
objects: attribute edges represent the attributes of the objects, while reference edges represent
references between objects (e.g., the cites attribute in our example). Reference edges are
useful for avoiding duplicating the description of objects. Figure 1(b) shows a graph that
models this example database.
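As a hedged sketch (not from the original notes) of how such irregular bibliographic
objects might look when stored, using PostgreSQL's JSONB type; all field names and
values here are assumptions:

    -- Each object carries its own "local schema"; note the irregular
    -- attributes of the two book objects, and the article's reference
    -- to the first book via its oid.
    CREATE TABLE bib (
        oid SERIAL PRIMARY KEY,
        obj JSONB
    );

    INSERT INTO bib (obj) VALUES
        ('{"type": "book", "title": "Foundations", "year": 1995}'),
        ('{"type": "book", "title": "Graphs", "authors": ["Codd", "Chen"]}'),
        ('{"type": "article", "title": "On Data", "cites": [1]}');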

6. Explain Database System Architecture.


Ans: Components of a DBMS
The functional units of a database system can be divided into two parts:
1. Query Processor Units(Components)
2. Storage Manager Units

Query Processor Units:


Query processor units deal with execution of DDL and DML statements.
 DDL Interpreter— Interprets DDL statements into a set of tables containing
metadata.
 DML Compiler— Translates DML statements into low level instructions that
the query evaluation engine understands.
 Embedded DML Pre-compiler—Converts DML statements embedded in an
application program into normal procedure calls in the host language.
 Query Evaluation Engine—Executes low level instructions generated by DML
compiler.

Storage Manager Units:


Storage manager units provide the interface between the low-level data stored in the
database and the application programs & queries submitted to the system.
 Authorization Manager—Checks the authority of users to access data.
 Integrity Manager—Checks for the satisfaction of the integrity constraints.
 Transaction Manager—Preserves atomicity and controls concurrency.
 File Manager—Manages allocation of space on disk storage.
 Buffer Manager—Fetches data from disk storage to memory for being used.
In addition to these functional units, several data structures are required to implement
the physical storage system. These are described below:
 Data Files— To store user data.
 Data Dictionary and System Catalog— To store metadata. It is used heavily,
almost for each and every data manipulation operation, so it should be accessed
efficiently.
 Indices— To provide faster access to data items.
 Statistical Data— To store statistical information about the data in the database.
This information is used by the query processor to select efficient ways to
execute a query.

7. Write short notes on the following:

(i) Relational Constraints


Ans: Constraints

•Restrictions on the permitted values in a database state

•Derived from the rules in the mini world that the database represents

 Inherent model-based constraints or implicit constraints
• Inherent in the data model
• e.g., duplicate tuples are not allowed in a relation
 Schema-based constraints or explicit constraints
• Can be directly expressed in schemas of the data model
• e.g., films have only one director (see the sketch below)
 Application-based or semantic constraints
• Also called business rules
• Not directly expressed in schemas
• Expressed and enforced by application programs
• e.g., this year’s salary increase can be no more than last year’s
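A hedged SQL sketch of schema-based (explicit) constraints, using the film example
from the notes; the tables and columns are hypothetical:

    -- "Films have only one director" as a schema constraint: director_id
    -- is a single column of film, so each film stores exactly one director.
    CREATE TABLE director (
        director_id INTEGER PRIMARY KEY,
        name        VARCHAR(50)
    );

    CREATE TABLE film (
        film_id     INTEGER PRIMARY KEY,   -- key also enforces the implicit
        title       VARCHAR(100) NOT NULL, -- "no duplicate tuples" constraint
        director_id INTEGER NOT NULL REFERENCES director (director_id)
    );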

(ii) Disadvantages of Relational Approach


Ans: It is convenient to divide the limitations up into two categories.

First of all, there are always very special types of data which require special forms of
representation
and/or inference. Some examples are the following.
Limitations regarding special forms of data:
 Temporal data
 Spatial data
 Multimedia data
 Unstructured data (warehousing/mining)
 Document libraries (digital libraries)

Limitations regarding SQL as the query language:


· Recursive queries (e.g., compute the ancestor relation from the parent relation):
· Although part of the SQL:1999 standard, recursive queries were long unsupported by many
systems (e.g., older PostgreSQL releases).
· Support for recursive queries in SQL:1999 is weak in any case (only so-called linear
queries are supported). A sketch follows.
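A hedged sketch of the ancestor example in SQL:1999 recursive syntax (supported by,
e.g., recent PostgreSQL releases), assuming a hypothetical parent_of(child, parent)
table:

    -- Compute the ancestor relation as the transitive closure of parent_of.
    WITH RECURSIVE ancestor (person, anc) AS (
        SELECT child, parent FROM parent_of   -- base case: direct parents
        UNION
        SELECT a.person, p.parent             -- recursive step: one more level
        FROM ancestor a
        JOIN parent_of p ON a.anc = p.child
    )
    SELECT person, anc FROM ancestor;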

(iii) Instances and Schemas


Ans: Independent of the database model, it is important to differentiate
between the description of the database and the database itself. The
description of the database is called the database scheme, or also metadata. The
database scheme is defined during the database design process and changes
very rarely afterwards.
The actual content of the database, the data, changes often over the years. A
database state at a specific time, defined through the currently existing
content and relationships and their attributes, is called a database instance.
A database scheme can thus be looked at as a template or building plan for one
or several database instances.
8. Describe the differences between data and database administration.
Ans: A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. This is a collection of related data with an implicit meaning
and hence is a database. The collection of data, usually referred to as the database, contains
information relevant to an enterprise. The primary goal of a DBMS is to provide a way to
store and retrieve database information that is both convenient and efficient. By data, we
mean known facts that can be recorded and that have implicit meaning. For example,
consider the names, telephone numbers, and addresses of the people you know. You may
have recorded this data in an indexed address book, or you may have stored it on a diskette,
using a personal computer and software such as DBASE IV or V, Microsoft ACCESS, or
EXCEL. A datum– a unit of data – is a symbol or a set of symbols which is used to represent
something. This relationship between symbols and what they represent is the essence of what
we mean by information. Hence, information is interpreted data – data supplied with
semantics. Knowledge refers to the practical use of information. While information can be
transported, stored or shared without many difficulties the same can not be said about
knowledge. Knowledge necessarily involves a personal experience. Referring back to the
scientific experiment, a third person reading the results will have information about it, while
the person who conducted the experiment personally will have knowledge about it. Database
systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the
manipulation of information. In addition, the database system must ensure the safety of the
information stored, despite system crashes or attempts at unauthorized access. If data are to
be shared among several users, the system must avoid possible anomalous results. Because
information is so important in most organizations, computer scientists have developed a large
body of concepts and techniques for managing data.

One of the main reasons for using a DBMS is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is
called a database administrator (DBA). The functions of a DBA include the following (a small
sketch of the first and last items follows this list):
 Schema definition: The DBA creates the original database schema by executing a set of
data-definition statements in the DDL.
 Storage structure and access-method definition.
 Schema and physical-organization modification: The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
 Granting of authorization for data access: By granting different types of authorization, the
DBA can regulate which parts of the database various users can access. The authorization
information is kept in a special system structure that the database system consults whenever
someone attempts to access the data in the system.
 Routine maintenance: Examples of the DBA's routine maintenance activities are
periodically backing up the database, either onto tapes or onto remote servers, to prevent
loss of data in case of disasters such as flooding; ensuring that enough free disk space is
available for normal operations, and upgrading disk space as required; and monitoring jobs
running on the database and ensuring that performance is not degraded by very expensive
tasks submitted by some users.
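As a rough sketch of a few of these tasks, assuming SQLite as the engine (SQLite has no GRANT statement, so the authorization step is shown only as a comment; systems such as PostgreSQL accept it as ordinary DDL):

import sqlite3

conn = sqlite3.connect(":memory:")

# Schema definition: the DBA issues DDL statements.
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")

# Schema modification: adapt the schema to the organization's changing needs.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")

# Granting of authorization is DDL in most systems, for example:
#   GRANT SELECT ON account TO teller;
# (not supported by SQLite, hence left as a comment here)

# Routine maintenance: a periodic backup to another file or remote server.
backup = sqlite3.connect("account_backup.db")
conn.backup(backup)
backup.close()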
8. How do different users interact with the database system?
Ans: A primary goal of a database system is to retrieve information from and store new
information in the database. People who work with a database can be categorized as database
users or database administrators.
Database Users and User Interfaces:
There are four different types of database-system users, differentiated by the way they expect
to interact with the system. Different types of user interfaces have been designed for the
different types of users.
Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs
to transfer $50 from account A to account B invokes a program called transfer. This program
asks the teller for the amount of money to be transferred, the account from which the money
is to be transferred, and the account to which the money is to be transferred. As another
example, consider a user who wishes to find her account balance over the World Wide Web.
Such a user may access a form, where she enters her account number. An application
program at the Web server then retrieves the account balance, using the given account
number, and passes this information back to the user. The typical user interface for naive
users is a forms interface, where the user can fill in appropriate fields of the form. Naive
users may also simply read reports generated from the database.
Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program. There are also special types of
programming languages that combine imperative control structures (for example, for loops,
while loops and if-then-else statements) with statements of the data manipulation language.
These languages, sometimes called fourth-generation languages, often
include special features to facilitate the generation of forms and the display of data on the
screen. Most major commercial database systems include a fourth-generation language.
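For instance, the transfer program invoked by the bank teller above might, in outline, be written as follows. This is only a sketch in Python with SQLite; the account table, its columns, and the transfer function are illustrative assumptions, and a production version would handle concurrency and errors far more carefully:

import sqlite3

def transfer(conn, from_acc, to_acc, amount):
    """Imperative control flow (if / raise) wrapped around DML statements."""
    cur = conn.cursor()
    cur.execute("SELECT balance FROM account WHERE acc_no = ?", (from_acc,))
    row = cur.fetchone()
    if row is None or row[0] < amount:
        raise ValueError("unknown account or insufficient funds")
    cur.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, from_acc))
    cur.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, to_acc))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 0.0)])
transfer(conn, 1, 2, 50.0)
print(conn.execute("SELECT * FROM account").fetchall())  # [(1, 50.0), (2, 50.0)]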
Sophisticated users interact with the system without writing programs. Instead, they form
their requests in a database query language. They submit each such query to a query
processor, whose function is to break down DML statements into instructions that the
storage manager understands. Analysts who submit queries to explore data in the database
fall in this category.
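For example, an analyst might type a query like the one below directly at a SQL prompt; here it is wrapped in a small Python driver so the sketch is runnable, and the account table and its contents are illustrative assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
             "branch TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(1, "Downtown", 900.0), (2, "Downtown", 50.0),
                  (3, "Uptown", 400.0)])

# The user phrases the request herself in the query language; the query
# processor breaks it into instructions the storage manager understands.
for row in conn.execute("SELECT branch, AVG(balance) "
                        "FROM account GROUP BY branch"):
    print(row)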
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view
summaries of data in different ways. For instance, an analyst can see total sales by region (for
example, North, South, East, and West), or by product, or by a combination of region and
product (that is, total sales of
each product in each region). The tools also permit the analyst to select specific regions, look
at data in more detail (for example, sales by city within a region) or look at the data in less
detail (for example, aggregate products together by category). Another class of tools for
analysts is data mining tools, which help them find certain kinds of patterns in data.
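Roll-up and drill-down can be approximated with plain GROUP BY queries, as in the sketch below; the sales table is an illustrative assumption, and dedicated OLAP tools offer much richer, interactive versions of these operations:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("North", "Pens", 120.0), ("North", "Books", 300.0),
                  ("South", "Pens", 80.0), ("South", "Books", 150.0)])

# Drill down: more detail -- total sales of each product in each region.
for row in conn.execute("SELECT region, product, SUM(amount) "
                        "FROM sales GROUP BY region, product"):
    print(row)

# Roll up: less detail -- aggregate the product dimension away.
for row in conn.execute("SELECT region, SUM(amount) "
                        "FROM sales GROUP BY region"):
    print(row)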
Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are
computer-aided design systems, knowledge base and expert systems, systems that store data
with complex data types (for example, graphics data and audio data), and
environment-modeling systems.
9. Explain, with a suitable example, the data redundancy and inconsistency problems.
Ans: Records can be arranged in several ways on a storage medium, and the arrangement
determines the manner in which individual records can be accessed. In sequential file
organization, data records must be retrieved in the same physical sequence in which they are
stored. (The operation is like a tape recorder.) In direct or random file organization, users can
retrieve records in any sequence, without regard to actual physical order on the storage
medium.
(The operation is like a CD drive.) Magnetic tape utilizes sequential file organization, whereas
magnetic disks use direct file organization. The indexed sequential access method (ISAM)
uses an index of key fields to locate individual records (see Figure T3.2). An index to a file
lists the key field of each record and where that record is physically located in storage.
Records are stored on disks in their key sequence. A track index shows the highest value of
the key field that can be found on a specific track. To locate a specific record, the track index
is searched to locate the cylinder and the track containing the record. The track is then
sequentially read to find the record.
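The track-index lookup can be sketched in a few lines of Python; the keys and track names below are invented for illustration, and a real ISAM index would of course live on disk next to the data file:

import bisect

# Track index: the highest key value stored on each track, in key order.
track_index = [(105, "track 0"), (220, "track 1"), (342, "track 2")]
highest_keys = [key for key, _ in track_index]

def locate_track(key):
    """Find the first track whose highest key is >= the search key;
    that track is then read sequentially to find the record."""
    i = bisect.bisect_left(highest_keys, key)
    if i == len(track_index):
        raise KeyError(key)
    return track_index[i][1]

print(locate_track(150))  # "track 1": it holds keys 106 through 220
print(locate_track(105))  # "track 0": 105 is that track's highest key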
The direct file access method uses the key field to locate the physical address of a record.
This process employs a mathematical formula called a transform algorithm to translate the
key field directly into the record’s storage location on disk. The algorithm performs a
mathematical calculation on the record key, and the result of that calculation is the record’s
address. The direct access method is most appropriate when individual records must be
located directly and rapidly for immediate processing, when a few records in the file need to
be retrieved at one time, and when the required records are found in no particular sequence.
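A common choice of transform algorithm is hashing, for example dividing the key by the number of storage buckets and taking the remainder. The sketch below is an in-memory caricature of such a hashed file; the bucket count, record layout, and helper names are all illustrative assumptions:

NUM_BUCKETS = 7  # illustrative; a real file would use far more buckets

def transform(key):
    """The transform algorithm: key field -> record's storage address."""
    return key % NUM_BUCKETS

# The "disk": one list of records per bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]

def store(record):
    buckets[transform(record["key"])].append(record)

def fetch(key):
    # Only the one computed bucket is examined; no sequential scan occurs.
    for record in buckets[transform(key)]:
        if record["key"] == key:
            return record
    return None

store({"key": 1042, "name": "Asha"})
print(transform(1042))  # the computed storage address (bucket 6)
print(fetch(1042))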
Organizations typically began automating one application at a time. These systems grew
independently, without overall planning. Each application required its own data, which were
organized into a data file. This approach led to redundancy, inconsistency, data isolation, and
other problems. Consider a university file environment as an example. The applications (e.g.,
registrar, accounting, or athletics) would share some common core functions, such as input,
registrar, accounting, or athletics) would share some common core functions, such as input,
report generation, querying, and data browsing. However, these common functions would
typically be designed, coded, documented, and tested, at great expense, for each application.
Moreover, users must be trained to use each application. File environments often waste
valuable resources creating and maintaining similar applications, as well as in training users
how to use them.
Other problems arise with file management systems. The first problem is data redundancy:
As applications and their data files were created by different programmers over a period of
time, the same data could be duplicated in several files. In the university example, each data
file will contain records about students, many of whom will be represented in other data files.
Therefore, student files in the aggregate will contain some amount of duplicate data. This
wastes physical computer storage media, the students’ time and effort, and the clerks’ time
needed to enter and maintain the data. Data redundancy leads to the potential for data
inconsistency. Data inconsistency means that the actual values across various copies of the
data no longer agree or are not synchronized. For example, if a student changes his or her
address, the new address must be changed across all applications in the university that require
the address. File organization also leads to difficulty in accessing data from different
applications, a problem called data isolation. With applications uniquely designed and
implemented, data files are likely to be organized differently, stored in different formats (e.g.,
height in inches versus height in centimeters), and often physically inaccessible to other
applications. In the university example, an administrator who wanted to know which students
taking advanced courses were also starting players on the football team would most likely not
be able to get the answer from the computer-based file system. He or she would probably
have to manually compare printed output data from two data files. This process would take a
great deal of time and effort and would ignore the
greatest strengths of computers—fast and accurate processing.
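The inconsistency itself can be demonstrated in a few lines of Python; the two dictionaries below stand in for the separately maintained data files of the registrar and athletics applications, with invented names and addresses:

# Two application-specific "files", each holding its own copy of the data.
registrar_file = {101: {"name": "Asha", "address": "12 Main St"}}
athletics_file = {101: {"name": "Asha", "address": "12 Main St"}}

# The student moves, but only the registrar's application is updated...
registrar_file[101]["address"] = "7 Park Rd"

# ...so the redundant copies now disagree: data inconsistency.
print(registrar_file[101]["address"])  # 7 Park Rd
print(athletics_file[101]["address"])  # 12 Main St (stale)

A DBMS avoids this by storing the address once, centrally, so that every application reads and updates the same copy.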