Database Unit-1 Notes
Database System Applications: Databases are widely used. Here are some representative
applications:
Banking: For customer information, accounts, loans, and banking transactions.
Airlines: For reservations and schedule information. Airlines were among the first to use databases
in a geographically distributed manner—terminals situated around the world accessed the central
database system through phone lines and other data networks.
Universities: For student information, course registrations, and grades.
Credit card transactions: For purchases on credit cards and generation of monthly statements.
Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
Finance: For storing information about holdings, sales, and purchases of financial instruments such
as stocks and bonds.
Sales: For customer, product, and purchase information.
Manufacturing: For management of supply chain and for tracking production of items in factories,
inventories of items in warehouses/stores, and orders for items.
Human resources: For information about employees, salaries, payroll taxes and benefits, and for
generation of paychecks.
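Keeping organizational information in a conventional file-processing system has several disadvantages,
including the following:
• Difficulty in accessing data. Suppose that one of the bank officers needs to find out the names of all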
customers who live within a particular postal-code area. The officer asks the data-processing department to
generate such a list. Because the designers of the original system did not anticipate this request, there is no
application program on hand to meet it. Conventional file-processing environments do not allow needed data
to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems are required for
general use.
• Data isolation. Because data are scattered in various files, and files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of consistency
constraints. For example, the balance of a bank account may never fall below a prescribed amount (say, Rs.
2500). Developers enforce these constraints in the system by adding appropriate code in the various
application programs. However, when new constraints are added, it is difficult to change the programs to
enforce them. The problem is compounded when constraints involve several data items from different files.
• Atomicity problems. A computer system, like any other mechanical or electrical device, is subject to
failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state
that existed prior to the failure. Consider a program to transfer Rs. 5000 from account A to account B. If a
system failure occurs during the execution of the program, it is possible that the Rs. 5000 was removed from
account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is
essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the
funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to ensure atomicity in
a conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system and faster response,
many systems allow multiple users to update the data simultaneously. In such an environment, interaction of
concurrent updates may result in inconsistent data. Consider bank account A, containing Rs. 5000. If two
customers withdraw funds (say Rs. 500 and Rs. 1000 respectively) from account A at about the same time,
the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state. Suppose
that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the
amount being withdrawn, and write the result back. If the two programs run concurrently, they may both
read the value Rs. 5000, and write back Rs. 4500 and Rs. 4000, respectively. Depending on which one writes
the value last, the account may contain either Rs. 4500 or Rs. 4000, rather than the correct value of Rs. 3500.
To guard against this possibility, the system must maintain some form of supervision. But supervision is
difficult to provide because data may be accessed by many different application programs that have not been
coordinated previously.
• Security problems. Not every user of the database system should be able to access all the data. For
example, in a banking system, payroll personnel need to see only that part of the database that has
information about the various bank employees. They do not need access to information about customer
accounts. But, since application programs are added to the system in an ad hoc manner, enforcing such
security constraints is difficult. These difficulties, among others, prompted the development of database
systems.
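The concurrent-access anomaly described above can be traced with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the interleaving is simulated deterministically (both programs read before either writes) rather than run with real concurrent programs.

```python
def interleaved_withdrawals(initial, first, second):
    """Simulate two withdrawal programs that both read before either writes."""
    read_1 = initial          # program 1 reads the old balance
    read_2 = initial          # program 2 reads the same old balance
    write_1 = read_1 - first  # program 1 computes its new balance
    write_2 = read_2 - second # program 2 computes its new balance
    # The stored balance is whichever value is written last.
    return {"first_writes_last": write_1, "second_writes_last": write_2}


def serial_withdrawals(initial, first, second):
    """Run the two withdrawals one after the other (the correct outcome)."""
    return initial - first - second


anomalous = interleaved_withdrawals(5000, 500, 1000)
correct = serial_withdrawals(5000, 500, 1000)
print(anomalous, correct)
```

With the figures from the notes, the interleaved run leaves Rs. 4500 or Rs. 4000 in the account, while the serial run leaves the correct Rs. 3500.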
At the physical level, a customer, account, or employee record can be described as a block of consecutive
storage locations (for example, words or bytes). The language-compiler hides this level of detail from
programmers. Similarly, the database system hides many of the lowest-level storage details from database
programmers. Database administrators, on the other hand, may be aware of certain details of the physical
organization of the data.
At the logical level, each such record is described by a type definition that names its fields and their data
types, and the interrelationship of these record types is defined as well. Programmers using a programming
language work at this level of abstraction. Similarly, database administrators usually work at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of the data types.
Similarly, at the view level, several views of the database are defined, and database users see these views. In
addition to hiding details of the logical level of the database, the views also provide a security mechanism to
prevent users from accessing certain parts of the database. For example, tellers in a bank see only that part of
DBMS (UNIT-1) PRONAB ADHIKARI
the database that has information on customer accounts; they cannot access information about salaries of
employees.
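The teller example can be sketched with a SQL view, here run through Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the table names, column names, and sample rows are hypothetical, and SQLite has no per-user permissions, so the view only shows what a teller-facing application would be given to query, not how a full DBMS enforces access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: record types and their fields (hypothetical schema).
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, owner TEXT, balance INTEGER);
    INSERT INTO employee VALUES (1, 'Asha', 40000);
    INSERT INTO account VALUES ('A-101', 'Ravi', 5000);

    -- View level: tellers are shown account data only, never salaries.
    CREATE VIEW teller_view AS
        SELECT account_number, owner, balance FROM account;
""")

cursor = conn.execute("SELECT * FROM teller_view")
teller_columns = [column[0] for column in cursor.description]
teller_rows = cursor.fetchall()
print(teller_columns, teller_rows)
```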
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for describing
data, data relationships, data semantics, and consistency constraints.
To illustrate the concept of a data model, we outline several data models in this section, beginning with the
entity-relationship model and the relational model. Both provide a way to describe the design of a database
at the logical level.
Entity-Relationship Model
The overall logical structure (schema) of a database can be expressed graphically by an E-R diagram, which
is built up from the following components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among those
data. Each table has multiple columns, and each column has a unique name.
The relational model is an example of a record-based model. Record-based models are so named because the
database is structured in fixed-format records of several types. Each table contains records of a particular
type. Each record type defines a fixed number of fields, or attributes. The columns of the table correspond to
the attributes of the record type.
Sample relational model: a Student table with three columns and four records.
Stu_Id   Stu_Name   Stu_Age
111      Ashish     23
123      Saurav     22
169      Lester     24
234      Lou        26
Here Stu_Id, Stu_Name and Stu_Age are the attributes of the Student table. A second table, Course (not shown
here), has the attributes Stu_Id, Course_Id and Course_Name. The rows holding values are the records
(commonly known as tuples).
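The Student table above can be built and inspected with Python's built-in sqlite3 module. This is a small sketch; the column types (INTEGER, TEXT) and the choice of Stu_Id as primary key are assumptions, since the notes give only the attribute names and values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types and the primary key are assumed for illustration.
conn.execute(
    "CREATE TABLE Student (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT, Stu_Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(111, "Ashish", 23), (123, "Saurav", 22), (169, "Lester", 24), (234, "Lou", 26)],
)

# Attributes are the column names; tuples are the rows.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]
tuples = conn.execute("SELECT * FROM Student ORDER BY Stu_Id").fetchall()
print(attributes)
print(tuples)
```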
The relational data model is the most widely used data model, and a vast majority of current database
systems are based on the relational model.
The relational model is at a lower level of abstraction than the E-R model. Database designs are often
carried out in the E-R model, and then translated to the relational model.
Hierarchical Model
This database model organises data into a tree-like structure with a single root, to which all the other data is
linked. The hierarchy starts from the root and expands like a tree, adding child nodes to the parent
nodes.
In this model, a child node has exactly one parent node. This model efficiently describes many real-world
relationships, such as the index of a book or a recipe. In the hierarchical model, data is organised into a
tree-like structure with a one-to-many relationship between two different types of data; for example, one
department can have many courses, many professors and, of course, many students.
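The single-parent rule can be sketched by storing, for each node, the one parent it belongs to. The node names follow the department example above; the helper functions are hypothetical and only illustrate the shape of the data, not how a hierarchical DBMS stores it.

```python
# Each child maps to exactly one parent, which is what makes this a tree.
parent_of = {
    "Courses": "Department",
    "Professors": "Department",
    "Students": "Department",
}


def children_of(node):
    """Return the children of a node (the 'many' side of one-to-many)."""
    return sorted(child for child, parent in parent_of.items() if parent == node)


def path_to_root(node):
    """Follow parent links upward until the root is reached."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


print(children_of("Department"))
print(path_to_root("Students"))
```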
Network Model
This is an extension of the hierarchical model. In this model data is organised more like a graph, and a child
node is allowed to have more than one parent node.
Because more relationships are established in this model, the data is more interconnected, which can make
related data easier and faster to reach. This model expands on the
hierarchical model by providing multiple paths among segments, i.e., more than one parent-child relationship.
Hence this model allows one-to-one, one-to-many and many-to-many relationships.
This was the most widely used database model before the relational model was introduced.
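In contrast to a tree, a network structure lets one record have several parents. The sketch below is only an illustration: the course names are hypothetical, and plain dictionaries stand in for the linked record sets a network DBMS would maintain.

```python
from collections import defaultdict

# (parent, child) links; a child may appear under more than one parent.
links = [
    ("Course: DBMS", "Student: Ashish"),
    ("Course: Networks", "Student: Ashish"),
    ("Course: DBMS", "Student: Saurav"),
]

parents_of = defaultdict(set)
children_of = defaultdict(set)
for parent, child in links:
    parents_of[child].add(parent)
    children_of[parent].add(child)

# One student under two courses, one course over two students: many-to-many.
print(sorted(parents_of["Student: Ashish"]))
print(sorted(children_of["Course: DBMS"]))
```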
Database Languages
A database system provides a data definition language to specify the database schema and a data
manipulation language to express database queries and updates. In practice, the data definition and data
manipulation languages are not two separate languages; instead they simply form parts of a single database
language, such as the widely used SQL language.
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language called a data-
definition language (DDL). For instance, the following statement in the SQL language defines the
account table:
create table account (account_number char(10), balance integer)
Execution of the above DDL statement creates the account table.
In addition, it updates a special set of tables called the data dictionary or data directory. A data
dictionary contains metadata—that is, data about data. The schema of a table is an example of
metadata. A database system consults the data dictionary before reading or modifying actual data.
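As a rough illustration of a data dictionary, SQLite keeps its schema metadata in a catalog table named sqlite_master, which can be queried like any other table. The sketch below uses Python's built-in sqlite3 module. It writes the column as account_number because a hyphen is not valid in an unquoted SQL identifier, and other database systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statement: defines the account table.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# Executing the DDL also recorded metadata ("data about data") in the catalog.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'account'"
).fetchall()

for object_type, object_name, definition in catalog:
    print(object_type, object_name)
    print(definition)
```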
We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language. These statements
define the implementation details of the database schemas, which are usually hidden from the users.
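Data-Manipulation Language
A data-manipulation language (DML) enables users to access or manipulate data. A procedural DML requires
the user to specify what data are needed and how to get those data; a declarative (nonprocedural) DML
requires the user to specify only what data are needed.
Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user does
not have to specify how to get the data, the database system has to figure out an efficient means of accessing
data. The DML component of the SQL language is nonprocedural.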
A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a query language. Although technically incorrect, it is common practice to
use the terms query language and data manipulation language synonymously.
There are a number of database query languages in use, either commercially or experimentally.
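The difference between stating what data are wanted and spelling out how to get them can be sketched as follows. The account table, its sample rows, and the Rs. 2500 threshold are assumptions chosen for illustration. The second half imitates a procedural style in Python by scanning every record by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 5000), ("A-102", 1500), ("A-103", 9000)],
)

# Declarative: say WHAT is wanted; the system decides how to find it.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account WHERE balance > 2500 ORDER BY account_number"
    )
]

# Procedural style: spell out HOW, record by record.
procedural = []
for number, balance in conn.execute("SELECT account_number, balance FROM account"):
    if balance > 2500:
        procedural.append(number)
procedural.sort()

print(declarative, procedural)
```

Both approaches return the same accounts, but only the declarative form leaves the choice of access path to the query processor.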
At the physical level, we must define algorithms that allow efficient access to data. At higher levels of
abstraction, we emphasize ease of use. The goal is to allow humans to interact efficiently with the system.
The query processor component of the database system translates DML queries into sequences of actions at
the physical level of the database system.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the programs that
access those data. A person who has such central control over the system is called a database administrator
(DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the schema and
physical organization to reflect the changing needs of the organization, or to alter the physical
organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities are:
   • Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of
   data in case of disasters such as flooding.
   • Ensuring that enough free disk space is available for normal operations, and upgrading disk space
   as required.
   • Monitoring jobs running on the database and ensuring that performance is not degraded by very
   expensive tasks submitted by some users.
Transaction Management
Often, several operations on the database form a single logical unit of work. An example is a funds transfer,
in which one account (say A) is debited and another account (say B) is credited. Clearly, it is essential that
either both the credit and debit occur, or that neither occur. That is, the funds transfer must happen in its
entirety or not at all. This all-or-none requirement is called atomicity. In addition, it is essential that the
execution of the funds transfer preserve the consistency of the database. That is, the value of the sum A + B
must be preserved. This correctness requirement is called consistency. Finally, after the successful
execution of a funds transfer, the new values of accounts A and B must persist, despite the possibility of
system failure. This persistence requirement is called durability.
A transaction is a collection of operations that performs a single logical function in a database application.
Each transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not violate
any database-consistency constraints. That is, if the database was consistent when a transaction started, the
database must be consistent when the transaction successfully terminates. However, during the execution of
a transaction, it may be necessary temporarily to allow inconsistency, since either the debit of A or the credit
of B must be done before the other. This temporary inconsistency, although necessary, may lead to difficulty
if a failure occurs. It is the programmer’s responsibility to define properly the various transactions, so that
each preserves the consistency of the database. For example, the transaction to transfer funds from account A
to account B could be defined to be composed of two separate programs: one that debits account A, and
another that credits account B. The execution of these two programs one after the other will indeed preserve
consistency. However, each program by itself does not transform the database from a consistent state to a
new consistent state. Thus, those programs are not transactions.
Ensuring the atomicity and durability properties is the responsibility of the database system itself—
specifically, of the transaction-management component. In the absence of failures, all transactions
complete successfully, and atomicity is achieved easily. However, because of various types of failure, a
transaction may not always complete its execution successfully. If we are to ensure the atomicity property, a
failed transaction must have no effect on the state of the database. Thus, the database must be restored to the
state in which it was before the transaction in question started executing.
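The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module, where using the connection as a context manager commits on success and rolls back if an exception is raised. The starting balances, the fail_midway flag, and the helper names are hypothetical; the raised exception merely simulates a failure between the debit and the credit.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two accounts (assumed balances)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 10000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one transaction; undo everything on failure."""
    try:
        with conn:  # commits if the block finishes, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


bank = make_bank()
transfer(bank, "A", "B", 5000, fail_midway=True)
print(balances(bank))  # the debit was rolled back; A + B is unchanged
```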
The database system must therefore perform failure recovery, that is, detect system failures and restore the
database to the state that existed prior to the occurrence of the failure. Finally, when several transactions
update the database concurrently, the consistency of data may no longer be preserved, even though each
individual transaction is correct. It is the responsibility of the concurrency-control manager to control the
interaction among the concurrent transactions, to ensure the consistency of the database.
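One very simple way to control the interaction is to let only one transaction at a time run its read-modify-write step. The sketch below uses a mutex from Python's threading module as a stand-in for a concurrency-control manager. This is an assumption made for illustration, since real database systems use more elaborate protocols, and the Account class is hypothetical.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The lock makes read, compute, and write one indivisible step,
        # so no other withdrawal can read a stale balance in between.
        with self._lock:
            current = self.balance
            self.balance = current - amount


def run_concurrent_withdrawals(initial, amounts):
    """Run one thread per withdrawal and return the final balance."""
    account = Account(initial)
    threads = [threading.Thread(target=account.withdraw, args=(a,)) for a in amounts]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance


final_balance = run_concurrent_withdrawals(5000, [500, 1000])
print(final_balance)
```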
Database systems designed for use on small personal computers may not have all these features. For
example, many small systems allow only one user to access the database at a time. Others do not offer
backup and recovery, leaving that to the user. These restrictions allow for a smaller data manager, with
fewer requirements for physical resources—especially main memory. Although such a low-cost, low-feature
approach is adequate for small personal databases, it is inadequate for a medium- to large-scale enterprise.
Application Architectures
Most users of a database system today are not present at the site of the database system, but connect to it
through a network. We can therefore differentiate between client machines, on which remote database users
work, and server machines, on which the database system runs.
Database applications are usually partitioned into two or three parts:
In a two-tier architecture, the application is partitioned into a component that resides at the client machine,
which invokes database system functionality at the server machine through query language statements.
Application program interface standards like ODBC and JDBC are used for interaction between the client
and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end and does not contain
any direct database calls. Instead, the client end communicates with an application server, usually through
a forms interface. The application server in turn communicates with a database system to access data. The
business logic of the application, which says what actions to carry out under what conditions, is embedded
in the application server, instead of being distributed across multiple clients. Three-tier applications are more
appropriate for large applications, and for applications that run on the World Wide Web.
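The three-tier split can be sketched in a single process, with ordinary function calls standing in for the network hops between tiers. Everything here is an assumption made for illustration: the ApplicationServer class, the form dictionary, the starting balance, and the minimum-balance rule of Rs. 2500 used as sample business logic.

```python
import sqlite3

MIN_BALANCE = 2500  # sample business rule, assumed for this sketch


class ApplicationServer:
    """Middle tier: holds the business logic and the only database connection."""

    def __init__(self):
        # Data tier: an in-memory SQLite database stands in for the DBMS server.
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
        self._conn.execute("INSERT INTO account VALUES ('A', 5000)")
        self._conn.commit()

    def withdraw(self, account, amount):
        (balance,) = self._conn.execute(
            "SELECT balance FROM account WHERE name = ?", (account,)
        ).fetchone()
        if balance - amount < MIN_BALANCE:
            return {"ok": False, "reason": "balance would fall below the minimum"}
        with self._conn:  # commit the update as one transaction
            self._conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, account)
            )
        return {"ok": True, "balance": balance - amount}


def client_submit_form(server, form):
    """Client tier: forwards form data only; it contains no database calls."""
    return server.withdraw(form["account"], form["amount"])


server = ApplicationServer()
print(client_submit_form(server, {"account": "A", "amount": 1000}))
```

Because the rule lives in the application server, changing it does not require touching any client.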