DBMS - Unit I
DBMS - Unit I
UNIT – I
1
1.1 A HISTORICAL PERSPECTIVE
❖ From the earliest days of computers, storing and manipulating data is the major application focus.
❖ The first general purpose DBMS was designed by Charles Bachman at General Electric in the early 1960s and
was called the Integrated Data Store.
❖ It formed the basis for the network data model, and strongly influenced database systems
through the 1960s.
❖ In the late 1960s, IBM developed the Information Management System (IMS) DBMS, IMS formed the basis
for the hierarchical data model.
❖ In 1970, Edgar Codd, at IBM's San Jose Research Laboratory, proposed a new data representation framework
called the relational data model.
❖ This is a big change in the development of database systems, it ignited rapid development of several DBMSs
based on the relational model along with a rich body of theoretical results.
❖ Database systems matured as an academic discipline, and the popularity of relational DBMSs changed the
commercial landscape. Their benefits were widely recognized, and the use of DBMSs for managing corporate
data became standard practice.
❖ In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm, and database
systems continued to gain widespread use.
❖ The SQL query language for relational databases, developed as part of IBM's System R project, is now the
standard query language. SQL was standardized in the late 1980s.
❖ In the late 1980s and the 1990s, advances and researches have been made in many areas of database systems
to support complex analysis of data from all parts of an enterprise.
❖ Several vendors (e.g., IBM's DB2, Oracle 8, Informix UDS) have extended their systems with the ability to
store new data types such as images and text, and with the ability to ask more complex queries.
❖ Specialized systems have been developed by numerous vendors for creating data warehouses, consolidating
data from several databases, and for carrying out specialized analysis.
❖ Enterprise resource planning (ERP) and management resource planning (MRP) packages have been developed.
❖ Most significantly DBMSs have entered into the Internet Age.
❖ The use of a DBMS to store data that is accessed through a Web browser is becoming widespread.
❖ Queries are generated through Web-accessible forms and answers are formatted using a markup language such
as HTML, in order to be easily displayed in a browser.
❖ All the database vendors are adding features to their DBMS aimed at making it more suitable for deployment
over the Internet.
2
❖ Database management continues to gain importance as more and more data is brought on-line, and made ever
more accessible through computer networking.
❖ Today the field is being driven by exciting visions such as multimedia databases, interactive video, digital
libraries, and a host of scientific projects such as the human genome mapping effort and NASA's Earth
Observation System project.
❖ Companies are using DBMS concept in their decision-making processes and mine their data repositories for
useful information about their businesses.
1.2 FILE SYSTEMS VERSUS A DBMS
1.2.1 File Processing System
❖ In case of file-processing system, The system stores permanent records in various files with the support of
operating system.
❖ Many application programs are used to manipulate (add, Extract, update, delete) the stored data.
❖ Ex:- Consider part of a savings-bank enterprise that keeps information about all customers and savings
accounts.
❖ To allow users to manipulate the information, the system has a number of application programs
• A program to debit or credit an account
• A program to add a new account
• A program to find the balance of an account
❖ System programmers wrote these application programs to meet the needs of the bank.
❖ New application programs are added to the system as the need arises.
❖ For example, suppose that the savings bank decides to generate monthly statements. As a result, the bank
creates new permanent files that contain information about to generate monthly statements.
❖ Thus, as time goes by, the system acquires more files and more application programs.
❖ Keeping organizational information in a file-processing system has a number of major disadvantages:
1.2.2 File Processing System Disadvantages
1. Data redundancy and inconsistency 2. Difficulty in accessing data. 3. Data isolation
4. Integrity problems 5. Atomicity problems 6. Concurrent-access anomalies 7. Security
problems
1. Data redundancy and inconsistency: -
❖ Since different programmers create the files and application programs over a long period, the various files are
likely to have different formats and the programs may be written in several programming languages. Moreover,
the same information may be duplicated in several places (files).
❖ For example, the address and telephone number of a particular customer may appear in a file that consists of
savings-account records and in a file that consists of checking-account records. This redundancy leads to
3
higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the
same data may no longer agree. For example, a changed customer address may be reflected in savings-account
records but not elsewhere in the system.
2. Difficulty in accessing data: -
❖ Suppose that one of the bank officers needs to find out the names of all customers who live within a particular
postal-code area. The officer asks the data-processing department to generate such a list. Because the designers
of the original system did not anticipate this request, there is no application program on hand to meet it. There
is, however, an application program to generate the list of all customers. The bank officer has now two choices:
either obtain the list of all customers and extract the needed information manually or ask a system programmer
to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such
a program is written, and that, several days later, the same officer needs to trim that list to include only those
customers who have an account balance of $10,000 or more. As expected, a program to generate such a list
does not exist. Again, the officer has the preceding two options, neither of which is satisfactory.
❖ The point here is that conventional file-processing environments do not allow needed data to be retrieved in a
convenient and efficient manner. More responsive data-retrieval systems are required for general use.
3. Data isolation: -
❖ Because data are scattered in various files, and files may be in different formats, writing new application
programs to retrieve the appropriate data is difficult.
4. Integrity problems: -
❖ The data values stored in the database must satisfy certain types of consistency constraints.
❖ For example, the balance of a bank account may never fall below a prescribed amount (say, 1000Rs).
Developers enforce these constraints in the system by adding appropriate code in the various application
programs. However, when new constraints are added, it is difficult to change the programs to enforce them.
The problem is compounded when constraints involve several data items from different files.
5. Atomicity problems: -
❖ A computer system, like any other mechanical or electrical device, is subject to failure.
❖ In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed
prior to the failure.
❖ Consider a program to transfer Rs 5000 from account A to account B. If a system failure occurs during the
execution of the program, it is possible that the Rs5000 was removed from account A but was not credited to
account B, resulting in an inconsistent database state. Clearly, it is essential to database consistency that either
both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic—it must happen
in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
6. Concurrent-access anomalies: -
4
❖ For the sake of overall performance of the system and faster response, many systems allow multiple users to
update the data simultaneously. In such an environment, interaction of concurrent updates may result in
inconsistent data.
❖ Consider bank account A, containing Rs 10000. If two customers withdraw funds (say Rs 5000 and Rs1000
respectively) from account A at about the same time, the result of the concurrent executions may leave the
account in an incorrect (or inconsistent) state.
❖ Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by
the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both
read the value 10000, and write back Rs5000 and Rs 9000, respectively. Depending on which one writes the
value last, the account may contain Rs 5000 or Rs 9000, rather than the correct value of Rs 4000.
❖ To guard against this possibility, the system must maintain some form of supervision. But supervision is
difficult to provide because data may be accessed by many different application programs that have not been
coordinated previously.
7. Security problems: -
❖ Not every user of the database system should be able to access all the data.
❖ For example, in a banking system, payroll personnel need to see only that part of the database that has
information about the various bank employees.
❖ They do not need access to information about customer accounts. But, since application programs are added
to the system in an ad hoc manner, enforcing such security constraints is difficult.
1.2.3 DBMS
❖ DBMS is a piece of software that is designed to make the preceding tasks easier.
❖ By storing data in a DBMS, we can use the DMBS features to manage the data in an efficient manner.
❖ DBMS supports huge volume of data and more number of users.
1.2.4 Advantages of DBMS
❖ Using a DBMS to manage data has many advantages:
1. Data independence: -
❖ Application programs should be as independent as possible from details of data representation and storage.
❖ The DBMS can provide an abstract view of the data to insulate application code from such details.
2. Efficient data access: -
❖ A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently.
❖ This feature is especially important if the data is stored on external storage devices.
3. Data integrity and security: -
❖ If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data.
5
❖ For example, before inserting salary information for an employee, the DBMS can check that the department
budget is not exceeded. Also, the DBMS can enforce access controls that govern what data is visible to different
classes of users.
4. Data administration:-
❖ When several users share the data, centralizing the administration of data can offer significant improvements.
❖ Experienced professionals who understand the nature of the data being managed, and how different groups of
users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-
tuning the storage of the data to make retrieval efficient.
5. Concurrent access and crash recovery: -
❖ A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being
accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.
6. Reduced application development time: -
❖ Clearly, the DBMS supports many important functions that are common to many applications accessing data
stored in the DBMS.
❖ This, in conjunction with the high-level interface to the data, facilitates quick development of applications.
Such applications are also likely to be more robust than applications developed from scratch because many
important tasks are handled by the DBMS instead of being implemented by the application.
1.3 THE DATA MODEL
❖ Underlying the structure of a database is the data model
❖ Data model is a collection of conceptual tools for describing data, data relationships, data semantics (meaning
and use of specific pieces of data), and consistency constraints.
❖ To illustrate the concept of a data model, we use two data models to describe the design of a database at the
logical level they are 1. The E-R Model, 2. Relation Model
1.3.1 The Entity-Relationship Model
❖ The entity-relationship (E-R) data model is based on a perception of a real world that consists of a collection
of basic objects, called entities, and of relationships among these objects.
❖ An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For example,
each person is an entity, and bank accounts can be considered as entities.
❖ Entities are described in a database by a set of attributes.
❖ For example, the attributes account-number and balance may describe one particular account in a bank.
❖ Attributes customer-name; customer-street address and customer-city may describe a customer entity.
❖ An extra attribute customer-id is used to uniquely identify customers (since it may be possible to have two
customers with the same name, street address, and city).
Note: - In India we use Aadhar number as unique number.
6
❖ A relationship is an association among several entities. For example, a depositor relationship associates a
customer with each account that he has.
❖ The set of all entities of the same type and the set of all relationships of the same type are termed an entity set
and relationship set, respectively.
❖ The overall logical structure (schema) of a database can be expressed graphically by an E-R diagram, which
is built up from the following components:
Percentage Table
Roll Number %
13601A501 94
13601A502 90
13601A503 91
13601A504 88
8
Figure 1.2 Network Data model
4. Hierarchical data model: -
❖ A hierarchical model represents the data in a tree-like structure in which there is a single parent for each record.
9
Figure 1.4: Levels of Abstraction in a DBMS
❖ The database description consists of a schema at each of these three levels of abstraction
i) Physical Schema: -
❖ Also called as Physical level or internal level.
❖ The physical schema describes the database design at the physical level
❖ It is the lowest level of abstraction for DBMS which defines how the data is actually stored.
❖ It defines data-structures to store data and access methods used by the database.
❖ Actually, it is decided by developers or database application programmers how to store the data in the database.
❖ The entire database is described in this level that is physical or internal level.
❖ It is a very complex level to understand.
❖ We must decide what file organizations to use to store the relations, and create auxiliary data structures called
indexes to speed up data retrieval operations.
❖ We create and uses indexes for faster retrieval of the data.
ii) Conceptual schema: -
❖ The conceptual schema also called the logical schema or logical level
❖ It describes what data is stored in the database and what relationship exists among those data.
❖ It is less complex than the physical level.
❖ Logical level is used by developers or database administrators (DBA). So, overall, the logical level contains
tables (fields and attributes) and relationships among table attributes.
iii) External schema: -
❖ It is the highest level and also called as view level.
❖ A view is conceptually a relation, but the records in a view are not stored in the DBMS.
❖ Rather, they are computed using a definition for the view, in terms of relations stored in the DBMS.
❖ There are different levels of views and every view only defines a part of the entire data.
10
❖ External schemas, allow data access to be customized (and authorized) at the level of individual users or groups
of users. i.e. it can be used by all users (all levels' users).
❖ It also simplifies interaction with the user and it provides many views or multiple views of the same database.
❖ This level is the least complex and easy to understand.
❖ For example, a user can interact with a system using GUI that is view level and can enter details at GUI or
screen and the user does not know how data is stored and what data is stored, this detail is hidden from the user.
❖ Each external schema consists of a collection of one or more views and relations from the conceptual schema.
1.5 DATA INDEPENDENCE
❖ It is the ability to modify the scheme without affecting the programs and the application to be rewritten.
❖ Data is separated from the programs, so that the changes made to the data will not affect the program execution
and the application.
❖ The data independence provides the database in simple structure
❖ We know the main purpose of the three levels of data abstraction is to achieve data independence.
❖ If the database changes and expands over time, it is very important that the changes in one level should not
affect the data at other levels of the database.
❖ This would save time and cost required when changing the database.
❖ There are two levels of data independence based on three levels of abstraction. They are
i) Physical Data Independence ii) Logical Data Independence
i) Physical Data Independence: -
❖ Physical Data Independence means changing the physical level without affecting the logical level or conceptual
level.
❖ Using this property, we can change the storage device of the database without affecting the logical schema.
❖ The changes in the physical level may include changes using the following −
i) A new storage device like magnetic tape, hard disk, etc.
ii) A new data structure for storage.
iii) A different data access method or using an alternative files organization technique.
iv) Changing the location of the database.
ii) Logical Data Independence: -
❖ Logical view of data is the user view of the data.
❖ It presents data in the form that can be accessed by the end users.
❖ Codd’s Rule of Logical Data Independence says that users should be able to manipulate the logical View of
data without any information of its physical storage.
❖ Software or the computer program is used to manipulate the logical view of the data.
11
❖ Database administrator is the one who decides what information is to be kept in the database and how to use
the logical level of abstraction.
❖ The changes in the logical level may include
i) Change the data definition.
❖ Adding, deleting, or updating any new attribute, entity or relationship in the database.
12
Figure 1.5: The Structure of DBMS
❖ The above figure shows the structure of a typical DBMS based on the relational data model.
❖ The DBMS accepts SQL commands generated from a variety of user interfaces, produces query evaluation
plans, executes these plans against the database, and returns the answers.
❖ The Query evaluation Engine consists
i) Plan Executor ii) Parser iii) Operator Evaluator iv) Optimizer
1) Plan Executor: - It prepares the execution plans. An execution plan is a blueprint for
evaluating a query, and is usually represented as a tree of relational operators
2) Parser: - It translates a textual format into a combination of internal binary structures that can
be easily manipulated by the optimizer.
3) Optimizer: - A query optimizer is a critical database management system (DBMS) component that analyzes
Structured Query Language (SQL) queries and determines efficient execution mechanisms. It picks the lowest cost
evaluation plan from among the alternatives.
4) Operator evaluator: - Which executes low-level instructions generated by the DML compiler.
❖ DML means Data Manipulation Language commands - Insert, Update, Delete, Select etc..
❖ DDL means Data Definition Language commands - Create, Alter, Drop, Truncate
Files and Access methods: - Supports in storing the records in the form of files (pages), organizing the files,
allocating the indexes, fetching the files and also keeps the track of files.
Buffer Manager: - It is a critical part of the database system, it enables the database to handle data sizes that are
much larger than the size of main memory.
A Buffer Manager is responsible for allocating space to the buffer (space in main memory) in order to store
data into the buffer. It is responsible for fetching data from disk storage into main memory, and deciding what data
to cache in main memory.
Disk Space Manager: - The lowest layer of the DBMS software deals with management of space on disk, where
the data is stored. Higher layers allocate, de-allocate, read, and write pages through this layer.
❖ The DBMS supports concurrency and crash recovery by carefully scheduling user requests
and maintaining a log of all changes to the database.
❖ The concurrency control mechanism consists
i) Transaction Manager ii) Lock Manager
i) Transaction Manager: - A transaction manager is a part of an application that controls the
coordination of transactions over one or more resources.
The transaction manager is responsible for creating transaction objects and managing them.
ii) Lock Manager: - The lock manager keeps track of requests for locks and grants locks on database objects when
they become available.
13
The Recovery manager: - It is responsible for maintaining a log, and restoring the system to a consistent state
after a crash.
The disk space manager, buffer manager, and file and access method layers must interact with these
components.
14
❖ In this step we must consider typical expected workloads that our database must support and further refine
the database design to ensure that it meets desired performance criteria.
❖ This step involve in building indexes on some tables and clustering some tables, or it may involve a
substantial redesign of parts of the database schema obtained from the earlier design steps.
6) Security Design:
❖ In this step, we identify different user groups and different roles played by various users (e.g., the
development team, the customer support representatives, and the product manager).
❖ For each role and user group, we must identify the parts of the database that they must be able to access and
must not be able to access.
Note: The ER model is most relevant to the first three steps:
1.8 ENTITIES, ATTRIBUTES, AND ENTITY SETS
❖ An entity is an object in the real world that is distinguishable from other objects.
❖ It is often useful to identify a collection of similar entities.
❖ Such a collection is called an entity set.
❖ An entity is described using a set of attributes.
❖ All entities in a given entity set have the same attributes; this is essentially what we mean by similar.
❖ For each attribute associated with an entity set, we must identify a domain of possible values.
❖ For each entity set, we choose a key.
❖ A key is a minimal set of attributes whose values uniquely identify an entity in the set.
❖ There could be more than one candidate key; if so, we designate one of them as the primary key.
❖ Each attribute in the primary key is underlined
❖ The Employees entity set with attributes ssn, name, and lot is shown in Figure.
5. Derived Attributes:
❖ An attribute which is derived from another attribute is called as a ‘derived attribute.
Example: 1.‘Age’ attribute is derived from another attribute ‘Date’.
2. Average_salary in a department should not be saved directly in the database, instead it can be derived
6. Stored Attributes:
❖ An attribute which is not derived from another attribute is called as a ‘stored attribute.
❖ Stored attributes are those attributes that are stored in the physical database
Example: In the above example,’ Date’ is a stored attribute.
1.9 RELATIONSHIPS AND RELATIONSHIP SETS
❖ A relationship is an association among two or more entities.
❖ A set of similar relationships can be collected into a relationship set.
16
Figure: 1.7 The works_in relationship set
❖ Here each relationship indicates a department in which an employee works.
❖ Note that several relationship sets might involve the same entity sets. For example, we could also have a
Manages relationship set involving Employees and Departments.
❖ A relationship can also have descriptive attributes.
❖ Descriptive attributes are used to record information about the relationship.
❖ An instance of a relationship set is a set of relationships.
❖ Can be thought of as a `snapshot' of the relationship set at some instant of time.
❖ An instance of the Works_In relationship set is shown in the below figure
17
❖ For the simplicity purpose, since value is shown beside each relationship
❖ As another example of an ER diagram, suppose that each department has offices in several locations and we
want to record the locations at which each employee works.
❖ This relationship is ternary because we must record an association between an employee, a department, and a
location.
❖ The ER diagram for this variant of Works In, which we call Works In2, is shown in Figure
18
❖ If an entity set plays more than one role, the role indicator concatenated with an attribute name from the entity
set gives us a unique name for each attribute in the relationship set.
❖ For example, the Reports To relationship set has attributes corresponding to the ssn of the supervisor and the
ssn of the subordinate, and the names of these attributes are supervisor ssn and subordinate ssn.
1.10 ADDITIONAL FEATURES OF THE ER MODEL
❖ We now look at some of the constructs in the ER model that allow us to describe some subtle (delicate)
properties of the data.
❖ The expressiveness of the ER model is a big reason for its widespread use.
1.10.1 Key Constraints
❖ Key constraints are used to apply some condition on the relationship that exist between the entities.
❖ Consider the Works In relationship shown in Figure 1.7 An employee can work in several departments, and a
department can have several employees.
❖ Now consider another relationship set called Manages between the Employees and Departments entity sets
such that each department has at most one manager, although a single employee is allowed to manage more
than one department.
❖ The restriction that each department has at most one manager is an example of a key constraint, and it implies
that each Departments entity appears in at most one Manages relationship in any allowable instance of
Manages.
❖ This restriction is indicated in the ER diagram of Figure 1.11 by using an arrow from Departments to Manages.
The arrow states that given a Departments entity, we can uniquely determine the Manages relationship in which
it appears.
❖ A relationship set like Manages is sometimes said to be one-to-many, to indicate that one employee can be
associated with many departments (in the capacity of a manager), whereas each department can be associated
with at most one employee as its manager.
19
❖ In contrast, the Works In relationship set, in which an employee is allowed to work in several departments and
a department is allowed to have several employees, is said to be many-to-many.
❖ If we add the restriction that each employee can manage at most one department to the Manages relationship
set, which would be indicated by adding an arrow from Employees to Manages in Figure 1.11, we have a one-
to-one relationship set.
1.10.2 Key Constraints for Ternary Relationships
❖ If an entity set E has a key constraint in a relationship set R, each entity in an instance of E appears in at most
one relationship in (a corresponding instance of) R. To indicate a key constraint on entity set E in relationship
set R, we draw an arrow from E to R.
❖ In Figure 1.12, we show a ternary relationship with key constraints.
20
Figure 1.13 An Instance of Works In3
❖ An instance of the Works In3 relationship set is shown in Figure 2.9. Notice that each department can be
associated with several employees and locations, and each location can be associated with several departments
and employees; however, each employee is associated with a single department and location.
1.10.3 Participation Constraints
❖ The participation constraint specifies the number of instances of an entity can participate in a relationship set.
❖ If all the instances of an entity participates in the relationship then it is said to be total participation
❖ Example:- Every employees works at least in any one department
❖ If the relationship is total, the two are connected by a thick line
❖ A participation that is not total is said to be partial. Connected with normal line.
❖ Example: - The participation of the entity set Employees in Manages is partial, since not every employee gets
to manage a department. If the participation of an entity set
❖ The presence of an arrow indicates a key constraint.
❖ It breaks an entity into multiple entities from higher level (super class) to lower level (subclass).
❖ Example: - The class vehicle can be specialized into Car, Truck and Motorcycle ( Top Down Approach)
Hence, vehicle is the super class and Car, Truck, Motorcycle are subclasses.
❖ All three of these inherit attributes from vehicle.
❖ It shows the relationships between the classes in the form of an organization chart.
22
Figure 1.16 Class Hierarchy
23
❖ Aggregation refers to the process by which entities are combined to form a single meaningful entity.
❖ The specific entities are combined because they do not make sense on their own.
❖ In aggregation, relationship with its corresponding entities is aggregated into a higher level entity.
❖ For example: Center entity offers the Course entity act as a single entity in the relationship which is in a
relationship with another entity visitor.
❖ In the real world, if a visitor visits a coaching center then he will never enquiry about the Course only or just
about the Center instead he will ask the enquiry about both.
25