DBMS

BSC 2ND SEMESTER

Uploaded by

Sanjana Sanjana
DBMS: Stands for Database Management System.

 Data: Data is any known fact, or the smallest piece of information, that can be recorded and has implicit
meaning.
Eg: Sanjana, BSC(CS), DCA, 2004.

 Why do we need data?
Ans: To derive some information from it.

 Information: When data is processed, organized, structured or presented in a given context to make it
more useful, it is called information.

Data                                                 Information
1) Data is a collection of raw facts and figures.    1) Information is processed data.
2) Data is not arranged.                             2) Information is arranged.
3) Data is unorganized.                              3) Information is organized.
4) Data does not depend on information.              4) Information depends on data.
5) Data is low-level knowledge.                      5) Information is the second level of knowledge.

 Database: It is a collection of related data. Here "related data" means that if we are collecting
information about an employee, all of it should relate to that employee, and the database should hold the
collection of this employee data.
Eg: a collection of related data:
Name     Age  Designation   Salary
Sanjana  20   Clerk         19000
Sana     23   Data Analyst  50000
:
Sara     23   Data Analyst  80000
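
The step from data to information can be sketched in a few lines of Python. This is only an illustration: it uses the three employee rows that are visible in the example (the rows elided by ":" are left out), and the function name `average_salary_by_designation` is made up for this sketch.

```python
# Raw data: the employee records from the example (unprocessed facts).
employees = [
    {"name": "Sanjana", "age": 20, "designation": "Clerk", "salary": 19000},
    {"name": "Sana", "age": 23, "designation": "Data Analyst", "salary": 50000},
    {"name": "Sara", "age": 23, "designation": "Data Analyst", "salary": 80000},
]


def average_salary_by_designation(records):
    """Process the raw records into information: average salary per designation."""
    totals = {}
    for record in records:
        total, count = totals.get(record["designation"], (0, 0))
        totals[record["designation"]] = (total + record["salary"], count + 1)
    return {name: total / count for name, (total, count) in totals.items()}


# Information: the same data, processed and organized for a purpose.
information = average_salary_by_designation(employees)
print(information)
```

The list holds data (single facts); the dictionary that is printed is information, because it answers a question about that data.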
 Database System: It is a system that uses database technology to store a large amount of dynamic,
associated data in an organized way, with the help of hardware, software (DBMS) and the OS.

 A database system is composed of 5 major parts: hardware, software (DBMS), people, procedures and
data.

 Database Management System: It is a set of software programs that allows users to create, edit and
update data in database files, and to store and retrieve data from those database files.
Example: Oracle, MS SQL Server, MySQL, DB2 (IBM)

 Properties of a Database, or why we use a Database System:

Ans: There are 6 important features of databases:
1) Completeness 2) Integrity 3) Flexibility 4) Efficiency 5) Usability 6) Less redundancy

 Introduction of DBMS:

 A Database Management System (DBMS) is a collection of interrelated data and a set of programs to
access those data.
 A DBMS is used to organize the data in the form of tables, schemas, views, reports, etc.
 The primary goal of a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.

Database                                      DBMS
A database is a collection of connected       A DBMS is a collection of programs that allow
information about people, locations           you to create, manage and operate a database.
or things.

 Database Management System is the combination of two terms:

Database + Management System = DBMS
 A database is a collection of related information stored so that it is available to many users for different
purposes.
 A database management system is a collection of programs that enables users to create and maintain the
database.
Application -> DBMS -> OS -> Database

 A DBMS can also be defined as an interface between the application program and the OS to access and
manipulate the database.
 A database management system is software which is used to manage the database.

Example: MySQL, Oracle, etc. are very popular database systems which are used in different applications.
 Characteristics of DBMS:
1) Self-describing nature of a database system (catalog).
2) It can provide a clear and logical view of the process that manipulates data.
3) A DBMS contains automatic backup and recovery procedures.
4) It can reduce the complex relationships between data.
5) It is used to provide security of data.

 Applications of DBMS:
1) Banking: For maintaining customer information, accounts, loans and banking transactions.
2) Universities: For maintaining student records, course registrations and grades.
3) Railway Reservation: For checking the availability of reservations in different trains, tickets, etc.
4) Airlines: For reservation and schedule information.
5) Telecommunication: For keeping records of calls made, generating monthly bills, etc.
6) Finance: For storing information about holdings, sales and purchases of financial instruments.
7) Sales: For customer, product and purchase information.

 Advantages of DBMS
1) Controls data redundancy: It controls data redundancy because all the data is recorded in one single
database instead of being repeated across separate files.
2) Data sharing: In a DBMS, the authorized users of an organization can share the data among multiple users.
3) Easy maintenance: It is easy to maintain due to the centralized nature of the database system.
4) Reduced time: It reduces development time and maintenance needs.
5) Backup: It provides backup and recovery subsystems which automatically back up data to protect it from
hardware and software failures, and restore the data if required.
6) Multiple user interfaces: It provides different types of user interfaces, such as graphical user
interfaces and application program interfaces.

 Disadvantages of DBMS:
1) Cost of hardware and software: It requires a high-speed processor and a large memory size to run the
DBMS software.
2) Size: It occupies a large amount of disk space and needs a large memory to run efficiently.
3) Complexity: A database system creates additional complexity and requirements.

 Disadvantages of File System:
1) Data redundancy and inconsistency 2) Difficulty in accessing data 3) Data isolation
4) Integrity problems 5) Atomicity problems 6) Concurrent access anomalies 7) Security problems
 Types of databases: There are various types of databases used for storing different varieties of data.

Types of database: Centralized, Distributed, NoSQL, Cloud, Relational, Network, Object-oriented and
Hierarchical databases.

1) Centralized Database: It is the type of database that stores data in one centralized database system. It
lets users access the stored data from different locations through several applications. These
applications contain an authentication process to let users access data securely.
An example of a centralized database is a central library that holds a central database of each library
in a college/university.

Advantages of Centralized Database


o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database


o The size of the centralized database is large,which increases the response time for fetching the data.
o It is not easy to update such an extensive database system.
o If the central server fails, the whole database becomes unavailable, and data that has not been backed up may be lost, which could be a huge loss.

2) Distributed Database: In distributed systems, data is distributed among different database systems of an
organization. These database systems are connected via communication links. Such links help the end-users
to access the data easily.
Examples of distributed databases are Apache Cassandra, HBase, Ignite, etc.
It is divided into two subtypes: Homogeneous DDB and Heterogeneous DDB.

o Homogeneous DDB: Those database systems which execute on the same operating system, use the
same application process and run on the same hardware devices.
o Heterogeneous DDB: Those database systems which execute on different operating systems under
different application procedures, and run on different hardware devices.
Advantages of Distributed Database
o Modular development is possible in a distributed database, i.e., the system can be expanded by
including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.

3) Relational Database: It stores data in the form of rows (tuples) and columns (attributes), which together
form a table (relation). A relational database uses SQL for storing, manipulating and maintaining the
data. E.F. Codd proposed the relational model in 1970. Each table in the database carries a key that makes each
row unique from the others.
Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
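
A minimal sketch of a relation with a key, using Python's built-in `sqlite3` module. The `student` table and its rows are assumptions chosen for illustration; SQLite simply stands in for any relational database here.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table (relation) with columns (attributes); id is the key.
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER, course TEXT)"
)

# Each inserted tuple becomes one row.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Sara", 24, "B.tech"), (2, "Sana", 20, "C.A")],
)

# The key keeps rows unique: a second row with id = 1 is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Deb', 20, 'BCA')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows, duplicate_rejected)
```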

4) NoSQL Database (stores data without a fixed structure): Non-SQL/Not Only SQL is a type of database that
is used for storing a wide range of data sets. It is not a relational database, as it stores data not only in
tabular form but in several different ways.
It is divided into 4 subtypes:
a. Key-value storage
b. Document-oriented Database
c. Graph Databases
d. Wide-column stores
Advantages of NoSQL Database
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.
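
The key-value and document ideas can be imitated with a plain Python dictionary. This is a toy sketch, not a real NoSQL product: the keys (`user:1`, `user:2`) and the helper `get_document` are hypothetical.

```python
import json

# Key-value storage: every value is looked up directly by its key.
kv_store = {}

# Document-oriented idea: each value is a JSON document, and two documents
# do not need to have the same fields (no fixed table structure).
kv_store["user:1"] = json.dumps({"name": "Sana", "skills": ["SQL", "Python"]})
kv_store["user:2"] = json.dumps({"name": "Sara", "city": "Delhi"})


def get_document(key):
    """Fetch a document by key; return None if the key is not stored."""
    raw = kv_store.get(key)
    return json.loads(raw) if raw is not None else None


doc = get_document("user:1")
print(doc)
```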

5) Cloud Database: A type of database where data is stored in a virtual environment and executes over a
cloud computing platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for
accessing the database. There are numerous cloud platforms; some well-known options are:
o Amazon Web Services(AWS)
o Microsoft Azure
o ScienceSoft
o Google Cloud SQL, etc

6)Object-oriented Databases:The type of database that uses the object-based data model approach for
storing data in the database system. The data is represented and stored as objects which are similar to the
objects used in the object-oriented programming language.

7) Hierarchical Databases:It is the type of database that stores data in the form of parent-children
relationship nodes. Here, it organizes data in a tree-like structure.

Data get stored in the form of records that are connected via links. Each child record in the tree will contain
only one parent. On the other hand, each parent record can have multiple child records.
8)Network Databases:It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them. Unlike the hierarchical
database, it allows each record to have multiple children and parent nodes to form a generalized graph
structure.
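
The difference between the two models can be sketched with plain Python dictionaries. The record names used here (College, Department, Store, Order and so on) are assumptions for illustration only.

```python
# Hierarchical model: every child record has exactly ONE parent (a tree).
parent_of = {
    "Department": "College",
    "Course": "Department",
    "Teacher": "Department",
    "Student": "Department",
}


def path_to_root(node):
    """Follow the single parent link of each record up to the root."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


# Network model: a record may have SEVERAL parents (a general graph).
parents_of = {
    "Customer": ["Store"],
    "Salesman": ["Store"],
    "Order": ["Customer", "Salesman"],  # two parents: not possible in a tree
}

student_path = path_to_root("Student")
order_parents = parents_of["Order"]
print(student_path, order_parents)
```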

 What is RDBMS (Relational Database Management System)


 RDBMS stands for Relational Database Management System.
 It is called Relational Database Management System (RDBMS) because it is based on the relational model
introduced by E.F. Codd.
 RDBMS stores data in the form of related tables.
 An important feature of relational systems is that a single database can be spread across several tables.
 Due to a collection of an organized set of tables data can be accessed easily in RDBMS.
 Everything in a relational database is stored in the form of relations.
 Most widely used relational database management systems, such as MS SQL Server, IBM DB2, Oracle, MySQL and
Microsoft Access, are based on the relational model.

 table/Relation :Everything in a relational database is stored in the form of relations. The RDBMS
database uses tables to store data. A table is a collection of related data entries and contains rows and
columns to store data.
Properties of a Relation:
o Each relation has a unique name by which it is identified in the database.
o Relation does not contain duplicate tuples.
o The tuples of a relation have no specific order.
o All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.

 -
row or record: A row of a table is also called a record or tuple. It contains the specific information of
each entry in the table. It is a horizontal entity in the table
Properties of a row:
o No two tuples are identical to each other in all their entries.
o All tuples of the relation have the same format and the same number of entries.
o The order of the tuple is irrelevant. They are identified by their content, not by their position.

 column/attribute/fields :A column is a vertical entity in the table which contains all information
associated with a specific field in a table.
Properties of an Attribute:
o Every attribute of a relation must have a name.
o Null values are permitted for the attributes.
o Default values can be specified for an attribute automatically inserted if no other value is specified for
an attribute.
o Attributes that uniquely identify each tuple of a relation are the primary key.

 Data item/Cell: The smallest unit of data in the table is the individual data item. It is stored at the
intersection of tuples and attributes.
Properties of data items:
1) Data items are atomic.
2) The data items for an attribute should be drawn from the same domain.
ID Name AGE COURSE
1 Debraj 20 BSC

In the example above, the data items in the student table are 1, Debraj, 20 and BSC.
 Degree:The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
ID Name AGE COURSE
1 Sara 24 B.tech
2 Sana 20 C.A
3 Deb 20 BCA
4 Raj 22 MCA
5 Debraj 20 BSC

 Cardinality:The total number of tuples at any one time in a relation is known as the table's cardinality.
The relation whose cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
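
Degree and cardinality can be checked with a short `sqlite3` sketch that rebuilds the student table above. The column types are assumptions; `PRAGMA table_info` is SQLite-specific, and other systems expose column lists through their own catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Sara", 24, "B.tech"),
        (2, "Sana", 20, "C.A"),
        (3, "Deb", 20, "BCA"),
        (4, "Raj", 22, "MCA"),
        (5, "Debraj", 20, "BSC"),
    ],
)

# Degree = number of attributes: table_info returns one row per column.
degree = len(conn.execute("PRAGMA table_info(student)").fetchall())

# Cardinality = number of tuples currently stored in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(degree, cardinality)
```

Deleting every row would bring the cardinality down to 0 (an empty table), while the degree would stay 4.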

 Domain:The domain refers to the possible values each attribute can contain. It can be specified using
standard data types such as integers, floating numbers, etc. For example, An attribute entitled
Marital_Status may be limited to married or unmarried values.
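
One common way to enforce a domain is a CHECK constraint. The sketch below assumes a hypothetical `person` table and uses SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint limits marital_status to its domain of two values.
conn.execute(
    "CREATE TABLE person ("
    "name TEXT NOT NULL, "
    "marital_status TEXT NOT NULL "
    "CHECK (marital_status IN ('married', 'unmarried')))"
)

# A value inside the domain is accepted.
conn.execute("INSERT INTO person VALUES ('Sana', 'married')")

# A value outside the domain is rejected by the database itself.
try:
    conn.execute("INSERT INTO person VALUES ('Raj', 'unknown')")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

stored = conn.execute("SELECT name, marital_status FROM person").fetchall()
print(stored, out_of_domain_rejected)
```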

 Codd's Rules in RDBMS: Dr. E.F. Codd was an IBM researcher who first developed the relational data
model in 1970. In 1985, Dr. Codd published a list of 12 rules that define an ideal relational database
and that have provided a guideline for the design of all relational database systems.

Rule 1: The Information Rule: This rule requires that all data be presented in table form. This
is the basis of the relational model.

Rule 2: The Guaranteed Access Rule :Each data element is guaranteed to be accessible logically with a
combination of the table name, primary key (row value), and attribute name (column value).

Rule 3: Systematic Treatment of NULL Values:Every Null value in a database must be given a systematic
and uniform treatment.

Rule 4: Active Online Catalog Rule:The database catalog, which contains metadata about the database, must
be stored and accessed using the same relational database management system.

Rule 5: The Comprehensive Data Sublanguage Rule: The database system must support at least one language
with a well-defined syntax that can be used for defining, querying and modifying the information within
the database.

Rule 6: The View Updating Rule: All views that are theoretically updatable must also be updatable by the
system.

Rule 7: High-level Insert, Update, and Delete: The database system must support high-level insertions,
updates and deletions, so that users can apply these operations to a whole set of rows through a single
query.

Rule 8: Physical Data Independence:Application programs and activities should remain unaffected when
changes are made to the physical storage structures or methods.

Rule 9: Logical Data Independence :Application programs and activities should remain unaffected when
changes are made to the logical structure of the data, such as adding or modifying tables.

Rule 10: Integrity Independence:Integrity constraints should be specified separately from application
programs and stored in the catalog. They should be automatically enforced by the database system.

Rule 11: Distribution Independence:The distribution of data across multiple locations should be invisible to
users, and the database system should handle the distribution transparently.

Rule 12: Non-Subversion Rule:If the interface of the system is providing access to low-level records, then
the interface must not be able to damage the system and bypass security and integrity constraints.
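
A few of these rules can be observed directly in a small database. The sketch below uses SQLite through Python's `sqlite3` module and a hypothetical `student` table; it touches Rules 2, 3 and 4 only, and `sqlite_master` is SQLite's own catalog table (other systems name their catalogs differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Sara", "B.tech"), (2, "Sana", None)],  # None is stored as NULL
)

# Rule 2 (guaranteed access): table name + primary key value + column name
# is enough to reach one data item.
value = conn.execute("SELECT name FROM student WHERE id = ?", (2,)).fetchone()[0]

# Rule 3 (systematic NULL treatment): NULL is tested with IS NULL;
# comparing with "= NULL" never matches any row.
null_courses = conn.execute(
    "SELECT COUNT(*) FROM student WHERE course IS NULL"
).fetchone()[0]
equals_null = conn.execute(
    "SELECT COUNT(*) FROM student WHERE course = NULL"
).fetchone()[0]

# Rule 4 (active online catalog): the catalog is itself a table and is
# queried with the same language as the user data.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

print(value, null_courses, equals_null, tables)
```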
Key                   DBMS                                            RDBMS
Definition            DBMS stands for Database Management System.     RDBMS stands for Relational Database Management System.
Data Storage          Data is stored as files.                        Data is stored as tables.
Normalization         Normalization cannot be achieved.               Normalization can be achieved.
Distributed database  DBMS has no support for distributed databases.  RDBMS supports distributed databases.
User                  DBMS supports a single user at a time.          RDBMS supports multiple users at a time.
Example               File systems, XML, etc.                         Oracle, SQL Server.

Basics            File System                                                                              DBMS
Structure         The file system is a way of arranging the files in a storage medium within a computer.  DBMS is software for managing the database.
Data Redundancy   Redundant data can be present in a file system.                                          In DBMS there is no redundant data.
Query processing  There is no efficient query processing in the file system.                               Efficient query processing is there in DBMS.
Complexity        It is less complex as compared to DBMS.                                                  It has more complexity in handling as compared to the file system.
Cost              It is less expensive than DBMS.                                                          It has a comparatively higher cost than a file system.
User Access       Only one user can access data at a time.                                                 Multiple users can access data at a time.
Example           COBOL, C++                                                                               Oracle, SQL Server

 Application architecture of DBMS:
o The DBMS design depends upon its architecture. The basic client/server architecture is used to deal
with a large number of PCs, web servers, database servers and other components that are connected
by networks.
o DBMS architecture depends upon how users are connected to the database to get their requests done.

 Types of DBMS Architecture: 1-tier architecture, 2-tier architecture and 3-tier architecture

Database architecture can be seen as single-tier or multi-tier. Logically, multi-tier database architecture is of
two types: 2-tier architecture and 3-tier architecture.

1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can
work directly on the DBMS and use it. Any changes done here will directly be done on the database itself.
o It doesn't provide a handy tool for end users.
o The 1-tier architecture is used for development of local applications, where programmers can communicate
directly with the database for a quick response.

2-Tier Architecture
o The 2-tier architecture is the same as the basic client-server model. In the two-tier architecture,
applications on the client end can directly communicate with the database at the server side. For this
interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs are run on the client side.
o The server side is responsible for providing functionalities such as query processing and transaction
management.

Fig: 2-tier architecture and 3-tier architecture.

3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this architecture,
client can't directly communicate with the server.
o The application on the client-end interacts with an application server which further communicates
with the database system.
o End user has no idea about the existence of the database beyond the application server. The database
also has no idea about any other user beyond the application.
o The 3-Tier architecture is used in case of large web application.
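
The three tiers can be imitated inside a single Python file to show who is allowed to talk to whom. This is a simplified sketch: in a real system the tiers run on separate machines, and the `account` table and both function names are hypothetical.

```python
import sqlite3

# --- Data tier: the database server (an in-memory SQLite database here). ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 'Sana', 500)")


# --- Application tier: the only layer that sends SQL to the database. ---
def get_balance(account_id):
    """Application-server operation: look up one balance for a client."""
    row = db.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    if row is None:
        return {"status": "not_found"}
    return {"status": "ok", "balance": row[0]}


# --- Client tier: calls the application tier and never touches the database. ---
def client_show_balance(account_id):
    reply = get_balance(account_id)
    if reply["status"] == "ok":
        return f"Balance: {reply['balance']}"
    return "No such account"


message = client_show_balance(1)
print(message)
```

In a 2-tier design, `client_show_balance` would run the SQL itself; the extra middle layer is what keeps the database hidden from the end user.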

 Schema: The schema defines the tables, the attributes along with their size and type, and the relationships
between attributes (columns) and tables. The overall design of the data is called the database schema.

 Database Instance: A database changes over time as information is inserted and deleted. The
collection of data stored at a particular moment is called a database instance.
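
The difference can be seen in a small `sqlite3` sketch: the schema text stays the same while the instance changes with every insert and delete. The `employee` table and the helper names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, age INTEGER)")


def schema():
    """The stored table definition (the design of the data)."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'employee'"
    ).fetchone()[0]


def instance():
    """The rows held at this moment (the database instance)."""
    return conn.execute("SELECT name, age FROM employee ORDER BY name").fetchall()


schema_before = schema()
instance_t0 = instance()  # empty table

conn.execute("INSERT INTO employee VALUES ('Sana', 23)")
instance_t1 = instance()

conn.execute("INSERT INTO employee VALUES ('Sara', 23)")
conn.execute("DELETE FROM employee WHERE name = 'Sana'")
instance_t2 = instance()

schema_after = schema()
print(instance_t0, instance_t1, instance_t2)
```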

 Three schema Architecture (***VVIP) / Three levels of DBMS Architecture:

o The overall design of the database is called the database schema.
o The three schema architecture is also called ANSI/SPARC architecture or three-level architecture.
o This framework is used to describe the structure of a specific database system.
o The three schema architecture is also used to separate the user applications and physical database.

View level:                    External schema, External schema, External schema
Conceptual or logical level:   Conceptual schema
Physical or internal level:    Physical schema
                               Database

1. Internal Level/Physical level :-The internal level has an internal schema which describes the physical
storage structure of the database.

o The internal schema is also known as a physical schema.


o It uses the physical data model. It defines how the data will be stored in blocks.
o The physical level is used to describe complex low-level data structures in detail.
The internal level is generally concerned with the following activities:
o Storage space allocations.
For Example: B-Trees, Hashing etc.
o Access paths.
For Example: Specification of primary and secondary keys, indexes, pointers and sequencing.

2. Conceptual Level :The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.

o In the conceptual level, internal details such as an implementation of the data structure are hidden.
o Programmers and database administrators work at this level.
3. External Level / View Level / View schema: At the external level, a database contains several schemas that
are sometimes called subschemas. A subschema is used to describe a different view of the database.

o An external schema is also known as a view schema.
o Each view schema describes the part of the database that a particular user group is interested in and hides the
remaining database from that user group.
o The view schema describes the end user interaction with database systems.
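
In SQL systems, an external schema is often built with views. The sketch below assumes a hypothetical `employee` table and a view named `staff_directory` that hides the salary column from one user group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the whole employee table, salary included.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, designation TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Sanjana", "Clerk", 19000), (2, "Sana", "Data Analyst", 50000)],
)

# External level: a view that shows only part of the database.
conn.execute("CREATE VIEW staff_directory AS SELECT name, designation FROM employee")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY name")
visible_columns = [column[0] for column in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```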
 Data Independence:-The ability to modify a schema definition in one level without affecting a
schema definition in the next higher level is called Data independence .
o Data independence is one of the main advantage of DBMS .

Data independence is of two types:
o Logical data independence
o Physical data independence

1. Logical Data Independence: Logical data independence refers to the ability to modify the schema
definition at the conceptual (logical) level without affecting the schema definition at the view level.
Or:
o Logical data independence refers to the characteristic of being able to change the conceptual schema
without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we do any changes in the conceptual view of the data, then the user view of the data would not be
affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence:-Physical data independence refers to the ability to modify the schema
definition at physical level without affecting the schema definition at conceptual level.
Or.
o Physical data independence can be defined as the capacity to change the internal schema without
having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the Conceptual structure
of the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal levels.
o Physical data independence occurs at the logical interface level.
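
Both kinds of independence can be observed in a small `sqlite3` sketch. The `employee` table, the `employee_names` view and the index name are hypothetical; adding an index stands in for a change at the physical level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Sana", 50000), (2, "Sara", 80000)],
)

# External schema: what one user group sees.
conn.execute("CREATE VIEW employee_names AS SELECT id, name FROM employee")


def user_view():
    return conn.execute("SELECT id, name FROM employee_names ORDER BY id").fetchall()


before = user_view()

# Logical data independence: the conceptual schema gains a column,
# but the external view and the queries on it stay the same.
conn.execute("ALTER TABLE employee ADD COLUMN designation TEXT")
after_logical_change = user_view()

# Physical data independence: a new access path (index) is added,
# but the tables, the view and the query results are unchanged.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")
after_physical_change = user_view()

print(before == after_logical_change == after_physical_change)
```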

 Data Model: A data model is the modeling of the data description, data semantics and consistency
constraints of the data.
• A data model provides the conceptual tools for describing the design of a database at each level of data
abstraction.
• A data model can also be defined as a collection of high-level data description constructs that hide many
low-level storage details.
There are mainly three types of data model:

Data model
o Object-based data model: Entity-Relationship model, Object-oriented data model
o Record-based data model: Relational model, Network model, Hierarchical model
o Physical data model: Unifying model, Frame memory model

1) Object Based Data Model: It is used to describe the data at the logical and view levels. Object based data
models provide flexible structuring capabilities and allow data constraints to be specified.

There are mainly two types of object based data model.

A) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and
relationships among them. These objects are known as entities, and a relationship is an association among
these entities.
Fig: ER diagram with the entities Teacher and Student joined by the relationship Study. Attributes shown:
ID, D.O.B and Name (made up of F.name and L.name).

B) Object Oriented Data Model: In an object-oriented model, information or data is represented as objects,
and these objects store their values in instance variables. This model uses object-oriented programming
concepts.
This model works with object-oriented programming languages like Python, Java, etc. It was developed in the
1980s.

2) Record Based Data Model: It is used to describe data at the logical and view levels.
• This data model is used to specify the overall logical structure of the database and to provide a
higher-level description of it.
There are three types of record based data model:

A) Relational Data model:-


• This type of model designs the data in the form of rows and columns within a table.
• Each table has multiple columns and each column has a unique name.
• This model was initially described by Edgar F. Codd, in 1969.
• This model uses certain mathematical operations from relational algebra and relational calculus on
relations, such as union, join, etc.
Roll no  Name  Address
01       Sana  Purulia
02       Sara  Ketka
03       Deb   Delhi
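
Union and join can be tried on the table above with a short `sqlite3` sketch. The `student` table holds the three rows shown; the second table, `enrollment`, is hypothetical and exists only so that there is something to join with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("01", "Sana", "Purulia"), ("02", "Sara", "Ketka"), ("03", "Deb", "Delhi")],
)

# Hypothetical second relation, needed only for the join.
conn.execute("CREATE TABLE enrollment (roll_no TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)", [("01", "BSC"), ("03", "BCA")]
)

# Join: combine tuples of two relations that agree on roll_no.
joined = conn.execute(
    "SELECT s.name, e.course FROM student s "
    "JOIN enrollment e ON s.roll_no = e.roll_no ORDER BY s.roll_no"
).fetchall()

# Union: tuples that appear in either of two compatible results.
united = conn.execute(
    "SELECT name FROM student WHERE address = 'Purulia' "
    "UNION "
    "SELECT name FROM student WHERE address = 'Delhi' "
    "ORDER BY name"
).fetchall()

print(joined, united)
```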

B) Network Data Model: In the network data model, data is organized into a graph, and a node can have more
than one parent node. It permits the modeling of many-to-many relationships in data.
Fig: network model with Store at the top; Customer, Manager and Salesman in the middle; Order and Items at the
bottom.

C) Hierarchical Data Model: The hierarchical data model organizes data in a tree structure.
In this model each entity has only one parent and may have many children. There is only one entity in this
model that has no parent; we call it the root.
Fig: hierarchical model with College as the root; Department and Information on the second level; Course,
Teacher and Student on the third level.

3)Physical Data model:-This data model is used to describe the data at low level

 Interfaces in DBMS: A database management system (DBMS) interface is a user interface that allows
users to input queries to a database without using the query language itself.
User-friendly interfaces provided by DBMS may include the following:
 Menu-Based Interfaces
 Forms-Based Interfaces
 Graphical User Interfaces
 Natural Language Interfaces
 Speech Input and Output Interfaces
 Interfaces for Parametric Users
 Interfaces for the Database Administrator (DBA)
1) Menu-Based Interfaces: These interfaces present the user with lists of options (called menus) that lead
the user through the formation of a request. The basic advantage of using menus is that they remove the
need to remember specific commands and the syntax of a query language.
2) Forms-Based Interfaces: A forms-based interface displays a form to each user. Users can fill out all of
the form entries to insert new data, or they can fill out only certain entries, in which case the DBMS will
retrieve matching data for the remaining entries. Many DBMSs have form specification
languages, which are special languages that help specify such forms.

3) Graphical User Interface:A GUI typically displays a schema to the user in diagrammatic form. The user
then can specify a query by manipulating the diagram. In many cases, GUI utilize both menus and forms.
Most GUI use a pointing device such as a mouse, to pick a certain part of the displayed schema diagram.
4) Natural Language Interfaces:These interfaces accept requests written in English or some other
language and attempt to understand them. A Natural language interface has its own schema, which is
similar to the database conceptual schema .
5) Speech Input and Output Interfaces: Limited use of speech, whether as an input query or as a spoken
answer to a question or result of a request, is becoming commonplace. Applications with a limited vocabulary,
such as inquiries for telephone directory, flight arrival/departure and bank account information,
allow speech for input and output so that ordinary people can access this information.
The Speech input is detected using predefined words and used to set up the parameters that are supplied
to the queries. For output, a similar conversion from text or numbers into speech takes place.
6) Interface for Parametric Users:Interfaces for Parametric Users contain some commands that can be
handled with a minimum of keystrokes. It is generally used in bank transactions for transferring money.
These operations are performed repeatedly.
7) Interfaces for Database Administrators (DBA): Most database systems contain privileged commands
that can be used only by the DBA's staff. These include commands for creating accounts, setting system
parameters, etc.

 Database Languages in DBMS


o A DBMS has appropriate languages and interfaces to express database queries and updates.
o Database languages can be used to read, store and update the data in the database.
Types of Database Languages

1. Data Definition Language (DDL)


o DDL stands for Data Definition Language. It is used to define database structure or pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of tables and
schemas, their names, indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to comment on the data dictionary.
These commands are used to update the database schema that's why they come under Data definition
language.
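
A few DDL statements can be tried with Python's `sqlite3` module. The `course` table is made up for this sketch. Note as a caveat that SQLite has no TRUNCATE or COMMENT statement, so only Create, Alter, Rename and Drop are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new object (a table) in the database.
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT)")

# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")

# RENAME: give the object a new name (written as ALTER ... RENAME in SQLite).
conn.execute("ALTER TABLE course RENAME TO subject")

# The catalog now describes the altered, renamed table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(subject)")]

# DROP: delete the object and its definition.
conn.execute("DROP TABLE subject")
remaining_tables = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]

print(columns, remaining_tables)
```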
2. Data Manipulation Language (DML)
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. It
handles user requests.
Here are some tasks that come under DML:
o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table (all records, or only those that match a condition).
o Merge: It performs an UPSERT operation, i.e., insert or update operations.
o Call: It is used to call a PL/SQL or a Java subprogram.
o Explain Plan: It describes the access path the DBMS will use to reach the data.
o Lock Table: It controls concurrency.
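
The four basic DML statements can be sketched with `sqlite3` on a hypothetical `student` table. Merge, Call, Explain Plan and Lock Table are left out of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# INSERT: add new rows to the table.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Sara", 24), (2, "Sana", 20), (3, "Deb", 20)],
)

# UPDATE: change existing data in the rows that match the condition.
conn.execute("UPDATE student SET age = 21 WHERE id = 2")

# DELETE: remove the rows that match the condition
# (without a WHERE clause, every row would be removed).
conn.execute("DELETE FROM student WHERE id = 3")

# SELECT: retrieve data from the table.
result = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
print(result)
```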
3. Data Control Language (DCL)
o DCL stands for Data Control Language. It is used to control access to the data stored in the database by granting and revoking privileges.
o The DCL execution is transactional. It also has rollback parameters.
(But in Oracle database, the execution of data control language does not have the feature of rolling
back.)
Here are some tasks that come under DCL:
o Grant: It is used to give user access privileges to a database.
o Revoke: It is used to take back permissions from the user.
There are the following operations which have the authorization of Revoke:
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
4. Transaction Control Language (TCL)
TCL is used to manage the changes made by DML statements. It allows statements to be grouped into a
logical transaction.
Here are some tasks that come under TCL:
o Commit: It is used to save the transaction on the database.
o Rollback: It is used to restore the database to its state as of the last Commit.
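The effect of Commit and Rollback can be seen in a short sketch. It assumes Python's sqlite3 module, which by default opens a transaction automatically before an INSERT or UPDATE; the account table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT, balance INTEGER)")

# The INSERT starts a transaction; COMMIT makes the change permanent.
conn.execute("INSERT INTO account VALUES ('Sana', 100)")
conn.commit()

# This UPDATE starts a new transaction that is never committed.
conn.execute("UPDATE account SET balance = 0 WHERE owner = 'Sana'")

# ROLLBACK restores the state as of the last commit.
conn.rollback()

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(balance)  # 100
conn.close()
```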
 ER (Entity Relationship) Diagram in DBMS :-
o ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used to
define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy to design
view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.
For example, Suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc.

 component of ER Diagram :-


1. Entity: It is a thing or object in the real world that is distinguishable from all other objects.
• Anything about which we store information is called an Entity.
Entity Set :- It is a set of entities of the same type that share the same properties or attributes.
• An Entity set can be represented as a rectangle.

Types of Entity Set :- There are two types of Entity Set-

1) Weak Entity Set :- An entity that depends on another entity is called a weak entity. The weak entity doesn't
contain any key attribute of its own. The weak entity is represented by a double rectangle.

2) Strong Entity Set :- A strong entity set is an entity set that contains sufficient attributes to uniquely
identify all its entities.
• Primary key exists for a strong entity set.
• A single rectangle is used to represent a strong entity set.


 Attributes in ER Model :-The attribute is used to describe the property of an entity. An Entity set may
contain any number of attributes. Attributes are represented in an elliptical shape.

Types of attributes :-
1) Simple attribute : An attribute that cannot be further subdivided into components is a simple
attribute. It is represented by an ellipse.
Example: The roll number of a student, the id number of an employee.

2) Composite attribute : An attribute that can be split into components is a composite attribute.
It is represented by an ellipse that is connected to the ellipses of its components.
Example: Name can be split into First Name, Middle Name and Last Name.

3) Multi-valued attribute : An attribute that can have more than one value is known as a
multi-valued attribute. The double ellipse is used to represent a multi-valued attribute.
Example: A student can have more than one phone number.

4) Derived attribute : An attribute that can be derived from other attributes is a derived attribute.
It is represented by a dashed ellipse.
Example: A person's age changes over time and can be derived from another attribute like date of birth.

5) Key attribute : The key attribute is used to represent the main characteristic of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.
Example: Student-ID.

6)Single-valued attribute : The attribute which takes up only a single value for each entity instance is a
single-valued attribute.
Example: The age of a student.

7)Complex attribute : Those attributes, which can be formed by the nesting of composite and multi-valued
attributes, are called “Complex Attributes“. These attributes are rarely used in DBMS(DataBase
Management System). That’s why they are not so popular.

8) Stored attribute : A stored attribute is an attribute whose value is physically stored in the database
and from which derived attributes can be calculated.
Example: DOB (Date of birth) is a stored attribute; age can be derived from it.

 Relationship/Mapping construction :-

Relationship:-A relationship is used to describe the relation between entities. Diamond or rhombus is used
to represent the relationship.

Types of relationship are as follows:

a. One-to-One relationship :- When one instance of an entity is associated with at most one instance of the
other entity, and vice versa, it is known as a one-to-one relationship.
For example, A female can marry one male, and a male can marry one female.

b. One-to-many relationship :- When one instance of the entity on the left can be associated with more than
one instance of the entity on the right, it is known as a one-to-many relationship.
For example, A scientist can make many inventions, but each invention is made by one specific scientist.

c. Many-to-one relationship :- When more than one instance of the entity on the left can be associated with
only one instance of the entity on the right, it is known as a many-to-one relationship.
For example, A student enrolls for only one course, but a course can have many students.

d. Many-to-many relationship :- When more than one instance of the entity on the left can be associated with
more than one instance of the entity on the right, it is known as a many-to-many relationship.
For example, An employee can be assigned to many projects and a project can have many employees.

 Notation of E-R Diagram :-Database can be represented using the notations. In ER diagram, many
notations are used to express the cardinality. These notations are as follows:
 Construct an E-R diagram for a hospital with a set of patients and a set of medical doctor.

 Keys in DBMS .

Keys:-
o A key is an attribute or set of attributes whose values can always be used to uniquely identify an object instance.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish and
identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.

Types of keys:
1. Primary key
o The primary key is the candidate key that is chosen by the database designer as the principal means
of identifying entities within an entity set.
o It is a unique key.
o It identifies exactly one tuple (a record) at a time.
o It has no duplicate values; every value is unique.
o It cannot be NULL.
o A primary key does not have to be a single column; more than one column can together form the
primary key of a table.

For example- In a student table with attributes (s-roll no, s-name, s-branch, s-year)
Primary key- s-roll no

2. Candidate key
o A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple.
o The primary key is chosen from the candidate keys. The remaining attributes that can also uniquely
identify a tuple are still candidate keys. The candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other identifying attributes,
like SSN, Passport_Number, License_Number, etc., are also candidate keys.

3.Super Key: A super key is a set of one or more attributes that taken collectively allow us to identify
uniquely an entity in the entity set.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the name of two
employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a
key.
The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
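Whether a set of attributes behaves as a super key for some given rows can be checked with a short sketch. The function name is_super_key and the sample rows are made up for illustration. A check like this only shows that the rows at hand contain no duplicates; a real key is a rule that must hold for every possible row.

```python
def is_super_key(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False  # two rows agree on attrs, so attrs is not unique
        seen.add(key)
    return True


# Hypothetical rows: two employees share a name but not an id.
employee = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Sana"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Sara"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Sana"},
]

by_id = is_super_key(employee, ["EMPLOYEE_ID"])
by_name = is_super_key(employee, ["EMPLOYEE_NAME"])
by_both = is_super_key(employee, ["EMPLOYEE_ID", "EMPLOYEE_NAME"])
print(by_id, by_name, by_both)
```

Adding EMPLOYEE_NAME to EMPLOYEE_ID keeps the combination unique but no longer minimal, which is the difference between a super key and a candidate key.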

4. Foreign key
o A foreign key is a column whose values are the same as the primary key of another table.
o It links two or more relations (tables) at a time.
o Foreign keys act as a cross reference between the tables.
o A foreign key is the column of a table that is used to point to the primary key of another table.

Table 1 (ID is the foreign key):
Roll no   Name   ID
1         Sana   123
2         Sara   234
3         Zoi    456

Table 2 (ID is the primary key):
ID    Branch   Address
234   IT       Purulia
456   CSE      Adra
678   EC       Hariyana

6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also
known as Concatenated Key.
For example- In a student table with attributes (s_roll no, s_ID, s_name, s_branch)
Composite key- (s_roll no, s_ID).

7. Artificial key
The key created using arbitrarily assigned data are known as artificial keys. These keys are created when a
primary key is large and complex and has no relationship with many other relations. The data values of the
artificial keys are usually numbered in a serial order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in employee
relations. So it would be better to add a new virtual attribute to identify each tuple in the relation uniquely.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in a
relation. These attributes or combinations of the attributes are called the candidate keys. One key is chosen
as the primary key from these candidate keys, and the remaining candidate key, if it exists, is termed the
alternate key. In other words, the total number of the alternate keys is the total number of candidate keys
minus one (the primary key). The alternate key may or may not exist. If there is only one candidate key in a
relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In
this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the
Alternate key.
 DBMS Generalization :-
o Generalization is like a bottom-up approach in which two or more entities of lower level combine to
form a higher level entity if they have some attributes in common.
o In generalization, an entity of a higher level can also combine with the entities of the lower level to
form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only difference is the approach.
Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
combined to make a superclass.

For example, Faculty and Student entities can be generalized and create a higher level entity Person

 DBMS Specialization :-
o Specialization is a top-down approach, and it is opposite to Generalization. In specialization, one
higher level entity can be broken down into two or more lower level entities.
o Specialization is used to identify the subset of an entity set that shares some distinguishing
characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are defined next, and
relationship set are then added.
For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on what role they play in the company.

 E-R Diagram with generalization and specialization :-

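Figure: Person (Name, Street, City) is connected through an "Is A" relationship to Employee (Salary) and
Customer (Credit-rating); read bottom-up this is generalization. Employee is connected through a second
"Is A" relationship to Officer (Officer number), Teller (Station number, Hours worked) and Secretary
(Hours worked); read top-down this is specialization.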

 DBMS Aggregation :-
o Aggregation is a technique to express a relationship among relationships.
o The basic E-R model cannot express a relationship among relationships, so we use the concept
of aggregation for this purpose.
o Aggregation is an abstraction through which relationships are treated as entities.
o In aggregation, the relation between two entities is treated as a single entity.
o In aggregation, a relationship with its corresponding entities is aggregated into a higher level entity.

For example: The Center entity offering the Course entity acts as a single entity, which is in a
relationship with another entity, Visitor. In the real world, if a visitor visits a coaching center, he will
never enquire about the Course only or just about the Center; instead he will enquire about both.

 DBMS Architecture/Structure/Component :-A database system is partitioned into modules that deal
with each of the responsibilities of the overall system.
DBMS Architecture is divided into 4 parts-
1) DBMS Users
2) Query Processor
3) Storage Manager
4) Disk Storage.

1) DBMS User: Database users are categorized based on their interaction with the database.

There are 4 main types of DBMS users -

A) Naive/End Users :- End users are unsophisticated users who don't have any DBMS knowledge but
frequently use database applications in their daily life to get the desired results.
For example- Railway ticket booking users are naive users. Clerks in a bank are naive users because they
do not have any DBMS knowledge but they still use the database and perform their given tasks.

B) Application Programmer :- Application programmers are the back end programmers who write the code for
the application programs. They are computer professionals. These programs could be written in
programming languages such as .NET, C, C++, Java etc.

C) Sophisticated Users :- Sophisticated users can be engineers, scientists or business analysts who are
familiar with the database. They can develop their own database applications according to their requirements.
They don't write program code; they interact with the database by writing SQL queries directly through the
query processor.
D) Database Administrator (DBA) :- The DBA is a person/team who defines the schema and also controls the
3 levels of the database.
• The DBA creates a new account id and password for a user who needs to access the database.
• The DBA is also responsible for providing security to the database, allowing only authorized users to
access/modify the database.
• The DBA monitors recovery and backup and provides technical support.
• The DBA has a DBA account in the DBMS, which is called a system or super user account.
• The DBA repairs damage caused by hardware and/or software failures.
2) Query Processor :- It interprets the requests (queries) received from the end user via an application program
into instructions. It also executes the user request which is received from the DML compiler.

Query processor contains the following components -

a) DDL Compiler : The DDL statements are sent to the DDL compiler, which converts these statements to a set of
tables. These tables contain the meta data concerning the database and are in a form that can be used by
other components of the DBMS.
b) DML pre-compiler and Query Processor :- The DML pre-compiler converts the DML statements embedded
in an application program to normal procedure calls in the host language.

The query processor component includes :

I) DDL interpreter : It processes the DDL statements into a set of tables containing meta data (data about
data).
II) DML Compiler :- It processes the DML statements into low level instructions (machine language) so that
they can be executed.
III) Query Evaluation Engine :- It executes the low-level instructions generated by the DML compiler.

3) Storage Manager/processor :- The storage manager is a program that provides an interface between the data
stored in the database and the queries received. It is also known as the database control system. It maintains
the consistency and integrity of the database by applying the constraints and executes the DCL statements.
It is responsible for updating, storing, deleting and retrieving data in the database.

It contains the following components:-

a) Authorization Manager : It tests for the satisfaction of integrity constraints and checks the
authority of users to access data.
b) Transaction Manager : It ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
c) File Manager :- It manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
d) Buffer Manager :- It is responsible for cache memory and the transfer of data between
secondary storage and main memory.

4) Disk storage :- It contains the following components

a) Data files : They store the data.
b) Data dictionary : It contains the information about the structure of any database object. It is
the repository of information that governs the meta data.
c) Indices : They provide fast access to data items that hold particular values.

 Reduction of ER diagram to Table :-


The database can be represented using the notations, and these notations can be reduced to a collection of
tables.
In the database, every entity set or relationship set can be represented in tabular form.
The ER diagram is given below:
There are some points for converting the ER diagram to the table:
o Entity type becomes a table:-In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE
forms individual tables.

o A key attribute of the entity type represented by the primary key:-In the given ER diagram,
COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attribute of the entity.

o The multi valued attribute is represented by a separate table:-In the student table, a hobby is a
multi valued attribute. So it is not possible to represent multiple values in a single column of
STUDENT table. Hence we create a table STUD_HOBBY with column name STUDENT_ID and HOBBY.
Using both the column, we create a composite key.

o Composite attribute represented by components:-In the given ER diagram, student address is a
composite attribute. It contains CITY, PIN, DOOR#, STREET, and STATE. In the STUDENT table, each of these
components becomes an individual column.

o Derived attributes are not considered in the table:-In the STUDENT table, Age is the derived
attribute. It can be calculated at any point of time by calculating the difference between current date
and Date of Birth.
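Because Age is derived rather than stored, it is computed whenever it is needed. A minimal sketch of that calculation follows; the function name age_on and the sample dates are assumptions for illustration, and the current date is passed in as a parameter so that the result is repeatable.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Return the age in completed years on the given day."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


dob = date(2004, 6, 15)  # hypothetical stored attribute
day_before_birthday = age_on(dob, date(2024, 6, 14))
on_birthday = age_on(dob, date(2024, 6, 15))
print(day_before_birthday, on_birthday)  # 19 20
```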

 Using these rules, you can convert the ER diagram to tables and columns and assign the mapping
between the tables. Table structure for the given ER diagram is as below:

Figure: Table structure


 Relational Model Concept :-The relational model represents data as a table with columns and rows. Each row is
known as a tuple. Each column of the table has a name, called an attribute.

Domain: It contains a set of atomic values that an attribute can take.


Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a domain,
dom(Ai)

Relational instance: In the relational database system, the relational instance is represented by a finite set
of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and name of all columns or
attributes.

Relational key: A relational key is one or more attributes of a row that can identify the row in the
relation uniquely.

Example: STUDENT Relation

Name Roll no Phone number address


Sana 123 9824191898 Delhi
Sara 234 8721978231 Purulia
Deb 345 1921268269 Ketka

o In the given table, Name, Roll no, Phone number and Address are the attributes.

o The instance of schema STUDENT has 3 tuples.
Properties of Relations
o Name of the relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o A relation has no duplicate tuples.
o The order of tuples has no significance.

 Integrity Constraints :-
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.

Types of Integrity Constraint

1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The value of the
attribute must be available in the corresponding domain.
Example:

2. Entity integrity constraints


o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation and if the primary
key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.
Example:

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of Table 2,
then every value of the Foreign Key in Table 1 must be null or be available in Table 2.
Example:
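A minimal sketch of a referential integrity check, assuming Python's sqlite3 module. The branch and student tables are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable enforcement in SQLite

# Table 2 holds the primary key; Table 1 refers to it.
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT,"
    " branch_id INTEGER REFERENCES branch(id))"
)

conn.execute("INSERT INTO branch VALUES (234, 'IT')")
conn.execute("INSERT INTO student VALUES (1, 'Sana', 234)")   # value exists in branch
conn.execute("INSERT INTO student VALUES (2, 'Sara', NULL)")  # null is allowed

try:
    # 999 is not a primary key value in branch, so this row is rejected.
    conn.execute("INSERT INTO student VALUES (3, 'Zoi', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

accepted = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, accepted)
conn.close()
```

The two accepted rows cover the two permitted cases from the rule above: a foreign key value that is available in the other table, and a null.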

4. Key constraints
o Keys are the attributes that are used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, out of which one key will be the primary key. A primary key
must contain unique values and cannot contain a null value in the relational table.
Example:

 Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of the
query.Relational algebra mainly provides theoretical foundation for relational databases and SQL. It uses
operators to perform queries.
Types of Relational operation

1. Select Operation( σ):


o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

Notation: σ p(r)
Where: σ is used for the selection predicate
r is used for the relation
p is a propositional logic formula which may use connectors like AND, OR and NOT. The formula
may also use relational operators like =, ≠, ≥, <, >, ≤.
For example: Student
Name Roll No Address
Sana 02 Purulia
Sara 04 Bakura
Deb 08 Delhi
Raj 13 Bombay
Query 1: Give all information of the student whose roll no is 04.
Solution : σ roll no=04 (student)
Query 2: Find all information of the student whose name is Deb and address is Delhi.
Solution : σ (Name="Deb") and (address="Delhi") (student)
If either condition is enough, use or instead: σ (Name="Deb") V (address="Delhi") (student) [V = or]

2. Project Operation( ∏):


o This operation selects certain columns from the table and discard the other columns.
o This operation shows the list of these attributes that we wish to appear in the result.
o Rest of the attributes are eliminated from the table.
o It is denoted by ∏.
The general format for project operation is ∏<attribute List>(R)
Where ∏ is the symbol used to represent the project operation and attribute list is the list of attributes from
the attributes of Relation R.
Example -Student
Name Roll No Address
Sana 02 Purulia
Sara 04 Bakura
Deb 08 Delhi
Raj 13 Bombay

Query 01: Find student Name in the student table.
Solution : ∏ name (student)
Query 02: Find student Name and address list.
Solution : ∏ name, address (student)
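The select and project operations can be sketched in a few lines of Python. Here a relation is modelled as a list of dictionaries; the function names select and project and the key names are choices made for illustration, not part of any library.

```python
student = [
    {"name": "Sana", "roll_no": 2, "address": "Purulia"},
    {"name": "Sara", "roll_no": 4, "address": "Bakura"},
    {"name": "Deb", "roll_no": 8, "address": "Delhi"},
    {"name": "Raj", "roll_no": 13, "address": "Bombay"},
]


def select(relation, predicate):
    """Sigma: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Pi: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


q1 = select(student, lambda r: r["roll_no"] == 4)
q2 = select(student, lambda r: r["name"] == "Deb" and r["address"] == "Delhi")
names = project(student, ["name"])
print(q1)
print(q2)
print(names)
```

Select reduces the number of rows and keeps every column, while project reduces the number of columns and keeps every distinct row.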

3. Union Operation(∪):
o It performs binary union between two given relations and is defined as R ∪ S,
where R and S are either database relations or relation result sets (temporary relations).

For the union operation to be valid, the following conditions must hold-

o R and S must have the same number of attributes.
o Attribute domains must be compatible.
o Duplicate tuples are eliminated automatically.
For Example:
∏ name (student 1) ∪ ∏ name (student 2)
4. Set Intersection(∩):
o It performs binary intersection between two given relations and is defined as R ∩ S,
where R and S are either database relations or relation result sets.
Example- ∏ name (student 1) ∩ ∏ name (student 2)
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that are in R
but not in S.
o It is denoted by minus (-).

Notation: R - S
Example- ∏ name (student 1) - ∏ name (student 2)

6. Cartesian product:
o The Cartesian product is used to combine each row in one table with each row in the other table. It is
also known as a cross product.
o It is denoted by X.

Notation: E X D
Where E and D are relations and their output will be defined as-
E X D = {q t | q ∈ E and t ∈ D}
Example:
σ Name='Kamal' (Student 1 X Student 2)
7. Rename Operation:The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
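Union, intersection, set difference and Cartesian product map directly onto Python sets. In this sketch each relation is a set of tuples; the contents of student1 and student2 are made up for illustration.

```python
from itertools import product

# Two union-compatible relations, each with a single name attribute.
student1 = {("Sana",), ("Sara",), ("Deb",)}
student2 = {("Deb",), ("Raj",)}

union = student1 | student2          # tuples in either relation, duplicates dropped
intersection = student1 & student2   # tuples in both relations
difference = student1 - student2     # tuples in student1 but not in student2

# Cartesian product: every tuple of student1 paired with every tuple of student2.
cartesian = {a + b for a, b in product(student1, student2)}

print(len(union), intersection, difference, len(cartesian))
```

Using sets makes duplicate elimination automatic, which matches the rule that a relation has no duplicate tuples.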
Normalization
 Functional Dependency :-The functional dependency is a relationship that exists between two attributes.
It typically exists between the primary key and non-key attribute within a table.
X → Y
The left side of FD is known as a determinant, the right side of the production is known as a dependent.

For example:Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we know
the Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as: Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
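A functional dependency can be tested against sample data with a short sketch. The function name holds and the employee rows are assumptions for illustration. Passing this check only shows that the given rows do not violate the dependency; the dependency itself is a rule about every valid state of the table.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare.
        if seen.setdefault(key, value) != value:
            return False
    return True


employee = [
    {"Emp_Id": 1, "Emp_Name": "Sana", "Emp_Address": "Purulia"},
    {"Emp_Id": 2, "Emp_Name": "Sara", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Sana", "Emp_Address": "Adra"},
]

id_to_name = holds(employee, ["Emp_Id"], ["Emp_Name"])
name_to_id = holds(employee, ["Emp_Name"], ["Emp_Id"])
print(id_to_name, name_to_id)
```

Here Emp_Id → Emp_Name holds, but Emp_Name → Emp_Id does not, because the name Sana appears with two different ids.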
Types of Functional dependency
1. Trivial functional dependency:-
o A → B has trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial like: A → A, B → B

Example:-Consider a table with two columns Employee_Id and Employee_Name.


{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non-trivial functional dependency:-


o A → B has a non-trivial functional dependency if B is not a subset of A.
o When A intersection B is empty, then A → B is called completely non-trivial.
Example:
ID → Name,
Name → DOB
 Inference Rule (IR):
o The inference rule is a type of assertion. It can apply to a set of FD(functional dependency) to derive
other FD.
o Using the inference rule, we can derive additional functional dependency from the initial set.
The Functional dependency has 6 types of inference rule:
1. Reflexive Rule (IR1):-In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example-X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule (IR2):-In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example-For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3):-In the transitive rule, if X determines Y and Y determine Z, then X must also
determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4):-Union rule says, if X determines Y and X determines Z, then X must also determine Y
and Z.
If X → Y and X → Z then X → YZ
5. Decomposition Rule (IR5):-Decomposition rule is also known as project rule. It is the reverse of union
rule.This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
6. Pseudo transitive Rule (IR6):-In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ
determines W.
If X → Y and YZ → W then XZ → W

Problem-1:-Find the candidate key of relation R(A,B,C,D,E,F) with functional dependencies
A → C
C → D
D → B
E → F
(A super key, S.K, is a set of attributes whose closure contains all attributes of the given relation.)
Solution:-
S.K ABCDEF = {A,B,C,D,E,F}
Remove C, since A → C:
S.K ABDEF = {A,B,C,D,E,F}
Remove D, since A → C and C → D give A → D (transitive dependency):
S.K ABEF = {A,B,C,D,E,F}
Remove B, since A → C, C → D and D → B give A → B (transitive dependency):
S.K AEF = {A,B,C,D,E,F}
Remove F, since E → F:
AE = {A,B,C,D,E,F}
AE is a candidate key.
Prime attributes = (A,E). No prime attribute appears on the right hand side of any functional dependency,
so the relation has only one candidate key (AE).
Problem-2: Find the possible candidate keys of the relation R(A,B,C,D) with functional dependencies
A→B, B→C, C→A.
Solution: S.K ABCD → {A,B,C,D}
Remove B, since A→B:
S.K ACD → {A,B,C,D}
Remove C, since A→B and B→C give A→C:
S.K AD → {A,B,C,D}
AD is a candidate key, so A and D are prime attributes.
The prime attribute A appears on the right side of the functional dependency C→A, so replacing A with C
gives another candidate key, CD.
Again, C appears on the right side of the functional dependency B→C, so replacing C with B gives another
candidate key, BD.
So candidate keys = AD, CD, BD
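Both problems can be checked with a short sketch that computes attribute closures and then tries attribute sets from smallest to largest. The function names closure and candidate_keys are choices made for illustration. The search is brute force, which is fine for classroom-sized relations but not for large schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


# Problem 1: R(A,B,C,D,E,F) with A->C, C->D, D->B, E->F
keys1 = candidate_keys("ABCDEF", [("A", "C"), ("C", "D"), ("D", "B"), ("E", "F")])

# Problem 2: R(A,B,C,D) with A->B, B->C, C->A
keys2 = candidate_keys("ABCD", [("A", "B"), ("B", "C"), ("C", "A")])

print(keys1)
print(keys2)
```

The results agree with the manual solutions: AE for Problem 1, and AD, BD and CD for Problem 2.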

 What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using relationships.
o The normal form is used to reduce redundancy from the database table

 Why do we need Normalization?


The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies
leads to data redundancy and can cause data integrity and other problems as the database grows.
Normalization consists of a series of guidelines that helps to guide you in creating a good database structure.
When we normalize the database ,we have 4 goals :
1) Arranging data into logical groupings such that each group describes a small part of the whole
2) Minimizing the amount of duplicate data called redundancy ,stored in a database.
3) Organizing the data such that when you modify it you make the changes only in one place.
4) Building a database in which you can access and manipulate the data quickly and efficiently without
compromising the integrity of the data in storage.

Normal Form                      Description

1NF (first normal form)          A relation is in 1NF if every attribute contains only atomic values. It eliminates repeating groups.

2NF (2nd normal form)            A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent
                                 on the primary key. It eliminates partial functional dependency.

3NF (3rd normal form)            A relation will be in 3NF if it is in 2NF and no transitive dependency exists. It eliminates
                                 transitive dependency.

BCNF (Boyce Codd normal form)    A stronger definition of 3NF is known as Boyce Codd's normal form. It is also called 3.5 NF.

4NF (4th normal form)            A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued
                                 dependency. It eliminates multi-valued dependency.

5NF (5th normal form)            A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be
                                 lossless. It eliminates join dependency.
 Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
 Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

 First Normal Form (1NF)


o A relation will be 1NF if it contains an atomic value.
o It states that an attribute of a table cannot hold multiple values. It must hold only single-valued
attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
Example: Relation Student is not in 1NF because of the multi-valued attribute Student Phone No.
Student table:
Student ID   Student Name   Student Phone No.          Student Branch
11           Sana           8187991881, 91921929272    IT
12           Deb            7821712888                 CS
13           Raj            8789018277, 78187218727    ES
14           Sara           78171918198                ME
15           Sonai          79117872171                IT

The decomposition of the Student table into 1NF as :-


Student ID Student Name Student Phone Student Branch
11 Sana 8187991881 IT
11 Sana 91921929272 IT
12 Deb 7821712888 CS
13 Raj 8789018277 ES
13 Raj 78187218727 ES
14 Sara 78171918198 ME
15 Sonai 79117872171 IT
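The conversion to 1NF shown above amounts to emitting one row per phone number. A minimal sketch, with the table held as a list of tuples and the multi-valued column stored as a comma-separated string (an assumption about how the original data is held):

```python
students = [
    (11, "Sana", "8187991881,91921929272", "IT"),
    (12, "Deb", "7821712888", "CS"),
    (13, "Raj", "8789018277,78187218727", "ES"),
    (14, "Sara", "78171918198", "ME"),
    (15, "Sonai", "79117872171", "IT"),
]

# One output row per phone number, so every cell holds a single atomic value.
first_nf = [
    (student_id, name, phone.strip(), branch)
    for student_id, name, phones, branch in students
    for phone in phones.split(",")
]

for row in first_nf:
    print(row)
```

The five original rows become seven, and the student id alone no longer identifies a row; the pair (student id, phone) does.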
 Second Normal Form (2NF)
o In the 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functional dependent on the primary key
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.
TEACHER table:
Teacher_ID   Subject     Teacher_age
25           Chemistry   30
25           Biology     30
47           English     35
83           Math        38
83           Computer    38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset
of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:
Teacher_ID   Teacher_age
25           30
47           35
83           38

TEACHER_SUBJECT table:
Teacher_ID   Subject
25           Chemistry
25           Biology
47           English
83           Math
83           Computer
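The decomposition can be reproduced in a few lines, which also shows that no information is lost: joining the two smaller tables on Teacher_ID gives back the original rows. This is a sketch with the tables held as Python lists of tuples.

```python
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Teacher_age depends only on Teacher_ID, so it moves to its own table.
# A set removes the repeated (id, age) pairs.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Natural join on Teacher_ID rebuilds the original table.
rejoined = [
    (tid, subject, age)
    for tid, subject in teacher_subject
    for detail_tid, age in teacher_detail
    if tid == detail_tid
]

print(teacher_detail)
print(rejoined == teacher)
```

Each teacher's age is now stored once, so changing it means updating a single row.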
 Third Normal Form (3NF)
o A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation is in third
normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial
functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_Department location table:

Emp No E.Name Sal Depart.Name Depart.location


1001 Sana 7500 Accounts 102
1002 Sara 5000 Sales 104
1003 Deb 1000 Accounts 102
1004 Raj 4500 Sales 104
1005 Sonai 6500 Store 106

The table is not in 3NF because it holds a transitive dependency:

EmpNo -> Dept_Name -> Dept_Location
so EmpNo -> Dept_Location holds only through Dept_Name.
To bring the table into 3NF, we remove the transitive dependency by decomposing it into two
sub-tables:
(Table 1)Employee_Department :-

Emp No E.Name Sal Depart.Name


1001 Sana 7500 Accounts
1002 Sara 5000 Sales
1003 Deb 1000 Accounts
1004 Raj 4500 Sales
1005 Sonai 6500 Store

(Table 2)Department _Location:-

Depart.Name Depart.location
Accounts 102
Sales 104
Store 106

Both tables are now in 3NF.
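A minimal sketch of this decomposition, using Python's built-in sqlite3 module with an in-memory database. The SQL table and column names are assumptions chosen to mirror the two tables above. The point of the example is that a department's location is stored once, so changing it touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department_location (
    dept_name     TEXT PRIMARY KEY,
    dept_location INTEGER
);
CREATE TABLE employee_department (
    emp_no    INTEGER PRIMARY KEY,
    emp_name  TEXT,
    sal       INTEGER,
    dept_name TEXT REFERENCES department_location(dept_name)
);
""")
conn.executemany(
    "INSERT INTO department_location VALUES (?, ?)",
    [("Accounts", 102), ("Sales", 104), ("Store", 106)],
)
conn.executemany(
    "INSERT INTO employee_department VALUES (?, ?, ?, ?)",
    [(1001, "Sana", 7500, "Accounts"), (1002, "Sara", 5000, "Sales"),
     (1003, "Deb", 1000, "Accounts"), (1004, "Raj", 4500, "Sales"),
     (1005, "Sonai", 6500, "Store")],
)

# Relocating a department is a single-row update in the 3NF design.
conn.execute(
    "UPDATE department_location SET dept_location = 110 WHERE dept_name = 'Accounts'"
)

# A join rebuilds the combined view whenever it is needed.
rows = conn.execute("""
    SELECT e.emp_no, e.emp_name, d.dept_location
    FROM employee_department AS e
    JOIN department_location AS d ON e.dept_name = d.dept_name
    ORDER BY e.emp_no
""").fetchall()
for row in rows:
    print(row)
```

In the original single table the same relocation would have required updating every employee row of that department.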

 Boyce Codd normal form (BCNF)


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
Emp_Id Emp_Country Emp_Dept Dept_Type Emp_Dept_No
264 India Designing D394 283
264 India Testing D394 300
364 UK Stored D283 232
364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key, yet each one is the left-hand side
of a functional dependency.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
Emp-id Emp-Country

264 India
364 UK
EMP_DEPT table:
Emp_Dept Dept_Type Emp_Dept_No
Designing D394 283
Testing D394 300
Stored D283 232
Developing D283 549

EMP_DEPT_MAPPING table:
Emp_ID Emp_Dept
264 Designing
264 Testing
364 Stored
364 Developing

Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now the design is in BCNF because the left-hand side of each functional dependency is a key of its table.

 Fourth normal form (4NF)


o A relation is in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
o For a dependency A →→ B, if for a single value of A multiple values of B exist, independently of the
other attributes, then the relation has a multi-valued dependency.
Example
STUDENT
Stu-Id Course Hobby
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is
no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.
So to make the above table into 4NF, we can decompose it into two tables.
STUDENT_COURSE
Stu-Id Course
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
Stu-Id Hobby
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey

 Fifth normal form (5NF) :-


o A relation is in 5NF if it is in 4NF and contains no join dependency other than those implied by its
candidate keys. Any decomposition must join back losslessly.
o 5NF is satisfied when the tables are broken into as many tables as possible in order to avoid
redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example : Faculty (Original Table)
Faculty Subject Committee
Sana DBMS Placement
Sana Java Placement
Sana C Placement
Sana DBMS Scholarship
Sana Java Scholarship
Sana C Scholarship

The given table is not in 4NF or 5NF. First we convert it to 4NF by splitting it into two sub-tables.
Table-1 Faculty-Subject
Faculty Subject
Sana DBMS
Sana Java
Sana C
Table-2 Faculty -committee:-
Faculty Committee
Sana Placement
Sana Scholarship
To check 5NF, we join Table 1 and Table 2. If the result is the same as the original table (Faculty), then
the decomposition is in 5NF; otherwise it is not.
Table 1 joined with Table 2
Faculty Subject Committee
Sana DBMS Placement
Sana DBMS Scholarship
Sana Java Placement
Sana Java Scholarship
Sana C Placement
Sana C Scholarship
The join is equal to the original table, so the decomposition is in 5NF.
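The join check can be sketched in Python. The helper names project, natural_join and as_set are hypothetical, and relations are modelled as lists of dictionaries purely for illustration.

```python
def project(rel, attrs):
    """Project a relation onto attrs, dropping duplicate rows."""
    seen = {tuple(row[a] for a in attrs) for row in rel}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(r1, r2):
    """Join rows that agree on every attribute the two relations share."""
    joined = []
    for a in r1:
        for b in r2:
            common = set(a) & set(b)
            if all(a[k] == b[k] for k in common):
                joined.append({**a, **b})
    return joined

def as_set(rel):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rel}

# The original Faculty table from the example above.
faculty = [
    {"Faculty": "Sana", "Subject": s, "Committee": c}
    for s in ("DBMS", "Java", "C")
    for c in ("Placement", "Scholarship")
]

faculty_subject = project(faculty, ["Faculty", "Subject"])
faculty_committee = project(faculty, ["Faculty", "Committee"])
rejoined = natural_join(faculty_subject, faculty_committee)

is_lossless = as_set(rejoined) == as_set(faculty)
print(is_lossless)
```

If the join produced extra (spurious) rows or lost rows, the comparison would fail and the decomposition would not be acceptable.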

 Relational Decomposition:-
o When a relation in the relational model is not in appropriate normal form then the decomposition of a
relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies,
and redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the decomposition will be
lossless.
o A lossless decomposition guarantees that the join of the decomposed relations results in the same relation
that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives the
original relation.
Example: Emp-info
Emp_ID Emp_Name Emp_Age Emp_location Dept_ID Dept_Name
E001 Sana 29 Hariduwar Dpt1 Operation
E002 Sara 32 Dehradun Dpt2 HR
E003 Deb 22 Delhi Dpt3 Finance
Decompose the above table into two tables:
1) Emp Details
Emp_ID Emp_Name Emp_Age Emp_location
E001 Sana 29 Hariduwar
E002 Sara 32 Dehradun
E003 Deb 22 Delhi

2) Dept Details
Emp_ID Dept_ID Dept_Name
E001 Dpt1 Operation
E002 Dpt2 HR
E003 Dpt3 Finance
Now a natural join is applied on the above two tables. Because both tables contain Emp_ID, the join returns
exactly the original Emp-info relation, so the decomposition is lossless.

 Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part
of R1 or R2 or be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD
A->BC is a part of relation R1(ABC).
 Lossy Decomposition :When a relation is decomposed into two or more relational schemas,the loss of
information is unavoidable when the original relation is retrieved; such a decomposition is lossy.
Example: Emp-info
Emp_ID Emp_Name Emp_Age Emp_location Dept_ID Dept_Name
E001 Sana 29 Hariduwar Dpt1 Operation
E002 Sara 32 Dehradun Dpt2 HR
E003 Deb 22 Delhi Dpt3 Finance
Decompose the above table into two tables:
< Emp Details >
Emp_ID Emp_Name Emp_Age Emp_location
E001 Sana 29 Hariduwar
E002 Sara 32 Dehradun
E003 Deb 22 Delhi

< Dept Details >
Dept_ID Dept_Name
Dpt1 Operation
Dpt2 HR
Dpt3 Finance
Now you won't be able to join the above tables, since Emp_ID is not part of the Dept Details relation and the
two tables share no attribute.
Therefore, the above relation has a lossy decomposition.

Q. Consider the schema S = (V, W, X, Y, Z). Suppose the following FDs hold:

Z -> V
W -> Y
XY -> Z
V -> WX
State whether each of the following decompositions of schema S is a lossless-join decomposition or a lossy
decomposition.
1) S1 = (V, W, X), S2 = (V, Y, Z)
2) S1 = (V, W, X), S2 = (X, Y, Z)
Ans:- A decomposition of a relation into R1 and R2 is lossless if at least one of the following holds:
R1 ∩ R2 -> R1
R1 ∩ R2 -> R2
1) S1 = (V, W, X), S2 = (V, Y, Z)
S1 ∩ S2 = V
V -> V (trivial)
V -> WX (given)
so V -> VWX, i.e. V -> S1
So the decomposition S1 = (V, W, X), S2 = (V, Y, Z) is lossless.
2) S1 = (V, W, X), S2 = (X, Y, Z)
S1 ∩ S2 = X
X -> X (trivial) is the only dependency with X alone on the left side, so X determines nothing but itself.
X does not determine VWX (S1) and X does not determine XYZ (S2).
So the decomposition S1 = (V, W, X), S2 = (X, Y, Z) is lossy.
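The test used in this answer can be sketched in Python by computing the attribute closure of the common attributes. The function names closure and is_lossless are hypothetical, and each attribute is assumed to be a single letter so that a schema can be written as a string.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: (R1 ∩ R2) must determine R1 or R2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined

# FDs from the question: Z -> V, W -> Y, XY -> Z, V -> WX
fds = [("Z", "V"), ("W", "Y"), ("XY", "Z"), ("V", "WX")]

first = is_lossless("VWX", "VYZ", fds)   # common attribute V
second = is_lossless("VWX", "XYZ", fds)  # common attribute X
print(first, second)
```

The first call reports a lossless decomposition and the second a lossy one, matching the working above.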
 Transaction:
o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions. It is performed by a single user to access or modify
the contents of the database.
Example: Suppose an employee of bank transfers Rs 800 from X's account to Y's account. This small
transaction contains several low-level tasks:
X's Account
Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)
Y's Account
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
Or, Transaction System:-
o A collection of operations that forms a single logical unit of work is called a transaction.
o A transaction is a unit of program execution that accesses and possibly updates various data items.
o A transaction is defined as a logical unit of database processing that includes one or more database access
operations.
Operations of Transaction:
A transaction accesses data using the following two main operations:
I. Read(X): The read operation reads the value of X from the database and stores it in a buffer in main
memory.
II. Write(X): The write operation writes the value back to the database from the buffer.
Let's take an example to debit transaction from an account which consists of following operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting of the transaction is 4000.
o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will be 3500.
But a transaction may fail before finishing all the operations in the set, because of a hardware, software or
power failure, etc.
For example: If the debit transaction above fails after executing operation 2, then X's
value will remain 4000 in the database, which is not acceptable to the bank.

To solve this problem, we have two important operations:


1) Commit: It is used to save the work done permanently.
2) Rollback: It is used to undo the work done.
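Commit and rollback can be tried out with Python's built-in sqlite3 module. This is only a sketch: the account table, the debit helper and the fail_before_commit flag are assumptions made for illustration, and the failure is simulated by raising an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 4000)")
conn.commit()

def balance(conn, name):
    row = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()
    return row[0]

def debit(conn, name, amount, fail_before_commit=False):
    """Debit an account; undo the change if a failure occurs before commit."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, name),
        )
        if fail_before_commit:
            raise RuntimeError("simulated failure")
        conn.commit()      # Commit: save the work done permanently
        return True
    except RuntimeError:
        conn.rollback()    # Rollback: undo the work done
        return False

failed = debit(conn, "X", 500, fail_before_commit=True)
after_failure = balance(conn, "X")
succeeded = debit(conn, "X", 500)
after_success = balance(conn, "X")
print(after_failure, after_success)
```

After the simulated failure the balance is still the starting value; only the committed debit changes it.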

ACID properties of Transaction

 Transaction property:-The transaction has the four properties. These are used to maintain
consistency in a database, before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability

1)Atomicity:-
o It states that either all operations of the transaction take place or none of them do. If any operation
fails, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit
and either runs to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume that the following transaction T consists of T1 and T2. A holds Rs 600 and B
holds Rs 300. Transfer Rs 100 from account A to account B.

T1 T2

Read(A) Read(B)
A:= A-100 B:= B+100
Write(A) Write(B)
After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.
If the transaction T fails after the completion of transaction T1 but before completion of transaction T2, then
the amount will be deducted from A but not added to B. This shows the inconsistent database state. In order
to ensure correctness of database state, the transaction must be executed in entirety.
2)Consistency
o The integrity constraints are maintained so that the database is consistent before and after the
transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new stable
state.
o The consistent property of database states that every transaction sees a consistent database instance.
o The transaction is used to transform the database from one consistent state to another consistent
state.
For example: The total amount must be the same before and after the transaction.

1. Total before T occurs = 600+300=900


2. Total after T occurs= 500+400=900
Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then inconsistency will
occur.
3)Isolation
o It shows that the data which is used at the time of execution of a transaction cannot be used by the
second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then that data item can't
be accessed by any other transaction T2 until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
4)Durability
o The durability property is used to indicate the permanence of the database's consistent state. It states
that the changes made by a committed transaction are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by a system failure. When
a transaction is completed, the database reaches a state known as the consistent state. That
consistent state cannot be lost, even in the event of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.

 States of Transaction
In a database, the transaction can be in one of the following states -

1)Active state:-The active state is the first state of every transaction. In this state, the transaction is being
executed.
For example: Insertion or deletion or updating a record is done here. But all the records are still not saved
to the database.
2)Partially committed:-In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
o For example, in a transaction that calculates a student's total marks, the final step that displays the
total marks is executed in this state.
3)Committed:-A transaction is said to be in a committed state if it executes all its operations successfully. In
this state, all the effects are now permanently saved on the database system.
4)Failed state:-If any of the checks made by the database recovery system fails, then the transaction is said
to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks,
then the transaction will fail to execute.
5)Aborted:-If any of the checks fail and the transaction has reached a failed state, then the database recovery
system makes sure that the database is in its previous consistent state. If it is not, the system aborts or rolls
back the transaction to bring the database into a consistent state.
o If the transaction fails midway, all the operations it has already executed are rolled back so that the
database returns to the consistent state it had before the transaction began.
o After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction
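The states above can be modelled as a small state machine. The sketch below is illustrative only: the names State, ALLOWED and is_valid_path are hypothetical, and the transition table is one reading of the descriptions above.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"

# Transitions allowed between states, following the descriptions above.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),   # terminal state
    State.ABORTED: set(),     # terminal state (restart means a new transaction)
}

def is_valid_path(path):
    """Check that a sequence of states is legal, starting from ACTIVE."""
    current = State.ACTIVE
    for nxt in path:
        if nxt not in ALLOWED[current]:
            return False
        current = nxt
    return True

success_path = [State.PARTIALLY_COMMITTED, State.COMMITTED]
failure_path = [State.PARTIALLY_COMMITTED, State.FAILED, State.ABORTED]
illegal_path = [State.COMMITTED]  # cannot commit without partially committing

print(is_valid_path(success_path), is_valid_path(failure_path), is_valid_path(illegal_path))
```

A restarted transaction is treated here as a fresh transaction that begins again in the active state.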

 Schedule:-A series of operations from one transaction to another transaction is known as a schedule. It
preserves the order of the operations within each individual transaction.

1. Serial Schedule:-The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction. In the serial schedule, when the first transaction completes its cycle,
then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no
interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
o In the given (a) figure, Schedule A shows the serial schedule where T1 is followed by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2 is followed by T1.
2. Non-serial Schedule
o If interleaving of operations is allowed, then there will be non-serial schedule.
o It contains many possible orders in which the system can execute the individual operations of the
transactions.
o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial schedules. It has
interleaving of operations.
3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow the transaction to
execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have interleaving of their
operations.
o A non-serial schedule will be serializable if its result is equal to the result of its transactions executed
serially.
Here,
Schedule A and Schedule B are serial schedule.
Schedule C and Schedule D are Non-serial schedule.

 Conflict Serializable Schedule


o A schedule is called conflict serializable if, after swapping non-conflicting operations, it can be
transformed into a serial schedule.
o A schedule is conflict serializable if it is conflict equivalent to a serial schedule.
Conflicting Operations:-
Two operations conflict if all of the following conditions hold:
1. They belong to separate transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.
Example:
Swapping is possible only if S1 and S2 are logically equal.

Here, S1 = S2. That means the swapped operations are non-conflicting.

Here, S1 ≠ S2. That means the swapped operations are conflicting.
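One common way to test conflict serializability is to build a precedence graph from the conflicting pairs and check it for cycles. That technique is not described in these notes, so the sketch below is an assumption-labelled illustration; the schedule format (transaction, operation, data item) and all function names are hypothetical.

```python
def conflict_edges(schedule):
    """Return precedence edges (Ti, Tj) for each conflicting pair, Ti first."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph if n not in done)

def is_conflict_serializable(schedule):
    return not has_cycle(conflict_edges(schedule))

# T1 finishes with A before T2 touches it: equivalent to the serial order T1, T2.
s_ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
# T2 writes A between T1's read and write: edges T1->T2 and T2->T1 form a cycle.
s_bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

print(is_conflict_serializable(s_ok), is_conflict_serializable(s_bad))
```

A schedule whose graph has no cycle can be reordered into a serial schedule by swapping only non-conflicting operations.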

 View Serializability/schedule:-
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:

1. Initial Read:-An initial read of both schedules must be the same. Suppose two schedule S1 and S2. In
schedule S1, if a transaction T1 is reading the data item A, then in S2, transaction T1 should also read A.

Above two schedules are view equivalent because Initial read operation in S1 is done by T1 and in S2 it is also
done by T1.
2. Updated Read:-In schedule S1, if Ti is reading A which is updated by Tj then in S2 also, Ti should read A
which is updated by Tj.
Above two schedules are not view equivalent because, in S1, T3 is reading A updated by T2 and in S2, T3 is
reading A updated by T1.
3. Final Write:-The final write must be the same in both schedules. In schedule S1, if a transaction
T1 performs the last update of A, then in S2 the final write on A should also be done by T1.

Above two schedules are view equivalent because the final write operation in S1 is done by T3 and in S2 the
final write operation is also done by T3.
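The three conditions can be checked mechanically. The following Python sketch is an illustration under assumptions: schedules are lists of (transaction, operation, data item) tuples, the helper names are hypothetical, and a read recorded with writer None stands for an initial read.

```python
def view_profile(schedule):
    """Summarise a schedule as (who reads from whom, final writer per item)."""
    last_writer = {}
    reads_from = []
    for txn, op, item in schedule:
        if op == "R":
            # None means the value was read before any write (initial read).
            reads_from.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return sorted(reads_from, key=repr), last_writer

def view_equivalent(s1, s2):
    """Same initial reads, same updated reads, and same final writes."""
    return view_profile(s1) == view_profile(s2)

# Non-serial schedule containing blind writes by T2 and T3.
s1 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
# Serial schedule T1, T2, T3.
serial_123 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A"), ("T3", "W", "A")]
# Serial schedule T2, T1, T3: here T1 reads the value written by T2.
serial_213 = [("T2", "W", "A"), ("T1", "R", "A"), ("T1", "W", "A"), ("T3", "W", "A")]

same_as_123 = view_equivalent(s1, serial_123)
same_as_213 = view_equivalent(s1, serial_213)
print(same_as_123, same_as_213)
```

Here s1 is view equivalent to the serial order T1, T2, T3, so it is view serializable even though it interleaves operations.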
 File Organization
o The File is a collection of records. Using the primary key, we can access the records. The type and
frequency of access can be determined by the type of file organization which was used for a given set
of records.
o File organization is a logical relationship among various records. This method defines how file records
are mapped onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of blocks, and
the blocks are placed on the storage medium.
o The first approach to map the database to files is to use several files and store records of only one
fixed length in any given file. An alternative approach is to structure our files so that they can hold
records of multiple lengths.
o Files of fixed length records are easier to implement than the files of variable length records.
 Objective of file organization
o It allows an optimal selection of records, i.e., records can be selected as fast as possible.
o Insert, delete or update transactions on the records should be quick and easy.
o Duplicate records should not be introduced as a result of insert, update or delete.
o For the minimal cost of storage, records should be stored efficiently.
Types of file organization:
File organization contains various methods. These particular methods have pros and cons on the basis of
access or selection. In the file organization, the programmer decides the best-suited file organization method
according to his requirement.
Types of file organization are as follows:

o Sequential file organization


o Heap file organization
o Hash file organization
o B+ file organization
o Indexed sequential access method (ISAM)
o Cluster file organization
1)Sequential File Organization:-This method is the easiest method for file organization. In this method, files
are stored sequentially. This method can be implemented in two ways:
1. Pile File Method:
o It is a quite simple method. In this method, we store the record in a sequence, i.e., one after another.
Here, the records are stored in the order in which they are inserted into the table.
o In case of updating or deleting of any record, the record will be searched in the memory blocks. When
it is found, then it will be marked for deleting, and the new record is inserted.
 Insertion of the new record:-Suppose we have four records R1, R3 and so on upto R9 and R8 in a
sequence. Here, a record is nothing but a row in a table. Suppose we want to insert a new record R2
in the sequence; it will be placed at the end of the file.

2. Sorted File Method:


o In this method, the new record is always inserted at the file's end, and then it will sort the sequence
in ascending or descending order. Sorting of records is based on any primary key or any other key.
o In the case of modification of any record, it will update the record and then sort the file, and lastly,
the updated record is placed in the right place.

 Insertion of the new record:-Suppose there is a preexisting sorted sequence of four records R1, R3 and
so on upto R6 and R7. Suppose a new record R2 has to be inserted in the sequence, then it will be
inserted at the end of the file, and then it will sort the sequence.

Pros of sequential file organization


o It is a fast and efficient method for huge amounts of data.
o In this method, files can be easily stored on cheaper storage media like magnetic tapes.
o It is simple in design. It requires little effort to store the data.
o This method is used for report generation or statistical calculations.
Cons of sequential file organization
o It wastes time because we cannot jump directly to the record that is required; we have to move
through the file sequentially.
o Sorted file method takes more time and space for sorting the records.

 2)Heap file organization


o It is the simplest and most basic type of organization. It works with data blocks. In heap file
organization, the records are inserted at the file's end. When the records are inserted, it doesn't
require the sorting and ordering of records.
o When the data block is full, the new record is stored in some other block. This new data block need not
be the very next data block; the DBMS can select any data block in the memory to store new records.
The heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the same size. It is the DBMS
responsibility to store and manage the new records.
Insertion of a new record
Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we want to insert a new record R2
in the heap. If data block 3 is full, then the record will be inserted in any data block selected by the DBMS,
let's say data block 1.

If we want to search, update or delete data in heap file organization, then we need to traverse the data
from the start of the file till we get the requested record.
If the database is very large then searching, updating or deleting of record will be time-consuming because
there is no sorting or ordering of records. In the heap file organization, we need to check all the data until we
get the requested record.
Pros of Heap file organization
o It is a very good method of file organization for bulk insertion. If there is a large number of data
which needs to load into the database at a time, then this method is best suited.
o In case of a small database, fetching and retrieving of records is faster than with sequential file organization.
Cons of Heap file organization
o This method is inefficient for the large database because it takes time to search or modify the record.

 3)Hash File Organization:-Hash File Organization uses the computation of hash function on some fields
of the records. The hash function's output determines the location of disk block where the records are to
be placed.

When a record has to be retrieved using the hash key columns, the address is generated, and the whole
record is retrieved using that address. In the same way, when a new record has to be inserted, the
address is generated using the hash key and the record is directly inserted. The same process is applied in the
case of delete and update.
In this method, there is no effort for searching and sorting the entire file. Each record is stored at the
location given by the hash function, so the records appear to be placed randomly in the memory.
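A minimal sketch of hash file organization in Python. The number of blocks, the modulo hash function and the record layout are all assumptions chosen for illustration; a real DBMS would also handle overflow when a block fills up.

```python
NUM_BLOCKS = 4  # hypothetical number of disk blocks (buckets)

def block_for(key):
    """Hash function: maps a hash key to a block number (simple modulo)."""
    return key % NUM_BLOCKS

blocks = [[] for _ in range(NUM_BLOCKS)]

def insert(record):
    """Place the record directly in the block chosen by the hash function."""
    blocks[block_for(record["id"])].append(record)

def find(key):
    """Only the one block given by the hash function is searched."""
    for record in blocks[block_for(key)]:
        if record["id"] == key:
            return record
    return None

def delete(key):
    """Locate the block through the hash function and remove the record."""
    block = blocks[block_for(key)]
    for i, record in enumerate(block):
        if record["id"] == key:
            del block[i]
            return True
    return False

for emp_id, name in [(11, "Sana"), (12, "Deb"), (13, "Raj"), (15, "Sonai")]:
    insert({"id": emp_id, "name": name})

found = find(13)
missing = find(99)
removed = delete(12)
print(found, missing, removed)
```

Insert, search and delete all begin by recomputing the same address, which is why no scan of the whole file is needed.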
 4)B+ File Organization
o B+ tree file organization is the advanced method of an indexed sequential access method. It uses a
tree-like structure to store records in File.
o It uses the same concept of key-index where the primary key is used to sort the records. For each
primary key, the value of the index is generated and mapped with the record.
o The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this
method, all the records are stored only at the leaf node. Intermediate nodes act as a pointer to the leaf
nodes. They do not contain any records.

The above B+ tree shows that:


o There is one root node of the tree, i.e., 25.
o There is an intermediary layer with nodes. They do not store the actual record. They have only
pointers to the leaf node.
o The nodes to the left of the root node contain the prior value of the root and nodes to the right contain
next value of the root, i.e., 15 and 30 respectively.
o There is one leaf level, whose nodes hold only the values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easier as all the leaf nodes are balanced.
o In this method, searching any record can be traversed through the single path and accessed easily.
Pros of B+ tree file organization
o In this method, searching becomes very easy as all the records are stored only in the leaf nodes and
sorted in a sequential linked list.
o Traversing through the tree structure is easier and faster.
o It is a balanced tree structure, and any insert/update/delete does not affect the performance of tree.
Cons of B+ tree file organization
o This method is inefficient for static tables, i.e., tables whose data rarely changes.

 5)Indexed sequential access method (ISAM)


ISAM method is an advanced sequential file organization. In this method, records are stored in the file using
the primary key. An index value is generated for each primary key and mapped with the record. This index
contains the address of the record in the file.
If any record has to be retrieved based on its index value, then the address of the data block is fetched and
the record is retrieved from the memory.
Pros of ISAM:
o In this method, since each record has the address of its data block, searching a record in a huge database is
quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is based on the
primary key values, we can retrieve the data for the given range of value. In the same way, the partial
value can also be easily searched, i.e., the student name starting with 'JA' can be easily searched.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to maintain the sequence.

 6)Cluster file organization


o When records of two or more tables are stored in the same file, it is known as a cluster. These files will have
two or more tables in the same data block, and the key attributes which are used to map these tables
together are stored only once.
o This method reduces the cost of searching for various records in different files.
o The cluster file organization is used when there is a frequent need for joining the tables with the same
condition. These joins will give only a few records from both tables. In the given example, we are
retrieving the record for only particular departments. This method can't be used to retrieve the record
for the entire department.

In this method, we can directly insert, update or delete any record. Data is sorted based on the key with
which searching is done. Cluster key is a type of key with which joining of the table is performed.
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:-In an indexed cluster, records are grouped based on the cluster key and stored together.
The above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here, all the
records are grouped based on the cluster key DEP_ID.
2. Hash Clusters:-It is similar to the indexed cluster. In hash cluster, instead of storing the records based on
the cluster key, we generate the value of the hash key for the cluster key and store the records with the same
hash key value.
Pros of Cluster file organization
o The cluster file organization is used when there is a frequent request for joining the tables with same
joining condition.
o It provides the efficient result when there is a 1:M mapping between the tables.
Cons of Cluster file organization
o This method has low performance for very large databases.
o This method is not suitable for a table with a 1:1 condition.
 Indexing in DBMS
o Indexing is used to optimize the performance of a database by minimizing the number of disk accesses
required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a database table
quickly.
Index structure:
Indexes can be created using some database columns.

o The first column of the index is the search key, which contains a copy of the primary key or candidate
key of the table. The values of the primary key are stored in sorted order so that the corresponding
data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers holding the
address of the disk block where the value of the particular key can be found.
Indexing Methods

Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known as ordered
indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. If
their IDs start with 1, 2, 3....and so on and we have to search for the employee with ID-543.
o In the case of a database with no index, we have to search the disk block from the start till it reaches
543. The DBMS will read the record after reading 543*10=5430 bytes.
o In the case of an index, assuming each index entry is 2 bytes long, we will search using indexes and the
DBMS will read the record after reading 542*2= 1084 bytes, which is far less than in the previous case.
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as primary
indexing. These primary keys are unique to each record and have a 1:1 relation with the records.
o As primary keys are stored in sorted order, the performance of the searching operation is quite
efficient.
o The primary index can be classified into two types: Dense index and Sparse index.
Dense index
o The dense index contains an index record for every search key value in the data file. This makes
searching faster.
o The number of records in the index table is the same as the number of records in the main table.
o It needs more space to store the index records themselves. Each index record holds the search key and a pointer
to the actual record on the disk.
Sparse index
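o In a sparse index, index records appear only for some of the items in the data file. Each index record points to a block.
o Instead of pointing to every record in the main table, the index points to records at regular gaps. To find a
record, the DBMS follows the nearest preceding index entry and then scans sequentially from there.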
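The difference between a dense and a sparse index can be sketched in Python as follows. The data file, block size, and names here are hypothetical and only illustrate the idea.

```python
from bisect import bisect_right

# Hypothetical data file: sorted records packed three per block.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry per record -> (block number, slot in block).
dense = {key: (b, s) for b, blk in enumerate(blocks) for s, key in enumerate(blk)}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [blk[0] for blk in blocks]


def sparse_lookup(key):
    """Find the block via the sparse index, then scan inside that block."""
    b = bisect_right(sparse_keys, key) - 1  # last entry whose key <= target
    if b < 0:
        return None
    for s, k in enumerate(blocks[b]):
        if k == key:
            return (b, s)
    return None


print(len(dense), len(sparse_keys))  # 9 3
print(dense[50], sparse_lookup(50))  # (1, 1) (1, 1)
```

The dense index has nine entries and the sparse index only three, but the sparse lookup has to scan inside the block it lands on.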
Clustering Index
o A clustering index is defined on an ordered data file. Sometimes the index is created on non-primary
key columns, which may not be unique for each record.
o In this case, to identify the records faster, we group two or more columns to get a unique value
and create an index out of them. This method is called a clustering index.
o Records that have similar characteristics are grouped together, and indexes are created for these groups.
Example: Suppose a company has several employees in each department. Suppose we use a clustering
index, where all employees who belong to the same Dept_ID are considered to be within a single cluster, and
index pointers point to the cluster as a whole. Here Dept_ID is a non-unique key.
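A small Python sketch of this grouping, assuming made-up employee records; a dictionary keyed by Dept_ID stands in for the clustering index.

```python
from collections import defaultdict

# Hypothetical employee records: (emp_id, name, dept_id).
employees = [
    (1, "Asha", 10),
    (2, "Ravi", 20),
    (3, "Meena", 10),
    (4, "John", 30),
    (5, "Sara", 20),
]

# Clustering index: one entry per Dept_ID, pointing to the whole cluster.
cluster_index = defaultdict(list)
for emp_id, name, dept_id in employees:
    cluster_index[dept_id].append(emp_id)

print(sorted(cluster_index))  # [10, 20, 30]
print(cluster_index[10])      # [1, 3]
```

There is one index entry per department rather than one per employee, which is why the key does not need to be unique.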
The previous scheme is a little confusing because one disk block is shared by records that belong to
different clusters. Using a separate disk block for each cluster is a better technique.
Secondary Index:-In sparse indexing, as the size of the table grows, the size of the mapping also grows.
These mappings are usually kept in primary memory so that address fetches are fast. The actual data is then
read from secondary memory using the address obtained from the mapping. If the mapping grows too large,
fetching the address itself becomes slower, and the sparse index is no longer efficient. To
overcome this problem, secondary indexing is introduced.
In secondary indexing, another level of indexing is introduced to reduce the size of the mapping. In this method,
large ranges of the column values are selected initially so that the first-level mapping stays small.
Each range is then further divided into smaller ranges. The first-level mapping is stored in primary
memory so that address fetches are fast. The second-level mapping and the actual data are stored in
secondary memory (hard disk).
For example:
o If you want to find the record with roll number 111 in the diagram, the search looks for the highest entry
that is smaller than or equal to 111 in the first-level index. It gets 100 at this level.
o Then, in the second-level index, it again looks for the highest entry that is smaller than or equal to 111
and gets 110. Using the address for 110, it goes to the data block and scans each record until it finds 111.
o This is how a search is performed in this method. Inserting, updating, or deleting is done in the
same manner.
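The two-level search for roll number 111 can be sketched in Python as below. The index contents and the helper names (floor_entry, find) are hypothetical; only the values 100, 110 and 111 come from the example.

```python
from bisect import bisect_right

# Hypothetical two-level index. The first level (kept in primary memory)
# maps the start of a wide range to a second-level index (on disk).
first_level = [1, 100, 200]
second_level = {
    1: [1, 11, 21],
    100: [100, 110, 120],
    200: [200, 210, 220],
}
# Data blocks keyed by the first roll number they hold (only two shown).
data_blocks = {100: [100, 101, 102], 110: [110, 111, 112, 113]}


def floor_entry(sorted_keys, target):
    """Return the highest entry that is smaller than or equal to target."""
    pos = bisect_right(sorted_keys, target) - 1
    return sorted_keys[pos] if pos >= 0 else None


def find(roll):
    top = floor_entry(first_level, roll)                # 111 -> 100
    if top is None:
        return None
    block_start = floor_entry(second_level[top], roll)  # 111 -> 110
    # Sequentially scan the data block until the roll number is found.
    for record in data_blocks.get(block_start, []):
        if record == roll:
            return (top, block_start, record)
    return None


print(find(111))  # (100, 110, 111)
```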
 B+ Tree
o The B+ tree is a balanced multi-way search tree (each node can have more than two children, so it is not a
binary tree). It follows a multi-level index format.
o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf nodes remain at the
same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random
access as well as sequential access.
 Structure of B+ Tree
o In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree is of the order n
where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.
Internal node
o An internal node of the B+ tree, except the root node, must contain at least n/2 pointers to child nodes.
o At most, an internal node of the tree contains n pointers.
Leaf node
o A leaf node of the B+ tree must contain at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.
 Searching a record in B+ Tree:-Suppose we have to search for 55 in the below B+ tree structure. First, we
look at the intermediary node, which directs us to the leaf node that can contain a record for 55.
In the intermediary node, 55 falls on the branch between the keys 50 and 75, so we are
redirected to the third leaf node. There the DBMS performs a sequential search to find 55.
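The search for 55 can be sketched in Python as below. The intermediary keys 25, 50, 75 and the leaf contents are assumptions made for illustration, since the figure is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical B+ tree with one intermediary node and four leaf nodes.
intermediary_keys = [25, 50, 75]
leaves = [
    [5, 15, 20],
    [25, 30, 35],
    [50, 55, 65, 70],
    [75, 80, 85],
]


def search(key):
    """Return (leaf_number, position) for key, or None if it is absent."""
    # Choose the branch: keys >= 50 and < 75 go to the third leaf (index 2).
    leaf_no = bisect_right(intermediary_keys, key)
    # Sequential search inside the chosen leaf node.
    for pos, k in enumerate(leaves[leaf_no]):
        if k == key:
            return (leaf_no, pos)
    return None


print(search(55))  # (2, 1)
```

Leaf numbers are counted from 0 in the code, so index 2 is the third leaf node.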
 B+ Tree Insertion:-Suppose we want to insert a record 60 in the below structure. It belongs in the 3rd
leaf node, after 55. The tree is balanced and this leaf node is already full, so we cannot simply insert 60
there.
In this case, we have to split the leaf node so that 60 can be inserted into the tree without affecting the fill factor,
balance and order.
After adding 60, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and the key in its parent node is 50.
We split the leaf node in the middle so that the balance of the tree is not altered. So we can group (50, 55) and
(60, 65, 70) into 2 leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to it,
and then we can have a pointer to the new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy to find the
node where it fits and then place it in that leaf node.
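The split described above can be sketched in Python as below. The function name split_leaf and the intermediate keys 25, 50, 75 are assumptions for illustration; the split point is chosen so that it reproduces the grouping (50, 55) and (60, 65, 70).

```python
def split_leaf(keys):
    """Split an overfull, sorted leaf in the middle.

    Returns (left_leaf, right_leaf, separator), where separator is the key
    that is copied up into the intermediate node.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


# The 3rd leaf after 60 has been placed in sorted position.
leaf = sorted([50, 55, 65, 70] + [60])
left, right, separator = split_leaf(leaf)

# Hypothetical intermediate node keys before the split (assumed 25, 50, 75).
intermediate = sorted([25, 50, 75] + [separator])

print(left, right, separator)  # [50, 55] [60, 65, 70] 60
print(intermediate)            # [25, 50, 60, 75]
```

In a B+ tree the separator key stays in the new right leaf and is also copied into the parent, which is why 60 appears in both places.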
 B+ Tree Deletion:-Suppose we want to delete 60 from the above example. In this case, we have to
remove 60 from the intermediate node as well as from the 4th leaf node. If we simply remove it from the
intermediate node, the tree will no longer satisfy the rules of the B+ tree, so we need to rearrange the nodes to
keep the tree balanced.
After deleting 60 from the above B+ tree and rearranging the nodes, it will show as follows:
Basis of Comparison | B tree | B+ tree
Pointers | All internal and leaf nodes have data pointers. | Only leaf nodes have data pointers.
Search | Since not all keys are available at the leaves, search often takes more time. | All keys are at leaf nodes, hence search is faster and more accurate.
Redundant Keys | No duplicate keys are maintained in the tree. | Duplicate keys are maintained and all keys are present at the leaves.
Insertion | Insertion takes more time and is sometimes not predictable. | Insertion is easier and the results are always the same.
Deletion | Deletion of an internal node is very complex and the tree has to undergo a lot of transformations. | Deletion of any key is easy because all keys are found at the leaves.
Leaf Nodes | Leaf nodes are not stored as a structural linked list. | Leaf nodes are stored as a structural linked list.
Access | Sequential access to nodes is not possible. | Sequential access is possible, just like a linked list.
Height | For a particular number of nodes, the height is larger. | Height is less than a B tree for the same number of nodes.
Application | B-Trees are used in databases and search engines. | B+ Trees are used in multilevel indexing and database indexing.
Number of Nodes | Number of nodes at any intermediary level ‘l’ is 2^l. | Each intermediary node can have n/2 to n children.

S.NO | B-tree | Binary tree
1. | In a B-tree, a node can have a maximum of ‘M’ child nodes (‘M’ is the order of the tree). | In a binary tree, a node can have a maximum of two child nodes or sub-trees.
2. | A B-tree is called a sorted tree, as its keys come out in sorted order in an inorder traversal. | A binary tree is not necessarily a sorted tree. It can be traversed in inorder, preorder, or postorder.
3. | A B-tree has a height of log_M(N) (where ‘M’ is the order of the tree and N is the number of nodes). | A binary tree has a height of log_2(N) (where N is the number of nodes).
4. | A B-tree is used when the data is stored on disk. | A binary tree is used when the data is loaded in RAM (faster memory).
5. | A B-tree is used in DBMS (code indexing, etc). | A binary tree is used in Huffman coding, code optimization, and many other areas.
6. | Inserting data or a key into a B-tree is more complicated than in a binary tree. | In a binary tree, data insertion is simpler than in a B-tree.
Definition of B-tree
A B-tree in DBMS is a self-balancing m-way search tree. Because of their balanced structure, such trees are
frequently used to manage and organise enormous databases and to speed up searches. In a B-tree of order m, each
node can have a maximum of m child nodes. In DBMS, the B-tree is an example of multilevel indexing. Leaf nodes and
internal nodes both hold record references. A B-tree is called a balanced sorted tree because all the leaf nodes
are at the same level.

Properties of B-tree
 All leaves are at the same level.
 B-Tree is defined by the term minimum degree ‘t‘. The value of ‘t‘ depends upon disk block size.
 Every node except the root must contain at least t-1 keys. The root may contain a minimum of 1 key.
 All nodes (including root) may contain at most (2*t – 1) keys.
 Number of children of a node is equal to the number of keys in it plus 1.
 All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains all
keys in the range from k1 to k2.
 B-Tree grows and shrinks from the root which is unlike Binary Search Tree. Binary Search Trees
grow downward and also shrink from downward.
 Like other balanced Binary Search Trees, the time complexity to search, insert, and delete is O(log
n).
 Insertion of a Node in B-Tree happens only at Leaf Node.
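The key-count rules above can be checked with a short Python sketch. The function names key_bounds and is_valid_node are hypothetical, and the check looks at a single node only.

```python
def key_bounds(t, is_root=False):
    """Return (min_keys, max_keys) for a node in a B-tree of minimum degree t."""
    min_keys = 1 if is_root else t - 1
    max_keys = 2 * t - 1
    return min_keys, max_keys


def is_valid_node(keys, num_children, t, is_root=False):
    """Check one node against the key-count, ordering, and child-count rules."""
    lo, hi = key_bounds(t, is_root)
    if not lo <= len(keys) <= hi:
        return False
    if list(keys) != sorted(keys):
        return False
    # A leaf has no children; an internal node has len(keys) + 1 children.
    return num_children in (0, len(keys) + 1)


print(key_bounds(3))                                 # (2, 5)
print(is_valid_node([10, 20], 3, t=3))               # True
print(is_valid_node([10], 2, t=3))                   # False
print(is_valid_node([10], 2, t=3, is_root=True))     # True
```

With t = 3, a non-root node holding a single key is rejected, while the same node is accepted as a root.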

Need of B-tree
 For fast searching, the height of the tree must stay small. Therefore, we want the tree to be
as short as possible in height.
 Use of B-tree in DBMS, which has more branches and hence shorter height, is the solution to this
problem. Access time decreases as branching grows and depth shrinks.
 Hence, use of B-tree is needed for storing data as searching and accessing time is decreased.
 The cost of accessing the disc is high when searching tables. Therefore, minimising disc access is our
goal.
 So to decrease time and cost, we use B-tree for storing data as it makes the Index Fast.
Interesting Facts about B-Trees:
 The minimum height of a B-Tree that holds n keys, where m is the
maximum number of children a node can have, is:
$h_{min} = \lceil \log_m (n + 1) \rceil - 1$
 The maximum height of a B-Tree that holds n keys, where t is the
minimum number of children that a non-root node can have, is:
$h_{max} = \lfloor \log_t \frac{n + 1}{2} \rfloor$ and $t = \lceil \frac{m}{2} \rceil$
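These height bounds follow from counting keys: a B-tree of height h with at most m children per node holds at most m^(h+1) - 1 keys, and one with at least t children per non-root node holds at least 2*t^h - 1 keys. A minimal Python sketch that finds the two heights with integer arithmetic (the function names are hypothetical, and n is taken as the number of keys, n >= 1):

```python
def min_height(n, m):
    """Smallest height h such that a tree with at most m children per node
    can hold n keys, i.e. m**(h + 1) - 1 >= n."""
    h = 0
    while m ** (h + 1) - 1 < n:
        h += 1
    return h


def max_height(n, t):
    """Largest height h such that a tree with at least t children per
    non-root node needs no more than n keys, i.e. 2 * t**h - 1 <= n."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


m = 4                # hypothetical maximum number of children per node
t = (m + 1) // 2     # ceil(m / 2), the minimum for a non-root node
print(min_height(15, m), max_height(15, t))  # 1 3
```

Integer loops are used instead of floating-point logarithms so that exact powers such as n + 1 = 16 are not affected by rounding.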
Traversal in B-Tree:
Traversal is similar to the inorder traversal of a binary tree. We start from the leftmost child and recursively
print it, then print the first key, and repeat the same process for the remaining children and keys. In the end,
we recursively print the rightmost child.
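A minimal Python sketch of this traversal. The BTreeNode class and the sample tree are assumptions made for illustration, not a full B-tree implementation.

```python
class BTreeNode:
    """Minimal B-tree node: sorted keys plus len(keys) + 1 children (or none)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def traverse(node, out):
    """Append all keys of the subtree rooted at node to out, in sorted order."""
    for i, key in enumerate(node.keys):
        if node.children:               # visit the child left of this key
            traverse(node.children[i], out)
        out.append(key)
    if node.children:                   # finally, the rightmost child
        traverse(node.children[-1], out)
    return out


# Hypothetical tree used only for illustration.
root = BTreeNode([30, 60], [
    BTreeNode([10, 20]),
    BTreeNode([40, 50]),
    BTreeNode([70, 80, 90]),
])
print(traverse(root, []))  # [10, 20, 30, 40, 50, 60, 70, 80, 90]
```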

Search Operation in B-Tree:


Search is similar to the search in a Binary Search Tree. Let the key to be searched be k.
 Start from the root and recursively traverse down.
 For every visited non-leaf node,
 If the node has the key, we simply return the node.
 Otherwise, we recur down to the appropriate child (the child which is just before the first
greater key) of the node.
 If we reach a leaf node and don’t find k in the leaf node, then return NULL.
Searching a B-Tree is similar to searching a binary search tree, and the algorithm is recursive.
At each level, the search is narrowed: if the key does not fall within the range covered by one branch, it can only
be in another branch. Because the keys in a node limit the search in this way, they are also known as limiting values or
separation values. If we reach a leaf node and don’t find the desired key, the search returns NULL.
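The search steps can be sketched in Python as below. The BTreeNode class and the sample tree (root key 100, with 48 as the leftmost key of the left child) are assumptions for illustration; None plays the role of NULL.

```python
class BTreeNode:
    """Minimal B-tree node; an empty children list means the node is a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(node, k):
    """Return the node that contains key k, or None if k is not in the tree."""
    i = 0
    # Skip keys smaller than k; stop at the first key >= k.
    while i < len(node.keys) and k > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == k:
        return node
    if not node.children:                # reached a leaf without finding k
        return None
    # Child i is the one just before the first greater key.
    return search(node.children[i], k)


# Hypothetical tree used only for illustration.
root = BTreeNode([100], [
    BTreeNode([48, 60, 75]),
    BTreeNode([120, 150]),
])
found = search(root, 48)
print(found.keys)         # [48, 60, 75]
print(search(root, 55))   # None
```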

How Database B-Tree Indexing Works


 When a B-tree is used for database indexing, it becomes a little more complex because each entry has both a key
and a value. The value serves as a reference to the particular data record. Together, the key and value are
called a payload.
 To index data by key and value, the database first constructs a unique random index or
a primary key for each of the supplied records. The keys and record byte streams are then all stored
on a B+ tree. The random index that is generated is used for indexing of the data.
 This indexing helps to decrease the searching time of data. In a B+ tree, all the data is stored on the
leaf nodes, so to access a particular data index, the database can use binary search on the
leaf nodes, as the data is stored in sorted order.
 If indexing is not used, the database reads each and every record to locate the requested record, which
increases the time and cost of searching. B-tree indexing is therefore very efficient.
How Searching Happens in Indexed Database?
The database does a search in the B-tree for a given key and returns the index in O(log(n)) time. The record
is then obtained by running a second B+tree search in O(log(n)) time using the discovered index. So overall
approx time taken for searching a record in a B-tree in DBMS Indexed databases is O(log(n)).
Examples of B-Tree
Suppose there are some numbers that need to be stored in a database. If we store them in a B-tree in DBMS,
they will be stored in sorted order so that the searching time can be logarithmic.
Example:
The above data is stored in sorted order according to the values. If we want to search for the node containing
the value 48, the following steps are applied:
 First, the parent node with key 100 is checked. As 48 is less than 100, the left child
node of 100 is checked.
 In the left child, there are 3 keys, so the search checks from the leftmost key, as the data is stored in sorted
order.
 The leftmost key is 48, which matches the element being searched for, so that is how we find
the element we wanted.
Applications of B-Trees:
 It is used in large databases to access data stored on the disk
 Searching for data in a data set can be achieved in significantly less time using the B-Tree
 With the indexing feature, multilevel indexing can be achieved.
 Most of the servers also use the B-tree approach.
 B-Trees are used in CAD systems to organize and search geometric data.
 B-Trees are also used in other areas such as natural language processing, computer networks, and
cryptography.
Advantages of B-Trees:
 B-Trees have a guaranteed time complexity of O(log n) for basic operations like insertion, deletion,
and searching, which makes them suitable for large data sets and real-time applications.
 B-Trees are self-balancing.
 High-concurrency and high-throughput.
 Efficient storage utilization.
Disadvantages of B-Trees:
 B-Trees are designed for disk-based storage and can have high disk usage.
 They are not the best choice for all cases.
 For small data sets that fit in memory, they can be slower than simpler data structures.

Time Complexity of B-Tree:
Sr. No. | Algorithm | Time Complexity
1. | Search | O(log n)
2. | Insert | O(log n)
3. | Delete | O(log n)
