DBMS
Data: Data is any known fact, or the smallest piece of information, that can be recorded and has implicit
meaning.
Eg :- Sanjana, BSC(CS), DCA, 2004.
Why do we need data?
Ans- To derive some information from it.
Information :- When data is processed, organized, structured or presented in a given context to make it
more useful, it is called information.
Data                                                Information
1) Data is a collection of raw facts and figures.   1) Information is processed data.
2) Data is not arranged.                            2) Information is arranged.
3) Data is unorganized.                             3) Information is organized.
4) Data does not depend on information.             4) Information depends on data.
5) Data is low-level knowledge.                     5) Information is the second level of knowledge.
Data Base:- It is a collection of related data. Here, related data means that if we are collecting
information about an employee, every item should be related to that employee, and the database should
hold the collection of this employee data.
Eg: a collection of related data
Name      Age   Designation    Salary
Sanjana   20    Clerk          19000
Sana      23    Data Analyst   50000
:
:
Sara      23    Data Analyst   80000
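The step from data to information can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the table name `employee` and the choice of "average salary per designation" as the derived information are assumptions, while the rows come from the example above.

```python
import sqlite3

# In-memory database holding the employee rows from the example (raw data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT, age INTEGER, designation TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("Sanjana", 20, "Clerk", 19000),
        ("Sana", 23, "Data Analyst", 50000),
        ("Sara", 23, "Data Analyst", 80000),
    ],
)

# Processing the raw data in a given context (average salary per designation)
# turns it into information.
avg_salary = dict(
    conn.execute(
        "SELECT designation, AVG(salary) FROM employee GROUP BY designation"
    ).fetchall()
)
print(avg_salary)
```

The individual rows are data; the per-designation averages are information derived from them.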
Data Base System:- It is a system that uses database technology to store a large amount of dynamic,
associated data in an organized way, with the help of hardware, software (DBMS) and the OS.
A database system is composed of 5 major parts - hardware, software (DBMS), people, procedures and
data.
Data Base Management System:- It is a set of software programs that allows users to create, edit and
update data in database files, and to store and retrieve data from those database files.
Example- Oracle, MS SQL Server, MySQL, DB2 (IBM)
A database is a collection of connected information about people, locations or things.
A DBMS is a collection of programs that allow you to create, manage and operate a database.
A Data Base Management System (DBMS) is a collection of interrelated data and a set of programs to
access those data.
A DBMS is used to organize the data in the form of tables, schemas, views, reports, etc.
The primary goal of a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.
A DBMS can also be defined as an interface between the application program and the OS to access and
manipulate the database.
A database management system is software which is used to manage the database.
Example- MySQL, Oracle, etc. are very popular database systems which are used in different applications.
Characteristics of DBMS :-
1) Self-describing nature of a database system (catalog).
2) It can provide a clear and logical view of the process that manipulates data.
3) A DBMS contains automatic backup and recovery procedures.
4) It can reduce the complex relationships between data.
5) It is used to provide security of data.
Application of DBMS:-
1) Banking : For maintaining customer information,accounts ,loans and banking transactions
2) Universities : For maintaining student records, course registration and grades.
3) Railway Reservation : For checking the availability of reservations in different trains, tickets, etc.
4) Airlines : For reservation and schedule information.
5) Telecommunication : For keeping records of calls made, generating monthly bills, etc.
6) Finance : For storing information about holdings, sales and purchases of financial instruments.
7) Sales : For customer, product and purchase information.
Advantage of DBMS
1) Control of data redundancy :- It controls data redundancy because all the data is stored in one single
database, so the same data does not need to be recorded in several places.
2) Data sharing :- In a DBMS, the authorized users of an organization can share the data among multiple users.
3) Easy maintenance :- It is easy to maintain due to the centralized nature of the database system.
4) Reduced time :- It reduces development time and maintenance needs.
5) Backup :- It provides backup and recovery subsystems which create automatic backups of data to protect
against hardware and software failures, and restore the data if required.
6) Multiple user interfaces :- It provides different types of user interfaces, like graphical user
interfaces and application program interfaces.
Disadvantage of DBMS:
1) Cost of hardware and software : It requires a high-speed data processor and a large memory size to run
DBMS software.
2) Size : It occupies a large amount of disk space and memory to run efficiently.
3) Complexity : A database system creates additional complexity and requirements.
Disadvantage of File System :-
1) Data redundancy and inconsistency
2) Difficulty in accessing data
3) Data isolation
4) Integrity problem
5) Atomicity problem
6) Concurrent access anomalies
7) Security problem
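The atomicity problem listed above is what DBMS transactions address: a group of changes either all happen or none do. A minimal sketch with Python's built-in sqlite3 module follows; the `account` table, its balances and the `transfer` helper are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )


try:
    # Account 2 only holds 50, so the debit violates the CHECK constraint.
    transfer(conn, src=2, dst=1, amount=80)
except sqlite3.IntegrityError:
    pass  # the credit already applied to account 1 is rolled back as well

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)
```

With plain files, a crash or error between the two updates could leave the money credited but never debited; here both balances stay as they were.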
Type of databases :-There are various types of databases used for storing different varieties of data.
1) Centralized Database:- It is the type of database that stores data in one centralized database system. It
allows users to access the stored data from different locations through several applications. These
applications contain an authentication process to let users access data securely.
An example of a centralized database is a central library that carries a central database of each library
in a college/university.
2) Distributed Database:- In distributed systems, data is distributed among different database systems of an
organization. These database systems are connected via communication links. Such links help the end-users
to access the data easily.
Examples of the Distributed database are Apache Cassandra, HBase, Ignite, etc.
It is divided into two subtypes: Homogeneous DDB and Heterogeneous DDB.
o Homogeneous DDB: Those database systems which execute on the same operating system, use the
same application process and run on the same hardware devices.
o Heterogeneous DDB: Those database systems which execute on different operating systems under
different application procedures, and run on different hardware devices.
Advantages of Distributed Database
o Modular development is possible in a distributed database, i.e., the system can be expanded by
including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.
3) Relational Database:- It stores data in the form of rows (tuples) and columns (attributes), which together
form a table (relation). A relational database uses SQL for storing, manipulating and maintaining the
data. E.F. Codd proposed the relational model in 1970. Each table in the database carries a key that makes
each row unique from the others.
Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
4) NoSQL Database (stores data without a fixed structure):- Non-SQL/Not Only SQL is a type of database that
is used for storing a wide range of data sets. It is not a relational database, as it stores data not only in
tabular form but in several different ways.
It is also divided into 4 subtypes-
a. Key-value storage
b. Document-oriented Database
c. Graph Databases
d. Wide-column stores
Advantages of NoSQL Database
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.
5) Cloud Database:- A type of database where data is stored in a virtual environment and runs on a
cloud computing platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for
accessing the database. There are numerous cloud platforms; some commonly listed options are:
o Amazon Web Services(AWS)
o Microsoft Azure
o ScienceSoft
o Google Cloud SQL, etc
6)Object-oriented Databases:The type of database that uses the object-based data model approach for
storing data in the database system. The data is represented and stored as objects which are similar to the
objects used in the object-oriented programming language.
7) Hierarchical Databases:It is the type of database that stores data in the form of parent-children
relationship nodes. Here, it organizes data in a tree-like structure.
Data get stored in the form of records that are connected via links. Each child record in the tree will contain
only one parent. On the other hand, each parent record can have multiple child records.
8)Network Databases:It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them. Unlike the hierarchical
database, it allows each record to have multiple children and parent nodes to form a generalized graph
structure.
table/Relation :Everything in a relational database is stored in the form of relations. The RDBMS
database uses tables to store data. A table is a collection of related data entries and contains rows and
columns to store data.
Properties of a Relation:
o Each relation has a unique name by which it is identified in the database.
o Relation does not contain duplicate tuples.
o The tuples of a relation have no specific order.
o All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.
row or record: A row of a table is also called a record or tuple. It contains the specific information of
each entry in the table. It is a horizontal entity in the table
Properties of a row:
o No two tuples are identical to each other in all their entries.
o All tuples of the relation have the same format and the same number of entries.
o The order of the tuple is irrelevant. They are identified by their content, not by their position.
column/attribute/fields :A column is a vertical entity in the table which contains all information
associated with a specific field in a table.
Properties of an Attribute:
o Every attribute of a relation must have a name.
o Null values are permitted for the attributes.
o Default values can be specified for an attribute automatically inserted if no other value is specified for
an attribute.
o Attributes that uniquely identify each tuple of a relation are the primary key.
data item/Cells:- The smallest unit of data in the table is the individual data item. It is stored at the
intersection of tuples and attributes.
Properties of data items:
1) Data items are atomic.
2) The data items for an attribute should be drawn from the same domain.
In the example below, the data items in the student table include Debraj, 20 and BSC.
ID Name AGE COURSE
1 Debraj 20 BSC
Degree:The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
ID Name AGE COURSE
1 Sara 24 B.tech
2 Sana 20 C.A
3 Deb 20 BCA
4 Raj 22 MCA
5 Debraj 20 BSC
Cardinality:The total number of tuples at any one time in a relation is known as the table's cardinality.
The relation whose cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
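Degree and cardinality can be read directly from a table. The sketch below loads the student table above into an in-memory SQLite database (the lowercase column names are an assumption) and counts columns and rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Sara", 24, "B.tech"),
        (2, "Sana", 20, "C.A"),
        (3, "Deb", 20, "BCA"),
        (4, "Raj", 22, "MCA"),
        (5, "Debraj", 20, "BSC"),
    ],
)

cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows) right now
print(degree, cardinality)
```

The degree changes only when the table definition changes, while the cardinality changes with every insert or delete.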
Domain:The domain refers to the possible values each attribute can contain. It can be specified using
standard data types such as integers, floating numbers, etc. For example, An attribute entitled
Marital_Status may be limited to married or unmarried values.
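One common way to enforce a domain is a CHECK constraint. The following is a minimal sketch with sqlite3; the `person` table and the sample names are hypothetical, and only the Marital_Status domain comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE person (
           name TEXT NOT NULL,
           marital_status TEXT CHECK (marital_status IN ('married', 'unmarried'))
       )"""
)
conn.execute("INSERT INTO person VALUES ('Sana', 'married')")  # inside the domain

try:
    conn.execute("INSERT INTO person VALUES ('Raj', 'unknown')")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(rejected, row_count)
```

The value outside the domain is refused by the database itself, so every application sharing the table sees the same rule.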
Codd’s Rules in RDBMS :- Dr. E.F. Codd was an IBM researcher who first developed the relational data
model in 1970. In 1985, Dr. Codd published a list of 12 rules that define an ideal relational database
and have provided a guideline for the design of all relational database systems.
Rule 1: The Information Rule: This rule simply requires that all data be presented in table form. This
is the basis of the relational model.
Rule 2: The Guaranteed Access Rule :Each data element is guaranteed to be accessible logically with a
combination of the table name, primary key (row value), and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values:Every Null value in a database must be given a systematic
and uniform treatment.
Rule 4: Active Online Catalog Rule:The database catalog, which contains metadata about the database, must
be stored and accessed using the same relational database management system.
Rule 5: The Comprehensive Data Sub-language Rule: The database system must support at least one
well-defined language that can be used for defining, querying and modifying information within the
database.
Rule 6: The View Updating Rule: All views that are theoretically updatable must also be updatable by the
system.
Rule 7: High-level Insert, Update, and Delete:- The database system must support high-level
insertions, updates and deletions, so that users can apply these operations to many rows through a single
query.
Rule 8: Physical Data Independence:Application programs and activities should remain unaffected when
changes are made to the physical storage structures or methods.
Rule 9: Logical Data Independence :Application programs and activities should remain unaffected when
changes are made to the logical structure of the data, such as adding or modifying tables.
Rule 10: Integrity Independence:Integrity constraints should be specified separately from application
programs and stored in the catalog. They should be automatically enforced by the database system.
Rule 11: Distribution Independence:The distribution of data across multiple locations should be invisible to
users, and the database system should handle the distribution transparently.
Rule 12: Non-Subversion Rule:If the interface of the system is providing access to low-level records, then
the interface must not be able to damage the system and bypass security and integrity constraints.
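Two of these rules are easy to try out. The sketch below uses SQLite through Python's sqlite3 module as one example system (an assumption, not a claim that SQLite satisfies all 12 rules); `sqlite_master` is SQLite's built-in catalog table, and the `student` table is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.execute("INSERT INTO student VALUES (5, 'Debraj', 'BSC')")

# Rule 2 (guaranteed access): table name + primary key value + column name
# is enough to reach exactly one data item.
course = conn.execute(
    "SELECT course FROM student WHERE id = ?", (5,)
).fetchone()[0]

# Rule 4 (active online catalog): the metadata is queried with the same SQL
# that is used for ordinary data.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(course, catalog)
```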
Key                File System                                                  DBMS
Query processing   There is no efficient query processing in the file system.   Efficient query processing is there in DBMS.
User access        Only one user can access data at a time.                     Multiple users can access data at a time.
Application architecture of DBMS :-
o The DBMS design depends upon its architecture. The basic client/server architecture is used to deal
with a large number of PCs, web servers, database servers and other components that are connected
with networks.
o DBMS architecture depends upon how users are connected to the database to get their requests done.
Types of DBMS Architecture - 1-tier architecture, 2-tier architecture and 3-tier architecture
Database architecture can be seen as single-tier or multi-tier. But logically, database architecture is of
two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can
work directly on the DBMS and use it.
o Any changes done here will directly be done on the database itself. It doesn't provide a handy tool
for end users.
o The 1-tier architecture is used for development of local applications, where programmers can
directly communicate with the database for a quick response.
2-Tier Architecture
o The 2-tier architecture is the same as basic client-server. In the two-tier architecture, applications on the
client end can directly communicate with the database at the server side. For this interaction, APIs
like ODBC and JDBC are used.
o The user interfaces and application programs are run on the client side.
o The server side is responsible for providing functionalities like query processing and transaction
management.
Schema:- The schema defines the tables, the attributes along with their size and type, and the relationships
between attributes (columns) and tables. The overall design of the data is called the database schema.
Database Instance:- A database changes over time as information is inserted and deleted. The
collection of information stored in the database at a particular moment is called a database instance.
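The difference between schema and instance can be observed directly. In this sketch (sqlite3, with an illustrative `student` table), the helper functions `schema` and `instance` are hypothetical names: the first reads the table design, the second reads the rows present at that moment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")


def schema(conn):
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance(conn):
    # The rows stored at this particular moment.
    return conn.execute("SELECT * FROM student ORDER BY id").fetchall()


schema_before, instance_before = schema(conn), instance(conn)

conn.execute("INSERT INTO student VALUES (1, 'Sara', 24)")
conn.execute("INSERT INTO student VALUES (2, 'Sana', 20)")
conn.execute("DELETE FROM student WHERE id = 1")

schema_after, instance_after = schema(conn), instance(conn)
print(schema_after, instance_after)
```

Inserts and deletes change the instance, while the schema stays the same until the design itself is altered.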
Database
1. Internal Level/Physical level :-The internal level has an internal schema which describes the physical
storage structure of the database.
2. Conceptual Level :The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o In the conceptual level, internal details such as an implementation of the data structure are hidden.
o Programmers and database administrators work at this level.
3. External Level/View Level/View Schema : At the external level, a database contains several schemas,
sometimes called sub-schemas. A sub-schema is used to describe a different view of the database.
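In SQL systems, the external level is usually realized with views. A small sketch with sqlite3 follows; the `employee` table and the view name `employee_directory` are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full logical structure of the employee data.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
    "designation TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Sanjana", "Clerk", 19000), (2, "Sana", "Data Analyst", 50000)],
)

# External level: one user group sees only names and designations, not salaries.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT name, designation FROM employee"
)
directory = conn.execute(
    "SELECT * FROM employee_directory ORDER BY name"
).fetchall()
print(directory)
```

Other user groups could be given other views over the same conceptual schema without changing how the data is stored.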
Data Model:- A data model is the modeling of the data description, data semantics and consistency
constraints of the data.
• A data model provides the conceptual tools for describing the design of a database at each level of data
abstraction.
• A data model can also be defined as a collection of high-level data description constructs that hide many
low-level storage details.
There are mainly three types of data model:
1) Object Based Data Model :- It is used to describe the data at the logical and view levels. Object based data
models provide flexible structuring capabilities and allow data constraints to be specified.
B) Object Oriented Data Model :- In an object oriented model, information or data is represented as an object,
and these objects store their values in instance variables. This model uses object oriented programming
concepts.
This model works with object oriented programming languages like Python, Java, etc. It was developed in the
1980s.
2) Record Based Data Model :- It is used to describe data at the logical and view levels.
• This data model is used to specify the overall logical structure of the database and to provide a
higher-level description of it.
There are three types of record based data model-
B) Network Data Model :- In the network data model, data is organized into a graph, and a node can have more
than one parent node. It permits the modeling of many-to-many relationships in data.
(Figure: network model with nodes Store, Order and Items)
C) Hierarchical Data Model :- The hierarchical data model organizes data in a tree structure.
In this model each entity has only one parent and may have many children. There is only one entity in this
model without a parent, which we call the root.
(Figure: hierarchical model with nodes College, Department and Information)
3) Physical Data Model :- This data model is used to describe the data at the lowest level.
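The structural difference between the hierarchical and network models described above can be sketched with plain Python dictionaries. The node names are taken from the figures, but the edges between them are assumptions made for illustration, as is the helper `path_to_root`.

```python
# Hierarchical model: every record stores exactly one parent (None for the root),
# so the data forms a tree. The edges below are assumed for illustration.
parent_of = {
    "College": None,
    "Department": "College",
    "Information": "College",
}


def path_to_root(node):
    """Follow the single parent link upward until the root is reached."""
    path = [node]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path


# Network model: a record may have several parents, so each node maps to a set
# and the data forms a general graph.
parents_of = {
    "Store": set(),
    "Order": {"Store"},
    "Items": {"Store", "Order"},
}

print(path_to_root("Department"), sorted(parents_of["Items"]))
```

In the tree there is always exactly one path from a record to the root; in the graph a record such as Items can be reached through more than one parent.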
DBMS Interfaces:- A database management system (DBMS) interface is a user interface that allows
users to input queries to a database without using the query language itself.
User-friendly interfaces provided by DBMS may include the following:
Menu-Based Interfaces
Forms-Based Interfaces
Graphical User Interfaces
Natural Language Interfaces
Speech Input and Output Interfaces
Interfaces for Parametric Users
Interfaces for the Database Administrator (DBA)
1) Menu-Based Interfaces:These interfaces present the user with lists of options (called menus) that lead
the user through the formation of a request. The basic advantage of using menus is that they remove the
tension of remembering specific commands and syntax of any query language.
2) Forms-Based Interfaces: A forms-based interface displays a form to each user. Users can fill out all of
the form entries to insert new data, or they can fill out only certain entries, in which case the DBMS will
retrieve the matching data for the remaining entries. Many DBMSs have form specification
languages, which are special languages that help specify such forms.
3) Graphical User Interface:A GUI typically displays a schema to the user in diagrammatic form. The user
then can specify a query by manipulating the diagram. In many cases, GUI utilize both menus and forms.
Most GUI use a pointing device such as a mouse, to pick a certain part of the displayed schema diagram.
4) Natural Language Interfaces:These interfaces accept requests written in English or some other
language and attempt to understand them. A Natural language interface has its own schema, which is
similar to the database conceptual schema .
5) Speech Input and Output Interfaces: The use of speech, whether for a query or for an answer to a
question or the result of a request, is still limited but is becoming commonplace. Applications with a limited
vocabulary, such as inquiries for telephone directory, flight arrival/departure and bank account
information, allow speech for input and output to enable ordinary people to access this information.
The speech input is detected using predefined words and used to set up the parameters that are supplied
to the queries. For output, a similar conversion from text or numbers into speech takes place.
6) Interfaces for Parametric Users: Interfaces for parametric users contain a small set of commands that can be
invoked with a minimum of keystrokes. They are generally used for operations that are performed repeatedly,
such as bank transactions for transferring money.
7) Interfaces for Database Administrators (DBA):- Most database systems contain privileged commands
that can be used only by the DBA’s staff. These include commands for creating accounts, setting system
parameters, etc.
Components of ER Diagram :-
1. Entity: It is a thing or object in the real world that is distinguishable from all other objects.
• Anything about which we store information is called an entity.
Entity Set :- It is a set of entities of the same type that share the same properties or attributes.
• An entity set is represented as a rectangle.
1) Weak Entity Set :- An entity that depends on another entity is called a weak entity. The weak entity doesn't
contain any key attribute of its own. The weak entity is represented by a double rectangle.
2) Strong Entity Set :- A strong entity set is an entity set that contains sufficient attributes to uniquely
identify all its entities.
• A primary key exists for a strong entity set.
• A single rectangle is used to represent a strong entity set.
Type of attributes :-
1) Simple attribute : An attribute that cannot be further subdivided into components is a simple
attribute. It is represented by an ellipse.
Example: The roll number of a student, the id number of an employee.
2) Composite attribute : An attribute that can be split into components is a composite attribute.
A composite attribute is represented by an ellipse, and the ellipses of its components are connected to it.
Example: First Name and Last Name are components of a composite attribute.
3) Multi-valued attribute : An attribute that can have more than one value is known as a
multi-valued attribute. A double ellipse is used to represent a multi-valued attribute.
Example: A student can have more than one phone number.
4) Derived attribute : An attribute that can be derived from other attributes is a derived attribute.
It is represented by a dashed ellipse.
Example: A person's age changes over time and can be derived from another attribute like date of birth.
5) Key attribute : The key attribute is used to represent the main characteristic of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.
Example: Student-ID.
6) Single-valued attribute : An attribute which takes only a single value for each entity instance is a
single-valued attribute.
Example: The age of a student.
7) Complex attribute : Attributes which are formed by nesting composite and multi-valued
attributes are called “complex attributes”. These attributes are rarely used in a DBMS (DataBase
Management System).
8) Stored attribute : Stored attributes are those attributes which do not require any further
update, since their values are stored directly in the database.
Example: DOB (date of birth) is a stored attribute.
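The attribute types above map naturally onto a small class. In this sketch the `Student` class, its field names and the sample values are all hypothetical; the point is that age is computed from the stored date of birth instead of being stored itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Student:
    student_id: int            # key attribute (simple, single-valued)
    first_name: str            # component of a composite name attribute
    last_name: str             # component of a composite name attribute
    phone_numbers: list        # multi-valued attribute
    date_of_birth: date        # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


# Sample values are made up for illustration.
s = Student(1, "Sana", "Roy", ["555-0100", "555-0101"], date(2004, 6, 15))
print(s.age(date(2024, 6, 14)), s.age(date(2024, 6, 15)))
```

Because age is derived on request, it never becomes stale the way a stored age column would.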
Relationship/Mapping construction :-
Relationship:-A relationship is used to describe the relation between entities. Diamond or rhombus is used
to represent the relationship.
a. One-to-One Relationship:- When only one instance of an entity on each side is associated with the
relationship, it is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship:- When only one instance of the entity on the left and more than one instance
of an entity on the right are associated with the relationship, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by only one specific scientist.
c. Many-to-one relationship: When more than one instance of the entity on the left and only one instance of
an entity on the right are associated with the relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship: When more than one instance of the entity on the left and more than one
instance of an entity on the right are associated with the relationship, it is known as a many-to-many
relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
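In a relational database, a many-to-many relationship such as employee/project is commonly stored in a separate junction table. The sketch below uses sqlite3; the table names, column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (employee, project) assignment.
    CREATE TABLE assignment (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Sara'), (2, 'Raj');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Billing');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One employee, many projects.
projects_of_sara = [
    row[0]
    for row in conn.execute(
        "SELECT p.title FROM project p "
        "JOIN assignment a ON a.proj_id = p.proj_id "
        "WHERE a.emp_id = 1 ORDER BY p.title"
    )
]
# One project, many employees.
members_of_payroll = [
    row[0]
    for row in conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN assignment a ON a.emp_id = e.emp_id "
        "WHERE a.proj_id = 10 ORDER BY e.name"
    )
]
print(projects_of_sara, members_of_payroll)
```

One-to-many and many-to-one relationships need no junction table: a single foreign key column on the "many" side is enough.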
Notation of E-R Diagram :-Database can be represented using the notations. In ER diagram, many
notations are used to express the cardinality. These notations are as follows:
Exercise: Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors.
Keys in DBMS .
Keys:-
o A key is a value which can always be used to uniquely identify an object instance.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish and
identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.
Types of keys:
1. Primary key
o A primary key is the candidate key that is chosen by the database designer
as the principal means of identifying entities within an entity set.
o It is a unique key.
o It can identify only one tuple (a record) at a time.
o It has no duplicate values; it has unique values.
o It cannot be NULL.
o A primary key does not have to be a single column; more than one column can together form the primary
key for a table.
2. Candidate key
o A candidate key is an attribute or minimal set of attributes that can uniquely identify a tuple.
o The primary key is chosen from the candidate keys. The remaining candidate
keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other uniquely identifying
attributes, like SSN, Passport_Number, License_Number, etc., are also candidate keys.
3. Super Key: A super key is a set of one or more attributes that, taken collectively, allow us to identify
uniquely an entity in the entity set.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two
employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a
key.
The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
4. Foreign key
o A foreign key is a column whose values are the same as the primary key of another table.
o It links two relations (tables).
o Foreign keys act as a cross-reference between the tables.
o A foreign key is the column of a table used to point to the primary key of another table.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in a
relation. These attributes or combinations of the attributes are called the candidate keys. One key is chosen
as the primary key from these candidate keys, and each remaining candidate key, if it exists, is termed an
alternate key. In other words, the total number of alternate keys is the total number of candidate keys
minus the primary key. The alternate key may or may not exist. If there is only one candidate key in a
relation, it does not have an alternate key.
For example, the employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In
this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the
alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also
known as a concatenated key.
For example- In a student table with attributes (s_roll_no, s_ID, s_name, s_branch)
Composite key- (s_roll_no, s_ID).
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys are created when a
primary key is large and complex and has no relationship with many other relations. The data values of the
artificial keys are usually numbered in a serial order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in the employee
relation. So it would be better to add a new virtual attribute to identify each tuple in the relation uniquely.
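The key types above correspond to SQL constraints. The following sketch uses sqlite3; the `employee` columns follow the Employee_Id/PAN_No example, while the `works_on` table, the sample rows and the `violates` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,       -- primary key
        pan_no      TEXT NOT NULL UNIQUE,      -- alternate key (unused candidate key)
        name        TEXT
    );
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(employee_id),  -- foreign key
        proj_id INTEGER,
        PRIMARY KEY (emp_id, proj_id)          -- composite key
    );
    INSERT INTO employee VALUES (1, 'PAN001', 'Sara');
    """
)


def violates(sql):
    """Return True if the statement is rejected by a key constraint."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_primary = violates("INSERT INTO employee VALUES (1, 'PAN002', 'Raj')")
duplicate_alternate = violates("INSERT INTO employee VALUES (2, 'PAN001', 'Raj')")
dangling_foreign = violates("INSERT INTO works_on VALUES (99, 10)")
valid_row = violates("INSERT INTO works_on VALUES (1, 10)")
duplicate_composite = violates("INSERT INTO works_on VALUES (1, 10)")
print(duplicate_primary, duplicate_alternate, dangling_foreign,
      valid_row, duplicate_composite)
```

Any set of columns that contains `employee_id` is a super key of `employee`; the constraints only need to be declared on the minimal ones.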
DBMS Generalization :-
o Generalization is like a bottom-up approach in which two or more entities of lower level combine to
form a higher level entity if they have some attributes in common.
o In generalization, an entity of a higher level can also combine with the entities of the lower level to
form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only difference is the approach.
Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
combined to make a superclass.
For example, Faculty and Student entities can be generalized and create a higher level entity Person
DBMS Specialization :-
o Specialization is a top-down approach, and it is opposite to Generalization. In specialization, one
higher level entity can be broken down into two lower level entities.
o Specialization is used to identify the subset of an entity set that shares some distinguishing
characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are defined next, and
relationship set are then added.
For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on what role they play in the company.
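(Figure: Person is the higher-level entity; Employee, with attribute Salary, and Customer, with attribute
Credit-rating, are lower-level entities linked to Person by "is a". Generalization reads the diagram
bottom-up and specialization reads it top-down.)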
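The Person, Employee and Customer hierarchy can be mirrored with class inheritance, which is one way (not the only one) to picture the superclass/subclass idea. The class layout and sample values below are assumptions for illustration.

```python
class Person:
    """Higher-level entity holding the attributes common to all lower-level entities."""

    def __init__(self, name):
        self.name = name


class Employee(Person):
    """Lower-level entity: a Person with a salary."""

    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary


class Customer(Person):
    """Lower-level entity: a Person with a credit rating."""

    def __init__(self, name, credit_rating):
        super().__init__(name)
        self.credit_rating = credit_rating


# Sample values are made up for illustration.
people = [Employee("Sara", 80000), Customer("Raj", "A")]
names = [p.name for p in people]  # the shared attribute lives in Person
print(names)
```

Designing `Person` after noticing what `Employee` and `Customer` share is generalization; starting from `Person` and splitting it by role is specialization.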
DBMS Aggregation :-
o Aggregation is a technique to express relationships among relationships.
o In basic E-R modeling we cannot express relationships among relationships. Thus we use the concept
of aggregation for this purpose.
o Aggregation is an abstraction through which relationships are treated as entities.
o In aggregation, the relation between two entities is treated as a single entity.
o In aggregation, a relationship with its corresponding entities is aggregated into a higher level entity.
For example: The Center entity offering the Course entity acts as a single entity in a relationship
with another entity, Visitor. In the real world, if a visitor visits a coaching center, he will
never enquire about the course only or just about the center; instead he will enquire about both.
DBMS Architecture/Structure/Component :- A database system is partitioned into modules that deal
with each of the responsibilities of the overall system.
The DBMS architecture is divided into 4 parts-
1) DBMS Users
2) Query Processor
3) Storage Manager/Processor
4) Disk Storage
1) DBMS Users: Database users are categorized based upon their interaction with the database.
A) Naive/End Users :- End users are unsophisticated users who don’t have any DBMS knowledge but
frequently use database applications in their daily life to get the desired results.
For example- Railway ticket booking users are naive users. A clerk in a bank is a naive user because he or she
does not have any DBMS knowledge but still uses the database to perform the given tasks.
B) Application Programmers :- Application programmers are the back-end programmers who write the code for
the application programs. They are computer professionals. These programs could be written in
programming languages such as .NET, C, C++, Java, etc.
C) Sophisticated Users :- Sophisticated users can be engineers, scientists or business analysts who are
familiar with the database. They can develop their own database applications according to their requirements.
They don’t write program code, but they interact with the database by writing SQL queries directly through the
query processor.
D) Database Administrator (DBA) :- The DBA is a person or team who defines the schema and also controls the
3 levels of the database.
• The DBA creates a new account id and password for a user who needs to access the database.
• The DBA is also responsible for providing security to the database, allowing only authorized users to
access/modify the database.
• The DBA monitors recovery and backup and provides technical support.
• The DBA has a DBA account in the DBMS, which is called a system or superuser account.
• The DBA repairs damage caused by hardware and/or software failures.
2)Query Processor:-It interprets the requests (queries) received from the end user via an application program
into instructions. It also executes the user request which is received from the DML compiler.
a) DDL Compiler : DDL statements are sent to the DDL compiler, which converts these statements into a set of
tables. These tables contain the metadata concerning the database and are in a form that can be used by
other components of the DBMS.
b) DML pre-compiler and Query Processor :-The DML pre-compiler converts the DML statements embedded
in an application program into normal procedure calls in the host language.
I) DDL interpreter :It processes the DDL statements into a set of tables containing metadata (data about
data).
II) DML Compiler:-It processes the DML statements into low-level instructions (machine language) so that
they can be executed.
III) Query Evaluation Engine :-It executes the low-level instructions generated by the DML compiler.
3)Storage Manager/processor:-The storage manager is a program that provides an interface between the data
stored in the database and the queries received. It is also known as the database control system. It maintains
the consistency and integrity of the database by applying the constraints, and it executes the DCL statements.
It is responsible for updating, storing, deleting and retrieving data in the database.
o A key attribute of the entity type represented by the primary key:-In the given ER diagram,
COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attribute of the entity.
o The multi valued attribute is represented by a separate table:-In the student table, a hobby is a
multi valued attribute. So it is not possible to represent multiple values in a single column of
STUDENT table. Hence we create a table STUD_HOBBY with column name STUDENT_ID and HOBBY.
Using both the column, we create a composite key.
o Derived attributes are not considered in the table:-In the STUDENT table, Age is the derived
attribute. It can be calculated at any point of time by calculating the difference between current date
and Date of Birth.
Using these rules, you can convert the ER diagram to tables and columns and assign the mapping
between the tables. Table structure for the given ER diagram is as below:
Relational instance: In the relational database system, the relational instance is represented by a finite set
of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all columns or
attributes.
Relational key: A relational key is a set of one or more attributes that can identify each row in the
relation uniquely.
Integrity Constraints :-
o Integrity constraints are a set of rules. They are used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The value of the
attribute must be available in the corresponding domain.
Example:
4. Key constraints
o Keys are attributes (or sets of attributes) used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but one of them will be the primary key. A primary key
must contain unique values and cannot contain a null value in the relational table.
Example:
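The domain and key constraints above can be tried out with a minimal sketch using Python's built-in sqlite3
module. The table name student, its columns, and the age range 5 to 100 are assumptions made only for this
illustration; they do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id TEXT NOT NULL PRIMARY KEY,             -- key constraint: unique and not null
        name       TEXT NOT NULL,
        age        INTEGER CHECK (age BETWEEN 5 AND 100)  -- domain constraint (assumed range)
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid row": try_insert(("S1", "Sana", 20)),
    "duplicate key": try_insert(("S1", "Sara", 23)),
    "null key": try_insert((None, "Deb", 22)),
    "age outside domain": try_insert(("S2", "Raj", 500)),
}
print(results)
```

Only the first row is stored; the other three are rejected by the key constraint or the domain constraint.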
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of the
query.Relational algebra mainly provides theoretical foundation for relational databases and SQL. It uses
operators to perform queries.
Types of Relational operation
Notation: σ p(r)
Where: σ is used for the selection predicate
r is used for the relation
p is used as a propositional logic formula which may use connectors like AND, OR and NOT, together
with relational operators like =, ≠, ≥, <, >, ≤.
For example: Student
Name Roll No Address
Sana 02 Purulia
Sara 04 Bakura
Deb 08 Delhi
Raj 13 Bombay
Query 1: Give all information of the student whose roll no is 04.
Solution: σ roll no=04(student)
Query 2: Find all information of the student whose name is Deb and whose address is Delhi.
Solution: σ Name="deb" and address="Delhi"(student)
With OR in place of AND, the selection is written: σ (Name="deb") V (address="Delhi")(student) [V = or]
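The selection operation can be sketched in Python by treating a relation as a list of rows and the predicate p
as a function. The function name select and the dictionary keys name, roll_no and address are assumptions
chosen for this illustration.

```python
student = [
    {"name": "Sana", "roll_no": 2, "address": "Purulia"},
    {"name": "Sara", "roll_no": 4, "address": "Bakura"},
    {"name": "Deb", "roll_no": 8, "address": "Delhi"},
    {"name": "Raj", "roll_no": 13, "address": "Bombay"},
]

def select(predicate, relation):
    """sigma_p(r): keep only the tuples of r for which predicate p is true."""
    return [row for row in relation if predicate(row)]

# Query 1: roll no = 04
query1 = select(lambda t: t["roll_no"] == 4, student)

# Query 2: name = Deb AND address = Delhi
query2 = select(lambda t: t["name"] == "Deb" and t["address"] == "Delhi", student)

# Same query with OR instead of AND
query2_or = select(lambda t: t["name"] == "Deb" or t["address"] == "Delhi", student)

print(query1)
print(query2)
```

Selection never changes the columns of a relation; it only filters rows.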
3. Union Operation(∪):
o It performs a binary union between two given relations and is defined as R ∪ S,
where R and S are either database relations or relation result sets (temporary relations).
Notation (set difference): R - S
Example - ∏ name (student 1) - ∏ name (student 2)
6. Cartesian product:
o The Cartesian product is used to combine each row in one table with each row in the other table. It is
also known as a cross product.
o It is denoted by X.
Notation: E X D
Where E and D are relations and their output is defined as-
E X D = { q t | q ∈ E and t ∈ D }
Example:
σ Name='Kamal' (Student 1 X Student 2)
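A minimal Python sketch of the Cartesian product follows. The rows of student1 and student2 and their column
names are hypothetical sample data, since the original tables are not shown in these notes.

```python
from itertools import product

# Hypothetical sample relations with distinct column names.
student1 = [
    {"s1_name": "Kamal", "s1_roll": 1},
    {"s1_name": "Sana", "s1_roll": 2},
]
student2 = [
    {"s2_name": "Sara", "s2_roll": 4},
    {"s2_name": "Deb", "s2_roll": 8},
    {"s2_name": "Raj", "s2_roll": 13},
]

def cartesian_product(e, d):
    """E X D: combine every row q of E with every row t of D."""
    return [{**q, **t} for q, t in product(e, d)]

pairs = cartesian_product(student1, student2)

# Selection applied on top of the product, as in the example above.
kamal_rows = [row for row in pairs if row["s1_name"] == "Kamal"]

print(len(pairs), len(kamal_rows))
```

With 2 rows in the first relation and 3 in the second, the product has 2 * 3 = 6 rows.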
7. Rename Operation:The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Normalization
Functional Dependency :-A functional dependency is a relationship that exists between two attributes
(or sets of attributes). It typically exists between the primary key and a non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant; the right side is known as the dependent.
For example: Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because if we
know the Emp_Id, we can tell the employee name associated with it.
The functional dependency can be written as: Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
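As a rough sketch, the following Python function checks whether a functional dependency X → Y holds in a
given set of rows. The sample rows and the function name fd_holds are assumptions for illustration. Note that
a table instance can only disprove a dependency; whether it holds in general depends on the meaning of the data.

```python
def fd_holds(relation, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in relation:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample data for the employee table.
employee = [
    {"Emp_Id": 1, "Emp_Name": "Sana", "Emp_Address": "Purulia"},
    {"Emp_Id": 2, "Emp_Name": "Sara", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Sana", "Emp_Address": "Delhi"},
]

id_determines_name = fd_holds(employee, ["Emp_Id"], ["Emp_Name"])
name_determines_address = fd_holds(employee, ["Emp_Name"], ["Emp_Address"])
print(id_determines_name, name_determines_address)
```

Here Emp_Id → Emp_Name holds, while Emp_Name → Emp_Address fails because two employees named Sana
have different addresses.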
Types of Functional dependency
1. Trivial functional dependency:-
o A → B is a trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial like: A → A, B → B
(ABEF)+ = {A,B,C,D,E,F}, so ABEF is a super key (S.K).
A→C, C→D, D→B give A→B (transitive dependency), so B can be dropped:
(AEF)+ = {A,B,C,D,E,F}
E→F, so F can be dropped:
(AE)+ = {A,B,C,D,E,F}
Candidate Key: AE
Prime attributes = (A,E). Since no prime attribute appears on the right-hand side of any functional
dependency, the relation has only one candidate key (AE).
Problem-2: Find the possible candidate keys of the relation R(A,B,C,D) with functional dependencies
A→B, B→C, C→A.
Solution: S.K ABCD → {A,B,C,D}
S.K ACD → {A,B,C,D} (B can be dropped because A→B)
S.K AD → {A,B,C,D} (C can be dropped because A→B and B→C)
AD is a candidate key.
The prime attributes of AD are A and D. The prime attribute A appears on the right side of the functional
dependency C→A, so replacing A with C gives another candidate key: CD.
Again, C appears on the right side of the functional dependency B→C, so replacing C with B gives another
candidate key: BD.
So candidate keys = AD, CD, BD
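The result of Problem-2 can be cross-checked with a small Python sketch that computes attribute closures and
then tries every attribute combination, smallest first. This brute-force search is an assumption of this
sketch (it is fine for a handful of attributes) and is not the replacement method used in the solution above.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

fds = [("A", "B"), ("B", "C"), ("C", "A")]
keys = candidate_keys("ABCD", fds)
key_names = sorted("".join(sorted(k)) for k in keys)
print(key_names)
```

For R(A,B,C,D) with A→B, B→C, C→A this prints the same three keys: AD, BD and CD.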
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table
1NF (first normal form): A relation is in 1NF if every attribute contains only atomic values. It eliminates
repeating groups.
2NF (2nd normal form): A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key. It eliminates partial functional dependency.
3NF (3rd normal form): A relation is in 3NF if it is in 2NF and no transitive dependency exists. It eliminates
transitive dependency.
BCNF (Boyce Codd's normal form): A stronger definition of 3NF is known as Boyce Codd's normal form. It is
also called 3.5NF.
4NF (4th normal form): A relation is in 4NF if it is in Boyce Codd's normal form and has no multi-valued
dependency. It eliminates multi-valued dependency.
5NF (5th normal form): A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining
should be lossless. It eliminates join dependency.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset
of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
Depart.Name Depart.location
Accounts 102
Sales 104
Store 106
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
Emp-id Emp-Country
264 India
364 UK
EMP_DEPT table:
Emp_Dept Dept_Type Emp_Dept_No
Designing D394 283
Testing D394 300
Stored D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
Emp_ID Emp_Dept
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is
no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.
So, to bring the above table into 4NF, we can decompose it into two tables.
STUDENT_COURSE
Stu-Id Course
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
Stu-Id Hobby
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
The given table is in neither 4NF nor 5NF. First we convert it to 4NF by splitting it into two sub-tables.
Table-1 Faculty-Subject
Faculty Subject
Sana DBMS
Sana Java
Sana C
Table-2 Faculty -committee:-
Faculty Committee
Sana Placement
Sana Scholarship
To check 5NF, we join Table 1 and Table 2. If the join gives the same result as the original table (Faculty),
then the decomposition is in 5NF; otherwise it is not in 5NF.
Table 1 + Table 2
Faculty Subject Committee
Sana DBMS Placement
Sana DBMS Scholarship
Sana Java Placement
Sana Java Scholarship
Sana C Placement
Sana C Scholarship
This is equal to the original table, so it is in 5NF.
Relational Decomposition:-
o When a relation in the relational model is not in appropriate normal form then the decomposition of a
relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies,
and redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the decomposition will be
lossless.
o The lossless decomposition guarantees that the join of relations will result in the same relation as it
was decomposed.
o The relation is said to be lossless decomposition if natural joins of all the decomposition give the
original relation.
Example: Emp-info
Emp_ID Emp_Name Emp_Age Emp_location Dept_ID Dept_Name
E001 Sana 29 Hariduwar Dpt1 Operation
E002 Sara 32 Dehradun Dpt2 HR
E003 Deb 22 Delhi Dpt3 Finance
Decompose the above table into two tables:
1) Emp Details
Emp_ID Emp_Name Emp_Age Emp_location
E001 Sana 29 Hariduwar
E002 Sara 32 Dehradun
E003 Deb 22 Delhi
2) Dept Details
Emp_ID Dept_ID Dept_Name
E001 Dpt1 Operation
E002 Dpt2 HR
E003 Dpt3 Finance
Now, a natural join is applied on the above two tables.
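The natural join step can be sketched in Python as follows, with employee IDs written as E001, E002, E003.
The helper names project, natural_join and as_set are assumptions for this illustration. The last part is a
hypothetical contrast, not taken from the notes: if Dept Details is stored without Emp_ID, the two tables share
no column and the join can no longer rebuild the original rows.

```python
def project(relation, attrs):
    """Keep only the given columns and drop duplicate rows."""
    rows = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in rows:
            rows.append(projected)
    return rows

def natural_join(r, s):
    """Combine rows of r and s that agree on all common columns."""
    common = set(r[0]) & set(s[0])
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]

def as_set(relation):
    """Order-independent view of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in relation}

emp_info = [
    {"Emp_ID": "E001", "Emp_Name": "Sana", "Emp_Age": 29,
     "Emp_location": "Hariduwar", "Dept_ID": "Dpt1", "Dept_Name": "Operation"},
    {"Emp_ID": "E002", "Emp_Name": "Sara", "Emp_Age": 32,
     "Emp_location": "Dehradun", "Dept_ID": "Dpt2", "Dept_Name": "HR"},
    {"Emp_ID": "E003", "Emp_Name": "Deb", "Emp_Age": 22,
     "Emp_location": "Delhi", "Dept_ID": "Dpt3", "Dept_Name": "Finance"},
]

emp_details = project(emp_info, ["Emp_ID", "Emp_Name", "Emp_Age", "Emp_location"])
dept_details = project(emp_info, ["Emp_ID", "Dept_ID", "Dept_Name"])

# Lossless: the shared column Emp_ID lets the join rebuild the original table.
rejoined = natural_join(emp_details, dept_details)
is_lossless = as_set(rejoined) == as_set(emp_info)

# Hypothetical lossy variant: no shared column, so every employee pairs with every department.
dept_only = project(emp_info, ["Dept_ID", "Dept_Name"])
lossy_join = natural_join(emp_details, dept_only)

print(is_lossless, len(rejoined), len(lossy_join))
```

The first join returns the original 3 rows; the second returns 9 rows, so the link between each employee and
their department has been lost.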
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part
of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD
A->BC is a part of relation R1(ABC).
Lossy Decomposition :When a relation is decomposed into two or more relational schemas, the decomposition
is lossy if loss of information is unavoidable when the original relation is retrieved.
Example-Emp Info
Emp_ID Emp_Name Emp_Age Emp_location Dept_ID Dept_Name
E001 Sana 29 Hariduwar Dpt1 Operation
E002 Sara 32 Dehradun Dpt2 HR
E003 Deb 22 Delhi Dpt3 Finance
Decompose the above table into two tables:
< Emp Details >
Emp_ID Emp_Name Emp_Age Emp_location
E001 Sana 29 Hariduwar
E002 Sara 32 Dehradun
E003 Deb 22 Delhi
Transaction property:-The transaction has the four properties. These are used to maintain
consistency in a database, before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
1)Atomicity:-
o It states that all operations of the transaction take place at once; if not, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit
and either run to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume the following transaction T consisting of T1 and T2. A holds Rs 600 and B
holds Rs 300. Transfer Rs 100 from account A to account B.
T1 T2
Read(A) Read(B)
A:= A-100 B:= B+100
Write(A) Write(B)
After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.
If the transaction T fails after the completion of transaction T1 but before completion of transaction T2, then
the amount will be deducted from A but not added to B. This shows the inconsistent database state. In order
to ensure correctness of database state, the transaction must be executed in entirety.
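The transfer example can be sketched with Python's built-in sqlite3 module. The table name account, the
function names, and the fail_after_debit flag that simulates a failure between T1 and T2 are assumptions made
for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])
conn.commit()  # make the starting balances permanent before any transfer

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

def transfer(src, dst, amount, fail_after_debit=False):
    """Debit src and credit dst as one unit; return True only if committed."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if fail_after_debit:
                raise RuntimeError("simulated failure between T1 and T2")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except RuntimeError:
        return False

failed = transfer("A", "B", 100, fail_after_debit=True)
after_failure = balances()   # the debit was rolled back
succeeded = transfer("A", "B", 100)
after_success = balances()

print(after_failure, after_success)
```

After the failed attempt A still holds 600 and B still holds 300; after the successful attempt A holds 500
and B holds 400, so the transfer happens either completely or not at all.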
2)Consistency
o The integrity constraints are maintained so that the database is consistent before and after the
transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new stable
state.
o The consistent property of database states that every transaction sees a consistent database instance.
o The transaction is used to transform the database from one consistent state to another consistent
state.
For example: The total amount must be the same before and after the transaction.
States of Transaction
In a database, the transaction can be in one of the following states -
1)Active state:-The active state is the first state of every transaction. In this state, the transaction is being
executed.
For example: Insertion or deletion or updating a record is done here. But all the records are still not saved
to the database.
2)Partially committed:-In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed in this state.
3)Committed:-A transaction is said to be in a committed state if it executes all its operations successfully. In
this state, all the effects are now permanently saved on the database system.
4)Failed state:-If any of the checks made by the database recovery system fails, then the transaction is said
to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks,
then the transaction will fail to execute.
5)Aborted:If any of the checks fail and the transaction has reached the failed state, the database recovery
system makes sure that the database is in its previous consistent state. If it is not, the recovery system
aborts or rolls back the transaction to bring the database into a consistent state.
o If the transaction fails midway, all the operations it has already executed are rolled back so that the
database returns to the consistent state it had before the transaction started.
o After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction
Schedule:-A series of operations from one transaction to another transaction is known as a schedule. It is
used to preserve the order of the operations in each individual transaction.
1. Serial Schedule:-The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction. In the serial schedule, when the first transaction completes its cycle,
then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no
interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
o In the given (a) figure, Schedule A shows the serial schedule where T1 is followed by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2 is followed by T1.
2. Non-serial Schedule
o If interleaving of operations is allowed, then there will be non-serial schedule.
o It contains many possible orders in which the system can execute the individual operations of the
transactions.
o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial schedules. It has
interleaving of operations.
3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow the transaction to
execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have interleaving of their
operations.
o A non-serial schedule will be serializable if its result is equal to the result of its transactions executed
serially.
Here,
Schedule A and Schedule B are serial schedule.
Schedule C and Schedule D are Non-serial schedule.
View Serializability/schedule:-
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:
1. Initial Read:-The initial read of both schedules must be the same. Suppose there are two schedules S1 and
S2. If in schedule S1 a transaction T1 reads the initial value of data item A, then in S2 transaction T1 should
also read the initial value of A.
The above two schedules are view equivalent because the initial read operation in S1 is done by T1 and in S2
it is also done by T1.
2. Updated Read:-In schedule S1, if Ti is reading A which is updated by Tj, then in S2 also, Ti should read A
which is updated by Tj.
The above two schedules are not view equivalent because, in S1, T3 is reading A updated by T2 and in S2, T3
is reading A updated by T1.
3. Final Write:-The final write must be the same in both schedules. If in schedule S1 a transaction
T1 performs the last update of A, then in S2 the final write operation on A should also be done by T1.
The above two schedules are view equivalent because the final write operation in S1 is done by T3 and in S2
the final write operation is also done by T3.
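The three conditions can be checked mechanically. The sketch below represents a schedule as a list of
(transaction, action, item) steps, where the action is "R" or "W"; this representation and the three sample
schedules are assumptions for illustration. The sketch also assumes both schedules contain the same
transactions and operations.

```python
from collections import Counter

def view_profile(schedule):
    """Collect who reads from whom and who writes each item last."""
    last_writer = {}
    reads = Counter()
    for txn, action, item in schedule:
        if action == "R":
            # The writer is None for an initial read (no write has happened yet).
            reads[(txn, item, last_writer.get(item))] += 1
        elif action == "W":
            last_writer[item] = txn
    return reads, last_writer  # last_writer now holds the final write per item

def view_equivalent(s1, s2):
    """Same initial reads, same updated reads and same final writes."""
    return view_profile(s1) == view_profile(s2)

serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A"), ("T3", "W", "A")]
interleaved = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
other = [("T2", "W", "A"), ("T1", "R", "A"), ("T1", "W", "A"), ("T3", "W", "A")]

print(view_equivalent(interleaved, serial))
print(view_equivalent(other, serial))
```

The interleaved schedule is view equivalent to the serial one even though T2 performs a blind write. The
third schedule is not, because there T1 reads the value of A written by T2 instead of the initial value.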
File Organization
o The File is a collection of records. Using the primary key, we can access the records. The type and
frequency of access can be determined by the type of file organization which was used for a given set
of records.
o File organization is a logical relationship among various records. This method defines how file records
are mapped onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of blocks, and
the blocks are placed on the storage medium.
o The first approach to map the database to the file is to use the several files and store only one fixed
length record in any given file. An alternative approach is to structure our files so that we can contain
multiple lengths for records.
o Files of fixed length records are easier to implement than the files of variable length records.
Objective of file organization
o It contains an optimal selection of records, i.e., records can be selected as fast as possible.
o To perform insert, delete or update transaction on the records should be quick and easy.
o The duplicate records cannot be induced as a result of insert, update or delete.
o For the minimal cost of storage, records should be stored efficiently.
Types of file organization:
File organization contains various methods. These particular methods have pros and cons on the basis of
access or selection. In the file organization, the programmer decides the best-suited file organization method
according to his requirement.
Types of file organization are as follows:
Insertion of the new record:-Suppose there is a preexisting sorted sequence of records R1, R3 and
so on up to R6 and R7. Suppose a new record R2 has to be inserted in the sequence; it will be
inserted at the end of the file, and then the sequence will be sorted again.
If we want to search, update or delete the data in heap file organization, then we need to traverse the data
from the start of the file till we get the requested record.
If the database is very large then searching, updating or deleting of record will be time-consuming because
there is no sorting or ordering of records. In the heap file organization, we need to check all the data until we
get the requested record.
Pros of Heap file organization
o It is a very good method of file organization for bulk insertion. If there is a large number of data
which needs to load into the database at a time, then this method is best suited.
o In case of a small database, fetching and retrieving of records is faster than the sequential record.
Cons of Heap file organization
o This method is inefficient for the large database because it takes time to search or modify the record.
3)Hash File Organization:-Hash File Organization uses the computation of a hash function on some fields
of the records. The hash function's output determines the location of the disk block where the records are to
be placed.
When a record has to be retrieved using the hash key columns, the address is generated, and the whole
record is retrieved using that address. In the same way, when a new record has to be inserted, the
address is generated using the hash key and the record is directly inserted. The same process is applied in the
case of delete and update.
In this method, there is no effort for searching and sorting the entire file. In this method, each record will be
stored randomly in the memory.
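A minimal in-memory sketch of this idea follows. The class name HashFile, the number of blocks, and the
hash function key mod NUM_BLOCKS are assumptions chosen for illustration; a real DBMS uses its own hash
function and stores the blocks on disk.

```python
NUM_BLOCKS = 4  # assumed number of disk blocks (buckets)

def block_address(key):
    """Hash function: maps a key to the block where its record is stored."""
    return key % NUM_BLOCKS

class HashFile:
    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]

    def insert(self, key, record):
        self.blocks[block_address(key)].append((key, record))

    def search(self, key):
        # Only one block is examined; the rest of the file is never scanned.
        for k, record in self.blocks[block_address(key)]:
            if k == key:
                return record
        return None

    def delete(self, key):
        block = self.blocks[block_address(key)]
        block[:] = [(k, r) for k, r in block if k != key]

f = HashFile()
for emp_id, name in [(101, "Sana"), (102, "Sara"), (105, "Deb"), (110, "Raj")]:
    f.insert(emp_id, name)

found = f.search(105)
f.delete(102)
missing = f.search(102)
print(found, missing, block_address(105))
```

Keys 101 and 105 both hash to block 1, so a block may hold several records, but a lookup still touches only
that one block.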
4)B+ File Organization
o B+ tree file organization is the advanced method of an indexed sequential access method. It uses a
tree-like structure to store records in File.
o It uses the same concept of key-index where the primary key is used to sort the records. For each
primary key, the value of the index is generated and mapped with the record.
o The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this
method, all the records are stored only at the leaf node. Intermediate nodes act as a pointer to the leaf
nodes. They do not contain any records.
In this method, we can directly insert, update or delete any record. Data is sorted based on the key with
which searching is done. Cluster key is a type of key with which joining of the table is performed.
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:-In indexed cluster, records are grouped based on the cluster key and stored together.
The above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here, all the
records are grouped based on the cluster key DEP_ID.
2. Hash Clusters:-It is similar to the indexed cluster. In hash cluster, instead of storing the records based on
the cluster key, we generate the value of the hash key for the cluster key and store the records with the same
hash key value.
Pros of Cluster file organization
o The cluster file organization is used when there is a frequent request for joining the tables with same
joining condition.
o It provides the efficient result when there is a 1:M mapping between the tables.
Cons of Cluster file organization
o This method has the low performance for the very large database.
o This method is not suitable for a table with a 1:1 condition.
Indexing in DBMS
o Indexing is used to optimize the performance of a database by minimizing the number of disk accesses
required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a database table
quickly.
Index structure:
Indexes can be created using some database columns.
o The first column of the index is the search key, which contains a copy of the primary key or candidate
key of the table. The values of the primary key are stored in sorted order so that the corresponding
data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers holding the
address of the disk block where the value of the particular key can be found.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known as ordered
indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long.
Their IDs start with 1, 2, 3... and so on, and we have to search for the employee with ID 543.
o In the case of a database with no index, we have to scan the disk blocks from the start until we reach
543. The DBMS will read the record after reading 543*10=5430 bytes.
o In the case of an index, we search using the index, and the DBMS will read the record after reading
542*2= 1084 bytes (assuming each index entry is 2 bytes long), which is far less than in the previous case.
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as primary
indexing. These primary keys are unique to each record and contain 1:1 relation between the records.
o As primary keys are stored in sorted order, the performance of the searching operation is quite
efficient.
o The primary index can be classified into two types: Dense index and Sparse index.
Dense index
o The dense index contains an index record for every search key value in the data file. It makes
searching faster.
o In this, the number of records in the index table is same as the number of records in the main table.
o It needs more space to store index record itself. The index records have the search key and a pointer
to the actual record on the disk.
Sparse index
o In the data file, index record appears only for a few items. Each item points to a block.
o In this, instead of pointing to each record in the main table, the index points to records in the
main table at intervals (for example, the first record of each block).
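The difference between a dense and a sparse index can be sketched in Python. The block size of 3 records,
the IDs 1 to 10, and all variable names are assumptions for this illustration.

```python
from bisect import bisect_right

BLOCK_SIZE = 3                    # assumed number of records per disk block
ids = list(range(1, 11))          # sorted record IDs 1..10
blocks = [ids[i:i + BLOCK_SIZE] for i in range(0, len(ids), BLOCK_SIZE)]

# Dense index: one entry per record, pointing to (block number, position in block).
dense_index = {key: (b, pos)
               for b, block in enumerate(blocks)
               for pos, key in enumerate(block)}

# Sparse index: one entry per block, holding the first key stored in that block.
sparse_keys = [block[0] for block in blocks]

def sparse_lookup(key):
    """Find the block whose first key is the largest one <= key, then scan it."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for pos, k in enumerate(blocks[b]):
        if k == key:
            return (b, pos)
    return None

print(len(dense_index), len(sparse_keys))
print(dense_index[8], sparse_lookup(8))
```

The dense index needs 10 entries and the sparse index only 4, but the sparse lookup has to scan inside the
block after locating it.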
Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is created on non-
primary key columns which may not be unique for each record.
o In this case, to identify the record faster, we will group two or more columns to get the unique value
and create index out of them. This method is called a clustering index.
o The records which have similar characteristics are grouped, and indexes are created for these group.
Example: suppose a company contains several employees in each department. Suppose we use a clustering
index, where all employees which belong to the same Dept_ID are considered within a single cluster, and
index pointers point to the cluster as a whole. Here Dept_Id is a non-unique key.
The previous schema is a little confusing because one disk block is shared by records which belong to
different clusters. Using a separate disk block for each cluster is considered a better technique.
Secondary Index:-In the sparse indexing, as the size of the table grows, the size of mapping also grows.
These mappings are usually kept in the primary memory so that address fetch should be faster. Then the
secondary memory searches the actual data based on the address got from mapping. If the mapping size
grows then fetching the address itself becomes slower. In this case, the sparse index will not be efficient. To
overcome this problem, secondary indexing is introduced.
In secondary indexing, to reduce the size of mapping, another level of indexing is introduced. In this method,
the huge range for the columns is selected initially so that the mapping size of the first level becomes small.
Then each range is further divided into smaller ranges. The mapping of the first level is stored in the primary
memory, so that address fetch is faster. The mapping of the second level and actual data are stored in the
secondary memory (hard disk).
For example:
o If you want to find the record of roll 111 in the diagram, the search looks in the first-level index for the
highest entry which is smaller than or equal to 111. It gets 100 at this level.
o Then, in the second index level, it again looks for the highest entry which is smaller than or equal to 111
and gets 110. Now, using the address for 110, it goes to the data block and searches each record until it
gets 111.
o This is how a search is performed in this method. Inserting, updating or deleting is also done in the
same manner.
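The two-level search for roll 111 can be sketched in Python. The index entries and the 10-record data blocks
below are assumed values chosen so that the lookup follows the same path (100, then 110) as the example; the
original diagram is not reproduced here.

```python
from bisect import bisect_right

def floor_entry(sorted_keys, key):
    """Return the highest entry that is smaller than or equal to key."""
    i = bisect_right(sorted_keys, key) - 1
    return sorted_keys[i] if i >= 0 else None

# Assumed index contents.
first_level = [100, 200, 300]                 # kept in primary memory
second_level = {100: [100, 110, 120],         # kept in secondary memory
                200: [200, 210],
                300: [300]}
data_blocks = {start: list(range(start, start + 10))
               for entries in second_level.values()
               for start in entries}

def find(roll):
    """Return (first-level entry, second-level entry, found?) for a roll number."""
    top = floor_entry(first_level, roll)
    if top is None:
        return None
    mid = floor_entry(second_level[top], roll)
    found = roll in data_blocks[mid]          # sequential search inside the data block
    return top, mid, found

print(find(111))
```

Only two small index lookups and one data block are needed, instead of a scan through every record.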
B+ Tree
o The B+ tree is a balanced multi-way search tree (a node can have more than two children). It follows a
multi-level index format.
o In the B+ tree, leaf nodes denote actual data pointers. The B+ tree ensures that all leaf nodes remain at the
same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random
access as well as sequential access.
Structure of B+ Tree
o In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree is of the order n
where n is fixed for every B+ tree.
o It contains an internal node and leaf node.
Internal node
o An internal node of the B+ tree can contain at least n/2 record pointers except the root node.
o At most, an internal node of the tree contains n pointers.
Leaf node
o A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.
Searching a record in B+ Tree:-Suppose we have to search for 55 in the below B+ tree structure. First, we
look in the intermediary node, which directs us to the leaf node that can contain a record for 55.
In the intermediary node, 55 falls on the branch between the keys 50 and 75, so we are
redirected to the third leaf node. There the DBMS performs a sequential search to find 55.
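A minimal sketch of this search in Python follows. The figure is not reproduced here, so the tree is an assumption: one intermediary node with keys 25, 50 and 75 over four linked leaves. The class names Leaf and Internal and the functions search and scan_from are hypothetical.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted key values stored in this leaf
        self.next = None      # block pointer P to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # sorted separator keys
        self.children = children  # always len(keys) + 1 children


def search(node, key):
    """Descend to the leaf that may hold key; return (leaf, found)."""
    while isinstance(node, Internal):
        # A key equal to or greater than a separator goes to its right child.
        node = node.children[bisect_right(node.keys, key)]
    return node, key in node.keys  # sequential search inside the leaf


def scan_from(leaf):
    """Sequential access: follow the leaf chain from a starting leaf."""
    while leaf is not None:
        yield from leaf.keys
        leaf = leaf.next


# Assumed tree: one intermediary node over four linked leaves.
leaves = [Leaf([5, 15, 20]), Leaf([25, 30, 35]),
          Leaf([50, 55, 65, 70]), Leaf([75, 80, 85])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([25, 50, 75], leaves)

leaf, found = search(root, 55)
print(found, leaf.keys)
```

Searching for 55 lands in the third leaf because 55 lies between the separators 50 and 75. The scan_from helper shows why the linked leaves also give sequential access.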
B+ Tree Insertion:-Suppose we want to insert a record 60 in the below structure. It belongs in the 3rd
leaf node, after 55. The tree is balanced and this leaf node is already full, so we cannot simply insert 60
there.
In this case, we have to split the leaf node, so that 60 can be inserted into the tree without affecting the fill factor,
balance and order.
With 60 added, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and the key in the intermediate node
that leads to it is 50. We split the leaf node in the middle so that the balance is not altered. So we can group
(50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two are to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to it,
and then it can hold a pointer to the new leaf node.
This is how we insert an entry when there is overflow. In a normal scenario, it is easy to find the
node where the entry fits and then place it in that leaf node.
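The split step can be sketched as follows. This is a simplified illustration that handles a single leaf and its parent's key list only; the capacity of 4 keys per leaf, the parent keys (25, 50, 75) and the function name insert_into_leaf are assumptions made for the example.

```python
from bisect import insort


def insert_into_leaf(leaf_keys, key, max_keys):
    """Insert key into a sorted leaf.

    Returns (left, right, separator). When the leaf does not overflow,
    right and separator are None and left is the updated leaf.
    """
    keys = list(leaf_keys)
    insort(keys, key)
    if len(keys) <= max_keys:
        return keys, None, None
    # Overflow: split in the middle. The first key of the right half is
    # copied up to the intermediate node as the new separator.
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


parent_keys = [25, 50, 75]  # assumed keys of the intermediate node
left, right, separator = insert_into_leaf([50, 55, 65, 70], 60, max_keys=4)
if separator is not None:
    insort(parent_keys, separator)

print(left, right, parent_keys)
```

Under these assumptions the full leaf splits into (50, 55) and (60, 65, 70), and 60 is added to the intermediate node. A complete implementation would also split the intermediate node if it overflows in turn.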
B+ Tree Deletion:-Suppose we want to delete 60 from the above example. In this case, we have to
remove 60 from the intermediate node as well as from the 4th leaf node. If we simply remove it from the
intermediate node, the tree will no longer satisfy the rules of the B+ tree, so we need to rearrange the nodes
to keep the tree balanced.
After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as follows:
Basis of Comparison | B tree | B+ tree
Insertion | Insertion takes more time and it is not predictable sometimes. | Insertion is easier and the results are always the same.
Leaf Nodes | Leaf nodes are not stored as a structural linked list. | Leaf nodes are stored as a structural linked list.
Access | Sequential access to nodes is not possible. | Sequential access is possible, just like a linked list.
Height | For a particular number of nodes, the height is larger. | Height is less than that of a B tree for the same number of nodes.
Number of Nodes | Number of nodes at any intermediary level 'l' is 2^l. | Each intermediary node can have n/2 to n children.
S.No. | B-tree | Binary tree
5. | B-tree is used in DBMS (code indexing, etc.). | Binary tree is used in Huffman coding, code optimization and many others.
6. | Inserting data or a key into a B-tree is more complicated than in a binary tree. | In a binary tree, data insertion is not more complicated than in a B-tree.
Definition of B-tree
B-tree in DBMS is a self-balancing m-way tree. Due to their balanced structure, such trees are
frequently used to manage and organise enormous databases and facilitate searches. In a B-tree, each node
can have a maximum of m child nodes. In DBMS, B-tree is an example of multilevel indexing. Leaf nodes and
internal nodes will both have record references. The B-tree is called a balanced sorted tree, as all the leaf nodes
are at the same level.
Properties of B-tree
All leaves are at the same level.
A B-Tree is defined by the term minimum degree 't'. The value of 't' depends upon the disk block size.
Every node except the root must contain at least t-1 keys. The root may contain a minimum of 1 key.
All nodes (including the root) may contain at most (2*t - 1) keys.
The number of children of an internal node is equal to the number of keys in it plus 1.
All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains all
keys in the range from k1 to k2.
A B-Tree grows and shrinks from the root, unlike a Binary Search Tree. Binary Search Trees
grow downward and also shrink from the bottom.
As with balanced Binary Search Trees, the time complexity to search, insert, and delete is O(log
n).
Insertion of a key in a B-Tree happens only at a leaf node.
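The key-count rules above can be checked per node with a small helper. This is a sketch, not a full B-Tree validator: the function name check_node and its parameters are hypothetical, and it checks only a single node's key count, key order and child count.

```python
def check_node(keys, num_children, t, is_root=False, is_leaf=False):
    """Return True if one node satisfies the B-Tree rules for minimum degree t."""
    min_keys = 1 if is_root else t - 1
    max_keys = 2 * t - 1
    # Rule: between t-1 (or 1 for the root) and 2*t - 1 keys.
    if not (min_keys <= len(keys) <= max_keys):
        return False
    # Rule: keys are sorted in strictly increasing order.
    if any(a >= b for a, b in zip(keys, keys[1:])):
        return False
    # Rule: an internal node has one more child than it has keys.
    if not is_leaf and num_children != len(keys) + 1:
        return False
    return True


# With t = 3, a non-root node must hold between 2 and 5 keys.
print(check_node([10, 20], 3, t=3))
print(check_node([10], 2, t=3))
print(check_node([10], 2, t=3, is_root=True))
```

With t = 3, the first call passes, the second fails because a non-root node needs at least 2 keys, and the third passes because the root may hold a single key.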
Need of B-tree
To optimize searching, we cannot let a tree's height increase. Therefore, we want the tree to be
as short as possible in height.
Use of a B-tree in DBMS, which has more branches and hence a shorter height, is the solution to this
problem. Access time decreases as branching grows and depth shrinks.
Hence, a B-tree is needed for storing data, as searching and accessing time is decreased.
The cost of accessing the disk is high when searching tables. Therefore, minimising disk access is our
goal.
So, to decrease time and cost, we use a B-tree for storing data, as it makes index lookups fast.
Interesting Facts about B-Trees:
The minimum height of a B-Tree that holds n keys, where m is the
maximum number of children a node can have,
is: h_min = ⌈log_m(n + 1)⌉ - 1
The maximum height of a B-Tree that holds n keys, where t is the
minimum number of children that a non-root node can have,
is: h_max = ⌊log_t((n + 1) / 2)⌋ and t = ⌈m / 2⌉
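These two heights can be computed with integer arithmetic. The sketch below assumes the usual capacity argument, with n counted as the number of keys: a tree of height h holds at most m^(h+1) - 1 keys and at least 2*t^h - 1 keys. The function names min_height and max_height are hypothetical.

```python
def min_height(n_keys, m):
    """Smallest height h whose maximum capacity m**(h + 1) - 1 fits n_keys."""
    h = 0
    while m ** (h + 1) - 1 < n_keys:
        h += 1
    return h


def max_height(n_keys, t):
    """Largest height h whose minimum content 2 * t**h - 1 is <= n_keys."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n_keys:
        h += 1
    return h


# Example: 1000 keys, at most m = 4 children, so at least t = 2 children.
print(min_height(1000, 4), max_height(1000, 2))
```

For 1000 keys with m = 4 and t = 2, the height lies between 4 and 8 under these assumptions. Loops are used instead of floating-point logarithms to avoid rounding errors at exact powers.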
Traversal in B-Tree:
Traversal is also similar to Inorder traversal of Binary Tree. We start from the leftmost child, recursively
print the leftmost child, then repeat the same process for the remaining children and keys. In the end,
recursively print the rightmost child.
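A minimal sketch of this traversal in Python follows. The BTreeNode class and the sample tree are hypothetical; only the root key 100 and the key 48 come from the search example in these notes.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys of this node
        self.children = children or []  # empty list for a leaf


def traverse(node, out=None):
    """Collect all keys in sorted order (inorder traversal)."""
    if out is None:
        out = []
    for i, key in enumerate(node.keys):
        if node.children:
            traverse(node.children[i], out)  # child to the left of key
        out.append(key)
    if node.children:
        traverse(node.children[-1], out)     # rightmost child
    return out


# Assumed tree: a root with key 100 and two leaf children.
root = BTreeNode([100], [BTreeNode([48, 60, 80]), BTreeNode([120, 150])])
print(traverse(root))
```

The traversal visits each child before the key to its right and finishes with the rightmost child, so the keys come out in increasing order.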
The above data is stored in sorted order according to the values. If we want to search for the node containing
the value 48, the following steps are applied:
First, the parent node, whose key is 100, is checked. As 48 is less than 100, the left child
node of 100 is checked.
In the left child, there are 3 keys, so the search starts from the leftmost key, as the data is stored in sorted
order.
The leftmost key has the value 48, which matches the element being searched for, so that is how we
find the element we wanted.
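The search for 48 can be sketched as below. The figure is not available, so the tree is an assumption: the keys 60, 80, 120 and 150 are placeholders, and the names BTreeNode and search are hypothetical.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys of this node
        self.children = children or []  # empty list for a leaf


def search(node, key):
    """Return (node, index) where key is stored, or None if it is absent."""
    i = 0
    # Scan the keys of this node from the leftmost key.
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return node, i
    if not node.children:
        return None  # reached a leaf without finding the key
    # Descend into the child that covers the range containing key.
    return search(node.children[i], key)


# Assumed tree: root key 100, a left child with 3 keys, and a right child.
left_child = BTreeNode([48, 60, 80])
root = BTreeNode([100], [left_child, BTreeNode([120, 150])])

found_node, index = search(root, 48)
print(found_node.keys, index)
```

Since 48 is less than 100, the search moves to the left child and matches the leftmost key there, at index 0.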
Applications of B-Trees:
It is used in large databases to access data stored on the disk.
Searching for data in a data set can be achieved in significantly less time using the B-Tree.
With the indexing feature, multilevel indexing can be achieved.
Many database servers also use the B-tree approach for their indexes.
B-Trees are used in CAD systems to organize and search geometric data.
B-Trees are also used in other areas such as natural language processing, computer networks, and
cryptography.
Advantages of B-Trees:
B-Trees have a guaranteed time complexity of O(log n) for basic operations like insertion, deletion,
and searching, which makes them suitable for large data sets and real-time applications.
B-Trees are self-balancing.
B-Trees support high-concurrency, high-throughput workloads.
Efficient storage utilization.
Disadvantages of B-Trees:
B-Trees are designed as disk-based data structures and can have high disk usage.
They are not the best choice in every case.
For some workloads they are slower than other data structures, for example hash tables for exact-match lookups.
Time complexity of B-Tree operations:
1. Search O(log n)
2. Insert O(log n)
3. Delete O(log n)