Dbms Notes
Database
A database is a collection of interrelated data that is organized so that the data can be
retrieved, inserted, and deleted efficiently. The data is organized in the form of tables,
schemas, views, reports, etc.
For example: A college database organizes the data about the admin, staff,
students, faculty, etc.
Using the database, you can easily retrieve, insert, and delete the information.
o Data Update: It is used for the insertion, modification, and deletion of the
actual data in the database.
o Data Retrieval: It is used to retrieve the data from the database which can
be used by applications for various purposes.
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the
information.
o It can provide a clear and logical view of the process that manipulates data.
Advantages of DBMS
o Controls database redundancy: A DBMS can control data redundancy because all
the data is stored in a single database, so the same data does not need to be
recorded in several places.
Disadvantages of DBMS
o Cost of Hardware and Software: A DBMS requires a high-speed processor
and a large memory size to run the DBMS software.
o Size: It occupies a large amount of disk space and memory to run
efficiently.
Types of Databases:
1) Centralized Database
It is the type of database that stores data in one central database system. It
lets users access the stored data from different locations through several
applications. These applications include an authentication process so that users can
access data securely. An example of a centralized database is a central library
that holds a central database of every library in a college/university.
o It is less costly because fewer vendors are required to handle the data sets.
o If the server fails and no backup exists, the entire data can be lost, which
could be a huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed
among different database systems of an organization. These database systems are
connected via communication links. Such links help the end-users to access the data
easily. Examples of the Distributed database are Apache Cassandra, HBase, Ignite,
etc.
3) Relational Database
This database is based on the relational data model, which stores data in the form of
rows (tuples) and columns (attributes) that together form a table (relation). A
relational database uses SQL for storing, manipulating, and maintaining the
data. E. F. Codd proposed the relational model in 1970. Each table in the database has a
key that uniquely identifies each row. Examples of relational databases are
MySQL, Microsoft SQL Server, Oracle, etc.
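As a minimal sketch of these ideas, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table name student, its columns, and the sample rows are assumptions made for illustration; they do not come from these notes.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A relation (table) with three attributes (columns); roll_no is the key.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)"
)

# Each inserted row is one tuple of the relation.
conn.executemany(
    "INSERT INTO student (roll_no, name, course) VALUES (?, ?, ?)",
    [(1, "Asha", "Physics"), (2, "Ravi", "Maths")],
)

# SQL is also used to retrieve the data.
rows = conn.execute("SELECT roll_no, name FROM student ORDER BY roll_no").fetchall()
print(rows)

# The key keeps rows unique: a second row with roll_no = 1 is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 'Maths')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The same statements should work with small changes on the relational systems named above, although each product has its own SQL dialect.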
4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of
data sets. It is not a relational database, as it stores data not only in tabular form but
in several different ways. It came into existence when the demand for building
modern applications increased. Thus, NoSQL presented a wide variety of database
technologies in response to the demands. NoSQL databases are commonly divided
into four types: key-value stores, document-oriented databases, graph databases,
and wide-column stores.
5) Cloud Database
It is a type of database where data is stored in a virtual environment and runs on
a cloud computing platform. It provides users with various cloud computing
services (SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous
cloud platforms; some of the options are:
o Microsoft Azure
o Kamatera
o phoenixNAP
o ScienceSoft
6) Object-oriented Databases
It is the type of database that uses the object-based data model approach for storing
data in the database system. The data is represented and stored as objects, which
are similar to the objects used in object-oriented programming languages.
7) Hierarchical Databases
It is the type of database that stores data in the form of parent-children relationship
nodes. Here, it organizes data in a tree-like structure.
Data is stored in the form of records that are connected via links. Each child record
in the tree has only one parent. On the other hand, each parent record can
have multiple child records.
8) Network Databases
It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them.
Unlike the hierarchical database, it allows each record to have multiple children and
parent nodes to form a generalized graph structure.
9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This
database is basically designed for a single user.
• Data security –
A file system provides a password mechanism to protect the data, but
there is no guarantee of how long a password stays protected.
A DBMS offers stronger protection: it has specialized
features that help shield its data.
Data Abstraction
Database systems are made up of complex data structures. To ease user
interaction with the database, the developers hide irrelevant internal details from
users. This process of hiding irrelevant details from the user is called data
abstraction.
Logical level: This is the middle level of the 3-level data abstraction architecture.
It describes what data is stored in the database.
View level: Highest level of data abstraction. This level describes the user
interaction with the database system.
Database Language
o A DBMS has appropriate languages and interfaces to express database
queries and updates.
o Database languages can be used to read, store and update the data in the
database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the
number of tables and schemas, their names, indexes, columns in each table,
constraints, etc.
(In an Oracle database, however, data control language statements cannot be
rolled back.)
The following privileges can be revoked:
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
o Rollback: It is used to undo all the changes made since the last Commit.
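The effect of DDL, Commit, and Rollback can be sketched with Python's built-in sqlite3 module. The account table and its values are hypothetical, and the transaction behaviour shown is that of the sqlite3 module with its default settings, not of Oracle or any other DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create the skeleton (schema) of the database.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")

# DML: insert data, then make it permanent with COMMIT.
conn.execute("INSERT INTO account VALUES (1, 500)")
conn.commit()

# A change that has not been committed yet...
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
# ...is undone by ROLLBACK, restoring the state as of the last COMMIT.
conn.rollback()

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)
```

After the rollback the balance is back at the committed value of 500.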
Data Independence
o Data independence can be explained using the three-schema architecture.
o Logical data independence is used to separate the external level from the
conceptual view.
o If we make any changes in the conceptual view of the data, then the user view
of the data would not be affected.
o If we make any changes in the storage size of the database system server, then
the conceptual structure of the database will not be affected.
o DBMS architecture depends upon how users are connected to the database to
get their request done.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means
the user can work directly on the DBMS.
o Any changes done here will directly be done on the database itself. It doesn't
provide a handy tool for end users.
o The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick
response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server model. In the two-tier
architecture, applications on the client end can directly communicate with the
database at the server side. For this interaction, APIs like ODBC and JDBC are
used.
o The user interfaces and application programs are run on the client-side.
o In the 3-Tier architecture, the end user has no idea about the existence of the
database beyond the application server. The database also has no idea about any
other user beyond the application.
o In the three-schema architecture, mapping between the levels is not well suited
to a small DBMS because it takes more time.
1. Internal Level
o The internal level has an internal schema which describes the physical
storage structure of the database.
o It uses the physical data model. It defines how the data will
be stored in a block.
2. Conceptual Level
o The conceptual schema describes the design of a database at the
conceptual level. Conceptual level is also known as logical level.
o The conceptual level describes what data are to be stored in the database
and also describes what relationship exists among those data.
3. External Level
o At the external level, a database contains several schemas, sometimes
called subschemas. A subschema describes a particular view
of the database.
o Each view schema describes the part of the database that a particular user group
is interested in and hides the rest of the database from that user group.
o The view schema describes the end user interaction with database
systems.
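One way to picture the conceptual and external levels is a base table plus a view defined over it. The sketch below uses Python's built-in sqlite3 module; the employee table, the employee_directory view, and the sample row are hypothetical names chosen for illustration. The internal level (how SQLite lays the rows out in storage) is deliberately not visible from this code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: what data is stored and how it is structured.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Meera', 52000)")

# External level: a view (subschema) for a user group that must not see salaries.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory")
visible_columns = [col[0] for col in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```

Users of employee_directory see only emp_id and name; the salary column stays hidden from that user group.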
Unit 2
Data Models
Data models define how the logical structure of a database is modeled. Data Models
are fundamental entities to introduce abstraction in a DBMS. Data models define how
data is connected to each other and how they are processed and stored inside the
system.
1) Relational Data Model: This type of model organizes the data in the form of rows
and columns within a table. Thus, a relational model uses tables for representing
data and the relationships among the data. Tables are also called relations. This model was
initially described by Edgar F. Codd, in 1969. The relational data model is the most widely
used model and is primarily used by commercial data processing applications.
4) Semistructured Data Model: This type of data model is different from the
other three data models (explained above). The semistructured data model allows
data specifications in which individual data items of the same type
may have different sets of attributes. The Extensible Markup Language, also known as
XML, is widely used for representing semistructured data. Although XML was
initially designed for adding markup information to text documents, it gained
importance because of its application in the exchange of data.
ER Model
The ER model defines the conceptual view of a database. It works around real-world
entities and the associations among them. At view level, the ER model is considered
a good option for designing databases.
Entity
An entity can be a real-world object, either animate or inanimate, that can be easily
identifiable. For example, in a school database, students, teachers, classes, and
courses offered can be considered as entities. All these entities have some attributes
or properties that give them their identity. Entities are represented by means of
rectangles. Rectangles are named with the entity set they represent.
Attributes
Entities are represented by means of their properties, called attributes. All attributes
have values. For example, a student entity may have name, class, and age as
attributes.
Attributes are the properties of entities. Attributes are represented by means of
ellipses. Every ellipse represents one attribute and is directly connected to its entity
(rectangle).
If the attributes are composite, they are further divided in a tree-like structure. Every
component node is then connected to its composite attribute. That is, composite attributes
are represented by ellipses that are connected to another ellipse.
Types of Attributes
• Simple attribute − Simple attributes are atomic values, which cannot be
divided further. For example, a student's phone number is an atomic value of
10 digits.
• Composite attribute − Composite attributes are made of more than one
simple attribute. For example, a student's complete name may have
first_name and last_name.
• Derived attribute − Derived attributes are the attributes that do not exist in the
physical database, but their values are derived from other attributes present in
the database. For example, average_salary in a department should not be
saved directly in the database; instead, it can be derived. As another example,
age can be derived from date_of_birth.
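The attribute types above can be sketched in Python. The Student class, its field names, and the sample values are hypothetical; the point is only that age is computed from date_of_birth on demand rather than stored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Student:
    first_name: str      # simple attribute
    last_name: str       # simple attribute
    date_of_birth: date  # stored attribute

    @property
    def full_name(self) -> str:
        # Composite attribute built from two simple attributes.
        return f"{self.first_name} {self.last_name}"

    def age_on(self, today: date) -> int:
        # Derived attribute: computed from date_of_birth, never stored.
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


student = Student("Asha", "Rao", date(2004, 8, 15))
print(student.full_name)
print(student.age_on(date(2024, 8, 14)))  # the day before the 20th birthday
print(student.age_on(date(2024, 8, 15)))  # the 20th birthday
```

Because age is derived, it can never disagree with the stored date_of_birth.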
Relationship
The association among entities is called a relationship. For example, an
employee works_at a department, a student enrolls in a course. Here, Works_at
and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a
relationship too can have attributes. These attributes are called descriptive
attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the
relationship.
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
Participation Constraints
• Total Participation − Each entity is involved in the relationship. Total
participation is represented by double lines.
• Partial participation − Not all entities are involved in the relationship. Partial
participation is represented by single lines.
Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of
entities to which another entity can be related via a relationship set.
o It is most useful in describing binary relationship sets, although it can also
help describe relationship sets that involve more than two entity sets.
o For a binary relationship set R between entity sets E1 and E2, there are four
possible mapping cardinalities. These are as follows:
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2,
and an entity in E2 is associated with at most one entity in E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in
E2, and an entity in E2 is associated with at most one entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2,
and an entity in E2 is associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities
in E2, and an entity in E2 is associated with any number of entities in E1.
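The four cardinalities can be told apart mechanically. The function below is a minimal sketch, not a standard library routine: it takes a relationship instance as hypothetical (e1, e2) pairs and reports the tightest cardinality that the instance fits. A real mapping constraint is declared on the schema; an instance can only show which constraints are not violated.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship given as (e1, e2) pairs."""
    e2_per_e1 = defaultdict(set)
    e1_per_e2 = defaultdict(set)
    for e1, e2 in pairs:
        e2_per_e1[e1].add(e2)
        e1_per_e2[e2].add(e1)

    # True if some E1 entity is linked to more than one E2 entity.
    e1_has_many = any(len(s) > 1 for s in e2_per_e1.values())
    # True if some E2 entity is linked to more than one E1 entity.
    e2_has_many = any(len(s) > 1 for s in e1_per_e2.values())

    if e1_has_many and e2_has_many:
        return "many-to-many"
    if e1_has_many:
        return "one-to-many"
    if e2_has_many:
        return "many-to-one"
    return "one-to-one"


print(mapping_cardinality([("dept_A", "emp_1"), ("dept_A", "emp_2")]))
print(mapping_cardinality([("emp_1", "dept_A"), ("emp_2", "dept_A")]))
print(mapping_cardinality([("s1", "c1"), ("s1", "c2"), ("s2", "c1")]))
print(mapping_cardinality([("person_1", "passport_1")]))
```

The four calls illustrate one-to-many, many-to-one, many-to-many, and one-to-one, in that order.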
ER Diagram Symbols and Notations
Entity
An entity can be a person, place, event, or object that is relevant to a given system. For
example, a school system may include students, teachers, major courses, subjects, fees, and
other items. Entities are represented in ER diagrams by a rectangle and named using singular
nouns.
Weak Entity
A weak entity is an entity that depends on the existence of another entity. In
more technical terms, it can be defined as an entity that cannot be identified by
its own attributes. It uses a foreign key combined with its attributes to form the
primary key. An entity like order item is a good example of this. The order item
is meaningless without an order, so it depends on the existence of the order.
• A strong entity set is an entity set that contains sufficient attributes to uniquely identify
all its entities.
• In other words, a primary key exists for a strong entity set.
• Primary key of a strong entity set is represented by underlining it
• Difference between Strong and Weak Entity:
    Strong Entity                            Weak Entity
4.  The relationship between two strong      The relationship between one strong and
    entities is represented by a single      one weak entity is represented by a
    diamond.                                 double diamond.
5.  A strong entity may or may not have      A weak entity always has total
    total participation.                     participation.
Concepts
Tables − In the relational data model, relations are saved in the format of tables. This
format stores the relation among entities. A table has rows and columns, where rows
represent records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is
called a tuple.
Relation instance − A finite set of tuples in the relational database system
represents relation instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name),
attributes, and their names.
Relation key − Each relation has one or more attributes, known as the relation key, which
can identify a row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as
attribute domain.
CODD'S RULES
Dr Edgar F. Codd, after his extensive research on the Relational Model of database
systems, came up with twelve rules of his own, which according to him, a database must
obey in order to be regarded as a true relational database.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations
as input and yields instances of relations as output. It uses operators to perform
queries. An operator can be either unary or binary. They accept relations as their
input and yield relations as their output. Relational algebra is performed recursively
on a relation and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename
We will discuss all these operations in the following sections.
Additional operations include −
• Set intersection
• Assignment
• Natural join
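The fundamental operations listed above can be sketched in a few lines of Python. This is an illustrative toy, not how a DBMS implements them: a relation is modelled as a list of dictionaries without duplicate rows, and the relations books and articles are made-up sample data.

```python
def select(relation, predicate):
    """Select: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Project: keep only the named attributes and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def union(r, s):
    """Union: rows that appear in r, in s, or in both (r has no duplicates)."""
    return r + [row for row in s if row not in r]


def difference(r, s):
    """Set difference: rows of r that do not appear in s."""
    return [row for row in r if row not in s]


def cartesian_product(r, s):
    """Cartesian product: every row of r paired with every row of s.

    Assumes r and s have no attribute names in common.
    """
    return [{**a, **b} for a in r for b in s]


def rename(relation, old, new):
    """Rename: give attribute `old` the name `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in relation]


books = [
    {"subject": "database", "author": "Navathe"},
    {"subject": "networks", "author": "Tanenbaum"},
]
articles = [
    {"subject": "database", "author": "Codd"},
    {"subject": "database", "author": "Navathe"},
]

# Operators compose because every operator takes relations and returns a relation.
database_authors = project(
    select(union(books, articles), lambda row: row["subject"] == "database"),
    ["author"],
)
print(database_authors)
print(difference(articles, books))
```

The nested call shows the procedural nature of relational algebra: the query spells out the order in which union, select, and project are applied.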
Relational Calculus
In contrast to Relational Algebra, Relational Calculus is a non-procedural query
language, that is, it states what result is wanted but does not explain how to compute it.
Relational calculus exists in two forms −
Tuple Relational Calculus (TRC)
Filtering variable ranges over tuples
Notation − {T | Condition}
Returns all tuples T that satisfy a condition.
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns the 'name' of each tuple in Author whose article is
'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
{ R | ∃T ∈ Author (T.article = 'database' AND R.name = T.name) }
Output − The above query will yield the same result as the previous one.
Domain Relational Calculus (DRC)
In DRC, the filtering variable uses the domain of attributes instead of entire tuple
values (as done in TRC, mentioned above).
Notation −
{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }
Where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example −
{ < article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject =
'database' }
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where
subject is database.
Just like TRC, DRC can also be written using existential and universal quantifiers.
DRC also involves relational operators.
The expression power of Tuple Relation Calculus and Domain Relation Calculus is
equivalent to Relational Algebra.
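Both calculi read much like set comprehensions, which gives a rough way to experiment with them in Python. The sample rows below are invented for illustration; only the relation names Author and TutorialsPoint and the attribute names come from the examples above.

```python
# Hypothetical instance of the Author relation used in the TRC example.
author = [
    {"name": "Codd", "article": "database"},
    {"name": "Knuth", "article": "algorithms"},
    {"name": "Date", "article": "database"},
]

# TRC: { T.name | Author(T) AND T.article = 'database' }
# The variable t ranges over whole tuples.
trc_result = {t["name"] for t in author if t["article"] == "database"}

# Hypothetical instance of TutorialsPoint as (article, page, subject) rows.
tutorials_point = [
    ("Intro to SQL", 12, "database"),
    ("Sorting", 40, "algorithms"),
]

# DRC: the variables article, page, and subject range over attribute values.
drc_result = {
    (article, page, subject)
    for (article, page, subject) in tutorials_point
    if subject == "database"
}

print(trc_result)
print(drc_result)
```

In both cases the comprehension states the condition the result must satisfy and leaves the evaluation order to the language, which mirrors the non-procedural character of relational calculus.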
Mapping Entity
An entity is a real-world object with some attributes.
Mapping Process
1. Domain constraints
2. Key constraints
3. Entity Integrity constraints
4. Referential integrity constraints
Let us discuss each of the above constraints in detail.
1. Domain constraints :
1. Every domain must contain atomic values (smallest indivisible units), which means
composite and multi-valued attributes are not allowed.
2. We perform a datatype check here, which means that when we assign a data type to a
column, we limit the values that it can contain. E.g., if we assign the datatype of
attribute age as int, we cannot give it values of any datatype other than int.
2. Key constraints :
1. A key is an attribute or set of attributes that is used to uniquely identify an
entity within its entity set.
2. An entity set can have multiple keys, out of which one key will be the
primary key. A primary key must contain unique values and cannot contain a null
value in the relational table.
Example
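A minimal sketch of domain and key constraints, using Python's built-in sqlite3 module. The student table and its values are hypothetical. SQLite is loosely typed and, unlike most relational systems, does not by itself forbid NULL in a non-integer primary key, so the sketch states the domain with CHECK and adds NOT NULL explicitly; other systems may need neither workaround.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The domain of age is enforced with CHECK; the key must be unique and not null.
conn.execute(
    """
    CREATE TABLE student (
        roll_no TEXT PRIMARY KEY NOT NULL,
        age INTEGER CHECK (typeof(age) = 'integer' AND age >= 0)
    )
    """
)
conn.execute("INSERT INTO student VALUES ('S1', 19)")


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Domain constraint: age must be an integer.
domain_violation = is_rejected("INSERT INTO student VALUES ('S2', 'nineteen')")
# Key constraint: the primary key value must be unique...
duplicate_key = is_rejected("INSERT INTO student VALUES ('S1', 20)")
# ...and must not be null.
null_key = is_rejected("INSERT INTO student VALUES (NULL, 21)")
print(domain_violation, duplicate_key, null_key)
```

All three inserts are rejected, so the table still holds only the one valid row.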
Functional Dependency
The functional dependency is a relationship that exists between two attributes or sets of
attributes. It typically exists between the primary key and a non-key attribute within a table.
1. X → Y
The left side of the FD is known as the determinant, and the right side is
known as the dependent.
For example:
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee
table because if we know the Emp_Id, we can tell that employee name associated
with it.
1. Emp_Id → Emp_Name
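Whether a functional dependency is violated by a given set of rows can be checked directly. The helper fd_holds and the employee rows below are hypothetical, written only to illustrate the definition. Note that sample data can disprove a dependency but cannot prove it, since a functional dependency is a rule about every valid state of the table.

```python
def fd_holds(rows, determinant, dependent):
    """Check whether determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # Two rows that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


employee = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Dept": "Sales"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Dept": "Sales"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Dept": "Stores"},
]

print(fd_holds(employee, ["Emp_Id"], ["Emp_Name"]))  # no two rows share an Emp_Id
print(fd_holds(employee, ["Emp_Name"], ["Dept"]))    # "Asha" maps to two departments
```

The first check succeeds and the second fails, because the two employees named Asha work in different departments.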
Example:
1. ID → Name,
2. Name → DOB
Normalization
1) Normalization is the process of organizing the data in the database.
2) Normalization is used to minimize the redundancy in a relation or
set of relations. It is also used to eliminate undesirable
characteristics like insertion, update, and deletion anomalies.
3) Normalization divides a larger table into smaller tables and links
them using relationships.
4) Normal forms are used to reduce redundancy in the database
tables.
Purpose of Normalization
Normalization is the process of structuring and handling the relationships between
data to minimize redundancy in relational tables and to avoid insertion, update,
and deletion anomalies in the database. It helps to
divide large database tables into smaller tables and to define relationships between
them. It can remove redundant data and make it easier to add, manipulate, or delete table
fields.
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
Third Normal Form (3NF)
o A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency of a non-prime attribute on a key.
o 3NF is used to reduce the data duplication. It is also used to achieve the data
integrity.
A relation is in third normal form if it holds at least one of the following conditions for
every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
Boyce-Codd Normal Form (BCNF)
BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for
every non-trivial functional dependency X -> Y, X is a super key of the table.
The table is not in BCNF as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF, we can break the table into three tables
like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept_mapping table:
emp_id emp_dept
1001 stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as the left side of both functional dependencies is a
key.
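The BCNF test can be sketched with an attribute-closure computation. The functions closure and is_bcnf are illustrative helpers, not part of any library, and the check is simplified: it looks only at the listed functional dependencies whose attributes all lie inside the table, not at every dependency implied by them. The attribute set of the original, undecomposed table is assumed to be the union of the attributes in the two dependencies above.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result


def is_bcnf(table, fds):
    """True if every listed non-trivial FD on the table has a super key on the left."""
    for left, right in fds:
        applies = left <= table and right <= table
        trivial = right <= left
        if applies and not trivial and not table <= closure(left, fds):
            return False
    return True


fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]

original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
emp_nationality = {"emp_id", "emp_nationality"}
emp_dept = {"emp_dept", "dept_type", "dept_no_of_emp"}
emp_dept_mapping = {"emp_id", "emp_dept"}

print(is_bcnf(original, fds))
print([is_bcnf(t, fds) for t in (emp_nationality, emp_dept, emp_dept_mapping)])
```

The original table fails the check because emp_id alone does not determine every attribute, while each of the three decomposed tables passes.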
21 Math Singing
34 Chemistry Dancing
3. The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and
HOBBY.
4. In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So
there is a multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.
5. So to bring the above table into 4NF, we can decompose it into two tables:
6. STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
7. STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
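To see that nothing is lost by this decomposition, the two tables can be joined back on STU_ID. The sketch below copies the rows of STUDENT_COURSE and STUDENT_HOBBY from above into Python tuples and performs the natural join by hand.

```python
student_course = [(21, "Computer"), (21, "Math"), (34, "Chemistry")]
student_hobby = [(21, "Dancing"), (21, "Singing"), (34, "Dancing")]

# Natural join on STU_ID: pair every course of a student with every hobby
# of the same student.
student = sorted(
    (stu_id, course, hobby)
    for stu_id, course in student_course
    for hobby_stu_id, hobby in student_hobby
    if stu_id == hobby_stu_id
)

for row in student:
    print(row)
```

Student 21 contributes 2 × 2 = 4 rows and student 34 contributes one, which is exactly the repetition that the single STUDENT table had to store and that the two smaller tables avoid.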
4. In the above table, John takes both the Computer and Math classes for Semester 1,
but he doesn't take the Math class for Semester 2. In this case, the combination of all
these fields is required to identify valid data.
5. Suppose we add a new semester, Semester 3, but do not know the
subject or who will be taking that subject, so we would leave Lecturer and Subject
as NULL. But all three columns together act as the primary key, so we can't
leave the other two columns blank.
6. So to bring the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:
7. P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
8. P2
SUBJECT LECTURER
Computer Anshika
Computer Rohit
Math Rohit
9. P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 Rohit
Semester 1 Rohit