DBMS
BASIC TERMINOLOGIES
Database: A database is a collection of data organized in a particular way to enable its users
to easily manage and update the data.
DBMS: A DBMS (Database Management System) is software used to define, create, update and control the data in a database.
Examples: XML files, MS ACCESS.
DBMS vs RDBMS:
- A DBMS stores data as files; in an RDBMS, data is stored in the form of tables.
- A DBMS has lower software and hardware requirements; an RDBMS has higher hardware and software requirements.
- In a DBMS, data redundancy can occur; in an RDBMS, keys and indexes do not allow data redundancy.
DATABASE MODEL:
A Database model defines how data will be stored, accessed and updated in a database
management system.
Hierarchical Model
Network Model
Entity-relationship Model
Relational Model
Hierarchical model:
In the hierarchical model, data is organised into a tree-like structure with one-to-many relationships between
data. For example, one department can have many courses. Each child node has only one parent.
Mahesh Kumar S
2
Network Model:
This is an extension of the hierarchical model. In this model data is organised more like a graph:
a child node may have more than one parent node, allowing many-to-many relationships between data.
ER model: The Entity-Relationship model is used to represent relationships in pictorial form. This
model is good for designing a database, and the design can then be converted into tables in the relational model.
Let's take an example, If we have to design a School Database, then Student will be an entity with
attributes name, age, address etc. As Address is generally complex, it can be another entity with
attributes street name, pin code, city etc., and there will be a relationship between them.
RELATIONAL MODEL:
In this model, data is organised in two-dimensional tables. This model is widely used.
A file-based system stores data in operating-system files. For example, NTFS (New Technology File System) is the Windows file system, and EXT
(Extended File System) is the Linux file system.
Disadvantages of a file-based system:
1. Data Redundancy:
The same information may be duplicated in different files. This leads to data redundancy and wastes memory.
2. Data Inconsistency:
Because of data redundancy, the data may not be in a consistent state.
Note: Data consistency means any change made to one copy of the data is immediately reflected in
all other copies; this is possible only when data has no redundancy.
3. Integrity Problems:
Data integrity means that the data contained in the database is correct and consistent. But
consistency is not possible in a file-based system, so integrity problems occur.
4. Data Isolation:
Data are scattered across various files, different files may have different formats, and the files may be
stored in different folders. Due to this data isolation, it is difficult to share data among different
applications.
5. Fixed Queries
Any query or report needed by the organization has to be developed by an application programmer.
Advantages of DBMS:
1. Reduced Data Redundancy
In a DBMS there is a single database, and any change in it is reflected immediately. Because of
this, there is no chance of encountering duplicate data.
2. Data Consistency
Data consistency is ensured in a database because there is no data redundancy. Any changes made
to the database are immediately reflected to all the users and there is no data inconsistency.
3. Data Integrity
Data integrity means that the data is correct and consistent in the database. A DBMS may manage
multiple databases, and data integrity can be ensured in all of them.
4. Sharing of Data
In DBMS, the users of the database can share the data among themselves. There are various levels
of authorisation to access the data. Many remote users can also access the database simultaneously
and share the data between themselves.
5. Data Security
Data security is a vital concept in a database. Only authorised users should be allowed to access the
database.
6. Backup and Recovery
A database management system automatically takes care of backup and recovery. The users don't
need to back up data periodically because this is taken care of by the DBMS. It also restores the
database after a crash or system failure.
Disadvantage of DBMS:
1. Increased Cost
2. Maintenance Cost
3. Frequent Upgrade needed
Three schema Architecture
The three schema architecture is used to separate the user applications and physical database. The
three schema architecture contains three-levels.
1. Internal Level
- describes the physical storage structure of the database.
- defines how data will be stored in blocks.
- the internal level is also known as the physical level.
2. Conceptual Level
-describes what data are to be stored in the
database and also describes what relationship
exists among those data.
-Conceptual level is also known as logical level.
-Programmers and database administrators
work at this level.
3. External Level
- At the external level, a database contains
several schemas called subschemas.
- A subschema describes the part of the database that is of interest to a particular user group and hides the remaining
database from that group.
- The external schema is also known as the view schema.
ER MODEL
- used to represent relationships in pictorial form, so different clients can easily understand the design.
This model is good for designing a database; the design can then be converted into tables in the relational model.
An entity represents an object with a physical existence (a particular person, car, house, or
employee) or an object with a conceptual existence (a company, a job, or a university course).
For example, in a school database, students, teachers and courses offered can be considered as
different entities. All these entities have some attributes.
An entity set contains entities whose attributes share similar values. For example, a Students set may
contain all the students of a school; likewise, a Teachers set may contain all the teachers of a school.
Relationship
- participation (whether all the rows (tuples) of an entity participate in the relationship or not)
Types of relationship:
- one to one
- one to many
- many to one
- many to many
ENTITY
Weak Entity
- is an entity that depends on another entity.
- Unlike a strong entity, a weak entity does not have a primary key of its own.
- In the ER model, weak entities are represented with a double rectangular box.
- A weak entity always has total participation, but a strong entity may not have total participation.
Example:
A company may store the information of dependents (parents, children, spouse) of an employee,
but the dependents have no existence without the employee. So Dependent is a weak entity
type, and Employee is a strong entity type for Dependent.
Types of Attribute:
1. composite (first name, last name) & simple attribute (name)
2. single-valued (PAN no) & multivalued attribute (mobile no)
3. derived attribute (age)
4. complex attribute (address)
Generalization
Generalization is a bottom-up approach in which two or more lower level entities with common attributes are combined to form a higher level entity.
Example:
Let's say we have two entities, Student and Teacher.
Attributes of entity Student: Name, Address & Grade
Attributes of entity Teacher: Name, Address & Salary
These two entities have two common attributes, Name and Address, so we can make a generalized
entity with these common attributes.
After generalization:
Specialization
-Specialization is a top-down approach, and it is opposite to Generalization.
- In specialization, one higher level entity can be broken down into two lower level entities.
Aggregation
In aggregation, a relationship between two entities is treated as a single entity.
Ex:
Let's consider a situation where a manager manages employees as well as projects.
Before aggregation:
After Aggregation:
A manager not only manages the employees working under them but has to manage the project as
well. In such a scenario, if the entity "manager" makes a "manages" relationship with either the
"employee" or the "project" entity alone, it will not make sense, because the manager has to manage
both. In our example, the relationship "works-on" between "employee" and "project" acts as one
entity, which has a "manages" relationship with the entity "manager".
RELATIONAL MODEL:
In this model, data is organised in two-dimensional tables containing rows and columns. This
model is widely used.
Keys are used to identify relationships between tables and to uniquely identify any row of
data inside a table. A key can be a single attribute or a group of attributes.
1. Super Key
A super key is a set of columns that can uniquely identify each row within a table. A super key is a
superset of a candidate key.
2. Candidate Key
A candidate key is a minimal super key: a set of attributes that uniquely identifies each row, with no
redundant attribute in the set.
3. Primary Key
A primary key is a candidate key chosen to uniquely identify each row (tuple) in a table. A primary
key cannot hold NULL values, and a table may contain only one primary key.
4. Foreign Key
A foreign key is a column of a table that points to the primary key of another table. It is used to
describe the relationship between two tables.
For example, in a company every employee works in a specific department. Here employee and
department are two different entities, so we can't store the information of the department in the
employee table.
Instead, we add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE
table. Now Department_Id in the EMPLOYEE table is a foreign key, and the two tables are
related.
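This foreign-key relationship can be sketched with Python's built-in sqlite3 module. A minimal sketch: the table and column names follow the example above, while the data values are made up for illustration.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE DEPARTMENT (Department_Id INTEGER PRIMARY KEY, Dept_Name TEXT)")
conn.execute("""CREATE TABLE EMPLOYEE (
    Emp_Id INTEGER PRIMARY KEY,
    Emp_Name TEXT,
    Department_Id INTEGER REFERENCES DEPARTMENT(Department_Id))""")

conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Sales')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Akon', 10)")      # OK: department 10 exists

try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Bkon', 99)")  # no such department
except sqlite3.IntegrityError as e:
    print("rejected:", e)                                        # FOREIGN KEY constraint failed
```

The second insert is refused because 99 is not a primary key value in DEPARTMENT, which is exactly the referential integrity a foreign key provides.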
5. Composite Key
Composite key is a set of two or more columns that uniquely identify each row in a table.
6. Unique Key
A unique key is a set of one or more columns of a table that uniquely identifies each row. A unique
key can hold a NULL value, and a table can have multiple unique keys.
Note:
Composite means more than one field makes the row unique, so a composite key has more
than one field, while a unique key can have one or more fields. Thus all composite keys are unique keys,
but not all unique keys are composite keys.
A primary key will not accept NULL values, whereas a unique key can accept a NULL value (some
systems allow only one). A table can have only one primary key, whereas there can be multiple unique keys on a table.
Here, Roll Number is the primary key and Citizen_ID is a unique key. The Citizen_ID column should be unique
because each citizen of a country must have a unique identification number, like an Aadhaar
number. But if a student has migrated from another country, he or she would not have any
Citizen_ID, and the entry could hold a NULL value.
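The primary-key/unique-key difference can be demonstrated with sqlite3. A sketch: column names follow the example above; note that how many NULLs a UNIQUE column may hold varies by DBMS (SQL Server allows one, while SQLite and MySQL allow many).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE STUDENT (
    Roll_Number INTEGER PRIMARY KEY,   -- primary key: unique, no NULLs
    Name        TEXT,
    Citizen_ID  TEXT UNIQUE)           -- unique key: NULLs are allowed
""")

conn.execute("INSERT INTO STUDENT VALUES (1, 'Akon', 'AAD-1001')")
conn.execute("INSERT INTO STUDENT VALUES (2, 'Bkon', NULL)")   # migrated student, no Citizen_ID

try:
    conn.execute("INSERT INTO STUDENT VALUES (3, 'Ckon', 'AAD-1001')")  # duplicate Citizen_ID
except sqlite3.IntegrityError as e:
    print("rejected:", e)              # UNIQUE constraint failed: STUDENT.Citizen_ID
```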
Join Operations:
A JOIN is used to combine rows from two or more tables, based on a related column
between them.
Inner Join (Natural Join): An inner join combines two tables and returns only the rows where the join key
exists in both tables.
Outer Join:
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched
records from the right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched
records from the left table
FULL (OUTER) JOIN: Returns all records when there is a match in either left or
right table
CROSS JOIN (Cartesian Join): The CARTESIAN JOIN is also known as CROSS JOIN. In a CARTESIAN
JOIN there is a join for each row of one table to every row of another table. This usually happens
when the matching column or WHERE condition is not specified. i.e., the number of rows in the
result-set is the product of the number of rows of the two tables.
SELF JOIN: As the name signifies, in SELF JOIN a table is joined to itself. In other words we can say
that it is a join on certain condition between two copies of the same table. Here, the table has a
FOREIGN KEY which references its own PRIMARY KEY.Ex: The employee table might be joined to
itself in order to show the manager name and the employee name in the same row.
Equi Join: An INNER join written using a WHERE clause with an equality operator (=) is known as an
equi join. The main difference between a self join and an equi join is that in a self join we
join one table to itself rather than joining two tables. Both self join and equi join are types of INNER
join in SQL.
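The join types above can be tried out with sqlite3. A small sketch with made-up EMPLOYEE/DEPT data; ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Emp_Name TEXT, Dept_Id INTEGER);
CREATE TABLE DEPT (Dept_Id INTEGER, Dept_Name TEXT);
INSERT INTO EMPLOYEE VALUES ('Akon', 1), ('Bkon', 2), ('Ckon', NULL);
INSERT INTO DEPT VALUES (1, 'Sales'), (3, 'HR');
""")

# Inner join: only rows whose Dept_Id exists in both tables.
print(conn.execute("""SELECT Emp_Name, Dept_Name FROM EMPLOYEE
                      JOIN DEPT USING (Dept_Id) ORDER BY Emp_Name""").fetchall())
# [('Akon', 'Sales')]

# Left outer join: every employee; unmatched departments come back as NULL (None).
print(conn.execute("""SELECT Emp_Name, Dept_Name FROM EMPLOYEE
                      LEFT JOIN DEPT USING (Dept_Id) ORDER BY Emp_Name""").fetchall())
# [('Akon', 'Sales'), ('Bkon', None), ('Ckon', None)]

# Cross join: Cartesian product, 3 rows x 2 rows = 6 rows.
print(len(conn.execute("SELECT * FROM EMPLOYEE CROSS JOIN DEPT").fetchall()))  # 6
```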
Table A:
Table B:
Natural Join (Inner Join): The result contains only the common rows of A and B (rows whose foreign key
in A is found as a primary key in B).
Natural join of the above tables:
Left Outer Join: The result contains all rows from the left table together with their matching rows from
the right table. If a foreign key of the left table has no matching primary key in the right table,
the right table's columns for that row are filled with NULL values.
Integrity Constraints
Data Integrity:
Data integrity means that the data is correct and consistent in the database.
Note: Data consistency means any change made to one copy of the data is immediately
reflected in all other copies; this is possible only when data has no redundancy.
Example: Let us imagine a customer database with two tables,
'customer_table' (customer_id, customer_name, purchase_id) and
'purchase_table' (purchase_id, purchased_item). These two tables are related by purchase_id;
therefore, whenever a purchase is made by a customer, the data of the purchased item is
stored in the purchase_table.
Integrity constraints are a set of rules that must be followed while entering data into a
database table.
Key Integrity Constraints: used to maintain the integrity of the different keys available in a
table.
The types of key constraints:
Primary key constraint - uniquely identifies each row in a table; cannot contain NULL values.
Unique key constraint - ensures unique/distinct values in the specified columns.
Foreign key constraint - ensures the referential integrity of the relationship.
NOT NULL constraint - ensures that the specified column doesn't contain a NULL value.
Check constraint - checks a predefined condition before inserting data into the
table.
DEFAULT constraint - provides a default value for a column if none is specified.
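These constraints can be declared directly in a table definition. A sqlite3 sketch; the column names, the age limit and the default city are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE STUDENT (
    Roll_No INTEGER PRIMARY KEY,        -- key constraint: unique, identifies the row
    Name    TEXT NOT NULL,              -- NOT NULL constraint
    Age     INTEGER CHECK (Age >= 5),   -- CHECK constraint
    City    TEXT DEFAULT 'Chennai')     -- DEFAULT constraint
""")

conn.execute("INSERT INTO STUDENT (Roll_No, Name, Age) VALUES (1, 'Akon', 17)")
print(conn.execute("SELECT City FROM STUDENT").fetchone())  # ('Chennai',) via DEFAULT

try:
    conn.execute("INSERT INTO STUDENT VALUES (2, 'Bkon', 3, 'Delhi')")  # Age fails the CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```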
Functional Dependency
A functional dependency is a relationship that exists between two attributes. If column A
of a table uniquely determines column B of the same table, it is represented as A -> B
(attribute B is functionally dependent on attribute A).
Functional dependencies are used in normalization to reduce data redundancy.
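A functional dependency A -> B can be checked mechanically: it holds if no two rows agree on A but differ on B. A small Python sketch (the student rows are made up):

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts (one per tuple); lhs, rhs: tuples of attribute names.
    The FD holds iff no two rows agree on lhs but differ on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

students = [
    {"rollno": 401, "name": "Akon", "branch": "CSE", "hod": "Mr. X"},
    {"rollno": 402, "name": "Bkon", "branch": "CSE", "hod": "Mr. X"},
    {"rollno": 403, "name": "Ckon", "branch": "IT",  "hod": "Mr. Y"},
]

print(holds(students, ("branch",), ("hod",)))   # True: branch -> hod
print(holds(students, ("hod",), ("rollno",)))   # False: same hod, different rollno
```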
Multivalued dependency in DBMS: one attribute determines more than one attribute, and the
determined attributes have no dependency between each other.
Transitive dependency:
A transitive functional dependency happens when a functional dependency is indirectly formed by
two functional dependencies.
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency:
{Company} -> {Age} should hold, which makes sense: if we know the company name, we can
know its CEO's age.
Partial Dependency
If the proper subset of candidate key determines non-prime attribute, is called partial dependency
Normalization
Keyword:
Data redundancy - the same data occurring in different places.
Data consistency - any change made to one copy of the data is immediately
reflected to all the users and all other copies; this is possible only when data has no redundancy.
Not consistent - e.g. the age of person X is different in different tables.
Anomaly - something that differs from what is expected/normal.
Non-prime attribute - an attribute that is not part of any candidate key.
Proper subset of a candidate key - if AB is a candidate key, the proper subsets of the candidate key are {A} and {B}.
Partial dependency - if a proper subset of a candidate key determines a non-prime attribute, it is
called a partial dependency.
Normalization of Database
Normalization is a method of organizing the data in the database which helps you to avoid data
redundancy and insertion, update & deletion anomalies.
The table above is not normalized: if the HOD of the CSE branch changes, the value must be updated
in several rows, so there is a chance of data inconsistency. The table is not in 3NF, because non-key
attributes determine other non-key attributes:
branch -> hod
branch -> office_tel
The normalized version of the above table:
Rollno Name Branch id
401 Akon 101
402 Bkon 101
403 Ckon 101
404 Dkon 101
Type of Anomalies:
There are three types of anomalies: insertion, update and deletion anomalies.
Insertion Anomaly
Unable to add new data to the database due to absence of other data .
-Let's say we have a table with 4 columns: Student ID, Student Name, Student Address and
Student Grades. When a new student enrols in the school, the first three attributes can be
filled, but the 4th attribute will be NULL because the student doesn't have any marks yet. If Student
Grades does not accept NULL values, this student's data cannot be entered into the database.
This results in database inconsistencies.
Updation Anomaly
- occurs when we forget to update the value of a data item in multiple places.
Let's say we have 10 columns in a table, of which 2 are Employee Name and Employee
Address. If an employee changes location, we have to update the table. If the
table is not normalized, one employee can have multiple entries (data redundancy), and while
updating, one of them might get missed. This leads to data inconsistency.
Deletion Anomaly
-Deletion Anomalies happen when the deletion of unwanted information causes important
information to be deleted as well.
For example, suppose a single database record contains information about a particular product along with
information about a salesperson of the company. If the salesperson quits and the record is deleted,
information about the product is deleted along with the salesperson information.
Type of Normalization
First Normal Form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple
values. It should hold only atomic value(single value).
1 normalized form:
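The move to 1NF can be sketched in Python: a multi-valued subjects column is split so that every cell holds a single atomic value (the student data is made up for illustration).

```python
# Un-normalized: the multi-valued attribute `subjects` violates 1NF.
unnormalized = [
    {"rollno": 401, "name": "Akon", "subjects": ["DBMS", "OS"]},
    {"rollno": 402, "name": "Bkon", "subjects": ["DBMS"]},
]

# Decompose: one row per (student, subject), so every cell is atomic.
first_nf = [
    {"rollno": r["rollno"], "name": r["name"], "subject": s}
    for r in unnormalized
    for s in r["subjects"]
]

for row in first_nf:
    print(row)
# {'rollno': 401, 'name': 'Akon', 'subject': 'DBMS'}
# {'rollno': 401, 'name': 'Akon', 'subject': 'OS'}
# {'rollno': 402, 'name': 'Bkon', 'subject': 'DBMS'}
```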
Transitive dependency: non-key attribute -> non-key attribute
BCNF (Boyce-Codd Normal Form) example:
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, since the left side of each functional dependency is a key.
Denormalization in Databases
Denormalization is the process of increasing the redundancy in the database.
It is the opposite process of normalization.
It is mostly done for improving the performance.
Denormalization adds redundant data to a normalized database to reduce the cost of database
queries that combine data from various tables into a single result.
Denormalization method:
Adding Redundant columns:
In this method, only the redundant column which is frequently used is added to the main table.
Example:
We have to generate a report showing employee details along with the department name. Here we
need to join EMPLOYEE with DEPT to get the department name. But joining the huge EMPLOYEE and
DEPT tables will affect the performance of the query, and we cannot merge DEPT into EMPLOYEE.
In this case, we can add the redundant column DEPT_NAME to EMPLOYEE, which avoids the join of
the EMPLOYEE table with the DEPT table and increases performance.
Pros of Denormalization:-
1. Retrieving data is faster.
2. Queries to retrieve can be simpler(and therefore less likely to have bugs), since we need to
look at fewer tables.
Cons of Denormalization:-
1. Updates and inserts are more expensive.
2. Data may be inconsistent.
TRANSACTION
• A transaction is a unit of work consisting of logically related operations (such as read and write).
For example, you are transferring money from your bank account to your friend’s account, the set of
operations would be like this:
1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend’s account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account
This whole set of operations can be called a transaction.
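Grouping the six steps into one atomic unit is exactly what a database transaction does. A sqlite3 sketch: the account names and balances follow the example style, while the 0-600 balance cap is an artificial CHECK added only to force a failure; the `with conn:` block commits on success and rolls back on any error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ACCOUNT (
    Name TEXT PRIMARY KEY,
    Balance INTEGER CHECK (Balance BETWEEN 0 AND 600))""")
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [("A", 600), ("B", 300)])

def transfer(src, dst, amount):
    try:
        with conn:  # one transaction: commit on success, rollback on any failure
            conn.execute("UPDATE ACCOUNT SET Balance = Balance - ? WHERE Name = ?", (amount, src))
            conn.execute("UPDATE ACCOUNT SET Balance = Balance + ? WHERE Name = ?", (amount, dst))
    except sqlite3.IntegrityError:
        pass  # atomicity: the deduction from src already ran, but rollback undoes it

transfer("A", "B", 100)   # commits: A=500, B=400
transfer("A", "B", 300)   # B would exceed the cap: whole transaction rolled back
print(conn.execute("SELECT Name, Balance FROM ACCOUNT ORDER BY Name").fetchall())
# [('A', 500), ('B', 400)]
```

In the failed transfer, the deduction from A succeeds and only the credit to B fails, yet A's balance is restored: the transaction executed "all or nothing".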
Property of Transaction
Atomicity
A transaction cannot occur partially: each transaction is treated as one unit. Atomicity
ensures that a transaction either completes successfully or is not executed at all.
The transaction control manager component is responsible for atomicity.
Rollback: if a transaction fails, none of its changes become visible.
Example: Let's assume transaction T consists of T1 and T2, where A holds Rs 600 and
B holds Rs 300, and T transfers Rs 100 from account A to account B.
T1            T2
Read(A)       Read(B)
A := A - 100  B := B + 100
Write(A)      Write(B)
If transaction T fails after the completion of T1 but before the completion of T2, then the amount
will be deducted from A but not added to B. This leaves the database in an inconsistent state.
To ensure the correctness of the database state, the transaction must be executed in its entirety.
Consistency
• It ensures that the system is consistent before and after the transaction.
• The user/application programmer is responsible for maintaining the consistent state.
For example: The total amount must be maintained before or after the transaction.
1. Total before T occurs = 600+300=900
2. Total after T occurs= 500+400=900
Isolation
• It ensures that no transaction is affected by other concurrently executing transactions.
• Under isolation, if transaction T1 is being executed and is using data item X, then that data
item can't be accessed by any other transaction T2 until T1 ends.
• The concurrency control manager component is responsible for the isolation property.
Durability
It ensures that all updates made by a committed transaction become permanent in the database,
irrespective of hardware or software failures.
The recovery manager component is responsible for durability.
States of Transaction
In a database, the transaction can be in one of the following states -
Active state
• The active state is the first state of every transaction. In this state, the transaction is being
executed.
• For example: insertion, deletion or updating of a record is done here, but the changes are
still not saved to the database.
Partially committed
• In the partially committed state, a transaction executes its final operation but the data is still
not saved to the database.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully. In this
state, all the effects are now permanently saved on the database system.
Failed state
• If any query issued against the database system fails, the transaction is said to be in the
failed state.
• In the example of total mark calculation, if the database is not able to fire a query to fetch
the marks then the transaction will fail to execute.
Aborted - if a transaction reaches the failed state, the recovery manager rolls back all its write
operations on the database to bring the database back to its original state.
Schedule
Transactions are sets of operations on the database. When multiple transactions run concurrently,
their operations must be arranged into a single sequence of execution. This sequence of operations
is known as a schedule.
1. Serial Schedule
In a serial schedule, one transaction is executed completely before the execution of another transaction begins.
This type of execution is also known as non-interleaved execution.
2. Non-serial Schedule
In a non-serial schedule,
multiple transactions execute concurrently:
the operations of all the transactions are interleaved or mixed with each other.
a. Serializable schedule
A non-serial schedule of n transactions is said to be serializable only when it is equivalent to some serial
schedule of those transactions. Here, multiple transactions can execute concurrently.
Conflict Serializable: A schedule is called conflict serializable if it can be transformed into a serial
schedule by swapping adjacent non-conflicting operations.
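Conflict serializability is usually tested with a precedence graph: draw an edge T1 -> T2 for every pair of conflicting operations where T1's operation comes first; the schedule is conflict serializable iff the graph has no cycle. A small Python sketch (the sample schedules are made up):

```python
def conflict_serializable(schedule):
    """schedule: list of (transaction, action, item) triples, e.g. ("T1", "R", "X").

    Two operations conflict if they come from different transactions, touch the
    same item, and at least one of them is a write ("W").
    """
    edges = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):
                edges.add((t1, t2))   # t1's conflicting op precedes t2's

    # The schedule is conflict serializable iff the precedence graph is acyclic:
    # repeatedly remove transactions that no remaining transaction points to.
    remaining = {t for t, _, _ in schedule}
    while remaining:
        free = [t for t in remaining if not any((u, t) in edges for u in remaining)]
        if not free:
            return False              # a cycle remains
        remaining -= set(free)
    return True

serial = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
cyclic = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
print(conflict_serializable(serial))  # True
print(conflict_serializable(cyclic))  # False
```

In the cyclic example, T1's read precedes T2's write (edge T1 -> T2) and T2's write precedes T1's write (edge T2 -> T1), so the graph has a cycle and the schedule is not conflict serializable.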
View Serializable: To check whether a given schedule is view serializable, we need to check whether
the given schedule is View Equivalent to its serial schedule.
Two schedules S1 and S2 are said to be view equivalent if they satisfy all the following conditions:
Initial Read: Initial read of each data item in transactions must match in both schedules.
Final Write: Final write operations on each data item must match in both the schedules.
Update Read: if, in schedule S1, T1 performs a read operation on X after the write
operation on X by T2, then in S2, T1 should also read X after T2 performs its write on X.
4. Non-Serializable Schedules
A non-serial schedule that is not serializable is called a non-serializable schedule.
A non-serializable schedule may or may not be consistent, and may or may not be recoverable.
Concurrency Control:
Concurrency control enables multiple users to access a shared database at the same time.
When multiple users (transactions) access the database at the same time and at least
one of them is updating data, there is a chance of conflicts, which can result in data
inconsistencies.
Ex: Assume two people go to two different ticket machines at the same time to buy a
movie ticket for the same movie and the same show time.
However, only one seat is left for that show in that particular theatre. Without
concurrency control, both persons could end up purchasing a ticket. A concurrency control
method does not allow this to happen: it provides the ticket only to the buyer who completes
the transaction process first.
The phantom read problem occurs when a transaction reads a data item once, but when it tries to
read the same item again, it finds that the item no longer exists.
In the above example, once transaction 2 reads the variable X, transaction 1 deletes X
without transaction 2's knowledge. Thus, when transaction 2 tries to read X again, it is unable to find it.
3. Timestamp protocol
A timestamp is a unique identifier created by the DBMS that indicates the relative starting time of each
transaction.
Condition: TS(T1) < TS(T2) if transaction T1 started before T2.
Deadlock is a situation in which two or more transactions each hold a lock and wait for one another to release
their locks. Real-life examples:
A) Two people call each other at the same time; as a result, both get a busy tone.
B) Exam paper and pen: person A has the pen and person B has the paper. To complete the exam
successfully, person A needs the paper and person B needs the pen, so neither can proceed and
deadlock occurs.
Deadlock can occur only if the following conditions hold simultaneously:
1. Hold and wait: a transaction holds some locks while waiting to acquire others.
2. Mutual exclusion: a lock can be held by only one transaction at a time.
3. No pre-emption: a lock cannot be forcibly taken away from the transaction holding it.
4. Circular wait: a set of transactions wait for each other in a circular chain.
Starvation:
Starvation can be explained with an example. Suppose there are 3 transactions,
T1, T2 and T3, in a database trying to acquire a lock on data item 'I'. Suppose
the scheduler grants the lock to T1 (maybe due to some priority), and the other two transactions
wait for the lock. As soon as the execution of T1 is over, another transaction T4 arrives
and also requests a lock on 'I'. This time the scheduler grants the lock to T4, and T2 and T3 have to
wait again. If new transactions keep requesting the lock in this way, T2 and T3 may have to wait
for an indefinite period of time, which leads to starvation.
INDEXING
An index is a data structure used to locate and access the data in a database
table quickly.
Indexing increases the performance of a database by decreasing the number of disk
accesses required when a query is processed.
Structure of an index:
Type of index:
Dense Index
In a dense index, there is an index record for every search key value in the database. This makes
searching faster but requires more space to store the index records themselves. Each index record
contains the key value and a pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key; typically there is one index
record per data block. To search for a record, we find the index entry with the largest key value that
is less than or equal to the key we are looking for, follow its pointer to the corresponding block, and
then search sequentially within the block until the desired data is found.
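The dense/sparse distinction can be sketched in Python with an ordered data file split into blocks (the block size of 3 and the key values are arbitrary):

```python
from bisect import bisect_right

# Data file: records sorted by key, grouped into "blocks" of 3.
records = [(k, f"row-{k}") for k in (5, 10, 15, 20, 25, 30, 35, 40, 45)]
blocks = [records[i:i + 3] for i in range(0, len(records), 3)]

# Dense index: one entry per key, pointing straight at the record.
dense = {key: (b, r) for b, block in enumerate(blocks) for r, (key, _) in enumerate(block)}

# Sparse index: one entry per block (the block's first key).
sparse = [block[0][0] for block in blocks]          # [5, 20, 35]

def sparse_lookup(key):
    b = bisect_right(sparse, key) - 1               # last block starting <= key
    for k, row in blocks[b]:                        # sequential scan inside the block
        if k == key:
            return row
    return None

bi, ri = dense[25]
print(blocks[bi][ri])       # (25, 'row-25') via one index probe
print(sparse_lookup(25))    # 'row-25' via block search plus a short scan
```

The dense index answers in a single probe but stores nine entries; the sparse index stores only three entries and pays for it with a short sequential scan inside the block.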
Clustering Index
– A clustered index can be defined on an ordered data file. The index record is created on a
non-primary-key column.
– Rows which have similar characteristics (the same value of that column) are grouped together, and
an index entry is created for each group.
Example: suppose a company has several employees in each department. If we use a
clustering index, all employees who belong to the same Dept_ID are grouped within a
single cluster, and the index pointer points to the cluster as a whole. Here Dept_Id is a non-unique key.
Secondary:
- A secondary index is generated using a candidate key (or another non-ordering field).
A secondary index enables accessing the data file without using the primary key. A secondary index
must be a dense index (an index entry defined for each row), because the order of the search key
values in the index file differs from the physical order of the actual records.
A file can have several secondary indexes.
Multilevel Index
When a large amount of data is stored, the index itself can become very large. A multi-level index
overcomes this problem: the index table is divided into several smaller index tables so that the
outermost level is small enough to fit in a single disk block, and the index can then easily be
accommodated in main memory.
B tree
A B-tree of order m can have at most m - 1 keys and m children.
A B-tree of order m has all the properties of an m-way tree and, in addition, satisfies the following
properties.
B+ Tree
A B+ tree is a balanced search tree that follows a multi-level index format. The leaf nodes of a
B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height,
and is thus balanced.
Additionally, the leaf nodes are connected in a linked list; therefore, a B+ tree can support sequential
access as well as random access.
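The linked leaf level is what makes range (sequential) scans cheap: find the starting leaf, then just follow the links. A toy sketch of the leaf level only (a real B+ tree would locate the first leaf by descending from the root; the keys are made up):

```python
# Leaf "nodes" of a B+-tree-like index: sorted keys plus a link to the next leaf.
leaves = [
    {"keys": [5, 10, 15], "next": 1},
    {"keys": [20, 25, 30], "next": 2},
    {"keys": [35, 40, 45], "next": None},
]

def range_scan(lo, hi):
    """Sequential access: find the leaf covering `lo`, then follow leaf links."""
    out, i = [], 0
    # pick the leaf whose key range covers `lo`
    while leaves[i]["next"] is not None and leaves[i + 1]["keys"][0] <= lo:
        i += 1
    while i is not None:
        out += [k for k in leaves[i]["keys"] if lo <= k <= hi]
        if leaves[i]["keys"] and leaves[i]["keys"][-1] >= hi:
            break                       # past the upper bound: stop scanning
        i = leaves[i]["next"]           # follow the leaf link
    return out

print(range_scan(12, 33))   # [15, 20, 25, 30]
```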
Structure of B+ Tree
Every leaf node is at equal distance from the root node. A B+ tree is of the order n where n is fixed
for every B+ tree.
Internal nodes −
• Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
• At most, an internal node can contain n pointers.
Leaf nodes −
• Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
• At most, a leaf node can contain n record pointers and n key values.
• Every leaf node contains one block pointer P to point to next leaf node and forms a
linked list.
Note:
AVL trees and red-black trees are not used in database systems, because they grow depth-wise
rather than breadth-wise and therefore need more disk accesses. B and B+ trees grow breadth-wise,
so they take less access time compared to other tree data structures.
transaction in undo-list.
All the transactions in the undo-list are undone and their logs are removed. All the transactions in
the redo-list are redone and their logs are saved.