Pdf1 Merged
DBMS = Database Management System
Database -> It is a collection of similar types of data.
Module 1
DBMS:-
It is an integrated set of software used to create, modify, and control a database for the benefit of its users.
Advantages:-
• It reduces data redundancy.
• It allows sharing of data among multiple users.
• It maintains data integrity.
• It maintains data consistency.
• It provides better security for data.
• It provides database backup and recovery methods.
Disadvantages:-
• Problems associated with centralisation.
• Problems associated with database backup and recovery.
• Hardware and software costs are high.
Characteristics of DB approach:-
bit -> character -> field -> record -> file -> DB
1. Bit:-
. A bit is the smallest unit of data representation.
. 8 bits = 1 byte.
2. Character:-
. It is a collection of bits.
001 TV 25,000
002 WM 50,000
003 AC 30,000
SCHEMA
. It is a logical description of the database, drawn as a chart of the types of data that are used.
. It gives the names of the entities and attributes and specifies the relationships between them.
. It is a framework into which the values of data items can be fitted.
Ex:- the logical representation of a Student schema:
Student_name : String;
Student_id : int;
Student_age : int;
(In SQL, don't use spaces in names: write Student_name or studentName.)
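The Student schema above can be tried out as a real table. This is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; mapping String to TEXT and int to INTEGER is an assumption made for SQLite, not something the notes specify.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")

# Column names contain no spaces (Student_name, not "Student name").
conn.execute(
    """
    CREATE TABLE Student (
        Student_id   INTEGER,
        Student_name TEXT,
        Student_age  INTEGER
    )
    """
)

# PRAGMA table_info returns one row per column: (cid, name, type, ...).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Student)")]
print(schema)
```

The printed list is the schema itself (names and types), which is separate from any rows later stored in the table.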
3-Schema Architecture :
i. The architecture of a DBMS is divided into 3 levels: External level, Conceptual level, and Internal
level.
ii. The purpose of the 3-Schema architecture is to separate the user applications from the physical database.
(Figure: 3-Schema architecture)
ER Diagram
. It describes the structure of a database with the help of a diagram.
. An ER diagram shows the logical structure of the database.
. An ER diagram consists of the following symbols :-
rectangle -> (for Entity)
1. Entity:-
It is an object that represents a real-world thing.
Ex: table, chair, etc.
. Entity set:-
It is a collection of entities of the same type.
Ex: “All Doctors”, “All Staff”, etc.
Entities are divided into two types:
i. Weak entity ii. Strong Entity
i. Weak Entity:-
An entity that can’t be uniquely identified by its own attributes and depends on its relationship with
another entity is called a Weak Entity.
Ex: A bank account can’t be uniquely identified without knowing the bank to which the account belongs.
3. RelationShip:-
It is an association between two or more entities. In other words, a relationship shows how two
or more entities are related.
Relationships are divided into 4 types :-
i. One-to-One relationship ii. One-to-Many relationship
iii. Many-to-One relationship iv. Many-to-Many relationship
i. One-to-One :- One entity is associated with exactly one other entity.
Ex:-
CUTM
has staffs
Chatrapur
iii. Many-to-One :- More than one entity is associated with one entity.
Ex:-
iv. Many-to-Many :- More than one entity is associated with more than one entity of another set.
Ex:-
A Data Model gives us an idea of how the final system will look after its complete implementation.
-> A Data Model in DBMS is a set of concepts and tools developed to summarize the description of the
database.
-> It defines how the logical structure of a database is modeled.
-> A Data Model is a collection of conceptual tools for describing :
• Data
• Data relationships
• Data semantics and
• Consistency constraints
-> It describes the design of a database at each level of data abstraction.
-> It defines how data items are connected to each other and how they are processed and stored inside the system.
1. Relational Model :-
-> It is the model most widely used by commercial data processing applications.
-> It uses a collection of tables to represent data and the relationships among those data.
-> Data is stored in tables called relations.
-> Each table is a group of columns and rows, where each column represents an attribute of an entity and each row
represents a record (or tuple).
-> Attributes or Fields :- Each column in a relation is called an attribute. The values of an attribute should
be from the same domain.
Ex:- a student has different attributes such as Student_id, Student_name,
Student_age, etc.
-> Tuples or Records :- Each row in a relation is called a tuple. A tuple is a collection of attribute values,
and no two rows in a relation are identical.
Ex:- each row has all the information about one specific individual; for example, the first row has
information about the student Ashish.
-> This model was first described by Edgar F. Codd in 1969.
Diagrammatically:- Relational Model in DBMS (figure showing attributes as columns)
2. Hierarchical Model :-
-> It was the first DBMS model.
-> In the hierarchical model, data is organized into a tree-like structure in which each record has one
parent record and may have many children.
-> The main drawback of this model is that it can represent only one-to-many relationships between nodes.
-> Hierarchical models are rarely used now.
Diagrammatically:- Hierarchical model in DBMS (figure: a tree with the nodes Electronics, Televisions,
Portable Electronics, and Flash)
3. Network Model :-
-> This model is an extension of the hierarchical model. It was the most widely used database model before the
relational model was introduced.
-> The Network Model is the same as the hierarchical model except that it has a graph-like structure rather than a
tree-based structure, so a node is allowed to have more than one parent.
-> It supports many-to-many data relationships.
Diagrammatically:- Network Model in DBMS (figure: a graph with the nodes College, CSE Department, Library,
and Student)
Keys
There are mainly eight different types of keys in DBMS, and each key has its own functionality:
A Super Key is a set of one or more attributes that uniquely identifies a row in a table. A super key may have
additional attributes that are not needed for unique identification.
Example:
In the above given example, EmpSSN and EmpNum are super keys.
A Primary Key in DBMS is a column or group of columns in a table that uniquely identifies every row in that table.
The primary key can’t contain duplicates, meaning the same value can’t appear more than once in the table. A table
cannot have more than one primary key.
Example :-
In the following example, StudID is a Primary key.
Example:
In this table, StudID, Roll No, and Email are qualified to become the primary key. But since StudID is the primary
key, Roll No and Email become alternate keys.
A Candidate Key is a set of attributes that uniquely identifies tuples in a table. A candidate key is a super key
with no redundant attributes. The primary key should be selected from the candidate keys. Every table must have
at least one candidate key. A table can have multiple candidate keys but only a single primary key.
Example:
In the given table, StudID, Roll No, and Email are candidate keys, which help us to uniquely identify the student
record in the table.
Foreign Key is a column that creates a relationship between two tables. The purpose of foreign keys is to
maintain data integrity and allow navigation between two different instances of an entity. It acts as a cross-
reference between two tables as it references the primary key of another table.
Example:
DeptCode DeptName
001 Science
002 English
003 Computer
In this foreign key example, we have two tables, Teacher and Department, in a school. However, there is no way
to see which teacher works in which department.
By adding DeptCode as a foreign key in the Teacher table, we can create a relationship between the
two tables.
Teacher ID DeptCode Fname Lname
B002 002 David Warner
B017 002 Sara Joseph
B009 001 Miker Brunton
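The two tables above can be wired together in a few lines. This is a minimal sketch using Python's sqlite3 module; the column name TeacherID (without the space), the TEXT types, and the invalid row 'B099' / '999' are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Department (DeptCode TEXT PRIMARY KEY, DeptName TEXT)")
conn.execute(
    """
    CREATE TABLE Teacher (
        TeacherID TEXT PRIMARY KEY,
        DeptCode  TEXT REFERENCES Department(DeptCode),  -- the foreign key
        Fname     TEXT,
        Lname     TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("001", "Science"), ("002", "English"), ("003", "Computer")],
)
conn.executemany(
    "INSERT INTO Teacher VALUES (?, ?, ?, ?)",
    [
        ("B002", "002", "David", "Warner"),
        ("B017", "002", "Sara", "Joseph"),
        ("B009", "001", "Miker", "Brunton"),
    ],
)

# The foreign key lets us look up each teacher's department.
teachers_by_dept = conn.execute(
    """
    SELECT t.Fname, d.DeptName
    FROM Teacher t JOIN Department d ON d.DeptCode = t.DeptCode
    ORDER BY t.TeacherID
    """
).fetchall()

# A teacher row pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO Teacher VALUES ('B099', '999', 'Test', 'User')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here teachers_by_dept answers the question "which teacher works in which department", and rejected shows the data-integrity side of the foreign key.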
Normalization
Normalization in DBMS is a technique using which you can organize the data in the database tables so that :
• There is less repetition of data.
• A large set of data is structured into a bunch of smaller tables.
• And the tables have a proper relationship between them.
DBMS Normalization is a systematic approach to decomposing (breaking down) tables to eliminate data
redundancy (repetition) and undesirable characteristics such as the Insertion anomaly, Update anomaly, and
Delete anomaly.
It is a multi-step process that puts data into tabular form, removes duplicate data, and sets up the relationships
between tables.
Why do we need Normalization in DBMS?
Normalization is required for:
• Eliminating redundant (repeated) data and so protecting data integrity, because repeated data
increases the chances of inconsistent data.
• Keeping data consistent by storing each piece of data in one table and referencing it
everywhere else.
• Storage optimization, although this matters less these days because database storage is cheap.
• Breaking down large tables into smaller tables with relationships, which makes the database structure
more scalable and adaptable.
• Ensuring that data dependencies make sense, i.e. that data is logically stored.
Problems without Normalization in DBMS :
If a table is not properly normalized and has data redundancy(repetition) then it will not only eat up extra
memory space but will also make it difficult for you to handle and update the data in the database, without
losing data.
Insertion, Updation, and Deletion Anomalies are very frequent if the database is not normalized.
To understand these anomalies let us take an example of a Student table.
rollno name branch hod office_tel
401 Akon CSE Mr. X 53337
402 Bkon CSE Mr. X 53337
403 Ckon CSE Mr. X 53337
404 Dkon CSE Mr. X 53337
In the table above, we have data for four Computer Sci. students.
As we can see, data for the fields branch, hod(Head of Department), and office_tel are repeated for the
students who are in the same branch in the college, this is Data Redundancy.
1. Insertion Anomaly in DBMS
•Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be
inserted, or else we will have to set the branch information as NULL.
•Also, if we have to insert data for 100 students of the same branch, then the branch information will be
repeated for all those 100 students.
•These scenarios are nothing but Insertion anomalies.
•If you have to repeat the same data in every row of data, it's better to keep the data separately and
reference that data in each row.
•So in the above table, we can keep the branch information separately, and just use the branch_id in the
student table, where branch_id can be used to get the branch information.
2. Updation Anomaly in DBMS
•What if Mr. X leaves the college? or Mr. X is no longer the HOD of the computer science department?
In that case, all the student records will have to be updated, and if by mistake we miss any record, it will
lead to data inconsistency.
•This is an Updation anomaly because you need to update all the records in your table just because one
piece of information got changed.
3. Deletion Anomaly in DBMS
•In our Student table, two different pieces of information are kept together, the Student information
and the Branch information.
•So if only a single student is enrolled in a branch, and that student leaves the college, or for some
reason, the entry for the student is deleted, we will lose the branch information too.
• So in a DBMS we should never keep two different entities together in one table, which in the above example are
Student and Branch.
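The fix for all three anomalies is the same: keep the branch information in its own table and reference it from the student table by branch_id. The sketch below uses Python's sqlite3 module; the table layout follows the Student example above, while the branch_id value 1 and the replacement HOD 'Mr. Y' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, branch TEXT, hod TEXT, office_tel TEXT)"
)
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, "
    "branch_id INTEGER REFERENCES branch(branch_id))"
)

# Branch details are stored once instead of once per student.
conn.execute("INSERT INTO branch VALUES (1, 'CSE', 'Mr. X', '53337')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, 1)",
    [(401, "Akon"), (402, "Bkon"), (403, "Ckon"), (404, "Dkon")],
)

# Update anomaly avoided: a change of HOD touches exactly one row.
conn.execute("UPDATE branch SET hod = 'Mr. Y' WHERE branch_id = 1")
hods = {
    row[0]
    for row in conn.execute(
        "SELECT b.hod FROM student s JOIN branch b ON b.branch_id = s.branch_id"
    )
}

# Deletion anomaly avoided: removing every student keeps the branch row.
conn.execute("DELETE FROM student")
branches_left = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]
```

After the update every student sees the same HOD, and after the delete the CSE branch still exists even though it has no students.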
Primary Key and Non-key attributes :
Before we move on to learn different Normal Forms in DBMS, let's first understand what is a primary key and
what are non-key attributes.
As you can see in the table above, the student_id column is a primary key because using the student_id value
we can uniquely identify each row of data, hence the remaining columns then become the non-key attributes.
Types of DBMS Normal forms
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
6. Fifth Normal Form
Let's cover all the Database Normal forms one by one with some basic examples to help you understand the
DBMS normal forms.
1. First Normal Form (1NF)
For a table to be in the First Normal Form, it should satisfy the following 4 rules:
1. It should only have single (atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. The order in which data is stored should not matter.
Let's see an example.
If we have an Employee table in which we store the employee information along with the employee skillset, the
table will look like this:
emp_id emp_name emp_mobile emp_skills
2 Darth Trader 8888853337 HTML, CSS, JavaScript
The emp_skills column holds more than one value per row, which breaks the first rule. To bring the data into
1NF, each skill is stored in its own row:
emp_id emp_skill
1 Python
1 JavaScript
2 HTML
2 CSS
2 JavaScript
3 Java
3 Linux
3 C++
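The step from a multi-valued emp_skills column to one skill per row can be sketched in plain Python. The function name to_first_normal_form is hypothetical, and the skill lists for employees 1 and 3 are taken from the emp_id / emp_skill rows above.

```python
# Rows of the unnormalized table: (emp_id, comma-separated skills).
unnormalized = [
    (1, "Python, JavaScript"),
    (2, "HTML, CSS, JavaScript"),
    (3, "Java, Linux, C++"),
]


def to_first_normal_form(rows):
    """Return one (emp_id, emp_skill) pair per skill, so every value is atomic."""
    return [
        (emp_id, skill.strip())
        for emp_id, skills in rows
        for skill in skills.split(",")
    ]


skill_rows = to_first_normal_form(unnormalized)
for row in skill_rows:
    print(row)
```

Each output pair corresponds to one row of the emp_id / emp_skill table, so no column holds more than one value.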
2. Second Normal Form (2NF)
Let's take an example to understand Partial dependency and the Second Normal Form.
What is Partial Dependency?
When a table has a primary key that is made up of two or more columns, then all the columns(not included in
the primary key) in that table should depend on the entire primary key and not on a part of it. If any
column(which is not in the primary key) depends on a part of the primary key then we say we have Partial
dependency in the table.
Confused? Let's take an example.
If we have two tables Students and Subjects, to store student information and information related to subjects.
Student table:
student_id student_name branch
1 Akon CSE
2 Bkon Mechanical
Subject Table:
subject_id subject_name
1 C Language
2 DSA
3 Operating System
And we have another table Score to store the marks scored by students in any subject like this,
student_id subject_id marks teacher_name
1 1 70 Miss. C
1 2 82 Mr. D
2 1 65 Miss. C
Now in the above table, the primary key is student_id + subject_id, because both pieces of information are
required to identify any row of data.
But in the Score table, we have a column teacher_name, which depends on the subject information or just the
subject_id, so we should not keep that information in the Score table.
The column teacher_name should be in the Subjects table. And then the entire system will be Normalized as
per the Second Normal Form.
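A partial dependency can be checked mechanically: a column depends on part of the key if equal values of that part always come with equal values of the column. The sketch below is one way to test this in Python; the helper name determines is hypothetical, and the sample rows are modeled on the Score table with the teacher for subject 1 set to Miss. C in both rows (an assumption, so that each subject has a single teacher).

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # setdefault stores the first rhs value seen for this key and returns it.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Sample Score rows (illustrative data, see the note above).
score = [
    {"student_id": 1, "subject_id": 1, "marks": 70, "teacher_name": "Miss. C"},
    {"student_id": 1, "subject_id": 2, "marks": 82, "teacher_name": "Mr. D"},
    {"student_id": 2, "subject_id": 1, "marks": 65, "teacher_name": "Miss. C"},
]

# teacher_name is fixed by subject_id alone: a partial dependency.
teacher_by_subject = determines(score, ["subject_id"], "teacher_name")
# marks needs the whole key: neither half of the key is enough on its own.
marks_by_subject = determines(score, ["subject_id"], "marks")
marks_by_student = determines(score, ["student_id"], "marks")
marks_by_full_key = determines(score, ["student_id", "subject_id"], "marks")
```

A column like teacher_name, for which a single key column is enough, is the one that should move out of the Score table.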
Updated Subject table:
subject_id subject_name teacher_name
1 C Language Miss. C
2 DSA Mr. D
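Updated Score table:
student_id subject_id marks
1 1 70
1 2 82
2 1 65
3. Third Normal Form (3NF)
Suppose the Score table also stores the exam type and the total marks:
student_id subject_id marks exam_type total_marks
1 1 70 Theory 100
1 2 82 Theory 100
2 1 42 Practical 50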
• In the table above, the column exam_type depends on both student_id and subject_id, because a
student can be in the CSE branch or the Mechanical branch, and based on that they may have different
exam types for different subjects.
• The CSE students may have both Practical and Theory exams for Compiler Design, whereas Mechanical
branch students may have only Theory exams for Compiler Design. But the column total_marks
depends only on the exam_type column, and exam_type is not a part of the primary key,
because the primary key is student_id + subject_id. Hence we have a Transitive dependency here.
How to remove Transitive Dependency?
You can create a separate table for ExamType and use it in the Score table.
New ExamType table,
exam_type_id exam_type total_marks duration
1 Practical 50 45
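With the ExamType table in place, the Score table only needs to store exam_type_id, and total_marks is looked up through a join. This is a minimal sketch in Python's sqlite3; the Theory row with exam_type_id 2 is an assumption, and its duration is left as NULL because the notes do not give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam_type (exam_type_id INTEGER PRIMARY KEY, exam_type TEXT, "
    "total_marks INTEGER, duration INTEGER)"
)
conn.execute(
    "CREATE TABLE score (student_id INTEGER, subject_id INTEGER, marks INTEGER, "
    "exam_type_id INTEGER REFERENCES exam_type(exam_type_id), "
    "PRIMARY KEY (student_id, subject_id))"
)

conn.executemany(
    "INSERT INTO exam_type VALUES (?, ?, ?, ?)",
    [(1, "Practical", 50, 45), (2, "Theory", 100, None)],  # Theory duration unknown
)
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?, ?)",
    [(1, 1, 70, 2), (1, 2, 82, 2), (2, 1, 42, 1)],
)

# total_marks now lives in one place and is fetched via exam_type_id.
report = conn.execute(
    """
    SELECT s.student_id, s.subject_id, s.marks, e.exam_type, e.total_marks
    FROM score s JOIN exam_type e ON e.exam_type_id = s.exam_type_id
    ORDER BY s.student_id, s.subject_id
    """
).fetchall()
```

The join reproduces the wide table from before, but a change to the total marks of an exam type now means updating a single row.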
Now that we understand what a transaction is, we should understand the problems
associated with it.
The main problem that can happen during a transaction is that the transaction can fail before
finishing all the operations in the set. This can happen due to a power failure, system crash,
etc. This is a serious problem that can leave the database in an inconsistent state. Assume that
the transaction fails after the third operation (see the example above); then the amount would be
deducted from your account but your friend would not receive it.
Commit: If all the operations in a transaction are completed successfully then commit those
changes to the database permanently.
Rollback: If any of the operation fails then rollback all the changes done by previous
operations.
Even though these operations help us avoid several issues that may arise during a
transaction, they are not sufficient when two transactions are running concurrently. To
handle those problems we need to understand the database ACID properties.
Atomicity
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all.
There is no midway i.e. transactions do not occur partially. Each transaction is considered as
one unit and either runs to completion or is not executed at all. It involves the following two
operations.
—Abort: If a transaction aborts, changes made to database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from
account X to account Y.
If the transaction fails after completion of T1 but before completion of T2 (say,
after write(X) but before write(Y)), then the amount has been deducted from X but not added
to Y. This results in an inconsistent database state. Therefore, the transaction must be executed
in its entirety in order to ensure the correctness of the database state.
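The all-or-nothing behaviour can be observed with a small transfer routine. This is a sketch in Python's sqlite3, not a production pattern; the account table, the starting balances of 500 each, and the fail_midway flag that simulates a crash after write(X) are all assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 500)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst; commit both writes or roll both back."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            raise RuntimeError("simulated crash after write(X)")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # undo the deduction so no partial transfer survives
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(conn, "X", "Y", 100, fail_midway=True)
after_failure = balances(conn)
succeeded = transfer(conn, "X", "Y", 100)
after_success = balances(conn)
```

After the simulated failure both balances are unchanged, and after the successful run the total of X and Y is still 1000, which is also what the consistency property asks for.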
Consistency
This means that integrity constraints must be maintained so that the database is consistent
before and after the transaction. It refers to the correctness of a database. Referring to the
example above, the total amount before and after the transaction must be maintained.
Isolation
This property ensures that multiple transactions can occur concurrently without leading to the
inconsistency of database state. Transactions occur independently without interference.
Changes occurring in a particular transaction will not be visible to any other transaction until
that particular change in that transaction is written to memory or has been committed. This
property ensures that executing transactions concurrently will result in a state that is
equivalent to a state achieved if they were executed serially in some order.
Let X= 500, Y = 500.
Suppose T has been executed till Read (Y) and then T’’ starts. As a result , interleaving of
operations takes place due to which T’’ reads correct value of X but incorrect value of Y and
sum computed by
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take
place in isolation and changes should be visible only after they have been made to the main
memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a system
failure occurs. These updates now become permanent and are stored in non-volatile memory.
The effects of the transaction, thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency
of a database, such that each transaction is a group of operations that acts as a single unit,
produces consistent results, acts in isolation from other operations, and makes updates that
are durably stored.
1. Serial Schedules:
Schedules in which the transactions are executed non-interleaved are called serial schedules,
i.e., in a serial schedule a transaction is executed completely before
the execution of another transaction starts. In other words, in a serial
schedule a transaction does not start execution until the currently running transaction
has finished execution. This type of execution is also known as non-interleaved
execution. The example we have seen above is a serial schedule.
Example: Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)
W(A)
R(B)
W(B)
R(A)
R(B)
where R(A) denotes that a read operation is performed on some data item ‘A’, and W(A) denotes a write operation on it.
This is a serial schedule since the transactions perform serially in the order T1 -> T2.
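Whether a schedule is serial can be checked by scanning it once: a transaction that has stopped running must never run again. The sketch below is in Python; the function name is_serial and the list-of-pairs representation are hypothetical, and the split of the six operations between T1 and T2 is an assumption, since the table's columns are not visible in the text.

```python
def is_serial(schedule):
    """schedule is a list of (transaction, operation) pairs in execution order."""
    finished = set()  # transactions that have already handed over to another one
    current = None
    for txn, _operation in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn resumes after another ran: interleaved
            if current is not None:
                finished.add(current)
            current = txn
    return True


# Assumed layout of the example: T1 runs completely, then T2.
serial = [
    ("T1", "R(A)"), ("T1", "W(A)"), ("T1", "R(B)"), ("T1", "W(B)"),
    ("T2", "R(A)"), ("T2", "R(B)"),
]
# A hypothetical interleaving of the same two transactions.
interleaved = [("T1", "R(A)"), ("T2", "R(A)"), ("T1", "W(A)"), ("T2", "R(B)")]

print(is_serial(serial), is_serial(interleaved))
```

The check only looks at the order of transactions, not at which data items they touch, because being serial is purely a property of interleaving.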
2. Non-Serial Schedule:
This is a type of scheduling where the operations of multiple transactions are interleaved.
This might give rise to concurrency problems. Unlike the
serial schedule, where one transaction must wait for another to complete all its operations, in
the non-serial schedule the other transaction proceeds without waiting for the previous
transaction to complete. A non-serial schedule therefore provides the benefit of concurrent
execution, but its end result is guaranteed to be correct and the same as that of a serial schedule
only when the schedule is serializable.
The Non-Serial Schedule can be divided further into Serializable and Non-Serializable.
Serializable:
Serializability is used to maintain the consistency of the database. It is mainly used in non-serial
scheduling to verify whether the schedule will lead to any inconsistency or not. On the other
hand, a serial schedule does not need a serializability check because it starts a transaction only
when the previous transaction is complete. A non-serial schedule is said to be serializable
only when it is equivalent to some serial schedule of the same n transactions.
Since concurrency is allowed in this case, multiple transactions can execute concurrently.
Serializable schedules are of two types: conflict serializable and view serializable.
Non-Serializable:
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable
Schedule.
Recoverable Schedule:
Schedules in which transactions commit only after all transactions whose changes they read
commit are called recoverable schedules. In other words, if some transaction Tj is reading value
updated or written by some other transaction Ti, then the commit of Tj must occur after the
commit of Ti.
Example: Consider the following schedule involving two transactions T1 and T2.
T1        T2
R(A)
W(A)
          W(A)
          R(A)
Commit
          Commit
This is a recoverable schedule since T1 commits before T2, which makes the value read by
T2 correct.
Non-Recoverable Schedule:
Example: Consider the following schedule involving two transactions T1 and T2.
T1        T2
R(A)
W(A)
          W(A)
          R(A)
          Commit
Abort
T2 read the value of A written by T1, and committed. T1 later aborted, therefore the value
read by T2 is wrong, but since T2 committed, this schedule is non-recoverable.
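The recoverability rule (Tj may commit only after every Ti it read from has committed) can be checked by replaying a schedule. The sketch below is in Python; the function name is_recoverable and the (transaction, action, item) representation are hypothetical. The two schedules follow the examples above, with T2's read placed before its own write so that the value it reads is the one T1 wrote; that ordering is an assumption about the intended example.

```python
def is_recoverable(schedule):
    """schedule: list of (txn, action, item); action is 'R', 'W', 'C' or 'A'."""
    last_writer = {}  # item -> transaction that most recently wrote it
    reads_from = {}   # txn -> set of other transactions whose writes it read
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "C":
            # Committing before a transaction we read from has committed
            # makes the schedule non-recoverable.
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
        # 'A' (abort) needs no bookkeeping for this check.
    return True


recoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "C", None), ("T2", "C", None),
]
non_recoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T2", "C", None), ("T1", "A", None),
]

print(is_recoverable(recoverable), is_recoverable(non_recoverable))
```

In the second schedule T2 has already committed a value that depends on T1's write, so when T1 aborts there is no way to undo T2.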
SQL Commands
o SQL commands are instructions. They are used to communicate with the database.
They are also used to perform specific tasks, functions, and queries of data.
o SQL can perform various tasks like creating a table, adding data to tables, dropping
the table, modifying the table, and setting permissions for users.
o CREATE
o ALTER
o DROP
o TRUNCATE
Syntax:
CREATE TABLE TABLE_NAME (
COLUMN_NAME1 DATATYPES(size),
COLUMN_NAME2 DATATYPES(size),
--------------
COLUMN_NAMEN DATATYPES(size)
);
Example:
CREATE TABLE Emp (
EMPNo VARCHAR2(20),
EName VARCHAR2(20),
Job VARCHAR2(20),
DOB DATE
);
b. DROP : This statement is used to drop an existing database. When you use this statement,
complete information present in the database will be lost.
Syntax
DROP DATABASE DatabaseName;
Example
DROP DATABASE Employee;
The ‘DROP TABLE’ Statement
This statement is used to drop an existing table. When you use this statement, complete
information present in the table will be lost.
Syntax
DROP TABLE TableName;
Example
DROP TABLE Emp;
c. ALTER
This command is used to delete, modify or add constraints or columns in an existing
table.
The ‘ALTER TABLE’ Statement
This statement is used to add, delete, modify columns in an existing table.
The ‘ALTER TABLE’ Statement with ADD/DROP COLUMN
You can use the ALTER TABLE statement with ADD/DROP Column command
according to your need. If you wish to add a column, then you will use the ADD
command, and if you wish to delete a column, then you will use the DROP COLUMN
command.
Syntax
ALTER TABLE TableName ADD ColumnName Datatype;
ALTER TABLE TableName DROP COLUMN ColumnName;
o INSERT
o UPDATE
o DELETE
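a. INSERT: The INSERT statement is a SQL statement. It is used to insert data
into the rows of a table.
Syntax:
INSERT INTO TABLE_NAME (col1, col2, ..., colN) VALUES (value1, value2, ..., valueN);
Or
INSERT INTO TABLE_NAME VALUES (value1, value2, ..., valueN);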
o GRANT
o REVOKE
o COMMIT
o ROLLBACK
o SAVEPOINT
Syntax:
COMMIT;
Example:
COMMIT;
Syntax:
ROLLBACK;
Example:
ROLLBACK;
Syntax:
SAVEPOINT SAVEPOINT_NAME;
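The three commands can be seen working together in SQLite. This is a sketch using Python's sqlite3 module; the two-column Emp table, the employee names, and the savepoint name after_first are hypothetical, and the connection is opened with isolation_level=None so that the BEGIN, SAVEPOINT, ROLLBACK, and COMMIT statements are sent exactly as written.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Emp (EmpNo TEXT PRIMARY KEY, EName TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO Emp VALUES ('E1', 'Asha')")
conn.execute("SAVEPOINT after_first")
conn.execute("INSERT INTO Emp VALUES ('E2', 'Ravi')")
# Undo only the work done after the savepoint; the first insert is kept.
conn.execute("ROLLBACK TO SAVEPOINT after_first")
conn.execute("COMMIT")

names_after_commit = [
    row[0] for row in conn.execute("SELECT EName FROM Emp ORDER BY EmpNo")
]

# A plain ROLLBACK discards everything since BEGIN.
conn.execute("BEGIN")
conn.execute("INSERT INTO Emp VALUES ('E3', 'Meera')")
conn.execute("ROLLBACK")
count_after_rollback = conn.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]
```

SAVEPOINT gives a partial undo inside a transaction, while ROLLBACK without a savepoint name cancels the whole transaction.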
SELECT
This statement is used to select data from a database and the data returned is stored in
a result table, called the result-set.
Syntax
SELECT Column1, Column2, ...ColumnN FROM TableName;
SELECT * FROM TableName;
Apart from just using the SELECT keyword individually, you can use the following
keywords with the SELECT statement:
o DISTINCT
o ORDER BY
o GROUP BY
o HAVING Clause
o INTO
Example
-- Select all employees from the 'Emp' table sorted by EmpNo:
SELECT * FROM Emp ORDER BY EmpNo;
-- Select all employees from the 'Emp' table sorted by EmpNo in descending order:
SELECT * FROM Emp ORDER BY EmpNo DESC;
/* Select all employees from the 'Emp' table sorted by EmpNo in descending order and EName in
ascending order: */
SELECT * FROM Emp ORDER BY EmpNo DESC, EName ASC;
Example
-- To list the number of employees from each city:
SELECT COUNT(EmpNo), City FROM Emp GROUP BY City;
Example
/* To list the number of employees in each city. The cities should be sorted high to low and only
those cities must be included which have more than 5 employees: */
SELECT COUNT(EmpNo), City FROM Emp GROUP BY City HAVING COUNT(EmpNo) > 5 ORDER BY
COUNT(EmpNo) DESC;
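The GROUP BY and HAVING queries can be run against a small sample to see what they return. This is a sketch in Python's sqlite3; the Emp rows (7 employees in Mumbai, 6 in Delhi, 2 in Pune) are made up for the demonstration, and the threshold of 5 follows the wording of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (EmpNo INTEGER PRIMARY KEY, EName TEXT, City TEXT)")

# Hypothetical sample data: 7 employees in Mumbai, 6 in Delhi, 2 in Pune.
cities = ["Mumbai"] * 7 + ["Delhi"] * 6 + ["Pune"] * 2
conn.executemany(
    "INSERT INTO Emp (EName, City) VALUES (?, ?)",
    [(f"emp{i}", city) for i, city in enumerate(cities, start=1)],
)

# GROUP BY: one output row per city.
per_city = conn.execute(
    "SELECT COUNT(EmpNo), City FROM Emp GROUP BY City ORDER BY City"
).fetchall()

# HAVING filters the groups (not the individual rows), then ORDER BY sorts them.
big_cities = conn.execute(
    "SELECT COUNT(EmpNo), City FROM Emp GROUP BY City "
    "HAVING COUNT(EmpNo) > 5 ORDER BY COUNT(EmpNo) DESC"
).fetchall()
```

Pune appears in per_city but not in big_cities, because HAVING is applied after the rows have been grouped and counted.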
Example
To create a backup of database 'Employee'
o Integrity constraints ensure that data insertion, updating, and other
processes are performed in such a way that data integrity is not
affected.
1. Domain constraints
o A domain constraint can be defined as the definition of a valid set of values
for an attribute.
o The data type of a domain includes string, character, integer, time, date,
currency, etc. The value of the attribute must be available in the
corresponding domain.
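Example:
2. Entity integrity constraints
o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a
relation, and if the primary key has a null value, then we can't identify those
rows.
o A table can contain null values in fields other than the primary key field.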
Example:
3. Referential Integrity Constraints
o A referential integrity constraint is specified between two tables.
Example:
4. Key constraints
o A key is an attribute, or set of attributes, that is used to identify an entity within its entity set
uniquely.
o An entity set can have multiple keys, out of which one key will be the
primary key. A primary key must contain unique values and can't contain a null value in the
relational table.
Example:
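The domain, entity integrity, and key constraints can all be declared on one table and then tested by trying to break them. This is a sketch in Python's sqlite3; the age range of 0 to 150 and the sample rows are hypothetical, and NOT NULL is written out on the primary key because SQLite does not imply it for a TEXT primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Student (
        Student_id   TEXT PRIMARY KEY NOT NULL,   -- key and entity integrity
        Student_name TEXT NOT NULL,
        Student_age  INTEGER CHECK (Student_age BETWEEN 0 AND 150)  -- domain
    )
    """
)
conn.execute("INSERT INTO Student VALUES ('S1', 'Ashish', 20)")


def rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_key = rejected("INSERT INTO Student VALUES (NULL, 'NoKey', 21)")
duplicate_key = rejected("INSERT INTO Student VALUES ('S1', 'Copy', 22)")
bad_domain = rejected("INSERT INTO Student VALUES ('S2', 'TooOld', 999)")
rows_stored = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
```

All three invalid inserts are refused, so the table still holds only the one valid row.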