Notes On Database Management System
What is Data?
Data is a collection of raw, unprocessed facts and statistics, whether stored or flowing over
a network. For example, when you visit a website, it might store your IP address; that is
data. In return it might add a cookie to your browser to mark that you visited the website;
that is data too. Your name is data, and so is your age.
Data becomes information when it is processed into something meaningful. For example, if a
website analyses the cookie data saved in its users' browsers and finds that its visitors are
mostly men aged 20-25, that finding is information derived from the data collected.
What is a Database?
A database is a collection of related data organised so that the data can be easily accessed,
managed and updated. A database can be software based or hardware based, with one sole
purpose: storing data.
During the early days of computing, data was collected and stored on tapes, which allowed
only sequential access: to read one record, the tape had to be wound past everything stored
before it. Tapes were slow and bulky, and computer scientists soon realised that they needed
a better solution to this problem.
Larry Ellison, the co-founder of Oracle, was among the first to realise the need for a
software-based Database Management System.
What is DBMS?
A DBMS is software that allows the creation, definition and manipulation of databases,
letting users store, process and analyse data easily. A DBMS provides an interface or a
tool to perform operations such as creating a database, creating tables in it, storing data
in it, updating that data and a lot more.
A DBMS also provides protection and security for its databases, and it maintains data
consistency when multiple users access the data.
Here are some examples of popular DBMS used these days:
MySQL
Oracle
SQL Server
IBM DB2
PostgreSQL
Amazon SimpleDB (cloud based) etc.
Advantages of DBMS
Disadvantages of DBMS
Its complexity.
Except for open-source systems such as MySQL, licensed DBMSs are generally costly.
They are large in size.
DBMS 3-tier architecture divides the complete system into three inter-related but
independent modules.
Physical Level: The physical level keeps the information about the location of database
objects in the data store. Users of the DBMS are unaware of the locations of these objects.
Conceptual Level: At the conceptual level, data is represented in the form of database
tables. For example, a STUDENT database may contain STUDENT and COURSE tables
which are visible to users, but users are unaware of how they are stored.
External Level: An external level specifies a view of the data in terms of conceptual level
tables. Each external level view caters to the needs of a particular category of users.
For example, the FACULTY of a university are interested in the course details of students,
while STUDENTS are interested in all details related to academics, accounts, courses and
hostels. So, different views can be generated for different users.
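The idea of one conceptual table serving several external views can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table layout, the column names and the two view names are assumptions made for the example, not part of the notes.

```python
import sqlite3

# In-memory database; every table, column and view name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: one table holding everything about a student.
    CREATE TABLE student (
        roll_no  INTEGER PRIMARY KEY,
        name     TEXT,
        course   TEXT,
        fees_due REAL,
        hostel   TEXT
    );
    INSERT INTO student VALUES (1, 'Akshay', 'DBMS', 1200.0, 'Block A');
    INSERT INTO student VALUES (2, 'Rahul', 'Networks', 0.0, 'Block B');

    -- External level: faculty see course details only.
    CREATE VIEW faculty_view AS
        SELECT roll_no, name, course FROM student;

    -- External level: students see academic, account and hostel details.
    CREATE VIEW student_view AS
        SELECT roll_no, name, course, fees_due, hostel FROM student;
""")

# Column names exposed by each view.
faculty_columns = [d[0] for d in conn.execute("SELECT * FROM faculty_view").description]
student_columns = [d[0] for d in conn.execute("SELECT * FROM student_view").description]
faculty_rows = conn.execute("SELECT * FROM faculty_view ORDER BY roll_no").fetchall()
```

Both views read the same stored rows; neither user group needs to know how or where the table is stored.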
Data Independence
Data independence means that a change of data at one level should not affect another level.
Two types of data independence are present in this architecture:
Physical Data Independence: Any change in the physical location of tables and indexes should
not affect the conceptual level or the external view of data. This data independence is easy
to achieve and is implemented by most DBMSs.
Conceptual Data Independence: The conceptual level schema and the external level schema
must be independent. This means a change in the conceptual schema should not affect the
external schema, e.g., adding or deleting attributes of a table should not affect the user’s
view of the table. This type of independence is more difficult to achieve than physical
data independence because changes in the conceptual schema are often reflected in the
user’s view.
Database systems comprise complex data structures. To make the system efficient at
retrieving data and to reduce complexity for its users, developers use abstraction, i.e.
they hide irrelevant details from the users. This approach simplifies database design.
There are mainly 3 levels of data abstraction:
Physical: This is the lowest level of data abstraction. It describes how the data is actually
stored in memory, including access methods such as sequential or random access and file
organisation methods such as B+ trees and hashing. Usability, size of memory, and the number
of times the records are accessed are factors which we need to know while designing the
database. Suppose we need to store the details of an employee: the blocks of storage and the
amount of memory used for this purpose are kept hidden from the user.
Logical: This level comprises the information that is actually stored in the database in the
form of tables. It also stores the relationships among the data entities in relatively simple
structures. At this level, the information available to the user at the view level is unknown.
For an employee, we can store the various attributes as well as relationships, e.g. with the
manager.
View: This is the highest level of abstraction. Only a part of the actual database is viewed by
the users. This level exists to ease the accessibility of the database by an individual user.
Users view data in the form of rows and columns. Tables and relations are used to store data.
Multiple views of the same database may exist. Users can view the data and interact with
the database, while storage and implementation details are hidden from them.
The main purpose of data abstraction is achieving data independence in order to save time
and cost required when the database is modified or altered.
There are two levels of data independence arising from these levels of abstraction:
Physical level data independence: It refers to the characteristic of being able to modify the
physical schema without any alterations to the conceptual or logical schema, usually for
optimisation purposes. For example, the conceptual structure of the database would not be
affected by a change in the storage size of the database server, or by a change from
sequential to random access files. These alterations or modifications to the physical
structure may include:
Utilising new storage devices.
Modifying data structures used for storage.
Altering indexes or using alternative file organisation techniques etc.
Logical level data independence: It refers to the characteristic of being able to modify the
logical schema without affecting the external schema or application programs. The user view
of the data would not be affected by any changes to the conceptual view of the data. These
changes may include inserting or deleting attributes, or altering the table structures,
entities or relationships in the logical schema.
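Logical data independence can be sketched with Python's built-in sqlite3 module: a user view keeps returning the same result after an attribute is added to the underlying table. The employee table, its columns and the view name are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical schema (hypothetical): an employee table.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Asha', 50000.0);

    -- External schema: a view that exposes only two attributes.
    CREATE VIEW employee_names AS SELECT emp_id, name FROM employee;
""")

before = conn.execute("SELECT * FROM employee_names").fetchall()

# Change the logical schema by adding an attribute to the table.
conn.execute("ALTER TABLE employee ADD COLUMN manager_id INTEGER")

# The view, and any program that reads it, is unaffected by the change.
after = conn.execute("SELECT * FROM employee_names").fetchall()
```

Deleting an attribute that a view depends on would not be as harmless, which is one reason this kind of independence is harder to achieve than the physical kind.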
The ER Model is used to model the logical view of the system from a data perspective. It
consists of these components:
Entity, Entity Type, Entity Set –
An Entity may be an object with a physical existence – a particular person, car, house, or
employee – or it may be an object with a conceptual existence – a company, a job, or a
university course.
An Entity is an object of an Entity Type, and the set of all entities is called an Entity Set.
E.g., E1 is an entity having Entity Type Student, and the set of all students is called the
Entity Set. In an ER diagram, an Entity Type is represented by a rectangle.
Attribute(s):
Attributes are the properties which define the entity type. For example, Roll_No, Name,
DOB, Age, Address, Mobile_No are the attributes which define entity type Student. In an ER
diagram, an attribute is represented by an oval.
1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called the key
attribute. For example, Roll_No will be unique for each student. In an ER diagram, a key
attribute is represented by an oval with the attribute name underlined.
2. Composite Attribute –
An attribute composed of several other attributes is called a composite attribute. For
example, the Address attribute of the Student entity type consists of Street, City, State,
and Country. In an ER diagram, a composite attribute is represented by an oval connected
to further ovals.
3. Multivalued Attribute –
An attribute that can hold more than one value for a given entity. For example,
Phone_No (can be more than one for a given student). In an ER diagram, a multivalued
attribute is represented by a double oval.
4. Derived Attribute –
An attribute which can be derived from other attributes of the entity type is known as a
derived attribute, e.g., Age (can be derived from DOB). In an ER diagram, a derived
attribute is represented by a dashed oval.
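The four kinds of attribute can be sketched as a small Python data model. The class and field names below are chosen for illustration and simply mirror the Student example; they are not a prescribed mapping from ER diagrams to code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Address:
    """Composite attribute: made up of several simpler attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int                 # key attribute: unique for each student
    name: str
    dob: date
    address: Address             # composite attribute
    phone_nos: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob instead of being stored."""
        years = today.year - self.dob.year
        # Subtract one year if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.dob.month, self.dob.day):
            years -= 1
        return years


student = Student(
    roll_no=1,
    name="Akshay",
    dob=date(2003, 6, 15),
    address=Address("MG Road", "Pune", "Maharashtra", "India"),
    phone_nos=["9000000001", "9000000002"],
)
age_on_new_year = student.age(date(2024, 1, 1))
```

Keeping age as a method rather than a stored field means it can never disagree with the date of birth.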
Binary Relationship –
When there are two entity sets participating in a relation, the relationship is called a
binary relationship. For example, Student is enrolled in Course.
n-ary Relationship –
When there are n entity sets participating in a relation, the relationship is called an n-ary
relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known
as cardinality.
Participation Constraint:
Participation Constraint is applied on the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship.
If each student must enroll in a course, the participation of student will be total. Total
participation is shown by double line in ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in
the relationship. If some courses have no students enrolled in them, the participation
of course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having
total participation and Course Entity set having partial participation.
Generalization and Specialization are terms more common in Object Oriented Technology,
and they are used in databases with the same meaning. Generalization occurs when we
ignore the differences and acknowledge the similarities between lower entities or child
classes or relations (tables in DBMS) to form a higher entity. Specialization works the
other way: it splits a higher entity to form lower entities, and in doing so we discover
the differences between those lower entities.
Comparison Chart
BASIS FOR COMPARISON    GENERALIZATION    SPECIALIZATION
manner.
Entities The higher level entity must have The higher level entity may not have
a schema. schema.
entities. entity.
entities.
Codd's Rule for Relational DBMS
E. F. Codd was a computer scientist who invented the relational model for database
management. Relational databases were created on the basis of this model. Codd proposed
13 rules, numbered zero to twelve and popularly known as Codd's 12 rules, to test a DBMS
against his relational model. Codd's rules define the qualities a DBMS requires in order to
become a Relational Database Management System (RDBMS). So far, hardly any commercial
product follows all 13 of Codd's rules. Even Oracle follows only eight and a half (8.5) out
of 13. Codd's 12 rules are as follows.
Rule zero
This rule states that for a system to qualify as an RDBMS, it must be able to manage
databases entirely through its relational capabilities.
Rule 7: Relational Level Operation
There must be Insert, Delete, and Update operations at each level of relations. Set operations
like Union, Intersection and Minus should also be supported.
Constraints in DBMS-
Relational constraints are the restrictions imposed on the database contents and operations.
They ensure the correctness of data in the database.
1. Domain constraint
2. Tuple Uniqueness constraint
3. Key constraint
4. Entity Integrity constraint
5. Referential Integrity constraint
1. Domain Constraint-
Example-
Consider the following Student table-
STU_ID Name Age
S001 Akshay 20
S002 Abhishek 21
S003 Shashank 20
S004 Rahul A
Here, value ‘A’ is not allowed since only integer values can be taken by the age attribute.
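A domain constraint like the one in this example can be sketched with Python's built-in sqlite3 module. SQLite is flexible about column types, so the sketch assumes an explicit CHECK on the stored type; other DBMSs would reject the value from the INTEGER declaration alone. The lower-case table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stu_id TEXT PRIMARY KEY,
        name   TEXT NOT NULL,
        -- Domain of age: positive integers only (assumed rule for this sketch).
        age    INTEGER CHECK (typeof(age) = 'integer' AND age > 0)
    )
""")

# A value inside the domain is accepted.
conn.execute("INSERT INTO student VALUES ('S001', 'Akshay', 20)")

# The value 'A' lies outside the domain, so the insert is rejected.
try:
    conn.execute("INSERT INTO student VALUES ('S004', 'Rahul', 'A')")
    domain_violation_rejected = False
except sqlite3.IntegrityError:
    domain_violation_rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```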
2. Tuple Uniqueness Constraint-
Tuple Uniqueness constraint specifies that every tuple in a relation must be unique.
Example-01:
This relation satisfies the tuple uniqueness constraint since here all the tuples are unique.
Example-02:
This relation does not satisfy the tuple uniqueness constraint since here not all the tuples
are unique.
3. Key Constraint-
Example-
This relation does not satisfy the key constraint as here not all the values of the primary
key are unique.
4. Entity Integrity Constraint-
Entity integrity constraint specifies that no attribute of the primary key may contain a
null value in any relation.
This is because the presence of a null value in the primary key violates the uniqueness
property.
Example-
This relation does not satisfy the entity integrity constraint as here the primary key contains a
NULL value.
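The key constraint and the entity integrity constraint can both be sketched with Python's built-in sqlite3 module. The table below is a hypothetical stand-in for the example relations, and `try_insert` is a helper written only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written out explicitly: for historical reasons SQLite accepts
# NULL in a non-integer PRIMARY KEY column unless told otherwise.
conn.execute("CREATE TABLE student (stu_id TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES ('S001', 'Akshay')")


def try_insert(stu_id, name):
    """Return 'accepted', or the error message if a constraint is violated."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (stu_id, name))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_result = try_insert("S001", "Rahul")   # violates the key constraint
null_result = try_insert(None, "Rahul")          # violates entity integrity
valid_result = try_insert("S002", "Abhishek")    # satisfies both constraints
```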
5. Referential Integrity Constraint-
This constraint is enforced when a foreign key references the primary key of a relation.
It specifies that every value taken by the foreign key must either be present in the
relation of the primary key or be null.
Important Results-
The following two important results emerge from the referential integrity constraint-
We cannot insert a record into a referencing relation if the corresponding record does not
exist in the referenced relation.
We cannot delete or update a record of the referenced relation if the corresponding record
exists in the referencing relation.
Example-
Student
STU_ID Name Dept_no
S001 Akshay D10
S002 Abhishek D10
S003 Shashank D11
S004 Rahul D14
Department
Dept_no Dept_name
[table rows not recoverable]
Here,
The relation ‘Student’ does not satisfy the referential integrity constraint.
This is because in relation ‘Department’, no value of the primary key specifies department
no. D14.
Thus, referential integrity constraint is violated.
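The two results above can be sketched with Python's built-in sqlite3 module, which enforces foreign keys once `PRAGMA foreign_keys = ON` is set. The department names are placeholders invented for this sketch, and `attempt` is a helper written only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE department (dept_no TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE student (
        stu_id  TEXT PRIMARY KEY,
        name    TEXT,
        dept_no TEXT REFERENCES department (dept_no)   -- foreign key
    );
    -- Placeholder department names, assumed for the example.
    INSERT INTO department VALUES ('D10', 'Dept A'), ('D11', 'Dept B');
    INSERT INTO student VALUES
        ('S001', 'Akshay', 'D10'),
        ('S002', 'Abhishek', 'D10'),
        ('S003', 'Shashank', 'D11');
""")


def attempt(sql, params=()):
    """Run one statement; return False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# D14 does not exist in department, so the referencing row is rejected.
insert_d14 = attempt("INSERT INTO student VALUES (?, ?, ?)", ("S004", "Rahul", "D14"))
# A null foreign key is allowed.
insert_null = attempt("INSERT INTO student VALUES (?, ?, ?)", ("S005", "Neha", None))
# D10 is still referenced by students, so it cannot be deleted.
delete_d10 = attempt("DELETE FROM department WHERE dept_no = ?", ("D10",))
```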
What is SQL
Relational DBMSs such as MySQL, Oracle, MS Access, Sybase, Informix, Postgres, and SQL
Server use SQL as their standard database language.
Data Integrity
The following categories of data integrity exist with each RDBMS:
Domain integrity: It enforces valid entries for a given column by restricting the type, the
format, or the range of values.
Referential integrity: It specifies that rows which are used by other records cannot be
deleted.
User-defined integrity: It enforces specific business rules that are defined by users.
These rules are different from entity, domain or referential integrity.
Difference between DBMS and RDBMS
Although DBMS and RDBMS are both used to store information in a physical database,
there are some remarkable differences between them.
The main differences between DBMS and RDBMS are given below:
SQL Commands
DELETE: It deletes data from a table in the database.
Operator Description
ALL compares a value to all values in another value set.
AND allows multiple conditions in an SQL statement; all of them must be true.
ANY compares a value to any applicable value in the list, according to the condition.
BETWEEN searches for values that lie within a range, given a minimum and a maximum value.
IN compares a value to a specified list of values.
NOT reverses the meaning of the logical operator it is used with.
OR combines multiple conditions in an SQL statement; at least one of them must be true.
EXISTS tests for the presence of a row that meets the given criteria.
LIKE compares a value to similar values using wildcard operators.
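Several of these operators can be tried out with Python's built-in sqlite3 module. The sketch reuses the Student rows from earlier; Rahul's age and the `names` helper are assumptions made for the example. ALL and ANY are left out because SQLite does not implement them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stu_id TEXT PRIMARY KEY, name TEXT, age INTEGER, dept_no TEXT);
    INSERT INTO student VALUES
        ('S001', 'Akshay',   20, 'D10'),
        ('S002', 'Abhishek', 21, 'D10'),
        ('S003', 'Shashank', 20, 'D11'),
        ('S004', 'Rahul',    23, 'D14');   -- age 23 is an assumed value
""")


def names(where):
    """Return student names matching a WHERE clause, ordered by stu_id."""
    sql = "SELECT name FROM student WHERE " + where + " ORDER BY stu_id"
    return [row[0] for row in conn.execute(sql)]


between_names = names("age BETWEEN 21 AND 23")        # range, bounds included
in_names = names("dept_no IN ('D10', 'D11')")         # membership in a list
like_names = names("name LIKE 'A%'")                  # % matches any suffix
and_not_names = names("age = 20 AND NOT dept_no = 'D11'")
or_names = names("age = 23 OR dept_no = 'D11'")
# EXISTS: students who share a department with at least one other student.
exists_names = names(
    "EXISTS (SELECT 1 FROM student AS other"
    " WHERE other.dept_no = student.dept_no AND other.stu_id <> student.stu_id)"
)
```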
SQL JOIN
As the name shows, JOIN means to combine something. In SQL, JOIN means "to combine
two or more tables".
The SQL JOIN clause takes records from two or more tables in a database and combines
them.
1. inner join,
2. left outer join,
3. right outer join,
4. full outer join, and
5. cross join.
In the process of joining, rows of both tables are combined in a single table.
SQL JOIN is used when you want to access more than one table through a single SELECT
statement. It combines the rows of those tables into one result, from which the information
can be retrieved.
The joining of two or more tables is based on a common field between them.
SQL INNER JOIN, also known as simple join, is the most common type of join.
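The difference between the join types can be sketched with Python's built-in sqlite3 module. The two tables, their rows and the department names are assumptions made for this example; they are joined on the common field dept_no.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE student (stu_id TEXT PRIMARY KEY, name TEXT, dept_no TEXT);
    INSERT INTO department VALUES
        ('D10', 'Physics'), ('D11', 'Chemistry'), ('D12', 'Maths');
    INSERT INTO student VALUES
        ('S001', 'Akshay', 'D10'),
        ('S002', 'Abhishek', 'D10'),
        ('S003', 'Shashank', 'D11'),
        ('S004', 'Rahul', NULL);
""")

# INNER JOIN: only students whose dept_no matches a department row.
inner_rows = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s
    INNER JOIN department AS d ON s.dept_no = d.dept_no
    ORDER BY s.stu_id
""").fetchall()

# LEFT OUTER JOIN: every student; unmatched ones get NULL for dept_name.
left_rows = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s
    LEFT OUTER JOIN department AS d ON s.dept_no = d.dept_no
    ORDER BY s.stu_id
""").fetchall()

# CROSS JOIN: every student paired with every department.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM student CROSS JOIN department"
).fetchone()[0]
```

Rahul has no department, so he is missing from the inner join but present in the left outer join with an empty department name.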
A column, or set of columns, that uniquely identifies each row in a table is called the
primary key (PK).
To create a primary key, you define a PRIMARY KEY constraint when you create or modify
a table.
When multiple columns are used as a primary key, it is known as a composite primary key.
In designing a composite primary key, you should use as few columns as possible. This is
good for both storage and performance: the more columns you use for the primary key, the
more storage space you require.
In terms of performance, less data means the database can process it faster.
Difference between primary key and foreign key in SQL:
These are some important differences between primary key and foreign key in SQL-
A primary key cannot be null, whereas a foreign key can be null.
A primary key uniquely identifies a record in a table, while a foreign key is a field in a
table that is the primary key of another table.
A table has only one primary key, whereas it can have more than one foreign key.
In some systems, such as SQL Server, the primary key adds a clustered index by default,
whereas a foreign key does not automatically create an index, clustered or non-clustered;
you must create an index for a foreign key manually.
A composite key is a combination of two or more columns in a table that can be used to
uniquely identify each row in the table. Uniqueness is guaranteed when the columns are
combined, but not when they are taken individually.
Sometimes more than one attribute is needed to uniquely identify an entity. A primary key
that is made by the combination of more than one attribute is known as a composite key.
A composite key is a key which is the combination of more than one field or column of a
given table. It may be a candidate key or a primary key.
Columns that make up the composite key can be of different data types.
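A composite primary key can be sketched with Python's built-in sqlite3 module. The enrollment table and its columns are hypothetical: neither stu_id nor course_id is unique on its own, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        stu_id    TEXT NOT NULL,
        course_id TEXT NOT NULL,
        grade     TEXT,
        PRIMARY KEY (stu_id, course_id)   -- composite primary key
    )
""")

# The same student in two courses, and the same course for two students,
# are all allowed because each (stu_id, course_id) pair is different.
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [("S001", "C1", "A"), ("S001", "C2", "B"), ("S002", "C1", "A")],
)

# Repeating an existing pair violates the composite key.
try:
    conn.execute("INSERT INTO enrollment VALUES ('S001', 'C1', 'C')")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```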
A unique key is a set of one or more fields/columns of a table that uniquely identifies
a record in a database table.
It is a little like a primary key, but it can accept null values (only one in some systems,
such as SQL Server) and it cannot have duplicate values.
The unique key and the primary key both guarantee uniqueness for a column or a
set of columns.
A primary key constraint automatically includes a unique key constraint.
There may be many unique key constraints for one table, but only one PRIMARY KEY
constraint for one table.
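A unique key can be sketched with Python's built-in sqlite3 module. How nulls are treated varies between DBMSs: the one-null rule described above matches systems such as SQL Server, whereas SQLite, used here, allows several nulls in a UNIQUE column. The table and the email column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,   -- the single primary key
        email   TEXT UNIQUE            -- an additional unique key
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'a@example.com')")

# A duplicate non-null value is rejected by the unique key.
try:
    conn.execute("INSERT INTO student VALUES (2, 'a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# SQLite treats nulls as distinct from each other, so both rows are accepted.
conn.execute("INSERT INTO student VALUES (3, NULL)")
conn.execute("INSERT INTO student VALUES (4, NULL)")
null_emails = conn.execute(
    "SELECT COUNT(*) FROM student WHERE email IS NULL"
).fetchone()[0]
```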
Alternate Key in SQL
Let's take the example of a student table that contains NAME, ROLL NO., ID and CLASS.
Here ROLL NO. is the primary key. If ID is also unique for each student, it is an alternate
key; NAME and CLASS can repeat, so they are not keys at all.
If a table has more than one candidate key, one of them becomes the primary key and the
rest are called alternate keys.
In simple words, any candidate key which is not the primary key is called an alternate
key. So an alternate key column is not the primary key, but its values are still unique.
SQL vs NoSQL
There are a lot of databases used in the industry today. Some are SQL databases, some are
NoSQL databases. The conventional SQL database system uses the tabular relational model
to represent data and their relationships. The newer NoSQL database provides a mechanism
for storing and retrieving data other than the tabular relations used in relational
databases.
Difference between DELETE and TRUNCATE statement in SQL
The main differences between SQL DELETE and TRUNCATE statements are given below:
The ACID properties are used to ensure that transactions are processed reliably in a
database system.
Atomicity: It requires that each transaction is all or nothing. If one part of the
transaction fails, the entire transaction fails and the database state is left unchanged.
Consistency: The consistency property ensures that the data meets all validation rules. In
simple words, a transaction never leaves the database in a half-finished state.
Isolation: This property ensures that transactions executing concurrently do not interfere
with each other. The main goal of providing isolation is concurrency control.
Durability: Durability simply means that once a transaction has been committed, it will
remain so, come what may, even power loss, crashes or errors.
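Atomicity can be sketched with Python's built-in sqlite3 module, where using the connection as a context manager commits on success and rolls back on an exception. The account table, its balances and the `transfer` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)   -- validation rule
    );
    INSERT INTO account VALUES (1, 100), (2, 50);
""")


def transfer(src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(1, 2, 30)     # succeeds: both updates are committed
second = transfer(1, 2, 500)   # debit breaks the CHECK, so the credit is undone too
balances = [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected; the rollback removes it, so no money appears from nowhere.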
Normalization
Normal Form Description
1NF A relation is in 1NF if every attribute contains only atomic values.
2NF A relation is in 2NF if it is in 1NF and all non-key attributes
are fully functionally dependent on the primary key.
3NF A relation is in 3NF if it is in 2NF and no transitive
dependency exists.
4NF A relation is in 4NF if it is in Boyce-Codd normal form and
has no multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, contains no join
dependency, and joining is lossless.
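Removing a transitive dependency, the step that takes a relation from 2NF to 3NF, can be sketched in plain Python. The rows and department names below are invented for illustration: stu_id determines dept_no, and dept_no in turn determines dept_name.

```python
# One relation with a transitive dependency: stu_id -> dept_no -> dept_name.
# The department name is repeated for every student in that department.
rows = [
    ("S001", "Akshay", "D10", "Physics"),
    ("S002", "Abhishek", "D10", "Physics"),
    ("S003", "Shashank", "D11", "Chemistry"),
]

# Decompose into two relations so each non-key attribute depends only on a key.
student = sorted({(stu_id, name, dept_no) for stu_id, name, dept_no, _ in rows})
department = sorted({(dept_no, dept_name) for _, _, dept_no, dept_name in rows})

# Joining the two relations on dept_no gives back the original rows,
# so the decomposition is lossless.
dept_lookup = dict(department)
rejoined = [
    (stu_id, name, dept_no, dept_lookup[dept_no])
    for stu_id, name, dept_no in student
]
```

After the split, each department name is stored once, so renaming a department means changing a single row.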
Normalization Rule
Normalization rules are divided into the following normal forms:
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,