DBMS Proper
DBMS
Database Management System (DBMS) is software for storing and retrieving users'
data while applying appropriate security measures. It allows users to create their
own databases as per their requirements. Examples are MySQL, Oracle, and MongoDB.
1. Processing queries and object management : We can store data directly in the form
of objects in a DBMS. With a file system, application-level code has to be written to handle,
store and scan through the data, whereas a DBMS gives us the ability to query the
database.
3. Efficient memory management and indexing : In file systems, files are indexed in
place of objects, so query operations require entire file scans, whereas a DBMS indexes
objects efficiently through the database schema, based on any attribute of the data.
This helps in fast retrieval of data based on the indexed attribute.
4. Concurrent access : Multiple users can access the database at the same time when
we are using the DBMS.
5. Security : Only authorised users should be allowed to access the database and their
identity should be authenticated using a username and password.
6. Backup and Recovery : Users don't need to back up data periodically because this
is taken care of by the DBMS. It also restores the database to its previous consistent
state after a crash or system failure.
TYPES OF DBMS ARCHITECTURE
1-Tier Architecture : The simplest database architecture, in which the client, server, and
database all reside on the same machine. A simple one-tier example is installing a database
on your own system and accessing it to practice SQL queries. Such architecture is rarely
used in production.
2-Tier Architecture : Two-tier architecture consists of two layers - the Client Layer and
the Database Layer. The application logic is either buried inside the user interface on
the client or within the database on the server (or both). It is easy to build and
maintain, but it is less secure because the client can communicate with the database
directly, and performance suffers on scaling. Example – Railway reservation system, bank
operations during a physical visit.
TYPES OF SQL COMMANDS
1. DDL(Data Definition Language) : It contains commands which are used to define the
database schema. Examples are -
CREATE - used to create the database or its objects (tables, indexes, constraints)
ALTER - used to alter the structure of the database
DROP - used to delete objects from the database
TRUNCATE - used to remove all records from a table
COMMENT - used to add comments to the data dictionary
RENAME - used to rename an object
3. DCL(Data Control Language) : It contains commands which are required to deal with
the user rights, permissions and controls of the database system. Examples are -
GRANT - It gives user access privileges to a database
REVOKE - It takes back permissions from the user
4. TCL(Transaction Control Language) : It contains commands which are required to
deal with transactions within the database, i.e. to manage the changes made by DML
statements. Examples are -
COMMIT - It saves the transaction or work done on the database
SAVEPOINT - It sets a point in a transaction to which you can later roll back
ROLLBACK - It restores the database to its state as of the last COMMIT
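A minimal combined sketch of these commands, assuming a hypothetical students table and a hypothetical report_user account (the exact GRANT syntax varies slightly between database systems):
CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(50));   -- DDL: create a table
ALTER TABLE students ADD COLUMN age INT;                        -- DDL: change its structure
GRANT SELECT ON students TO report_user;                        -- DCL: give a user read access
REVOKE SELECT ON students FROM report_user;                     -- DCL: take the access back
INSERT INTO students (id, name, age) VALUES (1, 'Asha', 20);    -- DML change inside a transaction
SAVEPOINT after_insert;                                         -- TCL: mark a point to roll back to
UPDATE students SET age = 21 WHERE id = 1;
ROLLBACK TO SAVEPOINT after_insert;                             -- TCL: undo only the UPDATE
COMMIT;                                                         -- TCL: make the remaining work permanent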
DATA ABSTRACTION
The process of hiding irrelevant details from users is known as data abstraction.
Data abstraction can be divided into 3 levels -
1. Physical or Internal Level : This is the lowest level of abstraction. It describes how the
data is actually stored, e.g. the files, indexes and storage blocks used.
2. Logical or Conceptual Level : This level describes what data is stored in the database
and the relationships among the data, while hiding the physical storage details.
3. External or View Level : This is the highest level of abstraction. It describes how the
data should be shown to the user and hides the details of the table schema and its
physical storage from the users.
INTEGRITY CONSTRAINTS
Integrity constraints are a set of rules which are used to maintain the quality of information
and to guard the database against accidental damage. Types are -
1. Domain Integrity : Every attribute in a table must have a defined domain, i.e. a finite
set of values which can be used. When we assign a datatype to a column, we
limit the values that it can contain. In addition, we can also restrict values as
per business rules, e.g. gender must be M or F.
2. Entity Integrity : Each table must have a column or a set of columns through
which we can uniquely identify a row. These columns cannot have NULL values. (No
primary key attribute can take a NULL value.)
3. Referential Integrity : A foreign key value must either be NULL or match an existing
value of the referenced key in the other table, so that references between tables never
point to non-existent rows.
4. Key Integrity : Also called the uniqueness constraint, since it ensures that
every tuple in the relation is unique. A relation can have multiple candidate keys,
out of which we choose one as the primary key, which must be unique and not null.
DBMS vs RDBMS
In a DBMS, data fetching is slower for a large amount of data, whereas in an RDBMS,
data fetching is fast because of the relational approach.
TYPES OF RELATIONSHIPS
1. One To One : Such a relationship exists when each record of one table is related to
only one record of the other table. For example, consider the two entities ‘Person’ and
‘Passport’: each person can have only one passport and each passport belongs to
only one person.
2. One To Many : Such a relationship exists when each record of one table can be related
to one or more records of the other table. This is the most common relationship.
For example, given the two entity types ‘Customer’ and ‘Account’, each ‘Customer’ can
have more than one ‘Account’, but each ‘Account’ is held by only one ‘Customer’.
3. Many To Many : Such a relationship exists when each record of the first table can
be related to one or more records of the second table, and a single record of
the second table can be related to one or more records of the first table. For
example, given the two entity types ‘Customer’ and ‘Product’, each customer
can buy more than one product and a product can be bought by many different
customers.
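A rough sketch of how these relationships are typically expressed with keys; the table and column names below are invented for illustration:
-- One-to-many: each account row points to exactly one customer,
-- while a customer can appear in many account rows.
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50)
);
CREATE TABLE account (
    account_no  INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customer(customer_id)
);
-- Many-to-many: customers and products are linked through a junction table.
CREATE TABLE product (
    product_id INT PRIMARY KEY,
    name       VARCHAR(50)
);
CREATE TABLE purchase (
    customer_id INT,
    product_id  INT,
    PRIMARY KEY (customer_id, product_id),
    FOREIGN KEY (customer_id) REFERENCES customer(customer_id),
    FOREIGN KEY (product_id)  REFERENCES product(product_id)
);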
KEYS IN DBMS
Candidate Key : The minimal set of attributes which can uniquely identify rows of a table.
There can be more than one candidate key in a table. One key amongst all candidate
keys can be chosen as the primary key.
Super Key : A set of attributes which uniquely identifies rows in a table. It is a
superset of a candidate key; every candidate key is a super key.
Primary Key : It is a column of a table or a set of columns that helps to identify every
record present in that table uniquely. There can be more than one candidate key in a
relation, out of which one is chosen as the primary key. There can be only one primary
key in a table, and it must be unique and not null.
Alternate/Secondary Key : All the candidate keys which are not chosen as primary keys
are considered as alternate keys.
Unique Key : A unique key is very similar to a primary key, except that a primary key
does not allow NULL values in the column while a unique key does, and a table can have
only one primary key but multiple unique keys.
Composite Key : A composite key refers to a combination of two or more columns that
can uniquely identify each row in a table.
Foreign Key : It is used to establish relationships between two tables. A foreign key
requires each value in a column or set of columns to match the primary key of the
referenced table. Foreign keys help to maintain data and referential integrity.
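A small sketch pulling these key types together; the department, employee and project_assignment tables and their columns are assumptions made for the example:
CREATE TABLE department (
    dept_id INT PRIMARY KEY                      -- primary key of the referenced table
);
CREATE TABLE employee (
    emp_id  INT PRIMARY KEY,                     -- primary key: unique and not null
    email   VARCHAR(100) UNIQUE,                 -- unique key: duplicates forbidden, NULL allowed
    dept_id INT,
    FOREIGN KEY (dept_id) REFERENCES department(dept_id)   -- foreign key
);
CREATE TABLE project_assignment (
    emp_id     INT,
    project_id INT,
    PRIMARY KEY (emp_id, project_id)             -- composite key: two columns together identify a row
);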
E-R MODEL
Entity Type : It is a collection of entities that have the same attributes. An entity type in
an ER diagram is defined by a name and a set of attributes; for example, a STUDENT entity
type with the attributes Roll_no, Student_name, Age and Mobile_no corresponds to the
STUDENT table.
Types -
1. Strong Entity Type : Those entity types which have a key attribute. The primary key
helps in identifying each entity uniquely. It is represented by a rectangle. In the above
example, Roll_no identifies each element of the table uniquely and hence, we can say that
STUDENT is a strong entity type.
2. Weak Entity Type : Those entity types which do not have a key attribute. A weak entity
type can't be identified on its own; it depends upon some other strong entity type for its
distinct identity. It is represented by a double-outlined rectangle. The relationship between
a weak entity type and a strong entity type is called an identifying relationship and is shown
with a double-outlined diamond instead of a single-outlined diamond. Example: Suppose
we have two tables, Customer(Customer_id, Name, Mobile_no, Age, Gender) and
Address(Locality, Town, State, Customer_id). Here we cannot identify an address uniquely
on its own, as there can be many customers from the same locality. So we need an attribute
of the strong entity type ‘Customer’ to uniquely identify entities of the ‘Address’ entity type.
Entity Set : It is the set of all the entities of a specific entity type in a database. For
example, the set of all students, all employees, all teachers, etc. each represent an entity
set. We can say that the entity type is a superset of the entity set, as all the entities are
included in the entity type.
SQL
Structured Query Language (SQL) is the standard language used to operate, manage, and
access relational databases; it is defined by ANSI/ISO standards rather than being owned by
any single vendor.
MySQL
MySQL is an open-source relational database management system that uses SQL. It is
developed, distributed, and supported by Oracle Corporation.
JOIN
A SQL Join statement is used to combine data or rows from two or more tables based on a
related column between them. Different types of Joins are :
1. INNER JOIN : selects all rows from both tables as long as the join condition is satisfied,
i.e. the value of the common field is the same in both.
2. LEFT JOIN : returns all the rows of the table on the left side of the join and the
matching rows of the table on the right side. For rows with no match on the right side,
the result-set contains NULL.
3. RIGHT JOIN : returns all the rows of the table on the right side of the join and the
matching rows of the table on the left side. For rows with no match on the left side,
the result-set contains NULL.
4. FULL JOIN : creates the result-set by combining the results of both LEFT JOIN and RIGHT
JOIN. The result-set contains all the rows from both tables; for rows with no match, it
contains NULL values.
5. SELF JOIN : used to join a table to itself as if it were two tables, temporarily
renaming at least one occurrence of the table in the SQL statement.
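Query sketches for these joins, assuming hypothetical Student(roll_no, name) and Course(course_id, roll_no, course_name) tables (FULL OUTER JOIN is not supported by every system):
SELECT s.name, c.course_name
FROM Student s INNER JOIN Course c ON s.roll_no = c.roll_no;       -- only matching rows

SELECT s.name, c.course_name
FROM Student s LEFT JOIN Course c ON s.roll_no = c.roll_no;        -- all students, NULL course columns when unmatched

SELECT s.name, c.course_name
FROM Student s RIGHT JOIN Course c ON s.roll_no = c.roll_no;       -- all courses, NULL student columns when unmatched

SELECT s.name, c.course_name
FROM Student s FULL OUTER JOIN Course c ON s.roll_no = c.roll_no;  -- all rows from both sides

SELECT a.name AS student, b.name AS other_student
FROM Student a JOIN Student b ON a.roll_no <> b.roll_no;           -- self join: the same table aliased twice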
VIEWS
Views in SQL are a kind of virtual table. A view has rows and columns just like a real
table in the database. We can create a view by selecting fields from one or more
tables present in the database. Syntax -
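A minimal sketch of the syntax; view_name, the column names, and the Student table below are placeholders:
CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;

-- For instance, a view exposing only two columns of a hypothetical Student table:
CREATE VIEW student_names AS
SELECT roll_no, name
FROM Student;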
Uses of a View :
• Summarize data from various tables which can be used to generate reports.
• Restrict access to the data in such a way that a user can see and (sometimes)
modify exactly what they need and no more.
• Limiting the visibility of columns (via select) or rows (via where) to just those
required for a particular task.
• Combining rows and/or columns from multiple tables into one logical table.
TRIGGER
A trigger is a stored procedure in a database which is automatically invoked whenever a
specified event occurs in the database. For example, a trigger can be invoked when a row is
inserted into a specified table or when certain table columns are being updated.
Syntax :
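A generic sketch of the CREATE TRIGGER syntax in MySQL-style SQL; the names are placeholders and the exact clauses vary between database systems:
CREATE TRIGGER trigger_name
{BEFORE | AFTER} {INSERT | UPDATE | DELETE}
ON table_name
FOR EACH ROW
BEGIN
    -- statements to run each time the trigger fires
END;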
Explanation of syntax :
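In this sketch, trigger_name identifies the trigger; BEFORE or AFTER decides whether it fires before or after the triggering statement; INSERT, UPDATE or DELETE names the event it reacts to; ON table_name binds it to a table; FOR EACH ROW makes it run once per affected row; and the BEGIN ... END block holds the statements to execute.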
Example :
Given a Student Report database in which students' marks assessments are recorded,
create a trigger so that the total marks are automatically inserted whenever a
record is inserted.
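A possible trigger for this, assuming the Student table has mark columns sub1, sub2 and sub3 plus total and per columns for the derived values (NEW refers to the row being inserted; all names here are assumptions for the sketch):
CREATE TRIGGER stud_marks
BEFORE INSERT ON Student
FOR EACH ROW
SET NEW.total = NEW.sub1 + NEW.sub2 + NEW.sub3,
    NEW.per   = (NEW.sub1 + NEW.sub2 + NEW.sub3) * 100 / 300;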
The above SQL statement creates a trigger in the student database: whenever subject
marks are entered, the trigger computes the two derived values before the row is inserted
and stores them along with the entered values.
SQL INJECTION
It is a code injection technique that might destroy your database. It is one of the most
common web hacking techniques. SQL injection usually occurs when you ask a user for
input, like their username/userid, and instead of a name/id, the user gives you an SQL
statement that you will unknowingly run on your database.
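A small illustration of the idea; the Users table, its UserId column, and the input values are made up. Suppose the application builds the query by pasting user input directly into the SQL text:
-- Intended statement when the user supplies the id 105:
SELECT * FROM Users WHERE UserId = 105;
-- If the user instead types "105 OR 1=1", the concatenated statement becomes:
SELECT * FROM Users WHERE UserId = 105 OR 1=1;
-- 1=1 is always true, so this returns every row of the Users table.
The usual defence is to pass user input through parameterized queries (prepared statements) instead of concatenating it into the SQL string.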
SUBQUERY
A Subquery (also called an Inner query or a Nested query) is a query within another SQL
query, usually embedded within the WHERE clause. Subqueries must be enclosed within
parentheses. A subquery returns data that is used in the main query as a condition to
further restrict the data to be retrieved. Example -
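A small sketch, assuming a hypothetical EMPLOYEE table with EMP_NAME and SALARY columns:
-- The inner query computes the average salary; the outer query keeps only the rows above it.
SELECT EMP_NAME, SALARY
FROM EMPLOYEE
WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE);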
DELETE vs TRUNCATE
1. The DELETE command is used to delete specified rows (one or more), while TRUNCATE
is used to delete all the rows from a table.
4. In the DELETE command, a tuple is locked before it is removed, while in TRUNCATE the
data page is locked before the table data is removed.
CURSOR
1. Implicit Cursors : are automatically created by the database system when DML
statements (INSERT, UPDATE, DELETE) or single-row SELECT statements are executed;
the user cannot control or name them.
2. Explicit Cursors : are created by users whenever they require them. They are used
for fetching data from a table in a row-by-row manner; a full lifecycle sketch appears at
the end of this section.
Example : OPEN s1
There are a total of 6 methods to access data from a cursor. They are as follows :
FIRST is used to fetch only the first row from cursor table.
LAST is used to fetch only last row from cursor table.
NEXT is used to fetch data in forward direction from cursor table.
PRIOR is used to fetch data in backward direction from cursor table.
ABSOLUTE n is used to fetch the exact nth row from cursor table.
RELATIVE n is used to fetch the data in incremental way as well as decremental way.
Syntax :
FETCH NEXT/FIRST/LAST/PRIOR/ABSOLUTE n/RELATIVE n FROM cursor_name
Example :
FETCH FIRST FROM s1
FETCH LAST FROM s1
FETCH NEXT FROM s1
FETCH PRIOR FROM s1
FETCH ABSOLUTE 7 FROM s1
FETCH RELATIVE -2 FROM s1
Example : CLOSE s1
Example : DEALLOCATE s1
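Putting the steps together, a sketch of a full cursor lifecycle in SQL Server-style syntax (the student table and its columns are assumptions):
DECLARE s1 SCROLL CURSOR FOR            -- declare the cursor over a query
    SELECT roll_no, name FROM student;
OPEN s1;                                -- open it so rows can be fetched
FETCH FIRST FROM s1;                    -- fetch rows one at a time
FETCH NEXT FROM s1;
FETCH ABSOLUTE 3 FROM s1;               -- jump straight to the third row
CLOSE s1;                               -- release the result set
DEALLOCATE s1;                          -- remove the cursor definition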
INDEXING IN DBMS
Indexing is a way to optimize the performance of a database by minimizing the number of
disk accesses required when a query is processed. It is a data structure technique which is
used to quickly retrieve records from database.
An index is a small table having only two columns. The first column contains a copy of the
primary or candidate key of the table (the search key). The second column contains a set
of pointers holding the address of the disk block where that specific key value can be
found.
Types of Indexing
1. Primary Index : The default format of indexing. A primary index is an ordered file of
fixed-length records with two fields: the first field is the primary key and the second field
is a pointer to the corresponding data block.
The primary Indexing in DBMS is also further divided into two types :
• Dense Index : The dense index contains an index record for every search key value in
the data file, which makes searching faster. The number of records in the index table is
the same as the number of records in the main table.
• Sparse Index : The index record appears only for a few items present in the
data file. Each entry in the index table points to a block.
To locate a record, we find the index record with the largest search key value less than or
equal to the search key value we are looking for. We start at the record pointed to by that
index record and follow the pointers in the file (that is, sequentially) until we find the
desired record.
2. Clustered Index : It is defined on an ordered data file. The data file is ordered on a non-
key field. In some cases, the index is created on non-primary key columns which may not
be unique for each record. In such cases, in order to identify the records faster, we will group
two or more columns together to get the unique values and create index out of them. This
method is known as the clustering index. Basically, records with similar characteristics are
grouped together and indexes are created for these groups.
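In SQL, additional indexes on chosen columns are usually created explicitly; a minimal sketch with a hypothetical Employee table:
CREATE INDEX idx_employee_city ON Employee (city);   -- speeds up queries that filter or join on city
The indexed column becomes the search key; the database maintains the pointer entries automatically.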
FUNCTIONAL DEPENDENCY
A functional dependency X → Y means that the value of attribute set X determines the value
of attribute set Y. For example, in an Employee table, if we know the value of the Employee
number, we can obtain the Employee Name, city, salary, etc. We can therefore say that city,
Employee Name, and salary are functionally dependent on the Employee number.
NORMALIZATION
Normalization is the process of organizing the data in a database so as to minimize redundancy.
Purpose of Normalization : to eliminate redundant data, to ensure data dependencies make
sense, and to avoid insertion, update, and deletion anomalies.
Types of Normalization : 1NF, 2NF, 3NF, and BCNF, described below.
1. First Normal Form (1NF) : A relation is in 1NF if every attribute in that relation is a
single-valued attribute. First normal form disallows multi-valued attributes and composite
attributes.
Example : Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
2. Second Normal Form (2NF) : A table or relation must be in 1NF and all the non-primary-key
attributes should be fully functionally dependent on the primary key. It applies to relations
with composite keys, that is, relations with a primary key composed of two or more attributes.
Example : Let's assume a school stores the data of teachers and the subjects they teach,
and a teacher can teach more than one subject. In a single TEACHER table with TEACHER_ID,
SUBJECT and TEACHER_AGE, the candidate key is {TEACHER_ID, SUBJECT}, but TEACHER_AGE
depends only on TEACHER_ID, so the table is not in 2NF. To bring it into 2NF, we decompose
it into the following two tables.
TEACHER_DETAIL
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
3. Third Normal Form (3NF) : A table or relation must be in 2NF and there should be no
transitive dependency of non-prime attributes on the primary key.
Example : Consider an EMPLOYEE table with the attributes EMP_ID, EMP_NAME, EMP_ZIP,
EMP_STATE and EMP_CITY, where EMP_ID → EMP_ZIP and EMP_ZIP → {EMP_STATE, EMP_CITY}.
EMP_STATE and EMP_CITY are transitively dependent on the primary key EMP_ID, so the table
is not in 3NF. That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
4. Boyce-Codd Normal Form (BCNF) : BCNF is the advanced version of 3NF and is stricter
than 3NF. A table is in BCNF if, for every functional dependency X → Y, X is a super key of
the table.
Example : Let's assume there is a company where employees work in more than one
department, with a table EMPLOYEE(EMP_ID, EMP_COUNTRY, EMP_DEPT, DEPT_TYPE,
EMP_DEPT_NO) and the functional dependencies EMP_ID → EMP_COUNTRY and
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}. The candidate key is {EMP_ID, EMP_DEPT}.
The table is not in BCNF because the determinants of these functional dependencies,
EMP_ID and EMP_DEPT, are not by themselves candidate keys.
To convert the given table into BCNF, we decompose it into three tables :
EMP_COUNTRY table :
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table :
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table :
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Candidate Keys -
For the first table : EMP_ID
For the second table : EMP_DEPT
For the third table : {EMP_ID, EMP_DEPT}
Now the relations are in BCNF because the left-hand side of each functional dependency
is a candidate key of its table.
DENORMALIZATION
When we normalize tables, we break them into multiple smaller tables, so retrieving data
that spans them requires join operations. Denormalization is a technique used to eliminate
this drawback of normalization.
So, Denormalization is a database optimization technique in which we add redundant data
to one or more tables.
For example, in a normalized database, we might have a Courses table and a Teachers
table. Each entry in Courses would store the teacherID for a Course but not the
teacherName. When we need to retrieve a list of all Courses with the Teacher name, we
would do a join between these two tables. The drawback is that if the tables are large, we
may spend an unnecessarily long time doing joins. In this case, we can denormalize the
database: accept some redundancy and extra maintenance effort in exchange for the
efficiency benefit of fewer joins, as sketched below.
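A sketch of the trade-off using the hypothetical Courses and Teachers tables from the example above (column names other than teacherID and teacherName are assumptions):
-- Normalized design: the teacher's name must be fetched with a join.
SELECT c.course_name, t.teacherName
FROM Courses c JOIN Teachers t ON t.teacherID = c.teacherID;

-- Denormalized design: teacherName is copied into Courses, so no join is needed,
-- at the cost of keeping the redundant copy in sync when a teacher's name changes.
ALTER TABLE Courses ADD COLUMN teacherName VARCHAR(50);
SELECT course_name, teacherName FROM Courses;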
TRANSACTION
A transaction can be defined as a group of tasks. A single task is the minimum processing unit
which cannot be divided further.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's
account to B's account. This very simple and small transaction involves several low-level tasks.
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
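Expressed in SQL, the same transfer can be wrapped in one transaction so that both updates either succeed together or are rolled back together (the accounts table is hypothetical; MySQL uses START TRANSACTION instead of BEGIN TRANSACTION):
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';   -- debit A
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';   -- credit B
COMMIT;   -- or ROLLBACK if any step fails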
STATES OF TRANSACTION
1. Active State : This is the first state in the life cycle of a transaction. A transaction is
called in an active state as long as its instructions are getting executed. All the changes
made by the transaction now are stored in the buffer in main memory.
2. Partially Committed State : After the last instruction of the transaction has been
executed, it enters a partially committed state. The transaction is not yet considered
fully committed because all the changes made by it are still stored in the buffer in main
memory.
3. Committed State : After all the changes made by the transaction have been
successfully stored into the database, it enters into a committed state. After a transaction
has entered the committed state, it is not possible to roll back the transaction.
4. Failed State : When a transaction is getting executed in the active state or partially
committed state and some failure occurs due to which it becomes impossible to continue
the execution, it enters into a failed state.
5. Aborted State : After the transaction has failed and entered into a failed state, all the
changes made by it have to be undone. To undo the changes made by the transaction, it
becomes necessary to roll back the transaction. After the transaction has rolled back
completely, it enters into an aborted state.
6. Terminated State : This is the last state in the life cycle of a transaction.
After entering the committed state or aborted state, the transaction finally enters into a
terminated state where its life cycle finally comes to an end.
ACID PROPERTIES
In order to maintain consistency in a database, before and after the transaction, certain
properties are followed. These are called ACID properties.
1. Atomicity : By this, we mean that either the entire transaction takes place at once or
doesn’t happen at all. There is no midway i.e. transactions do not occur partially. Atomicity
is the main focus in the bank systems.
Example – if person A having $30 in his account wishes to send $10 to person B’s
account. Suppose in account B, a sum of $100 is already present. Now, there will be two
operations that will take place. One is the deduction of $10 from account A and second is
the addition of $10 to account B. Now, suppose the first operation of debit executes
successfully, but the credit operation fails. Then the balance in account A becomes $20
while the balance in account B remains $100, leaving the database in an inconsistent,
partially updated state. Atomicity requires that in such a case the entire transaction is
rolled back so that no partial effects remain.
2. Consistency : This means that integrity constraints must be maintained so that the
database is consistent before and after the transaction. It refers to the correctness of a
database.
Example – Suppose there are three accounts, A, B, and C, where A makes transfers one by
one to both B and C. Each transfer involves two operations, a debit and a credit. Account A
first transfers $50 to account B; before this transfer, B reads A's balance as $300. After the
successful transfer T, the available amount in B becomes $150. Now A transfers $20 to
account C, and at that time C reads A's balance as $250 (which is correct, since the debit of
$50 to B has already completed). The debit and credit operations from account A to C are
then done successfully. The transactions complete successfully and the values read are
correct, so the data is consistent. If, instead, B and C both read A's balance as $300, the
data would be inconsistent, because the debit that has already executed would not be
reflected.
3. Isolation : This property ensures that multiple transactions can occur concurrently
without leading to the inconsistency of database state. Transactions occur independently
without interference. Changes occurring in a particular transaction will not be visible to any
other transaction until that particular change in that transaction has been committed.
Example - If two transactions are running concurrently on two different accounts, the
value of neither account should be affected by the other. For instance, if account A makes
transactions T1 and T2 to accounts B and C, both execute independently without affecting
each other.
4. Durability : This property ensures that once the transaction has completed execution,
the updates and modifications to the database are written to disk and persist even if a
system failure occurs. These updates become permanent and are stored in non-volatile
memory.