Database Management System
Basic Concept:
A DBMS allows users to create their own databases as per their requirements. The term “DBMS”
covers the users of the database and other application programs. It provides an interface between
the data and the software application.
We need to specify the structure of the records of each file by defining the different types of
data elements to be stored in each record.
We can also use a coding scheme to represent the values of a data item.
Basically, your Database will have 5 tables with a foreign key defined amongst the various
tables.
MySQL
Microsoft Access
Oracle
PostgreSQL
dBASE
FoxPro
SQLite
IBM DB2
LibreOffice Base
MariaDB
Microsoft SQL Server etc.
Application of DBMS
An entity has real-world properties called attributes. Each attribute is defined by a set of
values known as its domain. For example, in an office the employee is an entity, the office is the
database, and employee ID and name are the attributes. The logical association between
different entities is known as the relationship among them.
3. Relational Data Model
The most popular and extensively used data model is the relational data model. The data model
allows the data to be stored in tables called relations. The relations are normalized, and the
values in a normalized relation are known as atomic values. Each row in a relation is called a
tuple, and each tuple is unique. The attributes are the columns; all values in a column
come from the same domain.
Database Language
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.
o DDL stands for Data Definition Language. It is used to define database structure or pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of
tables and schemas, their names, indexes, columns in each table, constraints, etc.
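The DDL points above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; the table and index names are hypothetical, and other DBMSs expose their metadata through different catalog tables.

```python
import sqlite3

# In-memory SQLite database; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")

# DDL: define the structure (the "skeleton") before any data exists.
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        city     TEXT
    )
""")
conn.execute("CREATE INDEX idx_employee_city ON employee (city)")

# The DBMS stores these definitions as metadata. SQLite exposes them
# through its built-in sqlite_master catalog table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)
```

The catalog query returns one row for the table and one for the index, which is the kind of metadata the last bullet describes.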
These commands are used to update the database schema, which is why they come under the data
definition language.
(But in Oracle database, the execution of data control language does not have the feature
of rolling back.)
There are the following operations which have the authorization of Revoke:
DBMS Architecture
The architecture of a DBMS depends on the computer system on which it runs. For example, in a
client-server DBMS architecture, the database system on the server machine can serve several
requests made by client machines. We will understand this communication with the help of
diagrams.
For example, let's say you want to fetch employee records from the database and the
database is on your own computer. The request to fetch employee details is made by your
computer, and the records are fetched from the database by your computer as
well. This type of system is generally referred to as a local database system.
In two-tier architecture, the database system is present on the server machine and the DBMS
application is present on the client machine. These two machines are connected to each other
through a reliable network, as shown in the above diagram.
Whenever the client machine makes a request to access the database on the server using a query
language like SQL, the server performs the request on the database and returns the result to
the client. Application connection interfaces such as JDBC and ODBC are used for the interaction
between server and client.
Database Users
Database users are the ones who actually use the database and benefit from it. There are
different types of users depending on their needs and their way of accessing the database.
1. Application Programmers – They are the developers who interact with the database
by means of DML queries. These DML queries are written in application programs in languages like
C, C++, Java, Pascal etc. These queries are converted into object code to communicate
with the database. For example, writing a C program to generate a report of the employees
who work in a particular department will involve a query to fetch the data from the
database. It will include an embedded SQL query in the C program.
2. Sophisticated Users – They are database developers who write SQL queries to
select/insert/delete/update data. They do not use any application or program to query
the database. They interact with the database directly by means of a query language like
SQL. These users are scientists, engineers and analysts who thoroughly study SQL and
DBMS to apply the concepts to their requirements. In short, we can say this category
includes designers and developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users, but they write special database
application programs. They are the developers who develop complex programs to meet a
requirement.
4. Stand-alone Users – These users have a stand-alone database for their personal
use. These kinds of databases come as ready-made database packages with
menus and graphical interfaces.
5. Naive Users – These are the users who use an existing application to interact with the
database. For example, online library systems, ticket booking systems, ATMs etc. have an
existing application, and users use it to interact with the database to fulfill their requests.
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at one level of
the database system without altering the schema at the next higher level.
ER model
o ER model stands for Entity-Relationship model. It is a high-level data model. This model
is used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and
easy-to-design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-
relationship diagram.
For example, suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street
name, pin code, etc., and there will be a relationship between them.
Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is
represented as a rectangle.
Consider an organization as an example: manager, product, employee, department etc. can be
taken as entities.
a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any
key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and the ellipses of its component attributes are connected to that ellipse.
c. Multivalued Attribute
An attribute can have more than one value. Such an attribute is known as a multivalued attribute.
A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute such as
date of birth.
3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus is used to represent the
relationship.
a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it is known as a one-to-one
relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity on the
right are associated with the relationship, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by one specific
scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the
right are associated with the relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the entity on
the right are associated with the relationship, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
Types of relationships
There are 3 types of relationships. One to One, One to Many, Many to Many.
One to One: One record in the first table relates to only one record in the second
table and vice versa. Here you may ask: if it is a one-to-one relationship, why do
we not store the data in one table rather than having two separate tables? One
reason is security. Let us say we want to store
name, email, address, contact, and password. Here, the password is
very sensitive, and therefore we can store
it in a separate table so that only certain people with access to that table can see
it.
One to Many: This is the most common type of relationship. One record in the first table
relates to many records in the second table, but one record of the second table can only
relate to one record of the first table. For example, we may have a one-to-many relationship
between a person and bank accounts, where one person can have many bank accounts
but a bank account can only have one specific owner (assuming joint bank accounts are
not allowed).
Many to Many: One record in the first table relates to many records in the second table
and vice versa. Generally, we break one many-to-many relation down into two one-to-many
relations in logical design, and the intermediary table is referred to as a junction table. An
example would be student and course, where one student can take many courses and
each course can be taken by many students.
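The junction-table idea can be sketched as follows, assuming Python's built-in sqlite3 module. The table names, column names and sample rows are all hypothetical; the point is only that the many-to-many link becomes two one-to-many links through the enrollment table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (student, course) pair.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course  VALUES (10, 'DBMS'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
courses_of_asha = [row[0] for row in conn.execute("""
    SELECT c.title FROM course c
    JOIN enrollment e ON e.course_id = c.course_id
    WHERE e.student_id = 1 ORDER BY c.title
""")]
# One course, many students.
students_in_dbms = [row[0] for row in conn.execute("""
    SELECT s.name FROM student s
    JOIN enrollment e ON e.student_id = s.student_id
    WHERE e.course_id = 10 ORDER BY s.name
""")]
print(courses_of_asha, students_in_dbms)
```

The composite primary key on the junction table also prevents the same student from being enrolled in the same course twice.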
With so much information available to companies, investing in a database management system
is of critical importance for brands across all sectors and groups. Today, virtually all companies
and brands run on database systems. These storehouses of organised information help brands
store information of all kinds, which they can not only sort but also make available at the click of
a mouse. In short, database management systems help brands track every part of their
business in a faster, more effective and more efficient way than ever before.
Structural Constraints:
Structural constraints are also called structural properties of a database management system
(DBMS). Cardinality ratios and participation constraints taken together are called structural
constraints. The name refers to the fact that such limitations must be imposed on the
data for the DBMS to be consistent with the requirements.
Structural constraints are represented by min-max notation. This is a pair of numbers (m, n)
that appears on the connecting line between an entity and its relationship. The minimum
number of times an entity can appear in a relationship is represented by m, whereas the maximum
number of times is denoted by n. If m is 0, the entity participates in the relationship
partially, whereas if m is greater than or equal to 1, the entity participates totally.
Extended ER Modeling Features
Specialization – The process of designating subgroupings within an entity set is called
specialization. In the above figure, “person” is distinguished into “employee”
or “customer”.
Formally, in the above figure specialization is depicted by a triangle component labelled ISA (is a), meaning
that the customer is a person.
Sometimes this ISA (is a) is referred to as a superclass-subclass relationship. It is also used to
emphasize the creation of distinct lower-level entity sets.
Generalization – Generalization is a relationship that exists between a higher-level entity set and
one or more lower-level entity sets. Generalization synthesizes these entity sets into a single
entity set.
Higher-level and lower-level entity sets – This property is created by specialization and
generalization. The attributes of higher-level entity sets are inherited by lower-level entity sets.
For example, in the above figure “customer” and “employee” inherit the attributes of “person”.
Attribute inheritance: When a given entity set is involved as a lower-level entity set in only one ISA
(is a) relationship, it is referred to as single attribute inheritance. If a lower-level entity set is involved in
more than one ISA (is a) relationship, it is referred to as multiple attribute inheritance.
Aggregation: One limitation of the E-R model is that it cannot express relationships
among relationships. Aggregation is an abstraction through which relationships are treated
as higher-level entities.
A schema diagram can display only some aspects of a schema, like the name of the record type, data
type, and constraints. Other aspects can't be specified through the schema diagram. For example,
the given figure shows neither the data type of each data item nor the relationships among the various
files.
In the database, the actual data changes quite frequently. For example, in the given figure, the
database changes whenever we add a new grade or add a student. The data at a particular
moment in time is called the instance of the database.
Relational Model
Relational Model (RM) represents the database as a collection of relations. A relation is nothing
but a table of values. Every row in the table represents a collection of related data values. These
rows in the table denote a real-world entity or relationship.
The table name and column names are helpful to interpret the meaning of values in each row. The
data are represented as a set of relations. In the relational model, data are stored as tables.
However, the physical storage of the data is independent of the way the data are logically
organized.
Rule zero
This rule states that for a system to qualify as an RDBMS, it must be able to manage the database
entirely through its relational capabilities.
Relational Algebra
Relational algebra is a widely used procedural query language. It takes instances of
relations as input and gives instances of relations as output. It uses various operations to
perform this action. Relational algebra operations are performed recursively on a
relation. The output of these operations is a new relation, which might be formed from one or more
input relations.
SELECT (σ)
The SELECT operation is used for selecting the subset of tuples that satisfy a given selection
condition (predicate). It is denoted by the symbol sigma (σ).
σp(r)
σ is the selection operator
p is the predicate, a propositional logic formula
r is the relation
Projection(π)
The projection eliminates all attributes of the input relation except those mentioned in the projection
list. The projection defines a relation that contains a vertical subset of the input relation.
It extracts the values of the specified attributes and eliminates duplicate values. The symbol pi (π) is
used to choose attributes from a relation. This operator keeps specific columns from a
relation and discards the other columns.
Example of Projection:
1 Google Active
2 Amazon Active
3 Apple Inactive
4 Alibaba Active
Intersection
An intersection is defined by the symbol ∩
A∩B
Defines a relation consisting of the set of all tuples that are in both A and B. However, A and B must
be union-compatible.
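The three operators above can be illustrated with a small sketch in plain Python, where a relation is modelled as a set of tuples. The four rows come from the projection example; the column names (id, name, status) are assumptions, since the table header is not shown in the text.

```python
# Relations modelled as sets of tuples; column names are hypothetical.
COLUMNS = ("id", "name", "status")
customers = {
    (1, "Google", "Active"),
    (2, "Amazon", "Active"),
    (3, "Apple", "Inactive"),
    (4, "Alibaba", "Active"),
}

def select(relation, predicate):
    """sigma_p(r): keep the tuples that satisfy the predicate p."""
    return {t for t in relation if predicate(dict(zip(COLUMNS, t)))}

def project(relation, attrs):
    """pi_attrs(r): keep the listed columns; the set drops duplicates."""
    idx = [COLUMNS.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}

active = select(customers, lambda row: row["status"] == "Active")
low_ids = select(customers, lambda row: row["id"] <= 2)
statuses = project(customers, ["status"])
both = active & low_ids  # intersection of two union-compatible relations
print(sorted(active), statuses, sorted(both))
```

Because a projection result is itself a set, the three "Active" rows collapse into a single tuple, which is the duplicate elimination described above.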
Relational Calculus
In contrast to relational algebra, which is a procedural query language that also
explains how data is fetched, relational calculus is a non-procedural query language and gives no
description of how the query will work or how the data will be fetched. It focuses only on what to do,
not on how to do it.
Relational Calculus exists in two forms:
SQL | Datatypes
1. Binary Datatypes :
There are four subtypes of this datatype which are given below :
6. Date and Time Datatype: The details are given in the table below.
SQL: Literals
String Literals
String literals are always surrounded by single quotes (').
For example:
'TechOnTheNet.com'
'This is a literal'
'XYZ'
'123'
Integer Literals
Integer literals can be either positive numbers or negative numbers, but do not contain decimals. If
you do not specify a sign, then a positive number is assumed. Here are some examples of valid
integer literals:
536
+536
-536
Decimal Literals
Decimal literals can be either positive numbers or negative numbers and contain decimals. If you
do not specify a sign, then a positive number is assumed. Here are some examples of valid
decimal literals:
24.7
+24.7
-24.7
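As a quick check of the string, integer and decimal literal forms above, the following sketch asks SQLite (through Python's built-in sqlite3 module) which storage type it assigns to each literal. typeof() is a SQLite function; other DBMSs name their types differently, so the type names here are specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A quoted '123' is a string literal even though it looks like a number.
row = conn.execute(
    "SELECT typeof('123'), typeof(536), typeof(-536), typeof(24.7), typeof(-24.7)"
).fetchone()
print(row)
```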
Datetime Literals
Datetime literals are character representations of datetime values that are enclosed in single
quotes. Here are some examples of valid datetime literals:
In order to make changes to the physical structure of any table in a
database, DDL is used. In many systems these commands auto-commit when executed, so
the changes to the table are reflected and saved immediately. DDL commands include:
Once the tables are created and the database is generated using DDL commands,
manipulation inside those tables and databases is done using DML commands. The
advantage of using DML commands is that if any wrong changes or values are made,
they can be changed and rolled back easily. DML commands include:
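The rollback property of DML can be sketched as follows, assuming Python's built-in sqlite3 module in its default transaction mode. The employee table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 20000)")
conn.commit()  # make the insert permanent

# A mistaken DML statement: the salary is set to the wrong value.
conn.execute("UPDATE employee SET salary = 2 WHERE emp_id = 1")
conn.rollback()  # undo the uncommitted UPDATE

salary = conn.execute(
    "SELECT salary FROM employee WHERE emp_id = 1"
).fetchone()[0]
print(salary)
```

Only work done since the last commit is undone, so the committed INSERT survives the rollback.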
DCL commands, as the name suggests, manage the matters and issues related to data
control in any database. DCL commands mainly provide special privilege access to users
and are also used to specify the roles of users accordingly. There are two commonly used
DCL commands; these are:
SQL Operators
SQL statements generally contain some reserved words or characters that are used to perform
operations such as comparison and arithmetical operations etc. These reserved words or
characters are known as operators.
Example:
Operator  Description                                                      Example
-         Subtracts the right operand from the left operand                a-b will give -50
/         Divides the left operand by the right operand                    b/a will give 2
%         Divides the left operand by the right operand and returns the    b%a will give 0
          remainder
=         True if both operands are equal                                  (a=b) is not true
!=        True if the operands are not equal                               (a!=b) is true
<>        True if the operands are not equal                               (a<>b) is true
>         True if the left operand is greater than the right operand       (a>b) is not true
<         True if the left operand is less than the right operand          (a<b) is true
>=        True if the left operand is greater than or equal to the right   (a>=b) is not true
<=        True if the left operand is less than or equal to the right      (a<=b) is true
!<        True if the left operand is not less than the right operand      (a!<b) is not true
!>        True if the left operand is not greater than the right operand   (a!>b) is true
Operator  Description
ALL       Compares a value to all values in another value set.
AND       Allows multiple conditions to be combined in an SQL statement.
ANY       Compares a value to any value in a list according to the condition.
BETWEEN   Searches for values that lie within a range, given its minimum and maximum.
NOT       Reverses the meaning of any logical operator.
LIKE      Compares a value to similar values using wildcard operators.
SQL Table
o SQL Table is a collection of data which is organized in terms of rows and columns. In
DBMS, the table is known as relation and row as a tuple.
o Table is a simple form of data storage. A table is also considered as a convenient
representation of relations.
In the above table, "EMPLOYEE" is the table name, "EMP_ID", "EMP_NAME", "CITY",
"PHONE_NO" are the column names. The combination of data of multiple columns forms a row,
e.g., 1, "Kristen", "Washington" and 7289201223 are the data of one row.
Operation on Table
1. Create table
2. Drop table
3. Delete table
4. Rename table
Syntax
Example
Drop table
The SQL DROP TABLE command is used to delete a table definition and all the data in the table. When this command is executed, all
the information in the table is lost forever, so you have to be very careful while using this command.
Syntax
Firstly, you need to verify the EMPLOYEE table using the following command:
This table shows that EMPLOYEE table is available in the database, so we can drop it as follows:
Now, we can check whether the table exists or not using the following command:
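The verify, drop, verify sequence described above can be sketched as follows. This is only an illustration in SQLite through Python's sqlite3 module; the commands used in the original example are not shown in the text, and other DBMSs provide different ways to check whether a table exists. The helper table_exists is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT, CITY TEXT, PHONE_NO TEXT)"
)

def table_exists(connection, name):
    """Look the table up in SQLite's catalog (sqlite_master)."""
    row = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None

before = table_exists(conn, "EMPLOYEE")
conn.execute("DROP TABLE EMPLOYEE")  # removes the definition and all rows
after = table_exists(conn, "EMPLOYEE")
print(before, after)
```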
Syntax
Example
Unique and non-unique indexes: When you create an index, you may allow the indexed columns
to contain duplicate values; the index will still list all of the rows with duplicates. You may also
specify that values in the indexed columns must be unique, just as they must be with a primary
key. In fact, when you create a primary key constraint on a table, Oracle and most other systems
will automatically create a unique index on the primary key columns, as well as not allowing null
values in those columns. One good reason for you to create a unique index on non-primary key
fields is to enforce the integrity of a candidate key, which otherwise might end up having
(nonsense) duplicate values in different rows.
Queries versus insertion/update: It might seem as if you should create an index on every column
or group of columns that will ever be used in an ORDER BY clause (for example: lastName,
firstName). However, each index will have to be updated every time that a row is inserted or a
value in that column is updated. Although index structures such as B or B+ trees allow this to
happen very quickly, there still might be circumstances where too many indexes would detract
from overall system performance. This and similar issues are often covered in more advanced
courses.
Syntax: As you would expect by now, the SQL to create an index is:
To specify sort order, add the keyword ASC or DESC after each column name, just as you would
do in an ORDER BY clause.
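The difference between a unique and a non-unique index can be sketched as follows, assuming SQLite through Python's sqlite3 module. The person table, its email column and the index names are hypothetical; lastName and firstName follow the ORDER BY example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, lastName TEXT, firstName TEXT, email TEXT)"
)

# Non-unique index with an explicit sort order per column.
conn.execute("CREATE INDEX idx_person_name ON person (lastName ASC, firstName DESC)")
# Unique index that enforces a candidate key (email is a hypothetical column).
conn.execute("CREATE UNIQUE INDEX idx_person_email ON person (email)")

conn.execute("INSERT INTO person VALUES (1, 'Codd', 'Edgar', 'codd@example.com')")
# Duplicate names are accepted by the non-unique index.
conn.execute("INSERT INTO person VALUES (2, 'Codd', 'Edgar', 'other@example.com')")
try:
    # A duplicate email violates the unique index.
    conn.execute("INSERT INTO person VALUES (3, 'Boyce', 'Raymond', 'codd@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```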
What is QUERY?
A query is an operation that retrieves data from one or more tables or views.
SELECT statement can be used for retrieving the data from various tables in a database.
Example:
<Employee> Table
Output:
Ename
ABC
PQR
Example:
SELECT DISTINCT city FROM Employee;
Output:
City
Bangalore
Mumbai
Pune
4. SELECT using IN
'IN' determines whether a specified value matches any value in a sub-query or a list.
Example:
SELECT Eid, Ename FROM Employee
WHERE Salary IN (5000, 20000);
Output:
Eid Ename
E001 ABC
E003 LMN
Example:
SELECT Eid, Ename, Salary FROM Employee
WHERE Salary BETWEEN 5000 AND 30000;
Output:
NOT BETWEEN
Example:
SELECT Eid, Ename, Age FROM Employee
WHERE Age NOT BETWEEN 24 AND 25;
Output:
E001 ABC 29
E002 PQR 30
E005 STU 32
The LIKE clause is used for comparing a value with similar values using wildcard operators (% and _ ).
For example, if you want the names that start with 'S', use the LIKE clause as follows:
Example:
SELECT Ename, City, Salary FROM Employee
WHERE Ename LIKE 'S%';
Output:
A sub-query is an inner query within another query. It returns data that the main query uses as a
condition to retrieve data.
Sub-queries are nested SELECT statements.
It is a query within a query.
Sub-queries mostly appear within the WHERE or HAVING clause of another SQL statement.
A sub-query is defined by another SELECT statement with a FROM clause and optional WHERE, GROUP
BY and HAVING clauses.
It produces a single column of data as its result.
In a sub-query, the ORDER BY clause cannot be specified. The ORDER BY clause is specified in the
main query.
A sub-query is always enclosed in parentheses.
It cannot be a UNION; only a single SELECT statement is allowed.
In a sub-query, 'SELECT *' cannot be used unless the referred table has only one column. The
nested query is evaluated first.
Example:
SELECT Ename, Salary FROM Employee
WHERE Salary IN
(SELECT MAX (Salary) FROM Employee);
Output:
Ename Salary
PQR 30000
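The queries in this section can be run end to end with the sketch below, which uses SQLite through Python's sqlite3 module. The Employee table itself is not reproduced in the text, so every row here is hypothetical, chosen only to agree with the outputs shown above (for instance, XYZ and all salary and city assignments are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Eid TEXT PRIMARY KEY, Ename TEXT, Age INTEGER, "
    "City TEXT, Salary INTEGER)"
)
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    # Hypothetical rows, consistent with the outputs quoted in the text.
    ("E001", "ABC", 29, "Bangalore", 20000),
    ("E002", "PQR", 30, "Mumbai", 30000),
    ("E003", "LMN", 25, "Pune", 5000),
    ("E004", "XYZ", 24, "Mumbai", 4000),
    ("E005", "STU", 32, "Bangalore", 2500),
])

def run(sql):
    return conn.execute(sql).fetchall()

in_list = run("SELECT Eid, Ename FROM Employee WHERE Salary IN (5000, 20000) ORDER BY Eid")
not_between = run(
    "SELECT Eid, Ename, Age FROM Employee WHERE Age NOT BETWEEN 24 AND 25 ORDER BY Eid"
)
like_s = run("SELECT Ename FROM Employee WHERE Ename LIKE 'S%'")
# The inner query runs first and supplies the value for the outer IN test.
top_paid = run(
    "SELECT Ename, Salary FROM Employee "
    "WHERE Salary IN (SELECT MAX(Salary) FROM Employee)"
)
print(in_list, not_between, like_s, top_paid)
```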
Following are the comparison operators where sub-queries are expressed as one SELECT
statement connected to another,
Comparison Operator
Operator Description
= Equal to
Operator Description
IN Equal to any value retrieved in an Inner query.
1. COUNT FUNCTION
o The COUNT function is used to count the number of rows in a database table. It works on
both numeric and non-numeric data types.
o COUNT(*) returns the count of all the rows in a specified
table. COUNT(*) counts duplicate rows and rows containing NULL.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
2. SUM Function
The SUM function is used to calculate the sum of the values in the selected column. It works on numeric fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
Example: SUM()
SELECT SUM(COST)
FROM PRODUCT_MAST;
Output:
670
Example: SUM() with WHERE
SELECT SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3;
Output:
320
Example: SUM() with GROUP BY
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3
GROUP BY COMPANY;
Output:
Com1 150
Com2 170
Example: SUM() with HAVING
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING SUM(COST)>=170;
Output:
Com1 335
Com3 170
3. AVG function
The AVG function is used to calculate the average value of the numeric type. AVG function returns the average of all
non-Null values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
Example:
SELECT AVG(COST)
FROM PRODUCT_MAST;
Output:
67.00
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function determines the largest value of all
selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression )
Example:
SELECT MAX(RATE)
FROM PRODUCT_MAST;
Output:
30
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function determines the smallest value of all
selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression )
Example:
SELECT MIN(RATE)
FROM PRODUCT_MAST;
Output:
10
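The aggregate functions above can be exercised with the following sketch, which uses SQLite through Python's sqlite3 module. The PRODUCT_MAST table is not reproduced in the text, so the ten rows and the PRODUCT column name are hypothetical; they were chosen so that the results agree with the outputs quoted for SUM, SUM with WHERE, SUM with HAVING, AVG, MAX and MIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INTEGER, "
    "RATE INTEGER, COST INTEGER)"
)
conn.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?, ?)", [
    # Hypothetical rows; COST = QTY * RATE.
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120),
])

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

row_count = scalar("SELECT COUNT(*) FROM PRODUCT_MAST")
total = scalar("SELECT SUM(COST) FROM PRODUCT_MAST")
total_big_qty = scalar("SELECT SUM(COST) FROM PRODUCT_MAST WHERE QTY > 3")
average = scalar("SELECT AVG(COST) FROM PRODUCT_MAST")
highest = scalar("SELECT MAX(RATE) FROM PRODUCT_MAST")
lowest = scalar("SELECT MIN(RATE) FROM PRODUCT_MAST")
# HAVING filters the groups after SUM has been computed per company.
big_companies = conn.execute(
    "SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST "
    "GROUP BY COMPANY HAVING SUM(COST) >= 170 ORDER BY COMPANY"
).fetchall()
print(row_count, total, total_big_qty, average, highest, lowest, big_companies)
```

WHERE removes rows before grouping, whereas HAVING removes whole groups after aggregation, which is why the two SUM examples give different results.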
Functional Dependency
Functional Dependency (FD) determines the relation of one attribute to another attribute in a
database management system (DBMS). Functional dependency helps you to maintain the
quality of data in the database. A functional dependency is denoted by an arrow →. The functional
dependency of Y on X is represented by X → Y. Functional dependency plays a vital role in telling
the difference between good and bad database design.
Example:
In this example, if we know the value of Employee number, we can obtain Employee Name, city,
salary, etc. From this, we can say that city, Employee Name, and salary are functionally
dependent on Employee number.
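Whether a functional dependency holds in a given set of rows can be checked mechanically. The sketch below is plain Python; the function name holds and the sample rows are hypothetical, and a check on sample data can only refute a dependency, not prove that it is part of the design.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same lhs value must always map to the same rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [  # hypothetical sample rows
    {"emp_no": 1, "name": "Dana", "city": "Pune", "salary": 40000},
    {"emp_no": 2, "name": "Bob", "city": "Pune", "salary": 50000},
    {"emp_no": 3, "name": "Dana", "city": "Mumbai", "salary": 40000},
]

emp_no_determines_rest = holds(employees, ["emp_no"], ["name", "city", "salary"])
name_determines_city = holds(employees, ["name"], ["city"])
print(emp_no_determines_rest, name_determines_city)
```

Here the employee number determines every other attribute, while the name does not determine the city because two employees share a name.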
In this example, maf_year and color are independent of each other but dependent on car_model.
In this example, these two columns are said to be multivalue dependent on car_model.
car_model-> colour
For example:
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Example:
{Company} -> {CEO} (if we know the Company, we know the CEO name)
But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
Transitive dependency:
A transitive dependency is a type of functional dependency which happens when it is indirectly formed by two
functional dependencies.
Example:
Alibaba Jack Ma 54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore {Company} -> {Age} should hold. That makes sense because if we know the company name, we
can know its CEO's age.
Properties of Decomposition-
The following two properties must be followed when decomposing a given relation-
1. Lossless decomposition-
Lossless decomposition ensures-
No information is lost from the original relation during decomposition.
When the sub relations are joined back, the same relation is obtained that was decomposed.
Every decomposition must always be lossless.
2. Dependency Preservation-
Dependency preservation ensures-
None of the functional dependencies that hold on the original relation are lost.
The sub relations still hold or satisfy the functional dependencies of the original relation.
Types of Decomposition-
Decomposition of a relation can be completed in the following two ways-
1. Lossless join decomposition: R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
2. Lossy join decomposition: R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
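The two outcomes can be checked on a tiny example. The sketch below is plain Python: a relation is a set of rows, each row a frozenset of (attribute, value) pairs. The names Harry and George come from the earlier example; the dept and city attributes and the helper functions are hypothetical.

```python
def relation(rows):
    """Build a relation (set of rows) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, dict(row)[a]) for a in attrs) for row in rel}

def natural_join(r1, r2):
    """Combine every pair of rows that agree on their shared attributes."""
    joined = set()
    for a in r1:
        for b in r2:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                joined.add(frozenset({**da, **db}.items()))
    return joined

R = relation([
    {"emp": "Harry", "dept": "Sales", "city": "Pune"},
    {"emp": "George", "dept": "HR", "city": "Pune"},
])

# Splitting on dept, which identifies a row of the second part: lossless.
good = natural_join(project(R, ["emp", "dept"]), project(R, ["dept", "city"]))
# Splitting on city, which identifies nothing here: spurious rows appear.
bad = natural_join(project(R, ["emp", "city"]), project(R, ["city", "dept"]))

is_lossless = good == R
is_lossy = bad > R  # proper superset: extra tuples that were never in R
print(is_lossless, is_lossy, len(bad))
```

The lossy join returns four rows for a relation that had two, which is the ⊃ case above: the join "gains" tuples, so the original information can no longer be recovered.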
Normalization up to 5NF
Normalization is a database design technique that reduces data redundancy and eliminates
undesirable characteristics like insertion, update and deletion anomalies. Normalization rules
divide larger tables into smaller tables and link them using relationships. The purpose of
normalization is to eliminate redundant (repetitive) data and ensure data is stored logically.
The inventor of the relational model, Edgar Codd, proposed the theory of normalization with the
introduction of the First Normal Form, and he continued to extend the theory with the Second and Third
Normal Forms. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal
Form.
1NF Example
It is clear that we can't move our simple database into second normal form (2NF)
unless we partition the table above.
Table 1
Table 2
We have divided our 1NF table into two tables, viz. Table 1 and Table 2. Table 1 contains member
information. Table 2 contains information on movies rented.
We have introduced a new column called Membership_id, which is the primary key for Table 1.
Records can be uniquely identified in Table 1 using the membership id.
To move our 2NF table into 3NF, we again need to divide our table.
3NF Example
TABLE 1
Table 2
Table 3
We have again divided our tables and created a new table which stores salutations.
There are no transitive functional dependencies, and hence our table is in 3NF.
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key in
Table 3.
Now our little example is at a level that cannot be decomposed further to attain higher normal
forms. In fact, it is already in the higher normal forms. Separate efforts for
moving to the next levels of normalization are normally needed in complex databases. However,
we will discuss the next levels of normalization briefly in the following.
Unit-5.
Selected Database Issues:
This whole set of operations can be called a transaction. Although the above example shows read, write
and update operations, a transaction can have operations like read,
write, insert, update and delete.
1. R(A);
2. A = A - 10000;
3. W(A);
4. R(B);
5. B = B + 10000;
6. W(B);
In the above transaction R refers to the Read operation and W refers to the write operation.
The main problem that can happen during a transaction is that the transaction can fail before
finishing all the operations in the set. This can happen due to power failure, system crash etc.
This is a serious problem that can leave the database in an inconsistent state. Assume that the
transaction fails after the third operation (see the example above): then the amount would be deducted
from your account but your friend would not receive it.
Commit: If all the operations in a transaction are completed successfully, then commit those
changes to the database permanently.
Rollback: If any of the operations fails, then roll back all the changes done by the previous
operations.
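The commit and rollback behaviour can be sketched with Python's built-in sqlite3 module. This is a
minimal illustration, not a prescribed implementation: the accounts table, the starting balances
and the fail_midway flag (which simulates a failure right after W(A)) are all assumptions.

```python
import sqlite3

# Hypothetical accounts table with assumed starting balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 50000), ("B", 20000)])
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Move amount from source to target as one all-or-nothing transaction."""
    try:
        # R(A); A = A - amount; W(A)
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, source))
        if fail_midway:
            raise RuntimeError("simulated failure after W(A)")
        # R(B); B = B + amount; W(B)
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, target))
        conn.commit()      # every operation succeeded: make the changes permanent
        return True
    except Exception:
        conn.rollback()    # an operation failed: undo the earlier changes
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "A", "B", 10000, fail_midway=True)
after_failure = balances(conn)   # rollback restored the deducted amount
succeeded = transfer(conn, "A", "B", 10000)
after_success = balances(conn)   # both updates are now committed
```

After the simulated failure, the balances are unchanged because the deduction from A was rolled
back. Only the second call, which completes every step, changes both accounts.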
Suppose a user executes a query. As we have learned, there are various methods of
extracting data from the database. Suppose that, in SQL, a user wants to fetch the records of the
employees whose salary is greater than or equal to 10000. For this, the following query is used:
To make the system understand the user query, it needs to be translated into the form of
relational algebra. We can express this query in relational algebra as:
After translating the given query, we can execute each relational algebra operation by using different
algorithms. This is how query processing begins.
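As an illustration of the translation step, a query of this kind could be written in SQL as
SELECT emp_name FROM Employee WHERE salary >= 10000, which corresponds to a projection applied to
a selection. The relation name Employee, the attribute names and the sample rows are assumptions
made for this sketch; they are not taken from the original query.

```python
# Hypothetical Employee relation; names and rows are illustrative assumptions.
employee = [
    {"emp_name": "Asha", "salary": 9000},
    {"emp_name": "Ravi", "salary": 10000},
    {"emp_name": "Meera", "salary": 15000},
]


def select(relation, predicate):
    """Relational selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Relational projection (pi): keep only the named attributes, dropping duplicates."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# pi_{emp_name}( sigma_{salary >= 10000}( Employee ) )
result = project(select(employee, lambda row: row["salary"] >= 10000), ["emp_name"])
```

Each relational algebra operator is a separate function here, which mirrors the idea that the
system can choose a different algorithm for each operation.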
2. Evaluation
For this, in addition to the relational algebra translation, the translated relational algebra expression
must be annotated with instructions that specify how each operation is to be evaluated. Thus, after
translating the user query, the system executes a query evaluation plan.
Query Optimization
Query: A query is a request for information from a database.
Query Plans: A query plan (or query execution plan) is an ordered set of steps used to access data
in a SQL relational database management system.
Query Optimization: A single query can be executed through different algorithms or re-written in
different forms and structures. Hence, the question of query optimization comes into the picture –
Which of these forms or pathways is the most optimal? The query optimizer attempts to determine
the most efficient way to execute a given query by considering the possible query plans.
Importance: The goal of query optimization is to reduce the system resources required to fulfill a
query, and ultimately provide the user with the correct result set faster.
First, it provides the user with faster results, which makes the application seem faster to the
user.
Secondly, it allows the system to service more queries in the same amount of time, because
each request takes less time than unoptimized queries.
Thirdly, query optimization ultimately reduces the amount of wear on the hardware (e.g. disk
drives), and allows the server to run more efficiently (e.g. lower power consumption, less
memory usage).
There are broadly two ways a query can be optimized:
1. Analyze and transform equivalent relational expressions: Try to minimize the tuple and
column counts of the intermediate and final query processes (discussed here).
2. Using different algorithms for each operation: These underlying algorithms determine how
tuples are accessed from the data structures they are stored in, indexing, hashing, data
retrieval and hence influence the number of disk and block accesses (discussed in query
processing).
Analyze and transform equivalent relational expressions
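One common transformation of this kind is to apply a selection before a join instead of after it.
The sketch below compares the two equivalent plans on small in-memory relations. The relations,
attribute names and the salary threshold are assumptions chosen only for illustration.

```python
# Illustrative relations; all names and values are assumptions.
employees = [{"emp_id": i, "dept_id": i % 3, "salary": 5000 * (i + 1)} for i in range(6)]
departments = [{"dept_id": d, "dept_name": name} for d, name in enumerate(["HR", "IT", "Ops"])]


def join(left, right, attr):
    """Nested-loop join on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def select(relation, predicate):
    """Keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def high_paid(row):
    return row["salary"] >= 20000


# Plan 1: join first, then filter the joined tuples.
joined = join(employees, departments, "dept_id")
plan1 = select(joined, high_paid)

# Plan 2: filter first (selection pushed below the join), then join.
filtered = select(employees, high_paid)
plan2 = join(filtered, departments, "dept_id")

# Both plans return the same tuples, but the intermediate result differs in
# size: len(joined) is 6, while len(filtered) is 3.
```

Both plans are equivalent relational expressions, yet the second one feeds fewer tuples into the
join, which is the kind of saving the optimizer looks for.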
Concurrency Control
Concurrency control is the procedure in a DBMS for managing simultaneous operations without
them conflicting with one another. Concurrent access is quite easy if all users are just reading data,
because there is no way they can interfere with one another. Any practical database, however, has
a mix of read and write operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps to make sure that database transactions are performed concurrently without
violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system
in which two or more database transactions that require access to the same data are executed
simultaneously.
Lock-Based Protocols
Two-Phase Locking Protocol
Timestamp-Based Protocols
Validation-Based Protocols
A. Lock-based Protocols
A lock is a data variable which is associated with a data item. The lock signifies which operations
can be performed on the data item. Locks help synchronize access to the database items by
concurrent transactions.
All lock requests are made to the concurrency-control manager. Transactions proceed only once
the lock request is granted.
Binary Locks: A binary lock on a data item can be in one of two states, locked or unlocked.
Shared/exclusive: This type of locking mechanism separates the locks based on their uses. If a
lock is acquired on a data item to perform a write operation, it is called an exclusive lock.
A shared lock is also called a read-only lock. With a shared lock, the data item can be shared
between transactions. This is possible because a transaction holding only a shared lock is never
permitted to update the data item.
For example, consider a case where two transactions are reading the account balance of a
person. The database will let them read by placing a shared lock. However, if another transaction
wants to update that account's balance, the shared lock prevents it until the reading is over.
With an exclusive lock, a data item can be read as well as written. The lock is exclusive and cannot
be held concurrently on the same data item. An X-lock is requested using the lock-x instruction.
Transactions may unlock the data item after finishing the 'write' operation.
For example, when a transaction needs to update the account balance of a person, the database
allows this by placing an X lock on the data item. When a second transaction then wants to
read or write the item, the exclusive lock prevents that operation.
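The compatibility rules for shared and exclusive locks can be sketched as a small lock table. This
is a simplified illustration under stated assumptions: the LockManager class and its method names
are hypothetical, and a conflicting request is simply refused instead of being queued, as a real
concurrency-control manager would do.

```python
class LockManager:
    """Toy lock table granting shared (S) and exclusive (X) locks without queuing."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of transaction ids holding the lock)

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if the request conflicts."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)          # many readers may share the item
            return True
        if holders == {txn}:
            # The sole holder may repeat a request or upgrade from S to X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        return False                  # any other combination conflicts

    def release(self, txn, item):
        """Drop txn's lock on item; remove the entry when no holder remains."""
        held = self.locks.get(item)
        if held is None:
            return
        _, holders = held
        holders.discard(txn)
        if not holders:
            del self.locks[item]


manager = LockManager()
read_1 = manager.acquire("T1", "balance", "S")            # granted
read_2 = manager.acquire("T2", "balance", "S")            # granted, shared with T1
write_blocked = manager.acquire("T3", "balance", "X")     # refused while readers hold S
manager.release("T1", "balance")
manager.release("T2", "balance")
write_granted = manager.acquire("T3", "balance", "X")     # granted once readers are done
read_blocked = manager.acquire("T1", "balance", "S")      # refused while T3 holds X
```

The sequence at the end replays the two account-balance examples from the text: two readers share
the item, a writer has to wait for them, and once the writer holds the X lock no one else can read
or write.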
The simplistic lock protocol allows a transaction to obtain a lock on every object before it
begins operating on that object. Transactions may unlock the data item after finishing the 'write'
operation.
4. Pre-claiming Locking
The pre-claiming lock protocol evaluates a transaction's operations and creates a list of the data
items that are needed to begin execution. When all the locks are granted, the transaction
executes. All locks are released once all of its operations are over.
The two-phase locking (2PL) protocol divides the execution of a transaction into three parts.
In the first part, when the transaction begins to execute, it requests permission for the
locks it needs.
In the second part, the transaction obtains all the locks. As soon as the transaction releases
its first lock, the third part starts.
In the third part, the transaction cannot demand any new locks. Instead, it only releases
the acquired locks.
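The rule that no new lock may be requested after the first release can be sketched as follows. The
class and attribute names are hypothetical, and the sketch tracks the phases of a single
transaction only; conflicts between different transactions are deliberately left out.

```python
class TwoPhaseTransaction:
    """Tracks the growing and shrinking phases of one transaction (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.held = set()        # data items currently locked by this transaction
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        """Acquire a lock; allowed only while no lock has been released yet."""
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock '{item}' after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        """Release a lock; from now on the transaction may only release locks."""
        self.held.remove(item)
        self.shrinking = True


t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")          # still acquiring locks
t1.unlock("A")        # first release: no further lock requests are allowed
try:
    t1.lock("C")
    violation_detected = False
except RuntimeError:
    violation_detected = True
```

The attempt to lock C after releasing A is rejected, which is the behaviour the protocol requires
once a transaction has started releasing its locks.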
C. Timestamp-based Protocols
The timestamp-based algorithm uses a timestamp to serialize the execution of
concurrent transactions. This protocol ensures that all conflicting read and write
operations are executed in timestamp order. The protocol uses the system time or a
logical counter as the timestamp.
The older transaction is always given priority in this method. It is a commonly used
concurrency protocol.
Lock-based protocols manage the order between conflicting transactions at the time they
execute. Timestamp-based protocols start managing conflicts as soon as an operation is
created.
Example:
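A minimal sketch of basic timestamp ordering is given here as an illustration. It assumes that each
data item remembers the largest timestamp that has read it and the largest timestamp that has
written it. The class name, the function names and the timestamps 10, 20 and 30 are assumptions
made for this sketch.

```python
class TimestampedItem:
    """A data item with the read and write timestamps used by timestamp ordering."""

    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp of any transaction that read the item
        self.write_ts = 0   # largest timestamp of any transaction that wrote the item


def read(item, ts):
    """Return (True, value) if allowed, or (False, None) if the reader must roll back."""
    if ts < item.write_ts:
        return False, None  # a younger transaction already overwrote the item
    item.read_ts = max(item.read_ts, ts)
    return True, item.value


def write(item, ts, value):
    """Return True if the write is allowed, False if the writer must roll back."""
    if ts < item.read_ts or ts < item.write_ts:
        return False        # a younger transaction already read or wrote the item
    item.value = value
    item.write_ts = ts
    return True


balance = TimestampedItem(100)
read_ok, seen = read(balance, 20)        # transaction with timestamp 20 reads the item
late_write = write(balance, 10, 50)      # older writer arrives too late and is rejected
newer_write = write(balance, 30, 70)     # younger writer is accepted
stale_read = read(balance, 20)           # timestamp 20 can no longer read the item
```

Rejected operations do not wait. The transaction is rolled back instead, which is why this family
of protocols does not produce deadlocks.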
Advantages:
Schedules are serializable, just as with 2PL protocols.
Transactions do not wait, which eliminates the possibility of deadlocks.
Disadvantages:
For any failure situation there are both automatic and manual ways of backing up data and of
recovering it. Recovery techniques based on deferred update, on immediate update, or on data
backups can therefore be used to prevent data loss in the database.
Crash recovery
Crash recovery is the operation through which the database is brought back to a consistent
and usable state. In a DBMS, this is done by rolling back incomplete transactions and
completing committed transactions that were still in memory when the crash took place.
With many transactions being executed every second, a DBMS is a highly complex system.
Its durability and robustness depend on its complex software design and on the underlying
hardware. If the system fails or crashes in the middle of transactions, it is expected to follow
some method or technique to restore the lost data.
Classification of failure
To identify the source of a problem, failures are generalized into various
categories.
Storage structure
Volatile storage: Volatile storage cannot survive system crashes. These
devices are located close to the CPU. Examples of volatile storage are main
memory and cache memory.
Non-volatile storage: Non-volatile storage is built to survive system crashes.
These devices are large in storage capacity but slower to access.
Examples of non-volatile storage are hard disks, magnetic tapes,
flash memory, and battery-backed (non-volatile) RAM.
To recover from a failure while maintaining the atomicity of transactions, there are two types
of technique:
Maintaining a log for each transaction and writing it to stable storage before
actually modifying the database.
Maintaining shadow paging, in which the changes are made in volatile memory first
and the actual database is updated afterwards.
Log-based Recovery
The log is a sequence of records which keeps a record of the operations
performed by a transaction on the database. It is essential that the log records are
written to stable, fail-safe storage before the corresponding changes are actually
applied to the database.
In log-based recovery, a log record is written for a transaction when the transaction
enters the system and starts execution.
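A simplified sketch of log-based recovery follows. It assumes a log whose write records have the
form (kind, transaction, item, old value, new value), and it assumes that a transaction never
overwrites an item that another uncommitted transaction has written. The record layout, the
function name and the sample values are assumptions for illustration, not the format of any
particular DBMS.

```python
def recover(log, disk):
    """Rebuild a consistent state: redo committed writes, undo uncommitted ones."""
    committed = {record[1] for record in log if record[0] == "commit"}
    state = dict(disk)
    # Redo pass (forward): reapply every write of a committed transaction.
    for record in log:
        if record[0] == "write" and record[1] in committed:
            _, _, item, _, new_value = record
            state[item] = new_value
    # Undo pass (backward): restore the old value for uncommitted transactions.
    for record in reversed(log):
        if record[0] == "write" and record[1] not in committed:
            _, _, item, old_value, _ = record
            state[item] = old_value
    return state


# The log was written to stable storage before the database was modified.
log = [
    ("start", "T1"),
    ("write", "T1", "A", 50000, 40000),
    ("write", "T1", "B", 20000, 30000),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "A", 40000, 35000),   # T2 never committed
]
# State found on disk after the crash: T1's write to B was not yet flushed,
# while T2's uncommitted write to A had already reached the disk.
disk_after_crash = {"A": 35000, "B": 20000}
recovered = recover(log, disk_after_crash)
```

Because every change was logged before it reached the database, the recovery routine can both
complete the committed transaction T1 and remove the effects of the unfinished transaction T2.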
When multiple transactions are executed in parallel, their log records are interleaved. At
recovery time it would be difficult for the recovery system to go back through all the log
records and then start recovering. Most modern database systems use
the concept of 'checkpoints' to ease this situation.
Checkpoint
A checkpoint is a mechanism by which all the earlier log records are
removed from the system and stored permanently on disk. A checkpoint marks a
point before which the DBMS was in a consistent state and all the transactions were
committed.
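The effect of a checkpoint on recovery can be sketched as follows. Under the simplifying assumption
stated above, that everything before the checkpoint is already committed and safely on disk,
recovery only needs to examine the log records written after the most recent checkpoint. The record
layout and the function name are hypothetical.

```python
def records_to_replay(log):
    """Return the log records written after the most recent checkpoint."""
    last_checkpoint = -1
    for index, record in enumerate(log):
        if record[0] == "checkpoint":
            last_checkpoint = index
    return log[last_checkpoint + 1:]


log = [
    ("write", "T1", "A", 50000, 40000),
    ("commit", "T1"),
    ("checkpoint",),                      # everything above is already on disk
    ("write", "T2", "B", 20000, 30000),
    ("commit", "T2"),
    ("write", "T3", "A", 40000, 35000),
]
to_replay = records_to_replay(log)        # only the T2 and T3 records remain
```

Only three of the six records have to be inspected here, and in a long-running system the saving
grows with the length of the log before the checkpoint.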