Database Management System
Module 1:
Database:
One of the primary aims of a database is to supply users with an abstract view of the data, hiding
the details of how the data is stored and manipulated.
Therefore, the starting point for the design of a database should be an abstract and general
description of the information needs of the organization that is to be represented in the
database.
Hence, an environment is required to store the data and make it function as a database.
Database environment:
The figure is divided into two halves: the top half refers to the various users of
the database environment.
The lower half shows the internals of the DBMS responsible for storage of data and
processing of transactions.
The database and the DBMS catalogue are stored on disk. Access to the disk is controlled primarily
by the operating system.
The figure shows the interfaces for the DBA staff, casual users who work with queries,
application programmers who write programs in some host language and, finally, parametric
users who do data entry work.
The DDL compiler processes schema definitions specified in the DDL and stores descriptions of
the schemas in the DBMS catalogue.
The catalogue includes information such as the names and sizes of files, the data types of data
items and the storage details of each file.
Casual users and persons with an occasional need for information from the database interact
with it using some form of interface called the interactive query interface.
These queries are parsed and analysed by the query compiler, which compiles them into an
internal form.
The internal query is passed to the query optimizer, which eliminates redundancies and
rearranges operations to produce an efficient execution plan.
Application programmers write programs in host languages such as Java or C and submit them to
a pre-compiler.
The pre-compiler extracts the DML commands from the application program and sends them to the
DML compiler, which compiles them into object code for database access. The rest of the program
is sent to the host language compiler.
The object code for the DML commands and the rest of the program are linked together to
form canned transactions.
Canned transactions are predefined sets of operations for which the executable code is stored
and called repeatedly.
The executable code calls the runtime database processor to execute privileged commands and
queries with runtime parameters.
Concurrency control and backup and recovery systems are integrated with the runtime database
processor for purposes of transaction management.
Database users:
Database administrator:
In a database environment the primary resource is the database itself, and the secondary
resources are the DBMS and its related software. Administering these resources is the
responsibility of the DBA.
DBA is responsible for:
1. Authorizing access to the database,
2. Coordinating and monitoring its use, and
3. Acquiring software and hardware resources as needed.
Database designers:
Database designers are responsible for identifying the data to be stored in the database and for
choosing appropriate structures to represent and store this data.
End users:
End users are the people whose jobs require access to the database for querying, updating
and generating reports.
Naive users:
Their main function is querying and updating the database using standard types of queries and
updates (canned transactions).
Sophisticated users:
These users include engineers, scientists, and business analysts etc. who have deep
knowledge with the facilities of the DBMS in order to meet their complex requirements.
Characteristics of database:
3-schema architecture:
1. Internal level:
The internal schema describes the physical storage structure of the database.
2. Conceptual level:
The conceptual schema describes the design of the database at the conceptual
level. It is also known as the logical level.
The conceptual schema describes the structure of the whole database.
It describes what data are to be stored in the database and the relationship
among those data.
Internal details such as the physical implementation are hidden; the DBA and
application programmers work at this level.
3. External level:
An external schema is also known as a view schema.
Each view schema describes the part of the database that a particular user group is interested in
and hides the rest of the database.
DBMS languages:
Entity:
Entity type:
Strong entity type:
It is an entity that has its own existence and is independent. It is not dependent on any
other entity.
The relationship between a strong and a weak entity type is known as an identifying
relationship.
In an Entity-Relationship diagram, an identifying relationship between a strong and a weak
entity type is represented using a double diamond.
Attributes:
Each entity has attributes, which describe the properties or characteristics of the entity.
For example, an employee entity can be described by employee name, age, address, salary
and job etc.
There are several types of attributes in the DBMS, they are:
1. Simple attributes
2. Composite attributes
3. Single and multivalued attributes
4. Stored attributes
5. Derived attributes
Composite attributes are those attributes which can be divided into smaller sub-parts; for
example, the address attribute of an employee can be divided into district, country and PIN
code.
Attributes that are not divisible are called simple or atomic attributes.
Most attributes have a single value for a particular entity; such attributes are called single-
valued. For example, age is a single-valued attribute.
Attributes that can have multiple values are called multivalued attributes, for example, phone
number.
In some cases attribute values are related, for example the age and date of birth of a person. The
age attribute is a derived attribute because it can be determined from the current date and the
person's date of birth.
The date of birth, which is stored directly in the database and does not change, is called a
stored attribute.
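As a rough sketch of how these attribute types might be represented in relational tables (the table and column names here are hypothetical, not from these notes), a composite address can be split into simple columns, a multivalued phone number can be moved to a separate table, and a derived age can be computed at query time instead of being stored:
create table employee(
    emp_id int primary key,
    emp_name varchar(50),
    dob date,                          -- stored attribute
    district varchar(30),              -- composite address attribute split into
    country varchar(30),               -- simple (atomic) columns
    pin_code varchar(10)
);
create table employee_phone(           -- multivalued attribute: one row per phone number
    emp_id int,
    phone_no varchar(15)
);
-- derived attribute: age computed from dob at query time (date arithmetic varies by DBMS)
select emp_name, (current_date - dob) / 365 as approx_age from employee;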
Relation and tuples:
ER diagram:
Relational calculus:
Primary key:
Primary key is the column or columns that contain values that uniquely identify each row in
a table.
They serve as unique identifiers for each row in a table.
Foreign key:
A foreign key links data in one table to the primary key of another table.
Super key:
A super key in DBMS is a set of one or more attributes that uniquely identifies a row in a
table.
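As a hedged illustration of these keys (using hypothetical department and employee tables), a primary key and a foreign key can be declared as follows; any set of columns containing a key, such as (emp_id, ename), is a super key:
create table department(
    dept_no int primary key,           -- primary key: uniquely identifies each department
    dname varchar(30)
);
create table employee(
    emp_id int primary key,            -- primary key of employee
    ename varchar(30),
    dept_no int,
    foreign key (dept_no) references department(dept_no)  -- links employee rows to department
);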
Integrity constraints:
Integrity constraints are a set of rules used to maintain the quality of information.
Integrity constraints ensure that data insertion, updating and other processes are performed in
such a way that data integrity is not affected.
(Note: data integrity is the process of ensuring that data is accurate, consistent and
complete)
Domain constraints:
Domain constraints can be defined as the definition of a valid set of values for an attribute.
The data type of domain includes string, character, integer, time, date etc.
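A minimal sketch (with a hypothetical student table) of how domain constraints can be expressed through data types and a CHECK clause:
create table student(
    roll_no int,
    sname varchar(30),
    age int check (age between 17 and 60),   -- restricts age to a valid set of values
    join_date date                            -- the data type itself restricts the domain
);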
Module 3:
DDL commands:
Create:
This command is used to create a new table in SQL.
Create table employee(empid int, empno int, address varchar(10));
Alter:
There are three uses of this command:
1. To add new columns:
alter table employee add(age int);
2. To remove a column from the table:
alter table employee drop column age;
3. To modify the data type of an existing column:
alter table employee modify(age int);
Drop:
The drop command is used to destroy a created table. After the execution of this command,
the table along with all of its data will be destroyed.
Drop table table_name;
DML commands:
Insert:
This command is used to insert values into the table. The values must be inserted in the order in
which the columns were created.
Insert into employee values(value1, value2, value3);
Select:
It is used for viewing or retrieving data from the table. It has the clauses: select, from,
where.
Select sname from student where id=100;
Delete:
The delete command is used to remove records or tuples from the tables based on a
condition.
Delete from employee where ename LIKE 'v%';
Update:
It is used to modify the values of one or more selected tuples depending on a condition.
Update employee set ename=’biju’ where empno=3;
DCL commands:
Grant:
Assigns new privileges to a user account allowing access to specific database objects or
actions.
Revoke:
Removes previously granted privileges from a user account, taking away their access to
certain database objects or actions.
Substring comparison:
Like operator:
It is used in the 'where' clause to search for a specific pattern. Two wildcard characters are
used in conjunction with the LIKE operator: % and _.
% matches zero or more characters and _ matches exactly one character.
Different cases:
1. LIKE 'a%' matches strings which start with the letter 'a'.
2. LIKE '%a' matches strings which end with the letter 'a'.
3. LIKE '%a%' matches strings which contain the letter 'a' anywhere.
4. LIKE 'a__' matches strings of exactly three characters starting with the letter 'a'.
5. LIKE '_a%' matches strings whose second letter is 'a'.
For example:
Select name from employee where name LIKE ‘%ab’;
Between operator:
It uses the 'between' clause to select values that lie within a given range, inclusive of the
endpoints.
For example:
Select name, salary from employee where salary BETWEEN 4000 AND 5000;
Set operations allow the results of multiple queries to be combined into a single result set.
Set operations include operators such as:
1. Union – it combines the results of two SQL queries into a single table. It eliminates
duplicate rows from the result set.
Select column from first_table
Union
Select column from second_table;
2. Intersect – it combines the results of two SQL queries and returns the rows common to
both.
Select column from first_table
Intersect
Select column from second_table;
3. Except – also called the minus operator. It combines two SQL queries and
returns the tuples from the first table that are not present in the second table.
Select column from first_table
Minus
Select column from second_table;
Nested query:
A nested query is a query written inside another query. The result of the inner query is
used in the execution of the outer query.
It has two types:
1. Independent nested query: execution starts from the inner query and proceeds to the outer
query, and the execution of the inner query does not depend on the outer query.
2. Co-related nested query: the inner query is executed once for each tuple of the outer query,
so its execution depends on the outer query.
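A hedged sketch of an independent nested query, assuming hypothetical employee and department tables: the inner query runs on its own and its result is then used by the outer query.
select ename
from employee
where dept_no in (select dept_no            -- inner query: executed independently
                  from department
                  where dname = 'Research');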
SQL join:
A ‘join’ clause is used to combine rows of two or more tables based on a related column
between them.
Four types of SQL join:
1. Inner join: it returns records that have matching values in both the tables.
2. Left outer join: it returns all records from the left table and the matched records
from the right table.
3. Right outer join: it returns all records from the right table and the matched records
from the left table.
4. Full outer join: it returns all records when there is a match in either left or right table.
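A minimal sketch (using the same hypothetical employee and department tables as above) showing an inner join and a left outer join:
-- inner join: only employees whose dept_no matches a department
select e.ename, d.dname
from employee e inner join department d on e.dept_no = d.dept_no;
-- left outer join: all employees, with NULL dname when there is no matching department
select e.ename, d.dname
from employee e left outer join department d on e.dept_no = d.dept_no;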
Aggregate functions:
Aggregate functions such as COUNT, SUM, AVG, MIN and MAX operate on a collection of values or
tuples and return a single summary value.
Group by clause:
It is used with aggregate functions to produce summary reports from the database by
grouping tuples based on the value of an attribute.
For example: select count(*), avg(salary) from employee group by emp_no;
Having clause:
The having clause is used along with the group by clause to retain only those groups that satisfy
a certain condition.
For example: select count(*), avg(salary) from employee group by emp_no having
avg(salary) > 5000;
Order by clause:
SQL allows us to order the tuples in the result of a query. For ordering the result of a query
we can use the order by clause.
Query results can be ordered either in ascending or descending order. The default form of
ordering is ascending order.
For example: select salary from employee where e_name=’abc’ order by salary asc;
View in SQL:
A view is a single table derived from other tables. These other tables can be base
tables or previously defined views.
A view does not necessarily exist in physical form; it is considered to be a virtual table.
We can create views using the CREATE VIEW command.
For example:
Create view dept_view as select id, name from department;
Module 4:
Functional dependency:
A functional dependency (FD) is a constraint between two sets of attributes from the
database.
An FD is a relationship that exists when one set of attributes uniquely determines another
attribute.
If R is a relation with attribute sets X and Y, an FD between these attributes is
represented as X -> Y. It specifies that Y is functionally dependent on X.
The set of attributes X is called the left-hand side of the FD and Y is called the right-hand side
of the FD.
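As a hypothetical illustration (the relation and attributes below are assumptions, not from these notes): in a relation EMPLOYEE(emp_id, ename, dept_no), the FD emp_id -> {ename, dept_no} holds because each emp_id value is associated with exactly one name and one department, whereas dept_no -> emp_id does not hold, since many employees can share the same department.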
There are 4 informal guidelines that may be used as measures to determine the quality of a
relational schema:
1. Making sure that the semantics of the attribute is clear in the schema
2. Reducing the redundant information in tuples
3. Reducing the NULL values in tuples
4. Disallowing the possibility of generating spurious tuples
(Note: spurious tuple is a record in a database that is created when two tables are joined
incorrectly)
Making sure the semantics is clear:
Whenever we form a relational schema, its attributes should have a clear meaning. That is, the
attributes belonging to one relation must have a certain real-world meaning and a proper
interpretation.
Reducing the redundant information:
If information is stored redundantly, it leads to wastage of storage space and to problems
known as update anomalies (insertion, deletion and modification anomalies).
Disallowing the possibility of spurious tuples:
Spurious tuple is a record in a database that is created when two tables are joined
incorrectly.
Normalization:
Indexing in DBMS:
Indexes are used to retrieve data from records efficiently. An index works much like the index
found in a textbook.
An index structure is usually defined on a single field of a file, called the indexing field.
Several types of indexing available are:
1. Primary indexes
2. Clustering indexes
3. Secondary indexes
Primary indexes:
If the index is created on the basis of the primary key of the table, then it is known as a
primary index.
The index has two fields: the first field holds the primary key of the first record in each
block and the second field holds a pointer to that block.
The data is stored in blocks, and the index contains one index entry for each block in the file.
The primary index can be classified into two types:
1. Dense index: in this, the number of records in the index table is the same as the
number of records in the main table.
2. Sparse index: in this, instead of pointing to each record in the main table, the index
points to some of the records in the main table.
Clustering index:
In a clustering index, the records are ordered on a non-key field.
That non-key field is called the clustering field, and the data file that is ordered on the
clustering field is called a clustered file.
A clustering index also has two fields. The first field holds the value of the clustering field
and the second field holds a pointer to the first block containing records with that value.
A clustering index is an example of sparse indexing.
Secondary index:
A secondary index provides a secondary means of accessing a data file for which some
primary means of access already exists.
The secondary index file contains two fields. The first field holds the secondary key value and
the second field holds a pointer to the record (or block) containing that value.
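In SQL, a secondary index is typically what the CREATE INDEX statement builds. A minimal sketch with hypothetical names:
-- secondary index on a non-key field of the employee table
create index emp_name_idx on employee(ename);
-- the query optimizer can then use the index for lookups such as:
select * from employee where ename = 'biju';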
When we create a view, the view does not store any data by default; instead, it queries the
base tables and gets the data.
We can change this default behaviour in SQL Server by creating an index on the view; such a view
is called an indexed view.
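A hedged sketch of an indexed view in SQL Server (all object names are hypothetical; SQL Server requires WITH SCHEMABINDING, two-part table names and COUNT_BIG(*) when grouping):
create view dbo.dept_summary
with schemabinding
as
select dept_no, count_big(*) as emp_count
from dbo.employee
group by dept_no;
-- creating a unique clustered index materializes the view
create unique clustered index ix_dept_summary on dbo.dept_summary(dept_no);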
Module 5:
Transactions:
A transaction includes one or more database access operations; these include insertion,
deletion, modification and retrieval operations.
Database access operations:
read_item(X): reads a database item named X into a program variable.
write_item(X): writes the value of a program variable into the database item named X.
Why is concurrency control needed?
Transaction concepts:
Operations of transaction:
1. BEGIN-TRANSACTION – this marks the beginning of transaction operations.
2. READ or WRITE – these specify the read and write operations on database items.
3. END-TRANSACTION – this marks the end of the transaction.
4. COMMIT-TRANSACTION – this signals a successful end of transaction operation.
5. ROLLBACK or ABORT – this signals an unsuccessful end of a transaction operation.
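A minimal sketch of these operations in SQL, assuming a hypothetical account table (statement names vary by DBMS; START TRANSACTION, COMMIT and ROLLBACK are used in many systems):
start transaction;                                              -- BEGIN-TRANSACTION
update account set balance = balance - 100 where acc_no = 1;    -- write operations
update account set balance = balance + 100 where acc_no = 2;
commit;                                                         -- successful end of the transaction
-- if any statement fails, the application issues: rollback;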
Transaction states:
A transaction enters the active state immediately after it starts execution, where it can
execute READ or WRITE operations.
When the transaction ends, it moves to the partially committed state. At this point, the system
checks whether the operations were executed successfully.
If the transaction is successful it reaches the commit point and enters the committed state.
When the transaction is committed, it concludes its execution successfully and enters the
terminated state.
However, a transaction falls into the failed state if one of its operations fails or if the
transaction is aborted partway through.
Properties of transaction:
ACID properties:
Atomicity:
A transaction is an atomic unit of processing. It should either be performed in its entirety or
not performed at all.
It is the responsibility of the transaction recovery subsystem to ensure atomicity.
Consistency preservation:
A transaction should be consistency preserving. That is, if it is completely executed
from beginning to end without interference from other transactions, it should take the database
from one consistent state to another.
It is the responsibility of the database programmers who write the database programs to maintain
the database in a consistent state.
Isolation:
A transaction should be isolated from other transactions. That is, the execution of a
transaction should not be interfered with by any other transaction executing concurrently.
Durability or permanency:
It states that the changes applied to the database by a committed transaction must persist
in the database.
It is the responsibility of transaction recovery subsystem of DBMS.
Database security:
Control measures:
There are 4 control measures used to provide security of data in databases.
1. Access control
2. Inference control
3. Flow control
4. Data encryption
Access control:
This security mechanism includes provisions for restricting access to the database system.
This function is called access control. Access control is handled by creating user accounts and
passwords to control the login process of the DBMS.
DBA is responsible for creating new user accounts and passwords for various users.
Inference Control:
It is a control measure used in statistical databases. It allows query access only to aggregate
data and not to individual records.
That is, it ensures that information about specific individuals cannot be inferred from
aggregate queries.
Flow control:
It prevents information from flowing in such a way that it reaches unauthorized users.
Data encryption:
The data is encoded using some coding algorithm, so that only authorized users can access the
coded data by decrypting it.
Database audits:
To keep a record of all updates applied to the database and of the user who applied each update,
the system log is used.
Log entries are extended so that they also record the account number of the user and the device
ID used for each operation. If any tampering with the database is suspected, a
database audit is performed.
A database audit consists of reviewing the log entries to examine all database access
operations during a certain period. When an illegal or unauthorized operation is found,
the DBA can determine the account number and device ID that were used to
perform that operation.
A database log that is used mainly for security purposes is called an audit trail.
Granting privileges:
The 'GRANT' command is used by the DBA to grant privileges to a user account.
Syntax: grant privilege_name on table_name to user_name;
Different cases of granting:
(user name = amen, table name = employee)
1. To grant the privilege to create tables:
grant createtab to amen;
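A further hedged example with the same hypothetical user and table, granting SELECT and INSERT and allowing the privilege to be passed on to other accounts:
grant select, insert on employee to amen with grant option;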
Revoking privileges:
The 'REVOKE' command is used by the DBA to cancel privileges previously granted to a user
account.
Syntax: revoke privilege_name on table_name from user_name;
Different cases of revoking:
(user name = amen, table name = employee)
1. To cancel the select privilege:
revoke select on employee from amen;
DAC (discretionary access control) is an access control model that allows relation owners to
determine who can access the relation and what level of access they have.
The privileges at this level include creating tables, creating views, the select
privilege, and altering privileges such as add, drop and modify.