Rdbms III Sem
RELATIONAL
DATABASE
MANAGEMENT SYSTEM
BCOM CA I YEAR II SEMESTER
T.SRIKANTH LECT. IN COMPUTERS
2018
UNIT-1
DATABASE CONCEPTS
Data:
Data is a collection of raw facts about a person or a thing.
Ex: a collection of numbers, characters and text.
Information:
Information is data that has been processed into a meaningful form. Processed
data is also known as information.
Meta Data:
Meta data is data about data. (Or) metadata is data that describes other data.
It tells us how, when and by whom the data is modified.
Database:
A database is an organized collection of data.
(Or)
A database is a collection of inter-related information which is stored in a well-arranged manner.
The aim of a database is to organize a huge amount of information in an efficient manner.
DBMS:
A database management system (DBMS) is a collection of inter-related data together with a set of programs to store, process and retrieve (access) that data. Examples of DBMS software:
i) Ms-Access.
ii) Oracle.
iii) Sybase.
iv) Db2.
v) Teradata.
vi) SQL Server.
vii) FoxPro.
Relational Database:
Relational data base is an organized collection of logically related data.
(Or)
A relational database is a database that stores information about both the data
and its relation. In relational database, the data is organized in the form of tables.
RDBMS:-
A relational database management system (RDBMS) is a database management system that is based on the relational model introduced by E.F. Codd.
An RDBMS is used to store information in databases such as those for financial records, manufacturing and personnel data. Relational databases are easy to understand and use, and many popular databases are based on the relational database model.
Evolution of Databases:
[Figure: database models over time — Hierarchical, Object Oriented, XML.]
1960s:
Databases were first introduced in the 1960s. By the mid-1960s there were a number of general-purpose database systems. Data was stored on magnetic tapes.
1970s:
The Hierarchical and Network data models were developed and used for applications such as invoices and bills for different goods.
1980s:
Edgar F. Codd proposed a new model called the Relational data model, and a new kind of database, the relational database, was developed.
1990s:
In the 1990s, Object oriented and Object relational databases were developed. Data in the database is treated as objects.
Today:
In this decade, non-relational databases such as XML databases are also used.
DATA MODELS:
A data model describes how the data in a database is represented. The important data models are:
1) Entity-Relationship (E-R) model:-
In the E-R model, data is represented as entities, their attributes and the relationships among them.
[Figure: E-R diagram — CUSTOMER (C-id, C-name, C-address, Mobile no) borrows LOAN from BANK (B-id, B-name, B-address, B-branch, B-mobile).]
2) Object oriented model:-
In the object oriented model, data and the methods that operate on it are combined into objects.
[Figure: CUSTOMER object with attributes Cust_no, Cust_name, Cust_address and Methods().]
3) Relational model:-
The relational model uses a collection of tables to represent both the data and the relationships among it. Each table has multiple columns, and each column has a unique name. Here, a relation refers to a two-dimensional table containing rows and columns of data.
CUSTOMER TABLE
Cust-no | Cust-name | Cust-address
4) Hierarchical model:-
Data in the hierarchical model is represented as a collection of records, and relationships among data are represented as links, forming a tree with a root record (e.g. CUSTOMER).
5) NETWORK MODEL:-
Data in the network model is represented as a collection of records, and relationships among data are represented by lines. The records in the database are organized as a collection of arbitrary graphs.
DATABASE ADMINISTRATOR (DBA):
The DBA is the person who is responsible for the complete maintenance and effective working of the database. The DBA is also responsible for managing databases.
ROLES OF DBA:
The following are the roles of DBA.
Database design:
Design is of two types
1) Conceptual design
Conceptual data base design consists of data definitions, relationship
between data.
2) Physical design
Physical database design determines the physical structure of
database and access methods.
User Training:
The DBA is responsible for providing guidelines to the users. Training sessions, user manuals and helpline centers are provided to give users the details of database creation and use.
Database Security and Integrity:
DBA controls data access by using authentication and authorization.
--Authentication is the process of checking whether the user is valid or not
--Authorization is the process of verifying the permissions of a valid user.
The DBA is responsible for giving passwords & controlling privileges.
Data integrity means maintaining the accuracy and consistency of data values.
Database System Performance:
The DBA should include technical people to identify and solve system
response-time problems. The DBA may maintain redundant/duplicate copies of data
at different locations/places to improve the system performance.
Functions of DBA:
The following are the functions of a DBA:
1. Schema Definition:-
The DBA creates database schema by executing DDL statements.
Schema includes the logical structure of database.
2. Data Definition:-
The DBMS provides functions to define the structure of the data in
application. It includes defining and modifying record structure, type and size
of fields.
3. Data Manipulation:-
Once data structure is defined, data needs to be inserted, modified or
deleted. The functions which perform these operations can handle planned and
unplanned data manipulation needs.
4. Granting authorization for data access:-
The DBA provides different access rights to the users according to their level.
5. Routine Maintenance:-
Some of the routine maintenance activities of a DBA are given below.
Taking back up of database.
Ensuring enough disk space.
Ensure that performance is not degraded.
Performance tuning.
6. Security:-
The DBA focuses on security methods to prevent unauthorized users
from accessing the database.
7. Backup and Recovery:-
Backup and recovery procedures are tested regularly to maintain
effectiveness in restoring database after failure.
8. Performance Evaluation:-
System performance is monitored by collecting statistics on transaction
volume, response time, error rates.
9. Integrity Checking:-
Schedules have been developed for testing the integrity of the data
stored in the database.
DATABASE APPROACH:
The database approach is a way in which data is stored centrally in a database and shared. It has the following characteristics.
Sharability:
Sharability means the same data can be used at the same time for different
processes by different people.
Availability:
Availability means data should be made available when and where it is
needed and also in different formats.
Data Independence:
Data independence means the ability to modify a schema definition at one level without affecting the schema definition at the next level. A schema is the overall design of the database.
There are two levels of data independence.
--Physical Data Independence:
The ability to modify the physical schema without rewriting the
application programs.
--Logical Data Independence:
The ability to modify the conceptual schema without rewriting the
application programs.
Data Integrity:
The DBMS enforces integrity rules to minimize data redundancy and
maximize data consistency. The data relationships are used to enforce data integrity.
-Entity:-
An entity is a real world object. An entity is represented by a rectangle. An
entity can be a person, place or object.
Ex: A human being, computer, bike.
-Entity set:-
A collection of similar entities is called entity set or entity type.
Ex: All employees of an organization EMP.
All departments of a university DEPT.
Types of Entities:-
--Weak Entity:-
Weak entity is an entity that depends on other entity or entities.
The weak entity is shown in double rectangle box.
--Strong Entity:-
A strong entity is an entity that does not depend on other entity
or entities.
Ex: FAMILY is a weak entity that depends on the strong entity FATHER.
-Attribute:-
An attribute is a property or characteristic of an entity. An attribute is represented by an ellipse.
Types of Attributes:-
--Simple attribute:-
If an attribute cannot be divided into simpler components, it is a simple
attribute.
Ex:- EmpNo, StdNo
--Composite attribute:-
If an attribute can be divided into components, it is called a composite
attribute.
Ex:- Name of student can be divided into First-name and Last-name
--Single-Valued attribute:-
If an attribute can take only a single value, it is a single valued attribute.
Ex:- age of a student
--Multi-Valued attribute:-
If an attribute can take more than one value, it is a multi-valued
attribute. Multi-valued attribute is represented with double ellipse.
Ex:- Telephone number (Mobile number, Landline)
--Derived attribute:-
An attribute which is calculated or derived from other attributes is called
a derived attribute. Derived attribute is represented with dashed ellipse.
Ex:- Total and average are calculated on marks of the student.
Relationship:
A relationship is an association of two or more entities. A meaningful
relationship is called relationship set or relationship type.
Ex:- Working is a relationship between EMP and DEPT
Degree of Relationship:-
Relationship degree means the number of entities in a relationship.
Relationships can be classified by their degree.
Unary Relationship:-
In unary relationship, there is only one entity.
Binary Relationship:-
In binary relationship, two entities are involved.
Ternary Relationship:-
In ternary relationship, three entities are involved.
n-ary Relationship:-
In this relationship, four or more entities are involved.
--Key:-
It is an attribute used to identify data in a relation.
--Simple Key:-
A simple key contains a single attribute in a relation.
--Composite Key:-
It contains more than one attribute.
--Super Key:-
A super key is a set of attributes of a relation that uniquely identifies each tuple (row): no two distinct tuples can have the same values for all the attributes in this set.
--Candidate Key:-
A candidate key is a minimal super key: an attribute (or minimal set of attributes) that uniquely identifies a row.
--Primary Key:-
A primary key is the candidate key which is selected as the principal unique
identifier.
Ex:- stdno in student relation
--Foreign Key:-
A foreign key is an attribute that is a non-key attribute in one relation and a primary key attribute in another relation.
Ex:- In the EMP table, 'dept-no' is a foreign key; it is the primary key in the DEPT table.
--Alternate Key:-
It is a candidate key which is not selected to be the primary key.
Ex:- If a user forgets his registration number in an application, name and date of birth together can serve as an alternate key.
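The key ideas above (a primary key identifying rows, a foreign key referencing another relation) can be tried out in a small runnable sketch. The notes use Oracle-style SQL; this sketch uses Python's built-in sqlite3 module as an illustrative stand-in, and the DEPT/EMP table contents are invented for the example:

```python
import sqlite3

# In-memory database; the PRAGMA turns on foreign key enforcement in SQLite.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

# DEPT.deptno is the primary key; EMP.deptno is a foreign key referencing it.
con.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
con.execute("""CREATE TABLE emp (
    empno  INTEGER PRIMARY KEY,
    ename  TEXT,
    deptno INTEGER REFERENCES dept(deptno))""")

con.execute("INSERT INTO dept VALUES (10, 'SALES')")
con.execute("INSERT INTO emp VALUES (1, 'JAYA', 10)")  # parent row exists: OK

# An emp row whose deptno has no matching DEPT row violates the foreign key.
try:
    con.execute("INSERT INTO emp VALUES (2, 'RAVI', 99)")
    fk_violation = False
except sqlite3.IntegrityError:
    fk_violation = True
```

The rejected insert shows why a foreign key must be a primary key value in the referenced relation.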
UNIT-2
DATABASE INTEGRITY AND NORMALIZATION
INTEGRITY CONSTRAINTS:
Integrity constraint ensures that changes made to the database by authorized
users do not result in loss of data consistency. A relational data model includes the
following constraints.
--Domain constraint
--Entity Integrity
--Referential Integrity
--Operational constraints
Domain Constraint:-
Domain is a set of atomic data values with unique data type in relational
database. Domain constraint is used to validate data when Insert or Update
statement is executed in the relational database.
Ex:- Not Null, Default and Check are domain integrity constraints.
--Not Null Constraint:-
NULL in SQL indicates data which does not exist in the database.
NULL concept was introduced by E.F.Codd to represent missing data in the
relational database model.
By default, all columns can store NULL values. The NOT NULL constraint is used to define a column that requires a data value.
--Default Constraint:-
Default constraint is used to define a default number, string, date in the
mandatory column. Default data must match with data type and range of the
mandatory column.
--Check Constraint:-
A check constraint is used to define a logical expression that validates data when an Insert or Update is performed on a table of the relational database.
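The three domain constraints just described (NOT NULL, DEFAULT, CHECK) can be demonstrated in one runnable sketch. Python's sqlite3 is used here as a stand-in for the notes' Oracle environment, with an invented STUDENT table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# NOT NULL, DEFAULT and CHECK are the domain constraints described above.
con.execute("""CREATE TABLE student (
    sno   INTEGER NOT NULL,
    sname TEXT    NOT NULL,
    fee   INTEGER DEFAULT 500,
    age   INTEGER CHECK (age BETWEEN 5 AND 100))""")

# fee is omitted, so the DEFAULT value fills it in.
con.execute("INSERT INTO student (sno, sname, age) VALUES (1, 'priya', 20)")
fee = con.execute("SELECT fee FROM student WHERE sno = 1").fetchone()[0]

# The CHECK constraint rejects an age outside 5..100.
try:
    con.execute("INSERT INTO student (sno, sname, age) VALUES (2, 'akshay', 300)")
    check_failed = False
except sqlite3.IntegrityError:
    check_failed = True
```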
Entity Integrity:-
An entity is an object in the real world. A set of attributes whose values uniquely identify a record of a relation is called a key. Each relation must have one key, which can be either a primary key or a unique key. Entity integrity requires that no part of this key is NULL.
NORMALIZATION:
Normalization is a process of decomposing the relations with anomalies
to produce smaller and well-structured relations.
(or)
Normalization is a process of evaluating table structures to minimize
data redundancy and reduce data anomalies.
[Diagram: stages of normalization]
FIRST NORMAL FORM (1NF)
  -- remove partial dependency -->
SECOND NORMAL FORM (2NF)
  -- remove transitive dependency -->
THIRD NORMAL FORM (3NF)
  -- remove any remaining anomalies -->
... FIFTH NORMAL FORM (5NF)
Functional Dependency:
A functional dependency is a constraint between two or more attributes
in which the value of one attribute is determined by the value of another
attribute.
Partial dependency:
A partial dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key.
Transitive dependency:
A transitive dependency is a functional dependency that exists between
two or more non key attributes.
Multi valued dependency:
It is a dependency that exists when there are at least 3 attributes
(A,B,&C) in a relation with a well defined set of B & C values for each ‘A’
value. Here B & C values are independent of each other.
Normalization works through a series of stages. Each stage in a
normalization process is known as a normal form. A normalization process
consists of 6 normal forms.
1) First Normal Form (1NF):
A relation is in 1NF if it contains no repeating groups or multi-valued attributes.
2) Second Normal Form (2NF):
A relation is in 2NF if it is in 1NF and contains no partial dependency.
[Figure: STUDENT relation with primary key (SNO, COURSE) and attributes SNAME, FEE, LANGUAGE-KNOWN; SNAME depends only on SNO, which is a partial dependency. Removing it decomposes the relation into STUDENT and STUDENT-LANG.]
3) Third Normal Form (3NF):
A relation is in 3NF if it is in 2NF and contains no transitive dependency. Consider the relation:
PRODUCT (P-ID, P-DESC, P-PRICE, P-QTY, C-ID, C-NAME, C-CITY)
with the functional dependency c-id → c-name, c-city.
For this relation, c-name and c-city are dependent on c-id, which is a non-key attribute. This is a transitive dependency. There is a 3-step process to convert a relation into 3NF.
Step1:
Create a new relation for each determinant in the table. This
determinant will act as a primary key for new relation.
Step2:
Move the attributes that are dependent on this determinant from old
relation to new relation.
Step3:
Leave the attribute (the determinant, a non-key attribute) in the old relation to serve as a foreign key.
As per these steps, the PRODUCT relation can be decomposed into 2
relations, PRODUCT, CUSTOMER as below:
PRODUCT (P-ID, P-DESC, P-PRICE, P-QTY, C-ID)
CUSTOMER (C-ID, C-NAME, C-CITY)
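The three-step decomposition can be sketched as runnable code. Python's sqlite3 stands in for the notes' Oracle environment, and the PRODUCT rows are invented sample data; the column names follow the PRODUCT example above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Unnormalized relation: c_name and c_city depend on c_id (transitive dependency).
con.execute("""CREATE TABLE product_un (
    p_id TEXT, p_desc TEXT, p_price REAL, p_qty INTEGER,
    c_id TEXT, c_name TEXT, c_city TEXT)""")
rows = [("P1", "Pen",  10.0,  5, "C1", "Ravi", "Delhi"),
        ("P2", "Book", 50.0,  2, "C1", "Ravi", "Delhi"),
        ("P3", "Bag", 300.0,  1, "C2", "Jaya", "Mumbai")]
con.executemany("INSERT INTO product_un VALUES (?,?,?,?,?,?,?)", rows)

# Steps 1 & 2: new relation for the determinant c_id with its dependent attributes.
con.execute("""CREATE TABLE customer AS
               SELECT DISTINCT c_id, c_name, c_city FROM product_un""")
# Step 3: c_id stays behind in PRODUCT to serve as a foreign key.
con.execute("""CREATE TABLE product AS
               SELECT p_id, p_desc, p_price, p_qty, c_id FROM product_un""")

n_customers = con.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Note how the duplicated customer data (Ravi/Delhi stored twice) collapses to a single CUSTOMER row after decomposition.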
PARTIAL DEPENDENCY:
The TIME TABLE relation contains a partial dependency. It can be removed by decomposing the table into 2 tables as shown below:
TIME TABLE
FACULTY-SUBJECT (FACULTY, SUBJECT)
After removing the multiple values for attributes, the relation COURSE-FACULTY-BOOK becomes:
COURSE-FACULTY (COURSE, FACULTY)
FACULTY-BOOK (FACULTY, BOOK)
FILE ORGANIZATION:
File organization is a way of arranging files on the disk and access method
tells how data is retrieved based on file organization.
File organization is the arrangement of files on the disk. It includes the way
how records and blocks are placed on storage medium. File organization is the
method of accessing and retrieving the record from the database.
A DBMS supports several file organization techniques. The important task of
DBA is to choose a good organization for each file based on its type of use.
Seven organization models are used to access file from the secondary
memory device with various techniques.
1. Sequential
2. Index-sequential
3. Hashed/Direct
4. Multi-key
5. Multi-list
6. Inverted
7. Heap
1. Sequential File Organization:
In sequential file organization, records are arranged one after another in sequence. A magnetic tape file, like output sent to a printer, can only be processed sequentially. A sequentially organized file may be stored on either a serial-access or a direct-access storage medium.
The task of file handling is the responsibility of the system software known as the Input-Output Control System (IOCS). A block is used to group a number of consecutive records, and the IOCS takes care of blocking.
[Figure: records stored one after another from the beginning of the file to the end of the file.]
2. Index-Sequential File Organization:
In index-sequential file organization, records are stored sequentially and an index is maintained so that records can also be accessed directly by their key.
Advantages:-
i) Both sequential and random access of records is possible.
ii) It is very useful when a random access of records by specifying the key is
required.
iii) Updating is easily done.
Disadvantages:-
i) Performance decreases as file grows.
ii) Must follow index structure to locate data.
iii) Expensive system accessories.
Area of Use:-
It supports applications that access individual records rather than searching
through the entire collection in sequence.
Ex: Reservation Enquiry system
3. Hashed/Direct Access File Organization:
Direct file organization is also referred to as random or relative organization. Files in this type of organization are stored on a direct access storage device (DASD), such as a magnetic disk, using an identifying key. The identifying key gives the actual storage position of the record in the file, so the system can locate the desired record directly without searching through other records.
[Figure: relative record numbers mapped to storage slots — 1: Rec001, 2: Free, 3: Rec003, 4: Rec004, 5: Free, 6: Free, 7: Rec007, ..., 325: Rec325, 326: Rec326, 327: Free, 328: Rec328.]
It is used in online systems where rapid response and fast updating are
important compared to sequential access file organization. To access a record, the
CPU can go directly to the desired record without searching all the other records in
the file. Direct access is used where file activity is low.
Advantages:-
i) Immediate access to records for updating is possible.
ii) Transactions need not be stored.
iii) Different discs are not required for updating records.
iv) Random inquiries in business situations can be easily handled.
Disadvantages:-
i) Data may be accidentally erased or over-written.
ii) Less efficient in the use of storage space.
iii) Expensive hardware and software are required.
iv) System design is complex and costly.
v) File updating is more difficult compared to sequential files.
Area of Use:-
A direct file organization is suitable for interactive online applications such as
airline and railway reservation systems.
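The core idea of direct organization, computing a storage position from an identifying key, can be sketched with fixed-length records in Python. Here io.BytesIO stands in for the direct access storage device, and the record size and record names are illustrative assumptions:

```python
import io

REC_SIZE = 16  # fixed record length lets us compute a byte offset from the key

def write_record(f, rec_no, data):
    # The relative record number directly gives the storage position.
    f.seek((rec_no - 1) * REC_SIZE)
    f.write(data.ljust(REC_SIZE).encode())

def read_record(f, rec_no):
    # Jump straight to the desired record, with no scanning of other records.
    f.seek((rec_no - 1) * REC_SIZE)
    return f.read(REC_SIZE).decode().strip()

f = io.BytesIO()              # stands in for a direct-access storage device
write_record(f, 1, "Rec001")
write_record(f, 3, "Rec003")
write_record(f, 325, "Rec325")

found = read_record(f, 325)   # direct access: one seek, no search
```

This is why direct files suit online systems: retrieving record 325 costs the same as retrieving record 1.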
4. Multi-Key File Organization:
Multi key file organization is used to access the records by using more than
one key. It allows multiple access paths each having a dissimilar key. There are
several methods used to execute multi key file organization. Most of these
methods are based on building indexes to provide direct access by the key value.
Two of the common methods for this organization are:
Multi-list file organization
Inverted file organization
5. Multi-List File Organization:
Multi-list file organization is a multi-index linked file organization. In a multi-list organization, indexes are defined on the multiple fields that are frequently used to search for records.
In the following diagram, one index has been defined on the field Book ID and
another on Category.
7. Heap File Organization:
Heap file organization is the simplest file organization. Here records are inserted at the end of the file. Once the data block is full, the next record is stored in a new block; this method can select any block in memory to store the new records.
When a record has to be retrieved, we need to search from the beginning of the file. Similarly, to delete or update a record, we first search for it from the beginning of the file and then delete or update it.
If a new record is inserted, it is placed in the current (last) data block, provided the block has free space.
Advantages:
1. Very good method for bulk insertion.
2. Fetching of records is faster in small files.
Disadvantages:
1. It is not suitable for larger files.
2. No proper memory management.
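The append-at-the-end insertion and scan-from-the-beginning retrieval described above can be sketched in a few lines of Python. A list of lists stands in for the disk blocks, and the block size and record contents are illustrative assumptions:

```python
# A heap file appends each new record at the end; retrieval scans from the start.
BLOCK_SIZE = 3                      # records per block (illustrative)
blocks = [[]]                       # start with data block 1

def insert(record):
    if len(blocks[-1]) == BLOCK_SIZE:   # block full -> start a new block
        blocks.append([])
    blocks[-1].append(record)

def find(key):
    # Linear search from the beginning of the file.
    for block in blocks:
        for rec in block:
            if rec[0] == key:
                return rec
    return None

for i in range(1, 6):
    insert((i, f"name{i}"))

hit = find(4)
n_blocks = len(blocks)
```

The linear search in find() is exactly why heap files are fast for bulk insertion but unsuitable for large files.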
UNIT-3
SQL-STRUCTURED QUERY LANGUAGE
SQL stands for "Structured Query Language". It is a query language used for accessing and modifying information in a database. IBM first developed SQL in the 1970s.
In a simple manner, SQL is a non-procedural, English-like language that
processes data in groups of records rather than one record at a time. Few functions
of SQL are:
Store data
Modify data
Retrieve data
Delete data
Create tables and other database objects
SQL Environment:
Catalog:-
A set of schemas that describes database is called a catalog.
Schema:-
It is a structure that contains description of objects.
Data Definition Language (DDL):-
Commands that define database which includes creating, altering and
dropping tables.
Data Manipulation Language (DML):-
Commands that maintain and query a database.
Data Control Language (DCL):-
Commands that control a database, including administering privileges and committing data.
Role of SQL in Database Architecture:-
The Oracle RDBMS is available on many operating system platforms.
Oracle is a relational DBMS- even the data dictionary is simply a collection
of tables of data along with indexes and other objects.
SQL has a basic grammar and syntax.
The functionality of the SQL language is the same on all these operating system platforms.
Using SQL does not require programming experience.
DATABASE SCHEMA:
In a relational database, the schema defines tables, fields in each table and
relationships between fields and tables. Schemas are generally stored in a data
dictionary.
Levels of database schema:-
1. Conceptual schema: a map of concepts and their relationships.
2. Logical schema: a map of entities, attributes and relations.
3. Physical schema: a particular implementation of a logical schema.
4. Schema object: a database object (such as a table or index) that belongs to a schema in Oracle.
5. Schema is the overall structure of the database
Each Data Schema includes:-
o A list of variables with description, definition and format.
o A list of domains with definition.
o A list of themes and modules with definitions.
SQL TABLES:
In relational databases, a table is a set of values that is organized using a
model of vertical columns and horizontal rows. A table has a specified number of
columns but can have any number of rows.
In database terms, a table is responsible for storing data in the database.
Database tables consist of rows and columns. Row contains each record in the table
and column is responsible for defining the type of data.
VARCHAR2:-
a. A replacement for VARCHAR in new versions of oracle.
b. VARCHAR can store up to 2000 bytes of characters while VARCHAR2
can store up to 4000 bytes of characters.
c. If VARCHAR is used in a declaration, it reserves space even for NULL values; a VARCHAR2 column does not occupy space for NULL values.
SQL COMMANDS:
SQL commands are instructions used to communicate with the database to
perform specific task that work with data. SQL commands help user to interact with
SQL applications.
SQL commands are grouped into 4 major categories depending on the
functionality:
Data Definition Language (DDL) Commands:-
These SQL commands are used for creating, modifying and dropping
the structure of database objects. The commands are CREATE, ALTER,
DROP, RENAME and TRUNCATE.
Data Manipulation Language (DML) Commands:-
These SQL commands are used for storing, retrieving, modifying and
deleting data. These commands are SELECT, INSERT, UPDATE and
DELETE.
Data Control Language (DCL) Commands:-
These SQL commands are used for providing security to database
objects. These commands are GRANT and REVOKE.
Transaction Control Language (TCL) Commands:-
These SQL commands are used for managing changes affecting the
data. These commands are COMMIT, ROLLBACK and SAVEPOINT.
SQL COMMANDS
DDL Commands:-
DDL Commands are used to create, modify and delete the structure of object
in the database. The syntax of DDL commands always includes table keyword after
the command name.
Create
Describe/desc
Alter
Drop
Truncate
Rename
CREATE TABLE Command:-
The CREATE TABLE statement is used to create tables to store data. It is
used to create structure of the table. Integrity constraints like primary key, unique key
& foreign key can be defined for the columns while creating table.
Syntax:
CREATE TABLE <table name> (column1 data type(size), column2
data type(size),……..);
--table name is the name of the table.
--column1, column2, …… are column names.
--data type is the data type of columns like char, date, number etc.
Ex:
SQL>CREATE TABLE emp(id number(5), name char(20), dept
char(10), sal number(10));
SQL> Table Created
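The CREATE TABLE example above can also be run outside Oracle. This sketch uses Python's sqlite3 as a stand-in (SQLite types like TEXT and INTEGER replace Oracle's char and number), and PRAGMA table_info plays the role that DESC plays in the notes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite equivalent of the emp table; types differ from Oracle's char/number.
con.execute("""CREATE TABLE emp (
    id   INTEGER,
    name TEXT,
    dept TEXT,
    sal  INTEGER)""")

# PRAGMA table_info lists one row per column, much like DESC in SQL*Plus.
columns = [row[1] for row in con.execute("PRAGMA table_info(emp)")]
```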
DESCRIBE/DESC TABLE Command:-
DESCRIBE TABLE is used to list all the columns in the specified table or
view. It displays one row per table column containing:
Column
Type
Nullable
Primary Key
Syntax:
DESC <table name>;
EX:
SQL>DESC std;
ALTER TABLE Command:-
ALTER TABLE command is used to modify the structure of a table by
modifying the definition of its columns. It is used to perform following functions:
1. Add, drop, modify table/columns.
2. Add and drop constraints.
3. Enable and disable constraints.
Syntax:
ALTER TABLE table_name ADD(new column data type(size));
ALTER TABLE table_name MODIFY(new column data type(size));
Syntax to add a column:
ALTER TABLE table_name ADD column_name data type(size);
Ex:
SQL> ALTER TABLE emp ADD experience number(3);
SQL> Table altered
Syntax to drop a column:
ALTER TABLE table_name DROP column_name;
Ex:
SQL> ALTER TABLE emp DROP location;
Syntax:
TRUNCATE TABLE table_name;
Ex:
SQL> TRUNCATE TABLE emp;
Differences between DROP and TRUNCATE:-
If a table is dropped, all the relationships with other tables are not valid, the
integrity constraints are dropped. If we want to use the table again, we have to
create it again with the integrity constraints and relationships with other tables should
be established.
But if a table is truncated, the table structure remains same, so the problems of
the drop are not there in truncate.
RENAME Command:-
RENAME command is used to change the name of a table.
Syntax:
RENAME TABLE <old table name> TO <new table name>;
Ex:
SQL> RENAME TABLE emp TO employ;
SQL> Table Renamed
DML COMMANDS:
DML commands allows user to access the table and insert, modify and delete
data in the table. These commands are only used on data of the table but does not
involve with structure of the table.
The DML commands are:
Insert
Select
Update
Delete
INSERT Command:-
This command is used to enter and store the values in the database. The data
values given by the user in insert statement should match with the data type
declared for the selected column in the table. SQL insert command is implemented
in 3 methods.
Syntax:
INSERT INTO <table-name> VALUES (value1, value2, ...);
Ex:
SQL> INSERT INTO std VALUES (1, 'priya', '12-jan-99');
SELECT Command:-
The SELECT command is used to retrieve data from one or more tables.
Syntax:
SELECT column-list FROM table-name
[WHERE clause]
[GROUP BY clause]
[HAVING clause]
[ORDER BY clause]
--WHERE clause is used to give conditions and retrieve only selected records.
--GROUP BY clause is used to collect data from multiple records and group results
by one or more columns.
--HAVING clause is used with GROUP BY clause to filter records given by GROUP
BY clause.
--ORDER BY clause is used to sort the result set by a specified column either in
ascending or descending order.
A SELECT statement can display the records in multiple ways:
All rows and all columns
Selected columns and all rows
Selected rows with all columns
Selected rows with selected columns
Ex:
SQL> SELECT * FROM std;
SQL> SELECT sno, sname, dob FROM std;
SQL> SELECT * FROM std WHERE sno=3;
UPDATE Command:-
It is used to edit or update the data based on conditions for selected record or
field. Update command is implemented with SET keyword which changes previous
values with current values.
Syntax:
UPDATE <table-name> SET column-name1=value1, column-
name2=value2,…. [WHERE condition];
Ex:
SQL> UPDATE std SET name='akshay' WHERE sno=3;
DELETE Command:-
DELETE command is used to delete records from the table either single
record or multiple records based on a condition using WHERE clause.
Syntax:
DELETE FROM table-name [WHERE condition];
Ex:
SQL> DELETE FROM std WHERE name='priya';
SQL> DELETE FROM std;
This command deletes all the rows from the table if no condition is specified.
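The four DML commands can be exercised together in one runnable sketch. Python's sqlite3 stands in for the notes' Oracle environment, and the std rows are invented sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE std (sno INTEGER, name TEXT, dob TEXT)")

# INSERT: store rows in the table.
con.executemany("INSERT INTO std VALUES (?,?,?)",
                [(1, "priya", "1999-01-12"),
                 (2, "jaya",  "1998-05-30"),
                 (3, "ravi",  "2000-11-02")])

# SELECT with WHERE: retrieve only the matching record.
row = con.execute("SELECT name FROM std WHERE sno = 3").fetchone()

# UPDATE with SET: replace the previous value with the current one.
con.execute("UPDATE std SET name = 'akshay' WHERE sno = 3")
updated = con.execute("SELECT name FROM std WHERE sno = 3").fetchone()[0]

# DELETE with a condition: remove only the matching record.
con.execute("DELETE FROM std WHERE name = 'priya'")
remaining = con.execute("SELECT COUNT(*) FROM std").fetchone()[0]
```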
DCL COMMANDS:
DCL commands allow database administrators to configure security access to
relational databases. Two types of DCL commands are GRANT and REVOKE. Only
database administrator or owner of database object can provide/remove privileges
on database object.
GRANT Command:-
GRANT is a command used to provide access or privileges on
database objects to users.
Syntax:
GRANT privilege-name ON object-name TO {user-name|PUBLIC|role-
name} [WITH GRANT OPTION]
--Privilege-name is access right granted to the user. Some of access
rights are ALL, EXECUTE & SELECT.
--Object-name is name of database object like TABLE, VIEW &
STORED PROC.
--User-name is name of the user to whom access right is granted.
--PUBLIC is used to grant access rights to all users
--Role-name is set of privileges grouped together
--WITH GRANT OPTION allows user to grant access rights to other
users.
EX:
SQL> GRANT SELECT ON student TO jaya;
REVOKE Command:-
REVOKE command is used to remove given privileges from selected
user of the database.
Syntax:
REVOKE [GRANT OPTION FOR] [permission] ON [object] FROM
[user] [CASCADE]
--GRANT OPTION FOR removes the specified user's ability to grant the specified permission to other users.
--CASCADE also revokes the permission from any users to whom the specified user granted it.
EX:
SQL> REVOKE SELECT ON student FROM jaya;
TCL COMMANDS:
TCL commands are used to manage transactions performed by users. The TCL commands are COMMIT, ROLLBACK and SAVEPOINT.
COMMIT:-
A COMMIT statement commits all the changes made to the data. It
commits all changes made by SQL statements during unit of work.
EX:
SQL> COMMIT;
ROLLBACK:-
ROLLBACK is a process of undoing changes to a database. It ends a
unit of work and back out all relational database changes made by that unit of
work. Rolling back to save point enables selected changes to be undone.
A ROLLBACK is automatically performed when:
The default activation group ends without a final COMMIT
A failure occurs that prevents activation group to complete its
work
A failure occurs that causes a loss of connection to an
application server.
An activation group other than default activation group ends
abnormally.
SAVEPOINT:-
The SAVEPOINT statement names and marks the current point in the processing of a transaction.
A simple rollback or commit erases all savepoints.
If a savepoint name is reused, the savepoint moves from its old position to the current point in the transaction.
An implicit savepoint is marked before executing an INSERT, UPDATE or DELETE statement. If the statement fails, a rollback to the implicit savepoint is done.
EX:
SQL> SAVEPOINT ss1;
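COMMIT, ROLLBACK TO a savepoint, and their interaction can be demonstrated end to end. SQLite also supports SAVEPOINT, so this sketch uses Python's sqlite3 (with manual transaction control) as a stand-in for the notes' Oracle environment; the acct table is invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
con.execute("CREATE TABLE acct (id INTEGER, bal INTEGER)")

con.execute("BEGIN")
con.execute("INSERT INTO acct VALUES (1, 100)")
con.execute("COMMIT")                    # COMMIT makes the insert permanent

con.execute("BEGIN")
con.execute("UPDATE acct SET bal = 0 WHERE id = 1")
con.execute("SAVEPOINT ss1")             # mark the current point
con.execute("INSERT INTO acct VALUES (2, 50)")
con.execute("ROLLBACK TO ss1")           # undo only the work done after ss1
con.execute("COMMIT")                    # the update before ss1 still commits

bal  = con.execute("SELECT bal FROM acct WHERE id = 1").fetchone()[0]
rows = con.execute("SELECT COUNT(*) FROM acct").fetchone()[0]
```

Rolling back to the savepoint undoes the second insert but keeps the update made before the savepoint, which is exactly the "selected changes" behavior described above.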
SQL WHERE CLAUSE:
SQL WHERE clause is used to specify a condition while getting the data from
single table or joining with multiple tables. If the given condition is satisfied then only
it returns specific value from the table. WHERE clause is used to filter the records
and fetch only necessary records. The WHERE clause is not only used in SELECT
statement but also in UPDATE, DELETE statements etc.
Syntax:
SELECT col1, col2,.….,colN FROM table-name WHERE [condition];
EX:
SQL> SELECT * FROM Emp WHERE sal>5000;
SQL NULL VALUE:
The SQL NULL is the term used to represent missing data. A field with a
NULL value is a field with no value. NULL values represent missing unknown data.
By default a table column can hold NULL values.
We have to use IS NULL and IS NOT NULL operators for retrieving NULL
content.
EX:
SQL> CREATE TABLE emp (id number(3) NOT NULL, name varchar2(30)
NOT NULL, sal number(15,2), address varchar2(30));
The NULL value can cause problems when selecting data. IS NULL or IS
NOT NULL operators are used in order to retrieve the null information to check for a
NULL value.
EX:
SQL>SELECT * FROM emp WHERE sal IS NOT NULL;
SQL>SELECT * FROM emp WHERE sal IS NULL;
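A runnable sketch makes the NULL behavior concrete, including why a plain `= NULL` comparison never matches. Python's sqlite3 is used as a stand-in, with invented emp rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER NOT NULL, name TEXT NOT NULL, sal REAL)")
con.executemany("INSERT INTO emp VALUES (?,?,?)",
                [(1, "priya", 5000), (2, "jaya", None)])  # None becomes NULL

# '= NULL' never matches any row; IS NULL / IS NOT NULL must be used instead.
eq_null  = con.execute("SELECT COUNT(*) FROM emp WHERE sal = NULL").fetchone()[0]
is_null  = con.execute("SELECT name FROM emp WHERE sal IS NULL").fetchone()[0]
not_null = con.execute("SELECT name FROM emp WHERE sal IS NOT NULL").fetchone()[0]
```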
GROUP BY & HAVING:
SQL GROUP BY aggregates column values into a single record value.
GROUP BY requires a list of table columns on which the calculations are performed.
EX:
SQL> SELECT address FROM emp GROUP BY address;
Address
Delhi
Mumbai
Indore
SQL returned the values that are unique. We can also use mathematical functions
with GROUP BY like SUM().
EX:
SQL> SELECT address, SUM(sal) AS "area wise total sal" FROM emp GROUP BY address;
Address Area wise total salary
Delhi 23000
Mumbai 8000
Indore 13000
GROUP BY clause helps the user to retrieve selected data and also perform
calculations.
The SQL HAVING clause is like a WHERE clause for aggregated data. It is used with conditional expressions, just like WHERE, to filter results. Any column name used in the HAVING clause must also appear in the GROUP BY clause (or be inside an aggregate function).
EX:
SQL>SELECT address, SUM(sal) FROM emp GROUP BY address
HAVING SUM (sal)>1000;
Address Sal
Delhi 23000
Mumbai 8000
Indore 13000
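The GROUP BY and HAVING examples above can be reproduced as a runnable sketch. Python's sqlite3 stands in for Oracle, and the emp rows are invented so that the per-address totals match the tables shown above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, address TEXT, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?,?,?)",
                [("a", "Delhi", 15000), ("b", "Delhi", 8000),
                 ("c", "Mumbai", 8000), ("d", "Indore", 13000)])

# GROUP BY collapses each address into one row; SUM aggregates sal per group.
totals = dict(con.execute(
    "SELECT address, SUM(sal) FROM emp GROUP BY address"))

# HAVING filters the aggregated rows, like a WHERE for groups.
big = dict(con.execute(
    "SELECT address, SUM(sal) FROM emp GROUP BY address HAVING SUM(sal) > 10000"))
```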
SQL JOINS:
SQL Joins are used to relate information in different tables. A SQL join
condition is used in SQL WHERE clause of select, update, delete statements. It
combines records from two or more tables in a database. It combines fields from two
tables by using common values.
Joining two tables effectively creates another table which combines
information from both tables.
SQL joins can be classified into Equi join and Non Equi join.
1. SQL Equi join:
It is a simple SQL join condition which uses equal sign as comparison
operator. An equi-join is further classified into two categories:
SQL specifies two different syntaxes to express joins: explicit join notation and
implicit join notation.
The explicit join notation uses the JOIN keyword to specify the table to join
and ON keyword to specify predicates for the join.
Ex:
SQL>SELECT * FROM emp INNER JOIN dept ON emp.Did=dept.Did;
The implicit join notation simply lists the tables for joining in the FROM clause
of the SELECT statement using commas to separate them.
Ex:
SQL>SELECT * FROM emp, dept WHERE emp.Did=dept.Did;
Inner joins are further classified into equi joins, natural joins and cross joins.
--Equi Join:-
It is a comparator based join that uses only equality comparisons in the
join predicate.
Ex:
SQL>SELECT * FROM emp JOIN dept ON emp.Did=dept.Did;
Or
SQL>SELECT * FROM emp, dept WHERE emp.Did=dept.Did;
If the columns in an equi-join have the same name, SQL provides a shorthand
notation for expressing equi joins using the USING construct.
SQL>SELECT * FROM emp INNER JOIN dept USING(Did);
--Natural Join:-
A natural join is a type of equi join where the join predicate arises
implicitly by comparing all columns in both tables that have same column
names in the joined tables. The resulting table contains only one column for
each pair of equally named columns.
Ex:
SQL>SELECT * FROM emp NATURAL JOIN dept;
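The explicit, implicit and natural join notations above can be sketched with
SQLite via Python's sqlite3 module. The emp/dept rows are invented sample data;
all three queries return the same matched rows here because Did is the only
common column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (Did INTEGER, dname TEXT)")
conn.execute("CREATE TABLE emp (eid INTEGER, ename TEXT, Did INTEGER)")
conn.execute("INSERT INTO dept VALUES (10, 'Sales'), (20, 'HR')")
conn.execute("INSERT INTO emp VALUES (1, 'Ravi', 10), (2, 'Sita', 20), (3, 'Arun', 10)")

# Explicit notation: JOIN ... ON
explicit = conn.execute(
    "SELECT ename, dname FROM emp INNER JOIN dept ON emp.Did = dept.Did"
).fetchall()
# Implicit notation: comma-separated FROM list with a WHERE predicate
implicit = conn.execute(
    "SELECT ename, dname FROM emp, dept WHERE emp.Did = dept.Did"
).fetchall()
# Natural join: the join predicate on Did arises implicitly
natural = conn.execute(
    "SELECT ename, dname FROM emp NATURAL JOIN dept"
).fetchall()
```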
OUTER JOIN:
An outer join does not require each record in the two joined tables to have a
matching record. The joined table gets each record even if no other matching record
exists.
Outer joins are classified into left outer joins, right outer joins and full outer
joins.
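A left outer join can be sketched the same way. The sample rows are invented;
note that older SQLite versions support only LEFT OUTER JOIN directly, while
right and full outer joins were added later, so LEFT is used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, Did INTEGER)")
conn.execute("CREATE TABLE dept (Did INTEGER, dname TEXT)")
conn.execute("INSERT INTO emp VALUES ('Ravi', 10), ('Sita', NULL)")
conn.execute("INSERT INTO dept VALUES (10, 'Sales')")

# Sita has no matching dept row but still appears, with NULL (None)
# for the missing side - the defining property of an outer join.
rows = dict(conn.execute(
    "SELECT ename, dname FROM emp LEFT OUTER JOIN dept ON emp.Did = dept.Did"
))
```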
SQL VIEWS:
SQL views are data objects and like SQL tables they can be queried, updated
and dropped. A SQL VIEW is a virtual table containing columns and rows except that
the data contained inside a view is generated dynamically from SQL tables and does
not exist inside the view itself.
A view used only to look table data is called read-only view. A view used to
look and implement insert, update and delete statements is called updatable view.
Reasons to create a view:
When data security is required
When data redundancy is to be kept minimum.
Creating a view:
CREATE VIEW view-name AS SELECT column-list FROM table-name
[WHERE condition];
The SELECT statement is used to define the columns and rows to display in
the view.
Ex:
SQL>CREATE VIEW vemp AS SELECT eid, ename FROM emp;
Updating a view:
The definition of the view created in SQL can be modified without
dropping it by using following syntax:
CREATE OR REPLACE VIEW view-name AS SELECT column-list
FROM table-name WHERE condition;
Ex:
SQL>CREATE OR REPLACE VIEW vemp AS SELECT eid, ename, sal
FROM emp WHERE sal>5000;
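The idea that a view's rows are generated dynamically from the base table can
be sketched with SQLite via Python's sqlite3 module. Table and view names are
invented; note SQLite lacks CREATE OR REPLACE VIEW, so plain CREATE VIEW is
shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (eid INTEGER, ename TEXT, sal REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "Ravi", 9000), (2, "Sita", 4000)])

# The view stores no data of its own - only the defining query.
conn.execute("CREATE VIEW vemp AS SELECT eid, ename FROM emp WHERE sal > 5000")
before = conn.execute("SELECT * FROM vemp").fetchall()

# A later insert into the base table shows up in the view automatically.
conn.execute("INSERT INTO emp VALUES (3, 'Arun', 7000)")
after = conn.execute("SELECT * FROM vemp").fetchall()
```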
Updatable and Read-Only VIEWs:
Views are either updatable or read-only but not both. INSERT,UPDATE and
DELETE operations are allowed on updatable views and base tables but not allowed
on read-only views.
An updatable view is one that can have each of its rows associated with
exactly one row in base table. When a view is changed, the changes pass through
the view to base table.
Updatable views in standard SQL are defined only for queries that meet these
criteria: the view is built on a single base table, the SELECT list contains only
simple column references (no aggregate functions, expressions or DISTINCT), and
there is no GROUP BY or HAVING clause. For inserts through the view, every NOT
NULL column of the base table must also appear in the view.
2. IN Sub queries:
When we want to compare a single attribute with a list of values, we
use IN operator/IN sub query.
The following example lists all customers who have purchased hammers,
saws or saw blades.
EX:
SQL>SELECT C_code, C_name FROM CUST JOIN PRODUCT WHERE
P_code IN (SELECT P_code FROM PRODUCT WHERE P_desc LIKE
‘%hammer%’ OR P_desc LIKE ‘%saw%’);
3. HAVING Sub queries:
A sub query with a HAVING clause is known as a HAVING sub query.
Generally, HAVING clause is used to filter the output of a GROUP BY query.
EX:
SELECT P_code, SUM(P_units) FROM PRODUCT GROUP BY P_code
HAVING SUM(P_units)>(SELECT AVG(P_units) FROM PRODUCT);
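Both subquery forms can be sketched with SQLite via Python's sqlite3 module.
The PRODUCT rows are invented sample data, and the IN subquery is simplified to
a single table rather than the CUST/PRODUCT join above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_code TEXT, P_desc TEXT, P_units INTEGER)")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)", [
    ("P1", "claw hammer", 5), ("P2", "hand saw", 12), ("P3", "pliers", 3),
])

# IN subquery: compare a single attribute against a list of values.
codes = conn.execute(
    "SELECT P_code FROM PRODUCT WHERE P_code IN "
    "(SELECT P_code FROM PRODUCT WHERE P_desc LIKE '%hammer%' "
    " OR P_desc LIKE '%saw%')"
).fetchall()

# HAVING subquery: keep only the groups above the overall average (20/3).
above_avg = conn.execute(
    "SELECT P_code, SUM(P_units) FROM PRODUCT GROUP BY P_code "
    "HAVING SUM(P_units) > (SELECT AVG(P_units) FROM PRODUCT)"
).fetchall()
```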
UNIT-4
TRANSACTIONS AND CONCURRENCY MANAGEMENT
Transaction:
A transaction is any action that reads from and/or writes to a database. A
transaction is a logical unit of work that must be either entirely completed
successfully or entirely aborted (cancelled); no intermediate states are accepted.
A successful transaction changes the database from one consistent state to
another.
Transaction States:
Each and every transaction has following 5 states:
1. Active
2. Partially committed
3. Committed
4. Failed
5. Aborted
1. Active state:
This is the first state of transaction and here transaction is being
executed.
2. Partially committed state:
This is also an execution phase where last step in the transaction is
executed. But data is not saved in the database.
3. Committed state:
In this state, all the transactions are permanently saved to the
database. This step is the last step of a transaction if it executes without fail.
4. Failed state:
If a transaction cannot proceed to the execution state because of
failure of system or database, then the transaction is said to be in failed state.
5. Aborted state:
If a transaction fails to execute, the database recovery system
brings the database back to a consistent state by aborting or rolling back the
transaction. If the transaction fails in the middle, all of its executed
operations are rolled back, returning the database to the consistent state it
was in before the transaction started. Once a transaction is aborted, it is
either restarted or killed by the DBMS.
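The committed and aborted outcomes can be sketched with SQLite via Python's
sqlite3 module. The account table, amounts and the simulated failure are all
invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (name TEXT, bal REAL)")
conn.execute("INSERT INTO acct VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, amount, fail=False):
    try:
        conn.execute("UPDATE acct SET bal = bal - ? WHERE name = 'A'", (amount,))
        if fail:                      # simulated failure between the two writes
            raise RuntimeError("failure mid-transaction")
        conn.execute("UPDATE acct SET bal = bal + ? WHERE name = 'B'", (amount,))
        conn.commit()                 # committed state: changes become permanent
    except RuntimeError:
        conn.rollback()               # aborted state: the partial update is undone

transfer(conn, 30, fail=True)    # fails mid-way -> rolled back, nothing changes
transfer(conn, 30)               # completes -> committed
balances = dict(conn.execute("SELECT name, bal FROM acct"))
```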
Concurrent Transactions:
When many transactions take place at the same time, they are called
Concurrent transactions. Managing the execution of concurrent transactions is called
concurrency control.
Serializable Schedules:
A schedule is a process of grouping transactions into one unit and executing
them in a predefined order. A schedule is required in a database because when
transactions execute in parallel, they may affect each other's results: if one
transaction is updating values which another transaction is accessing, then the
order of these two transactions will change the result of the second
transaction.
A schedule is called serial schedule, if the transactions in the schedule are
defined to execute one after the other.
Serializability ensures that the schedule for the concurrent execution of
transactions gives consistent results. This property is important in multi-user
and distributed database systems.
Locking Protocol:
A locking protocol is a set of rules that describes how the database entities
can be accessed.
Concurrency Control:
When there are multiple transactions executing at the same time on same
data, it may affect the result of the transaction. Hence it is necessary to maintain
the order of execution of those transactions. In addition, it should not alter the
ACID property of a transaction.
In order to maintain the concurrent access of transactions, two protocols are
introduced.
Lock Based Protocol: - A lock is, in other words, an access right. In this type
of protocol a transaction is not processed until it gets a lock on the
record. That means a transaction cannot retrieve, insert, update or
delete data unless it gets access to that particular data.
These locks are broadly classified as binary locks and shared/exclusive locks.
Binary lock: In this, data can either be locked or unlocked. It will have only these
two states. It can be locked for retrieve or insert or update or delete the data or
unlocked for not using the data.
Shared / Exclusive lock: In this technique the data is exclusively locked for
insert, update or delete operations. When it is exclusively locked, no other
transaction can read or write the data. When data is only read from the database,
its lock is shared, i.e. the data can be read by other transactions too, but it
cannot be changed while it is being retrieved.
Simplistic Lock Protocol: - As the name suggests, this is the simplest way of
locking data during a transaction. This protocol requires every transaction to
get a lock on the data before it inserts, updates or deletes it. After
completing the transaction, it unlocks the data.
Pre-claiming Protocol: - In this protocol, the transaction is evaluated to list
all the data items on which it needs locks. It then requests the DBMS for locks
on all those data items before the transaction begins. If the DBMS grants locks
on all the data, this protocol allows the transaction to begin; once the
transaction is complete, it releases all the locks. If all the locks are not
granted by the DBMS, the transaction is reverted and waits until the locks are
granted.
For example, if we have to calculate total marks of 3 subjects, then this protocol will
evaluate the transaction and list the locks on subject1 marks, subject2 marks and
then subject3 marks. Once it gets all the locks, it will start the transaction.
Two Phase Locking Protocol (2PL): - In this type of protocol, as the transaction
begins to execute, it starts requesting for the locks that it needs. It goes on
requesting for the locks as and when it is needed. Hence it has a growing phase of
locks. At one stage it will have all the locks. Once the transaction is complete,
it goes on releasing the locks; hence it has a shrinking phase of locks. Thus this
protocol has two phases – a growing phase of locks and a shrinking phase of locks.
For example, if we have to calculate total marks of 3 subjects, then this protocol will
go on asking for the locks on subject1 marks, subject2 marks and then subject3
marks. As and when it gets the locks on the subject marks it reads the marks. It does
not wait till all the locks are received. Then it computes the total. Once it is
complete, it releases the locks on subject3 marks, subject2 marks and subject1 marks.
In this protocol, if we need to have exclusive lock on any data for writing, then we
have to first get the shared lock for reading. Then we have to request / modify the
lock to exclusive lock.
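The growing/shrinking discipline of 2PL can be sketched in Python. The class
and item names below are invented for illustration; this is not a real DBMS
lock manager, only the rule that no lock may be acquired after the first
release.

```python
# Minimal two-phase locking sketch (illustrative names only).
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False    # flips to True on the first unlock

    def lock(self, item):
        # Growing phase: locks may only be acquired before any release.
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        # Shrinking phase: once any lock is released, no new locks are allowed.
        self.shrinking = True
        self.locks.discard(item)

t = TwoPhaseTransaction("T1")
for item in ("subject1", "subject2", "subject3"):
    t.lock(item)                  # growing phase, as in the marks example
t.unlock("subject3")              # shrinking phase begins here
try:
    t.lock("subject4")            # illegal under 2PL
    violation = None
except RuntimeError as exc:
    violation = str(exc)
```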
Strict Two Phase Locking (Strict 2PL): - This protocol is similar to 2PL in the
first phase. Once it receives a lock on the data, it continues the transaction,
but it does not release a lock as soon as it is no longer required. It waits
until the whole transaction completes and commits, and then releases all the
locks at once. Hence this protocol does not have a shrinking phase of lock
release.
This algorithm states that if there is an active write operation on data X when
a transaction T requests X, then transaction T is rejected. If the write is
complete, or there is no ongoing write operation on X, then T is executed.
This algorithm also covers the write operation: if there is an active read or
write on data X when transaction T requests X, the transaction is rejected. If
there is no active read/write operation on X, the transaction is executed.
DEADLOCK:
Dead lock occurs when two transactions wait for each other to unlock data.
Dead locks are possible only when one of the transactions wants to obtain an
exclusive lock on a data item. Dead lock condition does not exist among shared
locks.
In this situation, the two transactions wait for each other and neither ever
completes.
Deadlock avoidance
Wait-For-Graph
In this method a graph is drawn based on the transactions and their locks on
resources. If the graph contains a closed loop, then there is a deadlock. The
DBMS maintains this graph for all the transactions waiting for resources and
checks whether there is a loop.
Suppose T1 and T2 are two transactions, and T1 is requesting a resource R which
is held by T2. In this case, the wait-for graph draws an arrow from T1 to T2.
When T2 releases the resource R, this arrow is deleted.
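The closed-loop check on a wait-for graph is a cycle detection. Below is a
sketch in Python; the transaction names and graph shape are invented, and an
edge "T1" -> "T2" means T1 is waiting for a resource held by T2.

```python
# Depth-first search for a cycle in a wait-for graph.
def has_deadlock(wait_for):
    visited, on_path = set(), set()

    def dfs(t):
        visited.add(t)
        on_path.add(t)
        for u in wait_for.get(t, []):
            if u in on_path or (u not in visited and dfs(u)):
                return True       # found a closed loop: deadlock
        on_path.discard(t)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)

deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T1"]})   # mutual wait
clean = has_deadlock({"T1": ["T2"], "T2": []})            # T2 can finish
```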
Deadlock Prevention
The DBMS verifies each transaction and sees if there can be deadlock
situation upon execution of the transaction. If it finds everything is fine, then allows
the transaction to execute. If it finds that there can be a deadlock, it never allows the
transaction to execute.
The DBMS basically checks the timestamp at which each transaction was
initiated and orders the transactions based on it. If there are transactions in
the same time period requesting each other's resources, it stops those
transactions before executing them. In that case, the DBMS never allows the
transactions to execute simultaneously. This method is suitable for large
systems.
Wait-Die Scheme
In this method, if a transaction requests a resource which is already
locked by another transaction, then the DBMS checks the timestamps of both
transactions and allows the older transaction to wait until the resource is
available for execution.
Check if TS(T1) < TS(T2): if T1 is the older transaction and T2 has held
some resource, then T1 is allowed to wait until the resource is available for
execution. That means if a younger transaction has locked some resource
and an older transaction is waiting for it, the older transaction is allowed to
wait until it is available.
If T1 is the older transaction and has held some resource, and T2 is
waiting for it, then T2 is killed and restarted later, with a random delay but
with the same timestamp; i.e. if the older transaction has held some resource
and a younger transaction waits for that resource, the younger transaction is
killed and restarted after a very small delay with the same timestamp.
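The wait-die decision can be summed up in a few lines. This is a sketch with
invented names; timestamps increase over time, so a smaller timestamp means an
older transaction.

```python
# Wait-die scheme: older requesters wait, younger requesters die.
def wait_die(requester_ts, holder_ts):
    """Decide what a transaction requesting a held lock must do."""
    if requester_ts < holder_ts:
        return "wait"   # older transaction waits for the younger holder
    return "die"        # younger requester is killed and later restarted
                        # with the SAME timestamp

older_requests = wait_die(requester_ts=1, holder_ts=2)    # older waits
younger_requests = wait_die(requester_ts=2, holder_ts=1)  # younger dies
```

Keeping the same timestamp on restart guarantees the restarted transaction
eventually becomes the oldest and cannot starve.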
a. Read phase :
In a Read phase, the updates are prepared using private (or local) copies (or
versions) of the granule. In this phase, the transaction reads values of committed
data from the database, executes the needed computations, and makes the updates
to a private copy of the database values. All update operations of the transaction are
recorded in a temporary update file, which is not accessed by the remaining
transactions.
b. Validation phase :
In a Validation phase, the DBMS checks whether the updates prepared during
the Read phase conflict with transactions that have already committed. If a
conflict is detected, the transaction is rolled back and restarted; otherwise it
proceeds to the Write phase.
It is conventional to allocate a timestamp to each transaction at the end of its
Read phase to determine the set of transactions that must be examined by the
validation procedure. This set consists of the transactions that have finished
their Read phases since the start of the transaction being verified.
c. Write phase :
In a Write phase, the changes are permanently applied to the database and
the updated granules are made public. Otherwise, the updates are discarded and the
transaction is restarted. This phase is only for the Read-Write transactions and not
for Read-only transactions.
Advantages of Optimistic Methods for Concurrency Control:
i. This technique is very efficient when conflicts are rare; occasional
conflicts simply result in a transaction rollback.
ii. The rollback involves only the local copy of the data; the database is not
involved, so there are no cascading rollbacks.
Problems of Optimistic Methods for Concurrency Control:
i. Conflicts are expensive to deal with, since the conflicting transaction must be
rolled back.
ii. Longer transactions are more likely to have conflicts and may be repeatedly
rolled back because of conflicts with short transactions.
Applications of Optimistic Methods for Concurrency Control :
i. Only suitable for environments where there are few conflicts and no long
transactions.
ii. Acceptable for mostly Read or Query database systems that require very few
update transactions
3. Logical errors:-
Bad data or missing data are common conditions that may prevent a
program from continuing normal execution.
Recovery Procedures:
To maintain data integrity, a transaction must be in one of the following two
states:
1. Aborted:-
A transaction may not always complete its process successfully. To
make sure the incomplete transaction will not affect the consistent state of the
database, such transactions must be aborted.
2. Committed:-
A transaction that successfully completes its processing is said to be
committed. A committed transaction always leaves the database in a new
consistent state. The log plays a key role in failure recovery.
Database Backups:
Database backups can be either physical or logical.
Physical backups, which are copies of the physical database files, are the
primary concern in a backup and recovery strategy.
Logical backups contain logical data such as tables and stored procedures.
Logical backups can supplement physical backups.
Physical backups have large granularity and limited transportability but are
very fast. Logical backups have fine granularity and complete transportability but are
slower than physical backups.
DATABASE SECURITY:
Unauthorized access to the database includes the following:
Theft of information
Unauthorized modification of data
Unauthorized destruction of data
Database security methods focus on preventing unauthorized users from
accessing the database.
Authentication:-
Database access usually requires user authentication and authorization. For
user authentication, the first level of security establishes that the person seeking
system entry is an authorized user. His or her identity may be established by
o Something the user knows such as log-on number and password
o Something the user possesses such as a plastic ID card
o A physical representation of the user such as finger print or voice print.
Authorization allows the database users to access certain part of database. The
process of verifying the identity of a user is called as authentication.
--Password Authentication:-
In this scheme, the user is asked to enter the user name and password to log
into the database. DBMS then verifies the combination of user name and password
to authenticate the user and allows him the access to the database if he is an
authorized user otherwise access is denied.
--Access Control:-
Most database users need only a small portion of database. Allowing them
access to whole database is undesirable. Thus an organization should develop
effective security policy to enable a group of users to access only a required portion
of the database. Once the security policy is developed, it should be enforced to
achieve the level of security required.
Authorization and Views:-
It is used to grant privileges to users which help them to access certain
portion of the database. Each user is allowed to perform only necessary operation on
the database that is required to perform their function. The person who is
responsible for granting privileges to database users is called authorizer.
The authorization information is maintained in a table called access matrix.
The columns of access matrix represent objects and rows represent subjects. Here
object is any part of the database that needs to be protected from unauthorized
access. Subject is an active user or account who operates on various objects in the
database.
UNIT-5
DISTRIBUTED AND CLIENT SERVER DATABASES
DISTRIBUTED DATABASES:
Types of DDBMS:
--Non-homogeneous DBMSs:-
It is sometimes desirable to integrate databases maintained by different
DBMSs on different computers. One approach to integrating these databases
is to provide a single user interface that can be used to access the data
maintained by the non-homogeneous DBMSs.
Data Replication:
Data replication refers to the storage of data copies at multiple sites served by
a computer network. Data replication is the frequent electronic copying of data
from a database on one computer or server to a database on another, so that all
users share the same level of information.
Replicated data is subject to mutual consistency rule. The mutual consistency
rule requires that all copies of data fragments must be identical.
Advantages of Replication:-
Availability
Parallelism
Reduced data transfer
Disadvantages of Replication:-
Increased cost of updates
Increased complexity of concurrency control
Three replication scenarios:-
o A fully replicated database stores multiple copies of each database fragment
at multiple sites.
o A partially replicated database stores multiple copies of some database
fragments at multiple sites.
o An unreplicated database stores each database fragment at a single site.
Database replication can be done in at least 3 different ways:-
--Snapshot replication:
Data on one server is simply copied to another server or to another
database on the same server.
--Merging replication:
Data from two or more databases is combined into a single database.
--Transactional replication:
Users receive full initial copies of the database and then receive
periodic updates as data changes.
Database Partitioning:
A partition is a division of a logical database or its constituting elements into
distinct independent parts. Database partitioning is normally done for manageability,
performance or availability reasons. Each partition is spread over multiple nodes and
users at the node can perform local transactions on the partitions.
Efficiency is obtained by the strategy that implements a partitioned database.
With this approach, the database is distributed such that there is no overlapping or
replication of data maintained at various locations.
Reliability is also affected, since a failure of one computer system means that
the data which is stored at that location is not available to the users anywhere in the
system.
A partitioned database is treated as a series of independently operated
database systems with remote access capability.
Data Fragmentation:
Data fragmentation allows a single object to be broken into two or more segments
or fragments. Each fragment can be stored at any site over a computer network.
Fragmentation aims to improve:
Reliability
Performance
Balanced storage capacity and costs
Communication costs
Security
The following information is used to decide fragmentation:
Quantitative information:
Frequency of queries, site, where query is run and selectivity of queries
etc.
Qualitative information:
Types of access of data, read/write etc.
There are 3 types of data fragmentation strategies:
Horizontal fragmentation:-
It refers to the division of a relation into subsets (fragments) of tuples.
Each fragment is stored at a different node and each fragment has unique
rows.
Vertical fragmentation:-
It refers to the division of a relation into subsets (fragments) of attributes.
Each fragment is stored at a different node and each fragment has unique
columns.
Mixed fragmentation:-
It refers to a combination of horizontal and vertical strategies. A table
may be divided into several horizontal subsets (rows) each one having a
subset of attributes (columns).
Advantages of Fragmentation:
Horizontal:
o Allows parallel processing on fragments of a relation.
o Allows a relation to be split so that tuples are located where they are most
frequently accessed.
Vertical:
o Allows tuple to be split so that each part of tuple is stored where it is most
frequently accessed.
o Tuple-id attribute allows efficient joining of vertical fragments.
Vertical and horizontal fragmentation can be combined:
o Fragments may be successively fragmented to an arbitrary depth.
Replication and fragmentation can be combined:
o Relation is partitioned into several fragments. System maintains several
identical replicas of each such fragment.
Data Integrity:
As the data is distributed, the transaction activities may take place at a
number of sites and it can be difficult to maintain a time ordering among actions.
When two or more transactions execute at the same time and both require
access to the same data in order to complete their processing, a conflict arises.
Most concurrency control algorithms for distributed database systems use
some form of check to see that the result of a transaction is the same as if its actions
were executed serially.
To implement concurrency control, the following must be known:
1. The type of scheduling algorithm used
2. The location of scheduler
3. How replicated data is controlled
When transactions are performed sequentially, all the actions are performed
and then all the actions of next transaction are executed. There is no concurrency
and this is called a serial execution.
Some of the principal methods of maintaining data integrity in a distributed
database system are Two phase commit protocol, Distributed locking and Time
stamping.
Two Phase Commit Protocol (2PC):
Under 2PC, a single transaction can update many different databases or
resources and these resources may be distributed across networks and have
independent availability and failure modes.
Commit protocols are used to ensure atomicity across sites.
o A transaction which executes at multiple sites must either be
committed at all the sites or aborted at all the sites.
o Not acceptable to have a transaction committed at one site and
aborted at another
The two phase commit protocol is widely used
The three phase commit protocol (3PC) is more complicated and more
expensive but avoids some drawbacks of two-phase commit protocol
Two-phase commit protocol:-
Assumes fail-stop model – failed sites simply stop working and do not
cause any other harm such as sending incorrect messages to other sites.
Execution of the protocol is initiated by the coordinator after the last step of
the transaction.
The protocol involves all local sites at which the transaction is executed
Let T be a transaction initiated at site Si and let the transaction coordinator
at Si be Ci.
Implementation Process:-
Phase 1: The coordinator gets the participants ready to write the results into
database.
Phase 2: Everybody writes the results into database
--Coordinator: The process at the site where the transaction originates
and which controls the execution.
--Participant: The process at the other sites that participate in executing
the transaction.
Centralized 2PC:
[Diagram: centralized 2PC – in Phase 1 the coordinator C exchanges
prepare/vote messages with the participants P; in Phase 2 it sends the
commit/abort decision to all of them.]
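The two phases can be sketched as a toy coordinator/participant exchange in
Python. All class and function names are invented for illustration; a real
implementation adds write-ahead logging, timeouts and recovery.

```python
# Toy sketch of centralized two-phase commit.
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = None

    def prepare(self):
        return self.can_commit        # Phase 1 vote: ready to commit or not

    def finish(self, decision):
        self.state = decision         # Phase 2: apply the global decision

def two_phase_commit(participants):
    # Phase 1: the coordinator gets every participant ready and collects votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: the SAME decision goes to every site, never mixed.
    for p in participants:
        p.finish(decision)
    return decision

all_ready = [Participant(), Participant()]
one_fails = [Participant(), Participant(can_commit=False)]
d1 = two_phase_commit(all_ready)   # every vote is yes -> global commit
d2 = two_phase_commit(one_fails)   # one no vote -> global abort
```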
Distributed Locking:
The most common way in which access to items is controlled is by “locks”.
The lock manager is the part of the DBMS that records, for each item I, whether
one or more transactions are reading or writing any part of item I. If so, the
lock manager stops other transactions from gaining conflicting access to item I.
The lock manager can store the current locks in a lock table which consists of
records (<item>, <lock type>, <transaction>). The meaning of record (I, L, T) is that
transaction T has a lock of type L on item I.
The DDBMS maintains a local lock manager at each location which
administers the lock and unlock requests for data items stored at that site. Locks
may be applied in two modes: shared and exclusive.
If a transaction locks a record in shared mode, it can read that record but
cannot update it. If a transaction locks a record in exclusive mode, it can
both read and update the record, and no other transaction can access the record
while it is exclusively locked. No two transactions can hold exclusive locks on
the same record at the same time, but any number of transactions can hold shared
locks on the same record at the same time.
If there is only a single copy of a record, then the logical record is identical to
its only physical copy. Appropriate locks are maintained by sending lock-request
messages to the site at which the copy resides.
Time stamp based concurrency control protocols can be used in distributed systems.
Each transaction must be given a unique timestamp.
Main problem: how to generate a time stamp in distributed manner.
o Each site generates a unique local timestamp using either a logical
counter or the local clock.
o A globally unique timestamp is obtained by concatenating the unique
local timestamp with the site's unique identifier.
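The counter-plus-site-identifier scheme can be sketched in a few lines of
Python. The class name is invented; the point is that pairing a per-site
logical counter with the site id yields timestamps that are globally unique and
totally ordered, even when two sites produce the same counter value.

```python
import itertools

class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count(1)   # local logical counter

    def timestamp(self):
        # Tuple comparison orders first by counter, then by site id.
        return (next(self.counter), self.site_id)

s1, s2 = SiteClock(1), SiteClock(2)
ts_a = s1.timestamp()   # (1, 1)
ts_b = s2.timestamp()   # (1, 2) - same counter value, different site
ts_c = s1.timestamp()   # (2, 1)
```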
Improved security
Decreased costs and increased scalability
[Diagram: clients 1–3 connected through a network to a server and database
computer; a second diagram shows Internet clients reaching an
application/information server through an Internet server.]
Advantages of Client/Server:-
1. Processing of the entire database is spread over clients and server.
2. DBMS can achieve high performance.
3. Client applications can take full advantage of advanced user interfaces.
Disadvantages of Client/Server:-
1. Implementation is more complex.
2. It is possible that network is not well suited for client/server communications.
3. Additional load on DBMS server to handle concurrency control.