Unit I Relational Databases
Purpose of Database System – Views of data – Data Models – Database System Architecture –
Introduction to relational databases – Relational Model – Keys – Relational Algebra – SQL fundamentals
– Advanced SQL features – Embedded SQL – Dynamic SQL.
Data:
It is a collection of recorded facts.
Known facts that can be recorded and that have an implicit meaning are called 'data'.
Example:
Customer
1. cname.
2. cno.
3. ccity.
Database:
It is a collection of interrelated data.
These can be stored in the form of tables.
A database can be of any size and varying complexity.
A database may be generated and manipulated manually or it may be computerized.
Example:
The Customer database consists of the fields cname, cno, and ccity:
cname cno ccity
Database System:
It is a computerized system whose overall purpose is to maintain information and to make that
information available on demand.
Advantages:
1. Redundancy can be reduced.
2. Inconsistency can be avoided.
3. Data can be shared.
4. Standards can be enforced.
5. Security restrictions can be applied.
6. Integrity can be maintained.
7. Data gathering can be possible.
8. Requirements can be balanced.
Database Management System (DBMS):
It is a collection of programs that enables users to create and maintain a database. In other words, it
is general-purpose software that provides users with the means of defining, constructing and
manipulating the database for various applications.
Disadvantages in File Processing
1. Data redundancy and inconsistency.
2. Difficulty in accessing data.
3. Data isolation.
4. Data integrity.
5. Concurrent access is not possible.
6. Security Problems.
Advantages of DBMS:
1. Data Independence.
2. Efficient Data Access.
3. Data Integrity and security.
4. Data administration.
5. Concurrent access and Crash recovery.
6. Reduced Application Development Time.
Database Applications:
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions.
1. To see why database management systems are necessary, let's look at a typical "file-processing
system" supported by a conventional operating system.
The application is a savings bank:
Savings account and customer records are kept in permanent system files.
Application programs are written to manipulate files to perform the following tasks:
Debit or credit an account.
Add a new account.
Find an account balance.
Generate monthly statements.
2. Development of the system proceeds as follows:
New application programs must be written as the need arises.
New permanent files are created as required.
Over a long period of time, however, files may end up in different formats, and
application programs may be written in different languages.
3. So we can see there are problems with the straight file-processing approach:
Data redundancy and inconsistency
Same information may be duplicated in several places.
All copies may not be updated properly.
Difficulty in accessing data
May have to write a new application program to satisfy an unusual request.
E.g. find all customers with the same postal code.
Could generate this data manually, but a long job...
Data isolation
Data in different files.
Data in different formats.
Difficult to write new application programs.
Multiple users
Want concurrency for faster response time.
Need protection for concurrent updates.
E.g. two customers withdrawing funds from the same account at the same time: the account
has $500 in it, and they withdraw $100 and $50.
The correct result is $350, but without protection one update can overwrite the other and
leave $400 or $450.
Security problems
Every user of the system should be able to access only the data they are permitted to see.
E.g. payroll people only handle employee records, and cannot see customer accounts; tellers
only access account data and cannot see payroll data.
Difficult to enforce this with application programs.
Integrity problems
Data may be required to satisfy constraints.
E.g. no account balance below $25.00.
Again, difficult to enforce or to change constraints with the file-processing approach.
These problems and others led to the development of database management systems.
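The concurrent-withdrawal case above can be replayed step by step. The following Python sketch is an illustration only, not how a real DBMS schedules transactions: the function run, the teller labels T1 and T2, and the "read"/"write" step names are all hypothetical. It assumes each withdrawal first reads the balance and later writes back the reduced value.

```python
# Deterministic simulation of the "lost update" problem from the bank
# example: withdrawals of $100 and $50 against a $500 balance.

def run(schedule):
    """Replay a list of (teller, action) steps and return the final balance.

    Each teller first reads the balance into a private copy, then writes
    back that copy minus its own withdrawal amount.
    """
    balance = 500
    amounts = {"T1": 100, "T2": 50}
    local = {}  # each teller's private copy of the balance
    for teller, action in schedule:
        if action == "read":
            local[teller] = balance
        elif action == "write":
            balance = local[teller] - amounts[teller]
    return balance

# One teller finishes before the other starts.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]

# Both tellers read 500 before either writes; the last writer wins.
t2_last = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
t1_last = [("T1", "read"), ("T2", "read"), ("T2", "write"), ("T1", "write")]

print(run(serial))   # 350
print(run(t2_last))  # 450
print(run(t1_last))  # 400
```

Only the serial schedule gives the correct $350. A DBMS uses concurrency control (for example, locking) to rule out the other two outcomes.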
* VIEWS OF DATA
A database system is a collection of interrelated data and a set of programs that allow users to access
and modify these data. A major purpose of a database system is to provide users with an abstract view
of the data. That is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many
database-system users are not computer trained, developers hide this complexity from users through
several levels of abstraction (physical, logical and view) to simplify users' interactions with the
system. This process of hiding irrelevant internal details from the user is called data abstraction.
Definition of schema: The design of a database is called its schema. There are three types of schema:
physical schema, logical schema and view schema.
For example, a schema diagram for three tables, Course, Student and Section, shows the relationships
between them. The diagram shows only the design of the database; it does not show the data present in
those tables. A schema is only a structural view (design) of a database.
The design of a database at the physical level is called the physical schema; it describes how the
data is stored in blocks of storage.
The design of a database at the logical level is called the logical schema. Programmers and database
administrators work at this level. Here data is described as records of certain types stored in data
structures, while internal details such as the implementation of those data structures are hidden
(they are available at the physical level).
The design of a database at the view level is called the view schema. It generally describes end-user
interaction with the database system.
DBMS Instance
Definition of instance: The data stored in a database at a particular moment in time is called an
instance of the database. The database schema defines the variable declarations in the tables that
belong to a particular database; the values of these variables at a given moment are the instance of
that database.
For example, say we have a single table student in the database. Today the table has 100 records, so
today's instance of the database has 100 records. If we add another 100 records to this table by
tomorrow, tomorrow's instance will have 200 records in the table. In short, the data stored in the
database at a particular moment is called the instance, and it changes over time as we add or delete
data.
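The difference between schema and instance can be observed directly. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the column names stu_id and stu_name and the generated student names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

def schema(conn):
    # sqlite_master keeps the CREATE statement (the design) of each table.
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
    ).fetchone()[0]

def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

def add_students(conn, first_id, last_id):
    conn.executemany(
        "INSERT INTO student (stu_id, stu_name) VALUES (?, ?)",
        [(i, f"student{i}") for i in range(first_id, last_id + 1)],
    )

schema_before = schema(conn)

add_students(conn, 1, 100)      # "today": 100 records
count_today = row_count(conn)

add_students(conn, 101, 200)    # "tomorrow": 100 more records
count_tomorrow = row_count(conn)

schema_after = schema(conn)

print(count_today, count_tomorrow)    # 100 200
print(schema_before == schema_after)  # True
```

The instance grew from 100 to 200 rows while the schema text stayed the same.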
* DATA MODELS
Underlying the structure of a database is the data model: a collection of conceptual tools
for describing data, data relationships, data semantics, and consistency constraints. A data model
provides a way to describe the design of a database at the physical, logical, and view levels.
A characteristic of the database approach is that it provides a level of data abstraction, by hiding details
of data storage that are not needed by most users.
A data model is a collection of concepts that can be used to describe the structure of a database. The
model provides the necessary means to achieve the abstraction.
The structure of a database is characterized by data types, relationships, and constraints that hold for
the data. Models also include a set of operations for specifying retrievals and updates.
Data models are changing to include concepts to specify the behavior of the database application. This
allows designers to specify a set of user defined operations that are allowed.
* DATABASE SYSTEM ARCHITECTURE
The architecture of a database system is greatly influenced by the underlying computer system on
which the database system runs. Database systems can be centralized, or client-server, where one
server machine executes work on behalf of multiple client machines. Database systems can also be
designed to exploit parallel computer architectures. Distributed databases span multiple geographically
separated machines.
Database Users:
Users are differentiated by the way they expect to interact with the system:
Application programmers
Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces.
Rapid application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
Sophisticated users
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language.
They submit each such query to a query processor, whose function is to break down DML
statements into instructions that the storage manager understands.
Specialized users
Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework.
Among these applications are computer-aided design systems, knowledge base and expert
systems, systems that store data with complex data types (for example, graphics data and audio
data), and environment-modeling systems.
Naïve users
Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.
For example, a bank teller who needs to transfer $50 from account A to account B invokes a
program called transfer. This program asks the teller for the amount of money to be
transferred, the account from which the money is to be transferred, and the account to which
the money is to be transferred.
Database Administrator
The database administrator (DBA) coordinates all the activities of the database system and has a good
understanding of the enterprise’s information resources and needs.
Database administrator's duties include:
Schema definition: The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
Storage structure and access method definition.
Schema and physical organization modification: The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
Granting user authority to access the database: By granting different types of authorization,
the database administrator can regulate which parts of the database various users can access.
Specifying integrity constraints.
Monitoring performance and responding to changes in requirements.
Query Processor:
The query processor accepts a query from the user and answers it by accessing the database.
Parts of Query processor:
DDL interpreter
It interprets DDL statements and records the definitions in the data dictionary.
DML compiler
It translates DML statements in a query language into low-level instructions that the query
evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that give the
same result. The DML compiler also performs query optimization: it picks the lowest-cost plan among
the alternatives.
Query evaluation engine
This engine executes the low-level instructions generated by the DML compiler.
Storage Manager/Storage Management:
A storage manager is a program module that acts as the interface between the data stored in a
database and the application programs and queries submitted to the system.
Thus, the storage manager is responsible for storing, retrieving and updating data in the database.
The storage manager components include:
Authorization and integrity manager: Checks for integrity constraints and authority of users to
access data.
Transaction manager: Ensures that the database remains in a consistent state despite system
failures.
File manager: Manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
Buffer manager: It is responsible for retrieving data from disk storage into main memory. It
enables the database to handle data sizes that are much larger than the size of main memory.
Data structures implemented by the storage manager:
Data files: Store the database itself.
Data dictionary: Stores metadata about the structure of the database.
Indices: Provide fast access to data items.
*INTRODUCTION TO RELATIONAL DATABASES
A Relational Database Management System (RDBMS) consists of:
A set of tables
A schema
A schema is a description of data in terms of a data model; it defines the tables and their
attributes (fields or columns).
The central data description construct is a relation:
It can be thought of as a set of records.
E.g. information on students is stored in a relation with the following schema: Student(sid: string,
sname: string, login: string, gpa: numeric)
Table ≡ relation:
Is a subset of the Cartesian product of the domains of its columns.
Stores information about an entity or theme.
Consists of columns (fields) and rows (records).
Row ≡ tuple, describing information about a single item, e.g. a specific student.
Column ≡ attribute, describing a single characteristic of its item, e.g. its ID number, GPA,
etc.
Every row is unique and is identified by a key.
Entity
Entity is an object in the real world that is distinguishable from other objects. eg. Students,
lecturers, courses, rooms.
Described using a set of attributes, each of which has a domain of permitted values.
E.g. the domain of the attribute 'name of Student' ⇨ 20-character strings
RELATIONAL MODEL
In the relational model, data and relationships are represented by a collection of inter-related tables.
Each table is a group of columns and rows, where a column represents an attribute of an entity and a
row represents a record.
Sample relational model: a Student table with 3 columns and four records.
Table: Student
Stu_Id Stu_Name Stu_Age
111 Ashish 23
123 Saurav 22
169 Lester 24
234 Lou 26
Table: Course
Stu_Id Course_Id Course_Name
Here Stu_Id, Stu_Name & Stu_Age are attributes of table Student and Stu_Id, Course_Id & Course_Name
are attributes of table Course. The rows with values are the records (commonly known as tuples).
*KEYS
Keys play an important role in a relational database; a key is used to uniquely identify rows in a
table. Keys also establish relationships among tables.
Types of keys in DBMS
Primary Key – A primary key is a column or set of columns in a table that uniquely identifies tuples
(rows) in that table.
Primary Key Example in DBMS
Let’s take an example to understand the concept of primary key. In the following table, there are three
attributes: Stu_ID, Stu_Name & Stu_Age. Out of these three attributes, one attribute or a set of more than
one attributes can be a primary key.
Attribute Stu_Name alone cannot be a primary key as more than one student can have same name.
Attribute Stu_Age alone cannot be a primary key as more than one student can have same age.
Attribute Stu_Id alone is a primary key as each student has a unique id that can identify the student
record in the table.
Note: In some cases a single attribute cannot uniquely identify a record in a table. In that case we
look for a set of attributes that together uniquely identify a row. The composite key example later in
this section shows such a case.
Table Name: STUDENT
Stu_Id Stu_Name Stu_Age
101 Steve 23
102 John 24
103 Robert 28
104 Steve 29
105 Carl 29
Super Key – A super key is a set of one or more columns (attributes) that uniquely identifies rows in
a table.
How is a candidate key different from a super key?
Candidate keys are selected from the set of super keys. The only requirement when selecting a
candidate key is that it must not contain any redundant attribute. That is why candidate keys are
also called minimal super keys.
Let’s take an example to understand this:
Table: Employee
Emp_SSN Emp_Number Emp_Name
--------- ---------- --------
123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert
Super keys: The above table has the following super keys. Each of these sets is able to uniquely
identify a row of the Employee table.
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys: As mentioned above, a candidate key is a minimal super key with no redundant
attributes. The following two super keys are chosen from the above sets because they contain no
redundant attributes.
{Emp_SSN}
{Emp_Number}
Only these two sets are candidate keys; all the other sets contain redundant attributes that are not
necessary for unique identification.
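The selection of candidate keys from super keys can be sketched in a few lines of Python. This is an illustration rather than a general key-finding algorithm: it assumes, as the text states, that Emp_SSN and Emp_Number are each unique per employee while Emp_Name may repeat. The names unique_columns and is_super_key are introduced here for the example.

```python
from itertools import combinations

columns = ("Emp_SSN", "Emp_Number", "Emp_Name")

# Assumption taken from the text: these two columns are each unique per
# employee; Emp_Name is not, because two employees can share a name.
unique_columns = {"Emp_SSN", "Emp_Number"}

def is_super_key(attrs):
    # A set of attributes identifies a row if it contains a unique column.
    return bool(set(attrs) & unique_columns)

# Every non-empty subset of the columns.
subsets = [
    frozenset(combo)
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
]

super_keys = [s for s in subsets if is_super_key(s)]

# A candidate key is a super key with no proper subset that is a super key.
candidate_keys = [
    key for key in super_keys
    if not any(other < key for other in super_keys)
]

print(len(super_keys))                            # 6
print(sorted(sorted(k) for k in candidate_keys))  # [['Emp_Number'], ['Emp_SSN']]
```

The six super keys match the list given earlier, and the minimality test leaves exactly {Emp_SSN} and {Emp_Number}.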
Super key vs Candidate Key
The difference between a super key and a candidate key is a common source of confusion, so it is worth
restating.
1. All candidate keys are super keys, because the candidate keys are chosen out of the super keys.
2. How do we choose candidate keys from the set of super keys? We look for those keys from which we
cannot remove any attribute without losing uniqueness. In the above example, {Emp_SSN, Emp_Name} is
not chosen as a candidate key because {Emp_SSN} alone can identify a unique row in the table, so
Emp_Name is redundant.
Primary key:
A primary key is selected from the set of candidate keys. This is done by the database administrator
or database designer. Either {Emp_SSN} or {Emp_Number} can be chosen as the primary key for the
table Employee.
Candidate Key – A super key with no redundant attribute is known as a candidate key.
Candidate Key Example
Let’s take an example of a table “Employee”. This table has three attributes: Emp_Id, Emp_Number &
Emp_Name. Here Emp_Id & Emp_Number have unique values, while Emp_Name can have duplicate values
because more than one employee can have the same name.
Emp_Id Emp_Number Emp_Name
------ ---------- --------
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert
How many super keys can the above table have?
1. {Emp_Id}
2. {Emp_Number}
3. {Emp_Id, Emp_Number}
4. {Emp_Id, Emp_Name}
5. {Emp_Id, Emp_Number, Emp_Name}
6. {Emp_Number, Emp_Name}
Let’s select the candidate keys from the above set of super keys.
1. {Emp_Id} – No redundant attributes
2. {Emp_Number} – No redundant attributes
3. {Emp_Id, Emp_Number} – Redundant attribute. Either of those attributes can be a minimal super key
as both of these columns have unique values.
4. {Emp_Id, Emp_Name} – Redundant attribute Emp_Name.
5. {Emp_Id, Emp_Number, Emp_Name} – Redundant attributes. Emp_Id or Emp_Number alone is
sufficient to uniquely identify a row of the Employee table.
6. {Emp_Number, Emp_Name} – Redundant attribute Emp_Name.
The candidate keys we have selected are:
{Emp_Id}
{Emp_Number}
Note: A primary key is selected from the set of candidate keys. That means either Emp_Id or
Emp_Number can be the primary key. The decision is made by the DBA (database administrator).
Alternate Key – Out of all candidate keys, only one is selected as the primary key; the remaining keys
are known as alternate or secondary keys.
Alternate Key Example
Let’s take an example to understand the alternate key concept. Here we have a table Employee, this
table has three attributes: Emp_Id, Emp_Number & Emp_Name.
Table: Employee
Emp_Id Emp_Number Emp_Name
------ ---------- --------
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert
There are two candidate keys in the above table:
{Emp_Id}
{Emp_Number}
The DBA (database administrator) can choose either of the above keys as the primary key. Let's say
Emp_Id is chosen as the primary key.
Since Emp_Id is the primary key, the remaining key Emp_Number is called the alternate or secondary
key.
Composite Key – A key that consists of more than one attribute to uniquely identify rows (also known
as records & tuples) in a table is called composite key.
Composite key Example
Let’s consider a table Sales. This table has four columns (attributes) – cust_Id, order_Id, product_code &
product_count.
Table – Sales
cust_Id order_Id product_code product_count
-------- -------- --------- -------------
C01 O001 P007 23
C02 O123 P007 19
C02 O123 P230 82
C01 O001 P890 42
None of these columns alone can serve as a key in this table.
Column cust_Id alone cannot be a key because the same customer can place multiple orders, so the
same customer can have multiple entries.
Column order_Id alone cannot be a primary key because the same order can contain multiple
products, so the same order_Id can appear multiple times.
Column product_code alone cannot be a primary key because more than one customer can order the
same product.
Column product_count alone cannot be a primary key because two orders can have the same product
count.
Based on this, the key must consist of more than one attribute:
Key in the above table: {cust_Id, product_code}
This is a composite key as it is made up of more than one attribute.
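A composite key can be declared and enforced by the database itself. The following sketch uses Python's built-in sqlite3 module and the Sales rows from the table above; the extra row with order O777 is a hypothetical one added only to trigger the violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        cust_id       TEXT,
        order_id      TEXT,
        product_code  TEXT,
        product_count INTEGER,
        PRIMARY KEY (cust_id, product_code)   -- composite key
    )
    """
)
rows = [
    ("C01", "O001", "P007", 23),
    ("C02", "O123", "P007", 19),
    ("C02", "O123", "P230", 82),
    ("C01", "O001", "P890", 42),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Hypothetical row that repeats the existing pair (C01, P007).
try:
    conn.execute("INSERT INTO sales VALUES ('C01', 'O777', 'P007', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(duplicate_rejected, row_count)  # True 4
```

Each column value repeats on its own (C01 and P007 both appear twice), but the pair is unique, which is what the composite key enforces.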
Foreign Key – Foreign keys are the columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.
For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key because it
points to the primary key of the Student table.
Course_enrollment table:
Course_Id Stu_Id
C01 101
C02 102
C03 101
C05 102
C06 103
C07 102
Student table:
Stu_Id Stu_Name Stu_Age
101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
Note: In practice, a foreign key does not have to reference the primary key of the other table. If it
points to any unique column (not necessarily the primary key) of another table, it is still a foreign
key. So a more accurate definition is: foreign keys are the columns of a table that point to
a candidate key of another table.
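The cross-reference can be checked by the database. The sketch below again uses Python's sqlite3 module; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and that the enrollment for student 999 is a hypothetical row with no matching student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute(
    "CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER)"
)
conn.execute(
    """
    CREATE TABLE course_enrollment (
        course_id TEXT,
        stu_id    INTEGER REFERENCES student (stu_id)   -- foreign key
    )
    """
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Chaitanya", 22), (102, "Arya", 26), (103, "Bran", 25), (104, "Jon", 21)],
)

# Student 101 exists, so this enrollment is accepted.
conn.execute("INSERT INTO course_enrollment VALUES ('C01', 101)")

# Student 999 does not exist, so this enrollment is rejected.
try:
    conn.execute("INSERT INTO course_enrollment VALUES ('C99', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrolled = conn.execute("SELECT COUNT(*) FROM course_enrollment").fetchone()[0]
print(orphan_rejected, enrolled)  # True 1
```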
*RELATIONAL ALGEBRA
Relational algebra is a procedural query language that works on the relational model. The purpose of a
query language is to retrieve data from a database or to perform operations such as insert, update,
and delete on the data. Calling relational algebra procedural means that a query states both what
data is to be retrieved and how it is to be retrieved.
Relational calculus, on the other hand, is a non-procedural query language: it states what data is to
be retrieved but not how to retrieve it.
Types of operations in relational algebra
These operations fall into two categories:
1. Basic Operations
2. Derived Operations
Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)
Derived Operations:
1. Natural Join (⋈)
2. Left, Right, Full outer join
3. Intersection (∩)
4. Division (÷)
Let’s discuss these operations one by one with the help of examples.
Select Operator (σ)
Select Operator is denoted by sigma (σ) and it is used to find the tuples (or rows) in a relation (or
table) which satisfy the given condition.
If you know a little SQL, you can think of it as the WHERE clause in SQL, which is used for the same
purpose.
Syntax of Select Operator (σ)
σ Condition/Predicate (Relation/Table name)
Select Operator (σ) Example
Table: CUSTOMER
---------------
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Query:
σ Customer_City="Agra" (CUSTOMER)
Output:
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
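The select operation can be mimicked in Python by filtering a list of rows. This is a minimal sketch: representing each tuple as a dictionary and the helper name select are choices made for the illustration.

```python
customer = [
    {"Customer_Id": "C10100", "Customer_Name": "Steve", "Customer_City": "Agra"},
    {"Customer_Id": "C10111", "Customer_Name": "Raghu", "Customer_City": "Agra"},
    {"Customer_Id": "C10115", "Customer_Name": "Chaitanya", "Customer_City": "Noida"},
    {"Customer_Id": "C10117", "Customer_Name": "Ajeet", "Customer_City": "Delhi"},
    {"Customer_Id": "C10118", "Customer_Name": "Carl", "Customer_City": "Delhi"},
]

def select(relation, predicate):
    """Sigma: keep the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]

agra = select(customer, lambda t: t["Customer_City"] == "Agra")

for t in agra:
    print(t["Customer_Id"], t["Customer_Name"], t["Customer_City"])
# C10100 Steve Agra
# C10111 Raghu Agra
```

All three columns survive; only the rows are filtered.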
Project Operator (∏)
Project operator is denoted by ∏ symbol and it is used to select desired columns (or attributes) from a
table (or relation).
The project operator in relational algebra corresponds to the column list of a SELECT statement in SQL
(with DISTINCT, since a relation contains no duplicate rows).
Syntax of Project Operator (∏)
∏ column_name1, column_name2, ...., column_nameN(table_name)
Project Operator (∏) Example
In this example, we have a table CUSTOMER with three columns, we want to fetch only two columns of
the table, which we can do with the help of Project Operator ∏.
Table: CUSTOMER
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Query:
∏ Customer_Name, Customer_City (CUSTOMER)
Output:
Customer_Name Customer_City
------------- -------------
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
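Projection can be sketched in Python as well. The helper name project and the dictionary representation of tuples are assumptions of this example; the result is built as a set so that duplicate rows collapse, as they do in a relation.

```python
customer = [
    {"Customer_Id": "C10100", "Customer_Name": "Steve", "Customer_City": "Agra"},
    {"Customer_Id": "C10111", "Customer_Name": "Raghu", "Customer_City": "Agra"},
    {"Customer_Id": "C10115", "Customer_Name": "Chaitanya", "Customer_City": "Noida"},
    {"Customer_Id": "C10117", "Customer_Name": "Ajeet", "Customer_City": "Delhi"},
    {"Customer_Id": "C10118", "Customer_Name": "Carl", "Customer_City": "Delhi"},
]

def project(relation, attributes):
    """Pi: keep only the named attributes; the set removes duplicate rows."""
    return {tuple(t[a] for a in attributes) for t in relation}

name_city = project(customer, ["Customer_Name", "Customer_City"])
cities = project(customer, ["Customer_City"])

print(len(name_city))  # 5
print(sorted(cities))  # [('Agra',), ('Delhi',), ('Noida',)]
```

Projecting on Customer_City alone returns three rows rather than five, because Agra and Delhi each occur twice in the table.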
Union Operator (∪)
Union operator is denoted by the ∪ symbol and is used to combine the rows (tuples) of two tables
(relations).
Say we have two relations R1 and R2 that have the same columns. To select all the tuples (rows) from
both relations, we apply the union operator to them.
Note: Rows (tuples) that are present in both tables appear only once in the union. In short, there
are no duplicates after the union operation.
Syntax of Union Operator (∪)
table_name1 ∪ table_name2
Union Operator (∪) Example
Table 1: COURSE
Course_Id Student_Name Student_Id
--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931
Table 2: STUDENT
Student_Id Student_Name Student_Age
------------ ---------- -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Query:
∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)
Output:
Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve
Note: There are no duplicate names in the output, even though several names are common to both tables
and the name Aditya appears twice in the COURSE table itself.
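Python's built-in set type behaves the same way, which makes the union query easy to reproduce. In this sketch the two tables are written as lists of tuples, and the set comprehensions play the role of the two projections on Student_Name.

```python
course = [
    ("C101", "Aditya", "S901"),
    ("C104", "Aditya", "S901"),
    ("C106", "Steve", "S911"),
    ("C109", "Paul", "S921"),
    ("C115", "Lucy", "S931"),
]
student = [
    ("S901", "Aditya", 19),
    ("S911", "Steve", 18),
    ("S921", "Paul", 19),
    ("S931", "Lucy", 17),
    ("S941", "Carl", 16),
    ("S951", "Rick", 18),
]

# Projection on Student_Name for each table; sets drop the repeated "Aditya".
course_names = {name for _course_id, name, _student_id in course}
student_names = {name for _student_id, name, _age in student}

all_names = course_names | student_names  # set union

print(sorted(all_names))
# ['Aditya', 'Carl', 'Lucy', 'Paul', 'Rick', 'Steve']
```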
Intersection Operator (∩)
Intersection operator is denoted by ∩ symbol and it is used to select common rows (tuples) from two
tables (relations).
Let’s say we have two relations R1 and R2 both have same columns and we want to select all those
tuples(rows) that are present in both the relations, then in that case we can apply intersection
operation on these two relations R1 ∩ R2.
Note: Only those rows that are present in both the tables will appear in the result set.
Syntax of Intersection Operator (∩)
table_name1 ∩ table_name2
Intersection Operator (∩) Example
Lets take the same example that we have taken above.
Table 1: COURSE
Course_Id Student_Name Student_Id
--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931
Table 2: STUDENT
Student_Id Student_Name Student_Age
------------ ---------- -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Query:
∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)
Output:
Student_Name
------------
Aditya
Steve
Paul
Lucy
Set Difference (-)
Set Difference is denoted by the – symbol. Say we have two relations R1 and R2 and we want to select
all those tuples (rows) that are present in R1 but not in R2. This can be done using the set
difference R1 – R2.
Syntax of Set Difference (-)
table_name1 - table_name2
Set Difference (-) Example
Let’s take the same tables COURSE and STUDENT that we have seen above.
Query:
Let's write a query to select those student names that are present in the STUDENT table but not in the
COURSE table.
∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)
Output
Student_Name
------------
Carl
Rick
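Intersection and set difference map onto Python's set operators in the same way. In this short sketch the two sets stand for the projections ∏ Student_Name (COURSE) and ∏ Student_Name (STUDENT) from the tables above.

```python
# Student names projected from the COURSE and STUDENT tables.
course_names = {"Aditya", "Steve", "Paul", "Lucy"}
student_names = {"Aditya", "Steve", "Paul", "Lucy", "Carl", "Rick"}

in_both = course_names & student_names       # intersection
not_enrolled = student_names - course_names  # set difference
reverse = course_names - student_names       # difference is not symmetric

print(sorted(in_both))       # ['Aditya', 'Lucy', 'Paul', 'Steve']
print(sorted(not_enrolled))  # ['Carl', 'Rick']
print(sorted(reverse))       # []
```

Swapping the operands of the difference changes the answer: every name in COURSE also appears in STUDENT, so the reverse difference is empty.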
Cartesian product (X)
Cartesian Product is denoted by the X symbol. Say we have two relations R1 and R2. The Cartesian
product of these two relations (R1 X R2) combines each tuple of the first relation R1 with each
tuple of the second relation R2. The example below makes this concrete.
Syntax of Cartesian product (X)
R1 X R2
Cartesian product (X) Example
Table 1: R
Col_A Col_B
----- ------
AA 100
BB 200
CC 300
Table 2: S
Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
Query:
Let's find the Cartesian product of tables R and S.
R X S
Output:
Col_A Col_B Col_X Col_Y
----- ------ ------ ------
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101
Note: The number of rows in the output is always the product of the number of rows in the two
tables. In our example, table 1 has 3 rows and table 2 has 3 rows, so the output has 3×3 = 9 rows.
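The pairing of every row of R with every row of S is what itertools.product in Python's standard library computes. The sketch below reproduces the example; writing the tables as lists of tuples is a choice made for the illustration.

```python
from itertools import product

r = [("AA", 100), ("BB", 200), ("CC", 300)]   # columns Col_A, Col_B
s = [("XX", 99), ("YY", 11), ("ZZ", 101)]     # columns Col_X, Col_Y

# Concatenate each tuple of R with each tuple of S.
r_x_s = [row_r + row_s for row_r, row_s in product(r, s)]

print(len(r_x_s))  # 9
print(r_x_s[0])    # ('AA', 100, 'XX', 99)
print(r_x_s[-1])   # ('CC', 300, 'ZZ', 101)
```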
Rename (ρ)
Rename (ρ) operation can be used to rename a relation or an attribute of a relation.
Rename (ρ) Syntax:
ρ(new_relation_name, old_relation_name)
Rename (ρ) Example
Say we have a table CUSTOMER. We fetch the customer names and rename the resulting relation to
CUST_NAMES.
Table: CUSTOMER
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Query:
ρ(CUST_NAMES, ∏(Customer_Name)(CUSTOMER))
Output:
CUST_NAMES
----------
Steve
Raghu
Chaitanya
Ajeet
Carl
*SQL FUNDAMENTALS
SQL is a language for relational databases. It is based on relational algebra and tuple relational
calculus. SQL is supported by all major RDBMS products.
SQL comprises both data definition and data manipulation languages. Using the data definition
features of SQL, one can design and modify a database schema, whereas the data manipulation features
allow one to store and retrieve data from the database.
Structured Query Language (SQL) is the database language with which we can create a database and
perform operations on an existing database. SQL uses commands such as CREATE, DROP, and INSERT to
carry out the required tasks. These commands fall into four categories:
1. DDL – Data Definition Language
2. DML – Data Manipulation Language
3. DCL – Data Control Language
4. TCL – Transaction Control Language
DATA DEFINITION LANGUAGE
SQL uses the following set of commands to define database schema:
CREATE
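Creates new databases, tables, and views in the RDBMS.
For example (the columns id and title are illustrative assumptions):
CREATE DATABASE tutorialspoint;
CREATE TABLE article (id INT PRIMARY KEY, title VARCHAR(100));
CREATE VIEW for_students AS SELECT title FROM article;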
DROP
Drops views, tables, and databases from the RDBMS.
Syntax:
DROP object_type object_name;
For example:
DROP DATABASE tutorialspoint;
DROP TABLE article;
DROP VIEW for_students;
ALTER
Modifies the database schema.
Syntax:
ALTER object_type object_name parameters;
For example (the length 50 is chosen for illustration):
ALTER TABLE article ADD subject VARCHAR(50);
DATA MANIPULATION LANGUAGE
SQL is equipped with a data manipulation language (DML). DML modifies the database instance by
inserting, updating, and deleting its data. DML is responsible for all forms of data modification in a
database. SQL contains the following set of commands in its DML section:
SELECT/FROM/WHERE
INSERT INTO/VALUES
UPDATE/SET/WHERE
DELETE FROM/WHERE
These basic constructs allow database programmers and users to enter data and information into the
database and retrieve efficiently using a number of filter options.
SELECT/FROM/WHERE
SELECT
This is one of the fundamental query commands of SQL. It is similar to the projection operation of
relational algebra: it lists the attributes to return from the rows that satisfy the condition in the
WHERE clause.
FROM
This clause takes a relation name as an argument, from which attributes are to be selected/projected.
If more than one relation name is given, this clause corresponds to the Cartesian product of those
relations.
WHERE
This clause defines a predicate (condition) that a row must satisfy for its attributes to be
projected. It corresponds to the selection operation of relational algebra.
For example:
Select author_name
From book_author
Where age > 50;
This command will yield the names of authors from the relation book_author whose age is greater than
50.
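The query above can be run as follows. This is a minimal sketch, assuming an in-memory SQLite database; the three author rows are made-up sample data, and ORDER BY is added only to make the result order predictable.

```python
import sqlite3

def authors_older_than(min_age):
    """Return author names from book_author whose age exceeds min_age."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book_author (author_name TEXT, age INTEGER)")
    # Sample rows invented for this sketch
    conn.executemany(
        "INSERT INTO book_author VALUES (?, ?)",
        [("Asha", 62), ("Ben", 45), ("Chitra", 51)],
    )
    # SELECT projects author_name, FROM names the relation, WHERE filters the rows
    rows = conn.execute(
        "SELECT author_name FROM book_author WHERE age > ? ORDER BY author_name",
        (min_age,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

With the sample data, authors_older_than(50) keeps Asha and Chitra and filters out Ben.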
INSERT INTO/VALUES
This command is used for inserting new rows into a table (relation).
Syntax:
INSERT INTO table (column1 [, column2, column3 ... ]) VALUES (value1 [, value2, value3 ... ])
Or
INSERT INTO table VALUES (value1 [, value2, ... ])
For example:
INSERT INTO tutorialspoint (Author, Subject) VALUES ('anonymous', 'computers');
UPDATE/SET/WHERE
This command is used for updating or modifying the values of columns in a table (relation).
Syntax:
UPDATE table_name SET column_name = value [, column_name = value ...] [WHERE condition]
For example:
UPDATE tutorialspoint SET Author='webmaster' WHERE Author='anonymous';
DELETE FROM/WHERE
This command is used for removing one or more rows from a table (relation).
Syntax:
DELETE FROM table_name [WHERE condition];
For example:
DELETE FROM tutorialspoint
WHERE Author='unknown';
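The INSERT, UPDATE, and DELETE examples above can be combined into one short run. This is a sketch only, assuming an in-memory SQLite database; the second inserted row ('unknown', 'history') is invented so that the DELETE has something to remove.

```python
import sqlite3

def run_dml_demo():
    """Apply INSERT, UPDATE, and DELETE to the tutorialspoint table and return snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tutorialspoint (Author TEXT, Subject TEXT)")

    # INSERT with an explicit column list, then without one
    conn.execute("INSERT INTO tutorialspoint (Author, Subject) VALUES ('anonymous', 'computers')")
    conn.execute("INSERT INTO tutorialspoint VALUES ('unknown', 'history')")  # hypothetical row

    # UPDATE changes only the rows matched by WHERE
    conn.execute("UPDATE tutorialspoint SET Author = 'webmaster' WHERE Author = 'anonymous'")
    after_update = conn.execute(
        "SELECT Author, Subject FROM tutorialspoint ORDER BY Author"
    ).fetchall()

    # DELETE removes only the rows matched by WHERE
    conn.execute("DELETE FROM tutorialspoint WHERE Author = 'unknown'")
    after_delete = conn.execute("SELECT Author, Subject FROM tutorialspoint").fetchall()

    conn.close()
    return after_update, after_delete
```

Note that an UPDATE or DELETE without a WHERE clause affects every row of the table.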
DATA CONTROL LANGUAGE
DCL stands for Data Control Language.
DCL is used to control user access in a database.
These commands deal with database security.
DCL commands allow or restrict a user's access to data in the database schema.
DCL commands are as follows,
1. GRANT
2. REVOKE
They are used to grant access permissions to, or revoke them from, any database user.
1. GRANT COMMAND
GRANT command gives users access privileges to the database.
This command allows specified users to perform specific tasks.
Syntax:
GRANT <privilege list>
ON <relation name or view name>
TO <user/role list>
[WITH GRANT OPTION];
The optional WITH GRANT OPTION clause lets the recipient pass the granted privileges on to other users.
Example:
GRANT ALL ON employee
TO ABC;
2. REVOKE COMMAND
REVOKE command is used to cancel previously granted or denied permissions. It withdraws access
privileges given with the GRANT command and takes them back from the user.
Syntax:
REVOKE <privilege list>
ON <relation name or view name>
FROM <user name>;
Example:
REVOKE UPDATE
ON employee
FROM ABC;
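The effect of the two examples above can be pictured with a toy model. The sketch below is not how any DBMS stores privileges, and SQLite itself has no GRANT or REVOKE; the class PrivilegeTable and its methods are hypothetical names. It only assumes that GRANT ALL on a table covers SELECT, INSERT, UPDATE, and DELETE.

```python
class PrivilegeTable:
    """Toy bookkeeping for GRANT/REVOKE; an illustration, not a real DBMS mechanism."""

    def __init__(self):
        self._granted = set()  # holds (user, privilege, object) triples

    def grant(self, privilege, obj, user):
        # GRANT <privilege> ON <obj> TO <user>
        self._granted.add((user, privilege, obj))

    def revoke(self, privilege, obj, user):
        # REVOKE <privilege> ON <obj> FROM <user>
        self._granted.discard((user, privilege, obj))

    def allowed(self, user, privilege, obj):
        return (user, privilege, obj) in self._granted


acl = PrivilegeTable()

# GRANT ALL ON employee TO ABC (ALL is assumed here to mean these four privileges)
for privilege in ("SELECT", "INSERT", "UPDATE", "DELETE"):
    acl.grant(privilege, "employee", "ABC")

# REVOKE UPDATE ON employee FROM ABC
acl.revoke("UPDATE", "employee", "ABC")
```

After these two steps, user ABC can still read, insert into, and delete from employee, but can no longer update it.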
Difference between GRANT and REVOKE commands:
GRANT
1. GRANT command allows a user to perform certain activities on the database.
2. It grants access privileges for database objects to other users.
Syntax:
GRANT privilege_name
ON object_name
TO {user_name | PUBLIC | role_name}
[WITH GRANT OPTION];
REVOKE
1. REVOKE command disallows a user to perform certain activities.
2. It revokes access privileges for database objects previously granted to other users.
Syntax:
REVOKE privilege_name
ON object_name
FROM {user_name | PUBLIC | role_name};
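TRANSACTION CONTROL LANGUAGE
TCL commands manage the changes made by the DML commands. The two main commands are ROLLBACK and
COMMIT.
Difference between ROLLBACK and COMMIT commands:
ROLLBACK
1. ROLLBACK command is used to undo the changes made by the DML commands.
2. It rolls back all the changes of the current transaction.
Syntax:
ROLLBACK;
Example:
DELETE FROM table_name;
ROLLBACK;
COMMIT
1. COMMIT command is used to save the modifications done to the database values by the DML commands.
2. It makes all the changes permanent so that they cannot be rolled back.
Syntax:
COMMIT;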
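The difference between the two commands shows up clearly when the same DELETE is first rolled back and then committed. This is a minimal sketch, assuming Python's sqlite3 module with its default transaction handling, where conn.commit() and conn.rollback() issue COMMIT and ROLLBACK; the account table and its row are hypothetical.

```python
import sqlite3

def run_transaction_demo():
    """Run the same DELETE twice: once undone by ROLLBACK, once made permanent by COMMIT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 100)")
    conn.commit()  # COMMIT: the inserted row is now permanent

    conn.execute("DELETE FROM account")
    conn.rollback()  # ROLLBACK: the DELETE is undone
    rows_after_rollback = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

    conn.execute("DELETE FROM account")
    conn.commit()    # COMMIT: the DELETE is now permanent
    conn.rollback()  # nothing left to undo, the committed change stays
    rows_after_commit = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

    conn.close()
    return rows_after_rollback, rows_after_commit
```

The first count shows the row restored by ROLLBACK; the second shows that a committed DELETE cannot be rolled back.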
SQL is one of the most in-demand skills today. Every day a huge amount of data is collected, and one
has to work with these databases to extract insightful information. Hence it is important to learn SQL:
it is a special-purpose database programming language that helps generate useful strategies from a
database and can interact with large databases regardless of their size. These features make SQL a
powerful tool. Here are some of the major SQL features which make it a successful database programming
language:
Characteristics of SQL
High Performance
SQL provides high-performance programming capability for highly transactional, heavy-workload,
and high-usage database systems. SQL programming also gives various ways to describe the data more
analytically.
High Availability
SQL is compatible with databases such as MS Access, Microsoft SQL Server, MySQL, Oracle Database,
SAP HANA, and SAP Adaptive Server. All of these systems support SQL, and many add extensions for
procedural programming and other functions, which makes SQL an even more powerful tool.
Scalability and Flexibility
SQL provides scalability and flexibility. It is easy to create new tables, and tables that are no
longer used can be dropped from the database.
Robust Transactional Support
With SQL, programs can handle large numbers of records and manage numerous transactions.
High Security
It is easy to grant permissions on tables, procedures, and views, so SQL helps secure your data.
Comprehensive Application Development
Many programmers use SQL in applications that access a database. SQL works for organizations of
every size, small or large.
Management Ease
SQL is used in almost every relational database management system. "Select", "Create", "Insert",
"Drop", "Update", and "Delete" are the standard, common SQL commands that help us manage large
amounts of data quickly and efficiently.
Open Source
SQL itself is an ANSI/ISO standard language rather than a piece of software. Open-source relational
database management systems such as MySQL and PostgreSQL implement it.
*EMBEDDED SQL
Embedded SQL is a robust and convenient method of combining the computing power of a programming
language with the specialized data management and manipulation capabilities of SQL. Embedded SQL
statements are SQL statements written inline with the program source code of the host language. An
embedded SQL preprocessor parses these statements and replaces them with host-language calls to a code
library. The output from the preprocessor is then compiled by the host compiler. This allows
programmers to embed SQL statements in programs written in languages such as C/C++, COBOL, and Fortran.
The SQL standards committee defined the embedded SQL standard in two steps: a formalism called
Module Language was defined, and then the embedded SQL standard was derived from Module Language.
Static Vs Dynamic SQL:
Static SQL
The source form of a static SQL statement is embedded within an application program written in a host
language such as COBOL.
The statement is prepared before the program is executed, and the operational form of the statement
persists beyond the execution of the program.
Static SQL statements in a source program must be processed before the program is compiled. This
processing can be done by the DB2 precompiler or the SQL statement coprocessor.
The DB2 precompiler or the coprocessor checks the syntax of the SQL statements, turns them into host
language comments, and generates host language statements to invoke DB2.
Dynamic SQL:
Programs that contain embedded dynamic SQL statements must be precompiled like those that
contain static SQL, but unlike static SQL, the dynamic statements are constructed and
prepared at run time. The source form of a dynamic statement is a character string that is passed to
DB2 by the program using the static SQL statement PREPARE or EXECUTE IMMEDIATE.
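The static versus dynamic distinction can be illustrated by analogy. The sketch below is not embedded SQL: Python's sqlite3 module has no precompiler and does not use DB2's PREPARE or EXECUTE IMMEDIATE. It only contrasts a statement whose text is fixed in the source with one whose text is assembled as a character string at run time. The customer table (cname, cno, ccity) follows the example from Unit I; the rows and the helper build_dynamic_query are hypothetical.

```python
import sqlite3

# "Static" flavour: the statement text is fixed in the source; only the value varies.
STATIC_QUERY = "SELECT cname FROM customer WHERE ccity = ?"

ALLOWED_COLUMNS = {"cname", "cno", "ccity"}

def build_dynamic_query(column):
    """'Dynamic' flavour: the statement text itself is built at run time."""
    if column not in ALLOWED_COLUMNS:  # never splice unchecked input into SQL text
        raise ValueError(f"unknown column: {column}")
    return f"SELECT cname FROM customer WHERE {column} = ?"

def run_static_dynamic_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (cname TEXT, cno INTEGER, ccity TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [("Ravi", 1, "Chennai"), ("Meena", 2, "Madurai")],  # made-up rows
    )
    static_rows = conn.execute(STATIC_QUERY, ("Chennai",)).fetchall()
    dynamic_rows = conn.execute(build_dynamic_query("cno"), (2,)).fetchall()
    conn.close()
    return static_rows, dynamic_rows
```

Because the dynamic statement is only text until run time, the column name is checked against a fixed list before it is placed into the string, while the data values are still passed as parameters.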
UNIT-II DATABASE DESIGN
*ENTITY-RELATIONSHIP MODEL
What is ER Modeling?
ER modeling is a graphical technique for understanding and organizing data independently of the actual
database implementation. We need to be familiar with the following terms to go further.
Entity
Anything that has an independent existence and about which we collect data is an entity. It is also
known as an entity type. In ER modeling, an entity is drawn as a rectangle containing the entity name.
Entity instance
Entity instance is a particular member of the entity type.
Example for entity instance: A particular employee
Regular Entity
An entity which has its own key attribute is a regular entity.
Example for regular entity: Employee.
Weak entity
An entity which depends on another entity for its existence and doesn't have any key attribute of its
own is a weak entity.
Example for a weak entity: In a parent/child relationship, the parent is considered a strong entity
and the child is a weak entity.
In ER modeling, a weak entity is drawn as a double rectangle.
Attributes
Properties/characteristics which describe entities are called attributes. In ER modeling, an attribute
is drawn as an ellipse containing the attribute name.
Domain of Attributes
The set of possible values that an attribute can take is called the domain of the attribute. For example,
the attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence this set can be
termed as the domain of the attribute day.
Key attribute
The attribute (or combination of attributes) which is unique for every entity instance is called the key
attribute, e.g. the employee_id of an employee or the pan_card_number of a person. If the key
consists of two or more attributes in combination, it is called a composite key. In ER modeling, a key
attribute is drawn as an ellipse with the attribute name underlined.
Simple attribute
If an attribute cannot be divided into simpler components, it is a simple attribute.
Example for simple attribute: employee_id of an employee.
Composite attribute
If an attribute can be split into components, it is called a composite attribute.
Example for composite attribute : Name of the employee which can be split into First_name,
Middle_name, and Last_name.
Single valued Attributes
If an attribute can take only a single value for each entity instance, it is a single valued attribute.
Example for single valued attribute : age of a student. It can take only one value for a particular student.
Multi-valued Attributes
If an attribute can take more than one value for each entity instance, it is a multi-valued attribute.
Example for multi-valued attribute: telephone number of an employee; a particular employee may have
multiple telephone numbers.
In ER modeling, a multi-valued attribute is drawn as a double ellipse.
Stored Attribute
An attribute which needs to be stored permanently is a stored attribute.
Example for stored attribute: name of a student.
Derived Attribute
An attribute which can be calculated or derived based on other attributes is a derived attribute.
Example for derived attribute: age of an employee, which can be calculated from the date of birth and
the current date. In ER modeling, a derived attribute is drawn as a dashed ellipse.
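The attribute types above suggest how an entity might be stored in tables. The sketch below is one common mapping, not the only one, and assumes an in-memory SQLite database. The table and column names and the sample employee are hypothetical. The composite attribute name becomes three columns, the multi-valued attribute telephone number moves to its own table, and the derived attribute age is computed in a query instead of being stored.

```python
import sqlite3

def attribute_mapping_demo(today):
    """Return (phone numbers, age on `today`) for the sample employee; `today` is 'YYYY-MM-DD'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (
            employee_id   INTEGER PRIMARY KEY,  -- key attribute (simple, single valued)
            first_name    TEXT,                 -- composite attribute "name" split
            middle_name   TEXT,                 --   into its three components
            last_name     TEXT,
            date_of_birth TEXT                  -- stored attribute
        );
        CREATE TABLE employee_phone (           -- multi-valued attribute gets its own table
            employee_id INTEGER REFERENCES employee(employee_id),
            phone       TEXT,
            PRIMARY KEY (employee_id, phone)
        );
        INSERT INTO employee VALUES (1, 'Anita', NULL, 'Rao', '1990-06-15');
        INSERT INTO employee_phone VALUES (1, '555-0100'), (1, '555-0101');
    """)
    phones = [p for (p,) in conn.execute(
        "SELECT phone FROM employee_phone WHERE employee_id = 1 ORDER BY phone")]
    # Derived attribute: year difference, minus 1 if the birthday has not yet occurred
    age = conn.execute("""
        SELECT CAST(strftime('%Y', :today) AS INTEGER)
             - CAST(strftime('%Y', date_of_birth) AS INTEGER)
             - (strftime('%m-%d', :today) < strftime('%m-%d', date_of_birth))
        FROM employee WHERE employee_id = 1
    """, {"today": today}).fetchone()[0]
    conn.close()
    return phones, age
```

Passing the current date as a parameter keeps the derived value reproducible; a real application would typically use the database's own current-date function.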
Relationships
Associations between entities are called relationships.
Example: An employee works for an organization. Here "works for" is a relationship between the entities
employee and organization. In ER modeling, a relationship is drawn as a diamond.
To connect a weak entity with others, use the weak (identifying) relationship notation, which is a
double diamond.
Degree of a Relationship
Degree of a relationship is the number of entity types involved. The n-ary relationship is the general
form for degree n. Special cases are unary, binary, and ternary, where the degree is 1, 2, and 3,
respectively.
Example for unary relationship: An employee is a manager of another employee.
Example for binary relationship: An employee works for a department.
Example for ternary relationship: A customer purchases an item from a shopkeeper.
Cardinality of a Relationship
Relationship cardinalities specify how many instances of each entity type are allowed. Relationships
can have four possible connectivities, as given below.
1. One to one (1:1) relationship
2. One to many (1:N) relationship
3. Many to one (M:1) relationship
4. Many to many (M:N) relationship
The minimum and maximum values of this connectivity are called the cardinality of the relationship.
Relationship Participation
1. Total
In total participation, every entity instance is connected through the relationship to an instance of
the other participating entity type.
2. Partial
In partial participation, only some entity instances are connected through the relationship.
Example for relationship participation
Consider the relationship: Employee is head of the department.
Here not all employees will be the head of a department. Only one employee will be the head of each
department.
In other words, only a few instances of the employee entity participate in the above relationship, so
the employee entity's participation is partial. However, each department will be headed by some
employee, so the department entity's participation is total.
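The head-of-department example can be expressed as table constraints. The sketch below is one possible mapping, assuming an in-memory SQLite database; the table and column names and the sample rows are hypothetical. Total participation of department is enforced with a NOT NULL foreign key, while employee has no such constraint, so its participation stays partial.

```python
import sqlite3

def participation_demo():
    """Show total participation (department) and partial participation (employee)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")
    conn.execute("""
        CREATE TABLE department (
            dept_id     INTEGER PRIMARY KEY,
            dept_name   TEXT,
            -- NOT NULL: every department must have a head (total participation)
            head_emp_id INTEGER NOT NULL REFERENCES employee(emp_id)
        )
    """)
    conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Kumar"), (2, "Latha")])
    conn.execute("INSERT INTO department VALUES (10, 'Sales', 1)")

    # A department without a head violates total participation and is rejected
    try:
        conn.execute("INSERT INTO department (dept_id, dept_name) VALUES (20, 'HR')")
        headless_accepted = True
    except sqlite3.IntegrityError:
        headless_accepted = False

    # Employees who head no department are still valid rows (partial participation)
    non_heads = [n for (n,) in conn.execute(
        "SELECT emp_name FROM employee "
        "WHERE emp_id NOT IN (SELECT head_emp_id FROM department)")]
    conn.close()
    return headless_accepted, non_heads
```

The database rejects the department with no head, yet it stores Latha without complaint even though she heads nothing.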
Advantages and Disadvantages of ER Modeling (Merits and Demerits of ER Modeling)
Advantages
1. ER Modeling is simple and easily understandable. It is represented in the business user's language
and can be understood by non-technical specialists.
2. It is intuitive and helps in physical database creation.
3. It can be generalized and specialized based on needs.
4. It can help in database design.
5. It gives a higher-level description of the system.
Disadvantages
1. The physical design derived from an E-R model may have some ambiguities or inconsistencies.
2. Sometimes diagrams may lead to misinterpretations.
*ER Diagram
An Entity–relationship model (ER model) describes the structure of a database with the help of a
diagram, which is known as an Entity Relationship Diagram (ER Diagram). An ER model is a design or
blueprint of a database that can later be implemented as a database. The main components of the E-R
model are the entity set and the relationship set.
What is an Entity Relationship Diagram (ER Diagram)?
An ER diagram shows the relationships among entity sets. An entity set is a group of similar entities,
and these entities can have attributes. In terms of a DBMS, an entity set corresponds to a table and its
attributes to the columns of that table, so by showing the relationships among tables and their
attributes, an ER diagram shows the complete logical structure of a database. Let's have a look at a
simple ER diagram to understand this concept.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The
relationship between Student and College is many to one: a college can have many students, but
a student cannot study in multiple colleges at the same time. The Student entity has the attributes
Stu_Id, Stu_Name, and Stu_Addr, and the College entity has the attributes Col_ID and Col_Name.
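The many-to-one relationship between Student and College can be sketched as two tables, with the foreign key placed on the "many" side. This is an illustration only, assuming an in-memory SQLite database. The attribute names come from the diagram description above; the sample colleges and students are made up.

```python
import sqlite3

def many_to_one_demo():
    """Return (college name, number of students) pairs for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE college (
            Col_ID   INTEGER PRIMARY KEY,
            Col_Name TEXT
        );
        CREATE TABLE student (
            Stu_Id   INTEGER PRIMARY KEY,
            Stu_Name TEXT,
            Stu_Addr TEXT,
            -- one Col_ID per student row: a student studies in exactly one college
            Col_ID   INTEGER NOT NULL REFERENCES college(Col_ID)
        );
        INSERT INTO college VALUES (1, 'City College'), (2, 'Hill College');
        INSERT INTO student VALUES
            (101, 'Arun',  'Salem', 1),
            (102, 'Divya', 'Erode', 1),
            (103, 'Farid', 'Ooty',  2);
    """)
    rows = conn.execute("""
        SELECT c.Col_Name, COUNT(*)
        FROM college c JOIN student s ON s.Col_ID = c.Col_ID
        GROUP BY c.Col_ID
        ORDER BY c.Col_ID
    """).fetchall()
    conn.close()
    return rows
```

Each student row holds a single Col_ID, so a student belongs to one college, while the same Col_ID may appear in many student rows.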
Here are the geometric shapes and their meaning in an E-R Diagram. We will discuss these terms in
detail in the next section (Components of an ER Diagram) of this guide, so don't worry too much about
these terms now; just go through them once.
Rectangles: Entity sets
Ellipses: Attributes
Diamonds: Relationship sets
Lines: Link attributes to entity sets and entity sets to relationship sets
Double Ellipses: Multi-valued attributes
Dashed Ellipses: Derived attributes
Double Rectangles: Weak entity sets
Double Lines: Total participation of an entity in a relationship set
Components of an ER Diagram
As shown in the above diagram, an ER diagram has three main components:
1. Entity
2. Attribute
3. Relationship
1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an ER diagram.
For example: In the following ER diagram we have two entities, Student and College, and these two
entities have a many-to-one relationship, as many students study in a single college. We will read more
about relationships later; for now, focus on entities.
Weak Entity
An entity that cannot be uniquely identified by its own attributes and relies on its relationship with
another entity is called a weak entity. A weak entity is represented by a double rectangle. For example,
a bank account cannot be uniquely identified without knowing the bank to which the account belongs, so
bank account is a weak entity.
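The bank account example can be sketched as tables in which the weak entity borrows part of its key from its owner. This is one common mapping, assuming an in-memory SQLite database; the table names, column names, and sample rows are hypothetical. The primary key of account combines the bank's key with the account number.

```python
import sqlite3

def weak_entity_demo():
    """Show that an account number is unique only within its bank."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        PRAGMA foreign_keys = ON;
        CREATE TABLE bank (
            bank_code TEXT PRIMARY KEY,
            bank_name TEXT
        );
        CREATE TABLE account (
            bank_code  TEXT NOT NULL REFERENCES bank(bank_code),  -- key of the owner entity
            account_no INTEGER NOT NULL,                          -- partial key of the weak entity
            balance    INTEGER,
            PRIMARY KEY (bank_code, account_no)
        );
        INSERT INTO bank VALUES ('B1', 'First Bank'), ('B2', 'Second Bank');
        INSERT INTO account VALUES ('B1', 1001, 500);
        INSERT INTO account VALUES ('B2', 1001, 700);  -- same number, different bank: allowed
    """)
    # The same account number inside the same bank violates the composite key
    try:
        conn.execute("INSERT INTO account VALUES ('B1', 1001, 900)")
        duplicate_accepted = True
    except sqlite3.IntegrityError:
        duplicate_accepted = False
    total_accounts = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
    conn.close()
    return duplicate_accepted, total_accounts
```

Account number 1001 can exist in both banks, but not twice in the same bank, which matches the idea that an account is identified only together with its bank.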
2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an ER
diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multi-valued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity in an entity set. For example, a student roll number
can uniquely identify a student from a set of students. A key attribute is represented by an oval like
other attributes; however, the text of the key attribute is underlined.