Dbms Notes 1
Consider a case where you wish to store the name, the CGPA attained, and the roll number
of all the students of a particular class. This structured data can be easily stored in a table
as described below:
As we can notice from the above relation:
Any given row of the relation indicates a student i.e., the row of the table describes
a real-world entity.
The columns of the table indicate the attributes related to the entity. In this case,
the roll number, CGPA, and the name of the student.
NOTE: A database organized according to the relational model is known as a relational
database, and the software that manages it is a relational database management system
(RDBMS). Hence, the relational model describes how data is stored in relational databases.
As discussed earlier, a relational database is based on the relational model. This database
consists of various components based on the relational model. These include:
Relation : Two-dimensional table used to store a collection of data elements.
Tuple : Row of the relation, depicting a real-world entity.
Attribute/Field : Column of the relation, depicting properties that define the relation.
Attribute Domain : Set of pre-defined atomic values that an attribute can take i.e., it
describes the legal values that an attribute can take.
Degree : It is the total number of attributes present in the relation.
Cardinality : It specifies the number of entities involved in the relation i.e., it is the total
number of rows present in the relation.
Relational Schema : It is the logical blueprint of the relation i.e., it describes the design and
the structure of the relation. It contains the table name, its attributes, and their types:
For our Student relation example, the relational schema will be:
Relational Instance : It is the collection of records present in the relation at a given time.
Relation Key : It is an attribute or a group of attributes that can be used to uniquely identify
an entity in a table or to determine the relationship between two tables. Relation keys can
be of 6 different types:
1. Candidate Key
2. Super Key
3. Composite Key
4. Primary Key
5. Alternate Key
6. Foreign Key
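As a rough illustration of the components listed above, the sketch below builds the Student
relation in an in-memory SQLite database and reads off its degree and cardinality. The column
types and the sample rows are assumptions made for illustration; the notes only name the
attributes (roll number, name, CGPA).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relational schema: table name, attributes, and (assumed) types.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, cgpa REAL)")
# Relational instance: the rows present at this moment (made-up data).
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 8.9), (2, "Ravi", 7.4), (3, "Meera", 9.1)],
)

cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)
print(degree, cardinality)
```

Inserting or deleting a row changes the cardinality and the instance, while the degree and the
schema stay the same.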
Relational models make use of some rules to ensure the accuracy and accessibility of the
data. These rules or constraints are known as Relational Integrity Constraints. These
constraints are checked before performing any operation like insertion, deletion, or update
on the data present in a relational database. These constraints include:
Domain Constraint : It specifies that every attribute is bound to have a value that lies inside
a specific range of values. It is implemented with the help of the Attribute Domain concept.
Key Constraint : It states that every relation must contain an attribute or a set of attributes
(Primary Key) that can uniquely identify a tuple in that relation. This key can never be NULL
or contain the same value for two different tuples.
Referential Integrity Constraint : It is defined between two inter-related tables. It states that
if a given relation refers to a key attribute of a different or the same table, then the referenced
key value must exist in that referenced table.
Highlights:
1. To ensure data accuracy and accessibility, Relational Integrity Constraints are implemented.
2. It includes domain, key, and referential integrity constraints.
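The three constraints can be tried out with a small sketch. It assumes an in-memory SQLite
database and two hypothetical tables, `department` and `employee`; the table names, columns,
and the age range in the CHECK clause are illustrative choices, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER NOT NULL PRIMARY KEY,            -- key constraint
        age     INTEGER CHECK (age BETWEEN 18 AND 65),   -- domain constraint (assumed range)
        dept_id INTEGER REFERENCES department (dept_id)  -- referential integrity
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (100, 30, 1)")  # satisfies all three constraints

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "domain": violates("INSERT INTO employee VALUES (101, 12, 1)"),       # age out of range
    "key": violates("INSERT INTO employee VALUES (100, 40, 1)"),          # duplicate emp_id
    "referential": violates("INSERT INTO employee VALUES (102, 40, 9)"),  # no department 9
}
print(results)
```

Each rejected statement leaves the table unchanged, which is how the constraints keep invalid
data out of the database.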
Unexpected behavior while working with a relational database is often a sign of too much
redundancy in the stored data. This redundancy can cause anomalies in the DBMS, which can
be of various types such as:
Insertion Anomalies: It is the inability to insert data in the database due to the
absence of other data. For example: Suppose we are dividing the whole class into
groups for a project and the GroupNumber attribute is defined so that null values are
not allowed. If a new student is admitted to the class but not immediately assigned to
a group then this student can't be inserted into the database.
Deletion Anomalies - It is the accidental loss of data in the database upon deletion of
any other data element. For example: Suppose, we have an employee relation that
contains the details of the employee along with the department they are working in.
Now, if a department has one employee working in it and we remove the information
of this employee from the table, there will be the loss of data related to the department
also. This can lead to data inconsistency.
Modification/Update Anomalies - It is the data inconsistency that arises from data
redundancy and partial update of data in the database. For example: Suppose the same
fact is stored redundantly in several rows. Now, if the user updates only some of those
rows without realizing that the data is duplicated, there will be
data inconsistency in the database.
All these anomalies can lead to unexpected behavior and inconvenience for the user. These
anomalies can be removed with the help of a process known as normalization.
Edgar F. Codd, the creator of the relational model, proposed 13 rules, known as Codd's
Rules, that state:
For a database to be considered a perfect relational database, it must follow these
rules:
1. Foundation Rule - The database must be able to manage data in relational form.
2. Information Rule - All data stored in the database must exist as a value of some table cell.
3. Guaranteed Access Rule - Every data element must be logically accessible using a
combination of the table name, primary key value, and the column name.
4. Systematic Treatment of NULL values - Database must support NULL values, distinct from
zero or an empty string, to represent missing or inapplicable data in a systematic way.
5. Active Online Catalog - The organization of the database must exist in an online catalog that
can be queried by authorized users.
6. Comprehensive Data Sub-Language Rule - Database must support at least one language
that supports: data definition, view definition, data manipulation, integrity constraints,
authorization, and transaction boundaries.
7. View Updating Rule - All views that are theoretically updatable must also be updatable by
the system.
8. Relational Level Operation Rule - The database must support high-level insertion, updation,
and deletion operations.
9. Physical Data Independence Rule - Applications that access the database must not be
affected by changes to how the data is physically stored, i.e., a change in storage structures
or access methods must not require a change in the applications.
10. Logical Data Independence Rule - Any change in the logical representation of the data
(structure of the tables) must not affect the user's view.
11. Integrity independence - Integrity constraints must be defined and stored at the database
level, so that changing them does not require any change at the application level.
12. Distribution independence - The database must work properly even if the data is stored in
multiple locations, and end-users must not need to know that the data is distributed.
13. Non-subversion Rule - Accessing the data through a low-level interface should not be
able to bypass the integrity rules and constraints expressed in the high-level relational
language.
Highlights:
1. Codd Rules are a set of 13 rules that a perfect relational database must follow.
2. Codd Rules were introduced by Edgar F. Codd to resolve the database standardization
problem.
The advantages and reasons due to which the relational model in DBMS is widely accepted
as a standard are:
Simple and Easy To Use - Storing data in tables is much easier to understand and implement
as compared to other storage techniques.
Manageability - Because of the independent nature of each relation in a relational database,
it is easy to manipulate and manage. This improves the performance of the database.
Query capability - With the introduction of relational algebra, relational databases provide
easy access to data via high-level query language like SQL.
Data integrity - With the introduction and implementation of relational constraints, the
relational model can maintain data integrity in the database.
Highlights:
Relational databases are simple to use, easy to manage, provide data integrity, and are
query capable.
All the advantages of relational databases are because of the use of tables and constraints.
Disadvantages of using the relational model
The main disadvantages of the relational model in DBMS occur while dealing with a huge
amount of data:
The performance of the relational model depends upon the number of relations present in
the database.
Hence, as the number of tables increases, the requirement of physical memory increases.
The structure becomes complex and the response time for queries increases.
Because of all these factors, the cost of implementing a relational database increases.
Conclusion
Relational model in DBMS is an approach to logically represent and manage the data stored
in a database by storing data in tables.
Relations, Attributes and Tuples, Degree and Cardinality, Relational Schema and Relation
instance, and Relation Keys are some important components of the Relational Model.
To maintain data integrity, constraints such as domain, key, and referential integrity are
implemented in the relational model.
Presence of redundancy in data can lead to insertion, deletion, and update anomalies in a
relational database.
A perfect relational database follows and implements all the 13 Codd Rules.
Because of the use of tables and constraints, relational models are simple to use, easy to
manage, provide data integrity, and are query capable.
Increasing the amount of data can lead to performance and storage issues with relational
databases.
In simple databases, one attribute is enough to uniquely identify all rows. However, that’s not
always the case. Consider the following example,
There are 3 sections (A, B, C) in the 5th grade of a certain school. Based on the alphabetic
order, the students are given a roll number, which uniquely identifies them in a classroom.
However, since there are 3 such sections, there will be many students who have the same roll
number but are in different sections.
In this case, we can uniquely identify a student using their section and roll numbers together.
For example, (5B, 12) represents exactly one student, who is in 5B and has the roll number
12.
1) Super Keys
A Super Key is essentially just a key, i.e., a set of one or more attributes that can uniquely
identify every row in a table.
2) Composite Keys
A Composite Key is a key that contains more than one attribute. In the student table
mentioned above, the key – (Section, Roll Number) is a Composite Key. This key can contain
any number of attributes (greater than 1). Trivially, the key involving all the columns in the
table is the largest Composite Key possible.
3) Candidate Keys
A Candidate Key is a minimal super key, i.e., a key that can uniquely identify any table row
and from which no attribute can be removed without losing that property. Again, in the student
table mentioned above, Roll Number cannot be a candidate key, since it cannot identify a
student across 5th grade. Similarly, the key (Section, Roll Number, Name) cannot be a candidate
key since we can make do with (Section, Roll Number) as a key, which has 1 less attribute.
4) Primary Key
One candidate key from the set of all possible candidate keys is chosen to be the primary key.
This primary key is used to identify rows once decided, which reduces the complexity of data
retrieval since we would rely on only 1 key for most queries. A primary key cannot have null
values for obvious reasons.
5) Alternate Keys
After a primary key is chosen from the set of candidate keys, the leftover keys are called
Alternate Keys.
6) Foreign Key
A Foreign Key in table X is an attribute (or set of attributes) that refers to the primary key of
another table Y. It is used to identify the rows in table Y from the point of view of table X.
For example, if each college student had a proctor, we could put the details of the proctor on
the student table itself. But, since many students can have the same proctor, doing so will
result in redundant data. To eliminate this redundancy, we can create a separate proctor table
and mention the proctor’s id for each student in the student table.
In this scenario, proctor_Id is the foreign key in the Student table and is used to cross-
reference the proctor’s details.
Since a super key is just a key that can uniquely identify a row, all candidate keys
come under the bracket of super keys. The set of all Super Keys is a superset of all Candidate
Keys.
Essentially, the minimal super keys, those from which no attribute can be dropped, form the
candidate keys. With this, we can create all super keys by just pairing the candidate keys with
other table columns.
Super keys in DBMS are important since they’re the starting point of keys, normal forms, and
more!
The main purpose of a super key is just to identify rows in the table. In many cases, you can't
identify a row with any random column, since a column with duplicates will not be able to
identify a unique row.
Super Keys remove this ambiguity and make data retrieval easy.
In the student table, neither section nor roll number alone can identify a row, which makes
(section, roll number) a Candidate Key.
We can do the same with the key (first name, last name) for this table specifically, but it is
possible for 2 people to have the same first and last names, which is why it is avoided.
We can create other super keys using the candidate keys. For example, the key (section, roll
number, first name) is also a Super Key. It is not a candidate key, since it doesn't use the least
number of columns required to uniquely identify a row.
Conclusion
Super key in DBMS is the key that can uniquely identify any row in a table.
Candidate Keys are minimal super keys, i.e., super keys from which no column can be removed.
We can generate the set of all super keys using the candidate keys as a base.
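The relationship between super keys and candidate keys can be checked mechanically on sample
data. The sketch below uses made-up rows for the 5th-grade student table; the helper names
`is_super_key` and `candidate_keys` are hypothetical. Note that a key is really a property of
every possible instance of a table, so a check on one sample can only rule keys out, never
prove them.

```python
from itertools import combinations

# Hypothetical rows of the student table (section, roll number, name).
rows = [
    {"section": "5A", "roll": 12, "name": "Anu"},
    {"section": "5B", "roll": 12, "name": "Anu"},
    {"section": "5B", "roll": 13, "name": "Anu"},
    {"section": "5A", "roll": 13, "name": "Bala"},
]

def is_super_key(attrs, rows):
    """A set of attributes is a super key if no two rows agree on all of them."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Minimal super keys: super keys that contain no smaller super key."""
    attrs = sorted(rows[0])
    supers = [
        set(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
        if is_super_key(combo, rows)
    ]
    return [key for key in supers if not any(other < key for other in supers)]

print(candidate_keys(rows))
```

On this sample, (section, roll) and (section, roll, name) are both super keys, but only
(section, roll) is minimal, so it is the only candidate key.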
Unit-2
Relational Algebra
Basic Operations
Derived Operations
Applying these operations over relations/tables will give us new relations as output.
Basic Operations
Six fundamental operations are mentioned below. The majority of data retrieval
operations are carried out by these. Let's know them one by one.
But, before moving into detail, let's have two tables or we can say
relations STUDENT(ROLL, NAME, AGE) and EMPLOYEE(EMPLOYEE_NO, NAME,
AGE) which will be used in the below examples.
STUDENT
ROLL NAME AGE
1 Aman 20
2 Atul 18
3 Baljeet 19
4 Harsh 20
5 Prateek 21
6 Prateek 23
EMPLOYEE
EMPLOYEE_NO NAME AGE
E-1 Anant 20
E-2 Ashish 23
E-3 Baljeet 25
E-4 Harsh 20
E-5 Pranav 22
Select (σ)
σ AGE=20 (STUDENT)
Project (∏)
∏ NAME(STUDENT)
This will return the following output:
NAME
Aman
Atul
Baljeet
Harsh
Prateek
∏ ROLL,NAME(STUDENT)
ROLL NAME
1 Aman
2 Atul
3 Baljeet
4 Harsh
5 Prateek
6 Prateek
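Select and project can be sketched with plain Python sets of tuples, which also shows why
∏ NAME(STUDENT) lists Prateek only once: a relation is a set, so duplicate tuples collapse.
The helper names `select` and `project` are illustrative, not a standard API.

```python
# STUDENT(ROLL, NAME, AGE) as a set of tuples, using the rows of the STUDENT table.
STUDENT = {
    (1, "Aman", 20), (2, "Atul", 18), (3, "Baljeet", 19),
    (4, "Harsh", 20), (5, "Prateek", 21), (6, "Prateek", 23),
}
COLUMNS = ("ROLL", "NAME", "AGE")

def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, columns, wanted):
    """∏: keep only the wanted columns; the set drops duplicate tuples."""
    idx = [columns.index(c) for c in wanted]
    return {tuple(t[i] for i in idx) for t in relation}

age_20 = select(STUDENT, lambda t: t[2] == 20)   # σ AGE=20 (STUDENT)
names = project(STUDENT, COLUMNS, ["NAME"])      # ∏ NAME(STUDENT)
print(sorted(age_20), sorted(names))
```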
Union (∪)
Union is defined only for relations that have the same set of attributes (union-compatible
relations); the union of relations with different attributes is not a valid operation.
∏ NAME(STUDENT) ∪ ∏ NAME(EMPLOYEE)
NAME
Aman
Anant
Ashish
Atul
Baljeet
Harsh
Pranav
Prateek
Set Difference, as its name indicates, is the difference between two relations (R - S). It is
denoted by a minus sign (-) and it returns all the tuples (rows) which are in relation R but not
in relation S. It is also a binary operator.
Notation : R - S
Where R is the first relation
S is the second relation
Just like union, the set difference also requires both relations to have the same set of
attributes.
Let's take an example where we would like to know the names of students who are in
STUDENT Relation but not in EMPLOYEE Relation.
∏ NAME(STUDENT) - ∏ NAME(EMPLOYEE)
NAME
Aman
Atul
Prateek
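Because relations are sets, union and set difference map directly onto Python's set operators.
The sketch below uses the NAME projections of the STUDENT and EMPLOYEE tables shown earlier.

```python
student_names = {"Aman", "Atul", "Baljeet", "Harsh", "Prateek"}     # ∏ NAME(STUDENT)
employee_names = {"Anant", "Ashish", "Baljeet", "Harsh", "Pranav"}  # ∏ NAME(EMPLOYEE)

union = student_names | employee_names        # R ∪ S: names in either relation
difference = student_names - employee_names   # R - S: names only in STUDENT
print(sorted(union), sorted(difference))
```

Baljeet and Harsh appear in both relations, so they show up once in the union and not at all
in the difference.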
Cartesian product is denoted by the "X" symbol. Let's say we have two relations R and S.
Cartesian product will combine every tuple(row) from R with all the tuples from S. I know
it sounds complicated, but once we look at an example, you'll see what I mean.
Notation: R X S
Where R is the first relation
S is the second relation
STUDENT X EMPLOYEE
. . . And so on.
Rename (ρ)
Rename operation is denoted by "Rho" (ρ). As its name suggests, it is used to rename the
output relation. Unlike union or set difference, rename is a unary operator, since it acts on
a single relation.
Notation: ρ(R,S)
Where R is the new relation name
S is the old relation name
ρ(STUDENT_NAME,∏ NAME(STUDENT))
STUDENT_NAME
NAME
Aman
Atul
Baljeet
Harsh
Prateek
Derived Operations
Also known as extended operations, these operations can be derived from basic operations
and hence named Derived Operations. These include three operations: Join Operations,
Intersection operations, and Division operations.
Join Operations
Join Operations in DBMS are binary operations that allow us to combine two relations. More
than two relations can be combined by applying joins repeatedly.
They are further classified into two types: Inner Join, and Outer Join.
EMPLOYEE
E_NO E_NAME CITY EXPERIENCE
E-1 Ram Delhi 04
E-2 Varun Chandigarh 09
E-3 Ravi Noida 03
E-4 Amit Bangalore 07
DEPARTMENT
Also, let's have the Cartesian Product of the above two relations. It will be much easier to
understand Join Operations when we have the Cartesian Product.
Inner Join
When we perform Inner Join, only those tuples are returned that satisfy a certain condition.
It is also classified into three types: Theta Join, Equi Join and Natural Join.
Theta Join (θ)
Theta Join combines two relations using a condition. This condition is represented by the
symbol "theta"(θ). Here conditions can be inequality conditions such as >,<,>=,<=, etc.
Notation : R ⋈θ S
Where R is the first relation
S is the second relation
Check the Cartesian Product, if in any tuple/row EXPERIENCE >= MIN_EXPERIENCE then
insert this tuple/row in output relation.
Equi Join
Equi Join is a special case of theta join where the condition can only
contain equality (=) comparisons.
A non-equijoin is the inverse of an equi join, which occurs when you join on a condition
other than "=".
Let's have an example where we would like to join EMPLOYEE and DEPARTMENT relation
where E_NO from EMPLOYEE = E_NO from DEPARTMENT.
EMPLOYEE ⋈EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
Check Cartesian Product, if the tuple contains same E_NO, insert that tuple in the output
relation
Natural Join (⋈)
A natural join does not take an explicit comparison condition, and unlike a Cartesian product
it does not simply concatenate every pair of tuples. A Natural Join can be performed only if
two relations share at least one common attribute. Furthermore, the attributes must share the
same name and domain.
Natural join combines the tuples whose values for the common attributes are equal in both
relations and removes the duplicate columns.
Preferably Natural Join is performed on the foreign key.
Notation : R ⋈ S
Where R is the first relation
S is the second relation
Let's say we want to join EMPLOYEE and DEPARTMENT relation with E_NO as a common
attribute.
Notice, here E_NO has the same name in both the relations and also consists of the same
domain, i.e., in both relations E_NO is a string.
EMPLOYEE ⋈ DEPARTMENT
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
But unlike the above operation, where we have two columns of E_NO, here we are having
only one column of E_NO. This is because Natural Join automatically keeps a single
copy of a common attribute.
Outer Join
Unlike Inner Join which includes the tuple that satisfies the given condition, Outer Join
also includes some/all the tuples which don't satisfy the given condition. It is also of three
types: Left Outer Join, Right Outer Join, and Full Outer Join.
Left Outer Join returns the matching tuples (tuples present in both relations) and the tuples
which are only present in Left Relation, here R. If a tuple of R has no matching tuple in S,
then the attributes/columns of Right Relation, here S, are made NULL in the output relation.
Here we are combining EMPLOYEE and DEPARTMENT relation with the constraint that
EMPLOYEE's E_NO must be equal to DEPARTMENT's E_NO.
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
As you can see here, all the tuples from left, i.e., EMPLOYEE relation are present. But E-4 is
not satisfying the given condition, i.e., E_NO from EMPLOYEE must be equal to E_NO from
DEPARTMENT, still it is included in the output relation. This is because Outer Join also
includes some/all the tuples which don't satisfy the condition. That's why Outer Join
marked E-4's corresponding tuple/row from DEPARTMENT as NULL.
Right Outer Join returns the matching tuples and the tuples which are only present in
Right Relation, here S.
As with the Left Outer Join, if a tuple of S has no matching tuple in R, then the
attributes of Left Relation, here R, are made NULL in the output relation.
We will combine EMPLOYEE and DEPARTMENT relations with the same constraint as
above.
As all the tuples from DEPARTMENT relation have a corresponding E_NO in EMPLOYEE
relation, no EMPLOYEE attribute in the output is NULL.
Full Outer Join returns all the tuples from both relations. However, if there are no
matching tuples then, their respective attributes are made NULL in output relation.
Again, combine the EMPLOYEE and DEPARTMENT relation with the same constraint.
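The join variants above can be sketched as filters over the Cartesian product. The EMPLOYEE
rows below come from the EMPLOYEE table in these notes; the DEPARTMENT rows are assumed for
illustration only, chosen so that E-4 has no matching department, as the text describes. The
helper names are hypothetical.

```python
# EMPLOYEE rows from the notes: (E_NO, E_NAME, CITY, EXPERIENCE).
EMPLOYEE = [
    ("E-1", "Ram", "Delhi", 4), ("E-2", "Varun", "Chandigarh", 9),
    ("E-3", "Ravi", "Noida", 3), ("E-4", "Amit", "Bangalore", 7),
]
# DEPARTMENT rows are assumed: (D_NO, D_NAME, E_NO, MIN_EXPERIENCE).
DEPARTMENT = [
    ("D-1", "HR", "E-1", 3), ("D-2", "IT", "E-2", 5), ("D-3", "Marketing", "E-3", 5),
]

def theta_join(r, s, condition):
    """Keep the pairs of the Cartesian product R x S that satisfy the condition."""
    return [e + d for e in r for d in s if condition(e, d)]

def left_outer_join(r, s, condition, s_width):
    """Inner join plus the unmatched tuples of R, padded with None (NULL)."""
    out = []
    for e in r:
        matches = [e + d for d in s if condition(e, d)]
        out.extend(matches or [e + (None,) * s_width])
    return out

def same_e_no(e, d):
    """Equi join condition: EMPLOYEE.E_NO = DEPARTMENT.E_NO."""
    return e[0] == d[2]

equi = theta_join(EMPLOYEE, DEPARTMENT, same_e_no)
experienced = theta_join(EMPLOYEE, DEPARTMENT, lambda e, d: e[3] >= d[3])
left = left_outer_join(EMPLOYEE, DEPARTMENT, same_e_no, s_width=4)
print(len(equi), len(left), left[-1])
```

A right outer join is the same function with the roles of the two relations swapped, and a
full outer join is the combination of both outer joins.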
Intersection (∩)
∏ NAME(STUDENT) ∩ ∏ NAME(EMPLOYEE)
NAME
Baljeet
Harsh
Division (÷)
Let's have two relations, ENROLLED and COURSE. ENROLLED consists of two attributes,
STUDENT_ID and COURSE_ID. It denotes the map of students who are enrolled in given
courses.
ENROLLED
STUDENT_ID COURSE_ID
Student_1 DBMS
Student_2 DBMS
Student_1 OS
Student_3 OS
COURSE
COURSE_ID
DBMS
OS
Now the query is to return the STUDENT_ID of students who are enrolled in every course.
ENROLLED(STUDENT_ID, COURSE_ID)/COURSE(COURSE_ID)
STUDENT_ID
Student_1
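Division can be sketched as "keep the students who are paired with every course". The
function name `divide` is illustrative; the data is the ENROLLED and COURSE example above.

```python
ENROLLED = {
    ("Student_1", "DBMS"), ("Student_2", "DBMS"),
    ("Student_1", "OS"), ("Student_3", "OS"),
}
COURSE = {"DBMS", "OS"}

def divide(enrolled, courses):
    """Return the students that appear together with every course in `courses`."""
    students = {student for student, _ in enrolled}
    return {s for s in students if all((s, c) in enrolled for c in courses)}

result = divide(ENROLLED, COURSE)
print(result)
```

Student_2 is missing OS and Student_3 is missing DBMS, so only Student_1 survives the division.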
SQL Commands
o SQL commands are instructions used to communicate with the database. They are also
used to perform specific tasks, functions, and queries on data.
o SQL can perform various tasks like creating a table, adding data to tables, dropping the
table, modifying the table, and setting permissions for users.
o CREATE
o ALTER
o DROP
o TRUNCATE
Syntax:
Example:
b. DROP: It is used to delete both the structure and record stored in the table.
Syntax
Example
c. ALTER: It is used to alter the structure of the database. This change could be either to
modify the characteristics of an existing attribute or probably to add a new attribute.
Syntax:
EXAMPLE
d. TRUNCATE: It is used to delete all the rows from the table and free the space
containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
o INSERT
o UPDATE
o DELETE
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of
a table.
Syntax:
Or
For example:
b. UPDATE: This command is used to update or modify the value of a column in the table.
For example:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3'
Syntax:
For example:
o Grant
o Revoke
Example
Example
Transaction control commands apply to DML operations such as INSERT, UPDATE, and DELETE.
Operations that create or drop tables are automatically committed in the database, which is
why these commands cannot be used with them.
a. Commit: Commit command is used to save all the transactions to the database.
Syntax:
COMMIT;
Example:
b. Rollback: Rollback command is used to undo transactions that have not already been
saved to the database.
Syntax:
ROLLBACK;
Example:
c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling
back the entire transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
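A minimal sketch of COMMIT, ROLLBACK, and SAVEPOINT working together, assuming an in-memory
SQLite database driven from Python. The `customers` table and the savepoint name
`before_second` are made up for the example.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are issued by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO customers VALUES (1, 25)")
conn.execute("SAVEPOINT before_second")        # mark a point inside the transaction
conn.execute("INSERT INTO customers VALUES (2, 30)")
conn.execute("ROLLBACK TO before_second")      # undo only the second insert
conn.execute("COMMIT")                         # make the first insert permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM customers")
conn.execute("ROLLBACK")                       # undo the uncommitted delete

remaining = conn.execute("SELECT id, age FROM customers").fetchall()
print(remaining)
```

Only the first row remains: the second insert was rolled back to the savepoint, and the delete
was rolled back before it was ever committed.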
5. Data Query Language
DQL is used to fetch the data from the database.
o SELECT
a. SELECT: It combines the selection and projection operations of relational algebra. The
column list chooses the attributes to return, and the WHERE clause chooses the rows that
satisfy a condition.
Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
For example:
SELECT emp_name
FROM employee
WHERE age > 20;
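The command families described in this section can be run end to end in a short sketch. It
assumes an in-memory SQLite database; the `employee` columns beyond emp_name and age, and all
sample rows, are made up. SQLite has no TRUNCATE statement, so that command is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: define and alter a table.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, age INTEGER)")
conn.execute("ALTER TABLE employee ADD COLUMN city TEXT")

# DML: insert, update, and delete rows.
conn.executemany(
    "INSERT INTO employee (emp_id, emp_name, age) VALUES (?, ?, ?)",
    [(1, "Sonoo", 19), (2, "Ajeet", 24), (3, "Ravi", 31)],
)
conn.execute("UPDATE employee SET city = 'Noida' WHERE emp_id = 2")
conn.execute("DELETE FROM employee WHERE emp_id = 3")

# DQL: the SELECT list projects columns, the WHERE clause selects rows.
older = conn.execute("SELECT emp_name FROM employee WHERE age > 20").fetchall()
print(older)

# DDL again: remove the table together with its rows.
conn.execute("DROP TABLE employee")
```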
In Database Management Systems, integrity constraints are pre-defined set of rules that are
applied on the table fields(columns) or relations to ensure that the overall validity, integrity,
and consistency of the data present in the database table is maintained. Evaluation of all the
conditions or rules mentioned in the integrity constraint is done every time a table insert,
update, delete, or alter operation is performed. The data can be inserted, updated, deleted, or
altered only if the result of the constraint comes out to be True. Thus, integrity constraints are
useful in preventing any accidental damage to the database by an authorized user.
Domain integrity constraint contains a certain set of rules or conditions to restrict the kind
of attributes or values a column can hold in the database table. The data type of a domain
can be string, integer, character, DateTime, currency, etc.
Example:
Consider a Student's table having Roll No, Name, Age, Class of students.
Roll No Name Age Class
101 Adam 14 6
102 Steve 16 8
103 David 8 4
104 Bruce 18 12
105 Tim 6 A
In the above student's table, the value A in the last row last column violates the
domain integrity constraint because the Class attribute contains only integer
values while A is a character.
Entity Integrity Constraint
Entity Integrity Constraint is used to ensure that the primary key cannot be null. A primary
key is used to identify individual records in a table and if the primary key has a null value,
then we can't identify those records. There can be null values anywhere in the table
except the primary key column.
Example:
ID Name Salary
1101 Jackson 40000
1102 Harry 60000
1103 Steve 80000
1104 Ash 1800000
NULL James 36000
In the above employee's table, we can see that the ID column is the primary key
and contains a null value in the last row which violates the entity integrity
constraint.
Referential Integrity Constraint
Referential Integrity Constraint ensures that there must always exist a valid relationship
between two relational database tables. This valid relationship between the two tables
confirms that a foreign key exists in a table. It should always reference a corresponding
value or attribute in the other table or be null.
Example:
Consider an Employee and a Department table where Dept_ID acts as a foreign key
between the two tables
Employees Table
Department Table
Dept_ID Dept_Name
1 Sales
2 HR
3 Technical
In the above example, Dept_ID acts as a foreign key in the Employees table and
a primary key in the Department table. Row having Dept_ID = 4 violates the
referential integrity constraint since the value 4 does not appear in the primary
key column of the Department table.
Key constraint
Keys are the set of attributes that are used to identify an entity within its entity set uniquely.
There could be multiple keys in a single entity set, but out of these multiple keys, only one
key will be the primary key. A primary key can only contain unique and not null values in
the relational database table.
Example:
Consider a student's table
The last row of the student's table violates the key integrity constraint since Roll
No 102 is repeated twice in the primary key column. A primary key must be
unique and not null therefore duplicate values are not allowed in the Roll No
column of the above student's table.
Conclusion
Integrity Constraints in Database Management Systems are the set of pre-defined rules
responsible for maintaining the quality and consistency of data in the database.
Evaluation against the rules mentioned in the integrity constraint is done every time an
insert, update, delete, or alter operation is performed on the table.
Integrity Constraints in DBMS are of 4 types:
1. Domain Constraint
2. Entity Constraint
3. Referential Integrity Constraint
4. Key Constraint
COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;
SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;
The MIN() function returns the smallest value of the selected column, and the MAX()
function returns the largest value.
MIN()
SELECT MIN(column_name)
FROM table_name
WHERE condition;
MAX()
SELECT MAX(column_name)
FROM table_name
WHERE condition;
Count():
Count(*): Returns total number of records, i.e., 6.
Count(salary): Returns number of non-NULL values over the column salary, i.e., 5.
Count(Distinct salary): Returns number of distinct non-NULL values over the
column salary, i.e., 4.
Sum():
Sum(salary): Sum of all non-NULL values of column salary, i.e., 310.
Sum(Distinct salary): Sum of all distinct non-NULL values, i.e., 250.
Avg():
Avg(salary): Sum(salary) / Count(salary) = 310 / 5 = 62.
Min():
Min(salary): Minimum value in the salary column except NULL, i.e., 40.
Max():
Max(salary): Maximum value in the salary column, i.e., 80.
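The figures above can be reproduced with a small sketch. The table the notes refer to is not
shown, so the six rows below are an assumption: one set of salaries (with a single NULL and one
repeated value) that is consistent with the quoted counts and sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
# Assumed data: five salaries, one of them repeated, plus one NULL.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "A", 80), (2, "B", 40), (3, "C", 60), (4, "D", 70), (5, "E", 60), (6, "F", None)],
)

row = conn.execute("""
    SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary),
           SUM(salary), SUM(DISTINCT salary),
           AVG(salary), MIN(salary), MAX(salary)
    FROM employee
""").fetchone()
print(row)
```

Only COUNT(*) counts the row with the NULL salary; every other aggregate skips NULL values.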
SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based on a
related column between them.
10308 2 7 1996-09-18 3
10309 37 3 1996-09-19 1
10310 77 8 1996-09-20 2
LEFT JOIN
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
Tip: FULL OUTER JOIN and FULL JOIN are the same.
Note: FULL OUTER JOIN can potentially return very large result-sets!
Art_Students
Student_Number Student_Name
1 John
2 Mary
3 Damon
Dance_Students
Student_Number Student_Name
2 Mary
3 Damon
6 Matt
Union
Union combines two different results obtained by a query into a single result in the
form of a table. However, the results should be similar if union is to be applied on
them. Union removes all duplicates, if any from the data and only displays distinct
values. If duplicate values are required in the resultant data, then UNION ALL is used.
An example of union is −
This will display the names of all the students in the table Art_Students and
Dance_Students i.e John, Mary, Damon and Matt.
Intersection
The intersection operator gives the common data values between the two data sets
that are intersected. The two data sets that are intersected should be similar for the
intersection operator to work. Intersection also removes all duplicates before
displaying the result.
An example of intersection is −
This will display the names of the students in the table Art_Students and in the table
Dance_Students i.e all the students that have taken both art and dance classes
.Those are Mary and Damon in this example.
Set difference
The set difference operators takes the two sets and returns the values that are in the
first set but not the second set.
An example of set difference is −
This will display the names of all the students in table Art_Students but not in table
Dance_Students i.e the students who are taking art classes but not dance classes.
That is John in this example.
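The three set operations described above can be sketched against the Art_Students and
Dance_Students data, assuming an in-memory SQLite database. SQLite and standard SQL spell set
difference as EXCEPT (some systems, such as Oracle, use MINUS instead); the helper `names` is
a made-up convenience function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
tables = {
    "Art_Students": [(1, "John"), (2, "Mary"), (3, "Damon")],
    "Dance_Students": [(2, "Mary"), (3, "Damon"), (6, "Matt")],
}
for table, rows in tables.items():
    conn.execute(f"CREATE TABLE {table} (Student_Number INTEGER, Student_Name TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

def names(operator):
    """Combine the two name lists with UNION, INTERSECT, or EXCEPT."""
    sql = (
        f"SELECT Student_Name FROM Art_Students {operator} "
        "SELECT Student_Name FROM Dance_Students ORDER BY Student_Name"
    )
    return [name for (name,) in conn.execute(sql)]

union = names("UNION")
intersection = names("INTERSECT")
difference = names("EXCEPT")
print(union, intersection, difference)
```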
Views in SQL
o Views in SQL are considered as a virtual table. A view also contains rows and
columns.
o To create the view, we can select the fields from one or more tables present in
the database.
o A view can either have specific rows based on certain condition or all the rows
of a table.
Sample table:
Student_Detail
STU_ID NAME ADDRESS
1 Stephan Delhi
2 Kathrin Noida
3 David Ghaziabad
4 Alina Gurugram
Student_Marks
1 Stephan 97 19
2 Kathrin 86 21
3 David 74 18
4 Alina 90 20
5 John 96 18
1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view from
a single table or multiple tables.
Syntax:
Query:
Just like table query, we can query the view to view the data.
Output:
NAME ADDRESS
Stephan Delhi
Kathrin Noida
David Ghaziabad
In the given example, a view is created named MarksView from two tables
Student_Detail and Student_Marks.
Query:
Stephan Delhi 97
Kathrin Noida 86
David Ghaziabad 74
Alina Gurugram 90
4. Deleting View
A view can be deleted using the Drop View statement.
Syntax
Example:
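Creating, querying, and dropping a view can be sketched as follows, assuming an in-memory
SQLite database loaded with the Student_Detail rows above. The view name `DetailsView` and the
filter `STU_ID < 4` are assumptions, chosen so that the query returns the three-row
NAME/ADDRESS output shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Detail (STU_ID INTEGER, NAME TEXT, ADDRESS TEXT)")
conn.executemany(
    "INSERT INTO Student_Detail VALUES (?, ?, ?)",
    [(1, "Stephan", "Delhi"), (2, "Kathrin", "Noida"),
     (3, "David", "Ghaziabad"), (4, "Alina", "Gurugram")],
)

# Create the view: a stored query, not a copy of the data.
conn.execute("""
    CREATE VIEW DetailsView AS
    SELECT NAME, ADDRESS FROM Student_Detail WHERE STU_ID < 4
""")
# Query the view exactly like a table.
view_rows = conn.execute("SELECT * FROM DetailsView").fetchall()
print(view_rows)

# Delete the view; the underlying table is untouched.
conn.execute("DROP VIEW DetailsView")
table_rows = conn.execute("SELECT COUNT(*) FROM Student_Detail").fetchone()[0]
```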
A Subquery or Inner query or a Nested query is a query within another SQL query
and embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition
to further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements
along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow:
o Subqueries must be enclosed within parentheses.
o A subquery can have only one column in the SELECT clause, unless
multiple columns are in the main query for the subquery to compare its
selected columns.
o An ORDER BY clause cannot be used in a subquery, although the
main query can use ORDER BY. The GROUP BY clause can be
used to perform the same function as ORDER BY in a subquery.
o Subqueries that return more than one row can only be used with
multiple-value operators such as the IN operator.
o The SELECT list cannot include any references to values that evaluate
to a BLOB, ARRAY, CLOB, or NCLOB.
o A subquery cannot be immediately enclosed in a set function.
o The BETWEEN operator cannot be used with a subquery. However, the
BETWEEN operator can be used within the subquery.
Sample Table:
DATABASE
STUDENT
NAME ROLL_NO SECTION
Ravi 104 A
Sumathi 105 B
Raj 102 A
Sample Queries
To display the NAME, LOCATION, and PHONE_NUMBER of the students from the DATABASE table whose
section is A:
SELECT NAME, LOCATION, PHONE_NUMBER
FROM DATABASE
WHERE ROLL_NO IN
(SELECT ROLL_NO FROM STUDENT WHERE SECTION='A');
Explanation: The subquery "SELECT ROLL_NO FROM STUDENT WHERE SECTION='A'" executes first and
returns the ROLL_NO values from the STUDENT table whose SECTION is 'A'. The outer query then
returns the NAME, LOCATION, and PHONE_NUMBER from the DATABASE table for the students whose
ROLL_NO was returned by the subquery.
Output:
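As a minimal sketch, the query can be run with Python's built-in sqlite3 module to see which rows it returns. The STUDENT rows come from the sample table above; the rows of the DATABASE table are not shown in these notes, so its ROLL_NO column layout and its LOCATION and PHONE_NUMBER values are placeholders assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- DATABASE is quoted because it is also an SQL keyword.
    CREATE TABLE "DATABASE" (NAME TEXT, ROLL_NO INTEGER, LOCATION TEXT, PHONE_NUMBER TEXT);
    CREATE TABLE STUDENT (NAME TEXT, ROLL_NO INTEGER, SECTION TEXT);

    -- LOCATION and PHONE_NUMBER values are placeholders, not taken from the notes.
    INSERT INTO "DATABASE" VALUES
        ('Ravi', 104, 'City-1', '0000000001'),
        ('Sumathi', 105, 'City-2', '0000000002'),
        ('Raj', 102, 'City-3', '0000000003');
    INSERT INTO STUDENT VALUES
        ('Ravi', 104, 'A'), ('Sumathi', 105, 'B'), ('Raj', 102, 'A');
""")

# The inner query yields the roll numbers in section A (104 and 102);
# the outer query keeps only the DATABASE rows with those roll numbers.
rows = conn.execute("""
    SELECT NAME, LOCATION, PHONE_NUMBER
    FROM "DATABASE"
    WHERE ROLL_NO IN (SELECT ROLL_NO FROM STUDENT WHERE SECTION = 'A')
    ORDER BY NAME
""").fetchall()

names = [row[0] for row in rows]
print(names)
```

Sumathi is in section B, so her roll number is not returned by the subquery and her row is filtered out.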
Table2: Student2
Output:
To delete students from the Student2 table whose roll number matches that of a student in the
Student1 table whose location is Chennai:
DELETE FROM Student2
WHERE ROLLNO IN
(SELECT ROLLNO
FROM Student1
WHERE LOCATION='Chennai');
Output:
To update the name of the students to geeks in the Student2 table whose location is the same as
that of Raju or Ravi in the Student1 table:
UPDATE Student2
SET NAME='geeks'
WHERE LOCATION IN
(SELECT LOCATION
FROM Student1
WHERE NAME IN ('Raju', 'Ravi'));
Output:
1 row updated successfully.
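The DELETE and UPDATE subqueries can be sketched with Python's built-in sqlite3 module. The contents of Student1 and Student2 are not shown in these notes, so every row below, along with the column names NAME, ROLLNO, and LOCATION, is hypothetical data chosen so that each statement affects exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student1 (NAME TEXT, ROLLNO INTEGER, LOCATION TEXT);
    CREATE TABLE Student2 (NAME TEXT, ROLLNO INTEGER, LOCATION TEXT);
    -- All rows are hypothetical sample data.
    INSERT INTO Student1 VALUES
        ('Ram', 101, 'Chennai'), ('Raju', 102, 'Coimbatore'), ('Ravi', 103, 'Salem');
    INSERT INTO Student2 VALUES
        ('Ram', 101, 'Chennai'), ('Sai', 112, 'Mumbai'), ('Sri', 113, 'Coimbatore');
""")

# Delete Student2 rows whose roll number belongs to a Chennai student in Student1.
deleted = conn.execute("""
    DELETE FROM Student2
    WHERE ROLLNO IN (SELECT ROLLNO FROM Student1 WHERE LOCATION = 'Chennai')
""").rowcount

# Rename Student2 rows whose location matches that of Raju or Ravi in Student1.
updated = conn.execute("""
    UPDATE Student2
    SET NAME = 'geeks'
    WHERE LOCATION IN (SELECT LOCATION FROM Student1 WHERE NAME IN ('Raju', 'Ravi'))
""").rowcount

remaining = conn.execute(
    "SELECT NAME, ROLLNO, LOCATION FROM Student2 ORDER BY ROLLNO").fetchall()
print(deleted, updated, remaining)
```

In both statements the subquery is evaluated against Student1 only; the outer statement then changes Student2.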
Unit-3
Functional dependency
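A -> B
This means that the value of attribute A uniquely determines the value of attribute B.
For example, consider the following department table:
DeptId  DeptName
001     Finance
002     Marketing
003     HR
Here each DeptId corresponds to exactly one DeptName. Therefore, DeptName is
functionally dependent on DeptId:
DeptId -> DeptName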
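Whether a functional dependency is consistent with the rows of a table can be checked mechanically. The sketch below is a minimal illustration in Python; the function name fd_holds and the extra 'Sales' row are hypothetical. Keep in mind that a dependency is a rule about all valid data, so a table instance can disprove it but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


departments = [
    {"DeptId": "001", "DeptName": "Finance"},
    {"DeptId": "002", "DeptName": "Marketing"},
    {"DeptId": "003", "DeptName": "HR"},
]

# DeptId -> DeptName is consistent with the sample rows.
id_determines_name = fd_holds(departments, ["DeptId"], ["DeptName"])

# Hypothetical extra row: the same DeptId with a different name breaks the dependency.
broken = departments + [{"DeptId": "001", "DeptName": "Sales"}]
broken_holds = fd_holds(broken, ["DeptId"], ["DeptName"])

print(id_determines_name, broken_holds)
```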
Employee_Id Name
1 Zayn
2 Phobe
3 Hikki
4 David
1 Zayn 24
2 Phobe 34
3 Hikki 26
4 David 29
1 Zayn 24
2 Phobe 34
3 Hikki 26
4 David 29
4 Phobe 24
1 Zayn CD 11
2 Phobe AB 24
3 Hikki CD 11
4 David PQ 71
5 Phobe LM 21
William Armstrong suggested a few rules related to functional dependency in 1974. They are
called the RAT rules (Reflexivity, Augmentation, and Transitivity), also known as Armstrong's axioms.
Normalization
Why Do We Need Normalization?
1. Insertion anomalies: These occur when data cannot be inserted into the
database because some attributes are missing at the time of insertion.
2. Update anomalies: These occur when the same data item is repeated in several
places that are not linked to each other, so a change must be applied to every copy.
3. Deletion anomalies: These occur when deleting one part of the data also deletes
other necessary information from the database.
Normal Forms
2NF  A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary
key.
5NF  A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
EMPLOYEE table:
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
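The conversion to 1NF shown above amounts to splitting the multi-valued phone field into one row per value. The sketch below is a minimal illustration in Python; the field names id, name, phones, and state are assumptions, since the column headers of the EMPLOYEE table are not shown here.

```python
# Un-normalized row: the phone field holds two values separated by a comma.
employee = [
    {"id": 14, "name": "John", "phones": "7272826385, 9064738238", "state": "UP"},
]


def to_1nf(rows):
    """Emit one row per phone number so that every field holds an atomic value."""
    result = []
    for row in rows:
        for phone in row["phones"].split(","):
            result.append((row["id"], row["name"], phone.strip(), row["state"]))
    return result


employee_1nf = to_1nf(employee)
for record in employee_1nf:
    print(record)
```

The other attributes are repeated on every row, which is the redundancy that the later normal forms try to remove.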
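Example: Let's assume a school stores the data of teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38
In this table, the candidate key is {TEACHER_ID, SUBJECT}, but the non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is only a part of the key. This partial
dependency violates 2NF.
To convert the given table into 2NF, we decompose it into two tables: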
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
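The decomposition above can be reproduced by projecting the TEACHER rows onto the two attribute sets, and joining the projections back shows that no information is lost. This is a minimal sketch in Python using plain tuples; the variable names are chosen for the example.

```python
# (TEACHER_ID, SUBJECT, TEACHER_AGE) rows from the TEACHER table.
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Projection onto (TEACHER_ID, TEACHER_AGE); the set removes repeated pairs.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# Projection onto (TEACHER_ID, SUBJECT).
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Natural join on TEACHER_ID rebuilds the original rows.
rejoined = [
    (tid, subject, age)
    for tid, subject in teacher_subject
    for detail_id, age in teacher_detail
    if tid == detail_id
]

print(teacher_detail)
print(sorted(rejoined) == sorted(teacher))
```

Each teacher's age is now stored once in teacher_detail instead of once per subject.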
A relation is in third normal form if it holds at least one of the following conditions for
every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID, so these
non-prime attributes are transitively dependent on EMP_ID, which violates 3NF.
That's why we need to move EMP_CITY and EMP_STATE to the new
EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
Boyce Codd normal form (BCNF)
BCNF is an advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
In other words, the table should be in 3NF, and the LHS of every FD should be a super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
EMP_ID → EMP_COUNTRY
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
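The BCNF test can be sketched in Python by computing the closure of the left-hand side of each functional dependency and checking whether it covers the whole relation. This is a simplified illustration: it uses only the dependency EMP_ID → EMP_COUNTRY that appears in the text and a reduced three-attribute relation, and the function names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]


fds = [(frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"}))]

# Simplified EMPLOYEE relation: EMP_ID alone does not determine EMP_DEPT.
employee = frozenset({"EMP_ID", "EMP_COUNTRY", "EMP_DEPT"})
violations = bcnf_violations(employee, fds)

# After decomposition, EMP_ID is a key of the EMP_COUNTRY table.
emp_country = frozenset({"EMP_ID", "EMP_COUNTRY"})
violations_after = bcnf_violations(emp_country, fds)

print(violations)
print(violations_after)
```

The dependency violates BCNF in the combined relation because the closure of EMP_ID does not include EMP_DEPT, while in the smaller table the same dependency has a key on its left-hand side.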
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
EMP_ID → EMP_COUNTRY
Candidate keys:
For the first table: EMP_ID
Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
Multivalued Dependency
o A multivalued dependency occurs when two attributes in a table are independent
of each other but both depend on a third attribute.
o A multivalued dependency involves at least two attributes that are dependent
on a third attribute, which is why it always requires at least three attributes.
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
Example
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
attributes. Hence, there is no relationship between COURSE and HOBBY.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
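The effect of this decomposition can be sketched in Python by projecting the STUDENT rows onto the two smaller tables and joining them back. Because COURSE and HOBBY are treated as independent, the join pairs every course of a student with every hobby of that student, so it produces two combinations for student 21 that the original five rows did not list. The variable names are chosen for the example.

```python
# (STU_ID, COURSE, HOBBY) rows from the STUDENT table.
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Projections that form the two 4NF tables.
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Natural join on STU_ID: every course of a student with every hobby of that student.
rejoined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for hobby_sid, hobby in student_hobby
    if sid == hobby_sid
)

# Combinations implied by independence but absent from the single table.
extra = sorted(set(rejoined) - set(student))

print(len(rejoined))
print(extra)
```

Storing the two facts separately avoids having to keep every course and hobby combination in one table.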
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't
take the Math class for Semester 2. In this case, the combination of all three fields is required to
identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or
who will be taking it, so we would leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2, and
P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
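Why three relations are needed can be checked by joining them back. The sketch below is a minimal illustration in Python using the rows of P1, P2, and P3 as sets of tuples: joining only P1 and P2 produces spurious rows, and P3 is what filters them out. The variable names are chosen for the example.

```python
# P1: (SEMESTER, SUBJECT)
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}
# P2: (SUBJECT, LECTURER)
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}
# P3: (SEMESTER, LECTURER)
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}

# Join P1 and P2 on SUBJECT, giving (SUBJECT, LECTURER, SEMESTER) rows.
two_way = {
    (subject, lecturer, semester)
    for semester, subject in p1
    for p2_subject, lecturer in p2
    if subject == p2_subject
}

# Keep only the rows whose (SEMESTER, LECTURER) pair also appears in P3.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

# Rows produced by the two-way join that the three-way join rejects.
spurious = sorted(two_way - three_way)

print(len(two_way), len(three_way))
print(spurious)
```

The two-way join wrongly claims that Akash teaches Math in Semester 1 and that John teaches Math in Semester 2; only the join of all three relations gives back valid rows.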
Repetition of information
Example
Redundancy
-- Data for branch name, branch city, and assets are repeated for each loan that a branch makes.
Null Values
In the given example, the database design is faulty, which causes the above pitfalls in the database.
So we observe that if a relational database design is poor, the resulting database will
have these faults.