
DBMS Notes (BCS and Engineering)

The document discusses different types of data, records, and files including flat files, delimited files, fixed-length files, hierarchical files, and database files. It also covers file-based data management systems and their functionalities and advantages/disadvantages. The document then defines what a database management system is and its key features. It discusses relational and non-relational database management systems. Finally, it covers different database languages including data definition language, data manipulation language, data control language, and transactional control language.

Uploaded by

satyamrajput2486

DBMS

Unit 1
Introduction to database:
A collection of information that is organized for easier access, management,
and updating is known as a database.

Data can be defined as a collection of facts and records on which we can
apply reasoning, discussion, or calculation. Data is usually plentiful and
easily available, and it can be processed to extract useful information.
It can also be redundant or irrelevant. Data can exist in the form of
graphics, reports, tables, text, etc. A database is a systematically
organized, structured repository of indexed information that allows easy
retrieval, updating, analysis, and output of such data.

Containers holding a huge amount of data are known as databases, much as a
public library stores books. Databases are computer structures that save,
organize, protect, and deliver data.

Any system that manages databases is called a database management system,
or DBMS. The typical diagram representation for a database is a cylinder.

Inside a database, data is recorded in tables, each of which is a collection
of rows and columns, and it is indexed so that finding relevant information
becomes easier. As new information arrives, data is added, updated, and
deleted. Database processes create and update the data, query it, and run
applications against it.

Types of data, records and files:

The different types of data, records, and files are as follows:
By Structure:
• Flat files: These are the simplest type of file, where each record is a
  single line of text and there are no structural relationships between
  records. They are easy to create and read, but can be inflexible and
  difficult to query.
• Delimited files: Flat files in which the fields are separated by a specific
  character or sequence of characters, such as commas (CSV), tabs (TSV),
  pipes, or semicolons.
• Fixed-length files: Each record has a fixed length, regardless of the amount
  of data in each field. This can be more efficient for storage and
  processing, but it can be difficult to accommodate changes in the data
  structure.
• Hierarchical files: Records are organized in a tree-like structure, with
  parent records containing child records. This can be useful for representing
  complex relationships between data items.
• Database files: These files are used to store data in a relational database
  management system (RDBMS). They are typically organized into tables,
  which are collections of records with a fixed schema.
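
As a rough illustration of the difference between delimited and fixed-length
records, the Python sketch below parses the same record in both layouts. The
sample record, the function names, and the column widths (4, 10, 3) are
assumptions made up for this example, not part of any standard.

```python
import csv
import io

# Hypothetical sample data: one student record in two layouts.
DELIMITED = "101,Robert,85\n"  # fields separated by commas (CSV)
# Fixed-length layout: every field is padded to an assumed width.
FIXED = "101".ljust(4) + "Robert".ljust(10) + "85".ljust(3) + "\n"


def parse_delimited(text):
    """Split each line on the delimiter using the csv module."""
    return [row for row in csv.reader(io.StringIO(text))]


def parse_fixed(text, widths=(4, 10, 3)):
    """Slice each line at fixed column offsets and strip the padding."""
    records = []
    for line in text.splitlines():
        fields, start = [], 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        records.append(fields)
    return records


delimited_records = parse_delimited(DELIMITED)
fixed_records = parse_fixed(FIXED)
```

Both parsers recover the same three fields. The fixed-length parser must know
the widths in advance, which is why a change to the data structure is harder
to accommodate in that layout.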
By Content:
• Text files: These files contain human-readable text, such as documents,
  emails, or source code.
• Image files: These files store images, such as photographs, drawings, or
  diagrams.
• Audio files: These files store audio recordings, such as music, speeches, or
  sound effects.
• Video files: These files store video recordings, such as movies, TV shows, or
  video games.
• Binary files: These files contain data in a binary format, which is not
  human-readable. They are often used to store data that is not text-based,
  such as program executables or compressed files.
By Purpose:
• Transaction files: These files are used to record changes to data, such as
  sales transactions or financial records.
• Master files: These files contain relatively static data, such as customer
  records or product information.
• Log files: These files record events that occur in a system, such as system
  errors or user activity.
• Archive files: These files are used to store data that is no longer actively
  used, but needs to be kept for historical purposes.
File based system:
The systems that are used to organize and maintain data files are known as
file-based data systems. These file systems handle one or more files and are
not very efficient.

Functionalities
The functionalities of a File-based Data Management System are as follows:

• A file-based system helps in basic data management for any user.
• The data stored in the file-based system should remain consistent. Any
  transaction done in the file-based system should not alter the consistency
  property.
• The file-based system should not allow any illegal or potentially hazardous
  operations to occur on the data.
• The file-based system should allow concurrent access by different processes,
  and this should be carefully coordinated.
• The file-based system should make sure that the data is uniformly structured
  and stored so that it is easier to access.

Advantages of File Based System

• The file-based system is not complicated and is simple to use.
• Because of its simplicity, this system is quite inexpensive.
• Because the file-based system is simple and cheap, it is normally suitable
  for home users and owners of small businesses.
• Since the file-based system is used by smaller organisations or individual
  users, it stores a comparatively small amount of data. Hence, the data can
  be accessed faster and more easily.

Disadvantages of File Based System

• The file-based system is limited to a smaller size and cannot store large
  amounts of data.
• This system is relatively uncomplicated, but this means it cannot support
  complicated queries, data recovery, etc.
• There may be redundant data in the file-based system, as it does not have a
  mechanism to detect and remove it.
• The data is not very secure in a file-based system and may be corrupted or
  destroyed.
• The data files in the file-based system may be stored across multiple
  locations. Consequently, it is difficult to share the data with multiple
  users.

DBMS:
A Database Management System (DBMS) is a software system that is
designed to manage and organize data in a structured manner. It
allows users to create, modify, and query a database, as well as
manage the security and access controls for that database.

DBMS provides an environment to store and retrieve the data in a
convenient and efficient manner.

Key Features of DBMS

Data modeling: A DBMS provides tools for creating and modifying
data models, which define the structure and relationships of the data
in a database.
Data storage and retrieval: A DBMS is responsible for storing and
retrieving data from the database, and can provide various methods
for searching and querying the data.
Concurrency control: A DBMS provides mechanisms for controlling
concurrent access to the database, to ensure that multiple users can
access the data without conflicting with each other.
Data integrity and security: A DBMS provides tools for enforcing data
integrity and security constraints, such as constraints on the values of
data and access controls that restrict who can access the data.
Backup and recovery: A DBMS provides mechanisms for backing up
and recovering the data in the event of a system failure.
DBMS can be classified into two types: Relational Database
Management System (RDBMS) and Non-Relational Database
Management System (NoSQL or Non-SQL).
RDBMS: Data is organized in the form of tables, and each table has a
set of rows and columns. The data are related to each other through
primary and foreign keys.
NoSQL: Data is organized in the form of key-value pairs, documents,
graphs, or column families. These systems are designed to handle
large-scale, high-performance scenarios.
A database is a collection of interrelated data that supports the
efficient retrieval, insertion, and deletion of data and organizes it
in the form of tables, views, schemas, reports, etc. For example, a
university database organizes the data about students, faculty,
admin staff, etc.

Database Languages
1. Data Definition Language
2. Data Manipulation Language
3. Data Control Language
4. Transactional Control Language
Data Definition Language
DDL is the short name for Data Definition Language, which deals with
database schemas and descriptions of how the data should reside in
the database.

CREATE: create a database and its objects (tables, indexes, views,
stored procedures, functions, and triggers)
ALTER: alter the structure of an existing database object
DROP: delete objects from the database
TRUNCATE: remove all records from a table and release the space
allocated for those records
COMMENT: add comments to the data dictionary
RENAME: rename an object
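
The DDL commands above can be tried out with a small sketch. It uses Python's
built-in sqlite3 module with an in-memory database; the student table and its
columns are hypothetical names chosen for illustration. SQLite expresses
RENAME through ALTER TABLE and has no TRUNCATE or COMMENT statement, so those
two commands are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database


def table_names(connection):
    """List user tables by reading SQLite's catalog table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# CREATE: define a new table (hypothetical 'student' schema).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of the existing table by adding a column.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")

# RENAME: give the table a new name (done via ALTER TABLE in SQLite).
conn.execute("ALTER TABLE student RENAME TO students")
tables_after_rename = table_names(conn)

# Column names as recorded by SQLite after the ALTER statements.
columns = [row[1] for row in conn.execute("PRAGMA table_info(students)")]

# DROP: remove the table and its definition from the database.
conn.execute("DROP TABLE students")
tables_after_drop = table_names(conn)
```

Note that none of these statements touches rows of data; they only change the
definitions that describe how data is stored.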
Data Manipulation Language
DML is the short name for Data Manipulation Language, which deals
with data manipulation and includes the most common SQL statements
such as SELECT, INSERT, UPDATE, DELETE, etc. It is used to store,
modify, retrieve, and delete data in a database.

SELECT: retrieve data from a database
INSERT: insert data into a table
UPDATE: update existing data within a table
DELETE: delete records from a table (all records if no condition is given)
MERGE: UPSERT operation (insert or update)
CALL: call a PL/SQL or Java subprogram
EXPLAIN PLAN: show the data access path chosen for a statement
LOCK TABLE: concurrency control
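
A minimal sketch of the four core DML statements follows, again using
Python's sqlite3 module with an in-memory database. The student table and
its rows are illustrative assumptions. Note the WHERE clauses: without them,
UPDATE and DELETE would affect every row in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)

# INSERT: add new rows to the table.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Robert", 85), (2, "Parker", 75), (3, "Harry", 80)],
)

# UPDATE: change existing data; WHERE limits the change to one row.
conn.execute("UPDATE student SET marks = ? WHERE roll_no = ?", (90, 2))

# DELETE: remove only the rows that match the condition.
conn.execute("DELETE FROM student WHERE roll_no = ?", (3,))

# SELECT: retrieve the data that is left.
remaining = conn.execute(
    "SELECT roll_no, name, marks FROM student ORDER BY roll_no"
).fetchall()
```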
Data Control Language
DCL is short for Data Control Language, which acts as an access
specifier for the database (basically, it grants and revokes
permissions to users in the database).

GRANT: grant permissions to a user for running DML (SELECT,
INSERT, DELETE, …) commands on the table
REVOKE: revoke permissions from a user for running DML (SELECT,
INSERT, DELETE, …) commands on the specified table
Transactional Control Language
TCL is short for Transactional Control Language, which manages the
transactions made against the database. Some of the commands of
TCL are:

ROLLBACK: cancels or undoes the changes made in the current transaction
COMMIT: applies or saves the changes permanently in the database
SAVEPOINT: marks a point within a transaction to which the transaction
can later be rolled back
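
The sketch below walks through the three TCL commands using Python's sqlite3
module. The account table, its balance values, and the savepoint name are
assumptions made for illustration. The connection is opened with
isolation_level=None so that the module does not open transactions on its
own and the BEGIN, SAVEPOINT, ROLLBACK, and COMMIT statements run exactly as
written.

```python
import sqlite3

# Autocommit mode: transactions are controlled only by explicit statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")


def balance(connection):
    """Read the current balance of the single sample account."""
    row = connection.execute("SELECT balance FROM account WHERE id = 1").fetchone()
    return row[0]


# Transaction 1: keep the first update, undo the second via a savepoint.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 80 WHERE id = 1")
conn.execute("SAVEPOINT before_second_update")
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.execute("ROLLBACK TO SAVEPOINT before_second_update")  # undo second update only
conn.execute("COMMIT")  # make the first update permanent
after_commit = balance(conn)

# Transaction 2: a full ROLLBACK discards everything since BEGIN.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 5 WHERE id = 1")
conn.execute("ROLLBACK")
after_rollback = balance(conn)
```

After both transactions the balance is 80: the committed update survives,
while the changes that were rolled back leave no trace.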
Data retrieval language:
DRL is short for Data Retrieval Language, which is used for the retrieval
of data. Its only command, SELECT, is often classified under DML instead.

SELECT: used for extracting the required data.


Database Management System
The software which is used to manage databases is called a Database
Management System (DBMS). For example, MySQL, Oracle, etc. are
popular commercial DBMS used in different applications. A DBMS
allows users to perform the following tasks:

Data Definition: It helps in the creation, modification, and removal of
definitions that define the organization of data in the database.
Data Updation: It helps in the insertion, modification, and deletion of
the actual data in the database.
Data Retrieval: It helps in the retrieval of data from the database
which can be used by applications for various purposes.
User Administration: It helps in registering and monitoring users,
enforcing data security, monitoring performance, maintaining data
integrity, dealing with concurrency control, and recovering
information corrupted by unexpected failure.
Applications of DBMS:
Enterprise Information: Sales, accounting, human resources,
manufacturing, online retailers.
Banking and Finance Sector: Banks maintaining the customer details,
accounts, loans, banking transactions, credit card transactions.
Finance: Storing the information about sales and holdings,
purchasing of financial stocks and bonds.
University: Maintaining the information about student course
enrolment, student grades, staff roles.
Airlines: Reservations and schedules.
Telecommunications: Prepaid and postpaid bill maintenance.
Paradigm Shift from File System to DBMS
A file system manages data using files on a hard disk. Users are
allowed to create, delete, and update the files according to their
requirements. Let us consider the example of a file-based University
Management System. Data of students is available to their respective
Departments, Academics Section, Result Section, Accounts Section,
Hostel Office, etc. Some of the data is common to all sections, like
Roll No, Name, Father Name, Address, and Phone number of
students, but some data is available to a particular section only, like
the hostel allotment number, which belongs to the hostel office. Let us
discuss the issues with this system:

Redundancy of data: Data is said to be redundant if the same data is
copied at many places. If a student wants to change their phone
number, he or she has to get it updated in various sections. Similarly,
old records must be deleted from all sections representing that
student.
Inconsistency of Data: Data is said to be inconsistent if multiple
copies of the same data do not match each other. If the phone
number is different in the Accounts Section and the Academics Section,
it is inconsistent. Inconsistency may be caused by typing errors or by
not updating all copies of the same data.
Difficult Data Access: A user must know the exact location of the
file to access data, so the process is very cumbersome and tedious.
Consider how difficult it would be to find one student's hostel
allotment number among 10000 unsorted student records.
Unauthorized Access: File systems may lead to unauthorized access
to data. If a student gets access to a file containing his marks, he can
change it in an unauthorized way.
No Concurrent Access: The access of the same data by multiple users
at the same time is known as concurrency. A file system on its own does
not coordinate concurrent access, so the data can safely be accessed
by only one user at a time.
No Backup and Recovery: The file system does not incorporate any
backup and recovery of data if a file is lost or corrupted.
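
To make the redundancy and inconsistency problems concrete, the sketch below
shows how a relational database can avoid them: the phone number is stored
once, and other sections refer to the student by roll number. It uses
Python's sqlite3 module; the table names, column names, and sample values
are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Common student data lives in exactly one table.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
# The hostel office stores only its own data plus a reference to the student.
conn.execute(
    """CREATE TABLE hostel_allotment (
           allotment_no INTEGER PRIMARY KEY,
           roll_no INTEGER NOT NULL REFERENCES student(roll_no)
       )"""
)

conn.execute("INSERT INTO student VALUES (1, 'Robert', '555-0100')")
conn.execute("INSERT INTO hostel_allotment VALUES (42, 1)")

# The phone number is changed in one place only.
conn.execute("UPDATE student SET phone = '555-0199' WHERE roll_no = 1")

# The hostel office sees the new number through a join; there is no second copy.
hostel_view = conn.execute(
    """SELECT h.allotment_no, s.name, s.phone
       FROM hostel_allotment AS h
       JOIN student AS s ON s.roll_no = h.roll_no"""
).fetchall()
```

Because there is only one copy of the phone number, the sections cannot
disagree about it.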
Advantages of DBMS
Data organization: A DBMS allows for the organization and storage of
data in a structured manner, making it easy to retrieve and query the
data as needed.
Data integrity: A DBMS provides mechanisms for enforcing data
integrity constraints, such as constraints on the values of data and
access controls that restrict who can access the data.
Concurrent access: A DBMS provides mechanisms for controlling
concurrent access to the database, to ensure that multiple users can
access the data without conflicting with each other.
Data security: A DBMS provides tools for managing the security of
the data, such as controlling access to the data and encrypting
sensitive data.
Backup and recovery: A DBMS provides mechanisms for backing up
and recovering the data in the event of a system failure.
Data sharing: A DBMS allows multiple users to access and share the
same data, which can be useful in a collaborative work environment.
Disadvantages of DBMS
Complexity: A DBMS can be complex to set up and maintain, requiring
specialized knowledge and skills.
Performance overhead: The use of a DBMS can add overhead to the
performance of an application, especially in cases where high levels
of concurrency are required.
Scalability: The use of a DBMS can limit the scalability of an
application, since it requires the use of locking and other
synchronization mechanisms to ensure data consistency.
Cost: The cost of purchasing, maintaining, and upgrading a DBMS can
be high, especially for large or complex systems.
Limited Use Cases: Not all use cases are suitable for a DBMS; some
solutions don't need high reliability, consistency, or security and may
be better served by other types of data storage.
The issues with file systems listed earlier are the main reasons for
the shift from file system to DBMS.
A Database Management System (DBMS) is a software system that
allows users to create, maintain, and manage databases. It is a
collection of programs that enables users to access and manipulate
data in a database. A DBMS is used to store, retrieve, and manipulate
data in a way that provides security, privacy, and reliability.

Several Types of DBMS

Relational DBMS (RDBMS): An RDBMS stores data in tables with
rows and columns, and uses SQL (Structured Query Language) to
manipulate the data.
Object-Oriented DBMS (OODBMS): An OODBMS stores data as
objects, which can be manipulated using object-oriented
programming languages.
NoSQL DBMS: A NoSQL DBMS stores data in non-relational data
structures, such as key-value pairs, document-based models, or graph
models.
Overall, a DBMS is a powerful tool for managing and manipulating
data, and is used in many industries and applications, such as
finance, healthcare, retail, and more.

Levels of Data Abstractions in DBMS

In DBMS, there are three levels of data abstraction, which are as follows:
1. Physical or Internal Level:
The physical or internal level is the lowest level of data abstraction in the database
management system. It is the level that defines how data is actually stored in the
database and the methods used to access it. It describes complex data structures in
detail, so it is difficult to understand, which is why it is kept hidden from the end
user.

Database Administrators (DBAs) decide how to arrange the data and where to store it. The
DBA is the person whose role is to manage the data in the database at the physical or
internal level. At this level, the raw data is stored in detail on storage devices such
as the hard drives of a data center.

2. Logical or Conceptual Level:

The logical or conceptual level is the intermediate level of data abstraction. It
explains what data is going to be stored in the database and what the relationships
between the data items are.

It describes the structure of the entire data in the form of tables. The logical or
conceptual level is less complex than the physical level. With the help of the logical
level, Database Administrators (DBAs) abstract data from the raw data present at the
physical level.

3. View or External Level:

The view or external level is the highest level of data abstraction. There are different
views at this level, each of which defines a part of the overall data of the database.
This level is for end-user interaction; at this level, end users can access the data
based on their queries.
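
A small sketch of the view level, using Python's sqlite3 module: the logical
level holds a full employee table, while a view exposes only part of it to a
particular group of users. The table, the view name employee_directory, and
the sample row are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the complete table, including a sensitive column.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000)")

# View level: a view that exposes only the non-sensitive columns.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

Users who query employee_directory never see the salary column, and they do
not need to know how or where the underlying table is stored.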

Advantages of data abstraction in DBMS


o Users can easily access the data based on their queries.
o It provides security to the data stored in the database.
o Database systems work efficiently because of data abstraction.

Instances & Schema

Instances:
An instance is the state of an operational database, with its data, at a given time. It
is a snapshot of the database. An instance can be changed by CRUD operations such as the
addition and deletion of data. Note that a search query does not make any change to the
instance.

Example:
Suppose a database named School has a table teacher with 50 records, so the instance of
the database has 50 records for now. If we add another fifty records tomorrow, then
tomorrow's instance will have a total of 100 records. The state of the database at each
of these moments is called an instance.

Schema
Schema is the overall description of the database. The basic structure of how the data will be
stored in the database is called schema.

Schema is of three types: Logical Schema, Physical Schema and view Schema.

Logical Schema – It describes the database designed at a logical level.

Physical Schema – It describes the database designed at the physical level.

View Schema – It defines the design of the database at the view level.
Example:
Suppose a database named School has a table teacher that requires the name, dob, and doj
of each teacher, so we design its structure as:

Teacher table
name: String
doj: date
dob: date

Above given is the schema of the table teacher.
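
The teacher example can be sketched in Python with the sqlite3 module to show
that the schema stays fixed while the instance changes. The placeholder names
and dates inserted below are made-up values; only the column names come from
the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (name TEXT, doj DATE, dob DATE)")


def schema(connection):
    """Return (column name, declared type) pairs for the teacher table."""
    return [(row[1], row[2]) for row in connection.execute("PRAGMA table_info(teacher)")]


def instance_size(connection):
    """Return the number of records currently stored in the teacher table."""
    return connection.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]


def add_teachers(connection, start, stop):
    """Insert placeholder teacher records numbered start..stop-1."""
    connection.executemany(
        "INSERT INTO teacher VALUES (?, ?, ?)",
        [(f"Teacher {i}", "2020-01-01", "1980-01-01") for i in range(start, stop)],
    )


schema_before = schema(conn)
add_teachers(conn, 0, 50)
size_today = instance_size(conn)      # the instance "for now"
add_teachers(conn, 50, 100)
size_tomorrow = instance_size(conn)   # the instance after fifty more records
schema_after = schema(conn)
```

The record count moves from 50 to 100, while the list of columns and types is
identical before and after the insertions.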

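Difference Between Schema and Instance

Schema                                   | Instance
-----------------------------------------+-----------------------------------------
It is the overall description of the     | It is the collection of information
database.                                | stored in a database at a particular
                                         | moment.
-----------------------------------------+-----------------------------------------
The schema is the same for the whole     | Data in instances can be changed using
database.                                | addition, deletion, and updation.
-----------------------------------------+-----------------------------------------
Does not change frequently.              | Changes frequently.
-----------------------------------------+-----------------------------------------
Defines the basic structure of the       | It is the set of information stored at
database, i.e. how the data will be      | a particular time.
stored in the database.                  |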

Conclusion

In short, the schema is the blueprint of the database, while the instance is the actual data
that is in the database. The schema is the database’s design, and the instance is the data it
contains.

Database users:
Database users are categorized based on their interaction with the database. There are
eight types of database users in DBMS.

1. Database Administrator (DBA) : The Database Administrator (DBA) is a person or team
who defines the schema and also controls the 3 levels of the database. The DBA creates
a new account id and password for a user who needs to access the database. The DBA is
also responsible for providing security to the database and allows only authorized
users to access or modify it. The DBA is responsible for problems such as security
breaches and poor system response time.

• The DBA also monitors recovery and backup and provides technical support.

• The DBA has a DBA account in the DBMS, which is called a system or superuser
  account.

• The DBA repairs damage caused by hardware and/or software failures.

• The DBA is the one having privileges to perform DCL (Data Control Language)
  operations such as GRANT and REVOKE, to allow or restrict a particular user
  from accessing the database.

2. Naive / Parametric End Users : Parametric end users are unsophisticated users who
don't have any DBMS knowledge but frequently use database applications in their daily
life to get the desired results. For example, railway ticket booking users are naive
users. A clerk in a bank is a naive user because he or she doesn't have any DBMS
knowledge but still uses the database to perform the given task.

3. System Analyst :
A system analyst is a user who analyzes the requirements of parametric end users and
checks whether all the requirements of end users are satisfied.

4. Sophisticated Users : Sophisticated users can be engineers, scientists, or business
analysts who are familiar with the database. They can develop their own database
applications according to their requirements. They don't write program code, but
they interact with the database by writing SQL queries directly through the query
processor.

5. Database Designers : Database designers are the users who design the structure of
the database, which includes tables, indexes, views, triggers, stored procedures, and
constraints, and which is usually fixed before the database is created or populated
with data. The designer controls what data must be stored and how the data items are
related. It is the responsibility of database designers to understand the requirements
of different user groups and then create a design that satisfies the needs of all the
user groups.

6. Application Programmers : Application programmers, also referred to as system
analysts or simply software engineers, are the back-end programmers who write the
code for the application programs. They are computer professionals. These
programs could be written in programming languages such as Visual Basic,
Developer, C, FORTRAN, COBOL, etc. Application programmers design, debug, test,
and maintain sets of programs called "canned transactions" that naive (parametric)
users use to interact with the database.

7. Casual Users / Temporary Users : Casual users are the users who access the database
only occasionally, but who require new information each time they access it, for
example, middle or higher level managers.

8. Specialized users : Specialized users are sophisticated users who write
specialized database applications that do not fit into the traditional
data-processing framework. Among these applications are computer-aided design
systems, knowledge-base and expert systems, etc.

DBMS structure:
Database Management System (DBMS) is software that allows access to data stored in a
database and provides an easy and effective method of:

• Defining the information.

• Storing the information.

• Manipulating the information.

• Protecting the information from system crashes or data theft.

• Differentiating access permissions for different users.

Data Theft: When somebody steals the information stored in databases and on servers,
this is known as data theft.

Note: The structure of a Database Management System is also referred to as the Overall
System Structure or Database Architecture, but it is different from the tier
architecture of a database.

The database system is divided into three components: Query Processor, Storage Manager,
and Disk Storage. These are explained below.

Architecture of DBMS

1. Query Processor: It interprets the requests (queries) received from the end user via
an application program into instructions. It also executes the user request that is
received from the DML compiler.
Query Processor contains the following components:

• DML Compiler: It processes the DML statements into low-level instructions (machine
  language), so that they can be executed.

• DDL Interpreter: It processes the DDL statements into a set of tables containing
  metadata (data about data).

• Embedded DML Pre-compiler: It processes DML statements embedded in an
  application program into procedural calls.

• Query Optimizer: It chooses an efficient way to execute the instructions generated
  by the DML Compiler.

2. Storage Manager: Storage Manager is a program that provides an interface between the
data stored in the database and the queries received. It is also known as Database Control
System. It maintains the consistency and integrity of the database by applying the
constraints and executing the DCL statements. It is responsible for updating, storing,
deleting, and retrieving data in the database.
It contains the following components:

• Authorization Manager: It ensures role-based access control, i.e., it checks whether
  a particular person is privileged to perform the requested operation or not.

• Integrity Manager: It checks the integrity constraints when the database is
  modified.

• Transaction Manager: It controls concurrent access by scheduling the operations of
  the transactions it receives. Thus, it ensures that the database remains in a
  consistent state before and after the execution of a transaction.

• File Manager: It manages the file space and the data structures used to represent
  information in the database.

• Buffer Manager: It is responsible for the buffer (cache) memory and the transfer of
  data between secondary storage and main memory.

3. Disk Storage: It contains the following components:

• Data Files: They store the data.

• Data Dictionary: It contains the information about the structure of every database
  object. It is the repository of metadata.

• Indices: They provide faster retrieval of data items.
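
As one concrete illustration of a data dictionary and an index, the sketch
below uses SQLite through Python's sqlite3 module. SQLite keeps its object
definitions in a catalog table named sqlite_master, and EXPLAIN QUERY PLAN
reports the access path the query optimizer picked. The student table and the
index name idx_student_name are hypothetical, and the exact wording of the
plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# Data dictionary: every table and index is described in sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Ask the optimizer how it would run a lookup by name.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT roll_no FROM student WHERE name = 'Harry'"
).fetchall()
# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
uses_index = "idx_student_name" in plan_text
```

The catalog lists both the table and the index, and the plan description
names the index, which indicates that the lookup does not scan the whole
table.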

From the point of view of data abstraction, the structure of a Database Management
System (DBMS) can also be described in terms of three levels: the Internal Level, the
Conceptual Level, and the External Level.

1. Internal Level: This level represents the physical storage of data in the database. It is
responsible for storing and retrieving data from the storage devices, such as hard
drives or solid-state drives. It deals with low-level implementation details such as data
compression, indexing, and storage allocation.

2. Conceptual Level: This level represents the logical view of the database. It deals with
the overall organization of data in the database and the relationships between the data
items. It defines the data schema, which includes tables, attributes, and their
relationships. The conceptual level is independent of any specific DBMS and can be
implemented using different DBMSs.

3. External Level: This level represents the user's view of the database. It deals with how
users access the data in the database. It allows users to view data in a way that makes
sense to them, without worrying about the underlying implementation details. The
external level provides a set of views or interfaces to the database, which are tailored
to meet the needs of specific user groups.

The three levels are connected through a schema mapping process that translates data from
one level to another. The schema mapping process ensures that changes made at one level
are reflected in the other levels.

In addition to these three levels, a DBMS also involves a Database Administrator (DBA),
who is responsible for managing the database system. The DBA is responsible
for tasks such as database design, security management, backup and recovery, and
performance tuning.

Overall, the structure of a DBMS is designed to provide a high level of abstraction to users,
while still allowing low-level implementation details to be managed effectively. This allows
users to focus on the logical organization of data in the database, without worrying about
the physical storage or implementation details.
Unit 2
Entity in DBMS
A Database Management System (DBMS) is an essential tool for managing data, and
entities play a central part in it.

The role of an entity is the representation and management of data. This section
discusses entities in DBMS.

Entity:
An entity is an object or thing that exists in the real world. For example,
customer, car, pen, etc.

Entities are stored in the database, and they should be distinguishable, i.e., each one
should be easily identifiable within its group. For example, pens from the same company
that look identical cannot be told apart, so they are only objects; pens with different
colours can be identified individually, and each is called an entity, like a red pen,
green pen, blue pen, black pen, etc.

To extract data from the database, each item of data must be unique in some way so
that it is easy to differentiate it from the others. Distinct and unique data is
known as an entity.
An entity has some attributes which depict the entity's characteristics. For example, an
entity "Student" has attributes such as "Student_rollno", "Student_name",
"Student_subject", and "Student_marks".

Example of Entity in DBMS in tabular form:

Student_rollno | Student_name | Student_subject | Student_marks
---------------+--------------+-----------------+--------------
1              | Robert       | English         | 85
2              | Parker       | Mathematics     | 75
3              | Harry        | Science         | 80
4              | George       | Geography       | 70
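
In program code, an entity and its attributes are commonly modelled as a
record type. The Python sketch below is one possible representation of the
rows of the Student table; the class name, the shortened field names, and the
choice of the roll number as the distinguishing attribute are assumptions
made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """One Student entity; each field corresponds to an attribute."""
    rollno: int
    name: str
    subject: str
    marks: int


# The four entities from the table.
students = [
    Student(1, "Robert", "English", 85),
    Student(2, "Parker", "Mathematics", 75),
    Student(3, "Harry", "Science", 80),
    Student(4, "George", "Geography", 70),
]

# The roll number distinguishes one entity from another, so it can serve
# as a lookup key.
by_rollno = {student.rollno: student for student in students}
```

Looking up by_rollno[3] returns the entity for Harry, which mirrors the idea
that every entity must be identifiable within its group.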

Some entities are related to other entities. For example, the "Student"
entity is related to the "University" entity. An ERD (Entity Relationship Diagram)
is used to show the relationships between several entities visually.

Kinds of Entity:
There are two kinds of entities, which are as follows:

1. Tangible Entity:

It is an entity in DBMS, which is a physical object that we can touch or see. In simple
words, an entity that has a physical existence in the real world is called a tangible entity.

For example, a table in a database can represent a tangible entity when it describes
physical objects that we can see and touch in the real world. Examples include colleges,
bank lockers, mobiles, cars, watches, pens, paintings, etc.

2. Intangible Entity:

It is an entity in DBMS, which is a non-physical object that we cannot see or touch. In
simple words, an entity that does not have any physical existence in the real world is
known as an intangible entity.

For example, a bank account logically exists, but we cannot see or touch it.

Entity Type:
A collection of entities with common characteristics is known as an entity type.

For example, a database of a corporate company has entity types such as employees,
departments, etc. In DBMS, every entity type contains a set of attributes that explain
the entity.

The Employee entity type can have attributes such as name, age, address, phone
number, and salary.

The Department entity type can have attributes such as name, number, and location.

Kinds of Entity Type


There are two kinds of entity type, which are as follows:

1. Strong Entity Type: It is an entity that has its own existence and is independent.

The entity relationship diagram represents a strong entity type with the help of a single
rectangle. Below is the ERD of the strong entity type:
In the above example, the "Customer" is the entity type with attributes such as ID,
Name, Gender, and Phone Number. Customer is a strong entity type as it has a unique
ID for each customer.

2. Weak Entity Type: It is an entity that does not have its own existence and relies on
a strong entity for its existence.

The Entity Relationship Diagram represents the weak entity type using double
rectangles. Below is the ERD of the weak entity type:

In the above example, "Address" is a weak entity type with attributes such as House
No., City, Location, and State.

The relationship between a strong and a weak entity type is known as an identifying
relationship.

Using a double diamond, the Entity-Relationship Diagram represents a relationship
between the strong and the weak entity type.
Let us see an example of the relationship between the Strong entity type and weak
entity type with the help of ER Diagram:

Entity Set
An entity set is a group of entities of the same entity type.

For example, an entity set of students, an entity set of motorbikes, an entity set of
smartphones, an entity set of customers, etc.

Entity sets can be classified into two types:

1. Strong Entity Set:

In a DBMS, a strong entity set has a primary key.

For example, an entity set of motorbikes with the attributes motorbike's registration
number, motorbike's name, motorbike's model, and motorbike's colour. The registration
number can serve as the primary key.


Below is the representation of a strong entity set in tabular form:



Example of Entity Relationship Diagram representation of the above strong entity set:
2. Weak Entity Set:

In a DBMS, a weak entity set does not have a primary key of its own.

For example, an entity set of smartphones with the attributes phone's name, phone's
colour, and phone's RAM.

Below is the representation of a weak entity set in tabular form:

Example of Entity Relationship Diagram representation of the above weak entity set:
Conclusion:
In this article, you read all the vital things related to entities in DBMS.

o We have discussed that entity is anything that exists in the real world and is identifiable.
o We have discussed the types of entities, which are tangible entities and intangible
entities.
o We have discussed entity type and types of entity type, which are weak entity type and
strong entity type.
o We have discussed entity sets and types of entity sets, which are weak entity sets and
strong entity sets.

attribute and data association relation between entities

Importance of DM

Data modeling involves creating a visual representation of the different types of
data an organization collects and the relationship between those data objects.

A data modeling process can help you achieve several goals.

First, you can identify missing and redundant data. As a result, you can improve
efficiency by making the necessary adjustments to your data.

Second, data modeling allows you to define how a database should be
structured, including the conceptual, logical, and physical layers.

Third, data modeling eliminates confusion by providing a clear picture of
organizational data needs. Developers or data experts will know how to get
started, which saves significant time and effort.

This article will help you understand the process of data modeling and its
importance and provide different data modeling techniques and examples.

The importance of data modeling


Data modeling has numerous benefits, including the following:

 Accurate data representation. Data modeling allows developers, system
designers, database administrators, and other professionals to have a clear
picture of the business needs. Conceptual and logical data models use
flowcharts to show the required data objects and their connection to business
processes. These data models also determine the actual values that the
database will store. For instance, a customer data model can have strings,
integers, and Boolean values. Therefore, you can say that a data model acts
as a roadmap that enables developers to accurately represent all the required
data objects in a database.
 Relationship between database objects. Flowcharts in the conceptual and
logical data models allow you to understand the relationship between different
data objects. These relationships simplify complex business processes and
eliminate confusion during development.
 Control of data redundancy. In this context, redundancy involves having
several copies of the same data in a database. Apart from potentially affecting
data quality, duplicate data also consumes extra storage space. Data
modeling enables you to perform tests and set rules on how to deal with data
redundancies. For example, you can remove a particular table in your
database if there’s another with similar data.
 Security. With data modeling, you can identify security issues likely to affect
the application before you push it to production. Resolving these issues
allows you to end up with a more secure application. Modeling can also assist
you in fulfilling different data integrity requirements.
 Faster software builds. Conceptual, logical, and physical data models guide
developers on what needs to be implemented at different application stages.
As a result, these models eliminate confusion and allow developers to
dedicate more time and focus on the actual code implementation, leading to
faster software builds.
 Reduced cost. Data modeling can help you save significant time and costs.
For example, the planning or conceptual stage allows you to discover
problems that could have otherwise affected your system during production.
Identifying and handling these issues at an early stage is less frustrating and
less cost-intensive than discovering them farther into the process. You can
also use data modeling tools, such as Erwin, to scan for errors and automate
different data processes and structures, which improves data quality.
 Improved performance. Data modeling can make your application run more
efficiently. Data models serve as a high-level plan, enabling developers to
know which type of data the application will need and how to consume or
store it. Such requirements can help meet the needs of data-intensive fields
like machine learning, business intelligence, artificial intelligence, and big
data.
 Better documentation. Data modeling allows developers to document their
data structures, relationships, and other important business requirements.
Developers and other stakeholders can use these documents for future
reference.
 High-quality applications. A huge advantage of data modeling is that it
facilitates the creation of high-quality applications, which are easy to
maintain. Such applications follow a detailed plan and structure, making them
less likely to crash. Such apps are also highly scalable.
 Quality decision-making. A data model helps break down complex data
architecture into simple terms that are more understandable to business
stakeholders outside the tech bubble. Business analysts can use this
information to ensure that all data requirements are fulfilled.
 Data analytics and visualization. Data modeling can help you transform
large amounts of data into valuable information for decision-making, business
intelligence, and other uses.

The evolution of data model:

Managing data has always been essential. Data models
originated to solve the problems of file-based systems. Here are the data models in DBMS:

Hierarchical Model
In the Hierarchical Model, records are organized into a tree-like structure.

Each relationship is defined as a parent/child relationship.

One of the first and most popular hierarchical database systems is the Information
Management System (IMS), developed by IBM.

Example

The hierarchy shows that an Employee can be an Intern, on Contract, or Full-Time.
Sub-levels show that a Full-Time Employee can be hired as a Writer, Senior
Writer, or Editor:

Advantages

 The design of the hierarchical model is simple.


 Provides data integrity since it is based on the parent/child relationship.
 Data sharing is feasible since the data is stored in a single database.
 The model works efficiently even for large volumes of data.

Disadvantages

 Implementation is complex.
 This model has to deal with anomalies like Insert, Update and Delete.
 Maintenance is difficult since a change in the database may require
changes to the entire database structure.

Network Model
The Hierarchical Model creates a hierarchical tree with parent/child relationships,
whereas the Network Model organizes records as a graph connected by links.

Relationships are defined in the form of links, and the model handles
many-to-many relations. This means that a record can have more than one parent.

Example

Advantages

 Easy to design the Network Model


 The model can handle one-to-one, one-to-many, and many-to-many relationships.
 It isolates programs from physical storage details.
 Based on standards and conventions.

Disadvantages

 Pointers bring complexity since the records are based on pointers and graphs.
 Changes to the database structure are not easy, which makes it hard to achieve
structural independence.

Relational Model
A relational model groups data into one or more tables. These tables are
related to each other through common columns (attributes).

The data is represented in the form of rows and columns i.e. tables:

Example

Let us see an example of two relations, <Employee> and <Department>, linked to each
other through DepartmentID, which is a Foreign Key of the <Employee> table and the
Primary Key of the <Department> table.
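
The link between the two relations can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not a definitive implementation: the column names other than DepartmentID and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Department (
        DepartmentID   INTEGER PRIMARY KEY,
        DepartmentName TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE Employee (
        EmployeeID   INTEGER PRIMARY KEY,
        EmployeeName TEXT NOT NULL,
        DepartmentID INTEGER REFERENCES Department(DepartmentID)
    )""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Department VALUES (10, 'Finance')")
conn.execute("INSERT INTO Employee VALUES (1, 'Amit', 10)")

# Join the two relations through the shared DepartmentID column.
rows = conn.execute("""
    SELECT e.EmployeeName, d.DepartmentName
    FROM Employee e JOIN Department d ON e.DepartmentID = d.DepartmentID
""").fetchall()

# An employee that points at a non-existent department is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'Sara', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The foreign key is what keeps every Employee row tied to an existing Department row.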

Advantages

 A well-designed Relational Model avoids the issues that we saw in the previous two
models, i.e. the update, insert, and delete anomalies.
 Changes in the database do not require you to affect the complete database.
 Implementation of a Relational Model is easy.
 Maintaining a Relational Model is not a tiresome task.

Disadvantages
 Inefficiencies can stay hidden and surface only when the database holds large volumes of data.
 The overheads of the relational data model come at the cost of more powerful
hardware and devices.

Types of data models, Advantages, Disadvantages:


A Data Model in a Database Management System (DBMS) is a
collection of conceptual tools used to describe the structure of
the database. Data models give us a clear picture of the data,
which helps us in creating an actual database. They take us
from the design of the data to its proper implementation.
Types of Data Models
Data models are basically classified into 3 types:
1. Conceptual Data Model
2. Representational Data Model
3. Physical Data Model

1. Conceptual Data Model


The conceptual data model describes the database at a very
high level and is useful to understand the needs or
requirements of the database. This model is used in
the requirement-gathering process, i.e. before the database
designers start building a particular database. One popular
conceptual model is the entity/relationship model (ER model). The ER
model focuses on the entities, relationships, and attributes
that database designers work with. Because it is expressed in
these terms, it can be discussed even with non-technical
users and stakeholders, so that their requirements can
be understood.
Entity-Relationship Model (ER Model): It is a high-level data
model which is used to define the data and the relationships
between them. It is basically a conceptual design of a
database that makes the data easy to view and understand.
Components of ER Model:
1. Entity: An entity is a real-world object. It can
be a person, place, object, class, etc. Entities are represented
by a rectangle in an ER Diagram.
2. Attributes: An attribute is a property that describes
the entity, such as Age, Roll Number, or Marks for a
Student. Attributes are represented by an ellipse in an ER
Diagram.
3. Relationship: Relationships are used to define relations
among different entities. A diamond (rhombus) is
used to show a relationship.
Characteristics of a conceptual data model
 Offers Organization-wide coverage of the business
concepts.
 This type of data model is designed and developed for
a business audience.
 The conceptual model is developed independently of
hardware specifications like data storage capacity, location
or software specifications like DBMS vendor and
technology. The focus is to represent data as a user will
see it in the “real world.”
Conceptual data models, also known as domain models, create a
common vocabulary for all stakeholders by establishing basic
concepts and scope.
2. Representational Data Model
This type of data model is used to represent only the logical
part of the database and does not represent the physical
structure of the database. The representational data model
allows us to focus primarily, on the design part of the database.
A popular representational model is the Relational Model. The
Relational Model is queried through Relational Algebra and Relational
Calculus. In the Relational Model, we basically use tables to
represent our data and the relationships between them. It is a
theoretical concept whose practical implementation is done in
the Physical Data Model.
The advantage of using a Representational data model is that it
provides the foundation for the Physical model.
3. Physical Data Model
The Physical Data Model is used to practically implement the
Relational Data Model. Ultimately, all data in a database is
stored physically on a secondary storage device such as disks
and tapes. It is stored in the form of files, records, and certain
other data structures. The physical model has all the information on the
format in which the files are present and the structure of the databases,
the presence of external data structures, and their relation to
each other. Here, we decide how tables are stored so that they
can be accessed efficiently. A good physical model depends on a
well-designed relational model. Structured Query Language (SQL) is used to
practically implement Relational Algebra.
This Data Model describes HOW the system will be
implemented using a specific DBMS. This model is
typically created by DBAs and developers. The purpose is the actual
implementation of the database.
Characteristics of a physical data model:
 The physical data model describes the data needs of a single
project or application, though it may be integrated with
other physical data models based on project scope.
 The data model contains relationships between tables that
address the cardinality and nullability of the
relationships.
 Developed for a specific version of a DBMS, location, data
storage or technology to be used in the project.
 Columns should have exact datatypes, lengths assigned
and default values.
 Primary and Foreign keys, views, indexes, access profiles,
and authorizations, etc. are defined
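The characteristics above (exact data types, lengths, default values, nullability, keys, and indexes) can be sketched with Python's built-in sqlite3 module. The customer table, its columns, and the index name are hypothetical, and note that SQLite records a declared length such as VARCHAR(100) but does not enforce it, unlike many other DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        full_name   VARCHAR(100) NOT NULL,
        is_active   BOOLEAN NOT NULL DEFAULT 1,
        email       VARCHAR(255)
    )""")
# An index is part of the physical design: it speeds up lookups by name.
conn.execute("CREATE INDEX idx_customer_name ON customer(full_name)")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = {
    row[1]: {"type": row[2], "not_null": bool(row[3]), "default": row[4]}
    for row in conn.execute("PRAGMA table_info(customer)")
}

# The default value is applied when the column is omitted on insert.
conn.execute("INSERT INTO customer (customer_id, full_name) VALUES (1, 'Asha')")
is_active = conn.execute(
    "SELECT is_active FROM customer WHERE customer_id = 1"
).fetchone()[0]
```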
Some Other Data Models
1. Hierarchical Model
The Hierarchical Model is one of the oldest data
models; it was developed by IBM in the 1960s. In a
hierarchical model, data are viewed as a collection of
segments that form a hierarchical relation. The
data is organized into a tree-like structure where each
record has one parent record and can have many children. Even if
the segments are connected as a chain-like structure by logical
associations, the resulting structure can be a fan structure
with multiple branches. The logical associations are called
directional associations.
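A minimal sketch of the tree structure in Python, using the Employee hierarchy from the earlier Hierarchical Model example. The dictionary and the two helper functions are assumptions made for illustration; a real hierarchical DBMS stores segments and pointers rather than a dictionary.

```python
# Each record stores exactly one parent (None for the root).
parent_of = {
    "Employee": None,
    "Intern": "Employee",
    "Contract": "Employee",
    "Full-Time": "Employee",
    "Writer": "Full-Time",
    "Senior Writer": "Full-Time",
    "Editor": "Full-Time",
}


def children(record):
    """Return the child records of the given record, in insertion order."""
    return [child for child, parent in parent_of.items() if parent == record]


def path_to_root(record):
    """Follow parent links upward until the root is reached."""
    path = [record]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path


full_time_roles = children("Full-Time")
editor_path = path_to_root("Editor")
```

Because every record has a single parent, there is exactly one path from any record to the root.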
2. Network Model
The Network Model was formalized by the Database Task Group
in the 1960s. This model is a generalization of the hierarchical
model. A segment can have multiple parent segments, and
the segments are grouped into levels, but a logical
association can exist between segments belonging to any level.
Mostly, there exists a many-to-many logical association
between any two segments.
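The difference from a tree can be sketched in Python as a set of links in which one record may have several parents. The project and employee names are hypothetical sample data.

```python
from collections import defaultdict

# Links are (owner, member) pairs; a member may appear under several owners.
links = [
    ("Project A", "Alice"),
    ("Project A", "Bob"),
    ("Project B", "Alice"),
]

parents = defaultdict(set)   # member -> set of owners
members = defaultdict(set)   # owner  -> set of members
for owner, member in links:
    parents[member].add(owner)
    members[owner].add(member)

# Alice has two parents, which a strict hierarchy would not allow.
alice_parents = parents["Alice"]
```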
3. Object-Oriented Data Model
In the Object-Oriented Data Model, data and their relationships
are contained in a single structure which is referred to as an
object in this data model. In this, real-world problems are
represented as objects with different attributes. Objects can have
multiple relationships between them. Basically, it is a
combination of Object-Oriented Programming and the Relational
Database Model.
4. Flat Data Model
The flat data model basically consists of a two-dimensional
array of data elements that does not contain any duplicate
elements. This data model has one drawback: it
cannot store a large amount of data, that is, the tables cannot
be of large size.
5. Context Data Model
The Context data model is simply a data model which consists
of more than one data model. For example, the Context data
model can consist of the ER Model, Object-Oriented Data Model, etc.
This model allows users to do everything that each
individual data model can do.
6. Semi-Structured Data Model
Semi-Structured data models deal with the data in a flexible
way. Some entities may have extra attributes and some entities
may have some missing attributes. Basically, you can represent
data here in a flexible way.
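A small sketch of semi-structured data in Python, where each record is a dictionary and records need not share the same attributes. The records and attribute names are hypothetical sample data.

```python
# Two records of the same kind with different sets of attributes.
records = [
    {"name": "Robert", "email": "robert@example.com"},
    {"name": "Parker", "phone": "555-0100", "nickname": "P"},
]

# A missing attribute is simply absent; .get() returns None for it.
emails = [record.get("email") for record in records]

# The full set of attributes seen across all records.
all_keys = sorted(set().union(*(record.keys() for record in records)))
```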
Advantages of Data Models
1. Data Models help us in representing data accurately.
2. It helps us in finding the missing data and also in
minimizing Data Redundancy.
3. Data Model provides data security in a better way.
4. A sufficiently detailed data model can be used for
building the physical database.
5. The information in the data model can be used for defining
the relationship between tables, primary and foreign keys,
and stored procedures.
Disadvantages of Data Models
1. In the case of a vast database, sometimes it becomes
difficult to understand the data model.
2. You must have the proper knowledge of SQL to use
physical models.
3. Even a small change made to the structure may require
modification of the entire application.
4. There is no set data manipulation language in DBMS.
5. To develop a data model, one should know the characteristics of
the physically stored data.
Conclusion
 Data modeling is the process of developing a data model for
the data to be stored in a database.
 Data models ensure consistency in naming conventions,
default values, semantics, and security while ensuring the quality of
the data.
 The data model structure helps to define the relational tables,
primary and foreign keys, and stored procedures.
 There are three types of data models: conceptual, logical, and physical.
 The main aim of the conceptual model is to establish the
entities, their attributes, and their relationships.
 The logical data model defines the structure of the data
elements and sets the relationships between them.
 A physical data model describes the database-specific
implementation of the data model.
 The main goal of designing a data model is to make
certain that data objects offered by the functional team
are represented accurately.
 The biggest drawback is that even a small change made to the
structure may require modification of the entire application.
 Reading this Data Modeling tutorial, you will learn from the
basic concepts such as What is Data Model? Introduction
to different types of Data Model, advantages,
disadvantages, and data model example.
DBMS
Unit 3

Database design
Database Design can be defined as a set of procedures or
collection of tasks involving various steps taken to implement a
database. Following are some critical points to keep in mind to
achieve a good database design:
1. Data consistency and integrity must be maintained.
2. Low Redundancy
3. Faster searching through indices
4. Security measures should be taken by enforcing various
integrity constraints.
5. Data should be stored in fragmented bits of information in
the most atomic format possible.
However, depending on specific requirements, the above criteria
might change. But these are the most common things that
ensure a good database design.
What Steps Can a Database Designer Take to
Ensure Good Database Design?
Step 1: Determine the goal of your database, and ensure clear
communication with the stakeholders (if any). Understanding
the purpose of a database will help in thinking of various use
cases & where the problem may arise & how we can prevent it.
Step 2: List down all the entities that will be present in the
database & what relationships exist among them.
Step 3: Organize the information into different tables such that
no or very little redundancy is there.
Step 4: Ensure uniqueness in every table. The uniqueness of
records present in any relation is a very crucial part of database
design that helps us avoid redundancy. Identify the key
attributes that uniquely identify every row. You can
use various key constraints to ensure the uniqueness of your
table. Also keep in mind that the key attributes should
consume as little space as possible & must not contain any
NULL values.
Step 5: After all the tables are structured and the information is
organized, apply normal forms to identify anomalies that
may arise & redundancy that can cause inconsistency in the
database.
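Step 4 can be sketched with Python's built-in sqlite3 module: a primary key constraint makes the DBMS itself reject duplicate rows. The student table and the helper function try_insert are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER NOT NULL PRIMARY KEY,  -- small, non-NULL key attribute
        name    TEXT NOT NULL
    )""")
conn.execute("INSERT INTO student VALUES (1, 'Robert')")


def try_insert(roll_no, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (roll_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(1, "Parker")  # roll_no 1 already exists
new_accepted = try_insert(2, "Parker")        # a new roll_no is fine
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```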
Primary Terminologies Used in Database Design
Following are the terminologies that a person should be
familiar with before designing a database:
 Redundancy: Redundancy refers to the duplication of
data. There can be specific use cases when we need or
don’t need redundancy in our Database. For ex: If we have
a banking system application then we may need to strictly
prevent redundancy in our Database.
 Schema: Schema is a logical container that defines the
structure & manages the organization of the data stored
in it. It describes the tables and their columns, with a data type for
each column.
 Records/Tuples: A record and a tuple are the same thing:
a single row of a table, where our data is stored.
 Indexing: Indexing is a data structure technique to
promote efficient retrieval of the data stored in our
database.
 Data Integrity & Consistency: Data integrity refers to the
quality of the information stored in our database and
consistency refers to the correctness of the data stored.
 Data Models: Data models provide us with visual
modeling techniques to visualize the data & the
relationship that exists among those data. Ex: ER model,
Network Model, Object Oriented Model, Hierarchical
model, etc.
 Functional Dependency: Functional Dependency is a
relationship between two attributes of the table that
represents that the value of one attribute can be
determined by another. Ex: {A -> B}, A & B are two
attributes and attribute A can uniquely determine the
value of B.
 Transaction: Transaction is a single logical unit of work. It
signifies that some changes are made in the database. A
transaction must satisfy the ACID or BASE properties
(depending on the type of Database).
 Schedule: A schedule defines the order in which the operations of
transactions are executed by one or multiple users.
 Concurrency: Concurrency refers to allowing multiple
transactions to operate simultaneously without interfering
with one another.
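The functional dependency {A -> B} from the list above can be checked against sample rows with a few lines of Python. The function name holds and the sample rows are assumptions made for illustration. Note that a check like this can only show that a dependency is violated by the given rows; it cannot prove that the dependency holds for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # The same left-hand value must always map to the same right-hand value.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

a_determines_b = holds(rows, ["A"], ["B"])  # A = 1 always goes with B = "x"
a_determines_c = holds(rows, ["A"], ["C"])  # A = 1 goes with both C = 10 and C = 20
```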
Database Design Lifecycle
The database design lifecycle goes something like this:

Lifecycle of Database Design


1. Requirement Analysis
It’s very crucial to understand the requirements of our
application so that you can think in productive terms and
apply appropriate integrity constraints to maintain data
integrity & consistency.
2. Logical & Physical Design
This is the actual design phase that involves various steps that
are to be taken while designing a database. This phase is
further divided into two stages:
 Logical Data Model Design: This phase consists of
coming up with a high-level design of our database based
on initially gathered requirements to structure & organize
our data accordingly. A high-level overview of the database is
made on paper without considering the physical
level design. This phase proceeds by identifying the kind
of data to be stored and what relationships will exist among
those data.
Identifying entities, key attributes & the constraints that are
to be implemented is the core functionality of this phase.
It involves techniques such as Data Modeling to visualize
data, normalization to prevent redundancy, etc.
 Physical Design of Data Model: This phase involves the
implementation of the logical design made in the previous
stage. All the relationships among data and integrity
constraints are implemented to maintain consistency &
generate the actual database.
3. Data Insertion and Testing for Various Integrity
Constraints
Finally, after implementing the physical design of the database,
we’re ready to input the data & test its integrity. This phase
involves testing our database for its integrity to see if
something got left out or if anything new needs to be added, & then
integrating it with the desired application.
Logical Data Model Design
The logical data model design defines the structure of data and
what relationship exists among those data. The following are
the major components of the logical design:
1. Data Models: Data modeling is a visual modeling technique
used to get a high-level overview of our database. Data models
help us understand the needs and requirements of our
database by defining the design of our database through
diagrammatic representation. Ex: ER model, Network model,
Relational Model, object-oriented data model.

Data Models
2. Entity: Entities are objects in the real world, which can have
certain properties & these properties are referred to as
attributes of that particular entity. There are 2 types of entities:
strong and weak. A weak entity does not have a key attribute
to identify it; its existence solely depends on one
specific strong entity & it has full participation in a
relationship, whereas a strong entity does have a key attribute to
uniquely identify it.
Weak entity example: Loan -> A loan will be given to a customer
(which is optional) & the loan will be identified by the
customer_id of the customer to whom the loan is granted.
3. Relationships: How data is logically related to each other
defines the relationship of that data with other entities. In
simple words, the association of one entity with another is
defined here.
A relationship can be further categorized into – unary, binary,
and ternary relationships.
 Unary: In this, the associating entity & the associated
entity both are the same. Ex: an employee manages
other employees, and a student can be given the post of
monitor, so here the monitor is also a student.
 Binary: This is a very common relationship that you will
come across while designing a database.
Ex: Student is enrolled in courses, Employee is managed by
different managers, One student can be taught by many
professors.
 Ternary: In this, we have 3 entities involved in a single
relationship. Ex: an employee works on a project for a
client. Note that, here we have 3 entities: Employee,
Project & Client.
4. Attributes: Attributes are nothing but properties of a specific
entity that describe it. For example, an employee can
have unique_id, name, age, date of birth (DOB), salary,
department, Manager, project id, etc.
5. Normalization: After all the entities are put in place and the
relationship among data is defined, we need to look for
loopholes or possible ambiguities that may arise as a result of
CRUD operations, to prevent various anomalies such as
INSERTION, UPDATION, and DELETION anomalies.
Data Normalization is a basic procedure defined for databases
to eliminate such anomalies & prevent redundancy.
An Example of Logical Design

Logical Design Example


Physical Design
The main purpose of the physical design is to actually
implement the logical design, that is, to show the structure of the
database along with all the columns & their data types, rows,
relations, and relationships among data, & to clearly define how
relations are related to each other.
Following are the steps taken in physical design:
Step 1: Entities are converted into tables or relations that
consist of their properties (attributes).
Step 2: Apply integrity constraints: establish foreign key, unique
key, and composite key relationships among the data, and
apply any other constraints that are required.
Step 3: Entity names are converted into table names, property
names are translated into attribute names, and so on.
Step 4: Apply normalization & modify as per the requirements.
Step 5: Final schemas are defined based on the entities &
attributes derived in logical design.
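The steps above can be sketched with Python's built-in sqlite3 module for a small student/course design. All table and column names are hypothetical; the enrollment table shows a composite key made of two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

# Steps 1 and 3: entities become tables, properties become columns.
# Step 2: foreign key, unique key, and composite key constraints are declared.
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL UNIQUE
    );
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")

# Step 5: the final schema can be read back from the catalog.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

conn.execute("INSERT INTO student VALUES (1, 'Harry')")
conn.execute("INSERT INTO course VALUES (101, 'Science')")
conn.execute("INSERT INTO enrollment VALUES (1, 101)")

# The composite key rejects a second identical enrollment.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 101)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```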
Physical Design
Conclusion
In conclusion, a good database design is an essential part of a
strong database management system (DBMS). It provides the
basis for data governance, data storage, and data retrieval. The
quality of a database has a direct impact on a system’s overall
performance and dependability. It is important to consider data
organization, standardization, performance, integrity, and more
when designing a database to meet the needs of your
organization and your users.

Design phases:

There are three main phases in designing a database in a database management
system (DBMS):

1. Conceptual design: This phase focuses on understanding the
business requirements and identifying the high-level
entities, relationships, and attributes that will be stored in the
database. It is essentially a blueprint of the database, independent
of any specific DBMS product or technology.

2. Logical design: This phase refines the conceptual design by
translating it into a specific data model, such as the relational
model. The logical design defines the tables, columns, data
types, and constraints that will be used to store the data.

3. Physical design: This phase takes the logical design and maps it to
the specific storage structures and access methods provided by the
chosen DBMS. The physical design considers factors such as
performance, scalability, and security.

Physical design phase in DBMS


Each phase is important for creating a well-designed database that
meets the needs of the organization. The phases are iterative, and it is
common to revisit earlier phases as the design progresses.

Normalization:
Normalization is the process of minimizing redundancy from a relation or set of
relations. Redundancy in a relation may cause insertion, deletion, and
update anomalies, so minimizing it helps to avoid them.
Normal forms are used to eliminate or reduce redundancy in
database tables.
Normalization of DBMS
In database management systems (DBMS), normal forms are a series of
guidelines that help to ensure that the design of a database is efficient,
organized, and free from data anomalies. There are several levels of
normalization, each with its own set of guidelines, known as normal
forms.
Important Points Regarding Normal Forms in DBMS
• First Normal Form (1NF): This is the most basic level of
normalization. In 1NF, each table cell should contain only a single
(atomic) value, and each column should have a unique name. The first
normal form removes repeating groups and simplifies queries.
• Second Normal Form (2NF): A relation in 1NF is in 2NF when every
non-key attribute depends on the whole primary key, and not on just a
part of it. This eliminates the redundancy caused by partial
dependencies.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that no
non-key attribute depends on another non-key attribute. This means that
each non-key column should depend directly on the key, and not
transitively through other columns in the same table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF
that requires the determinant of every non-trivial functional
dependency to be a super key, that is, to contain a candidate key.
• Fourth Normal Form (4NF): 4NF is a further refinement of BCNF
that ensures that a table does not contain any non-trivial multi-valued
dependencies, other than those whose determinant is a super key.
• Fifth Normal Form (5NF): 5NF is the highest level of normalization
covered here. A table is in 5NF when it cannot be decomposed into
smaller tables without loss of information, which removes the remaining
redundancy and improves data integrity.
Normal forms help to reduce data redundancy, increase data
consistency, and improve database performance. However, higher levels
of normalization can lead to more complex database designs and
queries. It is important to strike a balance between normalization and
practicality when designing a database.
Advantages of Normal Form
• Reduced data redundancy: Normalization helps to eliminate
duplicate data in tables, reducing the amount of storage space
needed and improving database efficiency.
• Improved data consistency: Normalization ensures that data is
stored in a consistent and organized manner, reducing the risk of
data inconsistencies and errors.
• Simplified database design: Normalization provides guidelines for
organizing tables and data relationships, making it easier to design
and maintain a database.
• Improved update performance: Each fact is stored in one place, so
inserts, updates, and deletes touch fewer rows. Read queries that
combine several normalized tables need additional joins, so their
speed depends on the workload.
• Easier database maintenance: Normalization reduces the
complexity of a database by breaking it down into smaller, more
manageable tables, making it easier to add, modify, and delete
data.
Overall, using normal forms in DBMS helps to improve data quality,
increase database efficiency, and simplify database design and
maintenance.
First Normal Form
A relation is in first normal form if it does not contain any composite
or multi-valued attribute, that is, if every attribute in the relation is
a single-valued attribute. A relation that contains a composite or
multi-valued attribute violates first normal form.
• Example 1 – Relation STUDENT in table 1 is not in 1NF because of the
multi-valued attribute STUD_PHONE. Its decomposition into 1NF
has been shown in table 2.

Example
• Example 2 –
ID   Name   Courses
--------------------
1    A      c1, c2
2    E      c3
3    M      c2, c3
• In the above table, Courses is a multi-valued attribute, so the table
is not in 1NF. The table below is in 1NF, as there is no multi-valued
attribute.

ID   Name   Course
--------------------
1    A      c1
1    A      c2
2    E      c3
3    M      c2
3    M      c3
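
The conversion in Example 2 can be sketched in a few lines of Python. This is only an illustration: the function name `to_first_normal_form` and the choice to store the multi-valued attribute as a comma-separated string are assumptions made for the sketch, not part of the notes.

```python
def to_first_normal_form(rows, multi_valued_column):
    """Split a comma-separated column so every cell holds exactly one value."""
    flat_rows = []
    for row in rows:
        for value in row[multi_valued_column].split(","):
            new_row = dict(row)                       # copy the other columns
            new_row[multi_valued_column] = value.strip()
            flat_rows.append(new_row)
    return flat_rows


# The non-1NF table from Example 2.
students = [
    {"ID": 1, "Name": "A", "Courses": "c1, c2"},
    {"ID": 2, "Name": "E", "Courses": "c3"},
    {"ID": 3, "Name": "M", "Courses": "c2, c3"},
]

flat_students = to_first_normal_form(students, "Courses")
for row in flat_students:
    print(row["ID"], row["Name"], row["Courses"])
```

Each input row produces one output row per course, so the three rows above become five single-valued rows.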
Second Normal Form

To be in second normal form, a relation must be in first normal form and
must not contain any partial dependency. A relation is in 2NF if it has
No Partial Dependency, i.e., no non-prime attribute (an attribute that is
not part of any candidate key) is dependent on any proper subset of any
candidate key of the table. Partial Dependency – If a proper subset of a
candidate key determines a non-prime attribute, it is called a partial
dependency.

• Example 1 – Consider table 3 below.

STUD_NO   COURSE_NO   COURSE_FEE
1         C1          1000
2         C2          1500
1         C4          2000
4         C3          1000
4         C1          1000
2         C5          2000

• Note that many courses have the same course fee. Here, COURSE_FEE
alone cannot decide the value of COURSE_NO or STUD_NO; COURSE_FEE
together with STUD_NO cannot decide the value of COURSE_NO; and
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the
only candidate key {STUD_NO, COURSE_NO}. But COURSE_NO -> COURSE_FEE,
i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of
the candidate key. The non-prime attribute COURSE_FEE is dependent on a
proper subset of the candidate key, which is a partial dependency, and
so this relation is not in 2NF. To convert the above relation to 2NF, we
need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1                  Table 2
STUD_NO   COURSE_NO      COURSE_NO   COURSE_FEE
1         C1             C1          1000
2         C2             C2          1500
1         C4             C3          1000
4         C3             C4          2000
4         C1             C5          2000
2         C5
• NOTE: 2NF tries to reduce the redundant data getting stored in
memory. For instance, if there are 100 students taking the C1 course,
we don’t need to store its fee as 1000 in all 100 records; instead,
we store once, in the second table, that the course fee for C1 is 1000.
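
As a rough check of this decomposition, the sketch below projects table 3 onto the two smaller tables and joins them back on COURSE_NO. The variable names are illustrative assumptions; the data is the table from Example 1.

```python
# Table 3 from Example 1: (STUD_NO, COURSE_NO, COURSE_FEE)
enrollment = [
    (1, "C1", 1000),
    (2, "C2", 1500),
    (1, "C4", 2000),
    (4, "C3", 1000),
    (4, "C1", 1000),
    (2, "C5", 2000),
]

# Table 1: projection onto (STUD_NO, COURSE_NO).
student_course = {(stud_no, course_no) for stud_no, course_no, _ in enrollment}

# Table 2: projection onto (COURSE_NO, COURSE_FEE); each course is stored once.
course_fee = {course_no: fee for _, course_no, fee in enrollment}

# Natural join of the two tables on COURSE_NO.
rejoined = {(stud_no, course_no, course_fee[course_no])
            for stud_no, course_no in student_course}

# The join gives back exactly the original rows, so no information was lost.
print(rejoined == set(enrollment))
print(len(course_fee), "fees stored instead of", len(enrollment))
```

The fee for C1 now appears once in `course_fee`, although two students take that course.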
 Example 2 – Consider following functional dependencies in
relation R (A, B , C, D )

AB -> C [A and B together determine C]


BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial
dependency, i.e., no proper subset of AB determines any non-prime
attribute, so the relation is in 2NF.
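A relation is in third normal form if, for every non-trivial functional
dependency X -> Y, at least one of the following holds:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).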
Example 1: In relation STUDENT given in Table 4, FD set: {STUD_NO ->
STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY,
STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on
STUD_NO. It violates the third normal form.
To convert it to third normal form, we decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
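
A minimal sketch of this decomposition, using Python's built-in `sqlite3` module with an in-memory database. The column types, the foreign key from STUDENT to STATE_COUNTRY, and the sample rows are assumptions made for illustration; Table 4 itself is not reproduced in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state_country (
        state   TEXT PRIMARY KEY,
        country TEXT NOT NULL
    );
    CREATE TABLE student (
        stud_no    INTEGER PRIMARY KEY,
        stud_name  TEXT,
        stud_phone TEXT,
        stud_state TEXT REFERENCES state_country(state),
        stud_age   INTEGER
    );
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO state_country VALUES (?, ?)",
                 [("State1", "Country1"), ("State2", "Country1")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)",
                 [(1, "Student A", "000-0001", "State1", 20),
                  (2, "Student B", "000-0002", "State2", 19)])

# The country is recovered through a join instead of being stored per student.
rows = conn.execute("""
    SELECT s.stud_no, s.stud_name, c.country
    FROM student AS s
    JOIN state_country AS c ON s.stud_state = c.state
    ORDER BY s.stud_no
""").fetchall()
print(rows)
```

Because the country is stored once per state, changing the country of a state is a single-row update in `state_country`.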
Consider relation R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A.
All possible candidate keys in the above relation are {A, E, CD, BC}.
All attributes on the right-hand sides of the functional dependencies
are prime, so the relation is in 3NF.
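
The candidate keys quoted above can be checked mechanically with attribute closures. The sketch below is one straightforward, brute-force way to do it; the function names `closure` and `candidate_keys` and the encoding of each dependency as a pair of strings are assumptions of this example.

```python
from itertools import combinations


def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attributes, fds):
    """Enumerate the minimal attribute sets whose closure is the whole relation."""
    relation = set(all_attributes)
    keys = []
    # Try smaller sets first so that every key found is minimal.
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == relation:
                keys.append(candidate)
    return keys


# R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A
fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
print(sorted("".join(sorted(key)) for key in keys))  # ['A', 'BC', 'CD', 'E']
```

The search tries every subset of attributes, which is fine for small textbook relations but grows exponentially with the number of attributes.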
Example 2: Find the highest normal form of a relation R(A,B,C,D,E) with
FD set {BC->D, AC->BE, B->E}.

Step 1: As we can see, (AC)+ = {A,C,B,E,D}, but none of its proper
subsets can determine all attributes of the relation, so AC is a
candidate key. A and C can’t be derived from any other attribute of the
relation, so there is only 1 candidate key, {AC}.
Step 2: Prime attributes are those attributes that are part of a
candidate key ({A, C} in this example); the others are non-prime
({B, D, E} in this example).
Step 3: The relation R is in 1st normal form, as a relational DBMS does
not allow multi-valued or composite attributes. The relation is in 2nd
normal form because BC->D satisfies 2nd normal form (BC is not a proper
subset of candidate key AC), AC->BE satisfies 2nd normal form (AC is the
candidate key), and B->E satisfies 2nd normal form (B is not a proper
subset of candidate key AC).

The relation is not in 3rd normal form because in BC->D neither is BC a
super key nor is D a prime attribute, and in B->E neither is B a super
key nor is E a prime attribute. To satisfy 3rd normal form, either the
LHS of an FD must be a super key or the RHS must be a prime attribute.
So the highest normal form of the relation is 2nd normal form.
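
The three steps can be turned into a small checker. This is a sketch under stated assumptions: the function names are invented for the example, the candidate keys are passed in by hand (as found in Step 1), attributes are single letters, and 1NF is taken for granted as in Step 3.

```python
from itertools import combinations


def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def highest_normal_form(all_attributes, fds, keys):
    """Classify a relation as 1NF, 2NF, 3NF or BCNF from its FDs and keys."""
    relation = set(all_attributes)
    prime = set().union(*keys)          # attributes in some candidate key
    non_prime = relation - prime

    # 2NF: no proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                if (closure(subset, fds) - set(subset)) & non_prime:
                    return "1NF"

    # 3NF / BCNF: look at every non-trivial dependency X -> A.
    in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == relation:
            continue                    # X is a super key: fine for both forms
        for attribute in set(rhs) - set(lhs):
            in_bcnf = False             # BCNF needs X to be a super key
            if attribute not in prime:
                return "2NF"            # 3NF needs X super key or A prime
    return "BCNF" if in_bcnf else "3NF"


fds = [("BC", "D"), ("AC", "BE"), ("B", "E")]
print(highest_normal_form("ABCDE", fds, [{"A", "C"}]))  # 2NF
```

The same function reports 3NF for the earlier relation with A -> BC, CD -> E, B -> D, E -> A, because B -> D has a prime right-hand side but B is not a super key.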

For example, consider relation R(A, B, C) with A -> BC and B -> A. A and
B are both super keys, so the relation is in BCNF.
Third Normal Form
A relation is said to be in third normal form if it is in Second Normal
Form and no non-prime attribute is transitively dependent on a candidate
key.
Equivalently, at least one of the following conditions must hold for
every non-trivial functional dependency X -> Y:
• X is a Super Key.
• Y is a Prime Attribute (this means that each element of Y is part of
some Candidate Key).


BCNF
BCNF (Boyce-Codd Normal Form) is a stricter version of Third Normal
Form. It adds one rule to Third Normal Form. The basic condition for any
relation to be in BCNF is that it must be in Third Normal Form.
The basic rules for BCNF are:
1. The table must be in Third Normal Form.
2. For every non-trivial functional dependency X->Y, X must be a super
key of the relation.
Fourth Normal Form
A relation in Fourth Normal Form contains no non-trivial multivalued
dependency unless its determinant is a super key. The basic condition
for Fourth Normal Form is that the relation must be in BCNF.
The basic rules are mentioned below.
1. It must be in BCNF.
2. It must not have any non-trivial multi-valued dependency whose
determinant is not a super key.
Fifth Normal Form
Fifth Normal Form is also called Project-Join Normal Form (PJNF). The
basic conditions of Fifth Normal Form are mentioned below.
The relation must be in Fourth Normal Form.
The relation cannot be decomposed further into smaller relations without
loss of information.


Applications of Normal Forms in DBMS
• Data consistency: Normal forms ensure that data is consistent and
does not contain any redundant information. This helps to prevent
inconsistencies and errors in the database.
• Data redundancy: Normal forms minimize data redundancy by
organizing data into tables in which each fact is stored only once.
This reduces the amount of storage space required for the database
and makes it easier to manage.
• Response time: Normal forms can improve response time for inserts,
updates, and deletes, because each fact is changed in one place. Read
queries may need more joins across the smaller tables, so the effect
on overall query performance depends on the workload.
• Database maintenance: Normal forms make it easier to maintain
the database by reducing the amount of redundant data that
needs to be updated, deleted, or modified. This helps to improve
database management and reduce the risk of errors or
inconsistencies.
• Database design: Normal forms provide guidelines for designing
databases that are efficient, flexible, and scalable. This helps to
ensure that the database can be easily modified, updated, or
expanded as needed.

Some Important Points about Normal Forms

• BCNF is free from redundancy caused by functional dependencies.
• If a relation is in BCNF, then 3NF is also satisfied.
• If all attributes of a relation are prime attributes, then the
relation is always in 3NF.
• A relation in a relational database is always at least in 1NF.
• Every binary relation (a relation with only 2 attributes) is always in
BCNF.
• If a relation has only singleton candidate keys (i.e., every candidate
key consists of only 1 attribute), then the relation is always in 2NF
(because no partial functional dependency is possible).
• Sometimes decomposing to BCNF may not preserve every functional
dependency. In that case, go for BCNF only if the lost FD(s) are not
required; otherwise normalize only up to 3NF.
• There are more normal forms beyond BCNF, such as 4NF and 5NF. But in
real-world database systems it’s generally not required to go beyond
BCNF.
Conclusion
In Conclusion, relational databases can be arranged according to a set of
rules called normal forms in database administration (1NF, 2NF, 3NF,
BCNF, 4NF, and 5NF), which reduce data redundancy and preserve data
integrity. By resolving various kinds of data anomalies and dependencies,
each subsequent normal form expands upon the one that came before
it. The particular requirements and properties of the data being stored
determine which normal form should be used; higher normal forms offer
stricter data integrity but may also result in more complicated database
structures.

ER model entity set/ ER model:


The Entity-Relationship (ER) Model is a model for identifying the
entities to be represented in the database and for representing how
those entities are related. The ER data model specifies an enterprise
schema that represents the overall logical structure of a database
graphically.
The Entity Relationship Diagram explains the relationship among the
entities present in the database. ER models are used to model real-world
objects like a person, a car, or a company and the relation between these
real-world objects. In short, the ER Diagram is the structural format of
the database.
Why Use ER Diagrams In DBMS?
• ER diagrams are used to represent the E-R model in a database,
which makes them easy to convert into relations (tables).
• ER diagrams model real-world objects, which makes them intuitively
useful.
• ER diagrams require no technical knowledge and no hardware
support.
• These diagrams are very easy to understand and easy to create,
even for a naive user.
• They give a standard way of visualizing the data logically.
Symbols Used in ER Model

ER Model is used to model the logical view of the system from a data
perspective, and it uses these symbols:
• Rectangles: Rectangles represent Entities in the ER Model.
• Ellipses: Ellipses represent Attributes in the ER Model.
• Diamond: Diamonds represent Relationships among Entities.
• Lines: Lines link attributes to entities and entity sets to
relationship types.
• Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
• Double Rectangle: Double Rectangle represents a Weak Entity.

Symbols used in ER Diagram

Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among
Entities in a Database System.

Components of ER Diagram
Entity
An Entity may be an object with a physical existence – a particular
person, car, house, or employee – or it may be an object with a
conceptual existence – a company, a job, or a university course.
Entity Set: An entity is an instance of an entity type, and the set of
all entities of a given type is called an entity set. For example, E1 is
an entity of entity type Student, and the set of all students is called
the entity set. In an ER diagram, an entity type is represented as:

Entity Set

1. Strong Entity

A Strong Entity is a type of entity that has a key Attribute. Strong Entity
does not depend on other Entity in the Schema. It has a primary key, that
helps in identifying it uniquely, and it is represented by a rectangle.
These are called Strong Entity Types.
2. Weak Entity
An entity type usually has a key attribute that uniquely identifies each
entity in the entity set. But some entity types exist for which a key
attribute can’t be defined. These are called Weak Entity types.
For example, a company may store the information of dependents
(Parents, Children, Spouse) of an Employee. But the dependents cannot
exist without the employee. So Dependent will be a Weak Entity Type and
Employee will be the Identifying Entity type for Dependent, which means
it is a Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The
participation of weak entity types is always total. The relationship
between the weak entity type and its identifying strong entity type is
called identifying relationship and it is represented by a double
diamond.

Strong Entity and Weak Entity


Attributes
Attributes are the properties that define the entity type. For example,
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes
that define entity type Student. In ER diagram, the attribute is
represented by an oval.

Attribute
1. Key Attribute

The attribute which uniquely identifies each entity in the entity set is
called the key attribute. For example, Roll_No will be unique for each
student. In an ER diagram, the key attribute is represented by an oval
with the attribute name underlined.

Key Attribute

2. Composite Attribute
An attribute composed of many other attributes is called a composite
attribute. For example, the Address attribute of the student Entity type
consists of Street, City, State, and Country. In ER diagram, the composite
attribute is represented by an oval comprising of ovals.

Composite Attribute
3. Multivalued Attribute
An attribute that can hold more than one value for a given entity is
called a multivalued attribute. For example, Phone_No (there can be more
than one for a given student). In an ER diagram, a multivalued attribute
is represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type
is known as a derived attribute. e.g.; Age (can be derived from DOB). In
ER diagram, the derived attribute is represented by a dashed oval.

Derived Attribute
The Complete Entity Type Student with its Attributes can be represented
as:

Entity and Attributes
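
The four attribute kinds can also be sketched in code. The Python classes below are only an illustration of the Student entity type described above; the class layout, field names, and the `age` helper are assumptions of this example and not a prescribed mapping.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Address:
    """Composite attribute: made up of several simpler attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int                       # key attribute: unique per student
    name: str
    dob: date
    address: Address                   # composite attribute
    mobile_nos: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob instead of being stored."""
        years = today.year - self.dob.year
        # Subtract one year if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.dob.month, self.dob.day):
            years -= 1
        return years


student = Student(
    roll_no=1,
    name="A",
    dob=date(2000, 5, 17),
    address=Address("Street 1", "City", "State", "Country"),
    mobile_nos=["000-0001", "000-0002"],
)
print(student.age(date(2024, 5, 16)))  # 23
```

Age is computed on demand, which mirrors the dashed oval: the value is never stored, so it cannot drift out of step with DOB.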


Relationship Type and Relationship Set
A Relationship Type represents the association between entity types. For
example, ‘Enrolled in’ is a relationship type that exists between entity
type Student and Course. In ER diagram, the relationship type is
represented by a diamond and connecting the entities with lines.

Entity-Relationship Set

A set of relationships of the same type is known as a relationship set.
The following relationship set depicts S1 as enrolled in C2, S2 as
enrolled in C1, and S3 as enrolled in C3.

Relationship Set
Degree of a Relationship Set

The number of different entity sets participating in a relationship set
is called the degree of a relationship set.
1. Unary Relationship: When there is only ONE entity set participating
in a relationship, the relationship is called a unary relationship. For
example, one person is married to only one person.

Unary Relationship
2. Binary Relationship: When there are TWO entity sets participating in
a relationship, the relationship is called a binary relationship. For
example, a Student is enrolled in a Course.

Binary Relationship
3. n-ary Relationship: When there are n entity sets participating in a
relationship, the relationship is called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a
relationship set is known as cardinality. Cardinality can be of different
types:
1. One-to-One: When each entity in each entity set can take part only
once in the relationship, the cardinality is one-to-one. Let us assume that
a male can marry one female and a female can marry one male. So the
relationship will be one-to-one.
The total number of tables that can be used in this is 2.

one to one cardinality


Using Sets, it can be represented as:

Set Representation of One-to-One

2. One-to-Many: In a one-to-many mapping, each entity in one entity set
can be related to more than one entity in the other entity set, while
each entity in the other set is related to at most one entity in the
first. Let us assume that one surgeon department can accommodate many
doctors. So the cardinality will be 1 to M. It means one department has
many doctors.

A one-to-many relationship can be stored in 2 tables, by placing a
foreign key in the table on the "many" side, or in 3 tables when the
relationship gets a table of its own.

one to many cardinality


Using sets, one-to-many cardinality can be represented as:
Set Representation of One-to-Many
3. Many-to-One: When entities in one entity set can take part only once
in the relationship set and entities in other entity sets can take part more
than once in the relationship set, cardinality is many to one. Let us
assume that a student can take only one course but one course can be
taken by many students. So the cardinality will be n to 1. It means that
for one course there can be n students but for one student, there will be
only one course.
The total number of tables that can be used in this is 3.

many to one cardinality


Using Sets, it can be represented as:

Set Representation of Many-to-One


In this case, each student is taking only 1 course but 1 course has been
taken by many students.
4. Many-to-Many: When entities in all entity sets can take part more
than once in the relationship, the cardinality is many-to-many. Let us
assume that a student can take more than one course and one course can
be taken by many students. So the relationship will be many-to-many.
The total number of tables that can be used in this is 3.

many to many cardinality


Using Sets, it can be represented as:

Many-to-Many Set Representation


In this example, student S1 is enrolled in C1 and C3, and course C3 has
students S1, S3, and S4 enrolled in it. So it is a many-to-many
relationship.
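
The four cases can be told apart by counting partners on each side of a binary relationship set. The sketch below does this for a list of (left, right) pairs; the function name and the pair encoding are assumptions of this example. It only classifies the pairs it is given, whereas the cardinality in a design is a rule about which pairs are allowed.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does some left entity have several partners, and likewise on the right?
    left_has_many = any(len(rights) > 1 for rights in rights_per_left.values())
    right_has_many = any(len(lefts) > 1 for lefts in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"


# Only the (student, course) pairs named in the text above.
enrolled = [("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")]
print(mapping_cardinality(enrolled))  # many-to-many
```

With pairs such as ("S1", "C1") and ("S2", "C1"), where each student has one course, the same function returns "many-to-one".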

Participation Constraint
Participation Constraint is applied to the entity participating in the
relationship set.
1. Total Participation – Each entity in the entity set must participate in
the relationship. If each student must enroll in a course, the participation
of students will be total. Total participation is shown by a double line in
the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT
participate in the relationship. If some courses are not taken by any of
the students, the participation of courses will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity
set having total participation and Course Entity set having partial
participation.

Total Participation and Partial Participation


Using Set, it can be represented as,

Set representation of Total Participation and Partial Participation


Every student in the Student Entity set participates in a relationship but
there exists a course C4 that is not taking part in the relationship.
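
Total and partial participation can be checked by comparing an entity set with the entities that actually appear in the relationship set. In the sketch below the function name, the `position` argument, and the concrete pairs are assumptions made for illustration; only the fact that C4 takes no part comes from the text.

```python
def participation(entity_set, relationship_pairs, position):
    """Return 'total' if every entity appears in the relationship, else 'partial'.

    `position` selects which side of each pair belongs to `entity_set`
    (0 for the first element, 1 for the second).
    """
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entity_set) <= participating else "partial"


students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3", "C4"}
# Hypothetical 'Enrolled in' pairs; course C4 has no student.
enrolled_in = [("S1", "C1"), ("S2", "C2"), ("S3", "C3")]

print(participation(students, enrolled_in, 0))  # total
print(participation(courses, enrolled_in, 1))   # partial
```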

How to Draw ER Diagram?

• The very first step is identifying all the entities, placing each of
them in a rectangle, and labeling them accordingly.
• The next step is to identify the relationships between them and
place them accordingly using diamonds, making sure that
relationships are not connected to each other.
• Attach attributes to the entities properly.
• Remove redundant entities and relationships.
• Add proper colors to highlight the data present in the database.

ER diagram basic structure:


Cardinality in DBMS (Mapping Constraints)
DBMS stands for Database Management System, which is a tool, or a software used to
do various operations on a Database like the Creation of the Database, Deletion of the
Database, or Updating the current Database. To simplify processing and data querying,
the most popular types of Databases currently in use typically model their data as rows
and columns in a set of tables. The data may then be handled, updated, regulated, and
structured with ease. For writing and querying data, most Databases employ
Structured Query Language (SQL).

Cardinality
Cardinality describes how the entities in a relationship set are related to each
other. In a Database Management System, cardinality is a number that denotes how
many times an entity of one entity set participates with entities of another entity
set in a relationship set. The cardinality of a relationship is a very important
property in representing the structure of a Database. For a table, cardinality means
the number of rows or tuples.

Cardinality Ratio
Cardinality ratio is also called Cardinality Mapping, which represents the mapping of
one entity set to another entity set in a relationship set. We generally take the example
of a binary relationship set where two entities are mapped to each other.

Cardinality is very important in the Database of various businesses. For example, if we
want to track the purchase history of each customer then we can use the one-to-many
cardinality to find the data of a specific customer. The Cardinality model can be used
in Databases by Database Managers for a variety of purposes, but corporations often
use it to evaluate customer or inventory data.

There are four types of Cardinality Mapping in Database Management Systems:

1. One to one
2. Many to one
3. One to many
4. Many to many
One to One
One to one cardinality is represented by a 1:1 symbol. In this, there is at most one
relationship from one entity to another entity. There are a lot of examples of one-to-
one cardinality in real life databases.

For example, one student can have only one student id, and one student id can
belong to only one student. So, the relationship mapping between student and
student id will be one to one cardinality mapping.

Another example is the relationship between the director of the school and the school
because one school can have a maximum of one director, and one director can belong
to only one school.

Note: In one-to-one cardinality, it is not necessary that every entity in an entity
set takes part in the mapping. Some entities may not participate in the mapping.

Many to One Cardinality:


In many-to-one cardinality mapping, multiple entities of set 1 can be related to a
single entity of set 2. Equivalently, one entity of set 2 can be related to more than
one entity of set 1.

One-to-one cardinality is a special case of many-to-one cardinality. Many-to-one
cardinality is represented by M:1.

For example, there are multiple patients in a hospital who are served by a single
doctor, so the relationship between patients and doctors can be represented by
many-to-one cardinality.

One to Many Cardinalities:


In one-to-many cardinality mapping, a single entity of set 1 can be related to one or
more entities of set 2. Equivalently, more than one entity of set 2 can be related to
only one entity of set 1.

One-to-one cardinality is a special case of one-to-many cardinality. One-to-many
cardinality is represented by 1:M.

For example, in a hospital, there can be various compounders, so the relationship
between the hospital and compounders can be mapped through one-to-many
cardinality.

Many to Many Cardinalities:


In many-to-many cardinality mapping, one or more entities of set 1 can be associated
with one or more entities of set 2. In the same way, one or more entities of set 2
can be related to one or more entities of set 1.

It is represented by M:N or N:M.

One-to-one, one-to-many, and many-to-one cardinalities are special cases of
many-to-many cardinality.

For example, in a college, multiple students can work on a single project, and a single
student can also work on multiple projects. So, the relationship between the project
and the student can be represented by many-to-many cardinality.

Appropriate Mapping Cardinality


Evidently, the real-world context in which the relationship set is modeled determines
the appropriate mapping cardinality for a specific relationship set.

o The relationship table can be merged with the table on the "many" side if the
cardinality is one-to-many or many-to-one.
o One entity table can be combined with the relationship table if the relationship is
one-to-one and that entity has total participation, and both entity tables can be
combined with the relationship into a single table if both of them have total
participation.
o No tables can be merged if the cardinality is many-to-many; the relationship needs a
table of its own.
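
The first and last rules can be illustrated with table definitions. The sketch below uses Python's built-in `sqlite3` module; the table and column names are assumptions chosen to match the department/doctor and student/project examples used earlier in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One-to-many: the relationship is folded into the "many" side
    -- as a foreign key, so two tables are enough.
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT
    );
    CREATE TABLE doctor (
        doctor_id INTEGER PRIMARY KEY,
        name      TEXT,
        dept_id   INTEGER NOT NULL REFERENCES department(dept_id)
    );

    -- Many-to-many: the relationship needs a table of its own,
    -- so three tables are required.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT
    );
    CREATE TABLE project (
        project_id INTEGER PRIMARY KEY,
        title      TEXT
    );
    CREATE TABLE works_on (
        student_id INTEGER REFERENCES student(student_id),
        project_id INTEGER REFERENCES project(project_id),
        PRIMARY KEY (student_id, project_id)
    );
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The composite primary key on `works_on` lets one student appear with many projects and one project with many students, while ruling out duplicate pairs.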

An entity type should have a key attribute which uniquely identifies each entity
in the entity set, but there exist some entity types for which a key attribute can’t
be defined. These are called weak entity types.
The entity sets which do not have sufficient attributes to form a primary key are
known as weak entity sets, and the entity sets which have a primary key are
known as strong entity sets.
As weak entities do not have a primary key, they cannot be identified on
their own, so they depend on some other entity (known as the owner entity). A
weak entity has a total participation constraint (existence dependency) in its
identifying relationship with the owner entity. Weak entity types have partial keys.
Partial keys are sets of attributes with the help of which the tuples of the weak
entities can be distinguished and identified.
Note – A weak entity always has total participation, but a strong entity may not have
total participation.
A weak entity depends on a strong entity for its existence.
Unlike a strong entity, a weak entity does not have a primary key; it has a partial
(discriminator) key. A weak entity is represented by a double rectangle. The
relationship between one strong and one weak entity is represented by a double diamond.

Weak entities are represented with double rectangular box in the ER Diagram
and the identifying relationships are represented with double diamond. Partial
Key attributes are represented with dotted lines.

Example-1:
In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is the
identifying relationship and ‘Payment Number’ is the partial key. Primary Key of
the Loan along with the partial key would be used to identify the records.
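
A minimal sketch of how this Loan/Payment example might be stored, using Python's built-in `sqlite3` module. The column names, the amounts, and the choice of `ON DELETE CASCADE` to express the existence dependency are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE loan (
        loan_number INTEGER PRIMARY KEY,
        amount      REAL
    );
    -- Weak entity: identified by the owner's key plus its partial key.
    CREATE TABLE payment (
        loan_number    INTEGER NOT NULL
                       REFERENCES loan(loan_number) ON DELETE CASCADE,
        payment_number INTEGER NOT NULL,
        payment_amount REAL,
        PRIMARY KEY (loan_number, payment_number)
    );
""")

# Made-up sample rows; payment number 1 exists for both loans.
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [(101, 5000.0), (102, 8000.0)])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)",
                 [(101, 1, 500.0), (101, 2, 500.0), (102, 1, 800.0)])

# A payment cannot exist without its loan.
try:
    conn.execute("INSERT INTO payment VALUES (999, 1, 100.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting a loan removes the payments that depend on it.
conn.execute("DELETE FROM loan WHERE loan_number = 101")
remaining = conn.execute(
    "SELECT loan_number, payment_number FROM payment").fetchall()
print(orphan_rejected, remaining)
```

Payment number 1 appears under both loans, which is why the partial key alone cannot identify a payment and the owner's key has to be part of the primary key.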

Example-2:
The existence of rooms is entirely dependent on the existence of a hotel. So room
can be seen as the weak entity of the hotel.
Example-3:
The bank account of a particular bank has no existence if the bank doesn’t exist
anymore.
Example-4:
A company may store the information of dependents (Parents, Children, Spouse)
of an Employee. But the dependents don’t have existence without the employee.
So Dependent will be weak entity type and Employee will be Identifying Entity
type for Dependent.
Other examples:

Strong entity | Weak entity
Order | Order Item
Employee | Dependent
Class | Section
Host | Logins
Note – Strong-Weak entity set always has parent-child relationship.

ERD symbols and notations:


ERD entity symbols

Name | Description
Strong entity | These shapes are independent from other entities, and are often called parent entities, since they will often have weak entities that depend on them. They will also have a primary key, distinguishing each occurrence of the entity.
Weak entity | Weak entities depend on some other entity type. They don't have primary keys, and have no meaning in the diagram without their parent entity.
Associative entity | Associative entities relate the instances of several entity types. They also contain attributes specific to the relationship between those entity instances.

ERD relationship symbols

Within entity-relationship diagrams, relationships are used to
document the interaction between two entities. Relationships are
usually verbs such as assign, associate, or track and provide useful
information that could not be discerned with just the entity types.

Name | Description
Relationship | Relationships are associations between or among entities.
Weak relationship | Weak relationships are connections between a weak entity and its owner.

ERD attribute symbols

ERD attributes are characteristics of the entity that help users to
better understand the database. Attributes are included to give
details of the various entities that are highlighted in a conceptual ER
diagram.

Name | Description
Attribute | Attributes are characteristics of an entity, a many-to-many relationship, or a one-to-one relationship.
Multivalued attribute | Multivalued attributes are those that can take on more than one value.
Derived attribute | Derived attributes are attributes whose value can be calculated from related attribute values.
Relationship | Relationships are associations between or among entities.
Arrow notation

Arrow notation is simple and easily recognizable. Its main focus is the number of
relationships entities have within a system.

Arrow notation symbols

• Zero or one relationship – A single-headed arrow, with an open circle on the line
• One relationship – A straight line with one arrowhead
• Zero or many relationships – Two arrowheads, one sitting just behind the other,
and a straight line
• One or many relationships – Two arrowheads, like above, with a short,
perpendicular line
Barker’s notation

Invented by Richard Barker, this notation is most commonly used for describing data
in Oracle. Barker’s notations are widely used, and most people familiar with
diagrams have seen these notations before.

Barker’s notation symbols

Entities are displayed as rectangles with rounded edges, attributes with one of three
symbols inside the entity box, and relationships with various lines.

Attributes can be shown in one of three ways:

• Unique identifier – Shown with a pound sign, or hashtag (#)
• Mandatory – This attribute is shown with an asterisk (*)
• Optional – Represented by ‘O’

Relationships are drawn to show both optionality and degree, so each
relationship line tells the reader two things.

Optionality symbols

• Mandatory relationship – A straight, solid line
• Optional relationship – A dashed line

Relationship degree symbols

• One to one – One solid line, like with mandatory relationships
• One to many – On one end of the line, there are two branching lines,
like a fork or a trident
• Many to many – Trident shapes are on both ends of the line
• UID bar – A perpendicular line, just behind the trident shape on a
relationship line

Chen notation

Peter Chen, the creator of Chen’s notation, invented this more detailed way of
notation in 1976. Chen’s notation was one of the first to be used in software design,
and is still popular in ERD creation. Unlike Barker’s notation and UML, entities,
relationships, and attributes have many different ways of being represented.

Chen notation symbols

Let’s start with entities. An entity is shown in a rectangle, just like in many other
notations. But, that is where the similarities stop. There are 2 more ways to describe
entities:

 Weak entity – A rectangle within a rectangle


 Associative entity – A diamond within a rectangle

Attributes are in ovals. Here are some other symbols used to define attributes:

 Key attribute – The title of the attribute is underlined


 Partial key attribute – The attribute’s name is underlined with a
dashed line
 Composite attributes – These attributes branch off from a larger
attribute, and are a different color
 Multivalued attribute – An oval within an oval
 Computed/derived attribute – An oval with a dashed line

Relationships are defined with optionality, cardinality, degree, participation, and
type, using lines and diamonds.

Type

 Strong relationship – A solid-lined diamond


 Weak relationship – A diamond within a diamond, like a weak
entity

Optionality

 Mandatory – A solid line


 Optional – A dashed line

Note that these are the same as in Barker’s notation.

Cardinality (degree)

 One to one – A 1 is at each end of the relationship


 One to many – A 1 is at one end, and N is at the other. N represents
‘many’.
 Many to one – Like one to many, but reversed
 Many to many – M is on one end, and N is on the other

Participation

 Total participation – Two parallel lines


 Partial participation – One line
Crow notation
Crow’s foot notation, often called just Crow notation, uses many of the same symbols
as Barker’s notation and arrow notation, but in different ways. Entities are in
rectangles with their attributes inside. Relationships are defined much like they are
in other notations, but the major difference is the presence of multiplicities.
Multiplicities are symbols that tell the reader the number of times instances can
associate with others.

Crow notation symbols

Multiplicity symbols

There are two marks that indicate multiplicity. The first mark, closest to the end of
the line, represents the maximum number of times an instance of an entity can be
associated with other instances:

 One time – A short, perpendicular line


 Many – A three-pronged line, like in Barker’s notation

The second mark, behind the first, represents the minimum number of times an
instance of an entity can be associated with other instances. The minimum can only
be zero or one, and they are referred to as ‘optional’ or ‘mandatory’, respectively.

Some of the crow’s foot notation relationships

 Optional – An open circle


 Mandatory – A short, perpendicular line
IDEF1X notation
Short for Integration Definition for Information Modeling, this notation style is
simple and uses letters more often than the other notations listed here.

IDEF1X notation symbols

Attributes don’t have any specific symbols, but their placement determines what type
they are. Primary attributes are listed in the top half of the entity rectangle, and other
attributes are listed in the lower half.

Like in Chen notation, entities can be described as weak or strong, or, in other
words, dependent or independent.

 Strong entity – A rectangle with sharp corners


 Weak entity – A rectangle with rounded corners

Relationships are shown with either dashed or solid lines, with filled-in circles at one
or both ends, like many other notation types. Letters are at the end of relationship
lines, and tell the reader the cardinalities of the entity relationships.

• Zero, one, or more – No letter
• One or more – P
• Zero or one – Z
• Cardinality is specified – {n}
• N to M – N-M
• Exactly N – N

UML notation
UML, which stands for Unified Modeling Language, is extremely popular and used in
many different diagram types.

UML notation symbols

 Entity – Rectangle shape


 Generalization – Empty arrow at the end of a relationship line
 Aggregation – Empty diamond at the end of a relationship line
 Composition – Filled in diamond at the end of a relationship line

Relationships only have solid lines, and have numbers signifying cardinalities at the
end of the line. There are a few ways to show cardinality.

 Zero or one – 0..1


 Only one – 1
 Zero or more – 0..*
 One or more – 1..*
 Specific range – n..m

ERD Issues:

1) Use of Entity Set vs Attributes


The choice between an entity set and an attribute depends on the structure of the
real-world enterprise that is being modelled and the semantics associated with its
attributes. It is a mistake to use the primary key of one entity set as an attribute of
another entity set; a relationship should be used instead. Likewise, the primary key
attributes of the participating entity sets are already implicit in a relationship set, so
they should not be designated again as attributes of that relationship set.

2) Use of Entity Set vs. Relationship Sets


It is not always clear whether an object is best expressed by an entity set or by a relationship
set. A useful guideline is to designate a relationship set to describe an action that occurs
between entities. If an object needs to be represented as a relationship set, it is better not to
model it as an entity set as well.

3) Use of Binary vs n-ary Relationship Sets


Generally, the relationships described in databases are binary relationships. However,
non-binary relationships can be represented by several binary relationships. For example, we
can create a ternary relationship 'parent' that relates a child to his father and his mother.
The same information can also be represented by two binary relationships, 'mother' and
'father', each relating a parent to the child. Thus, it is possible to represent a non-binary
relationship by a set of distinct binary relationships.

4) Placing Relationship Attributes


The cardinality ratio of a relationship can be an effective guide to the placement of
relationship attributes. It is better to associate the attributes of one-to-one or
one-to-many relationship sets with one of the participating entity sets, instead of with
the relationship set. The decision to place an attribute on a relationship or on an entity
should reflect the characteristics of the real-world enterprise that is being modelled.

For example, if an attribute is determined by the combination of the participating
entity sets, rather than by either entity on its own, that attribute must be associated
with the many-to-many relationship set.

Thus, designing and modelling an ER diagram requires overall knowledge of each part
that is involved. The basic requirement is to analyse the real-world enterprise and the
connectivity of each entity or attribute with the others.

12 Codd's Rules
Not every database that has tables and constraints can be called a relational database
system, and a database that merely uses the relational data model is not automatically a
Relational Database Management System (RDBMS). Certain rules define what a database
must satisfy to be a true RDBMS. These rules were proposed in 1985 by Dr. Edgar F. Codd
(E.F. Codd), who carried out extensive research on the relational model of database
systems. Codd presented 13 rules, numbered 0 to 12, to test a DBMS against his relational
model; a database that follows these rules is called a true relational database (RDBMS).
Although there are 13 of them, they are popularly known as Codd's 12 rules.
Rule 0: The Foundation Rule
The database must be in relational form, so that the system can manage the database
entirely through its relational capabilities.

Rule 1: Information Rule


A database contains various information, and all of this information must be stored as
values in the cells of tables, organized in rows and columns.

Rule 2: Guaranteed Access Rule


Every individual data item (atomic value) must be logically accessible from a relational
database using the combination of table name, primary key value, and column name.

Rule 3: Systematic Treatment of Null Values


This rule defines the systematic treatment of Null values in database records. A null
value can have various meanings in the database, such as missing data, no value in a cell,
inapplicable information, or unknown data. A primary key must never be null.
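The effect of this rule can be observed with a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the student table and its rows are hypothetical, and other database systems may differ in details.

```python
import sqlite3

# In-memory database; the table and its rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO student (stu_id, phone) VALUES (?, ?)",
    [(1, "555-0101"), (2, None)],  # Python None is stored as SQL NULL
)

# A comparison with NULL is never true, so "= NULL" matches no rows.
equals_null = conn.execute(
    "SELECT stu_id FROM student WHERE phone = NULL"
).fetchall()

# IS NULL is the predicate that actually tests for a missing value.
is_null = conn.execute(
    "SELECT stu_id FROM student WHERE phone IS NULL"
).fetchall()

print(equals_null)  # []
print(is_null)      # [(2,)]
conn.close()
```

The point of the sketch is that NULL is treated as "unknown" rather than as an ordinary value, independent of the column's data type.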

Rule 4: Active/Dynamic Online Catalog based on the relational model
The description of the entire logical structure of the database must be stored online in a
catalog, known as the data dictionary. Authorized users can query this catalog using the
same query language that they use to access the database itself.
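As an illustration of a catalog that is queried like ordinary data, here is a short sketch using Python's built-in sqlite3 module. SQLite exposes its catalog as a table named sqlite_master; other systems use different catalog names, and the department table here is only an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT)")

# SQLite keeps its catalog in a table named sqlite_master,
# which is read with an ordinary SELECT statement.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(catalog_rows)  # [('table', 'department')]
conn.close()
```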

Rule 5: Comprehensive Data SubLanguage Rule


A relational database may support various languages, but at least one language must
have a linear, well-defined syntax that can be written as character strings, and it must
comprehensively support data definition, view definition, data manipulation, integrity
constraints, and transaction management operations. If the database allows access to the
data without using this language, it is considered a violation of the rule.

Rule 6: View Updating Rule


All views that are theoretically updatable must also be updatable in practice by the
database system.
Rule 7: Relational Level Operation (High-Level Insert, Update and Delete) Rule
A database system must support high-level relational operations such as insert,
update, and delete on whole sets of rows, not just on a single row at a time. It must
also support the union, intersection and minus operations.
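Set-level operations can be sketched as follows with Python's built-in sqlite3 module. The course_a and course_b tables are hypothetical, and note that SQLite spells the minus operation EXCEPT (Oracle uses MINUS).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_a (stu_id INTEGER)")
conn.execute("CREATE TABLE course_b (stu_id INTEGER)")
conn.executemany("INSERT INTO course_a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO course_b VALUES (?)", [(2,), (3,), (4,)])

def ids(sql):
    """Run a compound query and return the ids in ascending order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY stu_id")]

union = ids("SELECT stu_id FROM course_a UNION SELECT stu_id FROM course_b")
intersection = ids("SELECT stu_id FROM course_a INTERSECT SELECT stu_id FROM course_b")
minus = ids("SELECT stu_id FROM course_a EXCEPT SELECT stu_id FROM course_b")

# One UPDATE statement changes a whole set of rows, not a single row.
updated = conn.execute("UPDATE course_a SET stu_id = stu_id + 10").rowcount

print(union, intersection, minus, updated)
conn.close()
```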

Rule 8: Physical Data Independence Rule


Applications that access the database must be independent of how the data is physically
stored. If the way data is stored or the physical structure of the database is changed, it
must not have any effect on external applications that are accessing the data from the
database.

Rule 9: Logical Data Independence Rule


It is similar to physical data independence. It means that if any changes occur at the
logical level (table structures), they should not affect the user's view (application). For
example, if a table is split into two tables, or two tables are joined to create a
single table, these changes should not impact the user's view of the application.

Rule 10: Integrity Independence Rule


A database must maintain integrity independence: integrity constraints are defined in the
database itself, using its query language (such as SQL), rather than in application
programs. Entered values should not rely on any external factor or application to
maintain integrity. This also helps keep the database independent of each front-end
application.

Rule 11: Distribution Independence Rule


The distribution independence rule states that a database must work properly even if
its data is stored in different locations and used by different end users. When a user
accesses the database through an application, they should not need to be aware that
the data is distributed; to them, the data should always appear to be located at a
single site. Every end user should be able to run the same SQL queries regardless of
where the data is actually stored.

Rule 12: Non Subversion Rule


The non-subversion rule concerns systems that use SQL to store and manipulate the
data in the database. If a system has a low-level or separate language other than
SQL to access the database, that language must not be able to subvert or bypass the
integrity rules when transforming data.
Unit 4
SQL : create and manage using create and alter
SQL | DDL, DML, TCL and DCL
In this article, we’ll be discussing Data Definition Language, Data Manipulation
Language, Transaction Control Language, and Data Control Language.

DDL (Data Definition Language) :


Data Definition Language is used to define the database structure or schema. DDL is
also used to specify additional properties of the data. The storage structure and
access methods used by the database system are specified by a set of statements in a
special type of DDL called a data storage and definition language. These statements
define the implementation details of the database schema, which are usually hidden from
the users. The data values stored in the database must satisfy certain consistency
constraints.
For example, suppose the university requires that the account balance of a
department must never be negative. The DDL provides facilities to specify such
constraints. The database system checks these constraints every time the database
is updated. In general, a constraint can be an arbitrary predicate pertaining to the
database. However, arbitrary predicates may be costly to test. Thus, the
database system implements integrity constraints that can be tested with minimal
overhead.
1. Domain Constraints : A domain of possible values must be associated with
every attribute (for example, integer types, character types, date/time types).
Declaring an attribute to be of a particular domain acts as a constraint on
the values that it can take.
2. Referential Integrity : There are cases where we wish to ensure that a value
that appears in one relation for a given set of attributes also appears in a
certain set of attributes in another relation. For example, the department
listed for each course must be one that actually exists.
3. Assertions : An assertion is any condition that the database must always
satisfy. Domain constraints and referential integrity constraints are special
forms of assertions.
4. Authorization : We may want to differentiate among the users as far as the
type of access they are permitted on various data values in the database. These
differentiations are expressed in terms of authorization. The most common
are:
read authorization – which allows reading but not modification of data;
insert authorization – which allows insertion of new data but not modification of
existing data;
update authorization – which allows modification, but not deletion.
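The first two kinds of constraint can be tried out in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the CHECK on budget stands in for the "never negative" requirement, the course table is hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        budget    NUMERIC CHECK (budget >= 0)  -- the value may never be negative
    )""")
conn.execute("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        dept_name TEXT REFERENCES department (dept_name)  -- must name an existing department
    )""")
conn.execute("INSERT INTO department VALUES ('History', 50000)")
conn.execute("INSERT INTO course VALUES ('HIS-101', 'History')")

rejected = []
for label, sql in [
    ("negative budget", "INSERT INTO department VALUES ('Music', -1)"),
    ("unknown department", "INSERT INTO course VALUES ('BIO-101', 'Biology')"),
]:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        # The database itself refuses the row; no application check is needed.
        rejected.append(label)

print(rejected)  # ['negative budget', 'unknown department']
conn.close()
```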
Some Commands:

CREATE : creates objects in the database
ALTER : alters the structure of the database
DROP : deletes objects from the database
RENAME : renames an object
The following SQL DDL statement defines the department table :

create table department
(dept_name char(20),
building char(15),
budget numeric(12,2));
Execution of the above DDL statement creates the department table with three
columns – dept_name, building, and budget; each of which has a specific datatype
associated with it.

DML (Data Manipulation Language) :


DML statements are used for managing data within schema objects.
DMLs are of two types –

1. Procedural DMLs : require a user to specify what data are needed and how
to get those data.
2. Declarative DMLs (also referred as Non-procedural DMLs) : require a user
to specify what data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than procedural DMLs.
However, since a user does not have to specify how to get the data, the database
system has to figure out an efficient means of accessing data.
Some Commands :

SELECT: retrieves data from the database
INSERT: inserts data into a table
UPDATE: updates existing data within a table
DELETE: deletes records from a table; the space for the records remains
Example of SQL query that finds the names of all instructors in the History
department :

select instructor.name
from instructor
where instructor.dept_name = 'History';
The query specifies that those rows from the table instructor where the dept_name is
History must be retrieved and the name attributes of these rows must be displayed.

TCL (Transaction Control Language) :


Transaction Control Language commands are used to manage transactions in the
database. They manage the changes made by DML statements and also allow
statements to be grouped together into logical transactions.
Examples of TCL commands –

COMMIT: permanently saves the changes made by a transaction into the database.
ROLLBACK: restores the database to the last committed state. It is also used with
the SAVEPOINT command to jump back to a savepoint in a transaction.
SAVEPOINT: marks a point within a transaction so that you can roll back to that
point whenever necessary.
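The three commands can be seen working together in the sketch below, which uses Python's built-in sqlite3 module. The account table is hypothetical, and the connection is opened with isolation_level=None so that the transaction boundaries are exactly the SQL statements shown.

```python
import sqlite3

# isolation_level=None leaves transaction control entirely to the SQL below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (acc_id INTEGER PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("SAVEPOINT after_first")      # mark a point inside the transaction
conn.execute("INSERT INTO account VALUES (2, 200)")
conn.execute("ROLLBACK TO after_first")    # undo only the second insert
conn.execute("COMMIT")                     # make the first insert permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")                   # return to the last committed state

rows = conn.execute("SELECT acc_id, balance FROM account").fetchall()
print(rows)  # [(1, 100)]
conn.close()
```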
DCL (Data Control Language) :
A Data Control Language is a syntax similar to a computer programming language
used to control access to data stored in a database (Authorization). In particular, it is
a component of Structured Query Language (SQL).
Examples of DCL commands :

GRANT: allow specified users to perform specified tasks.


REVOKE: cancel previously granted or denied permissions.
The operations for which privileges may be granted to or revoked from a user or role
apply to both the Data definition language (DDL) and the Data manipulation
language (DML), and may include CONNECT, SELECT, INSERT, UPDATE,
DELETE, EXECUTE and USAGE.
In the Oracle database, executing a DCL command issues an implicit commit.
Hence, you cannot roll back the command.

DDL Statements for Creating and Managing Tables:


CREATE TABLE:
 Creates a new table in the database.
 Syntax:
SQL
CREATE TABLE table_name (
column1 data_type constraints,
column2 data_type constraints,
...
);
 Example:
SQL
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100) UNIQUE
);
ALTER TABLE:
 Modifies the structure of an existing table.
 Common uses:
o Adding a new column:
SQL
ALTER TABLE table_name ADD column_name data_type constraints;
o Dropping a column:
SQL
ALTER TABLE table_name DROP COLUMN column_name;
o Renaming a column:
SQL
ALTER TABLE table_name RENAME COLUMN old_name TO new_name;
o Modifying a column's data type (the exact syntax varies between database systems):
SQL
ALTER TABLE table_name ALTER COLUMN column_name SET DATA TYPE new_data_type;
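To see CREATE TABLE and ALTER TABLE change a schema, here is a small sketch using Python's built-in sqlite3 module. The customers table follows the example above; the phone column is a hypothetical addition, and only ADD COLUMN is shown because support for the other ALTER TABLE forms differs between systems and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INT PRIMARY KEY,
        first_name  VARCHAR(50),
        last_name   VARCHAR(50),
        email       VARCHAR(100) UNIQUE
    )""")

def column_names():
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

before = column_names()
conn.execute("ALTER TABLE customers ADD COLUMN phone VARCHAR(20)")
after = column_names()

print(before)  # ['customer_id', 'first_name', 'last_name', 'email']
print(after)   # ['customer_id', 'first_name', 'last_name', 'email', 'phone']
conn.close()
```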
Other Common DDL Statements:
 DROP TABLE: Deletes a table from the database.
 TRUNCATE TABLE: Deletes all data from a table, but keeps the table
structure.
 CREATE INDEX: Creates an index on one or more columns to improve query
performance.
 DROP INDEX: Deletes an index.


DML Commands in SQL


DML is an abbreviation of Data Manipulation Language.

The DML commands in Structured Query Language change the data present in the
SQL database. We can easily access, store, modify, update and delete the existing
records from the database using DML commands.

Following are the four main DML commands in SQL:

1. SELECT Command
2. INSERT Command
3. UPDATE Command
4. DELETE Command

SELECT DML Command


SELECT is the most important data manipulation command in Structured Query
Language. The SELECT command shows the records of the specified table. By using the
WHERE clause, it can also show only those records that satisfy a given condition.

Syntax of SELECT DML command

SELECT column_Name_1, column_Name_2, ….., column_Name_N FROM Name_of_table;

Here, column_Name_1, column_Name_2, ….., column_Name_N are the names of
those columns whose data we want to retrieve from the table.

If we want to retrieve the data from all the columns of the table, we have to use the
following SELECT command:

SELECT * FROM table_name;


Examples of SELECT Command
Example 1: This example shows all the values of every column from the table.

SELECT * FROM Student;

This SQL statement displays the following values of the student table:

Student_ID   Student_Name   Student_Marks
BCA1001      Abhay          85
BCA1002      Anuj           75
BCA1003      Bheem          60
BCA1004      Ram            79
BCA1005      Sumit          80

Example 2: This example shows all the values of specific columns from the table.

SELECT Emp_Id, Emp_Salary FROM Employee;

This SELECT statement displays all the values of the Emp_Id and Emp_Salary columns
of the Employee table:

Emp_Id   Emp_Salary
201      25000
202      45000
203      30000
204      29000
205      40000

Example 3: This example describes how to use the WHERE clause with the SELECT
DML command.

Let's take the following Student table:

Student_ID   Student_Name   Student_Marks
BCA1001      Abhay          80
BCA1002      Ankit          75
BCA1003      Bheem          80
BCA1004      Ram            79
BCA1005      Sumit          80

If you want to access all the records of those students whose marks are 80 from the
above table, then you have to write the following DML command in SQL:

SELECT * FROM Student WHERE Student_Marks = 80;

The above SQL query shows the following table in result:

Student_ID   Student_Name   Student_Marks
BCA1001      Abhay          80
BCA1003      Bheem          80
BCA1005      Sumit          80

INSERT DML Command


INSERT is another important data manipulation command in Structured Query
Language, which allows users to insert data in database tables.

Syntax of INSERT Command

INSERT INTO TABLE_NAME (column_Name1, column_Name2, column_Name3, .... column_NameN) VALUES (value_1, value_2, value_3, .... value_N);

Examples of INSERT Command


Example 1: This example describes how to insert the record in the database table.

Let's take the following student table, which consists of only 2 student records.

Stu_Id   Stu_Name   Stu_Marks   Stu_Age
101      Ramesh     92          20
201      Jatin      83          19

Suppose you want to insert a new record into the student table. For this, you have to
write the following DML INSERT command (note that the character value is enclosed in
single quotes):

INSERT INTO Student (Stu_Id, Stu_Name, Stu_Marks, Stu_Age) VALUES (104, 'Anmol', 89, 19);

UPDATE DML Command


UPDATE is another important data manipulation command in Structured Query
Language, which allows users to update or modify the existing data in database tables.

Syntax of UPDATE Command

UPDATE Table_name SET column_name1 = value_1, ….., column_nameN = value_N WHERE CONDITION;

Here, 'UPDATE', 'SET', and 'WHERE' are the SQL keywords, and 'Table_name' is the
name of the table whose values you want to update.

Examples of the UPDATE command


Example 1: This example describes how to update the value of a single field.

Let's take a Product table consisting of the following records:

Product_Id   Product_Name   Product_Price   Product_Quantity
P101         Chips          20              20
P102         Chocolates     60              40
P103         Maggi          75              5
P201         Biscuits       80              20
P203         Namkeen        40              50

Suppose you want to update the Product_Price of the product whose Product_Id is
P102. To do this, you have to write the following DML UPDATE command:

UPDATE Product SET Product_Price = 80 WHERE Product_Id = 'P102';

Example 2: This example describes how to update the value of multiple fields of
the database table.

Let's take a Student table consisting of the following records:

Stu_Id   Stu_Name   Stu_Marks   Stu_Age
101      Ramesh     92          20
201      Jatin      83          19
202      Anuj       85          19
203      Monty      95          21
102      Saket      65          21
103      Sumit      78          19
104      Ashish     98          20

Suppose you want to update Stu_Marks and Stu_Age of those students whose Stu_Id is
103 or 202. To do this, you have to write the following DML UPDATE command. The
condition uses IN (equivalently, OR) because no single row can have both Stu_Id values
at once:

UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id IN (103, 202);
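When a condition has to match several key values, the choice between AND and IN (or OR) decides how many rows are touched. The following sketch, using Python's built-in sqlite3 module and a three-row subset of the Student table, counts the rows affected by each form; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT, "
    "Stu_Marks INTEGER, Stu_Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(101, "Ramesh", 92, 20), (202, "Anuj", 85, 19), (103, "Sumit", 78, 19)],
)

# No single row has Stu_Id equal to both values, so AND updates nothing.
with_and = conn.execute(
    "UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 "
    "WHERE Stu_Id = 103 AND Stu_Id = 202"
).rowcount

# IN matches a row when its Stu_Id equals any value in the list.
with_in = conn.execute(
    "UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 "
    "WHERE Stu_Id IN (103, 202)"
).rowcount

print(with_and, with_in)  # 0 2
conn.close()
```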

DELETE DML Command


DELETE is a DML command which allows SQL users to remove single or multiple
existing records from the database tables.

Rows removed by DELETE are not lost permanently until the transaction is committed;
until then, the deletion can be rolled back. We use the WHERE clause with the DELETE
command to select specific rows from the table.

Syntax of DELETE Command

DELETE FROM Table_Name WHERE condition;

Examples of DELETE Command
Example 1: This example describes how to delete a single record from the table.

Let's take a Product table consisting of the following records:

Product_Id   Product_Name   Product_Price   Product_Quantity
P101         Chips          20              20
P102         Chocolates     60              40
P103         Maggi          75              5
P201         Biscuits       80              20
P203         Namkeen        40              50

Suppose you want to delete that product from the Product table whose Product_Id is
P203. To do this, you have to write the following DML DELETE command:

DELETE FROM Product WHERE Product_Id = 'P203';

Example 2: This example describes how to delete the multiple records or rows
from the database table.

Let's take a Student table consisting of the following records:

Stu_Id   Stu_Name   Stu_Marks   Stu_Age
101      Ramesh     92          20
201      Jatin      83          19
202      Anuj       85          19
203      Monty      95          21
102      Saket      65          21
103      Sumit      78          19
104      Ashish     98          20

Suppose you want to delete the records of those students whose marks are greater than
70. To do this, you have to write the following DML DELETE command:

DELETE FROM Student WHERE Stu_Marks > 70;

Retrieving data using the SQL Select Statement


SQL is a comprehensive database language. SQL, pronounced
Sequel or simply S-Q-L, is a computer programming language
used for querying relational databases following a nonprocedural
approach. When you extract information from a database using
SQL, this is termed querying the database.

A relational database is implemented through the use of a
Relational Database Management System (RDBMS). An RDBMS
performs all the basic functions of the DBMS software mentioned
above along with a multitude of other functions that make the
relational model easier to understand and to implement. RDBMS
users manipulate data through the use of a special data
manipulation language. Database structures are defined through
the use of a data definition language. The commands that system
users execute in order to store and retrieve data can be entered
at a terminal with an RDBMS interface by typing the commands,
or entered through use of some type of graphical interface. The
DBMS then processes the commands.

Capabilities of the SELECT Statement


Data retrieval from a database is done through appropriate and
efficient use of SQL. Three concepts from relational theory
encompass the capability of the SELECT statement: projection,
selection, and joining.

• Projection: A project operation selects only certain columns
(fields) from a table. The result table has a subset of the
available columns and can include anything from a single
column to all available columns.
• Selection: A select operation selects a subset of rows
(records) in a table (relation) that satisfy a selection
condition. It works by conditional filtering. The subset can
range from no rows, if none of the rows satisfy the selection
condition, to all rows in a table.
• Joining: A join operation combines data from two or more
tables based on one or more common column values. A join
operation enables an information system user to process the
relationships that exist between tables. The join operation is
very powerful because it allows system users to investigate
relationships among data elements that might not be
anticipated at the time that a database is designed.

Consider an EMPLOYEES table and a DEPARTMENTS table. Fetching
only first_name, department_id and salary from the EMPLOYEES
table is Projection. Fetching details of the employees whose salary
is less than 5000 from the EMPLOYEES table is Selection. Fetching
each employee's first name and department name by joining EMPLOYEES
and DEPARTMENTS is Joining.
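The three operations can be compared side by side in the following sketch, which uses Python's built-in sqlite3 module. The table layouts and the sample rows are assumptions made for illustration; only the table names and the column names first_name, department_id and salary come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY, first_name TEXT,
        salary INTEGER, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO employees VALUES
        (1, 'Asha', 4000, 10), (2, 'Ravi', 6000, 20), (3, 'Meena', 4500, 20);
""")

# Projection: keep only some columns of every row.
projection = conn.execute(
    "SELECT first_name, department_id, salary FROM employees ORDER BY employee_id"
).fetchall()

# Selection: keep only the rows that satisfy a condition.
selection = conn.execute(
    "SELECT * FROM employees WHERE salary < 5000 ORDER BY employee_id"
).fetchall()

# Joining: combine rows of two tables on a common column value.
joining = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees e JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.employee_id""").fetchall()

print(projection)
print(selection)
print(joining)
conn.close()
```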
Basic SELECT statement
The basic syntax for a SELECT statement is presented below.

SELECT [DISTINCT | ALL] {* | select_list}
FROM {table_name [alias] | view_name}
[{table_name [alias] | view_name}]...
[WHERE condition]
[GROUP BY condition_list]
[HAVING condition]
[ORDER BY {column_name | column_# [ ASC | DESC ] } ... ]

The SELECT clause is mandatory and carries out the relational
project operation.

The FROM clause is also mandatory. It identifies one or more
tables and/or views from which to retrieve the column data
displayed in a result table.

The WHERE clause is optional and carries out the relational select
operation. It specifies which rows are to be selected.

The GROUP BY clause is optional. It organizes data into groups by
one or more column names listed in the SELECT clause.

The optional HAVING clause sets conditions regarding which
groups to include in a result table. The groups are specified by
the GROUP BY clause.

The ORDER BY clause is optional. It sorts query results by one or
more columns in ascending or descending order.
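How the clauses cooperate in one query is easier to see with data. Below is a minimal sketch using Python's built-in sqlite3 module; the employees rows and the thresholds (salary of at least 3000, at least two employees per group) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        ('Asha', 10, 4000), ('Sara', 10, 7000),
        ('Ravi', 20, 6000), ('Meena', 20, 4500), ('Tom', 20, 1000),
        ('John', 30, 3500);
""")

rows = conn.execute("""
    SELECT department_id, COUNT(*) AS staff, SUM(salary) AS total_salary
    FROM employees
    WHERE salary >= 3000          -- row filter, applied before grouping (drops Tom)
    GROUP BY department_id        -- one result row per department
    HAVING COUNT(*) >= 2          -- group filter, applied after grouping (drops dept 30)
    ORDER BY total_salary DESC    -- sort the final result
""").fetchall()

print(rows)  # [(10, 2, 11000), (20, 2, 10500)]
conn.close()
```

WHERE removes individual rows before any groups are formed, while HAVING removes whole groups afterwards, which is why the two clauses are not interchangeable.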

Arithmetic expressions and NULL values in the SELECT statement
An arithmetic expression can be created using the column names,
operators and constant values to embed an expression in a
SELECT statement. The operators applicable to a column depend
on the column's data type; arithmetic operators, for instance, do
not apply to character literal values. For example,

SELECT employee_id, sal * 12 ANNUAL_SAL
FROM employees;

The above query contains the arithmetic expression (sal * 12) to
calculate the annual salary of each employee.

Arithmetic operators
Operators act upon the columns (known as operands) to produce a
result. When an expression contains multiple operators, the
order of evaluation is decided by operator precedence. Here are
the elementary rules of precedence -

• Multiplication and division occur before addition and
  subtraction.
• Operators of the same priority are evaluated from left to
  right.
• Use parentheses to override the default behavior of the
  operators.

The table below shows the precedence of the operators.

Description      Operator   Precedence
--------------   --------   ----------
Addition         +          Lowest
Subtraction      -          Lowest
Multiplication   *          Medium
Division         /          Medium
Brackets         ( )        Highest

Examine the below queries (a), (b), and (c)

(a) SQL> SELECT 2*35 FROM DUAL;
(b) SQL> SELECT salary + 1500 FROM employees;
(c) SQL> SELECT first_name, salary, salary +
    (commission_pct * salary) FROM employees;

Query (a) multiplies two numbers, while (b) shows addition of
$1500 to salaries of all employees. Query (c) shows the addition
of commission component to employee's salary. As per the
precedence, first commission would be calculated on the salary,
and then added to the salary.
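
The precedence rules can be checked with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so that the example runs without a server), and the single employee row is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, salary REAL, commission_pct REAL)"
)
conn.execute("INSERT INTO employees VALUES ('Asha', 1000, 0.5)")  # hypothetical row

# Multiplication binds tighter than addition: 1000 + (0.5 * 1000)
default_order = conn.execute(
    "SELECT salary + commission_pct * salary FROM employees"
).fetchone()[0]

# Parentheses force the addition to happen first: (1000 + 0.5) * 1000
forced_order = conn.execute(
    "SELECT (salary + commission_pct) * salary FROM employees"
).fetchone()[0]

print(default_order, forced_order)
```

The two results differ, which shows that brackets change the order of evaluation.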

Column Alias
An alias is used to rename a column or an expression during
display. The alias to a column or an expression appears as the
heading in the output of a query. It is useful in providing a
meaningful heading to long expressions in the SELECT query. By
default, the alias appears in uppercase in the query output
without spaces. To override this behavior, the alias must be
enclosed within double quotes to preserve the case and spaces in
the alias name.

SELECT price * 2 AS DOUBLE_PRICE, price * 2 "Double Price"
FROM products;

DOUBLE_PRICE Double Price
------------ ------------
        39.9         39.9
          60           60
       51.98        51.98

Concatenation operators
The concatenation operator joins two string values or
expressions in a SELECT query. The double vertical bar symbol
(||) is used as the string concatenation operator. It applies
only to character and string column values and produces a new
character expression. Example

SQL> SELECT 'ORACLE'||' CERTIFICATION' FROM dual;

The above query shows concatenation of two character literal
values.

Literals
Any hard-coded value in the SELECT clause which is not stored in
the database is known as a literal. It can be a number,
character, or date value. Character and date values must be
enclosed within quotes. Consider the below SQL queries, which
are examples of using literals of different data types.

The query below uses two character literals to join them together.

SQL> SELECT 'ORACLE'||' CERTIFICATION' FROM DUAL;

The query below uses character literals to pretty print the
employee's salary.

SQL> SELECT first_name ||' earns '|| salary ||' as of '|| sysdate
FROM employees;

Quote Operator
The quote operator (q) is used to specify a quotation mark
delimiter of your own. You can choose a convenient delimiter,
depending on the data.

SELECT department_name || ' Department' || q'['s Manager Id: ]'
|| manager_id
FROM departments;

NULL
If a column doesn't have a definite value, it is considered NULL.
A NULL value denotes unknown or unavailable. It is not zero for
numeric values, and not a blank space for character values.

Columns with NULL values can be selected in a SELECT query and
can be part of an arithmetic expression. Any arithmetic
expression using NULL values results in NULL. For this reason,
columns with NULL values must be handled differently by
specifying alternate values using Oracle supplied functions
like NVL or COALESCE.

SQL> SELECT NULL + 1000 NUM
FROM DUAL;

NUM
--------
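
The same behaviour can be reproduced in a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle; SQLite has no NVL, so the standard COALESCE function plays that role here.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")

# Any arithmetic involving NULL yields NULL (None in Python).
null_sum = conn.execute("SELECT NULL + 1000").fetchone()[0]

# Supplying an alternate value first keeps the arithmetic meaningful.
safe_sum = conn.execute("SELECT COALESCE(NULL, 0) + 1000").fetchone()[0]

print(null_sum, safe_sum)
```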

DISTINCT Keyword
If the data is expected to have duplicate results, use the
DISTINCT keyword to eliminate duplicates and display only the
unique results in the query output. Only the selected columns
are checked for duplication, and the duplicate rows are
logically eliminated from the query output. Note that the
DISTINCT keyword must appear just after the SELECT keyword.

The simple query below demonstrates the use of DISTINCT to
display unique department ids from the EMPLOYEES table.

SQL> SELECT DISTINCT DEPARTMENT_ID
FROM employees;

DEPARTMENT_ID
---------------
10
20
30
40
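
As a runnable illustration of duplicate elimination, the sketch below uses Python's built-in sqlite3 module in place of Oracle (an assumption), with a small hypothetical EMPLOYEES table.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 10), ("Bob", 20), ("Cara", 10), ("Dev", 30), ("Eli", 20)],  # hypothetical rows
)

# Without DISTINCT every row contributes a value, duplicates included.
all_ids = [row[0] for row in conn.execute("SELECT department_id FROM employees")]

# DISTINCT keeps one row per unique department_id.
distinct_ids = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT department_id FROM employees ORDER BY department_id"
    )
]

print(len(all_ids), distinct_ids)
```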

DESCRIBE command
The structural metadata of a table may be obtained by querying
the database for the list of columns that comprise it using the
DESCRIBE command. It will list the used column names, their null
property and data type.

Syntax:

DESC[RIBE] [schema.]object_name

For example,

DESC EMPLOYEE

will display the EMPLOYEE table structure i.e. columns, their data
types, precision and nullable property.

Restricting and Sorting Data

The essential capabilities of the SELECT statement are Selection,
Projection and Joining. Displaying specific columns from a table
is known as a project operation. We will now focus on displaying
specific rows of output. This is known as a select operation.
Specific rows can be selected by adding a WHERE clause to a
SELECT query. The WHERE clause appears just after the FROM
clause in the SELECT query hierarchy. This sequence has to be
maintained in all scenarios. If it is violated, Oracle raises an
exception.

Syntax:
SELECT *|{[DISTINCT] column| expression [alias],..}
FROM table
[WHERE condition(s)]

In the syntax,

• WHERE is the keyword that introduces the filter.
• condition contains column names, expressions, constants,
  literals and a comparison operator.

Suppose that your manager is working on the quarterly budget
for your organization. As part of this activity, it is necessary to
produce a listing of each employee's essential details, but only for
employees that are paid at least $25,000 annually. The SQL
query below accomplishes this task. Note the use of the WHERE
clause.

SELECT Employee_ID, Last_Name, First_Name, Salary
FROM employees
WHERE Salary >= 25000;

EMPLOYEE_ID LAST_NAME       FIRST_NAME      SALARY
----------- --------------- --------------- -----------
88303       Jones           Quincey         $30,550.00
88404       Barlow          William         $27,500.00
88505       Smith           Susan           $32,500.00

3 rows selected

Points to be noted -
• A SELECT statement can contain only one WHERE clause.
  However, multiple filter conditions can be combined in the
  WHERE clause using the AND or OR operator.
• The columns, literals or expressions in a predicate clause
  must be of similar or interconvertible data types.
• A column alias cannot be used in the WHERE clause.
• Character literals must be enclosed within single quotation
  marks and are case sensitive.
• Date literals must be enclosed within single quotation marks
  and are format sensitive. Default format is DD-MON-RR.

Comparison Operators
Comparison operators are used in predicates to compare one
term or operand with another term. SQL offers a comprehensive
set of equality, inequality and miscellaneous operators. They can
be used depending on the data and filter condition logic in the
SELECT query. When you use comparison operators in a WHERE
clause, the arguments (objects or values you are comparing) on
both sides of the operator must be either a column name or a
specific value. If a specific value is used, then the value must be
either a numeric value or a literal string. If the value is a
character string or date, you must enter the value within single
quotation marks (' ').

Oracle provides the following comparison operators for use in
equality or inequality conditions.

Operator   Meaning
--------   ------------------------
=          equal to
<          less than
>          greater than
>=         greater than or equal to
<=         less than or equal to
!=         not equal to
<>         not equal to

Other Oracle operators are BETWEEN..AND, IN, LIKE, and IS
NULL.

The BETWEEN Operator

The BETWEEN operator compares a column value against a definite
range. The specified range must have a lower and an upper limit,
and both limits are inclusive during comparison. It is
equivalent to combining a >= condition and a <= condition. It
can be used with numeric, character and date type values.

For example, the WHERE condition SALARY BETWEEN 1500 AND
2500 in a SELECT query will list those employees whose salary is
between 1500 and 2500, both values included.

The IN Operator
The IN operator is used to test a column value against a given
set of values. If the column equals any of the values from the
given set, the condition is satisfied. The condition defined using
the IN operator is also known as the membership condition.

For example, the WHERE condition SALARY IN (1500, 3000, 2500) in
a SELECT query will restrict the rows to those where salary is
1500, 3000 or 2500.

The LIKE Operator

The LIKE operator is used for pattern matching and wildcard
searches in a SELECT query. If a portion of the column value is
unknown, a wildcard can be used to substitute the unknown part.
Because it uses wildcard characters to build up the search
string, such a search is known as a wildcard search. The two
wildcards are percent ('%') and underscore ('_'). Underscore
('_') substitutes exactly one character while percent ('%')
stands for zero or more characters. They can be used in
combination as well.

For example, the below SELECT query lists the first names of
those employees whose last name starts with 'SA'.

SELECT first_name
FROM employees
WHERE last_name LIKE 'SA%';

IS (NOT) NULL Conditions

Note that NULL values cannot be tested using the equality
operator. This is because NULL values are unknown and unassigned
while the equality operator tests for a definite value. The IS
NULL operator serves as the equality test for NULL values of a
column.

For example, the WHERE condition COMMISSION_PCT IS NULL in a
SELECT query will list employees who don't have a commission
percentage.
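
The four conditions above (BETWEEN, IN, LIKE and IS NULL) can be tried together in a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle, and the table rows and the helper function `names` are hypothetical. Note one difference: SQLite's LIKE ignores ASCII case by default, whereas Oracle's comparison is case sensitive.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, salary INTEGER, commission_pct REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [  # hypothetical rows; None is stored as NULL
        ("Sanders", 1500, None),
        ("Klein", 2500, 0.1),
        ("Sato", 3000, None),
        ("Brown", 4000, 0.2),
    ],
)

def names(where):
    """Return last names matching a WHERE condition, sorted by name."""
    sql = "SELECT last_name FROM employees WHERE " + where + " ORDER BY last_name"
    return [row[0] for row in conn.execute(sql)]

between_rows = names("salary BETWEEN 1500 AND 2500")   # both limits inclusive
in_rows = names("salary IN (1500, 3000, 2500)")        # membership condition
like_rows = names("last_name LIKE 'Sa%'")              # starts with 'Sa'
null_rows = names("commission_pct IS NULL")            # correct NULL test
equals_null_rows = names("commission_pct = NULL")      # never true

print(between_rows, in_rows, like_rows, null_rows, equals_null_rows)
```

The last query returns no rows at all, which is why IS NULL has to be used instead of the equality operator.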

Logical Operators
Multiple filter conditions can be added to the WHERE clause
predicate. More than one condition can be combined together
using logical operators AND, OR and NOT.

• AND: joins two or more conditions, and returns results only
  when all of the conditions are true.
• OR: joins two or more conditions, and it returns results
  when any of the conditions are true.
• NOT: negates the expression that follows it.

The AND operator links two or more conditions in a WHERE clause
and returns TRUE only if all the conditions are true. Suppose that
a manager needs a list of female employees. Further, the list
should only include employees with last names that begin with
the letter "E" or that come later in the alphabet. Additionally, the
result table should be sorted by employee last name. There are
two simple conditions to be met. The WHERE clause may be
written as: WHERE Gender = 'F' AND last_name > 'E'.

SELECT last_name "Last Name", first_name "First Name",
Gender "Gender"
FROM employees
WHERE Gender = 'F' AND last_name > 'E'
ORDER BY last_name;

The OR operator links more than one condition in a WHERE clause
and returns TRUE if any one of the conditions is true. Suppose
that your organizational manager's requirements change a bit.
Another employee listing is needed, but in this listing the
employees should: (1) be female or, (2) have a last name that
begins with the letter "T" or a letter that comes later in the
alphabet. The result table should be sorted by employee last
name. In this situation either of the two conditions can be met in
order to satisfy the query. Female employees should be listed
along with employees having a name that satisfies the second
condition.

The NOT operator is used to negate an expression or condition.
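
The three logical operators can be compared side by side in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle, and hypothetical rows. The OR condition `Gender = 'F' OR last_name > 'T'` is an assumption modeled on the AND example, since the notes describe that requirement only in words.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [  # hypothetical rows
        ("Adams", "F"), ("Evans", "F"), ("Thomas", "M"),
        ("Clark", "M"), ("Young", "F"),
    ],
)

def names(where):
    """Return last names matching a WHERE condition, sorted by name."""
    sql = "SELECT last_name FROM employees WHERE " + where + " ORDER BY last_name"
    return [row[0] for row in conn.execute(sql)]

and_rows = names("gender = 'F' AND last_name > 'E'")  # both must hold
or_rows = names("gender = 'F' OR last_name > 'T'")    # either may hold
not_rows = names("NOT gender = 'F'")                  # negated condition

print(and_rows, or_rows, not_rows)
```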

The ORDER BY Clause

When you display only a few rows of data, it may be unnecessary
to sort the output; however, when you display numerous rows,
managers may be aided in decision making by having the
information sorted. Output from a SELECT statement can be
sorted by using the optional ORDER BY clause. When you use the
ORDER BY clause, the column on which you are ordering is
normally one of the columns specified in the SELECT clause,
although in simple queries Oracle also accepts other columns of
the queried tables.

The below SQL query uses an ORDER BY clause to sort the result
table by the last_name column in ascending order. Ascending
order is the default sort order.

SELECT last_name, first_name
FROM employees
WHERE last_name >= 'J'
ORDER BY last_name;

last_name first_name
--------------- ---------------
Jones Quincey
Klepper Robert
Quattromani Toni
Schultheis Robert

Sorting can also be based on numeric and date values, and on
multiple columns.

By default, the ORDER BY clause sorts output rows in the
result table in ascending order. We can use the keyword DESC
(short for descending) to enable a descending sort. The
alternative keyword is ASC, which sorts in ascending order, but
ASC is rarely written since it is the default. When the ASC or
DESC optional keyword is used, it must follow the column name
on which you are sorting in the ORDER BY clause.

Positional Sorting - The numeric position of a column in the
selected column list can be given in the ORDER BY clause instead
of the column name. It is mainly used in UNION queries
(discussed later). The query below orders the result set by
salary since it appears 2nd in the column list.

SELECT first_name, salary
FROM employees
ORDER BY 2;
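
A short sketch shows that sorting by column name and sorting by column position give the same result. It assumes Python's built-in sqlite3 module as a stand-in for Oracle, and the rows are hypothetical. A second sort key is added so that employees with equal salaries come out in a defined order.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 3000), ("Bob", 1500), ("Cara", 3000), ("Dev", 2000)],  # hypothetical rows
)

# Sort by salary, highest first; ties are broken by first_name ascending.
by_name = conn.execute(
    "SELECT first_name, salary FROM employees ORDER BY salary DESC, first_name"
).fetchall()

# The same sort expressed with column positions: 2 = salary, 1 = first_name.
by_position = conn.execute(
    "SELECT first_name, salary FROM employees ORDER BY 2 DESC, 1"
).fetchall()

print(by_name)
print(by_position)
```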

Substitution Variables
When a SQL query has to be executed more than once for
different sets of inputs, substitution variables can be used.
Substitution variables prompt for user input before the query
is executed. They are widely used in query-based report
generation, which takes a data range from the user as input for
conditional filtering and data display. Substitution variables
are prefixed by a single ampersand (&) symbol and temporarily
store values. For example,

SELECT EMPLOYEE_ID, LAST_NAME, SALARY
FROM employees
WHERE LAST_NAME = &last_name
OR EMPLOYEE_ID = &EMPNO;

When the above SELECT query is executed, Oracle identifies the
'&' as a substitution variable. It prompts the user to enter
values for 'last_name' and 'EMPNO' as below.

Enter value for last_name:
Enter value for empno:

Once the user provides inputs for both variables, the values are
substituted, and the query is verified and executed.

Points to be noted -
• If the variable is meant to substitute a character or date
  value, the literal needs to be enclosed in single quotes. A
  useful technique is to enclose the ampersand substitution
  variable in single quotes when dealing with character and
  date values.
• Both SQL Developer and SQL*Plus support substitution
  variables and the DEFINE/UNDEFINE commands. However,
  neither SQL Developer nor SQL*Plus supports validation
  checks (except for data type) on user input.
• You can use substitution variables not only in the
  WHERE clause of a SQL statement, but also as a substitute
  for column names, expressions, or text.

Using the Double-Ampersand Substitution Variable

When the same substitution variable is used in more than one
place, double-ampersand substitution avoids re-entering the
same data. In such cases the value of the substitution
variable, once entered, is substituted at all instances of
usage.

SELECT first_name, HIRE_DATE, SEPARATION_DATE
FROM employees
WHERE HIRE_DATE LIKE '%&&DT%' AND SEPARATION_DATE LIKE
'%&DT%';

Note that the same value of DT is substituted twice in the above
query: the value given once by the user is used in both places.

The DEFINE and VERIFY Commands

The DEFINE feature of SQL*Plus sets the value of a variable for
the session. Defining a variable in advance avoids the prompt
that would otherwise halt query execution; Oracle reuses the
defined value whenever it encounters the variable in a SQL
query. Substitution (SET DEFINE) is ON by default. With the
DEFINE command, a variable can be declared on the command line
before the query runs, as DEFINE variable=value.

The VERIFY setting shows each substitution as an OLD and a NEW
line of the statement. In SQL*Plus it is ON by default and can
be switched OFF or ON using the SET VERIFY command.

SQL> SET DEFINE ON
SQL> SET VERIFY ON
SQL> DEFINE NAME = MARTIN
SQL> SELECT first_name, SALARY
FROM employees
WHERE first_name = '&NAME';
old 3: WHERE first_name = '&NAME'
new 3: WHERE first_name = 'MARTIN'

first_name SALARY
---------- -------
MARTIN     5000

Using Single-Row Functions

Using Single row functions to customize output

Oracle SQL supplies a rich library of built-in functions which
can be employed for various tasks. The essential capabilities
of these functions include case conversion of strings, in-string
or substring operations, mathematical computations on numeric
data, and date operations on date type values. SQL functions
optionally take arguments from the user and always return a
value.

Broadly, there are two types of functions :-

Single Row functions - Single row functions work on a single
row and return one output per row. For example, length and case
conversion functions are single row functions.

Multiple Row functions - Multiple row functions work upon a group
of rows and return one result for the complete set of rows. They
are also known as Group Functions.

Single row functions

Single row functions can be character functions, numeric
functions, date functions, and conversion functions. Note that
these functions are used to manipulate data items. These
functions require one or more input arguments and operate on
each row, thereby returning one output value for each row.
An argument can be a column, a literal or an expression. Single
row functions can be used in the SELECT, WHERE and ORDER BY
clauses. Single row functions can be -

• General functions - Usually contains NULL handling functions.
  The functions under the category are NVL, NVL2, NULLIF,
  COALESCE, CASE, DECODE.
• Case Conversion functions - Accepts character input and
  returns a character value. Functions under the category are
  UPPER, LOWER and INITCAP.
    o UPPER function converts a string to upper case.
    o LOWER function converts a string to lower case.
    o INITCAP function converts only the initial letter of
      each word in a string to upper case.
• Character functions - Accepts character input and returns
  number or character value. Functions under the category
  are CONCAT, LENGTH, SUBSTR, INSTR, LPAD, RPAD, TRIM
  and REPLACE.
    o CONCAT function concatenates two string values.
    o LENGTH function returns the length of the input string.
    o SUBSTR function returns a portion of a string, starting
      at a given position and running for a given length.
    o INSTR function returns the numeric position of a
      character or a string in a given string.
    o LPAD and RPAD functions pad the given string up to a
      specific length with a given character.
    o TRIM function trims the string input from the start or
      end.
    o REPLACE function replaces occurrences of a search string
      in the input string with a replacement string.
• Date functions - Date arithmetic operations return date or
  numeric values. Functions under the category are
  MONTHS_BETWEEN, ADD_MONTHS, NEXT_DAY, LAST_DAY,
  ROUND and TRUNC.
    o MONTHS_BETWEEN function returns the count of
      months between the two dates.
    o ADD_MONTHS function adds 'n' months to an input date.
    o NEXT_DAY function returns the date of the first given
      weekday that falls after the date specified.
    o LAST_DAY function returns the last day of the month of
      the input date.
    o ROUND and TRUNC functions are used to round and
      truncate the date value.
• Number functions - Accepts numeric input and returns
  numeric values. Functions under the category are ROUND,
  TRUNC, and MOD.
    o ROUND and TRUNC functions are used to round and
      truncate the number value.
    o MOD is used to return the remainder of the division
      operation between two numbers.

Illustrations
General functions

The SELECT query below demonstrates the use of NVL function.

SELECT first_name, last_name, salary, NVL (commission_pct,0)
FROM employees
WHERE rownum < 5;

FIRST_NAME           LAST_NAME                     SALARY NVL(COMMISSION_PCT,0)
-------------------- ------------------------- ---------- ---------------------
Steven               King                           24000                     0
Neena                Kochhar                        17000                     0
Lex                  De Haan                        17000                     0
Alexander            Hunold                          9000                     0

Case Conversion functions

The SELECT query below demonstrates the use of case
conversion functions.

SELECT UPPER (first_name), INITCAP (last_name), LOWER (job_id)
FROM employees
WHERE rownum < 5;

UPPER(FIRST_NAME)    INITCAP(LAST_NAME)        LOWER(JOB_
-------------------- ------------------------- ----------
STEVEN               King                      ad_pres
NEENA                Kochhar                   ad_vp
LEX                  De Haan                   ad_vp
ALEXANDER            Hunold                    it_prog

Character functions

The SELECT query below demonstrates the use of CONCAT
function to concatenate two string values.

SELECT CONCAT (first_name, last_name)
FROM employees
WHERE rownum < 5;

CONCAT(FIRST_NAME,LAST_NAME)
--------------------------------
EllenAbel
SundarAnde
MozheAtkinson
DavidAustin

The SELECT query below demonstrates the use of SUBSTR and
INSTR functions. Here SUBSTR returns five characters of the
input string starting at the 1st position. INSTR returns the
numeric position of the character 'a' in the first name, or 0
if it does not occur.

SELECT SUBSTR (first_name,1,5), INSTR (first_name,'a')
FROM employees
WHERE rownum < 5;

SUBST INSTR(FIRST_NAME,'A')
----- ---------------------
Ellen 0
Sunda 5
Mozhe 0
David 2
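
Several of the character functions can be exercised in one query. The sketch below assumes Python's built-in sqlite3 module as a stand-in for Oracle and uses only functions that SQLite also provides under the same names (UPPER, LOWER, LENGTH, SUBSTR, INSTR, REPLACE, TRIM). INITCAP, LPAD and RPAD are Oracle functions that SQLite lacks, so they are left out.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")

result = conn.execute(
    """
    SELECT UPPER('Sundar'),             -- case conversion
           LOWER('Sundar'),
           LENGTH('Sundar'),            -- number of characters
           SUBSTR('Sundar', 1, 5),      -- 5 characters from position 1
           INSTR('Sundar', 'a'),        -- position of the first 'a'
           REPLACE('Sundar', 'a', 'o'), -- swap every 'a' for 'o'
           TRIM('  Sundar  ')           -- strip blanks at both ends
    """
).fetchone()

print(result)
```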

The SELECT query below demonstrates the usage of LPAD and
RPAD to pretty print the employee and job information.

SELECT RPAD(first_name,10,'_')||LPAD (job_id,15,'_')
FROM employees
WHERE rownum < 5;

RPAD(FIRST_NAME,10,'_')||
-------------------------
Steven____________AD_PRES
Neena_______________AD_VP
Lex_________________AD_VP
Alexander_________IT_PROG

Number functions

The SELECT query below demonstrates the use of ROUND and
TRUNC functions.

SELECT ROUND (1372.472,1)
FROM dual;

ROUND(1372.472,1)
-----------------
1372.5

SELECT TRUNC (72183,-2)
FROM dual;

TRUNC(72183,-2)
---------------
72100
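
The effect of the second argument, including a negative value, can be modeled outside the database. The sketch below is a plain Python approximation built on the decimal module, not Oracle's implementation; the helper names `round_like_sql` and `trunc_like_sql` are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def round_like_sql(value, places=0):
    """Round to `places` decimals; a negative value rounds left of the point."""
    step = Decimal(1).scaleb(-places)  # 1 -> 0.1, -2 -> 1E+2
    return Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)

def trunc_like_sql(value, places=0):
    """Cut off digits beyond `places` without rounding."""
    step = Decimal(1).scaleb(-places)
    return Decimal(str(value)).quantize(step, rounding=ROUND_DOWN)

rounded = round_like_sql(1372.472, 1)   # keep one decimal place
truncated = trunc_like_sql(72183, -2)   # zero out the last two digits
remainder = Decimal(17) % Decimal(5)    # remainder of 17 / 5, as MOD would give

print(rounded, int(truncated), remainder)
```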

Date arithmetic operations

The SELECT query below shows date arithmetic, where the
difference between sysdate and the employee hire date is
calculated in days.

SELECT employee_id, (sysdate - hire_date) Employment_days
FROM employees
WHERE rownum < 5;

EMPLOYEE_ID EMPLOYMENT_DAYS
----------- ---------------
100 3698.61877
101 2871.61877
102 4583.61877
103 2767.61877

Date functions

The SELECT queries below demonstrate the use of
MONTHS_BETWEEN, ADD_MONTHS, NEXT_DAY and LAST_DAY
functions.

SELECT employee_id, MONTHS_BETWEEN (sysdate, hire_date)
Employment_months
FROM employees
WHERE rownum < 5;

EMPLOYEE_ID EMPLOYMENT_MONTHS
----------- -----------------
100 121.504216
101 94.3751837
102 150.633248
103 90.9558289

SELECT ADD_MONTHS (sysdate, 5), NEXT_DAY (sysdate, 'MONDAY'),
LAST_DAY (sysdate)
FROM dual;

ADD_MONTH NEXT_DAY( LAST_DAY(
--------- --------- ---------
01-JAN-14 05-AUG-13 31-AUG-13
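
The results above are consistent with a session date of 01-AUG-13. As a rough model of what the three functions compute, the sketch below reimplements them in plain Python. The function names mirror the Oracle ones but the code is an approximation written for illustration; the weekday is passed as a number (0 = Monday) following Python's convention rather than as a day name.

```python
import calendar
from datetime import date, timedelta

def last_day(d):
    """Last day of the month that contains d."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

def add_months(d, n):
    """Shift d by n months, keeping month-end dates at month end."""
    month_index = d.year * 12 + (d.month - 1) + n
    year, month = divmod(month_index, 12)
    month += 1
    month_end = calendar.monthrange(year, month)[1]
    # A month-end input stays on month end; other days are clamped if needed.
    day = month_end if d == last_day(d) else min(d.day, month_end)
    return date(year, month, day)

def next_day(d, weekday):
    """First date after d that falls on weekday (0 = Monday ... 6 = Sunday)."""
    days_ahead = (weekday - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=days_ahead)

today = date(2013, 8, 1)  # assumed session date, inferred from the output above
plus_five = add_months(today, 5)
next_monday = next_day(today, 0)
month_end = last_day(today)

print(plus_five, next_monday, month_end)
```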

Conversion Functions and conditional expressions

Using Conversion functions

Besides the SQL utility functions, the Oracle built-in function
library contains type conversion functions. There may be
scenarios where the query expects input in a specific data type
but receives it in a different one. In such cases, Oracle
implicitly tries to convert the unexpected value to a compatible
data type that can be substituted in place, so that application
continuity is not compromised. Type conversion can be either
implicitly done by Oracle or explicitly done by the programmer.

Implicit data type conversion works based on a matrix which
describes Oracle's support for internal type casting. Besides
these rules, Oracle offers type conversion functions which can be
used in queries for explicit conversion and formatting. It is
recommended to perform explicit conversion instead of relying on
implicit conversion: although implicit conversion usually works
well, explicit conversion eliminates the cases where bad inputs
are difficult to typecast internally.

Implicit Data Type Conversion

A VARCHAR2 or CHAR value can be implicitly converted to a
NUMBER or DATE type value by Oracle. Similarly, a NUMBER or
DATE type value can be automatically converted to character
data by the Oracle server. Note that the implicit conversion
happens only when the character value represents a valid number
or date type value respectively.

For example, examine the below SELECT queries. Both queries
give the same result because Oracle internally treats 15000 and
'15000' as the same.

Query-1
SELECT employee_id,first_name,salary
FROM employees
WHERE salary > 15000;
Query-2
SELECT employee_id,first_name,salary
FROM employees
WHERE salary > '15000';
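
A comparable effect can be observed in a runnable sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle; SQLite applies its own type affinity rules rather than Oracle's conversion matrix, so this is an analogy and not a demonstration of Oracle internals. The rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Steven", 24000), ("Neena", 17000), ("Alexander", 9000)],  # hypothetical rows
)

# Query-1: numeric column compared with a numeric literal.
numeric_literal = conn.execute(
    "SELECT first_name FROM employees WHERE salary > 15000 ORDER BY first_name"
).fetchall()

# Query-2: the same comparison with a character literal; the text is
# converted to a number before the comparison takes place.
character_literal = conn.execute(
    "SELECT first_name FROM employees WHERE salary > '15000' ORDER BY first_name"
).fetchall()

print(numeric_literal, character_literal)
```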

Explicit Data Type Conversion

SQL conversion functions are single row functions which are
capable of typecasting a column value, a literal or an
expression. TO_CHAR, TO_NUMBER and TO_DATE are the three
functions which convert between data types.

TO_CHAR function
TO_CHAR function is used to typecast a numeric or date input to
character type with a format model (optional).

Syntax
TO_CHAR(number1, [format], [nls_parameter])

For number to character conversion, nls parameters can be used
to specify decimal characters, group separator, local currency
model, or international currency model. It is an optional
specification - if not available, session level nls settings will be
used. For date to character conversion, the nls parameter can be
used to specify the day and month names, as applicable.

Dates can be formatted in multiple formats after converting to
character types using TO_CHAR function. The TO_CHAR function
is used to have Oracle 11g display dates in a particular format.
Format models are case sensitive and must be enclosed within
single quotes.

Consider the below SELECT query. The query formats the
HIRE_DATE and SALARY columns of EMPLOYEES table using
TO_CHAR function.

SELECT first_name,
TO_CHAR (hire_date, 'MONTH DD, YYYY') HIRE_DATE,
TO_CHAR (salary, '$99999.99') Salary
FROM employees
WHERE rownum < 5;

FIRST_NAME HIRE_DATE SALARY
-------------------- ------------------ ----------
Steven JUNE 17, 2003 $24000.00
Neena SEPTEMBER 21, 2005 $17000.00
Lex JANUARY 13, 2001 $17000.00
Alexander JANUARY 03, 2006 $9000.00

The first TO_CHAR is used to convert the hire date to the date
format MONTH DD, YYYY i.e. month spelled out and padded with
spaces, followed by the two-digit day of the month, and then the
four-digit year. If you prefer displaying the month name in mixed
case (that is, "December"), simply use this case in the format
argument: ('Month DD, YYYY').

The second TO_CHAR function in the query above is used to format
the SALARY to display the currency sign and two decimal
positions.

Oracle offers a comprehensive set of format models. The table
below lists the number format models which can be used with
TO_CHAR.

Format Model  Description
------------  ------------------------------------------------------
, (comma)     Returns a comma in the specified position. You can
              specify multiple commas in a number format model.
              Restrictions: A comma element cannot begin a number
              format model. A comma cannot appear to the right of a
              decimal character or period in a number format model.
. (period)    Returns a decimal point, which is a period (.), in the
              specified position. Restriction: You can specify only
              one period in a number format model.
$             Returns value with a leading dollar sign.
0             Returns leading zeros. Returns trailing zeros.
9             Returns value with the specified number of digits with
              a leading space if positive or with a leading minus if
              negative. Leading zeros are blank, except for a zero
              value, which returns a zero for the integer part of
              the fixed-point number.
B             Returns blanks for the integer part of a fixed-point
              number when the integer part is zero (regardless of
              "0"s in the format model).
C             Returns in the specified position the ISO currency
              symbol (the current value of the NLS_ISO_CURRENCY
              parameter).
D             Returns in the specified position the decimal
              character, which is the current value of the
              NLS_NUMERIC_CHARACTER parameter. The default is a
              period (.). Restriction: You can specify only one
              decimal character in a number format model.
EEEE          Returns a value using scientific notation.
FM            Returns a value with no leading or trailing blanks.
G             Returns in the specified position the group separator
              (the current value of the NLS_NUMERIC_CHARACTER
              parameter). You can specify multiple group separators
              in a number format model. Restriction: A group
              separator cannot appear to the right of a decimal
              character or period in a number format model.
L             Returns in the specified position the local currency
              symbol (the current value of the NLS_CURRENCY
              parameter).
MI            Returns negative value with a trailing minus sign (-).
              Returns positive value with a trailing blank.
              Restriction: The MI format element can appear only in
              the last position of a number format model.
PR            Returns negative value in <angle brackets>. It can
              appear only at the end of a number format model.
RN, rn        Returns a value as Roman numerals in uppercase (RN) or
              in lowercase (rn). Value can be an integer between 1
              and 3999.
S             Returns negative value with a leading or trailing
              minus sign (-). Returns positive value with a leading
              or trailing plus sign (+). Restriction: The S format
              element can appear only in the first or last position
              of a number format model.
TM            "Text minimum". Returns (in decimal output) the
              smallest number of characters possible. This element
              is case-insensitive.
U             Returns in the specified position the "Euro" (or
              other) dual currency symbol (the current value of the
              NLS_DUAL_CURRENCY parameter).
V             Returns a value multiplied by 10^n (and, if necessary,
              rounds it up), where n is the number of 9's after the
              "V".
X             Returns the hexadecimal value of the specified number
              of digits.

TO_NUMBER function
The TO_NUMBER function converts a character value to a numeric
datatype. If the string being converted contains nonnumeric
characters, the function returns an error.

Syntax
TO_NUMBER (string1, [format], [nls_parameter])

TO_NUMBER accepts the number format models listed under
TO_CHAR. The table below lists the date and time format
elements, which apply when date values are converted with
TO_CHAR or TO_DATE.

Format Model    Description
--------------  ----------------------------------------------------
CC              Century
SCC             Century, BC prefixed with -
YYYY            Year with 4 numbers
SYYYY           Year, BC prefixed with -
IYYY            ISO year with 4 numbers
YY              Year with 2 numbers
RR              Year with 2 numbers, with Y2K compatibility
YEAR            Year in characters
SYEAR           Year in characters, BC prefixed with -
BC              BC/AD indicator
Q               Quarter in numbers (1,2,3,4)
MM              Month of year 01, 02...12
MONTH           Month in characters (e.g. January)
MON             Abbreviated month name (e.g. JAN, FEB)
WW              Week number of the year (e.g. 1)
W               Week number of the month (e.g. 5)
IW              Week number of the year in ISO standard
DDD             Day of year in numbers (e.g. 365)
DD              Day of the month in numbers (e.g. 28)
D               Day of week in numbers (e.g. 7)
DAY             Day of the week in characters (e.g. Monday)
FMDAY           Day of the week in characters, without blank
                padding (e.g. Monday)
DY              Day of the week in short character description
                (e.g. SUN)
J               Julian day (number of days since January 1 4713 BC,
                where January 1 4713 BC is 1 in Oracle)
HH, HH12        Hour number of the day (1-12)
HH24            Hour number of the day in 24-hour notation (0-23)
AM, PM          AM or PM
MI, SS          Number of minutes and seconds (e.g. 59)
SSSSS           Number of seconds this day
DS              Short date format. Depends on NLS settings. Use
                only with timestamp.
DL              Long date format. Depends on NLS settings. Use only
                with timestamp.
E               Abbreviated era name. Valid only for calendars:
                Japanese Imperial, ROC Official, Thai Buddha.
EE              The full era name
FF              The fractional seconds. Use with timestamp.
FF1..FF9        The fractional seconds. Use with timestamp. The
                digit controls the number of decimal digits used
                for fractional seconds.
FM              Fill Mode: suppresses blanks in output from
                conversion
FX              Format Exact: requires exact pattern matching
                between data and format model.
IYY or IY or I  The last 3, 2, 1 digits of the ISO standard year.
                Output only.
RM              The Roman numeral representation of the month
                (I .. XII)
RR              The last 2 digits of the year.
RRRR            The last 2 digits of the year when used for output.
                Accepts four-digit years when used for input.
SP              Spelled format. Can appear at the end of a number
                element. The result is always in English. For
                example, month 10 in format MMSP returns "ten".
SPTH            Spelled and ordinal format; 1 results in first.
TH              Converts a number to its ordinal format. For
                example, 1 becomes 1st.
TS              Short time format. Depends on NLS settings. Use
                only with timestamp.
TZD             Abbreviated time zone name, e.g. PST.
TZH, TZM        Time zone hour/minute displacement.
TZR             Time zone region
X               Local radix character. In America this is a
                period (.)

DBMS
The SELECT queries below take numbers supplied as character strings and
convert them according to the format model.

SELECT TO_NUMBER('121.23', '9G999D99')
FROM DUAL;

TO_NUMBER('121.23','9G999D99')
------------------------------
121.23

SELECT TO_NUMBER('1210.73', '9999.99')
FROM DUAL;

TO_NUMBER('1210.73','9999.99')
------------------------------
1210.73

TO_DATE function
The function takes a character value as input and returns the
equivalent DATE value. The TO_DATE function allows users to enter a
date in any supported format, and then converts the entry into a
date that is displayed in the default format used by Oracle 11g.

Syntax:
TO_DATE( string1, [ format_mask ], [ nls_language ] )

The format_mask argument consists of a series of elements that
describe exactly what the input data looks like, and it must be
entered in single quotation marks.

Format Model Description

YEAR Year, spelled out

YYYY 4-digit year

YYY, YY, Y Last 3, 2, or 1 digit(s) of year.

IYY, IY, I Last 3, 2, or 1 digit(s) of ISO year.

IYYY 4-digit year based on the ISO standard

RRRR Accepts a 2-digit year and returns a 4-digit year.

Q Quarter of year (1, 2, 3, 4; JAN-MAR = 1).

MM Month (01-12; JAN = 01).

MON Abbreviated name of month.

MONTH Name of month, padded with blanks to length of 9 characters.

RM Roman numeral month (I-XII; JAN = I).

WW Week of year (1-53) where week 1 starts on the first day of the year and continues to the seventh day of the year.

W Week of month (1-5) where week 1 starts on the first day of the month and ends on the seventh.

IW Week of year (1-52 or 1-53) based on the ISO standard.

D Day of week (1-7).

DAY Name of day.

DD Day of month (1-31).

DDD Day of year (1-366).

DY Abbreviated name of day.

J Julian day; the number of days since January 1, 4712 BC.

HH12 Hour of day (1-12).

HH24 Hour of day (0-23).

MI, SS Minute (0-59), second (0-59).

SSSSS Seconds past midnight (0-86399).

FF Fractional seconds. Use a value from 1 to 9 after FF to indicate the number of digits in the fractional seconds. For example, 'FF4'.

AM, PM Meridian indicator

AD, BC AD, BC indicator

TZD Daylight saving information. For example, 'PST'

TZH, TZM, TZR Time zone hour/minute/region.

The following example converts a character string into a date:

SELECT TO_DATE('January 15, 1989, 11:00 A.M.',
               'Month dd, YYYY, HH:MI A.M.',
               'NLS_DATE_LANGUAGE = American')
FROM DUAL;

TO_DATE('
---------
15-JAN-89

General Functions
General functions are used to handle NULL values in the database.
Their purpose is to replace a NULL value with an alternate value.
These functions are described briefly below.

NVL

The NVL function substitutes an alternate value for a NULL value.

Syntax:
NVL( Arg1, replace_with )

Both parameters are mandatory. NVL works with all data types, but
the data types of the original value and the replacement must be
compatible, i.e. either the same or implicitly convertible by Oracle.

If arg1 is a character value, Oracle converts the replacement
value to the data type of arg1 before comparing them and returns
VARCHAR2 in the character set of arg1. If arg1 is numeric, Oracle
determines the argument with the highest numeric precedence,
implicitly converts the other argument to that data type, and
returns that data type.

The SELECT statement below displays 'n/a' if an employee has not
yet been assigned to a job, i.e. JOB_ID is NULL. Otherwise, it
displays the actual JOB_ID value.

SELECT first_name, NVL(JOB_ID, 'n/a')
FROM employees;

NVL2

As an enhancement over NVL, Oracle introduced NVL2, which
substitutes an alternate value not only when the column is NULL
but also when it is not NULL.

Syntax:
NVL2( string1, value_if_NOT_null, value_if_null )

The SELECT statement below displays 'Bench' if the JOB_CODE for an
employee is NULL. For any non-NULL value of JOB_CODE, it shows the
constant value 'Job Assigned'.

SELECT NVL2(JOB_CODE, 'Job Assigned', 'Bench')
FROM employees;

NULLIF

The NULLIF function compares two arguments, expr1 and expr2. If
they are equal, it returns NULL; otherwise, it returns expr1.
Unlike the other NULL-handling functions, its first argument
cannot be the literal NULL, although it can be an expression that
evaluates to NULL. Both parameters are mandatory.

Syntax:
NULLIF (expr1, expr2)

The query below returns NULL since both input values are 12 and
are therefore equal.

SELECT NULLIF (12, 12)
FROM DUAL;

Similarly, the query below returns 'SUN' since the two strings are
not equal.

SELECT NULLIF ('SUN', 'MOON')
FROM DUAL;

COALESCE

The COALESCE function, a more general form of NVL, returns the
first non-NULL expression in the argument list. It takes at least
two arguments, and there is no upper limit on the number of
arguments.

Syntax:
COALESCE (expr1, expr2, ... expr_n )

Consider the SELECT query below. It returns the first non-NULL
value among the address fields of an employee.

SELECT COALESCE (address1, address2, address3) Address
FROM employees;

The COALESCE function works like an IF..ELSIF..END IF construct.
The logic of the query above can be written as:

IF address1 is not null THEN
   result := address1;
ELSIF address2 is not null THEN
   result := address2;
ELSIF address3 is not null THEN
   result := address3;
ELSE
   result := null;
END IF;
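
The NULL-handling functions above can be tried without an Oracle instance. The sketch below is only a stand-in under stated assumptions: it uses Python's built-in sqlite3 module, where IFNULL plays the role of NVL and a CASE expression plays the role of NVL2 (SQLite has neither NVL nor NVL2), while NULLIF and COALESCE exist under the same names. The table layout and rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; table and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, job_id TEXT, "
             "address1 TEXT, address2 TEXT, address3 TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [("JOHN", "CLERK", None, "12 Park Road", None),
     ("EDWIN", None, None, None, None)],
)

rows = conn.execute("""
    SELECT first_name,
           IFNULL(job_id, 'n/a'),                       -- stands in for NVL
           CASE WHEN job_id IS NOT NULL
                THEN 'Job Assigned' ELSE 'Bench' END,   -- stands in for NVL2
           COALESCE(address1, address2, address3)       -- first non-NULL value
    FROM employees
    ORDER BY first_name
""").fetchall()

# NULLIF returns NULL when both arguments are equal, else the first argument.
nullif_equal, nullif_diff = conn.execute(
    "SELECT NULLIF(12, 12), NULLIF('SUN', 'MOON')").fetchone()

print(rows)
print(nullif_equal, nullif_diff)
```

When every argument of COALESCE is NULL, as for EDWIN here, the result is NULL, which Python reports as None.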

Conditional Functions
Oracle provides the conditional constructs DECODE and CASE to
apply conditional logic inside a SQL statement.

The DECODE function

The function is the SQL equivalent of the procedural
IF..THEN..ELSE statement. DECODE works with values, columns, and
expressions of all data types.

Syntax:
DECODE (expression, search, result [, search, result]... [, default])

DECODE compares expression against each search value in order. If
expression equals a search value, the corresponding result is
returned. If there is no match, the default value is returned if
one is defined; otherwise NULL is returned. If the data types
differ, Oracle attempts an implicit conversion before comparing.

Oracle considers two NULLs to be equivalent when evaluating the
DECODE function.

SELECT DECODE(NULL,NULL,'EQUAL','NOT EQUAL')
FROM DUAL;

DECOD
-----
EQUAL

If expression is NULL, Oracle returns the result of the first
search that is also NULL. The maximum number of components in the
DECODE function, including expression, searches, results, and
default, is 255.

The query below labels an employee 'NEW JOINEE' when hire_date
equals the current date and 'EMPLOYEE' otherwise.

SELECT first_name, salary,
       DECODE (hire_date, sysdate, 'NEW JOINEE', 'EMPLOYEE')
FROM employees;

CASE expression

CASE expressions work on the same concept as DECODE but differ in
syntax and usage.

Syntax:
CASE [ expression ]
WHEN condition_1 THEN result_1
WHEN condition_2 THEN result_2
...
WHEN condition_n THEN result_n
ELSE result
END

Oracle evaluates the WHEN conditions in the order written until it
finds one that is true, and then returns the result expression
associated with it. If no condition is true and an ELSE clause
exists, Oracle returns the result defined with ELSE. Otherwise,
Oracle returns NULL.

The maximum number of arguments in a CASE expression is 255. All
expressions count toward this limit, including the initial
expression of a simple CASE expression and the optional ELSE
expression. Each WHEN ... THEN pair counts as two arguments. To
avoid exceeding this limit, you can nest CASE expressions so that
the result expression itself is a CASE expression.

SELECT first_name,
       CASE WHEN salary < 200 THEN 'GRADE 1'
            WHEN salary >= 200 AND salary < 5000 THEN 'GRADE 2'
            ELSE 'GRADE 3'
       END grade
FROM employees;

FIRST_NAME GRADE
---------- -------
JOHN       GRADE 2
EDWIN      GRADE 3
KING       GRADE 1
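
To see how a CASE expression picks its result, the following sketch runs a searched CASE and a simple CASE in SQLite through Python's sqlite3 module. It is an illustration under assumptions, not Oracle output: the salaries are invented, and DECODE itself is Oracle-specific, so only CASE is executed here. The second query shows one difference worth remembering: a simple CASE compares with "=", so NULL does not match NULL, whereas DECODE treats two NULLs as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("JOHN", 3000), ("EDWIN", 7000), ("KING", 150)])

# Searched CASE: conditions are tested in order, the first true one wins.
grades = conn.execute("""
    SELECT first_name,
           CASE WHEN salary < 200  THEN 'GRADE 1'
                WHEN salary < 5000 THEN 'GRADE 2'
                ELSE 'GRADE 3'
           END AS grade
    FROM employees
    ORDER BY first_name
""").fetchall()

# Simple CASE uses equality, so a NULL base expression never matches
# WHEN NULL and the ELSE branch is taken.
null_match = conn.execute(
    "SELECT CASE NULL WHEN NULL THEN 'EQUAL' ELSE 'NOT EQUAL' END"
).fetchone()[0]

print(grades)
print(null_match)
```

Because the conditions are checked top to bottom, the second WHEN does not need to repeat the lower bound: any row reaching it already failed salary < 200.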
Using Group functions

Reporting Aggregate data using the Group functions

SQL has numerous predefined aggregate functions that can be used
to write queries that summarize data. The GROUP BY clause
specifies how to group rows from a table when aggregating
information, while the HAVING clause filters out groups that do
not satisfy a specified condition.

Aggregate functions perform a variety of actions such as counting
all the rows in a table, averaging a column's data, and summing
numeric data. Aggregates can also search a column for its highest
(MAX) or lowest (MIN) value. As with other types of queries, you
can restrict the rows these functions act on with the WHERE
clause. For example, if a manager needs to know how many
employees work in an organization, the aggregate function
COUNT(*) can be used to produce this information. The COUNT(*)
function shown in the SELECT statement below counts all rows in a
table.

SELECT COUNT(*)
FROM employees;

COUNT(*)
----------
24

The result table for the COUNT(*) function is a single column
with a single row, known as a scalar result or value. Notice that
the result table has a column heading that corresponds to the
name of the aggregate function specified in the SELECT clause.

Some of the commonly used aggregate functions are as below -

SUM( [ALL | DISTINCT] expression )

AVG( [ALL | DISTINCT] expression )

COUNT( [ALL | DISTINCT] expression )

COUNT(*)
MAX(expression)

MIN(expression)

The ALL and DISTINCT keywords are optional and behave as they do
in the SELECT clauses that you have learned to write. ALL is the
default where the option is allowed. The expression listed in the
syntax can be a constant, a function, or any combination of
column names, constants, and functions connected by arithmetic
operators. However, aggregate functions are most often used with
a column name. All aggregate functions ignore NULL values, except
COUNT(*), which counts every row.

There are two rules that you must understand and follow when
using aggregates:

• Aggregate functions can be used in both the SELECT and
  HAVING clauses (the HAVING clause is covered later in this
  chapter).
• Aggregate functions cannot be used in a WHERE clause.
  Violating this rule produces the Oracle error message
  "ORA-00934: group function is not allowed here".

Illustrations
The SELECT query below counts the number of employees in the
organization.

SELECT COUNT(*) Count
FROM employees;

COUNT
-----
24

The SELECT query below returns the average of the salaries of
employees in the organization.

SELECT AVG(Salary) average_sal
FROM employees;

AVERAGE_SAL
-----------
15694

The SELECT query below returns the sum of the salaries of
employees in the organization.

SELECT SUM(Salary) total_sal
FROM employees;

TOTAL_SAL
---------
87472

The SELECT query below returns the earliest and latest hire dates
of employees in the organization.

SELECT MIN (hire_date) oldest, MAX (hire_date) latest
FROM employees;

OLDEST    LATEST
--------- -----------
16-JAN-83 01-JUL-2012
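
The way aggregates treat NULL is easiest to see on a tiny table. The sketch below is a hedged illustration that runs in SQLite through Python's sqlite3 module rather than in Oracle; the three rows are invented, and one of them has a NULL salary on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("JOHN", 6000), ("EDWIN", 2000), ("MILLER", None)])

# COUNT(*) counts rows; every other aggregate skips the NULL salary.
count_all, count_sal, total, average, lowest, highest = conn.execute("""
    SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary),
           MIN(salary), MAX(salary)
    FROM employees
""").fetchone()

print(count_all, count_sal, total, average, lowest, highest)
```

Note that the average is taken over the two non-NULL salaries only, so it is 4000 and not 8000 divided by 3.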

GROUP BY
Aggregate functions are normally used in conjunction with a
GROUP BY clause. The GROUP BY clause enables you to use
aggregate functions to answer more complex managerial
questions such as:

What is the average salary of employees in each department?

How many employees work in each department?

How many employees are working on a particular project?

The GROUP BY clause establishes data groups based on columns and
aggregates the information within each group only. The grouping
criterion is defined by the columns specified in the GROUP BY
clause. In the order of evaluation, the WHERE clause first
restricts the rows, and the remaining rows are then organized
into groups.
Guidelines for using the GROUP BY clause

(1) Every column in the SELECT list that is not inside an
aggregate function must form the basis of grouping, and hence
must also be included in the GROUP BY clause. The query below
fails because DEPARTMENT_ID is selected but not grouped.

SELECT DEPARTMENT_ID, SUM(SALARY)
FROM employees;

DEPARTMENT_ID,
*
ERROR at line 2:
ORA-00937: not a single-group group function

(2) The GROUP BY clause does not accept column aliases; the
actual column names must be used.

(3) When a GROUP BY clause is present, the SELECT list may
contain only the grouping columns and aggregate functions such as
SUM, AVG, COUNT, MAX, and MIN. If it contains a column or
single-row expression that is not part of the GROUP BY clause,
Oracle raises "ORA-00979: not a GROUP BY expression".

(4) Aggregate functions cannot be used in a GROUP BY clause.
Oracle returns the error message "ORA-00934: group function is
not allowed here".

The query below lists the count of employees working in each
department.

SELECT DEPARTMENT_ID, COUNT (*)
FROM employees
GROUP BY DEPARTMENT_ID;

Similarly, the query below finds the sum of salaries for each job
id within each department. The groups are established on both the
department and the job id, so both columns appear in the GROUP BY
clause.

SELECT DEPARTMENT_ID, JOB_ID, SUM (SALARY)
FROM employees
GROUP BY DEPARTMENT_ID, JOB_ID;

The query below produces the same sums. Grouping is still based
on the department id and job id columns, but they are not
selected for display.

SELECT SUM (SALARY)
FROM employees
GROUP BY DEPARTMENT_ID, JOB_ID;

Use of DISTINCT, ALL keywords with Aggregate functions

With the DISTINCT keyword, an aggregate function considers only
the unique values of the column. With the ALL keyword, it
considers all the values of the column, including duplicates
(NULL values are still ignored). ALL is the default
specification.
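
The effect of GROUP BY and of the DISTINCT keyword inside an aggregate can be checked with a small experiment. The sketch below assumes an invented five-row employees table and runs in SQLite via Python's sqlite3 module; the column names follow the examples above, the data does not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees "
             "(department_id INTEGER, job_id TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (10, "CLERK", 2500), (10, "CLERK", 2500), (10, "MANAGER", 6000),
    (20, "ANALYST", 4000), (20, "CLERK", 2000),
])

# One result row per department; DISTINCT drops duplicate values
# before the aggregate is computed.
per_dept = conn.execute("""
    SELECT department_id,
           COUNT(*),
           SUM(salary),
           SUM(DISTINCT salary),
           COUNT(DISTINCT job_id)
    FROM employees
    GROUP BY department_id
    ORDER BY department_id
""").fetchall()

print(per_dept)
```

In department 10 the salary 2500 occurs twice, so SUM(salary) and SUM(DISTINCT salary) differ; in department 20 all salaries are unique and the two sums agree.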

The HAVING clause


The HAVING clause is used for aggregate functions in the same
way that a WHERE clause is used for column names and
expressions. Both clauses filter a result table based on a
condition, but they act at different stages: WHERE removes
individual rows, while HAVING removes whole groups after the
aggregates have been computed.

To summarize, the important differences between the WHERE and
HAVING clauses are:

A WHERE clause filters rows BEFORE the GROUPING action (i.e.,
before the calculation of the aggregate functions).

A HAVING clause filters groups AFTER the GROUPING action (i.e.,
after the calculation of the aggregate functions).

SELECT JOB_ID, SUM (SALARY)
FROM employees
GROUP BY JOB_ID
HAVING SUM (SALARY) > 10000;

The HAVING clause is directly related to the GROUP BY clause
because it eliminates groups from the result table based on the
aggregates computed for each group. The query below fails
because department_id is selected without a GROUP BY clause.

SELECT department_id, AVG(Salary)
FROM employees
HAVING AVG(Salary) > 33000;
ERROR at line 1: ORA-00937: not a single-group group function
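
The difference between WHERE and HAVING becomes concrete when the same HAVING condition is run with and without a WHERE filter. The following sketch uses SQLite through Python's sqlite3 module and an invented employees table; it also shows that SQLite, like Oracle, rejects an aggregate inside WHERE, although the error text differs from ORA-00934.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees "
             "(department_id INTEGER, job_id TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (10, "CLERK", 2500), (10, "CLERK", 2500), (10, "MANAGER", 6000),
    (20, "ANALYST", 4000), (20, "CLERK", 2000),
])

having = "GROUP BY job_id HAVING SUM(salary) > 5000 ORDER BY job_id"

# HAVING alone: groups are built from all rows, then filtered.
all_rows = conn.execute(
    f"SELECT job_id, SUM(salary) FROM employees {having}").fetchall()

# WHERE first removes department 20, so the CLERK total shrinks to 5000
# and that group no longer passes the HAVING condition.
dept_10 = conn.execute(
    f"SELECT job_id, SUM(salary) FROM employees "
    f"WHERE department_id = 10 {having}").fetchall()

# An aggregate in WHERE is rejected before any row is read.
try:
    conn.execute("SELECT job_id FROM employees WHERE SUM(salary) > 5000")
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

print(all_rows, dept_10, where_rejected)
```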

Get Data from Multiple Tables

Displaying Data from Multiple Tables

The related tables of a large database are linked through
primary and foreign keys, often referred to as common columns.
The ability to join tables lets you add more meaning to the
result table that is produced. To join n tables in a query, at
least (n-1) join conditions are necessary. Based on the join
conditions, Oracle combines pairs of rows from the tables and
displays those that satisfy the join condition.

Joins are classified as below

• Natural join (also known as an equijoin or a simple join) -
  Creates a join by using a commonly named and defined column.
• Non-equality join - Joins tables when there are no
  equivalent rows in the tables to be joined, for example, to
  match values in one column of a table with a range of values
  in another table.
• Self-join - Joins a table to itself.
• Outer join - Includes records of a table in the output when
  there's no matching record in the other table.
• Cartesian join (also known as a Cartesian product or cross
  join) - Combines each row from the first table with every
  row from the second table, producing every possible record
  combination.
Natural Join
The NATURAL keyword can simplify the syntax of an equijoin. A
NATURAL JOIN is possible whenever two (or more) tables have
columns with the same name and the columns are join compatible,
i.e., the columns have a shared domain of values. The join
operation joins rows from the tables that have equal column
values for the same-named columns.

Consider the one-to-many relationship between the DEPARTMENTS
and EMPLOYEES tables. Each table has a column named
DEPARTMENT_ID. This column is the primary key of the
DEPARTMENTS table and a foreign key of the EMPLOYEES table.

SELECT E.first_name NAME, D.department_name DNAME
FROM employees E NATURAL JOIN departments D;

NAME       DNAME
---------- ------
MILLER     DEPT 1
JOHN       DEPT 1
MARTIN     DEPT 2
EDWIN      DEPT 2

The SELECT query below joins the two tables by explicitly
specifying the join condition with the ON keyword.

SELECT E.first_name NAME, D.department_name DNAME
FROM employees E JOIN departments D
ON (E.department_id = D.department_id);

There are some limitations regarding the NATURAL JOIN. You
cannot specify a LOB column with a NATURAL JOIN. Also, columns
involved in the join cannot be qualified by a table name or alias.

USING Clause
With natural joins, Oracle implicitly identifies the columns that
form the basis of the join. Many situations require an explicit
declaration of the join columns. In such cases, we use the USING
clause to specify the joining criteria. Since the USING clause
joins the tables based on equality of columns, such a join is
also known as an equijoin, inner join, or simple join.

Syntax:
SELECT <column list>
FROM TABLE1 JOIN TABLE2
USING (column name)

In the SELECT query below, the EMPLOYEES and DEPARTMENTS tables
are joined using the common column DEPARTMENT_ID.

SELECT E.first_name NAME, D.department_name DNAME
FROM employees E JOIN departments D
USING (department_id);
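
The three ways of writing an equijoin shown so far (NATURAL JOIN, USING, and ON) can be compared side by side. The sketch below is an assumption-based illustration in SQLite via Python's sqlite3 module; SQLite accepts all three forms, and the two tables are reduced to the columns needed here, with the sample names taken from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (
        department_id   INTEGER PRIMARY KEY,
        department_name TEXT);
    CREATE TABLE employees (
        first_name    TEXT,
        department_id INTEGER REFERENCES departments (department_id));
    INSERT INTO departments VALUES (10, 'DEPT 1'), (20, 'DEPT 2');
    INSERT INTO employees VALUES
        ('MILLER', 10), ('JOHN', 10), ('MARTIN', 20), ('EDWIN', 20);
""")

queries = {
    "natural": "SELECT first_name, department_name "
               "FROM employees NATURAL JOIN departments ORDER BY first_name",
    "using":   "SELECT first_name, department_name "
               "FROM employees JOIN departments USING (department_id) "
               "ORDER BY first_name",
    "on":      "SELECT e.first_name, d.department_name "
               "FROM employees e JOIN departments d "
               "ON e.department_id = d.department_id ORDER BY e.first_name",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

print(results["natural"])
```

All three queries return the same rows here because department_id is the only column name the two tables share. If the tables had a second column with a common name, NATURAL JOIN would silently join on it as well, which is one reason to prefer USING or ON.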

Self Join
A self join produces a result table when the relationship of
interest exists among rows that are stored within a single
table. In other words, when a table is joined to itself, the
join is known as a self join.

Consider the EMPLOYEES table, which contains employees and their
reporting managers. Finding the manager's name for an employee
requires joining the EMPLOYEES table to itself. This is a typical
candidate for a self join.

SELECT e1.first_name Manager, e2.first_name Employee
FROM employees e1 JOIN employees e2
ON (e1.employee_id = e2.manager_id)
ORDER BY e2.manager_id DESC;
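
A self join is easier to follow with a concrete reporting hierarchy. In the sketch below, which runs in SQLite via Python's sqlite3 module, the four employees and their manager_id values are invented; the two aliases m and e refer to the same table in its manager role and its employee role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees "
             "(employee_id INTEGER PRIMARY KEY, first_name TEXT, "
             "manager_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "KING", None),   # top of the hierarchy, no manager
    (2, "JOHN", 1),
    (3, "EDWIN", 1),
    (4, "MILLER", 2),
])

# The same table appears twice under different aliases.
pairs = conn.execute("""
    SELECT m.first_name AS manager, e.first_name AS employee
    FROM employees m JOIN employees e
         ON m.employee_id = e.manager_id
    ORDER BY manager, employee
""").fetchall()

print(pairs)
```

KING does not appear in the employee column because the inner join drops the row whose manager_id is NULL.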

Non Equijoins
A non-equality join is used when the related columns can't be
joined with an equal sign, meaning there are no equivalent rows
in the tables to be joined. A non-equality join lets you store a
range's minimum value in one column of a record and the maximum
value in another column. So instead of finding a
column-to-column match, you can use a non-equality join to
determine whether the item being shipped falls between the
minimum and maximum values stored in those columns. If the join
finds a matching range for the item, the corresponding shipping
fee can be returned in the results. As with equality joins, a
non-equality join can be performed in a WHERE clause. In
addition, the JOIN keyword can be used with the ON clause to
specify the relevant columns for the join.

The query below matches each employee's salary to the salary
range (JOB_LOSAL to JOB_HISAL) that it falls in.

SELECT E.first_name,
       J.job_hisal,
       J.job_losal,
       E.salary
FROM employees E JOIN job_sal J
ON (E.salary BETWEEN J.job_losal AND J.job_hisal);

We can make use of all the comparison operators discussed
earlier, such as the relational (equality and inequality)
operators, BETWEEN, IS NULL, and IS NOT NULL.
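
A non-equijoin on a salary range can be tried with a few rows. The sketch below runs in SQLite via Python's sqlite3 module; the job_sal ranges and the extra grade column are hypothetical and exist only to make the matched range visible in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, salary INTEGER);
    CREATE TABLE job_sal (grade TEXT, job_losal INTEGER, job_hisal INTEGER);
    INSERT INTO employees VALUES ('JOHN', 6000), ('EDWIN', 2000), ('MARTIN', 4000);
    INSERT INTO job_sal VALUES
        ('LOW', 0, 2999), ('MID', 3000, 4999), ('HIGH', 5000, 9999);
""")

# Rows are paired when the salary lies inside a range, not when two
# column values are equal.
graded = conn.execute("""
    SELECT e.first_name, e.salary, j.grade
    FROM employees e JOIN job_sal j
         ON e.salary BETWEEN j.job_losal AND j.job_hisal
    ORDER BY e.first_name
""").fetchall()

print(graded)
```

The ranges must not overlap; otherwise an employee would be matched to more than one range and appear several times in the result.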

Outer Joins
An Outer Join is used to identify situations where rows in one
table do not match rows in a second table, even though the two
tables are related.

There are three types of outer joins: the LEFT, RIGHT, and FULL
OUTER JOIN. They all begin with an INNER JOIN, and then they
add back some of the rows that have been dropped. A LEFT
OUTER JOIN adds back all the rows that are dropped from the
first (left) table in the join condition, and output columns from
the second (right) table are set to NULL. A RIGHT OUTER JOIN
adds back all the rows that are dropped from the second (right)
table in the join condition, and output columns from the first
(left) table are set to NULL. The FULL OUTER JOIN adds back all
the rows that are dropped from both the tables.

Right Outer Join


A RIGHT OUTER JOIN adds back all the rows that are dropped
from the second (right) table in the join condition, and output
columns from the first (left) table are set to NULL. The query
below lists the employees and their corresponding departments.
No employee has been assigned to department 30, so that
department appears with NULL employee columns. The query uses
Oracle's (+) operator, which is placed on the side of the table
whose columns may be NULL.

SELECT E.first_name, E.salary, D.department_id
FROM employees E, departments D
WHERE E.DEPARTMENT_ID (+) = D.DEPARTMENT_ID;

FIRST_NAME     SALARY DEPARTMENT_ID
---------- ---------- -------------
JOHN             6000            10
EDWIN            2000            20
MILLER           2500            10
MARTIN           4000            20
                                 30

Left Outer Join


A LEFT OUTER JOIN adds back all the rows that are dropped from
the first (left) table in the join condition, and output columns
from the second (right) table are set to NULL. The query
demonstrated above can be turned into a left outer join by
listing DEPARTMENTS first, so that it becomes the left table; the
(+) sign stays on the EMPLOYEES side, whose columns may be NULL.

SELECT E.first_name, E.salary, D.department_id
FROM departments D, employees E
WHERE D.DEPARTMENT_ID = E.DEPARTMENT_ID (+);

FIRST_NAME     SALARY DEPARTMENT_ID
---------- ---------- -------------
JOHN             6000            10
EDWIN            2000            20
MILLER           2500            10
MARTIN           4000            20
                                 30

Full Outer Join


The FULL OUTER JOIN adds back all the rows that are dropped
from both the tables. The query below lists the employees and
their departments. Note that employee 'MAN' has not been
assigned to any department yet (the value is NULL) and
department 30 has no employees.

SELECT NVL (e.first_name, '-') first_name,
       NVL (TO_CHAR (d.department_id), '-') department_id
FROM employees e FULL OUTER JOIN departments d
ON e.department_id = d.department_id;

FIRST_NAME DEPARTMENT_ID
---------- --------------------
MAN        -
JOHN       10
EDWIN      20
MILLER     10
MARTIN     20
-          30

6 rows selected.
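
The outer joins above can be reproduced with the ANSI LEFT JOIN syntax. The sketch below is a stand-in under assumptions: it runs in SQLite via Python's sqlite3 module, which does not understand Oracle's (+) operator, and it builds the full outer join as the UNION of two left joins because older SQLite versions lack FULL OUTER JOIN. The rows mirror the sample output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY);
    CREATE TABLE employees (first_name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (10), (20), (30);
    INSERT INTO employees VALUES
        ('JOHN', 10), ('EDWIN', 20), ('MILLER', 10), ('MARTIN', 20),
        ('MAN', NULL);
""")

emp_left = ("SELECT e.first_name, d.department_id "
            "FROM employees e LEFT JOIN departments d "
            "ON e.department_id = d.department_id")
dept_left = ("SELECT e.first_name, d.department_id "
             "FROM departments d LEFT JOIN employees e "
             "ON e.department_id = d.department_id")

# Keeps every department, even the one without employees.
keep_departments = conn.execute(
    dept_left + " ORDER BY d.department_id, e.first_name").fetchall()

# Keeps every employee, even the one without a department.
keep_employees = conn.execute(
    emp_left + " ORDER BY e.first_name").fetchall()

# Full outer join emulated as the union of both left joins.
full_outer = conn.execute(emp_left + " UNION " + dept_left).fetchall()

print(keep_departments)
print(keep_employees)
print(len(full_outer))
```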

Cartesian product or Cross join


For two sets A and B, A * B is known as the Cartesian product. A
Cartesian product consists of all possible combinations of the
rows from each of the tables. Therefore, when a table with 10
rows is joined with a table with 20 rows, the Cartesian product
is 200 rows (10 * 20 = 200). For example, joining the employee
table with eight rows and the department table with three rows
will produce a Cartesian product table of 24 rows (8 * 3 = 24).

A cross join is the Cartesian product of two tables. It can be
written either by listing the tables without a join condition or
with the CROSS JOIN clause.

A Cartesian product result table is normally not very useful. In
fact, such a result table can be terribly misleading. If you
execute the query below for the EMPLOYEES and DEPARTMENTS tables,
the result table implies that every employee has a relationship
with every department, and we know that this is simply not the
case!

SELECT E.first_name, D.department_name
FROM employees E, departments D;

The same cross join can be written as:

SELECT E.first_name, D.department_name
FROM employees E CROSS JOIN departments D;
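
The 8 * 3 = 24 arithmetic can be verified directly. The sketch below, run in SQLite via Python's sqlite3 module, fills the two tables with placeholder names (EMP1 to EMP8 and DEPT 1 to DEPT 3 are invented) and counts the rows produced by both spellings of the cross join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT)")
conn.execute("CREATE TABLE departments (department_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(f"EMP{i}",) for i in range(1, 9)])      # 8 rows
conn.executemany("INSERT INTO departments VALUES (?)",
                 [("DEPT 1",), ("DEPT 2",), ("DEPT 3",)])  # 3 rows

# Explicit CROSS JOIN: every employee paired with every department.
pairs = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees e CROSS JOIN departments d
""").fetchall()

# Comma-separated tables with no join condition give the same product.
implicit_count = conn.execute(
    "SELECT COUNT(*) FROM employees, departments").fetchone()[0]

print(len(pairs), implicit_count)
```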

SQL - Sub Queries

SQL Subqueries
An SQL subquery is a SELECT query nested within another query. It
is also known as an inner query or nested query, and the query
containing it is the outer query.

The outer query can be a SELECT, INSERT, UPDATE, or DELETE
statement. We can use the subquery as a column expression, as a
condition in SQL clauses, and with operators like
=, >, <, >=, <=, IN, BETWEEN, etc.

Rules to be followed
Following are the rules to be followed while writing subqueries −

• Subqueries must be enclosed within parentheses.
• Subqueries can be nested within another subquery.
• A subquery must always contain a SELECT clause and a FROM
  clause.
• A subquery can contain all the clauses an ordinary SELECT
  statement can contain: GROUP BY, WHERE, HAVING, DISTINCT,
  TOP/LIMIT, etc. However, an ORDER BY clause is only used
  when a TOP clause is specified. It can't include a COMPUTE or
  FOR BROWSE clause.
• A subquery can return a single value, a single row, a single
  column, or a whole table. A subquery that returns a single
  value is called a scalar subquery.

Subqueries with the SELECT Statement


Subqueries are most frequently used with the SELECT statement.
The basic syntax is as follows −

SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
   (SELECT column_name [, column_name ]
    FROM table1 [, table2 ]
    [WHERE condition]);

Example

In the following query, we are creating a table named CUSTOMERS −

CREATE TABLE CUSTOMERS (
   ID INT NOT NULL,
   NAME VARCHAR(20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
);

Here, we are inserting records into the above-created table using
the INSERT INTO statement −

INSERT INTO CUSTOMERS VALUES
(1, 'Ramesh', 32, 'Ahmedabad', 2000.00),
(2, 'Khilan', 25, 'Delhi', 1500.00),
(3, 'Kaushik', 23, 'Kota', 2000.00),
(4, 'Chaitali', 25, 'Mumbai', 6500.00),
(5, 'Hardik', 27, 'Bhopal', 8500.00),
(6, 'Komal', 22, 'Hyderabad', 4500.00),
(7, 'Muffy', 24, 'Indore', 10000.00);

The table is displayed as −

ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    32   Ahmedabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
6   Komal     22   Hyderabad  4500.00
7   Muffy     24   Indore     10000.00

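Now, let us check the following subquery with a SELECT
statement. The inner query returns the IDs of the customers
whose salary is greater than 4500, and the outer query returns
the complete rows for those IDs.

SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > 4500);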

This would produce the following result −

ID  NAME      AGE  ADDRESS  SALARY
4   Chaitali  25   Mumbai   6500.00
5   Hardik    27   Bhopal   8500.00
7   Muffy     24   Indore   10000.00
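
The subquery above returns a list of IDs; a subquery can also return a single value. The sketch below loads the same CUSTOMERS rows into SQLite through Python's sqlite3 module (the ADDRESS column is left out for brevity) and runs both forms. The second query, which compares each salary with the average salary, is an added illustration and not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS "
             "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)", [
    (1, "Ramesh", 32, 2000.0), (2, "Khilan", 25, 1500.0),
    (3, "Kaushik", 23, 2000.0), (4, "Chaitali", 25, 6500.0),
    (5, "Hardik", 27, 8500.0), (6, "Komal", 22, 4500.0),
    (7, "Muffy", 24, 10000.0),
])

# Subquery returning a column of values, used with IN.
by_in = [row[0] for row in conn.execute("""
    SELECT NAME FROM CUSTOMERS
    WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > 4500)
    ORDER BY ID
""")]

# Scalar subquery: returns exactly one value (the average salary, 5000).
by_scalar = [row[0] for row in conn.execute("""
    SELECT NAME FROM CUSTOMERS
    WHERE SALARY > (SELECT AVG(SALARY) FROM CUSTOMERS)
    ORDER BY ID
""")]

print(by_in)
print(by_scalar)
```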

Subqueries with the INSERT Statement


Subqueries can also be used with the INSERT statement. The data
returned by the subquery is inserted into another table.

The basic syntax is as follows −

INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ * | column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]

Example

In the following example, we are creating another table,
CUSTOMERS_BKP, with the same structure as the CUSTOMERS table −

CREATE TABLE CUSTOMERS_BKP (
   ID INT NOT NULL,
   NAME VARCHAR(20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
);

Now, to copy all the records of the CUSTOMERS table into the
CUSTOMERS_BKP table, we can use the following query −

INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID FROM CUSTOMERS);

The above query produces the following output −

Query OK, 7 rows affected (0.01 sec)
Records: 7 Duplicates: 0 Warnings: 0

Verification

Using the SELECT statement, we can verify whether the records
from the CUSTOMERS table have been inserted into the
CUSTOMERS_BKP table −

SELECT * FROM CUSTOMERS_BKP;

The table will be displayed as −

ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    32   Ahmedabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
6   Komal     22   Hyderabad  4500.00
7   Muffy     24   Indore     10000.00

Subqueries with the UPDATE Statement


A subquery can also be used with the UPDATE statement. You can
update single or multiple columns in a table using a subquery.

The basic syntax is as follows −

UPDATE table
SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
   (SELECT COLUMN_NAME
    FROM TABLE_NAME
    [ WHERE condition ]) ];

Example

We have the CUSTOMERS_BKP table available, which is a backup of
the CUSTOMERS table. The following example multiplies SALARY by
0.25 in the CUSTOMERS table for all the customers whose AGE is
greater than or equal to 27.

UPDATE CUSTOMERS
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27 );

Following is the output of the above query −

Query OK, 2 rows affected (0.01 sec)
Rows matched: 2 Changed: 2 Warnings: 0

Verification

The update affects two rows. We can verify the contents of the
CUSTOMERS table using the SELECT statement shown below.

SELECT * FROM CUSTOMERS;

The table will be displayed as −

ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    32   Ahmedabad  500.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     2125.00
6   Komal     22   Hyderabad  4500.00
7   Muffy     24   Indore     10000.00

Subqueries with the DELETE Statement


A subquery can be used with the DELETE statement as well, as
with the other statements mentioned above.

The basic syntax is as follows −

DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
   (SELECT COLUMN_NAME
    FROM TABLE_NAME
    [ WHERE condition ]) ];

Example

We have a CUSTOMERS_BKP table available, which is a backup of
the CUSTOMERS table. The following example deletes the records
from the CUSTOMERS table for all the customers whose AGE is
greater than or equal to 27.

DELETE FROM CUSTOMERS
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27 );

The above query generates the following output −

Query OK, 2 rows affected (0.01 sec)

Verification

We can verify the contents of the CUSTOMERS table using the
SELECT statement shown below.

SELECT * FROM CUSTOMERS;

The table will be displayed as −

ID  NAME      AGE  ADDRESS    SALARY
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
6   Komal     22   Hyderabad  4500.00
7   Muffy     24   Indore     10000.00
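
The INSERT, UPDATE, and DELETE examples above form one sequence, which can be replayed end to end. The sketch below does so in SQLite through Python's sqlite3 module, under the assumption that the ADDRESS column can be dropped for brevity; the cursor attribute rowcount plays the role of the "rows affected" messages shown by the MySQL client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, SALARY REAL)"
conn.execute("CREATE TABLE CUSTOMERS " + schema)
conn.execute("CREATE TABLE CUSTOMERS_BKP " + schema)
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)", [
    (1, "Ramesh", 32, 2000.0), (2, "Khilan", 25, 1500.0),
    (3, "Kaushik", 23, 2000.0), (4, "Chaitali", 25, 6500.0),
    (5, "Hardik", 27, 8500.0), (6, "Komal", 22, 4500.0),
    (7, "Muffy", 24, 10000.0),
])

# INSERT with a subquery: copy every row into the backup table.
conn.execute("INSERT INTO CUSTOMERS_BKP SELECT * FROM CUSTOMERS "
             "WHERE ID IN (SELECT ID FROM CUSTOMERS)")
copied = conn.execute("SELECT COUNT(*) FROM CUSTOMERS_BKP").fetchone()[0]

older = "(SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)"

# UPDATE with a subquery: customers aged 27 or more get 0.25 * salary.
updated = conn.execute(
    f"UPDATE CUSTOMERS SET SALARY = SALARY * 0.25 WHERE AGE IN {older}"
).rowcount
salaries = dict(conn.execute(
    "SELECT NAME, SALARY FROM CUSTOMERS WHERE AGE >= 27"))

# DELETE with the same subquery removes those two customers.
deleted = conn.execute(
    f"DELETE FROM CUSTOMERS WHERE AGE IN {older}").rowcount
remaining = [row[0] for row in
             conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]

print(copied, updated, salaries, deleted, remaining)
```

The subquery reads from CUSTOMERS_BKP rather than from CUSTOMERS itself. One reason for this choice is that MySQL does not allow a statement to modify a table and select from the same table in a subquery.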


SET Operators in SQL
SET operators are a special type of operator used to combine the results of two queries.

Operators covered under SET operators are:

1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS

Certain rules must be followed when using SET operators in SQL:

1. The number and order of columns must be the same in both queries.
2. The data types of corresponding columns must be compatible.

Let us see each of the SET operators in more detail with the help of examples.

All the examples will be written using the MySQL database.

Consider the following tables with the given data.

Table 1: t_employees

ID  Name             Department   Salary  Year_of_Experience
1   Aakash Singh     Development  72000   2
2   Abhishek Pawar   Production   45000   1
3   Pranav Deshmukh  HR           59900   3
4   Shubham Mahale   Accounts     57000   2
5   Sunil Kulkarni   Development  87000   3
6   Bhushan Wagh     R&D          75000   2
7   Paras Jaiswal    Marketing    32000   1

Table 2: t2_employees

ID  Name             Department   Salary  Year_of_Experience
1   Prashant Wagh    R&D          49000   1
2   Abhishek Pawar   Production   45000   1
3   Gautam Jain      Development  56000   4
4   Shubham Mahale   Accounts     57000   2
5   Rahul Thakur     Production   76000   4
6   Bhushan Wagh     R&D          75000   2
7   Anand Singh      Marketing    28000   1

Table 3: t_students

ID  Name             Hometown   Percentage  Favourite_Subject
1   Soniya Jain      Udaipur    89          Physics
2   Harshada Sharma  Kanpur     92          Chemistry
3   Anuja Rajput     Jaipur     78          History
4   Pranali Singh    Nashik     88          Geography
5   Renuka Deshmukh  Panipat    90          Biology
6   Swati Kumari     Faridabad  93          English
7   Prachi Jaiswal   Gurugram   96          Hindi

Table 4: t2_students

ID  Name             Hometown   Percentage  Favourite_Subject
1   Soniya Jain      Udaipur    89          Physics
2   Ishwari Dixit    Delhi      86          Hindi
3   Anuja Rajput     Jaipur     78          History
4   Pakhi Arora      Surat      70          Sanskrit
5   Renuka Deshmukh  Panipat    90          Biology
6   Jayshree Patel   Pune       91          Maths
7   Prachi Jaiswal   Gurugram   96          Hindi

1. UNION:

o UNION is used to combine the results of two SELECT statements.
o Duplicate rows are eliminated from the results obtained after performing the UNION operation.

Example 1:

Write a query to perform a union between the table t_employees and the table t2_employees.

Query:

mysql> SELECT * FROM t_employees UNION SELECT * FROM t2_employees;

Here, in a single query, we have written two SELECT queries. The first SELECT query fetches the
records from the t_employees table, and the UNION operation combines them with the records
fetched by the second SELECT query from the t2_employees table.

You will get the following output:

ID | Name            | Department  | Salary | Year_of_Experience
1  | Aakash Singh    | Development | 72000  | 2
2  | Abhishek Pawar  | Production  | 45000  | 1
3  | Pranav Deshmukh | HR          | 59900  | 3
4  | Shubham Mahale  | Accounts    | 57000  | 2
5  | Sunil Kulkarni  | Development | 87000  | 3
6  | Bhushan Wagh    | R&D         | 75000  | 2
7  | Paras Jaiswal   | Marketing   | 32000  | 1
1  | Prashant Wagh   | R&D         | 49000  | 1
3  | Gautam Jain     | Development | 56000  | 4
5  | Rahul Thakur    | Production  | 76000  | 4
7  | Anand Singh     | Marketing   | 28000  | 1

Since we performed a UNION operation between the two tables, all the records from the first and
second tables are displayed, with each duplicate record shown only once.

Example 2:

Write a query to perform union between the table t_students and the table t2_students.

Query:

mysql> SELECT * FROM t_students UNION SELECT * FROM t2_students;

Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the
records from the t_students table and perform a UNION operation with the records fetched by the
second SELECT query from the t2_students table.

You will get the following output:

ID | Name            | Hometown  | Percentage | Favourite_Subject
1  | Soniya Jain     | Udaipur   | 89         | Physics
2  | Harshada Sharma | Kanpur    | 92         | Chemistry
3  | Anuja Rajput    | Jaipur    | 78         | History
4  | Pranali Singh   | Nashik    | 88         | Geography
5  | Renuka Deshmukh | Panipat   | 90         | Biology
6  | Swati Kumari    | Faridabad | 93         | English
7  | Prachi Jaiswal  | Gurugram  | 96         | Hindi
2  | Ishwari Dixit   | Delhi     | 86         | Hindi
4  | Pakhi Arora     | Surat     | 70         | Sanskrit
6  | Jayshree Patel  | Pune      | 91         | Maths

Since we performed a UNION operation between the two tables, all the records from the first and
second tables are displayed, with each duplicate record shown only once.
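
The UNION result for the employee tables can be reproduced outside MySQL. The following is a minimal sketch, not part of the original notes: it assumes Python's built-in sqlite3 module with an in-memory database in place of the mysql client, and the helper name load_employees is hypothetical. The rows are copied from Table 1 and Table 2. The row order of a UNION without ORDER BY is not guaranteed, so the sketch checks only the number of rows.

```python
import sqlite3

# Rows copied from Table 1 (t_employees) and Table 2 (t2_employees).
T_EMPLOYEES = [
    (1, "Aakash Singh", "Development", 72000, 2),
    (2, "Abhishek Pawar", "Production", 45000, 1),
    (3, "Pranav Deshmukh", "HR", 59900, 3),
    (4, "Shubham Mahale", "Accounts", 57000, 2),
    (5, "Sunil Kulkarni", "Development", 87000, 3),
    (6, "Bhushan Wagh", "R&D", 75000, 2),
    (7, "Paras Jaiswal", "Marketing", 32000, 1),
]
T2_EMPLOYEES = [
    (1, "Prashant Wagh", "R&D", 49000, 1),
    (2, "Abhishek Pawar", "Production", 45000, 1),
    (3, "Gautam Jain", "Development", 56000, 4),
    (4, "Shubham Mahale", "Accounts", 57000, 2),
    (5, "Rahul Thakur", "Production", 76000, 4),
    (6, "Bhushan Wagh", "R&D", 75000, 2),
    (7, "Anand Singh", "Marketing", 28000, 1),
]


def load_employees():
    """Hypothetical helper: build an in-memory SQLite database with both tables."""
    conn = sqlite3.connect(":memory:")
    for table, rows in (("t_employees", T_EMPLOYEES), ("t2_employees", T2_EMPLOYEES)):
        conn.execute(
            f"CREATE TABLE {table} (ID INTEGER, Name TEXT, Department TEXT, "
            "Salary INTEGER, Year_of_Experience INTEGER)"
        )
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)
    return conn


conn = load_employees()
union_rows = conn.execute(
    "SELECT * FROM t_employees UNION SELECT * FROM t2_employees"
).fetchall()

# 7 + 7 input rows, minus the 3 rows that appear in both tables.
print(len(union_rows))  # 11
for row in union_rows:
    print(row)
```

A row counts as a duplicate only when every column matches, which is why ID 1 appears twice in the output: the two tables hold different employees under that ID.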

2. UNION ALL
o This operator combines all the records from both the queries.
o Duplicate rows are not eliminated from the result of the UNION ALL operation.

Example 1:

Write a query to perform union all operation between the table t_employees and the table
t2_employees.

Query:

mysql> SELECT * FROM t_employees UNION ALL SELECT * FROM t2_employees;

Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the
records from the t_employees table and perform UNION ALL operation with the records fetched
by the second SELECT query from the t2_employees table.

You will get the following output:

ID | Name            | Department  | Salary | Year_of_Experience
1  | Aakash Singh    | Development | 72000  | 2
2  | Abhishek Pawar  | Production  | 45000  | 1
3  | Pranav Deshmukh | HR          | 59900  | 3
4  | Shubham Mahale  | Accounts    | 57000  | 2
5  | Sunil Kulkarni  | Development | 87000  | 3
6  | Bhushan Wagh    | R&D         | 75000  | 2
7  | Paras Jaiswal   | Marketing   | 32000  | 1
1  | Prashant Wagh   | R&D         | 49000  | 1
2  | Abhishek Pawar  | Production  | 45000  | 1
3  | Gautam Jain     | Development | 56000  | 4
4  | Shubham Mahale  | Accounts    | 57000  | 2
5  | Rahul Thakur    | Production  | 76000  | 4
6  | Bhushan Wagh    | R&D         | 75000  | 2
7  | Anand Singh     | Marketing   | 28000  | 1

Since we performed a UNION ALL operation between the two tables, all the records from the first and
second tables are displayed, including the duplicate records.

Example 2:

Write a query to perform union all operation between the table t_students and the table
t2_students.

Query:

mysql> SELECT * FROM t_students UNION ALL SELECT * FROM t2_students;

Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the
records from the t_students table and perform UNION ALL operation with the records fetched by
the second SELECT query from the t2_students table.

You will get the following output:

ID | Name            | Hometown  | Percentage | Favourite_Subject
1  | Soniya Jain     | Udaipur   | 89         | Physics
2  | Harshada Sharma | Kanpur    | 92         | Chemistry
3  | Anuja Rajput    | Jaipur    | 78         | History
4  | Pranali Singh   | Nashik    | 88         | Geography
5  | Renuka Deshmukh | Panipat   | 90         | Biology
6  | Swati Kumari    | Faridabad | 93         | English
7  | Prachi Jaiswal  | Gurugram  | 96         | Hindi
1  | Soniya Jain     | Udaipur   | 89         | Physics
2  | Ishwari Dixit   | Delhi     | 86         | Hindi
3  | Anuja Rajput    | Jaipur    | 78         | History
4  | Pakhi Arora     | Surat     | 70         | Sanskrit
5  | Renuka Deshmukh | Panipat   | 90         | Biology
6  | Jayshree Patel  | Pune      | 91         | Maths
7  | Prachi Jaiswal  | Gurugram  | 96         | Hindi


Since we performed a UNION ALL operation between the two tables, all the records from the first and
second tables are displayed, including the duplicate records.
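
To see which rows UNION ALL keeps twice, the student tables can be loaded into a scratch database and the result counted. This is a minimal sketch under assumptions that are not in the original notes: it uses Python's built-in sqlite3 module with an in-memory database in place of the mysql client, and the variable names are illustrative. The rows are copied from Table 3 and Table 4.

```python
import sqlite3
from collections import Counter

# Rows copied from Table 3 (t_students) and Table 4 (t2_students).
T_STUDENTS = [
    (1, "Soniya Jain", "Udaipur", 89, "Physics"),
    (2, "Harshada Sharma", "Kanpur", 92, "Chemistry"),
    (3, "Anuja Rajput", "Jaipur", 78, "History"),
    (4, "Pranali Singh", "Nashik", 88, "Geography"),
    (5, "Renuka Deshmukh", "Panipat", 90, "Biology"),
    (6, "Swati Kumari", "Faridabad", 93, "English"),
    (7, "Prachi Jaiswal", "Gurugram", 96, "Hindi"),
]
T2_STUDENTS = [
    (1, "Soniya Jain", "Udaipur", 89, "Physics"),
    (2, "Ishwari Dixit", "Delhi", 86, "Hindi"),
    (3, "Anuja Rajput", "Jaipur", 78, "History"),
    (4, "Pakhi Arora", "Surat", 70, "Sanskrit"),
    (5, "Renuka Deshmukh", "Panipat", 90, "Biology"),
    (6, "Jayshree Patel", "Pune", 91, "Maths"),
    (7, "Prachi Jaiswal", "Gurugram", 96, "Hindi"),
]

conn = sqlite3.connect(":memory:")
for table, rows in (("t_students", T_STUDENTS), ("t2_students", T2_STUDENTS)):
    conn.execute(
        f"CREATE TABLE {table} (ID INTEGER, Name TEXT, Hometown TEXT, "
        "Percentage INTEGER, Favourite_Subject TEXT)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)

all_rows = conn.execute(
    "SELECT * FROM t_students UNION ALL SELECT * FROM t2_students"
).fetchall()

# Count how often each complete row occurs in the combined result.
occurrences = Counter(all_rows)
duplicated_ids = sorted(row[0] for row, count in occurrences.items() if count == 2)

print(len(all_rows))    # 14: nothing is removed
print(duplicated_ids)   # [1, 3, 5, 7]: rows present in both tables
```

The four rows reported as duplicated are exactly the rows that UNION would collapse into one, which accounts for the difference between 14 rows here and 10 rows in the UNION output.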

3. INTERSECT:
o INTERSECT combines two SELECT statements but returns only the records that are common to both
SELECT statements. In MySQL, INTERSECT is available from version 8.0.31 onward.

Example 1:

Write a query to perform intersect operation between the table t_employees and the table
t2_employees.

Query:

mysql> SELECT * FROM t_employees INTERSECT SELECT * FROM t2_employees;

Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the
records from the t_employees table and perform INTERSECT operation with the records fetched by
the second SELECT query from the t2_employees table.

You will get the following output:

ID | Name           | Department | Salary | Year_of_Experience
2  | Abhishek Pawar | Production | 45000  | 1
4  | Shubham Mahale | Accounts   | 57000  | 2
6  | Bhushan Wagh   | R&D        | 75000  | 2

Since we performed an INTERSECT operation between the two tables, only the records common to both
tables are displayed.

Example 2:

Write a query to perform intersect operation between the table t_students and the table
t2_students.

Query:

mysql> SELECT * FROM t_students INTERSECT SELECT * FROM t2_students;

Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the
records from the t_students table and perform an INTERSECT operation with the records fetched by
the second SELECT query from the t2_students table.

You will get the following output:

ID | Name            | Hometown | Percentage | Favourite_Subject
1  | Soniya Jain     | Udaipur  | 89         | Physics
3  | Anuja Rajput    | Jaipur   | 78         | History
5  | Renuka Deshmukh | Panipat  | 90         | Biology
7  | Prachi Jaiswal  | Gurugram | 96         | Hindi

Since we performed an INTERSECT operation between the two tables, only the records common to both
tables are displayed.
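
The INTERSECT result for the employee tables can be checked in the same way. This is a minimal sketch, not part of the original notes: it assumes Python's built-in sqlite3 module with an in-memory database in place of the mysql client, and it adds an ORDER BY ID clause (not in the original query) so that the printed order is deterministic. The rows are copied from Table 1 and Table 2.

```python
import sqlite3

# Rows copied from Table 1 (t_employees) and Table 2 (t2_employees).
T_EMPLOYEES = [
    (1, "Aakash Singh", "Development", 72000, 2),
    (2, "Abhishek Pawar", "Production", 45000, 1),
    (3, "Pranav Deshmukh", "HR", 59900, 3),
    (4, "Shubham Mahale", "Accounts", 57000, 2),
    (5, "Sunil Kulkarni", "Development", 87000, 3),
    (6, "Bhushan Wagh", "R&D", 75000, 2),
    (7, "Paras Jaiswal", "Marketing", 32000, 1),
]
T2_EMPLOYEES = [
    (1, "Prashant Wagh", "R&D", 49000, 1),
    (2, "Abhishek Pawar", "Production", 45000, 1),
    (3, "Gautam Jain", "Development", 56000, 4),
    (4, "Shubham Mahale", "Accounts", 57000, 2),
    (5, "Rahul Thakur", "Production", 76000, 4),
    (6, "Bhushan Wagh", "R&D", 75000, 2),
    (7, "Anand Singh", "Marketing", 28000, 1),
]

conn = sqlite3.connect(":memory:")
for table, rows in (("t_employees", T_EMPLOYEES), ("t2_employees", T2_EMPLOYEES)):
    conn.execute(
        f"CREATE TABLE {table} (ID INTEGER, Name TEXT, Department TEXT, "
        "Salary INTEGER, Year_of_Experience INTEGER)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)

# Only rows that match in every column appear in both inputs.
common = conn.execute(
    "SELECT * FROM t_employees INTERSECT SELECT * FROM t2_employees ORDER BY ID"
).fetchall()

for row in common:
    print(row)
# (2, 'Abhishek Pawar', 'Production', 45000, 1)
# (4, 'Shubham Mahale', 'Accounts', 57000, 2)
# (6, 'Bhushan Wagh', 'R&D', 75000, 2)
```

Swapping the two SELECT statements gives the same three rows, since a row is common to both tables regardless of which table is named first.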

4. MINUS

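o MINUS returns the rows that are present in the result of the first query but absent from the
result of the second query, with duplicates removed.
o MINUS is Oracle syntax. MySQL does not accept the MINUS keyword; MySQL 8.0.31 and later provide
the equivalent EXCEPT operator.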

Example 1:

Write a query to perform a minus operation between the table t_employees and the table
t2_employees.

Query:

mysql> SELECT * FROM t_employees MINUS SELECT * FROM t2_employees;

Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the
records from the t_employees table and perform MINUS operation with the records fetched by the
second SELECT query from the t2_employees table.

You will get the following output:

ID | Name            | Department  | Salary | Year_of_Experience
1  | Aakash Singh    | Development | 72000  | 2
3  | Pranav Deshmukh | HR          | 59900  | 3
5  | Sunil Kulkarni  | Development | 87000  | 3
7  | Paras Jaiswal   | Marketing   | 32000  | 1

Since we performed a MINUS operation between the two tables, only the records of the first table
that have no matching record in the second table are displayed.

Example 2:

Write a query to perform a minus operation between the table t_students and the table t2_students.

Query:

mysql> SELECT * FROM t_students MINUS SELECT * FROM t2_students;

Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the
records from the t_students table and perform a MINUS operation with the records fetched by
the second SELECT query from the t2_students table.

You will get the following output:

ID | Name            | Hometown  | Percentage | Favourite_Subject
2  | Harshada Sharma | Kanpur    | 92         | Chemistry
4  | Pranali Singh   | Nashik    | 88         | Geography
6  | Swati Kumari    | Faridabad | 93         | English


Since we performed a MINUS operation between the two tables, only the records of the first table
that have no matching record in the second table are displayed.
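
Unlike UNION and INTERSECT, the MINUS operation depends on which table is named first. The following is a minimal sketch, not part of the original notes: it assumes Python's built-in sqlite3 module with an in-memory database in place of the mysql client. SQLite spells the operator EXCEPT and does not accept the keyword MINUS, so EXCEPT is used here as a stand-in. The ORDER BY ID clause and the variable names are additions for illustration. The rows are copied from Table 3 and Table 4.

```python
import sqlite3

# Rows copied from Table 3 (t_students) and Table 4 (t2_students).
T_STUDENTS = [
    (1, "Soniya Jain", "Udaipur", 89, "Physics"),
    (2, "Harshada Sharma", "Kanpur", 92, "Chemistry"),
    (3, "Anuja Rajput", "Jaipur", 78, "History"),
    (4, "Pranali Singh", "Nashik", 88, "Geography"),
    (5, "Renuka Deshmukh", "Panipat", 90, "Biology"),
    (6, "Swati Kumari", "Faridabad", 93, "English"),
    (7, "Prachi Jaiswal", "Gurugram", 96, "Hindi"),
]
T2_STUDENTS = [
    (1, "Soniya Jain", "Udaipur", 89, "Physics"),
    (2, "Ishwari Dixit", "Delhi", 86, "Hindi"),
    (3, "Anuja Rajput", "Jaipur", 78, "History"),
    (4, "Pakhi Arora", "Surat", 70, "Sanskrit"),
    (5, "Renuka Deshmukh", "Panipat", 90, "Biology"),
    (6, "Jayshree Patel", "Pune", 91, "Maths"),
    (7, "Prachi Jaiswal", "Gurugram", 96, "Hindi"),
]

conn = sqlite3.connect(":memory:")
for table, rows in (("t_students", T_STUDENTS), ("t2_students", T2_STUDENTS)):
    conn.execute(
        f"CREATE TABLE {table} (ID INTEGER, Name TEXT, Hometown TEXT, "
        "Percentage INTEGER, Favourite_Subject TEXT)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)

# Rows of t_students that do not occur in t2_students (EXCEPT stands in for MINUS).
only_in_first = conn.execute(
    "SELECT * FROM t_students EXCEPT SELECT * FROM t2_students ORDER BY ID"
).fetchall()

# Reversing the operands gives a different result.
only_in_second = conn.execute(
    "SELECT * FROM t2_students EXCEPT SELECT * FROM t_students ORDER BY ID"
).fetchall()

print([row[1] for row in only_in_first])
# ['Harshada Sharma', 'Pranali Singh', 'Swati Kumari']
print([row[1] for row in only_in_second])
# ['Ishwari Dixit', 'Pakhi Arora', 'Jayshree Patel']
```

Both results contain IDs 2, 4 and 6, but they hold different students, because each query keeps only the rows of the table that is named first.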