DBMS Notes Full
Database Systems
1. Introducing the database and DBMS
Data:
• Data is defined as a collection of raw facts about a place, person, thing or object involved in the
transactions of an organization.
• Data can be represented in various forms like text, numbers, images, audio, video, graphs,
document files, etc.
• Data constitutes the building blocks of information.
• Data is one of the important assets of the modern business.
• Data becomes relevant based on the context.
Information
• Information can be defined as processed data that increases the knowledge of the end user.
• Information is used to reveal the meaning of data.
• Good, accurate and timely information is used in decision making.
• The quality of data influences the quality of information.
• Information can be presented in the tabular form, bar graph or an image.
Metadata
• Metadata is data that describes the characteristics or properties of other data.
• Metadata consists of name, data type, length, min, max, description, special constraints.
• Metadata allows database designers and users to understand what data exists and what the data
means.
• Metadata is generally stored in a repository.
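As a minimal sketch of what such metadata looks like in practice, the snippet below uses Python's built-in sqlite3 module and asks SQLite to describe a table. The customer table and its columns are hypothetical, chosen only for illustration; other DBMS products expose the same kind of information through their own catalog tables.

```python
import sqlite3

# Hypothetical table, created only so that there is something to describe.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " cust_name VARCHAR(40) NOT NULL,"
    " balance NUMERIC DEFAULT 0)"
)

# PRAGMA table_info returns one row of metadata per column:
# (position, name, declared type, not-null flag, default value, primary-key flag)
metadata = con.execute("PRAGMA table_info(customer)").fetchall()
for position, name, col_type, not_null, default, pk in metadata:
    print(position, name, col_type, not_null, default, pk)
```

Each printed row is data about the data: the column name, its type, whether a value is required, and its default.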
Database:
• A database can be defined as an organized collection of logically related data.
• Database can be of any size and complexity.
• Data are structured so as to be easily stored, manipulated, and retrieved by users.
• Example: A salesperson can store customer contacts on a laptop, amounting to a few megabytes
of data, while a big company can store the data of all activities in the organization to help with
decision making.
DBMS:
• A database management system (DBMS) can be defined as a collection of programs that manages the
database structure and controls access to the data, including creating, storing, updating and
retrieving data from the database.
• DBMS acts as a mediator between end-user and the database.
• DBMS enables data to be shared.
• DBMS integrates many users’ views of the data.
Advantages of a DBMS:
• Improved data sharing: A database is designed as a shared resource. Authorized users are granted
permission to use the database, and each user is provided one or more user views. DBMS provides better
access to data and better-managed data.
• Improved data security: As the number of users accessing the data increases, the risk to data security
increases. A DBMS provides a framework for better enforcement of data privacy and security
policies. A database can be accessed only after proper authentication, usually by verifying a login and
password.
• Better data integration: DBMS integrates the many different users' views into a single data
repository. This gives a clear picture of the organization's operations. It becomes much easier to see how
actions in one segment of the company affect other segments.
• Minimized data inconsistency: Data inconsistency exists when different versions of the same data
appear in different places. In a DBMS, by eliminating this data redundancy, we can improve data
consistency. For example, if a customer address is stored only once, updating that becomes simple.
• Improved data access: The DBMS makes it possible to produce quick answers to any queries. A
query is a request or a question put to the DBMS for data manipulation or retrieval. Without any
programming experience, one can retrieve and display data very easily. The language used to write
queries is called Structured Query Language (SQL). For example, records from EMP table can be
displayed using the query “SELECT * FROM EMP”
• Improved decision making: Nowadays, business success depends on decision making, which is based
on quality information generated by databases. In a DBMS, better-managed data and improved data
access make it possible to generate quality information, on which better decisions are based.
• Program-Data Independence: The separation of data description (metadata) from the application
programs that use the data is called data independence. With the database approach, data descriptions
are stored in a central location called the repository. This allows an organization’s data to change
without changing the application programs that process the data.
In a file system, by contrast, when a file's structure changes, every program
accessing this file's data will have to change to access the changed file successfully. The reason for this
is that the file data characteristics are coded within the program.
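The EMP query mentioned under improved data access can be tried out with Python's built-in sqlite3 module. This is only a sketch: the columns of EMP and the sample rows are assumptions, since the notes do not define the table.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Assumed structure for EMP; the notes only give the table name.
con.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT)")
con.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [(1, "Asha", "CLERK"), (2, "Ravi", "MANAGER")],
)

# The query from the notes: display every record in EMP.
rows = con.execute("SELECT * FROM EMP").fetchall()
for row in rows:
    print(row)

# A narrower question put to the DBMS: which employees are clerks?
clerks = con.execute("SELECT ENAME FROM EMP WHERE JOB = ?", ("CLERK",)).fetchall()
print(clerks)
```

The program states what data it wants, not how the rows are stored or located; that part is left to the DBMS.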
Data Redundancy
• Data redundancy results in data inconsistency
• Data inconsistency happens when different and conflicting versions of the same data appear in different
places in the same file or in multiple files
• Errors are more likely to occur when complex entries are made in several different files and recur
frequently in one or more files.
• Data anomalies develop when required changes in redundant data are not made successfully.
Data Anomalies:
• Modification anomalies: Occur when changes must be made to existing records. In a large-scale
system, a single change might have to be repeated in hundreds or even thousands of records.
• Insertion anomalies: Occur when entering new records.
• Deletion anomalies: Occur when deleting records.
Because of these anomalies, the potential for data inconsistencies in the file system is great.
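A modification anomaly can be reproduced in a few lines. The sketch below uses Python's sqlite3 module with a hypothetical flat table, orders_flat, that repeats the customer's address on every order row in the way a flat file would; the names and sample values are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical flat "file": the customer's address is repeated on every order row.
con.execute("CREATE TABLE orders_flat (order_no INTEGER, cust_name TEXT, cust_addr TEXT)")
con.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Kumar", "12 Lake Rd"), (2, "Kumar", "12 Lake Rd"), (3, "Kumar", "12 Lake Rd")],
)

# Modification anomaly: the address change reaches only one of the three copies.
con.execute("UPDATE orders_flat SET cust_addr = '9 Hill St' WHERE order_no = 1")

addresses = con.execute(
    "SELECT DISTINCT cust_addr FROM orders_flat"
    " WHERE cust_name = 'Kumar' ORDER BY cust_addr"
).fetchall()
print(addresses)  # two conflicting addresses for the same customer
```

If the address were stored once, in a separate customer table, the same change would be a single-row update and no conflicting copy could remain.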
3. Database Systems
• Database system consists of logically related data stored in a single logical data repository.
• Database system may be physically distributed among multiple storage facilities
• DBMS eliminates most of file system’s problems.
• The current generation of DBMS software stores data structures, the relationships between those
structures, and access paths. It also defines, stores, and manages all access paths and components.
Database Environment
• Database system: defines and regulates the collection, storage, management, and use of data
• Five major parts of a database system:
– Hardware
– Software
– People
– Procedures
– Data
1) Hardware: all the system’s physical devices
2) Software: To make the database system work properly, three types of software are needed: operating system,
DBMS software, and application programs.
a) Operating system: It manages all hardware components and allows other software to
run on the computers. Examples of operating system software include Windows and Linux.
b) DBMS software: It manages the database within the database system. Some examples
of DBMS software include Oracle, Access and MySQL.
c) Application programs: These are used to access and manipulate data in the DBMS and
to manage the computer environment in which data access and manipulation take
place. Application programs are most commonly used to access data to generate
reports. Most of the application programs provide GUI.
3) People: This component includes all users of the database system. According to the nature of the job, five types of
users can be identified: systems administrators, database administrators, database designers, systems analysts
and programmers, and end users.
a) System administrators: They supervise the database system's general operations.
b) Database administrators: They are also known as DBAs. They manage the DBMS and ensure that the
database is functioning properly.
c) Database designers: They design the database structure. They are the database architects. Because the
design is so critical, the designer carries a large share of the responsibility.
d) Systems analysts and programmers: They design and implement the application programs. They
design and create the data entry screens, reports, and procedures through which end users can access and
manipulate the data.
e) End users: They are the people who use the application programs to run the organization's daily
operations. For example, sales-clerks, supervisors, managers are classified as end users.
4) Procedures: Procedures are the instructions and rules that govern the design and use of the database
system. Procedures are a critical component of the system. They play an important role in a company
because they enforce the standards by which business is conducted in an organization.
5) Data: Data refers to the collection of facts stored in the database. Because data are the raw
material from which information is generated, no database can exist without data.
4. Types of databases
• Databases can be classified according to:
– Number of users
– Database location(s)
– Expected type and extent of use
• Single-user database supports only one user at a time
– Desktop database: single-user; runs on PC
• Multiuser database supports multiple users at the same time
– Workgroup and enterprise databases
• Centralized database: data located at a single site
• Distributed database: data distributed across several different sites
• Operational database: supports a company’s day-to-day operations
– Transactional or production database
• Data warehouse: stores data used for tactical or strategic decisions
5. Functions of DBMS
• Most functions are transparent to end users and can only be achieved through the DBMS
• Functions of DBMS are
• Data dictionary management
– DBMS stores definitions of data elements and relationships (metadata) in a data dictionary
– DBMS looks up required data component structures and relationships
– Changes are automatically recorded in the dictionary
– DBMS provides data abstraction and removes structural and data dependency
• Data storage management
– DBMS creates and manages complex structures required for data storage
– Stores related data entry forms, screen definitions, report definitions, etc.
– Performance tuning: activities that make the database perform more efficiently
– DBMS stores the database in multiple physical data files
• Data transformation and presentation
– DBMS transforms data entered to conform to required data structures.
– DBMS transforms physically retrieved data to conform to user’s logical expectations.
• Security management
– DBMS creates a security system that enforces user security and data privacy.
– Security rules determine which users can access the database, which items can be accessed,
etc.
• Multiuser access control
– DBMS uses sophisticated algorithms to ensure concurrent access does not affect integrity.
• Backup and recovery management
– DBMS provides backup and data recovery to ensure data safety and integrity
– Recovery management deals with recovery of database after a failure
• Data integrity management
– DBMS promotes and enforces integrity rules in order to minimize redundancy and
maximize consistency
– Data relationships stored in data dictionary used to enforce data integrity
– Integrity is especially important in transaction-oriented database systems
• Database access languages and application programming interfaces
– DBMS provides access through a query language
– Query language is a nonprocedural language
– Structured Query Language (SQL) is the de facto query language supported by the majority of
DBMS vendors
• Database communication interfaces
– Current DBMSs accept end-user requests via multiple different network environments
– End users generate answers to queries by filling in screen forms through a Web browser
– DBMS automatically publishes predefined reports on a Web site
– DBMS connects to third-party systems to distribute information via e-mail
New, Specialized Personnel: Frequently, organizations that adopt the database approach need to hire or
train individuals to design and implement databases. This personnel increase seems to be expensive, but
an organization should not minimize the need for these specialized skills.
Installation and Management Cost and Complexity: A multi-user database management system is
large and complex software that has a high initial cost. It requires trained personnel to install and
operate, and also has annual maintenance costs. Installing such a system may also require upgrades to
the hardware and data communications systems in the organization.
Conversion Costs: The term “legacy systems” is used to refer to older applications in an organization
that are based on file processing. The cost of converting these older systems to modern database
technology may seem prohibitive to an organization.
Need for Explicit Backup and Recovery: A shared database must be accurate and available at all
times. This raises the need to have backup copies of data for restoring a database when damage
occurs. A modern database management system normally automates recovery tasks.
Organizational Conflict: A database requires an agreement on data definitions and ownership as well
as responsibilities for accurate data maintenance. Conflicts over data definitions, data formats and
coding, and the right to update shared data are common. Handling these issues requires organizational
commitment to the database approach.
Entity-Relationship Model:
An entity-relationship model (E-R model) is a systematic way of describing and
defining a business process. An E-R model is typically implemented as a database. The
E-R model defines the conceptual view of a database, and is based on the notion of
real-world entities and relationships among them. When a real-world scenario is translated
into the database model, the E-R model yields entity sets, relationship sets, general
attributes and constraints.
Entity
An entity is a real-world object, either animate or inanimate, that is easily
identifiable. For example, in a college database, students, teachers, classes, and courses
offered can be considered as entities. All these entities have some attributes or properties that
give them their identity.
An entity set is a collection of similar types of entities. The entities in a set share the same
attributes, and their values may be similar. For example, a Students set may contain all the students of a
college; likewise a Teachers set may contain all the teachers of a college from all faculties.
Entity sets need not be disjoint.
Entities are represented by means of rectangles. Rectangles are named with the entity
set they represent.
Attributes
An attribute is a characteristic of an entity. Entities are represented by means of their
properties, called attributes. All attributes have values. For example, a student entity may
have name, class, and age as attributes. There exists a domain or range of values that can be
assigned to each attribute. For example, a student's name cannot be a numeric value; it has to be
alphabetic. A student's age cannot be negative. An attribute is represented by an oval.
Relationship
The association among entities is called a relationship. For example, an employee works_at a
department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
A relationship is represented by a diamond shape.
Relationship Set: A set of relationships of similar type is called a relationship set. Like entities,
a relationship too can have attributes. These attributes are called descriptive attributes.
Types of Entities:
Weak Entity: A weak entity is an entity that depends on another entity. A weak entity does not
have a key attribute (primary key) of its own. In other words, an entity set which does not
have sufficient attributes to form a primary key is called a weak entity set. A double rectangle
represents a weak entity.
Strong Entity: An entity which has an independent existence is called a strong entity. A strong
entity set has its own primary key.
Types of Attributes:
• Simple attribute − Simple attributes are atomic values, which cannot be divided further.
For example, a student's phone number is an atomic value of 10 digits.
• Composite attribute − Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have first_name and last_name.
• Derived attribute − Derived attributes are the attributes that do not exist in the
physical database, but their values are derived from other attributes present in the
database. For example, average_salary in a department should not be saved directly in
the database, instead it can be derived. For another example, age can be derived from
date_of_birth.
• Single-value attribute − Single-value attributes contain a single value. For example −
Social_Security_Number.
• Multi-value attribute − Multi-value attributes may contain more than one value. For
example, a person can have more than one phone number, email_address, etc.
Below are a few special cases of attributes:
▪ Required attribute: An attribute that must have a value. These attributes do
not allow NULL values.
▪ Optional attribute: An attribute that may or may not have a value. These
attributes allow NULL values.
Figure 1: STUDENT entity with the attributes S-No, S-Name, S-Addr, S-DOB and S-Phone.
In the above figure the attributes S-No, S-Name, S-Addr, S-DOB are required
attributes and the attribute S-Phone is an optional attribute.
▪ Primary Key Attribute(Key Identifier):-
o One or more attributes that uniquely identify an entity instance
o Primary key attributes of an entity are underlined with a solid line in ER
diagrams.
o Each entity has only one primary key.
o It does not allow null values
o In the above example S-No is a primary key for STUDENT entity
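The STUDENT example maps naturally onto column constraints. The sketch below, written with Python's sqlite3 module, is one possible rendering and not a prescribed schema: the column names are the figure's attribute names with underscores, the sample rows are invented, and try_insert is a hypothetical helper used only to report whether the DBMS accepts a row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE student ("
    " s_no INTEGER PRIMARY KEY,"   # primary key attribute
    " s_name TEXT NOT NULL,"       # required attributes
    " s_addr TEXT NOT NULL,"
    " s_dob TEXT NOT NULL,"
    " s_phone TEXT)"               # optional attribute: NULL is allowed
)
con.execute("INSERT INTO student VALUES (1, 'Meena', 'Hyderabad', '2004-05-17', NULL)")


def try_insert(sql):
    """Hypothetical helper: report whether the DBMS accepts the row."""
    try:
        con.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# A required attribute left empty is refused.
missing_name = try_insert("INSERT INTO student VALUES (2, NULL, 'Pune', '2003-01-02', NULL)")
# A second row with the same primary key value is refused as well.
duplicate_key = try_insert("INSERT INTO student VALUES (1, 'Arun', 'Delhi', '2002-09-30', NULL)")
print(missing_name)
print(duplicate_key)
```

The first row is stored even though S-Phone is NULL, while the two later inserts are rejected by the constraints rather than by application code.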
Type of Relationships:
A relationship describes an association among entities. For example, a relationship exists
between customers and agents that can be described as follows: an agent can serve many
customers, and each customer may be served by one agent.
The ER model uses the term connectivity to label the type of relationship. There are
three types of relationships based on cardinality: one-to-one, one-to-many and many-to-many.
Mapping Cardinalities: Mapping cardinalities define how many entities of one entity set can be
associated with an entity of another entity set.
1. One-to-one Relationship (1:1): One entity from entity set A can be associated with at
most one entity of entity set B and vice versa.
▪ Ex: An employee can be head of only one DEPARTMENT. A DEPARTMENT will have
one and only one HOD.
(Figure: EMPLOYEE and DEPARTMENT connected by the relationship HOD.)
2. One-to-Many Relationship (1:M): One entity from entity set A can be associated
with more than one entity of entity set B; however, an entity from entity set B can be
associated with at most one entity of entity set A.
• Ex: (Figure: DEPARTMENT and COURSE connected by the relationship Offers.)
▪ Ex: (Figure: EMPLOYEE with the relationship Manager.)
2. Binary Relationship:-A binary relationship exists when two entities are associated
in a relationship. A binary relationship can be weak or strong based on the
participating entities. Binary relationship is with degree 2.
▪ Ex: (Figure: DOCTOR and PATIENT connected by the relationship Prescription.)
▪ Ex: (Figure: EMPLOYEE and DEPENDENT connected by the relationship Has.)
• In the above example the entity DEPENDENT is existence dependent on the entity
EMPLOYEE.
E-R Diagram
An E-R diagram is a visual representation of data that describes how data items are related to each other.
The geometric shapes used in an E-R diagram include the following:
Multivalued Attribute: E.g. a person can have more than one phone number, so the
phone number attribute is multivalued.
Derived Attribute: E.g. a person's age is a derived attribute, as it changes over time and
can be derived from another attribute (date of birth).
Key Attribute: A key attribute represents the main characteristic of an entity. It is used to
represent the primary key. An ellipse with underlined text represents a key attribute.
Composite Attribute: An attribute can also have its own attributes. Such an attribute is
known as a composite attribute.
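The derived attribute age can be computed from the stored date of birth whenever it is needed. The sketch below is plain Python; the function name age_on and the sample dates are illustrative assumptions.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in completed years from the stored date of birth."""
    # Has this year's birthday already happened on the given day?
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


print(age_on(date(2000, 6, 15), date(2024, 6, 14)))  # 23, the day before the birthday
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))  # 24, on the birthday
```

Only the date of birth is stored; the age is always correct for the day on which it is asked for.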
(Figure: EMPLOYEE with attributes E-id, E-name and E-add, connected by the relationship HAS to
DEPENDENT with attributes D-Name and D-age.)
• In the above example the entity EMPLOYEE has an attribute E-id that can qualify
as a primary key, therefore it is a strong entity.
• The entity DEPENDENT does not possess any attribute that can qualify as a
primary key, therefore it is a weak entity.
• As per the relational database rules every entity should possess a primary key,
therefore the primary key for the DEPENDENT entity is built using the primary key of the
EMPLOYEE entity.
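One way to build the DEPENDENT key from the EMPLOYEE key is a composite primary key that includes a foreign key. The sketch below uses Python's sqlite3 module; pairing E-id with D-Name as the key, the ON DELETE CASCADE choice and the sample rows are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

con.execute(
    "CREATE TABLE employee (e_id INTEGER PRIMARY KEY, e_name TEXT NOT NULL, e_add TEXT)"
)
# Weak entity: its key borrows e_id from the owning EMPLOYEE.
con.execute(
    "CREATE TABLE dependent ("
    " e_id INTEGER NOT NULL REFERENCES employee(e_id) ON DELETE CASCADE,"
    " d_name TEXT NOT NULL,"
    " d_age INTEGER,"
    " PRIMARY KEY (e_id, d_name))"
)
con.execute("INSERT INTO employee VALUES (101, 'Latha', 'Chennai')")
con.execute("INSERT INTO dependent VALUES (101, 'Kiran', 8)")

# Existence dependency: a dependent cannot be stored without its employee.
try:
    con.execute("INSERT INTO dependent VALUES (999, 'Orphan', 5)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Deleting the employee removes the dependents as well.
con.execute("DELETE FROM employee WHERE e_id = 101")
remaining = con.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(orphan_allowed, remaining)
```

The composite key (e_id, d_name) identifies a dependent only in combination with its employee, which is what makes DEPENDENT a weak entity.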
UNIT-III
EXTENDED NORMALIZATION OF DATABASE TABLES
Normalization: Normalization is a process for evaluating and correcting table structures to minimize data
redundancy, thereby reducing the likelihood of data anomalies.
Need for Normalization
• To minimize data redundancy
• To avoid data anomalies resulting during insert, update or delete operations.
The Normalization Process
• The objective of normalization is to ensure that each table conforms to the concept of well-formed
relations, that is, tables that satisfy the following characteristics:
• Each table represents a single subject.
• No data item will be unnecessarily stored in more than one table.
• All nonprime attributes in a table are dependent on the primary key.
• Each table should not exhibit insert, update and delete anomalies.
• The normalization process takes us through a series of normal forms to accomplish the
above objective.
• Functional Dependency: the attribute B is functionally dependent on the attribute A if each value of A
determines one and only one value of B. When A is a composite key, B is fully functionally dependent on A
if it depends on all of A and not on any subset of it.
Normal form: Normalization works through a series of stages called normal forms.
• Following are the different normal forms:-
➢ First normal form(1NF)
➢ Second normal form(2NF)
➢ Third normal form(3NF)
➢ Boyce Codd normal form(BCNF)
➢ Fourth normal form(4NF)
• The primary key for the above table was made up of attributes: (PROJ_NUM, EMP-ID )
• Functional dependencies:-
• If a relation or table does not satisfy the above conditions, it is said not to be in 2NF, and the following
steps convert the relation into second normal form:
o Step 1: Write each key component on a separate line.
o Step 2: Assign the corresponding dependent attributes.
o Step 3: Create a separate table for each determinant and its dependencies.
Example: The above table is not in 2NF as it does not satisfy rule 2, i.e. the attributes ENAME, JOB, CHG-HRS and
PROJ_NAME exhibit partial dependencies.
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGNMENT (PROJ_NUM, EMP-ID, NO_HOUR)
EMPLOYEE (EMP-ID, ENAME, JOB, CHG-HRS)
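The three tables above can be created and joined back together to check that the decomposition loses nothing. This is a sketch using Python's sqlite3 module; the column types and all sample rows are invented for illustration, and hyphens in the attribute names are written as underscores because SQL identifiers do not allow them unquoted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE project    (proj_num INTEGER PRIMARY KEY, proj_name TEXT);
CREATE TABLE employee   (emp_id INTEGER PRIMARY KEY, ename TEXT, job TEXT, chg_hrs REAL);
CREATE TABLE assignment (proj_num INTEGER REFERENCES project(proj_num),
                         emp_id   INTEGER REFERENCES employee(emp_id),
                         no_hour  REAL,
                         PRIMARY KEY (proj_num, emp_id));
-- Invented sample rows.
INSERT INTO project    VALUES (15, 'Evergreen'), (18, 'Amber Wave');
INSERT INTO employee   VALUES (101, 'John', 'Designer', 105.0), (103, 'June', 'Engineer', 85.0);
INSERT INTO assignment VALUES (15, 101, 19.0), (15, 103, 23.0), (18, 103, 12.0);
""")

# Joining the three tables rebuilds the original wide rows.
flat = con.execute("""
    SELECT a.proj_num, p.proj_name, a.emp_id, e.ename, e.job, e.chg_hrs, a.no_hour
    FROM assignment a
    JOIN project  p ON p.proj_num = a.proj_num
    JOIN employee e ON e.emp_id   = a.emp_id
    ORDER BY a.proj_num, a.emp_id
""").fetchall()
for row in flat:
    print(row)
```

Each project name and each employee's details are stored once, yet every original row can still be reconstructed from the composite key (PROJ_NUM, EMP-ID).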
• If a table is in 2NF and it exhibits a transitive dependency, it can be corrected by the following steps:
1. Step 1: Identify each new determinant.
2. Step 2: Identify the dependent attributes.
3. Step 3: Remove the dependent attributes from the table and construct a new table with the determinant
and its dependents.
Ex: The table
EMPLOYEE (EMP-ID, ENAME, JOB, CHG-HRS)
is split into
EMPLOYEE (EMP-ID, ENAME, JOB)
JOB-DETAILS (JOB, CHG-HRS)
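The benefit of removing the transitive dependency (EMP-ID determines JOB, and JOB determines CHG-HRS) shows up when a rate changes. The sketch below uses Python's sqlite3 module; the column types and sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE job_details (job TEXT PRIMARY KEY, chg_hrs REAL);
CREATE TABLE employee    (emp_id INTEGER PRIMARY KEY, ename TEXT,
                          job TEXT REFERENCES job_details(job));
-- Invented sample rows.
INSERT INTO job_details VALUES ('Engineer', 85.0), ('Designer', 105.0);
INSERT INTO employee    VALUES (101, 'John', 'Designer'),
                               (103, 'June', 'Engineer'),
                               (104, 'Anne', 'Engineer');
""")

# The charge rate is stored once per job, so a rate change touches one row.
cur = con.execute("UPDATE job_details SET chg_hrs = 90.0 WHERE job = 'Engineer'")
rows_changed = cur.rowcount

rates = con.execute("""
    SELECT e.ename, j.chg_hrs
    FROM employee e JOIN job_details j ON j.job = e.job
    WHERE e.job = 'Engineer'
    ORDER BY e.emp_id
""").fetchall()
print(rows_changed, rates)
```

Every engineer sees the new rate after a single-row update, so no employee row can be left holding a stale value.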
• Improving the design focuses on improving the database's ability to provide information and on
enhancing its operational characteristics.
• Remember that normalization cannot, by itself, be relied on to make good designs. Instead, normalization
is valuable because its use helps eliminate data redundancies.
• Following issues are considered while improving the design
1. Evaluate PK Assignments
• The primary key assigned to an entity is evaluated thoroughly to avoid referential integrity problems.
• Sometimes according to the need new primary keys are identified.
• A surrogate key is an artificial PK introduced by the designer with the purpose of simplifying the
assignment of primary keys to tables.
• Surrogate keys are usually numeric, they are often automatically generated by the DBMS, they are free
of semantic content (they have no special meaning), and they are usually hidden from the end users.
▪ A derived attribute is an attribute that derives its value from other attributes' values by using some
formula.
▪ According to good database design practices, the value of a derived attribute needs to be computed when
a new row is inserted rather than at the time the derived attribute is accessed.
▪ This method helps in speeding up data retrieval.
▪ The value of a surrogate key is automatically incremented for each new row.
▪ Surrogate keys are used when M:N relationships exist or when a composite primary key exists.
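A surrogate key generated by the DBMS might look like the following sketch, written with Python's sqlite3 module. The enrollment table, which stands in for an M:N relationship between students and courses, and its column names are hypothetical; AUTOINCREMENT is SQLite's spelling, and other products use sequences or identity columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE enrollment ("
    " enroll_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # surrogate key, no business meaning
    " student_no INTEGER NOT NULL,"
    " course_code TEXT NOT NULL,"
    " UNIQUE (student_no, course_code))"             # the natural composite key stays unique
)

# The application never supplies enroll_id; the DBMS generates it.
first = con.execute(
    "INSERT INTO enrollment (student_no, course_code) VALUES (1, 'DB101')"
).lastrowid
second = con.execute(
    "INSERT INTO enrollment (student_no, course_code) VALUES (1, 'OS201')"
).lastrowid
print(first, second)
```

Keeping the UNIQUE constraint on the natural key is a design choice in this sketch: the surrogate simplifies references from other tables, while the constraint still prevents the same student from being enrolled twice in one course.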
• A table is in Boyce-Codd Normal Form (BCNF) when every determinant in the table is a candidate key.
• When a table contains only one candidate key, 3NF and BCNF are equivalent.
• When a table contains more than one candidate key, it can be in 3NF and still violate BCNF, so every
determinant needs to be checked.
• Ex:-
• The attributes A & B together form a primary key. But it is identified that the non-key attribute C
determines the attribute B, which is part of the primary key. C is a determinant but not a candidate
key, hence the table is not in BCNF.
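The BCNF test can be sketched in plain Python using attribute closure, a standard technique that these notes do not cover explicitly. The relation is taken to be R(A, B, C) with the dependencies AB → C and C → B from the example; the function names are illustrative, and the check below accepts any superkey as a determinant, which is the usual formulation.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List the dependencies whose determinant does not determine the whole table."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(all_attrs)]


attrs = "ABC"
fds = [("AB", "C"), ("C", "B")]  # AB -> C and C -> B, as in the example
violations = bcnf_violations(attrs, fds)
print(violations)  # [('C', 'B')]
```

C determines only B and itself, so it cannot identify a whole row; that single dependency is what keeps the table out of BCNF.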
Fourth Normal Form:
• A table is in fourth normal form (4NF) when it is in Boyce-Codd normal form (BCNF) and there are no
multi-valued dependencies.
• A multi-valued dependency occurs when, for each value in field A, there is a set of values for field B
and a set of values for field C but fields B and C are not related.
Fifth Normal Form:
• A table is in fifth normal form (5NF) when it is in fourth normal form (4NF) and there are no cyclic dependencies.
• A cyclic dependency can occur only when you have a multi-field primary key consisting of three or
more fields. For example, let's say your primary key consists of fields A, B, and C. A cyclic dependency
would arise if the values in those fields were related in pairs of A and B, B and C, and A and C.
• Fifth normal form is also called projection-join normal form. A projection is a new table holding a
subset of fields from an original table. When properly formed projections are joined, they must result in
the same set of data that was contained in the original table.
Codd’s Rules:
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems, came up with
twelve rules of his own, which according to him, a database must obey in order to be regarded as a true
relational database.
Rule 0 (Foundation rule): These rules can be applied to any database system that manages stored data using only its relational
capabilities. This is a foundation rule, which acts as a base for all the other rules.
Rule 1 (Information rule): The data stored in a database, whether user data or metadata, must be a value of some table cell. Everything
in a database must be stored in a table format.
Rule 2 (Guaranteed access rule): Every single data element (value) is guaranteed to be accessible logically with a combination of table-name,
primary-key (row value), and attribute-name (column value). No other means, such as pointers, can be used to
access data.
Rule 3 (Systematic treatment of NULL values): The NULL values in a database must be given a systematic and uniform treatment. This is a very important
rule because a NULL can be interpreted as one of the following − data is missing, data is not known, or data is
not applicable.
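What systematic treatment of NULL means in practice can be seen in how SQL compares and counts missing values. The sketch below uses Python's sqlite3 module with a hypothetical student table; the behaviour shown is standard SQL three-valued logic as SQLite implements it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (s_no INTEGER PRIMARY KEY, s_phone TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?)", [(1, "5550100"), (2, None)])

# Comparing anything with NULL yields NULL (unknown), which Python receives as None.
equals_null = con.execute("SELECT NULL = NULL").fetchone()[0]

# So "= NULL" never matches a row; the IS NULL predicate must be used instead.
matched_with_equals = con.execute(
    "SELECT COUNT(*) FROM student WHERE s_phone = NULL"
).fetchone()[0]
matched_with_is_null = con.execute(
    "SELECT COUNT(*) FROM student WHERE s_phone IS NULL"
).fetchone()[0]

# Aggregates skip NULLs: COUNT(column) counts only the rows that have a value.
counted = con.execute("SELECT COUNT(*), COUNT(s_phone) FROM student").fetchone()
print(equals_null, matched_with_equals, matched_with_is_null, counted)
```

A NULL is therefore neither zero nor an empty string; it is handled the same way everywhere, independent of the column's data type.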
Rule 4 (Active online catalog): The structure description of the entire database must be stored in an online catalog, known as the data dictionary,
which can be accessed by authorized users. Users can use the same query language to access the catalog which
they use to access the database itself.
Rule 5 (Comprehensive data sub-language rule): A database can only be accessed using a language having linear syntax that supports data definition, data
manipulation, and transaction management operations. This language can be used directly or by means of some
application. If the database allows access to data without any help of this language, then it is considered as a
violation.
Rule 6 (View updating rule): All the views of a database, which can theoretically be updated, must also be updatable by the system.
Rule 7 (High-level insert, update, and delete): A database must support high-level insertion, update, and deletion. This must not be limited to a single row,
that is, it must also support union, intersection and minus operations to yield sets of data records.
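The set operations named in this rule can be tried with Python's sqlite3 module. The two tables of student numbers are hypothetical, as is the small helper run; note that SQLite spells the minus operation EXCEPT, while some other products use MINUS.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE db_course (s_no INTEGER);
CREATE TABLE os_course (s_no INTEGER);
INSERT INTO db_course VALUES (1), (2), (3);
INSERT INTO os_course VALUES (2), (3), (4);
""")


def run(operator):
    """Hypothetical helper: combine the two result sets with the given operator."""
    sql = f"SELECT s_no FROM db_course {operator} SELECT s_no FROM os_course ORDER BY s_no"
    return [row[0] for row in con.execute(sql)]


union = run("UNION")             # students in either course
intersection = run("INTERSECT")  # students in both courses
minus = run("EXCEPT")            # students in db_course but not in os_course
print(union, intersection, minus)
```

Each statement works on whole sets of rows at once; no row-by-row loop is written by the user.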
Rule 8 (Physical data independence): The data stored in a database must be independent of the applications that access the database. Any change in
the physical structure of a database must not have any impact on how the data is being accessed by external
applications.
Rule 9 (Logical data independence): The logical data in a database must be independent of its user's view (application). Any change in logical data
must not affect the applications using it. For example, if two tables are merged or one is split into two different
tables, there should be no impact or change on the user application. This is one of the most difficult rules to
apply.
Rule 10 (Integrity independence): A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a database
independent of the front-end application and its interface.
Rule 11 (Distribution independence): The end-user must not be able to see that the data is distributed over various locations. Users should always get
the impression that the data is located at one site only. This rule has been regarded as the foundation of
distributed database systems.
Rule 12 (Non-subversion rule): If a system has an interface that provides access to low-level records, then the interface must not be able to
subvert the system and bypass security and integrity constraints.
DATA SECURITY
Data security refers to protective digital privacy measures that are applied to prevent unauthorized
access to computers, databases and websites. Data security also protects data from corruption. Data
security is an essential aspect of IT for organizations of every size and type.
Data security is also known as information security (IS) or computer security.
In simple terms, data security is the practice of keeping data protected from corruption and
unauthorized access. The focus behind data security is to ensure privacy while protecting personal or
corporate data.
Data is the raw form of information stored as columns and rows in our databases, network servers and
personal computers. This may be a wide range of information from personal files and intellectual
property to market analytics and details intended to be top secret. Data could be anything of interest that
can be read or otherwise interpreted in human form.
However, some of this information isn't intended to leave the system. The unauthorized access of this
data could lead to numerous problems for a large corporation or even the personal home user.
Having your bank account details stolen is just as damaging as a system administrator having
the client information in their database stolen.
There has been a huge emphasis on data security as of late, largely because of the internet. There are a
number of options for locking down your data from software solutions to hardware mechanisms.
Computer users are certainly more conscious these days. The following are the essential guidelines to
secure your sensitive information.
Encryption
Encryption has become a critical security feature for thriving networks and active home users alike.
This security mechanism uses mathematical schemes and algorithms to scramble data into unreadable
text. It can only be decoded or decrypted by the party that possesses the associated key.
Authentication
Authentication is another part of data security that we encounter with everyday computer usage. Just
think about when you log into your email or blog account. That single sign-on process is a form of
authentication that allows you to log into applications, files, folders and even an entire computer
system. Once logged in, you have various given privileges until logging out. Stronger schemes require
individuals to log in using multiple factors of authentication. These may include a password, a one-time
password, a smart card or even a fingerprint.
Backup Solutions
Data security wouldn't be complete without a solution to back up your critical information. Though it
may appear secure while confined away in a machine, there is always a chance that your data can be
compromised. You could suddenly be hit with a malware infection where a virus destroys all of your
files. Someone could enter your computer and steal data by slipping through a security hole in the
operating system. A reliable backup solution will allow you to restore your data instead of starting
completely from scratch.
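For a database, backing up and restoring can be as small as the following sketch. It uses the backup method of Python's sqlite3 connections (available from Python 3.7); the client table and the in-memory databases are stand-ins chosen for illustration, and a real backup would be written to separate storage.

```python
import sqlite3

# The "live" database with some critical information in it.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO client VALUES (1, 'Acme Ltd')")
live.commit()

# Take a backup copy while the data is still intact.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate damage: the rows in the live database are destroyed.
live.execute("DELETE FROM client")
live.commit()
damaged = live.execute("SELECT COUNT(*) FROM client").fetchone()[0]

# Restore by copying the backup into a fresh database.
recovered = sqlite3.connect(":memory:")
backup.backup(recovered)
restored = recovered.execute("SELECT name FROM client").fetchall()
print(damaged, restored)
```

The recovered database contains the rows as they were when the backup was taken; anything written after that point would be lost, which is why backups are taken regularly.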
Threats to computer systems increased considerably when computers started to be networked, but with the Internet they have
become one of the most important considerations in managing a computer system.
Hackers
Unless they are protected, computer systems are vulnerable to anyone who wants to edit, copy or delete
files without the owner’s permission. Such individuals are usually called hackers.
Malware
Malware, short for malicious software, is software designed to gain access to a computer system
without the owner's consent. The expression is a general term used by the computer industry to mean a
variety of forms of hostile, intrusive, or annoying software. Malware of any kind is sometimes
incorrectly referred to as a computer virus. Software is considered malware based on the perceived
intent of the creator rather than any particular features.
Virus
A computer virus is a piece of software that is designed to disrupt or stop the normal working of a
computer. Viruses are so called because, like a biological virus, they are passed on from one infected
machine to another. Downloading software from the Internet, opening email attachments and using USB
memory sticks are the most common ways for a virus to infect your computer.
Worms
A computer worm is a self-replicating program. It uses a computer network to send copies of itself to
computers on the network and it may do so without any user intervention. It is able to do this because
of security weaknesses on the target computer. Unlike a virus it does not need to attach itself to an
existing program. Worms almost always cause at least some harm to the network, if only by consuming
bandwidth, whereas viruses almost always corrupt or modify files on a targeted computer.
Trojan Horses
Trojan horses are designed to allow a hacker remote access to a target computer system. Once a Trojan
horse has been installed on a target computer system, it is possible for a hacker to access it remotely
and perform various operations. The operations that a hacker can perform are limited by user privileges
on the target computer system and the design of the Trojan horse.
Spyware
Spyware is a type of malware that is installed on a computer and collects small pieces of information
at a time about users without their knowledge. It can be very difficult for a user to tell if spyware is
present on a computer. Sometimes, however, spyware such as keyloggers is installed by a company, or on
a public computer such as one in a library, in order to secretly monitor other users.
While the term spyware suggests software that secretly monitors the user's computing, the
functions of spyware extend well beyond simple monitoring. Spyware programs can collect various
types of personal information, such as Internet surfing habits and sites that have been visited, but can
also interfere with user control of the computer in other ways, such as installing additional software and
redirecting Web browser activity. Spyware is known to change computer settings, resulting in slow
connection speeds, different home pages, and/or loss of Internet or functionality of other programs.
Spyware is also known more formally as privacy-invasive software.
Adware
Adware, or advertising-supported software, is any software package that automatically plays, displays,
or downloads advertisements to a computer after the software is installed on it or while the application
is being used. Common forms of this type of malware are on websites where popup windows appear
when you land on the website. Some types of adware are also spyware.
Crimeware
Crimeware is a class of malware designed specifically to automate cybercrime. Its purpose is to carry
out identity theft. It is most often targeted at financial services companies such as banks, online
retailers, etc., for the purpose of taking funds from those accounts or making unauthorized
transactions to benefit the thief controlling the crimeware.
Spam
Spam is the abuse of electronic messaging systems to send unsolicited bulk messages indiscriminately.
While the most widely recognized form of spam is e-mail spam, the term is applied to similar abuses in
other media: instant messaging spam, web search engine spam and social networking spam, for example.
Phishing
Phishing is an e-mail fraud method in which the criminal sends out legitimate-looking email in an
attempt to gather personal and financial information from recipients. Typically, the messages appear to
come from well-known and trustworthy Web sites. Web sites that are frequently spoofed by phishers
include PayPal, eBay, MSN and Yahoo. A phishing expedition, like the fishing expedition it's named
after, is a speculative venture: the phisher puts out the lure hoping that at least a few of the prey
that encounter it will take the bait. The criminal could then use the information to take money from
the person's account, for example.
DATA BACKUP
In a computer system we have primary and secondary memory storage. Primary memory (RAM) is volatile
memory which holds the disk buffers, active logs, and other related data of a database. It holds all
the recent transactions and their results too. When a query is fired, the database first looks in the
primary memory for the data; if it is not there, it moves to the secondary memory to fetch the record.
Fetching a record from primary memory is always faster than fetching it from secondary memory. If the
primary memory crashes, all the data in the primary memory is lost and, without further measures, we
cannot recover the database.
In such cases, we can follow any one of the following steps so that the data in the primary memory is
not lost.
• We can periodically copy the contents of primary memory, with all the logs and buffers, into
the database. In case of a failure we will then not lose all the data; we can recover the data
up to the point at which it was last copied to the database.
• We can create checkpoints at several places so that the data is copied to the database at each
checkpoint.
Suppose the secondary memory itself crashes. Then all the data is lost and cannot be recovered. We
have to think of some alternative solution for this, because we cannot afford the loss of data in a
huge database.
There are three methods used to back up the data in the secondary memory, so that it can be recovered
if there is any failure.
• Remote Backup: - A copy of the database is created and stored at a remote network location. This
copy is periodically updated from the current database so that it stays in sync with the data
and other details. The remote database can be updated manually, which is called offline backup,
or it can be backed up online, where the data is updated at the current and remote databases
simultaneously. In the online case, as soon as the current database fails, the system
automatically switches to the remote database and continues functioning. The user will not
know that there was a failure.
• In the second method, the database is copied to storage devices such as magnetic tapes, which are
kept in a secure place. If there is any failure, the data is copied back from these tapes to bring
the database up.
• As the database grows, backing up the whole database becomes an overhead. Hence only the log files
are backed up at regular intervals. These log files contain all the information about the
transactions being made, so the database can be recovered by replaying them. In this method the
log files are backed up at regular intervals, and the full database is backed up once a week.
There are two types of data backup: physical data backup and logical data backup. The physical data
backup includes physical files like data files, log files, control files, redo and undo logs, etc. They
are the foundation of the recovery mechanism in the database, as they provide the minute details about
the transactions and modifications to the database.
Logical backup includes backup of logical data like tables, views, procedures, functions, etc. Logical
data backup alone is not sufficient to recover the database, as it provides only the structural
information. The physical data backup actually provides the minute details about the database and is
very important for recovery.
RECOVERY
A database is a very large system with lots of data and transactions. Transactions are executed in the
database every second and are critical to it. If there is any failure or crash while a transaction is
executing, it is expected that no data is lost. It is necessary to revert the changes of the
transaction to the previously committed point. There are various techniques to recover the data,
depending on the type of failure or crash.
• Transaction Failure: - This is the condition in which a transaction cannot execute any further.
This type of failure affects only a few tables or processes. The failure can be because of
logical errors in the code or because of system errors like deadlock or unavailability of the
system resources needed to execute the transaction.
• System Crash: - This can be because of hardware or software failure or because of external
factors like power failure. In most cases the data in the secondary memory is not affected by
such a crash, because the database has many integrity checkpoints to prevent data loss from
secondary memory.
• Disk Failure: - These are issues with hard disks such as the formation of bad sectors, a disk head
crash, unavailability of the disk, etc.
As we have seen already, each transaction has the ACID properties. In case of a transaction failure
or system crash, it should maintain them. Failing to maintain ACID is a failure of the database system.
That means no transaction in the system can be left at the stage of its failure. It should either be
completed fully or rolled back to the previous consistent state.
Suppose there is a transaction on the Student database to enter the marks of a student in 3 subjects
and then to calculate his total. Suppose the transaction fails while the 3rd mark is being entered
into the table. The transaction cannot be left at this stage, because the student already has marks
entered in two subjects. If the system is recovered and the total is calculated, it is calculated from
two subject marks, which is not correct. In this case, either the transaction has to be completed fully
by entering the 3rd mark and calculating the total, or the marks that have already been entered have to
be removed. Either completing the transaction fully or reverting it fully brings the database into a
consistent state, and the data will not lead to any miscalculation.
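The all-or-nothing behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names marks and totals, the helper enter_marks and its fail_at parameter are hypothetical and do not come from the notes.

```python
import sqlite3

def enter_marks(conn, student_id, marks, fail_at=None):
    """Insert all marks and the total as one transaction, or nothing at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, (subject, mark) in enumerate(marks):
                if fail_at == i:
                    raise RuntimeError("simulated failure")  # hypothetical crash
                conn.execute("INSERT INTO marks VALUES (?, ?, ?)",
                             (student_id, subject, mark))
            total = sum(mark for _, mark in marks)
            conn.execute("INSERT INTO totals VALUES (?, ?)", (student_id, total))
        return True
    except RuntimeError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student_id INTEGER, subject TEXT, mark INTEGER)")
conn.execute("CREATE TABLE totals (student_id INTEGER, total INTEGER)")
conn.commit()

marks = [("Maths", 70), ("Physics", 65), ("Chemistry", 80)]

# Failure while entering the 3rd mark: the two earlier inserts are rolled back.
first_try = enter_marks(conn, 1, marks, fail_at=2)
rows_after_failure = conn.execute("SELECT COUNT(*) FROM marks").fetchall()[0][0]

# A clean run stores all three marks and the total together.
second_try = enter_marks(conn, 1, marks)
rows_after_success = conn.execute("SELECT COUNT(*) FROM marks").fetchall()[0][0]
total = conn.execute("SELECT total FROM totals WHERE student_id = 1").fetchall()[0][0]
```

After the failed attempt the marks table is empty again, so a total can never be computed from only two subjects.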
• Log Based Recovery: - In this method, a log of each transaction is maintained in stable
storage, so that in case of any failure the database can be recovered from the log.
The log record must be stored before the actual change is applied to the database.
Every log record in this case holds information such as which transaction is being executed, which
values have been modified and to what, and the state of the transaction. All the log records are
stored in the order of execution.
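The idea of recovering from such a log can be sketched in a few lines of Python. This is a simplified illustration, not a description of any particular DBMS: the record layout (kind, transaction, key, old value, new value) and the function recover are assumptions made for the example.

```python
def recover(db, log):
    """Redo committed transactions and undo uncommitted ones, using the log."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Redo pass: replay the updates of committed transactions in log order.
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, key, _old, new = rec
            db[key] = new
    # Undo pass: restore old values of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, key, old, _new = rec
            db[key] = old
    return db

# Each record is appended to the log BEFORE the change reaches the database.
log = [
    ("START", "T1"),
    ("UPDATE", "T1", "A", 100, 50),
    ("UPDATE", "T1", "B", 200, 250),
    ("COMMIT", "T1"),
    ("START", "T2"),
    ("UPDATE", "T2", "A", 50, 0),   # T2 never committed: the crash happened here
]

# Hypothetical state found after the crash: T1's write to B never reached the
# database, while T2's uncommitted write to A did.
db_after_crash = {"A": 0, "B": 200}
recovered = recover(dict(db_after_crash), log)
```

Because the old and new values are both logged, the same log serves to complete committed work and to remove the traces of incomplete work.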
• Shadow paging: - In this method the changes made by a transaction are applied to a separate copy
held apart from the current database. Only once the transaction has executed completely are its
changes made part of the database. Hence, if there is any failure in the middle of a transaction,
it is not reflected in the database; the database is updated only after the whole transaction
has completed.
Unit - IV
SQL Commands
• SQL commands are instructions used to communicate with the database to perform specific
tasks that work with data.
• SQL commands can be used not only for searching the database but also to perform various
other functions; for example, you can create tables, add data to tables, or modify data.
• SQL commands are grouped into four major categories depending on their functionality:
• Data Definition Language (DDL) - These SQL commands are used for creating,
modifying, and dropping the structure of database tables. The commands are CREATE, ALTER,
DROP, RENAME, and TRUNCATE.
• Data Manipulation Language (DML) - These SQL commands are used for storing, retrieving,
modifying, and deleting data. These commands are SELECT, INSERT, UPDATE, and DELETE.
• Transaction Control Language (TCL) - These SQL commands are used for managing changes
affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.
• Data Control Language (DCL) - These SQL commands are used for providing security to database
objects. These commands are GRANT and REVOKE.
Data Definition Language (DDL)
1. Create Statement:
This command is used for creating the structure of a table.
Using this command we specify details such as the name of each column, its datatype, its size and its
constraints.
Syntax:-
2. Drop Statement:
• This command is used for deleting the table structure permanently from the database.
• When this command is used the data present in the table and the structure of table is deleted
permanently from the database.
• Syntax:
Drop table tablename;
• Example:- delete table student from the database
SQL> drop table student;
3. Rename Statement:
• This command is used for changing the name of a table.
• Syntax:
Rename table tablename to newtablename;
• Example:- Change the name of table student to newstudent
Ans:-SQL>rename table student to newstudent;
4. Truncate:
• This command is used for permanently deleting complete data present in the table.
• The structure of the table is retained.
• Syntax:
SQL>TRUNCATE TABLE table_name;
• Example:- Delete the data present in Student table.
Ans:-SQL>truncate table student;
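The four DDL statements can be tried out with Python's built-in sqlite3 module, as in the sketch below. The column names sno, sname and marks are assumptions for illustration. SQLite's dialect differs from the syntax shown above in two places: renaming is written ALTER TABLE ... RENAME TO, and there is no TRUNCATE, so DELETE without a WHERE clause is used to empty the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    """Return the names of all tables currently defined in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]

# CREATE: column names, data types, sizes and constraints.
conn.execute("""
    CREATE TABLE student (
        sno   INTEGER PRIMARY KEY,
        sname VARCHAR(20) NOT NULL,
        marks INTEGER CHECK (marks BETWEEN 0 AND 100)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 82)")
after_create = tables()

# RENAME: the table keeps its data under the new name.
conn.execute("ALTER TABLE student RENAME TO newstudent")
after_rename = tables()

# TRUNCATE equivalent: all rows are removed, the structure is retained.
conn.execute("DELETE FROM newstudent")
rows_left = conn.execute("SELECT COUNT(*) FROM newstudent").fetchall()[0][0]
still_defined = tables()

# DROP: removes both the data and the structure.
conn.execute("DROP TABLE newstudent")
after_drop = tables()
```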
Data Manipulation Language (DML)
1. Insert:
• The SQL INSERT Statement is used to add a new row of data into a table in the database.
• There are two basic syntaxes of INSERT statement as follows:
Here, column1, column2,...columnN are the names of the columns in the table into which you want to insert
data.
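As a hedged illustration of the two INSERT forms, the sketch below uses Python's built-in sqlite3 module: one statement names the target columns, the other omits the column list and supplies a value for every column. The table layout follows the CUSTOMERS examples used later in these notes; the lower-case names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER, "
             "address TEXT, salary REAL)")

# Form 1: name the target columns; columns that are not named are left NULL.
conn.execute("INSERT INTO customers (id, name, age) VALUES (1, 'Ramesh', 32)")

# Form 2: omit the column list and give a value for every column, in table order.
conn.execute("INSERT INTO customers VALUES (2, 'Khilan', 25, 'Delhi', 1500.00)")

rows = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()
```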
2. Update:
• The UPDATE statement allows you to update a single record or multiple records in a table.
• The syntax for the SQL UPDATE statement is:
UPDATE table
SET column = newvalue
WHERE condition;
• Example: update all supplier names in the suppliers table from IBM to HP.
SQL>UPDATE suppliers
SET name = 'HP'
WHERE name = 'IBM';
3. Delete:
• The SQL DELETE Query is used to delete the existing records from a table.
• You can use WHERE clause with DELETE query to delete selected rows, otherwise all the
records would be deleted.
• Syntax:
DELETE FROM table_name
WHERE [condition];
• Example: delete the employee whose salary is greater than 2000.
Delete from emp
Where sal>2000;
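The effect of the WHERE clause on UPDATE and DELETE can be checked with a small sketch in Python's sqlite3 module. The emp rows are made-up sample data, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "Anil", 1500), (2, "Bela", 2500), (3, "Chand", 3000)])

# UPDATE with WHERE changes only the matching rows.
updated = conn.execute("UPDATE emp SET sal = 1800 WHERE ename = 'Anil'").rowcount

# DELETE with WHERE removes only the matching rows ...
deleted = conn.execute("DELETE FROM emp WHERE sal > 2000").rowcount
remaining = conn.execute("SELECT ename, sal FROM emp").fetchall()

# ... while DELETE without WHERE removes every row.
conn.execute("DELETE FROM emp")
count_after = conn.execute("SELECT COUNT(*) FROM emp").fetchall()[0][0]
```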
4. Select Statement
• SQL SELECT statement is used to fetch the data from a database table which returns data in the
form of result table.
• These result tables are called result-sets.
• A query may retrieve information from specified columns or from all of the columns in the table.
Transaction Control Language (TCL)
1. COMMIT Command
The COMMIT command is the transactional command used to save changes invoked by a transaction to the
database. The COMMIT command saves all the transactions to the database since the last COMMIT or
ROLLBACK command.
The syntax for the COMMIT command is as follows.
COMMIT;
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would delete those records from the table which have age = 25 and then
COMMIT the changes in the database.
SQL> DELETE FROM CUSTOMERS
WHERE AGE = 25;
SQL> COMMIT;
Thus, two rows from the table would be deleted and the SELECT statement would produce the following
result.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
2. ROLLBACK Command
The ROLLBACK command is the transactional command used to undo transactions that have not already been
saved to the database. This command can only be used to undo transactions since the last COMMIT or
ROLLBACK command was issued.
The syntax for a ROLLBACK command is as follows −
ROLLBACK;
Example
Consider the CUSTOMERS table having the same records as shown in the COMMIT example above.
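A minimal sketch of COMMIT and ROLLBACK working together, using Python's built-in sqlite3 module. The reduced three-column customers table is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ramesh", 32), (2, "Khilan", 25),
                  (3, "kaushik", 23), (4, "Chaitali", 25)])
conn.commit()      # COMMIT: the four starting rows are now permanent

conn.execute("DELETE FROM customers WHERE age = 25")
during = conn.execute("SELECT COUNT(*) FROM customers").fetchall()[0][0]

conn.rollback()    # ROLLBACK: undo everything since the last COMMIT
after = conn.execute("SELECT COUNT(*) FROM customers").fetchall()[0][0]
```

Inside the transaction the two rows with age 25 appear to be gone, but after the rollback all four rows are back.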
3. SAVEPOINT Command
A SAVEPOINT is a point in a transaction to which you can roll the transaction back without rolling
back the entire transaction.
The syntax for a SAVEPOINT command is as shown below.
SAVEPOINT SAVEPOINT_NAME;
This command serves only in the creation of a SAVEPOINT among all the transactional statements. The
ROLLBACK command is used to undo a group of transactions.
The syntax for rolling back to a SAVEPOINT is as shown below.
ROLLBACK TO SAVEPOINT_NAME;
Following is an example where you plan to delete the three different records from the CUSTOMERS table.
You want to create a SAVEPOINT before each delete, so that you can ROLLBACK to any SAVEPOINT at
any time to return the appropriate data to its original state.
Example
Consider the CUSTOMERS table having the following records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following code block contains the series of operations.
SQL> SAVEPOINT SP1;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID=1;
1 row deleted.
SQL> SAVEPOINT SP2;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID=2;
1 row deleted.
SQL> SAVEPOINT SP3;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID=3;
1 row deleted.
Now that the three deletions have taken place, let us assume that you have changed your mind and decided to
ROLLBACK to the SAVEPOINT that you identified as SP2. Because SP2 was created after the first deletion,
the last two deletions are undone −
SQL> ROLLBACK TO SP2;
Rollback complete.
Notice that only the first deletion took place since you rolled back to SP2.
SQL> SELECT * FROM CUSTOMERS;
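+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+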
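The savepoint sequence can be reproduced as a sketch with Python's built-in sqlite3 module. The two-column customers table is a shortened, assumed version of the one in the notes. The connection is opened with isolation_level=None so that BEGIN, SAVEPOINT and ROLLBACK TO are sent to the database exactly as written.

```python
import sqlite3

# Autocommit mode: the module does not open transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ramesh"), (2, "Khilan"), (3, "kaushik"), (4, "Chaitali")])

conn.execute("BEGIN")
conn.execute("SAVEPOINT sp1")
conn.execute("DELETE FROM customers WHERE id = 1")
conn.execute("SAVEPOINT sp2")
conn.execute("DELETE FROM customers WHERE id = 2")
conn.execute("SAVEPOINT sp3")
conn.execute("DELETE FROM customers WHERE id = 3")

conn.execute("ROLLBACK TO sp2")   # undoes the deletions of id 2 and id 3 only
conn.execute("COMMIT")            # the deletion of id 1 becomes permanent

ids = [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id")]
```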
SQL Operators
An operator is a reserved word or a character used primarily in an SQL statement's WHERE clause to perform
operation(s), such as comparisons and arithmetic operations. These Operators are used to specify conditions in
an SQL statement and to serve as conjunctions for multiple conditions in a statement.
• Arithmetic Operators
• Logical Operators
• Comparison Operators
• Special Operators
1. SQL Arithmetic Operators
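Assume 'variable a' holds 10 and 'variable b' holds 20, then:
Operator            Description                                             Example
+ (Addition)        Adds values on either side of the operator.             a + b will give 30
- (Subtraction)     Subtracts the right operand from the left operand.      a - b will give -10
* (Multiplication)  Multiplies values on either side of the operator.       a * b will give 200
/ (Division)        Divides the left operand by the right operand.          b / a will give 2
% (Modulus)         Returns the remainder of the left operand divided by
                    the right operand.                                      b % a will give 0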
Examples
Here are a few simple examples showing the usage of SQL Arithmetic Operators −
Example 1
SQL> select 10+ 20;
Output
+--------+
| 10+ 20 |
+--------+
| 30 |
+--------+
1 row in set (0.00 sec)
Example 2
SQL> select 10 * 20;
Output
+---------+
| 10 * 20 |
+---------+
| 200 |
+---------+
1 row in set (0.00 sec)
Example 3
SQL> select 10 / 5;
Output
+--------+
| 10 / 5 |
+--------+
| 2.0000 |
+--------+
1 row in set (0.03 sec)
Example 4
SQL> select 12 % 5;
Output
+---------+
| 12 % 5 |
+---------+
| 2 |
+---------+
1 row in set (0.00 sec)
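The same four expressions can be evaluated from Python with the built-in sqlite3 module, as sketched below. The helper evaluate is hypothetical. One dialect difference is worth noting: the output shown above prints 10 / 5 as 2.0000, whereas SQLite performs integer division on integer operands and returns 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Evaluate a constant SQL expression and return its single value."""
    return conn.execute("SELECT " + expression).fetchall()[0][0]

results = {expr: evaluate(expr)
           for expr in ("10 + 20", "10 * 20", "10 / 5", "12 % 5")}
```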
2. SQL Logical Operators
Logical Operators   Description
OR                  For the row to be selected at least one of the conditions must be true.
AND                 For a row to be selected all the specified conditions must be true.
NOT                 For a row to be selected the specified condition must be false.
SELECT first_name, last_name, age FROM student_details WHERE age >= 10 AND age <= 15;
The following table describes how logical "AND" operator selects a row.
OR
subject = 'Science'
The following table describes how logical "OR" operator selects a row.
'Football'
The following table describes how logical "NOT" operator selects a row.
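How the three logical operators select rows can be sketched with Python's built-in sqlite3 module. The student_details columns used here (first_name, age, subject, games) and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_details (first_name TEXT, age INTEGER, "
             "subject TEXT, games TEXT)")
conn.executemany("INSERT INTO student_details VALUES (?, ?, ?, ?)", [
    ("Rahul", 10, "Science", "Cricket"),
    ("Priya", 12, "Maths", "Football"),
    ("Anjali", 16, "Economics", "Chess"),
    ("Stephen", 9, "Science", "Football"),
])

def names(where):
    """Return the first names of the rows that satisfy the WHERE condition."""
    sql = ("SELECT first_name FROM student_details WHERE " + where +
           " ORDER BY first_name")
    return [row[0] for row in conn.execute(sql)]

both = names("age >= 10 AND age <= 15")                     # every condition true
either = names("subject = 'Maths' OR subject = 'Science'")  # at least one true
negated = names("NOT games = 'Football'")                   # condition is false
```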
3. SQL Comparison Operators
Assume 'variable a' holds 10 and 'variable b' holds 20, then:
Operator  Description and example
=     Checks if the values of two operands are equal or not; if yes, then the condition becomes
      true. Example: (a = b) is not true.
!=    Checks if the values of two operands are equal or not; if the values are not equal, then the
      condition becomes true. Example: (a != b) is true.
<>    Checks if the values of two operands are equal or not; if the values are not equal, then the
      condition becomes true. Example: (a <> b) is true.
>     Checks if the value of the left operand is greater than the value of the right operand; if
      yes, then the condition becomes true. Example: (a > b) is not true.
<     Checks if the value of the left operand is less than the value of the right operand; if yes,
      then the condition becomes true. Example: (a < b) is true.
>=    Checks if the value of the left operand is greater than or equal to the value of the right
      operand; if yes, then the condition becomes true. Example: (a >= b) is not true.
<=    Checks if the value of the left operand is less than or equal to the value of the right
      operand; if yes, then the condition becomes true. Example: (a <= b) is true.
!<    Checks if the value of the left operand is not less than the value of the right operand; if
      yes, then the condition becomes true. Example: (a !< b) is false.
!>    Checks if the value of the left operand is not greater than the value of the right operand;
      if yes, then the condition becomes true. Example: (a !> b) is true.
Note: !< and !> are not part of standard SQL; they are accepted only by some products, such as
Microsoft SQL Server.
Examples
Consider the CUSTOMERS table having the following records −
SQL> SELECT * FROM CUSTOMERS;
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
7 rows in set (0.00 sec)
Here are some simple examples showing the usage of SQL Comparison Operators −
Example 1
Output
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
3 rows in set (0.00 sec)
Example 2
SQL> SELECT * FROM CUSTOMERS WHERE SALARY = 2000;
Output
+----+---------+-----+-----------+---------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+---------+-----+-----------+---------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
+----+---------+-----+-----------+---------+
2 rows in set (0.00 sec)
Example 3
SQL> SELECT * FROM CUSTOMERS WHERE SALARY != 2000;
Output
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
5 rows in set (0.00 sec)
Example 4
SQL> SELECT * FROM CUSTOMERS WHERE SALARY <> 2000;
Output
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
5 rows in set (0.00 sec)
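The comparison queries can be checked against the same seven customers with Python's built-in sqlite3 module. This is a sketch: the table is reduced to three columns, and the threshold 5000 in the greater-than query is an assumption chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, salary REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Ramesh", 2000), (2, "Khilan", 1500), (3, "kaushik", 2000),
    (4, "Chaitali", 6500), (5, "Hardik", 8500), (6, "Komal", 4500),
    (7, "Muffy", 10000),
])

def ids(condition):
    """Return the IDs of the customers that satisfy the WHERE condition."""
    sql = "SELECT id FROM customers WHERE " + condition + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

equal = ids("salary = 2000")
not_equal = ids("salary != 2000")
not_equal_ansi = ids("salary <> 2000")   # same meaning as !=
greater = ids("salary > 5000")           # threshold 5000 is an assumption
at_most = ids("salary <= 2000")
```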
Standard SQL allows the use of IS NULL to check for a null attribute value. For example, suppose that you
want to list all products that do not have a vendor assigned (V_CODE is null). Such a null entry could be found
by using the command sequence:
Similarly, if you want to check a null date entry, the command sequence is:
Note that SQL uses a special operator to test for nulls. Why? Couldn’t you just enter a condition such as
V_CODE = NULL? No. Technically, NULL is not a “value” the way the number 0 (zero) or the blank space
is, but instead a NULL is a special property of an attribute that represents precisely the absence of any value.
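The difference between IS NULL and a comparison with NULL can be demonstrated with a sketch in Python's built-in sqlite3 module. Only V_CODE comes from the text above; the product table layout, the other column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT, p_descript TEXT, v_code INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("P1", "Hammer", 21344),
    ("P2", "Saw blade", None),     # no vendor assigned
    ("P3", "Drill bit", None),     # no vendor assigned
])

def codes(condition):
    """Return the product codes of the rows that satisfy the WHERE condition."""
    sql = "SELECT p_code FROM product WHERE " + condition + " ORDER BY p_code"
    return [row[0] for row in conn.execute(sql)]

with_is_null = codes("v_code IS NULL")
with_equals = codes("v_code = NULL")       # a comparison with NULL is never true
with_not_null = codes("v_code IS NOT NULL")
```

The condition v_code = NULL selects no rows at all, which is why the special operator is needed.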
The LIKE special operator is used in conjunction with wildcards to find patterns within string attributes.
Standard SQL allows you to use the percent sign (%) and underscore (_) wildcard characters to make matches
when the entire string is not known:
➢ % means any and all following or preceding characters are eligible. For example,
➢ 'J%' includes Johnson, Jones, Jernigan, July, and J-231Q.
➢ 'Jo%' includes Johnson and Jones.
➢ '%n' includes Johnson and Jernigan.
➢ _ means any one character may be substituted for the underscore. For example,
➢ '_23-456-6789' includes 123-456-6789, 223-456-6789, and 323-456-6789.
For example, the following query would find all VENDOR rows for contacts whose last names begin with
Smith.
Suppose that you do not know whether a person’s name is spelled Johnson or Johnsen. The wildcard character _
lets you find a match for either spelling. The proper search would be instituted by the query:
The logical operators may be used in conjunction with the special operators. For instance, the query:
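A minimal sketch of LIKE with both wildcards, and of NOT combined with LIKE, using Python's built-in sqlite3 module. The vendor table with columns v_code and v_contact and the sample names are assumptions for illustration. SQLite's LIKE ignores case for ASCII letters by default, which may differ from other products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (v_code INTEGER, v_contact TEXT)")
conn.executemany("INSERT INTO vendor VALUES (?, ?)", [
    (1, "Smith"), (2, "Smithson"), (3, "Johnson"), (4, "Johnsen"), (5, "Jones"),
])

def contacts(condition):
    """Return the contact names of the rows that satisfy the WHERE condition."""
    sql = "SELECT v_contact FROM vendor WHERE " + condition + " ORDER BY v_code"
    return [row[0] for row in conn.execute(sql)]

starts_with_smith = contacts("v_contact LIKE 'Smith%'")   # % matches any run
either_spelling = contacts("v_contact LIKE 'Johns_n'")    # _ matches one character
not_smith = contacts("v_contact NOT LIKE 'Smith%'")       # logical NOT with LIKE
```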
The EXISTS special operator can be used whenever there is a requirement to execute a command based on the
result of another query. That is, if a subquery returns any rows, run the main query; otherwise, don’t. For
example, the following query will list all vendors, but only if there are products to order:
The EXISTS special operator is used in the following example to list all vendors, but only if there are products
with the quantity on hand, less than double the minimum quantity:
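The behaviour of EXISTS can be sketched with Python's built-in sqlite3 module. The table layouts and the column names p_qoh (quantity on hand) and p_min (minimum quantity) are assumptions for illustration. The main query returns every vendor, but only when the subquery returns at least one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (v_code INTEGER, v_name TEXT)")
conn.execute("CREATE TABLE product (p_code TEXT, p_qoh INTEGER, p_min INTEGER)")
conn.executemany("INSERT INTO vendor VALUES (?, ?)", [(1, "Acme"), (2, "Bolt Co")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("P1", 30, 10), ("P2", 50, 20)])

QUERY = """
    SELECT v_name FROM vendor
    WHERE EXISTS (SELECT * FROM product WHERE p_qoh < p_min * 2)
    ORDER BY v_code
"""

# No product is below double its minimum, so the subquery is empty.
before = [row[0] for row in conn.execute(QUERY)]

# Once P1 drops to 15 (less than 2 x 10), the subquery returns a row.
conn.execute("UPDATE product SET p_qoh = 15 WHERE p_code = 'P1'")
after = [row[0] for row in conn.execute(QUERY)]
```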
UNIT-V
SQL Joins
➢ SQL Joins are used to relate information in different tables.
➢ A Join condition is a part of the sql query that retrieves rows from two or more tables.
➢ A SQL Join condition is used in the SQL WHERE Clause of a SELECT statement.
➢ The Syntax for joining two tables is:
• If a sql join condition is omitted or if it is invalid the join operation will result in a Cartesian product.
• The Cartesian product returns a number of rows equal to the product of all rows in all the tables being
joined.
➢ For example, if the first table has 20 rows and the second table has 10 rows, the result will be 20 * 10, or
200 rows. On large tables such a query takes a long time to execute.
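The row count of a Cartesian product can be verified with a short sketch in Python's built-in sqlite3 module. The tables t1 and t2 and their single columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE TABLE t2 (b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(20)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(10)])

# No join condition: every row of t1 is paired with every row of t2.
cartesian = conn.execute("SELECT COUNT(*) FROM t1, t2").fetchall()[0][0]

# With a join condition only the matching pairs remain.
joined = conn.execute(
    "SELECT COUNT(*) FROM t1, t2 WHERE t1.a = t2.b").fetchall()[0][0]
```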
➢ SQL Joins can be classified into Equi join and Non Equi join.
➢ An equi join is a simple SQL join condition which uses the equal sign as the comparison operator.
➢ Two types of equi joins are SQL Outer join and SQL Inner join.
➢ In an inner join, all the rows returned by the SQL query satisfy the join condition specified.
For example: Display the product information for each order
➢ Since you are retrieving the data from two tables, we need to identify the common column between these
two tables, which is the product_id.
➢ The query for this type of sql joins would be like,
SELECT order_id, product_name, unit_price, supplier_name, total_units
FROM product, order_items
WHERE order_items.product_id = product.product_id;
➢ We can also use aliases to reference the column name,
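An inner join written with table aliases can be sketched with Python's built-in sqlite3 module. The aliases p and o, the reduced column lists and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER, product_name TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, "
             "total_units INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(100, "Camera"), (101, "Television"), (102, "Refrigerator")])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(5100, 100, 30), (5101, 102, 5)])

# p and o are table aliases; they shorten the qualified column names.
rows = conn.execute("""
    SELECT o.order_id, p.product_name, o.total_units
    FROM product p, order_items o
    WHERE o.product_id = p.product_id
    ORDER BY o.order_id
""").fetchall()
```

The product with no order item (Television) does not appear, because an inner join returns only rows that satisfy the join condition.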
➢ An outer join returns all rows from both tables which satisfy the join condition along with rows
which do not satisfy the join condition from one of the tables.
➢ The sql outer join operator in Oracle is ( + ) and is used on one side of the join condition only.
Example:- Display all the product data along with order items data, with null values displayed for
order items if a product has no order item. The sql query for this outer join is shown below:
SELECT p.product_id, p.product_name, o.order_id, o.total_units
FROM order_items o, product p
WHERE o.product_id (+) = p.product_id;
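➢ NOTE: The (+) operator is placed on the side of the table that may lack matching rows; every row of the
table on the other side is returned. In the query above, (+) is on the order_items side, so all product
rows are kept, which is equivalent to order_items RIGHT OUTER JOIN product. If (+) is moved to the
other side of the condition, all order_items rows are kept instead.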
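The (+) notation is specific to Oracle. As a hedged sketch, the same result can be obtained with the ANSI LEFT OUTER JOIN syntax, shown here with Python's built-in sqlite3 module. The sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER, product_name TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, "
             "total_units INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(100, "Camera"), (101, "Television"), (102, "Refrigerator")])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(5100, 100, 30), (5101, 102, 5)])

# Every product row is kept; order columns are NULL where no order item matches.
rows = conn.execute("""
    SELECT p.product_id, p.product_name, o.order_id, o.total_units
    FROM product p
    LEFT OUTER JOIN order_items o ON o.product_id = p.product_id
    ORDER BY p.product_id
""").fetchall()
```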
➢ A Self Join is a type of sql join which is used to join a table to itself, particularly when the table has a
FOREIGN KEY that references its own PRIMARY KEY. It is necessary to ensure that the join statement
defines an alias for both copies of the table to avoid column ambiguity.
➢ The below query is an example of a self join,
SELECT a.sales_person_id, a.name, a.manager_id, b.sales_person_id, b.name
FROM sales_person a, sales_person b
WHERE a.manager_id = b.sales_person_id;
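The self join can be run as a sketch with Python's built-in sqlite3 module. The sample sales people are made up for illustration. Alias a stands for the employee and alias b for that employee's manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_person (
        sales_person_id INTEGER PRIMARY KEY,
        name            TEXT,
        manager_id      INTEGER REFERENCES sales_person (sales_person_id)
    )
""")
conn.executemany("INSERT INTO sales_person VALUES (?, ?, ?)", [
    (1, "Asha", None),    # top of the hierarchy, no manager
    (2, "Bala", 1),
    (3, "Chitra", 1),
    (4, "Dev", 2),
])

# The table is joined to itself: a is the employee copy, b is the manager copy.
pairs = conn.execute("""
    SELECT a.name, b.name
    FROM sales_person a, sales_person b
    WHERE a.manager_id = b.sales_person_id
    ORDER BY a.sales_person_id
""").fetchall()
```

Asha has no manager, so she appears only in the manager column of the result.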
➢ A non-equi join is a SQL join condition which makes use of some comparison operator other than the
equal sign, like >, <, >=, <=
➢ For example: If you want to find the names of students who are not studying Economics, the SQL
query would be like, (lets use student_details table defined earlier.)
SELECT first_name, last_name, subject
FROM student_details WHERE subject != 'Economics';
• SQL SELECT statement is used to fetch the data from a database table which returns data in the form of
result table.
• These result tables are called result-sets.
• A query may retrieve information from specified columns or from all of the columns in the table.
Syntax of SQL SELECT Statement:
SELECT column_list
FROM table-name
[WHERE Clause]
[GROUP BY clause]
[HAVING clause]
[ORDER BY clause];
• table-name is the name of the table from which the information is retrieved.
• column_list includes one or more columns from which data is retrieved.
• The code within the brackets is optional.
• Select column_list clause of select statement is used for specifying the columns required for display.
• FROM table-name clause of select statement is used for specifying the source tables from which data need
to be retrieved.
• [WHERE Clause] is used for filtering the data according to the given criteria
• [GROUP BY clause] is used for grouping the data according to the given column
• [HAVING clause] is used to apply filter on the rows obtained after group by clause.
• [ORDER BY clause] is used to sort the data in ascending or descending order according to the given
column.
Example 1: Display the names of employees who earn more than 4000.
SQL>select ename from emp where sal>4000;
Example2.: Display number of clerks in each department.
SQL> select deptno, count(empno) from emp
Where job='clerk'
Group by deptno;
Example 3: Display employee details for employee in ascending order of their salary.
SQL> select * from emp order by sal;
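The clauses of the SELECT statement, including HAVING, can be tried together in a sketch using Python's built-in sqlite3 module. The emp rows are made-up sample data, and the job value is written as 'CLERK' here by assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, "
             "sal INTEGER, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", [
    (1, "Smith", "CLERK", 800, 20),
    (2, "Adams", "CLERK", 1100, 20),
    (3, "James", "CLERK", 950, 30),
    (4, "King", "PRESIDENT", 5000, 10),
    (5, "Miller", "CLERK", 1300, 10),
])

# WHERE filters rows before grouping.
high_earners = [r[0] for r in conn.execute("SELECT ename FROM emp WHERE sal > 4000")]

# GROUP BY forms one group per department; ORDER BY sorts the groups.
clerks_per_dept = conn.execute("""
    SELECT deptno, COUNT(empno) FROM emp
    WHERE job = 'CLERK'
    GROUP BY deptno
    ORDER BY deptno
""").fetchall()

# HAVING filters the groups produced by GROUP BY.
depts_with_many_clerks = conn.execute("""
    SELECT deptno, COUNT(empno) FROM emp
    WHERE job = 'CLERK'
    GROUP BY deptno
    HAVING COUNT(empno) > 1
""").fetchall()

by_salary = [r[0] for r in conn.execute("SELECT ename FROM emp ORDER BY sal")]
```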
The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data
into groups.
The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY
clause.
Syntax:
The basic syntax of GROUP BY clause is given below. The GROUP BY clause must follow the conditions in
the WHERE clause and must precede the ORDER BY clause if one is used.
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2
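Example: Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to know the total amount of salary for each customer, then GROUP BY query would be as follows:
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS
GROUP BY NAME;
This would produce the following result: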
+----------+-------------+
| NAME | SUM(SALARY) |
+----------+-------------+
| Chaitali | 6500.00 |
| Hardik | 8500.00 |
| kaushik | 2000.00 |
| Khilan | 1500.00 |
| Komal | 4500.00 |
| Muffy | 10000.00 |
| Ramesh | 2000.00 |
+----------+-------------+
Now, let us have following table where CUSTOMERS table has the following records with duplicate names:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Ramesh | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | kaushik | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Now again, if you want to know the total salary for each customer name, the GROUP BY query would
be as follows:
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS
GROUP BY NAME;
This would produce the following result:
+---------+-------------+
| NAME | SUM(SALARY) |
+---------+-------------+
| Hardik | 8500.00 |
| kaushik | 8500.00 |
| Komal | 4500.00 |
| Muffy | 10000.00 |
| Ramesh | 3500.00 |
+---------+-------------+
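The grouping above can be reproduced with a minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle. The rows are copied from the CUSTOMERS table with duplicate names; rows that share a NAME collapse into one group and their salaries are summed.

```python
import sqlite3

# CUSTOMERS rows with duplicate names, copied from the table in these notes.
customers = [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Ramesh", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "kaushik", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", customers)

# One output row per distinct NAME; SUM adds the salaries inside each group.
totals = dict(
    conn.execute("SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME").fetchall()
)
print(totals)
```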
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or more
columns. ORDER BY sorts in ascending order by default; add DESC after a column to sort it in descending order.
Syntax:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], ...];
You can use more than one column in the ORDER BY clause. Make sure that every column you sort by is
valid for the query, typically a column in column-list.
Example: Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
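| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+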
Following is an example, which would sort the result in ascending order by NAME and SALARY:
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME, SALARY;
This would produce the following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
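The sort order shown above can be checked with a small sketch, assuming Python's built-in sqlite3 module rather than Oracle. One assumption is worth flagging: SQLite compares text case-sensitively by default, so COLLATE NOCASE is added here to reproduce the case-insensitive ordering of the result above (where kaushik sorts between Hardik and Khilan).

```python
import sqlite3

# CUSTOMERS rows copied from the table in these notes.
CUSTOMERS = [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", CUSTOMERS)

# Ascending by NAME, then by SALARY for rows that share a name.
ascending_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM CUSTOMERS ORDER BY NAME COLLATE NOCASE, SALARY"
    )
]

# DESC reverses the order for the column it follows.
descending_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM CUSTOMERS ORDER BY NAME COLLATE NOCASE DESC"
    )
]

print(ascending_ids)
print(descending_ids)
```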
Following is an example, which would sort the result in descending order by NAME:
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME DESC;
The SQL DISTINCT keyword is used with the SELECT statement to eliminate duplicate
records and fetch only unique records.
A table may contain multiple duplicate records. When fetching such records, it often
makes more sense to fetch only the unique records instead of the duplicates.
Syntax:
The basic syntax of DISTINCT keyword to eliminate the duplicate records is as follows −
SELECT DISTINCT column1, column2,.....columnN
FROM table_name WHERE [condition]
Example: Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
First, let us see how the following SELECT query returns duplicate salary values.
SQL> SELECT SALARY FROM CUSTOMERS
ORDER BY SALARY;
This would produce the following result, where the salary 2000 appears twice because two rows in the
original table have that salary.
+----------+
| SALARY |
+----------+
| 1500.00 |
| 2000.00 |
| 2000.00 |
| 4500.00 |
| 6500.00 |
| 8500.00 |
| 10000.00 |
+----------+
Now, let us use the DISTINCT keyword with the above SELECT query and then see the result.
SQL> SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY;
This would produce the following result where we do not have any duplicate entry.
+----------+
| SALARY |
+----------+
| 1500.00 |
| 2000.00 |
| 4500.00 |
| 6500.00 |
| 8500.00 |
| 10000.00 |
+----------+
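The effect of DISTINCT can be confirmed with a minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle and the CUSTOMERS rows from these notes.

```python
import sqlite3

# CUSTOMERS rows copied from the table in these notes.
CUSTOMERS = [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", CUSTOMERS)

# Without DISTINCT every row contributes a value, so 2000 appears twice.
all_salaries = [
    row[0] for row in conn.execute("SELECT SALARY FROM CUSTOMERS ORDER BY SALARY")
]

# With DISTINCT each salary value is returned once.
unique_salaries = [
    row[0]
    for row in conn.execute("SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY")
]

print(all_salaries)
print(unique_salaries)
```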
The HAVING clause lets you specify conditions that filter which groups appear in the result.
The WHERE clause places conditions on individual rows before grouping, whereas the HAVING clause places
conditions on the groups created by the GROUP BY clause.
Syntax
The following code block shows the position of the HAVING Clause in a query.
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The HAVING clause must follow the GROUP BY clause in a query and must also precede the ORDER BY
clause if used. The following code block has the syntax of the SELECT statement including the HAVING
clause −
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2
Example: Consider the CUSTOMERS table shown earlier, in which two customers share the age 25.
Following is an example, which displays each age that occurs two or more times, together with its
count. Every column in the SELECT list is either the grouping column or an aggregate, as standard SQL
and Oracle require.
SQL> SELECT AGE, COUNT(*)
FROM CUSTOMERS
GROUP BY AGE
HAVING COUNT(*) >= 2;
This would produce the following result −
+-----+----------+
| AGE | COUNT(*) |
+-----+----------+
| 25 | 2 |
+-----+----------+
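The difference between WHERE and HAVING can be seen in a minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle and the CUSTOMERS rows from these notes. The queries select only the grouping column and an aggregate, which keeps them portable across databases.

```python
import sqlite3

# CUSTOMERS rows copied from the table in these notes.
CUSTOMERS = [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", CUSTOMERS)

# HAVING filters whole groups: keep only ages shared by two or more customers.
repeated_ages = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS GROUP BY AGE HAVING COUNT(*) >= 2"
).fetchall()

# WHERE removes rows before grouping. Only one customer aged 25 earns more than
# 2000, so no group reaches a count of 2 and the result is empty.
where_then_having = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS WHERE SALARY > 2000 "
    "GROUP BY AGE HAVING COUNT(*) >= 2"
).fetchall()

print(repeated_ages)
print(where_then_having)
```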
1. COUNT Function: The SQL COUNT function counts the number of records returned by a SELECT
statement. To understand the COUNT function, consider an employee_tbl table, which has the
following records −
SQL> SELECT * FROM employee_tbl;
+------+------+------------+--------------------+
| id | name | work_date | daily_typing_pages |
+------+------+------------+--------------------+
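| 1 | John | 2007-01-24 | 250 |
| 2 | Ram | 2007-05-27 | 220 |
| 3 | Jack | 2007-05-06 | 170 |
| 3 | Jack | 2007-04-06 | 100 |
| 4 | Jill | 2007-04-06 | 220 |
| 5 | Zara | 2007-06-06 | 300 |
| 5 | Zara | 2007-02-06 | 350 |
+------+------+------------+--------------------+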
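A few typical uses of COUNT can be sketched as follows, assuming Python's built-in sqlite3 module in place of the database used in these notes. The rows are the seven employee_tbl records listed in full later in the notes.

```python
import sqlite3

# employee_tbl rows as listed in these notes.
EMPLOYEES = [
    (1, "John", "2007-01-24", 250),
    (2, "Ram", "2007-05-27", 220),
    (3, "Jack", "2007-05-06", 170),
    (3, "Jack", "2007-04-06", 100),
    (4, "Jill", "2007-04-06", 220),
    (5, "Zara", "2007-06-06", 300),
    (5, "Zara", "2007-02-06", 350),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_tbl "
    "(id INTEGER, name TEXT, work_date TEXT, daily_typing_pages INTEGER)"
)
conn.executemany("INSERT INTO employee_tbl VALUES (?, ?, ?, ?)", EMPLOYEES)

# COUNT(*) counts every row in the table.
total_rows = conn.execute("SELECT COUNT(*) FROM employee_tbl").fetchone()[0]

# Combined with WHERE, COUNT counts only the rows that match the condition.
zara_rows = conn.execute(
    "SELECT COUNT(*) FROM employee_tbl WHERE name = 'Zara'"
).fetchone()[0]

# COUNT(DISTINCT column) counts each different value once.
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM employee_tbl"
).fetchone()[0]

print(total_rows, zara_rows, distinct_names)
```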
3. MIN Function: The SQL MIN function finds the minimum value of a column across a set of records.
To understand the MIN function, consider the employee_tbl table above.
Suppose you want to fetch the minimum value of daily_typing_pages from that table; you can
do so using the following command −
SQL> SELECT MIN(daily_typing_pages) FROM employee_tbl;
+-------------------------+
| MIN(daily_typing_pages) |
+-------------------------+
| 100 |
+-------------------------+
You can find the minimum value for each name using the GROUP BY clause as follows −
SQL> SELECT id, name, MIN(daily_typing_pages) FROM employee_tbl GROUP BY id, name;
+------+------+-------------------------+
| id | name | MIN(daily_typing_pages) |
+------+------+-------------------------+
| 3 | Jack | 100 |
| 4 | Jill | 220 |
| 1 | John | 250 |
| 2 | Ram | 220 |
| 5 | Zara | 300 |
+------+------+-------------------------+
4. AVG Function: The SQL AVG function finds the average of a field across various records.
To understand the AVG function, consider the employee_tbl table.
Suppose you want to calculate the average of all the daily_typing_pages values in that table; you
can do so by using the following command −
SQL> SELECT AVG(daily_typing_pages) FROM employee_tbl;
+-------------------------+
| AVG(daily_typing_pages) |
+-------------------------+
| 230.0000 |
+-------------------------+
You can take the average of separate sets of records using the GROUP BY clause. The following example
averages all the records related to a single person, giving the average typed pages for every person.
SQL> SELECT name, AVG(daily_typing_pages) FROM employee_tbl GROUP BY name;
+------+-------------------------+
| name | AVG(daily_typing_pages) |
+------+-------------------------+
| Jack | 135.0000 |
| Jill | 220.0000 |
| John | 250.0000 |
| Ram | 220.0000 |
| Zara | 325.0000 |
+------+-------------------------+
5. SUM Function: The SQL SUM function finds the sum of a field across various records. To
understand the SUM function, consider the employee_tbl table.
Suppose you want to calculate the total of all the daily_typing_pages values in that table; you can
do so by using the following command −
SQL> SELECT SUM(daily_typing_pages) FROM employee_tbl;
+-------------------------+
| SUM(daily_typing_pages) |
+-------------------------+
| 1610 |
+-------------------------+
You can take the sum of separate sets of records using the GROUP BY clause. The following example sums up
all the records related to a single person, giving the total typed pages for every person.
SQL> SELECT name, SUM(daily_typing_pages) FROM employee_tbl GROUP BY name;
+------+-------------------------+
| name | SUM(daily_typing_pages) |
+------+-------------------------+
| Jack | 270 |
| Jill | 220 |
| John | 250 |
| Ram | 220 |
| Zara | 650 |
+------+-------------------------+
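The MIN, AVG and SUM results above can be reproduced in one pass with a minimal sketch, assuming Python's built-in sqlite3 module in place of the database used in these notes and the employee_tbl rows from these notes.

```python
import sqlite3

# employee_tbl rows as listed in these notes.
EMPLOYEES = [
    (1, "John", "2007-01-24", 250),
    (2, "Ram", "2007-05-27", 220),
    (3, "Jack", "2007-05-06", 170),
    (3, "Jack", "2007-04-06", 100),
    (4, "Jill", "2007-04-06", 220),
    (5, "Zara", "2007-06-06", 300),
    (5, "Zara", "2007-02-06", 350),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_tbl "
    "(id INTEGER, name TEXT, work_date TEXT, daily_typing_pages INTEGER)"
)
conn.executemany("INSERT INTO employee_tbl VALUES (?, ?, ?, ?)", EMPLOYEES)

# Without GROUP BY the aggregates run over the whole table.
overall = conn.execute(
    "SELECT MIN(daily_typing_pages), AVG(daily_typing_pages), SUM(daily_typing_pages) "
    "FROM employee_tbl"
).fetchone()

# With GROUP BY the same aggregates are computed once per name.
per_name = conn.execute(
    "SELECT name, MIN(daily_typing_pages), AVG(daily_typing_pages), "
    "SUM(daily_typing_pages) FROM employee_tbl GROUP BY name ORDER BY name"
).fetchall()

print(overall)
for row in per_name:
    print(row)
```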
6. SQRT Function: The SQL SQRT function finds the square root of a number. You can use a
SELECT statement to find the square root of any number as follows −
SQL> select SQRT(16);
+----------+
| SQRT(16) |
+----------+
| 4.000000 |
+----------+
The result is a floating-point value because the square root is computed internally using a floating-point data type.
You can use the SQRT function on column values as well. To understand the SQRT function
in more detail, consider an employee_tbl table, which has the following records −
SQL> SELECT * FROM employee_tbl;
+------+------+------------+--------------------+
| id | name | work_date | daily_typing_pages |
+------+------+------------+--------------------+
| 1 | John | 2007-01-24 | 250 |
| 2 | Ram | 2007-05-27 | 220 |
| 3 | Jack | 2007-05-06 | 170 |
| 3 | Jack | 2007-04-06 | 100 |
| 4 | Jill | 2007-04-06 | 220 |
| 5 | Zara | 2007-06-06 | 300 |
| 5 | Zara | 2007-02-06 | 350 |
+------+------+------------+--------------------+
Now suppose you want to calculate the square root of every daily_typing_pages value in the above table;
you can do so by using the following command −
SQL> SELECT name, SQRT(daily_typing_pages) FROM employee_tbl;
+------+--------------------------+
| name | SQRT(daily_typing_pages) |
+------+--------------------------+
| John | 15.811388 |
| Ram | 14.832397 |
| Jack | 13.038405 |
| Jack | 10.000000 |
| Jill | 14.832397 |
| Zara | 17.320508 |
| Zara | 18.708287 |
+------+--------------------------+
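The square roots above can be checked with a short sketch, assuming Python's built-in sqlite3 module in place of the database used in these notes. Because not every SQLite build ships a SQRT function, this sketch registers Python's math.sqrt under that name; that registration is an assumption of the example, not something the notes' database requires.

```python
import math
import sqlite3

# A few employee_tbl rows copied from these notes (name, daily_typing_pages).
ROWS = [("John", 250), ("Ram", 220), ("Jack", 170), ("Jack", 100)]

conn = sqlite3.connect(":memory:")
# Make SQRT available regardless of how the SQLite library was compiled.
conn.create_function("SQRT", 1, math.sqrt)
conn.execute("CREATE TABLE employee_tbl (name TEXT, daily_typing_pages INTEGER)")
conn.executemany("INSERT INTO employee_tbl VALUES (?, ?)", ROWS)

# SQRT of a constant.
root_of_16 = conn.execute("SELECT SQRT(16)").fetchone()[0]

# SQRT applied to every row of a column, rounded to six decimals for display.
roots = [
    (name, round(value, 6))
    for name, value in conn.execute(
        "SELECT name, SQRT(daily_typing_pages) FROM employee_tbl"
    )
]

print(root_of_16)
print(roots)
```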
The UNION set operator returns the combined results of two SELECT statements. It
removes duplicates from the result, i.e. only one row is listed for each duplicated row. To keep the
duplicates, use the UNION ALL set operator instead. INTERSECT lists
only the records that are common to both SELECT queries; the MINUS set operator returns the rows of the
first query that are not found in the second query's results. INTERSECT and MINUS
also produce results without duplicates.
All the set operators share the same precedence. During query execution,
Oracle therefore evaluates them from left to right (top to bottom). If parentheses are used, the
order may differ, because the parenthesized queries are evaluated first.
Points to remember -
➢ Same number of columns must be selected by all participating SELECT statements. Column names used in the
display are taken from the first query.
➢ Data types of the column lists must be compatible or implicitly convertible by Oracle. Oracle will not perform
implicit type conversion if corresponding columns in the component queries belong to different data type
groups. For example, if a column in the first component query is of data type DATE, and the corresponding
column in the second component query is of data type CHAR, Oracle will not perform implicit conversion, but
will raise the ORA-01790 error.
➢ Use positional notation (for example, ORDER BY 1) or the column names of the first query to sort the result
set. Individual result set ordering is not allowed with set operators. ORDER BY can appear only once, at the
end of the query.
➢ The UNION and INTERSECT operators are commutative, i.e. the order of the queries is not important; it
doesn't change the final result.
➢ Performance-wise, UNION ALL is faster than UNION because resources are not
spent on filtering duplicates and sorting the result set.
➢ Set operators can be used in subqueries.
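Consider the five queries below, combined using the UNION operator. The final combined result set contains
values from all the queries. Note the removal of duplicates and the sorting of the data.
SELECT 1 NUM FROM DUAL
UNION
SELECT 5 FROM DUAL
UNION
SELECT 3 FROM DUAL
UNION
SELECT 6 FROM DUAL
UNION
SELECT 3 FROM DUAL;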
NUM
-------
1
3
5
6
Note that the columns selected in the SELECT queries must be of compatible data types. Oracle throws an
error message when this rule is violated.
UNION and UNION ALL work the same way with one difference: UNION ALL returns the
result set without removing duplicates and without sorting the data. For example, in the query above UNION
is replaced by UNION ALL to see the effect.
UNION ALL Syntax
The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
Consider the query demonstrated in UNION section. Note the difference in the output which is generated
without sorting and deduplication.
SELECT 1 NUM FROM DUAL
UNION ALL
SELECT 5 FROM DUAL
UNION ALL
SELECT 3 FROM DUAL
UNION ALL
SELECT 6 FROM DUAL
UNION ALL
SELECT 3 FROM DUAL;
NUM
-------
1
5
3
6
3
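The contrast between UNION and UNION ALL can be sketched as follows, assuming Python's built-in sqlite3 module in place of Oracle. SQLite does not need the DUAL table, so the FROM clause is dropped, and an explicit ORDER BY is added to the UNION query because the sketch does not rely on any implicit sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The five single-row queries from the notes, without Oracle's FROM DUAL.
parts = ["SELECT 1 AS NUM", "SELECT 5", "SELECT 3", "SELECT 6", "SELECT 3"]

# UNION removes the duplicate 3; ORDER BY 1 sorts by the first column position.
union_rows = [
    row[0] for row in conn.execute(" UNION ".join(parts) + " ORDER BY 1")
]

# UNION ALL keeps every row, including the duplicate 3.
union_all_rows = [row[0] for row in conn.execute(" UNION ALL ".join(parts))]

print(union_rows)
print(union_all_rows)
```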
SALARY
---------
1500
1200
2000
The SQL MINUS Operator:
The MINUS operator displays the rows that are present in the first query but absent from the second query,
with no duplicates and with the data arranged in ascending order by default.
SELECT JOB_ID
FROM employees
WHERE DEPARTMENT_ID = 10
MINUS
SELECT JOB_ID
FROM employees
JOB_ID
-------------
HR
FIN
ADMIN
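The set-difference behaviour of MINUS can be sketched as follows, assuming Python's built-in sqlite3 module in place of Oracle. SQLite spells the operator EXCEPT rather than MINUS. The employees rows and the choice of department 20 for the second query are hypothetical, invented only for illustration.

```python
import sqlite3

# Hypothetical (JOB_ID, DEPARTMENT_ID) rows, invented for illustration.
EMPLOYEES = [
    ("HR", 10),
    ("FIN", 10),
    ("ADMIN", 10),
    ("IT", 10),
    ("HR", 10),  # duplicate row in department 10
    ("IT", 20),
    ("SALES", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (JOB_ID TEXT, DEPARTMENT_ID INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)

# Jobs found in department 10 but not in department 20 (Oracle: MINUS).
minus_rows = [
    row[0]
    for row in conn.execute(
        "SELECT JOB_ID FROM employees WHERE DEPARTMENT_ID = 10 "
        "EXCEPT "
        "SELECT JOB_ID FROM employees WHERE DEPARTMENT_ID = 20 "
        "ORDER BY 1"
    )
]

# Jobs found in both departments.
intersect_rows = [
    row[0]
    for row in conn.execute(
        "SELECT JOB_ID FROM employees WHERE DEPARTMENT_ID = 10 "
        "INTERSECT "
        "SELECT JOB_ID FROM employees WHERE DEPARTMENT_ID = 20"
    )
]

print(minus_rows)
print(intersect_rows)
```

The duplicate HR row appears only once in the output, which matches the note that MINUS and INTERSECT return results without duplicates.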
SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500) ;
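The subquery above can be run as a minimal sketch, assuming Python's built-in sqlite3 module in place of Oracle and the CUSTOMERS rows from these notes. The inner query produces the list of IDs first, and the outer query keeps only the rows whose ID is in that list.

```python
import sqlite3

# CUSTOMERS rows copied from the table in these notes.
CUSTOMERS = [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", CUSTOMERS)

# Inner query: IDs of customers earning more than 4500.
# Outer query: full rows for exactly those IDs.
matches = conn.execute(
    "SELECT * FROM CUSTOMERS "
    "WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > 4500) "
    "ORDER BY ID"
).fetchall()

matching_names = [row[1] for row in matches]
print(matching_names)
```

Komal is excluded because the condition is strictly greater than 4500.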
Syntax (subquery used with the UPDATE statement):
UPDATE table_name
SET column_name = new_value
[ WHERE column_name OPERATOR
( SELECT column_name
FROM table_name
[ WHERE condition ] ) ]
Example:
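Assume we have a CUSTOMERS_BKP table available, which is a backup of the CUSTOMERS
table.
The following example multiplies SALARY by 0.25 in the CUSTOMERS table for all the customers
whose AGE is greater than or equal to 27:
SQL> UPDATE CUSTOMERS
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27);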
FROM EMP T
WHERE E.department_id = T.department_id)