
2021

1. A database designer, sometimes called a database architect or a data officer, is responsible for
designing the databases in an organization. He/she carefully evaluates the business
requirements and drafts the data models. Then, initial discussions with the business occur to
validate the understanding of the data and the business processes. A database designer’s job
is to translate the client’s business requirements into a data model that not only stores the
business data correctly but also supports the processes that use the data. A database
designer’s job also includes preparing supporting documentation when building the physical
database.

2. Procedural DML language:


Procedural and non-procedural Data Manipulation Languages (DMLs) are the two main types of DML used to manipulate data in a database.
Procedural DML

Procedural DMLs are a type of programming language that allows users to specify a series of actions to be taken on a database. These actions are executed in a specific order, or "procedure", hence the name. In other words, a procedural DML requires the user to specify how to manipulate the data: the user spells out the steps the system should take to produce the result.
Examples of procedural DMLs include languages such as COBOL, FORTRAN, and PL/SQL.

Non-procedural DML

Non-procedural DMLs, on the other hand, do not require users to specify a series of actions to be taken on a database. Instead, they allow users to state the desired result of a query, and the database system itself determines the most efficient way to achieve that result. In other words, a non-procedural DML lets the user specify what data they want, but not how to retrieve it. Non-procedural DMLs are generally easier to learn and use, because they do not require detailed knowledge of how the database system works or how the data is structured.
Examples of non-procedural DMLs include SQL and QBE (Query By Example).
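As a rough illustration, the same task (adding up salaries) looks very different in a procedural language such as PL/SQL and in non-procedural SQL. The employee table and its salary column below are assumptions made purely for this sketch:

-- Procedural (PL/SQL): the user spells out how to walk through the rows
DECLARE
  total NUMBER := 0;
BEGIN
  FOR rec IN (SELECT salary FROM employee) LOOP
    total := total + rec.salary;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE(total);
END;

-- Non-procedural (SQL): the user states only the desired result
SELECT SUM(salary) FROM employee;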

Difference between Procedural and Non-Procedural language:


Procedural Language | Non-Procedural Language
It is a command-driven language. | It is a function-driven language.
It works through the state of the machine. | It works through mathematical functions.
Its semantics are quite tough. | Its semantics are very simple.
It returns only restricted data types and allowed values. | It can return any datatype or value.
Overall efficiency is very high. | Overall efficiency is low as compared to a procedural language.
The size of a program written in a procedural language is large. | Non-procedural language programs are small in size.
It is not suitable for time-critical applications. | It is suitable for time-critical applications.
Both iterative loops and recursive calls are used in procedural languages. | Recursive calls are used in non-procedural languages.

3. Responsibilities of buffer manager in dbms


The buffer manager is an important part of the database system. The data to be stored and handled may be larger than the available main memory. In such situations, it is the job of the buffer manager to handle the data efficiently. It is sometimes referred to as the cache manager.
Responsibilities/roles of Buffer Manager

 Handles all requests for database blocks.


 Reads data from disk pages (hard disk) into main memory (RAM) whenever needed.
 If the data already exist in main memory, the buffer manager passes their main-memory address to the requestor.
 If not, it fetches the data from the hard disk into main memory and then passes the memory address to the requestor.
 It decides which data are to be cached in main memory.
The buffer manager uses the following techniques to deliver its responsibilities efficiently;
 Replacement strategy – to create room for new data blocks fetched from disk pages
 Pinned blocks – blocks that must not be written back or evicted while they are in use, which is needed to recover the database from crashes
 Forced output of blocks – blocks that must be written back to disk as required, even when the buffer space they occupy is not needed

4. Derived attribute: An attribute is a property or characteristic of an entity. An entity may contain


any number of attributes. One of the attributes is considered as the primary key. In an Entity-
Relation model, attributes are represented in an elliptical shape.
Example: A student has attributes like name, age, roll number, and many more. To uniquely identify the student, we use the roll number as the primary key, since it is not repeated. Attributes can also be subdivided into another set of attributes.

There are six such types of attributes: simple, composite, single-valued, multi-valued, derived, and complex attributes.
Simple attribute :
An attribute that cannot be further subdivided into components is a simple attribute.
Example: The roll number of a student, the id number of an employee.
Composite attribute :
An attribute that can be split into components is a composite attribute.
Example: The address can be further split into house number, street number, city, state, country, and pin code; the name can also be split into first name, middle name, and last name.
Single-valued attribute :
The attribute which takes up only a single value for each entity instance is a single-valued attribute.
Example: The age of a student.
Multi-valued attribute :
The attribute which takes up more than a single value for each entity instance is a multi-valued
attribute.
Example: Phone number of a student: Landline and mobile.
Derived attribute :
An attribute whose value can be derived from other attributes is a derived attribute.
Example: Total and average marks of a student.
Complex attribute :
Those attributes which can be formed by the nesting of composite and multi-valued attributes are called "Complex Attributes". These attributes are rarely used in a DBMS (Database Management System), which is why they are not so popular.

Stored attribute:

Stored attributes are those attributes whose values are stored directly in the database and do not require any further computation.
Example: DOB (date of birth) is a stored attribute.
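For instance, age need not be stored at all: assuming a hypothetical student table with a stored dob column, the derived attribute can be computed at query time (the date functions shown are MySQL-style):

-- dob is a stored attribute; age is derived from it whenever it is queried
SELECT name, dob,
       TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age
FROM student;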

Key attribute:

Key attributes are those attributes that can uniquely identify the entity in the entity set.
Example: Roll-No is the key attribute because it can uniquely identify the student.
Representation:
Complex attributes are the nesting of two or more composite and multi-valued attributes. Therefore,
these multi-valued and composite attributes are called ‘Components’ of complex attributes.
These components are grouped between parentheses '( )', multi-valued attributes are written between curly braces '{ }', and the components are separated by commas ','.
For example: let us consider a person having multiple phone numbers, emails, and an address.
Here, phone number and email are examples of multi-valued attributes and address is an example of
the composite attribute, because it can be divided into house number, street, city, and state.

Complex attributes

Components: {Phone number}, {Email}, Address(house number, street, city, state) – all components are separated by commas, and multi-valued components are represented between curly braces.
Complex attribute: Address_EmPhone (you can choose any name).

Null Attribute:

This attribute takes a NULL value when an entity does not have a value for it.
Example:
The 'Net Banking Active Bin' attribute indicates whether a particular customer has the net banking facility activated or not.
For a customer of a bank that does not offer the net banking facility, or for whom the facility has not yet been activated, the 'Net Banking Active Bin' attribute remains NULL.

5. Define Domains and Tuples.


A tuple, also known as a record or row, is a basic unit of data in a relational database management system (DBMS). A tuple represents a single instance of a relation, or table, in the database: one row of the table. Each tuple contains a set of values, or attributes, that correspond to the columns, or fields, of the relation.

A domain is the set of permissible values for an attribute, i.e. the set of values from which every value appearing under that column must be drawn. For example, the domain of an Age attribute might be the set of non-negative integers.

6. What is the basic SQL DDL commands?


Structured Query Language (SQL) is the database language with which we can perform operations on an existing database, and we can also use it to create a database.
SQL uses certain commands like Create, Drop, Insert, etc. to carry out the required tasks.
These SQL commands are mainly categorized into five categories:
1. DDL – Data Definition Language
2. DQL – Data Query Language
3. DML – Data Manipulation Language
4. DCL – Data Control Language
5. TCL – Transaction Control Language

DDL (Data Definition Language)


DDL or Data Definition Language actually consists of the SQL commands that can be used to define the
database schema. It simply deals with descriptions of the database schema and is used to create and
modify the structure of database objects in the database. DDL is a set of SQL commands used to create,
modify, and delete database structures but not data. These commands are normally not used by a
general user, who should be accessing the database via an application.
List of DDL commands:
 CREATE: This command is used to create the database or its objects (like table, index, function,
views, store procedure, and triggers).
 DROP: This command is used to delete objects from the database.
 ALTER: This is used to alter the structure of the database.
 TRUNCATE: This is used to remove all records from a table, including the space allocated for those records.
 COMMENT: This is used to add comments to the data dictionary.
 RENAME: This is used to rename an object existing in the database.
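As a rough illustration (the student table and its columns are invented for this sketch), the main DDL commands might be used like this:

CREATE TABLE student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50),
    age     INT
);
ALTER TABLE student ADD COLUMN email VARCHAR(100);  -- change the structure
TRUNCATE TABLE student;                             -- remove all rows, keep the table
DROP TABLE student;                                 -- remove the table itself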

DQL (Data Query Language)


DQL statements are used for performing queries on the data within schema objects. The purpose of a DQL command is to get some schema relation based on the query passed to it. DQL can be defined as the component of SQL that allows getting data from the database and imposing order upon it. It includes the SELECT statement, which allows getting data out of the database to perform operations with it. When a SELECT is fired against a table or tables, the result is compiled into a further temporary table, which is displayed or perhaps received by the program, i.e. a front-end.
List of DQL:
 SELECT: It is used to retrieve data from the database.

DML(Data Manipulation Language)


The SQL commands that deal with the manipulation of data present in the database belong to DML (Data Manipulation Language), and this includes most of the SQL statements. In many database systems, DCL statements are grouped together with DML statements.
List of DML commands:
 INSERT: It is used to insert data into a table.
 UPDATE: It is used to update existing data within a table.
 DELETE: It is used to delete records from a database table.
 LOCK: Controls concurrency on a table.
 CALL: Call a PL/SQL or JAVA subprogram.
 EXPLAIN PLAN: It describes the access path to data.
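A brief illustrative sketch of the core DML commands, reusing the hypothetical student table from the DDL sketch above:

INSERT INTO student (roll_no, name, age) VALUES (1, 'Asha', 20);  -- add a row
UPDATE student SET age = 21 WHERE roll_no = 1;                    -- change existing data
DELETE FROM student WHERE roll_no = 1;                            -- remove a row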

DCL (Data Control Language)


DCL includes commands such as GRANT and REVOKE which mainly deal with the rights, permissions,
and other controls of the database system.
List of DCL commands:
GRANT: This command gives users access privileges to the database.
Syntax:
GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;
REVOKE: This command withdraws the user’s access privileges given by using the GRANT command.
Syntax:
REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;

TCL (Transaction Control Language)


Transactions group a set of tasks into a single execution unit. Each transaction begins with a specific task and ends when all the tasks in the group complete successfully. If any of the tasks fail, the transaction fails. Therefore, a transaction has only two results: success or failure. The following TCL commands are used to control the execution of a transaction:
BEGIN: Opens a Transaction.
COMMIT: Commits a Transaction.
Syntax:
COMMIT;
ROLLBACK: Rolls back a transaction in case an error occurs.
Syntax:
ROLLBACK;
SAVEPOINT: Sets a save point within a transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
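Put together, an illustrative transaction might look like this (the account table is an assumption, and BEGIN/START TRANSACTION syntax varies slightly between database systems):

BEGIN;                                  -- open the transaction
UPDATE account SET balance = balance - 500 WHERE acc_no = 101;
SAVEPOINT after_debit;                  -- marker we could roll back to
UPDATE account SET balance = balance + 500 WHERE acc_no = 202;
-- ROLLBACK TO after_debit;             -- would undo only the second update
COMMIT;                                 -- make both changes permanent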

7. Syntax of SELECT command with example


After creating the table in an SQL database and inserting values into it, the next step is to check
whether the values are inserted properly into this table or not. For that, one must try to retrieve the
records present in the said table. You can do this using the SELECT statement.
The SELECT statement is used to fetch the data from a database table which returns this data in the
form of a result table. These result tables are called result-sets.
The basic syntax of the SELECT statement is as follows −
SELECT column1, column2, columnN FROM table_name;

Here, column1, column2... are the fields of a table whose values you want to fetch. If you want to fetch all the fields available in the table, then you can use the following syntax.
SELECT * FROM table_name;
Example
Consider the CUSTOMERS table having the following records –
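(The original records are not reproduced in this copy of the document; a minimal sketch of what such a CUSTOMERS table could look like, with made-up data, is shown below.)

CREATE TABLE CUSTOMERS (
    ID      INT PRIMARY KEY,
    NAME    VARCHAR(50),
    AGE     INT,
    ADDRESS VARCHAR(100),
    SALARY  DECIMAL(10, 2)
);
-- illustrative rows only
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh',  32, 'Ahmedabad', 2000.00);
INSERT INTO CUSTOMERS VALUES (2, 'Khilan',  25, 'Delhi',     1500.00);
INSERT INTO CUSTOMERS VALUES (3, 'Kaushik', 23, 'Kota',      2000.00);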

To Retrieve Selective Fields


The following code is an example, which would fetch the ID, Name and Salary fields of the customers
available in CUSTOMERS table.
SELECT ID, NAME, SALARY FROM CUSTOMERS;
Output
This would produce the following result −
8. How to order rows in SQL:
The ORDER BY is an optional clause of the SELECT statement. The ORDER BY clause allows you
to sort the rows returned by the SELECT clause by one or more sort expressions in ascending or
descending order.

There are several ways in which we can sort query results:

You can arrange rows in ascending or descending order. By default, SQL uses the order-by columns to arrange rows in ascending order. For example, to arrange the book titles by ascending price, simply sort the rows by the price column. The resulting SQL might look like this:

SELECT *
FROM titles
ORDER BY price;
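To sort in descending order, or by more than one column, the DESC keyword and additional sort expressions can be added. A hedged sketch, assuming the same titles table also has a title column:

SELECT *
FROM titles
ORDER BY price DESC, title ASC;  -- most expensive first, ties broken alphabetically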

9. Modification Anomaly: An anomaly means an inconsistency or deviation from the normal form. In a Database Management System (DBMS), an anomaly is an inconsistency that occurs in a relational table during the operations performed on that table.

There can be various reasons for anomalies to occur in the database. For example, if there is a lot of
redundant data present in our database then DBMS anomalies can occur. If a table is constructed in a
very poor manner then there is a chance of database anomaly. Due to database anomalies, the integrity
of the database suffers.

The other reason for database anomalies is that all the data is stored in a single table. So, to remove the anomalies from the database, normalization is performed: the table is split into smaller tables, and these tables are joined back together (using the different types of joins) when the data is needed.
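(The worker/department table referred to in the examples below is not reproduced in this copy; a minimal sketch consistent with those examples, with purely illustrative data, might look like this.)

CREATE TABLE worker (
    worker_id   INT,
    worker_name VARCHAR(50),
    address     VARCHAR(100),
    dept_no     VARCHAR(10),
    dept_name   VARCHAR(50)
);
-- Ramesh appears in two rows, so his address is stored redundantly
INSERT INTO worker VALUES (1, 'Ramesh', 'Delhi',  'ECT001', 'Electronics');
INSERT INTO worker VALUES (1, 'Ramesh', 'Delhi',  'ECT002', 'Testing');
-- Rajesh appears only in the row for department ECT669
INSERT INTO worker VALUES (2, 'Rajesh', 'Mumbai', 'ECT669', 'Research');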

There can be three types of an anomaly in the database:

Updation / Update Anomaly

An update anomaly occurs when updating some rows in the table leads to inconsistency. In the table above, if we want to update the address of Ramesh, we have to update every row in which Ramesh appears. If we miss even a single row during the update, there will be two different addresses for Ramesh, which leads to an inconsistent and wrong database.

Insertion Anomaly

If inserting a new row into the table creates inconsistency, this is called an insertion anomaly. For example, in the table above, if a new worker has not yet been allocated to any department, we cannot insert the worker's row properly, which creates an insertion anomaly.

Deletion Anomaly

If deleting some rows from the table also deletes other information or data that is still required, this is called a deletion anomaly. For example, in the table above, if we delete department number ECT669, then the details of Rajesh are also deleted, since Rajesh's details exist only in the row for ECT669. This causes a deletion anomaly in the table.

10. Define BCNF in DBMS:

The Boyce-Codd Normal Form is also known as 3.5NF, since it is a stricter version of 3NF that was developed to handle certain anomalies that 3NF does not resolve. A table must satisfy 3NF before it can be in BCNF. In addition, for every non-trivial functional dependency that holds on the table, the determinant (left-hand side) must be a superkey of the table.
Now, let’s look at an example in order to better understand the principle of BCNF:
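(The example table itself is not reproduced in this copy of the document. A plausible sketch, assuming the columns mentioned in the discussion below, is:)

-- Assumed example table: composite primary key (EmployeeID, Department)
CREATE TABLE employee_dept_manager (
    EmployeeID INT,
    Department VARCHAR(50),
    ManagerID  INT,
    PRIMARY KEY (EmployeeID, Department)
);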


In this table, the EmployeeID and Department form the primary key, meaning that the Department
attribute is a prime attribute.

The problem here is that Department (a prime attribute) is determined by ManagerID, which is not a prime attribute and not a superkey of the table; therefore, the table does not satisfy BCNF. In order to solve this issue and put the table in Boyce-Codd Normal Form, we have to split the table into two tables, as sketched below:
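A hedged sketch of one possible decomposition, assuming each manager manages a single department (i.e. ManagerID determines Department); the table and column names are illustrative:

-- Which manager each employee reports to
CREATE TABLE employee_manager (
    EmployeeID INT,
    ManagerID  INT,
    PRIMARY KEY (EmployeeID, ManagerID)
);

-- Which department each manager manages (ManagerID is now the whole key)
CREATE TABLE manager_department (
    ManagerID  INT PRIMARY KEY,
    Department VARCHAR(50)
);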

11. What is permanency in DBMS?

Durability ensures the permanency of something. In DBMS, the term durability ensures that the data
after the successful execution of the operation becomes permanent in the database. The durability of
the data should be such that even if the system fails or crashes, the database still survives. If data is lost, however, it becomes the responsibility of the recovery manager to ensure the durability of the database. For committing the values, the COMMIT command must be used every time we make changes.

12. Different types of privileges:


Confidentiality, integrity, and availability are the hallmarks of database security. Authorization is the permission given to a user or process to access a set of objects. The type of access granted can vary, for example read-only or read and write. Privilege means the different Data Manipulation Language (DML) operations which can be performed by the user on data, like INSERT, UPDATE, SELECT, DELETE, etc.
Access control is performed by using the following two mechanisms.
1. Privileges
2. Roles
Let’s discuss one by one.
Privileges:
A privilege is the authority or permission to access a named object in a prescribed manner, for example, permission to access a table. Privileges can also permit a particular user to connect to the database. In other words, a privilege is permission granted to a user on a database object.
 Database privileges —
A privilege is permission to execute one particular type of SQL statement or to access another user's object. Database privileges control the use of computing resources. Database privileges do not apply to the database administrator of the database.

 System privileges —
A system privilege is the right to perform an activity on a specific type of object. for example, the
privilege to delete rows of any table in a database is system privilege. There are a total of 60
different system privileges. System privileges allow users to CREATE, ALTER, or DROP the
database objects.

 Object privilege —
An object privilege is a privilege to perform a specific action on a particular table, function, or
package. For example, the right to delete rows from a table is an object privilege: if a row of the table GEEKSFORGEEKS contains the name of an employee who is no longer part of the organization, deleting that row requires an object privilege. Object privileges allow the user to INSERT, DELETE, UPDATE, or SELECT data in the database object.

Roles
A role is a mechanism that can be used to grant authorization. A person or a group of people can be granted a role or a group of roles. By using roles, the administrator can manage access privileges very easily. Roles are provided by the database management system for easy and controlled privilege management.
Properties –
The following are the properties of the roles which allow easy privilege management inside a database:
 Reduced privilege administration —
Privileges can be granted once to a role shared by a group of related users, instead of granting the same set of privileges to each user explicitly.
 Dynamic privilege management —
If the privileges of the group change, only the privileges of the role need to be changed.
 Application-specific security —
The use of a role can also be protected with a password. Applications can be created to enable a role only when the correct password is entered. Users are not granted the role if they do not know the password.
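An illustrative sketch of role-based privilege management (the role, user, and table names are made up, and exact syntax varies by DBMS):

CREATE ROLE clerk;                               -- define the role once
GRANT SELECT, INSERT, UPDATE ON orders TO clerk; -- attach privileges to the role
GRANT clerk TO user1, user2;                     -- give the role to related users
REVOKE clerk FROM user2;                         -- withdraw it when no longer needed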

13. Data models:


Data Models

A data model is the modeling of the data description, data semantics, and consistency constraints of the data. It provides the conceptual tools for describing the design of a database at each level of data abstraction. The following four data models are used for understanding the structure of a database:
1) Relational Data Model: This type of model designs the data in the form of rows and columns within a
table. Thus, a relational model uses tables for representing data and in-between relationships. Tables are
also called relations. This model was initially described by Edgar F. Codd, in 1969. The relational data
model is the widely used model which is primarily used by commercial data processing applications.

2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and
relationships among them. These objects are known as entities, and relationship is an association among
these entities. This model was designed by Peter Chen and published in a 1976 paper. It was widely used
in database designing. A set of attributes describe the entities. For example, student_name, student_id
describes the 'student' entity. A set of the same type of entities is known as an 'Entity set', and the set of
the same type of relationships is known as 'relationship set'.

3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object identity as well. This model supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, the objects are the data items carrying their own properties.

4) Semistructured Data Model: This type of data model is different from the other three data models
(explained above). The semistructured data model allows the data specifications at places where the
individual data items of the same type may have different sets of attributes. The Extensible Markup Language, also known as XML, is widely used for representing semistructured data. Although XML was initially designed for adding markup information to text documents, it gained importance because of its application in the exchange of data.

14. Data Abstraction and Data Independence


Database systems comprise complex data structures. In order to make the system efficient in terms of data retrieval and to reduce complexity for its users, developers use abstraction, i.e. they hide irrelevant details from the users. This approach simplifies database design.
There are mainly 3 levels of data abstraction:
Physical: This is the lowest level of data abstraction. It tells us how the data is actually stored in
memory. Access methods like sequential or random access, and file organization methods like B+ trees and hashing, are used at this level. Usability, the size of memory, and the number of times the records are accessed are factors that we need to consider while designing the database.
Suppose we need to store the details of an employee. Blocks of storage and the amount of memory
used for these purposes are kept hidden from the user.
Logical: This level comprises the information that is actually stored in the database in the form of
tables. It also stores the relationship among the data entities in relatively simple structures. At this
level, the information available to the user at the view level is unknown.
We can store the various attributes of an employee at this level, and relationships among the data, e.g. the employee's relationship with a manager, can also be stored.
View: This is the highest level of abstraction. Only a part of the actual database is viewed by the
users. This level exists to ease the accessibility of the database by an individual user. Users view data
in the form of rows and columns. Tables and relations are used to store data. Multiple views of the
same database may exist. Users can just view the data and interact with the database, storage and
implementation details are hidden from them.
Example: In case of storing customer data,
Physical level – it will contains block of storages (bytes,GB,TB,etc)
Logical level – it will contain the fields and the attributes of data.
View level – it works with CLI or GUI access of database

The main purpose of data abstraction is to achieve data independence in order to save the time and
cost required when the database is modified or altered.
Data independence is defined as the property of a DBMS that lets you change the database schema at one level of the system without having to change the schema at the next level. It helps to keep the data separated from all the programs that make use of it.
We have namely two levels of data independence arising from these levels of abstraction :
Physical level data independence: It refers to the characteristic of being able to modify the physical
schema without any alterations to the conceptual or logical schema, done for optimization purposes,
e.g., the Conceptual structure of the database would not be affected by any change in storage size of
the database system server. Changing from sequential to random access files is one such example.
These alterations or modifications to the physical structure may include:

 Utilizing new storage devices.


 Modifying data structures used for storage.
 Altering indexes or using alternative file organization techniques etc.
Logical level data independence: It refers to the characteristic of being able to modify the logical schema without affecting the external schema or application programs. The user view of the data would not be affected by any changes to the conceptual view of the data. These changes may include the insertion or deletion of attributes, or altering the table structures, entities, or relationships of the logical schema, etc.
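A small sketch of how a view at the external level can insulate applications from changes at the logical level; the table, view, and column names here are assumptions:

-- External (view) level: applications query the view, not the base table
CREATE VIEW employee_public AS
SELECT emp_id, emp_name, dept_no
FROM employee;

-- A later change to the base table's logical structure
-- does not affect programs that use employee_public:
ALTER TABLE employee ADD COLUMN salary DECIMAL(10, 2);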
14. Concept of entity relationship model. How does it help in designing relationship
models?
ER (Entity Relationship) Diagram in DBMS
o ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used
to define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also provides a very simple and easy-to-understand view of the data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.

For example, Suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street
name, pin code, etc and there will be a relationship between them.

Component of ER Diagram
1. Entity:

An entity may be any object, class, person or place. In the ER diagram, an entity is represented as a rectangle.

Consider an organization as an example- manager, product, employee, department etc. can be taken as
an entity.

a. Weak Entity

An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute

The attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The composite attribute is represented by an ellipse, and its component attributes are shown as ellipses connected to it.
c. Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute

An attribute that can be derived from other attributes is known as a derived attribute. It can be represented by a dashed ellipse.

For example, A person's age changes over time and can be derived from another attribute like Date of
birth.
3. Relationship

A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent
the relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of an entity is associated with the relationship, then it is known as one to one
relationship.

For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship

When only one instance of the entity on the left, and more than one instance of an entity on the right
associates with the relationship then this is known as a one-to-many relationship.

For example, a scientist can make many inventions, but each invention is made by only one specific scientist.
c. Many-to-one relationship

When more than one instance of the entity on the left, and only one instance of an entity on the right
associates with the relationship then it is known as a many-to-one relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance of an entity on the right
associates with the relationship then it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.

15. Cardinality and Participation


Cardinality is a count of the number of times one entity can (or must) be associated with each
occurrence of another entity. Participation refers to whether an entity must participate in a
relationship with another entity to exist.
In general, cardinality tells you “How many”. Cardinality can be:

o One to one (1:1): every time one entity occurs, there is exactly one occurrence of another entity.
o One to many (1:m): every time one entity occurs, there are multiple occurrences of another entity.
o Many to many (m:m): for each occurrence of an entity, there can be one or many occurrences of another
and vice versa.
In some notations, a cardinality constraint corresponds to maximum cardinality. In other notations,
cardinality may be combined with participation (a “minimum”).

Participation can be total or partial (optional):

o Total participation is where an entity must participate in a relationship to exist. For example, an
employee must work for at least one department to exist as an employee.
o Partial (optional) participation is where the entity can exist without participating in a relationship with
another entity [1]. For example, the entity course may exist within an organization, even though it has no
current students.

16. Explain UNION and INTERSECT commands with syntax with example.

UNION
The Union is a binary set operator in DBMS. It is used to combine the result set of two select
queries. Thus, It combines two result sets into one. In other words, the result set obtained after union
operation is the collection of the result set of both the tables.

But two necessary conditions need to be fulfilled when we use the union command. These are:

1. Both SELECT statements should have an equal number of fields in the same order.
2. The data types of these fields should either be the same or compatible with each other.
The syntax for the union operation is as follows:

SELECT (column_names) FROM table1 [WHERE condition] UNION SELECT (column_names) FROM table2 [WHERE condition];

The MySQL query for the union operation can be as follows:

SELECT color_name FROM colors_a UNION SELECT color_name FROM colors_b;

INTERSECT
Intersect is a binary set operator in DBMS. The intersection operation between two selections
returns only the common data sets or rows between them. It should be noted that the intersection
operation always returns the distinct rows. The duplicate rows will not be returned by the intersect
operator.

Here also, the conditions described above for union must be followed, i.e., the number of fields in both the SELECT statements should be the same, with the same data types, and in the same order, for the intersection.


The syntax for the intersection operation is as follows:


SELECT (column_names) FROM table1 [WHERE condition] INTERSECT SELECT (column_names) FROM table2 [WHERE condition];

It is to be noted that the intersect operator is not present in MySQL. But we can make use of either 'IN'
or 'Exists' operator for performing an intersection operation in MySQL.

Here, we are using the 'IN' clause for demonstrating the examples.

The MySQL query for the intersection operation using the 'IN' operator can be as follows:

SELECT color_name FROM colors_a WHERE color_name IN(SELECT color_name FROM colors_b);

17. Explain UNIQUE AND EXIST


Within the WHERE clause lies many possibilities for modifying your SQL statement. Among these
possibilities are the EXISTS, UNIQUE, DISTINCT, and OVERLAPS predicates.
EXISTS
You can use the EXISTS predicate in conjunction with a subquery to determine whether the subquery
returns any rows. If the subquery returns at least one row, that result satisfies the EXISTS condition, and
the outer query executes. Consider the following example:
SELECT FirstName, LastName
FROM CUSTOMER
WHERE EXISTS
(SELECT DISTINCT CustomerID
FROM SALES
WHERE SALES.CustomerID = CUSTOMER.CustomerID);

Here the SALES table contains all of your company’s sales transactions. The table includes
the CustomerID of the customer who makes each purchase, as well as other pertinent information. The
CUSTOMER table contains each customer’s first and last names, but no information about specific
transactions.
The subquery in the preceding example returns a row for every customer who has made at least one
purchase. The outer query returns the first and last names of the customers who made the purchases
that the SALES table records.

EXISTS is equivalent to a comparison of COUNT with zero, as the following query shows:

SELECT FirstName, LastName
FROM CUSTOMER
WHERE 0 <>
   (SELECT COUNT(*)
    FROM SALES
    WHERE SALES.CustomerID = CUSTOMER.CustomerID);

For every row in the SALES table that contains a CustomerID that’s equal to a CustomerID in the
CUSTOMER table, this statement displays the FirstName and LastName columns in the CUSTOMER
table. For every sale in the SALES table, therefore, the statement displays the name of the customer
who made the purchase.

UNIQUE

As you do with the EXISTS predicate, you use the UNIQUE predicate with a subquery. Although
the EXISTS predicate evaluates to True only if the subquery returns at least one row,
the UNIQUE predicate evaluates to True only if no two rows returned by the subquery are identical. In
other words, the UNIQUE predicate evaluates to True only if all the rows that its subquery returns are
unique.
Consider the following example:

SELECT FirstName, LastName
FROM CUSTOMER
WHERE UNIQUE
   (SELECT CustomerID FROM SALES
    WHERE SALES.CustomerID = CUSTOMER.CustomerID);

This statement retrieves the names of all new customers for whom the SALES table records only one
sale. Because a null value is an unknown value, two null values aren’t considered equal to each other;
when the UNIQUE keyword is applied to a result table that contains only two null rows,
the UNIQUE predicate evaluates to True.

DISTINCT

The DISTINCT predicate is similar to the UNIQUE predicate, except in the way it treats nulls. If all the
values in a result table are UNIQUE, then they’re also DISTINCT from each other.
However, unlike the result for the UNIQUE predicate, if the DISTINCT keyword is applied to a result
table that contains only two null rows, the DISTINCT predicate evaluates to False. Two null values
are not considered distinct from each other, while at the same time they are considered to be unique.

This strange situation seems contradictory, but there’s a reason for it. In some situations, you may want
to treat two null values as different from each other — in which case, use the UNIQUE predicate. When
you want to treat the two nulls as if they’re the same, use the DISTINCT predicate.

OVERLAPS

You use the OVERLAPS predicate to determine whether two time intervals overlap each other. This
predicate is useful for avoiding scheduling conflicts. If the two intervals overlap, the predicate returns a
True value. If they don’t overlap, the predicate returns a False value.
You can specify an interval in two ways: either as a start time and an end time or as a start time and a
duration. Here are some examples:

(TIME '2:55:00', INTERVAL '1' HOUR)
OVERLAPS
(TIME '3:30:00', INTERVAL '2' HOUR)

This first example returns a True because 3:30 is less than one hour after 2:55.

(TIME '9:00:00', TIME '9:30:00')
OVERLAPS
(TIME '9:29:00', TIME '9:31:00')

This example returns a True because you have a one-minute overlap between the two intervals.

(TIME '9:00:00', TIME '10:00:00')
OVERLAPS
(TIME '10:15:00', INTERVAL '3' HOUR)

This example returns a False because the two intervals don’t overlap.

(TIME '9:00:00', TIME '9:30:00')
OVERLAPS
(TIME '9:30:00', TIME '9:35:00')

This example returns a False because even though the two intervals are contiguous, they don’t overlap.
18. Define secondary indexing
Databases are a critical component of modern applications, storing vast amounts of data and serving as
a source of information for various functions. One of the primary challenges in managing databases is
providing efficient access to the stored data. To meet this challenge, database management systems
use various techniques, including indexing, to improve the performance of data retrieval
operations. Indexing is a method that creates a separate structure, referred to as an index, from the data
stored in a database. The purpose of an index is to allow for fast access to data without having to search
through the entire dataset. There are several types of indexes, including primary indexes and secondary
indexes.
What is Secondary Indexing in Databases?
Secondary indexing is a database management technique used to create additional indexes on data
stored in a database. The main purpose of secondary indexing is to improve the performance of queries
and to simplify the search for specific records within a database. A secondary index provides an alternate
means of accessing data in a database, in addition to the primary index. The primary index is typically
created when the database is created and is used as the primary means of accessing data in the
database. Secondary indexes, on the other hand, can be created and dropped at any time, allowing for
greater flexibility in managing the database.
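A minimal sketch of creating and dropping a secondary index, assuming a hypothetical customers table with a non-key last_name column (DROP INDEX syntax varies slightly by DBMS):

-- Secondary index on a non-key column used frequently in searches
CREATE INDEX idx_customers_last_name ON customers (last_name);

-- The index speeds up queries such as:
SELECT * FROM customers WHERE last_name = 'Mathew';

-- It can be dropped at any time without touching the stored data
DROP INDEX idx_customers_last_name;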
Benefits
 Improved Query Performance: Secondary indexes can improve the performance of queries by
reducing the amount of data that needs to be scanned to find the desired records. With a secondary
index, the database can directly access the required records, rather than having to scan the entire
table.
 Flexibility: Secondary indexes provide greater flexibility in managing a database, as they can be
created and dropped at any time. This allows for a more dynamic approach to database
management, as the needs of the database can change over time.
 Simplified Search: Secondary indexes simplify the search for specific records within a database,
making it easier to find the desired data.
 Reduced Data Storage Overhead: Secondary indexes use a compact data structure that requires
less space to store compared to the original data. This means that you can store more data in a
database while reducing the amount of storage space required.
Types of Secondary Indexes
 B-tree Index: A B-tree index is a type of index that stores data in a balanced tree structure. B-tree
indexes are commonly used in relational databases and provide efficient search, insert, and delete
operations.
 Hash Index: A hash index is a type of index that uses a hash function to map data to a specific
location within the index. Hash indexes are commonly used in non-relational databases, such as
NoSQL databases, and provide fast access to data.
 Bitmap Index: A bitmap index is a type of index that uses a bitmap to represent the data in a
database. Each bit in the bitmap represents a specific record in the database, and the value of the
bit indicates whether the record is present or not. Bitmap indexes are commonly used in data
warehousing and business intelligence applications, as they provide efficient access to large
amounts of data.
When to Use Secondary Indexing
Secondary indexing should be used in database management systems when there is a need to improve
the performance of data retrieval operations that search for data based on specific conditions. Secondary
indexing is particularly useful in the following scenarios:
 Queries with Complex Search Criteria: Secondary indexes can be used to support complex
queries that search for data based on multiple conditions. By creating a secondary index based on
the columns used in the search criteria, database management systems can access the data more
efficiently.
 Large Data Sets: Secondary indexing can be beneficial for large data sets where the time and
resources required for data retrieval operations can be significant. By creating a secondary index,
database management systems can access the data more quickly, reducing the time and resources
required for data retrieval operations.
 Frequently Accessed Data: Secondary indexing should be used for frequently accessed data to
reduce the time and resources required for data retrieval operations. This is because secondary
indexes provide a fast and efficient way to access data stored in a database.
 Sorting and Aggregating Data: Secondary indexing can be used to support sorting and
aggregating data based on specific columns. By creating a secondary index based on the columns
used for sorting and aggregating, database management systems can access the data more
efficiently, reducing the time and resources required for data retrieval operations.
 Data Structure: The data structure of a database can also affect the decision to use secondary
indexing. For example, if the data is structured as a B-tree, a B-tree index may be the most
appropriate type of secondary index.
Conclusion

Secondary indexing is an essential technique used in database management systems to improve the
performance of data retrieval operations. By creating a separate index structure based on specific
columns, database management systems can access data more quickly and efficiently, reducing the
time and resources required for data retrieval operations.
Secondary indexing provides several benefits, including improved query performance, increased
flexibility, and reduced data storage overhead. It is particularly useful in scenarios where there is a need
to support complex search criteria, access large data sets, and sort and aggregate data based on
specific columns. However, it’s important to consider the trade-offs when using secondary indexing, as
it can also add additional overhead in terms of storage and update operations. The number and size of
secondary indexes should be carefully managed to minimize the impact on database performance.

19. Explain 1 NF, 2NF, 3NF


What is Database Normalization?
Database normalization is a database design principle for organizing data in an organized and consistent
way.

It helps you avoid redundancy and maintain the integrity of the database. It also helps you eliminate
undesirable characteristics associated with insertion, deletion, and updating.

What is the Purpose of Normalization?


The main purpose of database normalization is to avoid complexities, eliminate duplicates, and organize
data in a consistent way. In normalization, the data is divided into several tables linked together with
relationships.

Database administrators are able to achieve these relationships by using primary keys, foreign keys, and
composite keys.

To get it done, a primary key in one table, for example, employee_wages is related to the value from
another table, for instance, employee_data.
N.B.: A primary key is a column that uniquely identifies the rows of data in that table. It’s a unique
identifier such as an employee ID, student ID, voter’s identification number (VIN), and so on.
A foreign key is a field that relates to the primary key in another table.
A composite key is just like a primary key, but instead of having a column, it has multiple columns.
What is 1NF 2NF and 3NF?
1NF, 2NF, and 3NF are the first three types of database normalization. They stand for first normal
form, second normal form, and third normal form, respectively.
There are also 4NF (fourth normal form) and 5NF (fifth normal form). There’s even 6NF (sixth normal
form), but the commonest normal form you’ll see out there is 3NF (third normal form).

All the types of database normalization are cumulative – meaning each one builds on top of those
beneath it. So all the concepts in 1NF also carry over to 2NF, and so on.

The First Normal Form – 1NF


For a table to be in the first normal form, it must meet the following criteria:

 a single cell must not hold more than one value (atomicity)

 there must be a primary key for identification


 no duplicated rows or columns

 each column must have only one value for each row in the table
The Second Normal Form – 2NF
The 1NF only eliminates repeating groups, not redundancy. That’s why there is 2NF.

A table is said to be in 2NF if it meets the following criteria:

 it’s already in 1NF

 has no partial dependency. That is, all non-key attributes are fully dependent on a primary key.
The Third Normal Form – 3NF
When a table is in 2NF, it eliminates repeating groups and redundancy, but it does not eliminate
transitive partial dependency.

This means a non-prime attribute (an attribute that is not part of the candidate’s key) is dependent on
another non-prime attribute. This is what the third normal form (3NF) eliminates.

So, for a table to be in 3NF, it must:

 be in 2NF

 have no transitive partial dependency.


Examples of 1NF, 2NF, and 3NF
Database normalization is quite technical, but we will illustrate each of the normal forms with examples.

Imagine we're building a restaurant management application. That application needs to store data about
the company's employees and it starts out by creating the following table of employees:
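(The example table is not reproduced in this copy; a sketch consistent with the columns mentioned in the discussion below — employee_id, job_code, name, home_state, state_code — might be:)

CREATE TABLE employees (
    employee_id INT,
    job_code    VARCHAR(10),
    name        VARCHAR(50),
    home_state  VARCHAR(50),
    state_code  VARCHAR(5),
    PRIMARY KEY (employee_id, job_code)   -- composite primary key
);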

All the entries are atomic and there is a composite primary key (employee_id, job_code) so the table is in
the first normal form (1NF).
But even if you only know someone's employee_id, then you can determine their name, home_state,
and state_code (because they should be the same person). This means name, home_state,
and state_code are dependent on employee_id (a part of primary composite key). So, the table is not
in 2NF. We should separate them to a different table to make it 2NF.
Example of Second Normal Form (2NF)
home_state is now dependent on state_code. So, if you know the state_code, then you can find
the home_state value.
To take this a step further, we should separate them again to a different table to make it 3NF.

Example of Third Normal Form (3NF)


Now our database is in 3NF.
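A hedged sketch of what the resulting 3NF tables could look like, continuing the assumed columns above (the original single employees table is replaced by three smaller tables):

-- Facts that depend only on employee_id
CREATE TABLE employee (
    employee_id INT PRIMARY KEY,
    name        VARCHAR(50),
    state_code  VARCHAR(5)
);

-- Each employee/job pairing
CREATE TABLE employee_roles (
    employee_id INT,
    job_code    VARCHAR(10),
    PRIMARY KEY (employee_id, job_code)
);

-- home_state depends only on state_code
CREATE TABLE states (
    state_code  VARCHAR(5) PRIMARY KEY,
    home_state  VARCHAR(50)
);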

Conclusion
This article took you through what database normalization is, its purpose, and its types. We also looked at those types of normalization and the criteria a table must meet before it can be certified to be in any of them.

It is worth noting that most tables don’t exceed the 3NF limit, but you can also take them to 4NF and
5NF, depending on requirements and the size of the data at hand.

20. Explain the types of failures.

Failure Classification

To find that where the problem has occurred, we generalize a failure into the following categories:

1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure

A transaction failure occurs when a transaction fails to execute or reaches a point from which it cannot go any further. If a transaction or process is damaged in this way, it is called a transaction failure.

Reasons for a transaction failure could be -

1. Logical errors: If a transaction cannot complete due to some code error or an internal error
condition, then the logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active transaction because the database system is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource unavailability.

2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.

Fail-stop assumption: In a system crash, non-volatile storage is assumed not to be corrupted.

3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure which destroys all or part of the disk storage.

22. Why database system is superior than file based system:

Advantages of DBMS over File system

File System: A File Management system is a DBMS that allows access to single files or tables at a time.
In a File System, data is directly stored in a set of files. It contains flat files that have no relation to other
files (when only one table is stored in a single file, then this file is known as a flat file).
DBMS: A Database Management System (DBMS) is application software that allows users to efficiently
define, create, maintain and share databases. Defining a database involves specifying the data types,
structures and constraints of the data to be stored in the database. Creating a database involves storing
the data on some storage medium that is controlled by DBMS. Maintaining a database involves updating
the database whenever required to evolve and reflect changes in the miniworld and also generating
reports for each change. Sharing a database involves allowing multiple users to access the database.
DBMS also serves as an interface between the database and end users or application programs. It
provides control access to the data and ensures that data is consistent and correct by defining rules on
them.
An application program accesses the database by sending queries or requests for data to the DBMS. A
query causes some data to be retrieved from the database.

Advantages of DBMS over File system:

 Data redundancy and inconsistency: Redundancy is the concept of repetition of data i.e. each
data may have more than a single copy. The file system cannot control the redundancy of data as
each user defines and maintains the needed files for a specific application to run. There may be a
possibility that two users are maintaining the data of the same file for different applications. Hence, changes made by one user are not reflected in the files used by other users, which leads to inconsistency
of data. Whereas DBMS controls redundancy by maintaining a single repository of data that is
defined once and is accessed by many users. As there is no or less redundancy, data remains
consistent.
 Data sharing: The file system does not allow sharing of data or sharing is too complex. Whereas in
DBMS, data can be shared easily due to a centralized system.
 Data concurrency: Concurrent access to data means more than one user is accessing the same
data at the same time. Anomalies occur when changes made by one user get lost because of
changes made by another user. The file system does not provide any procedure to stop anomalies.
Whereas DBMS provides a locking system to stop anomalies to occur.
 Data searching: For every search operation performed on the file system, a different application
program has to be written. While DBMS provides inbuilt searching operations. The user only has to
write a small query to retrieve data from the database.
 Data integrity: There may be cases when some constraints need to be applied to the data before
inserting it into the database. The file system does not provide any procedure to check these
constraints automatically. Whereas DBMS maintains data integrity by enforcing user-defined
constraints on data by itself.
 System crashing: In some cases, systems might have crashed due to various reasons. It is a bane
in the case of file systems because once the system crashes, there will be no recovery of the data
that’s been lost. A DBMS will have the recovery manager which retrieves the data making it another
advantage over file systems.
 Data security: A file system provides a password mechanism to protect the database but how long
can the password be protected? No one can guarantee that. This doesn’t happen in the case of
DBMS. DBMS has specialized features that help provide shielding to its data.
 Backup: It creates a backup subsystem to restore the data if required.
 Interfaces: It provides different multiple user interfaces like graphical user interface and application
program interface.
 Easy Maintenance: It is easily maintainable due to its centralized nature.
DBMS is continuously evolving from time to time. It is a powerful tool for data storage and protection, and in the coming years we can expect database management systems to make increasing use of AI.

23. Compare relation, relationship type, relationship set and structural constraints
 1. Relationship Types, Sets, and Instances

Relationship - When an attribute of one entity type refers to another entity type

Represent references as relationships not attributes

Relationship type R among n entity types E1, E2, ..., En

Defines a set of associations among entities from these entity types

Relationship instances ri

Each ri associates n individual entities (e1, e2, ..., en)

Each entity ej in ri is a member of entity set Ej

 Informally, each relationship instance ri in R is an association of entities, where the association


includes exactly one entity from each participating entity type. Each such relationship
instance ri represents the fact that the entities participating in ri are related in some way in the
corresponding miniworld situation. For example, consider a relationship
type WORKS_FOR between the two entity types EMPLOYEE and DEPARTMENT, which
associates each employee with the department for which the employee works in the
corresponding entity set. Each relationship instance in the relationship
set WORKS_FOR associates one EMPLOYEE entity and one DEPARTMENT entity. Figure 7.9
illustrates this example, where each relationship
instance ri is shown connected to the EMPLOYEE and DEPARTMENT entities that participate in ri. In
the miniworld represented by Figure 7.9, employees e1, e3, and e6 work for department d1;
employees e2 and e4 work for department d2; and employees e5 and e7 work for department d3.

 In ER diagrams, relationship types are displayed as diamond-shaped boxes, which are connected
by straight lines to the rectangular boxes representing the participating entity types. The
relationship name is displayed in the diamond-shaped box (see Figure 7.2).

2. Relationship Degree, Role Names, and Recursive Relationships

Degree of a Relationship Type. The degree of a relationship type is the number of participating entity
types. Hence, the WORKS_FOR relationship is of degree two. A relationship type of degree two is
called binary, and one of degree three is called ternary. An example of a ternary relationship
is SUPPLY, shown in Figure 7.10, where each relationship instance ri associates three entities—a
supplier s, a part p, and a project j—whenever s supplies part p to project j. Relationships can
generally be of any degree, but the most common are binary relationships. Higher-degree
relationships are generally more complex than binary relationships; we characterize them further in
Section 7.9.
Relationships as Attributes. It is sometimes convenient to think of a binary relationship type in terms
of attributes, as we discussed in Section 7.3.3. Consider the WORKS_FOR relationship type in
Figure 7.9. One can think of an attribute called Department of the EMPLOYEE entity type, where the
value of Department for each EMPLOYEE entity is (a reference to) the DEPARTMENT entity for
which that employee works. Hence, the value set for this Department attribute is the set
of all DEPARTMENT entities, which is the DEPARTMENT entity set. This is what we did in Figure
7.8 when we specified the initial design of the entity type EMPLOYEE for the COMPANY database.
However, when we think of a binary relationship as an attribute, we always have two options. In this
example, the alternative is to think of a multivalued attribute Employee of the entity
type DEPARTMENT whose value for each DEPARTMENT entity is the set of EMPLOYEE entities
who work for that department. The value set of this Employee attribute is the power set of
the EMPLOYEE entity set. Either of these two attributes—
Department of EMPLOYEE or Employee of DEPARTMENT—can represent
the WORKS_FOR relationship type. If both are represented, they are constrained to be inverses of
each other.

Role Names and Recursive Relationships. Each entity type that participates in a relationship type
plays a particular role in the relationship. The role name signifies the role that a participating entity
from the entity type plays in each relationship instance, and helps to explain what the relationship
means. For example, in the WORKS_FOR relationship type, EMPLOYEE plays the role
of employee or worker and DEPARTMENT plays the role of department or employer.

Role names are not technically necessary in relationship types where all the participating entity
types are distinct, since each participating entity type name can be used as the role name. However,
in some cases the same entity type participates more than once in a relationship type in different
roles. In such cases the role name becomes essential for distinguishing the meaning of the role that
each participating entity plays. Such relationship types are called recursive relationships. Figure
7.11 shows an example. The SUPERVISION relationship type relates an employee to a supervisor,
where both employee and supervisor entities are members of the same EMPLOYEE entity set.
Hence, the EMPLOYEE entity type participates twice in SUPERVISION: once in the role
of supervisor (or boss), and once in the role of supervisee (or subordinate). Each relationship
instance ri in SUPERVISION associates two employee entities ej and ek, one of which plays the role
of supervisor and the other the role of supervisee. In Figure 7.11, the lines marked ‘1’ represent the
supervisor role, and those marked ‘2’ represent the supervisee role;
hence, e1 supervises e2 and e3, e4 supervises e6 and e7, and e5 supervises e1 and e4. In this
example, each relationship instance must be connected with two lines, one marked with ‘1’
(supervisor) and the other with ‘2’ (supervisee).
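As an illustration (not part of the textbook figures), such a recursive relationship is commonly realized in SQL by a self-referencing foreign key; the table and column names below are assumptions chosen to mirror the COMPANY example:

CREATE TABLE EMPLOYEE (
    Ssn       CHAR(9) PRIMARY KEY,
    Fname     VARCHAR(30),
    Lname     VARCHAR(30),
    Super_ssn CHAR(9),    -- supervisor role: points back to another EMPLOYEE row (NULL for the top supervisor)
    FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn)
);

Here the supervisor and supervisee roles of SUPERVISION are played by the referenced row and the referencing row of the same table.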

3. Constraints on Binary Relationship Types

Relationship types usually have certain constraints that limit the possible combinations of entities
that may participate in the corresponding relationship set. These constraints are determined from the
miniworld situation that the relationships represent. For example, in Figure 7.9, if the company has a
rule that each employee must work for exactly one department, then we would like to describe this
constraint in the schema. We can distinguish two main types of binary relationship
constraints: cardinality ratio and participation.

Cardinality Ratios for Binary Relationships. The cardinality ratio for a binary relationship specifies
the maximum number of relationship instances that an entity can participate in. For example, in
the WORKS_FOR binary relationship type, DEPARTMENT:EMPLOYEE is of cardinality ratio 1:N,
meaning that each department can be related to (that is, employs) any number of employees, but an
employee can be related to (work for) only one department. This means that for this particular
relationship WORKS_FOR, a particular department entity can be related to any number of
employees (N indicates there is no maximum number). On the other hand, an employee can be
related to a maximum of one department. The possible cardinality ratios for binary relationship types
are 1:1, 1:N, N:1, and M:N.

An example of a 1:1 binary relationship is MANAGES (Figure 7.12), which relates a department
entity to the employee who manages that department. This represents the miniworld constraints
that—at any point in time—an employee can manage one department only and a department can
have one manager only. The relationship type WORKS_ON (Figure 7.13) is of cardinality ratio M:N,
because the miniworld rule is that an employee can work on several projects and a project can have several
employees.

Cardinality ratios for binary relationships are represented on ER diagrams by displaying 1, M, and N
on the diamonds as shown in Figure 7.2. Notice that in this notation, we can either specify no
maximum (N) or a maximum of one (1) on participation. An alternative notation (see Section 7.7.4)
allows the designer to specify a specific maximum number on participation, such as 4 or 5.

Participation Constraints and Existence Dependencies. The participation constraint specifies
whether the existence of an entity depends on its being related to another entity via the relationship
type. This constraint specifies the minimum number of relationship instances that each entity can
participate in, and is sometimes called the minimum cardinality constraint. There are two types of
participation constraints—total and partial—that we illustrate by example. If a company policy states
that every employee must work for a department, then an employee entity can exist only if it
participates in at least one WORKS_FOR relationship instance (Figure 7.9). Thus, the participation
of EMPLOYEE in WORKS_FOR is called total participation, meaning that every entity in the total
set of employee entities must be related to a department entity via WORKS_FOR. Total participation
is also called existence dependency. In Figure 7.12 we do not expect every employee to manage a
department, so the participation of EMPLOYEE in the MANAGES relationship type is partial,
meaning that some or part of the set of employee entities are related to some department entity
via MANAGES, but not necessarily all. We will refer to the cardinality ratio and participation
constraints, taken together, as the structural constraints of a relationship type.

In ER diagrams, total participation (or existence dependency) is displayed as a double
line connecting the participating entity type to the relationship, whereas partial participation is
represented by a single line (see Figure 7.2). Notice that in this notation, we can either specify no
minimum (partial participation) or a minimum of one (total participation). The alternative notation (see
Section 7.7.4) allows the designer to specify a specific minimum number on participation in the
relationship, such as 4 or 5.
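As a rough sketch of how these structural constraints can surface in SQL (the table and column names are assumptions, and the exact ALTER TABLE syntax varies between systems), the 1:N WORKS_FOR relationship with total participation of EMPLOYEE can be approximated by a NOT NULL foreign key on the N-side:

CREATE TABLE DEPARTMENT (
    Dnumber INT PRIMARY KEY,
    Dname   VARCHAR(30)
);

-- Extending the EMPLOYEE sketch shown earlier with the WORKS_FOR relationship:
ALTER TABLE EMPLOYEE ADD Dno INT NOT NULL;       -- NOT NULL approximates total participation of EMPLOYEE
ALTER TABLE EMPLOYEE ADD FOREIGN KEY (Dno)
    REFERENCES DEPARTMENT(Dnumber);              -- a single Dno per employee gives the 1:N cardinality ratio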

4. Attributes of Relationship Types

Relationship types can also have attributes, similar to those of entity types. For example, to record
the number of hours per week that an employee works on a particular project, we can include an
attribute Hours for the WORKS_ON relationship type in Figure 7.13. Another example is to include
the date on which a manager started managing a department via an attribute Start_date for
the MANAGES relationship type in Figure 7.12.

Notice that attributes of 1:1 or 1:N relationship types can be migrated to one of the participating
entity types. For example, the Start_date attribute for the MANAGES relationship can be an attribute
of either EMPLOYEE or DEPARTMENT, although conceptually it belongs to MANAGES. This is
because MANAGES is a 1:1 relationship, so every department or employee entity participates in at
most one relationship instance. Hence, the value of the Start_date attribute can be determined
separately, either by the participating department entity or by the participating employee (manager)
entity.
For a 1:N relationship type, a relationship attribute can be migrated only to the entity type on the N-
side of the relationship. For example, in Figure 7.9, if the WORKS_FOR relationship also has an
attribute Start_date that indicates when an employee started working for a department, this attribute
can be included as an attribute of EMPLOYEE. This is because each employee works for only one
department, and hence participates in at most one relationship instance in WORKS_FOR. In both 1:1
and 1:N relationship types, the decision where to place a relationship attribute—as a relationship
type attribute or as an attribute of a participating entity type—is determined subjectively by the
schema designer.

For M:N relationship types, some attributes may be determined by the combination of participating
entities in a relationship instance, not by any single entity. Such attributes must be specified as
relationship attributes. An example is the Hours attribute of the M:N relationship WORKS_ON (Figure
7.13); the number of hours per week an employee currently works on a project is determined by an
employee-project combination and not separately by either entity.
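A minimal sketch of how such an M:N relationship with its own attribute is typically stored as a separate table (the names are assumptions; EMPLOYEE and PROJECT tables with keys Ssn and Pnumber are assumed to exist):

CREATE TABLE WORKS_ON (
    Essn  CHAR(9),
    Pno   INT,
    Hours DECIMAL(4,1),        -- relationship attribute: determined by the (employee, project) pair
    PRIMARY KEY (Essn, Pno),   -- the combination of both participants identifies a relationship instance
    FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn),
    FOREIGN KEY (Pno)  REFERENCES PROJECT(Pnumber)
);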

24. Explain the use of GROUP by and HAVING clause with syntax and examples.

The GROUP BY clause is a SQL command that is used to group rows that have the same values.
The GROUP BY clause is used in the SELECT statement. Optionally it is used in conjunction with
aggregate functions to produce summary reports from the database. It summarizes data from the
database. The queries that contain the GROUP BY clause are called grouped queries and only return a
single row for every grouped item.

SQL GROUP BY Syntax


SELECT statements... GROUP BY column_name1[,column_name2,...] [HAVING condition];

HERE

 “SELECT statements…” is the standard SQL SELECT command query.


 “GROUP BY column_name1” is the clause that performs the grouping based on column_name1.
 “[,column_name2,…]” is optional; represents other column names when the grouping is done on
more than one column.
 “[HAVING condition]” is optional; it is used to restrict the groups returned by the GROUP BY clause.
It is similar to the WHERE clause, but it operates on groups rather than individual rows (see the example below).
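A short illustrative grouped query (the Employees table and its columns are assumptions used only for this example):

SELECT Department, COUNT(*) AS Num_employees, AVG(Salary) AS Avg_salary
FROM Employees
GROUP BY Department;

Each distinct Department value produces exactly one row in the result, with the aggregates computed over all rows of that group.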

The HAVING clause was introduced in SQL to allow the filtering of query results based on aggregate
functions and groupings, which cannot be achieved using the WHERE clause, since WHERE filters only
individual rows. The HAVING clause controls which groups of rows are included in the result.

In simpler terms, the HAVING clause applies a filter to the result of GROUP BY based on a specified
condition. The condition is a Boolean expression and may combine comparisons with logical operators
(AND, OR). This clause was added to SQL because the WHERE keyword cannot be used with aggregate expressions.
HAVING is a commonly used clause in SQL. Like WHERE, it applies conditions, but HAVING works with
groups rather than individual rows; if you wish to filter groups, the HAVING clause comes into action.

 The HAVING clause is used to filter grouped data according to the conditions provided.
 It is generally used in reports that summarize large amounts of data.
 It is used only with the SELECT statement.
 Its condition typically involves aggregate functions, columns used in GROUP BY, and constants.
 In a query, ORDER BY is placed after the HAVING clause, if any.
 The HAVING condition is applied to aggregated column values, not to individual rows.
 HAVING is generally placed after GROUP BY.

Syntax:
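A sketch of the general form, followed by a sample query; the Employees table used in the example is an assumption:

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);

-- Example: departments whose average salary exceeds 50000
SELECT Department, AVG(Salary) AS Avg_salary
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 50000;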

How will you create and manage views?

SQL CREATE VIEW Statement

In SQL, a view is a virtual table based on the result-set of an SQL statement.


A view contains rows and columns, just like a real table. The fields in a view are fields from one or more
real tables in the database.

You can add SQL statements and functions to a view and present the data as if the data were coming
from one single table.

A view is created with the CREATE VIEW statement.

CREATE VIEW Syntax
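The usual form is sketched below (view_name, the column list, and the condition are placeholders):

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;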

SQL CREATE VIEW Example
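A sample definition, assuming a Customers table with CustomerName, ContactName, and Country columns (these names are assumptions consistent with the "Brazil Customers" view mentioned below; the square brackets are needed only because the view name contains a space, and some systems use double quotes instead):

CREATE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';

The view can then be queried like any table, for example SELECT * FROM [Brazil Customers];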

SQL Updating a View

A view can be updated with the CREATE OR REPLACE VIEW statement.

SQL CREATE OR REPLACE VIEW Syntax
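A sketch of the general form:

CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;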

The following SQL adds the "City" column to the "Brazil Customers" view:
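A possible version of that statement, assuming the Customers table also has a City column:

CREATE OR REPLACE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName, City
FROM Customers
WHERE Country = 'Brazil';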

SQL Dropping a View

A view is deleted with the DROP VIEW statement.


SQL DROP VIEW Syntax
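The general form:

DROP VIEW view_name;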

The following SQL drops the "Brazil Customers" view:
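A likely form of that statement, reusing the view name assumed above:

DROP VIEW [Brazil Customers];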

25. Types of single level ordered indexes:

Types of Single-Level Ordered Indexes


The idea behind an ordered index is similar to that behind the index used in a textbook, which lists
important terms at the end of the book in alphabetical order along with a list of page numbers where the
term appears in the book. We can search the book index for a certain term in the textbook to find a list
of addresses—page numbers in this case—and use these addresses to locate the specified pages first
and then search for the term on each specified page. The alternative, if no other guidance is given, would
be to sift slowly through the whole textbook word by word to find the term we are interested in; this
corresponds to doing a linear search, which scans the whole file. Of course, most books do have additional
information, such as chapter and section titles, which help us find a term without having to search through
the whole book. However, the index is the only exact indication of the pages where each term occurs in
the book.
For a file with a given record structure consisting of several fields (or attributes), an index access
structure is usually defined on a single field of a file, called an indexing field (or indexing
attribute). The index typically stores each value of the index field along with a list of pointers to all disk
blocks that contain records with that field value. The values in the index are ordered so that we can do
a binary search on the index. If both the data file and the index file are ordered, and since the index file is
typically much smaller than the data file, searching the index using a binary search is a better option.
Tree-structured multilevel indexes extend the binary search idea: instead of reducing the search space
by 2-way partitioning at each search step, they divide the search space in the file n-ways at each stage,
which makes the search more efficient.

There are several types of ordered indexes. A primary index is specified on the ordering key field of
an ordered file of records. An ordering key field is used to physically order the file records on disk, and
every record has a unique value for that field. If the ordering field is not a key field—that is, if numerous
records in the file can have the same value for the ordering field— another type of index, called
a clustering index, can be used. The data file is called a clustered file in this latter case. Notice that a
file can have at most one physical ordering field, so it can have at most one primary index or one
clustering index, but not both. A third type of index, called a secondary index, can be specified on
any nonordering field of a file. A data file can have several secondary indexes in addition to its primary
access method. We discuss these types of single-level indexes in the next three subsections.
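Although this discussion is at the file-organization level, most SQL systems expose these access paths through DDL; a minimal sketch, assuming a hypothetical EMPLOYEE_FILE table (all names here are illustrative only):

-- In many systems the primary key also defines the primary (clustered) access path
CREATE TABLE EMPLOYEE_FILE (
    Ssn    CHAR(9) PRIMARY KEY,
    Lname  VARCHAR(30),
    Salary DECIMAL(10,2)
);

-- A secondary index on a nonordering field adds an additional access path
CREATE INDEX idx_employee_lname ON EMPLOYEE_FILE (Lname);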

1. Primary Indexes
A primary index is an ordered file whose records are of fixed length with two fields, and it acts like an
access structure to efficiently search for and access the data records in a data file. The first field is of the
same data type as the ordering key field—called the primary key—of the data file, and the second field is
a pointer to a disk block (a block address). There is one index entry (or index record) in the index file for
each block in the data file. Each index entry has the value of the primary key field for the first record in a
block and a pointer to that block as its two field values. We will refer to the two field values of index
entry i as <K(i), P(i)>.

To create a primary index on the ordered file shown in Figure 17.7, we use the Name field as primary key,
because that is the ordering key field of the file (assuming that each value of Name is unique). Each entry
in the index has a Name value and a pointer. The first three index entries are as follows:

<K(1) = (Aaron, Ed), P(1) = address of block 1>

<K(2) = (Adams, John), P(2) = address of block 2>

<K(3) = (Alexander, Ed), P(3) = address of block 3>

The total number of entries in the index is the same as the number of disk blocks in the ordered data file.
The first record in each block of the data file is called the anchor record of the block, or simply the block
anchor.

Indexes can also be characterized as dense or sparse. A dense index has an index entry for every search
key value (and hence every record) in the data file. A sparse (or nondense) index, on the other hand,
has index entries for only some of the search values. A sparse index has fewer entries than the number
of records in the file. Thus, a primary index is a nondense (sparse) index, since it includes an entry for
each disk block of the data file and the keys of its anchor record rather than for every search value (or
every record).

The index file for a primary index occupies a much smaller space than does the data file, for two reasons.
First, there are fewer index entries than there are records in the data file. Second, each index entry is
typically smaller in size than a data record because it has only two fields; consequently, more index entries
than data records can fit in one block. Therefore, a binary search on the index file requires fewer block
accesses than a binary search on the data file.

A record whose primary key value is K lies in the block whose address is P(i), where K(i) ≤ K < K(i + 1).
The ith block in the data file contains all such records because of the physical ordering of the file records
on the primary key field. To retrieve a record, given the value K of its primary key field, we do a binary
search on the index file to find the appropriate index entry i, and then retrieve the data file block whose
address is P(i). Example 1 illustrates the saving in block accesses that is attainable when a primary index
is used to search for a record.

Example 1. Suppose that we have an ordered file with r = 30,000 records stored on a disk with block
size B = 1024 bytes. File records are of fixed size and are unspanned, with record length R = 100 bytes.
The blocking factor for the file would be bfr = (B/R) = (1024/100) = 10 records per block. The number of
blocks needed for the file is b = (r/bfr) = (30000/10) = 3000 blocks. A binary search on the data file would
need approximately log2 b = (log2 3000) = 12 block accesses.

Now suppose that the ordering key field of the file is V = 9 bytes long, a block pointer is P = 6 bytes long,
and we have constructed a primary index for the file. The size of each index entry is Ri = (9 + 6) = 15 bytes,
so the blocking factor for the index is bfri = (B/Ri) = (1024/15) = 68 entries per block. The total number of
index entries ri is equal to the number of blocks in the data file, which is 3000. The number of index blocks
is hence bi = (ri/bfri) = (3000/68) = 45 blocks. To perform a binary search on the index file would need
(log2 bi) = (log2 45) = 6 block accesses. To search for a record using the index, we need one additional block
access to the data file for a total of 6 + 1 = 7 block accesses—an improvement over binary search on the
data file, which required 12 disk block accesses.

A major problem with a primary index—as with any ordered file—is insertion and deletion of records. With
a primary index, the problem is compounded because if we attempt to insert a record in its correct position
in the data file, we must not only move records to make space for the new record but also change some
index entries, since moving records will change the anchor records of some blocks. Another possibility is
to use a linked list of overflow records for each block in the data file. Records within each block and its
overflow linked list can be sorted to improve retrieval time. Record deletion is handled using deletion
markers.

2. Clustering Indexes

If file records are physically ordered on a nonkey field—which does not have a distinct value for each
record—that field is called the clustering field and the data file is called a clustered file. We can create
a different type of index, called a clustering index, to speed up retrieval of all the records that have the
same value for the clustering field. This differs from a primary index, which requires that the ordering field
of the data file have a distinct value for each record.

A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering
field of the data file, and the second field is a disk block pointer. There is one entry in the clustering index
for each distinct value of the clustering field, and it contains the value and a pointer to the first block in the
data file that has a record with that value for its clustering field.

A clustering index is another example of a nondense index because it has an entry for every distinct
value of the indexing field, which is a nonkey by definition and hence has duplicate values rather than a
unique value for every record in the file. The main difference is that an index search uses the values of the
search field itself, whereas a hash directory search uses the binary hash value that is calculated by
applying the hash function to the search field.

3. Secondary Indexes

A secondary index provides a secondary means of accessing a data file for which some primary access
already exists. The data file records could be ordered, unordered, or hashed. The secondary index may
be created on a field that is a candidate key and has a unique value in every record, or on a nonkey field
with duplicate values. The index is again an ordered file with two fields. The first field is of the same data
type as some nonordering field of the data file that is an indexing field. The second field is either
a block pointer or a record pointer. Many secondary indexes (and hence, indexing fields) can be created
for the same file—each represents an additional means of accessing that file based on some specific field.

First we consider a secondary index access structure on a key (unique) field that has a distinct value for
every record. Such a field is sometimes called a secondary key; in the relational model, this would
correspond to any UNIQUE key attribute or to the primary key attribute of a table. In this case there is one
index entry for each record in the data file, which contains the value of the field for the record and a pointer
either to the block in which the record is stored or to the record itself. Hence, such an index is dense.
Again we refer to the two field values of index entry i as <K(i), P(i)>. The entries are ordered by value
of K(i), so we can perform a binary search. Because the records of the data file are not physically
ordered by values of the secondary key field, we cannot use block anchors. That is why an index entry is
created for each record in the data file, rather than for each block, as in the case of a primary index.
A secondary index usually needs more storage space and longer search time than does a primary index,
because of its larger number of entries. However, the improvement in search time for an arbitrary record
is much greater for a secondary index than for a primary index, since we would have to do a linear
search on the data file if the secondary index did not exist. For a primary index, we could still use a
binary search on the main file, even if the index did not exist. Example 2 illustrates the improvement in
number of blocks accessed.

Example 2. Consider the file of Example 1 with r = 30,000 fixed-length records of size R = 100 bytes
stored on a disk with block size B = 1024 bytes. The file has b = 3000 blocks, as calculated in Example
1. Suppose we want to search for a record with a specific value for the secondary key—a nonordering
key field of the file that is V = 9 bytes long. Without the secondary index, to do a linear search on the file
would require b/2 = 3000/2 = 1500 block accesses on the average. Suppose that we construct a
secondary index on that nonordering key field of the file. As in Example 1, a block pointer is P = 6 bytes
long, so each index entry is Ri = (9 + 6) = 15 bytes, and the blocking factor for the index is bfri = (B/Ri) =
(1024/15) = 68 entries per block. In a dense secondary index such as this, the total number of index
entries ri is equal to the number of records in the data file, which is 30,000. The number of blocks
needed for the index is hence bi = (ri/bfri) = (30000/68) = 442 blocks.

A binary search on this secondary index needs (log2 bi) = (log2 442) = 9 block accesses. To search for a
record using the index, we need an additional block access to the data file for a total of 9 + 1 = 10 block
accesses—a vast improvement over the 1500 block accesses needed on the average for a linear search,
but slightly worse than the 7 block accesses required for the primary index. This difference arose because
the primary index was nondense and hence shorter, with only 45 blocks in length.

We can also create a secondary index on a nonkey, nonordering field of a file. In this case, numerous
records in the data file can have the same value for the indexing field. There are several options for
implementing such an index:

Option 1 is to include duplicate index entries with the same K(i) value—one for each record. This would
be a dense index.

Option 2 is to have variable-length records for the index entries, with a repeating field for the pointer. We
keep a list of pointers <P(i, 1), ..., P(i, k)> in the index entry for K(i)—one pointer to each block that contains
a record whose indexing field value equals K(i). In either option 1 or option 2, the binary search algorithm
on the index must be modified appropriately to account for a variable number of index entries per index
key value.

Option 3, which is more commonly used, is to keep the index entries themselves at a fixed length and
have a single entry for each index field value, but to create an extra level of indirection to handle the
multiple pointers. In this nondense scheme, the pointer P(i) in index entry <K(i), P(i)> points to a disk block,
which contains a set of record pointers; each record pointer in that disk block points to one of the data file
records with value K(i) for the indexing field. If some value K(i) occurs in too many records, so that their
record pointers cannot fit in a single disk block, a cluster or linked list of blocks is used.

Notice that a secondary index provides a logical ordering on the records by the indexing field. If we
access the records in order of the entries in the secondary index, we get them in order of the indexing
field. The primary and clustering indexes assume that the field used for physical ordering of records in
the file is the same as the indexing field.
Extra Questions
