Database MGMT
1. Define Database: A database is a collection of logically related data stored in an efficient and
compact manner. A database is usually controlled by a database management system (DBMS).
2. What is meant by the E-R model? ER model stands for Entity-Relationship model. It is a high-level
data model used to define the data elements and relationships for a specified system.
It develops a conceptual design for the database. It also provides a simple, easy-to-understand view of
data. In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.
For example, suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street
name, pin code, etc., and there will be a relationship between them.
3. Define Schema: A database schema defines how data is organized within a relational database;
this includes logical constraints such as table names, fields, data types, and the relationships
between these entities. Schemas commonly use visual representations to communicate the
architecture of the database, becoming the foundation for an organization’s data management
discipline. This process of database schema design is also known as data modeling.
4. What is generalization? Generalization is a bottom-up approach in which two or more entities
of a lower level combine to form a higher-level entity if they have some attributes in common.
In generalization, an entity of a higher level can also combine with entities of the lower level to form a
further higher-level entity. Generalization is more like a subclass and superclass system, but the only
difference is the approach: generalization uses the bottom-up approach, in which entities are
combined to form a more generalized entity, i.e., subclasses are combined to make a superclass.
For example, the Faculty and Student entities can be generalized to create a higher-level entity, Person.
5. What is specialization?
Specialization is used to identify the subset of an entity set that shares some distinguishing characteristics.
Normally, the superclass is defined first, the subclass and its related attributes are defined next, and
relationship sets are then added.
For example: In an Employee management system, the EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on the role played in the company.
6. Types of functional dependency:
A full functional dependency is a functional dependency in which the dependent attributes are determined
by the entire set of determinant attributes, and not by any proper subset of it. For example, in the database
of employees, the employee ID number fully determines the employee’s name, address, and other personal
information.
A partial functional dependency is a functional dependency in which the dependent attributes are
determined by only part of a composite determinant. For example, in a table whose candidate key is
(employee ID, project ID), the employee’s name is determined by the employee ID alone, which is a partial
dependency.
A transitive functional dependency is a functional dependency in which the dependent attributes are
determined indirectly, through an intermediate attribute that is not part of the determinant. For example, in a
database of employees, the employee ID number may determine the employee’s department, which in turn
determines the employee’s salary.
In full functional dependency, the non-prime attribute is functionally dependent on the whole candidate key,
whereas in partial functional dependency, the non-prime attribute is functionally dependent on only part of a
candidate key.
Data integrity is the overall accuracy, completeness, and consistency of data. Data integrity also refers to
the safety of data in regard to regulatory compliance — such as GDPR compliance — and security. It is
maintained by a collection of processes, rules, and standards implemented during the design phase. When
the integrity of data is secure, the information stored in a database will remain complete, accurate, and
reliable, no matter how long it’s stored or how often it’s accessed.
The importance of data integrity in protecting yourself from data loss or a data leak cannot be overstated. In
order to keep your data safe from outside forces acting with malicious intent, you must first ensure that
internal users are handling data correctly. By implementing the appropriate data validation and error
checking, you can ensure that sensitive data is never miscategorized or stored incorrectly in a way that
exposes you to potential risk.
Data integrity in SQL databases refers to ensuring that each row of a table is uniquely identified so that
data can be retrieved separately. To achieve this, you need constraints on columns (constraints are sets of
rules). Data constraints prevent invalid data entry into the base tables of the database, which helps
maintain data integrity.
Referential integrity is a property of data that ensures the accuracy and consistency of data within a
relationship, where data is linked between two or more tables by foreign keys referencing primary keys. It
requires that whenever a foreign key value is used, it must reference a valid, existing primary key in the
parent table. This helps to prevent incorrect records from being added, deleted, or modified.
For example, if we delete a row in a primary table, we need to ensure that there’s no foreign key in any
related table with the value of the deleted row. We should only be able to delete a primary key if there are
no associated rows. Otherwise, we would end up with an orphaned record.
Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of
all data (relationship or otherwise). Maintaining data integrity is a crucial part of working with databases.
Data control language (DCL) is used to control access to the stored data. It is mainly used to grant and
revoke the access rights users require on a database. In the database, this language does not have the
feature of rollback.
It is a part of the structured query language (SQL). It helps in controlling access to information stored in a
database. It complements the data manipulation language (DML) and the data definition language (DDL).
It is the simplest among the three groups of commands. It allows administrators to set and remove
database permissions for desired users as needed.
These commands are employed to grant, remove, and deny permissions to users for retrieving and
manipulating a database.
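For instance, the two core DCL commands look like this (the user and table names here are illustrative):
GRANT SELECT, INSERT ON Employees TO clerk;
REVOKE INSERT ON Employees FROM clerk;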
A weak entity is an entity set that does not have sufficient attributes to form a primary key. It depends on
some other entity (known as the owner entity) to ensure its existence. Weak entities have a total participation
constraint (existence dependency) in their identifying relationship with the owner entity. Weak entity types have
partial keys. Partial keys are sets of attributes with the help of which the tuples of a weak entity can be
distinguished and identified.
A strong entity is not dependent on any other entity in the relation. A strong entity will always have a primary
key. Strong entities are represented by a single rectangle. The relationship between two strong entities is
represented by a single diamond.
A weak entity is dependent on a strong entity to ensure its existence. Unlike a strong entity, a weak entity
does not have any primary key. It instead has a partial discriminator key. A weak entity is represented by a
double rectangle. The relation between one strong and one weak entity is represented by a double
diamond.
S.NO Strong Entity vs. Weak Entity
1. A strong entity always has a primary key, while a weak entity has a partial discriminator key.
2. The relationship of two strong entities is represented by a single diamond, while the relation between one
strong and one weak entity is represented by a double diamond.
Entity Set
An entity set is a collection of similar types of entities that share the same attributes.
For example: All students of a school form an entity set of Student entities.
Creating a View
Views are created using the SQL CREATE VIEW statement. So, for example, to create a view called, say,
“NewCustomers”, you would start with:
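CREATE VIEW NewCustomers AS
SELECT customer_id, name, signup_date
FROM Customers
WHERE signup_date >= '2024-01-01';
-- (a sketch: the underlying Customers table and its columns are illustrative)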
Insert operation is used to add new data to a database. Update operation is used to modify existing data in
a database. Delete operation is used to delete data from a database. Select operation is used to retrieve
data from a database.
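In SQL, these four operations look like the following (using the STUDENT table shown a little later in these
notes):
INSERT INTO STUDENT (RegNo, SName, Gen, Phone) VALUES ('R5', 'Mala', 'F', '9876543210');
UPDATE STUDENT SET Phone = '9000000001' WHERE RegNo = 'R5';
DELETE FROM STUDENT WHERE RegNo = 'R5';
SELECT * FROM STUDENT;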
Data manipulation is performed using tools like SQL (Structured Query Language) and spreadsheet
software such as Excel. SQL is used for structured data manipulation, whereas NoSQL databases such as
MongoDB provide their own query languages for unstructured data manipulation.
Data manipulation is one of the initial processes done in data analysis. It involves arranging or rearranging
data points to make it easier for users/data analysts to derive the necessary insights or support business
directives. The steps involved in data manipulation are: mine the data and create a database, perform data
preprocessing, arrange the data, transform the data, and perform data analysis.
On the other hand, a foreign key is a column or group of columns in a relational database table that
provides a link between data in two tables. It is a column (or columns) that references a column (most often
the primary key) of another table. A foreign key is used to identify the relationship between tables: the
primary key of one table acts as a foreign key in another table.
In summary, a primary key is used to ensure that data in a specific column is unique, while a foreign key is
used to identify the relationship between tables: the primary key of one table acts as a foreign key in
another table.
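A minimal sketch of the two keys working together (table and column names are illustrative):
CREATE TABLE Departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(50)
);
CREATE TABLE Employees (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(50),
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES Departments(dept_id)
);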
To change the data type of a column in a table, use the following syntax:
ALTER TABLE table_name ALTER COLUMN column_name datatype;
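For example, assuming the STUDENT table shown below needs a wider Phone column (exact syntax
varies by DBMS; MySQL, for instance, uses MODIFY COLUMN instead):
ALTER TABLE STUDENT ALTER COLUMN Phone VARCHAR(15);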
For the EMPLOYEES table given above, the degree is 6. That is there are 6 attributes (columns/fields)
in this table.
STUDENT
RegNo SName Gen Phone
R1 Sundar M 9898786756
R3 Karthik M 8798987867
R4 John M 7898886756
R2 Ram M 9897786776
For the STUDENT table given above, the degree is 4. That is there are 4 attributes in the STUDENT
table.
In SQL, Inner Join and Outer Join are two types of join operations used to combine data from two or
more tables. The main difference between the two is that an Inner Join returns only the rows that
match in both tables, while an Outer Join returns all the rows from one table and matching rows from
the other table.
Table A
+----+-------+
| ID | Name |
+----+-------+
| 1 | John |
| 2 | Jane |
| 3 | Alice |
+----+-------+
Table B
+----+-------+
| ID | Color |
+----+-------+
| 1 | Red |
| 2 | Blue |
+----+-------+
If we perform an Inner Join on Table A and Table B using the ID column, we get the following result:
Inner Join
+----+------+-------+
| ID | Name | Color |
+----+------+-------+
| 1 | John | Red |
| 2 | Jane | Blue |
+----+------+-------+
On the other hand, if we perform a Left Outer Join on Table A and Table B using the ID column, we
get the following result:
Left Outer Join
+----+-------+-------+
| ID | Name | Color |
+----+-------+-------+
| 1 | John | Red |
| 2 | Jane | Blue |
| 3 | Alice | NULL |
+----+-------+-------+
All the rows from Table A are returned, along with the matching rows from Table B. If there is no
matching row in Table B, then NULL is returned for the columns of Table B.
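The queries producing these two results would look like the following (the tables are written TableA and
TableB so the identifiers are valid):
SELECT A.ID, A.Name, B.Color
FROM TableA A
INNER JOIN TableB B ON A.ID = B.ID;

SELECT A.ID, A.Name, B.Color
FROM TableA A
LEFT OUTER JOIN TableB B ON A.ID = B.ID;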
Locking in a database management system is used for handling transactions in databases. The two-phase
locking protocol ensures conflict-serializable schedules. A schedule is called conflict serializable if it can be
transformed into a serial schedule by swapping non-conflicting operations.
Two-Phase Locking
Shared Lock: Data can only be read when a shared lock is applied. Data cannot be written. It is denoted
as lock-S
Exclusive lock: Data can be read as well as written when an exclusive lock is applied. It is denoted
as lock-X
Growing Phase: Locks are obtained and no locks are released in this phase.
Shrinking Phase: Locks are released and no new locks are obtained in this phase. Only when all the data
changes are stored are the locks released.
1. The semantics of a procedural language are quite tough as compared to a non-procedural
language.
2. Functions in a non-procedural programming language can return any type of data (data type) and
value. On the other hand, procedural language functions cannot return all types of data and
values; only restricted data types and values are allowed.
3. For time-critical applications the non-procedural language performs effectively, while for
less time-critical applications the procedural language produces better results and is more efficient.
4. Among programs constructed in procedural and non-procedural languages, procedural
language programs are of larger size, while non-procedural language programs are smaller.
Procedural languages are command-driven and statement-oriented. The program code is written as
a sequence of instructions, where the user has to specify “what to do” and also “how to do” (step by
step procedure). These instructions are executed in the sequential order. Examples of procedural
languages include FORTRAN, COBOL, ALGOL, BASIC, C, and Pascal.
On the other hand, non-procedural languages are function-driven and applicative. The user has to
specify only “what to do” and not “how to do”. It involves the development of the functions from other
functions to construct more complex functions. Examples of non-procedural languages
include SQL, PROLOG, and LISP.
A relational model diagram is a graphical representation of the structure and constraints of a relational
database. A relational model diagram consists of one or more tables, each with a unique name and a set of
attributes. Each attribute has a name and a data type, and optionally a domain or a set of allowed values.
A relational model diagram also shows the relationships between the tables, which are based on the
concept of keys. A key is an attribute or a set of attributes that can uniquely identify a tuple (row) in a table.
A primary key is a key that is chosen to be the main identifier for a table. A foreign key is a key that
references a primary key in another table, and establishes a link between the two tables.
A relational model diagram can also include integrity constraints, which are rules that specify the valid
states of the database. Some common types of integrity constraints are:
Domain constraints: These specify the valid values for an attribute, such as a range, a list, or a pattern.
Entity integrity: This ensures that every tuple in a table has a unique and non-null primary key value.
Referential integrity: This ensures that every foreign key value in a table either matches a primary key
value in another table, or is null.
Other constraints: These are any additional rules that apply to the database, such as uniqueness, check,
or not null constraints.
What is a Database Management System?
A database management system (DBMS) is software designed to store, retrieve, and manage data. The
most prevalent DBMS in an enterprise database system is the RDBMS. The complete form of RDBMS is
Relational Database Management System. Now that it is clear what a database management system is,
let’s learn about the relational database management system.
According to E. F. Codd’s relational model, an RDBMS allows users to construct, update, manage, and
interact with a relational database, allowing storing data in a tabular form. Therefore, consider RDBMS as
an advanced data management system that makes gaining insights from data a lot easier. But why do we
need a relational database?
Today, various businesses use relational database architecture instead of flat files or hierarchical
databases for their company database management system (DBMS). So, what is the reason for creating a
relational database? A relational database is purpose-built to efficiently handle a wide range of data
formats and process queries. And how is data in a relational database management system organized?
The answer to this is simple: a relational database management system organizes data in tables that can
be linked internally depending on shared data. This allows a user to retrieve one or more tables with just
one query easily. On the other hand, flat-file stores data in a single table structure, which is less efficient
and consumes more space and memory.
Hence, we need a relational database. An example of a relational database management system could be
a production department in an organization that leverages this model to process purchases and track
inventory.
Most commercially available, company-wide database management systems in use today are relational
and use Structured Query Language (SQL) to access the database.
Widely used relational database management systems for companies include Oracle Database,
MySQL, PostgreSQL (an open-source relational database), and Microsoft SQL Server. RDBMS structures
are commonly used to perform four basic operations: CRUD (create, read, update and delete), which are
critical in supporting consistent data management.
Now that you know the definition of an RDBMS let’s look at how it differs from a DBMS and the
characteristics of a relational database system.
Here are some of the main differences between an RDBMS and a DBMS:
Number of operators:
A DBMS allows only a single operator at a time, whereas multiple users can operate an
RDBMS concurrently. An RDBMS uses intricate algorithms that enable several users to access the
database simultaneously while preserving data integrity, significantly reducing response time.
Resource requirements:
A DBMS utilizes fewer data storage and retrieval resources than an RDBMS. The latter is more
complex due to its multi-table structure and cross-referencing capability, making it costlier than a
DBMS. RDBMSs are also generally used for enterprise-class applications, while DBMSs are more
commonly utilized for smaller, purpose-specific applications.
Data modification:
Altering data in a DBMS is quite difficult, whereas you can easily modify data in an RDBMS using
an SQL query. Thus, programmers can change/access multiple data elements simultaneously. This
is one of the reasons why an RDBMS is more efficient than a DBMS.
Data volume:
A DBMS is more suitable for handling low data volume, whereas an RDBMS can handle even large
data volumes.
Keys and indexes:
A DBMS doesn’t involve keys and indexes, whereas an RDBMS specifies a relationship between
data elements via keys and indexes.
Data consistency:
As a DBMS does not follow the ACID (Atomicity, Consistency, Isolation, and Durability) model, the
data stored can have inconsistencies. In contrast, an RDBMS follows the ACID model, making it
structured and consistent.
Database structure:
A DBMS works by storing data in a hierarchical structure, while an RDBMS stores data in tables.
Data fetching speed:
In a DBMS, the process is relatively slow, especially when data is complex and extensive. This is
because each of the data elements must be fetched individually. In an RDBMS, data is fetched
faster because of the relational approach. Plus, SQL facilitates quicker data retrieval in an RDBMS.
Distributed databases:
A DBMS doesn’t support distributed databases, whereas an RDBMS offers full support for
distributed databases.
Client-server architecture:
A DBMS generally does not support client-server architecture, whereas an RDBMS does.
Data is stored in a relational database in the form of multiple tables. A key question here arises, how does
a database structure work, and how is it implemented? Let’s understand this in detail.
A database structure works by arranging every table into rows (known as records/ tuples) and columns
(known as fields/attributes). Tables, columns, and rows are the three major components of a relational
database.
A relational database management system offers a systematic view of data, which helps
businesses improve their decision-making processes by enhancing different areas.
The authorization and access control features in relational database software support advanced encryption
and decryption, enabling database administrators to manage access to the stored data. This offers
significant benefits in terms of security. In addition, operators can modify access to the database tables and
even limit the available data to others. This makes RDBMSs an ideal data storage solution for businesses
where the higher management needs to control data access for workers and clients.
It is easier to add new data or modify existing tables in an RDBMS while maintaining data consistency with
the existing format. This is mainly because an RDBMS is ACID-compliant.
Better Flexibility and Scalability
An RDBMS offers more flexibility when updating data as the modifications only have to be made once. For
instance, updating the details in the main table will automatically update the relevant files and save you the
trouble of changing several files one by one. Plus, each table can be altered independently without
disturbing the others. This makes relational databases scalable for growing data volumes.
Easy Maintenance
Relational databases are considered low-maintenance because users can quickly test, regulate, fix, and
back up data, as the automation tools in an RDBMS help systematize these tasks.
In relational database software, you can easily check for errors against the data in different records.
Furthermore, as each data item is stored at a single location, there’s no possibility of older versions blurring
the picture.
Conclusion
Over time, RDBMSs have evolved to provide increasingly advanced query optimization and sophisticated
plugins for enterprise developers. As a result, various enterprise applications of relational database
management systems exist. They also serve as a focal point in numerous applications, such as reporting,
analytics, and data warehousing.
In SQL, there are two types of privileges: system privileges and object privileges. System privileges
allow users to create, alter, or drop database objects, while object privileges allow users to execute,
select, insert, update, or delete data from database objects to which the privileges apply.
System privileges are the rights to perform certain actions on the database or on specific types of
database objects. For example, the system privilege to create a table allows a user to create a table in their
own schema. System privileges can also control the use of computing resources, such as the amount of
disk space or CPU time that a user can consume. There are about 60 different system privileges in
Oracle.
Object privileges are the rights to perform specific operations on a particular database object, such as a
table, view, sequence, or procedure. For example, the object privilege to insert rows into a table allows a
user to insert data into that table. Object privileges can be granted by the owner of the object or by another
user who has been granted the privilege with the WITH GRANT OPTION clause.
Some examples of system privileges and object privileges are:
CREATE TABLE is a system privilege that allows a user to create a table in their own schema.
SELECT is an object privilege that allows a user to query data from a table or view.
DROP ANY TABLE is a system privilege that allows a user to drop any table in the database, regardless
of the owner.
UPDATE is an object privilege that allows a user to modify data in a table or view.
CREATE ANY INDEX is a system privilege that allows a user to create an index on any table in the
database, regardless of the owner.
EXECUTE is an object privilege that allows a user to run a stored procedure or function.
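For instance, granting one privilege of each kind might look like this (Oracle-style syntax; the user and
table names are illustrative):
GRANT CREATE TABLE TO alice;                -- system privilege
GRANT SELECT, UPDATE ON employees TO bob;   -- object privileges
GRANT SELECT ON employees TO carol WITH GRANT OPTION;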
Tuple Relational Calculus (TRC) is a non-procedural query language used in relational database
management systems (RDBMS) to retrieve data from tables. TRC is based on the concept of tuples, which
are ordered sets of attribute values that represent a single row or record in a database table. TRC is a
declarative language, meaning that it specifies what data is required from the database, rather than how to
retrieve it. TRC queries are expressed as logical formulas that describe the desired tuples.
A TRC query has the general form { t | P(t) }, where t is a tuple variable and P(t) is a logical formula that
describes the conditions that the tuples in the result must satisfy. The curly braces {} are used to indicate
that the expression is a set of tuples.
For example, let’s say we have a table called “Employees” with the following attributes: Employee ID,
Name, Salary, Department ID. To retrieve the names of all employees who earn more than $50,000 per
year, we can use the following TRC query:
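{ t.Name | Employees(t) ∧ t.Salary > 50000 }
This reads as: the set of Name values of all tuples t such that t belongs to Employees and t’s Salary is
greater than 50,000.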
TRC can also be used to perform more complex queries, such as joins and nested queries, by using
additional logical operators and expressions. While TRC is a powerful query language, it can be more
difficult to write and understand than other SQL-based query languages, such as Structured Query
Language (SQL). However, it is useful in certain applications, such as in the formal verification of database
schemas and in academic research.
Data retrieval is the process of obtaining data from a database management system (DBMS). In
databases, data retrieval is the process of identifying and extracting data from a database, based on a
query provided by the user or application. It enables the fetching of data from a database in order to
display it on a monitor and/or use within an application.
To retrieve the desired data, the user presents a set of criteria using a query. A query language, like
Structured Query Language (SQL), is used to prepare the queries.
SQL Server is a Microsoft product which provides the facility to insert, update, delete, and retrieve
data from database tables, so if you need to see the records of a table, or a specific row or column,
you can do that as well. Suppose you have created a database and some tables to store data in a
separate form and want to view the data to check whether it is correct or missing; you can do so
with the help of the SELECT command. Structured Query Language offers database users a powerful
and flexible data retrieval mechanism: the SELECT statement.
The SQL SELECT statement retrieves data from one or more tables in a database, allowing you to easily
access the specific information you need. This statement is commonly used and is essential for efficient
data retrieval. You can streamline your database queries using the SELECT statement and get the precise
data you require.
There are many clauses which help to retrieve data according to different conditions. Some of
them are WHERE, ORDER BY, DISTINCT, GROUP BY, etc.
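A few illustrative queries against the STUDENT table shown earlier:
SELECT * FROM STUDENT WHERE Gen = 'M';
SELECT DISTINCT Gen FROM STUDENT;
SELECT * FROM STUDENT ORDER BY SName;
SELECT Gen, COUNT(*) FROM STUDENT GROUP BY Gen;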
26. Draw and explain the 3-level architecture of the database system:
The architecture of a database system is greatly influenced by the underlying computer system on which
the database system runs. Database systems can be centralized, or client-server, where one server
machine executes work on behalf of multiple client machines. Database systems can also be designed to
exploit parallel computer architectures. Distributed databases span multiple geographically separated
machines.
A database system is partitioned into modules that deal with each of the responsibilities of the overall
system. The functional components of a database system can be broadly divided into the storage manager
and the query processor components. The storage manager is important because databases typically
require a large amount of storage space. The query processor is important because it helps the database
system simplify and facilitate access to data. It is the job of the database system to translate updates and
queries written in a nonprocedural language, at the logical level, into an efficient sequence of operations at
the physical level.
Database applications are usually partitioned into two or three parts. In a two-tier architecture, the
application resides at the client machine, where it invokes database system functionality at the server
machine through query language statements. Application program interface standards like ODBC and
JDBC are used for interaction between the client and the server. In contrast, in a three-tier architecture, the
client machine acts as merely a front end and does not contain any direct database calls. Instead, the client
end communicates with an application server, usually through a forms interface.
The application server in turn communicates with a database system to access data. The business logic of
the application, which says what actions to carry out under what conditions, is embedded in the application
server, instead of being distributed across multiple clients. Three-tier applications are more appropriate for
large applications, and for applications that run on the World Wide Web.
There are several types of DBMS Architecture that we use according to the usage requirements. Types of
DBMS Architecture are discussed here.
1-Tier Architecture
2-Tier Architecture
3-Tier Architecture
1-Tier Architecture
In 1-Tier Architecture the database is directly available to the user, the user can directly sit on the DBMS
and use it that is, the client, server, and Database are all present on the same machine. For Example: to
learn SQL we set up an SQL server and the database on the local system. This enables us to directly
interact with the relational database and execute operations. The industry won’t use this architecture they
logically go for 2-tier and 3-tier Architecture.
In a database, data is organized strictly in row and column format. The rows are called Tuple or Record.
The data items within one row may belong to different data types. On the other hand, the columns are often
called Domain or Attribute. All the data items within a single attribute are of the same data type.
Domain of Attributes The set of possible values that an attribute can take is called the domain of the
attribute. For example, the attribute day may take any value from the set {Monday, Tuesday ... Friday}.
Hence this set can be termed as the domain of the attribute day.
Key attribute The attribute (or combination of attributes) which is unique for every entity instance is called a
key attribute, e.g., the employee_id of an employee, the pan_card_number of a person, etc. If the key
attribute consists of two or more attributes in combination, it is called a composite key.
Simple attribute If an attribute cannot be divided into simpler components, it is a simple attribute. Example
for simple attribute : employee_id of an employee.
Composite attribute If an attribute can be split into components, it is called a composite attribute. Example
for composite attribute : Name of the employee which can be split into First_name, Middle_name, and
Last_name.
Single valued Attributes If an attribute can take only a single value for each entity instance, it is a single
valued attribute. Example for single valued attribute: age of a student. It can take only one value for a
particular student.
Multi-valued Attributes If an attribute can take more than one value for each entity instance, it is a multi-
valued attribute. Example for multi-valued attribute: telephone number of an employee; a particular
employee may have multiple telephone numbers.
Derived Attribute An attribute which can be calculated or derived based on other attributes is a derived
attribute. Example for derived attribute : age of employee which can be calculated from date of birth and
current date.
Normalization is a database design technique that reduces data redundancy and eliminates
undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules divide
larger tables into smaller tables and link them using relationships. The purpose of Normalisation in
SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.
The inventor of the relational model Edgar Codd proposed the theory of normalization of data with the
introduction of the First Normal Form, and he continued to extend theory with Second and Third Normal
Form. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
2NF (Second Normal Form) Rules
Rule 1- Be in 1NF
Rule 2- Single column primary key that does not functionally depend on any subset of the
candidate key relation
It is clear that we can’t move forward to make our simple database in 2nd Normal Form unless we
partition the table above.
We have divided our 1NF table into two tables viz. Table 1 and Table2. Table 1 contains member
information. Table 2 contains information on movies rented.
We have introduced a new column called Membership_id which is the primary key for table 1. Records can
be uniquely identified in Table 1 using membership id
Foreign Key references the primary key of another Table! It helps connect your Tables
A foreign key can have a different name from its primary key
It ensures rows in one table have corresponding rows in another
Unlike the Primary key, they do not have to be unique. Most often they aren’t
Foreign keys can be null even though primary keys can not
3NF (Third Normal Form) Rules
Rule 1- Be in 2NF
Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we again need to divide our table.
3NF Example
Below is a 3NF example in SQL database:
We have again divided our tables and created a new table which stores Salutations.
There are no transitive functional dependencies, and hence our table is in 3NF
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key in Table 3
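A rough SQL sketch of these 3NF tables (the original example tables are not reproduced in these notes, so
the column names are assumptions):
CREATE TABLE Salutations (
salutation_id INT PRIMARY KEY,
salutation VARCHAR(10)   -- e.g. Mr., Ms., Dr.
);
CREATE TABLE Members (
membership_id INT PRIMARY KEY,
full_name VARCHAR(60),
salutation_id INT,
FOREIGN KEY (salutation_id) REFERENCES Salutations(salutation_id)
);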
Now our little example is at a level that cannot be further decomposed to attain higher forms of
normalization. In fact, it is already in higher normalization forms. Separate efforts for moving into the
next levels of normalizing data are normally needed in complex databases. However, we will discuss the
next levels of normalisation in DBMS in brief in the following.
The DDL Commands in Structured Query Language are used to create and modify the schema of the
database and its objects. The syntax of DDL commands is predefined for describing the data. The commands
of Data Definition Language deal with how the data should exist in the database.
1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command
CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other database objects.
Example 1: This example describes how to create a new database using the CREATE DDL command.
Suppose, you want to create a Books database in the SQL database. To do this, you have to write the
following DDL Command:
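CREATE DATABASE Books;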
Example 2: This example describes how to create a new table using the CREATE DDL command.
Suppose, you want to create a Student table with five columns in the SQL database. To do this, you have
to write the following DDL command:
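CREATE TABLE Student (
Student_ID INT PRIMARY KEY,
First_Name VARCHAR(30),
Last_Name VARCHAR(30),
Age INT,
Marks INT
);
-- (the five column names and types here are illustrative)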
DROP Command
DROP is a DDL command used to delete/remove the database objects from the SQL database. We can
easily remove the entire table, view, or index from the database using this DDL command.
Example 1: This example describes how to remove a database from the SQL database.
Syntax to remove a database:
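DROP DATABASE database_name;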
Suppose, you want to delete the Books database from the SQL database. To do this, you have to write the
following DDL command:
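DROP DATABASE Books;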
Example 2: This example describes how to remove the existing table from the SQL database.
Suppose, you want to delete the Student table from the SQL database. To do this, you have to write the
following DDL command:
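DROP TABLE Student;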
ALTER Command
ALTER is a DDL command which changes or modifies the existing structure of the database, and it also
changes the schema of database objects.
We can also add and drop constraints of the table using the ALTER command.
Example 1: This example shows how to add a new field to the existing table.
Suppose, you want to add the 'Father's_Name' column in the existing Student table. To do this, you have
to write the following DDL command:
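ALTER TABLE Student ADD Fathers_Name VARCHAR(60);
-- (written Fathers_Name rather than Father's_Name to keep the identifier valid; the type is illustrative)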
TRUNCATE Command
TRUNCATE is another DDL command which deletes or removes all the records from the table.
This command also removes the space allocated for storing the table records.
Suppose, you want to delete the record of the Student table. To do this, you have to write the following
TRUNCATE DDL command:
TRUNCATE TABLE Student;
The above query successfully removed all the records from the student table.
RENAME Command
RENAME is a DDL command which is used to change the name of the database table.
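RENAME TABLE Student TO Student_Details;
-- (MySQL syntax; SQL Server uses the sp_rename procedure instead)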
This query changes the name of the table from Student to Student_Details.
The concurrency control concept comes under transactions in a database management system
(DBMS). It is a procedure in DBMS which helps manage two simultaneous processes so that
they execute without conflicts between each other; these conflicts occur in multi-user
systems. Concurrency can simply be said to be executing multiple transactions at a time. It is
required to increase time efficiency. If many transactions try to access the same data, then
inconsistency arises, so concurrency control is required to maintain data consistency. For example,
without concurrency, multiple persons could not draw money at a time from ATM machines in different
places; and without concurrency control, such simultaneous access could leave the data inconsistent.
This is where we need concurrency control.
The advantages of concurrency control are as follows:
− Waiting time will be decreased.
− Response time will decrease.
− Resource utilization will increase.
− System performance and efficiency are increased.
4. Validation Concurrency Control: The optimistic approach is based on the assumption that the
majority of the database operations do not conflict. The optimistic approach requires neither locking nor
time stamping techniques. Instead, a transaction is executed without restrictions until it is committed.
Using an optimistic approach, each transaction moves through 2 or 3 phases, referred to as read,
validation and write.
(i) During read phase, the transaction reads the database, executes the needed computations and
makes the updates to a private copy of the database values. All update operations of the
transactions are recorded in a temporary update file, which is not accessed by the remaining
transactions.
(ii) During the validation phase, the transaction is validated to ensure that the changes made will not
affect the integrity and consistency of the database. If the validation test is positive, the transaction
goes to the write phase. If the validation test is negative, the transaction is restarted and the changes
are discarded.
(iii) During the write phase, the changes are permanently applied to the database.
Real-Life Example
Scenario: A world-famous band, “The Algorithmics,” is about to release tickets for their farewell concert.
Given their massive fan base, the ticketing system is expected to face a surge in access requests.
EventBriteMax must ensure that ticket sales are processed smoothly without double bookings or system
failures.
Two-Phase Locking Protocol (2PL):
Usage: Mainly for premium ticket pre-sales to fan club members. These sales occur a day before
the general ticket release.
Real-Life Example: When a fan club member logs in to buy a ticket, the system uses 2PL. It locks
the specific seat they choose during the transaction. Once the transaction completes, the lock is
released. This ensures that no two fan club members can book the same seat at the same time.
Timestamp Ordering Protocol:
Usage: For general ticket sales.
Real-Life Example: As thousands rush to book their tickets, each transaction gets a timestamp. If
two fans try to book the same seat simultaneously, the one with the earlier timestamp gets priority.
The other fan receives a message suggesting alternative seats.
Multi-Version Concurrency Control (MVCC):
Usage: Implemented in the mobile app version of the ticketing platform.
Real-Life Example: Fans using the mobile app see multiple versions of the seating chart. When a
fan selects a seat, they’re essentially choosing from a specific version of the seating database. If
their choice conflicts with a completed transaction, the system offers them the next best seat based
on the latest version of the database. This ensures smooth mobile user experience without
frequent transactional conflicts.
Validation Concurrency Control:
Usage: For group bookings where multiple seats are booked in a single transaction.
Real-Life Example: A group of friends tries to book 10 seats together. They choose their seats
and proceed to payment. Before finalizing, the system validates that all 10 seats are still available
(i.e., no seat was booked by another user in the meantime). If there’s a conflict, the group is
prompted to choose a different set of seats. If not, their booking is confirmed.
The concert ticket sales go off without a hitch. Fans rave about the smooth experience, even with such high
demand. Behind the scenes, EventBriteMax’s effective implementation of the four concurrency control
protocols played a crucial role in ensuring that every fan had a fair chance to purchase their ticket and no
seats were double-booked. The Algorithmics go on to have a fantastic farewell concert, with not a single
problem in the ticketing process.
Conclusion
Thus, Concurrency control techniques in Database Management Systems (DBMS) are pivotal for
maintaining data integrity, consistency, and reliability in multi-user database environments. These methods
prevent multiple transactions from interfering with one another, preventing possible data inconsistencies
and clashes.
A transaction is a single logical unit of work that accesses and possibly modifies the contents of a
database. Transactions access data using read and write operations.
In order to maintain consistency in a database, before and after the transaction, certain properties
are followed. These are called ACID properties.
Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all. There is
no midway i.e. transactions do not occur partially. Each transaction is considered as one unit and either
runs to completion or is not executed at all. It involves the following two operations.
—Abort: If a transaction aborts, changes made to the database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to
account Y.
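The two steps can be sketched as follows (the original operation table is reconstructed here from the
surrounding description):
T1: read(X); X := X - 100; write(X)
T2: read(Y); Y := Y + 100; write(Y)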
If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but
before write(Y)), then the amount has been deducted from X but not added to Y. This results in an
inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure
the correctness of the database state.
Consistency:
This means that integrity constraints must be maintained so that the database is consistent before and
after the transaction. It refers to the correctness of a database. Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a
result, T is incomplete.
Isolation:
This property ensures that multiple transactions can occur concurrently without leading to the
inconsistency of the database state. Transactions occur independently without interference. Changes
occurring in a particular transaction will not be visible to any other transaction until that particular change
in that transaction is written to memory or has been committed. This property ensures that the execution
of transactions concurrently will result in a state that is equivalent to a state achieved had these been
executed serially in some order.
Let X = 50,000 and Y = 500.
Consider two transactions T and T”.
Suppose T has been executed till Read(Y) and then T’’ starts. As a result, interleaving of operations
takes place, due to which T’’ reads the correct value of X but the incorrect value of Y, and the sum
computed by
T’’: (X+Y = 50,000 + 500 = 50,500)
is thus not consistent with the sum at the end of the transaction:
T: (X+Y = 50,000 + 450 = 50,450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take place in
isolation and changes should be visible only after they have been made to the main memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and modifications
to the database are stored in and written to disk and they persist even if a system failure occurs. These
updates now become permanent and are stored in non-volatile memory. The effects of the transaction,
thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a
database in a way such that each transaction is a group of operations that acts as a single unit, produces
consistent results, acts in isolation from other operations, and updates that it makes are durably stored.
ACID properties are the four key characteristics that define the reliability and consistency of a transaction
in a Database Management System (DBMS). The acronym ACID stands for Atomicity, Consistency,
Isolation, and Durability. Here is a brief description of each of these properties:
1. Atomicity: Atomicity ensures that a transaction is treated as a single, indivisible unit of work. Either
all the operations within the transaction are completed successfully, or none of them are. If any part
of the transaction fails, the entire transaction is rolled back to its original state, ensuring data
consistency and integrity.
2. Consistency: Consistency ensures that a transaction takes the database from one consistent state to
another consistent state. The database is in a consistent state both before and after the transaction
is executed. Constraints, such as unique keys and foreign keys, must be maintained to ensure data
consistency.
3. Isolation: Isolation ensures that multiple transactions can execute concurrently without interfering
with each other. Each transaction must be isolated from other transactions until it is completed. This
isolation prevents dirty reads, non-repeatable reads, and phantom reads.
4. Durability: Durability ensures that once a transaction is committed, its changes are permanent and
will survive any subsequent system failures. The transaction’s changes are saved to the database
permanently, and even if the system crashes, the changes remain intact and can be recovered.
Overall, ACID properties provide a framework for ensuring data consistency, integrity, and reliability in
DBMS. They ensure that transactions are executed in a reliable and consistent manner, even in the
presence of system failures, network issues, or other problems. These properties make DBMS a reliable
and efficient tool for managing data in modern organizations.
Advantages of ACID Properties in DBMS:
1. Data Consistency: ACID properties ensure that the data remains consistent and accurate after any
transaction execution.
2. Data Integrity: ACID properties maintain the integrity of the data by ensuring that any changes to the
database are permanent and cannot be lost.
3. Concurrency Control: ACID properties help to manage multiple transactions occurring concurrently
by preventing interference between them.
4. Recovery: ACID properties ensure that in case of any failure or crash, the system can recover the
data up to the point of failure or crash.
Disadvantages of ACID Properties in DBMS:
1. Performance: The ACID properties can cause a performance overhead in the system, as they require
additional processing to ensure data consistency and integrity.
2. Scalability: The ACID properties may cause scalability issues in large distributed systems where
multiple transactions occur concurrently.
3. Complexity: Implementing the ACID properties can increase the complexity of the system and
require significant expertise and resources.
Overall, the advantages of ACID properties in DBMS outweigh the disadvantages. They provide a
reliable and consistent approach to data
management, ensuring data integrity, accuracy, and reliability. However, in some cases, the
overhead of implementing ACID properties can cause performance and scalability issues. Therefore,
it’s important to balance the benefits of ACID properties against the specific needs and requirements
of the system.
32. Give SQL statements which create a STUDENT table consisting of the following fields: Name
CHAR(40) Class CHAR(6) Marks NUMBER(4) Rank CHAR(8)
CREATE TABLE STUDENT (
Name CHAR(40),
Class CHAR(6),
Marks NUMBER(4),
Rank CHAR(8)
);
33. Define functional dependency. Give the inference rules of functional dependencies.
A functional dependency is a relationship between two sets of attributes in a database, where one
set (the determinant) determines the values of the other set (the dependent).
For example, in a database of employees, the employee ID number (determinant) would determine the
employee’s name, address, and other personal information (dependent). This means that, given an
employee ID number, we can determine the corresponding employee’s name and other personal
information, but not vice versa.
Functional dependencies can also be represented using mathematical notation. For example, the functional
dependency above can be represented as:
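employee_ID → {employee_name, employee_address}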
It’s important to note that a functional dependency is a constraint that must hold across all tuples in the
table: whenever two tuples agree on the determinant attributes, they must also agree on the dependent
attributes.
For example, if a database contains a table with the attributes “employee ID” and “employee name”, and
another table with the attributes “employee ID” and “employee address”, then there is a functional
dependency between “employee ID” and “employee name” in the first table, and between “employee ID”
and “employee address” in the second table. If the same employee data were stored redundantly in both
tables and changed in one table but not the other, the data would become inconsistent.
By combining these two tables into one, with the attributes “employee ID”, “employee name”, and
“employee address”, both the data redundancy and the potential data inconsistencies are eliminated.
There are several types of functional dependencies, including full functional dependencies, partial
functional dependencies, and transitive functional dependencies.
A full functional dependency is a functional dependency in which the dependent attributes are determined
by the entire set of determinant attributes, and not by any proper subset of it. For example, in the database
of employees, the employee ID number fully determines the employee’s name, address, and other personal
information.
A partial functional dependency is a functional dependency in which the dependent attributes are
determined by only part of a composite determinant. For example, in a table whose candidate key is
(employee ID, project ID), the employee’s name is determined by the employee ID alone, which is a partial
dependency.
A transitive functional dependency is a functional dependency in which the dependent attributes are
determined indirectly, through an intermediate attribute that is not part of the determinant. For example, in a
database of employees, the employee ID number may determine the employee’s department, which in turn
determines the employee’s salary.
One of the most common ways to represent functional dependencies is using Armstrong’s Axioms.
These are a set of rules that can be used to infer functional dependencies from a given set of functional
dependencies. These rules include reflexivity, augmentation, and transitivity.
Another way to represent functional dependencies is using the Normal Forms. Normal Forms are a set of
rules that are used to determine the degree of normalization of a database. There are several Normal
Forms, including First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and so
on.
First Normal Form (1NF) requires that each table have a primary key, and that all data in the table is atomic
(indivisible).
Second Normal Form (2NF) requires that the table is in 1NF, and that all non-primary key attributes are
functionally dependent on the primary key.
Third Normal Form (3NF) requires that the table is in 2NF, and that all non-primary key attributes are not
functionally dependent on any non-primary key attributes.
It’s important to note that functional dependencies are not always easy to identify, and may require a
thorough understanding of the data and the relationships between the data. Additionally, it’s not always
possible to achieve higher Normal Forms, and trade-offs may need to be made between normalization and
performance.
Conclusion
Functional dependencies are a crucial aspect of database design and are used to ensure that the database
is in a state of normalization. They help to minimize data redundancy and improve data integrity. However,
it’s important to note that functional dependencies are not the only factor to consider when designing a
database. Other factors such as performance and scalability should also be taken into account.
1. Reflexive Rule (IR1)
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Here X ⊇ Y, so X → Y holds.
2. Augmentation Rule (IR2)
The augmentation rule says that if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example: for a relation R(A, B, C, D), if A → B then AC → BC.
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
The union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
The decomposition rule is also known as the project rule. It is the reverse of the union rule.
This rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
6. Pseudo Transitive Rule (IR6)
If X → Y and WY → Z then WX → Z
Proof:
1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)
Integrity constraints are the protocols that a table's data columns must follow. These are used to restrict
the types of information that can be entered into a table, which ensures that the data in the database is
accurate and reliable. You may apply integrity constraints at the column or table level. Table-level
integrity constraints apply to the entire table, while column-level constraints are only applied to one
column.
Domain Integrity Constraints define the permissible values for a given column. By applying these
constraints, you can restrict the data entered into a specific column, ensuring consistent data values across
your database.
Data type – The column must contain values of a specific data type
Data format – The format of the values in a column must follow a defined pattern
Range – The values must fall within a specified range
Enumeration – The values in the column can only be taken from a predefined set of values
For example, if you have a table containing information about employees' salaries, you might enforce a
domain integrity constraint on the "salary" column to ensure that only numeric values within a specific
range are entered.
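A minimal sketch of such a constraint (the table, column, and range are assumptions):
CREATE TABLE Employee_Salaries (
employee_id INT PRIMARY KEY,
salary DECIMAL(10, 2) CHECK (salary BETWEEN 10000 AND 500000)
);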
Entity Integrity Constraints involve uniquely identifying the rows in a database table, such that there are no
duplicate or null values in a primary key column. A primary key is a unique column in a table that uniquely
identifies every row in the table. This constraint helps maintain the uniqueness and integrity of data by
preventing the existence of duplicate rows.
For instance, in a table storing customer information, a unique identification number (‘customer_id’) can
be assigned as the primary key to uniquely identify every customer.
Referential Integrity Constraint ensures that relationships between tables are maintained consistently. It is
enforced by using foreign keys, which are columns in a table that refer to a primary key in another table.
The foreign key helps to maintain the referential integrity between two related tables by making sure that
changes in one table's primary key are reflected in the corresponding foreign key in another table.
There are two main rules to uphold when it comes to Referential Integrity Constraints:
If a primary key value is updated or deleted, the corresponding foreign key values in the related
table must be updated or deleted as well.
Any new foreign key value added to the related table must have a corresponding primary key value
in the other table.
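Both rules can be enforced declaratively with a FOREIGN KEY clause. A minimal sketch building on the customers table above (names and the CASCADE choice are illustrative; RESTRICT or SET NULL are common alternatives):
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
        ON UPDATE CASCADE   -- rule 1: propagate primary key updates to the foreign key
        ON DELETE CASCADE   -- rule 1: remove dependent rows when the parent row is deleted
);
-- Rule 2 is enforced automatically: an INSERT into orders with a customer_id
-- that has no matching row in customers is rejected.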
35. Describe the basic operators of relational algebra with an example for each
Relational Algebra is a procedural query language that takes relations as input and returns relations as output. There are some basic operators which can be applied to relations to produce the required results, which we will discuss one by one. We will use the STUDENT_SPORTS, EMPLOYEE, and STUDENT relations given in Table 1, Table 2, and Table 3 respectively to understand the various operators.
Table 1: STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 2: EMPLOYEE
Table 3: STUDENT
Selection operator (σ): The selection operator is used to select tuples from a relation based on some condition. Syntax:
σ (Cond)(Relation Name)
Extract students whose age is greater than 18 from STUDENT relation given in Table 3
σ (AGE>18)(STUDENT)
RESULT:
Projection Operator (∏): Projection operator is used to project particular columns from a
relation. Syntax:
∏(Column 1,Column 2….Column n)(Relation Name)
Extract ROLL_NO and NAME from STUDENT relation given in Table 3
∏(ROLL_NO,NAME)(STUDENT)
RESULT:
ROLL_NO NAME
1 RAM
2 RAMESH
3 SUJIT
4 SURESH
Note: If the resultant relation after projection has duplicate rows, they will be removed. For example, ∏(ADDRESS)(STUDENT) will remove one duplicate row with the value DELHI and return three rows.
Cross Product (X): Cross product is used to join two relations. For every row of Relation1, each row of Relation2 is concatenated. If Relation1 has m tuples and Relation2 has n tuples, the cross product of Relation1 and Relation2 will have m x n tuples. Syntax:
Relation1 X Relation2
To apply Cross Product on the STUDENT relation given in Table 3 and the STUDENT_SPORTS relation given in Table 1:
STUDENT X STUDENT_SPORTS
RESULT:
Union (U): Union on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relations should have the same number of attributes and corresponding
attributes in two relations have the same domain). Union operator when applied on two relations R1 and
R2 will give a relation with tuples that are either in R1 or in R2. The tuples which are in both R1 and R2
will appear only once in the result relation. Syntax:
Relation1 U Relation2
To find the persons who are either students or employees, we can use the Union operator like:
STUDENT U EMPLOYEE
RESULT:
Minus (-): Minus on two relations R1 and R2 can only be computed if R1 and R2 are union compatible.
Minus operator when applied on two relations as R1-R2 will give a relation with tuples that are in R1 but
not in R2. Syntax:
Relation1 - Relation2
To find the persons who are students but not employees, we can use the Minus operator like:
STUDENT - EMPLOYEE
RESULT:
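For readers who know SQL, each operator above has a rough SQL counterpart. A sketch, assuming the STUDENT, EMPLOYEE, and STUDENT_SPORTS relations exist as tables with the columns used above (note that Minus is spelled EXCEPT in standard SQL):
-- Selection: σ (AGE>18)(STUDENT)
SELECT * FROM STUDENT WHERE AGE > 18;
-- Projection: ∏ (ROLL_NO, NAME)(STUDENT); DISTINCT mirrors duplicate removal
SELECT DISTINCT ROLL_NO, NAME FROM STUDENT;
-- Cross product: STUDENT X STUDENT_SPORTS
SELECT * FROM STUDENT CROSS JOIN STUDENT_SPORTS;
-- Union: STUDENT U EMPLOYEE
SELECT * FROM STUDENT UNION SELECT * FROM EMPLOYEE;
-- Minus: STUDENT - EMPLOYEE
SELECT * FROM STUDENT EXCEPT SELECT * FROM EMPLOYEE;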
SQL aggregate functions are functions that operate on a set of values and return a single value. They are often used with the GROUP BY clause to divide the result set into groups of values and calculate a summary statistic for each group.
Some of the commonly used SQL aggregate functions are:
COUNT(): This function returns the number of rows in a table or a group. It can also count the number of distinct or non-null values in a column.
SUM(): This function returns the sum of all or distinct values in a column or a group.
AVG(): This function returns the average of all or distinct values in a column or a group. It ignores null values.
MIN(): This function returns the minimum value in a column or a group. It ignores null values.
MAX(): This function returns the maximum value in a column or a group. It ignores null values.
The following illustrates how the aggregate function is used with the GROUP BY clause:
AVG
The AVG() function returns the average of the values in a set. The following illustrates the syntax of the AVG() function:
AVG([ALL | DISTINCT] expression)
The ALL keyword instructs the AVG() function to calculate the average of all values, while the DISTINCT keyword forces the function to operate on distinct values only. By default, the ALL option is used.
The following example shows how to use the AVG() function to calculate the average salary of each
department:
SELECT
department_name, ROUND(AVG(salary), 0) avg_salary
FROM
employees
INNER JOIN
departments USING (department_id)
GROUP BY department_name
ORDER BY department_name;
MIN
The MIN() function returns the minimum value of a set. The following illustrates the syntax of
the MIN() function:
MIN(column | expression)
For example, the following statement returns the minimum salary of the employees in each department:
SELECT
department_name, MIN(salary) min_salary
FROM
employees
INNER JOIN
departments USING (department_id)
GROUP BY department_name
ORDER BY department_name;
MAX
The MAX() function returns the maximum value of a set. The MAX() function has the following syntax:
MAX(column | expression)
For example, the following statement returns the highest salary of employees in each department:
SELECT
department_name, MAX(salary) highest_salary
FROM
employees
INNER JOIN
departments USING (department_id)
GROUP BY department_name
ORDER BY department_name;
COUNT
The COUNT() function returns the number of items in a set. The following shows the syntax of
the COUNT() function:
COUNT([ALL | DISTINCT] column | expression | *)
For example, the following example uses the COUNT(*) function to return the headcount of each
department:
SELECT
department_name, COUNT(*) headcount
FROM
employees
INNER JOIN
departments USING (department_id)
GROUP BY department_name
ORDER BY department_name;
SUM
The SUM() function returns the sum of all values. The following illustrates the syntax of the SUM() function:
SUM([ALL | DISTINCT] expression)
For example, the following statement returns the total salary of all employees in each department:
SELECT
department_id, SUM(salary)
FROM
employees
GROUP BY department_id;
37. Draw the system architecture of DBMS. Explain each component in detail.
A Database stores a lot of critical information to access data quickly and securely. Hence it is important to
select the correct architecture for efficient data management. DBMS Architecture helps users to get their
requests done while connecting to the database. We choose database architecture depending on several
factors like the size of the database, number of users, and relationships between the users. There are two
types of database models that we generally use: the logical model and the physical model. Several types of architecture are used in databases, which we will deal with in the next section.
Types of DBMS Architecture
There are several types of DBMS Architecture that we use according to the usage requirements. Types of
DBMS Architecture are discussed here.
1-Tier Architecture
2-Tier Architecture
3-Tier Architecture
1-Tier Architecture
In 1-Tier Architecture the database is directly available to the user; the user can directly sit at the DBMS and use it, that is, the client, server, and database are all present on the same machine. For example, to learn SQL we set up an SQL server and the database on the local system. This enables us to directly interact with the relational database and execute operations. The industry rarely uses this architecture; it generally goes for 2-tier or 3-tier architecture.
The DML commands in Structured Query Language change the data present in the SQL database. We can
easily access, store, modify, update and delete the existing records from the database using DML
commands.
1. SELECT Command
2. INSERT Command
3. UPDATE Command
4. DELETE Command
SELECT DML Command
SELECT is the most important data manipulation command in Structured Query Language. The SELECT
command shows the records of the specified table. It also shows the particular record of a particular
column by using the WHERE clause.
SELECT column_Name_1, column_Name_2, ....., column_Name_N FROM table_Name;
Here, column_Name_1, column_Name_2, ....., column_Name_N are the names of those columns whose data we want to retrieve from the table.
If we want to retrieve the data from all the columns of the table, we have to use the following SELECT command:
SELECT * FROM table_Name;
Example 1: This example shows all the values of every column from the table.
SELECT * FROM Student;
This SQL statement displays the following values of the student table:
BCA1001 Abhay 85
BCA1002 Anuj 75
BCA1003 Bheem 60
BCA1004 Ram 79
BCA1005 Sumit 80
INSERT is another important data manipulation command in Structured Query Language, which allows users to insert data into database tables.
Example 1: This example describes how to insert the record in the database table.
Let's take the following Student table, which consists of only 2 records:
Stu_id Stu_Name Stu_Marks Stu_Age
101 Ramesh 92 20
201 Jatin 83 19
Suppose, you want to insert a new record into the student table. For this, you have to write the following DML
INSERT command:
INSERT INTO Student (Stu_id, Stu_Name, Stu_Marks, Stu_Age) VALUES (104, 'Anmol', 89, 19);
UPDATE is another important data manipulation command in Structured Query Language, which allows users to update or modify existing data in database tables.
UPDATE Table_name SET column_name1 = value_1, ....., column_nameN = value_N WHERE CONDITION;
Here, 'UPDATE', 'SET', and 'WHERE' are the SQL keywords, and 'Table_name' is the name of the table
whose values you want to update.
Example 1: This example describes how to update the value of a single field. Let's take a Product table with the following records (the last column is assumed here to be Product_Quantity):
Product_Id Product_Name Product_Price Product_Quantity
P101 Chips 20 20
P102 Chocolates 60 40
P103 Maggi 75 5
P201 Biscuits 80 20
P203 Namkeen 40 50
Suppose, you want to update the Product_Price of the product whose Product_Id is P102. To do this, you
have to write the following DML UPDATE command:
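A representative statement (the new price of 80 is an assumed value, and the table is assumed to be named Product):
UPDATE Product SET Product_Price = 80 WHERE Product_Id = 'P102';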
A database administrator, or DBA, is responsible for maintaining, securing, and operating databases and
also ensures that data is correctly stored and retrieved. In addition, DBAs often work with developers to
design and implement new features and troubleshoot any issues.
Decides hardware –
They decide on economical hardware based on cost, performance, and efficiency, choosing what best suits the organization. The hardware acts as the interface between the end users and the database.
Manages data integrity and security –
Data integrity needs to be checked and managed accurately, as it protects and restricts data from unauthorized use. The DBA keeps an eye on the relationships within data to maintain data integrity.
Database Accessibility –
The Database Administrator is solely responsible for granting permission to access data available in the database. The DBA also decides who has the right to change the content.
Database design –
DBA is held responsible and accountable for logical, physical design, external model design, and
integrity and security control.
Database implementation –
DBA implements DBMS and checks database loading at the time of its implementation.
Query processing performance –
DBA enhances query processing by improving speed, performance, and accuracy.
Tuning Database Performance –
If users are not able to get data speedily and accurately, the organization may lose business. So by tuning SQL commands the DBA can enhance the performance of the database.
41. Data Independence : Data Independence is defined as a property of a DBMS that helps us to change the database schema at one level of a database system without having to change the schema at the next higher level. Data independence helps us to keep data separated from all the programs that make use of it.
With Physical Data Independence, you can easily change the physical storage structures or devices without any effect on the conceptual schema. Any change done would be absorbed by the mapping between the conceptual and internal levels. Physical data independence is achieved by the presence of the internal level of the database and the transformation from the conceptual level of the database to the internal level.
Logical Data Independence is the ability to change the conceptual schema without having to change:
1. External views
2. External APIs or programs
Any change made will be absorbed by the mapping between the external and conceptual levels.
When compared to Physical Data independence, it is challenging to achieve logical data independence.
Logical Data Independence: Modification at the logical level is significant whenever the logical structures of the database are changed.
Physical Data Independence: Modifications made at the internal level may or may not be needed to improve the performance of the structure.
Summary
Data Independence is the property of a DBMS that helps you to change the database schema at one level of a database system without having to change the schema at the next higher level.
Two levels of data independence are 1) Physical and 2) Logical
Physical data independence helps you to separate conceptual levels from the internal/physical
levels
Logical Data Independence is the ability to change the conceptual schema without changing the external views or application programs.
When compared to Physical Data independence, it is challenging to achieve logical data
independence
Data Independence Helps you to improve the quality of the data
A transaction is a single logical unit of work which accesses and possibly modifies the contents of a
database. Transactions access data using read and write operations. As transactions deal with accessing
and modifying the contents of the database, they must have some basic properties which help maintain
the consistency and integrity of the database before and after the transaction. Transactions follow 4
properties, namely, Atomicity, Consistency, Isolation, and Durability. Generally, these are referred to as
ACID properties of transactions in DBMS.
Operations of Transaction
A user can make different types of requests to access and modify the contents of a database. So, we have
different types of operations relating to a transaction. They are discussed as follows:
i) Read(X)
A read operation is used to read the value of X from the database and store it in a buffer in the main
memory for further actions such as displaying that value. Such an operation is performed when a user
wishes just to see any content of the database and not make any changes to it. For example, when a user
wants to check his/her account’s balance, a read operation would be performed on user’s account balance
from the database.
ii) Write(X)
A write operation is used to write the value to the database from the buffer in the main memory. For a write
operation to be performed, first a read operation is performed to bring its value in buffer, and then some
changes are made to it, e.g. some set of arithmetic operations are performed on it according to the user’s
request, then to store the modified value back in the database, a write operation is performed. For example,
when a user requests to withdraw some money from his account, his account balance is fetched from the
database using a read operation, then the amount to be deducted from the account is subtracted from this
value, and then the obtained value is stored back in the database using a write operation.
iii) Commit
This operation in transactions is used to maintain integrity in the database. Due to some failure of power,
hardware, or software, etc., a transaction might get interrupted before all its operations are completed.
This may cause ambiguity in the database, i.e. it might get inconsistent before and after the transaction.
To ensure that further operations of any other transaction are performed only after the work of the current transaction is done, a commit operation is performed to save the changes made by a transaction permanently to the database.
iv) Rollback
This operation is performed to bring the database to the last saved state when any transaction is interrupted
in between due to any power, hardware, or software failure. In simple words, it can be said that a rollback
operation does undo the operations of transactions that were performed before its interruption to achieve
a safe state of the database and avoid any kind of ambiguity or inconsistency.
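The withdrawal example above can be written as a short SQL transaction. A minimal sketch, assuming an accounts table with acc_no and balance columns (all names and values are illustrative):
START TRANSACTION;                            -- begin the transaction
UPDATE accounts SET balance = balance - 500   -- read, modify, and write the balance
WHERE acc_no = 'A101';
COMMIT;                                       -- save the change permanently
-- If any failure occurs before COMMIT, issuing ROLLBACK restores the last saved state.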
Transaction Schedules
When multiple transaction requests are made at the same time, we need to decide their order of execution.
Thus, a transaction schedule can be defined as a chronological order of execution of multiple transactions.
There are broadly two types of transaction schedules discussed as follows,
i) Serial Schedule
In this kind of schedule, when multiple transactions are to be executed, they are executed serially, i.e. at
one time only one transaction is executed while others wait for the execution of the current transaction to
be completed. This ensures consistency in the database as transactions do not execute simultaneously.
But, it increases the waiting time of the transactions in the queue, which in turn lowers the throughput of the system, i.e. the number of transactions executed per unit time. To improve the throughput of the system, another kind of schedule is used, which has some stricter rules that help the database remain consistent even when transactions execute simultaneously.
ii) Non-Serial Schedule
To reduce the waiting time of transactions in the waiting queue and improve the system efficiency, we use
nonserial schedules which allow multiple transactions to start before a transaction is completely executed.
This may sometimes result in inconsistency and errors in database operation. So, these errors are handled
with specific algorithms to maintain the consistency of the database and improve CPU throughput as well.
Non-serial schedules are also sometimes referred to as parallel schedules, as transactions execute in parallel in this kind of schedule.
Serializable
Serializability in DBMS is the property of a nonserial schedule that determines whether it would maintain
the database consistency or not. The nonserial schedule which ensures that the database would be
consistent after the transactions are executed in the order determined by that schedule is said to be
Serializable Schedules. Serial schedules always maintain database consistency, as a transaction starts only after the execution of the previous transaction has been completed. Thus, serial schedules are always serializable.
A transaction is a series of operations, so various states occur in its completion journey. They are
discussed as follows:
i) Active
It is the first stage of any transaction when it has begun to execute. The execution of the transaction takes
place in this state. Operations such as insertion, deletion, or update are performed during this state.
During this state, the data records are under manipulation and they are not saved to the database, rather
they remain somewhere in a buffer in the main memory.
ii) Partially Committed
This state of transaction is achieved when it has completed most of the operations and is executing its
final operation. It can be a signal to the commit operation, as after the final operation of the transaction
completes its execution, the data has to be saved to the database through the commit operation. If some
kind of error occurs during this state, the transaction goes into a failed state, else it goes into the Committed
state.
iii) Committed
This state of transaction is achieved when all the transaction-related operations have been executed
successfully along with the Commit operation, i.e. data is saved into the database after the required
manipulations in this state. This marks the successful completion of a transaction.
iv) Failed
If any of the transaction-related operations cause an error during the active or partially committed state,
further execution of the transaction is stopped and it is brought into a failed state. Here, the database
recovery system makes sure that the database is in a consistent state.
v) Aborted
If the error is not resolved in the failed state, then the transaction is aborted and a rollback operation is performed to bring the database to the last saved consistent state. When the transaction is aborted, the database recovery module either restarts the transaction or kills it.
The illustration below shows the various states that a transaction may encounter in its completion
journey.
A privilege in a database management system is the permission to execute certain actions on the
database.
1. Access a table
2. Access permission to execute a database command
3. Access another user’s object
Privileges make certain actions possible, such as connecting to a database, creating a table, and executing
another user’s stored procedure.
Privileges are granted to users so they can accomplish a given task. If privileges are not granted with guidance, this can lead to a security breach in the database.
Categories of privileges
1. System privileges
2. Object privileges
1. System privileges
System privileges are mostly granted to administrative personnel and application developers. This privilege
is usually not open to end-users of the database.
2. Object privileges
Object privilege is the permission to access specific database objects. Object privilege entails performing a
specific action on a particular table, function, or package.
The right to delete rows from a table is an object privilege. Object privileges are granted to normal users, unlike system privileges. Popular object privileges include SELECT, INSERT, UPDATE, DELETE, and EXECUTE.
Conclusion
Both system and object privileges are very important in a database management system, as they help
secure the data stored in a database system.
Access control is performed by using the following two methods:
1. Privileges
2. Roles
Privileges :
The authority or permission to access a named object in an advised manner, for example, permission to access a table. Privileges can allow a particular user to connect to the database. In other words, privileges are allowances granted on database objects.
Database privileges —
A privilege is permission to execute one particular type of SQL statement or to access another person's object. Database privileges control the use of computing resources. Database privileges do not apply to the database administrator of the database.
System privileges —
A system privilege is the right to perform an activity on a specific type of object. For example, the privilege to delete rows of any table in a database is a system privilege. There are a total of 60 different system privileges. System privileges allow users to CREATE, ALTER, or DROP database objects.
Object privilege —
An object privilege is a privilege to perform a specific action on a particular table, function, or package. For example, the right to delete rows from a table is an object privilege. For instance, consider a row of the table GEEKSFORGEEKS that contains the name of an employee who is no longer a part of the organization; deleting that row is considered an object privilege. Object privileges allow the user to INSERT, DELETE, UPDATE, or SELECT data in the database object.
Roles:
A role is a mechanism that can be used to grant authorization. A person or a group of people can be granted a role or a group of roles. With a few roles, the administrator can manage access privileges very easily. Roles are provided by the database management system for easy and controlled privilege management.
Properties –
The following are the properties of the roles which allow easy privilege management inside a database:
Reduced privilege administration —
The administrator can grant the privileges for a group of related users through a role, instead of granting the same set of privileges to each user explicitly.
Dynamic privilege management —
If the privileges of the group change, only the privileges of the role need to be changed.
Application-specific security —
The use of a role can also be protected with a password. Applications can be created to enable a role only when the user enters the correct password. Users cannot enable the role if they do not know the password.
Let us now learn about different ways of granting privileges to the users:
Granting SELECT Privilege to a User in a Table:
1. To grant Select Privilege to a table named “users” where User Name is Amit, the following GRANT
statement should be executed.
2. The general syntax of specifying a username is: ‘user_name’@’address’
3. If the user ‘Amit’ is on the local host then we have to mention it as ‘Amit’@’localhost’. Or suppose if
the ‘Amit’ username is on 192.168.1.100 IP address then we have to mention it as
‘Amit’@’192.168.1.100’.
‘user_name’@’address’ – When you’re granting or revoking permissions in MySQL, you use the
‘username’ or ‘hostname’ format to tell which users are allowed or denied. This is important for keeping
security and access control in place, so here’s why we use it:
Granularity of Access Control
Multi-User Environments
User identification
GRANT SELECT ON Users TO 'Amit'@'localhost';
Granting more than one Privilege to a User in a Table: To grant multiple Privileges to a user
named “Amit” in a table “users”, the following GRANT statement should be executed.
GRANT SELECT, INSERT, DELETE, UPDATE ON Users TO 'Amit'@'localhost';
Granting All the Privilege to a User in a Table: To Grant all the privileges to a user named “Amit”
in a table “users”, the following Grant statement should be executed.
GRANT ALL ON Users TO 'Amit'@'localhost';
Granting a Privilege to all Users in a Table: To Grant a specific privilege to all the users in a table
“users”, the following Grant statement should be executed.
GRANT SELECT ON Users TO '*'@'localhost';
In the above example the “*” symbol is used to grant select permission to all the users of the table
“users”.
Granting Privileges on Functions/Procedures: While using functions and procedures, the GRANT statement can be used to grant users the ability to execute functions and procedures in MySQL.
Granting Execute Privilege: The EXECUTE privilege gives the ability to execute a function or procedure. Syntax:
GRANT EXECUTE ON [ PROCEDURE | FUNCTION ] object TO user;
Different ways of granting EXECUTE Privileges
Granting EXECUTE privileges on a function in MySQL.: If there is a function named
“CalculateSalary” and you want to grant EXECUTE access to the user named Amit, then the
following GRANT statement should be executed.
GRANT EXECUTE ON FUNCTION Calculatesalary TO 'Amit'@'localhost';
Granting EXECUTE privileges to all Users on a function in MySQL.: If there is a function named
“CalculateSalary” and you want to grant EXECUTE access to all the users, then the following
GRANT statement should be executed.
GRANT EXECUTE ON FUNCTION Calculatesalary TO '*'@'localhost';
Granting EXECUTE privilege to a Users on a procedure in MySQL.: If there is a procedure
named “DBMSProcedure” and you want to grant EXECUTE access to the user named Amit, then the
following GRANT statement should be executed.
GRANT EXECUTE ON PROCEDURE DBMSProcedure TO 'Amit'@'localhost';
Granting EXECUTE privileges to all Users on a procedure in MySQL.: If there is a procedure
called “DBMSProcedure” and you want to grant EXECUTE access to all the users, then the following
GRANT statement should be executed.
GRANT EXECUTE ON PROCEDURE DBMSProcedure TO '*'@'localhost';
Checking the Privileges Granted to a User: To see the privileges granted to a user in a table, the
SHOW GRANTS statement is used. To check the privileges granted to a user named “Amit” and host
as “localhost”, the following SHOW GRANTS statement will be executed:
SHOW GRANTS FOR 'Amit'@'localhost';
Output:
GRANTS FOR Amit@localhost
GRANT USAGE ON *.* TO `Amit`@`localhost`
Revoking Privileges from a Table
The Revoke statement is used to revoke some or all of the privileges which have been granted to a user
in the past.
Syntax:
REVOKE privileges ON object FROM user;
Parameters Used:
object: It is the name of the database object from which permissions are being revoked. In the case
of revoking privileges from a table, this would be the table name.
user: It is the name of the user from whom the privileges are being revoked.
Privileges can be of values such as SELECT, INSERT, UPDATE, DELETE, INDEX, CREATE, ALTER, DROP, or ALL.
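For instance, to revoke the DELETE privilege previously granted on the Users table to the same illustrative user used above:
REVOKE DELETE ON Users FROM 'Amit'@'localhost';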
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are two wildcards often used in conjunction with the LIKE operator:
The percent sign (%) represents zero, one, or multiple characters.
The underscore sign (_) represents one single character.
Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;
Demo Database: the examples below assume the familiar Customers demo table, with columns such as CustomerName, City, and Country.
The _ Wildcard
It can be any character or number, but each _ represents one, and only one, character.
Example
Return all customers from a city that starts with 'L' followed by one wildcard character, then 'nd' and then
two wildcard characters:
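A query matching this description, using the assumed Customers demo table:
SELECT * FROM Customers
WHERE City LIKE 'L_nd__';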
The % Wildcard
The % wildcard can represent any number of characters: zero, one, or many.
Example
Return all customers from a city that contains the letter 'L':
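Using the assumed Customers table:
SELECT * FROM Customers
WHERE City LIKE '%L%';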
Starts With
To return records that start with a specific letter or phrase, add the % at the end of the letter or phrase.
Example
Return all customers whose name starts with 'a' or starts with 'b':
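Using the assumed CustomerName column:
SELECT * FROM Customers
WHERE CustomerName LIKE 'a%' OR CustomerName LIKE 'b%';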
Ends With
To return records that end with a specific letter or phrase, add the % at the beginning of the letter or phrase.
Example
Return all customers that starts with "b" and ends with "s":
Contains
To return records that contain a specific letter or phrase, add the % both before and after the letter or phrase.
Example
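For instance, to return all customers whose name contains the phrase 'or' (the phrase is illustrative):
SELECT * FROM Customers
WHERE CustomerName LIKE '%or%';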
Combine Wildcards
Any wildcard, like % and _ , can be used in combination with other wildcards.
Example
Return all customers that starts with "a" and are at least 3 characters in length:
Without Wildcard
If no wildcard is specified, the phrase has to have an exact match to return a result.
Example
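For instance, an exact match against a Country column (the column and value are illustrative):
SELECT * FROM Customers
WHERE Country LIKE 'Spain';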
In converting an ER-Diagram into a database schema, each entity type is transformed into a relation or
table.
All the attributes of the entity type are converted into the attributes columns of the table.
Each tuple or row of the relation or table is an entity of the entity type.
In our case, each tuple or row represents the details of an employee of the company.
Types of Tuples
There are two types of tuples in a database management system:
Physical Tuples: Physical Tuples are the actual data stored in the storage media of a database. It is
also known as a record or row.
Logical Tuples: Logical Tuples are the data representation in memory, where data is temporarily
stored before being written to disk or during a query operation.
Both physical and logical tuples have the same attributes, but their representation and usage can differ
based on the context of the operation.
47. Different types of lock available
In database management systems, locks are used to synchronize the access of multiple transactions to
the same data item. A lock is a variable associated with a data item that describes the status of the item
with respect to possible operations that can be applied to it. There are several types of locks used in
concurrency control, including binary locks and shared/exclusive locks.
Binary locks are simple but restrictive and are not used in practice. They can have two states or values:
locked and unlocked. A distinct lock is associated with each database item. If the value of the lock on an
item is 1, the item cannot be accessed by a database operation that requests the item. If the value of the
lock on an item is 0, then the item can be accessed when requested.
Shared/exclusive locks provide more general locking capabilities and are used in practical database locking
schemes. Shared locks are acquired when only read operation is to be performed. Shared locks can be
shared between multiple transactions as there is no data being altered. Exclusive locks are acquired when
a write operation is to be performed. Exclusive locks are not shared between transactions.
If transaction T1 is holding a shared lock on data item A, then the concurrency control manager can grant the shared lock to transaction T2, as the compatibility is TRUE; but it cannot grant the exclusive lock, as the compatibility is FALSE.
In simple words, if transaction T1 is reading a data item A, then the same data item A can be read by another transaction T2, but it cannot be written by another transaction.
Similarly, if an exclusive lock (i.e. a lock for read and write operations) is held on the data item by some transaction, then no other transaction can acquire a shared or exclusive lock, as the compatibility function denotes FALSE.
Difference between Shared Lock and Exclusive Lock :
1. A shared lock can be placed on objects that do not have an exclusive lock already placed on them, whereas an exclusive lock can only be placed on objects that do not have any other kind of lock.
2. A shared lock is issued when a transaction wants to read an item that does not have an exclusive lock, whereas an exclusive lock is issued when a transaction wants to update an unlocked item.
3. Any number of transactions can hold a shared lock on an item, whereas an exclusive lock can be held by only one transaction.
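In SQL terms, a transaction can request these locks explicitly. A minimal sketch using MySQL/InnoDB syntax, with an illustrative accounts table (both statements must run inside an open transaction for the lock to be held):
SELECT balance FROM accounts WHERE acc_no = 'A101' LOCK IN SHARE MODE;  -- shared (read) lock
SELECT balance FROM accounts WHERE acc_no = 'A101' FOR UPDATE;          -- exclusive (write) lock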
Database auditing is the process of analyzing and monitoring a database to ensure its security,
compliance, performance, and integrity. Database auditing can help you identify vulnerabilities, track data
access and modifications, comply with regulations, and optimize database performance. There are different
types of database audits, such as security auditing, compliance auditing, data auditing, and configuration
auditing, depending on the objectives and scope of the audit.
A database audit requires analysis of your database, including users, their permissions, and access to
data to ensure compliance with regulations.
Security Auditing – Security auditing verifies that robust passwords are in place, ensures that sensitive data
is protected through encryption, and confirms that only those with proper clearance can access the
information.
Compliance Auditing – Ensures compliance with industry regulations and legal requirements such as
GDPR, HIPAA, PCI, and SOX. It involves reviewing the database to confirm that proper measures are in
place to protect data and that the organization is adhering to relevant laws and regulations about data
management.
Data Auditing - A data audit monitors and logs data access and modifications. It allows you to trace who
accessed the data and what changes were made, including identifying individuals responsible for adding,
modifying, or deleting data. It also enables tracking of when these changes are made.
Configuration Auditing - Configuration auditing involves monitoring and tracking the actions taken by users
and database administrators, including creating and modifying database objects, managing user accounts,
and making changes to the database's configuration. In addition, it covers system-level changes such as
database software updates, operating system modifications, and hardware changes.
Additional types of database audits can be more granular, such as SQL statement, SQL privilege, and schema object audits. Or, more broadly, database audits can review administrative activity, data access and modification, user denials or login failures, and system-wide changes.
The benefits of a database audit include security, compliance, and data integrity. A database audit can help
you ensure your organization is not vulnerable to potential threats, remain compliant with relevant laws and
regulations such as GDPR, HIPAA, PCI, and SOX, and ensure data is accurate, complete, and consistent.
A database audit can also help with business continuity by making sure the database is available and
accessible at all times. In addition, should an issue occur where a database becomes corrupt or attacked, a
database audit can ensure that a disaster recovery plan is in place.
With proper auditing and tracking, which includes detailed records of all activities that have taken place in a
database, you can quickly discover common issues during a database audit. By resolving these errors, you
can increase the performance of your database, which would otherwise cause the database to be slow due
to slow queries, blocked processes, and other bottlenecks.
Performing a database audit depends on the needs and requirements of your organization. Below are four
key areas you should focus on when performing a database audit.
Database administrators use tools such as Lynis, a security auditing tool for Linux, to help with database
audits. It is free, open source, and can be modified or extended based on your preferences. To identify and
prioritize security issues and vulnerabilities, Lynis provides detailed reports of your database’s security-
related configuration settings. As a result, database administrators can schedule regular security
assessments and identify potential issues early.
Correcting issues discovered during a database audit depends on the type of database audited. The first
thing a database auditor needs to do is review the audit report and understand the identified issues. Then
create a plan of action to address the issues. Before making any corrections to a database, it is
recommended that you make backups in case you need to revert the database to its original state.
Resolve issues by applying patches and updates and ensuring the database is running the latest version.
Additionally, more advanced changes can be made in the database’s configuration settings for security,
such as authentication and access control. Database administrators can also reorganize database objects
such as tables and indexes to improve performance, as this can also resolve issues.
Once your changes and updates are made, monitor the database carefully to ensure no additional issues
are discovered. For best practice, performing a database audit after making changes and updates ensures
the database is running properly.
Conclusion
Performing a database audit should be done regularly. With the help of tools like Lynis, you can schedule
database audits as frequently as needed, which can help you protect your database data and help increase
database performance.
In database management, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning. (The demo table these figures refer to is not reproduced here; a salary column consistent with all of the figures below would hold the six values 40, 60, 60, 70, 80, and NULL.)
Count():
Count(*): Returns the total number of records, i.e. 6.
Count(salary): Returns the number of non-NULL values in the column salary, i.e. 5.
Count(Distinct salary): Returns the number of distinct non-NULL values in the column salary, i.e. 4.
Sum():
sum(salary): Sum of all non-NULL values of the column salary, i.e. 310.
sum(Distinct salary): Sum of all distinct non-NULL values, i.e. 250.
Avg():
Avg(salary): sum(salary) / count(salary) = 310 / 5, i.e. 62.
Min():
Min(salary): Minimum value in the salary column, excluding NULL, i.e. 40.
Max():
Max(salary): Maximum value in the salary column, i.e. 80.
Primary Key:
Primary Key is a set of attributes (or attribute) which uniquely identify the tuples in relation or table.
The primary key is a minimal super key, so there is one and only one primary key in any relation.
Difference between Primary Key and Candidate Key:
1. A primary key is a minimal super key, so there is one and only one primary key in a relation, while in a relation there can be more than one candidate key.
2. No attribute of a primary key can contain a NULL value, while in a candidate key an attribute can contain a NULL value.
3. The primary key specifies the most important attribute(s) for the relation, while a candidate key specifies a key which can qualify to be the primary key.
51. BCNF
Application of the general definitions of 2NF and 3NF may identify additional redundancy caused by
dependencies that violate one or more candidate keys. However, despite these additional constraints,
dependencies can still exist that will cause redundancy to be present in 3NF relations. This weakness
in 3NF resulted in the presentation of a stronger normal form called the Boyce-Codd Normal Form
(Codd, 1974).
Although, 3NF is an adequate normal form for relational databases, still, this (3NF) normal form may
not remove 100% redundancy because of X−>Y functional dependency if X is not a candidate key of
the given relation. This can be solved by Boyce-Codd Normal Form (BCNF).
It can be inferred that every relation in BCNF is also in 3NF. To put it another way, a relation in 3NF
need not be in BCNF. Ponder over this statement for a while.
To determine the highest normal form of a given relation R with functional dependencies, the first step is
to check whether the BCNF condition holds. If R is found to be in BCNF, it can be safely deduced that the
relation is also in 3NF, 2NF, and 1NF as the hierarchy shows. The 1NF has the least restrictive constraint
– it only requires a relation R to have atomic values in each tuple. The 2NF has a slightly more restrictive
constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is less restrictive than the
BCNF. In this manner, the restriction increases as we traverse down the hierarchy.
Examples
Here, we are going to discuss some basic examples which let you understand the properties of BCNF. We
will discuss multiple examples here.
Example 1
Let us consider the student database, in which data of the student are mentioned.
101 201
101 202
102 401
102 402
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can determine all attributes of the
relation, So AC will be the candidate key. A or C can’t be derived from any other attribute of the relation,
so there will be only 1 candidate key {AC}.
Step-2: Prime attributes are those attributes that are part of candidate key {A, C} in this example and
others will be non-prime {B, D, E} in this example.
Step-3: The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or
composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper subset of
candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is in 2nd normal form
(B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because of BC->D (neither BC is a super key nor D is a prime attribute) and B->E (neither B is a super key nor E is a prime attribute); to satisfy 3rd normal form, either the LHS of an FD should be a super key or the RHS should be a prime attribute. So the highest normal form of the relation will be the 2nd Normal Form.
Note: A prime attribute cannot be transitively dependent on a key in BCNF relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
Suppose, it is known that the only candidate key of R is AB. A careful observation is required to conclude
that the above dependency is a Transitive Dependency as the prime attribute B transitively depends on the
key AB through C. Now, the first and the third FD are in BCNF as they both contain the candidate key (or
simply KEY) on their left sides. The second dependency, however, is not in BCNF but is definitely in 3NF
due to the presence of the prime attribute on the right side. So, the highest normal form of R is 3NF as all
three FDs satisfy the necessary conditions to be in 3NF.
Difference between 3NF and BCNF:
1. In 3NF the functional dependencies are already in 1NF and 2NF, whereas in BCNF the functional dependencies are already in 1NF, 2NF, and 3NF.
2. 3NF can be obtained without sacrificing any dependencies, whereas dependencies may not be preserved in BCNF.
3. 3NF can be achieved without losing any information from the old table, whereas for obtaining BCNF we may lose some information from the old table.
Difference between BCNF and 4NF:
1. A relation in BCNF may have a multi-valued dependency, whereas a relation in 4NF must not have any multi-valued dependency.
2. If a relation is in BCNF then it will have more redundancy as compared to 4NF, whereas if a relation is in 4NF then it will have less redundancy as compared to BCNF.
3. If a relation is in BCNF then all redundancy based on functional dependency has been removed, whereas if a relation is in 4NF then all redundancy based on functional dependency as well as multi-valued dependency has been removed.
4. For a relation, the number of tables in BCNF is less than or equal to the number of tables in 4NF; equivalently, the number of tables in 4NF is greater than or equal to the number of tables in BCNF.
5. In real-world database design, generally 3NF or BCNF is preferred; 4NF is generally not preferred by database designers.
6. A relation in BCNF may contain multi-valued as well as join dependency, whereas a relation in 4NF may only contain join dependency.
Difference between 4NF and 5NF:
1. A relation in 4NF must not have any multi-valued dependency, whereas a relation in 5NF must not have any join dependency.
2. Fourth Normal Form is less strong in comparison to Fifth Normal Form; Fifth Normal Form is stronger than Fourth Normal Form.
3. If a relation is in Fourth Normal Form then it may have more redundancy, whereas if a relation is in Fifth Normal Form then it will have less redundancy.
4. If a relation is in Fourth Normal Form then it may be decomposed further into sub-relations, whereas if a relation is in Fifth Normal Form then it cannot be decomposed further into sub-relations without any modification in meaning or facts.
52. What is an ER diagram? Explain the symbol used in it with the help of an example
The Entity Relational Model is a model for identifying entities to be represented in the database and
representation of how those entities are related. The ER data model specifies enterprise schema that
represents the overall logical structure of a database graphically.
The Entity Relationship Diagram explains the relationship among the entities present in the database. ER
models are used to model real-world objects like a person, a car, or a company and the relation between
these real-world objects. In short, the ER Diagram is the structural format of the database.
Why Use ER Diagrams In DBMS?
ER diagrams are used to represent the E-R model in a database, which makes them easy to be
converted into relations (tables).
ER diagrams serve the purpose of real-world modeling of objects, which makes them intuitively useful.
ER diagrams require no technical knowledge and no hardware support.
These diagrams are very easy to understand and easy to create even for a naive user.
It gives a standard solution for visualizing the data logically.
Symbols Used in ER Model
ER Model is used to model the logical view of the system from a data perspective which consists of these
symbols:
Rectangles: Rectangles represent Entities in the ER Model.
Ellipses: Ellipses represent Attributes in the ER Model.
Diamond: Diamonds represent Relationships among Entities.
Lines: Lines connect attributes to entities, and entity sets to relationship types.
Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
Double Rectangle: Double Rectangle represents a Weak Entity.
Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database System.
Entity
An Entity may be an object with a physical existence – a particular person, car, house, or employee – or it
may be an object with a conceptual existence – a company, a job, or a university course.
Entity Set: An Entity is an object of Entity Type and a set of all entities is called an entity set. For Example,
E1 is an entity having Entity Type Student and the set of all students is called Entity Set. In ER diagram,
Entity Type is represented as:
Entity Set
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute. Strong Entity does not depend on other Entity in
the Schema. It has a primary key, that helps in identifying it uniquely, and it is represented by a rectangle.
These are called Strong Entity Types.
2. Weak Entity
An Entity type has a key attribute that uniquely identifies each entity in the entity set. But some entity
type exists for which key attributes can’t be defined. These are called Weak Entity types.
For Example, a company may store the information of dependents (Parents, Children, Spouse) of an Employee. But the dependents cannot exist without the employee. So Dependent will be a Weak Entity Type and Employee will be the Identifying Entity Type for Dependent, which means Employee is a Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The participation of weak entity types is
always total. The relationship between the weak entity type and its identifying strong entity type is called
identifying relationship and it is represented by a double diamond.
Attributes
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB, Age, Address,
and Mobile_No are the attributes that define entity type Student. In ER diagram, the attribute is represented
by an oval.
Attribute
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute. For
example, Roll_No will be unique for each student. In ER diagram, the key attribute is represented by an
oval with underlying lines.
Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example, the
Address attribute of the student Entity type consists of Street, City, State, and Country. In ER diagram, the
composite attribute is represented by an oval comprising of ovals.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No (can be more
than one for a given student). In ER diagram, a multivalued attribute is represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived attribute. e.g.;
Age (can be derived from DOB). In ER diagram, the derived attribute is represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following relationship set depicts
S1 as enrolled in C2, S2 as enrolled in C1, and S3 as enrolled in C3.
Relationship Set
1. Unary Relationship: When there is only ONE entity set participating in a relationship, the relationship is called a unary relationship. For example, one person is married to only one person within the same Person entity set.
2. Binary Relationship: When there are TWO entities set participating in a relationship, the relationship
is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. n-ary Relationship: When there are n entities set participating in a relation, the relationship is called
an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known as cardinality.
Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the relationship, the
cardinality is one-to-one. Let us assume that a male can marry one female and a female can marry one
male. So the relationship will be one-to-one.
the total number of tables that can be used in this is 2.
2. One-to-Many: In one-to-many mapping, an entity in one entity set can be related to more than one entity in the other entity set, and the total number of tables that can be used in this is 2. Let us assume that one surgeon department can accommodate many doctors. So the cardinality will be 1 to M. It means one department has many doctors.
The total number of tables that can be used is 3.
one to many cardinality
3. Many-to-One: When entities in one entity set can take part only once in the relationship set and entities
in other entity sets can take part more than once in the relationship set, cardinality is many to one. Let us
assume that a student can take only one course but one course can be taken by many students. So the
cardinality will be n to 1. It means that for one course there can be n students but for one student, there
will be only one course.
The total number of tables that can be used in this is 3.
In this case, each student is taking only 1 course but 1 course has been taken by many students.
4. Many-to-Many: When entities in all entity sets can take part more than once in the relationship
cardinality is many to many. Let us assume that a student can take more than one course and one course
can be taken by many students. So the relationship will be many to many.
the total number of tables that can be used in this is 3.
In this example, student S1 is enrolled in C1 and C3 and Course C3 is enrolled by S1, S3, and S4. So it is
many-to-many relationships.
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If each student
must enroll in a course, the participation of students will be total. Total participation is shown by a double
line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in the relationship. If
some courses are not enrolled by any of the students, the participation in the course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and
Course Entity set having partial participation.
Every student in the Student Entity set participates in a relationship but there exists a course C4 that is not
taking part in the relationship.
How to Draw ER Diagram?
The very first step is identifying all the entities, placing them in rectangles, and labeling them accordingly.
The next step is to identify the relationships between them and place them accordingly using the Diamond, making sure that relationships are not connected to each other.
Attach attributes to the entities properly.
Remove redundant entities and relationships.
Add proper colors to highlight the data present in the database.
Database users are categorized based upon their interaction with the database. The different types of database users in DBMS are discussed below.
1. Database Administrator (DBA) : A Database Administrator (DBA) is a person/team who defines the schema and also controls the 3 levels of the database. The DBA creates a new account id and password for a user if he/she needs to access the database. The DBA is also responsible for providing security to the database, and he allows only authorized users to access/modify the database. The DBA is responsible for problems such as security breaches and poor system response time.
The DBA also monitors recovery and backup and provides technical support.
The DBA has a DBA account in the DBMS which is called a system or superuser account.
DBA repairs damage caused due to hardware and/or software failures.
DBA is the one having privileges to perform DCL (Data Control Language) operations such as
GRANT and REVOKE, to allow/restrict a particular user from accessing the database.
2. Naive / Parametric End Users : Parametric end users are unsophisticated users who don't have any DBMS knowledge but frequently use database applications in their daily life to get the desired results. For example, Railways' ticket booking users are naive users. Clerks in any bank are naive users because they don't have any DBMS knowledge but they still use the database and perform their given tasks.
3. System Analyst :
System Analyst is a user who analyzes the requirements of parametric end users. They check
whether all the requirements of end users are satisfied.
4. Sophisticated Users : Sophisticated users can be engineers, scientists, or business analysts who are familiar with the database. They can develop their own database applications according to their requirements. They don't write the program code, but they interact with the database by writing SQL queries directly through the query processor.
5. Database Designers : Database Designers are the users who design the structure of the database, which includes tables, indexes, views, triggers, stored procedures, and constraints, which are usually enforced before the database is created or populated with data. He/she controls what data must be stored and how the data items are to be related. It is the responsibility of Database Designers to understand the requirements of different user groups and then create a design which satisfies the needs of all the user groups.
6. Application Programmers : Application Programmers, also referred to as System Analysts or simply Software Engineers, are the back-end programmers who write the code for the application programs. They are computer professionals. These programs could be written in programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL, etc. Application programmers design, debug, test, and maintain a set of programs called "canned transactions" for the naive (parametric) users in order to interact with the database.
7. Casual Users / Temporary Users : Casual users are the users who occasionally use/access the database, but each time they access it they require new information; for example, a middle- or higher-level manager.
8. Specialized Users : Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework. Among these applications are computer-aided design systems, knowledge-base and expert systems, etc.
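Returning to the DBA’s DCL privileges mentioned in point 1, here is a minimal sketch of GRANT and REVOKE (the table and user names are illustrative assumptions):
GRANT SELECT, INSERT ON StudentDetails TO exam_user;
REVOKE INSERT ON StudentDetails FROM exam_user;
The first statement lets exam_user read and insert rows in StudentDetails; the second later withdraws only the insert privilege.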
Database security includes a variety of measures used to secure database management systems from
malicious cyber-attacks and illegitimate use. Database security programs are designed to protect not
only the data within the database, but also the data management system itself, and every application
that accesses it, from misuse, damage, and intrusion.
Database security encompasses tools, processes, and methodologies which establish security inside a
database environment.
Database security means keeping sensitive information safe and preventing the loss of data. Security of the database is controlled by the Database Administrator (DBA).
The following are the main control measures used to provide security of data in databases:
1. Authentication
2. Access control
3. Inference control
4. Flow control
5. Database Security applying Statistical Method
6. Encryption
These are explained below.
1. Authentication :
Authentication is the process of confirming that a user logs in only according to the rights provided to them to perform the activities of the database. A particular user can log in only up to their privilege level and cannot access other sensitive data. The privilege of accessing sensitive data is restricted by using authentication.
Authentication tools based on biometrics, such as retina scans and fingerprints, can protect the database from unauthorized/malicious users.
2. Access Control :
The security mechanism of a DBMS must include some provisions for restricting access to the database by unauthorized users. Access control is done by creating user accounts and controlling the login process in the DBMS, so that access to sensitive data is possible only for those people (database users) who are allowed to access such data, and is restricted for unauthorized persons.
The database system must also keep track of all operations performed by a certain user throughout the entire login time.
3. Inference Control :
This method is known as a countermeasure to the statistical database security problem. It is used to prevent the user from completing any inference channel. This method protects sensitive information from indirect disclosure.
Inferences are of two types: identity disclosure or attribute disclosure.
4. Flow Control :
This prevents information from flowing in a way that it reaches unauthorized users. Pathways through which information flows implicitly, in ways that violate the privacy policy of a company, are called covert channels.
5. Database Security applying Statistical Method :
Statistical database security focuses on the protection of confidential individual values stored in and used for statistical purposes, and on retrieving summaries of values based on categories. It does not permit retrieval of individual information.
This allows users to access the database to get statistical information, such as the number of employees in the company, but not the detailed confidential/personal information about a specific individual employee.
6. Encryption :
This method is mainly used to protect sensitive data (such as credit card numbers and OTPs). The data is encoded using encoding algorithms.
An unauthorized user who tries to access this encoded data will face difficulty in decoding it, but authorized users are given decoding keys to decode the data.
Insider Threats
An insider threat is a security risk from one of the following three sources, each of which has privileged means of entry to the database:
A malicious insider who intends to do harm
A negligent insider who makes errors that expose the database to attack
An infiltrator, an outsider who obtains credentials through a scheme such as phishing or by gaining access to the credential store itself
An insider threat is one of the most typical causes of database security breaches, and it often occurs because a lot of employees have been granted privileged user access.
Human Error
Weak passwords, password sharing, accidental erasure or corruption of data, and other undesirable user
behaviors are still the cause of almost half of data breaches reported.
Exploitation of Database Software Vulnerabilities
Attackers constantly attempt to isolate and target vulnerabilities in software, and database management
software is a highly valuable target. New vulnerabilities are discovered daily, and all open source database
management platforms and commercial database software vendors issue security patches regularly.
However, if you don’t apply these patches quickly, your database might be exposed to attack.
Even if you do apply patches on time, there is always the risk of zero-day attacks, where attackers exploit a vulnerability before it has been discovered and patched by the database vendor.
SQL/NoSQL Injection Attacks
A database-specific threat involves the use of arbitrary SQL and non-SQL attack strings in database queries. Typically, these are queries created as an extension of web application forms, or received via HTTP requests. Any database system is vulnerable to these attacks if developers do not adhere to secure coding practices and if the organization does not carry out regular vulnerability testing.
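As a hedged illustration (the users table and the input are hypothetical), an application that concatenates user input directly into a query can be tricked into returning every row:
-- Intended query, with the user input Alice:
SELECT * FROM users WHERE name = 'Alice';
-- With the injected input ' OR '1'='1 the condition becomes a tautology:
SELECT * FROM users WHERE name = '' OR '1'='1';
Parameterized queries, which keep user input out of the query text, are the standard defense.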
Buffer Overflow Attacks
Buffer overflow takes place when a process tries to write a large amount of data to a fixed-length block of
memory, more than it is permitted to hold. Attackers might use the excess data, kept in adjacent memory
addresses, as the starting point from which to launch attacks.
Denial of Service (DoS/DDoS) Attacks
In a denial of service (DoS) attack, the cybercriminal overwhelms the target service—in this instance the
database server—using a large amount of fake requests. The result is that the server cannot carry out
genuine requests from actual users, and often crashes or becomes unstable.
In a distributed denial of service attack (DDoS), fake traffic is generated by a large number of computers,
participating in a botnet controlled by the attacker. This generates very large traffic volumes, which are
difficult to stop without a highly scalable defensive architecture. Cloud-based DDoS protection services can
scale up dynamically to address very large DDoS attacks.
Malware
Malware is software written to take advantage of vulnerabilities or to cause harm to a database. Malware
could arrive through any endpoint device connected to the database’s network. Malware protection is
important on any endpoint, but especially so on database servers, because of their high value and
sensitivity.
An Evolving IT Environment
The evolving IT environment is making databases more susceptible to threats. Here are trends that can
lead to new types of attacks on databases, or may require new defensive measures:
Growing data volumes—storage, data capture, and processing is growing exponentially across almost all
organizations. Any data security practices or tools must be highly scalable to address distant and near-
future requirements.
Distributed infrastructure—network environments are increasing in complexity, especially as businesses
transfer workloads to hybrid cloud or multi-cloud architectures, making the deployment, management, and
choice of security solutions more difficult.
Increasingly tight regulatory requirements—the worldwide regulatory compliance landscape is growing in complexity, so following all mandates is becoming more challenging.
Cybersecurity skills shortage—there is a global shortage of skilled cybersecurity professionals, and
organizations are finding it difficult to fill security roles. This can make it more difficult to defend critical
infrastructure, including databases.
A database server is a physical or virtual machine running the database. Securing a database server, also
known as “hardening”, is a process that includes physical security, network security, and secure operating
system configuration.
Refrain from sharing a server for web applications and database applications if your database contains sensitive data. Although it could be cheaper and easier to host your site and database together on a hosting provider, you are placing the security of your data in someone else’s hands.
If you do rely on a web hosting service to manage your database, you should ensure that it is a company
with a strong security track record. It is best to stay clear of free hosting services due to the possible lack of
security.
If you manage your database in an on-premise data center, keep in mind that your data center is also
prone to attacks from outsiders or insider threats. Ensure you have physical security measures, including
locks, cameras, and security personnel in your physical facility. Any access to physical servers must be
logged and only granted to authorized individuals.
In addition, do not leave database backups in locations that are publicly accessible, such as temporary
partitions, web folders, or unsecured cloud storage buckets.
Lock Down Accounts and Privileges
Let’s consider the Oracle database server. After the database is installed, the Oracle database
configuration assistant (DBCA) automatically expires and locks most of the default database user accounts.
If you install an Oracle database manually, this doesn’t happen and default privileged accounts won’t be
expired or locked. Their password stays the same as their username, by default. An attacker will try to use
these credentials first to connect to the database.
It is critical to ensure that every privileged account on a database server is configured with a strong, unique
password. If accounts are not needed, they should be expired and locked.
For the remaining accounts, access has to be limited to the absolute minimum required. Each account
should only have access to the tables and operations (for example, SELECT or INSERT) required by the
user. Avoid creating user accounts with access to every table in the database.
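For example, here is a hedged sketch of least-privilege account setup (user, password, and table names are illustrative, and Oracle-style syntax is assumed; exact syntax varies by DBMS):
-- Create an application account with a strong, unique password
CREATE USER app_reader IDENTIFIED BY StrongUniquePass123;
-- Grant only the operations and tables the account actually needs
GRANT SELECT ON StudentDetails TO app_reader;
-- Expire and lock a default account that is not needed
ALTER USER old_default ACCOUNT LOCK;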
Ensure that patches remain current. Effective database patch management is a crucial security practice
because attackers are actively seeking out new security flaws in databases, and new viruses and malware
appear on a daily basis.
A timely deployment of up-to-date versions of database service packs, critical security hotfixes, and
cumulative updates will improve the stability of database performance.
Organizations store their applications in databases. In most real-world scenarios, the end-user doesn’t
require direct access to the database. Thus, you should block all public network access to database
servers unless you are a hosting provider. Ideally, an organization should set up gateway servers (VPN or
SSH tunnels) for remote administrators.
Irrespective of how solid your defenses are, there is always a possibility that a hacker may infiltrate your
system. Yet, attackers are not the only threat to the security of your database. Your employees may also
pose a risk to your business. There is always the possibility that a malicious or careless insider will gain
access to a file they don’t have permission to access.
Encrypting your data makes it unreadable to both attackers and employees. Without an encryption key, they cannot access it, which provides a last line of defense against unwelcome intrusions. Encrypt all important application files, data files, and backups so that unauthorized users cannot read your critical data.
Here are several best practices you can use to improve the security of sensitive databases.
Actively Manage Passwords and User Access
If you have a large organization, you must think about automating access management via password
management or access management software. This will provide permitted users with a short-term
password with the rights they need every time they need to gain access to a database.
It also keeps track of the activities completed during that time frame and stops administrators from sharing passwords. While administrators may feel that sharing passwords is convenient, doing so makes effective database accountability and security almost impossible.
Once you have put in place your database security infrastructure, you must test it against a real threat.
Auditing or performing penetration tests against your own database will help you get into the mindset of a
cybercriminal and isolate any vulnerabilities you may have overlooked.
To make sure the test is comprehensive, involve ethical hackers or recognized penetration testing services
in your security testing. Penetration testers provide extensive reports listing database vulnerabilities, and it
is important to quickly investigate and remediate these vulnerabilities. Run a penetration test on a critical
database system at least once per year.
Continually scanning your database for breach attempts increases your security and lets you rapidly react
to possible attacks.
In particular, File Integrity Monitoring (FIM) can help you log all actions carried out on the database’s server
and to alert you of potential breaches. When FIM detects a change to important database files, ensure
security teams are alerted and able to investigate and respond to the threat.
You should use a firewall to protect your database server from database security threats. By default, the firewall should not permit any traffic through. It should also stop your database from initiating outbound connections unless there is a specific reason for doing so.
As well as safeguarding the database with a firewall, you must deploy a web application firewall (WAF).
This is because attacks aimed at web applications, including SQL injection, can be used to gain illicit
access to your databases.
A traditional firewall will not stop most web application attacks, because it operates at the network layer, while web applications operate at the application layer (layer 7 of the OSI model). A
WAF operates at layer 7 and is able to detect malicious web application traffic, such as SQL injection
attacks, and block it before it can harm your database.
55. How will you create and manage views?
Views in SQL are a kind of virtual table. A view has rows and columns just as they are in a real table in the database. We can create a view by selecting fields from one or more tables present in the database. A view can either have all the rows of a table or specific rows based on certain conditions. Below we will learn about creating, deleting, and updating views.
Sample tables: StudentDetails and StudentMarks.
CREATING VIEWS
We can create View using CREATE VIEW statement. A View can be created from a single table or
multiple tables. Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
In this example, we will create a view named StudentNames from the table StudentDetails. Query:
CREATE VIEW StudentNames AS
SELECT S_ID, NAME
FROM StudentDetails
ORDER BY NAME;
If we now query the view as,
SELECT * FROM StudentNames;
Creating View from multiple tables: In this example we will create a View named MarksView from
two tables StudentDetails and StudentMarks. To create a View from multiple tables we can simply
include multiple tables in the SELECT statement. Query:
CREATE VIEW MarksView AS
SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
To display data of View MarksView:
SELECT * FROM MarksView;
DELETING VIEWS
We have learned about creating a View, but what if a created View is not needed any more? Obviously
we will want to delete it. SQL allows us to delete an existing View. We can delete or drop a View using
the DROP statement. Syntax:
DROP VIEW view_name;
Inserting a row in a view: We can insert a row in a view in the same way as we do in a table. We can use the INSERT INTO statement of SQL to insert a row in a view.
Syntax:
INSERT INTO view_name(column1, column2 , column3,..)
VALUES(value1, value2, value3..);
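For example (a sketch consistent with the StudentNames view created above; the row values are illustrative and must satisfy the underlying table’s constraints):
INSERT INTO StudentNames(S_ID, NAME)
VALUES(6, 'Anil');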
Deleting a row from a view: Deleting rows from a view is as simple as deleting rows from a table. We can use the DELETE statement of SQL to delete rows from a view. Deleting a row from a view first deletes the row from the actual table, and the change is then reflected in the view.
Syntax:
DELETE FROM view_name
WHERE condition;
Database Management System (DBMS) is a collection of interrelated data and a set of software tools/programs that access, process, and manipulate data. It allows access, retrieval, and use of that data while applying appropriate security measures. The DBMS is really useful for better data integration and security.
Relational Algebra is a procedural query language that takes relations as input and returns relations as output. There are some basic operators which can be applied on relations to produce the required results, which we will discuss one by one. We will use the STUDENT_SPORTS, EMPLOYEE, and STUDENT relations as given in Table 1, Table 2, and Table 3 respectively to understand the various operators.
Table 1: STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 2: EMPLOYEE
Table 3: STUDENT
Selection operator (σ): The selection operator is used to select tuples from a relation based on some condition. Syntax:
σ(Cond)(Relation Name)
Extract students whose age is greater than 18 from the STUDENT relation given in Table 3:
σ(AGE>18)(STUDENT)
[Note: The selection operator alone does not display the result; the projection operator is applied on top of the selection to generate or project the result. So, the expression to display the result is: π(σ(AGE>18)(STUDENT))]
RESULT:
Projection Operator (π): The projection operator is used to project particular columns from a relation. Syntax:
π(Column 1, Column 2, …, Column n)(Relation Name)
Extract ROLL_NO and NAME from the STUDENT relation given in Table 3:
π(ROLL_NO,NAME)(STUDENT)
RESULT:
ROLL_NO NAME
1 RAM
2 RAMESH
3 SUJIT
4 SURESH
Note: If the resultant relation after projection has duplicate rows, they will be removed. For example, π(ADDRESS)(STUDENT) will remove one duplicate row with the value DELHI and return three rows.
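For reference, here is a hedged SQL analogue of these two operators on the STUDENT relation (σ corresponds to the WHERE clause, π to the SELECT column list):
-- σ(AGE>18)(STUDENT)
SELECT * FROM STUDENT WHERE AGE > 18;
-- π(ROLL_NO, NAME)(STUDENT); DISTINCT mirrors projection's duplicate removal
SELECT DISTINCT ROLL_NO, NAME FROM STUDENT;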
Cross Product (X): Cross product is used to join two relations. For every row of Relation1, each row of Relation2 is concatenated. If Relation1 has m tuples and Relation2 has n tuples, the cross product of Relation1 and Relation2 will have m × n tuples. Syntax:
Relation1 X Relation2
To apply the cross product on the STUDENT relation given in Table 3 and the STUDENT_SPORTS relation given in Table 1:
STUDENT X STUDENT_SPORTS
RESULT (partial; the full cross product has m × n rows):
ROLL_NO NAME ADDRESS PHONE AGE ROLL_NO SPORTS
1 RAM DELHI 9455123451 18 1 Badminton
1 RAM DELHI 9455123451 18 2 Cricket
1 RAM DELHI 9455123451 18 2 Badminton
1 RAM DELHI 9455123451 18 4 Badminton
…
4 SURESH DELHI 9156768971 18 2 Cricket
Union (U): Union on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relations should have the same number of attributes and corresponding
attributes in two relations have the same domain). Union operator when applied on two relations R1
and R2 will give a relation with tuples that are either in R1 or in R2. The tuples which are in both R1
and R2 will appear only once in the result relation. Syntax:
Relation1 U Relation2
To find the persons who are either students or employees, we can use the Union operator:
STUDENT U EMPLOYEE
RESULT:
Minus (-): Minus on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible. Minus operator when applied on two relations as R1-R2 will give a relation with tuples
that are in R1 but not in R2. Syntax:
Relation1 - Relation2
To find the persons who are students but not employees, we can use the Minus operator:
STUDENT - EMPLOYEE
RESULT:
ROLL_NO NAME ADDRESS PHONE AGE
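In SQL, these set operators correspond to UNION and EXCEPT (MINUS in Oracle); here is a hedged sketch, assuming both relations expose the same attributes:
SELECT ROLL_NO, NAME, ADDRESS, PHONE, AGE FROM STUDENT
UNION
SELECT ROLL_NO, NAME, ADDRESS, PHONE, AGE FROM EMPLOYEE;

SELECT ROLL_NO, NAME, ADDRESS, PHONE, AGE FROM STUDENT
EXCEPT
SELECT ROLL_NO, NAME, ADDRESS, PHONE, AGE FROM EMPLOYEE;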
Indexing improves database performance by minimizing the number of disk visits required to fulfill a query. It is a data structure technique used to locate and quickly access data in databases. Several database fields are used to generate indexes. An index has two columns. The first column is the Search key, which holds a copy of the primary key or candidate key of the table; to speed up data retrieval, these values are kept in sorted order (note that sorting the underlying data itself is not required). The second column is the Data Reference or Pointer, which contains a set of pointers holding the address of the disk block where that particular key value can be found.
Attributes of Indexing
Access Types: This refers to the type of access such as value-based search, range access, etc.
Access Time: It refers to the time needed to find a particular data element or set of elements.
Insertion Time: It refers to the time taken to find the appropriate space and insert new data.
Deletion Time: Time taken to find an item and delete it as well as update the index structure.
Space Overhead: It refers to the additional space required by the index.
In general, there are two types of file organization mechanisms that are followed by the indexing methods
to store the data:
Sequential File Organization or Ordered Index File
In this, the indices are based on a sorted ordering of the values. These are generally fast and a more
traditional type of storing mechanism. These Ordered or Sequential file organizations might store the data
in a dense or sparse format.
Dense Index
For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with
that search key value.
Sparse Index
The index record appears only for a few items in the data file. Each index entry points to a block.
To locate a record, we find the index record with the largest search key value less than
or equal to the search key value we are looking for.
We start at that record pointed to by the index record, and proceed along with the
pointers in the file (that is, sequentially) until we find the desired record.
Number of accesses required = log₂(n) + 1, where n = number of blocks occupied by the index file.
Hash File Organization
Indices are based on the values being distributed uniformly across a range of buckets. The buckets to
which a value is assigned are determined by a function called a hash function. The primary methods of indexing are:
Clustered Indexing: When two or more records are stored in the same file, this type of storing is known as clustered indexing. By using clustered indexing we can reduce the cost of searching, since multiple records related to the same thing are stored in one place; it also supports frequent joining of two or more tables (records).
The clustering index is defined on an ordered data file. The data file is ordered on a non-key field. In
some cases, the index is created on non-primary key columns which may not be unique for each
record. In such cases, in order to identify the records faster, we will group two or more columns
together to get the unique values and create an index out of them. This method is known as the
clustering index. Essentially, records with similar properties are grouped together, and indexes for
these groupings are formed.
Students studying each semester, for example, are grouped together. First-semester students,
second-semester students, third-semester students, and so on are categorized.
Primary Indexing: This is a type of Clustered Indexing wherein the data is sorted according to the
search key and the primary key of the database table is used to create the index. It is a default
format of indexing where it induces sequential file organization. As primary keys are unique and are
stored in a sorted manner, the performance of the searching operation is quite efficient.
Non-clustered or Secondary Indexing: A non-clustered index just tells us where the data lies, i.e. it
gives us a list of virtual pointers or references to the location where the data is actually stored. Data
is not physically stored in the order of the index. Instead, data is present in leaf nodes. For example, the
contents page of a book. Each entry gives us the page number or location of the information stored.
The actual data here (information on each page of the book) is not organized but we have an ordered
reference(contents page) to where the data points actually lie. We can have only dense ordering in
the non-clustered index as sparse ordering is not possible because data is not physically organized
accordingly.
It requires more time as compared to the clustered index because some amount of extra work is
done in order to extract the data by further following the pointer. In the case of a clustered index,
data is directly present in front of the index.
Multilevel Indexing: With the growth of the size of the database, indices also grow. As the index is stored in main memory, a single-level index might become too large to store, requiring multiple disk accesses. Multilevel indexing segregates the main block into various smaller blocks so that each can be stored in a single block. The outer blocks are divided into inner blocks which in turn point to the data blocks. This can be easily stored in main memory with fewer overheads.
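As a hedged illustration of creating the two common index types (T-SQL-style syntax; the table and index names are assumptions, and defaults vary by DBMS):
-- Clustered index: determines the physical order of rows (only one per table)
CREATE CLUSTERED INDEX idx_student_id ON StudentDetails (S_ID);
-- Non-clustered (secondary) index: a separate structure holding pointers to rows
CREATE NONCLUSTERED INDEX idx_student_name ON StudentDetails (NAME);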
Advantages of Indexing
Improved Query Performance: Indexing enables faster data retrieval from the database. The
database may rapidly discover rows that match a specific value or collection of values by generating
an index on a column, minimizing the amount of time it takes to perform a query.
Efficient Data Access: Indexing can enhance data access efficiency by lowering the amount of disk
I/O required to retrieve data. The database can maintain the data pages for frequently visited
columns in memory by generating an index on those columns, decreasing the requirement to read
from disk.
Optimized Data Sorting: Indexing can also improve the performance of sorting operations. By
creating an index on the columns used for sorting, the database can avoid sorting the entire table
and instead sort only the relevant rows.
Consistent Data Performance: Indexing can help ensure that the database performs consistently even as the amount of data in the database rises. Without indexing, queries may take longer to run as the number of rows in the table grows, while indexing maintains a roughly consistent speed.
Enforced Data Integrity: By ensuring that only unique values are inserted into columns that have been indexed as unique, indexing can also be utilized to ensure the integrity of data. This avoids storing duplicate data in the database, which might lead to issues when performing queries or reports.
Overall, indexing in databases provides significant benefits for improving query performance, efficient data access, optimized data sorting, consistent data performance, and enforced data integrity.
Disadvantages of Indexing
Increased storage space: Indexing necessitates more storage space to hold the index data structure, which might increase the total size of the database.
Increased database maintenance overhead: Indexes must be maintained as data is added,
destroyed, or modified in the table, which might raise database maintenance overhead.
Slower writes: Indexing can reduce insert and update performance, since the index data structure must be updated each time data is modified.
Choosing an index can be difficult: It can be challenging to choose the right indexes for a specific
query or application and may call for a detailed examination of the data and access patterns.
Features of Indexing
The development of data structures, such as B-trees or hash tables, that provide quick access to
certain data items is known as indexing. The data structures themselves are built on the values of
the indexed columns, which are utilized to quickly find the data objects.
The columns to index are selected based on how frequently they are used and the sorts of queries they are subjected to. The cardinality, selectivity, and uniqueness of the indexing columns can also be taken into account.
There are several different index types used by databases, including primary, secondary, clustered,
and non-clustered indexes. Based on the particular needs of the database system, each form of
index offers benefits and drawbacks.
For the database system to function at its best, periodic index maintenance is required. According to
changes in the data and usage patterns, maintenance work involves building, updating, and
removing indexes.
Database query optimization involves indexing, which is essential. The query optimizer utilizes the
indexes to choose the best execution strategy for a particular query based on the cost of accessing
the data and the selectivity of the indexing columns.
Databases make use of a range of indexing strategies, including covering indexes, index-only scans,
and partial indexes. These techniques maximize the utilization of indexes for particular types of
queries and data access.
When non-contiguous data blocks are stored in an index, it can result in index fragmentation, which
makes the index less effective. Regular index maintenance, such as defragmentation and
reorganization, can decrease fragmentation.
Conclusion
Indexing is a very useful technique that helps in optimizing the search time of database queries. An index table consists of a search key and a pointer. The main indexing methods are primary, secondary (non-clustered), clustered, and multilevel indexing. Primary indexing is divided into two types: dense and sparse. Dense indexing is used when the index table contains an entry for every search key; sparse indexing is used when the index table does not contain an entry for every record. Multilevel indexing uses a B+ tree. The main purpose of indexing is to provide better performance for data retrieval.
Transaction states are the states through which a transaction goes during its lifetime. These states tell us about the current state of the transaction and how it will be processed further. These states govern the rules which decide the fate of the transaction: whether it will commit or abort.
Transactions also use a transaction log, a file maintained by the recovery management component to record all the activities of the transaction. After the commit is done, the transaction log file is removed.
1. Active State –
When the instructions of the transaction are running then the transaction is in active state. If all the
‘read and write’ operations are performed without any error then it goes to the “partially committed
state”; if any instruction fails, it goes to the “failed state”.
2. Partially Committed –
After completion of all the read and write operations, the changes are made in main memory or the local buffer. If the changes are made permanent on the database, the state will change to the “committed state”; in case of failure it will go to the “failed state”.
3. Failed State –
When any instruction of the transaction fails, or a failure occurs while making the changes permanent on the database, the transaction goes to the “failed state”.
4. Aborted State –
After any type of failure, the transaction goes from the “failed state” to the “aborted state”. Since in the previous states its changes were made only to the local buffer or main memory, these changes are deleted or rolled back.
5. Committed State –
This is the state when the changes are made permanent on the database; the transaction is complete and subsequently terminates in the “terminated state”.
6. Terminated State –
If there isn’t any rollback, or the transaction comes from the “committed state”, then the system is consistent and ready for a new transaction, and the old transaction is terminated.
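Here is a hedged sketch of this lifecycle in SQL (the accounts table and values are illustrative; transaction syntax varies slightly by DBMS):
BEGIN TRANSACTION;                                            -- active state
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 1;
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 2;
-- all reads/writes succeeded: partially committed
COMMIT;                                                       -- committed, then terminated
-- on any failure, ROLLBACK would instead move the transaction to the aborted state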
A Data Model in a Database Management System (DBMS) is a set of concepts and tools developed to summarize the description of the database. Data models provide us with a transparent picture of the data, which helps us in creating an actual database. A data model takes us from the design of the data to its proper implementation.
Types of Data Models
A data model is basically classified into 3 types:
1. Conceptual Data Model
2. Representational Data Model
3. Physical Data Model
1. Conceptual Data Model
The conceptual data model describes the database at a very high level and is useful for understanding the needs or requirements of the database. It is the model used in the requirement-gathering process, i.e., before the database designers start making a particular database. One such popular model is the entity/relationship model (ER model). The E/R model specializes in entities, relationships, and attributes used by database designers. Using this model, requirements can be discussed and understood even with non-computer-science (non-technical) users and stakeholders.
Entity-Relationship Model (ER Model): It is a high-level data model which is used to define the data and the relationships between them. It is basically a conceptual design of any database that provides an easy-to-design view of data.
Components of ER Model:
1. Entity: An entity is referred to as a real-world object. It can be a name, place, object, class, etc. These
are represented by a rectangle in an ER Diagram.
2. Attributes: An attribute can be defined as a description of the entity. Attributes are represented by an ellipse in an ER Diagram. Examples are Age, Roll Number, or Marks for a Student.
3. Relationship: Relationships are used to define relations among different entities. A relationship is represented by a diamond (rhombus) in an ER Diagram.
2. Representational Data Model
This type of data model represents only the logical part of the database and not its physical structure. The representational data model allows us to focus primarily on the design part of the database. A popular representational model is the Relational model. The Relational model consists of Relational Algebra and Relational Calculus. In the Relational model, we basically use tables to represent our data and the relationships between them. It is a theoretical concept whose practical implementation is done in the Physical Data Model.
The advantage of using a representational data model is that it provides the foundation for the physical model.
3. Physical Data Model
The Physical Data Model is used to practically implement the Relational Data Model. Ultimately, all data in a
database is stored physically on a secondary storage device such as discs and tapes. This is stored in the
form of files, records, and certain other data structures. It has all the information on the format in which the
files are present and the structure of the databases, the presence of external data structures, and their
relation to each other. Here, we basically save tables in memory so they can be accessed efficiently. In
order to come up with a good physical model, we have to work on the relational model in a better
way. Structured Query Language (SQL) is used to practically implement Relational Algebra.
This data model describes HOW the system will be implemented using a specific DBMS. It is typically created by DBAs and developers. Its purpose is the actual implementation of the database.
The physical data model describes the data needed for a single project or application, though it may be integrated with other physical data models based on project scope.
The data model contains relationships between tables, addressing the cardinality and nullability of the relationships.
Developed for a specific version of a DBMS, location, data storage or technology to be used in the
project.
Columns should have exact datatypes, lengths assigned and default values.
Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined.
Some Other Data Models
1. Hierarchical Model
The hierarchical Model is one of the oldest models in the data model which was developed by IBM, in the
1950s. In a hierarchical model, data are viewed as a collection of tables, or we can say segments that form
a hierarchical relation. In this, the data is organized into a tree-like structure where each record consists
of one parent record and many children. Even if the segments are connected in a chain-like structure by logical associations, the instant structure can be a fan structure with multiple branches. These logical associations are called directional associations.
2. Network Model
The Network Model was formalized by the Database Task group in the 1960s. This model is the
generalization of the hierarchical model. This model can consist of multiple parent segments and these
segments are grouped as levels but there exists a logical association between the segments belonging to
any level. Mostly, there exists a many-to-many logical association between any of the two segments.
3. Object-Oriented Data Model
In the Object-Oriented Data Model, data and their relationships are contained in a single structure which
is referred to as an object in this data model. In this, real-world problems are represented as objects with
different attributes. All objects have multiple relationships between them. Basically, it is a combination of
Object Oriented programming and a Relational Database Model.
4. Float Data Model
The float data model basically consists of a two-dimensional array of data models that do not contain any duplicate elements in the array. This data model has one drawback: it cannot store a large amount of data, i.e., the tables cannot be of large size.
5. Context Data Model
The Context data model is simply a data model which consists of more than one data model. For example,
the Context data model consists of ER Model, Object-Oriented Data Model, etc. This model allows users
to do more than one thing which each individual data model can do.
6. Semi-Structured Data Model
Semi-Structured data models deal with the data in a flexible way. Some entities may have extra attributes
and some entities may have some missing attributes. Basically, you can represent data here in a flexible
way.
Advantages of Data Models
1. Data Models help us in representing data accurately.
2. It helps us in finding the missing data and also in minimizing Data Redundancy.
3. Data Model provides data security in a better way.
4. The data model should be detailed enough to be used for building the physical database.
5. The information in the data model can be used for defining the relationship between tables, primary
and foreign keys, and stored procedures.
Disadvantages of Data Models
1. In the case of a vast database, sometimes it becomes difficult to understand the data model.
2. You must have the proper knowledge of SQL to use physical models.
3. Even a small change made in the structure requires modification in the entire application.
4. There is no set data manipulation language in DBMS.
5. To develop a data model, one should know the characteristics of how the data is physically stored.
Conclusion
Data modeling is the process of developing a data model for the data to be stored in a database.
Data Models ensure consistency in naming conventions, default values, semantics, security while
ensuring quality of the data.
Data Model structure helps to define the relational tables, primary and foreign keys and stored
procedures.
There are three types of data models: conceptual, logical, and physical.
The main aim of the conceptual model is to establish the entities, their attributes, and their relationships.
Logical data model defines the structure of the data elements and set the relationships between
them.
A Physical Data Model describes the database specific implementation of the data model.
The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
The biggest drawback is that even a small change made in the structure requires modification in the entire application.
What is DML?
DML commands allow you to manage the data stored in the database. DML commands are not auto-committed, so changes are not permanent until committed, and it is possible to roll back an operation. The full form of DML is Data Manipulation Language.
DDL vs DML:
Data Definition Language (DDL) helps you define the database structure or schema, while Data Manipulation Language (DML) allows you to manage the data stored in the database.
DDL commands are used to create the database schema, while DML commands are used to populate and manipulate the database.
DDL defines the columns of the table, while DML adds or updates the rows of the table.
DDL statements affect the whole table, while DML affects one or more rows.
Important DDL commands are CREATE, ALTER, DROP, TRUNCATE, etc., while important DML commands are INSERT, UPDATE, DELETE, MERGE, etc.
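Here is a hedged sketch of some of these DDL commands side by side (the students table matches the DML examples below; note that TRUNCATE generally cannot be rolled back):
ALTER TABLE students ADD Email VARCHAR(100);   -- change the schema
TRUNCATE TABLE students;                       -- remove all rows, keep the table
DROP TABLE students;                           -- remove the table entirely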
Why DML?
Here are the benefits/pros of DML:
The DML statements allow you to modify the data stored in a database.
Users can specify what data is needed.
DML offers many different flavors and capabilities between database vendors.
It offers an efficient human interaction with the system.
CREATE
The CREATE statement is used to define the database structure or schema. Syntax:
CREATE TABLE tableName (column_1 datatype, column_2 datatype, ...);
DROP
The DROP statement is used to remove an existing table. Syntax:
DROP TABLE tableName;
The main DML commands are:
INSERT
UPDATE
DELETE
INSERT
INSERT is a SQL statement used to insert data into the rows of a table.
Syntax:
INSERT INTO students (RollNo, FirstName, LastName) VALUES ('60', 'Tom', 'Erichsen');
UPDATE
This command is used to update or modify the value of a column in the table.
Syntax:
UPDATE students
SET FirstName = 'John', LastName = 'Wick'
WHERE StudID = 3;
DELETE
This command is used to remove one or more rows from a table.
Syntax:
DELETE FROM tableName
WHERE condition;
CREATE
Syntax:
CREATE TABLE tableName (
column_1 datatype [NULL | NOT NULL],
column_2 datatype [NULL | NOT NULL],
...
);
The parameter tableName denotes the name of the table that you are going to create.
The parameters column_1, column_2… denote the columns to be added to the table.
A column should be specified as either NULL or NOT NULL. If you don’t specify, SQL Server will take NULL as the default.
Example:
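(A hedged sketch, assuming a students table consistent with the INSERT example above:)
CREATE TABLE students (
RollNo INT NOT NULL,
FirstName VARCHAR(50) NULL,
LastName VARCHAR(50) NULL
);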
INSERT
In PL/SQL, we can insert the data into any table using the SQL command INSERT INTO. This command
will take the table name, table column, and column values as the input and insert the value in the base
table.
The INSERT command can also take the values directly from another table using ‘SELECT’ statement
rather than giving the values for each column. Through ‘SELECT’ statement, we can insert as many rows
as the base table contains.
Syntax:
BEGIN
INSERT INTO <table_name>(<column1>,<column2>,...,<column_n>)
VALUES(<value1>,<value2>,...,<value_n>);
END;
The above syntax shows the INSERT INTO command. The table name and values are mandatory fields,
whereas column names are not mandatory if the insert statements have values for all the columns of the
table.
The keyword ‘VALUES’ is mandatory if the values are given separately, as shown above.
Syntax:
BEGIN
INSERT INTO <table_name>(<column1>,<column2>,...,<column_n>)
SELECT <column1>,<column2>,...,<column_n> FROM <table_name2>;
END;
The above syntax shows the INSERT INTO command that takes the values directly from the
<table_name2> using the SELECT command.
The keyword ‘VALUES’ should not be present in this case, as the values are not given separately.
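For example, here is a hedged sketch that copies rows from one illustrative table into another (both table and column names are assumptions):
BEGIN
INSERT INTO emp_backup(emp_id, emp_name)
SELECT emp_id, emp_name FROM emp;
END;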
DELETE
Below is the syntax to delete rows from a table. Syntax:
DELETE FROM <table_name>
WHERE <condition>;
SELECT
The SELECT statement is used to retrieve data from a table. Syntax:
SELECT expression
FROM tableName
[WHERE condition];
Example:
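(A sketch using the students table assumed above:)
SELECT FirstName, LastName
FROM students
WHERE RollNo = 60;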
Clustered Index
A clustered index defines the order in which data is stored in the table, which can be sorted in only one way. So, there can be only a single clustered index for every table. In an RDBMS, usually, the primary key allows you to create a clustered index based on that specific column.
For example, a book can have more than one index, one at the beginning which displays the contents of a
book unit wise while the second index shows the index of terms in alphabetical order.
Non-Clustered Index
A non-clustering index is defined on the non-ordering fields of the table. This type of indexing method helps
you to improve the performance of queries that use keys which are not assigned as a primary key. A non-
clustered index allows you to add a unique key for a table.
Characteristics of Clustered Index vs Non-Clustered Index:
Use for: Clustered indexes sort the records and store them physically in memory as per the order. Non-clustered indexes create a logical order for data rows and use pointers to the physical data files.
Storing method: A clustered index stores data pages in the leaf nodes of the index. A non-clustered index never stores data pages in the leaf nodes of the index.
Size: The size of the clustered index is quite large. The size of the non-clustered index is small compared to the clustered index.
Additional disk space: Not required for a clustered index; required for a non-clustered index, to store the index separately.
Type of key: By default, the primary key of the table is a clustered index. A non-clustered index can be used with a unique constraint on the table, which acts as a composite key.
Main feature: A clustered index can improve the performance of data retrieval. A non-clustered index should be created on columns which are used in joins.
Advantages of Clustered Index
Clustered indexes are an ideal option for range or group by with max, min, count type queries.
In this type of index, a search can go straight to a specific point in data so that you can keep
reading sequentially from there.
The clustered index method uses a location mechanism to locate the index entry at the start of a range.
It is an effective method for range searches when a range of search key values is requested.
Helps you to minimize page transfers and maximize the cache hits.
Advantages of Non-Clustered Index
A non-clustered index helps you to retrieve data quickly from the database table.
Helps you to avoid the overhead cost associated with the clustered index
A table may have multiple non-clustered indexes in RDBMS. So, it can be used to create more
than one index.
Disadvantages of Non-Clustered Index
A non-clustered index stores data in a logical order but does not allow sorting the data rows physically.
The lookup process on a non-clustered index can become costly.
Every time the clustering key is updated, a corresponding update is required on the non-clustered index, as it stores the clustering key.