DBMS Assignment
Attributes
In a DBMS, any real-world object with an independent existence is modelled as an entity, and the
characteristics and properties of an entity are known as its attributes. Attributes give us additional
information about entities and help us study their relationships within the specified system.
In an ER (Entity Relationship) diagram, attributes are drawn as ellipses. There
are different types of attributes in DBMS: Simple, Composite, Single Valued, Multi-Valued, Stored,
Derived, Key, and Complex attributes. An entity may have any number of attributes, one (or a
combination) of which serves as the primary key. For each entity instance, an attribute takes its
value from a set of possible values called its domain.
Types of Attributes
Simple Attributes
Simple attributes in an ER model diagram are independent attributes that cannot be subdivided into
smaller components. These attributes are also known as atomic attributes.
Composite Attributes
Composite attributes are the opposite of simple attributes: they can be subdivided into components
or sub-parts that are themselves simple attributes. In simple terms, a composite attribute is composed
of several simple attributes, for example a name made up of a first name and a last name.
Single-Valued Attributes
Single-valued attributes are those attributes that hold a single value for each entity instance and
can't store more than one value at a time, for example the date of birth of a person.
Derived Attributes
Derived attributes in DBMS are those attributes whose values can be derived from the values of other
attributes. They are always dependent upon other attributes for their value.
Key Attributes
Key attributes are special types of attributes that act as the primary key for an entity and they can
uniquely identify an entity from an entity set. The values that key attributes store must be unique and
non-repeating.
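The attribute types described above can be sketched with Python dataclasses. This is only an illustration: the Student entity, its field names and the sample values are hypothetical and do not come from the assignment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Name:
    """Composite attribute: made up of simpler parts."""
    first: str
    last: str


@dataclass
class Student:
    roll_no: int          # key attribute: unique for every student
    name: Name            # composite attribute
    date_of_birth: date   # simple, single-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, not stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


# Hypothetical sample entity instance.
s = Student(1, Name("Asha", "Rao"), date(2000, 6, 15))
print(s.age(date(2024, 6, 14)))  # 23
```

The age is recomputed on every call, which mirrors the idea that a derived attribute always depends on another attribute for its value.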
As part of the Enhanced ER Model, three new concepts were added to the existing ER Model:
1. Generalization
2. Specialization
3. Aggregation
Generalization
Generalization is a bottom-up approach in which two or more lower-level entities combine to form a
higher-level entity. The higher-level entity can in turn combine with other lower-level
entities to form a still higher-level entity. It works like a superclass and subclass system built
from the bottom up: sub-classes are combined to form a more generalized super-class.
Generalization is possible because the lower-level entities have attributes in common, and these
shared attributes are moved to the higher-level entity. This makes the structure of the database
simpler and makes the relevant attributes easier to identify.
Specialization
Specialization is the opposite of Generalization. It is a top-down approach in which one higher-level
entity is broken down into two or more lower-level entities. A higher-level entity does not have to be
specialized: it may have no lower-level entity sets at all.
Aggregation
Aggregation is the process in which a relationship between two entities, together with those
entities, is treated as a single higher-level entity. It is used when the relationship itself needs
to take part in another relationship, which the basic ER model cannot express directly. Treating
the relationship and its entities as one unit lets that unit be related to other entities in the
system.
1 b)
1.
(a) What do you mean by Relational Algebra? Explain the
following relational algebra operations with suitable
examples:
Relational Algebra
Relational algebra is a theoretical way of manipulating data in a relational database. The subject
received little attention until Edgar F. Codd published the relational model, in which he proposed
that the algebra be used as a basis for database query languages.
Relational algebra allows you to perform query operations such as selection, projection, union,
difference, Cartesian product and join. These operations are based on mathematical sets and are
applied to relations, which represent the tables and data in a relational database. SQL, as a
database query language, draws heavily on relational algebra.
Basically, relational algebra is a set of operations on relations. Each operation takes one or more
relations as input and gives its result as a single relation.
Relational Algebra provides a Theoretical Foundation for relational database. Relational algebra
allows us to understand database operations, and to get the required result from the given query.
Relational Algebra Operations
➢ Restrict (σ)
➢ Projection (π)
➢ Join (⋈)
➢ Division (÷)
➢ Union (∪)
➢ Intersection (∩)
➢ Minus (-)
➢ Cartesian Product (×)
Relational algebra is composed of Set Theory and some specific operations, such as,
Basic Operators
➢ Selection (σ) : filters the tuples of a relation based on a specific condition.
➢ Projection (π) : selects certain columns of a relation, discarding the others.
Set Theory
➢ Union (∪) : combines two relations, returning all distinct tuples.
➢ Intersection (∩) : returns the tuples common to two relations.
➢ Difference (-) : Returns tuples that are in one relation but not in the other.
Combination Operator
➢ Cartesian product (×) : Also known as a “cross join”, the Cartesian product combines all
tuples of two relations. The result is a new relation that contains all possible combinations of
tuples from the two input relations. For example, if the first relation has m tuples and the
second relation has n tuples, the Cartesian product will have m x n tuples.
➢ Join (⋈) : Join combines tuples from two or more relations based on a join condition. The
join condition compares attribute values across the relations involved.
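As a rough sketch of the set-theory operators listed above, Python sets of tuples behave like union-compatible relations. The two small relations below are made-up sample data, not part of the assignment.

```python
# Two union-compatible relations (same attributes: id, name), modelled as sets of tuples.
r = {(1, "Rohit"), (2, "Rahul")}
s = {(2, "Rahul"), (3, "Amit")}

union = r | s          # all distinct tuples from both relations
intersection = r & s   # tuples present in both relations
difference = r - s     # tuples in r that are not in s

print(sorted(union))         # [(1, 'Rohit'), (2, 'Rahul'), (3, 'Amit')]
print(sorted(intersection))  # [(2, 'Rahul')]
print(sorted(difference))    # [(1, 'Rohit')]
```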
Selection Operation (σ) : This operator selects the rows of a table that satisfy a given
condition. For example,
σ subject = "health" ∧ price = 450 (Books)
The command above searches the Books table and returns the tuples where the subject attribute is
health and the price attribute is 450. The ∧ sign stands for the logical "and" operation.
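The selection described above can be sketched in Python as a filter over a list of rows. The rows of the Books table are hypothetical sample data; only the attribute names subject and price come from the text.

```python
# Hypothetical Books relation: each tuple is a dict of attribute -> value.
books = [
    {"title": "Healthy Living", "subject": "health", "price": 450},
    {"title": "Eat Well", "subject": "health", "price": 300},
    {"title": "Databases", "subject": "computing", "price": 450},
]


def select(relation, predicate):
    """Selection (sigma): keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


result = select(books, lambda b: b["subject"] == "health" and b["price"] == 450)
print([b["title"] for b in result])  # ['Healthy Living']
```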
Projection Operation (π) : This operator returns one or more attributes (columns) of your choice.
Because the result is a relation, duplicates are removed: when several tuples have the same values
for the chosen attributes, only one of them appears in the output.
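A minimal sketch of projection with duplicate removal follows. The Books rows are hypothetical sample data used only to show that repeated values appear once in the result.

```python
# Hypothetical Books relation: each tuple is a dict of attribute -> value.
books = [
    {"title": "Healthy Living", "subject": "health", "price": 450},
    {"title": "Eat Well", "subject": "health", "price": 300},
    {"title": "Databases", "subject": "computing", "price": 450},
]


def project(relation, attributes):
    """Projection (pi): keep the named attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # a relation is a set, so duplicates appear once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


print(project(books, ["subject"]))  # [{'subject': 'health'}, {'subject': 'computing'}]
```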
Division (÷) : This operator takes two relations, R ÷ S, where the attributes of S are a subset of
the attributes of R. It returns the values of the remaining attributes of R that are paired in R
with every tuple of S.
For example, dividing the gym members list by the names of the Zumba class members first creates a
table with only those names: Adriana Lima and Kelly Gale. It then looks in the gym members list for
membership terms that appear together with every one of those names. Finally, it drops the name
attribute, which is common to both, so the output contains only membership terms. A term such as
"3 months" is returned only if both Adriana Lima and Kelly Gale have a tuple with that term; if
only the tuple Adriana Lima, 3 months is present, Kelly Gale is not covered and the result is empty.
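Under the standard definition of division (return the values that are paired with every tuple of the divisor), the operation can be sketched as below. The function name divide and the extra Kelly Gale tuple are assumptions added for illustration; the other names come from the gym example in the text.

```python
def divide(r, s):
    """Division: r is a set of (b, a) pairs, s is a set of b values.

    Returns every a that is paired in r with *all* b values in s.
    """
    candidates = {a for (_, a) in r}
    return {a for a in candidates if all((b, a) in r for b in s)}


zumba_names = {"Adriana Lima", "Kelly Gale"}

# Gym members as (name, membership term) pairs.
gym = {("Adriana Lima", "3 months"), ("Miranda Kerr", "4 months")}
print(divide(gym, zumba_names))  # set(): Kelly Gale has no gym tuple

# Hypothetical extra tuple so that every Zumba member has a 3-month term.
gym_extended = gym | {("Kelly Gale", "3 months")}
print(divide(gym_extended, zumba_names))  # {'3 months'}
```

The first call shows why a single matching name is not enough: a term is kept only when it is paired with every name in the divisor.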
Cartesian Product (×) : This operator combines the information from two different tables into
one. Each tuple from the first table is paired with each tuple from the second table, so every
possible combination of tuples is listed.
Let's say that both the Books and Articles tables have an id and an author attribute. The Books
table contains ids 1, 2, 3 for Charles Dickens, Jack London and Emily Dickinson, while the Articles
table contains ids A, B for Jack London and Shereen Bhan.
Books × Articles
The result has 3 × 2 = 6 tuples, each pairing one Books row with one Articles row.
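The product of the two tables described above can be listed with a short Python sketch. The tuple layout (id, author) is an assumption about how the two tables are stored.

```python
from itertools import product

# (id, author) tuples taken from the example in the text.
books = [(1, "Charles Dickens"), (2, "Jack London"), (3, "Emily Dickinson")]
articles = [("A", "Jack London"), ("B", "Shereen Bhan")]

# Every Books tuple is paired with every Articles tuple: 3 x 2 = 6 rows.
cross = [b + a for b, a in product(books, articles)]

for row in cross:
    print(row)
print(len(cross))  # 6
```

Note that the product does not compare the author columns at all; pairing Jack London's book with Jack London's article is the job of a join.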
Inner join (⋈) : This operator compares the two tables, finds a common attribute, and combines the
data based on that common attribute.
Let's say that the Gym members table consists of name and membership term, while the Zumba class
members table consists of name and sex. Suppose the Gym members table contains (Adriana Lima, 3
months) and (Miranda Kerr, 4 months), while the Zumba class members table contains (Adriana Lima,
Female) and (Kelly Gale, Female). The join matches rows on the common attribute "name" and gives
a table with the single row (Adriana Lima, 3 months, Female).
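The join in this example can be sketched in Python as follows. The function name natural_join and the dictionary keys term and sex are illustrative choices, not notation from the assignment.

```python
gym_members = [
    {"name": "Adriana Lima", "term": "3 months"},
    {"name": "Miranda Kerr", "term": "4 months"},
]
zumba_members = [
    {"name": "Adriana Lima", "sex": "Female"},
    {"name": "Kelly Gale", "sex": "Female"},
]


def natural_join(r, s, attribute):
    """Combine tuples of r and s that agree on the shared attribute."""
    return [{**x, **y} for x in r for y in s if x[attribute] == y[attribute]]


print(natural_join(gym_members, zumba_members, "name"))
# [{'name': 'Adriana Lima', 'term': '3 months', 'sex': 'Female'}]
```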
Relational Calculus
Relational calculus is a non-procedural query language in database management systems: it states
what data is needed without specifying how to obtain it. It underlies commercial relational
languages such as SQL, QBE and QUEL, and it comes in two forms, Tuple Relational Calculus (TRC) and
Domain Relational Calculus (DRC).
Languages that only describe what to get from the database, without describing how to get the
results, are called declarative languages. Relational calculus is a declarative language that uses
predicate logic (first-order logic) to determine the results from the database.
Customer Table
Id Name Zipcode
1 Rohit 12345
2 Rahul 13245
3 Rohit 56789
4 Amit 12345
Example : Write a TRC query to get all the data of customers whose zip code is 12345.
TRC Query :
{t | t ∈ Customer ∧ t.Zipcode = 12345}
Workflow of query : The tuple variable "t" goes through every tuple of the Customer table. Each
row is checked to see whether its Zipcode is 12345, and only the rows that satisfy this predicate
are returned. The TRC expression above can be read as "Return all the tuples which
belong to the Customer table and whose Zipcode is equal to 12345."
1 Rohit 12345
4 Amit 12345
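The TRC query reads almost directly as a Python comprehension over tuples. In this sketch the field names id and name are assumptions, since the table in the text shows only the values.

```python
from collections import namedtuple

# Field names id and name are assumed; zipcode follows t.Zipcode in the query.
Customer = namedtuple("Customer", ["id", "name", "zipcode"])
customers = [
    Customer(1, "Rohit", 12345),
    Customer(2, "Rahul", 13245),
    Customer(3, "Rohit", 56789),
    Customer(4, "Amit", 12345),
]

# {t | t in Customer and t.Zipcode = 12345}
result = [t for t in customers if t.zipcode == 12345]
print([(t.id, t.name, t.zipcode) for t in result])  # [(1, 'Rohit', 12345), (4, 'Amit', 12345)]
```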
Let's take the example of Customer Database and try to understand DRC queries with some examples.
Customer Table
Id Name Zipcode
1 Rohit 12345
2 Rahul 13245
3 Rohit 56789
4 Amit 12345
Example : Write a DRC query to get the data of all customers with Zip code 12345.
DRC query :
{<x1, x2, x3> | <x1, x2, x3> ∈ Customer ∧ x3 = 12345}
Workflow of Query : In the above query, x1, x2, x3 (ordered) are domain variables that stand for
the columns we need in the result. The predicate requires that the values <x1, x2, x3> appear
together as a row of the Customer table and that the third domain variable, x3 (the zip code), is
equal to 12345.
1 Rohit 12345
4 Amit 12345
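In the same spirit, the DRC query can be sketched with one Python variable per column. The variable names x1, x2, x3 follow the query; storing rows as plain tuples is an assumption.

```python
customers = [
    (1, "Rohit", 12345),
    (2, "Rahul", 13245),
    (3, "Rohit", 56789),
    (4, "Amit", 12345),
]

# {<x1, x2, x3> | <x1, x2, x3> in Customer and x3 = 12345}
result = [(x1, x2, x3) for (x1, x2, x3) in customers if x3 == 12345]
print(result)  # [(1, 'Rohit', 12345), (4, 'Amit', 12345)]
```

The difference from the TRC sketch is only in what the variables range over: whole rows in TRC, individual column values in DRC.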
2. Explain DDL and DML with proper syntax and examples.
Characteristics of DDL
1. Schema Definition : DDL commands are used to define the schema of a database, which
includes the structure of tables, columns, data types, and constraints. This schema forms the
blueprint of how data is organized within the database.
2. Irreversible Operations : Many DDL operations are irreversible, meaning that once they are
executed, the changes cannot be easily undone without potentially losing data. This makes it
essential to execute DDL commands with caution.
3. Implicitly Committed : In many database systems, such as Oracle and MySQL, DDL commands are
committed automatically: once a DDL command is executed, the changes are saved immediately,
there is no need to commit explicitly, and the operation cannot simply be rolled back. Some
systems, such as PostgreSQL, do allow most DDL statements to run inside a transaction.
4. Impact on Metadata : DDL commands directly impact the database’s metadata, altering the
definitions of tables, columns, and other objects. These changes are reflected in the system
catalog, which stores metadata information.
➢ ALTER : the ALTER command modifies the structure of an existing database object, for example
by adding a new column, removing a column, or changing the datatype of a column.
-- adding a new column to an existing table
ALTER TABLE table_name ADD column_name datatype(size);
➢ TRUNCATE : the TRUNCATE command deletes all the rows from a table. The table becomes
empty, but its structure stays the same.
TRUNCATE TABLE table_name;
➢ RENAME : the RENAME command gives a new name to an existing database object. The exact
syntax varies between systems; the form below is the one used by MySQL.
RENAME TABLE old_table_name TO new_table_name;
Example of DDL
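As a small runnable sketch, the DDL commands can be tried with Python's built-in sqlite3 module. This is an illustration under stated assumptions: SQLite has no TRUNCATE command and spells the rename as ALTER TABLE ... RENAME TO, the CREATE and DROP statements are standard DDL added here so the example is self-contained, and the student table with its name column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE defines the table structure.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER adds a column to the existing table.
conn.execute("ALTER TABLE student ADD COLUMN mark REAL")

# SQLite spells the rename as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE student RENAME TO learner")

# Read the column names back from the catalog to see the new structure.
columns = [row[1] for row in conn.execute("PRAGMA table_info(learner)")]
print(columns)  # ['id', 'name', 'mark']

# DROP removes the table definition together with its data.
conn.execute("DROP TABLE learner")
```

Every statement here changes the structure of the database rather than the rows stored in it, which is what separates DDL from DML.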
Importance of DDL
DDL commands are vital for database design and administration. They allow developers to set up and
maintain the structure of the database, ensuring data integrity and optimizing performance. Proper use
of DDL commands helps in organizing data logically, enforcing data constraints, and managing access
permissions.
Characteristics of DML
1. Data Interaction : DML commands are focused on manipulating the actual data within the
database tables. They enable users to perform operations like inserting new records, updating
existing ones, and deleting unwanted data.
2. Reversible Operations : DML changes made inside a transaction can be rolled back until the
transaction is committed. This is particularly useful in maintaining data consistency and
recovering from accidental modifications.
3. Transaction Control : DML commands support transaction control, allowing users to group
multiple operations into a single transaction. Transactions can be committed or rolled back as
a unit, ensuring that either all changes are applied or none are.
4. Selective Operations : DML commands often include powerful querying capabilities that
allow users to filter and select specific data based on various criteria. This enables efficient
data retrieval and manipulation tailored to specific needs.
List of DML commands:
➢ INSERT : the INSERT command inserts new rows into a table.
INSERT INTO table_name VALUES (column1_value, column2_value, ...);
➢ UPDATE : the UPDATE command changes existing data in a table.
UPDATE table_name SET column_name = value WHERE conditions;
➢ DELETE : the DELETE command removes rows from a table.
DELETE FROM table_name WHERE conditions;
Example of DML
UPDATE student
SET mark = 95.7
WHERE id = 1;
DELETE
FROM student
WHERE id = 2;
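The statements above can be run end to end with Python's built-in sqlite3 module, which also shows the rollback behaviour described under the characteristics of DML. This is a sketch under stated assumptions: the name column and the two sample rows are hypothetical, while the student table, the id and mark columns, and the UPDATE and DELETE statements follow the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, mark REAL)")

# Hypothetical starting rows.
conn.execute("INSERT INTO student VALUES (1, 'Asha', 88.0)")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', 72.5)")
conn.commit()

conn.execute("UPDATE student SET mark = 95.7 WHERE id = 1")
conn.execute("DELETE FROM student WHERE id = 2")

# Nothing has been committed since the INSERTs, so both changes can be undone.
conn.rollback()
after_rollback = conn.execute("SELECT id, mark FROM student ORDER BY id").fetchall()
print(after_rollback)  # [(1, 88.0), (2, 72.5)]

# Repeat the changes and commit them this time.
conn.execute("UPDATE student SET mark = 95.7 WHERE id = 1")
conn.execute("DELETE FROM student WHERE id = 2")
conn.commit()
after_commit = conn.execute("SELECT id, mark FROM student ORDER BY id").fetchall()
print(after_commit)  # [(1, 95.7)]
```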
Importance of DML
DML commands are critical for the practical use of a database. They empower users to manage and
interact with the data effectively, enabling dynamic data operations and real-time updates. Proper use
of DML ensures that data remains accurate, up-to-date, and relevant to the business needs. It also
allows for sophisticated querying and reporting, which are essential for decision-making and analysis.
3. Discuss various integrity constraints that are applied to a
database.
Integrity Constraints
Integrity constraints are a set of rules that must hold whenever data is inserted, deleted or
updated in a database. For example, in a school, the set of rules that we must follow is the code
of conduct. In the same way, integrity constraints are the rules that have to be followed when
performing operations on a database.
Integrity constraints (ICs for short) are basically of 4 types: Domain Constraint, Entity
Integrity Constraint, Referential Integrity Constraint and Key Constraint.
Domain Constraint
After mapping our conceptual database design into the logical database design, we get a set of
relations (tables), and we start entering the required data into them. Each attribute of a relation
has its own column, and each column has a data type such as integer, string, date or time. The rule
that every value entered in a column must belong to that column's data type (its domain) is
known as the Domain Constraint.
Key Constraint
In a relation, each row is identified uniquely by its primary key value. The key constraint
says that primary key values must not be repeated within a relation.
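The domain and key constraints can be seen in action with Python's built-in sqlite3 module. This is a minimal sketch: the student table is hypothetical, and because SQLite is loosely typed the domain rule is written explicitly as a CHECK on typeof(mark), which is a SQLite-specific way of expressing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"                    # key constraint
    " mark REAL CHECK (typeof(mark) = 'real'))"   # domain constraint
)
conn.execute("INSERT INTO student VALUES (1, 88.5)")

errors = []
# First row repeats primary key 1; second row puts text in a numeric column.
for row in [(1, 70.0), (2, "ninety")]:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

for message in errors:
    print(message)
```

Both inserts are rejected with an IntegrityError, so the table still holds only the first valid row. The exact wording of the error messages depends on the SQLite version.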
BCNF (Boyce Codd Normal Form) is an advanced version of the third normal form
(3NF), and it is often also called the 3.5 Normal Form. 3NF does not remove all
redundancy in cases where, for a functional dependency (say, A->B), A is
not a super key of the table. To deal with such situations, BCNF was
introduced.
BCNF is based on functional dependencies, and all the candidate keys of the relation
are taken into consideration. BCNF is stricter than 3NF and has some additional
constraints along with the general definition of 3NF.
A relation in BCNF is also in 3NF, and a relation in 3NF is also in 2NF and 1NF. The
3NF has stricter constraints than the first two normal forms, but it is less strict
than BCNF. The restrictions therefore increase as we move from 1NF towards BCNF.
A relation is in BCNF if it meets the following conditions:
● It should satisfy all the conditions of the Third Normal Form (3NF).
● For every non-trivial functional dependency (A->B), A should be a super key
(every candidate key is also a super key). In particular, a dependency in
which A is not a super key is a violation even when B is a prime attribute,
which is the case that 3NF allows.
In this example, we have a relation R with three columns: Id, Subject, and
Professor. We have to find the highest normal form it satisfies and, if it is not in
BCNF, decompose it to satisfy the conditions of BCNF.
Id Subject Professor
101 Java Mayank
101 C++ Kartik
102 Java Sarthak
103 C# Lakshay
104 Java Mayank
● Using Id and Subject together, we can identify every record uniquely and
determine the remaining column. Hence, Id and Subject together form the
primary key.
● The table is in 1NF because all the values inside a column are atomic and of
the same domain.
● We can't uniquely identify a record with either the Id or the Subject alone,
and no non-prime attribute depends on only part of the key. As there is no
partial dependency, the table is also in 2NF.
● There is no transitive dependency of a non-prime attribute on the key: the
only dependency whose left side is not a super key is Professor -> Subject,
and its right side, Subject, is a prime attribute. Hence, the table is also in 3NF.
● However, the table is not in BCNF (Boyce-Codd Normal Form).
Each professor teaches only one subject, but one subject may be taught
by multiple professors. This gives a dependency between the subject and
the professor in which the subject is determined by the professor (Professor ->
Subject). Professor alone is not a super key, because the same professor can
appear with several Ids, while Subject is a prime attribute. 3NF tolerates this
dependency, but BCNF does not: for BCNF, the determining attribute (Professor
here) must be a super key.
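The BCNF test can be checked mechanically by computing attribute closures. The sketch below assumes the two functional dependencies discussed in this example, (Id, Subject) -> Professor and Professor -> Subject; the helper name closure is illustrative.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If we already know the left side, we also learn the right side.
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result


all_attributes = {"Id", "Subject", "Professor"}
fds = [
    ({"Id", "Subject"}, {"Professor"}),
    ({"Professor"}, {"Subject"}),
]

# A dependency violates BCNF when its left side is not a super key,
# that is, when its closure does not cover every attribute.
violations = [
    (left, right) for left, right in fds if closure(left, fds) != all_attributes
]
print(violations)  # [({'Professor'}, {'Subject'})]
```

The closure of Professor contains only Professor and Subject, so Professor is not a super key and the dependency Professor -> Subject is reported as the BCNF violation.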
To remove this dependency, we decompose the relation into two tables, the Student
table and the Professor table, to satisfy the conditions of BCNF.
Student Table
1 101 Mayank
2 101 Kartik
3 102 Sarthak
4 103 Lakshay
5 104 Mayank
Professor Table
Professor Subject
Mayank Java
Kartik C++
Sarthak Java
Lakshay C#
In the Professor table, Professor is now the primary key and determines
the Subject column, so every determinant is a key. Hence, the decomposed
tables are in BCNF.
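A quick way to check that this decomposition loses no information is to join the two tables back together and compare the result with the original rows. In the sketch below, the rows of R are the ones implied by the Student and Professor tables above, and storing each table as a set of tuples is an assumption.

```python
# Original relation R(Id, Subject, Professor).
r = {
    (101, "Java", "Mayank"),
    (101, "C++", "Kartik"),
    (102, "Java", "Sarthak"),
    (103, "C#", "Lakshay"),
    (104, "Java", "Mayank"),
}

# Decomposition: Student(Id, Professor) and Professor(Professor, Subject).
student = {(sid, prof) for (sid, _, prof) in r}
professor = {(prof, subj) for (_, subj, prof) in r}

# Joining the two tables on Professor rebuilds the original rows.
rejoined = {
    (sid, subj, prof)
    for (sid, prof) in student
    for (p, subj) in professor
    if p == prof
}
print(rejoined == r)  # True
```

The join is lossless because Professor, the shared attribute, is the key of the Professor table. The Professor table also stores each professor's subject only once, which removes the repetition of Mayank and Java seen in R.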
Difference Between 3NF and BCNF
3NF | BCNF
3NF stands for Third Normal Form. | BCNF stands for Boyce Codd Normal Form.
A relation in 3NF is already in 1NF and 2NF. | A relation in BCNF is already in 1NF, 2NF and 3NF.
A decomposition into 3NF can always preserve all functional dependencies. | A decomposition into BCNF may or may not preserve all functional dependencies.
3NF can be achieved without losing any information from the old table. | BCNF can also be achieved without losing any rows of the old table, but a functional dependency may no longer be enforceable within a single table.
3NF and BCNF are two forms of normalization. 3NF reduces redundancy (data duplication) and
anomalies by ensuring that no non-prime attribute depends transitively on a key. BCNF has
stricter rules than 3NF. Every BCNF relation is also in 3NF, but not every 3NF relation is in
BCNF.