Chapter 5 Relational Data Model
The relational data model is the primary data model, used widely around
the world for data storage and processing. The model is simple, and it has
all the properties and capabilities required to store and process data
efficiently.
Tuple − A single row of a table, which contains a single record for that
relation, is called a tuple.
Relation key − Each relation has one or more attributes, known as the
relation key, which can identify a row in the relation (table) uniquely.
Constraints
Every relation has some conditions that must hold for it to be a valid
relation. These conditions are called Relational Integrity Constraints.
There are three main integrity constraints −
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
In a relation with a key attribute, no two tuples can have identical values for the
key attributes.
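The key constraint can be checked mechanically. The following is a minimal Python sketch, assuming a relation is held as a list of dictionaries; the helper name `violates_key` and the sample rows are hypothetical and only illustrate the rule.

```python
def violates_key(rows, key):
    """Return the key values that occur in more than one tuple."""
    seen = set()
    duplicates = set()
    for row in rows:
        value = tuple(row[attr] for attr in key)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


# Hypothetical sample data: Stu_ID is meant to be the key.
students = [
    {"Stu_ID": 1, "Stu_Name": "Alex"},
    {"Stu_ID": 2, "Stu_Name": "Maria"},
    {"Stu_ID": 1, "Stu_Name": "Mira"},  # repeats key value 1
]

print(violates_key(students, ["Stu_ID"]))  # {(1,)}
```

An empty result means the sample satisfies the key constraint for the chosen attributes.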
Domain Constraints
Attributes take specific values in real-world scenarios. For example, age can
only be a positive integer. The same constraints are applied to the
attributes of a relation. Every attribute is bound to a specific range of
values. For example, age cannot be less than zero and telephone numbers
cannot contain a character outside the digits 0-9.
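As a rough illustration, the two domain rules mentioned above can be written as small Python predicates. The function names are hypothetical, and the exact domains (for instance, whether an empty telephone number is allowed) are assumptions made for this sketch.

```python
def in_age_domain(value):
    """Age must be an integer that is not less than zero."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def in_phone_domain(value):
    """A telephone number must be a non-empty string of the digits 0-9."""
    return (
        isinstance(value, str)
        and value != ""
        and all(ch in "0123456789" for ch in value)
    )


print(in_age_domain(21), in_age_domain(-3))                   # True False
print(in_phone_domain("5550100"), in_phone_domain("555-01"))  # True False
```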
Mapping Entity
An entity's attributes should become fields of the table, with their respective data types.
Mapping Relationship
Add the primary keys of all participating entities as fields of the table, with their
respective data types.
Declare a primary key composed of all the primary keys of the participating entities.
Mapping Process
Declare the primary key of the higher-level table and the primary key of the
lower-level table.
Example
Multivalued dependency
A multivalued dependency occurs when a table contains more than
one independent multivalued attribute. For example, consider
a bike manufacturing company that produces each model in two colors
(black and white) every year.
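Here the columns manuf_year and color are independent of each other and
dependent on bike_model. In this case these two columns are said to be
multivalued dependent on bike_model. These dependencies can be
represented like this:
bike_model ->> manuf_year
bike_model ->> color
Transitive dependency
A functional dependency X->Z is transitive if it follows from these three
conditions:
X->Y
Y does not ->X
Y->Z
{Book} ->{Author} (if we know the book, we know the author name)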
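Returning to the bike example, a multivalued dependency on bike_model can be tested on sample data. The sketch below is a minimal check, assuming a relation is a list of dictionaries: for each bike_model value, the observed combinations of the dependent attribute and the remaining attributes must form a complete cross product. The function name `holds_mvd`, the model number, and the years are hypothetical.

```python
from itertools import product


def holds_mvd(rows, x, y):
    """Check X ->> Y on sample rows; Z is every attribute outside X and Y."""
    if not rows:
        return True
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        groups.setdefault(x_val, set()).add((y_val, z_val))
    for pairs in groups.values():
        y_vals = {p[0] for p in pairs}
        z_vals = {p[1] for p in pairs}
        # Independence means every Y value appears with every Z value.
        if pairs != set(product(y_vals, z_vals)):
            return False
    return True


# Hypothetical sample data for one model over two years.
bikes = [
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2007, "color": "White"},
    {"bike_model": "M1001", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2008, "color": "White"},
]

print(holds_mvd(bikes, ["bike_model"], ["color"]))       # True
print(holds_mvd(bikes[:-1], ["bike_model"], ["color"]))  # False
```

A check on sample rows can only refute a dependency; whether it holds in general is a property of the schema, not of one instance.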
Normalization
If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.
Update anomalies − If data items are scattered and not linked to each
other properly, the database can end up in strange situations. For example,
when we try to update a data item whose copies are scattered over several
places, a few instances get updated properly while others are left with old
values. Such instances leave the database in an inconsistent state.
Deletion anomalies − We try to delete a record, but parts of it are left
undeleted because, unknown to us, the data is also saved somewhere else.
Insert anomalies − We try to insert data into a record that does not exist at
all.
Each attribute must contain only a single value from its pre-defined domain.
We see here in the Student_Project relation that the prime key attributes are
Stu_ID and Proj_ID. According to the rule, the non-key attributes, i.e.
Stu_Name and Proj_Name, must depend on both and not on any single
prime key attribute. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in
Second Normal Form.
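A partial dependency of this kind can be detected on sample data. The following Python sketch tests whether any proper subset of the key already determines a non-key attribute; the helper names and the sample rows (student names, project names) are hypothetical, and a test on one instance only suggests, rather than proves, a dependency.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if equal values on lhs always come with equal values on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_subset, attribute) pairs that break Second Normal Form."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if determines(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Hypothetical sample instance of Student_Project.
student_project = [
    {"Stu_ID": 1, "Proj_ID": "P1", "Stu_Name": "Alex", "Proj_Name": "Compiler"},
    {"Stu_ID": 1, "Proj_ID": "P2", "Stu_Name": "Alex", "Proj_Name": "Database"},
    {"Stu_ID": 2, "Proj_ID": "P1", "Stu_Name": "Maria", "Proj_Name": "Compiler"},
]

print(partial_dependencies(
    student_project, ["Stu_ID", "Proj_ID"], ["Stu_Name", "Proj_Name"]
))
# [(('Stu_ID',), 'Stu_Name'), (('Proj_ID',), 'Proj_Name')]
```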
For any non-trivial functional dependency X → A, either −
X is a superkey, or
A is a prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and
the only prime key attribute. We find that City can be identified by Stu_ID
as well as by Zip itself. Zip is not a superkey, nor is City a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
If we know the zip code 20001, we can determine the city is Washington DC.
To bring this relation into third normal form, we break the relation into two
relations as follows −
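The decomposition can be sketched in Python by projecting the original relation onto two attribute sets, so that Zip → City is stored only once per zip code. This is a minimal sketch: the helper `split_on`, the names of the two resulting relations, and the sample rows are assumptions for illustration.

```python
def split_on(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result


# Hypothetical sample instance of Student_detail.
student_detail = [
    {"Stu_ID": 1, "Zip": "20001", "City": "Washington DC"},
    {"Stu_ID": 2, "Zip": "20001", "City": "Washington DC"},
    {"Stu_ID": 3, "Zip": "10001", "City": "New York"},
]

student = split_on(student_detail, ["Stu_ID", "Zip"])  # key: Stu_ID
zip_codes = split_on(student_detail, ["Zip", "City"])  # key: Zip

print(student)
print(zip_codes)
```

After the split, each city appears once per zip code, so changing the city for a zip code touches a single tuple.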
Relational Algebra
Relational database systems are expected to be equipped with a query
language that helps users query the database instances. There
are two kinds of query languages − relational algebra and relational
calculus.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It uses
operators to perform queries. An operator can be either unary or binary.
They accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate
results are also considered relations.
Select
Project
Union
Set difference
Cartesian product
Rename
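σsubject = "database"(Books)
Output − Selects tuples from Books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from Books where subject is 'database' and price is
450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from Books where subject is 'database' and price is
450, or those books published after 2010.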
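The select operation can be mimicked in Python as a filter over a list of dictionaries. This is a minimal sketch; the function name `select`, the attribute names title and year, and the sample Books rows are assumptions made for illustration.

```python
def select(relation, predicate):
    """Keep the tuples of the relation that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Hypothetical sample instance of Books.
books = [
    {"title": "Book A", "subject": "database", "price": 450, "year": 2008},
    {"title": "Book B", "subject": "database", "price": 300, "year": 2012},
    {"title": "Book C", "subject": "networks", "price": 450, "year": 2015},
]

by_subject = select(books, lambda b: b["subject"] == "database")
by_subject_and_price = select(
    books, lambda b: b["subject"] == "database" and b["price"] == 450
)
with_recent = select(
    books,
    lambda b: (b["subject"] == "database" and b["price"] == 450) or b["year"] > 2010,
)

print([b["title"] for b in by_subject])            # ['Book A', 'Book B']
print([b["title"] for b in by_subject_and_price])  # ['Book A']
print([b["title"] for b in with_recent])           # ['Book A', 'Book B', 'Book C']
```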
Project Operation (∏)
It projects the listed column(s) of a relation and eliminates duplicate rows from the result.
For example −
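A projection can be sketched in Python as picking the named columns from every tuple and discarding duplicates, since a relation is a set. The function name `project`, the attribute names, and the sample rows are hypothetical.

```python
def project(relation, attributes):
    """Keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical sample instance of Books.
books = [
    {"title": "Book A", "subject": "database", "author": "Elmasri"},
    {"title": "Book B", "subject": "database", "author": "Elmasri"},
    {"title": "Book C", "subject": "networks", "author": "Maya"},
]

print(project(books, ["subject", "author"]))
# [{'subject': 'database', 'author': 'Elmasri'}, {'subject': 'networks', 'author': 'Maya'}]
```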
r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r ∪ s
The result of the set difference operation is the set of tuples that are
present in one relation but not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
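Union and set difference map directly onto Python sets when each tuple is stored as a Python tuple. The relations r and s below are hypothetical and share the same attributes, which both operations require.

```python
# Two hypothetical relations over the same attributes (id, name).
r = {(1, "Alex"), (2, "Maria")}
s = {(2, "Maria"), (3, "Mira")}

union = r | s       # { t | t in r or t in s }
difference = r - s  # tuples present in r but not in s

print(sorted(union))       # [(1, 'Alex'), (2, 'Maria'), (3, 'Mira')]
print(sorted(difference))  # [(1, 'Alex')]
```

Because sets hold no duplicates, the tuple (2, "Maria") appears only once in the union.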
Notation − r × s
r × s = { q t | q ∈ r and t ∈ s }
σauthor = 'Elmasri'(Books × Articles)
Output − Yields a relation that shows all the books and articles written
by Elmasri.
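The Cartesian product pairs every tuple of one relation with every tuple of the other and concatenates them. A minimal Python sketch, with a hypothetical function name and hypothetical sample tuples:

```python
from itertools import product


def cartesian(r, s):
    """Concatenate every tuple q of r with every tuple t of s."""
    return [q + t for q, t in product(r, s)]


r = [(1, "a"), (2, "b")]
s = [("x",), ("y",)]

print(cartesian(r, s))
# [(1, 'a', 'x'), (1, 'a', 'y'), (2, 'b', 'x'), (2, 'b', 'y')]
```

The result has len(r) * len(s) tuples, which is why a selection is usually applied to a product to keep only the meaningful combinations.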
The results of relational algebra are also relations, but without any name.
The rename operation allows us to rename the output relation. The rename
operation is denoted by the small Greek letter rho (ρ).
Notation − ρ x (E)
DBMS - Joins
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2,
..., Bn) such that the attributes don't have anything in common, that is
R1 ∩ R2 = ∅.
Student
101    Alex     10
102    Maria    11
Subjects
Class  Subject
10     Math
10     English
11     Music
11     Sports
Student_Detail −
Student_detail
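A theta join over the Student and Subjects data can be sketched in Python as a filtered Cartesian product. The Student column names SID, Name, and Std are assumptions, since the table above shows no header, and the join condition Std = Class is likewise assumed for illustration.

```python
def theta_join(r1, r2, theta):
    """Combine every pair of tuples that satisfies the condition theta."""
    # Merging the dictionaries is safe because the attribute names are disjoint.
    return [{**a, **b} for a in r1 for b in r2 if theta(a, b)]


# Column names SID, Name, Std are assumed; the values come from the tables above.
student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"},
    {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"},
    {"Class": 11, "Subject": "Sports"},
]

student_detail = theta_join(student, subjects, lambda a, b: a["Std"] == b["Class"])

for row in student_detail:
    print(row["SID"], row["Name"], row["Std"], row["Class"], row["Subject"])
# 101 Alex 10 10 Math
# 101 Alex 10 10 English
# 102 Maria 11 11 Music
# 102 Maria 11 11 Sports
```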
Equijoin
Natural Join ( ⋈ )
Natural join does not use any comparison operator. It does not concatenate
the way a Cartesian product does. We can perform a Natural Join only if
there is at least one common attribute that exists between two relations. In
addition, the attributes must have the same name and domain.
Courses
CS01   Database      CS
ME01   Mechanics     ME
EE01   Electronics   EE
HoD
Dept   Head
CS     Alex
ME     Maya
EE     Mira
Courses ⋈ HoD
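The natural join of Courses and HoD can be sketched in Python by matching tuples on every attribute name the two relations share. The Courses column names CID and Course are assumptions (the table shows no header); Dept is taken as the common attribute with HoD.

```python
def natural_join(r, s):
    """Join tuples that agree on all attributes common to both relations."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    if not common:
        raise ValueError("natural join needs at least one common attribute")
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


# Column names CID and Course are assumed; the values come from the tables above.
courses = [
    {"CID": "CS01", "Course": "Database", "Dept": "CS"},
    {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
    {"CID": "EE01", "Course": "Electronics", "Dept": "EE"},
]
hod = [
    {"Dept": "CS", "Head": "Alex"},
    {"Dept": "ME", "Head": "Maya"},
    {"Dept": "EE", "Head": "Mira"},
]

for row in natural_join(courses, hod):
    print(row["Dept"], row["CID"], row["Course"], row["Head"])
# CS CS01 Database Alex
# ME ME01 Mechanics Maya
# EE EE01 Electronics Mira
```

The common attribute Dept appears only once in each result tuple, unlike in a Cartesian product.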
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join
includes only those tuples with matching attributes and the rest are
discarded in the resulting relation. Therefore, we need to use outer joins to
include all the tuples from the participating relations in the resulting
relation. There are three kinds of outer joins − left outer join, right outer
join, and full outer join.
All the tuples from the Left relation, R, are included in the resulting relation.
If there are tuples in R without any matching tuple in the Right relation S,
then the S-attributes of the resulting relation are made NULL.
Left
A B
100 Database
101 Mechanics
102 Electronics
Right
A B
100 Alex
102 Maya
104 Mira
Courses ⟕ HoD
A      B             C      D
100    Database      100    Alex
101    Mechanics     NULL   NULL
102    Electronics   102    Maya
All the tuples from the Right relation, S, are included in the resulting
relation. If there are tuples in S without any matching tuple in R, then the
R-attributes of resulting relation are made NULL.
Courses ⟖ HoD
A      B             C      D
100    Database      100    Alex
102    Electronics   102    Maya
NULL   NULL          104    Mira
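In a full outer join, all the tuples from both relations are included in the
resulting relation. Where a tuple has no match, the attributes of the other
relation are made NULL.
Courses ⟗ HoD
A      B             C      D
100    Database      100    Alex
101    Mechanics     NULL   NULL
102    Electronics   102    Maya
NULL   NULL          104    Mira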
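The three outer joins can be sketched in Python over the Left and Right tables shown above, with None standing in for NULL. The function names are hypothetical, and matching on the first column of each table is an assumption about the join condition.

```python
def left_outer_join(r, s, match):
    """Keep every tuple of r; pad with None where s has no match."""
    width = len(s[0]) if s else 0
    result = []
    for a in r:
        partners = [b for b in s if match(a, b)]
        if partners:
            result.extend(a + b for b in partners)
        else:
            result.append(a + (None,) * width)
    return result


def right_outer_join(r, s, match):
    """Keep every tuple of s; pad with None where r has no match."""
    width = len(r[0]) if r else 0
    result = []
    for b in s:
        partners = [a for a in r if match(a, b)]
        if partners:
            result.extend(a + b for a in partners)
        else:
            result.append((None,) * width + b)
    return result


def full_outer_join(r, s, match):
    """Left outer join plus the tuples of s that matched nothing in r."""
    width = len(r[0]) if r else 0
    result = left_outer_join(r, s, match)
    for b in s:
        if not any(match(a, b) for a in r):
            result.append((None,) * width + b)
    return result


courses = [(100, "Database"), (101, "Mechanics"), (102, "Electronics")]
hod = [(100, "Alex"), (102, "Maya"), (104, "Mira")]


def same_a(a, b):
    # Assumed join condition: the first columns are equal.
    return a[0] == b[0]


print(left_outer_join(courses, hod, same_a))
print(right_outer_join(courses, hod, same_a))
print(full_outer_join(courses, hod, same_a))
```

The full outer join is built from the left outer join so that matched pairs are emitted only once.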